
Friday, March 29, 2019

SSIS Is an In-Memory Pipeline Computer Science Essay

Since SSIS is an in-memory pipeline, one has to ensure that transformations occur in memory to get the performance benefits. To check whether your package is staying within memory limits, review the SSIS performance counter Buffers spooled. It has an initial value of 0; any value above 0 is an indication that the engine has started disk-swapping activities.

Capacity provisioning to consider resource utilization

In order to understand resource utilization it is very important to monitor CPU, memory, I/O and network utilization of the SSIS package.

CPU

It is important to understand how much CPU is being utilized by SSIS and how much CPU is being utilized overall by SQL Server while Integration Services is running. This latter point is very important, especially if you have SSIS and SQL Server on the same box, because if there is resource contention, SQL Server will surely win, and that will result in disk spilling from Integration Services and slower transfer speeds.

The performance counter to monitor is Process / % Processor Time (Total). Measure this counter for both sqlservr.exe and dtexec.exe. If SSIS is not close to 100% CPU load, this indicates one of the following:

- Application contention: for example, SQL Server takes more processor resources, making them unavailable for SSIS
- Hardware contention: probably suboptimal disk I/O or not enough memory to handle the amount of data to be processed
- Design limitation: the SSIS design is not making use of parallelism, and/or the package has too many single-threaded tasks

Network

SSIS moves data as fast as your network is able to handle it. Hence, it is important to understand your network topology and ensure that the path between the source and destination has both low latency and high throughput. The following performance counters can help you tune the topology:

- Network Interface / Current Bandwidth: provides an estimate of the current bandwidth
- Network Interface / Bytes Total/sec: the rate at which bytes are sent and received on each network adapter
- Network Interface / Transfers/sec: how many network transfers per second are occurring. If the rate is close to 40,000 IOPs, then get another NIC card and use teaming between the NIC cards

Input / Output (I/O)

A good SSIS package should hit the disk only when it reads from the sources and writes back to the target. But if the I/O is slow, reading and especially writing can become a bottleneck. So it is very important that the I/O system is specified not only in size (like 1 TB, 2 TB) but also in sustainable speed (like 20,000 IOPs).

Memory

The key counters to monitor memory for SSIS and SQL Server are as follows:

- Process / Private Bytes (DTEXEC.EXE): the amount of memory currently in use by Integration Services that cannot be shared with other processes
- Process / Working Set (DTEXEC.EXE): the amount of memory allocated by Integration Services
- SQL Server: Memory Manager / Total Server Memory: the amount of memory allocated to SQL Server. This counter is the best indicator of total memory used by SQL Server, because SQL Server has another way to allocate memory using the AWE API
- Memory / Page Reads/sec: total memory pressure on the system. If this consistently goes above 500, it is an indication that the system is under memory pressure
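The OS-level counters and the SSIS pipeline counters above are read with Performance Monitor, but SQL Server's own memory counters can also be checked from T-SQL. A minimal sketch, using the standard Memory Manager counter names:

    -- Quick check of SQL Server memory counters from T-SQL
    -- (the OS counter Memory\Page Reads/sec and the SSIS pipeline counters
    --  such as "Buffers spooled" still have to be read with Performance Monitor)
    SELECT counter_name,
           cntr_value / 1024 AS value_mb
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Memory Manager%'
      AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');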
Baseline source system extract speed

It is important to understand the source system and the speed at which data can be extracted from it. Measure the speed of the source system by creating a simple package that reads data from the source and ends in a Row Count transformation. Execute this package from the command line and measure the time it took to complete. Using the Integration Services log output, you can measure the time taken. The formula to use is:

Rows/Sec = RowCount / Time

Based on this value, you can judge the maximum number of rows per second that can be read from the source. To increase the Rows/Sec figure, you can perform one of the following:

- Improve drivers and driver configurations: ensure you are using the most up-to-date driver configurations for the network, data source and disk I/O.
- Start multiple connections: to overcome limitations of drivers, you can start multiple connections to your data source. If the source is able to handle many concurrent connections, throughput will increase if you start several extracts at once. If concurrency causes locking or blocking issues, consider partitioning the source and having your packages read from different partitions to distribute the load more evenly.
- Use multiple NIC cards: if the network is the bottleneck and you have already ensured you are using gigabit network cards and routers, a potential solution is to use multiple NIC cards per server.

Optimize the SQL data source, Lookup transformations and Destination

Here are some optimization tips that you can implement in your SSIS packages:

- Use NOLOCK or TABLOCK hints to remove locking overhead.
- Refrain from using SELECT * in SQL queries. Name each column in the SELECT clause for which data needs to be retrieved.
- If possible, perform datetime conversions at the source or target databases.
- In SQL Server 2008 Integration Services there is a new feature of shared lookup cache. When using parallel pipelines, it provides a high-speed, shared cache.
- If Integration Services and SQL Server run on the same box, use the SQL Server Destination instead of OLE DB.
- Commit size 0 is fastest on heap bulk targets. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is bad while inserting into a BTree, because all incoming rows must be sorted at once into the target BTree, and if memory is limited there is a likelihood of spilling. Batch size = 0 is ideal for inserting into a heap. Please note that a commit size value of 0 might cause the running package to stop responding if the OLE DB Destination and another data flow component are updating the same source table. To ensure that the package does not stop, set the Maximum insert commit size option to 2147483647.
- Heap inserts are typically faster than inserts into a clustered index. This means it is recommended to drop and rebuild all the indexes if a large part of the destination table is getting changed.
- Use partitions and the partition SWITCH command. In other words, load a work table that contains a single partition and SWITCH it into the main table after the indexes are built, and then put the constraints on.
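A minimal sketch of this staging-table-plus-SWITCH pattern, with hypothetical table and partition names (dbo.FactSales_Stage and dbo.FactSales are placeholders); the staging table must match the target's structure and indexes and sit on the same filegroup as the target partition:

    -- Load the staging table (e.g. from the SSIS data flow), build its indexes,
    -- add the constraints, then switch it into the empty target partition.
    -- The staging table needs a CHECK constraint restricting its rows to the
    -- boundary values of the target partition, otherwise the SWITCH will fail.
    ALTER TABLE dbo.FactSales_Stage
        SWITCH TO dbo.FactSales PARTITION 5;   -- metadata-only operation, effectively instant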
Network tuning

Packet size is the main network property that needs to be monitored and looked at in order to take decisions for network tuning. By default this value is set to 4,096 bytes. As noted for the SqlConnection.PacketSize property in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set. If your system is transactional in nature, lowering the value will improve performance.

Another network tuning technique is to use network affinity at the operating system level to increase performance at high throughputs.

Use data types wisely

Following are some best practices related to the usage of data types:

- Define data types as narrow as possible.
- Do not perform excessive casting of data types. Match your data types to the source or destination and explicitly state the data type casting.
- Take care of precision when using the money, float and decimal data types. The money data type is always faster than decimal and has fewer precision considerations than float.

Change the design

Following are some best practices related to SSIS design:

- Do not sort within Integration Services unless absolutely necessary. In order to sort the data, Integration Services allocates memory space for the entire data set that needs to be transformed. Preferably, presort the data beforehand. Another way to sort the data is to use an ORDER BY clause to sort large data in the database.
- There are times when using Transact-SQL will be faster than processing the data in SSIS. Generally all set-based operations will perform faster in Transact-SQL, because the problem can be transformed into a relational algebra formulation that SQL Server is optimized to resolve.
- Set-based UPDATE statements are more efficient than row-by-row OLE DB calls (a T-SQL sketch appears after the partitioning tips below).
- Aggregation statements like GROUP BY and SUM are also calculated faster using T-SQL instead of in-memory calculations by a pipeline.
- Delta detection is a technique where you change existing rows in the target table instead of reloading the table. To perform delta detection, one can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality. As a rule of thumb, if more than 10% of the target table has changed, it is often faster to simply reload than to perform the delta detection.

Partition the problem

For ETL design, partition source data into smaller chunks of equal size. Here are some more partitioning tips:

- Use partitioning on your target table. Multiple instances of the same package can be executed in parallel to insert data into different partitions of the same table. The SWITCH statement should be used during partitioning. It not only increases parallel load speed, but also allows efficient transfer of data.
- As implied above, the package should have a parameter defined that specifies which partition it should work on.
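As an illustration of the set-based UPDATE point above (table and column names are hypothetical): instead of an OLE DB Command that fires one UPDATE per pipeline row, land the changed rows in a staging table and apply them in a single statement:

    -- One set-based statement instead of row-by-row OLE DB calls
    UPDATE tgt
       SET tgt.UnitPrice    = src.UnitPrice,
           tgt.LastModified = src.LastModified
      FROM dbo.DimProduct       AS tgt
      JOIN dbo.DimProduct_Stage AS src
        ON src.ProductKey = tgt.ProductKey;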
Minimize logged operations

If possible, use minimally logged operations while inserting data into your target SQL Server database. When data is inserted into a database in fully logged mode, the size of the log grows quickly, because each row that is written to the database is also written to the log. Therefore, consider the following while designing SSIS packages:

- Try to perform data flows in bulk mode instead of row by row. This helps minimize the number of entries to the log file, which eventually results in less disk I/O and better performance.
- If for any reason you need to delete data, organize the data in such a way that you can use TRUNCATE instead of DELETE. The latter places an entry in the log file for each row that is deleted; the former removes all the data while writing only minimal entries to the log file.
- If partitions need to be moved around, use the SWITCH statement. This is a minimally logged operation.
- If you use DML statements along with your INSERT statements, minimal logging is suppressed.

Schedule and distribute it correctly

A good way to handle execution is to create a priority queue for your package and then execute multiple instances of the same package (with different partition parameter values). This queue can be a simple SQL Server table. A simple loop in the control flow should be a part of each package to:

- Pick a relevant chunk from the queue. Relevant means it has not already been processed and all chunks it depends on have already executed.
- Exit the package if no item is returned from the queue.
- Perform the work required on the chunk.
- Mark the chunk as done in the queue.
- Return to the start of the loop.

Picking an item from the queue and marking it as done can be implemented as a stored procedure (a T-SQL sketch of such a procedure appears at the end of this section). Once you have the queue in place, you can simply start multiple copies of DTEXEC to increase parallelism.

Keep it simple

Unnecessary use of components should be avoided. Here is one way to avoid it:

Step 1: Declare the variable varServerDate.
Step 2: Use an Execute SQL Task in the control flow to run a SQL query that gets the server datetime and stores it in the variable.
Step 3: Use the data flow task and insert/update the database with the server datetime from the variable varServerDate.

This is advisable only in cases where the time difference from Step 2 to Step 3 really matters. If it does not matter, just use the getdate() command at Step 3, as shown below:

    create table Table1(t_ID int, t_date datetime)
    insert into Table1(t_ID, t_date) values (1, getdate())

Executing a child package multiple times from a parent with different parameter values

While executing a child package from a master package, parameters that are passed from the master package should be configured in the child package. Use the Parent Package Configuration option in the child package to implement this feature. To use this option, you need to specify the name of the Parent Package Variable that is passed to the child package. If there is a need to call the same child package multiple times (each time with a different parameter value), declare the parent package variables (with the same name as given in the child package) with a scope limited to the Execute Package Tasks. SSIS allows declaring variables with the same name but with the scope limited to different tasks, all within the same package.

SQL job with many atomic steps

For the SQL job that calls the SSIS packages, create multiple steps, each performing a small task, rather than one step that performs all the tasks. With one big step, the transaction log grows too big, and if a rollback takes place it may take the full processing space of the server.

Avoid unnecessary typecasts

Avoid unnecessary typecasts. For example, the flat file connection manager, by default, uses the string DT_STR data type for all columns. You will have to change it manually if there is a need to use the actual data type. It is always a good option to change it at the source level itself to avoid unnecessary type casting.
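A minimal sketch of the queue pick/mark logic mentioned in the scheduling section above. The table and procedure names (dbo.PackageQueue, dbo.PickNextChunk) are hypothetical, and dependency checking is omitted for brevity:

    -- Hypothetical work queue: one row per chunk/partition to process
    CREATE TABLE dbo.PackageQueue
    (
        ChunkId     int IDENTITY(1,1) PRIMARY KEY,
        PartitionNo int     NOT NULL,           -- partition the package instance should work on
        Status      tinyint NOT NULL DEFAULT 0  -- 0 = pending, 1 = in progress, 2 = done
    );
    GO
    CREATE PROCEDURE dbo.PickNextChunk
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Claim one pending chunk atomically; READPAST lets parallel DTEXEC
        -- instances skip rows already locked by another instance.
        UPDATE TOP (1) q
           SET Status = 1
        OUTPUT inserted.ChunkId, inserted.PartitionNo
          FROM dbo.PackageQueue AS q WITH (UPDLOCK, READPAST)
         WHERE q.Status = 0;
        -- The package exits when this returns no row, and calls a similar
        -- procedure to set Status = 2 once the chunk has been processed.
    END
    GO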
Transactions

Usually, ETL processes handle large volumes of data. In such scenarios, do not attempt a transaction on the whole package logic. SSIS does support transactions, and it is advisable to use transactions.

Distributed transactions that span multiple tasks

The control flow of an SSIS package threads together various control tasks. In SSIS it is possible to set up a transaction that spans multiple tasks using the same connection. To enable this, set the value of the RetainSameConnection property of the Connection Manager to true.

Limit the package name to a maximum of 100 characters

When an SSIS package with a package name exceeding 100 characters is deployed to SQL Server, the package name is trimmed to 100 characters, which may cause an execution failure.

SELECT * FROM

Do not pass any unnecessary columns from the source to the destination. With the OLE DB connection manager source, using the Table or View data access mode is equivalent to SELECT * FROM tablename, which will fetch all the columns. Use SQL Command to fetch only the required columns and pass those to the destination.

Excel Source and 64-bit runtime

The Excel Source and the Excel Connection Manager work only with the 32-bit runtime. Whenever a package that uses the Excel Source is enabled for the 64-bit runtime (by default, this is enabled), it will fail on the production server using the 64-bit runtime. Go to the solution property pages, Debugging, and set Run64BitRuntime to FALSE.

On failure of a component, stop / continue the execution with the next component

When a component fails, the FailParentOnFailure property can be used either to stop the package execution or to continue with the next component in the sequence container. The constraint value connecting the components in the sequence should be set to Completion, and the FailParentOnFailure property should be set to FALSE.

Protection

To avoid most package deployment errors when moving from one system to another, set the package protection level to DontSaveSensitive.

Copy-pasting a script component

Once you copy-paste a script component and execute the package, it may fail. As a workaround, open the script editor of the pasted script component, save the script and then execute the package.

Configuration filter - use as a filter

As a best practice, use the package name as the configuration filter for all the configuration items that are specific to a package. This is typically useful when there are many packages with package-specific configuration items. Use a generic name for configuration items that are common to many packages.

Optimal use of configuration records

Avoid storing the same configuration item under different filter / object names. For example, there should be only one configuration record created if two packages are using the same connection string. This can be achieved by using the same name for the connection manager in both packages. This is quite useful when porting from one environment to another (like UAT to Prod).

Pulling high-volume data

The recommendation is to consider dropping all indexes from the target tables, if possible, before inserting data, especially when the volume of inserts is high.
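A sketch of the drop-indexes-then-reload idea, using hypothetical index and table names; disabling applies only to nonclustered indexes, since disabling the clustered index would make the table inaccessible:

    -- Before the high-volume load: disable the nonclustered indexes
    ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales DISABLE;
    ALTER INDEX IX_FactSales_DateKey     ON dbo.FactSales DISABLE;

    -- ... run the SSIS data flow that bulk-loads dbo.FactSales ...

    -- After the load: rebuild the indexes (REBUILD re-enables a disabled index)
    ALTER INDEX ALL ON dbo.FactSales REBUILD;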
Effect of OLE DB Destination settings

Certain settings of the OLE DB Destination affect the performance of the data transfer. Let's look at some of them:

- Data Access Mode: this setting provides the fast load option, which internally uses a BULK INSERT statement for uploading data into the destination table.
- Keep Identity: by default this setting is unchecked, which means the destination table (if it has an identity column) will create identity values on its own. If you check this setting, the dataflow engine will ensure that the source identity values are preserved and the same values are inserted into the destination table.
- Keep NULLs: by default this setting is unchecked, which means a default value will be inserted (if a default constraint is defined on the target column) during the INSERT into the destination table when a NULL value comes from the source for that particular column. If you check this option, the default constraint on the destination table's column is ignored and the NULL from the source column is preserved and inserted into the destination column.
- Table Lock: by default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by some other process at the same time.
- Check Constraints: by default this setting is checked, and the recommendation is to uncheck it if you are sure the incoming data will not violate the constraints of the destination table. This setting indicates that the dataflow pipeline engine will validate the incoming data against the constraints of the target table. Performance of the data load can be improved by unchecking this option.

Effects of the Rows per Batch and Maximum Insert Commit Size settings

- Rows per Batch: the default value for this setting is -1, which means all incoming rows are treated as a single batch. If required, you can change this to a positive integer value to break all incoming rows into multiple batches; the value represents the total number of rows in a batch.
- Maximum Insert Commit Size: the default value for this setting is 2147483647, which means all incoming rows are committed once on successful completion. If required, you can change this to any other positive integer, which means a commit will be issued for that specified number of records. This might put an overhead on the dataflow engine to commit several times, but on the other side it will release the pressure on the transaction log and keep tempdb from growing tremendously, especially during high-volume data transfers.

These two settings are mainly focused on improving the performance of tempdb and the transaction log.
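For context, the fast load path of the OLE DB Destination issues a bulk insert under the covers, and the same knobs can be seen on a plain T-SQL BULK INSERT; the file path, table name and options below are hypothetical:

    -- TABLOCK mirrors the Table Lock setting; BATCHSIZE plays a role similar to the
    -- Rows per Batch / commit-size settings (smaller batches = more commits, less
    -- log and tempdb pressure); CHECK_CONSTRAINTS mirrors the Check Constraints box.
    BULK INSERT dbo.FactSales
    FROM 'D:\loads\factsales.dat'
    WITH (
        TABLOCK,
        BATCHSIZE = 100000,
        CHECK_CONSTRAINTS,
        FIELDTERMINATOR = '|',
        ROWTERMINATOR   = '\n'
    );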
Synchronous and asynchronous transformations

While executing the package, the SSIS runtime engine executes every task other than the data flow tasks in the defined sequence. On encountering a data flow task, execution is taken over by the data flow pipeline engine. The data flow pipeline engine breaks the execution of the data flow task into one or more execution trees, and it may execute these trees in parallel to achieve high performance.

To make things a bit clearer, here is what an execution tree means. An execution tree starts at a source or an asynchronous transformation and ends at a destination or at the first asynchronous transformation in the hierarchy. Each tree has a set of allocated buffers, and the scope of these buffers is tied to that tree. In addition, every tree is allocated an OS thread (worker thread), and unlike the buffers, this thread may be shared with other execution trees.

A synchronous transformation gets a record, processes it and passes it on to the next transformation or destination in the sequence. The processing of a record does not depend on the other incoming rows. Since synchronous transformations output the same number of rows as the input, they do not require new buffers to be created and hence are faster in processing. For example, in the Derived Column transformation a new column gets added to each incoming row, without adding any additional records to the output.

In the case of an asynchronous transformation, a different number of rows can be created than the input, which requires new buffers to be created. Because an output row can depend on one or more input records, such a transformation is called a blocking transformation. It might be partially or fully blocking. For example, the Sort transformation is a fully blocking transformation, as it requires all the incoming rows to arrive before processing.

Since asynchronous transformations require additional buffers, they perform more slowly than synchronous transformations. Hence asynchronous transformations should be avoided wherever possible. For example, instead of using the Sort transformation to get sorted results, use an ORDER BY clause in the source itself.
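For example, rather than adding a Sort transformation after the source, the ordering can be pushed into the source query (the table and column names below are hypothetical); in the source's Advanced Editor you would then mark the output as sorted (IsSorted) and set SortKeyPosition on the key column so that downstream components such as Merge Join recognize the order:

    -- Let the database engine do the sorting instead of a fully blocking
    -- Sort transformation inside the pipeline
    SELECT CustomerKey, OrderDateKey, SalesAmount
    FROM dbo.FactSales
    ORDER BY CustomerKey;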
Implement parallel execution in SSIS

Parallel execution is allowed by SQL Server Integration Services (SSIS) in two different ways, by controlling the two properties mentioned below:

- MaxConcurrentExecutables: this property defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. If hyper-threading is turned on in your box, it is the logical processors rather than the physically present processors that are counted.

For example, say we have a package with 3 Data Flow Tasks where every task has 10 flows in the form of OLE DB Source -> SQL Server Destination. To execute all 3 Data Flow Tasks simultaneously, set the value of MaxConcurrentExecutables to 3. The second property, named EngineThreads, controls whether all 10 flows in each individual Data Flow Task get started concurrently.

- EngineThreads: this property defines how many work threads the scheduler will create and run in parallel. The default value for this property is 5.

In the above example, if we set EngineThreads to 10 on all 3 Data Flow Tasks, then all 30 flows will start at the same time. One thing to be clear about with EngineThreads is that it governs both source threads (for source components) and work threads (for transformation and destination components). Source and work threads are both engine threads created by the data flow's scheduler. Looking back at the above example, setting a value of 10 for EngineThreads means up to 10 source and 10 work threads each.

SSIS does not affinitize the threads it creates to any of the processors. If the number of threads surpasses the number of available processors, throughput can suffer due to an excessive number of context switches.

Package restart without losing pipeline data

SSIS has a very useful feature called Checkpoint. This feature allows your package to start from the last point of failure on the next execution. You can save a lot of time by enabling this feature to start the package execution from the task that failed in the last execution. To enable this feature for your package, set values for three properties: CheckpointFileName, CheckpointUsage and SaveCheckpoints. Apart from this, you should also set the FailPackageOnFailure property to TRUE for all tasks that you want to be considered in restarting.

By doing this, on failure of that task the package fails, the information is captured in the checkpoint file, and on subsequent executions the execution starts from that task.

It is very important to note that you can enable a task to participate in a checkpoint, including a data flow task, but the checkpoint does not apply inside the data flow task. Consider a scenario where you have a data flow task for which you have set FailPackageOnFailure to TRUE so that it participates in the checkpoint. Assume that inside the data flow task there are five transformations in sequence and the execution fails at the 5th transformation (the earlier 4 transformations having completed successfully). On the following execution instance, the execution will start from the data flow task, and the first 4 transformations will run again before coming to the 5th one.

It is worth noting the points below:

- For Loop and Foreach Loop containers do not honor the checkpoint.
- The checkpoint is enabled only at the control flow level and not at the data flow level, so regardless of the checkpoint, the package will execute the data flow from the start in the case of a restart.
- If a package fails, the checkpoint file stores the configurations, the variable values and the point of failure. So if the package is restarted, it takes all configuration values from the checkpoint file. During failure you cannot change the configuration values.

Best practices for logging

Integration Services includes logging features that write log entries when run-time events occur and can also write custom messages. Logging can capture run-time information about a package, to help you audit and troubleshoot the package every time it is run. For example, the name of the operator who ran the package and the time the package began and completed can be captured in the log.

Logging (or tracing the execution) is a great way of diagnosing problems that occur at runtime. This is especially useful when your code does not work as expected. SSIS also lets you choose which events of a package and which components of the package to log, as well as the location where the log information is to be written (text files, SQL Server, SQL Server Profiler, Windows Events, or XML files).

Logging saves you the hours of frustration you might otherwise spend finding the cause of a problem, but the story does not end there. It is true that it helps you identify the problem and its root cause, but at the same time it is an overhead for SSIS that ultimately affects performance, especially if you use logging excessively. So the recommendation here is to log only errors (the OnError event of the package and containers). Enable logging on other containers only if required; you can dynamically set the value of the LoggingMode property (of a package and its executables) to enable or disable logging without modifying the package.

You can create your own custom logging, which can be used for troubleshooting, package monitoring, building an ETL operations performance dashboard, and so on. However, the best approach is to use the built-in SSIS logging where appropriate and augment it with your own custom logging. Custom logging can provide all the information you need as per your requirements.
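If the SQL Server log provider is used, the log entries land in the dbo.sysssislog table of the database that the log provider's connection points to (sysdtslog90 on SSIS 2005), so error-only logging can be reviewed with a simple query such as the sketch below:

    -- Most recent OnError entries written by the SQL Server log provider
    SELECT TOP (50) starttime, source, message
    FROM dbo.sysssislog
    WHERE event = 'OnError'
    ORDER BY starttime DESC;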
Security audit and data audit are out of scope of this document.

To help you understand which bulk load operations will be minimally logged and which will not, the following table lists the possible combinations.

Table indexes      Rows in table   Hints                 Without TF 610   With TF 610   Concurrent possible
Heap               Any             TABLOCK               Minimal          Minimal       Yes
Heap               Any             None                  Full             Full          Yes
Heap + Index       Any             TABLOCK               Full             Depends (3)   No
Cluster            Empty           TABLOCK, ORDER (1)    Minimal          Minimal       No
Cluster            Empty           None                  Full             Minimal       Yes (2)
Cluster            Any             None                  Full             Minimal       Yes (2)
Cluster            Any             TABLOCK               Full             Minimal       No
Cluster + Index    Any             None                  Full             Depends (3)   Yes (2)
Cluster + Index    Any             TABLOCK               Full             Depends (3)   No

(1) It is not necessary to specify the ORDER hint if you are using the INSERT ... SELECT method, but the rows need to be in the same order as the clustered index. While using BULK INSERT it is necessary to use the ORDER hint.
(2) Concurrent loads are only possible under certain conditions. Only rows that are written to newly allocated pages are minimally logged.
(3) Depending on the plan chosen by the optimizer, the non-clustered index on the table may be either fully or minimally logged.

Best practices for error handling

There are two methods of extending the logging capability:

- Build a custom log provider
- Use event handlers

We can extend SSIS event handlers for error logging. We can capture errors on the OnError event of the package and let the package handle them gracefully. We can capture the actual error using a script task and log it in a text file or in a SQL Server table. You can capture error details using the system variables System::ErrorCode, System::ErrorDescription, System::SourceDescription, etc. If you are using custom logging, log the error in the same table.

In some cases you may wish to ignore an error, or handle the error at the container level, or in some cases at the task level.

Event handlers can be attached to any container in the package, and that event handler will catch all events raised by that container and any of its child containers. Hence, by attaching an event handler to the package (which is the parent container), we can catch all events of that event type raised by every container in the package. This is powerful because it saves us from building event handlers for each task in the package.

A container has an option to opt out of having its events captured by an event handler. Let's say you had a sequence container for which you did not find it important to capture events; you can then simply switch them off using the sequence container's DisableEventHandlers property.

If you are looking to capture only certain events of that sequence task with an event handler, you can control this using the System::Propagate variable.

We recommend you to use se
