Before heading towards the actual discussion about asynchronous data management, buffers, and execution trees, it is better to learn about the terminology, Traffic Management, first. If you have traveled to a metropolitan area or areliving in a big city, ever wondered how the traffic management system works in large areas? It is just marvelous how traffic lights are managed and coordinated.
Today, you can travel to different areas within minutes or hours because of the efficiency of the traffic management system. So, the objective of designing a good management system is reducing congestion and redirecting traffic to less moving areas. Let us come around the SSIS now, where the SSIS engine works similar to a traffic management system of a metropolitan area.
The SSIS engine coordinates well to maintain the data flow for the effective processing of information and also manages data storage too. When working as an SSIS developer, you should know how the engine works and its various components for effective data management and brief understanding of differences between data flow vs control flow. In this blog, we will mainly discuss the SSIS control flow vs data flow, data flow control architecture in the SSIS, which leads to better designing of a software system with well-processed information. Before understanding the data flow vs control flow, let’s understand the basic architecture of SSIS first.
The objective of the SSIS tool is to extract, load, and transform the data into some meaningful content. Under the basic architecture of SSIS, there are a total of five components available:
In this blog, we will discuss the first two components of the SSIS Architecture and how they help in effective data processing and data transformation.
Get access to complete SQL server self learning module here!
Learn SQL Server in the Easiest Way
If we discuss the SSIS architecture then it is vital for you to understand these two terms, Data flow, and Control flow. One excellent way to know about both terms is to check out the working of components and how they are connected together.
For the Control Flow, the task is the smallest piece of work that you have to complete on priority. If a particular task is not completed as needed, then you cannot switch to other tasks.
For the Data Flow, Data source, destination, and the data transformation are the major components and their functioning is highly different from a task. When Data is transformed into anything meaningful, it is done on the basis of logic how are components connected together, data management, etc.
Both Control Flow and Data Flow work similar to an objective of processing objects and there are connectors to make a bridge among them. Still, there is an ocean of differences between both of the terms. For example, you cannot pass the data among components using control flow but it works as a task coordinator only. Here are some of the major objectives of the control flow.
As we have discussed earlier, tasks in the control flow are majorly designed to execute either serially or parallelly. Further, it is possible to divide a task into multiple branches that can be executed in parallel one by one. Here,you can also define the priority about which task to be executed first. For example, if there are a few tasks that are connected in a linear fashion, then they can be serialized and only one task has to be executed at one time.
On the other side, Data Flow can split, merge, or branch tasks for parallel processing but it is absolutely different from the Control Flow in SSIS. Here are some of the most important features of Data Flow in SSIS.
The major difference between control flow and data flow in SSIS is that Control Flow can execute only one task at a time in a linear fashion. On the other hand, Data Flow can perform multiple transformations at the same time.
So this was it on SSIS control flow vs data flow, now let’s understand how data packets are executed in SSIS.
Sit for free demo classes of SSIS Server Corporate Training now, new batches available!
SQL Server Training & Certification
Now, the question comes, how to calculate the time taken by control flow in the execution of different tasks throughout the process. It can be done by considering the first control flow task in mind and keep noticing the time until the last task is being executed. It does not matter either they are control flow tasks or Data Flow transformations, but you have to keep track of each and everything to optimize the workflow and manage the overall processing time.
There are certain stages when data has to be executed both serially and parallelly. When data is executed serially, then the second task always has to wait until the first task gets finished. On the other hand, when tasks are executed in parallel, then multiple tasks can be managed together and it also reduces the overall data processing.
Control Flow is also about tasks and containers and it is directly related to the workflow concept in SSIS, where you can use ETL tools to extract, transform, and load the data. Control Flow in SSIS is designed to execute tasks serially and parallelly or you can use a combination of both. If we talk about the control flow tasks,they are synchronous in nature and one task has to be completed first before it jumps to the second one. However, it is also possible thatdesigning a control flow task that is not related to constraints of other tasks but they are connected synchronously. In brief, a package cannot halt the execution of a task but it will continue to execute until it does not complete. So, SSIS will not release the task until it will get a clear status, either pass or fail.
Keep in mind that synchronous data flow should not be mixed with synchronous data transformations. There is a slight difference and you have to understand it well when working with the SSIS tool.
Data Flow is the primary step in the SSIS where data is managed, transformed, integrated, and cleansed. Data flow can also be defined as the data pipeline. For example, each house has a primary water source that distributes water into different outlets across the house.
As soon as you focus on one outlet, water starts flowing from it but the primary source would always be the same. If you turn off all water outlets in a house then the pressure of water is automatically redirected to the primary source. Water will not come out until you release the pressure.
At the same time, when you turn on all water outlets together then it may affect the speed and it may cause various problems too. Let us connect the same concept to the Data Flow in SSIS. For example, if we want to stream 10K rows everyday and you don’t have that many resources available to process this data together, then it may create backward pressure on the source and you have to reduce the count to process them effectively.
Also, you have to focus on a few concepts like Memory buffer architecture, types of transformation, mode of communication, and execution trees, etc for effective execution of the data. Once data architecture is reviewed well, then you can shift for monitoring packages and determine how the SSIS engine works on the data processing.
You can manage data in groups using buffers. A buffer is a memory block where you can store rows or columns of the data and apply the transformation. In simple words, you have to extract data from sources and store them into buffers. The size of a memory buffer can be defined dynamically as needed. Further, they are connected in a pipeline where one data transformation is passed or shared by others.
Moving ahead, let us discuss the different types of transformations in Data Flow. The major level of difference between them is the way they communicate with each other and how one transformation is passed to others. They are divided on the basis of two categories, Blocking nature and the communication system. Transformations can be blocking, non-blocking, or the semi-blocking. Communication can be either synchronous or asynchronous in nature.
The obvious difference between different types of transformations is their blocking nature that could be blocking, non-blocking or semi-blocking in nature. It will help you to determine whether data is passed downstream immediately or after the data is fully received. By default, it is very tough defining the blocking nature of the transformation. However, when we work on the second phase, that is how communication is established among transformation, then we can understand how an engine can perform different transformations together.
You will be amazed knowing that most of the SSIS transformations are non-blocking in nature. Two more categories of non blocking transformation are streaming and row-based transformations. You have to check the category based on their configurations. For the streaming transformations, it is easy to apply transformations logic immediately. These transformations are more related to the CPU resources, so they are termed as streaming transformations.
Another common type of non-blocking transformation is row-based, where data can be quickly transformed and passed to another logic and saved to the memory buffer. It describes that rows flowing through the transformation should be acted one by one, with a requirement to manage interaction with some outside process like database, files, or components.
Another common category is semi-blocking transformations. Only a few transformations are semi-blocking in nature. Union All, Merge, Merge Join transformations are semi-blocking in nature and they need proper resources to be sorted and merged. Please keep in mind that semi-blocking transformations generally need a little more resource when compared to non-blocking transformations, and buffers have to stay in memory until the complete data is received.
The final category is blocking transformation, where we have to deal with upstream of the data before it is loaded to the destination. There are only a few blocking transformations when compared to non-blocking ones because only a few require the blocking permission to complete an operation. Two most popular examples of blocking transformations are sort and aggregate feature.
Blocking transformations are usually more intensive to resources when compared to other types of transformations. Firstly, they need a lot of memory for data storage. Secondly, it puts a heavy burden on the processor for performing data aggregation, data sorting, etc.
Till the time, we have discussed the blocking nature of transformations. Now, we will discuss how they communicate together either synchronously or asynchronously. It would be better to discuss asynchronous data transfer first. Any transformation is termed as asynchronous in nature if buffers used for the input are different from buffers used in the output.
Sometimes, there may be performance issues, so store a copy of the data to accomplish the desired output. One of the major components of the SSIS engine is the buffer manager and to manage asynchronous outputs, you have to work on buffer manager and decommission buffers to use it somewhere else.
Asynchronous data transfer is one when buffers are quickly handed over to the next transformation based on the data logic. If you will discuss the concept closely, then there is certain overlapping between streaming transformation and synchronous transformations. It is important using the word buffer here because the same buffers are used for input and output during the data transformation.
Sign up for the Microsoft SSIS Server Corporate Training for a growing career!
SQL Server Training & Certification
The above discussion gives you a clear picture of Data flow control flow architecture in SSIS and the difference between control flow and data flow in SSIS. You must have a sound idea of how both are different from each other and how they can help you in robust data processing and data transformations.
We also discuss the type of transformations, their blocking nature, and the communication modes. You will need an IT background to understand all these concepts in depth. With this blog, we have tried to give you a sound explanation of data flow vs control flow around a complete architecture of SSIS.
The objective of the SSIS tool is to extract, transform, and load the data. It is used by almost every organization today generating huge data daily. It is the reason that the number of job options in SSIS is also huge. If you are planning to start a career in SSIS, then you have come to the right place.
We strongly recommend you to join the online SSIS training program at JanBask Training to give new heights to your career and grow immensely. If you have been looking to learn about data flow vs control flow, here you will get the deep learning on the difference between control flow and data flow in SSIS and other SSIS related concepts. Also, take a free demo class before making the final enrollment to have a great exposure to online learning environment in depth.
We wish you great luck for a successful career journey in the SQL Server space ahead!
A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.
MS SQL Server
Receive Latest Materials and Offers on SQL Server Course