Introduction
Before heading towards the actual discussion about asynchronous data management, buffers, and execution trees, it is better to learn about the terminology, Traffic Management, first. If you have traveled to a metropolitan area or are living in a big city, ever wondered how the traffic management system works in large areas? It is just marvelous how traffic lights are managed and coordinated.
Today, you can travel to different areas within minutes or hours because of the efficiency of the traffic management system. So, the objective of designing a good management system is reducing congestion and redirecting traffic to less moving areas. And to gain more perfection in managing data you can also pursue MSBI Certification Training. Attaining this certification will help you grab a good salary package among all candidates.
Let us come around the SSIS now, where the SSIS engine works similar to a traffic management system of a metropolitan area.
The SSIS engine coordinates well to maintain the data flow for the effective processing of information and also manages data storage too. Knowing the SSIS Developer Career Growth and being one, you should know how the engine works and its various components for effective data management and brief understanding of differences between data flow vs control flow.
The Basic Architecture of SSIS
The objective of the SSIS tool is to extract, load, and transform the data into some meaningful content. Under the basic architecture of SSIS, there are a total of five components available:
- Data Flow: It basically deals with the source, destination, and data transformations.
- Control Flow: It majorly deals with containers and tasks.
- Event Handler: It is responsible for sending messages, and events.
- Package Explorer: It gives you a single view of all packages.
- Parameters: They are basically used to manage interactions among users.
In this blog, we will discuss the first two components of the SSIS Architecture and how they help in effective data processing and data transformation.
Get access to the complete SQL server self-learning module or else undergo SSIS Tutorial to be a professional in your field!
Learn SQL Server in the Easiest Way
- Learn from the videos
- Learn anytime anywhere
- Pocket-friendly mode of learning
- Complimentary eBook available
What is Data Flow & Control Flow in SSIS?
If we discuss the SSIS architecture in Data Management then it is vital for you to understand these two terms, Data flow, and Control flow. One excellent way to know about both terms is to check out the working of components and how they are connected together.
For the Control Flow, the task is the smallest piece of work that you have to complete on priority. If a particular task is not completed as needed, then you cannot switch to other tasks.
For the Data Flow, Data source, destination, and the data transformation are the major components and their functioning is highly different from a task. When Data is transformed into anything meaningful, it is done on the basis of logic how are components connected together, data management, etc.
Data Flow vs Control Flow – All You Should Know
Both Control Flow and Data Flow work similar to an objective of processing objects and there are connectors to make a bridge among them. Still, there is an ocean of differences between both of the terms that you will get to know in the upcoming paragraphs. For example, you cannot pass the data among components using control flow but it works as a task coordinator only.
The Professionals of JanBask Training have mentioned some of the major objectives of the control flow that you should know:
- It helps in orchestrating workflows.
- It is more process-oriented.
- It can help in executing tasks either serially or parallelly.
- It helps in processing data synchronously.
As we have discussed earlier, tasks in the control flow are majorly designed to execute either serially or parallelly. Further, it is possible to divide a task into multiple branches that can be executed in parallel one by one. Here, you can also define the priority about which task to be executed first. For example, if there are a few tasks that are connected in a linear fashion, then they can be serialized and only one task has to be executed at one time.
On the other side, Data Flow can split, merge, or branch tasks for parallel processing but it is absolutely different from the Control Flow in SSIS. Here are some of the most important features of Data Flow in SSIS.
- It is more information-oriented, not based on processes.
- It can group data together based on logic and transforming it into meaningful content.
- Multiple data batches can be coordinated for effective processing.
- Data is extracted from the source and loaded to the destination later.
The major difference between control flow and data flow in SSIS is that Control Flow can execute only one task at a time in a linear fashion. On the other hand, Data Flow can perform multiple transformations at the same time.
So this was it on SSIS control flow vs data flow, now let’s understand how data packets are executed in SSIS.
Sit for free demo classes of SSIS Server Corporate Training now, new batches available!
SQL Server Training & Certification
- No cost for a Demo Class
- Industry Expert as your Trainer
- Available as per your schedule
- Customer Support Available
How are data packets executed in SSIS?
Now, the question comes, how to calculate the time taken by control flow in the execution of different tasks throughout the process. It can be done by considering the first control flow task in mind and keep noticing the time until the last task is being executed. It does not matter either they are control flow tasks or Data Flow transformations, but you have to keep track of each and everything to optimize the workflow and manage the overall processing time.
There are certain stages when data has to be executed both serially and parallelly. When data is executed serially, then the second task always has to wait until the first task gets finished. On the other hand, when tasks are executed in parallel, then multiple tasks can be managed together and it also reduces the overall data processing.
Workflow Management in Control Flow
Control Flow is also about tasks and containers and it is directly related to the workflow concept in SSIS, where you can use ETL tools to extract, transform, and load the data. Control Flow in SSIS is designed to execute tasks serially and parallelly or you can use a combination of both. If we talk about the control flow tasks,they are synchronous in nature and one task has to be completed first before it jumps to the second one. However, it is also possible that design a control flow task that is not related to constraints of other tasks but they are connected synchronously. In brief, a package cannot halt the execution of a task but it will continue to execute until it does not complete. So, SSIS will not release the task until it will get a clear status, either pass or fail.
Keep in mind that synchronous data flow should not be mixed with synchronous data transformations. There is a slight difference and you have to understand it well when working with the SSIS tool.
In brief, a package cannot halt the execution of a task but it will continue to execute until it does not complete. So, SSIS will not release the task until it will get a clear status, either pass or fail.
While working you should always remember that the synchronous data flow should not be mixed with synchronous data transformations. There is a slight difference and you have to understand it well when working with the SSIS tool.
NOTE: Test your knowledge and Play SQL Quiz TODAY!!
Data Transformations in Data Flow
Data Flow is the primary step in the SSIS where data is managed, transformed, integrated, and cleansed. Data flow can also be defined as the data pipeline. For example, each house has a primary water source that distributes water into different outlets across the house.
As soon as you focus on one outlet, water starts flowing from it but the primary source would always be the same. If you turn off all water outlets in a house then the pressure of water is automatically redirected to the primary source. Water will not come out until you release the pressure.
At the same time, when you turn on all water outlets together then it may affect the speed and it may cause various problems too. Let us connect the same concept to the Data Flow in SSIS. For example, if we want to stream 10K rows every day and you don’t have that many resources available to process this data together, then it may create backward pressure on the source and you have to reduce the count to process them effectively.
Also, you have to focus on a few concepts like Memory buffer architecture, types of transformation, mode of communication, and execution trees, etc for effective execution of the data. Once data architecture is reviewed well, then you can shift to monitoring packages and determine how the SSIS engine works on the data processing.
NOTE: Join our E-Learning Community for the right guidance and career support!
Memory Buffer Architecture in SSIS
You can manage data in groups using buffers. A buffer is a memory block where you can store rows or columns of the data and apply the transformation. In simple words, you have to extract data from sources and store them into buffers. The size of a memory buffer can be defined dynamically as needed. Further, they are connected in a pipeline where one data transformation is passed or shared by others.
Moving ahead, let us discuss the different types of transformations in Data Flow. The major level of difference between them is the way they communicate with each other and how one transformation is passed to others. They are divided on the basis of two categories, Blocking nature and the communication system. Transformations can be blocking, non-blocking, or the semi-blocking. Communication can be either synchronous or asynchronous in nature.
Blocking Nature of Transformations
The obvious difference between different types of transformations is their blocking nature that could be blocking, non-blocking or semi-blocking in nature. It will help you to determine whether data is passed downstream immediately or after the data is fully received. By default, it is very tough defining the blocking nature of the transformation. However, when we work on the second phase, that is how communication is established among transformation, then we can understand how an engine can perform different transformations together.
You will be amazed knowing that most of the SSIS transformations are non-blocking in nature. Two more categories of non blocking transformation are streaming and row-based transformations. You have to check the category based on their configurations. For the streaming transformations, it is easy to apply transformations logic immediately. These transformations are more related to the CPU resources, so they are termed as streaming transformations.
Another common type of non-blocking transformation is row-based, where data can be quickly transformed and passed to another logic and saved to the memory buffer. It describes that rows flowing through the transformation should be acted one by one, with a requirement to manage interaction with some outside process like database, files, or components.
Semi-Blocking & Blocking Transformations
Another common category is semi-blocking transformations. Only a few transformations are semi-blocking in nature. Union All, Merge, Merge Join transformations are semi-blocking in nature and they need proper resources to be sorted and merged. Please keep in mind that semi-blocking transformations generally need a little more resource when compared to non-blocking transformations, and buffers have to stay in memory until the complete data is received.
The final category is blocking transformation, where we have to deal with upstream of the data before it is loaded to the destination. There are only a few blocking transformations when compared to non-blocking ones because only a few require the blocking permission to complete an operation. Two most popular examples of blocking transformations are sort and aggregate feature.
Blocking transformations are usually more intensive to resources when compared to other types of transformations. Firstly, they need a lot of memory for data storage. Secondly, it puts a heavy burden on the processor for performing data aggregation, data sorting, etc.
Synchronous and Asynchronous Transformation
Till the time, we have discussed the blocking nature of transformations. Now, we will discuss how they communicate together either synchronously or asynchronously. It would be better to discuss asynchronous data transfer first. Any transformation is termed as asynchronous in nature if buffers used for the input are different from buffers used in the output.
Sometimes, there may be performance issues, so store a copy of the data to accomplish the desired output. One of the major components of the SSIS engine is the buffer manager and to manage asynchronous outputs, you have to work on buffer manager and decommission buffers to use it somewhere else.
Asynchronous data transfer is one when buffers are quickly handed over to the next transformation based on the data logic. If you will discuss the concept closely, then there is certain overlapping between streaming transformation and synchronous transformations. It is important using the word buffer here because the same buffers are used for input and output during the data transformation.
Sign up for the Microsoft SSIS Server Corporate Training for a growing career!
SQL Server Training & Certification
- Personalized Free Consultation
- Access to Our Learning Management System
- Access to Our Course Curriculum
- Be a Part of Our Free Demo Class
Final Words
The above discussion gives you a clear picture of Data flow control flow architecture in SSIS and the difference between control flow and data flow in SSIS. You must have a sound idea of how both are different from each other and how they can help you in robust data processing and data transformations.
We also discuss the type of transformations, their blocking nature, and the communication modes. You will need an IT background to understand all these concepts in depth. With this blog, we have tried to give you a sound explanation of data flow vs control flow around a complete architecture of SSIS.
What’s Next?
The objective of the SSIS tool is to extract, transform, and load the data. It is used by almost every organization today generating huge data daily. It is the reason that the number of job options in SSIS is also huge. If you are planning to start a career in SSIS, then you have come to the right place.
We strongly recommend you to join the online SSIS training program at JanBask Training to give new heights to your career and grow immensely. If you have been looking to learn about data flow vs control flow, here you will get the deep learning on the difference between control flow and data flow in SSIS and other SSIS related concepts. Also, take a free demo class before making the final enrollment to have a great exposure to online learning environment in depth.
We wish you great luck for a successful career journey in the MS SQL Server Certification Course space ahead
SQL Server Course
Upcoming Batches
Trending Courses
Cyber Security
- Introduction to cybersecurity
- Cryptography and Secure Communication
- Cloud Computing Architectural Framework
- Security Architectures and Models
Upcoming Class
-0 day 13 Sep 2024
QA
- Introduction and Software Testing
- Software Test Life Cycle
- Automation Testing and API Testing
- Selenium framework development using Testing
Upcoming Class
-0 day 13 Sep 2024
Salesforce
- Salesforce Configuration Introduction
- Security & Automation Process
- Sales & Service Cloud
- Apex Programming, SOQL & SOSL
Upcoming Class
8 days 21 Sep 2024
Business Analyst
- BA & Stakeholders Overview
- BPMN, Requirement Elicitation
- BA Tools & Design Documents
- Enterprise Analysis, Agile & Scrum
Upcoming Class
-0 day 13 Sep 2024
MS SQL Server
- Introduction & Database Query
- Programming, Indexes & System Functions
- SSIS Package Development Procedures
- SSRS Report Design
Upcoming Class
7 days 20 Sep 2024
Data Science
- Data Science Introduction
- Hadoop and Spark Overview
- Python & Intro to R Programming
- Machine Learning
Upcoming Class
-0 day 13 Sep 2024
DevOps
- Intro to DevOps
- GIT and Maven
- Jenkins & Ansible
- Docker and Cloud Computing
Upcoming Class
6 days 19 Sep 2024
Hadoop
- Architecture, HDFS & MapReduce
- Unix Shell & Apache Pig Installation
- HIVE Installation & User-Defined Functions
- SQOOP & Hbase Installation
Upcoming Class
-0 day 13 Sep 2024
Python
- Features of Python
- Python Editors and IDEs
- Data types and Variables
- Python File Operation
Upcoming Class
15 days 28 Sep 2024
Artificial Intelligence
- Components of AI
- Categories of Machine Learning
- Recurrent Neural Networks
- Recurrent Neural Networks
Upcoming Class
8 days 21 Sep 2024
Machine Learning
- Introduction to Machine Learning & Python
- Machine Learning: Supervised Learning
- Machine Learning: Unsupervised Learning
Upcoming Class
21 days 04 Oct 2024
Tableau
- Introduction to Tableau Desktop
- Data Transformation Methods
- Configuring tableau server
- Integration with R & Hadoop
Upcoming Class
-0 day 13 Sep 2024