Our Support: During the COVID-19 outbreak, we request learners to CALL US for Special Discounts!

- SQL Server Blogs -

What is the Difference Between Control Flow & Data Flow in SSIS?



Introduction

Before heading towards the actual discussion about asynchronous data management, buffers, and execution trees, it is better to learn about the terminology, Traffic Management, first. If you have traveled to a metropolitan area or areliving in a big city, ever wondered how the traffic management system works in large areas? It is just marvelous how traffic lights are managed and coordinated.

Today, you can travel to different areas within minutes or hours because of the efficiency of the traffic management system. So, the objective of designing a good management system is reducing congestion and redirecting traffic to less moving areas. Let us come around the SSIS now, where the SSIS engine works similar to a traffic management system of a metropolitan area.

Read: Most Popular SQL Server Performance Tuning Tips

The SSIS engine coordinates well to maintain the data flow for the effective processing of information and also manages data storage too. When working as an SSIS developer, you should know how the engine works and its various components for effective data management and brief understanding of differences between data flow vs control flow. In this blog, we will mainly discuss the SSIS control flow vs data flow, data flow control architecture in the SSIS, which leads to better designing of a software system with well-processed information. Before understanding the data flow vs control flow, let’s understand the basic architecture of SSIS first.

The Basic Architecture of SSIS

The objective of the SSIS tool is to extract, load, and transform the data into some meaningful content. Under the basic architecture of SSIS, there are a total of five components available:

The Basic Architecture of SSIS

  • Data Flow: It basically deals with the source, destination, and data transformations.
  • Control Flow: It majorly deals with containers and tasks.
  • Event Handler: It is responsible for sending messages, and events.
  • Package Explorer: It gives you a single view of all packages.
  • Parameters: They are basically used to manage interactions among users.

In this blog, we will discuss the first two components of the SSIS Architecture and how they help in effective data processing and data transformation.

Get access to complete SQL server self learning module here!

Learn SQL Server in the Easiest Way

  • Learn from the videos
  • Learn anytime anywhere
  • Pocket-friendly mode of learning
  • Complimentary eBook available

What is Data Flow & Control Flow in SSIS?

If we discuss the SSIS architecture then it is vital for you to understand these two terms, Data flow, and Control flow. One excellent way to know about both terms is to check out the working of components and how they are connected together.

For the Control Flow, the task is the smallest piece of work that you have to complete on priority. If a particular task is not completed as needed, then you cannot switch to other tasks.

Read: SSRS Tutorial for Beginners

For the Data Flow, Data source, destination, and the data transformation are the major components and their functioning is highly different from a task. When Data is transformed into anything meaningful, it is done on the basis of logic how are components connected together, data management, etc.

data flow vs control flow

Data Flow vs Control Flow – All You Should Know

Both Control Flow and Data Flow work similar to an objective of processing objects and there are connectors to make a bridge among them. Still, there is an ocean of differences between both of the terms. For example, you cannot pass the data among components using control flow but it works as a task coordinator only. Here are some of the major objectives of the control flow.

  • It helps in orchestrating workflows.
  • It is more process-oriented.
  • It can help in executing tasks either serially or parallelly.
  • It helps in processing data synchronously.

As we have discussed earlier, tasks in the control flow are majorly designed to execute either serially or parallelly. Further, it is possible to divide a task into multiple branches that can be executed in parallel one by one. Here,you can also define the priority about which task to be executed first. For example, if there are a few tasks that are connected in a linear fashion, then they can be serialized and only one task has to be executed at one time.

Read: What is Foreign Key in SQL? How to Set, Add, Create & Use of Foreign Key

On the other side, Data Flow can split, merge, or branch tasks for parallel processing but it is absolutely different from the Control Flow in SSIS. Here are some of the most important features of Data Flow in SSIS.

  • It is more information-oriented, not based on processes.
  • It can group data together based on logic and transforming it into meaningful content.
  • Multiple data batches can be coordinated for effective processing.
  • Data is extracted from the source and loaded to the destination later.

The major difference between control flow and data flow in SSIS is that Control Flow can execute only one task at a time in a linear fashion. On the other hand, Data Flow can perform multiple transformations at the same time.

So this was it on SSIS control flow vs data flow, now let’s understand how data packets are executed in SSIS.

Sit for free demo classes of SSIS Server Corporate Training now, new batches available!

SQL Server Training & Certification

  • No cost for a Demo Class
  • Industry Expert as your Trainer
  • Available as per your schedule
  • Customer Support Available

How are data packets executed in SSIS?

Now, the question comes, how to calculate the time taken by control flow in the execution of different tasks throughout the process. It can be done by considering the first control flow task in mind and keep noticing the time until the last task is being executed. It does not matter either they are control flow tasks or Data Flow transformations, but you have to keep track of each and everything to optimize the workflow and manage the overall processing time.

There are certain stages when data has to be executed both serially and parallelly. When data is executed serially, then the second task always has to wait until the first task gets finished. On the other hand, when tasks are executed in parallel, then multiple tasks can be managed together and it also reduces the overall data processing.

Workflow Management in Control Flow

Control Flow is also about tasks and containers and it is directly related to the workflow concept in SSIS, where you can use ETL tools to extract, transform, and load the data. Control Flow in SSIS is designed to execute tasks serially and parallelly or you can use a combination of both. If we talk about the control flow tasks,they are synchronous in nature and one task has to be completed first before it jumps to the second one. However, it is also possible thatdesigning a control flow task that is not related to constraints of other tasks but they are connected synchronously. In brief, a package cannot halt the execution of a task but it will continue to execute until it does not complete. So, SSIS will not release the task until it will get a clear status, either pass or fail.

Read: SSRS Sub Reports and deployment process-How to do it

Keep in mind that synchronous data flow should not be mixed with synchronous data transformations. There is a slight difference and you have to understand it well when working with the SSIS tool.

Data Transformations in Data Flow

Data Flow is the primary step in the SSIS where data is managed, transformed, integrated, and cleansed. Data flow can also be defined as the data pipeline. For example, each house has a primary water source that distributes water into different outlets across the house.

Read: How to Install Microsoft SQL Server Express

As soon as you focus on one outlet, water starts flowing from it but the primary source would always be the same. If you turn off all water outlets in a house then the pressure of water is automatically redirected to the primary source. Water will not come out until you release the pressure.

At the same time, when you turn on all water outlets together then it may affect the speed and it may cause various problems too. Let us connect the same concept to the Data Flow in SSIS. For example, if we want to stream 10K rows everyday and you don’t have that many resources available to process this data together, then it may create backward pressure on the source and you have to reduce the count to process them effectively.

Read: Power BI - Getting Started with Query Editor in Power BI

Also, you have to focus on a few concepts like Memory buffer architecture, types of transformation, mode of communication, and execution trees, etc for effective execution of the data. Once data architecture is reviewed well, then you can shift for monitoring packages and determine how the SSIS engine works on the data processing.

Memory Buffer Architecture in SSIS

You can manage data in groups using buffers. A buffer is a memory block where you can store rows or columns of the data and apply the transformation. In simple words, you have to extract data from sources and store them into buffers. The size of a memory buffer can be defined dynamically as needed. Further, they are connected in a pipeline where one data transformation is passed or shared by others.

Moving ahead, let us discuss the different types of transformations in Data Flow. The major level of difference between them is the way they communicate with each other and how one transformation is passed to others. They are divided on the basis of two categories, Blocking nature and the communication system. Transformations can be blocking, non-blocking, or the semi-blocking. Communication can be either synchronous or asynchronous in nature.

Read: Introducing SSIS Architecture & DW Concepts Overview

Blocking Nature of Transformations

The obvious difference between different types of transformations is their blocking nature that could be blocking, non-blocking or semi-blocking in nature. It will help you to determine whether data is passed downstream immediately or after the data is fully received. By default, it is very tough defining the blocking nature of the transformation. However, when we work on the second phase, that is how communication is established among transformation, then we can understand how an engine can perform different transformations together.

You will be amazed knowing that most of the SSIS transformations are non-blocking in nature. Two more categories of non blocking transformation are streaming and row-based transformations. You have to check the category based on their configurations. For the streaming transformations, it is easy to apply transformations logic immediately. These transformations are more related to the CPU resources, so they are termed as streaming transformations.

Read: SQL Server Recovery Models-Simple, Full and Bulk Log

Another common type of non-blocking transformation is row-based, where data can be quickly transformed and passed to another logic and saved to the memory buffer. It describes that rows flowing through the transformation should be acted one by one, with a requirement to manage interaction with some outside process like database, files, or components.

Semi-Blocking & Blocking Transformations

Another common category is semi-blocking transformations. Only a few transformations are semi-blocking in nature. Union All, Merge, Merge Join transformations are semi-blocking in nature and they need proper resources to be sorted and merged. Please keep in mind that semi-blocking transformations generally need a little more resource when compared to non-blocking transformations, and buffers have to stay in memory until the complete data is received.

Read: What is SQL Subquery? Types of Subqueries in SQL

The final category is blocking transformation, where we have to deal with upstream of the data before it is loaded to the destination. There are only a few blocking transformations when compared to non-blocking ones because only a few require the blocking permission to complete an operation. Two most popular examples of blocking transformations are sort and aggregate feature.

Blocking transformations are usually more intensive to resources when compared to other types of transformations. Firstly, they need a lot of memory for data storage. Secondly, it puts a heavy burden on the processor for performing data aggregation, data sorting, etc.

Read: Different Types of SQL Server & SQL Database Functions

Synchronous and Asynchronous Transformation

Till the time, we have discussed the blocking nature of transformations. Now, we will discuss how they communicate together either synchronously or asynchronously. It would be better to discuss asynchronous data transfer first. Any transformation is termed as asynchronous in nature if buffers used for the input are different from buffers used in the output.

Read: Difference Between Clustered and Non-Clustered Index in the SQL

Sometimes, there may be performance issues, so store a copy of the data to accomplish the desired output. One of the major components of the SSIS engine is the buffer manager and to manage asynchronous outputs, you have to work on buffer manager and decommission buffers to use it somewhere else.

Asynchronous data transfer is one when buffers are quickly handed over to the next transformation based on the data logic. If you will discuss the concept closely, then there is certain overlapping between streaming transformation and synchronous transformations. It is important using the word buffer here because the same buffers are used for input and output during the data transformation.

Sign up for the Microsoft SSIS Server Corporate Training for a growing career!

SQL Server Training & Certification

  • Personalized Free Consultation
  • Access to Our Learning Management System
  • Access to Our Course Curriculum
  • Be a Part of Our Free Demo Class

Final Words

The above discussion gives you a clear picture of Data flow control flow architecture in SSIS and the difference between control flow and data flow in SSIS. You must have a sound idea of how both are different from each other and how they can help you in robust data processing and data transformations.

We also discuss the type of transformations, their blocking nature, and the communication modes. You will need an IT background to understand all these concepts in depth. With this blog,  we have tried to give you a sound explanation of data flow vs control flow around a complete architecture of SSIS.

What’s Next?

The objective of the SSIS tool is to extract, transform, and load the data. It is used by almost every organization today generating huge data daily. It is the reason that the number of job options in SSIS is also huge. If you are planning to start a career in SSIS, then you have come to the right place.

Read: Top 50 MongoDB Interview Questions and Answers

We strongly recommend you to join the online SSIS training program at JanBask Training to give new heights to your career and grow immensely. If you have been looking to  learn about data flow vs control flow, here you will get the deep learning on the difference between control flow and data flow in SSIS and other SSIS related concepts. Also, take a free demo class before making the final enrollment to have a great exposure to  online learning environment in depth.

We wish you great luck for a successful career journey in the SQL Server space ahead!


    Janbask Training

    A dynamic, highly professional, and a global online training course provider committed to propelling the next generation of technology learners with a whole new way of training experience.


Comments

Trending Courses

AWS

  • AWS & Fundamentals of Linux
  • Amazon Simple Storage Service
  • Elastic Compute Cloud
  • Databases Overview & Amazon Route 53

Upcoming Class

4 days 14 Jul 2020

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing

Upcoming Class

-0 day 10 Jul 2020

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning

Upcoming Class

6 days 16 Jul 2020

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation

Upcoming Class

7 days 17 Jul 2020

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL

Upcoming Class

5 days 15 Jul 2020

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing

Upcoming Class

-0 day 10 Jul 2020

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum

Upcoming Class

4 days 14 Jul 2020

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design

Upcoming Class

5 days 15 Jul 2020

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation

Upcoming Class

13 days 23 Jul 2020

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks

Upcoming Class

4 days 14 Jul 2020

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning

Upcoming Class

7 days 17 Jul 2020

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop

Upcoming Class

3 days 13 Jul 2020

Search Posts

Reset

Receive Latest Materials and Offers on SQL Server Course

Interviews