Good Friday Sale : Flat 30% off on live classes + 2 free self-paced courses! - SCHEDULE CALL

- SQL Server Blogs -

Introducing SSIS Architecture & DW Concepts Overview

Introduction

Nowadays most of the organizations across the world invest heavily in Information Technology. They have software to maintain the attendance of the employees. They have software for document management. They now have an ERP. They also have document management software, Software to capture CCTV footages and data saved in Excel, Word, and notepad as well. And above all, they have a Business Intelligence Software which provides attractive analytical reports to the management. Now the question arises-how would you manage to get all those data from different applications together in one place and create meaningful reports using them?

The main problem that will arise when you try to bring data from all those databases and other storage media’s together is that all those data are stored on different platforms. For example, your Enterprise Resource Planning Software might be using the Oracle database as back end whereas your HR software could be using MS SQL Server. Although both of this software is databases at the end of the day, each of them has a certain unique way of managing data. It is here that SSIS plays its part.

The most important feature of an SSIS package is the ETL service. The full form of ETL is- extract, transform and load. The process of extraction includes collecting data from different data sources. The transformation includes cleaning up those collected data, removing redundancy and bringing those collected data into a common platform. The loading includes storing that collected data into a separate relational database called data warehouse to be used later on for the analytical and reporting purpose.

SSIS is one such popular ETL tool that is available in the market nowadays. Over the next few paragraphs, we will try to understand the basics of SSIS, different components of SSIS and introduce you to the basics of the data warehouse database.

Data warehouse Design and Components

What is the warehouse? It is a place where goods are safely stored properly so that they can be easily accessed when required. There are different parts of the warehouse building dedicated for storing different types of items. Similarly, a data warehouse is a database where data extracted and cleaned up from different sources can be safely stored and retrieved later on when needed.

Components of Data warehouse are as below

  • Data Warehouse Database: A separate relational database to store the final extracted data that will be used for analytical purposes.
  • Sourcing, Acquisition, Cleanup and Transformation Tools: Different standard software like InformaticaPowercenter, Microsoft SSIS, Talend, etc. that helps in the ETL process.
  • Metadata: This comprises the data related to the configuration of an ETL package developed in SSIS. When we talk about configuration, it means data related to the source of data, the destination database, what fields need to be exported and how many times, etc.
  • Access Tools: Access tools are tools that are used to access the data in the data warehouse. This includes query writing tools, reporting tools, etc.
  • Data Marts: A data mart is a repository of data that is designed to serve a particular community of knowledge workers.
  • Data Warehouse Administration and Management: Includes features on how to provide access rights to different users of a specific set of data.
  • Information Delivery System: This includes the process of delivering specific data to one or more locations using specific algorithms.

Learn SQL Server in the Easiest Way

  • Learn from the videos
  • Learn anytime anywhere
  • Pocket-friendly mode of learning
  • Complimentary eBook available

Catalog DB

The SQL Server Integration Service Catalog database is the fundamental center when we work with Integration Services (SSIS) projects that are transferred to the Integration Services server. For instance, project and parameters can be set, design situations to indicate runtime values for package execution and investigate packages, and can oversee Integration Services server activities.

Control Flow

A control stream portrays a work process of tasks that can be executed, here and there in a specific order (We have utilized the precedence control operator). The looping example is a decent case of a control-flow requirement, likewise, standalone SQL Scripts can be executed, call into COM interfaces can be made, .NET components can be executed, or email can be sent. 

Inside control flow, the dataflow tasks are found.

DataFlow

The Data Flow task hides the data flow engine that relocates data from sources to destinations, and allows the user to transform, clean, and modify data as it is moved. Adding a Data Flow task to a package control flow makes it possible for the package to extract, transform, and load data.

Components of DataFlow

Source

Transform

Destination

This control permits us to determine the data source from where data will be extracted. 

These controls help to delineate source data types with the target data types.

These controls help to indicate the target database where data will be located.

Below is a small example of the SSIS Package.

The following package connects to a SQL Database, transforms the data and multicast it to multiple flat files.

Read: What is a CASE Statement in the SQL?

SSIS Datatypes

The data types supported by SSIS can be divided into the following categories:

Numeric String Date/Time Binary Boolean Identifier
Supports numeric values are formatted as currencies, decimals, and signed and unsigned integers. It supports the most number of numeric types than any other application.

Supports string values formatted as characters and strings.

Supports date values, time values, or both in various formats.

Supports binary and image values.

Handle Boolean values.

Handle globally unique identifiers (GUIDs).

Below is a screenshot of the types of data types in SSIS.

Variables

Variables store esteems thatanSSISs package and its containers, tasks, and event handlers can use at run time.

SSIS Local and Global Variables

SSIS Local Variables

SSIS Global Variables

 Defined by package developers

 Defined by Integration Services

Can create as many variables as possible

Cannot add new variables with existing variables

Dynamic Precedence control

Precedence constraints interface executable, containers, and tasks in packages in a control flow, and indicate conditions that decide if executable run. An executable can be a For Loop, for each Loop, or Sequence compartment; a task; or an event handler. Event handlers likewise use precedence constraints to connect their executable into a control flow. 

A precedence constraint connects two executable: the precedence executable and the constrained executable. The precedence executable runs before the constrained executable and the execution consequence of the precedence executable may decide if the constrained executable runs. The accompanying outline shows two executable connected by a precedence constraint. 

The beneath example shows three tasks: - OLE DB Source, Data Conversion, Multicast. They are associated with an arrow-like priority control.

SSIS Environment Variables

SSIS Environment Variables give the instrument to set values at the time a bundle is executed. This usefulness is helpful for any number of things, much of the time for indicating various qualities between Development, quality, and production environments.

How to generate a dynamic connection string

Below we will learn how to generate a dynamic connection string for an SSIS project. We will create an SSIS package which will try to import a data set from two excel file having the same set of data.

Read: What is SQL Subquery? Types of Subqueries in SQL

Step 1 

Let us first create two excel files and name them Testfile1 and Testfile2. The excel files look as below.

Testfile1                                            Testfile2

Step 2

Let us now create an SSIS package. The package will look like below.

Step 3

The exported data will look like below.

Step 4

Let us now create a variable to hold the file names to be exported

Step 5

Let us now make the excel connection string flexible. That means we are going to parameterize the connection string.

We will right-click on the Excel source connection string and we will choose properties

Step 6

Copy the text in the connection property.

Step 7

Then let us go to the expression property

Read: Data Anamolies-Understanding Them And Learning Ways To Resolve Them

Step 8

Let us now create a custom expression for the connection string property.

Step 9

Next, we choose the connection string under property and paste the copied connection string in the expression section.


 

Step 10

We need to edit the connection string as below.

The initial connection string was like below

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\SSIS File\Testfile1 ;Extended Properties="EXCEL 12.0 XML;HDR=YES";

And we edited it to

"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\\SSIS File\\" +  @[User::Excel] + " ;Extended Properties=\"Excel 12.0 XML;HDR=YES\";"

Step 11

Now let us change the file name in the variable value like below and run the package

Step 11

As you can see as soon as we change the file name and run the package, the output changes like below.

So in this way, we can use dynamic connection string in the SSIS package to connect to different source files.

SQL Server Training & Certification

  • No cost for a Demo Class
  • Industry Expert as your Trainer
  • Available as per your schedule
  • Customer Support Available

Summary

In this blog, we have tried to give you an introduction to basic SSIS mechanisms. We have discussed data warehouse, different SSIS tools, and variables and also parameterized connection string. This would give a head start to someone interested in knowing about SSIS and can go ahead learning more on the topics discussed above in detail.

Read: Top 50 DB2 Interview Questions and Answers for Freshers & Experienced


fbicons FaceBook twitterTwitter google+Google+ lingedinLinkedIn pinterest Pinterest emailEmail

     Logo

    Sanchayan Banerjee

    I love to learn new things and also like sharing my knowledge with others. As an experienced IT Professional I like to update myself constantly with new and upcoming technologies. The database management system is one of my favorite subjects which I constantly explore.


Comments

Trending Courses

Cyber Security Course

Cyber Security

  • Introduction to cybersecurity
  • Cryptography and Secure Communication 
  • Cloud Computing Architectural Framework
  • Security Architectures and Models
Cyber Security Course

Upcoming Class

-0 day 29 Mar 2024

QA Course

QA

  • Introduction and Software Testing
  • Software Test Life Cycle
  • Automation Testing and API Testing
  • Selenium framework development using Testing
QA Course

Upcoming Class

-0 day 29 Mar 2024

Salesforce Course

Salesforce

  • Salesforce Configuration Introduction
  • Security & Automation Process
  • Sales & Service Cloud
  • Apex Programming, SOQL & SOSL
Salesforce Course

Upcoming Class

7 days 05 Apr 2024

Business Analyst Course

Business Analyst

  • BA & Stakeholders Overview
  • BPMN, Requirement Elicitation
  • BA Tools & Design Documents
  • Enterprise Analysis, Agile & Scrum
Business Analyst Course

Upcoming Class

-0 day 29 Mar 2024

MS SQL Server Course

MS SQL Server

  • Introduction & Database Query
  • Programming, Indexes & System Functions
  • SSIS Package Development Procedures
  • SSRS Report Design
MS SQL Server Course

Upcoming Class

7 days 05 Apr 2024

Data Science Course

Data Science

  • Data Science Introduction
  • Hadoop and Spark Overview
  • Python & Intro to R Programming
  • Machine Learning
Data Science Course

Upcoming Class

-0 day 29 Mar 2024

DevOps Course

DevOps

  • Intro to DevOps
  • GIT and Maven
  • Jenkins & Ansible
  • Docker and Cloud Computing
DevOps Course

Upcoming Class

7 days 05 Apr 2024

Hadoop Course

Hadoop

  • Architecture, HDFS & MapReduce
  • Unix Shell & Apache Pig Installation
  • HIVE Installation & User-Defined Functions
  • SQOOP & Hbase Installation
Hadoop Course

Upcoming Class

-0 day 29 Mar 2024

Python Course

Python

  • Features of Python
  • Python Editors and IDEs
  • Data types and Variables
  • Python File Operation
Python Course

Upcoming Class

7 days 05 Apr 2024

Artificial Intelligence Course

Artificial Intelligence

  • Components of AI
  • Categories of Machine Learning
  • Recurrent Neural Networks
  • Recurrent Neural Networks
Artificial Intelligence Course

Upcoming Class

8 days 06 Apr 2024

Machine Learning Course

Machine Learning

  • Introduction to Machine Learning & Python
  • Machine Learning: Supervised Learning
  • Machine Learning: Unsupervised Learning
Machine Learning Course

Upcoming Class

21 days 19 Apr 2024

 Tableau Course

Tableau

  • Introduction to Tableau Desktop
  • Data Transformation Methods
  • Configuring tableau server
  • Integration with R & Hadoop
 Tableau Course

Upcoming Class

7 days 05 Apr 2024

Search Posts

Reset

Receive Latest Materials and Offers on SQL Server Course

Interviews