Scenario: Remove duplicate records from file systems using either sort transformation or aggregate transformation.
Method 1: (Using aggregate transformation) open business intelligence development studio create a new package and rename it a flat file source with aggregate.
In control flow drag and drop data flow task and rename it as avoid duplicate records using aggregate ----------->
In data flow drag and drop flat file source ----------->
Double click on flat file source to configure it ----------->
Click new to create a new flat file connection manager ----------->
Provide connection manager name and description (storing details) ----------->
Click Browse ----------->
Navigate to the path ----------->
Select student details t2t and click open ----------->
Select columns from left pane ----------->
Click ok ----------->
Select columns in flat file source editor ----------->
Rename the output columns as mentioned below
Column 0 – SND
Column 1 – S Name
Column 2 – Qualification
Column 3 – Marks
Click ok ----------->
Drop and drag aggregate transformation and make a connection from flat file source to aggregate ----------->
Double click on aggregate to configure it ----------->
Select or check all available input columns (Sno, Sname, S Qualification, marks) ----------->
Note: Make sure that the operation is group by for all the selected columns
Click ok ----------->
Drag and drop OLEDB destination and make a connection from aggregate to OLEDB destination ----------->
Double click on OLEDB destination to configure it ----------->
Provide destination connection manager ----------->
Click new to create new destination table if it is not exists. ----------->
Rename the table as aggregated data ----------->
Select mappings ----------->
Click ok ----------->
Execute package
Desired to gain proficiency on MSBI? Explore the blog post on MSBI Training to become a pro in MSBI.
Method 2: Remove duplicate records and also sort the data using sort transformation.
Open business intelligence development studio
Create a new package and rename it as avoid duplicate using sort .dtsx. ----------->
In control flow drag and drop data flow task and rename it as DRT sort transformation In connection manager section right click and select new flat file connection. ----------->
Provide connection manager name and description ----------->
Click browse and select student details .Txt and click open ----------->
Click ok ----------->
Drag and drop flat file source ----------->
Double click on flat file source and select student details flat file connection manager from drop down list. ----------->
Select columns and rename the output columns as mentioned below.
Column 0 -- SNO
Column 1 – SName
Column 2 – Qualification
Column 3 - Marks
Click ok ----------->
Drag and drop sort transformation and make a connection from flat file source to sort ----------->
Double click on sort to configure it and select all input columns. ----------->
Check remove rows with duplicate sort values to remove duplicate records from the sorted data sets ----------->
Click ok ----------->
Drag and drop OLEDB destination and make a connection from sort to OLEDB destination ----------->
Double click on OLEDB destination ----------->
Provide destination connection manager if exists and click new to create a new destination table if not exists. ----------->
Rename the new table as sorted data ----------->
Click ok ----------->
Select mapping from left panel ----------->
Click ok
Flat file formats:
In SSIS flat file can be configured in 3 ways.
Example: 1, Rama, MCA
2, Vijay, Batch
Bulk insert task: Bulk insert task can be used to insert data from a text file or a flat file into SQL server table. Mainly it is advised to use in applications which required quick loading of large amounts of data.
Steps to configure bulk insert table
Prepare the following data with in a text file and rename it as student details. Txt the sample flat file can be used for the above mentioned methods (Method 1 and method 2)
1, Rama, MCA, 72
2, Vijay, Btech, 71
3, Siva, MCA, 70
4, Laxmi, MCA, 80
5, Arpitha, BSC, 90
Open Business intelligence development studio ----------->
Create a new package and rename it as bulk insert .dtsx ----------->
In control flow drag and drop the bulk insert task ----------->
In connection manager section, right click and select new flat file connection ----------->
Provide connection manager name and description ----------->
Click browse and select student details txt ----------->
Click open ----------->
Select advanced tab and rename the column names as mentioned below ----------->
Column 0 – SNO
Column 1 – S Name
Column 2 – Qualification
Column 3 – marks ----------->
Click ok ----------->
Open SSMS (SQL server management studio) to create destination table to store the data which is coming from student details txt ----------->
USE (Adventure works)
GO
(REATE TABLE [DBO] [Bulk Insert]
([SNO] [Varchar] (50),
[S Name] [Varchar] (50) NULL<
[Qualification] [Varchar] (50) NULL<
[Marks] [Varchar] (50) NULL. ----------->
Press F5to execute the query in SSMS ----------->
Double click on bulk insert task to configure it and set the following properties ----------->
Source connection
File – select student details from drop down list
Destination connection
Connection – select local Host, Adventure work
Destination table – select [Adv. works]. [dbo].
[Bulk Insert] from drop down list.
Column delimited – Select comma{,} option since student details flat file is comma {,} delimited select options tab or page
Note: The default value is zero i.e all the data in the specified data file is 1 batch
The default is zero, indicates the last row in the specified data file.
TIPS to improve the performance of bulk insert task.
Data Viewers: Data viewers used to debug the package and also user or developer, can monitor the data which is flowing from source to other stages (transformation or destinations). In SSIS data viewers are categorized into 4 parts
Check out the top MSBI Interview Questions now!
Select the columns to be displayed in the grid format.
Histogram: Select histogram type and click histogram tab.
Select the column which you want to model with the histogram.
Select the column that you want to model in the column chart
Steps to configure Data Viewers
Open Business intelligence development studio ----------->
Create a new package and rename it as data viewers .dtsx ----------->
In control flow drag and drop data flow task ----------->
In data flow drag and drop OLEDB source and configure it. ----------->
Drag and drop OLEDB destination and configure it ----------->
Make a connection from OLEDB source to OLEDB destination. ----------->
Double click on data flow path ----------->
Select data viewers option ----------->
Click add to add any data viewers ----------->
Select grid type ----------->
Select grid tab ----------->
Select the columns which you want to display in grid format and click ok ----------->
Click ok ----------->
Execute package
Note: Data viewers cannot be implemented on production environment. But data viewers can implemented only on development environment for debugging the package.
For an Indepth knowledge on MSBI click on:
You liked the article?
Like: 0
Vote for difficulty
Current difficulty (Avg): Medium
TekSlate is the best online training provider in delivering world-class IT skills to individuals and corporates from all parts of the globe. We are proven experts in accumulating every need of an IT skills upgrade aspirant and have delivered excellent services. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us, enhancing most in-demand skills in the market.