Why does SQL Server BULK INSERT insert data in random order?

Asked by Ankesh Kumar in Salesforce, on Jul 19, 2021

I have a csv file with 350,000 rows and would like to insert the data into a temporary table in the same order as in the csv file. I am trying to BULK INSERT using:

  BULK INSERT ##AllRows FROM @FilePath WITH ( FIELDTERMINATOR = '\n' )

Unfortunately, BULK INSERT inserts the data in random order: on every execution my header ends up in a different row. I ran it on SQL Server 2016. Is it possible that older versions of SQL Server ordered rows differently?

The FIRSTROW option does not recognize the header as the first row of the file. The file has no column we could order by, and the header is always in its first row. It could be a coincidence, but even with FIRSTROW = 2 my header can still end up in the table; I checked. It looks like the more rows the csv file contains, the greater the chance that the rows are inserted in random order.


Answered by Cameron Oliver

The BULK INSERT statement in SQL Server loads data from a data file into a table. This functionality is similar to that provided by the in option of the bcp command; however, the data file is read by the SQL Server process. For a description of the syntax, see BULK INSERT (Transact-SQL). By default, BULK INSERT loads the whole file as a single batch and, if the ORDER option is not set, assumes the data file is unordered. If you do specify ORDER, you are declaring that the data in the file is already sorted to match the clustered index on the target table.

The rows from the file are read in order, and added to the table in the same order. The issue occurs when you read rows from the table. Without an ORDER BY clause on your SELECT, SQL Server is free to return rows from the table in whatever order is convenient.

Details

The question doesn't provide a definition for the table ##AllRows, but it seems certain the table is a heap (a table without a clustered index). SQL Server reads pages from a heap using Index Allocation Map (IAM) structures. This means data tends to be returned in file and page id order within each IAM chain, which will generally not reflect the order in which data was inserted. This is the underlying cause of the behaviour you are seeing.

Solutions

You need a column to indicate the order of rows in the file, then order by that column when you write your query. Unfortunately, SQL Server does not provide a built-in way to add this 'sequence' column during import. There are a couple of common workarounds:

  • Pre-process the source file outside SQL Server to add a sequence number to each row. This is the most reliable method.
  • Assign a sequence number during the import.
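As an illustration of the first workaround, here is a minimal Python sketch that copies a file and prefixes each line with its 1-based position. The function name `add_sequence_numbers`, the tab separator, and the UTF-8 encoding are assumptions made for this example; pick a separator and encoding that suit your own file.

```python
from pathlib import Path


def add_sequence_numbers(src: Path, dst: Path, sep: str = "\t",
                         encoding: str = "utf-8") -> int:
    """Copy src to dst, prefixing every line with its 1-based line number.

    Returns the number of lines written. The separator and encoding
    defaults are assumptions for this sketch, not requirements.
    """
    count = 0
    # newline="" leaves the original line endings untouched on read and write.
    with src.open("r", encoding=encoding, newline="") as fin, \
         dst.open("w", encoding=encoding, newline="") as fout:
        for count, line in enumerate(fin, start=1):
            fout.write(f"{count}{sep}{line}")
    return count
```

The numbered file can then be imported into a table that has one column for the sequence number and one for the original line, and queries can use ORDER BY on the sequence column.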

The second method carries some risk because there is no documented guarantee that this will work reliably in all circumstances. Nevertheless, people have been using this idea successfully for a long time. The general idea is:

  • Add an IDENTITY column to the import table.
  • Create a view over the import table, omitting the IDENTITY column.
  • BULK INSERT into the view.

This will not work with a global temporary table, because a view cannot reference that type of table. You would need to use a regular table (perhaps in tempdb) instead.

Example 1

I used the following script to successfully import a csv file containing the Complete Works of Shakespeare. The first step is to create a table in tempdb with an extra IDENTITY column:

  USE tempdb;
  GO
  CREATE TABLE dbo.Test
  (
      id integer IDENTITY PRIMARY KEY,
      line nvarchar(4000) NOT NULL
  );
  GO

Now we create a view over that table, omitting the IDENTITY column:

  CREATE VIEW dbo.ImportTest
  WITH SCHEMABINDING
  AS
  SELECT T.line
  FROM dbo.Test AS T;
  GO

Finally, we bulk insert into the view:

  BULK INSERT dbo.ImportTest
  FROM 'C:\Temp\shakespeare.txt'
  WITH
  (
      CODEPAGE = '65001',
      DATAFILETYPE = 'char',
      ROWTERMINATOR = '\n'
  );

We can now see the first few lines in file order using a SELECT with ORDER BY:

  SELECT TOP (20)
      T.id,
      T.line
  FROM dbo.Test AS T
  ORDER BY T.id ASC;

The result shows the text in the right order.


Example 2

It is also possible to use OPENROWSET with a format file. Using the same example csv file, I was able to import the data using the following format file (saved as shakespeare.xml):
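The format file itself is not reproduced here. As a rough sketch only, an XML format file for a one-column import like this might look like the following. It assumes the file uses Windows line endings (\r\n) and maps the single field to the line column of dbo.Test; treat it as an illustration of the format, not as the file used for the original test.

```xml
<?xml version="1.0"?>
<!-- Hypothetical format file: one character field per row, terminated by CR LF (assumed) -->
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n"/>
  </RECORD>
  <ROW>
    <!-- Map the single field to the nvarchar column named line -->
    <COLUMN SOURCE="1" NAME="line" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>
```

If the file uses bare line feeds instead, the TERMINATOR value would need to change to match.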


and:

  INSERT dbo.Test WITH (TABLOCK) (line)
  SELECT ORO.line
  FROM OPENROWSET
  (
      BULK 'C:\Temp\shakespeare.txt',
      FORMATFILE = 'C:\Temp\shakespeare.xml',
      CODEPAGE = '65001'
  ) AS ORO;

Notice that this method does not require a view, so you could target a global temporary table. The target table still needs the extra IDENTITY column.







