C#: How does SQL bulk copy work for large data inserts?

Sudip Bhatt 2,281 Reputation points
2021-01-10T12:51:07.557+00:00

1) What is the default BatchSize of a SqlBulkCopy operation if I do not specify a value?
2) I read a post where someone mentioned that a BatchSize of 5000 gave very good results. Should I use a batch size of 5000?
How do I know what batch size would be good for my scenario?

3) When SqlBulkCopy inserts data in multiple batches, does it maintain a transaction? Suppose two batches insert successfully but the last batch hits an error; will all the inserted data be removed from the table or not?

4) Does SqlBulkCopy lock the full table when inserting data? I am inserting data into the same table from multiple threads using SqlBulkCopy. If the table is locked, parallel inserts will not be possible, so please suggest the best way to insert large amounts of data into the same table from multiple threads.

Please guide me. Thanks

Developer technologies C#

Accepted answer
  1. Alberto Poblacion 1,571 Reputation points
    2021-01-10T21:13:50.573+00:00

    1) The default is to process the whole operation in a single batch. So if, for example, you give a DataTable with 10,000 rows to SqlBulkCopy for inserting into the database, it will insert all 10,000 rows in a single batch. By adjusting the BatchSize property you can tell it to send several smaller batches. You will need to do a bit of benchmarking to find the optimal value, because it depends on your data size and on the characteristics of the server and network. A batch size of 4000 is typical, but your results may vary.
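    As a starting point for that benchmarking, a minimal sketch of a bulk copy with an explicit BatchSize might look like this (the connection string, the `dbo.Orders` table name, and the value 4000 are placeholder assumptions, not values from the question):

    ```csharp
    using System.Data;
    using Microsoft.Data.SqlClient;

    static void BulkInsert(DataTable table, string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulkCopy = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.Orders", // placeholder table name
            BatchSize = 4000,       // 0 (the default) sends everything in one batch
            BulkCopyTimeout = 60    // seconds; raise this for very large loads
        };

        bulkCopy.WriteToServer(table);
    }
    ```

    Benchmark a few batch sizes (e.g. 1000, 4000, 10000) against your own server before settling on one.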

    2) Yes, transactions work with bulk inserts, so if you roll back the transaction, the inserts, as well as anything else you did within the transaction, will be rolled back. EDIT: But you need to use an explicit transaction for this to work. There is a SqlBulkCopyOptions value called UseInternalTransaction, but it only covers each individual batch: earlier batches will not be rolled back if a later insert fails, unless you started an explicit transaction before starting the bulk copy.
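    A sketch of the explicit-transaction approach, passing the transaction into the SqlBulkCopy constructor so every batch participates in it (table name and connection string are again placeholders):

    ```csharp
    using System.Data;
    using Microsoft.Data.SqlClient;

    static void BulkInsertAtomic(DataTable table, string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var transaction = connection.BeginTransaction();

        try
        {
            // Passing the external transaction makes all batches part of it.
            using var bulkCopy = new SqlBulkCopy(
                connection, SqlBulkCopyOptions.Default, transaction)
            {
                DestinationTableName = "dbo.Orders", // placeholder table name
                BatchSize = 4000
            };
            bulkCopy.WriteToServer(table);
            transaction.Commit();   // all batches become visible together
        }
        catch
        {
            transaction.Rollback(); // undoes every batch, not just the failing one
            throw;
        }
    }
    ```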

    Be aware, however, that bulk inserts behave differently from standard insert statements in some respects. For instance, if the database recovery model is set to bulk-logged, the bulk inserts are not logged individually; triggers are not fired; block allocation in tables uses larger blocks (which could affect database growth if you are using bulk insert for smallish amounts of data); and so on. On the other hand, bulk inserts are significantly faster than individual inserts, if you really need the speed.

    3) Table locking is affected by the number of rows inserted into the table. Initially row locks are used, but the server escalates them to page locks and then table locks as needed for efficiency. This happens regardless of whether you are using bulk insert or individual insert statements.

    Note that, depending on the version of SQL Server, if the table has indexes, inserting data from multiple threads can perform horribly due to locking on the indexes. This has been greatly alleviated in SQL Server 2019, but it can hurt performance if you try to insert lots of rows from multiple threads on an earlier version.
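    For the multi-threaded case, one common pattern (a sketch under the assumption that the target table is a heap with no indexes) is to give each loader SqlBulkCopyOptions.TableLock: each then takes a bulk-update (BU) lock, and BU locks are compatible with one another, so several threads can stream into the same table concurrently. Names here are placeholders:

    ```csharp
    using System.Data;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    static Task ParallelBulkInsert(DataTable[] chunks, string connectionString)
    {
        // One bulk copy per chunk, each on its own connection and thread.
        return Task.WhenAll(chunks.Select(chunk => Task.Run(() =>
        {
            using var connection = new SqlConnection(connectionString);
            connection.Open();

            using var bulkCopy = new SqlBulkCopy(
                connection, SqlBulkCopyOptions.TableLock, null)
            {
                DestinationTableName = "dbo.Orders" // placeholder table name
            };
            bulkCopy.WriteToServer(chunk);
        })));
    }
    ```

    If the table does have a clustered index, TableLock will instead serialize the loaders, so in that case it is usually better to load into a heap staging table in parallel and move the data afterwards.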

    1 person found this answer helpful.

1 additional answer

  1. Sushant Bagul 1 Reputation point
    2021-12-02T11:15:01.807+00:00

    Hi @Alberto Poblacion ,
    Along the same lines, we have developed a .NET application on the 4.5.2 framework with Oracle 19c as the database, using Oracle.DataAccess (4.122.19.1). We have a file with more than 26 lakh (2.6 million) records that we load into the database. The same code worked with DDTek.Oracle ODP.NET against Oracle 12c, but we have migrated the database to 19c, which only supports Oracle.DataAccess, and now we get "Attempted to read or write protected memory. This is often an indication that memory is corrupt" for more than 6.5 lakh records.
    So it appears that OracleBulkCopy allows about 6.5 lakh records to be loaded at a time, but no more than that.

    Could you please suggest the best approach so that we can load all records in one go, using C# with Oracle 19c?

