How to quickly put very large CSV data into a DB?

Ryan Lee 0 Reputation points
2023-12-30T06:59:00.9833333+00:00

I have 22 files that are between 35GB and 50GB each, totaling 880GB of CSV files. There are about 4.6 billion rows of data.

I am using "Azure Database for MySQL" and would like to import all of these CSV files into it.

I tested the import by following Microsoft's guide (https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-migrate-import-export), and it took about 3 minutes to import 2000 rows. At that rate, importing all of the data would take about 13 years.
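
As a sanity check on that estimate, here is a short sketch that extrapolates the measured rate (2000 rows in 3 minutes) to all 4.6 billion rows. It assumes the rate stays constant for the whole import, which is a simplification.

```python
# Extrapolate the observed import rate to the full dataset.
# Assumption: the rate measured on the small test stays constant.
TOTAL_ROWS = 4_600_000_000   # about 4.6 billion rows, from the post
SAMPLE_ROWS = 2_000          # rows imported in the test run
SAMPLE_MINUTES = 3           # approximate duration of the test run


def estimated_years(total_rows, sample_rows, sample_minutes):
    """Return the projected import time in years at a constant rate."""
    minutes = total_rows / sample_rows * sample_minutes
    return minutes / 60 / 24 / 365


years = estimated_years(TOTAL_ROWS, SAMPLE_ROWS, SAMPLE_MINUTES)
print(f"Estimated import time: {years:.1f} years")
```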

Next, I looked into "LOAD DATA INFILE", which apparently requires "Azure Storage". However, when I try to upload a CSV to "Azure Storage", I get an error saying that "Azure Blob Storage" only supports up to 512MB for a single file.

What is the fastest way to get the data into a relational database in my situation? It doesn't have to be MySQL; all that matters is getting the CSV data into the DB as quickly as possible.

Please advise.

Thank you.

Azure Database Migration service
Azure SQL Database
Azure Database for MySQL

1 answer

  1. Vahid Ghafarpour 21,725 Reputation points
    2023-12-31T02:27:40.5366667+00:00

    Thanks for posting your question in the Microsoft Q&A forum.
    Azure Data Factory is a cloud-based data integration service that supports large-scale data movement and transformation. You can use it to automate loading the CSV files into the target database. It can copy data in parallel and can be tuned for large datasets.
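
Parallel loading generally works best when the input is already divided into independent pieces. The following is a minimal sketch, using only the Python standard library, of splitting one large CSV into smaller files before upload. It is not part of Azure Data Factory; the function name `split_csv`, the chunk size, and the output naming scheme are hypothetical, and it assumes the first row of each file is a header.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_chunk=1_000_000):
    """Split src into numbered chunk files, repeating the header in each.

    Assumption: the first row of src is a header row.
    Returns the list of chunk paths that were written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # empty input, nothing to write
        out = None
        writer = None
        rows_in_chunk = 0
        for row in reader:
            # Start a new chunk file when none is open or the current one is full.
            if writer is None or rows_in_chunk >= rows_per_chunk:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}_{len(chunk_paths):05d}.csv"
                out = path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                chunk_paths.append(path)
                rows_in_chunk = 0
            writer.writerow(row)
            rows_in_chunk += 1
        if out is not None:
            out.close()
    return chunk_paths
```

The file is streamed row by row, so memory use stays small even for a 50GB input. Pick `rows_per_chunk` so that each chunk stays under whatever size limit your upload path enforces.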

    Consider temporarily disabling or dropping non-essential indexes during the import process and recreating them after the data is loaded.
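
As a rough illustration of the index suggestion, the sketch below builds the MySQL statements to drop secondary indexes before the load and recreate them afterwards. The table name `events`, the index names, and the column names are hypothetical placeholders; substitute your own schema. The helpers only format strings and do not escape identifiers, so use them only with names you control.

```python
def drop_index_sql(table, index_name):
    """Build a MySQL statement that drops one secondary index."""
    return f"ALTER TABLE `{table}` DROP INDEX `{index_name}`;"


def add_index_sql(table, index_name, columns):
    """Build a MySQL statement that recreates one secondary index."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return f"ALTER TABLE `{table}` ADD INDEX `{index_name}` ({cols});"


# Hypothetical secondary indexes on a hypothetical `events` table.
indexes = {
    "idx_user": ["user_id"],
    "idx_user_time": ["user_id", "created_at"],
}

# Run these before the bulk load ...
before_load = [drop_index_sql("events", name) for name in indexes]
# ... and these once all rows are in.
after_load = [add_index_sql("events", name, cols) for name, cols in indexes.items()]

for statement in before_load + after_load:
    print(statement)
```

Building the indexes once over the finished table is usually cheaper than maintaining them row by row during the import, though the rebuild itself can take a long time on billions of rows.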

    ** Please don't forget to close the thread by upvoting this answer and accepting it if it was helpful **
