Help with SQL - splitting a CSV into smaller files

Change Media 1 Reputation point
2022-08-30T21:27:27.027+00:00

I'm trying to split a large CSV into smaller chunks (1 million rows or less) using SQL, since Excel can't open or save files with more rows than that.

I've figured out how to add row numbers using the ROW_NUMBER function (SELECT ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#), but I cannot figure out how to show only the results for records 1 through 999,999 so I can export to Excel and then repeat for 1,000,000 through the rest of the original file.

How can I show results between 2 values for the row number column I just created? When I attempt WHERE row# between 1 and 999999, I get "invalid column name row#."
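
For reference, a minimal version of what I'm running (the table and column names below are just placeholders):

    -- Roughly what I tried; fails with "Invalid column name 'row#'"
    SELECT ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#,
           <column list>
    FROM   <table name>
    WHERE  row# BETWEEN 1 AND 999999;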

SQL Server
Transact-SQL

3 answers

Sort by: Most helpful
  1. Tom Phillips 17,721 Reputation points
    2022-08-31T12:14:27.71+00:00
    1 person found this answer helpful.

  2. Erland Sommarskog 107.2K Reputation points
    2022-08-30T21:59:54.707+00:00

    This happens because a query is logically evaluated in the order FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY. So what you define in the SELECT list cannot be referred to in the WHERE clause.

    But you can easily address this with a Common Table Expression (CTE) that restructures your query:

       ; WITH CTE AS (
           SELECT ..., row# = ROW_NUMBER() OVER (ORDER BY <column name> ASC)
           FROM   tbl
       )
       SELECT ...
       FROM   CTE
       WHERE  row# BETWEEN 1 AND 999999
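
    For the next 1M-row batch, the same CTE works with a shifted range, for example:

       SELECT ...
       FROM   CTE
       WHERE  row# BETWEEN 1000000 AND 1999999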
    

  3. LiHongMSFT-4306 25,651 Reputation points
    2022-08-31T02:49:05.357+00:00

    Hi @Change Media

    I get "invalid column name row#."

    As Erland answered, this is caused by the logical query processing order. You could use a subquery (derived table) or a CTE to handle this issue.
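
    For example, a minimal derived-table sketch (placeholder names, and "numbered" is just an illustrative alias):

    SELECT *
    FROM (
        SELECT <column list>,
               ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#
        FROM   <table name>
    ) AS numbered
    WHERE row# BETWEEN 1 AND 999999;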

    Also, you could give OFFSET ... FETCH NEXT a try; check this:

    --Rows 1 ~ 1000000
    SELECT <column list>
    FROM   <table name>
    ORDER BY <column name> ASC
    OFFSET 0 ROWS FETCH NEXT 1000000 ROWS ONLY;

    --Rows 1000001 ~ 2000000
    SELECT <column list>
    FROM   <table name>
    ORDER BY <column name> ASC
    OFFSET 1000000 ROWS FETCH NEXT 1000000 ROWS ONLY;
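
    One caveat: for the batches to line up without gaps or overlaps, the ORDER BY column should be unique (or include a tie-breaker column), because OFFSET ... FETCH paging over a non-deterministic order can repeat or skip rows between queries. If there are many batches, you could also drive the offset from a loop; a rough sketch with the same placeholder names (and an assumed @BatchSize variable):

    DECLARE @BatchSize int = 1000000, @Offset int = 0, @Rows int = -1;

    WHILE @Rows <> 0
    BEGIN
        SELECT <column list>
        FROM   <table name>
        ORDER BY <column name> ASC
        OFFSET @Offset ROWS FETCH NEXT @BatchSize ROWS ONLY;

        SET @Rows = @@ROWCOUNT;      -- rows returned by this batch
        SET @Offset += @BatchSize;   -- advance to the next window
    END;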
    

    Refer to this blog for more details: SQL Server OFFSET FETCH

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
