Help with SQL - splitting a CSV into smaller files

Change Media 1 Reputation point
2022-08-30T21:27:27.027+00:00

I'm trying to split a large CSV into smaller chunks (1 million rows or less) using SQL, since Excel can't open or save files with more rows than that.

I've figured out how to add row numbers using the ROW_NUMBER function (SELECT ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#), but I cannot figure out how to show only the results for records 1 through 999,999 so I can export to Excel and then repeat for 1,000,000 through the rest of the original file.

How can I show results between 2 values for the row number column I just created? When I attempt WHERE row# between 1 and 999999, I get "invalid column name row#."
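
For reference, a minimal version of what I'm running (the table and column names below are just placeholders):

    -- Roughly what I tried; fails with "Invalid column name 'row#'"
    SELECT ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#,
           <column list>
    FROM   <table name>
    WHERE  row# BETWEEN 1 AND 999999;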

SQL Server
Transact-SQL

3 answers

Sort by: Most helpful
  1. Tom Phillips 17,721 Reputation points
    2022-08-31T12:14:27.71+00:00
    1 person found this answer helpful.

  2. Erland Sommarskog 107.2K Reputation points
    2022-08-30T21:59:54.707+00:00

    This happens because a query is logically evaluated in the order FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY. So what you define in the SELECT list cannot be referred to in the WHERE clause.

    But you can easily address this with a Common Table Expression (CTE) that restructures your query:

       ; WITH CTE AS (
           SELECT ..., row# = ROW_NUMBER() OVER (ORDER BY <column name> ASC)
           FROM   tbl
       )
       SELECT ...
       FROM   CTE
       WHERE  row# BETWEEN 1 AND 999999
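
    For the next 1M-row batch, the same CTE works with a shifted range, for example:

       SELECT ...
       FROM   CTE
       WHERE  row# BETWEEN 1000000 AND 1999999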
    

  3. LiHongMSFT-4306 25,651 Reputation points
    2022-08-31T02:49:05.357+00:00

    Hi @Change Media

    I get "invalid column name row#."

    As Erland answered, this is caused by the logical query processing order. You could use a subquery (derived table) or a CTE to handle this issue.
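
    For example, a minimal derived-table sketch (placeholder names, and "numbered" is just an illustrative alias):

    SELECT *
    FROM (
        SELECT <column list>,
               ROW_NUMBER() OVER (ORDER BY <column name> ASC) AS row#
        FROM   <table name>
    ) AS numbered
    WHERE row# BETWEEN 1 AND 999999;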

    Also, you could give OFFSET ... FETCH NEXT a try; check this:

    --Rows 1 ~ 1000000
    SELECT <column list>
    FROM   <table name>
    ORDER BY <column name> ASC
    OFFSET 0 ROWS FETCH NEXT 1000000 ROWS ONLY;

    --Rows 1000001 ~ 2000000
    SELECT <column list>
    FROM   <table name>
    ORDER BY <column name> ASC
    OFFSET 1000000 ROWS FETCH NEXT 1000000 ROWS ONLY;
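
    One caveat: for the batches to line up without gaps or overlaps, the ORDER BY column should be unique (or include a tie-breaker column), because OFFSET ... FETCH paging over a non-deterministic order can repeat or skip rows between queries. If there are many batches, you could also drive the offset from a loop; a rough sketch with the same placeholder names (and an assumed @BatchSize variable):

    DECLARE @BatchSize int = 1000000, @Offset int = 0, @Rows int = -1;

    WHILE @Rows <> 0
    BEGIN
        SELECT <column list>
        FROM   <table name>
        ORDER BY <column name> ASC
        OFFSET @Offset ROWS FETCH NEXT @BatchSize ROWS ONLY;

        SET @Rows = @@ROWCOUNT;      -- rows returned by this batch
        SET @Offset += @BatchSize;   -- advance to the next window
    END;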
    

    Refer to this blog for more details: SQL Server OFFSET FETCH

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
