Help with SQL - splitting a CSV into smaller files

Change Media 1 Reputation point
2022-08-30T21:27:27.027+00:00

I'm trying to split a large CSV into smaller chunks (1M rows or less) using SQL since you can't open/save files with more rows using Excel.

I've figured out how to add row numbers using the ROW_NUMBER command (Select ROW_NUMBER() over(order by <column name> asc) as row#), but I cannot figure out how to only show results for records 1 through 999,999 so I can export into Excel and repeat for 1,000,000 through the rest of the original file.

How can I show results between 2 values for the row number column I just created? When I attempt WHERE row# between 1 and 999999, I get "invalid column name row#."

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
9,036 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 71,986 Reputation points MVP
    2022-08-30T21:59:54.707+00:00

    This happens because a query is logically evaluated in the order FROM-WHERE-GROUP BY-HAVING-SELECT-ORDER BY. So what you define in the SELECT list cannot be referred to in the WHERE clause.

    But you can easily address with a Common Table Expression (CTE) to restructure your query:

       ; WITH CTE AS (  
           SELECT ... row# = row_number()....  
           FROM   tbl  
       )  
       SELECT ...  
       FROM   CTE  
       WHERE row# BETWEEN 1 AND 999999  
    
  2. CosmogHong-MSFT 8,281 Reputation points Microsoft Vendor
    2022-08-31T02:49:05.357+00:00

    Hi @Change Media

    I get "invalid column name row#."

    As Erland answered, it is the SQL query processing order reason. You could use Subquery or CTE to handle this issue.

    Also, you can have a try on OFFSET ... FETCH NEXT, check this:

    --1~1000000  
    SELECT <column list>  
    FROM <table name>  
    ORDER BY <column name> ASC OFFSET 0 ROWS FETCH NEXT 1000000 ROWS ONLY;  
      
    --1000001~2000000  
    SELECT <column list>  
    FROM <table name>  
    ORDER BY <column name> ASC OFFSET 1000000 ROWS FETCH NEXT 1000000 ROWS ONLY;  
    

    Refer to this blog for more details: SQL Server OFFSET FETCH

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.