Share via

Synapse csv file in Serverless SQL Pool

Debbie Edwards 526 Reputation points
2023-10-24T16:03:29.4+00:00

Hi,

I have the following working on the master serverless sql pool

SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY GA, tab) as Key, GA, tab, TYPE 
FROM
    OPENROWSET(
        BULK 'https://devuktrainsa.dfs.core.windows.net/rawdata/2022-2023/england_census.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) with (
        Key varchar(50) COLLATE Latin1_General_100_BIN2_UTF8,
        GA VARCHAR(3) COLLATE Latin1_General_100_BIN2_UTF8,
        tab VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
        TYPE varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
      
    ) as rows

And now Im thinking. How do I set that as an external table? Moving from csv to PARQUET?

Is it doable? Im very confused All i can find is these examples

CREATE EXTERNAL TABLE census_external_table
(
    decennialTime varchar(20),
    stateName varchar(100),
    countyName varchar(100),
    population int,
    race varchar(50),
    sex    varchar(10),
    minAge int,
    maxAge int
)  
WITH (
    LOCATION = '/parquet/',
    DATA_SOURCE = population_ds,  
    FILE_FORMAT = census_file_format
)
GO

SELECT TOP 1 * FROM census_external_table

But to me it looks like this is just taking the data source I created over my top level folder.

I switch to File Format PARQUET for my example

But how does it know I want to so the above script creating my dimension?

Im clearly very confused over the creation of external tables.

Am I thinking about this incorrectly. Or are external tables just straight over the data source. You cant do transformations on them?

Azure Synapse Analytics
Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

0 comments No comments

3 answers

Sort by: Most helpful
  1. Debbie Edwards 526 Reputation points
    2023-10-26T10:49:06.9933333+00:00

    I got it

    --USE dev_training_db
    --to do Let create our dim table. Taking it from CSV and moving it to PARQUET
    CREATE EXTERNAL TABLE [staging].[dim_estab]
     WITH (
        LOCATION = 'staging/parquet/',
        DATA_SOURCE = test_data,  
        FILE_FORMAT = parquetfile1
    )
    AS 
    SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY GA, tab) as TestKey, ISNULL(LA,-1) AS GA, tab, TYPE 
    FROM
        OPENROWSET(
            BULK 'https://devukstrainsa.dfs.core.windows.net/rawdata/*/census.csv',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            HEADER_ROW = TRUE
        ) with (
            TestKey varchar(50) COLLATE Latin1_General_100_BIN2_UTF8,
            GA VARCHAR(3) COLLATE Latin1_General_100_BIN2_UTF8,
            tab VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
            TYPE varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
          
        ) as rows
    
    going through the learning path again and I only needed the one with
    

    Was this answer helpful?


  2. Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
    2023-10-24T19:53:55.65+00:00

    Hello Debbie Edwards,

    I am glad to know you were able to answer your own question.

    Regarding the error message "Different number of columns in CREATE TABLE or CREATE EXTERNAL TABLE and SELECT query" usually occurs when the number of columns in the SELECT query does not match the number of columns in the CREATE TABLE or CREATE EXTERNAL TABLE statement.

    In your case, it looks like the number of columns in the SELECT query and the CREATE EXTERNAL TABLE statement are the same, so the error message is unexpected.

    To troubleshoot the issue further:

    Can you try running the SELECT query separately and see if it returns the expected results?

    If the SELECT query returns the expected results, then the issue might be with the CREATE EXTERNAL TABLE statement.

    Try creating the external table without the AS SELECT clause and see if it works.

    CREATE EXTERNAL TABLE dim_test
    (
            Key INT,
            GA VARCHAR(3) COLLATE Latin1_General_100_BIN2_UTF8,
            tab VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
            TYPE varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
    )  
    WITH (
        LOCATION = '/parquet/',
        DATA_SOURCE = census_data,  
        FILE_FORMAT = parquetfile1
    );
    

    Was this answer helpful?

    0 comments No comments

  3. Debbie Edwards 526 Reputation points
    2023-10-24T17:43:18.86+00:00

    I have just answered my own question again and now I understand CTAS a bit more but its still not working

    CREATE EXTERNAL TABLE dim_test
    (
            Key INT,
            GA VARCHAR(3) COLLATE Latin1_General_100_BIN2_UTF8,
            tab VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
            TYPE varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
    )  
    WITH (
        LOCATION = '/parquet/',
        DATA_SOURCE = census_data,  
        FILE_FORMAT = parquetfile1
    )
    AS 
    SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY GA, tab) as Key, GLA, tab, TYPE 
    FROM
        OPENROWSET(
            BULK 'https://devukstrainsa.dfs.core.windows.net/rawdata/2022-2023/census.csv',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            HEADER_ROW = TRUE
        ) with (
            Key varchar(50) COLLATE Latin1_General_100_BIN2_UTF8,
            GA VARCHAR(3) COLLATE Latin1_General_100_BIN2_UTF8,
            tab VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8,
            TYPE varchar(50) COLLATE Latin1_General_100_BIN2_UTF8
          
        ) as rows
    

    I have to add the WITH because of collation issues. However Im getting this error

    Different number of columns in CREATE TABLE or CREATE EXTERNAL TABLE and SELECT query.

    But this simple isn't true. Both have 4 and they are exactly the same so I dont understand this error. can any one help. I really want to create my first external table

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.