While Creating External Table in Synapse Can I use wild card in Location?

Shivendoo Kumar 751 Reputation points
2021-10-05T06:48:26.85+00:00

Hi All,

Working on a project where I have to bring data from ADLS to Synapse. I am planning to use Polybase so that I can create external tables based on CSV files from ADLS and avoid physical data movement to save time and money.

On ADLS, I may have multiple files with the same name but with a date suffix like Sales_01_Oct_2021.CSV, Sales_02_Oct_2021.CSV etc. These files are having one-day data in them. I want to create an external table called External.Sales and this table should have data from all the files starting with Sales. Is this possible..?

https://towardsdatascience.com/loading-csv-data-into-azure-synapse-analytics-by-using-polybase-5ae942ce3059
-- Create a temp table to hold the imported data
CREATE EXTERNAL TABLE dbo.FIPSLOOKUP_EXT (
UID INT NOT NULL,
iso2 VARCHAR(2) NULL,
iso3 VARCHAR(3) NULL,
code3 INT NULL,
FIPS INT NULL,
Admin2 VARCHAR(255) NULL,
provincestate VARCHAR(255) NULL,
countryregion VARCHAR(255) NULL,
latitude DECIMAL(12,9) NULL,
longitude DECIMAL(12,9) NULL,
combined_key VARCHAR(255) NULL,
population INT NULL
)
WITH (
LOCATION='../Sales_',*
DATA_SOURCE=AzureStorage,
FILE_FORMAT=csvFile
);

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,334 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Shivendoo Kumar 751 Reputation points
    2021-10-05T22:56:48.65+00:00

    I found the answer here:
    https://stackoverflow.com/questions/54543064/create-view-in-polybase

    Since they are all targeting the same table, you don't need to use this file-by-file method. Polybase will load a wildcard-like set of files. If you put all your data files of the same type in a folder and set the FOLDER as the location, all the files in the folder will be loaded in parallel. You will find your loads hugely faster … at the moment you're processing files in series, using the method I suggest will process them in parallel as fast as readers are available.

    The only extra step I need to do here is to create a separate folder for each table.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.