Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
SQL Server
This article reviews options for using PolyBase to query external data in-place, referred to as data virtualization, for a variety of external data sources.
To use PolyBase, you must create external tables to reference your external data. Refer to:
Note
In order to use PolyBase you must have sysadmin or CONTROL SERVER level permissions on the database.
For examples of queries, see PolyBase Queries.
For more tutorials on various external data sources, review:
Sometimes, you might need to know which file or folder source correlates to a specific row in the result set.
You can use functions filepath
and filename
to return file names and/or the path in the result set. Or you can use them to filter data based on the file name and/or folder path. In the following sections, you'll find short descriptions along samples.
This function returns the file name that the row originates from.
Return data type is nvarchar(1024). For optimal performance, always cast result of filename function to appropriate data type. If you use character data type, make sure appropriate length is used.
The following sample reads the NYC Yellow Taxi data files for the last three months of 2017 and returns the number of rides per file. The OPENROWSET
part of the query specifies which files will be read.
SELECT
nyc.filename() AS [filename]
,COUNT_BIG(*) AS [rows]
FROM
OPENROWSET(
BULK 'parquet/taxi/year=2017/month=9/*.parquet',
DATA_SOURCE = 'SqlOnDemandDemo',
FORMAT='PARQUET'
) nyc
GROUP BY nyc.filename();
The following example shows how filename()
can be used in the WHERE
clause to filter the files to be read. It accesses the entire folder in the OPENROWSET
part of the query and filters files in the WHERE
clause.
Your results will be the same as the prior example.
SELECT
r.filename() AS [filename]
,COUNT_BIG(*) AS [rows]
FROM OPENROWSET(
BULK 'csv/taxi/yellow_tripdata_2017-*.csv',
DATA_SOURCE = 'SqlOnDemandDemo',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2)
WITH (C1 varchar(200) ) AS [r]
WHERE
r.filename() IN ('yellow_tripdata_2017-10.csv', 'yellow_tripdata_2017-11.csv', 'yellow_tripdata_2017-12.csv')
GROUP BY
r.filename()
ORDER BY
[filename];
This function returns a full path or a part of path:
Return data type is nvarchar(1024). For optimal performance, always cast result of filepath
function to appropriate data type. If you use character data type, make sure appropriate length is used.
The following sample reads NYC Yellow Taxi data files for the last three months of 2017. It returns the number of rides per file path. The OPENROWSET
part of the query specifies which files will be read.
SELECT
r.filepath() AS filepath
,COUNT_BIG(*) AS [rows]
FROM OPENROWSET(
BULK 'csv/taxi/yellow_tripdata_2017-1*.csv',
DATA_SOURCE = 'SqlOnDemandDemo',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2
)
WITH (
vendor_id INT
) AS [r]
GROUP BY
r.filepath()
ORDER BY
filepath;
The following example shows how filepath()
can be used in the WHERE
clause to filter the files to be read.
You can use the wildcards in the OPENROWSET
part of the query and filter the files in the WHERE
clause. Your results will be the same as the prior example.
SELECT
r.filepath() AS filepath
,r.filepath(1) AS [year]
,r.filepath(2) AS [month]
,COUNT_BIG(*) AS [rows]
FROM OPENROWSET(
BULK 'csv/taxi/yellow_tripdata_*-*.csv',
DATA_SOURCE = 'SqlOnDemandDemo',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2
)
WITH (
vendor_id INT
) AS [r]
WHERE
r.filepath(1) IN ('2017')
AND r.filepath(2) IN ('10', '11', '12')
GROUP BY
r.filepath()
,r.filepath(1)
,r.filepath(2)
ORDER BY
filepath;
Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayTraining
Module
Introduction to SQL Server 2022 data virtualization - Training
Learn about data virtualization, how to use Polybase to access and query external data, and enhanced Polybase features in SQL Server 2022.
Certification
Microsoft Certified: Fabric Data Engineer Associate - Certifications
As a Fabric Data Engineer, you should have subject matter expertise with data loading patterns, data architectures, and orchestration processes.
Documentation
PolyBase query scenarios - SQL Server
See examples of queries using the PolyBase feature of SQL Server, including SELECT, JOIN external with local tables, import/export data, and new catalog views.
PolyBase features and limitations - SQL Server
PolyBase features available for SQL Server products and services, including a list of T-SQL operators supported for pushdown and known limitations.
Virtualize a CSV file with PolyBase - SQL Server
Virtualize a CSV file with PolyBase starting with SQL Server 2022.