Applies to: SQL Server 2016 (13.x) and later versions
PolyBase vs. linked servers
The following table highlights the differences between PolyBase and linked server features:
PolyBase | Linked Servers |
---|---|
Database scoped object | Instance scoped object |
Uses ODBC drivers | Uses OLEDB providers |
Supports read-only operations for all data sources and write to some external data sources | Supports both read and write operations |
Queries to remote data source from a single connection can be scaled-out | Queries to remote data source from a single connection cannot be scaled-out |
Predicates pushdown is supported | Predicates pushdown is supported |
No separate configuration needed for availability group | Separate configuration needed for each instance in availability group |
Basic authentication only | Basic & integrated authentication |
Suitable for analytic queries processing large number of rows | Suitable for OLTP queries returning single or few rows |
Queries using external table cannot participate in distributed transaction | Distributed queries can participate in distributed transaction |
What's new in PolyBase in SQL Server 2022?
SQL Server 2022 (16.x) now supports CSV, Parquet, and Delta files stored on Azure Storage Account v2, Azure Data Lake Storage Gen2, or any S3-compatible object storage using the S3 REST API, on-premises or in the cloud.
SQL Server 2022 (16.x) can now use CREATE EXTERNAL TABLE as SELECT (CETAS), together with commands like OPENROWSET, CREATE EXTERNAL TABLE (CET), and all the new T-SQL enhancements.
Currently, support for Hadoop external data sources has been removed in SQL Server 2022 (16.x).
SQL Server 2022 (16.x) Cumulative update 2 now introduces support for Oracle TNS files. Please refer to CREATE EXTERNAL DATA SOURCE for more information and samples.
Read more on Data Virtualization with PolyBase for SQL Server 2022 and review SQL Server 2022 PolyBase enhancements.
What's new in PolyBase in SQL Server 2019?
PolyBase in SQL Server 2019 (15.x) can now read data from a larger variety of data sources. The data from these external data sources can be stored as external tables on your SQL Server. PolyBase also supports pushdown computation to these external data sources, excluding ODBC generic types.
Compatible Data Sources
- SQL Server
- Oracle
- Teradata
- MongoDB
- Compatible ODBC generic types
SQL Server 2019 (15.x) Cumulative update 19 now introduces support for Oracle TNS files. Please refer to CREATE EXTERNAL DATA SOURCE for more information and samples.
Note
PolyBase can allow connection to external data sources using third-party ODBC drivers. These drivers are not provided along with PolyBase and may not work as intended. For more information, see Configure PolyBase to access external data with ODBC generic types.
What CONNECTION_OPTIONS can I specify for connection to third-party data sources?
Review CREATE EXTERNAL DATA SOURCE (Transact-SQL) CONNECTION_OPTIONS for supported data provider connection options for SQL Server, Oracle, Teradata, the MongoDB API for Cosmos DB, and generic ODBC data providers.
How can I view the remote query execution plan of PolyBase pushdown computation?
This is possible via a couple of methods depending on the version of SQL Server. See PolyBase Troubleshooting - To view the PolyBase query plan.
How can I tell whether PolyBase pushdown computation is happening?
Is PolyBase supported on Azure SQL Database?
No, but you can query data in the files placed on Azure Blob Storage using OPENROWSET function, or use CREATE EXTERNAL TABLE to read from a serverless SQL pool in Synapse Analytics. For more information, see Azure SQL can read Azure Data Lake storage files using Synapse SQL external tables.
Is PolyBase supported on Azure SQL Managed Instance?
Azure SQL Managed Instance has its own data virtualization capabilities for Azure Data Lake Storage (ADLS) and Azure Blob Storage as data sources. For more information, see Data virtualization with Azure SQL Managed Instance.
Which ports should I allow through my firewall for PolyBase?
No additional ports are needed to open for PolyBase itself, except in the case of a PolyBase scale-out group. Without a scale-out group, the PolyBase service alone should not require any firewall rule.
On Windows servers, PolyBase will leverage the ports used by SQL Server in the IPAll section of the TCP/IP settings in SQL Server Configuration Manager. For more information, see TCP/IP Properties. On Linux servers, PolyBase will leverage the network.tcpport setting. For more information, see TCP port.
For PolyBase scale-out groups over multiple Windows servers, see port information in the next question.
However, you must allow other ports for communication to the various external data source(s) used by PolyBase. Refer to ODBC configuration, Azure resource configuration, or specific data source documentation. For example:
Source platform | Reference |
---|---|
Microsoft SQL Server | Ports Used By SQL Server |
Hortonworks HDP | Administering HDFS |
Cloudera CDH | Ports Used by CDH Components |
What ports are used for PolyBase scale-out group?
By default, the PolyBase data movement service connects to the head node of a scale-out group over a range of ports from TCP 16450-16460, typically using 16450-16453, plus port 17001. These ports are only used when a PolyBase scale-out group is configured and should be allowed through server firewalls. SQL Server Setup should automatically create a local firewall rule allowing these ports upon feature installation. For more information, see Configure PolyBase scale-out groups on Windows.
Note
The Microsoft SQL Server PolyBase scale-out groups will be retired. Scale-out group functionality will be removed from the product in SQL Server 2022 (16.x).
PolyBase in Big Data Clusters vs. PolyBase in stand-alone instances
For more information on connections using the ODBC generic connector, visit our How to guide for configuring ODBC generic types.
The following table highlights the PolyBase features available in SQL Server 2019 (15.x) stand-alone install and SQL Server 2019 (15.x) big data cluster:
Feature | Big Data Cluster | Stand alone instance |
---|---|---|
Create external data source for SQL Server, Oracle, Teradata, and Mongo DB | X | X |
Create external data source using a compatible third-party ODBC Driver | X | |
Create external data source for HADOOP data source | X | X |
Create external data source for Azure Blob Storage | X | X |
Create external table on a SQL Server data pool | X | |
Create external table on a SQL Server storage pool | X | |
Scale-out query execution | X | X (Windows only) |
Important
The Microsoft SQL Server 2019 Big Data Clusters add-on will be retired. Support for SQL Server 2019 Big Data Clusters will end on February 28, 2025. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform and the software will continue to be maintained through SQL Server cumulative updates until that time. For more information, see the announcement blog post and Big data options on the Microsoft SQL Server platform.