PolyBase features and limitations
Applies to:
SQL Server 2016 and later
Azure SQL Database
Azure Synapse Analytics
Analytics Platform System (PDW)
This article is a summary of PolyBase features available for SQL Server products and services.
Feature summary for product releases
This table lists the key features for PolyBase and the products in which they're available.
Feature | SQL Server (Beginning with 2016) | Azure SQL Database | Azure Synapse Analytics | Parallel Data Warehouse |
---|---|---|---|---|
Query Hadoop data with Transact-SQL | Yes | No | No | Yes |
Import data from Hadoop | Yes | No | No | Yes |
Export data to Hadoop | Yes | No | No | Yes |
Query, import from, export to Azure HDInsight | No | No | No | No |
Push down query computations to Hadoop | Yes | No | No | Yes |
Import data from Azure Blob storage | Yes | Yes* | Yes | Yes |
Export data to Azure Blob storage | Yes | No | Yes | Yes |
Import data from Azure Data Lake Store | No | No | Yes | No |
Export data to Azure Data Lake Store | No | No | Yes | No |
Run PolyBase queries from Microsoft BI tools | Yes | No | Yes | Yes |
* Introduced in SQL Server 2017 (14.x), see Examples of bulk access to data in Azure Blob storage.
Known limitations
PolyBase has the following limitations:
Before SQL Server 2019 (15.x), the maximum possible row size, which includes the full length of variable length columns, can't exceed 32 KB in SQL Server or 1 MB in Azure Synapse Analytics. Starting with SQL Server 2019 (15.x), this limitation is lifted. The limit remains 1 MB for Hadoop data sources, but is limited only by the maximum SQL Server limit for other data sources.
When data is exported into an ORC file format from SQL Server or Azure Synapse Analytics, text-heavy columns might be limited. They can be limited to as few as 50 columns because of Java out-of-memory error messages. To work around this issue, export only a subset of the columns.
PolyBase can't connect to any Hadoop instance if Knox is enabled.
If you use Hive tables with transactional = true, PolyBase can't access the data in the Hive table's directory.
PolyBase services require SQL Server service to have TCP/IP network protocol enabled to function correctly. Additionally, if TCP/IP Protocol configuration setting Listen All is set to No, you must still have an entry for the correct listener port in either TCP Dynamic Ports or TCP Ports under IPAll in TCP/IP Properties. This is required due to the way PolyBase services resolve the listener port of the SQL Server Engine.
PolyBase on SQL Server on Linux will not function if IPv6 is disabled in the kernel. See Release notes for SQL Server 2019 on Linux for further information on this limitation.
PolyBase services require Shared Memory protocol to be enabled to function properly.
If you have a default SQL Server instance that is configured to listen on TCP port other than 1433, you cannot use it as a head node in a PolyBase scale-out group. When executing
sp_polybase_join_group
, if you pass 'MSSQLSERVER' as the instance name, SQL Server will assume port 1433 is the listener port, so the Data Movement service will be unable to connect to the head node when starting.Oracle synonyms are not supported for usage with PolyBase.
UTF-8 collations are not supported for Hadoop external data sources.
Starting in SQL Server 2022 (16.x), Hadoop is no longer supported.
Next steps
For more information about PolyBase, see Introducing data virtualization with PolyBase.
Feedback
Submit and view feedback for