Introduction to PolyBase
PolyBase is the SQL Server feature that implements data virtualization. PolyBase was first released in SQL Server 2016 and has been improved in each later version of SQL Server. However, the general concept of accessing data remotely without copying it dates back to SQL Server 7.0, which introduced linked servers.
The following table lists the first SQL Server version to support various PolyBase features.
| SQL Server 2016 | SQL Server 2017 | SQL Server 2019 | SQL Server 2025 |
|---|---|---|---|
| • Hadoop • Azure Blob Storage | • OPENROWSET enhancements • CSV for Azure Blob Storage • Database Scoped Credential | • SQL Server • Oracle • Azure Cosmos DB • MongoDB • Teradata • Linux support • Generic ODBC | • New connector framework • Object storage integration • CSV • Parquet • Delta • CETAS |
For more information about PolyBase, see PolyBase features and limitations.
PolyBase enhancements in SQL Server 2025
Native support for CSV, Parquet, and Delta: PolyBase Query Service for External Data installation is no longer required to use OPENROWSET, CREATE EXTERNAL TABLE, or CREATE EXTERNAL TABLE AS SELECT with the following types of external data: Parquet, Delta, Azure Blob Storage (ABS), Azure Data Lake Storage (ADLS), or S3-compatible object storage.
Use generic ODBC data sources on Linux: For more information, see Configure PolyBase to access external data with ODBC generic types.
TDS 8.0 support: When using Microsoft ODBC Driver 18 for SQL Server, TDS 8.0 is now supported for SQL Server as an external data source.
S3-compatible object storage
SQL Server 2025 supports S3-compatible object storage. To enable this integration, SQL Server 2025 uses a REST API connector framework architecture that follows the S3 framework. Any object storage that supports the S3 framework also works with SQL Server 2025. S3-compatible object storage solutions can run locally, in your network, in the cloud, or in a hybrid environment.
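As a minimal sketch of what this integration can look like in practice, the following T-SQL registers an S3-compatible endpoint and queries a Parquet file in place. The object names (s3_credential, s3_source) are hypothetical, and the endpoint, port, bucket, path, and key values are placeholders to replace with your own. The sketch assumes the database doesn't already have a master key.

```sql
-- A database master key is required before creating a database scoped credential.
-- Skip this statement if the database already has one.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- For S3-compatible storage, IDENTITY is the literal string 'S3 Access Key';
-- SECRET combines the access key ID and the secret key, separated by a colon.
CREATE DATABASE SCOPED CREDENTIAL s3_credential  -- hypothetical name
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<AccessKeyID>:<SecretKeyID>';

-- The s3:// prefix selects the S3-compatible object storage connector.
CREATE EXTERNAL DATA SOURCE s3_source  -- hypothetical name
WITH (
    LOCATION = 's3://<endpoint>:<port>/',
    CREDENTIAL = s3_credential
);

-- Read a Parquet file directly from the bucket without copying it into SQL Server.
SELECT TOP (10) *
FROM OPENROWSET(
    BULK '/<bucket>/<folder>/<file>.parquet',
    FORMAT = 'PARQUET',
    DATA_SOURCE = 's3_source'
) AS src;
```

Because the data source stores only the endpoint and credential, the same definition can be reused by several OPENROWSET queries or external tables that point at different paths in the bucket.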
Object storage, also known as object-based storage, is a strategy that manages data as distinct units called objects. These objects are kept in a single flat repository rather than nested as files inside folders. Object storage combines the pieces of data that make up a file, adds all relevant metadata to that file, and attaches a custom identifier.
Some main features of object storage compared to a traditional file system are:
- Keeps metadata embedded in the file.
- Lets files have attributes like tags.
- More cost-effective to scale and easier to maintain.
- Optimized for large amounts of data, such as Big Data, Internet of Things (IoT), AI, Machine Learning, and analytics.
- Not recommended for high-transactional or online transaction processing (OLTP) workloads.
You can also use S3-compatible object storage for backup and restore scenarios by using the BACKUP TO URL command. For more information, see SQL Server backup and restore with S3-compatible object storage.
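The following sketch illustrates one way a backup to S3-compatible storage might look. The database name SalesDb is hypothetical, and the endpoint, port, bucket, and key values are placeholders. It assumes a server-level credential named after the bucket URL, which is how SQL Server locates the keys for a given s3:// target.

```sql
-- Server-level credential whose name matches the bucket URL.
-- IDENTITY is the literal string 'S3 Access Key'.
CREATE CREDENTIAL [s3://<endpoint>:<port>/<bucket>]
WITH IDENTITY = 'S3 Access Key',
     SECRET = '<AccessKeyID>:<SecretKeyID>';

-- Back up a hypothetical database to the bucket.
-- FORMAT overwrites any existing backup sets at the target.
BACKUP DATABASE [SalesDb]
TO URL = 's3://<endpoint>:<port>/<bucket>/SalesDb.bak'
WITH FORMAT, COMPRESSION, STATS = 10;

-- Restore from the same object.
RESTORE DATABASE [SalesDb]
FROM URL = 's3://<endpoint>:<port>/<bucket>/SalesDb.bak'
WITH REPLACE, STATS = 10;
```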
Amazon Web Services (AWS) established the S3 standard framework, and major storage providers like Cloudian, Dell, MinIO, and Pure Storage now offer S3-compatible object storage solutions. If a solution offers compatibility with S3 REST APIs, it's compatible with SQL Server 2025.
For more information about object storage benefits, installation, and testing, see the following storage partner documentation. For more object storage providers, see Providers of S3-compatible object storage.
- Cloudian HyperStore
- Dell Isilon OneFS
- Dell ECS Community Edition
- Hitachi Content Platform (HCP)
- MinIO Multi-Cloud Object Storage
- Pure Storage Pure FlashBlade
Some object storage partners offer the ability to run their solution as software capable of virtualizing your current storage. You can install and try these solutions on your own machine or virtual machine (VM).
PolyBase services vs. the PolyBase REST API feature
To use PolyBase, you must install the PolyBase Query Service for External Data and enable PolyBase at an instance level by using sp_configure. PolyBase setup installs two PolyBase services, SQL Server PolyBase Engine and SQL Server PolyBase Data Movement.
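As an illustration of the instance-level step described above, this short sketch checks whether the PolyBase feature is installed and then enables it with sp_configure. It assumes you have permission to change server configuration.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase at the instance level.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Confirm the setting: run_value should now be 1.
EXEC sp_configure @configname = 'polybase enabled';
```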
SQL Server PolyBase Engine
- Service executable: mpdwsvc.exe -dweng
- Parses queries.
- Generates query plans.
- Distributes work to compute nodes (SQL Server 2019).
- Processes compute node results and returns results to the client (SQL Server 2019).
SQL Server PolyBase Data Movement
- Service executable: mpdwsvc.exe -dms
- Transfers data between external data sources and SQL Server, and between PolyBase head and compute nodes (SQL Server 2019).
- Inserts data into other data sources, such as Azure Storage.
Data sources like SQL Server, Oracle, MongoDB, or ODBC-based sources use these PolyBase services. Data sources that use the SQL Server 2025 REST API-based PolyBase architecture don't require these services to be running or configured, but the PolyBase Query Service for External Data must still be installed and enabled.
You can use the PolyBase REST APIs to access Azure Data Lake Storage, Azure Blob Storage, any S3-compatible object storage, and file formats such as Parquet, Delta, and CSV files. Previously supported data sources still use the SQL Server PolyBase Engine and SQL Server PolyBase Data Movement services.
| Data source | PolyBase services | PolyBase REST API feature |
|---|---|---|
| Azure Blob Storage | | Yes |
| Azure Data Lake Storage | | Yes |
| S3-compatible object storage | | Yes |
| SQL Server | Yes | |
| Oracle | Yes | |
| Teradata | Yes | |
| MongoDB or Azure Cosmos DB API for MongoDB | Yes | |
| Generic Open Database Connectivity (ODBC) | Yes | |
| Bulk operations | | |
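To make the distinction concrete, the sketch below defines one external data source of each kind. The LOCATION prefix determines which connector is used: sqlserver:// is handled by the PolyBase services, while abs:// goes through the REST API-based architecture. All object names are hypothetical, the server, container, account, and secret values are placeholders, and the sketch assumes the database already has a master key.

```sql
-- Services-based source: a remote SQL Server instance.
CREATE DATABASE SCOPED CREDENTIAL remote_sql_credential  -- hypothetical name
WITH IDENTITY = '<sql login>',
     SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE remote_sql_source  -- hypothetical name
WITH (
    LOCATION = 'sqlserver://<server>:<port>',
    PUSHDOWN = ON,  -- let the remote server evaluate eligible filters and joins
    CREDENTIAL = remote_sql_credential
);

-- REST API-based source: Azure Blob Storage, authenticated with a SAS token.
CREATE DATABASE SCOPED CREDENTIAL blob_credential  -- hypothetical name
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token without the leading question mark>';

CREATE EXTERNAL DATA SOURCE blob_source  -- hypothetical name
WITH (
    LOCATION = 'abs://<container>@<storage_account>.blob.core.windows.net/',
    CREDENTIAL = blob_credential
);
```

Other prefixes follow the same pattern, for example oracle://, teradata://, mongodb://, and odbc:// for services-based sources, and adls:// and s3:// for REST API-based sources.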