Learn about the supported authentication methods for SharePoint ingestion into Azure Databricks.
Important
The managed SharePoint connector is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.
Tip
This page covers the managed SharePoint connector for ingesting unstructured files (PDFs, DOCX, and more) for use in applications such as RAG.
To build custom pipelines with the SharePoint connector, providing full control over parsing, transformations, and ingestion of both structured files (for example, CSV and Excel) and unstructured files into Delta tables, see Ingest files from SharePoint.
Choose your SharePoint connector
Lakeflow Connect offers two complementary SharePoint connectors. Both access data in SharePoint, but they serve distinct goals.
| Consideration | Managed SharePoint connector | Standard SharePoint connector |
|---|---|---|
| Management and customization | A fully managed connector. Simple, low-maintenance connectors for enterprise applications that ingest data into Delta tables and keep them in sync with the source. See Managed connectors in Lakeflow Connect. | Build custom ingestion pipelines with SQL, PySpark, or Lakeflow Spark Declarative Pipelines using batch and streaming APIs such as read_files, spark.read, COPY INTO, and Auto Loader. Offers the flexibility to perform complex transformations during ingestion, while giving you greater responsibility for managing and maintaining your pipelines. |
| Output format | Uniform binary content table. Ingests each file in binary format (one file per row), along with file metadata in additional columns. | Structured Delta tables. Ingests structured files (like CSV and Excel) as Delta tables. Can also be used to ingest unstructured files in binary format. |
| Granularity, filtering, and selection | No subfolder or file-level selection today. No pattern-based filtering. Ingests all files in the specified SharePoint document library. | Granular and custom. URL-based selection to ingest from document libraries, subfolders, or individual files. Also supports pattern-based filtering using the pathGlobFilter option (see the sketch after this table). |
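If you choose the standard connector, ingestion is typically expressed with Auto Loader or read_files. The following PySpark sketch shows the general pattern of loading unstructured files in binary format (one row per file) into a Delta table with pattern-based filtering. The source path, checkpoint location, and target table name are hypothetical placeholders, not values produced by the connector; substitute the location you configured when following Ingest files from SharePoint.

```python
# Minimal sketch: ingest unstructured files as binary rows into a Delta table
# with Auto Loader. Paths and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_path = "/Volumes/main/sharepoint/landing/"              # hypothetical staging location
checkpoint_path = "/Volumes/main/sharepoint/_checkpoints/docs" # hypothetical checkpoint location

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")  # one row per file, content as bytes plus file metadata
    .option("pathGlobFilter", "*.pdf")          # pattern-based file selection
    .load(source_path)
)

(
    df.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                 # process available files, then stop
    .toTable("main.sharepoint.docs_binary")     # hypothetical target Delta table
)
```

The availableNow trigger runs the stream as an incremental batch, which suits scheduled ingestion jobs; remove it to keep the stream running continuously.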
Which authentication methods are supported?
The SharePoint connector supports the following authentication methods:

- Machine-to-machine (M2M) OAuth
- User-to-machine (U2M) OAuth
- Manual token refresh (legacy)
Which authentication method should I choose?
In most scenarios, Databricks recommends machine-to-machine (M2M) OAuth. M2M scopes connector permissions to a specific site. However, if you want to scope permissions to whatever the authenticating user can access, choose user-to-machine (U2M) OAuth instead. Both methods offer automated token refresh and heightened security.
Manual token refresh authentication is considered a legacy method and is not recommended.
U2M compared to M2M
The following table compares U2M and M2M for authentication to SharePoint:
| Feature | OAuth U2M | OAuth M2M |
|---|---|---|
| Authentication type | Delegated access (user-based) | App-only access (service principal) |
| User interaction required | Yes - User must sign in | No - Fully automated |
| Best for | User-specific access scenarios | Automated production pipelines |
| Token refresh | Handled automatically by Azure Databricks | Handled automatically by Azure Databricks |
| SharePoint permissions | Delegated permissions | Application permissions |
| Access scope | Limited to user's permissions | Defined by app registration |
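If you settle on M2M OAuth, the connector authenticates as a Microsoft Entra ID app registration with application permissions rather than as a signed-in user. The sketch below illustrates the general shape of defining such a connection with SQL from a notebook (where spark is predefined). The connection type name and option keys are assumptions for illustration and may differ from the actual SharePoint connector configuration, which is typically done through the ingestion UI; the secret scope and key names are hypothetical.

```python
# Minimal sketch of an M2M (app-only) OAuth connection, assuming a Databricks
# notebook context. The connection type name and option keys are illustrative
# assumptions; check the connector documentation or UI for the exact values.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS sharepoint_m2m
  TYPE SHAREPOINT  -- assumed connection type name
  OPTIONS (
    client_id '<entra-application-id>',                     -- Microsoft Entra app registration (client) ID
    client_secret secret('sharepoint', 'client-secret'),    -- hypothetical secret scope and key
    tenant_id '<entra-tenant-id>'
  )
""")
```

Storing the client secret in a Databricks secret scope (referenced with the secret function) keeps credentials out of notebook code and job definitions.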