Microsoft SharePoint connector FAQs

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

This page answers frequently asked questions about the Microsoft SharePoint connector in Databricks Lakeflow Connect.

General managed connector FAQs

The answers in Managed connector FAQs apply to all managed connectors in Lakeflow Connect. Keep reading for connector-specific FAQs.

How does the connector handle updates in SharePoint?

On subsequent pipeline runs, the connector re-ingests only the files that were updated since the last run. It does not apply incremental updates within a file. For example, if only some of the data in an Excel file changes, the connector re-ingests the entire file rather than only the changed data.
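The file-level behavior can be illustrated with a minimal Python sketch. It is not the connector's implementation: the function name, the example files, and the assumption that change detection compares last-modified timestamps are all hypothetical.

```python
from datetime import datetime, timezone


def files_to_reingest(files, last_run):
    """Return the names of files modified after the previous pipeline run.

    `files` maps a file name to its last-modified timestamp. Selection is
    per file: a changed file is re-ingested in full, not diffed internally.
    """
    return sorted(name for name, modified in files.items() if modified > last_run)


# Hypothetical example data.
last_run = datetime(2025, 1, 10, tzinfo=timezone.utc)
files = {
    "budget.xlsx": datetime(2025, 1, 12, tzinfo=timezone.utc),  # edited after the last run
    "roster.csv": datetime(2025, 1, 5, tzinfo=timezone.utc),    # unchanged since the last run
}

print(files_to_reingest(files, last_run))  # ['budget.xlsx']
```

In this sketch, budget.xlsx is selected as a whole file even if only one cell changed.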

Which APIs does the connector use?

The connector uses the Microsoft Graph API.

Does the refresh token expire?

Yes. By default, the refresh token expires after 90 days. This is true for all supported authentication methods.
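To plan re-authentication, you can estimate when a token expires. The following Python sketch assumes the 90-day default and assumes the lifetime is counted from the issue date. The function and constant names are hypothetical, and your tenant's token policy might differ.

```python
from datetime import date, timedelta

# Default lifetime stated above. Assumption: your tenant uses the default.
DEFAULT_REFRESH_TOKEN_LIFETIME_DAYS = 90


def refresh_token_expiry(issued_on, lifetime_days=DEFAULT_REFRESH_TOKEN_LIFETIME_DAYS):
    """Return the estimated date on which a refresh token expires."""
    return issued_on + timedelta(days=lifetime_days)


print(refresh_token_expiry(date(2025, 1, 1)))  # 2025-04-01
```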

Is M2M authentication supported?

Yes. The connector supports both delegated access (U2M OAuth) and app-only access (M2M OAuth). For setup instructions, see Configure OAuth M2M for SharePoint ingestion.

What storage modes are supported?

Unstructured (BINARYFILE) ingestion supports SCD_TYPE_1 storage mode. Structured ingestion (CSV, JSON, XML, EXCEL, and other formats) supports APPEND_ONLY storage mode. SCD type 2 is not currently supported.

SCD_TYPE_1 and APPEND_ONLY are the defaults for their respective format types and are currently the only supported options, so setting storage_mode explicitly in table_configuration is optional.
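The defaults described above can be sketched as a simple lookup in Python. The helper name is hypothetical, and the dictionary is only a fragment: this page names the storage_mode key and table_configuration, but the rest of the pipeline specification is not shown here.

```python
# Structured formats listed on this page; BINARYFILE is the unstructured format.
STRUCTURED_FORMATS = {"CSV", "JSON", "XML", "EXCEL", "PARQUET", "AVRO", "ORC"}


def default_storage_mode(file_format: str) -> str:
    """Return the storage mode this page describes for a given file format."""
    fmt = file_format.upper()
    if fmt == "BINARYFILE":
        return "SCD_TYPE_1"
    if fmt in STRUCTURED_FORMATS:
        return "APPEND_ONLY"
    raise ValueError(f"Unsupported file format: {file_format}")


# Hypothetical fragment: only the `storage_mode` key is named on this page.
# The surrounding pipeline specification is omitted.
table_configuration = {"storage_mode": default_storage_mode("EXCEL")}

print(table_configuration)  # {'storage_mode': 'APPEND_ONLY'}
```

Because each value matches the default for its format type, omitting storage_mode should give the same result.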

What file formats are supported?

The connector supports both unstructured and structured file ingestion:

  • Unstructured: BINARYFILE
    • Files are ingested as rows with a content column plus metadata columns. Use this format for PDFs, images, Office files, and other files that you intend to process downstream.
  • Structured: CSV, JSON, XML, EXCEL, PARQUET, AVRO, ORC
    • Files are parsed and each row inside the file becomes a row in the destination table.
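The difference between the two modes can be sketched in plain Python. This is an illustration only, not the connector's code: it assumes one destination row per file for unstructured ingestion, and the metadata column names path and length are hypothetical (only the content column is named above).

```python
import csv
import io


def ingest_unstructured(path, data):
    """Unstructured: the file becomes a row holding its raw bytes plus metadata."""
    # Column names other than `content` are hypothetical.
    return [{"content": data, "path": path, "length": len(data)}]


def ingest_structured_csv(data):
    """Structured: each record inside the file becomes its own row."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


raw = b"id,name\n1,Ada\n2,Grace\n"

print(len(ingest_unstructured("people.csv", raw)))  # 1
print(len(ingest_structured_csv(raw)))              # 2
```

The same bytes produce a single row in the unstructured sketch and two rows in the structured one, which is why BINARYFILE suits files that you parse downstream.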