Google Drive connector FAQs

Important

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

This page answers frequently asked questions about the Google Drive connector in Databricks Lakeflow Connect.

General managed connector FAQs

The answers in Managed connector FAQs apply to all managed connectors in Lakeflow Connect. Keep reading for connector-specific FAQs.

What file formats are supported?

The connector supports both unstructured and structured file ingestion:

  • Unstructured: BINARYFILE
    • Files are ingested as rows with a content column plus metadata columns. Use for PDFs, images, Office files, and other files you intend to process downstream.
  • Structured: CSV, JSON, XML, EXCEL, PARQUET, AVRO, ORC
    • Files are parsed and each row inside the file becomes a row in the destination table.

The connector skips unsupported Google formats (for example, Google Forms, Google Sites, Google Jams, and Google Vids) during ingestion.

What storage modes are supported?

Unstructured (BINARYFILE) ingestion supports SCD_TYPE_1 storage mode. Structured ingestion (CSV, JSON, XML, EXCEL, and other formats) supports APPEND_ONLY storage mode. SCD type 2 is not currently supported.

Because SCD_TYPE_1 and APPEND_ONLY are the defaults for their respective format types and also the only options currently supported, setting storage_mode explicitly in table_configuration is optional.
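The relationship between format and storage mode described above can be sketched as a small lookup. This is only an illustration of the rule stated in this FAQ, not connector code: the helper name default_storage_mode and the two set names are hypothetical, and the format and mode strings are taken from the text above.

```python
# Formats listed in this FAQ, grouped the same way as in the text above.
UNSTRUCTURED_FORMATS = {"BINARYFILE"}
STRUCTURED_FORMATS = {"CSV", "JSON", "XML", "EXCEL", "PARQUET", "AVRO", "ORC"}


def default_storage_mode(file_format: str) -> str:
    """Hypothetical helper: return the storage mode implied by a file format."""
    fmt = file_format.upper()
    if fmt in UNSTRUCTURED_FORMATS:
        return "SCD_TYPE_1"
    if fmt in STRUCTURED_FORMATS:
        return "APPEND_ONLY"
    raise ValueError(f"Unsupported format: {file_format}")
```

Because each format maps to exactly one mode, a configuration that omits storage_mode and one that sets it to the value returned here would describe the same behavior.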

How does incremental ingestion work?

On subsequent pipeline runs, the connector ingests only the files that were added or updated since the last run. Incremental processing happens at the file level, not within a file: for example, if a few rows in a CSV change, the connector re-ingests the whole file rather than only the changed rows.
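File-level incremental selection can be pictured with the following sketch. It is an illustration only: this page does not describe how the connector detects changes, so comparing a modification timestamp against the time of the last run is an assumption, and DriveFile and files_to_ingest are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DriveFile:
    """Hypothetical stand-in for a file listed in a folder or drive."""
    path: str
    modified_at: datetime


def files_to_ingest(files: Iterable[DriveFile],
                    last_run_at: Optional[datetime]) -> List[DriveFile]:
    """Return whole files added or updated since the last run.

    Assumption: a change is detected by comparing modification timestamps.
    On the first run (no previous run time), every file is selected.
    """
    if last_run_at is None:
        return list(files)
    return [f for f in files if f.modified_at > last_run_at]


# Example: only the file modified after the last run is selected.
last_run = datetime(2024, 1, 2, tzinfo=timezone.utc)
listing = [
    DriveFile("sales/a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    DriveFile("sales/b.csv", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
changed = files_to_ingest(listing, last_run)
```

Note that the selection unit is the file: nothing in the sketch looks inside b.csv to find which rows changed.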

Can I ingest a single file?

Not directly. The connector ingests all files in a folder or drive. However, you can approximate single-file selection: set url to the folder that contains the file, and use file_filters with a path_filter glob pattern that matches only that file's name. See Google Drive connector reference.
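To see why a glob pattern can narrow a folder down to one file, the following sketch applies a pattern to a list of file names using Python's standard fnmatch module. This is an illustration only: the connector's exact glob semantics are documented in the connector reference and might differ from fnmatch, and select_files and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_files(file_names: Iterable[str], path_filter: str) -> List[str]:
    """Hypothetical helper: keep only the names that match the glob pattern."""
    return [name for name in file_names if fnmatchcase(name, path_filter)]


folder_contents = ["report_2024.csv", "report_2023.csv", "notes.txt"]

# A pattern without wildcards matches exactly one name.
single = select_files(folder_contents, "report_2024.csv")

# A wildcard pattern widens the selection to every matching file.
several = select_files(folder_contents, "report_*.csv")
```

In this sketch, a file name that itself contains glob characters such as * or [ would need escaping before it could be used as a literal pattern.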

Is there a file size limit?

For unstructured (BINARYFILE) ingestion, large files might affect pipeline performance. Azure Databricks recommends running ingestion no more than once per hour and monitoring pipeline run times for signs of resource pressure.

How are built-in Google formats handled?

When you use the managed Google Drive connector, built-in Google formats (Google Docs, Google Sheets, Google Slides) are automatically exported to an open format during ingestion. To ingest them as binary, set format in file_ingestion_options to BINARYFILE. For Google Sheets, you can set format to EXCEL instead. For more details on Google format handling with the managed connector, see How built-in Google formats are handled.

What is the difference between the managed Google Drive connector and the standard Google Drive connector?

The managed Google Drive connector (gdrive_options in the pipeline API) is a fully managed ingestion pipeline that incrementally syncs files from Google Drive into Delta tables, with schema inference, schema evolution, file filtering, and orchestration via Workflows. It is configured through the Lakeflow Connect pipeline API.

The standard Google Drive connector uses Spark and SQL functions (read_files, spark.read, Auto Loader) to build custom pipelines. Use it when you need fine-grained control over how files are read and transformed, or when you want to use Spark reader APIs directly.