Google Analytics Raw Data connector limitations

Important

The Google Analytics Raw Data connector is in Public Preview.

This page lists limitations and considerations for ingesting raw, event-level data from Google Analytics using Databricks Lakeflow Connect and Google BigQuery.

General SaaS connector limitations

The limitations in this section apply to all SaaS connectors in Lakeflow Connect.

  • When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
  • When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This differs from the behavior of Lakeflow Declarative Pipelines.
  • During source maintenance periods, Databricks might not be able to access your data.
  • If a source table name conflicts with an existing destination table name, the pipeline update fails.
  • Multi-destination pipeline support is API-only (see the pipeline spec sketch after this list).
  • You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
  • Column-level selection and deselection are API-only.
  • If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table (see the full-refresh sketch after this list).
  • Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
  • The connector assumes that cursor columns in the source system are monotonically increasing.
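
For the API-only features above, you define the pipeline by posting a spec to the Pipelines REST API. The following is a minimal sketch, assuming the Lakeflow Connect ingestion_definition payload format; the workspace URL, token, connection name, and all catalog, schema, and table names are placeholders, and the exact field name for column selection (include_columns under table_configuration) is an assumption that may differ in your API version.

```python
# Minimal sketch: create an ingestion pipeline that writes two GA4 tables
# to different destinations, renames one, and selects columns for it.
# All names and credentials below are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

spec = {
    "name": "ga4-raw-data-ingestion",
    "ingestion_definition": {
        "connection_name": "ga4_raw_data_connection",  # assumed connection name
        "objects": [
            {
                # First table: default destination, no rename.
                "table": {
                    "source_schema": "analytics_123456789",
                    "source_table": "events",
                    "destination_catalog": "main",
                    "destination_schema": "ga4_raw",
                }
            },
            {
                # Second table: different destination schema (multi-destination),
                # renamed, and with column-level selection. Either the rename or
                # the column selection makes the pipeline API-only.
                "table": {
                    "source_schema": "analytics_123456789",
                    "source_table": "events_intraday",
                    "destination_catalog": "main",
                    "destination_schema": "ga4_staging",
                    "destination_table": "events_intraday_latest",
                    "table_configuration": {
                        # Assumed field name for column-level selection.
                        "include_columns": ["event_name", "event_timestamp"]
                    },
                }
            },
        ],
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=spec,
)
resp.raise_for_status()
print(resp.json())  # response includes the new pipeline_id
```

Because the two tables land in different destination schemas, this is a multi-destination pipeline; the rename and column selection additionally make it uneditable in the UI.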
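
To backfill a newly selected column, trigger a full refresh for just that table through the pipeline updates endpoint. A minimal sketch, assuming the full_refresh_selection parameter of the updates API; the pipeline ID and table name are placeholders.

```python
# Minimal sketch: start a pipeline update that fully refreshes one table
# (for example, after adding a column to its selection).
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder
PIPELINE_ID = "<pipeline-id>"  # placeholder

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # Only the listed tables are fully refreshed; everything else
    # continues to update incrementally.
    json={"full_refresh_selection": ["events"]},
)
resp.raise_for_status()
print(resp.json())  # response includes the update_id for the triggered run
```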

Connector-specific limitations

The limitations in this section are specific to the Google Analytics Raw Data (GA4) connector.

Authentication

  • The connector only supports authentication using a GCP service account (see the connection sketch below).
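
The service account credentials are stored in a Unity Catalog connection that the pipeline references. The following is a rough sketch using the Unity Catalog connections endpoint; the connection type string and the option key for the service account key are assumptions, so check the connector's setup documentation for the exact names.

```python
# Rough sketch: create a Unity Catalog connection backed by a GCP service
# account key. The connection_type value and the option key are assumptions.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

# Key file downloaded from the GCP console for the service account.
with open("service-account-key.json") as f:
    key_json = f.read()

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/connections",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "ga4_raw_data_connection",
        "connection_type": "GA4_RAW_DATA",  # assumed type name
        "options": {"googleServiceAccountKeyJson": key_json},  # assumed option key
    },
)
resp.raise_for_status()
```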

Pipelines

  • Updates and deletes in GA4 are not ingested.
  • The connector only supports one GA4 property per pipeline.
  • Ingestion from Universal Analytics (UA) is not supported.

Tables

  • The connector can't reliably ingest BigQuery date-partitioned tables that are larger than 50 GB.
  • The connector only ingests raw data that you export from GA4 to BigQuery, and it inherits GA4 limits on the amount of historical data that you can export to BigQuery.
  • The initial load fetches the data for all dates that are present in your GA4/BigQuery project.
  • Databricks can't guarantee retention of events_intraday data for a given day after that day's data is available in the events table, because the events_intraday table is only intended for interim use until the events table is ready.
  • The connector assumes that each row is unique. Databricks can't guarantee correct behavior if there are unexpected duplicates (see the duplicate check sketch after this list).
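
Because correctness depends on row uniqueness, it can be worth spot-checking the destination table for duplicates. A minimal sketch for a Databricks notebook (where spark is predefined); the catalog and schema (main.ga4_raw) are placeholders, and the grouping columns are standard fields from the GA4 BigQuery export schema that together approximate a row identity.

```python
# Sanity check: look for rows that share the same event identity fields.
# main.ga4_raw.events is a placeholder destination table name.
duplicates = spark.sql("""
    SELECT event_name, event_timestamp, user_pseudo_id, COUNT(*) AS copies
    FROM main.ga4_raw.events
    GROUP BY event_name, event_timestamp, user_pseudo_id
    HAVING COUNT(*) > 1
""")
duplicates.show()
```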