This page shows how to use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Azure Databricks.
What Structured Streaming functionality does Unity Catalog support?
Unity Catalog doesn't add any explicit limits for Structured Streaming sources and sinks available on Azure Databricks.
With Unity Catalog and Structured Streaming you can:
- Stream data from both managed and external tables. See Unity Catalog managed tables in Azure Databricks for Delta Lake and Apache Iceberg.
- Use external locations managed by Unity Catalog to interact with data using object storage URIs.
- Write to external tables using either table names or file paths. To interact with managed tables, you must use the table name.
For Structured Streaming checkpoints, you must use paths in external locations managed by Unity Catalog. To learn more about securely connecting storage with Unity Catalog, see Connect to cloud object storage using Unity Catalog.
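The following is a minimal sketch of the pattern described above: it reads a Unity Catalog table by name, writes to another table by name, and keeps the checkpoint under a path in an external location. The function name `start_copy_stream`, the three-level table names, and the `abfss://` path in the usage comment are hypothetical placeholders, not values from this article. The function takes the active `spark` session as an argument, as you would have it in a Databricks notebook.

```python
def start_copy_stream(spark, source_table, target_table, checkpoint_path):
    """Start an incremental copy from one Unity Catalog table to another.

    checkpoint_path is assumed to be a path inside an external location
    that Unity Catalog manages.
    """
    return (
        spark.readStream
        .table(source_table)                            # read the source by table name
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # checkpoint in an external location
        .trigger(availableNow=True)                     # process available data, then stop
        .toTable(target_table)                          # write to the target by table name
    )


# Hypothetical usage in a notebook where `spark` is already defined:
# query = start_copy_stream(
#     spark,
#     "main.demo.events",
#     "main.demo.events_copy",
#     "abfss://container@account.dfs.core.windows.net/checkpoints/events_copy",
# )
```

The `availableNow` trigger is only one possible choice here; any trigger other than continuous processing could be substituted.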
Read a Unity Catalog view as a stream
In Databricks Runtime 14.3 LTS and above, you can use Structured Streaming to read from views registered with Unity Catalog. The underlying tables must use the Delta Lake format. For other limitations, see Limitations.
To read a view with Structured Streaming, use the .table() method with the view's identifier:
df = (spark.readStream
.table("demoView")
)
Users must have SELECT privileges on the target view.
If you modify the view definition to add or change the tables referenced in the view, you can't use the same streaming checkpoint.
Supported streaming options
The streaming reader applies options to the files and metadata of the underlying Delta tables for the specified view.
The following options are supported:
- maxFilesPerTrigger
- maxBytesPerTrigger
- ignoreDeletes
- skipChangeCommits
- withEventTimeOrder
- startingTimestamp
- startingVersion
Reads on views with UNION ALL don't support the withEventTimeOrder and startingVersion options.
If you provide unsupported options, such as readChangeFeed, Spark raises this exception:
AnalysisException: [UNSUPPORTED_STREAMING_OPTIONS_FOR_VIEW.UNSUPPORTED_OPTION] Unsupported for streaming a view. Reason: option <option> is not supported.
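As an illustration, the sketch below checks a set of reader options against the supported list before it starts a streaming read on a view, so that an unsupported option fails early with a plain `ValueError` instead of at analysis time. The helper names `check_view_stream_options` and `read_view_stream` and the `view_uses_union_all` flag are hypothetical and not part of any Databricks API. The check compares option names case-insensitively, on the assumption that Spark treats option keys that way.

```python
# Options this article lists as supported for streaming reads on views.
SUPPORTED_VIEW_OPTIONS = (
    "maxFilesPerTrigger",
    "maxBytesPerTrigger",
    "ignoreDeletes",
    "skipChangeCommits",
    "withEventTimeOrder",
    "startingTimestamp",
    "startingVersion",
)

# Options this article lists as unsupported when the view uses UNION ALL.
UNION_ALL_UNSUPPORTED = ("withEventTimeOrder", "startingVersion")


def check_view_stream_options(options, view_uses_union_all=False):
    """Raise ValueError if an option is not supported for streaming a view."""
    supported = {name.lower() for name in SUPPORTED_VIEW_OPTIONS}
    blocked = (
        {name.lower() for name in UNION_ALL_UNSUPPORTED}
        if view_uses_union_all
        else set()
    )
    for name in options:
        key = name.lower()
        if key not in supported:
            raise ValueError(f"Option {name!r} is not supported for streaming a view")
        if key in blocked:
            raise ValueError(f"Option {name!r} is not supported on UNION ALL views")


def read_view_stream(spark, view_name, options=None, view_uses_union_all=False):
    """Validate the options, then open a streaming read on the view."""
    options = dict(options or {})
    check_view_stream_options(options, view_uses_union_all)
    return spark.readStream.options(**options).table(view_name)
```

For example, `read_view_stream(spark, "demoView", {"maxFilesPerTrigger": "100"})` would pass the check, while passing `readChangeFeed` would raise before Spark is called.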
Supported streaming operations
Supported operations include:
| Operation | Description | Operator | Example |
|---|---|---|---|
| Project | Controls column-level permissions | SELECT... FROM... | CREATE VIEW project_view AS SELECT id, value FROM source_table |
| Filter | Controls row-level permissions | WHERE... | CREATE VIEW filter_view AS SELECT * FROM source_table WHERE value > 100 |
| Union all | Results from multiple tables | UNION ALL | CREATE VIEW union_view AS SELECT id, value FROM source_table1 UNION ALL SELECT * FROM source_table2 |
Unsupported operations include aggregations, sorting, and table-valued functions such as table_changes(). For details on table-valued functions, see Table-valued function (TVF) invocation.
If you stream from a view with an unsupported operation, Spark raises this exception:
UnsupportedOperationException: [UNEXPECTED_OPERATOR_IN_STREAMING_VIEW] Unexpected operator <operator> in the CREATE VIEW statement as a streaming source. A streaming view query must consist only of SELECT, WHERE, and UNION ALL operations.
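To make the supported shape concrete, here is a minimal sketch that defines a view using only SELECT and WHERE and then opens a streaming read on it. The function name `stream_filtered_view` and its parameters are hypothetical. The view and table names are interpolated directly into the SQL text, so the sketch assumes they come from trusted code rather than user input.

```python
def stream_filtered_view(spark, view_name, source_table, min_value):
    """Create a projection-plus-filter view if it is missing, then stream from it."""
    # IF NOT EXISTS leaves an existing definition untouched, which matters
    # because changing the tables a view references invalidates the checkpoint.
    spark.sql(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, value FROM {source_table} WHERE value > {int(min_value)}"
    )
    # Only SELECT and WHERE appear in the definition, so the view matches
    # the supported operations for a streaming source.
    return spark.readStream.table(view_name)
```

A view that added an aggregation or an ORDER BY clause to the same statement would fall outside the supported operations.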
Limitations
- Apache Spark continuous processing mode is not supported. See Continuous Processing in the Spark Structured Streaming Programming Guide.
- For a list of Structured Streaming features that are not supported on Unity Catalog based on the compute access mode, see Streaming limitations and Streaming and materialized view requirements on dedicated compute.
- Views as a streaming source have additional limitations:
  - You can only stream from views that query Delta tables. Other data sources are not supported.
  - You must register views with Unity Catalog. See Create a view.
  - Streaming reads on views don't support all operations or options. See Supported streaming operations and Supported streaming options.