cloud_files_state table-valued function

Applies to: Databricks SQL and Databricks Runtime 11.3 LTS and above

Returns the file-level state of an Auto Loader or read_files stream.

Syntax
cloud_files_state( { TABLE ( table_name ) | checkpoint } )
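For orientation, here is a minimal sketch of the two invocation forms; the checkpoint path and table name are hypothetical placeholders, and complete examples appear in the Examples section below.

-- Minimal sketch of both forms; '/tmp/some/checkpoint' and my_streaming_table
-- are hypothetical placeholders (see the Examples section for full examples).
> SELECT * FROM CLOUD_FILES_STATE('/tmp/some/checkpoint');
> SELECT * FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));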
Arguments

table_name: The identifier of the streaming table being written to by read_files. The name must not include a temporal specification. Available in Databricks Runtime 13.3 LTS and above.

checkpoint: A STRING literal. The checkpoint directory for a stream using the Auto Loader source. See What is Auto Loader?.

Returns

A table with the following schema:
path STRING NOT NULL PRIMARY KEY
The path of a file.

size BIGINT NOT NULL
The size of a file in bytes.

create_time TIMESTAMP NOT NULL
The time that a file was created.

discovery_time TIMESTAMP NOT NULL
The time that a file was discovered.
Important: This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.

commit_time TIMESTAMP
The time that a file was committed to the checkpoint after processing. NULL if the file is not yet processed. A file might be processed but marked as committed arbitrarily later. Marking the file as committed means that Auto Loader does not require the file for processing again.
Important: This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.

archive_time TIMESTAMP
The time that a file was archived. NULL if the file has not been archived.
Important: This feature is in Private Preview. To try it, reach out to your Azure Databricks contact.

source_id STRING
The ID of the Auto Loader source in the streaming query. This value is '0' for streams that ingest from a single cloud object store location.

flow_name STRING
Applies to: Databricks SQL and Databricks Runtime 13.3 and above
The flow_name represents a specific streaming flow in DLT that contains one or more cloud_files sources. NULL if no table_name was given.
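As a worked illustration of how these columns fit together, the hedged sketch below counts files that have been discovered but not yet committed, grouped by source. The checkpoint path is a hypothetical placeholder, and the query assumes access to the Private Preview discovery_time and commit_time columns described above.

-- Hedged sketch: '/tmp/my/checkpoint' is a hypothetical placeholder; the
-- discovery_time and commit_time columns assume the Private Preview noted above.
> SELECT source_id,
         count(*)            AS pending_files,
         min(discovery_time) AS oldest_pending_discovery
  FROM CLOUD_FILES_STATE('/tmp/my/checkpoint')
  WHERE commit_time IS NULL      -- not yet committed to the checkpoint
  GROUP BY source_id;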
Permissions

You need to have:
- OWNER privileges on the streaming table if using a streaming table identifier.
- READ FILES privileges on the checkpoint location if providing a checkpoint under an external location.

Examples

-- Simple example from checkpoint
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');
/some/input/path
/other/input/path
-- Simple example from source subdir
> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');
/some/input/path
/other/input/path
-- Simple example from streaming table
> SELECT path FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));
/some/input/path
/other/input/path
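For context, the hedged end-to-end sketch below creates a streaming table that ingests with read_files and then inspects its file-level ingestion state. The table name, source path, and format are hypothetical placeholders, and the TABLE(...) form assumes Databricks Runtime 13.3 LTS or above.

-- Hedged end-to-end sketch: my_streaming_table, the source path, and the format
-- are hypothetical placeholders; assumes Databricks Runtime 13.3 LTS or above.
> CREATE OR REFRESH STREAMING TABLE my_streaming_table AS
  SELECT * FROM STREAM read_files('abfss://container@account.dfs.core.windows.net/raw/', format => 'json');

-- Inspect which files have already been committed for this table.
> SELECT path, discovery_time, commit_time
  FROM CLOUD_FILES_STATE(TABLE(my_streaming_table))
  WHERE commit_time IS NOT NULL;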
Training

Module: Use Azure Synapse serverless SQL pool to query files in a data lake
Documentation
Auto Loader options - Azure Databricks
Reference documentation for Auto Loader and cloudFiles options, parameters, and keywords.
Using Auto Loader with Unity Catalog - Azure Databricks
Use Auto Loader for incremental data ingestion from external locations or to tables managed by Unity Catalog.
Common data loading patterns - Azure Databricks
Learn common data loading patterns leveraging Auto Loader.