A table resides in a schema and contains rows of data. All tables created in Azure Databricks use Delta Lake by default. Tables backed by Delta Lake are also called Delta tables.
A Delta table stores data as a directory of files in cloud object storage and registers table metadata to the metastore within a catalog and schema. All Unity Catalog managed tables and streaming tables are Delta tables. Unity Catalog external tables can be Delta tables but are not required to be.
You can create tables on Databricks that don’t use Delta Lake. These tables don’t provide the transactional guarantees or optimized performance of Delta tables.
The following example shows a table prod.people_ops_employees that contains data about five employees. The metadata is registered in Unity Catalog and the data is stored in cloud storage.
To create a table, users must have the CREATE TABLE and USE SCHEMA permissions on the schema, and the USE CATALOG permission on its parent catalog. To query a table, users must have the SELECT permission on the table, the USE SCHEMA permission on its parent schema, and the USE CATALOG permission on its parent catalog.
For more on Unity Catalog permissions, see Manage privileges in Unity Catalog.
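The permission model above can be sketched in Databricks SQL GRANT statements. The catalog, schema, and group names here are illustrative, not taken from this page:

```sql
-- Allow a (hypothetical) group to create tables in a schema.
-- All three levels of privilege are required, per the rules above.
GRANT USE CATALOG ON CATALOG prod TO `data-engineers`;
GRANT USE SCHEMA ON SCHEMA prod.people_ops TO `data-engineers`;
GRANT CREATE TABLE ON SCHEMA prod.people_ops TO `data-engineers`;

-- Allow querying an existing table (SELECT on the table,
-- plus USE SCHEMA and USE CATALOG on its parents, granted above).
GRANT SELECT ON TABLE prod.people_ops.employees TO `data-engineers`;
```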
In Unity Catalog, tables sit at the third level of the three-level namespace (catalog.schema.table).
Backed by Delta Lake, a Delta table stores data as a directory of files in cloud object storage and registers table metadata to the metastore within a catalog and schema. Because Delta tables are the default on Databricks, most references to tables describe the behavior of Delta tables unless otherwise noted. All Unity Catalog managed tables and streaming tables are Delta tables. See What is Delta Lake?.
Databricks recommends that you always interact with Delta tables using fully-qualified table names rather than file paths.
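For example, a query using the full three-level name (the catalog, schema, and table names here are illustrative):

```sql
-- Fully-qualified reference: catalog.schema.table
SELECT * FROM prod.people_ops.employees;
```

Because the name resolves through Unity Catalog, the same statement works regardless of the session's current catalog or schema.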
Managed tables manage underlying data files alongside the metastore registration. Databricks recommends that you use managed tables whenever you create a new table. Unity Catalog managed tables are the default when you create tables in Azure Databricks. They always use Delta Lake. See Work with managed tables.
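A minimal managed-table definition might look like the following, assuming illustrative catalog, schema, and column names:

```sql
-- A managed table: Unity Catalog manages both the metadata
-- and the underlying data files. Delta Lake is the default
-- format, so no USING clause is needed.
CREATE TABLE prod.people_ops.employees (
  id BIGINT,
  name STRING,
  start_date DATE
);
```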
External tables, sometimes called unmanaged tables, reference data stored outside of Databricks in an external storage system, such as cloud object storage. They decouple the management of underlying data files from metastore registration. Unity Catalog supports external tables in several formats, including Delta Lake. Unity Catalog external tables can store data files using common formats readable by external systems. See Work with external tables.
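An external table differs from a managed table only in that its definition points at storage you manage. A sketch, with a placeholder storage path and illustrative names:

```sql
-- An external table: the LOCATION clause points at a cloud
-- storage path that you own and manage. Dropping the table
-- does not delete the files at that location.
CREATE TABLE prod.people_ops.employees_ext (
  id BIGINT,
  name STRING
)
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/path/to/employees';
```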
Streaming tables are Delta tables primarily used for processing incremental data. Most updates to streaming tables happen through refresh operations.
You can register streaming tables in Unity Catalog using Databricks SQL or define them as part of a DLT pipeline. See How streaming tables work, Load data using streaming tables in Databricks SQL, and What is DLT?.
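In Databricks SQL, a streaming table can be declared and refreshed in one statement. A sketch, assuming an illustrative name and a placeholder source path:

```sql
-- A streaming table that incrementally ingests new files from
-- cloud storage on each refresh. The source path is a placeholder.
CREATE OR REFRESH STREAMING TABLE prod.people_ops.employee_events
AS SELECT *
   FROM STREAM read_files('abfss://container@storageaccount.dfs.core.windows.net/events/');
```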
Foreign tables represent data stored in external systems connected to Azure Databricks through Lakehouse Federation. Foreign tables are read-only on Azure Databricks. See What is Lakehouse Federation?.
Any Delta table managed by Unity Catalog that has a primary key is a feature table. You can optionally configure feature tables using the online Feature Store for low-latency use cases. See Work with feature tables in Workspace Feature Store (legacy).
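Since a primary key is what qualifies a managed Delta table as a feature table, a minimal definition might look like this (all names illustrative):

```sql
-- A Unity Catalog managed Delta table with a primary key
-- constraint, which makes it usable as a feature table.
CREATE TABLE prod.ml.user_features (
  user_id BIGINT NOT NULL,
  avg_session_minutes DOUBLE,
  CONSTRAINT user_features_pk PRIMARY KEY (user_id)
);
```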
Hive tables describe two distinct concepts on Azure Databricks, both of which are legacy patterns and not recommended.
Tables registered using the legacy Hive metastore store data in the legacy DBFS root, by default. Databricks recommends migrating all tables from the legacy HMS to Unity Catalog. See Database objects in the legacy Hive metastore.
Apache Spark supports registering and querying Hive tables, but these codecs are not optimized for Azure Databricks. Databricks recommends registering Hive tables only to support queries against data written by external systems. See Hive table (legacy).
The term live tables refers to an earlier implementation of functionality now implemented as materialized views. Any legacy code that references live tables should be updated to use syntax for materialized views. See What is DLT? and Use materialized views in Databricks SQL.
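Legacy live-table definitions map onto materialized view syntax. A sketch of the replacement form, with illustrative names:

```sql
-- Materialized view syntax replacing legacy "live table" code.
-- The view is refreshed rather than recomputed from scratch.
CREATE MATERIALIZED VIEW prod.people_ops.employee_counts
AS SELECT department, COUNT(*) AS employee_count
   FROM prod.people_ops.employees
   GROUP BY department;
```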