User-facing catalog API, accessible through `SparkSession.catalog`. This is a thin wrapper around its Scala implementation `org.apache.spark.sql.catalog.Catalog`.
Syntax
```python
# Access through SparkSession
spark.catalog
```
Methods
| Method | Description |
|---|---|
| `currentCatalog()` | Returns the current default catalog in this session. |
| `setCurrentCatalog(catalogName)` | Sets the current default catalog in this session. |
| `listCatalogs(pattern)` | Returns a list of catalogs in this session. |
| `currentDatabase()` | Returns the current default database in this session. |
| `setCurrentDatabase(dbName)` | Sets the current default database in this session. |
| `listDatabases(pattern)` | Returns a list of databases available across all sessions. |
| `getDatabase(dbName)` | Gets the database with the specified name. Throws an `AnalysisException` when the database cannot be found. |
| `databaseExists(dbName)` | Checks if the database with the specified name exists. |
| `listTables(dbName, pattern)` | Returns a list of tables and views in the specified database. Includes all temporary views. |
| `getTable(tableName)` | Gets the table or view with the specified name. Throws an `AnalysisException` when no table can be found. |
| `tableExists(tableName, dbName)` | Checks if the table or view with the specified name exists. |
| `listColumns(tableName, dbName)` | Returns a list of columns for the given table or view in the specified database. |
| `listFunctions(dbName, pattern)` | Returns a list of functions registered in the specified database. Includes all temporary functions. |
| `functionExists(functionName, dbName)` | Checks if the function with the specified name exists. Includes temporary functions. |
| `getFunction(functionName)` | Gets the function with the specified name. Throws an `AnalysisException` when the function cannot be found. |
| `createTable(tableName, path, source, schema, description, **options)` | Creates a table based on the dataset in a data source and returns the associated DataFrame. |
| `dropTempView(viewName)` | Drops the local temporary view with the given name. Also uncaches the view if it was cached. |
| `dropGlobalTempView(viewName)` | Drops the global temporary view with the given name. Also uncaches the view if it was cached. |
| `isCached(tableName)` | Returns true if the table is currently cached in-memory. |
| `cacheTable(tableName, storageLevel)` | Caches the specified table in-memory or with the given storage level. Defaults to `MEMORY_AND_DISK`. |
| `uncacheTable(tableName)` | Removes the specified table from the in-memory cache. |
| `clearCache()` | Removes all cached tables from the in-memory cache. |
| `refreshTable(tableName)` | Invalidates and refreshes all cached data and metadata of the given table. |
| `recoverPartitions(tableName)` | Recovers all the partitions of the given table and updates the catalog. Only works with partitioned tables. |
| `refreshByPath(path)` | Invalidates and refreshes all cached data and metadata for any DataFrame containing the given data source path. |
Examples
```python
>>> spark.catalog.currentDatabase()
'default'
>>> spark.catalog.listDatabases()
[Database(name='default', catalog='spark_catalog', description='default database', ...)]
>>> _ = spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
>>> spark.catalog.tableExists("tbl1")
True
>>> spark.catalog.cacheTable("tbl1")
>>> spark.catalog.isCached("tbl1")
True
>>> spark.catalog.uncacheTable("tbl1")
>>> spark.catalog.isCached("tbl1")
False
```