Applies to:
Databricks SQL
Databricks Runtime
Loads data from a file location into a Delta table. This is a retryable and idempotent operation: files in the source location that have already been loaded are skipped, even if they have been modified since they were loaded.
For examples, see Common data loading patterns using COPY INTO.
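A minimal invocation, using hypothetical table and path names, might look like this:

```sql
-- Load new CSV files from cloud storage into an existing Delta table.
-- Table name and path are placeholders; re-running the command skips
-- files that were already loaded.
COPY INTO my_catalog.my_schema.sales
FROM 's3://my-bucket/raw/sales/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```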
Syntax
COPY INTO target_table [ BY POSITION | ( col_name [ , <col_name> ... ] ) ]
FROM { source_clause |
( SELECT expression_list FROM source_clause ) }
FILEFORMAT = data_source
[ VALIDATE [ ALL | num_rows ROWS ] ]
[ FILES = ( file_name [, ...] ) | PATTERN = glob_pattern ]
[ FORMAT_OPTIONS ( { data_source_reader_option = value } [, ...] ) ]
[ COPY_OPTIONS ( { copy_option = value } [, ...] ) ]
source_clause
source [ WITH ( [ CREDENTIAL { credential_name |
(temporary_credential_options) } ]
[ ENCRYPTION (encryption_options) ] ) ]
Parameters
target_table
Identifies an existing Delta table. The target_table must not include a temporal specification or options specification.

If the table name is provided in the form of a location, such as delta.`/path/to/table`, Unity Catalog can govern access to the locations that are being written to. You can write to an external location by:

- Defining the location as an external location and having WRITE FILES permissions on that external location.
- Having WRITE FILES permissions on a named storage credential that provides authorization to write to a location using: COPY INTO delta.`/some/location` WITH (CREDENTIAL <named-credential>)

See Connect to cloud object storage using Unity Catalog for more details.
BY POSITION | ( col_name [ , <col_name> ... ] )
Matches source columns to target table columns by ordinal position. Type casting of the matched columns is done automatically.

This parameter is only supported for the headerless CSV file format. You must specify FILEFORMAT = CSV. FORMAT_OPTIONS must also be set to ("headers" = "false") (FORMAT_OPTIONS ("headers" = "false") is the default).

Syntax option 1: BY POSITION

- Matches source columns to target table columns by ordinal position automatically.
- The default name matching is not used for matching.
- IDENTITY columns and GENERATED columns of the target table are ignored when matching the source columns.
- If the number of source columns does not equal the number of filtered target table columns, COPY INTO raises an error.

Syntax option 2: ( col_name [ , <col_name> ... ] )

- Matches source columns to the specified target table columns by relative ordinal position, using a comma-separated target table column name list in parentheses.
- The original table column order and column names are not used for matching.
- IDENTITY columns and GENERATED columns cannot be specified in the column name list; otherwise COPY INTO raises an error.
- The specified columns cannot be duplicated.
- If the number of source columns does not equal the number of specified table columns, COPY INTO raises an error.
- For columns not specified in the column name list, COPY INTO assigns default values where defined and NULL otherwise. If any unspecified column is not nullable, COPY INTO raises an error.
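As a sketch, assuming a headerless CSV source and a hypothetical target table with columns (id, name, amount), the two syntax options might be used like this:

```sql
-- Option 1: match all source columns to target columns by position.
COPY INTO my_schema.events BY POSITION
FROM 's3://my-bucket/events/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('headers' = 'false');

-- Option 2: the first source column maps to id, the second to amount;
-- the unlisted name column receives its default value or NULL.
COPY INTO my_schema.events (id, amount)
FROM 's3://my-bucket/events/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('headers' = 'false');
```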
source
The file location to load the data from. Files in this location must have the format specified in FILEFORMAT. The location is provided in the form of a URI.

Access to the source location can be provided through:

- credential_name: optional name of the credential used to access or write to the storage location. You use this credential only if the file location is not included in an external location. See credential_name.
- Inline temporary credentials.
- Defining the source location as an external location and having READ FILES permissions on the external location through Unity Catalog.
- Using a named storage credential with READ FILES permissions that provide authorization to read from a location through Unity Catalog.

You don't need to provide inline or named credentials if the path is already defined as an external location that you have permissions to use. See Overview of external locations for more details.

Note
If the source file path is a root path, add a slash (/) at the end of the file path, for example, s3://my-bucket/.

Accepted credential options are:

- AZURE_SAS_TOKEN for ADLS and Azure Blob Storage
- AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_SESSION_TOKEN for AWS S3

Accepted encryption options are:

- TYPE = 'AWS_SSE_C', and MASTER_KEY for AWS S3

See Load data using COPY INTO with temporary credentials.
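A sketch of loading with inline temporary credentials and encryption options, using a placeholder bucket name and redacted key values:

```sql
-- Read from S3 with temporary session credentials; the ENCRYPTION clause
-- is only needed for sources encrypted with a customer-provided key.
COPY INTO my_schema.raw_events
FROM 's3://my-bucket/source-data/' WITH (
  CREDENTIAL (
    AWS_ACCESS_KEY = '...',
    AWS_SECRET_KEY = '...',
    AWS_SESSION_TOKEN = '...'
  )
  ENCRYPTION (TYPE = 'AWS_SSE_C', MASTER_KEY = '...')
)
FILEFORMAT = JSON;
```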
SELECT expression_list
Selects the specified columns or expressions from the source data before copying into the Delta table. The expressions can be anything you use with SELECT statements, including window operations. You can use aggregation expressions only for global aggregates; you cannot GROUP BY on columns with this syntax.

FILEFORMAT = data_source
The format of the source files to load. One of CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.

VALIDATE
Applies to: Databricks SQL, Databricks Runtime 10.4 LTS and above

The data that is to be loaded into a table is validated but not written to the table. These validations include:
- Whether the data can be parsed.
- Whether the schema matches that of the table or if the schema needs to be evolved.
- Whether all nullability and check constraints are met.
The default is to validate all of the data that is to be loaded. You can provide a number of rows to be validated with the ROWS keyword, such as VALIDATE 15 ROWS. The COPY INTO statement returns a preview of the data of 50 rows or less when a number less than 50 is used with the ROWS keyword.

FILES
A list of file names to load, with a limit of 1000 files. Cannot be specified with PATTERN.

PATTERN
A glob pattern that identifies the files to load from the source directory. Cannot be specified with FILES.

Pattern        Description
?              Matches any single character.
*              Matches zero or more characters.
[abc]          Matches a single character from character set {a,b,c}.
[a-z]          Matches a single character from the character range {a…z}.
[^a]           Matches a single character that is not from character set or range {a}. Note that the ^ character must occur immediately to the right of the opening bracket.
{ab,cd}        Matches a string from the string set {ab, cd}.
{ab,c{de,fh}}  Matches a string from the string set {ab, cde, cfh}.

FORMAT_OPTIONS
Options to be passed to the Apache Spark data source reader for the specified format. See Format options for each file format.
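For instance, a hypothetical date-partitioned layout could be filtered with PATTERN while passing reader options through FORMAT_OPTIONS:

```sql
-- Load only JSON files under 2024 subdirectories; multiLine is a standard
-- Spark JSON reader option.
COPY INTO my_schema.clicks
FROM 's3://my-bucket/clicks/'
FILEFORMAT = JSON
PATTERN = '2024/*/*.json'
FORMAT_OPTIONS ('multiLine' = 'true');
```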
COPY_OPTIONS
Options to control the operation of the COPY INTO command.

- force: boolean, default false. If set to true, idempotency is disabled and files are loaded regardless of whether they've been loaded before.
- mergeSchema: boolean, default false. If set to true, the schema can be evolved according to the incoming data.
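Pulling several of the parameters above together, a sketch with hypothetical names: a SELECT expression list transforms the data during the load, while COPY_OPTIONS forces a re-load and permits schema evolution:

```sql
-- force = re-load files even if previously ingested;
-- mergeSchema = evolve the table schema to match incoming data.
-- _c0, _c1 are Spark's default column names for headerless CSV.
COPY INTO my_schema.users
FROM (
  SELECT _c0::bigint AS user_id,
         _c1 AS name,
         current_timestamp() AS ingested_at
  FROM 's3://my-bucket/raw/users/'
)
FILEFORMAT = CSV
COPY_OPTIONS ('force' = 'true', 'mergeSchema' = 'true');
```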
Invoke COPY INTO concurrently
COPY INTO supports concurrent invocations against the same table. As long as COPY INTO is invoked concurrently on distinct sets of input files, each invocation should eventually succeed; otherwise you get a transaction conflict. COPY INTO should not be invoked concurrently to improve performance: a single COPY INTO command with multiple files typically performs better than running concurrent COPY INTO commands with a single file each. COPY INTO can be called concurrently when:
- Multiple data producers don't have an easy way to coordinate and cannot make a single invocation.
- A very large directory can be ingested sub-directory by sub-directory. When ingesting directories with a very large number of files, Databricks recommends using Auto Loader when possible.
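For example, two jobs could ingest disjoint sub-directories of a large directory concurrently (paths are hypothetical); because the input file sets do not overlap, the invocations do not conflict:

```sql
-- Run each statement from a separate job or session.
COPY INTO my_schema.logs FROM 's3://my-bucket/logs/region=us/' FILEFORMAT = PARQUET;
COPY INTO my_schema.logs FROM 's3://my-bucket/logs/region=eu/' FILEFORMAT = PARQUET;
```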
Access file metadata
To learn how to access metadata for file-based data sources, see File metadata column.
Format options
For options specific to each file format (JSON, CSV, XML, Parquet, Avro, text, ORC, and binary), see DataFrameReader options.