Ingest from storage

Article
12/26/2023

The .ingest into command ingests data into a table by "pulling" the data from one or more cloud storage files. For example, the command can retrieve 1000 CSV-formatted blobs from Azure Blob Storage, parse them, and ingest them together into a single target table. Data is appended to the table without affecting existing records, and without modifying the table's schema.

Note

This ingestion method is intended for exploration and prototyping. Don't use it in production or high-volume scenarios.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.ingest [async] into table TableName SourceDataLocator [with ( IngestionPropertyName = IngestionPropertyValue [, ...] )]

Learn more about syntax conventions.

Parameters

Name	Type	Required	Description
`async`	`string`		If specified, the command returns immediately and continues ingestion in the background. The results of the command include an `OperationId` value that can then be used with the `.show operation` command to retrieve the ingestion completion status and results.
TableName	`string`	✔️	The name of the table into which to ingest data. The table name is always relative to the database in context. If no schema mapping object is provided, the schema of the database in context is used.
SourceDataLocator	`string`	✔️	A single or comma-separated list of storage connection strings. A single connection string must refer to a single file hosted by a storage account. Ingestion of multiple files can be done by specifying multiple connection strings, or by ingesting from a query of an external table.

Note

We recommend using obfuscated string literals for the SourceDataLocator. The service will scrub credentials in internal traces and error messages.
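
As a minimal sketch of the `async` flow described in the parameters table, the commands below start an ingestion in the background and then look up its status. The blob URL is the placeholder used elsewhere in this article, the `...` stands for an elided SAS token, and the all-zero GUID is a made-up placeholder for the `OperationId` that the first command returns. The status lookup is assumed to take the `.show operations` form followed by the ID.

```kusto
// Sketch only: placeholder URL, elided SAS token ("...").

.ingest async into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...'
)
with (format = 'csv')

// Placeholder GUID: substitute the OperationId returned by the command above.

.show operations 00000000-0000-0000-0000-000000000000
```

The source string carries the `h` prefix so that the SAS token is treated as an obfuscated string literal.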

Ingestion properties

Important

In queued ingestion, data is batched using ingestion properties. The more distinct ingestion mapping properties are used, such as different ConstValue values, the more fragmented the ingestion becomes, which can degrade performance.

The following table lists the properties supported by Azure Data Explorer, describes them, and provides examples:

Property	Description	Example
`ingestionMapping`	A string value that indicates how to map data from the source file to the actual columns in the table. Define the `format` value with the relevant mapping type. See data mappings.	`with (format="json", ingestionMapping = "[{\"column\":\"rownumber\", \"Properties\":{\"Path\":\"$.RowNumber\"}}, {\"column\":\"rowguid\", \"Properties\":{\"Path\":\"$.RowGuid\"}}]")` (deprecated: `avroMapping`, `csvMapping`, `jsonMapping`)
`ingestionMappingReference`	A string value that indicates how to map data from the source file to the actual columns in the table using a named mapping policy object. Define the `format` value with the relevant mapping type. See data mappings.	`with (format="csv", ingestionMappingReference = "Mapping1")` (deprecated: `avroMappingReference`, `csvMappingReference`, `jsonMappingReference`)
`creationTime`	The datetime value (formatted as an ISO8601 string) to use as the creation time of the ingested data extents. If unspecified, the current value (`now()`) is used. Overriding the default is useful when ingesting older data, so that the retention policy is applied correctly. When specified, make sure the `Lookback` property in the target table's effective Extents merge policy is aligned with the specified value.	`with (creationTime="2017-02-13")`
`extend_schema`	A Boolean value that, if specified, instructs the command to extend the schema of the table (defaults to `false`). This option applies only to `.append` and `.set-or-append` commands. The only allowed schema extensions add columns at the end of the table.	If the original table schema is `(a:string, b:int)`, a valid schema extension would be `(a:string, b:int, c:datetime, d:string)`, but `(a:string, c:datetime)` wouldn't be valid.
`folder`	For ingest-from-query commands, the folder to assign to the table. If the table already exists, this property will override the table's folder.	`with (folder="Tables/Temporary")`
`format`	The data format (see supported data formats).	`with (format="csv")`
`ingestIfNotExists`	A string value that, if specified, prevents ingestion from succeeding if the table already has data tagged with an `ingest-by:` tag with the same value. This ensures idempotent data ingestion. For more information, see ingest-by: tags.	The properties `with (ingestIfNotExists='["Part0001"]', tags='["ingest-by:Part0001"]')` indicate that if data with the tag `ingest-by:Part0001` already exists, then don't complete the current ingestion. If it doesn't already exist, this new ingestion should have this tag set (in case a future ingestion attempts to ingest the same data again).
`ignoreFirstRecord`	A Boolean value that, if set to `true`, indicates that ingestion should ignore the first record of every file. This property is useful for files in `CSV` and similar formats, if the first record in the file contains the column names. By default, `false` is assumed.	`with (ignoreFirstRecord=false)`
`policy_ingestiontime`	A Boolean value that, if specified, describes whether to enable the Ingestion Time Policy on a table that is created by this command. The default is `true`.	`with (policy_ingestiontime=false)`
`recreate_schema`	A Boolean value that, if specified, describes whether the command may recreate the schema of the table. This property applies only to the `.set-or-replace` command. This property takes precedence over the `extend_schema` property if both are set.	`with (recreate_schema=true)`
`tags`	A list of tags to associate with the ingested data, formatted as a JSON string.	`with (tags="['Tag1', 'Tag2']")`
`validationPolicy`	A JSON string that indicates which validations to run during ingestion of data represented using CSV format. See Data ingestion for an explanation of the different options.	`with (validationPolicy='{"ValidationOptions":1, "ValidationImplications":1}')` (this is actually the default policy)
`zipPattern`	Use this property when ingesting data from storage that has a ZIP archive. This is a string value indicating the regular expression to use when selecting which files in the ZIP archive to ingest. All other files in the archive will be ignored.	`with (zipPattern="*.csv")`
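
The properties in the table above can be combined in a single `with (...)` clause. The following is a sketch, not a recommended configuration: it ingests one CSV blob that has a header row, backdates the resulting extents, and makes the ingestion idempotent by pairing `ingestIfNotExists` with a matching `ingest-by:` tag. The blob name `part0001.csv` is hypothetical, the `...` stands for an elided SAS token, and the tag value `Part0001` and the date are taken from the table's own examples.

```kusto
// Sketch only: hypothetical blob name, elided SAS token ("...").

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/part0001.csv?...'
)
with (
    format = 'csv',
    ignoreFirstRecord = true,
    creationTime = '2017-02-13',
    ingestIfNotExists = '["Part0001"]',
    tags = '["ingest-by:Part0001"]'
)
```

Running the same command a second time should not add the data again, because the table already holds data tagged `ingest-by:Part0001`.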

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

Authentication method	Azure Blob Storage / Data Lake Storage Gen2	Data Lake Storage Gen1
Impersonation	Storage Blob Data Reader	Reader
Shared Access (SAS) token	List + Read	This authentication method isn't supported in Gen1.
Microsoft Entra access token
Storage account access key		This authentication method isn't supported in Gen1.
Managed identity	Storage Blob Data Reader	Reader

Returns

The result of the command is a table with as many records as there are data shards ("extents") generated by the command. If no data shards have been generated, a single record is returned with an empty (zero-valued) extent ID.

Name	Type	Description
ExtentId	`guid`	The unique identifier for the data shard that was generated by the command.
ItemLoaded	`string`	One or more storage files that are related to this record.
Duration	`timespan`	How long it took to perform ingestion.
HasErrors	`bool`	Whether this record represents an ingestion failure or not.
OperationId	`guid`	A unique ID representing the operation. Can be used with the `.show operation` command.

Note

This command doesn't modify the schema of the table being ingested into. If necessary, the data is "coerced" into this schema during ingestion, not the other way around (extra columns are ignored, and missing columns are treated as null values).
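
One way to cross-check the `ExtentId` values returned by the command is to list the extents of the target table. The sketch below assumes the `.show table ... extents` command and the column names shown in the `project` clause; T is the table used in this article's examples. Extents can later be merged by the service, so an ID returned at ingestion time may no longer appear in the listing after a while.

```kusto
// Sketch only: the projected column names are assumptions about the extents listing.

.show table T extents
| project ExtentId, RowCount, MinCreatedOn, MaxCreatedOn, Tags
```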

Examples

Azure Blob Storage with shared access signature

The following example instructs your cluster to read two blobs from Azure Blob Storage as CSV files, and ingest their contents into table T. The ... represents an Azure Storage shared access signature (SAS) which gives read access to each blob. Note also the use of obfuscated strings (the h in front of the string values) to ensure that the SAS is never recorded.

.ingest into table T (
    h'https://contoso.blob.core.windows.net/container/file1.csv?...',
    h'https://contoso.blob.core.windows.net/container/file2.csv?...'
)

Azure Blob Storage with managed identity

The following example shows how to read a CSV file from Azure Blob Storage and ingest its contents into table T using managed identity authentication. For more information on the managed identity authentication method, see Managed Identity Authentication Overview.

.ingest into table T ('https://StorageAccount.blob.core.windows.net/Container/file.csv;managed_identity=802bada6-4d21-44b2-9d15-e66b29e4d63e')

Azure Data Lake Storage Gen 2

The following example ingests data from Azure Data Lake Storage Gen 2 (ADLSv2). The credentials used here (...) are the storage account credentials (shared key). Because they are secret, supply real credentials in an obfuscated string literal (the h prefix), as in the shared access signature example.

.ingest into table T (
  'abfss://myfilesystem@contoso.dfs.core.windows.net/path/to/file1.csv;...'
)

Azure Data Lake Storage

The following example ingests a single file from Azure Data Lake Storage (ADLS). It uses the user's credentials to access ADLS (so there's no need to treat the storage URI as containing a secret). It also shows how to specify ingestion properties.

.ingest into table T ('adl://contoso.azuredatalakestore.net/Path/To/File/file1.ext;impersonate')
  with (format='csv')

Amazon S3 with an access key

The following example ingests a single file from Amazon S3 using an access key ID and a secret access key.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/path/to/file.csv;AwsCredentials=AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY')
  with (format='csv')

Amazon S3 with a presigned URL

The following example ingests a single file from Amazon S3 using a presigned URL.

.ingest into table T ('https://bucketname.s3.us-east-1.amazonaws.com/file.csv?<<pre signed string>>')
  with (format='csv')
