你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn。
infer_storage_schema plugin
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer
This plugin infers the schema of external data, and returns it as CSL schema string. The string can be used when creating external tables. The plugin is invoked with the evaluate
operator.
Authentication and authorization
In the properties of the request, you specify storage connection strings to access. Each storage connection string specifies the authorization method to use for access to the storage. Depending on the authorization method, the principal may need to be granted permissions on the external storage to perform the schema inference.
The following table lists the supported authentication methods and any required permissions by storage type.
Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
---|---|---|
Impersonation | Storage Blob Data Reader | Reader |
Shared Access (SAS) token | List + Read | This authentication method isn't supported in Gen1. |
Microsoft Entra access token | ||
Storage account access key | This authentication method isn't supported in Gen1. |
Syntax
evaluate
infer_storage_schema(
Options )
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
Options | dynamic |
✔️ | A property bag specifying the properties of the request. |
Supported properties of the request
Name | Type | Required | Description |
---|---|---|---|
StorageContainers | dynamic |
✔️ | An array of storage connection strings that represent prefix URI for stored data artifacts. |
DataFormat | string |
✔️ | One of the supported data formats. |
FileExtension | string |
If specified, the function only scans files ending with this file extension. Specifying the extension may speed up the process or eliminate data reading issues. | |
FileNamePrefix | string |
If specified, the function only scans files starting with this prefix. Specifying the prefix may speed up the process. | |
Mode | string |
The schema inference strategy. A value of: any , last , all . The function infers the data schema from the first found file, from the last written file, or from all files respectively. The default value is last . |
|
InferenceOptions | dynamic |
More inference options. Valid options: UseFirstRowAsHeader for delimited file formats. For example, 'InferenceOptions': {'UseFirstRowAsHeader': true} . |
Returns
The infer_storage_schema
plugin returns a single result table containing a single row/column containing CSL schema string.
Note
- Storage container URI secret keys must have the permissions for List in addition to Read.
- Schema inference strategy 'all' is a very "expensive" operation, as it implies reading from all artifacts found and merging their schema.
- Some returned types may not be the actual ones as a result of wrong type guess (or, as a result of schema merge process). This is why you should review the result carefully before creating an external table.
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.windows.net/MobileEvents;secretKey'
],
'FileExtension': '.parquet',
'FileNamePrefix': 'part-',
'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)
Output
CslSchema |
---|
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real |
Use the returned schema in external table definition:
.create external table MobileEvents(
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
)
kind=blob
partition by (dt:datetime = bin(event_time, 1d), app:string = app_id)
pathformat = ('app=' app '/dt=' datetime_pattern('yyyyMMdd', dt))
dataformat = parquet
(
h@'https://storageaccount.blob.core.windows.net/MovileEvents;secretKey'
)