你当前正在访问 Microsoft Azure Global Edition 技术文档网站。如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站，请访问 https://docs.azure.cn。

Parquet mapping

项目
08/12/2024

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer

Use Parquet mapping to map incoming data to columns inside tables when your ingestion source file is in Parquet format.

Each element in the mapping list defines the mapping for a specific column. These elements are constructed from three properties: column, datatype, and properties. Learn more in the data mappings overview.

Each Parquet mapping element must contain either of the following optional properties:

Property	Type	Description
Field	`string`	Name of the field in the Parquet record.
Path	`string`	If the value starts with `$` it's interpreted as the path to the field in the Parquet document that will become the content of the column in the table. The path that denotes the entire Parquet record is `$`. If the value doesn't start with `$` it's interpreted as a constant value. Paths that include special characters should be escaped as ['Property Name']. For more information, see JSONPath syntax.
ConstValue	`string`	The constant value to be used for a column instead of some value inside the Parquet file.
Transform	`string`	Transformation that should be applied on the content with mapping transformations.

Note

Field and Path are mutually exclusive.

The following alternatives are equivalent:

[
  {"Column": "event_name", "Properties": {"Path": "$.EventName"}}
]

[
  {"Column": "event_name", "Properties": {"Field": "EventName"}}
]

Important

For queued ingestion:

If the table referenced in the mapping doesn't exist in the database, it gets created automatically, given that valid data types are specified for all columns.
If a column referenced in the mapping doesn't exist in the table, it gets added automatically to the table as the last column upon the first time data is ingested for that column, given a valid data type is specified for the column. To add new columns to a mapping, use the .alter ingestion mapping command.
Data is batched using Ingestion properties. The more distinct ingestion mapping properties used, such as different ConstValue values, the more fragmented the ingestion becomes, which can lead to performance degradation.

Parquet type conversions

Comprehensive support is provided for converting data types when you're ingesting or querying data from a Parquet source.

The following table provides a mapping of Parquet field types, and the table column types they can be converted to. The first column lists the Parquet type, and the others show the table column types they can be converted to.

Note

For Parquest DECIMAL types, the physical type is specified in parentheses, as follows:

I32: INT32 (32-bit integer)
I64: INT64 (64-bit integer)
FLBA: Fixed Length Byte Array
BA: Byte Array

Parquet type	bool	int	long	real	decimal	datetime	timespan	string	guid	dynamic
INT8	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
INT16	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
INT32	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
INT64	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
UINT8	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
UINT16	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
UINT32	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
UINT64	✔️	✔️	✔️	❌	✔️	❌	❌	✔️	❌	❌
FLOAT32	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
FLOAT64	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
BOOLEAN	✔️	❌	❌	❌	✔️	❌	❌	✔️	❌	❌
DECIMAL (I32)	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
DECIMAL (I64)	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	❌
DECIMAL (FLBA)	❌	❌	✔️	✔️	✔️	❌	❌	✔️	❌	❌
DECIMAL (BA)	✔️	❌	✔️	✔️	✔️	❌	❌	✔️	❌	✔️
TIMESTAMP	❌	❌	❌	❌	❌	✔️	❌	✔️	❌	❌
DATE	❌	❌	❌	❌	❌	✔️	❌	✔️	❌	❌
STRING	❌	✔️	✔️	✔️	✔️	❌	❌	✔️	❌	✔️
UUID	❌	❌	❌	❌	❌	❌	❌	✔️	✔️	❌
JSON	❌	❌	❌	❌	❌	❌	❌	✔️	❌	✔️
LIST	❌	❌	❌	❌	❌	❌	❌	❌	❌	✔️
MAP	❌	❌	❌	❌	❌	❌	❌	❌	❌	✔️
STRUCT	❌	❌	❌	❌	❌	❌	❌	❌	❌	✔️

Examples

[
  {"Column": "event_timestamp", "Properties": {"Path": "$.Timestamp"}},
  {"Column": "event_name",      "Properties": {"Path": "$.Event.Name"}},
  {"Column": "event_type",      "Properties": {"Path": "$.Event.Type"}},
  {"Column": "event_time",      "Properties": {"Path": "$.Timestamp", "Transform": "DateTimeFromUnixMilliseconds"}},
  {"Column": "ingestion_time",  "Properties": {"ConstValue": "2021-01-01T10:32:00"}},
  {"Column": "full_record",     "Properties": {"Path": "$"}}
]

The mapping above is serialized as a JSON string when it's provided as part of the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format = "parquet",
    ingestionMapping =
    ```
    [
      {"Column": "column_a", "Properties": {"Path": "$.Field1.Subfield"}},
      {"Column": "column_b", "Properties": {"Path": "$.[\'Field name with space\']"}},
    ]
    ```
  )

Pre-created mapping

When the mapping is pre-created, reference the mapping by name in the .ingest management command.

.ingest into Table123 (@"source1", @"source2")
  with
  (
      format="parquet",
      ingestionMappingReference = "Mapping_Name"
  )

Identity mapping

Use Parquet mapping during ingestion without defining a mapping schema (see identity mapping).

.ingest into Table123 (@"source1", @"source2")
  with
  (
    format="parquet"
  )

通过

Parquet mapping

Parquet type conversions

Examples

Pre-created mapping

Identity mapping

反馈

其他资源