from_avro function

Applies to: check marked yes Databricks Runtime 16.0 and above

Returns a struct value with the avroBin and jsonSchemaStr.

Syntax

from_avro(avroBin, jsonSchemaStr, options )

Arguments

  • avroBin: A BINARY expression specifying a row of Avro data.
  • avroSchemaSpec: The target schema in JSON format. It must match the schema encoded in avroBin as specified in to_avro().
  • options: A MAP<STRING,STRING> literal specifying directives.

Returns

A STRUCT with field names and types based on the result of schema_of_json(jsonStr).

avroBin must be well-formed with respect to the avroSchemaSpec and options or Databricks raises an exception.

Options

Option Values Description
mode FAILFAST, PERMISSIVE Error handling mode. Default: FAILFAST. In PERMISSIVE mode, corrupt records are set to NULL instead of raising an error.
compression uncompressed, snappy, deflate, bzip2, xz, zstandard Compression codec for encoding Avro data.
avroSchemaEvolutionMode none, restart Schema evolution mode. Default: none. When set to restart, the query throws an UnknownFieldException when the schema changes. Restart the job to use the new schema. See Use schema evolution mode with from_avro.
recursiveFieldMaxDepth Range: -1 to 15 Maximum recursion depth along a single recursive path. Default: -1, which does not limit recursion depth.
When a shared type is reachable from many distinct schema paths, schema expansion might cause the driver to run out of memory because this option bounds depth on one path only. To workaround:

Examples

> SELECT from_avro(to_avro(5), '{ "type" : "int" }', NULL:MAP<STRING, STRING>);
  5

> SELECT from_avro(to_avro(5, '{ "type" : "int" }'), '{ "type" : "int" }', NULL:MAP<STRING, STRING>);
  5

> SELECT from_avro(to_avro(named_struct('num', 5, 'txt', 'hello')), '{ "type" : "record", "name": "my_record", "fields": [{ "name": "num", "type": "int"}, { "name": "txt", "type": "string"}]}', NULL:MAP<STRING, STRING>);
  {"num":5,"txt":"hello"}

> SELECT from_avro(to_avro(named_struct('num', 5, 'txt', 'hello')),
                   '{ "type" : "record", "name": "my_record", "fields": [{ "name": "num", "type": "int"}, { "name": "txt", "type": "double"}]}',
                   map('mode', 'failfast'));
  Error: Avro data is not valid for the specified schema.

> SELECT from_avro(to_avro(named_struct('num', 5, 'txt', 'hello')),
                   '{ "type" : "record", "name": "my_record", "fields": [{ "name": "num", "type": "int"}, { "name": "txt", "type": "double"}]}',
                   map('mode', 'permissive'));
  {"num":null,"txt":null}