Azure Schema Registry in Azure Event Hubs
In many event streaming and messaging scenarios, the event or message payload contains structured data. Schema-driven formats such as Apache Avro are often used to serialize or deserialize such structured data.
An event producer uses a schema to serialize the event payload and publish it to an event broker such as Event Hubs. Event consumers read the event payload from the broker and deserialize it using the same schema. Both producers and consumers can therefore validate the integrity of the data against the same schema.
What is Azure Schema Registry?
Azure Schema Registry is a feature of Event Hubs that provides a central repository for schemas for event-driven and messaging-centric applications. It gives your producer and consumer applications the flexibility to exchange data without having to manage and share the schema themselves. It also provides a simple governance framework for reusable schemas and defines the relationship between schemas through a grouping construct (schema groups).
With schema-driven serialization frameworks like Apache Avro, moving serialization metadata into shared schemas can also help reduce the per-message overhead. That's because each message doesn't need to carry the metadata (type information and field names), as is the case with tagged formats such as JSON.
Having schemas stored alongside the events and inside the eventing infrastructure ensures that the metadata required for serialization or deserialization is always within reach and that schemas can't be misplaced.
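The size difference can be illustrated with a minimal sketch that uses only the Python standard library. It doesn't use Avro or the Schema Registry SDK: the `struct` layout stands in for a shared schema, and the event, field names, and layout are hypothetical values chosen for illustration.

```python
import json
import struct

# Hypothetical event used only for illustration.
event = {"device_id": 42, "temperature": 21.5, "humidity": 0.63}

# Tagged format: every message repeats the field names.
json_bytes = json.dumps(event).encode("utf-8")

# Schema-driven format (a simplified stand-in for Avro): the field order and
# types live in a shared schema, so only the values travel on the wire.
SCHEMA_FIELDS = ("device_id", "temperature", "humidity")
SCHEMA_LAYOUT = "<idd"  # little-endian: int32, float64, float64


def encode(record):
    """Pack the record's values in schema order, without field names."""
    return struct.pack(SCHEMA_LAYOUT, *(record[name] for name in SCHEMA_FIELDS))


def decode(payload):
    """Rebuild the record by pairing unpacked values with schema field names."""
    return dict(zip(SCHEMA_FIELDS, struct.unpack(SCHEMA_LAYOUT, payload)))


binary_bytes = encode(event)
print(len(json_bytes), len(binary_bytes))
```

In this sketch the binary payload is smaller than the JSON payload because the field names never leave the shared schema. Real Avro encoding differs in detail (for example, it uses variable-length integers).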
The feature isn't available in the basic tier.
Schema Registry information flow
The information flow when you use schema registry is the same for all protocols that you use to publish or consume events from Azure Event Hubs.
The following diagram shows how the information flows when event producers and consumers use Schema Registry with the Kafka protocol.
- The Kafka producer application uses KafkaAvroSerializer to serialize event data using the specified schema. The producer application provides details of the schema registry endpoint and other optional parameters that are required for schema validation.
- The serializer looks for the schema in the schema registry to serialize event data. If it finds the schema, the corresponding schema ID is returned. You can configure the producer application to automatically register the schema with the schema registry if it doesn't exist.
- The serializer then prepends the schema ID to the serialized data that is published to Event Hubs.
- The Kafka consumer application uses KafkaAvroDeserializer to deserialize data that it receives from the event hub.
- The deserializer uses the schema ID (prepended by the producer) to retrieve the schema from the schema registry.
- The deserializer uses the schema to deserialize event data that it receives from the event hub.
- The schema registry client uses caching to prevent redundant schema registry lookups in the future.
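The flow above can be sketched with an in-memory model. This is an illustration under stated assumptions, not the Azure SDK or the Kafka serializers: `InMemorySchemaRegistry`, `Serializer`, and `Deserializer` are hypothetical classes, the 4-byte schema ID prefix is an assumed framing rather than the actual wire format, and JSON stands in for Avro encoding of the message body.

```python
import json
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry service."""

    def __init__(self):
        self._ids_by_schema = {}
        self._schemas_by_id = {}
        self.lookups = 0  # counts get_schema calls, to show the cache working

    def register(self, schema):
        """Return the ID of the schema, registering it first if it is new."""
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids_by_schema:
            schema_id = len(self._ids_by_schema) + 1
            self._ids_by_schema[key] = schema_id
            self._schemas_by_id[schema_id] = schema
        return self._ids_by_schema[key]

    def get_schema(self, schema_id):
        self.lookups += 1
        return self._schemas_by_id[schema_id]


class Serializer:
    def __init__(self, registry):
        self._registry = registry

    def serialize(self, schema, record):
        schema_id = self._registry.register(schema)  # auto-register if missing
        values = [record[name] for name in schema["fields"]]
        body = json.dumps(values).encode("utf-8")  # values only, no field names
        return struct.pack(">I", schema_id) + body  # prepend the schema ID


class Deserializer:
    def __init__(self, registry):
        self._registry = registry
        self._cache = {}  # schema ID -> schema, avoids repeated lookups

    def deserialize(self, payload):
        (schema_id,) = struct.unpack(">I", payload[:4])
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.get_schema(schema_id)
        schema = self._cache[schema_id]
        values = json.loads(payload[4:].decode("utf-8"))
        return dict(zip(schema["fields"], values))


# Usage: three events share one schema, so the registry is queried only once.
registry = InMemorySchemaRegistry()
schema = {"name": "Order", "fields": ["id", "amount"]}
serializer = Serializer(registry)
deserializer = Deserializer(registry)
payloads = [serializer.serialize(schema, {"id": i, "amount": i * 10}) for i in range(3)]
records = [deserializer.deserialize(p) for p in payloads]
print(records, registry.lookups)
```

The consumer side never sees the schema in the message itself. It only sees the ID, and the cache keeps repeated messages with the same ID from triggering further registry calls.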
Schema Registry elements
An Event Hubs namespace can now host schema groups alongside event hubs (or Kafka topics). It hosts a schema registry and can have multiple schema groups. Each of these schema groups is a separately securable repository for a set of schemas. Groups can be aligned with a particular application or an organizational unit. Although it's hosted in Azure Event Hubs, the schema registry can be used universally with all Azure messaging services and any other message or event broker.
A schema group is a logical group of similar schemas based on your business criteria. A schema group can hold multiple versions of a schema. The compatibility enforcement setting on a schema group can help ensure that newer schema versions are backward compatible.
The security boundary imposed by the grouping mechanism helps ensure that trade secrets don't inadvertently leak through metadata in situations where the namespace is shared among multiple partners. It also allows application owners to manage schemas independently of other applications that share the same namespace.
Schemas define the contract between producers and consumers. A schema defined in an Event Hubs schema registry helps manage the contract outside of event data, thus removing the payload overhead. A schema has a name, type (for example, record or array), compatibility mode (none, forward, backward, full), and serialization type (only Avro for now). You can create multiple versions of a schema, and retrieve and use a specific version of a schema.
Schemas need to evolve with the business requirements of producers and consumers. Azure Schema Registry supports schema evolution by introducing compatibility modes at the schema group level. When you create a schema group, you can specify the compatibility mode of the schemas that you include in that schema group. When you update a schema, the change must comply with the assigned compatibility mode; only then is a new version of the schema created.
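As an illustration of what such a schema can look like, here's a minimal sketch of an Avro record schema loaded with the Python standard library. The namespace, name, and fields are hypothetical and only show the shape of a schema definition.

```python
import json

# Hypothetical Avro record schema; the namespace, name, and fields are made up.
ORDER_SCHEMA_JSON = """
{
  "type": "record",
  "namespace": "com.example",
  "name": "Order",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "note", "type": ["null", "string"], "default": null}
  ]
}
"""

schema = json.loads(ORDER_SCHEMA_JSON)

# Fields that declare a default can be treated as optional by readers.
optional_fields = [field["name"] for field in schema["fields"] if "default" in field]
print(schema["name"], schema["type"], optional_fields)
```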
Azure Schema Registry for Event Hubs supports the following compatibility modes.
Backward compatibility mode allows consumer code that uses a new version of the schema to process messages written with an old version of the schema. When you use backward compatibility mode in a schema group, the following changes can be made to a schema.
- Delete fields.
- Add optional fields.
Forward compatibility mode allows consumer code that uses an old version of the schema to read messages written with a new version of the schema. Forward compatibility mode allows the following changes to be made to a schema.
- Add fields.
- Delete optional fields.
When None compatibility mode is used, the schema registry doesn't do any compatibility checks when you update schemas.
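The rules listed above can be expressed as a small check. This is a simplified sketch, not the registry's actual validation logic: schemas are modeled as hypothetical dictionaries that map a field name to an `optional` flag, type changes are ignored, and only the Backward, Forward, and None modes described above are covered.

```python
def is_compatible(old, new, mode):
    """Check a schema update against the simplified rules for a mode.

    `old` and `new` map field names to {"optional": bool}.
    """
    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]

    if mode == "None":
        return True  # no compatibility checks at all
    if mode == "Backward":
        # Deleting fields is allowed; added fields must be optional.
        return all(new[name]["optional"] for name in added)
    if mode == "Forward":
        # Adding fields is allowed; deleted fields must have been optional.
        return all(old[name]["optional"] for name in removed)
    raise ValueError(f"unsupported compatibility mode: {mode}")


# Usage with hypothetical schema versions.
v1 = {"id": {"optional": False}, "note": {"optional": True}}
v2_adds_required = {**v1, "price": {"optional": False}}
v2_drops_required = {"note": {"optional": True}}

print(is_compatible(v1, v2_adds_required, "Backward"))   # added field is required
print(is_compatible(v1, v2_adds_required, "Forward"))    # adding fields is allowed
print(is_compatible(v1, v2_drops_required, "Backward"))  # deleting fields is allowed
print(is_compatible(v1, v2_drops_required, "Forward"))   # deleted field was required
```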
You can use one of the following libraries to include an Avro serializer, which you can use to serialize and deserialize payloads containing Schema Registry schema identifiers and Avro-encoded data.
- .NET - Microsoft.Azure.Data.SchemaRegistry.ApacheAvro
- Java - azure-data-schemaregistry-avro
- Python - azure-schemaregistry-avroserializer
- Apache Kafka - Run Kafka-integrated Apache Avro serializers and deserializers backed by Azure Schema Registry. The Java client's Apache Kafka client serializer for the Azure Schema Registry can be used in any Apache Kafka scenario and with any Apache Kafka® based deployment or cloud service.
- Azure CLI - For an example of adding a schema to a schema group using CLI, see Adding a schema to a schema group using CLI.
- PowerShell - For an example of adding a schema to a schema group using PowerShell, see Adding a schema to a schema group using PowerShell.
For Event Hubs limits (for example, the number of schema groups in a namespace), see Event Hubs quotas and limits.
Azure role-based access control
When accessing the schema registry programmatically, you need to register an application in Azure Active Directory (Azure AD) and add the security principal of the application to one of the Azure role-based access control (Azure RBAC) roles:
|Role|Description|
|---|---|
|Owner|Read, write, and delete Schema Registry groups and schemas.|
|Contributor|Read, write, and delete Schema Registry groups and schemas.|
|Schema Registry Reader|Read and list Schema Registry groups and schemas.|
|Schema Registry Contributor|Read, write, and delete Schema Registry groups and schemas.|
For instructions on registering an application using the Azure portal, see Register an app with Azure AD. Note down the client ID (application ID), tenant ID, and the secret to use in the code.
- To learn how to create a schema registry using the Azure portal, see Create an Event Hubs schema registry using the Azure portal.
- See the following Schema Registry Avro client library samples.