Structured Body Format

Introduction

This document describes the Structured Body format used by the Azure Storage Blob, File, and DFS APIs to support efficient checksum calculation over request content. It is a custom binary format that encodes data (e.g., blob or file content) with trailing checksums on a per-segment basis. Note that the request body itself is what is encoded into this format.

This documentation is primarily targeted at customers using the Azure Storage REST APIs directly. Customers using a supported Azure Storage SDK will automatically have their requests encoded in this format.

Currently, Azure Storage supports only v1 of this format, which supports only crc64 checksums. Use of the Structured Body format is optional.

Specification

Sections

An encoded message has three sections.

| Section | Description |
|---|---|
| Header | The header contains the schema version (1), message length, options, and the number of segments. |
| Segment(s) | Each message has one or more segments, and each segment contains the segment #, data, and optional trailing metadata. For v1, the only supported trailer is a crc64 checksum. |
| Trailer | Each message has optional trailing metadata. For v1, the only supported trailer is a crc64 checksum. |

Binary Format, v1

The Structured Body v1 binary format is defined as:

Header:
   uint8      message-version
   uint64     message-length
   uint16     message-flags
   uint16     num-segments

Segment(s):
   uint16     segment-num
   uint64     segment-data-length (dl)
   byte[dl]   segment-data
   byte[8]    [optional] segment-data-crc64

Trailer:
   byte[8]    [optional] message-data-crc64

All integer data types are encoded as little-endian.
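
As a rough illustration of the layout above, the 13-byte header can be packed and unpacked with Python's `struct` module. The names `HEADER`, `pack_header`, and `unpack_header` are made up for this sketch and are not part of any Azure SDK.

```python
import struct

# "<" selects little-endian with no padding: uint8, uint64, uint16, uint16.
HEADER = struct.Struct("<BQHH")  # hypothetical name; 13 bytes in total


def pack_header(message_length, flags, num_segments):
    """Pack a v1 header; message-version is always 1."""
    return HEADER.pack(1, message_length, flags, num_segments)


def unpack_header(buf):
    """Return (message-version, message-length, message-flags, num-segments)."""
    return HEADER.unpack_from(buf, 0)


header = pack_header(message_length=39, flags=0x0001, num_segments=1)
print(header.hex(" "))  # 01 27 00 00 00 00 00 00 00 01 00 01 00
```

The printed bytes are the header of the empty-message example in the Examples section.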

Field Reference

| Field | Type | Description |
|---|---|---|
| message-version | uint8 | Schema version of the message. This must be 1. |
| message-length | uint64 | Length of the full message. In an HTTP message, this must match the Content-Length header. |
| message-flags | uint16 | Flags (options) enabled for this message. Version 1 only supports a single flag for crc64 checksums. See Flags. |
| num-segments | uint16 | Number of segments contained in the message. This must be at least 1. See Segments. |
| segment-num | uint16 | The current segment #. The first segment is 1 and must increment for each subsequent segment. |
| segment-data-length (dl) | uint64 | Length of the segment's blob/file data, in bytes. |
| segment-data | byte[dl] | Blob/file data bytes. |
| segment-data-crc64[^1] | byte[8] | Computed crc64 checksum for the segment's data. |
| message-data-crc64[^1] | byte[8] | Computed crc64 checksum for the message data (all segments' data). |

[^1]: crc64 checksums are present when the include-crc64 option is specified. See Flags.
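
Because every field has a fixed size except segment-data, the message-length value follows directly from the segment data lengths. The sketch below works through that arithmetic; the function and constant names are illustrative only.

```python
HEADER_SIZE = 1 + 8 + 2 + 2   # version + length + flags + num-segments = 13
SEGMENT_OVERHEAD = 2 + 8      # segment-num + segment-data-length = 10
CRC64_SIZE = 8                # each optional crc64 field


def message_length(segment_lengths, include_crc64):
    """Total encoded size, in bytes, for the given segment data lengths."""
    crc = CRC64_SIZE if include_crc64 else 0
    body = sum(SEGMENT_OVERHEAD + dl + crc for dl in segment_lengths)
    return HEADER_SIZE + body + crc  # the trailer adds one more crc64


print(message_length([0], True))     # 39
print(message_length([0], False))    # 23
print(message_length([1, 1], True))  # 59
```

These three results match the message-length values used in the Examples section.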

Flags

The message-flags field is used to specify options for the encoded message. Version 1 only supports a single option, include-crc64, but the remaining bits are reserved for future options such as other checksum algorithms and other metadata.

| Value | Name | Description |
|---|---|---|
| 0x0001 | include-crc64 | Include crc64 checksums in segments and message trailer. |
| 0x0002-0x8000 | | Reserved for future versions. |
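
A reader of the format might interpret message-flags along these lines. This is only a sketch: rejecting messages that set reserved bits is an assumption of this example, since this document does not say how the service treats them, and the names are hypothetical.

```python
INCLUDE_CRC64 = 0x0001
RESERVED_MASK = 0xFFFF & ~INCLUDE_CRC64  # 0xFFFE: every bit except include-crc64


def parse_flags(flags):
    """Return True when include-crc64 is set.

    Assumption: a v1 reader rejects reserved bits rather than ignoring them.
    """
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("message-flags must fit in a uint16")
    if flags & RESERVED_MASK:
        raise ValueError(f"reserved flag bits set: {flags & RESERVED_MASK:#06x}")
    return bool(flags & INCLUDE_CRC64)


print(parse_flags(0x0001))  # True
print(parse_flags(0x0000))  # False
```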

Segments

Encoded messages are split into one or more segments. Each segment contains its segment #, segment data, and a checksum[^1]. This design allows for incremental integrity verification for large requests, and is useful for resuming partial downloads.

Note

Segments are numbered starting with 1. The maximum number of segments is 65535.
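
To illustrate incremental verification, here is a minimal decoder sketch that checks each segment as it is read and hands the data to the caller only after its checksum matches. All function names are hypothetical. The CRC parameters follow the Polynomial section, with the checksum bytes read as a little-endian integer, which is consistent with the byte order shown in the Examples section.

```python
import io
import struct

POLY = 0x9A6C9329AC4BC9B5  # bit-reflected form of 0xad93d23594c93659
MASK = 0xFFFFFFFFFFFFFFFF


def crc64(data, crc=0):
    """Bitwise CRC64; pass a previous result as `crc` to continue a running checksum."""
    crc ^= MASK
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ MASK


def read_exact(stream, n):
    buf = stream.read(n)
    if len(buf) != n:
        raise ValueError("truncated message")
    return buf


def iter_segments(stream):
    """Yield each segment's data after verifying its checksum, if present."""
    version, _length, flags, count = struct.unpack("<BQHH", read_exact(stream, 13))
    if version != 1 or count < 1:
        raise ValueError("unsupported header")
    has_crc = bool(flags & 0x0001)
    running = 0  # checksum over all segment data seen so far
    for expected in range(1, count + 1):
        num, dl = struct.unpack("<HQ", read_exact(stream, 10))
        if num != expected:
            raise ValueError(f"expected segment {expected}, got {num}")
        data = read_exact(stream, dl)
        if has_crc:
            (stored,) = struct.unpack("<Q", read_exact(stream, 8))
            if stored != crc64(data):
                raise ValueError(f"crc64 mismatch in segment {num}")
            running = crc64(data, running)
        yield data
    if has_crc:
        (stored,) = struct.unpack("<Q", read_exact(stream, 8))
        if stored != running:
            raise ValueError("crc64 mismatch in message trailer")


# A one-segment, zero-length message without the include-crc64 flag (23 bytes).
message = bytes.fromhex(
    "01 17 00 00 00 00 00 00 00 00 00 01 00 "  # header
    "01 00 00 00 00 00 00 00 00 00"            # segment 1, no data
)
print(list(iter_segments(io.BytesIO(message))))  # [b'']
```

For brevity the sketch hashes each segment twice (once on its own and once for the running message checksum); a production reader would likely avoid that.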

Empty Segments

Segments can have an empty segment-data field. See the Empty Encoded Message example, which has a single, empty segment. Empty segments must have a segment-data-length of 0 and, if include-crc64 is enabled, must include a valid checksum.

Segment Size

On a GetBlob or ReadFile request with x-ms-structured-body set appropriately in the HTTP request, the service will chunk the blob or file data into 4MiB segments in the encoded response. If the message would exceed the maximum number of segments, the segment size will be increased.

For uploaded blob or file data from a client, the service will accept segments of any size or varying sizes. The recommendation is to use 4MiB or larger segment sizes. The SDKs use 4MiB segment sizes by default.
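
A client choosing its own segment size has to stay within the 65535-segment limit. The sketch below shows one way to do that. It is not the service's algorithm, which this document does not describe beyond saying the size is increased; the function names are hypothetical.

```python
MAX_SEGMENTS = 65535
DEFAULT_SEGMENT_SIZE = 4 * 1024 * 1024  # 4MiB, the recommended minimum


def choose_segment_size(data_length, preferred=DEFAULT_SEGMENT_SIZE):
    """Smallest size >= preferred that keeps the segment count within uint16.

    Assumption: grow the size just enough to fit; the service may round differently.
    """
    minimum = -(-data_length // MAX_SEGMENTS)  # ceiling division
    return max(preferred, minimum)


def segment_count(data_length, segment_size):
    """Number of segments needed; empty content still needs one empty segment."""
    return max(1, -(-data_length // segment_size))


one_tib = 2 ** 40
size = choose_segment_size(one_tib)
print(size, segment_count(one_tib, size))
```

With the default 4MiB size, the limit is reached at 65535 × 4MiB (just under 256GiB) of content, so anything larger needs bigger segments.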

CRC64 content validation

CRC64 content validation is a feature in the Azure Storage REST API that enables checksum validation for supported APIs. There are many variants of the CRC64 algorithm; Azure Storage calculates CRC64 checksums using CRC64-NVME (a.k.a. CRC64-Rocksoft). The feature uses this variant's polynomial (see Polynomial) to validate the integrity of transferred content. There are two forms in which this checksum can be used:

  • Structured body: The CRC64 checksums are embedded into the body of the API request, which allows checksums to be validated as data is streamed.
  • Transactional CRC64 checksums (supported only in uploads): For each individual API request, the client computes the CRC64 checksum and sets its value in the x-ms-content-crc64 header. The Storage service validates that the checksum of the bytes received matches the checksum provided in the header.

Polynomial

This CRC64 variant is bit-reflected (based on the non-bit-reflected polynomial 0xad93d23594c93659) and inverts the CRC on input and output; that is, the initial value and the final XOR value are both all ones.
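
The description above can be turned into a small table-driven implementation. This is a sketch for illustration, not the SDK's code; `reflect64`, `TABLE`, and `crc64_nvme` are names chosen here. Production code would normally use an optimized library.

```python
POLY_NORMAL = 0xAD93D23594C93659  # non-bit-reflected polynomial from the text
MASK = 0xFFFFFFFFFFFFFFFF


def reflect64(value):
    """Reverse the order of the 64 bits in value."""
    return int(f"{value:064b}"[::-1], 2)


POLY_REFLECTED = reflect64(POLY_NORMAL)  # 0x9a6c9329ac4bc9b5


def _make_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
        table.append(crc)
    return table


TABLE = _make_table()


def crc64_nvme(data, crc=0):
    """CRC64 of data; pass a previous result as `crc` to extend it."""
    crc ^= MASK  # invert on input (an initial call starts from all ones)
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ MASK  # invert on output


print(hex(crc64_nvme(b"")))                                # 0x0
print(crc64_nvme(b"\x11").to_bytes(8, "little").hex(" "))  # d0 61 67 57 b4 5f 54 d2
```

The second line printed is the segment-data-crc64 of segment 1 in the two-segment example, which also shows that checksum bytes are stored in little-endian order.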

Examples

Example - Empty Encoded Message

This example shows a message encoded with the structured body format without data. Note that the message must contain an empty segment.

// header: 13 bytes
0x01, // message-version: 1
0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // message-length: 39
0x01, 0x00, // message-flags: 1 (include-crc64)
0x01, 0x00, // num-segments: 1

// segment 1: 18 bytes
0x01, 0x00, // segment-num: 1
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // segment-data-length: 0
// segment-data: empty
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // segment-data-crc64: 0

// trailer: 8 bytes
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 // message-data-crc64: 0

Example - Empty Encoded Message without crc64

This example shows an encoded message without data and without the include-crc64 option enabled.

// header: 13 bytes
0x01, // message-version: 1
0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // message-length: 23
0x00, 0x00, // message-flags: 0 (none)
0x01, 0x00, // num-segments: 1

// segment 1: 10 bytes
0x01, 0x00, // segment-num: 1
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // segment-data-length: 0
// segment-data: empty

// trailer: empty

Example - Encoded Message with Two Segments and crc64 Checksum

// header: 13 bytes
0x01, // message-version: 1
0x3b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // message-length: 59
0x01, 0x00, // message-flags: 1 (include-crc64)
0x02, 0x00, // num-segments: 2

// segment 1: 19 bytes
0x01, 0x00, // segment-num: 1
0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // segment-data-length: 1
0x11, // segment-data
0xd0, 0x61, 0x67, 0x57, 0xb4, 0x5f, 0x54, 0xd2, // segment-data-crc64

// segment 2: 19 bytes
0x02, 0x00, // segment-num: 2
0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // segment-data-length: 1
0x22, // segment-data
0xd8, 0x4a, 0xfb, 0x9e, 0xa0, 0x4f, 0xc6, 0xda, // segment-data-crc64

// trailer: 8 bytes
0xe2, 0xa6, 0x37, 0x74, 0x50, 0xad, 0xc2, 0xef // message-data-crc64
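
The examples above can be reproduced with a short encoder. The following is a minimal sketch under the assumptions already stated in this document (v1 layout, little-endian integers, CRC64-NVME). The `encode` function is a hypothetical helper, not an SDK API, and it holds the whole message in memory instead of streaming it.

```python
import struct

POLY = 0x9A6C9329AC4BC9B5  # bit-reflected form of 0xad93d23594c93659
MASK = 0xFFFFFFFFFFFFFFFF


def crc64(data, crc=0):
    """Bitwise CRC64; pass a previous result as `crc` to continue it."""
    crc ^= MASK
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ MASK


def encode(segments, include_crc64=True):
    """Encode a list of bytes objects, one per segment, as a v1 message."""
    if not 1 <= len(segments) <= 65535:
        raise ValueError("a message needs between 1 and 65535 segments")
    crc_size = 8 if include_crc64 else 0
    length = 13 + sum(10 + len(s) + crc_size for s in segments) + crc_size
    flags = 0x0001 if include_crc64 else 0x0000
    out = bytearray(struct.pack("<BQHH", 1, length, flags, len(segments)))
    running = 0  # checksum over all segment data so far
    for num, data in enumerate(segments, start=1):
        out += struct.pack("<HQ", num, len(data)) + data
        if include_crc64:
            out += struct.pack("<Q", crc64(data))
            running = crc64(data, running)
    if include_crc64:
        out += struct.pack("<Q", running)
    return bytes(out)


message = encode([b"\x11", b"\x22"])
print(len(message))           # 59
print(message[-8:].hex(" "))  # e2 a6 37 74 50 ad c2 ef
```

Calling `encode([b""])` and `encode([b""], include_crc64=False)` yields the 39-byte and 23-byte empty messages shown earlier.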