Query Blob Contents

The Query Blob Contents operation applies a simple Structured Query Language (SQL) statement on a blob's contents and returns only the queried subset of the data. You can also call Query Blob Contents to query the contents of a version or snapshot.

Request

You can construct the Query Blob Contents request as follows. We recommend HTTPS. Replace myaccount with the name of your storage account.

POST method request URI HTTP version
https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query

https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query&snapshot=<DateTime>

https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query&versionid=<DateTime>
HTTP/1.0

HTTP/1.1

URI parameters

You can specify the following additional parameters on the request URI:

Parameter Description
snapshot Optional. The snapshot parameter is an opaque DateTime value. When it's present, it specifies the blob snapshot to query. For more information on working with blob snapshots, see Create a snapshot of a blob.
versionid Optional, version 2019-12-12 and later. The versionid parameter is an opaque DateTime value. When it's present, it specifies the version of the blob to retrieve.
timeout Optional. The timeout parameter is expressed in seconds. For more information, see Set timeouts for Blob Storage operations.

Request headers

The following table describes required and optional request headers:

Request header Description
Authorization Required. Specifies the authentication scheme, account name, and signature. For more information, see Authorize requests to Azure Storage.
Date or x-ms-date Required. Specifies the Coordinated Universal Time (UTC) for the request. For more information, see Authorize requests to Azure Storage.
x-ms-version Required for all authenticated requests, optional for anonymous requests. Specifies the version of the operation to use for this request. For more information, see Versioning for the Azure Storage services.
Content-Type Required. The value of this header should be application/xml; charset=UTF-8.
x-ms-lease-id:<ID> Optional. If this header is specified, the operation will be performed only if both of the following conditions are met:

- The blob's lease is currently active.
- The lease ID specified in the request matches that of the blob.

If this header is specified and both of these conditions are not met, the request will fail and the Query Blob Contents operation will fail with status code 412 (Precondition Failed).
Origin Optional. Specifies the origin from which the request is issued. The presence of this header results in Cross-Origin Resource Sharing (CORS) headers on the response.
x-ms-client-request-id Optional. Provides a client-generated, opaque value with a 1-kibibyte (KiB) character limit that's recorded in the logs when logging is configured. We highly recommend that you use this header to correlate client-side activities with requests that the server receives. For more information, see Azure logging: Use logs to track storage requests.

This operation also supports the use of conditional headers to query the blob contents only if a specified condition is met. For more information, see Specify conditional headers for Blob Storage operations.

Request body

The request body for this version of Query Blob Contents uses the following XML format:

<?xml version="1.0" encoding="utf-8"?>  
<QueryRequest>
  <QueryType>String</QueryType>
  <Expression>String</Expression>
  <InputSerialization>
    <Format>
      <Type>String</Type>
          <DelimitedTextConfiguration>
            <ColumnSeparator>String</ColumnSeparator>
            <FieldQuote>String</FieldQuote>
            <RecordSeparator>String</RecordSeparator>
            <EscapeChar>String</EscapeChar>
            <HasHeaders>Boolean</HasHeaders>
          </DelimitedTextConfiguration>
          <JsonTextConfiguration>
            <RecordSeparator>String</RecordSeparator>
          </JsonTextConfiguration>
    </Format>
  </InputSerialization>
  <OutputSerialization>
    <Format>
      <Type>String</Type>
      <DelimitedTextConfiguration>
        <ColumnSeparator>String</ColumnSeparator >
        <FieldQuote>String</FieldQuote >
        <RecordSeparator>String</RecordSeparator>
        <EscapeChar>String</EscapeChar>
        <HasHeaders>Boolean</HasHeaders>
      </DelimitedTextConfiguration>
      <JsonTextConfiguration>
        <RecordSeparator>String</RecordSeparator>
      </JsonTextConfiguration>
      <ParquetTextConfiguration></ParquetTextConfiguration>
      <ArrowConfiguration>
        <Schema>
            <Field>
                <Type>String</Type>
                <Name>String</Name>
            </Field>
            <Field>
                <Type>String</Type>
            </Field>
                .
                .
                .
            <Field>
                <Type>String</Type>
                <Precision>Integer</Precision>
                <Scale>Integer</Scale>
            </Field>
        </Schema>
      </ArrowConfiguration>
    </Format>
  </OutputSerialization>
</QueryRequest>

The following table describes the elements of the request body:

Element name Description
QueryRequest Required. Groups the set of query request settings.
QueryType Required. Indicates the type of the provided query expression. The only valid value for the current version is SQL.
Expression Required. Indicates the query expression in SQL. The maximum size of the query expression is 256 KiB. For more information about the expression syntax, see Query acceleration: SQL language reference.
InputSerialization Optional. Groups the settings regarding the input serialization of the blob contents. If it's not specified, the delimited text configuration is used.
Format Required if InputSerialization is specified. Groups the settings regarding the format of the blob data.
Type Required if InputSerialization is specified. Indicates the format type. Valid values are delimited, csv, and json.
DelimitedTextConfiguration Optional. Groups the settings that are used for interpreting the blob data if the blob is formatted with delimited text.
ColumnSeparator Optional. Indicates the string that's used to separate columns.
FieldQuote Optional. Indicates the string that's used to quote a specific field.
RecordSeparator Optional. Indicates the string that's used to separate records.
EscapeChar Optional. Indicates the string that's used as an escape character.
HasHeaders Optional. Specifies a Boolean that represents whether the data has headers.
JsonTextConfiguration Optional. Groups the settings that are used for interpreting the blob data if the blob is JSON formatted.
RecordSeparator Optional. Indicates the string that's used to separate records.
OutputSerialization Optional. Indicates the serialization format of the filtered contents of the blob returned in the response. If it's not specified, the delimited text configuration is used.
Format Required if OutputSerialization is specified. Groups the settings regarding the format of the returned response.
Type Required if OutputSerialization is specified. Indicates the format type. Valid values are delimited, csv, json, arrow, and parquet.
DelimitedTextConfiguration Optional. Groups the settings that are used for formatting the response if the response should be formatted with delimited text.
ColumnSeparator Optional. Indicates the string that's used to separate columns.
FieldQuote Optional. Indicates the string that's used to quote a specific field.
RecordSeparator Optional. Indicates the string that's used to separate records.
EscapeChar Optional. Indicates the string that's used as an escape character.
HasHeaders Optional. Specifies a Boolean that represents whether the data has headers.
JsonTextConfiguration Optional. Groups the settings that are used for formatting the response if the response should be JSON formatted.
ParquetTextConfiguration Optional. Groups the settings that are used for interpreting the blob data if the blob data is in Parquet format.
RecordSeparator Optional. Indicates the string that's used to separate records.
ArrowConfiguration Optional. Groups the settings that are used for formatting the response if the response should be Arrow formatted.
Schema Required if ArrowConfiguration is specified. Groups the settings regarding the schema of the returned Arrow response.
Field Optional. Groups settings regarding a specific field.
Type Required if Field is specified. Indicates the field type. Valid values are Int, Float, Decimal, and Bool.
Precision Optional. Indicates the precision of the field.
Scale Optional. Indicates the scale of the field.

Response

The response includes an HTTP status code, a set of response headers, and the response body. The response body is in Avro binary format. Because the response content length is unknown, the response is streamed with chunked encoding.

Status code

If the query request is well formed and authorized, the operation returns status code 202 (Accepted). Any errors or progress messages encountered during the response streaming will be returned as a part of the response body.

For information about status codes, see Status and error codes.

Response headers

The response for this operation includes the following headers. The response might also include additional standard HTTP headers. All standard headers conform to the HTTP/1.1 protocol specification.

Syntax Description
Last-Modified Indicates the date/time that the blob was last modified. The date format follows RFC 1123.

Any operation that modifies the blob, including an update of the blob's metadata or properties, changes the last-modified time of the blob.
Content-Type Specifies the format in which the results are returned. Currently, this value is avro/binary.
ETag Contains a value that you can use to perform operations conditionally. For more information, see Specify conditional headers for Blob Storage operations. If the request version is 2011-08-18 or later, the ETag value is in quotation marks.
Content-Encoding Returns the value that was specified for the Content-Encoding request header.
Content-Language Returns the value that was specified for the Content-Language request header.
Cache-Control Returned if this header was previously specified for the blob.
Content-Disposition Returned for requests against version 2013-08-15 and later. This header returns the value that was specified for the x-ms-blob-content-disposition header.

The Content-Disposition response header field conveys additional information about how to process the response payload. You can also use the response header field to attach additional metadata. For example, if the response header field is set to attachment, the user agent should not display the response. Instead, it should show a Save As dialog with a file name other than the specified blob name.
x-ms-blob-type: <BlockBlob> Returns the blob's type.
x-ms-request-id Uniquely identifies the request that was made. You can use it to troubleshoot the request. For more information, see Troubleshoot API operations.
x-ms-version Indicates the version of Azure Blob Storage that's used to execute the request. Included for requests made using version 2009-09-19 and later.

This header is also returned for anonymous requests without a version specified, if the container was marked for public access using the 2009-09-19 version of Blob Storage.
Date A UTC date/time value that indicates the time at which the service sent the response.
Access-Control-Allow-Origin Returned if the request includes an Origin header and CORS is enabled with a matching rule. This header returns the value of the origin request header in case of a match.
Access-Control-Expose-Headers Returned if the request includes an Origin header and CORS is enabled with a matching rule. This header returns the list of response headers that will be exposed to the client or issuer of the request.
Vary Returned with the value of the Origin header when CORS rules are specified. For details, see CORS support for Azure Storage.
Access-Control-Allow-Credentials Returned if the request includes an Origin header and CORS is enabled with a matching rule that doesn't allow all origins. This header is set to true.
x-ms-blob-committed-block-count Indicates the number of committed blocks present in the blob. This header is returned only for append blobs.
x-ms-server-encrypted: true/false Version 2015-12-11 or later. The value of this header is set to true if the blob data and application metadata are completely encrypted through the specified algorithm. When the blob is unencrypted, or if only parts of the blob/application metadata are encrypted, the value is set to false.

Response body

The response body contains the filtered contents of the blob sent as a series of messages in Avro binary format. It uses the following schema:

{
    "type": "record",
    "name": "com.microsoft.azure.storage.queryBlobContents.resultData",
    "doc": "Holds result data in the format specified for this query (CSV, JSON, etc.).",
    "fields": [
      {
        "name": "data",
        "type": "bytes"
      }
    ]
  },
  {
    "type": "record",
    "name": "com.microsoft.azure.storage.queryBlobContents.error",
    "doc": "An error that occurred while processing the query.",
    "fields": [
      {
        "name": "fatal",
        "type": "boolean",
        "doc": "If true, this error prevents further query processing.  More result data may be returned, but there is no guarantee that all of the original data will be processed.  If false, this error does not prevent further query processing."
      },
      {
        "name": "name",
        "type": "string",
        "doc": "The name of the error"
      },
      {
        "name": "description",
        "type": "string",
        "doc": "A description of the error"
      },
      {
        "name": "position",
        "type": "long",
        "doc": "The blob offset at which the error occurred"
      }
    ]
  },
  {
    "type": "record",
    "name": "com.microsoft.azure.storage.queryBlobContents.progress",
    "doc": "Information about the progress of the query",
    "fields": [
      {
        "name": "bytesScanned",
        "type": "long",
        "doc": "The number of bytes that have been scanned"
      },
      {
        "name": "totalBytes",
        "type": "long",
        "doc": "The total number of bytes to be scanned in this query"
      }
    ]
  },
  {
    "type": "record",
    "name": "com.microsoft.azure.storage.queryBlobContents.end",
    "doc": "Sent as the final message of the response, indicating that all results have been sent.",
    "fields": [
      {
        "name": "totalBytes",
        "type": "long",
        "doc": "The total number of bytes to be scanned in this query"
      }
    ]
  }
]

Sample response

      "StatusCode": 200,
      "ResponseHeaders": {
        "Content-Type": "avro/binary",
        "Date": "Fri, 24 Apr 2020 20:25:42 GMT",
        "ETag": "\u00220x8D7E88DA9C0A75B\u0022",
        "Last-Modified": "Fri, 24 Apr 2020 20:25:43 GMT",
        "Transfer-Encoding": "chunked",
        "x-ms-blob-type": "BlockBlob",
        "x-ms-client-request-id": "f6d1983c-55e5-9f95-6d3d-80d74862d99e",
        "x-ms-creation-time": "Fri, 24 Apr 2020 20:25:43 GMT",
        "x-ms-lease-state": "available",
        "x-ms-lease-status": "unlocked",
        "x-ms-request-id": "46c09ab1-b01e-0001-1076-1acef2000000",
        "x-ms-version": "2019-12-12"
	},
	"ResponseBody":{...}
  

Authorization

If the container's access control list (ACL) is set to allow anonymous access to the blob, any client can call this operation. If the container is private, this operation can be performed by the account owner and by anyone with a shared access signature that has permission to read the blob.

Remarks

  • The Query Blob Contents operation is supported only on a BlockBlob type.
  • Querying the contents of a blob that's encrypted with customer-provided keys is not supported in this version of the API.
  • The x-ms-version header is required to retrieve a blob that belongs to a private container. If the blob belongs to a container that's available for full or partial public access, any client can read it without specifying a version. The service version is not required for retrieving a blob that belongs to a public container. For more information, see Restrict access to containers and blobs.
  • You can use the Query Blob Contents operation to query only objects that have delimited/CSV or JSON format.

See also

Authorize requests to Azure Storage
Status and error codes
Blob Storage error codes
Set timeouts for Blob Storage operations
Query acceleration: SQL language reference