Query Blob Contents
The Query Blob Contents
operation applies a simple Structured Query Language (SQL) statement on a blob's contents and returns only the queried subset of the data. You can also call Query Blob Contents
to query the contents of a version or snapshot.
Request
You can construct the Query Blob Contents
request as follows. We recommend HTTPS. Replace myaccount with the name of your storage account.
POST method request URI | HTTP version |
---|---|
https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query&snapshot=<DateTime> https://myaccount.blob.core.windows.net/mycontainer/myblob?comp=query&versionid=<DateTime> |
HTTP/1.0 HTTP/1.1 |
URI parameters
You can specify the following additional parameters on the request URI:
Parameter | Description |
---|---|
snapshot |
Optional. The snapshot parameter is an opaque DateTime value. When it's present, it specifies the blob snapshot to query. For more information on working with blob snapshots, see Create a snapshot of a blob. |
versionid |
Optional, version 2019-12-12 and later. The versionid parameter is an opaque DateTime value. When it's present, it specifies the version of the blob to retrieve. |
timeout |
Optional. The timeout parameter is expressed in seconds. For more information, see Set timeouts for Blob Storage operations. |
Request headers
The following table describes required and optional request headers:
Request header | Description |
---|---|
Authorization |
Required. Specifies the authentication scheme, account name, and signature. For more information, see Authorize requests to Azure Storage. |
Date or x-ms-date |
Required. Specifies the Coordinated Universal Time (UTC) for the request. For more information, see Authorize requests to Azure Storage. |
x-ms-version |
Required for all authenticated requests, optional for anonymous requests. Specifies the version of the operation to use for this request. For more information, see Versioning for the Azure Storage services. |
Content-Type |
Required. The value of this header should be application/xml; charset=UTF-8 . |
x-ms-lease-id:<ID> |
Optional. If this header is specified, the operation will be performed only if both of the following conditions are met: - The blob's lease is currently active. - The lease ID specified in the request matches that of the blob. If this header is specified and both of these conditions are not met, the request will fail and the Query Blob Contents operation will fail with status code 412 (Precondition Failed). |
Origin |
Optional. Specifies the origin from which the request is issued. The presence of this header results in Cross-Origin Resource Sharing (CORS) headers on the response. |
x-ms-client-request-id |
Optional. Provides a client-generated, opaque value with a 1-kibibyte (KiB) character limit that's recorded in the logs when logging is configured. We highly recommend that you use this header to correlate client-side activities with requests that the server receives. |
This operation also supports the use of conditional headers to query the blob contents only if a specified condition is met. For more information, see Specify conditional headers for Blob Storage operations.
Request body
The request body for this version of Query Blob Contents
uses the following XML format:
<?xml version="1.0" encoding="utf-8"?>
<QueryRequest>
<QueryType>String</QueryType>
<Expression>String</Expression>
<InputSerialization>
<Format>
<Type>String</Type>
<DelimitedTextConfiguration>
<ColumnSeparator>String</ColumnSeparator>
<FieldQuote>String</FieldQuote>
<RecordSeparator>String</RecordSeparator>
<EscapeChar>String</EscapeChar>
<HasHeaders>Boolean</HasHeaders>
</DelimitedTextConfiguration>
<JsonTextConfiguration>
<RecordSeparator>String</RecordSeparator>
</JsonTextConfiguration>
</Format>
</InputSerialization>
<OutputSerialization>
<Format>
<Type>String</Type>
<DelimitedTextConfiguration>
<ColumnSeparator>String</ColumnSeparator >
<FieldQuote>String</FieldQuote >
<RecordSeparator>String</RecordSeparator>
<EscapeChar>String</EscapeChar>
<HasHeaders>Boolean</HasHeaders>
</DelimitedTextConfiguration>
<JsonTextConfiguration>
<RecordSeparator>String</RecordSeparator>
</JsonTextConfiguration>
<ArrowConfiguration>
<Schema>
<Field>
<Type>String</Type>
<Name>String</Name>
</Field>
<Field>
<Type>String</Type>
</Field>
.
.
.
<Field>
<Type>String</Type>
<Precision>Integer</Precision>
<Scale>Integer</Scale>
</Field>
</Schema>
</ArrowConfiguration>
</Format>
</OutputSerialization>
</QueryRequest>
The following table describes the elements of the request body:
Element name | Description |
---|---|
QueryRequest |
Required. Groups the set of query request settings. |
QueryType |
Required. Indicates the type of the provided query expression. The only valid value for the current version is SQL . |
Expression |
Required. Indicates the query expression in SQL. The maximum size of the query expression is 256 KiB. For more information about the expression syntax, see Query acceleration: SQL language reference. |
InputSerialization |
Optional. Groups the settings regarding the input serialization of the blob contents. If it's not specified, the delimited text configuration is used. |
Format |
Required if InputSerialization is specified. Groups the settings regarding the format of the blob data. |
Type |
Required if InputSerialization is specified. Indicates the format type. Valid values are delimited , csv , and json . |
DelimitedTextConfiguration |
Optional. Groups the settings that are used for interpreting the blob data if the blob is formatted with delimited text. |
ColumnSeparator |
Optional. Indicates the string that's used to separate columns. |
FieldQuote |
Optional. Indicates the string that's used to quote a specific field. |
RecordSeparator |
Optional. Indicates the string that's used to separate records. |
EscapeChar |
Optional. Indicates the string that's used as an escape character. |
HasHeaders |
Optional. Specifies a Boolean that represents whether the data has headers. |
JsonTextConfiguration |
Optional. Groups the settings that are used for interpreting the blob data if the blob is JSON formatted. |
RecordSeparator |
Optional. Indicates the string that's used to separate records. |
OutputSerialization |
Optional. Indicates the serialization format of the filtered contents of the blob returned in the response. If it's not specified, the delimited text configuration is used. |
Format |
Required if OutputSerialization is specified. Groups the settings regarding the format of the returned response. |
Type |
Required if OutputSerialization is specified. Indicates the format type. Valid values are delimited , csv , json , and arrow . |
DelimitedTextConfiguration |
Optional. Groups the settings that are used for formatting the response if the response should be formatted with delimited text. |
ColumnSeparator |
Optional. Indicates the string that's used to separate columns. |
FieldQuote |
Optional. Indicates the string that's used to quote a specific field. |
RecordSeparator |
Optional. Indicates the string that's used to separate records. |
EscapeChar |
Optional. Indicates the string that's used as an escape character. |
HasHeaders |
Optional. Specifies a Boolean that represents whether the data has headers. |
JsonTextConfiguration |
Optional. Groups the settings that are used for formatting the response if the response should be JSON formatted. |
RecordSeparator |
Optional. Indicates the string that's used to separate records. |
ArrowConfiguration |
Optional. Groups the settings that are used for formatting the response if the response should be Arrow formatted. |
Schema |
Required if ArrowConfiguration is specified. Groups the settings regarding the schema of the returned Arrow response. |
Field |
Optional. Groups settings regarding a specific field. |
Type |
Required if Field is specified. Indicates the field type. Valid values are Int , Float , Decimal , and Bool . |
Precision |
Optional. Indicates the precision of the field. |
Scale |
Optional. Indicates the scale of the field. |
Response
The response includes an HTTP status code, a set of response headers, and the response body. The response body is in Avro binary format. Because the response content length is unknown, the response is streamed with chunked encoding.
Status code
If the query request is well formed and authorized, the operation returns status code 202 (Accepted). Any errors or progress messages encountered during the response streaming will be returned as a part of the response body.
For information about status codes, see Status and error codes.
Response headers
The response for this operation includes the following headers. The response might also include additional standard HTTP headers. All standard headers conform to the HTTP/1.1 protocol specification.
Syntax | Description |
---|---|
Last-Modified |
Indicates the date/time that the blob was last modified. The date format follows RFC 1123. Any operation that modifies the blob, including an update of the blob's metadata or properties, changes the last-modified time of the blob. |
Content-Type |
Specifies the format in which the results are returned. Currently, this value is avro/binary . |
ETag |
Contains a value that you can use to perform operations conditionally. For more information, see Specify conditional headers for Blob Storage operations. If the request version is 2011-08-18 or later, the ETag value is in quotation marks. |
Content-Encoding |
Returns the value that was specified for the Content-Encoding request header. |
Content-Language |
Returns the value that was specified for the Content-Language request header. |
Cache-Control |
Returned if this header was previously specified for the blob. |
Content-Disposition |
Returned for requests against version 2013-08-15 and later. This header returns the value that was specified for the x-ms-blob-content-disposition header.The Content-Disposition response header field conveys additional information about how to process the response payload. You can also use the response header field to attach additional metadata. For example, if the response header field is set to attachment , the user agent should not display the response. Instead, it should show a Save As dialog with a file name other than the specified blob name. |
x-ms-blob-type: <BlockBlob> |
Returns the blob's type. |
x-ms-request-id |
Uniquely identifies the request that was made. You can use it to troubleshoot the request. For more information, see Troubleshoot API operations. |
x-ms-version |
Indicates the version of Azure Blob Storage that's used to execute the request. Included for requests made using version 2009-09-19 and later. This header is also returned for anonymous requests without a version specified, if the container was marked for public access using the 2009-09-19 version of Blob Storage. |
Date |
A UTC date/time value that indicates the time at which the service sent the response. |
Access-Control-Allow-Origin |
Returned if the request includes an Origin header and CORS is enabled with a matching rule. This header returns the value of the origin request header in case of a match. |
Access-Control-Expose-Headers |
Returned if the request includes an Origin header and CORS is enabled with a matching rule. This header returns the list of response headers that will be exposed to the client or issuer of the request. |
Vary |
Returned with the value of the Origin header when CORS rules are specified. For details, see CORS support for Azure Storage. |
Access-Control-Allow-Credentials |
Returned if the request includes an Origin header and CORS is enabled with a matching rule that doesn't allow all origins. This header is set to true . |
x-ms-blob-committed-block-count |
Indicates the number of committed blocks present in the blob. This header is returned only for append blobs. |
x-ms-server-encrypted: true/false |
Version 2015-12-11 or later. The value of this header is set to true if the blob data and application metadata are completely encrypted through the specified algorithm. When the blob is unencrypted, or if only parts of the blob/application metadata are encrypted, the value is set to false . |
Response body
The response body contains the filtered contents of the blob sent as a series of messages in Avro binary format. It uses the following schema:
{
"type": "record",
"name": "com.microsoft.azure.storage.queryBlobContents.resultData",
"doc": "Holds result data in the format specified for this query (CSV, JSON, etc.).",
"fields": [
{
"name": "data",
"type": "bytes"
}
]
},
{
"type": "record",
"name": "com.microsoft.azure.storage.queryBlobContents.error",
"doc": "An error that occurred while processing the query.",
"fields": [
{
"name": "fatal",
"type": "boolean",
"doc": "If true, this error prevents further query processing. More result data may be returned, but there is no guarantee that all of the original data will be processed. If false, this error does not prevent further query processing."
},
{
"name": "name",
"type": "string",
"doc": "The name of the error"
},
{
"name": "description",
"type": "string",
"doc": "A description of the error"
},
{
"name": "position",
"type": "long",
"doc": "The blob offset at which the error occurred"
}
]
},
{
"type": "record",
"name": "com.microsoft.azure.storage.queryBlobContents.progress",
"doc": "Information about the progress of the query",
"fields": [
{
"name": "bytesScanned",
"type": "long",
"doc": "The number of bytes that have been scanned"
},
{
"name": "totalBytes",
"type": "long",
"doc": "The total number of bytes to be scanned in this query"
}
]
},
{
"type": "record",
"name": "com.microsoft.azure.storage.queryBlobContents.end",
"doc": "Sent as the final message of the response, indicating that all results have been sent.",
"fields": [
{
"name": "totalBytes",
"type": "long",
"doc": "The total number of bytes to be scanned in this query"
}
]
}
]
Sample response
"StatusCode": 200,
"ResponseHeaders": {
"Content-Type": "avro/binary",
"Date": "Fri, 24 Apr 2020 20:25:42 GMT",
"ETag": "\u00220x8D7E88DA9C0A75B\u0022",
"Last-Modified": "Fri, 24 Apr 2020 20:25:43 GMT",
"Transfer-Encoding": "chunked",
"x-ms-blob-type": "BlockBlob",
"x-ms-client-request-id": "f6d1983c-55e5-9f95-6d3d-80d74862d99e",
"x-ms-creation-time": "Fri, 24 Apr 2020 20:25:43 GMT",
"x-ms-lease-state": "available",
"x-ms-lease-status": "unlocked",
"x-ms-request-id": "46c09ab1-b01e-0001-1076-1acef2000000",
"x-ms-version": "2019-12-12"
},
"ResponseBody":{...}
Authorization
Authorization is required when calling any data access operation in Azure Storage. You can authorize the Query Blob Contents
operation as described below.
Important
Microsoft recommends using Microsoft Entra ID with managed identities to authorize requests to Azure Storage. Microsoft Entra ID provides superior security and ease of use compared to Shared Key authorization.
Azure Storage supports using Microsoft Entra ID to authorize requests to blob data. With Microsoft Entra ID, you can use Azure role-based access control (Azure RBAC) to grant permissions to a security principal. The security principal may be a user, group, application service principal, or Azure managed identity. The security principal is authenticated by Microsoft Entra ID to return an OAuth 2.0 token. The token can then be used to authorize a request against the Blob service.
To learn more about authorization using Microsoft Entra ID, see Authorize access to blobs using Microsoft Entra ID.
Permissions
Listed below are the RBAC action necessary for a Microsoft Entra user, group, managed identity, or service principal to call the Query Blob Contents
operation, and the least privileged built-in Azure RBAC role that includes this action:
- Azure RBAC action: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
- Least privileged built-in role: Storage Blob Data Reader
To learn more about assigning roles using Azure RBAC, see Assign an Azure role for access to blob data.
Remarks
- The
Query Blob Contents
operation is supported only on aBlockBlob
type. - Querying the contents of a blob that's encrypted with customer-provided keys is not supported in this version of the API.
- This operation is not supported on blobs in accounts that have infrastructure encryption enabled.
- The
x-ms-version
header is required to retrieve a blob that belongs to a private container. If the blob belongs to a container that's available for full or partial public access, any client can read it without specifying a version. The service version is not required for retrieving a blob that belongs to a public container. For more information, see Restrict access to containers and blobs. - You can use the
Query Blob Contents
operation to query only objects that have delimited/CSV or JSON format.
Billing
Pricing requests can originate from clients that use Blob Storage APIs, either directly through the Blob Storage REST API, or from an Azure Storage client library. These requests accrue charges per transaction. The type of transaction affects how the account is charged. For example, read transactions accrue to a different billing category than write transactions. The following table shows the billing category for Query Blob Contents
requests based on the storage account type:
Operation | Storage account type | Billing category |
---|---|---|
Query Blob Contents | Premium block blob Standard general-purpose v2 |
Read operations1 |
1In addition to a read charge, the account incurs charges for Query Acceleration - Data Scanned and Query Acceleration - Data Returned transaction categories. Pricing for these categories appears on the Azure Data Lake Storage pricing page.
See also
Authorize requests to Azure Storage Status and error codes Blob Storage error codes Set timeouts for Blob Storage operations Query acceleration: SQL language reference