Retrieve large cost datasets recurringly with exports

This article helps you regularly export large amounts of data with exports from Cost Management. Exporting is the recommended way to retrieve unaggregated cost data, especially when usage files are too large to reliably call and download using the Cost Details API. Exported data is placed in the Azure Storage account that you choose. From there, you can load it into your own systems and analyze it as needed. To configure exports in the Azure portal, see Export data.

If you want to automate exports at various scopes, the sample API request later in this article is a good starting point. You can use the Exports API to create automatic exports as part of your general environment configuration. Automatic exports help ensure that you have the data you need for your own organization's systems as your Azure use expands.

Common export configurations

Before you create your first export, consider your scenario and the configuration options needed to enable it. Consider the following export options:

  • Recurrence - Determines how frequently the export job runs and when a file is written to your Azure Storage account. Choose Daily, Weekly, or Monthly. Try to match your recurrence to the data import jobs used by your organization's internal systems.
  • Recurrence Period - Determines how long the export remains valid. Files are exported only during the recurrence period.
  • Time Frame - Determines the amount of data generated by the export on a given run. Common options are MonthToDate and WeekToDate.
  • StartDate - Configures when you want the export schedule to begin. The first export runs on the StartDate, and later exports run according to your Recurrence.
  • Type - There are three export types:
    • ActualCost - Shows the total usage and costs for the period specified, as they're accrued and as they appear on your bill.
    • AmortizedCost - Shows the total usage and costs for the period specified, with amortization applied to the applicable reservation purchase costs.
    • Usage - All exports created before July 20, 2020 are of type Usage. Update all your scheduled exports to either ActualCost or AmortizedCost.
  • Columns - Defines the data fields you want included in your export file. They correspond to the fields available in the Cost Details API.
  • Partitioning - Set the option to true if you have a large dataset and would like it to be broken up into multiple files. It makes data ingestion faster and easier. For more information about partitioning, see File partitioning for large datasets.
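The interplay of Recurrence, Recurrence Period, StartDate, and Time Frame can be illustrated with a small Python model. It's a simplified sketch, not the service's scheduling logic. It assumes a Daily recurrence with one run per day from StartDate through the end of the recurrence period (inclusive), that each run's window ends on the run date, and that WeekToDate weeks start on Monday. The function names `timeframe_window` and `daily_runs` are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def timeframe_window(run_date: date, timeframe: str) -> Tuple[date, date]:
    """Return the (first, last) day a single export run would cover.

    Simplified model (assumption): the window ends on the run date,
    and WeekToDate weeks start on Monday.
    """
    if timeframe == "MonthToDate":
        return run_date.replace(day=1), run_date
    if timeframe == "WeekToDate":
        return run_date - timedelta(days=run_date.weekday()), run_date
    raise ValueError(f"unsupported timeframe: {timeframe}")


def daily_runs(start: date, period_end: date) -> List[date]:
    """Daily recurrence (assumption): one run per day from StartDate through period_end, inclusive."""
    days = (period_end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


if __name__ == "__main__":
    # A daily MonthToDate export whose recurrence period crosses a month boundary.
    for run in daily_runs(date(2020, 6, 29), date(2020, 7, 2)):
        first, last = timeframe_window(run, "MonthToDate")
        print(run, "covers", first, "to", last)
```

In this model, consecutive daily MonthToDate runs overlap and the window resets on the first of each month, so a downstream import job would likely need to replace the current month's data on each run rather than append to it.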

Create a daily month-to-date export for a subscription

Request URL: PUT https://management.azure.com/{scope}/providers/Microsoft.CostManagement/exports/{exportName}?api-version=2020-06-01

{
  "properties": {
    "schedule": {
      "status": "Active",
      "recurrence": "Daily",
      "recurrencePeriod": {
        "from": "2020-06-01T00:00:00Z",
        "to": "2020-10-31T00:00:00Z"
      }
    },
    "format": "Csv",
    "deliveryInfo": {
      "destination": {
        "resourceId": "/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/MYDEVTESTRG/providers/Microsoft.Storage/storageAccounts/{yourStorageAccount}",
        "container": "{yourContainer}",
        "rootFolderPath": "{yourDirectory}"
      }
    },
    "definition": {
      "type": "ActualCost",
      "timeframe": "MonthToDate",
      "dataSet": {
        "granularity": "Daily",
        "configuration": {
          "columns": [
            "Date",
            "MeterId",
            "ResourceId",
            "ResourceLocation",
            "Quantity"
          ]
        }
      }
    }
  }
}
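The same request can be sent from a script. The following Python sketch uses only the standard library to build the PUT request. It assumes you already have an Azure Resource Manager access token (for example, from `az account get-access-token`) and a request body equivalent to the JSON above, loaded as a dict. The function name `build_export_request` is hypothetical.

```python
import json
import urllib.request

API_VERSION = "2020-06-01"


def build_export_request(scope: str, export_name: str, body: dict, token: str) -> urllib.request.Request:
    """Build, but don't send, the PUT request that creates or updates an export at a scope.

    scope is the part of the URL before /providers, for example
    "subscriptions/<subscription-id>"; token is an assumed ARM bearer token.
    """
    url = (
        "https://management.azure.com/"
        f"{scope.strip('/')}/providers/Microsoft.CostManagement/exports/{export_name}"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`; an HTTP error status raises `urllib.error.HTTPError`, which you can catch to inspect the error response.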

Copy large Azure storage blobs

You can use Cost Management to schedule exports of your Azure usage details into your Azure Storage accounts as blobs. The resulting blobs can be several gigabytes or larger. The Cost Management team worked with the Azure Storage team to test copying large Azure storage blobs. The results are documented in the following sections. You can expect similar results when you copy storage blobs from one Azure region to another.

The team conducted a performance test by transferring blobs from storage accounts in the West US region to storage accounts in the same region and in other regions. Measured speeds ranged from 2 GB per second within the same region to 150 MB per second to storage accounts in the Southeast Asia region.

Test configuration

To measure blob transfer speeds, the team created a simple .NET console application that referenced version 2.0.1 (the latest at the time of testing) of the Azure Data Movement Library (DML) via NuGet. DML is an SDK provided by the Azure Storage team that enables programmatic access to its transfer services. The team then created Standard V2 storage accounts in multiple regions and used West US as the source region. They populated the West US storage accounts with containers, each holding ten 2-GB block blobs, and copied the containers to the other storage accounts using DML's TransferManager.CopyDirectoryAsync() method with the CopyMethod.ServiceSideSyncCopy option. Tests were run on a Windows 10 computer with 12 cores and a 1-GbE network connection.

Application settings used:

  • TransferManager.Configurations.ParallelOperations = Environment.ProcessorCount * 32. The team found that this setting had the greatest effect on overall throughput. A value of 32 times the number of cores provided the best throughput for the test client.
  • ServicePointManager.DefaultConnectionLimit = int.MaxValue. Setting it to the maximum value effectively hands full control of transfer parallelism to the ParallelOperations setting.
  • TransferManager.Configurations.BlockSize = 4,194,304 (4 MB). Block size had some effect on transfer rates, and 4 MB proved best in testing.
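The following Python arithmetic shows what these settings meant for the test client. It assumes the 12-core test computer, and because the article doesn't say whether "2 GB" means 2 * 10**9 or 2 * 2**30 bytes, it computes block counts for both. The helper names `parallel_operations` and `blocks_per_blob` are hypothetical.

```python
import os

# TransferManager.Configurations.BlockSize used in the test: 4 MiB (4 * 1024 * 1024 bytes).
BLOCK_SIZE = 4_194_304


def parallel_operations(cores: int) -> int:
    """ParallelOperations as configured in the test: 32 operations per processor core."""
    return cores * 32


def blocks_per_blob(blob_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks a blob splits into, rounding up for a partial last block."""
    return -(-blob_bytes // block_size)


if __name__ == "__main__":
    print("test client (12 cores):", parallel_operations(12), "parallel operations")
    print("this machine:", parallel_operations(os.cpu_count() or 1), "parallel operations")
    print("2 GB read as 2 * 10**9 bytes:", blocks_per_blob(2 * 10**9), "blocks")
    print("2 GB read as 2 * 2**30 bytes:", blocks_per_blob(2 * 2**30), "blocks")
```

With ten 2-GB blobs per container, that's roughly 4,770 to 5,120 blocks per container spread across the 384 parallel operations available on the 12-core test client.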

For more information and sample code, see the links in the Related content section.

Test results

| Test | To region | Blobs | Time (s) | MB/s | Comments |
|---|---|---|---|---|---|
| 1 | West US | 2 GB x 10 | 10 | 2,000 | |
| 2 | West US 2 | 2 GB x 10 | 33 | 600 | |
| 3 | East US | 2 GB x 10 | 67 | 300 | |
| 4 | East US | 2 GB x 10 x 4 | 99 | 200 | Four parallel transfers using eight storage accounts (four West US to four East US); MB/s is the average per transfer |
| 6 | East US | 2 GB x 10 x 4 | 92 | 870 | Four parallel transfers from one storage account to another |
| 5 | East US | 2 GB x 10 x 8 | 148 | 135 | Eight parallel transfers using eight storage accounts (four West US to four East US); MB/s is the average per transfer |
| 7 | Southeast Asia | 2 GB x 10 | 133 | 150 | |
| 8 | Southeast Asia | 2 GB x 10 x 4 | 444 | 180 | Four parallel transfers from one storage account to another |
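The MB/s column can be reproduced from the other columns. This Python sketch assumes 1 GB = 1,000 MB and that the reported rates are rounded; it computes both the aggregate rate and the average per parallel transfer. The recomputed values are consistent with tests 4 and 5 reporting the per-transfer average and tests 6 and 8 reporting the aggregate rate, as their comments suggest. The helper name `throughput_mb_s` is hypothetical.

```python
from typing import Tuple


def throughput_mb_s(blob_gb: float, blobs_per_transfer: int, transfers: int, seconds: float) -> Tuple[float, float]:
    """Return (aggregate MB/s, average MB/s per transfer), assuming 1 GB = 1,000 MB."""
    total_mb = blob_gb * 1000 * blobs_per_transfer * transfers
    aggregate = total_mb / seconds
    return aggregate, aggregate / transfers


# (test number, parallel transfers, seconds, reported MB/s) from the table;
# every transfer moves ten 2-GB blobs.
RESULTS = [
    (1, 1, 10, 2000), (2, 1, 33, 600), (3, 1, 67, 300), (4, 4, 99, 200),
    (6, 4, 92, 870), (5, 8, 148, 135), (7, 1, 133, 150), (8, 4, 444, 180),
]

if __name__ == "__main__":
    for test, transfers, seconds, reported in RESULTS:
        aggregate, per_transfer = throughput_mb_s(2, 10, transfers, seconds)
        print(f"test {test}: aggregate {aggregate:.0f}, per transfer {per_transfer:.0f}, reported {reported}")
```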

Sync transfer characteristics

Here are some characteristics of the service-side sync transfer used with DML that are relevant to its use:

  • DML can transfer a single blob or a directory. For directory transfer, you can use a search pattern to match on blob prefix.
  • Block blobs are transferred in parallel, and they all complete toward the end of the transfer process. The individual blocks of each blob are also transferred in parallel.
  • The transfer is executed asynchronously on the client. The transfer status is available periodically via a callback to a method that can be defined in a TransferContext object.
  • The transfer creates checkpoints as it progresses and exposes the latest one as a TransferCheckpoint object through the TransferContext object. If the TransferCheckpoint is saved before a transfer is canceled or aborted, the transfer can be resumed from the checkpoint for up to seven days. The transfer can be resumed from any checkpoint, not just the latest.
  • If the transfer client process is killed and restarted without implementing the checkpoint feature:
    • If no blob transfers have completed, the entire transfer restarts.
    • If some blobs have completed, the transfer restarts for only the incomplete blobs.
  • Pausing the client execution pauses the transfers.
  • The blob transfer feature abstracts the client from transient failures. For instance, storage account throttling doesn't normally cause a transfer to fail but slows the transfer.
  • Service-side transfers have low client CPU and memory usage, and use some network bandwidth and connections.
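To make the prefix matching and restart behavior concrete, the following Python sketch models a directory copy that filters blobs by name prefix, copies them in parallel, and records each completed blob in a checkpoint so that a restarted run copies only the incomplete ones. It's an in-process illustration, not DML's API: `copy_directory`, the `copy_blob` callback, and the use of a plain set of blob names in place of a TransferCheckpoint object are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional, Set


def copy_directory(
    blob_names: Iterable[str],
    prefix: str,
    checkpoint: Set[str],
    copy_blob: Callable[[str], object],
    on_progress: Optional[Callable[[int, int], None]] = None,
    max_workers: int = 8,
) -> Set[str]:
    """Copy every blob whose name starts with prefix and isn't already in checkpoint.

    checkpoint holds the names of blobs already copied. It's updated in place,
    so a caller can persist it and pass it back after a restart; only the
    blobs missing from it are copied again.
    """
    pending = sorted(n for n in blob_names if n.startswith(prefix) and n not in checkpoint)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per blob, so blobs are copied in parallel.
        futures = {pool.submit(copy_blob, name): name for name in pending}
        for done, future in enumerate(as_completed(futures), start=1):
            future.result()  # re-raises if this blob's copy failed
            checkpoint.add(futures[future])
            if on_progress is not None:
                on_progress(done, len(pending))  # periodic status, like a progress callback
    return checkpoint
```

If a run fails partway, calling `copy_directory` again with the same set skips the blobs it already recorded. Persisting the set between runs (for example, as a JSON list) plays roughly the role that saving the TransferCheckpoint plays in DML.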

Async transfer characteristics

You can also invoke the TransferManager.CopyDirectoryAsync() method with the CopyMethod.ServiceSideAsyncCopy option. From the client's perspective, it operates similarly to the sync transfer mechanism, with the following differences:

  • Transfer rates are slower than the equivalent sync transfer (typically 10 MB/s or less).
  • The transfer continues even if the client process terminates.
  • Although checkpoints are supported, resuming a transfer using a TransferCheckpoint doesn't resume at the checkpoint time but at the current state of the transfer.

Test summary

Azure Blob Storage supports high global transfer rates with its service-side sync transfer feature, and the Data Movement Library makes the feature straightforward to use in .NET applications. Based on these results, you can reliably copy hundreds of gigabytes of Cost Management export data to a storage account anywhere in less than an hour.
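The "less than an hour" figure can be checked against the measured rates. This Python sketch assumes the copy sustains a single measured rate for its whole duration and that 1 GB = 1,000 MB; `copy_seconds` is a hypothetical helper.

```python
def copy_seconds(total_gb: float, rate_mb_s: float) -> float:
    """Seconds to copy total_gb at a sustained rate_mb_s, assuming 1 GB = 1,000 MB."""
    return total_gb * 1000 / rate_mb_s


if __name__ == "__main__":
    # Slowest single-transfer rate in the results table (test 7, to Southeast Asia): 150 MB/s.
    for size_gb in (100, 300, 500):
        minutes = copy_seconds(size_gb, 150) / 60
        print(f"{size_gb} GB at 150 MB/s: about {minutes:.0f} minutes")
```

At 150 MB/s, 500 GB takes roughly 56 minutes, so the estimate holds for a few hundred gigabytes even at the slowest single-transfer rate measured; the parallel-transfer tests show higher aggregate rates.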