Export to Azure Blob Storage

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article describes how to use the Export to Azure Blob Storage option in the Export Data module in Machine Learning Studio (classic).

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

This option is useful when you want to export data from a machine learning experiment to Azure Blob Storage. For example, you might want to share machine learning data outputs with other applications, or store intermediate data or cleaned datasets for use in other experiments.

Azure blobs can be accessed from anywhere, by using either HTTP or HTTPS. Because Azure Blob Storage is an unstructured data store, you can export data in various formats. Currently, CSV, TSV, and ARFF formats are supported.
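To make the three supported formats concrete, the sketch below writes the same small dataset as CSV, TSV, and ARFF using Python's standard library. The file names, column names, and rows are invented for illustration; this is not Studio's own serialization code.

```python
import csv

# A tiny illustrative dataset (hypothetical column names and rows).
header = ["age", "income", "label"]
rows = [[34, 52000, "yes"], [29, 41000, "no"]]

# CSV: comma-separated values, the default format.
with open("results01.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(header)  # corresponds to selecting "Write blob header row"
    w.writerows(rows)

# TSV: same layout with a tab delimiter.
with open("results01.tsv", "w", newline="") as f:
    w = csv.writer(f, delimiter="\t")
    w.writerow(header)
    w.writerows(rows)

# ARFF: Weka's format adds a relation name and typed attribute declarations
# before the data section.
with open("results01.arff", "w") as f:
    f.write("@relation results01\n\n")
    f.write("@attribute age numeric\n")
    f.write("@attribute income numeric\n")
    f.write("@attribute label {yes,no}\n\n")
    f.write("@data\n")
    for r in rows:
        f.write(",".join(str(v) for v in r) + "\n")
```

Note that CSV and TSV carry no type information, while ARFF declares each attribute's type up front, which is why Weka-based tools prefer it.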

To export data to Azure blob for use by other applications, you use the Export Data module to save the data to Azure Blob Storage. Then, use any tool that can read data from Azure storage (such as Excel, cloud storage utilities, or other cloud services), to load and use the data.
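After the blob is downloaded, any CSV-capable tool can consume it. As a minimal sketch, the snippet below creates a local stand-in file (hypothetical name and contents, standing in for a downloaded blob) and reads it back with Python's csv module, the way a downstream application might.

```python
import csv

# Stand-in for an exported blob that has been downloaded locally
# (hypothetical file name and contents).
with open("downloaded.csv", "w", newline="") as f:
    f.write("age,income,label\n34,52000,yes\n29,41000,no\n")

# Load the exported data; the header row written by Export Data
# becomes the dictionary keys.
with open("downloaded.csv", newline="") as f:
    records = list(csv.DictReader(f))

# records[0] == {"age": "34", "income": "52000", "label": "yes"}
```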

Note

The Import Data and Export Data modules can read and write data only from Azure storage created using the Classic deployment model. In other words, the newer Azure Blob Storage account type that offers hot and cool storage access tiers is not yet supported.

Generally, any Azure storage accounts that you might have created before this service option became available should not be affected.

However, if you need to create a new account for use with Machine Learning, we recommend that you either select Classic for the deployment model, or use Resource Manager and, for Account kind, select General purpose rather than Blob storage.

How to export data to Azure Blob Storage

The Azure blob service is for storing large amounts of data, including binary data. There are two types of blob storage: public blobs, and blobs that require login credentials.

  1. Add the Export Data module to your experiment. You can find this module in the Data Input and Output category in Studio (classic).

  2. Connect Export Data to the module that produces the data that you want to export to Azure Blob Storage.

  3. Open the Properties pane of Export Data. For the data destination, select Azure Blob Storage.

  4. For Authentication type, choose Public (SAS URL) if you know that the storage supports access via a SAS URL.

    A SAS URL is a special type of URL that can be generated by using an Azure storage utility, and is available for only a limited time. It contains all the information that is needed for authentication and download.

    For URI, type or paste the full URI that defines the account and the public blob.

  5. For private accounts, choose Account, and provide the account name and the account key, so that the experiment can write to the storage account.

    • Account name: Type or paste the name of the account where you want to save the data. For example, if the full URL of the storage account is https://myshared.blob.core.windows.net, you would type myshared.

    • Account key: Paste the storage access key that is associated with the account.

  6. Path to container, directory, or blob: Type the name of the blob where the exported data will be stored. For example, to save the results of your experiment to a new blob named results01.csv in the container predictions in an account named mymldata, the full URL for the blob would be https://mymldata.blob.core.windows.net/predictions/results01.csv.

    Therefore, in the field Path to container, directory, or blob, you would specify the container and blob name as follows: predictions/results01.csv

  7. If you specify the name of a blob that does not already exist, Azure creates the blob for you.

    When writing to an existing blob, you can specify that the current contents of the blob be overwritten by setting the Azure Blob Storage write mode property. By default, this property is set to Error, meaning that an error is raised whenever an existing blob file of the same name is found.

  8. For File format for blob file, select the format in which data should be stored.

    • CSV: Comma-separated values (CSV) is the default storage format. To export column headings together with the data, select the option, Write blob header row. For more information about the comma-delimited format used in Machine Learning, see Convert to CSV.

    • TSV: Tab-separated values (TSV) format is compatible with many machine learning tools. To export column headings together with the data, select the option, Write blob header row. For more information about the tab-separated format used in Machine Learning, see Convert to TSV.

    • ARFF: This format supports saving files in the format used by the Weka toolset. This format is not supported for files stored in a SAS URL. For more information about the ARFF format, see Convert to ARFF.

  9. Use cached results: Select this option if you want to avoid rewriting the results to the blob file each time you run the experiment. If there are no other changes to module parameters, the experiment writes the results only the first time the module is run, or when there are changes to the data.
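The account name from step 5 and the "Path to container, directory, or blob" value from step 6 can both be read directly off a blob's full URL. The following sketch shows the split using Python's standard library and the example URL from step 6; the helper function name is ours, not part of Studio.

```python
from urllib.parse import urlparse

def blob_url_to_module_params(url: str) -> dict:
    """Split a full Azure blob URL into the values Export Data asks for."""
    parts = urlparse(url)
    # The host looks like "<account>.blob.core.windows.net",
    # so the account name is the first dot-separated label.
    account = parts.netloc.split(".")[0]
    # The URL path is "/<container>/<blob path>"; dropping the leading
    # slash yields the "Path to container, directory, or blob" value.
    container_and_blob = parts.path.lstrip("/")
    return {"account_name": account, "path": container_and_blob}

params = blob_url_to_module_params(
    "https://mymldata.blob.core.windows.net/predictions/results01.csv"
)
# params["account_name"] == "mymldata"
# params["path"] == "predictions/results01.csv"
```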

Examples

For examples of how to use the Export Data module, see the Azure AI Gallery.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

Common questions

How can I avoid writing the data if the experiment hasn't changed?

When your experiment results change, Export Data always saves the new dataset. However, if you are running the experiment repeatedly without making changes that affect the output data, you can select the Use cached results option.

The module checks whether the experiment has run previously using the same data and the same options, and if a previous run is found, the write operation is not repeated.
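This caching behavior can be thought of as a fingerprint check: hash the input data together with the module options, and skip the write when the fingerprint matches the previous run. The sketch below illustrates the idea with Python's standard library; it is our simplification, not Studio's actual implementation, and the function names are invented.

```python
import hashlib
import json

def fingerprint(data: bytes, options: dict) -> str:
    """Hash the dataset together with the module options."""
    h = hashlib.sha256()
    h.update(data)
    h.update(json.dumps(options, sort_keys=True).encode())
    return h.hexdigest()

def export_if_changed(data, options, last_fingerprint):
    """Return (wrote, fingerprint); write only when data or options changed."""
    fp = fingerprint(data, options)
    if fp == last_fingerprint:
        return (False, fp)  # cache hit: skip rewriting the blob
    # ... the actual blob write would happen here ...
    return (True, fp)

# First run writes; an identical second run is served from the cache.
wrote1, fp1 = export_if_changed(b"age,income\n34,52000\n", {"format": "CSV"}, None)
wrote2, fp2 = export_if_changed(b"age,income\n34,52000\n", {"format": "CSV"}, fp1)
# wrote1 is True, wrote2 is False
```

Changing either the data bytes or any option produces a different fingerprint, which is why the module rewrites the blob whenever parameters change.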

Can I save data to an account in a different geographical region?

Yes, you can write data to accounts in different regions. However, if the storage account is in a different region from the compute node used for the machine learning experiment, data access might be slower. Also, you are charged for data ingress and egress on the subscription.

Module parameters

General options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Data source | List | Data Source Or Sink | Azure Blob Storage | The destination can be a file in Azure Blob storage, an Azure table, a table or view in an Azure SQL Database, or a Hive table. |
| Use cached results | TRUE/FALSE | Boolean | FALSE | The module executes only if a valid cache does not exist; otherwise it uses cached data from a prior execution. |
| Please specify authentication type | SAS/Account | AuthenticationType | Account | Indicates whether SAS or account credentials should be used for access authorization |

Public or SAS - Public storage options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| SAS URI for blob | any | String | none | The SAS URI of the blob to be written to (required) |
| File format for SAS file | ARFF, CSV, or TSV | LoaderUtils.FileTypes | CSV | Indicates whether the file is CSV, TSV, or ARFF (required) |
| Write SAS header row | TRUE/FALSE | Boolean | FALSE | Indicates whether column headings should be written to the file |

Account - Private storage options

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Azure account name | any | String | none | Azure user account name |
| Azure account key | any | SecureString | none | Azure storage key |
| Path to blob beginning with container | any | String | none | Name of the blob file, beginning with the container name |
| Azure Blob Storage write mode | List: Error, Overwrite | enum:BlobFileWriteMode | Error | Choose the method of writing blob files |
| File format for blob file | ARFF, CSV, or TSV | LoaderUtils.FileTypes | CSV | Indicates whether the blob file is CSV, TSV, or ARFF |
| Write blob header row | TRUE/FALSE | Boolean | FALSE | Indicates whether the blob file should have a header row |

Exceptions

| Exception | Description |
| --- | --- |
| Error 0027 | An exception occurs when two objects must be the same size, but they are not. |
| Error 0003 | An exception occurs if one or more inputs are null or empty. |
| Error 0029 | An exception occurs when an invalid URI is passed. |
| Error 0030 | An exception occurs when it is not possible to download a file. |
| Error 0002 | An exception occurs if one or more parameters could not be parsed or converted from the specified type to the type required by the target method. |
| Error 0009 | An exception occurs if the Azure storage account name or the container name is specified incorrectly. |
| Error 0048 | An exception occurs when it is not possible to open a file. |
| Error 0046 | An exception occurs when it is not possible to create a directory on the specified path. |
| Error 0049 | An exception occurs when it is not possible to parse a file. |

For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.

See also

Import Data
Export Data
Export to Azure SQL Database
Export to Hive Query
Export to Azure Table