Export Data

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Writes a dataset to various forms of cloud-based storage in Azure, such as tables, blobs, and Azure SQL databases.

Category: Data Input and Output

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Export Data module in Machine Learning Studio (classic) to save results, intermediate data, and working data from your experiments to cloud storage destinations outside Machine Learning Studio (classic).

This module supports exporting or saving your data to the following cloud data services:

  • Export to Hive Query: Write data to a Hive table in an HDInsight Hadoop cluster.

  • Export to Azure SQL Database: Save data to Azure SQL Database or to Azure SQL Data Warehouse.

  • Export to Azure Table: Save data to the table storage service in Azure. Table storage is good for storing large amounts of data. It provides a tabular format that is scalable, inexpensive, and highly available.

  • Export to Azure Blob Storage: Save data to the Blob service in Azure. This option is useful for images, unstructured text, or binary data. Data in the Blob service can be shared publicly or saved in secured application data stores.

Note

The Export Data module does not support connecting to an Azure Blob storage account if the "Secure Transfer Required" option is enabled.

  • Download data: To download your data so that you can open it in Excel or another application, use a module such as Convert to CSV or Convert to TSV to prepare the data in a particular format, and then download the data.

  • You can download the results of any module that outputs a dataset by right-clicking the output and selecting Download dataset. By default, the data is exported in CSV format.

  • Download a module definition or experiment graph: A new PowerShell library lets you download the complete metadata for your experiment, or the details for a particular module. The PowerShell for Machine Learning library is an experimental release, but has many useful cmdlets:

    • Get-AmlExperiment lists all the experiments in a workspace.
    • Export-AmlExperimentGraph exports a definition of the complete experiment to a JSON file.
    • Download-AmlExperimentNodeOutput lets you extract the information provided on the output ports of any module.
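As a rough illustration of the Download data option above: once a dataset has been downloaded in CSV format, it can be opened in other tools, including a script. The sketch below uses Python's standard csv module. The function name load_exported_csv and the sample column names are hypothetical, and the in-memory string stands in for a downloaded file.

```python
import csv
import io


def load_exported_csv(handle, has_header=True):
    """Parse CSV text from an open file-like object into (header, rows).

    `has_header` is an assumption about the downloaded file; set it to
    False if the file was prepared without a header row.
    """
    rows = list(csv.reader(handle))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


# Hypothetical sample standing in for a downloaded dataset.
sample = io.StringIO("age,income\n39,50000\n51,62000\n")
header, rows = load_exported_csv(sample)
```

With a real file, pass a handle opened with `open(path, newline="")` instead of the `io.StringIO` object. All values come back as strings, so numeric columns need an explicit conversion.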

How to configure Export Data

  1. Add the Export Data module to your experiment in Studio (classic). You can find this module in the Input and Output category.

  2. Connect Export Data to the module that contains the data you want to export.

  3. Double-click Export Data to open the Properties pane.

  4. For Data destination, select the type of cloud storage where you'll save your data. Changing this option resets all other properties, so be sure to choose it first.

  5. Provide the account name and authentication method required to access the specified storage account.

    Depending on the storage type and whether the account is secured, you might need to provide the account name, file type, access key, or container name. For sources that do not require authentication, generally it is sufficient to know the URL.

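    For examples of each type, see the topics for the individual destinations: Export to Hive Query, Export to Azure SQL Database, Export to Azure Table, and Export to Azure Blob Storage.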

  6. The Use cached results option lets you rerun the experiment without rewriting the same results each time.

    If you deselect this option, results are written to storage each time the experiment is run, regardless of whether the output data has changed.

    If you select this option, Export Data uses cached data, if available. New results are generated only when there is an upstream change that would affect the results.

  7. Run the experiment.
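The behavior of Use cached results described in step 6 can be summarized as a small decision function. This is only a conceptual model of the documented behavior, not the module's implementation. The function and parameter names are hypothetical, and the case where the option is selected but no cached data exists is inferred from the phrase "if available".

```python
def should_write(use_cached_results: bool,
                 upstream_changed: bool,
                 cache_available: bool) -> bool:
    """Return True when Export Data would write results to storage."""
    if not use_cached_results:
        # Option deselected: results are written on every run,
        # whether or not the output data has changed.
        return True
    if upstream_changed:
        # An upstream change that affects the results forces a rewrite.
        return True
    # Option selected and nothing changed: reuse cached data if it exists.
    # Writing when no cache exists is an assumption, not stated in the text.
    return not cache_available


# Nothing changed and a cache exists, so the write is skipped.
skip_case = should_write(True, upstream_changed=False, cache_available=True)
```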

Examples

For examples of how to use the Export Data module, see the Azure AI Gallery.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.

Implementation details

  • This module was previously named Writer. If you have an existing experiment that uses the Writer module, the module is renamed to Export Data when you refresh the experiment.

  • Not all modules produce output that is compatible with Export Data destinations. For example, Export Data cannot save a dataset that has been converted to the SVMLight format. Export Data supports these formats:

    • Dataset (Azure ML internal format)
    • .NET DataTable
    • CSV with or without headers
    • TSV with or without headers
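To make the difference between the delimited formats above concrete, the following sketch serializes the same rows as CSV with a header and as TSV without one. It uses only Python's standard csv module and does not reproduce how Studio (classic) writes files. The serialize function and the sample data are hypothetical.

```python
import csv
import io


def serialize(header, rows, delimiter=",", include_header=True):
    """Render rows as delimited text, optionally preceded by a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    if include_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header = ["id", "label"]
rows = [[1, "a"], [2, "b"]]

csv_with_header = serialize(header, rows)
tsv_without_header = serialize(header, rows, delimiter="\t",
                               include_header=False)
```

A file written without headers carries no column names, so whatever reads it later has to supply them separately.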

Known issues

  • When you select Azure Table as the output location, an error occasionally occurs when writing to the specified table. When this happens, the data might be written to a blob instead.

    If this error happens and later you are unable to read from the expected table, try using an Azure storage utility to check the blobs in the specified container in your storage account.

  • Currently, you cannot save a blob into a specified Hive table. If you need to write intermediate results, avoid using a Hive table in HDInsight, and use blob storage or table storage instead.

  • Currently, if you select HDFS as the location to save output data, this error message is returned: “Microsoft.Analytics.Exceptions.ErrorMapping+ModuleException.”

Expected inputs

| Name | Type | Description |
| --- | --- | --- |
| Dataset | Data Table | The dataset to be written. |

Module parameters

This table lists parameters that apply to all Export Data options. Other parameters are dynamic and change depending on the data destination you select.

| Name | Range | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Please specify data destination | List | DataSourceOrSink | Blob service in Azure Storage | Indicate whether the data destination is a file in the Blob service, a table in the Table service, a SQL database in Azure, or a Hive table. |
| Use cached results | TRUE/FALSE | Boolean | FALSE | Select this option to avoid rewriting results unnecessarily. If anything changes upstream in the experiment, Export Data always executes and writes new results. However, if nothing has changed and you have selected this option, Export Data does not execute, which avoids rewriting the same results. |

Exceptions

| Exception | Description |
| --- | --- |
| Error 0057 | An exception occurs when attempting to create a file or blob that already exists. |
| Error 0001 | An exception occurs if one or more specified columns of the dataset couldn't be found. |
| Error 0027 | An exception occurs when two objects have to be of the same size, but they are not. |
| Error 0079 | An exception occurs if the container name in Azure Storage is specified incorrectly. |
| Error 0052 | An exception occurs if the storage access key for the Azure account is specified incorrectly. |
| Error 0064 | An exception occurs if the account name or storage access key for the Azure account is specified incorrectly. |
| Error 0071 | An exception occurs if the provided credentials are incorrect. |
| Error 0018 | An exception occurs if the input dataset is not valid. |
| Error 0029 | An exception occurs when an invalid URI is passed. |
| Error 0003 | An exception occurs if one or more inputs are null or empty. |
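When triaging failed runs, it can help to map an error code found in a message back to the descriptions above. The sketch below is one possible way to do that. The "Error NNNN" message pattern is an assumption based on how the codes are written in this article, and the dictionary and function names are hypothetical. The descriptions are shortened paraphrases of the listed exceptions.

```python
import re

# Shortened descriptions of the Export Data exceptions listed above.
EXPORT_DATA_ERRORS = {
    "0057": "File or blob already exists.",
    "0001": "One or more specified columns could not be found.",
    "0027": "Two objects must be the same size but are not.",
    "0079": "Container name in Azure Storage is specified incorrectly.",
    "0052": "Storage access key is specified incorrectly.",
    "0064": "Account name or storage access key is specified incorrectly.",
    "0071": "Provided credentials are incorrect.",
    "0018": "Input dataset is not valid.",
    "0029": "An invalid URI was passed.",
    "0003": "One or more inputs are null or empty.",
}

# Assumed message shape: the word "Error" followed by a four-digit code.
_ERROR_CODE = re.compile(r"Error\s+(\d{4})")


def describe_error(message):
    """Return the description for the first known code in message, else None."""
    match = _ERROR_CODE.search(message)
    if match is None:
        return None
    return EXPORT_DATA_ERRORS.get(match.group(1))


# Hypothetical log line used only to exercise the lookup.
hint = describe_error("Error 0079: could not open container")
```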

For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.

See also

Import Data
Data Input and Output
Data Transformation
Comparing Azure Table Storage and Azure SQL Database
A-Z Module List