Data Science Virtual Machine data ingestion tools
At an early stage in a data science or AI project, you must identify the needed datasets, and then bring them into your analytics environment. The Data Science Virtual Machine (DSVM) provides tools and libraries to bring data from different sources into local analytical data storage resources on the DSVM. The DSVM can also bring data into a data platform located either on the cloud or on-premises.
The DSVM offers these data movement tools:
Azure CLI
Category | Value |
---|---|
What is it? | A management tool for Azure. It offers command verbs to move data from Azure data platforms - for example, Azure Blob storage and Azure Data Lake Store |
Supported DSVM versions | Windows, Linux |
Typical uses | Import and export data between Azure Storage and Azure Data Lake Store |
How to use / run it? | Open a command prompt, and type az to get help. |
Links to samples | Using Azure CLI |
AzCopy
Category | Value |
---|---|
What is it? | A tool to copy data between local files, Azure Blob storage, files, and tables |
Supported DSVM versions | Windows |
Typical uses | Copy files to Azure Blob storage Copy blobs between accounts |
How to use / run it? | Open a command prompt, and type azcopy to get help. |
Links to samples | AzCopy on Windows |
Azure Cosmos DB Data Migration tool
Category | Value |
---|---|
What is it? | Tool to import data from various sources into Azure Cosmos DB, a NoSQL database in the cloud. These sources include JSON files CSV files SQL MongoDB Azure Table storage Amazon DynamoDB Azure Cosmos DB for NoSQL collections |
Supported DSVM versions | Windows |
Typical uses | Import files from a VM to Azure Cosmos DB import data from Azure table storage to Azure Cosmos DB import data from a Microsoft SQL Server database to Azure Cosmos DB |
How to use / run it? | To use the command-line version, open a command prompt and type dt . To use the GUI tool, open a command prompt and type dtui |
Links to samples | Import data into Azure Cosmos DB |
Azure Storage Explorer
Category | Value |
---|---|
What is it? | Graphical User Interface to interact with files stored in the Azure cloud |
Supported DSVM versions | Windows |
Typical uses | Import data to and export data from the DSVM |
How to use / run it? | Search for "Azure Storage Explorer" in the Start menu |
Links to samples | Azure Storage Explorer |
bcp
Category | Value |
---|---|
What is it? | SQL Server tool to copy data between SQL Server and a data file |
Supported DSVM versions | Windows |
Typical uses | Import a CSV file into a SQL Server table Export a SQL Server table to a file |
How to use / run it? | Open a command prompt, and type bcp to get help |
Links to samples | bcp utility |
blobfuse
Category | Value |
---|---|
What is it? | A tool to mount an Azure Blob storage container in the Linux file system |
Supported DSVM versions | Linux |
Typical uses | Read from and write to blobs in a container |
How to use and run it? | Run blobfuse at a terminal |
Links to samples | blobfuse on GitHub |