Read a TabularDataset in AML SDK v2
Jorge Lopez
When I try to use a TabularDataset (registered with SDK v1) as a job input in the new SDK v2, it raises an error. I have followed the instructions in the documentation.
My TabularDataset is built from a Blob Storage container with thousands of .csv files, and it still loads correctly with SDK v1.
Here is my code:
from azure.ai.ml import command, Input, Output
from azure.ai.ml.constants import AssetTypes, InputOutputModes
data_asset = ml_client.data.get(name="tabular_dataset", version=1)
train_job = command(
    code='./',
    inputs={"data": Input(type=AssetTypes.MLTABLE, path=data_asset, mode=InputOutputModes.DIRECT)},
    outputs={"output_folder": Output(type=AssetTypes.MLFLOW_MODEL)},
    command='python conv_sdk_v2.py --data ${{inputs.data}} --output_folder ${{outputs.output_folder}}',
    environment=f'{experiment_env.name}:{experiment_env.version}',
    compute=compute_name_training,
    experiment_name='sdk-v2-experiment-train',
)
train_job_output = ml_client.create_or_update(train_job)
And the exception raised:
Exception:
1) One or more fields are invalid
Details:
(x) Could not parse creation_context:
created_at: '2023-01-23T08:39:16.388014+00:00'
created_by: ****
created_by_type: User
last_modified_at: '2023-01-23T08:39:16.388014+00:00'
last_modified_by: ****
last_modified_by_type: User
id: /subscriptions/****/resourceGroups/****/providers/Microsoft.MachineLearningServices/workspaces/****/data/tabular_dataset/versions/1
name: tabular_dataset
path: azureml://subscriptions/****/resourcegroups/****/workspaces/****/datastores/datastore_name/paths/*.csv/
properties:
v1_type: tabular
tags: {}
type: mltable
version: '1'
. If providing an ARM id, it should start with a '/'.
Resolutions:
1) Double-check that all specified parameters are of the correct types and formats prescribed by the ArmResource schema.
If using the CLI, you can also check the full log in debug mode for more details by adding --debug to the end of your command
Additional Resources: The easiest way to author a yaml specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning. To set up VS Code, visit https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code
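The last line of the message ("If providing an ARM id, it should start with a '/'") suggests that `path` is being parsed as a string, whereas the code above passes the whole data asset object. Below is a minimal sketch of one possible workaround, assuming the object returned by `ml_client.data.get` exposes its ARM id as an `id` attribute (as the `id:` field in the dump suggests). `FakeDataAsset` and `input_path_for` are hypothetical names used only for illustration and are not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class FakeDataAsset:
    """Hypothetical stand-in for the object returned by ml_client.data.get(...)."""
    id: str
    name: str
    version: str


def input_path_for(asset) -> str:
    """Return a string suitable for Input(path=...).

    Strings are passed through unchanged; for asset objects the ARM id is
    used, which (per the error message) is expected to start with '/'.
    """
    if isinstance(asset, str):
        return asset
    arm_id = getattr(asset, "id", None)
    if not arm_id or not arm_id.startswith("/"):
        raise ValueError("expected a data asset with an ARM id starting with '/'")
    return arm_id


# Assumed usage in the job definition (not executed here):
#   Input(type=AssetTypes.MLTABLE,
#         path=input_path_for(data_asset),   # i.e. data_asset.id
#         mode=InputOutputModes.DIRECT)
```

In the job above this would amount to writing `path=data_asset.id` instead of `path=data_asset`. Whether that alone resolves the error for a v1 TabularDataset is untested here.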
Alternatively, I have tried to create a proper MLTable (SDK v2 native) from the Blob Storage container, but I haven't managed to (there is a lack of documentation and examples for this).
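For reference, a native MLTable is a folder containing a YAML file named `MLTable` that lists the source paths and the read transformations. The sketch below writes such a file with plain Python. The `paths`/`pattern` and `read_delimited` keys are my understanding of the MLTable schema, and the helper `write_mltable`, the folder name, and the short-form datastore URI are assumptions for illustration rather than a verified recipe.

```python
from pathlib import Path

# Assumed MLTable layout: a glob pattern over the .csv files plus a
# read_delimited transformation describing how to parse them.
MLTABLE_TEMPLATE = """\
$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable
paths:
  - pattern: "{pattern}"
transformations:
  - read_delimited:
      delimiter: ","
      encoding: utf8
      header: all_files_same_headers
"""


def write_mltable(folder: str, pattern: str) -> Path:
    """Create `folder` if needed and write an MLTable file pointing at `pattern`."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    mltable_file = target / "MLTable"  # the file name has no extension
    mltable_file.write_text(MLTABLE_TEMPLATE.format(pattern=pattern), encoding="utf-8")
    return mltable_file


# Example call (hypothetical folder name; datastore name taken from the dump above):
#   write_mltable("./my_mltable", "azureml://datastores/datastore_name/paths/*.csv")
```

The resulting folder could then presumably be registered as a data asset of type `AssetTypes.MLTABLE` through `ml_client.data.create_or_update` and passed to the job as an `Input`. That step is not shown or tested here.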