Read a TabularDataset in AML SDK v2

Jorge Lopez 36 Reputation points
2023-11-01T12:30:21.8+00:00

When I try to load a TabularDataset (v1) in the new SDK v2 as input for a job, it raises an error. I have followed the instructions in the documentation.

My TabularDataset is loaded from a Blob Storage container with thousands of .csv files, and it still works with the SDK v1.

Here is my code:

from azure.ai.ml import command, Input, Output
from azure.ai.ml.constants import AssetTypes, InputOutputModes

data_asset = ml_client.data.get(name="tabular_dataset", version=1)

train_job = command(
    code='./',
    inputs = {"data": Input(type = AssetTypes.MLTABLE, path = data_asset, mode = InputOutputModes.DIRECT)},
    outputs = {"output_folder": Output(type=AssetTypes.MLFLOW_MODEL)},
    command = 'python conv_sdk_v2.py --data ${{inputs.data}} --output_folder ${{outputs.output_folder}}',
    environment = f'{experiment_env.name}:{experiment_env.version}',
    compute = compute_name_training,
    experiment_name='sdk-v2-experiment-train',
)
train_job_output = ml_client.create_or_update(train_job)

And the exception raised:

Exception: 


1) One or more fields are invalid

Details: 

(x) Could not parse creation_context:
  created_at: '2023-01-23T08:39:16.388014+00:00'
  created_by: ****
  created_by_type: User
  last_modified_at: '2023-01-23T08:39:16.388014+00:00'
  last_modified_by: ****
  last_modified_by_type: User
id: /subscriptions/****/resourceGroups/****/providers/Microsoft.MachineLearningServices/workspaces/****/data/tabular_dataset/versions/1
name: tabular_dataset
path: azureml://subscriptions/****/resourcegroups/****/workspaces/****/datastores/datastore_name/paths/*.csv/
properties:
  v1_type: tabular
tags: {}
type: mltable
version: '1'
. If providing an ARM id, it should start with a '/'.

Resolutions: 
1) Double-check that all specified parameters are of the correct types and formats prescribed by the ArmResource schema.
If using the CLI, you can also check the full log in debug mode for more details by adding --debug to the end of your command

Additional Resources: The easiest way to author a yaml specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning. To set up VS Code, visit https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code

Alternatively, I have tried to create a proper MLTable (SDK v2 native) from the Blob Storage container, but I haven't managed to (there is a lack of documentation and examples about it).
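For reference, a v2-native MLTable is a folder containing a YAML file named MLTable that lists the source paths and the read transformations. The sketch below only writes such a file using the standard library; it is a minimal illustration, not a verified fix for the error above. The helper name write_mltable, the folder name, and the datastore URI are hypothetical placeholders, and the CSV options (comma delimiter, UTF-8, same header in every file) are assumptions about the data.

```python
from pathlib import Path
import tempfile

# Hypothetical placeholder: substitute the real datastore name and folder.
CSV_PATTERN = "azureml://datastores/<datastore_name>/paths/<folder>/*.csv"

# Assumed CSV layout: comma-delimited, UTF-8, identical header in every file.
MLTABLE_TEMPLATE = """\
$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable
paths:
  - pattern: {pattern}
transformations:
  - read_delimited:
      delimiter: ','
      encoding: utf8
      header: all_files_same_headers
"""


def write_mltable(folder, pattern):
    """Write an MLTable definition file into `folder` and return its path."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    mltable_file = folder / "MLTable"  # the file has no extension
    mltable_file.write_text(MLTABLE_TEMPLATE.format(pattern=pattern), encoding="utf-8")
    return mltable_file


if __name__ == "__main__":
    # Write into a temporary folder and show the generated definition.
    target = Path(tempfile.mkdtemp()) / "my_mltable"
    print(write_mltable(target, CSV_PATTERN).read_text(encoding="utf-8"))
```

The resulting folder could then be registered as a data asset of type mltable and passed to the job by its id or by name and version, although whether that resolves the exception above is untested here.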

edit: typo

Azure Machine Learning