Getting invalid dataflow in Azure ML Studio batch inference. Worked weeks ago
I have some code that performs batch inference on Azure. The directories I want to run inference on are loaded into an MLTable file, which I then pass to the batch inference pipeline component. It worked fine when I last modified it a few weeks ago, but when I ran it yesterday it failed with an error saying the dataflow is invalid:
[2023-10-11 16:07:46Z] Job failed, job RunId is <run id>. Error: {"Error":{"Code":"UserError","Severity":null,"Message":"
Error Code: ScriptExecution.StreamAccess.Unexpected
Native Error: error in streaming from input data sources
StreamError(Unknown(\"Dataflow at inmemory://dataflow/<dataflow id> is not valid.\", Some(DataflowInvalid(\"inmemory://dataflow/<dataflow id>\", VisitError(ExecutionError(StreamError(InvalidInput(InvalidUri { message: \"invalid uri format\", uri: \"azureml://subscriptions/<subscription id>/resourcegroups/<resource group name>/workspaces/<workspace name>/datastores/workspaceblobstore/paths/LocalUpload/<another id>/input_data/azureml://datastores/<datastore name>/paths/<folder name>/**.png\" }))))))))
=> Dataflow at inmemory://dataflow/<dataflow id> is not valid.
Unknown(\"Dataflow at inmemory://dataflow/<dataflow id> is not valid.\", Some(DataflowInvalid(\"inmemory://dataflow/<dataflow id>\", VisitError(ExecutionError(StreamError(InvalidInput(InvalidUri { message: \"invalid uri format\", uri: \"azureml://subscriptions/<subscription id>/resourcegroups/<resource group name>/workspaces/<workspace name>/datastores/workspaceblobstore/paths/LocalUpload/<another id>/input_data/azureml://datastores/<datastore name>/paths/<folder name>/**.png\" })))))))
=> Dataflow at inmemory://dataflow/<dataflow id> is not valid.
DataflowInvalid(\"inmemory://dataflow/<dataflow id>\", VisitError(ExecutionError(StreamError(InvalidInput(InvalidUri { message: \"invalid uri format\", uri: \"azureml://subscriptions/<subscription id>/resourcegroups/<resource group name>/workspaces/<workspace name>/datastores/workspaceblobstore/paths/LocalUpload/<another id>/input_data/azureml://datastores/<datastore name>/paths/<folder name>/**.png\" })))))
Error Message: Got unexpected error: Dataflow at inmemory://dataflow/<dataflow id> is not valid.. DataflowInvalid(\"inmemory://dataflow/<dataflow id>\", VisitError(ExecutionError(StreamError(InvalidInput(InvalidUri { message: \"invalid uri format\", uri: \"azureml://subscriptions/<subscription id>/resourcegroups/<resource group name>/workspaces/<workspace name>/datastores/workspaceblobstore/paths/LocalUpload/<another id>/input_data/azureml://datastores/<datastore name>/paths/<folder name>/**.png\" })))))| session_id=<session id>","MessageFormat":null,"MessageParameters":{},"ReferenceCode":null,"DetailsUri":null,"Target":null,"Details":[],"InnerError":null,"DebugInfo":null,"AdditionalInfo":null},"Correlation":null,"Environment":null,"Location":null,"Time":"0001-01-01T00:00:00+00:00","ComponentName":"CommonRuntime"}
(I've cleaned out all IDs as I don't know what's sensitive)
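Looking at the `uri` field in the trace, the short-form URI appears to have been appended to the MLTable's upload location (`.../LocalUpload/<another id>/input_data/`) instead of being used on its own. A minimal sketch for spotting this in a failing URI, using only the standard library; `find_embedded_uri` is a hypothetical helper name and not part of any Azure SDK:

```python
from typing import Optional, Tuple


def find_embedded_uri(uri: str, scheme: str = "azureml://") -> Optional[Tuple[str, str]]:
    """Return (prefix, embedded) if a second `scheme` appears inside `uri`, else None."""
    if not uri.startswith(scheme):
        return None
    # Search for the scheme again, skipping the one at the very start.
    second = uri.find(scheme, len(scheme))
    if second == -1:
        return None
    return uri[:second], uri[second:]


# The URI from the error above, with the same placeholders.
failing = (
    "azureml://subscriptions/<subscription id>/resourcegroups/<resource group name>"
    "/workspaces/<workspace name>/datastores/workspaceblobstore/paths/LocalUpload"
    "/<another id>/input_data/azureml://datastores/<datastore name>/paths/<folder name>/**.png"
)

result = find_embedded_uri(failing)
if result is not None:
    prefix, embedded = result
    print("prefix:  ", prefix)
    print("embedded:", embedded)
```

If the embedded part matches the string that was written into the MLTable, that suggests the path is being resolved relative to the MLTable's own location, though the trace alone does not confirm why.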
I had been using short form URIs like
azureml://datastores/{datastore}/paths/{folder}/**.png
as used here, for example, but this appears to have suddenly broken. I tried switching to fully qualified URIs (including my subscription ID, resource group, etc.), but that doesn't work either: an extra slash, which I didn't have in my code, seems to get added:
azureml uri must follow pattern azureml://subscriptions/<subscription>/resourcegroups/<resourcegroup>/workspaces/<workspace>/...", uri: "azureml:///subscriptions/<subscription id>/resourcegroups/<resource group name>/workspaces/<workspace name>/datastores/<datastore name>/paths/<folder name>/**.png"
Note that I replaced the subscription name etc. in the second string.
The Python string I had defined was
f'azureml://datastores/<datastore name>/paths/{folder}/**.png'
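For reference, here is a small sketch of how such a string can be sanity-checked before it is written into the MLTable. The regular expressions are only an approximation of the two shapes quoted above (the short form, and the long form from the error message with a `datastores/.../paths/...` tail assumed); they are not an official grammar. `classify_azureml_uri` and the `folder` value are hypothetical.

```python
import re

SCHEME = "azureml://"

# Approximate shapes, assumed from the URIs quoted in this post.
SHORT_FORM = re.compile(r"^azureml://datastores/[^/]+/paths/.+$")
LONG_FORM = re.compile(
    r"^azureml://subscriptions/[^/]+/resourcegroups/[^/]+"
    r"/workspaces/[^/]+/datastores/[^/]+/paths/.+$"
)


def classify_azureml_uri(uri: str) -> str:
    """Return 'short', 'long', or 'invalid' for an azureml:// datastore URI."""
    # A second scheme inside the string means two URIs were concatenated.
    if uri.count(SCHEME) != 1:
        return "invalid"
    if LONG_FORM.match(uri):
        return "long"
    if SHORT_FORM.match(uri):
        return "short"
    return "invalid"


folder = "images"  # hypothetical folder name
uri = f"azureml://datastores/<datastore name>/paths/{folder}/**.png"
print(classify_azureml_uri(uri))
```

A check like this only confirms that the string leaving the Python code has the expected shape; it says nothing about how the service rewrites the path afterwards.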
I need this fixed quite urgently, so any help would be appreciated.