I can't trigger an Azure ML pipeline from Synapse because of this error: User starting the run is not an owner or assigned user to the Compute Instance

Question

MarwanSamrout-7915 40

I have a working pipeline in Azure ML that I ran successfully from a notebook in the Azure ML workspace. I published the pipeline so I can use it in Synapse Analytics, but when I try to trigger it from Synapse I get this error:
Failed to submit job due to Exception: Response status code does not indicate success: 400 (User starting the run is not an owner or assigned user to the Compute Instance). User starting the run is not an owner or assigned user to the Compute Instance.
Here is my component code. I tried with both a compute cluster and a compute instance and I get the same issue.

import argparse
from azureml.core import Dataset
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
# import mlflow


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="data lake gen2 path for input data")
    parser.add_argument("--path", type=str, help="path for output")
    parser.add_argument("--outputdata", type=str, help="data lake gen2 path for output data")
    args = parser.parse_args()
    print("test testtest testtest test")
    # mlflow.start_run()

    # Load the input CSV as a tabular dataset (placeholder path).
    data_path = "csv web path"
    # mlflow.log_metric("data_path", args.data)
    dataset = Dataset.Tabular.from_delimited_files(data_path, separator=',', set_column_types=None)
    # Convert once and reuse the DataFrame.
    df = dataset.to_pandas_dataframe()
    print(df.head())

    # Authenticate to the data lake with whatever identity the compute provides.
    token_credential = DefaultAzureCredential()
    account_url = "ACCOUNTURl LINK"  # placeholder
    service_client = DataLakeServiceClient(account_url, credential=token_credential)
    file_system_client = service_client.get_file_system_client("dataasset2")
    directory_client = file_system_client.get_directory_client("test")

    # Upload the DataFrame as CSV under the file name passed in --path.
    file_name = args.path
    file_client = directory_client.get_file_client(file_name)
    csv_data = df.to_csv(index=False).encode('utf-8')
    file_client.upload_data(csv_data, overwrite=True)
    # mlflow.end_run()


if __name__ == "__main__":
    main()

@Amira Bedhiafi

1 answer

Answer 1

Amira Bedhiafi 33,071 Volunteer Moderator

An Azure ML compute instance is assigned to a single user, who has root access on it. This configuration ensures that every operation and experiment job on the instance runs under that user's identity and Azure RBAC permissions. When an automated process submits the run under a different account than the instance owner, the submission can fail with this error. To solve this, either run the pipeline on a compute cluster, or start the run as the user who owns the compute instance. Here are older threads that may help:
https://learn.microsoft.com/en-us/answers/questions/1350625/the-user-starting-the-run-is-not-an-owner-or-assig
https://learn.microsoft.com/en-us/answers/questions/661588/executing-pipeline-in-aml-from-adf-suddenly-stoppe
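
As a rough illustration of the compute cluster option, here is a minimal sketch using the v1 SDK (azureml-core), which the question's script already imports. The cluster name "cpu-cluster", the VM size, and the node counts are hypothetical values chosen for the example, not taken from this thread. The sketch also requests a system-assigned identity, on the assumption that the cluster will need one to reach storage.

```python
# Hypothetical names for illustration only; replace with your own values.
CLUSTER_NAME = "cpu-cluster"
VM_SIZE = "STANDARD_DS3_V2"


def cluster_settings(vm_size=VM_SIZE, max_nodes=2):
    """Build keyword arguments for AmlCompute.provisioning_configuration."""
    return {
        "vm_size": vm_size,
        "min_nodes": 0,                     # scale down to zero when idle
        "max_nodes": max_nodes,
        "identity_type": "SystemAssigned",  # give the cluster a managed identity
    }


def get_or_create_cluster(workspace, name=CLUSTER_NAME):
    """Return the named cluster, creating it if it does not exist yet.

    Needs the azureml-core package; imports are local so the helper
    above can be used without it.
    """
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if it is already in the workspace.
        return ComputeTarget(workspace=workspace, name=name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(**cluster_settings())
        target = ComputeTarget.create(workspace, name, config)
        target.wait_for_completion(show_output=True)
        return target
```

A possible call is get_or_create_cluster(Workspace.from_config()). The pipeline steps would then need to target this cluster, and the pipeline would likely have to be republished before Synapse picks up the change.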

MarwanSamrout-7915 40 Reputation points

2024-01-29T17:49:25.27+00:00

I'm the owner of the compute instance, and I'm trying to trigger the pipeline from a Synapse pipeline with the same account.

MarwanSamrout-7915 40

I'm getting this error when I try to run the pipeline from Azure ML (notebook) using a compute cluster:


Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details. Error:     dataflow = _transform_and_validate(
  File "/azureml-envs/sklearn-1.1/lib/python3.8/site-packages/azureml/data/dataset_factory.py", line 1225, in _transform_and_validate
    _validate_has_data(dataflow, 'Failed to validate the data.')
  File "/azureml-envs/sklearn-1.1/lib/python3.8/site-packages/azureml/data/dataset_error_handling.py", line 69, in _validate_has_data
    raise DatasetValidationError(error_message + '\n' + e.compliant_message, exception=e)
azureml.data.dataset_error_handling.DatasetValidationError: DatasetValidationError:
	Message: Failed to validate the data.
ScriptExecutionException was caused by StreamAccessException.
  Unable to authenticate data access to '[REDACTED]'.
    AuthenticationException was caused by AzureIdentityAccessTokenResolutionException.
      Compute has no identity provisioned.
| session_id=771c548f-2dbd-4bce-a8b9-eec29c26257b
	InnerException None
	ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Failed to validate the data.\nScriptExecutionException was caused by StreamAccessException.\n  Unable to authenticate data access to '[REDACTED]'.\n    AuthenticationException was caused by AzureIdentityAccessTokenResolutionException.\n      Compute has no identity provisioned.\n| session_id=771c548f-2dbd-4bce-a8b9-eec29c26257b"
    }
}

Amira Bedhiafi 33,071 Reputation points Volunteer Moderator

2024-01-29T19:13:51.2533333+00:00

You have an authentication issue; see https://learn.microsoft.com/en-us/azure/machine-learning/how-to-identity-based-service-authentication?view=azureml-api-2&tabs=cli

MarwanSamrout-7915 40 Reputation points

2024-01-29T19:31:01.99+00:00

I'm stuck on this step. I did assign a system-assigned identity, but when I go to Azure Storage to assign a role to the given principal ID I can't; I can only search by name.

Mikhail Agladze 0 Reputation points

2024-09-01T22:04:42.37+00:00

You are a saint... this was the problem... had to use a compute cluster instead of a compute instance.
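
For the earlier step where the portal only allowed searching by name, one possible workaround is to create the role assignment programmatically, addressing the identity by its principal (object) ID. The following is a minimal sketch, assuming a recent azure-mgmt-authorization package and that "Storage Blob Data Contributor" is the role the script needs; the thread does not name a role, so treat that choice and all argument values as assumptions.

```python
import uuid

# Built-in role definition GUID for "Storage Blob Data Contributor".
# Assumption: this is the role needed to write files to the data lake.
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"


def storage_scope(subscription_id, resource_group, account_name):
    """ARM resource ID of the storage account, used as the assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_definition_id(subscription_id, role_guid=STORAGE_BLOB_DATA_CONTRIBUTOR):
    """Fully qualified ID of a built-in role definition."""
    return (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
        f"/roleDefinitions/{role_guid}"
    )


def assign_role(subscription_id, resource_group, account_name, principal_id):
    """Grant the role to a managed identity, identified by its principal ID.

    Needs azure-identity and azure-mgmt-authorization, and a caller who is
    allowed to create role assignments on the storage account.
    """
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient
    from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
    params = RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id(subscription_id),
        principal_id=principal_id,
        principal_type="ServicePrincipal",  # managed identities use this type
    )
    # Each role assignment is named by a new GUID.
    return client.role_assignments.create(
        storage_scope(subscription_id, resource_group, account_name),
        str(uuid.uuid4()),
        params,
    )
```

The principal ID passed in would be the one shown for the compute's system-assigned identity. Older versions of the management package shaped the parameters differently, so the model fields may need adjusting.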
