subtle-curry asked:

Azure ML workspace authorization failing when running via pipeline - OAuth 2.0 device flow error


I followed the instructions here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-machine-learning-pipelines to publish and run a pipeline.

But the pipeline step throws the following error:

"error": {
"message": "AADSTS70016: OAuth 2.0 device flow error. Authorization is pending. Continue polling. Timestamp: 2022-05-20 18:17:51Z"
}


From the error message, it looks like the pipeline step is stuck at the interactive authorization step and times out after 900 seconds.

At the moment we are just testing this in a lower environment, so I have not set up a service principal for authorization.

Can someone please suggest how to fix this?

azure-machine-learning

@tender-honey Thanks for the question. Can you please add more details about the step that throws this error?

Also, please share how the datastore is configured, and make sure you have "Storage Blob Data Reader" access.


It's at the beginning of the first step:

[2022-05-20T18:02:53.165265] Entering Run History Context Manager.
[2022-05-20T18:02:53.967379] Current directory: /mnt/batch/tasks/shared
[2022-05-20T18:02:53.967705] Preparing to call script [1.py] with arguments: ['arg_1', 'arg_2']
[2022-05-20T18:02:53.967742] After variable expansion, calling script [1.py] with arguments: ['arg_1', 'arg_2']

Performing interactive authentication. Please follow the instructions on the terminal.
2022/05/20 18:02:57 Not exporting to RunHistory as the exporter is either stopped or there is no data.
Stopped: false
OriginalData: 1
FilteredData: 0

Datastore type: Azure Blob Storage

The dataset is a Tabular dataset.

The Azure Machine Learning service has Storage Blob Data Contributor permission on the storage account, and I have Contributor permission.









@ramr-msft - I put in this single line, "workspace = Workspace.from_config()", and it failed. It looks like it's not able to authenticate to access the workspace.
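(For context: a pipeline step runs on remote compute with no terminal attached, so Workspace.from_config() falls back to interactive device-flow authentication and eventually times out. A minimal sketch of the usual alternative inside a submitted step, assuming the script runs on Azure ML compute rather than locally, is to recover the workspace from the run context:)

from azureml.core import Run

# Azure ML injects a run context into every submitted step; the workspace
# can be recovered from it without any interactive prompt. This only works
# for submitted runs, not for local execution of the script.
run = Run.get_context()
workspace = run.experiment.workspace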

ramr-msft answered:

@tender-honey Thanks for the details. We are able to authenticate successfully on our side. Interactive authentication uses your browser and requires cookies (including third-party cookies), so the error can occur if you have disabled cookies. It may also occur if you have enabled Azure AD Multi-Factor Authentication.

Please follow the documentation to set up authentication.
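For example, if interactive authentication is failing because of tenant or MFA issues, it can be pinned to a specific tenant (a sketch; the tenant ID is a placeholder):

from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# force=True re-prompts even if cached credentials exist; the tenant ID
# below is a placeholder.
interactive_auth = InteractiveLoginAuthentication(tenant_id="your-tenant-id", force=True)
ws = Workspace.from_config(auth=interactive_auth)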



Were you able to authenticate when running through a pipeline?
When I create the workspace from a notebook it works fine, but when running through a pipeline it fails.

I tried this:

from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
ws = Workspace(
    subscription_id="your-sub-id",
    resource_group="your-resource-group-id",
    workspace_name="your-workspace-name",
    auth=cli_auth,
)

but it gives:

"error": {
    "code": "UserError",
    "message": "Could not retrieve user token. Please run 'az login'"
}
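(This is expected on remote compute: AzureCliAuthentication only works on a machine where 'az login' has already been run and cached credentials exist. A quick local sanity check, as a sketch:)

from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
# Raises an authentication error if no cached 'az login' credentials exist
# on this machine, which is exactly the situation on pipeline compute.
header = cli_auth.get_authentication_header()
print("Found cached Azure CLI credentials:", bool(header))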

The documentation says it 'will automatically prompt you with a UI-based authentication flow'. How would such a prompt appear when the code is running through an ML pipeline?


Yes, my user has multi-factor auth enabled.

subtle-curry answered:

@ramr-msft - I ended up using a service principal to authenticate. I'm still not sure how to use interactive authorization in pipelines.
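(For reference, a minimal sketch of the service-principal route; the environment variable names below are placeholders for secrets that would be made available on the compute:)

import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder environment variables; in practice these would come from a
# secret store rather than being hard-coded in pipeline scripts.
sp_auth = ServicePrincipalAuthentication(
    tenant_id=os.environ["AML_TENANT_ID"],
    service_principal_id=os.environ["AML_CLIENT_ID"],
    service_principal_password=os.environ["AML_CLIENT_SECRET"],
)
ws = Workspace(
    subscription_id="your-sub-id",
    resource_group="your-resource-group-id",
    workspace_name="your-workspace-name",
    auth=sp_auth,
)

With this in place, nothing in the pipeline submission ever needs to prompt interactively.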

But now I am stuck at another point. Below is the code for my pipeline:

from azureml.core import Datastore, Environment, ScriptRunConfig
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

datastore_name = 'tmp'
datastore = Datastore.get(workspace, datastore_name)

# Output of step 1, to be written under a per-run folder on the datastore.
step1_output_data = OutputFileDatasetConfig(
    name="step1_output_data", destination=(datastore, "{run-id}/")
)

curated_env_name = 'my-env'
pytorch_env = Environment.from_conda_specification(
    name=curated_env_name, file_path='./conda_dependencies.yml'
)
cluster_name = 'cpu64'

src = ScriptRunConfig(
    source_directory='../../python-pipeline',
    script="1.py",
    compute_target=cluster_name,
    environment=pytorch_env,
)

step_1 = PythonScriptStep(
    name="step_1",
    script_name="1.py",
    source_directory='../../python-pipeline',
    compute_target=cluster_name,
    arguments=[step1_output_data],
    allow_reuse=True,
    runconfig=src.run_config,
)

step_2 = PythonScriptStep(
    name="step_2",
    script_name="2.py",
    source_directory='../../python-pipeline',
    compute_target=cluster_name,
    arguments=[step1_output_data.as_input(name='step1_output_data')],
    allow_reuse=True,
)

When I run the above pipeline, step_1 shows as complete, but when I read through the logs, I see this:

{"FileSystemName":"data","Uri":null,"Account":"storage_acc_name","RelativePath":"6666666-8888-4444-bbbb-fffffffffffff/step1_output_data","PathType":0,"AmlDataStoreName":"tmp"}

I would expect step1_output_data to be written out to ADLS Gen2, but since the Uri above is null, it is not writing anything.

As a result, step_2 fails with:

"error": {
"code": "UserError",
"message": "Cannot mount Dataset(id='6666666-8888-4444-bbbb-fffffffffffff', name='None', version=None). Error Message: DataAccessError(NotFound)"
}
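(One thing worth trying, assuming the 'tmp' datastore really is registered against the ADLS Gen2 account: declare the output in upload mode instead of the default mount mode, so the step's output directory is explicitly copied to the destination when the step finishes. A sketch, not a confirmed fix:)

from azureml.core import Datastore
from azureml.data import OutputFileDatasetConfig

datastore = Datastore.get(workspace, 'tmp')
# .as_upload() switches the output from the default mount mode to upload
# mode; the contents of the output directory are copied to the destination
# when the step completes.
step1_output_data = OutputFileDatasetConfig(
    name="step1_output_data", destination=(datastore, "{run-id}/")
).as_upload(overwrite=True)

If the output still shows a null Uri after this, the datastore registration itself (Blob endpoint vs. ADLS Gen2 endpoint) would be the next thing to check.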
