Questions about AML Python SDK v2 pipelines

Yongchao Liu (Neusoft America Inc) 191 Reputation points Microsoft External Staff
2023-03-21T02:13:31.0366667+00:00

We are currently migrating from SDK v1 to v2.
In v2, pipelines are defined with the @pipeline decorator. Is there another way to build a pipeline, similar to how v1 works?


from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# ws, compute and aml_run_config (Workspace, ComputeTarget, RunConfiguration)
# are defined elsewhere in our code.
datastore = ws.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(
    name="processed_data",
    destination=(datastore, "mypath/{run-id}/{output-name}")
).as_upload()

step1 = PythonScriptStep(
    name="generate_data",
    script_name="step1.py",
    runconfig=aml_run_config,
    arguments=["--output_path", step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig=aml_run_config,
    arguments=["--pd", step1_output_data.as_input()]
)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])

The v1 approach is very convenient for CI/CD deployment and for generalizing the code.
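For example (a rough sketch only; build_pipeline and its step_specs argument are hypothetical names, not code we actually have), since v1 steps are plain objects collected into a list, one generic function can assemble different pipelines from configuration data:

# Hypothetical helper: builds any pipeline from a list of step descriptions.
def build_pipeline(ws, aml_run_config, step_specs):
    steps = []
    for spec in step_specs:
        steps.append(
            PythonScriptStep(
                name=spec["name"],
                script_name=spec["script"],
                runconfig=aml_run_config,
                arguments=spec.get("arguments", []),
            )
        )
    return Pipeline(workspace=ws, steps=steps)

# Reuses ws, aml_run_config and step1_output_data from the snippet above.
pipeline = build_pipeline(
    ws,
    aml_run_config,
    step_specs=[
        {"name": "generate_data", "script": "step1.py",
         "arguments": ["--output_path", step1_output_data]},
        {"name": "read_pipeline_data", "script": "step2.py",
         "arguments": ["--pd", step1_output_data.as_input()]},
    ],
)

By contrast, the decorator-based v2 style looks like this: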


from azure.ai.ml import Input, Output
from azure.ai.ml.dsl import pipeline

# train_model, score_data and eval_model are components defined elsewhere.
cluster_name = "cpu-cluster"
custom_path = "azureml://datastores/workspaceblobstore/paths/custom_path/${{name}}/"

# define a pipeline with component
@pipeline(default_compute=cluster_name)
def pipeline_with_python_function_components(input_data, test_data, learning_rate):
    """E2E dummy train-score-eval pipeline with components defined via python function components"""

    # Calling a component object as a function applies the given inputs/parameters and creates a node in the pipeline.
    train_with_sample_data = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )
    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output, test_data=test_data
    )
    # Example of changing an output's path at the step level.
    # Note: if the output is promoted to the pipeline level, the path must be changed on the pipeline job instead (see below).
    score_with_sample_data.outputs.score_output = Output(
        type="uri_folder", mode="rw_mount", path=custom_path
    )
    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )

    # Return: pipeline outputs
    return {
        "eval_output": eval_with_sample_data.outputs.eval_output,
        "model_output": train_with_sample_data.outputs.model_output,
    }


pipeline_job = pipeline_with_python_function_components(
    input_data=Input(
        path="wasbs://******@dprepdata.blob.core.windows.net/Titanic.csv", type="uri_file"
    ),
    test_data=Input(
        path="wasbs://******@dprepdata.blob.core.windows.net/Titanic.csv", type="uri_file"
    ),
    learning_rate=0.1,
)
# Example of changing an output's path at the pipeline level.
pipeline_job.outputs.model_output = Output(
    type="uri_folder", mode="rw_mount", path=custom_path
)

With v2 I have two problems at the moment:

1. Decorator issue. The @pipeline decorator is not supported on methods inside a class, so I had to work around it (screenshot attached).
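For context, here is a rough sketch of the kind of pattern I mean (PipelineBuilder and _body are hypothetical names and this is only an illustration, not the exact code from the screenshot). Decorating the method directly fails, presumably because the decorator would treat self as a pipeline input, so pipeline ends up being applied by hand to a nested helper function:

from azure.ai.ml.dsl import pipeline

class PipelineBuilder:
    """Hypothetical wrapper class used in our CI/CD code."""

    def __init__(self, default_compute):
        self.default_compute = default_compute

    # @pipeline(default_compute=...)    # cannot be used here: `self` would be
    # def build(self, input_data): ...  # picked up as a pipeline input

    def build(self, input_data, learning_rate):
        # Workaround: define the pipeline body without `self` and apply
        # `pipeline` as a plain function instead of a decorator.
        def _body(input_data, learning_rate):
            train_node = train_model(   # component from the example above
                training_data=input_data, max_epochs=5, learning_rate=learning_rate
            )
            return {"model_output": train_node.outputs.model_output}

        pipeline_func = pipeline(default_compute=self.default_compute)(_body)
        return pipeline_func(input_data=input_data, learning_rate=learning_rate)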

2. Generalization issue. A decorated function such as pipeline_with_python_function_components does not seem easy to generalize.

If I have multiple pipelines to build, I have to declare a separate decorated function and pipeline job for each one, because the inputs and outputs may differ.
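For example (pipeline_a, pipeline_b and their inputs are hypothetical; this just illustrates the duplication I mean):

# Every distinct pipeline needs its own decorated function and its own job object,
# even when the overall structure is almost identical.
@pipeline(default_compute=cluster_name)
def pipeline_a(input_data, learning_rate):
    train_node = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )
    return {"model_output": train_node.outputs.model_output}


@pipeline(default_compute=cluster_name)
def pipeline_b(input_data, test_data, learning_rate):
    train_node = train_model(
        training_data=input_data, max_epochs=5, learning_rate=learning_rate
    )
    score_node = score_data(
        model_input=train_node.outputs.model_output, test_data=test_data
    )
    return {"score_output": score_node.outputs.score_output}


job_a = pipeline_a(input_data=Input(type="uri_file", path="<path-a>"), learning_rate=0.1)
job_b = pipeline_b(
    input_data=Input(type="uri_file", path="<path-b>"),
    test_data=Input(type="uri_file", path="<path-b-test>"),
    learning_rate=0.1,
)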

So, does anyone have a way to work around the decorator?

Thanks

Azure Machine Learning
