
ScriptExecution.StreamAccess.NotFound

CG-8750 0 Reputation points
2025-10-13T02:22:30.2366667+00:00

Every time I try to run a pipeline on my dataset, whether it is registered as a txt or a csv data asset, I get this message:

Error Code: ScriptExecution.StreamAccess.NotFound Native Error: error in streaming from input data sources StreamError(NotFound) => stream not found NotFound Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=xxx

The URI is correct. As far as I can tell, the components and script have valid syntax and are compatible with each other. I have checked permissions and can't see anything that would interfere. I have tried a number of different methods, a dozen environments, and small alterations, and the only conclusion I can reach is that there is a connectivity issue between the pipeline and the data asset. I run the pipeline from the workspace directory where the scripts are stored; it references registered components and points at the direct azure: address where the data asset is kept. The same asset can be read when I run a script manually.

Microsoft's suggestions haven't helped, and I haven't found anything in the community forums that does. The script only reads the data, cleans it, and converts it to an mltable that becomes a data asset artifact. I'd appreciate a solution.

The setup is: 1 script that reads the data, cleans it, and creates the data asset; 1 component that references the script; and 1 component that runs the pipeline. Once this works I intend to add more scripts and components to the pipeline, but for now I'm stuck. Every single time it appears to be a data asset connectivity issue.

Additional details: I ran a minimal diagnostic pipeline whose only job is to run an ls command on the data asset, and it also fails with the exact same StreamError(NotFound). This suggests the issue is not related to the Python script, the custom environment, or any specific package. Hierarchical namespace is disabled, and I created a managed identity with the Storage Blob Data Reader role as an attempted fix.

Azure Machine Learning

1 answer

  1. Aryan Parashar 3,695 Reputation points Microsoft External Staff Moderator
    2025-10-13T09:32:59.2733333+00:00

    Hi CG-8750,

    When you submit the pipeline job, pass the input as a valid datastore URI in the pipeline code. Inside an individual component, Blob access is fine for reading and writing files, but the input supplied at job submission should be a datastore URI.

    The pipeline job only accepts a datastore URI, not a Blob URI. The expected format is shown below:

    azureml://subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/workspaces/<workspace-name>/datastores/workspaceblobstore/paths/LocalUpload/<folder>/<dataset-file>
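
Because a single missing slash or segment in this URI is enough to make the path unresolvable, it can help to assemble it from its parts rather than typing it by hand. The helper below is a minimal sketch using only the Python standard library; the function name `build_datastore_uri` and its parameters are hypothetical and are not part of the Azure ML SDK.

```python
def build_datastore_uri(subscription_id, resource_group, workspace,
                        datastore, relative_path):
    """Assemble an azureml:// datastore URI from its parts.

    Hypothetical helper for illustration only; not an Azure ML SDK function.
    """
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace": workspace,
        "datastore": datastore,
        "relative_path": relative_path,
    }
    # Reject empty segments early, since they would produce a malformed URI.
    for name, value in parts.items():
        if not value or not value.strip("/"):
            raise ValueError(f"{name} must not be empty")

    # The path inside the datastore is relative, so drop any leading slash.
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{relative_path.lstrip('/')}"
    )
```

The returned string can then be passed as the `path` of an `Input(type="uri_file", ...)` when the pipeline is submitted.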
    

    An example pipeline structure is shown below:

    
    │   environment.yml
    │   pipeline.yml
    │   run_pipeline.py
    │
    └───components
        ├───data_reader
        │       component_spec.yaml
        │       data_reader.py
        │
        ├───data_writer
        │       component_spec.yaml
        │       data_writer.py
        │
        └───openai_processor
                component_spec.yaml
                openai_processor.py
    

    An example run_pipeline.py is shown below:

    from azure.ai.ml import MLClient, Input, Output, command, dsl
    from azure.ai.ml.entities import Environment, AmlCompute
    from azure.identity import DefaultAzureCredential
    
    # Initialize ML Client
    ml_client = MLClient(
       DefaultAzureCredential(),
       subscription_id="<subscription_id>",
       resource_group_name="<resource_group_name>",
       workspace_name="<workspace_name>"
    )
    
    env = Environment(
       name="genai-test-env",
       description="Environment for GenAI processing",
       image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04",
       conda_file="environment.yml"
    )
    
    # Register the environment so the components can reference it as genai-test-env@latest
    ml_client.environments.create_or_update(env)
    
    # Ensure compute cluster exists
    compute_name = "GenAI-pipeline-compute"
    try:
       compute = ml_client.compute.get(compute_name)
       print(f"Using existing compute: {compute_name}")
    except Exception:
       print(f"Creating new compute: {compute_name}")
       compute_config = AmlCompute(
           name=compute_name,
           size="Standard_DS3_v2",
           min_instances=0,
           max_instances=4,
       )
       ml_client.compute.begin_create_or_update(compute_config).result()
    
    # Define components
    data_reader_component = command(
       name="data_reader",
       display_name="Read Excel Data",
       description="Reads prompts from Excel file in Blob Storage",
       inputs={"input_path": Input(type="uri_file")},
       outputs={"output_data": Output(type="uri_file")},
       code="./components/data_reader",
       command="python data_reader.py --input_path ${{inputs.input_path}} --output_path ${{outputs.output_data}}",
       environment="genai-test-env@latest"
    )
    openai_processor_component = command(
       name="openai_processor",
       display_name="Process with OpenAI",
       description="Generates responses using Azure OpenAI",
       inputs={"input_data": Input(type="uri_file")},
       outputs={"output_data": Output(type="uri_file")},
       code="./components/openai_processor",
       command="python openai_processor.py --input_data ${{inputs.input_data}} --output_data ${{outputs.output_data}}",
       environment="genai-test-env@latest"
    )
    data_writer_component = command(
       name="data_writer",
       display_name="Write Excel Output",
       description="Writes responses to Excel file",
       inputs={"input_data": Input(type="uri_file")},
       outputs={"output_path": Output(type="uri_file")},
       code="./components/data_writer",
       command="python data_writer.py --input_data ${{inputs.input_data}} --output_path ${{outputs.output_path}}",
       environment="genai-test-env@latest"
    )
    # Build pipeline
    @dsl.pipeline(
       name="GenAI-Prompt-Pipeline",
       description="End-to-end prompt processing pipeline",
       default_compute_target=compute_name
    )
    def genai_pipeline():
       reader = data_reader_component(
           input_path=Input(
               type="uri_file",
               path="azureml://subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/workspaces/<workspace-name>/datastores/workspaceblobstore/paths/LocalUpload/<folder>/<dataset-file>"
           )
       )
       processor = openai_processor_component(input_data=reader.outputs.output_data)
       writer = data_writer_component(input_data=processor.outputs.output_data)
       # Return the pipeline outputs as an explicit name-to-output mapping
       return {"output_path": writer.outputs.output_path}
    
    # Submit pipeline
    pipeline_job = ml_client.jobs.create_or_update(
       genai_pipeline(),
       experiment_name="genai-prompt-processing"
    )
    print(f"Pipeline submitted successfully! Job name: {pipeline_job.name}")
    print(f"Monitor progress at: https://ml.azure.com/jobs/{pipeline_job.name}?wsid=/subscriptions/{ml_client.subscription_id}/resourcegroups/{ml_client.resource_group_name}/workspaces/{ml_client.workspace_name}")
    

    As shown above, the input passed when submitting the pipeline job is a datastore URI.

    Below is an example of pipeline.yml:

    $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
    type: pipeline
    display_name: GenAI-Prompt-Processing
    description: Pipeline for processing prompts with Azure OpenAI using blob storage
    
    inputs:
      input_file:
        type: uri_file
        path: <datastore-uri>  # not read by run_pipeline.py above, which sets its own input path
    
    settings:
      default_compute: GenAI-pipeline-compute
      default_datastore: workspaceblobstore
    
    jobs:
      read_data:
        component: azureml:data_reader_component@latest
        inputs:
          input_path: ${{parent.inputs.input_file}}
        compute: GenAI-pipeline-compute
    
      process_prompts:
        component: azureml:openai_processor_component@latest
        inputs:
          input_data: ${{parent.jobs.read_data.outputs.output_data}}
        compute: GenAI-pipeline-compute
    
      write_results:
        component: azureml:data_writer_component@latest
        inputs:
          input_data: ${{parent.jobs.process_prompts.outputs.output_data}}
        compute: GenAI-pipeline-compute
    

    Below is an example of component_spec.yaml:

    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    name: data_reader_component
    display_name: Data Reader
    version: 1.0.0
    type: command
    inputs:
      input_path:
        type: uri_file
        description: Path to input Excel file
    outputs:
      output_data:
        type: uri_file
        description: Processed data in JSON format
    code: .  # resolved relative to this component_spec.yaml, which already sits in components/data_reader
    environment: azureml:genai-test-env@latest
    command: >-
      python data_reader.py 
      --input_path ${{inputs.input_path}} 
      --output_path ${{outputs.output_data}}
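
The command above passes `--input_path` and `--output_path` to data_reader.py, which is not shown in this thread. The following is a hypothetical sketch of what such a script could look like, assuming a CSV input and a JSON output and using only the Python standard library (the original component is described as reading Excel, which would need an extra package). At run time Azure ML replaces `${{inputs.input_path}}` with a local path on the compute node, so the script checks that the resolved path exists before reading it.

```python
import argparse
import csv
import json
from pathlib import Path


def read_rows(input_path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(input_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_json(rows, output_path):
    """Write rows as JSON, creating the parent folder if needed."""
    target = Path(output_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(rows, indent=2), encoding="utf-8")


def main(argv=None):
    """Parse the two arguments used by the component command and convert the file."""
    parser = argparse.ArgumentParser(description="Hypothetical data_reader.py")
    parser.add_argument("--input_path", required=True)
    parser.add_argument("--output_path", required=True)
    args = parser.parse_args(argv)

    source = Path(args.input_path)
    # Fail with the resolved path in the message to make mount problems visible.
    if not source.is_file():
        raise FileNotFoundError(f"Input not found at resolved path: {source}")

    rows = read_rows(source)
    write_json(rows, args.output_path)
    print(f"Read {len(rows)} rows from {source}")
    return len(rows)
```

To use this as the component entry point, call `main()` at the bottom of the file under an `if __name__ == "__main__":` guard. Printing the resolved input path also gives a quick way to see in the job logs what the component actually received.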
    

    For the .yaml schemas, see the links below:
    https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
    https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json

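    To trigger the pipeline after it has been created, refer to the documentation below. Note that this page covers SDK v1 (view=azureml-api-1), while the examples above use SDK v2 (azure.ai.ml):
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-trigger-published-pipeline?view=azureml-api-1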

    Feel free to accept this as an answer.

    Thank you for reaching out to the Microsoft Q&A portal.
