Azure Data Factory: Python script fails when accessing Azure Blob Storage

Han Shih 施學翰 146 Reputation points
2021-10-14T01:41:22.503+00:00

I followed this document and tried to download a file, modify it, and upload the result.

In the "Create a Batch pool using Batch Explorer" section, I changed the OS from Windows to Linux:
140299-image.png

Setup commands for installing my Python modules:
140401-image.png

Here is my sample code

I even used the Batch Explorer tool to SSH into the node and ran my Python script there, which worked fine.
140347-image.png

Then I followed the ADF setup in the tutorial:
140392-image.png

The Batch account is also linked to the storage account that I want to read from and write to.
(Maybe that link has nothing to do with the storage I want to access, and all I need is the connection string?)
140402-image.png

The Python script is placed at the corresponding location.
(In fact, if I replace the Azure Blob Storage code with local file reads and writes, the debug run passes.)
140300-image.png
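For readers following along, here is a minimal sketch of what a "download, modify, upload" script could look like. It is not the script from the screenshots. It assumes the `azure-storage-blob` v12 package, and the container name, blob names, `STORAGE_CONNECTION_STRING` variable, and `modify` function are all hypothetical placeholders.

```python
import os


def modify(data: bytes) -> bytes:
    """Placeholder modification (hypothetical): upper-case the text content."""
    return data.decode("utf-8").upper().encode("utf-8")


def process_blob(conn_str: str, container: str, src_name: str, dst_name: str) -> None:
    """Download one blob, modify it in memory, and upload the result."""
    # Imported here so that a missing package fails at this exact point,
    # which makes the traceback in stderr.txt easier to read.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    src = service.get_blob_client(container=container, blob=src_name)
    dst = service.get_blob_client(container=container, blob=dst_name)

    data = src.download_blob().readall()
    dst.upload_blob(modify(data), overwrite=True)


if __name__ == "__main__":
    # Assumed environment variable; nothing runs if it is not set.
    conn = os.environ.get("STORAGE_CONNECTION_STRING")
    if conn:
        process_blob(conn, "input", "data.txt", "data_modified.txt")
```

Keeping the modification step in its own function means it can be checked locally without any storage access, which mirrors the "local file read & write" test described above.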

Finally, the error occurred. I collected all the messages into a file.
140403-error-details.log

Any suggestions would be appreciated. Thanks!


Accepted answer
  1. HimanshuSinha-msft 19,381 Reputation points Microsoft Employee
    2021-10-15T22:52:51.997+00:00

    Hello @Han Shih 施學翰,
    Thanks for the question and for using the Microsoft Q&A platform.
    The example you referred to uses Windows Server, so the command below makes sense there.

    140886-image.png

    But Ubuntu has no "cmd". I think if you just call
    pip# install XXXXX
    it should work.

    Please let me know how it goes.
    Thanks
    Himanshu


3 additional answers

  1. Han Shih 施學翰 146 Reputation points
    2021-10-18T01:39:18.897+00:00

    @HimanshuSinha-msft, thanks for your reply.

    I modified the start task as you advised:
    141192-image.png

    First, I tried a simple script, and it worked as I expected.
    141100-image.png

    141201-image.png

    Second, I switched to another script that interacts with Azure Blob Storage.
    I found that as soon as I add the line from azure.storage.blob import BlobClient, BlobServiceClient, ContainerClient, the same error occurs.

    So I think something might be wrong in my pool settings.
    In the 'start task' section, I saw this message:

    To allow tasks to run in containers, the container configuration must be specified for the task's pool. Please ensure that pool container configuration is set up and selected for this task.

    However, I cannot find a way to deal with it.
    Maybe some of my 'start task' settings are not configured correctly?
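One way to narrow down an import failure like this is to run a small diagnostic as the task instead of the real script. The sketch below uses only the standard library; it reports which interpreter is running and whether each module can be imported. The function names `module_available` and `report` are hypothetical helpers, not part of any Azure tooling.

```python
import importlib
import sys


def module_available(name: str) -> bool:
    """Return True if the module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def report(names):
    """Build a short report: interpreter path plus one status line per module."""
    lines = [f"interpreter: {sys.executable}"]
    for name in names:
        status = "OK" if module_available(name) else "MISSING"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    # The printed lines end up in the task's stdout.txt.
    print("\n".join(report(["azure.storage.blob"])))
```

If the module shows as MISSING under the task but imports fine over SSH, the two runs are probably using different interpreters or users, or the install step never ran on that node.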


  2. Han Shih 施學翰 146 Reputation points
    2021-10-22T01:48:18.537+00:00

    @HimanshuSinha-msft

    Here is my script
    142721-image.png

    Pipeline settings
    142712-image.png

    Start task in Azure Batch
    142731-image.png

    As you said, the reason might be:
    1. module installation failed
    2. module installed, but cannot be loaded at execution time

    However, the output of the execution is confusing:

    {
    "exitcode": 1,
    "outputs": [
    " https://sysgeneral.blob.core.windows.net/adfjobs/fc915f8d-d513-4683-a0b9-e64ebb85b8d7/output/stdout.txt ",
    " https://sysgeneral.blob.core.windows.net/adfjobs/fc915f8d-d513-4683-a0b9-e64ebb85b8d7/output/stderr.txt "
    ],
    "errorCategory": 0,
    "code": "FailureExitCode",
    "message": "The task exited with an exit code representing a failure",
    "details": [
    {
    "Name": "Message",
    "Value": "The task process exited with an unexpected exit code"
    },
    {
    "Name": "AdditionalErrorCode",
    "Value": "FailureExitCode"
    }
    ],
    "computeInformation": "{\"account\":\"dsubatch\",\"poolName\":\"test_pool\",\"vmSize\":\"standard_a1_v2\"}",
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Japan East)",
    "executionDuration": 16,
    "durationInQueue": {
    "integrationRuntimeQueue": 1
    },
    "billingReference": {
    "activityType": "ExternalActivity",
    "billableDuration": [
    {
    "meterType": "AzureIR",
    "duration": 0.016666666666666666,
    "unit": "Hours"
    }
    ]
    }
    }

    Is there a better way to find the root cause?
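The activity output itself only says that the exit code was non-zero; the actual Python traceback would normally be in the stderr.txt file listed under "outputs". As a rough sketch, the log URLs can be pulled out of that JSON with the standard library. The helper name `find_log_urls` is hypothetical, and the sample below is trimmed to the relevant fields from the output above.

```python
import json


def find_log_urls(activity_output: str) -> dict:
    """Map log file names (e.g. 'stderr.txt') to their URLs in the activity output."""
    doc = json.loads(activity_output)
    urls = {}
    for raw in doc.get("outputs", []):
        url = raw.strip()  # the URLs in the output are padded with spaces
        name = url.rsplit("/", 1)[-1]
        urls[name] = url
    return urls


sample = """
{
  "exitcode": 1,
  "outputs": [
    " https://sysgeneral.blob.core.windows.net/adfjobs/fc915f8d-d513-4683-a0b9-e64ebb85b8d7/output/stdout.txt ",
    " https://sysgeneral.blob.core.windows.net/adfjobs/fc915f8d-d513-4683-a0b9-e64ebb85b8d7/output/stderr.txt "
  ],
  "code": "FailureExitCode"
}
"""

if __name__ == "__main__":
    for name, url in find_log_urls(sample).items():
        print(f"{name}: {url}")
```

Opening the stderr.txt blob (for example in the storage account's adfjobs container) should show the exception the script raised.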


  3. Han Shih 施學翰 146 Reputation points
    2021-10-22T02:52:28.153+00:00

    I found the root cause:

    Whenever the "start task" is modified, all the nodes must be rebooted so that the new start task actually runs on them.
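Rebooting can be done from Batch Explorer or the portal. For anyone who prefers to script it, below is a minimal sketch assuming the `azure-batch` Python package with its `BatchServiceClient` interface (`compute_node.list` and `compute_node.reboot`). The helper names and the account name, key, and URL arguments are hypothetical placeholders, and newer versions of the package may expose a different interface.

```python
def reboot_all_nodes(batch_client, pool_id: str) -> list:
    """Request a reboot of every node in the pool and return the node IDs."""
    rebooted = []
    for node in batch_client.compute_node.list(pool_id):
        batch_client.compute_node.reboot(pool_id, node.id)
        rebooted.append(node.id)
    return rebooted


def make_client(account_name: str, account_key: str, batch_url: str):
    """Build a Batch client with shared-key auth (assumes azure-batch is installed)."""
    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials

    credentials = SharedKeyCredentials(account_name, account_key)
    return BatchServiceClient(credentials, batch_url=batch_url)


# Example usage (placeholder values, not run here):
# client = make_client("<account>", "<key>", "https://<account>.<region>.batch.azure.com")
# reboot_all_nodes(client, "test_pool")
```

The reboot request returns before the node is back up, so the next pipeline run should wait until the nodes are idle again.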

    Thanks for your time!
