How can I read a .dcm file stored in ADLS gen2 from pyspark notebook in Azure Synapse?

Prateek Narula 41 Reputation points
2022-06-01T08:02:59.79+00:00

.dcm is an image format for medical images. I've stored the files on ADLS Gen2 and want to access them from a notebook.
[screenshot: the .dcm files stored in the ADLS Gen2 container]

This is how I'm trying to access them: I'm using pydicom, but the notebook is unable to resolve the file path for the .dcm file, although reading a .csv file from the same storage works fine.

[screenshot: the pydicom code used to read the .dcm file]

data['dicom'] is the path to the file in ADLS Gen2, and the path is correct. I've verified and cross-checked it multiple times.

What I want to achieve is to select and display an image based on the id I have (pid).
I want to use Azure Synapse Analytics and ADLS Gen2; I could use Blob Storage as well, but the problem is the same.

The error I get is:
[screenshot: the FileNotFoundError traceback raised by pydicom]
Both routes (Blob Storage and ADLS Gen2) give me the same file-not-found error.
It seems the library is not able to access the ADLS path as is.
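
For reference, here is a minimal reconstruction of the failing call (the exact code is in the screenshot above; the path value below is a hypothetical placeholder):

    import pydicom

    # Hypothetical reconstruction of the failing call. data['dicom'] holds the
    # ADLS Gen2 URL of the .dcm file selected for a given pid, for example:
    data = {'dicom': 'abfss://<container>@<account>.dfs.core.windows.net/images/<pid>.dcm'}

    # pydicom.dcmread() opens the path with ordinary OS file I/O, so a remote
    # abfss:// or https:// URL raises FileNotFoundError.
    ds = pydicom.dcmread(data['dicom'])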

Tags: Azure Data Lake Storage, Azure Blob Storage, Azure Synapse Analytics

Accepted answer
  MartinJaffer-MSFT 26,236 Reputation points
    2022-06-02T19:55:10.723+00:00

    Hello @Prateek Narula and welcome to Microsoft Q&A. Thank you for providing such detailed background information and context. I had never heard of dicom or pydicom before.

    So here is my hunch.

    The lowest level of your error, filereader.py, uses low-level operating system calls to open a file. It expects to run in a normal environment where the file sits on a local disk. The URI you have given uses the https protocol, as opposed to a local disk path like C:\ .

    I think we should try downloading the file to the cluster, and then running pydicom on that local copy.
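
    For illustration, a minimal sketch of that first option, assuming the notebook runs in Synapse where mssparkutils is available and the storage account is already accessible; the paths are placeholders:

        from notebookutils import mssparkutils
        import pydicom

        # Placeholder ADLS Gen2 path of the .dcm file (adjust container/account/path)
        remote_path = "abfss://<container>@<account>.dfs.core.windows.net/images/<pid>.dcm"
        local_path = "file:/tmp/image.dcm"  # local disk on the Spark driver node

        # Copy the blob down to the driver; mssparkutils understands abfss:// URIs
        mssparkutils.fs.cp(remote_path, local_path)

        # pydicom can now open the local copy with normal OS file I/O
        ds = pydicom.dcmread("/tmp/image.dcm")
        print(ds.PatientID)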

    Another option is to mount the storage account, so it can be referred to by a more operating-system-like path. This second option is probably what you want. Mounting lets Spark pretend the storage account is an attached disk drive, as opposed to a web URL.
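
    As an illustration, a rough sketch of the mount approach with mssparkutils, assuming a linked service to the storage account already exists in the workspace; all names in angle brackets are placeholders:

        from notebookutils import mssparkutils
        import pydicom

        # Mount the ADLS Gen2 container at the mount point "/dicom"
        mssparkutils.fs.mount(
            "abfss://<container>@<account>.dfs.core.windows.net",
            "/dicom",
            {"linkedService": "<your_linked_service>"}  # or accountKey / sasToken options
        )

        # Mounted files appear under a local /synfs/<jobId>/... path on the driver
        job_id = mssparkutils.env.getJobId()
        local_path = f"/synfs/{job_id}/dicom/images/<pid>.dcm"

        ds = pydicom.dcmread(local_path)
        print(ds.PatientID)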

    Does this make sense? You were trying to use a local file-system operation on a web resource. It's like trying to get to a campsite by passenger train instead of by car: both are modes of transportation, but trains only stop at train stations, so the train can't get you to the campsite.

    1 person found this answer helpful.

0 additional answers
