Searching in a blob file

Arkady Mankovsky 21 Reputation points
2021-05-27T07:39:05.653+00:00

Hi,

I have a 30 GB CSV file in my blob container.
I have a list of about 1,000 records, and I want to search for these records inside the large file.
What is the best way to do that? I tried a local Python script on my PC, but of course it crashed.
Should I use Databricks to distribute a Python script?
Is there a way to do it using Data Factory? I saw that data flow has an option to do a lookup on a file,
but only for a defined string...

Thanks in advance,


Accepted answer
  1. HimanshuSinha-msft 19,486 Reputation points Microsoft Employee Moderator
    2021-05-28T17:57:46.77+00:00

    Hello @Arkady Mankovsky,
    Thanks for the question and for using the Microsoft Q&A platform.
    You mentioned that you have a single 30 GB CSV file and need to search for the words in it. I suggest splitting the large file into smaller files on some field and then running the search on those. Yes, Databricks can be helpful. You can also explore mapping data flows.
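
As a rough illustration of the search step, here is a minimal Python sketch that reads the CSV one row at a time and checks each row against a set of wanted keys, so the whole file never has to fit in memory. This is a streaming variant rather than the file split suggested above, and the same function could be run on each smaller file after a split. It assumes the roughly 1,000 records can be matched exactly on a single key column; the function name `find_matching_rows`, the `key_column` parameter, and the sample data are hypothetical and not taken from the original post.

```python
import csv
import io
from typing import Iterable, Iterator, List, Set


def find_matching_rows(
    lines: Iterable[str], wanted: Set[str], key_column: int = 0
) -> Iterator[List[str]]:
    """Yield rows whose key column is in `wanted`, one row at a time.

    Assumption: a record "matches" when the value in `key_column`
    equals one of the wanted keys exactly.
    """
    for row in csv.reader(lines):
        # Skip rows that are too short to contain the key column.
        if len(row) > key_column and row[key_column] in wanted:
            yield row


# Tiny in-memory stand-in for the large CSV (hypothetical data).
sample = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
wanted_ids = {"2", "3", "99"}  # stand-in for the list of ~1,000 records

matches = list(find_matching_rows(sample, wanted_ids))
print(matches)
```

For the real file, `lines` would be a text-mode file object opened over the downloaded or mounted blob (or over each smaller file after a split). Keeping the wanted keys in a set makes each row check a constant-time lookup, so the cost is dominated by reading the file once.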

    Please let me know how it goes.
    Thanks
    Himanshu

