How to read a file at folder level ignoring the sub-folders within #Azure-data-lake-storage using databricks

Goutham Kannekanti 1 Reputation point
2021-02-09T14:13:14.183+00:00

Hi Team,
In Data Lake, I have a folder called "AA" that contains a sub-folder called "BB". I have a file named "One.parquet" at the folder level, i.e. inside "AA" but outside "BB", and another file named "Two.parquet" inside "BB".
How can I read only the data of "One.parquet" in an Azure Databricks notebook using the code below?
var df = spark.read.parquet(
ABFSSPath
.concat(abcdefg)
)

I just want the data of "One.parquet", but the above script reads both files, i.e. "One.parquet" and "Two.parquet".

Request your help ASAP

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2021-02-09T22:32:32.33+00:00

    Hello @Goutham Kannekanti ,
    Thanks for the question and for using the Microsoft Q&A platform.

    The code you have posted is not complete:

    var df = spark.read.parquet(
    ABFSSPath
    .concat(abcdefg)
    )

    I think you are creating the mount point with * at the end. My best guess is that you should focus on
    abcdefg and see whether you can pass a glob such as AA/*.parquet, which should match only the files directly under AA. I think that should help.

    Thanks
    Himanshu
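
The glob suggestion above can be sketched in PySpark as follows. This is a minimal sketch, not the asker's actual setup: the helper names and the `abfss://container@account.dfs.core.windows.net` base URI are placeholders, and it assumes Spark's Hadoop-style path globbing, where `*` does not match across a `/`, so `AA/*.parquet` covers files directly under AA but not those under AA/BB. Passing the full path to One.parquet is the other option when only that one file is wanted.

```python
def direct_parquet_glob(base_uri: str, folder: str) -> str:
    """Glob for the *.parquet files directly inside `folder` (no sub-folders)."""
    # Normalise slashes so the pieces join with exactly one "/" between them.
    return base_uri.rstrip("/") + "/" + folder.strip("/") + "/*.parquet"


def single_file_path(base_uri: str, folder: str, file_name: str) -> str:
    """Full path to one specific file, e.g. One.parquet inside AA."""
    return base_uri.rstrip("/") + "/" + folder.strip("/") + "/" + file_name


def read_folder_level(spark, base_uri: str, folder: str):
    """Read only the folder-level Parquet files.

    `spark` is the SparkSession that a Databricks notebook already provides.
    """
    return spark.read.parquet(direct_parquet_glob(base_uri, folder))


# Placeholder base URI (hypothetical container and storage account names).
BASE_URI = "abfss://container@account.dfs.core.windows.net"
```

In a notebook this might be called as `df = read_folder_level(spark, BASE_URI, "AA")`; the same path string can also be passed to `spark.read.parquet` from Scala.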


  2. Pranay 291 Reputation points
    2021-02-27T19:50:14.217+00:00

    @Goutham Kannekanti

    Not sure whether you have already figured it out! It's actually quite easy.
    Whether you are connecting through an API or using Spark, you can use a conditional statement to check the name of the folder in the path. For example, if the name of the folder is always "BB",
    you can use

    if "BB" not in path:

    That should work without problems. Hope that helps.
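
The path check suggested above could look like the sketch below, assuming the candidate file paths have already been listed by some other means. The function names and sample paths are hypothetical. It compares whole path components rather than using a plain substring test, since `"BB" not in path` would also drop a file such as AA/ABBA.parquet.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def is_outside_folder(path: str, excluded: str = "BB") -> bool:
    """True when no parent folder of `path` is named `excluded`."""
    # Drop any scheme/authority (abfss://..., dbfs:/...) and split on "/".
    parts = PurePosixPath(urlparse(path).path).parts
    # parts[-1] is the file name itself, so only the folders are checked.
    return excluded not in parts[:-1]


def keep_folder_level_files(paths, excluded="BB"):
    """Filter a list of file paths, dropping those under the excluded folder."""
    return [p for p in paths if is_outside_folder(p, excluded)]


def read_without_subfolder(spark, paths, excluded="BB"):
    """Read the remaining files; DataFrameReader.parquet accepts several paths."""
    return spark.read.parquet(*keep_folder_level_files(paths, excluded))
```

With the layout from the question, `keep_folder_level_files` would keep the path ending in AA/One.parquet and drop the one ending in AA/BB/Two.parquet.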
