How can I create a dataset in Azure ML Studio (through the GUI) from a parquet file created with Azure Spark?

Nastasia Saby 206 Reputation points
2020-09-30T12:15:16.08+00:00

I'm trying to load files as a dataset in the GUI of Azure ML Studio. These parquet files have been created through Spark.

In my folder, Spark creates files such as "_SUCCESS" or "_committed_8998000".

Azure ML Studio can neither read these files nor ignore them, and reports:

The provided file(s) have invalid byte(s) for the specified file encoding.
{
  "message": " "
}

I selected "Ignore unmatched files path", yet it still does not work.

If I remove the "_SUCCESS" and other Spark files, it works.

Does anyone have an idea about a workaround?

Thank you.

Azure Machine Learning

Accepted answer
  1. Nastasia Saby 206 Reputation points
    2020-10-01T06:51:09.84+00:00

    I used "path//.parquet" in the "Path" field and now it works.
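    To illustrate why restricting the "Path" field to parquet files helps, here is a minimal local sketch using only Python's standard library. It is not Azure ML code: the folder layout, the file names, the helper names `build_demo_folder` and `select_parquet_files`, and the wildcard pattern `*/*.parquet` are all assumptions made for the example, and Azure ML Studio's own path matching may behave differently from `pathlib`.

```python
import tempfile
from pathlib import Path
from typing import List


def build_demo_folder(root: Path) -> None:
    """Create a folder that mimics a Spark output directory (hypothetical names)."""
    out = root / "output"
    out.mkdir()
    names = [
        "part-00000.snappy.parquet",
        "part-00001.snappy.parquet",
        "_SUCCESS",             # Spark marker file, not parquet data
        "_committed_8998000",   # Spark commit file, not parquet data
    ]
    for name in names:
        (out / name).write_bytes(b"")  # empty placeholders, content is irrelevant here


def select_parquet_files(root: Path, pattern: str = "*/*.parquet") -> List[str]:
    """Return the files under root that match the glob pattern, as sorted POSIX paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_folder(root)
        # Only the part files are selected; the marker files never match.
        for path in select_parquet_files(root):
            print(path)
```

    With a pattern that ends in ".parquet", the "_SUCCESS" and "_committed_..." files are never selected, so nothing tries to decode them as data.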
