How to keep the column order of a Parquet file scanned with Azure Purview?

Bernhard Lauber 41 Reputation points
2022-02-24T09:26:31.597+00:00

I scanned Parquet files in a StorageV2 (general purpose v2) account. The discovered columns are shown not in their original order but alphabetically. When I scan a CSV file, the original order is retained.

What can I do to keep the order of the columns in a scanned parquet file? Any hints? Do you have the same experience?

Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

1 answer

  1. KranthiPakala-MSFT 46,622 Reputation points Microsoft Employee
    2022-03-03T20:19:57.203+00:00

    Hello @Bernhard Lauber ,

    After talking with the internal team and reading further online, it looks like this is a limitation of Parquet rather than Azure Purview. PARQUET-188 suggests that column ordering is not part of the Parquet spec, which would explain why the column order is not honored when reading Parquet files.

    Here is another SO thread with a similar discussion: Is there a possibility to keep column order when reading parquet?

    Hope this helps. Please let us know if you have any further questions.
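
If the alphabetical listing cannot be changed on the scan side, one possible workaround is to restore the order on the consumer side. The sketch below is only an illustration, not a Purview feature: `restore_column_order` is a hypothetical helper, and it assumes you already have the original column order from somewhere else (for example, from the file's own schema, which pyarrow exposes as `pyarrow.parquet.read_schema(path).names`). The column names are made up.

```python
def restore_column_order(scanned_columns, original_order):
    """Reorder scanned column names to follow original_order.

    Hypothetical helper: columns that are missing from original_order
    are placed last and keep their scanned (alphabetical) order,
    because Python's sort is stable.
    """
    position = {name: index for index, name in enumerate(original_order)}
    return sorted(
        scanned_columns,
        key=lambda name: position.get(name, len(position)),
    )


# Illustrative column names only (assumption, not taken from a real scan).
original_order = ["order_id", "customer", "amount", "created_at"]

# Simulate the alphabetical listing shown for the scanned Parquet asset.
scanned_columns = sorted(original_order)

print(restore_column_order(scanned_columns, original_order))
```

This only changes how you present the column list yourself; it does not alter what the scan stores.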
