
How to keep the column order of a Parquet file scanned with Azure Purview?

Bernhard Lauber
2022-02-24T09:26:31.597+00:00

I scanned Parquet files in a StorageV2 (general purpose v2) storage account. The discovered columns are not shown in their original order but alphabetically. When I scan a CSV file, the original column order is retained.

What can I do to keep the column order of a scanned Parquet file? Any hints? Has anyone else seen the same behavior?


  1. KranthiPakala-MSFT (Microsoft Employee, Moderator)
    2022-03-03T20:19:57.203+00:00

    Hello @Bernhard Lauber,

    After discussing this with the internal team and doing some further reading, it looks like this is a limitation of Parquet rather than of Azure Purview. PARQUET-188 suggests that column ordering is not part of the Parquet spec, which would explain why the column order is not honored for Parquet files.

    Here is a Stack Overflow thread with a similar discussion: Is there a possibility to keep column order when reading parquet?

    Hope this helps. Please let us know if you have any further queries.
