Thank you for using Microsoft Q&A platform and thanks for posting your query.
You should Consider the queries that you'll be running frequently and partition your data accordingly. If you have specific non-partition columns that you filter on regularly, you may want to consider restructuring your data to make those columns part of the partitioning scheme.
Organizing files into many subfolders based on non-partition columns (as you suggested with a "long" path pc1/.../pcN/npc1/.../npcM) could potentially improve performance for specific queries. However, this approach has limitations of complexity can become complex to manage. It may create more partitions than necessary, which might actually hinder performance in some cases. You'd have to carefully balance the granularity of the partitions with query patterns and data volume.
Also, in serverless your cost if based on data reads, so you will definitely want to reduce the amount of data being read by avoiding scans as much as possible. Ensuring that only part of your data is read which is relevant for the query.
Scans are very costly in terms of performance. But actual scan performance will depend on how much data is required to be read.
Edited to Update More Regarding Joins.
Parquet is a columnar storage format. This means that reading a specific column's data across many rows is faster than reading many columns for a specific row. If your JOIN operation predominantly filters or scans specific columns, Parquet's structure will inherently provide some performance benefits.
When data is partitioned, the query engine can skip entire partitions that are irrelevant to a query. This is particularly beneficial for filter operations, not JOINs specifically. Partitioning on columns that you're joining on can improve performance if it allows the query engine to skip large amounts of irrelevant data.
Hope this helps. Do let us know if you any further queries.
If this answers your query, do click Accept Answer
and Yes
for was this answer helpful. And, if you have any further query do let us know.
.