Row count of parquet files

M. Adil 26 Reputation points
2022-02-23T08:40:00.77+00:00

Hello, how can we get the row count of a Parquet file?

I want to run a conditional Copy activity based on the row count of the source data file. I tried a Lookup activity followed by an If Condition activity, so the If Condition evaluates the Lookup output. This works for some files, but it fails for larger files (more than 5,000 rows).

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
    Pratik Somaiya 4,206 Reputation points
    2022-02-23T10:37:38.103+00:00

    Yes, the Lookup activity has a limit of 5,000 rows.

    I would suggest creating a Databricks notebook, mounting Azure Storage in it, and then counting the rows in the Parquet file with PySpark (see the sketch below).

    This will give you the count even if the number of records is in the billions.

    Article to help connect Databricks to Azure Storage: https://www.sqlshack.com/accessing-azure-blob-storage-from-azure-databricks/
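    For example, here is a minimal PySpark sketch of that approach, assuming a Databricks notebook (where spark and dbutils are already available); the storage account, container, secret scope, and file path below are placeholders, not values from the question:

    # Mount the Azure Blob Storage container into DBFS (only needs to run once).
    # Storage account, container, secret scope, and key names are placeholders.
    dbutils.fs.mount(
        source="wasbs://<container>@<storage-account>.blob.core.windows.net",
        mount_point="/mnt/source-data",
        extra_configs={
            "fs.azure.account.key.<storage-account>.blob.core.windows.net":
                dbutils.secrets.get(scope="<secret-scope>", key="<storage-account-key>")
        },
    )

    # Read the Parquet file and count its rows; Spark distributes the count,
    # so it scales to very large files.
    df = spark.read.parquet("/mnt/source-data/path/to/your_file.parquet")
    print(f"Row count: {df.count()}")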

    Thanks!

    2 people found this answer helpful.

0 additional answers

