Row count of parquet files

M. Adil 26 Reputation points
2022-02-23T08:40:00.77+00:00

Hello, how can we get the row count of a Parquet file?

I want to run a conditional Copy activity based on the row count of the source data file. I tried a Lookup activity followed by an If Condition activity, so the If Condition evaluates the Lookup output. This works for some files, but it fails for larger files (more than 5,000 rows).

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
    Pratik Somaiya 4,206 Reputation points
    2022-02-23T10:37:38.103+00:00

    Yes, the Lookup activity has a limit of 5,000 rows.

    I would suggest creating a Databricks notebook, mounting Azure Storage in it, and then counting the rows in the Parquet file with PySpark (see the sketch below).

    This will give you the count even if the number of records is in the billions.

    Article to help connect Databricks to Azure Storage: https://www.sqlshack.com/accessing-azure-blob-storage-from-azure-databricks/
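    For example, here is a minimal PySpark sketch of that approach, assuming a Databricks notebook (where spark and dbutils are already available); the storage account, container, secret scope, and file path below are placeholders, not values from the question:

    # Mount the Azure Blob Storage container into DBFS (only needs to run once).
    # Storage account, container, secret scope, and key names are placeholders.
    dbutils.fs.mount(
        source="wasbs://<container>@<storage-account>.blob.core.windows.net",
        mount_point="/mnt/source-data",
        extra_configs={
            "fs.azure.account.key.<storage-account>.blob.core.windows.net":
                dbutils.secrets.get(scope="<secret-scope>", key="<storage-account-key>")
        },
    )

    # Read the Parquet file and count its rows; Spark distributes the count,
    # so it scales to very large files.
    df = spark.read.parquet("/mnt/source-data/path/to/your_file.parquet")
    print(f"Row count: {df.count()}")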

    Thanks!

    2 people found this answer helpful.

0 additional answers

