rowcount from parquet file in synapse pipeline

rajendar erabathini 616 Reputation points

Hi - I need to get the row count of a parquet file in a Synapse pipeline. I am trying the Lookup activity, but it has a limitation on the rows it can keep in memory and throws an error if the data size exceeds the limit (1 MB?). Is there a better way to find the row count within a Synapse pipeline? Please note that we are not using Azure Databricks.


Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. ShaikMaheer-MSFT 38,201 Reputation points Microsoft Employee

    Hi rajendar erabathini,

    Thank you for posting query in Microsoft Q&A Platform.

    Yes, the Lookup activity can read at most 5,000 rows or 4 MB of output, so it will not work in this case. Consider using a mapping data flow instead. In the source transformation point to your parquet file, then use an aggregate transformation (with no group-by columns) to count all the rows, and finally use a cache sink with the "Write to activity output" option enabled. This writes the row count into the output of the Data Flow activity, where subsequent pipeline activities can read it.
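    As an illustration, the data flow described above might look roughly like this in data flow script form (a sketch only; the transformation names `ParquetSource`, `CountRows`, and `CacheSink`, and the column name `rowCount`, are placeholders you would choose yourself):

    ```
    source(allowSchemaDrift: true,
        validateSchema: false,
        format: 'parquet') ~> ParquetSource
    ParquetSource aggregate(rowCount = count(1)) ~> CountRows
    CountRows sink(saveOrder: 1) ~> CacheSink
    ```

    With no group-by column on the aggregate, `count(1)` collapses the whole file into a single row holding the total row count, which the cache sink then surfaces in the activity output.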

    Kindly consider checking the videos below to understand each of the above transformations.

    Aggregate Transformation in Mapping Data Flow in Azure Data Factory

    Write Cache Sink to Activity Output in Azure Data factory
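    Once the data flow runs, a downstream activity (for example a Set Variable activity) can pull the count out of the Data Flow activity's output with an expression along these lines (the activity name `Data flow1` and the sink/column names `CacheSink`/`rowCount` are assumptions matching the sketch above; adjust them to your own names):

    ```
    @string(activity('Data flow1').output.runStatus.output.CacheSink.value[0].rowCount)
    ```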

    Hope this helps. Please let me know if any further queries.

    Please consider hitting Accept Answer button. Accepted answers help community as well.

    1 person found this answer helpful.
