Querying Azure Purview Catalogue with Python returns object count mismatch

Francesco Castellani 0 Reputation points
2025-02-18T15:03:55.3466667+00:00

I have a problem querying Azure Purview with the azure-purview Python package. These are the steps I followed:

  1. I registered and scanned a data source
  2. The data source is linked to a collection for which I know the collectionId
  3. I connected to the Azure Purview instance using Python, making sure the clientId I use has all the relevant permissions:
    1. Collection admin
    2. Data source admin
    3. Data curator
    4. Data reader
  4. I queried the Azure Purview instance using the following:
       from azure.purview.catalog import PurviewCatalogClient

       payload = {
           "keywords": None,
           "filter": {
               "and": [
                   {
                       "collectionId": <collectionId>
                   }
               ]
           }
       }

       catalog_client = PurviewCatalogClient(endpoint=<purview_endpoint>, credential=<credentials>)
       assets = catalog_client.discovery.query(payload)
    
  5. This is what the query results look like: '@search.count' says there are 265 assets in my collection, which matches what I see in the Purview UI. So why are there only 50 objects under the 'value' key of the JSON below?
{'@search.count': 265, 'value': [{'objectType': 'Tables', 'updateBy': 'ServiceAdmin', 'term': [{'name': 'SomeName', 'guid': 'SOmeId', 'glossaryName': 'SomeOtherName'}], 'id': 'SomeOtherId'...}

len(assets['value']) = 50
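A quick way to confirm that a response is only a partial page is to compare '@search.count' (the total number of matches) with the number of items actually returned under 'value'. The sketch below uses a hand-built dictionary shaped like the response above; the asset entries are placeholders, not real data, and `missing_count` is a hypothetical helper name.

```python
def missing_count(response: dict) -> int:
    """Return how many matches are not included in this response's 'value' list."""
    total = response.get("@search.count", 0)
    returned = len(response.get("value", []))
    return max(total - returned, 0)


# Placeholder response shaped like the one above: 265 matches, 50 items returned.
sample = {"@search.count": 265, "value": [{"id": f"asset-{i}"} for i in range(50)]}
print(missing_count(sample))  # 215
```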
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

  1. phemanth 14,800 Reputation points Microsoft External Staff
    2025-02-18T19:34:59.9833333+00:00

    @Francesco Castellani

    Welcome to the Microsoft Q&A forum.

    The issue you're encountering with Azure Purview's Python SDK, where the query reports a count of 265 assets but returns only 50 objects in the 'value' field, is due to pagination of the search results. '@search.count' is the total number of matching assets, while 'value' contains only the current page, and by default the discovery query returns at most 50 results per call.
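    As a worked example of the arithmetic: with 265 matches and 50 results per page, retrieving everything takes ceil(265 / 50) = 6 requests, with pages starting at positions 0, 50, 100, 150, 200 and 250, and the last page holding only 15 items. The helper below (a hypothetical name, plain Python with no Purview calls) computes those starting positions.

```python
def page_starts(total, page_size):
    """Return the starting position of each page needed to cover `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # One page starts every `page_size` results until `total` is covered.
    return list(range(0, total, page_size))


starts = page_starts(265, 50)
print(len(starts), starts)  # 6 [0, 50, 100, 150, 200, 250]
print(265 - starts[-1])     # 15 items on the last page
```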

    To retrieve all the assets, you need to implement pagination in your query. Here's how you can do that:

    • limit: the number of results to return in each call. The default is 50 and the maximum is 1000.
    • offset: the number of results to skip, which is what lets you page through the full result set.

    Here’s an example:

    from azure.purview.catalog import PurviewCatalogClient

    def get_all_assets(catalog_client, collection_id, page_size=50):
        all_assets = []
        offset = 0  # Number of results to skip
        while True:
            payload = {
                "keywords": None,
                "filter": {
                    "and": [
                        {
                            "collectionId": collection_id
                        }
                    ]
                },
                "limit": page_size,  # Results per request (default 50, max 1000)
                "offset": offset
            }
            assets = catalog_client.discovery.query(payload)
            page = assets.get('value', [])
            all_assets.extend(page)
            total = assets.get('@search.count')
            # Stop on a short page, or once everything reported by @search.count is retrieved
            if len(page) < page_size or (total is not None and len(all_assets) >= total):
                break
            offset += page_size  # Move on to the next batch
        return all_assets

    # Usage
    catalog_client = PurviewCatalogClient(endpoint=<purview_endpoint>, credential=<credentials>)
    collection_id = <collectionId>
    all_assets = get_all_assets(catalog_client, collection_id)
    print(f'Total assets retrieved: {len(all_assets)}')
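    Because the loop depends on the service's paging behaviour, it can help to exercise it offline first. The sketch below swaps in a hypothetical in-memory stand-in (FakeDiscovery / FakeCatalogClient, not part of any Azure SDK) that slices a list according to the request's 'limit' and 'offset' fields and returns a response with the same '@search.count' / 'value' shape. It assumes those are the paging fields of the Purview search request; the collection filter is ignored by the fake.

```python
class FakeDiscovery:
    """Hypothetical stand-in for catalog_client.discovery (not an Azure SDK class)."""

    def __init__(self, assets):
        self._assets = assets
        self.calls = 0

    def query(self, payload):
        self.calls += 1
        limit = payload.get("limit", 50)  # mimic the default page size of 50
        offset = payload.get("offset", 0)
        page = self._assets[offset:offset + limit]
        return {"@search.count": len(self._assets), "value": page}


class FakeCatalogClient:
    """Hypothetical client exposing only a .discovery attribute."""

    def __init__(self, assets):
        self.discovery = FakeDiscovery(assets)


def get_all_assets(catalog_client, collection_id, page_size=50):
    """Collect every page by advancing 'offset' until a short page or the total is reached."""
    all_assets = []
    offset = 0
    while True:
        payload = {
            "keywords": None,
            "filter": {"and": [{"collectionId": collection_id}]},
            "limit": page_size,
            "offset": offset,
        }
        response = catalog_client.discovery.query(payload)
        page = response.get("value", [])
        all_assets.extend(page)
        total = response.get("@search.count")
        if len(page) < page_size or (total is not None and len(all_assets) >= total):
            break
        offset += page_size
    return all_assets


client = FakeCatalogClient([{"id": f"asset-{i}"} for i in range(265)])
result = get_all_assets(client, "example-collection")
print(len(result), client.discovery.calls)  # 265 6
```

    The total-count check avoids one extra, empty request when the number of assets is an exact multiple of the page size: with 100 assets and a page size of 50, the loop stops after 2 calls instead of 3.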
    

    I hope the steps above resolve the issue; please let us know if it persists. Thank you.

    1 person found this answer helpful.