Welcome to the Microsoft Q&A forum.
The behavior you're seeing with the Azure Purview Python SDK, where the query reports a count of 265 assets but the 'value' field holds only 50 objects, is most likely due to pagination. By default, the Purview search API returns at most 50 results per query, so a single call gives you only the first page.
To retrieve all the assets, you need to implement pagination in your query. Here's how you can do that:
- limit: The number of results to return in each call. The default is 50 and the maximum is 1000, so you can start with 50 or any other number in that range.
- offset: The number of results to skip before the first one returned. Increasing it by the page size on each call is what steps you through the full result set.
Here's an example:
from azure.identity import DefaultAzureCredential
from azure.purview.catalog import PurviewCatalogClient

def get_all_assets(catalog_client, collection_id):
    all_assets = []
    limit = 50   # Number of results to return per request (maximum 1000)
    offset = 0   # Number of results to skip
    while True:
        payload = {
            "keywords": None,
            "filter": {
                "and": [
                    {
                        "collectionId": collection_id
                    }
                ]
            },
            "limit": limit,
            "offset": offset
        }
        assets = catalog_client.discovery.query(payload)
        batch = assets.get('value', [])
        all_assets.extend(batch)
        # A short (or empty) page means there is nothing left to retrieve
        if len(batch) < limit:
            break
        offset += limit  # Move to the next batch
    return all_assets

# Usage (replace the <...> placeholders with your own values)
# DefaultAzureCredential is one option; use whichever azure-identity credential applies to you
catalog_client = PurviewCatalogClient(endpoint="<purview_endpoint>", credential=DefaultAzureCredential())
collection_id = "<collectionId>"
all_assets = get_all_assets(catalog_client, collection_id)
print(f'Total assets retrieved: {len(all_assets)}')
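If you want to check the paging logic without calling the service, here is a minimal sketch that separates the loop from the SDK. It is only an illustration under some assumptions: `iter_assets` and `fake_query` are hypothetical names, `fake_query` stands in for `catalog_client.discovery.query` and serves 265 dummy assets, and the `limit`, `offset`, and `@search.count` keys are the names I understand the Purview search request and response to use, so please verify them against your SDK version.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_assets(query: Callable[[Dict[str, Any]], Dict[str, Any]],
                base_request: Dict[str, Any],
                limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every item by advancing the offset until a short page comes back."""
    offset = 0
    while True:
        request = {**base_request, "limit": limit, "offset": offset}
        page = query(request).get("value", [])
        yield from page
        if len(page) < limit:
            return  # short or empty page: nothing left
        offset += limit


# Hypothetical stand-in for catalog_client.discovery.query (no network needed)
_FAKE_ASSETS: List[Dict[str, Any]] = [{"id": f"asset-{i}"} for i in range(265)]


def fake_query(request: Dict[str, Any]) -> Dict[str, Any]:
    start = request.get("offset", 0)
    end = start + request.get("limit", 50)
    return {"@search.count": len(_FAKE_ASSETS), "value": _FAKE_ASSETS[start:end]}


assets = list(iter_assets(fake_query, {"keywords": None}, limit=50))
reported = fake_query({"limit": 1})["@search.count"]
print(f"retrieved {len(assets)} of {reported}")  # retrieved 265 of 265
```

Comparing the number of items collected with the count the service reports is a quick way to confirm that no page was missed. With a real client you would pass `catalog_client.discovery.query` in place of `fake_query`.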
I hope the steps above resolve the issue. Please let us know if it persists. Thank you.