Run metadata scanning

The following short walkthrough shows how to use the scanner APIs to retrieve metadata from your organization's Fabric items. It assumes that a Fabric admin has set up metadata scanning in your organization.

For the list of the artifact and subartifact metadata that metadata scanning returns, see the documentation for the Admin - WorkspaceInfo GetScanResult API.

The scanner APIs are workspaces/modified, workspaces/getInfo, workspaces/scanStatus/{scan_id}, and workspaces/scanResult/{scan_id}. They support both public and sovereign clouds.

Important

The app you develop for scanning can authenticate by using either a standard delegated admin access token or a service principal. The two authentication paths are mutually exclusive. When running under a service principal, there must be no Power BI admin-consent-required permissions set on your app. For more information, see Enable service principal authentication for read-only admin APIs.

Step 1: Perform a full scan

Call workspaces/modified without the modifiedSince parameter to get the complete list of workspace IDs in the tenant. This call returns all the workspaces in the tenant, including personal workspaces and shared workspaces. To exclude personal workspaces from the list, use the excludePersonalWorkspaces parameter of workspaces/modified.
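As an illustration, the request URL for this call can be assembled as in the following minimal sketch. The base URL shown is an assumption for the public cloud (sovereign clouds use a different host), and the helper name modified_workspaces_url is hypothetical. Sending the request with a valid admin or service principal token is left to the HTTP client of your choice.

```python
from urllib.parse import urlencode

# Assumption: public-cloud host for the admin REST APIs.
BASE_URL = "https://api.powerbi.com/v1.0/myorg/admin"


def modified_workspaces_url(modified_since=None, exclude_personal=False):
    """Build the workspaces/modified URL (hypothetical helper).

    Omitting modified_since asks for every workspace ID in the tenant,
    which is what a full scan needs.
    """
    params = {}
    if modified_since is not None:
        params["modifiedSince"] = modified_since
    if exclude_personal:
        params["excludePersonalWorkspaces"] = "true"
    query = urlencode(params)
    return f"{BASE_URL}/workspaces/modified" + (f"?{query}" if query else "")
```

Calling modified_workspaces_url(exclude_personal=True) produces a full-scan URL that leaves out personal workspaces.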

Divide the list into chunks of 100 workspaces at most.
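The chunking step could look like the following sketch. The function name chunk_workspace_ids is hypothetical; the limit of 100 comes from the step above.

```python
MAX_WORKSPACES_PER_SCAN = 100  # limit stated in the walkthrough


def chunk_workspace_ids(workspace_ids, size=MAX_WORKSPACES_PER_SCAN):
    """Split a list of workspace IDs into consecutive chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [workspace_ids[i:i + size] for i in range(0, len(workspace_ids), size)]
```

The last chunk holds whatever remains, so 250 workspace IDs yield chunks of 100, 100, and 50.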

For each chunk of up to 100 workspaces:

Call workspaces/getInfo to trigger a scan for the workspaces in the chunk. The response contains the scanId to use in the next steps. The location header of the response contains the Uniform Resource Identifier (URI) to call in the next step.
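A getInfo request for one chunk might be prepared as in the sketch below. Several details are assumptions to verify against the API reference: the public-cloud base URL, the request body shape (a JSON object with a workspaces array), and the option names passed as query parameters (for example, lineage or datasourceDetails). The helper build_get_info_request is hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: public-cloud host for the admin REST APIs.
BASE_URL = "https://api.powerbi.com/v1.0/myorg/admin"


def build_get_info_request(workspace_ids, **options):
    """Return (url, json_body) for a workspaces/getInfo POST (hypothetical helper).

    `options` are boolean query parameters; their names are not checked here.
    """
    if not 1 <= len(workspace_ids) <= 100:
        raise ValueError("pass between 1 and 100 workspace IDs per call")
    query = urlencode({name: str(value).lower() for name, value in options.items()})
    url = f"{BASE_URL}/workspaces/getInfo" + (f"?{query}" if query else "")
    # Assumed body shape: {"workspaces": ["<id>", ...]}
    body = json.dumps({"workspaces": list(workspace_ids)})
    return url, body
```

For example, build_get_info_request(chunk, lineage=True) returns a URL ending in ?lineage=true together with the JSON body for that chunk.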

Note

No more than 16 calls can be made simultaneously. Wait for a succeeded or failed response from the scanStatus API before invoking another call.

If some metadata you expected to receive is not returned, check with your Fabric admin to make sure they have enabled all relevant admin switches.
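One way to respect the limit of 16 simultaneous calls is a fixed-size worker pool, as in this sketch. It assumes a caller-supplied function, here called scan_chunk (hypothetical), that triggers a scan for one chunk and blocks until scanStatus reports that the scan succeeded or failed. Because each worker handles one chunk at a time, no more than 16 scans are in flight.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SCANS = 16  # limit stated in the note above


def scan_all(chunks, scan_chunk, max_workers=MAX_CONCURRENT_SCANS):
    """Run scan_chunk over every chunk with at most max_workers in flight.

    scan_chunk is a hypothetical callable: it should trigger the scan for
    one chunk and return only after that scan has finished.
    Results are returned in the same order as `chunks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_chunk, chunks))
```

A thread pool is only one option; an asyncio semaphore or a simple sequential loop would satisfy the same constraint.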

Use the URI from the location header you received from calling workspaces/getInfo to poll workspaces/scanStatus/{scan_id} until the status returned is "Succeeded". This status means the scan result is ready. A polling interval of 30 to 60 seconds is recommended. The location header of the status response contains the URI to call in the next step. Use it only after the status is "Succeeded".
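The polling loop can be sketched as follows. The get_status argument is a hypothetical callable that performs the scanStatus request and returns the status string. The "Succeeded" value comes from the text above, while the "Failed" value and the cap on the number of polls are assumptions.

```python
import time


def wait_for_scan(get_status, interval_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll until the scan reports "Succeeded".

    get_status: hypothetical callable returning the current status string.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "Succeeded":
            return status
        if status == "Failed":  # assumed name of the failure status
            raise RuntimeError("scan failed")
        sleep(interval_seconds)
    raise TimeoutError("scan did not finish within the allowed number of polls")
```

With the default interval of 30 seconds and 120 polls, the loop gives up after roughly an hour.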

Use the URI from the location header you received from calling workspaces/scanStatus/{scan_id} and read the data using workspaces/scanResult/{scan_id}. The data contains the list of workspaces, item info, and other metadata based on the parameters passed in the workspaces/getInfo call.

Step 2: Perform an incremental scan

Now that you have all the workspaces and the metadata and lineage of their assets, it's recommended that you perform only incremental scans, each relative to the previous scan.

Call workspaces/modified with the modifiedSince parameter set to the start time of the last scan to get the workspaces that have changed and therefore require another scan. The modifiedSince parameter should be set to a date within the last 30 days.
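The modifiedSince value could be derived from the recorded start time of the previous scan, as in this sketch. The timestamp format (ISO 8601 in UTC with a trailing Z) is an assumption to confirm against the API reference, and the helper name modified_since_value is hypothetical. The 30-day check mirrors the limit stated above.

```python
from datetime import datetime, timedelta, timezone

MAX_LOOKBACK = timedelta(days=30)  # limit stated in the walkthrough


def modified_since_value(last_scan_start, now=None):
    """Format the previous scan's start time for the modifiedSince parameter.

    last_scan_start must be a timezone-aware datetime.
    Assumed output format: ISO 8601 in UTC, for example 2024-05-20T08:30:00Z.
    """
    if last_scan_start.tzinfo is None:
        raise ValueError("last_scan_start must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if now - last_scan_start > MAX_LOOKBACK:
        raise ValueError("modifiedSince must fall within the last 30 days")
    return last_scan_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Recording the start time of a scan rather than its end time avoids missing workspaces that changed while the scan was running.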

Divide this list into chunks of up to 100 workspaces, and get the data for these changed workspaces by using the three API calls, workspaces/getInfo, workspaces/scanStatus/{scan_id}, and workspaces/scanResult/{scan_id}, as described in Step 1.

Considerations and limitations

  • Semantic models that haven't been refreshed or republished are returned in API responses, but without their subartifact information and expressions. For example, the semantic model's name and lineage are included in the response, but not its table and column names.
  • Semantic models containing only DirectQuery tables return subartifact metadata only if some action has been taken on the semantic model, such as someone building a report on top of it or viewing a report based on it.
  • Real-time datasets, semantic models with object-level security, semantic models with a live connection to Azure Analysis Services or on-premises Analysis Services, and Excel full fidelity datasets aren't supported for subartifact metadata. For unsupported datasets, the response returns the reason for not getting the subartifact metadata from the dataset. It's found in a field named schemaRetrievalError, for example, schemaRetrievalError: Unsupported request. RealTime dataset are not supported.
  • The API doesn't return subartifact metadata for semantic models that are larger than 1 GB in shared workspaces. In Premium workspaces, there's no size limitation on semantic models.
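To see which semantic models came back without subartifact metadata, the scan result can be checked for the schemaRetrievalError field, as in the sketch below. The field name schemaRetrievalError comes from the text above. The surrounding structure (a workspaces array whose entries carry a datasets array with id and name fields) is an assumption about the scan result payload, and the function name is hypothetical.

```python
def datasets_missing_schema(scan_result):
    """List (workspace id, dataset name, reason) for datasets whose
    subartifact metadata could not be retrieved.

    Assumed payload shape:
    {"workspaces": [{"id": ..., "datasets": [{"name": ..., ...}]}]}
    """
    missing = []
    for workspace in scan_result.get("workspaces", []):
        for dataset in workspace.get("datasets", []):
            reason = dataset.get("schemaRetrievalError")
            if reason:
                missing.append((workspace.get("id"), dataset.get("name"), reason))
    return missing
```

Logging these reasons during each scan makes it easier to tell an unsupported semantic model apart from one that only needs a refresh.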

Licensing

Metadata scanning requires no special license. It works for all of your tenant's metadata, including that of items located in non-Premium workspaces.