GraphQL API with Microsoft Purview (Preview)
Important
This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include additional legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability.
In this tutorial, you learn how to interact with Microsoft Purview programmatically by using the GraphQL API. For more information about GraphQL in general, see this introduction to GraphQL.
Using GraphQL is similar to using the REST APIs in that you send a JSON payload to a service endpoint. However, GraphQL lets you retrieve all the information you need in a single request, which eliminates the need for multiple API calls.
GraphQL also uses declarative data fetching, which lets you selectively fetch fields such as the classifications, glossary terms, or related entities linked to an entity. Using these query patterns optimizes both backend database fetching and data transmission. A good example is the query that gets an entity with filtered related entities, separated by aliases.
With its introspection feature, the GraphQL API is self-descriptive: clients can retrieve schema details such as the available queries, types, and query parameters. See more about introspection.
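Because introspection is part of the GraphQL specification, a schema lookup is just another query. The following minimal sketch builds a standard introspection request that lists the root query fields and their arguments, and extracts the field names from the response. The helper names are hypothetical, and the `{"query": ...}` body shape is the common GraphQL-over-HTTP convention, assumed here rather than stated in this article.

```python
import json

# A standard GraphQL introspection query (defined by the GraphQL spec, not by
# Purview): it lists the root query fields and the arguments each one accepts.
INTROSPECTION_QUERY = """
query {
  __schema {
    queryType {
      name
      fields {
        name
        args { name }
      }
    }
  }
}
"""


def introspection_payload():
    """Return the JSON body for an introspection request.

    Assumption: the service accepts the common {"query": "..."} body shape.
    """
    return {"query": INTROSPECTION_QUERY}


def root_query_fields(response):
    """Return the names of the root query fields in an introspection response."""
    schema = (response.get("data") or {}).get("__schema") or {}
    query_type = schema.get("queryType") or {}
    return [field["name"] for field in query_type.get("fields") or []]


if __name__ == "__main__":
    # Print the request body you would POST to the GraphQL endpoint.
    print(json.dumps(introspection_payload(), indent=2))
```

If the response lists fields such as entities and relationships, those are the root queries you can call, and each field's args show which parameters it accepts.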
Prerequisites
If you don't have an Azure subscription, create a free account before you begin.
You must have an existing Microsoft Purview account. If you don't, see the quickstart for creating a Microsoft Purview account.
To get a bearer token and call any of the APIs, see the documentation about how to authenticate APIs for Microsoft Purview.
GraphQL Endpoint
For all requests, send a POST request to the following endpoint:
POST https://{{endpoint}}/datamap/api/graphql
- If you're using the new Microsoft Purview portal, the {{endpoint}} value is api.purview-service.microsoft.com.
- If you're using the classic Microsoft Purview governance portal, the {{endpoint}} value is {your_purview_account_name}.purview.azure.com.
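The endpoint rules can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `graphql_endpoint` and `build_request` are hypothetical, the `{"query": ...}` JSON body follows the common GraphQL-over-HTTP convention rather than anything stated in this article, and the bearer token is assumed to come from the authentication steps in the prerequisites.

```python
import json
import urllib.request

NEW_PORTAL_HOST = "api.purview-service.microsoft.com"


def graphql_endpoint(account_name=None):
    """Return the GraphQL endpoint URL.

    With no account name, use the new Microsoft Purview portal host; with an
    account name, use the classic governance portal host for that account.
    """
    host = f"{account_name}.purview.azure.com" if account_name else NEW_PORTAL_HOST
    return f"https://{host}/datamap/api/graphql"


def build_request(query, token, account_name=None):
    """Build (without sending) a POST request that carries a GraphQL query.

    Assumptions: the body uses the common {"query": "..."} JSON shape, and
    `token` is a bearer token obtained as described in the Purview
    authentication documentation.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        graphql_endpoint(account_name),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON response body. Error handling and token refresh are left out of this sketch.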
Basic query
Entity - List by GUIDs
query {
entities(where: { guid: ["<guid1>", "<guid2>"] }) { #Values in the array are combined as a logical-OR.
guid
createTime
updateTime
typeName
attributes
name
qualifiedName
description
}
}
Sample Response:
{
"data": {
"entities": [
{
"guid": "9fb74c11-ac48-4650-95bc-760665c5bd92",
"createTime": 1553072455110,
"updateTime": 1553072455110,
"typeName": "azure_storage_account",
"attributes": {
"qualifiedName": "https://exampleaccount.core.windows.net",
"name": "ExampleStorageAccount"
},
"name": "ExampleStorageAccount",
"qualifiedName": "https://exampleaccount.core.windows.net",
"description": "Example Storage Account"
}
]
}
}
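The response mirrors the shape of the query, so reading it is plain dictionary access. This sketch (the helper name `summarize_entities` is hypothetical) pulls out each entity's GUID, type, and creation time. It assumes `createTime` is in epoch milliseconds (UTC), which fits the magnitude of the sample values but isn't stated in this article.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_entities(response):
    """Return (guid, typeName, created) tuples from an entities response.

    Assumption: createTime is epoch milliseconds in UTC.
    """
    rows = []
    for entity in (response.get("data") or {}).get("entities") or []:
        # Integer timedelta arithmetic avoids float rounding on milliseconds.
        created = EPOCH + timedelta(milliseconds=entity["createTime"])
        rows.append((entity["guid"], entity["typeName"], created))
    return rows


# Fields taken from the sample response.
sample = {
    "data": {
        "entities": [
            {
                "guid": "9fb74c11-ac48-4650-95bc-760665c5bd92",
                "createTime": 1553072455110,
                "typeName": "azure_storage_account",
            }
        ]
    }
}

for guid, type_name, created in summarize_entities(sample):
    print(guid, type_name, created.isoformat())
```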
Query entities by type and qualified name
query {
entities(
where: {
type: { typeName: ["<entityType1>", "<entityType2>"] }
qualifiedName: { in: ["<qualifiedName1>", "<qualifiedName2>"] }
}
) {
guid
typeName
qualifiedName
attributes
}
}
The in operator is used to query multiple qualified names. Possible operators are exists, eq, ne, in, nin, gt, ge, lt, and le. You can find more server schema details by using introspection.
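When qualified names come from user input or another system, building the filter text by hand is error-prone. This sketch (the helper `qualified_name_filter` is hypothetical) validates the operator against the list in this article and escapes values with `json.dumps`, whose double-quoted output is also valid GraphQL string syntax. Treat the operator set as a starting point and confirm it with introspection.

```python
import json

# Operators listed in this article; confirm the full set with introspection.
OPERATORS = {"exists", "eq", "ne", "in", "nin", "gt", "ge", "lt", "le"}


def qualified_name_filter(operator, value):
    """Render a qualifiedName filter, for example: qualifiedName: { in: ["a", "b"] }.

    json.dumps escapes quotes and backslashes, so a qualified name can't break
    out of its string literal and alter the query.
    """
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if operator in ("in", "nin"):
        # in/nin compare against a list of qualified names.
        if not isinstance(value, (list, tuple)):
            raise TypeError(f"{operator} expects a list of qualified names")
        rendered = "[" + ", ".join(json.dumps(v) for v in value) + "]"
    else:
        # Strings are quoted; booleans (for exists) render as true/false.
        rendered = json.dumps(value)
    return f"qualifiedName: {{ {operator}: {rendered} }}"


names = ["https://exampleaccount.core.windows.net"]
query = (
    "query { entities(where: { "
    + qualified_name_filter("in", names)
    + " }) { guid typeName qualifiedName } }"
)
```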
Declarative data fetching
Get entity with selected fields
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
businessAttributes
qualifiedName
}
}
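Get entity with all fields
This query uses fragment spreads (such as ...hierarchyInfoFields) as placeholders. Before you run it, define the corresponding fragments or replace each spread with the subfields you need, because a GraphQL server rejects spreads of undefined fragments.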
query {
entities(where: { guid: "<guid>" }) {
guid
createTime
createdBy
updateTime
updatedBy
lastModifiedTS
typeName
attributes
businessAttributes
collectionId
customAttributes
hierarchyInfo {
...hierarchyInfoFields
}
labels
sensitivityLabel {
...sensitivityLabelFields
}
source
sourceDetails
qualifiedName
name
description
displayName
userDescription
classifications {
...classificationFields
}
relatedEntities {
...relatedEntitiesFields
}
assignedTerms {
...assignedTermsFields
}
}
}
Get entity with classifications
- with all classifications
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
classifications {
typeName
attributes
}
}
}
- with filtered classifications
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
classifications(where: { type: { typeName: "<classificationType>" } }) {
typeName
attributes
}
}
}
Get entity with related entities
- with all related entities
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
relatedEntities {
relationshipAttributeName
relationship {
guid
typeName
attributes
}
entity {
guid
typeName
qualifiedName
attributes
}
}
}
}
- with filtered related entities
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
relatedEntities(where: { relationshipAttributeName: "<relAttrName1>" }) {
entity {
guid
typeName
qualifiedName
attributes
}
}
}
}
- with filtered related entities, separated by aliases
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
alias1: relatedEntities(where: { relationshipAttributeName: "<relAttrName1>" }) {
entity {
guid
typeName
attributes
}
}
alias2: relatedEntities(where: { relationshipAttributeName: "<relAttrName2>" }) {
entity {
guid
typeName
attributes
}
}
}
}
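In a GraphQL response, each alias becomes its own key, so the related entities arrive already grouped by relationship attribute and no client-side filtering is needed. The sketch below (the helper `related_guids_by_alias` is hypothetical) collects the GUIDs under each alias. It assumes each aliased field holds a list of `{"entity": {...}}` objects, the same shape as the lineage sample response later in this article.

```python
def related_guids_by_alias(entity, aliases):
    """Map each alias (for example alias1, alias2) to the related entity GUIDs it returned.

    A missing or empty alias maps to an empty list.
    """
    return {
        alias: [item["entity"]["guid"] for item in entity.get(alias) or []]
        for alias in aliases
    }


# Hypothetical response entity with two aliased relatedEntities fields.
entity = {
    "guid": "g0",
    "alias1": [{"entity": {"guid": "a"}}, {"entity": {"guid": "b"}}],
    "alias2": [],
}
print(related_guids_by_alias(entity, ["alias1", "alias2"]))
```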
Get entity with glossary terms
- with all glossary terms
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
assignedTerms {
confidence
createdBy
description
expression
steward
source
status
term {
qualifiedName
name
shortDescription
longDescription
}
}
}
}
- with filtered glossary terms
query {
entities(where: { guid: "<guid>" }) {
guid
typeName
attributes
assignedTerms {
confidence
createdBy
description
expression
steward
source
status
term {
qualifiedName
name
shortDescription
longDescription
}
}
}
}
Filtering (preview)
Exact-match filters on GUID and qualified name, as in the examples in the Basic query section, are guaranteed to perform well. Other filtering patterns have some limitations:
- Filtering on nonindexed fields: Fields other than GUID and qualified name aren't currently indexed (see the examples in the Simple filter section). Filtering on nonindexed fields without also filtering on GUID or qualified name results in a table scan, which might cause performance issues on large datasets.
- Nested filtering: Like filtering on nonindexed fields, nested filtering (for example, finding entities that have a particular linked classification, glossary term, or related entity) can cause table scans, which might cause performance issues on large datasets.
Despite these limitations, this call pattern performs better than client-side filtering, and our internal client currently uses it.
Simple filter
Query entities by type and system attributes
query {
entities(
where: {
type: { typeName: "<entityType>" }
name: { eq: "<value>" }
createTime: { timerange: LAST_7D }
updateTime: { gt: "<timestamp>" }
}
) {
guid
typeName
qualifiedName
attributes
}
}
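The updateTime filter needs a concrete `<timestamp>`. This sketch computes one for "updated in the last N days". It assumes timestamps are epoch milliseconds passed as a string, matching the magnitude of the sample createTime/updateTime values and the quoted placeholder in the query; the helper names are hypothetical, so confirm the expected type with introspection. For a relative window, the timerange: LAST_7D form shown for createTime avoids the calculation entirely.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(moment):
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    # timedelta // timedelta gives an exact int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def updated_since_filter(days, now=None):
    """Render an updateTime filter matching entities updated in the last `days` days.

    Assumption: <timestamp> is epoch milliseconds, passed as a string.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = epoch_millis(now - timedelta(days=days))
    return f'updateTime: {{ gt: "{cutoff}" }}'


print(updated_since_filter(7))
```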
Query entities by type and attributes
query {
entities(
where: {
type: { typeName: "<entityType>" }
attribute: { field: "<attrName>", operator: eq, value: "<attrValue>" }
}
) {
guid
typeName
qualifiedName
attributes
}
}
Query entities by type and business attributes
query {
entities(
where: {
type: { typeName: "<entityType>" }
businessAttribute: { field: "<BusinessMetadataName>.<BusinessAttributeName>", operator: eq, value: "<BizAttrValue>" }
}
) {
guid
typeName
qualifiedName
attributes
}
}
Query entities by collection
Currently, this query pattern isn't supported for subassets.
query {
entities(
where: {
type: { typeName: "<entityType>" }
collectionID: "<collectionId>"
}
) {
guid
typeName
qualifiedName
attributes
}
}
Filter combination
Keys in an object (map) are combined with a logical AND. To combine conditions differently, use or, and, and not, as in the following query.
query {
entities(
where: {
type: { typeName: "<entityType>" }
or: [
{
and: [
{ attribute: { field: "<attrName1>", value: "<attrValue1>" } }
{ attribute: { field: "<attrName2>", value: "<attrValue2>" } }
]
}
{
not: {
businessAttribute: {
field: "<BusinessMetadataName>.<BusinessAttributeName>", value: "<BizAttrValue>"
}
}
}
]
}
) {
guid
typeName
qualifiedName
attributes
}
}
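Combined filters are easiest to maintain as ordinary data. The following sketch serializes nested Python dicts and lists into GraphQL input-object syntax, so the filter combination above can be built programmatically. `to_graphql` and `GraphQLEnum` are hypothetical helpers, not part of any Purview SDK; strings are quoted with `json.dumps` (valid GraphQL string syntax), while values wrapped in `GraphQLEnum`, such as the eq operator, are emitted unquoted.

```python
import json


class GraphQLEnum(str):
    """A string that is rendered unquoted, as a GraphQL enum value (for example, eq)."""


def to_graphql(value):
    """Render nested Python values as a GraphQL input-object literal.

    dict -> { key: value, ... } (the API combines sibling keys with AND)
    list -> [item, ...] (used for or/and operands and value lists)
    str -> quoted and escaped; GraphQLEnum -> bare name; bool -> true/false
    Dict keys are emitted as-is and are assumed to be valid field names.
    """
    if isinstance(value, GraphQLEnum):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float, str)):
        return json.dumps(value)
    if isinstance(value, dict):
        if not value:
            return "{}"
        fields = ", ".join(f"{key}: {to_graphql(item)}" for key, item in value.items())
        return "{ " + fields + " }"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_graphql(item) for item in value) + "]"
    raise TypeError(f"cannot render {type(value).__name__} as GraphQL")


# The "Filter combination" where clause, built from plain Python data.
where = {
    "type": {"typeName": "<entityType>"},
    "or": [
        {
            "and": [
                {"attribute": {"field": "<attrName1>", "value": "<attrValue1>"}},
                {"attribute": {"field": "<attrName2>", "value": "<attrValue2>"}},
            ]
        },
        {
            "not": {
                "businessAttribute": {
                    "field": "<BusinessMetadataName>.<BusinessAttributeName>",
                    "value": "<BizAttrValue>",
                }
            }
        },
    ],
}

query = (
    "query { entities(where: "
    + to_graphql(where)
    + ") { guid typeName qualifiedName attributes } }"
)
```

GraphQL treats commas as insignificant, so the comma-separated output is equivalent to the newline-separated form in the query above.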
Nested filter
Currently, this query pattern isn't supported for subassets.
Query entity by classification
query {
entities(
where: {
classification: { type: { typeName: "<classificationType>", includeSubTypes: true } }
}
) {
guid
typeName
qualifiedName
attributes
}
}
Query entity by related entity
query {
entities(
where: {
relatedEntity: {
relationshipAttributeName: "<relAttrName>"
entity: {
type: { typeName: "<entityType>" }
}
}
}
) {
guid
typeName
qualifiedName
attributes
}
}
Query entity by glossary terms
query {
entities(
where: {
assignedTerm: {
term: {
qualifiedName: { eq: "<termName>" }
}
}
}
) {
guid
typeName
qualifiedName
attributes
}
}
Other queries
Get relationship
query {
relationships(where: { guid: "<guid>" }) {
guid
typeName
attributes
end1 {
guid
typeName
qualifiedName
attributes
}
end2 {
guid
typeName
qualifiedName
attributes
}
}
}
Sample Response:
{
"data": {
"relationships": [
{
"guid": "...",
"typeName": "ExampleRelationship",
"attributes": {},
"end1": {
"guid": "...",
"typeName": "column",
"qualifiedName": "...",
"attributes": {}
},
"end2": {
"guid": "...",
"typeName": "column",
"qualifiedName": "...",
"attributes": {}
}
}
]
}
}
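A relationships response can be reduced to an edge list for graph tools or reports. This minimal sketch (the helper `relationship_edges` is hypothetical) returns one (typeName, end1 GUID, end2 GUID) tuple per relationship. It doesn't assign any meaning to the end1/end2 order beyond what the relationship type defines.

```python
def relationship_edges(response):
    """Turn a relationships response into (typeName, end1 guid, end2 guid) tuples."""
    edges = []
    for rel in (response.get("data") or {}).get("relationships") or []:
        end1 = rel.get("end1") or {}
        end2 = rel.get("end2") or {}
        edges.append((rel["typeName"], end1.get("guid"), end2.get("guid")))
    return edges
```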
Lineage
Get dataset with downstream datasets
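This query returns downstream datasets up to two degrees away. Each degree is a hop from a dataset to a process (inputToProcesses) followed by a hop from that process to its output datasets (outputs).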
query {
entities(where: { guid: "<guid>" }) {#dataset
guid
typeName
qualifiedName
relatedEntities(where: { relationshipAttributeName: "inputToProcesses" }) {
entity {#process
guid
typeName
relatedEntities(where: { relationshipAttributeName: "outputs" }) {
entity {#dataset
guid
typeName
qualifiedName
relatedEntities(where: { relationshipAttributeName: "inputToProcesses" }) {
entity {#process
guid
typeName
qualifiedName
relatedEntities(where: { relationshipAttributeName: "outputs" }) {
entity {#dataset
guid
typeName
qualifiedName
}
}
}
}
}
}
}
}
}
}
Sample Response:
{
"data": {
"entities": [
{
"guid": "...",
"typeName": "Dataset",
"qualifiedName": "...",
"relatedEntities": [
{
"entity": {
"guid": "...",
"typeName": "Process",
"relatedEntities": [
{
"entity": {
"guid": "...",
"typeName": "Dataset",
"qualifiedName": "...",
"relatedEntities": [
{
"entity": {
"guid": "...",
"typeName": "Process",
"relatedEntities": [
{
"entity": {
"guid": "...",
"typeName": "Dataset",
"qualifiedName": "..."
}
}
]
}
}
]
}
}
]
}
}
]
}
]
}
}
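The nested lineage response alternates dataset and process levels, so every second relatedEntities level holds datasets. The sketch below walks one root entity from the entities list and yields each downstream dataset with its hop count. The generator name is hypothetical, and because the sample response elides GUIDs as "...", the usage example uses made-up values.

```python
def downstream_datasets(entity, hops=0):
    """Yield (hops, guid, typeName) for each downstream dataset in a lineage response.

    Level pattern: dataset -> process (inputToProcesses) -> dataset (outputs).
    """
    for link in entity.get("relatedEntities") or []:
        process = link.get("entity") or {}
        for output in process.get("relatedEntities") or []:
            dataset = output.get("entity") or {}
            yield hops + 1, dataset.get("guid"), dataset.get("typeName")
            # Recurse into the dataset's own downstream processes, if fetched.
            yield from downstream_datasets(dataset, hops + 1)


# Hypothetical two-hop response for one root dataset (made-up GUIDs).
root = {
    "guid": "d0", "typeName": "Dataset",
    "relatedEntities": [{"entity": {"guid": "p1", "typeName": "Process",
        "relatedEntities": [{"entity": {"guid": "d1", "typeName": "Dataset",
            "relatedEntities": [{"entity": {"guid": "p2", "typeName": "Process",
                "relatedEntities": [{"entity": {"guid": "d2", "typeName": "Dataset"}}]}}]}}]}}],
}
print(list(downstream_datasets(root)))
```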
Limitations
Query fetch-depth limitation:
Queries are limited by query depth. The maximum depth is 5. For example, the following query fails because its second branch reaches depth 6:
query {
entities { #depth 1
relatedEntities { #depth 2
entity {
relatedEntities { #depth 3
entity {
assignedTerms{ #depth 4
term {
classifications { #depth 5
...
}
}
}
relatedEntities { #depth 4
entity {
assignedTerms{ #depth 5
term {
classifications { #depth 6
...
}
}
}
}
}
}
}
}
}
}
}
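You can check a query against the depth limit before sending it. This sketch models the selection set as nested dicts and counts a level only for entities, relatedEntities, assignedTerms, and classifications, which is how the #depth comments in the example count. That counting rule is inferred from the example, not from an official definition, and the helper names are hypothetical.

```python
# Fields that the example's "#depth" comments count as a level (inferred).
COUNTED_FIELDS = {"entities", "relatedEntities", "assignedTerms", "classifications"}
MAX_DEPTH = 5


def fetch_depth(selection, level=0):
    """Return the deepest counted level in a selection set modeled as nested dicts."""
    deepest = level
    for field, sub in selection.items():
        next_level = level + 1 if field in COUNTED_FIELDS else level
        if isinstance(sub, dict):
            deepest = max(deepest, fetch_depth(sub, next_level))
        else:
            deepest = max(deepest, next_level)
    return deepest


# The failing query from the example, as nested dicts ({} marks an elided body).
failing = {
    "entities": {                                    # depth 1
        "relatedEntities": {                         # depth 2
            "entity": {
                "relatedEntities": {                 # depth 3
                    "entity": {
                        "assignedTerms": {           # depth 4
                            "term": {"classifications": {}}  # depth 5
                        },
                        "relatedEntities": {         # depth 4
                            "entity": {
                                "assignedTerms": {   # depth 5
                                    "term": {"classifications": {}}  # depth 6
                                }
                            }
                        },
                    }
                }
            }
        }
    }
}

print(fetch_depth(failing), "exceeds" if fetch_depth(failing) > MAX_DEPTH else "within", MAX_DEPTH)
```

By the same counting, the two-hop lineage query earlier in this article reaches depth 5, right at the limit, so adding a third hop would likely be rejected.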
Execution-time fetch-cost limitation:
Queries are constrained by the cost of query execution. The maximum allowable cost is set at 100 units.
A fetch cost is incurred each time the query retrieves related entities, assigned terms, or classifications for a given entity or term.
Example
Consider a scenario where you query three entities, each with two related entities. The cost is calculated as follows:
- One unit for the root query
- Three units for fetching related entities: one unit for each of the three level-1 entities
Hence, the total cost for this query is 1 (root query) + 3 (level-1 entities) = 4.
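The worked example can be generalized into a rough pre-flight estimate. This sketch assumes, beyond what the single example states, that every node on a non-final level triggers exactly one nested fetch costing one unit and that nodes on the last level trigger none. The helper names are hypothetical, and the service's own accounting is authoritative.

```python
MAX_COST = 100


def estimated_fetch_cost(nodes_per_level):
    """Estimate fetch cost from the number of nodes returned at each level.

    nodes_per_level[0] is the root result count; each later entry is the
    number of nodes fetched at that depth. One unit for the root query, plus
    one unit per node whose children are fetched (all but the last level).
    """
    return 1 + sum(nodes_per_level[:-1])


def within_budget(nodes_per_level, max_cost=MAX_COST):
    """True if the estimate doesn't exceed the maximum allowable cost."""
    return estimated_fetch_cost(nodes_per_level) <= max_cost


# The article's example: 3 entities, each with 2 related entities.
print(estimated_fetch_cost([3, 6]))
```

Under these assumptions, fanning out one more level multiplies quickly: 50 root entities with 2 related entities each, each with 2 terms, gives [50, 100, 200] and an estimate of 151, over the limit.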
Filtering performance
Filtering is currently in preview and has some limitations. See Filtering (preview) for more information.
GraphQL queries begin with a root query that retrieves the top-level nodes. It then recursively fetches the related nodes.
The performance of a nested query is primarily determined by the root query because the related nodes are fetched from a known starting point, similar to a foreign key in SQL.
To optimize performance, it’s crucial to avoid wildcard root queries that could trigger a table scan on the top-level nodes.
For instance, the following query could cause performance issues on large datasets because the name field isn't indexed:
query {
entities(where: { name: { eq: "<value>" } }) {
guid
typeName
attributes
relatedEntities {
entity {
guid
typeName
attributes
}
}
}
}
To prevent such performance issues, make sure that you apply a filter on guid or qualifiedName.
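A client can apply this guidance as a guard before sending a query. This heuristic sketch (the helper `root_filter_is_indexed` is hypothetical) checks only the top-level keys of the root where clause: sibling keys are combined with AND, so an indexed key there narrows every match, whereas guid or qualifiedName nested under or or not doesn't necessarily constrain the whole result.

```python
INDEXED_KEYS = {"guid", "qualifiedName"}


def root_filter_is_indexed(where):
    """True if the root where clause pins guid or qualifiedName at its top level.

    Conservative heuristic: keys nested inside or/and/not are not inspected.
    """
    return any(key in INDEXED_KEYS for key in where)


# The slow example above filters only on name, so it's flagged.
print(root_filter_is_indexed({"name": {"eq": "<value>"}}))
```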