Graph models overview (preview)

2025-06-04

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer

Note

This feature is currently in public preview. Functionality and syntax are subject to change before General Availability.

Graph models in Azure Data Explorer enable you to define, manage, and efficiently query persistent graph structures within your database. Unlike transient graphs created using the make-graph operator, graph models are stored representations that can be queried repeatedly without rebuilding the graph for each query, significantly improving performance for complex relationship-based analysis.

Overview

A graph model is a database object that represents a labeled property graph (LPG) within Azure Data Explorer. It consists of nodes, also called vertices, and edges, also called relationships. Both nodes and edges can have properties that describe them. The model defines the schema of the graph, including node and edge types with their properties. It also defines the process for constructing the graph from tabular data stored in KQL Database tables and from tabular expressions.

Key characteristics

Graph models offer:

Metadata persistence: Store graph specifications in database metadata for durability and reusability
Materialized snapshots: Eliminate the need to rebuild graphs for each query, dramatically improving query performance
Schema definition: Support optional but recommended defined schemas for nodes and edges, ensuring data consistency
Deep KQL integration: Seamlessly integrate with graph semantics
Optimized traversals: Include specialized indexing for efficient graph traversal operations, making complex pattern matching and path-finding queries significantly faster

When to use graph models

Graph models provide significant advantages for relationship-based analysis but require additional setup compared to ad-hoc graph queries. Consider using graph models when:

Performance is critical: You repeatedly run graph queries on the same underlying data and need optimized performance
Complex relationship data: You have data with many interconnected relationships that benefit from a graph representation
Stable structure: Your graph structure is relatively stable, with periodic but not constant updates
Advanced graph operations: You need to perform complex traversals, path finding, pattern matching, or community detection on your data
Consistent schema: Your graph analysis requires a well-defined structure with consistent node and edge types

For simpler, one-time graph analysis on smaller datasets, the make-graph operator might be more appropriate.

Graph model components

A graph model consists of two main components:

Schema (optional)

The schema defines the structure and properties of nodes and edges in the graph model. While optional, the schema serves several important purposes:

Type safety: Schema properties define the expected data types for node and edge properties, ensuring type consistency during graph queries
Property validation: All properties defined in the schema become valid properties for nodes/edges with the corresponding labels, regardless of whether these properties appear in the step query columns
Query compatibility: Schema properties can be safely referenced in graph-match queries without type collisions with step query columns

Schema structure

Nodes: Defines node label types and their typed properties (e.g., "Person": {"Name": "string", "Age": "long"})
Edges: Defines edge label types and their typed properties (e.g., "WORKS_AT": {"StartDate": "datetime", "Position": "string"})

Definition

The Definition specifies how to build the graph from tabular data through a series of sequential operations. This section is the core of the graph model, as it transforms your relational data into a graph structure.

Key characteristics of the Definition:

Sequential execution: Steps are executed in the exact order they appear in the Definition array. This order is critical because:
- Nodes must typically be created before edges that reference them
- Later steps can build upon or modify the results of earlier steps
- The sequence affects performance and memory usage during graph construction
Incremental construction: Each step adds to the graph being built, allowing you to:
- Combine data from multiple tables or sources
- Apply different logic for different types of nodes or edges
- Build complex graph structures incrementally

Step types:

AddNodes: Steps that define how to create nodes from tabular data
- Can be used multiple times to add different types of nodes
- Each step can pull from different data sources or apply different filters
- Node properties are derived from the columns in the query result
AddEdges: Steps that define how to create edges from tabular data
- Can reference nodes that don't yet exist (the system will create placeholder nodes and update them when AddNodes steps are processed later)
- Can create relationships between nodes from the same or different AddNodes steps
- Edge properties are derived from the columns in the query result
- While it's possible to add edges before nodes, it's recommended to add nodes first for better readability and understanding

Execution flow example:

Step 1 (AddNodes): Create Person nodes from Employees table
Step 2 (AddNodes): Create Company nodes from Organizations table  
Step 3 (AddEdges): Create WORKS_AT edges between Person and Company nodes
Step 4 (AddEdges): Create KNOWS edges between Person nodes

This sequential approach ensures that when Step 3 creates WORKS_AT edges, both the Person nodes (from Step 1) and Company nodes (from Step 2) already exist in the graph.

Labels in Graph models

Labels are critical identifiers that categorize nodes and edges in the graph, enabling efficient filtering and pattern matching. Azure Data Explorer graph models support two complementary types of labels:

Static labels

Defined explicitly in the Schema section of the graph model
Represent node or edge types with predefined properties
Provide a consistent schema for the graph elements
Referenced in the "Labels" array in AddNodes and AddEdges steps
Ideal for well-known, stable entity and relationship types

Dynamic labels

Not predefined in the Schema section
Generated at runtime from data in the underlying tables
Specified using "LabelsColumn" in the AddNodes or AddEdges steps
Can be a single label (string column) or multiple labels (dynamic array column)
Allow for more flexible graph structures that adapt to your data
Useful for systems where node/edge types evolve over time

Tip

You can combine static and dynamic labels to get the benefits of both approaches: schema validation for core entity types while maintaining flexibility for evolving classifications.

Definition steps in detail

The Definition section of a graph model contains steps that define how to construct the graph from tabular data. Each step has specific parameters depending on its kind.

AddNodes steps

AddNodes steps define how to create nodes in the graph from tabular data:

Parameter	Required	Description
Kind	Yes	Must be set to "AddNodes"
Query	Yes	A KQL query that retrieves the data for nodes. The query result must include all columns required for node properties and identifiers
NodeIdColumn	Yes	The column from the query result used as the unique identifier for each node
Labels	No	An array of static label names defined in the Schema section to apply to these nodes
LabelsColumn	No	A column from the query result that provides dynamic labels for each node. Can be a string column (single label) or dynamic array column (multiple labels)

AddEdges steps

AddEdges steps define how to create relationships between nodes in the graph:

Parameter	Required	Description
Kind	Yes	Must be set to "AddEdges"
Query	Yes	A KQL query that retrieves the data for edges. The query result must include source and target node identifiers and any edge properties
SourceColumn	Yes	The column from the query result that contains the source node identifiers
TargetColumn	Yes	The column from the query result that contains the target node identifiers
Labels	No	An array of static label names defined in the Schema section to apply to these edges
LabelsColumn	No	A column from the query result that provides dynamic labels for each edge. Can be a string column (single label) or dynamic array column (multiple labels)

Graph model examples

Basic example with both static and dynamic labels

The following example creates a professional network graph model that combines static schema definitions with dynamic labeling:

.create-or-alter graph_model ProfessionalNetwork ```
{
  "Schema": {
    "Nodes": {
      "Person": {"Name": "string", "Age": "long", "Title": "string"},
      "Company": {"Name": "string", "Industry": "string", "FoundedYear": "int"}
    },
    "Edges": {
      "WORKS_AT": {"StartDate": "datetime", "Position": "string", "Department": "string"},
      "KNOWS": {"ConnectionDate": "datetime", "ConnectionStrength": "int"}
    }
  },
  "Definition": {
    "Steps": [
      {
        "Kind": "AddNodes",
        "Query": "Employees | project Id, Name, Age, Title, NodeType",
        "NodeIdColumn": "Id",
        "Labels": ["Person"],
        "LabelsColumn": "NodeType"
      },
      {
        "Kind": "AddNodes",
        "Query": "Organizations | project Id, Name, Industry, FoundedYear",
        "NodeIdColumn": "Id",
        "Labels": ["Company"]
      },
      {
        "Kind": "AddEdges",
        "Query": "EmploymentRecords | project EmployeeId, CompanyId, StartDate, Position, Department",
        "SourceColumn": "EmployeeId",
        "TargetColumn": "CompanyId",
        "Labels": ["WORKS_AT"]
      },
      {
        "Kind": "AddEdges",
        "Query": "Connections | project PersonA, PersonB, ConnectionDate, ConnectionType, ConnectionStrength",
        "SourceColumn": "PersonA",
        "TargetColumn": "PersonB",
        "Labels": ["KNOWS"],
        "LabelsColumn": "ConnectionType"
      }
    ]
  }
}
```

This model would enable queries such as finding colleagues connected through multiple degrees of separation, identifying people working in the same industry, or analyzing organizational relationships.

Creating and managing Graph models

Azure Data Explorer provides a comprehensive set of management commands for working with graph models throughout their lifecycle.

Command summary

Command	Purpose	Key parameters
.create-or-alter graph_model	Create a new graph model or modify an existing one	Database, Name, Schema, Definition
.drop graph_model	Remove a graph model	Database, Name
.show graph_models	List available graph models	Database [optional]

Graph model lifecycle

A typical workflow for managing graph models involves:

Development - Create an initial graph model with a schema and definition that maps to your data
Validation - Query the model to verify correct structure and expected results
Maintenance - Periodically update the model as your data structure evolves
Snapshot management - Create and retire snapshots to balance performance and freshness

Tip

When starting with graph models, begin with a small subset of your data to validate your design before scaling to larger datasets.

Graph snapshots

Graph snapshots are database entities that represent instances of graph models at specific points in time. While a graph model defines the structure and data sources for a graph, a snapshot is the actual materialized graph that can be queried.

Key aspects of graph snapshots:

Each snapshot is linked to a specific graph model
A single graph model can have multiple snapshots associated with it
Snapshots are created with the .make graph_snapshot command
Snapshots include metadata such as creation time and the source graph model
Snapshots enable querying the graph as it existed at a specific point in time

For more detailed information about working with graph snapshots, see Graph snapshots overview.

Querying Graph models

Graph models are queried using the graph() function, which provides access to the graph entity. This function supports retrieving either the most recent snapshot of the graph or creating the graph at query time if snapshots aren't available.

Basic query structure

graph("GraphModelName")
| graph-match <pattern>
    where <filters>
    project <output fields>

Query examples

1. Basic node-edge-node pattern

// Find people who commented on posts by employees in the last week
graph("SocialNetwork") 
| graph-match (person)-[comments]->(post)<-[authored]-(employee)
    where person.age > 30 
      and comments.createTime > ago(7d)
    project person.name, post.title, employee.userName

2. Multiple relationship patterns

// Find people who both work with and are friends with each other
graph("ProfessionalNetwork") 
| graph-match (p1)-[worksWith]->(p2)-[friendsWith]->(p1)
    where labels(worksWith) has "WORKS_WITH" and labels(friendsWith) has "FRIENDS_WITH" and
      labels(p1) has "Person" and labels(p2) has "Person"
    project p1.name, p2.name, p1.department

3. Variable-length paths

// Find potential influence paths up to 3 hops away
graph("InfluenceNetwork") 
| graph-match (influencer)-[influences*1..3]->(target)
    where influencer.id == "user123" and all(influences, labels() has "INFLUENCES")
    project influencePath = influences, 
         pathLength = array_length(influences), 
         target.name

The graph() function provides a consistent way to access graph data without needing to explicitly construct the graph for each query.

Note

See Graph operators for the complete reference on graph query syntax and capabilities.

Frequently Asked Questions

Who is responsible for refreshing the graph?

Users or processes must refresh the graph themselves. Initially, no automatic refresh policies exist for new graph entities. However, the graph remains queryable even if the snapshot is being created or has not yet been created yet.

How can a graph be refreshed?

To refresh a graph:

Create a new snapshot using an asynchronous operation (.make graph_snapshot)
Once created, incoming graph queries automatically use the new snapshot
Optional: Drop the old snapshot to free up resources (.drop graph_snapshot)

What if different steps create duplicate edges or nodes?

The Definition steps execute sequentially, and duplicate handling differs between nodes and edges:

Edges: Duplicates remain as duplicates by default since edges don't have unique identifiers. If multiple steps create identical source-target relationships, each one becomes a separate edge in the graph. This behavior is intentional as multiple relationships between the same nodes can represent different interactions or events over time.
Nodes: "Duplicates" are automatically merged based on the NodeIdColumn value - the system assumes they represent the same entity. When multiple steps create nodes with the same identifier:
- All properties from different steps are combined into a single node
- If there are conflicting property values for the same property name, the value from the step that executed last takes precedence
- Properties that exist in one step but not another are preserved

This merge behavior allows you to build nodes incrementally across steps, such as adding basic information in one step and enriching with additional properties in subsequent steps.

How do graph models handle schema changes?

When the schema of your underlying data changes:

Alter your graph model using the .create-or-alter graph_model command to update its schema or definition
To materialize these changes, create a new snapshot
Older snapshots remain accessible until explicitly dropped

Can I query across multiple graph models?

Yes, you can query multiple graph models within a single query using composition:

Use the output of one graph() operator as input to another graph() operator
Process and transform results from one graph before feeding into another graph query
Chain multiple graph operations for cross-domain analysis without creating a unified model

Example:

// Query the first graph model
graph("EmployeeNetwork") 
| graph-match (person)-[manages]->(team)
    where labels(manages) has "MANAGES" and labels(person) has "Employee"
    project manager=person.name, teamId=team.id
// Use these results to query another graph model
| join (
	graph("ProjectNetwork")
	| graph-match (project)-[assignedTo]->(team)
        where labels(assignedTo) has "ASSIGNED_TO"
	    project projectName=project.name, teamId=team.id
) on teamId

What's the difference between labels and properties?

Labels: Categorize nodes and edges for structural pattern matching
Properties: Store data values associated with nodes and edges (used in filtering and output)