Supported Spatial Analysis operations

Note

Azure Video Analyzer has been retired and is no longer available.

Azure Video Analyzer for Media is not affected by this retirement. It is now rebranded to Azure Video Indexer.

Spatial Analysis enables the analysis of real-time streaming video from camera devices. For each camera device you configure, the operations generate an output stream of JSON messages that is sent to Azure Video Analyzer.

Video Analyzer implements the following Spatial Analysis operations:

| Operation Identifier | Description |
| --- | --- |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonZoneCrossingOperation | Emits a personZoneEnterExitEvent when a person enters or exits the zone, with directional info indicating the numbered side of the zone that was crossed. Emits a personZoneDwellTimeEvent when the person exits the zone, with directional info and the number of milliseconds the person spent inside the zone. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonLineCrossingOperation | Tracks when a person crosses a designated line in the camera's field of view. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonDistanceOperation | Tracks when people violate a distance rule. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonCountOperation | Counts people in a designated zone in the camera's field of view. The zone must be fully covered by a single camera for PersonCount to record an accurate total. |
| Microsoft.VideoAnalyzer.SpatialAnalysisCustomOperation | Generic operation that can run any of the scenarios above. This option is useful when you want to run multiple scenarios on the same camera or use system resources (for example, GPU) more efficiently. |

Person Zone Crossing

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonZoneCrossingOperation

See an example of Person Zone Crossing Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. For example, on a 1920x1080 frame, a vertex of (0.3, 0.5) corresponds to the absolute pixel position (576, 540). |
| eventType | string | For cognitiveservices.vision.spatialanalysis-personcrossingpolygon this should be zonecrossing or zonedwelltime. |
| trigger | string | The type of trigger for sending an event. Supported value: "event": fire when someone enters or exits the zone. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. The default value is 48 when the event type is zonecrossing and 16 when it is zonedwelltime. These are the recommended values for maximum accuracy. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
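
Putting the zone parameters together, a configuration might look like the following. This is a minimal, hypothetical sketch based on the parameter table above; the zone name, polygon coordinates, and exact nesting of the event settings are illustrative, so refer to the linked GitHub sample for the authoritative shape.

{
    "zones": [
        {
            "name": "retailstore",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "zonecrossing",
                    "config": {
                        "trigger": "event",
                        "threshold": 48.0,
                        "focus": "footprint"
                    }
                }
            ]
        }
    ]
}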

Orientation parameter settings

You can configure the orientation computation through the DETECTOR_NODE_CONFIG parameter settings:

{
    "enable_orientation": true
}

| Name | Type | Description |
| --- | --- | --- |
| enable_orientation | bool | Indicates whether to compute the orientation of detected people. enable_orientation is set to true by default. |

Speed parameter settings

You can configure the speed computation through the TRACKER_NODE_CONFIG parameter settings:

{
    "enable_speed": true
}

| Name | Type | Description |
| --- | --- | --- |
| enable_speed | bool | Indicates whether to compute the speed of detected people. enable_speed is set to true by default. It is highly recommended that you enable both speed and orientation to get the best estimated values. |
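
Because both settings travel through the operation parameters, enabling speed and orientation together would be done via detectorNodeConfiguration and trackerNodeConfiguration. A minimal sketch, assuming both parameters accept the node configuration as a serialized JSON string:

{
    "detectorNodeConfiguration": "{ \"enable_orientation\": true }",
    "trackerNodeConfiguration": "{ \"enable_speed\": true }"
}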

Output:

{
  "body": {
    "timestamp": 147026846756730,
    "inferences": [
      {
        "type": "entity",
        "inferenceId": "8e8269c1a9584b3a8f16a3cd7a2cd45a",
        "entity": {
          "tag": {
            "value": "person",
            "confidence": 0.9511422
          },
          "box": {
            "l": 0.59845686,
            "t": 0.35958588,
            "w": 0.11951797,
            "h": 0.50172085
          }
        },
        "extensions": {
          "centerGroundPointY": "0.0",
          "footprintY": "inf",
          "centerGroundPointX": "0.0",
          "mappedImageOrientation": "inf",
          "groundOrientationAngle": "inf",
          "footprintX": "inf",
          "trackingId": "f54d4c8fb4f345a9afb944303b0f3b40",
          "speed": "0.0"
        }
      },
      {
        "type": "entity",
        "inferenceId": "c54c9f92dd0d442b8d1840756715a5c7",
        "entity": {
          "tag": {
            "value": "person",
            "confidence": 0.92762595
          },
          "box": {
            "l": 0.8098704,
            "t": 0.47707137,
            "w": 0.18019487,
            "h": 0.48659682
          }
        },
        "extensions": {
          "footprintY": "inf",
          "groundOrientationAngle": "inf",
          "trackingId": "a226eda9226e4ec9b39ebceb7c8c1f61",
          "mappedImageOrientation": "inf",
          "speed": "0.0",
          "centerGroundPointX": "0.0",
          "centerGroundPointY": "0.0",
          "footprintX": "inf"
        }
      },
      {
        "type": "event",
        "inferenceId": "aad2778756a94afd9055fdbb3a370d62",
        "relatedInferences": [
          "8e8269c1a9584b3a8f16a3cd7a2cd45a"
        ],
        "event": {
          "name": "personZoneEnterExitEvent",
          "properties": {
            "trackingId": "f54d4c8fb4f345a9afb944303b0f3b40",
            "zone": "retailstore",
            "status": "Enter"
          }
        }
      },
      {
        "type": "event",
        "inferenceId": "e30103d3af28485688d7c77bbe10b5b5",
        "relatedInferences": [
          "c54c9f92dd0d442b8d1840756715a5c7"
        ],
        "event": {
          "name": "personZoneEnterExitEvent",
          "properties": {
            "trackingId": "a226eda9226e4ec9b39ebceb7c8c1f61",
            "status": "Enter",
            "zone": "retailstore"
          }
        }
      }
    ]
  }
}

Person Line Crossing

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonLineCrossingOperation

See an example of Person Line Crossing Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| lines | list | List of lines. |
| name | string | Friendly name for this line. |
| line | string | Each value pair represents the starting and ending point of the line. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| start | value pair | x,y coordinates for the line's starting point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| end | value pair | x,y coordinates for the line's ending point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| type | string | This should be linecrossing. |
| trigger | string | The type of trigger for sending an event. Supported value: "event": fire when someone crosses the line. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. The default value is 16, which is the recommended value for maximum accuracy. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
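
As a hypothetical sketch, the lines parameter assembled from the fields above might look like this; the line name and coordinates are illustrative, so refer to the linked GitHub sample for the authoritative shape:

{
    "lines": [
        {
            "name": "door",
            "line": {
                "start": { "x": 0.30, "y": 0.70 },
                "end": { "x": 0.60, "y": 0.70 }
            },
            "events": [
                {
                    "type": "linecrossing",
                    "config": {
                        "trigger": "event",
                        "focus": "footprint",
                        "threshold": 16.0
                    }
                }
            ]
        }
    ]
}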

Output:

{
  "timestamp": 145666620394490,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "2d3c7c7d6c0f4af7916eb50944523bdf",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.38330078
        },
        "box": {
          "l": 0.5316645,
          "t": 0.28169397,
          "w": 0.045862257,
          "h": 0.1594377
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "trackingId": "ac4a79a29a67402ba447b7da95907453",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "2206088c80eb4990801f62c7050d142f",
      "relatedInferences": ["2d3c7c7d6c0f4af7916eb50944523bdf"],
      "event": {
        "name": "personLineEvent",
        "properties": {
          "trackingId": "ac4a79a29a67402ba447b7da95907453",
          "status": "CrossLeft",
          "zone": "door"
        }
      }
    }
  ]
}

Person Distance

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonDistanceOperation

See an example of Person Distance Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes, or interval for sending events periodically, irrespective of whether the count has changed or not. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| minimumDistanceThreshold | float | A distance in feet that triggers a "TooClose" event when people are less than that distance apart. |
| maximumDistanceThreshold | float | A distance in feet that triggers a "TooFar" event when people are more than that distance apart. |
| aggregationMethod | string | The method for aggregating the persondistance result. Supported values are mode and average. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
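
A hypothetical sketch of a distance-rule zone assembled from the parameters above. The thresholds mirror the values in the output example below, and the event type name persondistance is an assumption; refer to the linked GitHub sample for the authoritative shape:

{
    "zones": [
        {
            "name": "door",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "persondistance",
                    "config": {
                        "trigger": "interval",
                        "outputFrequency": 1,
                        "focus": "footprint",
                        "minimumDistanceThreshold": 1.5,
                        "maximumDistanceThreshold": 14.5,
                        "aggregationMethod": "average"
                    }
                }
            ]
        }
    ]
}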

Output:

{
  "timestamp": 145666613610297,
  "inferences": [
    {
      "type": "event",
      "inferenceId": "85a5fc4936294a3bac90b9c43876741a",
      "event": {
        "name": "personDistanceEvent",
        "properties": {
          "maximumDistanceThreshold": "14.5",
          "personCount": "0.0",
          "eventName": "Unknown",
          "zone": "door",
          "averageDistance": "0.0",
          "minimumDistanceThreshold": "1.5",
          "distanceViolationPersonCount": "0.0"
        }
      }
    }
  ]
}

Person Count

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonCountOperation

See an example of Person Count Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes, or interval for sending events periodically, irrespective of whether the count has changed or not. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
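
A hypothetical sketch of a counting zone assembled from the parameters above. The zone name matches the output example below, and the event type name count is an assumption; refer to the linked GitHub sample for the authoritative shape:

{
    "zones": [
        {
            "name": "demo",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "count",
                    "config": {
                        "trigger": "event",
                        "outputFrequency": 1,
                        "focus": "footprint"
                    }
                }
            ]
        }
    ]
}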

Output:

{
  "timestamp": 145666599533564,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "5b8076753b8c47bba8c72a7e0f7c5cc0",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.9458008
        },
        "box": {
          "l": 0.474487,
          "t": 0.26522297,
          "w": 0.066929355,
          "h": 0.2828749
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "fb309c9285f94f268378540b5fbbf5ad",
      "relatedInferences": ["5b8076753b8c47bba8c72a7e0f7c5cc0"],
      "event": {
        "name": "personCountEvent",
        "properties": {
          "personCount": "1.0",
          "zone": "demo"
        }
      }
    }
  ]
}

Custom Operation

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisCustomOperation

See an example of Custom Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| extensionConfiguration | string | JSON representation of the operation. |
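
Because extensionConfiguration is a string, the scenario settings are passed as serialized JSON. Purely as a hypothetical illustration (the real schema is defined by the linked GitHub sample), a person count scenario might be embedded like this:

{
    "extensionConfiguration": "{\"zones\":[{\"name\":\"demo\",\"polygon\":\"[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]\",\"events\":[{\"type\":\"count\",\"config\":{\"trigger\":\"event\",\"focus\":\"footprint\"}}]}]}"
}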

Output:

{
  "timestamp": 145666599533564,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "5b8076753b8c47bba8c72a7e0f7c5cc0",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.9458008
        },
        "box": {
          "l": 0.474487,
          "t": 0.26522297,
          "w": 0.066929355,
          "h": 0.2828749
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "fb309c9285f94f268378540b5fbbf5ad",
      "relatedInferences": ["5b8076753b8c47bba8c72a7e0f7c5cc0"],
      "event": {
        "name": "personCountEvent",
        "properties": {
          "personCount": "1.0",
          "zone": "demo"
        }
      }
    }
  ]
}