Supported Spatial Analysis operations

Note

Azure Video Analyzer has been retired and is no longer available.

Azure Video Analyzer for Media is not affected by this retirement. It is now rebranded to Azure Video Indexer.

Spatial Analysis enables the analysis of real-time streaming video from camera devices. For each camera device you configure, the operations generate an output stream of JSON messages that is sent to Azure Video Analyzer.

Video Analyzer implements the following Spatial Analysis operations:

| Operation Identifier | Description |
| --- | --- |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonZoneCrossingOperation | Emits a personZoneEnterExitEvent when a person enters or exits the zone, with directional info indicating the numbered side of the zone that was crossed. Emits a personZoneDwellTimeEvent when the person exits the zone, with directional info and the number of milliseconds the person spent inside the zone. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonLineCrossingOperation | Tracks when a person crosses a designated line in the camera's field of view. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonDistanceOperation | Tracks when people violate a distance rule. |
| Microsoft.VideoAnalyzer.SpatialAnalysisPersonCountOperation | Counts people in a designated zone in the camera's field of view. The zone must be fully covered by a single camera for PersonCount to record an accurate total. |
| Microsoft.VideoAnalyzer.SpatialAnalysisCustomOperation | Generic operation that can run any of the scenarios above. This option is useful when you want to run multiple scenarios on the same camera or use system resources (for example, GPU) more efficiently. |

Person Zone Crossing

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonZoneCrossingOperation

See an example of Person Zone Crossing Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. For example, on a 1920x1080 frame, a vertex of (0.3, 0.5) corresponds to the absolute pixel position (576, 540). |
| eventType | string | For cognitiveservices.vision.spatialanalysis-personcrossingpolygon this should be zonecrossing or zonedwelltime. |
| trigger | string | The type of trigger for sending an event. Supported value: "event": fire when someone enters or exits the zone. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. The default value is 48 when the event type is zonecrossing and 16 when it is zonedwelltime. These are the recommended values for maximum accuracy. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
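
Putting the zone parameters together, a configuration might look like the following. This is a minimal, hypothetical sketch based on the parameter table above; the zone name, polygon coordinates, and exact nesting of the event settings are illustrative, so refer to the linked GitHub sample for the authoritative shape.

{
    "zones": [
        {
            "name": "retailstore",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "zonecrossing",
                    "config": {
                        "trigger": "event",
                        "threshold": 48.0,
                        "focus": "footprint"
                    }
                }
            ]
        }
    ]
}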

Orientation parameter settings

You can configure the orientation computation through the DETECTOR_NODE_CONFIG parameter settings:

{
    "enable_orientation": true
}

| Name | Type | Description |
| --- | --- | --- |
| enable_orientation | bool | Indicates whether to compute the orientation of detected people. enable_orientation is set to true by default. |

Speed parameter settings

You can configure the speed computation through the TRACKER_NODE_CONFIG parameter settings:

{
    "enable_speed": true
}

| Name | Type | Description |
| --- | --- | --- |
| enable_speed | bool | Indicates whether to compute the speed of detected people. enable_speed is set to true by default. It is highly recommended that you enable both speed and orientation to get the best estimated values. |
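
Because both settings travel through the operation parameters, enabling speed and orientation together would be done via detectorNodeConfiguration and trackerNodeConfiguration. A minimal sketch, assuming both parameters accept the node configuration as a serialized JSON string:

{
    "detectorNodeConfiguration": "{ \"enable_orientation\": true }",
    "trackerNodeConfiguration": "{ \"enable_speed\": true }"
}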

Output:

{
  "body": {
    "timestamp": 147026846756730,
    "inferences": [
      {
        "type": "entity",
        "inferenceId": "8e8269c1a9584b3a8f16a3cd7a2cd45a",
        "entity": {
          "tag": {
            "value": "person",
            "confidence": 0.9511422
          },
          "box": {
            "l": 0.59845686,
            "t": 0.35958588,
            "w": 0.11951797,
            "h": 0.50172085
          }
        },
        "extensions": {
          "centerGroundPointY": "0.0",
          "footprintY": "inf",
          "centerGroundPointX": "0.0",
          "mappedImageOrientation": "inf",
          "groundOrientationAngle": "inf",
          "footprintX": "inf",
          "trackingId": "f54d4c8fb4f345a9afb944303b0f3b40",
          "speed": "0.0"
        }
      },
      {
        "type": "entity",
        "inferenceId": "c54c9f92dd0d442b8d1840756715a5c7",
        "entity": {
          "tag": {
            "value": "person",
            "confidence": 0.92762595
          },
          "box": {
            "l": 0.8098704,
            "t": 0.47707137,
            "w": 0.18019487,
            "h": 0.48659682
          }
        },
        "extensions": {
          "footprintY": "inf",
          "groundOrientationAngle": "inf",
          "trackingId": "a226eda9226e4ec9b39ebceb7c8c1f61",
          "mappedImageOrientation": "inf",
          "speed": "0.0",
          "centerGroundPointX": "0.0",
          "centerGroundPointY": "0.0",
          "footprintX": "inf"
        }
      },
      {
        "type": "event",
        "inferenceId": "aad2778756a94afd9055fdbb3a370d62",
        "relatedInferences": [
          "8e8269c1a9584b3a8f16a3cd7a2cd45a"
        ],
        "event": {
          "name": "personZoneEnterExitEvent",
          "properties": {
            "trackingId": "f54d4c8fb4f345a9afb944303b0f3b40",
            "zone": "retailstore",
            "status": "Enter"
          }
        }
      },
      {
        "type": "event",
        "inferenceId": "e30103d3af28485688d7c77bbe10b5b5",
        "relatedInferences": [
          "c54c9f92dd0d442b8d1840756715a5c7"
        ],
        "event": {
          "name": "personZoneEnterExitEvent",
          "properties": {
            "trackingId": "a226eda9226e4ec9b39ebceb7c8c1f61",
            "status": "Enter",
            "zone": "retailstore"
          }
        }
      }
    ]
  }
}

Person Line Crossing

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonLineCrossingOperation

See an example of Person Line Crossing Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| lines | list | List of lines. |
| name | string | Friendly name for this line. |
| line | string | Each value pair represents the starting and ending point of the line. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| start | value pair | x,y coordinates for the line's starting point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| end | value pair | x,y coordinates for the line's ending point. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| type | string | This should be linecrossing. |
| trigger | string | The type of trigger for sending an event. Supported value: "event": fire when someone crosses the line. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. The default value is 16, which is the recommended value for maximum accuracy. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
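
As a hypothetical sketch, the lines parameter assembled from the fields above might look like this; the line name and coordinates are illustrative, so refer to the linked GitHub sample for the authoritative shape:

{
    "lines": [
        {
            "name": "door",
            "line": {
                "start": { "x": 0.30, "y": 0.70 },
                "end": { "x": 0.60, "y": 0.70 }
            },
            "events": [
                {
                    "type": "linecrossing",
                    "config": {
                        "trigger": "event",
                        "focus": "footprint",
                        "threshold": 16.0
                    }
                }
            ]
        }
    ]
}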

Output:

{
  "timestamp": 145666620394490,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "2d3c7c7d6c0f4af7916eb50944523bdf",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.38330078
        },
        "box": {
          "l": 0.5316645,
          "t": 0.28169397,
          "w": 0.045862257,
          "h": 0.1594377
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "trackingId": "ac4a79a29a67402ba447b7da95907453",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "2206088c80eb4990801f62c7050d142f",
      "relatedInferences": ["2d3c7c7d6c0f4af7916eb50944523bdf"],
      "event": {
        "name": "personLineEvent",
        "properties": {
          "trackingId": "ac4a79a29a67402ba447b7da95907453",
          "status": "CrossLeft",
          "zone": "door"
        }
      }
    }
  ]
}

Person Distance

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonDistanceOperation

See an example of Person Distance Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes, or interval for sending events periodically, irrespective of whether the count has changed or not. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| minimumDistanceThreshold | float | A distance in feet that triggers a "TooClose" event when people are less than that distance apart. |
| maximumDistanceThreshold | float | A distance in feet that triggers a "TooFar" event when people are more than that distance apart. |
| aggregationMethod | string | The method for aggregating the persondistance result. Supported values are mode and average. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
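
A hypothetical sketch of a distance-rule zone assembled from the parameters above. The thresholds mirror the values in the output example below, and the event type name persondistance is an assumption; refer to the linked GitHub sample for the authoritative shape:

{
    "zones": [
        {
            "name": "door",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "persondistance",
                    "config": {
                        "trigger": "interval",
                        "outputFrequency": 1,
                        "focus": "footprint",
                        "minimumDistanceThreshold": 1.5,
                        "maximumDistanceThreshold": 14.5,
                        "aggregationMethod": "average"
                    }
                }
            ]
        }
    ]
}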

Output:

{
  "timestamp": 145666613610297,
  "inferences": [
    {
      "type": "event",
      "inferenceId": "85a5fc4936294a3bac90b9c43876741a",
      "event": {
        "name": "personDistanceEvent",
        "properties": {
          "maximumDistanceThreshold": "14.5",
          "personCount": "0.0",
          "eventName": "Unknown",
          "zone": "door",
          "averageDistance": "0.0",
          "minimumDistanceThreshold": "1.5",
          "distanceViolationPersonCount": "0.0"
        }
      }
    }
  ]
}

Person Count

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisPersonCountOperation

See an example of Person Count Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| zones | list | List of zones. |
| name | string | Friendly name for this zone. |
| polygon | string | Each value pair represents the x,y coordinates of a vertex of the polygon. The polygon represents the areas in which people are tracked or counted. The float values represent the position of the vertex relative to the top-left corner. To calculate the absolute x,y values, multiply these values by the frame size. |
| outputFrequency | int | The rate at which events are egressed. When outputFrequency = X, every Xth event is egressed; for example, outputFrequency = 2 means every other event is output. outputFrequency is applicable to both event and interval triggers. |
| trigger | string | The type of trigger for sending an event. Supported values are event for sending events when the count changes, or interval for sending events periodically, irrespective of whether the count has changed or not. |
| focus | string | The point location within the person's bounding box used to calculate events. The value can be footprint (the footprint of the person), bottom_center (the bottom center of the person's bounding box), or center (the center of the person's bounding box). The default value is footprint. |
| threshold | float | Events are egressed when the person is more than this number of pixels inside the zone. |
| enableFaceMaskClassifier | boolean | true to enable detecting people wearing face masks in the video stream, false to disable it. This is disabled by default. Face mask detection requires the input video width parameter to be 1920 ("INPUT_VIDEO_WIDTH": 1920). The face mask attribute will not be returned. |
| detectorNodeConfiguration | string | The DETECTOR_NODE_CONFIG parameters for all Spatial Analysis operations. |
| trackerNodeConfiguration | string | The TRACKER_NODE_CONFIG parameters for all Spatial Analysis operations. |
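
A hypothetical sketch of a counting zone assembled from the parameters above. The zone name matches the output example below, and the event type name count is an assumption; refer to the linked GitHub sample for the authoritative shape:

{
    "zones": [
        {
            "name": "demo",
            "polygon": "[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]",
            "events": [
                {
                    "type": "count",
                    "config": {
                        "trigger": "event",
                        "outputFrequency": 1,
                        "focus": "footprint"
                    }
                }
            ]
        }
    ]
}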

Output:

{
  "timestamp": 145666599533564,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "5b8076753b8c47bba8c72a7e0f7c5cc0",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.9458008
        },
        "box": {
          "l": 0.474487,
          "t": 0.26522297,
          "w": 0.066929355,
          "h": 0.2828749
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "fb309c9285f94f268378540b5fbbf5ad",
      "relatedInferences": ["5b8076753b8c47bba8c72a7e0f7c5cc0"],
      "event": {
        "name": "personCountEvent",
        "properties": {
          "personCount": "1.0",
          "zone": "demo"
        }
      }
    }
  ]
}

Custom Operation

Operation Identifier: Microsoft.VideoAnalyzer.SpatialAnalysisCustomOperation

See an example of Custom Operation from our GitHub sample.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| extensionConfiguration | string | JSON representation of the operation. |
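
Because extensionConfiguration is a string, the scenario settings are passed as serialized JSON. Purely as a hypothetical illustration (the real schema is defined by the linked GitHub sample), a person count scenario might be embedded like this:

{
    "extensionConfiguration": "{\"zones\":[{\"name\":\"demo\",\"polygon\":\"[[0.30,0.30],[0.30,0.90],[0.80,0.90],[0.80,0.30]]\",\"events\":[{\"type\":\"count\",\"config\":{\"trigger\":\"event\",\"focus\":\"footprint\"}}]}]}"
}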

Output:

{
  "timestamp": 145666599533564,
  "inferences": [
    {
      "type": "entity",
      "inferenceId": "5b8076753b8c47bba8c72a7e0f7c5cc0",
      "entity": {
        "tag": {
          "value": "person",
          "confidence": 0.9458008
        },
        "box": {
          "l": 0.474487,
          "t": 0.26522297,
          "w": 0.066929355,
          "h": 0.2828749
        }
      },
      "extensions": {
        "centerGroundPointX": "0.0",
        "centerGroundPointY": "0.0",
        "footprintX": "inf",
        "footprintY": "inf"
      }
    },
    {
      "type": "event",
      "inferenceId": "fb309c9285f94f268378540b5fbbf5ad",
      "relatedInferences": ["5b8076753b8c47bba8c72a7e0f7c5cc0"],
      "event": {
        "name": "personCountEvent",
        "properties": {
          "personCount": "1.0",
          "zone": "demo"
        }
      }
    }
  ]
}