Platform framework

The MDEP Telemetry Service is a privileged APK that runs periodically and collects data from data providers. Each data provider collects specific telemetry data by querying its data sources and sends it to Microsoft cloud destinations, where it can be processed and analyzed.

Telemetry Service Sources and Destinations Diagram

To make use of existing best-practice services, the framework uses 1DS and Watson for data transmission. Currently, telemetry data can be sent only to Microsoft backends. Future releases may update the telemetry service architecture to allow customizing the data destination, so that partners can use their own backends to process telemetry data.

The MDEP Platform telemetry framework consists of:

  • Telemetry Service and providers,
  • Worker Service,
  • Telemetry Utilities,
  • HAL Service and native providers.

Worker Service

This is a simple Java service that receives notifications of device boot and shutdown, and starts or stops the HAL service accordingly.

Telemetry Service

This service is a JobService, meaning that Android’s JobScheduler is responsible for periodically starting and stopping it. Scheduling is governed by parameters that are set during the BootCompleted or LockedBootCompleted broadcast event, which lets the service be scheduled to execute every 15 minutes, both before and after the user has unlocked the device. Because the Telemetry Service is not an always-running background service, it does not retain context or variables between executions. While this keeps the overall system impact low, it also means that the framework cannot respond to inputs in real time. Therefore, any data collection that requires a registered receiver or another continuously monitored source is not a good fit for this service.

During each scheduled execution, a Runnable is posted to a Handler thread. The Runnable checks the current telemetry opt-in level (Optional, Required, Disabled) and processes each provider accordingly.
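The opt-in check can be pictured with a minimal plain-Java sketch. All names here (`OptInLevel`, `ProviderInfo`, `OptInFilter`) are hypothetical and not part of the MDEP API, and the sketch assumes that the Disabled level means no provider runs at all.

```java
import java.util.ArrayList;
import java.util.List;

/** Opt-in levels named in the text; the enum itself is hypothetical. */
enum OptInLevel { DISABLED, REQUIRED, OPTIONAL }

/** Hypothetical provider descriptor: a name plus whether its data is Required. */
final class ProviderInfo {
    final String name;
    final boolean required;

    ProviderInfo(String name, boolean required) {
        this.name = name;
        this.required = required;
    }
}

final class OptInFilter {
    /** Returns the providers that may run at the given opt-in level. */
    static List<ProviderInfo> select(OptInLevel level, List<ProviderInfo> all) {
        List<ProviderInfo> selected = new ArrayList<>();
        if (level == OptInLevel.DISABLED) {
            return selected; // assumption: Disabled means nothing is collected
        }
        for (ProviderInfo p : all) {
            // Required providers run at both the Required and Optional levels;
            // all other providers run only at the Optional level.
            if (p.required || level == OptInLevel.OPTIONAL) {
                selected.add(p);
            }
        }
        return selected;
    }
}
```

In this model, a service execution would call `OptInFilter.select(...)` once and then process only the returned providers.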

The Telemetry Service uses the concept of “Required” or “Optional” data collection. If the user does not opt in to Optional collection (more data collected), then we only collect from providers that are labeled as Required. This allows us to collect usage data only when the user has allowed it.

Additionally, the Telemetry Service has two separate logger instances for different event priorities: the general logger and the critical logger. Most events are logged to the general logger, which uploads events only when the network is determined to be unmetered. The critical logger is used for high-importance events that need to be uploaded as they come in, regardless of network meter status. Events related to device resets are one example of events that use the critical logger.
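The routing rule for the two loggers can be modeled as two queues with different upload conditions. This is only an illustrative sketch in plain Java: the actual transmission goes through 1DS, and `UploadEvent` and `LoggerRouter` are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event: a name plus whether it is high-importance. */
final class UploadEvent {
    final String name;
    final boolean critical;

    UploadEvent(String name, boolean critical) {
        this.name = name;
        this.critical = critical;
    }
}

/** Models the general and critical loggers as queues with different upload rules. */
final class LoggerRouter {
    private final List<UploadEvent> generalQueue = new ArrayList<>();
    private final List<UploadEvent> criticalQueue = new ArrayList<>();

    /** Routes an event to the critical or general queue by its priority. */
    void log(UploadEvent e) {
        (e.critical ? criticalQueue : generalQueue).add(e);
    }

    /** Returns the events eligible for upload now and removes them from the queues. */
    List<UploadEvent> drainUploadable(boolean networkUnmetered) {
        // Critical events are always eligible, regardless of meter status.
        List<UploadEvent> out = new ArrayList<>(criticalQueue);
        criticalQueue.clear();
        // General events wait until the network is unmetered.
        if (networkUnmetered) {
            out.addAll(generalQueue);
            generalQueue.clear();
        }
        return out;
    }

    int pendingGeneral() {
        return generalQueue.size();
    }
}
```

On a metered network, only the critical events leave the device; general events stay queued until an unmetered connection is available.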

Data providers

Data providers are individual classes that are used to collect data and log/transmit events. They have their own parameters for periodic frequency, allowing some providers to run during every service execution (every 15 minutes) and others to run at an hourly or daily cadence. Because data collection is modularized into individual provider classes, we can easily add new providers, throttle existing providers, and disable providers with minimal changes. We also collect stats on each provider’s execution time and total events collected, and we handle exceptions so that a failure in one provider does not impact the execution of the others.
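A minimal sketch of how per-provider cadence, stats, and exception isolation might fit together is shown below. The `DataProvider` interface, `RunStat`, and `ProviderRunner` are hypothetical names, not the MDEP classes. Because the service keeps no state between executions, the sketch assumes that last-run timestamps come from some persisted store, modeled here as a `Map`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical provider contract: a name, a cadence, and a collect step. */
interface DataProvider {
    String name();
    long intervalMillis();
    /** Collects data and returns the number of events produced. */
    int collect() throws Exception;
}

/** Per-provider result of one service execution. */
final class RunStat {
    final String provider;
    final int events;
    final long elapsedMillis;
    final boolean failed;

    RunStat(String provider, int events, long elapsedMillis, boolean failed) {
        this.provider = provider;
        this.events = events;
        this.elapsedMillis = elapsedMillis;
        this.failed = failed;
    }
}

final class ProviderRunner {
    /**
     * Runs every provider whose interval has elapsed. lastRunMillis stands in
     * for persisted state (an assumption of this sketch).
     */
    static List<RunStat> runDue(List<DataProvider> providers,
                                Map<String, Long> lastRunMillis,
                                long nowMillis) {
        List<RunStat> stats = new ArrayList<>();
        for (DataProvider p : providers) {
            Long last = lastRunMillis.get(p.name());
            if (last != null && nowMillis - last < p.intervalMillis()) {
                continue; // not due yet at this provider's cadence
            }
            long start = System.nanoTime();
            int events = 0;
            boolean failed = false;
            try {
                events = p.collect();
            } catch (Exception e) {
                failed = true; // isolate the failure; other providers still run
            }
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            lastRunMillis.put(p.name(), nowMillis);
            stats.add(new RunStat(p.name(), events, elapsed, failed));
        }
        return stats;
    }
}
```

With this shape, throttling a provider amounts to raising its `intervalMillis()`, and disabling one amounts to leaving it out of the list.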

The Telemetry Service leverages existing frameworks, such as AOSP's StatsD and Dropbox Manager as well as the MDEP Diagnostic Service, to collect telemetry data. More information about the MDEP Diagnostic Service can be found in the OS services section.

HAL service

The HAL service collects events from data sources that are accessible at this layer, such as Linux kernel data and queries to hardware or other subsystems. Like any other HAL service, it starts on boot and persists in the background.

Native provider

Event data collection in the native service follows the same design pattern as in the Java service. Here are examples of native providers:

  • Crashdump collection - after an SoC reset, a partition can contain memory data that can help debug the issue. The Crashdump provider collects this data for later processing and upload.
  • Reset data - collects data from various sources to enrich the reboot reason provided to the OS.
  • Peripheral data - collects information about connected peripherals.

Events are collected periodically or on demand depending on the provider. Events are stored in-memory at the native layer until the Android Telemetry Service periodically queries and collects the events for processing/storing/uploading, which only takes place at the Android service layer.
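The store-then-drain hand-off between the native layer and the Android service can be sketched as a small buffer. The sketch is written in Java for consistency with the other examples on this page, although the real native layer is presumably implemented in native code. `NativeEventBuffer` is a hypothetical name, and the bounded capacity with drop-oldest behavior is an assumption, not something the framework documents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Models a native provider's in-memory event store, emptied on each query. */
final class NativeEventBuffer {
    private final Deque<String> events = new ArrayDeque<>();
    private final int capacity;

    NativeEventBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Called by the provider, periodically or on demand. */
    synchronized void record(String event) {
        if (events.size() == capacity) {
            events.removeFirst(); // assumption: drop the oldest event when full
        }
        events.addLast(event);
    }

    /** Called on each Telemetry Service query; returns and clears all events. */
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(events);
        events.clear();
        return out;
    }
}
```

Draining on query keeps processing, storage, and upload on the Android service side, while the native side only accumulates events between queries.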