How Personalizer works


Starting on the 20th of September, 2023 you won’t be able to create new Personalizer resources. The Personalizer service is being retired on the 1st of October, 2026.

The Personalizer resource, your learning loop, uses machine learning to build the model that predicts the top action for your content. The model is trained exclusively on the data that you send to it with the Rank and Reward calls. Each learning loop is completely independent of every other loop.

Rank and Reward APIs impact the model

You send actions with features and context features to the Rank API. For each call, the Rank API decides to either:

  • Exploit: Use the current model to decide the best action based on past data.
  • Explore: Select a different action instead of the top action. You configure the percentage of Rank calls that explore for your Personalizer resource in the Azure portal.
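The exploit-or-explore decision can be illustrated with a simple epsilon-greedy rule. This is a minimal sketch, not Personalizer's actual implementation: the function name `choose_action`, the score dictionary, and the action IDs are all hypothetical, and the real service applies its own Learning Policy.

```python
import random


def choose_action(scores, exploration_pct, rng):
    """Return (action_id, mode) using a simple epsilon-greedy rule.

    scores: dict mapping action ID -> model score (hypothetical values).
    exploration_pct: fraction of calls (0.0 to 1.0) that should explore.
    rng: a random.Random instance, passed in so runs are repeatable.
    """
    best = max(scores, key=scores.get)
    others = [action for action in scores if action != best]
    # Explore: pick any action other than the current top action.
    if others and rng.random() < exploration_pct:
        return rng.choice(others), "explore"
    # Exploit: trust the current model's top action.
    return best, "exploit"


rng = random.Random(0)
scores = {"article-a": 0.7, "article-b": 0.2, "article-c": 0.1}
picks = [choose_action(scores, 0.2, rng) for _ in range(1000)]
explore_share = sum(1 for _, mode in picks if mode == "explore") / len(picks)
print(f"explored on {explore_share:.0%} of calls")
```

With an exploration percentage of 20%, roughly one call in five returns an action other than the model's current favorite, which gives the model reward data for actions it would otherwise never try.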

You determine the reward score and send that score to the Reward API. The Reward API:

  • Collects data to train the model by recording the features and reward scores of each Rank call.
  • Uses that data to update the model based on the configuration specified in the Learning Policy.
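As an illustration of turning business rules into a reward score, the sketch below scores a click plus reading time and builds the body of a Reward request. The scoring rule, the function names, and the 60-second threshold are hypothetical. The request path and the `value` field reflect an assumption about the v1.0 REST API and should be verified against the API reference before use.

```python
import json


def reward_score(clicked, seconds_on_page, full_read_seconds=60.0):
    """Hypothetical business rule: no click scores 0; a click scores 0.5
    plus up to 0.5 more in proportion to the time spent on the page."""
    if not clicked:
        return 0.0
    engagement = min(seconds_on_page / full_read_seconds, 1.0)
    return round(0.5 + 0.5 * engagement, 3)


def build_reward_request(event_id, score):
    """Return (path, JSON body) for a reward call.

    The path and body shape are assumptions based on the v1.0 REST API;
    no network call is made here.
    """
    path = f"/personalizer/v1.0/events/{event_id}/reward"
    return path, json.dumps({"value": score})


path, body = build_reward_request("event-123", reward_score(True, 30))
print(path, body)
```

Keeping the scoring rule in one small function makes it easy to change the business logic without touching the code that talks to the service.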

Your system calling Personalizer

The following image shows the architectural flow of the Rank and Reward calls:

[Image: architectural flow of the Rank and Reward calls]

  1. You send actions with features and context features to the Rank API.

    • Personalizer decides whether to exploit the current model or explore new choices for the model.
    • The ranking result is sent to EventHub.
  2. The top-ranked action is returned to your system as the reward action ID. Your system presents that content and determines a reward score based on your own business rules.

  3. Your system returns the reward score to the learning loop.

    • When Personalizer receives the reward, the reward is sent to EventHub.
    • The rank and reward are correlated.
    • The AI model is updated based on the correlation results.
    • The inference engine is updated with the new model.
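The steps above can be sketched as a toy, in-memory loop. Everything here is a stand-in: the class `ToyLearningLoop` is hypothetical, plain dictionaries replace EventHub, and the "model" is just an average reward per action rather than the model Personalizer actually trains. The sketch only shows how rank and reward events are correlated by event ID before the model is updated.

```python
import uuid
from collections import defaultdict


class ToyLearningLoop:
    """In-memory stand-in for a learning loop; not the real service."""

    def __init__(self):
        self.rank_log = {}      # event ID -> chosen action (stand-in for EventHub)
        self.reward_log = {}    # event ID -> reward score
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def rank(self, action_ids):
        # Exploit only, to keep the sketch short: pick the best average so far.
        chosen = max(action_ids, key=self.average_reward)
        event_id = str(uuid.uuid4())
        self.rank_log[event_id] = chosen
        return event_id, chosen

    def reward(self, event_id, score):
        # Record the reward; it is not applied until the next retrain.
        self.reward_log[event_id] = score

    def retrain(self):
        # Correlate each reward with its rank event, then update the "model".
        for event_id, score in self.reward_log.items():
            action = self.rank_log.pop(event_id, None)
            if action is not None:
                self.totals[action] += score
                self.counts[action] += 1
        self.reward_log.clear()

    def average_reward(self, action_id):
        n = self.counts[action_id]
        return self.totals[action_id] / n if n else 0.0


loop = ToyLearningLoop()
event_id, chosen = loop.rank(["article-a", "article-b"])
loop.reward(event_id, 1.0)
loop.retrain()
print(chosen, loop.average_reward(chosen))
```

Note that a reward only affects later Rank calls after `retrain` runs, which mirrors the separation between collecting events and updating the inference engine.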

Personalizer retrains your model

Personalizer retrains your model at the interval set by the Model update frequency setting on your Personalizer resource in the Azure portal.

Each time it retrains, Personalizer uses all the data currently retained. The retention period is the number of days set in the Data retention setting on your Personalizer resource in the Azure portal.
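The effect of a retention window on the training data can be illustrated as follows. This is a sketch under stated assumptions: the function `events_in_retention` and the event tuples are hypothetical, and the service's internal storage and cutoff handling may differ.

```python
from datetime import datetime, timedelta, timezone


def events_in_retention(events, retention_days, now):
    """Keep only events that fall inside the retention window.

    events: list of (timestamp, payload) tuples with timezone-aware timestamps.
    retention_days: number of days of data to keep.
    now: the reference time used to compute the cutoff.
    """
    cutoff = now - timedelta(days=retention_days)
    return [event for event in events if event[0] >= cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
events = [
    (datetime(2024, 1, 30, tzinfo=timezone.utc), "recent"),
    (datetime(2023, 12, 1, tzinfo=timezone.utc), "stale"),
]
kept = events_in_retention(events, retention_days=30, now=now)
print([payload for _, payload in kept])  # prints ['recent']
```

A shorter retention period means each retraining sees less history, so the model reflects recent behavior more strongly.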

Research behind Personalizer

Personalizer is based on science and research in the area of reinforcement learning, including papers, research activities, and ongoing areas of exploration in Microsoft Research.

Next steps

Learn about top scenarios for Personalizer