What is Personalizer?


Starting on the 20th of September, 2023, you won’t be able to create new Personalizer resources. The Personalizer service is being retired on the 1st of October, 2026.


As of July 2023, Azure AI services encompass all of what were previously known as Cognitive Services and Azure Applied AI Services. There are no changes to pricing. The names Cognitive Services and Azure Applied AI continue to be used in Azure billing, cost analysis, price list, and price APIs. There are no breaking changes to application programming interfaces (APIs) or SDKs.

Azure AI Personalizer is an AI service that helps your applications make smarter decisions at scale using reinforcement learning. Personalizer processes information about the state of your application, scenario, and/or users (contexts) and a set of possible decisions and related attributes (actions) to determine the best decision to make. Your application sends feedback (rewards) to Personalizer so that it can learn to improve its decision-making ability in near-real time.

Personalizer can determine the best actions to take in a variety of scenarios:

  • E-commerce: What product should be shown to customers to maximize the likelihood of a purchase?
  • Content recommendation: What article should be shown to increase the click-through rate?
  • Content design: Where should an advertisement be placed to optimize user engagement on a website?
  • Communication: When and how should a notification be sent to maximize the chance of a response?

To get started with Personalizer, follow the quickstart guide, or try Personalizer in your browser with this interactive demo.

This documentation contains the following types of articles:

  • Quickstarts provide step-by-step instructions to guide you through setup and sample code to start making API requests to the service.
  • How-to guides contain instructions for using Personalizer features and advanced capabilities.
  • Code samples demonstrate how to use Personalizer and help you to easily interface your application with the service.
  • Tutorials are longer walk-throughs implementing Personalizer as a part of a broader business solution.
  • Concepts provide further detail on Personalizer features, capabilities, and fundamentals.

How does Personalizer work?

Personalizer uses reinforcement learning to select the best action for a given context across all users in order to maximize an average reward.

  • Context: Information that describes the state of your application, scenario, or user that may be relevant to making a decision.
    • Example: The location, device type, age, and favorite topics of users visiting a website.
  • Actions: A discrete set of items that can be chosen, along with attributes describing each item.
    • Example: A set of news articles and the topics that are discussed in each article.
  • Reward: A numerical score between 0 and 1 that indicates whether the decision was bad (0) or good (1).
    • Example: A "1" indicates that a user clicked on the suggested article, whereas a "0" indicates the user did not.

Rank and Reward APIs

Personalizer lets you take advantage of the power and flexibility of reinforcement learning using just two primary APIs.

The Rank API is called by your application each time there's a decision to be made. The application sends a JSON payload containing a set of actions, features that describe each action, and features that describe the current context. Each Rank API call is known as an event and is identified by a unique event ID. Personalizer then returns the ID of the best action, the one that maximizes the total average reward as determined by the underlying model.
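As a rough illustration of what a Rank call carries, the sketch below assembles a request body as a plain Python dictionary and does not send it anywhere. The field names `eventId`, `contextFeatures`, `actions`, and `features` reflect our reading of the Personalizer v1.0 REST Rank schema and should be checked against the API reference; the action IDs and feature names are hypothetical, modeled on the news-article example above.

```python
import json
import uuid


def build_rank_request(context_features, actions, event_id=None):
    """Build a Rank request body as a plain dict (nothing is sent).

    Field names are assumed from the v1.0 REST Rank schema; verify them
    against the API reference before relying on this sketch.
    """
    if not actions:
        raise ValueError("at least one action is required")
    ids = [a["id"] for a in actions]
    if len(ids) != len(set(ids)):
        raise ValueError("action IDs must be unique")
    return {
        # Each Rank call is an event; a unique ID lets a later Reward call refer to it.
        "eventId": event_id or str(uuid.uuid4()),
        # Features describing the current context (hypothetical names).
        "contextFeatures": context_features,
        # Candidate actions, each with an ID and features describing it.
        "actions": actions,
    }


request = build_rank_request(
    context_features=[{"device": "mobile", "favoriteTopics": ["politics", "business"]}],
    actions=[
        {"id": "president-article", "features": [{"scope": "national", "topic": "politics"}]},
        {"id": "premier-league-article", "features": [{"scope": "global", "topic": "sports"}]},
        {"id": "hurricane-article", "features": [{"scope": "regional", "topic": "weather"}]},
    ],
)
print(json.dumps(request, indent=2))
```

Generating the event ID on the client side makes it easy to store alongside whatever your application later observes, so the matching Reward call can be sent even when feedback arrives much later.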

The Reward API is called by your application whenever there's feedback that can help Personalizer learn whether the action ID returned in the Rank call provided value, for example, when a user clicks the suggested news article or completes the purchase of a suggested product. The Reward API can be called in real time (just after the Rank call) or later, to better fit the needs of the scenario. The reward score is determined by your business metrics and objectives and can be generated by an algorithm or rules in your application. The score is a real-valued number between 0 and 1.
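The sketch below shows the shape of a Reward call under stated assumptions: the relative path `/personalizer/v1.0/events/{eventId}/reward` and the `{"value": ...}` body follow our reading of the v1.0 REST reference and should be confirmed there, and the event ID `evt-1` is hypothetical. The function only builds the request and enforces the 0 to 1 range described above; it does not perform any HTTP call.

```python
import json


def build_reward_request(event_id, score):
    """Return (relative path, JSON body) for a Reward call on one event.

    Path and body shape are assumptions based on the v1.0 REST reference.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("reward score must be between 0 and 1")
    # The event ID ties this reward back to the earlier Rank call.
    path = f"/personalizer/v1.0/events/{event_id}/reward"
    body = json.dumps({"value": float(score)})
    return path, body


# Immediate reward: the user clicked the suggested article.
path, body = build_reward_request("evt-1", 1)
print(path)  # /personalizer/v1.0/events/evt-1/reward
print(body)  # {"value": 1.0}
```

For delayed feedback, the same function is simply called later with the stored event ID; if event IDs can contain characters that are unsafe in URLs, escape them before building the path.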

Learning modes

  • Apprentice mode: Similar to how an apprentice learns a craft by observing an expert, Apprentice mode enables Personalizer to learn by observing your application's current decision logic. This helps to mitigate the so-called "cold start" problem with a new untrained model, and allows you to validate the action and context features that are sent to Personalizer. In Apprentice mode, each call to the Rank API returns the baseline (default) action, that is, the action the application would have taken without Personalizer. Your application sends this action to Personalizer as the first item in the set of possible actions in the Rank API call.

  • Online mode: Personalizer returns the best action, given the context, as determined by the underlying RL model, and explores other possible actions that may improve performance. Personalizer learns from feedback provided in calls to the Reward API.
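To make the difference between the two learning modes concrete, here is a toy chooser. It is not Personalizer's internal algorithm: Apprentice mode is modeled as "always return the first (baseline) action", and Online mode uses a generic epsilon-greedy rule as a stand-in for exploration. The `scores` dictionary of estimated rewards and the `epsilon` value are hypothetical.

```python
import random


def choose_action(action_ids, scores, mode, epsilon=0.2, rng=random):
    """Toy illustration of the two learning modes (not Personalizer internals).

    action_ids: candidate IDs; the first one is the baseline action your
        application would have chosen on its own.
    scores: hypothetical estimates of expected reward per action ID.
    """
    if not action_ids:
        raise ValueError("no actions to choose from")
    if mode == "apprentice":
        # Apprentice mode: return the baseline (first) action unchanged.
        return action_ids[0]
    if mode == "online":
        if rng.random() < epsilon:
            # Explore: occasionally try a uniformly random action.
            return rng.choice(action_ids)
        # Exploit: return the action with the highest estimated reward.
        return max(action_ids, key=lambda a: scores.get(a, 0.0))
    raise ValueError(f"unknown mode: {mode}")


actions = ["baseline-article", "sports-article", "weather-article"]
estimates = {"baseline-article": 0.2, "sports-article": 0.6, "weather-article": 0.4}
print(choose_action(actions, estimates, "apprentice"))  # baseline-article
print(choose_action(actions, estimates, "online", epsilon=0.0))  # sports-article
```

Passing a seeded `random.Random` instance as `rng` makes the exploration branch reproducible in tests.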

Note that Personalizer uses collective information across all users to learn the best actions based on the current context. The service does not:

  • Persist and manage user profile information. Unique user IDs should not be sent to Personalizer.
  • Log individual users' preferences or historical data.
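Because unique user IDs should not be sent to Personalizer, one simple safeguard is to scrub identifying keys from the context features before each Rank call. This is a minimal sketch; the `IDENTIFYING_KEYS` set is a hypothetical example and should be adapted to the fields your application actually collects.

```python
# Hypothetical keys that identify an individual; adapt to your own data.
IDENTIFYING_KEYS = {"userId", "email", "name", "phone"}


def scrub_context(context_features, identifying_keys=IDENTIFYING_KEYS):
    """Return a copy of the context features without identifying keys."""
    cleaned = []
    for feature in context_features:
        kept = {k: v for k, v in feature.items() if k not in identifying_keys}
        if kept:  # drop feature objects that contained only identifiers
            cleaned.append(kept)
    return cleaned


context = [{"userId": "u-123", "device": "smart TV"}, {"email": "a@example.com", "month": "October"}]
print(scrub_context(context))  # [{'device': 'smart TV'}, {'month': 'October'}]
```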

Example scenarios

Here are a few examples where Personalizer can be used to select the best content to render for a user.

  • News articles
    • Actions {features}: a. The president... {national, politics, [text]}; b. Premier League ... {global, sports, [text, image, video]}; c. Hurricane in the ... {regional, weather, [text, image]}
    • Context features: Recent_Topics=('politics', 'business'),
    • Returned Reward Action ID (display this content): a. The president...
  • Movies
    • Actions {features}: 1. Star Wars {1977, [action, adventure, fantasy], George Lucas}; 2. Hoop Dreams {1994, [documentary, sports], Steve James}; 3. Casablanca {1942, [romance, drama, war], Michael Curtiz}
    • Context features: Device='smart TV',
    • Returned Reward Action ID (display this content): 3. Casablanca
  • E-commerce products
    • Actions {features}: i. Product A {3 kg, $$$$, deliver in 1 day}; ii. Product B {20 kg, $$, deliver in 7 days}; iii. Product C {3 kg, $$$, deliver in 2 days}
    • Returned Reward Action ID (display this content): ii. Product B
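Action features such as {3 kg, $$$$, deliver in 1 day} have to be expressed as structured feature objects before they can be sent in a Rank call. The sketch below shows one possible encoding of the e-commerce example; the feature names `weightKg`, `priceTier`, and `deliveryDays` are hypothetical, and whether a numeric or a categorical encoding works better for your scenario is worth testing rather than assuming.

```python
def product_features(weight_kg, price_symbols, delivery_days):
    """Turn a description like {3 kg, $$$$, deliver in 1 day} into a
    feature object. Feature names are hypothetical."""
    return {
        "weightKg": weight_kg,
        # Price tier as the number of "$" symbols (1 = cheapest).
        "priceTier": len(price_symbols),
        "deliveryDays": delivery_days,
    }


actions = [
    {"id": "Product A", "features": [product_features(3, "$$$$", 1)]},
    {"id": "Product B", "features": [product_features(20, "$$", 7)]},
    {"id": "Product C", "features": [product_features(3, "$$$", 2)]},
]
for action in actions:
    print(action["id"], action["features"][0])
```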

Scenario requirements

Use Personalizer when your scenario has:

  • A limited set of actions or items to select from in each personalization event. We recommend no more than ~50 actions in each Rank API call. If you have a larger set of possible actions, we suggest using a recommendation engine or another mechanism to reduce the list of actions prior to calling the Rank API.
  • Information describing the actions (action features).
  • Information describing the current context (contextual features).
  • Sufficient data volume to enable Personalizer to learn. In general, we recommend a minimum of ~1,000 events per day to enable Personalizer to learn effectively. If Personalizer doesn't receive sufficient data, the service takes longer to determine the best actions.
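The action-count and data-volume guidelines above can be checked in code before calling the Rank API. In this sketch, `shortlist` keeps the top candidates according to a hypothetical upstream score (for example, one produced by an existing recommendation engine), and `has_enough_traffic` compares average daily events with the ~1,000-per-day guideline. Both thresholds are the approximate recommendations from the list, not hard service limits.

```python
MAX_ACTIONS_PER_RANK = 50  # approximate guideline: no more than ~50 actions per call
MIN_EVENTS_PER_DAY = 1_000  # approximate guideline: at least ~1,000 events per day


def shortlist(candidates, upstream_score, limit=MAX_ACTIONS_PER_RANK):
    """Keep the top `limit` candidates by a hypothetical upstream score."""
    return sorted(candidates, key=upstream_score, reverse=True)[:limit]


def has_enough_traffic(daily_event_counts):
    """Rough check of average daily Rank events against the guideline."""
    if not daily_event_counts:
        return False
    return sum(daily_event_counts) / len(daily_event_counts) >= MIN_EVENTS_PER_DAY


candidates = [f"item-{i}" for i in range(200)]
scores = {c: i for i, c in enumerate(candidates)}
top = shortlist(candidates, scores.get)
print(len(top), top[0])  # 50 item-199
print(has_enough_traffic([1200] * 7))  # True
```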

Responsible use of AI

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. AI models such as the ones available in the Personalizer service have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, incorporating Microsoft’s principles for responsible AI use, building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers. See the Responsible AI docs for Personalizer.

Integrate Personalizer into an application

  1. Design and plan the actions and context. Determine how to interpret feedback as a reward score.

  2. Each Personalizer resource you create is defined as one Learning Loop. The loop receives both the Rank and Reward calls for that content or user experience and trains an underlying RL model. There are three resource types:

    • Apprentice mode - E0: Train Personalizer to mimic your current decision-making logic without impacting your existing application, before using Online mode to learn better policies in a production environment.
    • Online mode - Standard, S0: Personalizer uses RL to determine the best actions in production.
    • Online mode - Free, F0: Try Personalizer in a limited non-production environment.
  3. Add Personalizer to your application, website, or system:

    1. Add a Rank call to Personalizer in your application, website, or system to determine the best action.

    2. Use the best action, returned as the reward action ID, in your scenario.

    3. Apply business logic to user behavior or feedback data to determine the reward score. For example:

      Behavior: calculated reward score
      • User selected a news article suggested by Personalizer: 1
      • User selected a news article not suggested by Personalizer: 0
      • User hesitated to select a news article, scrolled around indecisively, and ultimately selected the news article suggested by Personalizer: 0.5
    4. Add a Reward call that sends a reward score between 0 and 1, either:

      • Immediately after feedback is received.
      • Later, in scenarios where delayed feedback is expected.
    5. After Personalizer has received enough data to make online decisions, evaluate your loop with an offline evaluation. An offline evaluation lets you test and assess the effectiveness of the Personalizer service without code changes or user impact.
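The example business rules for news articles in step 3 can be written as a small scoring function. This is a sketch: how your application detects a "hesitated" user is a hypothetical signal you would need to define, and treating "no article selected" (`selected_id=None`) as a reward of 0 is an assumption not covered by the example rules.

```python
def news_reward(selected_id, suggested_id, hesitated=False):
    """Map user behavior to a reward score using the example rules:
    suggested article selected -> 1 (0.5 if the user hesitated first);
    a different article selected -> 0. No selection -> 0 (assumption)."""
    if selected_id is None or selected_id != suggested_id:
        return 0.0
    return 0.5 if hesitated else 1.0


print(news_reward("article-a", "article-a"))  # 1.0
print(news_reward("article-b", "article-a"))  # 0.0
print(news_reward("article-a", "article-a", hesitated=True))  # 0.5
```

Keeping the rules in one function makes them easy to revise and to reuse in an offline evaluation when comparing reward definitions.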

Next steps