Important
Starting on the 20th of September, 2023, you won't be able to create new Personalizer resources. The Personalizer service is being retired on the 1st of October, 2026.
Azure AI Personalizer is an AI service that helps your applications make smarter decisions at scale using reinforcement learning. Personalizer processes information about the state of your application, scenario, and/or users (contexts), together with a set of possible decisions and their related attributes (actions), to determine the best decision to make. Feedback from your application (rewards) is sent to Personalizer so it can learn how to improve its decision-making ability in near-real time.
Personalizer can determine the best action to take in a variety of scenarios, such as choosing which news article to feature, which movie to recommend, or which product to display (see the examples later in this article).
To get started with Personalizer, follow the quickstart guide, or try Personalizer in your browser with the interactive demo.
Personalizer uses reinforcement learning to select the best action for a given context across all users in order to maximize an average reward.
Personalizer lets you take advantage of the power and flexibility of reinforcement learning using just two primary APIs.
The Rank API is called by your application each time there's a decision to be made. The application sends a JSON payload containing a set of actions, features that describe each action, and features that describe the current context. Each Rank API call is known as an event and is identified by a unique event ID. Personalizer then returns the ID of the best action, the one that maximizes the total average reward as determined by the underlying model.
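As a concrete illustration (not the official quickstart code), here's a minimal sketch of a Rank call against the Personalizer v1.0 REST endpoint using Python. The environment variable names, action IDs, and feature names are assumptions made for this example.

```python
import os
import uuid

import requests

endpoint = os.environ["PERSONALIZER_ENDPOINT"]  # e.g. https://<your-resource>.cognitiveservices.azure.com
key = os.environ["PERSONALIZER_KEY"]

event_id = str(uuid.uuid4())  # unique ID for this Rank event; reuse it later in the Reward call

rank_request = {
    "eventId": event_id,
    # The possible decisions, each described by features.
    "actions": [
        {"id": "article-politics", "features": [{"topic": "politics", "scope": "national"}]},
        {"id": "article-sports", "features": [{"topic": "sports", "scope": "global"}]},
    ],
    # Features describing the current context (user, device, time of day, ...).
    "contextFeatures": [
        {"device": "mobile"},
        {"timeOfDay": "morning"},
    ],
}

response = requests.post(
    f"{endpoint}/personalizer/v1.0/rank",
    headers={"Ocp-Apim-Subscription-Key": key},
    json=rank_request,
)
response.raise_for_status()
best_action_id = response.json()["rewardActionId"]  # ID of the action Personalizer chose
```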
The Reward API is called by your application whenever there's feedback that can help Personalizer learn whether the action ID returned by the Rank call provided value: for example, the user clicked the suggested news article or completed the purchase of a suggested product. A call to the Reward API can be made in real time (just after the Rank call) or delayed to better fit the needs of the scenario. The reward score is determined by your business metrics and objectives and can be generated by an algorithm or rules in your application. The score is a real-valued number between 0 and 1.
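Continuing the sketch above (same endpoint, key, and event ID), a matching Reward call for that event might look like this; the score of 1.0 stands in for whatever your business logic computes.

```python
# Continues from the Rank sketch above: report how well the chosen action performed.
reward = {"value": 1.0}  # reward score in [0, 1], defined by your business metrics

resp = requests.post(
    f"{endpoint}/personalizer/v1.0/events/{event_id}/reward",
    headers={"Ocp-Apim-Subscription-Key": key},
    json=reward,
)
resp.raise_for_status()  # the Reward API returns no body on success
```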
Apprentice mode: Similar to how an apprentice learns a craft by observing an expert, Apprentice mode enables Personalizer to learn by observing your application's current decision logic. This helps to mitigate the so-called "cold start" problem of a new, untrained model, and lets you validate the action and context features that are sent to Personalizer. In Apprentice mode, each call to the Rank API returns the baseline action (or default action): the action that the application would have taken without Personalizer. Your application sends this baseline action to Personalizer in the Rank API as the first item in the set of possible actions.
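A small sketch of that ordering, reusing the hypothetical `rank_request` from the earlier example: the baseline action your current logic would pick simply goes first in the actions list.

```python
# Apprentice mode: put the baseline/default action first in the set of possible actions.
# This is the action your existing decision logic would have taken without Personalizer.
baseline_action = {"id": "article-politics", "features": [{"topic": "politics", "scope": "national"}]}
other_actions = [
    {"id": "article-sports", "features": [{"topic": "sports", "scope": "global"}]},
]
rank_request["actions"] = [baseline_action] + other_actions
```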
Online mode: Personalizer returns the best action for the given context, as determined by the underlying RL model, and explores other possible actions that may improve performance. Personalizer learns from feedback provided in calls to the Reward API.
Note that Personalizer uses collective information across all users to learn the best actions based on the current context. The service does not:

- persist or manage individual user profile information
- log individual users' preferences or history
- require cleaned and labeled content
Here are a few examples where Personalizer can be used to select the best content to render for a user.
| Content type | Actions {features} | Context features | Returned Reward Action ID (display this content) |
|---|---|---|---|
| News articles | a. The president... {national, politics, [text]}<br>b. Premier League... {global, sports, [text, image, video]}<br>c. Hurricane in the... {regional, weather, [text, image]} | Country='USA',<br>Recent_Topics=('politics', 'business'),<br>Month='October' | a. The president... |
| Movies | 1. Star Wars {1977, [action, adventure, fantasy], George Lucas}<br>2. Hoop Dreams {1994, [documentary, sports], Steve James}<br>3. Casablanca {1942, [romance, drama, war], Michael Curtiz} | Device='smart TV',<br>Screen_Size='large',<br>Favorite_Genre='classics' | 3. Casablanca |
| E-commerce products | i. Product A {3 kg, $$$$, deliver in 1 day}<br>ii. Product B {20 kg, $$, deliver in 7 days}<br>iii. Product C {3 kg, $$$, deliver in 2 days} | Device='iPhone',<br>Spending_Tier='low',<br>Month='June' | ii. Product B |
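For example, the news articles row above might be expressed as Rank features along these lines. The JSON shape follows the Rank request described earlier; the specific feature names and values are illustrative assumptions, not a required schema.

```python
# News example from the table, expressed as Personalizer actions and context features.
actions = [
    {"id": "a", "features": [{"scope": "national", "topic": "politics", "media": "text"}]},
    {"id": "b", "features": [{"scope": "global", "topic": "sports", "media": "text,image,video"}]},
    {"id": "c", "features": [{"scope": "regional", "topic": "weather", "media": "text,image"}]},
]
context_features = [
    {"country": "USA"},
    {"recentTopic1": "politics", "recentTopic2": "business"},
    {"month": "October"},
]
# Given this context, the model might return rewardActionId == "a" ("The president...").
```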
Use Personalizer when your scenario has:
At Microsoft, we're committed to the advancement of AI driven by principles that put people first. AI models such as the ones available in the Personalizer service have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, incorporating Microsoft’s principles for responsible AI use, building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers. See the Responsible AI docs for Personalizer.
Design and plan the actions and context. Determine how to interpret feedback as a reward score.
Each Personalizer resource you create is defined as one Learning Loop. The loop receives both the Rank and Reward calls for that content or user experience and trains an underlying RL model. There are several resource types:
| Resource type | Purpose |
|---|---|
| Apprentice mode - E0 | Train Personalizer to mimic your current decision-making logic without impacting your existing application, before using Online mode to learn better policies in a production environment. |
| Online mode - Standard, S0 | Personalizer uses RL to determine the best actions in production. |
| Online mode - Free, F0 | Try Personalizer in a limited non-production environment. |
Add Personalizer to your application, website, or system:
Add a Rank call to Personalizer in your application, website, or system to determine the best action.
Use the best action, identified by the returned reward action ID, in your scenario.
Apply business logic to user behavior or feedback data to determine the reward score. For example:
| Behavior | Calculated reward score |
|---|---|
| User selected a news article suggested by Personalizer | 1 |
| User selected a news article not suggested by Personalizer | 0 |
| User hesitated to select a news article, scrolled around indecisively, and ultimately selected the news article suggested by Personalizer | 0.5 |
Add a Reward call that sends a reward score between 0 and 1.
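Here's a sketch of how the business logic in the table above could be turned into a reward score and sent back. The behavior flags and helper names are assumptions for illustration, and the event ID must be the one used in (or returned by) the corresponding Rank call.

```python
import os

import requests

endpoint = os.environ["PERSONALIZER_ENDPOINT"]
key = os.environ["PERSONALIZER_KEY"]


def reward_for_behavior(selected_suggestion: bool, hesitated: bool) -> float:
    """Map observed user behavior to a reward score in [0, 1], per the table above."""
    if not selected_suggestion:
        return 0.0  # user selected an article Personalizer did not suggest
    if hesitated:
        return 0.5  # user eventually selected the suggestion, but indecisively
    return 1.0      # user selected the suggested article directly


def send_reward(event_id: str, score: float) -> None:
    """Send the reward score for the Rank event identified by event_id."""
    resp = requests.post(
        f"{endpoint}/personalizer/v1.0/events/{event_id}/reward",
        headers={"Ocp-Apim-Subscription-Key": key},
        json={"value": score},
    )
    resp.raise_for_status()


# event_id comes from the corresponding Rank call, for example:
# send_reward(event_id, reward_for_behavior(selected_suggestion=True, hesitated=False))
```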
Evaluate your loop with an offline evaluation after a period of time when Personalizer has received significant data to make online decisions. An offline evaluation allows you to test and assess the effectiveness of the Personalizer Service without code changes or user impact.