Learning policy and settings


As of September 20, 2023, you can't create new Personalizer resources. The Personalizer service will be retired on October 1, 2026.

Learning settings determine the hyperparameters used for model training. Two models trained on the same data with different learning settings will end up different.

Learning policy and settings are set on your Personalizer resource in the Azure portal.

Import and export learning policies

You can import and export learning-policy files from the Azure portal. Use this method to save existing policies, test them, replace them, and archive them in your source code control as artifacts for future reference and audit.

Learn how to import and export a learning policy in the Azure portal for your Personalizer resource.

Understand learning policy settings

The settings in the learning policy aren't intended to be changed. Change settings only if you understand how they affect Personalizer. Without this knowledge, you could cause problems, including invalidating Personalizer models.

Personalizer uses Vowpal Wabbit to train and score events. Refer to the Vowpal Wabbit documentation to learn how to edit the learning settings. Once you have the correct command-line arguments, save them to a file in the following format, replacing the value of the arguments property with the desired command line. Then upload the file to import the learning settings in the Model and Learning Settings pane in the Azure portal for your Personalizer resource.

The following JSON is an example of a learning policy.

{
  "name": "new learning settings",
  "arguments": " --cb_explore_adf --epsilon 0.2 --power_t 0 -l 0.001 --cb_type mtr -q ::"
}

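A policy file in this format can also be generated from a script, which avoids hand-editing mistakes such as unbalanced quotes or braces. The following is a minimal Python sketch, not an official tool: the helper name write_learning_policy and the output file name learning-policy.json are illustrative assumptions, and the argument string is copied from the example above.

```python
import json
from pathlib import Path


def write_learning_policy(name: str, arguments: str, path: str) -> dict:
    """Write a learning policy file with the "name" and "arguments" properties."""
    policy = {"name": name, "arguments": arguments}
    # json.dumps handles quoting and escaping, so the file is always valid JSON.
    Path(path).write_text(json.dumps(policy, indent=2) + "\n", encoding="utf-8")
    return policy


if __name__ == "__main__":
    policy = write_learning_policy(
        name="new learning settings",
        arguments=" --cb_explore_adf --epsilon 0.2 --power_t 0 -l 0.001 --cb_type mtr -q ::",
        path="learning-policy.json",  # hypothetical file name
    )
    print(json.dumps(policy, indent=2))
```

The script only produces the file. Importing it still happens in the Azure portal as described above.
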
Compare learning policies

You can compare how different learning policies perform against past data in Personalizer logs by doing offline evaluations.

Upload your own learning policies to compare them with the current learning policy.
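
Offline evaluations compare how policies perform. Before uploading a candidate policy, it can also help to check which settings actually differ from the current one. The sketch below is an illustrative helper, not part of Personalizer: it assumes each policy is a dictionary with an arguments property as in the example in this article, the function names parse_arguments and diff_policies are hypothetical, and the candidate policy with --epsilon 0.1 is made up for the demonstration. The parser is deliberately simple: any token that starts with "-" is treated as a flag, so negative numeric values aren't handled.

```python
import shlex


def parse_arguments(arguments: str) -> dict:
    """Split an argument string into {flag: [values]}.

    Simplification: a token counts as a flag if it starts with "-".
    """
    parsed: dict = {}
    current = None
    for token in shlex.split(arguments):
        if token.startswith("-"):
            current = token
            parsed.setdefault(current, [])
        elif current is not None:
            parsed[current].append(token)
    return parsed


def diff_policies(old: dict, new: dict) -> dict:
    """Return {flag: (old_values, new_values)} for flags that differ."""
    old_args = parse_arguments(old["arguments"])
    new_args = parse_arguments(new["arguments"])
    return {
        flag: (old_args.get(flag), new_args.get(flag))
        for flag in sorted(set(old_args) | set(new_args))
        if old_args.get(flag) != new_args.get(flag)
    }


if __name__ == "__main__":
    current = {
        "name": "new learning settings",
        "arguments": " --cb_explore_adf --epsilon 0.2 --power_t 0 -l 0.001 --cb_type mtr -q ::",
    }
    candidate = {  # hypothetical candidate policy
        "name": "candidate learning settings",
        "arguments": " --cb_explore_adf --epsilon 0.1 --power_t 0 -l 0.001 --cb_type mtr -q ::",
    }
    print(diff_policies(current, candidate))
    # prints {'--epsilon': (['0.2'], ['0.1'])}
```

A flag that appears in only one policy shows None on the other side. To compare exported files instead of inline dictionaries, load each file with json.load first.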

Optimize learning policies

Personalizer can create an optimized learning policy in an offline evaluation. An optimized learning policy that has better rewards in an offline evaluation will yield better results when it's used online in Personalizer.

After you optimize a learning policy, you can apply it directly to Personalizer so it immediately replaces the current policy. Or you can save the optimized policy for further evaluation and later decide whether to discard, save, or apply it.

Next steps