Keyword: Curriculum

The curriculum statement is used within the concept statement to define how the training engine should train the AI. Define lessons to create a staged teaching plan and adjust optional training parameters to control how training episodes run.

Training for the curriculum stops when any of the following conditions is met:

  • The user manually stops training.
  • Training appears to have converged (the AI is no longer improving).
  • A limit set by a curriculum training parameter (for example, TotalIterationLimit) is reached.

During training, the platform periodically runs assessments consisting of groups of test episodes. Assessments produce the following information for all goals in the curriculum:

  1. Success rate: a summary metric indicating the fraction of test episodes within an assessment in which the AI achieves a given objective. Success is a binary (pass/fail) measure at the episode level; the overall success rate is calculated across all episodes in the assessment.
  2. Goal satisfaction rate: a percentage metric, reported both per episode and as a summary, indicating how close the AI came to satisfying the associated objective during the episode, regardless of success. An AI may receive a high satisfaction rate for coming very close, despite ultimately failing the objective. For example, an AI attempting to raise and hold temperature at 30°C may only reach 29°C in a given episode. Such an episode would have a high goal satisfaction rate, even though the AI failed. A 100% satisfaction rate is only possible if the AI successfully completes the objective.

See Keyword: Goal for objective-specific training results.
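
As a rough illustration of the temperature example above, a goal for that objective might look like the following sketch. The state type SimState, its temperature field, the objective name HoldTemperature, and the 29.5 to 30.5 tolerance band are all assumptions made for illustration; Keyword: Goal remains the authoritative reference for the syntax.

```inkling
# At the top of the Inkling file
using Goal

# Inside the curriculum block of a concept
goal (State: SimState) {
  # Hypothetical objective: bring the temperature into a band around 30°C
  # and keep it there
  drive HoldTemperature: State.temperature in Goal.Range(29.5, 30.5)
}
```

Under these assumptions, an episode that peaks at 29°C would fail the objective but would still report a high goal satisfaction rate.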

Usage

Important

There can be only one curriculum per concept, and every learned concept must have a curriculum.

Every curriculum must provide a source clause that specifies the data source for teaching the concept. Simulators are the only supported data source at this time. See Keyword: simulator for more information.

concept MyConcept(input): OutputType {
  curriculum {
    source MySimulator

    # Lessons specified here

  }
}
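
To make the lessons placeholder above more concrete, here is a minimal sketch of a two-stage teaching plan. The lesson names and the initial_temperature configuration field are hypothetical; the scenario fields you can use depend on the configuration type your simulator declares.

```inkling
concept MyConcept(input): OutputType {
  curriculum {
    source MySimulator

    # Hypothetical first stage: a narrow range of starting conditions
    lesson StartNarrow {
      scenario {
        initial_temperature: number<20 .. 25>
      }
    }

    # Hypothetical second stage: widen the range once the first lesson completes
    lesson Widen {
      scenario {
        initial_temperature: number<0 .. 40>
      }
    }
  }
}
```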

Curriculum training parameters

You can adjust some training parameters with the training clause:

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| EpisodeIterationLimit | Number.UInt32 | 1000 | Total iterations allowed per training episode. |
| TotalIterationLimit | Number.UInt32 | 50,000,000 | Total iterations allowed for the concept. |
| NoProgressIterationLimit | Number.UInt32 | 250,000 | Number of iterations allowed with no improvement before training auto-terminates. |
| LessonRewardThreshold | number | None | Minimum cumulative episode reward that counts as success. |
| LessonSuccessThreshold | number<0 .. 1> | 0.90 (90%) | Minimum success rate to complete the lesson. |
| LessonAssessmentWindow | Number.UInt32 | 30 | Number of test episodes per assessment. Used when evaluating LessonRewardThreshold and LessonSuccessThreshold. |
| Robustness | Structure ({}) | - | Sets parameters of robustness-enhancing training augmentation. See the following rows. |
| Robustness.MaxDroppedInputRate | number<0 .. 1> | 0 | Maximum rate of dropping brain inputs. |
| Robustness.MaxDroppedActionRate | number<0 .. 1> | 0 | Maximum rate of dropping actions. |
| Robustness.MaxInputDelay | UInt32<0 .. 100> | 0 | Maximum delay of brain inputs, in iterations. |
| Robustness.MaxActionDelay | UInt32<0 .. 100> | 0 | Maximum delay of brain actions, in iterations. |

For example:

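concept MyConcept(input: SimState): BrainAction {
  curriculum {
    source MySimulator

    training {
      # End an episode after 250 iterations if no terminal condition is reached
      EpisodeIterationLimit: 250,
      # Stop training after 100,000 total iterations
      TotalIterationLimit: 100000
    }
  }
}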

EpisodeIterationLimit

If an episode runs for EpisodeIterationLimit iterations without the brain reaching a valid terminal condition, the training engine ends that episode and begins a new one.

TotalIterationLimit

The training engine ends training after TotalIterationLimit iterations, even if training performance is still improving. See also NoProgressIterationLimit.

Note

The actual number of training iterations may go slightly beyond the iteration limit to allow the last training batch to complete.

LessonRewardThreshold

Supported only for curriculums that are based on reward and terminal functions.

When a sufficient fraction of test episodes in an assessment (as indicated by LessonSuccessThreshold) have a cumulative reward that meets or exceeds the LessonRewardThreshold value, the training engine considers the lesson complete and moves to the next lesson in the curriculum.

If the lesson definition does not include a reward threshold, the training engine uses a general convergence test to determine when the lesson is complete.
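
The following sketch shows how a reward threshold might be combined with reward and terminal functions. The function names, the SimState and BrainAction types, the temperature field, and the numeric values are all hypothetical, and the reward and terminal clauses are written as recalled from the Keyword: Reward and Keyword: Terminal pages, so check those pages before relying on the exact form.

```inkling
# Hypothetical reward function: 1 per iteration once the temperature reaches 29
function GetReward(State: SimState, Action: BrainAction) {
  if State.temperature >= 29 {
    return 1
  }
  return 0
}

# Hypothetical terminal function: end the episode if the system overheats
function IsTerminal(State: SimState) {
  return State.temperature > 100
}

concept MyConcept(input): BrainAction {
  curriculum {
    source MySimulator
    reward GetReward
    terminal IsTerminal

    training {
      # A test episode counts as a success when its cumulative reward reaches 200
      LessonRewardThreshold: 200,
      # The lesson is complete when 80% of test episodes in an assessment succeed
      LessonSuccessThreshold: 0.8
    }
  }
}
```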

LessonSuccessThreshold

When the episode success rate in an assessment meets or exceeds the LessonSuccessThreshold value, the training engine considers the lesson complete and moves to the next lesson in the curriculum.

The value must be between 0 and 1 and represents the target fraction of successful test episodes in an assessment. For example, 0.9 corresponds to a 90% success rate.

NoProgressIterationLimit

Training stops when brain performance has not improved in NoProgressIterationLimit iterations. The NoProgressIterationLimit iteration counter resets when training moves between lessons.

The NoProgressIterationLimit iteration counter does not reset when training is stopped or restarted. If training auto-terminates and you want to continue, you can increase the value of NoProgressIterationLimit and resume training.

Note

Progress checks happen after each assessment. The actual number of training iterations may go beyond the iteration limit to allow an assessment to finish and the last training batch to complete.
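
If training auto-terminates and you decide to continue, the change could look like the following sketch. The value 500000 is only an illustrative choice, double the documented default of 250,000.

```inkling
concept MyConcept(input): OutputType {
  curriculum {
    source MySimulator

    training {
      # Raised from the default of 250,000 so that a run that
      # auto-terminated has room to continue when resumed
      NoProgressIterationLimit: 500000
    }
  }
}
```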

LessonAssessmentWindow

Sets the number of test episodes per assessment. Assessments are groups of test episodes periodically run to evaluate the AI during training. Lesson transitions based on the LessonSuccessThreshold and LessonRewardThreshold parameters are evaluated after each assessment. Auto-termination of training (see NoProgressIterationLimit) is also based on assessment performance.
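
As a small illustration of how the assessment window interacts with the success threshold, consider the training clause below. The specific values are assumptions chosen to make the arithmetic easy to follow.

```inkling
concept MyConcept(input): OutputType {
  curriculum {
    source MySimulator

    training {
      # Each assessment runs 50 test episodes instead of the default 30
      LessonAssessmentWindow: 50,
      # Target success rate of 80% within each assessment
      LessonSuccessThreshold: 0.8
    }
  }
}
```

With these values, a lesson is complete when about 40 of the 50 test episodes in an assessment succeed (0.8 × 50 = 40).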

Robustness

The Robustness clause configures several features of the Bonsai AI engine that increase the robustness of the trained concept. Robustness helps address differences between the training simulator and the real deployment environment. For example:

concept MyConcept(input: SimState): BrainAction {
  curriculum {
    source MySimulator

    training {
      Robustness: {
        # Drop up to 20% of brain inputs, replacing them with the previous value
        MaxDroppedInputRate: 0.2,
        # Delay brain inputs by up to 10 iterations
        MaxInputDelay: 10,
        # Delay actions by up to 5 iterations
        MaxActionDelay: 5,
        # Drop up to 5% of actions, replacing them with the previous action
        MaxDroppedActionRate: 0.05
      }
    }
  }
}

The robustness features work by injecting delays or failures of brain inputs and concept outputs. Delays and failures force the concept to learn a policy that handles delayed or dropped states and actions appropriately. Robustness helps bridge the sim-to-real gap for concepts trained in a simulation that does not perfectly model sensor and actuator delays or errors.

Goals, rewards, and terminal conditions are still computed on the true state and action. The delays and drops are applied to the brain input and the actions sent to the environment as shown in the following figure:

Figure: Data flow for robustness features, showing where the robustness functionality fits in the training data flow.

Note

Robustness features only work when training with a simulator, not with a dataset.

MaxDroppedInputRate, MaxDroppedActionRate

MaxDroppedInputRate and MaxDroppedActionRate configure the maximum rate at which the brain input and the concept action are dropped. During training, the system chooses a different drop rate for each episode to expose the concept to a variety of conditions. When an input or action is dropped during training, Bonsai uses the last known input or action value instead.

The first input and the first action are never dropped. The drop rate must be between 0 and 1, and defaults to 0.

MaxInputDelay, MaxActionDelay

MaxInputDelay and MaxActionDelay configure the maximum allowable delay for delivering brain inputs (MaxInputDelay) or brain actions (MaxActionDelay) during training iterations.

During training, the system selects a different delay value for each episode to expose the concept to a variety of sensor and actuation delays.

The delay value must be an integer between 0 and 100, and defaults to 0.

Transform functions

In some cases, you need to translate communication between the simulator and the AI during training. For example:

  • The simulator produces more information than the AI will have access to in production.
  • Action instructions the AI sends once deployed have a different format than what the simulator expects.

To perform translations on communication between the simulator and the AI, use a transform function. Inkling supports the following transform functions:

  • State transform: used to translate information about the environment for consumption by the AI. For example, applying scaling values, converting measurement units, or aggregating values.
  • Action transform: used to translate AI instructions for application within the simulated environment.

See Keyword: State (transform) for details on using state transforms and Keyword: Action (transform) for details on using action transforms.
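
To give a feel for the first case, here is a hedged sketch of a state transform that converts units and hides a field the deployed AI will not receive. The type names, field names, and the TransformState function are hypothetical, and the state clause in the curriculum is written as recalled from the Keyword: State (transform) page, so treat it as an assumption and confirm it there.

```inkling
# Hypothetical simulator state: reports more than the deployed AI will see
type SimState {
  temperature_f: number,
  internal_debug_value: number
}

# Hypothetical state seen by the AI, in training and in production
type ObservableState {
  temperature_c: number
}

# State transform: converts Fahrenheit to Celsius and leaves out the
# field that is unavailable in production
function TransformState(State: SimState): ObservableState {
  return {
    temperature_c: (State.temperature_f - 32) * 5 / 9
  }
}

concept MyConcept(input): BrainAction {
  curriculum {
    source MySimulator
    # Assumed clause form for attaching the state transform
    state TransformState
  }
}
```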

Action masking

See Keyword: Mask for details on using action masks, which restrict the set of available actions for particular states.