Keyword: Curriculum
The curriculum statement is used within the concept statement to define how the training engine should train the AI. Define lessons to create a staged teaching plan and adjust optional training parameters to control how training episodes run.
Training for the curriculum stops when any of the following conditions are met:
- The user manually stops training.
- Training appears to have converged (the AI is no longer improving).
- A limit set by a curriculum training parameter (for example, TotalIterationLimit) is reached.
During training, the platform periodically runs assessments consisting of groups of test episodes. Assessments produce the following information for all goals in the curriculum:
- Success rate: A summary metric indicating the fraction of test episodes within an assessment in which the AI achieves a given objective. Success is a binary (pass/fail) measure at the episode level, and the overall success rate is calculated across all episodes in the assessment.
- Goal satisfaction rate: A summary and per-episode percentage metric indicating how close the AI came to satisfying the associated objective, regardless of success. An AI can receive a high satisfaction rate for coming very close despite ultimately failing the objective. For example, an AI attempting to raise and hold the temperature at 30°C may only reach 29°C in a given episode; that episode has a high goal satisfaction rate even though the AI ultimately failed (see the goal sketch below). A 100% satisfaction rate is only possible if the AI successfully completes the objective.
See Keyword: Goal for objective-specific training results.
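As an illustration of the temperature example above, a goal-based curriculum might look like the following sketch. The simulator name, state field, and target range are hypothetical; see Keyword: Goal for the authoritative syntax.
concept HoldTemperature(input: SimState): BrainAction {
    curriculum {
        source TemperatureSimulator

        # Raise the temperature to the target band and hold it there.
        # An episode succeeds only if the objective is met, but the goal
        # satisfaction rate also credits near misses (such as reaching 29°C).
        goal (state: SimState) {
            drive HoldTemp:
                state.temperature in Goal.Range(29.5, 30.5)
        }
    }
}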
Usage
Important
There can be only one curriculum per concept, and every learned concept must have a curriculum.
Every curriculum must provide a source clause that specifies the data source for teaching the concept. Simulators are the only supported data source at this time. See Keyword: Simulator for more information.
concept MyConcept(input): OutputType {
curriculum {
source MySimulator
# Lessons specified here
}
}
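For instance, lessons with scenario clauses can stage training from easier to harder episode configurations. The following is a minimal sketch; the simulator, its configuration field, and the ranges are hypothetical.
concept MyConcept(input): OutputType {
    curriculum {
        source MySimulator

        # Start with a narrow, easier range of starting conditions...
        lesson StartNearTarget {
            scenario {
                initial_temperature: number<28 .. 32>
            }
        }

        # ...then widen the range once the first lesson completes.
        lesson StartAnywhere {
            scenario {
                initial_temperature: number<0 .. 60>
            }
        }
    }
}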
Curriculum training parameters
You can adjust some training parameters with the training clause:
| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| EpisodeIterationLimit | Number.UInt32 | 1000 | Total iterations allowed per training episode. |
| TotalIterationLimit | Number.UInt32 | 50,000,000 | Total iterations allowed for the concept. |
| NoProgressIterationLimit | Number.UInt32 | 250,000 | Number of iterations allowed with no improvement before training auto-terminates. |
| LessonRewardThreshold | number | None | Minimum cumulative episode reward that counts as success. |
| LessonSuccessThreshold | number<0 .. 1> | 0.90 (90%) | Minimum success rate required to complete the lesson. |
| LessonAssessmentWindow | Number.UInt32 | 30 | Number of test episodes per assessment, used when evaluating LessonRewardThreshold and LessonSuccessThreshold. |
| Robustness | Structure ({}) | - | Parameters for robustness-enhancing training augmentation. See the following rows. |
| Robustness.MaxDroppedInputRate | number<0 .. 1> | 0 | Maximum rate at which brain inputs are dropped. |
| Robustness.MaxDroppedActionRate | number<0 .. 1> | 0 | Maximum rate at which actions are dropped. |
| Robustness.MaxInputDelay | UInt32<0 .. 100> | 0 | Maximum delay of brain inputs, in iterations. |
| Robustness.MaxActionDelay | UInt32<0 .. 100> | 0 | Maximum delay of brain actions, in iterations. |
For example:
concept MyConcept(input: SimState): BrainAction {
curriculum {
training {
EpisodeIterationLimit: 250,
TotalIterationLimit: 100000
}
}
}
EpisodeIterationLimit
A new training episode begins after EpisodeIterationLimit iterations if the brain fails to reach a valid terminal condition.
TotalIterationLimit
The training engine ends training after TotalIterationLimit iterations, even if training performance is still improving. See also NoProgressIterationLimit.
Note
The actual number of training iterations may go slightly beyond the iteration limit to allow the last training batch to complete.
LessonRewardThreshold
Supported only for curricula based on reward and terminal functions.
When a sufficient fraction of test episodes in an assessment (as indicated by LessonSuccessThreshold) have a cumulative reward that meets or exceeds the LessonRewardThreshold value, the training engine considers the lesson complete and moves to the next lesson in the curriculum.
If the lesson definition does not include a reward threshold, the training engine uses a general convergence test to determine when the lesson is complete.
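For example, a minimal sketch of a reward- and terminal-function based curriculum that sets LessonRewardThreshold. The simulator, function bodies, and threshold value are hypothetical, and Math.Abs assumes a using Math directive at the top of the Inkling file.
concept MyConcept(input: SimState): BrainAction {
    curriculum {
        source MySimulator
        reward GetReward
        terminal IsTerminal
        training {
            # A lesson completes once enough assessment episodes
            # (per LessonSuccessThreshold) reach this cumulative reward.
            LessonRewardThreshold: 90
        }
    }
}

# Hypothetical shaping reward: closer to the 30°C target scores higher.
function GetReward(state: SimState): number {
    return 1 - Math.Abs(state.temperature - 30) / 30
}

# Hypothetical safety limit that ends the episode.
function IsTerminal(state: SimState): Number.Bool {
    return state.temperature > 100
}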
LessonSuccessThreshold
When the episode success rate in an assessment exceeds the LessonSuccessThreshold value, the training engine considers the lesson complete and moves to the next lesson in the curriculum.
The value must be between 0 and 1 and represents a target fraction.
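For example, a sketch that requires a higher success rate before a lesson completes (the simulator name and value are illustrative):
concept MyConcept(input: SimState): BrainAction {
    curriculum {
        source MySimulator
        training {
            # Require 95% of assessment episodes to succeed
            # before the lesson is considered complete.
            LessonSuccessThreshold: 0.95
        }
    }
}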
NoProgressIterationLimit
Training stops when brain performance has not improved in NoProgressIterationLimit iterations. The NoProgressIterationLimit iteration counter resets when training moves between lessons.
The NoProgressIterationLimit iteration counter does not reset when training is stopped or restarted. If training auto-terminates and you want to continue, you can increase the value of NoProgressIterationLimit and resume training.
Note
Progress checks happen after each assessment. The actual number of training iterations may go beyond the iteration limit to allow an assessment to finish and the last training batch to complete.
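For example, if a run auto-terminated at the default limit, a sketch like the following raises the limit before resuming training (the value is illustrative):
concept MyConcept(input: SimState): BrainAction {
    curriculum {
        source MySimulator
        training {
            # Raised from the 250,000 default so a resumed run can
            # continue training longer without measured improvement.
            NoProgressIterationLimit: 500000
        }
    }
}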
LessonAssessmentWindow
Sets the number of test episodes per assessment. Assessments are groups of test episodes periodically run to evaluate the AI during training.
Lesson transitions based on the LessonSuccessThreshold and LessonRewardThreshold parameters are evaluated after each assessment. Auto-termination of training (see NoProgressIterationLimit) is also based on assessment performance.
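For example, a larger window gives a less noisy estimate of the success rate at the cost of spending more iterations on test episodes. A sketch with illustrative values:
concept MyConcept(input: SimState): BrainAction {
    curriculum {
        source MySimulator
        training {
            # Evaluate lesson completion over 100 test episodes
            # instead of the default 30.
            LessonAssessmentWindow: 100
        }
    }
}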
Robustness
The Robustness clause configures several features of the Bonsai AI engine that increase the robustness of the trained concept. Robustness helps address differences between the training simulator and the real deployment environment.
For example:
concept MyConcept(input: SimState): BrainAction {
curriculum {
training {
Robustness: {
# drop up to 20% of brain inputs, replacing them with the previous value
MaxDroppedInputRate: 0.2,
# delay brain inputs by up to 10 iterations
MaxInputDelay: 10,
# Delay actions by up to 5 iterations
MaxActionDelay: 5,
# Drop up to 5% of actions, replacing them with the previous action
MaxDroppedActionRate: 0.05
}
}
}
}
The robustness features work by injecting delays or failures of brain inputs and concept outputs. Delays and failures force the concept to learn a policy that handles delayed or dropped states and actions appropriately. Robustness helps bridge the sim-to-real gap for concepts trained in a simulation that does not perfectly model sensor and actuator delays or errors.
Goals, rewards, and terminal conditions are still computed on the true state and action. The delays and drops are applied to the brain input and the actions sent to the environment as shown in the following figure:
Diagram showing where the robustness functionality fits in the training data flow.
Note
Robustness features only work when training with a simulator, not with a dataset.
MaxDroppedInputRate, MaxDroppedActionRate
MaxDroppedInputRate and MaxDroppedActionRate configure the maximum rate at which the brain input and the concept action are dropped. During training, the system chooses a different drop rate for each episode to expose the concept to a variety of conditions. When an input or action is dropped during training, Bonsai uses the last known input or action value instead.
The first input and action are never dropped. The drop rate must be between 0 and 1, and defaults to 0.
MaxInputDelay, MaxActionDelay
MaxInputDelay and MaxActionDelay configure the maximum allowable delay for delivering brain inputs (MaxInputDelay) or brain actions (MaxActionDelay) during training iterations.
During training, the system selects a different delay value for each episode to expose the concept to a variety of sensor and actuation delays.
The delay value must be an integer between 0 and 100, and defaults to 0.
Transform functions
In some cases, you need to translate communication between the simulator and the AI during training. For example:
- The simulator produces more information than the AI will have access to in production.
- Action instructions the AI sends once deployed have a different format than what the simulator expects.
To perform translations on communication between the simulator and the AI, use a transform function. Inkling supports the following transform functions:
- State transform: used to translate information about the environment for consumption by the AI. For example, applying scaling values, converting measurement units, or aggregating values.
- Action transform: used to translate AI instructions for application within the simulated environment.
See Keyword: State (transform) for details on using state transforms and Keyword: Action (transform) for details on using action transforms.
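As a minimal sketch, assuming hypothetical state and action types (SimState, BrainState, BrainAction, SimAction) and a hypothetical unit conversion, a curriculum might reference transform functions like this. See the linked keyword pages for the authoritative syntax.
concept MyConcept(input: BrainState): BrainAction {
    curriculum {
        source MySimulator

        # State transform: reshape the raw simulator state into
        # the state the deployed brain will actually receive.
        state TransformState

        # Action transform: reshape the brain action into the
        # format the simulator expects.
        action TransformAction
    }
}

function TransformState(sim: SimState): BrainState {
    # Hypothetical unit conversion: Fahrenheit to Celsius.
    return {
        temperature: (sim.temperature_f - 32) / 1.8
    }
}

function TransformAction(act: BrainAction): SimAction {
    # Hypothetical rescaling: the brain outputs 0..1, the simulator expects 0..100.
    return {
        heater_setting: act.heater_power * 100
    }
}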
Action masking
See Keyword: Mask for details on using action masks, which restrict the set of available actions for particular states.