Keyword: Goal
Goals are a high-level specification of what you want the AI to learn. Use goals to let the training engine automatically determine appropriate reward functions and conditions for early termination.
A goal-based curriculum lets Bonsai report on training progress in terms of the concrete objectives you specify rather than abstract reward scores.
Important
You cannot use goals together with explicitly defined reward and terminal functions.
Usage
To use goals, include the Goal namespace at the beginning of your Inkling file and include the goal definition with one or more objectives in your curriculum definition.
using Goal
...
curriculum {
  ...
  goal (State: SimState, ConceptAction: Action) {
    `Don't Fall`:
      avoid Math.Abs(State.Angle)
      in Goal.RangeAbove(MaxAngle)
    SmallAngle:
      drive Math.Abs(State.Angle)
      in Goal.Range(0, MaxAngle/6)
    StayCentered:
      drive Math.Abs(State.Position)
      in Goal.Range(0, MaxPosition/10)
    SaveEnergy:
      minimize ConceptAction.Force**2
      in Goal.RangeBelow(TargetMeanSquaredForce)
  }
  training {
    EpisodeIterationLimit: MaxIterationCount
  }
}
In the example goal definition above:
- State is required and refers to the state information provided by the training environment.
- ConceptAction is optional and refers to the action taken to satisfy the concept that moved the environment to the current state.
The order in which you specify your objectives does not matter. By default, the training engine will try to satisfy the success criteria for all of them.
You can also specify a goal expression that describes how objectives should be combined, using a do statement with the and operator and the until operator.
For example,
goal (State: SimState, ConceptAction: Action) {
  `Don't Fall`:
    avoid Math.Abs(State.Angle)
    in Goal.RangeAbove(MaxAngle)
  SmallAngle:
    drive Math.Abs(State.Angle)
    in Goal.Range(0, MaxAngle/6)
  StayCentered:
    drive Math.Abs(State.Position)
    in Goal.Range(0, MaxPosition/10)
  SaveEnergy:
    minimize ConceptAction.Force**2
    in Goal.RangeBelow(TargetMeanSquaredForce)
  # Goal expression that describes how to combine objectives
  do
    (`Don't Fall` and SmallAngle and StayCentered and SaveEnergy) until State.Time > 120
}
Note
Bonsai treats the EpisodeIterationLimit parameter as an implicit subclause at the end of the do statement: until CurrentIteration > EpisodeIterationLimit.
See the Goal expressions reference for more details on constructing complex goal expressions.
Success and failure
As Bonsai runs each training episode, it keeps track of the success state of each objective and subclause in the do statement:
- undecided: the objective or subclause is still being evaluated.
- success: the brain met the objective or subclause.
- failure: the brain could not meet the objective or subclause.
Eventually, every undecided objective and subclause will resolve to either success or failure, based on the applicable rules.
Important
Early episode termination: as soon as the success state of the overall expression moves from undecided to success or failure, Bonsai stops the training episode.
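The early-termination rule can be pictured as a loop that checks the success state after every iteration and stops as soon as it is no longer undecided. The sketch below is a simplified Python model of a single avoid objective, not Bonsai's implementation: Status, run_avoid_episode, and in_region are hypothetical names, and the iteration limit stands in for the implicit until clause on EpisodeIterationLimit.

```python
from enum import Enum
from typing import Callable, Iterable, Tuple


class Status(Enum):
    UNDECIDED = "undecided"
    SUCCESS = "success"
    FAILURE = "failure"


def run_avoid_episode(
    values: Iterable[float],
    in_region: Callable[[float], bool],
    iteration_limit: int,
) -> Tuple[Status, int]:
    """Model one episode with a single avoid objective.

    Returns the resolved status and the number of iterations consumed.
    """
    status = Status.UNDECIDED
    iterations = 0
    for value in values:
        if iterations >= iteration_limit:
            break  # assumed stand-in for the implicit iteration-limit clause
        iterations += 1
        if in_region(value):
            status = Status.FAILURE
            break  # early termination: the objective has resolved
    if status is Status.UNDECIDED:
        # The value stayed out of the region for as long as necessary.
        status = Status.SUCCESS
    return status, iterations


# Hypothetical pole angles; the region to avoid is any angle above 0.5.
print(run_avoid_episode([0.1, 0.2, 0.9, 0.1], lambda a: a > 0.5, 10))
```

In this model the third value enters the avoided region, so the episode stops after three iterations instead of consuming the whole sequence.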
Resolving objectives
All objectives start undecided. How an objective resolves depends on the type of objective.
Success conditions:
- avoid: the test value avoids the target region as long as necessary.
- reach: the test value enters the target region at some point during the training episode.
- drive: the test value is in the target region when the objective resolves.
- minimize: the mean value of the test value is in the target region over all iterations up to when the objective resolves.
- maximize: the mean value of the test value is in the target region over all iterations up to when the objective resolves.
Failure conditions:
- avoid: the test value enters the target region at any point during the training episode.
- reach: the test value does not reach the target region before the objective resolves.
- drive:
  - the objective DOES include a within k clause and the test value is out of the target region for more than k iterations.
  - the objective DOES NOT include a within k clause and the test value is not in the target region when the objective resolves.
- minimize: the mean value of the test value is not in the target region over all completed iterations when the objective resolves.
- maximize: the mean value of the test value is not in the target region over all completed iterations when the objective resolves.
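The within k rule for drive objectives is the least obvious of these conditions, so here is a minimal Python sketch of it. It is an illustration under stated assumptions rather than Bonsai's actual logic: first_drive_failure and in_region are hypothetical names, "more than k iterations" is read as more than k consecutive iterations, and the count is assumed to start from the first iteration of the episode.

```python
from typing import Callable, Optional, Sequence


def first_drive_failure(
    trace: Sequence[float],
    in_region: Callable[[float], bool],
    k: int,
) -> Optional[int]:
    """Return the 1-based iteration at which the test value has been outside
    the target region for more than k consecutive iterations, or None if
    that never happens in the trace."""
    outside_run = 0
    for iteration, value in enumerate(trace, start=1):
        if in_region(value):
            outside_run = 0  # back in the region: the allowance resets
        else:
            outside_run += 1
            if outside_run > k:
                return iteration
    return None


# Hypothetical temperature trace with a target region of 20 to 30 (inclusive, by assumption).
temperatures = [25, 31, 32, 29, 35, 36, 37, 25]
in_range = lambda t: 20 <= t <= 30

print(first_drive_failure(temperatures, in_range, k=2))  # 7
print(first_drive_failure(temperatures, in_range, k=3))  # None
```

With k=2 the second excursion lasts three iterations, so the modeled objective fails at iteration 7. With k=3 the same excursion is tolerated.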
Goal metrics
Bonsai computes and reports metrics for the goal statement overall and for the individual objectives. Each metric is averaged across test episodes in an assessment.
Universal metrics
- Success rate: the fraction of episodes where the AI achieves the objective.
- Goal satisfaction rate: the average progress toward satisfying the objective. A satisfaction of 100% means the AI successfully completed the objective.
- Goal robustness: a measure of how robust the learned policy is to noise and perturbation. Robustness is negative if the objective fails.
Objective-specific metrics
Objective | Metric | Description |
---|---|---|
drive | Percentage of iterations in target region | The percentage of iterations where the test value was inside the target region after first reaching the target region. The total percentage is averaged over an assessment. |
drive | Max distance to target region | The maximum distance from the test value to the target region, averaged over an assessment. |
reach | Final distance to target region | The mean final distance from the target region when the objective resolves. |
reach | Maximum iterations to reach target | The longest consecutive number of iterations outside the target region, averaged over an assessment. |
minimize | Mean value | The mean of the test value during the episode. |
maximize | Mean value | The mean of the test value during the episode. |
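To make the table concrete, the following Python sketch computes per-episode versions of these metrics from a recorded trace of test values. It is only an approximation of the descriptions above, not the formulas Bonsai uses. All function names are hypothetical, and the target region is assumed to be a closed one-dimensional range [low, high]. Distance is assumed to be the gap to the nearest bound, and the maximum distance is taken over the whole episode.

```python
from statistics import mean
from typing import Sequence


def distance_to_range(value: float, low: float, high: float) -> float:
    """Distance from value to the closed range [low, high]; 0 when inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def percent_in_region_after_first_hit(
    trace: Sequence[float], low: float, high: float
) -> float:
    """Percentage of iterations inside the region, counted from the first
    iteration that lands inside it (0.0 if the region is never reached)."""
    inside = [low <= v <= high for v in trace]
    if True not in inside:
        return 0.0
    tail = inside[inside.index(True):]
    return 100.0 * sum(tail) / len(tail)


def max_distance(trace: Sequence[float], low: float, high: float) -> float:
    """Largest distance to the region seen during the episode."""
    return max(distance_to_range(v, low, high) for v in trace)


def longest_run_outside(trace: Sequence[float], low: float, high: float) -> int:
    """Longest consecutive number of iterations outside the region."""
    longest = current = 0
    for v in trace:
        current = 0 if low <= v <= high else current + 1
        longest = max(longest, current)
    return longest


# One hypothetical episode with a target region of [0, 2].
episode = [5.0, 3.0, 1.0, 0.5, 2.5, 1.5]
print(percent_in_region_after_first_hit(episode, 0, 2))  # 75.0
print(max_distance(episode, 0, 2))                       # 3.0
print(longest_run_outside(episode, 0, 2))                # 2
print(mean(episode))                                     # 2.25

# Assessment-level value: average the per-episode metric across test episodes.
print(mean([75.0, 100.0]))                               # 87.5
```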
Goal-related training parameters
You can adjust the following training parameters for the goal with the training clause:
Parameter | Values | Default | Description |
---|---|---|---|
EpisodeIterationLimit | Number.UInt32 | 1000 | Total iterations allowed per training episode. |
For example:
concept MyConcept(Input: SimState): ConceptAction {
  curriculum {
    training {
      EpisodeIterationLimit: 250,
      LessonSuccessThreshold: 0.7,
    }
  }
}
EpisodeIterationLimit
The training engine terminates the training episode and begins a new one after EpisodeIterationLimit iterations if no terminal condition has been reached.
Examples
Keep the temperature between MinTemp and MaxTemp. If the temperature is ever outside that region, it must get back in the region within 10 iterations. If it does not, the training episode terminates and is marked a failure.
TemperatureInRange: drive S.Temperature in Goal.Range(MinTemp, MaxTemp) within 10
Minimize the average temperature over the episode, with a target of at most MaxTemp. This will try to get the temperature, averaged over all iterations of the episode, as low as possible.
MinimizeAverageTemperature: minimize S.Temperature in Goal.RangeBelow(MaxTemp)
Drive the temperature in the episode below MaxTemp. This will try to get the temperature below MaxTemp as quickly as possible and also get the temperature of the final iteration in the episode as low as possible.
MinimizeFinalTemperature: drive S.Temperature in Goal.RangeBelow(MaxTemp)
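The difference between the last two examples is easiest to see on concrete numbers: minimize judges the mean over the episode, while drive (without a within clause) judges the value when the objective resolves. The Python sketch below models only those two success checks. The function names and temperature traces are hypothetical, the objective is assumed to resolve at the last recorded iteration, and Goal.RangeBelow(MaxTemp) is assumed to include MaxTemp itself.

```python
from statistics import mean
from typing import Sequence


def minimize_average_succeeds(trace: Sequence[float], max_temp: float) -> bool:
    """minimize: the mean over all iterations must be in the target region."""
    return mean(trace) <= max_temp


def drive_final_succeeds(trace: Sequence[float], max_temp: float) -> bool:
    """drive: the value at resolution (here, the last one) must be in the region."""
    return trace[-1] <= max_temp


MAX_TEMP = 50

cooling_down = [100, 90, 40, 30, 20]  # hot start, cool finish (mean 56)
late_spike = [30, 30, 30, 30, 70]     # cool throughout, hot finish (mean 38)

print(minimize_average_succeeds(cooling_down, MAX_TEMP))  # False
print(drive_final_succeeds(cooling_down, MAX_TEMP))       # True
print(minimize_average_succeeds(late_spike, MAX_TEMP))    # True
print(drive_final_succeeds(late_spike, MAX_TEMP))         # False
```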
Tip
You can find more goal examples in the Inkling cookbook.