Keyword: Goal

Goals are a high-level specification of what you want the AI to learn. Use goals to let the training engine automatically determine appropriate reward functions and conditions for early termination.

A goal-based curriculum lets Bonsai report on training progress in terms of the concrete objectives you specify rather than abstract reward scores.

Important

You cannot use goals and also define explicit reward and terminal functions in the same curriculum.

Usage

To use goals, include the Goal namespace at the beginning of your Inkling file and add a goal definition with one or more objectives to your curriculum definition.

using Goal

...

curriculum {
    ...
    goal (State: SimState, ConceptAction: Action) {
        `Don't Fall`: 
            avoid Math.Abs(State.Angle) 
            in Goal.RangeAbove(MaxAngle)
        SmallAngle: 
            drive Math.Abs(State.Angle) 
            in Goal.Range(0, MaxAngle/6)
        StayCentered: 
            drive Math.Abs(State.Position) 
            in Goal.Range(0, MaxPosition/10)
        SaveEnergy: 
            minimize ConceptAction.Force**2 
            in Goal.RangeBelow(TargetMeanSquaredForce)
    }
    training {
        EpisodeIterationLimit: MaxIterationCount
    }
}

In the example goal definition above:

  • State is required and refers to the state information provided by the training environment.
  • ConceptAction is optional and refers to the action taken by the concept that moved the environment into the current state.
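
Since ConceptAction is optional, a goal that only needs state information can omit it. The following is a minimal sketch that reuses the State.Angle and MaxAngle names from the example above:

goal (State: SimState) {
    `Don't Fall`:
        avoid Math.Abs(State.Angle)
        in Goal.RangeAbove(MaxAngle)
}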

The order in which you specify your objectives does not matter. By default, the training engine will try to satisfy the success criteria for all of them.

You can also specify a goal expression that describes how objectives should be combined, using a do statement with the and operator and the until clause.

For example,

goal (State: SimState, ConceptAction: Action) {
        `Don't Fall`: 
            avoid Math.Abs(State.Angle) 
            in Goal.RangeAbove(MaxAngle)
        SmallAngle: 
            drive Math.Abs(State.Angle) 
            in Goal.Range(0, MaxAngle/6)
        StayCentered: 
            drive Math.Abs(State.Position) 
            in Goal.Range(0, MaxPosition/10)
        SaveEnergy: 
            minimize ConceptAction.Force**2 
            in Goal.RangeBelow(TargetMeanSquaredForce)
        
        # Goal expression that describes how to combine objectives
        do 
           (`Don't Fall` and SmallAngle and StayCentered and SaveEnergy) until State.Time > 120
    }

Note

Bonsai treats the EpisodeIterationLimit parameter as an implicit subclause at the end of the do statement: until CurrentIteration > EpisodeIterationLimit.
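
For instance, if EpisodeIterationLimit is set to MaxIterationCount as in the first example, the goal expression above behaves roughly like the pseudocode below. CurrentIteration is illustrative shorthand rather than an Inkling identifier, and the sketch is only meant to show where the implicit subclause sits:

# Rough effective behavior (pseudocode, not valid Inkling)
do
   (`Don't Fall` and SmallAngle and StayCentered and SaveEnergy)
   until State.Time > 120 or CurrentIteration > MaxIterationCount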

See the Goal expressions reference for more details on constructing complex goal expressions.

Success and failure

As Bonsai runs each training episode, it keeps track of the success state of each objective and subclause in the do statement:

  • undecided: the objective or subclause is still being evaluated.
  • success: the brain met the objective or subclause.
  • failure: the brain could not meet the objective or subclause.

Eventually, every undecided objective and subclause will resolve to either success or failure, based on the applicable rules.

Important

Early episode termination: as soon as the success state of the overall expression moves from undecided to success or failure, Bonsai stops the training episode.

Resolving objectives

All objectives start undecided. How the objective resolves is specific to the meaning of the objective.

Success conditions:

  • avoid: the test value stays out of the target region until the objective resolves.
  • reach: the test value enters the target region at some point during the training episode.
  • drive: the test value is in the target region when the objective resolves.
  • minimize: the mean of the test value over all iterations, up to when the objective resolves, is in the target region.
  • maximize: the mean of the test value over all iterations, up to when the objective resolves, is in the target region.

Failure conditions:

  • avoid: the test value enters the target region at any point during the training episode.
  • reach: the test value does not enter the target region before the objective resolves (a reach objective is sketched after this list).
  • drive:
    • the objective includes a within k clause and the test value stays out of the target region for more than k consecutive iterations.
    • the objective does not include a within k clause and the test value is not in the target region when the objective resolves.
  • minimize: the mean of the test value over all completed iterations is not in the target region when the objective resolves.
  • maximize: the mean of the test value over all completed iterations is not in the target region when the objective resolves.
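
For example, a reach objective resolves to success the first time its test value enters the target region, and to failure if the episode ends before that happens. A minimal sketch, assuming a hypothetical State.Distance field and TargetDistance constant:

ReachTarget:
    reach State.Distance
    in Goal.RangeBelow(TargetDistance)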

Goal metrics

Bonsai computes and reports metrics for the goal statement overall and for the individual objectives. Each metric is averaged across test episodes in an assessment.

Universal metrics

  • Success rate: the fraction of episodes where the AI achieves the objective.
  • Goal satisfaction rate: the average progress toward satisfying the objective. A satisfaction of 100% means the AI successfully completed the objective.
  • Goal robustness: a measure of how robust the learned policy is to noise and perturbation. Robustness is negative if the objective fails.

Objective-specific metrics

  • drive: Percentage of iterations in target region. The percentage of iterations where the test value was inside the target region after first reaching the target region, averaged over an assessment.
  • drive: Max distance to target region. The maximum distance from the test value to the target region, averaged over an assessment.
  • reach: Final distance to target region. The mean final distance from the target region when the objective resolves.
  • reach: Maximum iterations to reach target. The longest consecutive number of iterations outside the target region, averaged over an assessment.
  • minimize: Mean value. The mean of the test value during the episode.
  • maximize: Mean value. The mean of the test value during the episode.

You can adjust the following training parameters for the goal with the training clause:

  • EpisodeIterationLimit (Number.UInt32, default 1000): the total number of iterations allowed per training episode.

For example:

concept MyConcept(Input: SimState): ConceptAction {
  curriculum {
    training {
      EpisodeIterationLimit: 250,
      LessonSuccessThreshold: 0.7
    }
  }
}

EpisodeIterationLimit

The training engine terminates the training episode and begins a new one after EpisodeIterationLimit iterations if no terminal condition has been reached.

Examples

Keep the temperature between MinTemp and MaxTemp. If the temperature is ever outside that region, it must get back into the region within 10 iterations. If it does not, the training episode terminates and is marked as a failure.

TemperatureInRange: drive S.Temperature in Goal.Range(MinTemp, MaxTemp) within 10

Minimize the temperature over the episode, keeping the average below MaxTemp. This will try to get the temperature, averaged over all iterations of the episode, as low as possible.

MinimizeAverageTemperature: minimize S.Temperature in Goal.RangeBelow(MaxTemp)

Drive the temperature in the episode below MaxTemp. This will try to get the temperature below MaxTemp as quickly as possible and also get the temperature of the final iteration in the episode as low as possible.

MinimizeFinalTemperature: drive S.Temperature in Goal.RangeBelow(MaxTemp)
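
Maximize the average throughput over the episode, aiming for a mean of at least MinThroughput. This is a sketch with hypothetical S.Throughput state and MinThroughput constant; it will try to get the throughput, averaged over all iterations of the episode, as high as possible.

MaximizeAverageThroughput: maximize S.Throughput in Goal.RangeAbove(MinThroughput)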

Tip

You can find more goal examples in the Inkling cookbook.