Define evaluation signals and enforce quality gates

Completed

In this unit, you'll learn:

  • How to define success criteria for agent tasks

  • How to use pull request checks and workflows for evaluation

  • How to enforce quality using required checks and rules

  • How to incorporate security scanning into evaluation

Defining success criteria

Evaluation begins with clear success criteria.

In GitHub workflows, success criteria should be defined in the issue or pull request. These criteria describe what must be true for the task to be considered complete.

For example:

  • A feature behaves as expected

  • Tests pass successfully

  • No new security issues are introduced

In practice:

  • Write acceptance criteria directly in the issue

  • Reference those criteria in the pull request

  • Use them as the basis for validation

Clear criteria allow both the agent and the reviewer to verify completion.

Using pull request checks for evaluation

Pull requests are the primary place where evaluation occurs.

GitHub displays evaluation signals through:

  • Status checks

  • Workflow runs

  • Check results in the “Checks” tab

In practice:

  • Configure workflows to run on pull request or push events

  • Ensure tests and validations run automatically

  • Review results in the pull request before merging

These checks provide feedback on whether changes meet the required standards.

Using workflows to validate changes

Workflows powered by GitHub Actions are used to automate evaluation.

Common workflow steps include:

  1. Running tests

  2. Linting code

  3. Building the application

Example trigger:

on:
  pull_request:
   branches: [main]

In practice:

  • Add workflows in .github/workflows/

  • Ensure they run on pull requests or pushes

  • Use workflow results as the source of validation

Each workflow run produces logs and results that are visible in the pull request.

Enforcing quality with required checks

GitHub allows you to enforce evaluation through required status checks.

Required checks ensure that certain conditions must be met before a pull request can be merged.

In practice:

  • Configure branch protection rules

  • Enable "Require status checks to pass before merging"

  • Select specific checks to enforce

This ensures that all required checks must pass before merging.

GitHub also supports requiring branches to be up to date before merging, depending on configuration.

Using workflow outputs for visibility

Workflows produce logs and artifacts that support evaluation.

In practice:

  • Review logs directly in the Actions tab

  • Use artifacts to store outputs such as test results or reports

  • Link workflow runs in pull requests for visibility

Artifacts and logs are retained for a limited time and can be reviewed during that period.

By default, GitHub stores workflow logs and artifacts for 90 days, after which they are automatically deleted.

Incorporating security into evaluation

Evaluation should include security checks to prevent unsafe changes.

In GitHub, this can include:

  • Code scanning (for vulnerabilities)

  • Dependency review checks

  • Secret scanning and push protection

In practice:

  • Enable available security features for the repository

  • Treat security alerts or failed checks as blockers

  • Review security results in the pull request

These checks help ensure that changes are safe before merging.

Using rules and protections

GitHub provides controls to enforce evaluation policies.

These include:

  • Branch protection rules

  • Required pull request reviews

  • Required status checks

In practice:

  • Require at least one approving review before merging

  • Combine reviews with required checks

  • Prevent direct pushes to protected branches

Branch protection rules enforce these requirements before changes can be merged.

End-to-end evaluation flow

A typical evaluation flow in GitHub looks like:

  1. An issue defines the task and success criteria

  2. An agent creates a branch and opens a pull request

  3. Workflows run automatically

  4. Status checks appear in the pull request

  5. Required checks must pass

  6. A reviewer approves the changes

  7. The pull request is merged

This ensures that all changes are validated before being accepted.

Key takeaway

Evaluation defines whether agent work is complete and correct. By using GitHub workflows, pull request checks, required status checks, and security scanning, you can enforce consistent quality and ensure that all changes meet defined expectations before merging.