Implement progressive deployment strategies
Azure Container Apps and GitHub Actions work together to implement progressive deployment strategies for agent systems. Because agent behavior changes are invisible to traditional infrastructure monitoring—a new system prompt or model version can produce fundamentally different outputs while latency and error rates stay unchanged—you can't rely on health checks alone to validate a deployment. Quality-gated rollout strategies built on Azure AI Evaluation SDK metrics give you the behavioral signal you need at each stage of the rollout.
| Strategy | Traffic Pattern | Rollback Speed | Best For |
|---|---|---|---|
| Canary | Gradual % increase | Moderate (hours) | Behavioral validation at scale |
| Blue-green | Instant 100% switch | Immediate (seconds) | Configuration changes, urgent fixes |
| Feature flags | Per-tenant activation | Immediate | Gradual feature rollout by customer |
Route traffic incrementally with canary deployments
Canary deployment routes a small percentage of requests to the new agent version while the majority continues using the stable version. Start with 5% of traffic going to the canary. Monitor quality metrics for 24 hours. If the canary performs within acceptable thresholds, increase to 25%, then 50%, then 100% over several days.
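The stage progression can be expressed as a small lookup. This is a minimal sketch that assumes the 5%, 25%, 50%, 100% schedule described above. `CANARY_STAGES`, `next_canary_weight`, and `stable_weight` are hypothetical names. The 24-hour soak between stages would be enforced by whatever scheduler calls this code, not by the code itself.

```python
# Hypothetical stage schedule taken from the text: 5% -> 25% -> 50% -> 100%.
CANARY_STAGES = (5, 25, 50, 100)


def next_canary_weight(current: int, stages: tuple = CANARY_STAGES) -> int:
    """Return the first stage above `current`; stay at the last stage once reached."""
    for stage in stages:
        if stage > current:
            return stage
    return stages[-1]


def stable_weight(canary: int) -> int:
    """Weight left on the stable revision (the two weights must sum to 100)."""
    if not 0 <= canary <= 100:
        raise ValueError("canary weight must be between 0 and 100")
    return 100 - canary
```

Starting from 0, successive calls walk through 5, 25, 50, and 100. Once the canary carries all traffic, further calls keep returning 100.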
The quality gate determines success: if evaluation scores drop more than a defined threshold compared to the baseline, stop the rollout and roll back automatically. For Fabrikam's code review system, the primary quality metric is the accuracy of security vulnerability detection. A canary that misses 15% more vulnerabilities than the production version fails the quality gate.
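One way to encode the Fabrikam gate is to compare missed-vulnerability counts on the same labeled test set. This sketch assumes that "15% more" means a relative increase over the production version's miss count. The function names are hypothetical.

```python
def missed_increase(baseline_missed: int, canary_missed: int) -> float:
    """Relative increase in missed vulnerabilities (0.15 means 15% more misses)."""
    if baseline_missed < 0 or canary_missed < 0:
        raise ValueError("miss counts cannot be negative")
    if baseline_missed == 0:
        # Relative change is undefined; treat any canary miss as an unbounded increase.
        return float("inf") if canary_missed > 0 else 0.0
    return (canary_missed - baseline_missed) / baseline_missed


def passes_security_gate(
    baseline_missed: int, canary_missed: int, max_increase: float = 0.15
) -> bool:
    """Fail when the canary misses `max_increase` (15%) or more additional vulnerabilities."""
    return missed_increase(baseline_missed, canary_missed) < max_increase
```

If production misses 20 known vulnerabilities, a canary that misses 22 (10% more) passes. A canary that misses 23 (15% more) fails.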
Implement canary deployment using traffic splitting in Microsoft Foundry or Azure Container Apps. For agent endpoints deployed as Azure Container Apps, configure traffic splitting at the revision level:
```python
# scripts/configure_canary.py
from azure.identity import DefaultAzureCredential
from azure.mgmt.appcontainers import ContainerAppsAPIClient
from azure.mgmt.appcontainers.models import TrafficWeight


def configure_canary_deployment(
    subscription_id: str,
    resource_group: str,
    container_app_name: str,
    stable_revision: str,
    canary_revision: str,
    canary_weight: int,
) -> None:
    """Configure traffic split between stable and canary agent revisions.

    Traffic splitting requires the container app to run in multiple revision mode.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")

    credential = DefaultAzureCredential()
    client = ContainerAppsAPIClient(credential, subscription_id)

    # Get current container app configuration
    container_app = client.container_apps.get(
        resource_group_name=resource_group,
        container_app_name=container_app_name,
    )

    # Weights must sum to 100; each label gives its revision a label-specific URL.
    container_app.configuration.ingress.traffic = [
        TrafficWeight(revision_name=stable_revision, weight=100 - canary_weight, label="stable"),
        TrafficWeight(revision_name=canary_revision, weight=canary_weight, label="canary"),
    ]

    # Apply configuration and wait for the long-running operation to finish.
    # GET responses don't include secret values; if the app defines secrets,
    # confirm that this full update preserves them in your environment.
    client.container_apps.begin_create_or_update(
        resource_group_name=resource_group,
        container_app_name=container_app_name,
        container_app_envelope=container_app,
    ).result()

    print(f"Canary configured: {canary_weight}% traffic to {canary_revision}")
```
This script sets the traffic weight for the canary revision. Call it from a GitHub Actions workflow that gradually increases the canary percentage based on quality gate results.
Monitor quality metrics during progressive rollout
Quality gates assess whether the canary version maintains acceptable behavioral performance. Define baseline metrics from the stable version using evaluation runs over representative test data. During the canary period, evaluate the canary version against the same test data and compare results.
Key quality metrics for agent deployments include:
- Evaluation score: Overall quality rating produced by Azure AI Evaluation SDK evaluators
- Error rate: Percentage of requests that fail or produce invalid output
- Latency P95: 95th percentile response time
- Task success rate: Percentage of requests that accomplish the intended task
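Before any comparison, per-request evaluation records have to be rolled up into these four metrics. The following is a sketch under stated assumptions. The `RequestResult` record shape is hypothetical. P95 uses the nearest-rank definition, which may differ from the percentile method your monitoring tools apply.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestResult:
    """One evaluated request (hypothetical record shape)."""
    score: float          # Evaluator score; ignored when the request failed
    failed: bool          # Request errored or produced invalid output
    latency_ms: float
    task_succeeded: bool


@dataclass
class QualityMetrics:
    evaluation_score: float
    error_rate: float
    latency_p95_ms: float
    task_success_rate: float


def p95(values: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without float rounding
    return ordered[rank - 1]


def summarize(results: Sequence[RequestResult]) -> QualityMetrics:
    """Aggregate per-request results into the four rollout metrics."""
    if not results:
        raise ValueError("need at least one result")
    n = len(results)
    scores = [r.score for r in results if not r.failed]
    return QualityMetrics(
        evaluation_score=sum(scores) / len(scores) if scores else 0.0,
        error_rate=sum(r.failed for r in results) / n,
        latency_p95_ms=p95([r.latency_ms for r in results]),
        task_success_rate=sum(r.task_succeeded for r in results) / n,
    )
```

Running the same `summarize` over the stable and canary evaluation runs yields two directly comparable `QualityMetrics` values.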
Configure automated quality assessment in a GitHub Actions workflow that queries evaluation results and makes rollout decisions:
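```yaml
# .github/workflows/canary-quality-gate.yml
name: Canary Quality Gate Assessment

on:
  schedule:
    - cron: '0 */4 * * *'  # Every 4 hours; disable the workflow outside the canary period
  workflow_dispatch:        # Allow manual runs

permissions:
  contents: read
  id-token: write  # Required for OIDC sign-in with azure/login

jobs:
  assess-quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      # Assumes a requirements.txt and these repository secrets exist.
      - run: pip install -r requirements.txt

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Run evaluation against canary revision
        run: |
          python scripts/run_evaluation.py \
            --agent-endpoint ${{ vars.CANARY_ENDPOINT }} \
            --test-dataset data/eval-set.jsonl \
            --output results/canary-eval.json

      # compare_metrics.py must write "decision=PASS" or "decision=FAIL" to
      # $GITHUB_OUTPUT and exit 0 either way so the next step can act on it.
      - name: Compare against baseline metrics
        id: compare
        run: |
          python scripts/compare_metrics.py \
            --baseline results/baseline-eval.json \
            --current results/canary-eval.json \
            --threshold 0.05

      - name: Make rollout decision
        env:
          DECISION: ${{ steps.compare.outputs.decision }}
        run: |
          if [ "$DECISION" = "PASS" ]; then
            echo "Quality gate passed - increasing canary traffic"
            python scripts/increase_canary_weight.py --increment 20
          else
            echo "Quality gate failed - initiating rollback"
            python scripts/rollback_canary.py
            exit 1
          fi
```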
The workflow runs every four hours during the canary period. When the quality metrics stay within 5% of baseline (the `--threshold 0.05` parameter), the workflow raises the canary's traffic weight by 20 percentage points. If metrics degrade beyond the threshold, the workflow triggers an automatic rollback and fails the run.
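The workflow depends on `compare_metrics.py` publishing a `decision` output. A hypothetical sketch of its core logic follows. It assumes the threshold is a relative degradation (0.05 means 5% worse than baseline) and that metric names match the list of key quality metrics. Appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable is how a GitHub Actions step sets outputs.

```python
import os
from typing import Dict, List

# +1: higher is better; -1: lower is better. Names are assumed to match the eval output.
METRIC_DIRECTIONS = {
    "evaluation_score": +1,
    "error_rate": -1,
    "latency_p95_ms": -1,
    "task_success_rate": +1,
}


def regressions(
    baseline: Dict[str, float], current: Dict[str, float], threshold: float
) -> List[str]:
    """Return metrics whose relative degradation versus baseline exceeds `threshold`."""
    failed = []
    for name, direction in METRIC_DIRECTIONS.items():
        base, cur = baseline[name], current[name]
        if base == 0:
            # Relative change is undefined at zero; treat any worsening as a regression.
            worse = (cur - base) * direction < 0
        else:
            # Positive value means "worse than baseline", as a fraction of the baseline.
            degradation = (base - cur) * direction / abs(base)
            worse = degradation > threshold
        if worse:
            failed.append(name)
    return failed


def decide(
    baseline: Dict[str, float], current: Dict[str, float], threshold: float = 0.05
) -> str:
    return "FAIL" if regressions(baseline, current, threshold) else "PASS"


def write_decision(decision: str) -> None:
    """Publish `decision` as a step output; fall back to stdout outside GitHub Actions."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as fh:
            fh.write(f"decision={decision}\n")
    else:
        print(f"decision={decision}")
```

The script should exit with status 0 for both PASS and FAIL. If it exits non-zero on FAIL, GitHub Actions stops the job before the rollback step runs.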
Switch instantly with blue-green deployments
Blue-green deployment maintains two complete environments: blue represents the current production version, green represents the new version. You deploy and validate the green environment fully before switching any traffic to it. Once validation passes, switch 100% of traffic from blue to green instantly. If problems emerge, switch back to blue immediately.
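The core invariant of blue-green is that exactly one environment receives all traffic while the other stays ready. This in-memory model is illustrative only and doesn't call any Azure API. `BlueGreenRouter` and its methods are hypothetical names used to show the validate-then-switch rule and the one-step rollback.

```python
from dataclasses import dataclass, field

COLORS = ("blue", "green")


@dataclass
class BlueGreenRouter:
    """Illustrative model: one color serves 100% of traffic, the other stays idle."""
    live: str = "blue"
    # Blue is the current production version, so it starts out validated.
    validated: set = field(default_factory=lambda: {"blue"})

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def mark_validated(self, color: str) -> None:
        if color not in COLORS:
            raise ValueError(f"unknown color: {color}")
        self.validated.add(color)

    def switch_to(self, color: str) -> str:
        """Send all traffic to `color`; return the previously live color for rollback."""
        if color not in COLORS:
            raise ValueError(f"unknown color: {color}")
        if color not in self.validated:
            raise RuntimeError(f"{color} has not passed validation")
        previous, self.live = self.live, color
        return previous
```

Rollback is a single call to `switch_to` with the color returned by the forward switch, which mirrors sending traffic back to the blue endpoint.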
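This strategy works well for configuration changes, infrastructure updates, or urgent security fixes where you need immediate full deployment or immediate full rollback. Use Azure Traffic Manager or Azure Front Door to switch traffic between the blue and green endpoints. Traffic Manager switches at the DNS level, so clients move over as their cached DNS records expire (governed by the profile's TTL). Azure Front Door switches at the HTTP layer.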
The traffic switch happens by updating the Traffic Manager profile to route all requests to the green endpoint:
```bash
# Switch traffic from blue to green.
# Enable green first so the profile always has an enabled endpoint to answer DNS queries.
az network traffic-manager endpoint update \
  --name green-endpoint \
  --profile-name fabrikam-agents \
  --resource-group fabrikam-production \
  --type azureEndpoints \
  --endpoint-status Enabled

# Then take blue out of rotation; it stays deployed for fast rollback.
az network traffic-manager endpoint update \
  --name blue-endpoint \
  --profile-name fabrikam-agents \
  --resource-group fabrikam-production \
  --type azureEndpoints \
  --endpoint-status Disabled
```
Blue-green deployments require maintaining double the infrastructure during the transition period, but they provide the fastest rollback path: if the green version causes problems, re-enable the blue endpoint and disable the green one.
Control feature activation with feature flags
Feature flags decouple code deployment from feature activation. Deploy the new agent code everywhere, but activate new behaviors only for specific tenants using runtime configuration. This approach enables gradual feature rollout without redeployment and provides instant feature deactivation if problems occur.
Store feature flags in Azure App Configuration and configure agents to check flag state before using new behaviors:
```python
# agents/orchestrator/feature_flags.py
import os
from typing import Optional

from azure.appconfiguration import AzureAppConfigurationClient
from azure.core.exceptions import ResourceNotFoundError


class FeatureFlagManager:
    """Read on/off switches stored as plain key-values in Azure App Configuration.

    This uses a custom key convention rather than App Configuration's built-in
    feature flag format (keys under the `.appconfig.featureflag/` prefix).
    """

    def __init__(self, connection_string: str):
        self.client = AzureAppConfigurationClient.from_connection_string(
            connection_string
        )

    def _read_bool(self, key: str) -> Optional[bool]:
        """Return the setting as a bool, or None if the key doesn't exist."""
        try:
            setting = self.client.get_configuration_setting(key=key)
        except ResourceNotFoundError:
            return None
        return (setting.value or "").strip().lower() == "true"

    def is_enabled(self, feature_name: str, tenant_id: Optional[str] = None) -> bool:
        """Check if a feature is enabled for a specific tenant or globally."""
        # A tenant-specific override takes precedence over the global setting.
        if tenant_id:
            override = self._read_bool(f"features:{feature_name}:tenants:{tenant_id}")
            if override is not None:
                return override
        # Fall back to the global flag; a missing flag means "off".
        return bool(self._read_bool(f"features:{feature_name}:enabled"))


# Usage in agent code (current_tenant, code, and the scan functions are defined elsewhere)
feature_flags = FeatureFlagManager(os.environ["APPCONFIG_CONNECTION_STRING"])
if feature_flags.is_enabled("advanced_security_scan", tenant_id=current_tenant):
    # Use new security scanning model
    result = advanced_security_scan(code)
else:
    # Use stable security scanning model
    result = standard_security_scan(code)
```
Feature flags provide the most granular control over rollout. Activate the new security scanning model for Fabrikam's internal testing first, then for select beta customers, then for all customers—without touching deployment pipelines.
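The internal, beta, then everyone sequence can be modeled as ordered rings, where activating a ring keeps every earlier ring enabled. This sketch is hypothetical: the ring names and tenant IDs are placeholders. In the `FeatureFlagManager` key scheme, each ring's members would translate into `features:{feature_name}:tenants:{tenant_id}` overrides, and the final "all" ring corresponds to the global `features:{feature_name}:enabled` key.

```python
from typing import Mapping, Set

# Activation order: each ring includes every ring before it.
RING_ORDER = ("internal", "beta", "all")


def is_enabled_for(
    tenant_id: str, active_ring: str, ring_members: Mapping[str, Set[str]]
) -> bool:
    """Return True if `tenant_id` falls inside the currently active rollout ring."""
    if active_ring not in RING_ORDER:
        raise ValueError(f"unknown ring: {active_ring}")
    if active_ring == "all":
        return True
    active_index = RING_ORDER.index(active_ring)
    return any(
        tenant_id in ring_members.get(ring, set())
        for ring in RING_ORDER[: active_index + 1]
    )
```

Advancing the rollout means changing only `active_ring`; tenant membership lists stay stable between stages.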
Combine strategies for comprehensive risk management
Use multiple progressive deployment strategies together. Deploy the new agent version to staging with full blue-green infrastructure. After staging validation passes, deploy to production using canary rollout. Control which customers see the new behavior using feature flags. This layered approach provides defense in depth: infrastructure validation, behavioral validation at scale, and per-tenant activation control.
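At request time, the canary and feature-flag layers compound: a request gets the new behavior only if it lands on the canary revision and its tenant has the flag enabled. The following is an illustrative approximation. `route_request` simulates weighted routing with a random draw and isn't how Azure Container Apps implements traffic splitting internally; both function names are hypothetical.

```python
import random


def route_request(canary_weight: int, rng: random.Random) -> str:
    """Approximate weighted splitting: pick 'canary' with probability canary_weight/100."""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    return "canary" if rng.randrange(100) < canary_weight else "stable"


def uses_new_behavior(revision: str, flag_enabled_for_tenant: bool) -> bool:
    """New behavior needs both layers: the canary code and an enabled tenant flag."""
    return revision == "canary" and flag_enabled_for_tenant
```

If a 25% canary weight is combined with a flag enabled for tenants that generate 10% of traffic, and tenant traffic is spread evenly across revisions, roughly 2.5% of requests exercise the new behavior.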
Key takeaways
- Canary deployment routes a small percentage of traffic to the new version while monitoring quality gates before increasing exposure.
- Quality gates compare behavioral metrics—evaluation scores, error rates, and task success rates—against baseline thresholds to approve or reject rollouts.
- Blue-green deployment maintains two complete environments for instant traffic switching and immediate rollback capability.
- Feature flags decouple deployment from activation, enabling per-tenant feature rollout without redeployment.
- Layered strategies combine canary, blue-green, and feature flags for defense-in-depth risk management.