SWA managed Functions serving stale code after successful deploys — Standard tier, UK South

Muhammad Rizwan 0 Reputation points
2026-05-24T15:14:02.1166667+00:00

Setup: Azure Static Web Apps Standard tier, UK South region (Microsoft.Web/staticSites). Affected endpoint: anonymous public signup POST on our SWA managed Functions deployment.

Summary

After multiple successful SWA Actions deploys (most recent: a commit containing fresh diagnostic logging), the running Functions worker continues to execute old code. Diagnostic log markers added in the new deploy do not appear in App Insights, even though the deploy reports success and the Oryx build log confirms a fresh artifact build.

Reproduction window: 2026-05-25 between approximately 12:00 and 15:32 UK time (UTC+1). Most recent reproduction:

  • Deploy via GitHub Actions: completed approximately 12:57 UTC on 2026-05-25
  • Cold-start retry: 2026-05-25 14:32:34 UTC
  • App Insights shows endpoint executed at 14:32:35.192 UTC
  • Old code path's warning log fired at 14:32:35.192 UTC (proving OLD code path executed)
  • New diagnostic log markers from new code: absent

Reproduction steps

Reproduced 4 times across 90 minutes plus a separate 60-minute cold-start retry:

  1. Code change committed and pushed to main
  2. SWA Actions workflow runs to completion (Status: Succeeded, ~1m30s duration)
  3. Oryx build log shows new commit SHA building fresh artifacts
  4. POST to the API endpoint via a frontend form
  5. App Insights shows endpoint executed, but old code path runs (warning log from old behaviour fires)
  6. New diagnostic log markers do not appear in any App Insights query
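
The diagnostic markers in steps 5 and 6 can be made version-specific so that an App Insights query distinguishes old code from new. The following is a minimal sketch, not the code used in this thread: `BUILD_SHA`, `marker`, and `handleSignup` are hypothetical names, and the build identifier would have to be baked in by the deploy workflow.

```javascript
// BUILD_SHA is a hypothetical variable; the deploy workflow would need to
// write it into the bundle or the environment at build time.
const BUILD_ID = process.env.BUILD_SHA || "unknown";

// Build a log line that carries the build ID and the scope it was emitted from.
function marker(scope, message) {
  return `[diag build=${BUILD_ID} scope=${scope}] ${message}`;
}

// Emitted once per process, when the module is first loaded.
console.log(marker("module-load", "signup module loaded"));

// Hypothetical handler: the entry marker is the first executable statement.
function handleSignup(log = console.log) {
  log(marker("handler", "signup handler entered"));
  return { status: 202 };
}

handleSignup();
```

Searching traces for `scope=module-load` and `scope=handler` separately shows whether a given worker loaded the new file and whether the new function body actually ran.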

Hypotheses already ruled out

R1. Node.js require()-cache lifecycle stuck on warm worker.

Ruled out: 60+ minute idle window with zero endpoint traffic, then single cold-start retry, still serves old code. Cold-start by definition forces fresh require().
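
For reference, the require()-cache behaviour that R1 relies on can be reproduced locally. This is a sketch under standard Node.js semantics only; the temp directory and file name are made up for illustration, and nothing here touches SWA.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write a throwaway module to a temp directory (hypothetical file name).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-cache-"));
const modPath = path.join(dir, "shared.js");

fs.writeFileSync(modPath, 'module.exports = { version: "v1" };');
const first = require(modPath).version;

// Overwrite the file on disk; the same process still returns the cached export.
fs.writeFileSync(modPath, 'module.exports = { version: "v2" };');
const cached = require(modPath).version;

// Evicting the cache entry (or starting a new process) forces a re-read.
delete require.cache[require.resolve(modPath)];
const reloaded = require(modPath).version;

console.log({ first, cached, reloaded });
// prints { first: 'v1', cached: 'v1', reloaded: 'v2' }
```

A warm worker corresponds to the `cached` case; a cold start begins with an empty cache, which is why stale code after a genuine cold start points away from the require() cache.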

R2. SWA Actions deploy failure.

Ruled out: all 4 deploys reported success in GitHub Actions. Oryx build log explicitly confirms the new commit SHA is what was built.

R3. App settings causing module init to skip.

Ruled out: appsetting toggle (DIAG_FORCE_RESTART=<unix timestamp>) saved successfully and visible in az staticwebapp appsettings list output. Did not change running behaviour.

R4. Configurable restart via CLI.

Ruled out: az staticwebapp does not expose restart/stop/start/reload subcommands. No separate Microsoft.Web/sites Function App resource exists (confirmed via az resource list -g <our-resource-group>). All Function code encapsulated in the SWA resource.

Remaining hypotheses (cannot verify from customer side)

  • H1. WEBSITE_RUN_FROM_PACKAGE-style read-only mount serving stale package
  • H2. Build artifact / CDN caching layer not invalidating on deploy
  • H3. Deployment slot / environment split where deploys land in one slot but live URL serves another

These involve platform internals not visible from customer-side CLI or portal investigation.

Workaround status

  • Step 2 (trivial file-content change to invalidate hash-based caches): IN PROGRESS, will update this thread with results
  • Step 3 (SWA SKU toggle Standard → Free → Standard): NOT YET ATTEMPTED pending guidance on whether disruption is justified
  • No confirmed working workaround as of post submission

Looking for guidance on

A confirmed CLI or portal mechanism (or sequence of mechanisms) that forces SWA managed Functions to serve the most recent deployed code, applicable in scenarios where standard deploy mechanisms have apparently completed successfully but old code continues to run.

Specifically:

  • Which mechanism does SWA managed Functions use to serve Function code (RunFromPackage / direct filesystem / other)?
  • What does and does not force a code-refresh on the running worker?
  • If the supported mechanism is "scale tier to Free and back" or similar disruptive operation, is this the supported path?
  • If the mechanism does not exist and Functions configuration detachment/reattachment is required, what is the exact procedure?

Context

Working through pre-production verification discipline before broader rollout. Establishing deploy reliability with the platform is part of that.

Diagnostic data available on request

  • Full Oryx build logs from GitHub Actions
  • App Insights query results with timestamps showing the absent log markers
  • Complete log of all reproduction attempts with timestamps
  • Output of all az staticwebapp investigation commands run
  • Resource details available privately to Microsoft engineers if needed for backend correlation

Community engagement

If anyone else has encountered this pattern on SWA managed Functions, would value hearing your resolution path. Happy to help this thread become a community reference if we find a working mechanism.

Azure Static Web Apps

An Azure service that provides streamlined full-stack web app development.


6 answers

  1. Muhammad Rizwan 0 Reputation points
    2026-05-26T17:17:20.4866667+00:00

    @Golla Venkata Pavani Apologies for the previous reply, which was accidentally pasted twice. The intended message is the first occurrence; please disregard the duplicate.

  2. Muhammad Rizwan 0 Reputation points
    2026-05-26T17:12:07.3833333+00:00

    Hi @Golla Venkata Pavani, thank you for the detailed analysis and the explicit recommendation that BYOF is the supported architecture for our requirements. We have now executed two tactical bypass attempts per your recommendation framing; both failed, and the evidence empirically confirms your diagnosis.

    Test results. Two code-side refactors against /api/public/request-access, each verified on a fresh post-deploy worker:

    • Pattern A (PR-1.7). Module-object property dereference replacing the destructured shared-module import (emailModule.getSuperAdminEmails(...)). Worker cloud_RoleInstance=0--a713afc4-..., first-seen +83 seconds after deploy completion. The module-load marker fired confirming the new shared module was in the require cache, but the function-body entry marker did not fire. The caller's .catch(() => []) resolved to an empty array without the source-defined function body executing.
    • Pattern B (PR-1.7b). Inlined the SELECT directly in the caller file, eliminating the shared-module call entirely. Worker cloud_RoleInstance=0--c428b6fb-..., first-seen +78 seconds after deploy completion. Same outcome — no function-body markers fired. Additionally, a context.warn log call later in the same outer async Promise.allSettled IIFE that successfully reached App Insights under Pattern A was silently dropped under Pattern B, on the same code structure at the same line of caller code — a strict-superset failure mode. Handler latency 186 ms vs Pattern A's 341 ms, consistent with the inline IIFE either not executing or short-circuiting before any work.

    The strict-superset finding is the decisive piece of evidence. Under standard Node/V8 semantics, replacing a shared-module function call with an equivalent inlined async IIFE cannot cause a previously-working synchronous log call later in the same outer IIFE to silently fail. This is consistent with the managed Functions runtime exhibiting behaviour inconsistent with documented Node module semantics under our configuration (system-assigned MI + Key Vault references on the API layer). No code-side mitigation we have tried has survived it. Your BYOF recommendation is confirmed.
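
The baseline behaviour referred to above can be checked in isolation. The caller code itself is not shown in this thread, so the following is a guessed minimal shape, with a simulated failure standing in for the inlined SELECT. Under standard Node semantics the later log call in the same outer IIFE still runs.

```javascript
async function demo() {
  const logs = [];
  const warn = (msg) => logs.push(msg); // stand-in for context.warn

  const results = await Promise.allSettled([
    (async () => {
      // Inlined work that fails; the inner catch maps the failure to [].
      const rows = await (async () => {
        throw new Error("simulated SELECT failure");
      })().catch(() => []);

      // A log call later in the same outer IIFE is still reached.
      warn(`rows=${rows.length}`);
      return rows;
    })(),
  ]);

  return { logs, status: results[0].status };
}

demo().then((out) => console.log(out));
// prints { logs: [ 'rows=0' ], status: 'fulfilled' }
```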

    Full evidence trail. Our internal runbook §3.6 documents both tests side-by-side: test conditions, full marker tick tables, fresh-worker cross-check timestamps, the handler-latency delta evidence, and the canonical statement of the strict-superset finding in §3.6.4. Permalink to the file at the current main commit:

    https://github.com/mrizwanvu/esendplan-app/blob/843ff07/docs/runbooks/swa-managed-functions-deploy-refresh.md

    Scroll to §3.6 (after §3.5). If the repository's access requires a collaborator grant on your end, please let us know and we will arrange.

    Migration timeline. We are scheduling the BYOF migration in the coming week, following a five-phase plan: baseline measurements, resource provisioning, parallel deploy, ≥5 business days of parallel-run validation, decommission of the managed Functions configuration. The Phase 1 baseline inventory is in progress as of this reply.

    Three forward-looking architecture questions to inform the migration design, where your team's experience with similar workloads exceeds our local guesswork:

    1. Hosting plan tier. UK Local-Authority-facing SaaS at pre-pilot, ~50 HTTP-triggered functions, system-assigned MI + Key Vault references, Postgres Flexible Server in UK South, bursty traffic around morning school sign-in. Is Consumption-plan BYOF sufficient initially, or do you recommend Premium from day one?
    2. Cutover ordering with existing MSI + KV references. Recommended ordering for granting Key Vault access to the new Function App's MSI relative to its first deploy. Also: does az staticwebapp functions link fully purge any managed-Functions deployment cache before the new Function App becomes authoritative, or is there state coupling we should plan around?
    3. Monitoring for LA DPO compliance. Recommended Azure Monitor / App Insights configuration for a Function App handling UK Local-Authority data — dashboards you would keep always-on for a procurement reviewer, default alert rules for a Function App handling LA data, and any built-in Functions metrics that map cleanly to UK GDPR Article 32 "appropriate technical measures" reporting.

    We will report back with cutover confirmation and ≥5-day-stable parallel-run results, at which point this ticket can be closed. Thank you again for the substantive guidance; the explicit written recommendation has been valuable in scoping the migration internally.

  3. Muhammad Rizwan 0 Reputation points
    2026-05-26T12:15:09.3766667+00:00

    Hi Pavani, thanks for following up.

    Your previous guidance was substantively helpful. We're adopting the BYOF long-term recommendation and have scheduled the migration. In the interim, we're applying a dynamic property-access refactor to mitigate the stale-binding symptom. The SKU toggle workaround was blocked by active Managed Identity (MI must be disabled before SKU change, which requires re-granting Key Vault access policies post-toggle), which is worth noting for future customers in similar configurations. I'll post back to this thread once the BYOF migration is complete, in case it helps others encountering the same managed Functions + MI + Key Vault references combination.

    Thanks again for the engagement.

  4. Muhammad Rizwan 0 Reputation points
    2026-05-24T20:11:31.2433333+00:00

    Update — sharper evidence after further diagnostics

    This is a follow-up update to the original investigation in this thread. The original 4 ruled-out hypotheses (R1–R4) still hold; the new evidence below sharpens the open question substantially.

    Two further workarounds attempted and empirically falsified: (a) trivial file-touch on the caller function file to change its file hash and force re-binding to the freshly-loaded shared module; (b) a full source-folder rename (e.g. api/ → api2/) plus matching workflow YAML update (api_location), to force the SWA managed Functions deploy pipeline to treat the deployment as a brand-new managed-functions registration. Both deployed cleanly. Both failed to resolve the symptom.

    Four facts verified in App Insights for the latest attempt:

    1. A new Function host worker spawned aligned with the latest merge-to-main deploy completion. The worker's "Initializing Warmup Extension" log timestamp is within seconds of the deploy run finishing. Confirmed via a new cloud_RoleInstance GUID. Fresh process, not warm-worker continuation.
    2. The shared module loaded fresh in the new worker. A console.log at module-load time (line 1 of the shared module, outside any function body) appears in App Insights traces under the new worker's cloud_RoleInstance shortly after the warmup-init log. Node's require() returned new file contents to this process.
    3. The caller function executed successfully on the new code path. A unique log line from inside the caller's function body fires, and a database INSERT that depends on a new permissions pattern from the same PR succeeds. The caller file is the new version.
    4. The shared module's function-body diagnostic markers do NOT fire. A log('...entered v2') placed as the first executable statement inside the shared module's exported function — before any try/catch — is absent from App Insights, while logs from immediately before the call (in the caller) and immediately after the call (in the caller's empty-result branch) both fire normally. Under standard Node.js semantics this should not occur: if the function were called, the entry log must fire before any failure mode.
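
The ordering argument in fact 4 can be shown with a small local sketch. `getSuperAdminEmails` here is a hypothetical stand-in that always fails after its entry log; it is not the real shared-module function.

```javascript
async function demo() {
  const logs = [];
  const log = (msg) => logs.push(msg);

  // Hypothetical stand-in for the shared module's exported function.
  async function getSuperAdminEmails() {
    log("shared: entered v2"); // first executable statement, before any try/catch
    throw new Error("simulated failure after entry");
  }

  log("caller: before call");
  const emails = await getSuperAdminEmails().catch(() => []);
  if (emails.length === 0) log("caller: empty-result branch");
  return logs;
}

demo().then((logs) => console.log(logs));
// prints [ 'caller: before call', 'shared: entered v2', 'caller: empty-result branch' ]
```

If the function is invoked at all, the entry line sits between the two caller lines. The trace described above has the two caller lines without the middle one.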

    The module-load layer is healthy: fresh workers reliably load the new shared module on every fresh deploy. The caller-binding layer is the active pathology: destructured function references inside caller files appear to remain bound to a pre-deploy version of the exported function even after a fresh worker process spawn that successfully reloads the shared module. PR-level workarounds that change the caller's file hash, the shared module's file hash, and even the entire source folder's path do not clear this binding.
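
For context on what a destructured binding means in plain Node.js, here is a sketch with an in-memory object standing in for the shared module's exports. It shows binding capture within a single process only and does not reproduce or explain the cross-process symptom described here.

```javascript
// Hypothetical stand-in for the shared module's exports object.
const emailModule = {
  getSuperAdminEmails: () => "v1",
};

// Destructuring copies the function reference once, at this line.
const { getSuperAdminEmails } = emailModule;

// Simulate the export being replaced later in the same process.
emailModule.getSuperAdminEmails = () => "v2";

const viaDestructured = getSuperAdminEmails(); // still the v1 function
const viaProperty = emailModule.getSuperAdminEmails(); // looks up the current export

console.log({ viaDestructured, viaProperty });
// prints { viaDestructured: 'v1', viaProperty: 'v2' }
```

A property dereference at call time always resolves the current export, which is the reasoning behind trying a dynamic property-access refactor as a mitigation.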

    Question for the SWA team: what platform mechanism caches the caller's function binding across what is otherwise a clean worker process restart with a fresh module load? Plausible candidates I cannot confirm without platform-internal visibility — @azure/functions handler registry capturing references at registration time; worker-pool pre-warming inheriting handler bindings from a parent process; or a managed-Functions deployment-pipeline layer caching function metadata above the Node process itself.

    One configuration observation worth noting: az staticwebapp environment list shows several preview environments persisting in Ready status from old feature branches (some weeks old, source PRs closed). Each has its own distinct hostname so URL-level routing is not the concern. The open question is whether managed-Functions worker compute is isolated per-environment or shared across environments in a single SWA resource. If shared, function-handler bindings established by an earlier preview-environment deploy could conceivably persist into the production worker pool. I cannot test this from the customer side.

    Happy to share full repro details — cloud_RoleInstance GUIDs, exact timestamps, full App Insights query results — privately via support channel for backend correlation.

  5. AI answer

    2026-05-24T15:14:24.0366667+00:00

    For Static Web Apps with managed Functions, the supported way to force new code to run is to ensure that:

    1. The latest build artifacts are actually being used by the SWA environment.
    2. The Functions runtime has restarted and loaded the new artifacts.

    From the available information, the following mechanisms and checks are supported.

    1. Confirm the deployment actually produced the expected artifacts

    Static Web Apps uses the Oryx build process and then validates the configured folders:

    • app_location – root of the frontend source
    • output_location – build output folder under app_location
    • api_location – root of the Azure Functions app for managed functions

    If these are misconfigured, SWA can successfully run the workflow while still serving older artifacts.

    Actions:

    • In the GitHub Actions workflow, verify that app_location, output_location, and api_location match the repo layout and that the Functions app is under the configured api_location.
    • Check the workflow logs for any of the folder-configuration errors listed in the troubleshooting article (for example, “App Directory Location: '…' is invalid” or “Either no API directory was specified, or the specified directory wasn't found”).

    If api_location is wrong or missing, the build may succeed but the live /api route continues to use the previous managed Functions app.

    2. Verify that the managed Functions app is the one serving /api

    For SWA managed Functions:

    • The /api route is fixed.
    • The source folder for the managed Functions app is controlled by api_location in the workflow.

    If api_location is changed or misaligned, SWA can end up with an older managed Functions app still serving /api while new builds are going to a different folder that is not wired to the live route.

    Actions:

    • Confirm that the api_location in the workflow is stable and points to the same folder across all commits.
    • Ensure there is no second Functions app or alternative API configuration (such as API Management or App Service) mapped to /api that might be serving an older backend.

    3. Use Application Insights to confirm which code path is running

    Application Insights is the supported way to see runtime behavior for both Static Web Apps and managed Functions:

    • For SWA, enable Application Insights on the static web app.
    • Use the Failures and operation details views to inspect the exact timestamps and stack traces for the /api endpoint.

    Actions:

    • In the portal, open the resource group → Application Insights instance for the SWA.
    • Under Investigate → Failures, drill into failed or successful operations for the signup endpoint and confirm whether any new log markers appear after the deployment time.

    If no new markers appear at all, this reinforces that the live Functions host is still using older artifacts or a different API backend.

    4. What forces a code refresh for SWA managed Functions

    The platform behavior for managed Functions in Static Web Apps is:

    • The Functions code is deployed as part of the SWA build and deployment pipeline.
    • A successful SWA deployment that includes the api_location content is expected to update the managed Functions app and cause the Functions host to load the new package.

    There is no separate restart/stop/start command for managed Functions; the refresh is tied to the SWA deployment itself. The documented mechanisms to influence runtime behavior are:

    • Correcting the workflow configuration so the right folders are built and deployed.
    • Redeploying via the SWA workflow so that new artifacts are produced and swapped in.
    • Using Application Insights to verify that the new code is executing.

    5. Recommended end-to-end path for this scenario

    Given the symptoms (successful builds, old code executing), the most actionable steps supported by the documentation are:

    1. In the GitHub Actions workflow YAML:
      • Confirm app_location, output_location, and especially api_location are correct for the repo.
      • Ensure output_location is actually produced by the build (for example, the folder exists after npm run build).
    2. Re-run the SWA workflow after any configuration correction and verify in the logs that:
      • Oryx builds the expected commit.
      • No folder-configuration errors are reported.
    3. In Application Insights for the SWA:
      • Use Failures and operation details to confirm that requests after the deployment time show the new diagnostic log markers.

    If, after confirming folder configuration and redeploying, the Functions host still serves old code, this becomes a platform-side issue that requires Azure support to inspect the SWA’s internal deployment state. The public documentation does not expose an additional CLI/portal “force refresh” operation beyond correcting configuration and redeploying.

    There is no documented guidance that scaling between Free and Standard, or detaching/reattaching Functions, is required or supported as a normal mechanism to force managed Functions to pick up new code.


    AI-generated content may be incorrect. Read our transparency notes for more information.
