Disabling request or response buffering in Azure Application Gateway changes how the gateway handles the HTTP body between the client and the backend. With buffering enabled, the gateway reads the entire request or response body before forwarding it. With buffering disabled, the gateway streams the data between client and backend as it arrives. This mainly affects flow control, inspection behavior, and error handling.
As far as I recall, request buffering cannot be disabled when the Application Gateway runs the WAF_v2 SKU. The Web Application Firewall needs the full request body to evaluate its rule set, including rules that inspect POST payloads and other request content, so the platform enforces request buffering whenever WAF is enabled. In practice, request streaming is not supported on WAF-enabled gateways: the request body is always fully buffered before being sent to the backend. Disabling request buffering is therefore only possible on the Standard_v2 SKU, or on gateways where WAF is not enabled.
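For reference, the buffering flags live under the gateway's global configuration and can be set through the management plane. A rough sketch with the Python SDK, assuming a recent azure-mgmt-network (the globalConfiguration block requires API version 2021-08-01 or later); the subscription, resource group, and gateway names below are placeholders:

```python
# Sketch: toggling buffering on an Application Gateway v2 via the Python SDK.
# Placeholder names throughout; requires Azure credentials to actually run.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import ApplicationGatewayGlobalConfiguration

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")
appgw = client.application_gateways.get("my-rg", "my-appgw")

# On Standard_v2 both flags can be set to False. On WAF_v2 the platform
# keeps request buffering on, so only the response flag takes effect.
appgw.global_configuration = ApplicationGatewayGlobalConfiguration(
    enable_request_buffering=True,    # forced on when WAF is enabled
    enable_response_buffering=False,  # can be disabled even with WAF
)
client.application_gateways.begin_create_or_update(
    "my-rg", "my-appgw", appgw
).result()
```

Updating the gateway this way is a long-running operation, hence the `begin_create_or_update(...).result()` pattern.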
Response buffering is independent of this behavior because WAF primarily inspects inbound requests rather than outbound responses. As a result, response buffering can still be disabled even when WAF is enabled.
SSL termination is not affected by buffering settings. TLS negotiation, certificate handling, and decryption still occur at the gateway in the same way. The gateway continues to terminate TLS connections and forward decrypted HTTP traffic to the backend according to the configured HTTP settings. Buffering only changes how the HTTP payload is handled after decryption.
Routing behavior is also unchanged. Listener selection, host and path-based routing, rewrite rules, and backend pool selection rely on request metadata such as the host header, URI path, and other headers. These elements are available immediately when the request arrives and do not require the request body to be buffered, so disabling buffering does not alter routing decisions.
Large uploads are one scenario where request buffering would normally be a concern because buffering requires the gateway to receive the entire payload before sending it to the backend. This increases latency before the backend begins processing and increases temporary memory or disk usage on the gateway. However, because request buffering cannot be disabled when WAF is enabled, large upload workloads behind WAF-enabled gateways will still pass through the full buffering process.
Streaming responses are a common case where disabling response buffering is beneficial. Examples include server-sent events, long-running report generation that streams incremental output, media streaming, or APIs designed to deliver partial results as they become available. When response buffering is enabled, the gateway waits for the entire backend response before returning it to the client, which prevents real-time streaming and increases perceived latency. Disabling response buffering allows the gateway to forward backend data to the client as soon as it is received.
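The difference in delivery order is easy to see with a toy model. The sketch below is not Application Gateway itself, just a stand-in "gateway" between a simulated backend and client, showing that with buffering the client sees nothing until the backend has finished, while streaming interleaves production and delivery:

```python
# Toy contrast between buffered and streamed response delivery.

def backend(log):
    """Simulated backend emitting a response in three chunks."""
    for i in range(3):
        log.append(f"backend sent chunk {i}")
        yield f"chunk-{i}"

def gateway_buffered(chunks):
    """Collect the whole body before forwarding anything."""
    return list(chunks)  # client sees nothing until this completes

def gateway_streamed(chunks):
    """Forward each chunk as soon as the backend produces it."""
    yield from chunks

def client_receive(body, log):
    for chunk in body:
        log.append(f"client got {chunk}")

buffered_log, streamed_log = [], []
client_receive(gateway_buffered(backend(buffered_log)), buffered_log)
client_receive(gateway_streamed(backend(streamed_log)), streamed_log)

print(buffered_log)  # all "backend sent" entries first, then all "client got"
print(streamed_log)  # "backend sent" / "client got" alternate chunk by chunk
```

In the buffered log every backend send precedes the first client receipt; in the streamed log the two interleave, which is exactly the property SSE and incremental APIs depend on.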
Slow backend services can behave differently when response buffering is disabled. With buffering enabled, the gateway can receive the full response quickly and then transmit it to the client independently. When response buffering is disabled, the client connection is tied more directly to the backend’s response rate. If the backend produces data slowly, the client will observe that delay directly and the connection will remain open for longer periods.
Slow clients can also influence backend connection lifetimes when response buffering is disabled. With buffering enabled, the gateway can read the backend response quickly and deliver it to the client at the client’s pace. Without buffering, a slow client may cause the backend connection to remain open longer because the gateway forwards the response stream at the rate the client can receive it.
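The backpressure effect on the backend can be shown with the same kind of toy model: when there is no buffer between producer and consumer, the producer only advances at the consumer's pace. Again a sketch, not the gateway itself:

```python
# Toy backpressure demo: an unbuffered backend is paced by a slow client.
import time

def backend(yield_times):
    """Backend that could produce all chunks immediately."""
    for i in range(3):
        yield_times.append(time.monotonic())
        yield f"chunk-{i}"

def slow_client(stream, delay=0.05):
    """Client that takes `delay` seconds to consume each chunk."""
    for _ in stream:
        time.sleep(delay)

yield_times = []
slow_client(backend(yield_times))

# With no buffer in between, the backend's 2nd and 3rd chunks are only
# produced after the client has finished digesting the previous ones,
# so each gap between yields is roughly the client's delay.
gaps = [b - a for a, b in zip(yield_times, yield_times[1:])]
print(gaps)
```

With buffering, by contrast, the gateway would drain the backend at full speed and absorb the client's slowness itself, at the cost of holding the response in memory.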
Another operational consideration involves retries and failure handling. When a full request or response is buffered, the gateway may have more flexibility to retry certain backend failures or to return consistent error responses. When streaming is used, once a portion of the response has already been sent to the client, retrying the backend request or altering the response becomes much more limited.
Resource usage patterns also change. Disabling response buffering can reduce memory pressure on the gateway because full responses are not stored before transmission. At the same time, connections may remain open longer, which can increase concurrent connection counts and affect throughput characteristics.
For environments where Application Gateway fronts heterogeneous backends such as AKS services, Service Fabric applications, and Azure App Service workloads, the practical risks associated with disabling buffering mainly relate to response streaming behavior, tighter coupling between backend response speed and client delivery, and reduced flexibility in error handling once a response stream has begun.
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin