What Is 502 Bad Gateway and How to Fix It
A 502 Bad Gateway means a proxy got an invalid response from an upstream. This guide walks through what actually causes it, how to diagnose the real source (not the surface symptom), and the nginx-, Cloudflare-, AWS-, and Vercel-specific fixes I reach for when a customer’s site lights up red on the dashboard.
I have spent the last several years running production infrastructure on bare metal at Hetzner, managing reverse proxies, and more recently building Visual Sentinel (which sits outside people’s infrastructure and watches for this exact class of error). A 502 Bad Gateway is one of the more honest errors HTTP has. Its meaning is precise, the diagnosis path is deterministic, and the fix is almost always upstream of where people first look.
The trap is that “502” feels like a single thing when it’s really a family of at least seven distinct failures wearing the same uniform. In this guide I’ll cover all seven, show you how to tell them apart in thirty seconds, and walk through the platform-specific fixes for nginx, Cloudflare, AWS ALB, and Vercel.
1. What 502 Bad Gateway Actually Means
Per RFC 9110, a 502 response means the server (acting as a gateway) received an invalid response from an upstream server it contacted while trying to fulfill the request. “Invalid” covers three concrete cases:
- The upstream refused the connection outright (nothing listening).
- The upstream accepted the connection but closed it before sending a complete, well-formed response.
- The upstream returned something the gateway could not parse as HTTP (garbled headers, oversize headers, broken chunked encoding, TLS handshake failure).
Whenever you see a 502, there are always at least two servers involved: the “gateway” the browser talks to (nginx, Cloudflare, an AWS ALB, Vercel’s edge, a Kubernetes ingress) and the “upstream” behind it (your application, a microservice, a PHP-FPM pool, a Lambda). The 502 is the gateway’s way of saying “I got your request, I tried to pass it along, and the thing behind me let me down.”
This is why refreshing your browser rarely fixes it. Nothing on the visitor side caused the error; something between the gateway and the origin is broken.
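A minimal sketch of that two-server arrangement in nginx makes the terminology concrete (the server name and port here are illustrative placeholders, not from any real deployment):

```nginx
# The "gateway": the server the browser talks to.
server {
    listen 80;
    server_name example.com;

    location / {
        # The "upstream": your application. If nothing is listening on
        # this port, or the reply cannot be parsed as HTTP, nginx answers
        # the browser with 502 Bad Gateway.
        proxy_pass http://127.0.0.1:3000;
    }
}
```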
2. The Seven Real Causes of a 502
Every 502 I’ve diagnosed in the last few years has fallen into one of these seven buckets. The order below is roughly how often each one comes up in practice, not alphabetically.
Origin app crashed
The application behind your proxy (Node, PHP-FPM, Gunicorn, Puma) exited and the port stopped listening. Proxy attempts connect, gets refused, returns 502.
Worker pool exhausted
PHP-FPM / Passenger / Unicorn is alive but every worker is busy. New requests hit the backlog, the proxy gives up and returns 502.
Firewall or security group
A recently changed iptables rule, UFW, Cloudflare firewall, or AWS security group blocked the proxy’s own IP from reaching the origin port.
DNS resolution failed
If the proxy resolves the upstream by hostname (common on AWS ALB, Cloudflare, Kubernetes ingress), a DNS change or TTL expiry can silently break the path.
Timeout under load
The origin takes longer than proxy_read_timeout or proxy_connect_timeout to respond. nginx surfaces this as 502 in many configurations, not 504.
TLS handshake failure
Proxy connects over TLS, origin cert expired / hostname mismatch / self-signed and proxy rejects it. This is common when a cert autorotates and the proxy is not reloaded.
Protocol mismatch
Proxy speaks HTTP/2 to an origin that only groks HTTP/1.1, or sends large headers the origin buffer cannot hold. The origin closes the connection mid-response, proxy reports 502.
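For the TLS handshake bucket specifically, you can inspect the origin’s certificate exactly as the proxy sees it. A sketch, with an assumed origin IP and SNI name:

```shell
# Check the origin's TLS cert the way the proxy would see it.
# 10.0.0.42 and app.internal are illustrative placeholders.
echo \
  | openssl s_client -connect 10.0.0.42:443 -servername app.internal 2>/dev/null \
  | openssl x509 -noout -subject -dates \
  || echo "TLS handshake or certificate parse failed"
```

An expired notAfter date or a subject that doesn’t match the name the proxy dials is exactly the kind of "invalid response" that surfaces as a 502.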
3. If You’re a Visitor: What You Can Actually Do
Honest answer: not much. A 502 is a server-side failure and the fix lives on the site owner’s infrastructure. That said, five things are worth trying before you give up or report it:
- Hard refresh. Sometimes your browser or a CDN edge near you cached a brief error state. Press Ctrl+Shift+R (Cmd+Shift+R on Mac) to bypass cache.
- Wait 30 seconds and try again. Most 502s are transient spikes, a bounced service, a brief GC pause, a deploy in progress. Real outages tend to last longer and hit the site owner’s dashboard first.
- Try a different network. Switch from WiFi to cellular, or vice versa. If the site loads from mobile data but fails on your home WiFi, your ISP is likely routing you to a partially broken CDN edge.
- Check if it’s just you. Open our free website checker (or any “is-it-down” service) to see if the site is returning 502 for everyone or just for your region.
- Report it to the site owner. If the site has a status page, check there first. Otherwise an email or a mention on Twitter/Mastodon frequently reaches someone faster than their own monitoring (especially for small sites).
What won’t help: reinstalling your browser, switching DNS servers, or running a virus scan. None of those touch the gateway-to-origin path that broke.
4. If You Run the Site: Diagnose From Upstream Outward
The single rule that will save you hours: always diagnose 502s from the application outward, not from the edge inward. Nine times out of ten the real cause is on the origin host, and every layer above it (proxy, CDN, DNS) is just faithfully reporting “I couldn’t get an answer.”
Step 1: Is the origin even running?
SSH into the origin. Does the application process exist? Is the port listening?
# On the origin host
ss -tlnp | grep :3000 # replace 3000 with your app's port
ps aux | grep -i node # or php-fpm / gunicorn / puma
curl -v http://127.0.0.1:3000/health
If curl against localhost fails, the app itself is down or stuck. Start it (or let your process manager restart it) and watch for it to crash again. A crash loop usually shows in the app’s own logs, not nginx’s.
Step 2: Read the reverse-proxy error log
The access log will tell you a request returned 502, which you already knew. The error log will tell you why. For nginx:
tail -f /var/log/nginx/error.log
# while you reproduce the request in another terminal
curl -v https://yoursite.com/the-failing-path
Look for the word upstream. The first clause after it is the exact failure mode: connection refused, timed out, prematurely closed connection, SSL handshake failed. That one phrase narrows the root cause down to one of the seven buckets in section 2.
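When the error log is noisy, one way to see which failure mode dominates is to tally the upstream phrases. A sketch (the log path is the Debian/Ubuntu default; adjust for your distro):

```shell
# Tally nginx upstream failure phrases; the most frequent one points at
# the bucket from section 2. 2>/dev/null keeps a missing or unreadable
# log from aborting the pipeline.
grep -oE 'upstream [a-z ]+' /var/log/nginx/error.log 2>/dev/null \
  | sort | uniq -c | sort -rn
```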
Step 3: Can the proxy actually reach the origin?
From the proxy host (not from your laptop), curl the origin exactly the way nginx would:
# From the proxy host, with the same target nginx uses
curl -v http://<upstream-host>:<port>/
curl -v --resolve app.internal:443:10.0.0.42 https://app.internal/
If this fails, you have a network-level problem between proxy and origin: iptables, UFW, a Docker bridge network, a cloud security group, or a DNS change. When I moved Visual Sentinel’s production to Hetzner bare metal, I deliberately removed UFW (documented in our CLAUDE.md) because UFW and Docker’s own iptables rules conflict in ways that surface as 502s nobody can reproduce.
Step 4: Timeouts and buffers
If the origin is slow but working, nginx surfaces it as 502 in certain configurations (even though 504 would be more semantically correct). The directives that matter:
# in your nginx server{} or location{} block
proxy_connect_timeout 10s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffer_size 16k;
proxy_buffers 8 16k;
proxy_busy_buffers_size 32k;
Raising these is a diagnostic tool, not a fix. If your 502 disappears when you bump proxy_read_timeout from 10s to 60s, your real problem is a slow endpoint. Find and fix the slow endpoint, then put the timeout back.
Step 5: Restart the upstream only, not the proxy
If a worker process is stuck, bounce just the origin app. Restarting nginx in front of a broken backend fixes nothing, drops in-flight connections for healthy endpoints, and teaches you the wrong lesson. systemctl restart php-fpm, pm2 restart <app>, docker compose restart app, whichever applies.
5. Platform-Specific 502s
Each proxy, CDN, and PaaS generates 502s with its own phrasing. The error message is a strong hint at the real cause if you know how to read it.
| Platform | Error Message | Real Cause & Fix |
|---|---|---|
| nginx | upstream prematurely closed connection while reading response header | The upstream closed the socket mid-response. Usually a crashed worker or a short FCGI timeout. Check the upstream process is alive and raise fastcgi_read_timeout. |
| nginx | connect() failed (111: Connection refused) while connecting to upstream | Nothing is listening on the upstream port. The backend process is not running, crashed, or is bound to a different interface. Restart the upstream and confirm the bind address. |
| nginx | upstream timed out (110: Connection timed out) while connecting | Proxy can reach the upstream port but TCP handshake did not complete in proxy_connect_timeout. Firewall drop, network partition, or origin swamped. |
| Cloudflare | Error 502: Bad gateway (Ray ID shown) | Edge reached your origin but got an invalid response. Check your origin proxy error log for the same timestamp. If origin is healthy, confirm Cloudflare IPs are allow-listed. |
| AWS ALB | HTTP 502 with target health failing | ALB could not reach the target, target returned malformed response, or target closed the connection. Check target health in the ALB console and the target’s keep-alive timeout (must be > ALB’s idle timeout, default 60s). |
| Vercel | FUNCTION_INVOCATION_FAILED (502) | A serverless function threw or exceeded memory. Check Vercel function logs. This is origin-side, not a proxy problem. Fix the function, not the edge. |
Cloudflare specifically: a 502 generated by Cloudflare’s edge always carries a CF-Ray header and uses the orange-cloud error page template. A 502 that comes from your origin passes through Cloudflare unchanged. Inspecting the response HTML tells you immediately which one you’re looking at, and that decides which side to debug.
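A quick way to apply that test from the command line, sketched with a placeholder URL. The grep is a heuristic: Cloudflare’s error template mentions cloudflare in its HTML, while a passed-through origin error page usually does not.

```shell
# Heuristic: does the 502 body look like Cloudflare's error template?
curl -s https://yoursite.com/ \
  | grep -qi 'cloudflare' \
  && echo "edge-generated 502: debug the Cloudflare-to-origin hop" \
  || echo "origin-generated 502: debug the origin itself"
```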
6. Diagnosing a 502 with curl
The single most useful command I run when something 502s in production:
curl -v -o /dev/null -w "%{http_code} %{time_total}s %{remote_ip}\n" https://yoursite.com/
The output tells you the HTTP status, total time, and the resolved IP. Compare that IP across runs: if it changes, you’re hitting different CDN edges, and one of them might be misrouting.
To bypass the CDN and hit origin directly (useful for isolating where the 502 really comes from):
# Bypass the CDN, hit origin IP with the correct SNI
curl -v --resolve yoursite.com:443:<ORIGIN_IP> https://yoursite.com/
If the origin returns 200 but the CDN returns 502, the CDN-to-origin link is the problem. If both return 502, the origin is the problem and the CDN is innocent. Diagnosis settled in two curl calls.
7. 502 vs 504: Stop Confusing Them
These two are not the same, and telling them apart short-circuits half the diagnosis.
| Status | Meaning | What Happened |
|---|---|---|
| 502 Bad Gateway | Invalid response from upstream | Upstream crashed, refused connection, or sent a broken response. Connection was made, answer was bad. |
| 504 Gateway Timeout | Upstream too slow | Proxy made the connection fine but gave up waiting for a response within the configured timeout. |
| 503 Service Unavailable | Server refused on purpose | Origin is up but refusing requests (maintenance mode, rate limited, overloaded with an explicit backoff). |
A 502 under sustained load is sometimes really a 504 in disguise (nginx behavior). If bumping proxy_read_timeout turns your 502 into a 200, the underlying failure mode was a timeout, not a truly bad gateway.
8. How to See 502s Before Your Users Do
Here is the hard truth about 502s: your own origin health check will almost never see them. A curl localhost/health from the origin host returns 200 even when every external visitor is getting 502, because the proxy-to-origin path you just bypassed is the thing that’s broken.
The only way to catch 502s at parity with real users is to monitor the real user-facing URL from outside your own infrastructure. That’s the category Visual Sentinel sits in, and it’s why I built it the way I did: the check runs from independent regions, hits the public URL the same way a browser would, and alerts you inside 60 seconds when anything non-2xx shows up.
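Stripped to its essentials, an external check is very little code. A sketch (the URL and the alert action are placeholders; a real monitoring service does considerably more):

```shell
# Minimal external check: fetch the public URL and flag anything non-2xx.
check_url() {
  # curl prints 000 for %{http_code} when no response arrives;
  # "|| true" keeps a connection failure from aborting the script.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true)
  case "$code" in
    2*) echo "ok: $1 returned $code" ;;
    *)  echo "ALERT: $1 returned $code" ;;
  esac
}
check_url https://yoursite.com/
```

Run that from a host outside your own network on a schedule and you catch the proxy-to-origin failures your localhost health check can never see.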
Catching 502s externally
Our HTTP uptime checks from EU and US regions hit your public URL every minute, identify as VisualSentinelBot/2.0 (whitelist-safe), and record the full status code, latency, and response body for every check. If a 502 surfaces even for 60 seconds, the alert fires to your email, Slack, Discord, Telegram, WhatsApp, or webhook, whichever channels you’ve wired up.
Beyond monitoring, the big structural fixes that cut 502 rates over time: process managers that restart crashed workers automatically (pm2, systemd with Restart=always, Kubernetes restartPolicy: Always), generous but finite proxy timeouts, keep-alive tuned on both sides of the proxy-origin hop, and an origin SSL certificate whose expiry you actually monitor.
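The first of those fixes can be sketched as a systemd unit; the unit name, user, and ExecStart path below are illustrative placeholders:

```ini
# /etc/systemd/system/myapp.service (illustrative path)
[Unit]
Description=myapp origin server
After=network.target

[Service]
User=myapp
ExecStart=/usr/local/bin/myapp --port 3000
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```

With Restart=always and a short RestartSec, a crashed origin process comes back in seconds instead of waiting for a human, which turns a sustained 502 outage into a blip.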
Rai Ansar
DevOps Engineer, Founder of Visual Sentinel
I run production infrastructure on bare metal and spend most of my time thinking about why monitoring systems lie to you. Visual Sentinel exists because the uptime checks I had running against client sites missed too many real outages, and a 502 that only visitors see is the cleanest example of that gap.