What Is a 504 Gateway Timeout? Meaning, Causes, and Fixes
A 504 Gateway Timeout means a proxy waited too long for an upstream response and gave up. This guide walks through what that actually means, why it happens (it is almost always something the origin is waiting on, not the proxy itself), and how to diagnose and fix 504s on nginx, Cloudflare, AWS ALB, and Vercel.
I run production infrastructure on Hetzner bare metal, and I spend a lot of time thinking about why web requests take longer than they should. 504 Gateway Timeout is the error that shows up when a proxy loses patience with your application. It is one of the more useful errors HTTP has, because it tells you something specific: the connection was fine, the upstream was reachable, but the answer was not coming back fast enough.
That specificity matters. I wrote a companion piece on 502 Bad Gateway last week, and the distinction between 502 and 504 is the difference between “your backend is broken” and “your backend is slow.” Fixing them pulls you in opposite directions. This guide covers what 504 actually means, the seven causes I see in the wild, the platform-specific quirks (Cloudflare 524 is a separate animal), and how to diagnose one without reaching for the proxy config first.
1. What 504 Gateway Timeout Actually Means
Per RFC 9110 (section 15.6.5), a 504 response means the server, acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to contact to complete the request. The key word is “timely.” Unlike 502, the connection itself was made. The proxy spoke to the upstream. It just never got a full answer within the time limit it was willing to wait.
Whenever you see 504, two servers are in the picture: the “gateway” the browser reached (nginx, Cloudflare, an AWS ALB, Vercel’s edge, a Kubernetes ingress) and the “upstream” behind it (your app, a microservice, a serverless function). The gateway has a stopwatch. When that stopwatch hits zero before the upstream finishes responding, the gateway returns 504.
Three concrete scenarios produce a 504:
- The upstream started responding but the full response did not arrive within the read timeout.
- The upstream accepted the connection but never wrote any response at all.
- The upstream accepted the connection, streamed a partial response, then stalled for longer than the idle-between-reads timeout.
In all three, the network path works, the upstream process is alive, and your firewall rules are fine. The problem is purely time. That is why 504 and 502 demand completely different diagnosis paths, and why conflating them costs hours.
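The distinction is easy to see with raw sockets. This small self-contained Python sketch (standard library only; all names are illustrative) plays both roles: an upstream that accepts the connection and then goes silent, and a gateway whose read timeout expires:

```python
import socket
import threading
import time

def silent_upstream(ports, ready):
    """An upstream that accepts the TCP connection but never writes a
    response -- the exact condition a gateway translates into a 504."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))          # OS picks a free port
    srv.listen(1)
    ports.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()              # handshake succeeds: no 502 here
    time.sleep(5)                       # ...but the answer never comes
    conn.close()

ports, ready = [], threading.Event()
threading.Thread(target=silent_upstream, args=(ports, ready), daemon=True).start()
ready.wait()

gateway = socket.create_connection(("127.0.0.1", ports[0]), timeout=2)
gateway.settimeout(1.0)                 # the gateway's stopwatch
gateway.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
try:
    gateway.recv(4096)
    verdict = "got a response"
except socket.timeout:
    verdict = "504 Gateway Timeout"     # the stopwatch hit zero first
print(verdict)
```

The handshake succeeding while `recv` times out is exactly the 502-versus-504 line: a refused or reset connection would have been the 502 path.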
2. The Seven Real Causes of a 504
Every 504 I have diagnosed in the last few years has fallen into one of these seven buckets. In rough order of how often each comes up in practice, slow database queries and blocked worker pools dominate, with external API dependencies a close third.
Slow database query
A missing index, a full table scan, or a lock-heavy transaction turns a 20ms query into a 90s wait. Every request hitting that code path now times out at the proxy.
Worker pool saturated
PHP-FPM, Gunicorn, Puma, or Node event-loop workers are all blocked on slow work. New requests queue up behind them until the proxy gives up and returns 504, even for endpoints that are normally fast.
External API dependency
Your code makes a synchronous call to a third party (Stripe, OpenAI, a CRM, a payment gateway) and that third party is slow or hanging. Your request hangs with it until your proxy’s read timeout trips.
Long-running legitimate work
Report generation, file uploads, CSV exports, or PDF rendering genuinely takes longer than your proxy_read_timeout. The work is fine, the HTTP request path is the wrong place for it.
Origin overloaded
CPU pegged at 100%, memory thrashing, swap hit. The app is still running but every request is slow. Proxy timeouts trip en masse. Often load-related and time-correlated with a traffic spike.
Network path partitioned
A cloud network link between proxy and origin is lossy or slow. TCP retransmits burn seconds off the timeout budget. Less common on the same LAN, relatively common across regions or VPC peering.
Backend stuck in GC or pause
A JVM stop-the-world GC, a long Ruby GC pause, or a synchronous fs call in Node.js freezes the worker for several seconds. If it happens often enough, it looks like intermittent 504s to your users.
Worth noting: 504s are rarer than 502s in the wild. Across the last 2.5 million HTTP checks Visual Sentinel has run, we captured 4 genuine 502 Bad Gateway responses and zero 504s. That reflects how most production systems are tuned: a crashed backend fails fast (502), while a saturated backend queues requests and only produces a 504 once the proxy gives up. When 504 does hit, it is often a tail-latency event, and the causes above are the usual suspects.
3. If You’re a Visitor: What You Can Actually Do
Honest answer: not much. A 504 is a server-side failure and the fix lives on the site owner’s infrastructure. Unlike connectivity errors, nothing on your device is contributing. That said, four things are worth trying before you give up or report it:
- Wait a minute and try again. Most 504s are tied to a momentary spike on the origin: a slow query, a GC pause, a deploy that hasn’t warmed up. They often clear within 60 to 120 seconds without any human action.
- Hard refresh. Ctrl+Shift+R (Cmd+Shift+R on Mac). A CDN edge near you might have briefly cached the 504. A hard refresh forces a new round trip that may hit a different edge or catch the origin after it has recovered.
- Try a different network. Switch between WiFi and cellular. If the site loads from one and not the other, you might be on a CDN edge that is being routed to an unhealthy origin region.
- Check if it’s just you. Open our free website checker (or any “is-it-down” service) to see if the site is returning 504 for everyone or only from your region. If only you are seeing it, report your geographic region to the site owner; it narrows their search.
Things that won’t help: reinstalling your browser, switching DNS servers, running a virus scan, clearing cookies. None of them touch the slow upstream path that caused the timeout.
4. If You Run the Site: Diagnose From Upstream Outward
The single rule that will save you hours: never raise a proxy timeout before you understand what the upstream is waiting on. The timeout is a symptom. Raising it from 60s to 300s just makes the symptom quieter for five minutes at a time instead of one.
Step 1: Reproduce the 504 and time it
The first thing I always run when 504 alerts fire:
```shell
curl -v -o /dev/null \
  -w "%{http_code} %{time_total}s %{remote_ip}\n" \
  --max-time 180 \
  https://yoursite.com/failing-path
```

The `time_total` number is the real signal. If the 504 comes back after ~60s, your proxy timeout is 60s. If it returns after ~100s, you are probably behind Cloudflare. The timing tells you which layer is giving up, and that narrows the fix dramatically.
Step 2: Hit the origin directly, bypassing the proxy
Get onto the origin host and curl the app itself, with a long max time so you can see the real duration:
```shell
# On the origin host
curl -v -o /dev/null \
  -w "%{http_code} %{time_total}s\n" \
  --max-time 300 \
  http://127.0.0.1:3000/failing-path
```

If the origin takes 90 seconds to respond with 200 OK, your application is the bottleneck, not your proxy. The 504 at the edge is a correct answer to a slow upstream. Now the real question: why is that endpoint slow?
Step 3: Identify the specific slow dependency
This is where the actual work happens. The upstream is waiting on one of:
- A slow database query. Enable `log_min_duration_statement` in Postgres or the slow query log in MySQL and reproduce the request. The query above your threshold is the cause.
- An external API call. Wrap outbound HTTP calls with a short timeout and log every one that exceeds it. When Stripe or a CRM is slow, your requests are slow too.
- A lock. A long-held row lock or table lock serialises every request that wants the same row. Look for `pg_locks` contention in Postgres or `SHOW PROCESSLIST` output in MySQL.
- Blocking I/O on the event loop (Node.js) or the GIL (Python with sync I/O). A single sync `fs.readFileSync` or a heavy JSON parse in a hot path halts every other request.
Step 4: Check worker concurrency
If one endpoint is slow, it starves the others. PHP-FPM, Gunicorn, Puma all have a fixed pool of workers. Slow endpoints grab workers, fast endpoints end up queued, and you see 504 on everything during a load spike.
```shell
# PHP-FPM
curl http://localhost/fpm-status

# pm2
pm2 list

# Gunicorn / Unicorn
ps aux | grep -i gunicorn | wc -l
```

If every worker is busy, you can temporarily bump the pool size to relieve the symptom, but do not ship that as the fix. Find the slow endpoint, address the root cause, then put concurrency back at a sane level.
Step 5: Read the proxy error log for the exact directive
For nginx, tail the error log while you reproduce the 504:
```shell
tail -f /var/log/nginx/error.log
```

The first line that mentions `upstream` tells you which timeout tripped: `upstream timed out (110: Connection timed out) while reading response header from upstream` is `proxy_read_timeout`, `while connecting to upstream` is `proxy_connect_timeout`, and `while sending request to upstream` is `proxy_send_timeout`. Different directives point at different failure modes.
Step 6: Decide, fix it or move it off the request path
If the operation should be fast (a product page, a dashboard query), fix the slowness: add the index, cache the API call, fix the lock. If the operation is legitimately long (report generation, video encoding, bulk CSV import), move it to a background job and return a 202 Accepted with a job ID immediately. Do not force users to hold an HTTP connection open for two minutes. That is what BullMQ, Sidekiq, and Celery exist for.
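The move-it-off-the-request-path pattern can be sketched in a few lines. In production you would reach for BullMQ, Sidekiq, or Celery; this hedged Python version with a bare thread and an in-memory job store just shows the shape (all names are illustrative):

```python
import threading
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ..., "thread": ...}

def submit(task, *args):
    """Register the work and return immediately. An HTTP handler using this
    would respond 202 Accepted with the job_id instead of blocking."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def run():
        jobs[job_id]["result"] = task(*args)
        jobs[job_id]["status"] = "done"

    t = threading.Thread(target=run)
    t.start()
    jobs[job_id]["thread"] = t
    return job_id

def generate_report(n):
    # Stand-in for minutes-long work (report generation, CSV export, ...)
    return sum(range(n))

job_id = submit(generate_report, 1000)
# The client polls GET /jobs/<job_id> until status == "done".
jobs[job_id]["thread"].join()   # here only to make the demo deterministic
print(jobs[job_id]["status"])
```

The HTTP request finishes in milliseconds regardless of how long the work takes, so no proxy timeout can trip; the client learns the outcome by polling or via a webhook.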
5. Platform-Specific 504s
Every proxy, CDN, and PaaS has its own timeout budget and its own way of expressing it. Knowing which directive matters on your stack turns a mysterious 504 into a one-line fix.
| Platform | Directive | Default | Notes |
|---|---|---|---|
| nginx | proxy_read_timeout | 60s | Max idle time between two successive reads from the upstream. Most common 504 cause. Scope to specific location blocks, not globally. |
| nginx | proxy_connect_timeout | 60s | Time to establish the TCP handshake with upstream. Above 75s has no effect. If this trips, the upstream is hard-down or firewalled, not slow. |
| nginx | proxy_send_timeout | 60s | Max idle time between two successive writes to upstream. Only bites on request bodies the upstream is slow to consume (large uploads). |
| Cloudflare | Proxy timeout (plan-level) | 100s | Free / Pro / Biz cap. Enterprise can raise. If your origin takes longer than 100s, Cloudflare returns 524 (not 504), and you need a background job, not a bigger timeout. |
| AWS ALB | Idle timeout | 60s | Applies to the client-to-ALB and ALB-to-target connections. Your target’s keep-alive must be longer than this, otherwise ALB reuses a half-closed connection and returns 504. |
| Vercel | Function max duration | 10s (Hobby), 15s (Pro), 900s (Enterprise) | Hard per-invocation cap. If your function runs longer, Vercel terminates it and returns 504. Move long work to a queue or use a streaming response. |
| HAProxy | timeout server | (no default, must be set) | Max inactivity between server responses. The HAProxy config without this set is a common source of mysterious 504s nobody tracks down for weeks. |
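When a timeout genuinely needs raising for one known-slow route, scope it instead of raising it server-wide. A sketch of the nginx approach, assuming a hypothetical `/reports/export` endpoint and an `app_upstream` block defined elsewhere:

```nginx
# Raise the read timeout for the one slow route only -- never globally.
location /reports/export {
    proxy_pass http://app_upstream;
    proxy_read_timeout 300s;    # this endpoint is known to take minutes
    proxy_connect_timeout 5s;   # connecting should still be near-instant
}
```

Every other route keeps the 60s default, so a regression elsewhere still surfaces as a 504 instead of hiding behind a blanket five-minute timeout.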
The Cloudflare 524 distinction
Cloudflare is the one case where 504 gets confusing, because Cloudflare also exposes Error 524. They mean related but not identical things:
- Cloudflare 524: the TCP connection to your origin succeeded, but Cloudflare waited 100 seconds (plan-level default) for an HTTP response and got nothing. Your origin is hung, not slow. 524 almost always means an origin process is stuck, not just overloaded.
- Cloudflare 504: the standard HTTP gateway timeout. Often generated by a proxy between Cloudflare and your true origin (another nginx, an ALB, a Kubernetes ingress) that itself gave up first.
Check the `CF-Ray` header and the error page HTML. Cloudflare 524 pages use an orange-branded template that says “A timeout occurred.” A plain 504 HTML with no Cloudflare branding is coming from your own proxy stack, not from the edge. That one observation tells you which side to debug.
AWS ALB idle timeout and keep-alive
ALB has a gotcha: its idle timeout (default 60s) applies to both sides of the connection, and your target’s keep-alive must be longer than the ALB idle timeout. If your target closes connections after 30s but ALB reuses them for up to 60s, ALB sometimes picks a half-closed connection, fails to get a response, and returns 504. Set your application keep-alive to at least 75s when sitting behind ALB.
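What that looks like in practice depends on the app server. Assuming Gunicorn behind the ALB, its real `keepalive` setting covers it (75 is a sketch value chosen only to exceed ALB’s 60s default):

```python
# gunicorn.conf.py
# Keep worker connections open longer than the ALB idle timeout (60s),
# so the ALB never reuses a connection the app has already half-closed.
keepalive = 75
```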
Vercel function duration limits
Vercel caps serverless function duration by plan: 10 seconds on Hobby, 15 seconds on Pro, and up to 15 minutes on Enterprise. Exceed that and you get 504 (FUNCTION_INVOCATION_TIMEOUT in the logs). Raising maxDuration in your function config helps up to the plan ceiling, but above a few seconds you should really be streaming the response or moving work to a queue with a webhook callback.
6. Diagnosing a 504 with curl
Three curl invocations will tell you almost everything about a 504. Run them in order:
The time check
```shell
curl -v -o /dev/null \
  -w "status=%{http_code} total=%{time_total}s connect=%{time_connect}s starttransfer=%{time_starttransfer}s\n" \
  --max-time 180 \
  https://yoursite.com/failing-path
```

The difference between `time_connect` and `time_starttransfer` is how long the origin took to start producing a response. If `starttransfer` is close to your proxy timeout, the origin was sitting on the request before writing anything; that is the classic 504 pattern.
Bypass the CDN, hit the origin directly
```shell
# --resolve forces curl to connect to ORIGIN_IP while sending the right SNI
curl -v -o /dev/null \
  -w "%{http_code} %{time_total}s\n" \
  --resolve yoursite.com:443:<ORIGIN_IP> \
  --max-time 300 \
  https://yoursite.com/failing-path
```

If this returns 200 eventually (say, after 90 seconds), your origin works but is slower than your CDN is willing to wait. If this also 504s or hangs, the problem is on the origin side and the CDN is blameless.
Watch the proxy log in parallel
```shell
# Terminal 1
tail -f /var/log/nginx/error.log

# Terminal 2
curl -v --max-time 180 https://yoursite.com/failing-path
```

Watch which directive nginx mentions when the timeout trips. That phrase is deterministic: it names the exact timeout directive that expired, which tells you exactly which value to tune or which layer to investigate.
7. 504 vs 502 vs 408: The Timeout Trio
Three status codes overlap around “something took too long,” but they are reported from different places and for different reasons. Telling them apart is half the diagnosis.
| Status | Who Gave Up | What It Means |
|---|---|---|
| 504 Gateway Timeout | The proxy | Upstream was too slow. The connection worked, the upstream was reachable, but the response did not arrive in time. Fix is on the upstream app. |
| 502 Bad Gateway | The proxy | Upstream sent a broken response or refused the connection outright. Crashed backend, bad TLS, closed port. Fix usually on the origin or the proxy-to-origin wiring. |
| 408 Request Timeout | The server | The client was too slow. Server waited for the request body or further request and the client never finished sending. Usually a client-side or network issue, not a backend issue. |
The quick mental model I use:
- 504: the backend answered slowly. Fix the app.
- 502: the backend answered wrong. Fix the backend process or the path to it.
- 408: the client answered slowly. Probably not your server’s fault.
Also worth knowing: 503 Service Unavailable is different from all three. 503 means the server is up but refusing work on purpose (maintenance mode, rate limit, explicit backoff). 504 means the server tried and ran out of time. If you see 503, check your rate limiter and maintenance toggles before touching anything else.
8. How to See 504s Before Your Users Do
A hard truth about 504: your own origin health check almost never sees them. A curl localhost/health from the origin host returns 200 in well under a second because the health endpoint is fast by design. Meanwhile, the slow dashboard page behind a database query is 504ing for every user. The health check is correct and useless at the same time.
The only reliable way to catch 504s at parity with real users is to monitor the real public URL, through the same proxy chain, from outside your own infrastructure. That is the category Visual Sentinel sits in, and it is why I built it the way I did: the check runs from independent regions, hits the public URL the same way a browser would, and alerts you inside 60 seconds on any non-2xx response.
Catching 504s externally
Our HTTP uptime checks from EU and US regions hit your public URL every minute, identify as VisualSentinelBot/2.0 (whitelist-safe), and record the full status code, latency, and response body for every check. The response-time chart also makes the slow-creep pattern that precedes a 504 visible, so you can react before the timeout actually trips. Alerts fire to email, Slack, Discord, Telegram, WhatsApp, or webhook.
Set up free uptime monitoring.

Beyond monitoring, the structural fixes that cut 504 rates over time: aggressive query timeouts at the database layer so a single slow query cannot hold a worker indefinitely, circuit breakers around external API calls so a hung third party cannot drag your app down with it, moving any operation longer than a few seconds to a background job, and watching p95 and p99 response times rather than the average (an average of 200ms hides a p99 of 90 seconds).
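A circuit breaker does not need a framework; the core mechanism is small. A hedged Python sketch (illustrative thresholds, not a production implementation):

```python
import time

class CircuitBreaker:
    """After max_failures consecutive failures the circuit opens and calls
    fail fast for reset_after seconds, so a hung third party cannot pin
    your worker pool while it times out over and over."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0           # any success closes the circuit
        return result
```

While the circuit is open, requests that would have burned 60 seconds waiting on the dead dependency fail in microseconds instead, which keeps workers free for the endpoints that still work.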
Related Guides
What Is 502 Bad Gateway and How to Fix It
The other half of the gateway error pair. 502 is “the upstream answered wrong,” 504 is “the upstream never answered.” Read both to cover the full diagnosis tree.
Common Website Error Codes Explained
4xx vs 5xx, when each one shows up, when to alert, and how each class reads in your monitoring dashboard.
Rai Ansar
DevOps Engineer, Founder of Visual Sentinel
I run production infrastructure on bare metal and spend most of my time thinking about why monitoring systems lie to you. Visual Sentinel exists because the uptime checks I had running against client sites missed too many real outages, and a 504 that only shows up in the p99 of real user traffic is a clean example of that gap. More from Rai.