Nginx Load Balancer Improvements to proxy_next_upstream

This change happened in March of 2016, but was still news to me when I stumbled across it recently. I wanted to share it, since it’s important but doesn’t seem to have been loudly broadcast. Nginx is no longer dangerously bad at load balancing!

Among the many features of the outstanding Nginx webserver is the ability to act as a load balancer. Using the built-in upstream module, you can define a pool of backend app servers that should take turns servicing requests. And in theory, you can tell Nginx to skip a server if it is down or returning error (HTTP 5xx) responses.
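As a rough sketch of what such a pool looks like, here is a minimal config. The pool name app_servers and the backend addresses are made up for illustration, and the blocks are assumed to sit inside the http context of nginx.conf.

```nginx
# Hypothetical pool: the name and addresses are placeholders.
upstream app_servers {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}

server {
    listen 80;

    location / {
        # Requests are spread across the pool (round-robin by default).
        proxy_pass http://app_servers;

        # Conditions under which a request is passed to the next server.
        # "error timeout" is the default; the http_5xx values are opt-in.
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```

Which 5xx codes to list, if any, depends on how your backends fail; the three shown here are just one plausible choice.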

In practice, however, Nginx’s handling of downed servers could be very dangerous. A Hacker News thread notes that when a request to a backend failed, Nginx would by default always retry it on another server. (Out of the box, “failed” means a connection error or a timeout; HTTP 5xx responses count too if you list them in proxy_next_upstream.) This is fine most of the time. But what if the request was “charge $10,000 to my credit card”? Maybe the server correctly applied the charge, but then failed while rendering the confirmation page. Well, get ready for some real angry customer support calls. Nginx would have resubmitted that same $10,000 charge to one server after another until one of them responded successfully or it ran out of servers to try.

For this reason, many admins recommend setting the value proxy_next_upstream off;. This makes a failed backend request simply return an error page instead of retrying it on another server. Definitely not ideal; who wants their users to see error pages? But better than handling a deluge of chargebacks from outraged customers who were billed multiple times. In reality, this often meant admins chose another, specialized tool for their load balancing needs, like HAProxy or an expensive hardware appliance from the likes of F5 or A10.
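If you only need that protection on a few sensitive endpoints, one option is to scope the setting to a location rather than applying it everywhere. This is only a sketch: the /checkout path and the app_servers pool name are hypothetical.

```nginx
# Hypothetical payment endpoint: never retry on another backend.
location /checkout {
    proxy_pass http://app_servers;
    proxy_next_upstream off;
}

# Everything else keeps the default retry behavior.
location / {
    proxy_pass http://app_servers;
}
```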

But wait! With the release of Nginx 1.9.13, things got better. Once a request has been sent to a backend, Nginx will no longer retry it on another server if it is “non-idempotent”, unless you explicitly tell it to with the non_idempotent parameter of proxy_next_upstream. Idempotent means that no matter how many times you perform an action, the result is the same as performing it once. So this excludes POSTs, plus the more obscure LOCK and PATCH methods.
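If you have endpoints where retrying a POST really is safe, you can opt back in for just those. A minimal sketch, assuming a hypothetical /search path and app_servers pool:

```nginx
# Hypothetical endpoint whose POST handler is safe to repeat.
location /search {
    proxy_pass http://app_servers;

    # non_idempotent (1.9.13+) re-enables retries for POST, LOCK and PATCH.
    proxy_next_upstream error timeout non_idempotent;

    # Cap the total number of attempts per request (0, the default, means no cap).
    proxy_next_upstream_tries 2;
}
```

Leaving non_idempotent out everywhere else keeps the safer post-1.9.13 default in place.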

So if you’re still running with proxy_next_upstream off; in your config because of those concerns, it’s time to test removing it. Nginx’s load balancing is much safer and saner than it was this time last year.