How Load Balancing Works: Spreading Traffic Across Multiple Servers
A single server can only handle so many requests per second before it starts degrading — response times climb, queues build up, and eventually requests fail. Load balancing solves this by distributing traffic across multiple servers. Here is how it works in practice.
The Basic Problem
Modern websites routinely handle traffic spikes that far exceed what any single server can process. A product launch, a mention in a major publication, or a viral social post can push traffic from a few hundred requests per minute to tens of thousands within minutes. Without load balancing, the choices are to overprovision a single massive server (expensive when traffic is low) or to accept that spikes will take the site down.
Load balancing allows horizontal scaling — adding more servers instead of larger servers. This is more cost-effective, more resilient, and more predictable under load.
Where the Load Balancer Sits
A load balancer sits between the internet and your application servers. All incoming traffic hits the load balancer, which then distributes individual requests to backend servers. Backend servers are often on a private network not exposed to the internet directly — only the load balancer has a public IP. This adds a layer of security as well as distribution.
Load balancers can be dedicated hardware appliances (F5, Citrix), software running on a server (HAProxy, nginx, Traefik), or managed cloud services (AWS ELB/ALB, Cloudflare Load Balancing, DigitalOcean Load Balancers).
Distribution Algorithms
Round-robin is the simplest: requests are sent to each server in turn. Server 1 gets request 1, server 2 gets request 2, server 3 gets request 3, then back to server 1. This works well when all servers are identical in capacity and requests are roughly equal in cost.
Weighted round-robin assigns different weights to servers — a more powerful server gets more requests proportionally. If server A has double the CPU of server B, it gets two requests for every one that goes to server B.
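Both round-robin variants can be sketched in a few lines of Python. This is an illustrative model, not a real load balancer; the server names and weights are made up:

```python
from itertools import cycle

# Hypothetical backend pool for illustration.
servers = ["app1", "app2", "app3"]

# Plain round-robin: hand out servers in order, wrapping around.
rr = cycle(servers)
assignments = [next(rr) for _ in range(6)]
# First six requests land on: app1, app2, app3, app1, app2, app3

# Weighted round-robin: a server with weight 2 appears twice per cycle,
# so it receives two requests for every one its weight-1 peer gets.
weights = {"app1": 2, "app2": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
weighted_assignments = [next(wrr) for _ in range(3)]
# app1, app1, app2
```

Real load balancers typically use a smoother interleaving for weighted round-robin (so a weight-10 server doesn't receive ten requests in a burst), but the proportions are the same.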
Least connections sends each new request to whichever server currently has the fewest active connections. This is better than round-robin when requests vary significantly in duration — a long-running database export on one server will fill up its connection count, and the load balancer will route new requests elsewhere until it finishes.
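The least-connections decision reduces to a minimum over the current connection counts. A minimal sketch, assuming the balancer tracks counts in a dictionary (the numbers here are invented):

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)

# app1 is busy with long-running requests; new traffic goes to app2.
active = {"app1": 12, "app2": 3, "app3": 7}
target = pick_least_connections(active)
```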
IP hash uses a hash of the client IP to determine which server gets the request. The same client IP always goes to the same server. This provides session persistence without shared session storage — useful when application state is held in server memory.
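IP hashing can be sketched by hashing the address and taking it modulo the pool size. This version uses MD5 for a stable hash (Python's built-in `hash()` is randomized per process, which would break persistence across restarts); the IP and server names are examples:

```python
import hashlib

def route_by_ip(client_ip, servers):
    """Map a client IP deterministically onto one server.

    The same IP always hashes to the same index, so the client
    keeps hitting the same backend without any shared state.
    """
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

servers = ["app1", "app2", "app3"]
first = route_by_ip("203.0.113.7", servers)
second = route_by_ip("203.0.113.7", servers)  # same server as `first`
```

Note the trade-off: a plain modulo remaps most clients when the pool size changes, which is why some implementations use consistent hashing instead.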
Health Checks
A load balancer that blindly sends traffic to a server that has crashed is worse than no load balancer. Health checks are the mechanism that prevents this. The load balancer periodically sends a test request (an HTTP GET to a health endpoint, or a TCP connect, or an ICMP ping) to each backend server. If a server fails to respond correctly within a timeout, the load balancer marks it as unhealthy and stops sending traffic to it. When the server recovers and passes health checks again, traffic resumes automatically.
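The marking logic usually requires several consecutive failures before pulling a server, and several consecutive passes before restoring it, so one flaky probe doesn't flap traffic on and off. A simplified sketch of that state machine (thresholds and field names are assumptions, loosely modeled on HAProxy's `fall`/`rise` settings):

```python
def update_health(state, probe_ok, fail_threshold=3, rise_threshold=2):
    """Advance a server's health state after one probe result.

    `state` is a dict like {"healthy": True, "fails": 0, "passes": 0}.
    A healthy server is marked down after `fail_threshold` consecutive
    failures; a down server is restored after `rise_threshold`
    consecutive passes.
    """
    if probe_ok:
        state["fails"] = 0
        state["passes"] += 1
        if not state["healthy"] and state["passes"] >= rise_threshold:
            state["healthy"] = True
    else:
        state["passes"] = 0
        state["fails"] += 1
        if state["healthy"] and state["fails"] >= fail_threshold:
            state["healthy"] = False
    return state

state = {"healthy": True, "fails": 0, "passes": 0}
for ok in [False, False, False]:   # three failed probes: marked down
    update_health(state, ok)
down = state["healthy"]            # False
for ok in [True, True]:            # two passing probes: restored
    update_health(state, ok)
restored = state["healthy"]        # True
```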
Session Persistence: The Stateful Problem
Many web applications store session data in server memory. If a user is authenticated on server 1 and their next request goes to server 2, server 2 has no knowledge of their session and they appear logged out. Solutions: use IP hash routing to stick a user to one server, use sticky sessions (the load balancer sets a cookie that routes the user to the same backend), or move session storage out of server memory into a shared store like Redis. The last option is the most robust — it allows any server to handle any request without routing constraints.
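The sticky-session mechanism can be sketched as cookie-based routing: reuse the backend named in the cookie if it's still valid, otherwise pick a fresh one and set the cookie. The cookie name and server names here are hypothetical (real managed balancers use their own cookie names, e.g. AWS's `AWSALB`):

```python
from itertools import cycle

COOKIE = "lb_backend"  # hypothetical cookie name

def route_sticky(request_cookies, servers, pick_next):
    """Route a request, pinning the client to one backend via a cookie.

    Returns (server, set_cookie). If the request carries a cookie naming
    a server that is still in the pool, reuse it and set nothing; else
    pick a new server and return the cookie the response should set.
    """
    server = request_cookies.get(COOKIE)
    if server in servers:
        return server, None
    server = pick_next()  # e.g. next server from a round-robin rotation
    return server, (COOKIE, server)

servers = ["app1", "app2"]
rr = cycle(servers)
# Returning client with a valid cookie stays on app2.
pinned, cookie = route_sticky({COOKIE: "app2"}, servers, lambda: next(rr))
# New client gets assigned and told to set the cookie.
fresh, new_cookie = route_sticky({}, servers, lambda: next(rr))
```

Note that if the pinned server is removed or fails health checks, the client falls through to a fresh assignment, which is exactly the case where in-memory sessions are lost and a shared store like Redis saves you.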
SSL Termination at the Load Balancer
Load balancers typically handle SSL termination — the TLS handshake happens at the load balancer, and traffic to backend servers flows over unencrypted HTTP on a private network. This centralizes certificate management (one certificate to renew instead of one per server) and reduces the computational overhead on application servers. For environments that require end-to-end encryption even on internal networks, SSL passthrough (forwarding encrypted traffic to backends) or backend re-encryption are alternatives.