What Is Serverless Hosting and Is It Right for Your Application
Serverless computing is at once one of the most overhyped and one of the most genuinely useful architectural patterns in modern web development. The name is misleading — there are absolutely servers involved — but the key idea is that the developer does not manage or provision any servers. You write a function, deploy it, and the platform handles everything else: running the function when it is needed, scaling it automatically to handle any amount of traffic, and charging you only for the compute time actually used. Understanding what this means in practice, including the trade-offs, helps you decide whether a serverless architecture is right for your application.
How Functions-as-a-Service Work
The fundamental serverless primitive is the function. You write a function (in Python, JavaScript, Go, Java, or another supported runtime) that takes an event as input and returns a response. You deploy that function to a platform like AWS Lambda, Google Cloud Functions, Cloudflare Workers, or Vercel Functions. When a triggering event occurs — an HTTP request, a message in a queue, a scheduled timer, a file uploaded to object storage — the platform instantiates your function, runs it, and returns the result.
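The event-in, response-out shape can be sketched as a minimal Python handler. This follows the AWS Lambda API Gateway proxy convention; the exact event fields vary by platform and trigger, so treat the structure here as illustrative rather than authoritative.

```python
import json

def handler(event, context):
    """Minimal Lambda-style handler: receives an event dict describing
    the trigger (here, an HTTP request), returns an HTTP-style response.
    The field names follow the API Gateway proxy format."""
    # query string parameters may be absent entirely, hence the `or {}`
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is just a function of its input, it can be invoked locally for testing by passing a hand-built event dict, with no platform involved.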
The platform manages the execution environment. It decides how many instances of your function to run simultaneously based on incoming traffic. If 1,000 HTTP requests arrive at the same time, the platform runs 1,000 concurrent instances of your function. If no requests arrive, no instances are running and you incur no compute cost. This automatic scaling from zero to effectively unlimited concurrency, without any configuration, is the core value proposition of serverless for variable workloads.
The Cold Start Problem
When a function has not run recently and a new request arrives, the platform must initialise a new execution environment: download the function code, start the runtime process, and execute any initialisation code in your function. This takes time — typically 100 to 500 milliseconds for interpreted languages, potentially several seconds for JVM-based runtimes like Java or Scala. This initialisation latency on first invocation after a period of inactivity is called a cold start.
Cold starts matter for latency-sensitive applications. A user whose request is the first to arrive after a quiet period experiences significantly more latency than users whose subsequent requests hit warm instances. Several mitigation strategies exist: keeping functions small and their dependencies lightweight to reduce initialisation time, using provisioned concurrency (AWS Lambda's term) to keep a minimum number of warm instances running at all times, or choosing a runtime such as JavaScript or Python with faster cold start characteristics than the JVM-based alternatives.
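One practical consequence of the cold start model: module-level code runs once per cold start, and warm invocations reuse the result. Putting expensive setup (config loading, connection pools) at module scope rather than inside the handler is a standard mitigation. A minimal sketch, with a dict standing in for the expensive setup:

```python
# Module-level code executes once when a new instance cold-starts.
# Every warm invocation on the same instance reuses these objects.
CONFIG = {"region": "eu-west-1"}  # stand-in for expensive setup work
_cold = True  # flips to False after the first invocation on this instance

def handler(event, context):
    global _cold
    was_cold = _cold
    _cold = False
    # per-request work uses the already-initialised CONFIG rather than
    # rebuilding it on every call
    return {"cold_start": was_cold, "region": CONFIG["region"]}
```

On a fresh instance the first call reports `cold_start: True` and subsequent calls report `False`; a second instance spun up for concurrent traffic would report `True` again for its own first call.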
Cloudflare Workers uses a different isolation model (V8 isolates rather than separate container instances) that essentially eliminates cold starts at the cost of a more restricted execution environment. For edge functions that need sub-millisecond latency globally, this architecture is a significant advantage. The trade-off is that Workers cannot run arbitrary Node.js code — the runtime environment is more constrained than Lambda.
Statelessness and Its Implications
Serverless functions are stateless. Each invocation is independent — you cannot store data in a global variable and expect it to be there in the next invocation, because the next invocation may run on a completely different instance. Any state that must persist between invocations must be stored externally: in a database, in object storage, in a cache service like Redis or Upstash, or in the request itself via cookies or tokens.
This statelessness constraint forces a cleaner separation between compute and state. State lives in purpose-built storage services (databases, caches, queues) rather than in the application process. This separation is often easier to reason about than a long-lived stateful server process, but it means every piece of persistent state requires a separate service, which adds cost and latency compared with the simplicity of a single server with local storage.
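The difference between the two can be made concrete. Below, a plain dict stands in for an external store such as Redis or DynamoDB (in production this would be a network call); the in-process version is the anti-pattern, because a newly started instance begins from zero.

```python
# Anti-pattern: in-process state. This only "works" while the same warm
# instance happens to handle every request; a freshly cold-started
# instance has its own request_count starting at zero.
request_count = 0

def handler_with_local_state(event, context):
    global request_count
    request_count += 1
    return {"count": request_count}

# Correct pattern: state lives in an external store that all instances
# share. A dict stands in here for Redis/DynamoDB/etc.
def handler_with_external_state(event, context, store):
    store["count"] = store.get("count", 0) + 1
    return {"count": store["count"]}
```

With the external store, any number of instances see a consistent count; with the global variable, the count silently resets whenever the platform recycles or adds instances.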
Pricing: Pay-Per-Request vs Always-On
Serverless pricing is based on two dimensions: number of invocations and compute duration (measured in GB-seconds — the amount of memory allocated multiplied by the execution time). AWS Lambda's free tier includes 1 million requests and 400,000 GB-seconds per month. Above the free tier, Lambda costs roughly $0.20 per million requests and $0.0000166667 per GB-second.
For low-traffic applications and APIs, serverless is dramatically cheaper than running a VPS that idles most of the time. For high-traffic applications with consistent load, the economics shift: a VPS running at 60-70% utilisation may be cheaper per request than serverless at scale. The break-even point depends on your specific function's memory usage, execution time, and invocation count. Running the numbers for your specific use case matters more than general claims about serverless being cheaper or more expensive.
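Running the numbers is straightforward with the prices quoted above. The sketch below estimates a monthly Lambda bill from invocation count, memory allocation, and average duration; it uses the on-demand prices and free-tier figures from this section and ignores regional variation and per-request minimums.

```python
def monthly_lambda_cost(invocations, memory_gb, avg_duration_s,
                        price_per_million_requests=0.20,
                        price_per_gb_second=0.0000166667,
                        free_requests=1_000_000,
                        free_gb_seconds=400_000):
    """Rough monthly cost estimate using Lambda's two billing
    dimensions: requests and GB-seconds (memory * duration)."""
    gb_seconds = invocations * memory_gb * avg_duration_s
    billable_requests = max(0, invocations - free_requests)
    billable_gb_seconds = max(0, gb_seconds - free_gb_seconds)
    return (billable_requests / 1_000_000 * price_per_million_requests
            + billable_gb_seconds * price_per_gb_second)
```

For example, 10 million invocations at 512 MB and 200 ms average duration works out to 1,000,000 GB-seconds, of which 600,000 are billable after the free tier, for a total of roughly $11.80/month — the kind of figure to compare against an always-on VPS for your traffic profile.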
Edge Computing and Serverless
An important variant of serverless is edge computing: running functions not in a central data centre but at network edge locations distributed around the world. Cloudflare Workers runs in 300+ locations globally. Vercel Edge Functions run on Cloudflare's edge. Fastly Compute@Edge operates similarly. The advantage is dramatically reduced latency for globally distributed users — a function running at a CDN edge location near the user can respond in 10-20ms, compared to 100-300ms for a request that has to travel to a central server.
Edge functions are particularly well-suited to tasks that need to be fast for all users regardless of location: A/B test routing, authentication token validation, personalisation redirects, geolocation-based responses, and similar lightweight compute tasks that would otherwise require a round trip to the origin. The constraints of edge runtimes (limited memory, restricted APIs, no filesystem access) mean they are best used for these targeted tasks rather than as general-purpose application servers.
When Serverless Is the Right Choice
Serverless hosting is a genuinely good choice for API backends with variable traffic, webhooks and event handlers, scheduled jobs, and auxiliary services like image resizing, PDF generation, or authentication. These workloads have in common that they run for short durations, do not need persistent local state, and may have highly variable invocation rates. A webhook handler that processes 10 requests a day and occasionally 10,000 is exactly the profile that serverless optimises for: no idle server costs during quiet periods, automatic scaling during spikes.
Serverless is a poor choice for long-running processes, applications that need local filesystem access, workloads with strict latency requirements and cold start sensitivity, or applications that are CPU-intensive for extended periods. A video transcoding job that takes minutes, a background data processing pipeline that runs continuously, or a game server that maintains persistent WebSocket connections are not well-served by functions with 15-minute execution limits and stateless execution environments. The right tool for these workloads is a persistent compute platform: a VPS, a container service, or a managed compute cluster.