Designing systems that scale

Jerry Jose
4 min read · Nov 22, 2021

One of the biggest challenges in big tech is scalability.

Horizontal vs Vertical Scaling

How do you design systems that can handle millions of people hitting them at once?

Let’s say that one person takes 1 hour to complete 10 tasks; that means you’d need 2 people to complete 20 tasks in 1 hour. Horizontal scaling is pretty much the same thing: the more traffic your application has to handle, the more servers you throw at it.

Let’s talk about a design that does not scale and work our way up to one that does.

Single Server Design

So in a sort of legacy single-server design, you have a bunch of clients out there on the internet, and they’re all talking to one box on the other side of the internet.

The server doesn’t get a whole lot of traffic and uptime isn’t really a huge concern, so one server is enough. The data can be backed up somewhere occasionally and restored if necessary. Another situation where a single-server solution makes sense is some little internet tool that just isn’t that important: you don’t have much of a budget for it and you don’t want to spend a lot of time maintaining it. The server runs an HTTP server of some sort, maybe with a blog application on top of it, and it could have a database, say a MySQL instance, running on the same box.
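
To make that concrete, here’s a minimal sketch of the one-box setup using only Python’s standard library. SQLite stands in for the MySQL instance mentioned above so the example runs without any extra setup; the file name and handler are made up for illustration.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# The database lives on the same host as the web server: one box does everything.
db = sqlite3.connect("blog.db")
db.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT)")
if db.execute("SELECT COUNT(*) FROM posts").fetchone()[0] == 0:
    db.execute("INSERT INTO posts VALUES ('Hello, world')")
db.commit()

class BlogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read posts straight from the local database and return them as plain text.
        titles = [row[0] for row in db.execute("SELECT title FROM posts")]
        body = "\n".join(titles).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # If this one process (or the box it runs on) dies, the whole site is down.
    HTTPServer(("0.0.0.0", 8000), BlogHandler).serve_forever()
```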

This solution isn’t viable for a commercial system, though, because it has a single point of failure: if this server goes down, it’ll be a bad, bad day for everyone involved with the business. Everything will have to be restored on a different server, and the DNS entry will have to be updated to point at the new one.

So a single-server design is for applications that aren’t critical and are deployed at a small scale.

Let’s talk about a way of scaling up.

Imagine a situation where you’d like your app to take more traffic than can be handled by the single server it’s been deployed on.

What can you do now? You could just use a bigger server.

This brings us to

Vertical Scaling

Vertical scaling means that instead of adding more servers, we just add a bigger server.

But the more you want to scale, the bigger and more expensive the host gets. You can keep throwing hardware at the problem and get away with it for a little while, but that’s not something you’d want to keep doing in a commercial system. Vertical scaling has its limits: servers can only get so big, and there are hard ceilings on the amount of CPU and memory you can put into a single machine. At some point, you’ll hit a big, strong, un-scalable wall.

Similar to making the website host bigger, the database server can be made bigger too. But it’s the same problem again: eventually you’d need the biggest server mankind has ever made, and then some.

The only good thing about vertical scaling is that there isn’t a lot to maintain. In this system, you only have two hosts to take care of: the web server and the database.

The fewer components in the system, the less likely it is to break. But what about when it does break? Yep, you’re screwed: if either of those hosts goes down, you’re in hot soup.

This is where horizontal scaling comes in.

Horizontal Scaling

For most modern system design problems, where you need to develop a system at a large scale, you would want to take a more horizontal scaling approach.

The idea is that instead of a single server, you have multiple servers.

The load is distributed in some fair manner by a load balancer that sits between the internet and those servers. Load balancers deserve a separate article of their own; I’ll write one soon.

The short version of the idea is that you have some device or piece of software that distributes the load coming in from the internet across a whole fleet of servers, all of them performing the same function.

This is a pretty good way to go about scaling because if one server goes down, the user doesn’t need to know.

The load balancer can detect that a host is down and reroute the traffic around it automatically, so you don’t have any downtime.

An issue that might arise in this case is that all the servers are still working against the same database, which becomes the next bottleneck. We could scale that database out as well.

The idea of horizontal scaling is that as you get more and more traffic, you just add more and more servers to the fleet and the load balancer will distribute that traffic amongst all the available servers.
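
As a rough sketch of what the load balancer is doing conceptually, here’s a toy round-robin picker that skips hosts whose health check fails. The server addresses and the check_health stub are made up for illustration; in practice this job is done by dedicated software or hardware (nginx, HAProxy, cloud load balancers), not hand-rolled code.

```python
# Toy round-robin load balancing over a fleet of identical servers.
# Hosts that fail their health check are skipped, so a dead server
# never sees traffic and the user never notices.
import itertools

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical fleet, all doing the same job
_rotation = itertools.cycle(SERVERS)

def check_health(server: str) -> bool:
    # Stub: a real load balancer would run periodic HTTP/TCP health checks.
    return server != "10.0.0.2"  # pretend this one host has gone down

def pick_server() -> str:
    # Walk the rotation until a healthy host turns up.
    for _ in range(len(SERVERS)):
        candidate = next(_rotation)
        if check_health(candidate):
            return candidate
    raise RuntimeError("no healthy servers available")

for request_id in range(5):
    print(f"request {request_id} -> {pick_server()}")
```

Scaling out is then just a matter of adding more addresses to that list; the rotation picks them up automatically.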

You can scale infinitely with this, yeah?

If you have a billion transactions coming in, set up enough servers and you’ll do just fine.

There are more factors you can alter while horizontally scaling. The geographical location of servers, the type of servers, the type of data centers used, all of these are finer points that will be discussed here later on.

But this is the basic idea of horizontal scaling: it gives you a practically unlimited way of scaling your system up. One big disadvantage of horizontal scaling is that there is more stuff to maintain.

Horizontal scaling really only works if your web servers are what we call stateless.

Subsequent requests should not depend on something being stored on that server from a previous request. This just means that any individual server should not assume that it is the one that served the previous request to a given user.
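
Here’s a rough sketch of the difference, with plain dictionaries standing in for server memory and for a shared external store (something like Redis or a database in real life); the function names are made up for illustration.

```python
# Stateful version: session data lives in one server's own memory, so the next
# request only works if the load balancer happens to send it to the same box.
local_sessions = {}  # exists only inside this one server process

def handle_login_stateful(user_id: str) -> None:
    local_sessions[user_id] = {"cart": []}  # lost if the next request hits another server

# Stateless version: the servers keep no per-user state of their own and push
# it to a shared store instead, so any server can handle any request.
shared_store = {}  # stand-in for an external store shared by the whole fleet

def handle_login_stateless(user_id: str) -> None:
    shared_store[user_id] = {"cart": []}

def add_to_cart(user_id: str, item: str) -> None:
    # Whichever server handles this request, it finds the session in the shared store.
    session = shared_store[user_id]
    session["cart"].append(item)
    shared_store[user_id] = session

handle_login_stateless("alice")
add_to_cart("alice", "book")
print(shared_store["alice"])  # {'cart': ['book']}
```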

So how do you choose between these architectures? Always choose the simplest architecture that meets your projected requirements, but no simpler than that.

So in short, vertical scaling means that you get bigger servers, and horizontal scaling means that you get more servers.
