Vikas Maurya
24 min read · May 2, 2021


After failing an interview, I decided to learn system design in depth. I then realized that cracking interviews does not require learning everything; the fundamentals of system design are enough. I went through various YouTube channels that teach system design and looked at the course content of System expert. Along the way I decided to write this blog, which summarizes in a nutshell which topics to study and emphasize while preparing for a system design interview.



1) Design Fundamentals Introduction

2) What Are Design Fundamentals?

3) Client-Server Model

4) Network Protocols

5) Storage

6) Latency and Throughput

7) Availability

8) Caching

9) Proxies

10) Load Balancers

11) Hashing

12) Relational Databases

13) Key-Value Stores

14) Replication and Sharding

15) Leader Election

16) Peer-To-Peer Networks

17) Polling and Streaming

18) Configuration

19) Rate Limiting

20) Logging and Monitoring

21) Publish/Subscribe Pattern

“I guide others to a treasure I cannot possess”

Lecture 1: Design Fundamentals Introduction

Definition: Systems design is the process of defining elements of a system like modules, architecture, components and their interfaces and data for a system based on the specified requirements.

Description: A systematic approach is required for a coherent, well-running system.

Designing a robust, functional, and scalable system is very important, since it reduces the complexity of the system and makes it more efficient in terms of processing and access.

Why should you learn System Design?

Understanding the scalable System Design paradigm might help you save millions of dollars.

A well-designed system gives a better user experience, which is good for business and helps you gain more clients.

Handling system failure becomes easy with well-designed system architecture.

Some of the important topics which help build this system are as follows.

SQL, Server, Cache, Polling, Load Balancer, Availability, Databases, Map-Reduce, Client, Leader Election, Peer to Peer, HTTP, Nginx, Replication, Hashing, Network Protocol.

Lecture 2: What are Design Fundamentals?

No matter your role in the tech world, System design is a concept you should be somewhat familiar with. It is the process through which you can design and build scalable and maintainable applications.

Interview Aspects:

Always be familiar with simple, fundamental system designs, since interview questions are often of the form "design YouTube with all its basic functionality".

A system design interview has no single right answer, but as a software engineer you should know what the fundamentals are and why they matter.

You should be able to defend your solution and handle any question from the interviewer. This is where knowing the design fundamentals comes into the picture.

Categories for Design Fundamentals:

  • Underlying Foundational Knowledge.
  • Key Characteristics of System (Availability, Consistency, Latency, Throughput, Redundancy)
  • Components of System (Load Balancer, Proxies, Caches, Leader Election, etc.)
  • Technology Tools (Nginx, Cloud Service, etc.)

Lecture 3: Client-Server Model

The client-server model describes how a server provides resources and services to one or more clients. Examples of servers include web servers, mail servers, and file servers. Each of these servers provides resources to client devices, such as desktop computers, laptops, tablets, and smartphones.

It is the foundation of modern computing and explains how two systems communicate.

The client speaks to the server (it requests or sends data); the server listens and responds based on the client's request.

To talk to anyone you need their contact details. Similarly, a client needs the address of the server before the two can interact.

An IP address is a unique identifier for a machine. Clients use it to send data to a specific system or server.

DNS (Domain Name System) is used to get the IP address of a particular site.

DNS is a database that maps domain names to IP addresses; you can think of it as the phonebook of the Internet.

A DNS query is a special request that goes to a DNS server and returns the IP address for a domain name.

The client then sends an HTTP request to the server. The request travels in packets that contain the source address and the destination address. Servers are the machines that receive the request. A server with a given IP address has 65,536 ports (numbered 0 to 65535), so the client cannot guess which port the server is listening on. The port must therefore be specified so that the request is routed to the correct program on the server.

If a client wants to talk to a server on HTTP it uses port 80.

For HTTPS it uses Port 443.
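As a small illustration of how a client settles on a host and a port, here is a minimal Python sketch. The helper name `server_endpoint` and the `DEFAULT_PORTS` table are assumptions made for this example; a real client would then resolve the host to an IP address through DNS.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_endpoint(url: str) -> tuple:
    """Return the (host, port) pair a client would connect to for this URL."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise use the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(server_endpoint("http://example.com/index.html"))
print(server_endpoint("https://example.com/login"))
print(server_endpoint("http://localhost:8080/health"))
```

The third URL names its port explicitly, which is the usual way to reach a server that is not listening on a default port.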

Local IP Address:

Lecture 4: Network Protocols

Software engineers don't work with network protocols directly every day, but it is necessary to know them.

Focus on these four for now: IP, TCP, HTTP, and UDP.


Internet Protocol (IP): The modern Internet runs on IP. One machine talks to another by sending IP packets, which are made of bytes. An IP packet has a header (source IP address, destination IP address, total size of the packet, IP version) followed by data. The total packet size is limited to about 2¹⁶ bytes.

One IP packet is usually not enough to hold the data, so we have to send multiple IP packets.

But IP does not guarantee that packets arrive in order, or that they arrive at all, so the data may end up corrupted or incomplete. To solve this we use TCP (Transmission Control Protocol).
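To make the ordering problem concrete, here is a toy Python sketch of the idea behind TCP's reordering: number each chunk when sending, then sort by that number when receiving. The function names are made up for this example, and real TCP does much more (byte-based sequence numbers, acknowledgements, retransmission of lost packets).

```python
import random


def split_into_packets(data: bytes, size: int) -> list:
    """Cut data into (sequence_number, chunk) pairs, like numbered packets."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets: list) -> bytes:
    """Put the chunks back in order using their sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"system design fundamentals"
packets = split_into_packets(message, size=5)

# Simulate packets arriving out of order (fixed seed for repeatability).
random.Random(42).shuffle(packets)

print(reassemble(packets) == message)
```

Without the sequence numbers, the receiver would have no way to tell which chunk came first.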

TCP establishes a connection via a three-way handshake.

TCP is a more powerful wrapper around IP. It is used to transfer data reliably and in order.

But TCP only moves a stream of bytes; it says nothing about what those bytes mean. So on top of it we have HTTP (HyperText Transfer Protocol), which adds a request-response structure.

How does HTTP work? As a request-response protocol, HTTP gives users a way to interact with web resources such as HTML files by transmitting hypertext messages between clients and servers. HTTP clients generally use Transmission Control Protocol (TCP) connections to communicate with servers.

If we want to send data very fast and do not care about ordering or error control, we use UDP.

UDP (User Datagram Protocol) is a communications protocol primarily used for low-latency, loss-tolerant communication between applications on the Internet. It speeds up transmission by sending data without first waiting for an agreement from the receiving party.

It is especially suited to time-sensitive transmissions such as video playback or DNS lookups.

But if reliability and correctness are core requirements of your system, avoid UDP and use TCP instead.

Lecture 5: Storage

Storage is one of the backbones of system design. If you have data, you need somewhere to store it.

A database is used as storage. It stores and retrieves data (write and read).

A database is just a server. Your local computer can also act as a database.

Data in a database should persist through power outages.

A database server writes data to disk, so the data is not lost if a power outage occurs.

If we use memory to store the data, it has no persistence.

Memory holds data only temporarily: if the server restarts, the data is lost. To achieve persistence we need to store the data on disk.

When choosing a database, you should know what kind of storage system you require, since there are many databases built for different purposes and functionality.

If you want to store structured data, go for a relational database.

If you want to store unstructured data, go for a NoSQL database.

Here is the list of databases you can study.

Lecture 6: Latency and Throughput

Latency is how long it takes data to get from one point in the system to another.

For example, the time taken for a request to travel from the client to the server and for the response to travel back is latency.

Reading 1 MB of data sequentially from memory takes about 250 microseconds.

Reading 1 MB of data sequentially from an SSD takes about 1,000 microseconds.

Sending 1 MB of data over a 1 Gbps (gigabit per second) network takes about 10,000 microseconds (for example, an API request).

Reading 1 MB of data sequentially from an HDD takes about 20,000 microseconds.

Sending a packet California -> Netherlands -> California takes about 150,000 microseconds.
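To get a feel for these numbers, the short Python sketch below converts them to milliseconds and compares each one with a memory read. The figures are the ones quoted above, reading the network figure as a 1 Gbps link; the dictionary and its labels are just an illustration.

```python
# Latency figures quoted in this lecture, in microseconds.
LATENCY_US = {
    "read 1 MB from memory": 250,
    "read 1 MB from SSD": 1_000,
    "send 1 MB over 1 Gbps network": 10_000,
    "read 1 MB from HDD": 20_000,
    "packet California -> Netherlands -> California": 150_000,
}

baseline = LATENCY_US["read 1 MB from memory"]

for operation, micros in LATENCY_US.items():
    millis = micros / 1000
    ratio = micros / baseline
    print(f"{operation:<48} {millis:>7.2f} ms  ({ratio:.0f}x memory)")
```

The ratios are the useful part: on these figures a disk read is 80 times slower than a memory read, and an intercontinental round trip is 600 times slower.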

Latency needs to be considered, and how much is acceptable depends on the system. For a game server, latency should be low. For a payment system, uptime and accuracy are more important, so higher latency can be tolerated.

Throughput is how much work a machine can perform in a given amount of time.

We often measure the throughput of a system in terms of data, for example gigabits per second, i.e. how much data it can transfer from one point to another in a given time.

For a server, throughput is how many requests it can handle in a given amount of time.

Adding more servers helps improve throughput, and it can also improve latency when requests would otherwise wait on an overloaded server.

Lecture 7: Availability

System availability means that even when part of the system fails (a database fails, a component fails, etc.), the system as a whole stays available to clients and does not go down.

High fault tolerance is the way to increase availability.

Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail. … Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service.

If the system is not available, we might lose customers and get bad publicity.

How much availability is needed depends on the system. For example, if YouTube goes down, it causes serious problems for Google.

Availability is usually measured in nines: 99% is two nines of availability, and 99.99% is four nines.

The more nines, the higher the availability.
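To see what each extra nine buys, here is a quick Python sketch that converts an availability percentage into allowed downtime per year. The function name is made up for this example, and it assumes a 365-day year.

```python
def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system may be down at the given availability."""
    hours_per_year = 365 * 24  # 8,760 hours, ignoring leap years
    return hours_per_year * (1 - availability_percent / 100)


for nines in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(nines)
    print(f"{nines}% available -> {hours:.2f} hours ({hours * 60:.1f} minutes) down per year")
```

Two nines allow roughly 87.6 hours of downtime a year, while four nines allow under an hour, which is why each additional nine is so much harder to achieve.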

Service Level Agreement/Service Level Objective:

Many service providers offer SLAs to their customers. An SLA guarantees a certain percentage of availability.

To achieve high availability we have to avoid single points of failure.

We have to make our system redundant. We can do this by adding more servers and adding load balancers to distribute requests across them.

Redundancy is achieved by duplicating components.

Passive redundancy: you have multiple machines, and when one fails the others absorb its work.

Active redundancy: when one machine (the leader) fails, the other machines take over via leader election.

Lecture 8: Caching

Caching is one of the most important concepts of system design.

Caching avoids redoing the same task. It is used to speed up the system.

It also helps improve latency and throughput.

Caching can be designed into many kinds of operations, such as network requests, so that the same request does not have to be served from scratch again and again.

Caching is done at the client level or the server level.

With client-level caching, the client does not have to ask the server again.

With server-level caching, the server avoids repeated database interactions.

Caching is also done at the hardware and OS level (a high-level understanding is enough).

In a client-server architecture, the client makes a network request to the server, and the server then queries the database. This can be time-consuming if the request needs a complex computation: the result is recomputed every time, which hurts latency. With caching, the result is stored after the first computation (like memoization in dynamic programming) and reused. This improves the latency of the system.

Websites use caching. When we load a site, it takes some time depending on our internet connection. But once it has loaded, if we navigate to another page and come back to the previous one, it does not load again and shows the result instantly.

Redis (a key-value store) is a popular database used to implement caching.

Redis stores data in key-value form. When a user requests some data, we first check Redis; if the key is present, its value is returned without going to the main database.

Write-through cache: we write the piece of data to both the cache and the database at the same time.

Write-back cache: we write only to the cache at first. The database is updated later, for example at a fixed interval.
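The difference between the two write policies can be sketched in a few lines of Python. The class names are invented for this example and a plain dictionary stands in for the database; a real write-back cache would flush on a timer or on eviction instead of through an explicit `flush()` call.

```python
class WriteThroughCache:
    """Every write goes to the cache and to the database immediately."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.database[key] = value  # database is always up to date


class WriteBackCache:
    """Writes go to the cache only; the database is updated on flush()."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}
        self.dirty = set()  # keys changed since the last flush

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()


db = {}
cache = WriteBackCache(db)
cache.write("comment:1", "first draft")
print(db)      # still empty: the write has not reached the database
cache.flush()
print(db)      # now contains the comment
```

Write-back is faster per write, but anything not yet flushed is lost if the cache goes down, which is the trade-off to weigh against write-through.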

Consider writing a post on LinkedIn or Facebook. We might edit that post later.

For example, many people comment on YouTube. If different caches hold separate copies of a comment and they are not all updated when the user edits it, some viewers will see a stale version of the comment.

Hence we use a single shared cache, such as a Redis database, instead of separate per-client caches.

This level of care is not required for something like the YouTube view count, since a slightly stale count is not important. But if a user replies to a stale version of a comment, it gives a bad user experience.

Caching static, immutable data is straightforward, but caching dynamic data is tricky and increases the complexity of the system.

Least Recently Used (LRU) and Least Frequently Used (LFU) are common policies for deciding which data to remove (evict) from the cache when it is full.
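As an illustration of LRU eviction, here is a minimal Python sketch built on `collections.OrderedDict`. The class and its method names are assumptions for this example, not the API of any particular cache.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now more recently used than "b"
cache.put("c", 3)    # cache is full, so "b" is evicted
print(list(cache.items))
```

An LFU cache would track a use count per key and evict the key with the smallest count instead.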

Lecture 9: Proxies

A proxy is a server that sits between a client and a server and acts on behalf of one of them, for example when somebody wants to access something on the internet without revealing their identity.

Forward proxy: a server, or set of servers, that sits between the client and the server and acts on behalf of the client. When the client makes a request, it goes to the forward proxy, which then forwards it to the server.

The server responds to the proxy, which forwards the response to the client. If the client does not want the server to know its identity, the client uses a forward proxy.

For example, a VPN (Virtual Private Network).

Reverse proxy: a proxy that acts on behalf of the server. When a client makes a request, it goes to the reverse proxy, which then passes it to the server. Clients do not know that their requests go through a reverse proxy.

In short, servers use reverse proxies and clients use forward proxies.

If we do not want bad requests to reach the server, the reverse proxy can act as a filter.

Proxy servers can also act as load balancers, and they can filter out malicious clients or servers.

Lecture 10: Load Balancer

When millions of users send requests to a single server, the server may become overloaded or slow, which leads to failures or a bad user experience.

Horizontal and vertical scaling are the two ways to scale a system. In vertical scaling we increase the computational power of the server, which is limited by how powerful a single machine can be.

In horizontal scaling, we add more servers so that requests can be distributed across them.

The load balancer does this distribution efficiently, so that the system as a whole responds faster.

Routing and distributing the traffic coming from clients across the servers in a balanced manner is the main job of a load balancer.

The load balancer usually sits between the client and the servers. But it can also sit between the servers and the database, or anywhere else load needs to be managed.

There are software and hardware load balancers. Software load balancers are easy to customize, while hardware load balancers are hard to modify.

The load balancer is configured by the server owner.

The simplest strategy is for the load balancer to route requests in random order.

This may cause issues if the same server happens to get most of the requests. So we need techniques that distribute requests more uniformly.

Round robin: requests are handed to the servers one after another in a fixed order, so they are spread evenly. If a server reports that it is full, the request goes to the next one.

Weighted round robin: the more powerful the server, the higher its weight. The load balancer routes more requests to high-weight servers, since they can handle more than the others.
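Both round-robin strategies can be sketched with `itertools.cycle` in Python. The function names and server names are made up for this example, and the weighted version is the naive one that simply repeats each server according to its weight; production load balancers usually interleave the servers more smoothly.

```python
from itertools import cycle, islice


def round_robin(servers: list):
    """Yield servers one after another, forever."""
    return cycle(servers)


def weighted_round_robin(weights: dict):
    """Yield each server as many times per round as its weight."""
    expanded = [server
                for server, weight in weights.items()
                for _ in range(weight)]
    return cycle(expanded)


plain = round_robin(["server-a", "server-b", "server-c"])
print(list(islice(plain, 5)))

weighted = weighted_round_robin({"big": 3, "small": 1})
print(list(islice(weighted, 8)))
```

In the weighted case the "big" server receives three of every four requests, matching its weight of 3 against the "small" server's weight of 1.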

Performance-based: the load balancer tracks how each server is performing. If a server is performing poorly, the load balancer picks one that is performing well.

IP-based load balancing: the client's IP address is hashed, and the hash decides which server the request goes to.

Path-based load balancing: requests are routed by their path. For example, all requests related to payments go to one set of servers, with different servers for different services.

Multiple load balancers may use different server selection strategies. Also, if one load balancer gets overloaded, another one can help.

Lecture 11: Hashing

Hashing transforms a piece of data of arbitrary size into a fixed-length value.

Suppose we cache a user's data on one server. If the load balancer uses round robin, that user's next request will probably land on a different server, which reduces cache hits.

A cache hit is a state in which data requested for processing by a component or application is found in the cache memory. It is a faster means of delivering data to the processor, as the cache already contains the requested data.

So if a user's requests are always routed to the same server, request processing will be very fast.

Hashing can route requests from a particular user to the same server every time.

We hash the client/user identifier and take the result modulo the number of servers.

This assigns each client to a server, so the load balancer picks the same server for that client every time.

Hashing thus distributes clients' requests in the desired manner.

But if one of the servers dies, or a new server is added, the number of servers changes. The modulo result then changes for most clients, they are sent to different servers, and cache hits drop sharply.
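The Python sketch below illustrates both halves of this: modulo hashing sends a client to the same server every time, but changing the server count reshuffles most clients. The helper names and client IDs are made up; MD5 is used only because it gives the same result on every run, unlike Python's built-in `hash()` for strings, which is randomized per process.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to a large integer that is the same on every run."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def pick_server(client_id: str, server_count: int) -> int:
    """Map a client to a server index with hash modulo server count."""
    return stable_hash(client_id) % server_count


clients = [f"client-{i}" for i in range(1000)]

# The same client always lands on the same server while the count is fixed.
print(pick_server("client-7", 4) == pick_server("client-7", 4))

# Going from 4 to 5 servers changes the assignment for most clients.
moved = sum(pick_server(c, 4) != pick_server(c, 5) for c in clients)
print(f"{moved} of {len(clients)} clients changed server")
```

With a uniform hash, only about one client in five keeps its server when going from 4 to 5 servers, so most cached data becomes useless at once.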

So we need to use some concept which solves the above problem.

Consistent Hashing:

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.

The servers are arranged on a circle (ring) at positions given by a hash function.

Clients are also placed on this ring using the same hash function. In the definition above, "objects" can be read as clients.

An object's or client's request is routed to the nearest server in the clockwise direction.

If a server dies or a new one is added, only the clients on the neighbouring part of the ring move to a different server. The rest keep their server, so the cache hit rate is barely affected.

If we want a more evenly distributed system, we can use a more complex hash function.

We can also place a more powerful server at multiple positions on the ring by passing it through multiple hash functions, so that it receives a larger share of the requests.
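A hash ring along these lines can be sketched in Python with a sorted list and `bisect`. Everything here (the `HashRing` class, the `replicas` parameter, the server names) is an assumption for illustration; each server is placed at several ring positions by hashing `"name#0"`, `"name#1"`, and so on, which stands in for the "multiple hash functions" mentioned above.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Position of a key on the ring (a stable hash as a big integer)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas: int = 100):
        self.replicas = replicas   # ring positions per server
        self._positions = []       # sorted ring positions
        self._owner = {}           # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str):
        for i in range(self.replicas):
            pos = ring_position(f"{server}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove(self, server: str):
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next server."""
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing(["server-a", "server-b", "server-c", "server-d"])
clients = [f"client-{i}" for i in range(1000)]
before = {c: ring.lookup(c) for c in clients}

ring.add("server-e")
after = {c: ring.lookup(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
print(f"{len(moved)} of {len(clients)} clients moved, all to server-e:",
      all(after[c] == "server-e" for c in moved))
```

Unlike the modulo approach, adding a fifth server here moves only the clients that the new server takes over; every other client keeps its server and its cached data.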

Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to.

Lecture 12: Relational Database

A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.

A relational table is highly structured. It has rules and a schema.

A relational table consists of rows and columns.

SQL stands for Structured Query Language. It is designed for managing data in a relational database management system (RDBMS). It is pronounced S-Q-L or sometimes "sequel". SQL is a database language used for creating and deleting databases, fetching rows, modifying rows, and so on.

Why do we need relational data?

NoSQL stores are good for holding non-relational data, but they are often less convenient for simple ad hoc queries.

If we query the data with a Python script instead, it becomes very hard at scale: for terabytes of data, we would first have to load the data into memory before we could filter out what we want.

Hence, for large datasets, hand-written Python is not the best solution. SQL databases are built for fast querying, which is why SQL is so widely used across organizations.

Index in Database:

An index is a powerful feature for speeding up queries in SQL.

It is used for fast searching. We can create an index on a table to optimize searches over millions of rows.

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access to ordered records.
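The effect of an index can be seen with Python's built-in `sqlite3` module. This is a minimal sketch: the `users` table, its columns, and the index name are invented for the example, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)


def query_plan(sql: str, params: tuple) -> str:
    """Ask SQLite how it would execute the query, without running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


lookup = "SELECT * FROM users WHERE email = ?"
params = ("user42@example.com",)

plan_before = query_plan(lookup, params)   # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, params)    # search using the index

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan is a scan over every row; afterwards it is a search that names `idx_users_email`, which is what makes lookups fast on large tables.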

Lecture 13: Key-Value Stores

As the name suggests, a key-value store holds data in key-value form. It is a type of NoSQL database.

You can think of this kind of database as a hash map.

Caching implementations use a key-value structure. Dynamic configuration also uses this storage structure.

Since data retrieval is very fast, it reduces latency and increases throughput.

E.g. DynamoDB, Redis, ZooKeeper, etc.

Some key-value stores write data to disk, so if the system goes down you will not lose any data.

Redis is an in-memory key-value store. It keeps data in memory, so if the system fails, data can be lost unless persistence is configured.

There are various other types of NoSQL databases as well.

Lecture 14: Replication and Sharding

System performance depends heavily on the database. If the database is not available, the system might not be available either. To guard against database failure we need replication, i.e. a secondary copy of the database.

We create a replica of the main database so that if the main database goes down, the replica can take over. To make this work, the main database and the replica must be kept in sync, so that they hold the same data and there are no consistency or data integrity issues.

So whenever there is a write to the main database, it is also applied to the replica.

Suppose we want to write a post on LinkedIn

Sometimes a database holds terabytes of data, and replicating all of it in full is not optimal.

The solution for that is sharding. We split the data across multiple databases, which are known as shards.

We can use different strategies to split the data, for example by region or by username.

But an obvious split like this can send uneven traffic to some shards, creating hotspots.

Hashing the key gives a more uniform distribution across shards.

Lecture 15: Leader Election

Many organizations have services that depend on third-party services. A third-party service should not have direct access to sensitive data. So we put a server of our own in between, which talks to the third-party service so that it never touches our data directly.

To handle failure of this server, we add redundancy by running several copies of it.

But we do not want all the copies to perform the same work, since that would duplicate the operation. So we elect one leader among the servers. The leader performs the work and the other servers wait. If the leader fails, a new leader is elected and takes over the previous leader's work.

Electing a new leader after a failure is complex, because all the servers must reach consensus on who the leader is at any given point in time.

So we have to use a consensus algorithm.

A leader election algorithm is a special case of consensus algorithm where the set is the set of participants. … This is often how consensus is solved in practice: choose a leader (using a specialized consensus algorithm), then the leader broadcasts its choice of value each time a consensus is needed.

E.g. Paxos and Raft.

ZooKeeper and etcd are tools that help you implement leader election.

etcd is a key-value store. It uses the Raft algorithm to implement leader election. etcd provides high availability and strong consistency, so we can rely on it to elect a leader properly.

A consensus algorithm is a mechanism in computer science used to establish agreement on a single data value across distributed processes or systems. … For blockchain networks, the consensus algorithms are an essential element because they maintain the integrity and security of these distributed computing systems.

The purpose of leader election is to choose a node that will coordinate activities of the system. In any election algorithm, a leader is chosen based on some criterion such as choosing the node with the largest identifier. Once the leader is elected, the nodes reach a particular state known as the terminated state.
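The "largest identifier" criterion can be shown with a deliberately tiny Python sketch. The function name and node IDs are made up, and the sketch assumes every node already sees the same set of live nodes; getting all nodes to agree on that set is exactly the hard part that consensus algorithms such as Paxos and Raft solve.

```python
def elect_leader(alive_node_ids: set) -> int:
    """Pick the live node with the largest identifier as leader."""
    if not alive_node_ids:
        raise ValueError("no nodes available to elect")
    return max(alive_node_ids)


nodes = {1, 2, 3, 4, 5}

leader = elect_leader(nodes)
print("leader:", leader)

# The leader fails, so the remaining nodes elect a new one.
nodes.discard(leader)
new_leader = elect_leader(nodes)
print("new leader:", new_leader)
```

In practice you would rely on a tool like ZooKeeper or etcd for this instead of writing the election yourself.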

Lecture 16: Peer-To-Peer Networks

A peer-to-peer (P2P) network is a group of computers, each of which acts as a node for sharing files within the group. Instead of having a central server to act as a shared drive, each computer acts as the server for the files stored upon it. When a P2P network is established over the Internet, a central server can be used to index files, or a distributed network can be established where the sharing of files is split between all the users in the network that are storing a given file.

Suppose we want to transfer large files to many machines. Peer-to-peer is an efficient solution.

If a central server transfers the file to the clients one by one, it takes a while.

If there are 1,000 clients, the file is 5 GB, and the server can send 5 GB per second, the transfer takes 1,000 seconds, which is around 17 minutes. That is not good at all.

In a peer-to-peer network, each client computer also acts as a server (a peer), and peers can communicate with each other. If we divide the 5 GB file into 1,000 chunks of 5 MB, the server needs only about 0.001 seconds to send one chunk to a peer. The peers then pass chunks on to one another in parallel, so the number of copies in flight grows rapidly. Hence the overall transfer is much faster.

But something has to track which peer has received which chunk. This is done by a tracker.

Peers also talk to each other using a gossip (epidemic) protocol, which helps with the transfer.

Peer-to-peer networks use a Distributed Hash Table (DHT) so that the chunks of a file can be located and merged correctly.

E.g. Kraken (Uber), BitTorrent

Lecture 17: Polling and Streaming

In most systems, clients and servers communicate with the client making requests and the server responding with the requested data. The question is how frequently the client needs updated data. That depends on your purpose and your system.

Polling is when a client requests a particular piece of data at regular intervals (say every x seconds) and the server replies each time with its usual response containing the data.

Where the client needs updates the instant they happen, polling may not suit your architecture. For example, say you are building a chat app where many clients communicate with each other in real time. As the system designer, you need to make sure clients get updates (chats, texts, messages) instantaneously. With polling at a fixed interval of x seconds, a message is only seen at the next poll, so the chat loses its instant feel and messages seem delayed.

Polling is a good fit for something like temperature updates at regular intervals of 30 seconds or 1 minute.

Streaming is done through sockets. In layman's terms, a socket is like a file that your computer can write to and read from over a long-lived connection with another computer; the connection stays open until one of the machines closes it.

Ex: designing a chat application like WhatsApp or Instagram.

You might think of shortening the polling interval instead of using streaming, say to 1 second or 0.5 seconds. But at 0.5 seconds, a single client sends 20 requests every 10 seconds, and with millions of clients the server would have to handle a huge number of requests at the same time. You may get an instant feel for messages, but this is not optimal because it puts far more load on the server.
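The trade-off can be put into numbers with two small Python helpers. Both function names are made up for this sketch, and it assumes polls happen at exact multiples of the interval, starting from time zero.

```python
import math


def polling_delay(message_time: float, interval: float) -> float:
    """Seconds between a message being sent and the next poll seeing it."""
    next_poll = math.ceil(message_time / interval) * interval
    return next_poll - message_time


def polling_requests_per_second(clients: int, interval: float) -> float:
    """Total request rate the server sees when every client polls."""
    return clients / interval


# A message sent at t=12s is seen later with a longer polling interval.
print(polling_delay(12, interval=30))   # waits for the poll at t=30
print(polling_delay(12, interval=5))    # waits for the poll at t=15

# Shorter intervals feel faster but multiply the load on the server.
print(polling_requests_per_second(1_000_000, interval=30))
print(polling_requests_per_second(1_000_000, interval=0.5))
```

With a million clients, a 0.5 second interval means two million requests per second, most of which return nothing new. Streaming avoids that by keeping one connection open per client and sending data only when there is something to send.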

Polling and streaming are equally important depending on the scenario or business need.

Lecture 18: Configuration

Configuration is a set of parameters or constants that the system uses. They are usually kept in a file that holds the different system settings.

This file is mostly in JSON or YAML format, since these are easy to read.

There are two types of configuration: static configuration and dynamic configuration.

With static configuration, the configuration is shipped with the application, so every change has to be made manually and the application redeployed.

Dynamic configuration lives outside the application, in a database that the application queries for its settings.

Dynamic configuration makes it possible to build a user interface for changing settings and to switch application behaviour without a redeploy. But it adds complexity to the system.

Rolling out new features is easy with dynamic configuration. But changes still need review so that they do not break the application.

Configuration design is a kind of design where a fixed set of predefined components that can be interfaced (connected) in predefined ways is given, and an assembly (i.e. designed artifact) of components selected from this fixed set is sought that satisfies a set of requirements and obeys a set of constraints.

Lecture 19: Rate Limiting

Rate limiting means putting a limit on how often an operation may be performed; requests beyond the limit get an error or some other response.

Suppose a client sends requests to a server. Without any limit, it may send more requests than the server can handle.

With rate limiting we set a limit, for example that the server accepts 4 requests every 10 seconds. If the request rate goes higher, the server returns an error or a message saying it cannot handle the request.
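The "4 requests every 10 seconds" rule can be sketched as a small sliding-window limiter in Python. The class name and the injectable `clock` parameter are assumptions for this example, and the state lives in one process; a per-user limiter across many servers would keep this state in a shared store instead.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_requests in any window of window_seconds."""

    def __init__(self, max_requests: int = 4, window_seconds: float = 10.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so tests can fake time
        self._timestamps = deque()    # times of recently accepted requests

    def allow(self) -> bool:
        now = self.clock()
        # Forget accepted requests that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False  # over the limit: the caller should return an error


fake_time = [0.0]
limiter = RateLimiter(clock=lambda: fake_time[0])

print([limiter.allow() for _ in range(5)])  # fifth request is rejected
fake_time[0] = 10.0                         # ten seconds later
print(limiter.allow())                      # accepted again
```

A web server would typically answer a rejected request with an error response telling the client to slow down.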

Rate limiting helps protect against Denial of Service (DoS) attacks.

We can rate limit on different factors, for example per user or per IP address.

A distributed DoS (DDoS) attack is much harder to stop with rate limiting alone, because the requests come from many different sources.

In a large distributed system, rate limiting is harder, because each request from a client may be routed to a different server. If we want rate limiting to work per user, the servers need to share the request counts through a separate service.

Redis is a very popular key-value database that is often used to hold this rate-limiting state.

Lecture 20: Logging and Monitoring

Suppose a customer buys a product from a website and hits an issue: the credit card is charged, but the customer is not given access to the product. This is a serious problem, and we need to know where it went wrong. In a large distributed system that can be complex to trace. Logging and monitoring help here by recording what happened while the user was purchasing the product. They also keep a history of each user's activity, which can help detect malicious behaviour.

There are various formats for saving logs.

CIS maintains a log of events that can be viewed at any time by clicking ‘View Logs’ from the General Tasks interface. The Log Viewer module opens with its home screen displaying a summary of CIS events: The left-hand side of the home screen displays a bar graph showing a comparison of the Antivirus events, Firewall events, and Defense+ events.

The JSON format annotated each line with its origin (stdout or stderr) and its timestamp. Each log file contains information about only one container. The JSON-file logging driver uses file-based storage. These files are designed to be exclusively accessed by the Docker daemon.

Google Stackdriver provides logging and monitoring.

It has monitoring tools to check the health of the system. Monitoring metrics can be derived from the system's logs, but then changing the log format breaks the metrics.

So we should store metrics in a time-series database, which is independent of the logs.

Grafana is a multi-platform open-source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.

We can also build an alerting system, for example one connected to Slack so that errors are posted to a Slack channel.

Lecture 21: Publish/Subscribe Pattern

Publish-subscribe is a communication pattern that is defined by the decoupling of applications, where applications publish messages to an intermediary broker rather than communicating directly with consumers (as in point-to-point). In this sense, publishers and consumers have no knowledge of each other; they simply produce or receive the events.

Persistent storage is any data storage device that retains data after power to that device is shut off. It is also sometimes referred to as non-volatile storage.

In a publish/subscribe system, published messages are typically kept in persistent storage, so they are not lost if a server goes down.

It has 4 components:





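Idempotent: an operation is idempotent if performing it multiple times has the same effect as performing it once. This matters in pub/sub because a message may be delivered more than once.

Examples of pub/sub systems: Apache Kafka, Google Cloud Pub/Sub.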
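Here is a minimal in-memory Python sketch of the pattern with an idempotent subscriber. The `Broker` and `IdempotentCounter` classes, the topic name, and the message fields are all invented for illustration; real systems such as Kafka add persistence, partitions, and delivery guarantees on top of this idea.

```python
from collections import defaultdict


class Broker:
    """Toy message broker: publishers and subscribers only know topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> handler functions

    def subscribe(self, topic: str, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict):
        for handler in self._subscribers[topic]:
            handler(message)


class IdempotentCounter:
    """Subscriber that ignores messages it has already processed."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def handle(self, message: dict):
        if message["id"] in self.seen_ids:
            return  # duplicate delivery: applying it again would double count
        self.seen_ids.add(message["id"])
        self.total += message["amount"]


broker = Broker()
payments = IdempotentCounter()
broker.subscribe("payments", payments.handle)

charge = {"id": "msg-1", "amount": 50}
broker.publish("payments", charge)
broker.publish("payments", charge)   # the same message delivered twice
print(payments.total)                # counted once
```

The publisher never calls the subscriber directly; it only knows the topic name, which is the decoupling the pattern is named for.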

Many more System Design concepts are yet to come.

Do give your valuable feedback and comment below.

Originally published at