After failing an interview, I decided to learn system design in depth. Then I realized that to crack the interview we don’t need to learn everything: the fundamentals of system design are enough. I went through various YouTube channels that teach system design and looked at the course content of SystemsExpert. Then I thought that, while learning, I could write a series of blog posts that tells people, in a nutshell, which topics to study and prioritize while preparing for a system design interview.



1) Design Fundamentals Introduction

2) What Are Design Fundamentals?

3) Client-Server Model

4) Network Protocols

5) Storage

6) Latency and Throughput

7) Availability

8) Caching

9) Proxies

10) Load Balancers

11) Hashing

12) Relational Databases

13) Key-Value Stores

14) Replication and Sharding

15) Leader Election

16) Peer-To-Peer Networks

17) Polling and Streaming

18) Configuration

19) Rate Limiting

20) Logging and Monitoring

21) Publish/Subscribe Pattern

“I guide others to a treasure I cannot possess”

Lecture 1: Design Fundamentals Introduction

Definition: Systems design is the process of defining the elements of a system, such as its architecture, modules, components, interfaces and data, based on the specified requirements.

Description: A systematic approach is required for a coherent and well-running system.

Designing a robust, functional and scalable system is very important, since it reduces the complexity of the system and makes it more efficient in terms of processing and data access.

Why should you learn System Design?

Understanding scalable system design might help you save millions of dollars.

It gives a better user experience, which is good for business and helps you gain more clients.

Handling system failures becomes easier with a well-designed system architecture.

Some of the other benefits are as follows.

Some of the important topics that help in building such systems are as follows.

SQL, Server, Cache, Polling, Load Balancer, Availability, Databases, Map-Reduce, Client, Leader Election, Peer to Peer, HTTP, Nginx, Replication, Hashing, Network Protocol.

Lecture 2: What are Design Fundamentals?

No matter your role in the tech world, System design is a concept you should be somewhat familiar with. It is the process through which you can design and build scalable and maintainable applications.

Interview Aspects:

Always be aware of simple, fundamental system designs, since interview questions are sometimes of the form “design YouTube with all its basic functionality”.

In a system design interview there is no single right answer, but if you want to become a software engineer you should know the basic fundamentals behind your answer and why they matter.

You should be able to defend your solution and handle any question from the interviewer. This is where knowing the design fundamentals comes into the picture.

Categories for Design Fundamentals:

  • Underlying Foundational Knowledge.
  • Key Characteristics of System (Availability, Consistency, Latency, Throughput, Redundancy)
  • Components of System (Load Balancer, Proxies, Caches, Leader Election etc.)
  • Technology Tools (Nginx, Cloud Service, etc.)

Lecture 3: Client-Server Model

The client-server model describes how a server provides resources and services to one or more clients. Examples of servers include web servers, mail servers, and file servers. Each of these servers provides resources to client devices, such as desktop computers, laptops, tablets, and smartphones.

It is the foundation of modern computing. It explains how two systems communicate.

The client speaks to the server (it requests or sends data); the server listens and responds according to the client’s request.

If you want to talk to anyone you need their contact details. Similarly, a client needs the address of the server so that they can interact.

An IP address is a unique identifier for a machine. Clients use it to send data to a specific system or server.

DNS (Domain Name System) is used to get the IP address of a particular site.

DNS is a database of IP addresses; you can think of it as the phonebook of the Internet.

A DNS query is a special request that goes to a DNS server and returns the IP address.

The client sends an HTTP request to the server. The request travels in packets that contain the source address and the destination address. Servers are the machines that receive the request. A machine with a given IP address has 65,536 ports (numbered 0 to 65535), so the client cannot guess which port the server is listening on. Hence the port must be specified so that the request is routed to the correct program on the server.

If a client wants to talk to a server over HTTP, it uses port 80 by default.

For HTTPS it uses port 443.
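
To make the default ports concrete, here is a minimal Python sketch of how a client could decide which port to connect to for a given URL. The function name `server_port` and the example URLs are made up for illustration; only `urllib.parse.urlsplit` comes from the standard library.

```python
from urllib.parse import urlsplit

# Well-known default ports mentioned above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise use the scheme default.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(server_port("http://example.com/index.html"))  # 80
print(server_port("https://example.com/"))           # 443
print(server_port("http://localhost:8080/health"))   # 8080
```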

Local IP Address:

Lecture 4: Network Protocols

Software engineers don’t work with network protocols directly every day, but it is necessary to know them.

Focus on these 4 for now: IP, TCP, HTTP and UDP.


Internet Protocol (IP): The modern Internet runs on IP. One machine talks to another in the form of IP packets (made of bytes). An IP packet has an IP header (source IP address, destination IP address, total size of the packet, version of IP) and data. The whole packet is limited to about 2¹⁶ bytes.

One IP packet is often not enough to send the data, so we have to send multiple IP packets.

But then packets may arrive out of order, get lost or be corrupted. To solve this we use TCP (Transmission Control Protocol).

TCP establishes a connection via a three-way handshake.

TCP is a more powerful wrapper around IP. It is used to transfer data reliably and in order.

But TCP only carries raw bytes and gives developers no structure for the conversation. So on top of it we have HTTP (HyperText Transfer Protocol).

How does HTTP work? As a request-response protocol, HTTP gives users a way to interact with web resources such as HTML files by transmitting hypertext messages between clients and servers. HTTP clients generally use Transmission Control Protocol (TCP) connections to communicate with servers.

Suppose we want to send data very fast and we don’t care about ordering or error control; then we use UDP.

It is used especially for time-sensitive transmissions such as video playback or DNS lookups.

But if reliability and correctness are core aspects of your system, avoid UDP and use TCP.

UDP (User Datagram Protocol) is a communications protocol that is primarily used for establishing low-latency and loss-tolerating connections between applications on the internet. It speeds up transmissions by enabling the transfer of data before an agreement is provided by the receiving party.

Lecture 5: Storage

Storage is one of the backbones of system design: if you have data, you need somewhere to store it.

A database is used as storage: it stores and retrieves data (writes and reads).

A database is just a server. A local computer can also act as a database.

Data in a database should persist if a power outage occurs.

A database server writes data to disk, so the data is not lost when the power goes out.

If we use memory to store the data, there is no persistence.

Memory stores data temporarily, so if we restart the server the data is lost. Hence, to achieve persistence we need to store the data on disk.

When choosing a database you should know what kind of storage system you require.

There are many databases for different purposes and functionality.

If you want to store structured data, go for a relational database.

If you want to store unstructured data, go for a NoSQL database.

Here is the list of databases you can study.

Lecture 6: Latency and Throughput

Latency is how long it takes data to get from one point in a system to another.

The time taken for a request to travel from client to server and for the response to come back is also known as latency.

Reading 1 MB sequentially from memory takes about 250 microseconds.

Reading 1 MB sequentially from an SSD takes about 1,000 microseconds.

Sending 1 MB over a 1 Gbps (gigabit per second) network takes about 10,000 microseconds (for example, an API request).

Reading 1 MB sequentially from an HDD takes about 20,000 microseconds.

Sending a packet California->Netherlands->California takes about 150,000 microseconds.
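
To get a feel for these numbers, the short Python sketch below expresses each one as a multiple of the memory read. The figures are the approximate ones quoted above; the dictionary keys and the function name are only illustrative.

```python
# Approximate latencies quoted above, in microseconds.
LATENCY_US = {
    "read 1 MB from memory": 250,
    "read 1 MB from SSD": 1_000,
    "send 1 MB over the network": 10_000,
    "read 1 MB from HDD": 20_000,
    "packet California -> Netherlands -> California": 150_000,
}


def relative_to_memory(latencies):
    """Express every latency as a multiple of the memory read."""
    base = latencies["read 1 MB from memory"]
    return {name: us / base for name, us in latencies.items()}


for name, ratio in relative_to_memory(LATENCY_US).items():
    print(f"{name}: {ratio:.0f}x")
```

With these figures the SSD read is 4x the memory read, the HDD read is 80x, and the intercontinental round trip is 600x.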

Latency always needs to be considered, and how much it matters depends on the system. For a game server latency should be low, whereas in a payment system uptime and accuracy matter more, so higher latency is acceptable.


Throughput is how much work a machine can perform in a given amount of time.

We often measure throughput as the amount of data transferred from one point to another per unit of time, for example gigabits per second.

The number of requests a server can handle in a given amount of time is also known as throughput.

Adding more servers helps improve throughput, and it can improve latency when requests were waiting on an overloaded server.

Lecture 7: Availability

System availability means that even if part of the system fails (a database fails, a component fails, etc.), the system stays available to the client; ideally it never goes down.

High fault tolerance is the way to increase availability.

Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail. … Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service.

If the system is not available we might lose customers and get bad publicity.

How much availability is needed depends on the system. For example, if YouTube goes down it causes problems for Google.

Availability is measured in nines: 99% is 2 nines of availability, 99.99% is 4 nines.

The more nines, the higher the availability.
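
To see what the nines mean in practice, here is a small Python calculation of the downtime each level allows per year. It assumes a 365-day year; the function name is made up for this sketch.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_per_year(availability_percent: float) -> float:
    """Allowed downtime in seconds per (365-day) year."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct) / 3600
    print(f"{pct}% -> {hours:.2f} hours of downtime per year")
```

Two nines (99%) allow about 87.6 hours of downtime per year, while four nines (99.99%) allow less than one hour.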

Service Level Agreement/Service Level Objective:

Many service providers offer an SLA to customers. An SLA guarantees a certain amount (%) of availability.

To achieve high availability we have to avoid single points of failure.

We have to make our system redundant. We can do this by adding more servers, with load balancers to distribute the requests across them.

Redundancy can be introduced simply by adding more copies of a component.

Passive Redundancy: When you have multiple machines and one fails, the other machines handle the load.

Active Redundancy: When one machine (the leader) fails, the other machines take over via leader election.

Lecture 8: Caching

Caching is one of the most important concepts of system design.

Caching avoids redoing the same task. It is used to speed up the system.

It also helps improve latency and throughput.

Caching can be designed into a system for operations such as network requests, so that we don’t have to handle the same type of request from scratch again and again.

Caching is done at the client or the server level.

With client-level caching, the client does not need to send the request to the server.

With server-level caching, the server reduces repeated database interactions.

Caching is also done at the hardware and OS level (a high-level understanding is enough).

In a client-server architecture, the client makes a network request to the server, and the server then makes a request to the database. This is time consuming, especially when we request data that needs complex computation: computing it every time hurts latency. If we use caching, the results are stored, for example on the client side (much like memoization in dynamic programming). This improves the latency of the system.

Websites use caching. When we load a site it takes some time depending on our internet connection, but if we navigate to other pages and come back to the previous page, it does not load again and the result shows up instantly.

Redis (a key-value store) is a popular database used to implement caching.

Redis stores data in key-value form. When the user requests some data, we check the cache: if the key is present, its value is returned.

Write-Through Cache: We write the piece of data to both the database and the cache.

Write-Back Cache: We write only to the cache, not to the database. The database is then updated at a certain interval.
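
The difference between the two write policies can be sketched in a few lines of Python. This is only a toy model: the "database" is a plain dict, the class names are made up, and `flush()` stands in for whatever periodic job would update the real database.

```python
class WriteThroughCache:
    """Every write goes to the cache and the database at the same time."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.database[key] = value


class WriteBackCache:
    """Writes go to the cache only; the database catches up on flush()."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the database

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # In a real system this would run at some interval.
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()


db = {}
cache = WriteBackCache(db)
cache.write("post:1", "hello")
print(db)      # {} until the flush happens
cache.flush()
print(db)      # {'post:1': 'hello'}
```

Between a write and the next flush, the write-back cache and the database disagree, which is exactly the window in which stale data can be read.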

Consider writing a post on LinkedIn or Facebook: we might edit the post later.

For example, many people comment on YouTube. If each client or server keeps its own cache and those caches are not kept in sync (as with a write-back cache), then when a user edits a comment, others may see a stale version of it.

Hence we need a single shared cache, such as a Redis database, sitting between the client and the server.

This level of care is not required for something like the YouTube view count, since a slightly stale count is not important. But if a user replies to a stale version of a comment, it gives a bad user experience.

Caching static and immutable data is beautiful, but for dynamic data it is very tricky and increases complexity.

Least Recently Used (LRU) and Least Frequently Used (LFU) are policies for choosing which data to remove (evict) from a cache.
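
As an illustration of LRU eviction, here is a minimal sketch built on Python's `collections.OrderedDict`. The class name and the tiny capacity are only for the example; real caches such as Redis have their own eviction settings.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```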

Lecture 9: Proxies

A proxy lets somebody access something on the internet without revealing their identity.

Forward Proxy: It is a server (or set of servers) that sits between client and server and acts on behalf of the client. When the client makes a request, it goes to the forward proxy, which then sends it to the server.

The server responds to the proxy, which forwards the response to the client. If the client doesn’t want the server to know its identity, the client uses a forward proxy.

E.g. a VPN (Virtual Private Network).

Reverse Proxy: It acts on behalf of the server. When a client makes a request, it goes to the reverse proxy, which passes it on to the server. Clients do not know that their requests go through a reverse proxy.

Servers use reverse proxies and clients use forward proxies.

If we want to keep bad requests away from the server, the reverse proxy can act as a filter.

Proxy servers can also act as load balancers. Malicious clients or servers can also be filtered out by these proxy servers.

Lecture 10: Load Balancer

When millions of users send requests to one server, the server may get overloaded or slow, which leads to failures or a bad user experience.

Horizontal and vertical scaling are the 2 ways to scale a system. In vertical scaling we increase the computational power of a single server, so it is limited.

In horizontal scaling we add more servers so that requests can be distributed across them.

A load balancer does that distribution efficiently, so that the system as a whole responds faster.

Routing or distributing the traffic coming from clients across servers in a balanced manner is the main job of a load balancer.

A load balancer mostly sits between client and server, but it can also sit between server and database, or anywhere else load has to be managed.

There are software and hardware load balancers. A software load balancer is easy to customize, whereas a hardware one is hard to modify.

The load balancer is configured by the owner of the servers.

For routing or distributing requests, a load balancer can pick servers in random order.

This may cause issues if the same server happens to get too many requests. So we need a technique that distributes requests uniformly.

Round robin: requests go to the servers one after another in a fixed order, so the load is spread evenly. If one server reports that it is full, the request is redirected to the next one.

Weighted round robin: the more powerful the server, the higher its weight. The load balancer routes more requests to high-weight servers since they can handle more requests than the others.

Performance based: the load balancer checks how each server is performing; if one is performing poorly, it chooses a server that is performing well.

IP based: the client’s IP address is hashed, and the hash decides which server the request goes to.

Path based:

Requests are routed by their path, for example all payment-related requests go to one set of servers, with different servers for different services.

Multiple load balancers may use different server selection strategies. Also, if one load balancer gets overloaded, another load balancer can help.
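
Round robin and weighted round robin from the strategies above can be sketched with `itertools.cycle`. This is a simplified illustration: the server names and weights are invented, and the weighted version simply repeats a server in the rotation, which is one of several ways to implement weights.

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers one after another, forever, in a fixed order."""
    return cycle(servers)


def weighted_round_robin(weights):
    """A server with weight 3 appears 3 times in every rotation."""
    expanded = [
        server
        for server, weight in weights.items()
        for _ in range(weight)
    ]
    return cycle(expanded)


rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(5)])
# ['s1', 's2', 's3', 's1', 's2']

wrr = weighted_round_robin({"big": 3, "small": 1})
print([next(wrr) for _ in range(8)])
# ['big', 'big', 'big', 'small', 'big', 'big', 'big', 'small']
```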

Lecture 11: Hashing

Hashing transforms a piece of data of arbitrary size into a fixed-length value.

Suppose we cache a user’s data on one server. If the load balancer uses round robin, the user’s next request will usually land on a different server, which reduces cache hits.

A cache hit is a state in which data requested for processing by a component or application is found in the cache memory. It is a faster means of delivering data to the processor, as the cache already contains the requested data.

So if a user’s requests are always routed to the same server, processing is very fast.

Hashing helps route a particular user’s requests to the same server every time.

We hash the client/user and take the result modulo the number of servers.

This assigns the client to a server, so the load balancer picks the same server for that client every time.

Hashing distributes client requests in a predictable manner.

If one of the servers dies or a new server is added, the number of servers changes and so does the result of the modulo, so most clients move to a different server and the cache hit rate drops.
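
The problem with "hash modulo number of servers" is easy to demonstrate. The sketch below is an illustration under assumptions: client ids are made-up strings, and MD5 is used only because Python's built-in `hash()` for strings changes between runs.

```python
import hashlib


def stable_hash(key: str) -> int:
    """A hash that is the same on every run, unlike built-in hash() for str."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def pick_server(client_id: str, num_servers: int) -> int:
    """Server index for a client: hash, then modulo the number of servers."""
    return stable_hash(client_id) % num_servers


clients = [f"client-{i}" for i in range(1000)]

# Count clients whose server changes when a fifth server is added.
moved = sum(pick_server(c, 4) != pick_server(c, 5) for c in clients)
print(f"{moved} of {len(clients)} clients change server")
```

With a well-mixed hash we would expect roughly 80% of clients to move when going from 4 to 5 servers, so most of the cached data ends up on the wrong server.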

So we need to use some concept which solves the above problem.

Consistent Hashing:

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This allows servers and objects to scale without affecting the overall system.

Servers are placed on a circle (ring) at positions given by a hash function.

Clients are also placed on this ring; the “objects” in the definition above can be thought of as clients.

A client’s request is routed to the nearest server in the clockwise direction.

If a server dies or a server is added, only the clients next to it on the ring move, so most cache hits are preserved.

If we want a more evenly distributed system, we can use a more sophisticated hash function.

We can also place a more powerful server at multiple positions on the ring by passing it through multiple hash functions.
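
A hash ring can be sketched in Python with a sorted list and `bisect`. This is a minimal illustration, not a production implementation: the class and method names are made up, MD5 is just a convenient stable hash, and instead of several different hash functions each server is hashed under several labels (`"A#0"`, `"A#1"`, ...) to get multiple positions on the ring.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # positions per server on the ring
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = stable_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, client_id):
        """Walk clockwise from the client's position to the next server point."""
        point = stable_hash(client_id)
        index = bisect.bisect_right(self._points, point) % len(self._points)
        return self._owner[self._points[index]]


clients = [f"client-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C", "D"])
before = {c: ring.lookup(c) for c in clients}
ring.remove("D")
after = {c: ring.lookup(c) for c in clients}
moved = [c for c in clients if before[c] != after[c]]
print(f"{len(moved)} of {len(clients)} clients moved after removing D")
```

Only the clients that were on the removed server move; everyone else keeps their server and their cached data.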

Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to.
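
For the simplest case of picking one server per key, rendezvous hashing fits in a few lines of Python. This is a sketch under assumptions: the function names are made up and SHA-256 is just one possible scoring hash.

```python
import hashlib


def score(server: str, key: str) -> int:
    """Pseudo-random weight for a (server, key) pair."""
    return int(hashlib.sha256(f"{server}:{key}".encode()).hexdigest(), 16)


def rendezvous_pick(servers, key):
    """Every client computes the same scores, so all agree on the winner."""
    return max(servers, key=lambda server: score(server, key))


servers = ["A", "B", "C"]
print(rendezvous_pick(servers, "user-42"))
```

If a server is removed, only the keys for which it had the highest score get a new server; all other keys keep their previous winner.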

Lecture 12: Relational Database

A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.

A relational table is very structured. It has rules and a schema.

A relational table consists of rows and columns.

SQL stands for Structured Query Language. It is designed for managing data in a relational database management system (RDBMS). It is pronounced S-Q-L or sometimes “sequel”. SQL is a database language used for creating and deleting databases, fetching rows, modifying rows, etc.

Why do we need relational databases?

NoSQL databases are good for storing non-relational data, but they are often not as good for querying it.

If we query the data with plain Python, it is very hard, since we first have to load terabytes of data into memory before we can query it.

Hence for large datasets Python alone is not the best solution. SQL supports fast querying, which is why it is so widely used across organizations.

Index in Database:

An index is a powerful feature for speeding up queries in SQL.

It is used for fast searching. We can create an index on a table to optimize searches over millions of rows.

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
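
We can watch an index change how a query is executed using SQLite, which ships with Python. The table, column and index names below are invented for the example, and the exact wording of the query plan differs between SQLite versions, so the sketch prints it rather than assuming it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [(f"user{i}", f"city{i % 100}") for i in range(10_000)],
)


def query_plan(connection, sql):
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the text


sql = "SELECT name FROM users WHERE city = 'city42'"

plan_before = query_plan(conn, sql)  # full table scan
conn.execute("CREATE INDEX idx_users_city ON users (city)")
plan_after = query_plan(conn, sql)   # lookup through the index

print(plan_before)
print(plan_after)
```

Before the index exists the plan is a scan of the whole table; afterwards it is a search that uses `idx_users_city`.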

Lecture 13: Key-Value Stores

As the name suggests, it stores data in key-value form. It is a NoSQL type of database.

You can think of this type of database as a hash map.

Caching implementations use a key-value structure. Dynamic configuration also uses this storage structure.

Since data retrieval is very fast, it reduces latency and increases throughput.

E.g. DynamoDB, Redis, ZooKeeper, etc.

Some key-value stores write data to disk, so if the system goes down you will not lose any data.

Redis is an in-memory key-value store: it keeps data in memory, so if the system fails the data can be lost unless persistence is configured.

There are various other types of NoSQL databases as well.

Lecture 14: Replication and Sharding

System performance mostly depends on the database. If the database is not available, the system might not be available either. Hence, to guard against database failure we need replication: a secondary (replica) database.

We create a replica of the main database so that if the main database goes down, the replica takes over. For this the main database must be kept synchronized with the replica, so that the data stays consistent and there are no data integrity issues.

So if there is a write operation on the main database, it is applied to the replica as well.

Suppose we want to write a post on LinkedIn

Sometimes a database holds terabytes of data, and replicating all of it is not optimal.

The solution for that is sharding. We split the data across multiple databases, which are known as shards.

We can use different strategies to split the data, for example by region or by username.

But an obvious split like this can send a disproportionate amount of traffic to a few shards (hotspots).

We should use hashing for a uniform distribution across shards.

Lecture 15: Leader Election

Many organizations have services that depend on third-party services. The third-party service should not have direct access to any sensitive data. So we can put one server of our own in between, which connects to the third-party service so that it has no direct access.

If we want to deal with failure of that server, we need to add redundancy by running more copies of it.

But we don’t want all the copies to perform the same work, since duplicated work is useless or even harmful. So we elect one leader among the servers. The leader does all the work and the other servers wait. If the leader fails, another leader is elected and takes over the previous leader’s work.

But electing a new leader after a failure is very complex, since all the servers must reach consensus on who the leader is at any given point in time.

So we have to use a consensus algorithm.

A leader election algorithm is a special case of consensus algorithm where the set is the set of participants. … This is often how consensus is solved in practice: choose a leader (using a specialized consensus algorithm), then the leader broadcasts its choice of value each time a consensus is needed.

E.g. Paxos, Raft, etc.

ZooKeeper and Etcd are tools that help you implement your own leader election.

Etcd is a key-value store. It uses the Raft algorithm for leader election. Etcd provides high availability and strong consistency, so we can rely on it to elect a leader properly.

A consensus algorithm is a mechanism in computer science used to establish agreement on a single data value across distributed processes or systems. … For blockchain networks, the consensus algorithms are an essential element because they maintain the integrity and security of these distributed computing systems.

The purpose of leader election is to choose a node that will coordinate activities of the system. In any election algorithm, a leader is chosen based on some criterion such as choosing the node with the largest identifier. Once the leader is elected, the nodes reach a particular state known as terminated state.

Lecture 16: Peer-To-Peer Networks

A peer-to-peer (P2P) network is a group of computers, each of which acts as a node for sharing files within the group. Instead of having a central server to act as a shared drive, each computer acts as the server for the files stored upon it. When a P2P network is established over the Internet, a central server can be used to index files, or a distributed network can be established where the sharing of files is split between all the users in the network that are storing a given file.

Suppose we want a server to transfer large files to many clients; then peer-to-peer is an efficient solution.

If a central server transfers the file to the clients one by one, it takes a while.

If there are 1000 clients and the server can send a 5 GB file at 5 GB per second, it takes 1000 seconds, which is around 17 minutes. That is not good at all.

In a peer-to-peer network each client computer acts as a server, or peer, and peers can communicate with each other. If we divide the 5 GB file into 1000 chunks, each chunk takes about 0.001 seconds to send, and every peer that has a chunk can pass it on to other peers, so the number of copies grows exponentially. Hence the transfer gets much faster.
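
The gap between the two approaches can be estimated with a small Python calculation. This is an idealized model, not a simulation of a real protocol: it assumes every peer can upload as fast as the server, and that in each round every machine that already has the data passes it to exactly one machine that does not. The function names are made up.

```python
import math


def sequential_seconds(num_clients: int, seconds_per_file: float) -> float:
    """The server sends the whole file to one client after another."""
    return num_clients * seconds_per_file


def doubling_rounds(num_clients: int) -> int:
    """Rounds needed when the number of machines holding the data doubles each round."""
    # Holders go 1 (server) -> 2 -> 4 -> ... until they cover the server plus all clients.
    return math.ceil(math.log2(num_clients + 1))


print(sequential_seconds(1000, 1.0))  # 1000.0 seconds, about 17 minutes
print(doubling_rounds(1000))          # 10 rounds
```

Under these assumptions, 1000 clients are covered in 10 rounds of sharing instead of 1000 sequential transfers.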

But something must track which peer has received which chunk. This is done by a tracker.

There are also gossip (or epidemic) protocols, in which peers talk to each other to spread this information.

A Distributed Hash Table (DHT) is used by a peer-to-peer network to record which peer holds which chunk, so that the chunks of the file can be put together correctly.

E.g. Kraken (Uber), BitTorrent.

Lecture 17: Polling and Streaming

In systems, clients and servers communicate: the client sends a request and the server responds to that client with the requested data. Now there is a choice: how frequently does the data need to be updated? It all depends on your purpose and your system.

Polling is when a client requests a particular piece of data at regular intervals (maybe every x seconds) and the server replies each time with a normal response containing the required data.

When the client needs updates from the server instantly, polling may not suit your architecture. Example: you are building a chat app in which many clients communicate with each other in real time. As the system designer, you need to make sure that clients get updates (chats, texts, messages) instantaneously. With polling at an interval of x seconds, a message can only show up at the next poll, so the chat would feel delayed rather than instant.

Polling is best suited to things like getting temperature updates at regular intervals of maybe 30 seconds or 1 minute.

Streaming is done through sockets. In layman’s terms, a socket is like a file that your computer can write to and read from over a long-lived connection with another computer; the connection stays open until one machine closes it.

Example: designing a chat application like WhatsApp, Instagram and many others.

Here you might think of reducing the interval and using polling instead of streaming. Say you reduce the interval to 1 or 0.5 seconds: then within 10 seconds a single client sends up to 20 requests, and with millions of clients the server would struggle to handle all these requests at the same time. You may get the instant experience in messages, but this is not optimal as it puts much more load on the server.

Polling and streaming are both important; which one to use depends on the scenario or business need.

Lecture 18: Configuration

Configuration is a set of parameters or constants that the system uses. It usually lives in a file that holds the different system settings.

It is mostly in JSON or YAML format, since these are readable.

There are 2 types of configuration:

static configuration and dynamic configuration.

With static configuration, we have to change the file manually (and typically redeploy) every time the application configuration changes.

Dynamic configuration needs some database from which the application queries its configuration.

Dynamic configuration makes it possible to build a user interface from which we can change settings and switch between versions of the application. But it adds some complexity to the system.

Deploying new features is easy with dynamic configuration. But changes need to be reviewed so that new features don’t break the application.

Configuration design is a kind of design where a fixed set of predefined components that can be interfaced (connected) in predefined ways is given, and an assembly (i.e. designed artifact) of components selected from this fixed set is sought that satisfies a set of requirements and obeys a set of constraints.

Lecture 19: Rate Limiting

Rate limiting means putting a limit on some operation, so that requests beyond the limit get an error or some other response.

Suppose a client requests a service from a server. If we don’t have any limit, the client may send more requests than the server can handle.

So with rate limiting we set a limit, for example the server accepts 4 requests every 10 seconds. If requests exceed that, the server returns an error or a message saying it can’t handle the request.
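
The "4 requests every 10 seconds" rule can be sketched as a sliding-window limiter in Python. This is one possible approach, not the only one: the class name is made up, and the current time is passed in explicitly to keep the example deterministic (a real server would use something like `time.monotonic()` and keep one limiter per client).

```python
from collections import deque


class SlidingWindowRateLimiter:
    def __init__(self, max_requests=4, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted requests

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        # Forget requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False  # over the limit: the server would return an error


limiter = SlidingWindowRateLimiter()
print([limiter.allow(t) for t in (0, 1, 2, 3, 4)])
# [True, True, True, True, False]
print(limiter.allow(10))  # True: the request from t=0 has expired
```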

Rate limiting helps protect against denial-of-service (DoS) attacks.

We can apply rate limiting based on different factors.

A distributed DoS (DDoS) attack cannot be solved by rate limiting alone.

In a large-scale distributed system, rate limiting is hard because each request from a client may be routed to a different server. So if we want rate limiting to work for specific users, the counts must be kept in a separate shared service rather than on each server.

Redis is a very popular key-value database that is often used to hold this rate limiting state.

Lecture 20: Logging and Monitoring

Suppose a customer buys a product from a website and hits an issue: the credit card gets charged but the customer is not given access to the product. This is a big problem, and we need to know where it went wrong. In a large-scale distributed system this can be complex. Logging and monitoring help us trace the user’s activity during the purchase. They also store a history for every user, which can help find malicious activity.

There are various formats for saving logs.

CIS maintains a log of events which can be viewed at any time by clicking ‘View Logs’ from the General Tasks interface. The Log Viewer module opens with its home screen displaying a summary of CIS events: The left hand side of the home screen displays a bar graph showing a comparison of the Antivirus events, Firewall events and Defense+ events.

The JSON format annotates each line with its origin (stdout or stderr) and its timestamp. Each log file contains information about only one container. The json-file logging driver uses file-based storage. These files are designed to be exclusively accessed by the Docker daemon.

Google Stackdriver provides logging and monitoring.


It has monitoring tools to check the health of the system. Monitoring metrics can be derived from the system’s logs, but then changing the log format breaks the metrics.

So we should use a time-series database, which is independent of the logs.

Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.

We can also create an alerting system, for example by connecting it to Slack so that we get notified in Slack whenever an error occurs.

Lecture 21: Publish/Subscribe Pattern

Publish-subscribe is a communication pattern that is defined by the decoupling of applications, where applications publish messages to an intermediary broker rather than communicating directly with consumers (as in point-to-point). In this sense, publishers and consumers have no knowledge of each other; they simply produce or receive the events.
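
The decoupling described above can be illustrated with a tiny in-memory broker in Python. This is only a sketch: the class and topic names are made up, messages are delivered synchronously, and nothing is persisted, unlike real systems such as Kafka.

```python
class Broker:
    """Keeps a list of subscriber callbacks per topic."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers reached


broker = Broker()
received = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda message: print("got", message))

# The publisher only knows the topic name, not who is listening.
broker.publish("orders", {"id": 1, "item": "book"})
print(received)  # [{'id': 1, 'item': 'book'}]
```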

Persistent storage is any data storage device that retains data after power to that device is shut off. It is also sometimes referred to as non-volatile storage.

Publish/ Subscribe Pattern helps build persistent databases.

It has 4 components:





Idempotent: An operation is idempotent if performing it multiple times gives the same result as performing it once. Examples of Pub/Sub systems: Apache Kafka, Google Cloud Pub/Sub.

Many more System Design concepts are yet to come.

Do give your valuable feedback and comment below.

Originally published at




Lead Data Scientist at Tata Power Ltd.

Vikas Maurya
