distributed systems - freeCodeCamp.org

How to Use Time To Live in Event-Driven Architecture in AWS

Anant Chowdhary — Wed, 19 Jun 2024 18:08:45 +0000

Distributed systems generally involve the storage and exchange of huge amounts of data. Not all data is created the same, and some of it can even expire – by design.

As the Buddha said, "All conditioned things are impermanent."

In this article, we'll look at how the concept of time to live can help us with this type of data and when it makes sense to use it.

What is Time to Live (TTL) in Distributed Systems?
How to use TTL in Message Queues (AWS SQS)
How to use TTL in Object Storage Systems (AWS S3)
How to use TTL in Databases (AWS DynamoDB)
How to use TTL in Event Based Architecture
Summary

What is Time to Live (TTL) in Distributed Systems?

TTL, as the name suggests, is the amount of time a piece of data stays relevant or stays stored in a distributed system or a component of a distributed system. A TTL may be set on any piece of data that isn't needed indefinitely.

Knowing when and when not to use a TTL can sometimes be tricky. It can also affect the way a system is designed, cost and scaling considerations. In the following sections, we learn about when and when not to use TTL.

Where does TTL make sense?

As mentioned above, it makes sense to use TTL for any piece of data that is ephemeral. Some common examples of use cases where you can set a TTL on data are:

Cached data: Cached data is pretty much omnipresent in distributed systems. For instance, a very popular social media post's resources (image, video, audio) may be cached on a CDN (Content Delivery Network)'s servers. You don't want this data to live forever on the server, so in some cases it may make sense to add a TTL to this data, so that it is automatically removed after a certain period of time.
Analytics Data: Most if not all large scale systems store some form of metrics that help analyze things like latency, system health, and product metrics amongst others. In a large number of cases, you wouldn't want these metrics to be stored in systems forever. Only recent data (say 60 days or 180 days) may be useful in most cases. A TTL on data in this case makes sense, especially if you have constraints on memory.
Indexed data: Search is a feature that's ubiquitous across products. Be it social media apps, e-mail or search engines – indexed data is vital to blazing fast searches. Indexed data, however, can become stale after a while, so it makes sense for the index to expire after some time. Hence, a TTL here can be useful.
Social media apps with short lived content: Social media apps with short lived content are extremely popular and images/videos posted are often short lived. In case these images do not need to be stored for posterity, they can benefit from a TTL being set on them. In addition to being memory efficient, it also aids privacy.

Where does TTL not make sense?

In the above section we looked at a few cases where TTL makes sense. What about cases where TTL isn't common and isn't useful? Let's look at some examples:

Media stored for streaming platforms: Streaming platforms often use cloud storage solutions such as Amazon AWS S3 to store objects that correspond to the media they stream to customers. These forms of media are generally not ephemeral and are expected to stay on platforms for years if not decades. Since such data isn't expected to expire anytime, TTL does not make sense here.
Bank transactions: Bank transactions produce some of the most sensitive data that are stored in cloud-based and distributed systems. For audit and book-keeping purposes, these pieces of data are generally stored for decades. So, since this data seemingly never expires, there's generally no use for a TTL here. This isn't to say that this form of data can't be moved from fast access databases/caches to slower and cheaper forms of data storage, though.

How to Use TTL in Message Queues (AWS SQS)

AWS SQS is a distributed message queuing solution that is the backbone of many versatile distributed systems across the world. Message queues can process billions of messages and are used almost universally across distributed systems around the world.

In this section, we'll look at how TTLs can be useful while we consider design options with respect to message queues.

What happens if a message queue's consumers have been backed up for several days, or messages simply haven't been consumed for a while? We have the option of setting a custom Time To Live on SQS messages.

By default, the retention period is 4 days. The maximum TTL at the time of writing is 14 days. So it's important to be aware of constraints such as these while using AWS SQS to design systems.

Note that with AWS SQS, a retention period is a set on the queue itself, and not individually for each message.

Boto is an AWS SDK for Python that enables developers to create, configure, and manage AWS services and resources programmatically. Boto is widely used for prototyping, production systems, and in general offers a user-friendly interface for accessing services like S3, EC2, and DynamoDB.

Here's a code snippet using Boto that will help you set the MessageRetentionPeriod attribute which is the formal name for TTL in this context.

sqs = boto3.client('sqs', 
aws_access_key_id=your_aws_access_key_id, 
aws_secret_access_key=aws_secret_access_key, 
region_name='your_region')

# Set the desired retention period in seconds
retention_period_seconds = 86400  # Example: 1 day

# Set the queue attributes
response = sqs.set_queue_attributes(
    QueueUrl=your_queue_url,
    Attributes={
        'MessageRetentionPeriod': str(retention_period_seconds)
    }
)

Visibility Timeout in Message Queues (AWS SQS)

Note that while it's tempting to think of Visibility Timeout in SQS as Time To Live, these aren't the same. Time To Live or Retention Period is different from Visibility Timeout.

Visibility timeout instead refers to a generally shorter period of time by which a message should be processed (once picked up by a consumer). If not, it is back in the SQS queue and visible to consumers again, with its receive count having been increased by one.

How to Use TTL in Object Storage Systems (AWS S3)

The all-versatile AWS S3, which is an object storage solution, gives users the ability to set a Time To Live on objects stored in S3 buckets.

S3 is extremely flexible with the way TTLs are set on objects / buckets. You can set Lifecycle rules to specify what objects or what versions of an object you'd like to remove.

Managing your storage lifecycle is a great read on the AWS Documentation website.

How to Use TTL in Databases (AWS DynamoDB)

Some types of data in databases are prime candidates to have a TTL set on them. Pieces of data such as logs and analytics data may become stale very fast, and/or they may lose utility with time.

TTL in DynamoDB provides a cost-effective approach that lets you automatically remove items that are no longer relevant. It is supported natively and can be set on the whole DynamoDB table.

Here's a code snippet that lets you set the TTL on a DynamoDB table (again, using Boto):

ddb_client = boto3.client('dynamodb')

# Enable Time To Live (TTL) on an existing DynamoDB table
ttl_response = ddb_client.update_time_to_live(
    TableName=your_table_name,
    TimeToLiveSpecification={
        'Enabled': True,
        'AttributeName': your_ttl_attribute_name
    }
)

# Check for a successful status code in the response
if ttl_response['ResponseMetadata']['HTTPStatusCode'] == 200:
    print("Time To Live (TTL) has been successfully enabled.")
else:
    print(f"Failed to enable Time To Live (TTL)")

Here, the your_ttl_attribute_name attribute is the attribute that DynamoDB looks at to determine whether or not the item is to be deleted. The attribute is generally set to some timestamp in the future. When that timestamp is reached, DynamoDB removes the item from the table.

How to Use TTL in Event-Based Architecture

So far we've discussed Time To Live and where it can be useful. What about its implications? Lots of cloud based solutions provide notifications that can indicate that a piece of data has indeed reached it's expiration, and allow you to take actions based on the expiration of that data.

Let's look at a common use case. Suppose you have a social media app that you're building that lets users send each other ephemeral messages. Now while the contents of these messages themselves are ephemeral, you may still want to retain a log of what users a particular user exchanged messages with, even though the contents of the message (audio/video/image) may have expired.

The diagram below explains a possible architecture in a little more detail:

Social Media App Architecture Example

Suppose a user exchanges messages with another user. An entry corresponding to a message is stored in ActiveMessageDB which, for the purpose of simplicity, we'll suppose is a NoSQL database that stores messages.

If the app here allows for expiring messages, you could set a TTL on the entry. While the message entry itself is deleted after the TTL is reached, an event can be fired off to let a system know that the message is being deleted.

In the above diagram, the event is picked up by an AWS Lambda instance and a much smaller amount of data is written to another database MessageLogDB which isn't as frequently accessed as ActiveMessageDB. What we just saw is an instance of event-based architecture being coupled with TTL.

Summary

TTL is the amount of time a piece of data stays relevant or stays stored in a distributed system or a component of a distributed system.
TTL makes sense in use cases where data can be deleted, can expire, or its form can change after a certain period of time.
TTL is popular and generally easy to set on many distributed systems offerings.
TTL can be paired with event driven architecture to transform data.

How to Partition Queues for Distributed Workloads

Okoye Chukwuebuka Victor — Thu, 23 May 2024 20:57:00 +0000

Technology, like human lives, evolves, and with this evolution comes the use of better solutions to existing problems. You might wonder: does this complicate systems or make them more efficient?

In this article, we will discuss some measures you can take to make systems much more efficient and reliable.

What is a Distributed System?

In a very simple way, a distributed system is a group or collection of independent computer nodes connected via a central unit which share and utilise computational resources.

Think of this as a network of systems which can interact and share tasks which they can process collectively as a single unit. Therefore distributed workloads refer to the processes or tasks that run in these systems.

Now, for these systems to interact efficiently, you can make use of some protocols or designs. One of them is a queue, which you'll learn about in the next section.

What is a Queue?

A queue is a data structure which models the First-In-first-Out Principle (FIFO).

To understand this, picture a checkout station in a supermarket. The first person to join the line will be the first to be attended to. Then subsequently, the next person is attended to. This is pretty much how queues works, but now as a data structure what is being processed can be different events or messages which we will identify as workloads.

Typically in a queue system, you have the producer and the consumer. The producer is the service that sends an event or in this context, a workload into the queue. And the consumer is the service that listens and processes it.

Streamlining Workflows: Producers Feed the Queue, Consumers Process the Flow.

With the very nature of these systems, running queues across this network can be demanding and sometimes you resort to some challenges which will be discussed briefly.

Challenges of Long-Running Queues in a Distributed System.

The goal of a task queue is primarily to handle asynchronous or long-running tasks efficiently.

However, within the distributed system context, running workloads across a network can sometimes prove to be challenging as they become overwhelming for these queues. This can leads to problems such as the following:

Increase In Network Latency: Distributed systems primarily rely on communications across networks to efficiently coordinate workloads across the different computer nodes. Delayed or long-running tasks can lead to unpredictable response time(latency).
Difficulty in tracking and monitoring: It can get very complex very quickly to track down each workload especially now that they are shared across different running systems in the network. In order to efficiently manage these you will have to make use of specialised tools such as the ELK Stack.
Distributed Coordination: Ensuring that workloads being passed to different queues across the network are executed correctly and in the order they ought to is a challenge. Lack of this coordination further leads to data inconsistency.
Fault Tolerance and Resilience: Processes running in a distributed system will surely contend with breakpoints and node failures. Having workloads in long-running queues can increase these failure points, which can lead to an increase in queue backlogs, data loss or inconsistency.

The question is how best can you combat these issues, or what measures should you take to reduce the possibility of system failures or data inconsistency. An approach to solving this is by making use of partitioned task queues.

What does it mean to partition a Queue?

Partitioning can be defined as breaking down a component into smaller subsections. Using this, you can derive a definition for queue partition as breaking a queue system into smaller, more manageable bits to offload workloads in a faster and more efficient manner.

Picture a centralised queue with various workloads coming with different task types or even priorities.

When traffic or workloads are at a very high rate, this queue will eventually become overwhelmed and keep compiling backlogs.

In order to lift the burden you can divide the queue into various bits to handle different task types. For example, you can have a queue to handle priority tasks or even to process different event types, like notifications.‌

Partitioned Queues in Distributed Systems: Optimizing Workload Management Across Nodes.

Primarily the goal of partitioning a queue is to improve system performance and reduce downtimes or data inconsistency. In order to do this, you have to look at different strategies of how to partition queues in other to know the one that is best suitable for your system.

Effective Task Queue Partitioning Strategies

There are various ways in which you can break a queue into subsections, but for you to be able to do that efficiently, you can employ some tested strategies such as Range-based Partitioning, Hash-based partitioning, and so on.

Hash-based partitioning

This form of partitioning allows for the selection of a queue using an assigned hash value.

First off, a parameter in the workload is used to get a hash value. Then, you take the modulo of that value with the number of partitions (queues), which results in a partition index. This index is then used to determine the queue to which the task will be assigned.

const numPartitions = 4;

const queues = Array.from({ length: numPartitions }, (_, i) => ({
    name: `Queue ${i + 1}`,
    partitionIndex: i + 1,
}));

function getPartitionIndex(hash) {
    const partitionIndex = hash % numPartitions;
    return partitionIndex + 1;
}

function hashCode(str) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
        const char = str.charCodeAt(i);
        hash = ((hash << 5) - hash) + char;
    }
    return Math.abs(hash);
}

const tasks = [
    { id: 1, value: "Emails Task" },
    { id: 2, value: "Transactions Tasks"},
    { id: 3, value: "Orders Tasks" },
    { id: 4, value: "Update Tasks" },
];

tasks.forEach(task => {
    const hash = hashCode(task.value);
    const partitionIndex = getPartitionIndex(hash);
    const assignedQueue = queues.find(queue => queue.partitionIndex === partitionIndex);
    console.log(`Task with ID: ${task.id} (content: ${task.value}) assigned to ${assignedQueue ? assignedQueue.name : "no queue"}`);
});

The above example involves hashing a particular value or field in the workload or event that is to be assigned. In this case, the content field was hashed, and its hash value was then used to determine which queue the workload will be assigned to. This hash value might appear random but in truth it is constant, meaning the same input will always produce the same hash value.

Task with ID: 1 (content: Emails Task) assigned to Queue 3
Task with ID: 2 (content: Transactions Tasks) assigned to Queue 4
Task with ID: 3 (content: Orders Tasks) assigned to Queue 4
Task with ID: 4 (content: Update Tasks) assigned to Queue 2

Seeing the output, if you were to change the content of one of these tasks, you will see that they will get another hash value and might be assigned to a different queue.

In a real-life scenario, if you want the workloads to be distributed amongst different queues, you should ensure the hash value is something unique among tasks, like its Identifier. You can make use of an UUID.

Range-based Partitioning

This strategy allows you to spread tasks or workloads coming from a centralised workload into subsections based on a range of task IDs. For example, queue A might only allow a task ID that falls within the range of 1 - 50, while queue B handles the task with an ID ranging from 51- 100.‌

‌This strategy ensures that tasks are distributed across each partition in order to efficiently utilise the resources available and ensure workloads are distributed.

Take a look at this code example below that shows how tasks are assigned to a partition based on the range their IDs fall in:

const { v4: uuidv4 } = require("uuid");

const numPartitions = 4;
const partitionRanges = Array.from({ length: numPartitions }, (_, i) => ({
    start: i * 250,
    end: (i + 1) * 250,
}));

const queues = Array.from({ length: numPartitions }, (_, i) => ({
    name: `Queue ${i + 1}`,
    partitionIndex: i + 1,
}));

function getPartitionIndex(hash) {
    return Math.floor(hash / 250) + 1;
}

const hashCode = (str) => {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
        const char = str.charCodeAt(i);
        hash = (hash << 5) - hash + char;
    }
    return Math.abs(hash) % 1000;
};

const tasks = Array.from({ length: 4 }, () => ({ id: uuidv4() }));

tasks.forEach((task) => {
    const hash = Math.abs(hashCode(task.id));
    const partitionIndex = getPartitionIndex(hash);
    const assignedQueue = queues.find(
        (queue) => queue.partitionIndex === partitionIndex,
    );
    console.log(
        `Task with ID: ${task.id} assigned to ${assignedQueue ? assignedQueue.name : "no queue"}`,
    );
});

In this code example, the taskId is a UUID. To match them to a partitioned queue range (each queue has a range of 250), you first hash-code it and then convert it to an absolute value.

Next, you call the PartitionIndex() function with this hash value.

From there, It finds the partitioned queue that the taskId fits within the range.

This is an example output below:

Task with ID: 493fbf04-2f4f-43c0-9486-f5c99313d4e6 assigned to Queue 2
Task with ID: 9bad272f-3369-408e-bb37-5ef00c68b0b6 assigned to Queue 3
Task with ID: e0ec2d06-d66b-4e0e-8e85-04e4755a42be assigned to Queue 1
Task with ID: 960f415b-2e64-4169-b4ee-59b11c33b451 assigned to Queue 4

Round-Robin Partitioning

For this partitioning strategy, tasks are assigned to queues in a cyclic pattern. This means that if a task gets assigned to queue 1, the next task will get assigned to queue 2, until all the queues get assigned a task before starting from queue 1 again, therefore making it to be distributed evenly amongst the queues.

const numPartitions = 4;
let currentPartition = 0;

function getRoundRobinPartitionIndex() {
    const partitionIndex = (currentPartition % numPartitions) + 1;
    currentPartition++;
    return partitionIndex;
}

const tasks = [1, 2, 3, 4, 5]; 
tasks.forEach(task => {
    const partitionIndex = getRoundRobinPartitionIndex();
    console.log(`Task with ID: ${task} assigned to Queue ${partitionIndex}`);
});

For this example, a function getRoundRobinPartitionIndex() was defined and it determines the next queue index using a round-robin strategy.

It first calculates the queue index by taking the modular value of currentPartition and numPartitions. It adds 1 to it to ensure the initial index starts from 1, then increments the currentPartition to make sure the next queue to be assigned is greater than the previous. If the modular value is 0, it returns to the queue with index of 1.

Task with ID: 1 assigned to Queue 1
Task with ID: 2 assigned to Queue 2
Task with ID: 3 assigned to Queue 3
Task with ID: 4 assigned to Queue 4
Task with ID: 5 assigned to Queue 1

As you can see, the output above shows that the tasks were distributed to each of the queues and then started from the first queue again.

Dynamic Partitioning

This type of partitioning strategy involves assigning tasks to queues based on changing workloads or system conditions.

For instance, a system has 3 queues actively working and 1 that is free. It checks for the one with the least tasks assigned to it and then assigns a new task to it. By doing this, it ensures that the queues are not overwhelmed and that system resources are efficiently utilised.

const { v4: uuidv4 } = require('uuid');
let numPartitions = 4;
let queues = Array.from({ length: numPartitions }, (_, i) => ({
    name: `Queue ${i + 1}`,
    tasks: [],
}));

function dynamicPartitioning(task) {
    let minLoadQueue = queues[0];
    for (let queue of queues) {
        if (queue.tasks.length < minLoadQueue.tasks.length) {
            minLoadQueue = queue;
        }
    }
    minLoadQueue.tasks.push(task);
    console.log(`Task with ID: ${task.id} assigned to ${minLoadQueue.name}`);
}

const tasks = Array.from({ length: 5 }, () => ({ id: uuidv4(), value: `Task content` }));

tasks.forEach(task => {
    dynamicPartitioning(task);
});

function addPartition() {
    numPartitions++;
    queues.push({ name: `Queue ${numPartitions}`, tasks: [] });
    console.log(`Added new partition: Queue ${numPartitions}`);
}

setTimeout(() => {
    addPartition();
    // Assign new tasks to demonstrate dynamic partitioning with the new queue
    const newTasks = Array.from({ length: 5 }, () => ({ id: uuidv4(), value: `New task content` }));
    newTasks.forEach(task => {
        dynamicPartitioning(task);
    });
}, 5000);

In this code above, it starts by creating three queues and dynamically assign tasks to the least loaded queue.

After a simulated put-off, a new queue is brought to deal with the elevated workload and new obligations are distributed among all available queues.

This technique ensures that the burden is balanced throughout all queues, and the queue is overloaded, making the machine more efficient and responsive to adjustments in workload.

Task with ID: 396d4fde-830b-4443-b59b-c089f3a1db49 assigned to Queue 1
Task with ID: aa85af19-e46a-4b13-9a9c-2eb3ec535af0 assigned to Queue 2
Task with ID: f93d4cbb-0cad-4b71-a424-4c27a77ed38a assigned to Queue 3
Task with ID: c7e0cc7e-f055-4a38-9869-466406a33ca8 assigned to Queue 4
Task with ID: b49fe153-73e8-45d1-85b3-a5873f836dc9 assigned to Queue 1
Added new partition: Queue 5
Task with ID: a8a5f78e-da7e-4c65-9d74-8cef57f271f3 assigned to Queue 5
Task with ID: 00ce47b0-6098-49d6-ad06-820733ffe16e assigned to Queue 2
Task with ID: 7b1a5dcf-c42c-46dc-8e29-bae1cc5408c1 assigned to Queue 3
Task with ID: 3e31ad34-1a19-4d3c-a8f3-08f765a45f13 assigned to Queue 4
Task with ID: 73d55243-27da-44a4-9192-532a20889373 assigned to Queue 5

There are other strategies, like content-based partitioning, random-based partitioning, or even a user-defined custom type.

All these strategies are good but can still have some issues if not properly managed.

For instance, if you are using the range-based partition strategy and you select an identifier which falls within the set range of a queue, that queue partition will eventually get overwhelmed. It is left up to you to figure out the one that is best for your system requirements.

Conclusion

Processing workloads in distributed systems can be kind of complex and inefficient if not managed properly.

Partitioning queues across the system and having the workloads be distributed to them based on a strategy is one way to improve your system performance.

In this article, some challenges with long running queues were pointed out, and some strategies to relieve them were given. I have been researching distributed systems in run-time environments and databases and this was one of my findings.

I hope you enjoyed reading this because I enjoyed writing this. You can follow me on Twitter(X).

How to Deal with Traffic Surges in Distributed Systems

Anant Chowdhary — Fri, 17 May 2024 06:50:28 +0000

Web and Distributed Systems can often get overwhelmed with traffic.

What leads to systems being overwhelmed, why does it happen, and what are some common strategies we can use to deal with this? We'll answer these questions in this article.

What is traffic in the context of distributed systems?
Why can traffic surges be problematic?
Ways to deal with high traffic loads
Exponential Backoff and Retries
Summary

What is Traffic in the Context of Distributed Systems?

Traffic in distributed systems generally refers to the exchange of data between end users and the entry point to a system that may rely on distributed components.

The patterns of traffic a system sees usually informs multiple design decisions since it impacts performance, scalability and reliability of a system.

Why Can Traffic Surges be Problematic?

Traffic surges can often cripple systems that aren’t equipped to deal with them.

You may have come across instances of social media services such as Instagram or TikTok being down. In some cases, this may be due to surges of traffic.

Here are some common problems a surge in traffic may cause:

Congestion: As traffic increases, network congestion may increase. This may in some cases, lead to packet loss, increased latency and impact performance of systems.
Imbalanced load: Not all distributed systems balance load well. A sudden spike in traffic may lead to failures in particular sub-systems. As an example, let’s think of a celebrity’s tweets being stored on a shard. In the scenario where an event leads to millions of people accessing the celebrity’s tweets, the shard that stores the celebrity’s tweets may get overwhelmed.
Cascading failures: Imagine a set of domninoes that are placed right next to each other. One domino falling may lead to the entire set of dominoes falling. Distributed systems are similar. If components aren't loosely coupled, a single point of failure may lead to cascading failures. It is therefore important to consider cascading failures when designing distributed systems with high traffic loads in mind.

Ways to Deal with High Traffic Loads

No system is immune to failure under an unspecified amount of traffic.

Fortunately, there are some design decisions you can take to ameliorate the problems discussed above, and make systems more resilient against failure when they see a sudden spike in traffic.

Now, let's cover some of the commonly used solutions that can help deal with surges in traffic.

Firstly, horizontal scaling is generally the process of adding resources to a system by adding more resources in parallel. For instance, adding more servers or adding more CDN nodes.

In effect, it is adding more resources instead of increasing the capacity of a single node in the network.

Distributing traffic across servers can lead to improved performance, lower latency, and improved response times in general.

Next, load balancing can sometimes be closely linked with horizontal scaling. However, load balancing by itself also can be very useful in situations where we see a sudden surge in traffic.

Load balancers can smartly route requests to servers so that traffic is well balanced across systems, and doesn't overwhelm one particular system.

In addition, caching can dramatically reduce the need for traffic (requests) to go all the way to a server for fulfilment. Some types of requests, such as those that access static content as great candidates for caching.

Similar to an example that we discussed in the above sections, let's assume that there's a sudden spike in people viewing a celebrity's tweets. The static content on the page, such as the celebrity's display picture, can be easily cached. This will prevent a request that goes all the way to the profile picture database, and therefore may help prevent read failures, and in turn cascading failures.

Lastly, consider a scenario where a client internal to a distributed system sends a request to the server and the request fails. Clients often retry requests, but this may lead to cascading retries.

This is a scenario where multiple clients (the original one, and the one downstream) may be retrying their requests, and as a result, a system downstream may be inundated with requests and that by itself may lead to cascaded failures.

A single retry request can lead to an exponential number of retries in other parts of a distributed system

In the figure above, we can see that two requests from the server (one retry in the event of a failure at the Queue), leads to :

1) Four Requests from the serverless component to the Notification Topic (two requests to the serverless component and two retries)

2) Eight Requests from the Notification Topic to the Queue (four requests to the queue, four retries).

Even a single failure at the end component (the Queue in this case), led to an exponential number of retry requests to it.

A common antidote to this problem is to use exponential backoff while retrying requests.

Exponential Backoff and Retries

Exponential backoff, as the name suggests, refers to introducing a delay before the next attempt instead of immediately retrying. We increase the delay time exponentially with each attempt.

For instance, the first retry might wait for one second, the second retry waits for two seconds, the third for four seconds, and so on.

Note that since a retry attempt isn't made immediately, the probability of cascading retries goes down compared to if retries were made immediately.

Here's some code that illustrates exponential backoff in action :


def exponential_backoff_retry(url, max_retries=5, initial_delay=1, backoff_factor=2):
    retry_count = 0
    while retry_count < max_retries:
        try:
            response = requests.get(url)
            # Check if the response was successful (status code 2xx)
            if response.status_code // 100 == 2:
                return response
            # If not successful, raise an exception to trigger a retry
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            # Calculate the exponential backoff delay
            delay = initial_delay * (backoff_factor ** retry_count)
            print(f"Retrying in {delay} seconds...")
            time.sleep(delay)
            retry_count += 1
    # If max retries reached, raise an exception
    raise RuntimeError(f"Failed to fetch {url} after {max_retries} retries")

The code above retries HTTP GET requests using an exponential backoff strategy.

Inside the while loop, we make attempts to make the request and check if the response is successful (successful HTTP requests have a status code of 2xx).

Note that if the request fails, we raise an exception and retry the request after calculating an exponential delay.

This process is continued until either a request succeeds or the maximum number of retries is reached. If the maximum retries are exhausted without success, we raise a RuntimeError.

Summary

To summarize, we delved into the significance of traffic within distributed systems, emphasizing its influence on system performance and resilience.

We looked at the complications arising from traffic surges, including network congestion, load imbalances, and cascading failures, which can render systems vulnerable to collapse.

To address these challenges, we looked at some strategic measures such as horizontal scaling, load balancing, caching, and retry strategies. Particularly, we looked at the effectiveness of exponential backoff in mitigating cascading retries, thereby enhancing system robustness.

By keeping in mind some of these solutions, systems can better manage sudden spikes in traffic, ensuring sustained functionality and minimizing potential downtime, ultimately bolstering overall system reliability.

These are just a few of the numerous methods that are used industry wide to deal with surges in traffic.

Asynchronous vs Batch Data Processing in Distributed Systems – Explained with Examples

Anant Chowdhary — Wed, 20 Mar 2024 15:13:11 +0000

Distributed Systems often process and store huge amounts of data. Processing this data efficiently is typically an ongoing endeavor, and how it is designed almost always affects the end-user experience of a product.

Two popular modes of processing data are Batch Processing and Asynchronous Processing. We'll learn more about both in this article, along with when to use each approach.

Batch Processing of Data
What is Batch Processing?
When Do We Use Batch Processing?
Real World Example of Batch Processing of Data
What Does Batch Processing Look Like in Code?
Asynchronous Processing of Data
What is Asynchronous Processing?
When Do We Use Asynchronous Processing
Real World Example of Asynchronous Processing of Data
What Does Async Processing Look Like in Code?
Summary

Batch Processing of Data

What is Batch Processing?

Batch Processing, as you may have guessed, waits for a certain amount of data to be accumulated, and then processes this batch of data in one go. In other words, this means that in most scenarios we would wait for some number of events to complete and then process the data.

This is different from asynchronous processing of data, where we process an event and its associated data as soon as it occurs. More on that soon.

Now that you know a bit more about batch processing, it'll be useful to see a couple of real world examples.

When Do We Use Batch Processing?

Batch processing is used in lots of scenarios, such as:

Large volume of data: When we have a very large amount of data, it is often more resource-efficient to let the data collect over a period of time and then process it.
Data that isn't time sensitive: Since batch processing waits for data to collect, it is generally not suitable for processing data that's very time sensitive. On the other hand, it is possible to process batches of data within short intervals of time.
Scheduled Processing of data: In lots of instances, we need a large amount of data to be processed at regular intervals. Automated system backup and updates, for example, are generally scheduled for particular intervals. Batch processing can be very useful in such scenarios.

Real World Example of Batch Processing of Data

A popular real world use case for batch processing is credit card transactions.

Many financial institutions choose to settle credit card transactions in batches instead of settling them in real time. Since the settlement of transactions is generally not very time sensitive, this gives systems the time to run various other analyses / jobs on the transactions such as fraud detection, currency conversions etc.

Credit Card Transactions and Batch Processing

The diagram above shows a very high level example of a lifecycle of a credit card transaction. The steps are as follows:

The credit card transaction takes place at the Point of Sale (POS).
A gateway forwards the request to a serverless component that writes the transaction to a staging database where the transaction is stored temporarily.
At the end of the business day, the transactions in the staging database are reconciled and go through fraud detection. This is the component where batch processing takes place (note that we waited for some data to collect, and processed a large amount of data).

What Does Batch Processing Look Like in Code?

We saw an example of a distributed system in the above example. How would batch processing look like in code?

Below you'll see some code that lets you process a batch of SQS messages:

import boto3 

def process_batch_messages(sqs, queue_url):
    partial_response = sqs.receive_message(
        QueueUrl=queue_url
        MaxNumberOfMessages=10 # This sets the maximum batch size to 10
        WaitTimeSeconds=10 # We wait for a maximum of 10 seconds
    )
    if 'Messages' in partial_response:
        messages = partial_response['Messages']
        for each message in messages:
            # do something with each message

            # remove the message from the queue after processing
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])


if __name__ == '__main__':
    # Initialize sqs client
        sqs = boto3.client(
        'sqs',
        aws_access_key_id='Your access key id',
        aws_secret_access_key='your secret access key,
        region_name='Your AWS region'
    )
    your_queue_url = 'your-queue-url'
    process_batch_messages(sqs, your_queue_url)

The above code waits for the earlier of two events: either 10 seconds having passed, or a batch size of 10 being reached within the queue.

Asynchronous Processing of Data

What is Asynchronous Processing?

The word asynchronous is generally defined as "events that are not coordinated in time". As the definition suggests, asynchronous processing of data does not rely on coordination of data events, and these events are processed as and when they occur.

This means that as soon as an event occurs, the event is processed and the data corresponding to the event may be stored in a sub system, passed on to another component in the system, or may simply lead to another event being fired off.

When Do We Use Asynchronous Processing?

You'll use asynchronous processing of data (sometimes also referred to as async) in various scenarios.

Microservices: Microservices often involve a request that needs an immediate response. Since this processing is done "per event", this would require async processing of data, so in most cases results are returned to clients within a very short period of time (low latency).
User Interfaces: Often, components in user facing UI components need to use async processing of data. For instance, multiple data fetches can be performed in the background using async calls when a user is using an application. This ensures that the application works smoothly and responsively without having the need for the UI components to "freeze".
Systems that require real time responses: Many interactive systems require real time processing of data. In the past few years, video calls and meetings have become increasingly popular. Since systems like these require immediate requests and responses (and in some cases streams of data being processed), async processing of data is used here.

Real World Example of Asynchronous Processing of Data

Chat apps are a great example of asynchronous processing of data. Here, if a user 1 types a message and sends it to user 2, the message must be written to the required databases / systems, delivered to user 2, and possibly read by user 2 without any delay.

Since this is real time processing of the event that occurred here (the event being that a message was sent), this is an example of asynchronous processing of data.

Exchange of messages in a chat app

In the above diagram we see that User 1 sends a message through their phone. The message gets routed to a message server which ultimately creates an entry in a messages database (Messages DB).

Now that MessagesDB has an entry, an event is fired off that is consumed by the Notification Pusher. This then communicates with User 2's notification queue to put a notification related to the message in their notification queue.

Whenever User 2's device comes online or has access to the internet, they receive a message notification.

Note that we did not wait for any data to collect, nor did we process this data after any specific time delay. We processed the event as soon as it happened. So this is an example of asynchronous processing of data.

What Does Async Processing Look Like in Code?

Can we modify the code that we saw in the section for batch processing to work for async processing? Remember that we said "this code waits for the earlier of two events: 10 seconds having passed, or a batch size of 10 being reached within the queue".

If we change the batch size to 1, we would effectively process a message as soon as it is received.

import boto3 

def process_async_messages(sqs, queue_url, batch_size):
    partial_response = sqs.receive_message(
        QueueUrl=queue_url
        MaxNumberOfMessages=batch_size # This sets the maximum batch size
        WaitTimeSeconds=10 # We wait for a maximum of 10 seconds
    )
    if 'Messages' in partial_response:
        messages = partial_response['Messages']
        for each message in messages:
            # do something with each message

            # remove the message from the queue after processing
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])


if __name__ == '__main__':
    # Initialize sqs client
        sqs = boto3.client(
        'sqs',
        aws_access_key_id='Your access key id',
        aws_secret_access_key='your secret access key,
        region_name='Your AWS region'
    )
    your_queue_url = 'your-queue-url'
    process_batch_messages(sqs, your_queue_url, 1)

Note that in the above code we modified the process_batch_messages to accept a batch_size parameter and renamed the method to process_async_messages. This method processes a message as soon as the queue receives a method (assuming the queue has received a message within the wait time of 10 seconds)

Summary

Let's summarize batch and asynchronous data processing.

Batch Processing is a paradigm where you wait for an amount of data to collect or some time to pass before the data is processed.

Batch processing is often used in scenarios where you have large volumes of data, data that isn't time sensitive, and data that can be processed on a set schedule. The example we discussed above was that of a credit card transaction.

Asynchronous processing of data, on the other hand, is used to process data related to events as soon as they occur.

This approach is often used when dealing with data processed in microservices, user interfaces, and in general with systems needing real time request-response processing. We looked at an example of a chat app in the above discussion and learnt how asynchronous processing of data is applicable to the scenario.

How to Create Multiple Virtual Hosts on a Single RabbitMQ Instance

freeCodeCamp — Wed, 26 Jul 2023 21:25:15 +0000

By Ridwan Yusuf

Hi, friends, and welcome to another tutorial. We'll be talking about RabbitMQ, a tool that is widely used as a message broker in distributed and Pub-Sub systems.

While multiple applications can share a single RabbitMQ instance, it's a great approach to configure it to have different virtual hosts for each application.

One practical usage in a Python application is using it as a Celery broker URL or simply as a message queue.

One benefit of this is that it saves the costs that comes with spinning up multiple instances of RabbitMQ. Creating RabbitMQ virtual hosts is similar both locally and when running in a self-managed solution such as AWS RabbitMQ

By the end of this guide, you will be able to:

Create multiple virtual hosts for a single RabbitMQ instance.
Test the connections using the correct credentials, that is user, password, and virtual host.

Prerequisites to follow along:

Ensure you have Docker and Docker Compose installed, or
Have a self-managed RabbitMQ instance from AWS, CloudAMQP, or any other service provider

Without any delay, let’s get started

How to Run a RabbitMQ Instance Locally

To keep things simple, we are going to run the RabbitMQ instance in a Docker container.

Create a file named docker-compose.yml and update it with the following content:

docker-compose.yml

version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3.8-management-alpine
    container_name: 'rabbitmq_test'
    environment:
      - RABBITMQ_DEFAULT_USER=sampleuser
      - RABBITMQ_DEFAULT_PASS=samplepassword
    ports:
        - 5672:5672
        - 15672:15672
    volumes:
      - ~/.docker-conf/rabbitmq/data/:/var/lib/rabbitmq/
      - ~/.docker-conf/rabbitmq/log/:/var/log/rabbitmq

To start the container, run this command in the terminal:

docker-compose up

Once the container is up and running, access the GUI by visiting this link: http://localhost:15672/

RabbitMQ GUI login page

Use the credentials specified in the docker-compose file to gain access to the dashboard – “sampleuser” and “samplepassword” in my case.

Note: If you are using a self-managed RabbitMQ instance from AWS or CloudAMQP, the RabbitMQ connection string will look similar to this:

amqps://ausername:apassword@32wewjijiokkoo.mq.eu-west-3.amazonaws.com:5671

To access the RabbitMQ admin dashboard, follow these steps:

Visit the URL using the HTTPS protocol: https://32wewjijiokkoo.mq.eu-west-3.amazonaws.com
Remove the username, password, and port from the URL.
After accessing the URL, the system will prompt you to enter the username and password specified in the RabbitMQ connection string (“ausername” and “apassword” in this case). Provide these credentials to log in to the admin dashboard.

How to Create a Virtual Host

Once you are logged in, click on the ‘Admin’ tab, and then navigate to the ‘Virtual Hosts’ tab as shown in the image below.

RabbitMQ Virtualhost tab

Click on ‘Add a new virtual host’, and provide it with an appropriate name. Finally, click on the ‘Add virtual host’ button to create the new virtual host.

New Virtual Host created

How to Create a New User

On the Users tab, click on the ‘Add a user’ toggle. Specify a Username and Password for the new user. If you wish, you may grant the user access to the Admin (GUI) interface. Finally, click on the ‘Add user’ button to create the new user.

Create a new user

How to Assign the New User to the Created Virtual Host (Vhost)

On the Users tab, locate and click on the previously created user, which was named ‘user1’ in my case.

New user in users list

For the new user, choose the virtual host as ‘new-virtual-host’, and then click the ‘Set permission’ button. This grants the user access to the specified virtual host. To confirm this, go back to the Users tab and verify that the user now appears with access to the ‘new-virtual-host'

Assign new user to the created virtual host

By now, the new user 'user1' has been configured to access the vhost 'new-virtual-host'.

New user assigned to the created virtual host

How to Test the Connection for the New User and Vhost

To test if the new user can access the new vhost, you can use the following sample Python script. Once everything is working correctly, you can use the URL string to connect to the RabbitMQ broker in any application.

_testrabbit.py

import pika

def test_rabbitmq_url(url):

    try:
        params = pika.URLParameters(url)
        connection = pika.BlockingConnection(params)
        connection.close()
        print("Connection successful!")

    except pika.exceptions.AMQPError as e:
        print("Connection failed:", e)

test_rabbitmq_url('amqp://user1:password@localhost:5672/new-virtual-host')

Note how we have specified the RabbitMQ URL: it has the username as “user1” and password as “password9,” while the virtual host is named ‘new-virtual-host.’

Remember to install pika by running pip install pika on the command line.

To run this script on the terminal, enter the following command:

python test_rabbit.py

Successfully connected to RabbitMQ

This indicates a successful connection to the RabbitMQ instance.

Change the password in the utility function to an incorrect one, and you should see an outcome similar to this:

Connection to RabbitMQ failed

Wrapping Up

In conclusion, creating multiple virtual hosts on a single RabbitMQ instance can significantly enhance your message queue management capabilities.

By effectively segregating applications and resources, you ensure better organization, security, and scalability. With the step-by-step guide provided in this article, you now have the knowledge to harness the full potential of RabbitMQ

Thanks for reading, and I hope you enjoyed the article.

For more interesting content and tips, please feel free to check out my video collection on YouTube and consider hitting the subscribe button.

You can also find me on LinkedIn.

The Design Patterns for Distributed Systems Handbook – Key Concepts Every Developer Should Know

Tamerlan Gudabayev — Mon, 05 Jun 2023 20:49:03 +0000

When I first started my career as a backend engineer, I always worked with monolithic systems.

The work was good but I always had this thought in the back of my mind:

"Man, I want to work on big systems such as ones for Google, Netflix, etc..."

I was 19 and a junior developer, so cut me some slack here.

I didn't even know the term distributed systems until one of my colleagues started talking about it.

Then I researched it – I researched a lot. It seemed very complicated, and I felt very stupid.

But I was excited.

I studied this concept of distributed systems for a while, but didn't fully understand it until I saw it all in action a few years later.

Now that I have some experience, I would like to share with you what I know about distributed systems.

Prerequisite Knowledge

The topics I'll be discussing here may be a bit advanced for beginner programmers. To help you be prepared, here's what I assume you know:

Mid Level Programming (any language will work)
Basic Computer Networking (TCP/IP, network protocols, and so on)
Basic Data Structures and Algorithms (Big O notation, search, sort, and so on)
Databases (Relational, NoSQL, and so on)

If that sounds like a lot, don't feel discouraged.

Here are some resources that can help you brush up on some of these more specific topics:

Learn how computer networks work in this free course
Learn about data structures and algorithms in this free course
Learn about relational databases here, and about NoSQL databases here.

Here's What We'll Cover:

What are Distributed Systems?
Common Challenges in Distributed Systems
Command and Query Responsibility Segregation (CQRS) Pattern
Two-Phase Commit (2PC) Pattern
Saga Pattern
Replicated Load-Balanced Services (RLBS) Pattern
Sharded Services Pattern
Sidecar Pattern
Write-Ahead Log Technique
Split-Brain Pattern
Hinted Handoff Pattern
Read Repair Pattern
Service Registry Pattern
Circuit Breaker Pattern
Leader Election Pattern
Bulk Head Pattern
Retry Pattern
Scatter Gather Pattern
Bloom Filters Data Structure

What are Distributed Systems?

Netflix Architecture. Source

When I started my career, I worked as a front-end developer for an agency.

We used to get requests from clients and then we'd just build their site.

Back then I didn't fully understand the architecture and infrastructure behind the stuff I was building.

Thinking back about it now, it wasn't complicated at all.

We had one backend service written in PHP and Yii2 (PHP Framework) and a frontend written in JavaScript and React.

All of this was deployed to one server hosted in ps.kz (Kazakhstan Hosting Provider) and exposed to the internet using NGINX as a web server.

This architecture works for most projects. But once your application becomes more complex and popular, the cracks begin to show.

You get problems such as:

Complexity – Codebase is too large and too complex for one person to mentally handle. It's also hard to create new features and maintain old ones.
Performance Issues – The popularity of your app caused it to have a lot of network traffic and it has proven to be too much for your single server. Because of that, the application has started to face performance issues.
Inflexibility – Having a single codebase means that you are stuck with the technology stack that you began with. If you want to change it, then you will either have to rewrite the whole thing in another language or break up the app.
Fragile System – Code being highly coupled together means that if any of the functionality breaks then the whole application will break. This leads to more downtime which will lose the business more money.

There are many ways to optimize a monolithic application, and it can go very far. Many big tech companies such as Netflix, Google, and Facebook (Meta) started off as monolithic applications because they're easier to launch.

But they all began facing problems with monoliths at scale and had to find a way to fix it.

What did they do? They restructured their architectures. So instead of having a single super service that contained all the features of their business, they now had multiple independent services that talk to each other.

This is the basis of distributed systems.

Some people mistake distributed systems for microservices. And it's true – microservices are a distributed system. But distributed systems do not always follow the microservice architecture.

So with that in mind, let's come up with a proper definition for distributed systems:

A distributed system is a computing environment in which various components are spread across multiple computers (or other computing devices) on a network.

Common Challenges in Distributed Systems

Distributed systems are by far much more complex than monolithic ones.

That's why before migrating or starting a new project, you should ask the question:

Do I really need it?

If you decide that you do need a distributed system, then there are some common challenges you will face:

Heterogeneity – Distributed systems allow us to use a wide range of different technologies. The problem lies in how we keep consistent communication between all the different services. Thus it is important to have common standards agreed upon and adopted to streamline the process.
Scalability – Scaling is no easy task. There are many factors to keep in mind such as size, geography, and administration. There are many edge cases, each with their own pros and cons.
Openness – Distributed systems are considered open if they can be extended and redeveloped.
Transparency – Transparency refers to the distributed system's ability to conceal its complexity and give off the appearance of a single system.
Concurrency – Distributed systems allow multiple services to use shared resources. Problems may arise when multiple services attempt to access the same resources at the same time. We use concurrency control to ensure that the system remains in a stable state.
Security – Security is comprised of three key components: availability, integrity, and confidentiality.
Failure Handling – There are many reasons for errors in a distributed system (for example, software, network, hardware, and so on...). The most important thing is how we handle those errors in a graceful way so that the system can self-heal.

Yea, designing robust and scalable distributed systems is not easy.

But you are not alone. Other smart people have faced similar problems and offer common solutions which are called design patterns.

Let's cover the most popular ones.

PS. We won't only cover patterns, but anything that helps in distributed systems. This may include data structures, algorithms, common scenarios, etc...

Command and Query Responsibility Segregation (CQRS) Pattern

Source

Imagine that you have an application with millions of users. You have multiple services that handle the backend, but as a single database.

The problem arises when you do reads and writes on this same database. Writes are a lot more expensive to compute than reads, and the system starts to suffer.

This is what the CQRS pattern solves.

It states that writes (commands) and reads (queries) should be separated. By separating writes and reads, we can allow developers to optimize each task separately.

For example, you might choose a high-performance database for write operations and a cache or search engine for read operations.

Pros

Code Simplification – Reduces system complexity by separating writes and reads.
Resource Optimizations – Optimizes resource usage by having a separate database for writes and reads.
Scalability – Improves scalability for reads as you can simply add more database replicas.
Reduce Number of Errors – By limiting the entities that can modify shared data, we can reduce the chances of unexpected modifications of data.

Cons

Code Complexity – Adds code complexity by requiring developers to manage reads and writes separately.
Increased Development Time – This can increase development time and cost (in the beginning only).
Additional Infrastructure – This may require additional infrastructure to support separate read and write models.
Increased Latency – It can cause increased latency when sending high-throughput queries.

Use Cases

CQRS is best used when an application's writes and reads have different performance requirements. But it is not always the best approach, and developers should carefully consider the pros and cons before adopting the pattern.

Here are some use cases that utilize the CQRS pattern:

E-Commerce – Separate read models for product catalogs and recommendations, while the write side handles order processing and inventory management.
Banks – Optimize read models for balance inquiries and reporting, while the write side handles transactions and calculations.
Healthcare – CQRS can be used to optimize the reads for patient searches, medical record retrieval, and generating reports, while the write side manages data updates, scheduling, and treatment plans.
Social Media –By applying CQRS, the read models can efficiently handle feed generation, personalized content recommendations, and user profile queries, while the write side handles content creation, updates, and engagement tracking.

Two-Phase Commit (2PC) Pattern

_Source_

2PC solves the problem of data consistency. When you have multiple services talking to a relational database, it's hard to keep the data consistent as one service can create a transaction while the other aborts it.

2PC is a protocol that ensures that all services commit or abort a transaction before it is completed.

It works in two phases. The first phase is the Prepare phase in which the transaction coordinator tells the service to prepare the data. Then comes the Commit phase, which signals the service to send the prepared data, and the transaction gets committed.

2PC systems make sure that all services are locked by default. This means that they can't just write to the database.

While locked, the services complete the Prepare stage to get their data ready. Then the transaction coordinator checks each service one-by-one to see if they have any prepared data.

If they do, then the service gets unlocked and the data gets committed. If not, then the transaction coordinator moves on to another service.

2PC ensures that only one service can operate at a time, which makes the process more resistant and consistent than CQRS.

Pros

Data Consistency – Ensures data consistency in a distributed transaction environment.
Fault Tolerance – Provides a mechanism to handle transaction failures and rollbacks.

Cons

Blocking – The protocol can introduce delays or blocking in the system, as it may have to wait for unresponsive participants or resolve network issues before proceeding with the transaction.
Single point of failure – Reliance on a single coordinator introduces a potential point of failure. If the coordinator fails, the protocol may be disrupted, leading to transaction failures or delays.
Performance overhead – The additional communication rounds and coordination steps in the protocol introduce overhead, which can impact the overall performance, especially in scenarios with many participants or high network latency.
Lack of scalability – As the number of participants increases, the coordination and communication overhead also increase, potentially limiting the scalability of the protocol.
Blocking during recovery – The protocol may introduce blocking during recovery until the failed participant is back online, impacting system availability and responsiveness.

Use Cases

2PC is best used for systems that deal with important transaction operations that must be accurate.

Here are some use cases where the 2PC pattern would be beneficial:

Distributed Databases – Coordinating transaction commits or aborts across multiple databases in a distributed database system.
Financial Systems – Ensuring atomic and consistent transaction processing across banks, payment gateways, and financial institutions.
E-commerce Platforms – Coordinating services like inventory management, payment processing, and order fulfillment for reliable and consistent transaction processing.
Reservation Systems – Coordinating distributed resources and participants in reservation processes for consistency and atomicity.
Distributed File Systems – Coordinating file operations across multiple nodes or servers in distributed file systems to maintain consistency.

Saga Pattern

So let's imagine that you have an e-commerce app that has three services, each with its own database.

You have an API for your merchants which is called /products to which you can add a product with all its information.

Whenever you create a product, you also have to create its price and meta-data. All three are managed in different services with different databases.

So, you implement the simple approach of:

Create product -> create price -> create meta-data

But what if you created a product but failed to create a price? How can one service know that there was a failed transaction of another service?

The saga pattern solves this problem.

There are two ways to implement sagas: orchestration and choreography.

Orchestration

Source

The first method is called Orchestration.

You have a central service that calls all the different services in the right order.

The central service makes sure that if there is a failure, it will know how to compensate for that by reverting transactions or logging the errors.

Pros

Suitable for complex transactions that involve multiple services or new services added over time.
Suitable when there is control over every participant in the process and control over the flow of activities.
Doesn't introduce cyclical dependencies, because the orchestrator unilaterally depends on the saga participants.
Services don't need to know about commands for other services. There is a clear separation of concerns which reduces complexity.

Cons

Additional design complexity requires you to implement a coordination logic.
If the orchestrator fails then the whole system fails.

When to Use Orchestration?

You should consider using this pattern:

If you need a centralized service that coordinates of the workflow.
If you want a clear and centralized view of the workflow which makes it easier to understand and manage the overall system behavior.
If you have complex and dynamic workflows that require a high degree of coordination and centralized control.

Choreography

Source

On the other hand, the Choreography method doesn't use a central service. Instead, all communication between servers happens by events.

Services will react to events and will know what to do in case of success or failure.

So for our example above, when the user creates a product it will:

Create an event called product-created-successfully
Then the price service will react to the event by creating a price for the product and it will then create another event called price-created-successfully
The same logic applies to the meta-data service.

Pros

Suitable for simple workflows that don't require complex coordination logic.
Simple to implement because it doesn't require additional service implementation and maintenance.
There is no single point of failure as responsibilities are distributed between the services.

Cons

Difficult to debug because it's difficult to track which saga services listen to which commands.
There's a risk of cyclic dependency between Saga services because they have to consume each other's commands.
Integration testing is difficult because all services must be running to simulate a transaction.

When to Use Choreography:

You should consider using this pattern if:

The application needs to maintain data consistency across multiple microservices without tight coupling.
There are long-lived transactions and you don’t want other microservices to be blocked if one microservice runs for a long time.
You need to be able to roll back if an operation fails in the sequence.

Replicated Load-Balanced Services (RLBS) Pattern

Source

This is essentially a load balancer – I don't know why they made it sound so intimidating.

A load balancer is software or hardware that distributes network traffic equally between a set of resources.

But that's not always the case – it can also route different routes to different services.

So for example:

/frontend goes to the front-end service.
/api goes to the backend service.

Pros

Performance – Load balancing distributes the workload evenly across multiple resources, preventing any single resource from becoming overloaded. This leads to improved response times, reduced latency, and better overall performance for users or clients accessing the system.
Scalability – Load balancing allows you to scale horizontally, meaning that instead of getting more powerful servers, you can get more servers.
High availability – As said above, load balancing allows us to vertically scale which means we have multiple servers. If one server fails then the load balancer will detect that and traffic can be redirected to other working servers.
Better resource utilization – Load balancing helps to optimize resource utilization by distributing traffic evenly across multiple servers or resources. This ensures that each server or resource is used efficiently, helping to reduce costs and maximize performance.

Cons

Complexity – Implementing and configuring load balancing can be complicated especially for large scale systems.
Single Point of Failure – While load balancers enhance system availability, they can also become a single point of failure. If the load balancer itself fails, it can cause a disruption in service for all resources behind it.
Increased Overhead –Load balancers computations are not free, if they are not controlled then they can become a bottleneck to the entire system.
Session Handling Challenges – Load balancing stateful applications is a bit tricky as you need to maintain sessions. It requires additional mechanisms such as sticky sessions or session synchronization, which adds complexity.

When to use load balancing

A load balancer is mainly used when:

You have a high traffic website and you want to spread the load so that your servers don't fry.
You have users from all over the world and want to serve them data from their closest location. You could have a server in Asia and another in Europe. The load balancer would then route all users from Asia to the Asian server and European users to the Europe server.
You have a service orientated architecture with API's corresponding to different services. Load balancing can be used as a simple API gateway.

I've written a bunch of articles on load balancing, so feel free to check them out.

Sharded Services Pattern

Source

In the previous section, we talked about replicated services. Any request can be processed by any of the services. This is because they are replicas of one another.

This is good for stateless services. But what if you have a stateful service? Then a sharded approach would be more appropriate.

Sharded service only accepts certain kinds of requests.

For example, you may have one shard service accept all caching requests while another shard service accepts high-priority requests.

But, how do we implement this?

Well, if we are strictly talking about application services then you could go with the service oriented architecture approach. You could have multiple services developed and deployed independently.

Then you could use a load balancer to route requests by URL path to the appropriate service.

PS. Sharding is not only used for application services but can be used for databases, caches, CDNs, etc...

Pros:

Scalability – Sharding allows you to distribute load across multiple nodes or servers, thus enabling horizontal scaling. As your workload increases, you can just add more shards.
Performance – A single node doesn't have to handle all requests especially if it is computationally heavy. Each node can take in a subset of requests, improving the systems performance.
Cost-effectiveness – Sharding can be a cost-effective solution for scaling your system. Instead of investing in a single, high-capacity server, you can use commodity hardware and distribute the workload across multiple, less expensive servers.
Fault isolation – Sharding provides a level of fault isolation. If one shard or node fails, the remaining shards can continue serving requests.

Cons:

Complexity – Sharding is not easy to implement. It requires careful planning and design to handle data distribution, consistency, and query coordination.
Operational overhead – Managing a sharded system involves additional operational tasks such as monitoring, maintenance, and backup, which can require more resources and expertise.

Use Cases

The Sharded Services Pattern is typically used in the following scenarios:

Performance Requirements – If your system is dealing with large data volumes or high read/write workloads that a single server cannot handle, sharding can distribute the workload across multiple shards. This enables parallel processing and improving overall performance.
Scalability requirements – When you anticipate the need for horizontal scalability in the future, sharding can be implemented from the beginning to provide the flexibility to add more shards and scale the system as the workload grows.
Cost considerations – If vertical scaling (upgrading to more powerful hardware) becomes cost-prohibitive, sharding offers a cost-effective alternative by distributing the workload across multiple, less expensive servers or nodes.
Geographical distribution – Sharding can be beneficial when you need to distribute data across different geographical locations or data centers, allowing for improved performance and reduced latency for users in different regions.

But keep in mind that there must be careful considerations when sharding your services as it is very complex and expensive to implement and revert.

Sidecar Pattern

Source

In a service-oriented architecture, you might have a lot of common functionalities – things such as error handling, logging, monitoring, and configuration.

In the past, there were two ways to solve this problem:

Implement common functionalities within the service

The problem with this approach is that the utilities are tightly linked and run within the same process by making efficient use of the shared resources. This makes the components interdependent.

If one functionality fails then this can lead to another functionality failing or the whole service failing.

Implement common functionalities in a separate service

This may seem a good approach because the utilities can be implemented in any language and it does not share resources with other services.

The downsides are that it adds latency to the application when we deploy two services on different containers, and it adds complexity in terms of hosting, deployment, and management.

How can we do it better?

One method is using the side-car pattern. It states that a container should only address a single concern and do it well.

So to do that, we have a single node (virtual or physical machine) with two containers.

The first is the application container which contains the business logic. The second container, usually called the sidecar, is used to extend/enhance the functionality of the application container.

Now you might ask, "But, how is this useful?"

You should keep in mind that the sidecar service runs in the same node as the application container. So they share the same resources (like filesystem, memory, network, and so on...)

An example

Let's say you have a legacy application that generates logs and saves them in a volume (persisted data) and you want to extract them into an external platform such as ELK.

One way to do this is just extending the main application. But that's difficult due to the messy code.

So you decide to go with the sidecar method and develop a utility service that:

Captures the logs from the volume.
Transfers the logs into Elastic.

The architecture of the node would look something like this:

Source

Hooray, you haven't changed any code in the application and you extended its functionality by plugging in a sidecar.

Heck, you can even plug this log aggregator sidecar into other applications.

Pros:

Modularity – Sidecar allows you to develop and maintain utility functions independently.
Scalability – If there is too much load on the sidecar, you can easily horizontally scale it by adding more containers.
Isolation – The sidecar is isolated from the main application, providing an additional layer of security.

Cons:

Complexity – It requires extra management of multiple containers and their dependencies.
Resource overhead – Because we have an extra container, this can increase the overall resource usage of the application.
Coordination – The sidecar must be coordinated in a way that it works correctly with the main application which increases complexity.
Debugging – Debugging is more difficult with the sidecar pattern, as it requires tracing the interactions between the main application and the sidecar.

Use Cases

The sidecar pattern is useful when you want to add additional functionality to the application without touching the core business logic code.

By deploying the sidecar, the core logic can remain lightweight and focus on its primary task while the sidecar can handle additional functionality.

If need be, you can reuse the sidecar for other applications too.

Now that we know when to use this pattern, let's look at some use cases where it is beneficial:

Logging and Monitoring – The sidecar container collects logs and metrics from the main container, providing centralized storage and real-time monitoring for improved observability.
Caching – The sidecar container caches frequently accessed data or responses, enhancing performance by reducing the need for repeated requests to external services.
Service Discovery and Load Balancing – The sidecar container registers the main container with a service discovery system, enabling load balancing and fault tolerance across multiple instances of the main container.
Security and Authentication – The sidecar container handles authentication tasks, offloading responsibilities like OAuth, JWT verification, or certificate management from the main container.
Data Transformation and Integration – The sidecar container performs data transformation and integration tasks, facilitating seamless communication and synchronization between the main container and external systems.
Proxy and Gateway – The sidecar container acts as a proxy or gateway, providing functionalities like rate limiting, SSL termination, or protocol translation for enhanced communication capabilities.
Performance Optimization – The sidecar container handles CPU-intensive tasks or background processes, optimizing resource usage and improving the main container's performance.

Write-Ahead Log Technique

Imagine this: you are working on a service that is connected to a database with sensitive user information.

One day, the server crashes. Your database crashes. All the data is gone, apart from the backups.

You sync the database with the backup, but the backup is not up to date. It's 1 day old. You sit and cry in the corner.

Well thankfully, that will most likely never happen.

Because most databases have something called a write-ahead log (WAL).

What is a write-ahead log?

A write-ahead log is a popular technique used to preserve:

Atomicity – Every transaction is treated as a single unit. Either the entire transaction is executed, or none of it is executed. This ensures that the data does not get corrupted or lost.
Durability – Ensures that the data will not be lost, even in an event of a system failure.

But how does it work?

A WAL stores every change you made onto a file on a hard disk.

So for example, let's say you created your own in-memory database called KVStore. In case of system failure, you want:

Data to not be lost.
Data to be recovered onto memory.

So you decide to implement a write-ahead log.

Every time you do any transaction (SET or REMOVE), the command will be logged into a file on the hard disk. This allows us to recover the data in case of system failure. The memory will be flushed, but the log is still stored in the hard drive.

The overall architecture would look something like this:

Source

It's not all sunshine and rainbows

As useful as it is, WAL is not easy to implement. There are many nuances, but the most common ones are:

Performance

If you use standard file-handling libraries in most programming languages, you would most likely "flush" the file onto the hard disk.

Flushing every log will give you a strong guarantee of durability. But this severely limits performance and can quickly become a bottleneck.

You might ask, "why don't we delay flushing or do it asynchronously?"

Well, this might improve performance but at the risk of losing entries from the log if the server crashes before entries are flushed.

The best practice here is to implement techniques like Batching, to limit the impact of the flush operation.

Data Corruption

The other consideration is that we have to make sure that corrupted log files are detected.

To handle this, save log files via CRC (Cyclic Redundancy Check) records, which validates the files when read.

Storage

Single Log files can be difficult to manage and can consume all the available storage.

To handle this issue, use techniques like:

Segmented Log – Split the single log into multiple segments.
Low-Water Mark – This technique tells us which portion of the logs can be safely discarded.

These two techniques are used together as they both complement each other.

Duplicate Entries

WALs are append-only, meaning that you can only add data. Because of this behavior, we might have duplicate entries. So when the log is applied, it needs to make sure that the duplicates are ignored.

One way to solve this is to use a hashmap, where updates to the same key are idempotent. If not, then there needs to be a mechanism to mark each transaction with a unique identifier and detect duplicates.

Use Cases

Overall WALs are mostly used in databases but can be beneficial in other areas:

Write-ahead logs (WALs) are widely used in various systems and databases. Here are some common use cases for write-ahead logs:

File Systems – File systems can employ write-ahead logging to maintain data consistency. By logging changes before applying them to the file system, WALs allow for crash recovery and help prevent data corruption in case of system failures.
Message Queues and Event Sourcing – Write-ahead logs are often used in message queues and event sourcing architectures. The logs serve as a reliable and ordered record of events, allowing for reliable message delivery, event replay, and system state restoration.
Distributed Systems – Distributed systems that need to maintain consistency across multiple nodes can benefit from write-ahead logs. By coordinating log replication and replay, WALs help synchronize data updates and ensure consistency in distributed environments.

Split-Brain Pattern

Source

That's definitely an interesting name, isn't it? It might make you think of the two halves of the brain.

Well, it is actually somewhat similar.

A split brain in distributed systems happens when nodes in a distributed system become disconnected from each other but still continue to operate.

Split brain? Nodes working independently when they should work together? Sound similar? I hope so.

Anyways, the biggest problem with this is that it causes:

Data Inconsistency
Competing for resources

This will usually shut the cluster off while developers try to fix things. This causes downtime which makes the business lose money.

What's the fix?

One fix is using a generational number. Every time a leader is elected, the generation number gets incremented.

For example, if the old leader had a generational number of one, then the second leader will have a generational number of two.

The generation number is included in every request, and now clients can just trust the leader with the highest number.

But keep in mind that the generational number must be persisted on disk.

One way to do that is using a Write Ahead Log. See, things are connected to each other.

This solution is categorized as a leader election but there are others:

Quorum-based Consensus – Use algorithms like Raft or Paxos to ensure that only a majority of nodes can make decisions, preventing conflicting decisions in a split brain scenario.
Network Partition Detection – Use monitoring techniques or distributed failure detectors to identify network partitions and take appropriate actions.
Automatic Reconciliation – Implement mechanisms to automatically resolve conflicts and ensure data consistency once a split brain is resolved, such as merging conflicting changes or using timestamps or vector clocks.
Application-level Solutions – Design the application to tolerate split brain scenarios by employing eventual consistency models, conflict-free data structures, or CRDTs.
Manual Intervention –In some cases, manual intervention may be required to resolve split brain scenarios, involving human decision-making or administrative actions to determine the correct system state and perform data reconciliation.

Pros

Data Consistency – Implementing a fix ensures that shared data remains consistent across the distributed system.
System Stability – Resolving split brain scenarios promotes system stability by avoiding conflicting operations and maintaining coherent behavior.

Cons

Increased Complexity – Fixing split brain scenarios adds complexity to the system due to the intricate logic and mechanisms required.
Performance Overhead – Split brain resolution mechanisms may impact system performance and latency due to additional processing and communication requirements.
Higher Resource Utilization – Addressing split brain scenarios may require allocating more resources, potentially increasing costs.
Increased Failure Surface – Introducing split brain resolution mechanisms may inadvertently introduce new failure modes or vulnerabilities.
Configuration and Tuning Complexity – Implementing a fix requires careful configuration and ongoing maintenance to ensure optimal behavior in different scenarios.

Hinted Handoff Pattern

_Source_

The Hinted Handoff technique makes sure that you have:

Fault Tolerance – The ability of the system to continue working even if one or more components fail.
Availability of Data – Ability to access and modify the data at any given time.

What problem does it solve?

Imagine you have a bank service that communicates with a node that has three replicas. None of the nodes are leaders, so the architecture is leaderless.

You send a request to update the customer's balance to $100. You send this to all the replicas.

The request is successful for the first two replicas but the last one is down. After a few seconds, the replica that was down got back up again but it has the old data.

How do we fix this issue?

The hinted handoff technique says that when a node for a particular data goes offline, then the other nodes of the system will temporarily store updates or modifications intended for the unavailable node.

Hence the name "hints".

So when the unavailable node comes back alive, it can retrieve the hints and apply them.

So the process goes like this:

Detection – When a node fails, other nodes detect this failure and mark that node as unavailable.
Hint Generation – When a node receives a request to send to the unavailable node, it will store it locally, usually to a file on the disk.
Hint Delivery – When the available node goes back online, it sends a message to the other nodes requesting any hints that were made while it was offline. The other nodes send the hints and the node applies them.

By using this technique we ensure our data is consistent and available even when nodes fail or become temporarily unavailable.

Pros

Improved data availability – Hinted handoff ensures data remains accessible during temporary node failures by transferring responsibilities to other nodes.
Data consistency – Hinted handoff helps maintain data consistency by synchronizing the failed node with others when it recovers.
Reduced latency – Hinted handoff minimizes the impact of node failures on system performance, routing requests to alternative nodes and reducing latency.
Scalability – Hinted handoff enables dynamic redistribution of data responsibilities, allowing the system to handle increased workloads and node changes.

Cons

Increased complexity – Implementing hinted handoff adds complexity to the system, making development, debugging, and maintenance more challenging.
Storage overhead – Hinted handoff requires storing additional metadata, incurring storage overhead to track handoff status.
Potential data staleness – After a failure, recovered nodes may have temporarily stale data until synchronization occurs, leading to potential inconsistencies.
Increased network traffic – Hinted handoff involves transferring data responsibilities, resulting in increased network traffic and potential impact on network performance.

Use Cases

Hinted handoff is typically implemented in distributed database systems or distributed storage systems where data availability and consistency are crucial.

Here are some scenarios where implementing hinted handoff is beneficial:

Cloud storage systems – Hinted handoff enables seamless redirection of client requests to available nodes in cloud storage systems when a node becomes temporarily unavailable.
Messaging systems – Hinted handoff allows distributed messaging systems to route messages to other active brokers when a broker node fails, ensuring message delivery and system operability.
Distributed file systems – Hinted handoff in distributed file systems allows the temporary transfer of data responsibilities to other nodes when a data node fails, ensuring data availability and uninterrupted read/write operations.
Content delivery networks (CDNs) – Hinted handoff in CDNs facilitates the redirection of content delivery requests to other servers in the network when a server becomes temporarily unavailable, ensuring continuous content delivery to users.

Read Repair Pattern

Source

In a distributed system, you can have data partitioned into multiple nodes.

This introduces a new challenge where we have to keep the data consistent in all nodes.

For example, if you update data on node A, the changes might not be immediately propagated to other nodes due to all sorts of reasons.

So we use the "read repair" pattern

When a client reads a piece of data from a node, that node will check if the data is the latest. If not, then it receives the latest data from another node.

Once it receives the latest data, it will update the node with the old data with the new data. Hence the "repair".

But this is done on the application side

Doing things on the application side is pretty flexible as you can have your own custom logic per service but it increases complexity and development time.

Thankfully, there are three other methods of implementing read repair:

Write-Based Read Repair – Proactively update multiple replicas or nodes with the latest data during write operations.
Background Repair – Schedule periodic background repair processes to scan and repair inconsistencies in the database.
Database-Specific Read Repair – Leverage built-in read repair mechanisms or conflict resolution features provided by the database.

Pros

Data Consistency – Read repair maintains data consistency by automatically detecting and correcting inconsistencies between replicas or nodes.
Improved User Experience – Read repair provides reliable and accurate data, enhancing the user experience by reducing conflicting or outdated information.
Fault Tolerance – Read repair increases system resilience by addressing data inconsistencies and reducing the risk of cascading failures.
Performance Optimization – Read repair improves performance by minimizing the need for separate repair processes and distributing the repair workload.
Simplified Development – Read repair automates consistency checks, simplifying application development.

Cons

Increased Complexity – Implementing read repair adds complexity to system design, development, and maintenance efforts.
Performance Overhead – Read repair may introduce additional latency and computational overhead, impacting overall system performance.
Risk of Amplifying Failures – Incorrect implementation of read repair can propagate inconsistencies or amplify failures.
Scalability Challenges – Coordinating repairs in large-scale systems can be challenging, impacting performance and scalability.
Compatibility and Portability – Read repair mechanisms may be specific to certain databases or technologies, limiting compatibility and portability.

Use Cases

Read repair can be beneficial in various scenarios where maintaining data consistency across replicas or nodes is crucial.

Here are some situations where you should consider using read repair:

Distributed Database Systems – Use read repair in distributed database systems where data is replicated across multiple nodes or replicas to ensure data consistency.
High Data Consistency Requirements – Implement read repair when your application requires a high level of data consistency, such as in financial systems, collaborative editing platforms, or real-time analytics.
Read-Heavy Workloads – Consider read repair for read-heavy workloads to detect and reconcile inconsistencies during read operations, improving performance by reducing the need for separate repair processes.
Systems with Network Delays or Failures – Use read repair in environments with network delays or occasional node failures to automatically detect and correct data inconsistencies caused by these issues.

Service Registry Pattern

When you are working with a distributed system, you will have services that have instances that can scale up or down.

One service can have ten instances at one time and two at another time.

The IP addresses of these instances get created dynamically. Problems arise here.

Imagine that you have a client and it wants to talk to a service. How will it know the IP address of the service if IP addresses are dynamically created?

The answer is a service registry.

What the heck is that?

A service registry is usually a separate service that runs that keeps information about available instances and their locations.

But how does the service registry know all this information?

When we create an instance of any of our services, we register ourselves to the service registry with our name, IP address, and port number.

The service registry then stores this information in its data store.

When a client needs to connect to a service, it queries the service registry to obtain the necessary information for connecting to the service.

Now that we know what a service registry is, let's talk about the patterns of service discovery.

Client Side Discovery

Source

The first and easiest way is for the client to call the service registry and get information about all the available instances of a service.

This approach works well when you want to:

Have something simple and straightforward.
Have the client-side make the decisions about which instances to call.

But the significant drawback here is that it couples the client with the service registry.

So you must implement client-side discovery logic for every programming language and framework that your service uses.

Server Side Discovery

Source

On the other hand, the server-side discovery forces the client to make the request via a load balancer.

If you don't know what a load balancer is, feel free to check out my other comprehensive article on it.

The load balancer will call the service registry and routes the request to the specific instance.

Server-side discovery has the following benefits:

Abstracting the details of the service registry to the load balancer means that the client can simply make a request to the load balancer.
It's built-in in most popular providers such as AWS ELB (Elastic Load Balancer).

The only downside is that you have another component (service registry) in your infrastructure that you have to maintain.

Circuit Breaker Pattern

Source

Let's say you have three services, A, B, and C. They all call each other sequentially – A calls B, which calls C.

All goes well as long as the services work. But what if one of the services is down Then the other services would fail. If service C is down, then B and A would be down, too.

How do we fix this?

We use the circuit breaker pattern which acts as a middleware between two services.

It monitors the state of the second service and, in case of failure or unresponsiveness, stops the requests to the service and returns a fallback response or error message to the component.

The middleware has three states:

Closed – The service can communicate with the second service normally.
Open – When the middleware detects a certain number of consecutive failures then it transitions to the open state, and all requests to the service are immediately blocked.
Half Open – After a certain period of time, the middleware transitions to a half-open state which allows a limited number of requests to be sent to the second service. If successful, then the middleware transitions to a closed state otherwise it will transition to an open state.

Overall, the circuit breaker pattern increases resilience by providing a fallback mechanism and reducing the load on a failed service.

It also provides insight into the status of a service which helps us identify failures more quickly.

Pros

Fault tolerance – Circuit breakers enhance system stability by protecting against cascading failures and reducing the impact of unavailable or error-prone dependencies.
Fail-fast mechanism – Circuit breakers quickly detect failures, allowing for faster recovery and reducing latency by avoiding waiting for failed requests to complete.
Graceful degradation – Circuit breakers enable the system to gracefully degrade functionality by providing alternative responses or fallback mechanisms during failures.
Load distribution – Circuit breakers can balance the load across available resources during high traffic or when a service is experiencing issues.

Cons

Increased complexity – Implementing a circuit breaker adds complexity to the system, impacting development, testing, and maintenance efforts.
Overhead and latency – Circuit breakers introduce processing overhead and latency as requests are intercepted and evaluated against circuit states.
False positives – Circuit breakers can mistakenly block requests even when the dependency is available, leading to false positives and impacting system availability and performance.
Dependency on monitoring – Circuit breakers rely on accurate monitoring and health checks, and if these are unreliable, the effectiveness of the circuit breaker may be compromised.
Limited control over remote services – Circuit breakers provide protection but lack direct control over underlying services, requiring external intervention for resolving certain issues.

Use Cases

The circuit breaker pattern is beneficial in specific scenarios where a system relies on remote services or external dependencies.

Here are some situations where it is recommended to use the circuit breaker pattern:

Distributed systems – When building distributed systems that communicate with multiple services or external APIs, the circuit breaker pattern helps improve fault tolerance and resilience by mitigating the impact of failures in those dependencies.
Unreliable or intermittent services – If you are integrating with services or dependencies that are known to be unreliable or have intermittent availability, implementing a circuit breaker can protect your system from prolonged delays or failures caused by those dependencies.
Microservices architecture – In a microservices architecture, where individual services have their own dependencies, implementing circuit breakers can prevent cascading failures across services and enable graceful degradation of functionality during failures.
High-traffic scenarios – In situations where the system experiences high traffic or load, circuit breakers can help distribute the load efficiently by redirecting requests to alternative services or providing fallback responses, thereby maintaining system stability and performance.
Resilient and responsive systems – The circuit breaker pattern is useful when you want to build systems that are resilient and responsive to failures. It allows the system to quickly detect and recover from issues, reducing the impact on users and ensuring a smoother user experience.

Leader Election Pattern

_Source_

The leader election pattern is a pattern that gives a single thing (process, node, thread, object) superpowers in a distributed system.

This pattern is used to ensure that a group of nodes can coordinate effectively and efficiently.

So when you have three or five nodes performing similar tasks such as data processing or maintaining a shared resource, you don't want them to conflict with one another (that is, contesting for resources or interfering with each other's work).

Elect one leader that coordinates everything

The leader will be responsible for critical decisions or distributing workloads among the other processes.

Any node can be a leader, so we should be careful when we elect them. We don't want two leaders at the same time.

So we must have a good system for selecting a leader. It must be robust, which means it has to cope with network outages or node failures.

How to select a leader

One algorithm named Bully assigns every node with a unique identifier. The node with the highest identifier initially is considered the leader.

If a low-ranked node detects that the leader has failed, it sends a signal to all the other high-ranked nodes to take over. If none of them respond then the node will make itself the leader.

Another popular algorithm is called the Ring algorithm. Each node is arranged in a logical ring. The node with the highest identifier is initialized as a ring.

If a lower-ranked node detects that the ring has failed, then it will request all the other nodes to update their leader to the next highest node.

Both of the above algorithms assume that every node can be uniquely identified.

Pros

Coordination – Leader election allows for better organization and coordination in distributed systems by establishing a centralized point of control.
Task Assignment – The leader can efficiently allocate and distribute tasks among nodes, optimizing resource utilization and workload balancing.
Fault Tolerance – Leader election ensures system resilience by promptly electing a new leader if the current one fails, minimizing disruptions.
Consistency and Order – The leader maintains consistent operations and enforces the proper order of tasks in distributed systems, ensuring coherence.

Cons

Overhead and Complexity – Leader election introduces additional complexity and communication overhead, increasing network traffic and computational requirements.
Single Point of Failure – Dependency on a single leader can lead to system disruptions if the leader fails or becomes unavailable.
Election Algorithms – Implementing and selecting appropriate leader election algorithms can be challenging due to varying trade-offs in performance, fault tolerance, and scalability.
Sensitivity to Network Conditions – Leader election algorithms can be sensitive to network conditions, potentially impacting accuracy and efficiency in cases of latency, packet loss, or network partitions.

Use Cases

Distributed Computing – In distributed computing systems, leader election is crucial for coordinating and synchronizing the activities of multiple nodes. It enables the selection of a leader responsible for distributing tasks, maintaining consistency, and ensuring efficient resource utilization.
Consensus Algorithms – Leader election is a fundamental component of consensus algorithms such as Paxos and Raft. These algorithms rely on electing a leader to achieve agreement among distributed nodes on the state or order of operations.
High Availability Systems – Leader election is employed in systems that require high availability and fault tolerance. By quickly electing a new leader when the current one fails, these systems can ensure uninterrupted operations and mitigate the impact of failures.
Load Balancing – In load balancing scenarios, leader election can be used to select a leader node responsible for distributing incoming requests among multiple server nodes. This helps in optimizing resource utilization and evenly distributing the workload.
Master-Slave Database Replication – In database replication setups, leader election is used to determine the master node responsible for accepting write operations. The elected master coordinates data synchronization with slave nodes, ensuring consistency across the replica set.
Distributed File Systems – Leader election is often employed in distributed file systems, such as Apache Hadoop's HDFS. It helps in maintaining metadata consistency and enables efficient file access and storage management across a cluster of nodes.

Bulk Head Pattern

Source

Things fail for all sorts of reasons in a distributed system. This is why we need to make sure our systems are resilient in case of failures.

One way to improve that is to use the bulkhead pattern.

So let's say we have two services, A and B.

Some requests of A depend on B. But the issue is that service B is pretty slow which makes A slow and blocks its threads. This makes all requests to A slow, even the ones which don't depend on B.

The bulkhead pattern solves that by allocating a specific amount of threads for requests to B. This prevents A from consuming all threads due to B.

Pros

Fault isolation – The bulkhead pattern contains failures within individual services, minimizing their impact on the overall system.
Scalability – Services can be independently scaled, allowing allocation of resources where needed without affecting the entire system.
Performance optimization – Specific services can receive performance optimizations independently, ensuring efficient operation.
Development agility – Code modularity and separation of concerns facilitate parallel development efforts.

Cons

Complexity – Implementing the bulkhead pattern adds architectural complexity to the system.
Increased resource usage – Duplication of resources across components may increase resource consumption.
Integration challenges – Coordinating and integrating the services can be challenging and introduce potential points of failure.

Use Cases

Here are some common scenarios where the bulkhead pattern is beneficial:

Microservices architecture – The bulkhead pattern is used to isolate individual microservices, ensuring fault tolerance and scalability in a distributed system.
Resource-intensive applications – By partitioning the system into components, the bulkhead pattern optimizes resource allocation, enabling efficient processing of resource-intensive tasks without impacting other components.
Concurrent processing – The bulkhead pattern helps handle concurrent processing by assigning dedicated resources to each processing unit, preventing failures from affecting other units.
High-traffic or high-demand systems – In systems experiencing high traffic or demand, the bulkhead pattern distributes the load across components, preventing bottlenecks and enabling scalable handling of increased traffic.
Modular and extensible systems – The bulkhead pattern facilitates modular development and maintenance, allowing independent updates and deployment of specific components in a system.

Retry Pattern

Source

The retry pattern is used to handle temporary failures.

Requests fail for all sorts of reasons, from faulty connections to site reloading after a deployment.

So let's say we have a client and a service. The client sends a request to the service and receives a response code 500 back.

There are multiple ways to handle this according to the retry pattern.

If the problem is rare, then retry immediately.
If the problem is more common, then retry after some amount of delay (for example 50ms, 100ms, etc)
If the issue is not temporary (invalid credentials, and so on) then cancel the request.

But sometimes, the client thinks that the request has failed

Sometimes the operation was a success but the request failed to return something. This happens when requests are not stateless, meaning that they change the state of the system somehow (like CRUD operations on a database).

So what we do is we send a duplicate request which can cause problems in the system.

For that, we need some sort of tracking system. We can use the circuit breaker pattern to limit the impact of repeatedly retrying a failed/recovering service.

Pros

Resilience and Reliability – By automatically retrying failed requests, you increase the chance of the system recovering from failure (resilience) and increase a chance of a successful requests (reliability).
Error handling – Retrying an operation transparently hides the error from the end user, making the system appear more robust.
Simplified Implementation – The retry pattern is simple to implement. Instead of having custom error handling logic for all your requests, you can wrap them all with a retry pattern class.
Performance optimization – In some cases, temporary errors may be resolved quickly, and the subsequent retries can succeed without significant delays. This can help optimize performance by avoiding unnecessary failures and reducing the impact on the end user.

Cons

Increased latency – When retries are attempted, there is an inherent delay introduced in the system. If the retries are frequent or the operations take a long time to complete, the overall latency of the system may increase.
Risk of infinite loops – Without proper safeguards, retries can potentially lead to infinite loops if the underlying error condition persists.
Increased network and resource usage – Retrying failed operations can lead to additional network traffic and resource utilization.
Potential for cascading failures – If the underlying cause of the failure is not resolved, retrying the operation may result in cascading failures throughout the system.
Difficulty in handling idempotent operations – Retrying stateful requests can lead to unintended consequences if the request is performed multiple times.

Use Cases

The applicability of the retry pattern depends on the specific requirements and characteristics of the system being developed.

The pattern is most useful for handling temporary errors. But, you must also keep in mind for scenarios involving long delays, stateful requests, or cases where manual intervention is required.

Now, let's look at some use cases where retrying requests help:

Network communication – When working with third party services or APIs over a network, all sorts of errors can occur.
Database operations – Databases also face the same issues due to temporary unavailability, timeouts, or deadlock situations.
File or resource access – When accessing files, external resources, or dependencies, failures due to locks, permissions, or temporary unavailability can occur.
Queue processing – Systems that rely on message queues or event-driven architectures may encounter temporary issues like queue congestion, message delivery failures, or service availability fluctuations.
Distributed transactions – In distributed systems, performing coordinated transactions across multiple services or components can face challenges like network partitions or temporary service failures.
Data synchronization – When synchronizing data between different systems or databases, temporary errors can disrupt the synchronization process.

Scatter Gather Pattern

Source

Let's say we have a process-heavy task like video compression.

We get a video and have to compress it into 5 different resolutions such as 240p, 360p, 480, 720p, and 1080p.

We have a single image compression service that takes in the video and processes each resolution sequentially. The problem here is that it's very slow.

The scatter-gather pattern advises us to run multiple of these processes simultaneously and gather all the results in one place.

So for our video process examples, this is how it will look:

Scatter – We will divide the task into multiple nodes, so one node will take care of compressing the video into 240p, another to 360p, and so on.
Process – Each node will compress its video individually and simultaneously.
Gather – Once all the nodes have finished compressing the videos, the videos will be stored on some server and we collect the links for all the different versions.

Instead of waiting for each of our videos to be compressed sequentially, now we can parallelize this whole process and increase performance significantly.

Pros

Parallel Processing – The scatter-gather pattern improves performance by enables parallel processing of subtasks. Tasks can execute concurrently across multiple nodes or processors.
Scalability – The scatter gather pattern allows us to scale horizontally, the higher our workload, the more nodes we provision.
Fault Tolerance – This pattern enhances fault tolerance by redistributing failed subtasks to other available nodes, ensuring the overall task can still be completed successfully.
Resource Utilization – The scatter-gather pattern optimizes resource utilization by efficiently leveraging available computational resources across multiple nodes or processors.

Cons

Communication Overhead – This pattern involves communication between nodes which introduce potential latency and network congestion that may impact overall performance, especially with large data volumes.
Load Balancing – Balancing the workload evenly across nodes can be challenging, leading to potential inefficiencies and performance bottlenecks if some nodes are idle while others are overloaded.
Complexity – This pattern is not easy to implement and adds an aditional layer of complexity to the system. It requires careful planning and synchronization mechanisms to coordinate subtasks.
Data Dependency – Handling dependencies among subtasks, where the output of one subtask is needed as input for another, can be more complex in the scatter-gather pattern compared to other patterns.

Use Cases

The scatter-gather pattern is useful in distributed systems and parallel computing scenarios where you can divide tasks into smaller subtasks that can be performed concurrently across multiple nodes and processors.

Here are some use cases where the scatter-gather pattern can be used:

Web Crawling – You can use the scatter gather pattern to fetch and crawl multiple web pages concurrently.
Data Analytics – Apply scatter-gather to divide large datasets into chunks for parallel processing in data analysis tasks, combining individual results for final insights.
Image/Video Processing – Utilize scatter-gather for distributed image/video processing, such as encoding frames in parallel and gathering results for the final output.
Distributed Search – Implement scatter-gather in distributed search systems to distribute search queries across nodes, gather and rank results for final search output.
Machine Learning – Apply scatter-gather in distributed machine learning for parallel model training on scattered data and combining models for final trained model.
MapReduce – Incorporate scatter-gather in the MapReduce model for big data processing, scattering input, parallel processing of intermediate results, and gathering for final output.

Bloom Filters (Data Structure)

_Source_

Bloom filters is a data structure that is designed to tell you efficiently both in memory and speed whether an item is in a set.

But the cost of this efficiency is that it is probabilistic. It can tell you that an item is either:

Definitely not in a set.
Might be in a set.

Pros

Space Efficiency – Bloom filters usually require a smaller amount of memory to store the same number of elements compared to hash tables or other similar data structures.
Fast Lookups – Checking whether an element is in a Bloom filter has a constant-time complexity regardless of the size of the filter or the number of elements it contains.
No False Negatives – Bloom filters provide a definitive "no" answer if an element is not in the filter. There are no false negatives, meaning if the filter claims an element is not present, it is guaranteed to be absent.
Parallelizable – Bloom filters can be easily parallelized, allowing for efficient implementation on multi-core systems or distributed environments.

Cons

False Positives – Due to their probabilistic nature, there is a small probability that the filter will incorrectly claim an element is present when it is not. The probability of false positives increases as the filter becomes more crowded or the number of elements increases.
Limited Operations – Bloom filters only support insertion and membership tests. They do not allow deletion of elements or provide a mechanism to retrieve the original elements stored in the filter. Once an element is inserted, it cannot be removed individually.
Fixed Size – The size of the bloom filter must be determined beforehand and cannot dynamically resize. If these parameters are not estimated accurately, the filter may become ineffective or waste memory.
Unordered Data – Bloom filters do not maintain the order or provide any information about the stored elements. They are not suitable for scenarios that require ordering or retrieval of elements based on certain criteria.

Use Cases

Bloom filters are most useful in situations where approximate membership tests with a low false positive rate are acceptable and where memory efficiency is a priority.

Here are some scenarios where Bloom filters are particularly beneficial:

Caching Systems – Bloom filters can be used to quickly check if an item is likely to be present in the cache or not. By checking the bloom filter, the system can avoid the costly operation of fetching the item from the cache or underlying storage if it is not likely to be there, thereby improving overall performance.
Content Delivery Systems – CDNs use bloom filters to handle cache invalidation. Instead of checking each edge server, we use a bloom filter to identify whether the edge server potentially has a specific resource.
Anti-Spam Systems – Anti-spam systems check emails against common spam databases. Instead of checking against the databases which is expensive, we use bloom filters which are much more efficient.
Distributed Data Processing – In a distributed data processing framework such as Hadoop or Spark. We use bloom filters to optimize join operations by pre-filtering data and removing unnecessary shuffles.

Conclusion

You don't need to be an expert in all of these things – I'm not. And even if you don't ever directly use or work with some of these concepts, they are still good to know.

Try to implement them in simple projects. It's a good idea to keep the projects simple enough so you won't get sidetracked.

Below, I share with you my favorite resources that I used to learn about distributed systems.

Thank you for reading.

Recommended Resources

How Message Queues Work in Distributed Systems

freeCodeCamp — Thu, 16 Feb 2023 15:54:17 +0000

By Nyior Clement

From town criers to written letters and now real-time chat applications, humanity has always been on the lookout for innovative ways to communicate. And we can say the same about computers.

There's an even greater need to allow computers to communicate with the increasing adoption of distributed architecture. This also means that coordinating and managing communication between smaller services has gotten more complex.

But different ways of allowing the services in a distributed architecture to communicate have evolved over the years. We can put these different communication patterns into two broad categories: synchronous and asynchronous modes of communication.

But more relevant to this article would be exploring how we can use message queues to implement asynchronous communication in a distributed system. And, more importantly, how this can help us realize a more reliable and scalable architecture.

Synchronous, asynchronous, message queues, distributed systems — what are all these terms?

This article will cover what they are and, more importantly, explore the link between message queues and distributed systems. To begin, let’s look at the evolution from monoliths to distributed systems.

Note: for brevity’s sake, this article will interchangeably use the term distributed systems and microservices.

The Era of Monoliths

In the early days, software systems were relatively simple and small. As a result, they were developed and deployed as a unit that runs on a single process.

For example, let's say you were working on an application that does authentication, processes orders, and, eventually, handles payments. You would bundle all these components, and the application’s database, into a tightly-coupled unit that’s deployed as one application — a monolith.

Illustration of a monolithic style of architecture

Over time, as more and more people embraced the internet, these monolith applications began to experience an explosion in the number of their users — a good thing, right? But this unmasked some limitations of the monolithic architecture.

Limitations of the monolithic architecture

First, with the increasing number of users, developers realized that having an application and its database hosted on the same machine wasn’t great. The database quickly became a resource hog, eating up the server’s memory and compute power.

To solve that, developers figured it’d be great if the application and its database could be hosted on different machines.

A monolithic application with a separate database

Even with that, they noticed that just moving the database to a separate machine wasn’t enough. As more users used an application, the demand for the host machine’s compute resources equally increased.

As users interacted with an application even more, developers realized that the host machine was being pushed to its limit. This negatively affected how quickly the monolith application could serve a user request. This latency problem got worse with increasing user traffic.

To fix that, they adopted a new design.

Enter the replicated monolithic architecture

In this design pattern, the same monolith application is duplicated on multiple servers — with all the servers hidden behind a load balancer. Each of the monoliths connected to the same database, and the load balancer routed requests to these instances optimally so as not to overwhelm any of the instances.

Replicated monolithic architecture

Replicating a monolith across multiple machines improved performance, but at what cost?

Making changes to a monolith running a replicated architecture became an undertaking that had all the makings of a nightmare. Let's review some of these downsides.

The pitfalls of the replicated monolithic architecture

Some of the limitations of the replicated monolith are, but not limited to:

1. Difficulty of upgrade or maintenance: For example, imagine you needed to update how your application was processing payments— maybe switch to a different payment processor. Updates like this would have to be reflected on all the replicated monoliths.

Essentially, all the copies would have to be taken down, updated, tested, and then redeployed. Overall, introducing changes or maintaining a monolith, in general, is way harder than it should be, because all the components are tightly coupled.

2. Inefficient horizontal scaling: Beyond maintenance and upgrades, yes, horizontally scaling a monolith by creating duplicate instances can improve performance – but how the scaling is done in this case is not optimal. Let’s see why…

Remember, a monolith is a bundle of different components that do different things but run as a single process. Going with our previous example, let's say our monolith has authentication, order processing, and payment components.

Now, in a monolith, it is common to have some components that do more work (receive and process user requests) than others. For example, in our case, not every user who logs in or signs up would make an order or pay for an order.

Even though, in our case, we are aware that it’s the authentication component that we need to allocate more resources to in order to scale, we can’t. This is because all the components are tied to each other and work as a single unit, so we can’t simply take out the authentication component.

That’s why, to scale a monolith horizontally, we’d have to create a duplicate instance of the entire application as opposed to scaling just the relevant components. This, in turn, leads to unnecessary pressure that could be avoided on the computing resources of the host machine.

A recap of the limitations of the monolithic architecture in general

Lastly, because all the components in a monolith are tightly coupled, if one component is faulty, this will directly affect the availability of the other parts. Clearly, with monoliths, we end up with fragile systems.

In summary, monolithic applications have the following limitations:

It’s hard to introduce new changes to a monolith, as well as maintain and upgrade it over time
Scaling a monolithic application is usually not optimal
Monolithic applications are not resilient — if one part of the system fails, the entire system fails

The desire to create a software architecture that overcame the shortcomings of the monolith brought about the era of microservices.

What is a Microservice?

Well, burdened by the glaring limitations of the monolith, developers became analytical and went back to the big picture. They thought, what if we develop the tightly coupled components of the monolith as loosely coupled independent components?

That thinking gave rise to what we now call the microservice architecture. Unlike the monolithic architecture, the microservice architecture breaks down an application into smaller, self-reliant components called services.

Each service is developed and deployed separately. As a result, each service runs on a separate process.

Going with our previous example, our monolith will be broken down into three services: the authentication service, the order processing service, and the payment processing service, as shown in the image below.

A microservice architecture

Even though these services are now developed as autonomous components, they all work together and communicate to achieve a common goal — ensuring that authenticated users can make purchases and pay.

But this brings up the question: how do these services communicate with each other?

Communication patterns in a microservice architecture

Broadly speaking, the services in the microservice architecture can be configured to communicate in one of two ways: synchronously or asynchronously.

To understand these two communication patterns, let’s pursue our previous example further — imagine a scenario where the order processing service needs to send the details of a new order to the payment processing service for payment to be processed.

Synchronous communication

Suppose the communication between the two services is synchronous. In that case, the order processing service will make an API call to the payment processing service and then wait for a response from that service before continuing its process.

As a result, the order processing service is blocked until it receives the response to its request. While the synchronous communication pattern is simple and direct, it has some major flaws when used in a microservice architecture.

One of the objectives of microservices is to create small independent services that remain operational even if one or more services experience downtime. This helps eliminate single points of failure.

However, the fact that the order processing service has to wait idly for a response from the payment service creates a level of interdependence that detracts from the intended autonomy of each service.

This is problematic because:

If the payment processing service experiences an unexpected failure, the order processing service would be unable to fulfill its requests.
If the order processing service encounters a sudden surge in traffic, it would also impact the payment service as it directly forwards its requests to the payment service.
If the payment service takes too long to respond, the order processing service will also take very long to send a response to the end user.

How can these problems be fixed? Asynchronous communication to the rescue.

Asynchronous communication

Suppose the communication between the two services is asynchronous. In that case, the order processing service will initiate a request to the payment service and continue its execution without waiting for a response. Instead, it receives the response at a later time.

For asynchronous communication, multiple patterns are available such as publish/subscribe, message queueing, and event-driven architecture.

Here, we are interested in seeing how asynchronous communication can be implemented in microservices with message queues.

What is a Message Queue?

A message queue is fundamentally any technology that acts as a buffer of messages — it accepts messages and lines them up in the order they arrive. When these messages need to be processed, they are again taken out in the order they arrive.

A message is any data or instruction added to the message queue. Going with our example, a message would be the details of an order that could be added to the message queue and then later processed by the payment service.

The architecture of a message queue is simple — applications called the producers create messages and add them to the message queue. Other applications called the consumers pick up these messages and process them. Some examples of message queues are Apache Kafka, RabbitMQ, and LavinMQ, among others.

We can generally use a message queue to allow the services within a microservice to communicate asynchronously. But how, you might ask?

Async communication in distributed systems with message queues

To demonstrate how a message queue can foster asynchronous communication in a microservice architecture, let’s go to the example we’ve repeated in this article.

Recall we mentioned that the order processing service could forward the details of new orders to the payment processing service synchronously via an API call. We can replace the synchronous API with a message queue.

In essence, the message queue will sit between the two services. In this setup, the order processing service will act as the producer that produces and adds messages to the message queue. The payment processing service will then act as the consumer that processes messages added to the queue.

The asynchronous communication here is inherent in the fact that when the producer (in this case, the order processing service) adds a message to the message queue, it continues with its execution without waiting for a response.

The consumer could then process messages from the message queue at its own pace and send a response to the producer or the end user at a later time.

This is a drastic shift from the synchronous approach. The genius of the asynchronous communication pattern with message queues lies in the fact that the services are separated and, as a result, made independent.

This is true because:

If the payment processing service experiences an unexpected failure, the order processing service will continue to accept requests and place them in the queue as if nothing had occurred. The requests will always be present in the queue, waiting for the payment service to process them when it is back online. This leads to a more reliable architecture with no single point of failure
The order processing service does not have to wait for a response from the consumer, the end users don’t also have to wait for long— this leads to improved performance and, by extension, a better user experience.
If the order processing service experiences an increase in traffic, the payment service will not be affected as it only retrieves requests from the message queues at its own pace.
This approach has the advantage of being simple to scale, as it allows for adding more queues or creating additional copies of the payment processing service to process more messages faster.

Wrapping Up

In this article, we learned about monolithic and microservices architecture. We also focused on exploring the concept of message queues and how to implement an asynchronous communication pattern in a microservices architecture.

Then we covered how adopting the asynchronous communication pattern as opposed to the synchronous approach could lead to increased performance, reliability, and ease of scale.

We merely managed to scratch the surface of message queues and what can be achieved with them— for example, being used in a distributed architecture isn’t their only use case.

To take your understanding of message queues and their use cases a step further, you can check out this detailed 7-part beginner’s guide on message queues and microservices.

This guide will go deeper and also walk you through building a demo application that adopts the microservice architecture with a message queue. That way, you get to see firsthand how to use a message queue in a distributed architecture.

How to Scale a Distributed System

freeCodeCamp — Mon, 13 Dec 2021 23:37:24 +0000

By Apoorv Tyagi

Designing a distributed system that supports millions of users is a complex task, and one that requires continuous improvement and refinement.

Recently I read a book by Alex Xu called "System Design Interview – An Insider's Guide". This article, inspired by the first part of the book, shares some popular techniques used by many large tech companies to scale their architecture to support up to a million users.

This is not an exhaustive list, but if you're a newer developer who's just getting started, this can help you build a stronger foundation for your career.

Use a Load Balancer**

A load balancer is a device that evenly distributes network traffic across several web servers. In this architecture, the clients do not connect to the servers directly – instead they connect to the public IP of the load balancer.

Using a load balancer also protects your site in the event of web server failure – and this, in turn, improves availability. For example,

If one server goes down, all the traffic can be routed to the second server. This prevents the overall system from going offline.
If in the future the traffic grows and these two servers are not enough to handle all the requests properly, then you just need to add more servers to your pool of web servers and the load balancer automatically starts distributing requests to them.

Load Balancing Algorithms

Let's look at some of the algorithms which a load balancer can use to choose a web server from a pool for an incoming request:

Round Robin – You start from the first server in the pool, move down to the next server, and when you're done with the last server you loop back up to the first and start working down the pool again.
Load-based server – You assign a server based on whichever server has the smallest load currently, thereby increasing throughput.
IP Hashing – You assign a server by hashing the IP address of incoming requests and using the hash value to do the modulo operation with the number of servers available in the server pool.

Use Caching

A cache stores the result of the previous responses so that any subsequent requests for the same data can be served faster. So you can use caching to minimize the network latency of a system.

You can significantly improve the performance of an application by decreasing the network calls to the database. This is because repeated database calls are expensive and cost time.

For example, every time a new user loads a website's home page, one or more database calls are made to fetch the data. This increases the response time. Caching can alleviate this problem by storing the results you know will get called often and those whose results get modified infrequently.

Here are a few considerations to keep in mind before using a cache:

Set an expiration policy: You should always have an expiration policy on your cache. If you don't have one, the data will get stored in the cache permanently and it will become stale.
Sync the cache and database: You should build a mechanism to keep the database and the cache in sync. If any data modifying operations occur in the databases and the same change doesn't reflect in the cache then it will introduce inconsistencies in your system.
Set an eviction policy: You should have an algorithm that can decide which existing items will get removed once the cache is full and you get a request to add other items to the cache. Least-recently-used (LRU) is one of the most popular cache eviction policies used today.

Use a Content Delivery Network (CDN)**

A CDN or a Content Delivery Network is a network of geographically distributed servers that help improve the delivery of static content from a performance perspective. CDN servers are generally used to cache content like images, CSS, and JavaScript files.

Here is how a CDN works:

When a client sends a request, a CDN server to the client will deliver all the static content related to the request.
If the CDN server does not have the required file, it then sends a request to the original web server.
The CDN caches the file and returns it to the client.
Let's say now another client sends the same request, then the file is returned from the CDN.

Here are a few considerations to keep in mind before using a CDN:

Cost: CDNs are generally run by third-party providers and they charge you for the data transfers in and out of the CDN. So caching infrequently used assets should not be stored in the CDN.
Fallback Mechanism: If a CDN fails, you should be able to detect it and start sending requests for resources from the original web server. So you should build a mechanism for how your application copes with a CDN failure.

Set Up a Message Queue**

A message queue allows an asynchronous form of communication. It acts as a buffer for the messages to get stored on the queue until they are processed.

The architecture of a message queue includes an input service, called publishers, that creates messages, publishes them to a message queue, and sends an event. Another service called subscribers receives these events and performs actions defined by the messages.

Both publishers and subscribers are decoupled from each other and that's what makes the message queue a preferred architecture for building scalable applications.

Message queue example

Consider the following use case:

You are building an application for ticket booking. As soon as a user completes their booking, a message confirming their payment and ticket should be triggered. This task may take some time to complete and it should not make our system wait for processing the next request.

Here, we can push the message details along with other metadata like the user's phone number to the message queue. Another worker service picks up the jobs from the message queue and asynchronously performs the message creation and sending tasks.

The publishers and the subscribers can be scaled independently. When the size of the queue increases, you can add more consumers to reduce the processing time.

Choose Your Database Wisely

According to Wikipedia:

A database is an organized collection of data stored and accessed via a computer system.

Databases are used for the persistent storage of data. We generally have two types of databases, relational and non-relational.

➔ Relational Database

A relational database has strict relationships between entries stored in the database and they are highly structured. This is to ensure data integrity. For example, adding a new field to the table when its schema doesn't allow for it will throw an error.

Another important feature of relational databases is ACID transactions.

ACID transactions

These are a set of features that describe any given transactions (a set of read or write operations) that a good relational database should support.

Atomicity means that when a transaction that comprises more than one operation takes place, the database must guarantee that if one operation fails the entire transaction fails. Either it happens completely or doesn't happen at all.

Consistency means that each transaction in a database does not violate the data integrity constraints whenever the database changes state and does not corrupt the data. In simple terms, consistency means for every "read" operation, you'll receive the most recent "write" operation results.

Isolation means that you can run multiple concurrent transactions on a database, without leading to any kind of inconsistency. All these multiple transactions will occur independently of each other.

Durability means that once the transaction has completed execution, the updated data remains stored in the database. It will be saved on a disk and will be persistent even if a system failure occurs.

➔ Non-Relational Databases

A non-relational database has a less rigid structure and may or may not have strict relationships between the entries stored in the database. The data typically is stored as key-value pairs. For example:

[
    { 
        firstName: "Apoorv",
        lastName: "Tyagi",
        gender: "M"
    },
    { 
        name: "Judit",
        rank: "Polgar",
        gender: "F"
    },
    {
      //...
    },
]

Similar to the ACID properties of relational databases, the non-relational database offers BASE properties:

Basically Available (BA) which states that the system guarantees availability even in the presence of multiple failures.

Soft State (S) means the state of the system may change over time, even without application interaction due to eventual consistency. In NoSQL, unlike RDBMS, it is believed that data consistency is the developer's responsibility and should not be handled by the database.

Eventual Consistency (E) means that the system will become consistent "eventually". However, there's no guarantee of when this will happen.

NoSQL vs SQL

Non-relational databases (also often referred to as NoSQL databases) might be a better choice if:

Your application requires low latency. Since there are no complex JOIN queries.
You have a large amount of unstructured data, or you do not have any relation among your data.

How to Scale a Database

Let's now look at the various ways you can scale your database:

Vertical vs horizontal database scaling

In vertical scaling, you scale by adding more power (CPU, RAM) to a single server.

In horizontal scaling, you scale by simply adding more servers to your pool of servers.

For low-scale applications, vertical scaling is a great option because of its simplicity. But vertical scaling has a hard limit. It is practically not possible to add unlimited RAM, CPU, and memory to a single server.

Because of this, it is recommended that you go for horizontal scaling (also known as sharding) for large-scale applications.

Database replication

This is the process of copying data from your central database to one or more databases.

You do database replication using primary-replica (formerly known as master-slave) architecture. The primary database generally only supports write operations. All the data modifying operations like insert or update will be sent to the primary database.

On the other hand, the replica databases get copies of the data from the primary database and only support read operations. All the data querying operations like read, fetch will be served by replica databases.

Advantages of database replication:

Performance Improvements: Database replication improves performance significantly as all the writes and updates happen in the primary node and all the read operations are distributed to replica nodes, thereby allowing more queries to run in parallel.
High Availability: Since we create replicas of data across different nodes available in different parts of the world, the application remains functional even if one database node goes offline as you can access data from other nodes. In case the failure occurs in the primary node, any one of the replica nodes will get promoted to a primary node and serve the write/update operations until the original primary node comes back online.

Wrapping Up

That's it. Thanks for stopping by. I hope you found this article interesting and informative!

My DMs are always open if you want to discuss further on any tech topic or if you've got any questions, suggestions, or feedback in general:

Happy learning! 💻 😄

Microservices Architecture – Explained in Plain English

freeCodeCamp — Wed, 23 Jun 2021 21:52:08 +0000

By Charles M.

Over the last few years, microservices have gone from an overhyped buzzword to something you should understand as a software engineer.

According to an O'Reilly developer survey in 2020:

61% of companies have been using microservices in the last year
29% say at least half of their company systems are built using microservices
74% say their teams own the build/test/deploy phases of their applications

These numbers will only continue to increase over time as the ecosystem around microservices matures and makes adoption even easier.

This doesn't mean you need to be an expert on microservices to get a job, but it is definitely a bonus to at least understand the basic fundamentals.

The truth is, microservices aren't that hard to understand when you boil it down to the basics. The biggest problem is that most of the resources available are written to impress readers instead of actually educating them.

Another reason is that there isn’t even a true concrete definition about what a microservice is. The result is that there are tons of overlapping definitions and jargon which leads to confusion for people trying to learn about microservices.

In this article I will cut through all the chaff and focus on the core concepts of what microservices actually are. I'll use a variety of real world examples and metaphors to make abstract concepts and ideas easier to understand.

Here's what we'll cover:

Brief history of software design
Benefits and downsides of monoliths
Benefits and downsides of microservices

4 Minute Microservice Summary

If you prefer a quick introduction to microservices, you can watch this video first:

How to Understand Microservices with an Analogy of Starting Your Own Business

Let's say you are a software engineer and decide to start freelancing to earn some money. At the beginning you have a few clients and things go smoothly. You spend most of your time writing code and clients are happy.

But over time you start to slow down as the business grows. You spend more and more of your time doing customer service, answering emails, making minor changes for past customers, and other tasks that don't move the needle for you in terms of revenue.

You realize that you aren't optimizing your time as a software engineer so you hire a dedicated employee to handle customer service.

As you continue to grow you add more employees with specialized skills. You hire a marketer to focus on attracting new customers. You add project managers, more software engineers, and eventually an HR department to help with all these employees.

This was all necessary for your business to grow beyond what you could do by yourself as a single person, but there are of course growing pains.

Sometimes there are miscommunications between teams or departments and clients get upset when details slip through the cracks. There is the direct cost of having to pay employee salaries, internal rivalries between teams, and numerous other issues that arise when a company grows larger.

This example is somewhat representative of how a software company might move from a monolith to a microservice type architecture. What starts out with one person doing all the work gradually becomes a collection of specialized teams working together to achieve a common goal for the company.

This is very similar to how tech companies with monoliths have migrated to microservice architectures. While these examples aren't a perfect 1-1 match for microservices, the general problems are the same:

Scaling – Ideally you want to be able to quickly hire new employees and linearly scale how productive your company is.
Communication - Adding more employees adds the overhead of needing to coordinate and communicate across the organization. There are numerous strategies that companies try to use to make this efficient, especially in this era of remote work.
Specialization - Allowing certain groups in the organization to have autonomy to solve problems the most efficient way possible rather than trying to enforce a standard protocol for all situations. Certain customers might have different needs than others, so it makes sense to allow teams to have some flexibility in how they handle things.

How to Go from a Monolith to Microservices

To understand the present it helps to understand the past. Traditionally, software was designed in a monolithic style, and everything ran together as a single application. Like everything else in life, there are some pros and cons to this style of application.

Monoliths aren't inherently bad – and many microservice advocates actually recommend starting out with a monolith and sticking with it until you start running into problems. You can then break your monolith into microservices naturally over time.

Advantages of a Monolith Architecture

Faster development time initially

With a small team, development speed can be extremely fast when you're just starting off.

This is because the project is small, everybody understands the entire app, and things move smoothly. The members of the team know exactly how everything works together and can rapidly implement new features.

Simple deployment

Because monoliths work as a single unit, things like testing and logging are fairly simple. It's also easier to build and deploy a single monolith compared to a bunch of separate microservices.

Disadvantages of a Monolith Architecture

Despite the early benefits of monoliths, as companies grow they often encounter several problems on organizational and technical levels as a result of their monolithic application.

Tight-coupling of modules

Most companies with monolithic applications try to logically break the monolith into functional modules by use case to keep things organized. Think things like Authentication, Comments, Users, and Blog posts.

The problem is that this requires extreme engineering discipline to maintain long term. Established rules often get thrown out the window when a deadline approaches. This results in shortcuts being taken during a crunch and tangled interconnected code that accumulates as technical debt over time.

Real world example - Trying to stay disciplined with monoliths is a lot like sticking to an exercise routine or diet. Many people get excited and can stay disciplined with their diet for a few weeks, but eventually life gets in the way and you revert back to your normal routine.

Trying to enforce loose coupling with monoliths is like that – there's just too much temptation to cut corners when you get in a time crunch.

Onboarding new hires becomes hard

For new hires, becoming productive often takes much longer because they need to learn how all the interconnected pieces of the monolith work together before they can risk modifying any single part of the application.

It's not unheard of for new hires to say it takes months for them to truly feel comfortable with a massive code base. And there's always the underlying fear that any time you push new code it might blow up the entire app.

Real world comparison - Training somebody to do a single task like hammer nails vs training somebody to do every single possible task on a construction site.

Having to teach a new hire absolutely everything about the entire job increases the cost of hiring new employees.

Conflicting resource requirements

In a monolith, different modules might have different hardware requirements. Some tasks might be CPU-heavy calculations, others might require a lot of RAM.

But because the entire application has to run on the same server, you can't use the type of hardware specialized for a certain task.

Real world example - Certain types of vehicles are better suited for certain tasks.

If you are going on a road trip, a car with great fuel economy would be the best choice so you save money on gas. If you are moving into a new apartment, it would be good to have a vehicle with more space for storage so you don't have to make as many trips.

A single bug can take down the entire app

Because the application is deployed as a single unit, that means that any team can accidentally create a bug that takes down the entire monolith.

Real world example - To prevent a single leak from sinking an entire ship, bulkheads are used to seal off sections if they start to flood.

Microservices work in a similar way – each service is deployed independently from others, which can reduce the chances of a bug taking down the entire app.

Limits experimentation

When building a monolith, you are pretty much stuck using the ecosystem of the programming language the monolith was written in. A simple example would be the tradeoffs of low level programming languages and higher level programming languages.

With a microservice architecture, if a certain service is struggling to scale, you have the option to rewrite it in a higher performance language like C++ or Go.

For other services where performance isn't a huge factor, you can improve development speed by using higher level languages like Python or JavaScript.

A monolith architecture can also blind a team from seeing alternative ways to solve a problem. When you only have a hammer, everything looks like a nail.

Real world comparison - Pizza is great, but you probably wouldn't want to eat pizza every meal for the rest of your life.

Plus in some situations it would also just be inconvenient to cook and eat pizza rather than something else. Sometimes it would be nice to just grab a quick snack or eat something a little healthier.

Deployments can become slow

One of the strengths of the monolith listed above can eventually become a weakness. The fact the entire app is deployed together can become a problem for massive monoliths because it can result in taking a long time to deploy the entire service. This reduces how fast a team can iterate and make changes to the app.

Each time they make even a minor change they are forced to wait for the app to build and deploy for testing.

Real world example- Your dream is to make the world's best cookies. The fastest way to accomplish this goal would be to test as many batches of cookies as possible while gradually changing and improving the recipe until it was perfect.

Now imagine you only have 1 oven. The rate at which you can test out different cookie recipes is much slower compared to having 10 ovens.

Advantages of Microservices

So now that you know the pros and cons of the monolith architecture style, let's examine microservices.

Development Speed Improves

Because you are no longer deploying a monolith, teams are able to move faster when it comes to adding features. Teams can have independent release schedules and don't have to worry about coordinating with other teams as much.

As long as the external interface that other microservices use to interact with the team's service stays the same, a development team could completely rewrite the system in another programming language if they wanted.

Another benefit of each service being deployed independently is that builds are faster due to each build being smaller. This means that iteration time is also improved just due to builds being faster.

Real world example - When you buy food from a restaurant you don't really care if anything changed behind the scenes as long as the food tastes good.

Maybe they got new ovens or fryers, but as long as the food tastes the same you don't worry about it. As an external consumer the only thing that matters is the end product.

Faster onboarding for new hires

New employees can learn a single system to start and begin contributing. Over time they can continue learning more about the entire application but that isn't necessary right away.

Real world example - The assembly line revolutionized production by breaking things down. Instead of each employee having to know how to create an entire product from scratch, they just needed to learn the single part they worked on. This cut down on training time for new employees and allowed better scale.

Fault Tolerance

While microservices often do depend on each other to complete tasks, a properly designed microservice architecture will have built-in redundancy and fail safes to prevent failure of the entire system if another service goes down.

Often this involves retrying requests with an increasing wait period between requests or a default fallback value to return if the service isn't available.

Real world example - If Netflix's recommendation service breaks, it doesn't make sense to return a complete failure message to users.

Instead Netflix could just return a default set of popular movies and in the background keep retrying the recommendation service until it is able to return the user's customized recommendations.

Flexible Scalability

Because each service is deployed independently you can also replicate and scale each service on its own. With a monolith the company would be forced to scale the entire application, despite only a single feature getting more traffic than usual.

With a microservice architecture, a company can specifically scale only the service that needs to handle more traffic, which is more efficient and can save money because it reduces wasted resources.

Real world example- Let's consider something like Cyber Monday for Amazon, way more orders than usual will be placed but most people probably already selected what they wanted and put it in their cart.

So while the Orders Service will be getting way more traffic than usual, things like the Search Service and other features might be around normal usage rates.

This is especially useful if a service is particularly heavy for a certain resource and can use specialized hardware for that task.

If a service needs a ton of CPU resources but not much RAM, the company can save money by not using general purpose servers. A company using a pure monolith has no choice but to scale using "jack of all trades" type servers.

Disadvantages of Microservices

Microservices are far from perfect. Shifting from monolith to microservices eliminates some problems while creating new ones.

Overall Complexity

While each individual service is easier to understand, the entire system itself is complicated. This additional complexity led to the rise of tools like Docker and Kubernetes to abstract away as much of it as possible.

The goal of these tools is to allow software engineers to not worry about anything other than building features like they normally would, without worrying about how it all works behind the scenes.

Communication

One of the biggest issues with microservices is figuring out how they communicate with each other.

A single external request from a user might require several services working together to fulfill that request. Let's use placing an order online as an example of how this might work:

User places order in app
Load balancer forwards request to services that are available to process the request
Shopping cart service gives list of items in the order
Inventory service confirms that items are in stock
Shipping service calculates estimated cost and delivery time
Payment service confirms that customer's payment is valid
Recommendation service uses items ordered to generate new recommendations for the customer in the future
Review service schedules an email to ask the customer to leave a review

At any of the above stages a single service failing could result in the entire order process failing or annoyance for the user, which would quickly make for some angry customers.

Handling how all these services interact and deal with partial failures is a huge challenge with microservice architectures.

Handling Data

One of the most difficult challenges with microservices is how to handle requests that span multiple services and require making updates to data.

What happens if a request fails part way through the sequence with data updated in one service but not the rest? You don't want to bill a user but then have them not receive what they paid for because the service was down.

In a monolith you can rely on ACID transactions to rollback a database change if something goes wrong. With microservices there is much more complexity involved with what are known as distributed transactions across services.

Development environment

Most tools were designed with monoliths in mind and development in general becomes more difficult with a microservice architecture.

Testing requires being able to simulate interactions with other services, Debugging is more difficult because things are no longer happening inside a single process, and logging must be done across multiple services.

Even something simple like trying to track why a blog is loading slowly is more difficult than you might expect.

Let's say you notice on your analytics that all of a sudden it's taking 5 seconds for pages to load on your blog. With a monolith it would be pretty easy to track down the problem, but with a microservice architecture you need specialized tools to track external requests as they are processed by different services.

Conclusion

Hopefully this article gave you a decent understanding of the what and why of microservices and an intuitive understanding of how they work, even if you don't understand all the technical details under the hood.

If you are interested in seeing future videos and articles on microservices be sure to subscribe on YouTube or follow on Twitter so you don't miss anything.

System Design Interview Tutorial – The Beginner's Guide to System Design

freeCodeCamp — Mon, 14 Jun 2021 22:46:31 +0000

By Charles M.

System Design is an important topic to understand if you want to advance further in your career as a software engineer. Even if you are just beginning your coding journey, it's a good idea to get a head start on learning about system design.

Early in your career you will mostly just be tested on your coding ability. In higher level interviews, however, there will often be a greater focus on testing your ability and experience at designing applications.

The biggest struggle engineers have with system design interviews is that they are more open-ended and there isn't any single correct answer. This lack of structure can be intimidating, so my goal with this article is to give you a roadmap for navigating these types of interviews with confidence.

What this article will cover:

What is a system design interview and why they are used
The main stages of a system design interview
Example interview problem – Design YouTube

Video Tutorial

You can also watch this tutorial on YouTube if you like:

And I've created a playlist of videos on specific topics related to system design and web architecture:

System Design Interview Overview

At first glance it seems silly to ask somebody to design a huge app like Twitter or YouTube in 45-60 minutes. These apps were designed over a period of years by hundreds of engineers working together, so it's clearly an impossible task to do in a short interview.

There are two main reasons why companies use these types of interviews. The first is, of course, to test your knowledge about the technologies being discussed. They want you to go deep enough to make sure you aren't just throwing buzzwords around without understanding how things actually work.

The second reason might be more important, though. The system design interview is a way to simulate a realistic scenario where you are working together with the interviewer to determine the best design decision.

Getting the perfect answer isn't necessarily the most important thing here – it's some of the other things you can show, like:

How do you handle being challenged? Do you get defensive or take feedback with a positive attitude? Are you stubborn or narrow-minded?
Do you show knowledge of the various tradeoffs certain design decisions involve? There's a big difference between blindly making a decision and not realizing the consequences, and knowing the pros/cons and accepting the tradeoffs.
Are you able to effectively communicate and if necessary explain complex technical concepts in an easy to understand way?
Are you candidate somebody the interviewer would want to work with long term? Even if somebody is a genius, if they are miserable to work with they might not be a good hire.

Stages of a System Design Interview

In this section you'll learn a general framework for structuring how to handle a problem during a system design interview.

Clarify the problem and establish design scope

The first thing you'll want to do after your interviewer gives you the problem is to take a few minutes to ask some clarifying questions and figure out what exactly they are looking for.

The worst thing you could do here is just start off in the completely wrong direction because you didn't take the time to ask a few questions. You have a limited amount of time during the interview, so you want to make sure you focus on what's important.

Here are some examples of questions you might ask:

What are the use cases / features of the app?

In this article we will be using YouTube as an example. There are hundreds of different features you could design like ad delivery, authentication, recommendation algorithms, comments, video upload, video processing, and many others.

During an interview you only have time to cover a few of those, so make sure to ask the interviewer questions to figure out what they want you to focus on designing.

How many users are expected / what is the likely traffic volume?

The complexity of the system will depend on the amount of traffic it needs to handle, so make sure to gather this information.

You don't want to over-engineer things if the traffic is relatively low and you also don't want to get stuck with an app that can't scale because you didn't design it properly.

Ask questions like how many users the app will have, the average amount of data per request, how long data needs to be stored, and how reliable and available does the system need to be?

This step is going to help you beyond just getting more information to work with. You're also showing the interviewer that you understand how to gather information about a vague problem.

Determine Rough Capacity Estimates

Using the information you gathered during the first step, you can begin to make some rough estimates and generalizations for things like storage and bandwidth requirements.

This process will involve some basic math like multiplying the number of users by the average request size and the amount of requests each user is expected to make daily.

Create a High Level Design

Here you want to create a rough architecture for the system. Draw out things like load balancers, web servers, app servers, task queues, database, caching, file storage, and so on. You should include all the core components you need to create the system.

Make sure to communicate with the interviewer during this stage and check to ensure that you aren't missing anything. While they probably won't tell you directly, they will give you a nudge in the right direction if you forgot about some crucial feature.

API Design

This part is almost cheating because you are using the structure of the interview to your advantage to confirm that you are on the right path.

The interviewer is never going to deliberately lead you down the wrong path, so once you've created your high level design you can start sketching out some rough API endpoints for each component.

For the YouTube example they might look something like this, depending on which features you are building:

uploadVideo (userID, video, description, title)
comment (userID, videoID, comment)
viewVideo (videoID)
videoSearch (query)

In some cases you might not need to drill down to this level. If the interview question is very high level like "design Youtube", you can probably skip this part. On the other hand if you get a more focused question like "design YouTube's comment system", it would make sense to go more in depth.

Create a Data Schema

At this point you should have a good idea of all the requirements and data needed for the application to work, so now you can plan out how your data is structured.

Depending on what you are building and the requirements, you'll need to weigh the costs and benefits of things like using a relational vs non-relational database. When modeling your data you'll also want to account for things like potential data partitioning and replication.

Take a Detailed Look at the Components

What happens during this section will mainly depend on the feedback of the interviewer. They will probably pick out a few specific components to focus on and ask why you made certain decisions.

The most important part here isn't necessarily being 100% right. Instead, it's to show that you didn't just blindly make decisions and understand exactly what tradeoffs you were making.

You should be able to propose alternate design decisions that could have been used and explain why you didn't use them.

How to Design YouTube

Now that you have a general idea of how a system design interview works and a framework for handling a system design problem, I'm going to show you how to put it all into practice using YouTube as an example.

Step 1 – Define Problem Scope and Requirements

This will be a high level problem where we implement a few of YouTube's major features without diving too in-depth on any of them. The features to focus on will be:

Users can upload videos
Users can view videos
Users can comment on videos

Step 2 – Determine Capacity estimates

The two biggest capacity factors in an app handling large amounts of video like YouTube will be storing all that content and bandwidth requirements to deliver the content to users. In this section you'll learn how to make rough estimates for capacity requirements.

The main focus here is not on being highly accurate, but showing a logical thought process for calculating these numbers based on the information available to you.

In an interview you would be given the data, but in this case I'm using two key pieces of data that YouTube has made public:

YouTube creators upload 500 hours of video every minute
YouTube users watch 1 billion hours of video per day

You can use these numbers to calculate storage and bandwidth requirements with a few assumptions.

Bandwidth Calculation

Daily bandwidth calculation

To calculate an estimate for bandwidth, we start with the amount of video watched daily. The key assumption here is how much bandwidth is used per hour watched, as this would depend on the quality of video most users choose to watch.

The 3 Gigabyte estimate is based on a rough percentage of users watching in standard definition and others choosing HD or 4K, which consume much more bandwidth per hour watched.

The math here is fairly simple: multiply 1 billion hours by the average bandwidth of an hour of video, then divide that by 1000 to convert to terabytes, then divide by 1000 again to get to Petabytes. The final bandwidth estimate is 3,000 PB used daily.

Storage Calculation

Step by step calculations for storage

Based on a few assumptions we can calculate that YouTube will need to store around 2.16 Petabytes of new video every day. Here's how we get that number:

Convert 500 hours to 30,000 minutes of video uploaded per minute
Each minute of HD video is roughly 50 Megabytes due to having copies of each video in multiple formats. We multiply that by 30,000 minutes and then divide by 1000 to convert to Gigabytes.
We then take the 1,500GB uploaded per minute and multiply by 60 then 24 to calculate the daily amount of video uploaded. We divide by 1000 again to convert Gigabytes to Terabytes
Our final total is 2,160 Terabytes uploaded daily or 2.16 Petabytes

Step 3 – Database Design

For our database we will use a standard relational database like MySQL. The schema will look something like this:

This design is very simple but has the essentials that you'd need for a basic implementation. It would be a good idea to do some research into the differences between relational and non-relational databases so you understand what kind of situations each excel at and when to use them.

For certain apps with different requirements a NoSQL database might make sense. Often large systems will have many different services that use different types of databases depending on their needs.

Step 4 – High Level Design

That's a pretty complex diagram, so let me break down what's happening:

Client – This could be a user on a mobile app or their computer trying to upload a video, make a comment, or watch a video
CDN – A content distribution network is used to reduce latency and improve reliability when it comes to delivering static content like videos or images. A CDN works by storing content in data centers all around the world so that the content is closer to users. This results in reduced latency because requests travel a shorter distance. There's also an added benefit of content being stored in multiple locations so even if one location can't serve traffic for some reason, another location can.
Load Balancers – A load balancer accepts requests and routes them to servers depending on a number of factors. At YouTube's scale, a single server can't handle all the traffic and you want replication to prevent a single point of failure. The load balancer can check the status of servers and verify they can handle traffic or choose another server that can handle the request.
Services – You can think of this as the app layer of the system. Instead of using a single monolith to handle traffic, we'll use several microservices to handle specific tasks. The second box for each of these services in the diagram represents multiple servers running for each of them to increase reliability. If one replica of the service goes down, there's always another to step in and handle traffic.
Data Stores – When using microservices it is generally best practice for each microservice to own its own data. If one service needs data from another they can access it through an API.
Video Upload Process – Handling the video uploads will involve multiple steps, as trying to handle it synchronously with the app server would be fragile and reduce performance. I'll cover this more in depth in the next section

I don't want to go too in-depth on these individual components because I could write entire articles on any of them if I wanted to explain them fully.

If you are interested in a more detailed explanation you can check out the system design playlist I linked to above which has videos covering most of these concepts.

Step 5 – Go Over Specific Components and Details

At this stage you have a working design. Now let's look at some of the specific components in detail.

Video Upload

Video content is the lifeblood of YouTube, and it doesn't exist without it. This means that making it quick and easy for users to upload videos is probably the most important feature.

Imagine uploading a multi-gigabyte video to YouTube and then seeing the upload fail after 30 minutes when it's 95% done. To prevent this you'll want to support the ability for resuming uploads if the client's connection is lost temporarily. The uploaded video can then be stored with a distributed file system like HDFS.

Once the upload is complete there's still a lot more to do before the video is ready for users to access. The video needs to be encoded into multiple different quality formats, you need to generate thumbnails, and push copies of the video to the global CDN.

Again, at any stage one of these processes could fail. To prevent this you'll have a task queue to manage this process and retry the processing attempt if it fails at any stage.

Database Scaling

The database is often the bottleneck of an application. You will probably be tested on whether you understand some of the fundamental concepts around database scaling. This could include caching to handle read requests, sharding, and replication.

Conclusion

Hopefully this article has given you a better understanding of what to expect during a system design interview.

This article mainly focused on the structure of the interview itself rather than the concepts you need to understand to answer the questions given during the interview.

Two great resources for beginning to learn about that are:

A great article posted here on Free Code Camp News: https://www.freecodecamp.org/news/systems-design-for-interviews/

The system design primer repo on GitHub: https://github.com/donnemartin/system-design-primer

Both cover just about every major concept you need to know for your system design interview and should put you in a great position for success.

How to make Wordpress more exciting with the Wordpress API, ACF, & Express.js

freeCodeCamp — Thu, 14 Jun 2018 00:33:56 +0000

By Tyler Jackson

I’ve been working with Wordpress since it’s proliferation as a content management system. I hardly get excited when clients or co-workers mention it anymore. I’ve “found the light” in more robust frameworks and learned much more about the different parts of custom web applications.

So, in an effort to rejuvenate my passion for Wordpress, I’ve started looking at different ways to implement the framework. One of those ways is to separate the front-end from the back-end and avoid some of the pain points of using the Wordpress Template Tags and theming system. Let’s take a look.

Monolithic vs. Distributed Apps

Wordpress is a monolithic framework, meaning the different parts of the framework (database, file storage, presentation structure & asset files, business logic files) are all packaged together. This is a large part of why Wordpress is so easy to get up and running. Install MAMP, copy over the latest Wordpress files, create a database, and change the wp-config.php file. Good to go.

We are going to go against the monolithic convention and break this Wordpress site up into two different parts: front-end and back-end, presentation and administration.

We are going to use Wordpress for the data administration of our app and leverage a plugin to help with the creation and management of attributes (fields) for our custom post type. For the presentation side of things, we are going to forego a theme entirely and consume API endpoints from an Express.js application.

Example

In this example, we are going to build a simple product listing. The idea is that you already have a website powered by Wordpress, and would like to manage a list of products for sale through the same interface. But you want to create a completely different website for the store.

Wordpress API

Since version 4.7, Wordpress is automatically exposing your published posts (and other data) via its REST API, presented in a JSON format. If you’ve developed a website using Wordpress 4.7+, simply add /wp-json to the root URL and marvel at the wall of text that’s returned.

With this API automatically integrated into the Wordpress installation, a lot of the work of a distributed application is already done for us. API creation can be a roadblock when getting started with this new way of thinking about applications. Wordpress has created a fantastic, basic API for consuming our data any way we prefer.

At this point, I would only be cluttering the internet by writing a tutorial on how to locally install Wordpress. So instead, I’m going to point you towards a trusted source on the subject.

No matter what path you take to get a Wordpress instance up and running, you should be able to access it via http://localhost or some other URL. Once we have a URL, let’s do a quick test to make sure we have data coming back. I prefer a tool like Postman, but we’ll keep it simple and visit the following URL in our browser (changing the URL accordingly, of course).

http://localhost/mysite/wp-json

This should return a list of all the available endpoints for your Wordpress installation’s REST API.

But for real, Postman…

Postman
_Postman is the only complete API development environment, for API developers, used by more than 5 million developers…_www.getpostman.com

Custom Post Types

Since Wordpress limits us to two data types (Posts & Pages) we are going to need to create a custom post type for our Products. This will create a clear separation from the Product posts and any other posts we have.

There are a number of different ways to create custom post types. Here, we are going to create a single file Wordpress Plugin to register the Products post type.

/*Plugin Name: Product Custom Post Type*/

function create_product_cpt() {  $labels = array(   'name' => __( 'Products', 'Post Type General Name', 'products' ),   'singular_name' => __( 'Product', 'Post Type Singular Name', 'products' ),   'menu_name' => __( 'Products', 'products' ),   'name_admin_bar' => __( 'Product', 'products' ),   'archives' => __( 'Product Archives', 'products' ),   'attributes' => __( 'Product Attributes', 'products' ),   'parent_item_colon' => __( 'Parent Product:', 'products' ),   'all_items' => __( 'All Products', 'products' ),   'add_new_item' => __( 'Add New Product', 'products' ),   'add_new' => __( 'Add New', 'products' ),   'new_item' => __( 'New Product', 'products' ),   'edit_item' => __( 'Edit Product', 'products' ),   'update_item' => __( 'Update Product', 'products' ),   'view_item' => __( 'View Product', 'products' ),   'view_items' => __( 'View Products', 'products' ),   'search_items' => __( 'Search Product', 'products' ),   'not_found' => __( 'Not found', 'products' ),   'not_found_in_trash' => __( 'Not found in Trash', 'products' ),   'featured_image' => __( 'Featured Image', 'products' ),   'set_featured_image' => __( 'Set featured image', 'products' ),   'remove_featured_image' => __( 'Remove featured image', 'products' ),   'use_featured_image' => __( 'Use as featured image', 'products' ),   'insert_into_item' => __( 'Insert into Product', 'products' ),   'uploaded_to_this_item' => __( 'Uploaded to this Product', 'products' ),   'items_list' => __( 'Products list', 'products' ),   'items_list_navigation' => __( 'Products list navigation', 'products' ),   'filter_items_list' => __( 'Filter Products list', 'products' ),  );

  $args = array(   'label' => __( 'Product', 'products' ),   'description' => __( '', 'products' ),   'labels' => $labels,   'menu_icon' => 'dashicons-products',   'supports' => array('title', 'editor', 'excerpt', 'thumbnail'),   'taxonomies' => array('products'),   'public' => true,   'show_ui' => true,   'show_in_menu' => true,   'menu_position' => 5,   'show_in_admin_bar' => true,   'show_in_nav_menus' => true,   'can_export' => true,   'has_archive' => true,   'hierarchical' => false,   'exclude_from_search' => false,   'show_in_rest' => true,   'rest_base' => 'products',   'publicly_queryable' => true,   'capability_type' => 'post',  );

  register_post_type( "product", $args );}%>

While long-winded, this is pretty standard code for creating a custom post type in Wordpress. Two things to note in our $args array:

'show_in_rest' => true makes the custom post type accessible via the REST API
'rest_base' => 'products' sets the name we use to access Products via the REST API endpoints

Once you have your custom post type showing in the Wordpress admin, let’s make sure we can get a response via the API (you’ll need to create a product so it doesn’t return empty).

http://localhost/mysite/wp-json/wp/v2/products

And here’s what we get…

Sweet!

Advanced Custom Fields

I try to limit my usage of plugins as much as possible, but I’ll make an exception for Advanced Custom Fields (ACF). ACF takes all the work out of creating and managing custom fields for post types. Head to your Plugins page, search for Advanced Custom Fields then click “Install” & “Activate”. All done.

It would also be redundant for me to walk you through creating a Field Group using Advanced Custom Fields, so I’ll let their documentation walk you through it if you don’t know how.

Let’s create a Field Group called “Product Meta” and add fields for “Normal Price”, “Discount Price” and “Inventory Quantity” and position them in the sidebar area.

Good.

Now comes the tricky part. The fields we just created through ACF aren’t exposed via the REST API by default. We will have to leverage add_filter and rest_prepare_{$post_type} to add the custom field values to the JSON response. Here, I’ve simply added this bit of code to the bottom of our custom post type plugin file for the sake of brevity.

function my_rest_prepare_post($data, $post, $request) {  $_data = $data->data;    $fields = get_fields($post->ID);

  foreach ($fields as $key => $value){    $_data[$key] = get_field($key, $post->ID);  }

  $data->data = $_data;    return $data;}

add_filter("rest_prepare_product", 'my_rest_prepare_post', 10, 3);

Thanks to Cody Sand for the tidbit above.

Express.js

Our Express.js app will provide us a framework for consuming the Wordpress API and presenting products in our store website. Since we are simply consuming an API, we could use any framework of our choosing. Vue.js. Angular. Backbone. React. Rails. Django. Middleman. Jekyll. The front-end world is your oyster.

I’ll assume you already have Node.js installed. If you don’t, it’s dead simple. Let’s start a new Express.js app.

npm install -g express-generator nodemonexpress --css=sass --view=jade --git mystorecd mystorenpm install --save request request-promise && npm install

Here, we are using the Express Generator package to generate a skeleton for our Express app. We’ll also be using SASS for stylesheets and Jade Template Engine. Choose whatever you’re comfortable with. Nodemon will restart our app automatically for us when a file changes, and the Request library will help us make HTTP requests to the Wordpress API. Let’s serve up our Express app:

nodemon

Now, when we pull up http://localhost:3000 we should see our Express app running.

Alright, now let’s pull in our products.

var express = require('express');var router = express.Router();const rp = require('request-promise');

/* GET index page. */router.get('/', function(req, res, next) {  rp({uri: 'http://127.0.0.1:8888/test/wp-json/wp/v2/products', json: true})  .then((response) => {    console.log(response);    res.render('index', {products: response});  })  .catch((err) => {    console.log(err);  });});

module.exports = router;

In our index.js route file, let’s include the Request-Promise library then make a call to the products endpoint within our root route (/).

If the request is successful, then we render our index view. If there’s an error with the request, we simply log it. Now, the view…

extends layout

block content h1 MyStore ul  each product in products   li    product.title.rendered    product.price

Using Jade, we will simply list the products out. Ok, let’s check out our store site.

? There’s your prize. I’ll leave it up to you to continue down the Express road and figure out how to get product listing and index pages working.

Beyond

This is a fairly simple example of how distributed apps work using Wordpress. We could have continued to separate the app into even more parts by integrating a CDN for media storage or moving the database to a separate server. We also didn’t cover authentication for the Wordpress API which is something you would absolutely need in production.

From here, you could implement Stripe or another payment processor and have a fully functional store site. I hope this has inspired some of you to leverage Wordpress in different ways and continue using one of the most ubiquitous CMS solutions out there. Happy coding!

A Thorough Introduction to Distributed Systems

freeCodeCamp — Fri, 27 Apr 2018 18:44:13 +0000

By Stanislav Kozlovski

What is a Distributed System and why is it so complicated?

With the ever-growing technological expansion of the world, distributed systems are becoming more and more widespread. They are a vast and complex field of study in computer science.

This article aims to introduce you to distributed systems in a basic manner, showing you a glimpse of the different categories of such systems while not diving deep into the details.

What is a distributed system?

A distributed system in its most simplest definition is a group of computers working together as to appear as a single computer to the end-user.

These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

I propose we incrementally work through an example of distributing a system so that you can get a better sense of it all:

A traditional stack

Let’s go with a database! Traditional databases are stored on the filesystem of one single machine, whenever you want to fetch/insert information in it — you talk to that machine directly.

For us to distribute this database system, we’d need to have this database run on multiple machines at the same time. The user must be able to talk to whichever machine he chooses and should not be able to tell that he is not talking to a single machine — if he inserts a record into node#1, node #3 must be able to return that record.

An architecture that can be considered distributed

Why distribute a system?

Systems are always distributed by necessity. The truth of the matter is — managing distributed systems is a complex topic chock-full of pitfalls and landmines. It is a headache to deploy, maintain and debug distributed systems, so why go there at all?

What a distributed system enables you to do is scale horizontally. Going back to our previous example of the single database server, the only way to handle more traffic would be to upgrade the hardware the database is running on. This is called scaling vertically.

Scaling vertically is all well and good while you can, but after a certain point you will see that even the best hardware is not sufficient for enough traffic, not to mention impractical to host.

Scaling horizontally simply means adding more computers rather than upgrading the hardware of a single one.

Horizontal scaling becomes much cheaper after a certain threshold

It is significantly cheaper than vertical scaling after a certain threshold but that is not its main case for preference.

Vertical scaling can only bump your performance up to the latest hardware’s capabilities. These capabilities prove to be insufficient for technological companies with moderate to big workloads.

The best thing about horizontal scaling is that you have no cap on how much you can scale — whenever performance degrades you simply add another machine, up to infinity potentially.

Easy scaling is not the only benefit you get from distributed systems. Fault tolerance and low latency are also equally as important.

Fault Tolerance — a cluster of ten machines across two data centers is inherently more fault-tolerant than a single machine. Even if one data center catches on fire, your application would still work.

Low Latency — The time for a network packet to travel the world is physically bounded by the speed of light. For example, the shortest possible time for a request‘s round-trip time (that is, go back and forth) in a fiber-optic cable between New York to Sydney is 160ms. Distributed systems allow you to have a node in both cities, allowing traffic to hit the node that is closest to it.

For a distributed system to work, though, you need the software running on those machines to be specifically designed for running on multiple computers at the same time and handling the problems that come along with it. This turns out to be no easy feat.

Scaling our database

Imagine that our web application got insanely popular. Imagine also that our database started getting twice as much queries per second as it can handle. Your application would immediately start to decline in performance and this would get noticed by your users.

Let’s work together and make our database scale to meet our high demands.

In a typical web application you normally read information much more frequently than you insert new information or modify old one.

There is a way to increase read performance and that is by the so-called Primary-Replica Replication strategy. Here, you create two new database servers which sync up with the main one. The catch is that you can only read from these new instances.

Whenever you insert or modify information — you talk to the primary database. It, in turn, asynchronously informs the replicas of the change and they save it as well.

Congratulations, you can now execute 3x as much read queries! Isn’t this great?

Pitfall

Gotcha! We immediately lost the C in our relational database’s ACID guarantees, which stands for Consistency.

You see, there now exists a possibility in which we insert a new record into the database, immediately afterwards issue a read query for it and get nothing back, as if it didn’t exist!

Propagating the new information from the primary to the replica does not happen instantaneously. There actually exists a time window in which you can fetch stale information. If this were not the case, your write performance would suffer, as it would have to synchronously wait for the data to be propagated.

Distributed systems come with a handful of trade-offs. This particular issue is one you will have to live with if you want to adequately scale.

Continuing to Scale

Using the replica database approach, we can horizontally scale our read traffic up to some extent. That’s great but we’ve hit a wall in regards to our write traffic — it’s still all in one server!

We’re not left with much options here. We simply need to split our write traffic into multiple servers as one is not able to handle it.

One way is to go with a multi-primary replication strategy. There, instead of replicas that you can only read from, you have multiple primary nodes which support reads and writes. Unfortunately, this gets complicated real quick as you now have the ability to create conflicts (e.g insert two records with same ID).

Let’s go with another technique called sharding (also called partitioning).

With sharding you split your server into multiple smaller servers, called shards. These shards all hold different records — you create a rule as to what kind of records go into which shard. It is very important to create the rule such that the data gets spread in an uniform way.

A possible approach to this is to define ranges according to some information about a record (e.g users with name A-D).

This sharding key should be chosen very carefully, as the load is not always equal based on arbitrary columns. (e.g more people have a name starting with C rather than Z). A single shard that receives more requests than others is called a hot spot and must be avoided. Once split up, re-sharding data becomes incredibly expensive and can cause significant downtime, as was the case with FourSquare’s infamous 11 hour outage.

To keep our example simple, assume our client (the Rails app) knows which database to use for each record. It is also worth noting that there are many strategies for sharding and this is a simple example to illustrate the concept.

We have won quite a lot right now — we can increase our write traffic N times where N is the number of shards. This practically gives us almost no limit — imagine how finely-grained we can get with this partitioning.

Pitfall

Everything in Software Engineering is more or less a trade-off and this is no exception. Sharding is no simple feat and is best avoided until really needed.

We have now made queries by keys other than the partitioned key incredibly inefficient (they need to go through all of the shards). SQL JOIN queries are even worse and complex ones become practically unusable.

Decentralized vs Distributed

Before we go any further I’d like to make a distinction between the two terms.

Even though the words sound similar and can be concluded to mean the same logically, their difference makes a significant technological and political impact.

Decentralized is still distributed in the technical sense, but the whole decentralized systems is not owned by one actor. No one company can own a decentralized system, otherwise it wouldn’t be decentralized anymore.

This means that most systems we will go over today can be thought of as distributed centralized systems — and that is what they’re made to be.

If you think about it — it is harder to create a decentralized system because then you need to handle the case where some of the participants are malicious. This is not the case with normal distributed systems, as you know you own all the nodes.

Note: This definition has been debated a lot and can be confused with others (peer-to-peer, federated). In early literature, it’s been defined differently as well. Regardless, what I gave you as a definition is what I feel is the most widely used now that blockchain and cryptocurrencies popularized the term.

Distributed System Categories

We are now going to go through a couple of distributed system categories and list their largest publicly-known production usage. Bear in mind that most such numbers shown are outdated and are most probably significantly bigger as of the time you are reading this.

Distributed Data Stores

Distributed Data Stores are most widely used and recognized as Distributed Databases. Most distributed databases are NoSQL non-relational databases, limited to key-value semantics. They provide incredible performance and scalability at the cost of consistency or availability.

Known Scale — Apple is known to use 75,000 Apache Cassandra nodes storing over 10 petabytes of data, back in 2015

We cannot go into discussions of distributed data stores without first introducing the CAP Theorem.

CAP Theorem

Proven way back in 2002, the CAP theorem states that a distributed data store cannot simultaneously be consistent, available and partition tolerant.

Choose 2 out of 3 (But not Consistency and Availability)

Some quick definitions:

Consistency — What you read and write sequentially is what is expected (remember the gotcha with the database replication a few paragraphs ago?)
Availability — the whole system does not die — every non-failing node always returns a response.
Partition Tolerant — The system continues to function and uphold its consistency/availability guarantees in spite of network partitions

In reality, partition tolerance must be a given for any distributed data store. As mentioned in many places, one of which this great article, you cannot have consistency and availability without partition tolerance.

Think about it: if you have two nodes which accept information and their connection dies — how are they both going to be available and simultaneously provide you with consistency? They have no way of knowing what the other node is doing and as such have can either become offline (unavailable) or work with stale information (inconsistent).

What do we do?

In the end you’re left to choose if you want your system to be strongly consistent or highly available under a network partition.

Practice shows that most applications value availability more. You do not necessarily always need strong consistency. Even then, that trade-off is not necessarily made because you need the 100% availability guarantee, but rather because network latency can be an issue when having to synchronize machines to achieve strong consistency. These and more factors make applications typically opt for solutions which offer high availability.

Such databases settle with the weakest consistency model — eventual consistency (strong vs eventual consistency explanation). This model guarantees that if no new updates are made to a given item, eventually all accesses to that item will return the latest updated value.

Those systems provide BASE properties (as opposed to traditional databases’ ACID)

Basically Available — The system always returns a response
Soft state — The system could change over time, even during times of no input (due to eventual consistency)
Eventual consistency — In the absence of input, the data will spread to every node sooner or later — thus becoming consistent

Examples of such available distributed databases — Cassandra, Riak, Voldemort

Of course, there are other data stores which prefer stronger consistency — HBase, Couchbase, Redis, Zookeeper

The CAP theorem is worthy of multiple articles on its own — some regarding how you can tweak a system’s CAP properties depending on how the client behaves and others on how it is not understood properly.

Cassandra

Cassandra, as mentioned above, is a distributed No-SQL database which prefers the AP properties out of the CAP, settling with eventual consistency. I must admit this may be a bit misleading, as Cassandra is highly configurable — you can make it provide strong consistency at the expense of availability as well, but that is not its common use case.

Cassandra uses consistent hashing to determine which nodes out of your cluster must manage the data you are passing in. You set a replication factor, which basically states to how many nodes you want to replicate your data.

Sample write

When reading, you will read from those nodes only.

Cassandra is massively scalable, providing absurdly high write throughput.

_Possibly biased diagram, showing writes per second benchmarks. [Taken from here.](https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks" rel="noopener" target="blank" title=")

Even though this diagram might be biased and it looks like it compares Cassandra to databases set to provide strong consistency (otherwise I can’t see why MongoDB would drop performance when upgraded from 4 to 8 nodes), this should still show what a properly set up Cassandra cluster is capable of.

Regardless, in the distributed systems trade-off which enables horizontal scaling and incredibly high throughput, Cassandra does not provide some fundamental features of ACID databases — namely, transactions.

Consensus

Database transactions are tricky to implement in distributed systems as they require each node to agree on the right action to take (abort or commit). This is known as consensus and it is a fundamental problem in distributed systems.

Reaching the type of agreement needed for the “transaction commit” problem is straightforward if the participating processes and the network are completely reliable. However, real systems are subject to a number of possible faults, such as process crashes, network partitioning, and lost, distorted, or duplicated messages.

This poses an issue — it has been proven impossible to guarantee that a correct consensus is reached within a bounded time frame on a non-reliable network.

In practice, though, there are algorithms that reach consensus on a non-reliable network pretty quickly. Cassandra actually provides lightweight transactions through the use of the Paxos algorithm for distributed consensus.

Distributed Computing

Distributed computing is the key to the influx of Big Data processing we’ve seen in recent years. It is the technique of splitting an enormous task (e.g aggregate 100 billion records), of which no single computer is capable of practically executing on its own, into many smaller tasks, each of which can fit into a single commodity machine. You split your huge task into many smaller ones, have them execute on many machines in parallel, aggregate the data appropriately and you have solved your initial problem. This approach again enables you to scale horizontally — when you have a bigger task, simply include more nodes in the calculation.

Known Scale — Folding@Home had 160k active machines in 2012

An early innovator in this space was Google, which by necessity of their large amounts of data had to invent a new paradigm for distributed computation — MapReduce. They published a paper on it in 2004 and the open source community later created Apache Hadoop based on it.

MapReduce

MapReduce can be simply defined as two steps — mapping the data and reducing it to something meaningful.

Let’s get at it with an example again:

Say we are Medium and we stored our enormous information in a secondary distributed database for warehousing purposes. We want to fetch data representing the number of claps issued each day throughout April 2017 (a year ago).

This example is kept as short, clear and simple as possible, but imagine we are working with loads of data (e.g analyzing billions of claps). We won’t be storing all of this information on one machine obviously and we won’t be analyzing all of this with one machine only. We also won’t be querying the production database but rather some “warehouse” database built specifically for low-priority offline jobs.

Each Map job is a separate node transforming as much data as it can. Each job traverses all of the data in the given storage node and maps it to a simple tuple of the date and the number one. Then, three intermediary steps (which nobody talks about) are done — Shuffle, Sort and Partition. They basically further arrange the data and delete it to the appropriate reduce job. As we’re dealing with big data, we have each Reduce job separated to work on a single date only.

This is a good paradigm and surprisingly enables you to do a lot with it — you can chain multiple MapReduce jobs for example.

Better Techniques

MapReduce is somewhat legacy nowadays and brings some problems with it. Because it works in batches (jobs) a problem arises where if your job fails — you need to restart the whole thing. A 2-hour job failing can really slow down your whole data processing pipeline and you do not want that in the very least, especially in peak hours.

Another issue is the time you wait until you receive results. In real-time analytic systems (which all have big data and thus use distributed computing) it is important to have your latest crunched data be as fresh as possible and certainly not from a few hours ago.

As such, other architectures have emerged that address these issues. Namely Lambda Architecture (mix of batch processing and stream processing) and Kappa Architecture (only stream processing). These advances in the field have brought new tools enabling them — Kafka Streams, Apache Spark, Apache Storm, Apache Samza.

Distributed File Systems

Distributed file systems can be thought of as distributed data stores. They’re the same thing as a concept — storing and accessing a large amount of data across a cluster of machines all appearing as one. They typically go hand in hand with Distributed Computing.

Known Scale — Yahoo is known for running HDFS on over 42,000 nodes for storage of 600 Petabytes of data, way back in 2011

The difference being that distributed file systems allow files to be accessed using the same interfaces and semantics as local files, not through a custom API like the Cassandra Query Language (CQL).

HDFS

Hadoop Distributed File System (HDFS) is the distributed file system used for distributed computing via the Hadoop framework. Boasting widespread adoption, it is used to store and replicate large files (GB or TB in size) across many machines.

Its architecture consists mainly of NameNodes and DataNodes. NameNodes are responsible for keeping metadata about the cluster, like which node contains which file blocks. They act as coordinators for the network by figuring out where best to store and replicate files, tracking the system’s health. DataNodes simply store files and execute commands like replicating a file, writing a new one and others.

Unsurprisingly, HDFS is best used with Hadoop for computation as it provides data awareness to the computation jobs. Said jobs then get ran on the nodes storing the data. This leverages data locality — optimizes computations and reduces the amount of traffic over the network.

IPFS

Interplanetary File System (IPFS) is an exciting new peer-to-peer protocol/network for a distributed file system. Leveraging Blockchain technology, it boasts a completely decentralized architecture with no single owner nor point of failure.

IPFS offers a naming system (similar to DNS) called IPNS and lets users easily access information. It stores file via historic versioning, similar to how Git does. This allows for accessing all of a file’s previous states.

It is still undergoing heavy development (v0.4 as of time of writing) but has already seen projects interested in building over it (FileCoin).

Distributed Messaging

Messaging systems provide a central place for storage and propagation of messages/events inside your overall system. They allow you to decouple your application logic from directly talking with your other systems.

Known Scale — LinkedIn’s Kafka cluster processed 1 trillion messages a day with peaks of 4.5 millions messages a second.

Simply put, a messaging platform works in the following way:

A message is broadcast from the application which potentially create it (called a producer), goes into the platform and is read by potentially multiple applications which are interested in it (called consumers).

If you need to save a certain event to a few places (e.g user creation to database, warehouse, email sending service and whatever else you can come up with) a messaging platform is the cleanest way to spread that message.

Consumers can either pull information out of the brokers (pull model) or have the brokers push information directly into the consumers (push model).

There are a couple of popular top-notch messaging platforms:

RabbitMQ — Message broker which allows you finer-grained control of message trajectories via routing rules and other easily configurable settings. Can be called a smart broker, as it has a lot of logic in it and tightly keeps track of messages that pass through it. Provides settings for both AP and CP from CAP. Uses a push model for notifying the consumers.

Kafka — Message broker (and all out platform) which is a bit lower level, as in it does not keep track of which messages have been read and does not allow for complex routing logic. This helps it achieve amazing performance. In my opinion, this is the biggest prospect in this space with active development from the open-source community and support from the Confluent team. Kafka arguably has the most widespread use from top tech companies. I wrote a thorough introduction to this, where I go into detail about all of its goodness.

Apache ActiveMQ — The oldest of the bunch, dating from 2004. Uses the JMS API, meaning it is geared towards Java EE applications. It got rewritten as ActiveMQ Artemis, which provides outstanding performance on par with Kafka.

Amazon SQS — A messaging service provided by AWS. Lets you quickly integrate it with existing applications and eliminates the need to handle your own infrastructure, which might be a big benefit, as systems like Kafka are notoriously tricky to set up. Amazon also offers two similar services — SNS and MQ, the latter of which is basically ActiveMQ but managed by Amazon.

Distributed Applications

If you roll up 5 Rails servers behind a single load balancer all connected to one database, could you call that a distributed application? Recall my definition from up above:

A distributed system is a group of computers working together as to appear as a single computer to the end-user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

If you count the database as a shared state, you could argue that this can be classified as a distributed system — but you’d be wrong, as you’ve missed the “working together” part of the definition.

A system is distributed only if the nodes communicate with each other to coordinate their actions.

Therefore something like an application running its back-end code on a peer-to-peer network can better be classified as a distributed application. Regardless, this is all needless classification that serves no purpose but illustrate how fussy we are about grouping things together.

Known Scale — BitTorrent swarm of 193,000 nodes for an episode of Game of Thrones, April, 2014

Erlang Virtual Machine

Erlang is a functional language that has great semantics for concurrency, distribution and fault-tolerance. The Erlang Virtual Machine itself handles the distribution of an Erlang application.

Its model works by having many isolated lightweight processes all with the ability to talk to each other via a built-in system of message passing. This is called the Actor Model and the Erlang OTP libraries can be thought of as a distributed actor framework (along the lines of Akka for the JVM).

The model is what helps it achieve great concurrency rather simply — the processes are spread across the available cores of the system running them. Since this is indistinguishable from a network setting (apart from the ability to drop messages), Erlang’s VM can connect to other Erlang VMs running in the same data center or even in another continent. This swarm of virtual machines run one single application and handle machine failures via takeover (another node gets scheduled to run).

In fact, the distributed layer of the language was added in order to provide fault tolerance. Software running on a single machine is always at risk of having that single machine dying and taking your application offline. Software running on many nodes allows easier hardware failure handling, provided the application was built with that in mind.

BitTorrent

BitTorrent is one of the most widely used protocol for transferring large files across the web via torrents. The main idea is to facilitate file transfer between different peers in the network without having to go through a main server.

Using a BitTorrent client, you connect to multiple computers across the world to download a file. When you open a .torrent file, you connect to a so-called tracker, which is a machine that acts as a coordinator. It helps with peer discovery, showing you the nodes in the network which have the file you want.

a sample network

You have the notions of two types of user, a leecher and a seeder. A leecher is the user who is downloading a file and a seeder is the user who is uploading said file.

The funny thing about peer-to-peer networks is that you, as an ordinary user, have the ability to join and contribute to the network.

BitTorrent and its precursors (Gnutella, Napster) allow you to voluntarily host files and upload to other users who want them. The reason BitTorrent is so popular is that it was the first of its kind to provide incentives for contributing to the network. Freeriding, where a user would only download files, was an issue with the previous file sharing protocols.

BitTorrent solved freeriding to an extent by making seeders upload more to those who provide the best download rates. It works by incentivizing you to upload while downloading a file. Unfortunately, after you’re done, nothing is making you stay active in the network. This causes a lack of seeders in the network who have the full file and as the protocol relies heavily on such users, solutions like private trackers came into fruition. Private trackers require you to be a member of a community (often invite-only) in order to participate in the distributed network.

After advancements in the field, trackerless torrents were invented. This was an upgrade to the BitTorrent protocol that did not rely on centralized trackers for gathering metadata and finding peers but instead use new algorithms. One such instance is Kademlia (Mainline DHT), a distributed hash table (DHT) which allows you to find peers through other peers. In effect, each user performs a tracker’s duties.

Distributed Ledgers

A distributed ledger can be thought of as an immutable, append-only database that is replicated, synchronized and shared across all nodes in the distributed network.

Known Scale — Ethereum Network had a peak of 1.3 million transactions a day on January 4th, 2018.

They leverage the Event Sourcing pattern, allowing you to rebuild the ledger’s state at any time in its history.

Blockchain

Blockchain is the current underlying technology used for distributed ledgers and in fact marked their start. This latest and greatest innovation in the distributed space enabled the creation of the first ever truly distributed payment protocol — Bitcoin.

Blockchain is a distributed ledger carrying an ordered list of all transactions that ever occurred in its network. Transactions are grouped and stored in blocks. The whole blockchain is essentially a linked-list of blocks (hence the name). Said blocks are computationally expensive to create and are tightly linked to each other through cryptography.

Simply said, each block contains a special hash (that starts with X amount of zeroes) of the current block’s contents (in the form of a Merkle Tree) plus the previous block’s hash. This hash requires a lot of CPU power to be produced because the only way to come up with it is through brute-force.

Simplified blockchain

Miners are the nodes who try to compute the hash (via bruteforce). The miners all compete with each other for who can come up with a random string (called a nonce) which, when combine with the contents, produces the aforementioned hash. Once somebody finds the correct nonce — he broadcasts it to the whole network. Said string is then verified by each node on its own and accepted into their chain.

This translates into a system where it is absurdly costly to modify the blockchain and absurdly easy to verify that it is not tampered with.

It is costly to change a block’s contents because that would produce a different hash. Remember that each subsequent block‘s hash is dependent on it. If you were to change a transaction in the first block of the picture above — you would change the Merkle Root. This would in turn change the block’s hash (most likely without the needed leading zeroes) — that would change block #2’s hash and so on and so on. This means you’d need to brute-force a new nonce for every block after the one you just modified.

The network always trusts and replicates the longest valid chain. In order to cheat the system and eventually produce a longer chain you’d need more than 50% of the total CPU power used by all the nodes.

Blockchain can be thought of as a distributed mechanism for emergent consensus. Consensus is not achieved explicitly — there is no election or fixed moment when consensus occurs. Instead, consensus is an emergent product of the asynchronous interaction of thousands of independent nodes, all following protocol rules.

This unprecedented innovation has recently become a boom in the tech space with people predicting it will mark the creation of the Web 3.0. It is definitely the most exciting space in the software engineering world right now, filled with extremely challenging and interesting problems waiting to be solved.

Bitcoin

What previous distributed payment protocols lacked was a way to practically prevent the double-spending problem in real time, in a distributed manner. Research has produced interesting propositions[1] but Bitcoin was the first to implement a practical solution with clear advantages over others.

The double spending problem states that an actor (e.g Bob) cannot spend his single resource in two places. If Bob has $1, he should not be able to give it to both Alice and Zack — it is only one asset, it cannot be duplicated. It turns out it is really hard to truly achieve this guarantee in a distributed system. There are some interesting mitigation approaches predating blockchain, but they do not completely solve the problem in a practical way.

Double-spending is solved easily by Bitcoin, as only one block is added to the chain at a time. Double-spending is impossible within a single block, therefore even if two blocks are created at the same time — only one will come to be on the eventual longest chain.

Bitcoin relies on the difficulty of accumulating CPU power.

While in a voting system an attacker need only add nodes to the network (which is easy, as free access to the network is a design target), in a CPU power based scheme an attacker faces a physical limitation: getting access to more and more powerful hardware.

This is also the reason malicious groups of nodes need to control over 50% of the computational power of the network to actually carry any successful attack. Less than that, and the rest of the network will create a longer blockchain faster.

Ethereum

Ethereum can be thought of as a programmable blockchain-based software platform. It has its own cryptocurrency (Ether) which fuels the deployment of smart contracts on its blockchain.

Smart contracts are a piece of code stored as a single transaction in the Ethereum blockchain. To run the code, all you have to do is issue a transaction with a smart contract as its destination. This in turn makes the miner nodes execute the code and whatever changes it incurs. The code is executed inside the Ethereum Virtual Machine.

Solidity, Ethereum’s native programming language, is what’s used to write smart contracts. It is a turing-complete programming language which directly interfaces with the Ethereum blockchain, allowing you to query state like balances or other smart contract results. To prevent infinite loops, running the code requires some amount of Ether.

As the blockchain can be interpreted as a series of state changes, a lot of Distributed Applications (DApps) have been built on top of Ethereum and similar platforms.

Further usages of distributed ledgers

Proof of Existence — A service to anonymously and securely store proof that a certain digital document existed at some point of time. Useful for ensuring document integrity, ownership and timestamping.

Decentralized Autonomous Organizations (DAO) — organizations which use blockchain as a means of reaching consensus on the organization’s improvement propositions. Examples are Dash’s governance system, the SmartCash project

Decentralized Authentication — Store your identity on the blockchain, enabling you to use single sign-on (SSO) everywhere. Sovrin, Civic

And many, many more. The distributed ledger technology really did open up endless possibilities. Some are most probably being invented as we speak!

Summary

In the short span of this article, we managed define what a distributed system is, why you’d use one and go over each category a little. Some important things to remember are:

Distributed Systems are complex
They are chosen by necessity of scale and price
They are harder to work with
CAP Theorem — Consistency/Availability trade-off
They have 6 categories — data stores, computing, file systems, messaging systems, ledgers, applications

To be frank, we have barely touched the surface on distributed systems. I did not have the chance to thoroughly tackle and explain core problems like consensus, replication strategies, event ordering & time, failure tolerance, broadcasting a message across the network and others.

Caution

Let me leave you with a parting forewarning:

You must stray away from distributed systems as much as you can. The complexity overhead they incur with themselves is not worth the effort if you can avoid the problem by either solving it in a different way or some other out-of-the-box solution.

[1]
Combating Double-Spending Using Cooperative P2P Systems, 25–27 June 2007 — a proposed solution in which each ‘coin’ can expire and is assigned a witness (validator) to it being spent.

Bitgold, December 2005 — A high-level overview of a protocol extremely similar to Bitcoin’s. It is said this is the precursor to Bitcoin.

Further Distributed Systems Reading:

Designing Data-Intensive Applications, Martin Kleppmann — A great book that goes over everything in distributed systems and more.

Cloud Computing Specialization, University of Illinois, Coursera — A long series of courses (6) going over distributed system concepts, applications

Jepsen — Blog explaining a lot of distributed technologies (ElasticSearch, Redis, MongoDB, etc)

Thanks for taking the time to read through this long(~5600 words) article!

If, by any chance, you found this informative or thought it provided you with value, please make sure to give it as many claps you believe it deserves and consider sharing with a friend who could use an introduction to this wonderful field of study.

~Stanislav Kozlovski

Update

I currently work at Confluent. Confluent is a Big Data company founded by the creators of Apache Kafka themselves! I am immensely grateful for the opportunity they have given me — I currently work on Kafka itself, which is beyond awesome! We at Confluent help shape the whole open-source Kafka ecosystem, including a new managed Kafka-as-a-service cloud offering.

We are hiring for a lot of positions (especially SRE/Software Engineers) in Europe and the USA! If you are interested in working on Kafka itself, looking for new opportunities or just plain curious — make sure to message me on Twitter and I will share all the great perks that come from working in a bay area company.

Building on Quicksand: A Summary

freeCodeCamp — Mon, 04 Dec 2017 17:11:21 +0000

By Shubheksha Jalan

Let’s try to break down the paper “Building On Quicksand” published by Pat Helland and David Campbell in 2009. All pull quotes are from the paper.

The paper focuses on the design of large, fault-tolerant, replicated distributed systems. It also discusses its evolving based on changing requirements over time. It starts off by stating “Reliable systems have always been built out of unreliable components”.

As the granularity of the unreliable component grows (from a mirrored disk to a system to a data center), the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system because the user is not delayed behind a slow interaction with the backup.

Fault-tolerant systems can be made of many components. Their goal is keep functioning when one of those components fail. We don’t consider Byzantinefailures in this discussion. Instead, the fail fast model where either a component works correctly or it fails.

The paper goes on to compare two versions of the Tandem NonStop system. One that used synchronous checkpointing and one that used asynchronous checkpointing. Refer section 3 of the paper for all the details. I’d like to touch upon the difference between the two checkpointing strategies.

Synchronous checkpointing: in this case, with every write to the primary, state needed to be sent to the backup. Only after the backup acknowledged the write, did the primary send a response to the client who issued the write request. This ensured that when the primary fails, the backup can take over without losing any work.
Asynchronous checkpointing: in this strategy, the primary acknowledges and commits the write. This is done as soon as it processes it without waiting for a reply from the backup. This technique has improved latency but also poses other challenges addressed later.

Log Shipping

A classic database system has a process that reads the log and ships it to a backup data-center. The normal implementation of this mechanism commits transactions at the primary system (acknowledging the user’s commit request) and asynchronously ships the log. The backup database replays the log, constantly playing catch-up.

The mechanism described above is termed as log shipping. The main problem this poses is that when the primary fails and the back up takes over, some recent transactions might be lost.

This inherently opens up a window in which the work is acknowledged to the client but it has not yet been shipped to the backup. A failure of the primary during this window will lock the work inside the primary for an unknown period of time. The backup will move ahead without knowledge of the locked-up work.

The introduction of asynchrony into the system has an advantage in latency, response time and performance. However, it makes the system more prone to the possibility of losing work when the primary fails. There are two ways to deal with this:

Discard the work locked in the primary when it fails. Whether a system can do that or not depends on the requirements and business rules.
Have a recovery mechanism to sync the primary with backups when it comes back up and retries lost work. This is possible only if the operations can be retried in an idempotent way and the out-of-order retries are possible.

The system loses the notion of what the authors call “an authoritative truth”. Nobody knows the accurate state of the system at any given point in time if the work is locked in an unavailable backup or primary.

The authors conclude that business rules in a system with asynchronous checkpointing are probabilistic.

If a primary uses asynchronous checkpointing and applies a business rule on the incoming work, it is necessarily a probabilistic rule. The primary, despite its best intentions, cannot know it will be alive to enforce the business rules.

When the backup system that participates in the enforcement of these business rules is asynchronously tied to the primary, the enforcement of these rules inevitably becomes probabilistic!

The authors state that commutative operations, operations that can be reordered, can be executed independently, as long as the operation preserves business rules. However, this is hard to do with storage systems because the write operation isn’t commutative.

Another consideration is that work of a single operation is idempotent. For example, executing the operation any number of time should result in the same state of the system.

To ensure this, applications typically assign a unique number or ID to the work. This is assigned at the ingress to the system (i.e. whichever replica first handles the work). As the work request rattles around the network, it is easy for a replica to detect that it has already seen that operation and, hence, not do the work twice.

The authors suggest that different operations within a system provide different consistency guarantees. Yet, this depends on the business requirements. Some operations can choose classic consistency over availability and vice versa.

Next, the authors argue that as soon there is no notion of authoritative truth in a system. All computing boils down to three things: memories, guesses, and apologies.

Memories: you can only hope that your replica remembers what it has already seen.
Guesses: Due to only partial knowledge being available, the replicas take actions based on local state and may be wrong. “In any system which allows a degradation of the absolute truth, any action is, at best, a guess.” Any action in such a system has a high probability of being successful, but it’s still a guess.
Apologies: Mistakes are inevitable. Hence, every business needs to have an apology mechanism in place either through human intervention or by automating it.

The paper next discusses the topic of eventual consistency. The authors do this by taking the Amazon shopping cart built using Dynamo & a system for clearing checks as examples. A single replica identifies and processes the work coming into these systems. It flows to other replicas as and when connectivity permits. The requests coming into these systems are commutative (reorderable). They can be processed at different replicas in different orders.

Storage systems alone cannot provide the commutativity we need to create robust systems that function with asynchronous checkpointing. We need the business operations to reorder. Amazon’s Dynamo does not do this by itself. The shopping cart application on top of the Dynamo storage system is responsible for the semantics of eventual consistency and commutativity. The authors think it is time for us to move past the examination of eventual consistency in terms of updates and storage systems. The real action comes when examining application based operation semantics.

Next, they discuss two strategies for allocating resources in replicas that might not be able to communicate with each other:

Over-provisioning: the resources are partitioned between replicas. Each has a fixed subset of resources they can allocate. No replica can allocate a resource that’s not actually available.
Over-booking: the resources can be individually allocated without ensuring strict partitioning. This may lead to the replicas allocating a resource that’s not available, promising something they can’t deliver.

The paper talks also about something termed as the “seat reservation pattern”. This is a compromise between over-provisioning and over-booking:

Anyone who has purchased tickets online will recognize the “Seat Reservation” pattern where you can identify potential seats and then you have a bounded period of time, (typically minutes), to complete the transaction. If the transaction is not successfully concluded within the time period, the seats are once again marked as “available”.

ACID 2.0

The classic definition of ACID stands for “Atomic, Consistent, Isolated, and Durable”. Its goal is to make the application think that there is a single computer which isn’t doing anything else when the transaction is being processed. The authors talk about a new definition for ACID. which stands for Associative, Commutative, Idempotent, and Distributed.

The goal for ACID2.0 is to succeed if the pieces of the work happen: At least once, anywhere in the system, in any order. This defines a new KIND of consistency. The individual steps happen at one or more system. The application is explicitly tolerant of work happening out of order. It is tolerant of the work happening more than once per machine, too.

Going by the classic definition of ACID, a linear history is a basis for fault tolerance. If we want to achieve the same guarantees in a distributed system, it’ll require concurrency control mechanisms which “tend to be fragile”.

When the application is constrained to the additional requirements of commutativity and associativity, the world gets a LOT easier. No longer must the state be checkpointed across failure units in a synchronous fashion. Instead, it is possible to be very lazy about the sharing of information. This opens up offline, slow links, low-quality datacenters, and more.

In conclusion:

We have attempted to describe the patterns in use by many applications today as they cope with failures in widely distributed systems. It is the reorderability of work and repeatability of work that is essential to allowing successful application execution on top of the chaos of a distributed world in which systems come and go when they feel like it.

P.S. — If you made it this far and would like to receive a mail whenever I publish one of these posts, sign up here.

Harvest, Yield, and Scalable Tolerant Systems: A Summary

freeCodeCamp — Mon, 30 Oct 2017 17:14:37 +0000

By Shubheksha Jalan

This article presents a summary of the paper “Harvest, Yield, and Scalable Tolerant Systems” published by Eric Brewer & Amando Fox in 1999. All unattributed quotes are from this paper.

The paper deals with the trade-offs between consistency and availability (CAP) for large systems. It’s very easy to point to CAP and assert that no system can have consistency and availability.

But, there is a catch. CAP has been misunderstood in a variety of ways. As Coda Hale explains in his excellent blog post “You Can’t Sacrifice Partition Tolerance”:

Of the CAP theorem’s Consistency, Availability, and Partition Tolerance, Partition Tolerance is mandatory in distributed systems. You cannot not choose it. Instead of CAP, you should think about your availability in terms of yield (percent of requests answered successfully) and harvest (percent of required data actually included in the responses) and which of these two your system will sacrifice when failures happen.

The paper focuses on increasing the availability of large scale systems by fault toleration, containment and isolation:

We assume that clients make queries to servers, in which case there are at least two metrics for correct behavior: yield, which is the probability of completing a request, and harvest, which measures the fraction of the data reflected in the response, i.e. the completeness of the answer to the query.

The two metrics, harvest and yield can be summarized as follows:

Harvest: data in response/total data
For example: If one of the nodes is down in a 100 node cluster, the harvest is 99% for the duration of the fault.
Yield: requests completed with success/total number of requests
Note: Yield is different from uptime. Yield deals with the number of requests, not only the time the system wasn’t able to respond to requests.

The paper argues that there are certain systems which require perfect responses to queries every single time. Also, there are systems that can tolerate imperfect answers once in a while.

To increase the overall availability of our systems, we need to carefully think through the required consistency and availability guarantees it needs to provide.

Trading Harvest for Yield — Probabilistic Availability

Nearly all systems are probabilistic whether they realize it or not. In particular, any system that is 100% available under single faults is probabilistically available overall (since there is a non-zero probability of multiple failures)

The paper talks about understanding the probabilistic nature of availability. This helps in understanding and limiting the impact of faults by making decisions about what needs to be available and what kind of faults the system can deal with.

They outline the linear degradation of harvest in case of multiple node faults. The harvest is directly proportional to the number of nodes that are functioning correctly. Therefore, it decreases/increases linearly.

Two strategies are suggested for increasing the yield:

Random distribution of data on the nodes
If one of the nodes goes down, the average-case and worst-case fault behavior doesn’t change. Yet if the distribution isn’t random, then depending on the type of data, the impact of a fault may vary.
For example, if only one of the nodes stored information related to a user’s account balance goes down, the entire banking system will not be able to work.
Replicating the most important data
This reduces the impact in case one of the nodes containing a subset of high-priority data goes down.
It also improves harvest.

Another notable observation made in the paper is that it is possible to replicate all your data. It doesn’t do a lot to improve your harvest/yield, but it increases the cost of operation substantially. This is because the internet works based on best-in-effort protocols which can never guarantee 100% harvest/yield.

Application Decomposition and Orthogonal Mechanisms

The second strategy focuses on the benefits of orthogonal system design.

It starts out by stating that large systems are composed of subsystems which cannot tolerate failures. But they fail in a way that allows the entire system to continue functioning with some impact on utility.

The actual benefit is the ability to provision each subsystem’s state management separately, providing strong consistency or persistent state only for the subsystems that need it, not for the entire application. The savings can be significant if only a few small subsystems require the extra complexity.

The paper states that orthogonal components are completely independent of each other. They have no run time interface to other components, unless there is a configuration interface. This allows each individual component to fail independently and minimizes its impact on the overall system.

Composition of orthogonal subsystems shifts the burden of checking for possibly harmful interactions from runtime to compile time, and deployment of orthogonal guard mechanisms improves robustness for the runtime interactions that do occur, by providing improved fault containment.

The goal of this paper was to motivate research in the field of designing fault-tolerant and highly available large scale systems.

Also, to think carefully about the consistency and availability guarantees the application needs to provide. As well as the trade offs it is capable of making in terms of harvest against yield.

If you enjoyed this paper, please hit the clap button so more people see it. Thank you.

P.S. — If you made it this far and would like to receive a mail whenever I publish one of these posts, sign up here.

Some Constraints & Trade-offs In The Design of Network Communications: A Summary

freeCodeCamp — Mon, 25 Sep 2017 18:44:11 +0000

By Shubheksha Jalan

This article distills the content presented in the paper “Some Constraints and Trade-offs In The Design of Network Communications” published in 1975 by E. A. Akkoyunlu et al.

The paper focuses on the inclusion of Interprocess Communication (IPC) primitives and the consequences of doing so. It explores, in particular, the time-out and the insertion property feature described in detail below with respect to distributed systems of sequential processes without system buffering and interrupts.

It also touches upon the two generals problem which states that it’s impossible for two processes to agree on a decision over an unreliable network.

Introduction:

The design of an Interprocess Communication Mechanism (IPCM) can be described by stating the behavior of the system and the required services. The features to be included in the IPCM are very critical as they might be interdependent, hence the design process should begin with a detailed spec. This involves thorough understanding of the consequences of each decision.

The major aim of the paper is to point out the interdependence of the features to be incorporated in the system.

The paper states that at times the incompatibility between features is visible from the start. Yet, sometimes two features which seem completely unrelated end up affecting each other significantly. If the trade-offs involved aren’t explored at the beginning, it might not be possible to include desirable features. Trying to accommodate conflicting features results in messy code at the cost of elegance.

Intermediate Processes:

Let’s suppose a system doesn’t allow indirect communication between processes that cannot establish a connection. The users just care about the logical sender and receiver of the messages: they don’t care what path the messages take or how many processes they travel through to reach their final destination. In such a situation, intermediate processes come to our rescue. They’re not a part of the IPCM but are inserted between two processes that can’t communicate directly through a directory or broker process when the connection is set up. They’re the only ones aware of the indirect nature of communication between the processes.

Centralized vs. Distributed Systems:

Centralized Communication Facility

Has a single agent which is able to maintain all state information related to the communication happening in the system
The agent can also change the state of the system in a well-defined manner

For example, if we consider the IPCM to be the centralized agent, it’ll be responsible for matching the SEND and RECEIVE requests of two processes, transferring data between their buffers and relaying appropriate status to both.

Distributed Communication Facility

No single agent has the complete state information at any time
The IPCM is made of several individual components which coordinate, exchange and work with parts of state information they possess.
A global change can take a considerable amount of time
If one of the components crashes, the activity of other components still interests us

Case 1:

In Figure 1, P1 and P2 are the two communicating processes on different machines over a network with their own IPCMs and P is the interface which enables this, with parts that lie on both machines. P handles the details of the network lines.

If one machine or a communication link crashes, we want the surviving IPCM’s to continue their operation. At least one component should detect a failure and be able to communicate. (In the case of a communication link failure, both ends must know.)

Case 2:

Distributed communication can also happen on the same machine given that there are one or more intermediate processes taking part in the system. In that case, P, P1 and P2 will be processes on the same system with identical IPCMs. P is an intermediate processes which facilitates the communication between P1 and P2.

Transactions between P1 and P2 consist of two steps: P1 to P and P to P2. Normally, the status returned to P1 would reflect the result of the P1 to P transfer, but P1 is interested in the status of the over all transaction from P1 to P2.

One way to deal with this is a delayed status return. The status isn’t sent to the sender immediately after the transaction occurs but only when the sender issues a SEND STATUS primitive. In the example above, after receiving the message from P1, P further sends it to P2, doesn’t send any status to P1 and waits to receive a status from P2. When it receives the appropriate status from P2, it relays it to P1 using the SEND STATUS primitive.

Special Cases of Distributed Facility

This section starts out by stating some facts and reasoning around them.

FACT 0: A perfectly reliable distributed system can be made to behave as a centralized system.

Theoretically, this is possible if:

The state of different components of the system is known at any given time
After every transaction, the status is relayed properly between the processes through their IPCMs using reliable communication.

However, this isn’t possible in practice because we don’t have a perfect reliable network. Hence, the more realistic version of the above fact is:

FACT I: A distributed IPCM can be made to simulate a centralized system provided that:

The overall system remains connected at all times, and

When a communication link fails, the component IPCM’s that are connected to it know about it, and

The mean time between two consecutive failures is large compared to the mean transaction time across the network.

The paper states that if the above conditions are met, we can establish communication links that are reliable enough to simulate a centralized systems because:

There is always a path from the sender to the receiver
Only one copy of an undelivered message will be retained by the system in case of a failure due to link failure detection. Hence a message cannot be lost if undelivered and will be removed from the system when delivered.
A routing strategy and a bound on the failure rate ensures that a message moving around in a subset of nodes will eventually get out in finite time if the target node isn’t present in the subset.

The cases described above are special cases because they make a lot of assumptions, use inefficient algorithms and don’t take into account network partitions leading to disconnected components.

Status in Distributed Systems

Complete Status

A complete status is one that relays the final outcome of the message, such as whether it reached its destination.

FACT 2: In an arbitrary distributed facility, it is impossible to provide complete status.

Case 1:

Assume that a system is partitioned into two disjoint networks, leaving the IPCMs disconnected. Now, if IPCM1 was awaiting a status from IPCM2, there is no way to get it and relay the result to P1.

Case 2:

Consider figure 2, if there isn’t a reliable failure detection mechanism present in the system and IPCM2 sends a status message to IPCM1, then it can never be sure it reached or not without an acknowledgement. This leads to an infinite exchange of messages.

Time-outs

Time-outs are required because the system has finite resources and can’t afford to be deadlocked forever. The paper states that:

FACT 3: In a distributed system with timeouts, it is impossible to provide complete status (even if the system is absolutely reliable).

In figure 3, P1 is trying to send P2 a message through a chain of IPCMs.

Suppose if I1 takes data from P1 but before it hears about the status of the transaction, P1’s request times out. IPCM1 has now knowledge about the final outcome whether the data was successfully received by P2. Whatever status it returns to P1, it may prove to be incorrect. Hence, it’s impossible to provide complete status in a distributed facility with time-outs.

Insertion Property

An IPCM has insertion property if we insert an intermediate process P between two processes P1 and P2 that wish to communicate such that:

P is invisible to both P1 and P2
The status relayed to P1 and P2 is the same they’d get if directly connected

FACT 4: In a distributed system with timeouts, the insertion property can be possessed only if the IPCM withholds some status information that is known to it.

Delayed status is required to fulfill the insertion property. Consider that the message is sent from P1 to P2. What happens if P receives P1’s message, it goes into await-status state but it times out before P could learn about the status?

We can’t tell P1 the final outcome of the exchange as that’s not available yet. We also can’t let P know that it’s in await-status state because that would mean that the message was received by someone. It’s also not possible that P2 never received the data because such a situation cannot arise if P1 and P2 are directly connected & hence violates the insertion property.

The solution to this is to provide an ambiguous status to P1, one that is as likely to be possible if the two processes were connected directly.

Thus, a deliberate suppression of what happened is introduced by providing the same status to cover a time-out which occurs while awaiting status and, say, a transmission error.

Logical and Physical Messages

The basic function of an IPCM is the transfer and synchronization of data between two processes. This may happen by dividing the physical messages originally sent by the sender process as a part of a single operation into smaller messages, also known as logical message for the ease of transfer.

Buffer Size Considerations

As depicted in figure 5, if a buffer mismatch arises, we can take the following approaches to fix it:

Define a system-wide buffer size. This is extremely restrictive, especially within a network of heterogeneous systems.
Satisfy the request with the small buffer size and inform both the processes involved what happened. This approach requires that the processes are aware of the low level details of the communication.
Allow partial transfers. In this approach, only the process that issued the smaller request (50 words) is woken up. All other processes remain asleep awaiting further transfers. If the receiver’s buffer isn’t full, an EOM (End Of Message) indicator is required to wake it up.

Partial Transfers and Well-Known Ports

In figure 6, a service process using a well-known port is accepting requests for sever user processes, P1…Pn. If P1 sends a message to the service process that isn’t complete and doesn’t fill its buffer, we need to consider the following situations:

The well-known port is reserved for P1. No other process can communicate with the service process using it until P1 is done.
When the service process times out while P1 is preparing to send the second and final part of the message, we need to handle it without informing P1 that the first part has been ignored. P1 isn’t listening for incoming messages from the service process.

Since none of these problems arise without partial transfers, one solution is to ban them altogether. For example:

This is the approach taken in ARPANET where the communication to well known ports are restricted to short, complete messages which are used to setup a separate connection for subsequent communication.

Buffer Processes

This solution is modeled around the creation of dynamic processes.

Whenever P1 wishes to transfer data to the service process, a new process S1 is created and receives messages from P1 until the logical message is completed, sleeping as and when required. Then it sends the complete physical message to the service process with EOM flag set. Thus no partial transfers happen between S1 and the service process, they’re all filtered out before that.

However, this kind of a solution isn’t possible with well-known ports. S1 is inserted between P1 and the service process when the connection is initialized. In the case of well-known ports, no initialization takes place.

In discussing status returned to the users, we have indicated how the presence of certain other features limits the information that can be provided.

In fact, we have shown situations in which uncertain status had to be returned, providing almost no information as to the outcome of the transaction.

Even though the inclusion of insertion property complicates things, it is beneficial to use the weaker version of it.

Finally, we list a set of features which may be combined in a working IPCM:

(1) Time-outs

(2) Weak insertion property and partial transfer

(3) Buffer processes to allow

(4) Well-known ports — with appropriate methods to deal with partial transfers to them.

distributed systems - freeCodeCamp.org

How to Use Time To Live in Event-Driven Architecture in AWS

Table of Contents

What is Time to Live (TTL) in Distributed Systems?

Where does TTL make sense?

Where does TTL not make sense?

How to Use TTL in Message Queues (AWS SQS)

Visibility Timeout in Message Queues (AWS SQS)

How to Use TTL in Object Storage Systems (AWS S3)

How to Use TTL in Databases (AWS DynamoDB)

How to Use TTL in Event-Based Architecture

Summary

How to Partition Queues for Distributed Workloads

What is a Distributed System?

What is a Queue?

Challenges of Long-Running Queues in a Distributed System.

What does it mean to partition a Queue?

Effective Task Queue Partitioning Strategies

Hash-based partitioning

Range-based Partitioning

Round-Robin Partitioning

Dynamic Partitioning

Conclusion

How to Deal with Traffic Surges in Distributed Systems

Table of Contents

What is Traffic in the Context of Distributed Systems?

Why Can Traffic Surges be Problematic?

Ways to Deal with High Traffic Loads

Exponential Backoff and Retries

Summary

Asynchronous vs Batch Data Processing in Distributed Systems – Explained with Examples

Table of Contents

Batch Processing of Data

What is Batch Processing?

When Do We Use Batch Processing?

Real World Example of Batch Processing of Data

What Does Batch Processing Look Like in Code?

Asynchronous Processing of Data

What is Asynchronous Processing?

When Do We Use Asynchronous Processing?

Real World Example of Asynchronous Processing of Data

What Does Async Processing Look Like in Code?

Summary

How to Create Multiple Virtual Hosts on a Single RabbitMQ Instance

How to Run a RabbitMQ Instance Locally

How to Create a Virtual Host

How to Create a New User

How to Assign the New User to the Created Virtual Host (Vhost)

How to Test the Connection for the New User and Vhost

Wrapping Up

The Design Patterns for Distributed Systems Handbook – Key Concepts Every Developer Should Know

Prerequisite Knowledge

Here's What We'll Cover:

What are Distributed Systems?

Common Challenges in Distributed Systems

Command and Query Responsibility Segregation (CQRS) Pattern

Pros

Cons

Use Cases

Two-Phase Commit (2PC) Pattern

Pros

Cons

Use Cases

Saga Pattern

Orchestration

Pros

Cons

When to Use Orchestration?

Choreography

Pros

Cons

When to Use Choreography:

Replicated Load-Balanced Services (RLBS) Pattern

Pros

Cons

When to use load balancing

Sharded Services Pattern

But, how do we implement this?

Pros:

Cons: