<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Daniel Adetunji - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Daniel Adetunji - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 04:32:29 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/danieltunj/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How Auto Scaling and Load Balancing Work in Software Architecture ]]>
                </title>
                <description>
                    <![CDATA[ While auto scaling and load balancing are two separate techniques in software architecture management, they are often implemented simultaneously. In the software architecture wild, one rarely exists without the other, as they complement each other to... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/auto-scaling-and-load-balancing/</link>
                <guid isPermaLink="false">66d45e06182810487e0ce135</guid>
                
                    <category>
                        <![CDATA[ software architecture ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 17 Jun 2024 18:52:13 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/06/image--13-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>While auto scaling and load balancing are two separate techniques in software architecture management, they are often implemented simultaneously. In the software architecture wild, one rarely exists without the other, as they complement each other to handle unpredictable changes in demand.</p>
<p>This article will explain how auto scaling and load balancing work and why they're important to consider in your designs. It will also go through example architectures showing auto scaling and load balancing in action.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-auto-scaling-explained">Auto Scaling Explained</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-dynamic-scaling">Dynamic Scaling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-scheduled-scaling">Scheduled Scaling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-auto-scaling">Why use Auto Scaling</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-load-balancing-explained">Load Balancing Explained</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-load-balancing">Why use Load Balancing</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-together-load-balancing-and-auto-scaling-in-action">Bringing it Together – Load Balancing and Auto Scaling in Action</a></p>
</li>
</ol>
<h2 id="heading-auto-scaling-explained">Auto Scaling Explained</h2>
<p>Auto scaling, as its name implies, is simply a way to automatically scale your compute instances. With most cloud providers like AWS, GCP and Azure, you select scaling policies that define how it will add or remove instances.</p>
<p>Scaling policies are simply rules that say how much you should increase or decrease the number of instances based on some predefined metric.</p>
<p>Scaling policies can be dynamic, for example adding new instances based on the CPU utilisation of the existing instances. Scaling policies can also be based on a schedule, that is, on specific times of the day or week when you anticipate higher or lower demand.</p>
<h3 id="heading-dynamic-scaling">Dynamic Scaling</h3>
<p>Dynamic scaling is ideal for when there is a large fluctuation of demand at unknown and unpredictable times. You know there may be a sudden surge or drop in demand on your instances, you just don’t know when.</p>
<p>Using a restaurant analogy, think of an instance as a chef doing the work of converting orders into meals. If you only have three chefs and don’t have large fluctuations in demand throughout the day or week, you have nothing to worry about.</p>
<p>But if your restaurant had a sale that was more popular than anticipated, or a large party of tourists were to suddenly descend upon the restaurant, how would you cope? What if you could add more chefs on the fly immediately when needed?</p>
<p>This is how dynamic auto scaling works. Dynamic scaling will cause chefs to spontaneously appear in the kitchen, ready to transform orders into delicious meals, based on a predefined metric that you can choose to measure how overworked the chefs are – that is, how much they are struggling to fulfill current orders.</p>
<p>Remember that these scaling policies are simply rules. These rules can be very simple, like:</p>
<blockquote>
<p><em>if CPU utilisation is &gt; 50%, add one more instance. If CPU utilisation is &lt; 50%, remove an instance.</em></p>
</blockquote>
<p>These rules can also be more complex.</p>
<p>With AWS and GCP, for example, you can set a target tracking metric that will monitor the CPU performance of your scaling group and add or remove instances so that the average CPU utilisation approximately matches your desired setting.</p>
<p>For example, if you specify that you want the average CPU utilisation of your scaling group to be at 60%, instances will be added or removed as required to approximately meet that target.</p>
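<p>The arithmetic behind target tracking can be sketched in a few lines of Python. This is an illustration of the idea, not any cloud provider's actual algorithm; the 60% target mirrors the example above:</p>

```python
import math

def desired_instance_count(current_instances, avg_cpu_utilisation, target_utilisation=60.0):
    """Return the instance count needed so average CPU sits near the target.

    Total load is roughly current_instances * avg_cpu_utilisation, so the
    count that spreads the same load at the target utilisation is their ratio.
    """
    if current_instances <= 0:
        raise ValueError("need at least one running instance")
    total_load = current_instances * avg_cpu_utilisation
    # Round up: better to sit slightly under the target utilisation than over it.
    return max(1, math.ceil(total_load / target_utilisation))

# Four instances averaging 90% CPU carry a total load of 360 "CPU units";
# spreading that at 60% per instance needs six instances, so two are added.
desired_instance_count(4, 90)   # 6
```

If the same four instances were averaging 30% instead, the function would return 2, and the group would scale in.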
<p>Using CPU utilisation to trigger a scaling action is one of the most popular patterns. But CPU utilisation is not the only metric you can use to scale. In some ways, it can actually be suboptimal to use CPU utilisation, especially if you want even more responsive scaling.</p>
<p>What if you could track another metric that anticipates the increase in CPU utilisation so you don’t have to wait for the inevitable increase in the CPU utilisation of your instances before a scaling action is triggered?</p>
<p>With GCP, for example, if you have an HTTP load balancer in front of your instances, you can configure your scaling to be triggered based on the number of requests hitting your load balancer. Similarly with AWS, if you have an <a target="_blank" href="https://lightcloud.substack.com/i/70277437/messaging-queues">SQS queue</a> in front of your instances, you can scale based on the number of messages in the queue.</p>
<p>In both of these examples, something else anticipates a likely increase in future CPU utilisation, so setting a scaling action to be triggered based on this is a way of creating more responsive scaling.</p>
<p>Bringing back our restaurant analogy, this would be like calling in more chefs to the kitchen once you see a large queue outside the restaurant. This is a more responsive way of dealing with a sudden surge in demand compared to waiting until your chefs are overwhelmed with orders.</p>
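<p>The queue-based trigger usually comes down to "backlog per instance": decide how many messages one instance can comfortably work through, and size the group from the queue depth. A hypothetical sketch of that rule — the threshold of 100 messages per instance and the group bounds are made-up numbers, not provider defaults:</p>

```python
import math

def instances_for_backlog(queue_length, messages_per_instance=100, min_size=1, max_size=10):
    """Size an auto scaling group from queue depth, before CPU ever spikes."""
    needed = math.ceil(queue_length / messages_per_instance)
    # Clamp to the group's configured minimum and maximum.
    return min(max_size, max(min_size, needed))

instances_for_backlog(250)   # 3 instances for a 250-message backlog
instances_for_backlog(0)     # never below the minimum of 1
```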
<h3 id="heading-scheduled-scaling">Scheduled Scaling</h3>
<p>Scheduled scaling is ideal for when there is a large fluctuation in demand at known times.</p>
<p>Using the restaurant analogy again, your scaling policy can be based on a schedule. So for example, if you know evenings and weekends are busier than mornings and weekdays, your scaling policy will ensure that there are more chefs during periods of higher expected demand.</p>
<p>With AWS and GCP, you can set a scheduled scaling policy to add or remove instances at specific times.</p>
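<p>A scheduled policy is ultimately a lookup from time to capacity. The sketch below is illustrative only: the busy windows (evenings and weekends, as in the restaurant analogy) and the instance counts are invented for the example:</p>

```python
from datetime import datetime

def scheduled_capacity(now: datetime) -> int:
    """Return the desired instance count for a schedule-based policy.

    Weekends and evenings (18:00-23:00) get extra capacity; quiet
    weekday hours run with the baseline.
    """
    is_weekend = now.weekday() >= 5          # Saturday=5, Sunday=6
    is_evening = 18 <= now.hour < 23
    if is_weekend or is_evening:
        return 6    # more "chefs" during expected peaks
    return 2        # baseline capacity the rest of the time

scheduled_capacity(datetime(2024, 6, 15, 12))   # Saturday noon: 6
scheduled_capacity(datetime(2024, 6, 17, 9))    # Monday morning: 2
```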
<h3 id="heading-why-use-auto-scaling">Why Use Auto Scaling?</h3>
<p>Auto scaling solves the age-old problem of capacity planning. Trying to accurately forecast how much compute will be required in the future is fraught with errors. Too little capacity, and your website is down during periods of high demand, costing you money and reputation. Too much capacity, and you are paying for unused instances.</p>
<p>Capacity planning is fundamentally a forecasting problem. And humans are not great at accurately forecasting the future. Before cloud providers like AWS, GCP, and Azure existed, companies needed to plan capacity based on expected future demand. This planning process was often just disguised guesswork. You had to pay upfront for servers and hope you didn’t significantly under or overestimate how many servers you needed.</p>
<p>The problem with forecasting arises because we have a misguided faith in the precise measurement of the unknowable future. Humans have been making inaccurate forecasts for a long time. As far back as around 600 BC, the Greek philosopher Thales reportedly fell into a well while gazing up at the stars, too intent on the heavens to notice what was right in front of him.</p>
<p>Some things are fundamentally unknowable, and that is ok. Auto scaling removes the need to accurately forecast future demand since you can automatically increase or decrease the number of instances you have based on your scaling policy.</p>
<p>By using auto scaling, you get to improve the resilience of your architecture and reduce costs. These are the two main reasons to use auto scaling in your designs.</p>
<h4 id="heading-improve-resilience">Improve Resilience</h4>
<p>Being able to automatically and immediately increase the number of instances in response to growing demand reduces the chances that your instances are under excessive load and at risk of poor performance. This improves the resilience of your architecture.</p>
<p>Auto scaling is, however, not only about scaling. It can also be used to maintain a set number of instances. This is a great way of creating self healing architectures.</p>
<p>With AWS, you can set your minimum, maximum, and desired number of compute instances, without any scaling policy. AWS will simply attempt to maintain the desired number of instances specified by you. So if you set the min, max, and desired all equal to one, AWS will maintain one instance for you. If this instance fails, another will be automatically created to replace the failed instance to restore your desired capacity.</p>
<p>This is a cheap and easy way of ensuring <a target="_blank" href="https://lightcloud.substack.com/i/59017006/high-availability">high availability</a> without having multiple instances in different availability zones.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263f0886-2617-480a-af2b-232e97270a24_1559x914.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Self healing in action, figuratively</em></p>
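<p>Conceptually, maintaining a desired count is a reconciliation loop: compare the healthy instances against the target and replace whatever is missing. A toy sketch of that idea — the instance model here is invented for illustration, not a cloud API:</p>

```python
def reconcile(instances, desired=1):
    """Replace failed instances so the healthy count matches `desired`.

    `instances` is a list of dicts like {"id": "i-1", "healthy": True}.
    Returns the fleet after discarding failed nodes and launching replacements.
    """
    healthy = [i for i in instances if i["healthy"]]
    # Launch one replacement per missing node to restore desired capacity.
    for n in range(desired - len(healthy)):
        healthy.append({"id": f"i-new-{n}", "healthy": True})
    return healthy[:desired]

# A min=max=desired=1 group: the failed instance is replaced automatically.
fleet = [{"id": "i-1", "healthy": False}]
fleet = reconcile(fleet, desired=1)
```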
<p>The ability to create self healing architectures is a really strong argument to almost always place your instances in an auto scaling group. AWS and GCP do not, as of this writing, charge you for the privilege of using auto scaling. You only pay for the underlying infrastructure that is created to support your instances.</p>
<p>So, even if there is no requirement to be able to scale instances based on the demand thrown at them, having instances in an auto scaling group is a cheap and easy way of creating a self healing architecture.</p>
<h4 id="heading-reduce-cost">Reduce Cost</h4>
<p>Previous examples have been about scaling up the number of instances to meet higher demand. But equally as important is the ability to scale down during periods of lower demand.</p>
<p>Auto scaling allows you to do this using scheduled or dynamic scaling policies. This is a great way of ensuring that you are not paying for more than you need to.</p>
<h2 id="heading-load-balancing-explained">Load Balancing Explained</h2>
<p>Load balancers accept connections from clients and distribute the requests across target instances. The distribution of requests is usually done at layer 7 (the application layer) or layer 4 (the transport layer). These layers come from a theoretical model that organises computer networking into seven layers, <a target="_blank" href="https://www.freecodecamp.org/news/osi-model-networking-layers-explained-in-plain-english/">known as the OSI model</a>.</p>
<p>I won't go into too much detail on the OSI model here, but for now, what is important to know is that most load balancers can work on the application layer or the transport layer. This means that they work with layer 7 protocols like HTTP(S), or at layer 4 with TCP and UDP traffic (which in turn carries higher-level protocols such as SMTP and SSH).</p>
<p>The example in this section will only cover the more popular layer 7 application load balancers that work with HTTP/HTTPS.</p>
<p>While the low level implementation details and use cases of layer 7 and layer 4 load balancers differ, the principle remains the same: load balancers distribute incoming traffic across a number of target instances.</p>
<p>The distribution of the requests among the target instances typically uses a round robin algorithm, where requests are sent to each instance sequentially. So request #1 goes to instance #1, request #2 to instance #2, request #3 to instance #3, request #4 back to instance #1, and so on.</p>
<p>While other balancing algorithms exist, round robin is the one most commonly used by cloud providers for load balancing.</p>
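<p>The round robin behaviour described above is easy to state in code: each request simply goes to the next instance in cyclic order. A minimal sketch:</p>

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distribute requests over target instances in cyclic order."""

    def __init__(self, instances):
        self._targets = cycle(instances)

    def route(self, request):
        # Each call hands the request to the next instance in the cycle.
        target = next(self._targets)
        return target, request

lb = RoundRobinBalancer(["instance-1", "instance-2", "instance-3"])
# Requests 1-3 go to instances 1, 2, 3; request 4 wraps back to instance 1.
```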
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c5a66a2-31da-471e-a753-44b75dd78708_1898x1490.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A simple view of how load balancers distribute requests</em></p>
<p>The diagram above is a logical depiction of how load balancers work. It only shows one load balancer, which is not a very resilient design. This logical abstraction is easy to illustrate, but is not accurate.</p>
<p>Behind the scenes, multiple load balancer nodes are deployed into each subnet within an availability zone. The load balancer is created with a single DNS record that points at all the elastic load balancer nodes created – that is, this single DNS record points at all of the IP addresses of the actual nodes deployed. All incoming requests are distributed equally across all the load balancer nodes and the load balancer nodes in turn equally distribute requests to target instances. In this way, you don’t have a single point of failure.</p>
<p>A more realistic, albeit more complex, representation of how load balancers work is shown below. In this example, requests will come to any of the load balancer nodes deployed across the three subnets and then they are equally distributed across the target instances.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14072032-eb00-4b67-b253-1e293331b732_1938x1665.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A more accurate view of how load balancers distribute requests</em></p>
<h3 id="heading-why-use-load-balancing">Why Use Load Balancing?</h3>
<p>Load balancers ensure that traffic is distributed among the target instances. This spreads out the load and prevents a single instance from being overloaded with an excessive number of requests.</p>
<p>Load balancers also create a loosely coupled architecture. Loose coupling is generally sought because it means that users don't have to be aware of the instances, and instances don't need to be aware of other instances.</p>
<p>What exactly does being “aware” mean? Since user requests are first sent to the load balancer, users are not aware of the instances actually responding to their request. All communication is done via the load balancer, so it becomes easy to change the type and number of instances without the user being aware of it. The load balancer, in turn, is aware of the instances in its target group, so it can route each request to a relevant, healthy instance.</p>
<h2 id="heading-bringing-it-together-load-balancing-and-auto-scaling-in-action">Bringing it Together – Load Balancing and Auto Scaling in Action</h2>
<p>The diagram below shows load balancing and auto scaling used for a three tiered web application consisting of web, application, and database tiers. Each of these tiers have separate instances/infrastructure.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27dcd71d-6672-4f40-9dd4-1389a42869d7_1405x1923.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Load balancing and auto scaling used for a three tiered web application consisting of web, application, and database tiers.</em></p>
<p>The instances in the web and application tiers are in separate auto scaling groups. There is also a load balancer between the user and the web tier, and between the web tier and the application tier.</p>
<p>By having a load balancer between the user and the web tier, the web tier can scale independently, using the auto scaling feature to add or remove instances as needed.</p>
<p>The user does not need to know which instance to connect to as the connection is through a load balancer. This is loose coupling in action. The same logic applies between the web tier and application tier. Without the load balancer, the instances in the two tiers would be tightly coupled, making scaling difficult.</p>
<p>The database tier in this case is an RDS database with one master and two standby nodes. All reads and writes go to the master node and if this node fails, there is an automatic failover to one of the standby instances.</p>
<p>Auto scaling ensures:</p>
<ol>
<li><p><strong>Resilience</strong>, as it can automatically and immediately increase the number of instances in response to growing demand. It can also self heal, so even if you don’t anticipate the need for immediate and automatic scaling based on changes to demand, self healing is almost always desired as it increases the availability of your architecture</p>
</li>
<li><p><strong>Cost control</strong>, as scaling in and reducing the number of instances during periods of lower demand saves you money</p>
</li>
</ol>
<p>Load balancing ensures:</p>
<ol>
<li><p><strong>Distribution of load</strong>, as it prevents a single node from being overloaded with requests</p>
</li>
<li><p><strong>Loose coupling</strong>, as it removes the need for awareness between users and instances, and between instances themselves. This allows for instances to scale independently</p>
</li>
</ol>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Microservices vs Monoliths: Benefits, Tradeoffs, and How to Choose Your App's Architecture ]]>
                </title>
                <description>
                    <![CDATA[ When you're tasked with designing an application, one of the first questions that probably comes to your mind is whether to design a microservice or a monolith. The consequences of this seemingly simple and innocuous decision are potentially signific... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/microservices-vs-monoliths-explained/</link>
                <guid isPermaLink="false">66d45e1b4a7504b7409c3372</guid>
                
                    <category>
                        <![CDATA[ Microservices ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software architecture ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Tue, 14 May 2024 00:18:44 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/05/cover--3-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When you're tasked with designing an application, one of the first questions that probably comes to your mind is whether to design a microservice or a monolith.</p>
<p>The consequences of this seemingly simple and innocuous decision are potentially significant, and they're often not fully thought through. A wrong decision can be very expensive, not just financially, but also expensive with regard to the time required to develop the application and the time required to deploy any future changes.</p>
<p>There is no objectively correct approach, though. It all depends on what problem you are trying to solve and what trade-offs you are able to live with.</p>
<p>This article will explain the differences between monoliths and microservices as well as some heuristics to help you decide how to choose between the two architectures.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-monoliths-vs-microservices-an-analogy">Monoliths vs Microservices: An Analogy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-monolith">What is a Monolith?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-are-microservices">What are Microservices?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-data-management-in-microservices">Data Management in Microservices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-database-isolation-in-microservices">Database Isolation in Microservices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-choose-between-monoliths-and-microservices">How to Choose Between Monoliths and Microservices</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-you-should-start-with-a-monolith">Why you should start with a Monolith</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-you-should-start-with-a-microservice">Why you should start with a Microservice</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-hybrid-architecture-a-middle-ground">Hybrid Architecture – A Middle Ground</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-together">Bringing it Together</a></p>
</li>
</ol>
<h2 id="heading-monoliths-vs-microservices-an-analogy">Monoliths vs Microservices: An Analogy</h2>
<p>Before we go into the technical details of monoliths and microservices, let’s quickly explain the difference between the two architectures using an analogy.</p>
<p>A monolithic architecture is like a typical restaurant, where all kinds of dishes are prepared in one large kitchen and a single menu is presented to guests to choose from.</p>
<p>Just as the restaurant offers everything from starters to desserts in one place, a monolith includes all functionalities in one codebase.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a75fc63-2d14-4379-819f-24cfa8c9d8fe_1504x603.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A typical restaurant is like a monolithic application</em></p>
<p>A microservice architecture is like a food court composed of several small, specialised stalls, each serving a different type of cuisine. Here, you can pick and choose dishes from various stalls, each expertly preparing its own menu.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8efa02-6ab9-4013-bc09-18343063139a_2462x1394.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A food court is like a microservice application</em></p>
<p>In a microservice architecture, the application is divided into smaller, independent services. Just as each stall in the food court manages its own menu, staff, and kitchen, in a microservice architecture, different services run separately and are responsible for handling their specific functionalities.</p>
<p>Customers can pick and choose dishes from any stall, mixing and matching as they like, just as different microservices can be used in combination to create a comprehensive application. Each service is self-contained and communicates with other services through simple, well-defined interfaces.</p>
<h2 id="heading-what-is-a-monolith">What is a Monolith?</h2>
<p>In a monolith, all the code needed for all the features of the application is in a single codebase and gets deployed as a single unit.</p>
<p>Let's look at an e-commerce application, for example. Some of the important features of an e-commerce application are:</p>
<ol>
<li><p><strong>Product search service</strong>: Manages product listings, descriptions, inventory, prices, and categories. It's responsible for providing up-to-date information about products to other services and users.</p>
</li>
<li><p><strong>Payment service</strong>: Handles processing of payments and transactions. It interacts with external payment gateways and provides secure payment options to customers.</p>
</li>
<li><p><strong>Order management service</strong>: Manages the lifecycle of customer orders from creation to completion. This includes handling order processing, status updates and order cancellation.</p>
</li>
<li><p><strong>Recommendation service</strong>: Provides personalised product recommendations to users based on their search history and past purchases.</p>
</li>
</ol>
<p>In a monolithic application, the code for these features will be in a single codebase and deployed as a single unit. This is illustrated in the image below where the application is deployed to a single server with a separate database.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35d7463a-4c95-4f64-81c1-7a41bdb21d45_2246x752.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Monolithic e-commerce application deployed on a single server</em></p>
<p>The database is hosted on a separate server to improve performance and security, while the application servers handle the business logic.</p>
<p>Even in a monolithic architecture, the application can be duplicated and deployed across multiple servers, with a load balancer distributing traffic between the servers. This is illustrated below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808895de-b53d-4b39-9adf-55007d185976_2442x1358.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Monolithic e-commerce application deployed on two separate servers</em></p>
<h2 id="heading-what-are-microservices">What are Microservices?</h2>
<p>Microservices are independently deployable services modeled around a business domain.</p>
<p>In contrast to a monolithic architecture, where all the application components are tightly integrated and deployed as a single unit, a microservices architecture breaks down the application into smaller, independently deployable services. Each service runs its own process and communicates with other services over a network, typically using <a target="_blank" href="https://lightcloud.substack.com/i/137067496/rest">HTTP/REST</a>, <a target="_blank" href="https://lightcloud.substack.com/i/137067496/rpc">RPC</a>, or message queues.</p>
<p>We can break the monolithic e-commerce application discussed above into a microservice architecture, as shown below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9c9f2b4-fb93-411d-88dc-330628b222f5_2440x1022.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Microservice e-commerce application</em></p>
<p>The following are some key differences between the monolithic and microservices e-commerce application:</p>
<p>In the microservice architecture, every feature of the application is in a separate codebase. This separation ensures we have independently deployable services modeled around business domains (Product Search Service, Payment Service, Order Management Service and Recommendation Service).</p>
<p>Having a separate codebase for every service ensures:</p>
<ol>
<li><p><strong>Simplified deployment:</strong> With each service in its own codebase, it can be updated, tested, and deployed independently of others.</p>
</li>
<li><p><strong>Fault Tolerance</strong>: Separate codebases contribute to fault tolerance. If one service experiences a failure, it does not necessarily compromise the operation of others. This is crucial for maintaining the overall system's availability and reliability. For example, if the payment service fails, only customers that want to purchase an item will be affected. Other customers can still search through the application for things to buy, track existing orders, and get recommendations for things they might want to buy.</p>
</li>
<li><p><strong>Technology Flexibility</strong>: Separate codebases allow each service to be developed using the technology stack best suited to its needs. Different teams can choose different programming languages, frameworks, or databases depending on what works best for the specific functionality of that service.</p>
</li>
<li><p><strong>Independent Scaling</strong>: Each service is deployed on its own servers. The servers hosting each service can be scaled independently based on its specific demand and resource requirements. This is much more efficient than scaling a monolithic application, where scaling up often means scaling the entire application even if only one part of it is under heavy load. For example, the payment service might be really busy during a promotion/sale. It can be scaled independently instead of scaling the entire application, which would be a waste of money.</p>
</li>
</ol>
<p>Each service has its own database (if it needs a database). This ensures:</p>
<ol>
<li><p>Every microservice can run independently of other services. If every service used the same database (as is the case in a monolithic application), a database failure will bring down the entire application.</p>
</li>
<li><p>The databases can be scaled independently as needed. Some databases will be busier than others, so having the flexibility to scale them independently is useful.</p>
</li>
<li><p>Every microservice uses the right type of database. Some microservices might function better with different types of databases. For example, Elasticsearch would be ideal for the product search database of the e-commerce application due to its powerful full-text search capabilities, while a relational SQL database will be better suited for the order and payment databases.</p>
</li>
</ol>
<p>Finally, an <a target="_blank" href="https://lightcloud.substack.com/p/api-gateway-explained">API Gateway</a> sits in front of the services. It acts as the middleman between users and the many services they may need to access, handling <a target="_blank" href="https://lightcloud.substack.com/i/138365595/authorisation-and-authentication">authorisation and authentication</a>, <a target="_blank" href="https://lightcloud.substack.com/i/138365595/request-routing">request routing</a>, and <a target="_blank" href="https://lightcloud.substack.com/i/138365595/rate-limiting">rate limiting</a>.</p>
<h3 id="heading-data-management-in-microservices">Data Management in Microservices</h3>
<p>Managing data between services is the most complex part of a microservice architecture. Communication between services is either synchronous or asynchronous.</p>
<p><strong>Synchronous Communication:</strong> Services communicate directly with each other. This is a straightforward approach, easy to understand and implement.</p>
<p>For example, in an e-commerce application, when a customer places an order, the Order Management Service might directly call the Product Search Service to check if the item is in stock before proceeding.</p>
<p><strong>Asynchronous Communication:</strong> Services do not wait for a direct response from another service. Instead, they communicate through events or messages using a message broker.</p>
<p>In the e-commerce example, when a new order is placed, the Order Management Service will publish an "Order Created" event to a message queue. The Product Search Service, subscribing to this queue, reacts to the event at its own pace and updates the inventory accordingly. This decouples the services, allowing them to operate and scale independently.</p>
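<p>As a rough sketch, here is the asynchronous flow in Python, with an in-memory queue standing in for a real message broker (such as RabbitMQ or Kafka) and invented service and event names:</p>

```python
import queue

# A simple in-memory queue standing in for a real message broker.
broker = queue.Queue()

inventory = {"sku-1": 10}

def order_service_place_order(order_id):
    # The Order Management Service publishes an event and moves on;
    # it does not wait for the inventory to be updated.
    broker.put({"type": "OrderCreated", "order_id": order_id})

def product_service_consume():
    # The Product Search Service processes events at its own pace.
    while not broker.empty():
        event = broker.get()
        if event["type"] == "OrderCreated":
            inventory["sku-1"] -= 1

order_service_place_order("order-42")
product_service_consume()
print(inventory["sku-1"])  # 9
```

<p>Because the publisher never blocks on the consumer, either side can be scaled or restarted independently.</p>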
<p>Synchronous communication is simpler to understand and implement, but it couples services together and lacks <a target="_blank" href="https://lightcloud.substack.com/i/59017006/fault-tolerance">fault tolerance</a>: if the service being called is down, the caller fails too.</p>
<h3 id="heading-database-isolation-in-microservices">Database Isolation in Microservices</h3>
<p>In a microservice architecture, it is a standard practice to prevent services from directly accessing the databases of other services. You'd typically do this to ensure that each service can manage its data schema independently, without affecting other services.</p>
<p>Looking back at our e-commerce example, suppose the Payment Service decides to change its data schema and rename a column called “amount” to “order_value”, as “amount” can be quite an ambiguous term. If the Order Management Service were directly querying the Payment Service’s database, any direct SQL queries from the Order Management Service to the Payment Service’s database on this column would fail because of this schema change.</p>
<p>To handle these dependencies and changes securely and efficiently, the services should interact via APIs rather than via direct database access. By providing an API as an interface, the Payment Service can abstract the complexities of its underlying data model.</p>
<p>For instance, regardless of whether the database field is named “amount” or “order_value”, the API can expose a parameter called “payment_amount”. This allows the Payment Service to internally map “payment_amount” to whatever the current database schema is using.</p>
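<p>A minimal sketch of this mapping (the table, column, and field names are all illustrative):</p>

```python
# Internal storage uses the current schema's column name, which can
# change without breaking callers of the API.
_DB_COLUMN = "order_value"  # previously named "amount"
_payments_table = {"pay-1": {_DB_COLUMN: 49.99}}

def get_payment(payment_id):
    """Public API: always exposes 'payment_amount', regardless of the
    internal column name."""
    row = _payments_table[payment_id]
    return {"payment_id": payment_id, "payment_amount": row[_DB_COLUMN]}

print(get_payment("pay-1"))  # {'payment_id': 'pay-1', 'payment_amount': 49.99}
```

<p>If the column is renamed again, only the mapping inside <code>get_payment</code> changes; every consumer of the API is unaffected.</p>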
<h2 id="heading-how-to-choose-between-monoliths-and-microservices">How to Choose Between Monoliths and Microservices</h2>
<p>Choosing between a monolith and a microservice architecture depends on what problem you are trying to solve and what trade-offs you are able to live with.</p>
<p>Microservices are newer and more popular with the large technology companies. Most technical books and blogs cover the architectures of these large companies.</p>
<p>But the engineering problems of large companies operating at scale are not necessarily the same engineering problems faced by smaller companies.</p>
<p>Copying what the large technology companies do is reasoning by analogy. This is not necessarily wrong, but it can introduce unnecessary complexities for a smaller company/startup. Better to reason by first principles, or better yet, choose better analogues.</p>
<p>You can look at what other startups are doing, or what the technology giants of today did when they were much smaller. For example, <a target="_blank" href="https://blog.dreamfactory.com/microservices-examples/">Etsy, Netflix and Uber</a> all started as monoliths before migrating to a microservice architecture.</p>
<h3 id="heading-why-you-should-start-with-a-monolith">Why you should start with a Monolith</h3>
<p>Creating an application should be done for one reason and one reason alone: to build something that people want to use. Users of your application do not care if you use a microservice or monolith. They care that you are solving a problem for them.</p>
<p>To quote <a target="_blank" href="https://paulgraham.com/startuplessons.html">Paul Graham</a>:</p>
<blockquote>
<p>“Almost everyone’s initial plan is broken. If companies stuck to their initial plans, Microsoft would be selling programming languages and Apple would be selling printed circuit boards. In both cases, their customers told them what their business should be and they were smart enough to listen”.</p>
</blockquote>
<p>There is arguably no need to spend so much time designing and implementing a highly complex microservice architecture when you are not even sure that you are building something that people want to use.</p>
<p>So, why should you start with a monolith when building an application?</p>
<ol>
<li><p><strong>Simplicity</strong>: A monolith does not require dealing with the complexities of a distributed system, such as network latency, data consistency, or inter-service communication. This lack of complexity not only makes the initial development phase smoother but also reduces the overhead for new developers, who can contribute more quickly without having to understand the intricacies of a distributed system.</p>
</li>
<li><p><strong>Ease of Iteration</strong>: In the early stages of a product, rapid iteration based on user feedback is critical. The product direction is still evolving, and quick pivots or adjustments are necessary based on user input. This is usually easier to achieve with a simple monolithic architecture.</p>
</li>
<li><p><strong>Low Cost</strong>: Running a monolithic application can be less expensive in the early stages, as it typically requires less infrastructure and fewer resources than a distributed microservices architecture. This is crucial for startups and small businesses where money can be in short supply.</p>
</li>
</ol>
<p>Beginning with a monolith often aligns better with the practical realities of launching and iterating on a new application.</p>
<h3 id="heading-why-you-should-start-with-a-microservice">Why you should start with a Microservice</h3>
<ol>
<li><p><strong>Scalability from the Start:</strong> One of the strongest arguments for microservices is their innate ability to scale. If you anticipate rapid growth in usage or data volume, microservices allow you to scale specific components of the application that require more resources without scaling the entire application. This can be particularly valuable for applications expected to handle varying loads or for services that might grow unpredictably.</p>
</li>
<li><p><strong>Resilience:</strong> Microservices enhance the overall resilience of the application. Because each service is independent, failures in one area are less likely to bring down the whole system. This isolation helps in maintaining resilience by ensuring that parts of your application can still function even if others fail.</p>
</li>
<li><p><strong>Flexible Tech Stacks:</strong> Microservices allow different teams to use the technology stacks that are best suited for their specific needs. Going back to our e-commerce example, the other services may be written in Java, but the recommendation service can be written in Python if the team responsible for building it has more expertise in Python. This is a very crude example, but the principle holds: a microservice architecture gives teams flexibility over which technology they use. Taken to its extreme, this flexibility can also be a flaw, since it can add complexity to the overall architecture. Introducing a different language for a service might require different build tools and deployment processes.</p>
</li>
</ol>
<h2 id="heading-hybrid-architecture-a-middle-ground">Hybrid Architecture – A Middle Ground</h2>
<p>The formal, academic definition of a microservice is that it is an independently deployable service modeled around a business domain. Taken strictly, this definition means each business domain should be a separate service.</p>
<p>But you're not confined to this strict definition when it comes to implementing a design. Let’s look at our e-commerce microservice application again.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc244781f-08a0-42b1-a928-ddc95e02d437_2440x1022.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Microservice e-commerce application</em></p>
<p>We can choose to keep the product search service as a microservice. Since more people search for products than buy them, we may want the ability to scale this service independently of the others.</p>
<p>Also, this service will need its own dedicated full-text search database like Elasticsearch or Solr. SQL databases are not well-suited for full-text search and product filtering.</p>
<p>We can also choose to keep the recommendation service as a microservice since this will be written in a different language from the other services. This service will also need its own separate graph database like Neo4j to help make recommendations to users about what to buy based on their past searches and purchases.</p>
<p>We are left with the payment service and the order management service which can be combined into a monolith. This is illustrated below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F113147b6-7a41-49a1-a4ad-88130de2f9fd_2334x922.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Hybrid monolithic/microservice architecture</em></p>
<p>In this example, we haven’t followed the academic definition of a microservice architecture, where every service is modeled around a business domain. Instead, we have chosen to be pragmatic and create microservices because we want to use a specific technology and because we want to be able to scale some services independently.</p>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>In a monolith, all the code needed for all the features of an application is in a single codebase and is deployed as a single unit. In a microservices architecture, the application is divided into smaller, independent components, each responsible for specific features or functionalities. Each microservice has its own codebase and can be deployed independently of others.</p>
<p>Choosing between a monolith and a microservice depends on the problem you are trying to solve and what trade-offs you are able to live with.</p>
<p>Monoliths provide simplicity, ease of iteration and low cost. Microservices provide scalability, resilience and a more flexible tech stack.</p>
<p>For startups, the simplicity, ease of iteration, and cost-efficiency of a monolithic architecture make it an ideal initial choice, allowing them to focus on developing core features and finding product-market fit without the overhead of managing a distributed system.</p>
<p>For a more established company with growing needs for scalability, resilience, and technological flexibility, a microservice architecture can be a better choice.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Idempotence? Explained with Real-World Examples ]]>
                </title>
                <description>
                    <![CDATA[ Idempotence is a property of an operation that ensures that, if the operation is repeated once or more than once, you get the same result. Apply it once or more and the outcome's the same, idempotence is the name. All rhyming aside, idempotence is an... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/idempotence-explained/</link>
                <guid isPermaLink="false">66d45e177df3a1f32ee7f80d</guid>
                
                    <category>
                        <![CDATA[ idempotence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Thu, 29 Feb 2024 00:44:58 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/02/cover--2-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Idempotence is a property of an operation that ensures that, if the operation is repeated once or more than once, you get the same result.</p>
<p>Apply it once or more and the outcome's the same, idempotence is the name.</p>
<p>All rhyming aside, idempotence is an important concept often used in the design of everyday things. The underlying principle of idempotence – where repeated actions do not change the outcome beyond the initial action – has been applied implicitly or explicitly both to the physical world and the digital world of cloud computing and software applications.</p>
<p>This article will show you some examples of idempotence in the physical world, as well as how it is used in software architectures to build reliable and fault-tolerant systems.</p>
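<p>In code terms, an operation <code>f</code> is idempotent if <code>f(f(x)) == f(x)</code>. A minimal Python illustration:</p>

```python
def absolute(x):
    return abs(x)

# Idempotent: applying the operation twice gives the same result as once.
assert absolute(absolute(-5)) == absolute(-5)

def increment(x):
    return x + 1

# Not idempotent: applying it twice changes the outcome.
assert increment(increment(0)) != increment(0)
```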
<h2 id="heading-idempotence-in-the-physical-world">Idempotence in the Physical World</h2>
<p>Idempotent buttons are used in everyday systems, like traffic light buttons, the stop button on London buses, and elevator call buttons.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb07003d3-ffc7-4074-88d1-cee009fb7119_2374x1308.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Some examples of idempotence in the physical world: a traffic light button for pedestrians, a stop button in a London bus, and an elevator call button.</em></p>
<p>Pressing a traffic light button multiple times does not make the light change faster – it simply registers the need for a pedestrian crossing once.</p>
<p>Similarly, pressing the stop button on a London bus signals the driver to stop at the next stop – but pressing it multiple times does not change the bus's route, make the bus stop faster, or cancel the initial stop request.</p>
<h2 id="heading-idempotence-patterns-in-software-architectures">Idempotence Patterns in Software Architectures</h2>
<p>Different patterns of idempotence are used in software architectures. We'll discuss two popular ones here.</p>
<h3 id="heading-api-design">API Design</h3>
<p>In REST APIs, HTTP methods like GET, HEAD, PUT, and DELETE are inherently idempotent.</p>
<ul>
<li><p>GET: Used to retrieve data from a server. Multiple GET requests to the same resource are safe and should return the same data, assuming no changes have been made to the resource in the meantime.</p>
</li>
<li><p>HEAD: Similar to GET, but it retrieves only the header information about a resource. Since it does not return a body, it's inherently safe and idempotent.</p>
</li>
<li><p>PUT: Replaces a resource's current representation with the request payload. Repeatedly putting the same data to the same resource endpoint will leave the resource in the same state.</p>
</li>
<li><p>DELETE: Removes a resource. Deleting the same resource multiple times results in the same outcome: the resource is removed after the first successful request, and subsequent DELETE requests typically return a 404 Not Found or 204 No Content status, indicating that there's no resource to delete.</p>
</li>
</ul>
<p>A POST operation is not inherently idempotent, since it is typically used to create resources. But some implementations of POST can be designed to be idempotent.</p>
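<p>A toy in-memory resource store (all names invented for illustration) makes the difference concrete:</p>

```python
resources = {}
next_id = 0

def put(resource_id, data):
    # PUT replaces the resource at a known ID: repeating it
    # leaves the store in the same state.
    resources[resource_id] = data

def post(data):
    # POST creates a new resource each time: repeating it changes state.
    global next_id
    next_id += 1
    resources[next_id] = data
    return next_id

put("a", {"name": "widget"})
put("a", {"name": "widget"})   # no further change: still one resource "a"
post({"name": "widget"})
post({"name": "widget"})       # two distinct resources created
print(len(resources))  # 3
```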
<p>A good example of this is the Post/Redirect/Get pattern, also referred to as the PRG pattern. This pattern is particularly useful for handling form submissions and can mitigate issues caused by users refreshing or bookmarking pages that make changes to the server's state. Let’s examine in detail how this works.</p>
<h4 id="heading-how-to-make-a-post-operation-idempotent">How to Make a POST Operation Idempotent</h4>
<p>The sequence diagram below explains how the PRG pattern works to prevent duplicate orders in an e-commerce web application:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf24083-d54b-4c97-8607-bebe4955fe71_1044x774.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Sequence diagram showing how the PRG pattern works to "convert" a POST operation to an idempotent GET operation</em></p>
<ol>
<li><p><strong>Post</strong>: When the user submits a form to place an order, the browser sends a POST request to the server.</p>
</li>
<li><p><strong>Redirect</strong>: After the server processes the POST request (for example, placing the order), it sends a redirect response to the browser, usually with a 303 (See Other) or 302 HTTP status code, directing it to a new URL. This URL is typically an order confirmation page.</p>
</li>
<li><p><strong>Get</strong>: The browser then makes a GET request to the URL provided by the redirect. The user sees the page that confirms their order or brings them back to a safe state where no duplicate orders can be accidentally created.</p>
</li>
</ol>
<p>The key benefit of using the PRG pattern is that it turns the POST request into a GET request, which is idempotent. This means that refreshing the page at the end of the process will not cause the same order to be submitted more than once, because refreshing will only repeat the GET request, not the initial POST request that submitted the order.</p>
<p>This pattern enhances the user experience by preventing common mistakes, such as double-clicking the submit button or refreshing the page, from creating unwanted duplicate orders.</p>
<p>It also makes the application more robust and user-friendly, as users can safely refresh the confirmation page or bookmark it without worrying about the order being duplicated.</p>
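<p>The PRG flow can be modeled with a couple of toy handler functions – no web framework is involved, and the status codes and paths are only illustrative:</p>

```python
# Toy handlers modeling the Post/Redirect/Get flow.
orders = []

def post_order(form):
    orders.append(form)                   # state-changing work happens once
    order_id = len(orders)
    return 303, f"/orders/{order_id}"     # redirect to a confirmation URL

def get_confirmation(url):
    return 200, f"Order confirmed: {url}" # safe to repeat

status, location = post_order({"item": "book"})
status, page = get_confirmation(location)
# "Refreshing" repeats only the GET, so no duplicate order is created.
status, page = get_confirmation(location)
print(len(orders))  # 1
```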
<h3 id="heading-message-queueing-systems">Message Queueing Systems</h3>
<p>A message queue can contribute to making a system idempotent by ensuring that even if a message (representing a request) is delivered multiple times, the operation it triggers is executed only once, or its effect is the same regardless of how many times it's executed.</p>
<p>This is crucial in distributed systems where network failures, system crashes, or other issues can lead to the same message being processed multiple times.</p>
<p>Let’s look at an example that involves making a payment. No customer wants to be accidentally double-charged when making a purchase, so making sure the system is idempotent is very important.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37b5e26b-6892-48c7-8aa8-eaa0c2a39be5_1784x1530.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Sequence diagram showing how a message queue can make a system idempotent and prevent duplicate payments from happening</em></p>
<ol>
<li><p>The message queue sends a message to the payment system to debit an account.</p>
</li>
<li><p>This message has a unique Transaction ID. The Database of Processed IDs maintains a record of Transaction IDs that have been processed. The Payment System checks the Transaction ID against the Database of Processed IDs and checks if this payment has already been processed.</p>
</li>
<li><p>If the message has a transaction ID in the Processed ID Database, it is ignored and treated as already processed. The payment has already been made and does not need to be repeated. The Payment System sends an acknowledgement (ACK) back to the queue to inform it that the message has been ignored. The message queue needs to know that the message has been handled before it deletes the message on its side.</p>
</li>
<li><p>If the message does not have a Transaction ID in the Processed ID database, this means the payment has not been processed before. The payment is therefore processed and the transaction ID is added to the Database of Processed IDs. Ideally, these two steps should be done in a single <a target="_blank" href="https://lightcloud.substack.com/i/140524854/atomicity">atomic transaction</a>. This prevents an unwanted state where the payment is processed but the Transaction ID is never added to the Database of Processed IDs because of a database failure, networking issue or any other fault.</p>
</li>
<li><p>In the final step, the Payment System sends an acknowledgement (ACK) back to the queue to inform it that the message has been successfully processed. This acknowledgement informs the queue that the message has been successfully received, processed, and no longer needs to be kept in the queue for future delivery. This prevents the message from being sent again, ensuring that the system is idempotent.</p>
</li>
</ol>
<p>In this example, the system ensures idempotency by:</p>
<ul>
<li><p>Checking Transaction IDs for new payments against a database of payments already made</p>
</li>
<li><p>Sending the acknowledgement to the queue after the message is ignored or processed by the Payment System. This ensures that the same message is only sent to the Payment System once.</p>
</li>
</ul>
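<p>The steps above can be sketched as an idempotent consumer, with an in-memory set standing in for the Database of Processed IDs (all names are invented for illustration):</p>

```python
import queue

processed_ids = set()        # stands in for the Database of Processed IDs
balance = {"acct-1": 100}
broker = queue.Queue()

def handle(message):
    tx_id = message["transaction_id"]
    if tx_id in processed_ids:
        return "ACK (ignored: already processed)"
    # Ideally these two steps run in a single atomic transaction.
    balance[message["account"]] -= message["amount"]
    processed_ids.add(tx_id)
    return "ACK (processed)"

msg = {"transaction_id": "tx-7", "account": "acct-1", "amount": 30}
broker.put(msg)
broker.put(msg)  # duplicate delivery, e.g. after a network failure
while not broker.empty():
    handle(broker.get())
print(balance["acct-1"])  # 70: the duplicate did not double-charge
```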
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>Idempotence solves one fundamental problem: how do you handle operations that, intentionally or by accident, can be repeated? Idempotence ensures that no matter how many times an operation is applied, the outcome remains the same after the first application, mitigating the risks associated with repeated actions.</p>
<p>The underlying principle of idempotence is used in the design of everyday objects we interact with in the physical world, from traffic light buttons for pedestrians to elevator call buttons.</p>
<p>In the abstract world of software architecture, idempotence ensures that repeated operations have the same effect as performing that operation just once. Idempotence allows us to build reliable and fault-tolerant architectures.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Databases Guarantee Isolation – Pessimistic vs Optimistic Concurrency Control Explained ]]>
                </title>
                <description>
                    <![CDATA[ ACID (Atomicity, Consistency, Isolation, and Durability) is a set of guarantees when working with a DBMS. Pessimistic and optimistic concurrency control explains how databases achieve the “I” in ACID. Isolation is a guarantee that concurrently runnin... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-databases-guarantee-isolation/</link>
                <guid isPermaLink="false">66d45e12f855545810e93431</guid>
                
                    <category>
                        <![CDATA[ concurrency ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 05 Feb 2024 22:41:18 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/02/cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>ACID (Atomicity, Consistency, Isolation, and Durability) is a set of guarantees when working with a DBMS. Pessimistic and optimistic concurrency control explains how databases achieve the “I” in ACID.</p>
<p>Isolation is a guarantee that concurrently running transactions should not interfere with each other. This is arguably the most important ACID property to understand, because different DBMSs often have different default isolation levels, and you may need to change the level based on what your application needs.</p>
<p>In a <a target="_blank" href="https://lightcloud.substack.com/p/acid-databases-explained">previous article</a>, I explained the two main isolation levels used by most DBMS. These are the <a target="_blank" href="https://lightcloud.substack.com/i/140524854/read-committed">read committed</a> and <a target="_blank" href="https://lightcloud.substack.com/i/140524854/repeatable-read">repeatable read</a> isolation levels.</p>
<p>Pessimistic and optimistic concurrency controls essentially explain some of the ways a database is able to achieve these two isolation guarantees.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-pessimistic-concurrency-control">Pessimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-library-analogy-for-pessimistic-concurrency-control">Pessimistic Concurrency Control Analogy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-simple-real-world-example-of-pessimistic-concurrency-control-in-action">Real-World Example of Pessimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-benefits-and-challenges-of-pessimistic-concurrency-control">Pros and Cons of Pessimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-pessimistic-concurrency-controls-guarantee-the-read-committed-isolation-level">How it Guarantees the Read Committed Isolation Level</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-optimistic-concurrency-control">Optimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-simple-real-world-example-of-optimistic-concurrency-control-in-action">Real-World Example of Optimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-benefits-and-challenges-of-optimistic-concurrency-control">Pros and Cons of Optimistic Concurrency Control</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-optimistic-concurrency-controls-guarantee-the-repeatable-read-isolation-level">How it Guarantees the Repeatable Read Isolation Level</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-together">Bringing it Together</a></p>
</li>
</ol>
<h2 id="heading-pessimistic-concurrency-control">Pessimistic Concurrency Control</h2>
<p>With pessimistic concurrency control, the DBMS assumes that conflicts between transactions are likely to occur. It is pessimistic – that is, it assumes that if something can go wrong, it will go wrong. This pessimism prevents conflicts from occurring by blocking them before they get a chance to start.</p>
<p>To prevent these conflicts, it <em>locks</em> the data that a transaction is using until the transaction is completed. This approach is 'pessimistic' because it assumes the worst-case scenario – that every transaction might lead to a <em>conflict</em>. The data is therefore locked in order to prevent conflicts from happening.</p>
<p>I've mentioned two technical terms here that need clarification: <em>locks</em> and <em>conflict</em>.</p>
<h3 id="heading-what-are-locks">What are locks?</h3>
<p>A lock is a mechanism used to control access to a database item, like a row or table. Locks ensure data integrity when multiple transactions are occurring at the same time.</p>
<p>In very simple terms, a lock is analogous to a reservation on the database item. A reservation, be it for a restaurant, a hotel, or a train, prevents other people from using the resource you reserved for a fixed duration of time. Locks work in a similar way.</p>
<p>There are two types of locks: a read lock and a write lock.</p>
<p>A read lock can be shared by multiple transactions trying to read the same database item. But it blocks other transactions from updating that database item.</p>
<p>A write lock is exclusive – that is, it can only be held by a single transaction. A transaction with a write lock on a database item blocks every other transaction from reading or updating that database item.</p>
<h3 id="heading-what-are-conflicts">What are conflicts?</h3>
<p>A conflict refers to a situation where multiple transactions are attempting to access and modify the same data concurrently, in a way that could lead to inconsistencies or errors in the database.</p>
<h3 id="heading-a-library-analogy-for-pessimistic-concurrency-control">A Library Analogy for Pessimistic Concurrency Control</h3>
<p>First, let us describe an analogy for a write lock.</p>
<p>Imagine you're at a library, and you want to borrow a hard copy of a popular book, say, The Great Gatsby by F. Scott Fitzgerald.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd683922-af28-4886-a6dc-6c121ac7e915_1858x1054.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Write locks are analogous to borrowing a physical book from the library</em></p>
<p>With a write lock, the librarian assumes that there can be conflicts over who gets to borrow the book. So, they implement a strict rule to avoid conflicts: only one person can hold the reservation for a physical book at a time.</p>
<p>When you reserve the book, no one else can borrow it. The book is available to be reserved again only once it is returned. This is similar to how a write lock works.</p>
<p>Write locks are exclusive. This means that they can only be held by a single transaction at any time. Similarly, reserving a physical book from the library means no one else has access to it. Only the person with the reservation can read the book, or write in it (although writing in a library book is bad form).</p>
<p>Read locks work a bit differently.</p>
<p>A read lock is analogous to someone making a reservation to borrow an e-book. Borrowing an e-book is not a very popular thing to do, but some libraries do have such a service.</p>
<p>Many people can make the same reservation for the same e-book without any conflict. One person borrowing an e-book version of The Great Gatsby does not stop others from doing the same. But no one who borrows an e-book can update it, by scribbling notes in it that can be seen by others, for example.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2c7acc-c32e-47f6-a412-2000ae14da2b_2156x1344.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Read locks are analogous to borrowing an e-book from the library</em></p>
<p>Pessimistic concurrency control is very safe because it prevents conflicts from occurring by blocking them before they get a chance to start. A write lock on a database item prevents other transactions from reading or updating that item while that lock is held, similar to how a library stops more than one person from trying to borrow the same physical book at the same time.</p>
<p>A read lock on a database item allows other transactions to also obtain a read lock for that item, but prevents transactions from updating that item. This is analogous to borrowing an e-book, where multiple people can borrow the same e-book at the same time, but can’t make any updates to it.</p>
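<p>The shared/exclusive semantics can be sketched with a minimal reader-writer lock in Python. This is a simplified illustration of the idea, not how a real DBMS lock manager is implemented:</p>

```python
import threading

class RWLock:
    """Minimal sketch: shared read locks, exclusive write lock.
    The try_* methods return False instead of blocking on conflict."""
    def __init__(self):
        self._mutex = threading.Lock()
        self._readers = 0
        self._writer = False

    def try_acquire_read(self):
        with self._mutex:
            if self._writer:
                return False          # a held write lock excludes readers
            self._readers += 1
            return True

    def release_read(self):
        with self._mutex:
            self._readers -= 1

    def try_acquire_write(self):
        with self._mutex:
            if self._writer or self._readers:
                return False          # writers need exclusive access
            self._writer = True
            return True

    def release_write(self):
        with self._mutex:
            self._writer = False

lock = RWLock()
assert lock.try_acquire_read()        # many readers may share the lock...
assert lock.try_acquire_read()
assert not lock.try_acquire_write()   # ...but they block writers
lock.release_read(); lock.release_read()
assert lock.try_acquire_write()       # exclusive once readers are gone
assert not lock.try_acquire_read()
```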
<h3 id="heading-a-simple-real-world-example-of-pessimistic-concurrency-control-in-action">A Simple Real-World Example of Pessimistic Concurrency Control in Action</h3>
<p>Let's illustrate how pessimistic concurrency control works using a simple example involving a bank balance database table. Assume we have a table named Accounts with the following columns: AccountID and Balance.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9e2dd5-5abe-46ea-bbb6-033f1d5bc663_1096x534.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Database columns for AccountID and Balance</em></p>
<p>Two transactions, T1 and T2, intend to update the balance of account 12345. T1 wants to withdraw $300, and T2 wants to deposit $400. At the end of these two transactions, the account balance should read $1600.</p>
<p>Here are the steps of how this will work using write locks:</p>
<ol>
<li><p>Start of T1 (Withdrawal): T1 requests to update the balance of AccountID 12345. The database system places an exclusive write lock on the row for AccountID 12345, preventing other transactions from reading or writing to this row until T1 is completed. T1 reads the balance ($1500).</p>
</li>
<li><p>T1 Processing: T1 calculates the new balance as $1200 ($1500 - $300).</p>
</li>
<li><p>Commit T1: T1 writes the new balance ($1200) back to the database. Upon successful commit, T1 releases the exclusive lock on AccountID 12345.</p>
</li>
<li><p>Start of T2 (Deposit) After T1 Completes: Now that T1 has completed and the lock is released, T2 can start. T2 attempts to read and update the balance for AccountID 12345. The database system places an exclusive lock on the row for AccountID 12345 for T2, ensuring no other transactions can interfere. T2 reads the updated balance ($1200).</p>
</li>
<li><p>T2 Processing: T2 calculates the new balance as $1600 ($1200 + $400).</p>
</li>
<li><p>Commit T2: T2 writes the new balance ($1600) back to the database. Upon successful commit, T2 releases the exclusive lock on AccountID 12345.</p>
</li>
<li><p>Result: The Accounts table is updated using locks. After T1: $1200. After T2: $1600.</p>
</li>
</ol>
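<p>The sequence above can be sketched in Python, with a <code>threading.Lock</code> standing in for the DBMS's exclusive row lock (a simplification: a real DBMS acquires the lock inside its storage engine, per row):</p>

```python
import threading

accounts = {12345: 1500}      # AccountID -> Balance
row_lock = threading.Lock()   # stands in for the exclusive write lock on the row

def update_balance(account_id, amount):
    """Read-modify-write under an exclusive lock, as in steps 1-7 above."""
    with row_lock:                              # acquire the write lock
        balance = accounts[account_id]          # read
        accounts[account_id] = balance + amount # write the new balance
    # lock released here, on "commit" (leaving the `with` block)

t1 = threading.Thread(target=update_balance, args=(12345, -300))  # T1: withdraw $300
t2 = threading.Thread(target=update_balance, args=(12345, 400))   # T2: deposit $400
t1.start(); t2.start()
t1.join(); t2.join()
print(accounts[12345])  # 1600, regardless of which transaction ran first
```

<p>Because the lock serialises the two read-modify-write sequences, the final balance is $1600 no matter which thread wins the race to acquire the lock.</p>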
<p>Without a write lock in this example, T1 and T2 could read the original balance of $1500 at the same time. So, instead of reading the $1200 balance left after T1 commits, T2 reads the original balance of $1500 and adds $400. This would cause the final balance to be $1500 + $400 = $1900 (instead of $1600).</p>
<p>The absence of locking has created free money, which no customer would complain about. But if money can be conjured out of thin air because of these conflicts, it can also vanish, and accidentally shrinking bank balances are a quick way to make customers unhappy.</p>
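<p>This lost-update anomaly can be replayed deterministically in a few lines of Python, by forcing the bad interleaving where both transactions read before either writes:</p>

```python
accounts = {12345: 1500}

# Both transactions read the row *before* either one writes back.
t1_balance = accounts[12345]   # T1 reads $1500
t2_balance = accounts[12345]   # T2 also reads $1500 (no lock to stop it)

accounts[12345] = t1_balance - 300   # T1 commits: balance is now $1200
accounts[12345] = t2_balance + 400   # T2 commits: overwrites T1's update!

print(accounts[12345])  # 1900 -- T1's $300 withdrawal has been lost
```

<p>T2's write is based on a stale read, so T1's withdrawal silently disappears. This is exactly the conflict the write lock prevents.</p>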
<h3 id="heading-benefits-and-challenges-of-pessimistic-concurrency-control">Benefits and Challenges of Pessimistic Concurrency Control</h3>
<p>Just like reserving a book ensures that it's set aside for one person, pessimistic concurrency control locks data for a single transaction. Other transactions cannot access or modify this data until the lock is released.</p>
<p>This method prevents two people from trying to take out the same popular book at the same time, thereby avoiding disputes. Similarly, in databases, it stops conflicts due to concurrent transactions before they get a chance to start.</p>
<p>But this approach can be inefficient. The reserved book might sit on the reserved shelf for a while, stopping other people from reading it.</p>
<p>In databases, this locking mechanism can lead to underutilisation of resources and slower transaction completion, since a subset of the data is locked and inaccessible to other transactions.</p>
<h3 id="heading-how-pessimistic-concurrency-controls-guarantee-the-read-committed-isolation-level">How Pessimistic Concurrency Controls Guarantee the Read Committed Isolation Level</h3>
<p>So, how exactly does pessimistic concurrency control ensure the isolation guarantee, that is, the “I” in ACID? The implementation details can vary across different DBMS, but the explanation here shows the general approach.</p>
<p>Recall that <a target="_blank" href="https://lightcloud.substack.com/i/140524854/read-committed">the read committed isolation</a> level prevents dirty writes and dirty reads.</p>
<h4 id="heading-preventing-dirty-writes">Preventing Dirty Writes</h4>
<p>Overwriting data that has already been written by another transaction but not yet committed is called a dirty write. A common approach to preventing dirty writes is to use pessimistic concurrency control. For example, by using a write lock at the row level.</p>
<p>When a transaction wants to modify a row, it acquires a lock on that row and holds it until the transaction is complete. Recall that write locks can only be held by a single transaction. This prevents another transaction from acquiring a lock to modify that row.</p>
<h4 id="heading-preventing-dirty-reads">Preventing Dirty Reads</h4>
<p>Reading data from another transaction that has not yet been committed is called a dirty read. Dirty reads are prevented using either a read or write lock. Once a transaction acquires a read lock on a database item, it will prevent updates to that item.</p>
<p>But what happens if you are trying to read something that is already being updated but the transaction has not yet committed? In this instance, the write lock saves the day again.</p>
<p>Since write locks are exclusive (can’t be shared with other transactions), any transaction wanting to read the same database item will have to wait until the transaction with the write lock is committed (or aborted, if it fails). This prevents other transactions from reading uncommitted changes.</p>
<h2 id="heading-optimistic-concurrency-control">Optimistic Concurrency Control</h2>
<p>With optimistic concurrency control, transactions do not obtain locks on data when they read or write. The "Optimistic" in the name comes from assuming that conflicts are unlikely to occur, so locks are not needed. If a conflict does occur, it is still detected and resolved before it can corrupt the data.</p>
<p>Unlike pessimistic concurrency control – which prevents conflicts from occurring by blocking them before they get a chance to start – optimistic concurrency control checks for conflicts at the end of a transaction.</p>
<p>With optimistic concurrency control, multiple transactions can read or update the same database item without acquiring locks. How exactly does this work?</p>
<p>Every time a transaction wants to update a database item, say a row, it will also read two additional columns added to every table by the DBMS – the timestamp and the version number. Before that transaction is committed, it checks if another transaction has made any change(s) to that row by confirming if the version number and timestamp are the same.</p>
<p>If they have changed, that means another transaction has updated that row, so the initial transaction will have to be retried.</p>
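<p>This check-then-write pattern can be expressed as a single conditional UPDATE. The sketch below uses Python's built-in sqlite3 module with the Accounts table from the example that follows (the Timestamp column is omitted for brevity, so only the version number is checked):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, "
             "Balance INTEGER, VersionNumber INTEGER)")
conn.execute("INSERT INTO Accounts VALUES (12345, 1000, 1)")

def optimistic_update(conn, account_id, delta, max_retries=5):
    """Read the row, compute the new balance, and only write it back
    if the version number is still the one we read. Retry on conflict."""
    for _ in range(max_retries):
        balance, version = conn.execute(
            "SELECT Balance, VersionNumber FROM Accounts WHERE AccountID = ?",
            (account_id,)).fetchone()
        cur = conn.execute(
            "UPDATE Accounts SET Balance = ?, VersionNumber = ? "
            "WHERE AccountID = ? AND VersionNumber = ?",  # the conflict check
            (balance + delta, version + 1, account_id, version))
        if cur.rowcount == 1:   # version still matched: our commit wins
            conn.commit()
            return True
        # rowcount == 0 means another transaction changed the row: retry
    return False

optimistic_update(conn, 12345, -200)   # withdraw $200
print(conn.execute("SELECT Balance, VersionNumber FROM Accounts").fetchone())
# (800, 2)
```

<p>No lock is ever taken: the <code>WHERE ... VersionNumber = ?</code> clause does the conflict detection at commit time, and a zero row count tells the caller to retry.</p>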
<h3 id="heading-a-simple-real-world-example-of-optimistic-concurrency-control-in-action">A Simple Real-World Example of Optimistic Concurrency Control in Action</h3>
<p>Let's illustrate how optimistic concurrency control works using a simple example involving a bank balance database table. Assume we have a table named Accounts with the following columns: AccountID, Balance, VersionNumber, and Timestamp.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc46a7d00-c1b9-46c6-80c0-0bc1a7aba9ae_2180x542.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Table showing AccountID, Balance, VersionNumber, and Timestamp columns</em></p>
<p>Two transactions, T1 and T2, intend to update the balance of account 12345 at the same time. T1 wants to withdraw $200, and T2 wants to deposit $300. At the end of these two transactions, the account balance should read $1100.</p>
<p>Here are the steps of how this will work:</p>
<ol>
<li><p>Start of Transactions: T1 reads the balance, version number, and timestamp for AccountID 12345. Simultaneously, T2 reads the same row with the same balance, version number, and timestamp.</p>
</li>
<li><p>Processing: T1 calculates the new balance as $800 ($1000 - $200) but does not write it back immediately. T2 calculates the new balance as $1300 ($1000 + $300) but also waits to commit.</p>
</li>
<li><p>Attempt to Commit T1: Before committing, T1 checks the current VersionNumber and Timestamp of AccountID 12345 in the database. Since no other transaction has modified the row, T1 updates the balance to $800, increments the VersionNumber to 2, updates the Timestamp, and commits successfully.</p>
</li>
<li><p>Attempt to Commit T2: T2 attempts to commit by first verifying the VersionNumber and Timestamp. T2 finds that the VersionNumber and Timestamp have changed (now VersionNumber is 2, and Timestamp is updated), indicating another transaction (T1) has updated the row. Since the version number and timestamp have changed, T2 realises there was a conflict.</p>
</li>
<li><p>Resolution for T2: T2 must restart its transaction. It re-reads the updated balance of $800, the new VersionNumber 2, and the updated Timestamp. T2 recalculates the new balance as $1100 ($800 + $300), updates the VersionNumber to 3, updates the Timestamp, and commits successfully.</p>
</li>
</ol>
<p>Result: The Accounts table is updated sequentially and safely without any locks. After T1: $800, VersionNumber: 2. After T2: $1100, VersionNumber: 3.</p>
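<p>The five steps above can be replayed as a small in-memory simulation (a toy model of the version check, not a real DBMS implementation):</p>

```python
# The row for AccountID 12345: balance and version number.
row = {"balance": 1000, "version": 1}

def read_row():
    return row["balance"], row["version"]

def try_commit(new_balance, version_read):
    """Commit only if no one has bumped the version since we read it."""
    if row["version"] != version_read:
        return False                      # conflict detected: caller must retry
    row["balance"] = new_balance
    row["version"] = version_read + 1
    return True

# Step 1: both transactions read the same snapshot of the row.
t1_bal, t1_ver = read_row()               # T1 sees ($1000, v1)
t2_bal, t2_ver = read_row()               # T2 sees ($1000, v1)

# Step 3: T1 commits its withdrawal first.
assert try_commit(t1_bal - 200, t1_ver)   # balance -> $800, version -> 2

# Step 4: T2's commit fails -- the version is no longer 1.
assert not try_commit(t2_bal + 300, t2_ver)

# Step 5: T2 restarts, re-reads the row, and retries.
t2_bal, t2_ver = read_row()               # now sees ($800, v2)
assert try_commit(t2_bal + 300, t2_ver)

print(row)  # {'balance': 1100, 'version': 3}
```
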
<h3 id="heading-benefits-and-challenges-of-optimistic-concurrency-control">Benefits and Challenges of Optimistic Concurrency Control</h3>
<p>On the positive side, avoiding locks allows for high levels of concurrency. This is particularly beneficial in read-heavy workloads where transactions are less likely to conflict, allowing the system to handle more transactions in a given period. For example, database backups and analytical queries typically used in a data warehouse.</p>
<p>But in scenarios where conflicts are frequent, the cost of repeatedly rolling back and retrying transactions can outweigh the benefits of avoiding locks, making optimistic concurrency control less efficient.</p>
<h3 id="heading-how-optimistic-concurrency-controls-guarantee-the-repeatable-read-isolation-level">How Optimistic Concurrency Controls Guarantee the Repeatable Read Isolation level</h3>
<p>The repeatable read is a more strict isolation level, in that it has the same guarantees as read committed isolation, plus it guarantees that reads are repeatable.</p>
<p>A repeatable read guarantees that if a transaction reads a row of data, any subsequent reads of that same row of data within the same transaction will yield the same result, regardless of changes made by other transactions. This consistency is maintained throughout the duration of the transaction.</p>
<p>How can a repeatable read be achieved? Pessimistic control using a read lock can help with this, since a transaction with a read lock on a database item will prevent that item from being updated. But this can be inefficient, since a long-running read transaction can block updates to that database item.</p>
<p>Multi-Version Concurrency Control (MVCC) is a concurrency control method used by some DBMS to allow multiple transactions to access the same data simultaneously without locking the data. This makes it a popular choice for reducing lock contention and improving the scalability of databases.</p>
<p>MVCC achieves this by keeping multiple versions of data objects, which helps to manage different visibility levels for transactions depending on their timestamps or version numbers.</p>
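<p>A heavily simplified MVCC store can be sketched like this: every write appends a new version stamped with a logical commit timestamp, and a reader only sees versions committed at or before the moment its transaction started. (Class and method names are illustrative; real engines add garbage collection of old versions, conflict detection, and much more.)</p>

```python
class MVCCStore:
    """Toy multi-version store: writers never block readers, because
    readers consult an older version instead of waiting for a lock."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical commit timestamp

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, as_of):
        """Return the newest value committed at or before `as_of`."""
        visible = [v for ts, v in self.versions[key] if ts <= as_of]
        return visible[-1]

store = MVCCStore()
store.write("balance", 1000)     # committed at ts=1
snapshot_ts = store.clock        # a long-running read transaction starts here
store.write("balance", 800)      # another transaction commits at ts=2

print(store.read("balance", as_of=snapshot_ts))   # 1000 -- repeatable read
print(store.read("balance", as_of=store.clock))   # 800  -- latest committed
```

<p>The long-running reader keeps seeing $1000 for its whole lifetime, even though a newer committed version exists, which is exactly the repeatable read guarantee.</p>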
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>A lock is a mechanism used to control access to a database item, like a row or table. In very simple terms, it is analogous to a reservation on a database item.</p>
<p>Pessimistic concurrency control assumes the worst. It assumes that conflicts are likely to happen, so locks are used to block transactions that can cause conflicts before they even get a chance to start.</p>
<p>In situations where conflicts are common, such as a write-heavy application, this approach can prevent the overhead associated with frequent rollbacks and retries (which occur in optimistic concurrency control) by ensuring exclusive access to database items during transactions.</p>
<p>Optimistic concurrency control assumes the best. It assumes that conflicts are unlikely to occur, so locks are not needed to stop transactions before they start. Instead, potential conflicts are checked at the end of a transaction and if any are found, the transaction is aborted or retried.</p>
<p>Optimistic concurrency control is useful for read-heavy workloads with infrequent writes, as it allows multiple transactions to proceed without the need for locks, which can be inefficient.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ ACID Databases – Atomicity, Consistency, Isolation & Durability Explained ]]>
                </title>
                <description>
                    <![CDATA[ ACID stands for Atomicity, Consistency, Isolation and Durability. These are four key properties that most database management systems (DBMS) offer as guarantees when handling transactions. Most popular DBMS like MySQL, PostgreSQL and Oracle have ACI... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/acid-databases-explained/</link>
                <guid isPermaLink="false">66d45e01246e57ac83a2c725</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Wed, 17 Jan 2024 17:45:53 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/01/cover-fcc.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>ACID stands for Atomicity, Consistency, Isolation and Durability. These are four key properties that most database management systems (DBMS) offer as guarantees when handling transactions.</p>
<p>Most popular DBMS like <a target="_blank" href="https://dev.mysql.com/doc/refman/8.0/en/mysql-acid.html">MySQL</a>, <a target="_blank" href="https://www.postgresql.org/about/">PostgreSQL</a> and <a target="_blank" href="https://docs.oracle.com/cd/F51125_01/docs.85/SDS%20PI/acid-compliant-transactions.html#GUID-ECB79D66-46DE-4F48-93DC-8677E7BB44EF">Oracle</a> have ACID guarantees out of the box. Others have partial ACID guarantees like <a target="_blank" href="https://www.oreilly.com/library/view/redis-cookbook/9781449311353/ch01.html#:~:text=Redis%20provides%20partial%20ACID%20compliance,also%20be%20a%20key%20factor.">Redis</a>, DynamoDB, and Cassandra. The trend, however, seems to be that more and more DBMS are offering ACID compliance.</p>
<p>It is important to note that while a lot of DBMS may say they are ACID compliant, the implementation of this compliance can vary.</p>
<p>So, for example, if isolation is a key property that you need for an application you are building, you need to understand how exactly your chosen DBMS implements isolation.</p>
<p>This article will explain what transactions are, and go through, in detail, what atomicity, consistency, isolation and durability mean, using analogies and real world examples.</p>
<h3 id="heading-table-of-contents">Table of Contents:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-are-transactions">What are Transactions?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-does-atomicity-mean">What Does Atomicity Mean?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-does-consistency-mean">What Does Consistency Mean?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-does-isolation-mean">What Does Isolation Mean?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-does-durability-mean">What Does Durability Mean?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-together">Bringing it Together</a></p>
</li>
</ol>
<h2 id="heading-what-are-transactions">What are Transactions?</h2>
<p>Lots of things can go wrong when using a database:</p>
<ul>
<li><p>the database hardware or software can fail</p>
</li>
<li><p>the application calling the database can fail mid-operation</p>
</li>
<li><p>the network can be flooded with more traffic than it can handle (rendering it inoperable)</p>
</li>
<li><p>several clients can make writes at the same time that overwrite the other’s changes</p>
</li>
<li><p>clients can read phantom data that should not be in the database</p>
</li>
</ul>
<p>And so on – this is in no way an exhaustive list of things that can go wrong.</p>
<p>Since things can fail in more ways than we can possibly anticipate, trying to prevent every possible failure can become unnecessarily expensive and complicated. Instead, it is better to design a system that can continue to operate in spite of a failure. Transactions allow us to do this.</p>
<p>Transactions serve a single purpose: they make sure a system is <a target="_blank" href="https://lightcloud.substack.com/i/59017006/fault-tolerance"><strong>fault tolerant</strong></a><strong>.</strong> If a failure in a system occurs, can the system continue to operate without complete catastrophe? Phrased differently, can the system tolerate faults? An answer of ‘yes’ to this question means that such a system is fault tolerant.</p>
<p>So, what exactly is a transaction?</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F718d52b8-c016-4746-a1eb-73b6aac6d5fa_636x960.png" alt="Image" width="636" height="960" loading="lazy"></p>
<p><em>Not this kind of transaction</em></p>
<p>A transaction is an <a target="_blank" href="https://lightcloud.substack.com/p/cloud-computing-abstractions-explained">abstraction</a>. It is a collection of operations (reads and writes) that are treated as a single logical operation.</p>
<p>Imagine you want to buy a single book from an online store, say amazon.com. The steps below show a simplified view of what needs to happen:</p>
<ol>
<li><p>First, you select the book, which adds the item to your basket.</p>
</li>
<li><p>The inventory quantity of the book is checked to ensure it is valid (that is, the inventory value for the title you are buying needs to be greater than 0).</p>
</li>
<li><p>You click ‘buy’, which updates Amazon’s inventory for the book and decreases it by 1 (since you are buying a single book).</p>
</li>
<li><p>Also, your bank account balance is updated to account for the cost of the book.</p>
</li>
</ol>
<p>A transaction ensures that all operations related to the purchase are treated as a single operation. If any part of the transaction fails, the entire transaction is rolled back, leaving the database in a state as if the customer had never attempted the purchase, thus maintaining the integrity of the data.</p>
<p>The transaction is committed when all the operations within the transaction are successfully completed and their results are permanently recorded. This permanence is typically achieved by writing the changes to the database's storage, which could be on disk for traditional databases or in memory for in-memory databases like Redis.</p>
<p>By treating all of these different operations as a single logical operation, the database is able to offer some guarantees as to how it can be fault tolerant. These guarantees are <strong>atomicity</strong>, <strong>consistency</strong>, <strong>isolation</strong> and <strong>durability</strong>.</p>
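<p>A stripped-down version of the book purchase can be run with Python's built-in sqlite3 module. The table and column names here are invented for illustration; the point is that both updates commit together or neither does:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (title TEXT PRIMARY KEY,
                            quantity INTEGER CHECK (quantity >= 0));
    CREATE TABLE balances  (customer TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO inventory VALUES ('The Great Gatsby', 1);
    INSERT INTO balances  VALUES ('alice', 50);
""")

def buy_book(conn, customer, title, price):
    """All steps of the purchase succeed together, or the database is
    left as if the purchase had never been attempted."""
    try:
        with conn:  # begins a transaction; commits on success, rolls back on error
            conn.execute("UPDATE inventory SET quantity = quantity - 1 "
                         "WHERE title = ?", (title,))
            conn.execute("UPDATE balances SET balance = balance - ? "
                         "WHERE customer = ?", (price, customer))
        return True
    except sqlite3.IntegrityError:
        return False  # e.g. stock would go below 0: everything is rolled back

print(buy_book(conn, "alice", "The Great Gatsby", 20))  # True: last copy sold
print(buy_book(conn, "alice", "The Great Gatsby", 20))  # False: out of stock
print(conn.execute("SELECT balance FROM balances").fetchone())  # (30,) charged once
```

<p>The second purchase fails the <code>CHECK</code> constraint, so the whole transaction is rolled back and the customer is not charged a second time.</p>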
<h2 id="heading-what-does-atomicity-mean">What Does Atomicity Mean?</h2>
<p>Atomicity simply means that all queries in a transaction must succeed for the transaction to succeed. If one query fails, the entire transaction fails.</p>
<h3 id="heading-an-atomic-restaurant">An Atomic Restaurant</h3>
<p>Imagine using a self-service machine at a fast-food restaurant. The transaction in this case is ordering food, and consists of two separate operations:</p>
<ol>
<li><p>Select food</p>
</li>
<li><p>Make payment</p>
</li>
</ol>
<p>Both of these must succeed for the transaction to succeed. If either fails, the transaction fails.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb52901d6-a9e0-43a7-a198-d56ca2a82219_544x886.png" alt="Image" width="544" height="886" loading="lazy"></p>
<p><em>Customer making an order in an "atomic" restaurant</em></p>
<p>You select your burger, fries, and a drink from the touchscreen menu. The machine prompts you to pay, and only after your payment is processed successfully does it send your order to the kitchen. Moments later, your entire order is ready, and you pick it up from the counter.</p>
<p>This is an atomic operation: the transaction (ordering food) is either entirely completed (if you select your food item and make a payment) or not completed at all.</p>
<p>Either part of the transaction failing means the entire transaction will fail. If your payment fails, the machine won't process any part of the order, so the transaction fails. If you make a payment without selecting a food item, the transaction also fails, as there is nothing for the kitchen to prepare.</p>
<h3 id="heading-a-non-atomic-restaurant">A Non-Atomic Restaurant</h3>
<p>Now consider the alternative, a traditional sit-down restaurant where you order several dishes. As each dish is prepared, it is brought to your table.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F045c9ec7-dbb9-45b7-ad4d-357f7b7cc37c_888x1026.png" alt="Image" width="888" height="1026" loading="lazy"></p>
<p><em>customer making an order in a "non-atomic" restaurant</em></p>
<p>Again, the transaction is ordering food, and consists of two separate operations:</p>
<ol>
<li><p>Select food</p>
</li>
<li><p>Make payment</p>
</li>
</ol>
<p>In this non-atomic restaurant, failure to make a payment does not stop the transaction from completing, since you pay after you have finished your meal. Partial failures do not cause a transaction to fail.</p>
<p>This creates a risk for the restaurant. Customers that choose to dine and dash can order food to their heart’s delight and then simply leave without paying, causing a financial loss for the restaurant.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4136ceee-82b9-4cf0-b0de-5a0af3f1aa71_2218x1278.png" alt="Image" width="1456" height="839" loading="lazy"></p>
<p><em>non-atomic restaurants are at risk of customers doing a dine and dash</em></p>
<h3 id="heading-atomic-transactions">Atomic Transactions</h3>
<p>If several SQL queries are grouped together in a transaction, atomicity is a guarantee that, should any of the queries fail for any reason (hardware, application or networking problems) then the transaction is aborted and the database returns to its previous state, as if nothing had happened.</p>
<p>Without atomicity, if a failure occurs while some queries are running, it is difficult to know which queries have been committed (that is, completed) and which have not. Running the queries again after a failure can compound the problem, since you risk introducing incorrect data to the database by re-running queries that previously succeeded.</p>
<p>Atomic transactions prevent such uncertainty, since you know that if the previous transaction failed, it failed in its entirety, and you can simply retry without worrying about introducing inconsistent data.</p>
<h2 id="heading-what-does-consistency-mean">What Does Consistency Mean?</h2>
<p>Consistency can mean different things in cloud/software engineering, depending on the context. In the case of ACID, the “C” was most likely added to make the acronym work.</p>
<p>Consistency in the context of ACID means <em>consistency in data</em>, which is defined by the creator of the database. The technical term for consistency in data is referential integrity. Referential integrity is a method of ensuring that relationships between tables remain consistent. It's usually enforced through the use of foreign keys.</p>
<p>To understand referential integrity, consider the following.</p>
<p>Imagine a library system with two types of cards: a book card and a borrower's card.</p>
<ul>
<li><p>The book card lists all the books available in the library.</p>
</li>
<li><p>The borrower's card tracks which books are borrowed by which members.</p>
</li>
</ul>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82ea7c95-f55f-4cb6-86ea-47c11399b5c7_2324x1316.png" alt="Image" width="1456" height="824" loading="lazy"></p>
<p><em>A book card and borrower card for a library</em></p>
<p>The rule of the library is that a book can only be listed on a borrower's card if it exists on a book card. This is referential integrity. If someone tries to list a book on a borrower's card that isn't on the book card (that is a book that doesn’t exist in the library), the system will not allow it.</p>
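<p>The library rule maps directly onto a foreign key constraint. In this sketch (table and column names are invented for illustration), SQLite rejects any loan that refers to a book that doesn't exist:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE loans (loan_id INTEGER PRIMARY KEY,
                        member  TEXT,
                        book_id INTEGER REFERENCES books(book_id));
    INSERT INTO books VALUES (1, 'The Great Gatsby');
""")

# Listing a real book on a borrower's card is allowed...
conn.execute("INSERT INTO loans VALUES (1, 'Marie', 1)")

# ...but a book that exists on no book card is rejected by the database.
try:
    conn.execute("INSERT INTO loans VALUES (2, 'Marko', 999)")
except sqlite3.IntegrityError as e:
    print(e)  # a foreign key constraint failure
```

<p>The foreign key makes referential integrity the database's job: the application cannot record a loan for a non-existent book even if it tries.</p>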
<p>While atomicity, isolation and durability are properties intrinsic to the database itself, consistency in data, or referential integrity, is not a property intrinsic to the database.</p>
<p>Consistency is defined by the creator of the database. The application calling the database relies on the atomicity and isolation properties of the database to maintain that consistency.</p>
<h2 id="heading-what-does-isolation-mean">What Does Isolation Mean?</h2>
<p>Isolation is a guarantee that concurrently running transactions should not interfere with each other. Concurrency here refers to two or more transactions trying to modify or read the same database record(s) at the same time.</p>
<p>The SQL standard defines four levels of transaction isolation. I'll explain the two most commonly used ones below, arranged in order from least strict to most strict.</p>
<h3 id="heading-read-committed">Read Committed</h3>
<p>This gives two guarantees. It prevents dirty reads and dirty writes.</p>
<p><strong>No Dirty Reads</strong>: Reading data from another transaction that has not yet been committed is called a dirty read. With the read committed isolation level, you will only see data that has been committed by another transaction.</p>
<p><strong>No Dirty Writes</strong>: Overwriting data that has already been written by another transaction but not yet committed is called a dirty write.</p>
<p>To understand how read committed isolation works, consider the following example.</p>
<p>Imagine a fast-food restaurant with only one last special burger available, and two hungry customers, Marie and Marko, are trying to buy it simultaneously.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec0fd185-63c9-4821-bcac-b0140f2f4183_2360x1322.png" alt="Image" width="1456" height="816" loading="lazy"></p>
<p><em>Two customers ordering a burger at the same time</em></p>
<ol>
<li><p>Marie checks the availability of burgers and sees the last one available. Unknown to her, Marko’s order is being processed but hasn't been finalised in the system, as he has not paid. Since his order has not yet been finalised, Marie is not aware that his order conflicts with her own. This is similar to a transaction reading the most recently committed data, where it does not see uncommitted changes (like Marko's pending order).</p>
</li>
<li><p>Marie places an order based on this incomplete information, thinking a burger is available.</p>
</li>
<li><p>Once Marko pays, the system updates to show that there are no burgers left. This is similar to a transaction being committed.</p>
</li>
<li><p>Marie’s order will have to be aborted since there are no burgers left.</p>
</li>
</ol>
<p>The key point here is step #3. What if Marko’s payment failed at this stage? Then the transaction will not be committed and there would still be a burger available for Marie.</p>
<p>In this example, read committed isolation ensures that Marie is not prematurely excluded from buying the burger just because someone else said they wanted it. Only committed transactions can be read. Therefore, the burger is available to be ordered as long as no one has paid for it.</p>
<h3 id="heading-repeatable-read">Repeatable Read</h3>
<p>The repeatable read is a more strict isolation level, in that it has the same guarantees as read committed isolation – plus it guarantees that reads are repeatable.</p>
<p>A repeatable read guarantees that if a transaction reads a row of data, any subsequent reads of that same row of data within the same transaction will yield the same result, regardless of changes made by other transactions. This consistency is maintained throughout the duration of the transaction.</p>
<p>When a transaction reads the same data twice, but sees a different value in each read because a committed transaction has updated the value between the two reads, this is called a fuzzy read. The repeatable read isolation level prevents fuzzy reads.</p>
<p>Fuzzy reads are neither inherently good nor bad. It all depends on what you are trying to achieve.</p>
<p>Fuzzy reads are bad for long-running, read-only transactions, since new writes are likely to occur during the transaction and this can cause inconsistencies in the data. Examples of long running, read-only transactions are a database backup and analytical queries typically used in a data warehouse.</p>
<p>Repeatable reads are usually implemented by the DBMS by reading from a snapshot of the database which remains unchanged for the duration of the transaction, thereby ignoring any new committed writes in that period.</p>
<h2 id="heading-what-does-durability-mean">What Does Durability Mean?</h2>
<p>Durability is a guarantee that changes made by a committed transaction must not be lost. All committed transactions must be persisted to durable, non-volatile storage, that is, on disk. This ensures that any committed transactions are protected even if the database crashes.</p>
<p>Naturally, durability cannot protect against destruction of the disk which stores the data. Additional redundancy can be added by having backups of your database stored separately from the original.</p>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>ACID (Atomicity, Consistency, Isolation, and Durability) provides a set of guarantees when working with a DBMS. While most relational DBMS are ACID compliant, the implementation of this compliance can vary.</p>
<p>Atomicity ensures that all parts of a transaction are completed or none at all. Partial failures are not allowed.</p>
<p>Consistency, or referential integrity, ensures that data remains accurate and reliable, adhering to predefined rules. Unlike the other properties, consistency is not intrinsic to the DBMS itself. Instead, the application calling the database relies on the atomicity and isolation properties of the database to maintain consistency.</p>
<p>Isolation is a guarantee that concurrently running transactions should not interfere with each other. This is arguably the most important property because a DBMS can often have different default isolation levels, which may need to be changed based on what is needed for your application.</p>
<p>Finally, durability is a guarantee that changes made by a committed transaction must not be lost.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is an API Gateway and Why is it Useful? ]]>
                </title>
                <description>
                    <![CDATA[ APIs are often referred to as the front-door for applications to access data and business logic from backend services. As explained here, an API is essentially the interface that a piece of software presents to other humans or programs, allowing them... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-are-api-gateways/</link>
                <guid isPermaLink="false">66d45e1f3a8352b6c5a2aa23</guid>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 11 Dec 2023 21:40:25 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/12/cover.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>APIs are often referred to as the front-door for applications to access data and business logic from backend services. As explained <a target="_blank" href="https://lightcloud.substack.com/p/api-integration-patterns">here</a>, an API is essentially the <em>interface</em> that a piece of software presents to other humans or programs, allowing them to interact with that software.</p>
<p>When creating an API, you need to choose a programming language (Java, Python, PHP, and so on) in which to write the API logic. You also need to deploy the API to a server, and you need to monitor the API to ensure your infrastructure has enough capacity to deal with a large number of requests.</p>
<p>API gateways abstract these steps away. You don’t have to write much code or worry about managing the underlying infrastructure. You simply create API endpoints which clients can send requests to.</p>
<p>The major cloud providers all have a fully managed API gateway service:</p>
<ol>
<li><p><a target="_blank" href="https://aws.amazon.com/api-gateway/">AWS API Gateway</a></p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/api-gateway">GCP API Gateway</a></p>
</li>
<li><p><a target="_blank" href="https://azure.microsoft.com/en-gb/products/api-management">Azure API Management</a></p>
</li>
</ol>
<p>This article will explain why you should use an API gateway and how it works, and we'll look at a real-world example of an API gateway in action.</p>
<h3 id="heading-what-well-cover">What we'll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-why-use-an-api-gateway">Why use an API gateway?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-an-api-gateway-works">How an API gateway works</a><br> – <a class="post-section-overview" href="#heading-request-validation">Request validation</a><br> – <a class="post-section-overview" href="#heading-authorisation-and-authentication">Authorisation and authentication</a><br> – <a class="post-section-overview" href="#heading-rate-limiting">Rate limiting</a><br> – <a class="post-section-overview" href="#heading-request-routing">Request routing</a><br> – <a class="post-section-overview" href="#heading-request-and-response-transformation">Request and response transformation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-real-world-example">Real world example</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-bringing-it-together">Bringing it together</a></p>
</li>
</ol>
<h2 id="heading-why-use-an-api-gateway">Why Use an API Gateway?</h2>
<p>An API gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at almost any scale.</p>
<p>The term “fully managed” in the context of cloud computing means that the maintenance and management responsibilities of the service are handled by the cloud provider. This means the underlying infrastructure, software updates, security, scalability, availability and disaster recovery are all managed by the cloud provider.</p>
<p>This abstraction mostly makes life easier for developers, as they simply need to focus on developing the service instead of worrying about managing it. This is not always the case, though, as every abstraction comes with a price.</p>
<p>In this case, the price of such an abstraction is a loss of flexibility. Most API gateways offered by cloud providers have a hard limit on the number of requests per second (RPS) they can handle.</p>
<p>There is also the higher cloud cost of using a managed service like an API gateway, which must be weighed against the higher number of developer days (number of developers * number of days worked) needed to build an API from scratch.</p>
<p>To really understand the benefits of using an API Gateway, let’s have a look at the steps you need to follow to design, write, and deploy a traditional API:</p>
<h3 id="heading-step-1-define-requirements-and-scope">Step 1: Define Requirements and Scope</h3>
<ul>
<li><p>Understand the needs of the target users or systems.</p>
</li>
<li><p>Determine the data and functionality the API will expose.</p>
</li>
</ul>
<h3 id="heading-step-2-design-the-api">Step 2: Design the API</h3>
<ul>
<li><p>Define the API endpoints and methods (GET, POST, PUT, DELETE).</p>
</li>
<li><p>Design the request and response format (usually JSON or XML).</p>
</li>
<li><p>Specify the data models and resources the API will interact with.</p>
</li>
<li><p>Plan for error handling and status codes.</p>
</li>
</ul>
<h3 id="heading-step-3-develop-the-api">Step 3: Develop the API</h3>
<ul>
<li><p>Choose a programming language and framework.</p>
</li>
<li><p>Implement the API endpoints as defined in the design phase.</p>
</li>
<li><p>Integrate with databases or other services as needed.</p>
</li>
<li><p>Ensure security practices are implemented, like input validation and rate limiting.</p>
</li>
</ul>
<h3 id="heading-step-4-deploy-the-api">Step 4: Deploy the API</h3>
<ul>
<li><p>Choose a hosting solution (cloud provider, on-premises servers).</p>
</li>
<li><p>Set up the deployment environment.</p>
</li>
<li><p>Deploy the API to the server.</p>
</li>
</ul>
<h3 id="heading-step-5-monitor-and-maintain-the-api">Step 5: Monitor and Maintain the API</h3>
<ul>
<li><p>Monitor the API for uptime, performance, and errors.</p>
</li>
<li><p>Regularly update the API to fix bugs and patch security vulnerabilities.</p>
</li>
</ul>
<p>With an API gateway, you mainly need to focus on step 1, step 2, and parts of step 3. The other steps are mostly abstracted away and handled by the API gateway.</p>
<p>The main reason for using an API gateway is to simplify the process of developing and maintaining an API.</p>
<h2 id="heading-how-an-api-gateway-works">How an API Gateway Works</h2>
<p>An API gateway performs many functions at once.</p>
<p>To understand how an API gateway works, let's consider a restaurant analogy.</p>
<p>An API gateway is like the maître d’ (French for head waiter, more or less). The maître d' is usually found in upscale restaurants, although it is a slowly dying profession.</p>
<p>The maître d' serves as a liaison between the guests and the restaurant staff, and is responsible for:</p>
<ol>
<li><p><strong>Greeting and Seating Guests:</strong> The maître d' is often the first person guests encounter when they arrive at the restaurant. They warmly welcome guests, inquire about reservations, and assist in seating them at their tables, taking into account preferences and special requests.</p>
</li>
<li><p><strong>Reservations:</strong> The maître d' is responsible for managing reservations and ensuring that tables are allocated efficiently. They keep track of available tables and reservation times, making adjustments as necessary to accommodate guests.</p>
</li>
<li><p><strong>Managing Wait Times:</strong> During busy periods, the maître d' manages wait times for guests by providing estimated wait times and offering alternatives, such as seating at the bar or in a waiting area.</p>
</li>
<li><p><strong>Resolving Issues:</strong> If any issues or concerns arise during a guest's meal, the maître d' steps in to address them promptly and ensure that the guest is satisfied.</p>
</li>
<li><p><strong>Handling Special Requests:</strong> If guests have special requests or dietary restrictions, the maître d' communicates these to the kitchen and ensures that the guest's needs are met.</p>
</li>
</ol>
<p>In short, the maître d’ is a person with multiple talents and responsibilities in a restaurant. From the image below, we can see how the maître d' serves as a communicator between the customers and whatever they might need.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81489f0d-0f66-4b18-b59b-6debb17341e5_1754x1064.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A maître d' serves as the communicator between customers and whatever they might need.</em></p>
<p>An API gateway works in a similar fashion. It acts as the communicator between clients and the many services they may need to access. A simplified view of this is shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c4cf127-4e90-4beb-9987-80998211e7cf_1768x916.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>API Gateway serves as the middle-man between clients and the many services they may need to access.</em></p>
<p>Let’s examine in more detail what an API gateway can do.</p>
<h3 id="heading-request-validation"><strong>Request validation</strong></h3>
<p>This involves checking incoming requests to confirm they meet predefined criteria before forwarding them to the backend services.</p>
<p>This may include checking the structure of the request, validating data types, ensuring required parameters are present, and validating the query parameters, headers, and body of the request against a schema.</p>
<p>By doing so, the API gateway acts as a first line of defense, preventing malformed or malicious requests from reaching backend systems.</p>
<p>Using our restaurant analogy, this is similar to the maître d' waiting at the entrance of the restaurant to greet guests as they arrive. But remember, this is a fancy upscale restaurant. So the maître d’ ensures guests are dressed according to the restaurant's dress code – similar to validating the incoming API request against a predefined schema.</p>
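<p>As a rough sketch of what gateway-side validation does (the parameter names and rules below are invented for illustration), a request can be checked against a simple schema before it is forwarded to a backend:</p>

```python
# Minimal sketch of gateway-style request validation against a
# hypothetical schema of required parameters and their types.
REQUIRED_PARAMS = {"user_id": int, "order_id": int}

def validate_request(params):
    """Return a list of validation errors; an empty list means the request passes."""
    errors = []
    for name, expected_type in REQUIRED_PARAMS.items():
        if name not in params:
            errors.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            errors.append(f"{name} must be of type {expected_type.__name__}")
    return errors

print(validate_request({"user_id": 7, "order_id": 42}))  # []
print(validate_request({"user_id": "seven"}))            # two errors
```

<p>A real gateway would typically validate against a full schema (for example, JSON Schema or an OpenAPI definition) and reject failing requests with an HTTP 400 before they ever reach a backend service.</p>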
<h3 id="heading-authorisation-and-authentication">Authorisation and Authentication</h3>
<p>Authentication is the process of verifying the identity of a user or service making a request, often through credentials like a username and password, tokens, or API keys.</p>
<p>Once authenticated, authorisation determines what resources or operations the authenticated entity has permission to access or execute.</p>
<p>API gateways often integrate with identity providers and support various authentication and authorisation mechanisms like <a target="_blank" href="https://oauth.net/">OAuth</a>, <a target="_blank" href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a>, API keys, and so on. They ensure that only legitimate, authorised requests are allowed through to backend services.</p>
<p>Authentication is concerned with the “who” while authorisation is concerned with the “permissions”.</p>
<p>For the maître d’ waiting for guests as they arrive at the restaurant, authentication would involve the guests proving they are who they say they are, usually by showing some form of identification with a picture that can be matched to their faces.</p>
<p>Authorisation would involve checking that they have a reservation – that is, that they have permission to enter the restaurant and order a meal.</p>
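<p>A minimal sketch of the two checks, using made-up API keys and permissions, might look like this:</p>

```python
# Toy authentication (who are you?) and authorisation (what may you do?).
# The API keys, users, and permissions here are invented for illustration.
API_KEYS = {"key-abc": "alice", "key-def": "bob"}
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}

def authenticate(api_key):
    # Map a credential to an identity; None means "unknown caller".
    return API_KEYS.get(api_key)

def authorise(user, action):
    # Check whether the authenticated identity may perform the action.
    return action in PERMISSIONS.get(user, set())

user = authenticate("key-def")
print(user)                      # bob
print(authorise(user, "read"))   # True
print(authorise(user, "write"))  # False
```

<p>In a real gateway, the credential would more likely be an OAuth token or signed JWT validated against an identity provider, but the two-step shape – identify first, then check permissions – is the same.</p>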
<h3 id="heading-rate-limiting">Rate Limiting</h3>
<p>Rate limiting involves controlling the number of requests a user or service can make within a specified time frame, usually defined as a limit on the number of requests per second (RPS).</p>
<p>Rate limiting helps to avoid overloading of backend services, ensuring they remain <a target="_blank" href="https://lightcloud.substack.com/i/59017006/high-availability">available</a>. Rate limiting is also used as part of a cost-control strategy, since you will pay for every request sent to the API gateway.</p>
<p>API gateways can enforce different rate limiting policies based on the user, service, or endpoint being accessed.</p>
<p>Drawing on our restaurant analogy, imagine our restaurant with guests inside, who have all been validated, authenticated, and authorised to enter the restaurant. But the guests are particularly hungry and thirsty and keep ordering meal after meal and drink after drink. At a certain point, this becomes unmanageable for the restaurant. The chefs and waiters are overworked and have no capacity to take on any new orders, plates and cutlery are all used up, and food in the kitchen is running out.</p>
<p>The maître d’ can step in and limit the number of orders customers make – for example, by capping the number of main courses or bottles of wine that can be ordered every hour. Rate limiting ensures that the restaurant is not overloaded with orders and is still able to serve new customers.</p>
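<p>A simple fixed-window rate limiter can be sketched as follows. Real gateways typically use more robust algorithms (token bucket, sliding window), but the idea is the same: count requests per client and reject once the limit is reached.</p>

```python
import time

# Fixed-window rate limiter sketch: at most `limit` requests per client
# per `window` seconds.
class RateLimiter:
    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self.counts = {}  # client -> (window_start, request_count)

    def allow(self, client):
        now = time.monotonic()
        start, count = self.counts.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0          # new window: reset the counter
        if count >= self.limit:
            self.counts[client] = (start, count)
            return False                   # over the limit: reject (HTTP 429)
        self.counts[client] = (start, count + 1)
        return True

limiter = RateLimiter(limit=2)
print([limiter.allow("client-1") for _ in range(3)])  # [True, True, False]
```

<p>A gateway enforcing this would respond to the rejected request with HTTP 429 (Too Many Requests) instead of forwarding it to a backend.</p>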
<h3 id="heading-request-routing">Request Routing</h3>
<p>API gateways manage the routing of incoming requests to the appropriate backend services based on various criteria like the URL path, HTTP method, headers, or query parameters. This is integral to microservice architectures, where different services handle different parts of the API.</p>
<p>Back to our restaurant analogy, based on what the guests are there for, the maître d’ directs them to the appropriate person or place – diners to a waiter, guests who only want a drink to the bar, and those inquiring about booking events in the restaurant to the event coordinator.</p>
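<p>At its simplest, path-based routing is a lookup from URL prefix to backend service. The service names and URLs below are invented for illustration:</p>

```python
# Path-prefix routing sketch: map URL prefixes to backend services.
ROUTES = {
    "/users": "http://user-service.internal",
    "/products": "http://product-service.internal",
    "/orders": "http://order-service.internal",
}

def route(path):
    # Return the backend that should handle this path, or None if
    # no route matches (a gateway would then return HTTP 404).
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return None

print(route("/products/123"))  # http://product-service.internal
print(route("/unknown"))       # None
```

<p>Real gateways let you match on HTTP method, headers, and query parameters as well, but the prefix table captures the core idea.</p>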
<h3 id="heading-request-and-response-transformation">Request and Response Transformation</h3>
<p>This involves modifying requests and responses as they pass through the API gateway.</p>
<p>For requests, this might mean adding, removing, or modifying headers, rewriting URLs, or even changing the request body. For responses, it might involve changing the status code, modifying headers, or transforming the body.</p>
<p>This capability allows the API gateway to serve as an intermediary that can transform requests and responses to meet the needs of both clients and backend services.</p>
<p>The backend services can also carry out this request and response transformation. The decision on which component (API gateway or a backend service) does the transformation is subjective. But an API gateway is often an ideal place to centralise such transformation with minimal effort, instead of having custom transformations in every backend service.</p>
<p>If a guest in a restaurant is gluten intolerant, for example, their orders have to be transformed to ensure that their meal does not contain any gluten.</p>
<p>The logic of this order transformation can be handled by the maître d’ explicitly calling out which ingredients should be excluded from the dish before sending the order to the chef. Alternatively, the transformation can be handled in the kitchen, with the maître d’ simply telling the chef that the guest ordered a gluten-free dish and letting the chef modify the order accordingly.</p>
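<p>As a sketch, a response transformation might trim a detailed backend response down to what a lightweight client needs. The field names here are hypothetical:</p>

```python
# Response transformation sketch: keep only the fields a mobile client
# needs from a detailed backend response.
def transform_for_mobile(backend_response):
    keep = ("id", "name", "price")
    return {k: backend_response[k] for k in keep if k in backend_response}

full = {
    "id": 42,
    "name": "Espresso Machine",
    "price": 199.99,
    "warehouse_location": "AMS-3",   # internal detail the client doesn't need
    "internal_sku": "EM-42-XL",
}
print(transform_for_mobile(full))  # {'id': 42, 'name': 'Espresso Machine', 'price': 199.99}
```

<p>Centralising this in the gateway means each backend service can return its full response and every client still receives exactly the shape it expects.</p>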
<h2 id="heading-real-world-example">Real World Example</h2>
<p>A microservice architecture is an approach to developing software that breaks down a large application into smaller, independent components called microservices. Each microservice is a self-contained unit with a specific function or responsibility within the broader application.</p>
<p>The figure below shows a simple microservice architecture for a basic e-commerce application.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62394846-1dc8-4f0f-a4b0-0db079c1ddcd_1794x916.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>An API Gateway used in a microservice architecture for an e-commerce site</em></p>
<ul>
<li><p><strong>Clients:</strong> These are different clients that interact with the e-commerce platform. They can be a mobile app, a web browser, or any other third-party application.</p>
</li>
<li><p><strong>API gateway:</strong> Serves as the single entry point for all types of clients. It routes requests to the appropriate microservices based on the nature of the request (user-related, product-related, order-related).</p>
</li>
<li><p><strong>Services:</strong> These are examples of microservices specific to an e-commerce site. Each service handles a different aspect of the business logic, like user profiles, product catalog, and order processing.</p>
</li>
<li><p><strong>Databases:</strong> Each microservice has its own dedicated database, ensuring data isolation and service independence.</p>
</li>
</ul>
<p>In this example, the API gateway:</p>
<ol>
<li><p>Ensures every client request is <strong>validated</strong></p>
</li>
<li><p>Ensures clients are <strong>authenticated and authorised</strong> before they can carry out some actions like making an order or writing a review for a product</p>
</li>
<li><p><strong>Rate limits</strong> requests to ensure services are not taken down by malicious actors sending a high number of requests</p>
</li>
<li><p><strong>Routes client requests</strong> to the appropriate backend services based on various criteria like the URL path, HTTP method, headers, or query parameters.</p>
</li>
<li><p>Handles <strong>request and response transformation</strong>. For example, the response from the Product Service might be in a complex format with extensive details. The API gateway takes this response and transforms it into a format that is more suitable for the mobile app. This might involve simplifying the data, converting it into a lighter format, or extracting only the essential information needed by the mobile app.</p>
</li>
</ol>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>An API gateway is a fully managed service that makes it easier for developers to create, publish, maintain, monitor, and secure APIs at almost any scale. Being fully managed, it abstracts away the effort needed to manage and maintain the underlying infrastructure – this is handled by the cloud provider offering the service.</p>
<p>The API gateway acts as the middle-man between clients and the many services they may need to access. It handles request validation, authentication and authorisation, rate limiting, request routing, and request/response transformation.</p>
<p>It is especially useful in microservice architectures as the central point of entry for managing, processing, and routing incoming requests to the appropriate microservices. It plays a crucial role in simplifying the client-side interaction and provides a central interface for a group of microservices.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Docker Containers Work – Explained for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ A container is a lightweight, standalone, and executable software package that includes everything needed to run a piece of software. And one of the most popular tools for working with containers is Docker. Docker is both the name of the company (Doc... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-docker-containers-work/</link>
                <guid isPermaLink="false">66d45e144a7504b7409c336a</guid>
                
                    <category>
                        <![CDATA[ containers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Docker Containers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ virtual machine ]]>
                    </category>
                
                    <category>
                        <![CDATA[ virtualization ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 23 Oct 2023 16:45:13 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/10/cover-final.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A container is a lightweight, standalone, and executable software package that includes everything needed to run a piece of software.</p>
<p>And one of the most popular tools for working with containers is Docker.</p>
<p>Docker is both the name of the company (Docker Inc) and the software they have created which packages software into containers.</p>
<p>To understand how containers work and why they are incredibly useful for software development, you need to understand two seemingly unrelated topics – shipping containers and virtual machines.</p>
<h2 id="heading-a-brief-history-of-shipping-containers">A Brief History of Shipping Containers</h2>
<p>"The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger" is a book by <a target="_blank" href="https://www.amazon.co.uk/Box-Shipping-Container-Smaller-Economy/dp/0691170819/ref=sr_1_1?crid=14VL4VEQHDVNL&amp;keywords=the+box+book&amp;qid=1694037660&amp;sprefix=the+box+book%2Caps%2C97&amp;sr=8-1">Marc Levinson</a>. It explores the profound impact of the shipping container on global trade and the world economy.</p>
<p>While the history of the shipping container may seem irrelevant in a discussion about Docker containers, they have more in common than you would expect.</p>
<p>Before shipping containers, cargo handling was labor-intensive and time-consuming, leading to inefficiencies and delays in global trade. Cargo arrived in various shapes and sizes, and the lack of standardised packaging made it challenging to stack and secure items efficiently.</p>
<p>Without standardised containers, cargo was often stored haphazardly in the holds of ships or in dockyards. This inefficient use of space meant that ships were not carrying as much cargo as they could potentially hold, leading to higher transportation costs.</p>
<p>The adoption of uniform container dimensions and handling procedures allowed for seamless transfer of cargo between different modes of transportation – ships, trucks, trains, and the cranes used to move the containers around.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac7826e-ebd0-4062-8f49-d48a6f9ef9ce_1886x946.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing how standardised container sizes allow them to be easily moved between ships, trains and trucks.</em></p>
<p>This standardisation was the key to the success of shipping containers. After all, if one company’s containers didn't fit on another company's ship, truck, or freight train, they couldn't be properly transported. Every company would need its own fleet of containers to be able to send things to each of their customers – which would be an operational nightmare.</p>
<p>Standardisation of shipping containers makes them portable – that is, easy to move from one place to another. This portability is a key feature of Docker containers as well, which we'll discuss shortly.</p>
<h2 id="heading-what-are-virtual-machines">What are Virtual Machines?</h2>
<p>Virtual machines (VMs) are created through a process called virtualisation.</p>
<p>Virtualisation is a technology that allows you to create multiple simulated environments or virtual versions of something, such as an operating system, a server, storage, or a network, on a single physical machine.</p>
<p>These virtual environments behave as if they are independent, separate entities, even though they share the resources of the underlying physical system.</p>
<p>Virtualisation is like having a magician's hat that can conjure up multiple hats within it. Just as the magician's hat creates the illusion of many hats appearing from just a single physical hat, virtualisation allows a single physical computer or server to appear as multiple virtual machines (VMs), each with its own operating system and resources.</p>
<p>VMs virtualise the hardware. This simply means that a VM takes a single piece of hardware – a server – and creates virtual versions of other servers running their own operating systems. Physically, it is just a single piece of hardware.</p>
<p>Logically, multiple virtual machines can run on top of a single piece of hardware. This is essentially one or more computers running within a computer, as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9733d44-d0c7-49e6-8978-da253cf9c3a9_1650x966.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing how virtualisation creates several virtual machines (VMs) from a single physical server</em></p>
<h3 id="heading-how-does-virtualisation-work">How does virtualisation work?</h3>
<p>So you might be wondering – how exactly does virtualisation work? Have a look at the image below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd74b32-e3d1-430f-bbd6-e0daf2150b82_1084x576.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing how virtualisation works by virtualising a single piece of hardware to create multiple virtual machines</em></p>
<p>At the base, you have the host hardware and OS. This is the physical machine that is used to create the virtual machines. On top of this, you have the hypervisor. This allows multiple virtual machines, each with their own operating systems (OS), to run on a single physical server.</p>
<p>VMs have a few downsides, though, which containers address. Two downsides particularly stand out:</p>
<ol>
<li><p>VMs consume more resources: VMs have a higher resource overhead due to the need to run a full OS instance for each VM. This can lead to larger memory and storage consumption. This in turn can have a negative effect on performance and startup times of the virtual machine.</p>
</li>
<li><p>Portability: VMs are typically less portable due to differences in underlying OS environments. Moving VMs between different hypervisors or cloud providers can be more complex.</p>
</li>
</ol>
<p>The major cloud providers all have VMs. For AWS, it's EC2, GCP has Compute Engine, and Azure has Azure Virtual Machines.</p>
<h2 id="heading-what-are-containers">What are Containers?</h2>
<p>A container is a lightweight, standalone, and executable software package that includes everything needed to run a piece of software, including the code, runtime, system tools, and libraries.</p>
<p>Containers are designed to isolate applications and their dependencies, ensuring that they can run consistently across different environments. Whether the application is running from your computer or in the cloud, the application behaviour remains the same.</p>
<p>Unlike VMs which virtualise the hardware, <a target="_blank" href="https://aws.amazon.com/compare/the-difference-between-containers-and-virtual-machines/#:~:text=Containers%20virtualize%20the%20operating%20system,use%20your%20hardware%20resources%20efficiently.">containers virtualise the operating system</a>. This simply means that a container uses a single OS to create a virtual application and its libraries. Containers run on top of a shared OS provided by the host system.</p>
<p>This is illustrated below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55e6ff35-1917-4374-8006-80aa8668a772_1160x470.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing how containers work by virtualising the OS</em></p>
<p>The container engine allows you to spin up containers. It provides the tools and services necessary for building, running, and deploying containerised applications.</p>
<p>Containers have several benefits:</p>
<ol>
<li><p><strong>Portability</strong>: Containers are designed to be platform-independent. They can run on any system that supports the container runtime, such as Docker, regardless of the underlying operating system. This makes it easier to move applications between different environments, including local development machines, testing servers, and different cloud platforms.</p>
</li>
<li><p><strong>Efficiency</strong>: Containers share the host system's operating system, which reduces the overhead of running a virtual machine with multiple operating systems. This leads to more efficient resource utilization and allows for a higher density of applications that can run on a single host.</p>
</li>
<li><p><strong>Consistency</strong>: Containers package all the necessary components, including the application code, runtime, libraries, and dependencies, into a single unit. This eliminates the "it works on my machine" problem and ensures that the application runs consistently across different environments, from development to production.</p>
</li>
<li><p><strong>Isolation</strong>: Containers provide a lightweight and isolated environment for running applications. Each container encapsulates the application and its dependencies, ensuring that they do not interfere with each other. This isolation helps prevent conflicts and ensures consistent behaviour across different environments.</p>
</li>
<li><p><strong>Fast Deployment</strong>: Containers can be created and started quickly, often in a matter of seconds. This rapid deployment speed is particularly beneficial for applications that need to rapidly scale up or down based on demand.</p>
</li>
</ol>
<h2 id="heading-what-is-docker">What is Docker?</h2>
<p>Now that we have covered VMs and containers, what exactly is Docker? Docker is simply a tool for creating and managing containers.</p>
<p>At its core, Docker has two concepts that are useful to understand: the Dockerfile and Docker Images.</p>
<p>A Dockerfile contains the set of instructions for building a Docker Image.</p>
<p>A Docker Image serves as a template for creating Docker containers. It contains all the necessary code, runtime, system tools, libraries, and settings required to run a software application.</p>
<p>So, a Dockerfile is used to build a Docker Image which is then used as the template for creating one or more Docker containers. This is illustrated below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f5a703a-0a08-48a0-be54-46ca4a29a9dc_1974x534.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing the steps to create a docker container. First you create the Dockerfile which is used to build the Docker Image which is finally used to run a Docker container</em></p>
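<p>As an illustrative sketch (the application, file names, and base image here are hypothetical, not taken from a real project), a Dockerfile for a small Node.js service might look like this:</p>

```dockerfile
# Dockerfile: the set of instructions for building a Docker Image
FROM node:20-alpine              # start from a public base image
WORKDIR /app                     # working directory inside the image
COPY package*.json ./            # copy dependency manifests first
RUN npm install                  # bake dependencies into the image
COPY . .                         # copy the application code
EXPOSE 3000                      # document the port the app listens on
CMD ["node", "server.js"]        # command run when a container starts
```

<p>Running <code>docker build -t my-app .</code> builds the image from this Dockerfile, and <code>docker run my-app</code> starts a container from that image.</p>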
<p>If this explanation still causes you to scratch your head, consider the following analogy using shipping containers.</p>
<p>Imagine you need to build multiple shipping containers to transport items all over the world. You start with a document listing out the requirements for your shipping container. This will contain information like the container dimensions, type of seals, door locking mechanisms, ventilation and refrigeration requirements (if you are shipping food that needs a temperature controlled environment, for example), and so on.</p>
<p>This requirement document will then be used to create a detailed template for the container which will include engineering drawings showing the dimensions and other specifications.</p>
<p>From this template, the physical containers will then be built. This single template can be used to build one or many physical containers which will all be identical and match the specifications in the container template.</p>
<p>This is illustrated below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa1ac249-4fd1-49f2-8b7b-e52914017f89_1944x830.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing a shipping container analogue for docker containers</em></p>
<p>The Dockerfile is analogous to the requirements document, which simply has a set of instructions for building the container template.</p>
<p>The Docker Image is analogous to the container template, which details all the instructions needed for building the physical container.</p>
<p>Once created, Docker images are immutable, meaning they cannot be changed. If you need to make changes to an application, you need to modify the Dockerfile and create a new image. This immutability ensures consistency and reproducibility in application deployment.</p>
<p>And finally, the Docker container is analogous to the physical shipping container.</p>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>In summary, containers provide a <strong>portable</strong> and <strong>efficient</strong> way to package applications and their dependencies, ensuring consistency across various environments. The benefits they bring to software development are similar to the benefits the humble shipping container brought to the global economy.</p>
<h3 id="heading-portability">Portability</h3>
<p>Shipping containers, through standardisation, ensure that any container, anywhere in the world, can be seamlessly used to move items across various modes of transportation – ships, trucks, trains and the cranes used to load them on and off different forms of transport.</p>
<p>Similarly, Docker containers allow for portability. They ensure that applications can run consistently across different environments, from development laptops to production servers, and across different cloud providers.</p>
<h3 id="heading-increased-efficiency">Increased Efficiency</h3>
<p>With standard container sizes, the packing density of the goods you can move increases. You can squeeze more into a single shipping container than in the days before containers existed, when cargo of non-standard shapes and sizes was stored haphazardly in the holds of ships or on dockyards. So every ship, freight train, or truck can carry more goods on every trip, making it cheaper to move goods around the world.</p>
<p>With Docker containers, better efficiency comes from the fact that containers share the host operating system, making them lightweight compared to VMs. This leads to rapid container startup times and less CPU, memory, and storage use.</p>
<p>Lower resource utilisation also means that containers can achieve a higher application density than VMs. With containers, you can run more applications on the same hardware without a significant drop in performance.</p>
<p>To conclude, the shipping container by itself is not magical. After all, it is just a metal box. It is the standardisation of shipping containers which made them portable and a cheap and efficient way to move goods around the world.</p>
<p>In application development, containers benefit from standardisation in the same way. Containers provide a portable and efficient way to package applications and their dependencies, ensuring consistency across various environments.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ API Integration Patterns – The Difference between REST, RPC, GraphQL, Polling, WebSockets and WebHooks ]]>
                </title>
                <description>
                    <![CDATA[ API stands for Application Programming Interface. The “I” in API is the key part that explains its purpose. The interface is what the software presents to other humans or programs, allowing them to interact with it. A good analogy for an interface is... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/api-integration-patterns/</link>
                <guid isPermaLink="false">66d45e037df3a1f32ee7f7f9</guid>
                
                    <category>
                        <![CDATA[ api ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GraphQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ REST API ]]>
                    </category>
                
                    <category>
                        <![CDATA[ webhooks ]]>
                    </category>
                
                    <category>
                        <![CDATA[ websocket ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 09 Oct 2023 15:14:33 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/10/FCC-cover--1-.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>API stands for Application Programming Interface. The “I” in API is the key part that explains its purpose.</p>
<p>The <em>interface</em> is what the software presents to other humans or programs, allowing them to interact with it.</p>
<p>A good analogy for an interface is a remote control. Imagine you have a universal remote that can control your TV, lights, and fan.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28286169-80d5-49f6-b302-65c8cb10fdcb_1762x1070.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Image showing a remote control and a TV, light fixture, and fan.</em></p>
<p>Let’s break down what a universal remote control can do:</p>
<ol>
<li><p>The remote control has various buttons, each serving a different purpose. One button might change the channel, while another can dim the lights of the chandelier, and another can turn on the fan.</p>
</li>
<li><p>When you press a button, it sends a specific signal via infrared, bluetooth, or wifi to the object you are controlling, instructing it to perform a particular action.</p>
</li>
<li><p>The key thing about the remote is that it allows you to interact with the TV, chandelier, and the fan without understanding the internal workings of these objects. All that complexity is abstracted away from you. You simply press a button, and you get a response that you can observe straight away.</p>
</li>
</ol>
<p>APIs work in a similar way.</p>
<ol>
<li><p>APIs can have various endpoints, each designed to perform a specific action. One endpoint might retrieve data, while another updates or deletes it.</p>
</li>
<li><p>When you send a request to an endpoint, it communicates with the server using HTTP methods – GET, POST, PUT, DELETE to instruct it to perform a particular action (like retrieving, sending, updating, or deleting data).</p>
</li>
<li><p>The key thing about APIs, as with remote controls, is that APIs abstract away the inner workings of the server and the database behind the API. The API allows users, developers and applications to interact with a software application or platform without needing to understand its internal code or database structure. You simply send a request, the server processes it and provides a response.</p>
</li>
</ol>
<p>This analogy only goes so far, as APIs are more complex than remote controls. But the basic principles of operation of an API and a universal remote are quite similar.</p>
<p>This article will explain API integration patterns, which can be split into two broad groups: Request-response (REST, RPC &amp; GraphQL) and event driven APIs (Polling, WebSockets &amp; WebHooks).</p>
<h2 id="heading-request-response-integration">Request-Response Integration</h2>
<p>In a request-response integration, the client initiates the action by sending a request to the server and then waits for a response.</p>
<p>Different patterns of the request-response integration exist, but at a high level, they all conform to the same rule of the client initiating a request and waiting for a response from the server.</p>
<h3 id="heading-1-rest">1. REST</h3>
<p>REST stands for Representational State Transfer. It is the simplest and most popular form of request-response integration.</p>
<p>REST APIs use a <a target="_blank" href="https://lightcloud.substack.com/i/104443280/stateless-architecture">stateless</a>, client-server communication model, wherein each message contains all the information necessary to understand and process the message.</p>
<p>REST is all about resources. Resources are entities that the API exposes, which can be accessed and manipulated using URL paths.</p>
<p>To understand REST APIs, consider the following analogy. Imagine you go into a restaurant to order some food. The menu is extensive and items are categorically organised. Each item on the menu can be equated to a resource.</p>
<p>First, you call the waiter to get their attention, then you place an order. Each request receives a response, before you proceed with another request, like ordering a dish.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F267e1809-4111-43dc-8c6d-aa7bdf764c54_1102x1022.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for REST API</em></p>
<p>In REST API terms, the client initiates requests to the server by specifying exactly what it wants using HTTP methods (such as GET, POST, PUT, DELETE) on specific URLs (the menu items). Each interaction is stateless, meaning that each request from the client to the server must contain all the information needed to understand and process the request.</p>
<p>The server then processes the request and returns the appropriate response – in our analogy, bringing the ordered item to the table.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd86ab699-f5a2-4c93-9ac8-ee0af83df14d_1174x916.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Simple sequence diagram for REST API</em></p>
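<p>The request-response flow above can be sketched in a few lines of Python. This is a toy, in-memory dispatcher, not a real web framework – the <code>/dishes</code> resource and the menu data are invented for illustration:</p>

```python
# Toy REST-style dispatcher: the HTTP method says what to do,
# the URL path says which resource to do it to.
MENU = {"1": {"name": "Fish and Chips", "price": 12}}

def handle(method, path, body=None):
    parts = path.strip("/").split("/")        # "/dishes/1" -> ["dishes", "1"]
    if parts[0] != "dishes":
        return 404, None                      # unknown resource
    if method == "GET" and len(parts) == 2:
        dish = MENU.get(parts[1])
        return (200, dish) if dish else (404, None)
    if method == "POST" and len(parts) == 1:
        new_id = str(len(MENU) + 1)           # create a new resource
        MENU[new_id] = body
        return 201, {"id": new_id, **body}
    return 405, None                          # method not allowed

status, dish = handle("GET", "/dishes/1")
print(status, dish["name"])  # 200 Fish and Chips
```

<p>Note that each call to <code>handle</code> carries everything needed to process it – the statelessness at the heart of REST.</p>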
<h3 id="heading-2-rpc">2. RPC</h3>
<p>RPC stands for Remote Procedure Call. Unlike REST APIs, which are all about resources, RPC is all about actions. With RPC, the client executes a block of code on the server.</p>
<p>Think of a restaurant without a menu. There is no dish you can request in this restaurant. Instead, you request a specific action to be performed by the restaurant.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9a0ce7-d263-4932-b159-a3234677cf71_1518x926.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for RPC</em></p>
<p>With a REST API, the guest would have simply asked for some fish and chips. With RPC, they have to give instructions on what they want the kitchen to prepare.</p>
<p>In the RPC pattern, the client calls a specific procedure on the server and waits for the result. The procedure being called and what it produces are tightly coupled. This can give the client very specific, tailored results, but it lacks the flexibility and ease of use of REST.</p>
<p>There is a reason most restaurants use menus, instead of following the custom requests of their customers. This partly explains why RPC is a less popular integration pattern compared to REST.</p>
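<p>A minimal sketch of the RPC idea in Python – the procedure name and registry here are invented for illustration, not taken from any particular RPC framework:</p>

```python
# RPC: the client invokes a *named procedure* with arguments,
# rather than addressing a resource by URL.
PROCEDURES = {}

def rpc(name):
    """Register a function under a procedure name, as an RPC server would."""
    def register(fn):
        PROCEDURES[name] = fn
        return fn
    return register

@rpc("kitchen.prepareDish")
def prepare_dish(protein, sides):
    return f"Prepared {protein} with {', '.join(sides)}"

def call(procedure, **kwargs):
    # Client side: name the procedure; the server runs that exact code.
    return PROCEDURES[procedure](**kwargs)

print(call("kitchen.prepareDish", protein="fish", sides=["chips"]))
# Prepared fish with chips
```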
<h3 id="heading-3-graphql">3. GraphQL</h3>
<p>With GraphQL, the client specifies exactly what data it needs, which can include specific fields from various resources. The server processes this query, retrieves the exact data, and returns it to the client.</p>
<p>This enables the client to have a high degree of flexibility and only retrieve exactly the data it needs. It also requires the server to be capable of handling more complex and unique queries.</p>
<p>In this way, GraphQL is a more customisable form of REST. You still deal with resources (unlike actions in RPC) but you can customise how you want the resource returned to you.</p>
<p>Think of a restaurant that allows you to customise your own dish by specifying exact quantities or ingredients you want.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9327807-033d-4e21-b649-0de8e33272bd_1518x858.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for GraphQL</em></p>
<p>This may look similar to the RPC pattern, but notice that the customer is not saying how the food should be made, they're just customising their order by removing some ingredients (no salt) and reducing the number of some items (two pieces of fish instead of four).</p>
<p>One of the drawbacks of GraphQL is that it adds complexity to the API since the server needs to do additional processing to parse complex queries. This additional complexity would also apply to the restaurant analogy, since each order would need to be customised to the guest.</p>
<p>GraphQL has one clear benefit over REST and RPC. Since clients can specify exactly what they need, the response payload sizes are typically smaller, which means faster response times.</p>
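<p>The essence of a GraphQL query – ask for exactly the fields you want – can be sketched with a tiny resolver. The dish data and field names here are illustrative, not real GraphQL syntax:</p>

```python
def resolve(resource, fields):
    # Return only the requested fields, ignoring any the resource lacks --
    # the client controls the shape of the response.
    return {f: resource[f] for f in fields if f in resource}

dish = {"name": "Fish and Chips", "price": 12, "calories": 900,
        "allergens": ["gluten"]}

# The client asks only for name and price, so the payload stays small.
print(resolve(dish, ["name", "price"]))
# {'name': 'Fish and Chips', 'price': 12}
```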
<h2 id="heading-event-driven-integration">Event Driven Integration</h2>
<p>This integration pattern is ideal for services with fast changing data.</p>
<p>Some of these integration patterns are also <a target="_blank" href="https://lightcloud.substack.com/p/synchronous-and-asynchronous-communication">asynchronous</a> and initiated by the server, unlike the request-response patterns which are <a target="_blank" href="https://lightcloud.substack.com/p/synchronous-and-asynchronous-communication">synchronous</a> and initiated by the client.</p>
<h3 id="heading-1-polling">1. Polling</h3>
<p>Let’s bring back the restaurant analogy. When you order food, it will take some time for it to be prepared.</p>
<p>You can get updates on your order by asking the waiter if it is ready yet. The more frequently you ask, the closer you will be to having real-time information about your order.</p>
<p>This, however, puts unnecessary strain on the waiter, who has to constantly check the status of your order and update you whenever you ask.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F661f8761-5b3e-426f-aa55-fbadc25bd226_892x888.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for polling</em></p>
<p>Polling is when the client continuously asks the server if there is new data available, with a set frequency. It's not efficient because many requests may return no new data, thus unnecessarily consuming resources.</p>
<p>The more frequently you poll (make requests) the closer the client gets to real-time communication with the server.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8467fc-580f-46e4-b134-7dc3128f044a_1174x954.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Simple sequence diagram showing polling in action</em></p>
<p>Most of the requests during polling are wasted, since they only return something useful to the client once there is a change on the server.</p>
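<p>A minimal client-side polling loop looks like the following – the "server" here is simulated by an iterator that has no update on the first two checks:</p>

```python
import time

def poll(check, interval, timeout):
    """Ask repeatedly until check() returns data or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = check()
        if data is not None:
            return data          # something changed on the "server"
        time.sleep(interval)     # wasted request: ask again later
    return None

# Simulated server: the order only becomes ready on the third check.
responses = iter([None, None, "order ready"])
result = poll(lambda: next(responses), interval=0.01, timeout=1.0)
print(result)  # order ready
```

<p>Shrinking <code>interval</code> gets you closer to real time, at the cost of more wasted requests.</p>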
<p>There is, however, another version of polling called long polling. With long polling, the waiter does not respond to the guest straightaway about the status of the order. Instead, the waiter only responds if there is an update.</p>
<p>Naturally, this only works if the guest and the waiter agree beforehand that a slow response from the waiter does not mean that the waiter is being rude and the guest is being ignored.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76ea285-2629-4aea-8b89-81d500484835_892x888.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for long polling</em></p>
<p>With long polling, the server does not respond to the client immediately. It waits until something has changed before responding.</p>
<p>As long as the client and server agree that the server will hold on to the client’s request, and the connection between the client and server remains open, this pattern works and can be more efficient than simply polling.</p>
<p>These two assumptions for long polling may be unrealistic, though – the server can lose the client's request and/or the connection can be broken.</p>
<p>To address these limitations, long polling adds extra complexity: the system needs a directory of which server holds each client's open connection, so that data can be sent to the client whenever the server is ready.</p>
<p>Standard polling on the other hand can remain <a target="_blank" href="https://lightcloud.substack.com/i/104443280/stateless-architecture">stateless</a>, making it more <a target="_blank" href="https://lightcloud.substack.com/i/59017006/fault-tolerance">fault tolerant</a> and scalable.</p>
<h3 id="heading-2-websockets">2. WebSockets</h3>
<p>WebSockets provide a persistent, two-way communication channel between the client and server. Once a WebSocket connection is established, both parties can communicate freely, which enables real-time data flows and is more resource-efficient than polling.</p>
<p>Using the restaurant analogy again, a guest orders a meal and then establishes a dedicated communication channel with the waiter so they can freely communicate back and forth about updates or changes to the order until the meal is ready. This means the waiter can also initiate the communication with the guest, which is not the case for the other integration patterns mentioned so far.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffebdcac2-f175-4ea1-bd19-1234702def98_892x888.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Restaurant analogy for WebSockets</em></p>
<p>WebSockets are similar to long polling. They both avoid the wasteful requests of polling, but WebSockets have the added benefit of having a persistent connection between the client and the server.</p>
<p>WebSockets are ideal for fast, live streaming data, like real-time chat applications. The downside of WebSockets is that the persistent connection consumes bandwidth, so they may not be ideal for mobile applications or areas with poor connectivity.</p>
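<p>The full-duplex behaviour of a WebSocket can be sketched with two asyncio queues standing in for the two directions of one connection. This simulates the semantics only – there is no real network socket here, and the messages are invented:</p>

```python
import asyncio

class Channel:
    """One connection, two directions: client->server and server->client."""
    def __init__(self):
        self.c2s = asyncio.Queue()
        self.s2c = asyncio.Queue()

async def waiter(ch):
    # The server pushes an update without being asked -- either side
    # may initiate once the channel is open.
    await ch.s2c.put("your order is being prepared")
    if await ch.c2s.get() == "make it takeaway":
        await ch.s2c.put("noted: takeaway")

async def guest(ch):
    updates = [await ch.s2c.get()]         # server-initiated message
    await ch.c2s.put("make it takeaway")   # client message, same channel
    updates.append(await ch.s2c.get())
    return updates

async def main():
    ch = Channel()
    server = asyncio.create_task(waiter(ch))
    updates = await guest(ch)
    await server
    return updates

updates = asyncio.run(main())
print(updates)  # ['your order is being prepared', 'noted: takeaway']
```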
<h3 id="heading-3-webhooks">3. WebHooks</h3>
<p>WebHooks allow the server to notify the client when there's new data available. The client registers a callback URL with the server and the server sends a message to that URL when there is data to send.</p>
<p>With WebHooks, the client sends requests as usual, but can also listen for and receive requests like a server.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d687228-bff2-45d4-a7ac-8ebe2bb07094_1458x954.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Simple sequence diagram showing WebHooks in action</em></p>
<p>Using the restaurant analogy, when the guest orders a meal, they give the waiter a bell (analogous to the callback URL). The waiter goes to the kitchen and rings the bell as soon as the meal is ready. This allows the guest to know, in real time, about the progress of their order.</p>
<p>WebHooks are superior to polling because you get real-time updates from the server once something changes, without having to make frequent, wasteful requests to the server about that change.</p>
<p>They're also superior to long polling, which can consume more client and server resources because it keeps connections open, potentially resulting in many open connections at once.</p>
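<p>A WebHook registration can be sketched as follows. In reality the server would make an HTTP POST to the callback URL; here delivery is simulated with a plain function call, and the URL and event name are invented:</p>

```python
# Server-side registry of webhooks: event name -> list of callbacks.
registered_hooks = {}

def register_webhook(event, callback_url, deliver):
    """Client registers a callback URL for an event (deliver simulates
    the HTTP POST the server would make to that URL)."""
    registered_hooks.setdefault(event, []).append((callback_url, deliver))

def fire(event, payload):
    # When the event happens, the server notifies every registered URL.
    for url, deliver in registered_hooks.get(event, []):
        deliver(url, payload)

received = []   # stands in for the client's endpoint handling the POST
register_webhook("order.ready", "https://client.example/hooks/order",
                 lambda url, payload: received.append((url, payload)))

fire("order.ready", {"order_id": 42})
print(received)  # [('https://client.example/hooks/order', {'order_id': 42})]
```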
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>In conclusion, APIs are crucial tools in software development, allowing users and applications to interact with software without understanding its inner workings.</p>
<p>They come in different integration patterns, such as REST, RPC, GraphQL, Polling, WebSockets, and WebHooks.</p>
<p>If you need a simple request-response integration, then REST, RPC, or GraphQL could be ideal. For real-time or near-real-time applications, polling, WebSockets, or WebHooks are better suited.</p>
<p>As with any design problem, the right choice depends on the business case and what tradeoffs you are willing to tolerate.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Stateful vs Stateless Architecture – Explained for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ In programming, "state" refers to the condition of a system, component, or application at a particular point in time. As a simple example, if you are shopping on amazon.com, whether you are currently logged into the site or if you have anything store... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/stateful-vs-stateless-architectures-explained/</link>
                <guid isPermaLink="false">66d45e1d680e33282da25e61</guid>
                
                    <category>
                        <![CDATA[ architecture ]]>
                    </category>
                
                    <category>
                        <![CDATA[ State Management  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Mon, 21 Aug 2023 21:08:59 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/08/cover-photo.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In programming, "state" refers to the condition of a system, component, or application at a particular point in time.</p>
<p>As a simple example, if you are shopping on amazon.com, whether you are currently logged into the site or if you have anything stored in your cart are some examples of state.</p>
<p>State represents the data that is stored and used to keep track of the current status of the application. Understanding and managing state is crucial for building interactive and dynamic web applications.</p>
<p>The concept of a “state” crosses many boundaries in architecture. Design patterns (like REST and GraphQL), protocols (like HTTP and TCP), firewalls and functions can be stateful or stateless. But the underlying principle of “state” cutting across all of these domains remains the same.</p>
<p>This article will explain what state means. It will also explain stateful and stateless architectures with some analogies and the benefits and tradeoffs of both.</p>
<h2 id="heading-what-is-stateful-architecture">What is Stateful Architecture?</h2>
<p>Imagine you go to a pizza restaurant to eat some food. In this restaurant, there is only a single waiter, and the waiter takes detailed notes on your table number, what you ordered, your preferences based on past orders, like what type of pizza crust you like or toppings you are allergic to, and so on.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3d2bd50-8945-4fb4-b36a-1d4730beebe5_1726x1080.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of a waiter taking a person's order at a pizza restaurant</em></p>
<p>All of these pieces of information that the waiter writes down in their notepad are the customer's state. Only the waiter serving you has access to this information. If you want to make a change to your order or check how it's coming along, you need to speak to the same waiter who took your order. But since there is only one waiter, that is not a problem.</p>
<p>Now, suppose the restaurant starts to get busier. Your waiter has to respond to other guests so more waiters are called to work. You now want to check the status of your order and make a small change to it – a plain crust instead of a cheesy crust. The only available waiter is different from the one who initially took your order.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b483849-6dc8-4a90-a491-199150d71547_1566x1020.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration showing a different waiter being unable to help the customer change their order</em></p>
<p>This new waiter does not have the details of your order – that is, your state. Naturally, they will not be able to check the status of your order or make changes to it. A restaurant that operates like this, where only the waiter who initially took your order can give you updates about it or make changes to it, follows a stateful design.</p>
<p>Similarly, a stateful application will have a server that remembers clients' data (that is, their state). All future requests will be routed to the same server using a load balancer with sticky sessions enabled. In this way, the server is always aware of the client.</p>
<p>The diagram below shows two different users trying to access a web server through a load balancer. Since the application state is maintained on the servers, the users must always be routed to the same server for every single request in order to preserve state.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd494a45c-284b-4dd8-a6f1-5eb072157c70_1028x834.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Diagram showing how a stateful application works</em></p>
<p>Sticky sessions are a load balancer configuration that routes a user's requests consistently to the same backend server for the duration of their session. This is in contrast to <a target="_blank" href="https://lightcloud.substack.com/i/102200211/load-balancing-explained">traditional load balancing</a>, where requests from a user can be directed to any available backend server in a round-robin or other load distribution pattern.</p>
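<p>One common way to implement stickiness is to hash a session identifier to pick a server. Real load balancers often use cookies instead – this is a sketch of the routing idea only, with invented server names:</p>

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]

def route(session_id):
    # The same session id always hashes to the same server,
    # which is exactly what "sticky" means.
    digest = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]

print(route("user-123") == route("user-123"))  # True: requests stick
```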
<p>What is the problem with a stateful architecture? Imagine a restaurant run in this manner. While it may be ideal and easy to implement for a small, family run restaurant with only a few customers, such a design is <strong>not fault tolerant</strong> and <strong>not scalable</strong>.</p>
<p>What happens if the waiter who took a customer's order has an emergency and needs to leave? All the information regarding that order leaves with that waiter as well. This disrupts the customer’s experience, since any new waiter brought in to replace the old one has no knowledge of previous orders. This is a design that is not fault tolerant.</p>
<p>Also, having to distribute requests so that the same customer can only speak to the same waiter means that the load on different waiters is not equally distributed. Some waiters will be overwhelmed with requests if you have a very demanding customer who always modifies or adds things to their order. Some of the other waiters will have nothing to do, and can’t step in to help. Again, this is a non scalable design.</p>
<p>Similarly, storing state data for different customers on different servers is not fault tolerant and not scalable. A server failure will lead to loss of state data. So, if a user is logged in and about to checkout for a large order on Amazon.com for example, the user will be forced to re-authenticate and the user's basket will be empty. They would have to log in again and fill up their basket from scratch – a poor user experience.</p>
<p>Scalability will also be difficult to achieve during peak times, like Black Friday, with a stateful design. New servers will be added to the <a target="_blank" href="https://lightcloud.substack.com/i/102200211/auto-scaling-explained">auto scaling group</a>, but since sticky sessions are enabled, existing clients will keep being routed to their original servers, leaving those servers overwhelmed and increasing response times – a poor user experience.</p>
<p>Stateless architectures solve a lot of these problems.</p>
<h2 id="heading-what-is-stateless-architecture">What is Stateless Architecture?</h2>
<p>“Stateless” architecture is a confusing term, as it implies the system is without state. A stateless architecture does not, however, mean that state information is not stored. It simply means that state information is stored outside of the server – statelessness applies only to the server itself.</p>
<p>Bringing back the restaurant analogy, waiters in a stateless restaurant can be thought of as having perfectly forgetful memories. They do not recognise old customers and can’t recall what you ordered or how you like your pizza. Instead, they take note of customers' orders on a separate system, say a computer, that is accessible by all the waiters. They can then refer back to the computer to get the details of an order and make changes to it as required.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bc5ab03-b96a-4e4b-b27e-ad9077a4bdc3_1852x932.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of a "forgetful" waiter taking an order and then consulting the computer about orders</em></p>
<p>By storing the ‘state’ of a customer's order on a central system accessible by other waiters, any waiter can serve any customer.</p>
<p>In a stateless architecture, HTTP requests from a client can be sent to any of the servers.</p>
<p>State is typically stored in a separate database, accessible by all the servers. This creates a fault tolerant and scalable architecture, since web servers can be added or removed as needed without impacting state data.</p>
<p>The load will also be equally distributed across all servers, since the load balancer will not need a sticky session configuration to route the same clients to the same servers.</p>
<p>The diagram below shows two different users trying to access a web server through a load balancer. Since the application state is maintained separately from the servers, the users can be routed to any of the servers, which will then get the state information from an external database accessible by both servers.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F360b7c09-6e3c-4443-a05b-b58f70dc0039_1322x804.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration showing a diagram of stateless architecture</em></p>
<p>Typically, state data is stored in a cache like <a target="_blank" href="https://redis.io/">Redis</a>, an in-memory data store. Storing state data in-memory improves read and write times, compared to storing it on disk, as explained <a target="_blank" href="https://lightcloud.substack.com/i/81969975/compute-memory-and-storage-an-analogy">here</a>.</p>
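<p>The idea can be sketched in a few lines of Python. Here a plain in-process dictionary stands in for an external store like Redis, and any "server" can pick up any request because the handler itself keeps no state:</p>

```python
# An in-process dict stands in for an external store like Redis; the handler
# itself keeps no state, so any server can handle any request.
session_store = {}

def handle_request(server_id, session_id, item=None):
    """Stateless handler: all state is read from and written to the shared store."""
    basket = session_store.setdefault(session_id, [])
    if item is not None:
        basket.append(item)
    return {"served_by": server_id, "basket": list(basket)}

# Two different servers handle the same user's requests interchangeably,
# and the basket survives the switch between them.
r1 = handle_request("server-a", "user-42", item="book")
r2 = handle_request("server-b", "user-42", item="pen")
print(r2["basket"])  # ['book', 'pen']
```

<p>The server and session identifiers here are invented for illustration; in a real deployment the store would be Redis or a database reachable from every server.</p>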
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>This article has described how stateful and stateless web applications work and the trade-offs of both. But the principles of statefulness and statelessness apply beyond web applications.</p>
<p>If we look at network protocols as an example, HTTP is a stateless protocol. This means that each HTTP request from a client to a server is <strong>independent</strong> and carries <strong>no knowledge of previous requests</strong> or their context. The server treats each request as a separate and isolated transaction, and it doesn't inherently maintain information about the state of the client between requests.</p>
<p>State is either maintained on the servers (stateful architecture) or in a separate database outside the servers (stateless architecture). The HTTP protocol itself does not maintain state.</p>
<p>Unlike the stateless nature of HTTP, <a target="_blank" href="https://www.freecodecamp.org/news/tcp-vs-udp/">the TCP protocol</a> is connection-oriented and stateful. It establishes a connection between two devices (usually a client and a server) and maintains a continuous communication channel until the connection is terminated.</p>
<p>The same logic applies to firewalls as well, which can be stateful or stateless.</p>
<p>In AWS, a security group is a virtual firewall that controls inbound and outbound traffic for virtual machines or instances within a cloud environment. Security groups are stateful. When you allow a specific incoming traffic flow, the corresponding outgoing traffic flow is automatically allowed as well. In other words, the state of the connection is tracked.</p>
<p>Network Access Control Lists (NACLs) are used to control inbound and outbound traffic at the subnet level in AWS. NACLs are stateless. Being stateless means that you must explicitly define rules for both incoming and outgoing traffic.</p>
<p>Unlike security groups, where response traffic is automatically allowed when you allow incoming traffic, NACLs require you to define separate rules for inbound and outbound traffic.</p>
<p>Functions and design patterns can also be stateful or stateless.</p>
<p>The key principle behind something that is stateful is that it has perfect memory or knowledge of previous calls or requests, while something that is stateless has no memory or knowledge of previous calls or requests.</p>
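<p>As a minimal Python illustration of this principle, compare a stateful counter, which remembers previous calls, with a stateless function, which must be handed all of its state on every call:</p>

```python
class StatefulCounter:
    """Stateful: the object itself remembers previous calls."""
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

def stateless_increment(count):
    """Stateless: no memory, so the caller supplies all state on every call."""
    return count + 1

counter = StatefulCounter()
counter.increment()
print(counter.increment())     # 2 – the object remembered the first call
print(stateless_increment(1))  # 2 – but only because the state was passed in
```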
<p>Hopefully you now have a good grasp of how stateful and stateless applications work and can decide which option is best for your applications.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Cloud Storage Options – Block Storage vs File Storage vs Object Storage Explained ]]>
                </title>
                <description>
                    <![CDATA[ There are three types of storage options offered by most cloud providers: block storage, file storage, and object storage (often referred to as BLOB or Binary Large Object). This tutorial will explain these types of storage, their real-world use case... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/cloud-storage-options/</link>
                <guid isPermaLink="false">66d45e0cf855545810e93425</guid>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ storage ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Wed, 26 Jul 2023 17:10:33 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/pictures.001.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>There are three types of storage options offered by most cloud providers: block storage, file storage, and object storage (often referred to as BLOB or Binary Large Object).</p>
<p>This tutorial will explain these types of storage, their real-world use cases, and some trade-offs.</p>
<h2 id="heading-what-is-block-storage">What is Block Storage?</h2>
<p>Imagine you have a large bookshelf with many shelves, and each shelf can hold a specific number of pages from a book.</p>
<p>Now, let's say you have a collection of books, but they are all different sizes. To efficiently store these books, you decide to divide them into smaller uniform pieces, called blocks, that fit nicely on the shelves.</p>
<p>Each shelf can only store 100 pages from a book. The Great Gatsby has about 200 pages, so each shelf can only store half the book. Therefore, this single book will be stored on two separate shelves as shown below, with half of the book in one location and the other half in another.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b745ba-7f4a-4bbd-9a66-08a2670319f4_1798x962.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We have defined a block as the maximum number of pages that can be stored in a shelf. In this example, the block size is 100 pages.</p>
<p>With block storage, data is divided into fixed-size blocks, just like the pages of a book in our analogy. These blocks are usually several thousand bytes.</p>
<p>Each block is assigned a unique address, similar to the location of a specific book (or pages of a book) on a shelf. These addresses allow you to quickly find and access individual blocks of data without having to go through the entire storage.</p>
<p>When you want to store or retrieve data using block storage, you interact with the blocks directly. You can write new data to an empty block or overwrite existing data in a block. If you need to retrieve specific information, you can request the block by its unique address, and it will be returned to you.</p>
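<p>As a rough Python sketch of this, an ordinary file can stand in for a raw block device: blocks are addressed by index, and reading one means seeking straight to <code>index × block size</code> rather than scanning everything before it (the 4 KB block size is just a common choice):</p>

```python
import os
import tempfile

BLOCK_SIZE = 4096  # a common block size; real devices vary

# An ordinary temp file stands in for a raw block device.
fd, path = tempfile.mkstemp()
os.close(fd)

def write_block(path, block_index, data):
    """Overwrite exactly one block at its address (index * block size)."""
    assert len(data) <= BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(block_index * BLOCK_SIZE)
        f.write(data.ljust(BLOCK_SIZE, b"\x00"))  # pad to a full block

def read_block(path, block_index):
    """Fetch one block by its address, without scanning what came before it."""
    with open(path, "rb") as f:
        f.seek(block_index * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)

# Store "The Great Gatsby" as two blocks, then jump straight to the second.
write_block(path, 0, b"pages 1-100")
write_block(path, 1, b"pages 101-200")
second_half = read_block(path, 1).rstrip(b"\x00")
print(second_half)
```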
<p>Hard disk drives (HDD) and solid state drives (SSD) that are attached to a computer physically or via a network are examples of block storage devices.</p>
<p>The main cloud providers all have block storage options:</p>
<ol>
<li><p>AWS – Elastic Block Storage (EBS)</p>
</li>
<li><p>GCP – Persistent Disks</p>
</li>
<li><p>Azure – Managed Disks</p>
</li>
</ol>
<p>Block storage devices are usually only attached to a single instance. This is one way in which block storage differs from file storage, which I will explain in the next section.</p>
<p>Physically attached block storage is not persistent: its data only lasts as long as the instance, and is lost when the instance is terminated. Network attached block storage persists beyond the life of the instance.</p>
<p>Bringing back the analogy of storing a fixed number of pages from a book in a bookshelf, block storage will allow you to modify or retrieve specific pages without having to handle the entire book.</p>
<p>However, why would you ever need to do this? Isn’t it better to simply have the shelves store a book in its entirety?</p>
<p>For humans, this is certainly preferred. For computers, storing information in blocks has some advantages.</p>
<p>Since block storage presents raw blocks to the compute instance, the instance has flexibility over how the blocks are managed. This is ideal for applications that require high performance and low latency storage, like databases, high performance computing, and ETL (Extract, Transform, Load) workloads, among others.</p>
<p>Block storage devices are also used to store the operating system. They are also bootable. “Bootable” simply refers to the ability of a device to start or initiate the process of loading an operating system or software program when a computer is powered on or restarted.</p>
<p>A bootable device contains the necessary files and data that enable a computer to begin its startup sequence and load the operating system into the computer's memory.</p>
<h2 id="heading-what-is-file-storage">What is File Storage?</h2>
<p>Block storage is the lowest level abstraction of storage. It provides a low-level interface where you can read from or write to individual blocks of data. But it does not inherently understand the concept of files, directories, or the hierarchical structure typically associated with file systems.</p>
<p>File storage is an abstraction built on top of block storage. It introduces the concept of files, directories, and a hierarchical structure for organising and managing data.</p>
<p>With file storage, you can group related blocks of data together to form files and organise files within directories. This allows for a more intuitive way to access and manage data, as you can work with files and directories rather than dealing with individual blocks.</p>
<p>Using our bookshelf analogy, with file storage, all you interact with are files and their hierarchies, just like how a book store can organise their books in a structured way to make it easier for customers to find and browse through the selections. Books can be arranged alphabetically by author name, by genre, or a combination of both as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1bea5d8-3368-4809-bb46-d520357c6934_1406x1014.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Books can also be further arranged by sub-genres (science fiction, fantasy, mystery, and so on), bestsellers, new releases, books on sale, staff recommendations, and children’s books.</p>
<p>Similarly with file storage, you have flexibility over the hierarchical structure which makes it easier for the users (humans or other applications) to access data.</p>
<p>This hierarchical file structure is just an <a target="_blank" href="https://lightcloud.substack.com/p/cloud-computing-abstractions-explained">abstraction</a>. Behind the scenes, the operating system abstracts away the underlying block storage and instead gives the appearance of a file cabinet with a folder-like structure. This simplifies access to applications trying to read or write files on the disk.</p>
<p>Applications don’t need to know the underlying block address to retrieve the files, which makes it easier for the application to interact with the files. This ease of interaction comes with a performance cost, which is acceptable for some use cases.</p>
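<p>A short Python sketch shows this abstraction at work: you create directories and files by name, and the operating system quietly handles the mapping to blocks (the paths and file contents here are invented for illustration):</p>

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Build a hierarchy by name, like shelves organised by genre.
classics = root / "fiction" / "classics"
classics.mkdir(parents=True)

book = classics / "the_great_gatsby.txt"
book.write_text("Chapter 1 ...")

# Retrieval is by path, not by block address; the OS handles the mapping.
print(book.read_text())
```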
<p>The main cloud providers all have file storage options:</p>
<ol>
<li><p>AWS – Elastic File Storage (EFS)</p>
</li>
<li><p>GCP – Cloud Filestore</p>
</li>
<li><p>Azure – Azure Files</p>
</li>
</ol>
<p>Unlike block storage, <strong>multiple compute instances can be mounted on the same file storage device.</strong></p>
<h2 id="heading-what-is-object-storage">What is Object Storage?</h2>
<p>This is the newest form of storage on the cloud. Object storage stores all data as objects in a flat structure. There is no hierarchy, unlike in file storage, but an artificial folder-like hierarchy can be imposed on object storage to give it the appearance of having a structure.</p>
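<p>A toy Python model makes the flat structure concrete: the "bucket" is just a mapping of keys to bytes, and "folders" are nothing more than a shared key prefix (the bucket contents and key names are invented for illustration):</p>

```python
# A toy object store: a flat mapping of keys to bytes.
bucket = {}

def put_object(key, data):
    bucket[key] = data

def list_objects(prefix=""):
    """'Folders' are simulated by filtering keys on a shared prefix."""
    return sorted(k for k in bucket if k.startswith(prefix))

put_object("evidence/audio/call-01.mp3", b"...")
put_object("evidence/video/cam-02.mp4", b"...")
put_object("reports/summary.json", b"{}")

# Looks like listing a folder, but it is just a key-prefix filter.
print(list_objects("evidence/"))
```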
<p>Object storage is highly scalable. It can store many billions of objects. As of 2009, Amazon S3 alone stored <a target="_blank" href="https://www.allthingsdistributed.com/2009/11/82_billion_objects_in_amazon_s.html">82 billion objects</a>. It would not be a surprise if this has surpassed trillions of objects as of this writing in 2023.</p>
<p>Objects can be any type of file: video, audio, images, text files, Excel files, Word documents, HTML, CSS, XML, JSON, and so on.</p>
<p>Object storage is highly durable – that is, there is a very low probability that any object stored there will be lost.</p>
<p>Object storage offered by cloud providers usually provides 99.999999999% durability over a year. This is colloquially referred to as 11 nines of durability.</p>
<p>Durability is defined as the probability of <strong>not</strong> losing an object. A storage system that has a durability of 99.999999999% has a 0.000000001% chance of losing a single object in a year. This means that even if you have a million objects stored in object storage, you are likely to only lose a single object in 100,000 years.</p>
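<p>This arithmetic is easy to verify with a few lines of Python:</p>

```python
# "11 nines" of durability: the yearly probability of losing any one object.
durability = 0.99999999999
annual_loss_prob = 1 - durability  # ≈ 1e-11 per object, per year

objects_stored = 1_000_000
expected_losses_per_year = objects_stored * annual_loss_prob  # ≈ 1e-5
years_to_lose_one_object = 1 / expected_losses_per_year

print(round(years_to_lose_one_object))  # ≈ 100,000 years
```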
<p>This is a remarkable level of durability that is not matched by the other storage systems.</p>
<p>Naturally, all these probabilities are meaningless if Earth is destroyed, since all the servers storing these objects currently reside on a single planet.</p>
<p>The main cloud providers all have object storage options:</p>
<ol>
<li><p>AWS – S3</p>
</li>
<li><p>GCP – Cloud Storage</p>
</li>
<li><p>Azure – Azure Blob storage</p>
</li>
</ol>
<h2 id="heading-patterns-amp-anti-patterns-when-to-use-each-storage-type">Patterns &amp; Anti-patterns – When to Use Each Storage Type</h2>
<p>The table below summarises the trade-offs between the different storage types plus some use cases.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd05d0204-4368-4ad7-94b7-6598713e23fc_2048x1246.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Note: "Mountable" refers to the capability of connecting or attaching a storage device or file system to a specific location in a computer's file system hierarchy.</p>
<p>When a storage device or file system is "mountable," it means that it can be integrated and made accessible to the operating system and applications running on the computer.</p>
<p>As you can see in the image above, some common use cases for each storage type are as follows:</p>
<ul>
<li><p>Block storage: databases, ETL, high performance computing, OS storage, and boot volume</p>
</li>
<li><p>File storage: sharing files across many compute instances</p>
</li>
<li><p>Object storage: large, scalable, and durable storage of different objects (like images, audio, and video), disaster recovery, and archiving data.</p>
</li>
</ul>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>Let’s imagine a simple scenario where you would need to use block, file, and object storage together.</p>
<p>Imagine you have been tasked with designing the cloud architecture for a law firm. You need to store a large number of evidence files in different formats (audio, video, image, text, Excel, JSON, and so on).</p>
<p>You need to process these files, extract the useful information, and make it available for further processing by different people. You also need to make it available for direct use to a team of lawyers in different parts of the world.</p>
<p>From a high level, how could you design such a solution?</p>
<p>You can see how to do this in the image below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b53e26c-68ee-4f4b-82f4-e8de7d5d66ff_2022x1324.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>First, the raw files could be stored using either object storage or file storage. Object storage is preferable because it is lower cost. It is also ideal for low cost archival storage if you need to keep files for several years.</p>
<p>To process the files, you can run multiple applications on an EC2 instance that uses block storage. The other alternative is to use file storage.</p>
<p>Block storage is preferable because these processing tasks may include things like transcribing audio files, extracting text from images, improving and stabilising video files, or extracting data from JSON files and loading it into a relational database. These are all tasks that require higher performance, which is an ideal use case for block storage.</p>
<p>The processed files are then stored in S3 again before they are loaded into a file system with several instances mounted on it.</p>
<p>The up-to-date processed files must be available for further processing or for direct use by a team of lawyers. Block storage is not ideal here because it cannot be shared across multiple instances. Object storage is also not ideal because it's not mountable (it can’t be attached to a compute instance).</p>
<p>In this case, file storage is ideal because it has none of these constraints – it can be shared across multiple instances and it is mountable.</p>
<h2 id="heading-summary">Summary</h2>
<p>In summary, block storage is ideal for high performance applications. File storage is ideal for sharing files across multiple instances.</p>
<p>Like block storage, file storage is mountable – that is, it can be integrated and made accessible to the operating system and applications running on the instance.</p>
<p>Object storage is ideal for low cost durable and scalable storage where high performance is not important.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Infrastructure as Code? Explained for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Infrastructure as Code (IaC) is a way of managing your infrastructure like it was code. This gives you all the benefits of using code to create your infrastructure, like version control, faster and safer infrastructure deployments across different en... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/infrastructure-as-code-basics/</link>
                <guid isPermaLink="false">66d45e193dce891ac3a967e4</guid>
                
                    <category>
                        <![CDATA[ Infrastructure as code ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Terraform ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Thu, 15 Jun 2023 14:32:46 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/06/cover-5.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Infrastructure as Code (IaC) is a way of managing your infrastructure like it was code. This gives you all the benefits of using code to create your infrastructure, like version control, faster and safer infrastructure deployments across different environments, and having up to date documentation of your infrastructure.</p>
<p>This article will cover how infrastructure as code works, using an analogy. We'll look at the different infrastructure as code tools available, as well as declarative vs imperative code.</p>
<p>I'll also introduce you to Terraform, which is an open source infrastructure as code tool you can use to create infrastructure across multiple cloud providers like AWS, GCP, Azure and others.</p>
<h2 id="heading-infrastructure-as-code-in-practice">Infrastructure as Code in Practice</h2>
<p>Imagine you are trying to create a three-tiered web application on AWS as you can see in the image below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f739c67-e79c-4995-b58d-71d695cccd47_1018x1682.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Three tiered web application example</em></p>
<p>The presentation tier is responsible for presenting the user interface to the user. It includes the user interface components such as HTML, CSS, and JavaScript running on EC2 instances.</p>
<p>The logic tier is responsible for processing user requests and generating responses by communicating with the database tier to retrieve or store data. This is also deployed on EC2 instances.</p>
<p>The database tier is responsible for storing and managing the application's data and allows access to its data through the logic tier. The database runs on AWS RDS.</p>
<p>Each of the instances are in an <a target="_blank" href="https://lightcloud.substack.com/i/102200211/auto-scaling-explained">autoscaling</a> group with a <a target="_blank" href="https://lightcloud.substack.com/i/102200211/load-balancing-explained">load balancer</a> in front of it (except for the database tier).</p>
<p>If you want to create this infrastructure through the AWS console, you would have to manually click through various screens to spin up the infrastructure. This is fine if it is a one-time activity.</p>
<p>But if you need to repeat this across different environments like development and test, or need to add additional infrastructure like caches, queues, firewall rules, <a target="_blank" href="https://lightcloud.substack.com/p/aws-iam-identity-and-access-management">IAM</a> or SSL certificates, then it becomes increasingly more complex to manage through the AWS console.</p>
<p>Managing complex infrastructure through the console also introduces the possibility of human error.</p>
<p>Infrastructure as code expresses your desired infrastructure in the language of code. This brings all the benefits of code to managing your infrastructure like:</p>
<ol>
<li><p>Version Control – allows you to store the history of your infrastructure and revert to a previous version if needed.</p>
</li>
<li><p>Faster &amp; safer deployments – you can recreate infrastructure in new environments quickly and with fewer errors, since every part of the infrastructure is clearly defined in the code.</p>
</li>
<li><p>Documentation – your current infrastructure state is documented and kept up to date automatically whenever you make a change. This keeps your infrastructure documentation detailed and accurate, compared to having the infrastructure written in a document or on a confluence page that may not be updated whenever there is a change.</p>
</li>
</ol>
<h2 id="heading-how-infrastructure-as-code-works-explained-with-an-analogy">How Infrastructure as Code Works – Explained with an Analogy</h2>
<p>Infrastructure as code allows you to create a detailed blueprint of your infrastructure. This blueprint gives instructions to your cloud provider about the infrastructure you want created.</p>
<p>This is similar to how an architecture blueprint works. It outlines the layout, dimensions, materials, and various components of the structure. The blueprint serves as a reference for architects and engineers to understand the desired construction.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1aacbf-34c7-43ae-8384-2f912072cc00_2728x1514.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>how an architectural blueprint is analogous to infrastructure as code</em></p>
<p>The blueprint leaves little room for error. It will be interpreted in the same way by any architect or engineer. If you wanted to build exact copies of this house, all you need is the architecture blueprint.</p>
<p>Infrastructure as code, at a basic level, works in the same way as an architecture blueprint. It details the infrastructure you want to create as code in a number of different possible languages (JSON, YAML, HCL, Python, Ruby, JavaScript, and so on), instructing the cloud provider to create your infrastructure exactly as specified.</p>
<h2 id="heading-declarative-amp-imperative-infrastructure-as-code-tools">Declarative &amp; Imperative Infrastructure as Code Tools</h2>
<p>There are many IaC options to choose from, and all the major cloud providers have their own dedicated tools:</p>
<ul>
<li><p>AWS has CloudFormation</p>
</li>
<li><p>GCP has Deployment Manager</p>
</li>
<li><p>Azure has Resource Manager</p>
</li>
</ul>
<p>One limitation of these cloud provider-specific tools is that they can only create infrastructure in their respective clouds. CloudFormation only works in AWS and Deployment Manager only works in GCP. IaC for these tools is usually written in JSON or YAML format.</p>
<p>Terraform, on the other hand, is open source and you can use it to create infrastructure across all the major cloud providers. It uses HCL (HashiCorp Configuration Language).</p>
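<p>To give a flavour of HCL, here is a hypothetical Terraform configuration describing a single EC2 instance. Note that it declares the desired end state rather than the steps to reach it (the region, AMI ID, and names below are placeholders):</p>

```hcl
provider "aws" {
  region = "eu-west-1" # placeholder region
}

# Declares the desired end state: one EC2 instance. Terraform works out
# the API calls needed to make reality match this description.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "MyEC2Instance"
  }
}
```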
<p>Infrastructure as code can also be written using popular languages like Python and JavaScript.</p>
<p>These scripting/programming languages lie on a spectrum of declarative and imperative code as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdaba03dc-13eb-4f45-873f-6f00dd648ffb_2508x870.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>A spectrum of declarative &amp; imperative languages and where Terraform HCL fits</em></p>
<p>The main difference between an imperative and declarative language is that imperative languages explicitly define the <em>control flow</em>. This is simply the order in which instructions are executed in a program. Control flow determines the path the program takes and how it responds to different conditions or events.</p>
<p>In imperative languages, control flow is explicitly defined using control structures such as loops, conditionals, and function calls. Imperative languages give you more flexibility in configuring your infrastructure. This is not necessarily a positive, as more flexibility means more opportunity to introduce errors into your infrastructure.</p>
<p>A declarative language focuses on describing the desired result without giving specific instructions on how to achieve it.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fe4d2ce-6449-4597-beca-5a46f2aa6ee8_2620x1468.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>An illustration demonstrating the difference between declarative and imperative languages</em></p>
<p>An example JSON is shown below, used in AWS CloudFormation to create an EC2 instance:</p>
<pre><code class="lang-python"><span class="hljs-string">"Type"</span>: <span class="hljs-string">"AWS::EC2::Instance"</span>,
      <span class="hljs-string">"Properties"</span>: {
        <span class="hljs-string">"ImageId"</span>: <span class="hljs-string">"ami-0123456789"</span>,
        <span class="hljs-string">"InstanceType"</span>: <span class="hljs-string">"t2.micro"</span>,
        <span class="hljs-string">"KeyName"</span>: <span class="hljs-string">"my-key-pair"</span>,
        <span class="hljs-string">"SecurityGroupIds"</span>: [<span class="hljs-string">"sg-0123456789"</span>],
        <span class="hljs-string">"SubnetId"</span>: <span class="hljs-string">"subnet-0123456789"</span>,
        <span class="hljs-string">"Tags"</span>: [
          {
            <span class="hljs-string">"Key"</span>: <span class="hljs-string">"Name"</span>,
            <span class="hljs-string">"Value"</span>: <span class="hljs-string">"MyEC2Instance"</span>
          }
        ]
      }
</code></pre>
<p>A declarative language like JSON <a target="_blank" href="https://lightcloud.substack.com/p/cloud-computing-abstractions-explained">abstracts</a> away the underlying complexity that details how the EC2 instance will be created. All it cares about is the end state.</p>
<p>Terraform HCL is closer to the declarative end of the spectrum. Terraform allows you to describe the desired infrastructure's final state without specifying the exact steps to get there. Terraform internally manages the execution order, resource dependencies, and handles the infrastructure changes based on the desired configuration.</p>
<p>But Terraform does have support for some imperative features like variables and expressions, allowing dynamic behaviour based on inputs. So, it is not a completely declarative language like JSON.</p>
<h2 id="heading-how-terraform-works">How Terraform Works</h2>
<p>There are two fundamental concepts that serve as a foundation for understanding Terraform:</p>
<ol>
<li><p>The configuration file – this describes the desired infrastructure</p>
</li>
<li><p>The state file – this describes the current infrastructure as it exists in the real world</p>
</li>
</ol>
<p>Terraform’s job is to create, modify or delete infrastructure as needed so that the desired infrastructure configuration is met. It does this by executing the necessary API calls to your cloud provider(s) to create, modify, or destroy the resources as specified.</p>
<p>Once the infrastructure has been created/modified/destroyed to match the configuration file, the state file is updated to reflect the current infrastructure.</p>
<p>The <code>terraform plan</code> command creates an <a target="_blank" href="https://developer.hashicorp.com/terraform/cli/commands/plan">execution plan</a>, which lets you preview the changes that Terraform plans to make to your infrastructure.</p>
<p>By default, when Terraform creates a plan, it compares the desired configuration as described in the configuration file with the current configuration as described in the state file. Terraform then proposes the list of changes needed to ensure that the current configuration matches the desired one.</p>
<p>If you then run the <code>terraform apply</code> command, Terraform will modify the real-world infrastructure so that it matches the desired configuration, and update the state file to show the new infrastructure configuration.</p>
<p>At a high level, this is what Terraform does:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba9f88a0-e8bb-4507-9fd9-a9576bb8fefe_2650x1380.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>What happens when you run the</em> <code>terraform apply</code> command</p>
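<p>The core idea of comparing the desired configuration against the recorded state can be sketched as a toy diff in Python. Real Terraform also resolves resource dependencies and talks to provider APIs, among much else, so this is only an illustration of the concept:</p>

```python
def plan(desired, state):
    """Propose the changes needed to make `state` match `desired`."""
    actions = []
    for name, config in desired.items():
        if name not in state:
            actions.append(("create", name))
        elif state[name] != config:
            actions.append(("modify", name))
    for name in state:
        if name not in desired:
            actions.append(("destroy", name))
    return actions

def apply(desired, state):
    """Carry out the plan, then record the new reality in the state."""
    state.clear()
    state.update(desired)

# Hypothetical resource names and types, invented for illustration.
desired = {"web": {"type": "t2.micro"}, "db": {"type": "db.t3.small"}}
state = {"web": {"type": "t2.nano"}, "old_cache": {"type": "t2.micro"}}

print(plan(desired, state))  # [('modify', 'web'), ('create', 'db'), ('destroy', 'old_cache')]
```

<p>After <code>apply</code> runs, planning again proposes no changes, because the state now matches the desired configuration.</p>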
<p>Let’s bring back the architectural blueprint analogy.</p>
<p>The configuration file is like the architectural blueprint: it details the infrastructure that needs to be built – the desired construction. The real-world infrastructure is the existing construction in the physical world, and the state file is a representation of what currently exists – the current blueprint. The engineers work to ensure that the existing construction matches the architectural blueprint.</p>
<p>In this analogy, engineers do the work of Terraform in ensuring that the existing construction matches the architecture blueprint. You don’t need to specify the details of how to build the house, you just need to specify what you want built and the engineers handle the rest.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F973320df-191b-46b6-be0a-18cd43305288_2620x1468.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>An architectural analogy to running</em> <code>terraform apply</code></p>
<p>If you want to learn more about how Terraform works and how you can use it in your projects, you can <a target="_blank" href="https://www.freecodecamp.org/news/learn-terraform-and-aws-by-building-a-dev-environment/">check out this free course</a> on freeCodeCamp's YouTube channel.</p>
<h2 id="heading-bringing-it-together">Bringing it Together</h2>
<p>Infrastructure as code (IaC) is a great way of managing complex infrastructure configuration in the form of code. This naturally brings all the advantages of code to your infrastructure like version control, faster and safer infrastructure deployments across different environments and up to date documentation of your infrastructure.</p>
<p>Terraform is an open source IaC tool that allows you to work with multiple cloud providers to spin up infrastructure as defined in your configuration files.</p>
<p>Terraform HCL is a declarative language that allows you to describe your desired infrastructure configuration. All you have to do is specify what you want created, and Terraform handles the creation on your behalf by making API calls to your chosen cloud provider(s).</p>
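For example, a minimal HCL configuration that declares a single virtual machine might look like this – the region, AMI ID, and tag values here are purely illustrative:

```hcl
# Declare which cloud provider Terraform should talk to
provider "aws" {
  region = "us-east-1"
}

# Describe *what* you want – one EC2 instance – not how to create it
resource "aws_instance" "web_server" {
  ami           = "ami-0123456789abcdef0" # illustrative AMI ID
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

Running <code>terraform plan</code> against a file like this previews the instance to be created, and <code>terraform apply</code> makes the API calls to create it and records it in the state file.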
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Cloud Computing Abstractions – IaaS, PaaS, FaaS, and SaaS Explained ]]>
                </title>
                <description>
                    <![CDATA[ Abstracting is the process of reducing something to its most basic form. It is the hiding away of the inessential. For a drawing, this could be reducing it to its basic lines and shapes. Naturally, there are many levels of an abstraction, since what ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/cloud-computing-abstractions-explained/</link>
                <guid isPermaLink="false">66d45e0ad1ffc3d3eb89ddc5</guid>
                
                    <category>
                        <![CDATA[ abstraction ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Tue, 16 May 2023 16:31:36 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/05/pictures-2.001-2.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Abstracting is the process of reducing something to its most basic form. It is the hiding away of the inessential.</p>
<p>For a drawing, this could be reducing it to its basic lines and shapes. Naturally, there are many levels of an abstraction, since what is inessential is subjective.</p>
<p>This is illustrated in the image below showing a Greek temple and two drawings at different levels of abstraction.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8cf1dcb-27a4-42fb-babb-b58fb8c659b3_1628x882.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Example of abstraction</em></p>
<p>As you abstract away more of the details of the temple, you are left with something very simple. In its most basic form, a Greek temple can be thought of as a triangle sitting on top of a rectangle with vertical lines going across the rectangle.</p>
<p>Abstractions are everywhere around us. Google Maps is a good example of this. You can have a satellite, roadmap, terrain, traffic, cycling, public transport or street view, among others. Each of these options abstracts away some details, allowing you to focus on what you want to see.</p>
<p>Even a satellite map is an abstraction, since it is a point in time snapshot and cannot capture every new house, tree, or blade of grass.</p>
<p>The key point is that an abstraction simplifies something by <strong>hiding away the underlying details.</strong> It is a way of managing complexity.</p>
<p>But there is always a price to be paid. In exchange for hiding away complexity, you lose some lower level details which often means a loss of control if things go wrong.</p>
<h1 id="heading-abstractions-in-the-cloud">Abstractions in the Cloud</h1>
<p>In cloud computing, abstractions are everywhere. When you choose a particular technology to solve a problem, you are implicitly choosing a level of abstraction.</p>
<p>There are four broad levels of abstraction in cloud computing. These are called the service models:</p>
<ul>
<li><p>IaaS (Infrastructure as a Service)</p>
</li>
<li><p>PaaS (Platform as a Service)</p>
</li>
<li><p>FaaS (Function as a Service)</p>
</li>
<li><p>SaaS (Software as a Service).</p>
</li>
</ul>
<p>These four broad service models are just a guide for splitting out the different levels of abstraction in cloud computing. You can think of them more like well thought-out opinions, rather than some hard rule of physics.</p>
<p>Some people only consider IaaS, PaaS and SaaS as the service models, ignoring FaaS. Others will include Container as a Service, Security as a Service, Database as a Service, and so on.</p>
<p>The examples can go on and on by simply appending “as a service” to different technologies. This makes sense since you can abstract at different levels, which makes any abstraction of the cloud a spectrum of possibilities.</p>
<p>Just like the Greek temple shown above, there can be many intermediary levels of abstraction between a life-like drawing of the temple and a drawing with a triangle sitting on top of a rectangle.</p>
<p>When you choose to use cloud computing instead of an on-premise solution, you are effectively choosing to abstract away some of the underlying tasks and pieces of infrastructure that you would otherwise need to manage.</p>
<p>The figure below shows the differences between an on-premise solution and IaaS, PaaS, FaaS, and SaaS.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f24811c-1b7f-4c93-b696-b42c5d41d576_1224x1080.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Figure illustrating the differences between an on-premise solution vs IaaS, PaaS, FaaS, and SaaS.</em></p>
<p>As you move to the right in the above illustration, you abstract away more of the underlying infrastructure stack. This reduces the complexity of what you are trying to build, since there are fewer things to build and manage.</p>
<p>But the price you pay for this reduction in complexity is a loss of control. Sometimes, that is a worthy price to pay, and sometimes it is not.</p>
<h2 id="heading-on-premise-solutions">On-Premise Solutions</h2>
<p>You are responsible for managing everything in the infrastructure stack, from the physical security of the data centre to the application itself.</p>
<p>In this case, almost nothing is abstracted away. This gives you increased control and flexibility to customise what you want. In exchange for that, you pay the price of managing the entire stack and bearing the risks associated with that.</p>
<p>This is analogous to opening a pizza restaurant, but instead of just renting some space, you build the restaurant from scratch.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a01e351-835d-4431-8298-de50ef3666d5_1886x1008.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Building a pizza restaurant from the ground up is like managing on-premise solutions</em></p>
<p>The upside is you have full control over how the restaurant will look. The downside is the large upfront expenditure you will need to make for plumbing, ventilation, electrical wiring, air conditioning, heating, and so on.</p>
<p>You will also be plagued with questions like “Is the restaurant big enough, or is it too big?” or “How do I expand or scale down based on growing or falling demand?”. Large upfront costs and higher levels of uncertainty are the price you pay for full control.</p>
<h2 id="heading-iaas-infrastructure-as-a-service"><strong>IaaS (Infrastructure as a Service)</strong></h2>
<p>The physical security, data centre infrastructure, networking, servers and virtualisation (the process of creating multiple virtual machines out of a physical server) is abstracted away and managed by the cloud provider.</p>
<p>You are responsible for managing the operating system and everything above it in the infrastructure stack. You still get to customise what virtual machine you want, based on the choices made available by the cloud provider, and you will pay to use the virtual machine on a pay-as-you-go basis.</p>
<p>You don’t have to worry about purchasing more servers or cooling requirements for your servers. All of that is abstracted away and managed for you.</p>
<p>Virtual machines/instances are a good example of IaaS – EC2 from AWS, Compute Engine from GCP, and VMs from Azure.</p>
<p>IaaS is analogous to simply renting some space for your pizza restaurant. The electrical wiring, plumbing, heating, and so on is abstracted away since it is managed by the owner of the building.</p>
<p>You are responsible for paying rent to use the space, hiring chefs, a manager, waiters and cleaners, buying equipment and furniture, choosing decor, building a menu, marketing and getting customers through the door.</p>
<p>Still a lot of work, but all of the non-pizza making activities are hidden away, allowing you to focus on doing what you do best – making pizza.</p>
<h2 id="heading-paas-platform-as-a-service"><strong>PaaS (Platform as a Service)</strong></h2>
<p>Here, you manage the runtime and everything above it. The runtime is a software environment that provides the necessary resources and services for an application to run. Examples include the Java Virtual machine for Java applications, Python runtime for Python applications, and Node.js for JavaScript applications.</p>
<p>With PaaS, you have abstracted away all of the physical infrastructure. All you need to worry about is your runtime.</p>
<p>Good examples of PaaS are AWS Beanstalk and GCP App Engine. Also, managed database services like AWS RDS and GCP Cloud SQL fall under PaaS.</p>
<p>PaaS is analogous to opening a franchise pizza restaurant. When you open a franchise, you are provided with a pre-built restaurant space, equipment, branding, and a set of processes to follow. You are responsible for the core activities of running the restaurant such as hiring staff, managing inventory, and creating menus.</p>
<p>PaaS works in a similar way, providing developers with a pre-built platform that abstracts away the underlying infrastructure, allowing them to focus on building and deploying applications.</p>
<h2 id="heading-faas-function-as-a-service"><strong>FaaS (Function as a Service)</strong></h2>
<p>Here, you manage the functions and the application while the cloud provider manages the rest.</p>
<p>What exactly is a function and how is it different from a runtime? A function is a block of code that performs a specific task, while a runtime is the environment in which that code is executed.</p>
<p>Functions are typically triggered by events such as HTTP requests, database updates, or messages from a queue. When an event occurs, the function is automatically executed, and the result is returned to the calling application.</p>
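To make this concrete, here is a minimal sketch of a FaaS-style handler in Python. It is loosely modeled on common serverless handler signatures (an event payload plus a context object), but the event shape and return value here are made up for illustration:

```python
# Illustrative sketch of a FaaS-style handler: the platform invokes this
# function in response to an event, and you never manage the server it runs on.
def handler(event, context=None):
    """Hypothetical event handler; the event shape is made up for this example."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# The platform would call this for you when an event arrives:
print(handler({"name": "Ada"}))  # {'statusCode': 200, 'body': 'Hello, Ada!'}
```

The key property is that you only write and pay for the function itself – provisioning, scaling, and idling between events are the platform's problem.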
<p>This is analogous to hiring a freelance pizza chef to cook for you on-demand. When you hire this freelance chef, though, you only pay for the time they spend cooking. The chef starts getting paid in reaction to an event, that is the moment an order comes in, and stops getting paid once the pizza is ready. The rest of the time, the chef is just idle, waiting for the next order but not costing you any money.</p>
<p>Ignoring the corporate sleaze and potential illegality of such a practice, doing something like this will save you money, since you are only paying for the duration of a pizza being made.</p>
<h2 id="heading-saas-software-as-a-service"><strong>SaaS (Software as a Service)</strong></h2>
<p>Here, you don’t manage anything, but simply consume the service offered. Prime Video, Gmail, and Outlook are great examples of SaaS. When you use these, you don’t care about how the application works. All of that is abstracted away. You simply access the software through a web browser or a mobile app and use it as needed.</p>
<p>Drawing on the restaurant analogy, this can be compared to simply ordering a pizza from the restaurant. The restaurant abstracts away all the steps needed to make the pizza.</p>
<h1 id="heading-examples-of-iaas-paas-faas-amp-saas">Examples of IaaS, PaaS, FaaS &amp; SaaS</h1>
<p>The table below shows examples of IaaS, PaaS, FaaS and SaaS offerings from the main cloud providers – AWS, GCP &amp; Azure.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6ce9fab-512a-4d23-b7f7-0608e3042ec6_1524x970.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-bringing-it-together-with-an-example">Bringing it Together with an Example</h1>
<p>If you are building an e-commerce site like Amazon.com, you will need a transactional database to store details about customers like their names, payment details, addresses, orders, inventory, and so on. How do you choose the right level of abstraction for your database?</p>
<p>You have four options to choose from for your database. Starting from the option that abstracts away the most of the infrastructure stack:</p>
<ol>
<li><p>You could choose to use a FaaS option like <a target="_blank" href="https://aws.amazon.com/rds/aurora/serverless/">AWS Aurora Serverless</a>. This automatically starts up the database when it is being used and shuts it down when not in use, allowing you to save money by only paying for when it's in use. This is ideal for an infrequently used database with unpredictable workloads</p>
</li>
<li><p>You could choose a PaaS option like <a target="_blank" href="https://aws.amazon.com/rds/">AWS RDS</a>. This is a managed database where AWS abstracts away and manages administrative tasks like OS patching, scaling, database backups and other admin tasks that would otherwise require a database admin (DBA) to manage</p>
</li>
<li><p>You could choose an IaaS option by installing a relational database management system (RDBMS) like MySQL on an EC2 instance. AWS will manage the hardware, but you will be responsible for managing the OS and the database application. So, admin tasks like OS patching, scaling, database backups, among others, will be your responsibility</p>
</li>
<li><p>You can choose an on premise solution. Here, you will self-host the database and manage the hardware yourself, in addition to all the database admin tasks as described above</p>
</li>
</ol>
<p>Which is the right option to choose? First, it depends on your use case and the benefits and tradeoffs you are comfortable with. This is trite but nevertheless true.</p>
<p>However, a good heuristic that will work most of the time for most problems is to focus on what to avoid. You generally want to avoid an extreme or outlier solution, unless the problem you are trying to solve is indeed extreme or an outlier. And most problems, by definition, cannot be outliers.</p>
<p>The FaaS option using Aurora, and the on premise solution, are not ideal unless your use case specifically demands the features that these options possess.</p>
<p>Aurora Serverless is not a very popular service, so finding patterns for integration with other technologies, or help with troubleshooting technical problems, may be more difficult. There can also be some technical issues with using a serverless database like Aurora.</p>
<p>For example, waking it up from an idle state in response to a request can sometimes take a few seconds. And this can be a delay long enough to lose a customer on your e-commerce application.</p>
<p>The on premise solution is not ideal either, because an e-commerce application will have fluctuations in demand as a result of holidays, discounts or some product going viral. On premise solutions are bad at handling large fluctuations in demand.</p>
<p>This simple heuristic of focusing on what to avoid yields two acceptable solutions – the IaaS option of running your database on an EC2 instance or the PaaS option of using a managed database service like RDS. Either of these is fine for the use case described.</p>
<p>The key point to remember is that an abstraction simplifies something by <strong>hiding away the underlying details.</strong> It is a way of managing complexity.</p>
<p>The higher abstraction solution of using RDS is the less complex solution, since AWS manages all of the underlying complexities of OS patching, scaling, database backups and other admin tasks. The price you pay for this reduction in complexity is less control of the database and a higher AWS bill.</p>
<p>I hope this helps you choose the solution that's right for you. Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Deploy Changes to an Application – Deployment Strategies Explained ]]>
                </title>
                <description>
                    <![CDATA[ When deploying changes to an application, there are several strategies you can use. In this article, I'll explain the different strategies with an analogy, and then we'll analyze the benefits and tradeoffs. Deployment Strategies Imagine you are the m... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/application-deployment-strategies/</link>
                <guid isPermaLink="false">66d45e05c7632f8bfbf1e422</guid>
                
                    <category>
                        <![CDATA[ deployment ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Wed, 26 Apr 2023 14:41:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/04/cover-2.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When deploying changes to an application, there are several strategies you can use.</p>
<p>In this article, I'll explain the different strategies with an analogy, and then we'll analyze the benefits and tradeoffs.</p>
<h1 id="heading-deployment-strategies">Deployment Strategies</h1>
<p>Imagine you are the manager of a popular pizza restaurant that is open 24/7 for deliveries. This restaurant has two chefs working in the kitchen and both are needed to ensure orders are fulfilled on time.</p>
<p>You have a new special recipe that will change how all pizzas are made. This new recipe involves using a different dough to make the pizza bread, using a different type of cheese, new toppings on the pizza, and changes to the pizza oven settings.</p>
<p>These are significant changes that you hope will lead to more delicious pizzas being made, which equals happier customers, which hopefully translates to more money.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc75206fe-9d56-48ce-9309-58d6772389df_2248x1492.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>This new recipe is quite complex and will take an hour for a single chef to learn. How do you teach the chefs this new recipe? Remember that this restaurant must be open 24/7. Your approach will be based on whether you are trying to:</p>
<ul>
<li><p>reduce the time it takes for both chefs to learn the new recipe</p>
</li>
<li><p>ensure you have enough chefs to fulfill orders while one chef is learning the new recipe</p>
</li>
<li><p>keep costs low during the recipe change</p>
</li>
<li><p>be able to quickly revert back to the old recipe</p>
</li>
<li><p>test the new recipe with a small subset of your customers</p>
</li>
</ul>
<p>You would make a similar set of trade-offs when deciding on an application deployment strategy. Do you want to:</p>
<ul>
<li><p>minimise deployment time</p>
</li>
<li><p>have zero downtime</p>
</li>
<li><p>ensure capacity is maintained</p>
</li>
<li><p>reduce deployment cost</p>
</li>
<li><p>be able to rollback or easily revert changes</p>
</li>
<li><p>test the change with a small subset of your users</p>
</li>
</ul>
<p>The trade-off comes because you can’t have it all. As an example, having zero downtime, ensuring capacity is maintained and having the ability to rollback comes at the price of a longer deployment time and higher cost.</p>
<p>I'll explain the logic behind this trade-off using the blue/green deployment strategy as an example. Ultimately, there are no solutions, only trade-offs.</p>
<p>We'll use a three-tiered web application as the example architecture for the different deployment types. This consists of a presentation, logic, and database tier as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f739c67-e79c-4995-b58d-71d695cccd47_1018x1682.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Example application architecture</em></p>
<p>The presentation tier is responsible for presenting the user interface to the user. It includes the user interface components such as HTML, CSS, and JavaScript.</p>
<p>The logic tier is responsible for processing user requests and generating responses, by communicating with the database layer to retrieve or store data.</p>
<p>The database tier is responsible for storing and managing the application's data and allows access to its data through the logic tier.</p>
<h2 id="heading-all-at-once-deployment">All At Once Deployment</h2>
<p>In this type of deployment, you make changes to all instances of an application at once. In the three-tiered web application architecture, an all at once deployment that makes changes to the UI will take both instances in the presentation tier out of service during the deployment, as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03758ef-ff46-49c7-9d7b-22596c842240_1018x1682.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of all at once deployment strategy</em></p>
<p>This type of deployment has some pros:</p>
<ul>
<li><p>deployments are fast</p>
</li>
<li><p>deployments are cheap</p>
</li>
</ul>
<p>And some cons:</p>
<ul>
<li><p>downtime during deployment</p>
</li>
<li><p>a failed deployment will have further downtime since you will need to rollback by deploying the previous version of the application to the instances</p>
</li>
<li><p>rollbacks are manual</p>
</li>
</ul>
<p>An all at once deployment is ideal when a deployment needs to be made quickly. It is also ideal when the impact of something going wrong is low – for example, deployments in non-live environments like development and test environments, which don’t have any real users.</p>
<p>Any use case where the cons listed above are not acceptable would be an anti-pattern for an all at once deployment.</p>
<p>An all at once deployment is analogous to both chefs being told to stop taking new orders, and to stop any orders they are currently working on, in order to learn the new pizza recipe. They would then use that recipe going forward.</p>
<p>While they are learning the new recipe, orders will go unfulfilled. If they can’t quite get to grips with the new recipe, any pizzas they make will also not be as good, will take longer to make, or both.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99f3efd5-9f11-4d90-9730-69bcbac1ffa4_1628x1080.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Also, if you later find out that customers do not like how the new pizzas taste, you have to revert back to the old recipe. This means restocking your kitchen with the previous dough and cheese you used and getting rid of the new toppings.</p>
<p>This is not an ideal way of making a recipe change as you can lose customers if they don’t like the taste of the pizza.</p>
<p>On the plus side, this approach is cheap, in terms of up front cost at least. If it goes wrong, it can be very expensive as a result of lost future sales and upset customers.</p>
<p>It is also fast to implement. If it takes each chef an hour to pick up the new recipe and you show them both at the same time, the new recipe can be ready to go live in an hour.</p>
<h2 id="heading-rolling-deployment">Rolling Deployment</h2>
<p>In a rolling deployment, you make changes to an instance or a batch of instances at the same time. In the three-tiered web application example, UI changes will first be deployed to one instance and once that is complete, it will be repeated on the other instance.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76e9f8f4-c652-4eab-9bbc-7854aa7df6da_2276x1812.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of a rolling deployment strategy</em></p>
<p>With this approach, you avoid downtime as changes are only made to one instance at a time. The drawback is that deployments will naturally take longer since you have to wait for the first deployment to finish before deploying to the second instance.</p>
<p>Bringing back the chef analogy, the new recipe will only be shown to one chef at a time. This means a reduced capacity to deal with orders, but orders will still be fulfilled since there will always be at least one chef available.</p>
<h2 id="heading-rolling-with-additional-batch-deployment">Rolling with Additional Batch Deployment</h2>
<p>This is similar to a rolling deployment, but an additional instance is added into the cluster during the deployment to maintain capacity, as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea5df353-96ed-45e9-b96d-33a0a6dbae67_2588x1330.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of rolling with additional batch deployment strategy</em></p>
<p>First you <strong>launch</strong> a new instance and then <strong>deploy</strong> the new application there. After the deployment is successful, you <strong>terminate</strong> an instance running the older application.</p>
<p>These three steps of launching a new instance, deploying the new application there, and terminating the old instance are repeated until you have deployed the new application on all the instances.</p>
<p>The key point to note with this approach is that by adding a new instance with the new application version before terminating any instances, you are always maintaining capacity. If you need two instances running at the same time, this deployment strategy will ensure you always have two instances available. This is useful for applications that require high availability.</p>
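The launch–deploy–terminate loop can be sketched in a few lines of Python. This is an illustration of the capacity invariant, not a real cloud API – instance names and the health-check step are stand-ins:

```python
def rolling_with_batch(fleet, new_version):
    """Replace every instance with `new_version`, never dropping below
    the fleet's original capacity (illustrative sketch, not a real API)."""
    capacity = len(fleet)
    fleet = list(fleet)
    for _ in range(capacity):
        fleet.append(new_version)            # launch an instance running the new app
        # ...in reality, wait here for the new instance to pass health checks...
        old = next(i for i in fleet if i != new_version)
        fleet.remove(old)                    # then terminate an old instance
        assert len(fleet) >= capacity        # capacity is always maintained
    return fleet

print(rolling_with_batch(["v1", "v1"], "v2"))  # ['v2', 'v2']
```

Because a new instance is launched before any old one is terminated, the fleet briefly runs one instance above capacity, but it never dips below it.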
<p>With this approach, some users will be routed to different instances during the deployment. This means customers will see different UIs on the web page – some will see the old UI and others the new one while the instances are still being updated.</p>
<p>If a consistent user experience is absolutely necessary for all your users at all times, this deployment may not be right for you.</p>
<p>The rolling with additional batch deployment is analogous to hiring an extra chef to show the new recipe to while the two existing chefs still fulfill pizza orders. Once this new chef is familiar with the new recipe, orders are routed to him and one of the existing chefs. The third chef is then told to go home.</p>
<p>This is repeated until both chefs in the kitchen are new and familiar with the new recipe. But while this transition is happening, there are always a minimum of two chefs in the kitchen who can fulfill pizza orders.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d5d9c1-5e64-495b-97da-124c7943b652_1860x598.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-canary-deployment">Canary Deployment</h2>
<p>The phrase ‘canary in the coal mine’ originates from an old practice in coal mining where miners would take a canary into the coal mine as an early warning alarm.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3da625e7-f768-440e-9850-a003fd3acb08_520x876.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Canaries are highly sensitive to toxic gases like methane and carbon monoxide, which humans can’t easily detect, as they are odorless and colorless. The canary dying was a signal to evacuate the mine, since dangerous levels of toxic gases had built up to levels high enough to kill the bird. This was an effective, albeit brutal way of signalling potential danger to the miners.</p>
<p>In canary deployment, a separate set of instances will have the new application deployed on them, and a small percentage of all visitors will be routed to the new version. This can be done with the <a target="_blank" href="https://lightcloud.substack.com/i/64925113/weighted-routing">weighted routing option using Route 53</a> (managed DNS service from AWS). With weighted routing, you can specify a weight for each target load balancer.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a074098-b40f-4f17-aeb2-b5d0c9a2a961_1124x1080.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of the canary deployment strategy</em></p>
<p>In this example, Route 53 will initially point 90% of all users to the old application and 10% to the new application. The new application will then be closely monitored to see metrics like error rates, response times, and so on. If any issues arise with the new application at this stage, then the weights are simply updated so that all traffic points back to the old application.</p>
<p>Just like the canary in the coal mine, the initial monitoring on a small set of users serves as a cheap signal to give you confidence to either continue the transition to the new application, or revert back to the old.</p>
<p>For critical applications that cannot afford any downtime or other issues, this is an effective way of managing the risk of a new deployment while being able to immediately revert back to the old application.</p>
<p>If everything looks fine with the new application during the initial testing with a small number of users, then you can slowly increase the percentage of users routed there. As you gain confidence in its performance, you can eventually route all users to the new application and terminate the old instances.</p>
<h2 id="heading-bluegreen-deployment">Blue/green Deployment</h2>
<p>Blue/green deployment involves creating two identical environments: a "blue" environment which hosts the current version of the application, and a "green" environment which hosts the new version of the application. This is shown in the image below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea56e413-b77d-4439-848e-c733bcab0c26_1852x1766.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Illustration of blue/green deployment strategy</em></p>
<p>Once the new version of the application is deployed to the green environment, the Route 53 DNS record is updated to only point to the load balancer of the green environment in front of the presentation tier, as shown below. The instances of the presentation tier in the blue environment can also be stopped to save cost. You can restart them again when there is a new version of the application to deploy.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6caffdc-d822-410d-889c-f5212c80a6c5_1852x1760.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Example of blue/green deployment with presentation tier instances in separate environment</em></p>
<p>In this example of blue/green, only the instances in the presentation tier are in a separate environment.</p>
<p>But you could have an identical copy of the blue and green environments across all tiers. This would make it so that if you were making changes to the logic or database tiers of the application, there would also be no downtime during deployment, with the ability to easily rollback. You can see that scenario below:</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6329476-8072-44a9-9228-d452c14df946_1852x1794.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Blue/green deployment with no downtime and easy ability to roll back</em></p>
<p>The main benefit of blue/green deployment is zero downtime during deployments, since all you have to do is update the DNS record to point to the load balancer of the ‘green’ environment.</p>
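<p>Conceptually, the cut-over is a single record update. The sketch below models the hosted zone as a plain Python dictionary – the record and load balancer names are made up for illustration:</p>

```python
# A toy "hosted zone": record name -> load balancer DNS target.
dns = {"app.example.com": "blue-env-lb.example.com"}

def switch_environment(dns, record, new_target):
    """Repoint the record at a new load balancer (the blue/green cut-over)."""
    previous = dns[record]
    dns[record] = new_target
    return previous  # keep the old target so rolling back is one more call

previous = switch_environment(dns, "app.example.com", "green-env-lb.example.com")
# If something goes wrong, rolling back is just switching again:
# switch_environment(dns, "app.example.com", previous)
```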
<p>Blue/green is similar to canary deployment, but instead of initially sending a small percentage of users to the new version of the application, all users are sent to the new version once it is deployed and thoroughly tested. There is no live testing with real users in a blue/green deployment.</p>
<p>Blue/green deployment is analogous to having two restaurant branches, each with two sets of chefs there. The ‘blue’ restaurant uses the current pizza recipe and all takeaway orders are at first routed to this restaurant as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c339e6d-d970-4639-b794-71bc4b3b95c0_1456x852.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The ‘green’ restaurant has perfected the new recipe and is ready to receive orders. Customer orders are then routed to this restaurant as shown below.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12fa5167-f6fb-4711-b195-f0d458b0afdc_1456x852.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>If customers complain about the delivery time or quality of the pizza (which shouldn’t happen if the new recipe has been tested with real customers beforehand), the manager can simply route the orders back to the blue restaurant making the old recipe. Then they can figure out what went wrong with the new recipe, make some tweaks, and try again.</p>
<h1 id="heading-bringing-it-together">Bringing it Together</h1>
<p>The right deployment strategy for your application depends on what you are trying to optimise for.</p>
<p>All-at-once deployments are ideal if you want to minimise deploy time and upfront cost. The price you pay, however, is application downtime, with further downtime if the deployment fails (as well as a manual rollback process).</p>
<p>Rolling deployments will take longer than an all-at-once deployment. However, there will be no downtime since deployments are made incrementally on an instance or a set of instances. But there will be reduced capacity during deployment, so this may not be ideal for an application that requires high availability.</p>
<p>Rolling with additional batch deployment addresses the issue of reduced capacity with a rolling update. An additional instance or batch of instances with the new version is added to the cluster in order to maintain the same capacity. Only then are instances running the older version of the application terminated.</p>
<p>Canary deployment has no downtime and no reduced capacity during deployment. It is also safer as it allows for testing with a fraction of the users and closely monitoring performance before gradually routing all users to the new version.</p>
<p>But this does not come for free. Additional infrastructure is required, and detailed monitoring and observability of the application have to be in place. This means it is more expensive and more complex to deploy using this strategy.</p>
<p>It is important to caveat ‘more expensive’. This approach will incur higher upfront costs, but for a critical application with lots of users that cannot afford any downtime, it could be more expensive (through lost future revenue, unhappy customers or a ruined reputation) to use another deployment strategy that is ‘cheaper’ but ultimately less robust to failures.</p>
<p>Finally, blue/green is ideal for zero downtime deployments that are easy to rollback. It however requires additional cost for a separate set of identical infrastructure to be provisioned.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Symmetric and Asymmetric Key Encryption – Explained in Plain English ]]>
                </title>
                <description>
                    <![CDATA[ Encryption is a way of scrambling data so that it can only be read by the intended recipient. Encryption is an integral part of our daily lives – whether you are sending messages to friends on WhatsApp, visiting a website and your browser is making s... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/encryption-explained-in-plain-english/</link>
                <guid isPermaLink="false">66d45e0e3dce891ac3a967dc</guid>
                
                    <category>
                        <![CDATA[ encryption ]]>
                    </category>
                
                    <category>
                        <![CDATA[ information security ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Wed, 05 Apr 2023 20:09:48 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/04/cover.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Encryption is a way of scrambling data so that it can only be read by the intended recipient.</p>
<p>Encryption is an integral part of our daily lives – whether you are sending messages to friends on WhatsApp, visiting a website and your browser is making sure it's legitimate, or entering your bank details when buying something online. Encryption protects your data from potentially malicious and prying eyes.</p>
<p>This article will cover:</p>
<ul>
<li><p>Encryption algorithms and keys</p>
</li>
<li><p>Symmetric and asymmetric key encryption</p>
</li>
<li><p>How TLS/SSL uses both symmetric and asymmetric encryption</p>
</li>
</ul>
<h2 id="heading-encryption-algorithms-and-keys">Encryption Algorithms and Keys</h2>
<p>At the start of this article, I described encryption as a way of scrambling data so that it can only be read by the intended recipient. Let’s break down what this means.</p>
<p>Let's say you want to write a letter to your friend and want to ensure that only the friend can read its contents. How would you prevent the prying eyes of all the intermediaries the letter could pass through before it gets to your friend? That is, how do you prevent the postman, the concierge in their building, or one of their friends from reading the letter?</p>
<p>You start with an unscrambled letter that anyone can read. This is called <strong>plaintext</strong>. To scramble the contents of the message, you need an <strong>encryption algorithm</strong> and a <strong>key</strong>. The encryption algorithm uses the key to scramble the contents of the message. This encrypted message is called <strong>ciphertext</strong>.</p>
<p>The process of encryption is shown in the image below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-9.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>When your friend gets the message, they will need to descramble it using the <strong>algorithm</strong> and the <strong>key</strong>. This is illustrated below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-10.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The two key ingredients needed to send a message to your friend that only they can read are an <strong>encryption algorithm</strong> and a <strong>key</strong>.</p>
<p>The encryption algorithm is simply a mathematical formula designed to scramble data, while the key is used as part of the formula. The encryption algorithm is generic, but the key, used as an input to the algorithm, is what ensures the uniqueness of the scrambled data.</p>
<p>Let’s look at one of the simplest encryption algorithms, called the Caesar Cipher. In its simplest form, this algorithm simply replaces each letter by the next letter in the alphabet. So A becomes B, and B becomes C and so on.</p>
<p>With this algorithm, the text ‘Birthday Surprise’ becomes ‘Cjsuiebz Tvsqsjtf’, indistinguishable from gibberish to the untrained eye.</p>
<p>With the Caesar Cipher example, the <strong>algorithm</strong> is the formula used to replace each letter of the alphabet with another. The <strong>key</strong> is the number of shifts made between each letter. With a key of 0, A is A, an obviously poor choice of key as the data is unscrambled. With a key of 1, A becomes B. With a key of 10, A becomes K.</p>
<p>The Caesar Cipher is a relatively poor encryption algorithm. Why? Since there are only 26 letters in the English alphabet, you can only produce a maximum of 25 possible ciphertexts. If you don’t have the key, you only need to shift each letter up to 25 times until you see coherent words and sentences, at which point you know that you have successfully decrypted the message.</p>
<p>A bad encryption algorithm is one that is easily decrypted by using a small amount of brute force (that is, trying every possible permutation) – and 25 possible ciphertexts is an objectively small number of possible options to go through.</p>
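<p>Both the cipher and the brute-force attack described above fit in a few lines of Python:</p>

```python
def caesar(text, key):
    """Shift each letter `key` places along the alphabet, preserving case; leave other characters alone."""
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

ciphertext = caesar("Birthday Surprise", 1)   # -> "Cjsuiebz Tvsqsjtf"

# Brute force: decrypting with key k is just shifting back by k,
# so try all 25 non-zero shifts and look for coherent text.
candidates = {key: caesar(ciphertext, -key) for key in range(1, 26)}
```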
<p>Modern encryption algorithms like AES-256 used by AWS, GCP, and Azure for encrypting data are considerably more complicated and secure than the Caesar Cipher. Based on current computing capability, it would take trillions and trillions of years for the most advanced supercomputer to use brute force to decrypt data encrypted using AES-256 [<a target="_blank" href="https://scrambox.com/article/brute-force-aes/">1</a>]. Even the universe is not that old.</p>
<h2 id="heading-symmetric-and-asymmetric-key-encryption">Symmetric and Asymmetric Key Encryption</h2>
<p>The core of any encryption process is the encryption algorithm and the key. There are many types of encryption algorithms. But there are, broadly speaking, two types of keys – symmetric and asymmetric keys.</p>
<p>In symmetric key encryption, the same key used to encrypt the data is used to decrypt the data. In asymmetric key encryption, one key is used to only encrypt the data (the public key) and another key is used to decrypt (the private key).</p>
<h3 id="heading-asymmetric-key-encryption">Asymmetric key encryption</h3>
<p>First, let’s look at asymmetric key encryption with a simple analogy.</p>
<p>Imagine you wanted to send something to your friend, but it was absolutely essential that nobody else, except your friend, could have access to that object. So, your friend buys an indestructible box, fabricated from the strongest metal on the planet, and sends it to you so that you can place the object in it. Your friend also sends you the key that can only be used to lock the box.</p>
<p>Now, this box has one more special property. It has two keyholes. One keyhole to open the box, another to lock the box.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-11.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Naturally, this box will also need two keys – one to open and another to lock it.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-12.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Both keys are similar, but not identical. As you can see in the image above, for example, the key used to open the box has two prongs while the key used to lock the box has three prongs.</p>
<p>As the sender of the object, all you have is the box to place the object in and a key to lock the box. Only your friend has the key that can unlock the box.</p>
<p>The key used to lock the box is called the public key, and cannot be used to open it, as that requires the private key. If anyone intercepted the package and made a copy of the public key, it could not be used to open the box, only to lock it. Only the person who holds the private key can open the box.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-13.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Asymmetric key encryption is used when there are two or more parties involved in the transfer of data. This type of encryption is used for encrypting data in transit – that is, encrypting data being sent between two or more systems. The most popular example of asymmetric key encryption is <a target="_blank" href="https://nordvpn.com/blog/rsa-encryption/">RSA</a>.</p>
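<p>To make the public/private key relationship concrete, here is the core RSA arithmetic with deliberately tiny textbook primes. Real RSA keys use primes hundreds of digits long – these numbers are trivially breakable and for illustration only:</p>

```python
# Toy RSA key generation (never use numbers this small in practice).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)

def encrypt(message, e, n):
    """Anyone holding the public key (e, n) can encrypt."""
    return pow(message, e, n)

def decrypt(ciphertext, d, n):
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(ciphertext, d, n)

ciphertext = encrypt(65, e, n)        # 65 stands in for the "message"
plaintext = decrypt(ciphertext, d, n)
```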
<h3 id="heading-symmetric-key-encryption">Symmetric key encryption</h3>
<p>Symmetric key encryption uses the same key for encryption and decryption. This makes sharing the key difficult, as anyone who intercepts the message and sees the key can then decrypt your data.</p>
<p>This is why symmetric key encryption is generally used for encrypting data at rest. AES-256 is the most popular symmetric key encryption algorithm. It is used by AWS for encrypting data stored in hard disks (EBS volumes) and S3 buckets. GCP and Azure also use it for encrypting data at rest.</p>
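<p>The defining property – one key for both directions – can be shown with a toy XOR cipher. This is not AES; it only illustrates that applying the same key twice returns the original plaintext:</p>

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; the same function both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

secret_key = b"s3cret"
ciphertext = xor_cipher(b"data at rest", secret_key)
plaintext = xor_cipher(ciphertext, secret_key)   # same key, same function
```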
<h2 id="heading-how-tlsssl-uses-both-symmetric-and-asymmetric-encryption">How TLS/SSL Uses Both Symmetric and Asymmetric Encryption</h2>
<p>The main strength of symmetric key encryption is that it is computationally easier and faster to encrypt and decrypt data using a single key, just as it is easier to build a box with a single lock and key.</p>
<p>The weakness of symmetric key encryption is that if the key is exposed, your data is no longer securely encrypted. So, if you needed to share the key with an external party, there is a risk that the key could be exposed, leaving your data at risk of being decrypted.</p>
<p>Symmetric key encryption is ideal for encrypting data at rest, where you do not need to share the key with another system.</p>
<p>With asymmetric encryption, this is not a problem since two separate keys are used – the public key to encrypt data and the private key to decrypt data.</p>
<p>The public key can be easily shared with anyone and poses no risk to your data being decrypted, since the private key is needed for decryption.</p>
<p>The drawback of asymmetric key encryption is that the encryption and decryption process is slower and more complicated. Asymmetric key encryption is ideal for encrypting data in transit, where you need to share the key with another system.</p>
<p>What if there was a way of getting the speed and computational simplicity of symmetric encryption without increasing the risk of exposing your keys?</p>
<p>TLS/SSL encryption uses both symmetric and asymmetric keys to encrypt data in transit, and is used with the HTTP protocol for secure communications over a computer network.</p>
<h3 id="heading-tlsssl-encryption-explained">TLS/SSL Encryption Explained</h3>
<p>TLS (Transport Layer Security) and SSL (Secure Sockets Layer) are often used interchangeably to mean the same thing. But when people say SSL, they often mean TLS.</p>
<p>TLS is generally considered more secure than SSL due to several improvements made to the protocol, such as stronger cryptographic algorithms. Due to security concerns with SSL, most modern web browsers and applications have dropped support for SSL and only support TLS. As a result, TLS has become the standard for secure communication over the internet.</p>
<h3 id="heading-how-to-use-symmetric-and-asymmetric-encryption-at-the-same-time">How to Use Symmetric and Asymmetric Encryption at the Same Time</h3>
<p>Let's say you want to securely send a parcel to your friend. But you don’t want to keep using the special indestructible box that has two keyholes and two locks. It is expensive, heavy and impractical to use for frequent communications. You still want to use an indestructible box, but one that is simpler, with a single lock and key.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-14.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>However, if you are using a box with only a single lock and key, you now need to figure out how to securely share the key for that simpler box with your friend.</p>
<p>Since the same key is used to both open and lock it, you can’t just send the key to your friend without somehow protecting it first. If the key is intercepted and a copy is taken by someone, they can now open your box and take what is inside.</p>
<p>How can you securely share this key with your friend so that you can use this simpler box for future communication?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/image-15.png" alt="Image" width="600" height="400" loading="lazy"></p>
<ol>
<li><p>First, your friend sends the box with the two locks plus the public key used to lock it. But you don’t want to keep using this box. You will only use this box once – to transfer the key for another simpler box that you will use for future exchanges.</p>
</li>
<li><p>You place the master key that will be used in future exchanges inside this box and lock it with the public key sent by your friend.</p>
</li>
<li><p>You send the locked box which contains a copy of the master key inside back to your friend.</p>
</li>
<li><p>Your friend uses his private key to open the box. Now you both have the master key and can be sure no one else has it, since it was sent in a secure box.</p>
</li>
<li><p>All future items are then placed in this simpler box with a single lock and key which can be opened and locked using the master key you just sent to your friend.</p>
</li>
</ol>
<h3 id="heading-tlsssl-encryption-sequence">TLS/SSL Encryption Sequence</h3>
<p>The analogy in the previous section neatly maps to how TLS/SSL encryption actually works. But there are some prerequisite steps which I ignored in this analogy, like creating a TCP connection and the server sending its certificate (Steps 1 and 2 below).</p>
<p>Also, Step 6 is a simplification of the process. In reality, the master key is used to generate a further set of keys that the client and server will use to encrypt and decrypt messages and also to authenticate that the messages were indeed sent by the client and server.</p>
<p>To read more about the low level detail, I’d recommend Chapter 8 of "<a target="_blank" href="https://www.amazon.co.uk/Computer-Networking-Global-James-Kurose/dp/1292405465/ref=sr_1_1?keywords=computer+networking+a+top-down+approach&amp;qid=1680219419&amp;sprefix=computer+netw%2Caps%2C168&amp;sr=8-1">Computer Networking</a>" by Kurose &amp; Ross.</p>
<p>But, at a high level, the sequence is as follows:</p>
<ol>
<li><p>Client establishes TCP connection with the server</p>
</li>
<li><p>Client verifies that the server is who it says it is – server sends certificate which has the public key. The accompanying private key remains with the server.</p>
</li>
<li><p>Client creates a master secret key and uses the server's public key to encrypt it. This master secret key is a symmetric key, so the same key is used for encryption and decryption.</p>
</li>
<li><p>Client sends the encrypted master secret key to the server.</p>
</li>
<li><p>Server decrypts the encrypted master key using its private key.</p>
</li>
<li><p>All future messages between client and server now use the symmetric master key to encrypt and decrypt messages.</p>
</li>
</ol>
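<p>The handshake above can be simulated end to end. The sketch below uses a toy RSA key pair and a toy XOR cipher purely as stand-ins for the real asymmetric and symmetric algorithms – no actual TLS library is involved:</p>

```python
import secrets

# Server's toy RSA key pair (tiny primes, illustration only).
p, q = 61, 53
n = p * q
e = 17                                 # public key (e, n): sent in the server's certificate
d = pow(e, -1, (p - 1) * (q - 1))      # private key: never leaves the server

# Step 3: client creates a symmetric master key and encrypts it with the public key.
master_key = secrets.randbelow(n - 2) + 2
encrypted_key = pow(master_key, e, n)

# Step 5: server recovers the master key with its private key.
server_master_key = pow(encrypted_key, d, n)

# Step 6: both sides now use the shared symmetric key for every message.
def xor_encrypt(data: bytes, key: int) -> bytes:
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))

request = xor_encrypt(b"GET /index.html", master_key)        # client encrypts
received = xor_encrypt(request, server_master_key)           # server decrypts
```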
<h2 id="heading-best-of-both-worlds">Best of Both Worlds</h2>
<p>Using both symmetric and asymmetric key encryption gives you the speed of symmetric key encryption without compromising on the extra security provided by asymmetric key encryption.</p>
<p>But nothing comes for free, of course. With TLS, there is an added layer of complexity since you need to first use asymmetric keys to establish a secure connection before exchanging the symmetric key for future communication.</p>
<p>So by using both symmetric and asymmetric encryption, TLS/SSL gets the best of both worlds with limited downsides.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Select the Right EC2 Instance – A Guide to EC2 Instances and Their Capabilities ]]>
                </title>
                <description>
                    <![CDATA[ EC2 (Elastic Compute Cloud) is the most widely-used compute service from AWS. It's also one of the oldest services launched by AWS, as it was started in 2006. In this article, I will go through some things you should consider when selecting an EC2 in... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-select-the-right-ec2-instance/</link>
                <guid isPermaLink="false">66d45e15052ad259f07e4a89</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cloud Computing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ec2 ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Daniel Adetunji ]]>
                </dc:creator>
                <pubDate>Thu, 15 Dec 2022 19:08:27 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/12/cover-photo.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>EC2 (Elastic Compute Cloud) is the most widely-used compute service from AWS. It's also one of the oldest services launched by AWS, as it was started in 2006.</p>
<p>In this article, I will go through some things you should consider when selecting an EC2 instance.</p>
<p>You can think of an EC2 instance as not too different from your personal computer. If you are going to buy a computer, three broad technical considerations may cross your mind (ignoring any aesthetic or design preferences you may have, of course):</p>
<ol>
<li><p>How much processing can it handle?</p>
</li>
<li><p>How much memory does it have?</p>
</li>
<li><p>How much storage does it have?</p>
</li>
</ol>
<p>These three questions should also cross your mind when selecting an EC2 instance. The difference being, you are only renting the instance from AWS, instead of buying it as you would with a personal computer.</p>
<p>Each EC2 instance is composed of:</p>
<ol>
<li><p>CPU – how much processing can be achieved</p>
</li>
<li><p>Memory</p>
</li>
<li><p>Storage – this only applies to some instances that have physically attached storage (called the <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html">instance store</a>). For other EC2 instances, you'll need to choose network storage using EBS (Elastic Block Storage) separately.</p>
</li>
</ol>
<h2 id="heading-compute-memory-and-storage-an-analogy">Compute, Memory, and Storage – An Analogy</h2>
<p>A good analogy for an EC2 instance is your work desk.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F44e98a5c-f6fb-4f23-8840-477a52ef0b6c_1257x690.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Your brain is the compute, the surface of your desk is the memory, and your desk drawer is the storage. Note that this analogy (like all analogies) has its limitations. Its purpose is to neatly split the roles of compute, memory, and storage in an EC2 instance.</p>
<p>What exactly does compute mean? Compute is concerned with <em>parallelism –</em> the ability to execute multiple tasks simultaneously.</p>
<p>Human brains can handle some level of parallelism. You might be able to talk on the phone while taking notes simultaneously, for example. You cannot, however, write two different letters simultaneously, or talk on the phone while taking notes and reading a book.</p>
<p>These activities cannot be executed in parallel because our brain can be crudely thought of as a CPU with a single core. To increase compute, we need to increase parallelism, and this can be achieved by having multiple CPU cores. More cores equals more parallelism which equals more compute power.</p>
<p>Memory and storage are theoretically the same thing. We use them both for storing data. Practically, though, they are physically distinct pieces of infrastructure, simply because there is no single storage device that is both fast and non-volatile.</p>
<p>Memory is fast and volatile while storage is slow and non-volatile. Things kept on the surface of your desk are quickly and easily accessible, just like data in a computer’s memory. But anything left on your desk overnight in a busy office is at risk of being moved, lost, or stolen. The surface of your desk, just like a computer’s memory, is volatile.</p>
<p>Storage, on the other hand, is non-volatile but slower to read/write from. Just like items in your desk drawers are less likely to go missing but take longer to get your hands on.</p>
<h2 id="heading-how-to-select-the-right-ec2-instance">How to Select the Right EC2 Instance</h2>
<p>So, CPU, memory, and sometimes storage are the three levers you can pull when selecting an EC2 Instance. Recall that storage is often selected separately from the EC2 instance using EBS volumes, except for storage optimized instances that have physically attached storage.</p>
<p>When you select an instance type, you are effectively selecting for the <strong>lowest price per unit of the metric most important for your workload</strong>. This metric can be CPU/GPU performance, memory, or storage.</p>
<p>There are five AWS instance types:</p>
<ul>
<li><p>general purpose: By choosing a general purpose instance, you are taking a balanced approach and not optimizing for any one metric.</p>
</li>
<li><p>compute optimized: By choosing a compute optimized instance, you are optimizing for the lowest price per unit of CPU performance (number of CPU cores).</p>
</li>
<li><p>accelerated computing: By choosing an accelerated computing instance, you are optimizing for the lowest price per unit of GPU performance (think of this as a specialised CPU needed for high performance compute workloads).</p>
</li>
<li><p>storage optimized: By choosing a storage optimized instance, you are optimizing for the lowest price per unit of storage capacity and efficiency.</p>
</li>
<li><p>memory optimized: And by choosing a memory optimized instance, you are optimizing for the lowest price per unit of memory.</p>
</li>
</ul>
<p>Let’s go through the instance types in more detail.</p>
<p>AWS has a great outline of this <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html">here</a> that I have summarised below:</p>
<ol>
<li><p><strong>General Purpose</strong> – For workloads that require a balance of compute, memory and networking. Ideal use case is web servers.</p>
</li>
<li><p><strong>Compute Optimized</strong> – For workloads that require high performance processors. Lowest dollar cost per number of CPU cores. Ideal for compute intensive workloads like scientific modelling and gaming.</p>
</li>
<li><p><strong>Accelerated Computing</strong> – For workloads that require even larger amounts of compute resources than compute optimized instances. This type of instance uses GPUs (graphics processing units), specialised processors designed for machine learning and high performance computing workloads.</p>
</li>
<li><p><strong>Storage Optimized</strong> – For workloads that require high rates of reads and writes for large amounts of data – that is, high IOPS (Input/Output Operations per Second).</p>
</li>
</ol>
<p>Unlike other instances, these do not use separate EBS volumes for storage. Instead they come with physically attached storage volumes (called the <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html">instance store</a>). This means that data does not have to go through a network, allowing it to achieve high IOPS.</p>
<p>Ideal use case is NoSQL databases – like Elasticsearch, MongoDB, Cassandra – and some data warehousing applications. Instance store volumes, however, come with a catch: any data stored there does not persist beyond the life of the instance. So, if the instance stops, hibernates, terminates, or fails, you lose all data on that instance.</p>
<p>The ideal use case for storage optimized instances is thus workloads that require high IOPS <strong>and</strong> can tolerate the failure of an instance (usually by having data replicated to another instance for redundancy).</p>
<p>5. <strong>Memory Optimized</strong> – For workloads that require large amounts of RAM. Lowest dollar cost per unit of RAM. Ideal for in-memory databases, caches, and SQL databases.</p>
<h2 id="heading-anatomy-of-an-ec2-instance-name">Anatomy of an EC2 Instance Name</h2>
<p>You may have come across EC2 instance names like t2.nano, r6a.large or i3en.6xlarge. What exactly do the letters and numbers mean?</p>
<p>Let’s take a complex name like i3en.6xlarge as an example and break it down.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5f1844b-80a3-4f87-b7de-1027f4c16aec_1280x720.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Anatomy of an EC2 instance name broken down</em></p>
<h3 id="heading-instance-family">Instance Family</h3>
<p>Reading from left to right, the first letter is the instance family. Every family belongs to only one of the instance types, that is general purpose, compute optimized, accelerated computing, storage optimized or memory optimized.</p>
<p>There's no need to memorise which instance family belongs to which instance type – as you work more with AWS, it will become almost second nature. You can refer to the full list <a target="_blank" href="https://aws.amazon.com/ec2/instance-types/">here</a> if you'd like.</p>
<p>The i3en.6xlarge instance above belongs to the “i” family, which is a storage optimized instance.</p>
<h3 id="heading-instance-generation">Instance Generation</h3>
<p>This is a number that shows the instance generation. The higher the number, the more recent the generation.</p>
<p>When given the option between different generations for the same instance, you should, ideally, always select the latest generation. The latest generation instance usually comes with the latest hardware. This typically means lower cost per unit of performance relative to older generations.</p>
<p>The i3en.6xlarge instance in the example above is a third generation instance.</p>
<h3 id="heading-special-features">Special Features</h3>
<p>These are optional letters that come after the instance generation. Each letter denotes some special feature about the instance.</p>
<p>In this case, the “<strong>e</strong>” signifies <strong>extra</strong> <strong>capacity</strong> (which can be RAM or storage) and “<strong>n</strong>” signifies that the instance is <strong>network</strong> <strong>optimized</strong>, meaning it has high network bandwidth and can sustain a high data transfer rate, typically measured in gigabits per second (Gbps).</p>
<p>Other special feature characters and their capabilities are as follows:</p>
<ul>
<li><p><strong>a</strong> – AMD processors</p>
</li>
<li><p><strong>g</strong> – AWS Graviton processors</p>
</li>
<li><p><strong>i</strong> – Intel processors</p>
</li>
<li><p><strong>d</strong> – Instance store volumes</p>
</li>
<li><p><strong>b</strong> – Block storage optimization</p>
</li>
<li><p><strong>z</strong> – High frequency</p>
</li>
</ul>
<p>These extra features do not come for free, so only select an instance with special features if your workload actually needs them.</p>
<h3 id="heading-instance-size">Instance Size</h3>
<p>The size appears after the full stop. It consists of a size label – ranging from nano up to xlarge (extra large) – and an optional multiplier number.</p>
<p>The number only appears with xlarge instances and denotes how much larger the instance is than an xlarge. So a 2xlarge is twice as large as an xlarge, and a 6xlarge is six times as large.</p>
<p>But, what does twice or six times as large really mean?</p>
<p>Within the same instance family, the multiplier scales the compute (number of vCPUs), the memory (amount of RAM), and sometimes the storage. Storage does not always scale this way: many instances use EBS volumes, where storage scales independently of the instance size. Storage optimized instances, on the other hand, use physically attached instance store volumes that do scale with the instance size.</p>
<p>An i3en.xlarge instance has 4 vCPUs, 32 GiB memory, and 2500 GB storage capacity. An i3en.<strong>6</strong>xlarge is six times larger since it has <strong>six times</strong> the number of vCPUs (24), six times the memory (192 GiB), and six times the storage capacity (15,000 GB).</p>
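<p>The naming scheme above can be sketched as a small parser. This is an illustrative sketch, not an official AWS utility – the feature mapping follows the list earlier in this article, and real instance names have more variations (bare-metal sizes, multi-letter families like “mac”) than it handles:</p>

```python
import re

# Feature letters and their meanings, per the list earlier in the article.
FEATURES = {
    "a": "AMD processors",
    "g": "AWS Graviton processors",
    "i": "Intel processors",
    "d": "Instance store volumes",
    "n": "Network optimized",
    "b": "Block storage optimization",
    "e": "Extra capacity (RAM or storage)",
    "z": "High frequency",
}

def parse_instance_name(name: str) -> dict:
    """Split an EC2 instance name like 'i3en.6xlarge' into its parts."""
    prefix, size = name.split(".")
    match = re.fullmatch(r"([a-z]+?)(\d+)([a-z]*)", prefix)
    if not match:
        raise ValueError(f"Unrecognised instance name: {name!r}")
    family, generation, extras = match.groups()
    # For xlarge sizes, the leading number multiplies vCPUs and RAM
    # (and instance store size, for storage optimized instances).
    size_match = re.fullmatch(r"(\d*)xlarge", size)
    if size_match:
        multiplier = int(size_match.group(1) or "1")
    else:
        multiplier = None  # nano, micro, small, medium, large
    return {
        "family": family,
        "generation": int(generation),
        "features": [FEATURES.get(c, "unknown") for c in extras],
        "size": size,
        "xlarge_multiplier": multiplier,
    }
```

<p>Parsing “i3en.6xlarge” with this sketch yields family “i”, generation 3, the extra capacity and network optimized features, and an xlarge multiplier of 6 – matching the breakdown above.</p>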
<h2 id="heading-bringing-it-all-together-how-to-select-your-instance">Bringing it All Together – How to Select Your Instance</h2>
<p>So, let’s say you need to select an EC2 instance for your web server, or your NoSQL database – what are some logical steps to follow?</p>
<h3 id="heading-step-1-select-instance-type">Step 1: Select instance type</h3>
<p>Choosing between general purpose, compute optimized, accelerated computing, storage optimized and memory optimized is the first and most important decision. Every subsequent decision will be driven by this one.</p>
<p>Here, the decision you are making is primarily one of cost – you are trying to <strong>optimize for the lowest dollar cost per unit of the metric that's most important for your workload</strong>.</p>
<p>If your workload is generic, like a web server, choose a general purpose instance. If your workload is compute intensive, go with a compute optimized instance type. The same logic applies if your workload is memory or storage intensive.</p>
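<p>That first decision amounts to a simple lookup from the workload's dominant resource to an instance type. A minimal sketch, using the categories and example workloads from the list earlier (the resource-profile labels here are purely illustrative):</p>

```python
# Map the resource that dominates the workload's cost to the instance
# type whose pricing is optimised for that resource.
INSTANCE_TYPE_FOR = {
    "balanced": "General Purpose",    # e.g. web servers
    "cpu": "Compute Optimized",       # e.g. scientific modelling, gaming
    "gpu": "Accelerated Computing",   # e.g. machine learning, HPC
    "storage": "Storage Optimized",   # e.g. NoSQL databases, high IOPS
    "memory": "Memory Optimized",     # e.g. in-memory databases, caches
}

def pick_instance_type(dominant_resource: str) -> str:
    """Return the instance type to start from, given which resource
    dominates the workload ('balanced', 'cpu', 'gpu', 'storage', 'memory')."""
    try:
        return INSTANCE_TYPE_FOR[dominant_resource]
    except KeyError:
        raise ValueError(f"Unknown resource profile: {dominant_resource!r}")
```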
<h3 id="heading-step-2-select-instance-family">Step 2: Select instance family</h3>
<p>A good mental model for choosing the right instance family is to go through the technical documentation for the application you plan to run on that instance and use their recommendation.</p>
<p>For example, Elasticsearch (a full-text search engine database) recommends the “i” family of instances – specifically the “i3”. Recall that the number after the instance family is simply the instance generation, and the latest is usually the greatest.</p>
<p>When a newer generation of the “i” family arrives, Elasticsearch will likely recommend the “i4” instance. You can reason by analogy when selecting the instance family. Look at what the application recommends, as it's a great way to reduce any errors of omission or commission.</p>
<p>The company behind the application will have a lot of experience testing different families and will have done the experimentation on your behalf. No need to re-invent the wheel (unless, of course, your workload is truly niche and no best practice exists).</p>
<h3 id="heading-step-3-select-an-instance-with-special-features">Step 3: Select an instance with special features</h3>
<p>Do this only if absolutely needed. You will be paying extra for this.</p>
<h3 id="heading-step-4-select-an-instance-size">Step 4: Select an instance size</h3>
<p>This is purely specific to your workload and is usually an iterative process. You can run some tests while monitoring CPU and memory utilisation to see if the size you selected is appropriate.</p>
<p>You usually try to have some safety margin, so if your workload is consuming, on average, 90% of memory and CPU, you may need to choose a larger instance. A utilisation of 90% does not provide much headroom for any estimation errors you may have made during testing.</p>
<p>Deciding on the amount of headroom you need is more art than engineering, so there are no hard and fast numbers here. But as a rough guide, average utilisation in the 90% range is bad, the 80% range is acceptable, and 70% and below is good.</p>
<p>You need to provide some headroom to prevent any performance problems from occurring during peak demand.</p>
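<p>That sizing loop can be sketched in a few lines. The thresholds encode the rough guide above, and the scale-up estimate assumes instance sizes roughly double between steps (large, xlarge, 2xlarge, 4xlarge, and so on) – some families also offer in-between sizes, so treat this as a starting point, not a rule:</p>

```python
def headroom_rating(avg_utilisation_pct: float) -> str:
    """Rate average CPU/memory utilisation per the rough guide above:
    ~90% is bad, ~80% is acceptable, 70% and below is good."""
    if avg_utilisation_pct >= 90:
        return "bad"
    if avg_utilisation_pct >= 80:
        return "acceptable"
    return "good"

def suggested_vcpus(current_vcpus: int, avg_utilisation_pct: float,
                    target_pct: float = 70.0) -> int:
    """Estimate the vCPU count that would bring average utilisation
    down to the target, rounded up to the next doubling of the
    current size."""
    needed = current_vcpus * avg_utilisation_pct / target_pct
    scale = 1
    while current_vcpus * scale < needed:
        scale *= 2
    return current_vcpus * scale
```

<p>For example, a 4-vCPU instance averaging 90% utilisation would be rated “bad”, and the sketch suggests stepping up to 8 vCPUs to bring utilisation back into the comfortable range.</p>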
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>When you select an instance type, you are effectively selecting for the lowest price per unit of the metric most important for your workload. This is an important foundation in any project you are working on, as it ensures you are paying the lowest dollar amount per unit of performance.</p>
<p>Selecting the instance size is the most difficult piece of the puzzle and is likely to be an iterative process where you start small, test, and then scale up as required.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
