scalability - freeCodeCamp.org

How to Optimize Django REST APIs for Performance: Profiling, Caching, and Scaling

Mari — Tue, 17 Feb 2026 18:22:09 +0000

Performance problems in APIs rarely start as performance problems. They usually start as small design decisions that worked perfectly when the application had ten users, ten records, or a single developer testing locally. Over time, as traffic increases and data grows, those same decisions begin to slow everything down.

In this article, we’ll walk step by step through how performance issues arise in Django REST APIs, how to see them clearly using profiling tools, and how to fix them using query optimization, caching, pagination, and basic scaling strategies.

This article will be most useful for developers who already understand Django, the Django REST Framework, and REST concepts, but are new to performance optimization.

Why Django REST APIs Become Slow

Before optimizing anything, it’s important to understand why APIs become slow in the first place.

Most performance issues in Django REST APIs come from three main sources:

Too many database queries
Doing expensive work repeatedly
Returning more data than necessary

Django is fast by default, but it does exactly what you ask it to do. If your API endpoint triggers 300 database queries, Django will happily run all 300.

Now let’s look at some common causes of performance issues in Django REST APIs.

1. N+1 Query Problems in Serializers

This happens when you loop over objects and access related fields, causing a separate query for each object.

# models.py
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Post(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

# views.py (naive approach)
posts = Post.objects.all()
for post in posts:
    # This triggers a query per post to fetch the author
    print(post.author.name)

If you have 100 posts, this runs 101 queries: 1 for posts and 100 for authors. Django lazily loads related objects by default, so without intervention, your API performs repetitive database work that slows response times.

# Naive queryset fetching all related objects separately
posts = Post.objects.all()
authors = [post.author for post in posts]  # triggers extra queries per post

Each access to post.author triggers a new query. Even though you already fetched all posts, Django lazily loads related objects by default. This creates many extra queries, slowing down your API.

3. Serializing Large Datasets Without Pagination

Returning large querysets all at once can slow down your API and increase memory usage.

# views.py
from rest_framework.response import Response
from rest_framework.decorators import api_view
from .models import Post
from .serializers import PostSerializer

@api_view(['GET'])
def all_posts(request):
    posts = Post.objects.all()  # retrieves all posts at once
    serializer = PostSerializer(posts, many=True)
    return Response(serializer.data)

If your database has thousands of posts, this endpoint loads every row into memory, serializes it, and sends it over the network. That is slow, and it can exhaust memory under load. Later, we’ll learn to paginate results efficiently.

4. Recomputing Expensive Work Repeatedly

Some endpoints calculate the same values on every request instead of caching or precomputing.

from django.http import JsonResponse

def expensive_view(request):
    # Simulate an expensive computation that runs on every request
    result = sum(i**2 for i in range(1000000))
    return JsonResponse({"result": result})

Even if the data doesn’t change often, this computation happens on every request, consuming CPU time unnecessarily.

Performance optimization is about reducing unnecessary work.

At this point, it might be tempting to jump straight into fixes like caching responses or optimizing database queries. But doing that without evidence often leads to wasted effort or even new problems.

Before changing anything, you need to understand where your API is actually spending time. Is it the database? Is it serialization? Is it Python code running repeatedly on every request? This is where profiling becomes essential.

Profiling: Finding the Real Bottlenecks

Optimizing without profiling is guessing. Profiling helps you answer one question:

Where is my API actually spending time?

In practice, profiling means observing an API while it runs and collecting data about what it’s doing. This includes how many database queries are executed, how long those queries take, and how much time is spent in Python code, such as serializers or business logic.

By profiling first, you avoid making assumptions and can focus on fixing the parts of your API that are truly slowing things down.

Measuring Query Count in a View

During development (with DEBUG = True), Django records every query executed on a connection in connection.queries. You can inspect it directly:

from django.db import connection
from rest_framework.decorators import api_view
from rest_framework.response import Response
from .models import Post
from .serializers import PostSerializer

@api_view(["GET"])
def post_list(request):
    posts = Post.objects.all()
    serializer = PostSerializer(posts, many=True)

    response = Response(serializer.data)

    print(f"Total queries executed: {len(connection.queries)}")

    return response

If this prints 101 queries for 100 posts, you likely have an N+1 problem. This simple check confirms whether the database layer is the bottleneck.

One of the easiest ways to profile Django applications during development is by using tools that expose this information directly while requests are being processed.

The Django Debug Toolbar is one of the simplest ways to understand performance during development. It acts as a lightweight profiling tool that shows what happens behind the scenes when a request is handled.

It shows you:

How many SQL queries were executed
How long each query took
Whether queries are duplicated
Which parts of the request lifecycle are slow

First, install it:

pip install django-debug-toolbar

In settings.py:

INSTALLED_APPS = [
    ...
    "debug_toolbar",
]

MIDDLEWARE = [
    ...
    "debug_toolbar.middleware.DebugToolbarMiddleware",
]

INTERNAL_IPS = [
    "127.0.0.1",
]

In urls.py:

import debug_toolbar
from django.urls import path, include

urlpatterns = [
    ...
    path("__debug__/", include(debug_toolbar.urls)),
]

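When you load an endpoint in the browser during development, the toolbar displays total SQL queries, execution time, and duplicate queries, which makes inefficiencies immediately visible. Because the toolbar only attaches to HTML responses, open DRF endpoints through the browsable API rather than as raw JSON.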

When you load an API endpoint and see 150 SQL queries for a single request, that’s a strong signal that something is wrong, often an N+1 query problem or inefficient serializer behavior.

Logging SQL Queries

Django allows you to log all executed SQL queries. This is especially useful when debugging API views.

Seeing the raw SQL makes inefficiencies obvious, such as repeated SELECT statements for the same table.

How to Enable SQL Query Logging

You can configure Django to log all SQL queries in settings.py:

LOGGING = {
    "version": 1,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
        },
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

With this configuration, and with DEBUG = True, every SQL query is printed to the console when your API runs (Django only logs SQL in debug mode). Repeated SELECT statements or unexpected queries become obvious.

Profiling API Response Time

Database queries are only one part of API performance. Beyond queries, it’s also important to measure the total response time of an endpoint.

Profiling response time helps you understand whether delays are caused by database access or by other parts of the request lifecycle. For example, if an endpoint takes 1.2 seconds to respond but only 50 milliseconds are spent on database queries, the bottleneck is likely in serialization, business logic, or repeated computations in Python.

By comparing query time and total response time, profiling helps you identify what to fix first instead of optimizing the wrong layer of the system.

How to Measure Total Response Time

import time
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["GET"])
def example_view(request):
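    start_time = time.perf_counter()  # monotonic clock, suited to measuring intervals

    # Simulate work
    data = {"message": "Hello world"}

    response = Response(data)

    end_time = time.perf_counter()
    # Covers the view body only; DRF renders the response after the view returns
    print(f"Response time: {end_time - start_time:.4f} seconds")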

    return response

If database queries are fast but the total response time is high, the bottleneck may be serialization or expensive Python logic.
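If you want to time several functions without repeating the start/stop boilerplate, the same idea can be wrapped in a decorator. This is a minimal sketch using only the standard library. The names timed, last_elapsed, and build_payload are hypothetical, and it measures plain Python callables rather than a full request/response cycle.

```python
import time
from functools import wraps


def timed(func):
    """Print and record how long each call to `func` takes."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failures are timed too
            wrapper.last_elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {wrapper.last_elapsed:.4f} seconds")

    wrapper.last_elapsed = None  # no call has been measured yet
    return wrapper


@timed
def build_payload():
    """Stand-in for serialization or business logic."""
    return {"message": "Hello world"}
```

Applying @timed to a serializer helper or a business-logic function gives you a per-function number to compare against the query time reported by the tools above.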

Once you’ve identified that database access is a significant contributor to slow response times, the next step is to look more closely at how Django retrieves related data.

SQL Query Optimization in Django REST APIs

One of the most common reasons Django REST APIs become slow is inefficient access to related objects. This often manifests as the N+1 query problem, where fetching related objects triggers a separate database query for each item. Identifying and fixing this problem can significantly reduce the number of queries and improve API performance.

Understanding the N+1 Query Problem

Consider a simple example:

You fetch a list of posts
Each post has an author
For every post, Django fetches the author separately

If you have 100 posts, this results in 101 queries: 1 for the posts and 100 for the authors. This happens because Django lazily loads related objects by default. Without intervention, your API performs repetitive database work that slows down response times.
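To make the query count concrete outside the ORM, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for what Django does when it loads related objects lazily. The table layout, the helper names build_db and count_n_plus_one, and the sample data are assumptions made for illustration.

```python
import sqlite3


def build_db(num_posts):
    """Create an in-memory database with one author per post (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
    )
    for i in range(1, num_posts + 1):
        conn.execute("INSERT INTO author VALUES (?, ?)", (i, f"Author {i}"))
        conn.execute("INSERT INTO post VALUES (?, ?, ?)", (i, f"Post {i}", i))
    return conn


def count_n_plus_one(conn):
    """Mimic lazy loading: one query for the posts, then one query per post."""
    queries = 0
    posts = conn.execute("SELECT id, title, author_id FROM post").fetchall()
    queries += 1
    for _post_id, _title, author_id in posts:
        conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        queries += 1
    return queries


query_count = count_n_plus_one(build_db(100))  # 1 query for posts + 100 for authors
```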

Solving the Problem with `select_related` and `prefetch_related`

Django provides built-in tools to control how related objects are loaded efficiently: select_related and prefetch_related.

1. Using select_related

select_related is designed for foreign key and one-to-one relationships. It performs an SQL join and retrieves related objects in a single query.

Use it when:

You know you will access related objects
The relationship is one-to-one or many-to-one

posts = Post.objects.select_related("author")

for post in posts:
    print(post.author.name)  # No additional queries

This performs a SQL JOIN and retrieves posts and their authors in a single query, so the 101 queries from the earlier example collapse into one.
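The SQL behind this has roughly the following shape. The sketch runs it with the standard library's sqlite3 module rather than the Django ORM, and the table names and sample rows are assumptions for illustration. Django generates its own table and column names, and uses an outer join when the foreign key is nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
)
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, "First", 1), (2, "Second", 2), (3, "Third", 1)],
)


def posts_with_authors(connection):
    """A single JOIN query returns each post title with its author's name."""
    rows = connection.execute(
        "SELECT post.title, author.name "
        "FROM post INNER JOIN author ON post.author_id = author.id "
        "ORDER BY post.id"
    )
    return rows.fetchall()
```

However many posts there are, the database is asked once, and every row already carries the author data the loop needs.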

2. Using prefetch_related

prefetch_related is used for many-to-many and reverse foreign key relationships. It performs separate queries for each related table but combines the results in Python.

Use it when:

A SQL join would produce too much duplicated data
You are dealing with collections of related objects

Example: How to Optimize a Many-to-Many Relationship

Consider a blog application where posts can have multiple tags:

# models.py
class Tag(models.Model):
    name = models.CharField(max_length=50)

class Post(models.Model):
    title = models.CharField(max_length=200)
    tags = models.ManyToManyField(Tag)

Now imagine fetching posts and accessing their tags:

posts = Post.objects.all()

for post in posts:
    print(post.tags.all())  # Triggers additional queries

If you have 100 posts, Django may execute:

1 query to fetch posts
1 query per post to fetch related tags

This results in many unnecessary database hits.

You can optimize this using prefetch_related:

posts = Post.objects.prefetch_related("tags")

for post in posts:
    print(post.tags.all())  # Uses prefetched data

With this approach, Django performs one query for posts and one query for all related tags. It then matches them in Python, eliminating repeated database queries.
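The "two queries, then match in Python" strategy can be sketched without the ORM. The example below uses sqlite3 from the standard library. The table names (including the post_tags join table), the helper posts_with_tags, and the sample data are illustrative assumptions, not Django's actual generated schema.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [(1, "First"), (2, "Second")])
conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "django"), (2, "api")])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])


def posts_with_tags(connection):
    """Query 1 fetches posts; query 2 fetches all their tags; Python joins them."""
    posts = connection.execute("SELECT id, title FROM post ORDER BY id").fetchall()
    post_ids = [post_id for post_id, _title in posts]

    placeholders = ", ".join("?" for _ in post_ids)
    tag_rows = connection.execute(
        "SELECT post_tags.post_id, tag.name FROM tag "
        "INNER JOIN post_tags ON tag.id = post_tags.tag_id "
        f"WHERE post_tags.post_id IN ({placeholders}) ORDER BY tag.id",
        post_ids,
    ).fetchall()

    # Group the tag names by post id in memory instead of querying per post
    tags_by_post = defaultdict(list)
    for post_id, tag_name in tag_rows:
        tags_by_post[post_id].append(tag_name)
    return {title: tags_by_post[post_id] for post_id, title in posts}
```

The query count stays at two whether there are 2 posts or 2,000, which is the property that makes this approach a good fit for collections of related objects.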

Together, these tools allow you to optimize your queries and eliminate the N+1 problem efficiently.

Common Beginner Mistakes

Even after applying these optimizations, it’s easy to make mistakes. Watch out for:

Forgetting that serializers can trigger additional queries
Using select_related on many-to-many relationships
Assuming Django automatically optimizes queries
Not checking the query count after adding serializers

Paying attention to these pitfalls ensures your API remains fast and scalable.

Caching in Django REST APIs

Even after optimizing database queries, API performance can still suffer if the same computations or database lookups are performed repeatedly. This is where caching comes in. Caching is a technique for storing the results of expensive operations so they can be retrieved more quickly the next time they are needed.

At its core, caching exists because computers have multiple layers of memory with different speeds:

CPU registers (fastest)
L1, L2, L3 caches
Main memory (RAM)
SSD storage
HDD storage (slowest)

Each layer trades speed for size: the closer the data is to the CPU, the faster it can be accessed. Software systems use the same principle; by storing frequently accessed data in a “closer” or faster location, applications can respond more quickly.

Cache Eviction

Caches are limited in size, so when a cache is full, some data must be removed to make room for new data. This process is called cache eviction.

Common eviction strategies include:

Least Recently Used (LRU): removes the data that hasn’t been accessed for the longest time
Random Replacement: removes a random item from the cache

The goal is to keep the data that is most likely to be requested again while freeing space for new data. Understanding this helps developers use caching effectively.
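As an illustration of LRU eviction, here is a small sketch built on collections.OrderedDict. The LRUCache class is a teaching example written for this article's explanation, not the eviction code used by Django or Redis.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entries sit at the front

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # cache is full, so "b" is evicted
```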

Caching in Application Architectures

Caching exists at several levels in modern software systems:

Client-side caching: Web browsers cache HTTP responses to reduce the need for repeated network requests. This is controlled with HTTP headers like Cache-Control.
CDN caching: Content Delivery Networks store static assets closer to users, reducing latency and server load.
Backend caching: Backend services cache results from database queries, computed values, or API responses. This is where Django caching is most commonly applied.

By applying caching strategically at the backend, APIs can serve data faster while reducing computation and database load.

Caching in Django

Django provides a flexible caching framework that supports multiple backends, including in-memory, file-based, database-backed, and third-party stores like Redis. The main types of caching in Django are:

1. Per-view caching: caches the entire output of a view. Ideal for endpoints where responses rarely change.
```
from django.http import JsonResponse
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache for 15 minutes
def my_view(request):
    # Placeholder response for illustration
    return JsonResponse({"message": "This response is cached"})
```
2. Template fragment caching: caches specific parts of a template to avoid repeated rendering.
3. Low-level caching: gives full control over what is cached and for how long, making it ideal for API responses.

By combining these approaches, you can reduce repeated work in your API, lower database load, and speed up response times.
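Low-level caching usually follows a "check the cache, compute on a miss, store the result" pattern. The sketch below shows that pattern with a small in-memory stand-in so it runs without a Django project. TTLCache and get_or_compute are hypothetical names. In a real project you would call get, set, and delete on django.core.cache.cache instead.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache backend with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)

    def delete(self, key):
        self._store.pop(key, None)  # explicit invalidation


def get_or_compute(cache, key, compute, timeout):
    """Return the cached value, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value
```

Calling delete when the underlying data changes is the invalidation step. Without it, readers keep seeing the old value until the timeout passes.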

When to Use Redis

While Django’s built-in caching backends are sufficient for many projects, high-traffic APIs often require a shared, in-memory cache. This is where Redis excels. Redis is designed for fast, low-latency access and can handle frequent reads across multiple servers.

You should consider using Redis when:

Data is read frequently but changes infrequently
Low latency is important for API responses
You need cache expiration and eviction policies
You want a shared cache across multiple servers or services

Redis is particularly effective for API endpoints that serve the same data to many users, such as frequently accessed lists or computed results.
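A minimal configuration sketch for pointing Django's cache framework at Redis is shown below. It assumes Django 4.0 or newer (which ships a built-in Redis backend), the redis Python package installed, and a Redis server listening on localhost. The address and the 300-second timeout are placeholder values.

```python
# settings.py
CACHES = {
    "default": {
        # Built-in backend available from Django 4.0
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        # Assumed local Redis instance; replace with your server's address
        "LOCATION": "redis://127.0.0.1:6379",
        # Default expiry in seconds for keys stored without an explicit timeout
        "TIMEOUT": 300,
    }
}
```

With this in place, the per-view and low-level caching APIs described earlier store their data in Redis, so every application server reads from the same cache.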

Common Beginner Mistakes

Caching is powerful, but it’s easy to misuse. Some common pitfalls include:

Caching everything blindly: not all data benefits from caching
Forgetting cache invalidation: stale data can lead to incorrect responses
Using a cache where query optimization would suffice: sometimes optimizing database queries is a better solution than caching

Remember: caching should complement good database design, not replace it.

Pagination and Limiting Expensive Datasets

Even with caching, returning large datasets in a single request can slow down your API and increase memory usage. Pagination is a simple and effective way to limit the amount of data returned at once.

Pagination helps by reducing:

Database load
Memory usage
Serialization time
Network transfer size

Django REST Framework provides built-in pagination classes that make it easy to paginate endpoints. As a rule of thumb, always paginate list endpoints unless there is a strong reason not to.
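As a starting point, pagination can be switched on globally in settings.py. The snippet below is a configuration sketch, and the page size of 20 is an arbitrary value chosen for illustration.

```python
# settings.py
REST_FRAMEWORK = {
    # Built-in page-number style: clients request ?page=2, ?page=3, ...
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    # Number of items returned per page (illustrative value)
    "PAGE_SIZE": 20,
}
```

This global setting is applied automatically by DRF's generic views and viewsets. A function-based view decorated with @api_view, like the all_posts example earlier, has to invoke a paginator itself.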

Load Testing and Measuring Improvement

Optimizations are only meaningful if you can measure their impact. Load testing simulates multiple users accessing your API simultaneously, helping you answer key questions:

How many requests per second can my API handle?
Where does the API start to break under load?
Did caching, query optimization, and pagination actually improve performance?

By running load tests before and after optimization, you can validate that your changes have the desired effect and avoid optimizing the wrong parts of your system.
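Dedicated tools are the usual choice for load testing, but the core idea fits in a few lines. The sketch below fires a callable from several threads and reports throughput. The names run_load and fake_request are hypothetical, and fake_request only sleeps briefly in place of a real HTTP call, so the numbers it produces say nothing about any real API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request_fn, total_requests, concurrency):
    """Call `request_fn` total_requests times using `concurrency` worker threads."""

    def timed_call(_):
        start = time.perf_counter()
        ok = True
        try:
            request_fn()
        except Exception:
            ok = False  # count any exception as a failed request
        return time.perf_counter() - start, ok

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    durations = [duration for duration, _ok in results]
    return {
        "requests": total_requests,
        "errors": sum(1 for _duration, ok in results if not ok),
        "requests_per_second": total_requests / wall_time,
        "slowest_seconds": max(durations),
    }


def fake_request():
    """Placeholder for an HTTP call to the endpoint under test."""
    time.sleep(0.001)
```

Running the same function before and after an optimization, with the same request count and concurrency, gives two comparable reports.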

Summary and Next Steps

Optimizing Django REST APIs isn’t about chasing every tiny micro-optimization. It’s about reducing unnecessary work and focusing on the parts of your API that actually slow down performance.

Key Takeaways

Profile before optimizing: Identify the real bottlenecks before making changes.
Reduce database queries: Use techniques like select_related, prefetch_related, and avoid N+1 queries.
Cache frequently accessed data: Use Django caching and Redis to reduce repeated computations.
Paginate large datasets: Limit memory usage and network load by returning data in chunks.
Measure performance changes: Always verify that your optimizations have a real impact.

Next Steps for Your APIs

Add profiling to your existing APIs to understand where time is spent.
Identify one slow endpoint and focus on optimizing it first.
Optimize database queries using Django ORM best practices.
Introduce caching carefully; avoid caching everything blindly.
Measure the results with load testing and performance metrics.

Remember: Performance optimization is not a one-time task. It’s a habit built by continuously observing how your system works, testing improvements, and applying changes where they make the most impact.

How to Build Your First Dynamic Performance Test in Apache JMeter

Mah Noor — Tue, 28 Oct 2025 16:48:10 +0000

As a QA engineer, I have always found performance testing to be one of the most exciting and underrated parts of software testing. Yes, functional testing is important, but it’s of little use if users have to wait for 5 seconds for each page to load.

For me personally, there is a deep satisfaction in seeing your product come alive under load and finding out how it will actually behave in production when thousands of users are using it.

Performance testing is about discovering how your system performs under real-world pressure in terms of load, concurrency, and throughput. One of the key aspects of performance testing is ensuring that the APIs can endure the expected load. You can do this using tools like Apache JMeter and k6.

In this tutorial, we’ll explore how you can build your first end-to-end performance test in Apache JMeter. You’ll learn to create a test suite that is dynamic (the test can be run with any test data) and one-click executable (the test can be run through the GUI as well as the CLI).

Prerequisites
Introduction to Apache JMeter
Conclusion

Prerequisites

Before you start, make sure you have:

Apache JMeter (5.5 or above) installed.
Java 8 or later configured on your system.

You can check if JMeter is installed by running the command below:

jmeter -v

Note: This tutorial will use the JSONPlaceholder public API. You’ll learn how to fetch a post by its post_id and use the userId from the response in a chained request to get user details.

Let’s get started.

Introduction to Apache JMeter

Apache JMeter is an open-source API load and stress testing tool. It’s a powerful testing tool that supports a wide range of protocols, including HTTP, HTTPS, FTP, JDBC, SOAP, and REST.

JMeter helps you answer critical questions about your APIs, like:

How does my API perform under heavy load?
What’s the maximum number of users it can handle before it starts failing?
Which requests or endpoints are slowing things down?

Let’s go through the step-by-step process of building a dynamic load testing suite with JMeter.

Step 1: Create a New Test Plan

Once JMeter opens, you’ll see an empty Test Plan. Think of this as your main workspace, which holds everything: test configuration, users, requests, assertions, and results.

Right-click on Test Plan → Add → Threads (Users) → Thread Group to add a thread group. A thread group is essentially a test suite containing our test cases.

Step 2: Configure the Thread Group

To configure the thread group, fill out the following input fields:

Setting	Value	Description
Number of Threads (Users)	5	The number of concurrent virtual users, 5 in this case.
Ramp-up Period (seconds)	10	The time JMeter takes to start all the threads.
Loop Count	2	The number of times each thread runs through the thread group.

You’ve now created a small, controlled load test of 10 total requests (5 users × 2 loops).
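The arithmetic behind these settings can be sketched as a small helper. The function thread_group_plan is a hypothetical name. It assumes threads are started evenly across the ramp-up period and that each loop runs every sampler once.

```python
def thread_group_plan(threads, ramp_up_seconds, loops, samplers=1):
    """Estimate total samples and thread start spacing for a thread group."""
    return {
        # Every thread runs every sampler once per loop
        "total_samples": threads * loops * samplers,
        # Threads are started evenly across the ramp-up period
        "seconds_between_thread_starts": ramp_up_seconds / threads,
    }


plan = thread_group_plan(threads=5, ramp_up_seconds=10, loops=2)
```

With the values from the table, a new user starts every 2 seconds and the single sampler runs 10 times. Adding a second sampler to the thread group doubles the sample count.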

Step 3: Add HTTP Request Defaults

When you’re creating a suite of hundreds of APIs, you don’t need to add the same request details to every API sampler in JMeter. JMeter lets you set them once globally by using a config element called HTTP Request Defaults. To add this element, follow the steps below:

Right-click on Thread Group → Add → Config Element → HTTP Request Defaults.
Enter the following:
- Protocol: https
- Server Name or IP: jsonplaceholder.typicode.com

This means all requests in this test will automatically use this base URL.

Step 4: Add a CSV Data Set Config (Dynamic Input)

In real projects, APIs rarely use static inputs. Take a login API that you want to run for 100 concurrent users: in a real-world scenario, every login request has a different username and password.

To replicate this in JMeter, you need to run your test with 100 different login credentials. In other words, the test should be data-driven. We can build a data-driven test in JMeter using a CSV file:

Create a file named data.csv with the following content:
```
post_id
1
2
3
4
5
```
Save it in your JMeter project folder.
In JMeter, right-click on Thread Group → Add → Config Element → CSV Data Set Config.
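Fill in the following fields:
- Filename: data.csv
- Variable Names: post_id
- Ignore first line: True (the file starts with a header row; without this, the header is read as the first post_id)
- Recycle on EOF: True
- Stop thread on EOF: False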

Now each user will pick a new post_id for every iteration from the CSV file.
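To picture how Recycle on EOF behaves, here is a rough Python sketch that reads the same CSV content and hands out values in order, starting again from the top when the file runs out. The helper post_ids_for is hypothetical. In a real run the order in which threads receive values also depends on thread timing and the sharing mode.

```python
import csv
import io
from itertools import cycle, islice

# Same content as data.csv, inlined so the sketch is self-contained
CSV_TEXT = "post_id\n1\n2\n3\n4\n5\n"


def post_ids_for(total_iterations, csv_text=CSV_TEXT):
    """Return the post_id used by each iteration, wrapping around at end of file."""
    rows = csv.DictReader(io.StringIO(csv_text))  # the header row names the column
    ids = [row["post_id"] for row in rows]
    return list(islice(cycle(ids), total_iterations))


assigned = post_ids_for(10)  # 5 users x 2 loops
```

Ten iterations over five rows means each post_id is requested twice, which is why recycling matters when the CSV is shorter than the total number of iterations.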

Step 5: Add the HTTP Request Sampler

Now let’s add the actual API call we'll test under load. To do this, follow the steps below:

Right-click on Thread Group → Add → Sampler → HTTP Request.
Rename it to Get Post Data.
Set the following fields:
- Method: GET
- Path: /posts/${post_id}

Here ${post_id} dynamically takes its value from your CSV file. The Protocol and Server Name or IP fields are automatically filled from the HTTP Request Defaults config element that we added in Step 3.

Step 6: Add a JSON Extractor

When the API returns a response, we can extract a value (like userId) from it and use it later. This lets you build an end-to-end flow in which data fetched from one API (with GET) is passed to the next request, such as a POST or DELETE call.

For our API, below is the example response:

{
  "userId": 1,
  "id": 3,
  "title": "fugiat veniam minus",
  "body": "This is an example post body"
}

To extract userId:

Right-click on Get Post Data → Add → Post Processors → JSON Extractor.
Set the variables below in the JSON Extractor:
- Name: Extract User ID
- Variable Name: user_id
- JSON Path Expression: $.userId

Now you can use ${user_id} in the next request, making your test fully dynamic.
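For readers more comfortable with code than with JSONPath, the extraction the JSON Extractor performs for $.userId is roughly equivalent to the Python below. The helpers extract_user_id and next_path are illustrative names, and the body is the example response shown above.

```python
import json

# Example response body from the Get Post Data request
RESPONSE_BODY = """
{
  "userId": 1,
  "id": 3,
  "title": "fugiat veniam minus",
  "body": "This is an example post body"
}
"""


def extract_user_id(body):
    """Equivalent of the JSONPath expression $.userId: read the top-level key."""
    return json.loads(body)["userId"]


def next_path(body):
    """Build the path of the follow-up request from the extracted value."""
    return f"/users/{extract_user_id(body)}"
```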

Step 7: Add an Assertion

Assertions help you verify that your API responds correctly even under load. You can assert on the API response code, response time, or even the response payload. To add an assertion, follow the steps below:

Right-click Get Post Data → Add → Assertions → Response Assertion.
Configure as:
- Response Field to Test: Response Code – This will add an assertion for the response code.
- Pattern Matching Rules: Contains
- Pattern to Test: 200

This ensures JMeter only counts the request as successful if the response code is 200.

Step 8: Add Listeners

We’ll add listeners to display our test results in different forms, such as visually or in a summary. Let’s add two essential ones:

View Results Tree: to view and debug individual requests.
Summary Report: to view performance metrics like response time, error rate, and throughput.

Add them via Thread Group → Add → Listener → [Choose Listener]

Step 9: Run Your Test

Hit the green Start button at the top. JMeter will start sending requests to your API using the dynamic post IDs from your CSV file.

As the test runs:

Green checkmarks in View Results Tree mean successful responses.
Assertion failures will appear in red.
Summary Report will aggregate key metrics.

Step 10: Chain Another Request (Optional)

Let’s take it one step further: we’ll use the user_id extracted from the first response to fetch that user’s details from the /users endpoint. To do this, follow the steps below:

Right-click Thread Group → Add → Sampler → HTTP Request.
Rename to Get User Details.
Set:
- Method: GET
- Path: /users/${user_id}

Step 11: Analyze the Results

Once the test completes, open the Summary Report. You’ll see:

Metric	Description
Sample Count	Number of total requests sent
Average	Mean response time per request
Min/Max	Fastest and slowest response times
Error %	Percentage of failed requests
Throughput	Requests handled per second

If your error percentage is 0% and throughput is stable, your system handled the load well.
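The metrics in the table can be reproduced by hand from raw samples, which helps when you need to sanity-check a report. The sketch below assumes each sample is a (start time in ms, elapsed time in ms, success flag) tuple. The sample values are made up for illustration, and throughput is computed from the start of the first sample to the end of the last one.

```python
def summarize(samples):
    """Summarize samples given as (start_ms, elapsed_ms, success) tuples."""
    elapsed = [elapsed_ms for _start, elapsed_ms, _ok in samples]
    failures = sum(1 for _start, _elapsed, ok in samples if not ok)
    first_start = min(start_ms for start_ms, _elapsed, _ok in samples)
    last_end = max(start_ms + elapsed_ms for start_ms, elapsed_ms, _ok in samples)
    duration_seconds = (last_end - first_start) / 1000
    return {
        "sample_count": len(samples),
        "average_ms": sum(elapsed) / len(samples),
        "min_ms": min(elapsed),
        "max_ms": max(elapsed),
        "error_percent": 100 * failures / len(samples),
        "throughput_per_second": len(samples) / duration_seconds,
    }


# Made-up samples: four requests over two seconds, one of them failed
SAMPLES = [(0, 100, True), (500, 200, True), (1000, 300, False), (1700, 300, True)]
summary = summarize(SAMPLES)
```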

Pro Tips

Parameterize everything. Use multiple CSVs for realistic test flows (users, IDs, tokens).
Add timers (like Constant Timer) to simulate think time between user actions.
Use Assertions wisely. Don’t add extra assertions; focus on key validations such as response time and API status code.

Generate HTML reports using the command below:

  jmeter -n -t test-plan.jmx -l results.jtl -e -o report

Example Folder Structure:

Follow the folder structure below for an organized test suite.

performance-test/
├── data.csv
├── test-plan.jmx
└── results/
    ├── summary.csv
    └── report.html

Conclusion

Performance testing is an essential element of a production readiness checklist for any product. It helps you ensure that your product can handle the expected user load and scale gracefully.

This guide is your first step towards writing end-to-end performance test cases and bridging the gap between being a functional test engineer and a full-stack QA Engineer who understands both quality and scalability.

I hope you found this tutorial helpful. If you want to stay connected or learn more about performance testing, follow me on LinkedIn.

How to Build Multi-Module Projects in Spring Boot for Scalable Microservices

Birkaran Sachdev — Tue, 12 Nov 2024 16:42:04 +0000

As software applications grow in complexity, managing scalability, modularity, and clarity becomes essential.

Spring Boot’s multi-module structure allows you to manage different parts of the application independently, which lets your team develop, test, and deploy components separately. This structure keeps code organized and modular, making it useful for both microservices and large monolithic systems.

In this tutorial, you’ll build a multi-module Spring Boot project, with each module dedicated to a specific responsibility. You’ll learn how to set up modules, configure inter-module communication, handle errors, implement JWT-based security, and deploy using Docker.

Prerequisites:

Basic knowledge of Spring Boot and Maven.
Familiarity with Docker and CI/CD concepts (optional but helpful).

Why Multi-Module Projects?
Project Structure and Architecture
How to Set Up the Parent Project
How to Create the Modules
Inter-Module Communication
Common Pitfalls and Solutions
Testing Strategy
Error Handling and Logging
Security and JWT Integration
Deployment with Docker and CI/CD
Best Practices and Advanced Use Cases
Conclusion and Key Takeaways

1. Why Multi-Module Projects?

In single-module projects, components are often tightly coupled, making it difficult to scale and manage complex codebases. A multi-module structure offers several advantages:

Modularity: Each module is dedicated to a specific task, such as User Management or Inventory, simplifying management and troubleshooting.
Team Scalability: Teams can work independently on different modules, minimizing conflicts and enhancing productivity.
Flexible Deployment: Modules can be deployed or updated independently, which is particularly beneficial for microservices or large applications with numerous features.

Real-World Example

Consider a large e-commerce application. Its architecture can be divided into distinct modules:

Customer Management: Responsible for handling customer profiles, preferences, and authentication.
Product Management: Focuses on managing product details, stock, and pricing.
Order Processing: Manages orders, payments, and order tracking.
Inventory Management: Oversees stock levels and supplier orders.

Case Study: Netflix

To illustrate these benefits, let's examine how Netflix employs a multi-module architecture.

Netflix is a leading example of a company that effectively uses this approach through its microservices architecture. Each microservice at Netflix is dedicated to a specific function, such as user authentication, content recommendations, or streaming services.

This modular structure enables Netflix to scale its operations efficiently, deploy updates independently, and maintain high availability and performance. By decoupling services, Netflix can manage millions of users and deliver content seamlessly worldwide, ensuring a robust and flexible system that supports its vast and dynamic platform.

This architecture not only enhances scalability but also improves fault isolation, allowing Netflix to innovate rapidly and respond effectively to user demands.

2. Project Structure and Architecture

Now let’s get back to our example project. Your multi-module Spring Boot project will use five key modules. Here’s the layout:

spring-boot-multi-module/
 ├── common/               # Shared utilities and constants
 ├── domain/               # Domain entities
 ├── repository/           # Data access layer (DAL)
 ├── service/              # Business logic
 └── web/                  # Main Spring Boot application and controllers

Each module has a specific role:

common: Stores shared utilities, constants, and configuration files used across other modules.
domain: Contains data models for your application.
repository: Manages database operations.
service: Encapsulates business logic.
web: Defines REST API endpoints and serves as the application’s entry point.

This structure aligns with separation of concerns principles, where each layer is independent and handles its own logic.

The diagram below illustrates the various modules:

3. How to Set Up the Parent Project

Step 1: Create the Root Project

Let’s run these commands to create the Maven parent project:

mvn archetype:generate -DgroupId=com.example -DartifactId=spring-boot-multi-module -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
cd spring-boot-multi-module

Step 2: Configure the Parent `pom.xml`

In the pom.xml, let’s define our dependencies and modules:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>spring-boot-multi-module</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>
    <modules>
        <module>common</module>
        <module>domain</module>
        <module>repository</module>
        <module>service</module>
        <module>web</module>
    </modules>
    <properties>
        <java.version>11</java.version>
        <spring.boot.version>2.5.4</spring.boot.version>
    </properties>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${spring.boot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

This pom.xml file centralizes dependencies and configurations, making it easier to manage shared settings across modules.

4. How to Create the Modules

Common Module

Let’s create a common module to define shared utilities like date formatters. Create this module and add a sample utility class:

mvn archetype:generate -DgroupId=com.example.common -DartifactId=common -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Date Formatter Utility:

package com.example.common;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateUtils {
    public static String formatDate(LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
    }
}

Domain Module

In the domain module, you will define your data models.

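package com.example.domain;

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class User {
    @Id
    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}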

Repository Module

Let’s create the repository module to manage data access. Here’s a basic repository interface:

package com.example.repository;

import com.example.domain.User;
import org.springframework.data.jpa.repository.JpaRepository;

public interface UserRepository extends JpaRepository<User, Long> {}

Service Module

Let’s create the service module to hold your business logic. Here’s an example service class:

package com.example.service;

import com.example.domain.User;
import com.example.repository.UserRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    @Autowired
    private UserRepository userRepository;

    public User getUserById(Long id) {
        return userRepository.findById(id).orElse(null);
    }
}

Web Module

The web module serves as the REST API layer.

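import com.example.domain.User;
import com.example.service.UserService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    @Autowired
    private UserService userService;

    @GetMapping("/users/{id}")
    public User getUserById(@PathVariable Long id) {
        return userService.getUserById(id);
    }
}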

5. Inter-Module Communication

To avoid direct dependencies, you can use REST APIs or message brokers (like Kafka) for inter-module communication. This ensures loose coupling and allows each module to communicate independently.

The diagram below demonstrates how modules communicate with each other:

The diagram illustrates how different system components communicate to handle requests efficiently.

The Web Module processes incoming API requests and forwards them to the Service Module, which contains the business logic. The Service Module then interacts with the Repository Module to fetch or update data in the Database. This layered approach ensures that each module operates independently, promoting flexibility and easier maintenance.

Example Using Feign Client:

In the context of inter-module communication, using tools like Feign Clients is a powerful way to achieve loose coupling between services.

The Feign client allows one module to seamlessly communicate with another through REST API calls, without requiring direct dependencies. This approach fits perfectly within the layered architecture described earlier, where the Service Module can fetch data from other services or microservices using Feign clients, rather than directly accessing databases or hard-coding HTTP requests.

This not only simplifies the code but also improves scalability and maintainability by isolating service dependencies.

@FeignClient(name = "userServiceClient", url = "http://localhost:8081")
public interface UserServiceClient {
    @GetMapping("/users/{id}")
    User getUserById(@PathVariable("id") Long id);
}

6. Common Pitfalls and Solutions

When implementing a multi-module architecture, you may encounter several challenges. Here are some common pitfalls and their solutions:

Circular Dependencies: Modules may inadvertently depend on each other, creating a circular dependency that complicates builds and deployments.
- Solution: Carefully design module interfaces and use dependency management tools to detect and resolve circular dependencies early in the development process.
Over-Engineering: There's a risk of creating too many modules, leading to unnecessary complexity.
- Solution: Start with a minimal set of modules and only split further when there's a clear need, ensuring each module has a distinct responsibility.
Inconsistent Configurations: Managing configurations across multiple modules can lead to inconsistencies.
- Solution: Use centralized configuration management tools, such as Spring Cloud Config, to maintain consistency across modules.
Communication Overhead: Inter-module communication can introduce latency and complexity.
- Solution: Optimize communication by using efficient protocols and consider asynchronous messaging where appropriate to reduce latency.
Testing Complexity: Testing a multi-module project can be more complex due to the interactions between modules.
- Solution: Implement a robust testing strategy that includes unit tests for individual modules and integration tests for inter-module interactions.

By being aware of these pitfalls and applying these solutions, you can effectively manage the complexities of a multi-module architecture and ensure a smooth development process.
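To make the circular dependency pitfall more concrete, here is a minimal sketch of how a cycle between modules can be detected with a depth-first search. Maven already fails the build when reactor modules reference each other in a cycle, so you rarely need to write this yourself. The class name `ModuleCycleCheck` and the dependency map are assumptions made up for illustration, not part of Maven or Spring.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a Maven or Spring API): detects a cycle in a
// module dependency graph using depth-first search.
class ModuleCycleCheck {

    static boolean hasCycle(Map<String, List<String>> dependencies) {
        Set<String> finished = new HashSet<>();   // modules that are fully explored
        Set<String> inProgress = new HashSet<>(); // modules on the current DFS path
        for (String module : dependencies.keySet()) {
            if (visit(module, dependencies, finished, inProgress)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> dependencies,
                                 Set<String> finished, Set<String> inProgress) {
        if (inProgress.contains(module)) {
            return true; // we reached a module we are still exploring: a cycle
        }
        if (finished.contains(module)) {
            return false;
        }
        inProgress.add(module);
        for (String dependency : dependencies.getOrDefault(module, List.of())) {
            if (visit(dependency, dependencies, finished, inProgress)) {
                return true;
            }
        }
        inProgress.remove(module);
        finished.add(module);
        return false;
    }

    public static void main(String[] args) {
        // The layering used in this tutorial: web -> service -> repository -> domain
        Map<String, List<String>> layered = Map.of(
                "web", List.of("service"),
                "service", List.of("repository", "common"),
                "repository", List.of("domain"),
                "domain", List.of(),
                "common", List.of());
        System.out.println(hasCycle(layered)); // false

        // A mistake: repository starts depending on service
        Map<String, List<String>> broken = Map.of(
                "service", List.of("repository"),
                "repository", List.of("service"));
        System.out.println(hasCycle(broken)); // true
    }
}
```

The same idea applies at any granularity: as long as dependencies only point "downwards" through the layers, the check stays false.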

7. Testing Strategy and Configuration

Testing each module both in isolation and together with the others is critical in multi-module setups.

Unit Tests

Here, we’ll use JUnit and Mockito for performing unit tests:

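import static org.junit.jupiter.api.Assertions.assertEquals;

import com.example.domain.User;
import com.example.repository.UserRepository;
import com.example.service.UserService;
import java.util.Optional;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Mockito;
import org.mockito.junit.jupiter.MockitoExtension;

// JUnit 5 style, matching the integration test below and the default
// spring-boot-starter-test setup in Spring Boot 2.5
@ExtendWith(MockitoExtension.class)
public class UserServiceTest {

    @Mock
    private UserRepository userRepository;

    @InjectMocks
    private UserService userService;

    @Test
    public void testGetUserById() {
        User user = new User();
        user.setId(1L);
        user.setName("John");

        Mockito.when(userRepository.findById(1L)).thenReturn(Optional.of(user));

        User result = userService.getUserById(1L);
        assertEquals("John", result.getName());
    }
}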

Integration Tests

And we’ll use Testcontainers, which starts a real PostgreSQL database in a Docker container, for integration tests:

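@Testcontainers
@SpringBootTest
public class UserServiceIntegrationTest {

    @Container
    private static final PostgreSQLContainer<?> postgresqlContainer =
            new PostgreSQLContainer<>("postgres:latest");

    // Point Spring's datasource at the container started by Testcontainers
    @DynamicPropertySource
    static void datasourceProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgresqlContainer::getJdbcUrl);
        registry.add("spring.datasource.username", postgresqlContainer::getUsername);
        registry.add("spring.datasource.password", postgresqlContainer::getPassword);
    }

    @Autowired
    private UserService userService;

    @Test
    public void testFindById() {
        // Assumes a user with id 1 has already been seeded into the database
        User user = userService.getUserById(1L);
        assertNotNull(user);
    }
}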

8. Error Handling and Logging

Error handling and logging ensure a robust and debuggable application.

Error Handling

In this section, we'll explore how to handle errors gracefully in your Spring Boot application using a global exception handler. By using @ControllerAdvice, we'll set up a centralized way to catch and respond to errors, keeping our code clean and our responses consistent.

@ControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(UserNotFoundException.class)
    public ResponseEntity<String> handleUserNotFoundException(UserNotFoundException ex) {
        return new ResponseEntity<>("User not found", HttpStatus.NOT_FOUND);
    }
}

In the code example above, we define a GlobalExceptionHandler that catches any UserNotFoundException and returns a friendly message like "User not found" with a status of 404. This way, you don’t have to handle this exception in every controller—you’ve got it covered in one place!
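The handler only runs if something actually throws `UserNotFoundException`, and the article does not show that class. Here is a minimal, framework-free sketch of what it could look like, along with a lookup that throws instead of returning `null`. The exception's constructor, the `UserLookupSketch` class, and the in-memory map standing in for `UserRepository` are all assumptions for illustration.

```java
import java.util.Map;
import java.util.Optional;

class UserLookupSketch {
    // Stand-in for UserRepository.findById, backed by an in-memory map
    private static final Map<Long, String> USERS = Map.of(1L, "John");

    static Optional<String> findById(Long id) {
        return Optional.ofNullable(USERS.get(id));
    }

    // Throwing here (instead of returning null) is what would let an
    // @ExceptionHandler for UserNotFoundException take over
    static String getUserNameById(Long id) {
        return findById(id).orElseThrow(() -> new UserNotFoundException(id));
    }

    public static void main(String[] args) {
        System.out.println(getUserNameById(1L)); // John
        try {
            getUserNameById(42L);
        } catch (UserNotFoundException ex) {
            System.out.println(ex.getMessage()); // User not found: 42
        }
    }
}

// Assumed definition: an unchecked exception carrying the missing id
class UserNotFoundException extends RuntimeException {
    UserNotFoundException(Long id) {
        super("User not found: " + id);
    }
}
```

In the service module, the equivalent change would be to replace `orElse(null)` with an `orElseThrow(...)` call like the one above.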

Now, let’s take a look at the diagram. Here’s how it all flows: when a client sends a request to our Web Module, if everything goes smoothly, you'll get a successful response. But if something goes wrong, like a user not being found, the error will be caught by our Global Error Handler. This handler logs the issue and returns a clean, structured response to the client.

This approach ensures that users get clear error messages while keeping your app’s internals hidden and secure.

Logging

Structured logging in each module improves traceability and debugging. You can use a logging framework like Logback together with a centralized log store, and include correlation IDs to trace requests across modules.
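To illustrate the correlation ID idea, here is a small sketch that uses only the Java standard library. It keeps the ID in a `ThreadLocal` and prefixes every log line with it. In a real Logback/SLF4J setup, `org.slf4j.MDC` would typically play the role of the `ThreadLocal` and the log pattern would print the value. The class and method names below are made up for this example.

```java
import java.util.UUID;

// Stdlib-only sketch; CorrelationIdSketch is a hypothetical name.
class CorrelationIdSketch {
    private static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();

    // Prefixes a message with the ID bound to the current thread
    static String format(String message) {
        return "[correlationId=" + CORRELATION_ID.get() + "] " + message;
    }

    static String handleRequest(String correlationId) {
        CORRELATION_ID.set(correlationId);
        try {
            // Every line written while handling this request carries the same ID
            return format("web: received request") + "\n" + format("service: loading user");
        } finally {
            // Clear the ID so it does not leak into the next request on this thread
            CORRELATION_ID.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(UUID.randomUUID().toString()));
    }
}
```

Because both lines share one ID, you can filter the logs of every module for a single request.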

9. Security and JWT Integration

In this section, we’re going to set up JSON Web Tokens (JWT) to secure our endpoints and control access based on user roles. We'll configure this in the SecurityConfig class, which will help us enforce who can access what parts of our application.

@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.authorizeRequests()
            .antMatchers("/admin/**").hasRole("ADMIN")
            .antMatchers("/user/**").hasAnyRole("USER", "ADMIN")
            .anyRequest().authenticated()
            .and()
            .oauth2ResourceServer().jwt();
    }
}

In the code example above, you can see how we’ve defined access rules:

The /admin/** endpoints are restricted to users with the ADMIN role.
The /user/** endpoints can be accessed by users with either the USER or ADMIN roles.
Any other requests will require the user to be authenticated.

Next, we set up our application to validate incoming tokens using .oauth2ResourceServer().jwt();. This ensures that only requests with a valid token can access our secured endpoints.

Now, let’s walk through the diagram. When a client sends a request to access a resource, the Security Filter first checks if the provided JWT token is valid. If the token is valid, the request proceeds to the Service Module to fetch or process the data. If not, access is denied right away, and the client receives an error response.

This flow ensures that only authenticated users can access sensitive resources, keeping our application secure.
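If you're curious what "checking that the token is valid" involves, here is a rough, stdlib-only sketch of the signature check for a token signed with HS256 (HMAC-SHA256). This is an assumption for illustration: the configuration above does not say which algorithm is used, resource servers are often set up with asymmetric keys instead, and Spring Security also validates things this sketch ignores, such as expiry. The class name and the demo secret are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: signs and verifies header.payload.signature with HS256.
class JwtSignatureSketch {

    private static byte[] hmacSha256(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }

    private static String base64Url(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Builds a token for demonstration purposes
    static String sign(String headerJson, String payloadJson, byte[] secret) {
        String signingInput = base64Url(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + base64Url(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + base64Url(hmacSha256(signingInput, secret));
    }

    // Recomputes the signature over "header.payload" and compares it
    static boolean hasValidSignature(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] expected = hmacSha256(parts[0] + "." + parts[1], secret);
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid Base64URL
        }
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    public static void main(String[] args) {
        byte[] secret = "demo-secret-for-illustration-only".getBytes(StandardCharsets.UTF_8);
        String token = sign("{\"alg\":\"HS256\",\"typ\":\"JWT\"}",
                "{\"sub\":\"john\",\"roles\":[\"USER\"]}", secret);
        System.out.println(hasValidSignature(token, secret)); // true
        byte[] wrongSecret = "another-secret".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasValidSignature(token, wrongSecret)); // false
    }
}
```

In the actual application you would leave all of this to Spring Security. The sketch is only meant to show why a tampered or wrongly signed token is rejected.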

10. Deployment with Docker and CI/CD

In this section, we'll containerize each module using Docker to make our application easier to deploy and run consistently across different environments. We’ll also set up a CI/CD pipeline using GitHub Actions (but you can use Jenkins too if you prefer). Automating this process ensures that any changes you push are automatically built, tested, and deployed.

Step 1: Containerizing with Docker

We start by creating a Dockerfile for the Web Module:

FROM openjdk:11-jre-slim
COPY target/web-1.0-SNAPSHOT.jar app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

Here, we’re using a lightweight version of Java 11 to keep our image size small. We copy the compiled .jar file into the container and set it up to run when the container starts.

Step 2: Using Docker Compose for Multi-Module Deployment

Now, we'll use a Docker Compose file to orchestrate multiple modules together:

version: '3'
services:
  web:
    build: ./web
    ports:
      - "8080:8080"
  service:
    build: ./service
    ports:
      - "8081:8081"

With this setup, we can run both the Web Module and the Service Module at the same time, making it easy to spin up the entire application with a single command. Each service is built separately from its own directory, and we expose the necessary ports to access them.

CI/CD Example with GitHub Actions

name: CI Pipeline

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up JDK 11
      uses: actions/setup-java@v2
      with:
        distribution: 'temurin'
        java-version: '11'
    - name: Build with Maven
      run: mvn clean install

This pipeline automatically kicks in whenever you push new code or create a pull request. It checks out your code, sets up Java, and runs a Maven build to ensure everything is working correctly.

11. Best Practices and Advanced Use Cases

The following best practices ensure maintainability and scalability.

Best Practices

Avoid Circular Dependencies: Ensure modules don’t have circular references to avoid build issues.
Separate Concerns Clearly: Each module should focus on one responsibility.
Centralized Configurations: Manage configurations centrally for consistent setups.

Advanced Use Cases

Asynchronous Messaging with Kafka: Use Kafka for decoupled communication between services. Modules can publish and subscribe to events asynchronously.
REST Client with Feign: Use Feign to call services within modules. Define a Feign client interface for communication.
Caching for Performance: Use Spring Cache in the service module for optimizing data retrieval.

Conclusion and Key Takeaways

A multi-module Spring Boot project provides modularity, scalability, and ease of maintenance.

In this tutorial, you learned to set up modules, manage inter-module communication, handle errors, add security, and deploy with Docker.

Following best practices and using advanced techniques like messaging and caching will further optimize your multi-module architecture for production use.

How to Effectively Manage Unique Identifiers at Scale: From GUIDs to Snowflake IDs and Other Modern Solutions

Gor Grigoryan — Tue, 20 Aug 2024 18:21:25 +0000

What Are Unique Identifiers? 🪪

Unique identifiers (UIDs) are crucial components in software engineering and data management. They serve as distinct references for entities within a system and ensure that each item – whether a database record, a user, or a file – can be uniquely identified and accessed.

UIDs are critical for maintaining data, enabling efficient search and retrieval, and supporting large-scale operations in distributed systems. As data volumes and system complexities grow, the need for scalable UID solutions becomes increasingly important.

In this article, you'll learn all about the history of unique identifiers, as well as how some modern solutions work.

The concept of unique identifiers has evolved significantly over time, reflecting the growing complexity and scale of human societies and technological systems. To understand why unique identifiers are so important today, let’s look at how we've historically managed identification and how it developed.

In early human societies, individuals were often identified by a single name. This was usually sufficient in small communities where everyone knew each other personally. But as populations grew, it became necessary to distinguish between individuals who shared the same first name. This led to the adoption of surnames.

For example, in Armenia 🇦🇲, surnames are used to identify individuals by their family or ancestry. Take the example of a person named Gor. In a small group of up to 50 people, let's say, identifying Gor by his first name alone is easy.

But as the group grows to a larger community of, say, 500 people, additional identifiers become necessary. Gor will be identified as Gor Grigoryan, indicating that he belongs to the Grigoryan family/ancestry. This surname provides a clearer identification and connects Gor to his family's lineage.

As societies continued to expand and bureaucratic systems became more complex, even surnames proved insufficient for uniquely identifying individuals. This was especially true in larger cities and for the administration of government services. The need for more robust identification methods became apparent.

Government Management of Unique Identifiers

The introduction of passports in the early 20th century marked a significant step in this direction. Passports included unique personal identifiers, such as passport numbers, to distinguish between individuals clearly. These unique IDs ensured that each person could be accurately identified, regardless of name similarities or other ambiguities.

Several countries pioneered the use of unique personal identification numbers to address this need:

Germany 🇩🇪: In the 19th century, Germany implemented a system for tracking individuals for social welfare and military conscription purposes.
Sweden 🇸🇪: Sweden began issuing personal identification numbers (Personnummer) in the 1940s, providing each citizen with a unique identifier for use in various administrative processes.
France 🇫🇷: France introduced the National Identification Number (Numéro de Sécurité Sociale) in the mid-20th century to streamline social security administration and other government services.
United States 🇺🇸: The USA followed with the introduction of Social Security Numbers (SSNs) in 1936 as part of the Social Security Act. This approach to unique identification has since been adopted worldwide, with countries issuing national identification numbers to their citizens.

Information page, Edwin James Tharp’s passport, March 27, 1936, Robert and Eva Tharp Collection.

As illustrated in the example image, the 1936 UK 🇬🇧 passport included detailed personal information such as eye color, hair color, profession, height, and information about the holder’s spouse and children.

A Social Security Number (SSN) in the United States is a nine-digit number formatted as "AAA-GG-SSSS". Each part of the SSN has historically carried specific information:

Area Number (AAA): Originally, the first three digits, known as the area number, represented the geographical region where the SSN was issued. This regional assignment helped to ensure a systematic distribution of numbers across the country.
Group Number (GG): The middle two digits, called the group number, were used to organize the numbers within a given area. The group numbers ranged from 01 to 99 and were issued in a specific order to prevent duplicate numbers within the same area.
Serial Number (SSSS): The last four digits are the serial number, which sequentially identifies each individual within a group. This part of the SSN ensures that even if the area and group numbers are the same, the overall SSN remains unique.

The Social Security Administration (SSA) has implemented several measures to ensure that each SSN is unique for the entire USA population (341.9 million people).

Governments around the world manage unique identifiers primarily for administrative purposes, such as social security, taxation, and national identification. These systems are designed to handle large populations and ensure that every citizen has a unique identifier for official records.

For example, the United States 🇺🇸 Social Security Administration (SSA) manages Social Security Numbers (SSNs) for over 330 million people. Similarly, the Indian 🇮🇳 government has issued Aadhaar numbers, a 12-digit unique identifier, to over 1.3 billion citizens. These identifiers are crucial for accessing government services, benefits, and other official processes.

Aadhaar is the world's largest biometric ID system, and it has been described as "the most sophisticated ID program in the world".

Scalability in Government Systems

While government systems are large, they generally do not face the same scalability challenges as tech companies. Government databases are often centralized, and the rate at which new identifiers are issued is relatively steady and predictable. Also, the frequency of updates and interactions with these identifiers is lower compared to the dynamic environment of tech companies.

Tech companies, especially social media giants, operate on an entirely different scale. These companies manage billions of users and generate vast amounts of data daily. For instance, Meta (formerly Facebook) has over 3 billion monthly active users across its platforms, including Facebook, Instagram, and WhatsApp.

Tech Companies and Their Scale

Let's take a few examples:

Meta (Facebook)

User Base: With over 3 billion monthly active users, Meta needs a robust system to ensure that each user is uniquely identified.
Posts and Interactions: Facebook alone sees approximately 350 million new posts daily. Each of these posts, along with every comment, like, and share, requires a unique identifier to manage interactions efficiently.
Messages: WhatsApp users send around 100 billion messages every day, each needing a unique identifier to ensure messages are correctly routed and stored.
Unique Data Rows: With the combination of user profiles, posts, comments, likes, and messages, Meta likely manages more than 10 trillion unique data rows. (For scale: with a global population of approximately 8 billion people, 10 trillion is about 1,250 times the number of people on Earth.)

X (Twitter)

Twitter, another social media giant, has about 450 million monthly active users. On average, users send around 500 million tweets per day. Each tweet, reply, and retweet needs a unique identifier to maintain the platform's integrity and usability.

Telegram

Telegram is known for its high-traffic and robust messaging platform. With over 700 million monthly active users, Telegram experiences particularly high traffic spikes during events like New Year's Eve, where users send billions of messages within a short timeframe.

On a typical day, Telegram handles over 70 billion messages. Each message, channel post, and group interaction requires a unique identifier to ensure proper delivery and organization.

The scale at which tech companies operate requires sophisticated and highly scalable unique identifier systems. These systems must handle high concurrency, support distributed architectures, and ensure low latency.

The Role of Auto-increment IDs and Their Scalability Issues

Auto-increment IDs are a common method for generating unique identifiers in relational databases. When a new record is added to a table, the database automatically assigns the next available integer value to the ID field. This method is straightforward and ensures that each record within a table has a unique identifier without requiring any manual intervention.

Consider a table for storing user information in a relational database. When the first user is added, they might be assigned an ID of 1. The second user would receive an ID of 2, and so on.

While auto-increment IDs are simple and effective for small-scale applications, they face significant challenges in larger, distributed systems.

Concurrency Issues: In high-traffic applications, multiple transactions might attempt to insert records simultaneously. Ensuring that each transaction receives a unique auto-increment ID can lead to performance bottlenecks and require complex locking mechanisms.
Distributed Systems: In distributed databases, where data is spread across multiple servers, maintaining a global sequence for auto-increment IDs becomes problematic. Each server would need to coordinate with others to avoid generating duplicate IDs, which can significantly impact performance and reliability.
Single Point of Failure: Relying on a central authority to generate auto-increment IDs introduces a single point of failure. If the server responsible for assigning IDs goes down, the entire system might be unable to add new records.
Predictability: Auto-increment IDs are predictable. If someone knows the ID of one record, they can infer the IDs of subsequent records. This predictability can be a security concern in certain applications, such as those involving financial transactions or sensitive user data.

CREATE TABLE Admins (
    Id SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

CREATE TABLE Users (
    Id SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

INSERT INTO Admins (Name)
VALUES ('GorGrigoryan'),
       ('GorGrigoryan2');

SELECT * FROM Admins;


-- +----+---------------+
-- | Id | Name          |
-- +----+---------------+
-- | 1  | GorGrigoryan  |
-- +----+---------------+
-- | 2  | GorGrigoryan2 |
-- +----+---------------+

Sequence Numbers and Their Advantages Over Auto-increment IDs

Sequence numbers are a method of generating unique identifiers by maintaining a counter that is incremented with each new record. Unlike auto-increment IDs, which are typically limited to a single database instance, sequence numbers can be designed to work across distributed systems, addressing some of the scalability and concurrency issues associated with auto-increment IDs.

How sequence numbers work:

Centralized Sequence Generators: A central service or database table generates and manages the sequence numbers. Each request for a new identifier increments the counter and returns the next value.
Distributed Sequence Generators: In a distributed environment, sequence numbers can be generated by dividing the range of possible values among different nodes or using more complex algorithms to ensure uniqueness without central coordination.

Consider a distributed database system with multiple nodes, each responsible for generating unique sequence numbers. The system might allocate ranges of sequence numbers to each node, ensuring that they can generate identifiers independently:

Node 1: Allocated sequence numbers 1,000,000 to 1,999,999
Node 2: Allocated sequence numbers 2,000,000 to 2,999,999
Node 3: Allocated sequence numbers 3,000,000 to 3,999,999

Each node can now generate up to one million unique identifiers without needing to communicate with a central server. This approach improves scalability and performance, particularly in environments with high write loads.
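Here is a minimal sketch of what such a per-node generator could look like in application code. The class name `RangeSequenceGenerator` and its behavior when the range runs out are assumptions made for illustration. A real system would ask a coordinator for a new range instead of failing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-node generator: hands out IDs from a pre-allocated range
// without contacting any other node.
class RangeSequenceGenerator {
    private final long endInclusive;
    private final AtomicLong next;

    RangeSequenceGenerator(long startInclusive, long endInclusive) {
        this.endInclusive = endInclusive;
        this.next = new AtomicLong(startInclusive);
    }

    long nextId() {
        // getAndIncrement is atomic, so concurrent callers never get the same value
        long id = next.getAndIncrement();
        if (id > endInclusive) {
            // A real system would request a fresh range from a coordinator here
            throw new IllegalStateException("Range exhausted");
        }
        return id;
    }

    public static void main(String[] args) {
        RangeSequenceGenerator node1 = new RangeSequenceGenerator(1_000_000L, 1_999_999L);
        RangeSequenceGenerator node2 = new RangeSequenceGenerator(2_000_000L, 2_999_999L);
        System.out.println(node1.nextId()); // 1000000
        System.out.println(node1.nextId()); // 1000001
        System.out.println(node2.nextId()); // 2000000
    }
}
```

The trade-off is that IDs are no longer globally ordered by creation time: node 2 can hand out 2,000,000 before node 1 has reached 1,000,002.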

CREATE SEQUENCE UserIdentifier
INCREMENT 1
START 1;

CREATE TABLE Admins (
    Id INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

CREATE TABLE Users (
    Id INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);


INSERT INTO Admins (Id, Name)
VALUES(nextval('UserIdentifier'), 'GorGrigoryan'),
(nextval('UserIdentifier'), 'GorGrigoryan2');

INSERT INTO Users (Id, Name)
VALUES(nextval('UserIdentifier'), 'UserGorGrigoryan'),
(nextval('UserIdentifier'), 'UserGorGrigoryan2');


SELECT * FROM Admins;

-- +----+---------------+
-- | Id | Name          |
-- +----+---------------+
-- | 1  | GorGrigoryan  |
-- +----+---------------+
-- | 2  | GorGrigoryan2 |
-- +----+---------------+

SELECT * FROM Users;

-- +----+-------------------+
-- | Id | Name              |
-- +----+-------------------+
-- | 3  | UserGorGrigoryan  |
-- +----+-------------------+
-- | 4  | UserGorGrigoryan2 |
-- +----+-------------------+

Another advantage of using sequence numbers is that you can obtain the ID of the entity before it is inserted into the database.

In the case of auto-increment IDs, this assignment is typically handled by the database upon insertion, which can limit flexibility. With sequence numbers, you can fetch the next value and assign the ID on the application side before inserting, which some ORMs make easy (for example, EF Core in C#).

Check out sequence numbers on the SQL server here.

UUIDs: Overview and Usage

GUIDs (Globally Unique Identifiers), also known as UUIDs (Universally Unique Identifiers), are 128-bit identifiers designed to be globally unique. A typical UUID is displayed in a 32-character hexadecimal string, divided into five groups separated by hyphens. For example: 126e3456-e89b-12d3-a456-426614174000.
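As a quick illustration of that layout, the sketch below parses the example value with Java's built-in `java.util.UUID` class and inspects its textual form. The helper names are made up for this example.

```java
import java.util.Arrays;
import java.util.UUID;

class UuidFormatSketch {
    // Number of hexadecimal digits once the four hyphens are removed
    static int hexDigitCount(UUID id) {
        return id.toString().replace("-", "").length();
    }

    // Lengths of the five hyphen-separated groups
    static int[] groupLengths(UUID id) {
        return Arrays.stream(id.toString().split("-")).mapToInt(String::length).toArray();
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("126e3456-e89b-12d3-a456-426614174000");
        System.out.println(hexDigitCount(id));                 // 32
        System.out.println(Arrays.toString(groupLengths(id))); // [8, 4, 4, 4, 12]
        // java.util.UUID stores the 128 bits as two 64-bit longs
        System.out.println(Long.toHexString(id.getMostSignificantBits()));
        System.out.println(Long.toHexString(id.getLeastSignificantBits()));
    }
}
```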

What's so great about UUIDs?

One of the standout features of GUIDs is their huge capacity for uniqueness. With a 128-bit structure, the total number of possible GUIDs is 2¹²⁸, or 340,282,366,920,938,463,463,374,607,431,768,211,456. To put that into perspective, let's compare it with something tangible.

Did you know that scientists have attempted to calculate the number of grains of sand on Earth? Science writer David Blatner, in his book Spectrums, mentions that a group of researchers at the University of Hawaii tried to estimate this number. They determined that Earth has roughly (and we are speaking very roughly) 7.5 x 10¹⁸ grains of sand, or seven quintillion, five hundred quadrillion grains. For more, consider reading the article titled: "Which Is Greater, The Number Of Sand Grains On Earth Or Stars In The Sky?"

Now, to compare those numbers:

| GUIDs available | 340,282,366,920,938,463,463,374,607,431,768,211,456
| Sand grains     | 7,500,000,000,000,000,000

If you decided to create an application to track every grain of sand on Earth and assign each a unique identifier, you could easily do that using GUIDs. The fun part is that you could actually repeat this process roughly 45,370,982,256,125,128,461 times over without running out of unique GUIDs! 🤯
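If you'd like to check this arithmetic yourself, here is a short sketch using `BigInteger`. It takes the 7.5 x 10¹⁸ sand-grain estimate quoted above as its input (which is itself a very rough figure) and divides the 2¹²⁸ possible 128-bit values by it. The class and method names are made up for this example.

```java
import java.math.BigInteger;

class UuidCapacitySketch {
    // All possible 128-bit values: 2^128
    static final BigInteger TOTAL_UUIDS = BigInteger.ONE.shiftLeft(128);
    // Rough estimate quoted in the text: 7.5 x 10^18 grains of sand
    static final BigInteger SAND_GRAINS = new BigInteger("7500000000000000000");

    // How many times every grain could receive a fresh identifier
    static BigInteger timesEveryGrainCouldBeLabelled() {
        return TOTAL_UUIDS.divide(SAND_GRAINS);
    }

    public static void main(String[] args) {
        System.out.println(TOTAL_UUIDS);                      // 340282366920938463463374607431768211456
        System.out.println(timesEveryGrainCouldBeLabelled()); // 45370982256125128461
    }
}
```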

UUID Version 1

UUID Version 1 generates unique identifiers based on the current timestamp, clock sequence, and node identifier (typically the MAC address of the machine generating the UUID).

According to RFC 4122, the timestamp is the number of 100-nanosecond intervals since October 15, 1582, at midnight UTC. Most computers do not have a clock that ticks fast enough to measure time at this resolution. Instead, a random number is often used to fill in timestamp digits beyond the computer's measurement accuracy.

When multiple version-1 UUIDs are generated in a single API call, the random portion may be incremented rather than regenerated for each UUID. This ensures uniqueness and is faster to generate.

UUID v1 also embeds the MAC address of the generating machine. Because MAC addresses are meant to be globally unique, including one makes it very unlikely that two different computers will ever generate the same UUID. Note, however, that this also means version-1 UUIDs can be traced back to the computer that generated them.

This ensures that the UUID is unique across both time and space. It is suitable when the generation time and machine uniqueness are important. It is often used in systems where the timestamp of creation is relevant or needed.
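To see these fields in practice, here is a small sketch that reads the embedded timestamp and node out of a version-1 UUID using `java.util.UUID`. RFC 4122 counts the timestamp in 100-nanosecond intervals since October 15, 1582, so the code subtracts the offset to the Unix epoch before converting. The sample UUID is just an illustrative value, and the class name is made up.

```java
import java.time.Instant;
import java.util.UUID;

class UuidV1Sketch {
    // 100-nanosecond intervals between 1582-10-15 and 1970-01-01 (both UTC)
    private static final long GREGORIAN_TO_UNIX_OFFSET = 0x01B21DD213814000L;

    static Instant creationTime(UUID uuid) {
        if (uuid.version() != 1) {
            throw new IllegalArgumentException("Not a version 1 UUID");
        }
        // Still in 100 ns units, now relative to the Unix epoch
        long ticks = uuid.timestamp() - GREGORIAN_TO_UNIX_OFFSET;
        return Instant.ofEpochSecond(ticks / 10_000_000L, (ticks % 10_000_000L) * 100L);
    }

    public static void main(String[] args) {
        // Sample value chosen for illustration
        UUID uuid = UUID.fromString("c232ab00-9414-11ec-b3c8-9f6bdeced846");
        System.out.println(uuid.version());                // 1
        System.out.println(creationTime(uuid));            // when the UUID was generated
        System.out.println(Long.toHexString(uuid.node())); // 48-bit node field
    }
}
```

This is also why version-1 UUIDs leak information: anyone holding the identifier can recover roughly when, and on which node, it was created.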

(Image from here)

UUID Version 4

UUID Version 4 generates identifiers using random or pseudo-random numbers. This method ensures a high probability of uniqueness due to the vast number of possible GUIDs. This is the most common UUID version.

There are 2 main variants of UUID:

Variant 1: Minecraft UUID, also called Timestamp-first UUIDs
Variant 2: "GUID"

(Image from here)

A version 4 GUID is almost entirely random (122 of its 128 bits are random, and the remaining 6 encode the version and variant), making it simple to generate and ensuring that each identifier is unique with a very high probability. The identifiers are written as 32 hexadecimal characters using digits (0-9) and letters (A-F). The characters are grouped in a specific format: 8-4-4-4-12, separated by hyphens, like this: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
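In Java, for example, generating a random (version 4) identifier takes a single call to the standard library. The wrapper class and method name below are made up for this sketch. `UUID.randomUUID()` itself is part of the JDK.

```java
import java.util.UUID;

class UuidV4Sketch {
    static UUID newId() {
        // Uses a cryptographically strong pseudo-random number generator
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        System.out.println(id);           // different on every run
        System.out.println(id.version()); // 4: randomly generated
        System.out.println(id.variant()); // 2: the RFC 4122 layout
    }
}
```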

The great thing about GUIDs is that you don’t need a central system to create them. Anyone can generate a GUID using an algorithm, and it will still be unique across different systems and applications. They are designed to be used nearly everywhere a unique identifier is needed. Here are some usage examples:

Windows: Uses GUIDs to generate unique product keys
Microsoft SQL Server: Uses GUIDs as primary keys to ensure global uniqueness across distributed databases
AWS: Uses GUIDs for uniquely identifying resources in their cloud infrastructure, such as EC2 instances and S3 objects
eBay: Uses GUIDs to identify listings, transactions, and users

UUID Version 5

UUID Version 5 generates unique identifiers based on a namespace identifier and a name. The namespace and name are combined and hashed using SHA-1 to produce the UUID. This ensures that the same namespace and name combination will always produce the same UUID. In UUID v5, the namespace must itself be a UUID, and the name can be anything.
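The hashing step described above can be sketched in a few lines of Java. The JDK's `UUID.nameUUIDFromBytes` produces MD5-based version 3 UUIDs, so this sketch builds version 5 by hand: it hashes the 16 namespace bytes followed by the name with SHA-1, keeps the first 128 bits, and then sets the version and variant bits. The class and method names are assumptions for illustration. The DNS namespace constant comes from RFC 4122.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class UuidV5Sketch {
    // Predefined DNS namespace from RFC 4122
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    static UUID nameBased(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        // Hash the 16 namespace bytes followed by the UTF-8 bytes of the name
        sha1.update(ByteBuffer.allocate(16)
                .putLong(namespace.getMostSignificantBits())
                .putLong(namespace.getLeastSignificantBits())
                .array());
        byte[] hash = sha1.digest(name.getBytes(StandardCharsets.UTF_8));

        // Keep the first 128 bits of the 160-bit hash
        ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
        long msb = first16.getLong();
        long lsb = first16.getLong();

        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000005000L; // set version to 5
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // set RFC 4122 variant
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID first = nameBased(NAMESPACE_DNS, "python.org");
        UUID second = nameBased(NAMESPACE_DNS, "python.org");
        System.out.println(first);
        System.out.println(first.equals(second)); // true: same input, same UUID
    }
}
```

Because the result depends only on the namespace and the name, any service that knows both can compute the same identifier on its own.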

UUID V5 is useful for generating consistent unique identifiers for the same input data across different systems and contexts. Let's say we want to generate a user id based on their username. Here’s how you can achieve this in C#:

Here, UUID Version 5 solves several important problems, particularly when you need a consistent and unique identifier derived from a given input.

For instance, consider a scenario where you need a user ID to make an API call (or anything else), but in your code, you only have the username accessible. How would the problem be solved if we were using UUID Version 4 (GUIDs)? Most likely, it would work something like this:

/* When using GUID (UUID v4) */

var userName = "bob"; // Let's assume we only have the username
// API call or DB call to get the user id using name
var userId = await userService.GetUserIdAsync(userName);

await userService.ChangeUserNameAsync(userId, "bob-2");

By using UUID Version 5 with a shared namespace across all your projects, you can easily generate the user ID from the username without making any additional API calls. So the same code would look like this:

/* When using UUID v5 */

// From some shared code
var userNamespace = SharedConstants.UserNamespace;

var userName = "bob"; // Let's assume we only have the username

// Generate the user id in place, without an additional call
var userId = Uuid.NewNameBased(userNamespace, userName);

await userService.ChangeUserNameAsync(userId, "bob-2");

This approach eliminates the need for redundant API calls. In a distributed system, making an API call to fetch a user ID every time you need it can be inefficient and slow. With UUID Version 5, you can locally generate the user ID from the username (or any other input), reducing the need for network requests and significantly improving the efficiency of your application.

So what problem have we solved with UUID v5? Suppose you need a user ID to make an API call, but your code only has the username. If the namespace is shared across all your projects, you can derive the user ID from the username without making any API call, because UUID v5 always produces the same UUID for the same namespace and name.

Also, UUID Version 5 ensures uniqueness and consistency across different systems. When integrating multiple systems or microservices, it can be challenging to keep user IDs consistent across various services. By using the same namespace and the same input (such as a username), UUID Version 5 guarantees that the generated IDs are unique and consistent across all systems, facilitating smoother integration and data consistency.

UUID Version 7

UUID Version 7 is a newer version, standardized in RFC 9562 (2024), that aims to combine the strengths of both timestamp-based and random-based UUIDs.

Problems with UUID v4 (GUID)

UUID Version 4 generates non-time-ordered values, meaning the identifiers created are not sequential. Since these values are randomly generated, they won't be clustered together in a database index. Instead, inserts will occur at random locations, which can negatively impact the performance of common index data structures, such as B-trees and their variants.

In a scenario where your product requires frequent access to recent data, non-sequential identifiers create a significant challenge.

With UUID Version 4, the most recent data will be inserted randomly throughout the index, lacking clustering. As a result, retrieving the most recent data from a large dataset requires traversing numerous database index pages.

In contrast, using sequential identifiers ensures that the latest data is logically arranged at the right-most part of the index, making it much more cache-friendly. This organization allows for faster and more efficient retrieval of recent data, as it minimizes the number of index pages that need to be accessed. UUID v4 cannot offer this.

The solution with UUID v7

UUID v7 is designed to provide unique and sortable identifiers that are both easy to generate and useful for distributed systems. It uses a combination of timestamps and random data to ensure both uniqueness and temporal order.

The first part of the UUID is a timestamp that provides a chronological component, ensuring that UUIDs generated close together in time are also close together in value. The remaining part is filled with random data, ensuring the uniqueness of each identifier.
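
The layout described above can be sketched by hand in Python. This is an illustrative sketch, not a production generator: it assumes the RFC 9562 field layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), and the function name `uuid7_sketch` is made up for this example.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version-7-style UUID: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


if __name__ == "__main__":
    print(uuid7_sketch())
```

Since the timestamp occupies the most significant bits, an ID generated in a later millisecond always compares greater than one generated earlier, which is what keeps index inserts clustered at the right-most edge.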

Buildkite post about migrating to UUID v7

UUID Versions 2, 3, and 6

You may have noticed that our discussion focuses on UUID Versions 1, 4, 5, and 7, and skips over Versions 2, 3, and 6. Here's why:

UUID Version 2: This version is rarely used in modern applications. It’s similar to Version 1 but includes additional fields for things like domain information (such as POSIX UID or GID). It was mainly used in legacy systems and is now considered largely obsolete.
UUID Version 3: This version is based on a name and a namespace, similar to Version 5. The main difference is that Version 3 uses the MD5 hashing algorithm, which is more vulnerable to collisions than the SHA-1 algorithm used in Version 5. Version 5 is generally preferred for new applications.
UUID Version 6: Version 6 was standardized alongside Version 7 in RFC 9562. It reorders the fields of Version 1 so that the timestamp sorts correctly, and it is intended mainly for systems that already use Version 1. For new systems we focus on Version 7, which offers similar time ordering and has more momentum.

Snowflake ID

Snowflake ID is a unique identifier generation system developed by Twitter to address the challenges of generating unique, sequential, and distributed identifiers in a highly scalable and efficient manner.

Unlike GUIDs, which are often non-sequential and can cause performance issues in database indexing, Snowflake IDs are designed to be both time-ordered and globally unique, making them ideal for distributed systems and databases where sequential order is important.

A Snowflake ID is a 64-bit integer composed of several distinct parts (the top bit is left unused so the value stays positive when stored as a signed integer):

Timestamp (41 bits): The largest portion of the Snowflake ID is the timestamp, which records the number of milliseconds since a custom epoch (often set to the date when the system was first deployed). This ensures that IDs are time-ordered and can be easily sorted based on their creation time.
Datacenter ID (5 bits): This part of the ID identifies the datacenter where the ID was generated, allowing the system to generate unique IDs across multiple data centers without conflicts.
Machine ID (5 bits): Similar to the datacenter ID, the machine ID identifies the specific server or machine within the datacenter that generated the ID. This ensures that even within the same data center, IDs remain unique.
Sequence Number (12 bits): The sequence number is used to differentiate between multiple IDs generated within the same millisecond by the same machine. With 12 bits, up to 4,096 unique IDs can be generated per machine per millisecond.
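
The four fields above can be packed into one integer with bit shifts. The following is a simplified sketch rather than Twitter's implementation: the class name, the injectable `clock` parameter, and the error handling are assumptions for illustration, and the default epoch is the commonly cited Twitter epoch in milliseconds.

```python
import threading
import time

# Commonly cited Twitter epoch (milliseconds); swap in your own custom epoch.
EPOCH_MS = 1288834974657


class SnowflakeGenerator:
    """Sketch of a 41/5/5/12-bit Snowflake-style ID generator."""

    def __init__(self, datacenter_id, machine_id, epoch_ms=EPOCH_MS, clock=None):
        if not 0 <= datacenter_id < 32 or not 0 <= machine_id < 32:
            raise ValueError("datacenter_id and machine_id must fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence number.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4,096 IDs used up; wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.epoch_ms) << 22)   # 41-bit timestamp
                | (self.datacenter_id << 17)    # 5-bit datacenter ID
                | (self.machine_id << 12)       # 5-bit machine ID
                | self._sequence                # 12-bit sequence
            )


if __name__ == "__main__":
    generator = SnowflakeGenerator(datacenter_id=1, machine_id=2)
    print(generator.next_id(), generator.next_id())
```

No coordination between machines is needed as long as each one is given a distinct datacenter and machine ID pair.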

The format was created by Twitter (now X) and is used for the IDs of tweets. It is popularly believed that every snowflake has a unique structure, so they took the name "snowflake ID". The format has been adopted by other companies, including Discord and Instagram. The Mastodon social network uses a modified version.

The format was first announced by X/Twitter in June 2010. Due to implementation challenges, they waited until later in the year to roll out the update.

X uses snowflake IDs for posts, direct messages, users, lists, and all other objects available over the API.
Discord also uses snowflakes, with their epoch set to the first second of the year 2015.
Instagram uses a modified version of the format, with 41 bits for a timestamp, 13 bits for a shard ID, and 10 bits for a sequence number.
Mastodon's modified format has 48 bits for a millisecond-level timestamp, as it uses the UNIX epoch. The remaining 16 bits are for sequence data.

"The Problem" stated by Twitter:

We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).

Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL. We needed something that could generate tens of thousands of ids per second in a highly available manner.

This naturally led us to choose an uncoordinated approach. These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.

Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.

Check out here for more information

Finding Tweet Timestamps

It is often said that a tweet can never be truly deleted once it is out there. Twitter's use of Snowflake IDs adds an interesting twist to this narrative. Snowflake IDs are designed to be unique and time-ordered, which makes them not just identifiers but also a trail that can be tracked.

On May 11, 2019, Derek Willis from Politwoops uncovered a list of deleted tweet IDs. By using the Snowflake structure, he was able to extract the timestamps from these IDs, and discovered the 107 missing tweets. This finding inspired the creation of TweetedAt, a tool designed to accurately retrieve timestamps from Snowflake IDs and estimate the timing of tweets generated before Snowflake was in use.
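
Recovering a timestamp from a Snowflake ID only requires undoing the bit layout: drop the low 22 bits and add the epoch back. The sketch below assumes the standard 41/5/5/12 layout and the commonly cited Twitter epoch; the function name is made up for illustration, and it does not cover IDs created before Snowflake was in use.

```python
from datetime import datetime, timedelta, timezone

# Commonly cited Twitter epoch in milliseconds (an assumption of this sketch).
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(snowflake_id: int, epoch_ms: int = TWITTER_EPOCH_MS) -> datetime:
    """Return the UTC creation time encoded in a Snowflake ID."""
    # The low 22 bits hold datacenter, machine, and sequence; discard them.
    unix_ms = (snowflake_id >> 22) + epoch_ms
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


if __name__ == "__main__":
    # An ID whose timestamp field is 1000 was created one second after the epoch.
    print(snowflake_to_datetime(1000 << 22))
```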

Check out here.

Wrapping Up

Unique identifiers play a critical role in software engineering, ensuring data integrity and enabling efficient data management across distributed systems.

From traditional GUIDs to modern solutions like Snowflake IDs, each identifier system offers distinct advantages tailored to specific use cases.

As technology evolves, understanding these systems and their implementations becomes increasingly important for scaling applications effectively. By exploring the various versions and alternatives, we can make informed decisions that best suit our needs in managing data at scale.

Cover image: A 2017 post celebrating Facebook reaching 2 billion users.

Best Practices for Scaling Your Node.js REST APIs

freeCodeCamp — Thu, 15 Sep 2022 16:59:00 +0000

By Rishabh Rawat

There is more to scalability than using cluster mode. In this tutorial, we'll explore 10 ways you can make your Node.js API ready to scale.

When working on a project, we often get a few real nuggets here and there on how to do something in a better way. We get to learn retrospectively, and then we're fully prepared to apply it next time around.

But how often does that actually work out? I don't even remember what I did yesterday sometimes. So I wrote this article.

This is my attempt to document some of the best Node.js scalability practices that are not talked about as often.

You can adopt these practices at any stage in your Node.js project. It doesn't have to be a last-minute patch.

With that said, here's what we will cover in this article:

🚦Use throttling
🐢 Optimize your database queries
䷪ Fail fast with circuit breaker
🔍 Log your checkpoints
🌠 Use Kafka over HTTP requests
🪝 Look out for memory leaks
🐇 Use caching
🎏 Use connection pooling
🕋 Seamless scale-ups
💎 OpenAPI compliant documentation

Use Throttling

Throttling allows you to limit access to your services to prevent them from being overwhelmed by too many requests. It has some clear benefits – you can safeguard your application whether it's a large burst of users or a denial-of-service attack.

The common place to implement a throttling mechanism is where the rate of input and output don't match. Particularly, when there is more inbound traffic than what a service can (or wants to) handle.

Let’s understand with a visualization.

Your application is throttling requests from News Feed Service

There's throttling at the first junction point between your application and the News Feed Service:

News Feed Service (NFS) subscribes to your application for sending notifications.
It sends 1000 requests to your application every second.
Your application only handles 500 requests/sec based on the billing plan NFS subscribed to.
Notifications are sent for the first 500 requests.

Now it is very important to note that all the requests by NFS that exceed the quota of 500 requests/sec should fail and have to be retried by the NFS.

Why reject the extra requests when you can queue them? There are a couple of reasons:

Accepting all the requests will cause your application to start accumulating them. It will become a single point of failure (by RAM/disk exhaustion) for all the clients subscribed to your application, including NFS.
You should not accept requests that are greater than the scope of the subscription plan of your clients (in this case, NFS).
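
To make the quota concrete, here is a minimal fixed-window limiter sketched in Python. It is not how express-rate-limit or any WAF is implemented; the class name and the injectable `clock` are assumptions for illustration. Requests beyond the limit are rejected immediately rather than queued, matching the reasoning above.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per window; reject the rest."""

    def __init__(self, limit, window_seconds=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self):
        now = self._clock()
        if now - self._window_start >= self.window_seconds:
            # A new window has started: reset the counter.
            self._window_start = now
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False  # Over quota: the caller should respond with an error.


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=500)
    accepted = sum(limiter.allow() for _ in range(1000))
    print(f"accepted {accepted} of 1000 requests in this burst")
```

A rejected request costs almost nothing to handle, which is exactly why it protects the application better than an ever-growing queue.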

For application level rate limiting, you can use express-rate-limit middleware for your Express.js API. For network level throttling, you can find solutions like WAF.

If you are using a pub-sub mechanism, you can throttle your consumers or subscribers as well. For instance, you can choose to consume only limited bytes of data when consuming a Kafka topic by setting the maxBytes option.

Optimize Your Database Queries

There will be times when querying the database is the only choice. You might not have cached the data, or the cached copy could be stale.

When that happens, make sure your database is prepared for it. Having enough RAM and disk IOPS is a good first step.

Secondly, optimize your queries as much as possible. For starters, here are a couple of things that will set you on the right path:

Try to use indexed fields when querying. Don't over-index your tables in hopes of the best performance. Indexes have their cost.
For deletes, stick to soft deletes. If permanent deletion is necessary, delay it. (interesting story)
When reading data, only fetch the required fields using projection. If possible, strip away the unnecessary metadata and methods (for example, Mongoose has lean).
Try to decouple database performance from the user experience. If CRUD on the database can happen in the background (that is, non-blocking), do it. Don't leave the user waiting.
Directly update the desired fields using update queries. Don't fetch the document, update the field, and save the whole document back to the database. It has network and database overhead.
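
A few of these tips can be shown with Python's built-in `sqlite3` module as a stand-in for whatever database you use. The table, column names, and `demo` function are made up for this sketch; the point is the shape of the queries, not the specific engine.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT, "
        "login_count INTEGER DEFAULT 0)"
    )
    # Index only the field you actually filter on.
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    conn.execute(
        "INSERT INTO users (email, name, bio) VALUES (?, ?, ?)",
        ("bob@example.com", "Bob", "a very long biography..."),
    )

    # Projection: fetch just the column you need instead of SELECT *.
    row = conn.execute(
        "SELECT name FROM users WHERE email = ?", ("bob@example.com",)
    ).fetchone()

    # Direct update: change one field in place, no fetch-modify-save round trip.
    conn.execute(
        "UPDATE users SET login_count = login_count + 1 WHERE email = ?",
        ("bob@example.com",),
    )
    count = conn.execute(
        "SELECT login_count FROM users WHERE email = ?", ("bob@example.com",)
    ).fetchone()[0]

    conn.close()
    return row, count


if __name__ == "__main__":
    print(demo())
```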

Fail Fast with a Circuit Breaker

Imagine you get burst traffic on your Node.js application, and one of the external services required to fulfill the requests is down. Would you want to keep hitting the dead end for every request thereafter? Definitely not. We don't want to waste time and resources on requests that are destined to fail.

This is the whole idea of a circuit breaker. Fail early. Fail fast.

For example, if 50 out of 100 requests fail, the circuit breaker doesn't allow any more requests to that external service for the next X seconds. It prevents firing requests that are bound to fail.

Once the circuit resets, it allows requests to go through. If they fail again, the circuit breaks and the cycle repeats.
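
The open, wait, and retry cycle can be sketched in a few lines of Python. This is a simplified illustration, not Opossum's implementation: it trips on a count of consecutive failures rather than a failure percentage, and the class names and `clock` parameter are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Still open: fail fast without touching the external service.
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: allow one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Trip (or re-trip) the circuit and restart the timer.
                self._opened_at = self._clock()
            raise
        # Success: close the circuit and clear the failure count.
        self._failures = 0
        self._opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)
    print(breaker.call(lambda: "service responded"))
```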

Node.js Opossum circuit breaker states

To learn more about how to add a circuit breaker to your Node.js application, check out Opossum. You can read more on circuit breakers here.

Log Your Checkpoints

A good logging setup allows you to spot errors quickly. You can create visualizations to understand your app's behavior, set up alerts, and debug efficiently.

You can check out the ELK stack for setting up a good logging and alerting pipeline.

While logging is an essential tool, it is very easy to overdo it. If you start logging everything, you can end up exhausting your disk IOPS causing your application to suffer.

A good rule of thumb is to log only checkpoints.

Checkpoints can be:

Requests, as they enter the main control flow in your application and after they are validated and sanitized.
Request and response when interacting with an external service/SDK/API.
The final response to that request.
Helpful error messages for your catch handlers (with sane defaults for error messages).

PS: If a request goes through multiple services during the lifecycle, you can pass along a unique ID in the logs to capture a particular request across all the services.
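
Here is a small sketch of checkpoint logging with a shared request ID, using Python's standard `logging` module. The logger name, the log format, and the `handle_request` function are assumptions for illustration; in a real service the ID would usually arrive in a request header and be forwarded to downstream calls.

```python
import logging
import sys
import uuid


def build_logger(stream=sys.stdout):
    """Create a logger whose every line carries a request_id field."""
    logger = logging.getLogger("checkpoint-demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger, payload, request_id=None):
    # Reuse the caller's ID if there is one, otherwise mint a new one.
    request_id = request_id or str(uuid.uuid4())
    log = logging.LoggerAdapter(logger, {"request_id": request_id})

    log.info("request validated")           # checkpoint: entry
    log.info("calling external service")    # checkpoint: outbound call
    log.info("response sent")               # checkpoint: final response
    return request_id


if __name__ == "__main__":
    handle_request(build_logger(), {"item": 42})
```

Searching the logs of every service for one `request_id` value then reconstructs the full path of a single request.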

Use Kafka Over HTTP Requests

While HTTP has its use-cases, it is easy to overdo it. Avoid using HTTP requests where it is not necessary.

Let's understand this with the help of an example.

Overview of Kafka pub-sub using topics

Let's say you are building a product like Amazon and there are two services:

Vendor service
Inventory service

Whenever you receive new stock from the vendor service, you push the stock details to a Kafka topic. The inventory service listens to that topic and updates the database acknowledging the fresh restock.

Note that you push the new stock data into the pipeline and move on. It is consumed by the inventory service at its own pace. Kafka allows you to decouple services.

Now, what happens if your inventory service goes down? With HTTP requests, recovering the missed updates is not straightforward. With Kafka, you can replay the intended messages (for example using kcat), because messages are not lost after they have been consumed.

When an item comes back in stock, you might want to send out notifications to the users who wishlisted it. To do that, your notification service can listen to the same topic as the inventory service. This way, a single message bus is consumed at various places without HTTP overhead.
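
The idea behind this can be shown with a tiny in-memory stand-in. This is not Kafka and does not use KafkaJS; the `Topic` class and its methods are invented for this sketch. It only illustrates the two properties discussed above: several consumer groups read the same messages independently, and a consumer can rewind its offset to replay them.

```python
class Topic:
    """In-memory append-only log with one read offset per consumer group."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, message):
        self._log.append(message)
        return len(self._log) - 1  # offset of the new message

    def poll(self, group, max_messages=10):
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        # Reading advances only this group's offset; the log is untouched.
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or fast-forward) a group, e.g. to replay after an outage."""
        self._offsets[group] = offset


if __name__ == "__main__":
    restock = Topic()
    restock.publish({"sku": "A1", "qty": 5})
    print(restock.poll("inventory"))       # inventory service updates stock
    print(restock.poll("notifications"))   # notification service sees it too
    restock.seek("inventory", 0)
    print(restock.poll("inventory"))       # replayed after a crash
```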

The Getting Started page of KafkaJS shares the exact snippet to get you started with a basic setup in your Node.js application. I’d highly recommend checking it out, as there's a lot to explore.

Look Out for Memory Leaks

If you don't write memory-safe code and don't profile your application often, you may end up with a crashed server.

You do not want your profiling results to look like this:

setTimeout retaining 98% memory after execution is over

For starters, I would recommend the following:

Run your Node.js API with the --inspect flag.
Open chrome://inspect/#devices in your Chrome browser.
Click inspect > Memory tab > Allocation instrumentation on timeline.
Perform some operations on your app. You can use apache bench on macOS to fire off multiple requests. Run curl cheat.sh/ab in your terminal to learn how to use it.
Stop the recording and analyze the memory retainers.

If you find any large blocks of retained memory, try to minimize it. There are a lot of resources on this topic. Start by googling "how to prevent memory leaks in Node.js".

Profiling your Node.js application and looking for memory utilization patterns should be regular practice. Let's make "Profiling Driven Refactor" (PDR) a thing?

Use Caching to Prevent Excessive Database Lookup

The goal is to not hit the database for every request your application gets. Storing the results in cache decreases the load on your database and boosts performance.

There are two strategies when working with caching.

Write through caching makes sure the data is inserted into the database and the cache when a write operation happens. This keeps the cache relevant and leads to better performance. Downsides? Expensive cache as you store infrequently used data to the cache as well.

Whereas in Lazy loading, the data is only written to the cache when it is first read. The first request serves the data from the database but the consequent requests use the cache. It has a smaller cost but increased response time for the first request.
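
Both strategies fit in a short Python sketch. Plain dictionaries stand in for the database and the cache here, and the class and method names are assumptions for illustration.

```python
class CachedStore:
    """Dict-backed stand-ins for a database and a cache."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.db_reads = 0

    def write_through(self, key, value):
        # Write-through: every write goes to the database and the cache.
        self.database[key] = value
        self.cache[key] = value

    def read_lazy(self, key):
        # Lazy loading: fill the cache only when a read misses.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database[key]
        self.cache[key] = value
        return value


if __name__ == "__main__":
    store = CachedStore({"user:1": "Bob"})
    store.read_lazy("user:1")   # miss: goes to the database
    store.read_lazy("user:1")   # hit: served from the cache
    print("database reads:", store.db_reads)
```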

To decide the TTL (or Time To Live) for the cached data, ask yourself:

How often does the underlying data change?
What is the risk of returning outdated data to the end user?

If returning slightly outdated data is acceptable, a longer TTL will give you better performance.

Importantly, add a slight delta to your TTLs. If your application receives a large burst of traffic and all of your cached data expires at once, it can lead to unbearable load on the database, affecting user experience.

final TTL = estimated value of TTL + small random delta
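
In code, the formula is a one-liner. The function name and the example values (a 300-second base with up to 30 seconds of jitter) are assumptions chosen for illustration.

```python
import random


def ttl_with_jitter(base_ttl_seconds, max_jitter_seconds, rng=random):
    """Return the base TTL plus a small random delta."""
    return base_ttl_seconds + rng.uniform(0, max_jitter_seconds)


if __name__ == "__main__":
    # Keys cached at the same moment now expire at slightly different times.
    print([round(ttl_with_jitter(300, 30), 1) for _ in range(5)])
```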

There are a number of policies to perform cache eviction. But leaving it on default settings is a valid and accepted approach.

Use Connection Pooling

Opening a standalone connection to the database is costly. It involves TCP handshake, SSL, authentication and authorization checks, and so on.

Instead, you can leverage connection pooling.

Database connection pool

A connection pool holds multiple connections at any given time. Whenever you need it, the pool manager assigns any available/idle connection. You get to skip the cold start phase of a brand new connection.

Why not max out the number of connections in the pool, then? Because it highly depends on your hardware resources. If you ignore it, performance can take a massive toll.

The more the connections, the less RAM each connection has, and the slower the queries that leverage RAM (for example sort). The same principle applies to your disk and CPU. With every new connection, you are spreading your resources thin across the connections.
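
A fixed-size pool can be sketched with a queue of pre-opened connections. This is an illustration only, not how node-postgres or the MongoDB driver implement pooling; the `ConnectionPool` class is made up, and `sqlite3` is used purely as a convenient stand-in for a real database connection.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hold `size` open connections and lend them out one at a time."""

    def __init__(self, size, factory):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection cost once, up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is idle; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


if __name__ == "__main__":
    pool = ConnectionPool(size=2, factory=lambda: sqlite3.connect(":memory:"))
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
```

The `size` argument is the knob discussed above: it caps how many connections compete for the database's RAM, disk, and CPU.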

You can tweak the number of connections till it matches your needs. For starters, you can get an estimate on the size you need from here.

Read about the MongoDB connection pool here. For PostgreSQL, you can use the node-postgres package. It has built-in support for connection pooling.

Seamless Scale-ups

When your application's user base is starting to grow and you have already hit the ceiling on vertical scaling, what do you do? You scale horizontally.

Vertical scaling means increasing the resources of a node (CPU, memory, etc.) whereas horizontal scaling involves adding more nodes to balance out the load on each node.

If you’re using AWS, you can leverage Auto Scaling groups (ASG), which horizontally scale the number of servers based on a predefined rule (for example when CPU utilization is more than 50%).

You can even pre-schedule the scale up and scale down using scheduled actions in case of predictable traffic patterns (for example during the World Cup finals for a streaming service).

Once you have your ASG in place, adding a load balancer in front will make sure the traffic is routed to all the instances based on a chosen strategy (like round robin, for example).

Load balancing multiple targets based on predefined rules

PS: It is always a good idea to estimate the requests your single server can handle (CPU, memory, disk, and so on) and allocate at least 30% more.

OpenAPI Compliant Documentation

It might not directly affect your ability to scale a Node.js application, but I had to include this in the list. If you've ever done an API integration, you know it.

It is crucial to know everything about the API before you take a single step forward. It makes it easy to integrate, iterate, and reason about the design. Not to mention the gains in the speed of development.

Make sure to create OpenAPI Specification (OAS) for your Node.js API.

It allows you to create API documentation in an industry-standard manner. It acts as a single source of truth. When defined properly, it makes interacting with the API much more productive.

I have created and published a sample API documentation here. You can even inspect any API using the swagger inspector.

You can find all of your API documentation and create new documents from the Swagger Hub dashboard.

Now you go, captain!

We have looked at ten lesser-known best practices to prepare Node.js for scale and how you can take your first steps with each one of them.

Now it is your turn to go through the checklist and explore the ones you find lacking in your Node.js application.

Grab your checklist ✨

I hope you found this helpful and it gave you some pointers to move forward in your scalability endeavor. This is not an exhaustive list of all the best practices – I have just included the ones I found are not talked about as much based on my experience.

Feel free to reach out on Twitter. I'd love to hear your feedback and suggestions on other best practices that you are using.


Horizontal vs. Vertical Scaling – How to Scale a Database

Sophia Iroegbu — Thu, 09 Jun 2022 15:26:24 +0000

Data Scalability

Data scalability refers to the ability of a database to handle changing demands by adding and removing resources. In this way, the database grows at the same pace as the software.

Via scaling, the database can expand or contract the capacity of the system's resources to support the application's frequently changing usage.

There are two ways a database can be scaled:

Horizontal scaling (scale-out)
Vertical scaling (scale-up)

In this article, we'll look at both methods of scaling and discuss the advantages and disadvantages of each to help you choose.

Horizontal Scaling

This scaling approach adds more database nodes to handle the increased workload. It spreads the load across several servers rather than making any individual server more powerful.

When you need more capacity, you can add more servers to the cluster. Another name for this scaling method is Scaling out.

Advantages of Horizontal Scaling:

It is easy to upgrade
It is simple to implement and costs less
It offers flexible, scalable tools
It has limitless scaling with unlimited addition of server instances
Upgrading a horizontally scaled database is easy – just add a node to the cluster

Disadvantages of Horizontal Scaling:

Any bugs in the code will become more complex to debug and understand
The licensing fee is expensive as you will have more nodes that are licensed
The cost of the data center will increase significantly because of the increased space, cooling, and power required

When to use horizontal scaling:

If you are dealing with more than a thousand users, it is best to use this scaling system because when the servers receive multiple user requests, everything will scale well.

It will also not crash because there are multiple servers.

Vertical Scaling

The vertical scaling approach increases the capacity of a single machine by increasing the resources in the same logical server. This involves adding resources like memory, storage, and processing power to an existing server, enhancing its performance.

This is the traditional method of scaling a database. Another name for this approach is Scale-up.

Advantages of Vertical Scaling:

The cost of the data center for the space, cooling, and power will be smaller
It is cost-efficient
It is easy to use and implement – the administrator can easily manage and maintain the software
The resources for this approach are flexible

Disadvantages of Vertical Scaling:

The cost may be low, but you will need to pay for a license each time you scale up
The hardware costs more because of high-end servers
There is a limit to the amount you can upgrade
You are restricted to a single database vendor, and migration is challenging, or you may need to start over

When to use vertical scaling:

The vertical scaling approach is for you if you need a system with unique data consistency.

If you don't want to worry about balancing the server's workload, vertical scaling is the best option.

Differences Between Vertical and Horizontal Scaling

Vertical	Horizontal
The license costs less	The license costs more
This method increases the power of the existing server	This method increases capacity by adding more individual servers
The data lives on a single node, which is scaled by adding resources such as CPU cores	The data is partitioned, so each node holds only a part of it

Which scaling method is best for your app?

When choosing how to scale your database, you must consider what's at stake when you scale up and out.

Now we'll take a look at some factors to consider so you can choose which scaling system is best for your app:

Load balancing

Vertical scaling avoids the load balancing problem altogether: there is only a single server, so there is nothing to balance. Horizontal scaling requires you to distribute the workload evenly across servers.

Point of failure

The horizontal scaling system has more than one server, so when one server crashes, the next one picks up the slack. This means that there is no single point of failure which makes the system resilient.

But in the vertical scaling system, there is only one server, so once the server crashes, everything goes offline.

Speed

In terms of speed, the vertical scaling system is faster. Because it runs on one server, it relies on inter-process communication: the server communicates within itself, which is fast.

The horizontal scaling system makes network calls between two or more servers, often in the form of Remote Procedure Calls (RPC). These are slower than communication within a single machine.

Data consistency

When dealing with servers, you'll need to make sure that the data stored in them is consistent when end users send a request.

The vertical scaling system is data consistent because all information is on a single server. But the horizontal scaling system is scaled out with multiple servers, so data consistency can be a huge issue.

Hardware limitations

The horizontal scaling system scales well because the number of servers you need grows roughly linearly with the number of users, and you can keep adding servers. The vertical scaling system, on the other hand, is limited by how much hardware a single server can hold.

When choosing a system to scale your database, make sure to make a pros and cons list of the information in this article. It will help you decide which to use.

Conclusion

A cloud computing model's scalability is the ability to quickly and instantly increase or decrease an IT capacity. Knowing how the two types of scaling work is crucial as this plays a massive role in your database or server management.

Quick recap...

Enhancing a single server's capacity to handle an increased workload is called vertical scaling.
Adding new nodes to share a distributed workload is called horizontal scaling.
The horizontal scaling system scales well as the number of users increases.
The vertical scaling system is faster because it relies on inter-process communication rather than network calls.

Thanks for reading!

How to Scale a Distributed System

freeCodeCamp — Mon, 13 Dec 2021 23:37:24 +0000

By Apoorv Tyagi

Designing a distributed system that supports millions of users is a complex task, and one that requires continuous improvement and refinement.

Recently I read a book by Alex Xu called "System Design Interview – An Insider's Guide". This article, inspired by the first part of the book, shares some popular techniques used by many large tech companies to scale their architecture to support up to a million users.

This is not an exhaustive list, but if you're a newer developer who's just getting started, this can help you build a stronger foundation for your career.

Use a Load Balancer

A load balancer is a device that evenly distributes network traffic across several web servers. In this architecture, the clients do not connect to the servers directly – instead they connect to the public IP of the load balancer.

Using a load balancer also protects your site in the event of web server failure, which in turn improves availability. For example:

If one server goes down, all the traffic can be routed to the second server. This prevents the overall system from going offline.
If in the future the traffic grows and these two servers are not enough to handle all the requests properly, then you just need to add more servers to your pool of web servers and the load balancer automatically starts distributing requests to them.

Load Balancing Algorithms

Let's look at some of the algorithms which a load balancer can use to choose a web server from a pool for an incoming request:

Round Robin – You start from the first server in the pool, move down to the next server, and when you're done with the last server you loop back up to the first and start working down the pool again.
Load-based server – You assign a server based on whichever server has the smallest load currently, thereby increasing throughput.
IP Hashing – You assign a server by hashing the IP address of incoming requests and using the hash value to do the modulo operation with the number of servers available in the server pool.
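The three strategies above can be sketched in a few lines of Python. This is only an illustration, not how any particular load balancer is implemented: the server names, the `active_requests` mapping, and the function names are all assumptions made for the example.

```python
import hashlib
from itertools import cycle

# Hypothetical pool of backend servers (placeholder names).
SERVERS = ["web-1", "web-2", "web-3"]

# Round robin: walk the pool in order and wrap around at the end.
_round_robin = cycle(SERVERS)


def pick_round_robin():
    return next(_round_robin)


def pick_least_loaded(active_requests):
    """Load-based: choose the server with the fewest in-flight requests.

    active_requests maps a server name to its current request count.
    """
    return min(active_requests, key=active_requests.get)


def pick_by_ip_hash(client_ip, servers=SERVERS):
    """IP hashing: hash the client IP, then take it modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With IP hashing, the same client address always lands on the same server as long as the pool does not change, which is useful when a server keeps per-client state. Adding or removing a server changes the modulo result for most clients.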

Use Caching

A cache stores the results of previous responses so that any subsequent requests for the same data can be served faster. So you can use caching to reduce the latency of a system.

You can significantly improve the performance of an application by decreasing the network calls to the database. This is because repeated database calls are expensive and cost time.

For example, every time a new user loads a website's home page, one or more database calls are made to fetch the data. This increases the response time. Caching can alleviate this problem by storing results that are requested often and modified infrequently.

Here are a few considerations to keep in mind before using a cache:

Set an expiration policy: You should always have an expiration policy on your cache. If you don't have one, the data will get stored in the cache permanently and it will become stale.
Sync the cache and database: You should build a mechanism to keep the database and the cache in sync. If any data modifying operations occur in the databases and the same change doesn't reflect in the cache then it will introduce inconsistencies in your system.
Set an eviction policy: You should have an algorithm that can decide which existing items will get removed once the cache is full and you get a request to add other items to the cache. Least-recently-used (LRU) is one of the most popular cache eviction policies used today.
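To make these three considerations concrete, here is a minimal in-memory sketch that combines a time-to-live (expiration), an explicit `invalidate` hook (sync with the database), and LRU eviction. The class name `TTLCache`, the helper `get_user`, and the `load_from_db` callback are hypothetical names chosen for the example; a production system would more likely rely on a dedicated cache server.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-memory cache with a TTL (expiration) and LRU eviction."""

    def __init__(self, max_items=128, ttl_seconds=60.0, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (expires_at, value)
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]          # expired: treat as a miss
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used

    def invalidate(self, key):
        """Call when the underlying database row changes."""
        self._items.pop(key, None)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database."""
    user = cache.get(user_id)
    if user is None:
        user = load_from_db(user_id)
        cache.set(user_id, user)
    return user
```

The `clock` parameter exists only so that expiry can be exercised without waiting in real time.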

Use a Content Delivery Network (CDN)

A CDN or a Content Delivery Network is a network of geographically distributed servers that help improve the delivery of static content from a performance perspective. CDN servers are generally used to cache content like images, CSS, and JavaScript files.

Here is how a CDN works:

When a client sends a request, the CDN server closest to the client delivers all the static content related to the request.
If the CDN server does not have the required file, it then sends a request to the original web server.
The CDN caches the file and returns it to the client.
Let's say now another client sends the same request, then the file is returned from the CDN.

Here are a few considerations to keep in mind before using a CDN:

Cost: CDNs are generally run by third-party providers, and they charge you for the data transfers in and out of the CDN. So infrequently used assets should not be stored in the CDN.
Fallback Mechanism: If a CDN fails, you should be able to detect it and start sending requests for resources from the original web server. So you should build a mechanism for how your application copes with a CDN failure.

Set Up a Message Queue

A message queue allows an asynchronous form of communication. It acts as a buffer for the messages to get stored on the queue until they are processed.

The architecture of a message queue includes input services, called publishers, that create messages and publish them to the queue as events. Other services, called subscribers, receive these events and perform the actions defined by the messages.

Both publishers and subscribers are decoupled from each other and that's what makes the message queue a preferred architecture for building scalable applications.

Message queue example

Consider the following use case:

You are building an application for ticket booking. As soon as a user completes their booking, a message confirming their payment and ticket should be triggered. This task may take some time to complete and it should not make our system wait for processing the next request.

Here, we can push the message details along with other metadata like the user's phone number to the message queue. Another worker service picks up the jobs from the message queue and asynchronously performs the message creation and sending tasks.

The publishers and the subscribers can be scaled independently. When the size of the queue increases, you can add more consumers to reduce the processing time.
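The ticket-booking flow can be sketched with Python's standard library, using `queue.Queue` as a stand-in for a real message broker and threads as stand-ins for worker services. The function names, the message format, and the phone numbers are assumptions made purely for illustration.

```python
import queue
import threading


def confirmation_worker(jobs, sent):
    """Subscriber: pull jobs off the queue and 'send' confirmations."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel value: stop this worker
                return
            sent.append(
                f"Ticket {job['ticket_id']} confirmed, SMS to {job['phone']}"
            )
        finally:
            jobs.task_done()


def run_booking_demo(num_workers=2, num_bookings=5):
    jobs = queue.Queue()   # stands in for the message queue
    sent = []              # stands in for the SMS gateway
    workers = [
        threading.Thread(target=confirmation_worker, args=(jobs, sent))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    # Publisher: enqueue each booking and move on without waiting.
    for ticket_id in range(num_bookings):
        jobs.put({"ticket_id": ticket_id, "phone": f"555-01{ticket_id:02d}"})

    jobs.join()            # wait until every published job is processed
    for _ in workers:
        jobs.put(None)     # one stop signal per worker
    for worker in workers:
        worker.join()
    return sorted(sent)
```

Raising `num_workers` is the in-process equivalent of adding more consumers when the queue grows.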

Choose Your Database Wisely

According to Wikipedia:

A database is an organized collection of data stored and accessed via a computer system.

Databases are used for the persistent storage of data. We generally have two types of databases, relational and non-relational.

➔ Relational Database

A relational database has strict relationships between entries stored in the database and they are highly structured. This is to ensure data integrity. For example, adding a new field to the table when its schema doesn't allow for it will throw an error.

Another important feature of relational databases is ACID transactions.

ACID transactions

These are a set of properties that describe how any given transaction (a set of read or write operations) should behave in a good relational database.

Atomicity means that when a transaction that comprises more than one operation takes place, the database must guarantee that if one operation fails the entire transaction fails. Either it happens completely or doesn't happen at all.

Consistency means that each transaction takes the database from one valid state to another: it never violates the data integrity constraints and never corrupts the data. In simple terms, any data written to the database must be valid according to all of the rules defined on it.

Isolation means that you can run multiple concurrent transactions on a database, without leading to any kind of inconsistency. All these multiple transactions will occur independently of each other.

Durability means that once the transaction has completed execution, the updated data remains stored in the database. It will be saved on a disk and will be persistent even if a system failure occurs.
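Atomicity is the easiest of the four properties to see in code. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the `CHECK` constraint, and the `transfer` helper are assumptions invented for the example.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            # This update violates the CHECK constraint if funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

When the second update fails, the credit made by the first update is rolled back as well, so the balances are left exactly as they were before the transfer started.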

➔ Non-Relational Databases

A non-relational database has a less rigid structure and may or may not have strict relationships between the entries stored in the database. The data typically is stored as key-value pairs. For example:

[
    { 
        firstName: "Apoorv",
        lastName: "Tyagi",
        gender: "M"
    },
    { 
        name: "Judit",
        rank: "Polgar",
        gender: "F"
    },
    {
      //...
    },
]

Similar to the ACID properties of relational databases, the non-relational database offers BASE properties:

Basically Available (BA) which states that the system guarantees availability even in the presence of multiple failures.

Soft State (S) means the state of the system may change over time, even without application interaction due to eventual consistency. In NoSQL, unlike RDBMS, it is believed that data consistency is the developer's responsibility and should not be handled by the database.

Eventual Consistency (E) means that the system will become consistent "eventually". However, there's no guarantee of when this will happen.

NoSQL vs SQL

Non-relational databases (also often referred to as NoSQL databases) might be a better choice if:

Your application requires low latency, since there are no complex JOIN queries to slow reads down.
You have a large amount of unstructured data, or you do not have any relation among your data.

How to Scale a Database

Let's now look at the various ways you can scale your database:

Vertical vs horizontal database scaling

In vertical scaling, you scale by adding more power (CPU, RAM) to a single server.

In horizontal scaling, you scale by simply adding more servers to your pool of servers.

For low-scale applications, vertical scaling is a great option because of its simplicity. But vertical scaling has a hard limit. It is practically not possible to add unlimited CPU, RAM, and storage to a single server.

Because of this, it is recommended that you go for horizontal scaling (of which sharding is a common form) for large-scale applications.

Database replication

This is the process of copying data from your central database to one or more databases.

You do database replication using primary-replica (formerly known as master-slave) architecture. The primary database generally only supports write operations. All the data modifying operations like insert or update will be sent to the primary database.

On the other hand, the replica databases get copies of the data from the primary database and only support read operations. All the data querying operations like read, fetch will be served by replica databases.

Advantages of database replication:

Performance Improvements: Database replication improves performance significantly as all the writes and updates happen in the primary node and all the read operations are distributed to replica nodes, thereby allowing more queries to run in parallel.
High Availability: Since we create replicas of data across different nodes available in different parts of the world, the application remains functional even if one database node goes offline as you can access data from other nodes. In case the failure occurs in the primary node, any one of the replica nodes will get promoted to a primary node and serve the write/update operations until the original primary node comes back online.
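A rough sketch of the routing logic behind a primary-replica setup is shown below. Plain dictionaries stand in for database nodes, and the class and method names are hypothetical. Replication here is synchronous to keep the example short, whereas real systems usually replicate asynchronously, so replicas can briefly lag behind the primary.

```python
import itertools


class ReplicatedDatabase:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key, value):
        # All data-modifying operations go to the primary...
        self.primary[key] = value
        # ...and are then copied to every replica (simplified: synchronous).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are distributed round-robin over the replicas.
        return next(self._next_replica).get(key)

    def promote_replica(self):
        """Failover: promote the first replica to be the new primary."""
        self.primary = self.replicas.pop(0)
        # If no replicas remain, reads fall back to the primary.
        self._next_replica = itertools.cycle(self.replicas or [self.primary])
```

After `promote_replica()` the remaining replicas keep serving reads while the promoted node accepts writes.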

Wrapping Up

That's it. Thanks for stopping by. I hope you found this article interesting and informative!

My DMs are always open if you want to discuss further on any tech topic or if you've got any questions, suggestions, or feedback in general.

Happy learning! 💻 😄

How to Scale a System With Process Splitting and Redis

freeCodeCamp — Tue, 20 Jul 2021 19:26:45 +0000

By Pramono Winata

Have you ever gotten into trouble trying to handle a single process that's really huge or heavy? If so, I can help you figure out how to better manage it.

In this article I will be sharing how I'm currently managing a single message that is too big to be processed on a single process. I've split it into different chunks, which results in separate processes.

I won't go into much technical detail, but more of the architectural process.
I'll discuss some bits about caching usage and pubsub, but I will not go into details on the implementation. Instead, I'll focus on the pattern itself.

The Problem

My First Approach

How to Handle Finishing Processes

... Redis, and I am using that to deal with my issue here.

If you are not familiar with Redis, it is a service that is generally used as a cache.

We will manage our Redis mechanism like this:

Adding Redis to mark our process

The process looks exactly the same as before, but with the addition of Redis in the middle. You need to make sure you have a valid initial count for this case.

In my case, since I'm publishing a list, I can easily put the length of my list as my initial counter. And for the counter, I can just decrease it by one each time a process has finished. Then I will be able to know if I have finished all my processes simply by referring to my Redis counter. If it has reached 0, it means that I can safely mark that all of my processes are done.
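The countdown idea can be sketched as follows. To keep the example runnable without a Redis server, a small in-memory class stands in for Redis and models only two operations, `set` and an atomic `decr`. The class, the key format, and `process_in_chunks` are assumptions made for illustration and are not taken from the author's implementation.

```python
import threading


class InMemoryCounterStore:
    """Stand-in for Redis: models only set() and an atomic decr()."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._values[key] = int(value)

    def decr(self, key):
        with self._lock:
            self._values[key] = self._values.get(key, 0) - 1
            return self._values[key]


def process_in_chunks(store, job_id, chunks, handle_chunk):
    """Process each chunk separately; report the job once all are done."""
    key = f"job:{job_id}:remaining"
    store.set(key, len(chunks))      # initial count = length of the list
    finished = []

    def worker(chunk):
        handle_chunk(chunk)
        # Whichever worker brings the counter to 0 knows it was the last.
        if store.decr(key) == 0:
            finished.append(job_id)

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return finished
```

The decrement has to be atomic, otherwise two workers finishing at the same moment could both miss (or both observe) the zero.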

Wrapping Up

To sum it all up, I split the message into several messages which will be processed all together in several processes. To manage the message processes, I use Redis caching.

The solution that I have described above will not be a silver bullet every time you have a problem processing a very big message. There are other ways like streaming your message, but that will be a story for another day.

Thanks for reading my article through to the end! I sincerely hope that you enjoyed and found my article interesting and, most importantly, that it was useful.

How to Maintain Scalability in Your Python Code

freeCodeCamp — Tue, 20 Aug 2019 17:51:32 +0000

By Shifa Martin

Any application that processes data can start to perform slowly, or even corrupt data or break, as the amount of data grows. At the same time, developers need to be able to program quickly and keep adding value.

As developers, we should have tools to prototype quickly, but we should also invest effort in making the app scalable. Broadly, building a substantial and scalable application is possible with the Python programming language.

Python is a high-level programming language that is also object-oriented. With its qualities such as built-in data structures, dynamic binding, and dynamic typing, we can use it to develop applications as rapidly as possible.

Python can also be used as a glue language that integrates existing components and helps us build scalable applications.

Python is one of the pioneers of programming languages that developers can use to do all the scaling work.

Here are some tips you can check out for developing scalable apps in Python.

Learn to Cleverly Use ‘Collections’

Python supports rich and powerful data structures/containers for ‘collections’ such as dict, list, set, and tuple. They are valuable in building scalable apps. However, overusing them can hurt code scalability, and it's easy to spot when collections have been overused.

# notebooks.csv holds meta information on a collection of notebooks:
# heading, writer, year of pub, etc.

# load_from_file returns a list of dicts.
notebooks = load_from_file('notebooks.csv')

notebook_summaries = dict()
for notebook in notebooks:
    notebook_summaries[notebook["heading"]] = notebook["summary"]

for heading, summary in notebook_summaries.items():
    # Do something interesting with the summary.
    print(heading, summary)

The code above builds a table mapping headings to summaries after reading the notebook data from the CSV file. From a memory-usage viewpoint, there is nothing wrong with this if notebooks.csv holds a few hundred titles.

However, it becomes a problem if the file is the inventory of entire notebook stores with vastly more titles. You can run into one or two issues, depending on whether you are using Python 2 or Python 3.

The first is a memory bottleneck. Creating the notebook_summaries data structure is unnecessary here, although it does improve readability: the “for” line tells you immediately that the loop runs over the summaries.

The new data structure holds the full summary of every notebook, which is likely to consume more memory than all the other fields. If the notebooks list consumes N bytes of memory, then the complete block will consume at least 1.5 * N bytes.

This will scale better in Python 3

notebooks = load_from_file('notebooks.csv')

# Iterate over the list directly instead of building notebook_summaries
for notebook in notebooks:
    print(notebook["heading"], notebook["summary"])

I recommend that you create variables that are well-named as it helps boost the maintainability of your Python code.

Intelligent Iterating of Python Codes

While developing large applications with Python, memory consumption is not the only thing you should consider. You can face several other problems, and careless iteration is one of the most common.

In Python 2, the for line that iterates over notebook_summaries.items() creates yet another copy of the notebook data. This can hurt performance badly: the program may appear to hang before the for loop even starts.
This happens because notebook_summaries.items() builds a very large list that consumes more memory, and Python has to finish building that list before it can run the first iteration of the for loop.

The issue is specific to Python 2, where items() makes an extra copy of the contents of notebook_summaries. In Python 3, items() returns a lightweight view instead of a copy. Developers can use iteritems instead of items in Python 2:

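In Python 2, use "iteritems" instead of "items"

notebooks = load_from_file('notebooks.csv')

notebook_summaries = dict()
for notebook in notebooks:
    notebook_summaries[notebook["heading"]] = notebook["summary"]

# Python 2 only: iteritems() yields pairs lazily instead of building a list
for heading, summary in notebook_summaries.iteritems():
    print(heading, summary)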

So, the point here is to notice the difference between iterating lazily and creating a full list, whichever Python version you use. It is the developer's responsibility to choose the right pattern for the context.
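The difference between the two patterns can be measured directly. The following sketch, written for Python 3, compares a fully built list with a lazy generator expression; the function names are made up for the example.

```python
import sys


def squares_list(n):
    """Builds every value up front and keeps them all in memory."""
    return [i * i for i in range(n)]


def squares_iter(n):
    """Produces values one at a time, only when the consumer asks."""
    return (i * i for i in range(n))


big_list = squares_list(100_000)
lazy = squares_iter(100_000)

# The list object holds 100,000 references; the generator holds only its state.
print(sys.getsizeof(big_list) > sys.getsizeof(lazy))   # True

# Both produce the same values when consumed.
print(sum(big_list) == sum(lazy))                      # True
```

Note that `sys.getsizeof` reports only the size of the container object itself, not of the integers it refers to, so the real gap is larger than the comparison suggests.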

Using ‘Generators’ For Scalability in Python Code

The generator function allows you to create iterators in a simpler manner. Imagine you are building a program like Grammarly that takes in text, splits it into sentences, and performs some kind of grammar analysis on each one. Sentences are separated by a period followed by one or more whitespace characters.

See the code:

import re

text = '''Full body of text. It has many sentences.

Some have grammatical errors and some are correct.'''

# Split wherever a period is followed by one or more whitespace characters
sentences = re.split(r'\.\s+', text)

for sentence in sentences:
    print(sentence)

Run the listing

Full body of text
It has many sentences
Some have grammatical errors and some are correct.
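The same split can also be written as a generator, so that sentences are produced one at a time rather than collected into a list first. This is a minimal sketch assuming the same separator pattern; `iter_sentences` is a hypothetical helper name, not part of the original listing.

```python
import re


def iter_sentences(text):
    """Yield sentences one at a time instead of building the whole list."""
    start = 0
    for match in re.finditer(r'\.\s+', text):
        yield text[start:match.start()]
        start = match.end()
    if start < len(text):
        yield text[start:]  # whatever follows the last separator


text = '''Full body of text. It has many sentences.

Some have grammatical errors and some are correct.'''

for sentence in iter_sentences(text):
    print(sentence)
```

For a short string the saving is negligible, but for a very large body of text only the sentence currently being analyzed needs to exist as a separate string.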

Here is a longer generator example that yields a series of weather messages:

import random

def weathermaker(volatility, days):
    '''
    Yield a series of messages giving the day's weather and occasional commentary
    volatility - a float between 0 and 1; the greater this number the greater
                 the likelihood that the weather will change on each given day
    days - number of days for which to generate weather
    '''
    #Always start as if yesterday were sunny
    current_weather = 'sunny'
    #First item is the probability that the weather will stay the same
    #Second item is the probability that the weather will change
    #The higher the volatility the greater the likelihood of change
    weights = [1.0-volatility, volatility]
    #For fun track how many sunny days in a row there have been
    sunny_run = 1
    #How many rainy days in a row there have been
    rainy_run = 0
    for day in range(days):
        #Figure out the opposite of the current weather
        other_weather = 'rainy' if current_weather == 'sunny' else 'sunny'
        #Set up to choose the next day's weather. First set up the choices
        choose_from = [current_weather, other_weather]
        #random.choices returns a list of random choices based on the weights
        #By default a list of 1 item, so we grab that first and only item with [0]
        current_weather = random.choices(choose_from, weights)[0]
        yield 'today it is ' + current_weather
        if current_weather == 'sunny':
            #Check for runs of three or more sunny days
            sunny_run += 1
            rainy_run = 0
            if sunny_run >= 3:
                yield "Uh oh! We're getting thirsty!"
        else:
            #Check for runs of three or more rainy days
            rainy_run += 1
            sunny_run = 0
            if rainy_run >= 3:
                yield "Rain, rain go away!"
    return

#Create a generator object and print its series of messages
for msg in weathermaker(0.2, 10):
    print(msg)

Output

$ python weathermaker.py
today it is sunny
today it is sunny
Uh oh! We're getting thirsty!
today it is sunny
Uh oh! We're getting thirsty!
today it is sunny
Uh oh! We're getting thirsty!
today it is rainy
today it is sunny
today it is rainy
today it is rainy
today it is rainy
Rain, rain go away!
today it is rainy
Rain, rain go away!

As the code above shows, Python generators are a great way to create iterators quickly. They have many benefits: they produce one item at a time, so memory is only allocated for the item currently being processed, and they make it easier for developers to modify the code without breaking it.

Another benefit is encapsulation: a generator packages its internal state and dependencies behind a simple iteration interface, which is why you can use a generator directly in a for loop.

You can add multiple yield statements in a generator

def nums3():
    n = 0
    while n < 6:
        yield n
        n += 1
    yield 63 # Second yield

for num in nums3():
    print(num)

Output

0
1
2
3
4
5
63

Explanation of the code above

Here the second yield runs after the while loop exits. When the function reaches the implicit return at the end, the iteration stops.

Final Words

So, if you don't use generators in your Python code yet, learn to do so. I know you will be glad you did. They are a core part of Python and can be useful in your next Python application.

No doubt, Python is a very useful, diverse, and well-maintained language, and there is no bound to the features. However, I have shared the ideas which I use in my day to day coding process to make things simple.


Essentials of monorepo development

freeCodeCamp — Thu, 13 Jun 2019 18:51:48 +0000

By Ovidiu Bute

The word monorepo is a combination of “mono”, as in the Greek word mónos (in translation, alone), and an abbreviation of the word repository. A simple concept if taken verbatim: one lonely repository. The domain is software engineering so we’re referring to a home for source code, multimedia assets, binary files, and so on. But this definition is just the tip of the iceberg, since a monorepo in practice is so much more.

In this article I plan to distill the pros and cons of having every piece of code your company owns in the same repository. At the end you should have a good idea about why you should consider working like this, what challenges you’ll face, what problems it’ll solve, and how much you’ll need to invest in it.

Relative interest in the term “monorepo” since 2004, source: Google Trends

The term itself, as visible in the chart above, looks to be as new as 2017. However it would be a mistake to think that previously nobody was storing all of their code in one place. In fact during my first job back in 2009, the company I worked at stored every project in a single SVN repository, one directory per project. Indeed you may well be able to trace this practice back even further. But how can we explain the recent explosive popularity, then?

The reality is that storing code in a single spot is not the main selling point. In the past years the major tech companies (Google, Facebook, and Dropbox, for example) have been showing off their way of working together within the same repository at massive scale. Organizations of tens of thousands of engineers collaborating within one repository are an awesome sight. And a difficult engineering problem. So difficult in fact that these companies invest a lot of money into tools and systems that allow developers to work productively. These systems in turn have solved problems that you may not even realize you had. This is what fascinates people during tech talks. This is what’s been driving searches since 2017.

Front-end development at Google, Alex Eagle: https://medium.com/@Jakeherringbone/you-too-can-love-the-monorepo-d95d1d6fcebe
Google monorepo presentation, Rachel Potvin: https://www.youtube.com/watch?v=W71BTkUbdqE
Scaling Mercurial to the size of Facebook’s codebase, Durham Goode: https://code.fb.com/core-data/scaling-mercurial-at-facebook/

I’ve identified a few core features that a Google or a Facebook vetted monorepo offers. This is surely not an exhaustive list, but it’s a great starting point. When discussing each one of these points, I took into consideration what life looks like without them, and what exactly they solve. Certainly in our field of work everything is a trade-off, nothing’s free. For every pro that I list someone will find use-cases that directly contradict me, but I’m OK with that.

All your code, regardless of language, is located in one repository

The first advantage of storing everything in one place may not be immediately obvious, but as a developer, simply being able to freely browse through everything is of great impact. It helps foster a sort of team spirit and is also a very valuable and cheap way to distribute information. Have you ever asked yourself what projects are in development at your company? Past and present? Curious what a certain team is up to? How have they solved a particular engineering problem? How are they writing unit-tests?

In direct opposition to the monorepo we have the multirepo structure. Each project or module gets its own separate space. In such a system developers can spend quite a bit of time getting answers to the questions I listed above. The distributed nature of the work means there’s no single source of information that you can subscribe to.

There are companies that have transitioned from a multi to a monorepo layout by following only this feature from my list. Such a structure should not be confused with the topic of this article though. I’d define it instead as a collocated multirepo. Yes, everything is in one place, but the rest of the features on this list are far more interesting.

You’re able to organize dependencies between modules in a controlled and explicit way

The traditional, battle tested way of handling dependencies is by publishing versions to a separate storage system from continuous integration systems, or even manually, from development machines. These are versioned (or tagged) to make it easier to search later on. Now in a multirepo setup, each project has a set of dependencies of external origins (third parties) or internal, as in, published from inside the same company.

In order for one team to depend on another one’s code, everything needs to pass through a dependency management storage system. Examples of this are npm, Maven Central, or PyPI. I said earlier that you can easily build a collocated multirepo just by storing everything in one place. Such a system is indirectly observable. Let’s examine why that’s important.

As developers, our time is split very unequally between reading and writing code. Now imagine having to debug an issue that has its root cause inside of a dependency. We can rule out third parties here, since that’s a difficult problem as it is. No, this problem occurs in a package published by another team in your company. If your project depends on the latest version, you’re in luck! Just navigate to the respective directory and grab a cup of coffee.

“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.”

― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

More often though you might depend on an older version. So now what do you do? Do you try and use your VCS to read through the older code? Do you try and read the actual artifact instead of the original code? What if it’s minified, as is usually the case with JavaScript?

Contrast this with Google’s system, for example: since code dependencies are direct, as in, there are essentially no versions anywhere, one can say the system is directly observable. The code you’re looking at is pretty much your entire world. I say pretty much because of course there are always going to be minor exceptions to this rule, such as external dependencies that would be prohibitive to host yourself. But that shouldn’t take anything away from this discussion.

While we’re on the topic of dependency management we should touch upon the subject of restrictions. Imagine a project where you’re able to depend on any source file you need. Nothing is off limits, you can import anything. For those of you that started their careers at least 10 years ago, this sounds like business as usual for the time. This is an almost complete definition of a monolith.

The name implies grandeur, scale, but more importantly, singularity. Practically every source file inside of a monolith cannot live outside of it. There’s a fundamental reason for this that is relevant to our discussion: you don’t have an explicit and auditable way of managing dependencies inside of a monolith. Everything is up for grabs, and it feels free and cheap. So naturally, developers end up creating a complex graph of imports and includes.

Nowadays practically everyone is doing microservices, there can be little doubt about that. Given sufficient scale, a codebase becomes a beast, as everything is inexorably linked to each other. I’m sure many developers will provide counter-arguments that monoliths can be managed in a clean, reasonable way without falling into this trap. But exceptions simply reinforce the initial statement. Microservices solve this by defining clear boundaries and responsibilities, and a monorepo is a natural extension of this philosophy. Typically modules offer a set of public exports, or APIs, and other modules are only able to use those as part of their contracts.
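To illustrate what "explicit and auditable" dependency rules can look like, here is a small sketch of a check a monorepo build tool might run. The manifest format, the module names, and `check_import` are all hypothetical and are not taken from any specific build system.

```python
# Hypothetical manifest: each module declares its public exports and the
# modules it is allowed to depend on.
MODULES = {
    "ui-components": {"exports": {"Button", "Modal"}, "deps": set()},
    "billing": {"exports": {"charge"}, "deps": {"ui-components"}},
    "checkout": {"exports": {"start_checkout"}, "deps": {"billing"}},
}


def check_import(importer, target_module, symbol, modules=MODULES):
    """Return None if the import is allowed, otherwise a reason string."""
    if target_module not in modules[importer]["deps"]:
        return f"{importer} does not declare a dependency on {target_module}"
    if symbol not in modules[target_module]["exports"]:
        return f"{symbol} is not a public export of {target_module}"
    return None


print(check_import("billing", "ui-components", "Button"))
print(check_import("checkout", "ui-components", "Button"))
```

The first call is allowed and returns None. The second is rejected because checkout never declared ui-components as a dependency, even though the code sits in the same repository.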

Software modules reuse common infrastructure

This is a topic that’s very near and dear to my heart. I’ll define infrastructure in this context, that of a software codebase, as the essential tools necessary to ensure productivity and code quality.

One of the reasons why I think betting your company on multirepos is a mistake has to do with a set of basic requirements any software engineering project should meet:

A build system to be able to reliably produce a deliverable artifact.
A way to run automated tests.
A way to statically analyze code for common mistakes, potential bugs, and enforce best practices.
A way to install and manage third party dependencies, i.e. software modules which are external to your company.

If you have your code split in multiple repositories, you need to replicate this work everywhere. Don’t underestimate how much work this involves! All of the features listed above require at the very minimum a set of configuration files which need to be maintained in perpetuity. Having them copied across more than two places basically guarantees you will always generate technical debt.

I know that some companies go to extreme lengths to minimize the impact of this. They’ll have their configurations bundled as scaffolding (a la create-react-app or yeoman), and use them to setup new repositories. But as we’ve seen in the section before this one, there’s no way to enforce that everyone’s on the latest version of these boilerplate dependencies! The amount of time spent upgrading each repository individually increases linearly in large codebases. Given sufficient scale, practically all published versions of an internal package will be depended on at the same time!

There’s a quote I absolutely love that relates to this conundrum:

At scale, statistics are not your friend. The more instances of anything you have, the higher the likelihood one or more of them will break. Probably at the same time.

― Anne Currie

If you think distributed systems just refers to web services, I would disagree. Your codebase is an interconnected, living system. Tens, hundreds, or thousands of engineers are racing to get their code into production each day, all the while struggling to keep the build green and the code quality up. If anything, to me this sounds even scarier than a set of microservices :)

Changes are always reflected throughout the entire repository

This is highly dependent on the rest of the features. It’s one of the benefits that’s easier to understand through example.

Let’s say I work at a company that builds web applications for customers all around the world. Everything is organized into modules, as is exemplified below via the popular open-source project Babel. At this company we all use ReactJS for front-end work, and out of pure coincidence, all of our projects are on the same version of it.

Babel’s myriad of modules: [https://github.com/babel/babel/tree/master/packages](https://github.com/babel/babel/tree/master/packages)

But the folks at Facebook publish the latest version of React and we realize that upgrading to it is not trivial. To be more productive, we’ve built a library of reusable components that resides as a separate module. All projects depend on it. This new React version brings lots of breaking changes that affect it. What options do we have for doing the upgrade?

This is typically where monorepo adversaries would shoot down the entire concept. It’s easy to say that we’ve worked ourselves into a corner and that the multirepo structure would’ve been a superior choice given the circumstances. Indeed in the latter case what we would do is just gradually adopt the new React version in our projects one by one, preceded by a major version upgrade of our core components module.

But I would say this creates more problems than it solves. A breaking release of a core dependency creates a schism in your engineering team. You now have two cores to maintain: the new one, which is used by a couple of brave teams in a few projects, and the older one, still depended on by almost the entire company.

Let’s take this problem to a bigger scale for further analysis. Our company may have some projects which are still in production, but are just in maintenance mode, and don’t have any active development teams assigned to them. These projects will probably be the last ones to migrate, extending the time window in which you keep working on two cores at the same time. The old version will still receive bug and security fixes even though it’s deprecated, as you can’t risk your customers’ businesses.

All of this is to say that a multirepo solution promotes and enables a constant state of technical debt. There are lots of migrations going on, modules that depend on older versions of other modules, and many, many deprecation policies which may or may not be enforceable.

Let’s now consider an alternative solution to the React upgrade problem. By having all of the code in one place, and dependent on each other directly, without versioning, we’re left with one option: we have to do all of the work upfront, in all modules simultaneously.

If that sounds like a scary proposition, I don’t blame you. It’s terrifying to think about, at first. However, the advantage is clear: no migrations, no technical debt, less confusion around the state of our codebase. In practical terms, there is one obstacle to overcome with this solution: there may be hundreds, thousands, or millions of lines of code that need to be changed all at once. By having separate projects we avoid the sheer volume of work by doing it piece by piece. It’s still the same total amount of changes, but we’re naturally inclined to think it would be easier to do that over time, rather than in one push.

To solve this last problem large companies have turned to codemods — programmatic transformations of source code that can run at very large scale. There are numerous tutorials out there if you’re interested, but the gist of it is — you write code that first detects certain patterns in your source code, and then applies specific changes to it. To take our React example further, you could write a codemod that replaces a deprecated API with a newer one, and even apply logic changes if necessary. Indeed this is how Facebook recommends you migrate from one version of their library to the next. It’s how they’re doing it internally. Check out their open-source examples.
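To make the "detect a pattern, then apply a change" idea concrete, here is a deliberately tiny sketch. It is an assumption-laden toy: it uses a regular expression on raw text, whereas real codemods usually work on a parsed syntax tree (for example with a tool such as jscodeshift). The function name and the choice of React's legacy lifecycle renames as the example are illustrative, not taken from the article.

```javascript
// Illustrative mapping of deprecated lifecycle names to their prefixed replacements.
const RENAMES = {
  componentWillMount: 'UNSAFE_componentWillMount',
  componentWillReceiveProps: 'UNSAFE_componentWillReceiveProps',
  componentWillUpdate: 'UNSAFE_componentWillUpdate',
};

// Step 1 (detect): match one of the old names when it is followed by "(" and is
// not preceded by a word character, so already-prefixed names are left alone.
// Step 2 (transform): swap in the new name from the table above.
function renameUnsafeLifecycles(source) {
  const pattern =
    /(?<![\w$])(componentWillMount|componentWillReceiveProps|componentWillUpdate)(?=\s*\()/g;
  return source.replace(pattern, (name) => RENAMES[name]);
}

const before = 'class A extends React.Component { componentWillMount() { init(); } }';
console.log(renameUnsafeLifecycles(before));
```

Because already-renamed methods are skipped, the transform can be run repeatedly over the same files without compounding changes, which matters when it is applied across a very large repository.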

Viewed from this angle, a migration doesn’t seem as scary as before. You do all of your research upfront, you define how you want to essentially rewrite the affected code, and apply the changes more or less all at once. This to me is a robust solution. I’ve seen it in action; it can be done. It’s indeed amazing when it works, and lately more and more companies are adopting it.

Drawbacks

The old adage of “there’s no such thing as a free lunch” certainly applies here, as well. I’ve talked about a lot of pros, but there are some cons which you need to think about.

Given that everyone is working in the same place, and everything is interconnected, tests become the blood of the whole system. Trying to make a change that impacts potentially thousands of lines of code (or more) without the safety net of automated tests is simply not possible.

Why is this any different from traditional ways of storing code? I’d say that versioned modules hide this particular problem, at the expense of creating technical debt. If you own a module that depends on another team’s code, by way of a strict version number, then you’re in charge of upgrading it. If you don’t have sufficient test coverage, you’ll err on the side of caution and simply delay upgrading until you’re confident the module doesn’t affect your own project. As we’ve discussed earlier, this has serious long-term consequences, but it’s a viable strategy nonetheless, especially if your business doesn’t actually promote long-term projects.

We mentioned the benefit of every contributor being able to access all of the source code in your organization. If we flip that around, this can also be a problem for some types of work. There’s no easy way you can restrict access to projects. This is important if you consider government or military contracts as they typically have strict security requirements.

Finally let’s consider continuous integration. You may be using a system such as Jenkins, Travis, or CircleCI, to manage the way your code is tested and delivered to customers. When you have more than one repository you typically set up one pipeline for each. Some teams even go further and have one dedicated CI instance per project. This is a flexible system that can adapt to the needs of each team. Your billing team may deploy to production once a week, while your web team would move faster and deploy multiple times a day.

If you’re considering moving to a monorepo, be wary of your CI system’s capabilities. It will have to do a lot of work. Simple tasks such as checking out the code, or building an artifact may become long running tasks which impact productivity. Google developed and runs its own custom CI solution, and for good reason. Nothing available on the market was good enough.

Now before you conclude that this is a blocker, I’d recommend you carefully analyse your project and the tools you use. If you’re using git, for example, there’s a myth going around that it can’t handle big repositories. This is demonstrably inaccurate, as best exemplified by the project that inspired git in the first place, the Linux Kernel.

Do your own research: see how many files and lines of code you have, and try to predict how much your project will grow. If you’re nowhere near the scale of the Kernel, then you’re OK. You could also make the point that git isn’t very good at storing binaries. LFS aims to solve that. You can also rewrite your history to delete old binaries in order to optimize performance.

In a similar vein, open-source CI systems are much more powerful than you think. Jenkins for example can scale to hundreds of jobs, dozens of workers, and can serve the needs of a large team with ease. Can it do Google scale? Absolutely not! But do you have tens of thousands of engineers pushing to production every day? The plateau at which these tools stop performing is so high, it’s not worth thinking about until you’re close to it. And chances are, you’ll know when you’re getting close.

And finally, there’s cost. You’ll need at least one dedicated team to pull this off, because the amount of work is certainly not trivial, and it demands passion and focus. This team will need to, and I’m just summarizing here, build and maintain in perpetuity what is essentially a platform that stores code, assets, build artifacts, reusable development infrastructure for running tests or static analysis, and a CI system able to withstand large workloads and traffic. If this sounds scary, it’s because it is. But you’ll have no problems convincing developers to join such a team: it’s the type of experience that’s hard to accumulate by doing side-projects at home.

In closing

I’ve talked about the many advantages of working in a monorepo, the drawbacks, and touched upon the costs. This setup is not for everyone. I wouldn’t encourage you to try it out without first evaluating exactly what your problems and your business requirements look like. And of course, do go through all of the possible alternatives before deciding.

How to scale your Node.js server using clustering

freeCodeCamp — Tue, 27 Nov 2018 16:59:21 +0000

By Michele Riva

Scalability is a hot topic in tech, and every programming language or framework provides its own way of handling high loads of traffic.

Today, we’re going to see an easy and straightforward example of Node.js clustering. This is a programming technique which will help you parallelize your code and speed up performance.

“A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.”

Node.js Documentation

We’re gonna create a simple web server using Koa, which is really similar to Express in terms of use.

The complete example is available in this GitHub repository.

What we’re gonna build

We’ll build a simple web server which will act as follows:

Our server will receive a POST request; we’ll pretend that the user is sending us a picture.
We’ll copy an image from the filesystem into a temporary directory.
We’ll flip it vertically using Jimp, an image processing library for Node.js.
We’ll save it to the file system.
We’ll delete it and we’ll send a response to the user.

Of course, this is not a real-world application, but it is pretty close to one. We just want to measure the benefits of using clustering.

Setting up the project

I’m gonna use yarn to install my dependencies and initialize my project:

Since Node.js is single threaded, if our web server crashes, it will remain down until some other process restarts it. So we’re gonna install forever, a simple daemon which will restart our web server if it ever crashes.

We’ll also install Jimp, Koa and Koa Router.

Getting started with Koa

This is the folder structure we need to create:

We’ll have an src folder which contains two JavaScript files: cluster.js and standard.js.

The first one will be the file where we’ll experiment with the cluster module. The second is a simple Koa server which will work without any clustering.

In the module directory, we’re gonna create two files: job.js and log.js.

job.js will perform the image manipulation work. log.js will log every event that occurs during that process.

The Log module

The Log module will be a simple function which takes an argument and writes it to stdout (similar to console.log).

It will also prepend the current timestamp to each log line. This will allow us to check when a process started and to measure its performance.
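The original listing for this module is not reproduced here, so the following is only a minimal sketch of what such a logger could look like. The function names and the ISO timestamp format are assumptions.

```javascript
// Hypothetical contents of the log module; names and format are assumptions.

// Build the line separately so the formatting can be checked without touching stdout.
function formatLogLine(message, now = new Date()) {
  return `[${now.toISOString()}] ${message}`;
}

// Write the timestamped message to stdout, much like console.log would.
function log(message) {
  process.stdout.write(`${formatLogLine(message)}\n`);
}

log('Job started');
```

In a real project the file would export the function (for example with module.exports = log) so that job.js and the servers can require it.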

The Job module

I’ll be honest, this is not a beautiful and super-optimized script. It’s just an easy job which will allow us to stress our machine.

The Koa Webserver

We’re gonna create a very simple webserver. It will respond on two routes with two different HTTP methods.

We’ll be able to perform a GET request on [http://localhost:3000/](http://localhost:3000/). Koa will respond with simple text showing the current PID (process ID).

The second route will only accept POST requests on the /flip path, and will perform the job that we just created.

We’ll also create a simple middleware which will set an X-Response-Time header. This will allow us to measure the performance.
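As a rough sketch of the response-time middleware described above (the author's own version is not shown here), a Koa-style middleware is just an async function that receives the context and a next callback. The function name is an assumption.

```javascript
// Koa-style middleware: measures how long the downstream handlers take
// and reports it in the X-Response-Time header.
async function responseTime(ctx, next) {
  const start = Date.now();
  await next(); // run the route handler (and any later middleware) first
  const elapsedMs = Date.now() - start;
  ctx.set('X-Response-Time', `${elapsedMs}ms`);
}

// Assumed usage in a Koa app: register it before the routes, e.g.
//   app.use(responseTime);
//   app.use(router.routes());
```

Registering it first means the measurement wraps everything that runs afterwards, including the image job on the /flip route.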

Great! We can now start our server by typing node ./src/standard.js and test our routes.

The problem

The image I am currently manipulating (via Unsplash)

Let’s use my machine as a server:

MacBook Pro 15-inch 2016
2.7GHz Intel Core i7
16GB RAM

If I make a POST request, the script above will send me a response in ~3800 milliseconds. Not so bad, given that the image I am currently working on is about 6.7MB.

I can try making more requests, but the response time won’t change much, because the requests are performed sequentially, one after another.

So, what would happen if I tried to make 10, 100, 1000 concurrent requests?

I made a simple Elixir script which performs multiple concurrent HTTP requests:

I chose Elixir because it’s really easy to create parallel processes, but you can use whatever you prefer!
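If you would rather stay in JavaScript, a comparable load script could be sketched as follows. This is not the author's Elixir script, only an assumed equivalent: the helper name is made up, and the commented usage assumes Node.js 18 or newer (for the global fetch) and the server from this article listening on port 3000.

```javascript
// Start `count` requests at the same time and resolve with the elapsed
// milliseconds at which each one finished, in request order.
async function fireConcurrent(count, sendRequest) {
  const startedAt = Date.now();
  const pending = [...Array(count)].map(async (_, index) => {
    await sendRequest(index);
    return Date.now() - startedAt;
  });
  return Promise.all(pending);
}

// Assumed usage against the /flip route:
//   fireConcurrent(10, () => fetch('http://localhost:3000/flip', { method: 'POST' }))
//     .then((timings) => console.log(timings));
```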

Testing ten concurrent requests — without clustering

As you can see, we spawn 10 concurrent processes from our iex (an Elixir REPL).

The Node.js server will immediately copy our image and start to flip it.
The first response will be logged after 16 seconds and the last one after 40 seconds.

Such a dramatic performance decrease! With just 10 concurrent requests, the slowest response took about 40 seconds instead of ~3.8 seconds. That is roughly ten times longer, an increase of about 950%!

Introducing clustering

All credits to Pexels

Remember what I mentioned at the beginning of the article?

To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.

Depending on which server we run our Koa application on, we could have a different number of cores.

Every core will be responsible for handling the load individually. Basically, each HTTP request will be satisfied by a single core.

So for example — my machine, which has eight cores, will handle eight concurrent requests.

We can now count how many CPUs we have thanks to the os module:

The cpus() method will return an array of objects that describe our CPUs. We can bind its length to a constant which will be called numWorkers, ’cause that’s the number of workers that we’re gonna use.
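A minimal sketch of that step, using the numWorkers name from the text:

```javascript
const os = require('os');

// os.cpus() returns one entry per logical core, so its length is the worker count.
const numWorkers = os.cpus().length;

console.log(`This machine reports ${numWorkers} logical cores`);
```

Note that the count is of logical cores, so a four-core CPU with hyper-threading reports eight.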

We’re now ready to require the cluster module.

We now need a way of splitting our main process into N distinct processes.
We’ll call our main process master and the other processes workers.

The Node.js cluster module offers a property called isMaster (renamed isPrimary in Node.js 16 and later, where isMaster is deprecated). It is a boolean value that tells us whether the current process is the master or a worker:
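A small sketch of that check, written so that it works on both older and newer Node.js releases. The fallback expression is an addition of this sketch, not part of the original listing.

```javascript
const cluster = require('cluster');

// Prefer the newer isPrimary flag and fall back to isMaster on older Node.js versions.
const isMaster = cluster.isPrimary ?? cluster.isMaster;

if (isMaster) {
  console.log(`Master process ${process.pid} is running`);
} else {
  console.log(`Worker process ${process.pid} started`);
}
```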

Great. The golden rule here is that we don’t want to serve our Koa application under the master process.

We want to create a Koa application for each worker, so when a request comes in, the first free worker will take care of it.

The cluster.fork() method will fit our purpose:

Ok, at first that may be a little tricky.

As you can see in the script above, if our script has been executed by the master process, we’re gonna declare a constant called workers. This will create a worker for each core of our CPU, and will store all the information about them.

If you feel unsure about the adopted syntax, using [...Array(x)].map() is just the same as:

I just prefer to use immutable values while developing a high-concurrency app.
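The two styles mentioned above can be put side by side. In this sketch the spawn argument stands in for cluster.fork so the snippet can run on its own, and the function names are assumptions.

```javascript
// Functional style: spreading Array(count) turns its empty slots into
// undefined values, so map() visits every position and nothing is mutated.
const createWorkers = (count, spawn) => [...Array(count)].map(() => spawn());

// Imperative equivalent: build the same list by pushing inside a loop.
function createWorkersLoop(count, spawn) {
  const workers = [];
  for (let i = 0; i < count; i += 1) {
    workers.push(spawn());
  }
  return workers;
}

// Stand-in for cluster.fork(), used only for this demonstration.
let nextId = 0;
const fakeSpawn = () => ({ id: (nextId += 1) });

console.log(createWorkers(2, fakeSpawn));
console.log(createWorkersLoop(2, fakeSpawn));
```

The spread matters: calling Array(count).map() directly would skip every slot, because map() ignores the holes of a freshly created sparse array.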

Adding Koa

All credit to Pexels

As we said before, we don’t want to serve our Koa application under the master process.

Let’s copy our Koa app structure into the else statement, so we will be sure that it will be served by a worker:

As you can see, we also added a couple of event listeners in the isMaster statement:

The first one will tell us that a new worker has been spawned. The second one will create a new worker when another worker crashes.

That way, the master process will only be responsible for creating new workers and orchestrating them. Every worker will serve an instance of Koa which will be accessible on the :3000 port.
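The master's responsibilities described above could be sketched like this. The superviseWorkers function and its parameters are assumptions made for illustration; the cluster module is passed in as an argument so the logic can be exercised without actually forking processes.

```javascript
// Fork one worker per core, log when workers come online, and replace any that exit.
function superviseWorkers(clusterModule, numWorkers, log = console.log) {
  const workers = [...Array(numWorkers)].map(() => clusterModule.fork());

  // 'online' fires once a forked worker is up and running.
  clusterModule.on('online', (worker) => {
    log(`Worker ${worker.process.pid} is online`);
  });

  // 'exit' fires when a worker dies; fork a replacement to keep capacity constant.
  clusterModule.on('exit', (worker, code, signal) => {
    log(`Worker ${worker.process.pid} exited (${signal || code}), starting a new one`);
    clusterModule.fork();
  });

  return workers;
}

// Assumed usage in cluster.js (startKoaApp is a hypothetical function wrapping the Koa server):
//   const cluster = require('cluster');
//   const os = require('os');
//   if (cluster.isPrimary ?? cluster.isMaster) {
//     superviseWorkers(cluster, os.cpus().length);
//   } else {
//     startKoaApp();
//   }
```

All workers can listen on port 3000 because the cluster module has the master distribute incoming connections among them.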

Testing ten concurrent requests — with clustering

As you can see, we got our first response after about 10 seconds, and the last one after about 14 seconds. It’s an amazing improvement over the previous 40 second response time!

We made ten concurrent requests, and the Koa server took eight of them immediately. When the first worker has sent its response to the client, it took one of the remaining requests and processed it!

Conclusion

Node.js has an amazing capacity for handling high loads, but it wouldn’t be wise to hold a request open until the server finishes processing it.

In fact, Node.js webservers can handle thousands of concurrent requests only if you immediately send a response to the client.

A best practice would be to add a pub/sub messaging interface using Redis or any other amazing tool. When the client sends a request, the server hands the expensive job off to other services over that channel and responds right away.

Load balancers would also help a lot in splitting up high traffic loads.
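To illustrate the "respond immediately, do the heavy work elsewhere" idea from this conclusion, here is a toy sketch that uses an in-memory map instead of Redis. Every name in it is hypothetical, and in a real setup the work would run in a separate service or process, since CPU-heavy work deferred inside the same Node.js process would still block its event loop.

```javascript
// In-memory stand-in for a real queue such as Redis; all names are hypothetical.
const jobs = new Map();
let nextJobId = 1;

// Register the job and return its id right away, without waiting for the work.
function acceptJob(work) {
  const id = nextJobId;
  nextJobId += 1;
  jobs.set(id, { status: 'pending' });

  // Run the work only after the current call stack has finished.
  setImmediate(async () => {
    try {
      const result = await work();
      jobs.set(id, { status: 'done', result });
    } catch (error) {
      jobs.set(id, { status: 'failed', error: String(error) });
    }
  });

  return id;
}

// Lets a client poll for the outcome later.
function jobStatus(id) {
  return jobs.get(id);
}
```

A route handler built on this would call acceptJob, reply at once with the job id, and expose jobStatus through a second route that the client can poll.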

Once again, technology is giving us endless possibilities, and we’re sure to find the right solution to scale our application to infinity and beyond!


How to Optimize Django REST APIs for Performance: Profiling, Caching, and Scaling.

What we’ll cover:

Why Django REST APIs Become Slow

1. N+1 Query Problems in Serializers

2. Fetching Related Objects Inefficiently

3. Serializing Large Datasets Without Pagination

4. Recomputing Expensive Work Repeatedly

Profiling: Finding the Real Bottlenecks

Measuring Query Count in a View

Using the Django Debug Toolbar

How to Install and Enable the Django Debug Toolbar

Logging SQL Queries

How to Enable SQL Query Logging

Profiling API Response Time

How to Measure Total Response Time

SQL Query Optimization in Django REST APIs

Understanding the N+1 Query Problem

Solving the Problem with select_related and prefetch_related

Example: How to Optimize a Many-to-Many Relationship

Common Beginner Mistakes

Caching in Django REST APIs

Cache Eviction

Caching in Application Architectures

Caching in Django

When to Use Redis

Common Beginner Mistakes

Pagination and Limiting Expensive Datasets

Load Testing and Measuring Improvement

Summary and Next Steps

Key Takeaways

Next Steps for Your APIs

Read More

How to Build Your First Dynamic Performance Test in Apache JMeter

Table of Contents

Prerequisites

Introduction to Apache JMeter

Step 1: Create a New Test Plan

Step 2: Configure the Thread Group

Step 3: Add HTTP Request Defaults

Step 4: Add a CSV Data Set Config (Dynamic Input)

Step 5: Add the HTTP Request Sampler

Step 6: Add a JSON Extractor

Step 7: Add an Assertion

Step 8: Add Listeners

Step 9: Run Your Test

Step 10: Chain Another Request (Optional)

Step 11: Analyze the Results

Pro Tips

Example Folder Structure:

Conclusion

How to Build Multi-Module Projects in Spring Boot for Scalable Microservices

Table of Contents

1. Why Multi-Module Projects?

Real-World Example

Case Study: Netflix

2. Project Structure and Architecture

3. How to Set Up the Parent Project

Step 1: Create the Root Project

Step 2: Configure the Parent pom.xml

4. How to Create the Modules

Common Module

Domain Module

Repository Module

Service Module

Web Module

5. Inter-Module Communication

6. Common Pitfalls and Solutions

7. Testing Strategy and Configuration

Unit Tests

Integration Tests

8. Error Handling and Logging

Error Handling

Logging

9. Security and JWT Integration

10. Deployment with Docker and CI/CD

Step 1: Containerizing with Docker

Step 2: Using Docker Compose for Multi-Module Deployment

CI/CD Example with GitHub Actions

11. Best Practices and Advanced Use Cases
