Gor Grigoryan - freeCodeCamp.org

How to Effectively Manage Unique Identifiers at Scale: From GUIDs to Snowflake IDs and Other Modern Solutions

Gor Grigoryan — Tue, 20 Aug 2024 18:21:25 +0000

What Are Unique Identifiers? 🪪

Unique identifiers (UIDs) are crucial components in software engineering and data management. They serve as distinct references for entities within a system and ensure that each item – whether a database record, a user, or a file – can be uniquely identified and accessed.

UIDs are critical for maintaining data, enabling efficient search and retrieval, and supporting large-scale operations in distributed systems. As data volumes and system complexities grow, the need for scalable UID solutions becomes increasingly important.

In this article, you'll learn all about the history of unique identifiers, as well as how some modern solutions work.

The concept of unique identifiers has evolved significantly over time, reflecting the growing complexity and scale of human societies and technological systems. To understand why unique identifiers are so important today, let’s look at how identification has historically been managed and how it developed.

In early human societies, individuals were often identified by a single name. This was usually sufficient in small communities where everyone knew each other personally. But as populations grew, it became necessary to distinguish between individuals who shared the same first name. This led to the adoption of surnames.

For example, in Armenia 🇦🇲, surnames are used to identify individuals by their family or ancestry. Take the example of a person named Gor. In a small group of up to 50 people, let's say, identifying Gor by his first name alone is easy.

But as the group grows to a larger community of, say, 500 people, additional identifiers become necessary. Gor will be identified as Gor Grigoryan, indicating that he belongs to the Grigoryan family/ancestry. This surname provides a clearer identification and connects Gor to his family's lineage.

As societies continued to expand and bureaucratic systems became more complex, even surnames proved not enough for uniquely identifying individuals. This was especially true in larger cities and for the administration of government services. The need for more robust identification methods became apparent.

Government Management of Unique Identifiers

The introduction of passports in the early 20th century marked a significant step in this direction. Passports included unique personal identifiers, such as passport numbers, to distinguish between individuals clearly. These unique IDs ensured that each person could be accurately identified, regardless of name similarities or other ambiguities.

Several countries pioneered the use of unique personal identification numbers to address this need:

Germany 🇩🇪: In the 19th century, Germany implemented a system for tracking individuals for social welfare and military conscription purposes.
Sweden 🇸🇪: Sweden began issuing personal identification numbers (Personnummer) in the 1940s, providing each citizen with a unique identifier for use in various administrative processes.
France 🇫🇷: France introduced the National Identification Number (Numéro de Sécurité Sociale) in the mid-20th century to streamline social security administration and other government services.
United States 🇺🇸: The USA introduced Social Security Numbers (SSNs) in 1936, following the Social Security Act of 1935. This approach to unique identification has since been adopted worldwide, with countries issuing national identification numbers to their citizens.

Information page, Edwin James Tharp’s passport, March 27, 1936, Robert and Eva Tharp Collection.

As illustrated in the example image, the 1936 UK 🇬🇧 passport included detailed personal information such as eye color, hair color, profession, height, and information about the holder’s spouse and children.

A Social Security Number (SSN) in the United States is a nine-digit number formatted as "AAA-GG-SSSS". Each part of the SSN has historically carried specific information:

Area Number (AAA): Originally, the first three digits, known as the area number, represented the geographical region where the SSN was issued. This regional assignment helped to ensure a systematic distribution of numbers across the country.
Group Number (GG): The middle two digits, called the group number, were used to organize the numbers within a given area. The group numbers ranged from 01 to 99 and were issued in a specific order to prevent duplicate numbers within the same area.
Serial Number (SSSS): The last four digits are the serial number, which sequentially identifies each individual within a group. This part of the SSN ensures that even if the area and group numbers are the same, the overall SSN remains unique.
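
As a small illustration of the historical layout described above, the following Python sketch splits an SSN-shaped string into its three parts. The function name `split_ssn` and the sample value are hypothetical, and the sketch only checks the shape of the string, not whether a number was ever issued.

```python
import re

# Shape of "AAA-GG-SSSS": three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def split_ssn(ssn: str) -> dict:
    """Split an SSN-shaped string into area, group, and serial parts."""
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        raise ValueError(f"not in AAA-GG-SSSS format: {ssn!r}")
    area, group, serial = match.groups()
    return {"area": area, "group": group, "serial": serial}


# Hypothetical sample value, used only to show the three fields.
print(split_ssn("123-45-6789"))
```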

The Social Security Administration (SSA) has implemented several measures to ensure that each SSN is unique for the entire USA population (341.9 million people).

Governments around the world manage unique identifiers primarily for administrative purposes, such as social security, taxation, and national identification. These systems are designed to handle large populations and ensure that every citizen has a unique identifier for official records.

For example, the United States 🇺🇸 Social Security Administration (SSA) manages Social Security Numbers (SSNs) for over 330 million people. Similarly, the Indian 🇮🇳 government has issued Aadhaar numbers, a 12-digit unique identifier, to over 1.3 billion residents. These identifiers are crucial for accessing government services, benefits, and other official processes.

Aadhaar is the world's largest biometric ID system and has been described as "the most sophisticated ID program in the world".

Scalability in Government Systems

While government systems are large, they generally do not face the same scalability challenges as tech companies. Government databases are often centralized, and the rate at which new identifiers are issued is relatively steady and predictable. Also, the frequency of updates and interactions with these identifiers is lower compared to the dynamic environment of tech companies.

Tech companies, especially social media giants, operate on an entirely different scale. These companies manage billions of users and generate vast amounts of data daily. For instance, Meta (formerly Facebook) has over 3 billion monthly active users across its platforms, including Facebook, Instagram, and WhatsApp.

Tech Companies and Their Scale

Let's take a few examples:

Meta (Facebook)

User Base: With over 3 billion monthly active users, Meta needs a robust system to ensure that each user is uniquely identified.
Posts and Interactions: Facebook alone sees approximately 350 million new posts daily. Each of these posts, along with every comment, like, and share, requires a unique identifier to manage interactions efficiently.
Messages: WhatsApp users send around 100 billion messages every day, each needing a unique identifier to ensure messages are correctly routed and stored.
Unique Data Rows: With the combination of user profiles, posts, comments, likes, and messages, Meta likely manages more than 10 trillion unique data rows. (For scale, with a global population of approximately 8 billion people, 10 trillion is about 1,250 times the number of people on Earth.)

X (Twitter)

Twitter, another social media giant, has about 450 million monthly active users. On average, users send around 500 million tweets per day. Each tweet, reply, and retweet needs a unique identifier to maintain the platform's integrity and usability.

Telegram

Telegram is known for its high-traffic and robust messaging platform. With over 700 million monthly active users, Telegram experiences particularly high traffic spikes during events like New Year's Eve, where users send billions of messages within a short timeframe.

On a typical day, Telegram handles over 70 billion messages. Each message, channel post, and group interaction requires a unique identifier to ensure proper delivery and organization.

The scale at which tech companies operate requires sophisticated and highly scalable unique identifier systems. These systems must handle high concurrency, support distributed architectures, and ensure low latency.

The Role of Auto-increment IDs and Their Scalability Issues

Auto-increment IDs are a common method for generating unique identifiers in relational databases. When a new record is added to a table, the database automatically assigns the next available integer value to the ID field. This method is straightforward and ensures that each record within a table has a unique identifier without requiring any manual intervention.

Consider a table for storing user information in a relational database. When the first user is added, they might be assigned an ID of 1. The second user would receive an ID of 2, and so on.

While auto-increment IDs are simple and effective for small-scale applications, they face significant challenges in larger, distributed systems.

Concurrency Issues: In high-traffic applications, multiple transactions might attempt to insert records simultaneously. Ensuring that each transaction receives a unique auto-increment ID can lead to performance bottlenecks and require complex locking mechanisms.
Distributed Systems: In distributed databases, where data is spread across multiple servers, maintaining a global sequence for auto-increment IDs becomes problematic. Each server would need to coordinate with others to avoid generating duplicate IDs, which can significantly impact performance and reliability.
Single Point of Failure: Relying on a central authority to generate auto-increment IDs introduces a single point of failure. If the server responsible for assigning IDs goes down, the entire system might be unable to add new records.
Predictability: Auto-increment IDs are predictable. If someone knows the ID of one record, they can infer the IDs of subsequent records. This predictability can be a security concern in certain applications, such as those involving financial transactions or sensitive user data.

-- PostgreSQL: SERIAL creates an auto-incrementing integer column.
-- Each table keeps its own independent counter.
CREATE TABLE Admins (
    Id SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

CREATE TABLE Users (
    Id SERIAL PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

-- No Id is supplied; the database assigns 1, 2, ... on insert.
INSERT INTO Admins (Name)
VALUES ('GorGrigoryan'),
       ('GorGrigoryan2');

SELECT * FROM Admins;

-- +----+---------------+
-- | Id | Name          |
-- +----+---------------+
-- | 1  | GorGrigoryan  |
-- +----+---------------+
-- | 2  | GorGrigoryan2 |
-- +----+---------------+

Sequence Numbers and Their Advantages Over Auto-increment IDs

Sequence numbers generate unique identifiers from a counter that is incremented with each request. Unlike an auto-increment column, which is tied to a single table, a sequence is a standalone object: it can be shared across tables and, with the right design, across distributed systems, addressing some of the scalability and concurrency issues associated with auto-increment IDs.

How sequence numbers work:

Centralized Sequence Generators: A central service or database table generates and manages the sequence numbers. Each request for a new identifier increments the counter and returns the next value.
Distributed Sequence Generators: In a distributed environment, sequence numbers can be generated by dividing the range of possible values among different nodes or using more complex algorithms to ensure uniqueness without central coordination.

Consider a distributed database system with multiple nodes, each responsible for generating unique sequence numbers. The system might allocate ranges of sequence numbers to each node, ensuring that they can generate identifiers independently:

Node 1: Allocated sequence numbers 1,000,000 to 1,999,999
Node 2: Allocated sequence numbers 2,000,000 to 2,999,999
Node 3: Allocated sequence numbers 3,000,000 to 3,999,999

Each node can now generate up to one million unique identifiers without needing to communicate with a central server. This approach improves scalability and performance, particularly in environments with high write loads.
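
The range allocation above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names `NodeSequence` and `allocate_ranges` are hypothetical, and the block size and starting value simply mirror the example ranges in the text.

```python
class NodeSequence:
    """Hands out IDs from one node's pre-allocated, inclusive range."""

    def __init__(self, first: int, last: int) -> None:
        self.first = first
        self.last = last
        self._next = first

    def next_id(self) -> int:
        if self._next > self.last:
            # A real system would request a fresh block from the coordinator here.
            raise RuntimeError("range exhausted; a new block is needed")
        value = self._next
        self._next += 1
        return value


def allocate_ranges(node_count: int, block_size: int = 1_000_000,
                    start: int = 1_000_000) -> list:
    """Give each node its own non-overlapping block of sequence numbers."""
    return [
        NodeSequence(start + i * block_size, start + (i + 1) * block_size - 1)
        for i in range(node_count)
    ]


nodes = allocate_ranges(3)
for number, node in enumerate(nodes, start=1):
    print(f"Node {number}: {node.first:,} to {node.last:,}, next ID {node.next_id():,}")
```

Coordination is only needed when a block is handed out, not for every ID. The trade-off is that IDs are no longer globally ordered by creation time across nodes.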

-- PostgreSQL: a sequence is a standalone counter, independent of any table.
CREATE SEQUENCE UserIdentifier
INCREMENT 1
START 1;

-- Plain INT keys: the values come from the shared sequence, not from the table.
CREATE TABLE Admins (
    Id INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

CREATE TABLE Users (
    Id INT PRIMARY KEY,
    Name VARCHAR(255) NOT NULL
);

-- Each nextval() call returns the next value of the shared counter.
INSERT INTO Admins (Id, Name)
VALUES(nextval('UserIdentifier'), 'GorGrigoryan'),
(nextval('UserIdentifier'), 'GorGrigoryan2');

INSERT INTO Users (Id, Name)
VALUES(nextval('UserIdentifier'), 'UserGorGrigoryan'),
(nextval('UserIdentifier'), 'UserGorGrigoryan2');

SELECT * FROM Admins;

-- +----+---------------+
-- | Id | Name          |
-- +----+---------------+
-- | 1  | GorGrigoryan  |
-- +----+---------------+
-- | 2  | GorGrigoryan2 |
-- +----+---------------+

-- Ids continue from the same counter, so they are unique across both tables.
SELECT * FROM Users;

-- +----+-------------------+
-- | Id | Name              |
-- +----+-------------------+
-- | 3  | UserGorGrigoryan  |
-- +----+-------------------+
-- | 4  | UserGorGrigoryan2 |
-- +----+-------------------+

Another advantage of using sequence numbers is that you can obtain the ID of the entity before it is inserted into the database.

With auto-increment IDs, the value is typically assigned by the database upon insertion, which can limit flexibility. With sequence numbers, you can fetch the next ID on the application side before inserting, which is straightforward with some ORMs, e.g., EF Core in C#.

For more, see the SQL Server documentation on sequence numbers.

UUIDs: Overview and Usage

GUIDs (Globally Unique Identifiers), also known as UUIDs (Universally Unique Identifiers), are 128-bit identifiers designed to be globally unique. A UUID is typically displayed as 32 hexadecimal digits, divided into five groups separated by hyphens (36 characters in total). For example: 126e3456-e89b-12d3-a456-426614174000.

What's so great about UUIDs?

One of the standout features of GUIDs is their huge capacity for uniqueness. A 128-bit value can take 2¹²⁸ distinct values: 340,282,366,920,938,463,463,374,607,431,768,211,456 in total. (A few of those bits are fixed to mark the version and variant, so the number of valid GUIDs of any one version is somewhat smaller, but still astronomically large.) To put that into perspective, let's compare it with something tangible.

Did you know that scientists have attempted to calculate the number of grains of sand on Earth? Science writer David Blatner, in his book Spectrums, mentions that a group of researchers at the University of Hawaii tried to estimate this number. They determined that Earth has roughly (and we are speaking very roughly) 7.5 x 10¹⁸ grains of sand, or seven quintillion, five hundred quadrillion grains. For more, consider reading the article titled: "Which Is Greater, The Number Of Sand Grains On Earth Or Stars In The Sky?"

Now, to compare those numbers:

| GUIDs available | 340,282,366,920,938,463,463,374,607,431,768,211,456
| Sand grains     | 7,500,000,000,000,000,000

If you decided to create an application to track every grain of sand on Earth and assign each a unique identifier, you could easily do that using GUIDs. The fun part is that you could repeat this process about 45 quintillion (4.5 × 10¹⁹) times over without running out of unique GUIDs! 🤯
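
The comparison is easy to check with Python's arbitrary-precision integers. The sketch below takes 2¹²⁸ as the number of 128-bit values and uses the rough sand-grain estimate of 7.5 × 10¹⁸ quoted above; the variable names are only illustrative.

```python
# Number of distinct 128-bit values.
TOTAL_128_BIT_VALUES = 2 ** 128

# Rough estimate quoted above: 7.5 x 10^18 grains of sand.
SAND_GRAINS = 75 * 10 ** 17

# How many times every grain of sand could receive a fresh identifier.
times_over = TOTAL_128_BIT_VALUES // SAND_GRAINS

print(f"128-bit values: {TOTAL_128_BIT_VALUES:,}")
print(f"Sand grains:    {SAND_GRAINS:,}")
print(f"Times over:     {times_over:,}")
```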

UUID Version 1

UUID Version 1 generates unique identifiers based on the current timestamp, clock sequence, and node identifier (typically the MAC address of the machine generating the UUID).

According to RFC 4122, the timestamp is a 60-bit count of 100-nanosecond intervals since midnight UTC on October 15, 1582 (the date of the Gregorian calendar reform). Most computers do not have a clock that ticks at that resolution. Instead, a random number is often used to fill in timestamp digits beyond the computer's measurement accuracy.

When multiple version-1 UUIDs are generated in a single API call, the random portion may be incremented rather than regenerated for each UUID. This ensures uniqueness and is faster to generate.

UUID v1 also embeds the MAC address of the generating machine. Because MAC addresses are intended to be globally unique, two different computers should never generate the same UUID. Note, however, that this also means version 1 UUIDs can be traced back to the computer that generated them.

This ensures that the UUID is unique across both time and space. It is suitable when the generation time and machine uniqueness are important. It is often used in systems where the timestamp of creation is relevant or needed.
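
To make the embedded fields concrete, here is a minimal Python sketch using the standard `uuid` module. The helper name `uuid1_created_at` is hypothetical. Note that `uuid.uuid1()` falls back to a random node value when it cannot determine a hardware address, so the printed node is not guaranteed to be a real MAC address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_created_at(value: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 UUID")
    # value.time is the 60-bit count of 100-nanosecond intervals;
    # dividing by 10 converts it to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


example = uuid.uuid1()
print(example)
print("created at:", uuid1_created_at(example))
print(f"node:       {example.node:012x}")
```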

UUID Version 4

UUID Version 4 generates identifiers using random or pseudo-random numbers. This method ensures a high probability of uniqueness due to the vast number of possible GUIDs. This is the most common UUID version.

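RFC 4122 defines several UUID variants, which determine the overall bit layout. Two are commonly encountered:

Variant 1: The standard RFC 4122 layout (also called Leach-Salz), used by nearly all UUIDs generated today
Variant 2: A layout reserved for backward compatibility with older Microsoft "GUID" formats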

A version 4 GUID is almost entirely random: 122 of its 128 bits are random, and the remaining 6 mark the version and variant. This makes it simple to generate and ensures that each identifier is unique with a very high probability. It is written as 32 hexadecimal characters using digits (0-9) and letters (A-F), grouped in the format 8-4-4-4-12 and separated by hyphens, like this: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
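
As a quick check of that format, the following Python sketch generates a version 4 UUID with the standard `uuid` module and inspects its textual layout.

```python
import uuid

value = uuid.uuid4()
text = str(value)

print(text)
# Length of each hyphen-separated group: 8-4-4-4-12.
print([len(part) for part in text.split("-")])
# The version field is fixed even though the rest is random.
print("version:", value.version)
```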

The great thing about GUIDs is that you don’t need a central system to create them. Anyone can generate a GUID using an algorithm, and it will still be unique across different systems and applications. They are designed to be used nearly everywhere a unique identifier is needed. Here are some usage examples:

Windows: Uses GUIDs to identify COM classes and interfaces, among other system components
Microsoft SQL Server: Uses GUIDs as primary keys to ensure global uniqueness across distributed databases
AWS: Uses UUID-formatted values in parts of its cloud infrastructure, such as request IDs
eBay: Uses GUIDs to identify listings, transactions, and users

UUID Version 5

UUID Version 5 generates unique identifiers based on a namespace identifier and a name. The namespace and name are combined and hashed using SHA-1 to produce the UUID. This ensures that the same namespace and name combination will always produce the same UUID. In UUID, the namespace must be a UUID, and the name can be anything.

UUID V5 is useful for generating consistent unique identifiers for the same input data across different systems and contexts. Let's say we want to generate a user ID based on their username.

Here the UUID Version 5 solves several important problems, particularly when you need a consistent and unique identifier based on a given input.

For instance, consider a scenario where you need a user ID to make an API call (or anything else), but in your code, you only have the username accessible. How would the problem be solved if we were using UUID Version 4 (GUIDs)? Most likely, it would work something like this:

/* When using GUID (UUID v4) */

var userName = "bob"; // Let's assume we only have the username
// API call or DB call to look up the user ID by name
var userId = await userService.GetUserIdAsync(userName);

await userService.ChangeUserNameAsync(userId, "bob-2");

By using UUID Version 5 with a shared namespace across all your projects, you can easily generate the user ID from the username without making any additional API calls. So the same code would look like this:

/* When using UUID v5 */

// From some shared code
var userNamespace = SharedConstants.UserNamespace;

var userName = "bob"; // Let's assume we only have the username

// Generate the user ID in place, without an additional call
var userId = Uuid.NewNameBased(userNamespace, userName);

await userService.ChangeUserNameAsync(userId, "bob-2");

This approach eliminates the need for redundant API calls. In a distributed system, making an API call to fetch a user ID every time you need it can be inefficient and slow. With UUID Version 5, you can locally generate the user ID from the username (or any other input), reducing the need for network requests and significantly improving the efficiency of your application.

So what problem have we solved with UUID v5? If you need a user ID to make an API call but only have the username in your code, and the namespace is shared across all your projects, you can derive the user ID from the username without making any API call. That's because UUID v5 always produces the same UUID for the same input.

Also, UUID Version 5 ensures uniqueness and consistency across different systems. When integrating multiple systems or microservices, it can be challenging to keep user IDs consistent across various services. By using the same namespace and the same input (such as a username), UUID Version 5 guarantees that the generated IDs are unique and consistent across all systems, facilitating smoother integration and data consistency.
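
The determinism described above can be checked with Python's standard `uuid.uuid5`. In this minimal sketch the namespace is an assumption: it is derived from a placeholder domain (`users.example.com`), and `user_id_for` is a hypothetical helper. In practice every service would need to agree on one namespace UUID.

```python
import uuid

# Hypothetical shared namespace; every service must use the same value.
USER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "users.example.com")


def user_id_for(username: str) -> uuid.UUID:
    """Derive a user ID from the username: same input, same UUID."""
    return uuid.uuid5(USER_NAMESPACE, username)


first = user_id_for("bob")
second = user_id_for("bob")

print(first)
print("same input, same ID:     ", first == second)
print("different input, same ID:", first == user_id_for("alice"))
```

One consequence of deriving the ID from the name: a different name yields a different ID, so the derived ID only stays stable while the input does.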

UUID Version 7

UUID Version 7, standardized in RFC 9562 (published in May 2024), aims to combine the strengths of both timestamp-based and random-based GUIDs.

Problems with UUID v4 (GUID)

UUID Version 4 generates non-time-ordered values, meaning the identifiers created are not sequential. Since these values are randomly generated, they won't be clustered together in a database index. Instead, inserts will occur at random locations, which can negatively impact the performance of common index data structures, such as B-trees and their variants.

In a scenario where your product requires frequent access to recent data, non-sequential identifiers create a significant challenge.

With UUID Version 4, the most recent data will be inserted randomly throughout the index, lacking clustering. As a result, retrieving the most recent data from a large dataset requires traversing numerous database index pages.

In contrast, using sequential identifiers ensures that the latest data is logically arranged at the right-most part of the index, making it much more cache-friendly. This organization allows for faster and more efficient retrieval of recent data, as it minimizes the number of index pages that need to be accessed. UUID v4 lacks this property.

The solution with UUID v7

UUID v7 is designed to provide unique and sortable identifiers that are both easy to generate and useful for distributed systems. It uses a combination of timestamps and random data to ensure both uniqueness and temporal order.

The first part of the UUID is a timestamp that provides a chronological component, ensuring that UUIDs generated close together in time are also close together in value. The remaining part is filled with random data, ensuring the uniqueness of each identifier.
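
To show how the timestamp and random parts fit together, here is a minimal Python sketch that assembles a version 7 UUID by hand. It assumes the RFC 9562 layout (48-bit Unix timestamp in milliseconds, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The function name `uuid7_sketch` is hypothetical, and the sketch skips the optional monotonicity guarantees for IDs created within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version 7 style UUID: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = random_bits >> 68                # top 12 bits
    rand_b = random_bits & ((1 << 62) - 1)    # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76                         # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier)
print(later)
# The timestamp occupies the most significant bits, so values sort by time.
print("sorted by time:", earlier < later)
```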

Buildkite post about migrating to UUID v7

UUID Versions 2, 3, and 6

You may have noticed that our discussion focuses on UUID Versions 1, 4, 5, and 7, and skips over Versions 2, 3, and 6. Here's why:

UUID Version 2: This version is rarely used in modern applications. It’s similar to Version 1 but includes additional fields for things like domain information (such as POSIX UID or GID). It was mainly used in legacy systems and is now considered largely obsolete.
UUID Version 3: This version is based on a name and a namespace, similar to Version 5. The main difference is that Version 3 uses the MD5 hashing algorithm, which has weaker collision resistance than the SHA-1 algorithm used in Version 5. Version 5 is generally preferred unless backward compatibility with existing Version 3 identifiers is required.
UUID Version 6: Version 6 was standardized alongside Version 7 in RFC 9562. It is a time-ordered rearrangement of the Version 1 fields, intended mainly for systems that already rely on Version 1. For new systems, the RFC recommends Version 7, so that is the version we focus on.

Snowflake ID

Snowflake ID is a unique identifier generation system developed by Twitter to address the challenges of generating unique, sequential, and distributed identifiers in a highly scalable and efficient manner.

Unlike GUIDs, which are often non-sequential and can cause performance issues in database indexing, Snowflake IDs are designed to be both time-ordered and globally unique, making them ideal for distributed systems and databases where sequential order is important.

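A Snowflake ID is a 64-bit integer. The top bit is left unused, so the value stays positive when stored as a signed integer, and the remaining 63 bits are split into several distinct parts: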

Timestamp (41 bits): The largest portion of the Snowflake ID is the timestamp, which records the number of milliseconds since a custom epoch (often set to the date when the system was first deployed). This ensures that IDs are time-ordered and can be easily sorted based on their creation time.
Datacenter ID (5 bits): This part of the ID identifies the datacenter where the ID was generated, allowing the system to generate unique IDs across multiple data centers without conflicts.
Machine ID (5 bits): Similar to the datacenter ID, the machine ID identifies the specific server or machine within the datacenter that generated the ID. This ensures that even within the same data center, IDs remain unique.
Sequence Number (12 bits): The sequence number is used to differentiate between multiple IDs generated within the same millisecond by the same machine. With 12 bits, up to 4,096 unique IDs can be generated per machine per millisecond.
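
The bit layout above can be turned into a small generator. The following Python sketch is an illustration under stated assumptions, not Twitter's implementation: the class name `SnowflakeGenerator` is hypothetical, and the epoch constant is the value commonly cited for Twitter's Snowflake, used here only as an example of a custom epoch.

```python
import threading
import time

# Assumed custom epoch in milliseconds; any fixed past instant works
# as long as every generator agrees on it.
EPOCH_MS = 1288834974657


class SnowflakeGenerator:
    """Hypothetical generator using the 41/5/5/12-bit layout described above."""

    def __init__(self, datacenter_id: int, machine_id: int) -> None:
        if not 0 <= datacenter_id < 32:
            raise ValueError("datacenter_id must fit in 5 bits")
        if not 0 <= machine_id < 32:
            raise ValueError("machine_id must fit in 5 bits")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                # Same millisecond: bump the 12-bit sequence.
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # All 4,096 values used; wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << 22)       # 41-bit timestamp
                | (self._datacenter_id << 17)  # 5-bit datacenter ID
                | (self._machine_id << 12)     # 5-bit machine ID
                | self._sequence               # 12-bit sequence
            )


generator = SnowflakeGenerator(datacenter_id=3, machine_id=7)
first, second = generator.next_id(), generator.next_id()
print(first, second, first < second)
```

No coordination is needed at generation time: as long as each generator has a distinct datacenter and machine ID pair, the IDs cannot collide, and the leading timestamp keeps them roughly sorted by creation time.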

The format was created by Twitter (now X) and is used for the IDs of tweets. It is popularly believed that every snowflake has a unique structure, so they took the name "snowflake ID". The format has been adopted by other companies, including Discord and Instagram. The Mastodon social network uses a modified version.

The format was first announced by X/Twitter in June 2010. Due to implementation challenges, they waited until later in the year to roll out the update.

X uses snowflake IDs for posts, direct messages, users, lists, and all other objects available over the API.
Discord also uses snowflakes, with their epoch set to the first second of the year 2015.
Instagram uses a modified version of the format, with 41 bits for a timestamp, 13 bits for a shard ID, and 10 bits for a sequence number.
Mastodon's modified format has 48 bits for a millisecond-level timestamp, as it uses the UNIX epoch. The remaining 16 bits are for sequence data.

"The Problem" stated by Twitter:

We currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters. For various reasons, the details of which merit a whole blog post, we’re working to replace many of these systems with the Cassandra distributed database or horizontally sharded MySQL (using gizzard).

Unlike MySQL, Cassandra has no built-in way of generating unique ids – nor should it, since at the scale where Cassandra becomes interesting, it would be difficult to provide a one-size-fits-all solution for ids. Same goes for sharded MySQL. We needed something that could generate tens of thousands of ids per second in a highly available manner.

This naturally led us to choose an uncoordinated approach. These ids need to be roughly sortable, meaning that if tweets A and B are posted around the same time, they should have ids in close proximity to one another since this is how we and most Twitter clients sort tweets.

Additionally, these numbers have to fit into 64 bits. We’ve been through the painful process of growing the number of bits used to store tweet ids before. It’s unsurprisingly hard to do when you have over 100,000 different codebases involved.

Finding Tweet Timestamps

It is often said that a tweet can never truly be deleted: once it's out there, traces of it remain. Twitter's use of Snowflake IDs adds an interesting twist to this narrative. Snowflake IDs are designed to be unique and time-ordered, which makes them not just identifiers but also a trail that can be tracked.

On May 11, 2019, Derek Willis from Politwoops uncovered a list of deleted tweet IDs. By using the Snowflake structure, he was able to extract the timestamps from these IDs, and discovered the 107 missing tweets. This finding inspired the creation of TweetedAt, a tool designed to accurately retrieve timestamps from Snowflake IDs and estimate the timing of tweets generated before Snowflake was in use.
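
Extracting a timestamp from a Snowflake ID comes down to one shift and one addition. The Python sketch below assumes the 22 low bits hold the datacenter, machine, and sequence fields and uses the epoch value commonly cited for Twitter's Snowflake. The function name is hypothetical, and the sample ID is constructed for the example rather than taken from a real tweet.

```python
from datetime import datetime, timedelta, timezone

# Assumed Twitter Snowflake epoch, in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(snowflake_id: int,
                          epoch_ms: int = TWITTER_EPOCH_MS) -> datetime:
    """Recover the creation time stored in the upper bits of a Snowflake ID."""
    # Drop the 22 low bits (datacenter, machine, sequence) to get the timestamp.
    timestamp_ms = (snowflake_id >> 22) + epoch_ms
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


# Construct a sample ID for a known instant, then decode it again.
created = datetime(2019, 5, 11, 12, 0, tzinfo=timezone.utc)
created_ms = (created - UNIX_EPOCH) // timedelta(milliseconds=1)
sample_id = (created_ms - TWITTER_EPOCH_MS) << 22

print(sample_id)
print(snowflake_to_datetime(sample_id))
```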

Wrapping Up

Unique identifiers play a critical role in software engineering, ensuring data integrity and enabling efficient data management across distributed systems.

From traditional GUIDs to modern solutions like Snowflake IDs, each identifier system offers distinct advantages tailored to specific use cases.

As technology evolves, understanding these systems and their implementations becomes increasingly important for scaling applications effectively. By exploring the various versions and alternatives, we can make informed decisions that best suit our needs in managing data at scale.

Cover image: A 2017 post celebrating Facebook reaching 2 billion users.

Decoding Chaos: How True Randomness Works in Software Engineering

Gor Grigoryan — Mon, 06 May 2024 16:27:18 +0000

Understanding Randomness

When you hear the word "randomness," what usually comes to mind? You may think of something intangible, an abstract concept without a specific shape or form – it's random.

But randomness is much more than an abstract idea – it's a fundamental aspect of our daily decisions and choices. Whether it's deciding what to eat for breakfast or picking a number from 1 to 10 in a game, randomness plays a crucial role.

Randomness isn't just about unpredictability. It's also about the lack of pattern or predictability in events. For instance, when you toss a coin, the outcome of heads or tails is random because it's equally likely and unpredictable.

Why is Randomness Important in Software Engineering?

This concept is incredibly important in the field of software engineering, where generating true randomness can enhance security, simulations, and algorithms. In software development, this unpredictability is not just a feature—it's a fundamental requirement for various critical functions.

Security

The most crucial role of randomness in software is in the realm of security. Random numbers are used to generate secure keys for encryption, ensuring that sensitive data—be it personal information, financial details, or confidential communications—is protected from unauthorized access.

The randomness ensures that these keys cannot be easily predicted or replicated, fortifying the security barriers (see more in the Randomness in Cryptographic Systems section).
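
As a brief illustration, Python's standard `secrets` module draws from the operating system's cryptographically secure random source, which is what key material should be based on (the general-purpose `random` module is not designed for this). The sizes below are arbitrary choices for the example.

```python
import secrets

# 32 random bytes (256 bits), suitable as raw key material.
key = secrets.token_bytes(32)

# A URL-safe random token, e.g. for a session identifier.
token = secrets.token_urlsafe(16)

print(len(key), "bytes of key material")
print("token:", token)
```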

Testing and Quality Assurance

Developers use random inputs to simulate how software might perform under different conditions. This approach helps uncover unexpected bugs and ensures that the software can handle a variety of scenarios, improving its reliability and stability.

Companies like Netflix, Facebook, and Google use Chaos Engineering to make their systems more reliable (learn more in the Chaos Engineering section).

Simulation and Modeling

Randomness is a key component in simulations that mimic real-world phenomena, which can be inherently unpredictable. Whether it's modeling climate patterns, economic markets, or traffic flows, randomness helps create more accurate models that better reflect the complexity of these systems.

Additional Applications

Randomness is used in many other areas: it helps distribute tasks across servers in load balancing, improves efficiency in traffic routing, and adds realism in image generation. It's also crucial for creating unique identifiers like GUIDs (Globally Unique Identifiers) and for shuffling playlists to enhance user experience. As you can see, the use cases for randomness are numerous.

Prerequisites

This article is designed to be accessible, with explanations straightforward enough for readers with various backgrounds. However, a few basic prerequisites can enhance your understanding:

Basic Programming Knowledge: While not essential, some familiarity with programming concepts in languages like C#, Java, or Python could help you grasp examples of how randomness is implemented in code more quickly.
Elementary Math Skills: A basic understanding of probability and statistics is beneficial but not necessary, as the article aims to explain these concepts in simple terms.
Introductory Cryptography: If you're curious about the security aspects of randomness, some background in cryptography concepts like encryption and key generation could be helpful.

Overall, the article is structured to be easy to follow, with no advanced knowledge required. It's meant to introduce the concept of randomness in software engineering broadly, making it suitable for readers from diverse fields.

Here's what we'll cover in this article:

Understanding Randomness
Coin Toss Paradigm
The Illusion of Human Randomness
How Random Number Generators Work
- Simple random number generator
True Random Number Generation (TRNG) and Entropy Sources
Randomness in Software Testing
- Chaos Monkey developed by Netflix
Randomness in Cryptographic Systems
- Could you hack the encryption?
Randomness in Simulation and Modeling
- Monte Carlo Simulation
Future of Randomness in Software Engineering
- Quantum Computing and Quantum Randomness
Wrapping Up

Coin Toss Paradigm

Is tossing a coin truly a random event? At first glance, a coin toss represents the paradigm of randomness: two outcomes, each with an equal chance of occurring.

But if we dive deeper into the physics behind a coin toss, the story starts to unfold differently. Hypothetically, if we could control and replicate every variable involved in the toss – the force applied, the angle of the toss, the air resistance, and even the surface it lands on – would the outcome still be unpredictable?

The answer leans towards a surprising declaration: in a perfectly controlled environment, the result of a coin toss could be predicted with near certainty. This challenges our understanding of randomness, suggesting that what we often perceive as random is influenced by numerous factors, many of which are beyond our control or too complex to replicate in practice.

Thus, we arrive at an insightful conclusion that randomness ≈ the result of variables that are exceedingly difficult to replicate.

A well-known paper by Persi Diaconis, Susan Holmes, and Richard Montgomery, titled “Dynamical Bias in the Coin Toss” and hosted by the University of California at Berkeley, delves into this phenomenon:

Abstract: We analyze the natural process of flipping a coin which is caught in the hand. We show that vigorously flipped coins tend to come up the same way they started. The limiting chance of coming up this way depends on a single parameter, the angle between the normal to the coin and the angular momentum vector. Measurements of this parameter based on high-speed photography are reported. For natural flips, the chance of coming up as started is about .51

[Dynamical Bias in the Coin Toss](https://www.stat.berkeley.edu/~aldous/157/Papers/diaconiscoinbias.pdf)
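To get a feel for how small a 51% bias is, here is a minimal simulation sketch. It does not model the physics of a flip. It simply assumes the paper's 0.51 figure, and the class name CoinBiasDemo, the method name, and the fixed seed are made up for illustration.

```csharp
using System;

class CoinBiasDemo
{
    // Simulates `flips` tosses where the coin lands the same way it started
    // with probability pSame, and returns the observed fraction.
    public static double SameSideFraction(int flips, double pSame, Random rng)
    {
        int same = 0;
        for (int i = 0; i < flips; i++)
        {
            if (rng.NextDouble() < pSame) same++;
        }
        return (double)same / flips;
    }

    static void Main()
    {
        var rng = new Random(12345); // arbitrary fixed seed so runs are repeatable
        foreach (int flips in new[] { 100, 10_000, 1_000_000 })
        {
            double fraction = SameSideFraction(flips, 0.51, rng);
            Console.WriteLine($"{flips,9} flips: same side as start {fraction:P2}");
        }
    }
}
```

With only 100 flips the bias is usually lost in the noise. It tends to become visible only after many thousands of flips, which is part of why a coin toss feels perfectly fair in everyday use.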

The Illusion of Human Randomness

For humans, it seems easy to generate a random number, say a random word, or make a random decision. But is the result truly random, or can it be predicted, much like the coin toss above?

If you have seen the 2015 movie Focus, you may remember the "priming" scene where they spend the day "priming" their victim to subconsciously recognize and choose the number 55 by having it represented all around him.

Priming is one of the most important psychological principles to understand because it influences behavior through implicit memory. In other words, exposure to a cue in one setting can form an association that carries into another.

One of the examples of priming comes to us from a supermarket bottle shop. Imagine one week you go into the bottle shop and there’s some French music playing in the background. You buy your wine and leave.

Now imagine you return a week later, but this time German music is piping through the speakers. Again, you buy your wine and leave. Chances are that when French music was playing, you purchased French wine, and when German music was playing, German wine – just like 77% and 73% of research participants did.

Were these consumers aware of the music and its impact on their decision? 86% of people said no, the music had no effect.

This phenomenon underscores a profound truth: whether knowingly or not, we are both the primers and the primed. Our perceived randomness in decision-making is continuously shaped by the stimuli around us. This reveals that the essence of human randomness is far more complex and influenced than we might initially believe.

How Random Number Generators Work

Let’s take a journey back to the early days of computing to understand the evolution of random number generators.

Initially, computers were quite basic compared to today’s sophisticated machines. Essentially, a computer operates on a strict set of instructions: it cannot spontaneously pick a number the way a human might choose one from 1 to 10.

For a computer, generating a random number requires specific instructions. Today, this task has become straightforward in many programming languages through built-in functions. For example, in C#, you can generate a random number between 1 and 10 with this simple command:

new Random().Next(1, 11) // <-- Generates a random number from 1 to 10 (the upper bound is exclusive)

The interesting part begins when we look under the hood.

Simple random number generator

What if you were given a task to create a function that generates a random number? Let’s say you have this function:

public static int GenerateRandomNumber(int start, int end)
{
  return ✨🪄 magic ✨🪄
}

One of the simplest ways to do this is using a Linear Congruential Generator (LCG). The example below is a simplistic approach and you shouldn't use it for cryptographic purposes or applications requiring high levels of randomness.

using System;

class SimpleRandomGenerator
{
    // LCG parameters: multiplier, increment, and modulus (2^48)
    private const long a = 25214903917;
    private const long c = 11;
    private const long m = 1L << 48;

    private long seed;

    public SimpleRandomGenerator(long seed)
    {
        this.seed = seed;
    }

    public int Next(int min, int max)
    {
        // Update the seed: (a * seed + c) mod m.
        // The multiplication can overflow 64 bits, which would make a plain %
        // return negative values. Because m is a power of two, masking with
        // (m - 1) gives the correct, non-negative remainder.
        seed = unchecked(a * seed + c) & (m - 1);

        // Ensure the result is within the bounds [min, max)
        int result = (int)(min + (seed % (max - min)));
        return result;
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Seed with the current time so that each run starts from a different state
        var generator = new SimpleRandomGenerator(DateTime.Now.Ticks);

        for (int i = 0; i < 15; i++)
        {
            var rndNumber = generator.Next(1, 101);

            Console.WriteLine($"Random number between 1 and 100: {rndNumber}");
        }
    }
}

/* Output: 15 lines of the form
Random number between 1 and 100: N
where N changes from run to run but always satisfies 1 <= N <= 100.
*/

This example uses the Linear Congruential Generator (LCG) method, which is a basic pseudorandom number generator.

LCGs are one of the oldest and simplest methods for generating sequences of pseudo-random numbers. They operate on a simple formula: new seed = (a × seed + c) mod m. The seed is typically initialized with a value that differs between runs, such as the current time (DateTime.Now.Ticks in this case). The Next method generates a new "random" number within the specified range [min, max).

Here's the step-by-step logic:

Update the Seed: The seed is updated using the LCG formula mentioned above. This step is critical: each new seed is computed from the old one, so every call to Next advances the sequence instead of returning the same value again.
Scaling the Output: Once the new seed is calculated, it needs to be adjusted to fall within the user-specified range [min, max).
– The modulus operation seed % (max - min) scales the seed to a value within the range of 0 to (max - min) - 1.
– Adding min shifts this scaled value into the desired range, ensuring that the result is at least min but less than max.
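Because every new seed is computed from the previous one, the whole sequence is fixed by the starting seed. The sketch below illustrates this with a compact LCG along the lines of the one above. TinyLcg is a hypothetical name, the seed 42 is arbitrary, and the seed update uses a bit mask (valid because the modulus is a power of two) so the state always stays non-negative.

```csharp
using System;

class TinyLcg
{
    private const long A = 25214903917;
    private const long C = 11;
    private const long M = 1L << 48; // modulus 2^48
    private long seed;

    public TinyLcg(long seed)
    {
        this.seed = seed & (M - 1); // keep the starting state inside [0, M)
    }

    public int Next(int min, int max)
    {
        // new seed = (A * seed + C) mod M, computed with a mask
        seed = unchecked(A * seed + C) & (M - 1);
        return (int)(min + seed % (max - min));
    }
}

class Program
{
    static void Main()
    {
        var first = new TinyLcg(42);
        var second = new TinyLcg(42); // same seed as `first`

        for (int i = 0; i < 5; i++)
        {
            int x = first.Next(1, 101);
            int y = second.Next(1, 101);
            Console.WriteLine($"{x} {y} equal: {x == y}");
        }
    }
}
```

Every line ends with "equal: True": two generators started from the same seed produce the same numbers in the same order. This is what "pseudo" means in pseudo-random, and it is why anyone who learns the seed can reproduce the entire sequence.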

True Random Number Generation (TRNG) and Entropy Sources

Random number generation based on natural events or hardware characteristics involves using unpredictable, non-deterministic sources to generate randomness. This approach is often referred to as using "entropy sources" or "true random number generation" (TRNG).

Unlike pseudo-random number generators (PRNGs) that use mathematical algorithms and require a seed value, true random number generators derive their randomness from physical events that are practically impossible to predict. Here are a few examples:

Earthquakes in TRNG

Earthquakes generate seismic data that is almost unpredictable and can be used as a source of randomness. By measuring seismic activity through geophones or seismographs, the minute variations in the Earth's movement can be converted into random numbers.

Earthquakes occur due to the sudden release of energy in the Earth's crust, resulting in the ground shaking. This energy release is unpredictable and varies in magnitude, location, and frequency. The unpredictability of the timing, duration, and intensity of seismic events makes this a viable entropy source.

[USGS Magnitude 2.5+ Earthquakes data, Past Day](https://earthquake.usgs.gov/earthquakes/map/?currentFeatureId=pr71446783&extent=9.79568,-147.39258&extent=58.99531,-42.62695)

Additional technical details

Here are some additional technical details about earthquakes in TRNG:

Data collection is typically done using instruments called seismometers or geophones, which are sensitive to ground vibrations. These devices convert the kinetic energy of ground movements into electrical signals that can then be digitized and analyzed.

This process might include:

Signal Conditioning and Filtering: Filtering the seismic signals to isolate the random components from predictable noise or background vibrations.
Digitization: Converting the analog signals into digital values, which typically involves sampling the signal at regular intervals and quantizing these samples into digital values.

The raw digital data derived from seismic activity might not be uniformly random due to natural biases in how earthquakes occur or how data is collected.

To ensure that the numbers generated are suitable for use in applications requiring high-quality randomness (such as cryptographic systems), further processing might be necessary.

Here are the common techniques:

Debiasing: Applying algorithms to remove any predictable patterns or biases from the data.
Whitening: Transforming the data to ensure a uniform distribution across all possible values. This often involves statistical tests to adjust the output until it meets the criteria for randomness.
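One classic debiasing technique is the von Neumann extractor, sketched below. It is only one possible choice and is not taken from any particular seismic system. It also assumes the raw bits are independent of each other. The biased input here is simulated with System.Random as a stand-in for a physical sensor, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

class VonNeumannDebias
{
    // Reads bits in non-overlapping pairs: 01 -> 0, 10 -> 1, while 00 and 11 are discarded.
    public static List<bool> Debias(IReadOnlyList<bool> rawBits)
    {
        var output = new List<bool>();
        for (int i = 0; i + 1 < rawBits.Count; i += 2)
        {
            if (rawBits[i] != rawBits[i + 1])
            {
                output.Add(rawBits[i]); // the first bit of a mixed pair is the output bit
            }
        }
        return output;
    }

    static double FractionOfOnes(List<bool> bits)
    {
        int ones = 0;
        foreach (bool b in bits) if (b) ones++;
        return bits.Count == 0 ? 0 : (double)ones / bits.Count;
    }

    static void Main()
    {
        var rng = new Random(2024); // simulated stand-in for a biased physical source
        var raw = new List<bool>();
        for (int i = 0; i < 100_000; i++) raw.Add(rng.NextDouble() < 0.8); // about 80% ones

        List<bool> clean = Debias(raw);
        Console.WriteLine($"Raw ones:      {FractionOfOnes(raw):P1} of {raw.Count} bits");
        Console.WriteLine($"Debiased ones: {FractionOfOnes(clean):P1} of {clean.Count} bits");
    }
}
```

The debiased stream should sit close to 50% ones, but it is much shorter than the input. Trading throughput for quality in this way is typical of post-processing steps.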

Using earthquakes for random number generation could be particularly valuable in applications where an external, unpredictable source of randomness is beneficial.

But there are cons and practical considerations:

Geographical Limitations: Not all locations experience frequent seismic activity, which could limit the availability of this method to specific regions.
Event Rarity: Significant seismic events are relatively rare and unpredictable in timing, which might not provide a steady or reliable source of randomness when needed.
Data Collection and Processing Overhead: The infrastructure and computational effort required to capture, process, and utilize seismic data for random number generation can be significant.

Hardware Events in TRNG

Hardware-based random number generators (HRNGs) use physical processes within computing devices to generate randomness. Examples include:

Thermal Noise (Johnson-Nyquist Noise):

Thermal noise, also known as Johnson-Nyquist noise, is a type of interference naturally present in all electronic devices and circuits. It’s caused by the random motion of electrons within a material due to heat. This phenomenon can be used as a source of randomness for generating random numbers in hardware devices.

Every material that conducts electricity has electrons, which are tiny particles that move around and carry electrical current. Even when a device isn’t actively being used, these electrons are never completely still – they move randomly because of the heat energy within the material. The higher the temperature, the more active the electrons become.

Thermal noise is generated by the inherent energy present in all materials at temperatures above absolute zero (-273.15°C or -459.67°F). At these temperatures, electrons gain energy and start moving randomly. This movement causes tiny, random fluctuations in the electrical current when measured across components like resistors.

Thermal noise is ideal for cryptographic applications where high security is essential. This includes key generation and secure communications where unpredictability is paramount to preventing attacks.

In developing secure communication protocols for applications like instant messaging, VoIP, or data transmission systems, thermal noise can be used to generate encryption keys that are nearly impossible to predict, enhancing security.

Clock Drift

Clock drift occurs due to the slight and unpredictable variations in the timing mechanisms (like crystal oscillators) of computers and other digital devices. Generators based on clock drift exploit this natural variability in hardware clocks, which are designed to measure time but drift apart due to minor differences in the frequency of their oscillators.

By comparing the time reported by two or more independent clocks, small differences that occur naturally and unpredictably can be measured. These differences are influenced by factors such as temperature changes, hardware imperfections, and supply voltage variations.

[A USB-pluggable hardware true random number generator](https://en.wikipedia.org/wiki/Hardware_random_number_generator#Clockdrift)

Photonic Emission

Photonic emission-based random number generation uses the process of light emission to create random numbers. This approach relies on the quantum nature of light – specifically, the behavior of photons, the tiny particles that make up light.

Photonic emission occurs when energy is released from atoms in the form of light. This happens in devices like LEDs (light-emitting diodes) and lasers.

In an LED, when electricity flows through the device, it excites electrons (tiny negatively charged particles) to higher energy states. As these electrons return to their normal states, they release energy in the form of photons.

The exact moment a photon is emitted is inherently unpredictable due to the principles of quantum mechanics, where particles like electrons behave in a probabilistic manner.

To turn photonic emission into random numbers, we first need to detect these photons. We can do this using a device called a photodetector, which captures the light and converts each photon hit into an electrical signal.

The key to randomness lies in the timing of each photon’s arrival at the detector. Since the emission of each photon is random, the times they are detected are also random. These times are then recorded with high precision.
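One simple way to turn recorded arrival times into bits is to compare consecutive gaps between detections. The sketch below shows that idea only. It is not how any specific device works. The timestamps are simulated with System.Random as a placeholder for a real photodetector, and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

class PhotonTimingBits
{
    // Compares pairs of consecutive, non-overlapping gaps between detections:
    // first gap shorter -> 0, first gap longer -> 1, equal gaps -> no bit.
    public static List<bool> ExtractBits(IReadOnlyList<double> timestamps)
    {
        var bits = new List<bool>();
        for (int i = 0; i + 2 < timestamps.Count; i += 2)
        {
            double firstGap = timestamps[i + 1] - timestamps[i];
            double secondGap = timestamps[i + 2] - timestamps[i + 1];
            if (firstGap < secondGap) bits.Add(false);
            else if (firstGap > secondGap) bits.Add(true);
        }
        return bits;
    }

    static void Main()
    {
        // Simulated detection times with exponentially distributed gaps.
        var rng = new Random();
        var timestamps = new List<double>();
        double now = 0.0;
        for (int i = 0; i < 33; i++)
        {
            now += -Math.Log(1.0 - rng.NextDouble()); // 1 - u is in (0, 1], so the log is finite
            timestamps.Add(now);
        }

        foreach (bool bit in ExtractBits(timestamps)) Console.Write(bit ? '1' : '0');
        Console.WriteLine();
    }
}
```

Since both gaps in a pair come from the same process, neither is more likely to be the shorter one, so the resulting bits are balanced without needing to know the average emission rate.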

Cloudflare’s Lava Lamps for Randomness

Cloudflare, a web performance and security company, has set up a wall of lava lamps in the lobby of their San Francisco office. The setup is known as the “LavaRand” system. It leverages the unpredictable and ever-changing movements of the “lava” inside these lamps to generate randomness.

Cloudflare’s Lava Lamps. The view from the camera

How LavaRand Works:
The process starts with visual capturing. A camera is pointed at the wall of lava lamps. The lamps contain blobs of wax in a liquid that expand and move in unpredictable ways when heated.

As the wax heats up, it rises, and as it cools, it falls, creating an ever-changing, visually chaotic display.

The camera takes images of the lava lamps at regular intervals. Each image captures a unique, random pattern of swirling wax. These images are then processed using computer algorithms to extract random data from the patterns observed in the images.
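As a rough illustration of that last step, the sketch below hashes the bytes of one frame together with bytes from the operating system's generator to produce a seed. This is not Cloudflare's implementation. FrameEntropyMixer, DeriveSeed, and the fake frame are hypothetical stand-ins, and a real system would read the frame from a camera.

```csharp
using System;
using System.Security.Cryptography;

class FrameEntropyMixer
{
    // Hashes the raw bytes of a camera frame together with bytes from the
    // operating system's secure generator, producing a 32-byte seed.
    public static byte[] DeriveSeed(byte[] frameBytes)
    {
        byte[] systemBytes = new byte[32];
        RandomNumberGenerator.Fill(systemBytes);

        byte[] combined = new byte[frameBytes.Length + systemBytes.Length];
        Buffer.BlockCopy(frameBytes, 0, combined, 0, frameBytes.Length);
        Buffer.BlockCopy(systemBytes, 0, combined, frameBytes.Length, systemBytes.Length);

        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(combined);
        }
    }

    static void Main()
    {
        // Placeholder image data; an actual setup would capture a camera frame here.
        byte[] fakeFrame = new byte[640 * 480];
        new Random().NextBytes(fakeFrame);

        byte[] seed = DeriveSeed(fakeFrame);
        Console.WriteLine(BitConverter.ToString(seed));
    }
}
```

Mixing the frame with another source means the result is no easier to predict than the stronger of the two inputs, which matches the idea of using the lamps as an extra source rather than the only one.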

Relation to Photonic Emission:
While Cloudflare’s Lava Lamps use a form of photonic emission, it’s indirect. The photonic emission in this context is the light emitted by the lamps, which illuminates the wax inside.

The random number generation process, however, primarily relies on the chaotic physical movements of the wax, which are captured by the light and recorded by a camera. The randomness comes from how the light and shadows play off the moving lava, rather than the emission and detection of photons at a quantum level (which is more typical in photonic emission RNG systems using LEDs or lasers).

Information from Cloudflare's official website:

LavaRand is a system that uses lava lamps as a secondary source of randomness for our production servers. A wall of lava lamps in the lobby of our San Francisco office provides an unpredictable input to a camera aimed at the wall. A video feed from the camera is fed into a CSPRNG, and that CSPRNG provides a stream of random values that can be used as an extra source of randomness by our production servers. Since the flow of the “lava” in a lava lamp is very unpredictable, “measuring” the lamps by taking footage of them is a good way to obtain unpredictable randomness. Computers store images as very large numbers, so we can use them as the input to a CSPRNG just like any other number.

We’re not the first ones to do this. Our LavaRand system was inspired by a similar system first proposed and built by Silicon Graphics and patented in 1996 (the patent has since expired).

Hopefully, we’ll never need it. Hopefully, the primary sources of randomness used by our production servers will remain secure, and LavaRand will serve little purpose beyond adding some flair to our office. But if it turns out that we’re wrong, and that our randomness sources in production are actually flawed, then LavaRand will be our hedge, making it just a little bit harder to hack Cloudflare.

Read more here.

[Lavarand, first proposed and patented in 1996](https://patents.google.com/patent/US5732138)

Human Factors in TRNG

Mouseware

Some tools like Mouseware use human factors to generate randomness. Mouseware uses a cryptographically secure random number generator based on your mouse movements to generate secure, memorable passwords. Passwords are generated entirely in the browser, and no data is ever sent over the network.

For those generated passwords, it would take 22400.7 years to guess at 1000 guesses/second and 2.0 hours to guess at 100 billion guesses/second.

1000 guesses/second is a worst-case web-based attack. Typically this is the only type of attack feasible against a secure website.
100 billion guesses/second is a worst-case offline attack when a hashed password database is stolen by someone with nontrivial technical and financial resources.

Example of the flow to generate random numbers based on mouse movements

You can read more about Mouseware on their website.

Randomness in Software Testing

Chaos Monkey developed by Netflix

Chaos Monkey

Chaos Monkey is an innovative tool developed by Netflix. It's responsible for randomly terminating Netflix's instances in production to ensure that engineers implement their services to be resilient to instance failures.

Imagine a virtual, mischievous monkey randomly tinkering with the network—shutting down instances, disconnecting servers, or overloading systems to simulate possible failures.

Although it might seem counterintuitive, the purpose of Chaos Monkey is to proactively provoke controlled failures. This strategy allows Netflix's engineers to test how well their systems can handle unexpected disruptions. The aim is to identify and resolve weaknesses before they impact users, ensuring that the infrastructure is robust enough to withstand real-world issues.

For instance, if Chaos Monkey randomly terminates a server and everything continues to run smoothly, that’s a win. If problems arise, engineers quickly analyze and rectify them, thereby strengthening the system. This continuous testing and improvement cycle helps ensure that when you settle in to binge-watch your favorite series, you experience uninterrupted streaming.
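The core idea can be sketched in a few lines: on each round, decide at random whether to act and, if so, which instance to pick. The code below is a toy illustration, not Netflix's tool. The instance names, the 50% probability, and the MiniChaosMonkey class are all hypothetical, and nothing is actually terminated.

```csharp
using System;
using System.Collections.Generic;

class MiniChaosMonkey
{
    // Picks one instance at random with the given probability, or returns null
    // when this round is skipped or there is nothing to pick from.
    public static string PickVictim(IReadOnlyList<string> instances, double probability, Random rng)
    {
        if (instances.Count == 0 || rng.NextDouble() >= probability) return null;
        return instances[rng.Next(instances.Count)];
    }

    static void Main()
    {
        // Hypothetical instance names.
        var instances = new List<string> { "api-1", "api-2", "worker-1", "worker-2" };
        var rng = new Random();

        for (int round = 1; round <= 5; round++)
        {
            string victim = PickVictim(instances, 0.5, rng);
            Console.WriteLine(victim == null
                ? $"Round {round}: no termination"
                : $"Round {round}: would terminate {victim}");
        }
    }
}
```

Because the choice is random, no service can quietly rely on being spared, which is what pushes teams to design every service for failure.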

Thanks to tools like Chaos Monkey and the principles of Chaos Engineering, Netflix can deliver a seamless viewing experience. Next time you watch a show without any glitches, remember the behind-the-scenes efforts of these unsung heroes keeping your entertainment flawless.

Chaos Monkey is also open source. Check out the docs here.

Randomness in Cryptographic Systems

Randomness plays a critical role in cryptographic systems, forming the backbone of security protocols across the digital landscape. This section explores why randomness is essential in cryptography, how it is generated, and the challenges involved in ensuring its effectiveness.

In cryptographic systems, randomness is used to generate keys, initialize cryptographic algorithms, and support non-repudiation mechanisms such as digital signatures, as well as secure communications.

The strength and security of almost all cryptographic techniques depend on the quality of the randomness used. If the randomness is predictable, so too are the cryptographic keys, making the system vulnerable to attacks.
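In .NET, this is the reason security-sensitive code uses System.Security.Cryptography.RandomNumberGenerator rather than System.Random. The following is a minimal sketch of generating key material that way. SecureRandomDemo and GenerateKey are hypothetical names, and the 32-byte key length is just an example.

```csharp
using System;
using System.Security.Cryptography;

class SecureRandomDemo
{
    // Returns `length` bytes from the operating system's cryptographically secure generator.
    public static byte[] GenerateKey(int length)
    {
        byte[] key = new byte[length];
        RandomNumberGenerator.Fill(key);
        return key;
    }

    static void Main()
    {
        byte[] key = GenerateKey(32); // 32 bytes = 256 bits
        Console.WriteLine("Key: " + Convert.ToBase64String(key));

        // A secure integer in [1, 101), that is, from 1 to 100.
        int number = RandomNumberGenerator.GetInt32(1, 101);
        Console.WriteLine("Secure number: " + number);
    }
}
```

Unlike a seeded generator, there is no seed here for an attacker to guess or replay. The values come from entropy collected by the operating system.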

If we encrypt the text “Hello World”, we will get this text “oO64D2IzNWKSQnDM8fcZ/w==”. To see the power of encryption, let’s also encrypt variations of the text: “HelloWorld” (without a space) and “Hello world” (with lowercase), while also experimenting with a different encryption key.

Here are the outcomes:

╔═════════════╦═══════════╦══════════════════════════╗
║    Text     ║ Password  ║      Encoded value       ║
╠═════════════╬═══════════╬══════════════════════════╣
║ Hello World ║      1234 ║ oO64D2IzNWKSQnDM8fcZ/w== ║
╠─────────────╬───────────╬──────────────────────────╣
║ HelloWorld  ║      1234 ║ KvqAEHQhP9iBdFWhOUcYVg== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello world ║      1234 ║ jdKRaAw9ULCFb627e3mNpQ== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello World ║       123 ║ S/eGTyDQsgLwcEIrCWUAJw== ║
╠─────────────╬───────────╬──────────────────────────╣
║ HelloWorld  ║       123 ║ /JRa5+mllydL/F0m7NuxYA== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello world ║       123 ║ s3AydwlvlgHCcpiAhaurXg== ║
╚═════════════╩═══════════╩══════════════════════════╝

If you consider the above table, you’ll notice that even a small change, such as a change in spacing or a single character, leads to a complete transformation of the encrypted text.

This means that if the intruder manages to obtain both the original text and its encrypted form, they would still face a significant challenge in trying to guess the password required to unlock the entire database.

Could you hack the encryption?

Brute force attacks are a straightforward yet powerful method used by attackers to crack passwords and encryption keys.

A brute force attack involves systematically checking all possible combinations until the correct one is found. Attackers use brute force methods to try every possible key or password until they decrypt the targeted data.

Read more about brute force attacks

In our case, decrypting the text means trying every possible password (including strings like a, aa, b, bb, and so on).

Now let's calculate how much time is needed to check every possible combination for our password. Suppose you own an exceptionally powerful supercomputer, coupled with cutting-edge technology and virtually unlimited resources.

Let’s say the computer has a whopping 1 terabyte (TB) of RAM, allowing it to handle lots of tasks at once. For the CPU, this supercomputer boasts a mind-boggling speed of 1 exaflop, which means it can do about 1 quintillion calculations in just one second. 1 exaflop is equal to 1,000,000,000 gigaflops. So, to achieve 1 exaflop of computing power using Intel i9 processors with a performance of 300 gigaflops each, you would need 1,000,000,000 gigaflops / 300 gigaflops ≈ 3,333,333 Intel i9 processors.

This hypothetical supercomputer, performing mind-blowing calculations at lightning speed, could do a brute-force attack on an encryption algorithm.

If our hypothetical supercomputer were to attempt every possible key to decipher the encrypted data, it would be faced with an astronomical number of possibilities: 2²⁵⁶ for a 256-bit key. Even at one key tested per operation, that works out to not just years or centuries, but on the order of 10⁵¹ years, vastly longer than the age of the universe.
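The arithmetic behind this estimate is short enough to check directly. The sketch below assumes, generously, that the machine tests one key per floating-point operation. BruteForceEstimate and YearsToExhaust are hypothetical names used only for this example.

```csharp
using System;

class BruteForceEstimate
{
    // Years needed to try every key of `keyBits` bits at `guessesPerSecond`.
    public static double YearsToExhaust(int keyBits, double guessesPerSecond)
    {
        double keys = Math.Pow(2, keyBits);
        double secondsPerYear = 365.25 * 24 * 60 * 60;
        return keys / guessesPerSecond / secondsPerYear;
    }

    static void Main()
    {
        double exaflop = 1e18; // assumption: one key tested per operation
        foreach (int bits in new[] { 64, 128, 256 })
        {
            Console.WriteLine($"{bits}-bit key: about {YearsToExhaust(bits, exaflop):E2} years");
        }
    }
}
```

At this rate a 64-bit key space is exhausted in under 20 seconds, a 128-bit one takes roughly ten trillion years, and a 256-bit one takes on the order of 10⁵¹ years. Each additional bit doubles the work, which is why key length matters far more than hardware speed.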

To read more about this, you can refer to this article that I wrote.

Randomness in Simulation and Modeling

Monte Carlo Simulation

The Monte Carlo Simulation is a mathematical technique used to understand the impact of risk and uncertainty in prediction and forecasting models. Essentially, it estimates the probability of different outcomes in a process that depends on random variables.

Named after the famous Monte Carlo Casino due to its reliance on randomness, this method is widely used across finance, engineering, research, and more.

In the context of finance, Monte Carlo simulation is commonly used to assess the risk and value of financial instruments, such as options or portfolios. By generating a large number of random scenarios for different input variables, such as asset prices or interest rates, Monte Carlo simulation can provide a range of possible outcomes and their associated probabilities. This method is mostly used when there is no analytical solution for the given problem.

Telecoms use them to assess network performance in various scenarios, which helps them to optimize their networks. Financial analysts use Monte Carlo simulations to assess the risk that an entity will default, and to analyze derivatives such as options. Insurers and oil well drillers also use them to measure risk.
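To make the idea concrete, here is a minimal Monte Carlo sketch that estimates the probability of a stock ending a year above a target price. It assumes a simple geometric Brownian motion model. The drift, volatility, prices, and class name are made-up illustration values, not financial guidance or data from the article.

```csharp
using System;

class MonteCarloStock
{
    // Draws a standard normal value using the Box-Muller transform.
    static double NextGaussian(Random rng)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids log(0)
        double u2 = rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    // Estimates the probability that the price after `days` trading days ends
    // above `target`, by simulating many random price paths.
    public static double ProbabilityAbove(double startPrice, double target, int days,
        double dailyDrift, double dailyVolatility, int trials, Random rng)
    {
        int hits = 0;
        for (int t = 0; t < trials; t++)
        {
            double price = startPrice;
            for (int d = 0; d < days; d++)
            {
                // One step of geometric Brownian motion with a time step of one day.
                price *= Math.Exp(dailyDrift - 0.5 * dailyVolatility * dailyVolatility
                                  + dailyVolatility * NextGaussian(rng));
            }
            if (price > target) hits++;
        }
        return (double)hits / trials;
    }

    static void Main()
    {
        var rng = new Random(7); // arbitrary fixed seed for repeatable runs
        double p = ProbabilityAbove(100.0, 110.0, 252, 0.0003, 0.01, 20_000, rng);
        Console.WriteLine($"Estimated probability of ending above 110: {p:P1}");
    }
}
```

Each trial is one possible future. The estimate is simply the share of simulated futures in which the event happened, and it becomes more stable as the number of trials grows.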

To read more, check out this article.

Monte Carlo Simulation Output of a Stock price. Retrieved from this article

Future of Randomness in Software Engineering

The future of randomness in software engineering looks particularly promising, with significant advancements expected from emerging technologies like quantum computing.

Quantum Computing and Quantum Randomness

Quantum computing introduces an inherently stochastic element known as quantum randomness.

Unlike classical computing, which relies on deterministic processes, quantum computing builds on processes that are unpredictable by nature. Quantum random number generators (QRNGs) exploit this property to generate true random numbers directly from quantum phenomena, such as the superposition of quantum states or the measurement of entangled particles.

These devices are expected to provide a more secure and fundamentally unpredictable source of randomness than is currently possible.

IBM’s new 53-qubit quantum computer

Quantum computing has the potential to revolutionize cryptography. Current cryptographic systems rely on the computational difficulty of certain problems (like factoring large numbers), which a sufficiently powerful quantum computer could solve far faster than any classical machine. But quantum cryptography, utilizing quantum randomness for key distribution, promises to be virtually unbreakable due to the laws of quantum mechanics.

Current State of Quantum Computing

As of now, quantum computing is in an experimental phase. Researchers and companies like Google, IBM, and D-Wave are actively developing quantum computers and have made significant progress in recent years.

For instance, Google announced "quantum supremacy" in 2019, claiming that their quantum computer solved a problem that would be practically impossible for a classical computer to solve in any reasonable amount of time.

Quantum bits, or qubits, which are the basic units of information in quantum computing, are highly susceptible to interference from their environment. This leads to high error rates in quantum computations. Developing error-correcting codes and finding ways to make qubits more stable is a significant focus of current research.

Currently, quantum computers have a limited number of qubits. To be practical for widespread use, quantum computers need to scale up the number of qubits significantly without a corresponding increase in error rates.

Also, many of these computers need to operate at extremely low temperatures, close to absolute zero, to maintain the quantum state of the qubits. Maintaining such conditions is technically challenging and expensive.

The consensus among experts is cautiously optimistic, but varies widely regarding when quantum computing will become practical for broad use.

Some experts believe that within the next decade, we'll begin to see quantum computers solving more practical, real-world problems, potentially revolutionizing fields like cryptography, materials science, and complex system simulation. Others think that these applications might remain out of reach for several more decades.

Wrapping Up

The future of randomness in software engineering holds vast potential to drive innovation across multiple domains.

As we delve deeper into quantum computing and enhance our current technologies, randomness will play an increasingly critical role in shaping the next generation of software solutions, making them more secure, efficient, and reflective of the complex world they model.


Cover image: A 2017 post celebrating Facebook reaching 2 billion users.
