Cryptography - freeCodeCamp.org

What Your Auth Library Isn't Telling You About Passwords: Hashing and Salting Explained

Tilda Udufo — Thu, 12 Mar 2026 19:15:55 +0000

Before I started building auth into my own projects, I didn't think too deeply about what was happening to passwords behind the scenes.

Like most developers, I installed a library, called a hash function, stored the result, and moved on. I'd see a random string like $2a$11$yMMbLgN9uY6J3LhorfU9iu.... in my database and assume my users' passwords were unbreakable. I knew it was a hashed password. But what was the $2a? What was 11? And if I couldn't reverse it, how was my app verifying logins at all?

If you've ever used bcrypt, Devise, Django's auth system, or really any authentication library, you've been protected from these details. That's good engineering. But understanding what's actually happening makes you a better developer, and it explains a lot of things that seem confusing or arbitrary until suddenly they don't.

By the end of this article, you'll be able to look at that string and know exactly what every part means.

Prerequisites

This article is written for developers who have used an auth library before but never looked closely at what it's doing. You don't need a cryptography background. If you've ever hashed a password and moved on, this is for you.

Hashing vs Encryption
Why a Plain Hash Isn't Enough
Enter Salting
Why bcrypt Is Slow (and Why That's the Point)
What's Actually in Your Database
Wrapping Up

Hashing vs Encryption

Most developers use the terms hashing and encryption interchangeably. They're not the same thing, and the difference matters more than you might think.

Encryption is a two-way process. You take data, encrypt it with a key, and you can decrypt it later using that same key (or a related one). This is useful when you need to retrieve the original value, for example when storing a credit card number you'll need to charge later, or sending a message that the recipient needs to read.

Hashing is different. It's a one-way process. You put data in, you get a fixed-length string out, and there's no key that lets you reverse it. The original value is gone.

That might sound like a limitation. For passwords, it's actually exactly what you want.

Think about it: when a user logs in, you don't need to know their password. You just need to verify that what they typed matches what they set when they signed up. You can do that entirely with hashes. Hash what they typed, compare it to the stored hash, done. You never need the original.

This is why "forgot password" flows always ask you to set a new password rather than sending you your old one. Yes, sending you your old password over email might be risky but the actual reason is that they genuinely can't retrieve it. If they can email you your original password, that's a red flag. It means they stored it in a way that's reversible, which means it's not properly protected.

Why a Plain Hash Isn't Enough

So if hashing is one-way and irreversible, isn't that enough? Just hash every password before storing it and you're done?

Not quite.

The first problem is rainbow tables. A rainbow table is a precomputed database of hashes for common passwords. An attacker who gets hold of your database doesn't need to reverse the hashes. They just look them up. If your user's password is "password123", its SHA-256 hash is always the same string, and that string is almost certainly already in a rainbow table somewhere.

The second problem is related. If two users have the same password, they'll have the same hash. So if an attacker cracks one, they've cracked every account that shares that password. In a database with thousands of users, that's a significant security risk.

Here's what that looks like in practice:

import hashlib

# Two users, same password
password = "password123"

hash_one = hashlib.sha256(password.encode()).hexdigest()
hash_two = hashlib.sha256(password.encode()).hexdigest()

print(hash_one == hash_two)  # True, every single time

The hash is deterministic. The same input always produces the same output. That's useful for a lot of things, but for passwords it creates a real vulnerability.
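To make the lookup attack concrete, here is a minimal sketch. It builds a plain hash-to-password dictionary rather than a true rainbow table (which uses hash chains to save space), and the password list and `leaked_hash` value are made up for illustration.

```python
import hashlib

# Hypothetical list of common passwords an attacker might precompute.
common_passwords = ["123456", "password", "qwerty", "password123", "letmein"]

# Precompute once: SHA-256 hash -> original password.
lookup_table = {
    hashlib.sha256(p.encode()).hexdigest(): p for p in common_passwords
}

# A hash found in a leaked database (unsalted SHA-256 of "password123").
leaked_hash = hashlib.sha256("password123".encode()).hexdigest()

# "Cracking" is now a dictionary lookup, not a brute-force search.
recovered = lookup_table.get(leaked_hash)
print(recovered)  # password123
```

The expensive part (hashing the candidate list) happens once and can then be reused against every unsalted database the attacker obtains.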

A plain hash gets you partway there. But it's not enough on its own.

Enter Salting

The fix for both problems is something called a salt. And no, it's not your regular table salt.

A salt is a random string generated uniquely for each password. Before hashing, you combine the salt with the password, then hash the result.

import hashlib
import os

password = "password123"

# Generate a random salt
salt = os.urandom(16).hex()

# Combine salt and password, then hash
salted_password = salt + password
hashed = hashlib.sha256(salted_password.encode()).hexdigest()

print(f"Salt: {salt}")
print(f"Hash: {hashed}")

Now two users with the same password produce completely different hashes, because their salts are different. And because the salt is random and unique, it can't be precomputed into a rainbow table.

Here's the surprising part: the salt doesn't need to be secret. It gets stored alongside the hash in your database, in plain text. That might feel wrong at first. If an attacker has your database, they have the salt too.

But that's fine. The salt's job isn't to be secret. Its job is to make each hash unique so that precomputed tables are useless. An attacker who wants to crack a salted hash has to brute force each password individually, from scratch, using that specific salt. They can't reuse work across users.
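Here is a rough sketch of how registration and login could fit together when the salt is stored next to the hash. The helper names `hash_password`, `register`, and `verify` are hypothetical, and SHA-256 is used only to stay consistent with the example above; it is still too fast for real password storage.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, salt: str) -> str:
    """Hash salt + password with SHA-256, matching the example above."""
    return hashlib.sha256((salt + password).encode()).hexdigest()


def register(password: str) -> Tuple[str, str]:
    """Return the (salt, hash) pair that would be stored in the user's row."""
    salt = os.urandom(16).hex()
    return salt, hash_password(password, salt)


def verify(attempt: str, stored_salt: str, stored_hash: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hash_password(attempt, stored_salt)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, stored_hash)


# Two users pick the same password but end up with different stored hashes.
salt_a, hash_a = register("password123")
salt_b, hash_b = register("password123")
print(hash_a != hash_b)                       # True
print(verify("password123", salt_a, hash_a))  # True
print(verify("wrong-guess", salt_a, hash_a))  # False
```

Note that the salt is read back in plain text during `verify`. Nothing about the check requires it to be secret.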

That's a meaningful increase in the cost of an attack, even when the salt is visible.

Why bcrypt Is Slow (and Why That's the Point)

Salting solves the rainbow table problem. But there's still a gap. If an attacker has your database and decides to brute force a password, they can just keep guessing. Hash a candidate password with the stored salt, compare it to the stored hash, repeat. With a fast hashing algorithm like SHA-256, a modern GPU can do billions of these comparisons per second.

That's the problem with using a general-purpose hash function for passwords. Algorithms like SHA-256 and MD5 were designed to be fast. That's great for things like verifying file integrity or generating checksums. For passwords, it's a liability.

This is where bcrypt comes in. bcrypt is a password hashing algorithm designed specifically to be slow. Not broken or inefficient by accident, but deliberately and configurably slow. It has a cost factor (sometimes called a work factor) that controls how computationally expensive the hashing operation is.

import bcrypt

password = b"password123"

# The cost factor is set here (12 is a common production value)
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

print(hashed)

Every time you increase the cost factor by 1, the hashing operation takes roughly twice as long. At a cost factor of 12, a single hash might take around 300 milliseconds on your server. That's imperceptible to a user logging in. But for an attacker trying to brute force millions of passwords, it turns a feasible attack into an impractical one.

The other advantage of a configurable cost factor is that you can increase it over time as hardware gets faster. What was slow enough in 2015 might not be slow enough today. bcrypt lets you adapt without changing the algorithm itself.
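The idea of a tunable work factor can also be sketched with only the standard library. This uses PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in, not bcrypt: its iteration count scales linearly, whereas bcrypt's cost factor is an exponent. The iteration counts below are arbitrary, and the timings will differ from machine to machine.

```python
import hashlib
import os
import time

password = b"password123"
salt = os.urandom(16)


def slow_hash(iterations: int) -> bytes:
    """Derive a key with PBKDF2; more iterations means more work per guess."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


# Doubling the iteration count roughly doubles the time per hash,
# which mirrors what adding 1 to bcrypt's cost factor does.
for iterations in (50_000, 100_000, 200_000):
    start = time.perf_counter()
    slow_hash(iterations)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{iterations:>7} iterations: {elapsed_ms:.1f} ms")
```

Whatever delay a legitimate login pays once, an attacker pays again for every single guess.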

What's Actually in Your Database

So far, we've talked about salting and cost factors as separate concepts. Here's the satisfying part: in bcrypt, they're all stored together in a single string. That string sitting in your database contains everything needed to verify a password, and once you know how to read it, it's not mysterious at all.

Here's a typical bcrypt hash:

$2a$12$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO

Let's break it down:

$2a — the algorithm version. This tells your auth library which version of bcrypt was used to generate the hash.
$12 — the cost factor. This is the number we talked about in the previous section. A cost factor of 12 means the hashing operation was run 2¹² times.
$yMMbLgN9uY6J3LhorfU9iu — the salt. The first 22 characters after the final $ are the salt, stored right there in plain text alongside the hash. Your auth library reads this back out when verifying a login.
LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO — the hash itself. The remaining characters are the actual output of the hashing operation.
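The breakdown above can be reproduced with plain string handling. This is only a sketch of how the fields are laid out, not how any particular bcrypt library parses them internally.

```python
stored = "$2a$12$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO"

# The string starts with "$", so the first element of the split is empty.
_, version, cost, salt_and_hash = stored.split("$")

salt = salt_and_hash[:22]    # first 22 characters: the encoded salt
digest = salt_and_hash[22:]  # remaining 31 characters: the encoded hash

print(version)    # 2a
print(int(cost))  # 12
print(salt)       # yMMbLgN9uY6J3LhorfU9iu
print(digest)     # LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO
```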

When a user logs in, your auth library doesn't need any extra information. It reads the algorithm version, cost factor, and salt directly from the stored string, hashes the login attempt using those same parameters, and compares the result. If they match, the password is correct.

This is why bcrypt verification works even though the salt is never stored separately. It was never separate to begin with.

Wrapping Up

Next time you see a bcrypt string in your database, you'll know exactly what you're looking at. The algorithm version, the cost factor, the salt, and the hash, all encoded in a single string that your auth library knows how to read.

But the bigger takeaway is this: the libraries we rely on every day aren't magic. They're carefully designed systems built on top of concepts that are worth understanding.

Knowing why bcrypt is slow, why salting works even when the salt is visible, and why fast hash functions like SHA-256 are the wrong tool for passwords makes you a more intentional developer. You'll make better decisions about cost factors, you'll recognise a poorly implemented auth system when you see one, and you'll understand why a data breach where passwords were hashed with MD5 is so much worse than one where bcrypt was used.

Cryptography for Beginners: Full Python Course (SHA-256, AES, RSA, Passwords)

Beau Carnes — Wed, 05 Nov 2025 22:41:04 +0000

We just posted a course on the freeCodeCamp.org YouTube channel that will teach you all about cryptography. You'll learn essential techniques like hashing (SHA-256) for verifying file integrity, symmetric encryption (AES), and asymmetric encryption (RSA) using public and private keys. The practical focus of the tutorial involves building a fully functional command-line cryptography tool in Python. Upon completion, you'll have a complete practical toolkit and the skills to safeguard data, secure passwords, and deter tampering.

This course was developed by Thanishkka. She is part of Hack Club. Hack Club is a global non-profit organization that creates a community for high school students interested in coding and making things with technology.

Here are the sections in the course:

Introduction: What is Cryptography?
About Hack Club and the Course Creator
Cryptography Basics & Cybershe Demo
Three Main Areas: Hashing, Symmetric, and Asymmetric Encryption
Deep Dive into Hashing (SHA 256) and File Integrity
Symmetric Encryption with AES (Key, IV, and Modes)
Asymmetric Encryption with RSA (Public and Private Keys)
Setup: Python and VS Code Installation
Creating and Activating a Virtual Environment
Installing Required Python Libraries (cryptography, zxcvbn, bcrypt)
Coding the File Hashing Function (hash.py)
Coding the File Integrity Verification Function
Coding AES Symmetric Encryption/Decryption (encryption.py)
Coding RSA Asymmetric Encryption/Decryption
Coding the Password Strength Checker (password.py)
Coding Password Hashing and Verification (using bcrypt)
Building the Command Line UI (main.py)
Final Toolkit Demo and Testing
Conclusion and Next Steps

Watch the full course on the freeCodeCamp.org YouTube channel (1-hour watch).

The Cryptography Handbook: Exploring RSA PKCSv1.5, OAEP, and PSS

Hamdaan Ali — Wed, 02 Apr 2025 22:04:38 +0000

The RSA algorithm was introduced in 1978 in the seminal paper, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems". Over the decades, as RSA became integral to secure communications, various vulnerabilities and attacks have emerged, underscoring the importance of understanding and implementing RSA correctly.

This handbook will help you understand the internal workings of the RSA algorithm, how they have evolved over the years, and the schemes defined under various RFCs. This knowledge will help you make informed choices about the most suitable RSA schemes depending on your business requirements.

In this handbook, we’ll begin by exploring the foundational principles of the RSA algorithm. By examining its mathematical underpinnings and historical evolution, you will gain insight into the diverse array of attacks that have emerged over the years.

The narrative unfolds as an evolutionary journey: from the original, straightforward (textbook) RSA implementation, through the discovery of vulnerabilities, to the development of effective countermeasures, and further refinements as new challenges were encountered. This progression illuminates how RSA has transformed over time and also demonstrates how modern cryptographic libraries have integrated these advancements to achieve secure implementations in today’s applications.

You can also watch the associated video here:

Prerequisites
The Alice-Bob Paradigm
The Birth of the RSA Cryptosystem
RSA Operations
Issues with Euler’s Totient Function in RSA
The Carmichael Function
- Mathematical Implication of The Carmichael function
- The Carmichael Function in Modern Implementations
Issues with Raw RSA
Exploiting Textbook RSA’s Determinism and Malleability
Low-Exponent Attacks
Håstad’s Broadcast Attack: Low Exponent Meets Multiple Recipients
Introduction to Padding Schemes in RSA
Public Key Cryptography Standards (PKCS#1 v1.5)
- The Mathematics Behind PKCS#1 v1.5
The Bleichenbacher Attack
Optimal Asymmetric Encryption Padding (OAEP)
- The Mathematics Behind OAEP
Why SHA-1 or MD5 Are Safe in RSA-OAEP
- Label Hashing
- Mask Generation Function (MGF1)
Adoption in Cryptographic Libraries (PKCS#1 v1.5 vs OAEP)
Enhancing Digital Signatures: The Transition to PSS
The Road Ahead: Assessing RSA’s Long-Term Viability
References

Prerequisites

Linear Algebra: A foundational understanding of Linear Algebra and Modular Arithmetic will help you understand certain sections of the handbook, though it is not an absolute requirement. This handbook provides comprehensive explanations of mathematical expressions and their underlying concepts as they arise.

For a concise and relevant introduction to the Chinese Remainder Theorem (CRT) in the context of the handbook, you may find this resource helpful: CRT, RSA, and Low Exponent Attacks | YouTube.

Patience (and a Sense of Adventure): RFCs can sometimes get dull to read, and research papers can feel intimidating at first glance. This handbook is designed to make standard cryptographic concepts accessible to everyone, guiding you through each step with clarity and intuition. Every concept is reinforced with clear, step-by-step examples, ensuring not only a thorough understanding but also familiarity with widely used standard notations. So take your time, take a deep breath, and embrace the journey.

For visual learners, the associated video may offer a more engaging experience.

The Alice-Bob Paradigm

Throughout this handbook, you will come across numerous sequence diagrams and mathematical proofs that use the Alice-Bob Paradigm.

The Alice-Bob paradigm is a common convention in cryptography where two generic entities, often named Alice and Bob, are used to illustrate various scenarios, protocols, or cryptographic principles.

These characters represent two parties engaged in communication, with Alice typically representing the sender or initiator, and Bob representing the receiver or responder.

We often introduce Eve as a third party, symbolizing an eavesdropper or potential attacker, adding an element of security risk, and illustrating scenarios where external entities might attempt to intercept or manipulate the communication.

The Birth of the RSA Cryptosystem

The year 1978 witnessed the birth of a new era in cryptography with the introduction of the RSA cryptosystem, named after its inventors (Rivest, Shamir, and Adleman).

This development, introduced in the paper "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", provided a method for secure digital communication and laid the foundation for modern public-key cryptography.

At the heart of RSA lies elementary number theory – specifically, the properties of prime numbers and modular arithmetic. Let’s first understand how these key concepts form its mathematical foundations.

Prime Numbers and Composite Moduli

The algorithm starts by selecting two large prime numbers, denoted as p and q. Their product ($n = p \times q$) forms the modulus for both the public and private keys.

The security of RSA depends heavily on the fact that, while multiplying these primes is computationally straightforward, factoring the resulting large composite number n is considered infeasible for sufficiently large primes.

At this point, it’s important to note that p and q must be large prime numbers to ensure RSA’s security. Fortunately, modern libraries handle this automatically by using well-established prime-generation algorithms. As a result, you can focus on higher-level aspects of your applications without having to manage the low-level details of prime selection.

For instance, let’s have a look at OpenSSL’s RSA key generation routine which performs several checks to ensure that the resulting modulus $n = p \times q $ meets the desired bit-length requirements:

The snippet below right-shifts the product of the generated primes (stored in r1) by bitse - 4 bits to isolate the top 4 bits, which are then checked to ensure that the modulus meets the desired size criteria.

if (!BN_rshift(r2, r1, bitse - 4))
    goto err;
bitst = BN_get_word(r2);

The extracted bits (bitst) are then compared against a predefined range (from 0x9 to 0xF). This range ensures that the most significant bits of the modulus aren't too small or too large.

if (bitst < 0x9 || bitst > 0xF) {
    bitse -= bitsr[i];

If the significant bits do not fall within the desired range, the bit length is adjusted and the prime-generation process is retried. If the number of retries exceeds a set limit, the entire process is restarted.

if (!BN_GENCB_call(cb, 2, n++))
    goto err;
if (primes > 4) {
    if (bitst < 0x9)
        adj++;
    else
        adj--;
} else if (retries == 4) {
    i = -1;
    bitse = 0;
    sk_BIGNUM_pop_free(factors, BN_clear_free);
    factors = sk_BIGNUM_new_null();
    if (factors == NULL)
        goto err;
    continue;
}
retries++;
goto redo;
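The shift-and-compare idea behind this size check can be mimicked in a few lines of Python. This is only an illustration of the technique, not a port of OpenSSL's routine. The function names and sample values are made up.

```python
def top_four_bits(product: int, expected_bits: int) -> int:
    """Shift away everything except the top 4 bits of an expected_bits-wide value."""
    return product >> (expected_bits - 4)


def size_is_acceptable(product: int, expected_bits: int) -> bool:
    """Apply the same 0x9..0xF range check as the snippet above."""
    top = top_four_bits(product, expected_bits)
    return 0x9 <= top <= 0xF


# 0xC35 is a 12-bit value whose top 4 bits are 0xC, so it passes.
print(size_is_acceptable(0xC35, 12))   # True
# 0x435 fits in 11 bits, so the shifted value is 0x4, below the range.
print(size_is_acceptable(0x435, 12))   # False
# 0x1C35 needs 13 bits, so the shifted value is 0x1C, above the range.
print(size_is_acceptable(0x1C35, 12))  # False
```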

To gain high confidence that the generated numbers are prime, these libraries use a combination of probabilistic tests, including the Miller-Rabin primality test, and sieving methods to quickly eliminate non-prime candidates.
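For intuition, here is a compact sketch of a Miller-Rabin test preceded by a tiny trial-division sieve. It is a teaching version under simple assumptions (20 random rounds, a hand-picked list of small primes) and is not how any specific library implements it.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    # Cheap sieve step: trial division by a few small primes.
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    # Write n - 1 as 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


print(is_probable_prime(2**31 - 1))  # True (a Mersenne prime)
print(is_probable_prime(2**31 + 1))  # False (divisible by 3)
```

A prime always passes. A composite slips through a single round with probability at most 1/4, so 20 rounds leave a negligible error rate.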

The Euler Totient Function

For a number n that is the product of two primes, the Euler totient function is given by:

$$\varphi(n) = (p-1)(q-1)$$

This function counts the number of positive integers less than $n$ that are co-prime to $n$. Euler’s theorem, which states that for any integer $a$ co-prime to $n$, $a^{\varphi(n)} \equiv 1 \pmod{n}$, plays a central role in proving why RSA’s operations are reversible.
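Both statements are easy to check by brute force for a tiny modulus. The primes 3 and 11 below are chosen purely for illustration.

```python
from math import gcd

p, q = 3, 11
n = p * q

# Count the integers in [1, n) that share no factor with n.
coprime = [a for a in range(1, n) if gcd(a, n) == 1]
phi = len(coprime)

print(phi)                       # 20
print(phi == (p - 1) * (q - 1))  # True

# Euler's theorem: a^phi(n) is 1 modulo n for every a coprime to n.
print(all(pow(a, phi, n) == 1 for a in coprime))  # True
```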

But most modern RSA cryptosystems use the Carmichael function instead of Euler’s Totient Function. We will examine the reasoning behind this shift in the next few sections.

Computing the Keys

Now we select an integer $e$ such that $1 < e < \varphi(n)$ and $\gcd(e, \varphi(n)) = 1$. This $e$ becomes the public exponent you see as a parameter in the RSA function calls you make.

With that done, now let’s determine $d$ as the modular multiplicative inverse of $e$ modulo $\varphi(n)$. In other words, $d$ is computed such that:

$$e \times d \equiv 1 \pmod{\varphi(n)}$$

This step is the mathematical linchpin ensuring that decryption is the inverse operation of encryption.

In the 1978 paper, the authors explicitly provided these formulas and steps. They showed that if you encrypt a message $m$ using $c = m^e \bmod n$ and then decrypt using $m = c^d \bmod n$, the original message is recovered, thanks to the properties of modular exponentiation and Euler’s theorem. This mathematical framework was novel at the time and immediately set the stage for a new era in cryptography.
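The key computation can be followed end to end with toy numbers. The primes 61 and 53 and the exponent 17 are illustrative choices, far too small for real use, and the three-argument `pow` with a negative exponent assumes Python 3.8 or newer.

```python
from math import gcd

# Toy primes for illustration only; real keys use primes of 1024 bits or more.
p, q = 61, 53
n = p * q                # 3233
phi = (p - 1) * (q - 1)  # 3120

e = 17
assert 1 < e < phi and gcd(e, phi) == 1

# Modular inverse of e modulo phi(n).
d = pow(e, -1, phi)
print(d)              # 2753
print((e * d) % phi)  # 1

# Encrypting and then decrypting returns every message in [0, n).
print(all(pow(pow(m, e, n), d, n) == m for m in range(n)))  # True
```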

RSA Operations

Now that the mathematical foundations are laid, the RSA algorithm can be seen as a set of three core operations: Encryption, Decryption, and Signing. Throughout this handbook's next sections, we will critically analyze these operations and learn about several pitfalls in each. Then we will examine how these were averted with the birth of new schemes, each to solve a new issue discovered on the way.

Encryption

With the public key $(n, e)$ available to everyone, any user can encrypt a message $m$ (where $m$ is first encoded as an integer in the range $0 \leq m < n$ ) using the formula:

$$c = m^e \bmod n$$

Here, $c$ is the ciphertext. Because the operation is based on modular exponentiation, even if the public key $(n, e)$ is known, recovering $m$ from $c$ without knowing $d$ is believed to be computationally hard.

Decryption

The intended recipient, who possesses the private key $d$, decrypts the cipher text $c$ by computing:

$$m = c^d \bmod n$$

Using the relationship ($e \times d \equiv 1 \pmod{\varphi(n)}$) and properties from Euler’s theorem, the above operation exactly inverts the encryption step, recovering the original message $m$.

This ensures that only the holder of the private key can read the encrypted message. This is the backbone of RSA’s use in secure communication.
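As a sketch of the full round trip, including the step of encoding a message as an integer below $n$, consider the following. The primes (65537 and $2^{31}-1$), the exponent 17, and the big-endian byte encoding are all illustrative assumptions, and this is still raw RSA without any padding.

```python
from math import gcd

# Two well-known primes (a Fermat prime and a Mersenne prime); far too small for real use.
p, q = 65537, 2**31 - 1
n = p * q
phi = (p - 1) * (q - 1)
e = 17
assert gcd(e, phi) == 1
d = pow(e, -1, phi)

message = b"hi"

# Encode the bytes as a big-endian integer and check the range requirement.
m = int.from_bytes(message, "big")
assert 0 <= m < n, "message too large for this modulus"

c = pow(m, e, n)          # encryption: c = m^e mod n
recovered = pow(c, d, n)  # decryption: m = c^d mod n

decoded = recovered.to_bytes(len(message), "big")
print(decoded)  # b'hi'
```

The range check matters: any message whose integer encoding is $n$ or larger would be silently reduced modulo $n$ and could not be recovered.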

The sequence diagram below wraps up our discussion so far:

Digital Signatures

Digital signatures fulfill a different security goal: authenticity and integrity rather than confidentiality. While encryption and decryption use the public key for “locking” and the private key for “unlocking,” digital signatures reverse these roles.

1. Signing

The author of a message uses their private key $d$ to compute a signature $s$ on the message $m$, using the formula below:

$$s = m^d \bmod n$$

This can later be verified by others using the corresponding public key. The purpose here is not to recover a secret message but to create a proof of authenticity.

2. Verification

Anyone with the public key $(n, e)$ can verify that the signature $s$ indeed belongs to the message $m$ by checking:

$$m \equiv s^e \pmod{n}$$

If the equivalence holds, it confirms two key points: that the message has not been tampered with (that is, integrity), and that the signature must have been generated using the private key $d$ (that is, authenticity).

As long as $d$ is kept secret, only the legitimate signer can produce a valid signature. Take a look at the sequence diagram below to understand the complete process.
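Signing and verification can be tried with a toy key. The key below (built from the small primes 61 and 53) and the message values are illustrative, and the message is signed directly as an integer, which is the raw scheme described here rather than what production libraries do.

```python
# Toy key from small primes (p = 61, q = 53); illustration only.
n, e, d = 3233, 17, 2753

message = 65  # message already encoded as an integer below n

signature = pow(message, d, n)          # s = m^d mod n (private key)
print(pow(signature, e, n) == message)  # True: the signature verifies

# A tampered message no longer matches what the signature recovers.
tampered = 66
print(pow(signature, e, n) == tampered)  # False
```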

Issues with Euler’s Totient Function in RSA

While using Euler’s Totient Function works well in theory, implementers of the scheme realized its practical downsides. Simply put, the primary issue was that Euler’s Totient Function can lead to a larger private exponent $d$ than what was necessary.

To completely appreciate this fact, let’s take a step back to understand why the size of the private exponent $d$ matters in RSA.

RSA decryption involves computing $c^d \bmod n$ (or $m^d \bmod n$ when signing), which is done via modular exponentiation. The running time of exponentiation algorithms (like square-and-multiply) grows with the number of bits in $d$. A larger $d$ means more multiplications and squarings, that is, slower decryption.

In practice, the penalty is tied to the bit length of $d$ rather than its magnitude. If using Euler’s Totient Function makes $d$ roughly twice as large as required, $d$ gains about one bit, which costs roughly one extra squaring (and sometimes one extra multiplication) per operation. The overhead is small but entirely avoidable, and it applies with any common public exponent (such as 3 or 65537), since a small $e$ generally yields a $d$ nearly as long as $φ(n)$ itself.

Beyond performance, having an unnecessarily large $d$ can increase storage size slightly (a few more bytes for the key). This can also lead to interoperability quirks, which is why standards and protocols such as FIPS 186-4 [1] and RFC 8017 [2] expect $d$ to be below a certain size. We will take a detailed look at this in the next section.

To combat these issues, cryptographers utilized the Carmichael function to generate RSA keys. Before we dive into how the Carmichael function helps our case, let’s quickly understand what the Carmichael function actually is.

The Carmichael Function

The Carmichael Function, represented by $λ(n)$, also known as the reduced totient or least universal exponent, is defined as the smallest positive integer $m$ such that for every integer $a$ co-prime to $n$, $a^m \equiv 1 \pmod{n}$.

To put this in easy terms, $λ(n)$ is the exponent of the multiplicative group modulo $n$ (the least common multiple of the orders of all elements). For RSA-style moduli (the product of two distinct primes), the Carmichael function is given by the formula:

$$\lambda(n) = \operatorname{lcm}(p-1,\,q-1)$$

where $n = p \cdot q$ with $p$ and $q$ being the large primes.

You may now understand the Carmichael function better if we put it in the following way: $λ(n)$ is the least common multiple of the values of $λ$ at each prime power dividing $n$. So for a prime $p$, $λ(p) = φ(p) = p - 1$, and for a product of two distinct primes, we take the $\operatorname{lcm}$ of $p-1$ and $q-1$.
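The definition can be checked by brute force on a tiny modulus. This sketch, using the illustrative primes 3 and 11, computes $λ(n)$ from the lcm formula and then confirms that no smaller exponent works for every unit.

```python
from math import gcd

p, q = 3, 11
n = p * q

# lcm via gcd (math.lcm exists from Python 3.9; this also works on older versions).
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
phi = (p - 1) * (q - 1)
print(lam, phi)  # 10 20

units = [a for a in range(1, n) if gcd(a, n) == 1]


def works_for_all(m: int) -> bool:
    """True if a^m is 1 mod n for every a coprime to n."""
    return all(pow(a, m, n) == 1 for a in units)


# The smallest exponent that works for every unit is lambda(n).
smallest = next(m for m in range(1, phi + 1) if works_for_all(m))
print(smallest)  # 10
```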

Mathematical Implication of The Carmichael function

The Carmichael function $λ(n)$ is a “tighter” bound. What this means is that $λ(n)$ divides $φ(n)$ (since the exponent of a finite group always divides the group order, a consequence of Lagrange’s Theorem [3]).

If $p$ and $q$ are both odd primes, then $p-1$ and $q-1$ are both even, so they share at least the factor 2 and their least common multiple is at most half of $(p-1)(q-1)$. Mathematically:

$$\lambda(n) = \dfrac{(p-1)(q-1)}{\gcd(p-1,\, q-1)}$$

We can observe that $λ(n)$ is less than or equal to $φ(n)$ and often considerably smaller. This means $λ(n)$ provides the minimal exponent needed for RSA’s correctness, whereas $φ(n)$ might be a larger number that still works but isn’t necessary.

When you choose two large random primes $p$ and $q$, you have:

$$\varphi(n) = (p-1)(q-1) \approx n,$$

because for large primes, the subtracted ones make only a small difference compared to $p$ and $q$ themselves.

Now, since both $p-1$ and $q-1 $ are even, they each have a factor of 2. If those are their only common factors (which is often the case for random primes), then:

$$\lambda(n) = \mathrm{lcm}(p-1, q-1) \approx \frac{\varphi(n)}{2}.$$

When you compute the private exponent $d$ as the modular inverse of $e$ (a small number) modulo $ \varphi(n)$ versus modulo $\lambda(n)$, the range from which $d$ is chosen is roughly twice as large in the former case. That means the typical $d$ when computed modulo $\varphi(n)$ can be about twice as large as when computed modulo $\lambda(n)$. A larger $d$ means that during decryption (or signing) the modular exponentiation $c^d \mod n$ takes slightly more time.

Intuitively, using $λ(n)$ ensures we don’t “overshoot” the exponent required for the modular arithmetic to cycle back to 1.

A smaller $d$ makes every RSA decryption and signature operation slightly faster. For instance, if $λ(n)$ is roughly half of $φ(n)$, then $d$ will typically have one less bit than it would otherwise, saving about one squaring out of the thousands performed in each exponentiation. The gain is small, but it is free, as we aren’t changing the security assumptions or the key size $n$, just using the mathematically tight value for the exponent. The RSA algorithm’s security is not weakened by this: the resulting $d$ is different but functionally equivalent.
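The two ways of computing $d$ can be compared side by side on a toy key. The primes 11 and 23 and the exponent 3 are arbitrary illustrative choices, and the exact bit savings in a real key depend on $\gcd(p-1, q-1)$ and on the key itself.

```python
from math import gcd

p, q = 11, 23
n = p * q
phi = (p - 1) * (q - 1)         # 220
lam = phi // gcd(p - 1, q - 1)  # lcm(10, 22) = 110

e = 3
d_phi = pow(e, -1, phi)
d_lam = pow(e, -1, lam)
print(d_phi, d_lam)          # 147 37
print(d_phi % lam == d_lam)  # True

# Both exponents decrypt every ciphertext correctly.
print(all(pow(pow(m, e, n), d_phi, n) == m for m in range(n)))  # True
print(all(pow(pow(m, e, n), d_lam, n) == m for m in range(n)))  # True
print(d_phi.bit_length(), d_lam.bit_length())                   # 8 6
```

The smaller exponent is simply the larger one reduced modulo $λ(n)$, which is why the two keys behave identically.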

The Carmichael Function in Modern Implementations

The critical property for RSA, $e \cdot d \equiv 1 \pmod{\lambda(n)}$, is both necessary and sufficient for correct decryption, thanks to Carmichael’s theorem. So there’s no need for $d$ to also satisfy the stronger condition modulo $φ(n)$.

By switching to computing $d$ modulo $λ(n)$ (that is, $d = e^{-1} \bmod \lambda(n)$), we directly get the smallest working private exponent. Ronald Rivest himself noted this optimization in his 1999 seminal paper [4], stating that solving for $d$ using $λ(n)$ instead of $φ(n)$ is slightly preferable because it can result in a smaller value for $d$.

Over time, the use of $ λ(n)$ in RSA moved from an academic suggestion to an industry standard. Today’s cryptographic standards explicitly acknowledge or require the $λ(n)$ approach.

For example, the official RSA standard (PKCS #1 v2.2, RFC 8017 [2]) defines the RSA key generation in terms of $λ(n)$. It specifies that the private exponent $d$ is chosen such that $e \cdot d \equiv 1 \pmod{\lambda(n)}$ (with $\lambda(n) = \operatorname{lcm}(p-1, q-1)$). In other words, PKCS #1 expects the Carmichael function to be used for the modulus of the exponent. Likewise, NIST’s FIPS 186-4 (Digital Signature Standard) mandates that $d$ be less than $λ(n)$.

Any RSA key where $d$ is larger than $λ(n)$ is considered non-compliant in those strict contexts. This effectively forces implementations to use the smaller $λ(n)$-based exponent, since any “oversized” $d$ can be reduced modulo $λ(n)$ to meet the criterion.

Standards such as FIPS 186-4 [1] (the Digital Signature Standard) and RFC 8017 [2] (which specifies PKCS#1 v2.2 for RSA Cryptography) include requirements or recommendations that imply the private exponent $d$ should be as small as possible and ideally less than $ \lambda(n)$. Using $\lambda(n)$ (the least common multiple of $p-1$ and $q-1$) directly produces the smallest valid $d$, whereas using $\varphi(n)$ often results in a $d$ that is larger than necessary. This not only improves performance (by reducing the number of modular multiplications needed during decryption/signing) but also helps maintain interoperability with protocols that expect d to be below a certain size.

The Python cryptography library (PyCA cryptography) explicitly documents [5] that it uses Carmichael’s totient to generate the “smallest working value of $d$,” noting that older implementations (including the original RSA paper) used Euler’s totient and ended up with larger exponents. OpenSSL also uses the Carmichael function in their low-level RSA APIs [6].

This shift to the Carmichael function ensures that under the hood your RSA key is a bit more efficient than the ones from the late 1970s while providing the same level of security.

Issues with Raw RSA

Raw or “Textbook” RSA soon turned out to be insecure when two major weaknesses were discovered.

The operations involved in RSA are entirely deterministic, which means that for a given plaintext $m$, encryption always produces the same cipher text $C = m^e \mod n$.

An eavesdropper or an attacker, say Eve, can guess or derive plain texts by exploiting the predictability of outputs. Since RSA encryption is a public operation, an attacker can encrypt likely messages and compare results to a target cipher text – a trivial chosen plaintext attack.

Besides this, textbook RSA is also malleable. This means that its algebraic structure allows attackers to manipulate cipher texts in meaningful ways. For instance, given a cipher text $C = M^e \bmod n$, an attacker can multiply it by the encryption of a known value (say, $r$) to produce a new cipher text $C' = C \cdot r^e \bmod n$. When the legitimate receiver decrypts $C'$, the result is $M \cdot r \bmod n$. An attacker who learns that result can recover $M$ by multiplying it by the inverse of $r$ modulo $n$.
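The reason the modified cipher text decrypts to $M \cdot r$ follows from the exponent laws. As a short derivation, using only the symbols already introduced and the fact that decryption undoes encryption:

$$\begin{aligned} C' &\equiv C \cdot r^e \equiv M^e \cdot r^e \equiv (M \cdot r)^e \pmod{n} \\ (C')^d &\equiv \left((M \cdot r)^e\right)^d \equiv M \cdot r \pmod{n} \end{aligned}$$

In other words, $C'$ is exactly the cipher text that an honest sender would have produced for the message $M \cdot r \bmod n$, so the receiver has no way to tell it was manipulated.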

Let’s understand these vulnerabilities with a small practical example.

Exploiting Textbook RSA’s Determinism and Malleability

Key Generation (Setup)

For our toy example, we’ll choose small prime numbers and generate an RSA key pair:

Let’s select the values of $p =3$ and $q=11$. Both of these values are prime. Now, compute the modulus and Totient Function as follows:

$$\begin{aligned} n &= p \times q = 3 \times 11 = 33 \\ \varphi(n) &= (p-1) \times (q-1) = 2 \times 10 = 20 \end{aligned}$$

Now choose the public exponent. Let’s consider $e = 3$, since it is coprime with $\varphi(n) = 20$, that is, $\gcd(3, 20) = 1$.

Now let’s compute the private exponent. We know that $d$ is the modular inverse of $e$ modulo $\varphi(n)$. We need to find $d$ such that $d \times e \equiv 1 \pmod{20}$. Using this knowledge we can compute $d = 7$, as $3 \times 7 = 21 \equiv 1 \pmod{20}$.

Finally, the public key is $(n = 33, ~ e = 3)$ and the private key (secret) is $d = 7$.

Encryption Process

Now, let’s encrypt a simple message using the above key. Let us select our plaintext to be $M = 4$. The cipher text in this case would be:

$$\begin{gather} \begin{split} C = 4^3 ~~mod ~33 \\ C = 64 ~~mod ~33 \\ C = 64 – 33×1 = 31 \end{split} \end{gather}$$

To consolidate the findings so far, if we encrypt message $4$ with the public key $(e=3, n=33)$, we will produce the cipher text $31$. Now, let’s try the exploits.
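The key generation and encryption steps above can be reproduced in a few lines of Python. This is only a sketch of the toy example (the variable names are my own, and the numbers are far too small for real use). It relies on the three-argument `pow`, which also computes modular inverses from Python 3.8 onwards.

```python
from math import gcd

# Toy parameters from the worked example (never use primes this small).
p, q = 3, 11
n = p * q                    # modulus: 33
phi = (p - 1) * (q - 1)      # Euler's totient: 20

e = 3
assert gcd(e, phi) == 1      # e must be coprime with phi(n)

d = pow(e, -1, phi)          # modular inverse of e mod phi(n)

M = 4
C = pow(M, e, n)             # textbook RSA encryption: M^e mod n
recovered = pow(C, d, n)     # textbook RSA decryption: C^d mod n

print(n, phi, d, C, recovered)  # 33 20 7 31 4
```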

Determinism Exploit (Ciphertext Guessing Attack)

Textbook RSA is deterministic – the same plaintext always yields the same ciphertext (with no randomness involved). An attacker who intercepts the ciphertext $C=31$ can exploit this by encrypting likely plaintext guesses and comparing results:

The adversary, say Eve, will try encrypting candidate plaintexts with the public key and see which one produces $31$. In practice they would try the most likely messages first. Here the message space is so small that Eve can simply go through the candidates in order:

$$\begin{gather} \begin{aligned} Guess~ M = 1 ⇒ 1^3~~ mod ~33 = 1 \\ Guess~ M = 2 ⇒ 2^3~~ mod ~33 = 8 \\ Guess~ M = 3 ⇒ 3^3~~ mod ~33 = 27 \\ Guess~ M = 4 ⇒ 4^3~~ mod ~33 = 31 \\ \end{aligned} \end{gather}$$

By simply comparing ciphertexts, the attacker finds that encrypting $4$ yields 31, which matches the intercepted ciphertext. Thus, the attacker learns the original plaintext $M$ was $4$. This is possible because there’s no randomization in textbook RSA – an eavesdropper can identify a message by trial encryption of guesses, breaking confidentiality if the message space is small or guessable.
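As a rough sketch, the guessing attack is a short loop over candidate plaintexts. The function name `guess_plaintext` is hypothetical, and the exhaustive search only works here because the toy message space has just $33$ values.

```python
n, e = 33, 3          # public key from the toy example
intercepted = 31      # ciphertext captured by Eve

def guess_plaintext(c, e, n):
    """Return every m in [0, n) whose textbook-RSA encryption equals c."""
    return [m for m in range(n) if pow(m, e, n) == c]

matches = guess_plaintext(intercepted, e, n)
print(matches)  # [4]
```

Eve never touches the private key. Everything she needs is the public key and the ability to encrypt.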

Malleability Exploit (Ciphertext Manipulation Attack)

Raw RSA is also malleable. This means an attacker can take a ciphertext and modify it in a way that results in a predictable change in the decrypted plaintext. Let’s understand how this works.

RSA has a multiplicative property, that is, multiplying two ciphertexts corresponds to multiplying their plaintexts before encryption:

$$E(M_1) \cdot E(M_2) \mod n = (M_1^e \mod n)\times(M_2^e \mod n) \mod n = (M_1 \cdot M_2)^e \mod n$$

The sequence diagram below explains how the malleability exploit works in naive RSA.

Alice sends a ciphertext to Bob after the initialization phase. Note that by this point, n and e are public knowledge. Eve intercepts this ciphertext by using mechanisms such as a MiTM (Man in the Middle) attack.

Now, Eve picks a known value to manipulate the message. Let’s say the attacker chooses $X = 2$ (with the intent to double the original plaintext).

Then they compute the encryption of X using the public key:

$$E(X) = 2^3 \mod 33 = 8.$$

Now, Eve multiplies the original ciphertext by this value (mod n) to get a new ciphertext:

$$\begin{gather} \begin{split} C{\prime} = C \times E(X) \mod n = 31 \times 8 \mod 33 \\ C{\prime} = 248~~ mod~ 33 = 248 – 33×7 = 248 – 231 = 17 \end{split} \end{gather}$$

This new ciphertext $C{\prime}$ is the encryption of the product of the original plaintext and $2$. If we directly encrypted $M \times X = 4 \times 2 = 8$ with RSA, we would get $8^3 \mod 33 = 512 \mod 33 = 17$. This means that $C′$ corresponds to the plaintext $8$, which is the original message $4$ multiplied by $2$.

In a real-world chosen ciphertext attack, the attacker may have access to a decryption oracle or observe a system response that reveals information about $M{\prime}$. The decryption result $8$ is exactly $M \times 2$ (the original message multiplied by the attacker’s chosen factor). Knowing the factor $X = 2$, the attacker can deduce the original message by dividing: $8/ 2 = 4$.

Note that Eve has not broken the mathematical foundations behind RSA here. They have only used the public key to compute an encryption of $2$, and then combined it with the intercepted ciphertext. They don’t know the original plaintext yet, but they have manipulated the ciphertext in a way that they know the new plaintext is twice the original message.
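The same manipulation can be sketched in Python with the toy key. The private exponent appears only to simulate what Bob (or a decryption oracle) would compute. One detail worth noting: in general the product $M \times X$ can wrap around the modulus, so the "division" at the end is done with a modular inverse rather than ordinary integer division.

```python
n, e = 33, 3      # public key
d = 7             # private key, used here only to simulate Bob's decryption

C = 31            # intercepted ciphertext (the encryption of 4)
X = 2             # factor chosen by Eve

# Eve multiplies the ciphertext by the encryption of X.
C_prime = (C * pow(X, e, n)) % n

# Bob (or a decryption oracle) decrypts the modified ciphertext.
M_prime = pow(C_prime, d, n)

# Eve undoes the factor using the modular inverse of X.
M = (M_prime * pow(X, -1, n)) % n

print(C_prime, M_prime, M)  # 17 8 4
```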

Low-Exponent Attacks

Beyond determinism and malleability exploits, textbook RSA is also vulnerable to Low-Exponent Attacks. Using a small public exponent like $e = 3$ (or sometimes $17$) was popular because it used to speed up encryption and signature verification. But this soon turned out to be a security concern.

When RSA uses a small public exponent (say, $e = 3$) and the plaintext is very short (so that $M^3$ is smaller than the modulus $n$), the encryption does not “wrap around” modulo $n$. Mathematically:

$$c = M^3 \mod n = M^3 \quad \text{(if $ M^3 < n $)}$$

Let’s understand this with an easy example:

Consider our plaintext to be: $M = 5$. We compute $M^3$ as $M^3 = 5^3 = 125$.

Now assume $n$ is a $4096$‑bit number which is large compared to $125$. In this case, the ciphertext is simply $c = 125$. Eve intercepting $c = 125$ can compute the cube root of $125$ to get the plaintext: $\sqrt[3]{125} = 5$ thus recovering $M$ directly.

This shows that if $M$ is small enough, the ciphertext leaks the plaintext when $e$ is low.
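A minimal sketch of this cube-root recovery is shown below. The helper `integer_root` is my own (a plain binary search), and the modulus is only a stand-in of the right size, not a real RSA modulus, since the only thing that matters here is that $M^3 < n$.

```python
def integer_root(x: int, k: int) -> int:
    """Largest integer r with r**k <= x, found by binary search."""
    lo, hi = 0, 1
    while hi ** k <= x:
        hi *= 2
    # Invariant: lo**k <= x < hi**k
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid
    return lo

e = 3
n = (1 << 4095) + 1     # placeholder 4096-bit number, NOT a real modulus
M = 5

c = pow(M, e, n)        # no wrap-around happens, so c is just M^3
recovered = integer_root(c, e)

print(c, recovered)     # 125 5
```

Floating-point cube roots lose precision for large integers, which is why the sketch uses exact integer arithmetic.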

Håstad’s Broadcast Attack: Low Exponent Meets Multiple Recipients

In 1985, Johan Håstad highlighted the broadcast attack, which illustrates the danger of a low exponent $e$ when the same message is sent to multiple parties as a broadcast.

Imagine Alice wants to send the same plaintext message M to three different recipients. Each recipient has their own RSA public key with modulus $N_1, N_2, N_3,$ but for speed all use $e = 3$ (a common practice historically). Alice encrypts $M$ with each public key, yielding ciphertexts:

$$\begin{gather} \begin{split} C_1 = M^3 \bmod N_1 \\ C_2 = M^3 \bmod N_2 \\ C_3 = M^3 \bmod N_3 \end{split} \end{gather}$$

Eve, who intercepts all three $C_1, C_2, C_3$ can recover M without breaking any single RSA key.

Since each $N_i $ is different (and we assume they are pairwise coprime, as RSA keys should be), the attacker can use the Chinese Remainder Theorem (CRT) to combine the three congruences $x \equiv C_i \pmod{N_i}$. Note that at this point Eve only has $C_1$, $C_2$ and $C_3$. They do not have the plaintext $M$ or $M^3$ and yet they can reconstruct $M^3$ with the intercepted data. To understand the Chinese Remainder Theorem and this reconstruction, you may follow this: CRT, RSA, and Low Exponent Attacks | Youtube.

There is a unique solution modulo $N_1N_2N_3$ for $x$, and that solution turns out to be the integer $x = M^3$ (because each $M < N_i$, the true integer $M^3$ is smaller than the product $N_1N_2N_3$). In essence, CRT lets Eve reconstruct $M^3$ exactly. Once they have $M^3$ as an ordinary integer, they simply take the cube root to find $M$. There’s no need to factor any modulus or invert the RSA function – the math falls out due to the low exponent.

The sequence diagram below aims to provide a high-level understanding of the attack:

Now let’s see this attack in action with a sample:

Suppose three different RSA public keys all use exponent $e=3$, with moduli $n_b = 187$ (for Bob), $n_c = 115$ (for Carol), and $n_d = 87$ (for Dave).

These $n_i$ are pairwise coprime ($gcd$ of each pair is $1$). Now assume the same plaintext message $M$ is encrypted with each public key. Let’s take a concrete $M$. For example with $M=42$, we will have:

$$\begin{gather} \begin{split} c_b = M^3 \bmod n_b \\ c_c = M^3 \bmod n_c \\ c_d = M^3 \bmod n_d \\ \end{split} \end{gather}$$

On calculating these, we have:

$$\begin{gather} \begin{split} c_b = 42^3 \bmod 187 = 36 \\ c_c = 42^3 \bmod 115 = 28 \\ c_d = 42^3 \bmod 87 = 51 \\ \end{split} \end{gather}$$

So the three ciphertexts observed are $36$, $28$, and $51$, respectively. Eve who knows $n_b, n_c, n_d$ and these ciphertexts can now recover $M$ as follows:

Eve will compute the total modulus $N = n_b \cdot n_c \cdot n_d = 187 \times 115 \times 87 = 1,870,935.$ (This is the modulus for the combined system of congruences).
Now Eve will compute the partial products for each congruence:

$$\begin{gather} \begin{split} N_b = \frac{N}{n_b} = \frac{1,870,935}{187} = 10,005 \\ N_c = \frac{N}{n_c} = \frac{1,870,935}{115} = 16,269 \\ N_d = \frac{N}{n_d} = \frac{1,870,935}{87} = 21,505 \end{split} \end{gather}$$

At this point, Eve needs the inverses of each $N_i$ modulo its corresponding $n_i$:
- First Eve computes $M_b = (N_b)^{-1} \bmod n_b$, i.e. the number $M_b$ such that $N_b \cdot M_b \equiv 1 \pmod{187}$. In this case, $N_b = 10005$. Using the extended Euclidean algorithm, Eve can find $M_b = 2$ (since $10005 \times 2 = 20010 \equiv 1 \pmod{187}$).
- Then Eve computes $M_c = (N_c)^{-1} \bmod n_c$. Here $N_c = 16269$. The inverse mod $115$ turns out to be $M_c = 49$ (For verification: $16269 \times 49 \equiv 1 \pmod{115}$).
- Next up, Eve computes $M_d = (N_d)^{-1} \bmod n_d$. For $N_d = 21505$, the inverse mod $87$ is $M_d = 49$ as well (coincidentally the same value in this case, since $21505 \times 49 \equiv 1 \pmod{87}$).

Now Eve reconstructs the combined value using the Chinese Remainder Theorem for three congruences. The construction of this formula is beyond the scope of this handbook, but to completely understand how this springs into action, you may go through this video: CRT, RSA and Low Exponent Attacks | Youtube.

$$C \;=\; c_b \cdot N_b \cdot M_b \;+\; c_c \cdot N_c \cdot M_c \;+\; c_d \cdot N_d \cdot M_d \pmod{N}$$

On substituting the numbers:

$$C = 36 \cdot 10005 \cdot 2 \;+\; 28 \cdot 16269 \cdot 49 \;+\; 51 \cdot 21505 \cdot 49 \pmod{1,870,935}$$

Let’s carefully evaluate each term:

$$\begin{gather} \begin{split} 36 \cdot 10005 \cdot 2 = 720,360 \\ 28 \cdot 16269 \cdot 49 = 22,321,068 \\ 51 \cdot 21505 \cdot 49 = 53,740,995 \\ \end{split} \end{gather}$$

Summing these gives a raw total of $720,360 + 22,321,068 + 53,740,995 = 76,782,423$. Now reduce this modulo $N = 1,870,935$:

$$\begin{align} \begin{split} C \equiv 76,782,423 \pmod{1,870,935}\\ C = 74,088 \\ \end{split} \end{align}$$

Now Eve will simply take the cube root of $C$: $\sqrt[3]{74088} = 42$, which is the original plaintext. Eve has successfully recovered $M$.
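The whole calculation can be checked with a short Python sketch. The helper names `crt` and `integer_root` are my own. The code assumes Python 3.8 or later for `math.prod` and for modular inverses via `pow`.

```python
from math import prod

def crt(residues, moduli):
    """Combine x = c_i (mod n_i) for pairwise-coprime moduli."""
    N = prod(moduli)
    total = 0
    for c_i, n_i in zip(residues, moduli):
        N_i = N // n_i                 # partial product
        M_i = pow(N_i, -1, n_i)        # inverse of N_i modulo n_i
        total += c_i * N_i * M_i
    return total % N

def integer_root(x: int, k: int) -> int:
    """Largest integer r with r**k <= x, found by binary search."""
    lo, hi = 0, 1
    while hi ** k <= x:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid
    return lo

e = 3
moduli = [187, 115, 87]                 # Bob, Carol, Dave
M = 42
ciphertexts = [pow(M, e, n_i) for n_i in moduli]

m_cubed = crt(ciphertexts, moduli)      # equals M^3 because M^3 < N
recovered = integer_root(m_cubed, e)

print(ciphertexts, m_cubed, recovered)  # [36, 28, 51] 74088 42
```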

The key takeaway from these attacks is that, without proper defenses, RSA alone does not satisfy modern definitions of security. It is not resistant to chosen-plaintext or chosen-ciphertext attacks. This gap between the theoretical one-way function (RSA’s trapdoor permutation) and a secure encryption scheme became evident as implementers found that naive RSA could be “broken” by various clever tricks.

To counter these weaknesses, standards bodies introduced padding schemes to strengthen RSA encryption. In the following sections, you will learn about each of these padding schemes and how they’ve been exploited over the years.

Introduction to Padding Schemes in RSA

Before we dive into the padding schemes and how they help our case, let’s quickly recap the need for padding in RSA.

Textbook RSA encryption is deterministic. The same plaintext always produces the same ciphertext under a given public key. This determinism makes raw RSA insecure. An attacker can guess possible messages, encrypt them with the public key, and compare with the target ciphertext to see which guess matches.

Beyond determinism, small-exponent attacks illustrate why padding is critical. If the message $m$ is too small relative to the modulus, raising it to a small public exponent (like $e=3$) might not wrap around $N$. Padding the plaintext with random data before encryption remedies these problems by making the ciphertext unpredictable and ensuring $m^e$ spans the modulus’ range.

Public Key Cryptography Standards (PKCS#1 v1.5)

In 1998, Kaliski and RSA Laboratories published the PKCS#1 v1.5 specification publicly [7]. In PKCS#1 v1.5, every RSA‐encrypted message is wrapped inside a special “encryption block” $EB$. This block ensures that the raw message is both the right size for RSA and padded in a way that’s hard to tamper with.

In this scheme, the plaintext is padded to the size of the modulus $N$ (in bytes) as:

$$EB = 00 ~||~ BT ~||~ PS ~||~ 00 ~||~ M$$

Here, $0x00$ (Leading Zero Byte) is always at the front. It ensures that, when the concatenated string $EB$ is converted to a big‐endian integer, the value is less than the RSA modulus (that is, we don’t end up with a number too large for RSA to handle). You will better appreciate this fact when we dive into the mathematics behind this.

The next octet is the Block Type, $BT$, which tells us the “type” of padding being used. The standard defines three possible $BT$ values, $00$, $01$, and $02$, to support different operations. For example, $BT=00$ and $BT = 01$ are used for private-key operations (such as digital signatures) and $BT = 02$ is used for public-key operations. For encryption under PKCS#1 v1.5, this is always $0x02$. It’s basically a label that says, “This is an encryption block, not something else”.

The next block is the Padding String $PS$. This is a string of nonzero random bytes. This is crucial for security because it introduces randomness into each encryption. If the same message is encrypted multiple times, these random bytes ensure that each ciphertext looks different, foiling many simple attacks that rely on seeing repeated patterns.

The next octet, $0x00$, is a Delimiter. This single zero byte marks the end of the padding. During decryption, this helps the recipient quickly identify where the padding stops and the real message begins.

Finally, we have the actual data you want to protect – $M$. Once the recipient has verified the padding, they know exactly where to find this message.

This mechanism helped solve the deterministic issue of naive RSA. In the next sections, let’s understand the mathematics involved in PKCS#1 v1.5 padding and its security implications.

The Mathematics Behind PKCS#1 v1.5

Before we begin, let’s get our symbols and abbreviations correct. We will use upper-case symbols (such as $EB$) to denote octet strings and bit strings. We will use lower-case symbols (such as $n$) to denote integers.

In PKCS#1 v1.5, we will use $k$ to represent the length of the RSA modulus $n$ in bytes. For example, if you have a $1024$-bit RSA key, then the RSA modulus $n$ is a $1024$-bit number. Since there are $8$ bits in a byte, if your RSA modulus is $L$ bits long, then:

$$k = \left\lceil \frac{L}{8} \right\rceil = \frac{1024}{8} = 128 \text{ bytes}$$

The total length of the encryption block will be equal to this RSA key length $k$ (in bytes). The length of the data $M$ must not exceed $k-11$ octets, since at least $11$ bytes are consumed by the rest of the block: three fixed bytes ($0x00$, $0x02$, and the $0x00$ delimiter) plus a padding string $PS$ of at least eight octets, which is a security condition in PKCS#1 v1.5. The length of $PS$ is therefore:

$$∣PS∣=k~−∣M∣−~3$$

For example, with a $1024$-bit RSA modulus, the value of $k$ comes out to be $128$. Here Alice could encrypt up to $128 - 11 = 117$ bytes of data. The $11$ bytes are used for the $0x00 ~||~ 0x02 ~||~ PS ~||~ 0x00$ structure. The random $PS $ ensures that each encryption of the same message produces a different ciphertext, preventing the deterministic encryption problem.
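To make the layout concrete, here is a minimal sketch of a PKCS#1 v1.5 encryption-block builder. The function name `pkcs1_v15_pad` is hypothetical and this is for illustration only. Production code should rely on a vetted cryptographic library instead.

```python
import secrets

def pkcs1_v15_pad(message: bytes, k: int) -> bytes:
    """Build EB = 00 || 02 || PS || 00 || M for a k-byte modulus."""
    if len(message) > k - 11:
        raise ValueError("message too long for this modulus size")
    ps_len = k - len(message) - 3
    # PS must consist of random NONZERO bytes (values 1..255), so that
    # the first 0x00 after the header is unambiguously the delimiter.
    ps = bytes(secrets.randbelow(255) + 1 for _ in range(ps_len))
    return b"\x00\x02" + ps + b"\x00" + message

eb = pkcs1_v15_pad(b"hello", k=128)
print(len(eb), eb[:2].hex(), eb[-6:])  # 128 0002 b'\x00hello'
```

Because $PS$ is drawn fresh on every call, padding the same message twice yields two different blocks, and therefore two different ciphertexts.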

RSA doesn’t directly operate on the bytes. Once the padded string $EB$ is ready, it needs to be converted into an integer guided by the Octet String to Integer Primitive (OS2IP) formula:

$$x = \sum_{i=1}^{k} 2^{8(k - i)} \,\mathrm{EB}_i$$

where $EB_i$ are the octets of $EB$ from first to last. In other words, $EB_1$ (the first byte) is the most significant byte, and $EB_k$ (the last byte) is the least significant. Now Alice can simply encrypt this block using $C = x^e \mod n$.

To solidify our learnings so far, let’s apply this to a sample plaintext and find the padded blocks.

Let’s assume the RSA modulus is $8$ bytes long ($k=8$). Suppose we want to encrypt a message $M$ that is $2$ bytes long. Then the padding string $PS$ must fill the remaining space:

$$Total ~ bytes=k=8=1(0x00)+1(BT)+∣PS∣+1(delimiter)+∣M∣$$

Since $∣M∣=2$ and there are $3$ fixed bytes, we can find the required length of the padding string:

$$∣PS∣=8−3−2=3 ~ bytes$$

Let’s pick 3 arbitrary nonzero bytes for $PS$, say - $0xA3, ~0x5F, ~0xC2$. And let’s say the message is the ASCII text “Hi”. In hexadecimal, that’s: $0x48$ for 'H' and $0x69$ for 'i'.

Thus, the complete encryption block becomes:

$$EB = 0x00 ~||~ 0x02 ~||~ 0xA3 ~||~ 0x5F ~||~ 0xC2 ~||~ 0x00 ~||~ 0x48 ~||~ 0x69$$

Now we will convert this octet string to an integer using the OS2IP formula we discussed above:

$$x = \sum_{i=1}^{k} 2^{8(k - i)} \,\mathrm{EB}_i$$

For our example, with $k=8$ the conversion is:

$$x= 0x00×256^7+0x02×256^6+0xA3×256^5+0x5F×256^4+0xC2×256^3+0x00×256^2+0x48×256^1+0x69×256^0$$

Note that the hexadecimal values can be converted to decimal as needed. For instance, $0xA3 = 163, 0x5F = 95, 0xC2 = 194, 0x48 = 72,$ and $0x69 = 105$.

There is an interesting observation in the application of this formula. Because the first two bytes are fixed ($0x00$ and $0x02$), the integer $x$ has a known lower bound. The contribution of the first two bytes is:

$$0×256^ 7 +2×256^ 6 =2×256^ 6$$

The rest of the bytes ($PS$, the delimiter, and $M$) add some value that is at least $0$ and at most just less than $256^6$ (since the second byte is fixed as $0x02$ and cannot be $0x03$). Thus, $x$ is in the range:

$$2×256 ^ 6 ≤x<3×256 ^ 6$$
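The conversion and the range claim are easy to verify in Python. The sketch below writes OS2IP out as the sum from the formula and compares it with the built-in `int.from_bytes`. Keep in mind that the $8$-byte block is only a teaching example: a three-byte $PS$ would violate the eight-octet minimum in a real implementation.

```python
# The 8-byte toy block: 00 || 02 || A3 5F C2 || 00 || "Hi"
eb = bytes([0x00, 0x02, 0xA3, 0x5F, 0xC2, 0x00]) + b"Hi"
k = len(eb)

# OS2IP written out as the sum in the formula (EB_1 is most significant).
x_sum = sum(byte << (8 * (k - i)) for i, byte in enumerate(eb, start=1))

# The built-in big-endian conversion gives the same integer.
x = int.from_bytes(eb, "big")
assert x == x_sum

lower, upper = 2 * 256**6, 3 * 256**6
print(hex(x), lower <= x < upper)  # 0x2a35fc2004869 True
```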

This property, which makes the range predictable, paved the way for the Bleichenbacher attack (also known as the “padding oracle” attack). If a system reveals whether a decrypted block is “correctly padded,” an attacker can systematically probe different ciphertexts and narrow down the plaintext – because the attacker knows it must lie in that narrow range. Let’s take a detailed look at the Bleichenbacher attack in the next sections and understand how the exploit works.

The Bleichenbacher Attack

In 1998, Daniel Bleichenbacher published a seminal paper [8] demonstrating an adaptive chosen-ciphertext attack against RSA with PKCS#1 v1.5 padding. The Bleichenbacher Attack, also dubbed as the “million messages” attack, demonstrated that if an attacker has access to an oracle that tells whether a submitted ciphertext decrypts to a properly padded plaintext (that is, whether the PKCS#1 v1.5 formatting is correct), the attacker can gradually recover the full plaintext. Let’s break down how this attack works:

First, Eve needs an Oracle. The attack assumes the attacker can query a system, such as an SSL/TLS server, and find out if a given ciphertext $C$ is PKCS#1 v1.5 conformant. In the 1998 paper, Bleichenbacher exploited the fact that a TLS server, when presented with an improperly padded RSA-encrypted premaster secret, would respond with a specific error alert if the padding was wrong. Essentially, the server acted as an oracle: it would decrypt $C$ with its private key and simply tell the attacker “padding OK” or “padding error” (the error could be timing-based or an explicit alert).

Note that the oracle does not reveal the plaintext. It only reveals a single bit of information at a time: “valid padding or not.” This might seem harmless, but Bleichenbacher showed that it’s enough to eventually recover the plaintext.

To quickly recap, the attacker’s goal is to find the unknown message integer $m$ (the PKCS#1-padded plaintext as an integer) given its ciphertext $C = m^e \bmod N$, using the oracle. We know that if $m$ is properly padded, it lies in a specific numeric range: $2B \le m < 3B$ where $B = 2^{8*(k-2)}$. This generalizes the range $2×256^6 ≤ x < 3×256^6$ that we derived earlier for $k = 8$.

If $k=128$ bytes, then $B=2^{8*126}$, and a correctly padded $m$ will start with $0x00 ~||~0x02$, so it’s between $2B$ and $3B$. The attacker, Eve, initially only knows that $m$ is in the range $[2B, 3B)$.

In the Bleichenbacher Attack, Eve will exploit RSA’s multiplicative property. They will choose a number $s$ (called the multiplier) and compute a new ciphertext $C' = (C s^e) \bmod N$. This $C'$ here corresponds to a new plaintext: $m' = m s \bmod N$ (because $C' \equiv m^e * s^e \equiv (ms)^e \pmod{N}$).

To begin the attack, Eve finds some $s_0$ such that $C_0 = C * (s_0)^e \mod N$ yields a valid padding. This is referred to as the Blinding step. If the intercepted $C$ is itself a valid PKCS#1 v1.5 ciphertext, this is trivial, since $s_0 = 1$ already works. Otherwise, Eve tries random values of $s_0$ until the oracle accepts one. The attacker does not know $m$ to verify this directly. They rely on the padding oracle’s yes/no response to infer that the blinded plaintext $(m×s_0)\mod N$ falls in the correct range.

If the oracle returns “valid padding” for a given $s_0$, it tells the attacker that $(m × s_0) \mod N$ lies between $2B$ and $3B$. Mathematically:

$$2B≤(m×s_0)~mod N<3B$$

Now, Eve will try to narrow down this range in a loop, which is often referred to as the interval halving step. Initially, Eve had one wide interval $[a, b] = [2B, 3B - 1]$ that contains $m$. In each iteration, Eve tries increasing values of $s$ (starting from a certain minimum) until the oracle returns “padding OK” for $C' = C_0 * s^e$. Suppose this happens at some $s = s_i$. Given this feedback, Eve now knows:

$$2𝐵 ≤ (𝑚 × 𝑠_i) ~ mod 𝑁 < 3𝐵$$

This congruence implies there exists some integer $r$ such that:

$$2B ≤ ( m×s_i)−rN < 3B$$

Rearranging, we get a constraint on $m$:

$$\frac{2B+rN}{s_i} ≤ m < \frac{3B+rN}{s_i}$$

Eve doesn’t know $r$ outright, but they can solve for the possible range of $r$ by considering the current interval $[a,b]$ for $m$. Essentially, Eve uses the previous bounds on $m$ to guess which $r$ would make the inequality true, then updates the new bounds $[a, b]$ as the intersection of all possible solutions for $m$. This dramatically shrinks the interval.

Each oracle query yields such a constraint. Eventually, the interval $[a,b]$ collapses to a single value, $[a,a]$. This value $a$ is the blinded plaintext $(m × s_0) \bmod N$, so Eve undoes the blinding from the first step to find the plaintext:

$$m = (a × s_0^{-1}) ~ mod N$$

At that point, Eve has recovered the entire padded plaintext $m$, and by stripping off the padding, the original message itself.
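The steps above can be sketched end to end on a toy key. Everything in this block is an illustrative assumption: the $63$-bit modulus built from two well-known primes, the function names, and above all the simplified oracle, which only checks that the decrypted integer lies in $[2B, 3B)$ (that is, that it starts with $0x00 ~||~ 0x02$) and ignores the rest of the padding structure. The attack starts from a valid ciphertext, so no blinding is needed ($s_0 = 1$) and the final value $a$ is the plaintext itself.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def merge(intervals):
    """Merge overlapping inclusive (lo, hi) intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def bleichenbacher(c, e, n, k, oracle):
    """Recover m from c = m^e mod n using only a yes/no padding oracle."""
    B = 1 << (8 * (k - 2))
    intervals = [(2 * B, 3 * B - 1)]
    s = ceil_div(n, 3 * B) - 1
    first_round = True
    while True:
        if first_round or len(intervals) > 1:
            # Linear search for the next multiplier the oracle accepts.
            s += 1
            while not oracle(c * pow(s, e, n) % n):
                s += 1
            first_round = False
        else:
            # One interval left: pick r, then try the few s values that fit.
            a, b = intervals[0]
            r = ceil_div(2 * (b * s - 2 * B), n)
            found = False
            while not found:
                lo_s = ceil_div(2 * B + r * n, b)
                hi_s = ceil_div(3 * B + r * n, a)      # exclusive bound
                for cand in range(lo_s, hi_s):
                    if oracle(c * pow(cand, e, n) % n):
                        s, found = cand, True
                        break
                r += 1
        # Narrow every interval using 2B <= m*s - r*n <= 3B - 1.
        narrowed = []
        for a, b in intervals:
            r_lo = ceil_div(a * s - 3 * B + 1, n)
            r_hi = (b * s - 2 * B) // n
            for r in range(r_lo, r_hi + 1):
                lo = max(a, ceil_div(2 * B + r * n, s))
                hi = min(b, (3 * B - 1 + r * n) // s)
                if lo <= hi:
                    narrowed.append((lo, hi))
        intervals = merge(narrowed)
        if len(intervals) == 1 and intervals[0][0] == intervals[0][1]:
            return intervals[0][0]

# Toy key: 2^31 - 1 and 2^32 - 5 are both prime (far too small for real use).
p, q, e = 2**31 - 1, 2**32 - 5, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
k = (n.bit_length() + 7) // 8        # 8 bytes
B = 1 << (8 * (k - 2))

def oracle(ciphertext: int) -> bool:
    # Simplified oracle: "does the plaintext start with 00 02?"
    return 2 * B <= pow(ciphertext, d, n) < 3 * B

m = int.from_bytes(bytes.fromhex("0002a35fc2004869"), "big")  # the "Hi" block
c = pow(m, e, n)

recovered = bleichenbacher(c, e, n, k, oracle)
print(recovered == m, recovered.to_bytes(k, "big")[-2:])  # True b'Hi'
```

The attacker code never reads `d` or `m`. It only calls `oracle`, which mirrors the single bit of information a leaky server would reveal per query.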

The sequence diagram below consolidates our learning of the attack:

The Bleichenbacher attack showed that the format of the padding in PKCS#1 v1.5 leaked just enough information to enable a full private-key operation (decrypting the message) without ever factoring $N$. The attack leveraged the fact that it’s possible to craft ciphertexts that will decrypt to a valid-looking plaintext without knowing the plaintext. In essence, PKCS#1 v1.5 padding allowed roughly a $1$ in $2^{16}$ chance for a random blob to appear as “valid padding.” That was enough for an adaptive attack to succeed with a feasible number of queries.

This is precisely what later padding designs like OAEP fixed. OAEP’s design makes such random valid ciphertexts astronomically unlikely (plaintext aware). We will learn about RSA-OAEP in the next sections.

To mitigate the Bleichenbacher attack without immediately changing the padding scheme, practitioners implemented defensive measures. For example, TLS implementations were changed to treat all decryption failures the same way (so an attacker can’t distinguish padding errors from other errors), and servers would generate a fake premaster secret on padding failure to continue the handshake and avoid timing leaks. Nonetheless, the safest course has been to deprecate PKCS#1 v1.5 encryption in favor of schemes like RSA-OAEP.

Optimal Asymmetric Encryption Padding (OAEP)

By the end of 1995, Bellare and Rogaway proposed Optimal Asymmetric Encryption Padding (OAEP) with the goal of achieving provable security. This padding aimed to make RSA encryption resistant not just to passive attacks but also to adaptive chosen-ciphertext attacks. In other words, even if an attacker can trick a system into decrypting chosen ciphertexts (as an “oracle”), they should learn nothing useful about the plaintext. OAEP was subsequently standardized in PKCS#1 v2.0 (published as RFC 2437 in 1998) and later versions.

Compared to PKCS#1 v1.5, OAEP has a more complex encoding that uses hash functions and a mask generation function (MGF) to thoroughly randomize the plaintext before RSA encryption, providing stronger guarantees.

OAEP’s design can be viewed as a two-layer Feistel-like network using a random seed. It takes the input message and randomizes it in a way that is reversible only with the correct seed. The scheme was proven plaintext-aware in the random oracle model which means that an adversary cannot concoct a valid ciphertext without knowing the corresponding plaintext. If an attacker tries to forge or tamper with ciphertexts, they almost surely produce an invalid padding that will be rejected. This property directly counters padding-oracle attacks.

OAEP (with a proper hash/MGF) is semantically secure against adaptive chosen ciphertext attacks, assuming RSA is hard to invert and treating the hash functions as random oracles. Unlike PKCS#1 v1.5, which lacked a formal proof, OAEP comes with a proof sketch that breaking RSA-OAEP is as hard as breaking RSA itself.

In practice, this means OAEP drastically reduces the risk of any padding oracle: an attacker can no longer easily find ciphertexts that slip through the padding check except by brute force which has a $2^{-hLen*8}$ success probability. For example, the success probability with SHA-1 would be $2^{-160}$.

The block diagram below is a visual representation of the OAEP encoding schema:

Let’s understand what these mathematical notations mean and how RSA-OAEP works, up next.

The Mathematics Behind OAEP

Optimal Asymmetric Encryption Padding requires a hash function for two operations we will discuss in this section. We will choose SHA-1 as a hash function in OAEP and $hLen$ denotes the length in octets of the hash function output. We will later demonstrate why even MD5 or SHA-1 is a secure choice for OAEP even if it is not collision resistant.

Before we dive into the mathematics, let’s recap a few notations and define the main pieces we’ll be using:

In RSA, $N$ is the modulus, and $k$ is the size of $N$ in bytes. For a $2048$-bit modulus, $k=256$ bytes.

$M$ is the message or plaintext to be encrypted. This plaintext must be short enough to fit into the padded block (at most $k−2⋅hLen−2$ bytes). In our notation, $Hash$ refers to the cryptographic hash function (for example, SHA-1, SHA-256) of output length $hLen$. For example: If using SHA-1, $hLen=20$ bytes.

We will also use an optional string associated with the message (often empty). This is the Label $L$. If this label is empty, its hash is a fixed value. (For example: the SHA-1 of an empty string).

The hash of this label $L$ is represented by $lHash$, where $lHash=Hash(L)$. As mentioned earlier, if $L$ is empty, $lHash$ is simply $Hash('')$. This means that in any case $lHash$ will hold a value.
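For instance, with SHA-1 and an empty label, $lHash$ is the familiar digest of the empty string. A quick check with Python's standard `hashlib` module:

```python
import hashlib

label = b""                              # empty label L
l_hash = hashlib.sha1(label).digest()    # lHash = Hash(L)

print(len(l_hash), l_hash.hex())
# 20 da39a3ee5e6b4b0d3255bfef95601890afd80709
```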

We will also use a Mask Generation Function, $MGF$, which is often mentioned as $MGF1$. This function takes an input (seed or masked data) and produces an output of a specified length by iterating the underlying hash function. We’ll write $MGF(input,length)$ to indicate “generate a mask of $length$ bytes from $input$”.
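A minimal sketch of $MGF1$ is shown below. It hashes the input together with a 4-byte big-endian counter, concatenates the digests, and truncates to the requested length. The function name `mgf1` and the default of SHA-1 are choices made for this illustration.

```python
import hashlib

def mgf1(seed: bytes, length: int, hash_func=hashlib.sha1) -> bytes:
    """Generate `length` mask bytes from `seed` by iterated hashing."""
    output = b""
    counter = 0
    while len(output) < length:
        # Each round hashes seed || counter (counter as 4 big-endian bytes).
        output += hash_func(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return output[:length]

mask = mgf1(b"example seed", 50)
print(len(mask))  # 50
```

The output is fully determined by the input, which is what allows the recipient to regenerate the same masks during decryption.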

Now that you are familiar with all the necessary notations, we are ready to begin the encoding step.

Step 1: Constructing the Data Block (DB)

We will compute $lHash=Hash(L)$. If $L$ is empty, $lHash$ is a constant (For example, the SHA-1 of the empty string).

Next, we form the padding string $PS$. Its length is chosen so that the entire block $DB$ has length $(k−hLen−1)$ bytes. Concretely, $PS$ consists of $(k−mLen−2⋅hLen−2)$ bytes of $0x00$, where $mLen$ is the length of the message $M$.

Now we simply concatenate the blocks to generate the octet string for the Data Block ($DB$):

$$DB=lHash~∣∣~PS~∣∣~0x01~∣∣~M$$

Here the single byte $0x01$ acts as a delimiter which marks where the zero padding ends and the actual message $M$ begins. $DB$ ends up being $(k−hLen−1)$ bytes.

Step 2: Generating a Mask for the Data Block

First, we pick a random string called $seed$ of length $hLen$ bytes. For example, when using SHA-1 where $hLen=20$, then we say that the seed consists of $20$ random bytes.

Now we use the mask generation function, $MGF$, on the $seed$ to create a mask the same length as $DB$:

$$dbMask=MGF(seed,k−hLen−1)$$

The idea is to spread the randomness of the seed across the entire $DB$.

Step 3: Mask the Data Block

Now, we will combine $DB$ and $dbMask$ with the bitwise $XOR$ operation:

$$maskedDB=DB \oplus dbMask$$

This step “scrambles” $DB$ with the random seed.

Step 4: Generate a Mask for the Seed

Next, we will produce a mask for the seed itself, based on $maskedDB$:

$$seedMask=MGF(maskedDB,hLen)$$

This step simply ensures that the seed is not left in the clear.

Step 5: Mask the Seed

Now we will combine the original seed and the new mask with an $XOR$ operation:

$$maskedSeed=seed \oplus seedMask$$

Now the seed is also “scrambled” by the data block.

Step 6: Form the Final Encoded Message (EM)

We are now ready to build our final block. Simply concatenate everything into a $k$-byte string:

$$EM=0x00~∣∣~maskedSeed~∣∣~maskedDB$$

The leading $0x00$ byte ensures that when $EM$ is interpreted as an integer, it’s less than the RSA modulus $N$. At this point, $EM$ is your OAEP-padded message of length $k$.
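Steps 1 through 6 can be sketched as a single Python function. The names `oaep_encode`, `mgf1`, and `xor` are my own, SHA-1 is used only to match the running example, and the code is meant to illustrate the data flow rather than to replace a vetted library.

```python
import hashlib
import secrets

def mgf1(seed: bytes, length: int, hash_func=hashlib.sha1) -> bytes:
    output, counter = b"", 0
    while len(output) < length:
        output += hash_func(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return output[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_encode(message: bytes, k: int, label: bytes = b"",
                hash_func=hashlib.sha1) -> bytes:
    h_len = hash_func().digest_size
    if len(message) > k - 2 * h_len - 2:
        raise ValueError("message too long")
    # Step 1: DB = lHash || PS || 0x01 || M
    l_hash = hash_func(label).digest()
    ps = b"\x00" * (k - len(message) - 2 * h_len - 2)
    db = l_hash + ps + b"\x01" + message
    # Step 2: random seed and mask for DB
    seed = secrets.token_bytes(h_len)
    db_mask = mgf1(seed, k - h_len - 1, hash_func)
    # Step 3: mask DB
    masked_db = xor(db, db_mask)
    # Step 4: mask for the seed, derived from maskedDB
    seed_mask = mgf1(masked_db, h_len, hash_func)
    # Step 5: mask the seed
    masked_seed = xor(seed, seed_mask)
    # Step 6: EM = 0x00 || maskedSeed || maskedDB
    return b"\x00" + masked_seed + masked_db

em = oaep_encode(b"Hi", k=128)
print(len(em), em[0])  # 128 0
```

Calling `oaep_encode` twice on the same message produces two unrelated-looking blocks, because a fresh seed is drawn each time.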

Step 7: Convert the Concatenated String to an Integer

Remember from our discussion before on PKCS#1v1.5 that RSA cannot directly operate on this concatenated string of bytes. We need to convert the $EM$ block to a non-negative integer using the OS2IP formula:

$$x = \sum_{i=1}^{k} 2^{8(k - i)} \,\mathrm{EM}_i$$

Step 8: Perform RSA Encryption

Now that we have the encoded message ($EM$) as an integer $x$, we are ready to perform RSA guided by the formula:

$$C =x^e \bmod N$$

where $(e,N)$ is the public key. The thus computed $C$ is our ciphertext generated using RSA-OAEP.

When decrypting, the process is reversed: the recipient uses their private key $d$ to compute $x = C^d \bmod N$, converts $x$ back into the $EM$ block, then splits it into the $0x00$, $maskedSeed$, and $maskedDB$, and uses the same $MGF$ and hash function to unravel the $XORs$ in reverse order. Finally, they check that the recovered $lHash'$ matches the expected hash and that the block contains the proper structure ($...||0x01||...$).

If any check fails, the padding is invalid. Only if everything checks out is the message $M$ returned. The result is that an invalid ciphertext will almost always be detected and rejected without giving an attacker any useful information.
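The decoding side can be sketched in the same way. The function `oaep_decode` below is a hypothetical illustration that raises one generic error for every failure. Real implementations must also perform these checks in constant time, which plain Python does not guarantee. The demo builds an encoded block with a fixed seed purely to keep the example reproducible. Real OAEP always uses a fresh random seed.

```python
import hashlib

def mgf1(seed: bytes, length: int, hash_func=hashlib.sha1) -> bytes:
    output, counter = b"", 0
    while len(output) < length:
        output += hash_func(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return output[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_decode(em: bytes, label: bytes = b"",
                hash_func=hashlib.sha1) -> bytes:
    h_len = hash_func().digest_size
    first, masked_seed, masked_db = em[0], em[1:1 + h_len], em[1 + h_len:]
    # Undo the two XOR layers in reverse order.
    seed = xor(masked_seed, mgf1(masked_db, h_len, hash_func))
    db = xor(masked_db, mgf1(seed, len(masked_db), hash_func))
    l_hash, rest = db[:h_len], db[h_len:]
    sep = rest.find(b"\x01")
    valid = (
        first == 0
        and l_hash == hash_func(label).digest()
        and sep >= 0
        and rest[:sep] == b"\x00" * sep      # PS must be all zero bytes
    )
    if not valid:
        raise ValueError("decryption error")  # one generic error only
    return rest[sep + 1:]

# Demo block for k = 64 with a FIXED seed (for reproducibility only).
k, h_len = 64, 20
seed = bytes(range(h_len))
db = (hashlib.sha1(b"").digest()
      + b"\x00" * (k - 2 - 2 * h_len - 2) + b"\x01" + b"Hi")
masked_db = xor(db, mgf1(seed, len(db)))
masked_seed = xor(seed, mgf1(masked_db, h_len))
em = b"\x00" + masked_seed + masked_db

print(oaep_decode(em))  # b'Hi'
```

Flipping even a single bit of `em` changes the recovered seed, which scrambles the whole data block, so the $lHash$ comparison fails and the ciphertext is rejected.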

By design, OAEP effectively foiled the padding oracle problem. The chance that a random guess produces a valid OAEP encoding is negligible (on the order of $2^{-hLen*8}$). In fact, Daniel Bleichenbacher (after breaking PKCS#1 v1.5) advocated for exactly such a “plaintext-aware” padding where forging a valid padding is infeasible.

Why SHA-1 or MD5 Are Safe in RSA-OAEP

Earlier in the section above, we mentioned that we’d be using SHA-1 for our mathematical formulation and examples. When you see SHA-1 or MD5 used in the context of RSA-OAEP, don’t let the fact that these hash functions are considered broken for collision resistance alarm you. If you notice carefully in the previous section, the hash functions serve two very specific roles that do not rely on their collision resistance. Let’s break them down one by one:

Label Hashing

The hash function is used to compute a fixed-length hash of an optional label $L$ (often empty).

Now let’s see why this is safe in this context. This hash, called $lHash$, acts as a domain separator. Its job is simply to ensure that the label is correctly associated with the ciphertext during decryption. As long as the label is chosen wisely (that is, not built from adversary-controlled parts), collision resistance isn’t critical here.

Mask Generation Function (MGF1)

The hash function is also used inside $MGF1$ to create a pseudorandom mask. This mask is applied both to the data block $DB$ and to the random seed used in the encoding process.

In this context, the hash function is treated as a random oracle. The job is to spread the randomness of the seed across a larger block of data. For this purpose, properties like length extension or collision resistance are not relevant. What matters is that the output appears random, and even SHA-1 or MD5 can deliver that when used in this controlled, fixed-input scenario.

Adoption in Cryptographic Libraries (PKCS#1 v1.5 vs OAEP)

After the Bleichenbacher attack, standards and libraries migrated to OAEP or at least added support for it, while treating PKCS#1 v1.5 as a legacy option. Modern cryptographic libraries and protocols reflect these lessons.

In 1998, the RSA standard was updated. PKCS#1 v2.0 introduced RSAES-OAEP as the new recommended encryption scheme, and by PKCS#1 v2.1 and v2.2 (RFC 3447 and RFC 8017), OAEP is required for new applications, with PKCS#1 v1.5 included only for backward compatibility.

OpenSSL discourages users from using PKCS#1 v1.5 as it leaks information that can potentially be used to mount a Bleichenbacher padding oracle attack [10]. The documentation clearly mentions that it is highly recommended to use RSA_PKCS1_OAEP_PADDING in new applications.

The Python cryptography library (PyCA cryptography) also asks developers to use OAEP for encryption instead of PKCS#1 v1.5 [11].

After Bleichenbacher’s 1998 attack, it was impractical to instantly replace PKCS#1 v1.5 everywhere. Instead, protocol designers issued countermeasures.

TLS, for example, responded by changing the error handling: the server would not reveal a padding failure distinctly. It would generate a fake premaster secret and proceed to prevent timing clues, and always return a generic handshake failure at a later stage, making it harder for the attacker to distinguish why decryption failed.
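
The idea behind this countermeasure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `decrypt` is a hypothetical callable standing in for the server's RSA PKCS#1 v1.5 decryption (returning `None` on a padding failure), and plain Python code like this makes no constant-time guarantees.

```python
import os
from typing import Callable, Optional

PREMASTER_LEN = 48  # a TLS RSA premaster secret is 48 bytes long

def recover_premaster(
    ciphertext: bytes,
    decrypt: Callable[[bytes], Optional[bytes]],  # hypothetical decryption hook
    client_version: bytes,
) -> bytes:
    # Always generate the fallback first, whether or not it ends up being used
    fake = os.urandom(PREMASTER_LEN)
    plaintext = decrypt(ciphertext)  # assumed to return None on bad padding
    valid = (
        plaintext is not None
        and len(plaintext) == PREMASTER_LEN
        and plaintext[:2] == client_version
    )
    # No distinct error is raised: the handshake continues with `fake`
    # and fails later with a generic error
    return plaintext if valid else fake
```

With a bad ciphertext the caller still receives 48 bytes and carries on, so the failure only shows up later when the handshake keys don't match.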

These countermeasures reduced the oracle’s fidelity but were tricky to get right across different implementations. In fact, not everyone got it right – the Bleichenbacher attack continued to resurface in various forms when implementations made mistakes in error handling.

In late 2017, researchers disclosed the ROBOT attack (Return Of Bleichenbacher’s Oracle Threat): several TLS implementations had subtle bugs that recreated a padding oracle, allowing the attack to succeed 19 years later. The ROBOT paper showed that even with countermeasure guidelines, the complexity of uniformly handling errors led to slip-ups in popular products.

This underscores that patching an insecure scheme is often error-prone – a design that is secure by construction (like OAEP) is preferable.

PKCS#1 v1.5 continues to exist because of these patchwork security measures and the fact that it cannot be abruptly removed from all existing systems. It is generally regarded as "legacy" or maintained "for compatibility" purposes. The collective wisdom is clear: use OAEP for RSA encryption whenever possible.

Enhancing Digital Signatures: The Transition to PSS

Now that you understand how OAEP transformed RSA encryption by mitigating vulnerabilities in deterministic padding, it’s time to turn our attention to RSA digital signatures – a critical function for ensuring message integrity and authenticity.

Early RSA signature schemes suffered from similar problems as raw encryption: their deterministic nature made them prone to forgery and replay attacks. This vulnerability paved the way for an improvement: the Probabilistic Signature Scheme (PSS).

Before we dive into PSS itself, let’s quickly understand the pain points with early RSA signatures.

Problems with Early RSA Signature Schemes

Traditional RSA signatures were generated by simply applying the RSA decryption function on a message digest (often with minimal formatting):

$$s=m^d \bmod N$$

where $m$ is the hash (or encoded hash) of the message. This approach was deterministic, which meant that each time the same message was signed, exactly the same signature was produced. Such determinism had two major drawbacks:

Predictability and Replay

Since the signature for a given message was always identical, an attacker could replay a captured signature with impunity or forge signatures if they could deduce patterns in the signature scheme.

Forgery Risks

In a deterministic setting, if an attacker finds any structure or mathematical relationship in the signature, they might be able to forge a valid signature for a new message. In certain scenarios, weak formatting could allow an adversary to create a “signature transformation” that produces a valid signature without having access to the private key.

These issues highlighted that a signature scheme must be probabilistic to be secure against adaptive forgery attempts and to ensure non-repudiation. This means that the signer should not be able to repudiate a signature because it is bound to a random value known only at signing time.
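
Both drawbacks are easy to reproduce with textbook RSA. The sketch below uses a deliberately tiny toy key ($p = 61$, $q = 53$) that is an assumption made for illustration and is far too small for real use. It signs raw integers with no padding at all.

```python
# Toy textbook RSA key (p = 61, q = 53); illustration only, not secure
N, e, d = 3233, 17, 2753

def sign(m: int) -> int:
    # Raw RSA signature: s = m^d mod N, with no padding or randomness
    return pow(m, d, N)

def verify(m: int, s: int) -> bool:
    # Valid if s^e mod N gives back the message representative
    return pow(s, e, N) == m

m1, m2 = 42, 55
s1, s2 = sign(m1), sign(m2)

# Determinism: signing the same message again yields the same signature
assert sign(m1) == s1

# Multiplicative forgery: the product of two signatures is a valid
# signature on the product of the two messages, no private key needed
forged_m = (m1 * m2) % N
forged_s = (s1 * s2) % N
print(verify(forged_m, forged_s))  # True
```

The forgery works because $(s_1 s_2)^e \equiv m_1 m_2 \pmod N$. A randomized, structured encoding such as PSS destroys this algebraic relationship.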

Birth of the Probabilistic Signature Scheme (PSS)

Towards the end of 1998, Bellare and Rogaway also proposed a scheme to overcome the inherent limitations of deterministic RSA signatures [12]. The core idea was to introduce randomness into the signature generation process so that even when signing the same message twice, the resulting signatures would be different. This randomness comes from a salt value and a carefully designed encoding process. The result is a signature method with strong, provable security guarantees.

This randomness prevents attackers from exploiting patterns in the signature process. The Probabilistic Signature Scheme was designed to be provably secure in the random oracle model, meaning that forging a signature would be as hard as breaking RSA itself under certain assumptions [13].

The block diagram below is a visual representation of the PSS encoding schema:

Let’s understand what these mathematical notions mean as well as the workings of RSA-PSS, up next.

The Mathematics Behind PSS

Before diving into the mechanics of RSA-PSS, it’s helpful to define the notations and terms you’ll see in the steps ahead.

In RSA, $N$ is the modulus, a large integer that is the product of two primes. $k$ is the length of $N$ in bytes. For a $2048$-bit key, $k=256$ bytes.

$M$ represents the message data or document you want to sign. In RSA-PSS, you’ll typically first compute a hash of $M$. $Hash$ refers to a cryptographic hash function (for example, SHA-256) that maps data to a fixed-size output. The output length is denoted $hLen$. For SHA-256, $hLen=32$ bytes.

We will use a salt, $S$, a randomly generated string of fixed length (often the same as $hLen$). This randomness is essential in ensuring that each signature is unique, even for the same message.

$H$ (also written $mHash$) is the hash of the message $M$, and $H'$ is a secondary hash computed over $mHash$ and the salt $S$. It appears in the PSS encoding step.

The Mask Generation Function, $MGF$, is a function that uses the hash internally to produce a pseudorandom output of arbitrary length. In PSS, it is used to “mask” parts of the data block so that the signature is hard to forge.

A fixed byte, $0xbc$ (in hex), is appended at the end of the encoded message to mark the boundary of the PSS structure. This serves as a simple integrity check during decoding. After a successful encoding we receive an encoded message $EM$, which is an octet string of length $emLen = \left\lceil{\frac{emBits}{8}}\right\rceil$. Here $emBits$ is one less than the bit length of $N$: for a 2048-bit key, $emBits = 2047$ and $emLen = 256$.

Now that you are familiar with all the necessary notations, we are ready to begin the encoding step.

Step 1: Message Hashing and Salt Generation

We compute the hash of the message as $H = mHash = Hash(M)$, where $M$ is our message. We will also create a random salt $S$ (of fixed length, say 20 bytes if you use SHA-1).

Step 2: Encoding the Hash with the Salt (PSS-Encode)

We first build an intermediate block, $M'$, by combining a padding with the hash and the salt. The padding is a sequence of eight zero bytes that gives the block a fixed prefix. Mathematically:

$$M' = (0x)~00 ~00 ~00 ~00 ~00 ~00 ~00 ~00 ~||~ mHash ~||~ salt$$

Now we compute the hash of this block as $H' = Hash(M')$. Next, we construct the Data Block, $DB$. We generate another octet string $PS$, consisting of $emLen - sLen - hLen - 2$ zero bytes (where $sLen$ is the length of the salt), and concatenate it with $0x01$ as a delimiter, followed by the salt:

$$DB = PS ~||~ 0x01 ~||~ salt$$

Note that DB is an octet string of length $emLen - hLen - 1$. The mask that you see in the visual representation above must be of this length. Mathematically:

$$dbMask = MGF(H', emLen - hLen - 1)$$

We will then apply this mask on the $DB$ block using an $XOR$ operation to produce our $maskedDB$:

$$maskedDB = DB \oplus dbMask$$

Recall that $emLen$ is the intended length of the Encoded Message $EM$ and $hLen$ is the length of the hash output. Now we append the hash $H'$ and a fixed trailer field $0xbc$ to produce the encoded message in its octet string representation:

$$EM = maskedDB ~||~ H' ~||~ 0xbc$$

This encoding process ensures that both the salt and the hash are mixed together in a non-reversible, pseudorandom manner. The randomness from the salt is “spread” over the data block by the $MGF$, making it extremely difficult for any adversary to manipulate the signature.
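
Here is a minimal sketch of the encoding step in Python, following EMSA-PSS-ENCODE from RFC 8017 with SHA-256 and a 32-byte salt. The function and variable names are illustrative assumptions. It also includes one detail the walkthrough above skips: the leftmost bits of $maskedDB$ are cleared so that $EM$, read as an integer, stays below the modulus.

```python
import hashlib
import os

def mgf1(seed: bytes, length: int, hash_func=hashlib.sha256) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hash_func(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def pss_encode(message: bytes, em_bits: int, salt_len: int = 32) -> bytes:
    h_len = hashlib.sha256().digest_size
    em_len = (em_bits + 7) // 8                      # ceil(emBits / 8)
    if em_len < h_len + salt_len + 2:
        raise ValueError("encoding error: modulus too small")
    m_hash = hashlib.sha256(message).digest()        # H = mHash = Hash(M)
    salt = os.urandom(salt_len)                      # fresh random salt S
    m_prime = b"\x00" * 8 + m_hash + salt            # M'
    h_prime = hashlib.sha256(m_prime).digest()       # H' = Hash(M')
    ps = b"\x00" * (em_len - salt_len - h_len - 2)   # zero padding PS
    db = ps + b"\x01" + salt                         # DB
    db_mask = mgf1(h_prime, em_len - h_len - 1)      # dbMask
    masked_db = bytearray(a ^ b for a, b in zip(db, db_mask))
    # Clear the leftmost 8*emLen - emBits bits so EM fits below the modulus
    masked_db[0] &= 0xFF >> (8 * em_len - em_bits)
    return bytes(masked_db) + h_prime + b"\xbc"      # EM

em = pss_encode(b"hello", em_bits=2047)              # 2048-bit modulus
print(len(em), em[-1:].hex())  # 256 bc
```

Calling `pss_encode` twice on the same message gives two different encodings, because a new salt is drawn each time.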

Step 3: RSA Signature Generation

Once you have the encoded message $EM$, the RSA signature is produced by using the RSA private key. First, convert the octet string $EM$ to its integer representation, $m = OS2IP(EM)$, using the OS2IP method we’ve discussed before. Then apply the RSA private key operation:

$$s=m^d \bmod N$$

where $d$ is the private exponent and $N$ is the RSA modulus.
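
As a rough sketch of this step, the snippet below converts an encoded message to an integer and applies the private key operation. To keep it self-contained with only the standard library, it assumes a toy key built from two known Mersenne primes. This key is for illustration only and must never be used in practice. The `em` value is a random stand-in for a real PSS-encoded message.

```python
import os

# Toy key from two Mersenne primes: illustration only, NOT secure
p, q = 2**521 - 1, 2**607 - 1
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse (Python 3.8+)
k = (N.bit_length() + 7) // 8       # modulus length in bytes

def os2ip(octets: bytes) -> int:
    # Octet string to integer, big-endian
    return int.from_bytes(octets, "big")

def i2osp(x: int, length: int) -> bytes:
    # Integer to octet string of the given length, big-endian
    return x.to_bytes(length, "big")

def rsa_sign_encoded(em: bytes) -> bytes:
    m = os2ip(em)
    if m >= N:
        raise ValueError("message representative out of range")
    s = pow(m, d, N)                # s = m^d mod N
    return i2osp(s, k)

# Stand-in for a PSS-encoded message: top bit cleared, trailer 0xbc
em = bytes([os.urandom(1)[0] & 0x7F]) + os.urandom(k - 2) + b"\xbc"
signature = rsa_sign_encoded(em)

# Applying the public exponent recovers the same integer
assert pow(os2ip(signature), e, N) == os2ip(em)
```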

Step 4: Signature Verification

At the receiver end, when any recipient wants to verify a signature, they reverse the process:

$$m' = s^e \bmod N$$

and convert $m'$ back to an encoded message $EM$. The verifier then extracts the components $(maskedDB, H', trailer)$, unmasks $DB$ using $MGF(H')$ to recover the salt, and recomputes $H'$ from the hash of the received message and that salt. If the recomputed value matches the $H'$ embedded in $EM$, and the padding and trailer are well formed, the signature is valid.
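
The verification side can be sketched as follows, modelled on EMSA-PSS-VERIFY from RFC 8017. Everything here is an illustrative assumption: the names, the SHA-256 and 32-byte salt choice, and the toy Mersenne-prime key (not secure). A compact encoder is included only so that the demo has something to verify.

```python
import hashlib
import os

H_LEN, S_LEN = 32, 32  # SHA-256 output length and salt length, in bytes

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mgf1(seed: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += sha256(seed + counter.to_bytes(4, "big"))
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pss_encode(message: bytes, em_bits: int) -> bytes:
    # Compact encoder, present only to feed the demo below
    em_len = (em_bits + 7) // 8
    salt = os.urandom(S_LEN)
    h_prime = sha256(b"\x00" * 8 + sha256(message) + salt)
    db = b"\x00" * (em_len - S_LEN - H_LEN - 2) + b"\x01" + salt
    masked = bytearray(xor(db, mgf1(h_prime, len(db))))
    masked[0] &= 0xFF >> (8 * em_len - em_bits)
    return bytes(masked) + h_prime + b"\xbc"

def pss_verify(message: bytes, em: bytes, em_bits: int) -> bool:
    em_len = (em_bits + 7) // 8
    if len(em) != em_len or em_len < H_LEN + S_LEN + 2 or em[-1] != 0xBC:
        return False
    masked_db, h_prime = em[:-H_LEN - 1], em[-H_LEN - 1:-1]
    top_mask = 0xFF >> (8 * em_len - em_bits)
    if masked_db[0] & ~top_mask & 0xFF:
        return False                    # bits that must be zero are set
    db = bytearray(xor(masked_db, mgf1(h_prime, len(masked_db))))
    db[0] &= top_mask
    ps_len = em_len - H_LEN - S_LEN - 2
    if any(db[:ps_len]) or db[ps_len] != 0x01:
        return False                    # PS || 0x01 is malformed
    salt = bytes(db[-S_LEN:])           # recovered salt
    return sha256(b"\x00" * 8 + sha256(message) + salt) == h_prime

# Toy key from two Mersenne primes: illustration only, NOT secure
p, q, e = 2**521 - 1, 2**607 - 1, 65537
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))
em_bits = N.bit_length() - 1

s = pow(int.from_bytes(pss_encode(b"hello", em_bits), "big"), d, N)  # sign
em = pow(s, e, N).to_bytes((em_bits + 7) // 8, "big")                # m' = s^e mod N
print(pss_verify(b"hello", em, em_bits), pss_verify(b"tampered", em, em_bits))
# True False
```

A production verifier would also compare the hashes in constant time. This sketch skips that for brevity.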

The Road Ahead: Assessing RSA’s Long-Term Viability

In 1994, Peter Shor introduced an algorithm [14] demonstrating that a quantum computer can factor large integers in polynomial time, thereby efficiently breaking RSA’s underlying hard problem – the difficulty of factoring $N = p \times q$.

Although experimental quantum computers have made progress, they remain far from having the number of stable qubits required to break RSA keys of practical sizes (2048 or 4096 bits).

In anticipation of large-scale quantum computers, the cryptographic community is actively developing and standardizing algorithms believed to be resistant to quantum attacks. These include lattice-based schemes (such as CRYSTALS-Kyber and NTRU), code-based cryptography (such as the McEliece cryptosystem), hash-based signatures (such as XMSS), and multivariate polynomial cryptosystems.

It’s important to note that while OAEP and PSS improve the security of RSA against classical attacks, they do not protect RSA from quantum attacks. In a post-quantum world, even the most secure classical padding will not prevent a quantum computer from breaking RSA using Shor’s algorithm.

In the near term, RSA remains in widespread use and, when implemented with padding schemes such as OAEP and PSS, continues to provide strong security against classical adversaries. But looking ahead, it’s expected that organizations will gradually migrate to post-quantum algorithms as they mature and become standardized.

References

[1] FIPS 186-5: Digital Signature Standard (DSS)

[2] RFC 8017 PKCS #1: RSA Cryptography Specifications

[3] Lagrange's theorem

[4] Ronald L. Rivest, Robert D. Silverman: Are Strong Primes Needed for RSA?

[5] pyca/cryptography

[6] OpenSSL Github: rsa_chk.c

[7] RFC 2313: PKCS #1: RSA Encryption

[8] Daniel Bleichenbacher: Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1

[9] RFC 8017: PKCS #1 RSA Cryptography Specifications Version 2.2

[10] RSA_public_encrypt: Warnings

[11] pyca/PKCS1v1

[12] Probabilistic signature scheme

[13] RFC 8017: RSASSA-PSS

[14] Algorithms for quantum computation: discrete logarithms and factoring

Decoding Chaos: How True Randomness Works in Software Engineering

Gor Grigoryan — Mon, 06 May 2024 16:27:18 +0000

Understanding Randomness

When you hear the word "randomness," what usually comes to mind? You may think of something intangible, an abstract concept without a specific shape or form – it's random.

But randomness is much more than an abstract idea – it's a fundamental aspect of our daily decisions and choices. Whether it's deciding what to eat for breakfast or picking a number from 1 to 10 in a game, randomness plays a crucial role.

Randomness isn't just about unpredictability. It's also about the lack of pattern or predictability in events. For instance, when you toss a coin, the outcome of heads or tails is random because it's equally likely and unpredictable.

Why is Randomness Important in Software Engineering?

This concept is incredibly important in the field of software engineering, where generating true randomness can enhance security, simulations, and algorithms. In software development, this unpredictability is not just a feature—it's a fundamental requirement for various critical functions.

Security

The most crucial role of randomness in software is in the realm of security. Random numbers are used to generate secure keys for encryption, ensuring that sensitive data—be it personal information, financial details, or confidential communications—is protected from unauthorized access.

The randomness ensures that these keys cannot be easily predicted or replicated, fortifying the security barriers (see more in the Randomness in Cryptographic Systems section).

Testing and Quality Assurance

Developers use random inputs to simulate how software might perform under different conditions. This approach helps uncover unexpected bugs and ensures that the software can handle a variety of scenarios, improving its reliability and stability.

Companies like Netflix, Facebook, and Google use Chaos Engineering to make their systems more reliable (learn more in the Chaos Engineering section).

Simulation and Modeling

Randomness is a key component in simulations that mimic real-world phenomena, which can be inherently unpredictable. Whether it's modeling climate patterns, economic markets, or traffic flows, randomness helps create more accurate models that better reflect the complexity of these systems.

Additional Applications

Randomness is used in many other areas: it helps distribute tasks across servers in load balancing, improves efficiency in traffic routing, and adds realism in image generation. It's also crucial for creating unique identifiers like GUIDs (Globally Unique Identifiers) and for shuffling playlists to enhance user experience. As you can see, the use cases for randomness are numerous.

Prerequisites

This article is designed to be accessible, with explanations straightforward enough for readers with various backgrounds. However, a few basic prerequisites can enhance your understanding:

Basic Programming Knowledge: While not essential, some familiarity with programming concepts in languages like C#, Java, or Python could help you grasp examples of how randomness is implemented in code more quickly.
Elementary Math Skills: A basic understanding of probability and statistics is beneficial but not necessary, as the article aims to explain these concepts in simple terms.
Introductory Cryptography: If you're curious about the security aspects of randomness, some background in cryptography concepts like encryption and key generation could be helpful.

Overall, the article is structured to be easy to follow, with no advanced knowledge required. It's meant to introduce the concept of randomness in software engineering broadly, making it suitable for readers from diverse fields.

Here's what we'll cover in this article:

Understanding Randomness
Coin Toss Paradigm
The Illusion of Human Randomness
How Random Number Generators Work
- Simple random number generator
True Random Number Generation (TRNG) and Entropy Sources
Randomness in Software Testing
- Chaos Monkey developed by Netflix
Randomness in Cryptographic Systems
- Could you hack the encryption?
Randomness in Simulation and Modeling
- Monte Carlo Simulation
Future of Randomness in Software Engineering
- Quantum Computing and Quantum Randomness
Wrapping Up

Coin Toss Paradigm

Is tossing a coin truly a random event? At first glance, a coin toss represents the paradigm of randomness: two outcomes, each with an equal chance of occurring.

But if we dive deeper into the physics behind a coin toss, the story starts to unfold differently. Hypothetically, if we could control and replicate every variable involved in the toss – the force applied, the angle of the toss, the air resistance, and even the surface it lands on – would the outcome still be unpredictable?

The answer leans towards a surprising declaration: in a perfectly controlled environment, the result of a coin toss could be predicted with near certainty. This challenges our understanding of randomness, suggesting that what we often perceive as random is influenced by numerous factors, many of which are beyond our control or too complex to replicate in practice.

Thus, we arrive at an insightful conclusion that randomness ≈ the result of variables that are exceedingly difficult to replicate.

A well-known paper by Persi Diaconis, Susan Holmes, and Richard Montgomery, titled “Dynamical Bias in the Coin Toss”, delves into this phenomenon:

Abstract: We analyze the natural process of flipping a coin which is caught in the hand. We show that vigorously flipped coins tend to come up the same way they started. The limiting chance of coming up this way depends on a single parameter, the angle between the normal to the coin and the angular momentum vector. Measurements of this parameter based on high-speed photography are reported. For natural flips, the chance of coming up as started is about .51

[Dynamical Bias in the Coin Toss](https://www.stat.berkeley.edu/~aldous/157/Papers/diaconiscoinbias.pdf)

The Illusion of Human Randomness

For humans, generating a random number, saying a random word, or making a random decision feels like an easy task. But is it really random, or can it be predicted, as we saw with the coin toss?

If you have seen the 2015 movie Focus, you may remember the "priming" scene where they spend the day "priming" their victim to subconsciously recognize and choose the number 55 by having it represented all around him.

Priming is one of the most important psychological principles to understand because it influences behavior through implicit memory. In other words, exposure to a cue in one setting can form an association that carries into another.

One of the examples of priming comes to us from a supermarket bottle shop. Imagine one week you go into the bottle shop and there’s some French music playing in the background. You buy your wine and leave.

Now imagine you return a week later, but this time German music is piping through the speakers. Again, you buy your wine and leave. Chances are that when French music was playing, you purchased French wine, and when German music was playing, German wine – just like 77% and 73% of research participants did.

Were these consumers aware of the music and its impact on their decision? 86% of people said no, the music had no effect.

This phenomenon underscores a profound truth: whether knowingly or not, we are both the primers and the primed. Our perceived randomness in decision-making is continuously shaped by the stimuli around us. This reveals that the essence of human randomness is far more complex and influenced than we might initially believe.

How Random Number Generators Work

Let’s take a journey back to the early days of computing to understand the evolution of random number generators.

Initially, computers were quite basic compared to today’s sophisticated machines. Essentially, a computer operates on a strict set of instructions: it cannot spontaneously generate a number as humans might randomly choose a number from 1 to 10.

For a computer, generating a random number requires specific instructions. Today, this task has become straightforward in many programming languages through built-in functions. For example, in C#, you can generate a random number between 1 and 10 with this simple command:

new Random().Next(1, 11) // <-- Generates a random number from 1 to 10 (the upper bound is exclusive)

The interesting part begins when we look under the hood.

Simple random number generator

What if you were given a task to create a function that generates a random number? Let’s say you have this function:

public static int GenerateRandomNumber(int start, int end)
{
  return ✨🪄 magic ✨🪄
}

One of the simplest ways to do this is using a Linear Congruential Generator (LCG). The example below is a simplistic approach and you shouldn't use it for cryptographic purposes or applications requiring high levels of randomness.

using System;

class SimpleRandomGenerator
{
    private long seed;
    private const long a = 25214903917;        // multiplier
    private const long c = 11;                 // increment
    private const long mask = (1L << 48) - 1;  // m = 2^48, so "mod m" keeps the low 48 bits

    public SimpleRandomGenerator(long seed)
    {
        // Reduce the initial seed into the range [0, m)
        this.seed = seed & mask;
    }

    public int Next(int min, int max)
    {
        // Update the seed: (a * seed + c) mod 2^48.
        // The 64-bit multiplication may wrap around, but keeping the low
        // 48 bits still gives the correct, non-negative result mod 2^48.
        seed = unchecked(a * seed + c) & mask;

        // Ensure the result is within the bounds [min, max)
        int result = (int)(min + (seed % (max - min)));
        return result;
    }
}

class Program
{
    static void Main(string[] args)
    {
        var generator = new SimpleRandomGenerator(DateTime.Now.Ticks);

        for(int i = 0; i < 15; i++)
        {
            var rndNumber = generator.Next(1, 101);

            Console.WriteLine($"Random number between 1 and 100: {rndNumber}");
        }
    }
}

/* Output: 15 lines of the form
Random number between 1 and 100: <n>
where each <n> is between 1 and 100 and changes from run to run,
because the seed is based on the current time.
*/

This example uses the Linear Congruential Generator (LCG) method, which is a basic pseudorandom number generator.

LCGs are one of the oldest and simplest methods for generating sequences of pseudo-random numbers, and they operate based on a simple mathematical formula: "new seed = (a×seed+c) mod m". The seed is typically initialized using a value that changes from run to run, such as the current time (DateTime.Now.Ticks in this case). Keep in mind that the current time is easy to guess, so a time-based seed is not suitable for security purposes. The Next method generates a new "random" number within the specified range [min, max).

Here's the step-by-step logic:

Update the Seed: The seed is updated using the LCG formula mentioned above. This step is critical, as it uses the old seed to produce a new one, ensuring that each call to Next results in a different output.
Scaling the Output: Once the new seed is calculated, it needs to be adjusted to fall within the user-specified range [min, max).
– The modulus operation seed % (max - min) scales the seed to a value within the range of 0 to (max - min) - 1.
– Adding min shifts this scaled value into the desired range, ensuring that the result is at least min but less than max.
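
To see why this kind of generator is called pseudo-random, here is a small Python sketch of the same two steps. The class name `SimpleLCG` and the fixed seed are illustrative choices. Because Python integers don't overflow, the formula can be written exactly as stated.

```python
class SimpleLCG:
    A = 25214903917   # multiplier
    C = 11            # increment
    M = 2**48         # modulus

    def __init__(self, seed: int) -> None:
        self.seed = seed % self.M

    def next(self, low: int, high: int) -> int:
        """Return a pseudorandom integer in [low, high)."""
        # Step 1: update the seed with the LCG formula
        self.seed = (self.A * self.seed + self.C) % self.M
        # Step 2: scale into [0, high - low) and shift by low
        return low + self.seed % (high - low)

first = SimpleLCG(seed=12345)
second = SimpleLCG(seed=12345)
run_a = [first.next(1, 101) for _ in range(5)]
run_b = [second.next(1, 101) for _ in range(5)]
print(run_a == run_b)  # True: same seed, same sequence
```

Two generators started from the same seed produce identical sequences, which is exactly what makes the output predictable to anyone who learns or guesses the seed.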

True Random Number Generation (TRNG) and Entropy Sources

Random number generation based on natural events or hardware characteristics involves using unpredictable, non-deterministic sources to generate randomness. This approach is often referred to as using "entropy sources" or "true random number generation" (TRNG).

Unlike pseudo-random number generators (PRNGs) that use mathematical algorithms and require a seed value, true random number generators derive their randomness from physical events that are almost unpredictable. Here are a few examples:

Earthquakes in TRNG

Earthquakes generate seismic data that is almost unpredictable and can be used as a source of randomness. By measuring seismic activity through geophones or seismographs, the minute variations in the Earth's movement can be converted into random numbers.

Earthquakes occur due to the sudden release of energy in the Earth's crust, resulting in the ground shaking. This energy release is unpredictable and varies in magnitude, location, and frequency. The unpredictability of the timing, duration, and intensity of seismic events makes this a viable entropy source.

[USGS Magnitude 2.5+ Earthquakes data, Past Day](https://earthquake.usgs.gov/earthquakes/map/?currentFeatureId=pr71446783&extent=9.79568,-147.39258&extent=58.99531,-42.62695)

Additional technical details

Here are some additional technical details about earthquakes in TRNG:

Data collection is typically done using instruments called seismometers or geophones, which are sensitive to ground vibrations. These devices convert the kinetic energy of ground movements into electrical signals that can then be digitized and analyzed.

This process might include:

Signal Conditioning and Filtering: Filtering the seismic signals to isolate the random components from predictable noise or background vibrations.
Digitization: Converting the analog signals into digital values, which typically involves sampling the signal at regular intervals and quantizing these samples into digital values.

The raw digital data derived from seismic activity might not be uniformly random due to natural biases in how earthquakes occur or how data is collected.

To ensure that the numbers generated are suitable for use in applications requiring high-quality randomness (such as cryptographic systems), further processing might be necessary.

Here are the common techniques:

Debiasing: Applying algorithms to remove any predictable patterns or biases from the data.
Whitening: Transforming the data to ensure a uniform distribution across all possible values. This often involves statistical tests to adjust the output until it meets the criteria for randomness.
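
As a rough illustration of these two techniques, the sketch below uses von Neumann's classic debiasing trick and a SHA-256 hash as a simple whitening step. Both are common choices that I'm assuming here for the example. The article itself doesn't say which algorithms a seismic TRNG would use.

```python
import hashlib
from typing import Iterable, List

def von_neumann_debias(bits: Iterable[int]) -> List[int]:
    """Read bits in pairs: 01 -> 0, 10 -> 1, and discard 00 and 11."""
    out = []
    it = iter(bits)
    for first, second in zip(it, it):   # consecutive, non-overlapping pairs
        if first != second:
            out.append(first)
    return out

def whiten(raw: bytes) -> bytes:
    """Condense raw samples into 32 evenly distributed bytes via SHA-256."""
    return hashlib.sha256(raw).digest()

biased = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
print(von_neumann_debias(biased))  # [0, 1, 1]
print(whiten(bytes(biased)).hex())
```

The debiasing step throws away a lot of input, which is one reason a slow entropy source such as seismic activity yields random bits at a low rate.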

Using earthquakes for random number generation could be particularly valuable in applications where an external, unpredictable source of randomness is beneficial.

But there are cons and practical considerations:

Geographical Limitations: Not all locations experience frequent seismic activity, which could limit the availability of this method to specific regions.
Event Rarity: Significant seismic events are relatively rare and unpredictable in timing, which might not provide a steady or reliable source of randomness when needed.
Data Collection and Processing Overhead: The infrastructure and computational effort required to capture, process, and utilize seismic data for random number generation can be significant.

Hardware Events in TRNG

Hardware-based random number generators (HRNGs) use physical processes within computing devices to generate randomness. Examples include:

Thermal Noise (Johnson-Nyquist Noise):

Thermal noise, also known as Johnson-Nyquist noise, is a type of interference naturally present in all electronic devices and circuits. It’s caused by the random motion of electrons within a material due to heat. This phenomenon can be used as a source of randomness for generating random numbers in hardware devices.

Every material that conducts electricity has electrons, which are tiny particles that move around and carry electrical current. Even when a device isn’t actively being used, these electrons are never completely still – they move randomly because of the heat energy within the material. The higher the temperature, the more active the electrons become.

Thermal noise is generated by the inherent energy present in all materials at temperatures above absolute zero (-273.15°C or -459.67°F). At these temperatures, electrons gain energy and start moving randomly. This movement causes tiny, random fluctuations in the electrical current when measured across components like resistors.

Thermal noise is ideal for cryptographic applications where high security is essential. This includes key generation and secure communications where unpredictability is paramount to preventing attacks.

In developing secure communication protocols for applications like instant messaging, VoIP, or data transmission systems, thermal noise can be used to generate encryption keys that are nearly impossible to predict, enhancing security.

Clock Drift

Clock drift occurs due to the slight and unpredictable variations in the timing mechanisms (like crystal oscillators) of computers and other digital devices. Clock drift exploits the natural variability in hardware clocks, which are designed to measure time but can drift apart due to minor differences in the frequency of their oscillators.

By comparing the time reported by two or more independent clocks, small differences that occur naturally and unpredictably can be measured. These differences are influenced by factors such as temperature changes, hardware imperfections, and supply voltage variations.
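
A loose software analogue of this idea is to time a fixed piece of work against a high-resolution clock and keep the least significant bit of each measurement. The sketch below is only an assumption-laden illustration. It measures timing jitter rather than two truly independent oscillators, its quality depends heavily on the platform's timer resolution, and the raw bits would still need debiasing before any serious use.

```python
import time
from typing import List

def jitter_bits(count: int) -> List[int]:
    """Collect `count` raw bits from the timing jitter of a busy loop."""
    bits = []
    while len(bits) < count:
        start = time.perf_counter_ns()
        for _ in range(1000):
            pass                        # fixed amount of work
        elapsed = time.perf_counter_ns() - start
        bits.append(elapsed & 1)        # keep the least significant bit
    return bits

print(jitter_bits(16))
```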

[A USB-pluggable hardware true random number generator](https://en.wikipedia.org/wiki/Hardware_random_number_generator#Clockdrift)

Photonic Emission

Photonic emission-based random number generation uses the process of light emission to create random numbers. This approach relies on the quantum nature of light – specifically, the behavior of photons, which are the tiny particles that make up light.

Photonic emission occurs when energy is released from atoms in the form of light. This happens in devices like LEDs (light-emitting diodes) and lasers.

In an LED, when electricity flows through the device, it excites electrons (tiny negatively charged particles) to higher energy states. As these electrons return to their normal states, they release energy in the form of photons.

The exact moment a photon is emitted is inherently unpredictable due to the principles of quantum mechanics, where particles like electrons behave in a probabilistic manner.

To turn photonic emission into random numbers, we first need to detect these photons. We can do this using a device called a photodetector, which captures the light and converts each photon hit into an electrical signal.

The key to randomness lies in the timing of each photon’s arrival at the detector. Since the emission of each photon is random, the times they are detected are also random. These times are then recorded with high precision.
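
One simple way to turn recorded arrival times into bits is to compare consecutive gaps between detections. The sketch below assumes this particular scheme for illustration and uses made-up timestamps. Real devices use their own extraction methods.

```python
from typing import List, Sequence

def bits_from_arrival_times(timestamps: Sequence[float]) -> List[int]:
    """Compare pairs of gaps between detections: longer-first -> 1, shorter-first -> 0."""
    # Gaps between successive photon detections
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    bits = []
    # Use non-overlapping pairs of gaps so each gap contributes to one bit only
    for first, second in zip(intervals[0::2], intervals[1::2]):
        if first != second:             # ties carry no information
            bits.append(1 if first > second else 0)
    return bits

# Hypothetical detection times (arbitrary units)
print(bits_from_arrival_times([0, 3, 4, 9, 11, 12, 20]))  # [1, 1, 0]
```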

Cloudflare’s Lava Lamps for Randomness

Cloudflare, a web performance and security company, has set up a wall of lava lamps in the lobby of their San Francisco office. The setup is known as the “LavaRand” system. It leverages the unpredictable and ever-changing movements of the “lava” inside these lamps to generate randomness.

Cloudflare’s Lava Lamps. The view from the camera

How LavaRand Works:
The process starts with visual capturing. A camera is pointed at the wall of lava lamps. The lamps contain blobs of wax in a liquid that expand and move in unpredictable ways when heated.

As the wax heats up, it rises, and as it cools, it falls, creating an ever-changing, visually chaotic display.

The camera takes images of the lava lamps at regular intervals. Each image captures a unique, random pattern of swirling wax. These images are then processed using computer algorithms to extract random data from the patterns observed in the images.
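
The "extract random data from an image" step can be approximated by hashing the raw image bytes. The sketch below is an assumption about how such a step might look, not Cloudflare's actual code. The `frame` value is a placeholder for real camera data, and the result is mixed with the operating system's own randomness so it never relies on the camera alone.

```python
import hashlib
import os

def seed_from_frame(frame: bytes) -> bytes:
    """Condense a camera frame into a 32-byte seed, mixed with OS randomness."""
    return hashlib.sha256(frame + os.urandom(32)).digest()

frame = bytes(range(256)) * 4   # placeholder for real image bytes
seed = seed_from_frame(frame)
print(len(seed))  # 32
```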

Relation to Photonic Emission:
While Cloudflare’s Lava Lamps use a form of photonic emission, it’s indirect. The photonic emission in this context is the light emitted by the lamps, which illuminates the wax inside.

The random number generation process, however, primarily relies on the chaotic physical movements of the wax, which are captured by the light and recorded by a camera. The randomness comes from how the light and shadows play off the moving lava, rather than the emission and detection of photons at a quantum level (which is more typical in photonic emission RNG systems using LEDs or lasers).

Information from Cloudflare's official website:

LavaRand is a system that uses lava lamps as a secondary source of randomness for our production servers. A wall of lava lamps in the lobby of our San Francisco office provides an unpredictable input to a camera aimed at the wall. A video feed from the camera is fed into a CSPRNG, and that CSPRNG provides a stream of random values that can be used as an extra source of randomness by our production servers. Since the flow of the “lava” in a lava lamp is very unpredictable, “measuring” the lamps by taking footage of them is a good way to obtain unpredictable randomness. Computers store images as very large numbers, so we can use them as the input to a CSPRNG just like any other number.

We’re not the first ones to do this. Our LavaRand system was inspired by a similar system first proposed and built by Silicon Graphics and patented in 1996 (the patent has since expired).

Hopefully, we’ll never need it. Hopefully, the primary sources of randomness used by our production servers will remain secure, and LavaRand will serve little purpose beyond adding some flair to our office. But if it turns out that we’re wrong, and that our randomness sources in production are actually flawed, then LavaRand will be our hedge, making it just a little bit harder to hack Cloudflare.

Read more here.

[First proposed and patented as Lavarand in 1996](https://patents.google.com/patent/US5732138)

Human Factors in TRNG

Mouseware

Some tools like Mouseware use human factors to generate randomness. Mouseware uses a cryptographically secure random number generator based on your mouse movements to generate secure, memorable passwords. Passwords are generated entirely in the browser, and no data is ever sent over the network.

For those generated passwords, it would take 22400.7 years to guess at 1000 guesses/second and 2.0 hours to guess at 100 billion guesses/second.

1000 guesses/second is a worst-case web-based attack. Typically this is the only type of attack feasible against a secure website.
100 billion guesses/second is a worst-case offline attack when a hashed password database is stolen by someone with nontrivial technical and financial resources.
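
The two figures are consistent with each other, which a quick calculation can confirm. The sketch below assumes an average year of 365.25 days. It converts the first estimate into a number of guesses, then divides by the faster rate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average year length

# Figures quoted above
years_online = 22400.7
online_rate = 1_000              # guesses per second
offline_rate = 100_000_000_000   # guesses per second

guesses_needed = years_online * SECONDS_PER_YEAR * online_rate
offline_hours = guesses_needed / offline_rate / 3600

print(f"Guesses needed: {guesses_needed:.3e}")
print(f"Offline attack: {offline_hours:.1f} hours")
```

The same number of guesses that keeps an online attacker busy for millennia takes an offline attacker about two hours, which is why the guessing rate matters as much as the password itself.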

Example of the flow to generate random numbers based on mouse movements

You can read more about Mouseware on their website.

Randomness in Software Testing

Chaos Monkey developed by Netflix

Chaos Monkey

Chaos Monkey is an innovative tool developed by Netflix. It's responsible for randomly terminating Netflix's instances in production to ensure that engineers implement their services to be resilient to instance failures.

Imagine a virtual, mischievous monkey randomly tinkering with the network—shutting down instances, disconnecting servers, or overloading systems to simulate possible failures.

Although it might seem counterintuitive, the purpose of Chaos Monkey is to proactively provoke controlled failures. This strategy allows Netflix's engineers to test how well their systems can handle unexpected disruptions. The aim is to identify and resolve weaknesses before they impact users, ensuring that the infrastructure is robust enough to withstand real-world issues.

For instance, if Chaos Monkey randomly terminates a server and everything continues to run smoothly, that’s a win. If problems arise, engineers quickly analyze and rectify them, thereby strengthening the system. This continuous testing and improvement cycle helps ensure that when you settle in to binge-watch your favorite series, you experience uninterrupted streaming.

Thanks to tools like Chaos Monkey and the principles of Chaos Engineering, Netflix can deliver a seamless viewing experience. Next time you watch a show without any glitches, remember the behind-the-scenes efforts of these unsung heroes keeping your entertainment flawless.

This tool is also available for open source usage. Check out the docs here.

Randomness in Cryptographic Systems

Randomness plays a critical role in cryptographic systems, forming the backbone of security protocols across the digital landscape. This section explores why randomness is essential in cryptography, how it is generated, and the challenges involved in ensuring its effectiveness.

In cryptographic systems, randomness is used to generate keys, initialize cryptographic algorithms, and for non-repudiation processes like digital signatures and secure communications.

The strength and security of almost all cryptographic techniques depend on the quality of the randomness used. If the randomness is predictable, so too are the cryptographic keys, making the system vulnerable to attacks.

If we encrypt the text “Hello World”, we will get this text “oO64D2IzNWKSQnDM8fcZ/w==”. To see the power of encryption, let’s also encrypt variations of the text: “HelloWorld” (without a space) and “Hello world” (with lowercase), while also experimenting with a different encryption key.

Here are the outcomes:

╔═════════════╦═══════════╦══════════════════════════╗
║    Text     ║ Password  ║      Encoded value       ║
╠═════════════╬═══════════╬══════════════════════════╣
║ Hello World ║      1234 ║ oO64D2IzNWKSQnDM8fcZ/w== ║
╠─────────────╬───────────╬──────────────────────────╣
║ HelloWorld  ║      1234 ║ KvqAEHQhP9iBdFWhOUcYVg== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello world ║      1234 ║ jdKRaAw9ULCFb627e3mNpQ== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello World ║       123 ║ S/eGTyDQsgLwcEIrCWUAJw== ║
╠─────────────╬───────────╬──────────────────────────╣
║ HelloWorld  ║       123 ║ /JRa5+mllydL/F0m7NuxYA== ║
╠─────────────╬───────────╬──────────────────────────╣
║ Hello world ║       123 ║ s3AydwlvlgHCcpiAhaurXg== ║
╚═════════════╩═══════════╩══════════════════════════╝

If you consider the above table, you’ll notice that even a small change, such as a change in spacing or a single character, leads to a complete transformation of the encrypted text.

This means that if the intruder manages to obtain both the original text and its encrypted form, they would still face a significant challenge in trying to guess the password required to unlock the entire database.

Could you hack the encryption?

Brute force attacks are a straightforward yet powerful method used by attackers to crack passwords and encryption keys.

A brute force attack involves systematically checking all possible combinations until the correct one is found. Attackers use brute force methods to try every possible key or password until they decrypt the targeted data.

Read more about brute force attacks

In our case, to decrypt the word we would need to try every possible combination (even strings like a, aa, b, bb, and so on).

Now let's calculate how much time is needed to check every possible combination for our password. Suppose you own an exceptionally powerful supercomputer, coupled with cutting-edge technology and virtually unlimited resources.

Let’s say the computer has a whopping 1 terabyte (TB) of RAM allowing it to handle lots of tasks at once. For the CPU, this supercomputer boasts a mind-boggling speed of 1 exaflop, which means it can do about 1 quintillion calculations in just one second. 1 exaflop is equal to 1,000,000,000 gigaflops. So, to achieve 1 exaflop of computing power using Intel i9 processors with a performance of 300 gigaflops each, you would need 1,000,000,000 gigaflops / 300 gigaflops = 3,333,333 Intel i9 processors.

This hypothetical supercomputer, performing mind-blowing calculations at lightning speed, could do a brute-force attack on an encryption algorithm.

If our hypothetical supercomputer were to attempt every possible key to decipher the encrypted data, it would be faced with an astronomical number of possibilities: 2²⁵⁶ for a 256-bit key. Trying them all would take not just years or centuries, but vastly longer than the age of the universe.
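
To put a number on this, here is a back-of-the-envelope calculation. It assumes a 256-bit key space and, very generously, that the machine tests one key per floating-point operation. Both are assumptions made for illustration, not measurements.

```python
FLOPS = 10**18                     # 1 exaflop: operations per second
I9_FLOPS = 300 * 10**9             # 300 gigaflops per processor, as assumed above
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

processors_needed = FLOPS // I9_FLOPS
key_space = 2**256                 # number of possible 256-bit keys
seconds_to_try_all = key_space / FLOPS
years_to_try_all = seconds_to_try_all / SECONDS_PER_YEAR

print(f"i9 processors for 1 exaflop: {processors_needed:,}")
print(f"Keys to try: {key_space:.3e}")
print(f"Years to try them all: {years_to_try_all:.3e}")
```

Under these assumptions the exhaustive search comes out on the order of 10⁵¹ years, so even finding the key halfway through would not help in practice.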

To read more about this, you can refer to this article that I wrote.

Randomness in Simulation and Modeling

Monte Carlo Simulation

The Monte Carlo Simulation is a mathematical technique used to understand the impact of risk and uncertainty in prediction and forecasting models. Essentially, it’s a method used to predict the probability of different outcomes when the intervention of random variables is present.

Named after the famous Monte Carlo Casino due to its reliance on randomness, this method is widely used across finance, engineering, research, and more.

In the context of finance, Monte Carlo simulation is commonly used to assess the risk and value of financial instruments, such as options or portfolios. By generating a large number of random scenarios for different input variables, such as asset prices or interest rates, Monte Carlo simulation can provide a range of possible outcomes and their associated probabilities. This method is mostly used when there is no analytical solution for the given problem.

Telecoms use them to assess network performance in various scenarios, which helps them to optimize their networks. Financial analysts use Monte Carlo simulations to assess the risk that an entity will default, and to analyze derivatives such as options. Insurers and oil well drillers also use them to measure risk.
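
As a minimal sketch of the idea, the example below simulates many possible one-year outcomes for a stock price and summarizes them. The geometric Brownian motion model and all parameter values (starting price, drift, volatility, and the threshold of 80) are assumptions chosen for illustration.

```python
import math
import random

random.seed(42)  # fixed seed so the run is repeatable

START_PRICE = 100.0
DRIFT = 0.05         # assumed expected annual return
VOLATILITY = 0.20    # assumed annual volatility
YEARS = 1.0
NUM_SCENARIOS = 20_000


def simulate_final_price() -> float:
    """Draw one random end-of-period price under geometric Brownian motion."""
    shock = random.gauss(0.0, 1.0)
    exponent = (DRIFT - 0.5 * VOLATILITY**2) * YEARS + VOLATILITY * math.sqrt(YEARS) * shock
    return START_PRICE * math.exp(exponent)


final_prices = [simulate_final_price() for _ in range(NUM_SCENARIOS)]

mean_price = sum(final_prices) / NUM_SCENARIOS
prob_below_80 = sum(price < 80 for price in final_prices) / NUM_SCENARIOS

print(f"Average simulated price: {mean_price:.2f}")
print(f"Estimated probability of ending below 80: {prob_below_80:.3f}")
```

Each scenario is one random draw. The useful output is the distribution across all of them, from which you can read off averages and the probability of outcomes you care about.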

To read more, check out this article.

Monte Carlo Simulation Output of a Stock price. Retrieved from this article

Future of Randomness in Software Engineering

The future of randomness in software engineering looks particularly promising, with significant advancements expected from emerging technologies like quantum computing.

Quantum Computing and Quantum Randomness

Quantum computing introduces an inherently stochastic element known as quantum randomness.

Unlike classical computing, which relies on deterministic processes, quantum processes are unpredictable by nature. Quantum random number generators (QRNGs) exploit this property to generate true random numbers directly from quantum phenomena, such as the superposition of quantum states or the measurement of entangled particles.

These devices are expected to provide a more secure and fundamentally unpredictable source of randomness than is currently possible.

IBM’s new 53-qubit quantum computer

Quantum computing has the potential to revolutionize cryptography. Current cryptographic systems rely on the computational difficulty of certain problems (like factoring large numbers) which quantum computers could solve effortlessly. But quantum cryptography, utilizing quantum randomness for key distribution, promises to be virtually unbreakable due to the laws of quantum mechanics.

Current State of Quantum Computing

As of now, quantum computing is in an experimental phase. Researchers and companies like Google, IBM, and D-Wave are actively developing quantum computers and have made significant progress in recent years.

For instance, Google announced "quantum supremacy" in 2019, claiming that their quantum computer solved a problem that would be practically impossible for a classical computer to solve in any reasonable amount of time.

Quantum bits, or qubits, which are the basic units of information in quantum computing, are highly susceptible to interference from their environment. This leads to high error rates in quantum computations. Developing error-correcting codes and finding ways to make qubits more stable is a significant focus of current research.

Currently, quantum computers have a limited number of qubits. To be practical for widespread use, quantum computers need to scale up the number of qubits significantly without a corresponding increase in error rates.

Many of these computers also need to operate at extremely low temperatures, close to absolute zero, to maintain the quantum state of the qubits. Maintaining such conditions is technically challenging and expensive.

The consensus among experts is cautiously optimistic, but varies widely regarding when quantum computing will become practical for broad use.

Some experts believe that within the next decade, we'll begin to see quantum computers solving more practical, real-world problems, potentially revolutionizing fields like cryptography, materials science, and complex system simulation. Others think that these applications might remain out of reach for several more decades.

Wrapping Up

The future of randomness in software engineering holds vast potential to drive innovation across multiple domains.

As we delve deeper into quantum computing and enhance our current technologies, randomness will play an increasingly critical role in shaping the next generation of software solutions, making them more secure, efficient, and reflective of the complex world they model.

What is Steganography? How to Hide Data Inside Data

Daniel Iwugo — Thu, 13 Jul 2023 17:15:06 +0000

Ladies and Gentlemen, welcome to the world of Spies 🕵️.

In the movie Uncharted (great movie by the way), Tom Holland and his brother have a secret form of communication. They would write a message on a plain postcard with special ink that became invisible and then send it to the other person.

On the outside, it seemed like another plain old postcard. But if a lighter was lit just behind the paper, the ink would reappear, and a new message would be found 🔥.

This is one of the coolest hidden information tricks seen in movies. But what if we could do this on computers?

Well, turns out we sorta can. Using Steganography.

Disclaimer: This concept can be used for both good and bad. The content of this article is for educational purposes only and is not to be used to play pranks, or harm people and infrastructure.

And with that out of the way, here’s what we’re going to explore in this article:

What is Steganography?
Types of Steganography – Text, Image, Video, Audio, Network
Image steganography using Steghide

What is Steganography?

Steganography is the art of hiding secret data in plain sight. It sounds kind of counter-intuitive, but you’d be surprised how effective it is.

Hiding things such as source code, passwords, IP addresses, and other confidential information in pictures, music, or other random files tends to be the last place anyone would think of finding them.

You should note that steganography and cryptography are not mutually exclusive from each other. One may contain elements of the other or both. For example, you could perform steganography with an encryption algorithm or password, as you’ll find out soon.

Types of Steganography

There are various types of steganography, and we’ll look at five of them in this tutorial.

Text Steganography

This form involves hiding a message within a text. A common way to do this is substitution. It involves replacing certain characters with others and then substituting them back to retrieve the original data.

For example, take the following text.

Thi followeng tixt contaens a sicrit missagi

Doesn’t really make sense, right? But what if we replace the i’s with e’s and the e’s with i’s?

The following text contains a secret message

I think that’s a little easier on the eyes. This is a pretty easy example, but there are much more complicated ones and even some you could come up with on your own.
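
The same swap can be sketched in a few lines of code. The `swap_letters` helper is a hypothetical name used only for this illustration, and the sample string is a fragment of the sentence above.

```python
# Swap every "i" with "e" and every "e" with "i" in a single pass.
SWAP_TABLE = str.maketrans("ieIE", "eiEI")


def swap_letters(text: str) -> str:
    """Apply the i/e substitution; applying it twice restores the input."""
    return text.translate(SWAP_TABLE)


hidden = "contaens a sicrit missagi"
revealed = swap_letters(hidden)

print(revealed)
print(swap_letters(revealed))  # swapping again hides the message once more
```

Since the substitution is its own inverse, the sender and receiver can use the same function to hide and to reveal the text.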

Image Steganography

Frankly, this is my favourite. It involves hiding data behind digital images. There are various techniques for image steganography which include the Least Significant Bit technique, Masking and Filtering, and Coding and Cosine Transformation.

Take a look at the two images below and spot the difference:

Groot on Linux ¦ Credit: Mercury

Basically, no human on earth can tell the visual difference. But if you take a closer look at the file details…

Comparing the images ¦ Credit: Mercury

The only difference is the size of the images. That’s because the one on the right is hiding 260 words of text in it. How cool is that?

Video Steganography

In Video steganography, you can literally hide entire videos inside another video. Videos are basically a sequence of images with audio playing as the sequence progresses. This type of steganography allows each video frame to encode an image of the one you want to hide.

This technique can also be used to hide text as demonstrated in the software Steganosaurus by James Ridgeway. He shows how it works in this video.

Audio Steganography

This type of steganography enables hidden messages to be encoded inside an audio file. A common technique used in this is called Backmasking. Backmasking is hiding a message in the audio file and it can only be heard when played backwards.

The famous rapper, Eminem, did some backmasking in the song ‘Stimulate’ back in 2002.

Network Steganography

This is relatively rare, but nevertheless, it is a technique in which messages are passed by hiding them in network traffic. The messages could be found in the payload or headers of data packets when captured and analysed by the receiver.

Now let’s take a look at how to do some image steganography.

Steganography using Steghide

Steghide is an open source image steganography tool that uses the least significant bit (LSB) method to hide data in images.

Images are made up of pixels, which are made up of bits. The bit depth determines how many colours are present in an image. The higher the bit depth, the more colourful the image tends to look.

What LSB does is change the last bit of each byte (or pixel) in the image to one that represents the data you want to hide. This changes the image data, but if done properly is not perceivable. The higher the bit depth and resolution, the more data can be stored in the image.
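
To make the idea concrete, here is a minimal sketch of LSB embedding on a plain sequence of byte values. It is not how Steghide is implemented: the `embed` and `extract` helpers and the fake pixel data are hypothetical, and a real tool would also handle image formats, capacity checks, and encryption.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0b11111110) | bit  # overwrite only the last bit
    return bytes(stego)


def extract(stego: bytes, message_length: int) -> bytes:
    """Read message_length bytes back out of the least significant bits."""
    message = bytearray()
    for byte_index in range(message_length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[byte_index * 8 + bit_index] & 1)
        message.append(value)
    return bytes(message)


# Hypothetical cover data standing in for raw pixel bytes.
cover_pixels = bytes(range(256))
secret = b"hi there"

stego_pixels = embed(cover_pixels, secret)
print(extract(stego_pixels, len(secret)))
# Largest change to any single byte value after embedding:
print(max(abs(a - b) for a, b in zip(cover_pixels, stego_pixels)))
```

Each hidden byte needs eight cover bytes in this scheme, and no cover byte changes by more than 1, which is why the altered image looks the same to the eye.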

Now that you understand how it works, let’s play a little hide and seek (no pun intended 👀).

First we’ll be needing a few things:

A Linux OS
An Internet Connection
An Image
A Text file

Install Steghide

First we need to install Steghide. Open your terminal and run the following command to do that:

sudo apt install steghide

You can always run steghide --help to get the command list to see all your options.

Get your image ready

Next, have an image and a text file in a directory. My files are ‘information.txt’ and ‘image.png’. I’ve also put some text in the file to hide in the image later.

Setting up files ¦ Credit: Mercury

Open up your terminal again and go to the directory where you stored the files. Mine is in ~/Documents/steganography_tutorial.

Looking for the files ¦ Credit: Mercury

Create a new image

Next, run the following command to create a new image that contains the text file you want to hide.

steghide embed -ef <data> -cf <image> -sf <stego_image> -v

Let’s take a look at the command:

steghide – We specify the tool to use
embed – Tells the tool we want to embed data
-ef – Embed file, specifies the file to hide
-cf – Cover file, specifies the cover image
-sf – Stego file, creates a duplicate of the original image with the embedded file in it
-v – Verbose, gives us more information about the process

When the command is run, you’ll be asked to enter a password. If you want an extra layer of security, you might want to do this. If you don’t, just hit enter twice. Here’s the result of what I ran:

Embedding the information ¦ Credit: Mercury

Inspect the new file

Now let’s take a look at the new file.

Comparing the images side by side ¦ Credit: Mercury

There seems to be no difference. We can take a closer look with a site called diffchecker.com.

Comparing the images details ¦ Credit: Mercury

Extract the data

The stego file is slightly bigger than the original because it contains information. We can extract the data from the stego file using the command below.

steghide extract -sf <stego_image> -xf <extracted_data>

Let’s review the command above:

-sf – stego file, the image containing hidden data
-xf – extract file, the file with extracted data

Below is the screenshot from running the command. The extracted text is also shown below.

Extracting the information ¦ Credit: Mercury

If you extracted the text, Congratulations 🎉🎊. You have successfully hidden and extracted the text from the image. You can do this with a number of things, even whole books.

Using a different tool called Stegcore, I hid a text file containing Quincy Larson’s new book, “How to Learn to Code & Get a Developer Job”, behind an image of the book🔍.

Here’s an excerpt from the book.

An excerpt from the book ¦ Credit: Quincy Larson

And just like before, the text was embedded into a new image. Here is the original and the stego image side by side.

The original image compared to the stego image ¦ Credit: Mercury

And as expected, the stego image is slightly larger in size than the original.

The image details side by side ¦ Credit: Mercury

Talk about hiding a book behind a book (bad joke, I know 🤧). If you want to try it out, you can check out the Github repository or the app.

Conclusion

You’ve learned what steganography is and how to implement it using tools. Keep in mind that steganography is a tool and can be used for both good and bad. Companies can hide sensitive information using these means. On the other hand, a hacker could use it to hide malicious code.

Once again, this tutorial is for educational purposes only and is to be used to help and defend information from black hat hackers. Stay safe in the online jungle and happy hacking 🙃.

Acknowledgements

Thanks to Anuoluwapo Victor, Chinaza Nwukwa, Holumidey Mercy, Favour Ojo, Georgina Awani, and my family for the inspiration, support and knowledge used to put this together. I appreciate all of you.

If you want articles similar to this one, hit me up on Upwork or read more of my articles here.

Cover image credit: Abstract Data Cube ¦ Credit: Shubham Dhage.

SSH Keygen Tutorial – How to Generate an SSH Public Key for RSA Login

Bolaji Ayodeji — Tue, 30 Aug 2022 15:51:22 +0000

Cryptography uses encryption and decryption to conceal messages. This introduces secrecy in information security.

The purpose of cryptography is to ensure secure communication between two people or devices who are connecting through insecure channels.

The sender often employs an encryption key to lock the message, while the recipient uses a decryption key to unlock the message.

In general, cryptography employs two strategies:

Symmetric-key Cryptography (Private key): With this technique, the encryption and decryption keys are both known to the sender and receiver. Some examples of algorithms that use this technique include One Time Pad cipher, Vernam cipher, Playfair, Row column cipher, and Data Encryption Standard (DES).
Asymmetric Key Cryptography (Public key): With this technique, each person has two keys: the Private (secret and accessible to the creator) and Public keys (freely available to anyone). The sender and receiver use different keys for encryption and decryption. Some examples of algorithms that use this technique include the Rivest–Shamir–Adleman algorithm (RSA), Diffie - Hellman Key Exchange (DHE), and the Digital Signature Algorithm (DSA).

The Encryption Model for Secured Data Transmission

Software engineers generally have to authenticate with servers or other services like GitHub for version control.

As opposed to using password authentication, they can use public key authentication to generate and store a pair of cryptographic keys on their computer. Then they can configure the server running on another computer to recognize and accept those keys.

This is the asymmetric key cryptography technique flow we discussed earlier and it is a more secure authentication process.

In this tutorial, you will learn how it all works, what SSH means, and how to generate SSH keys with an RSA algorithm using SSH keygen.

Prerequisites

A working computer running on any operating system.
Basic knowledge of navigating around the command-line.
A smile on your face :)

Brief Introduction to SSH (Secure Shell Protocol)

Public key authentication using SSH is a more secure approach for logging into services than passwords. Understanding SSH is easier once you understand how cryptography works from the above intro.

Here's a helpful basic definition:

"The Secure Shell Protocol is a cryptographic network protocol for operating network services securely over an unsecured network." (Source)

SSH is used between a client and a server, both running the SSH protocol, to remotely log in to the server and access certain resources through the command line.

Source: SSH Academy

There is an open-source version of the SSH protocol (version 2) with a suite of tools called OpenSSH (also known as OpenBSD Secure Shell). This project includes the following tools:

Remote operations: ssh, scp, and sftp.
Key generation: ssh-add, ssh-keysign, ssh-keyscan, and ssh-keygen.
Service side: sshd, sftp-server, and ssh-agent.

Our goal is to use ssh-keygen to generate an SSH public key using the RSA algorithm. This will create a key pair containing a private key (saved to your local computer) and a public key (uploaded to your chosen service).

Now to proceed, follow the steps below to achieve this:

Install OpenSSH if you don't have it installed already using the command below:

# for mac

brew install openssh

# for linux

sudo apt install openssh-client && sudo apt install openssh-server

Create a private/public key pair with an RSA algorithm (2048-bit by default in older OpenSSH releases, 3072-bit in newer ones), using the command:

ssh-keygen -t rsa

Or, if you want to create with an RSA algorithm with 4096-bit encryption, use the command:

ssh-keygen -t rsa -b 4096

Enter a file location to save the key to. By default it will save to your user's home directory (for example, /Users/bolajiayodeji/.ssh/id_rsa).
Enter a passphrase for extra security to your private key. Generally, a good passphrase should have at least 15 characters (including at least one upper case letter, lower case letters, numerical digits, and special characters) and must be difficult to guess. You can use one of those password generators online or use hexdump to generate a passphrase easily like so:

hexdump -vn16 -e'4/4 "%08X" 1 "\n"' /dev/urandom
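
For comparison, roughly the same thing can be done in Python. This sketch assumes you only need what the command above produces: 16 random bytes from the operating system's secure random source, printed as 32 uppercase hexadecimal characters.

```python
import secrets

# 16 random bytes from the OS's cryptographically secure source,
# rendered as 32 uppercase hex characters.
passphrase = secrets.token_hex(16).upper()
print(passphrase)
```

Note that a hex-only string like this has no lowercase letters or special characters, so its strength comes entirely from its 128 bits of randomness.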

Once you've successfully created your passphrase, your private key will be saved in //.ssh/id_rsa and your public key will be saved in //.ssh/id_rsa.pub.

Now you can copy the created public key into the authorized_keys file of the server you want to connect to using ssh-copy-id (this tool is a part of OpenSSH) like so:

ssh-copy-id username@remote_host

Alternatively, you can add your SSH private key to the ssh-agent and store your passphrase in the keychain. You can then add the SSH public key to your account on the server or service via a dashboard UI (for example, on a service like GitHub).

Conclusion

Although a strong password helps prevent brute-force attacks, public key authentication provides a much more secure authentication process using cryptography.

I hope you found this article helpful. In addition, you can check out the ssh-keygen manual page and the following resources for further learning:

Cheers! 💙

Cipher Definition – What is a Block Cipher and How Does it Work to Protect Your Data?

freeCodeCamp — Thu, 03 Jun 2021 16:21:19 +0000

By Megan Kaczanowski

Cryptography is the science of using codes and ciphers to protect messages. And encryption involves encoding messages so that only the intended recipient can understand the meaning of the message. It's often used to protect data in transit.

Encryption is a two way function – that is, you need to be able to undo whatever scrambling you’ve done to the message.

Today, there are two basic types of algorithms — symmetric and asymmetric.

Symmetric algorithms are also known as ‘secret key’ algorithms, and asymmetric algorithms are known as ‘public key’ algorithms.

The key difference between the two is that symmetric algorithms use the same key for encryption and decryption, while asymmetric algorithms use different keys for encryption and decryption.

For a general overview of cryptography and the difference between symmetric and asymmetric ciphers, check out this article.

What Principles are Important When You're Developing a Cipher?

Kerckhoffs's principle states that a cryptographic system should be secure, even if all the details (other than the key) are known publicly. Claude Shannon later restated this as 'The enemy knows the system.'

Essentially, a very well designed system should be able to send secret messages even if an attacker can encrypt and decrypt their own messages using the same algorithm (with a different key). The security of the encrypted message should depend entirely on the key.

Additionally, in order to hinder statistical analysis (attempts to break an encryption algorithm), a good cryptographic system should employ the principles of confusion and diffusion.

Confusion requires that the key does not relate to the ciphertext in a simple manner. Each character of the ciphertext should depend on multiple parts of the key. The goal is to make it very difficult for an attacker to determine the key from the ciphertext.

Diffusion means that if a single character of the plaintext is changed, then several characters of the ciphertext should change. And if a single character of the ciphertext is changed, then several characters of the plaintext should change.

Ideally, the relationship between the ciphertext and the plaintext is hidden. No diffusion is perfect (all will have some patterns), but the best diffusion scatters patterns widely, even scrambling several patterns together.

Diffusion makes patterns hard for an attacker to spot, and requires the attacker to have more data in order to mount a successful attack.

If you want to read up on this a bit more, check out A Mathematical Theory of Cryptography.

What are Block and Stream Ciphers?

Both block and stream ciphers are symmetric key ciphers (like DES, RCx, Blowfish, and Rijndael AES). Block ciphers convert plaintext to ciphertext block by block, while stream ciphers convert one byte at a time.

Most modern symmetric algorithms are block ciphers, though the block sizes vary (DES uses 64-bit blocks, for example, while AES uses 128-bit blocks with 128-, 192-, or 256-bit keys).

What is the advantage of a stream cipher?

Stream encryption is faster (linear in time) and constant in space. It is unlikely to propagate errors, as an error in one byte's translation won't impact the next byte.

However, there's little diffusion as one plaintext symbol is directly translated to one ciphertext symbol. Also, the ciphertext is susceptible to insertions or modifications. If an attacker is able to break the algorithm, they may be able to insert text which looks authentic.

You typically use a stream cipher when the amount of plaintext is unknown (like audio or video streaming), or when extreme performance is important (like with very high speed connections, or for devices which need to be very efficient and compact, like smart cards).

A stream cipher works by generating a series of pseudorandom bytes which depend on the key (for any given key, the series of bytes is the same for encryption and decryption). Different keys will produce different strings of bytes.

In order to encrypt data the plaintext bytes are XORed with the string of pseudorandom bytes. To decrypt, the ciphertext is XORed with the same string in order to see the plaintext.
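
The XOR step can be sketched in a few lines. The keystream generator here (SHA-256 over the key and a counter) is a stand-in chosen only so the example runs with the standard library. It is an assumption for illustration and not a vetted stream cipher.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Generate `length` pseudorandom bytes that depend only on the key."""
    output = bytearray()
    counter = 0
    while len(output) < length:
        output.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(output[:length])


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt: XOR each data byte with the matching keystream byte."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


plaintext = b"stream ciphers work byte by byte"
ciphertext = xor_with_keystream(plaintext, b"key-one")

print(ciphertext.hex())
print(xor_with_keystream(ciphertext, b"key-one"))  # same key recovers the plaintext
print(xor_with_keystream(ciphertext, b"key-two"))  # a different key does not
```

Encryption and decryption are the same function because XORing twice with the same keystream byte cancels out.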

What is the advantage of a block cipher?

A block cipher has high diffusion (information from one plaintext symbol is spread into several cipher-text symbols). It is also fairly difficult for an attacker to insert symbols without detection, because they can't easily insert them into the middle of a block.

However, it is slower than a stream cipher (an entire block needs to be transmitted before encryption/decryption can happen) and if an error does occur, it can propagate throughout the block, corrupting the entire section.

Block ciphers are a better choice when you know the transmission size – such as in file transfer.

What are the common modes of Block Ciphers?

In order to encrypt data which is longer than a single block, there are several 'modes' which have been developed. These describe how to apply the single block principles to longer messages.

There are 5 confidentiality modes for block ciphers. Some of these modes require an initialization vector (IV) in order to function.

What is an Initialization Vector (IV)?

An IV is essentially just another input (in addition to the plaintext and the key) used to create ciphertext. It's a data block, used by several modes of block ciphers to randomize encryption so that different cipher text is created even if the same plain text is repeatedly encrypted.

It usually does not need to be secret, though it cannot be re-used. Ideally, it should be random, unpredictable, and single-use.

Two of the same messages encrypted with the same key, but different IVs, will result in different ciphertext. This makes an attacker's job more difficult.
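Here is a small sketch of that effect using Node's built-in `crypto` module with AES-256-CBC. The helper names (`encryptWithFreshIv`, `decrypt`) and the sample message are made up for illustration.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(32); // 256-bit AES key
const message = "attack at dawn";

// Encrypt with AES-256-CBC, generating a fresh random 16-byte IV on every call.
function encryptWithFreshIv(plaintext) {
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    return { iv, ciphertext };
}

// The IV is not secret, but the receiver needs it to decrypt.
function decrypt({ iv, ciphertext }) {
    const decipher = crypto.createDecipheriv("aes-256-cbc", key, iv);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const first = encryptWithFreshIv(message);
const second = encryptWithFreshIv(message);

// Same key, same message, different IVs: the ciphertexts differ
// (unless the two random IVs happen to collide, which is astronomically unlikely).
console.log(first.ciphertext.equals(second.ciphertext));
console.log(decrypt(first) === decrypt(second)); // true
```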

Electronic Code Book Mode (ECB)

There is a fixed mapping between input blocks of plaintext and output blocks of ciphertext (essentially like an actual code book where ciphertext words directly relate to plaintext words).

ECB applies the cipher function independently to each block of plaintext to encrypt it (and the inverse function to each block of ciphertext to decrypt it). This means that ECB can encrypt and decrypt multiple blocks in parallel (since they don't depend on each other), speeding up the process.

Diagram source: https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation

For this mode to work correctly, either the message length needs to be a multiple of the block size or you need to use padding for the length condition to be met.

Padding is essentially extra data that's added in order to ensure that the block size is met. With this mode, given the same key, the same plaintext block will always result in the same ciphertext block. That makes it vulnerable to attack, so this mode is rarely used (and should be avoided).
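To see the weakness concretely, here is a minimal sketch using Node's `crypto` module with AES-128-ECB. The plaintext is two copies of the same 16-byte block (the sample text is arbitrary).

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16); // 128-bit AES key; AES blocks are 16 bytes

// Two identical 16-byte plaintext blocks.
const block = Buffer.from("YELLOW SUBMARINE", "utf8");
const plaintext = Buffer.concat([block, block]);

// ECB takes no IV, so null is passed. Node's default padding stays on, which
// appends one full padding block because the input is already block-aligned.
const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);

const firstBlock = ciphertext.subarray(0, 16);
const secondBlock = ciphertext.subarray(16, 32);

console.log(ciphertext.length);              // 48 (32 bytes of data + 16 of padding)
console.log(firstBlock.equals(secondBlock)); // true: repeated plaintext is visible
```

An eavesdropper who sees two equal ciphertext blocks learns that the underlying plaintext blocks are equal too, without ever touching the key.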

Cipher Block Chaining Mode (CBC)

This mode 'chains' or combines new plaintext blocks with the previous ciphertext block when encrypting them which requires an IV for the first block. The IV doesn't need to be secret, but it needs to be unpredictable.

CBC exclusive ors (XORs) the first block of plaintext with the IV to create the first input block. The receiver needs the same IV, so it travels with the message; one option is to send it separately as a short message using ECB Mode.

Then, CBC applies the encryption algorithm to this input block, creating the first block of ciphertext. CBC then XORs this ciphertext block with the second plaintext block and then applies the encryption algorithm to produce the second ciphertext block, and so on until the end of the message.

In order to decrypt the message, CBC does the reverse - applies the inverse of the encryption algorithm to the first ciphertext block and then XORs the block with the IV to obtain the first plaintext block.

CBC then applies the inverse of the encryption algorithm to the second ciphertext block and XORs the block with the first ciphertext block to obtain the second plaintext block. This process continues until the message is decrypted.


Because each input block (except the first) relies on the previous block being encrypted, CBC can't perform encryption in parallel. However, since the decryption requires XORing with the (immediately available) ciphertext blocks, it can be done in parallel. CBC is one of the most commonly used modes.

Similarly to ECB, for this mode to work correctly, either the message length needs to be a multiple of the block size or you need to use padding for the length condition to be met.
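The chaining can be written out by hand and compared against Node's built-in CBC implementation. In this sketch, `encryptBlock`, `xor`, and `cbcEncrypt` are illustrative helper names; the single-block cipher is obtained by running AES-128-ECB with padding disabled.

```javascript
const crypto = require("node:crypto");

const BLOCK = 16;
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(BLOCK);
const plaintext = crypto.randomBytes(3 * BLOCK); // already a multiple of the block size

// Raw AES on a single block: ECB without padding is just the bare block cipher.
function encryptBlock(block) {
    const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
    cipher.setAutoPadding(false);
    return Buffer.concat([cipher.update(block), cipher.final()]);
}

function xor(a, b) {
    const out = Buffer.alloc(a.length);
    for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
    return out;
}

function cbcEncrypt(data) {
    const blocks = [];
    let previous = iv; // the IV acts as the "previous ciphertext" for block one
    for (let offset = 0; offset < data.length; offset += BLOCK) {
        const input = xor(data.subarray(offset, offset + BLOCK), previous);
        previous = encryptBlock(input); // each block depends on the one before it
        blocks.push(previous);
    }
    return Buffer.concat(blocks);
}

// Reference: Node's built-in CBC with padding disabled.
const reference = crypto.createCipheriv("aes-128-cbc", key, iv);
reference.setAutoPadding(false);
const expected = Buffer.concat([reference.update(plaintext), reference.final()]);

console.log(cbcEncrypt(plaintext).equals(expected)); // true
```

The loop makes the serial dependency visible: block two cannot be encrypted until block one's ciphertext exists.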

Cipher Feedback Mode (CFB)

CFB is similar to CBC, but instead of using the entire previous ciphertext block to compute the next block, CFB uses a fraction of the previous block.

1. CFB starts with an IV of the same block size as expected by the block cipher, and encrypts it with the encryption algorithm.
2. CFB keeps the s most significant bytes of this output and XORs them with the s bytes of plaintext to be transmitted, producing s bytes of ciphertext.
3. CFB shifts the IV s bytes to the left, inserting the ciphertext bytes produced by step 2 as the right-hand bytes (the IV stays the same length).
4. CFB then repeats these steps, using the shifted block as the new input.

To decrypt a message, CFB uses the IV as the first block and forms each following block by performing step 3 above and applying the encryption algorithm (not its inverse) to form blocks. CFB then XORs s bytes of each output with the corresponding ciphertext to reveal the plaintext.

Within CFB, the encryption system processes a segment of s plaintext bytes at a time, where s is no larger than the block size b, even though the algorithm itself carries out a full block-to-block transformation. This means that s can be as small as 1 byte, so CFB can functionally operate as a stream cipher.


Unfortunately, this means that CFB can propagate errors downstream. If a byte is received with an error, when CFB uses it to decrypt the first byte, it will produce an erroneous decryption, causing downstream errors when fed back into the decryption.

Like CBC, when CFB encrypts, the input to each round relies on the result of the previous round, meaning that encryption cannot be done in parallel, though decryption can be performed in parallel if the input blocks are first created from the IV and ciphertext.
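As a rough sketch, here is CFB with s = 1 byte built from a bare AES block function and checked against Node's `aes-128-cfb8` cipher. The helper names (`encryptBlock`, `cfb8`) are illustrative, not part of any library.

```javascript
const crypto = require("node:crypto");

const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(16);

// Raw AES on one 16-byte block (ECB without padding is the bare block cipher).
function encryptBlock(block) {
    const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
    cipher.setAutoPadding(false);
    return Buffer.concat([cipher.update(block), cipher.final()]);
}

// CFB with s = 1 byte. Encryption and decryption share the same loop and both
// use the encryption function; the ciphertext byte is always what gets fed back.
function cfb8(data, decrypting) {
    let register = Buffer.from(iv); // the shift register starts as the IV
    const out = Buffer.alloc(data.length);
    for (let i = 0; i < data.length; i++) {
        const output = encryptBlock(register); // encrypt the current register
        out[i] = data[i] ^ output[0];          // XOR with the most significant byte
        const cipherByte = decrypting ? data[i] : out[i];
        // Shift left by one byte and append the ciphertext byte on the right.
        register = Buffer.concat([register.subarray(1), Buffer.from([cipherByte])]);
    }
    return out;
}

const plaintext = Buffer.from("CFB needs no padding", "utf8");
const ciphertext = cfb8(plaintext, false);

// Cross-check against Node's built-in 8-bit CFB mode.
const reference = crypto.createCipheriv("aes-128-cfb8", key, iv);
const expected = Buffer.concat([reference.update(plaintext), reference.final()]);

console.log(ciphertext.equals(expected));             // true
console.log(cfb8(ciphertext, true).toString("utf8")); // CFB needs no padding
```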

Output Feedback (OFB)

OFB is similar to CFB, but instead of feeding ciphertext back into the cipher, it feeds back the cipher's own output block. The output blocks therefore depend only on the key and the IV, not on the message. Similarly to CFB, OFB can be functionally used as a stream cipher.

OFB requires that the IV be a unique nonce (number used once) for each execution with a given key.

First, OFB encrypts the IV with the encryption algorithm, to produce an output block. OFB then XORs this block with the first plaintext block, producing the first ciphertext block.

OFB encrypts the first output block with the encryption algorithm to produce the second output block. It then XORs this block with the second plaintext block to produce the second ciphertext block. OFB repeats this process for the length of the message.


When decrypting, OFB encrypts the IV with the encryption algorithm, producing an output block. OFB then XORs this block with the first ciphertext block, recovering the first plaintext block.

OFB encrypts the first output block with the encryption algorithm to produce the second output block. OFB then XORs it with the second ciphertext block to recover the second plaintext block. OFB repeats this process for the length of the message.

Because the output blocks for decryption are locally generated, OFB is more resistant to transmission errors than CFB.
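The sketch below builds OFB from a bare AES block function, checks it against Node's `aes-128-ofb`, and then flips one ciphertext bit to show how little damage a transmission error does. Helper names such as `ofbTransform` are illustrative.

```javascript
const crypto = require("node:crypto");

const BLOCK = 16;
const key = crypto.randomBytes(16);
const iv = crypto.randomBytes(BLOCK);

// Raw AES on one block (ECB without padding is the bare block cipher).
function encryptBlock(block) {
    const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
    cipher.setAutoPadding(false);
    return Buffer.concat([cipher.update(block), cipher.final()]);
}

// OFB is its own inverse: the same function encrypts and decrypts.
function ofbTransform(data) {
    const out = Buffer.alloc(data.length);
    let outputBlock = iv;
    for (let offset = 0; offset < data.length; offset += BLOCK) {
        outputBlock = encryptBlock(outputBlock); // feed the previous output back in
        for (let i = 0; i < BLOCK && offset + i < data.length; i++) {
            out[offset + i] = data[offset + i] ^ outputBlock[i];
        }
    }
    return out;
}

const plaintext = Buffer.from("OFB keystream does not depend on the message", "utf8");
const ciphertext = ofbTransform(plaintext);

const reference = crypto.createCipheriv("aes-128-ofb", key, iv);
const expected = Buffer.concat([reference.update(plaintext), reference.final()]);
console.log(ciphertext.equals(expected)); // true

// Simulate a one-bit transmission error in byte 5 of the ciphertext.
const damaged = Buffer.from(ciphertext);
damaged[5] ^= 0x01;
const decrypted = ofbTransform(damaged);
const wrongBytes = [...decrypted].filter((byte, i) => byte !== plaintext[i]).length;
console.log(wrongBytes); // 1: only the damaged byte is affected
```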

Counter (CTR)

CTR applies the encryption algorithm to a set of unique input blocks (counters) in order to produce outputs which are XORed with the plaintext to produce ciphertext.

CTR encrypts the first counter with the encryption algorithm, then XORs the resulting output with the first plaintext block to produce the first ciphertext block. CTR repeats this for each block (with a new counter – counters must be unique across all messages encrypted using a single key).

If the final block is a partial block of s bytes, the s most significant bytes of the output block are used for the XOR, while the remaining b - s bytes of the output block are discarded.


The decryption follows the same pattern. CTR encrypts the counter with the encryption algorithm, then XORs the output with the corresponding ciphertext block to produce the plaintext block.

If the final block is a partial block of s bytes, the s most significant bytes of the output block are used for the XOR, while the remaining b - s bytes of the output block are discarded.

CTR has been shown to be at least as secure as the other four modes, while also being able to be executed in parallel (both encryption and decryption), meaning that it is very fast.

Each block can be recovered independently if its counter block can be determined and the encryption can be applied to the counters in advance of receiving the plaintext or ciphertext (if memory is no constraint).
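Here is a minimal sketch of CTR built from a bare AES block function, with a partial final block. The cross-check assumes that Node's `aes-128-ctr` treats its 16-byte IV as the first counter block and increments it as a big-endian integer. `incrementCounter` and `ctrTransform` are illustrative names.

```javascript
const crypto = require("node:crypto");

const BLOCK = 16;
const key = crypto.randomBytes(16);
const initialCounter = crypto.randomBytes(BLOCK);

// Raw AES on one block (ECB without padding is the bare block cipher).
function encryptBlock(block) {
    const cipher = crypto.createCipheriv("aes-128-ecb", key, null);
    cipher.setAutoPadding(false);
    return Buffer.concat([cipher.update(block), cipher.final()]);
}

// Add 1 to the counter, treating it as one big-endian integer.
function incrementCounter(counter) {
    const next = Buffer.from(counter);
    for (let i = next.length - 1; i >= 0; i--) {
        next[i] = (next[i] + 1) & 0xff;
        if (next[i] !== 0) break; // stop once there is no carry
    }
    return next;
}

// The same function encrypts and decrypts. Each iteration only needs its own
// counter value, so the blocks could be processed in parallel.
function ctrTransform(data) {
    const out = Buffer.alloc(data.length);
    let counter = initialCounter;
    for (let offset = 0; offset < data.length; offset += BLOCK) {
        const outputBlock = encryptBlock(counter);
        // For a partial final block, the unused output bytes are simply dropped.
        for (let i = 0; i < BLOCK && offset + i < data.length; i++) {
            out[offset + i] = data[offset + i] ^ outputBlock[i];
        }
        counter = incrementCounter(counter);
    }
    return out;
}

const plaintext = Buffer.from("twenty bytes of text", "utf8"); // 16 + 4 bytes
const ciphertext = ctrTransform(plaintext);

const reference = crypto.createCipheriv("aes-128-ctr", key, initialCounter);
const expected = Buffer.concat([reference.update(plaintext), reference.final()]);

console.log(ciphertext.equals(expected));               // true
console.log(ctrTransform(ciphertext).toString("utf8")); // twenty bytes of text
```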

Further Reading: NIST Recommendation for Block Cipher Modes of Operation

How do Attackers Attempt to Break Ciphers?

There are a number of techniques attackers use, but they broadly fall into the following categories of attack, based on information required to carry it out.

This isn't an exhaustive list (there are other attacks, such as side-channel attacks), but many of the most common fall into one of these categories.

Known Ciphertext Attack

An attacker has some ciphertext, but does not know what plaintext was used to generate this ciphertext. The attacker does not get to choose which ciphertext they have and they cannot obtain/produce more.

This is the easiest type of attack to try, since it's easiest to eavesdrop on an encrypted conversation (since presumably the people having the conversation are using strong encryption and aren't as worried about eavesdroppers). But it's the hardest to be successful, as long as the people sending messages used appropriately strong encryption.

For example: David finds an encrypted message (ciphertext) in a dead drop, but has no idea what the message means.

Known Plaintext Attack

An attacker has some plaintext and ciphertext pairs which they didn't choose (so the attacker didn't choose the message that was encrypted, but was able to successfully steal a plaintext message and its associated ciphertext). The attacker cannot obtain/produce more pairs.

For example: David finds an enemy spy's hiding place and interrupts him while he is sending an encrypted message. The spy is silly enough to have fled, leaving both the plaintext message and its associated ciphertext written down.

Chosen Plaintext Attack

An attacker can choose any plaintext and obtain the ciphertext in return (but they can't see the key itself).

This is further broken down into batch chosen plaintext (where the attacker can submit a set of plaintexts and receive the ciphertext, but cannot do so again) and adaptive chosen-plaintext (where the attacker can submit plaintext, receive the ciphertext and submit additional plaintext based on the previous ciphertext.)

For example: One nation-state is eavesdropping on another's encrypted communication and knows they use the same key for all of their encryption. They send a sensitive diplomatic communication to the other nation-state, knowing it will be transmitted via the encrypted channel, thus giving them a chosen plaintext - ciphertext pair.

Chosen Ciphertext Attack

This is the opposite of the last attack, where the attacker can choose any ciphertext and obtain the plaintext in return (but they can't see the key itself).

For example: David knows an enemy spy is going to send an encrypted message tomorrow, so he replaces the text with his own chosen ciphertext, then spies on the recipient, listening as they read out the plaintext of the message.


What is Commit Signing in Git?

Seth Falco — Wed, 02 Jun 2021 23:14:01 +0000

Git has a feature to "sign" commits, but what is signing, and what are the benefits?

TL;DR: If you don't care for the details, and just need to get commit signing setup quickly, skip to How to Sign Commits.

Signing, or code signing specifically, is the process of using cryptography to digitally add a signature to data. The receiver of the data can verify that the signature is authentic, and therefore must've come from the signatory.

It's like physical signatures, but digital and more reliable.

Git's Default Behavior

First let's note that all commits have the following properties:

Author – The contributor who did the work, this is informational.
Committer – The user who committed the change.

In most cases, these will be the same, but they can be overridden when committing, so it's important to note the difference.

When you first installed Git, you probably had to configure a few settings, namely user.email and user.name. This may've been handled for you depending on your Git client.

In the command line, this requires executing the following commands:

git config --global user.email "seth@example.org"
git config --global user.name "Seth Falco"

Git commits are trust-based, so it'll assume you put in your real email and name. You can then commit and push to remote providers like GitHub and GitLab with the details provided.

What happens when someone else uses your email address, and then pushes changes remotely?

git config --global user.email "seth@example.org"
git commit -m "Jen did this."
git push origin main

The result looks normal, but I'm not the one who did this commit. Jen committed to her repository, authenticating with her GitHub credentials, but it's showing my name and linking to my profile. The default behavior sets both the author and committer to the details in git config.

On GitHub, the commit is already indistinguishable from my own. If a user set both user.email and user.name to mine, which they can get from doing git log on any of my commits, then even locally it'd look the same.

This means that anyone can set their user.email to your email address, and it'd look like you made the commit.

Why Does Git Do This?

You might wonder why this is possible. You authenticate to your account when you push to the repository after all, shouldn't it use that email? Doesn't this seem a bit flawed?

When you authenticate to push to remote repositories, you're authenticating to do just that: push changes. The commits don't require authentication regardless of who authored or committed them.

If commits required authentication by default, it'd be impossible to migrate or mirror projects to other platforms. The commit history will include former employees, dead users, inactive accounts, or email addresses that aren't on other platforms.

The only solution would be to rewrite the history to remove that they ever worked on the project, which isn't ideal.

Another scenario would be if I forked a project on GitHub, but want to maintain my fork on GitLab. My first push would include all commits from previous committers. For a large project, it's not feasible to authenticate every committer.

The author of a commit signifies attribution for who did the work, not proof of who did the work.

In fact, you can always override the author when committing just for this purpose. Using the --author argument, you can specify a different name and email to your global settings, even details that aren't associated with an account where the repository is hosted.

On public repositories, be mindful when committing on behalf of someone without an account, though. Names and email addresses become public information once pushed, and are accessible to anyone using git log!

git commit -m "Jen didn't even author this." --author "Jen <jen@example.org>"
git push origin main

This has different behavior than using another email in git config. This makes the author what we specified in --author, but the committer what we specified in git config.

Translation platforms like Weblate rely on this feature to ensure translators still get attribution, even though an automated user commits and opens the pull requests, not the translator.

How to Prove You're the Committer in Git

GNU Privacy Guard (GnuPG or GPG) allows you to create cryptographic asymmetric key pairs that can be used for the encryption and signing of data. They consist of a public and private key.

You can share the public-key with anyone – you may upload this to your GitHub and GitLab accounts, or put it on the internet for anyone to access.

The private-key, as the name suggests, is private. You should treat this like a password, and under no circumstances should you ever share your private-key with anyone.

We'll be generating a key pair, and then uploading the public key to GitHub and GitLab. Using your private-key, you can sign your commits, and servers with the public key will use it to confirm it was really you.

How to Sign Commits in Git

I'll only cover how to do this in the terminal, since this provides a uniform experience across operating systems. If you're uncomfortable with the terminal, you pretty much just have to copy the commands.

Prerequisites

The only prerequisite, other than Git itself, is to install the GPG command-line utility.

You can verify if it's installed with gpg --version.

Windows

Git BASH

If you have Git BASH installed (optionally bundled with Git for Windows), then you already have access to GPG. Just launch an instance of Git BASH, and it'll be available immediately.

Gpg4win

If you don't have Git BASH, then there's no need to install it. You can install Gpg4win, which will provide GPG globally, so you can just use it from PowerShell.

When installing Gpg4win, you can untick all the additional components, as we won't be needing them since we plan to use the terminal.

If you already had PowerShell open, you'll have to restart it before you can use GPG.

Linux

Your distribution most likely already includes GPG. If not, then you can install it through your package manager.

apt (Debian / Ubuntu)

sudo apt install gnupg

pacman (Arch / Manjaro)

sudo pacman -S gnupg

How to Generate GPG Keys

If you already have a GPG key, you can skip this step. It's perfectly fine to reuse GPG keys. Just read below and verify that your key is compatible with Git and GitHub.

You can get a list of your GPG keys with:

gpg --list-keys

First we need to generate an RSA key pair. The following will start an interactive script that will ask questions so we can provide the necessary information.

gpg --full-gen-key

For what kind of key you want, input 1 which is "RSA and RSA".
For key size, input 4096. This is the minimum size for GitHub and GitLab, and the maximum size GPG will let us generate.
For how long the key should last, use whatever suits you. The default is 0, which means to never expire.
Verify the information is correct by inputting y.

GPG will ask for personal information which is stored in your key.

Your name, this can be anything at least 5 characters in length.
Your email address, use an email you plan to commit with. You must've verified this email on the remote account you'll push with.
A comment, you can type whatever, or press enter to leave it blank.
Verify the information is correct by inputting o.

root@799d1cc3c99c:/# gpg --full-gen-key
gpg (GnuPG) 2.2.19; Copyright (C) 2019 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
  (14) Existing key from card
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (3072) 4096
Requested keysize is 4096 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 0
Key does not expire at all
Is this correct? (y/N) y

GnuPG needs to construct a user ID to identify your key.

Real name: Seth Falco
Email address: seth@example.org
Comment: 
You selected this USER-ID:
    "Seth Falco <seth@example.org>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o

GPG will ask for a passphrase to protect the key. You can set this to anything, or leave it blank for no passphrase at all. Of course, it's ideal to use a good passphrase. Rely on your password manager if you use one.

The password prompt is environment-dependent, so this step will look different for different users, but what it's asking is effectively the same.

It'll start generating the key, which requires a lot of randomly generated data. Performing actions on your PC will help make it more random, so I'd recommend moving your mouse around while the key is generating.

How to Export Your Keys

Next you need to get the identifier of the newly generated key so we can refer to it when exporting your key and configuring Git.

GPG keys can be referred to in multiple ways. It's a good habit to use and share the full fingerprint, to minimize the risk of ambiguity when users request it from a key server. Long (64-bit) IDs are fine for now, but short (32-bit) IDs are best avoided, as it's easy to produce a collision.

We'll be using the full GPG fingerprint, which we can get with the command:

gpg --list-keys

You'll get output like the following:

pub   rsa4096 2021-05-23 [SC]
      C6656513A0F9B7B7F4E76389EF39187D04795745
uid           [ultimate] Seth Falco <seth@example.org>
sub   rsa4096 2021-05-23 [E]

For me, it's C6656513A0F9B7B7F4E76389EF39187D04795745. Make sure to use your fingerprint instead of mine when you do the rest of the commands.

You need to export the public-key so you can upload it to GitHub. We use the --armor argument to indicate that we want to export it in an ASCII armored format instead of binary. This writes the public-key to a file named gpg-key.pub.

gpg --export --armor C6656513A0F9B7B7F4E76389EF39187D04795745 > ./gpg-key.pub

How to Back Up Your Keys

It's worth having a remote backup of your GPG keys because you'll likely use them across services. If you lose it, it'd be a pain to have to update everything.

You can export your private-key in the same way we exported the public-key. This writes the private-key to a file named gpg-key.asc:

gpg --export-secret-keys --armor C6656513A0F9B7B7F4E76389EF39187D04795745 > ./gpg-key.asc

You can now back up both your public and private keys, but remember that you should never send the non-encrypted copy of the private-key to the cloud. Always use end-to-end encrypted cloud storage, or a password manager like Bitwarden to back up sensitive data.

How to Enable Commit Signing

Then to enable signing all commits, set the commit.gpgsign setting using git config. This will make git commit sign commits by default.

git config --global commit.gpgsign true

If you have multiple GPG keys, or just for future reference, you may want to set user.signingkey as well. This will indicate specifically which key Git should use for signing to avoid ambiguity.

git config --global user.signingkey C6656513A0F9B7B7F4E76389EF39187D04795745

How to Use your Key

Finally, you have to upload your public key. You can use the same GPG key for both GitHub and GitLab, or any other Git provider.

We'll need the exported public-key for the following steps, so open the gpg-key.pub file in any editor like Visual Studio Code, and copy the contents to your clipboard.

On GitHub, you can go to your settings, under "SSH and GPG keys", then click "New GPG key". Paste the contents of gpg-key.pub into the Key field on GitHub, and click "Add GPG key".

On GitLab, the steps are almost identical, just go to your preferences, then "GPG Keys". Paste the contents of gpg-key.pub into the Key field on GitLab, and click "Add key".

Now you're able to make signed commits to your repositories! The next commit will prompt for your GPG key password, since it's the first time using it. Subsequent commits will be seamless.

How to Verify Commits in Git

GitHub and GitLab will show a "Verified" badge next to your new commits.

The final thing to remember is that commit signing will only verify the committer, not the author. That means when you see a verified commit, the author has nothing to do with the verified status.

Vigilant Mode

As a bonus, on GitHub specifically there is a setting called vigilant mode.

You can optionally enable this if you want all unsigned commits to explicitly say "Unverified". This can be enabled in your settings, under "SSH and GPG keys", then tick "Flag unsigned commits as unverified".

Now the commit that Jen did with my email address shows "Unverified" next to it, to indicate that it wasn't signed with a key associated with my account.

Diffie-Hellman: The Genius Algorithm Behind Secure Network Communication

David Karolyi — Mon, 11 May 2020 18:53:11 +0000

Let's start with a quick thought experiment.

You have a network of 3 computers, used by Alice, Bob, and Charlie. All 3 participants can send messages, but only in such a way that every other client connected to the network can read them. This is the only possible form of communication between participants.

If Alice sends a message through the wires, both Bob and Charlie get it. In other words, Alice cannot send a direct message to Bob without Charlie receiving it as well.

But Alice wants to send a confidential message to Bob and doesn't want Charlie to be able to read it.

Seems impossible with these strict rules, right? The beautiful thing is that this problem was solved in 1976 by Whitfield Diffie and Martin Hellman.

This is a simplified version of the real world, but we face the same problem when communicating through the biggest network that's ever existed.

Usually, you are not directly connected to the internet, but you are part of a smaller local network, called a local area network (LAN).

This smaller network can be wired or wireless (Wi-Fi), but the base concept remains. If you send a signal through the network this signal can be read by all other clients connected to the same network.

Once you emit a message to your bank's server with your credit card information, all other clients in the local network will get the message, including the router. It will then forward it to the actual server of the bank. All other clients will ignore the message.

But what if there is a malicious client in the network who won't ignore your confidential messages, but reads them instead? How is it possible that you still have money in your bank account?

Encryption

It's kind of clear at this point that we need to use some kind of encryption to make sure that the message is readable for Alice and Bob, but complete gibberish for Charlie.

Encrypting information is done by an encryption algorithm, which takes a message and a key (for example a string) and gives back an encrypted value, called ciphertext. The ciphertext is just a completely random-looking string.

It's important that the encrypted value (ciphertext) can be decrypted only with the original key. This is called a symmetric-key algorithm because you need the same key for decrypting the message as it was encrypted with. There are also asymmetric-key algorithms, but we don't need them right now.

To make it easier to understand this, here is a dummy encryption algorithm implemented in JavaScript:

function encrypt(message, key) {
    return message.split("").map(character => {
        const characterAsciiCode = character.charCodeAt(0)
        return String.fromCharCode(characterAsciiCode+key.length)
    }).join("");
}

In this function, I mapped each character into another character based on the length of the given key.

Every character has an integer representation, called ASCII code. There is a dictionary that contains all characters with its code, called the ASCII table. So we incremented this integer by the length of the key:

Character mapping

Decrypting the ciphertext is pretty similar. But instead of addition, we subtract the key length from every character in the ciphertext, so we get back the original message.

function decrypt(cipher, key) {
    return cipher.split("").map(character => {
        const characterAsciiCode = character.charCodeAt(0)
        return String.fromCharCode(characterAsciiCode-key.length)
    }).join("");
}

Finally here is the dummy encryption in action:

const message = "Hi Bob, here is a confidential message!";
const key = "password";

const cipher = encrypt(message, key);
console.log("Encrypted message:", cipher);
// Encrypted message: Pq(Jwj4(pmzm(q{(i(kwvnqlmv|qit(um{{iom)

const decryptedMessage = decrypt(cipher, key);
console.log("Decrypted message:", decryptedMessage);
// Decrypted message: Hi Bob, here is a confidential message!

We applied some degree of encryption to the message, but this algorithm was only useful for demonstration purposes, to get a sense of how symmetric-key encryption algorithms behave.

There are a couple of problems with this implementation besides handling corner cases and parameter types poorly.

First of all, any 8-character key can decrypt a message that was encrypted with the key "password". We want an encryption algorithm to only be able to decrypt a message if we give it the same key that the message was encrypted with. A door lock that can be opened by every other key isn't that useful.

Secondly, the logic is too simple – every character is shifted the same amount in the ASCII table, which is too predictable. We need something more complex to make it harder to find out the message without the key.

Thirdly, there isn't a minimum key length. Modern algorithms work with keys at least 128 bits long (~16 characters). This significantly increases the number of possible keys, and with it the security of the encryption.

Lastly, it takes too little time to encrypt or decrypt the message. This is a problem because it doesn't take too much time to try out all possible keys and crack the encrypted message.

This goes hand in hand with the key length: an algorithm is secure if an attacker who wants to find the key has to try a large number of key combinations, and each attempt takes a relatively long time.
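To make these weaknesses concrete, here is a small sketch of an attack on the dummy cipher. Only the key's length matters, so the attacker just tries every shift. The `bruteForce` helper and its `looksRight` check (a guess that the plaintext contains the word "message") are hypothetical.

```javascript
// Same toy scheme as above: shift every character by the key length.
function encrypt(message, key) {
    return message.split("").map(character =>
        String.fromCharCode(character.charCodeAt(0) + key.length)
    ).join("");
}

// The attacker never needs the key itself, only its length,
// so a handful of attempts covers the entire effective key space.
function bruteForce(cipher, maxKeyLength, looksRight) {
    for (let shift = 1; shift <= maxKeyLength; shift++) {
        const candidate = cipher.split("").map(character =>
            String.fromCharCode(character.charCodeAt(0) - shift)
        ).join("");
        if (looksRight(candidate)) return { shift, candidate };
    }
    return null;
}

const cipher = encrypt("Hi Bob, here is a confidential message!", "password");
const result = bruteForce(cipher, 64, text => text.includes(" message"));

console.log(result.shift);
// 8
console.log(result.candidate);
// Hi Bob, here is a confidential message!
```

With at most 64 cheap attempts, the message falls without the attacker ever learning the word "password".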

There is a wide range of symmetric encryption algorithms that address all of these problems, often used together to find a good balance of speed and security for every situation.

The more popular symmetric-key algorithms are Twofish, Serpent, AES (Rijndael), Blowfish, CAST5, RC4, TDES, and IDEA.

If you want to learn more about cryptography in general check out this talk.

Key exchange

It looks like we reduced the original problem space. With encryption, we can create a message which is meaningful for parties who are eligible to read the information, but which is unreadable for others.

When Alice wants to write a confidential message, she would pick a key and encrypt her message with it and send the ciphertext through the wires. Both Bob and Charlie would receive the encrypted message, but neither of them could interpret it without Alice's key.

Now the only question to answer is how Alice and Bob can find a common key just by communicating through the network and prevent Charlie from finding out that same key.

If Alice sends her key directly through the wires Charlie would intercept it and would be able to decrypt all Alice's messages. So this is not a solution. This is called the key exchange problem in computer science.

Diffie–Hellman key exchange

This cool algorithm provides a way of generating a shared key between two people in such a way that the key can't be seen by observing the communication.

As a first step, we'll say that there is a huge prime number known to all participants. It's public information. We call it "p" or modulus.

There is also another public number called "g" or base, which is less than p.

Don't worry about how these numbers are generated. For the sake of simplicity let's just say Alice picks a very big prime number (p) and a number which is considerably less than p. She then sends them through the wires without any encryption, so all participants will know these numbers.

Example: To understand this through an example, we'll use small numbers. Let's say p=23 and g=5.

As a second step, both Alice (a) and Bob (b) will pick a secret number, which they won't tell anybody. It lives only on their own computers.

Example: Let's say Alice picked 4 (a=4), and Bob picked 3 (b=3).

As a next step, they will do some math on their secret numbers, they will calculate:

the base (g) to the power of their secret number,
and take the calculated number's modulo to p.
Call the result A (for Alice) and B (for Bob).

Modulo is a simple mathematical operation, and we use it to find the remainder after dividing one number by another. Here is an example: 23 mod 4 = 3, because 4 goes into 23 five times and 3 remains.

Maybe it's easier to see all of this in code:

// base
const g = 5;
// modulus
const p = 23;

// Alice's randomly picked number
const a = 4;
// Alice's calculated value
const A = Math.pow(g, a)%p;

// Do the same for Bob
const b = 3;
const B = Math.pow(g, b)%p;

console.log("Alice's calculated value (A):", A);
// Alice's calculated value (A): 4
console.log("Bob's calculated value (B):", B);
// Bob's calculated value (B): 10

Now both Alice and Bob will send their calculated values (A, B) through the network, so all participants will know them.

As a last step Alice and Bob will take each other's calculated values and do the following:

Alice will take Bob's calculated value (B) to the power of her secret number (a),
and calculate this number's modulo to p and will call the result s (secret).
Bob will do the same but with Alice's calculated value (A), and his secret number (b).

At this point, they successfully generated a common secret (s), even if it's hard to see right now. We will explore this in more detail in a second.

In code:

// Alice calculates the common secret
const secretOfAlice = Math.pow(B, a)%p;
console.log("Alice's calculated secret:", secretOfAlice);
// Alice's calculated secret: 18

// Bob will calculate
const secretOfBob = Math.pow(A, b)%p;
console.log("Bob's calculated secret:", secretOfBob);
// Bob's calculated secret: 18

As you can see both Alice and Bob got the number 18, which they can use as a key to encrypt messages. It seems magic at this point, but it's just some math.

Let's see why they got the same number by splitting up the calculations into elementary pieces:

The process as an equation

In the last step, we used a modulo arithmetic identity and its distributive properties to simplify nested modulo statements.
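Written out, the reasoning can be sketched as follows. This is a reconstruction from the definitions above, using the identity that reducing modulo p before raising to a power does not change the result modulo p; the subscripted names for the two secrets are only labels for this sketch.

```latex
\begin{align*}
A &= g^{a} \bmod p, \qquad B = g^{b} \bmod p \\
s_{\text{Alice}} &= B^{a} \bmod p = (g^{b} \bmod p)^{a} \bmod p = g^{ab} \bmod p \\
s_{\text{Bob}} &= A^{b} \bmod p = (g^{a} \bmod p)^{b} \bmod p = g^{ab} \bmod p
\end{align*}
```

Both sides end up at g to the power of a times b, modulo p, which is why Alice and Bob hold the same value.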

So Alice and Bob have the same key, but let's see what Charlie saw from all of this. We know that p and g are public numbers, available for everyone.

We also know that Alice and Bob sent their calculated values (A, B) through the network, so that can be also caught by Charlie.

Charlie's perspective

Charlie knows almost all parameters of this equation, just a and b remain hidden. To stay with the example, if he knows that A is 4 and p is 23, g to the power of a can be 4, 27, 50, 73, ... and infinitely many other numbers which leave a remainder of 4 when divided by 23.

He also knows that only a subset of these numbers are possible options, because not all numbers are a power of 5 (g), but this is still an infinite number of options to try.
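To make Charlie's position concrete, here is a small sketch of the most naive attack: trying every exponent until one reproduces A. The function name `findExponent` is made up for this illustration, and the approach only works because the example numbers are tiny.

```javascript
// Public values Charlie can observe in the running example.
const p = 23;
const g = 5;
const A = 4;  // sent by Alice
const B = 10; // sent by Bob

// Try exponents 1, 2, 3, ... until g^x mod p matches the target.
// Returns null if no exponent below p works.
function findExponent(base, target, modulus) {
  let value = 1;
  for (let x = 1; x < modulus; x++) {
    value = (value * base) % modulus;
    if (value === target) return x;
  }
  return null;
}

const guessedA = findExponent(g, A, p);

// With Alice's exponent recovered, Charlie repeats her last step: B^a mod p.
let recoveredSecret = 1;
for (let i = 0; i < guessedA; i++) {
  recoveredSecret = (recoveredSecret * B) % p;
}

console.log("Guessed exponent:", guessedA);        // Guessed exponent: 4
console.log("Recovered secret:", recoveredSecret); // Recovered secret: 18
```

With p = 23 the loop finishes after four multiplications. The number of candidates grows with p, which is the whole reason real deployments use enormous primes.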

This doesn't seem too secure with small numbers. But at the beginning I said that p is a really large number, often 2048 or 4096 bits long. This makes it almost impossible to guess the value of a or b in the real world.
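One practical note: `Math.pow` works on floating point numbers, so it loses precision long before the values reach realistic sizes. A minimal sketch of how the same calculation could be done with `BigInt` and square-and-multiply exponentiation is below. The helper name `modPow` is an assumption of this example, not a built-in, and a real application should rely on a vetted cryptography library rather than hand-written code.

```javascript
// Square-and-multiply modular exponentiation on BigInt values.
// Intermediate results never grow beyond modulus^2.
function modPow(base, exponent, modulus) {
  if (modulus === 1n) return 0n;
  let result = 1n;
  let b = base % modulus;
  let e = exponent;
  while (e > 0n) {
    if (e & 1n) result = (result * b) % modulus; // use this bit of the exponent
    b = (b * b) % modulus;                       // square for the next bit
    e >>= 1n;
  }
  return result;
}

// Same toy values as before, now as BigInt literals.
const prime = 23n;
const base = 5n;
const alicePrivate = 4n;
const bobPrivate = 3n;

const alicePublic = modPow(base, alicePrivate, prime); // 4n
const bobPublic = modPow(base, bobPrivate, prime);     // 10n

console.log(modPow(bobPublic, alicePrivate, prime)); // 18n
console.log(modPow(alicePublic, bobPrivate, prime)); // 18n
```

Because each step reduces modulo p, the same function keeps working when the exponent and modulus are thousands of bits long.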

The common key Alice and Bob both possess can only be generated by knowing a or b, in addition to the information that traveled through the network.

If you're more visual, here is a great diagram that shows this whole process by mixing buckets of paint instead of numbers.

_source: Wikipedia_

Here the shared constants p and g are represented by the yellow "Common paint". The secret numbers of Alice and Bob (a, b) are the "Secret colours", and the "Common secret" is what we called s.

This is a great analogy because it represents how hard the calculation is to reverse. Just as mixed paints can't be separated into their original components, there is no known efficient way to recover the secret exponent from the result of raising g to a power modulo a large p.

Summary

Now the original problem can be solved by encrypting messages using a shared key, which was exchanged with the Diffie-Hellman algorithm.

With this Alice and Bob can communicate securely, and Charlie cannot read their messages even if he is part of the same network.

Thanks for reading this far! I hope you got some value from this post and understood some parts of this interesting communication flow.

If it was hard to follow the math of this explanation, here is a great video to help you understand the algorithm without math, from a higher level.

If you liked this post, you may want to follow me on Twitter to find some more exciting resources about programming and software development.

How to Send Secret Messages

freeCodeCamp — Mon, 08 Jul 2019 21:04:00 +0000

By Megan Kaczanowski

Cryptography is the science of using codes and ciphers to protect messages, at its most basic level. Encryption is encoding messages with the intent of only allowing the intended recipient to understand the meaning of the message. It is a two way function (you need to be able to undo whatever scrambling you’ve done to the message). This is designed to protect data in transit.

One of the earliest ciphers involved a simple shift. For example, if you just shift all the letters in the alphabet by a few, the alphabet might look like the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

NOPQRSTUVWXYZABCDEFGHIJKLM

Then, each letter of the alphabet corresponds to a different letter, but it is difficult to figure out which one, if you don’t already know. Using this cipher, the message, ‘Hello’ translates to ‘Uryyb’.
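The two alphabets above are offset by 13 letters, which is commonly called ROT13. A minimal sketch of that substitution follows; the function name `rot13` is just a label for this example.

```javascript
// Shift every ASCII letter 13 places, wrapping around after Z.
// Characters that are not letters are left untouched.
function rot13(text) {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch <= "Z" ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

console.log(rot13("Hello"));        // Uryyb
console.log(rot13(rot13("Hello"))); // Hello
```

Since 13 is half of 26, applying the shift twice returns the original text, so the same function both encodes and decodes.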

Unfortunately, advances in analysis, particularly pattern analysis driven by very powerful computers, made these types of ciphers very easy to break.

In response to that, we’ve developed very strong, complex algorithms. These can be broken down into two basic types of encryptions — symmetric algorithms and asymmetric algorithms.

Symmetric algorithms are also known as ‘secret key’ algorithms, and asymmetric algorithms are known as ‘public key’ algorithms. The key difference between the two is that symmetric algorithms use the same key to encode and decode (see the first figure below), while asymmetric algorithms use different keys for encryption and decryption (see the second figure below).

As you can see in the above figure, with symmetric encryption, if Bob and Midge want to communicate, Bob first encrypts his message with the secret key (the encrypted message is called ciphertext). Then he sends it to Midge. Midge then decrypts the message with the same secret key and is able to read the message. To send a message back, the process is reversed.

This process is fast, scalable, and very secure. The problem with it is that it requires both parties to already have the same secret key. If they don’t, they need to pass it along insecure channels, which essentially removes the security of the encryption.

With Asymmetric encryption, as in the above figure, if Bob and Midge want to communicate, Bob encrypts his message with Midge’s public key and sends it to her. She then decrypts the message with her private key to read it. To send a message back, the process is reversed.

In this way, anyone can send Midge a message, as she can make her public key available to anyone, but only she can decrypt a message (as she keeps her private key secret). It also solves the need to pass a secret key along insecure channels, because there is no need to pass a secret at all. The disadvantage is that it requires everyone who wants to communicate to have two different keys (not scalable), and it is relatively slow.

In general, when talking about encryption, the most important considerations are:

Authentication/Nonrepudiation — Whether or not you can prove where messages originated (Am I sure who sent this message?).
Reuse — Can I continue to use this key or will it need to be regenerated for each new communication?
Effectiveness — How fast can I transfer large amounts of data?
Scalability — Is this feasible for large groups?
Distribution — how do you distribute keys to the people who you’re communicating with, without divulging the secret to anyone else?

That’s where significant differences start to come up between symmetric and asymmetric encryption, summarized below:

In order to use the best of both worlds, many modern encryption protocols will use asymmetric encryption to establish a connection and create a shared secret. Then, they will switch to symmetric encryption to benefit from the speed difference.

An Introduction to Cryptography and Linear Feedback Shift Registers

freeCodeCamp — Sat, 22 Jun 2019 12:02:44 +0000

By Magdalena Stenius

All around us data is transferred faster than ever. Sensitive data is also part of our everyday life. To protect that data, we use encryption. When we encrypt data, it changes in some way that renders it useless to the possible viewer, but that can be changed back to its original state when it arrives safely to the meant receiver. These transformations rely heavily on math, and particularly on a field of math called number theory. This text takes us through the basics of cryptography both from a mathematical perspective and as a programming matter.

Ciphers Yesterday and Today

For as long as writing has existed, the concept of encryption has lived and developed alongside plain text writing. The idea of rendering text seemingly incomprehensible to guard a secret has been central, especially in military use and politics. The word cipher originates from medieval times, from words such as the Latin cifra and Arabic صفر (sifr), which means “zero”. There are numerous theories on why zero would have been used to describe encryption, including that the concept of zero was not part of the Roman number system and was seen as a mystery among numbers. One of the oldest and most widely known ciphers used in a military context is Caesar’s cipher, also known as Caesar’s shift.

Caesar’s Shift in Python3.

Caesar’s shift takes one key, which is used to shift each character in the plaintext. This single key is the weakness of the cipher: once the correct shift is figured out, the whole message is revealed. Mathematically, this type of cipher can be written as a problem in modular arithmetic, which works with values wrapped up in a specific range. We’ll discuss this in more depth later.

Shift encryption and decryption as modular arithmetic using a 26-letter alphabet.

The way we can solve the plaintext from the encrypted text is by finding the key. In the case of a Caesar’s cipher with a key of 3, finding out the key (3) lets us decrypt the whole text in one go. The key determines the output of the encryption algorithm.
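As a rough illustration of the shift written as modular arithmetic, here is a sketch in JavaScript. It is not the article's original listing; the names `shiftLetters`, `encrypt` and `decrypt` are assumptions of this example, and it only handles the letters A to Z.

```javascript
// Shift each uppercase letter by `key` positions, wrapping with modulo 26.
function shiftLetters(text, key) {
  const k = ((key % 26) + 26) % 26; // normalise negative keys into 0..25
  return text.replace(/[A-Z]/g, (ch) =>
    String.fromCharCode(((ch.charCodeAt(0) - 65 + k) % 26) + 65)
  );
}

// Encryption adds the key, decryption subtracts it.
const encrypt = (plaintext, key) => shiftLetters(plaintext.toUpperCase(), key);
const decrypt = (ciphertext, key) => shiftLetters(ciphertext.toUpperCase(), -key);

console.log(encrypt("ABC", 3)); // DEF
console.log(encrypt("XYZ", 3)); // ABC
console.log(decrypt("DEF", 3)); // ABC
```

Decryption is the same operation with the key negated, which is why knowing the key is all an attacker needs.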

Factors and Primes

Perhaps surprisingly, one of the foundational concepts that lays the ground for encryption is divisibility. To define what it means, let’s lay down some rules. Firstly, if a and b are integers and a is not 0, a divides b if there is an integer k such that b = a · k.

a is a factor of b.

In case we find an integer which is larger than 1 and that does not have other positive factors than 1 and itself, we call it a prime. Integers larger than one which are not primes are known as composite numbers, due to their composed nature. For example, 4 is larger than 1 and it has a factor 2. Hence, it is a composite. On the other hand, 3 is an integer larger than one, but it does not have any other positive factors than 1 and itself. It is a prime. Other small primes are 2, 5, 7, 11 and 13.

According to the fundamental theorem of arithmetic, every integer larger than 1 can be written as a unique product of primes. This is good news for cryptographers, since they love working with primes. Why would that be? Well, one of the most straightforward reasons is that prime factorisation of large numbers takes a lot of time. Many well-known encryption systems, such as RSA, are based on this fact. The principle it works on is that there exists a public key (a product of two large primes) which is used to encrypt the message, and a secret key (containing those primes) which is used to decrypt the message. These primes are usually around 300 digits long.

A Matter of Congruence

Modularity is one of the foundational pillars of cryptography. Let’s approach this concept first from the perspective of division. What happens if we have 5 small candies and three students? Each student gets a candy, and 2 remain. This can be described as 5 mod 3 = 2, or in general a mod n = r.

R is the remainder of a when divided by n.

Can you find the other amounts of candies which leave 2 as a remainder when divided among the 3 students? The next amount would be 8, since each student would get two candies and again 2 would be left over. This can be described using congruence. 8 and 5 are congruent in modulo 3, meaning that they leave the same remainder when divided by 3.

5 is congruent to 8 in modulo 3.

In the example of Caesar’s shift, we use an alphabet that consists of 26 letters. We only work with those 26 values. After ‘Z’, we go back to ‘A’. This is modularity in practice. ‘A’ will always be at position 1 in our 26-letter list, so for any position count we get, if we divide it by 26 and the remainder is 1, we know to use ‘A’. This wraps our numbers into a finite set of 26 values (strictly speaking, the integers modulo 26 form a ring rather than a field, because 26 is not prime). In practice, if my secret message were ‘ABC’, I would first convert this to the numbers 1, 2, 3. After that, I would apply the shift. If the key were 3, the shift would produce 4, 5, 6. After this, I would map the numbers back to their letter representations, reducing modulo 26 where needed. The encrypted message becomes ‘DEF’.

Again, encryption and decryption as modular arithmetic using a 26-letter alphabet.

You can think of this like a clock. When the hand has gone around the clock, it ends up where it started. In modular arithmetic, the last integer is followed by the first. Another way to understand this is that in the world of a specific modulus, only that many values exist. For example, in modulo 2, only 2 values exist. In our alphabet, 26 values exist, and so on.

Types of Ciphers

The kind of keys a cipher uses lets us categorise it as symmetric or asymmetric. The two differ in which key is used for encryption and decryption. Symmetric ciphers are encrypted and decrypted using the same key (such as Caesar’s cipher). Asymmetric key ciphers are decrypted with a different key than they are encrypted with, such as the RSA encryption system which we briefly discussed earlier. This makes encryption slower, but the two parties no longer need to share a secret key in advance.

Another way to categorise ciphers is by whether they operate on streams or blocks. Stream ciphers are symmetric key ciphers that operate on continuous streams of symbols. For example, the encryption used in Bluetooth is a stream cipher. Needless to say, in the age of wireless communication with a need for encryption, stream ciphers have become a vital part of mobile technology.

A Look at Stream Ciphers

Remember that we discussed the concept of modular arithmetic earlier? In short, modular arithmetic is arithmetic over a finite set of values. Now, let’s take a look at another cipher that works with a finite field of values (also known as a Galois field). This cipher, however, does not map the same input symbol to the same output every time, like shifting does. Its purpose is to produce a stream of keys used to encrypt another stream. Like a snake eating its own tail (a symbol often used for eternity), linear feedback shift registers work by feeding on their own output. They are constructed in a way that makes them endlessly cycle through a pattern of values while outputting that seemingly random pattern. The seed and all the outputted values are binary, meaning they get values 0 or 1. The way new values are created is by using a logical operator, usually exclusive or (XOR).

Logical Gate XOR.

To describe this in a practical way, let’s start by looking at what we need to create an LFSR. We need a seed, which is a list of ones and zeros. The seed will be what we start shifting. In addition to our seed (or shift register) we have a collection of taps. The taps tell us which parts of the register we use when feeding back into it. Say that we have a seed 001 and two taps, 1 and 3. This means that when we start shifting, the new value will be a combination of the first and third numbers of the seed, 0 and 1. We use an operation called exclusive or to combine the two. 0 xor 1 gives 1. Since we are working with binary values, the feedback from our taps can be expressed as a polynomial in modulo 2.

The feedback polynomial from taps 3 and 1.

So, if our shift register is 001 and we get a new value, 1, we insert it at the beginning and drop the last number out. Our new shift register state is now 100. We continue this shifting until we notice that our shift register has returned to its initial state, 001. Depending on the seed and taps we select, we can get loops of different lengths. A loop is called maximal length if it passes through all possible different combinations before reaching its original state. Since we’re using the binary system, the maximal length of a loop will be 2^n-1. The loop can also end up leaving its original state and getting stuck in a shorter loop within, never returning to its original state. Finding the seeds and taps that lead to a maximal-length cycle is essential. Some of the criteria for finding these taps are that the number of taps must be even and that the taps are setwise coprime, meaning that they have no common divisor except 1.
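The walkthrough above can be traced in code. This is a sketch under the conventions used in the text (taps are 1-based positions counted from the left, and the new bit is inserted at the front); the function names `lfsrStep` and `lfsrCycle` are made up for this example.

```javascript
// One shift: XOR the tapped bits, push the result in at the front,
// and drop the last bit of the register.
function lfsrStep(state, taps) {
  const feedback = taps.reduce((bit, tap) => bit ^ state[tap - 1], 0);
  return [feedback, ...state.slice(0, -1)];
}

// Collect the states visited until the register returns to the seed.
// Returns null if it never comes back within maxSteps shifts.
function lfsrCycle(seed, taps, maxSteps = 2 ** seed.length) {
  const visited = [];
  let state = seed;
  for (let i = 0; i < maxSteps; i++) {
    visited.push(state.join(""));
    state = lfsrStep(state, taps);
    if (state.join("") === seed.join("")) return visited;
  }
  return null;
}

const cycle = lfsrCycle([0, 0, 1], [1, 3]);
console.log(cycle);
// [ '001', '100', '110', '111', '011', '101', '010' ]
console.log(cycle.length); // 7
```

For a 3-bit register the loop visits 7 states, which matches 2^3-1: every combination except all zeros.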

Wait, that doesn’t seem so random! Wouldn’t a cycle like that be pretty easy to crack? The thing about shift registers is that they get pretty long, pretty quickly. Say we choose a seed of 20 bits and a tap of two values, 2 and 19. The length of the loop produced is 1 048 575, meaning we would get quite a large amount of seemingly random binary values.

Linear Feedback Shift Register in Python3.

The flavour of LFSR we have briefly gone through is called a Fibonacci LFSR. There are also other variations, in which the way the register is shifted differs. They all work to produce a pseudorandom stream of bits used to encrypt streams. Applications of this type of encryption range from Bluetooth to GSM (cellphone communication) standards.

In Conclusion

As a programmer, learning about modular arithmetic and division opens new ways of thinking about everyday coding problems. However, in security-critical projects, using ready-made systems and standards for encryption is always recommended, since specialists in the field of cryptography are likely to find a safer and more effective solution than an enthusiastic hobbyist.

Sources:

Algebraic Structures in Cryptography by V. Niemi

Tutorial on Linear Feedback Shift Registers by EETimes

Encyclopedia of Cryptography and Security by Anne Canteaut

How Devise keeps your Rails app passwords safe

freeCodeCamp — Mon, 15 Oct 2018 10:45:18 +0000

By Tiago Alves

Devise is an incredible authentication solution for Rails with more than 40 million downloads. However, since it abstracts most of the cryptographic operations, it’s not always easy to understand what’s happening behind the scenes.

One of those abstractions culminates in the persistence of an encrypted_password directly on the database. So I’ve always been curious about what it actually represents. Here’s an example:

$2a$11$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO

But what does that gibberish mean?

Devise uses Bcrypt to securely store information. On its website it mentions that it uses “OpenBSD bcrypt() password hashing algorithm, allowing you to easily store a secure hash of your users’ passwords”. But what exactly is this hash? How does it work and how does it keep stored passwords safe?

That’s what I want to show you today.

Let’s work backwards, from the stored hash in your database to the hashing and verification process.

That hash $2a$11$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO is actually comprised of several components:

Bcrypt version (2a) - the version of the bcrypt() algorithm used to produce this hash (stored after the first $ sign)
Cost (11) - the cost factor used to create the hash (stored after the second $ sign)
Salt ($2a$11$yMMbLgN9uY6J3LhorfU9iu) - a random string that when combined with your password makes it unique (first 29 characters)
Checksum (LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO) - the actual hash portion of the stored encrypted_password (remaining string after the 29 chars)
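The breakdown above can be reproduced with a few string operations. This is only a sketch, not how Devise or the bcrypt gem parse the value: `parseBcryptHash` is a hypothetical helper, and it assumes the 60-character layout shown above (a 22-character salt followed by a 31-character checksum). The `rounds` field relies on bcrypt treating the cost as a power of two.

```javascript
// Split a bcrypt string of the form $<version>$<cost>$<salt><checksum>.
function parseBcryptHash(hash) {
  const [, version, cost, rest] = hash.split("$");
  return {
    version,                         // e.g. "2a"
    cost: Number(cost),              // e.g. 11
    rounds: 2 ** Number(cost),       // work factor implied by the cost
    rawSalt: rest.slice(0, 22),      // the random part only
    saltWithPrefix: hash.slice(0, 29), // what the article calls the salt
    checksum: rest.slice(22),        // the actual hash output
  };
}

const parts = parseBcryptHash(
  "$2a$11$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO"
);
console.log(parts.version);        // 2a
console.log(parts.cost);           // 11
console.log(parts.rounds);         // 2048
console.log(parts.saltWithPrefix); // $2a$11$yMMbLgN9uY6J3LhorfU9iu
console.log(parts.checksum);       // LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO
```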

Let’s explore the last 3 parameters:

When using Devise, the **Cost** value is set by a class variable called stretches and the default value is 11. It controls how many times the password is hashed: bcrypt runs 2^cost rounds, so each increment of the cost doubles the work. (_In your devise.rb initializer, you can configure this to a lower value for the test environment to make your test suite run faster._) *
The salt is the random string combined with the original password. This is what makes the same password produce different values when stored. (_See more below about why that matters and what Rainbow Table Attacks are._) **
The checksum is the actual generated hash of the password after being combined with the random salt.

When a user registers on your app, they must set a password. Before this password is stored in the database, a random salt is generated via BCrypt::Engine.generate_salt(cost) by taking into account the cost factor previously mentioned. (Note: if the [pepper](https://github.com/plataformatec/devise/blob/715192a7709a4c02127afb067e66230061b82cf2/lib/devise.rb#L155) class variable value is set it will append its value to the password before salting it.)

With that salt (ex. $2a$11$yMMbLgN9uY6J3LhorfU9iu, which includes the cost factor) it will call BCrypt::Engine.hash_secret(password, salt) that computes the final hash to be stored using the generated salt and the password selected by the user. This final hash (for example, $2a$11$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO) will in turn be stored in the encrypted_password column of the database.

But if this hash is nonreversible and the salt is randomly generated on the BCrypt::Password.create call by BCrypt::Engine.generate_salt(cost), how can it be used to sign in the user?

That’s where those different hash components are useful. After finding the record that matches the email supplied by the user to sign in, the encrypted password is retrieved and broken down into the different components mentioned above (Bcrypt version, Cost, Salt and Checksum).

After this initial preparation, here’s what happens next:

1. Fetch the input password (1234)
2. Fetch the salt of the stored password ($2a$11$yMMbLgN9uY6J3LhorfU9iu)
3. Generate the hash from the password and salt using the same bcrypt version and cost factor (BCrypt::Engine.hash_secret("1234", "$2a$11$yMMbLgN9uY6J3LhorfU9iu"))
4. Check if the stored hash is the same one as the one computed in step 3 ($2a$11$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO)
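The same store-the-salt-then-recompute pattern can be sketched with Node's built-in crypto module. Note the substitution: this example uses scrypt instead of bcrypt, because scrypt ships with Node while bcrypt needs a third-party package. The storage format (`salt$digest` in hex) and the function names are assumptions of this sketch, not what Devise does.

```javascript
const crypto = require("crypto");

// Hash a password with a fresh random salt and keep the salt next to the digest.
function hashPassword(password, salt = crypto.randomBytes(16)) {
  const digest = crypto.scryptSync(password, salt, 32);
  return `${salt.toString("hex")}$${digest.toString("hex")}`;
}

// Re-run the hash with the stored salt and compare the two digests.
function verifyPassword(password, stored) {
  const [saltHex, digestHex] = stored.split("$");
  const expected = Buffer.from(digestHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), 32);
  return crypto.timingSafeEqual(actual, expected); // constant-time comparison
}

const storedRecord = hashPassword("1234");
console.log(verifyPassword("1234", storedRecord));  // true
console.log(verifyPassword("12345", storedRecord)); // false
```

Nothing is ever decrypted here. The login check works because the salt is stored in the clear, so the same computation can be repeated on the submitted password.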

And that’s how Devise stores passwords securely and protects you from a range of attacks even if your database is compromised.

Get in touch on Twitter @alvesjtiago and let me know if you found this article interesting! Thank you for reading.

PS: I’m by no means a security or cryptography expert so please do reach out if you find something wrong. I’m hoping that by simplifying some of the concepts it will be easier to understand what’s happening.

_Thank you @filipepina, @ivobenedito, @jackveiga, @joao_mags and @pedrosmmoreira for the reviews and suggestions. This article is also available at http://blog.tiagoalves.me/how-does-devise-keep-your-passwords-safe._

More information about some of the topics.

Cost factor *

Rainbow Table Attacks **

HTTPS explained with carrier pigeons

freeCodeCamp — Wed, 10 Jan 2018 22:07:10 +0000

By Andrea Zanin


Cryptography can be a hard subject to understand. It’s full of mathematical proofs. But unless you are actually developing cryptographic systems, much of that complexity is not necessary to understand what is going on at a high level.

If you opened this article hoping to create the next HTTPS protocol, I’m sorry to say that pigeons won’t be enough. Otherwise, brew some coffee and enjoy the article.

Alice, Bob and … pigeons?

Any activity you do on the Internet (reading this article, buying stuff on Amazon, uploading cat pictures) comes down to sending and receiving messages to and from a server.

This can be a bit abstract so let’s imagine that those messages were delivered by carrier pigeons. I know that this may seem very arbitrary, but trust me HTTPS works the same way, albeit a lot faster.

Also instead of talking about servers, clients and hackers, we will talk about Alice, Bob and Mallory. If this isn’t your first time trying to understand cryptographic concepts you will recognize those names, because they are widely used in technical literature.

A first naive communication

If Alice wants to send a message to Bob, she attaches the message on the carrier pigeon’s leg and sends it to Bob. Bob receives the message, reads it and all is good.

But what if Mallory intercepted Alice’s pigeon in flight and changed the message? Bob would have no way of knowing that the message that was sent by Alice was modified in transit.

This is how HTTP works. Pretty scary right? I wouldn’t send my bank credentials over HTTP and neither should you.

A secret code

Now what if Alice and Bob are very crafty. They agree that they will write their messages using a secret code. They will shift each letter by 3 positions in the alphabet. For example D → A, E → B, F → C. The plain text message “secret message” would be “pbzobq jbppxdb”.

Now if Mallory intercepts the pigeon she won’t be able to change the message into something meaningful nor understand what it says, because she doesn’t know the code. But Bob can simply apply the code in reverse and decrypt the message where A → D, B → E, C → F. The cipher text “pbzobq jbppxdb” would be decrypted back to “secret message”.

Success!

This is called symmetric key cryptography, because if you know how to encrypt a message you also know how to decrypt it.

The code I described above is commonly known as the Caesar cipher. In real life, we use fancier and more complex codes, but the main idea is the same.

How do we decide the key?

Symmetric key cryptography is very secure if no one apart from the sender and receiver know what key was used. In the Caesar cipher, the key is an offset of how many letters we shift each letter by. In our example we used an offset of 3, but could have also used 4 or 12.

The issue is that if Alice and Bob don’t meet before starting to send messages with the pigeon, they would have no way to establish a key securely. If they send the key in the message itself, Mallory would intercept the message and discover the key. This would allow Mallory to then read or change the message as she wishes before and after Alice and Bob start to encrypt their messages.

This is the typical example of a Man in the Middle Attack and the only way to avoid it is to change the encryption system altogether.

Pigeons carrying boxes

So Alice and Bob come up with an even better system. When Bob wants to send Alice a message he will follow the procedure below:

Bob sends a pigeon to Alice without any message.
Alice sends the pigeon back carrying a box with an open lock, but keeping the key.
Bob puts the message in the box, closes the lock and sends the box to Alice.
Alice receives the box, opens it with the key and reads the message.

This way Mallory can’t change the message by intercepting the pigeon, because she doesn’t have the key. The same process is followed when Alice wants to send Bob a message.

Alice and Bob just used what is commonly known as asymmetric key cryptography. It’s called asymmetric, because even if you can encrypt a message (lock the box) you can’t decrypt it (open a closed box).
In technical speech the box is known as the public key and the key to open it is known as the private key.

How do I trust the box?

If you paid attention you may have noticed that we still have a problem. When Bob receives that open box, how can he be sure that it came from Alice and that Mallory didn’t intercept the pigeon and swap the box for one she has the key to?

Alice decides that she will sign the box, this way when Bob receives the box he checks the signature and knows that it was Alice who sent the box.

Some of you may be thinking, how would Bob identify Alice’s signature in the first place? Good question. Alice and Bob had this problem too, so they decided that, instead of Alice signing the box, Ted will sign the box.

Who is Ted? Ted is a very famous, well known and trustworthy guy. Ted gave his signature to everyone and everybody trusts that he will only sign boxes for legitimate people.

Ted will only sign an Alice box if he’s sure that the one asking for the signature is Alice. So Mallory cannot get an Alice box signed by Ted for herself, and Bob will know that an unsigned box is a fraud because Ted only signs boxes for people after verifying their identity.

Ted in technical terms is commonly referred to as a Certification Authority and the browser you are reading this article with comes packaged with the signatures of various Certification Authorities.

So when you connect to a website for the first time you trust its box because you trust Ted and Ted tells you that the box is legitimate.

Boxes are heavy

Alice and Bob now have a reliable system to communicate, but they realize that pigeons carrying boxes are slower than the ones carrying only the message.

They decide that they will use the box method (asymmetric cryptography) only to choose a key, and then use that key to encrypt the messages with symmetric cryptography (remember the Caesar cipher?).

This way they get the best of both worlds. The reliability of asymmetric cryptography and the efficiency of symmetric cryptography.

In the real world there aren’t slow pigeons, but nonetheless encrypting messages using asymmetric cryptography is slower than using symmetric cryptography, so we only use it to exchange the encryption keys.
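To make the idea concrete, here is a rough sketch using Node's built-in crypto module. It mirrors the box analogy rather than the actual HTTPS handshake, which is more involved. The choice of RSA and AES-256-GCM, and the helper names `encryptMessage` and `decryptMessage`, are assumptions of this example.

```javascript
const crypto = require("crypto");

// Alice's key pair: the public key is the open box, the private key opens it.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", {
  modulusLength: 2048,
});

// Bob picks a random symmetric key and locks it inside Alice's box.
const sessionKey = crypto.randomBytes(32);
const wrappedKey = crypto.publicEncrypt(publicKey, sessionKey);

// Alice opens the box and now holds the same symmetric key.
const unwrappedKey = crypto.privateDecrypt(privateKey, wrappedKey);

// From here on, both sides use the fast symmetric cipher.
function encryptMessage(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptMessage(key, { iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // decryption fails if the message was tampered with
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const sealed = encryptMessage(sessionKey, "secret message");
console.log(decryptMessage(unwrappedKey, sealed)); // secret message
```

The slow asymmetric step happens once, on 32 bytes of key material. Every message after that only pays for the symmetric cipher.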

Now you know how HTTPS works and your coffee should also be ready. Go drink it, you deserved it.

How “Gravity Falls” can help you teach your kids basics of cryptography

freeCodeCamp — Sun, 06 Aug 2017 12:37:42 +0000

By Kamil Tustanowski

It’s Wednesday evening. My two sons and daughter are ready. I press play and we start a journey that takes us all farther than we ever anticipated.

We watched the first episode of Gravity Falls. The visuals, characters, plot and humor are top notch and we definitely wanted more but… we spotted something at the end of credits. Something we didn’t expect. Something that made watching this series far more interesting and engaging.

An encrypted message.

Here’s how we deciphered the codes. We had great fun doing this on our own, without checking any of it on the internet. If I’ve caught your interest, I recommend you stop reading and try doing this yourself. Then you can come back and read my solutions and explanations below.

ZHOFRPH WR JUDYLWB IDOOV

We were certain this was a message. By the looks of it I was guessing that it’s encrypted with some kind of substitution cipher.

Encrypting with a substitution cipher substitutes letters with other letters based on some general rule. Decrypting is done by applying this rule in reverse to the encrypted text. These kinds of ciphers are not used anymore because they are easy to break, e.g. with cryptanalysis. You can find more details on this wiki page.

At first we were too excited about the story to focus on the ciphers just yet. We just acknowledged that the ciphers exist and we didn’t know how to decrypt them. I thought we would just break them later but…

After one episode my son had an idea. He wanted to watch the show intro. Backwards. I thought, why not? Guess what! When you watch it backwards, at some point you can hear a hidden message:

Three letters back

Hmm… three letters back. Normally this wouldn’t make any sense. But we had ciphers which we didn’t know how to decode. For us this made perfect sense.

Hello Mr. Caesar

The Caesar cipher is one of the earliest known and simplest ciphers. It is a type of substitution cipher in which each letter in the plaintext is ‘shifted’ a certain number of places down the alphabet. For example, with a shift of 1, A would be replaced by B, B would become C, and so on. The method is named after Julius Caesar, who apparently used it to communicate with his generals. Read more here.

I printed the English alphabet for everyone from here and the decrypting started:

Z → W because if we move 3 letters back from Z we end up with W
H → E
…
B→ Y because if we move 1 letter back we end up on A and the next 2 we have to count from the end of alphabet so in the end it’s Y

After a while we knew that ZHOFRPH WR JUDYLWB IDOOV is actually WELCOME TO GRAVITY FALLS.

My kids loved it.

While they were manually decrypting the next messages, I thought that this was a great opportunity to actually show them what I do at work, in a way that’s easier for them to understand.

I started a new Swift Playground because it offers an awesome way of working with code, and started coding. I wrote this just for fun, so please don’t judge.

When the manual decoding was done, I sat down with my children in front of a computer. I explained that my code does the same things they were doing when decrypting messages, but instead of doing it manually it’s automatic and can be used many times. They didn’t understand the code (I would be surprised if they did), but I’m pretty sure they got the idea.
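The "three letters back" rule fits in a few lines of code. What follows is not the original Swift Playground code, just a minimal Python sketch of the same idea; the function name `caesar_decrypt` and the choice to handle only uppercase A to Z are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...XYZ"


def caesar_decrypt(ciphertext: str, shift: int = 3) -> str:
    """Move every uppercase letter `shift` places back, wrapping around at A."""
    plain = []
    for ch in ciphertext:
        if ch in ALPHABET:
            # The modulo handles the wrap-around: B (index 1) - 3 = -2 -> 24 -> Y
            plain.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            plain.append(ch)  # spaces and punctuation are left untouched
    return "".join(plain)


print(caesar_decrypt("ZHOFRPH WR JUDYLWB IDOOV"))  # WELCOME TO GRAVITY FALLS
```

The `% 26` is the code version of "count the rest from the end of the alphabet".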

KZKVI QZN WRKKVI HZBH: “ZFFTSDCJTSTZWHZWFS!”

Everything was great until episode 7. We started decoding the first word and:
KZKVI → HWHSF
Uh-oh, our luck had just run out. It was clear that the cipher had changed. Luckily there was a clue in the message we decrypted for episode 6:

MR. CAESARIAN WILL BE OUT NEXT WEEK MR. ATBASH WILL SUBSTITUTE

Caesar cipher → Atbash cipher

Hello Mr. Atbash

The Atbash cipher is a substitution cipher with a specific key where the letters of the alphabet are reversed. I.e. all ‘A’s are replaced with ‘Z’s, all ‘B’s are replaced with ‘Y’s, and so on. It was originally used for the Hebrew alphabet, but can be used for any alphabet. Read more here. Atbash-encrypted strings can be found even in the Bible. You can read a bit more about this here.

This time it was a bit more time-consuming because we had to find each character’s index from the beginning and then find the letter with that index counted from the end of the alphabet. Again my kids were decrypting this manually:
K → P because the index of K is 11 and when we count 11 from the end of the alphabet we get P
Z → A
K → P
V → E
I → R
KZKVI → PAPER. This made sense again.

After a few minutes my daughter approached me and asked whether she had decrypted the message properly. She had. But this wasn’t the most interesting part. I noticed that she had written something on the printed alphabet page. Above the alphabet indexes 1, 2, 3, …, 26 she had added the reversed index numbers 26, 25, 24, …, 1.

Thanks to this she didn’t have to count from the end of the alphabet anymore. We programmers call this optimization. I was amazed that she had already started to improve her toolset to make the job easier.
Again I prepared a small piece of code that was able to decode the messages.
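A decoder for this cipher can be very short. The sketch below is in Python rather than the Swift used at the time, and the names `ATBASH_TABLE` and `atbash` are made up for illustration. The translation table plays the same role as the reversed index numbers written above the printed alphabet: the lookup is prepared once, so nothing has to be counted from the end.

```python
import string

ALPHABET = string.ascii_uppercase

# Pair every letter with its mirror image: A<->Z, B<->Y, ..., K<->P, ...
ATBASH_TABLE = str.maketrans(ALPHABET, ALPHABET[::-1])


def atbash(text: str) -> str:
    """Apply Atbash to uppercase letters; everything else passes through."""
    return text.translate(ATBASH_TABLE)


print(atbash("KZKVI QZN WRKKVI HZBH"))  # PAPER JAM DIPPER SAYS
```

Because the alphabet is simply mirrored, the same function both encrypts and decrypts.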

14–5–24–20 21–16: “6–15–15–20–2–15–20 20–23–15: 7–18–21–14–11–12–5'19 7–18–5–22–5–14–7–5”

All was good until episode 14. Then, out of the blue, the cipher changed again. We didn’t get any clue this time. Or maybe we just missed it?

Well… maybe not exactly without any clue. The greatest number in the ciphered text was 24 and the smallest was 2. The letters of the alphabet have indexes from 1 to 26. Based on this we made an educated guess that:
1 → A
2 → B
…
26 → Z

When 14–5–24–20 decoded to NEXT we knew that our assumption was correct.

It was a bit more annoying because I didn’t want to strip any characters from the message when decoding it. If it doesn’t work for you, please remove the unsupported non-alphanumeric characters or add the currently unsupported characters to .replacingOccurrences. Like I said, don’t judge.
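One way to keep punctuation intact is to convert only the runs of dash-separated numbers and leave every other character alone. This is a small Python sketch under that assumption, not the original Swift code; `a1z26_decode` is a hypothetical name, and both the plain hyphen and the en dash are accepted as separators.

```python
import re

DASH = re.compile(r"[-\u2013]")               # hyphen or en dash between numbers
RUN = re.compile(r"\d+(?:[-\u2013]\d+)*")      # e.g. "14-5-24-20"


def _decode_run(match: re.Match) -> str:
    numbers = [int(n) for n in DASH.split(match.group())]
    if not all(1 <= n <= 26 for n in numbers):
        return match.group()  # not a letter index: leave the text as it was
    return "".join(chr(ord("A") + n - 1) for n in numbers)  # 1 -> A ... 26 -> Z


def a1z26_decode(text: str) -> str:
    """Replace each dash-separated run of numbers with the letters it encodes."""
    return RUN.sub(_decode_run, text)


print(a1z26_decode("14-5-24-20 21-16"))  # NEXT UP
```

Spaces, quotes, colons and apostrophes never match the number pattern, so they survive decoding unchanged.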

5–19–23–6–21–16 18–9–6 4–16–19 22–12–15–10–20–19–25–19

We failed again when we tried to decrypt the first word of the message from episode 20.
5–19–23–6–21–16 → ESWFUP

The cipher had changed. But we didn’t give up easily. Hint: there is an encrypted clue that says how to decode this message. But I’m leaving this to you. It’s just too much fun to work on this stuff.

Please note that this series has two seasons filled with mysteries and encrypted messages. You won’t get bored.

The end?

Now that I know that my children like to play with cryptography, I have a few ideas for the next step. It’s definitely not the last time they will work with ciphers and encrypted messages.

Thanks for reading! I hope that I was able to interest you a bit with this. If you actually try this with your kids, please add a comment about it. I’m very curious whether it was as fun for you as it was for us.

The many, many ways that cryptographic software can fail

freeCodeCamp — Wed, 25 Jan 2017 02:25:49 +0000

By Nabeel Yoosuf

When cryptographic software fails, what’s to blame?

Algorithms?

Cryptography libraries?

Apps incorrectly using those libraries?

Or is it something else entirely?

We rely on cryptographic algorithms and protocols every day for secure communication over the Internet. We’re able to access our bank accounts online because cryptography protects us. We’re able to send private messages to our friends because cryptography protects us. We’re able to buy and sell things using credit cards and Bitcoin because cryptography protects us.

Let me give you a concrete example of this. When you check your email through your favorite browser, the connection between your browser and the email server is secured using the TLS (Transport Layer Security) protocol, so that no one can eavesdrop on your emails or modify them in transit without your knowledge.

In short, without cryptography, the Internet we know today would not be possible. Law and order on the internet depends on cryptography.

But this tool that we all rely upon so heavily is also quite brittle. Our cryptographic software often lets us down. Sometimes it really lets us down.

Have you ever wondered why cryptographic software, including implementations of the TLS protocol, fails over and over again?

According to Veracode’s state of security reports, our cryptographic software is just as vulnerable as it was two years ago.

Veracode ranked cryptographic issues as the #2 vulnerability found in apps in 2015

Veracode again ranked cryptographic issues as the #2 vulnerability found in apps in 2016

Are these failing because of weaknesses in the underlying cryptographic algorithms?

Well, several past attacks (Apple iOS TLS, WD self encrypting drives, Heartbleed, WhatsApp messages, Juniper’s ScreenOS, DROWN, Android N-encryption and so on) show us that our cryptographic software is less likely to be broken due to the weaknesses in the underlying cryptographic algorithms. In other words, cryptanalysis is one of the less likely threats to our cryptographic software.

A sketch of the AES algorithm ([image credit](http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html)), AKA why you don’t want to roll your own cryptography.

Have you ever heard of an attacker breaking 256-bit AES encryption to recover the secret hidden within it? None that I know of. (Of course, if you use a vulnerable, obsolete cryptographic algorithm like DES or RC4, cryptanalysis might help break the software.) So if the culprit isn’t cryptanalysis, then what is it?

Your security is only as good as its weakest link.

Well, it’s everything but cryptanalysis. In other words, cryptanalysis is not the weakest link of cryptographic software. Bad actors use numerous other weak links to break cryptographic software.

Cause of failure #1: bugs in crypto libraries

One popular example is the Heartbleed bug.

What’s the matter with Heartbleed? This bug (CVE-2014-0160) was introduced due to an incorrect implementation of the TLS heartbeat extension in the widely used OpenSSL library (read: 66% of the internet), which is used to support TLS in web servers. What does this extension do? As the name suggests, it’s a keep-alive feature where one end of the connection sends a payload of arbitrary data and the other end is supposed to send back an exact copy of the data to prove that all is fine and well.

The bug turned out to be the age-old mistake of not bounds-checking before a memcpy() that uses non-sanitized data. The vulnerable OpenSSL implementation does not validate the claimed payload length against the actual payload. An attacker could lie about the length and get the victim to send back more bytes from its memory, as shown in the following diagram.

The attacker sends only a one-byte payload but sets the length to 65535; the victim blindly copies 65535 bytes from its memory and sends them back to the attacker.

This in turn allowed the attacker to obtain session keys and other secret information (like your username and password) from the memory of the vulnerable server.

Let me show you the code. The fix is essentially a bounds check, added in version 1.0.1g, as shown below.

====== Vulnerable code =======
/* Enter response type, length and copy payload */
*bp++ = TLS1_HB_RESPONSE;
s2n(payload, bp);
memcpy(bp, pl, payload);

====== Patched code =========
hbtype = *p++;
n2s(p, payload);
if (1 + 2 + payload + 16 > s->s3->rrec.length)
  return 0; /* silently discard per RFC 6520 sec. 4 */
pl = p;

Lesson learned: Always bounds-check your buffers before using them. Sanitization is vital for stopping bad inputs from getting into your system.
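To see why the missing check matters, here is a toy model in Python. It is not OpenSSL code: the `MEMORY` layout, the secret values and both function names are invented for illustration, and a Python slice stands in for `memcpy()` (a slice stops at the end of the buffer, whereas the real over-read kept going into whatever memory came next).

```python
# Toy process memory: the 1-byte heartbeat payload sits right next to secrets.
PAYLOAD = b"A"
SECRETS = b";session_key=s3cr3t;password=hunter2"
MEMORY = PAYLOAD + SECRETS


def heartbeat_vulnerable(memory: bytes, claimed_len: int) -> bytes:
    """Echo back `claimed_len` bytes, trusting the sender's length field."""
    return memory[:claimed_len]


def heartbeat_patched(memory: bytes, payload_len: int, claimed_len: int):
    """Discard the request when the claimed length exceeds the real payload."""
    if claimed_len > payload_len:
        return None  # silently discard, as the patched code does
    return memory[:claimed_len]


print(heartbeat_vulnerable(MEMORY, 30))               # payload plus leaked secrets
print(heartbeat_patched(MEMORY, len(PAYLOAD), 30))    # None
```

The only difference between the two functions is the comparison of the claimed length against the real one, which is what the patch adds.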

Cause of failure #2: operating systems and apps

You probably remember Apple’s “goto” bug (CVE-2014-1266) in its SSL/TLS implementation, disclosed in February 2014.

Apple’s code with the “goto” bug:

1  static OSStatus
2  SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
3                                   uint8_t *signature, UInt16 signatureLen)
4  {
5    OSStatus err;
6    …
7
8    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
9      goto fail;
10   if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
11     goto fail;
12     goto fail;
13   if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
14     goto fail;
15   …
16
17 fail:
18   SSLFreeBuffer(&signedHashes);
19   SSLFreeBuffer(&hashCtx);
20   return err;
21 }

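So, what’s the issue here? The extra goto statement on line 12 is not guarded by any if, so it always executes. At that point err is still 0, because both update calls succeeded, so the function jumps to fail and returns success without ever reaching the final verification step. This makes lines 13 to 16 effectively dead code. This simple implementation mistake bypasses the signature check for SSL/TLS connections on iOS and Mac devices, so an invalid signature is accepted and the connection is susceptible to man-in-the-middle attacks.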

I was curious to find out whether the implementation bugs in crypto software are more due to bugs in the crypto libraries themselves than in the way apps use them. Well, researchers from MIT analyzed 269 cryptographic bugs reported in the Common Vulnerabilities and Exposures database between January 2011 and May 2014. They found that only 17% of bugs are caused by the crypto libraries themselves. The remaining 83% are due to misuse of crypto libs by app developers.

But just because the majority of bugs are due to misuse of crypto libraries in apps doesn’t mean that we can just blame app developers and get on with our day.

There could be many reasons behind the above statistics on the crypto misuse. The crypto libraries themselves may not be providing safe default options, may not have adequate documentation or may be difficult to use. Further, many developers may not have a formal understanding of applying cryptography in their software, even though they are experts at software development itself. These all could result in the misuse of crypto libs.

Lesson learned: always use tools to analyze your code. A dead code analysis tool should have caught this specific case.
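As a rough idea of what such a tool looks for, here is a tiny unreachable-code finder written with Python’s standard `ast` module. It is only a sketch: it checks Python source rather than C, the `BUGGY` snippet is a made-up analogue of the duplicated goto (a stray `return` in place of the stray jump), and production analyzers and compiler warnings are far more thorough.

```python
import ast

# Statements after which nothing else in the same block can run.
TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def find_unreachable_lines(source: str) -> list:
    """Return the line number of the first dead statement in each block."""
    dead = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. the body of a lambda is an expression
            for stmt, following in zip(block, block[1:]):
                if isinstance(stmt, TERMINATORS):
                    dead.append(following.lineno)
                    break
    return dead


BUGGY = '''
def verify(hash_ok, signature_ok):
    if not hash_ok:
        return "fail"
    return "ok"
    if not signature_ok:
        return "fail"
    return "ok"
'''

print(find_unreachable_lines(BUGGY))  # [6]
```

Line 6 is the signature check that can never run, the same shape as lines 13 to 16 in the listing above.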

Cause of failure #3: bad design

In 2015, researchers uncovered a series of issues in WD self-encrypting drives. There were serious design flaws in their use of cryptographic algorithms. I wrote about this in a previous post. Let me show a couple of flaws here.

WD’s self encrypting drive architecture

Following the best practices, WD did use two levels of keys to encrypt documents stored in the drive — master KEK (Key Encryption Key) and per file DEK (Data Encryption Key). Further, they did use a key derivation function to derive KEKs from the password.

But the way they designed the key derivation function itself was totally insecure. They used a fixed salt and a fixed number of iterations. Thus, it was susceptible to pre-computed hash table-based attacks. Attackers could recover keys much faster than a pure brute force attack would have been able to.

WD’s vulnerable key derivation algorithm
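The effect of a fixed salt is easy to reproduce. The sketch below uses PBKDF2-HMAC-SHA256 from Python’s `hashlib` as a stand-in; it is not WD’s actual derivation function, and the salt value, password and iteration count are illustrative assumptions.

```python
import hashlib
import os

ITERATIONS = 100_000  # illustrative; pick a value suited to your hardware


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# Fixed salt: the same password gives the same key on every device, so a
# table computed once by an attacker works against all of them.
FIXED_SALT = b"same-on-every-device"
print(derive_key("hunter2", FIXED_SALT) == derive_key("hunter2", FIXED_SALT))  # True

# Random per-device salt: the same password now yields unrelated keys, so
# any precomputation has to be redone for each salt.
salt_a, salt_b = os.urandom(16), os.urandom(16)
print(derive_key("hunter2", salt_a) == derive_key("hunter2", salt_b))  # False
```

The salt does not need to be secret. It only needs to be unique and stored next to the derived value so the key can be recomputed.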

And if this vulnerability weren’t enough, WD used a dismal random number generator to generate KEKs. It was not only predictable — it also didn’t have enough complexity (only 40 bits).

Cryptographic protocols critically rely on cryptographically secure pseudorandom number generators. If these aren’t secure enough, any cryptographic algorithm or protocol using these random numbers will be quite easy to break.

WD’s weak random number generator
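A small Python experiment shows what "predictable" means in practice. This is not WD’s generator: `weak_key` is a hypothetical example built on Python’s general-purpose `random` module with a small seed, and `strong_key` uses the standard `secrets` module, which draws from the operating system’s cryptographically secure source.

```python
import random
import secrets


def weak_key(seed: int) -> bytes:
    """32 'random' bytes that are fully determined by a small seed."""
    rng = random.Random(seed)  # Mersenne Twister: same seed, same output
    return bytes(rng.getrandbits(8) for _ in range(32))


def recover_seed(key: bytes, candidate_seeds):
    """Brute-force the seed by regenerating keys until one matches."""
    for seed in candidate_seeds:
        if weak_key(seed) == key:
            return seed
    return None


def strong_key() -> bytes:
    """32 bytes from the OS's cryptographically secure generator."""
    return secrets.token_bytes(32)


victim_key = weak_key(4242)
print(recover_seed(victim_key, range(10_000)))  # 4242
```

The key is 256 bits long, but its real strength is only the size of the seed space. The same reasoning applies to a generator with 40 bits of state: an attacker has about 2^40 (roughly 1.1 trillion) candidates to try, not 2^256.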

Lesson learned: Have a good understanding of cryptographic constructs and know their limitations. Follow industry best practices for key derivation.

Cause of failure #4: misconfigurations or insecure default configurations

Exploiting the weaknesses of SSLv2 ([source](https://drownattack.com/))

The DROWN attack, which breaks TLS connections via SSLv2, is a good example of this. You may be using a fairly secure TLS connection to communicate with a web server, but if the web server still supports the old SSLv2 protocol (which it shouldn’t), an attacker can exploit it to break the security provided by TLS and get at your keys and other sensitive information.

SSLv2 has long been considered broken, and no clients today use it for secure connections. But researchers have found that out of 36 million HTTPS servers they probed, 6 million (about 17%) still supported SSLv2.

The above research also uncovers another common lazy practice of using the same key pair in different servers of an organization. It shows how even when one server supports only TLS, if there are other servers supporting SSLv2 with a shared certificate, the server that only supports TLS is vulnerable as well.

Lesson learned: a system is only as secure as its weakest link. Try to protect all of your systems at least reasonably well.
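In code, closing off old protocol versions is often a one-line setting. The sketch below uses Python’s standard `ssl` module; the function name `make_server_context` and the choice of TLS 1.2 as the floor are assumptions for illustration, and recent Python and OpenSSL builds no longer ship SSLv2 at all.

```python
import ssl


def make_server_context() -> ssl.SSLContext:
    """Build a server-side context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would also load its certificate and private key here
    # with ctx.load_cert_chain(...), ideally a key pair that is not shared
    # with any machine that still accepts older protocols.
    return ctx


ctx = make_server_context()
print(ctx.minimum_version)
```

Setting the floor explicitly means the server does not depend on whatever defaults its TLS library happens to ship with.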

There are lots of other ways cryptographic software can fail

Can you think of some additional ways?

It fails due to users. How? Think about social engineering attacks. The RSA SecurID breach is said to have originated from phishing emails that exploited users and a zero-day vulnerability.

It fails due to unrealistic threat models (Breaking web applications built on top of encrypted data).

It fails due to hardware (Breaking hardware enforced technologies such as TPM with hypervisors).

It fails due to side channels (Timing attacks on RSA, DH and DSS algorithms).
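Side channels are easiest to picture with a much simpler case than the published attacks on RSA, DH and DSS: comparing a secret value with a guess. The Python sketch below is only an illustration of the principle, with hypothetical function names. It counts how many bytes an early-exit comparison looks at, which is a stand-in for the running time an attacker would measure, and contrasts it with the standard library’s `hmac.compare_digest`.

```python
import hmac


def bytes_examined_by_naive_compare(secret: bytes, guess: bytes) -> int:
    """Count the bytes an early-exit comparison inspects before stopping."""
    examined = 0
    for s, g in zip(secret, guess):
        examined += 1
        if s != g:
            break  # stops at the first mismatch, so the work done leaks its position
    return examined


def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Compare without an early exit, using the standard library helper."""
    return hmac.compare_digest(secret, guess)


print(bytes_examined_by_naive_compare(b"s3cret", b"x3cret"))  # 1
print(bytes_examined_by_naive_compare(b"s3cret", b"s3cxxx"))  # 4
```

The better the guess, the longer the naive comparison runs, and that difference lets an attacker recover a secret one byte at a time.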

As you can see, cryptographic software can fail for many reasons. Are we really doomed to never get cryptographic software right? Or can we at least reduce the number of such failures? Why can’t we learn from the past and avoid making the same mistakes again and again? What tools will help us spot most of these issues?

Our situation actually isn’t all that bleak. There are ways to prevent most of the failures discussed above. In a follow up post, I’ll explore the topic of how we can make cryptographic software fail less often.

Thanks for reading.

Cryptography - freeCodeCamp.org

What Your Auth Library Isn't Telling You About Passwords: Hashing and Salting Explained

Prerequisites

Table of Contents

Hashing vs Encryption

Why a Plain Hash Isn't Enough

Enter Salting

Why bcrypt Is Slow (and Why That's the Point)

What's Actually in Your Database

Wrapping Up

Cryptography for Beginners: Full Python Course (SHA-256, AES, RSA, Passwords)

The Cryptography Handbook: Exploring RSA PKCSv1.5, OAEP, and PSS

Table of Contents

Prerequisites

The Alice-Bob Paradigm

The Birth of the RSA Cryptosystem

Prime Numbers and Composite Moduli

The Euler Totient Function

Computing the Keys

RSA Operations

Encryption

Decryption

Digital Signatures

1. Signing

2. Verification:

Issues with Euler’s Totient Function in RSA

The Carmichael Function

Mathematical Implication of The Carmichael function

The Carmichael Function in Modern Implementations

Issues with Raw RSA

Exploiting Textbook RSA’s Determinism and Malleability

Key Generation (Setup)

Encryption Process

Determinism Exploit (Ciphertext Guessing Attack)

Malleability Exploit (Ciphertext Manipulation Attack)

Low-Exponent Attacks

Håstad’s Broadcast Attack: Low Exponent Meets Multiple Recipients

Introduction to Padding Schemes in RSA

Public Key Cryptography Standards (PKCS#1 v1.5)

The Mathematics Behind PKCS#1 v1.5

The Bleichenbacher Attack

Optimal Asymmetric Encryption Padding (OAEP)

The Mathematics Behind OAEP

Step 1: Constructing the Data Block (DB)

Step 2: Generating a Mask for the Data Block

Step 3: Mask the Data Block

Step 4: Generate a Mask for the Seed

Step 5: Mask the Seed

Step 6: Form the Final Encoded Message (EM)

Step 7: Covert concatenated String to Integer

Step 8: Perform RSA Encryption

Why SHA-1 or MD5 Are Safe in RSA-OAEP

Label Hashing

Mask Generation Function (MGF1)

Adoption in Cryptographic Libraries (PKCS#1 v1.5 vs OAEP)

Enhancing Digital Signatures: The Transition to PSS

Problems with Early RSA Signature Schemes

Predictability and Replay

Forgery Risks

Birth of the Probabilistic Signature Scheme (PSS)

The Mathematics Behind PSS

Step 1: Message Hashing and Salt Generation

Step 2: Encoding the Hash with the Salt (PSS-Encode)

Step 3: RSA Signature Generation

Step 4: Signature Verification

The Road Ahead: Assessing RSA’s Long-Term Viability

References

Decoding Chaos: How True Randomness Works in Software Engineering

Understanding Randomness

Why is Randomness Important in Software Engineering?

Security

Testing and Quality Assurance

Simulation and Modeling

Additional Applications

Prerequisites

Here's what we'll cover in this article:

Coin Toss Paradigm

The Illusion of Human Randomness

How Random Number Generators Work

Simple random number generator