Mathematics - freeCodeCamp.org

The Math Behind Artificial Intelligence: A Guide to AI Foundations [Full Book]

Tiago Capelo Monteiro — Tue, 06 Jan 2026 23:14:23 +0000

"To understand is to perceive patterns." - Isaiah Berlin

This is not a math book filled with complex formulas, theorems, and concepts that are hard to grasp.

Instead, it’s a detailed guide where we’ll break complex ideas down into simpler terms.

Even if you only have a general understanding of algebra, you should be able to easily follow along.

Here’s what we’ll cover:

Chapter 1: Background on this Book
Chapter 2: The Architecture of Mathematics
Chapter 3: The Field of Artificial Intelligence
Chapter 4: Linear Algebra - The Geometry of Data
Chapter 5: Multivariable Calculus - Change in Many Directions
Chapter 6: Probability & Statistics - Learning from Uncertainty
Chapter 7: Optimization Theory - Teaching Machines to Improve
Conclusion: Where Mathematics and AI Meet
About the Author

Chapter 1: Background on this Book

The Objective Here

My objective in this book is simple: Explain the key mathematical ideas you need to grasp in order to deeply understand AI and train machine learning models.

So you might be wondering: Why is it important to have a good math foundation before creating these models?

Well, there are many reasons, but some are:

It gives you the capacity to understand new AI research on your own.
You can use this same foundation to study other STEM concepts like signal theory and advanced statistical methods.
It helps you understand that AI models are just a mixture of different math ideas working together and gives you insight into how new innovations make LLMs more efficient.
It gives you a foundation so you know how to calibrate AI models and even create derivative models.

These skills are also important for startup founders, especially in Silicon Valley. Many startups begin with APIs or API wrappers but eventually need their own AI solutions.

Outsourcing all AI isn't ideal. This book will help you understand AI foundations so you can design better growth strategies and communicate effectively with investors – especially those who were successful technical co-founders.

Why is This Book About AI Different?

In this book, we’ll look at AI from an engineering perspective. This differs from the typical computer science approach to AI that most introductory courses take.

In doing so, I won’t spend a lot of time explaining formulas and theorems. Instead, I’ll explain why they matter, and how and why they are applied the way they are.

In this way, I hope to offer a unique viewpoint that emphasizes the engineering principles and good practices that underlie all modern AI technologies.

I will also explain how many of these strange math ideas make billion dollar industries possible.

We’ll start with the fundamentals: the structure of the areas of mathematics and AI. After that, we’ll look at the four subareas of math that make AI possible:

Linear Algebra
Calculus
Probability Theory and Statistics
Optimization Theory

After going through all the math, we’ll connect it with the foundation of ChatGPT and all of these large language models.

This way, you’ll get a basic foundation in key math concepts that, when mixed together like the ingredients of a cake, make all AI models possible.

By knowing where the ideas come from, you’ll develop a system-level understanding of AI and a first-principles approach.

So just keep in mind that, even though concepts like integral calculus and eigenvalues/eigenvectors might not be widely used in AI, they’ll help you develop this system-level understanding and first-principles approach.

Also, this book will be a work in progress. After its first release, I’ll seek feedback on things I need to perfect, chapters to add, and so on.

Here is my email for any feedback you might have: monteiro.t@northeastern.edu

And here is the book’s GitHub repository with all code: https://github.com/tiagomonteiro0715/The-Math-Behind-Artificial-Intelligence-A-Guide-to-AI-Foundations

Let Me Introduce Myself

My name is Tiago Monteiro, an electrical and computer engineer and AI master's degree student at Northeastern University's Silicon Valley campus. I have authored 20+ articles with 240K+ views here on freeCodeCamp on math, AI, and tech.

If you’d like to know more about my background, I’ll share that at the end of the book.

Prerequisites

In terms of minimum requirements, you only need to know the basics of mathematics and programming:

Basic algebra and what functions and the coordinate system are.
You should be able to read Python code and understand things like variables, functions, and loops.

Chapter 2: The Architecture of Mathematics

Math is more than numbers. It’s the science of locating complex patterns that shape our world. To truly understand math, we must look beyond numbers and formulas to grasp its structures.

This chapter aims to show math as a growing tree of ideas, a living system of logic, not just formulas to memorize. With analogies, history, and code examples, I want to help you understand math deeply and how to apply it to programming.

I’ve included code examples to connect theory and practice, showing how math ideas apply to real problems. Whether you're new to advanced math or are more experienced, these examples will help you apply math in programming.

This way, before we start going over the different math pillars that sustain AI, you will understand the structure of the field.

The Tree of Mathematics: How Everything Connects

Photo by Lerkrat Tangsri

Imagine math as a vast, ever-growing tree.

The roots are the foundations: logic and set theory. From these roots, the main fields emerge: arithmetic, algebra, geometry, and analysis.

As the tree branches out, new subfields like topology and abstract algebra appear. Sometimes branches connect with each other.

This tree keeps growing in many directions. History shows that sometimes it grows rapidly due to scientific discoveries, while at other times, growth is slow.

And you might wonder: How many more branches and connections between them will keep appearing?

A Quick History of Mathematics: From Counting to Infinity

Foundational mathematical ideas emerged in different civilizations over many centuries, such as:

India's invention of zero
Islamic algebraic advances
Greek geometric rigor

Great mathematicians developed and shared these ideas through writing and lectures. Over time, new generations built on these ideas, creating new branches of mathematics. This endless growth is why Isaac Newton wrote to Robert Hooke in 1675:

“If I have seen further, it is by standing on the shoulders of giants.”

He meant that by working from previous knowledge, he was able to create and (re)discover new ideas.

Yet, the real power of math lies in practicing it over and over and studying it more and more deeply.

As one of my professors once pointed out:

“More important than knowing the theorems is knowing the ideas behind them and the history of how they were created.”

To solve problems, it's often necessary to think from first principles, and math teaches this. Math is not just an academic topic. It’s a global language for scientists and engineers.

By preserving and sharing it, new math can grow from old ideas, allowing the tree to keep expanding.

Foundations of Relativity: How Einstein Used Math to Understand Space and Time

Photo by Pixabay

Albert Einstein developed the general and special theories of relativity, which impact:

GPS and global communication
Satellite telecommunications
Space exploration and satellite launches

And more.

But this was only possible by combining geometry with calculus, a combination known as differential geometry. This field evolved over centuries, thanks to many great mathematicians. Here are a few of them, though the list is not exhaustive:

Euclid (circa 300 BCE): Contributed to geometry, laying the groundwork for later mathematical systems
Archimedes (circa 287–212 BCE): Pioneered the understanding of volume, surface area, and the principles of mechanics
René Descartes (1596–1650): Developed Cartesian coordinates and analytical geometry
Isaac Newton (1642–1727) & Gottfried Wilhelm Leibniz (1646–1716): Newton’s laws of motion and gravitation, alongside Leibniz’s development of calculus, formed the basis of classical mechanics that Einstein sought to extend and modify in his theory of relativity.
Leonhard Euler (1707–1783): Contributed to the development of differential equations, which are essential in the mathematical foundations of physics.
Gaspard Monge (1746–1818): The father of differential geometry and pioneer in descriptive geometry
Carl Friedrich Gauss (1777–1855): Made groundbreaking advances in geometry, including the concept of curved surfaces.
Bernhard Riemann (1826–1866): Introduced Riemannian geometry, a branch of differential geometry.

Going back to Albert Einstein, he saw what no one else in his time saw, thanks to these great math giants and countless others.

Gödel’s Biggest Paradox: Can Math Explain Itself?

What is often described as the biggest paradox in math comes from Kurt Gödel’s incompleteness theorems. They show that in any consistent formal system capable of expressing basic arithmetic, there are true statements that cannot be proven within the system.

This means there are limits to what can be proven as true or false. For mathematicians, this implies that some truths lie beyond the reach of formal proof inside a given system. It demonstrates that no matter how much effort or AI is used, some statements will remain unprovable within that system.

What About Applied Math and Engineering?

Applied math and engineering involve adapting the pure math ideas in real-world scenarios.

Actually, in many cases, it’s the combination of many math ideas.

Let’s consider some examples:

In harmonic analysis, the Laplace, Fourier, and Z-transforms are ways to see the same signal in a new domain to get new insights. Integrals (or sums, in the discrete case of the Z-transform) make this mapping possible.
Principal component analysis (PCA) is a widely used tool in data science. It is a mixture of linear algebra (the eigenvalues and eigenvectors of the data’s covariance matrix) with optimization (keeping the directions with the largest eigenvalues, which capture the most variance) in order to represent a dataset with fewer dimensions.
In machine learning, logistic regression is a mixture of calculus with statistics and probability.
In deep learning, neural networks are essentially many matrices that are multiplied together and updated until they model a dataset representing a system. This optimization of the matrix values relies on activation functions, backpropagation (which computes how much each value needs to change), and a gradient descent-based optimization method (which applies those changes to all the matrix values).
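To make the PCA item above more concrete, here is a minimal sketch using NumPy. The dataset is synthetic and purely hypothetical (two strongly correlated features), and the variable names are illustrative choices, not part of any library.

```python
import numpy as np

# Hypothetical toy dataset: 200 points with two strongly correlated features
rng = np.random.default_rng(0)
feature = rng.normal(size=200)
data = np.column_stack([feature, 2 * feature + rng.normal(scale=0.1, size=200)])

# Linear algebra part: eigenvalues/eigenvectors of the covariance matrix
centered = data - data.mean(axis=0)
covariance = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order

# Optimization part: keep the directions with the largest eigenvalues
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()   # share of variance per direction
projected = centered @ eigenvectors[:, :1]    # 2 features reduced to 1

print("Explained variance ratio:", explained)
print("Reduced shape:", projected.shape)
```

In this toy case, a single direction should capture almost all of the variance, which is why one column can stand in for two.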

But the best example of this fusion of math in engineering is in control theory. Control theory is the study of how to make dynamic systems behave as intended. From trains to cars to airplanes, modern machines rely on control theory. It’s everywhere, in nearly all modern electronic devices. In electric circuits, control theory is also used heavily to guarantee circuit stability in the face of electrical disturbances.

So as you can probably start to see, many of the tools we now have are just a mixture of many pure math ideas – like different recipes. In essence, applied math is the application of pure math as “ingredients“ in "recipes" to solve problems.

So, we’ve explored the structure and evolution of mathematics. But it’s important to see how we can apply these ideas in real life. Pure math makes the framework, and applied math applies that framework to solve problems. To understand this, we’ll examine two code examples that show how you can use math ideas as programming tools.

Code Examples: Analytical and Numerical Approaches

These code examples demonstrate a couple of ways you can use Python to solve math equations.

In the first code example, we’ll solve the problem in the same way that kids in school solve math exercises: essentially, by hand with a pencil. In the second example, we’ll solve the problem using numerical analysis.

Example 1: Solve a Problem Analytically

In this problem, we need to find the values of the variables x and y. So we’ll be moving variables from left to right to find their values.

When we solve math problems analytically, like we did in school, we are manipulating symbols to get exact values. Often these symbols are x, y, and z.

The code below solves a system of two equations with two unknown variables, x and y.

We will use the SymPy Python library to do this. It’s mainly used for symbolic mathematics.

from sympy import symbols, Eq, solve

x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)

solution = solve((eq1, eq2), (x, y))
print(solution)

This code finds the values of x and y that satisfy this system of equations:

$$\begin{align} 2x + 3y &= 6 \\ -x + y &= 1 \end{align}$$

Which gives us the following result:

{x: 3/5, y: 8/5}

Or:

x = 0.6
y = 1.6

When we say that we’re solving this analytically, it means that we’re finding an exact mathematical solution using formulas or equations.

But many times, problems are harder and can’t be solved just by moving symbols from one side of the equation to the other. Sometimes, there can be so many symbols and transformed versions of them, with things like derivatives and integrals, that the work becomes very hard to manage and takes a lot of time.

For example, let’s look at this partial differential equation:

$$\begin{cases} \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, & 0 < x < L, \; t > 0 \\ u(0,t) = 0, & t > 0 \\ u(L,t) = 0, & t > 0 \\ u(x,0) = f(x), & 0 < x < L \end{cases}$$

It can be solved with an analytical method called separation of variables.

But it requires many steps, and it’s easy to make mistakes. Even engineers who learned this often struggle to remember the process later.

When I first encountered this type of math exercise in my electrical and computer engineering degree back in Portugal, it took me 20 to 30 minutes to solve it.

For this reason, there's a branch of mathematics called numerical analysis that focuses on computing approximate solutions when exact ones are hard or impossible to obtain. It helps solve problems faster. This is the method we'll explore next.
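To give a feel for what a numerical approach looks like on the heat equation above, here is a minimal sketch using an explicit finite-difference scheme (forward in time, centered in space). This is one common technique, not the only one. The values of alpha and L, the grid size, and the initial condition f(x) = sin(πx/L) are assumptions chosen only because that case has a known exact solution to compare against.

```python
import numpy as np

alpha = 1.0    # diffusivity (assumed value)
length = 1.0   # rod length L (assumed value)
nx = 51        # number of grid points in x
dx = length / (nx - 1)
dt = 0.4 * dx**2 / alpha   # keeps alpha*dt/dx^2 = 0.4 <= 0.5, needed for stability
steps = 500

x = np.linspace(0.0, length, nx)
u = np.sin(np.pi * x / length)   # initial condition f(x), chosen for illustration
u[0] = u[-1] = 0.0               # boundary conditions u(0,t) = u(L,t) = 0

for _ in range(steps):
    # Replace the second derivative in x with a centered difference
    u[1:-1] = u[1:-1] + alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

# Exact solution for this particular f(x), used only to measure the error
t_final = steps * dt
exact = np.exp(-alpha * (np.pi / length) ** 2 * t_final) * np.sin(np.pi * x / length)
max_error = np.max(np.abs(u - exact))
print("Largest difference from the exact solution:", max_error)
```

The loop is only a few lines long, yet it sidesteps the long chain of manual steps that separation of variables requires. The price is a small approximation error that shrinks as the grid gets finer.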

Example 2: Solve Numerically (Approximation)

Now let’s solve a different problem: we’re going to find the values of each of the 5 variables:

$$\begin{bmatrix} 3 & 2 & -1 & 4 & 5 \\ 1 & 1 & 3 & 2 & -2 \\ 4 & -1 & 2 & 1 & 0 \\ 5 & 3 & -2 & 1 & 1 \\ 2 & -3 & 1 & 3 & 4 \end{bmatrix} \times \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 12 \\ 5 \\ 7 \\ 9 \\ 10 \end{bmatrix}$$

Solving this by hand will take some time…but with Python code, it’s very fast.

We’ll also use the SciPy Python library for this example.

Let’s solve the system numerically:

import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]])

b = np.array([12, 5, 7, 9, 10])

solution = solve(A, b)

print(solution)

This code solves the matrix equation shown above. Again, solving it by hand takes time, and it’s very easy to make a simple mistake.

But in this code example, this line of code:

solution = solve(A, b)

Uses the solve method from SciPy:

from scipy.linalg import solve

It’s a function that finds the values of x in an equation A⋅x=b, where A is a square matrix (a square grid of numbers) and b is a vector (a list of numbers). That gives us the following:

[ 1.35022026 -0.79955947 -1.17180617  3.14317181 -0.83920705]

Which corresponds to:

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 1.35022026 \\ -0.79955947 \\ -1.17180617 \\ 3.14317181 \\ -0.83920705 \end{bmatrix}$$

And is the same thing as:

$$\begin{align} x_1 &= 1.35022026 \\ x_2 &= -0.79955947 \\ x_3 &= -1.17180617 \\ x_4 &= 3.14317181 \\ x_5 &= -0.83920705 \end{align}$$
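A quick way to gain confidence in a numerical answer is to plug it back into the original equation. The sketch below repeats the same A and b from the example and checks that A⋅x lands on b, up to floating-point rounding. The name residual is just an illustrative choice.

```python
import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]])
b = np.array([12, 5, 7, 9, 10])

x = solve(A, b)

# The residual measures how far A @ x is from b; it should be close to zero
residual = A @ x - b
print("Largest residual:", np.max(np.abs(residual)))
print("Solution checks out:", np.allclose(A @ x, b))
```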

Why These Two Approaches Matter

We have solved two mathematical problems in two different ways:

Analytical: Exact solutions through algebraic manipulation
Numerical: Approximate solutions using algorithms

In engineering and in AI, we are constantly choosing between these approaches.

When training AI models with millions of parameters, exact analytical solutions are almost never available. This is why, in these cases, we need numerical approaches.

When creating math theorems, we need analytical precision to make sure the result is exactly right.

This is one of the many things an engineering degree teaches you: often, in the real world, it’s better to just write some code to solve a problem than to actually solve it by hand with math. Other times, the best solution is to just think in first principles and from there create new theorems to solve a problem.
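To see the difference side by side, here is a small sketch that solves the two-equation system from Example 1 both ways: exactly with SymPy and in floating point with NumPy. The variable names exact and approx are illustrative.

```python
import numpy as np
from sympy import Eq, Rational, solve, symbols

# Analytical: SymPy keeps exact fractions
x, y = symbols("x y")
exact = solve((Eq(2 * x + 3 * y, 6), Eq(-x + y, 1)), (x, y))

# Numerical: NumPy works with floating-point numbers
A = np.array([[2.0, 3.0], [-1.0, 1.0]])
b = np.array([6.0, 1.0])
approx = np.linalg.solve(A, b)

print("Exact:", exact)
print("Floating point:", approx)
print("Difference in x:", float(exact[x]) - approx[0])
```

For a system this small the two answers agree to many decimal places. The trade-off only starts to matter as problems grow: symbolic manipulation becomes slow or infeasible, while numerical routines keep scaling.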

Now let's step out of the code examples and see how different branches of mathematics connect.

The Impact of a Grand Unified Theory of Mathematics

Is it possible to unify all math?

In principle, perhaps. The idea is sometimes called a Grand Unified Theory of Mathematics: the hope that all the different areas of math can be linked together to discover deeper patterns in mathematics.

The Langlands program is trying to make this unification possible. It’s an attempt to interconnect the largest parts of the big tree of math to uncover new patterns in math.

With a Grand Unified Theory of Mathematics, we would be able to understand how every branch of the tree connects with the others and all the relationships between them.

What’s the Value of this Big Unification for Society?

By studying history, we can find patterns. The unification of various fields has created many massive impacts on society, such as:

In the 19th century, James Clerk Maxwell united the fields of electricity and magnetism with his famous Maxwell's equations. This paved the way for radios and electric grids around the globe. In turn, it served as a foundation for much of the technological progress of the 20th and 21st centuries.
In the 20th century, the unification of algebra with logic led to the rise of digital systems. In turn, digital systems gave rise to processors and the evolution of computers and the modern laptop.
Also in the 20th century, the unification of probability and communication led to information theory. This became a foundation for digital communication, including the internet. This unification was carried out by a great mathematician named Claude Shannon.

In the end, a grand unified theory of mathematics could be one of the biggest achievements in modern society.

In AI, it could help unify all machine learning models in a common architecture. This would help accelerate the development of new AI models and could also open the door to new material science advances.

It could help reveal – with math – the deep patterns we still haven’t found in these fields. Just as uniting electricity and magnetism led to modern technology, a unified math framework would lead to a wave of innovation.

A Final Lesson From History

From Greek geometry to AI, math has grown like a tree over centuries. By understanding its structure, it’s possible to see its role in finding the patterns of our universe.

I hope I was able to make you see math in this way. I hope you can also see that the unification of scientific fields helps lay the foundations for the creation of new innovations to help society go forward.

Many major societal transformations only came to be thanks to abstract math ideas. When these are shared and refined, they become the hidden architecture of progress in society. Innovation begins when disconnected ideas are united, well-linked, and widely shared.

Chapter 3: The Field of Artificial Intelligence

What is Artificial Intelligence?

Photo by Pavel Danilyuk

The term Artificial Intelligence was born from the work of John McCarthy, who is often called the "father of AI."

He used it when he, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, proposed the famous Dartmouth Summer Research Project on Artificial Intelligence in 1956.

The Dartmouth proposal was built on the following conjecture:

“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

Since then, the field has evolved in waves of innovation, from early rules-based systems to modern neural networks.

But over time, rather than creating general intelligence, most AI systems have been designed to excel at narrow tasks.

For example:

Chess-playing programs like Deep Blue that defeated world champion Garry Kasparov
Image recognition systems that can identify objects in photographs with impressive accuracy
Natural language processing models that can translate between languages
Game-playing AI like AlphaGo that mastered the ancient game of Go

Artificial General Intelligence isn’t yet here

Only very narrow AI models have demonstrated human-level or superhuman performance in their narrow domains.

In my view, and as we will see in this book, AGI will emerge from different large language models interacting with each other and with the tools available to them.

Symbolic vs. Non-symbolic AI: What’s the Difference?

What is Symbolic AI?

Symbolic AI refers to the creation of a program based on many rules and symbols to simulate how humans think.

It uses symbols to represent concepts (like farms and distributors) and logical rules to reason about them.

The specific data about your domain is called facts. Facts are the pieces of information the rules operate on. For example, a fact might be "green_acres has high water usage and good pH levels."

Also, imagine someone wants to optimize farm distribution logistics. The symbols would represent farms, distributors, and transport methods. Then the rules would be:

If the farm has high water usage and good pH levels, then classify it as high-yield producer
If a farm is a high-yield producer and a distributor has low demand, then prioritize direct connection
If a direct connection is needed, then select transport with lowest environmental impact

The facts would be the actual data like "farm X has high water usage" or "distributor Y has low demand."

This way, the system combines these rules and facts through logical reasoning to make decisions. A very popular programming language in this field is Prolog, which was designed for creating rule-based systems.
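Before looking at Prolog, the idea of combining rules and facts can be sketched in plain Python. This is only an illustration of the reasoning style (often called forward chaining), not how Prolog works internally. The fact names, the farm_x and distributor_y labels, and the two transports with their impact scores are all hypothetical.

```python
# Facts are tuples: (kind, name, ...). All names and values here are hypothetical.
facts = {
    ("high_water_usage", "farm_x"),
    ("good_ph", "farm_x"),
    ("low_demand", "distributor_y"),
    ("transport", "truck", 3),         # (kind, name, environmental impact)
    ("transport", "electric_van", 1),
}

def infer(initial_facts):
    """Apply the three rules repeatedly until no new fact appears."""
    known = set(initial_facts)
    while True:
        new = set()
        # Rule 1: high water usage and good pH -> high-yield producer
        high_water = {f[1] for f in known if f[0] == "high_water_usage"}
        good_ph = {f[1] for f in known if f[0] == "good_ph"}
        for farm in high_water & good_ph:
            new.add(("high_yield", farm))
        # Rule 2: high-yield producer and low-demand distributor -> direct connection
        high_yield = {f[1] for f in known if f[0] == "high_yield"}
        low_demand = {f[1] for f in known if f[0] == "low_demand"}
        for farm in high_yield:
            for distributor in low_demand:
                new.add(("direct_connection", farm, distributor))
        # Rule 3: direct connection -> select the transport with the lowest impact
        transports = [(f[2], f[1]) for f in known if f[0] == "transport"]
        if transports:
            best = min(transports)[1]
            for f in known:
                if f[0] == "direct_connection":
                    new.add(("selected_transport", f[1], f[2], best))
        if new <= known:      # nothing new was derived, so reasoning is finished
            return known
        known |= new

conclusions = infer(facts) - facts
for conclusion in sorted(conclusions):
    print(conclusion)
```

Each pass through the loop can unlock new conclusions for the next pass, which is how a chain of simple rules ends up making a decision.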

Symbolic AI program: Manage agricultural networks with a Prolog program.

Let’s look at an example project to understand this more clearly. The project we’ll examine is called SymbolicAIHarvest. It was part of a course at NOVA University during my undergraduate studies in Electrical and Computer Engineering. The course was titled "Modelation of Data in Engineering."

SymbolicAIHarvest is an AI system developed with Prolog to manage agricultural networks. Here’s the project on GitHub so you can check it out.

The project optimizes farm operations using rule-based reasoning. It monitors sensors for real-time data and improves route planning for machinery. It also coordinates produce movement to reduce delays and waste, enhancing productivity and sustainability.

Understanding the code below is not a priority for this book. I just want to show you an example of all the facts of the project:

% FARMERS(owner)
farmer(ana).
farmer(asdrubal).
farmer(miguel).
farmer(joao).
farmer(teresinha).
farmer(victor).
farmer(carlos).
farmer(anabela).

% FARMS(name, owner, region, type)
farm(q1, ana, alentejo, vinha).
farm(q2, ana, alentejo, olival).
farm(q3, asdrubal, lisboa, cenoureira).
farm(q4, asdrubal, lisboa, milharal).
farm(q5, asdrubal, lisboa, vinha).
farm(q6, miguel, evora, trigal).
farm(q7, miguel, evora, cenoureira).
farm(q8, miguel, evora, vinha).
farm(q9, miguel, evora, morangueira).
farm(q10, joao, porto, vinha).
farm(q11, joao, porto, trigal).
farm(q12, joao, porto, cenoureira).
farm(q13, teresinha, algarve, olival).
farm(q14, teresinha, algarve, vinha).
farm(q15, victor, setubal, olival).
farm(q16, victor, setubal, vinha).
farm(q17, victor, setubal, trigal).
farm(q18, carlos, sintra, milharal).
farm(q19, carlos, sintra, vinha).
farm(q20, anabela, coina, milharal).
farm(q21, anabela, coina, olival).
farm(q22, anabela, coina, trigal).

% SENSOR READINGS(name, type, value)
sensor_reading(q1,humidity,28).
sensor_reading(q2,humidity,35).
sensor_reading(q3,humidity,42).
sensor_reading(q4,humidity,38).
sensor_reading(q5,humidity,33).
sensor_reading(q6,humidity,45).
sensor_reading(q7,humidity,30).
sensor_reading(q8,humidity,36).
sensor_reading(q9,humidity,50).
sensor_reading(q10,humidity,41).
sensor_reading(q11,humidity,40).
sensor_reading(q12,humidity,44).
sensor_reading(q13,humidity,32).
sensor_reading(q14,humidity,29).
sensor_reading(q15,humidity,47).
sensor_reading(q16,humidity,39).
sensor_reading(q17,humidity,53).
sensor_reading(q18,humidity,27).
sensor_reading(q19,humidity,24).
sensor_reading(q20,humidity,31).
sensor_reading(q21,humidity,37).
sensor_reading(q22,humidity,46).
sensor_reading(q1, temperature, 25).
sensor_reading(q2, temperature, 25).
sensor_reading(q3, temperature, 25).
sensor_reading(q4, temperature, 25).
sensor_reading(q5, temperature, 25).
sensor_reading(q6, temperature, 25).
sensor_reading(q7, temperature, 25).
sensor_reading(q8, temperature, 25).
sensor_reading(q9, temperature, 25).
sensor_reading(q10, temperature, 25).
sensor_reading(q11, temperature, 25).
sensor_reading(q12, temperature, 25).
sensor_reading(q13, temperature, 25).
sensor_reading(q14, temperature, 25).
sensor_reading(q15, temperature, 25).
sensor_reading(q16, temperature, 25).
sensor_reading(q17, temperature, 25).
sensor_reading(q18, temperature, 25).
sensor_reading(q19, temperature, 25).
sensor_reading(q20, temperature, 25).
sensor_reading(q21, temperature, 25).
sensor_reading(q22, temperature, 25).
sensor_reading(q1, water, 47000).
sensor_reading(q2, water, 52500).
sensor_reading(q3, water, 39000).
sensor_reading(q5, water, 61000).
sensor_reading(q8, water, 58000).
sensor_reading(q10, water, 43000).
sensor_reading(q13, water, 72000).
sensor_reading(q16, water, 49000).
sensor_reading(q18, water, 35000).
sensor_reading(q21, water, 66500).
sensor_reading(q1, ph, 6.5).
sensor_reading(q2, ph, 4.7).
sensor_reading(q3, ph, 8.2).
sensor_reading(q4, ph, 7.0).
sensor_reading(q5, ph, 5.1).
sensor_reading(q6, ph, 8.0).
sensor_reading(q7, ph, 4.5).

% DISTRIBUTORS (name, region, capacity, demand level)
distributor(d1, alentejo, 1000, 2).
distributor(d2, lisboa, 800, 1).
distributor(d3, evora, 1200, 3).
distributor(d4, porto, 900, 2).
distributor(d5, algarve, 700, 2).
distributor(d6, setubal, 1100, 1).
distributor(d7, sintra, 950, 2).
distributor(d8, coina, 1000, 1).

% TRANSPORTS (name, capacity, type, autonomy, region, impact)
transport(t1, 1000, fossil, 100, alentejo, 3).
transport(t2, 500, electric, 10, alentejo, 1).
transport(t3, 800, fossil, 400, algarve, 5).
transport(t4, 700, hybrid, 300, setubal, 2).
transport(t5, 150, electric, 340, coina, 1).
transport(t6, 700, fossil, 220, porto, 3).
transport(t7, 900, hybrid, 350, evora, 2).
transport(t8, 1000, electric, 170, sintra, 1).

% Connections based on graph image

% Top of the network
link(q2, d1, 5).
link(q1, d1, 7).
link(q3, d1, 6).

% Network center
link(q3, q4, 8).
link(q4, d2, 6).
link(q4, d3, 7).
link(q4, q5, 5).
link(q4, d4, 6).

% Additional connections
link(q2, d2, 8).
link(q3, d3, 7).

This Prolog code models an agricultural supply chain system that has:

Farmers
Farms
Sensor Readings
Distributors
Transports

In addition, look at this part of the facts of the system:

% Top of the network
link(q2, d1, 5).
link(q1, d1, 7).
link(q3, d1, 6).

% Network center
link(q3, q4, 8).
link(q4, d2, 6).
link(q4, d3, 7).
link(q4, q5, 5).
link(q4, d4, 6).

% Additional connections
link(q2, d2, 8).
link(q3, d3, 7).

Here we connect farms with distributors and with other farms. This way, we can see that the link between the farm q1 and the distributor d1 has a distance of 7 (the third argument of link). This makes it possible to write algorithms that find the shortest path between them.
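As a rough illustration of the shortest-path idea, here is a Python sketch that runs Dijkstra's algorithm over the same link facts. The original project is written in Prolog, so this is not the project's code. It also assumes that each link can be travelled in both directions and that distances simply add up.

```python
import heapq

# The link facts from above, as (from, to, distance) tuples
links = [
    ("q2", "d1", 5), ("q1", "d1", 7), ("q3", "d1", 6),
    ("q3", "q4", 8), ("q4", "d2", 6), ("q4", "d3", 7),
    ("q4", "q5", 5), ("q4", "d4", 6),
    ("q2", "d2", 8), ("q3", "d3", 7),
]

# Assumption: links are undirected, so add both directions
graph = {}
for start, end, distance in links:
    graph.setdefault(start, []).append((end, distance))
    graph.setdefault(end, []).append((start, distance))

def shortest_distance(source, target):
    """Dijkstra's algorithm: return the smallest total distance, or None."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if node == target:
            return distance
        if distance > best.get(node, float("inf")):
            continue  # a shorter route to this node was already found
        for neighbor, weight in graph.get(node, []):
            candidate = distance + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None

print(shortest_distance("q1", "d1"))
print(shortest_distance("q1", "d4"))
```

The first call uses the direct link, while the second has to chain several links together (q1 to d1, d1 to q3, q3 to q4, q4 to d4).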

In the end, symbolic AI just creates programs based on a context and rules applied to that context.

What is Non-Symbolic AI?

Non-symbolic AI doesn’t use symbols or rules to think. Instead, it’s data-driven. In other words, it learns patterns from large datasets. This is the approach used in machine learning and deep learning.

When we create an AI model, we can associate it with an API (Application Programming Interface) so that we can use the AI model in websites, applications, and other systems. Basically, the trained AI model is set up behind an API endpoint. An API endpoint is like a web service that lets other applications send requests to the model and get responses back.

For example, when you use ChatGPT in a web browser, your messages are sent through OpenAI's API to their language model, which processes your input and sends back a response.

An AI agent is a software program that can autonomously perform tasks by making decisions and taking actions to achieve specific goals.

Unlike basic chatbots that only reply to questions, AI agents can plan steps, use tools, and work towards achieving complex goals. They do this by combining language models with extra features like accessing outside data or working with other AI agents.

Here’s an example of a non-symbolic AI agent project I worked on. I developed it using the OpenAI API and crewAI, one of the most popular Python libraries for creating AI agents.

In this system, five AI agents collaborate to create optimized content:

Research and Fact Checker: Conducts research to find trends and data.
Audience Specialist: Analyzes audience needs for better engagement.
Lead Content Writer: Writes engaging content based on research.
Senior Editorial Director: Ensures content quality and consistency.
SEO Specialist: Optimizes content for search engines.

Using the OpenAI API, it employs ChatGPT with crewAI to have these agents work for me.
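To show the general shape of such a pipeline without depending on any particular library, here is a plain-Python sketch. It does not use crewAI or the OpenAI API. The fake_llm function is a hypothetical stand-in for a real model call, and the task descriptions are shortened versions of the roles listed above.

```python
def fake_llm(role, task, previous_output):
    """Hypothetical stand-in for a real language-model call."""
    return f"{role}: {task} | input: {previous_output[:40]}"

# The five roles from the list above, each with a short task description
AGENTS = [
    ("Research and Fact Checker", "find trends and data"),
    ("Audience Specialist", "analyze audience needs"),
    ("Lead Content Writer", "write engaging content"),
    ("Senior Editorial Director", "check quality and consistency"),
    ("SEO Specialist", "optimize for search engines"),
]

def run_pipeline(topic):
    """Pass the work from one agent to the next, in order."""
    previous_output = topic
    history = []
    for role, task in AGENTS:
        previous_output = fake_llm(role, task, previous_output)
        history.append(previous_output)
    return history

results = run_pipeline("math for AI")
for line in results:
    print(line)
```

The design point is that each agent receives the previous agent's output as its input, so the work is refined step by step. A real agent framework adds planning, tool use, and actual model calls on top of this basic hand-off.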

Before AI: Control Theory as the “First AI”

Before symbolic and non-symbolic AI, electrical engineering had data-driven methods. One key area that I’ve already mentioned above is control theory (which studies control systems for machines like cars and rockets). This field allows us to design systems that ensure stability despite disturbances and achieve goals beyond human capabilities.
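To make "stability despite disturbances" more tangible, here is a minimal simulation sketch. It assumes a very simple hypothetical system (its output naturally decays toward zero), a constant disturbance that appears after 5 seconds, and a proportional-integral (PI) controller, which is one classic control design. All the gains and values are illustrative assumptions.

```python
def simulate(kp, ki, setpoint=1.0, dt=0.01, t_end=30.0):
    """Simulate a first-order system dx/dt = -x + u + disturbance with a PI controller."""
    x = 0.0          # system output we want to hold at the setpoint
    integral = 0.0   # accumulated error, used by the integral term
    for step in range(int(t_end / dt)):
        t = step * dt
        disturbance = -0.5 if t >= 5.0 else 0.0   # hypothetical constant push
        error = setpoint - x
        integral += error * dt
        u = kp * error + ki * integral            # control signal
        x += dt * (-x + u + disturbance)          # simple Euler integration step
    return x

final_with_integral = simulate(kp=2.0, ki=1.0)
final_without_integral = simulate(kp=2.0, ki=0.0)

print("PI controller settles at:", round(final_with_integral, 3))
print("P-only controller settles at:", round(final_without_integral, 3))
```

With the integral term, the output should return to the setpoint even after the disturbance arrives. Without it, the output settles at a different value. It is also easy to see why each term exists, which is part of the interpretability that control theory offers.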

Nowadays, after creating a control theory algorithm, we check if AI can improve the control system. In my experience, only some advanced deep learning methods are effective. Most machine learning methods don't outperform control theory in efficiency and security.

Control theory also offers better interpretability, allowing us to understand decisions, unlike advanced machine learning and deep learning.

Due to the historical importance of control theory, I will continue to mention its role and mathematical applications. This will help you learn AI's math foundations and understand its significance in electronic systems and AI applications in engineering beyond dataset predictions.

Chapter 4: Linear Algebra - The Geometry of Data

Photo by Nothing Ahead.

Linear algebra is like having organized containers for data.

Instead of playing with individual numbers, we can pack them into structured boxes that are easier to handle. These structured boxes are called matrices.

When you have a lot of variables like customer data, sensor readings, or images, these structured boxes are very helpful. Also, what we can do when we play around with these boxes is very valuable.

In AI, linear algebra is everywhere. Take matrices, for example – a key concept in Linear Algebra. LLMs perform many matrix multiplications as their core operation. The data that they take in is also organized into matrices. In image recognition, matrices are used to represent pixels of images.
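Here is a tiny sketch of both ideas using NumPy: an image stored as a matrix of pixel values, and a matrix multiplication like the ones at the core of neural networks. The 3x3 image and the random weights are made up for illustration.

```python
import numpy as np

# A hypothetical 3x3 grayscale image: 0 is black, 255 is white
image = np.array([[0, 255, 0],
                  [255, 255, 255],
                  [0, 255, 0]], dtype=np.uint8)

# Flatten the image into one row and scale the pixel values to the range 0..1
inputs = image.reshape(1, -1) / 255.0        # shape (1, 9)

# Random weights standing in for a trained layer with 4 outputs
rng = np.random.default_rng(0)
weights = rng.normal(size=(9, 4))            # shape (9, 4)

# One matrix multiplication turns 9 pixel values into 4 output values
outputs = inputs @ weights                   # shape (1, 4)

print(image.shape, inputs.shape, outputs.shape)
```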

So as you can see, this core Linear Algebra concept is important to understand. Let's start!

What Are Matrices and Why Do They Simplify Equations?

Very often, systems in the real world can be simplified and modeled with a system of equations.

Those equations are often differential equations of many orders. But to simplify, let’s choose a very simple system like the one below:

$$\begin{align} 2x + 3y - z &= 7 \\ x - 2y + 4z &= -1 \\ 3x + y + 2z &= 10 \end{align}$$

When dealing with many variables and equations, writing each equation separately quickly becomes frustrating. Matrices provide a compact way to represent these systems.

For example, here’s the system above as a single matrix equation:

$$\begin{bmatrix} 2 & 3 & -1 \\ 1 & -2 & 4 \\ 3 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ 10 \end{bmatrix}$$
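Once a system is written in matrix form, a numerical library can solve it in one call. Here's a minimal sketch using NumPy's `np.linalg.solve`. The names `coeffs`, `rhs`, and `solution` are illustrative choices, not part of the original example:

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
coeffs = np.array([
    [2.0, 3.0, -1.0],
    [1.0, -2.0, 4.0],
    [3.0, 1.0, 2.0],
])
rhs = np.array([7.0, -1.0, 10.0])

# Solve coeffs @ [x, y, z] = rhs
solution = np.linalg.solve(coeffs, rhs)
print("x, y, z =", solution)

# Check: plugging the solution back in should reproduce the right-hand side
print("Residual:", coeffs @ solution - rhs)
```

The residual should be zero up to floating-point rounding, which confirms that the matrix form and the three separate equations describe the same system.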

By seeing systems of equations as matrices, we can use linear algebra techniques to understand how the system behaves.

Some of these techniques are:

Linear Independence, Dependence, and Rank
Determinants
Eigenvalues and Eigenvectors

So to summarize:

A real world system can be represented as a system of equations
A system of equations can be compressed in a structured manipulable form called a matrix.
With matrices and linear algebra techniques, we can understand how the system works.

This way, we can study the basic behavior of a system with Linear Algebra.

For complex systems like a rocket, Linear Algebra is still the foundation. More advanced tools from control theory are used, but understanding simpler systems is essential for modeling and creating complex ones.

Vectors and Transformations: Moving in Multiple Directions

Vectors are matrices with a single row or a single column. You can also think of them as the building blocks of AI. They represent things like data points, model parameters, and much more.

For example, every data input (like an image or sentence) becomes a vector that the model can process.

Here are two examples of vectors:

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 7 & 1 & 5 \end{bmatrix}$$

And:

$$\mathbf{B} = \begin{bmatrix} 3 \\ -1 \\ 8 \\ 0 \\ -4 \end{bmatrix}$$

All operations that you can perform on matrices can also be performed on vectors.

In Python, we can represent this by:

import numpy as np

# Define vectors A and B
A = np.array([4, -2, 7, 1, 5])
B = np.array([3, -1, 8, 0, -4])

We’re using the NumPy library because it makes math with arrays easy and fast.

Read as part of a system of equations, a vector with a single row such as:

$$\mathbf{A} = \begin{bmatrix} 4 & -2 & 7 & 1 & 5 \end{bmatrix}$$

holds the coefficients of a single equation:

$$4x_1 - 2x_2 + 7x_3 + x_4 + 5x_5 = k$$

A vector with a single column such as:

$$\mathbf{B} = \begin{bmatrix} 3 \\ -1 \\ 8 \\ 0 \\ -4 \end{bmatrix}$$

can hold the values of the variables:

$$\begin{align} x_1 &= 3 \\ x_2 &= -1 \\ x_3 &= 8 \\ x_4 &= 0 \\ x_5 &= -4 \end{align}$$

Now let’s see some matrix operations.

For example:

$$\mathbf{A} + \mathbf{B}^T = \begin{bmatrix} 4 & -2 & 7 & 1 & 5 \end{bmatrix} + \begin{bmatrix} 3 & -1 & 8 & 0 & -4 \end{bmatrix} = \begin{bmatrix} 7 & -3 & 15 & 1 & 1 \end{bmatrix}$$

vector_addition = A + B
print("A + B =", vector_addition)

Which gives the result of the equation above.

Often, vector addition is used to combine features. For example, adding many user preference vectors creates a profile of a user.

Here’s a scalar multiplication:

$$3\mathbf{A} = 3\begin{bmatrix} 4 & -2 & 7 & 1 & 5 \end{bmatrix} = \begin{bmatrix} 12 & -6 & 21 & 3 & 15 \end{bmatrix}$$

scalar_mult = 3 * A
print("3 * A =", scalar_mult)

Which gives the result of the equation above.

In AI, scaling vectors is usually done to adjust relevancy. For example, multiplying a vector by a scalar of 100 increases its weight, while multiplying it by 0.3 reduces its importance.

Here's an outer product multiplication:

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} 4 \\ -2 \\ 7 \\ 1 \\ 5 \end{bmatrix} \times \begin{bmatrix} 3 & -1 & 8 & 0 & -4 \end{bmatrix} = \begin{bmatrix} 12 & -4 & 32 & 0 & -16 \\ -6 & 2 & -16 & 0 & 8 \\ 21 & -7 & 56 & 0 & -28 \\ 3 & -1 & 8 & 0 & -4 \\ 15 & -5 & 40 & 0 & -20 \end{bmatrix}$$
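In NumPy, the outer product is available as `np.outer`. A short sketch, reusing the same vectors A and B:

```python
import numpy as np

A = np.array([4, -2, 7, 1, 5])
B = np.array([3, -1, 8, 0, -4])

# Entry (i, j) of the outer product is A[i] * B[j]
outer_product = np.outer(A, B)
print(outer_product)
print("Shape:", outer_product.shape)
```

Two vectors of length 5 produce a 5×5 matrix, where each row is B scaled by one entry of A.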

And here’s a dot product (also called an inner product):

$$\mathbf{A} \cdot \mathbf{B}^T = \begin{bmatrix} 4 & -2 & 7 & 1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 3 & -1 & 8 & 0 & -4 \end{bmatrix}$$

$$= 4 \cdot 3 + (-2) \cdot (-1) + 7 \cdot 8 + 1 \cdot 0 + 5 \cdot (-4) = 50$$

We mainly use dot products when we want to measure similarity, or alignment, between two vectors. In machine learning, the dot product is one of the most common similarity measures.

import numpy as np

dot_product = np.dot(A, B)
print("A · B =", dot_product)

Which gives the result of the equation above.
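A raw dot product grows with the length of the vectors, so in practice similarity is often measured with cosine similarity, which divides the dot product by the two vector lengths. Here's a minimal sketch. The helper `cosine_similarity` is a hypothetical name written for this example, not a NumPy function:

```python
import numpy as np

A = np.array([4, -2, 7, 1, 5])
B = np.array([3, -1, 8, 0, -4])

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print("Cosine similarity of A and B:", cosine_similarity(A, B))
print("Cosine similarity of A and 3A:", cosine_similarity(A, 3 * A))
```

The result always lies between -1 and 1. Vectors pointing in the same direction score 1 regardless of their length, which is why A and 3A count as perfectly similar.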

Linear Independence, Dependence, and Rank: Why It Matters

A lot of times, matrices can be made smaller and simpler. So it’s a good practice to reduce a matrix to its simplest form before we start to analyze its properties.

When at least one row of a matrix can be built from the other rows, the rows of that matrix are linearly dependent. This means the matrix can be simplified further.

In contrast, a matrix has the property of linear independence when none of its rows can be created by combining the others.

For example, when we have a complex matrix like this one:

$$C = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 6 & 8 \\ 1 & 3 & 5 & 7 \\ 0 & 1 & 2 & 3 \end{bmatrix}$$

We can, with calculations, convert to this:

$$C_{\text{reduced}} = \begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

If you are not familiar with row reduction, I recommend this YouTube video.

The above simplified matrix is the same thing as this:

$$C_{\text{reduced}} = \begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \end{bmatrix}$$

This way, we conclude that the C matrix has a rank of 2.

In other words, since the simplest form of the matrix has only 2 rows with numbers, it has a rank of 2.

From this, we can conclude that the rows of the reduced matrix are linearly independent. This is because neither row can be made from the other. It’s the simplest possible form of the matrix.

The original matrix C is linearly dependent because some rows are just multiples or combinations of other rows. For example, row 2 of the original matrix C is exactly row 1 multiplied by 2.

Another way of seeing this is that we have 4 rows in the original matrix and the rank of matrix C is 2. Since they are not equal, C is linearly dependent.
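Both the rank and the reduced form can be checked in code. The sketch below uses NumPy's `np.linalg.matrix_rank` for the rank and SymPy's `Matrix.rref()` for an exact row reduction:

```python
import numpy as np
import sympy as sp

C = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 5, 7],
    [0, 1, 2, 3],
]

# Rank computed numerically
rank = np.linalg.matrix_rank(np.array(C))
print("Rank of C:", rank)

# Exact reduced row echelon form and the indices of the pivot columns
reduced, pivot_columns = sp.Matrix(C).rref()
print(reduced)
print("Pivot columns:", pivot_columns)
```

The number of pivot columns returned by `rref()` is another way to read off the rank.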

Why are these concepts important?

Linear independence and rank are important in engineering because they show whether equations, represented as matrices, give unique information. In electrical circuits and control systems, knowing that equations, represented as matrices, are independent ensures that you have unique solutions and avoids confusion.

The matrix rank shows the maximum number of independent equations that can exist. This helps engineers model the simplest possible form of the systems.

In LLMs like ChatGPT, Gemini, Grok, and Claude, linear independence, dependence, and rank are used in a very important technique called LoRA (Low-Rank Adaptation).

LoRA (Low-Rank Adaptation) is widely used to fine-tune these models so that they adapt efficiently to new tasks or domains without retraining the full model. There are also variants of this technique, like Quantized LoRA (QLoRA). Because far fewer parameters are trained, LoRA reduces the compute, and therefore the energy and cooling water, that data centers spend on fine-tuning.
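To see where rank enters, here's a rough NumPy sketch of the core LoRA idea: the large weight matrix W stays frozen, and the update is the product of two thin matrices, so its rank is at most r. The layer size, the rank r = 8, and the random values are all assumptions made for illustration. Real implementations typically initialize one factor to zero and train both factors with gradient descent, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 512, 512, 8  # hypothetical layer size and LoRA rank

W = rng.normal(size=(d, k))         # frozen pretrained weights
B = rng.normal(size=(d, r)) * 0.01  # trainable low-rank factor
A = rng.normal(size=(r, k)) * 0.01  # trainable low-rank factor

delta_W = B @ A          # the update has rank at most r
W_adapted = W + delta_W  # effective weights after adaptation

full_params = d * k
lora_params = d * r + r * k
print("Rank of the update:", np.linalg.matrix_rank(delta_W))
print("Parameters in W:", full_params)
print("Trainable LoRA parameters:", lora_params)
```

With these sizes, the two thin factors hold only a small fraction of the parameters in W, and that is where the savings come from.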

Determinants: Measuring Space and Scaling

Why are determinants important?

Determinants tell us whether a system of equations has exactly one solution or not, without having to solve the whole system. If it doesn't, the system has either no solutions or infinitely many.

This way, instead of immediately trying to solve a complex system, we can first use the determinant to find out if it is even worth solving in the first place.

Many engineers don’t really understand the importance of the determinant. The only thing they know is the formula and how to apply it.

So now let’s learn, with some examples, what exactly the determinant is and why it matters.

A determinant is just a number. It’s always calculated from a square matrix. By calculating the determinant, we can find certain properties about the system it represents.

The determinant of a given matrix A:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

can be represented by two notations:

$$\det(A) = ad - bc$$

$$|A| = ad - bc$$

Both are the same thing.

Let's see how to calculate a determinant:

$$|A| = \begin{vmatrix} 2 & 3 \\ 1 & 4 \end{vmatrix} = (2)(4) - (3)(1) = 8 - 3 = 5.$$

Let’s see how to do this in Python:

import numpy as np

# Define the matrix
A = np.array([
    [2, 3],
    [1, 4]
])

# Calculate the determinant
det_A = np.linalg.det(A)

print("Determinant of A:", det_A)

The same calculation works for other matrices!

Here's the determinant formula for a 3×3 matrix:

$$|B|= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = aei + bfg + cdh - ceg - bdi - afh.$$

Now let’s apply the formula to an example:

$$|B| = \begin{vmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{vmatrix} = (1)(4)(6) + (2)(5)(1) + (3)(0)(0) - (3)(4)(1) - (2)(0)(6) - (1)(5)(0)$$

Assessing each term:

$$= (1)(4)(6) + (2)(5)(1) - (3)(4)(1) = 4 \cdot 6 + 2 \cdot 5 - ( 3 \cdot 4) = 24+10-12 = 22$$

In Python code:

import numpy as np

# Define the matrix
B = np.array([
    [1, 2, 3],
    [0, 4, 5],
    [1, 0, 6]
])

# Calculate the determinant
det_B = np.linalg.det(B)

print("Determinant of B:", det_B)

Now, let’s visualize a 2×2 matrix by plotting its column vectors. Take a matrix whose columns are the vectors C1 = (3,1) and C2 = (-2,4). Plotting them shows us geometrically what the matrix is actually doing.

In a GeoGebra graph, it gives us this:

As we can see, the vectors define how each variable influences the system. By visualizing what the matrices are doing, we can find patterns that are harder to find just by looking at formulas.

What does this mean visually?

It means that in the space, this is what our matrix looks like. It’s also how our system of equations is represented.

C1 represents the “force“ or the impact the variable x1 has. And C2 does the same thing for the variable x2.

Now we’ll focus on a 3×3 matrix example. This matrix D represents a system of three equations with three variables:

$$D = \begin{bmatrix} 2 & -1 & 3 \\ 4 & 0 & -2 \\ -1 & 5 & 1 \end{bmatrix}$$

$$\begin{align} 2x_1 - x_2 + 3x_3 &= p \\ 4x_1 + 0x_2 - 2x_3 &= q \\ -x_1 + 5x_2 + x_3 &= r \end{align}$$

Each column can be described as a separate vector:

$$\begin{equation} D = \left[ D_1 \mid D_2 \mid D_3 \right] = \left[ \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix} \mid \begin{bmatrix} -1 \\ 0 \\ 5 \end{bmatrix} \mid \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} \right] \end{equation}$$

As we can see, D was decomposed into 3 column vectors:

$$\begin{equation} D_1 = \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix} \end{equation}$$

and:

$$\begin{equation} D_2 = \begin{bmatrix} -1 \\ 0 \\ 5 \end{bmatrix} \end{equation}$$

and:

$$\begin{equation} D_3 = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} \end{equation}$$

In a GeoGebra graph, it gives us this:

In 3D, each vector points in its own direction. Each equation (row) of the system, meanwhile, defines a plane, and the point where all three planes meet is the solution to the system.

This is a key advantage of matrices and linear algebra. They help us visualize both simple and complex systems, enhancing systems thinking and first principles thinking.

The determinant is directly connected to these visualizations. For example, in 2D it measures the area that the vectors stretch over. Now we’ll see how that’s possible.

Let's use matrix A and see what its determinant looks like in geometric terms:

$$A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}$$

Which can be decomposed into 2 column vectors, u = (2,1) and v = (3,4).

It gives us this determinant:

$$|A| = \begin{vmatrix} 2 & 3 \\ 1 & 4 \end{vmatrix} = (2)(4) - (3)(1) = 8 - 3 = 5.$$

Now let’s see the determinant visually.

From (2,1) and (3,4), we can draw vectors parallel to u and v. These are called u' and v' and have the same magnitude. They meet at (5,5), and we have a parallelogram that’s completed with these points: (0,0), (2,1), (3,4), (5,5).

The area of the parallelogram is the determinant:

Let’s see another example.

Let’s use a matrix F and see what it truly is:

$$F = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$$

It gives us this determinant:

$$|F| = \begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} = (1)(4) - (2)(2) = 4 - 4 = 0$$

In GeoGebra, we can see that:

Now let’s try to see the determinant visually:

We can conclude that the area is 0.

Now let’s use a matrix G and see what it truly is:

$$G = \begin{bmatrix} 1 & 5 \\ 2 & 3 \end{bmatrix}$$

It gives us this determinant:

$$|G| = \begin{vmatrix} 1 & 5 \\ 2 & 3 \end{vmatrix} = (1)(3) - (5)(2) = 3 - 10 = -7$$

In GeoGebra, we can see that:

Now let’s try to see the determinant visually.

From (1,2) and (5,3), we can draw vectors parallel to u and v. These are called u' and v' and have the same magnitude. They meet at (6,5). A parallelogram is completed with these points: (0,0), (1,2), (5,3), (6,5).

Again, the area of the parallelogram comes from the determinant. This time it is the absolute value, |-7| = 7, because an area cannot be negative:

We just saw that the absolute value of the determinant is the area of the parallelogram formed by the vectors. When the determinant is 0, there is no area. In other cases, there is an area. But what does this mean, and why do we care about these different values?

When the det = 0:

The vectors are linearly dependent (one can be written as a combination of the others)
They lie on the same line or one is a scaled version of the other
The parallelogram collapses to a line, hence zero area
This tells us the matrix has no inverse
Systems of equations either have no solution or infinitely many solutions

When the det ≠ 0 (det > 0 or det < 0):

The vectors form a proper parallelogram with an area equal to |det|
- If det > 0, the signed area is positive and the transformation preserves orientation
- If det < 0, the signed area is negative and the orientation is flipped
The vectors are linearly independent
Systems of equations have exactly one solution
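The two cases above can be checked directly on the matrices A, F, and G from this section. Here's a small sketch. The helper `describe` is a hypothetical name written for this example, and `np.isclose` is used because floating-point determinants are rarely exactly zero:

```python
import numpy as np

def describe(matrix):
    """Summarize what the determinant says about a square matrix."""
    det = np.linalg.det(matrix)
    if np.isclose(det, 0.0):
        return det, "singular: no inverse, no unique solution"
    orientation = "preserved" if det > 0 else "flipped"
    return det, f"invertible: area {abs(det):.1f}, orientation {orientation}"

A = np.array([[2, 3], [1, 4]])
F = np.array([[1, 2], [2, 4]])
G = np.array([[1, 5], [2, 3]])

for name, matrix in [("A", A), ("F", F), ("G", G)]:
    det, summary = describe(matrix)
    print(f"det({name}) = {det:.1f} -> {summary}")
```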

In electrical engineering, determinants help verify if a control system is controllable and observable.

Control systems use matrices a lot. For simple single-input, single-output systems, checking whether the determinants of the controllability and observability matrices are zero or non-zero tells engineers two things:

If it is controllable, it means the system is reachable, which helps in stabilization and performance optimization.
If it is observable, it means the system is measurable, which helps in fault detection and system monitoring.
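As a rough illustration, here's how that check might look for a small state-space model. The matrices A, B, and C below describe a made-up two-state system chosen only for this example. For a two-state system, the controllability matrix is [B, AB] and the observability matrix stacks C on top of CA. For larger or multi-input systems these matrices are not square, so engineers check their rank instead of a determinant:

```python
import numpy as np

# Hypothetical second-order system: x' = A x + B u, y = C x
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Controllability matrix [B, AB] and observability matrix [C; CA]
controllability = np.hstack([B, A @ B])
observability = np.vstack([C, C @ A])

is_controllable = not np.isclose(np.linalg.det(controllability), 0.0)
is_observable = not np.isclose(np.linalg.det(observability), 0.0)

print("Controllable:", is_controllable)
print("Observable:", is_observable)
```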

In finite element analysis, a very popular math tool to solve partial differential equations, determinants help figure out quickly if the calculations will give reliable results.

This way, with finite element analysis, we can design safer buildings, optimize aircraft wings, and simulate medical implants – all of which have a large impact on human lives and safety.

In machine learning, determinants are crucial to understanding data transformations. In these methods, if a determinant with a value of zero shows up, it means you are losing information and can't recover original data.

Also in deep learning, it’s used to decide the first parameters of neural networks (weight initialization) to prevent problems like the vanishing/exploding gradients.

In a 3×3 matrix, the determinant represents the volume of a parallelepiped (a 3D "box") formed by three vectors in 3D space.

If det = 0: The three vectors lie in the same plane, so they don't span any 3D volume
If det ≠ 0: The vectors form a proper 3D shape with actual volume

The absolute value |det| gives you the exact volume of that parallelepiped.

For example, if you have vectors a, b, and c, the determinant tells you how much 3D space they "fill up" when you use them as the edges of a box.
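One way to check the volume interpretation is to compare the determinant with the scalar triple product a · (b × c), which is the classic formula for the signed volume of a parallelepiped. This sketch uses the rows of the 3×3 example matrix from earlier as the three edge vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 4, 5])
c = np.array([1, 0, 6])

# Determinant of the matrix whose rows are a, b and c
det = np.linalg.det(np.array([a, b, c]))

# Scalar triple product a . (b x c): the signed volume of the parallelepiped
triple_product = np.dot(a, np.cross(b, c))

print("Determinant:", det)
print("Triple product:", triple_product)
print("Volume:", abs(triple_product))
```

Both computations agree up to floating-point rounding.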

This is where it gets fascinating:

4×4 matrix: The determinant represents the "hypervolume" of a 4D parallelepiped formed by four vectors in 4-dimensional space.
1000×1000 matrix: The determinant represents the hypervolume in 1000-dimensional space!

So, to summarize, the determinant easily tells us whether a system of equations, represented by a compact matrix, has exactly one solution (det ≠ 0) or not (det = 0, meaning no solutions or infinitely many).

What Are Mathematical Spaces and How Do They Simplify Calculations?

We now have a great foundation to understand the rest of this chapter on linear algebra.

Now, we will see how the rows of a linearly independent matrix create something called a basis. We will also see that a basis is just a set of building blocks for mathematical spaces!

The row vectors of a linearly independent matrix form a basis.

For example in matrix A, which is linearly independent:

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

forms this set:

$$((1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1))$$

In this case, since the rows of matrix A are linearly independent, the set of matrix rows is called a basis. From this basis, you can build any other vector as a linear combination. The collection of all these possible combinations is called a mathematical space (more precisely, a vector space).

A mathematical space is an infinite set containing all linear combinations of a basis. It's called a basis because these vectors form the base to express any vector in the space as a linear combination.

This matrix B is linearly independent:

$$B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

And forms this set:

$$((1, 0), (0, 1))$$

And from this come all possible points in this Cartesian coordinate system:

For example, mathematically, we can get the point (2,3) by:

$$(x=2, y=3) = 2(1, 0) + 3(0, 1) = (2, 0) + (0, 3) = (2, 3)$$

Note: There are other bases for the Cartesian coordinate plane. I chose this one because it’s the easiest to understand.
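To make the note about other bases concrete, here's a sketch that expresses the same point (2, 3) in a different basis. The basis vectors (1, 1) and (1, -1) are an arbitrary choice for illustration. Finding the coefficients means solving a small linear system:

```python
import numpy as np

# A different (hypothetical) basis for the plane, stored as columns
basis = np.array([[1.0, 1.0],
                  [1.0, -1.0]])
point = np.array([2.0, 3.0])

# Find coefficients c1, c2 with c1*(1, 1) + c2*(1, -1) = (2, 3)
coefficients = np.linalg.solve(basis, point)
print("Coefficients:", coefficients)

# Rebuild the point from the basis vectors to confirm
rebuilt = coefficients[0] * basis[:, 0] + coefficients[1] * basis[:, 1]
print("Rebuilt point:", rebuilt)
```

The point stays the same. Only the coefficients used to describe it change with the basis.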

Eigenvalues and Eigenvectors: Unlocking Hidden Patterns

Eigenvalues and eigenvectors, in my opinion, are far simpler than what mathematics professors make them out to be at university:

Eigenvalues tell you how much a matrix stretches or shrinks things.
Eigenvectors tell you which directions stay unchanged when the matrix transforms them.

This way, a matrix may have one or many eigenvalues which in turn result in many eigenvectors.

Let’s see an example:

For a square matrix A, eigenvalue λ, and eigenvector v:

$$Av=λv$$

The easiest way to find the eigenvalue is to calculate this:

$$\det(A - \lambda I) = 0$$

or:

$$|A - \lambda I| = 0$$

Again, we have different notations for the determinant, but they’re the same thing.

Anyway, let’s define a very simple matrix A:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

Now let’s make some calculations.

This formula:

$$\det(A - \lambda I) = 0$$

Can be decomposed into:

$$\det\left(\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} - \lambda \times \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right) = 0$$

Which is the same as:

$$\det\left(\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}\right) = 0$$

Which gives us:

$$\det\left(\begin{bmatrix} 2-\lambda & 0 \\ 0 & 3-\lambda \end{bmatrix}\right) = 0$$

By the calculations we made above on the determinant, we can conclude that:

$$(2-\lambda) \times (3-\lambda) = 0$$

Which is the same as:

$$2-\lambda = 0 \text{ or } 3-\lambda = 0$$

Which gives us these eigenvalues:

$$\lambda_1 = 2, \quad \lambda_2 = 3$$

And these eigenvectors:

$$\mathbf{v_1} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{v_2} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

This means that in the Cartesian coordinate system:

By applying the eigenvectors, we can see that:

The eigenvalue 2 is associated with the eigenvector v1:

$$A\mathbf{v_1} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

The eigenvalue 3 is associated with the eigenvector v2:

$$A\mathbf{v_2} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Here is the Python code to calculate this:

import numpy as np

# Define matrix A
A = np.array([[2, 0],
              [0, 3]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:")
print(eigenvalues)

print("Eigenvectors (columns):")
print(eigenvectors)

Eigenvalues and eigenvectors are key tools in engineering and machine learning because they reveal a matrix's fundamental behavior. Although a matrix transformation might seem complex, in reality:

Eigenvalues show how much stretching or compression occurs.
Eigenvectors identify the special directions where this stretching happens most naturally.

In machine learning, we can use Principal Component Analysis (PCA) to make datasets smaller.

So, for example, let's say you’re building a machine learning application to predict heart disease. You have 100 features (data categories) and 1 target variable telling whether a person has it or not.

With PCA, you can convert the 100 features into, say, 40 new ones. This way, you can make a smaller machine learning model and save computational resources.

PCA uses eigenvectors of covariance matrices to find important directions in data with many variables. It reduces data size without losing much detail, helping machine learning algorithms focus on key features and ignore unnecessary information.
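Here's a minimal from-scratch sketch of those steps in NumPy. The dataset is synthetic (200 samples with 5 features generated from 2 hidden factors), so every number in it is an assumption made for illustration. In practice, you would more likely reach for a library implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 200 samples, 5 features driven by only 2 hidden factors
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = hidden @ mixing + 0.05 * rng.normal(size=(200, 5))

# 1. Center the data and compute the covariance matrix of the features
centered = data - data.mean(axis=0)
covariance = np.cov(centered, rowvar=False)

# 2. Eigen-decomposition (eigh is meant for symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# 3. Sort directions from largest to smallest variance
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Keep the top 2 directions and project the data onto them
n_components = 2
reduced = centered @ eigenvectors[:, :n_components]

explained = eigenvalues[:n_components].sum() / eigenvalues.sum()
print("Reduced shape:", reduced.shape)
print("Variance kept:", round(explained, 4))
```

Each eigenvalue is the variance of the data along its eigenvector, so the ratio printed at the end shows how much of the original variation survives the reduction from 5 features to 2.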

Applications of Linear Algebra in AI and Control Theory

Linear algebra serves as the mathematical foundation for all engineering fields.

In addition, the principles of matrices and linear transformations provide the computational foundation that makes modern AI possible while enabling the control of complex systems.

All LLMs, from ChatGPT and Claude to Gemini and Grok, rely on linear operations.

All these systems carry out huge matrix multiplications to handle and create human language. So, when you type something into ChatGPT, probably millions of matrix multiplications are happening as you wait for a response!

In control theory, especially in an area called state-space control theory, matrices make it possible to create complex controllers. Linear algebra helps engineers design controllers for things like aircraft autopilots and robotic systems, among other applications.

For example, when a rocket adjusts its trajectory or a drone maintains stable flight, many matrix multiplications are happening to determine the best way to guarantee the system’s stability.

Thanks to GPUs, matrix operations can be computed very efficiently. Also, any new matrix multiplication algorithms or special hardware for faster linear operations can greatly enhance AI and control systems.

In the end, linear algebra is the hidden mathematical engine powering the current AI revolution.

Chapter 5: Multivariable Calculus - Change in Many Directions

Photo by ThisIsEngineering

Limits and Continuity: Understanding Smooth Change

Calculus is one of the most valuable areas of mathematics, and it focuses on the study of continuous change.

Before we start learning a topic that makes many people give up on engineering degrees, I want to once again assure you that this chapter is very easily explained with a lot of images and code examples.

Also, just like linear algebra, many concepts in calculus are components of tools that have helped create billion-dollar industries.

What is continuity?

Before going and explaining topics like derivatives and integrals, we need to understand continuity.

In simple terms, continuity means that a function has no breaks, jumps, or holes.

Essentially, you can draw it without lifting your pencil from the paper.

For example, this function is continuous:

You can draw this graph without taking the pencil off the paper.

The above graph is represented by this function:

$$y = x^2 - 4x + 3$$

But the below function is not continuous:

This one, you can’t draw without taking the pencil off the paper.

It’s represented by this piecewise function:

$$y = \begin{cases} 1.5 + \frac{1}{x+1} & \text{if } -1 < x < 2 \\ 2 + \frac{2}{(x-1)^2} & \text{if } x > 2 \end{cases}$$

This piecewise function is essentially two individual functions for two different intervals of numbers. Since calculus is the study of continuous change, we can only realistically use it in continuous functions.

How do limits guarantee continuity?

We can only use tools like derivatives and integrals if a function is continuous.

How can we describe mathematically that a function is continuous – like drawing it without lifting our pencil from the paper?

Limits solve that problem.

When we take the limit of a function at a given point, we're asking: what value does a function approach as we get close to that point?

Let's look at some examples of this function at these points and also understand the notation used in limits:

What is the limit at the point x=0?

It is 3, which is exactly where the curve crosses the y axis.

In mathematical notation:

$$\begin{align} \lim_{x \to 0} (x^2 - 4x + 3) &= (0)^2 - 4(0) + 3 \\ &= 0 - 0 + 3 \\ &= 3 \end{align}$$

In this notation, we're asking what the value of the y function is as x gets very close to 0. Think of x as being at 0.00000000000001 or -0.00000000000001. It gets so close that we can consider it near enough.

What is the limit at the point x=1?

Let’s see another example:

In this case, it’s 0.

$$\begin{align} \lim_{x \to 1} (x^2 - 4x + 3) &= (1)^2 - 4(1) + 3 \\ &= 1 - 4 + 3 \\ &= 0 \end{align}$$

In this notation, we're asking what the value of the y function is as x gets very close to 1. Think of x as being at 0.99999999999999 or 1.00000000000001. It gets so close that we can consider it near enough.

What is the limit at the point x=2?

Let’s see another example:

Here, it’s -1.

$$\begin{align} \lim_{x \to 2} (x^2 - 4x + 3) &= (2)^2 - 4(2) + 3 \\ &= 4 - 8 + 3 \\ &= -1 \end{align}$$

In this notation, we're asking what the value of the y function is as x gets very close to 2. Think of x as being at 1.99999999999999 or 2.00000000000001. It gets so close that we can consider it near enough.

Some more quick examples:

What is the limit at the point x=3?

It is 0.

What is the limit at the point x=4?

It is 3.

What is the limit at the point x=5?

It is 8.

Now let’s see another example:

At the point x=2, the limit is not well defined:

If we draw with a pencil from the left to x=2, we end up with 1.83333
If we draw with a pencil from the right to x=2, we end up with 4
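These two one-sided values can be computed with SymPy's `limit` function, which accepts a direction argument. The sketch below treats the two branches of the piecewise function separately. The variable names are illustrative:

```python
import sympy as sp

x = sp.symbols('x')

# The two branches of the piecewise function
left_branch = sp.Rational(3, 2) + 1 / (x + 1)   # used for -1 < x < 2
right_branch = 2 + 2 / (x - 1) ** 2             # used for x > 2

limit_from_left = sp.limit(left_branch, x, 2, dir='-')
limit_from_right = sp.limit(right_branch, x, 2, dir='+')

print("Limit from the left:", limit_from_left, "≈", float(limit_from_left))
print("Limit from the right:", limit_from_right)
print("Limits agree:", limit_from_left == limit_from_right)
```

Because the two one-sided limits differ, the function has a jump at x=2, and the limit there does not exist.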

Why are limits important to understand derivatives and integrals?

As we have seen, when we talk about limits, we are talking about a value that symbolizes the value that a function approaches as it comes toward a particular point.

It’s critical to note that we're not looking at the value of that point itself. We’re looking at what happens as we get so near to it that we can pin down what value the function is approaching.

I will now show a very simple example to demonstrate this concept using mathematical notation.

I know that limits can be a difficult concept to understand at first. But if you understand limits very well, then you'll be well-prepared to understand derivatives and integrals.

And, as you’ll see, derivatives are responsible for modern AI and integrals are important parts of tools widely used in billion-dollar industries.

I want you to understand the intuition behind this.

The function f(x) is continuous for every x greater than -2:

$$f(x) = \frac{3x + 7}{x + 2}$$

So to what value does this expression converge as x approaches infinity?

It converges to 3. If you have a background in math, you might already see why. For those who aren’t sure, let's work through it.

This time, x approaches infinity instead of a constant:

$$\begin{align} \lim_{x \to \infty} \frac{3x + 7}{x + 2} \end{align}$$

Let’s solve this in a very simple way:

For x = 1:

$$f(1) = \frac{3(1) + 7}{1 + 2} = \frac{10}{3} \approx 3.333...$$

For x = 5:

$$f(5) = \frac{3(5) + 7}{5 + 2} = \frac{22}{7} \approx 3.143...$$

For x = 10:

$$f(10) = \frac{3(10) + 7}{10 + 2} = \frac{37}{12} \approx 3.083...$$

For x = 50:

$$f(50) = \frac{3(50) + 7}{50 + 2} = \frac{157}{52} \approx 3.019...$$

For x = 100:

$$f(100) = \frac{3(100) + 7}{100 + 2} = \frac{307}{102} \approx 3.010...$$

For x = 1000:

$$f(1000) = \frac{3(1000) + 7}{1000 + 2} = \frac{3007}{1002} \approx 3.001...$$

For x = 10000:

$$f(10000) = \frac{3(10000) + 7}{10000 + 2} = \frac{30007}{10002} \approx 3.0001...$$

As x gets bigger and bigger, we get closer and closer to 3.
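The same result can be confirmed symbolically. Dividing the numerator and denominator by x gives (3 + 7/x) / (1 + 2/x), and both fractions vanish as x grows, leaving 3/1. Here's a short SymPy sketch that repeats the numerical table and then asks for the exact limit:

```python
import sympy as sp

x = sp.symbols('x')
f = (3 * x + 7) / (x + 2)

# Numerical evidence: evaluate f at larger and larger inputs
for value in [1, 10, 100, 1000, 10000]:
    print(value, float(f.subs(x, value)))

# Exact limit as x grows without bound
limit_at_infinity = sp.limit(f, x, sp.oo)
print("Limit as x -> infinity:", limit_at_infinity)
```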

This is the main idea of limits: Describe the value a function approaches as the input approaches some point.

This same idea applies to derivatives: they’re just limits that measure rates of change (slopes of tangent lines).

Likewise, integrals are just limits that measure accumulated quantities (areas under curves).

Let’s now see how derivatives work in depth.

Derivatives: How Things Change and How Fast

As I said before, derivatives are just limits that measure rates of change (slopes of tangent lines).

But what does this actually mean?

Let’s see an example:

What is the rate of change at point A?

Hard question, right? Let’s think about how to answer this with limits.

We can find the limit of the rate of change at point A(0.72, 0.66), also called the instantaneous rate of change.

Let’s do that:

To find the slope, we take the coordinates of the points B(0.2, 0.2) and C(1.6, 1):

$$\text{slope} = \frac{1 - 0.2}{1.6 - 0.2} = \frac{0.8}{1.4} = \frac{4}{7} \approx 0.571$$

This gives us the line through B and C, whose slope is the average rate of change:

$$y=0.571x + 0.084$$

Let's approximate more:

Let’s also zoom in:

To find the slope, we use the coordinates of the points B(0.58, 0.55) and C(0.85, 0.75):

$$\text{slope} = \frac{0.75 - 0.55}{0.85 - 0.58} = \frac{0.2}{0.27} \approx 0.741$$

It gives us the line:

$$y=0.741x + 0.12$$

Now let's approximate a lot:

To find the slope, we use the coordinates of the points B(0.7242549, 0.6625776) and C(0.7242884, 0.66260026):

$$\text{slope} = \frac{0.66260026- 0.6625776}{0.7242884- 0.7242549} = \frac{0.0000226}{0.0000335} = \frac{0.226}{0.335} \approx 0.674$$

Now let’s zoom out:

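As we can see, we are so close that we can consider the limit of the rate of change to be about 0.674. (Because these coordinates were read off a graph, the figure carries some rounding error. The exact derivative of sin(x) at x = 0.72 is cos(0.72) ≈ 0.752.)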

It gives us the rate of change:

$$y=0.674x + 0.12$$

This way, the limit of a rate of change is called a derivative.

To recap, here is an animation:

Here’s a Python code example that lets you find the derivative at point A:

import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

# Derivative of sin(x), which is cos(x)
f_prime = sp.diff(f, x)

# Evaluate the derivative at x = 0.72 (the x-coordinate of point A)
val = f_prime.subs(x, 0.72).evalf()

print("Derivative of sin(x) at x=0.72:", val)

The function that had the point A is called a sine wave.

We convert it to its derivative function. From there we have our rate of change at point 0.72.

When we do math by hand, we usually have many rules to convert a function to its derivative, and from these find the rate of change for a given point.

Before seeing it, let’s look at a very simple example to understand the definition of a derivative:

$$\frac{d}{dx}f(x) \approx \frac{f(\textcolor{green}{x + h}) - f(\textcolor{red}{x - h})}{\textcolor{green}{(x + h)} - \textcolor{red}{(x - h)}} = \frac{f({x + h}) - f({x - h})}{2h}$$

h represents a small difference.

The derivative is the slope of the function’s small change near a point. In other words, it’s the limit of the rate of change of a given point.
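The approximation formula above translates almost directly into code. Here's a sketch that applies it to sin(x) at x = 0.72 with smaller and smaller values of h and compares the result against the known derivative cos(x). The function name `central_difference` is a label chosen for this example:

```python
import math

def central_difference(f, x, h):
    """Approximate f'(x) with the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.72
exact = math.cos(x0)  # the known derivative of sin(x)

for h in [0.5, 0.1, 0.01, 0.001]:
    approx = central_difference(math.sin, x0, h)
    print(f"h = {h}: slope = {approx:.6f}, error = {abs(approx - exact):.2e}")
```

As h shrinks, the slope settles on a single value. That is the limit idea at work.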

A simple derivative transformation might look like this one:

$$\frac{d}{dx}x^n = nx^{n-1}$$

Two examples are:

$$\frac{d}{dx}x^3 = 3x^2$$

And:

$$\frac{d}{dx}x^5 = 5x^4$$

There are many more. But we won’t go into deep detail on this topic.

Where and why are derivatives so important?

Derivatives are one of the most important math tools out there. They serve as the foundation for understanding change across nearly all fields of STEM.

In physics (classical mechanics), derivatives let us derive new information from information we already have.

For example, knowing how a body's position changes over time allows us to use derivatives to find its velocity and acceleration. This is crucial for self-driving cars, trains, rockets, and more.

Also, derivatives are the foundation of understanding how electricity works in depth. Without derivatives, there would’ve been no electromagnetic theory. Without electromagnetic theory, modern technology would not exist.

In machine learning, derivatives are so important that they underpin backpropagation, the algorithm that is one of the most important components of ChatGPT and other AI models.

Backpropagation and related ideas are in fact so important that Geoffrey Hinton, who helped popularize backpropagation in the 1980s, shared the 2024 Nobel Prize in Physics with John Hopfield for foundational work on machine learning with artificial neural networks.

Also, autonomous vehicles like Tesla and Waymo use AI models called neural networks that depend on backpropagation to work.

It’s awesome that a math concept created in the 17th century is now one of the foundations of the current AI revolution.

What About Integral Calculus?

Before going further, I will ask you a question:

How can we find the area of the shape below?

In other words, how can we find the integral of the function in the given interval?

Let’s see how to do it step by step.

First, we’ll try using 2 rectangles to approximate the area under the curve:

Now the area of the rectangles is 6.282573.

But there is still a lot of error…

As we can see, the left rectangle does not completely cover the area under the curve and the right rectangle covers too much.

So we’ll add more smaller rectangles so that we can better approximate the curve.

Now let’s try using 4 rectangles:

Now the area is 6.497481. But there’s still some error.

As we can see, the error is getting smaller. In other words, the 4 rectangles cover the area of the curve better than just the 2 rectangles. But there’s still a lot of room to make it better.

Let’s try using 8 rectangles:

Now the area is 6.604935.

How about using 16 rectangles?

Now the area is 6.658662.

Let’s try using 32 rectangles:

Now the area is 6.685525.

Now how about using 64 rectangles:

Now the area is 6.698957.

And using 128 rectangles:

Now the area is 6.705673.

What about using 256 rectangles:

Now the area is 6.709031. And the error has become very small, although with a finite number of rectangles it never reaches exactly zero.

Now let’s see an animation of this:

As you can see, the more rectangles we use, the closer we get to the true area. The exact area is the limit of this sum as the number of rectangles goes to infinity.

This way, we can conclude that:

$$\int_0^{3.14} f(x) \, dx = \int_0^{3.14} (\sin(x) + 1.5) \, dx \approx 6.71$$

This means that the area between 0 and 3.14, bounded by the curve and the x-axis, is about 6.71!

Or, mathematically, the integral of f(x) over the interval from 0 to 3.14 is about 6.71.
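The rectangle idea can be sketched in a few lines of Python. I am assuming here that each rectangle takes its height from the left edge of its slice (a "left Riemann sum"). The figures may use a different rule, so the intermediate numbers will not necessarily match them, but the trend is the same: more rectangles, less error.

```python
import math

def f(x):
    return math.sin(x) + 1.5

def rectangle_area(func, a, b, n):
    """Sum of n equal-width rectangles, each as tall as the curve at its left edge."""
    width = (b - a) / n
    return sum(func(a + i * width) for i in range(n)) * width

a, b = 0.0, 3.14
# Exact area from the antiderivative -cos(x) + 1.5x
exact = (1 - math.cos(b)) + 1.5 * b

for n in (2, 4, 8, 16, 32, 64, 128, 256):
    area = rectangle_area(f, a, b, n)
    print(f"{n:>3} rectangles: area = {area:.6f}, error = {abs(exact - area):.6f}")
```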

Where and how is this applied?

In electrical engineering, integrals calculate total energy use in circuits by integrating power over time. For example, when designing a power supply for a device, engineers integrate the power to determine total energy costs and heat absorption requirements.

In other words, the area under the power curve over time tells them how much energy is used.

Let's see an example:

Imagine that in the image above:

The X axis can be the time in months.
The Y axis is the power used in Watts (Joules per second).

We can conclude that in 3.14 months (3 months and 4 days) the total amount of energy used is 6.71 watt-months.

Here is the code to find that out:

# Import libraries
import numpy as np
import matplotlib.pyplot as plt

# Create Function
x = np.linspace(0, 3.14, 100)
y = np.sin(x) + 1.5

# Find the area under the function
area = np.trapezoid(y, x)

# Show the final image
plt.fill_between(x, y)
plt.title(f'Area = {area:.2f}')
plt.show()

In this code, we import the libraries, create the function, and find the area and plot it.

We used numpy.trapezoid to find the area, because it’s a numerical approximation to quickly find the integral of a function between two x values.

numpy.trapezoid uses a numerical approximation method called the composite trapezoidal rule.

The basic idea of the composite trapezoidal rule is to divide the area under the curve into many trapezoids and sum all of them.
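To make that idea concrete, here is a hand-written sketch of the composite trapezoidal rule on the same function. The function name `composite_trapezoid` is my own; it is meant to mirror the idea, not to reproduce NumPy's internal implementation.

```python
import numpy as np

def composite_trapezoid(y, x):
    """Sum the areas of the trapezoids between consecutive sample points."""
    widths = np.diff(x)                    # width of each trapezoid
    mean_heights = (y[:-1] + y[1:]) / 2    # average of its two parallel sides
    return float(np.sum(widths * mean_heights))

x = np.linspace(0, 3.14, 100)
y = np.sin(x) + 1.5

area = composite_trapezoid(y, x)
print(f"Area = {area:.2f}")
```

With 100 sample points, the result rounds to the same 6.71 we found with the rectangles.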

If you want to learn more about this, I recommend reading the NumPy documentation on this method.

From this value, we can convert to other units. Assuming a 30-day month (720 hours), 6.71 watt-months is about:

17,400,000 joules
4.83 kWh

By converting to other units, we can more easily compare this device with other devices and see if it obeys any technical standards and laws.

This is a real-life application of integrals in engineering.

In my degree, I used this a lot in classes related to power engineering. In simple words, power engineering is a subfield of electrical engineering focused on working with electricity with very high voltage values and electric motors.

In audio compression, the Fourier transform (built on integrals) decomposes sound waves into frequency components. MP3 encoders use this to identify and remove frequencies humans can't hear. This reduces file sizes while preserving quality.
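To give a feel for what "decomposing into frequency components" means, here is a toy sketch using NumPy's fast Fourier transform. The signal is synthetic (two tones I made up, at 50 Hz and 120 Hz), and this is only the decomposition step, not an MP3 encoder.

```python
import numpy as np

sample_rate = 1000                            # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate      # one second of signal

# Hypothetical "sound": a 50 Hz tone plus a quieter 120 Hz tone
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(signal))                  # strength of each frequency
freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate)   # frequency of each bin in Hz

# Pick the two strongest frequency components
strongest = sorted(freqs[np.argsort(spectrum)[-2:]])
print("Dominant frequencies (Hz):", strongest)
```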

Medical imaging relies on the Radon transform, which uses integrals to reconstruct 3D images from 2D X-ray projections. When you get a CT scan, the machine takes hundreds of X-ray "slices" at different angles. During this process, integrals combine "slices" into a detailed cross-sectional image of your body.

Applications in AI and Control Theory: Calculus in Action

Modern AI depends on derivatives through the backpropagation algorithm.

When training a neural network, the system calculates partial derivatives of the error with respect to millions of parameters. This way, it finds out how to adjust each weight to improve performance. Without this, large language models like ChatGPT couldn't learn from data.
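A real network has millions of weights, but the core idea fits in a toy sketch with a single weight. Everything here is hypothetical (the data point, the starting weight, and the learning rate); it only shows how a derivative tells us which way to nudge a weight.

```python
# Toy model: prediction = w * x, trained on one example (x, y)
x, y = 2.0, 6.0          # hypothetical data: the ideal weight is 3.0
w = 0.0                  # initial weight
learning_rate = 0.1      # assumed step size

for step in range(20):
    error = w * x - y
    loss = error ** 2                  # squared error
    gradient = 2 * error * x           # d(loss)/dw, from the chain rule
    w -= learning_rate * gradient      # step against the gradient

print(f"Learned weight: {w:.4f}")
```

Backpropagation does the same thing at scale: it uses the chain rule to compute this kind of derivative for every weight in the network.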

PID controllers, which stabilize the temperature in your oven or maintain altitude in aircraft autopilot systems, combine calculus ideas:

The proportional term responds to the current error.
The integral term accumulates past errors to eliminate steady-state drift.
The derivative term predicts future trends to prevent overshooting.
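The three terms above can be sketched as a small discrete-time controller. The gains, the time step, and the "oven" model below are all assumptions I chose for illustration; a real controller would need tuning and safeguards that this sketch leaves out.

```python
class PIDController:
    """Minimal discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.previous_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt                          # accumulate past errors
        if self.previous_error is None:
            derivative = 0.0                                 # no history on the first call
        else:
            derivative = (error - self.previous_error) / dt  # rate of change of the error
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical oven: heater power warms it, and it leaks heat toward 20 degrees
pid = PIDController(kp=2.0, ki=0.5, kd=0.1, setpoint=180.0)
temperature, dt = 20.0, 0.1

for _ in range(2000):
    power = pid.update(temperature, dt)
    temperature += (power - 0.5 * (temperature - 20.0)) * dt

print(f"Final temperature: {temperature:.2f}")
```

In this toy setup, the integral term is what supplies the steady heater power needed to hold the oven at the setpoint once the error has gone to zero.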

And these are just some of the applications of calculus!

Chapter 6: Probability & Statistics - Learning from Uncertainty

Photo by Armando Are

It’s thanks to probabilities and statistics that many industries have grown so much. With statistics, we can make informed decisions and optimize many different processes. With probabilities, we can understand and model uncertainty in systems and, in this way, solve or even avoid problems.

While you may be familiar with some of the key concepts like median and mean, we’ll start with some basics to build up your intuition on more advanced stuff like the central limit theorem, Bayes’ theorem, and Markov chains.

Mean, Median, Mode: Measuring Central Tendency

Let's imagine you are a data scientist working in research. You’re going to work with data to optimize the output of farms in the Central Valley in California.

The idea is to take in a bunch of data, and by studying it, you can help farmers make better decisions.

Here’s the data from one year of activity:

Farm	Yield (tons/ha)	Fertilizer Used (kg/ha)	Rainfall (mm)
A	4.2	150	280
B	5.8	220	420
C	3.9	120	230
D	6.1	250	480
E	4.7	200	340
F	5.3	200	390

We have 6 farms in our dataset. For each farm, we know:

How much yield was obtained in tons per hectare
How much fertilizer was used in kilograms per hectare
How much rainfall happened during a year of activity

Now, let’s answer some questions we might have about the data to understand the mean, mode and median:

1. What is the average yield during one year of activity?

To find the average, we just need to sum all the yield values and divide by the number of farms. Like this:

$$\text{Mean} = \frac{4.2 + 5.8 + 3.9 + 6.1 + 4.7 + 5.3}{6} = \frac{30}{6} = 5$$

This is what is called the mean. The mean is just the sum of all values divided by how many values there are.

In Python, we can do the following to calculate the mean:

def calculate_mean(values):
    return sum(values) / len(values)

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
result = calculate_mean(data)
print(f"Mean: {result}")

2. What is the mode of fertilizer used?

The mode is just the most popular value in a given dataset. In our case, it’s 200 since that’s the most common value that appears in our farm dataset.

In Python, we can do this to calculate the mode:

import statistics

def calculate_mode(values):
    return statistics.mode(values)

# Example usage
data = [150, 220, 120, 250, 200, 200]
result = calculate_mode(data)
print(f"Mode: {result}")

3. What is the median of the yield?

The median is just the value in the middle of a set of numbers. If the number of elements in the list is even, we take the mean of the two middle numbers. Here are our current yield values:

$$4.2, 5.8, 3.9, 6.1, 4.7, 5.3$$

First, we sort the values:

$$3.9, 4.2, 4.7, 5.3, 5.8, 6.1$$

Since we have 6 values (even number), the median is the average of the two middle values:

$$\text{Median} = \frac{4.7 + 5.3}{2} = \frac{10}{2} = 5$$

In Python we can do this to calculate the median:

import statistics

def calculate_median(values):
    return statistics.median(values)

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
result = calculate_median(data)
print(f"Median: {result}")

Variance and Standard Deviation: Measuring Spread

Knowing the mean, mode, and median of data is helpful. But it’s also important to know how far away data points are from each other.

That’s where measures of dispersion come in. Variance tells us, on average, how far the values are from the mean, measured as a squared distance.

Let’s see an example of how to calculate this.

Given yield data from the table:

$$4.2, 5.8, 3.9, 6.1, 4.7, 5.3$$

The first step is to calculate the mean:

$$\bar{x} = \frac{4.2 + 5.8 + 3.9 + 6.1 + 4.7 + 5.3}{6} = \frac{30}{6} = 5$$

The second step is to calculate the variance with the sample variance formula:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$

Let's apply the formula little by little to understand how it works.

We will first calculate the squared difference between each yield data point and the mean:

$$\begin{align*} (4.2 - 5.0)^2 &= (-0.8)^2 = 0.64 \\ (5.8 - 5.0)^2 &= (0.8)^2 = 0.64 \\ (3.9 - 5.0)^2 &= (-1.1)^2 = 1.21 \\ (6.1 - 5.0)^2 &= (1.1)^2 = 1.21 \\ (4.7 - 5.0)^2 &= (-0.3)^2 = 0.09 \\ (5.3 - 5.0)^2 &= (0.3)^2 = 0.09 \end{align*}$$

Then we will sum all the squared differences:

$$\sum(x_i - \bar{x})^2 = 0.64 + 0.64 + 1.21 + 1.21 + 0.09 + 0.09 = 3.88$$

Now, we will finally find the variance:

$$s^2 = \frac{3.88}{6-1} = \frac{3.88}{5} = 0.776$$

The standard deviation is just the square root of the variance.

$$s = \sqrt{s^2} = \sqrt{0.776} \approx 0.881 \text{ tons/ha}$$

Why is this useful?

It puts the spread back into the same units as the data, making it easier to interpret.

A small standard deviation means the data huddles close to the mean, while a large one means it’s widely scattered.

And here is a code example of how to calculate both:

import statistics

def calculate_variance_and_std(values):
    variance = statistics.variance(values)
    std_dev = statistics.stdev(values)
    return variance, std_dev

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
variance, std_dev = calculate_variance_and_std(data)
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")
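If you prefer to see the formula itself rather than a library call, here is a step-by-step sketch that follows the hand calculation (mean, squared differences, division by n - 1) and compares the result with the `statistics` module.

```python
import math
import statistics

data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]

mean = sum(data) / len(data)
squared_differences = [(value - mean) ** 2 for value in data]
variance = sum(squared_differences) / (len(data) - 1)   # sample variance: divide by n - 1
std_dev = math.sqrt(variance)

print(f"Manual variance:  {variance:.3f}")
print(f"Library variance: {statistics.variance(data):.3f}")
print(f"Manual standard deviation: {std_dev:.3f}")
```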

What Is the Normal Distribution? The Bell Curve of Life

The normal distribution describes how many kinds of data cluster around an average value. Most values sit near the center, and extreme values are rare and sit toward the edges. This creates a bell curve.

By understanding this distribution, we can understand other distributions and also the central limit theorem.

To understand what the normal distribution is, let’s look at it:

The normal distribution looks like a mountain.

As you can see, most values are around the mean. Also, in and around the mean is the peak. Toward the extremes, the curve gets lower and lower. This means that in the extremes there are fewer and fewer values.

The normal distribution also has a formula associated with it:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

I won’t go in depth into how the formula works here. I just want you to understand the main idea behind the concept.
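Still, it can help to see that the formula is just a recipe you can type in. The sketch below implements it directly and evaluates it at a few points. The values of mu and sigma are illustrative (I borrowed the yield mean and standard deviation from earlier), not a claim that farm yields are normally distributed.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the bell curve at x for mean mu and standard deviation sigma."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 5.0, 0.881   # illustrative values only

for x in (3.0, 4.0, 5.0, 6.0, 7.0):
    print(f"f({x}) = {normal_pdf(x, mu, sigma):.4f}")

# Cross-check against Python's built-in implementation
print(NormalDist(mu, sigma).pdf(5.0))
```

The curve is highest at the mean and symmetric around it, which is exactly the mountain shape described above.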

There are many other distributions besides the normal distribution. Some of the most common are:

Chi-squared distribution
Student’s t distribution
Bernoulli distribution
Binomial distribution
Poisson distribution

Each distribution can model different events and phenomena. For example, the Chi-squared distribution is widely used to test whether two phenomena are associated (sunburns and skin cancer, for example).

The Poisson distribution is also used in modeling counts of events, like the number of clients that enter a store per hour or the number of data packets that are transmitted over an Ethernet cable.

But it’s also possible to approximate a lot of distributions to the normal distribution using one of the most important theorems in all of mathematics: the central limit theorem. This is what we will explore next.

How the Central Limit Theorem Helps Approximate the World

Photo by Porapak Apichodilok

The main idea of the central limit theorem is very simple:

When we average many independent random values, the result is approximately normally distributed, almost regardless of the distribution the values came from.

This is just like pouring sand into a funnel. Grains may fall randomly, but over time the pile of sand will always begin to form the shape of a mountain.

This way, we can take many samples of data points and average each one. As the samples get larger, the distribution of those averages converges to a normal distribution.

In other words, when many independent random variables are summed together, their sum tends toward a normal distribution.

Here is the formula:

$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or equivalently} \quad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$$

You don’t need to understand in depth what it means. Just understand that the averages of many independent samples approximately follow a normal distribution centered on the true mean, and that their spread shrinks as the sample size n grows.
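A quick simulation can make this tangible. In the sketch below, I draw values from an exponential distribution (which is heavily skewed and looks nothing like a bell), average them in groups of 50, and check that the averages behave the way the formula predicts. The distribution, the group size, and the seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                  # values averaged per sample
num_samples = 10_000    # how many averages we collect

# Exponential distribution with mean 1 and standard deviation 1
draws = rng.exponential(scale=1.0, size=(num_samples, n))
sample_means = draws.mean(axis=1)

expected_std = 1 / np.sqrt(n)   # sigma / sqrt(n) from the formula
print(f"Mean of sample means: {sample_means.mean():.3f} (theory: 1.000)")
print(f"Std of sample means:  {sample_means.std():.3f} (theory: {expected_std:.3f})")

# For a normal distribution, about 68% of values fall within one standard deviation
within_one_std = np.mean(np.abs(sample_means - 1.0) < expected_std)
print(f"Share within one standard deviation: {within_one_std:.3f}")
```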

And why is this important?

Because this theorem makes many billion-dollar industries possible.

Instead of testing every single possible scenario, we can test a smaller number of scenarios and, if the sample is representative, estimate how the larger one will behave.

For example, in telecommunications, instead of testing every possible phone call or data transmission, we can just test a sample of connections. If it works for that sample, we can estimate how it will work for millions of phone calls and data transmissions.

For clinical trials, instead of testing a drug on millions of people, we can just test a smaller number of patients. If it works for a (relative) few patients, we can assume it will work on most people with the same condition.

Without this idea, clinical trials would not be possible. The same with telecommunications and so many other areas of engineering.

Bayes Theorem: Learning from Evidence

Now we’ll start looking at probability more in depth based on the data table we have been using.

Here’s the table again so that you can reference it more easily:

Farm	Yield (tons/ha)	Fertilizer Used (kg/ha)	Rainfall (mm)
A	4.2	150	280
B	5.8	220	420
C	3.9	120	230
D	6.1	250	480
E	4.7	200	340
F	5.3	200	390

Now there are a lot of ideas and formulas related to probabilities. But here, I want to explain to you the core ones that are applied in AI and give you a high-level definition of things.

We’ll start with conditional probability, which is foundational to understanding Bayes’ theorem. Then we’ll get to the extended Bayes’ theorem formula.

So, let's get started!

What is Conditional Probability?

Photo by KOUSHIK BALA

Conditional probability is the probability that an event will happen given that another event has already taken place.

Confused? Don't worry! Let's see an example:

Let’s say that:

A = Farm has rainfall above or equal to 400 mm
B = Farm has a yield above or equal to 5.0 tons/ha

Here is the formula for Conditional Probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Now let’s see this formula more in detail:

$$P(A)$$

This represents the probability that a farm has rainfall above or equal to 400 mm.

We have 6 farms, and 2 of them (farm B and D) have a rainfall above or equal to 400 mm.

So, the probability that a farm has rainfall above or equal to 400 mm is:

$$P(A) = \frac{2}{6} = \frac{1}{3} \approx 0.33$$

Now let’s see for event B:

$$P(B)$$

This represents the probability that a farm has a yield above or equal to 5.0 tons/ha.

We have 6 farms and 3 of them (farm B, D and F) have a yield above or equal to 5.0 tons/ha.

So, the probability that a farm has a yield above or equal to 5.0 tons/ha is:

$$P(B) = \frac {3}{6} = \frac {1}{2} = 0.5$$

What about if we want to see both conditions’ probabilities at the same time?

$$P(A \cap B)$$

This refers to the probability of A and B being both true.

In our example, it means the probability that a farm has both a rainfall above or equal to 400 mm and a yield above or equal to 5.0 tons/ha.

We have:

6 farms and 2 of them (farm B and D) have a rainfall above or equal to 400 mm
6 farms and 3 of them (farm B, D and F) have a yield above or equal to 5.0 tons/ha

Only 2 farms (farm B and D) satisfy both conditions at the same time.

This way:

$$P(A \cap B) = \frac{2}{6} = \frac{1}{3} \approx 0.33$$

Now we’re ready to find out the conditional probability:

$$P(A|B)$$

This means the probability of A, knowing that B is true.

In our example, we can conclude that:

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3} \approx 0.67$$

So, the probability that a farm has rainfall above or equal to 400 mm, knowing that it has a yield above or equal to 5.0 tons/ha, is about 0.67.
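The counting steps above can be sketched in Python. I use the `fractions` module so the results stay as exact fractions instead of rounded decimals. The variable names are my own.

```python
from fractions import Fraction

# (yield in tons/ha, rainfall in mm) for farms A to F, taken from the table
farms = {
    "A": (4.2, 280), "B": (5.8, 420), "C": (3.9, 230),
    "D": (6.1, 480), "E": (4.7, 340), "F": (5.3, 390),
}

event_a = {name for name, (_, rain) in farms.items() if rain >= 400}          # rainfall >= 400 mm
event_b = {name for name, (crop_yield, _) in farms.items() if crop_yield >= 5.0}  # yield >= 5.0 tons/ha

total = len(farms)
p_a = Fraction(len(event_a), total)
p_b = Fraction(len(event_b), total)
p_a_and_b = Fraction(len(event_a & event_b), total)
p_a_given_b = p_a_and_b / p_b

print("P(A) =", p_a)
print("P(B) =", p_b)
print("P(A and B) =", p_a_and_b)
print("P(A|B) =", p_a_given_b)
```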

Bayes’ Theorem

This is one of the most important theorems in mathematics.

Bayes’ theorem is a formula that tells us how to change the probability of a prediction when new verified data becomes available.

In other words, it’s like a rule that tells us how to update our beliefs when new evidence appears.

Now, based on what we already know, let’s see how Bayes’ Theorem works.

Here is its formula:

$$P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}$$

Now, based on the previous values, we can very easily find the probability of B, given that A is true.

In other words, the probability that a farm has a yield above or equal to 5.0 tons/ha given that it has a rainfall above or equal to 400 mm.

Let’s find the answer:

$$P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)} = \frac{\frac{2}{3} \cdot \frac{1}{2}}{\frac{1}{3}} = 1$$

So, the probability that a farm has a yield above or equal to 5.0 tons/ha, knowing it received 400 mm of rain or more, is 100%. We can confirm this in the table: both farms with at least 400 mm of rainfall (B and D) have a yield of at least 5.0 tons/ha.
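A good habit with Bayes' theorem is to cross-check the result by counting directly whenever the data is small enough. The sketch below uses the theorem in its standard form, P(B|A) = P(A|B) · P(B) / P(A), with the probabilities from the farm table, and then compares the result with a direct count. The variable names are my own.

```python
from fractions import Fraction

p_a = Fraction(2, 6)            # P(A): rainfall >= 400 mm (farms B and D)
p_b = Fraction(3, 6)            # P(B): yield >= 5.0 tons/ha (farms B, D and F)
p_a_given_b = Fraction(2, 3)    # P(A|B): 2 of the 3 high-yield farms had high rainfall

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a

# Direct count: of the 2 farms with rainfall >= 400 mm, both have yield >= 5.0
direct_count = Fraction(2, 2)

print("P(B|A) via Bayes' theorem:", p_b_given_a)
print("P(B|A) via direct count:  ", direct_count)
```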

Now that we’ve gone through this formula step by step, hopefully it doesn’t feel as complex.

Where is this applied in real life?

As with many math ideas in this book, Bayes' Theorem has applications in many business sectors.

For example, what is the best way to make a control system for a self-driving car, robot, or really any other device?

One effective approach is to use a Kalman filter. Kalman filters rely heavily on Bayes' Theorem to handle control systems with incomplete data.

Kalman filters have a lot of applications in engineering. For example, thanks to Kalman filters, commercial jets can fly safely on autopilot.

So as you can see, Bayes’ Theorem is the foundation of many control systems used in risky industries.

What Are Markov Models? Predicting the Next Step, One Step at a Time

Photo by lil artsy

How do you predict the future with math? Markov chains allow you to do this to a certain degree.

For this reason, Markov chains are widely used in science, engineering, economics, and many other areas.

In addition to this, Markov decision processes are a very important foundation for reinforcement learning. Reinforcement learning is a branch of AI where agents learn to make decisions by interacting with an environment to maximize rewards.

In this section, I’ll introduce you to Markov chains and decision processes with an analogy, a plain English explanation, and a code example.

If you want to dive in further, I recommend my freeCodeCamp article on the subject.

Markov Chain Analogy

Imagine that you want to predict the weather tomorrow, and it only depends on the weather today. The weather can be either sunny or rainy.

Here are the probabilities:

If it's sunny today, there's an 80% chance that it will be sunny again tomorrow, and a 20% chance that it will be rainy.
If it's rainy today, there's a 50% chance that it will be sunny tomorrow, and a 50% chance that it will be rainy.

In this scenario, we can predict future states of the weather based on current states using probabilities.

This idea of predicting the future based solely on probabilities of the present is called a Markov chain.

Here, the states are either sunny or rainy and the probabilities describe the chances of the weather changing based on the current state.
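The weather example can be written as a small transition matrix. In this sketch, multiplying today's state by the matrix gives tomorrow's probabilities, and repeating the multiplication looks further ahead. The variable names are my own.

```python
import numpy as np

states = ["sunny", "rainy"]

# Row = today's weather, column = tomorrow's weather
P = np.array([[0.8, 0.2],
              [0.5, 0.5]])

today = np.array([1.0, 0.0])   # we know it is sunny today

tomorrow = today @ P
in_two_days = today @ np.linalg.matrix_power(P, 2)
far_future = today @ np.linalg.matrix_power(P, 50)

for label, dist in [("Tomorrow", tomorrow), ("In two days", in_two_days), ("In 50 days", far_future)]:
    print(f"{label}: sunny = {dist[0]:.3f}, rainy = {dist[1]:.3f}")
```

Notice that far into the future the probabilities settle down (to about 71% sunny and 29% rainy here), no matter what the weather is today.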

Markov Chain Explained in Plain English

A Markov chain describes random processes where systems move between states, and a new state only depends on the current state, not on how it got there.

Mathematically, Markov chains are called stochastic models because they model (simulate) real life events that are random by nature (stochastic).

Markov chains are popular because they are easy to implement and efficient at modeling complex systems.

Another key advantage is their "memoryless" property. This makes them fast to run on computers and powerful tools for studying random processes and making predictions based on current conditions.

Applications of Markov Chains

Photo by Google DeepMind

At some level, almost all real-life events are stochastic. In other words, they involve randomness and uncertainty.

This is exactly why Markov chains are so widely used.

They can predict the behavior of systems based on current conditions:

In finance, they are used to model changes in credit ratings and to forecast market regimes.
In genetics, they help understand how proteins change over time (which is important when studying genetic variations).

These real-life examples show how effectively Markov chains can be used to solve real problems in different fields.

In AI, Markov chains are used to model an environment like a factory or home. When we extend a Markov chain with actions an agent can take and rewards it receives, we get a Markov decision process.

Using a Markov decision process, it’s possible to use reinforcement learning to create and optimize agents to act in the environment.

Of course, new and better variants of the Markov decision process have appeared over the years. But the key idea here is that it is thanks to Markov decision processes that the basis for reinforcement learning exists.

Reinforcement learning is widely used in advertising systems, logistics, robotics, video games, and many more applications.

Types of Markov Chains

There are many types of Markov chains. In this section, we'll only discuss the most important variants.

Discrete-Time Markov Chains (DTMCs)

In DTMCs, the system changes state at specific time steps. They are called discrete because the state transitions occur at distinct, separate time intervals.

They are used in queuing theory (study of the behavior of waiting lines), genetics, and economics because they are simple to analyze.

Continuous-Time Markov Chains (CTMCs)

CTMCs differ from DTMCs in that state transitions can occur at any continuous time point, not at fixed intervals.

This makes them stochastic models where state changes happen continuously. This is important in chemical reactions and reliability engineering.

Reversible Markov Chains

Reversible Markov chains are special. The process of state change is the same whether the direction is forwards or backwards, like rewinding a video and playing it again.

This property makes it easier to know when a system is stable and to study how a system behaves over time. They are widely used in statistical physics and economics.

Doubly Stochastic Markov Chains

Doubly stochastic Markov chains are defined by a transition probability matrix. In the matrix, the sum of the probabilities in each row and each column equals 1.

This means each row and each column represent a valid probability distribution. In other words, each row and column represent a list of chances for different outcomes.
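Here is a small sketch that checks this property on a hypothetical 3-state matrix I made up. It also shows one known consequence: if every state starts out equally likely, the distribution stays unchanged after a step.

```python
import numpy as np

# Hypothetical 3-state transition matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

rows_ok = np.allclose(P.sum(axis=1), 1.0)   # every row sums to 1
cols_ok = np.allclose(P.sum(axis=0), 1.0)   # every column sums to 1
print("Doubly stochastic:", rows_ok and cols_ok)

# Start with all states equally likely and take one step
uniform = np.full(3, 1 / 3)
after_one_step = uniform @ P
print("After one step:", after_one_step)
```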

This property is crucial in quantum computing and statistical mechanics.

Thanks to doubly stochastic Markov chains, systems change in a way that preserves probabilities and symmetry, which helps when modeling and analyzing quantum computing systems.

Hidden Markov Chains Code Example

Photo by Kevin Ku

Before we jump into code examples, let’s first understand what Hidden Markov Chains are.

The main idea behind hidden Markov chains is to model systems that have hidden states (states for which we don’t know their values) which can only be discovered through observable events.

In other words, hidden Markov chains allow us to predict the behavior of a system by:

Considering the likelihood of moving from one state to another.
Knowing the probability of observing a certain event from each state.

We can understand this by observing how the states change from an indirect point of view.

We may not know the states’ original values. But by knowing the way they change, we can predict what their values will be in the future.

This way, hidden Markov chains are flexible in modeling sequences, capturing both the transitions between hidden states and the observable outcomes.

Because of this, hidden Markov models are used in fields such as engineering, financial modeling, speech recognition, bioinformatics, and many more.

Code Example:

In this code example, we’ll see a simple example with synthetic data.

Here is the full code:

import numpy as np
from hmmlearn import hmm

# Set random seed for reproducibility
np.random.seed(42)

# Define the HMM parameters
n_components = 2  # Number of states
n_features = 1    # Number of observation features

# Create a Gaussian HMM
model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag")

# Define initial state probabilities and transition matrix (rows must sum to 1)
model.startprob_ = np.array([0.6, 0.4])
model.transmat_ = np.array([[0.7, 0.3],
                            [0.4, 0.6]])

# Define means and covariances for each state
model.means_ = np.array([[0.0], [3.0]])
model.covars_ = np.array([[0.5], [0.5]])

# Generate synthetic observation data
X, Z = model.sample(100)  # 100 samples

# Create a new HMM instance
new_model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag", n_iter=100)

# Fit the model to the data
new_model.fit(X)

# Print the learned parameters
print("Transition matrix:")
print(new_model.transmat_)
print("Means:")
print(new_model.means_)
print("Covariances:")
print(new_model.covars_)

# Predict the hidden states for the observed data
hidden_states = new_model.predict(X)

print("Hidden states:")
print(hidden_states)

Now let’s break the code down block by block:

Import libraries and set random seed:

import numpy as np
from hmmlearn import hmm

np.random.seed(42)

In this block of code, we imported two Python libraries:

NumPy: For numerical operations.
hmmlearn: For hidden Markov model implementation.

Next we defined a random seed with the NumPy library. A random seed is a value used to start a pseudorandom number generator.

With a fixed random seed, we can ensure that the sequence of pseudorandom numbers generated is always the same. This allows us to duplicate experiments and verify results.

The specific value of the seed doesn’t matter as long as it remains consistent.

Define the HMM parameters and create a Gaussian HMM:

n_components = 2  # Number of states
n_features = 1    # Number of observation features

model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag")

In this code block, we created an HMM with two hidden states and a single observed variable.

covariance_type "diag" means the matrices that represent covariance (how two variables change together) are diagonal. In other words, within each state, each observed feature is assumed to vary independently of the others.

Since this example has only one observed feature, each state's covariance matrix is just a single variance value.

But there is still something strange when we defined the hidden Markov chain:

What does “Gaussian” mean?

This is a very big topic in statistics, but in a few words, a hidden Markov chain can only be created when we specify the transition probabilities (chances of moving from one state to another in a Markov chain), an initial probability distribution, and a distribution for the observations each state produces.

A Gaussian HMM assumes the observations produced by each hidden state follow a Gaussian distribution, also called a normal distribution!

And recall, we have already seen before what a normal distribution is.

Here it is again:

From a normal distribution and other components, we can create a hidden Markov chain. And hidden Markov chains serve as a foundation for systems that affect millions of lives.

Define transition matrix, means, and covariances for each state:

model.startprob_ = np.array([0.6, 0.4])
model.transmat_ = np.array([[0.7, 0.3],
                            [0.4, 0.6]])

model.means_ = np.array([[0.0], [3.0]])
model.covars_ = np.array([[0.5], [0.5]])

model.startprob_ = np.array([0.6, 0.4])

This line sets the initial state probabilities for a Hidden Markov Model (HMM). It points out that there is a 60% probability of starting in state 0 and a 40% probability of starting in state 1.

model.transmat_ = np.array([[0.7, 0.3], [0.4, 0.6]])

This line of code sets the state transition probability matrix for the HMM.

The matrix specifies the probabilities of moving from one state to another:

From state 0, there is a 70% chance of staying in state 0 and a 30% chance of transitioning to state 1.
From state 1, there is a 40% chance of transitioning to state 0 and a 60% chance of staying in state 1.

model.means_ = np.array([[0.0], [3.0]])

This line sets the mean values for the observation distributions in each state.

It indicates that the observations are normally distributed with a mean of 0.0 in state 0 and a mean of 3.0 in state 1.

model.covars_ = np.array([[0.5], [0.5]])

This line sets the covariance values for the observation distributions in each state.

It specifies that the variance (covariance in this 1-dimensional case) of the observations is 0.5 for both state 0 and state 1.

Create data, new HMM instance, and fit the model with the data:

X, Z = model.sample(100)  # 100 samples

new_model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag", n_iter=100)

new_model.fit(X)

print("Transition matrix:")
print(new_model.transmat_)
print("Means:")
print(new_model.means_)
print("Covariances:")
print(new_model.covars_)

In this code, we generated 100 samples from the original model, fitted a new model to them (allowing up to 100 training iterations), and printed the learned state transition matrix, means, and covariances.

In other words, we:

Generated 100 samples from the original model
Fitted a new HMM to these samples.
Printed the learned parameters of this new model.

What do X and Z mean here?

X means the observed data samples generated by the original model, while Z means the hidden state sequences corresponding to the observed data samples generated by the original model.

The transition matrix prints out:

[[0.8100804  0.1899196 ]
 [0.49398918 0.50601082]]

Which means that the model tends to stay in state 0 and has nearly equal chances of switching or staying when in state 1.

The means print out:

[[0.01577373]
 [3.06245496]]

Which means that the average observed value is approximately 0.016 in state 0 and 3.062 in state 1.

The covariances print out:

[[[0.41987084]]
 [[0.53146802]]]

Which means that the observed values have a variance of about 0.420 in state 0 and 0.531 in state 1.

This way, we may never know the exact values of the states, but we know their average observed value and how they vary and tend to change with each other.

Predict the hidden states for the observed data:

hidden_states = new_model.predict(X)

print("Hidden states:")
print(hidden_states)

In this code, based on the X observed data samples, we inferred the most likely sequence of hidden states under the fitted model.

The hidden states print out:

[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 1 0 1 0 0 0 1
 1 1 1 1 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0]

Which means that the hidden states switch between state 0 and state 1, showing how the system changes states over time.
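If you are curious how a sequence of hidden states can be recovered from observations alone, here is a minimal from-scratch sketch of the Viterbi algorithm in plain Python. It is not the library's implementation, just an illustration of the idea. The parameters are rounded from the fitted model above, while start_prob and toy_observations are made-up values I am assuming for the example.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(observations, start_prob, trans, means, variances):
    """Most likely hidden-state path for a Gaussian HMM (computed in log space)."""
    n_states = len(means)
    # best[s] = log-probability of the best path so far that ends in state s
    best = [
        math.log(start_prob[s]) + gaussian_log_pdf(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]
    back_pointers = []
    for x in observations[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            # Pick the previous state that makes arriving in s most likely.
            scores = [best[p] + math.log(trans[p][s]) for p in range(n_states)]
            prev = max(range(n_states), key=lambda p: scores[p])
            pointers.append(prev)
            new_best.append(scores[prev] + gaussian_log_pdf(x, means[s], variances[s]))
        best = new_best
        back_pointers.append(pointers)
    # Walk backwards from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Parameters rounded from the fitted model above.
trans = [[0.81, 0.19], [0.49, 0.51]]
means = [0.016, 3.062]
variances = [0.420, 0.531]
start_prob = [0.5, 0.5]  # assumption: both starting states equally likely

toy_observations = [0.1, -0.3, 2.9, 3.4, 0.2]  # made-up values for illustration
decoded = viterbi(toy_observations, start_prob, trans, means, variances)
print(decoded)
```

Observations near 0 are assigned to state 0 and observations near 3 to state 1, which is the same kind of output that predict printed above.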

Applications in AI and Control Theory: Making Decisions Under Uncertainty

Photo by capt.sopon

I have been giving you a high-level overview of the field of probabilities and statistics. As I explained before, I wanted to make the explanations simple to understand.

As someone with a bachelor's degree in electrical and computer engineering, I can assure you that while this chapter seems simple, in probabilities and statistics, things can get very complicated very quickly.

Many other concepts are not as straightforward as the ideas I’ve just told you about, such as:

p-values
Advanced Monte Carlo methods
Bayesian networks
Statistical hypotheses

But as it is, probability and statistics are the starting points for making decisions where uncertainty exists in AI and control theory.

For example, Bayes’ theorem, besides being the foundation of the Kalman filter, is also the foundation of many probabilistic models in the field of AI. Probabilistic models are also widely used in quant firms and banks to model risk.

In control theory, probabilities and statistics are widely used to design robust control systems (as is the case with Kalman filters).

So as you can see, the application of probabilities and statistics, as with calculus and linear algebra, is the foundation for many tools that impact millions of lives and move billions of dollars in the global economy.

Chapter 7: Optimization Theory - Teaching Machines to Improve

Photo by Pixabay

This is the most advanced math chapter of the book. To truly understand it, it’s very important that you’ve read the other chapters first.

We’re going to examine a few machine learning methods, and I’ll show you some recipes of how machine learning is just the use of linear algebra, calculus, probabilities and statistics, and optimization theory.

Just like making a cake!

What is Optimization Theory?

In AI, optimization theory is responsible for the algorithms that optimize data-driven AI models.

Often, big companies invest millions in research to create or refine algorithms that make training AI models faster.

This way, companies save far more money than the upfront research costs when scaling to train multiple large AI models.

It is thanks to optimization theory that deep learning was able to scale efficiently, eventually leading to the creation of ChatGPT and many other large language models.

But why is that?

In all data-driven machine learning models, there is a learning phase that has to happen. That is, there’s a period where the algorithms make predictions that are not correct and then need to change some parameters to make sure the next predictions are correct – or at least closer to being correct.

Without optimization, machine learning algorithms don't get anywhere on their learning path to the right solution. Without optimization, they spend too much time on a learning path that won’t increase their ability to predict things the right way.

So, let’s start learning!

Why Optimization Drives Learning in AI

Photo by Alex Knight

Optimization theory is the mathematical foundation that allows algorithms to improve their performance over many iterations.

When we combine an algorithm with a path to change its parameters to meet a certain objective (done with an optimization method), it’s called a machine learning algorithm.

This learning process always involves minimizing or maximizing a certain objective. For example, for many machine learning algorithms, the main objective is to minimize errors. To do this, over many iterations, the optimization method "tells" the internal components of an algorithm what to change after receiving feedback on how well it’s performing.

It’s like someone first learning how to drive a car. The first few times, it may be complicated. But after a while and some practice, the driver learns how to drive properly and not make the same mistakes they once did in the past with the help of the instructor.

The same applies to optimization methods when optimizing algorithms.

Types of Optimization Theory Methods in ML and Deep Learning

The field of optimization theory is huge! Just as with many fields of mathematics, it is constantly growing every year.

But for the purposes of this book, there are three main categories of optimization methods:

First-Order Methods

These are the most used in deep learning, including for training LLMs like Gemini, Grok, and others.

They are called first-order methods because they all use the first derivative of functions. The first derivative of a function measures how much a function's output changes when its input changes very little. The most widely used in deep learning are advanced variants of gradient descent.
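As a quick illustration of "how much the output changes when the input changes very little", here is a small sketch that estimates a first derivative numerically. The function f and the helper numerical_derivative are names I made up for this example.

```python
def f(x):
    """Example function: f(x) = x squared."""
    return x ** 2

def numerical_derivative(func, x, h=1e-6):
    """Estimate the slope of func at x by nudging x a tiny bit in both directions."""
    return (func(x + h) - func(x - h)) / (2 * h)

slope = numerical_derivative(f, 3.0)
print(f"Estimated slope of x^2 at x = 3: {slope:.4f}")  # the exact derivative is 2 * 3 = 6
```

Deep learning libraries do not compute derivatives this way (they use automatic differentiation), but the meaning of the number is the same.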

While there are many variants, here are some popular examples:

Standard batch gradient descent
Stochastic gradient descent
Mini-batch gradient descent
RMSprop
Adam

In this chapter, we will look in depth at one of these methods called Adam (below).

Second-Order Methods

They are called second-order methods because they use information from second derivatives (the curvature of the function), either computed exactly or approximated, to make better updates. There are many methods, like:

BFGS
L-BFGS
Newton's method

But these are not often used in machine learning and deep learning. While they need fewer iterations, each iteration is very computationally expensive for the high-dimensional optimization problems that AI models create.

So they’re not widely used like first-order optimization methods.
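Here is a toy comparison of the two families on a function with a single parameter, f(x) = (x - 4)^2 + 1. It is only a sketch: the function, the learning rate of 0.1, and the stopping threshold are all values I picked for illustration.

```python
def grad(x):
    """First derivative of f(x) = (x - 4)^2 + 1."""
    return 2 * (x - 4)

def second_derivative(x):
    """Second derivative of the same function (constant for a quadratic)."""
    return 2.0

# First-order: many small steps, scaled by a hand-picked learning rate.
x_gd = 0.0
gd_steps = 0
while abs(grad(x_gd)) > 1e-6:
    x_gd -= 0.1 * grad(x_gd)
    gd_steps += 1

# Second-order (Newton's method): the curvature sets the step size automatically.
x_newton = 0.0
newton_steps = 0
while abs(grad(x_newton)) > 1e-6:
    x_newton -= grad(x_newton) / second_derivative(x_newton)
    newton_steps += 1

print(f"Gradient descent: x = {x_gd:.4f} after {gd_steps} steps")
print(f"Newton's method:  x = {x_newton:.4f} after {newton_steps} steps")
```

Newton's method lands on the minimum in a single step here, while gradient descent needs dozens. The catch is that with n parameters, the second derivatives form a table with n times n entries, which is why this approach becomes so expensive for models with millions or billions of parameters.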

Zeroth-Order and Other Methods

These methods do not require derivatives to optimize algorithms. Some examples of algorithms where derivatives are not used are:

Genetic algorithms
Dynamic programming algorithms
Particle swarm optimization methods

The problem with these algorithms is that they are often very slow for many variables.

But in certain AI contexts, they can help optimize the architecture of deep learning models to improve AI models from an architectural point of view (instead of a parameter point of view).
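To show what "no derivatives" looks like in practice, here is a very small sketch of a random-mutation search, loosely in the spirit of genetic algorithms (it is much simpler than a real one). The objective function, the mutation size of 0.5, and the 200 attempts are assumptions chosen only for this example.

```python
import random

def loss(w):
    """Toy objective with its minimum at w = 3. No derivative is ever computed."""
    return (w - 3.0) ** 2

random.seed(0)
best_w = 0.0
best_loss = loss(best_w)

for _ in range(200):
    candidate = best_w + random.gauss(0, 0.5)  # randomly mutate the current best guess
    candidate_loss = loss(candidate)
    if candidate_loss < best_loss:             # keep the mutation only if it helps
        best_w, best_loss = candidate, candidate_loss

print(f"best w = {best_w:.3f}, loss = {best_loss:.5f}")
```

This works with a single variable, but every extra variable makes a lucky random guess less likely, which is the slowness mentioned above.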

How does optimization theory connect with linear algebra, calculus, and probability and statistics?

Essentially:

Calculus teaches you derivatives, which help you understand optimization theory.
Linear algebra teaches you matrices, which help you understand how different states relate and transform.
Probability and statistics teach you concepts like covariance and correlation, which help you understand how variables are connected with each other.

This way, with linear algebra and probability and statistics, you gain the knowledge necessary to understand the algorithms. With calculus you gain the basis to understand optimization theory and how it changes certain parameters of the fundamental algorithms to minimize/maximize a certain objective.

Simple Optimization Techniques: How Machines Learn Step by Step

Photo by LJ Checo

Now, we’re going to see examples of machine learning algorithms that learn through optimization and deconstruct them so that you can understand how these areas of mathematics apply to them.

In each example, I will explain their main idea with an analogy as well as how each math area is used in each algorithm.

Linear Regression

Imagine that you are solving a puzzle. To complete the puzzle, you need to arrange the pieces in the right design/order.

The same idea applies to linear regression.

We have matrices (linear algebra) that represent the parameters of the linear regression model and the data that flow into it.

And we can see over time how well the line is fitting the numbers, as well as its error (probabilities and statistics).

To find the best line for the linear regression, we need to know how much the parameters of the model need to change (calculus) and actually apply that change to the parameters (optimization theory).

This way, calculus tells us which direction to change the parameters, and optimization theory tells us how much to actually change them.

Let’s see how to code the linear regression above:

import numpy as np

np.random.seed(42)
X = np.linspace(0, 10, 50)
y_true = 3 * X + 2
noise = np.random.normal(0, 2, 50)
y = y_true + noise

w = 0.1 
b = 0.5
learning_rate = 0.01
iterations = [0, 1, 2, 3, 4, 5]
saved_states = []

for epoch in range(max(iterations) + 1):
    y_pred = w * X + b
    error = np.mean((y - y_pred) ** 2)
    
    if epoch in iterations:
        saved_states.append({
            'epoch': epoch,
            'w': w,
            'b': b,
            'y_pred': y_pred.copy(),
            'error': error
        })
    
    dw = -2 * np.mean(X * (y - y_pred))
    db = -2 * np.mean(y - y_pred)
    
    w = w - learning_rate * dw
    b = b - learning_rate * db

Let’s see the code block by block:

Import library:

import numpy as np

For this problem, we’ll import one of the most used Python libraries: NumPy (which we’ve worked with earlier in the book).

Create data points:

np.random.seed(42)
X = np.linspace(0, 10, 50)
y_true = 3 * X + 2
noise = np.random.normal(0, 2, 50)
y = y_true + noise

In this code, we define a base line that will help in generating the data points:

X = np.linspace(0, 10, 50)
y_true = 3 * X + 2

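After this base line (drawn in green in the plots) has been created, we add random noise to it, drawn from a normal distribution with mean 0 and standard deviation 2, to create the data points: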

noise = np.random.normal(0, 2, 50)
y = y_true + noise

This is how we defined the data points for the line dataset.

Initializing linear regression parameters and others:

w = 0.1 
b = 0.5
learning_rate = 0.01
iterations = [0, 1, 2, 3, 4, 5]
saved_states = []

In this block of code, we initialize:

Linear regression parameters: Weight to be 0.1 and bias to be 0.5
One hyperparameter: Learning rate
Which iterations we are going to run and record (epochs 0 through 5, so six updates in total)
A list called saved_states to store values to later create graphs

This way, we start with this red line:

Making the linear regression learn with the data:

for epoch in range(max(iterations) + 1):
    y_pred = w * X + b
    error = np.mean((y - y_pred) ** 2)
    
    if epoch in iterations:
        saved_states.append({
            'epoch': epoch,
            'w': w,
            'b': b,
            'y_pred': y_pred.copy(),
            'error': error
        })
    
    dw = -2 * np.mean(X * (y - y_pred))
    db = -2 * np.mean(y - y_pred)
    
    w = w - learning_rate * dw
    b = b - learning_rate * db

It may appear complicated, but let’s see in smaller blocks:

For loop

for epoch in range(max(iterations) + 1):

Making a prediction and seeing its error

y_pred = w * X + b
error = np.mean((y - y_pred) ** 2)

In this block of the code, we compute the values predicted by the current parameters and measure their mean squared error against the real values.

Saving current iteration values for future statistics

if epoch in iterations:
    saved_states.append({
        'epoch': epoch,
        'w': w,
        'b': b,
        'y_pred': y_pred.copy(),
        'error': error
    })

Here we are just storing the values of the current iteration in the saved_states list so we can create plots later.

Finding the gradients

dw = -2 * np.mean(X * (y - y_pred))
db = -2 * np.mean(y - y_pred)

In this block of code, we find the gradient values for the current prediction.

In other words, for the weight and bias, we find out in which direction, and how steeply, the error changes when each of them changes. Moving the parameters in the opposite direction brings the line closer to the data points.
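For readers who want to see where the two gradient lines come from, here is the short derivation. I am writing E for the mean squared error, n for the number of data points, and a hat over y for the prediction. These symbols are my own notation, chosen to mirror the variable names in the code.

```latex
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad \hat{y}_i = w x_i + b

\frac{\partial E}{\partial w} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right)

\frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)
```

The first expression matches the error line in the code, and the last two match dw and db, where np.mean plays the role of the sum divided by n.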

Updating the parameter values

w = w - learning_rate * dw
b = b - learning_rate * db

Finally, we update the weight and the bias with the new values so that the line better approximates the data points:

Neural Networks

The same puzzle idea applies to neural networks. Neural networks are algorithmic models inspired by the brain that learn patterns from data. They are part of a machine learning field called deep learning, which uses neural networks to learn complex patterns.

Neural networks are important because they power modern AI applications like:

Image recognition
Language translation
Chatbots

For example, ChatGPT means Chat Generative Pre-trained Transformer. A transformer is an architecture of neural networks.

If you understand neural networks, you’ll understand the foundations that make ChatGPT work.

We have matrices (linear algebra) that represent the parameters of the neural network model and the data that flow into it.
And we can know over time how well the neural network model is converging to the dataset, fitting the numbers, and see its error (probabilities and statistics).
Calculus will tell us in which direction the parameters of the neural network need to change.
Optimization theory will tell us how much they need to change.

For example, this is a neural network:

This model has in total 13 parameters:

It has 10 lines (connections between circles). These are called weights.
It has 2 circles in the hidden layer and 1 in the output layer. Each circle has one bias.
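The same count can be checked with a few lines of Python. The helper count_parameters is a name I made up for this sketch, and it assumes every node in one layer is connected to every node in the next.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    # One weight for every connection between neighboring layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every node that is not an input.
    biases = sum(layer_sizes[1:])
    return weights, biases

weights, biases = count_parameters([4, 2, 1])  # 4 inputs, 2 hidden nodes, 1 output
print(weights, biases, weights + biases)  # 10 3 13
```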

Big question:

Imagine you work in a bank. You are in charge of deciding who gets credit cards or not. For that, you create the neural network above that takes 4 inputs:

Income
Credit score
Debt ratio
Bankruptcy history

With this neural network well optimized, you can figure it out!

Very simply, without going into the details of activation functions, the network processes the 4 inputs through its weights and biases.

Each connection multiplies the input by its weight. After that, each node adds up the values arriving at it and adds its bias.

The final output is squashed into a number between 0 and 1:

Numbers close to 0 mean "Not approved"
Numbers close to 1 mean "Approved"

For example, when data showing a high income, a good credit score, and no bankruptcy history flows through the neural network, it might produce 0.92. This means that the application should be approved.

But a low income with a history of bankruptcy may produce 0.15, which results in a rejection.
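Here is a minimal sketch of that flow in plain Python for a network with 4 inputs, 2 hidden nodes, and 1 output. All the weights, biases, and applicant values below are hypothetical numbers I picked by hand so the example behaves sensibly. They are not learned from data. The sketch also assumes a ReLU activation in the hidden layer and a sigmoid at the output, and that the inputs were already scaled to small numbers.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One pass through a 4-2-1 network (ReLU hidden layer, sigmoid output)."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(max(0.0, total))  # ReLU: negative totals become 0
    score = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(score)

# Hypothetical, hand-picked parameters (NOT learned from data).
hidden_weights = [[0.8, 0.9, -0.7, -1.5],   # weights into hidden node 1
                  [-0.5, -0.4, 0.9, 1.2]]   # weights into hidden node 2
hidden_biases = [0.1, 0.0]
output_weights = [1.6, -1.8]
output_bias = -0.5
params = (hidden_weights, hidden_biases, output_weights, output_bias)

# Assumed pre-scaled inputs: income, credit score, debt ratio, bankruptcy history.
strong_applicant = [1.2, 1.0, 0.2, 0.0]
weak_applicant = [0.2, 0.3, 0.9, 1.0]

strong_score = forward(strong_applicant, *params)
weak_score = forward(weak_applicant, *params)
print(f"strong applicant: {strong_score:.2f}")
print(f"weak applicant:   {weak_score:.2f}")
```

With these made-up numbers, the first applicant scores well above 0.5 and the second well below it. Training is the process of finding such numbers automatically instead of picking them by hand.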

In reality, bank systems and others have neural networks that take far more well-chosen input features and decide this automatically.

This is precisely how AI can be used for credit approval.

But a question remains: What is the best way to know how much the parameters need to change?

In the next part, we are going to see the most famous optimization theory algorithm that will help us decide that.

What is Adam? The Most Popular Way AI Models Find the Best Learning Path

Photo by Lum3n

To optimize neural network based AI models, one of the most popular methods is called Adam, which means Adaptive Moment Estimation.

The paper that introduced the method is one of the most influential in the 21st century in machine learning, with thousands of citations. As with all ideas in non-symbolic AI, Adam is a mixture of different math concepts.

It's composed of the ideas of two other optimization methods:

Momentum Gradient Descent: Accumulates velocity from previous gradients to move faster in consistent directions
Root Mean Square Propagation (RMSProp): Adapts learning rates based on recent gradient magnitudes

Let's understand them with an analogy.

Imagine that you are riding a bicycle down a mountain little by little. You already know the direction thanks to calculus.

But how do you descend safely without losing control or going too slowly?

First, you need to build up speed gradually using past momentum. This is one of the main ideas of momentum gradient descent.

It's also important that you adjust your speed based on how steep the terrain is. This is the main idea of RMSProp.

This way, you can safely accelerate and brake appropriately.

When optimizing a model with Adam, this is the same concept. With Adam, we want to optimize a model in a fast and stable way.

Momentum gradient descent provides the fast part, and RMSProp provides the stable part.
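To see how the two ideas fit together, here is a from-scratch sketch of the Adam update rule for a single parameter, in plain Python. This is a simplified illustration, not how PyTorch implements it internally. The toy loss (w - 5)^2, the learning rate of 0.1, and the 300 steps are values I chose for the example. The defaults for beta1, beta2, and eps are the commonly used ones.

```python
import math

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    """Minimize a one-parameter function with the Adam update rule."""
    m = 0.0  # running average of gradients (the momentum part)
    v = 0.0  # running average of squared gradients (the RMSProp part)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction, because m starts at 0
        v_hat = v / (1 - beta2 ** t)  # bias correction, because v starts at 0
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def grad(w):
    """Gradient of the toy loss (w - 5)^2."""
    return 2 * (w - 5)

w_final = adam_minimize(grad, w=0.0)
print(f"w after Adam: {w_final:.3f}")  # the loss is smallest at w = 5
```

The m line is the speed built up from past gradients, and dividing by the square root of v_hat is the braking that adapts each step to how large recent gradients were.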

Nowadays, for LLMs, which once again are just very big neural network models, a variant of Adam called AdamW is more often used.

Now, let's build a code example of using Adam.

Code example:

Using Adam, we are going to optimize this neural network based on fake data.

It will take 4 features:

Income
Credit score
Debt ratio
Bankruptcy history

And it will tell us if we should or should not approve credit for a given person.

Also, since this book is an introduction to the math of AI, I will not, in this code example, discuss hyperparameter optimization, regularization techniques, and other more advanced topics and good practices.

I want to show why this neural network fails with this data and explain the importance of using great data.

Here is the whole code (and we’ll see each part more in-depth below):

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, random_split
import pytorch_lightning as pl
import matplotlib.pyplot as plt

torch.manual_seed(42)
x = torch.randn(10000, 4)
y = torch.randint(0, 2, (10000, 1)).float()
dataset = TensorDataset(x, y)

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

class CreditApprovalNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []
    
    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss
    
    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)

model = CreditApprovalNet()
trainer = pl.Trainer(max_epochs=100, logger=False, enable_checkpointing=False)
trainer.fit(model, train_loader, val_loader)

# Plot the training loss recorded at each training step
plt.plot(model.train_losses)
plt.xlabel('Training Step')
plt.ylabel('Loss')
plt.title('Credit Approval Training')
plt.grid(True, alpha=0.3)
plt.show()

Now let’s break it down:

Importing libraries:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, random_split
import pytorch_lightning as pl
import matplotlib.pyplot as plt

In this block of code, we are importing code from 3 Python libraries:

PyTorch: One of the most popular Python libraries to create new AI models in AI research
PyTorch Lightning: A PyTorch wrapper that organizes training code and handles repetitive tasks automatically
Matplotlib: One of the most popular Python libraries to make graphs from data

Creating data:

torch.manual_seed(42)
x = torch.randn(10000, 4)
y = torch.randint(0, 2, (10000, 1)).float()
dataset = TensorDataset(x, y)

In this part, we define a seed to make the random numbers reproducible. In other words, when we run the code many times, the same random numbers will be generated.

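Next, we create 10,000 credit applications with 4 features in x and their approval decisions in y. Note that the features are random numbers and the decisions are random 0s and 1s generated independently of them. After that, we unify everything in the dataset variable.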

We’ll use TensorDataset because it allows us to have the 4 features and the target paired together. This way, the data does not get mixed up during training.

Dividing data:

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

In this block of code, we divide the data into a training dataset and a validation dataset.

This way, we have one dataset that’s used to train and find the parameters, and a separate one for checking how the model does on data it has not seen.

As we can see, 80% of the data will be training data, and 20% of the data will be validation data.

Loading data:

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

Here, we load the data into data loaders for the AI model to use.

This way, the data is automatically split into small batches of 32, and the training data is shuffled. So instead of processing all 8,000 training data points at once, the model is trained on one batch, improved, then trained on another batch, improved again, and so forth. Each update is cheap to compute, so the model starts improving much sooner.
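Here is a plain Python sketch of what batching does conceptually. The helper make_batches is a name I made up, and it is a simplification: the real DataLoader works with tensors and reshuffles the training data at every epoch, while this sketch shuffles only once.

```python
import random

def make_batches(items, batch_size, shuffle=True, seed=42):
    """Split a list into consecutive batches, optionally shuffling first."""
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

train_indices = range(8000)  # 80% of the 10,000 synthetic applications
batches = make_batches(train_indices, batch_size=32)

print(len(batches))     # 250 batches, so 250 parameter updates per epoch
print(len(batches[0]))  # 32 applications in each batch
```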

Creating AI model and training process:

class CreditApprovalNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []
    
    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss
    
    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)

This code block appears to be complicated, but let’s see each method block by block:

Creating the class with inheritance:

class CreditApprovalNet(pl.LightningModule):

This way, in one line, we inherit everything we need from PyTorch Lightning to define both the model and how it will be trained.

init: Builds the model's layers and components:

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []

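In this section of the code, we are defining the architecture of the AI model: a hidden layer that takes the 4 inputs to 2 nodes, a ReLU activation, an output layer that takes the 2 hidden nodes to 1 output, and a sigmoid that squashes that output into a number between 0 and 1. We also define the loss function (binary cross-entropy) and a list to store the training losses.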

forward: Processes input data through the network to make predictions:

    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))

In this part of the code, we are defining how data will flow in the AI model based on the architecture defined.

training_step: Calculates loss for each batch during training:

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss

Here, we are defining how the model will be trained. In other words, how we will find the best parameters for the model to predict well.

configure_optimizers: Sets the Adam optimizer with learning rate:

    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)

Finally, here we are defining what optimizer we are going to use to, step by step, improve the AI model parameters.

Training AI model:

model = CreditApprovalNet()
trainer = pl.Trainer(max_epochs=100, logger=False, enable_checkpointing=False)
trainer.fit(model, train_loader, val_loader)

In this block of code:

We create the neural network model in the first line
In the 2nd and 3rd lines, we prepare the training settings and train the model for 100 epochs

This way, in the command line, this appears:

The summary that PyTorch Lightning prints is essentially telling us the number of parameters in the AI model!

Seeing results and understanding why they are not good:


plt.plot(model.train_losses)
plt.xlabel('Training Step')
plt.ylabel('Loss')
plt.title('Credit Approval Training')
plt.grid(True, alpha=0.3)
plt.show()

Using the Matplotlib library, we plot the results:

The AI model is not converging.

We can see that because the loss stays at nearly 0.7 over time instead of decreasing.

The main reason the model is not converging well is that there is no real relationship between the 4 features and the target variable: both were generated at random, independently of each other.

In other words, we do not have good data.
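The value the loss gets stuck at is not a coincidence. The loss function used here, binary cross-entropy, gives about 0.693 (the natural logarithm of 2) to a model that always answers 0.5, which is the best it can do when the labels are coin flips. This small sketch checks that with plain Python. The helper binary_cross_entropy is my own one-example version of the formula, not PyTorch's BCELoss.

```python
import math

def binary_cross_entropy(prediction, label):
    """Loss for one example: small when the prediction agrees with the label."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# A model that has learned nothing and always answers 0.5:
coin_flip_loss = binary_cross_entropy(0.5, 1)
print(f"{coin_flip_loss:.4f}")  # 0.6931, which is ln(2)

# A confident and correct prediction gets a much lower loss:
confident_loss = binary_cross_entropy(0.92, 1)
print(f"{confident_loss:.4f}")
```

So a loss that stays near 0.7 means the network is doing no better than guessing.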

The code works perfectly, but this shows the most important rule in machine learning: when we create an AI model, the MOST IMPORTANT thing is data.

It does not matter if you use a simple linear regression or a neural network based on transformers or whatever. If you do not have high quality data, the model is not going to perform well.

Even if we use a good optimizer, like Adam, it will not solve the data problem.

Next steps: Common beginner mistakes

I also wrote this exact code example to show you something very important: neural networks are not always the best models to use.

This is a very common beginner mistake. You may start with neural networks for everything, when often classical machine learning methods with little data preprocessing do the job well.

For this type of problem, the solution is to first try classical machine learning methods instead of going straight to neural networks.

There are many reasons for this, but the main ones are:

Classical machine learning methods are simpler and often quicker to train than neural networks
Their decisions are easier to interpret. In other words, we can understand how the model arrived at a prediction.
With computational learning theory, we can estimate how well certain machine learning models will predict in the future and provide theoretical guarantees about their performance.

Another common mistake is not dividing the data.

To simplify, I created only a training and validation division of the data.

In a serious project, you should always divide it into 3 parts: training, validation, and testing.

With training, you create the model. With validation, you check the model on data it was not trained on and use the results to adjust it. With the test dataset, which you only use at the end, you check whether the loss of the model is similar to the validation loss or different. If they are very different, it means that the AI model was tuned too closely to the validation dataset and does not generalize to new data.
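As a rough sketch, a three-way division could look like this in plain Python. The 70/15/15 proportions and the helper name three_way_split are assumptions I am making for illustration. Other proportions are common too.

```python
import random

def three_way_split(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then cut into training, validation, and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(train_frac * len(items))
    n_val = round(val_frac * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # everything that is left over
    return train, val, test

train, val, test = three_way_split(range(10000))
print(len(train), len(val), len(test))  # 7000 1500 1500
```

The important property is that no data point appears in more than one part.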

I challenge you to think further about how you could improve this code and to try to make the synthetic data more correlated in order to improve its quality.
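As one possible starting point for this challenge, here is a plain Python sketch that generates applications whose approval actually depends on the features. The rule inside make_application (the four coefficients and their signs) is entirely made up by me for illustration, and treating all four features as standard normal numbers mirrors the simplification in the original example.

```python
import math
import random

random.seed(42)

def make_application():
    """One synthetic applicant whose label depends on the features."""
    income = random.gauss(0, 1)
    credit_score = random.gauss(0, 1)
    debt_ratio = random.gauss(0, 1)
    bankruptcy = random.gauss(0, 1)
    # Made-up rule: approval odds rise with income and credit score,
    # and fall with debt and bankruptcy history.
    score = 1.5 * income + 2.0 * credit_score - 1.0 * debt_ratio - 2.5 * bankruptcy
    approval_probability = 1 / (1 + math.exp(-score))
    approved = 1.0 if random.random() < approval_probability else 0.0
    return [income, credit_score, debt_ratio, bankruptcy], approved

data = [make_application() for _ in range(10000)]
approval_rate = sum(label for _, label in data) / len(data)
print(f"approval rate: {approval_rate:.2f}")
```

To use data like this in the PyTorch example, the features and labels could be converted with torch.tensor and passed to TensorDataset in place of the random x and y.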

Applications in AI and Control Theory of Optimization Theory

Photo by Tara Winstead

Optimization theory serves as the engine behind AI and control systems that shape our lives.

From unlocking your phone with facial recognition to autopilot systems guiding planes, optimization algorithms are constantly at work.

When you ask ChatGPT a question, the answer comes from billions of parameters whose values were determined by optimization theory during training.

The same is true for all other LLMs like Gemini, Claude, Grok, DeepSeek, and others. All of them contain billions of parameters. The only way to find the best combination of the parameters to achieve a certain objective is with optimization theory.

In control theory, many approaches like Model Predictive Control (MPC) and adaptive control only work thanks to optimization methods that balance how internal components of the control system should work together.

Beyond training neural networks and controlling physical systems, optimization powers recommendation systems, resource allocation, and so many other systems.

Some examples are:

Netflix movie recommendation system
Spotify's song suggestion system
Google systems to reduce data center cooling costs
Quantitative trading firms' high-frequency trading systems

To end this final chapter, I’ll share this:

It is optimization theory that makes math models into AI models that impact the lives of millions worldwide.

Conclusion: Where Mathematics and AI Meet

Photo by AXP Photography

When ancient civilizations first carved numbers into clay tablets, they likely didn’t imagine that these symbols would one day allow humanity to create the scientific, technological, and medical marvels we have today.

Yet here we are.

We’re in an era where mathematical ideas developed over many centuries – even millennia – have converged to create artificial intelligence.

Throughout this book, we've traced a path from the most basic math concepts to the cutting edge of AI. We have seen how:

Matrices compress complex systems into simple forms
Derivatives measure change
Probability helps us navigate uncertainty
Optimization guides algorithms toward better decisions to learn faster.

We’ve also learned how each math field has helped create tools that are responsible for many of the things we take for granted today.

Mathematics is the Foundation of AI

Photo by Jeswin Thomas

Always remember this: AI is not pure magic or a "being" we don't understand. It’s just the combination of many math ideas working very well together.

When you ask a question of ChatGPT or any other LLM, it generates a response. And in the process of generating that response, there are millions of matrix multiplications happening in seconds.

Or, for example, when a self-driving car decides to stop moving because it’s coming up to a crosswalk, there are a lot of math computations (related to calculus and probability and statistics) working very fast to ensure safety.

The great thing about mathematics is that it’s a common, standard language of logic. No matter the backgrounds of people or where they were born, a derivative will always be a derivative, and the same thing goes for key AI concepts.

This way, scientists and engineers worldwide can improve each other's work because everyone understands the same language.

The Future: On Device AI and the Democratization of AI

Photo by Steve Johnson

One shift happening now is the move toward edge AI. That is, AI that runs locally on your phone, computer, and really in all your devices (rather than in distant data centers).

This way, privacy improves because your data stays on the device. Waiting times for AI models decrease because no data needs to be sent over the network. AI can be used offline, and costs decrease.

And what about the massive data centers being built all over the world? Those will be used for more products that will help improve the lives of millions of people.

As AI becomes more local and more processing power is freed up from big data centers, new AI innovations will appear, and more benefits will come.

The same way that in the past century every computer got its own networking chip, every device will have (and in some cases, already has) AI accelerators.

And much of it will be thanks to the math you learned in this book.

Final Reflections

Isaac Newton wrote, "If I have seen further, it is by standing on the shoulders of giants."

Every algorithm you use, every model you train, and every new theorem you learn stands on centuries of mathematical progress. You now stand on the shoulders of those same giants!

Thank you for reading, and happy learning.

Here’s the full book GitHub repository with all the code.

Acknowledgements

First and foremost, I would like to thank Guilherme Mendes, currently a Master’s student in Electrical and Computer Engineering at NOVA University, specializing in Control Theory, for reviewing the mathematical and technical details of the 1st version of this book.

I am also grateful to the organizations that gave me opportunities to grow:

A special thank you goes to the freeCodeCamp editorial team, especially Abigail Rennemeyer, for their patience and for reviewing every chapter of this book.

I would also like to thank all the professors at NOVA FCT who have taught and guided me throughout my academic journey, especially those from the Department of Electrical and Computer Engineering.

About the Author

LinkedIn: https://www.linkedin.com/in/tiago-monteiro-
GitHub: https://github.com/tiagomonteiro0715
Email: monteiro.t@northeastern.edu

My name is Tiago Monteiro, and I’m now pursuing a master's degree in Artificial Intelligence at Northeastern University in the Silicon Valley Campus (San Jose) on a merit-based scholarship.

I’m not from the United States. I am a Portuguese national, born and raised in the district of Lisbon.

In Portugal, I completed a bachelor's degree in electrical and computer engineering at NOVA University, one of Portugal's best universities.

I have authored over 20 articles for freeCodeCamp, which have accumulated more than 240,000 views over the years, and completed the Deep Learning Specialization from DeepLearningAI, taught by Andrew Ng.

Also, I had the privilege of participating in the winter 2025 batch of the renowned Silicon Valley Fellowship program.

Why did I choose electrical and computer engineering?

After finishing the Portuguese national math exam in 12th grade, I chose Electrical and Computer Engineering (ECE) to challenge myself and learn new math on my own.

The ECE degree combined:

Advanced Mathematics
Programming (from Assembly to Python)
Physics (classical mechanics, electromagnetism)

What did I gain exactly?

I mastered the skills needed to quickly understand AI research, particularly after completing Andrew Ng's Deep Learning Specialization.

In Portugal, I also studied advanced STEM areas including, for example:

Partial Differential Equations for modeling real-world phenomena
Harmonic analysis (Fourier/Laplace transforms) for signal processing and alternative problem perspectives
Complex analysis involving derivatives and integrals in the complex domain
Numerical methods for approximating mathematical solutions computationally
Signal/control theory for ensuring system stability in dynamic environments
Physics classes in classical mechanics and electromagnetism fundamentals

While not directly applied to AI, these studies enhanced my systems thinking and ability to independently learn complex STEM concepts.

Learn Discrete Mathematics

Beau Carnes — Thu, 13 Nov 2025 14:34:56 +0000

Discrete mathematics plays a key role in machine learning and algorithms. You can use it to find the shortest path (graph theory), encrypt files, compress data, and to do many other things.

We just posted a discrete mathematics course on the freeCodeCamp.org YouTube channel. Karol Kurek teaches this course. He is a former math teacher and senior Python developer.

This field is constantly evolving along with the development of its key application: computer science. This course is an introduction to this group of mathematical sciences. It focuses on the core topics on which other branches of discrete mathematics are based (combinatorics, number theory, and prime numbers) along with several selected topics: the pigeonhole principle, the stars and bars principle, Stirling numbers, and the Chinese remainder theorem.

At the end of the course, there are tips and encouragement for further exploration of this field.

Here are the sections in this course:

Introduction to Discrete Mathematics
Permutations: Definition and Examples
Applications of Permutations
Cycles and Multiset Permutations
Counting Permutations: The Formulas
Permutations in Python with itertools
Custom Python Function for Counting Permutations
Heap's Algorithm
K-Permutations and K-Tuples
The Rule of Product
The Rule of Sum
Exercises: Rule of Product & Sum
The Inclusion-Exclusion Principle
Exercises: Inclusion-Exclusion Principle
Mathematical Notations (Sigma & Pi)
Equinumerosity & Countable Sets
Proving Rational Numbers are Countable
Prime Numbers & Sieve of Eratosthenes
Prime Number Generation in Python
Advanced Properties of Prime Numbers
GCD & LCM (Greatest Common Divisor & Least Common Multiple)
Co-prime Numbers
Congruences (Modular Arithmetic)
Binomial Coefficients & Pascal's Triangle
Combinations
Solving a Complex Combinatorics Problem
Stirling Numbers
Bell Numbers
The Chinese Remainder Theorem
Conclusion & What's Next

Watch the full course on the freeCodeCamp.org YouTube channel (9-hour watch).

How Does Cosine Similarity Work? The Math Behind LLMs Explained

Manish Shivanandhan — Thu, 18 Sep 2025 01:12:39 +0000

When you talk to a large language model (LLM), it feels like the model understands meaning. But under the hood, the system relies on numbers, vectors, and math to find the relationships between words and sentences.

One of the most important tools that makes this possible is cosine similarity. If you want to know how an LLM can judge that two sentences mean almost the same thing, cosine similarity is the key.

This article explains cosine similarity in plain language, shows the math behind it, and connects it to the way modern language models work. By the end, you will see why this simple idea of measuring angles between vectors powers search, chatbots, and many other AI systems.

What Is Cosine Similarity
The Math Behind Cosine Similarity
A Simple Example
Cosine Similarity in Embeddings
How LLMs Use Cosine Similarity
Limits of Cosine Similarity
Why It Matters for LLMs
Conclusion

What Is Cosine Similarity?

Imagine you have two sentences. To a computer, they are not words but vectors: long lists of numbers that capture meaning.

Cosine similarity measures how close these two vectors are, not by their length, but by the angle between them.

Think of two arrows starting from the same point. If they point in the same direction, the angle between them is zero, and cosine similarity is one. If they point in opposite directions, the angle is 180 degrees, and cosine similarity is negative one. If they are at a right angle, the cosine similarity is zero.

So, cosine similarity tells us whether two vectors are pointing in the same general direction. In language tasks, this means it tells us whether two pieces of text carry a similar meaning.

The Math Behind Cosine Similarity

To understand cosine similarity, we need to look at a bit of math. In geometry, the cosine of the angle between two vectors is the ratio between their dot product and the product of their magnitudes. Written as a formula, cosine similarity looks like this:

cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)

Here:

A · B is the dot product of vectors A and B.
||A|| is the magnitude (length) of vector A.
||B|| is the magnitude of vector B.

The dot product multiplies corresponding numbers in the two vectors and adds them up. The magnitude of a vector is like finding the length of an arrow, using the Pythagorean theorem.

This formula always gives a value between -1 and 1. A value close to 1 means the vectors are pointing in nearly the same direction. A value close to 0 means they are unrelated. A value close to -1 means they are opposite.
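To make the formula concrete, here is a minimal sketch that computes it by hand in plain Python. The function name and the sample vectors are illustrative choices, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Return (A · B) / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: multiply corresponding numbers and add them up.
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitude: square root of the sum of squares (Pythagorean theorem).
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same = cosine_similarity([1, 2], [2, 4])        # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])  # right angle
opposite = cosine_similarity([1, 2], [-1, -2])  # opposite direction
print(same, orthogonal, opposite)
```

The three results land at (or within floating-point rounding of) 1, 0, and -1, matching the three cases described above.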

A Simple Example

Let’s see a short example using Python. Suppose you want to check how similar two short texts are. We can use scikit-learn to turn them into vectors and then compute cosine similarity.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "I love machine learning",
    "I love deep learning"
]

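# fit_transform learns the vocabulary and returns a sparse TF-IDF matrix
tfidf_matrix = TfidfVectorizer().fit_transform(texts)
# Convert the sparse matrix to a dense NumPy array, one row per sentence
vectors = tfidf_matrix.toarray()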

similarity = cosine_similarity([vectors[0]], [vectors[1]])
print("Cosine similarity:", similarity[0][0])

The code starts by importing two important tools. TfidfVectorizer is responsible for turning text into numbers, while cosine_similarity measures how similar two sets of numbers are. Together, they let us compare text in a way a computer can understand.

Next, we define the sentences we want to compare. In this example, we use “I love machine learning” and “I love deep learning.” These two sentences share some words such as “I,” “love,” and “learning,” while differing in one word: “machine” versus “deep.” This makes them good examples to test, because they are clearly related but not exactly the same.

The vectorizer then builds a vocabulary from all the unique words across the two sentences. For these inputs, the vocabulary becomes ["deep", "learning", "love", "machine"]. The word “I” does not appear because, by default, TfidfVectorizer only keeps tokens of two or more characters. This means the program now has a list of all the words it will track when building the numerical representation of the sentences.

Each sentence is then converted into a vector, which is simply a list of numbers. These numbers are not just raw word counts. Instead, they are weighted using TF-IDF, which stands for Term Frequency–Inverse Document Frequency.

TF-IDF gives more importance to words that matter in a sentence and less importance to very common words. In simplified form, the first sentence becomes something like [0. 0.50154891 0.50154891 0.70490949], while the second becomes [0.70490949 0.50154891 0.50154891 0. ]. The numbers may look small, but what matters is their relative values.
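To see where these numbers come from, the sketch below recomputes them in plain Python. It assumes scikit-learn's default settings as I understand them: raw term counts, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, and L2 normalization. The helper name `tfidf_vector` is made up for this illustration.

```python
import math
import re

texts = ["I love machine learning", "I love deep learning"]

# Keep lowercased tokens of two or more word characters (so "I" is dropped).
docs = [re.findall(r"\b\w\w+\b", t.lower()) for t in texts]
vocab = sorted({word for doc in docs for word in doc})

def tfidf_vector(doc):
    """Hypothetical helper: TF-IDF weights for one tokenized document."""
    n_docs = len(docs)
    weights = []
    for term in vocab:
        tf = doc.count(term)                          # term frequency
        df = sum(1 for d in docs if term in d)        # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed IDF (assumed default)
        weights.append(tf * idf)
    norm = math.sqrt(sum(w * w for w in weights))     # L2 normalization
    return [w / norm for w in weights]

v1, v2 = tfidf_vector(docs[0]), tfidf_vector(docs[1])
# Both vectors have length 1, so their dot product is the cosine similarity.
similarity = sum(x * y for x, y in zip(v1, v2))

print(vocab)
print([round(w, 4) for w in v1])
print(round(similarity, 4))
```

Under these assumptions the first vector comes out as roughly [0, 0.5015, 0.5015, 0.7049], and the similarity between the two sentences is about 0.503.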

The .toarray() method then converts these vectors into a regular (dense) NumPy array. This makes them easier to handle, since the TF-IDF output is stored in a special sparse format by default.

Once the sentences are represented as vectors, cosine similarity is applied. This step checks the angle between the two vectors.

If the vectors point in exactly the same direction, their similarity score will be one. If they are unrelated, the score will be close to zero. If they point in opposite directions, the score will be negative.

In this case, because the two sentences share most of their words, the vectors point in a similar direction, and the cosine similarity comes out to about 0.50.

In simple terms, this code shows how a computer can compare two sentences by turning them into vectors of numbers and then checking how close those vectors are. By using cosine similarity, the program can judge not just whether the sentences share words, but also how strongly they overlap once each word is weighted by its importance.

Cosine Similarity in Embeddings

In practice, LLMs like GPT or BERT do not use simple word counts. Instead, they use embeddings.

An embedding is a dense vector that captures meaning. Each word, phrase, or sentence is turned into a set of numbers that place it in a high-dimensional space.

In this space, words with similar meaning are close together. For example, the embeddings for “king” and “queen” are closer than the embeddings for “king” and “table.”

Cosine similarity is the tool that allows us to measure how close two embeddings are. When you search for “dog,” the system can look for embeddings that point in a similar direction. That way, it finds results about “puppy,” “canine,” or “pet” even if those exact words are not in your query.

How LLMs Use Cosine Similarity

Large language models use cosine similarity in many ways. When you ask a question, the model encodes your input into a vector. It then compares this vector with stored knowledge or with candidate answers using cosine similarity.

For semantic search, cosine similarity helps rank documents. A system can embed all documents into vectors, then embed your query and compute similarity scores. The documents with the highest scores are the most relevant.
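The ranking step can be sketched in a few lines of Python. The three-dimensional embeddings and document titles below are invented for illustration. A real system would get its vectors from an embedding model and would have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example.
document_embeddings = {
    "puppy training tips": [0.9, 0.1, 0.0],
    "canine health guide": [0.8, 0.3, 0.1],
    "stock market update": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.2, 0.0]  # stands in for the embedded query "dog"

# Score every document against the query, then sort from most to least similar.
ranked = sorted(
    document_embeddings,
    key=lambda title: cosine_similarity(query_embedding, document_embeddings[title]),
    reverse=True,
)
print(ranked)
```

With these toy vectors, the two dog-related documents rank ahead of the finance one, even though none of the titles contains the word "dog".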

In clustering, cosine similarity helps group sentences that have related meaning. In recommendation systems, it helps match users to items by comparing their preference vectors.

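Even when generating answers, LLMs rely on closely related vector comparisons: attention scores and next-token scores are computed from dot products between vectors, which is the same calculation as cosine similarity without the division by the vectors' lengths. Cosine similarity gives us a simple but powerful way to measure closeness of meaning.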

Limits of Cosine Similarity

While cosine similarity is powerful, it has limits. It depends heavily on the quality of embeddings. If embeddings fail to capture meaning well, similarity scores may not reflect real-world closeness.

Also, cosine similarity only measures direction. Sometimes, magnitude contains useful information too. For example, a sentence embedding might have a length that reflects confidence. By ignoring it, cosine similarity may lose part of the picture.
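A small sketch makes this blind spot visible. The two vectors below are arbitrary examples that point in the same direction but differ in length by a factor of ten.

```python
import math

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))

short_vec = [1.0, 2.0, 3.0]
long_vec = [10.0, 20.0, 30.0]  # same direction, ten times the magnitude

similarity = cosine_similarity(short_vec, long_vec)
length_ratio = magnitude(long_vec) / magnitude(short_vec)
print(similarity, length_ratio)
```

The similarity is 1 (up to floating-point rounding) even though one vector is ten times longer, so any information carried by the length is invisible to this measure.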

Still, despite these limits, cosine similarity remains one of the most widely used methods in natural language processing.

Why It Matters for LLMs

Cosine similarity is not just a math trick. It is a bridge between human language and machine understanding. It allows a model to treat meaning as geometry, turning questions and answers into points in space.

Without cosine similarity, embeddings would be less useful, and tasks like semantic search, clustering, and ranking would be harder. By reducing the problem to measuring angles, we make meaning measurable and usable.

When you search the web, chat with an AI, or use a recommendation engine, cosine similarity or a close relative of it is often at work behind the scenes.

Conclusion

Cosine similarity explains how LLMs judge the closeness of meaning between words, sentences, or even whole documents. It works by comparing the angle between vectors, not their length, which makes it ideal for text. With embeddings, cosine similarity becomes the foundation of semantic search, clustering, recommendations, and many other tasks in natural language processing.

The next time an AI gives you an answer that feels “close enough,” remember that a simple mathematical idea, measuring the angle between two arrows, is doing much of the heavy lifting.

Hope you enjoyed this article. Sign up for my free AI newsletter TuringTalks.ai for more hands-on tutorials on AI. You can also visit my website.

How to Write a PHP Script to Calculate the Area of a Triangle

AYUSH MISHRA — Thu, 19 Jun 2025 15:33:06 +0000

In programming, being able to find the area of a triangle is useful for many reasons. It can help you understand logic-building and syntax, and it’s a common programming problem used in school assignments. There are also many real-world applications, such as computer graphics, geometry-based simulations, or construction-related calculations.

In this article, we’ll look at a common problem: we are given the dimensions of a triangle, and our task is to calculate its area. You can calculate the area of a triangle using different formulas, depending on the information you have about the triangle. Here, you’re going to learn how to do it using PHP.

After reading this tutorial:

You will understand the basic logic behind calculating the area of a triangle.
You will know how to write PHP code that calculates the triangle’s area using pre-defined and user-entered values.
You will know how to apply this logic in small projects and assignments.

Prerequisites
Find the Area of a Triangle Using Direct Formulas
Find the Area of a Triangle Using the Base and Height Approach
Find the Area of a Triangle Using Heron's Formula
Find the Area of a Triangle Using Two Sides and Included Angle (Trigonometric Formula)
Conclusion

Prerequisites

You’ll understand this guide more easily if you have some knowledge about a few things:

Basic PHP

You’ll need to know basic PHP syntax to fully understand the problem. If you know how to write a simple echo statement or create a variable in PHP, then you should be good to go.

Local PHP Environment

To run the PHP code successfully, you should have a local PHP development environment, such as XAMPP or WAMP, on your machine. You can also use online PHP editors like PHP Fiddle or OnlineGDB to run a PHP script without any installation.

In this tutorial we are going to explore three approaches to determine the area of the triangle in PHP based on the amount of information available about the triangle.

Base and Height Formula Approach: This approach applies when you know the length of the base and the perpendicular height measured from that base.
Heron’s Formula: This approach is used to calculate the area of a triangle when you have the lengths of all three sides.
Trigonometric Formula Approach: This approach applies when you have the lengths of two sides and the included angle between them.

First, let’s go back to math class and use some direct formulas to find the area.

Find the Area of a Triangle Using Direct Formulas

Example 1:

In this first example, you’re given the input base and height of a triangle. You have to return the area of the triangle. For this example, you’ll use a direct formula to calculate the area of the triangle.

Input:

Base = 5,

Height = 10

You can calculate the area of the triangle using the formula:

$$\text{Area} = \frac{\text{Base} \times \text{Height}}{2}$$

So, if you plug in the values you have, you get: (5 * 10) / 2 = 25.

Output:

Area = 25

Example 2:

In this second example, you’re given the length of two sides of a triangle and one angle between them. You have to return the area of the triangle. In this example, you’ll use another direct formula to calculate the area of the triangle.

Input:

Side A = 7, Side B = 9, Angle between them = 60°

In this case, you’ll use the formula:

$$\text{Area} = \frac{1}{2} \times A \times B \times \sin(\text{Angle})$$

Then just substitute in the values you’ve been given to find the area.
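As a quick check of the arithmetic, the substitution can be written out step by step (values rounded to two or four decimal places):

$$\text{Area} = \frac{1}{2} \times 7 \times 9 \times \sin(60^\circ) = 31.5 \times \frac{\sqrt{3}}{2} \approx 31.5 \times 0.8660 \approx 27.28$$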

Output:

Area = 27.28 (approximately)

Now let’s look at some different approaches to finding the area of a triangle using PHP.

Find the Area of a Triangle Using the Base and Height Approach

This is the simplest and most direct approach for calculating the area of a triangle when you know the base and height. In this approach, you’ll directly put values in the formula and find the area of the triangle – but you’ll do it with PHP code.

First, define the base and height of the triangle. Then apply the formula for the area of the triangle. As we saw above, the formula for the area of a triangle is:

$$\text{Area} = \frac{\text{Base} \times \text{Height}}{2}$$

After calculating the area of the triangle, output the answer.

Alright, so here’s how we can implement that in PHP:


<?php
// Define the base and height
$base = 5;
$height = 10;

// Calculate the area
$area = ($base * $height) / 2;

// Output the result
echo "The area of the triangle is: " . $area . " square units.";
?>

Output:

The area of the triangle is: 25 square units.

In the above code, we first store the base and height of the triangle in two variables. Then we plug those values into the area formula. PHP calculates the area of the triangle and displays the answer.

Time Complexity: In the above approach, we are using the direct formula to calculate and return the area of the triangle, so the time complexity will be constant at O(1). The constant time complexity is efficient as it will remain constant, regardless of the size or values of the base and height.

Space Complexity: The Space Complexity will be O(1). The space used by the above program is constant, which ensures minimal use of memory. This space complexity is ideal in environments where memory efficiency is a priority.

We use the above approach when we have the length of the base and the height of the triangle, whether directly given or easily measured. The formula holds for any triangle, but it is especially convenient for right-angled triangles, where the two perpendicular sides serve as the base and the height.

Find the Area of a Triangle Using Heron's Formula

Heron’s formula is named after a Greek mathematician named Heron of Alexandria. Heron’s formula is useful when you know the lengths of all three sides of the triangle and you want to calculate the area without needing the height. This formula works for any type of triangle, including scalene triangles (triangles with sides of all different lengths).

Here’s Heron’s formula to calculate the area of a triangle:

$$\text{Area} = \sqrt{s(s-a)(s-b)(s-c)}$$

Where:

s = (a + b + c) / 2 is the semi-perimeter of the triangle.
a, b, and c are the lengths of the sides.

First, we define the three sides of the triangle. Then, we check all three conditions of the Triangle Inequality Theorem, which states that the sum of any two sides of a triangle must be greater than the third side. If all three conditions hold, the given sides can form a valid triangle.

We can calculate the semi-perimeter of the triangle using the formula s = (a + b + c) / 2. Then we can apply Heron's formula to calculate the area. After calculating the area, we output the answer.
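As a worked example, here is the calculation by hand for sides a = 7, b = 9, and c = 10, the same values used in the code that follows (final result rounded to three decimal places):

$$s = \frac{7 + 9 + 10}{2} = 13$$

$$\text{Area} = \sqrt{13 \times (13 - 7) \times (13 - 9) \times (13 - 10)} = \sqrt{13 \times 6 \times 4 \times 3} = \sqrt{936} \approx 30.594$$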

Here’s how you can implement this in PHP:


<?php
// Define the sides of the triangle
$a = 7;
$b = 9;
$c = 10;

// Check if the sides form a valid triangle using the Triangle Inequality Theorem
if (($a + $b > $c) && ($a + $c > $b) && ($b + $c > $a)) {

    // Calculate the semi-perimeter
    $s = ($a + $b + $c) / 2;

    // Calculate the area using Heron's formula
    $area = sqrt($s * ($s - $a) * ($s - $b) * ($s - $c));

    // Output the result
    echo "The area of the triangle is: " . $area . " square units.";

} else {
    // If the sides can't form a valid triangle
    echo "The given sides do not form a valid triangle.";
}
?>

Output (rounded to three decimal places):

The area of the triangle is: 30.594 square units.

In the above code, we first create three variables to store the lengths of the triangle’s sides, and check whether the given sides form a valid triangle using the Triangle Inequality Theorem. Then we calculate the semi-perimeter using the formula: s = (a + b + c) / 2. We put the value of the semi-perimeter and the lengths of all sides into Heron’s formula to calculate the area. Finally, the area of the triangle is printed.

Time Complexity: There is a total fixed number of operations such as addition, subtraction, multiplication, and square root. These operations don’t depend on input size as they are performed only a fixed number of times. This means that the time complexity is constant O(1).

Space Complexity: We have used a fixed number of variables to calculate the area of the triangle. We have not used any additional data structures such as arrays or objects. The memory usage in the program is constant, which is better for low-memory environments. The space complexity is constant O(1).

This approach works best when the lengths of all three sides are given. It is most useful for scalene or isosceles triangles where the height is not directly given, but it works for any type of triangle: scalene, isosceles, or equilateral.

Find the Area of a Triangle Using Two Sides and Included Angle (Trigonometric Formula)

In this approach, we will see a different variation of the problem. When you know two sides of a triangle and the included angle between them, you can calculate the area using this formula:

$$\text{Area} = \frac{1}{2} \times a \times b \times \sin(\theta)$$

Where:

a and b are the lengths of the two sides.
θ is the included angle between the two sides, measured in degrees or radians.

Using the above formula, you can calculate the area of a triangle without needing its height. First, you define the two sides of the triangle and the angle between them. Then you convert the angle from degrees to radians if needed (in PHP, you can use deg2rad() to convert degrees to radians). Then you apply the formula.

After calculating the area of the triangle, output the result.

Here’s how to implement this in PHP:


<?php
// Define the two sides and the included angle
$a = 7;
$b = 9;
$angle = 60; // Angle in degrees

// Convert the angle from degrees to radians
$angle_in_radians = deg2rad($angle);

// Calculate the area using the formula
$area = 0.5 * $a * $b * sin($angle_in_radians);

// Output the result
echo "The area of the triangle is: " . $area . " square units.";
?>

Output (rounded to three decimal places):

The area of the triangle is: 27.280 square units.

Explanation:

In the above case, we’re using the formula:

Area of Triangle = 1/2 × a × b × sin(θ)

And we’re substituting the following values into the formula:

Area = 1/2 × 7 × 9 × sin(60°) ≈ 27.280

In the code, we declared two variables to store the lengths of the two sides of the triangle, and the variable $angle holds the included angle in degrees. We used deg2rad(), a built-in PHP function that converts an angle from degrees to radians. Then, we applied the formula: Area = 1/2 × 7 × 9 × sin(60°). PHP stores the final answer in the $area variable.

Time Complexity: We are using a direct formula to calculate the area of a triangle when the lengths of two sides and the angle between them are given, so the time complexity is constant, O(1).

Space Complexity: Similarly, it does not take any extra space or use any data structures. It uses only a fixed number of variables, which is why the space complexity is constant, O(1).

This approach is perfect for the problem in which two sides and the included angle (angle between those sides) are known. You can use it when you cannot easily calculate the height of the triangle. This problem has real-life applications in geometry problems, CAD applications, or physics simulations. This method is very accurate and doesn’t require the length of all sides.

Conclusion

In this article, you’ve learned how you can calculate the area of a triangle, both manually and using PHP. You have seen different approaches and learned about which one is best given the information you have. First, we discussed the base and height approach, then looked at Heron’s formula, and finally examined how to handle things when two sides and the included angle are given.

Understanding the logic behind each of these approaches helps you choose the right one based on the given data.

And if you'd like to support me and my work directly so I can keep creating these tutorials, you can do so here. Thank you!

The Architecture of Mathematics – And How Developers Can Use it in Code

Tiago Capelo Monteiro — Fri, 23 May 2025 15:06:16 +0000

"To understand is to perceive patterns." - Isaiah Berlin

Math is not just numbers. It is the science of finding complex patterns that shape our world. This means that to truly understand it, we need to see beyond numbers, formulas, and theorems and understand its structures.

The main goal of this article is to show how math is just like a growing tree of ideas. I want to show that math is a living system of logic, not just formulas to memorize. With analogies, history, and code examples, I want to help you understand math more deeply and how you can apply it to programming.

I’ve also included some code examples here to help you connect theory and practice. I show them to demonstrate how math ideas are applied to real problems. Whether you are new to more advanced math or are more experienced, these code examples will help you understand how to apply math in programming.

This link between theory and application reflects my own studies. I am a final-year undergraduate student in Electrical and Computer Engineering at NOVA FCT, one of the best engineering faculties in Portugal.

My engineering degree is heavy on math and physics. This is because a solid grasp of math is key to understanding electronics, telecommunications, control theory, and other areas of engineering.

Here’s a brief overview of some of the math and physics subjects I’ve learned:

Partial Differential Equations (PDEs): These equations model real-world phenomena, from heat diffusion to the economy of a country.
Harmonic Analysis (Fourier & Laplace): Integral transforms like the Fourier and Laplace transform allow us to understand problems in new domains.
Complex Analysis: Extending calculus into the complex plane gives rise to powerful tools used in physics and engineering.
Numerical Analysis: When analytical solutions are impossible or inefficient, numerical methods provide computer-based approximations. This is crucial for real-world applications.
Control and Signal Theory: These areas show us how to design stable systems like rockets, trains, and robots.
Physics: Courses in Classical Mechanics and Electromagnetism helped bridge theoretical math to physical laws.

During my years of study, besides technical skills, I’ve developed a deeper understanding of how the world works and the structure of the field of mathematics. And I’ve started to find patterns in how math is a framework of interconnected logic.

In this article, we’ll explore:

Simple Analogy: The Tree of Mathematics
The Structure and History of Mathematics
A Tree Example: Foundations of Relativity by Albert Einstein
The Biggest Paradox of Math, Discovered by Kurt Gödel
What About Applied Math and Engineering?
Code Examples – Analytical and Numerical Approaches
The Impact of a Grand Unified Theory of Mathematics
A Final Lesson From History

Simple Analogy: The Tree of Mathematics

Imagine math as a vast tree growing forever.

The roots of the tree are the foundations of mathematics: logic and set theory. From this foundation emerge the main basic fields of math: arithmetic, algebra, geometry, and analysis.

As the tree divides further and further into more branches, new, more complex subfields start to appear, like topology, abstract algebra, and complex analysis. Sometimes the branches are connected to each other.

And remember: this tree is always growing in many directions. From branches creating new branches to branches connecting to other branches. Little by little, it grows.

Throughout history, there have been times that, due to some big scientific discoveries, parts of the math tree started to grow very fast. Other times, decades and even centuries passed without many new branches. This is the case for imaginary numbers, for example.

And you might wonder: How many more branches and connections between them will keep appearing?

The Structure and History of Mathematics

The first mathematical ideas appeared independently across ancient civilizations. For example:

India’s invention of zero
Islamic algebraic advances
Greek geometric rigor

Over time, many great mathematicians developed these ideas and shared them through their writing and lectures.

Eventually, these new ideas were shared widely with new generations and these new generations created new math based on old math.

This is how new branches are continuously born from previous branches of the tree of mathematics.

And this is why Isaac Newton wrote, in a letter to Robert Hooke in 1675:

If I have seen further, it is by standing on the shoulders of giants

He meant that by working from previous knowledge, he was able to create and (re)discover new ideas.

Yet, the real power of math lies in practicing it over and over and understanding it more and more deeply. As one of my professors once explained:

More important than knowing the theorems is knowing the ideas behind them and the history of how they were created.

Very often, to solve problems, it is necessary to think in terms of first principles and build from there. Math teaches exactly that. In this way, math is not just an academic subject. It is a language spoken by scientists and engineers around the globe.

By having it well preserved and shared, it is still possible to create new math from previous ideas. And it’s possible for the big tree to continue growing based on previous branches or nodes.

A Tree Example: Foundations of Relativity by Albert Einstein

Albert Einstein created the general and special theories of relativity. These have big consequences nowadays:

GPS and Global Communication
Advancements in Satellite Telecommunications
Space Exploration and Satellite Launches

But this was only possible through the unification of geometry with calculus, called differential geometry. The evolution of differential geometry happened over the centuries, thanks to many great mathematicians. Below are some of them, but this is not a complete list:

Euclid (circa 300 BCE): Contributed to geometry, laying the groundwork for later mathematical systems
Archimedes (circa 287–212 BCE): Pioneered the understanding of volume, surface area, and the principles of mechanics
René Descartes (1596–1650): Developed Cartesian coordinates and analytical geometry
Isaac Newton (1642–1727) & Gottfried Wilhelm Leibniz (1646–1716): Newton’s laws of motion and gravitation, alongside Leibniz’s development of calculus, formed the basis of classical mechanics that Einstein sought to extend and modify in his theory of relativity.
Leonhard Euler (1707–1783): Contributed to the development of differential equations, which are essential in the mathematical foundations of physics.
Gaspard Monge (1746–1818): The father of differential geometry and pioneer in descriptive geometry
Carl Friedrich Gauss (1777–1855): Made groundbreaking advances in geometry, including the concept of curved surfaces.
Bernhard Riemann (1826–1866): Introduced Riemannian geometry, a branch of differential geometry.

Once again, as Isaac Newton wrote, in a letter to Robert Hooke in 1675:

If I have seen further, it is by standing on the shoulders of giants.

Albert Einstein saw what no one else in his time saw, thanks to these great math giants and countless others.

The Biggest Paradox of Math, Discovered by Kurt Gödel

The biggest paradox in math, in my opinion, is what Kurt Gödel discovered. His early 20th century research revealed a limitation within this cycle.

This paradox – that is, his incompleteness theorems – shows that in any consistent formal system capable of expressing simple arithmetic, there will always be true mathematical statements that cannot be proven within the system itself.

This means that every such system has limits on what it can prove. For mathematicians, this means that the tree will never be completed: there are statements that are true but cannot be formally proven within the system.

So no matter how many mathematicians work in the field or how much AI is used to find new mathematics, these limitations will always exist. Some statements cannot be proven from a given set of axioms, and for some of them the best we have is strong evidence from numerical checks and other non-rigorous methods.

What About Applied Math and Engineering?

Applied math and engineering involve interpreting pure math ideas in real-world scenarios. In many cases, they combine several math ideas at once. Let’s consider some examples:

Principal component analysis (PCA) is a widely used tool in data science. It is a mixture of linear algebra (the eigenvalues and eigenvectors of the data’s covariance matrix) with optimization (keeping the components with the largest eigenvalues, which capture the most variance) in order to represent a dataset with fewer dimensions.
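To make this recipe concrete, here is a minimal NumPy sketch of PCA on a made-up two-feature dataset. The data, the random seed, and the variable names are assumptions chosen for illustration, and the steps follow the textbook covariance route rather than any particular library’s implementation.

```python
import numpy as np

# Hypothetical toy data: 200 points whose second feature is almost a multiple of the first
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# 1. Center the data so every feature has mean zero
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features (linear algebra)
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh is meant for symmetric matrices and returns ascending eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort from largest to smallest eigenvalue (the "optimization" part: keep the most variance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Project the data onto the first principal component: 2 features become 1
X_reduced = X_centered @ eigenvectors[:, :1]

explained = eigenvalues / eigenvalues.sum()
print("Share of variance per component:", explained)
print("Reduced shape:", X_reduced.shape)
```

Because the two features are almost perfectly correlated, nearly all of the variance ends up in the first component, which is why dropping the second one loses very little information.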

In machine learning, logistic regression is a mixture of calculus with statistics and probability.

In harmonic analysis, Laplace, Fourier, and Z-transforms are a way to see the same thing in a new domain to get new insights. In this case, integrals (or sums, in the discrete case of the Z-transform) are used to make this mapping.
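As a small illustration of seeing the same signal in a new domain, the sketch below uses NumPy’s fast Fourier transform, which is the discrete, sum-based counterpart of the Fourier integral. The sample rate and the two sine frequencies are made-up values for the example.

```python
import numpy as np

sample_rate = 1000                       # samples per second (assumed for the example)
t = np.arange(sample_rate) / sample_rate  # one second of time stamps

# Time domain: a strong 50 Hz sine plus a weaker 120 Hz sine
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Frequency domain: magnitude of each frequency component
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

dominant = freqs[np.argmax(spectrum)]
print("Dominant frequency (Hz):", dominant)
```

In the time domain the two sines are tangled together, but in the frequency domain they show up as two separate peaks, and the tallest one sits at 50 Hz.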

In deep learning, neural networks are essentially many matrices that are multiplied together and updated so that they adapt to model a dataset representing a system. Activation functions add non-linearity between the matrix multiplications. Backpropagation computes how much each matrix value contributed to the error (the gradients), and a gradient descent-based optimization method uses those gradients to update all the matrix values.

I have actually written an article where I teach why activation functions are important if you want to check it out.
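To show how these pieces fit together, here is a minimal sketch of a tiny neural network trained with plain NumPy. The network size, the learning rate, and the target function (a sine curve) are all assumptions picked for the example, and this is not how a production framework would implement training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: learn y = sin(x) on 50 points
X = np.linspace(-2, 2, 50).reshape(-1, 1)
y = np.sin(X)

# Two weight matrices (plus biases): 1 input -> 8 hidden units -> 1 output
W1 = rng.normal(scale=0.5, size=(1, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.01
losses = []

for step in range(2000):
    # Forward pass: matrix multiplications with a tanh activation in between
    hidden = np.tanh(X @ W1 + b1)
    pred = hidden @ W2 + b2
    error = pred - y
    losses.append(float(np.mean(error ** 2)))

    # Backpropagation: the chain rule gives the gradient of the loss for every parameter
    grad_pred = 2 * error / len(X)
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0, keepdims=True)
    grad_hidden = grad_pred @ W2.T
    grad_pre = grad_hidden * (1 - hidden ** 2)   # derivative of tanh
    grad_W1 = X.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0, keepdims=True)

    # Gradient descent: move every parameter a small step against its gradient
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print("Loss at start:", losses[0])
print("Loss at end:  ", losses[-1])
```

The loss at the end should be lower than at the start, which is all "learning" means here: the matrix values have been nudged so that the network’s output is closer to the data.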

But the best example of this fusion of math with engineering is in control theory.

Control theory is the study of how to make dynamic systems behave in a desired way, usually by using feedback. Trains, cars, and airplanes all rely on it, and it is present in nearly all modern electronic devices. In electric circuits, control theory is also used heavily to guarantee circuit stability in the face of electric disturbances.

So as you can probably start to see, many of the tools we now have are combinations of many pure math ideas. In essence, applied math is the application of pure math as "ingredients" in "recipes" to solve problems.

So, we’ve explored the structure and evolution of mathematics. Yet, it is important to see how these ideas can be applied in real life. Pure math makes the framework, and applied math applies that framework to solve problems. To understand this, we’ll examine two code examples that show how you can use math ideas as programming tools.

Code Examples – Analytical and Numerical Approaches

These code examples demonstrate a couple of ways you can use Python to solve math equations.

In the first code example, we’ll solve the problem in the same way that kids in school solve math exercises: essentially, by hand with a pencil, moving variables from one side of the equation to the other to find their values. In the second example, we’ll solve the problem using numerical analysis.

Example 1: Solve a Problem Analytically

When we solve math problems analytically, like we did in school, we are manipulating symbols to get exact values. Often these symbols are x, y and z. In Python, we can do this using the SymPy library:

from sympy import symbols, Eq, solve

x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)

solution = solve((eq1, eq2), (x, y))
print(solution)

Essentially, we are finding x and y based on this system of equations:

$$\begin{align*} 2x + 3y &= 6 \\ -x + y &= 1 \end{align*}$$

Which gives us the following result:

{x: 3/5, y: 8/5}

Or:

x = 0.6
y = 1.6

When we say that we’re solving this analytically, it means that we’re finding an exact mathematical solution using formulas or equations.
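One way to see what "exact" means here is to substitute the solution back into both equations. The sketch below repeats the SymPy setup from above and only adds the check; the names lhs1 and lhs2 are just illustrative.

```python
from sympy import Eq, Rational, solve, symbols

x, y = symbols("x y")
solution = solve((Eq(2*x + 3*y, 6), Eq(-x + y, 1)), (x, y))

# Substitute the solution back into the left-hand side of each equation
lhs1 = (2*x + 3*y).subs(solution)
lhs2 = (-x + y).subs(solution)
print(lhs1, lhs2)                               # 6 1

# The results are exact fractions, not rounded decimals
print(solution[x] == Rational(3, 5))            # True
print(float(solution[x]), float(solution[y]))   # 0.6 1.6
```

The left-hand sides come out as exactly 6 and 1, with no rounding involved, because SymPy works with the fractions 3/5 and 8/5 rather than with decimal approximations.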

But many times, problems are harder and cannot be solved just by moving symbols from one side of the equation to the other.

Sometimes, there can be so many symbols and transformed versions of them, with things like derivatives and integrals, that it can become very hard to manage and takes a lot of time.

For this reason, there is an area of mathematics called numerical analysis, which is devoted to finding approximate solutions to problems that are already stated as mathematical formulas. It makes these problems faster, and sometimes simply possible, to solve. And this is the method we will explore next.
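As a quick taste of the numerical approach, consider the equation cos(x) = x. There is no way to isolate x with algebra, but an approximation is easy to compute. The sketch below uses Newton’s method, a standard numerical technique; the helper name newton, the starting guess, and the tolerance are assumptions made for this example.

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f, starting from x0, using Newton's method."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # how far the tangent line says we are from the root
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Solve cos(x) = x by finding the root of f(x) = cos(x) - x
root = newton(
    f=lambda x: math.cos(x) - x,
    df=lambda x: -math.sin(x) - 1,
    x0=1.0,
)

print("Approximate solution:", root)
print("Check, cos(root) - root:", math.cos(root) - root)
```

The result is roughly 0.739. It is not an exact symbolic answer, but the check on the last line shows that it satisfies the equation to within floating-point precision, which is usually all an engineering problem needs.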

Example 2: Solve Numerically (Approximation)

We’ll now use SciPy to solve a larger system, with five equations and five unknowns, using numerical methods:

import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]])

b = np.array([12, 5, 7, 9, 10])

solution = solve(A, b)

print(solution)

In this code example, this line of code:

solution = solve(A, b)

Uses the solve function from the SciPy library’s linalg module:

from scipy.linalg import solve

It’s a function that finds the vector x in the equation A⋅x = b, where A is a square matrix (a square grid of numbers) and b is a vector (a list of numbers). Running the code gives us the following:

[ 1.35022026 -0.79955947 -1.17180617  3.14317181 -0.83920705]
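Since a numerical solution is an approximation, it is good practice to check it. A simple check, sketched below, is to multiply A by the solution and compare the result with b. The matrix and vector are the same as above; np.allclose compares the two within a small tolerance instead of demanding exact equality.

```python
import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]])

b = np.array([12, 5, 7, 9, 10])

x = solve(A, b)

# Residual: how far A @ x is from b (ideally close to zero everywhere)
residual = A @ x - b
print("Largest residual:", np.max(np.abs(residual)))
print("Solution checks out:", np.allclose(A @ x, b))
```

The residual will usually not be exactly zero because of rounding inside the computer, but it should be tiny compared with the numbers in b.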

Now imagine, in this simple case, that a matrix like A could represent the traffic flow between cities or intersections, and b could represent the traffic entering or leaving each city.

By solving the system, it could help us determine the distribution of traffic between cities to meet desired traffic conditions.

Of course, these types of problems are far more complex in real life. But to understand and solve the big problems, you need to first understand the smaller problems.

And by the way, a system of linear equations can be written as a matrix equation. We represent the system’s coefficients as a matrix because it makes the system’s properties clearer and easier to study.

With matrices, it is easier to make calculations and to use linear algebra to check the characteristics of the system, such as whether it has a unique solution.

In essence, a matrix can represent a system of linear equations. In turn, systems of equations can model real-life phenomena like the economy of a country or the weather.
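To see the connection, here is the small system from Example 1 (2x + 3y = 6 and -x + y = 1) rewritten in matrix form and solved numerically. This sketch uses NumPy’s np.linalg.solve; the variable names are just for illustration.

```python
import numpy as np

# Each row holds the coefficients of x and y in one equation:
#    2x + 3y = 6
#   -1x + 1y = 1
A = np.array([[2.0, 3.0],
              [-1.0, 1.0]])
b = np.array([6.0, 1.0])

solution = np.linalg.solve(A, b)
print(solution)            # approximately [0.6 1.6]

# A nonzero determinant tells us the system has exactly one solution
print(np.linalg.det(A))    # approximately 5.0
```

The answer matches the exact result from SymPy (x = 3/5, y = 8/5), and the determinant is an example of a property that is easy to read off once the system is written as a matrix.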

If you want to know more, I wrote an entire article on numerical analysis that you can check out.

The Impact of a Grand Unified Theory of Mathematics

Despite the biggest paradox in mathematics, what would happen with a Grand Unified Theory of Mathematics?

Remember that Gödel’s theorems tell us that there are things that are true that are impossible to formally prove, and we just need to accept it. But even with this limitation, it may still be possible to unify large parts of math.

This is what the Langlands program is trying to do: it is an attempt to interconnect some of the largest parts of the big tree of math to uncover new patterns.

With a Grand Unified Theory of Mathematics, we would be able to understand how every branch of the tree connects with the others and all the relationships between them.

What is the value of this big unification for society?

By studying history, we can find patterns. The unification of various fields has created many massive impacts on society, such as:

In the 19th century, James Clerk Maxwell united the fields of electricity and magnetism with his famous Maxwell’s equations. This allowed the creation of radios and electric grids around the globe. In turn, it served as a foundation for much of the technological progress of the 20th and 21st centuries.
In the 20th century, the unification of algebra with logic led to the rise of digital systems. In turn, digital systems gave rise to processors and the evolution of computers to the modern laptop.
Also in the 20th century, the unification of probability and communication led to information theory. This became the foundation for the internet. This unification was carried out by a great mathematician called Claude Shannon.

In the end, a Grand Unified Theory of Mathematics could be one of the biggest achievements in modern society.

It could lead to new discoveries in physics, such as in string theory or quantum gravity, where deep mathematical structures are needed to create new physics. In AI, it could help unify all machine learning models in a common architecture. This would help accelerate the development of new AI models. It could also open the door to new cryptographic methods and material science advances, revealing, with math, the deep patterns still not found in these fields.

Just as uniting electricity and magnetism led to modern technology, a unified math framework would lead to a wave of innovation.

A Final Lesson From History

From Greek geometry to AI, math has grown like a tree over centuries. By understanding its structure, it is possible to see its role in finding the patterns of our universe. I hope I was able to make you see math in this way.

In addition, we can conclude that the unification of scientific fields makes the foundations for the creation of new innovations to help society go forward. Many profound societal transformations only came to be thanks to abstract math ideas. When these are shared and refined, they become the hidden architecture of progress in society. Innovation begins when disconnected ideas are united, well-linked, and widely shared.

Find the full code here.

How to Write Math Equations in Google Docs

Vikram Aruchamy — Fri, 16 May 2025 15:42:41 +0000

Math equations are a critical part of academic papers, research reports, and technical documentation. While LaTeX is widely used for professional typesetting, Google Docs offers a robust set of features for inserting and formatting math equations and also supports LaTeX-style input.

Whether you're a student submitting a math assignment or a professional documenting formulas, Google Docs provides multiple ways to insert and format equations efficiently.

In this article, you'll learn how to write math equations in Google Docs using different methods, including using Google Docs’ built-in equation editor and typing LaTeX-style commands directly, inserting complex equations with the help of the Auto-LaTeX add-on, and copying math equations from ChatGPT to Google Docs without losing formatting by using the ChatGPT to Google Docs or PDF Chrome extension.

How to Write Equations Using the Built-in Equation Editor
How to Write Equations Using LaTeX Commands
How to Use Auto Latex Add-on for Writing Advanced Math Equations
How to Copy Math Equations from ChatGPT to Google Docs
Watch: How to Write Equations in Google Docs
Tips for Formatting Math Equations in Google Docs
Conclusion

How to Write Equations Using the Built-in Equation Editor

Google Docs has a built-in equation editor that makes it easy to insert mathematical symbols and expressions.

To insert an equation editor box:

Open your Google Docs document.
Go to the top menu and click Insert → Equation.
An equation editor will appear, and a new toolbar will show up with common math symbols like fractions, exponents, Greek letters, and more.

Alternatively, you can use the following keyboard shortcuts to insert an equation editor box.

Windows/Linux: Alt + I, then E
Mac: Control + Option + I, then E

This shortcut quickly opens the equation editor without clicking through menus.

Toolbar Symbols:

Once the toolbar appears, you’ll find buttons for:

Greek letters
Miscellaneous operations
Relations
Math operations
Arrows

The equation editor box and a toolbar look like the following:

Now let’s learn how to write equations using the equation editor with a practical example.

Example: Typing the Quadratic Formula

Follow these steps to insert the following quadratic formula in Google Docs:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Go to Insert → Equation to insert an equation editor and enable the equation toolbar.
Type x=
Click the Math Operations dropdown (the one with templates like square roots, brackets), then select the fraction template. This inserts a placeholder with two parts: a numerator and a denominator.
Click inside the numerator field. Begin by typing -b.
Now insert the ± (plus-minus) symbol. To do this:
- Click the Miscellaneous Operations dropdown
- Select the ± symbol from the list.
  Your numerator should now show: -b ±
After the ± symbol, insert a square root:
- Go back to the Math Operations dropdown and select the square root template.
- Inside the root, type b^2 - 4ac.
  - Use ^ to enter exponents. For example, b^2 will be rendered as b².
Move to the denominator field and type 2a.

Now your full equation should appear as:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The equation will be properly formatted using Google Docs’ equation rendering, making it easy to read and mathematically accurate. You can continue typing more text below or beside the equation as needed – it behaves like any other element in your document.

This approach is useful for inserting neatly formatted equations without relying on add-ons or external tools. It’s especially helpful for students, teachers, and professionals preparing technical documents directly in Google Docs.

How to Write Equations Using LaTeX Commands

If you're familiar with LaTeX, you can take advantage of Google Docs’ support for a subset of LaTeX-style commands inside the built-in equation editor. This can greatly speed up the process of entering complex mathematical expressions, especially if you're already comfortable with LaTeX syntax.

How to Use LaTeX Commands in Google Docs

Open your Google Docs document.
Go to Insert → Equation to activate the equation toolbar and equation input field.
Click inside the equation box. Instead of using the toolbar buttons, type LaTeX-style commands directly.
As you type, Google Docs will automatically render the commands into formatted math once you press space or enter after each command or expression.

Commonly Supported LaTeX Commands in Google Docs:

Instruction	Result
To insert a fraction	`\frac{a}{b}` → 𝑎⁄𝑏
To insert a square root	`\sqrt{x}` → √𝑥
To insert Greek letters like α, β	`\alpha, \beta` → α, β
To insert an integral with limits	`\int_a^b f(x)\,dx` → ∫ᵃᵇ 𝑓(𝑥) 𝑑𝑥
To insert x superscript 2	`x^2` → 𝑥²
To insert x subscript 1	`x_1` → 𝑥₁

Type these commands in the equation box, and when you press space or enter, they will be converted to properly formatted mathematical notation.

Example: Typing the Quadratic Formula Using LaTeX Commands

Let’s walk through how to enter the following quadratic formula using LaTeX-style commands:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Steps:

Insert the equation box: Go to Insert → Equation.
In the equation input area, type the following:

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

\frac creates a fraction.
-b is the numerator’s first term.
\pm inserts the plus-minus symbol.
\sqrt creates a square root.
b^2 formats b squared.
- 4ac is written normally inside the square root.
2a is the denominator.

As you type, press space or enter after each LaTeX command. Google Docs will automatically convert the code into properly formatted math notation.

After rendering, the equation will appear as:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

This method is ideal for users who prefer keyboard-based input over clicking toolbar icons. It also allows you to enter complex expressions faster and more accurately, especially if you're familiar with standard LaTeX syntax.

Notes:

Not all LaTeX features are supported in Google Docs. The supported commands are limited to basic math formatting, Greek letters, and common symbols.
Make sure to press space after each LaTeX command so that Docs knows to render it.

How to Use Auto Latex Add-on for Writing Advanced Math Equations

When generating mathematical content using tools like ChatGPT, you'll notice that equations are rendered visually on the webpage, but behind the scenes they’re created using LaTeX code. So when you copy content from ChatGPT into Google Docs, the equations come through as raw LaTeX code rather than rendered math expressions.

For example, a quadratic formula provided by ChatGPT might look like this when pasted into your document:

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

While this format is ideal for precision, Google Docs doesn’t render raw LaTeX in the document body by default.

This is where the Auto-LaTeX Equations add-on becomes essential, especially if you're moving content from ChatGPT to Google Docs. It’s also incredibly useful when importing LaTeX-based documents into Google Docs, such as content originally written in Overleaf or other LaTeX editors.

Instead of manually reformatting equations, the add-on automatically renders LaTeX code into properly formatted math equations, preserving the typesetting and structure you’d expect from a LaTeX environment.

What is Auto-LaTeX Equations?

Auto-LaTeX Equations is a free and open-source Google Docs add-on that scans your document for LaTeX expressions and converts them into properly formatted equations.

It recognizes LaTeX code wrapped in these delimiters:

Inline: $$ ... $$
Display: \[ ... \]

Once detected, it renders the equations seamlessly within your document, eliminating the need to retype or manually format them.

Paste your LaTeX expression into the Google Docs document. Make sure the expression is enclosed using one of the supported delimiters:

$$ ... $$ or \[ ... \]
Open the add-on sidebar by clicking Extensions → Auto-LaTeX Equations → Start.
Once the sidebar opens, you’ll see a dropdown labeled “Delimiters” and a button called “Render Equations.”
Select the delimiter you used when enclosing your LaTeX equations – for example, $$ or \[ \].
Click the “Render Equations” button.

The add-on will automatically scan your document and convert all valid LaTeX expressions into properly formatted equation images.

This step-by-step process allows you to take any LaTeX-based math copied from ChatGPT and render it cleanly within Google Docs – ready for export to Word or PDF.
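If your pasted text uses a different delimiter style, you can normalize it before rendering. The sketch below is a small, hypothetical helper (the function name to_double_dollar is made up, and it assumes your copied text wraps inline math in \( ... \)) that rewrites those delimiters to the $$ ... $$ form described above.

```python
import re

def to_double_dollar(text: str) -> str:
    """Rewrite \\( ... \\) math delimiters as $$ ... $$ (hypothetical helper)."""
    pattern = re.compile(r"\\\((.+?)\\\)", flags=re.DOTALL)
    # A function is used as the replacement so backslashes inside the LaTeX are left untouched
    return pattern.sub(lambda m: f"$$ {m.group(1).strip()} $$", text)

pasted = r"The roots are \( x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \)."
print(to_double_dollar(pasted))
```

You would run this on the text before pasting it into Google Docs, then render the equations with the add-on as usual.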

Example: Converting a LaTeX-Coded Equation to a Rendered Math Equation

Paste the following equation into Google Docs:

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

To convert it:

Go to Extensions → Auto-LaTeX Equations → Start.
Select $$ ... $$ as the delimiter and click the Render Equations button. The equation will be rendered and look as follows:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

How to Install Auto-LaTeX Equations

In Google Docs, click Extensions → Add-ons → Get add-ons.
Search for Auto-LaTeX Equations.
Click Install and follow the prompts.
After installation, access it from Extensions → Auto-LaTeX Equations.

How to Copy Math Equations from ChatGPT to Google Docs

To easily transfer math equations and the surrounding content from ChatGPT into Google Docs without losing formatting, use the free ChatGPT to Google Docs or PDF Chrome extension.

This extension allows you to:

Export a single response (with equations and tables) into Google Docs while preserving formatting
Export an entire conversation, including math, code, and text, into one clean, organized Google Doc, with no need to export responses separately and merge multiple documents later
Save ChatGPT canvas content as a Google Doc or PDF
Export ChatGPT deep research documents directly into Google Docs
Export ChatGPT content directly into PDF format when no further edits are necessary, eliminating the need to first export to Google Docs and then convert Google Docs to PDF

It’s especially useful for students, researchers, and professionals who want to keep their AI-generated math, notes, and research well-organized in Google Docs or PDF format with minimal effort.

Watch: How to Write Equations in Google Docs

If you prefer visual learning, here’s a helpful video walkthrough that demonstrates all the methods discussed above – using the built-in equation editor, LaTeX-like commands, and the Auto-LaTeX Equations add-on.

This step-by-step tutorial covers:

Opening and using the built-in equation toolbar
Typing LaTeX-style commands directly in the equation editor
Converting AI-generated LaTeX (e.g., from ChatGPT) into clean equations

Tips for Formatting Math Equations in Google Docs

Use inline equations when:

Inserting short expressions like x², a/b, or single variables
Including math within a sentence to maintain the flow of text

Use block equations when:

Writing complex or multi-line formulas (e.g., the quadratic formula)
You want the equation to be clearly separated from the surrounding text for readability

Wrapping tips for rendered equations:

Rendered equations are treated as images in Google Docs, which may disrupt the document layout if not positioned correctly
To fix this:
- Click the equation image
- Choose from:
  - In line – aligns the equation with surrounding text (best for inline use)
  - Wrap text – wraps paragraph text around the equation image
  - Break text – places the equation on its own line, isolating it
- Use the margin handles or spacing options to fine-tune the layout and prevent overlap or crowding

Conclusion

Google Docs offers several flexible ways to write and manage math equations:

Use the built-in equation editor for basic symbols, fractions, exponents, and common operations. It’s easy to access and great for straightforward math tasks without needing special syntax.
Try LaTeX-like commands inside the equation editor for faster input. You can type commands like \frac, \sqrt, or \alpha to quickly insert structured equations without navigating menus.
Install add-ons like Auto-LaTeX Equations for advanced LaTeX rendering. This is especially useful if you're copying equations from Overleaf, ChatGPT, or LaTeX documents into Google Docs, as it preserves formatting and converts code into clean equation images.
Use external tools when copying from other formats, like the ChatGPT to Google Docs or PDF Chrome extension, which helps retain equation formatting when moving content from ChatGPT or other platforms.

Whether you’re completing math homework, preparing teaching materials, or writing a research paper, Google Docs, combined with these tools, gives you everything you need to create clear, professional-looking documents with math content.

Learn Calculus by Coding in Python

Beau Carnes — Tue, 29 Apr 2025 15:22:37 +0000

Calculus is one of the cornerstones of higher mathematics and a powerful tool for understanding change, motion, and growth across countless disciplines. But for many students, Calculus can seem intimidating or abstract. What if you could learn it step by step from a seasoned university professor, and simultaneously see how each concept works in code? That’s exactly what this new course offers: a practical, intuitive, and coding-friendly approach to mastering Calculus.

We just published a course on the freeCodeCamp.org YouTube channel that will teach you all about Calculus through the lens of Python programming. Taught by experienced mathematics professor Ed Pratowski, this course walks you through essential topics in college-level Calculus while showing you how to implement these concepts using Python.

The course begins with foundational concepts like limits and the idea of a "hole in the graph" before moving into derivative rules, slope interpretation, and real-world applications like financial modeling and projectile motion. You'll explore important theorems like Rolle's and the Mean Value Theorem, as well as integral calculus topics like Riemann sums, the Fundamental Theorem of Calculus, and calculating volume using solids of revolution. You'll also learn how to apply symbolic math libraries like SymPy for graphing and computation.

One of the unique features of this course is its consistent use of Python to illustrate and reinforce each Calculus concept. By coding derivatives, plotting graphs, and computing integrals programmatically, you'll not only deepen your mathematical understanding but also gain practical skills in mathematical programming. This is an invaluable asset for data science, engineering, and technical careers.
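To give a flavor of what coding Calculus can look like, here is a short SymPy sketch touching three of the ideas mentioned above: a limit at a "hole in the graph", a derivative, and a definite integral. The specific functions are made up for illustration and are not taken from the course itself.

```python
from sympy import diff, integrate, limit, symbols

x = symbols("x")

# Limit at a "hole in the graph": (x^2 - 1)/(x - 1) is undefined at x = 1,
# but the function approaches 2 as x gets close to 1
hole = limit((x**2 - 1) / (x - 1), x, 1)
print(hole)    # 2

# Derivative: the slope of x^3 at any point
slope = diff(x**3, x)
print(slope)   # 3*x**2

# Definite integral: the area under x^2 between 0 and 1
area = integrate(x**2, (x, 0, 1))
print(area)    # 1/3
```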

Here is the full list of topics covered.

Intro: Calculus with Python
Limits: Hole in the Graph
Limits: Asymptotes
Limits: Graphing
Limits and Slope
Slope and the Derivative
Derivatives and Calculus
Chain Rule
Product Rule
Implicit Differentiation
Multiple Derivative Steps
Derivative Example
Financial Applications
Projectile Motion
Derivatives and Differentials
Tangent Lines
Parametric Equations
Related Rates: Ladder Sliding
Related Rates: Balloon Volume
Mean Value Theorem
Rolle's Theorem
Riemann Sums: Area Under a Curve
Summation and the Integral
Fundamental Theorem of Calculus
Area Above and Below the Axis
Area Between Curves
Volume Revolved Around X
Volume of a Hollow Shape
Volume Revolved Around Y
Center of Mass
The Normal Curve
SymPy Graphing
Arc Length
Surface Area
Integral Formulas

Ready to bring math to life with code? Check out this comprehensive Calculus with Python course on the freeCodeCamp.org YouTube channel.

How to Run LaTeX Projects Locally (for Free) On Windows

Md. Fahim Bin Amin — Tue, 25 Feb 2025 15:44:04 +0000

LaTeX is a high-quality typesetting system that is widely used in technical, academic, and scientific writing. It’s very popular in academia, especially in fields like mathematics, physics, computer science, and engineering.

LaTeX is not a word processor like Microsoft Word – rather, it’s a document preparation system that allows you to focus on the content of your writing while it handles the formatting. If you use LaTeX to write your formal documents (like a CV, résumé, or research paper), then you don’t need to worry about the formatting and structure, as everything can be done using LaTeX scripts.

If you use LaTeX to write your academic or research papers, you might be familiar with website-based applications like Overleaf. Overleaf is a website that allows anyone to read, write, and compile LaTeX scripts online.

These sites are okay for small tasks or compilations, or if you need only a little bit of free collaboration. But if you need to work on bigger projects or need to conduct many collaborative tasks, then the free tier may be insufficient. And in my opinion, the paid subscription costs too much.

But don’t worry: running LaTeX locally may be the perfect solution for you. I know this because I also faced a similar situation, and this simply changed my life! I also track all of my changes with Git (on GitHub, GitLab, and so on), which gives me unlimited collaboration opportunities and compilation. And the great thing is, all of this is completely free as it’s all happening on my local machine.

So in this article, I am going to discuss the methods in detail. I have also created an in-depth video for you to understand how this works.

Video Tutorial

Resources You’ll Need:

1. GitHub Repository

This entire guide is available in one of my GitHub projects named Install-LaTeX. The live website is available here (fahimfba.github.io/Install-LaTeX) as well. I would highly appreciate it if you star (⭐) the repository. Also, you can create issues there if you face any problems. Any kind of good contribution is also welcome here.

2. Operating System

You can install LaTeX on any major operating system (Windows, MacOS, and Linux-based OSes). But in this article, I am only going to talk about the Windows operating system.

Here, I’m using Windows 11, but the same procedure should also work on Windows 10 and on future versions of Windows.

3. Editor

I am going to use the popular Visual Studio Code as my editor. It is a 100% free and robust editor that’s very popular among devs all over the world. If you don’t already have it, go ahead and install it before proceeding further.

4. LaTeX Compiler/IDE

To work on LaTeX files, you’ll need a LaTeX distribution that provides the compiler. I am going to use MiKTeX. There are other tools out there, but this is the best tool right now (according to me!). It is completely free and supports all major operating systems as well. It also has a built-in IDE, but we are going to use VS Code as our main editor.

Download the Windows executable file from the Download section.

After the download is finished, install the executable. At the end of the installation, keep the tick in “Check for updates now”.

You will find the MiKTeX console in your taskbar. Open that.

Go to the “Updates” tab and click “Update now”. It will install all of the available package updates.

At the end, it will prompt you to close the console. Click “OK”. Open MiKTeX again.

That’s it for this tool.

5. Perl

The commands we are going to execute for building the LaTeX files depend on Perl. As the Windows operating system doesn’t come with a built-in Perl interpreter, we are going to install Strawberry Perl.

Download the latest MSI package from the Strawberry Perl website.

Once the download has finished, run the installer.

We need to add Perl’s path to the system environment. To do that, go to the location where it has been installed. By default, it gets installed inside the C:\Strawberry\perl\bin directory. Copy the path.

Now search for “env” in the Windows search bar until you find something called “Edit the system environment variables”.

Now click on “Environment Variables…”.

Now select “Path” from “System variables” and click “Edit”.

Click “New”. Paste the path. Now close each window in turn by clicking “OK” in each one.
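To confirm that the PATH change worked, you can check whether the tools are visible from a newly opened terminal (environment variable changes only apply to programs started after the change). The sketch below uses Python’s standard shutil.which for this. The helper name find_tools is made up, and the tool list is an assumption: perl comes from the step above, while latexmk and pdflatex are commands that a typical MiKTeX and LaTeX Workshop setup relies on.

```python
import shutil

def find_tools(names=("perl", "latexmk", "pdflatex")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for tool, path in find_tools().items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{tool:10s} {status}")
```

If perl shows up as not found, double-check the path you pasted into the “Path” variable and open a fresh terminal before trying again.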

Visual Studio Code Extensions

We need some extensions in VS Code to streamline our workflow.

First, let’s get LaTeX Workshop. It is the core extension for working with LaTeX files inside VS Code.

Next, you’ll need Rewrap. It is an amazing tool that lets you wrap long lines: it splits a long line into several shorter lines without breaking the structure or the sentence.

Build the LaTeX File

Whenever you want to build any LaTeX file inside VS Code, simply open that file in it. Then open the command palette using Ctrl + Shift + P.

Search for “LaTeX Workshop: Build with recipe” and go there. It will start building the file. Whenever it prompts you to install any missing package, untick the box that says “Always show this dialog” and press “Install”. I do this because clicking on “Install” on hundreds of prompt windows for building a LaTeX file is very difficult for me.

After it finishes building the LaTeX file, you will get the output PDF file inside VS Code. You can open the PDF file directly in VS Code.

If you want to jump from a spot in the output PDF to the corresponding line in the code, like in Overleaf, simply hold the Ctrl key and click that spot in the PDF. It will immediately take you to the part of the code it came from.

That’s it! It’s now running on your local machine, and there are no restrictions or limitations, literally! Also, for collaboration and keeping track of the history, using Git is the best option, like I do.

Conclusion

Thanks for reading this short tutorial. I hope it helped you interact more easily with LaTeX.

You can follow me on GitHub, LinkedIn, and YouTube to get more content like this. Also, my website is always available for you!

What is a Floating-Point Arithmetic Problem?

Syeda Maham Fahim — Thu, 24 Oct 2024 14:19:22 +0000

Have you ever worked with numbers like 1/3, where the result is 0.33333… and continues forever? As humans, we naturally round off such numbers, but have you ever wondered how computers handle them?

In this article, you’ll explore how computers manage continuous values, including the concept of precision errors. We’ll examine the floating-point arithmetic problem, a universal issue that affects many programming languages. We’ll focus specifically on how JavaScript addresses this challenge.

Additionally, you’ll learn how binary representation works behind the scenes and the threshold at which JavaScript loses precision under the IEEE 754 standard, and you’ll meet BigInt as a way to handle larger integers without precision loss.

First, let's consider an example. Can you guess the output of this operation?

console.log(0.1 + 0.2);

You may think the answer is 0.3, right? But no, the actual output is:

Output: 0.30000000000000004

You must be wondering why this is happening. Why so many extra zeros, and why does it end with a 4?

The answer is simple: the numbers 0.1 and 0.2 cannot be represented exactly in the binary format JavaScript uses to store numbers.

It sounds simple, right? But the explanation is a bit more complex.

So, what do you think—bug or feature?

Well, it’s not a bug. It’s actually a fundamental issue with how computers handle numbers, specifically floating-point numbers.

Why Does This Happen?

Let’s understand this with basic math.

The fraction 1/3 is represented in decimal by 0.33333... and it never ends. This means that 3 repeats itself infinitely. We can’t write it down exactly, so we approximate it to something like 0.333 or 0.333333 to save time and space.

Similarly, a computer also has to approximate, because writing 1/3 as 0.3333... exactly would need infinitely many digits and so take up infinite space (which we don’t have).

This leads to what we call the floating-point arithmetic problem.

Floating-Point Arithmetic Problem

In simple terms, floating-point numbers are how a computer stores numbers with fractional parts in a fixed number of bits. Any value that cannot be written down exactly in that space is approximated. This kind of approximation can lead to small precision errors, which we call the floating-point arithmetic problem.

Binary Explanation

Now that we've covered the simple explanation, let’s also understand this in binary terms. JavaScript handles everything in binary behind the scenes.

Binary is a number system that only uses two digits: 0 and 1.

Why Can’t 0.1 and 0.2 Be Represented Exactly in Binary?

The core issue is that not all decimal numbers can be perfectly represented as binary fractions.

Let’s take 0.1 as an example:

When you try to represent 0.1 in binary, you’ll find out that it can’t be expressed as a finite binary fraction. Instead, it becomes a repeating fraction, much like how 1/3 in decimal becomes 0.333..., repeating forever.

In binary, 0.1 becomes:

0.0001100110011001100110011001100110011... (repeating infinitely)

Since computers have limited memory, they can’t store this infinite sequence exactly. Instead, they have to cut off the number at some point, which introduces a small rounding error. This is why 0.1 in binary is only an approximation of the actual value.

Like 0.1, 0.2 can’t be exactly represented in binary. It becomes:

0.00110011001100110011001100110011... (repeating infinitely)

Again, the computer truncates this infinite binary sequence (that is, cuts it off to fit a fixed number of digits), leading to a small error in representation.
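The repeating patterns above can be reproduced with the classic doubling method: multiply the fraction by 2, take the integer part as the next binary digit, and repeat with what is left. The sketch below is written in Python rather than JavaScript and uses exact fractions so that no rounding gets in the way. The helper name binary_fraction_digits is only illustrative.

```python
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point for a fraction in [0, 1)."""
    digits = []
    remainder = Fraction(value)
    for _ in range(count):
        remainder *= 2
        bit = int(remainder)      # the integer part is the next binary digit
        digits.append(str(bit))
        remainder -= bit          # keep only the fractional part
    return "".join(digits)

tenth = binary_fraction_digits(Fraction(1, 10), 20)
fifth = binary_fraction_digits(Fraction(2, 10), 20)
print("0.1 -> 0." + tenth)
print("0.2 -> 0." + fifth)
```

Because the remainder for 0.1 cycles through the same values forever, the digits never terminate, which is exactly why the stored value has to be cut off.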

So, what happens when we add 0.1 + 0.2? When you add 0.1 + 0.2 in JavaScript, the binary approximations for 0.1 and 0.2 are added together. But since both values are only approximations, the result is also an approximation.

Instead of getting exactly 0.3, you get something close to this:

console.log(0.1 + 0.2); // Output: 0.30000000000000004

This slight error occurs because neither 0.1 nor 0.2 can be represented exactly in binary, so the final result has a small rounding error.
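The same effect is visible outside JavaScript. Here is a small sketch in Python, whose float type uses the same IEEE 754 double-precision format on common platforms. It prints the exact values that are actually stored and shows a tolerance-based comparison as one possible way to compare such results.

```python
from decimal import Decimal
import math

total = 0.1 + 0.2

# Decimal(float) reveals the exact value that is stored in memory.
print(Decimal(0.1))
print(Decimal(0.2))
print(Decimal(total))

print(total == 0.3)               # False: the stored values differ slightly
print(math.isclose(total, 0.3))   # True: equal within a small tolerance
```

Comparing with a tolerance instead of strict equality is a common way to live with these tiny errors.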

How Does JavaScript Truncate the Number?

Now, the question arises: how does JavaScript know when to truncate the value?

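(Truncation means cutting off or shortening a number by removing extra digits beyond a certain point. Strictly speaking, IEEE 754 rounds to the nearest value it can store rather than simply dropping digits, but the effect is the same: digits beyond a fixed precision are lost.)

There’s a limit to how much precision a number can keep.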

To handle this in the computer world, we have a standard that defines how floating-point numbers are stored and calculated.

IEEE 754 Standard

JavaScript uses the IEEE 754 standard to handle floating-point arithmetic.

JavaScript's Number type uses the standard's 64-bit double-precision format, which can represent every integer exactly only within these safe limits:

Maximum Safe Integer: 2^53 - 1 or 9007199254740991
Minimum Safe Integer: -(2^53 - 1) or -9007199254740991

Beyond these limits, JavaScript cannot accurately represent integers due to the way floating-point arithmetic works.

For this reason, JavaScript provides two constants to represent these limits:

Number.MAX_SAFE_INTEGER
Number.MIN_SAFE_INTEGER
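To see what "unsafe" means in practice, here is a small sketch in Python. Python integers have unlimited precision, so the example converts them to float (the same 64-bit double format as a JavaScript Number) to imitate the behavior. The variable names are only illustrative.

```python
MAX_SAFE_INTEGER = 2**53 - 1   # same value as JavaScript's Number.MAX_SAFE_INTEGER

at_limit = float(MAX_SAFE_INTEGER)
one_past = float(MAX_SAFE_INTEGER + 1)   # 2**53
two_past = float(MAX_SAFE_INTEGER + 2)   # 2**53 + 1 has no exact double

print(at_limit < one_past)    # True: still distinguishable
print(one_past == two_past)   # True: two different integers collapse into one double
```

Past the limit, neighboring integers start sharing the same stored value, so arithmetic on them can silently give wrong answers.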

What If I Need a Bigger Number?

If you need to work with numbers larger than the Maximum Safe Integer (like those used in cryptography or finance), JavaScript has a solution: BigInt.

Enter BigInt

BigInt is a built-in object that allows you to work with whole numbers beyond the safe integer limit. It enables you to represent numbers larger than 9007199254740991, so you don't need to worry about precision errors here!

To use BigInt, simply append an n to the end of an integer literal:

const bigNumber = 1234567890123456789012345678901234567890n;

Alternatively, you can use the BigInt constructor:

const bigNumber = BigInt("1234567890123456789012345678901234567890");

Operations with BigInt

You can perform arithmetic operations with BigInt, like addition, subtraction, multiplication, and even exponentiation. However, there’s a catch: you can’t mix BigInt with regular Number types in arithmetic operations without explicitly converting between them.

For example, this won’t work:

let result = bigNumber + 5; // Error: cannot mix BigInt and other types

You would need to convert the Number to BigInt first:

let result = bigNumber + BigInt(5); // Now it works!

Where Do We Use BigInt?

BigInt is particularly useful in areas requiring precision, such as:

Cryptographic algorithms
Handling large datasets
Financial calculations requiring exactness

In Summary

The safe integer limit in JavaScript ensures accurate number representation for integers between -(2^53 - 1) and 2^53 - 1.
Precision errors occur due to floating-point arithmetic when handling certain numbers (like 0.1 + 0.2).
If you need numbers bigger than the safe limit, BigInt is your friend. But remember, mixing BigInt and Number types requires explicit conversions.

How to Blend Images in Rust Using Pixel Math

Anshul Sanghi — Tue, 27 Aug 2024 10:25:56 +0000

For anyone looking to learn about image processing as a programming niche, blending images is a very good place to start. It's one of the simplest yet most rewarding techniques when it comes to image processing.

To help your intuition, it's best to imagine an image as a mathematical graph of pixel values plotted along the x and y coordinates. The top left pixel in an image is your origin, which corresponds to an x value of 0 and a y value of 0.

Once you imagine this, any pixel in an image can be read or modified using its coordinate in this x-y graph. For example, for a square image of size 5px x 5px, the coordinate of the center pixel is 2, 2. You may have expected it to be 3, 3, but image coordinates in this context work similarly to array indexes and start from 0 for both axes.

Approaching image processing this way also helps you address each pixel individually, making the process much simpler.

Prerequisites

The focus of this article is for you to understand and learn how to blend images using the Rust programming language, without going into the details of the language or its syntax. So being comfortable writing Rust programs is required.

If you're not familiar with Rust, I highly encourage you to learn the basics. Here's an interactive Rust course that can get you started.

Introduction
How Image Blending Works
Project Setup
How to Read Pixel Values
How to Blend Functions
How to Apply Blend Functions To Images
Putting It All Together
Glossary

Introduction

Image blending refers to the technique of merging pixels from multiple images to create a single output image that is derived from all of its inputs. Depending on which blending operation is used, the image output can vary widely given the same inputs.

This technique serves as the basis for many complex image processing tools, some of which you may already be familiar with. Things such as removing moving people from images if you have multiple images, merging images of the night sky to create star trails, and merging multiple noise-heavy images to create a noise reduced image are all examples of this technique at play.

To achieve the blending of images in this tutorial, we will make use of "pixel math", which while not being a truly standard term, refers to the technique of performing mathematical operations on a pixel or set of pixels to generate an output pixel.

For example, to blend two images using the "average" blend mode, you will perform the mathematical average operation on all input pixels at a given location, to generate the output at the same location.

Pixel math is not limited to point operations, which are operations that generate a given output pixel from the input pixels at that same location in the x-y coordinate system, taken from one or more images.

In my experience so far, the entire field of image processing is 99% mathematics and 1% black magic. Mathematical operations on a pixel and its surrounding pixels are the basis of image manipulation techniques such as compression, resizing, blurring and sharpening, noise reduction, and so on.

How Image Blending Works

The technique is technically simple to implement. Let's take the example of a simple average blend. Here's how it works:

Read the pixel data of both images into memory, usually into an array for each image.
- The array is usually 2-dimensional. For color images, each entry in the array is another array that holds the 3 values of the pixel, corresponding to the Red, Green, and Blue color channels.
For each pixel location:
1. For each channel:
  a. Take the value of the channel from the 1st image, let's consider it x.
  b. Take the value of the channel from the 2nd image, let's consider it y.
  c. Perform the averaging operation x/2 + y/2.
  d. Save the output value of this operation as the value of the output channel.
2. Save the result of the previous operation as the value of the output pixel.
Construct the output image with the same dimensions from the computed data.
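Before moving to Rust, the steps above can be sketched in a few lines of Python, just to make the algorithm concrete. This is a language-agnostic illustration and not the implementation used later: images are assumed to be plain lists of (r, g, b) tuples, and the function name average_blend is hypothetical.

```python
def average_blend(image1, image2):
    """Blend two images given as equal-length lists of (r, g, b) tuples."""
    if len(image1) != len(image2):
        raise ValueError("images must have the same number of pixels")
    output = []
    for pixel1, pixel2 in zip(image1, image2):
        # Halve each channel before adding so the sum never exceeds 255.
        output.append(tuple(x // 2 + y // 2 for x, y in zip(pixel1, pixel2)))
    return output

blended = average_blend(
    [(255, 100, 0), (0, 0, 0)],
    [(255, 50, 10), (200, 200, 200)],
)
print(blended)
```

Note that halving with integer division before adding can lose one unit of brightness (two channels of 255 average to 254), which is the price of never leaving the 0-255 range.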

You'll notice that pixel math is performed on a per-channel basis. This is always true for the blend modes we cover in this tutorial, but many techniques involve applying blends between the channels themselves and many times within the same image.

Project Setup

Let's get started by setting up a project that gives us a good baseline to work with.

cargo new --bin image-blender
cd image-blender

You will also need a single dependency to help you perform these operations:

cargo add image

image is a Rust library we'll use to work with images of all of the standard formats and encodings. It also helps us convert between various formats, and provides easy access to pixel data as buffers.

For more information on the image crate, you can refer to the official documentation.

To follow along, you can use any two images, the only requirement being that they should be of the same size and in the same format. You can also find the images used in this tutorial, along with complete code, in the GitHub repository here.

How to Read Pixel Values

The first step is to load the images and read their pixel values into a data structure that facilitates our operation. For this tutorial, we're going to use a Vec of arrays (Vec<[u8; 3]>). Each entry in the outer Vec represents a pixel, and the channel-wise values of each pixel are stored in a [u8; 3] array.

Let's start by creating a new file to hold this code called io.rs.

// src/io.rs

use image::GenericImageView;

pub struct SourceData {
    pub width: usize,
    pub height: usize,
    pub image1: Vec<[u8; 3]>,
    pub image2: Vec<[u8; 3]>,
}

pub fn read_pixel_data(image1_path: String, image2_path: String) -> SourceData {
    // Open the images
    let image1 = image::open(image1_path).unwrap();
    let image2 = image::open(image2_path).unwrap();

    // Compute image dimensions
    let (width, height) = image1.dimensions();
    let (width, height) = (width as usize, height as usize);

    // Create arrays to hold input pixel data
    let mut image1_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];
    let mut image2_data: Vec<[u8; 3]> = vec![[0, 0, 0]; width * height];

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image1.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image1_data[index] = [r, g, b];
    }

    // Iterate over all pixels in the input image, along with their positions in x & y
    // coordinates.
    for (x, y, pixel) in image2.to_rgb8().enumerate_pixels() {
        // Compute the raw values for each channel in the RGB pixel.
        let [r, g, b] = pixel.0;

        // Compute linear index based on 2D index. This is basically computing index in
        // 1D array based on the row and column index of the pixel in the 2D image.
        let index = (y * (width as u32) + x) as usize;

        // Save the channel-wise values in the correct index in data arrays.
        image2_data[index] = [r, g, b];
    }

    SourceData {
        width,
        height,
        image1: image1_data,
        image2: image2_data,
    }
}
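The linear index computation used in both loops is worth a closer look, since the same formula comes back when blending. Here is a small Python sketch of the mapping in both directions. The function names are hypothetical and only serve the illustration.

```python
def to_linear_index(x, y, width):
    """Map a 2D pixel coordinate to its position in a row-major 1D array."""
    return y * width + x

def to_coordinates(index, width):
    """Inverse mapping: recover (x, y) from the 1D position."""
    y, x = divmod(index, width)
    return x, y

width = 5
center = to_linear_index(2, 2, width)
print(center, to_coordinates(center, width))
```

For the 5px x 5px image from the start of this article, the center pixel at 2, 2 lands at position 12 of the 25-element array.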

How to Blend Functions

The next step is to implement the blending functions, which are pure functions that take two pixel values as input and return the output value. This is implemented through the BlendOperation trait defined below. Let's create a new file to host all the operations called operations.rs.

// src/operations.rs

pub trait BlendOperation {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3];
}

Next, we need to implement this trait for all of the blending methods we want to support.

For showcasing the result of each of the blending modes, the following two input images are blended together

Average Blend

An average blend involves channel-wise averaging the input pixel values to get the output pixel.

// src/operations.rs

pub struct AverageBlend;

impl BlendOperation for AverageBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0] / 2 + pixel2[0] / 2,
            pixel1[1] / 2 + pixel2[1] / 2,
            pixel1[2] / 2 + pixel2[2] / 2,
        ]
    }
}

Multiply Blend

A multiply blend involves channel-wise multiplication of input pixel values after they've been normalized[¹] to get the output pixel. The output pixel is then rescaled back to the original range by multiplying with 255.

// src/operations.rs

pub struct MultiplyBlend;

impl BlendOperation for MultiplyBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((pixel1[0] as f32 / 255. * pixel2[0] as f32 / 255.) * 255.) as u8,
            ((pixel1[1] as f32 / 255. * pixel2[1] as f32 / 255.) * 255.) as u8,
            ((pixel1[2] as f32 / 255. * pixel2[2] as f32 / 255.) * 255.) as u8,
        ]
    }
}

Lighten Blend

Lighten blend involves channel-wise comparison of input pixel values, selecting the higher value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct LightenBlend;

impl BlendOperation for LightenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].max(pixel2[0]),
            pixel1[1].max(pixel2[1]),
            pixel1[2].max(pixel2[2]),
        ]
    }
}

Darken Blend

Darken blend is the opposite operation of lighten blend. It involves channel-wise comparison of input pixel values, selecting the lower value (intensity) of each channel for the output pixel.

// src/operations.rs

pub struct DarkenBlend;

impl BlendOperation for DarkenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            pixel1[0].min(pixel2[0]),
            pixel1[1].min(pixel2[1]),
            pixel1[2].min(pixel2[2]),
        ]
    }
}

Screen Blend

Screen blend refers to multiplying the inverse of two images, and then inverting the result. In our implementation, the pixels first need to be normalized[¹]. The normalized[¹] values are then inverted by subtracting them from 1, then they're multiplied and inverted again.

Finally, the output is multiplied by 255 to de-normalize the output pixel value.

// src/operations.rs

pub struct ScreenBlend;

impl BlendOperation for ScreenBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            ((1. - ((1. - (pixel1[0] as f32 / 255.)) * (1. - (pixel2[0] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[1] as f32 / 255.)) * (1. - (pixel2[1] as f32 / 255.)))) * u8::MAX as f32) as u8,
            ((1. - ((1. - (pixel1[2] as f32 / 255.)) * (1. - (pixel2[2] as f32 / 255.)))) * u8::MAX as f32) as u8,
        ]
    }
}
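To check the normalized arithmetic behind the multiply and screen blends by hand, here is a per-channel sketch in Python. It mirrors the formulas above but is not the Rust code itself: Python floats are 64-bit while the Rust version uses f32, so in rare edge cases the truncated results could differ by one. The function names are hypothetical.

```python
def normalize(value):
    """Rescale an 8-bit channel value to the 0-1 range."""
    return value / 255.0

def multiply_channel(a, b):
    # Multiply the normalized values, then rescale and truncate.
    return int(normalize(a) * normalize(b) * 255.0)

def screen_channel(a, b):
    # Invert both inputs, multiply them, then invert the product.
    inverted_product = (1.0 - normalize(a)) * (1.0 - normalize(b))
    return int((1.0 - inverted_product) * 255.0)

print(multiply_channel(128, 128), screen_channel(128, 128))
```

For two mid-gray inputs, multiply gives a darker value and screen gives a lighter one, which matches how the two modes behave on whole images.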

Addition Blend

Addition blend involves adding the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also have to convert the values to u16 in order to avoid loss of value due to overflow. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct AdditionBlend;

impl BlendOperation for AdditionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as u16 + pixel2[0] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[1] as u16 + pixel2[1] as u16).clamp(0, u8::MAX as u16) as u8,
            (pixel1[2] as u16 + pixel2[2] as u16).clamp(0, u8::MAX as u16) as u8,
        ]
    }
}

Subtraction Blend

Subtraction blend involves subtracting the input values and then clamping the result to the maximum range of the color depth we're targeting. In this case, that would be 0-255 as we're targeting 8-bit color depth.

We also convert the values to i16 in order to avoid loss of value due to overflow and lack of sign. We can also use normalized[¹] values here to achieve the same result.

// src/operations.rs

pub struct SubtractionBlend;

impl BlendOperation for SubtractionBlend {
    fn perform_operation(&self, pixel1: [u8; 3], pixel2: [u8; 3]) -> [u8; 3] {
        [
            (pixel1[0] as i16 - pixel2[0] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[1] as i16 - pixel2[1] as i16).clamp(0, u8::MAX as i16) as u8,
            (pixel1[2] as i16 - pixel2[2] as i16).clamp(0, u8::MAX as i16) as u8,
        ]
    }
}
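The clamping in the addition and subtraction blends is easy to verify on single channel values. Here is a minimal Python sketch with hypothetical function names. Python integers do not overflow, so the widening to u16 or i16 that Rust needs has no counterpart here.

```python
def addition_channel(a, b):
    # Sums above 255 are clamped to the top of the 8-bit range.
    return min(a + b, 255)

def subtraction_channel(a, b):
    # Negative differences are clamped to the bottom of the range.
    return max(a - b, 0)

print(addition_channel(200, 100), subtraction_channel(100, 200))
```

Without the clamp, 200 + 100 would not fit into 8 bits, and 100 - 200 would be negative, which an unsigned channel cannot hold.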

How to Apply Blend Functions To Images

The final step is to actually use the blending operations we created previously and apply them to pairs of images.

To achieve this, we need a function that can take the SourceData type we defined previously as input, along with a blending operation as the arguments, and gives us the final output buffer. Let's start by creating a new file for it called blend.rs.

// src/blend.rs

use image::{ImageBuffer, Rgb};
use crate::{operations::BlendOperation, SourceData};

impl SourceData {
    pub fn blend_images(&self, operation: impl BlendOperation) -> ImageBuffer<Rgb<u8>, Vec<u8>> {
        let SourceData {
            width,
            height,
            image1,
            image2,
        } = self;

        // Create a new buffer that has the same size as input images, which will serve as our output data
        let mut buffer = ImageBuffer::new(*width as u32, *height as u32);

        // Iterate over all pixels in the output buffer, along with their coordinates
        for (x, y, output_pixel) in buffer.enumerate_pixels_mut() {
            // Compute linear index from x & y coordinates. In other words, you have the
            // row and column indexes here, and you want to compute the array index based
            // on these two positions.
            let index = (y * *width as u32 + x) as usize;

            // Store pixel values in the given position into variables
            let pixel1 = image1[index];
            let pixel2 = image2[index];

            // Compute the blended pixel and convert it into the `Rgb` type, which is then
            // assigned to the output pixel in the buffer.
            *output_pixel = Rgb::from(operation.perform_operation(pixel1, pixel2));
        }

        buffer
    }
}

Putting It All Together

It's now time to make use of all the new things you've learnt so far, and put them together in the main.rs file.

// src/main.rs

mod blend;
mod io;
mod operations;

use io::*;
use operations::{
    AdditionBlend, AverageBlend, DarkenBlend, LightenBlend, MultiplyBlend, ScreenBlend,
    SubtractionBlend,
};

fn main() {
    let source_data = read_pixel_data("image1.jpg".to_string(), "image2.jpg".to_string());

    let output_buffer = source_data.blend_images(AdditionBlend);
    output_buffer.save("addition.jpg").unwrap();

    let output_buffer = source_data.blend_images(AverageBlend);
    output_buffer.save("average.jpg").unwrap();

    let output_buffer = source_data.blend_images(DarkenBlend);
    output_buffer.save("darken.jpg").unwrap();

    let output_buffer = source_data.blend_images(LightenBlend);
    output_buffer.save("lighten.jpg").unwrap();

    let output_buffer = source_data.blend_images(MultiplyBlend);
    output_buffer.save("multiply.jpg").unwrap();

    let output_buffer = source_data.blend_images(ScreenBlend);
    output_buffer.save("screen.jpg").unwrap();

    let output_buffer = source_data.blend_images(SubtractionBlend);
    output_buffer.save("subtraction.jpg").unwrap();
}

You can now run the program using the following command, and you should have all the images generated and saved in the project folder:

cargo run --release

As you might have guessed already, this implementation only works for 8-bit RGB images. This code, however, can be extended very easily to support the other color formats such as 8-bit Luma (Monochrome), 16-bit RGB (Many RAW camera images), and so on.

I highly encourage you to try that out. You can also reach out to me for help with anything in this tutorial or with extending the code in this tutorial. I'd be happy to answer all your queries. Email is the best way to reach me, you can email me at anshul@anshulsanghi.tech.

Glossary

[¹] Normalization refers to the process of rescaling the pixel values so that the values are in floating point format and are in the range of 0-1. For example, for an 8-bit image, the color black is represented by 0 (0 in de-normalized value) and the color white is represented by 1 (255 in de-normalized value). Intermediary decimal values between 0 & 1 represent different intensities of the pixel between black and white. Normalization is done for many different reasons such as:

Preventing overflows during calculations.
Re-scaling images to the same range irrespective of their individual color depth.
Expanding possible dynamic range of the image.

Enjoying my work?

Consider buying me a coffee to support my work!

Till next time, happy coding and wishing you clear skies!

Learn College Precalculus with Python

Beau Carnes — Fri, 17 May 2024 19:07:23 +0000

Precalculus is an important mathematical course that lays the groundwork for calculus and other higher-level math subjects.

We just released a college precalculus course on the freeCodeCamp.org YouTube channel. This course is unique in that you will also learn to implement the concepts using the Python programming language. This course is ideal for students gearing up for college-level mathematics or anyone looking to strengthen their understanding of precalculus.

Ed Pratowski developed this course. He is an experienced mathematics instructor at a university in Pennsylvania.

Precalculus covers a variety of topics such as functions, trigonometry, matrices, and complex numbers. A solid grasp of precalculus is essential for success in fields like engineering, physics, computer science, and more.

In this course, you'll gain a deep understanding of essential precalculus concepts. Here’s a brief overview of what you’ll learn:

Data Analysis and Graphing: Learn to gather, interpret, and graph data effectively.
Linear Equations: Understand how to draw and analyze lines, slopes, and intercepts.
Trigonometry: Dive into the basics of trigonometry, solve right triangles, and explore degrees, radians, and the unit circle.
Trigonometric Functions: Graph the six trigonometric functions, transform these graphs, and apply trigonometry to real-world problems.
Triangles: Master the Law of Sines and Cosines, calculate the area of any triangle, and solve triangles from given points.
Matrices: Perform operations with matrices, including addition, subtraction, multiplication, and finding inverses. Use matrices to solve systems of equations and develop equations.
Complex Numbers and Series: Understand complex numbers, graph the Mandelbrot set, and explore series, including the number e.
Probability and Applications: Learn probability theory, Pascal's triangle, and the math behind gambling.
Practical Applications: Apply trigonometry to build a clock, encode and decode messages, and more.
Advanced Topics: Simplify trigonometric expressions and discover Euler's identity, known as the beautiful math formula.

This course offers clear explanations and practical examples to ensure you grasp these foundational concepts. Watch the full course on the freeCodeCamp.org YouTube channel (12-hour watch).

How to Apply Math with Python – Numerical Analysis Explained

Tiago Capelo Monteiro — Thu, 29 Feb 2024 11:41:59 +0000

Numerical analysis is the bridge between math and computer science.

Essentially, it is the development of algorithms that approximate the solutions pure math would give exactly, but faster and with fewer computational resources.

This field is very important because, for most problems in the real world, we only need good approximations and not exact solutions.

In this article, we will explore:

Analogy Illustrating the Importance of Numerical Analysis
Fundamentals of Numerical Analysis
Application of Numerical Analysis in Real-World Problems
Introduction to Partial Differential Equations (PDEs)
Introduction to Optimization in Numerical Analysis

An Analogy that Illustrates the Importance of Numerical Analysis

How can we measure the coastline of an island?

If we tried to measure every centimeter of every small segment, it would be extremely time-consuming and practically impossible.

Because of the sea, the coastline is always changing at that level of detail.

However, by approximating and measuring in larger segments, we can get a practical measurement of the coastline.

This situation mirrors numerical analysis.

Approximation gives insights in situations where precise measurement is impossible or impractical.

Just as we accept a good estimation of the coastline length, numerical analysis uses approximation to solve hard problems.

Fundamentals of Numerical Analysis

Numerical analysis is all about approximation. It is like using binoculars to see a landscape that is very far away. We can't see every leaf. But we get a good enough picture to understand the terrain.

This is crucial in numerical analysis.

Here, we solve hard math problems where exact solutions are either impossible or extremely resource-intensive.

By approximating, we get sufficiently good results with less computational effort.

Application of Numerical Analysis in Real-World Problems

There are many applications of numerical analysis:

In engineering, it enables simulation of structures and fluids.
In finance, it supports risk assessment and portfolio optimization.
In environmental science, it predicts climate patterns.

In each field, numerical analysis is a toolkit for solving problems where pure math either takes too much time or cannot give usable results.

An Introduction to Partial Differential Equations (PDEs)

Partial Differential Equations (PDEs) are equations that describe how quantities like heat, sound, or electricity change in different places and as time goes on.

Solving PDEs is very important because it allows us to predict, and in some cases control, these changes.

With those solutions, we can:

Predict weather patterns.
Understand sound propagation in different environments.
Design efficient transportation systems.
Optimize energy distribution.

However, most PDEs can only be solved approximately, with numerical methods.

An exact solution is either too hard or impossible to find through normal calculations.

This way, with numerical methods, we are able to solve PDEs, which in turn allows us to solve many real-life problems.

Numerical Solutions of PDEs with SciPy

Solving PDEs with numerical methods often involves dividing the problem into small, manageable parts, solving each one, and then combining the results.

SciPy, a Python library for scientific and technical computing, gives many tools for this purpose.

Now, let's solve a heat transfer problem in a rod.

In the code below, we model how heat spreads in a rod. We will then go through it block by block:

import numpy as np
from scipy.integrate import solve_bvp

def heat_equation(x, y):
    return np.vstack((y[1], -y[0]))

def boundary_conditions(ya, yb):
    return np.array([ya[0], yb[0] - 1])

x = np.linspace(0, 1, 5)
y = np.zeros((2, x.size))

sol = solve_bvp(heat_equation, boundary_conditions, x, y)

Let's see how the code works block by block in the following sections.

How to import the libraries

import numpy as np
from scipy.integrate import solve_bvp

Here we import two Python libraries:

NumPy
SciPy

These two Python libraries are some of the most used in data science.

How to define the heat equation and boundary conditions

def heat_equation(x, y):
    return np.vstack((y[1], -y[0]))

def boundary_conditions(ya, yb):
    return np.array([ya[0], yb[0] - 1])

We create heat_equation(x, y) and boundary_conditions(ya, yb).

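In heat_equation(x, y) we define the differential equation we want to solve, written as a system of first-order equations. Here y[0] is the quantity itself and y[1] is its derivative with respect to x, so the returned pair (y[1], -y[0]) encodes the equation y'' = -y.

The boundary_conditions(ya, yb) function defines constraints at the start and end of the solution. It returns values that the solver drives to zero: ya[0] means the solution must be 0 at the start (x = 0), and yb[0] - 1 means the solution must be 1 at the end (x = 1).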

How to solve the equation

x = np.linspace(0, 1, 5)
y = np.zeros((2, x.size))

sol = solve_bvp(heat_equation, boundary_conditions, x, y)

The line sol = solve_bvp(heat_equation, boundary_conditions, x, y) computes the solution.

The name solve_bvp stands for solve boundary value problem.

It takes four arguments:

heat_equation: The differential equation we are trying to solve.
boundary_conditions: The mathematical constraints at the start and end of the solution.
x: The points (the initial mesh) at which we look for the solution.
y: The initial guess for the solution at each of the chosen x values.
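To get a feel for what a solver like this does, the same boundary value problem (y'' = -y with y(0) = 0 and y(1) = 1) can be approximated by hand with finite differences. The sketch below uses only the standard library and is not how solve_bvp works internally. It simply splits the rod into small intervals, writes one equation per interior point, and solves the resulting tridiagonal system. The function name is hypothetical, and the exact solution sin(x) / sin(1) is used only to measure the error.

```python
import math

def solve_rod_finite_differences(intervals):
    """Approximate y'' = -y on [0, 1] with y(0) = 0 and y(1) = 1."""
    h = 1.0 / intervals
    unknowns = intervals - 1                 # interior grid points
    # Each interior point gives: y[i-1] + (h*h - 2) * y[i] + y[i+1] = 0
    diagonal = [h * h - 2.0] * unknowns
    rhs = [0.0] * unknowns
    rhs[-1] = -1.0                           # known value y(1) = 1 moved to the right side

    # Forward elimination (the off-diagonal entries are all 1).
    for i in range(1, unknowns):
        factor = 1.0 / diagonal[i - 1]
        diagonal[i] -= factor
        rhs[i] -= factor * rhs[i - 1]

    # Back substitution.
    interior = [0.0] * unknowns
    interior[-1] = rhs[-1] / diagonal[-1]
    for i in range(unknowns - 2, -1, -1):
        interior[i] = (rhs[i] - interior[i + 1]) / diagonal[i]

    return [0.0] + interior + [1.0]

intervals = 100
approx = solve_rod_finite_differences(intervals)
exact = [math.sin(i / intervals) / math.sin(1.0) for i in range(intervals + 1)]
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(max_error)
```

Using more intervals makes the approximation closer to the exact curve, at the cost of more computation. That trade-off is the heart of numerical analysis.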

An Introduction to Optimization in Numerical Analysis

Optimization is finding the best solution among all possible solutions. It is like finding the most efficient route in a complex network of roads.

Numerical optimization methods find the most efficient or cost-effective solution to a problem, whether that is:

Minimizing waste in production.
Maximizing efficiency in a logistics network.
Finding the best fit for a certain data model.

An Overview of Numerical Optimization Techniques with SciPy

Let's consider an optimization problem in logistics, where the goal is to minimize transportation cost across a network.

SciPy's minimize function can be used to find the strategy that minimizes cost while meeting all constraints:

from scipy.optimize import minimize

def objective_function(x):
    return x[0]**2 + x[1]**2

def constraint_eq(x):
    return x[0] + x[1] - 10

con_eq = {'type': 'eq', 'fun': constraint_eq}

bounds = [(0, 10), (0, 10)]

x0 = [5, 5]

result = minimize(objective_function, x0, method='SLSQP', bounds=bounds, constraints=[con_eq])

Let's explain how the code works block by block.

How to import the library

from scipy.optimize import minimize

Importing scipy

Once again we import what we need, this time the minimize function from scipy.optimize. Its documentation is here:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

How to define the objective and constraint equations

def objective_function(x):
    return x[0]**2 + x[1]**2

def constraint_eq(x):
    return x[0] + x[1] - 10

con_eq = {'type': 'eq', 'fun': constraint_eq}

Define objective and constraint equations

The objective function is the function we want to minimize to find the best answer.
The constraint equation limits the search space to the x values that satisfy it, here x[0] + x[1] = 10.

con_eq is defined by the following:

'type': 'eq' indicates the type of constraint. 'eq' means equality, in other words, the function must equal zero at the solution.
'fun': constraint_eq assigns the constraint function.

In the next block of code, con_eq is passed to minimize, and that is where the possible solutions of the problem are actually constrained.

How to define the initial guess and get the result

bounds = [(0, 10), (0, 10)]

x0 = [5, 5]

result = minimize(objective_function, x0, method='SLSQP', bounds=bounds, constraints=[con_eq])

Defining initial condition and solving equation

To understand this block of code, let's look at each parameter of result = minimize(objective_function, x0, method='SLSQP', bounds=bounds, constraints=[con_eq]):

objective_function: The function to be minimized.
x0: The initial guess for the variables.
method='SLSQP': The optimization algorithm to use. In this case, we use SLSQP (Sequential Least SQuares Programming).
bounds=bounds: The lower and upper bounds for each of the decision variables.
constraints=[con_eq]: The list of constraints applied in the optimization problem.
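To see the solver do some work, here is a minimal sketch of the same problem with a different starting point. The original x0 = [5, 5] already sits at the optimum (by symmetry, minimizing x[0]**2 + x[1]**2 with x[0] + x[1] = 10 gives x[0] = x[1] = 5), so the starting point [2, 8] below is a hypothetical choice made only for illustration.

```python
from scipy.optimize import minimize

def objective_function(x):
    return x[0]**2 + x[1]**2

def constraint_eq(x):
    # Equality constraint: this value must be zero at the solution
    return x[0] + x[1] - 10

con_eq = {'type': 'eq', 'fun': constraint_eq}
bounds = [(0, 10), (0, 10)]

# Hypothetical starting point: feasible, but away from the optimum
x0 = [2, 8]

result = minimize(objective_function, x0, method='SLSQP',
                  bounds=bounds, constraints=[con_eq])

print(result.success)  # whether the solver reports convergence
print(result.x)        # the minimizing point, expected near [5, 5]
print(result.fun)      # objective value there, expected near 50
```

The returned object also carries result.message, which is worth reading whenever result.success is False.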

This is how many real life problems are solved

Many things in real life are modeled with partial differential equations.

Those models are then optimized with methods developed in numerical analysis.

I am writing this because I know math can be boring for some people, and they may not be aware of where it is applied to solve real problems. The calculus they learn can be applied in non-ideal situations outside of exam exercises.

Here, we can finally see why math is important in two scenarios:

To model systems and get solutions from them
To optimize a certain system

Conclusion

Numerical analysis is one of the most important areas of applied math in STEM.

From solving PDEs to optimizing systems, numerical analysis is everywhere.

As problems grow more complex, numerical analysis becomes more important for building faster algorithms that approximate exact mathematical solutions.

This way, it is a bridge between theoretical mathematics and practical application.

If you want to, you can get the full code used in this article on GitHub.

How the Euclidean Algorithm Works – with Code Examples in Go

freeCodeCamp — Mon, 08 May 2023 20:18:08 +0000

By Otavio Ehrenberger

The Euclidean Algorithm is a well-known and efficient method for finding the greatest common divisor (GCD) of two integers. The GCD is the largest number that can divide both integers without leaving a remainder.

The algorithm is named after the ancient Greek mathematician Euclid, who presented it in his book "Elements" around 300 BCE.

You can use this algorithm to solve Diophantine equations, to tackle the shortest-vector problem which is the foundation of lattice-based cryptography, and also to detect common patterns of pixels in images. This is, among other things, applied to optimize rendering processes and detect different objects in images.

How Does the Euclidean Algorithm Work?

Here's a step-by-step explanation of how the Euclidean Algorithm works:

1. Start with two positive integers, a and b, where a >= b. If a < b, simply swap their values. Note that this is meant for a convenient mathematical demonstration, as the implementation also works for a < b.

2. Divide a by b and find the remainder, r (use the modulo operation, represented as a % b). If r is 0, the GCD is b, and the algorithm terminates.

3. If r is not 0, set a to b and b to r. Then, repeat step 2.

4. The algorithm continues to iterate until the remainder is 0. At that point, the last non-zero remainder is the GCD of the original two numbers.

The Euclidean Algorithm works because the GCD of two numbers remains unchanged when the larger number is replaced by its remainder when divided by the smaller number.

Example of Euclidean Algorithm

Here's an example to illustrate the algorithm:

Let's find the GCD of 30 and 9:

a = 30, b = 9

Calculate the remainder: r = a % b = 30 % 9 = 3 (since 3 is not 0, continue to step 3)

Update the values: a = 9, b = 3

Calculate the new remainder: r = a % b = 9 % 3 = 0 (r is now 0)

The GCD of 30 and 9 is 3.
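The same trace can be reproduced with a few lines of code. This is a minimal Python sketch (the article's own implementations are in Go); gcd_steps is a hypothetical helper name that records each (a, b, r) triple along the way.

```python
def gcd_steps(a, b):
    """Return (steps, gcd), where steps lists each (a, b, r) visited."""
    steps = []
    while b != 0:
        r = a % b                # remainder of the current pair
        steps.append((a, b, r))
        a, b = b, r              # replace (a, b) with (b, r)
    return steps, a

steps, g = gcd_steps(30, 9)
print(steps)  # [(30, 9, 3), (9, 3, 0)]
print(g)      # 3

# Starting with a < b: the first step just swaps the pair
steps_swapped, g_swapped = gcd_steps(9, 30)
print(steps_swapped[0])  # (9, 30, 9)
print(g_swapped)         # 3
```

The second call shows why no explicit swap is needed when a < b: the first remainder is a itself, so the pair is reordered automatically.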

Why Does the Euclidean Algorithm Work?

The greatest common divisor of two integers is the largest positive integer that divides both of them without leaving a remainder. So the algorithm is based on the following key property:

If a and b are two integers, then the GCD of a and b is the same as the GCD of b and a % b, where % represents the modulo operator (the remainder after division).

Mathematically, the key property of the algorithm can be justified using the division algorithm:

Let a and b be two positive integers, such that a >= b. We can write the division algorithm as:

a = bq + r, where q is the quotient and r is the remainder.

Now, let d be a common divisor of a and b. Then, a = d * m1 and b = d * m2 for some integers m1 and m2. We can rewrite the division algorithm as:

d * m1 = (d * m2) * q + r.

Rearranging the equation, we get:

r = d * (m1 - m2 * q).

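Since d is a factor of both a and b, and r can also be written as a multiple of d, we can conclude that d is also a divisor of r. So every common divisor of a and b is a common divisor of b and r. The converse also holds: if d divides both b and r, then it divides a = bq + r. The two pairs therefore share exactly the same common divisors, and in particular the same greatest one. So, we can replace a with b and b with r and keep applying this step until b becomes 0, at which point a holds the GCD.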
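The key property can also be checked numerically. The Python sketch below is an illustration rather than a proof: it compares the sets of common divisors of (a, b) and (b, a % b), and uses the standard library's math.gcd as a reference. The helper names common_divisors and shares_gcd_with_remainder are made up for this example.

```python
import math

def common_divisors(m, n):
    # Positive integers dividing both m and n (n may be 0,
    # since every positive integer divides 0)
    return {d for d in range(1, max(m, n) + 1) if m % d == 0 and n % d == 0}

def shares_gcd_with_remainder(a, b):
    # The key property: gcd(a, b) == gcd(b, a % b), for b > 0
    return math.gcd(a, b) == math.gcd(b, a % b)

print(sorted(common_divisors(30, 9)))       # [1, 3]
print(sorted(common_divisors(9, 30 % 9)))   # [1, 3]

# Brute-force check over a small range of positive integers
all_ok = all(shares_gcd_with_remainder(a, b)
             for a in range(1, 61) for b in range(1, 61))
print(all_ok)  # True
```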

The Euclidean Algorithm is particularly useful due to its efficiency and simplicity, making it easy to implement in computer algorithms and programming languages.

Let's see some different ways to implement it in Go:

Recursive Implementation of the Euclidean Algorithm in Go

This implementation of the Euclidean Algorithm in Golang is a recursive version that finds the GCD of two integers. Let's go through it step by step:

The function is defined as GCD(a, b int) int. It takes two integer inputs, a and b, and returns an integer output.

The base case of the recursion is checked with if b == 0. If b is 0, the function returns the value of a as the GCD.

If b is not 0, a temporary variable tmp is created and assigned the value of a. This temporary variable is used to store the value of a before updating its value in the next step.

The values of a and b are updated as follows:

a is assigned the current value of b.
b is assigned the value of the remainder when tmp (the previous value of a) is divided by the new value of a (which was b before the update).

The function calls itself recursively with the updated values of a and b as input, return GCD(a, b).

The algorithm continues to call itself recursively until the base case is reached, that is b becomes 0. At this point, the function returns the GCD, which is the value of a.

// Recursive approach:
func GCD(a, b int) int {
    if b == 0 {
        return a
    }
    tmp := a
    a = b
    b = tmp % a
    return GCD(a, b)
}

For example, let's say we want to find the GCD of 56 and 48:

First call: GCD(56, 48)

Since b (48) is not 0, update a and b:
a becomes 48
b becomes 56 % 48 = 8
The function calls itself with the new values: GCD(48, 8)

Second call: GCD(48, 8)

Since b (8) is not 0, update a and b:
a becomes 8
b becomes 48 % 8 = 0
The function calls itself with the new values: GCD(8, 0)

Third call: GCD(8, 0)

Now, b (0) is 0, so the function returns a (8) as the GCD.
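The call chain above can be recorded programmatically. The following Python sketch mirrors the recursion but passes b and a % b directly to the next call, so the temporary variable is not needed; gcd_recursive and the calls list are hypothetical names used only to log each invocation.

```python
def gcd_recursive(a, b, calls=None):
    """Return (gcd, calls), where calls lists every (a, b) invocation."""
    if calls is None:
        calls = []
    calls.append((a, b))
    if b == 0:
        # Base case: the remainder reached 0, so a is the GCD
        return a, calls
    return gcd_recursive(b, a % b, calls)

result, calls = gcd_recursive(56, 48)
print(result)  # 8
print(calls)   # [(56, 48), (48, 8), (8, 0)]
```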

Iterative Implementation of the Euclidean Algorithm in Go

This implementation of the Euclidean Algorithm in Golang is an iterative version using a loop to find the GCD of two integers. Let's go through the code step by step:

The function is defined as GCD(a, b int) int. It takes two integer inputs, a and b, and returns an integer output.

A loop is used to iterate as long as b is not equal to 0. The loop condition is b != 0. Note that this for loop construction in Go is essentially a while loop in many other languages.

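Inside the loop, the values of a and b are updated simultaneously using a tuple assignment: a, b = b, a%b. Go evaluates the whole right-hand side with the old values before assigning anything, so no temporary variable is needed. This line does the following:

a is assigned the current value of b.
b is assigned the remainder of the old value of a divided by the old value of b.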

When the loop exits (that is, b becomes 0), the value of a is returned as the GCD.

The algorithm iterates until the remainder (b) is 0, at which point the GCD is the last non-zero remainder, which is the value of a.

func GCD(a, b int) int {
    for b != 0 {
        a, b = b, a%b
    }
    return a
}
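To see why the simultaneous update matters, here is a small Python sketch (Python's tuple assignment also evaluates the right-hand side first, which is why it is used for the comparison). The second function is a deliberately broken variant, named gcd_sequential_buggy for illustration, that updates a before computing the remainder.

```python
def gcd_simultaneous(a, b):
    while b != 0:
        a, b = b, a % b   # right-hand side uses the old a and b
    return a

def gcd_sequential_buggy(a, b):
    while b != 0:
        a = b             # a is overwritten first...
        b = a % b         # ...so this computes b % b, which is always 0
    return a

print(gcd_simultaneous(100, 64))      # 4
print(gcd_sequential_buggy(100, 64))  # 64, which is not the GCD
```

The buggy version always stops after one iteration and returns the original b, which is the mistake the temporary variable in the recursive version guards against.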

For example, let's say we want to find the GCD of 100 and 64:

Initialize a as 100 and b as 64. Check the loop condition: b (64) is not 0.