<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Tiago Capelo Monteiro - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Tiago Capelo Monteiro - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 16 May 2026 08:36:23 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/tiagomonteiro/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build Optimal AI Agents That Actually Work – A Handbook for Devs ]]>
                </title>
                <description>
                    <![CDATA[ Since moving to Silicon Valley in 2025, I've seen AI everywhere. And after I attended NVIDIA GTC 2025, one thing became very clear from many conversations I had: most companies now have AI agents runn ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-optimal-ai-agents-that-actually-work-a-handbook-for-devs/</link>
                <guid isPermaLink="false">6a024a82fca21b0d4b6c5283</guid>
                
                    <category>
                        <![CDATA[ ai-agent ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ agentic AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Mon, 11 May 2026 21:30:42 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/f1ca2c84-0c3f-4f20-84f2-9bad5cc1c915.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Since moving to Silicon Valley in 2025, I've seen AI everywhere. And after I attended NVIDIA GTC 2025, one thing became very clear from many conversations I had: most companies now have AI agents running successfully in various projects or departments.</p>
<p>But almost no one has managed to roll them out well across an entire organization. And even where agents are deployed, they're often poorly organized.</p>
<p>Companies are shipping agent systems almost by guessing.</p>
<p>Some of the questions I heard were:</p>
<ul>
<li><p>What's the right number of AI agents in a team?</p>
</li>
<li><p>What's the best model provider to use?</p>
</li>
<li><p>Should the agents have a "boss" agent supervising them, or should they coordinate peer-to-peer?</p>
</li>
</ul>
<p>In other words, the main question was:</p>
<blockquote>
<p>What is the best organizational structure for a team of AI agents?</p>
</blockquote>
<p>This article tries to answer exactly that.</p>
<p>I previously wrote <a href="https://www.freecodecamp.org/news/the-math-behind-artificial-intelligence-book/">a book on the math behind AI</a>, so we won't be doing any math here.</p>
<p>Instead, we'll focus on how to organize agents for real business cases.</p>
<p>We'll use a recent AI paper from Google Research, Google DeepMind, and MIT, <a href="https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/">Towards a Science of Scaling Agent Systems: When and Why Agent Systems Work</a>, as our primary source.</p>
<p>For the code, I'll use a Jupyter notebook in Google Colab.</p>
<h3 id="heading-heres-what-well-cover">Here's What We'll Cover:</h3>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-is-an-llm">What is an LLM?</a></p>
</li>
<li><p><a href="#heading-what-are-ai-agents">What Are AI Agents?</a></p>
</li>
<li><p><a href="#heading-a-decision-algorithm-for-creating-optimal-ai-agents">A Decision Algorithm for Creating Optimal AI Agents</a></p>
</li>
<li><p><a href="#heading-three-code-examples">Three Code Examples</a></p>
<ul>
<li><p><a href="#heading-1-installing-utilities-python-libraries-and-doing-config">1. Installing Utilities, Python Libraries, and Doing Config</a></p>
</li>
<li><p><a href="#heading-2-starting-the-ollama-server-getting-the-model-and-tools">2. Starting the Ollama Server, Getting the Model and Tools</a></p>
</li>
<li><p><a href="#heading-3-testing-the-model">3. Testing the Model</a></p>
</li>
<li><p><a href="#heading-4-running-the-ai-agents">4. Running the AI Agents</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion-the-future-of-ai-is-evals">Conclusion: The Future of AI is Evals</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You don't need to be an expert developer to create AI agents. There are many no-code tools that can help you through the process.</p>
<p>But to get the most out of the examples here (and to be able to check your agents' work and understand what they're doing), you'll need:</p>
<ul>
<li><p>A general understanding of Python and what an LLM is.</p>
</li>
<li><p>Ollama installed on your machine to run large language models locally and for free.</p>
</li>
<li><p>A Jupyter Notebook setup. Google Colab is highly recommended if you have limited local hardware or need cloud GPUs.</p>
</li>
</ul>
<p>Let's get into it!</p>
<h2 id="heading-what-is-an-llm">What is an LLM?</h2>
<p>An LLM (Large Language Model) is like a very well-read intern who has never left the library.</p>
<p>The LLM can quote, summarize, translate, and imitate almost any style. It can write a Python script and a Shakespearean sonnet in the same breath!</p>
<p>But it has limitations. For example, when an LLM is unsure, it often invents something with the same confidence it uses for topics it's sure about.</p>
<p>This is called hallucination.</p>
<p>Also, LLMs don't have memory between conversations by default, and they can't do anything on their own. For example, an LLM alone can tell you how to send an email, but it can't send one.</p>
<p>This is where agents come in.</p>
<h2 id="heading-what-are-ai-agents">What Are AI Agents?</h2>
<p>If an LLM is like an intern, an AI agent is that same intern given a desk, a laptop, and a to-do list – and the ability to act.</p>
<p>An agent is essentially an LLM that has been wrapped in tools, memory, and a loop.</p>
<p>Tools allow the agent to do things like search the web, read a particular file, send an email, and run code. Memory allows the LLM to remember what it did before in other tasks. A loop is just code that lets the LLM think, call a tool, see the result, and think again until the task is done.</p>
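<p>To make this concrete, here's a minimal sketch of that loop in plain Python. The <code>fake_llm</code> function and the <code>calculator</code> tool are hypothetical stand-ins, not a real model API; the think, act, remember structure is the point:</p>
<pre><code class="language-python"># Minimal sketch of an agent: an LLM wrapped in tools, memory, and a loop.
# fake_llm() is a hypothetical stand-in for a real model call.

def fake_llm(task, memory, tools):
    """Pretend LLM: call the calculator once, then answer from memory."""
    if not memory:
        return {"type": "tool", "tool": "calculator", "input": "2 + 2"}
    return {"type": "final_answer", "text": f"The result is {memory[-1][1]}"}

TOOLS = {"calculator": lambda expr: eval(expr, {"__builtins__": {}})}

def agent_loop(task, tools, llm, max_steps=5):
    memory = []                                  # what the agent has done so far
    for _ in range(max_steps):
        action = llm(task, memory, tools)        # think
        if action["type"] == "final_answer":
            return action["text"]                # done
        result = tools[action["tool"]](action["input"])  # act: call a tool
        memory.append((action, result))          # remember, then think again
    return "Stopped: step limit reached."

print(agent_loop("What is 2 + 2?", TOOLS, fake_llm))  # prints: The result is 4
</code></pre>
<p>Frameworks like CrewAI (which we'll use later) implement this same loop for you, with real LLM calls and real tools.</p>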
<p>In many cases, an individual agent is very useful. But what happens when you have a task too big for one intern (or agent in this case)?</p>
<p>Naturally, you can hire more interns! But you get new problems:</p>
<ul>
<li><p>Should you have one intern with a long to-do list (single-agent)?</p>
</li>
<li><p>Should you have five interns all working on the same task independently (independent multi-agent)?</p>
</li>
<li><p>How many interns should be on a team?</p>
</li>
<li><p>Should a boss who assigns subtasks manage the interns?</p>
</li>
<li><p>Should you have a group of peers who coordinate among themselves? A mix?</p>
</li>
</ul>
<p>This is the exact question the Google paper we're using as our primary source here tries to answer with over 150 controlled experiments.</p>
<p>Just keep in mind that having more agents doesn't always mean you'll get better results. Sometimes one agent is a perfect fit. And other times you'll need more.</p>
<h3 id="heading-some-background">Some Background</h3>
<p>Before we dive in, an important note: these are experimental findings, not laws of physics.</p>
<p>Using an exhaustive methodology, the Google paper evaluated many possible AI agent team configurations across several model providers.</p>
<p>Some of the providers were:</p>
<ul>
<li><p>OpenAI (ChatGPT)</p>
</li>
<li><p>Google (Gemini)</p>
</li>
<li><p>Anthropic (Claude)</p>
</li>
</ul>
<p>The results differed by model family:</p>
<ul>
<li><p>OpenAI models gained most from centralized/hybrid setups</p>
</li>
<li><p>Google models showed a clear efficiency plateau</p>
</li>
<li><p>Anthropic models were more sensitive to coordination overhead</p>
</li>
</ul>
<p>Since the study is persuasive and based on a large number of controlled experiments, your team can treat these findings as strong guidelines when choosing a model family.</p>
<h2 id="heading-a-decision-algorithm-for-creating-optimal-ai-agents">A Decision Algorithm for Creating Optimal AI Agents</h2>
<p>Now, we'll take the research in the article and convert it into a simple-to-apply algorithm that anyone can use to create AI agents to automate their work.</p>
<p>The main objective of this algorithm is to help you decide, with the Google paper as a scientific reference, if you need just one agent or a couple more.</p>
<p>This way, instead of explaining the article step by step, I'll show you how to actually apply it to solve your problems.</p>
<h3 id="heading-1-check-your-budget">1. Check Your Budget</h3>
<p>If you have limited hardware, I recommend starting with Ollama.</p>
<p>Ollama is a tool that allows you to run LLMs on your personal computer. And when you run it locally, it's free (and open source).</p>
<p>If you use an API from OpenAI, Google, or Anthropic to access their models, you'll start spending money.</p>
<p>As of May 6, 2026, OpenAI's GPT-5.5 costs $5.00 per 1M tokens, while GPT-5.4 mini costs $0.75 per 1M tokens.</p>
<p>If you have limited cloud resources, you can use Google Colab to access GPUs and run larger and newer billion-parameter LLMs. Often, newer LLMs have better results in image generation, coding, and others.</p>
<p>You can also use LLMs with Ollama in Google Colab.</p>
<p>If you have a company project, I recommend this same cloud-based option. It allows you to build a demo and run evaluations in an environment with more memory than most local office hardware provides.</p>
<p>If you have a flexible budget, you can use professional APIs like Claude or Gemini.</p>
<p>Always remember that agents cost tokens, and tokens cost money.</p>
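<p>As a back-of-the-envelope sketch, you can estimate a monthly bill from per-1M-token prices like the ones above. The workload numbers below are made up purely for illustration:</p>
<pre><code class="language-python"># Rough monthly token-cost estimate. Workload numbers are illustrative.
def monthly_cost(tokens_per_task, tasks_per_day, price_per_million_tokens):
    tokens_per_month = tokens_per_task * tasks_per_day * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# e.g. 5,000 tokens per task, 200 tasks per day, at $0.75 per 1M tokens
print(round(monthly_cost(5_000, 200, 0.75), 2))  # prints 22.5
</code></pre>
<p>Multi-agent setups multiply this: every coordination message between agents is more tokens.</p>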
<h3 id="heading-2-start-with-only-one-agent">2. Start with Only ONE Agent</h3>
<p>Always begin with a single agent. Usually, if you're using frontier models, they'll have better performance than older open source models.</p>
<h3 id="heading-3-measure-performance">3. Measure Performance</h3>
<p>According to the paper, if a single agent's real-world success rate (how well it works and how accurately it performs) is more than 45%, then there's typically no need to create a team of agents for the task.</p>
<p>To measure this, run the agent on 50–100 representative tasks. Then, score each against a quality bar you defined before starting (human review, a known-good answer, or a checklist).</p>
<p>Note that the paper's 45% finding is only one-directional: it identifies when <strong>not</strong> to add agents (above 45%). But the rule doesn't go the other way and state that if performance is below 45%, that means another agent or two will help.</p>
<p>The authors state that "coordination benefits arise from matching communication topology to task structure, not from scaling the number of agents".</p>
<p>Basically, if your agent underperforms, fix the agent first! Don't just automatically think you need another agent.</p>
<p>If you determine, for your project, that a single agent works, then go ahead to step 7.</p>
<p>If the single agent's performance is below 45%, first try improving it (better prompts, tools, or model). Only consider creating a team of agents if the task is naturally parallel (see the next step).</p>
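<p>Steps 2 and 3 can be sketched as a simple gate. Here, <code>run_agent</code> and <code>passes_quality_bar</code> are hypothetical placeholders for your agent call and the scoring rule you defined before starting:</p>
<pre><code class="language-python">import operator

# Sketch of the paper's 45% single-agent gate. run_agent() and
# passes_quality_bar() are hypothetical placeholders you would supply.
def success_rate(tasks, run_agent, passes_quality_bar):
    passed = sum(1 for t in tasks if passes_quality_bar(run_agent(t), t))
    return passed / len(tasks)

def single_agent_is_enough(rate, threshold=0.45):
    # One-directional rule: strictly above the threshold, skip multi-agent.
    # (operator.gt(a, b) means "a greater than b".)
    return operator.gt(rate, threshold)

# Toy run: 30 of 60 dummy "tasks" pass the quality bar.
rate = success_rate(range(60), lambda t: t, lambda out, t: out % 2 == 0)
print(rate, single_agent_is_enough(rate))  # prints 0.5 True
</code></pre>
<p>In a real run, the tasks would be your 50–100 representative examples and the quality bar would be human review, a known-good answer, or a checklist, as described above.</p>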
<h3 id="heading-4-assess-task-parallelism">4. Assess Task Parallelism</h3>
<p>A big question then becomes, why use multiple agents at all? Here's how you can decide:</p>
<p>If your task involves just one continuous job, a single agent typically does it better and cheaper.</p>
<p>But multiple agents can help when you can clearly split your project into discrete subtasks. Then a different specialist (agent) can tackle each subtask and multiple agents can work on multiple tasks in parallel.</p>
<p>In this step of our algorithm, you want to see if the task you're trying to apply the AI agents to is naturally parallel.</p>
<p>A task is naturally parallel if it can be split into independent subtasks. For example:</p>
<ul>
<li><p>Searching for the best flight across five different websites.</p>
</li>
<li><p>Summarizing ten separate news articles at once.</p>
</li>
</ul>
<p>Examples where tasks are not naturally parallel:</p>
<ul>
<li><p>Planning a trip from start to finish (you must choose a destination before booking a hotel, for example – so those tasks can't be completed in parallel).</p>
</li>
<li><p>Managing a bank transfer (the funds must be verified before they're sent).</p>
</li>
</ul>
<p>If the task is naturally parallel, you may benefit from more agents, and you should continue on to step 5.</p>
<p>If it's not (the task is sequential or step-by-step), stop. According to the article's research, multi-agent teams will just negatively impact the result in these cases and you should stick to one agent.</p>
<p>In this case (not naturally parallel), just work on improving your prompts, tools, or model for the single agent. Then, once it beats the 45% threshold, go to step 7.</p>
<h3 id="heading-5-pick-the-topology-by-task-type">5. Pick the Topology by Task Type</h3>
<p>Now we'll decide on the structure for our agent team.</p>
<p>Topology simply means the structure of a system. In this case, we're talking about the structure of the team of AI agents.</p>
<p>This step only applies once you've decided you need multiple agents. Both topologies we'll examine here are multi-agent.</p>
<p>If the task is based on analysis or structured work, it's better to use a centralized model. A centralized model is like a manager managing a group of interns below them. The interns report to the manager, and the manager coordinates them.</p>
<p>A centralized model is good for pipelines like financial reports.</p>
<p>According to the study, this reduces error amplification from ~17x to 4x. This means that, when the manager makes a mistake, instead of 17 errors being created by the interns, there are more like 4 errors.</p>
<p>If the task is more related to exploration, use a decentralized model.</p>
<p>They're good for open-ended research or audits where agents review the same material from different angles.</p>
<p>A decentralized model is like interns in a team brainstorming ideas for a new product for the company or discussing over lunch how to make a process faster.</p>
<h3 id="heading-6-cap-the-team-size-and-available-tools-per-agent">6. Cap the Team Size and Available Tools Per Agent</h3>
<p>According to the paper, AI agent success starts to degrade after about 3–4 agents.</p>
<p>They also explain that each agent should have access to the minimum tools necessary (1–3 tools per agent). The more tools each agent has, the worse it performs.</p>
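<p>A quick sanity check for these caps might look like the sketch below. The limits come straight from the paper's findings; the helper itself is hypothetical:</p>
<pre><code class="language-python"># Sketch: flag teams that break the paper's caps (max 4 agents, 1-3 tools each).
MAX_AGENTS = 4
MAX_TOOLS = 3

def check_team(team):
    """team maps each agent name to its list of tool names."""
    problems = []
    if len(team) not in range(1, MAX_AGENTS + 1):
        problems.append(f"team size {len(team)} is outside 1-{MAX_AGENTS}")
    for name, tools in team.items():
        if len(tools) not in range(1, MAX_TOOLS + 1):
            problems.append(f"{name} has {len(tools)} tools, expected 1-{MAX_TOOLS}")
    return problems

print(check_team({"researcher": ["web_search"], "writer": ["file_write"]}))  # prints []
</code></pre>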
<h3 id="heading-7-build-evaluations">7. Build Evaluations</h3>
<p>Now, you have something that works most of the time. But how can you ensure the agents will scale across the organization? For this reason, now you need to establish internal tests before scaling the agents.</p>
<p>These internal tests are called evals (evaluations).</p>
<p>For each evaluation, you'll need to have clear metrics that let you know how the agents are performing in each evaluation.</p>
<p>You'll want to measure things like accuracy, efficiency, and trajectory. Accuracy tells us if the model got it right. Efficiency reports how fast and cheap it was to process the request. And trajectory shows if the model used the right tools to do the task.</p>
<p>Remember, in AI and engineering in general, if you can't measure the system's performance, you can't trust the system.</p>
<p>This way, you can start seeing how well the model performs with the data your organization works with and its context. Using these evals, you can help the agents become more independent and better over time.</p>
<p>Evals might be:</p>
<ul>
<li><p>Input emails and output responses expected</p>
</li>
<li><p>Input customer support transcripts and outputs summarized action items</p>
</li>
<li><p>Input complex legal contracts and outputs identified high-risk clauses</p>
</li>
</ul>
<p>Then you see how close the agent's or agents' outputs are to the expected output.</p>
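<p>A tiny eval harness can be sketched like this. The exact-match scoring and the sample cases are placeholders; real evals would compare against human review or a rubric, and would also track efficiency and trajectory:</p>
<pre><code class="language-python"># Sketch of a minimal eval harness: compare agent outputs to expected answers.
# Exact-match scoring is a placeholder for a real rubric or human review.
def run_evals(cases, run_agent):
    results = []
    for prompt, expected in cases:
        output = run_agent(prompt)
        results.append({"prompt": prompt, "accuracy": float(output == expected)})
    overall = sum(r["accuracy"] for r in results) / len(results)
    return overall, results

cases = [("2+2", "4"), ("capital of France", "Paris")]
dummy_agent = lambda p: {"2+2": "4"}.get(p, "unknown")
overall, details = run_evals(cases, dummy_agent)
print(overall)  # prints 0.5
</code></pre>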
<p>You can also try different models and go through this decision algorithm again to see which models work best for your use case. After all, new models are often better than previous models.</p>
<p>With this workflow in place, you'll create more accurate and efficient agents.</p>
<p>Now let's look at this algorithm in action using three use cases.</p>
<h2 id="heading-three-code-examples">Three Code Examples</h2>
<p>In this section, I'll explain how I ran the code in the Jupyter notebook. I recommend that you copy the code and run it yourself so you can follow along and understand how it works.</p>
<p>We'll go through the code section by section, following the structure I defined in the Google Colab notebook, so that you understand everything.</p>
<p>You can find the full notebook <a href="https://github.com/tiagomonteiro0715/How-to-Build-Optimal-AI-Agents-That-Actually-Work-Handbook">here on GitHub</a> as well. I used the MIT license for this code.</p>
<h3 id="heading-1-installing-utilities-python-libraries-and-doing-config">1. Installing Utilities, Python Libraries, and Doing Config</h3>
<pre><code class="language-python">!sudo apt update &amp;&amp; sudo apt install -y pciutils
!sudo apt-get install -y zstd
!curl -fsSL https://ollama.com/install.sh | sh
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/c91a3d8b-18dd-4850-bca6-ae707e69736c.png" alt="c91a3d8b-18dd-4850-bca6-ae707e69736c" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This code essentially prepares the notebook to run AI agents.</p>
<p>The first line updates the package list and installs hardware detection tools to identify your GPU. The second line installs a high-speed decompression utility needed to unpack model files. Finally, it downloads the official Ollama setup script and executes it to install the software.</p>
<p>Ollama is an open-source tool that allows you to use LLMs on your computer.</p>
<pre><code class="language-python">!pip install uv
!uv pip install langchain-ollama ollama crewai duckduckgo-search langchain-community ddgs faker
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/d86340f3-3a19-4a89-9975-ecb4116d379a.png" alt="d86340f3-3a19-4a89-9975-ecb4116d379a" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here, we installed the <code>uv</code> Python package. It's like pip, but far faster and safer.</p>
<p>With this, we can download the rest of the Python libraries much more quickly.</p>
<pre><code class="language-python">import socket
import subprocess
import threading
import time

import ollama
from crewai import Agent, Crew, LLM, Process, Task
from IPython.display import Markdown
from langchain_ollama.llms import OllamaLLM

from crewai.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

from faker import Faker
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/60effe35-2293-4201-afb0-f561a64470e4.png" alt="60effe35-2293-4201-afb0-f561a64470e4" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>With the above code, we imported all the Python libraries needed to create optimal AI agents.</p>
<p>Let's see what each one does:</p>
<ul>
<li><p><a href="https://github.com/python/cpython/blob/main/Lib/socket.py">socket</a>: Connects your computer to others over a network.</p>
</li>
<li><p><a href="https://github.com/python/cpython/blob/main/Lib/subprocess.py">subprocess</a>: Lets Python launch and control other programs on your computer.</p>
</li>
<li><p><a href="https://github.com/python/cpython/blob/main/Lib/threading.py">threading</a>: Runs multiple tasks at once so one slow process doesn't freeze the whole code.</p>
</li>
<li><p><a href="https://github.com/python/cpython/blob/main/Modules/timemodule.c">time</a>: Handles delays and timestamps, like making the code wait or measuring speed.</p>
</li>
<li><p><a href="https://github.com/ollama/ollama-python">ollama</a>: The tool we'll use for talking to AI models running locally on your machine.</p>
</li>
<li><p><a href="https://github.com/crewAIInc/crewAI">crewai</a>: Organizes multiple AI agents to work together like a specialized team.</p>
</li>
<li><p><a href="https://github.com/ipython/ipython">IPython</a>: Powers interactive coding features and pretty-printing in tools like Jupyter.</p>
</li>
<li><p><a href="https://github.com/langchain-ai/langchain/blob/master/libs/partners/ollama/README.md">langchain_ollama</a>: Plugs local Ollama models into the popular LangChain AI framework.</p>
</li>
<li><p><a href="https://github.com/langchain-ai/langchain-community">langchain_community</a>: Offers hundreds of extra "connectors" to link AI to the outside world.</p>
</li>
<li><p><a href="https://github.com/joke2k/faker">faker</a>: Generates realistic "dummy" data (names, emails) for testing your code safely.</p>
</li>
</ul>
<pre><code class="language-python">fake = Faker("en_US")

Faker.seed(42)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/6d896775-9db5-4d1a-b144-07b035f1dc35.png" alt="6d896775-9db5-4d1a-b144-07b035f1dc35" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In these two lines of code, we configured the Faker Python library to generate fake data in English from the United States.</p>
<h3 id="heading-2-starting-the-ollama-server-getting-the-model-and-tools">2. Starting the Ollama Server, Getting the Model and Tools</h3>
<pre><code class="language-python">with open("ollama.log", "w") as log_file:
    process = subprocess.Popen(["ollama", "serve"], stdout=log_file, stderr=log_file)

def is_server_ready(port=11434):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex(('localhost', port)) == 0

print("Booting Ollama server...")
max_retries = 20
ready = False

for i in range(max_retries):
    if is_server_ready():
        ready = True
        break
    time.sleep(1)
    if i % 5 == 0:
        print(f"Still waiting... ({i}s)")

if ready:
    print("\n Success! Ollama is running and ready for models.")
    !curl -s http://localhost:11434 | grep "Ollama is running"
else:
    print("\n Error: Ollama server failed to start. Check 'ollama.log' for details.")
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/1daf506b-fb25-4487-9bb3-887b37bb0aaf.png" alt="1daf506b-fb25-4487-9bb3-887b37bb0aaf" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This code helps ensure that your local environment is fully prepared before your AI models try to run.</p>
<p>AI servers often take some time to boot, so just be patient.</p>
<p>This script prevents "connection refused" errors by using a background process to start Ollama and a network "handshake" to confirm that it's awake.</p>
<pre><code class="language-python">!ollama pull mistral-small3.2
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/ce54b7e0-0b4f-4751-b797-ac4bd45cae63.png" alt="ce54b7e0-0b4f-4751-b797-ac4bd45cae63" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this line, we pulled the <code>mistral-small3.2</code> LLM into the Google Colab notebook.</p>
<p>Mistral is a model developed by a well-known French startup, Mistral AI SAS.</p>
<pre><code class="language-python">_ddg = DuckDuckGoSearchRun()

@tool("web_search")
def web_search(query: str) -&gt; str:
    """Search the public web via DuckDuckGo. Input: a concise search query string. Returns: top result snippets as plain text."""
    return _ddg.run(query)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/0cadabf5-d454-418d-844c-3167a68283bd.png" alt="0cadabf5-d454-418d-844c-3167a68283bd" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code we've created a tool for our agents to use: we're giving the agents the ability to search the web with DuckDuckGo. DuckDuckGo is one of the most popular privacy-focused search engines on the web.</p>
<p>This is crucial because it lets our agents retrieve recent information that isn't in the model's training data.</p>
<h3 id="heading-3-testing-the-model">3. Testing the Model</h3>
<p>Now we'll write the code that defines and tests the LLM.</p>
<p>We're initializing both a standard model for direct tasks and a specialized LLM object for the CrewAI framework. It's the specialized LLM object for the CrewAI framework that we'll use to power our AI agents.</p>
<p>This initial configuration is important because it validates that your machine is properly communicating with the software before you try to create AI agents.</p>
<pre><code class="language-python">AI_prompt = "Write a quick system prompt for an AI agent whose job is to summarize financial documents."

AI_model = OllamaLLM(model="mistral-small3.2")

crew_llm = LLM(
    model="ollama/mistral-small3.2",
    base_url="http://localhost:11434"
)

print("Running Mistral...")
AI_response = AI_model.invoke(AI_prompt)
display(Markdown(f"### AI Output:\n{AI_response}"))
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/5f76b8c8-6713-40dd-a624-fc83fb35f666.png" alt="5f76b8c8-6713-40dd-a624-fc83fb35f666" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-4-running-the-ai-agents">4. Running the AI Agents</h3>
<p>Now, we'll run three different agent configurations.</p>
<p>The first one is a single agent for sequential tasks. The second one is a centralized team, and the third one is a decentralized team.</p>
<h4 id="heading-sequential-tasks-with-a-single-agent">Sequential Tasks with a Single Agent</h4>
<pre><code class="language-python">doc_5_1 = f"""{fake.company()} {fake.company_suffix()} — Q3 2026 Earnings Report
Prepared by: {fake.name()}, CFO
KEY METRICS
Revenue: ${fake.random_int(50, 500)}M (up {fake.random_int(5, 25)}% YoY)
Net Income: ${fake.random_int(10, 80)}M
Operating Margin: {fake.random_int(12, 28)}%
Active Customers: {fake.random_int(10_000, 500_000):,}
Cash on Hand: ${fake.random_int(100, 900)}M
Employee Headcount: {fake.random_int(200, 5000):,}
MANAGEMENT COMMENTARY
{fake.paragraph(nb_sentences=5)}
RISK FACTORS
{fake.paragraph(nb_sentences=4)}
"""
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/15c0b2f4-9e8e-4ed1-950b-2d897502ae28.png" alt="15c0b2f4-9e8e-4ed1-950b-2d897502ae28" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, we prepared the general template where the fake data will be generated.</p>
<pre><code class="language-python">print(doc_5_1)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/c16aa43e-da98-4255-be6e-0ba60b342163.png" alt="c16aa43e-da98-4255-be6e-0ba60b342163" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">Rodriguez, Figueroa and Sanchez and Sons — Q3 2026 Earnings Report
Prepared by: Megan Mcclain, CFO
KEY METRICS
Revenue: $94M (up 23% YoY)
Net Income: $64M
Operating Margin: 13%
Active Customers: 25,622
Cash on Hand: $195M
Employee Headcount: 1,991
MANAGEMENT COMMENTARY
Own night respond red information last everything. Serve civil institution. Choice whatever from behavior benefit. Page southern role movie win her.
RISK FACTORS
Stop peace technology officer relate. Product significant world. Term herself law street class. Decide environment view possible participant commercial. Clear here writer policy news.
</code></pre>
<p>With this code, we printed the document the agent will process.</p>
<pre><code class="language-python">analyst = Agent(
    role="Senior Financial Document Specialist",
    goal=(
        "Read the provided document end-to-end, extract the 5 most decision-relevant KPIs "
        "(with units, period, and source line when available), and produce a CEO-ready summary. "
        "When a figure is missing or ambiguous, use web_search to verify it against public sources."
    ),
    backstory=(
        "You have 10+ years auditing 10-Ks, earnings releases, and investor decks at a Big Four firm. "
        "You work linearly, cite page/section for every metric, and never invent numbers — "
        "if a value isn't in the text, you search for it or mark it as 'not disclosed'."
    ),
    tools=[web_search],
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/528b2693-3b24-4119-b88e-3eda4d1d9141.png" alt="528b2693-3b24-4119-b88e-3eda4d1d9141" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, we defined an agent that acts as an analyst. This analyst will analyze the report that's generated. It will also have access to DuckDuckGo.</p>
<pre><code class="language-python">task_1 = Task(
    description=(
        "Analyze the following document for KPI metrics.\n\n"
        "DOCUMENT:\n"
        f"{doc_5_1}"
    ),
    agent=analyst,
    expected_output="A list of 5 key KPIs found in the text.",
)

task_2 = Task(
    description="Based on the KPIs extracted in the previous task, write a professional executive summary.",
    agent=analyst,
    expected_output="A 200-word summary suitable for a CEO.",
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/b5737a9c-ccc8-477c-b859-bf6de5a82f87.png" alt="b5737a9c-ccc8-477c-b859-bf6de5a82f87" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The analyst will only have two tasks: one is to find KPI metrics and the second is to write a report of the document. So, in this way we have sequential tasks performed by only one AI agent, and we're following the empirical guidelines of the Google paper.</p>
<pre><code class="language-python">sequential_crew = Crew(
    agents=[analyst],
    tasks=[task_1, task_2],
    process=Process.sequential
)

print("Running Case 1: Sequential...")
result_1 = sequential_crew.kickoff()
display(Markdown(f"### Case 1 Result:\n{result_1}"))
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/c1a24352-e8e3-4f49-a2d0-c7e0bd75d3db.png" alt="c1a24352-e8e3-4f49-a2d0-c7e0bd75d3db" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">Dear CEO,

I am pleased to present a concise overview of Rodriguez, Figueroa and Sanchez and Sons Q3 2026 Earnings Report. Our company has demonstrated strong financial performance this quarter. We reported a significant increase in revenue, achieving $94 million, which represents a substantial 23% year-over-year growth. This growth is a testament to our effective business strategies and the increasing demand for our products or services.

Our net income for the quarter stands at $64 million, showcasing our ability to maintain robust profitability. The operating margin of 13% further highlights our efficient cost management and operational excellence. Customer satisfaction and engagement continue to be a priority, as evidenced by our growing base of 25,622 active customers.

In terms of liquidity, we have a solid cash position of $195 million, ensuring that we have the necessary resources to seize new opportunities and navigate any challenges that may arise. Our employee headcount of 1,991 reflects our commitment to talent acquisition and development.

In conclusion, this quarter's results underscore our strong market position and the successful execution of our business strategies. We remain optimistic about our future prospects and are committed to driving sustainable growth and shareholder value. Let's continue to build on this momentum in the coming quarters.

Best Regards, [Your Name]
</code></pre>
<p>Finally, we run the agent we created, and above is the summary it produced.</p>
<h4 id="heading-centralized-team-of-four-agents">Centralized Team of Four Agents</h4>
<p>Now we'll create a team of four agents so you can see how multiple agents work.</p>
<p>This team researches lithium market trends, carries out financial modeling, and generates a data-driven investment proposal.</p>
<p>A centralized team works here because each step feeds into the next. We start our research, then we study the research, and finally we make a recommendation.</p>
<p>Let's build the first one that will research the market:</p>
<pre><code class="language-python">researcher = Agent(
    role="Commodity Market Researcher (Battery Metals)",
    goal=(
        "Produce dated, sourced price data points for 2026 lithium carbonate and lithium hydroxide forecasts. "
        "Always pull from web_search; never guess. Return each data point as: value, unit, date, source URL."
    ),
    backstory=(
        "Ex-analyst at a commodities desk. You trust only primary sources (IEA, Benchmark Mineral Intelligence, "
        "Fastmarkets, company filings) and you flag any figure that lacks a verifiable source."
    ),
    tools=[web_search],
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/6d204267-0a65-4b0a-b93a-844282724550.png" alt="6d204267-0a65-4b0a-b93a-844282724550" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The first agent we created will search the web for data related to lithium. For this task it will have access to DuckDuckGo.</p>
<p>Now we'll create an agent that knows and works in finance to model the data the researcher got.</p>
<pre><code class="language-python">finance_pro = Agent(
    role="Capex Financial Modeler",
    goal=(
        "Take the researcher's price data and run a 10-year NPV and IRR simulation at a 10% discount rate, "
        "stating all assumptions explicitly and returning a table plus a short narrative."
    ),
    backstory=(
        "You've built DCF models for gigafactory investments. You show your formulas, label base/bull/bear cases, "
        "and refuse to produce a number without stating the inputs behind it."
    ),
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/375e5943-3bd4-4c05-8ab1-4fcc10dab892.png" alt="375e5943-3bd4-4c05-8ab1-4fcc10dab892" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The finance agent will use the researcher's information and make simulations of it.</p>
<p>From there, we'll define another agent that will advise us on strategy based on the financial model:</p>
<pre><code class="language-python">strategy_advisor = Agent(
    role="Investment Strategy Advisor",
    goal=(
        "Synthesize the researcher's price data and the modeler's NPV/IRR results into a "
        "clear go/no-go recommendation, with the top 3 risks and the conditions under which "
        "the recommendation flips."
    ),
    backstory=(
        "Former MD at a project-finance fund. You translate models into decisions and always "
        "name the sensitivities that would change your call."
    ),
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/daf6b079-cb53-410b-a2cb-5d7d933a13f6.png" alt="daf6b079-cb53-410b-a2cb-5d7d933a13f6" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This way, we have one agent to do the research, another to do the modeling, and a final one to advise us on strategy.</p>
<pre><code class="language-python">centralized_crew = Crew(
    agents=[researcher, finance_pro, strategy_advisor],
    tasks=[
        Task(description="Research 2026 lithium price forecasts.", agent=researcher, expected_output="Price data points."),
        Task(description="Run an NPV simulation using prices.", agent=finance_pro, expected_output="Full NPV report."),
        Task(description="Issue a go/no-go recommendation based on the NPV report.", agent=strategy_advisor, expected_output="Go/no-go memo with top 3 risks."),
    ],
    process=Process.hierarchical,
    manager_llm=crew_llm
)

print("Running Case 2: Centralized (Hierarchical)...")
result_2 = centralized_crew.kickoff()
display(Markdown(f"### Case 2 Result:\n{result_2}"))
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/90723254-2519-4187-a208-d014c7b20b66.png" alt="90723254-2519-4187-a208-d014c7b20b66" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now we add the fourth agent: setting <code>manager_llm</code> makes CrewAI auto-spawn a manager that coordinates and reviews the other agents' work.</p>
<p>Then, we run the three agents together.</p>
<h4 id="heading-decentralized-team-of-three-agents">Decentralized Team of Three Agents</h4>
<p>Now we'll create a decentralized team of three agents. Once again, the first step is to create the data.</p>
<p>A decentralized model fits here because the auditors review the same data from different angles. Also, the auditors cross-reference findings.</p>
<pre><code class="language-python">groups = ["Group A (men)", "Group B (women)", "Group C (under-40)", "Group D (over-40)"]
hiring_stats = "\n".join(
    f"{g}: {fake.random_int(40, 120)} applicants, {fake.random_int(5, 25)} hired"
    for g in groups
)
feedback = "\n".join(
    f'- Candidate {fake.name()}: "{fake.sentence(nb_words=12)}"'
    for _ in range(6)
)
doc_5_3 = f"""Q1 2026 Hiring Audit Data — {fake.company()}
APPLICANT POOL &amp; SELECTION RATES
{hiring_stats}
INTERVIEWER FEEDBACK NOTES (sample)
{feedback}
"""
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/5ff84edc-306e-460b-bb3a-181254cbab79.png" alt="5ff84edc-306e-460b-bb3a-181254cbab79" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>We also defined a general template to generate the fake data.</p>
<pre><code class="language-python">print(doc_5_3)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/d68ddc9a-15c6-4f0f-aa12-ecdf08e6c7d0.png" alt="d68ddc9a-15c6-4f0f-aa12-ecdf08e6c7d0" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">Q1 2026 Hiring Audit Data — Zimmerman Inc
APPLICANT POOL &amp; SELECTION RATES
Group A (men): 81 applicants, 6 hired
Group B (women): 69 applicants, 6 hired
Group C (under-40): 80 applicants, 17 hired
Group D (over-40): 74 applicants, 7 hired
INTERVIEWER FEEDBACK NOTES (sample)
- Candidate Tommy Walter: "Defense material those poor central cause seat much section investment on gun."
- Candidate Brenda Snyder PhD: "Check civil quite others his other life edge."
- Candidate Terri Frazier: "Race Mr environment political born itself law west."
- Candidate Deborah Mason: "Medical blood personal success medical current hear claim well."
- Candidate Tamara George: "Affect upon these story film around there water beat magazine attorney set she campaign."
- Candidate Joshua Baker: "Institution deep much role cut find yet practice just military building different full open discover detail."
</code></pre>
<p>Above is the fake data we generated.</p>
<p>Now, we'll create three auditors.</p>
<p>The first auditor focuses on selection rates across the demographic groups in the hiring data.</p>
<pre><code class="language-python">auditor_a = Agent(
    role="Statistical Hiring Auditor",
    goal=(
        "Compute selection-rate ratios across demographic groups for the Q1 hiring batch, "
        "apply the 4/5ths rule, and flag any group where the ratio falls below 0.80. "
        "Use web_search only to confirm regulatory definitions."
    ),
    backstory=(
        "Former EEOC compliance analyst. You are rigorously numerical, cite the Uniform "
        "Guidelines on Employee Selection Procedures, and never draw qualitative conclusions "
        "outside your lane."
    ),
    tools=[web_search],
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/bd05e48c-156e-4f34-aaa7-6ded4e460a46.png" alt="bd05e48c-156e-4f34-aaa7-6ded4e460a46" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Then we'll define the second auditor for recruitment processing. This one seeks to find bias in the way interviews are conducted.</p>
<pre><code class="language-python">auditor_b = Agent(
    role="Qualitative Bias Reviewer",
    goal=(
        "Read interview notes and written feedback for coded language, inconsistent rubric "
        "application, and sentiment skew across candidate groups. Combine your findings with "
        "the statistical auditor's numbers into one final report."
    ),
    backstory=(
        "I/O psychologist with a focus on structured-interview research. You cite specific "
        "phrases as evidence and distinguish 'concerning pattern' from 'isolated incident'."
    ),
    tools=[web_search],
    llm=crew_llm,
    verbose=True,
    allow_delegation=False,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/bcb01353-cab0-4fa1-8ca5-22aacc8ed88e.png" alt="bcb01353-cab0-4fa1-8ca5-22aacc8ed88e" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Finally, we create a third auditor that will focus on whether the various hiring policies are being followed.</p>
<pre><code class="language-python">auditor_c = Agent(
    role="Process &amp; Policy Compliance Auditor",
    goal=(
        "Review the hiring process for adherence to documented policy: structured-interview "
        "use, rubric consistency, and required approval steps. Cross-check the statistical "
        "and qualitative findings to surface root-cause process gaps."
    ),
    backstory=(
        "Internal audit lead with an HR-ops background. You map findings to specific policy "
        "clauses and recommend concrete process fixes."
    ),
    tools=[web_search],
    llm=crew_llm,
    verbose=True,
    allow_delegation=True,
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/d1be79dd-7346-4d6a-b794-672050a97aa4.png" alt="d1be79dd-7346-4d6a-b794-672050a97aa4" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Note that only the final auditor is initialized with <code>allow_delegation=True</code>. This lets it delegate questions back to the other two auditors and cross-check their findings; for a fully peer-to-peer setup, you would enable delegation on every auditor.</p>
<p>Then we give each auditor a task.</p>
<pre><code class="language-python">task_audit_stats = Task(
    description=(
        "Audit the Q1 hiring batch for structural bias. "
        "Compute selection rates per group and flag any disparities.\n\n"
        "DATA:\n"
        f"{doc_5_3}"
    ),
    agent=auditor_a,
    expected_output="A report highlighting any group disparities found.",
)

task_audit_review = Task(
    description=(
        "Review the findings of the Statistical Auditor and add qualitative "
        "context from the interviewer notes in the original document."
    ),
    agent=auditor_b,
    expected_output="A final combined audit report with numbers and narrative.",
)

task_audit_process = Task(
    description=(
        "Using the statistical and qualitative findings above, identify process-level root "
        "causes (e.g. unstructured interviews, missing rubrics, approval gaps) and propose fixes."
    ),
    agent=auditor_c,
    expected_output="A process-gap list with policy references and recommended fixes.",
)
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/5af5e0b0-14d7-4a5b-a274-a0df4b7012cb.png" alt="5af5e0b0-14d7-4a5b-a274-a0df4b7012cb" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Finally, we assemble the auditor team:</p>
<pre><code class="language-python">decentralized_crew = Crew(
    agents=[auditor_a, auditor_b, auditor_c],
    tasks=[task_audit_stats, task_audit_review, task_audit_process],
    process=Process.sequential,
)

print("Running Case 3: Decentralized (Peer Review)...")
result_3 = decentralized_crew.kickoff()
display(Markdown(f"### Case 3 Result:\n{result_3}"))
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/66b61567c4f938d4d78aca50/c9cfff42-eb86-4f57-9840-7f85cc83768a.png" alt="c9cfff42-eb86-4f57-9840-7f85cc83768a" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-plaintext">
Case 3 Result:
Combined Audit Report: Q1 Hiring Batch Audit for Structural Bias
Statistical Audit Findings:

    Applicant Pool and Selection Rates:
        Group A (men): 81 applicants, 6 hired
            Selection Rate: 6/81 = 0.074074 (7.41%)
        Group B (women): 69 applicants, 6 hired
            Selection Rate: 6/69 = 0.08696 (8.70%)
        Group C (under-40): 80 applicants, 17 hired
            Selection Rate: 17/80 = 0.2125 (21.25%)
        Group D (over-40): 74 applicants, 7 hired
            Selection Rate: 7/74 = 0.094595 (9.46%)

    Selection Rate Ratios:
        Group A / Group B: 0.074074 / 0.08696 = 0.85 (85%)
        Group C / Group D: 0.2125 / 0.094595 = 2.24 (224%)

    Application of the 4/5ths Rule:
        Group A (men) vs Group B (women): The selection rate ratio is 0.85, which is above the 0.80 threshold.
        Group C (under-40) vs Group D (over-40): The selection rate ratio is 2.24, which is above the 0.80 threshold.

    Conclusion: Based on the selection rate analysis, no group disparities are flagged as falling below the 0.80 threshold according to the 4/5ths rule.

Qualitative Audit Findings:
Group A (men) vs Group B (women):

    Concerning Patterns:
        Feedback Inconsistency:
            Isolated Incident: "Candidate lacked experience but showed strong potential."
                This feedback was given to a female candidate but not to similarly situated male candidates.
        Sentiment Skew:
            Concerning Pattern: More frequently in female candidate assessments the phrases "needs improvement in leadership skills" and "less assertive" were observed.

Group C (under-40) vs Group D (over-40):

    Concerning Patterns:
        Feedback Inconsistency:
            Concerning Pattern: Phrases like "strong strategic thinker" and "in-depth industry knowledge" frequently used to describe over-40 candidates.
                Similar competence indicators were not noted in feedback for candidates under 40.
        Sentiment Skew:
            Isolated Incident: For a few under-40 candidates, feedback noted "lacks experience in leading teams."
                This sentiment was not applied to under-40 candidates with similar profiles but differed in gender.

Additional Notes:

    Rubric Application:
        Concerning Pattern: The rubric application was inconsistent when evaluating "leadership skills" and "assertiveness" especially between male and female candidates.
        Isolated Incident: Some reviewers emphasized "cultural fit" for female candidates which was not a requirement and was not consistently applied.

Final Conclusion:

Based on the selection rate analysis, no group disparities are flagged as falling below the 0.80 threshold according to the 4/5ths rule. However, qualitative findings indicate potential biases in feedback and rubric application which could influence hiring decisions. Recommendations:

    Standardize evaluation criteria and implement unbiased language in evaluations.
    Conduct further training to ensure consistent understanding and application of rubric standards across all reviewers.
    Monitor the impact of these interventions in future hiring cycles to ensure equitable selection practices.
</code></pre>
<p>Above, you can see the report from the three auditors about the hiring process.</p>
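<p>This report is also a good reminder of why evals matter: you can reproduce the statistical auditor's 4/5ths-rule arithmetic deterministically in plain Python and check it. Below is a minimal sketch using the group counts from the generated document above (it isn't part of the crew, just a verification script). Interestingly, applying the rule in its conventional direction (lower selection rate divided by higher) flags the over-40 group, which the agent's own comparison glossed over — exactly the kind of discrepancy your evals should catch.</p>
<pre><code class="language-python"># Minimal 4/5ths-rule check on the generated hiring data (counts from doc_5_3 above).
groups = {
    "Group A (men)": (81, 6),
    "Group B (women)": (69, 6),
    "Group C (under-40)": (80, 17),
    "Group D (over-40)": (74, 7),
}

# Selection rate = hired / applicants for each group.
rates = {g: hired / applicants for g, (applicants, hired) in groups.items()}

def four_fifths_ratio(focal: str, reference: str) -> float:
    """Ratio of the focal group's selection rate to the reference group's.
    A ratio below 0.80 flags potential adverse impact under the 4/5ths rule."""
    return rates[focal] / rates[reference]

for focal, ref in [("Group A (men)", "Group B (women)"),
                   ("Group D (over-40)", "Group C (under-40)")]:
    ratio = four_fifths_ratio(focal, ref)
    flag = "FLAG" if ratio &lt; 0.80 else "ok"
    print(f"{focal} vs {ref}: {ratio:.2f} ({flag})")
</code></pre>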
<h2 id="heading-conclusion-the-future-of-ai-is-evals">Conclusion: The Future of AI is Evals</h2>
<p>If you remember one thing from this article, let it be this: <strong>The organizations that win with AI agents are not the ones with the most agents. They are the ones with the best evals.</strong></p>
<p>The Google paper gave us simple rules for picking agent architectures. Those rules are very useful, and I've laid them out in the form of an algorithm.</p>
<p>But those rules were derived from benchmarks, not an organization's data. For that reason, you have to build your own evals. Nobody knows what "correct" looks like in your domain except you.</p>
<p>This is the same point made by Sam Bhagwat in <a href="https://mastra.ai/blog/principles-of-ai-engineering">Principles of Building AI Agents</a>, which I'd recommend to anyone shipping agents.</p>
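<p>What does "build your own evals" look like in practice? Here is a deliberately tiny, framework-free sketch: a handful of deterministic checks run against an agent's output string before you trust it. The specific check names and thresholds are my own illustrative assumptions, not from the paper — but notice that the Case 1 summary above literally signed off with "[Your Name]", which even a check this cheap would catch.</p>
<pre><code class="language-python"># A minimal, framework-free eval harness: deterministic checks on agent output.
# The checks and thresholds here are illustrative assumptions, not from the paper.
import re

def eval_kpi_summary(output: str) -&gt; dict:
    """Score an executive-summary string on a few cheap, deterministic checks."""
    checks = {
        # Did the agent include any concrete figures at all?
        "has_figures": bool(re.search(r"\$?\d[\d,.]*", output)),
        # Is it roughly the requested length (~200 words)?
        "length_ok": 100 &lt;= len(output.split()) &lt;= 300,
        # Does it avoid leaking template placeholders?
        "no_placeholders": "[Your Name]" not in output,
    }
    checks["pass"] = all(checks.values())
    return checks

result = eval_kpi_summary(
    "Revenue reached $94 million, up 23% year-over-year. " * 20
)
print(result)
</code></pre>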
<p>So here's the playbook again:</p>
<ol>
<li><p><strong>Check your budget first:</strong> Tokens cost money. Know what you can spend per task.</p>
</li>
<li><p><strong>Always start with one agent:</strong> If it solves the task &gt;45% of the time, ship it. Don't add agents.</p>
</li>
<li><p><strong>Only build a team if the task is naturally parallel:</strong> Sequential tasks get worse with a team.</p>
</li>
<li><p><strong>Match topology to task:</strong> Analysis favors a centralized team, open-ended web research favors a decentralized team, and sequential work is best left to a single agent.</p>
</li>
<li><p><strong>Cap teams at 3–4 agents and no more than 3 tools per agent:</strong> As in real life, smaller teams are more agile and make fewer mistakes.</p>
</li>
<li><p><strong>Put a supervisor on any parallel setup:</strong> According to the study, unchecked swarms amplify errors ~17×. Supervised ones ~4×.</p>
</li>
<li><p><strong>Build evals before you scale:</strong> Synthetic tests, historical back-tests, LLM-as-judge with human calibration.</p>
</li>
</ol>
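<p>The seven points above collapse neatly into a small decision function. This is my own encoding of the playbook as a sketch — the thresholds come from the points above, but the function itself is illustrative, not from the paper:</p>
<pre><code class="language-python">def choose_architecture(budget_ok: bool,
                        single_agent_success_rate: float,
                        task_is_parallel: bool,
                        task_type: str) -&gt; str:
    """Encode the playbook. task_type: 'analysis', 'web_research', or 'sequential'."""
    if not budget_ok:
        return "reduce scope: tokens cost money"
    # Rule 2: a single agent that clears ~45% success is good enough to ship.
    if single_agent_success_rate &gt; 0.45:
        return "single agent"
    # Rule 3: only naturally parallel tasks justify a team.
    if not task_is_parallel or task_type == "sequential":
        return "single agent (sequential tasks get worse with a team)"
    # Rules 4-6: match topology, cap at 3-4 agents, always supervised.
    if task_type == "analysis":
        return "centralized team of 3-4 agents with a supervisor"
    if task_type == "web_research":
        return "decentralized team of 3-4 agents with a supervisor"
    return "single agent (default when unsure)"

print(choose_architecture(True, 0.30, True, "web_research"))
</code></pre>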
<p>And keep humans in the loop for high-stakes decisions.</p>
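<p>To see why the supervision point matters, here's a back-of-the-envelope compounding calculation. The 2% per-step error rate and 8 handoffs are illustrative assumptions; only the ~17× and ~4× amplification figures come from the study cited above:</p>
<pre><code class="language-python"># Back-of-the-envelope: how a small per-step error rate compounds across handoffs.
# The 2% rate and 8 handoffs are illustrative assumptions, not figures from the paper.
base_error = 0.02   # assumed error rate of a single agent on a single step
handoffs = 8        # assumed number of agent-to-agent handoffs

# Probability that at least one step in the chain goes wrong:
p_any_error = 1 - (1 - base_error) ** handoffs
print(f"Chance of at least one error across {handoffs} handoffs: {p_any_error:.1%}")

# Applying the study's amplification factors to the single-agent error rate:
print(f"Unchecked swarm (~17x): {min(1.0, base_error * 17):.1%}")
print(f"Supervised team (~4x):  {min(1.0, base_error * 4):.1%}")
</code></pre>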
<p>Once again, agents are like interns: whether they produce great work or burn down the organization depends on how well you organize and check their work.</p>
<p>You can find the <a href="https://github.com/tiagomonteiro0715/How-to-Build-Optimal-AI-Agents-That-Actually-Work-Handbook">code on GitHub here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Math Behind Artificial Intelligence: A Guide to AI Foundations [Full Book] ]]>
                </title>
                <description>
                    <![CDATA[ "To understand is to perceive patterns." - Isaiah Berlin This is not a math book filled with complex formulas, theorems, and concepts that are hard to grasp. Instead, it’s a detailed guide where we’l ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-math-behind-artificial-intelligence-book/</link>
                <guid isPermaLink="false">695d974f512957bf332d653a</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ book ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Tue, 06 Jan 2026 23:14:23 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767723634484/4748bd8a-26a1-4d9c-89c3-1a6d07bde69e.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <blockquote>
<p>"To understand is to perceive patterns." - Isaiah Berlin</p>
</blockquote>
<p>This is <strong>not</strong> a math book filled with complex formulas, theorems, and concepts that are hard to grasp.</p>
<p>Instead, it’s a detailed guide where we’ll break complex ideas down into simpler terms.</p>
<p>Even if you only have a general understanding of algebra, you should be able to easily follow along.</p>
<h3 id="heading-heres-what-well-cover">Here’s what we’ll cover:</h3>
<ol>
<li><p><a href="#heading-chapter-1-background-on-this-book">Chapter 1: Background on this Book</a></p>
<ul>
<li><p><a href="#heading-the-objective-here">The Objective Here</a></p>
</li>
<li><p><a href="#heading-why-is-this-book-about-ai-different">Why is This Book About AI Different?</a></p>
</li>
<li><p><a href="#heading-let-me-introduce-myself">Let Me Introduce Myself</a></p>
</li>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-2-the-architecture-of-mathematics">Chapter 2: The Architecture of Mathematics</a></p>
<ul>
<li><p><a href="#heading-the-tree-of-mathematics-how-everything-connects">The Tree of Mathematics: How Everything Connects</a></p>
</li>
<li><p><a href="#heading-a-quick-history-of-mathematics-from-counting-to-infinity">A Quick History of Mathematics: From Counting to Infinity</a></p>
</li>
<li><p><a href="#heading-foundations-of-relativity-how-einstein-used-math-to-understand-space-and-time">Foundations of Relativity: How Einstein Used Math to Understand Space and Time</a></p>
</li>
<li><p><a href="#heading-godels-biggest-paradox-can-math-explain-itself">Gödel’s Biggest Paradox: Can Math Explain Itself?</a></p>
</li>
<li><p><a href="#heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</a></p>
</li>
<li><p><a href="#heading-code-examples-analytical-and-numerical-approaches">Code Examples: Analytical and Numerical Approaches</a></p>
</li>
<li><p><a href="#heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</a></p>
</li>
<li><p><a href="#heading-a-final-lesson-from-history">A Final Lesson From History</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-3-the-field-of-artificial-intelligence">Chapter 3: The Field of Artificial Intelligence</a></p>
<ul>
<li><p><a href="#heading-what-is-artificial-intelligence">What is Artificial Intelligence?</a></p>
</li>
<li><p><a href="#heading-symbolic-vs-non-symbolic-ai-whats-the-difference">Symbolic vs. Non-symbolic AI: What’s the Difference?</a></p>
</li>
<li><p><a href="#heading-before-ai-control-theory-as-the-first-ai">Before AI: Control Theory as the “First AI”</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-4-linear-algebra-the-geometry-of-data">Chapter 4: Linear Algebra - The Geometry of Data</a></p>
<ul>
<li><p><a href="#heading-what-are-matrices-and-why-do-they-simplify-equations">What Are Matrices and Why Do They Simplify Equations?</a></p>
</li>
<li><p><a href="#heading-vectors-and-transformations-moving-in-multiple-directions">Vectors and Transformations: Moving in Multiple Directions</a></p>
</li>
<li><p><a href="#heading-linear-independence-dependence-and-rank-why-it-matters">Linear Independence, Dependence, and Rank: Why It Matters</a></p>
</li>
<li><p><a href="#heading-determinants-measuring-space-and-scaling">Determinants: Measuring Space and Scaling</a></p>
</li>
<li><p><a href="#heading-what-are-mathematical-spaces-and-how-do-they-simplify-calculations">What Are Mathematical Spaces and How Do They Simplify Calculations?</a></p>
</li>
<li><p><a href="#heading-eigenvalues-and-eigenvectors-unlocking-hidden-patterns">Eigenvalues and Eigenvectors: Unlocking Hidden Patterns</a></p>
</li>
<li><p><a href="#heading-applications-of-linear-algebra-in-ai-and-control-theory">Applications of Linear Algebra in AI and Control Theory</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-5-multivariable-calculus-change-in-many-directions">Chapter 5: Multivariable Calculus - Change in Many Directions</a></p>
<ul>
<li><p><a href="#heading-limits-and-continuity-understanding-smooth-change">Limits and Continuity: Understanding Smooth Change</a></p>
</li>
<li><p><a href="#heading-why-are-limits-important-to-understand-derivatives-and-integrals">Why are limits important to understand derivatives and integrals?</a></p>
</li>
<li><p><a href="#heading-derivatives-how-things-change-and-how-fast">Derivatives: How Things Change and How Fast</a></p>
</li>
<li><p><a href="#heading-what-about-integral-calculus">What About Integral Calculus?</a></p>
</li>
<li><p><a href="#heading-applications-in-ai-and-control-theory-calculus-in-action">Applications in AI and Control Theory: Calculus in Action</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-6-probability-amp-statistics-learning-from-uncertainty">Chapter 6: Probability &amp; Statistics - Learning from Uncertainty</a></p>
<ul>
<li><p><a href="#heading-mean-median-mode-measuring-central-tendency">Mean, Median, Mode: Measuring Central Tendency</a></p>
</li>
<li><p><a href="#heading-variance-and-standard-deviation-measuring-spread">Variance and Standard Deviation: Measuring Spread</a></p>
</li>
<li><p><a href="#heading-what-is-the-normal-distribution-the-bell-curve-of-life">What Is the Normal Distribution? The Bell Curve of Life</a></p>
</li>
<li><p><a href="#heading-how-the-central-limit-theorem-helps-approximate-the-world">How the Central Limit Theorem Helps Approximate the World</a></p>
</li>
<li><p><a href="#heading-bayes-theorem-learning-from-evidence">Bayes Theorem: Learning from Evidence</a></p>
</li>
<li><p><a href="#heading-what-are-markov-models-predicting-the-next-step-one-step-at-a-time">What Are Markov Models? Predicting the Next Step, One Step at a Time</a></p>
</li>
<li><p><a href="#heading-applications-in-ai-and-control-theory-making-decisions-under-uncertainty">Applications in AI and Control Theory: Making Decisions Under Uncertainty</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-chapter-7-optimization-theory-teaching-machines-to-improve">Chapter 7: Optimization Theory - Teaching Machines to Improve</a></p>
<ul>
<li><p><a href="#heading-what-is-optimization-theory">What is Optimization Theory?</a></p>
</li>
<li><p><a href="#heading-why-optimization-drives-learning-in-ai">Why Optimization Drives Learning in AI</a></p>
</li>
<li><p><a href="#heading-simple-optimization-techniques-how-machines-learn-step-by-step">Simple Optimization Techniques: How Machines Learn Step by Step</a></p>
</li>
<li><p><a href="#heading-what-is-adam-the-most-popular-way-ai-models-finds-the-best-learning-path">What is Adam? The Most Popular Way AI Models Finds the Best Learning Path</a></p>
</li>
<li><p><a href="#heading-applications-in-ai-and-control-theory-of-optimization-theory">Applications in AI and Control Theory of Optimization Theory</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-conclusion-where-mathematics-and-ai-meet">Conclusion: Where Mathematics and AI Meet</a></p>
<ul>
<li><p><a href="#heading-mathematics-is-the-foundation-of-ai">Mathematics is the Foundation of AI</a></p>
</li>
<li><p><a href="#heading-the-future-on-device-ai-and-the-democratization-of-ai">The Future: On Device AI and the Democratization of AI</a></p>
</li>
<li><p><a href="#heading-final-reflections">Final Reflections</a></p>
</li>
<li><p><a href="#heading-acknowledgements">Acknowledgements</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-about-the-author">About the Author</a></p>
</li>
</ol>
<h2 id="heading-chapter-1-background-on-this-book">Chapter 1: Background on this Book</h2>
<h3 id="heading-the-objective-here">The Objective Here</h3>
<p>My objective in this book is simple: Explain the key mathematical ideas you need to grasp in order to deeply understand AI and train machine learning models.</p>
<p>So you might be wondering: Why is it important to have a good math foundation before creating these models?</p>
<p>Well, there are many reasons, but some are:</p>
<ul>
<li><p>It gives you the capacity to understand new AI research on your own.</p>
</li>
<li><p>You can use this same foundation to study other STEM concepts like signal theory and advanced statistical methods.</p>
</li>
<li><p>It helps you understand that AI models are just a mixture of different math ideas working together and gives you insight into how new innovations make LLMs more efficient.</p>
</li>
<li><p>It gives you a foundation so you know how to calibrate AI models and even create derivative models.</p>
</li>
</ul>
<p>These skills are also important for startup founders, especially in Silicon Valley. Many startups begin with APIs or API wrappers but eventually need their own AI solutions.</p>
<p>Outsourcing all AI isn't ideal. This book will help you understand AI foundations so you can design better growth strategies and communicate effectively with investors – especially those who were successful technical co-founders.</p>
<h3 id="heading-why-is-this-book-about-ai-different">Why is This Book About AI Different?</h3>
<p>In this book, we’ll look at AI from an engineering perspective. This differs from the typical computer science approach to AI that most introductory courses take.</p>
<p>In doing so, I won’t spend a lot of time explaining formulas and theorems. Instead, I’ll explain their importance, how and why they are applied the way they are.</p>
<p>In this way, I hope to offer a unique viewpoint that emphasizes the engineering principles and good practices that underlie all modern AI technologies.</p>
<p>I will also explain how many of these strange math ideas make billion dollar industries possible.</p>
<p>We’ll start with the fundamentals: the structure of the areas of mathematics and AI. After that, we’ll look at the four subareas of math that make AI possible:</p>
<ul>
<li><p>Linear Algebra</p>
</li>
<li><p>Calculus</p>
</li>
<li><p>Probability Theory and Statistics</p>
</li>
<li><p>Optimization Theory</p>
</li>
</ul>
<p>After going through all the math, we’ll connect it with the foundation of ChatGPT and all of these large language models.</p>
<p>This way, you’ll get a basic foundation in key math concepts that, when mixed together like the ingredients of a cake, make all AI models possible.</p>
<p>By knowing where the ideas come from, you’ll develop a system-level understanding of AI and a first-principles approach.</p>
<p>So just keep in mind that, even though concepts like integral calculus and eigenvalues/eigenvectors might not be widely used in AI, they’ll help you develop these system-level and first-principle approaches.</p>
<p>Also, this book will be a work in progress. After its first release, I’ll seek feedback on things I need to perfect, chapters to add, and so on.</p>
<p>Here is my email for any feedback you might have: <a href="mailto:monteiro.t@northeastern.edu">monteiro.t@northeastern.edu</a></p>
<p>And here is the book’s GitHub repository with all code: <a href="https://github.com/tiagomonteiro0715/The-Math-Behind-Artificial-Intelligence-A-Guide-to-AI-Foundations">https://github.com/tiagomonteiro0715/The-Math-Behind-Artificial-Intelligence-A-Guide-to-AI-Foundations</a></p>
<h3 id="heading-let-me-introduce-myself">Let Me Introduce Myself</h3>
<p>My name is Tiago Monteiro. I’m an electrical and computer engineer and an AI master's degree student at Northeastern University's Silicon Valley campus. I’ve authored 20+ articles here on freeCodeCamp, with 240K+ views, on math, AI, and tech.</p>
<p>If you’d like to know more about my background, I’ll share that at the end of the book.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>In terms of minimum requirements, you only need to know the basics of mathematics and programming:</p>
<ul>
<li><p>Basic algebra, plus a sense of what functions and the coordinate system are.</p>
</li>
<li><p>You should be able to read Python code and understand things like variables, functions, and loops.</p>
</li>
</ul>
<h2 id="heading-chapter-2-the-architecture-of-mathematics">Chapter 2: The Architecture of Mathematics</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766099739986/049ff3c0-0150-495e-97e9-4f16f3861058.png" alt="Cover of the chapter the architecture of mathematics" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Math is more than numbers. It’s the science of locating complex patterns that shape our world. To truly understand math, we must look beyond numbers and formulas to grasp its structures.</p>
<p>This chapter aims to show math as a growing tree of ideas, a living system of logic, not just formulas to memorize. With analogies, history, and code examples, I want to help you understand math deeply and how to apply it to programming.</p>
<p>I’ve included code examples to connect theory and practice, showing how math ideas apply to real problems. Whether you're new to advanced math or are more experienced, these examples will help you apply math in programming.</p>
<p>This way, before we start going over the different math pillars that sustain AI, you will understand the structure of the field.</p>
<h3 id="heading-the-tree-of-mathematics-how-everything-connects">The Tree of Mathematics: How Everything Connects</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765001557970/7ac6c8c8-d0fd-4a67-be6a-6d8b9a1a6615.jpeg" alt="Seeing a tree from its root to a tree" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/bottom-view-of-green-leaved-tree-during-daytime-91153/">Lerkrat Tangsri</a></p>
<p>Imagine math as a vast, ever-growing tree.</p>
<p>The roots are the foundations: logic and set theory. From these roots, the main fields emerge: arithmetic, algebra, geometry, and analysis.</p>
<p>As the tree branches out, new subfields like topology and abstract algebra appear. Sometimes branches connect with each other.</p>
<p>This tree keeps growing in many directions. History shows that sometimes it grows rapidly due to scientific discoveries, while at other times, growth is slow.</p>
<p>And you might wonder: How many more branches and connections between them will keep appearing?</p>
<h3 id="heading-a-quick-history-of-mathematics-from-counting-to-infinity">A Quick History of Mathematics: From Counting to Infinity</h3>
<p>The first mathematical ideas emerged independently in different ancient civilizations. Consider, for example:</p>
<ul>
<li><p>India's invention of zero</p>
</li>
<li><p>Islamic algebraic advances</p>
</li>
<li><p>Greek geometric rigor</p>
</li>
</ul>
<p>Great mathematicians developed and shared these ideas through writing and lectures. Over time, new generations built on these ideas, creating new branches of mathematics. This endless growth is why Isaac Newton wrote to Robert Hooke in 1675:</p>
<blockquote>
<p>“If I have seen further, it is by standing on the shoulders of giants.”</p>
</blockquote>
<p>He meant that by working from previous knowledge, he was able to create and (re)discover new ideas.</p>
<p>Yet, the real power of math lies in practicing it over and over and studying it more and more deeply.</p>
<p>As one of my professors once pointed out:</p>
<blockquote>
<p><em>“More important than knowing the theorems is knowing the ideas behind them and the history of how they were created.”</em></p>
</blockquote>
<p>To solve problems, it's often necessary to think from first principles, and math teaches this. Math is not just an academic topic. It’s a global language for scientists and engineers.</p>
<p>By preserving and sharing it, new math can grow from old ideas, allowing the tree to keep expanding.</p>
<h3 id="heading-foundations-of-relativity-how-einstein-used-math-to-understand-space-and-time">Foundations of Relativity: How Einstein Used Math to Understand Space and Time</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766903578928/a4102586-cb63-4410-8793-72950145726d.jpeg" alt="A satellite in space" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/gray-and-white-satellite-41006/">Pixabay</a></p>
<p>Albert Einstein developed the general and special theories of relativity, which impact:</p>
<ul>
<li><p>GPS and global communication</p>
</li>
<li><p>Satellite telecommunications</p>
</li>
<li><p>Space exploration and satellite launches</p>
</li>
</ul>
<p>And more.</p>
<p>But this was only possible by combining geometry with calculus, known as <strong>differential geometry.</strong> This field evolved over centuries, thanks to many great mathematicians. Here are a few of them, though the list is not exhaustive:</p>
<ul>
<li><p><strong>Euclid (circa 300 BCE):</strong> Contributed to geometry, laying the groundwork for later mathematical systems</p>
</li>
<li><p><strong>Archimedes (circa 287–212 BCE):</strong> Pioneered the understanding of volume, surface area, and the principles of mechanics</p>
</li>
<li><p><strong>René Descartes (1596–1650):</strong> Developed Cartesian coordinates and analytical geometry</p>
</li>
<li><p><strong>Isaac Newton (1642–1727) &amp; Gottfried Wilhelm Leibniz (1646–1716):</strong> Newton’s laws of motion and gravitation, alongside Leibniz’s development of calculus, formed the basis of classical mechanics that Einstein sought to extend and modify in his theory of relativity.</p>
</li>
<li><p><strong>Leonhard Euler (1707–1783):</strong> Contributed to the development of differential equations, which are essential in the mathematical foundations of physics.</p>
</li>
<li><p><strong>Gaspard Monge (1746–1818):</strong> The father of differential geometry and pioneer in descriptive geometry</p>
</li>
<li><p><strong>Carl Friedrich Gauss (1777–1855):</strong> Made groundbreaking advances in geometry, including the concept of curved surfaces.</p>
</li>
<li><p><strong>Bernhard Riemann (1826–1866):</strong> Introduced Riemannian geometry, a branch of differential geometry.</p>
</li>
</ul>
<p>Going back to Albert Einstein, he saw what no one else in his time saw, thanks to these great math giants and countless others.</p>
<h3 id="heading-godels-biggest-paradox-can-math-explain-itself">Gödel’s Biggest Paradox: Can Math Explain Itself?</h3>
<p>The biggest paradox in math, discovered by Kurt Gödel, is his incompleteness theorems. They show that in any consistent formal system capable of simple arithmetic, there are true statements that cannot be proven within the system.</p>
<p>This means there are limits to what can be proven as true or false. For mathematicians, this implies that some truths are beyond formal proofs, yet we assume they are true. It demonstrates that no matter how much effort or AI is used, some things remain unprovable, known only through approximations and non-exact methods.</p>
<h3 id="heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</h3>
<p>Applied math and engineering involve adapting pure math ideas to real-world scenarios.</p>
<p>In fact, in many cases, a single applied tool is a combination of several math ideas.</p>
<p>Let’s consider some examples:</p>
<ul>
<li><p>In <strong>harmonic analysis</strong>, Laplace, Fourier, and Z-transforms are a way to see the same thing in a new domain to get new insights. In this case, integrals are used to make this mapping possible.</p>
</li>
<li><p><strong>Principal component analysis (PCA)</strong> is a widely used tool in data science. Yet it is a mixture of linear algebra (eigenvalues and eigenvectors) with optimization (ranking eigenvalues so that fewer dimensions capture most of the data) in order to reduce the size of datasets.</p>
</li>
<li><p>In <strong>machine learning</strong>, logistic regression is a mixture of calculus with statistics and probability.</p>
</li>
<li><p>In <strong>deep learning</strong>, neural networks are, at their core, many matrices that multiply and update themselves to model the dataset representing a system. This optimization of the matrix values happens through activation functions, a gradient descent-based optimization method (which tells how much each value needs to change), and backpropagation (which applies those changes to all the matrix values).</p>
</li>
</ul>
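<p>To make one of these “recipes” concrete, here’s a minimal PCA sketch using only NumPy: center the data, build the covariance matrix (linear algebra), then keep the eigenvectors with the largest eigenvalues (ranking directions by how much variance they explain). This is an illustrative sketch under simplifying assumptions, not a production implementation:</p>
<pre><code class="language-python">import numpy as np

def pca(data, n_components):
    """Minimal PCA: project data onto its top principal components."""
    # Center each feature (column) around zero
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features (linear algebra)
    cov = np.cov(centered, rowvar=False)
    # Eigen-decomposition (eigh is for symmetric matrices like cov)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Rank directions by explained variance, keep the top ones
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # Project the centered data onto the chosen directions
    return centered @ components

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))  # 100 samples, 3 features
reduced = pca(points, n_components=2)
print(reduced.shape)  # (100, 2)
</code></pre>
<p>Libraries like scikit-learn do the same thing more robustly (via singular value decomposition), but the core idea really is eigenvalues plus ranking.</p>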
<p>But the best example of this fusion of math in engineering is <a href="https://www.freecodecamp.org/news/basic-control-theory-with-python/">control theory</a>, the study of how to design systems that regulate their own behavior. From trains to cars to airplanes, everything is based on control theory. It’s everywhere, in nearly all modern electronic devices. In electric circuits, control theory is also used heavily to guarantee stability in the face of electrical disturbances.</p>
<p>So as you can probably start to see, many of the tools we now have are just a mixture of many pure math ideas – like different recipes. In essence, applied math is the application of pure math as “ingredients“ in "recipes" to solve problems.</p>
<p>So, we’ve explored the structure and evolution of mathematics. But it’s important to see how we can apply these ideas in real life. Pure math makes the framework, and applied math applies that framework to solve problems. To understand this, we’ll examine two code examples that show how you can use math ideas as programming tools.</p>
<h3 id="heading-code-examples-analytical-and-numerical-approaches">Code Examples: Analytical and Numerical Approaches</h3>
<p>These code examples demonstrate a couple of ways you can use Python to solve math equations.</p>
<p>In the first code example, we’ll solve the problem in the same way that kids in school solve math exercises: essentially, by hand with a pencil. In the second example, we’ll solve the problem using numerical analysis.</p>
<h4 id="heading-example-1-solve-a-problem-analytically">Example 1: Solve a Problem Analytically</h4>
<p>In this problem, we need to find the values of the variables x and y. To do that, we’ll move terms from one side of each equation to the other until the variables are isolated.</p>
<p>When we solve math problems analytically, like we did in school, we are manipulating symbols to get exact values. Often these symbols are x, y, and z.</p>
<p>The code below solves a system of two equations with two unknown variables, x and y.</p>
<p>We will use the <a href="https://www.sympy.org">SymPy</a> Python library to do this. It’s mainly used for symbolic mathematics.</p>
<pre><code class="language-python">from sympy import symbols, Eq, solve

x, y = symbols('x y')
eq1 = Eq(2*x + 3*y, 6)
eq2 = Eq(-x + y, 1)

solution = solve((eq1, eq2), (x, y))
print(solution)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160359386/7a21cddc-f4ba-4f9f-afa0-d1cc11fb27d6.png" alt="Image of the equations and analytical method in Python" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Once again, with this code we are finding the values of the variables x and y.</p>
<p>Essentially, we’re finding x and y from this system of equations:</p>
<p>$$\begin{align} 2x + 3y &amp;= 6 \\ -x + y &amp;= 1 \end{align}$$</p>
<p>Which gives us the following result:</p>
<pre><code class="language-python">{x: 3/5, y: 8/5}
</code></pre>
<p>Or:</p>
<ul>
<li><p>x= 0.6</p>
</li>
<li><p>y = 1.6</p>
</li>
</ul>
<p>When we say that we’re solving this analytically, it means that we’re finding an exact mathematical solution using formulas or equations.</p>
<p>But many problems are harder, and solving them means manipulating more and more symbols on each side of the equation. Sometimes there are so many symbols, and so many transformed versions of them (derivatives, integrals, and so on), that the process becomes very hard to manage and takes a lot of time.</p>
<p>For example, let’s look at this partial differential equation:</p>
<p>$$\begin{cases} \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2}, &amp; 0 &lt; x &lt; L,\ t &gt; 0 \\ u(0,t) = 0, &amp; t &gt; 0 \\ u(L,t) = 0, &amp; t &gt; 0 \\ u(x,0) = f(x), &amp; 0 &lt; x &lt; L \end{cases}$$</p>
<p>It can be solved with an analytical method called separation of variables.</p>
<p>But it requires many steps, and it’s easy to make mistakes. Even engineers who learned this often struggle to remember the process later.</p>
<p>When I first encountered this type of math exercise in my electrical and computer engineering degree back in Portugal, it took me 20 to 30 minutes to solve it.</p>
<p>For this reason, there's a branch of mathematics called numerical analysis that focuses on finding approximations of existing formulas. It helps solve problems faster. This is the method we'll explore next.</p>
<h4 id="heading-example-2-solve-numerically-approximation">Example 2: Solve Numerically (Approximation)</h4>
<p>Now let’s solve a different problem: we’re going to find the values of each of the 5 variables:</p>
<p>$$\begin{bmatrix} 3 &amp; 2 &amp; -1 &amp; 4 &amp; 5 \\ 1 &amp; 1 &amp; 3 &amp; 2 &amp; -2 \\ 4 &amp; -1 &amp; 2 &amp; 1 &amp; 0 \\ 5 &amp; 3 &amp; -2 &amp; 1 &amp; 1 \\ 2 &amp; -3 &amp; 1 &amp; 3 &amp; 4 \end{bmatrix} \times \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 12 \\ 5 \\ 7 \\ 9 \\ 10 \end{bmatrix}$$</p>
<p>Solving this by hand will take some time…but with Python code, it’s very fast.</p>
<p>We’ll also use the <a href="https://scipy.org">SciPy</a> Python library for this example.</p>
<p>Let’s solve the system numerically:</p>
<pre><code class="language-python">import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]])

b = np.array([12, 5, 7, 9, 10])

solution = solve(A, b)

print(solution)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160347486/d1f17aa6-b288-4e41-9be7-0810c45e778c.png" alt="Image of equations and numerical method" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Which corresponds to this operation:</p>
<p>$$\begin{bmatrix} 3 &amp; 2 &amp; -1 &amp; 4 &amp; 5 \\ 1 &amp; 1 &amp; 3 &amp; 2 &amp; -2 \\ 4 &amp; -1 &amp; 2 &amp; 1 &amp; 0 \\ 5 &amp; 3 &amp; -2 &amp; 1 &amp; 1 \\ 2 &amp; -3 &amp; 1 &amp; 3 &amp; 4 \end{bmatrix} \times \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 12 \\ 5 \\ 7 \\ 9 \\ 10 \end{bmatrix}$$</p>
<p>Again, solving this by hand takes time, and it’s very easy to make a small mistake.</p>
<p>But in this code example, this line of code:</p>
<pre><code class="language-python">solution = solve(A, b)
</code></pre>
<p>Uses the <code>solve</code> method from SciPy:</p>
<pre><code class="language-python">from scipy.linalg import solve
</code></pre>
<p>It’s a function that finds the values of x in the equation A⋅x=b, where A is a square matrix (a grid of numbers) and b is a vector (a list of numbers). Running it gives us the following:</p>
<pre><code class="language-python">[ 1.35022026 -0.79955947 -1.17180617  3.14317181 -0.83920705]
</code></pre>
<p>Which corresponds to:</p>
<p>$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 1.35022026 \\ -0.79955947 \\ -1.17180617 \\ 3.14317181 \\ -0.83920705 \end{bmatrix}$$</p>
<p>And is the same thing as:</p>
<p>$$\begin{align} x_1 &amp;= 1.35022026 \\ x_2 &amp;= -0.79955947 \\ x_3 &amp;= -1.17180617 \\ x_4 &amp;= 3.14317181 \\ x_5 &amp;= -0.83920705 \end{align}$$</p>
<h4 id="heading-why-these-two-approaches-matter">Why These Two Approaches Matter</h4>
<p>We have solved two mathematical problems in two different ways:</p>
<ul>
<li><p>Analytical: Exact solutions through algebraic manipulation</p>
</li>
<li><p>Numerical: Approximate solutions using algorithms</p>
</li>
</ul>
<p>In engineering and in AI, we are constantly choosing between these approaches.</p>
<p>When training AI models with millions of parameters, analytical solutions are impossible. This is why, in these cases, we need numerical approaches.</p>
<p>When proving math theorems, we need analytical precision to be sure the result is exactly right.</p>
<p>This is one of the many things an engineering degree teaches you: often, in the real world, it’s better to just write some code to solve a problem than to actually solve it by hand with math. Other times, the best solution is to just think in first principles and from there create new theorems to solve a problem.</p>
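<p>As a small illustration of this trade-off, here’s a sketch that solves the same kind of 2x2 system as Example 1 in both styles: exactly, with Python’s built-in <code>Fraction</code> type (the analytical spirit: exact rational arithmetic), and approximately, with floating-point numbers (the numerical spirit). The helper name <code>solve_2x2</code> is made up for this example:</p>
<pre><code class="language-python">from fractions import Fraction

def solve_2x2(a, b, c, d, e, f, exact=True):
    """Solve a*x + b*y = e and c*x + d*y = f using Cramer's rule.

    exact=True uses rational arithmetic (analytical spirit);
    exact=False uses floating point (numerical spirit).
    """
    num = Fraction if exact else float
    a, b, c, d, e, f = map(num, (a, b, c, d, e, f))
    det = a * d - b * c  # determinant of the coefficient matrix
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y

# The system from Example 1: 2x + 3y = 6 and -x + y = 1
print(solve_2x2(2, 3, -1, 1, 6, 1, exact=True))   # (Fraction(3, 5), Fraction(8, 5))
print(solve_2x2(2, 3, -1, 1, 6, 1, exact=False))  # (0.6, 1.6)
</code></pre>
<p>For a small system both answers agree, but the exact version returns 3/5 as a true fraction, while the float version returns a binary approximation of 0.6 – the same distinction that separates SymPy from SciPy in the examples above.</p>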
<p>Now let's step out of the code examples and see how different branches of mathematics connect.</p>
<h3 id="heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</h3>
<p>Is it possible to unify all math?</p>
<p>In theory, yes. This is known as the Grand Unified Theory of Mathematics. It's the idea that all different areas of math can be linked together to discover deeper patterns in mathematics.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Langlands_program">Langlands program</a> is trying to make this unification possible. It’s an attempt to interconnect the largest parts of the big tree of math to uncover new patterns in math.</p>
<p>With a Grand Unified Theory of Mathematics, we would be able to understand how every branch of the tree connects with the others and all the relationships between them.</p>
<h4 id="heading-whats-the-value-of-this-big-unification-for-society">What’s the Value of this Big Unification for Society?</h4>
<p>By studying history, we can find patterns. The unification of various fields has created many massive impacts on society, such as:</p>
<ul>
<li><p>In the 19th century, James Clerk Maxwell united the fields of electricity and magnetism with his famous Maxwell equations. This allowed the creation of radios and electric grids around the globe. In turn, it served as a foundation for all technological progress in the 20th and 21st century.</p>
</li>
<li><p>In the 20th century, the unification of algebra with logic led to the rise of digital systems. In turn, digital systems gave rise to processors and the evolution of computers and the modern laptop.</p>
</li>
<li><p>Also in the 20th century, the unification of probability and communication led to information theory. This became the foundation for the internet. This unification was carried out by a great mathematician named Claude Shannon.</p>
</li>
</ul>
<p>In the end, a grand unified theory of mathematics could be one of the biggest achievements in modern society.</p>
<p>In AI, it could help unify all machine learning models in a common architecture. This would help accelerate the development of new AI models and could also open the door to new material science advances.</p>
<p>It could help reveal – with math – the deep patterns we still haven’t found in these fields. Just as uniting electricity and magnetism led to modern technology, a unified math framework would lead to a wave of innovation.</p>
<h3 id="heading-a-final-lesson-from-history">A Final Lesson From History</h3>
<p>From Greek geometry to AI, math has grown like a tree over centuries. By understanding its structure, it’s possible to see its role in finding the patterns of our universe.</p>
<p>I hope I was able to make you see math in this way. I hope you can also see that the unification of scientific fields helps lay the foundations for the creation of new innovations to help society go forward.</p>
<p>Many major societal transformations only came to be thanks to abstract math ideas. When these are shared and refined, they become the hidden architecture of progress in society. Innovation begins when disconnected ideas are united, well-linked, and widely shared.</p>
<h2 id="heading-chapter-3-the-field-of-artificial-intelligence">Chapter 3: The Field of Artificial Intelligence</h2>
<h3 id="heading-what-is-artificial-intelligence">What is Artificial Intelligence?</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765001693682/bbec3565-643f-421f-b32e-3de62285a2c0.jpeg" alt="A man playing chess against a robot" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/elderly-man-thinking-while-looking-at-a-chessboard-8438918/">Pavel Danilyuk</a></p>
<p>The term Artificial Intelligence was born from the work of John McCarthy, who is often called the "father of AI."</p>
<p>He used it when he, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, proposed the famous Dartmouth Summer Research Project on Artificial Intelligence in 1956.</p>
<p>At the Dartmouth Conference, the idea of artificial intelligence was framed by this conjecture:</p>
<blockquote>
<p><em>“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”</em></p>
</blockquote>
<p>Since then, the field has evolved in waves of innovation, from early rules-based systems to modern neural networks.</p>
<p>But over time, rather than creating <a href="https://en.wikipedia.org/wiki/Artificial_general_intelligence">general intelligence</a>, most AI systems have been designed to excel at narrow tasks.</p>
<p>For example:</p>
<ul>
<li><p>Chess-playing programs like Deep Blue that defeated world champion Garry Kasparov</p>
</li>
<li><p>Image recognition systems that can identify objects in photographs with impressive accuracy</p>
</li>
<li><p>Natural language processing models that can translate between languages</p>
</li>
<li><p>Game-playing AI like AlphaGo that mastered the ancient game of Go</p>
</li>
</ul>
<h4 id="heading-artificial-general-intelligence-isnt-yet-here">Artificial General Intelligence isn’t yet here</h4>
<p>So far, only narrow AI models have demonstrated human-level or superhuman performance, and only within their specific domains.</p>
<p>In my view, and as we will see in this book, AGI will be the combination and interaction of different large language models interacting with each other and with the tools available to them.</p>
<h3 id="heading-symbolic-vs-non-symbolic-ai-whats-the-difference">Symbolic vs. Non-symbolic AI: What’s the Difference?</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755906822438/f639efd3-3f8b-45a7-ad2d-d1795d772947.png" alt="Image comparing artificial general intelligence with narrow AI and, inside narrow AI, non-symbolic AI and symbolic AI circles" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-what-is-symbolic-ai">What is Symbolic AI?</h4>
<p>Symbolic AI refers to the creation of a program based on many rules and symbols to simulate how humans think.</p>
<p>It uses symbols to represent concepts (like farms and distributors) and logical rules to reason about them.</p>
<p>Imagine, for example, that someone wants to optimize farm distribution logistics. The symbols would represent concepts like farms, distributors, and transport methods. The rules would be:</p>
<ul>
<li><p>If the farm has high water usage and good pH levels, then classify it as high-yield producer</p>
</li>
<li><p>If a high-yield producer and distributor has low demand, then prioritize direct connection</p>
</li>
<li><p>If a direct connection is needed, then select transport with lowest environmental impact</p>
</li>
</ul>
<p>The specific data about your domain is called facts: the pieces of information the rules operate on. Here, the facts would be actual data like "farm X has high water usage" or "distributor Y has low demand."</p>
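<p>The three rules above can be sketched as a tiny rule engine in Python. The farm and distributor names (<code>farm_x</code>, <code>dist_y</code>) and attribute values are made up for illustration, and a real symbolic AI system like Prolog would derive conclusions through unification and backtracking rather than plain if-statements:</p>
<pre><code class="language-python"># Facts: the specific data the rules operate on (illustrative values)
farms = {"farm_x": {"water_usage": "high", "ph": "good"},
         "farm_z": {"water_usage": "low", "ph": "poor"}}
distributors = {"dist_y": {"demand": "low"}}

def is_high_yield(farm):
    # Rule 1: high water usage + good pH levels => high-yield producer
    return farm["water_usage"] == "high" and farm["ph"] == "good"

def plan_connections(farms, distributors):
    """Chain the rules: classify farms, then match them to distributors."""
    plans = []
    for farm_name, farm in farms.items():
        if not is_high_yield(farm):
            continue  # Rule 1 did not fire for this farm
        for dist_name, dist in distributors.items():
            # Rule 2: high-yield producer + low-demand distributor => direct connection
            if dist["demand"] == "low":
                # Rule 3: direct connection => pick the lowest-impact transport
                plans.append((farm_name, dist_name, "lowest_impact_transport"))
    return plans

print(plan_connections(farms, distributors))
# [('farm_x', 'dist_y', 'lowest_impact_transport')]
</code></pre>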
<p>This way, the system combines these rules and facts through logical reasoning to make decisions. A very popular programming language in this field is Prolog, which was designed for creating rule-based systems.</p>
<p><strong>Symbolic AI example: managing agricultural networks with a Prolog program.</strong></p>
<p>Let’s look at an example project to understand this more clearly. The project we’ll examine is called SymbolicAIHarvest. It was part of a course at NOVA University during my undergraduate studies in Electrical and Computer Engineering. The course was titled "Modelation of Data in Engineering."</p>
<p>SymbolicAIHarvest is an AI system developed with Prolog to manage agricultural networks. <a href="https://github.com/tiagomonteiro0715/SymbolicAIHarvest">Here’s the project</a> on GitHub so you can check it out.</p>
<p>The project optimizes farm operations using rule-based reasoning. It monitors sensors for real-time data and improves route planning for machinery. It also coordinates produce movement to reduce delays and waste, enhancing productivity and sustainability.</p>
<p>Understanding the code below is not a priority for this book. I just want to show you an example of all the facts of the project:</p>
<pre><code class="language-plaintext">% FARMERS(owner)
farmer(ana).
farmer(asdrubal).
farmer(miguel).
farmer(joao).
farmer(teresinha).
farmer(victor).
farmer(carlos).
farmer(anabela).

% FARMS(name, owner, region, type)
farm(q1, ana, alentejo, vinha).
farm(q2, ana, alentejo, olival).
farm(q3, asdrubal, lisboa, cenoureira).
farm(q4, asdrubal, lisboa, milharal).
farm(q5, asdrubal, lisboa, vinha).
farm(q6, miguel, evora, trigal).
farm(q7, miguel, evora, cenoureia).
farm(q8, miguel, evora, vinha).
farm(q9, miguel, evora, morangueira).
farm(q10, joao, porto, vinha).
farm(q11, joao, porto, trigal).
farm(q12, joao, porto, cenoureira).
farm(q13, teresinha, algarve, olival).
farm(q14, teresinha, algarve, vinha).
farm(q15, victor, setubal, olival).
farm(q16, victor, setubal, vinha).
farm(q17, victor, setubal, trigal).
farm(q18, carlos, sintra, milharal).
farm(q19, carlos, sintra, vinha).
farm(q20, anabela, coina, milharal).
farm(q21, anabela, coina, olival).
farm(q22, anabela, coina, trigal).

% SENSOR READINGS(name, type, value)
sensor_reading(q1,humidity,28).
sensor_reading(q2,humidity,35).
sensor_reading(q3,humidity,42).
sensor_reading(q4,humidity,38).
sensor_reading(q5,humidity,33).
sensor_reading(q6,humidity,45).
sensor_reading(q7,humidity,30).
sensor_reading(q8,humidity,36).
sensor_reading(q9,humidity,50).
sensor_reading(q10,humidity,41).
sensor_reading(q11,humidity,40).
sensor_reading(q12,humidity,44).
sensor_reading(q13,humidity,32).
sensor_reading(q14,humidity,29).
sensor_reading(q15,humidity,47).
sensor_reading(q16,humidity,39).
sensor_reading(q17,humidity,53).
sensor_reading(q18,humidity,27).
sensor_reading(q19,humidity,24).
sensor_reading(q20,humidity,31).
sensor_reading(q21,humidity,37).
sensor_reading(q22,humidity,46).
sensor_reading(q1, temperature, 25).
sensor_reading(q2, temperature, 25).
sensor_reading(q3, temperature, 25).
sensor_reading(q4, temperature, 25).
sensor_reading(q5, temperature, 25).
sensor_reading(q6, temperature, 25).
sensor_reading(q7, temperature, 25).
sensor_reading(q8, temperature, 25).
sensor_reading(q9, temperature, 25).
sensor_reading(q10, temperature, 25).
sensor_reading(q11, temperature, 25).
sensor_reading(q12, temperature, 25).
sensor_reading(q13, temperature, 25).
sensor_reading(q14, temperature, 25).
sensor_reading(q15, temperature, 25).
sensor_reading(q16, temperature, 25).
sensor_reading(q17, temperature, 25).
sensor_reading(q18, temperature, 25).
sensor_reading(q19, temperature, 25).
sensor_reading(q20, temperature, 25).
sensor_reading(q21, temperature, 25).
sensor_reading(q22, temperature, 25).
sensor_reading(q1, water, 47000).
sensor_reading(q2, water, 52500).
sensor_reading(q3, water, 39000).
sensor_reading(q5, water, 61000).
sensor_reading(q8, water, 58000).
sensor_reading(q10, water, 43000).
sensor_reading(q13, water, 72000).
sensor_reading(q16, water, 49000).
sensor_reading(q18, water, 35000).
sensor_reading(q21, water, 66500).
sensor_reading(q1, ph, 6.5).
sensor_reading(q2, ph, 4.7).
sensor_reading(q3, ph, 8.2).
sensor_reading(q4, ph, 7.0).
sensor_reading(q5, ph, 5.1).
sensor_reading(q6, ph, 8.0).
sensor_reading(q7, ph, 4.5).

% DISTRIBUTORS (name, region, capacity, demand level)
distributor(d1, alentejo, 1000, 2).
distributor(d2, lisboa, 800, 1).
distributor(d3, evora, 1200, 3).
distributor(d4, porto, 900, 2).
distributor(d5, algarve, 700, 2).
distributor(d6, setubal, 1100, 1).
distributor(d7, sintra, 950, 2).
distributor(d8, coina, 1000, 1).

% TRANSPORTS (name, capacity, type, autonomy, region, impact)
transport(t1, 1000, fossil, 100, alentejo, 3).
transport(t2, 500, electric, 10, alentejo, 1).
transport(t3, 800, fossil, 400, algarve, 5).
transport(t4, 700, hybrid, 300, setubal, 2).
transport(t5, 150, electric, 340, coina, 1).
transport(t6, 700, fossil, 220, porto, 3).
transport(t7, 900, hybrid, 350, evora, 2).
transport(t8, 1000, electric, 170, sintra, 1).

% Connections based on graph image

% Top of the network
link(q2, d1, 5).
link(q1, d1, 7).
link(q3, d1, 6).

% Network center
link(q3, q4, 8).
link(q4, d2, 6).
link(q4, d3, 7).
link(q4, q5, 5).
link(q4, d4, 6).

% Additional connections
link(q2, d2, 8).
link(q3, d3, 7).
</code></pre>
<p>This Prolog code models an agricultural supply chain system that has:</p>
<ul>
<li><p>Farmers</p>
</li>
<li><p>Farms</p>
</li>
<li><p>Sensor Readings</p>
</li>
<li><p>Distributors</p>
</li>
<li><p>Transports</p>
</li>
</ul>
<p>In addition, in this part of the code on the facts of the system:</p>
<pre><code class="language-plaintext">% Top of the network
link(q2, d1, 5).
link(q1, d1, 7).
link(q3, d1, 6).

% Network center
link(q3, q4, 8).
link(q4, d2, 6).
link(q4, d3, 7).
link(q4, q5, 5).
link(q4, d4, 6).

% Additional connections
link(q2, d2, 8).
link(q3, d3, 7).
</code></pre>
<p>We connect farms with distributors. This way, we can see that between the farm <code>q1</code> and distributor <code>d1</code> there is a distance of 7 km. This makes it possible to design algorithms that find the shortest path between them.</p>
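<p>To see what such an algorithm could look like, here’s a sketch of Dijkstra’s shortest-path algorithm in Python, run over the same <code>link</code> facts from the Prolog snippet above (treated as an undirected graph for illustration):</p>
<pre><code class="language-python">import heapq

# The link(From, To, Distance) facts from the Prolog snippet, as tuples
links = [("q2", "d1", 5), ("q1", "d1", 7), ("q3", "d1", 6),
         ("q3", "q4", 8), ("q4", "d2", 6), ("q4", "d3", 7),
         ("q4", "q5", 5), ("q4", "d4", 6), ("q2", "d2", 8),
         ("q3", "d3", 7)]

# Build an undirected adjacency map: node -> [(neighbor, distance), ...]
graph = {}
for a, b, dist in links:
    graph.setdefault(a, []).append((b, dist))
    graph.setdefault(b, []).append((a, dist))

def shortest_distance(start, goal):
    """Dijkstra's algorithm: smallest total distance from start to goal."""
    queue = [(0, start)]  # priority queue of (distance so far, node)
    visited = set()
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor))
    return None  # goal unreachable from start

print(shortest_distance("q1", "d1"))  # 7 (the direct link)
print(shortest_distance("q1", "d3"))  # 20 (q1 -> d1 -> q3 -> d3)
</code></pre>
<p>In a real Prolog system you would express the same search as recursive rules over <code>link/3</code>; the Python version just makes the algorithm explicit.</p>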
<p>In the end, symbolic AI just creates programs based on a context and rules applied to that context.</p>
<h4 id="heading-what-is-non-symbolic-ai">What is Non-Symbolic AI?</h4>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755906892854/197f7bc3-8c05-46f2-aa2a-99dbaa733a9a.png" alt="Non-symbolic AI with a circle titled machine learning inside. Inside the machine learning circle is another circle with the text deep learning." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Non-symbolic AI doesn’t use symbols or rules to reason. Instead, it’s data-driven: it learns patterns from large datasets. This is the approach used in machine learning and deep learning.</p>
<p>When we create an AI model, we can associate it with an API (Application Programming Interface) so that we can use the AI model in websites, applications, and other systems. Basically, the trained AI model is set up behind an API endpoint. An API endpoint is like a web service that lets other applications send requests to the model and get responses back.</p>
<p>For example, when you use ChatGPT in a web browser, your messages are sent through OpenAI's API to their language model, which processes your input and sends back a response.</p>
<p>An AI agent is a software program that can autonomously perform tasks by making decisions and taking actions to achieve specific goals.</p>
<p>Unlike basic chatbots that only reply to questions, AI agents can plan steps, use tools, and work towards achieving complex goals. They do this by combining language models with extra features like accessing outside data or working with other AI agents.</p>
<p><a href="https://github.com/tiagomonteiro0715/ai-content-lab">Here’s an example</a> of a non-symbolic AI agent project I worked on. I developed it using the <a href="https://www.crewai.com/">crewAI</a> Python library and the OpenAI API, one of the most popular libraries for creating AI agents.</p>
<p>In this system, five AI agents collaborate to create optimized content:</p>
<ul>
<li><p><strong>Research and Fact Checker:</strong> Conducts research to find trends and data.</p>
</li>
<li><p><strong>Audience Specialist:</strong> Analyzes audience needs for better engagement.</p>
</li>
<li><p><strong>Lead Content Writer:</strong> Writes engaging content based on research.</p>
</li>
<li><p><strong>Senior Editorial Director:</strong> Ensures content quality and consistency.</p>
</li>
<li><p><strong>SEO Specialist:</strong> Optimizes content for search engines.</p>
</li>
</ul>
<p>Using the OpenAI API, the system employs ChatGPT models through crewAI to have these agents work for me.</p>
<h3 id="heading-before-ai-control-theory-as-the-first-ai">Before AI: Control Theory as the “First AI”</h3>
<p>Before symbolic and non-symbolic AI, electrical engineering already had data-driven methods. One key area that I’ve already mentioned above is control theory, which studies control systems for machines like cars and rockets. This field lets us design systems that stay stable despite disturbances and achieve goals beyond human capabilities.</p>
<p>Nowadays, after creating a control theory algorithm, we check if AI can improve the control system. In my experience, only some advanced deep learning methods are effective. Most machine learning methods don't outperform control theory in efficiency and security.</p>
<p>Control theory also offers better interpretability, allowing us to understand decisions, unlike advanced machine learning and deep learning.</p>
<p>Due to the historical importance of control theory, I will continue to mention its role and mathematical applications. This will help you learn AI's math foundations and understand its significance in electronic systems and AI applications in engineering beyond dataset predictions.</p>
<h2 id="heading-chapter-4-linear-algebra-the-geometry-of-data">Chapter 4: Linear Algebra - The Geometry of Data</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002362611/905a356e-7686-4212-94ac-2b4a5b359c8a.jpeg" alt="Magnifying glass pointing at a book" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/monochrome-photo-of-math-formulas-3729557/">Nothing Ahead</a>.</p>
<p>Linear algebra is like having organized containers for data.</p>
<p>Instead of playing with individual numbers, we can pack them into structured boxes that are easier to handle. These structured boxes are called matrices.</p>
<p>When you have a lot of variables like customer data, sensor readings, or images, these structured boxes are very helpful. Also, what we can do when we play around with these boxes is very valuable.</p>
<p>In AI, linear algebra is everywhere. Take matrices, for example – a key concept in Linear Algebra. LLMs perform many matrix multiplications as their core operation. The data that they take in is also organized into matrices. In image recognition, matrices are used to represent pixels of images.</p>
<p>So as you can see, this core Linear Algebra concept is important to understand. Let's start!</p>
<h3 id="heading-what-are-matrices-and-why-do-they-simplify-equations">What Are Matrices and Why Do They Simplify Equations?</h3>
<p>Very often, systems in the real world can be simplified and modeled with a system of equations.</p>
<p>Those equations are often differential equations of many orders. But to simplify, let’s choose a very simple system like the one below:</p>
<p>$$\begin{align} 2x + 3y - z &amp;= 7 \ x - 2y + 4z &amp;= -1 \ 3x + y + 2z &amp;= 10 \end{align}$$</p>
<p>When dealing with many variables and equations, writing each equation separately quickly becomes frustrating. Matrices provide a compact way to represent these systems.</p>
<p>For example, here’s the system above as a single matrix equation:</p>
<p>$$\begin{bmatrix} 2 &amp; 3 &amp; -1 \ 1 &amp; -2 &amp; 4 \ 3 &amp; 1 &amp; 2 \end{bmatrix} \begin{bmatrix} x \ y \ z \end{bmatrix} = \begin{bmatrix} 7 \ -1 \ 10 \end{bmatrix}$$</p>
<p>By seeing systems of equations as matrices, we can use linear algebra techniques to understand how the system behaves.</p>
<p>Some of these techniques are:</p>
<ul>
<li><p>Linear Independence, Dependence, and Rank</p>
</li>
<li><p>Determinants</p>
</li>
<li><p>Eigenvalues and Eigenvectors</p>
</li>
</ul>
<p>So to summarize:</p>
<ol>
<li><p>A real world system can be represented as a system of equations</p>
</li>
<li><p>A system of equations can be compressed in a structured manipulable form called a matrix.</p>
</li>
<li><p>With matrices and linear algebra techniques, we can understand how the system works.</p>
</li>
</ol>
<p>This way, we can study the basic behavior of a system with Linear Algebra.</p>
<p>For complex systems like a rocket, Linear Algebra is still the foundation. More advanced tools from control theory are used, but understanding simpler systems is essential for modeling and creating complex ones.</p>
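<p>To make step 3 concrete: once the system is in matrix form, a computer can solve it directly. Here's a small sketch solving the three-equation system above with NumPy:</p>

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[2, 3, -1],
              [1, -2, 4],
              [3, 1, 2]], dtype=float)
b = np.array([7, -1, 10], dtype=float)

# Solve A @ [x, y, z] = b in one call
solution = np.linalg.solve(A, b)
print(solution)
```

<p>Multiplying <code>A</code> by the solution vector reproduces <code>b</code>, which is exactly what the matrix equation promises.</p>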
<h3 id="heading-vectors-and-transformations-moving-in-multiple-directions">Vectors and Transformations: Moving in Multiple Directions</h3>
<p>Vectors are matrices <strong>with a single row or a single column.</strong> You can also think of them as the building blocks of AI. They represent things like data points, model parameters, and much more.</p>
<p>For example, every data input (like an image or sentence) becomes a vector that the model can process.</p>
<p>Here are two examples of vectors:</p>
<p>$$\mathbf{A} = \begin{bmatrix} 4 &amp; -2 &amp; 7 &amp; 1 &amp; 5 \end{bmatrix}$$</p>
<p>And:</p>
<p>$$\mathbf{B} = \begin{bmatrix} 3 \ -1 \ 8 \ 0 \ -4 \end{bmatrix}$$</p>
<p>All operations that you can perform on matrices can also be performed on vectors.</p>
<p>In Python, we can represent this by:</p>
<pre><code class="language-plaintext">import numpy as np

# Define vectors A and B
A = np.array([4, -2, 7, 1, 5])
B = np.array([3, -1, 8, 0, -4])
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756171163870/4fa7dc5d-5b68-4baf-a211-3db0c3915781.png" alt="Python code image representing the code above. Defining two NumPy arrays." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>We’re using the <a href="https://numpy.org/">NumPy</a> library because it makes math with arrays easy and fast.</p>
<p>In terms of systems of equations, a vector with a single row, such as:</p>
<p>$$\mathbf{A} = \begin{bmatrix} 4 &amp; -2 &amp; 7 &amp; 1 &amp; 5 \end{bmatrix}$$</p>
<p>And this represents the coefficients of a single equation:</p>
<p>$$4x_1 - 2x_2 + 7x_3 + x_4 + 5x_5 = k$$</p>
<p>A vector with a single column, such as:</p>
<p>$$\mathbf{B} = \begin{bmatrix} 3 \ -1 \ 8 \ 0 \ -4 \end{bmatrix}$$</p>
<p>Which represents this system of equations:</p>
<p>$$\begin{align} x_1 &amp;= 3 \ x_2 &amp;= -1 \ x_3 &amp;= 8 \ x_4 &amp;= 0 \ x_5 &amp;= -4 \end{align}$$</p>
<p>Now let’s see some matrix operations.</p>
<p>For example:</p>
<p>$$\mathbf{A} + \mathbf{B}^T = \begin{bmatrix} 4 &amp; -2 &amp; 7 &amp; 1 &amp; 5 \end{bmatrix} + \begin{bmatrix} 3 &amp; -1 &amp; 8 &amp; 0 &amp; -4 \end{bmatrix} = \begin{bmatrix} 7 &amp; -3 &amp; 15 &amp; 1 &amp; 1 \end{bmatrix}$$</p>
<pre><code class="language-plaintext">vector_addition = A + B
print("A + B =", vector_addition)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756171174149/62309c55-a5c5-4f69-aef6-e8ab341b5926.png" alt="Python code image representing the code above. Adding two NumPy arrays." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Which gives the result of the equation above.</p>
<p>Often, vector addition is used to combine features. For example, adding many user preference vectors creates a profile of a user.</p>
<p>Here’s a <strong>scalar multiplication:</strong></p>
<p>$$3\mathbf{A} = 3\begin{bmatrix} 4 &amp; -2 &amp; 7 &amp; 1 &amp; 5 \end{bmatrix} = \begin{bmatrix} 12 &amp; -6 &amp; 21 &amp; 3 &amp; 15 \end{bmatrix}$$</p>
<pre><code class="language-plaintext">scalar_mult = 3 * A
print("3 * A =", scalar_mult)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756171180976/17e260a4-baab-4866-ba30-fc12e090b87a.png" alt="Python code image representing the code above. Multiplying a NumPy array with a scalar." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Which gives the result of the equation above.</p>
<p>In AI, scaling vectors is usually done to adjust relevance. For example, multiplying a vector by 100 amplifies its value, while multiplying it by 0.3 reduces its importance.</p>
<p>Here's an outer product multiplication:</p>
<p>$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} 4 \ -2 \ 7 \ 1 \ 5 \end{bmatrix} \times \begin{bmatrix} 3 &amp; -1 &amp; 8 &amp; 0 &amp; -4 \end{bmatrix} = \begin{bmatrix} 12 &amp; -4 &amp; 32 &amp; 0 &amp; -20 \ -6 &amp; 2 &amp; -16 &amp; 0 &amp; 8 \ 21 &amp; -7 &amp; 56 &amp; 0 &amp; -28 \ 3 &amp; -1 &amp; 8 &amp; 0 &amp; -4 \ 15 &amp; -5 &amp; 40 &amp; 0 &amp; -20 \end{bmatrix}$$</p>
<p>And here’s a <strong>dot product</strong> (also called a <strong>scalar</strong> or <strong>inner product</strong>):</p>
<p>$$\mathbf{A} \cdot \mathbf{B}^T = \begin{bmatrix} 4 &amp; -2 &amp; 7 &amp; 1 &amp; 5 \end{bmatrix} \cdot \begin{bmatrix} 3 &amp; -1 &amp; 8 &amp; 0 &amp; -4 \end{bmatrix}$$</p>
<p>$$= 4 \cdot 3 + (-2) \cdot (-1) + 7 \cdot 8 + 1 \cdot 0 + 5 \cdot (-4) = 50$$</p>
<p>We mainly use dot products when we want to measure the similarity, or alignment, between two vectors. In machine learning, in one simple phrase: the dot product is a measure of similarity.</p>
<pre><code class="language-plaintext">import numpy as np

dot_product = np.dot(A, B)
print("A · B =", dot_product)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756171200508/ee7b9e61-c1cb-497d-b038-b6a672c6d24b.png" alt="Python code image representing the code above. Multiplying a NumPy array via dot product." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Which gives the result of the equation above.</p>
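<p>A common refinement of this similarity measure is <strong>cosine similarity</strong>, which divides the dot product by the vectors' lengths so the result no longer depends on their magnitudes. Here's a quick sketch reusing <code>A</code> and <code>B</code>:</p>

```python
import numpy as np

A = np.array([4, -2, 7, 1, 5])
B = np.array([3, -1, 8, 0, -4])

# Cosine similarity: the dot product divided by the product of the
# vectors' lengths, so the result always falls between -1 and 1
cos_sim = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_sim)
```
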
<h3 id="heading-linear-independence-dependence-and-rank-why-it-matters">Linear Independence, Dependence, and Rank: Why It Matters</h3>
<p>A lot of times, matrices can be made smaller and simpler. So it’s good practice to reduce a matrix to its simplest form before analyzing its properties.</p>
<p>When each row of a matrix can be built from the other rows, that matrix is linearly dependent. This means the matrix can be simplified further.</p>
<p>Conversely, a matrix has the property of linear independence when its rows cannot be created by combining each other.</p>
<p>For example, when we have a complex matrix like this one:</p>
<p>$$C = \begin{bmatrix} 1 &amp; 2 &amp; 3 &amp; 4 \ 2 &amp; 4 &amp; 6 &amp; 8 \ 1 &amp; 3 &amp; 5 &amp; 7 \ 0 &amp; 1 &amp; 2 &amp; 3 \end{bmatrix}$$</p>
<p>We can, with calculations, convert to this:</p>
<p>$$C_{\text{reduced}} = \begin{bmatrix} 1 &amp; 0 &amp; -1 &amp; -2 \ 0 &amp; 1 &amp; 2 &amp; 3 \ 0 &amp; 0 &amp; 0 &amp; 0 \ 0 &amp; 0 &amp; 0 &amp; 0 \end{bmatrix}$$</p>
<p>If you are not familiar with row reduction, I recommend <a href="https://www.youtube.com/watch?v=eDb6iugi6Uk">this YouTube video</a>.</p>
<p>The above simplified matrix is the same thing as this:</p>
<p>$$C_{\text{reduced}} = \begin{bmatrix} 1 &amp; 0 &amp; -1 &amp; -2 \ 0 &amp; 1 &amp; 2 &amp; 3 \end{bmatrix}$$</p>
<p>This way, we conclude that the C matrix has a <strong>rank</strong> of 2.</p>
<p>In other words, since the simplest form of the matrix has only 2 rows with numbers, it has a rank of 2.</p>
<p>From this, we can conclude that the reduced version of the matrix is <strong>linearly independent</strong>. This is because no row or column can be made from the existing rows or column. It’s the simplest possible matrix.</p>
<p>The original matrix C is linearly dependent because some rows are just multiples or combinations of other rows. For example, row 2 of the original matrix C is exactly row 1 multiplied by 2.</p>
<p>Another way of seeing this is that we have 4 rows in the original matrix and the rank of matrix C is 2. Since they are not equal, C is linearly dependent.</p>
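<p>We can confirm this rank with NumPy, which computes it numerically from the matrix's singular values:</p>

```python
import numpy as np

C = np.array([[1, 2, 3, 4],
              [2, 4, 6, 8],
              [1, 3, 5, 7],
              [0, 1, 2, 3]])

# Only 2 rows carry independent information, so the rank is 2
print(np.linalg.matrix_rank(C))  # 2
```
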
<h4 id="heading-why-are-these-concepts-important">Why are these concepts important?</h4>
<p>Linear independence and rank are important in engineering because they show whether equations, represented as matrices, give unique information. In electrical circuits and control systems, knowing that the equations are independent ensures unique solutions and avoids redundancy.</p>
<p>The matrix rank shows the maximum number of independent equations that can exist. This helps engineers model the simplest possible form of a system.</p>
<p>In LLMs like ChatGPT, Gemini, Grok, and Claude, linear independence, dependence, and rank are used in a very important technique called LoRA (Low-Rank Adaptation).</p>
<p>LoRA (Low-Rank Adaptation) is widely used to calibrate these models so that they adapt efficiently to new tasks or domains without retraining the full model. There are also variants of this technique, like Quantized LoRA (QLoRA). By cutting the compute needed for fine-tuning, LoRA saves energy, cooling water, and hardware in many data centers.</p>
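<p>To see why low rank saves so much, here's a toy sketch of the core idea (the sizes and random matrices are made up for illustration, not taken from any real model):</p>

```python
import numpy as np

d = 200   # illustrative layer width
r = 8     # chosen low rank

# Full fine-tuning would update every entry of a d x d weight matrix
full_params = d * d

# LoRA instead learns two thin matrices whose product is a rank-r update
rng = np.random.default_rng(0)
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, d))
delta_W = A @ B               # the rank of this update is at most r
lora_params = d * r + r * d   # 3,200 parameters instead of 40,000

print(full_params, lora_params)
print(np.linalg.matrix_rank(delta_W))
```

<p>The full update matrix is never stored as trainable parameters: only the two thin factors are, which is where the savings come from.</p>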
<h3 id="heading-determinants-measuring-space-and-scaling">Determinants: Measuring Space and Scaling</h3>
<p>Why are determinants important?</p>
<p>Determinants tell us if a system of equations has infinite solutions, no solutions, or if it has a unique solution without having to solve the whole system.</p>
<p>This way, instead of immediately trying to solve a complex system, we can first use the determinant to find out if it is even worth solving in the first place.</p>
<p>Many engineers don’t really understand the importance of the determinant. The only thing they know is the formula and how to apply it.</p>
<p>So now let’s learn, with some examples, what exactly the determinant is and why it matters.</p>
<p>A determinant is just a number. It’s always calculated from a square matrix. By calculating the determinant, we can find certain properties about the system it represents.</p>
<p>The determinant of a given matrix A:</p>
<p>$$A = \begin{bmatrix} a &amp; b \ c &amp; d \end{bmatrix}.$$</p>
<p>can be represented by two notations:</p>
<p>$$\det(A) = ad - bc$$</p>
<p>or</p>
<p>$$|A| = ad - bc$$</p>
<p>Both are the same thing.</p>
<p>Let's see how to calculate a determinant:</p>
<p>$$|A| = \begin{vmatrix} 2 &amp; 3 \ 1 &amp; 4 \end{vmatrix} = (2)(4) - (3)(1) = 8 - 3 = 5.$$</p>
<p>Let’s see how to do this in Python:</p>
<pre><code class="language-plaintext">import numpy as np

# Define the matrix
A = np.array([
    [2, 3],
    [1, 4]
])

# Calculate the determinant
det_A = np.linalg.det(A)

print("Determinant of A:", det_A)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756233259727/feea57a3-5a33-49b9-a74a-979eba5ec7fe.png" alt="Python code image representing the code above. Finding the determinant." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-the-same-calculation-works-for-other-matrices">The same calculation works for other matrices!</h4>
<p>Here's the determinant formula for a 3×3 matrix:</p>
<p>$$|B|= \begin{vmatrix} a &amp; b &amp; c \ d &amp; e &amp; f \ g &amp; h &amp; i \end{vmatrix} = aei + bfg + cdh - ceg - bdi - afh.$$</p>
<p>Now let’s apply the formula to an example:</p>
<p>$$|B| = \begin{vmatrix} 1 &amp; 2 &amp; 3 \ 0 &amp; 4 &amp; 5 \ 1 &amp; 0 &amp; 6 \end{vmatrix} = (1)(4)(6) + (2)(5)(1) + (3)(0)(0) - (3)(4)(1) - (2)(0)(6) - (1)(5)(0)$$</p>
<p>Evaluating the nonzero terms:</p>
<p>$$= (1)(4)(6) + (2)(5)(1) - (3)(4)(1) = 24 + 10 - 12 = 22$$</p>
<p>In Python code:</p>
<pre><code class="language-plaintext">import numpy as np

# Define the matrix
B = np.array([
    [1, 2, 3],
    [0, 4, 5],
    [1, 0, 6]
])

# Calculate the determinant
det_B = np.linalg.det(B)

print("Determinant of B:", det_B)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756233606615/4e333b35-4714-480a-8a3b-62db799614e1.png" alt="Python code image representing the code above. Finding a 3 by 3 determinant." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now, let’s visualize matrix A by plotting its column vectors. Each column becomes a vector: (2,1) and (3,4). This shows us geometrically what the matrix is actually doing.</p>
<p>In a GeoGebra graph, it gives us this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756235393476/6b5c38ea-7b27-4e3d-8ad4-346417d35e77.png" alt="Representation of 2 vectors in a Cartesian plane." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>As we can see, the vectors define how each variable influences the system. By visualizing what the matrices are doing, we can find patterns that are harder to find just by looking at formulas.</p>
<p><strong>What does this mean visually?</strong></p>
<p>It means that in the space, this is what our matrix looks like. It’s also how our system of equations is represented.</p>
<p>C1 represents the “force”, or impact, that the variable x1 has. And C2 does the same thing for the variable x2.</p>
<p>Now we’ll focus on a 3D matrix example. This matrix D represents a system of three equations with three variables:</p>
<p>$$D = \begin{bmatrix} 2 &amp; -1 &amp; 3 \ 4 &amp; 0 &amp; -2 \ -1 &amp; 5 &amp; 1 \end{bmatrix}$$</p>
<p>$$\begin{align} 2x_1 - x_2 + 3x_3 &amp;= p \ 4x_1 + 0x_2 - 2x_3 &amp;= q \ -x_1 + 5x_2 + x_3 &amp;= r \end{align}$$</p>
<p>Each column can be described as a separate vector:</p>
<p>$$\begin{equation} D = \left[ D_1 \mid D_2 \mid D_3 \right] = \left[ \begin{bmatrix} 2 \ 4 \ -1 \end{bmatrix} \mid \begin{bmatrix} -1 \ 0 \ 5 \end{bmatrix} \mid \begin{bmatrix} 3 \ -2 \ 1 \end{bmatrix} \right] \end{equation}$$</p>
<p>As we can see, D was decomposed in 3 new column vectors:</p>
<p>$$\begin{equation} D_1 = \begin{bmatrix} 2 \ 4 \ -1 \end{bmatrix} \end{equation}$$</p>
<p>and:</p>
<p>$$\begin{equation} D_2 = \begin{bmatrix} -1 \ 0 \ 5 \end{bmatrix} \end{equation}$$</p>
<p>and:</p>
<p>$$\begin{equation} D_3 = \begin{bmatrix} 3 \ -2 \ 1 \end{bmatrix} \end{equation}$$</p>
<p>In a GeoGebra graph, it gives us this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756236913078/8d8a3d48-20a9-423b-bfb8-4368d92ec340.png" alt="Representation of 3 vectors in a 3D Cartesian plane." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In 3D, each vector points in its own direction. Together, the three equations define three planes. The point where all three planes intersect is the solution to the system.</p>
<p>This is a key advantage of matrices and linear algebra. They help us visualize both simple and complex systems, enhancing systems thinking and first principles thinking.</p>
<p>The determinant is directly connected to these visualizations. For example, in 2D it measures the area that the vectors stretch over. Now we’ll see how that’s possible.</p>
<p>Let's use matrix A and see what its determinant looks like in geometric terms:</p>
<p>$$A = \begin{bmatrix} 2 &amp; 3 \ 1 &amp; 4 \end{bmatrix}$$</p>
<p>Which can be decomposed into 2 vectors <code>u</code> and <code>v</code>:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756241016899/ded47498-b030-4fa1-a4fe-07153d138a7f.png" alt="Representation of 2 vectors (matrix A) in a Cartesian plane." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>It gives us this determinant:</p>
<p>$$|A| = \begin{vmatrix} 2 &amp; 3 \ 1 &amp; 4 \end{vmatrix} = (2)(4) - (3)(1) = 8 - 3 = 5.$$</p>
<p>Now let’s see the determinant visually.</p>
<p>From (2,1) and (3,4), we can draw vectors parallel to u and v. These are called u' and v' and have the same magnitude. They meet at (5,5), and we get a parallelogram defined by these points: (0,0), (2,1), (3,4), (5,5).</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756241586617/d825b8e2-d839-4b15-bdd0-d9b5efd80942.png" alt="Representation of the 4 vectors being used in the determinant" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The area of the parallelogram is the determinant:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756241692073/deb2e0cd-32a3-4a1a-90e7-e556f5039169.png" alt="Illustrating that the area limited by the 4 vectors is the determinant." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Let’s see another example.</p>
<p>Let’s use a matrix F and see what it truly is:</p>
<p>$$F = \begin{bmatrix} 1 &amp; 2 \ 2 &amp; 4 \end{bmatrix}$$</p>
<p>It gives us this determinant:</p>
<p>$$|F| = \begin{vmatrix} 1 &amp; 2 \ 2 &amp; 4 \end{vmatrix} = (1)(4) - (2)(2) = 4 - 4 = 0$$</p>
<p>In GeoGebra, we can see that:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756242215981/d88f2e80-04ba-46b9-979d-d7684f161210.png" alt="Representation of the 2 vectors being used in the determinant" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now let’s try to see the determinant visually:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756242340382/46551578-69a5-4ef9-ab86-9149e7fb4aaa.png" alt="Illustrating that the area limited by the 2 vectors is the determinant and that it does not exist. So the determinant is zero." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>We can conclude that the area is 0.</p>
<p>Now let’s use a matrix G and see what it truly is:</p>
<p>$$G = \begin{bmatrix} 1 &amp; 5 \ 2 &amp; 3 \end{bmatrix}$$</p>
<p>It gives us this determinant:</p>
<p>$$|G| = \begin{vmatrix} 1 &amp; 5 \ 2 &amp; 3 \end{vmatrix} = (1)(3) - (5)(2) = 3 - 10 = -7$$</p>
<p>In GeoGebra, we can see that:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756242987960/d182b725-81ba-4042-81e1-6b0232e09ffb.png" alt="Representation of the 2 vectors being used to find the determinant" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now let’s try to see the determinant visually.</p>
<p>From (1,2) and (5,3), we can draw vectors parallel to u and v. These are called u' and v' and have the same magnitude. They meet at (6,5), and a parallelogram is completed with these points: (0,0), (1,2), (5,3), (6,5).</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756243098714/881693d4-7a84-4b72-bb87-3fb48b25fe4b.png" alt="Representation of 4 vectors being used to find the determinant before showing the area" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Again, the area of the parallelogram is the determinant:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756243316071/ce8fa65b-6370-4ada-9fe6-cdf20ab4546d.png" alt="Illustrating that the area limited by the 4 vectors is the determinant." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>We just saw that the determinant is the area of a parallelogram formed by the vectors. When the determinant is 0, there is no area. In other cases, there is an area. But what does this mean, and why do we care about these different values?</p>
<p><strong>When the det = 0:</strong></p>
<ul>
<li><p>The vectors are linearly dependent (one can be written as a combination of the others)</p>
</li>
<li><p>They lie on the same line or one is a scaled version of the other</p>
</li>
<li><p>The parallelogram collapses to a line, hence zero area</p>
</li>
<li><p>This tells us the matrix has no inverse</p>
</li>
<li><p><strong>Systems of equations either have no solution or infinitely many solutions</strong></p>
</li>
</ul>
<p><strong>When the det ≠ 0 (det &gt; 0 or det &lt; 0):</strong></p>
<ul>
<li><p>The vectors form a proper parallelogram with an area</p>
<ul>
<li><p>If det &gt; 0, the determinant equals the area and the transformation preserves orientation</p>
</li>
<li><p>If det &lt; 0, the orientation is flipped, and the area is the absolute value of the determinant</p>
</li>
</ul>
</li>
<li><p>The vectors are linearly independent</p>
</li>
<li><p><strong>Systems of equations have exactly one solution</strong></p>
</li>
</ul>
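<p>We can see both cases in NumPy, reusing the singular matrix F and the matrix A from earlier (the right-hand sides below are made up just to have something to solve):</p>

```python
import numpy as np

# det(F) = 0: the parallelogram collapses, so no unique solution exists
F = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(F))

try:
    np.linalg.solve(F, np.array([1.0, 1.0]))
except np.linalg.LinAlgError:
    print("F is singular: no unique solution")

# det(A) = 5 (nonzero): exactly one solution exists
A = np.array([[2.0, 3.0],
              [1.0, 4.0]])
x = np.linalg.solve(A, np.array([7.0, 6.0]))
print(x)
```
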
<p>In electrical engineering, determinants help verify if a control system is controllable and observable.</p>
<p>Control systems use matrices a lot. For this reason, checking if their determinants are zero or non-zero tells engineers:</p>
<ul>
<li><p>If it is controllable, it means the system is reachable, which helps in stabilization and performance optimization.</p>
</li>
<li><p>If it is observable, it means the system is measurable, which helps in fault detection and system monitoring.</p>
</li>
</ul>
<p>In finite element analysis, a very popular math tool for solving partial differential equations, determinants help figure out quickly whether the calculations will give reliable results.</p>
<p>This way, with finite element analysis, we can design safer buildings, optimize aircraft wings, and simulate medical implants – all of which have a large impact on human lives and safety.</p>
<p>In machine learning, determinants are crucial to understanding data transformations. If a determinant of zero shows up in a transformation, it means you are losing information and can't recover the original data.</p>
<p>Also, in deep learning, determinants are used when choosing the first parameters of neural networks (weight initialization) to prevent problems like vanishing/exploding gradients.</p>
<p>In a 3×3 matrix, the determinant represents the volume of a parallelepiped (a 3D "box") formed by three vectors in 3D space.</p>
<ul>
<li><p>If det = 0: The three vectors lie in the same plane, so they don't span any 3D volume</p>
</li>
<li><p>If det ≠ 0: The vectors form a proper 3D shape with actual volume</p>
</li>
</ul>
<p>The absolute value |det| gives you the exact volume of that <a href="https://en.wikipedia.org/wiki/Parallelepiped">parallelepiped</a>.</p>
<p>For example, if you have vectors a, b, and c, the determinant tells you how much 3D space they "fill up" when you use them as the edges of a box.</p>
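<p>Here's a quick check of that volume interpretation, with three made-up edge vectors chosen so the volume is easy to verify by hand:</p>

```python
import numpy as np

# Three edge vectors of a parallelepiped (a 3D "box")
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 0.0])
c = np.array([1.0, 1.0, 3.0])

# Stack them as columns; |det| is the volume the box encloses
M = np.column_stack([a, b, c])
volume = abs(np.linalg.det(M))
print(volume)  # 6.0: a 1 x 2 x 3 box, sheared but volume-preserving
```
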
<p>This is where it gets fascinating:</p>
<ul>
<li><p>4×4 matrix: The determinant represents the "hypervolume" of a 4D parallelepiped formed by four vectors in 4-dimensional space.</p>
</li>
<li><p>1000×1000 matrix: The determinant represents the hypervolume in 1000-dimensional space!</p>
</li>
</ul>
<p>So, to summarize, the determinant tells us easily if there are no solutions, infinite solutions, or exactly one solution in a system of equations, represented by a compact matrix.</p>
<h3 id="heading-what-are-mathematical-spaces-and-how-do-they-simplify-calculations">What Are Mathematical Spaces and How Do They Simplify Calculations?</h3>
<p>We now have a great foundation to understand the rest of this chapter on linear algebra.</p>
<p>Now, we will see how a linearly independent matrix creates something called a basis. We will also see that a basis is just a set of building blocks for mathematical spaces!</p>
<p>The row vectors of a linearly independent matrix form a basis.</p>
<p>For example, take matrix A, which is linearly independent:</p>
<p>$$A = \begin{bmatrix} 1 &amp; 0 &amp; 0 &amp; 0 \ 0 &amp; 1 &amp; 0 &amp; 0 \ 0 &amp; 0 &amp; 1 &amp; 0 \ 0 &amp; 0 &amp; 0 &amp; 1 \end{bmatrix}$$</p>
<p>Its rows form this set:</p>
<p>$$((1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1))$$</p>
<p>In this case, since matrix A is linearly independent, the set of matrix rows is called a <strong>basis</strong>. From this basis, you can create endless linear combinations of any other vector. The collection of all these possible combinations is called a <strong>mathematical space</strong>.</p>
<p>A mathematical space is an infinite set containing all linear combinations of a basis. It’s called a basis because these vectors <strong>form the base</strong> used to express any vector in the space as a linear combination.</p>
<p>This matrix B is linearly independent:</p>
<p>$$B = \begin{bmatrix} 1 &amp; 0 \ 0 &amp; 1 \ \end{bmatrix}$$</p>
<p>And forms this set:</p>
<p>$$((1, 0), (0, 1))$$</p>
<p>And from this basis come all possible points in the Cartesian coordinate system:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756247201687/a847b8c0-5678-431c-b446-e1897afdffc6.png" alt="Showing in the Cartesian plane where the point (2, 3) is" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>For example, mathematically, we can get the point (2,3) by:</p>
<p>$$(x=2, y=3) = 2(1, 0) + 3(0, 1) = (2, 0) + (0, 3) = (2, 3)$$</p>
<p>Note: There are other bases for the cartesian coordinate plane. I chose this one because it’s the easiest to understand.</p>
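<p>Recovering the coefficients of a linear combination is just solving a small linear system. Here's a sketch for the point (2, 3), first with the standard basis and then with a second, made-up basis to show that other bases work too:</p>

```python
import numpy as np

# Standard basis of the plane, one basis vector per column
standard = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
point = np.array([2.0, 3.0])

# Solve standard @ coords = point to recover the combination's coefficients
coords = np.linalg.solve(standard, point)
print(coords)

# A different basis, (1, 1) and (0, 1), also spans the plane;
# the same point just gets different coordinates
other = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
other_coords = np.linalg.solve(other, point)
print(other_coords)
```
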
<h3 id="heading-eigenvalues-and-eigenvectors-unlocking-hidden-patterns">Eigenvalues and Eigenvectors: Unlocking Hidden Patterns</h3>
<p>Eigenvalues and eigenvectors, in my opinion, are far simpler than what mathematics professors make them out to be at university:</p>
<ul>
<li><p>Eigenvalues tell you how much a matrix stretches or shrinks things.</p>
</li>
<li><p>Eigenvectors tell you which directions stay unchanged when the matrix transforms them.</p>
</li>
</ul>
<p>This way, a matrix may have one or many eigenvalues, each with its own associated eigenvectors.</p>
<p>Let’s see an example:</p>
<p>For a square matrix A, eigenvalue λ, and eigenvector v:</p>
<p>$$Av=λv$$</p>
<p>The easiest way to find the eigenvalues is to solve this equation:</p>
<p>$$det(A−λI)=0$$</p>
<p>or:</p>
<p>$$|A−λI|=0$$</p>
<p>Again, we have different notations for the determinant, but they’re the same thing.</p>
<p>Anyway, let’s define a very simple matrix A:</p>
<p>$$A = \begin{bmatrix} 2 &amp; 0 \ 0 &amp; 3 \end{bmatrix}$$</p>
<p>Now let’s make some calculations.</p>
<p>This formula:</p>
<p>$$det(A−λI)=0$$</p>
<p>Can be decomposed into:</p>
<p>$$det(\begin{bmatrix} 2 &amp; 0 \ 0 &amp; 3 \end{bmatrix} - λ \times \begin{bmatrix} 1 &amp; 0 \ 0 &amp; 1 \end{bmatrix}) = 0$$</p>
<p>Which is the same as:</p>
<p>$$det(\begin{bmatrix} 2 &amp; 0 \ 0 &amp; 3 \end{bmatrix} - \begin{bmatrix} λ &amp; 0 \ 0 &amp; λ \end{bmatrix}) = 0$$</p>
<p>Which gives us:</p>
<p>$$det(\begin{bmatrix} 2-λ &amp; 0 \ 0 &amp; 3-λ \end{bmatrix}) = 0$$</p>
<p>By the calculations we made above on the determinant, we can conclude that:</p>
<p>$$(2-λ) \times (3-λ) = 0$$</p>
<p>Which is the same as:</p>
<p>$$2-\lambda = 0 \text{ or } 3-\lambda = 0$$</p>
<p>Which gives us these eigenvalues:</p>
<p>$$\lambda_1 = 2, \quad \lambda_2 = 3$$</p>
<p>And these eigenvectors:</p>
<p>$$\mathbf{v_1} = \begin{bmatrix} 1 \ 0 \end{bmatrix}, \quad \mathbf{v_2} = \begin{bmatrix} 0 \ 1 \end{bmatrix}$$</p>
<p>This means that in the Cartesian coordinate system:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756321668969/949a5a4b-12ff-4490-bbff-1cc032bc5705.png" alt="Showing how the eigenvectors are related to the vectors in matrix A visually. Both have the same directions but different scalar values." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>By applying the matrix to the eigenvectors, we can see that:</p>
<ul>
<li>The eigenvalue 2 is associated with the eigenvector v1:</li>
</ul>
<p>$$A\mathbf{v_1} = \begin{bmatrix} 2 &amp; 0 \ 0 &amp; 3 \end{bmatrix}\begin{bmatrix} 1 \ 0 \end{bmatrix} = \begin{bmatrix} 2 \ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \ 0 \end{bmatrix}$$</p>
<ul>
<li>The eigenvalue 3 is associated with the eigenvector v2:</li>
</ul>
<p>$$A\mathbf{v_2} = \begin{bmatrix} 2 &amp; 0 \ 0 &amp; 3 \end{bmatrix}\begin{bmatrix} 0 \ 1 \end{bmatrix} = \begin{bmatrix} 0 \ 3 \end{bmatrix} = 3\begin{bmatrix} 0 \ 1 \end{bmatrix}$$</p>
<p>Here is the Python code to calculate this:</p>
<pre><code class="language-python">import numpy as np

# Define matrix A
A = np.array([[2, 0],
              [0, 3]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:")
print(eigenvalues)

print("Eigenvectors (columns):")
print(eigenvectors)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756322044095/bc76f0ec-1d13-4845-b0f3-2847118860a3.png" alt="Python code, with NumPy array, showing how to find the eigenvalues" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Eigenvalues and eigenvectors are key tools in engineering and machine learning because they reveal a matrix's fundamental behavior. Although a matrix transformation might seem complex, in reality:</p>
<ul>
<li><p>Eigenvalues show how much stretching or compression occurs.</p>
</li>
<li><p>Eigenvectors identify the special directions where this stretching happens most naturally.</p>
</li>
</ul>
<p>In machine learning, we can use Principal Component Analysis (PCA) to make datasets smaller.</p>
<p>So, for example, let's say you’re building a machine learning application to predict heart disease. You have 100 features and 1 target variable telling you whether a person has the disease or not.</p>
<p>With PCA, you can reduce the 100 features to, say, 40. This way, you can build a smaller machine learning model and save computational resources.</p>
<p>PCA uses eigenvectors of covariance matrices to find important directions in data with many variables. It reduces data size without losing much detail, helping machine learning algorithms focus on key features and ignore unnecessary information.</p>
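<p>As a sketch of how PCA works under the hood, here is a minimal NumPy version that reduces 3 features to 2 principal components. The dataset and the component count are made up for illustration:</p>
<pre><code class="language-python">import numpy as np

# Small made-up dataset: 6 samples, 3 features
X = np.array([
    [4.2, 150.0, 280.0],
    [5.8, 220.0, 420.0],
    [3.9, 120.0, 230.0],
    [6.1, 250.0, 480.0],
    [4.7, 200.0, 340.0],
    [5.3, 200.0, 390.0],
])

# 1. Center the data (subtract each column's mean)
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. Eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort the directions by eigenvalue (largest = most variance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 5. Project onto the top 2 principal components (3 features become 2)
X_reduced = X_centered @ eigenvectors[:, :2]

print("Explained variance ratio:", eigenvalues / eigenvalues.sum())
print("Reduced shape:", X_reduced.shape)
</code></pre>
<p>The eigenvectors with the largest eigenvalues point in the directions where the data varies the most, so keeping only those directions preserves most of the information.</p>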
<h3 id="heading-applications-of-linear-algebra-in-ai-and-control-theory">Applications of Linear Algebra in AI and Control Theory</h3>
<p>‌Linear algebra serves as the mathematical foundation for all engineering fields.</p>
<p>In addition, the principles of matrices and linear transformations provide the computational foundation that makes modern AI possible while enabling the control of complex systems.</p>
<p>All LLMs, from ChatGPT and Claude to Gemini and Grok, rely on linear operations.</p>
<p>All these systems carry out huge matrix multiplications to handle and create human language. So, when you type something into ChatGPT, probably millions of matrix multiplications are happening as you wait for a response!</p>
<p>In control theory, especially in an area called state-space control theory, matrices make it possible to create complex controllers. Linear algebra helps engineers design controllers for things like aircraft autopilots and robotic systems, among other applications.</p>
<p>For example, when a rocket adjusts its trajectory or a drone maintains stable flight, many matrix multiplications are happening to determine the best way to guarantee the system’s stability.</p>
<p>Thanks to GPUs, matrix operations are very efficient to compute. Also, any new matrix multiplication algorithms or special hardware for faster linear operations can greatly enhance AI and control systems.</p>
<p>In the end, linear algebra is the hidden mathematical engine powering the current AI revolution.</p>
<h2 id="heading-chapter-5-multivariable-calculus-change-in-many-directions">Chapter 5: Multivariable Calculus -&nbsp;Change in Many Directions</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002238157/a377cdc6-7e85-491b-90b8-8b3243618288.jpeg" alt="Photo of a woman writing a calculus equation on a board" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/woman-writing-on-a-whiteboard-3862130/">Photo by ThisIsEngineering</a></p>
<h3 id="heading-limits-and-continuity-understanding-smooth-change">Limits and Continuity: Understanding Smooth Change</h3>
<p>Calculus is one of the most valuable areas of mathematics, and it focuses on the study of continuous change.</p>
<p>Before we start learning a topic that makes many people give up on engineering degrees, I want to once again assure you that this chapter is very easily explained with a lot of images and code examples.</p>
<p>Also, just like linear algebra, many concepts in calculus are components of tools that have helped create billion-dollar industries.</p>
<h4 id="heading-what-is-continuity">What is continuity?</h4>
<p>Before explaining topics like derivatives and integrals, we need to understand continuity.</p>
<p>In simple terms, continuity means that a function has no breaks, jumps, or holes.</p>
<p>Essentially, you can draw it without lifting your pencil from the paper.</p>
<p>For example, this function is continuous:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756402257225/f9cfc4f3-a6f1-4fb9-9ed1-f690c4ffffc4.png" alt="Example of a function that is continuous" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>You can draw this graph without taking the pencil off the paper.</p>
<p>The above graph is represented by this function:</p>
<p>$$y = x^2 - 4x + 3$$</p>
<p>But the below function is <strong>not</strong> continuous:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756402337970/b5a65748-572d-4342-9685-9472babde38a.png" alt="Example of a function that is not continuous" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This one, you <strong>can’t</strong> draw without taking the pencil off the paper.</p>
<p>It’s represented by this piecewise function:</p>
<p>$$y = \begin{cases} 1.5 + \frac{1}{x+1} &amp; \text{if } -1 &lt; x &lt; 2 \ 2 + \frac{2}{(x-1)^2} &amp; \text{if } x &gt; 2 \end{cases}$$</p>
<p>This piecewise function is essentially two individual functions for two different intervals of numbers. Since calculus is the study of continuous change, we can only realistically use it on continuous functions.</p>
<h4 id="heading-how-do-limits-guarantee-continuity">How do limits guarantee continuity?</h4>
<p>We can only use tools like derivatives and integrals if a function is continuous.</p>
<p>How can we describe mathematically that a function is continuous – like drawing it without lifting our pencil from the paper?</p>
<p>Limits solve that problem.</p>
<p>When we take the limit of a function at a given point, we're asking: what value does a function approach as we get close to that point?</p>
<p>Let's look at some examples of this function at these points and also understand the notation used in limits:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756403511442/de3450f2-dcf9-40e3-a04e-846334abeebd.png" alt="Example of a function that is continuous and its various points" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<ol>
<li><strong>What is the limit of the function at x=0?</strong></li>
</ol>
<p>It is 3. That’s where the function crosses the y axis.</p>
<p>In mathematical notation,</p>
<p>$$\begin{align} \lim_{x \to 0} (x^2 - 4x + 3) &amp;= (0)^2 - 4(0) + 3 \ &amp;= 0 - 0 + 3 \ &amp;= 3 \end{align}$$</p>
<p>In this notation, we're asking what the value of the y function is as x gets very close to 0. Think of x as being at 0.00000000000001 or -0.00000000000001. It gets so close that we can consider it near enough.</p>
<ol start="2">
<li><strong>What is the limit of the function at x=1?</strong></li>
</ol>
<p>Let’s see another example:</p>
<p>In this case, the limit is 0.</p>
<p>$$\begin{align} \lim_{x \to 1} (x^2 - 4x + 3) &amp;= (1)^2 - 4(1) + 3 \ &amp;= 1 - 4 + 3 \ &amp;= 0 \end{align}$$</p>
<p>In this notation, we're asking what the value of the y function is as x gets very close to 1. Think of x as being at 0.99999999999999 or 1.00000000000001. It gets so close that we can consider it near enough.</p>
<ol start="3">
<li><strong>What is the limit of the function at x=2?</strong></li>
</ol>
<p>Let’s see another example:</p>
<p>Here, the limit is -1.</p>
<p>$$\begin{align} \lim_{x \to 2} (x^2 - 4x + 3) &amp;= (2)^2 - 4(2) + 3 \ &amp;= 4 - 8 + 3 \ &amp;= -1 \end{align}$$</p>
<p>Some more quick examples:</p>
<ol start="4">
<li><strong>What is the limit of the function at x=3?</strong></li>
</ol>
<p>It is 0. Here, we're asking what the value of the function is as x gets very close to 3. Think of x as being at 2.99999999999999 or 3.00000000000001. It gets so close that we can consider it near enough.</p>
<ol start="5">
<li><strong>What is the limit of the function at x=4?</strong></li>
</ol>
<p>It is 3, since (4)² - 4(4) + 3 = 3.</p>
<ol start="6">
<li><strong>What is the limit of the function at x=5?</strong></li>
</ol>
<p>It is 8, since (5)² - 4(5) + 3 = 8.</p>
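<p>A quick way to check all six answers at once is with SymPy, a sketch of which follows. For a continuous function like this one, the limit at each point equals the function's value there:</p>
<pre><code class="language-python">import sympy as sp

x = sp.symbols('x')
f = x**2 - 4*x + 3

# Evaluate the limit of f at each of the six points
for point in range(6):
    print(f"limit at x={point}:", sp.limit(f, x, point))
</code></pre>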
<p>Now let’s see another example:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756403617161/b67b2977-8ae4-4c06-8156-d7c6a64ee2e1.png" alt="Example of a function that is not continuous at a point of x=2" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>At the point x=2, the limit is not well defined:</p>
<ul>
<li><p>If we trace the curve with a pencil from the left toward x=2, we end up at 1.83333</p>
</li>
<li><p>If we trace it from the right toward x=2, we end up at 4</p>
</li>
</li>
</ul>
<h3 id="heading-why-are-limits-important-to-understand-derivatives-and-integrals">Why are limits important to understand derivatives and integrals?</h3>
<p>As we have seen, when we talk about limits, we are talking about the value that a function approaches as its input comes toward a particular point.</p>
<p>It’s critical to note that we're not looking at the value at that point itself. We’re looking at what happens as we get so near to it that we can pin down what value the function is approaching.</p>
<p>I will now show a very simple example to demonstrate this concept using mathematical notation.</p>
<p>I know that limits can be a difficult concept to understand at first. But if you understand limits very well, then you'll be well-prepared to understand derivatives and integrals.</p>
<p>And, as you’ll see, derivatives are responsible for modern AI, and integrals are important parts of tools widely used in billion-dollar industries.</p>
<p>I want you to understand the <strong>intuition</strong> behind this.</p>
<p>The function z(x) is continuous:</p>
<p>$$z(x) = \frac{3x + 7}{x + 2}$$</p>
<p><strong>So to what value does this expression converge as x approaches infinity?</strong></p>
<p>If you have a background in math, you might see why. But here it is for those who aren’t sure:</p>
<ul>
<li>It converges to 3.</li>
</ul>
<p>This time, the limit will be approaching infinity instead of a constant:</p>
<p>$$\begin{align} \lim_{x \to \infty} \frac{3x + 7}{x + 2} \end{align}$$</p>
<p>Let’s solve this in a very simple way:</p>
<ul>
<li>For x = 1:</li>
</ul>
<p>$$f(1) = \frac{3(1) + 7}{1 + 2} = \frac{10}{3} \approx 3.333...$$</p>
<ul>
<li>For x = 5:</li>
</ul>
<p>$$f(5) = \frac{3(5) + 7}{5 + 2} = \frac{22}{7} \approx 3.143...$$</p>
<ul>
<li>For x = 10:</li>
</ul>
<p>$$f(10) = \frac{3(10) + 7}{10 + 2} = \frac{37}{12} \approx 3.083...$$</p>
<ul>
<li>For x = 50:</li>
</ul>
<p>$$f(50) = \frac{3(50) + 7}{50 + 2} = \frac{157}{52} \approx 3.019...$$</p>
<ul>
<li>For x = 100:</li>
</ul>
<p>$$f(100) = \frac{3(100) + 7}{100 + 2} = \frac{307}{102} \approx 3.010...$$</p>
<ul>
<li>For x = 1000:</li>
</ul>
<p>$$f(1000) = \frac{3(1000) + 7}{1000 + 2} = \frac{3007}{1002} \approx 3.001...$$</p>
<ul>
<li>For x = 10000:</li>
</ul>
<p>$$f(10000) = \frac{3(10000) + 7}{10000 + 2} = \frac{30007}{10002} \approx 3.0001...$$</p>
<p>As x gets bigger and bigger, we get closer and closer to 3.</p>
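<p>If you'd like to check this with code, SymPy can evaluate both the individual values and the limit itself. A small sketch:</p>
<pre><code class="language-python">import sympy as sp

x = sp.symbols('x')
z = (3*x + 7) / (x + 2)

# Plug in ever larger values of x
for value in [1, 10, 100, 1000, 10000]:
    print(f"z({value}) = {float(z.subs(x, value)):.4f}")

# Take the limit symbolically as x approaches infinity
print("Limit:", sp.limit(z, x, sp.oo))  # 3
</code></pre>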
<p>This is the main idea of limits: Describe the value a function approaches as the input approaches some point.</p>
<p>This same idea applies to derivatives: they’re just limits that measure rates of change (slopes of tangent lines).</p>
<p>Likewise, integrals are just limits that measure accumulated quantities (areas under curves).</p>
<p>Let’s now see how derivatives work in depth.</p>
<h3 id="heading-derivatives-how-things-change-and-how-fast">Derivatives: How Things Change and How Fast</h3>
<p>As I said before, derivatives are just limits that measure rates of change (slopes of tangent lines).</p>
<p>But what does this actually mean?</p>
<p>Let’s see an example:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756755419750/75b36254-0f4a-4395-8dd4-14ac16399ff3.png" alt="Example of a function" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>What is the rate of change in the point A?</strong></p>
<p>Hard question, right? Let’s think about how to answer it with limits.</p>
<p>We can find the limit of the rate of change in point A(0.72, 0.66), also called the instantaneous rate of change.</p>
<p>Let’s do that:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756755680672/40f94361-55c7-4a9e-bfaf-b2b855fa0712.png" alt="Example of a function and choosing two points (B and C) to find the rate of change in point A" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>To find the slope, we take the coordinates of the points B(0.2, 0.2) and C(1.6, 1):</p>
<p>$$\text{slope} = \frac{1 - 0.2}{1.6 - 0.2} = \frac{0.8}{1.4} = \frac{4}{7} \approx 0.571$$</p>
<p>This gives us a rate of change:</p>
<p>$$y=0.571x + 0.084$$</p>
<p>Let's approximate more:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756756069833/3a4a1991-4983-4751-a68e-68bd6780300d.png" alt="Example of a function and choosing two points (B and C) to find the rate of change in point A. But B and C are closer to A." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Let’s also zoom in:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756756131072/f96b7f82-a4ed-4720-8c87-fd2936bae9d9.png" alt="Example of a function and choosing two points (B and C) to find the rate of change in point A. But B and C are closer to A, and we have to zoom in." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>To find the slope, we use the coordinates of the points B(0.58, 0.55) and C(0.85, 0.75):</p>
<p>$$\text{slope} = \frac{0.75 - 0.55}{0.85 - 0.58} = \frac{0.2}{0.27} \approx 0.741$$</p>
<p>It gives us a rate of change:</p>
<p>$$y=0.741x + 0.12$$</p>
<p>Now let's approximate a lot:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756756879223/11d26af3-06ec-4419-b631-10308b4cadef.png" alt="Example of a function and choosing two points (B and C) to find the rate of change in point A. But B and C are closer to A, and we have to zoom in." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>To find the slope, we use the coordinates of the points B(0.7242549, 0.6625776) and C(0.7242884, 0.66260026):</p>
<p>$$\text{slope} = \frac{0.66260026- 0.6625776}{0.7242884- 0.7242549} = \frac{0.0000226}{0.0000335} = \frac{0.226}{0.335} \approx 0.674$$</p>
<p>Now let’s zoom out:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756757322888/a6f58b41-d6ff-44fd-b18f-06fb1f8f0e06.png" alt="Rate of change at point C" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>As we can see, we are so close that we can consider the limit of the rate of change to be 0.674.</p>
<p>It gives us the rate of change:</p>
<p>$$y=0.674x + 0.12$$</p>
<p><strong>This limit of the rate of change is called the derivative.</strong></p>
<p>To recap, here is an animation:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756766733257/a1754b47-7c57-4387-8b4c-886ed7b8f80a.gif" alt="GIF animation based on previous images" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here’s a Python code example that lets you find the derivative in point A:</p>
<pre><code class="language-python">import sympy as sp

x = sp.symbols('x')
f = sp.sin(x)

# Derivative of sin(x)
derivative_of_sin = sp.diff(f, x)

# Evaluate the derivative at x = 0.72
val = derivative_of_sin.subs(x, 0.72).evalf()

print("Derivative of sin(x) at x=0.72:", val)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756758436107/3bda58c5-96d6-4834-a2ec-ab8fedc4cb56.png" alt="Image of code example to find the derivative of the function sin(x)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The function that had the point A is called a sine wave.</p>
<p>We convert it to its derivative function. From there we have our rate of change at point 0.72.</p>
<p>When we do math by hand, <strong>we usually have many rules to convert a function to its derivative, and from these find the rate of change for a given point.</strong></p>
<p>Before seeing it, let’s look at a very simple example to understand the definition of a derivative:</p>
<p>$$\frac{d}{dx}f(x) \approx \frac{f(\textcolor{green}{x + h}) - f(\textcolor{red}{x - h})}{\textcolor{green}{x + h} - \textcolor{red}{x - h}} = \frac{f({x + h}) - f({x - h})}{2h}$$</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756767749954/87486d8c-9437-460c-b556-e9333b1590c5.png" alt="Image showing in derivative definition how each component is related visually to a line representing the rate of change" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><code>h</code> represents a small difference.</p>
<p>The derivative is the slope of the function’s small change near a point. In other words, it’s the limit of the rate of change of a given point.</p>
<p>A simple derivative rule (the power rule) looks like this:</p>
<p>$$\frac{d}{dx}x^n = nx^{n-1}$$</p>
<p>Two examples are:</p>
<p>$$\frac{d}{dx}x^3 = 3x^2$$</p>
<p>And:</p>
<p>$$\frac{d}{dx}x^5 = 5x^4$$</p>
<p>There are many more. But we won’t go into deep detail on this topic.</p>
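<p>The limit definition above can be turned directly into code. Here is a minimal numerical sketch of the central difference formula; the step size h = 1e-6 is my choice, not a value from the text:</p>
<pre><code class="language-python">import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with the central difference formula."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Check the power rule: d/dx x^3 = 3x^2, so the derivative at x=2 is 12
print(central_difference(lambda t: t**3, 2.0))

# And the sine example from earlier: d/dx sin(x) = cos(x)
print(central_difference(math.sin, 0.72), math.cos(0.72))
</code></pre>
<p>As h shrinks, the approximation approaches the true derivative, which is exactly the limit idea in action.</p>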
<h4 id="heading-where-and-why-are-derivatives-so-important">Where and why are derivatives so important?</h4>
<p>Derivatives are one of the most important math tools out there. They serve as the foundation for understanding change across nearly all fields of STEM.</p>
<p>In physics (classical mechanics), derivatives let us derive new information from quantities we already know.</p>
<p>For example, knowing how a body's position changes over time allows us to use derivatives to find its velocity and acceleration. This is crucial for self-driving cars, trains, rockets, and more.</p>
<p>Also, derivatives are the foundation of understanding how electricity works in depth. Without derivatives, there would’ve been no electromagnetic theory. Without electromagnetic theory, modern technology would not exist.</p>
<p>In machine learning, derivatives are so important that they underpin backpropagation, the training algorithm at the heart of ChatGPT and other AI models.</p>
<p>Neural networks are in fact so important that John Hopfield and Geoffrey Hinton won the 2024 Nobel Prize in Physics for their foundational work on them (Hinton is also one of the researchers who popularized backpropagation).</p>
<p>Also, autonomous vehicles like Tesla and Waymo use AI models called neural networks that depend on backpropagation to work.</p>
<p>It’s awesome that a math concept created in the 17th century is now one of the foundations of the current AI revolution.</p>
<h3 id="heading-what-about-integral-calculus">What About Integral Calculus?</h3>
<p>Before explaining integrals, I will ask you a question:</p>
<p>How can we find the area of the below shape?</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764401826343/2583b3b0-0bcd-4204-921e-300b27c9fc3d.png" alt="Image of a finite integral of the function sin(x)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In other words, how can we find the integral of the function over the given interval?</p>
<p>Let’s see how to do it step by step.</p>
<p>First, we’ll try using 2 rectangles to approximate the area behind the curve:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402058848/5023772e-ed0d-4efc-a5cd-3e1a856f6d69.png" alt="Using 2 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area of the rectangles is 6.282573.</p>
<p>But there is still a lot of error…</p>
<p>As we can see, the left rectangle does not completely cover the area under the curve, while the right rectangle covers too much.</p>
<p>So we’ll use more, smaller rectangles to better approximate the curve.</p>
<p>Now let’s try using 4 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764483444354/c06cd1c2-0f92-4728-898e-fbaf1534d57f.png" alt="Using 4 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.497481. But there’s still some error.</p>
<p>As we can see, the error is getting smaller. In other words, the 4 rectangles cover the area of the curve better than just the 2 rectangles. But there’s still a lot of room to make it better.</p>
<p>Let’s try using 8 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402069389/e9ad0576-dd9d-4535-bf3a-4c4bcd77db98.png" alt="Using 8 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.604935.</p>
<p>How about using 16 rectangles?</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402075078/6ad6278f-4b71-411b-8552-2554152a04cb.png" alt="Using 16 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.658662.</p>
<p>Let’s try using 32 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402079649/4e673391-7e7a-4ca3-b07a-22508c5b058e.png" alt="Using 32 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.685525.</p>
<p>Now how about using 64 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402084920/4851d710-ff9d-4562-ba7d-9b759473f577.png" alt="Using 64 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.698957.</p>
<p>And using 128 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402090280/bd5b139c-58e1-4a7a-869d-5107b7eff345.png" alt="Using 128 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.705673.</p>
<p>What about using 256 rectangles:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402098061/3ee50020-0143-42b1-aea7-8c762aa33e53.png" alt="Using 256 rectangles to try to find the area under the curve" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now the area is 6.709031. And the error has become practically zero!</p>
<p>Now let’s see an animation of this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764402052869/e9a54332-75b5-4e46-90cc-3bc09e636ad3.gif" alt="GIF animation of the rectangles from 2 to 256 to represent the finite integral" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>As you can see, we can approximate the area by taking the limit as the number of rectangles goes to infinity.</p>
<p>This way, we can conclude that:</p>
<p>$$\int_0^{3.14} f(x) \, dx = \int_0^{3.14} (\sin(x) + 1.5) \, dx \approx 6.71$$</p>
<p>This means that the area between 0 and 3.14, limited by the math equation, is 6.71!</p>
<p>Or, mathematically, the integral of f(x) in the interval 0 and 3.14 is 6.71.</p>
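<p>The rectangle experiment above is easy to reproduce in code. Here is a sketch using left-endpoint rectangles; the article's figures may place the rectangles slightly differently, so the intermediate numbers can differ a bit:</p>
<pre><code class="language-python">import math

def riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n left-endpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width

f = lambda x: math.sin(x) + 1.5

# Doubling the number of rectangles keeps shrinking the error
for n in [2, 4, 8, 16, 32, 64, 128, 256]:
    print(n, round(riemann_sum(f, 0, 3.14, n), 6))

# The exact answer is -cos(3.14) + 1.5 * 3.14 + cos(0), about 6.71
</code></pre>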
<h4 id="heading-where-and-how-is-this-applied">Where and how is this applied?</h4>
<p>In electrical engineering, integrals calculate total energy use in circuits by integrating power over time. For example, when designing a power supply for a device, engineers integrate the power to determine total energy costs and heat absorption requirements.</p>
<p>In other words, they see the area over time and how much power is used.</p>
<p>Let's see an example:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764832775180/911672dd-05ff-47c7-ac5f-81f4933c96ff.png" alt="Image of integral" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Imagine that in the image above:</p>
<ul>
<li><p>The X axis can be the time in months.</p>
</li>
<li><p>The Y axis is the power used in Watts (Joules per second).</p>
</li>
</ul>
<p>We can conclude that in 3.14 months (about 3 months and 4 days), the total amount of energy used is 6.71 watt-months.</p>
<p>Here is the code to find that out:</p>
<pre><code class="language-python"># Import libraries
import numpy as np
import matplotlib.pyplot as plt

# Create Function
x = np.linspace(0, 3.14, 100)
y = np.sin(x) + 1.5

# Find the area under the function
area = np.trapezoid(y, x)

# Show the final image
plt.fill_between(x, y)
plt.title(f'Area = {area:.2f}')
plt.show()
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765435075995/defc251b-812c-44ae-8b67-9a323c0af040.png" alt="Code to find finite integral of the function sin between two points" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, we import the libraries, create the function, find the area, and plot it.</p>
<p>We used numpy.trapezoid to find the area, because it’s a numerical approximation to quickly find the integral of a function between two x values.</p>
<p>numpy.trapezoid uses a numerical approximation method called the <strong>composite trapezoidal rule.</strong></p>
<p>The basic idea of the composite trapezoidal rule is to divide the area under the curve into many trapezoids and sum all of them.</p>
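<p>To make the idea concrete, here is a hand-rolled sketch of the composite trapezoidal rule. NumPy's real implementation is vectorized, and <code>np.trapezoid</code> requires NumPy 2.0 or newer (it was called <code>np.trapz</code> before):</p>
<pre><code class="language-python">import numpy as np

def trapezoid_rule(y, x):
    """Sum the areas of the trapezoids between consecutive sample points."""
    total = 0.0
    for i in range(len(x) - 1):
        width = x[i + 1] - x[i]
        average_height = (y[i] + y[i + 1]) / 2
        total += width * average_height
    return total

x = np.linspace(0, 3.14, 100)
y = np.sin(x) + 1.5

print(trapezoid_rule(y, x))   # hand-rolled result
print(np.trapezoid(y, x))     # NumPy's version gives the same answer
</code></pre>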
<p>If you want to learn more about this, I recommend reading the <a href="https://numpy.org/doc/stable/reference/generated/numpy.trapezoid.html">NumPy documentation on this method</a>.</p>
<p>From this value, we can convert to other units (taking an average month as about 30.44 days):</p>
<ul>
<li><p>About 17.6 million joules</p>
</li>
<li><p>About 4.9 kWh</p>
</li>
</ul>
<p>By converting to other units, we can more easily compare this device with other devices and see if it obeys any technical standards and laws.</p>
<p><strong>This is a real-life application of integrals in engineering.</strong></p>
<p>In my degree, I used this a lot in classes related to power engineering. In simple words, power engineering is a subfield of electrical engineering focused on high-voltage electricity and electric motors.</p>
<p>In audio compression, the Fourier transform (built on integrals) decomposes sound waves into frequency components. MP3 encoders use this to identify and remove frequencies humans can't hear. This reduces file sizes while preserving quality.</p>
<p>Medical imaging relies on the Radon transform, which uses integrals to reconstruct 3D images from 2D X-ray projections. When you get a CT scan, the machine takes hundreds of X-ray "slices" at different angles. During this process, integrals combine "slices" into a detailed cross-sectional image of your body.</p>
<h3 id="heading-applications-in-ai-and-control-theory-calculus-in-action">Applications in AI and Control Theory: Calculus in Action</h3>
<p>Modern AI depends on derivatives through the backpropagation algorithm.</p>
<p>When training a neural network, the system calculates partial derivatives of the error with respect to millions of parameters. This way, it finds out how to adjust each weight to improve performance. Without this, large language models like ChatGPT couldn't learn from data.</p>
<p>PID controllers, which stabilize the temperature in your oven or maintain altitude in aircraft autopilot systems, combine calculus ideas:</p>
<ul>
<li><p>The proportional term responds to the current error.</p>
</li>
<li><p>The integral term accumulates past errors to eliminate steady-state drift.</p>
</li>
<li><p>The derivative term predicts future trends to prevent overshooting.</p>
</li>
</ul>
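<p>To see how the three terms combine, here is a toy discrete-time PID loop in Python. The gains, the time step, and the plant response are made up purely for illustration:</p>
<pre><code class="language-python">def pid_step(error, state, kp=2.0, ki=0.5, kd=1.0, dt=0.1):
    """One update of a discrete PID controller; state is (integral, previous_error)."""
    integral, previous_error = state
    integral += error * dt                      # I: accumulate past errors
    derivative = (error - previous_error) / dt  # D: predict the trend
    output = kp * error + ki * integral + kd * derivative
    return output, (integral, error)

# Toy plant: push a temperature toward a setpoint of 100 degrees
temperature, setpoint = 20.0, 100.0
state = (0.0, setpoint - temperature)
for _ in range(200):
    error = setpoint - temperature
    control, state = pid_step(error, state)
    temperature += 0.05 * control  # made-up plant response

print(round(temperature, 1))
</code></pre>
<p>With these made-up numbers the temperature settles close to the setpoint; the point is only to show the proportional, integral, and derivative terms working together.</p>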
<p>And these are just some of the applications of calculus!</p>
<h2 id="heading-chapter-6-probability-amp-statistics-learning-from-uncertainty">Chapter 6: Probability &amp; Statistics - Learning from Uncertainty</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002445093/b606e188-969e-49d8-9be9-9c15330a2939.jpeg" alt="Many purple dice together" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/purple-dices-with-different-geometrical-shape-on-a-white-surface-3649115/">Photo by Armando Are</a></p>
<p>It’s thanks to probabilities and statistics that many industries have grown so much. With statistics, we can make informed decisions and optimize many different processes. With probabilities, we can understand and model uncertainty in systems and, in this way, solve or even avoid problems.</p>
<p>While you may be familiar with some of the key concepts like median and mean, we’ll start with some basics to build up your intuition on more advanced stuff like the central limit theorem, Bayes’ theorem, and Markov chains.</p>
<h3 id="heading-mean-median-mode-measuring-central-tendency">Mean, Median, Mode: Measuring Central Tendency</h3>
<p>Let's imagine you are a data scientist working in research. You’re going to work with data to optimize the output of farms in the Central Valley in California.</p>
<p>The idea is to take in a bunch of data, and by studying it, you can help farmers make better decisions.</p>
<p>Here’s the data from one year of activity:</p>
<table>
<thead>
<tr>
<th>Farm</th>
<th>Yield (tons/ha)</th>
<th>Fertilizer Used (kg/ha)</th>
<th>Rainfall (mm)</th>
</tr>
</thead>
<tbody><tr>
<td>A</td>
<td>4.2</td>
<td>150</td>
<td>280</td>
</tr>
<tr>
<td>B</td>
<td>5.8</td>
<td>220</td>
<td>420</td>
</tr>
<tr>
<td>C</td>
<td>3.9</td>
<td>120</td>
<td>230</td>
</tr>
<tr>
<td>D</td>
<td>6.1</td>
<td>250</td>
<td>480</td>
</tr>
<tr>
<td>E</td>
<td>4.7</td>
<td>200</td>
<td>340</td>
</tr>
<tr>
<td>F</td>
<td>5.3</td>
<td>200</td>
<td>390</td>
</tr>
</tbody></table>
<p>We have 6 farms in our dataset. For each farm, we know:</p>
<ul>
<li><p>How much yield was obtained in tons per hectare</p>
</li>
<li><p>How much fertilizer was used in kilograms per hectare</p>
</li>
<li><p>How much rainfall happened during a year of activity</p>
</li>
</ul>
<p>Now, let’s answer some questions we might have about the data to understand the <strong>mean</strong>, <strong>mode</strong> and <strong>median</strong>:</p>
<h4 id="heading-1-what-is-the-average-yield-during-one-year-of-activity">1. What is the average yield during one year of activity?</h4>
<p>To find the average, we just need to sum all the yield values and divide by the number of farms. Like this:</p>
<p>$$\text{Mean} = \frac{4.2 + 5.8 + 3.9 + 6.1 + 4.7 + 5.3}{6} = \frac{30}{6} = 5$$</p>
<p>This is what is called the mean. The mean is just the sum of all values divided by how many values there are.</p>
<p>In Python, we can do the following to calculate the mean:</p>
<pre><code class="language-python">def calculate_mean(values):
    return sum(values) / len(values)

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
result = calculate_mean(data)
print(f"Mean: {result}")
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763102054838/b5619d92-95ca-4c50-bb32-39d6e8e7ba7b.png" alt="Python code in an image showing how to find the mean" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-2-what-is-the-mode-of-fertilizer-used">2. What is the mode of fertilizer used?</h4>
<p>The mode is just the most popular value in a given dataset. In our case, it’s <strong>200</strong> since that’s the most common value that appears in our farm dataset.</p>
<p>In Python, we can do this to calculate the mode:</p>
<pre><code class="language-python">import statistics

def calculate_mode(values):
    return statistics.mode(values)

# Example usage
data = [150, 220, 120, 250, 200, 200]
result = calculate_mode(data)
print(f"Mode: {result}")
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763102576660/3ca71e03-f762-44ad-85c3-8ccb4cb1db54.png" alt="Python code in an image showing how to find the mode" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-3-what-is-the-median-of-the-yield">3. What is the median of the yield?</h4>
<p>The median is just the value in the middle of a set of numbers. If the number of elements in the list is even, we take the mean of the two middle numbers. Here are our current yield values:</p>
<p>$$4.2, 5.8, 3.9, 6.1, 4.7, 5.3$$</p>
<p>First, we sort the values:</p>
<p>$$3.9, 4.2, 4.7, 5.3, 5.8, 6.1$$</p>
<p>Since we have 6 values (even number), the median is the average of the two middle values:</p>
<p>$$\text{Median} = \frac{4.7 + 5.3}{2} = \frac{10}{2} = 5$$</p>
<p>In Python we can do this to calculate the median:</p>
<pre><code class="language-python">import statistics

def calculate_median(values):
    return statistics.median(values)

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
result = calculate_median(data)
print(f"Median: {result}")
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763102389405/52e5009b-6bc8-42c5-b8da-efe8c372fe96.png" alt="Python code in an image showing how to find the median" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-variance-and-standard-deviation-measuring-spread">Variance and Standard Deviation: Measuring Spread</h3>
<p>Knowing the mean, mode, and median of data is helpful. But it’s also important to know how far away data points are from each other.</p>
<p>That’s where measures of <a href="https://en.wikipedia.org/wiki/Statistical_dispersion">dispersion</a> come in. Variance tells us, on average, how far numbers are from the mean.</p>
<p>Let’s see an example of how to calculate this.</p>
<p>Given yield data from the table:</p>
<p>$$4.2, 5.8, 3.9, 6.1, 4.7, 5.3$$</p>
<p>The first step is to calculate the mean:</p>
<p>$$\bar{x} = \frac{4.2 + 5.8 + 3.9 + 6.1 + 4.7 + 5.3}{6} = \frac{30}{6} = 5$$</p>
<p>The second step is to calculate the variance with the sample variance formula:</p>
<p>$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$</p>
<p>Let's apply the formula little by little to understand how it works.</p>
<p>First, we calculate the squared difference between each yield data point and the mean:</p>
<p>$$\begin{align*} (4.2 - 5.0)^2 &amp;= (-0.8)^2 = 0.64 \\ (5.8 - 5.0)^2 &amp;= (0.8)^2 = 0.64 \\ (3.9 - 5.0)^2 &amp;= (-1.1)^2 = 1.21 \\ (6.1 - 5.0)^2 &amp;= (1.1)^2 = 1.21 \\ (4.7 - 5.0)^2 &amp;= (-0.3)^2 = 0.09 \\ (5.3 - 5.0)^2 &amp;= (0.3)^2 = 0.09 \end{align*}$$</p>
<p>Then we will sum all the squared differences:</p>
<p>$$\sum(x_i - \bar{x})^2 = 0.64 + 0.64 + 1.21 + 1.21 + 0.09 + 0.09 = 3.88$$</p>
<p>Now, we will finally find the variance:</p>
<p>$$s^2 = \frac{3.88}{6-1} = \frac{3.88}{5} = 0.776$$</p>
<p>The standard deviation is just the square root of the variance.</p>
<p>$$s = \sqrt{s^2} = \sqrt{0.776} \approx 0.881 \text{ tons/ha}$$</p>
<p>Why is this useful?</p>
<p>It puts the spread back into the same units as the data, making it easier to interpret.</p>
<p>A small standard deviation means the data huddles close to the mean, while a large one means it’s widely scattered.</p>
<p>And here is a code example of how to calculate both:</p>
<pre><code class="language-python">import statistics

def calculate_variance_and_std(values):
    variance = statistics.variance(values)
    std_dev = statistics.stdev(values)
    return variance, std_dev

# Example usage
data = [4.2, 5.8, 3.9, 6.1, 4.7, 5.3]
variance, std_dev = calculate_variance_and_std(data)
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763102806607/a8236667-e4b0-48a5-9171-544c4b94096e.png" alt="Python code in an image showing how to find the variance and standard deviation" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h3 id="heading-what-is-the-normal-distribution-the-bell-curve-of-life">What Is the Normal Distribution? The Bell Curve of Life</h3>
<p>The normal distribution tells us how data naturally converges around the average value. Most values cluster around the center, while extreme values fall toward the edges. This creates a bell curve.</p>
<p>By understanding this distribution, we can understand other distributions and also the central limit theorem.</p>
<p>To understand what normal distribution is, let’s look at it:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529094535/f90ffdb8-543e-4d1f-9627-335e8f356512.png" alt="Image representing the normal distribution" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The normal distribution looks like a mountain.</p>
<p>As you can see, most values are around the mean. Also, in and around the mean is the peak. Toward the extremes, the curve gets lower and lower. This means that in the extremes there are fewer and fewer values.</p>
<p>Normal distribution also has a formula associated with it:</p>
<p>$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$</p>
<p>I won’t go in depth into how the formula works here. I just want you to understand the main idea behind the concept.</p>
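<p>To make the formula less abstract, here is a minimal sketch that evaluates the density with Python's <code>math</code> module (the function name <code>normal_pdf</code> is just for illustration):</p>

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal distribution's density at x."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The density peaks at the mean and falls off toward the extremes
print(normal_pdf(0))  # highest value, at the mean
print(normal_pdf(2))  # much smaller, two units away
```

<p>Notice that the output for values far from the mean shrinks rapidly, which is exactly the bell shape in the figure above.</p>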
<p>There are many other distributions besides the normal distribution. Some of the most common are:</p>
<ul>
<li><p>Chi-squared distribution</p>
</li>
<li><p>Student’s t distribution</p>
</li>
<li><p>Bernoulli distribution</p>
</li>
<li><p>Binomial distribution</p>
</li>
<li><p>Poisson distribution</p>
</li>
</ul>
<p>Each distribution can model different events and phenomena. For example, the chi-squared distribution is widely used to test for an association between two phenomena (sunburns and skin cancer, for example).</p>
<p>The Poisson distribution is used to model counts of events, like the number of clients that enter a store per hour or the number of data packets transmitted over an Ethernet cable.</p>
<p>But it’s also possible to approximate a lot of distributions to the normal distribution using one of the most important theorems in all of mathematics: the central limit theorem. This is what we will explore next.</p>
<h3 id="heading-how-the-central-limit-theorem-helps-approximate-the-world">How the Central Limit Theorem Helps Approximate the World</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902263857/9a03bb38-a7b9-4ef0-93f2-a7e0d80bd249.jpeg" alt="Person holding a small version of the world in their hand" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/person-holding-world-globe-facing-mountain-346885/">Porapak Apichodilok</a></p>
<p>The main idea of the central limit theorem is very simple:</p>
<ul>
<li>The averages (and sums) of samples from most distributions tend toward the normal distribution as the sample size grows.</li>
</ul>
<p>This is just like pouring sand into a funnel. Grains may fall randomly, but over time the pile of sand will&nbsp;always begin to form the shape of a mountain.</p>
<p>This way, we can take many data points and average them. As we collect more and more of these averages, their distribution converges to a normal distribution.</p>
<p>In other words, when independent random variables are all summed together, their sum tends toward a normal distribution.</p>
<p>Here is the formula:</p>
<p>$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{or equivalently} \quad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$$</p>
<p>You don’t need to understand in depth what it means. Just understand that it’s a theorem that approximates other distributions to the normal distribution.</p>
<h4 id="heading-and-why-is-this-important">And why is this important?</h4>
<p>Because this theorem makes many billion-dollar industries possible.</p>
<p>Instead of testing every single possible scenario, we can test a smaller number of scenarios and infer that if it works for the smaller set, it will work for the larger one.</p>
<p>For example, in telecommunications, instead of testing every possible phone call or data transmission, we can just test a few connections. If it works for those few connections, we can assume it will work for millions of phone and data transmissions.</p>
<p>For clinical trials, instead of testing a drug on millions of people, we can just test a smaller number of patients. If it works for a (relative) few patients, we can assume it will work on most people with the same condition.</p>
<p>Without this idea, clinical trials would not be possible. The same with telecommunications and so many other areas of engineering.</p>
<h3 id="heading-bayes-theorem-learning-from-evidence">Bayes Theorem: Learning from Evidence</h3>
<p>Now we’ll start looking at probability more in depth based on the data table we have been using.</p>
<p>Here’s the table again so that you can reference it more easily:</p>
<table>
<thead>
<tr>
<th>Farm</th>
<th>Yield (tons/ha)</th>
<th>Fertilizer Used (kg/ha)</th>
<th>Rainfall (mm)</th>
</tr>
</thead>
<tbody><tr>
<td>A</td>
<td>4.2</td>
<td>150</td>
<td>280</td>
</tr>
<tr>
<td>B</td>
<td>5.8</td>
<td>220</td>
<td>420</td>
</tr>
<tr>
<td>C</td>
<td>3.9</td>
<td>120</td>
<td>230</td>
</tr>
<tr>
<td>D</td>
<td>6.1</td>
<td>250</td>
<td>480</td>
</tr>
<tr>
<td>E</td>
<td>4.7</td>
<td>200</td>
<td>340</td>
</tr>
<tr>
<td>F</td>
<td>5.3</td>
<td>200</td>
<td>390</td>
</tr>
</tbody></table>
<p>There are a lot of ideas and formulas related to probability. Here, I want to explain the core ones applied in AI and give you a high-level definition of each.</p>
<p>We’ll start with conditional probability, which is foundational to understanding Bayes’ theorem. Then we’ll get to the extended Bayes’ theorem formula.</p>
<p>So, let's get started!</p>
<h4 id="heading-what-is-conditional-probability">What is Conditional Probability?</h4>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766903189931/420cc60a-71cd-4c37-ab0a-f8aebe825ca7.jpeg" alt="Image of a person playing chess with the black pieces" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/black-and-yellow-chess-pieces-3830671/">KOUSHIK BALA</a></p>
<p>Conditional probability is the probability that an event will happen given that another event has already taken place.</p>
<p>Confused? Don't worry! Let's see an example:</p>
<p>Let’s say that:</p>
<ul>
<li><p>A = Farm has rainfall above or equal to 400 mm</p>
</li>
<li><p>B = Farm has a yield above or equal to 5.0 tons/ha</p>
</li>
</ul>
<p>Here is the formula for Conditional Probability:</p>
<p>$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$</p>
<p>Now let’s look at each part of this formula in detail:</p>
<p>$$P(A)$$</p>
<p>This represents the probability that a farm has rainfall above or equal to 400 mm.</p>
<p>We have 6 farms, and 2 of them (farm B and D) have a rainfall above or equal to 400 mm.</p>
<p>So, the probability that a farm has rainfall above or equal to 400 mm is:</p>
<p>$$P(A) = \frac {2}{6} = \frac {1}{3} ≈ 0.33$$</p>
<p>Now let’s see for event B:</p>
<p>$$P(B)$$</p>
<p>This represents the probability that a farm has a yield above or equal to 5.0 tons/ha.</p>
<p>We have 6 farms and 3 of them (farm B, D and F) have a yield above or equal to 5.0 tons/ha.</p>
<p>So, the probability that a farm has a yield above or equal to 5.0 tons/ha is:</p>
<p>$$P(B) = \frac {3}{6} = \frac {1}{2} = 0.5$$</p>
<p>What if we want the probability of both conditions being true at the same time?</p>
<p>$$P(A \cap B)$$</p>
<p>This refers to the probability of A and B being both true.</p>
<p>In our example, it means the probability that a farm both has rainfall above or equal to 400 mm <strong>and</strong> a yield above or equal to 5.0 tons/ha.</p>
<p>We have:</p>
<ul>
<li><p>6 farms, 2 of which (farms B and D) have rainfall above or equal to 400 mm</p>
</li>
<li><p>6 farms and 3 of them (farm B, D and F) have a yield above or equal to 5.0 tons/ha</p>
</li>
</ul>
<p>Only 2 farms (B and D) satisfy both conditions at once.</p>
<p>This way:</p>
<p>$$P(A \cap B) = \frac {2}{6} = \frac {1}{3} ≈ 0.33$$</p>
<p>Now we’re ready to find out the conditional probability:</p>
<p>$$P(A|B)$$</p>
<p>This means the probability of A, knowing that B is true.</p>
<p>In our example, we can conclude that:</p>
<p>$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.33}{0.5} = 0.66$$</p>
<p>So, the probability that a farm has rainfall above or equal to 400 mm – knowing that it has a yield above or equal to 5.0 tons/ha – is 0.66.</p>
<h4 id="heading-bayes-theorem">Bayes’ Theorem</h4>
<p>This is one of the most important theorems in mathematics.</p>
<p>Bayes’ theorem is a formula that tells us how to change the probability of a prediction when new verified data becomes available.</p>
<p>In other words, it’s like a rule that tells us how to update our beliefs when new evidence appears.</p>
<p>Now, based on what we already know, let’s see how Bayes’ Theorem works.</p>
<p>Here is its formula:</p>
<p>$$P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}$$</p>
<p>Now, based on the previous values, we can very easily find the probability of B, given that A is true.</p>
<p>In other words, the probability that a farm has a yield above or equal to 5.0 tons/ha given that it has rainfall above or equal to 400 mm.</p>
<p>Let’s find the answer:</p>
<p>$$P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)} = \frac{0.66 \cdot 0.5}{0.33} = 1$$</p>
<p>So, the probability that a farm has a yield above or equal to 5.0 tons/ha, knowing it rained 400 mm or more, is 100%. That makes sense: both farms with at least 400 mm of rainfall (farms B and D) also have yields of at least 5.0 tons/ha.</p>
<p>Now that we’ve gone through this formula step by step, hopefully it doesn’t feel as complex.</p>
<h4 id="heading-where-is-this-applied-in-real-life">Where is this applied in real life?</h4>
<p>As with many math ideas in this book, Bayes' Theorem has applications in many business sectors.</p>
<p>For example, what is the best way to make a control system for a self-driving car, robot, or really any other device?</p>
<p>One effective approach is to use a <a href="https://en.wikipedia.org/wiki/Kalman_filter">Kalman filter</a>. Kalman filters rely heavily on Bayes' Theorem to handle control systems with incomplete data.</p>
<p>Kalman filters have a lot of applications in engineering. For example, thanks to Kalman filters, commercial jets can fly safely on autopilot.</p>
<p>So as you can see, Bayes’ Theorem is the foundation of many control systems used in risky industries.</p>
<h3 id="heading-what-are-markov-models-predicting-the-next-step-one-step-at-a-time">What Are Markov Models? Predicting the Next Step, One Step at a Time</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902389612/c80d7118-f13d-4f9b-a149-861db3f2037d.jpeg" alt="Image of the hand of a person throwing dice into the air" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/person-about-to-catch-four-dices-1111597/">lil artsy</a></p>
<p>How do you predict the future with math? Markov chains allow you to do this to a certain degree.</p>
<p>For this reason, Markov chains are widely used in science, engineering, economics, and many other areas.</p>
<p>In addition to this, Markov decision processes are a very important foundation for reinforcement learning. Reinforcement learning is a branch of AI where agents learn to make decisions by interacting with an environment to maximize rewards.</p>
<p>In this section, I’ll introduce you to Markov chains and decision processes with an analogy, a plain English explanation, and a code example.</p>
<p>If you want to dive in further, I recommend my <a href="https://www.freecodecamp.org/news/what-is-a-markov-chain/">freeCodeCamp article on the subject</a>.</p>
<h4 id="heading-markov-chain-analogy">Markov Chain Analogy</h4>
<p>Imagine that you want to predict the weather tomorrow, and it <strong>only</strong> depends on the weather today. The weather can be either sunny or rainy.</p>
<p>Here are the probabilities:</p>
<ul>
<li><p>If it's sunny today, there's an 80% chance that it will be sunny again tomorrow, and a 20% chance that it will be rainy.</p>
</li>
<li><p>If it's rainy today, there's a 50% chance that it will be sunny tomorrow, and a 50% chance that it will be rainy.</p>
</li>
</ul>
<p>In this scenario, we can predict future states of the weather based on current states using probabilities.</p>
<p>This idea of predicting the future based solely on probabilities of the present is called a Markov chain.</p>
<p>Here, the states are either sunny or rainy and the probabilities describe the chances of the weather changing based on the current state.</p>
<h4 id="heading-markov-chain-explained-in-plain-english">Markov Chain Explained in Plain English</h4>
<p>A Markov chain describes random processes where systems move between states, and a new state only depends on the current state, not on how it got there.</p>
<p>Mathematically, Markov chains are called stochastic models because they model (simulate) real life events that are random by nature (stochastic).</p>
<p>Markov chains are popular because they are easy to implement and efficient at modeling complex systems.</p>
<p>Another key advantage is their "memoryless" property. This makes them fast to run on computers and powerful for studying random processes and making predictions based on current conditions.</p>
<h4 id="heading-applications-of-markov-chains">Applications of Markov Chains</h4>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902558494/8129d378-5cd8-4fdc-be48-8ba0a34181b7.jpeg" alt="Image of a white square with a dark star inside it, surrounded by many other dark squares" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/shapes-on-a-dark-background-25630338/">Google DeepMind</a></p>
<p>At some level, almost all real-life events are stochastic. In other words, they involve randomness and uncertainty.</p>
<p>This is exactly why they are so widely used.</p>
<p>They can predict the behavior of systems based on current conditions:</p>
<ul>
<li><p>In finance, they are used to model changes in credit ratings and to forecast market regimes.</p>
</li>
<li><p>In genetics, they help understand how proteins change over time (which is important when studying genetic variations).</p>
</li>
</ul>
<p>These real-life examples show how effectively Markov chains can be used to solve real problems in different fields.</p>
<p>In AI, Markov chains are used to model an environment like a factory or home. Modeling an environment with Markov chains is called a Markov decision process.</p>
<p>Using a Markov decision process, it’s possible to use reinforcement learning to create and optimize agents to act in the environment.</p>
<p>Of course, new and better variants of the Markov decision process have appeared over the years. But the key idea here is that it is thanks to Markov decision processes that the basis for reinforcement learning exists.</p>
<p>Reinforcement learning is widely used in advertising systems, logistics, robotics, video games, and many more applications.</p>
<h4 id="heading-types-of-markov-chains">Types of Markov Chains</h4>
<p>There are many types of Markov chains. In this section, we'll only discuss the most important variants.</p>
<ol>
<li>Discrete-Time Markov Chains (DTMCs)</li>
</ol>
<p>In DTMCs, the system changes state at specific time steps. They are called discrete because the state transitions occur at distinct, separate time intervals.</p>
<p>They are used in queuing theory (study of the behavior of waiting lines), genetics, and economics because they are simple to analyze.</p>
<ol start="2">
<li>Continuous-Time Markov Chains (CTMCs)</li>
</ol>
<p>CTMCs differ from DTMCs in that state transitions can occur at any continuous time point, not at fixed intervals.</p>
<p>This makes them stochastic models where state changes happen continuously. This is important in chemical reactions and reliability engineering.</p>
<ol start="3">
<li>Reversible Markov Chains</li>
</ol>
<p>Reversible Markov chains are special. The process of state change is the same whether the direction is forwards or backwards, like rewinding a video and playing it again.</p>
<p>This property makes it easier to tell when a system is stable and to study how it behaves over time. Reversible chains are widely used in statistical physics and economics.</p>
<ol start="4">
<li>Doubly Stochastic Markov Chains</li>
</ol>
<p>Doubly stochastic Markov chains are defined by a transition probability matrix. In the matrix, the sum of the probabilities in each row and each column equals 1.</p>
<p>This means each row and each column represent a valid probability distribution. In other words, each row and column represent a list of chances for different outcomes.</p>
<p>This property is crucial in quantum computing and statistical mechanics.</p>
<p>Thanks to doubly stochastic Markov chains, systems change in a way that preserves probabilities and symmetry, making the modeling and analysis of quantum computing systems far more accurate.</p>
<h4 id="heading-hidden-markov-chains-code-example">Hidden Markov Chains Code Example</h4>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766903059652/ad8c6509-87ae-4978-8b64-24146161d1cb.jpeg" alt="Image of glasses, a MAC computer, and blurry code in it" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Photo by <a href="https://www.pexels.com/photo/data-codes-through-eyeglasses-577585/">Kevin Ku</a></p>
<p>Before we jump into code examples, let’s first understand what Hidden Markov Chains are.</p>
<p>The main idea behind hidden Markov chains is to model systems that have hidden states (states whose values we can’t observe directly), which can only be discovered through observable events.</p>
<p>In other words, hidden Markov chains allow us to predict the behavior of a system by:</p>
<ul>
<li><p>Considering the likelihood of moving from one state to another.</p>
</li>
<li><p>Knowing the probability of observing a certain event from each state</p>
</li>
</ul>
<p>We can understand this by observing how the states change from an indirect point of view.</p>
<p>We may not know the states’ original values. But by knowing the way they change, we can predict what their values will be in the future.</p>
<p>This way, hidden Markov chains are flexible in modeling sequences, capturing both the transitions between hidden states and the observable outcomes.</p>
<p>Because of this, hidden Markov models are used in fields such as engineering, financial modeling, speech recognition, bioinformatics, and many more.</p>
<h4 id="heading-code-example">Code Example:</h4>
<p>In this code example, we’ll see a simple example with synthetic data.</p>
<p>Here is the full code:</p>
<pre><code class="language-python">import numpy as np
from hmmlearn import hmm

# Set random seed for reproducibility
np.random.seed(42)

# Define the HMM parameters
n_components = 2  # Number of states
n_features = 1    # Number of observation features

# Create a Gaussian HMM
model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag")

# Define initial state probabilities and the transition matrix (rows must sum to 1)
model.startprob_ = np.array([0.6, 0.4])
model.transmat_ = np.array([[0.7, 0.3],
                            [0.4, 0.6]])

# Define means and covariances for each state
model.means_ = np.array([[0.0], [3.0]])
model.covars_ = np.array([[0.5], [0.5]])

# Generate synthetic observation data
X, Z = model.sample(100)  # 100 samples

# Create a new HMM instance
new_model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag", n_iter=100)

# Fit the model to the data
new_model.fit(X)

# Print the learned parameters
print("Transition matrix:")
print(new_model.transmat_)
print("Means:")
print(new_model.means_)
print("Covariances:")
print(new_model.covars_)

# Predict the hidden states for the observed data
hidden_states = new_model.predict(X)

print("Hidden states:")
print(hidden_states)
</code></pre>
<img src="https://cdn-media-0.freecodecamp.org/2024/06/1.png" alt="Full code example of HMM (Hidden Markov Chain)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now let’s break the code down block by block:</p>
<p><strong>Import libraries and set random seed:</strong></p>
<pre><code class="language-python">import numpy as np
from hmmlearn import hmm

np.random.seed(42)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529887680/2440547e-ccf4-4067-83c2-20fafb16f045.png" alt="Code example of HMM (Hidden Markov Chain) - Import libraries and set random seed" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this block of code, we imported two Python libraries:</p>
<ul>
<li><p><a href="https://numpy.org/">NumPy</a>: For numerical operations.</p>
</li>
<li><p><a href="https://hmmlearn.readthedocs.io/en/latest/index.html">hmmlearn</a>: For hidden Markov model implementation.</p>
</li>
</ul>
<p>Next we defined a random seed with the NumPy library. A random seed is a value used to start a pseudorandom number generator.</p>
<p>With a fixed random seed, we can ensure that the sequence of pseudorandom numbers generated is always the same. This allows us to duplicate experiments and verify results.</p>
<p>The specific value of the seed doesn’t matter as long as it remains consistent.</p>
<p><strong>Define the HMM parameters and create a Gaussian HMM:</strong></p>
<pre><code class="language-python">n_components = 2  # Number of states
n_features = 1    # Number of observation features

model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag")
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529894398/094ac272-2788-4856-a984-b1f687464e90.png" alt="Code example of HMM (Hidden Markov Chain) - Define the HMM parameters and create a Gaussian HMM" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code block, we created an HMM with two hidden states and a single observed variable.</p>
<p><code>covariance_type="diag"</code> means the covariance matrices (which describe how variables change together) are diagonal. In other words, each observation feature is assumed to vary independently of the others, so the probability distribution of each feature is independent.</p>
<p>But there is still one thing that may seem strange about how we defined the hidden Markov chain:</p>
<p><strong>What does “Gaussian“ mean?</strong></p>
<p>This is a very big topic in statistics, but in a few words, Markov chains can only be created when we specify the transition probabilities (chances of moving from one state to another in a Markov chain) and an initial probability distribution.</p>
<p>A Gaussian HMM assumes the observations emitted from each hidden state follow a Gaussian distribution, also called a normal distribution!</p>
<p>And recall, we have already seen before what a normal distribution is.</p>
<p>Here it is again:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529107399/e51cb7a3-e751-45c7-8164-c07795ad32e1.png" alt="Code example of HMM (Hidden Markov Chain) - Image of normal distribution" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>From a normal distribution and other components, we can create a hidden Markov chain. And hidden Markov chains serve as a foundation for systems that affect millions of lives.</p>
<p><strong>Define transition matrix, means, and covariances for each state:</strong></p>
<pre><code class="language-python">model.startprob_ = np.array([0.6, 0.4])
model.transmat_ = np.array([[0.7, 0.3],
                            [0.4, 0.6]])

model.means_ = np.array([[0.0], [3.0]])
model.covars_ = np.array([[0.5], [0.5]])
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529901607/53442504-bcec-46d0-8114-fcd627947576.png" alt="Code example of HMM (Hidden Markov Chain) - Define transition matrix, means, and covariances for each state" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-python">model.startprob_ = np.array([0.6, 0.4])
</code></pre>
<p>This line sets the initial state probabilities for a Hidden Markov Model (HMM). It specifies that there is a 60% probability of starting in state 0 and a 40% probability of starting in state 1.</p>
<pre><code class="language-python">model.transmat_ = np.array([[0.7, 0.3], [0.4, 0.6]])
</code></pre>
<p>This line of code sets the state transition probability matrix for the HMM.</p>
<p>The matrix specifies the probabilities of moving from one state to another:</p>
<ul>
<li><p>From state 0, there is a 70% chance of staying in state 0 and a 30% chance of transitioning to state 1.</p>
</li>
<li><p>From state 1, there is a 40% chance of transitioning to state 0 and a 60% chance of staying in state 1.</p>
</li>
</ul>
<pre><code class="language-python">model.means_ = np.array([[0.0], [3.0]])
</code></pre>
<p>This line sets the mean values for the observation distributions in each state.</p>
<p>It indicates that the observations are normally distributed with a mean of 0.0 in state 0 and a mean of 3.0 in state 1.</p>
<pre><code class="language-python">model.covars_ = np.array([[0.5], [0.5]])
</code></pre>
<p>This line sets the covariance values for the observation distributions in each state.</p>
<p>It specifies that the variance (covariance in this 1-dimensional case) of the observations is 0.5 for both state 0 and state 1.</p>
<p><strong>Create data, new HMM instance, and fit the model with the data:</strong></p>
<pre><code class="language-python">X, Z = model.sample(100)  # 100 samples

new_model = hmm.GaussianHMM(n_components=n_components, covariance_type="diag", n_iter=100)

new_model.fit(X)

print("Transition matrix:")
print(new_model.transmat_)
print("Means:")
print(new_model.means_)
print("Covariances:")
print(new_model.covars_)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529906427/009804bc-40db-4979-99dd-564935b175cc.png" alt="Code example of HMM (Hidden Markov Chain) - Create data, new HMM instance, and fit the model with the data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, we created a model with 100 samples, iterated it 100 times, and printed the new state transition matrix, means, and covariances.</p>
<p>In other words, we:</p>
<ol>
<li><p>Generated 100 samples from the original model</p>
</li>
<li><p>Fitted a new HMM to these samples.</p>
</li>
<li><p>Printed the learned parameters of this new model.</p>
</li>
</ol>
<p>What do X and Z mean here?</p>
<p>X contains the observed data samples generated by the original model, while Z contains the corresponding sequence of hidden states.</p>
<p>The transition matrix prints out:</p>
<pre><code class="language-python">[[0.8100804  0.1899196 ]
 [0.49398918 0.50601082]]
</code></pre>
<p>This means that the model tends to stay in state 0 and has nearly equal chances of switching or staying when it is in state 1.</p>
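<p>As a quick side computation (my own sketch, not part of the original example), we can use this learned transition matrix to estimate the long-run fraction of time the chain spends in each state, by repeatedly applying the matrix to a starting distribution:</p>
<pre><code class="language-python">import numpy as np

# Transition matrix values copied from the printout above
T = np.array([[0.8100804, 0.1899196],
              [0.49398918, 0.50601082]])

# Power iteration: keep multiplying a distribution by T until it stabilizes
pi = np.array([0.5, 0.5])
for _ in range(1000):
    pi = pi @ T

print(pi)  # long-run share of time spent in state 0 and state 1
</code></pre>
<p>Here the chain spends roughly 72% of its time in state 0, which matches the intuition that the model tends to stay in state 0.</p>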
<p>The means print out:</p>
<pre><code class="language-python">[[0.01577373]
 [3.06245496]]
</code></pre>
<p>This means that the average observed value is approximately 0.016 in state 0 and 3.062 in state 1.</p>
<p>The covariances print out:</p>
<pre><code class="language-python">[[[0.41987084]]
 [[0.53146802]]]
</code></pre>
<p>This means that the variance of the observed values is about 0.420 in state 0 and 0.531 in state 1.</p>
<p>So while we may never observe the exact hidden states directly, we can learn each state’s average observed value, how much its observations vary, and how the states tend to change into each other.</p>
<p><strong>Predict the hidden states for the observed data:</strong></p>
<pre><code class="language-python">hidden_states = new_model.predict(X)

print("Hidden states:")
print(hidden_states)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763529913530/f81b3dbf-f517-4857-ac92-4732a524a621.png" alt="Code example of HMM (Hidden Markov Chain) - Predict the hidden states for the observed data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, based on the X observed data samples, we predicted the new states of the Markov model.</p>
<p>The hidden states print out:</p>
<pre><code class="language-python">[0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 1 0 1 0 0 0 1
 1 1 1 1 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0]
</code></pre>
<p>This means that the system alternates between state 0 and state 1, showing how it changes states over time.</p>
<h3 id="heading-applications-in-ai-and-control-theory-making-decisions-under-uncertainty"><strong>Applications in AI and Control Theory: Making Decisions Under Uncertainty</strong></h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002495967/325e5ee4-df14-4adc-a520-0764d89fe8c8.jpeg" alt="Image of many flight instruments in an airplane" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/gray-airplane-control-panel-3402846/">Photo by capt.sopon</a></p>
<p>I have been giving you a high-level overview of the field of probabilities and statistics. As I explained before, I wanted to make the explanations simple to understand.</p>
<p>As someone with a bachelor's degree in electrical and computer engineering, I can assure you that while this chapter seems simple, in probabilities and statistics, things can get very complicated very quickly.</p>
<p>Many more concepts like:</p>
<ul>
<li><p>p-values</p>
</li>
<li><p>Advanced Monte Carlo methods</p>
</li>
<li><p>Bayesian networks</p>
</li>
<li><p>Statistical hypotheses</p>
</li>
</ul>
<p>These are not as straightforward as the ideas I’ve just walked you through.</p>
<p>But as it is, probability and statistics are the starting points for making decisions where uncertainty exists in AI and control theory.</p>
<p>For example, Bayes’ theorem, besides being the foundation of the Kalman filter, is also the foundation of many probabilistic models in AI. Probabilistic models are commonly used by quant firms and banks to model risk.</p>
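<p>To make this concrete, here is a minimal sketch of Bayes’ theorem in action. The scenario and every number in it are made up purely for illustration:</p>
<pre><code class="language-python"># Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical risk question: given that a borrower missed a payment,
# how likely are they to default?
p_default = 0.05                # prior: 5% of borrowers default
p_missed_given_default = 0.80   # defaulters usually miss a payment first
p_missed_given_ok = 0.10        # non-defaulters occasionally miss one too

# Total probability of seeing a missed payment (law of total probability)
p_missed = (p_missed_given_default * p_default
            + p_missed_given_ok * (1 - p_default))

# Posterior: updated belief about default after seeing the missed payment
p_default_given_missed = p_missed_given_default * p_default / p_missed
print(round(p_default_given_missed, 3))  # 0.296
</code></pre>
<p>One missed payment moves the belief from a 5% prior to roughly a 30% posterior. That prior-to-posterior update is exactly the pattern the Kalman filter repeats at every time step.</p>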
<p>In control theory, probabilities and statistics are widely used to design robust control systems (as is the case with Kalman filters).</p>
<p>So as you can see, the application of probabilities and statistics, as with calculus and linear algebra, is the foundation for many tools that impact millions of lives and move billions of dollars in the global economy.</p>
<h2 id="heading-chapter-7-optimization-theory-teaching-machines-to-improve">Chapter 7: Optimization Theory - Teaching Machines to Improve</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002637327/9dea740c-4582-42bf-95a6-1230b7e9092d.jpeg" alt="Black and white image of many railways originating from a single one" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/railroad-tracks-in-city-258510/">Photo by Pixabay</a></p>
<p>This is the most advanced math chapter of the book. To truly understand it, it’s very important that you’ve read the other chapters first.</p>
<p>We’re going to examine a few machine learning methods, and I’ll show you some recipes demonstrating how machine learning is just the combination of linear algebra, calculus, probability and statistics, and optimization theory.</p>
<p>Just like making a cake!</p>
<h3 id="heading-what-is-optimization-theory">What is Optimization Theory?</h3>
<p>In AI, optimization theory is responsible for the algorithms that optimize data-driven AI models.</p>
<p>Often, big companies invest millions in research to create or refine algorithms that make training AI models faster.</p>
<p>This way, companies save far more money than the upfront research costs when scaling to train multiple large AI models.</p>
<p>It is thanks to optimization theory that deep learning was able to scale efficiently, eventually leading to the creation of ChatGPT and many other large language models.</p>
<p><strong>But why is that?</strong></p>
<p>In all data-driven machine learning models, there is a learning phase that has to happen. That is, there’s a period where the algorithms make predictions that are not correct and then need to change some parameters to make sure the next predictions are correct – or at least closer to being correct.</p>
<p>Without optimization, machine learning algorithms never make progress toward the right solution. They waste time on a learning path that won’t improve their ability to make correct predictions.</p>
<p>So, let’s start learning!</p>
<h3 id="heading-why-optimization-drives-learning-in-ai">Why Optimization Drives Learning in AI</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766903297889/4075d065-9b55-42e2-a6f6-8aae02de940f.jpeg" alt="Image of a very cute white robot" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/high-angle-photo-of-robot-2599244/">Photo by Alex Knight</a></p>
<p>Optimization theory is the mathematical foundation that allows algorithms to improve their performance over many iterations.</p>
<p>When we combine an algorithm with a way to change its parameters to meet a certain objective (an optimization method), we get a machine learning algorithm.</p>
<p>This learning process always involves minimizing or maximizing a certain objective. For many machine learning algorithms, the main objective is to minimize errors. To do this, over many iterations, the optimization method "tells" the internal components of the algorithm what to change after receiving feedback on how well it’s performing.</p>
<p>It’s like someone first learning how to drive a car. The first few times, it may be complicated. But after a while and some practice, the driver learns how to drive properly and not make the same mistakes they once did in the past with the help of the instructor.</p>
<p>The same applies to optimization methods when optimizing algorithms.</p>
<h4 id="heading-types-of-optimization-theory-methods-in-ml-and-deep-learning">Types of Optimization Theory Methods in ML and Deep Learning</h4>
<p>The field of optimization theory is huge! Just as with many fields of mathematics, it is constantly growing every year.</p>
<p>But for the purposes of this book, there are three main categories of optimization methods:</p>
<ol>
<li><strong>First-Order Methods</strong></li>
</ol>
<p>These are the most used in deep learning and in all LLMs like Gemini, Grok, and others.</p>
<p>They are called first-order methods because they all use the first derivative of functions. The first derivative of a function measures how much a function's output changes when its input changes very little. The most widely used in deep learning are advanced variants of gradient descent.</p>
<p>While there are many variants, here are some popular examples:</p>
<ul>
<li><p>Standard batch gradient descent</p>
</li>
<li><p>Stochastic gradient descent</p>
</li>
<li><p>Mini-batch gradient descent</p>
</li>
<li><p>RMSprop</p>
</li>
<li><p><strong>Adam</strong></p>
</li>
</ul>
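<p>All of these variants build on the same basic update rule: move each parameter a small step against its derivative. Here is a minimal sketch (a toy example of my own, not taken from any library) of plain gradient descent minimizing a one-variable function:</p>
<pre><code class="language-python"># Minimize f(x) = (x - 3)**2, whose minimum is at x = 3.
# The first derivative f'(x) = 2 * (x - 3) gives the direction of change.
x = 0.0
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (x - 3)            # first derivative at the current x
    x = x - learning_rate * gradient  # step against the gradient

print(round(x, 4))  # 3.0
</code></pre>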
<p>In this chapter, we will look in depth at one of these methods called <strong>Adam</strong> (below).</p>
<ol start="2">
<li><strong>Second-Order Methods</strong></li>
</ol>
<p>They are called second-order methods because they use information from second derivatives for better updates. There are many methods, like:</p>
<ul>
<li><p>BFGS</p>
</li>
<li><p>L-BFGS</p>
</li>
<li><p>Newton's method</p>
</li>
</ul>
<p>But these are not used as often in machine learning and deep learning. While they converge in fewer iterations, they are very computationally expensive for the kind of high-dimensional optimization problems that AI algorithms create.</p>
<p>So they’re not widely used like first-order optimization methods.</p>
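<p>To see why second-order methods need fewer iterations, here is a sketch (again a toy example of mine) of Newton’s method on the same kind of one-variable function. Because it also uses the second derivative, it lands on the minimum of a quadratic in a single step:</p>
<pre><code class="language-python"># Minimize f(x) = (x - 3)**2 with Newton's method:
# update rule: x = x - f'(x) / f''(x)
x = 0.0
first_derivative = 2 * (x - 3)   # f'(x) at the current point
second_derivative = 2.0          # f''(x), constant for a quadratic

x = x - first_derivative / second_derivative
print(x)  # 3.0
</code></pre>
<p>The catch: for a model with millions of parameters, the "second derivative" becomes a huge matrix (the Hessian), and computing or inverting it is what makes these methods so expensive.</p>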
<ol start="3">
<li><strong>Zeroth-Order and Other Methods</strong></li>
</ol>
<p>These methods do not require derivatives to optimize algorithms. Some examples of algorithms where derivatives are not used are:</p>
<ul>
<li><p>Genetic algorithms</p>
</li>
<li><p>Dynamic programming algorithms</p>
</li>
<li><p>Particle swarm optimization methods</p>
</li>
</ul>
<p>The problem with these algorithms is that they are often very slow for many variables.</p>
<p>But in certain AI contexts, they can help optimize the architecture of deep learning models to improve AI models from an architectural point of view (instead of a parameter point of view).</p>
<h4 id="heading-how-does-optimization-theory-connect-with-linear-algebra-calculus-and-probability-and-statistics">How does optimization theory connect with linear algebra, calculus, and probability and statistics?</h4>
<p>Essentially:</p>
<ul>
<li><p>Calculus teaches you derivatives, which help you understand optimization theory.</p>
</li>
<li><p>Linear algebra teaches you matrices, which help you understand how different states relate and transform.</p>
</li>
<li><p>Probability and statistics teach you concepts like covariance and correlation, which help you understand how variables are connected with each other.</p>
</li>
</ul>
<p>This way, with linear algebra and probability and statistics, you gain the knowledge necessary to understand the algorithms. With calculus you gain the basis to understand optimization theory and how it changes certain parameters of the fundamental algorithms to minimize/maximize a certain objective.</p>
<h3 id="heading-simple-optimization-techniques-how-machines-learn-step-by-step">Simple Optimization Techniques: How Machines Learn Step by Step</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002727335/a265939c-dea8-4763-8861-7c7a0dbe1081.jpeg" alt="Image of a Star Wars blue and white robot" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/star-wars-r2-d2-2085831/">Photo by LJ Checo</a></p>
<p>Now, we’re going to see examples of machine learning algorithms used for optimization and deconstruct them so that you can understand how these areas of mathematics apply to them.</p>
<p>In each example, I will explain their main idea with an analogy as well as how each math area is used in each algorithm.</p>
<h4 id="heading-linear-regression">Linear Regression</h4>
<p>Imagine that you are solving a puzzle. To complete the puzzle, you need to arrange the pieces in the right design/order.</p>
<p>The same idea applies to linear regression.</p>
<p>We have matrices (linear algebra) that represent the parameters of the linear regression model and the data that flow into it.</p>
<p>And we can see over time how well the line is fitting the numbers, as well as its error (probabilities and statistics).</p>
<p>To find the best line for the linear regression, we need to know how much the parameters of the model need to change (calculus) and actually apply that change to the parameters (optimization theory).</p>
<p>This way, calculus tells us which direction to change the parameters, and optimization theory tells us how much to actually change them.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764295886800/0c5efd95-9368-4b68-b945-ff911632ca4c.gif" alt="GIF animation of linear regression working over many iterations" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Let’s see how to code the linear regression above:</p>
<pre><code class="language-python">import numpy as np

np.random.seed(42)
X = np.linspace(0, 10, 50)
y_true = 3 * X + 2
noise = np.random.normal(0, 2, 50)
y = y_true + noise

w = 0.1 
b = 0.5
learning_rate = 0.01
iterations = [0, 1, 2, 3, 4, 5]
saved_states = []

for epoch in range(max(iterations) + 1):
    y_pred = w * X + b
    error = np.mean((y - y_pred) ** 2)
    
    if epoch in iterations:
        saved_states.append({
            'epoch': epoch,
            'w': w,
            'b': b,
            'y_pred': y_pred.copy(),
            'error': error
        })
    
    dw = -2 * np.mean(X * (y - y_pred))
    db = -2 * np.mean(y - y_pred)
    
    w = w - learning_rate * dw
    b = b - learning_rate * db
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335029715/f77be0d9-ea3d-48f1-8cb5-f4806d1295e6.png" alt="Linear regression code example - full code example" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Let’s see the code block by block:</p>
<p><strong>Import library:</strong></p>
<pre><code class="language-python">import numpy as np
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335026504/94989760-bb16-4469-947e-eba7bd25b5be.png" alt="Linear regression code example - Import library" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>For this problem, we’ll import one of the most used Python libraries: NumPy (which we’ve worked with earlier in the book).</p>
<p><strong>Create data points:</strong></p>
<pre><code class="language-python">np.random.seed(42)
X = np.linspace(0, 10, 50)
y_true = 3 * X + 2
noise = np.random.normal(0, 2, 50)
y = y_true + noise
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335038511/59e01c3d-27bf-4e6c-8500-9178f1ff569f.png" alt="Linear regression code example - Create data points" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this code, we define a base line that will help in generating the data points:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765336338665/caa859d0-92cb-424e-8eb2-292093c24355.png" alt="Linear regression code example - image of green base line that will help in generating the data points" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-python">X = np.linspace(0, 10, 50)
y_true = 3 * X + 2
</code></pre>
<p>After this green line has been created, we will add noise to it to create the data points:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765336395290/80849617-9489-471d-88f6-fb2aaea5b385.png" alt="Linear regression code example - image of a green baseline that will help in generating the data points with blue dots added by introduced noise" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<pre><code class="language-python">noise = np.random.normal(0, 2, 50)
y = y_true + noise
</code></pre>
<p>This is how we defined the data points for the line dataset.</p>
<p><strong>Initializing linear regression parameters and others:</strong></p>
<pre><code class="language-python">w = 0.1 
b = 0.5
learning_rate = 0.01
iterations = [0, 1, 2, 3, 4, 5]
saved_states = []
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335044810/72a775ee-9929-488d-b05e-ab5d32d6b031.png" alt="Linear regression code example - Initializing linear regression parameters and others" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this block of code, we initialize:</p>
<ul>
<li><p>Linear regression parameters: Weight to be 0.1 and bias to be 0.5</p>
</li>
<li><p>One hyperparameter: Learning rate</p>
</li>
<li><p>How many iterations we are going to use to improve the linear regression</p>
</li>
<li><p>An array called saved_states to store values to later create graphs</p>
</li>
</ul>
<p>This way, we start with this red line:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765336283612/d7bb34b5-aefc-4565-bed2-d2819bc449df.png" alt="Linear regression code example - initializing linear regression parameters and line to fit data points starting with near zero slope" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Making the linear regression learn with the data:</strong></p>
<pre><code class="language-python">for epoch in range(max(iterations) + 1):
    y_pred = w * X + b
    error = np.mean((y - y_pred) ** 2)
    
    if epoch in iterations:
        saved_states.append({
            'epoch': epoch,
            'w': w,
            'b': b,
            'y_pred': y_pred.copy(),
            'error': error
        })
    
    dw = -2 * np.mean(X * (y - y_pred))
    db = -2 * np.mean(y - y_pred)
    
    w = w - learning_rate * dw
    b = b - learning_rate * db
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335055978/2395671a-d873-4bd1-bfa0-349cc6c7be65.png" alt="Linear regression code example - Making the linear regression learn with the data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>It may appear complicated, but let’s look at it in smaller blocks:</p>
<ul>
<li>For loop</li>
</ul>
<pre><code class="language-python">for epoch in range(max(iterations) + 1):
</code></pre>
<ul>
<li>Making a prediction and checking its error</li>
</ul>
<pre><code class="language-python">y_pred = w * X + b
error = np.mean((y - y_pred) ** 2)
</code></pre>
<p>In this block of the code, we compute the values predicted with the current parameters and measure their error against the real values.</p>
<ul>
<li>Saving current iteration values for future statistics</li>
</ul>
<pre><code class="language-python">if epoch in iterations:
     saved_states.append({
         'epoch': epoch,
         'w': w,
         'b': b,
         'y_pred': y_pred.copy(),
         'error': error
     })
</code></pre>
<p>Here we are just storing the values of the current iteration in the saved_states array so we can generate images later.</p>
<ul>
<li>Finding the gradients</li>
</ul>
<pre><code class="language-python">dw = -2 * np.mean(X * (y - y_pred))
db = -2 * np.mean(y - y_pred)
</code></pre>
<p>In this block of code, we compute the gradient values for the current prediction.</p>
<p>In other words, for the weight and the bias, we find out how much (and in which direction) they need to change to bring the model closer to the data points.</p>
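<p>If you ever want to convince yourself that these gradient formulas are right, a common trick is to compare them against a numerical finite difference. This is a quick sanity check of my own, reusing the same data setup as above:</p>
<pre><code class="language-python">import numpy as np

np.random.seed(42)
X = np.linspace(0, 10, 50)
y = 3 * X + 2 + np.random.normal(0, 2, 50)
w, b = 0.1, 0.5

def mse(w, b):
    # mean squared error for a given weight and bias
    return np.mean((y - (w * X + b)) ** 2)

# Analytic gradient with respect to w (same formula as in the code above)
dw = -2 * np.mean(X * (y - (w * X + b)))

# Numerical gradient: nudge w a tiny bit and measure how the error changes
eps = 1e-6
dw_numeric = (mse(w + eps, b) - mse(w - eps, b)) / (2 * eps)

print(abs(dw - dw_numeric))  # tiny: both gradients agree
</code></pre>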
<ul>
<li>Updating the parameters values</li>
</ul>
<pre><code class="language-python">w = w - learning_rate * dw
b = b - learning_rate * db
</code></pre>
<p>Finally, we update the weight and the bias with the new values so that the line better approximates the data points:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765335279159/97e4914a-ed8a-4cf7-8155-e7cde0fa7edd.gif" alt="GIF animation of linear regression working over many iterations" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h4 id="heading-neural-networks">Neural Networks</h4>
<p>The same puzzle idea applies to neural networks. Neural networks are algorithmic models inspired by the brain that learn patterns from data. They are part of a machine learning field called deep learning, which uses neural networks to learn complex patterns.</p>
<p>Neural networks are important because they power modern AI applications like:</p>
<ul>
<li><p>Image recognition</p>
</li>
<li><p>Language translation</p>
</li>
<li><p>Chatbots</p>
</li>
</ul>
<p>For example, ChatGPT stands for Chat Generative Pre-trained Transformer. A transformer is a neural network architecture.</p>
<p>If you understand neural networks, you’ll understand the foundations that make ChatGPT work.</p>
<ul>
<li><p>We have matrices (linear algebra) that represent the parameters of the neural network model and the data that flow into it.</p>
</li>
<li><p>And we can know over time how well the neural network model is converging to the dataset, fitting the numbers, and see its error (probabilities and statistics).</p>
</li>
<li><p>Calculus will tell us in which direction the parameters of the neural network need to change.</p>
</li>
<li><p>Optimization theory will tell us how much they need to change.</p>
</li>
</ul>
<p>For example, this is a neural network:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764296443948/e1f46e04-d508-407c-8da6-de8e267a2ba7.png" alt="Image example of a simple neural network" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This model has in total 13 parameters:</p>
<ul>
<li><p>It has 10 lines (connections between circles). These are called weights.</p>
</li>
<li><p>It has 2 circles in the hidden layer and 1 in the output layer. Each circle has one bias.</p>
</li>
</ul>
<p><strong>Big question:</strong></p>
<p>Imagine you work in a bank. You are in charge of deciding who gets credit cards or not. For that, you create the neural network above that takes 4 inputs:</p>
<ul>
<li><p>Income</p>
</li>
<li><p>Credit score</p>
</li>
<li><p>Debt ratio</p>
</li>
<li><p>Bankruptcy history</p>
</li>
</ul>
<p>With this neural network well optimized, you can figure it out!</p>
<p>Very simply, without going into things like activation functions, the network processes the 4 inputs through its weights and biases.</p>
<p>Each connection multiplies the input by its weight. After that, each node adds its bias.</p>
<p>The final output is a number between 0 and 1:</p>
<ul>
<li><p>Numbers close to 0 mean "Not approved"</p>
</li>
<li><p>Numbers close to 1 mean "Approved"</p>
</li>
</ul>
<p>For example, a high income, a good credit score, and no bankruptcy history flow through the neural network and produce 0.92, meaning the application should be approved.</p>
<p>But a low income combined with a history of bankruptcy may produce 0.15, which results in a rejection.</p>
<p>In reality, banking systems use neural networks that take far more well-chosen inputs and make these decisions automatically.</p>
<p>This is precisely how AI can be used for credit approval.</p>
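<p>Here is a small sketch of the forward pass just described, for the 13-parameter network (4 inputs, 2 hidden nodes, 1 output). All the weight, bias, and input values below are made up for illustration; a trained network would learn them from data. I also include ReLU and sigmoid activations, which the simplified description above skipped, so the output actually lands between 0 and 1:</p>
<pre><code class="language-python">import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1 = np.array([[0.5, -0.3],
               [0.8,  0.2],
               [-0.6, 0.4],
               [-1.0, 0.7]])    # 8 weights: input layer to hidden layer
b1 = np.array([0.1, -0.2])      # 2 biases, one per hidden node
W2 = np.array([[1.2], [-0.9]])  # 2 weights: hidden layer to output
b2 = np.array([0.05])           # 1 bias for the output node

# One applicant: [income, credit score, debt ratio, bankruptcy history],
# already scaled down to small numbers
applicant = np.array([1.5, 1.2, 0.3, 0.0])

hidden = np.maximum(0, applicant @ W1 + b1)  # weights, then biases, then ReLU
score = sigmoid(hidden @ W2 + b2)            # squash to a value between 0 and 1

print(score)  # a single approval score between 0 and 1
</code></pre>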
<p>But a question remains: What is the best way to know how much the parameters need to change?</p>
<p>In the next part, we are going to see the most famous optimization theory algorithm that will help us decide that.</p>
<h3 id="heading-what-is-adam-the-most-popular-way-ai-models-finds-the-best-learning-path">What is Adam? The Most Popular Way AI Models Find the Best Learning Path</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902926221/0b6fbbee-dfda-4a55-bd5d-21215ea33074.jpeg" alt="Image of a mountain" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/green-leafed-trees-during-fog-time-167684/">Photo by Lum3n</a></p>
<p>To optimize neural network based AI models, one of the most popular methods is called Adam, which stands for Adaptive Moment Estimation.</p>
<p>The paper that introduced the method is one of the most influential in the 21st century in machine learning, with thousands of citations. As with all ideas in non-symbolic AI, Adam is a mixture of different math concepts.</p>
<p>It's composed of the ideas of two other optimization methods:</p>
<ul>
<li><p>Momentum Gradient Descent: Accumulates velocity from previous gradients to move faster in consistent directions</p>
</li>
<li><p>Root Mean Square Propagation (RMSProp): Adapts learning rates based on recent gradient magnitudes</p>
</li>
</ul>
<p><strong>Let's understand them with an analogy.</strong></p>
<p>Imagine that you are riding a bicycle down a mountain little by little. You already know the direction thanks to calculus.</p>
<p>But how do you descend safely without losing control or going too slowly?</p>
<p>First, you need to build up speed gradually using past momentum. This is one of the main ideas of momentum gradient descent.</p>
<p>It's also important that you adjust your speed based on the terrain's elevation. This is the main idea of RMSProp.</p>
<p>This way, you can safely accelerate and brake appropriately.</p>
<p>When optimizing a model with Adam, this is the same concept. With Adam, we want to optimize a model in a fast and stable way.</p>
<p>The momentum gradient descent ensures the fast part, and the RMSProp ensures the secure part.</p>
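<p>To make the analogy concrete, here is a minimal sketch of the Adam update rule itself, written from scratch on a one-variable toy problem. The formulas follow the standard Adam update; the function and settings are my own choices for illustration:</p>
<pre><code class="language-python">import math

# Minimize f(x) = (x - 3)**2 with a hand-written Adam loop
alpha = 0.1     # learning rate
beta1 = 0.9     # decay for the momentum term
beta2 = 0.999   # decay for the squared-gradient term
eps = 1e-8      # avoids division by zero

x = 0.0
m = 0.0  # momentum: moving average of gradients (the "speed")
v = 0.0  # RMSProp part: moving average of squared gradients (the "terrain")

for t in range(1, 1001):
    g = 2 * (x - 3)                       # gradient of f at the current x
    m = beta1 * m + (1 - beta1) * g       # build up speed gradually
    v = beta2 * v + (1 - beta2) * g ** 2  # track how steep the terrain is
    m_hat = m / (1 - beta1 ** t)          # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    x = x - alpha * m_hat / (math.sqrt(v_hat) + eps)

print(x)  # settles near the minimum at x = 3
</code></pre>
<p>Notice the two moving averages: m is the momentum gradient descent idea, and v is the RMSProp idea. Dividing one by the square root of the other is what lets Adam accelerate on consistent slopes and brake on rough ones.</p>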
<p>Nowadays, for LLMs (which, once again, are just very big neural network models), a variant of Adam called AdamW is more often used.</p>
<p>Now, let's build a code example of using Adam.</p>
<h4 id="heading-code-example">Code example:</h4>
<p>Using Adam, we are going to optimize this neural network based on fake data.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765148552889/28101efb-529f-4828-bb7e-adfbf5202d7f.png" alt="Image of a neural network" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>It will take 4 features:</p>
<ul>
<li><p>Income</p>
</li>
<li><p>Credit score</p>
</li>
<li><p>Debt ratio</p>
</li>
<li><p>Bankruptcy history</p>
</li>
</ul>
<p>And it will tell us if we should or should not approve credit for a given person.</p>
<p>Also, since this book is an introduction to the math of AI, I will not, in this code example, discuss hyperparameter optimization, regularization techniques, and other more advanced topics and good practices.</p>
<p>I want to show why this neural network fails with this data and explain the importance of using great data.</p>
<p>Here is the whole code (and we’ll see each part more in-depth below):</p>
<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, random_split
import pytorch_lightning as pl
import matplotlib.pyplot as plt

torch.manual_seed(42)
x = torch.randn(10000, 4)
y = torch.randint(0, 2, (10000, 1)).float()
dataset = TensorDataset(x, y)

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

class CreditApprovalNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []
    
    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss
    
    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)

model = CreditApprovalNet()
trainer = pl.Trainer(max_epochs=100, logger=False, enable_checkpointing=False)
trainer.fit(model, train_loader, val_loader)

# Plot the training loss over time
plt.plot(model.train_losses)
plt.xlabel('Training Step')
plt.ylabel('Loss')
plt.title('Credit Approval Training')
plt.grid(True, alpha=0.3)
plt.show()
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765150336432/8bb2eab8-60a1-4a01-babf-1b5b11d9187a.png" alt="Code example of training a neural network - Full code" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Now let’s break it down:</p>
<p><strong>Importing libraries:</strong></p>
<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader, random_split
import pytorch_lightning as pl
import matplotlib.pyplot as plt
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151014087/80097a4b-6bf2-4af0-94da-7f929cf35d2c.png" alt="Code example of training a neural network - Importing libraries" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this block of code, we are importing code from 3 Python libraries:</p>
<ul>
<li><p><a href="https://pytorch.org/">PyTorch</a>: One of the most popular Python libraries for creating new AI models in AI research</p>
</li>
<li><p><a href="https://lightning.ai/docs/pytorch/stable/">PyTorch Lightning</a>: A PyTorch wrapper that organizes training code and handles repetitive tasks automatically</p>
</li>
<li><p><a href="https://matplotlib.org/">Matplotlib</a>: One of the most popular Python libraries for creating graphs from data</p>
</li>
</ul>
<p><strong>Creating data:</strong></p>
<pre><code class="language-python">torch.manual_seed(42)
x = torch.randn(10000, 4)
y = torch.randint(0, 2, (10000, 1)).float()
dataset = TensorDataset(x, y)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151040691/a2405e15-8ed0-4988-8b78-724f1bd60347.png" alt="Code example of training a neural network - creating data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this part, we define a seed to make the random numbers reproducible. In other words, when we run the code many times, the same random numbers will be generated.</p>
<p>Next, we create 10,000 credit applications with 4 features in x and their approval decisions in y. After that, we unify everything in the dataset variable.</p>
<p>We’ll use TensorDataset because it allows us to have the 4 features and the target paired together. This way, the data does not get mixed up during training.</p>
<p><strong>Dividing data:</strong></p>
<pre><code class="language-python">train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151063358/8325f2eb-3cf9-4900-909d-545637e20608.png" alt="Code example of training a neural network - Dividing data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this block of code, we divide the data into a training dataset and a validation dataset.</p>
<p>This way, we have one dataset that’s being used to train and find the parameters while comparing results with the validation dataset.</p>
<p>As we can see, 80% of the data will be training data, and 20% of the data will be validation data.</p>
<p><strong>Loading data:</strong></p>
<pre><code class="language-python">train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151090966/a80b2483-0bc3-4693-9b58-36765e4b2da2.png" alt="Code example of training a neural network - Loading data" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Here, we load the data into data loaders for the AI model to use.</p>
<p>This way, we have the data automatically split into small batches and shuffled. So instead of processing all 10,000 data points, the model will be trained on one batch, improved, then another batch, then improved again, and so forth. That makes training go faster.</p>
<p><strong>Creating AI model and training process:</strong></p>
<pre><code class="language-python">class CreditApprovalNet(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []
    
    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss
    
    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151116959/d75bd178-24bb-4e5d-b043-c504e280f500.png" alt="Code example of training a neural network - Creating AI model and training process" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>This code block may look complicated, but let’s go through it method by method:</p>
<ul>
<li><strong>Creating the class with inheritance:</strong></li>
</ul>
<pre><code class="language-python">class CreditApprovalNet(pl.LightningModule):
</code></pre>
<p>By inheriting from pl.LightningModule, this single line gives us everything we need to define both the model and how it will be trained.</p>
<ul>
<li><strong>init: Builds the model's layers and components:</strong></li>
</ul>
<pre><code class="language-python">    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()
        self.train_losses = []
</code></pre>
<p>In this section of the code, we are defining the architecture of the AI model.</p>
<ul>
<li><strong>forward: Processes input data through the network to make predictions:</strong></li>
</ul>
<pre><code class="language-python">    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))
</code></pre>
<p>In this part of the code, we are defining how data will flow in the AI model based on the architecture defined.</p>
<ul>
<li><strong>training_step: Calculates loss for each batch during training:</strong></li>
</ul>
<pre><code class="language-python">    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log('train_loss', loss)
        self.train_losses.append(loss.item())
        return loss
</code></pre>
<p>Here, we are defining how the model will be trained. In other words, how we will find the best parameters for the model to predict well.</p>
<ul>
<li><strong>configure_optimizers: Sets the Adam optimizer with learning rate:</strong></li>
</ul>
<pre><code class="language-python">    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.0001)
</code></pre>
<p>Finally, here we define which optimizer we’ll use to improve the AI model’s parameters step by step.</p>
<p><strong>Training AI model:</strong></p>
<pre><code class="language-python">model = CreditApprovalNet()
trainer = pl.Trainer(max_epochs=100, logger=False, enable_checkpointing=False)
trainer.fit(model, train_loader, val_loader)
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151149824/33cb6ad3-3a5d-4964-ab45-ccfd68cd0521.png" alt="Code example of training a neural network - Training AI model" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>In this block of code:</p>
<ul>
<li><p>We create the neural network model in the first line</p>
</li>
<li><p>In the second and third lines, we prepare the training settings and train the model for 100 epochs</p>
</li>
</ul>
<p>This way, in the command line, this appears:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765152230535/3a5a6a13-12b1-4f31-8bec-cfbc830510a6.png" alt="Code example of training a neural network - training an AI model - command line showing number of layers and parameters" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>PyTorch Lightning is printing a summary of the model’s layers and the number of parameters in the AI model!</p>
<p><strong>Seeing results and understanding why they are not good:</strong></p>
<pre><code class="language-python">plt.plot(model.train_losses)
plt.xlabel('Training Step')
plt.ylabel('Loss')
plt.title('Credit Approval Training')
plt.grid(True, alpha=0.3)
plt.show()
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765151210074/3cbecda5-616e-4c3b-a942-2512f81697a1.png" alt="Code example of seeing results and understanding why they are not good:" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Using the Matplotlib library, we plot the results:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765152336092/6cfce900-ffb6-449f-9d5d-827ff71735bb.png" alt="Code example of training a neural network - Plot the training done over time." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>The AI model is not converging.</strong></p>
<p>We can see that because the loss stays near 0.7 over time instead of decreasing.</p>
<p>The main reason the model is not converging well is that there is little to no relationship between the 4 features and the target variable.</p>
<p>In other words, we do not have good data.</p>
<p>The code works perfectly, but this shows the <strong>most important rule in machine learning</strong>: when we create an AI model, the MOST IMPORTANT thing is data.</p>
<p>It does not matter if you use a simple linear regression or a neural network based on transformers or whatever. If you do not have high quality data, the model is not going to perform well.</p>
<p>Even if we use a good optimizer, like Adam, it will not solve the data problem.</p>
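<p>That flat loss near 0.7 is no accident. For binary cross-entropy, a model that always predicts 0.5 (a coin flip) scores exactly ln 2 ≈ 0.693 on every example, which is where the curve is stuck. A quick check in plain Python:</p>

```python
import math

# Binary cross-entropy for one example: -(y*log(p) + (1-y)*log(1-p))
def bce(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With p = 0.5 the loss is ln(2), whether the true label is 0 or 1
print(bce(0, 0.5))  # 0.6931...
print(bce(1, 0.5))  # 0.6931...
```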
<p><strong>Next steps: Common beginner mistakes</strong></p>
<p>I also wrote this exact code example to show you something very important: neural networks are not always the best models to use.</p>
<p>This is a very common beginner mistake. You may start with neural networks for everything, when simpler machine learning methods with a little data preprocessing often do the job well.</p>
<p>For this type of problem, the solution is to first try machine learning methods instead of going to neural networks.</p>
<p>There are many reasons for this, but the main ones are:</p>
<ul>
<li><p>Machine learning methods are simpler and often quicker to train than neural networks</p>
</li>
<li><p>It’s easier to understand how machine learning methods make decisions. In other words, we can follow the reasoning the model used to make a prediction.</p>
</li>
<li><p>With computational learning theory, we can estimate how well certain machine learning models will predict in the future and provide theoretical guarantees about their performance.</p>
</li>
</ul>
<p>Another common mistake is not dividing the data.</p>
<p>To simplify, I created only a training and validation split of the data.</p>
<p>In a serious project, you should always divide it into 3 parts: training, validation, and testing.</p>
<p>With training, you create the model. With validation, you check the model on data it was not trained on while you tune it. With the test dataset, you compare whether the model’s loss is similar to the validation loss or very different. If they are very different, it means the AI model fit the validation dataset but does not generalize to the test dataset.</p>
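<p>Using the same random_split helper from earlier, a three-way split could be sketched like this. The 70/15/15 ratio and the stand-in dataset are illustrative choices, not a fixed rule:</p>

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with the same shape as the credit-approval example
dataset = TensorDataset(torch.randn(10000, 4),
                        torch.randint(0, 2, (10000, 1)).float())

# 70% training, 15% validation, 15% testing
train_size = int(0.7 * len(dataset))
val_size = int(0.15 * len(dataset))
test_size = len(dataset) - train_size - val_size
train_ds, val_ds, test_ds = random_split(
    dataset, [train_size, val_size, test_size])
```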
<p>I challenge you to think further about how you could improve this code and to try to make the synthetic data more correlated in order to improve its quality.</p>
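<p>As a starting point for that challenge, one way to inject real signal into the synthetic data is to derive the target from a weighted sum of the features plus a little noise. The weights below are arbitrary, hypothetical choices:</p>

```python
import torch
from torch.utils.data import TensorDataset

torch.manual_seed(42)
X = torch.randn(10000, 4)

# Hypothetical "true" rule: approval depends on a weighted sum of the
# features, so the target is now genuinely correlated with the inputs
weights = torch.tensor([1.5, -2.0, 1.0, 0.5])
logits = X @ weights + 0.1 * torch.randn(10000)
y = (logits > 0).float().unsqueeze(1)  # approve when the score is positive

dataset = TensorDataset(X, y)
```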
<h3 id="heading-applications-in-ai-and-control-theory-of-optimization-theory">Applications of Optimization Theory in AI and Control Theory</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002780396/5aaf78bb-a06a-4d09-b681-a604a323d430.jpeg" alt="Image of a robot hand touching a web" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/robot-pointing-on-a-wall-8386440/">Photo by Tara Winstead</a></p>
<p>Optimization theory serves as the engine behind AI and control systems that shape our lives.</p>
<p>From unlocking your phone with facial recognition to autopilot systems guiding planes, optimization algorithms are constantly at work.</p>
<p>When you ask ChatGPT a question, optimization theory determines the values of billions of parameters during training.</p>
<p>The same is true for all other LLMs like Gemini, Claude, Grok, DeepSeek, and others. All of them contain billions of parameters. The only way to find the best combination of parameters to achieve a certain objective is with optimization theory.</p>
<p>In control theory, many systems like Model Predictive Control (MPC) and adaptive control only work thanks to optimization methods that balance how the internal components of the control system should work together.</p>
<p>Beyond training neural networks and controlling physical systems, optimization powers recommendation systems, resource allocation, and so many other systems.</p>
<p>Some examples are:</p>
<ul>
<li><p>Netflix movie recommendation system</p>
</li>
<li><p>Spotify's song suggestion system</p>
</li>
<li><p>Google systems to reduce data center cooling costs</p>
</li>
<li><p>Quantitative trading firms’ high-frequency trading systems</p>
</li>
</ul>
<p>To end this final chapter, I’ll share this:</p>
<p><strong>It is optimization theory that makes math models into AI models that impact the lives of millions worldwide.</strong></p>
<h2 id="heading-conclusion-where-mathematics-and-ai-meet">Conclusion: Where Mathematics and AI Meet</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765002962447/8cdbc79a-5d9c-406d-bad6-2f2e49566b36.jpeg" alt="Pyramids of Egypt with a camel sitting" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/a-camel-lying-in-the-ground-on-the-background-of-pyramids-18991572/">Photo by AXP Photography</a></p>
<p>When ancient civilizations first carved numbers into clay tablets, they likely didn’t imagine that these symbols would one day allow humanity to create the scientific, technological, and medical marvels we have today.</p>
<p>Yet here we are.</p>
<p>We’re in an era where mathematical ideas developed over many centuries – even millennia – have converged to create artificial intelligence.</p>
<p>Throughout this book, we've traced a path from the most basic math concepts to the cutting edge of AI. We have seen how:</p>
<ul>
<li><p>Matrices compress complex systems into simple forms</p>
</li>
<li><p>Derivatives measure change</p>
</li>
<li><p>Probability helps us navigate uncertainty</p>
</li>
<li><p>Optimization guides algorithms toward better decisions to learn faster.</p>
</li>
</ul>
<p>We’ve also learned how each math field has helped create tools that are responsible for many of the things we take for granted today.</p>
<h3 id="heading-mathematics-is-the-foundation-of-ai">Mathematics is the Foundation of AI</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902825228/e14431de-44da-4e26-a646-5d277c16b073.jpeg" alt="Board with an integral equation in it" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/person-writing-on-white-board-3781338/">Photo by Jeswin Thomas</a></p>
<p>Always remember this: AI is not pure magic or a "being" we don't understand. It’s just the combination of many math ideas working very well together.</p>
<p>When you ask a question of ChatGPT or any other LLM, it generates a response. And in the process of generating that response, there are millions of matrix multiplications happening in seconds.</p>
<p>Or, for example, when a self-driving car decides to stop moving because it’s coming up to a crosswalk, there are a lot of math computations (related to calculus and probability and statistics) working very fast to ensure safety.</p>
<p>The great thing about mathematics is that it’s a common, standard language of logic. No matter the backgrounds of people or where they were born, a derivative will always be a derivative, and the same thing goes for key AI concepts.</p>
<p>This way, scientists and engineers worldwide can improve each other's work because everyone understands the same language.</p>
<h3 id="heading-the-future-on-device-ai-and-the-democratization-of-ai">The Future: On Device AI and the Democratization of AI</h3>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766902760109/02b3f00d-a8df-4546-bf41-c1791cdc5f18.jpeg" alt="Image of a chip" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><a href="https://www.pexels.com/photo/abstract-image-of-a-microchip-with-heatmap-colors-28767589/">Photo by Steve Johnson</a></p>
<p>One shift happening now is the move toward edge AI. That is, AI that runs locally on your phone, computer, and really in all your devices (rather than in distant data centers).</p>
<p>This way, privacy is guaranteed because it runs locally. Waiting times for AI models decrease because no data needs to be sent. AI can be used offline, and costs decrease.</p>
<p>And what about the massive data centers being built all over the world? Those will be used for more products that will help improve the lives of millions of people.</p>
<p>As AI becomes more local and more processing power is freed up from big data centers, new AI innovations will appear, and more benefits will come.</p>
<p>The same way that in the past century every computer got its own networking chip, every device will have (and in some cases, already has) AI accelerators.</p>
<p>And much of it will be thanks to the math you learned in this book.</p>
<h3 id="heading-final-reflections">Final Reflections</h3>
<p>Isaac Newton wrote, "If I have seen further, it is by standing on the shoulders of giants."</p>
<p>Every algorithm you use, every model you train, and every new theorem you learn stands on centuries of mathematical progress. You now stand on the shoulders of those same giants!</p>
<p>Thank you for reading, and happy learning.</p>
<p>Here’s the full book <a href="https://github.com/tiagomonteiro0715/The-Math-Behind-Artificial-Intelligence-A-Guide-to-AI-Foundations">GitHub repository with all the code</a>.</p>
<h3 id="heading-acknowledgements">Acknowledgements</h3>
<p>First and foremost, I would like to thank <a href="https://www.linkedin.com/in/guilherme-mendes-a416b7206/"><strong>Guilherme Mendes</strong></a>, currently a Master’s student in Electrical and Computer Engineering at NOVA University, specializing in Control Theory, for reviewing the mathematical and technical details of the 1st version of this book.</p>
<p>I am also grateful to the organizations that gave me opportunities to grow:</p>
<ul>
<li><p><a href="https://www.fct.unl.pt/en">NOVA School of Science and Technology</a></p>
</li>
<li><p><a href="https://ieee-pt.org/">IEEE Portugal Section</a></p>
</li>
<li><p><a href="https://www.siliconvalleyfellowship.com/">Silicon Valley Fellowship</a></p>
</li>
<li><p><a href="https://www.northeastern.edu/">Northeastern University</a></p>
</li>
<li><p><a href="https://best.eu.org/index.jsp">BEST and BEST Almada</a></p>
</li>
<li><p><a href="https://magmastudio.pt/">Magma Studio</a></p>
</li>
</ul>
<p>A special thank you goes to the freeCodeCamp editorial team, especially Abigail Rennemeyer, for their patience and for reviewing every chapter of this book.</p>
<p>I would also like to thank all the professors at NOVA FCT who have taught and guided me throughout my academic journey, especially those from the Department of Electrical and Computer Engineering.</p>
<h2 id="heading-about-the-author">About the Author</h2>
<ul>
<li><p>LinkedIn: <a href="https://www.linkedin.com/in/tiago-monteiro-/">https://www.linkedin.com/in/tiago-monteiro-</a></p>
</li>
<li><p>GitHub: <a href="https://github.com/tiagomonteiro0715">https://github.com/tiagomonteiro0715</a></p>
</li>
<li><p>Email: <a href="mailto:monteiro.t@northeastern.edu">monteiro.t@northeastern.edu</a></p>
</li>
</ul>
<p>My name is Tiago Monteiro, and I’m now pursuing a master’s degree in Artificial Intelligence at Northeastern University’s Silicon Valley campus (San Jose) on a merit-based scholarship.</p>
<p>I’m not from the United States. I am a Portuguese national, born and raised in the district of Lisbon.</p>
<p>In Portugal, I completed a bachelor's degree in electrical and computer engineering at NOVA University, one of Portugal's best universities.</p>
<p>I have authored over 20 articles for freeCodeCamp, which have accumulated more than 240,000 views over the years, and completed the Deep Learning Specialization from DeepLearning.AI, taught by Andrew Ng.</p>
<p>Also, I had the privilege of participating in the winter 2025 batch of the renowned Silicon Valley Fellowship program.</p>
<h4 id="heading-why-did-i-choose-electrical-and-computer-engineering">Why did I choose electrical and computer engineering?</h4>
<p>After finishing the Portuguese national math exam in 12th grade, I chose Electrical and Computer Engineering (ECE) to challenge myself and learn new math on my own.</p>
<p>The ECE degree combined:</p>
<ul>
<li><p>Advanced Mathematics</p>
</li>
<li><p>Programming (from Assembly to Python)</p>
</li>
<li><p>Physics (classical mechanics, electromagnetism)</p>
</li>
</ul>
<h4 id="heading-what-did-i-gain-exactly">What did I gain exactly?</h4>
<p>I mastered the skills needed to quickly understand AI research, particularly after completing Andrew Ng's Deep Learning Specialization.</p>
<p>In Portugal, I also studied advanced STEM areas including, for example:</p>
<ul>
<li><p><strong>Partial Differential Equations</strong> for modeling real-world phenomena</p>
</li>
<li><p><strong>Harmonic analysis</strong> (Fourier/Laplace transforms) for signal processing and alternative problem perspectives</p>
</li>
<li><p><strong>Complex analysis</strong> involving derivatives and integrals in the complex domain</p>
</li>
<li><p><strong>Numerical methods</strong> for approximating mathematical solutions computationally</p>
</li>
<li><p><strong>Signal/control theory</strong> for ensuring system stability in dynamic environments</p>
</li>
<li><p><strong>Physics classes</strong> in classical mechanics and electromagnetism fundamentals</p>
</li>
</ul>
<p>While not directly applied to AI, these studies enhanced my systems thinking and ability to independently learn complex STEM concepts.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Architecture of Mathematics – And How Developers Can Use it in Code ]]>
                </title>
                <description>
                    <![CDATA[ "To understand is to perceive patterns." - Isaiah Berlin Math is not just numbers. It is the science of finding complex patterns that shape our world. This means that to truly understand it, we need to see beyond numbers, formulas, and theorems and ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-architecture-of-mathematics-and-how-developers-can-use-it-in-code/</link>
                <guid isPermaLink="false">68308ee8ccde6bc325c82393</guid>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Math ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ history ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Fri, 23 May 2025 15:06:16 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748012748947/1df613bf-93e7-4f03-b0f0-47ff49f38504.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <blockquote>
<p>"To understand is to perceive patterns." - Isaiah Berlin</p>
</blockquote>
<p>Math is not just numbers. It is the science of finding complex patterns that shape our world. This means that to truly understand it, we need to see beyond numbers, formulas, and theorems and understand its structures.</p>
<p>The main goal of this article is to show how math is just like a growing tree of ideas. I want to show that math is a living system of logic, not just formulas to memorize. With analogies, history, and code examples, I want to help you understand math more deeply and how you can apply it to programming.</p>
<p>I’ve also included some code examples here to help you connect theory and practice. I show them to demonstrate how math ideas are applied to real problems. Whether you are new to more advanced math or are more experienced, these code examples will help you understand how to apply math in programming.</p>
<p>This link between theory and application reflects my own studies. I’m a final-year undergraduate student in Electrical and Computer Engineering at NOVA FCT, one of the best engineering faculties in Portugal.</p>
<p>My engineering degree is heavier on math and physics than most. That’s because a solid grasp of math is key to understanding electronics, telecommunications, control theory, and other areas of engineering.</p>
<p>Here’s a brief overview of some of the math and physics subjects I’ve learned:</p>
<ul>
<li><p><strong>Partial Differential Equations (PDEs):</strong> These equations model real-world phenomena, from heat diffusion to the economy of a country.</p>
</li>
<li><p><strong>Harmonic Analysis (Fourier &amp; Laplace):</strong> Integral transforms like the Fourier and Laplace transforms allow us to understand problems in new domains.</p>
</li>
<li><p><strong>Complex Analysis:</strong> Extending calculus into the complex plane gives rise to powerful tools used in physics and engineering.</p>
</li>
<li><p><strong>Numerical Analysis:</strong> When analytical solutions are impossible or inefficient, numerical methods provide computer-based approximations. This is crucial for real-world applications.</p>
</li>
<li><p><strong>Control and Signal Theory:</strong> These areas show us how to design stable systems like rockets, trains, and robots.</p>
</li>
<li><p><strong>Physics:</strong> Courses in Classical Mechanics and Electromagnetism helped bridge theoretical math to physical laws</p>
</li>
</ul>
<p>During my years of study, besides technical skills, I’ve developed a deeper understanding of how the world works and the structure of the field of mathematics. And I’ve started to find patterns in how math is a framework of interconnected logic.</p>
<h3 id="heading-in-this-article-well-explore">In this article, we’ll explore:</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-simple-analogy-the-tree-of-mathematics">Simple Analogy: The Tree of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-structure-and-history-of-mathematics">The Structure and History of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-an-tree-example-foundations-of-relativity-by-albert-einstein">A Tree Example: Foundations of Relativity by Albert Einstein</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-biggest-paradox-of-math-discovered-by-kurt-godel">The Biggest Paradox of Math, Discovered by Kurt Gödel</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-code-examples-analytical-and-numerical-approaches">Code Examples – Analytical and Numerical Approaches</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-final-lesson-from-history">A Final Lesson From History</a></p>
</li>
</ul>
<h2 id="heading-simple-analogy-the-tree-of-mathematics">Simple Analogy: The Tree of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518175609/78838825-d872-42df-9dc8-736fa012a630.jpeg" alt="Photo of two trees by Johannes Plenio: https://www.pexels.com/photo/two-brown-trees-1632790/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Imagine math as a vast tree growing forever.</p>
<p>The roots of the tree are the foundations of mathematics: logic and set theory. From this foundation emerge the main basic fields of math: arithmetic, algebra, geometry, and analysis.</p>
<p>As the tree divides further and further into more branches, new, more complex subfields start to appear, like topology, abstract algebra, and complex analysis. Sometimes the branches are connected to each other.</p>
<p>And remember: this tree is always growing in many directions. From branches creating new branches to branches connecting to other branches. Little by little, it grows.</p>
<p>Throughout history, there have been times that, due to some big scientific discoveries, parts of the math tree started to grow very fast. Other times, decades and even centuries passed without many new branches. This is the case for imaginary numbers, for example.</p>
<p>And you might wonder: How many more branches and connections between them will keep appearing?</p>
<h2 id="heading-the-structure-and-history-of-mathematics">The Structure and History of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518363058/9911acd4-ad4f-4da2-a62b-9fa87e219c35.jpeg" alt="Photo of a writing desk and notebook on Pixabay: https://www.pexels.com/photo/brown-wooden-desk-159618/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The first mathematical ideas appeared independently across ancient civilizations. For example:</p>
<ul>
<li><p>India’s invention of zero</p>
</li>
<li><p>Islamic algebraic advances</p>
</li>
<li><p>Greek geometric rigor</p>
</li>
</ul>
<p>Over time, great mathematicians developed these ideas and shared them through writing and lectures.</p>
<p>Eventually, these ideas spread widely to new generations, who created new math based on the old.</p>
<p>This is how new branches are continuously born from previous branches of the tree of mathematics.</p>
<p>And this is why Isaac Newton wrote, in a letter to Robert Hooke in 1675:</p>
<blockquote>
<p>If I have seen further, it is by standing on the shoulders of giants</p>
</blockquote>
<p>He meant that by working from previous knowledge, he was able to create and (re)discover new ideas.</p>
<p>Yet, the real power of math lies in practicing it over and over and understanding it more and more deeply. As one of my professors once explained:</p>
<blockquote>
<p><em>More important than knowing the theorems is knowing the ideas behind them and the history of how they were created.</em></p>
</blockquote>
<p>Very often, to solve problems, it is necessary to think in terms of first principles and build from there. Math teaches exactly that. In this way, math is not just an academic subject. It is a language spoken by scientists and engineers around the globe.</p>
<p>By having it well preserved and shared, it is still possible to create new math from previous ideas. And it’s possible for the big tree to continue growing based on previous branches or nodes.</p>
<h2 id="heading-an-tree-example-foundations-of-relativity-by-albert-einstein">A Tree Example: Foundations of Relativity by Albert Einstein</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518865627/e84ff108-b383-405b-8bb0-73ffb50b4dcf.jpeg" alt="Albert Einstein, one of the greatest physics giants in history" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Albert Einstein created the general and special theories of relativity. These have big consequences nowadays:</p>
<ul>
<li><p>GPS and Global Communication</p>
</li>
<li><p>Advancements in Satellite Telecommunications</p>
</li>
<li><p>Space Exploration and Satellite Launches</p>
</li>
</ul>
<p>But this was only possible through the unification of geometry with calculus, called <strong>differential geometry.</strong> The evolution of differential geometry happened over the centuries, thanks to many great mathematicians. Below are some of them, but this is not a complete list:</p>
<ul>
<li><p><strong>Euclid (circa 300 BCE):</strong> Contributed to geometry, laying the groundwork for later mathematical systems</p>
</li>
<li><p><strong>Archimedes (circa 287–212 BCE):</strong> Pioneered the understanding of volume, surface area, and the principles of mechanics</p>
</li>
<li><p><strong>René Descartes (1596–1650):</strong> Developed Cartesian coordinates and analytical geometry</p>
</li>
<li><p><strong>Isaac Newton (1642–1727) &amp; Gottfried Wilhelm Leibniz (1646–1716):</strong> Newton’s laws of motion and gravitation, alongside Leibniz’s development of calculus, formed the basis of classical mechanics that Einstein sought to extend and modify in his theory of relativity.</p>
</li>
<li><p><strong>Leonhard Euler (1707–1783):</strong> Contributed to the development of differential equations, which are essential in the mathematical foundations of physics.</p>
</li>
<li><p><strong>Gaspard Monge (1746–1818):</strong> The father of differential geometry and pioneer in descriptive geometry</p>
</li>
<li><p><strong>Carl Friedrich Gauss (1777–1855):</strong> Made groundbreaking advances in geometry, including the concept of curved surfaces.</p>
</li>
<li><p><strong>Bernhard Riemann (1826–1866):</strong> Introduced Riemannian geometry, a branch of differential geometry.</p>
</li>
</ul>
<p>Once again, as Isaac Newton wrote, in a letter to Robert Hooke in 1675:</p>
<blockquote>
<p>If I have seen further, it is by standing on the shoulders of giants.</p>
</blockquote>
<p>Albert Einstein saw what no one else in his time saw, thanks to these great math giants and countless others.</p>
<h2 id="heading-the-biggest-paradox-of-math-discovered-by-kurt-godel">The Biggest Paradox of Math, Discovered by Kurt Gödel</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518411126/df53f84c-f920-4b42-9081-5aeb1017f543.jpeg" alt="Kurt Gödel, one of the greatest math giants in history" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The biggest paradox in math, in my opinion, is what Kurt Gödel discovered. His early 20th century research revealed a limitation within this cycle.</p>
<p>This paradox – that is, <a target="_blank" href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems">his incompleteness theorems</a> – shows that in any consistent formal system capable of expressing simple arithmetic, there will always be true mathematical statements that cannot be proven within the system itself.</p>
<p>This means that in ALL such systems, there are limits to what you can actually prove to be true or false. For mathematicians, this means that the tree will never be completed. There are truths beyond formal proof, and yet we still assume they are true (albeit unproven).</p>
<p>It also means that no matter how many mathematicians work in the field, or how much AI is used to find new mathematics, there will always be limitations. Some true statements are impossible to prove, and we only know them through approximations, estimations, and other methods that fall outside exact formal logic.</p>
<h2 id="heading-what-about-applied-math-and-engineering">What About Applied Math and Engineering?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518581076/606f3bce-d7db-4ac3-9322-833673a734b0.jpeg" alt="Photo by JESHOOTS.com: https://www.pexels.com/photo/person-holding-a-chalk-in-front-of-the-chalk-board-714699/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Applied math and engineering involve interpreting the same pure math ideas in real-world scenarios. In many cases, they combine several math ideas at once. Let’s consider some examples:</p>
<p>Principal component analysis (PCA) is a widely used tool in data science. Yet it is a mixture of linear algebra (eigenvalues and eigenvectors) with optimization (ranking the eigenvalues so the directions that capture the most variance come first) in order to reduce the dimensionality of datasets.</p>
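<p>As a sketch of that mixture, here is a minimal PCA from scratch with NumPy (the dataset here is made up for illustration): the linear algebra step is the eigen-decomposition of the covariance matrix, and the optimization step is keeping the direction with the largest eigenvalue.</p>

```python
import numpy as np

# Hypothetical 2D dataset: 100 points spread mostly along one diagonal.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]]) \
       + rng.normal(scale=0.1, size=(100, 2))

# Linear algebra: eigen-decompose the covariance matrix.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Optimization: keep only the direction with the largest eigenvalue,
# i.e. the single axis that preserves the most variance.
top = eigenvectors[:, np.argmax(eigenvalues)]
reduced = centered @ top  # dataset compressed from 2 columns to 1

print(reduced.shape)  # (100,)
```

<p>Libraries like scikit-learn wrap essentially these same steps behind a single <code>PCA</code> class.</p>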
<p>In machine learning, logistic regression is a mixture of calculus with statistics and probability.</p>
<p>In harmonic analysis, Laplace, Fourier, and Z-transforms are a way to see the same thing in a new domain to get new insights. In this case, integrals are used to make this mapping.</p>
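<p>A quick NumPy sketch of that idea (the signal is a made-up 50 Hz sine wave): the Fourier transform maps it from the time domain to the frequency domain, where its hidden structure becomes a single visible peak.</p>

```python
import numpy as np

# A signal sampled at 1000 Hz for one second: a 50 Hz sine wave,
# standing in for any time-domain measurement.
fs = 1000
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t)

# The Fourier transform maps the signal into the frequency domain.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# In the new domain, the structure is obvious: one peak at 50 Hz.
dominant = freqs[np.argmax(spectrum)]
print(dominant)  # 50.0
```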
<p>In deep learning, neural networks are just many matrices multiplying and updating themselves that adapt to model a dataset representing a system. This optimization of matrix values happens with activation functions, a gradient descent-based optimization method (tells how much values need to change), and backpropagation (applies those alterations to all matrix values).</p>
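<p>The gradient-descent part of that loop can be sketched in a few lines of plain Python. This is a toy one-weight model, not a real network: the gradient tells how much the weight needs to change, and the update applies that change.</p>

```python
# Toy data generated from a hypothetical "true" weight of 3 (y = 3x).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0                  # start from an arbitrary weight
learning_rate = 0.01
for _ in range(500):
    # Gradient of mean squared error with respect to w:
    # d/dw (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # step against the gradient

print(round(w, 4))  # converges toward 3.0
```

<p>A real neural network does this same update simultaneously for millions of matrix entries, with backpropagation computing all the gradients.</p>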
<p>I have actually written an article where I teach <a target="_blank" href="https://www.freecodecamp.org/news/activation-functions-in-neural-networks/">why activation functions are important</a> if you want to check it out.</p>
<p>But the best example of this fusion of math with engineering is in <a target="_blank" href="https://www.freecodecamp.org/news/basic-control-theory-with-python/">control theory</a>.</p>
<p>Control theory is the study of the architecture of systems. From trains to cars to airplanes, everything is based on control theory. It is everywhere in nearly all modern electronic devices. In electric circuits, control theory is also used heavily to guarantee circuit stability in the face of electric disturbances.</p>
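<p>Here is a toy illustration of what control theory does (the numbers are invented, not a real vehicle model): a proportional controller repeatedly measures the error between a setpoint and the system's output and pushes against it, keeping the system stable even under a constant disturbance.</p>

```python
setpoint = 100.0      # desired value, e.g. a target speed
output = 0.0          # current value of the system
disturbance = -2.0    # constant external push, e.g. drag
gain = 0.5            # proportional gain Kp

for _ in range(100):
    error = setpoint - output
    control = gain * error            # action proportional to the error
    output += control + disturbance   # system responds, disturbance acts

print(round(output, 2))  # settles near 96.0
```

<p>Note that the output settles near 96 rather than exactly 100. This leftover steady-state error is characteristic of proportional-only control, and it is why practical controllers add integral and derivative terms (PID).</p>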
<p>So as you can probably start to see, many of the tools we now have are just mixtures of pure math ideas. In essence, applied math uses pure math ideas as “ingredients” in “recipes” to solve problems.</p>
<p>So, we’ve explored the structure and evolution of mathematics. Yet, it is important to see how these ideas can be applied in real life. Pure math makes the framework, and applied math applies that framework to solve problems. To understand this, we’ll examine two code examples that show how you can use math ideas as programming tools.</p>
<h2 id="heading-code-examples-analytical-and-numerical-approaches">Code Examples – Analytical and Numerical Approaches</h2>
<p>These code examples demonstrate a couple of ways you can use Python to solve math equations.</p>
<p>In the first code example, we’ll solve the problem in the same way that kids in school solve math exercises: essentially, by hand with a pencil. Moving variables from left to right to find their values. In the second example, we’ll solve the problem using numerical analysis.</p>
<h3 id="heading-example-1-solve-a-problem-analytically">Example 1: Solve a Problem Analytically</h3>
<p>When we solve math problems analytically, like we did in school, we manipulate symbols to get exact values. Often these symbols are x, y, and z. In Python, we can do this using the SymPy library:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sympy <span class="hljs-keyword">import</span> symbols, Eq, solve

x, y = symbols(<span class="hljs-string">'x y'</span>)
eq1 = Eq(<span class="hljs-number">2</span>*x + <span class="hljs-number">3</span>*y, <span class="hljs-number">6</span>)
eq2 = Eq(-x + y, <span class="hljs-number">1</span>)

solution = solve((eq1, eq2), (x, y))
print(solution)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160359386/7a21cddc-f4ba-4f9f-afa0-d1cc11fb27d6.png" alt="7a21cddc-f4ba-4f9f-afa0-d1cc11fb27d6" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Essentially, we are finding x and y based on this equation:</p>
<p>$$\begin{align*} 2x + 3y &amp;= 6 \\ -x + y &amp;= 1 \end{align*}$$</p><p>Which gives us the following result:</p>
<pre><code class="lang-python">{x: <span class="hljs-number">3</span>/<span class="hljs-number">5</span>, y: <span class="hljs-number">8</span>/<span class="hljs-number">5</span>}
</code></pre>
<p>Or:</p>
<ul>
<li><p>x= 0.6</p>
</li>
<li><p>y = 1.6</p>
</li>
</ul>
<p>When we say that we’re solving this analytically, it means that we’re finding an exact mathematical solution using formulas or equations.</p>
<p>But many times, problems are harder, and solving them means moving more and more symbols from one side of the equation to the other.</p>
<p>Sometimes, there can be so many symbols and transformed versions of them, with things like derivatives and integrals, that it can become very hard to manage and takes a lot of time.</p>
<p>For this reason, there is an area of mathematics called numerical analysis, devoted to finding fast approximations of exact mathematical solutions. This makes such problems much quicker to solve, and it is the method we will explore next.</p>
<h3 id="heading-example-2-solve-numerically-approximation">Example 2: Solve Numerically (Approximation)</h3>
<p>We’ll now use SciPy to solve a larger system of five equations in five unknowns with numerical methods:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> scipy.linalg <span class="hljs-keyword">import</span> solve

A = np.array([[<span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>],
              [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">2</span>, <span class="hljs-number">-2</span>],
              [<span class="hljs-number">4</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
              [<span class="hljs-number">5</span>, <span class="hljs-number">3</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
              [<span class="hljs-number">2</span>, <span class="hljs-number">-3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>]])

b = np.array([<span class="hljs-number">12</span>, <span class="hljs-number">5</span>, <span class="hljs-number">7</span>, <span class="hljs-number">9</span>, <span class="hljs-number">10</span>])

solution = solve(A, b)

print(solution)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747160347486/d1f17aa6-b288-4e41-9be7-0810c45e778c.png" alt="d1f17aa6-b288-4e41-9be7-0810c45e778c" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>In this code example, this line of code:</p>
<pre><code class="lang-python">solution = solve(A, b)
</code></pre>
<p>Uses the <a target="_blank" href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve.html">solve</a> method from the <a target="_blank" href="https://scipy.org/">SciPy</a> Python library:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> scipy.linalg <span class="hljs-keyword">import</span> solve
</code></pre>
<p>It’s a function that finds the vector x in the equation A⋅x = b, where A is a square matrix of numbers and b is a vector of numbers. This gives us the following:</p>
<pre><code class="lang-python">[ <span class="hljs-number">1.35022026</span> <span class="hljs-number">-0.79955947</span> <span class="hljs-number">-1.17180617</span>  <span class="hljs-number">3.14317181</span> <span class="hljs-number">-0.83920705</span>]
</code></pre>
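<p>One way to sanity-check a numerical solution like this is to substitute it back: if <code>x</code> really solves A⋅x = b, then multiplying A by it must reproduce b, up to floating-point rounding. Here is that check using NumPy's equivalent solver (<code>numpy.linalg.solve</code>, which, like SciPy's, calls LAPACK under the hood):</p>

```python
import numpy as np

A = np.array([[3, 2, -1, 4, 5],
              [1, 1, 3, 2, -2],
              [4, -1, 2, 1, 0],
              [5, 3, -2, 1, 1],
              [2, -3, 1, 3, 4]], dtype=float)
b = np.array([12, 5, 7, 9, 10], dtype=float)

x = np.linalg.solve(A, b)

# Substitute the solution back: A @ x should reproduce b.
print(np.allclose(A @ x, b))  # True
```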
<p>Now imagine, in this simple case, that a matrix like A could represent the <strong>traffic flow</strong> between cities or intersections, and b could represent the <strong>traffic entering or leaving</strong> each city.</p>
<p>By solving the system, it could help us determine the distribution of traffic between cities to meet desired traffic conditions.</p>
<p>Of course, these types of problems are far more complex in real life. But to understand and solve the big problems, you need to first understand the smaller problems.</p>
<p>And by the way, a matrix equation is just another way of writing a system of equations. We represent systems as matrices because it makes their properties easier to find and the whole system clearer to read.</p>
<p>With matrices, it is also easier to perform calculations and apply linear algebra to check the characteristics of the system and understand it better.</p>
<p>In essence, a matrix represents a system of equations. And systems of equations can represent real-life phenomena like the economy of a country or the weather.</p>
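<p>To make that equivalence concrete, the 2×2 system from Example 1 can be rewritten in matrix form and solved numerically, landing on the same values SymPy found symbolically:</p>

```python
import numpy as np

# The system from Example 1 as a matrix equation A·x = b:
#    2x + 3y = 6        [[ 2, 3],   [x]   [6]
#    -x +  y = 1   -->   [-1, 1]] · [y] = [1]
A = np.array([[2.0, 3.0],
              [-1.0, 1.0]])
b = np.array([6.0, 1.0])

x, y = np.linalg.solve(A, b)
print(x, y)  # approximately 0.6 and 1.6, matching the symbolic result
```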
<p>If you want to know more, I wrote an <a target="_blank" href="https://www.freecodecamp.org/news/numerical-analysis-explained-how-to-apply-math-with-python/">entire article on numerical analysis</a> that you can check out.</p>
<h2 id="heading-the-impact-of-a-grand-unified-theory-of-mathematics">The Impact of a Grand Unified Theory of Mathematics</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747518681068/54a9556c-2a79-441c-a6d6-27ff38e1f4ff.jpeg" alt="Photo by Porapak Apichodilok: https://www.pexels.com/photo/person-holding-world-globe-facing-mountain-346885/" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Despite the biggest paradox in mathematics, what would happen with a <a target="_blank" href="https://www.scientificamerican.com/article/the-evolving-quest-for-a-grand-unified-theory-of-mathematics/">Grand Unified Theory of Mathematics</a>?</p>
<p>Remember that Gödel's theorems tell us that there are true things that are impossible to formally prove, and that we need to just accept this. But even with this limitation, it is still possible to try to unify all of math.</p>
<p>This is what <a target="_blank" href="https://en.wikipedia.org/wiki/Langlands_program">the Langlands program</a> is trying to achieve: an attempt to interconnect the largest branches of the big tree of math and uncover new patterns between them.</p>
<p>With a Grand Unified Theory of Mathematics, we would be able to understand how every branch of the tree connects with the others and all the relationships between them.</p>
<h3 id="heading-what-is-the-value-of-this-big-unification-for-society">What is the value of this big unification for society?</h3>
<p>By studying history, we can find patterns. The unification of various fields has created many massive impacts on society, such as:</p>
<ul>
<li><p>In the 19th century, James Clerk Maxwell united the fields of <em>electricity</em> and <em>magnetism</em> with his famous Maxwell equations. This allowed the creation of radios and electric grids around the globe. In turn, it served as a foundation for all technological progress in the 20th and 21st centuries.</p>
</li>
<li><p>In the 20th century, the unification of <em>algebra</em> with <em>logic</em> led to the rise of digital systems. In turn, digital systems gave rise to processors and to the evolution of computers into the modern laptop.</p>
</li>
<li><p>Also in the 20th century, the unification of <em>probability</em> and <em>communication</em> led to information theory. This became the foundation for the internet. This unification was carried out by a great mathematician called Claude Shannon.</p>
</li>
</ul>
<p>In the end, a Grand Unified Theory of Mathematics could be one of the biggest achievements in modern society.</p>
<p>It could lead to new discoveries in physics, such as in string theory or quantum gravity, where deep mathematical structures are needed to create new physics. In AI, it could help unify all machine learning models in a common architecture. This would help accelerate the development of new AI models. It could also open the door to new cryptographic methods and material science advances, revealing, with math, the deep patterns still not found in these fields.</p>
<p>Just as uniting electricity and magnetism led to modern technology, a unified math framework would lead to a wave of innovation.</p>
<h2 id="heading-a-final-lesson-from-history">A Final Lesson From History</h2>
<p>From Greek geometry to AI, math has grown like a tree over centuries. By understanding its structure, it is possible to see its role in finding the patterns of our universe. I hope I was able to make you see math in this way.</p>
<p>In addition, we can conclude that the unification of scientific fields makes the foundations for the creation of new innovations to help society go forward. Many profound societal transformations only came to be thanks to abstract math ideas. When these are shared and refined, they become the hidden architecture of progress in society. Innovation begins when disconnected ideas are united, well-linked, and widely shared.</p>
<p>Find the full code <a target="_blank" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ From Failure to International Success: How Online Learning Platforms Saved My Life ]]>
                </title>
                <description>
                    <![CDATA[ It is better to be a samurai in a garden than an agricultural worker in a war - Miyamoto Musashi In this article, I’ll share my story. When I was younger, I thought I was destined to be a failure in life. To be isolated from everyone. But years late... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-online-learning-platforms-saved-my-life/</link>
                <guid isPermaLink="false">67ed3edd597c0ff6bdcf4c45</guid>
                
                    <category>
                        <![CDATA[ Learning Journey ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Online education ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Personal growth   ]]>
                    </category>
                
                    <category>
                        <![CDATA[ learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 02 Apr 2025 13:42:53 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743601355025/2f8c32a4-c451-4860-b4e2-f02b690fb928.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <blockquote>
<p>It is better to be a samurai in a garden than an agricultural worker in a war - Miyamoto Musashi</p>
</blockquote>
<p>In this article, I’ll share my story.</p>
<p>When I was younger, I thought I was destined to be a failure in life. To be isolated from everyone. But years later, I realized I was actually destined for success.</p>
<p>I went from wasting thousands of hours playing video games to giving a lecture to medical professionals called “Trustworthy AI: The Role of Small Models in Critical Systems.”</p>
<p>And I went from being told I was dumber than most people, based on an IQ test I took as a 14-year-old kid with low self-esteem, to becoming a frequent contributor to freeCodeCamp. I’ve written articles on interpretable AI, applied math, and advanced tech. And these articles have now reached more than 200,000 people worldwide.</p>
<p><strong>And this is just the beginning.</strong></p>
<p>When it comes to education, I owe gratitude to three people and the organizations they lead:</p>
<ul>
<li><p>Salman Khan, founder of Khan Academy</p>
</li>
<li><p>Quincy Larson, founder of freeCodeCamp</p>
</li>
<li><p>Andrew Ng, co-founder of Coursera and founder of DeepLearning.AI</p>
</li>
</ul>
<p>I also owe a lot to the great author of the book Atomic Habits: An Easy &amp; Proven Way to Build Good Habits &amp; Break Bad Ones – James Clear.</p>
<p>I’m sharing my story to inspire others who are struggling in their lives just like I was.</p>
<p>I’m also writing this for those who know me or will know me personally, so that they can understand where my determination comes from and why I am relentless.</p>
<h3 id="heading-heres-what-ill-cover">Here’s what I’ll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-where-i-was-misery-depression-and-isolation">Where I Was: Misery, Depression, and Isolation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-my-transformation-learning-triple-integrals-and-programming">My Transformation: Learning Triple Integrals and Programming</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-one-of-the-best-choices-in-my-life-why-i-chose-electrical-and-computer-engineering">One of the Best Choices in My Life: Why I Chose Electrical and Computer Engineering</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-being-restless-and-determined-my-work-ethic-in-university">Being Restless and Determined: My Work Ethic in University</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-my-projects-while-in-nova-fct-ai-projects-international-student-organizations-and-freecodecamp-articles">My Projects while in NOVA FCT: AI Projects, International Student Organizations, and freeCodeCamp Articles</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-my-personal-philosophy-at-21-years-old-and-view-on-envy-and-negativity">My Personal Philosophy at 21 Years Old and View on Envy and Negativity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-where-i-am-today-a-fraction-of-what-i-have-achieved">Where I Am Today: A Fraction of What I Have Achieved</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts-have-an-adaptive-grand-strategy-for-your-life">Final Thoughts: Have an Adaptive Grand Strategy for Your Life</a></p>
</li>
</ol>
<h2 id="heading-where-i-was-misery-depression-and-isolation">Where I Was: Misery, Depression, and Isolation</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743425427463/49388c13-8c05-41ae-878b-316de0e3ed56.jpeg" alt="Photo by <a href=&quot;https://unsplash.com/@nmbalanial?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash&quot;>Nikko Balanial</a> on <a href=&quot;https://unsplash.com/photos/water-droplets-on-glass-window-4XSdSFgKm8k?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash&quot;>Unsplash</a>       " class="image--center mx-auto" width="2703" height="1850" loading="lazy"></p>
<p>Photo by <a target="_blank" href="https://unsplash.com/@nmbalanial?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Nikko Balanial</a> <a target="_blank" href="https://unsplash.com/@nmbalanial?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">on Unsplash</a></p>
<blockquote>
<p>"As it turns out, it was that very rock bottom that became the firmest foundation I had ever planted my feet on." — Mandy Hale</p>
</blockquote>
<p>Five years ago, I was in a different place and was a completely different person.</p>
<p>Like many teenagers, I started playing video games and became addicted to them. Over time, games became an escape from reality and all my problems, including my bad grades and many other issues.</p>
<p>At age 14, I still held ambitions in my heart. I dreamed of being someone who would help others, maybe as a doctor or an engineer.</p>
<p>But after an IQ and vocational guidance test, I was told that I was incapable of doing these things. That I lacked the intelligence needed. That it was unrealistic for me to pursue these types of degrees.</p>
<p>Eventually, and because of many comments, opinions, and expectations of others, I began to believe in this lie for years.</p>
<p>Over time, depressed and constantly escaping reality, my grades plummeted and I got worse and worse. And this only made the prospect of going to college less and less likely.</p>
<p>By 11th grade, I was:</p>
<ul>
<li><p>Extremely shy and anxious</p>
</li>
<li><p>Struggling academically</p>
</li>
<li><p>Over 2000 hours in video games on two games alone:</p>
<ul>
<li><p>1000 hours in GTA V</p>
</li>
<li><p>1000 hours in Destiny 2</p>
</li>
</ul>
</li>
</ul>
<p>2000 hours equals nearly 83 days.</p>
<p>This means that in these two games, I lost more than two months of my life.</p>
<p>But from these wasted hours, I learned English. This became crucial when learning online.</p>
<p>In January 2020, I was tired of everything. In particular, I was sick of the misery of always being at the bottom and of so much negativity towards and around me.</p>
<p>So I made these vows to myself for the rest of my life:</p>
<ul>
<li><p>Never again would I worry or care about what other people say about me.</p>
</li>
<li><p>I would no longer accept the limitations imposed by others or myself on my growth.</p>
</li>
<li><p>And as for the limitations I imposed on myself, I would rethink to see if they were really impossible or if I could actually conquer them.</p>
</li>
</ul>
<p>As a result, I started relearning and learning everything by myself to make sure I succeeded in the national exams.</p>
<h2 id="heading-my-transformation-learning-triple-integrals-and-programming">My Transformation: Learning Triple Integrals and Programming</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743425550115/3f61c996-1a20-415f-bc89-65eaf3119799.jpeg" alt="3f61c996-1a20-415f-bc89-65eaf3119799" class="image--center mx-auto" width="3000" height="2000" loading="lazy"></p>
<p>Photo by <a target="_blank" href="https://unsplash.com/@joshuaearle?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Joshua Earle</a> <a target="_blank" href="https://unsplash.com/@joshuaearle?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">on Unsplash</a></p>
<blockquote>
<p><em>"The expert in anything was once a beginner."</em><br>— Helen Hayes</p>
</blockquote>
<p>I started going through chemistry exercises in the 10th and 11th grades using books from school and YouTube videos. In two weeks, I relearned or learned most of the material I needed to know.</p>
<p>I started doing the same for mathematics, something I always found hard due to a lack of basic foundational mathematics knowledge.</p>
<p>I found it hard, that is, until I discovered Khan Academy.</p>
<p>With Khan Academy, I rebuilt myself, going from struggling with basic math to mastering double and triple integrals, all within five to six months.</p>
<p>My method was simple:</p>
<ul>
<li><p>Study a little bit every day.</p>
</li>
<li><p>Take detailed notes</p>
</li>
<li><p>Redo quizzes or unit tests until I scored more than 90%</p>
<ul>
<li>For topics that I found harder or failed to understand, I did the practice exercises</li>
</ul>
</li>
<li><p>Use YouTube to close any knowledge gaps</p>
</li>
</ul>
<p>For example, for <a target="_blank" href="https://www.khanacademy.org/math/algebra">Algebra I</a>, where I started to relearn math, I first saw how many units there were. Each unit had a certain number of topics. As of 2025, Algebra I has 89 topics across the units with mastery exercises.</p>
<p>For those 89 topics, I watched the videos and did the quizzes. According to my scores, I would either go on to the next video (if I felt confident), or stop, rewatch the video, go through the same material on YouTube, do practice exercises, and then try to do the quiz again.</p>
<p>I decided that I needed to do at least three topics every day. At that pace, I could finish Algebra I by the end of one month.</p>
<p>But I was so motivated and so focused on it that I did more than 3 topics per day.</p>
<p>I did the same for Algebra II, and all the others until <a target="_blank" href="https://www.khanacademy.org/math/ap-calculus-bc">College Calculus BC</a>.</p>
<p>Some days, I completed more than 8 topics. Other days, I struggled to even do 2. But I made sure that I mastered mathematics and its foundations for the rest of my life.</p>
<p>This was not just about grades. It was about regaining belief and confidence in myself.</p>
<p>I also read many books, primarily self-help, to make myself better. Over the years, I have started reading fewer self-help books and have started focusing on non-fictional books that explain to me how the world works.</p>
<h3 id="heading-covid-19-accelerating-my-learning-in-programming-and-machine-learning"><strong>COVID-19: Accelerating My Learning in Programming and Machine Learning</strong></h3>
<p>When the pandemic hit, I started accelerating my learning in other areas, like programming and physics. In many online classes, I didn’t pay attention as well as I should’ve – and I found myself prioritizing self study on topics I found more important. And I always used my time to learn more about programming.</p>
<p>I learned Python and C through free YouTube courses for beginners on freeCodeCamp’s channel.</p>
<p>This was where I first learned Python.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/rfscVS0vtbw" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p> </p>
<p>And C:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/KJgsSFOSQv0" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p> </p>
<p>Soon after exploring C programming, I realized that programming languages are just tools. Once you master one, others come more naturally.</p>
<p>I studied data science tutorials on the web and on YouTube. This way, I learned how to import Python libraries in virtual environments. I also began building projects with Python libraries I found interesting and made it a habit to explain every line of code to myself as if I were the teacher.</p>
<p>For example, I started working with the <a target="_blank" href="https://scikit-learn.org/stable/index.html">scikit-learn</a> Python library to make simple linear and logistic models that could make predictions.</p>
<p>I also decided to explore Deep Learning, learning how to train neural network architectures to make predictions, and taught myself how to work with <a target="_blank" href="https://www.arduino.cc/">Arduino</a> and circuits.</p>
<p>I found this hard to master compared to triple integrals at the time!</p>
<p>But this way, I understood one very important thing about Deep Learning: to truly understand and master it, I needed to know, deeply, some difficult mathematics concepts. And I also needed to learn quickly about the new research coming out.</p>
<h2 id="heading-one-of-the-best-choices-in-my-life-why-i-choose-electrical-and-computer-engineering">One of the Best Choices in My Life: Why I Chose Electrical and Computer Engineering</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743426438035/92ee17a4-c537-4922-82cc-cbb42c564baa.jpeg" alt="92ee17a4-c537-4922-82cc-cbb42c564baa" class="image--center mx-auto" width="3888" height="2592" loading="lazy"></p>
<p>Photo by <a target="_blank" href="https://unsplash.com/@nicolasthomas?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Nicolas Thomas</a> <a target="_blank" href="https://unsplash.com/@nicolasthomas?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">on Unsplash</a></p>
<blockquote>
<p><em>"Opportunities multiply as they are seized."</em><br>— Sun Tzu</p>
</blockquote>
<p>After completing the Portuguese national mathematics exam in the 12th grade, I chose Electrical and Computer Engineering (ECE). I chose this area because it would challenge me and allow me to gain the skills to learn and apply new mathematics by myself, without anyone teaching me.</p>
<p>It was also broad:</p>
<ul>
<li><p>If I liked, I could follow an electrical engineering path, like circuits, power systems, or telecommunications.</p>
</li>
<li><p>If an electrical engineering subarea was not in my best interest, I could follow a computer science path or apply math in banking or other areas where people who know applied math work.</p>
</li>
</ul>
<p>The ECE degree also allowed me to unite the following areas:</p>
<ul>
<li><p>Advanced Mathematics</p>
</li>
<li><p>Programming (from low-level like assembly and C to high-level like Python)</p>
</li>
<li><p>Physics (circuits, robotics, communication systems)</p>
</li>
</ul>
<p>I wanted to become someone who not only mastered knowledge but could also create new systems and ideas from it.</p>
<p>I knew that I was laying the foundation for something greater than just academic success.</p>
<h3 id="heading-what-did-i-gain-exactly"><strong>What Did I Gain Exactly?</strong></h3>
<p>Over time, after completing AI specializations, I learned the many skills I needed to understand the new AI research coming out.</p>
<p>I also learned hard math and applied mathematics areas such as:</p>
<ul>
<li><p>Partial differential equations: how they can represent and model real phenomena, like the economy of a country.</p>
</li>
<li><p>Pure harmonic analysis: Fourier and Laplace transformations and how integral transformations allow us to see problems in other ways.</p>
</li>
<li><p>Complex analysis: application of derivatives and integrals in a complex domain, with real and imaginary numbers.</p>
</li>
<li><p>Numerical analysis: how computers use approximations of analytical math to get results faster.</p>
</li>
<li><p>Signal and control theory: how the architecture of systems is studied to ensure rocket, train, and car control systems are stable, despite possible disturbances in the systems.</p>
</li>
</ul>
<p>Not to mention physics classes like:</p>
<ul>
<li><p>Classical mechanics</p>
</li>
<li><p>Electromagnetism</p>
</li>
</ul>
<p>While these topics may not be applied in depth to AI, learning them helped me develop an incredible intuition into systems thinking. It also greatly improved my ability to learn hard STEM concepts on my own.</p>
<h2 id="heading-being-restless-and-determined-my-work-ethic-in-university">Being Restless and Determined: My Work Ethic in University</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743421252923/58fa0581-1312-4577-b186-b61a0d7ecac1.png" alt="Me at 18 years old" class="image--center mx-auto" width="460" height="460" loading="lazy"></p>
<p>Me at 18 years old</p>
<blockquote>
<p>"Success is the sum of small efforts, repeated day in and day out." — Robert Collier</p>
</blockquote>
<p>I adopted a very rigorous work ethic.</p>
<p>When my work ethic failed to achieve what I wanted, I adapted with more knowledge and learned very deeply what I did wrong so as not to repeat it.</p>
<p>For example, in the first semester, my first grades were not the best. So, I read:</p>
<ul>
<li><p>Deep Work</p>
</li>
<li><p>The Effective Executive: The Definitive Guide to Getting the Right Things Done</p>
</li>
<li><p>How to Become a Straight-A Student: The Unconventional Strategies Real College Students Use</p>
</li>
</ul>
<p>These books taught me how to focus and prioritize what needed to be done. This became essential as I entered one of the most demanding phases of my life.</p>
<p>In addition, I used Notion as a management system and Google Calendar as a schedule system.</p>
<p>Every week, I transferred next week's tasks from Google Calendar to Notion. This way, I never forgot anything and never worried about forgetting anything.</p>
<p>I had two simple scalable systems that worked very well for managing everything I did.</p>
<p>In the scheduling system, I would place certain events on repeat, for example:</p>
<ol>
<li><p>Every week:</p>
<ul>
<li>Read the top articles of the week on subreddits dedicated to programming and other topics so I could keep learning and growing. I did the same with communities on the Stack Exchange network.</li>
</ul>
</li>
</ol>
<ul>
<li>Read new articles on IEEE Spectrum and learn as much as possible about what is happening currently.</li>
</ul>
<ol start="2">
<li><p>Every two weeks:</p>
<ul>
<li>Plan my studying according to the time available, along with all the class materials and other resources I could get for tests, projects, and exams.</li>
</ul>
</li>
<li><p>Every month:</p>
<ul>
<li>Review my annual objectives and prioritize what was important and urgent to do in that month. I would also review new opportunities that aligned with my objectives for the year and for my life.</li>
</ul>
</li>
</ol>
<p>This way, I was always aligned and efficient. And all this was from a Notion database.</p>
<p>Very often, I started working at 8:00am and continued until around 9:00 or 10:30am, when my classes usually started. In that time, I studied, did student organization work, completed online courses and specializations, worked on AI projects, wrote freeCodeCamp articles, and handled many other tasks.</p>
<p>I went beyond studying just the subjects from my degree:</p>
<ul>
<li><p>I also studied history, economics, and geopolitics to understand the hidden incentives that shape the world.</p>
</li>
<li><p>I developed the habit of studying the architecture of things, from political systems to technology, understanding how they work to design better systems.</p>
</li>
<li><p>I attended many free online and university events to learn as much as possible.</p>
</li>
</ul>
<p>I also treated weekends as opportunities to grow and work, and did not stop. This was not possible 100% of the time, but most days I was able to do so.</p>
<p>In this way, I completed Coursera’s prestigious Deep Learning Specialization, a very important achievement in my journey.</p>
<p>I also read many books and listened to podcasts while taking public transportation, ensuring that no time was wasted.</p>
<h2 id="heading-my-projects-while-in-nova-fct-ai-projects-international-student-organizations-and-freecodecamp-articles">My Projects While in NOVA FCT: AI Projects, International Student Organizations, and freeCodeCamp Articles</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743422159743/81944517-ee5d-4cd3-a84a-fd9726391175.jpeg" alt="81944517-ee5d-4cd3-a84a-fd9726391175" class="image--center mx-auto" width="2016" height="1512" loading="lazy"></p>
<p>Me at 21 years old.</p>
<blockquote>
<p><em>"The strength of the team is each individual member. The strength of each member is the team."</em><br>— Phil Jackson</p>
</blockquote>
<p>International student organizations often offer opportunities that are rarely found in local college clubs.</p>
<p>These student organizations are also often better managed than some local clubs, which can sometimes suffer from internal politics focused on titles rather than making a real contribution to society.</p>
<p>For this reason, I sought international organizations that pushed members towards real impact and development.</p>
<p>After a while, I became interested in BEST (Board of European Students of Technology), a large international network of around 80 local student groups at universities across Europe. I joined my local group, BEST Almada, which helps foster the development of students through courses and events.</p>
<p>I also became deeply involved in the IEEE, the world’s largest non-profit professional association, where I served as the Vice-Chair of the IEEE NOVA Student Branch. Currently, I contribute nationally in the IEEE Portugal Section by creating videos for social media.</p>
<p>Thanks to IEEE, I was able to go to the IEEE Melecon conference in Porto last year to speak with some amazing scientists and researchers.</p>
<p>Here’s a key thing I learned from IEEE that I want to share: communication, alignment of expectations between everybody, and knowing how to navigate social dynamics are crucial for any project or initiative to succeed. Of course, the culture of the organization and many other variables matter as well. But I believe communication is one of the most important and critical factors.</p>
<p>Along this path, I worked on projects like Eurostatify AI, which aimed to provide European public data insights and hidden patterns that are accessible to researchers and policymakers. I also led the Doctor AI Project as part of a Hackathon in March 2023, where I developed two AI bots using Flutter and the ChatGPT API to help doctors make better decisions.</p>
<p>Each step helped me forge myself into someone capable of inspiring and leading others. I also taught complex topics in my freeCodeCamp articles, such as how CPUs work in depth, interpretable AI, quantum AI, and even how to design a control system for rockets.</p>
<p>I was involved in local student clubs before I realized the value of joining international organizations. In Europe, these organizations bring unique opportunities and are usually better managed than local groups. They’re a great place for developing soft skills as well.</p>
<p>So in the end, joining international student organizations was one of the best decisions of my university life.</p>
<h2 id="heading-my-personal-philosophy-at-21-years-old-and-view-on-envy-and-negativity">My Personal Philosophy at 21 Years Old and View on Envy and Negativity</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743425739851/135f7a2a-edab-48a4-91ee-e8aa3ed8c416.jpeg" alt="135f7a2a-edab-48a4-91ee-e8aa3ed8c416" class="image--center mx-auto" width="5953" height="3969" loading="lazy"></p>
<p>Photo by <a target="_blank" href="https://unsplash.com/@giamboscaro?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Giammarco Boscaro</a> <a target="_blank" href="https://unsplash.com/@giamboscaro?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">on Unsplash</a></p>
<blockquote>
<p><em>"Freedom lies in being bold."</em><br>— Robert Frost</p>
</blockquote>
<p>Here’s one thing I’ve learned over the years: you need to make your own path. Chasing social status and falling prey to social pressures isn’t worth it, and you shouldn’t be blinded by these things. True freedom comes from defining your own path. Developing relationships with professors and mentors, learning from books, and taking advantage of solid free learning resources are all things that can help you go further in life.</p>
<p>But what about envy and negativity from others?</p>
<p>Well, unfortunately these will always be part of our lives. Being envious is human nature, and various forms of negativity will likely continue to exist. Anyone who works and achieves any level of success will inevitably attract envy and negativity.</p>
<p>The best response is not to react and to ignore it completely. Just keep growing.</p>
<p>Some people will disappear, mock you, envy you, or hate you – but just try to let it all go. Keep walking your path.</p>
<p>Time is precious. Don’t waste it on:</p>
<ul>
<li><p>Meaningless opinions</p>
</li>
<li><p>Video games</p>
</li>
<li><p>Distractions</p>
</li>
</ul>
<p>I find it sad that, despite living in such an exciting time, and despite unprecedented access to knowledge and education, advances in technology, and immense global connectivity, some people still choose to hate and be envious of others. But as I said before, it’s human nature and there is little we can do about it.</p>
<p>Just remember: you have opportunities today that previous generations could only dream of. Take advantage of them to the fullest and worry about your own personal growth.</p>
<h2 id="heading-where-i-am-today-a-fraction-of-what-i-have-achieved">Where I am Today: A Fraction of What I Have Achieved</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743428837578/9f5359be-4cca-4840-b74c-ca78b44b8672.jpeg" alt="9f5359be-4cca-4840-b74c-ca78b44b8672" class="image--center mx-auto" width="1537" height="1533" loading="lazy"></p>
<p>Me in a Tesla factory in Silicon Valley</p>
<blockquote>
<p><em>"I am not a product of my circumstances. I am a product of my decisions."</em><br>— Stephen R. Covey</p>
</blockquote>
<p>At 21, I am finishing my degree in Electrical and Computer Engineering at NOVA FCT.</p>
<p>So far:</p>
<ul>
<li><p>I’ve been accepted into the Silicon Valley Fellowship Program: only 18 out of 600 applicants are accepted to visit Silicon Valley's top companies and universities.</p>
</li>
<li><p>I’ve delivered a talk to doctors about AI called "Trustworthy AI - The Role of Small AI Models in Critical Systems." Before this, I delivered other smaller talks.</p>
</li>
<li><p>I’ve completed Coursera AI specializations such as the Deep Learning Specialization from DeepLearning.AI and Reinforcement Learning Specialization from the University of Alberta.</p>
</li>
<li><p>In IEEE (the largest non-profit professional association in the world), I served as the vice chair of my faculty IEEE NOVA SB student branch, and I am now an IEEE PT Officer, creating videos for social media.</p>
</li>
<li><p>I’ve had twenty articles published on freeCodeCamp since 2023 that have accumulated around 200,000 views. They are related to advanced applied math, AI, and technology. (Link below)</p>
</li>
<li><p>I’ve been recognized as a Top Open Source Contributor for freeCodeCamp in 2022, 2023, and 2024</p>
</li>
</ul>
<h2 id="heading-final-thoughts-have-an-adaptive-grand-strategy-for-your-life">Final Thoughts: Have an Adaptive Grand Strategy for Your Life</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743421784759/80ba16f2-fef7-4a4d-82b9-9287d7b43e53.jpeg" alt="80ba16f2-fef7-4a4d-82b9-9287d7b43e53" class="image--center mx-auto" width="1080" height="1293" loading="lazy"></p>
<p>Silicon Valley Fellowship post about me</p>
<blockquote>
<p><em>"What you do makes a difference, and you have to decide what kind of difference you want to make."</em><br>— Jane Goodall</p>
</blockquote>
<p>My life objective is still the same as it was when I was 14, seven years ago:</p>
<ul>
<li>Help as many people as possible: give them opportunities to make their lives better, and make society better for the generations that will come after mine.</li>
</ul>
<p><strong>My strongest advice for anyone: Have a grand strategy for your life.</strong></p>
<p>A grand strategy is a type of long-term strategy in which nations align power and resources to achieve their objectives. You must align your actions, skills, and knowledge towards your purpose.</p>
<p>I used to be afraid of public speaking and so many other things. Not anymore.</p>
<p>Now, I know I am destined to contribute, inspire, and leave a mark on other people's lives for the better.</p>
<p>If you feel stuck, remember:</p>
<ul>
<li><p>You can change! It will be hard, and many people will not want you to.</p>
</li>
<li><p>Ignore all that and focus on yourself.</p>
</li>
</ul>
<p>It takes effort, patience and courage, but it is possible.</p>
<p><strong>Thanks to all organizations for the opportunity to contribute to society and grow as a person:</strong></p>
<ul>
<li><p>NOVA School of Science and Technology and its student association, AEFCT</p>
</li>
<li><p>IEEE Portugal Section</p>
</li>
<li><p>Silicon Valley Fellowship</p>
</li>
<li><p>BEST and BEST Almada</p>
</li>
<li><p>Magma Studio</p>
</li>
</ul>
<p>I also want to thank all the university professors at NOVA FCT who taught me, especially the ones from the department of electrical and computer engineering.</p>
<p>Finally, I want to express my gratitude to Portuguese society. Not long ago, in Portugal, pursuing higher education, especially in STEM, was inaccessible to many. Thanks to the efforts of past generations, today, young people like me can pursue these opportunities and contribute back to society.</p>
<p><strong>This is just the beginning of my impact on society.</strong></p>
<p>My freeCodeCamp blog:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.freecodecamp.org/news/author/tiagomonteiro/">https://www.freecodecamp.org/news/author/tiagomonteiro/</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Quantum AI Model for Predicting Iris Flower Data with Python ]]>
                </title>
                <description>
                    <![CDATA[ Machine learning is an area of AI where the likes of ChatGPT and other famous models were created. These systems were all created with neural networks. The field of machine learning that deals with the creation of these neural networks is called deep... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-ai-model-for-predicting-data-with-python/</link>
                <guid isPermaLink="false">66ba5335582bb94cb02712aa</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Thu, 08 Aug 2024 13:18:14 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/08/pexels-guvo-20731157.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Machine learning is an area of AI where the likes of ChatGPT and other famous models were created. These systems were all created with neural networks.</p>
<p>The field of machine learning that deals with the creation of these neural networks is called deep learning. </p>
<p>In this blog post, we'll create a neural network with some neurons that run on a classical computer and others that run on a quantum computer.</p>
<p>Training a neural network with both types of neurons will give us an AI model based on quantum computing, as most of the processing will occur in the quantum neurons.</p>
<p>We'll talk about these:</p>
<ul>
<li><a class="post-section-overview" href="#heading-introduction-to-ai-hybrid-neural-networks-and-its-benefits">Introduction to AI, hybrid neural networks and its benefits</a></li>
<li><a class="post-section-overview" href="#heading-quantum-ai-in-action-predicting-iris-flower-data-with-python">Quantum AI in Action: Predicting Iris Flower Data with Python</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-the-future-of-efficient-ai-models">Conclusion: The future of efficient AI models</a></li>
</ul>
<p><strong>Note:</strong> We'll create a simple neural network, avoiding complex architectures like transformers, deep dives into quantum physics, or advanced AI model optimization techniques.</p>
<p>The full code is available <a target="_blank" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">here</a>.</p>
<h2 id="Introduction">Introduction to AI, Hybrid Neural Networks and Its Benefits</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-pavel-danilyuk-8438918.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Pavel Danilyuk: https://www.pexels.com/photo/elderly-man-thinking-while-looking-at-a-chessboard-8438918/</em></p>
<h3 id="heading-what-is-deep-learning-in-artificial-intelligence">What is Deep Learning in Artificial Intelligence?</h3>
<p>Deep learning is a subfield of AI that uses neural networks to handle complex tasks like predicting the weather, classifying images, responding to text, and so on.</p>
<p>The bigger the neural network, the more complex the tasks it can perform. ChatGPT, for example, can process natural language to interact with users.</p>
<h3 id="heading-neural-networks">Neural Networks</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/Firefox_Screenshot_2024-08-03T13-56-12.699Z.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simple Neural Network</em></p>
<p>Deep learning is the training of neural networks to predict future data. Training a neural network involves feeding it data, allowing it to learn, and then making predictions.</p>
<p>Neural networks are composed of many neurons organized in layers. Each layer captures different patterns in the data.</p>
<p>This layered structure allows AI models to interpret complex data and patterns. For example, the neural network in the image above could be trained on 8 weather-related features to predict whether or not it will rain.</p>
<p>The layer that takes in the data is called the input layer, and the final one is called the output layer. Between these are the hidden layers, which capture complex patterns.</p>
<p>Of course, this is a very simple neural network, but the idea of training a neural network is the same for any complex architecture.</p>
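<p>To make the layered flow concrete, here is a minimal NumPy sketch of a forward pass through an input, hidden, and output layer. The layer sizes, random weights, and the <code>relu</code>/<code>sigmoid</code> helpers are purely illustrative and are not part of the model we build below:</p>

```python
import numpy as np

def relu(z):
    # Non-linearity applied at the hidden layer
    return np.maximum(0, z)

def sigmoid(z):
    # Squashes the output into (0, 1), read here as "probability of rain"
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# 8 input features (e.g. weather measurements), a hidden layer of
# 4 neurons, and a single output neuron -- sizes are illustrative
W_hidden = rng.normal(size=(8, 4))
b_hidden = np.zeros(4)
W_out = rng.normal(size=(4, 1))
b_out = np.zeros(1)

x = rng.normal(size=8)                    # one sample with 8 features
hidden = relu(x @ W_hidden + b_hidden)    # hidden layer activations
output = sigmoid(hidden @ W_out + b_out)  # output layer prediction

print(output.shape)  # (1,)
```

<p>Training then consists of adjusting the weight matrices so the output matches the labels in the data; the idea is the same for any deeper architecture.</p>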
<h3 id="heading-hybrid-neural-networks-combining-quantum-and-classical-computing">Hybrid Neural Networks - Combining Quantum and Classical Computing</h3>
<p>We'll now create a hybrid neural network. Essentially, the input and output layers will operate on a classical computer, while the hidden layer will process data on a quantum computer.</p>
<p>This approach uses the best of classical and quantum computing to train a neural network.</p>
<h3 id="heading-why-choose-hybrid-neural-networks-over-traditional-neural-networks">Why Choose Hybrid Neural Networks Over Traditional Neural Networks?</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-weekendplayer-45072.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Burak The Weekender: https://www.pexels.com/photo/lighted-light-bulb-in-selective-focus-photography-45072/</em></p>
<p>The main idea of using a hybrid neural network is to have part of the data processing occur on a quantum computer, which can perform certain computations much faster than a classical computer.</p>
<p>In addition, quantum computers perform certain tasks with far less energy consumption. This efficiency in processing and energy usage allows the creation of smaller and more reliable AI models.</p>
<p>This is the main idea of a hybrid neural network: to create smaller and more efficient AI models.</p>
<h2 id="Quantum">Quantum AI in Action: Predicting Iris Flower Data with Python</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-googledeepmind-25626507.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Google DeepMind: https://www.pexels.com/photo/quantum-computing-and-ai-25626507/</em></p>
<p>In this code, we'll create a quantum-based AI model to predict the species of iris flowers from the famous Iris dataset.</p>
<p>The code uses a quantum simulator called <code>default.qubit</code>, which mimics a quantum computer's behavior on a classical computer.</p>
<p>This is possible because of the use of mathematical models to simulate quantum operations.</p>
<p>However, with some code alterations, you can run this code on the IBM, Amazon, or Microsoft quantum platforms to make it actually run on a quantum computer.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pennylane <span class="hljs-keyword">as</span> qml
<span class="hljs-keyword">from</span> pennylane <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_iris
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> StandardScaler, OneHotEncoder
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score

# Load and preprocess the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# One-hot encode the labels
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>))

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_onehot, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)

# Define a quantum device
n_qubits = <span class="hljs-number">4</span>
dev = qml.device(<span class="hljs-string">'default.qubit'</span>, wires=n_qubits)

# Define a quantum node
@qml.qnode(dev)
def quantum_circuit(inputs, weights):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(inputs)):
        qml.RY(inputs[i], wires=i)

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_qubits):
        qml.RX(weights[i], wires=i)
        qml.RY(weights[n_qubits + i], wires=i)

    <span class="hljs-keyword">return</span> [qml.expval(qml.PauliZ(i)) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_qubits)]

# Define a hybrid quantum-classical model
def hybrid_model(inputs, weights):
    <span class="hljs-keyword">return</span> quantum_circuit(inputs, weights)

# Initialize weights
np.random.seed(<span class="hljs-number">0</span>)
weights = np.random.normal(<span class="hljs-number">0</span>, np.pi, (<span class="hljs-number">2</span> * n_qubits,))

# Define a cost function
def cost(weights):
    predictions = np.array([hybrid_model(x, weights) for x in X_train])
    loss = np.mean((predictions - y_train) ** 2)
    return loss

# Optimize the weights using gradient descent
opt = qml.GradientDescentOptimizer(stepsize=0.1)
steps = 100
for i in range(steps):
    weights = opt.step(cost, weights)
    if i % 10 == 0:
        print(f"Step {i}, Cost: {cost(weights)}")

# Test the model
predictions = np.array([hybrid_model(x, weights) for x in X_test])
predicted_labels = np.argmax(predictions, axis=1)
true_labels = np.argmax(y_test, axis=1)

# Calculate the accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/1-1.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Let's see the code block by block!</p>
<h3 id="heading-import-libraries">Import Libraries</h3>
<pre><code><span class="hljs-keyword">import</span> pennylane <span class="hljs-keyword">as</span> qml
<span class="hljs-keyword">from</span> pennylane <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_iris
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> StandardScaler, OneHotEncoder
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/2-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Import Libraries</em></p>
<p>In this part of the code we imported the necessary libraries:</p>
<ul>
<li><code>pennylane</code> and <code>pennylane.numpy</code>: For creating and manipulating quantum circuits.</li>
<li><code>sklearn.datasets</code>: To load the Iris dataset.</li>
<li><code>sklearn.preprocessing</code>: For data preprocessing like scaling and encoding.</li>
<li><code>sklearn.model_selection</code>: For splitting the data into training and testing sets.</li>
<li><code>sklearn.metrics</code>: To evaluate the model's accuracy.</li>
</ul>
<h3 id="heading-load-and-preprocess-the-iris-dataset">Load and Preprocess the Iris Dataset</h3>
<pre><code># Load and preprocess the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# One-hot encode the labels
encoder = OneHotEncoder(sparse_output=False)
y_onehot = encoder.fit_transform(y.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>))

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_onehot, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/3-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Load and Preprocess the Iris Dataset</em></p>
<p>Here, we prepared the data for training the neural network:</p>
<ul>
<li>Loads the Iris dataset and extracts features (<code>X</code>) and labels (<code>y</code>).</li>
<li>Standardizes the features to have zero mean and unit variance using <code>StandardScaler</code>.</li>
<li>One-hot encodes the labels for multi-class classification using <code>OneHotEncoder</code>.</li>
<li>Splits the dataset into training and test sets with a ratio of 80/20.</li>
</ul>
<h3 id="heading-define-the-quantum-device-and-circuit">Define the Quantum Device and Circuit</h3>
<pre><code># Define a quantum device
n_qubits = <span class="hljs-number">4</span>
dev = qml.device(<span class="hljs-string">'default.qubit'</span>, wires=n_qubits)

# Define a quantum node
@qml.qnode(dev)
def quantum_circuit(inputs, weights):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(inputs)):
        qml.RY(inputs[i], wires=i)

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_qubits):
        qml.RX(weights[i], wires=i)
        qml.RY(weights[n_qubits + i], wires=i)

    <span class="hljs-keyword">return</span> [qml.expval(qml.PauliZ(i)) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_qubits)]
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/4-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define the Quantum Device and Circuit</em></p>
<p>This segment defines the quantum device and circuit:</p>
<ul>
<li>Sets up a quantum device with 4 qubits using PennyLane's default simulator.</li>
<li>Defines a quantum circuit (<code>quantum_circuit</code>) that takes inputs and weights. The circuit applies rotation gates (<code>RY</code>, <code>RX</code>) to encode inputs and parameters, and measures the expectation values of <code>PauliZ</code> operators on each qubit.</li>
</ul>
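<p>To build some intuition for what <code>qml.expval(qml.PauliZ(i))</code> returns, here is a hand-rolled single-qubit check in plain NumPy (no quantum simulator needed): applying an RY rotation by angle theta to the starting state |0&gt; and measuring Z gives exactly cos(theta), so each expectation value is a number between -1 and 1 that depends smoothly on the rotation angles:</p>

```python
import numpy as np

def ry(theta):
    # Matrix for the single-qubit RY rotation gate
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

Z = np.diag([1.0, -1.0])      # Pauli-Z observable
ket0 = np.array([1.0, 0.0])   # |0> state, how each wire starts

theta = 0.7                   # example rotation angle
state = ry(theta) @ ket0      # state after an RY(theta) gate

# Expectation value <psi|Z|psi>, what qml.expval(qml.PauliZ(i)) measures
expval = state @ Z @ state

print(np.isclose(expval, np.cos(theta)))  # True: <Z> = cos(theta)
```

<p>This is why the circuit's outputs can act like neuron activations: the classical optimizer nudges the weight angles, and the expectation values respond smoothly.</p>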
<h3 id="heading-define-the-hybrid-model-and-initialize-weights">Define the Hybrid Model and Initialize Weights</h3>
<pre><code># Define a hybrid quantum-classical model
def hybrid_model(inputs, weights):
    <span class="hljs-keyword">return</span> quantum_circuit(inputs, weights)

# Initialize weights
np.random.seed(<span class="hljs-number">0</span>)
weights = np.random.normal(<span class="hljs-number">0</span>, np.pi, (<span class="hljs-number">2</span> * n_qubits,))
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/5-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define the Hybrid Model and Initialize Weights</em></p>
<p>Here, we actually created the model and initialized its weights.</p>
<ul>
<li>Defines a hybrid model function that utilizes the quantum circuit.</li>
<li>Initializes the weights for the model using a normal distribution with a specified seed for reproducibility.</li>
</ul>
<h3 id="heading-define-the-cost-function-and-optimize-weights">Define the Cost Function and Optimize Weights</h3>
<pre><code># Define a cost function
def cost(weights):
    predictions = np.array([hybrid_model(x, weights) for x in X_train])
    loss = np.mean((predictions - y_train) ** 2)
    return loss

# Optimize the weights using gradient descent
opt = qml.GradientDescentOptimizer(stepsize=0.1)
steps = 100
for i in range(steps):
    weights = opt.step(cost, weights)
    if i % 10 == 0:
        print(f"Step {i}, Cost: {cost(weights)}")
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/6-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define the Cost Function and Optimize Weights</em></p>
<p>Finally, we train the quantum-based neural network:</p>
<ul>
<li>Defines a cost function that calculates the mean squared error between predictions and true labels.</li>
<li>Uses PennyLane's <code>GradientDescentOptimizer</code> to minimize the cost function by updating weights iteratively. It prints the cost every 10 steps to track progress.</li>
</ul>
<p>It prints out:</p>
<pre><code>Step <span class="hljs-number">0</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.35359229278282217</span>
Step <span class="hljs-number">10</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.3145818194833503</span>
Step <span class="hljs-number">20</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.28937668289628116</span>
Step <span class="hljs-number">30</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.2733108557682183</span>
Step <span class="hljs-number">40</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.26273285477208475</span>
Step <span class="hljs-number">50</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.25532913470009133</span>
Step <span class="hljs-number">60</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.24973939376050813</span>
Step <span class="hljs-number">70</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.24517135825709957</span>
Step <span class="hljs-number">80</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.2411459409849017</span>
Step <span class="hljs-number">90</span>, <span class="hljs-attr">Cost</span>: <span class="hljs-number">0.23735091263019087</span>
</code></pre><h3 id="heading-test-the-model-and-evaluate-accuracy">Test the Model and Evaluate Accuracy</h3>
<pre><code># Test the model
predictions = np.array([hybrid_model(x, weights) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> X_test])
predicted_labels = np.argmax(predictions, axis=<span class="hljs-number">1</span>)
true_labels = np.argmax(y_test, axis=<span class="hljs-number">1</span>)

# Calculate the accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print(f<span class="hljs-string">"Test Accuracy: {accuracy * 100:.2f}%"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/7-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Test the Model and Evaluate Accuracy</em></p>
<p>Next, we evaluate the trained model:</p>
<ul>
<li>Makes predictions on the test set using the optimized weights.</li>
<li>Converts one-hot encoded predictions and true labels back to class labels.</li>
<li>Calculates and prints the accuracy of the model using <code>accuracy_score</code>.</li>
</ul>
<p>And the final results gave:</p>
<pre><code>Test Accuracy: <span class="hljs-number">66.67</span>%
</code></pre><p>An accuracy of 67% is not a strong result. This is largely because we did not tune this network for the data at hand.</p>
<p>We would need to change the network's structure and hyperparameters to get better results.</p>
<p>For comparison, on this dataset an ordinary classical neural network, combined with a library like <a target="_blank" href="https://optuna.org/">Optuna</a> for hyperparameter optimization, can readily reach accuracies above 98%.</p>
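<p>As a rough, hypothetical illustration (not part of this article's notebook), here is what such a classical baseline might look like using scikit-learn's <code>MLPClassifier</code> on the classic Iris dataset. The exact score depends on the random seed and split, but it typically lands far above the quantum toy model's 67%:</p>

```python
# Hypothetical classical baseline (not from the article): a small
# scikit-learn MLP on the Iris dataset, for comparison with the
# quantum model's ~67% test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Scale features so the MLP converges quickly.
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

acc = accuracy_score(y_test, clf.predict(scaler.transform(X_test)))
print(f"Classical baseline accuracy: {acc * 100:.2f}%")
```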
<p>Nevertheless, we created a simple quantum AI model.</p>
<h2 id="Conclusion">Conclusion: The Future of Efficient AI Models</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-pixabay-210158.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Pixabay: https://www.pexels.com/photo/low-angle-photography-of-grey-and-black-tunnel-overlooking-white-cloudy-and-blue-sky-210158/</em></p>
<p>Integrating quantum computing into AI opens the door to smaller, more efficient models. As quantum hardware matures, these techniques will see wider use in AI.</p>
<p>In my view, the future of AI will eventually be intertwined with quantum computers.</p>
<p>Here is the full code:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is a Kalman Filter? How to Simplify Noisy Data in Navigation and Finance ]]>
                </title>
                <description>
                    <![CDATA[ In a world where precision is key, handling noisy data effectively is crucial for solving complex problems. Whether you're trying to control a rocket or forecast the stock market, the ability to get good data from an uncertain environment is importan... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-a-kalman-filter-with-python-code-examples/</link>
                <guid isPermaLink="false">66ba5353f77647345442b9d5</guid>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 07 Aug 2024 13:42:54 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/08/pexels-skitterphoto-63901.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In a world where precision is key, handling noisy data effectively is crucial for solving complex problems.</p>
<p>Whether you're trying to control a rocket or forecast the stock market, the ability to get good data from an uncertain environment is important.</p>
<p>This is exactly the problem Kalman filters help solve: they offer a principled way to deal with noisy data across many fields.</p>
<p>In this article, we'll discuss:</p>
<ul>
<li><a class="post-section-overview" href="#heading-driving-through-fog-kalman-filters-as-your-headlights">Driving Through Fog: Kalman Filters as Your Headlights</a></li>
<li><a class="post-section-overview" href="#heading-what-are-kalman-filters">What are Kalman Filters?</a></li>
<li><a class="post-section-overview" href="#heading-kalman-filters-in-action-a-step-by-step-code-tutorial">Kalman Filters in Action: A Step-by-Step Code Example</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-navigating-nonlinear-data-with-advanced-techniques">Conclusion: Navigating Nonlinear Data with Advanced Techniques</a></li>
</ul>
<h2 id="Driving">Driving Through Fog: Kalman Filters as Your Headlights</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-eberhardgross-1287075.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by eberhard grossgasteiger: https://www.pexels.com/photo/forest-under-clouds-1287075/</em></p>
<p>Imagine you are driving through a dense fog with limited visibility.</p>
<p>To reach the destination, you rely on your senses and your car's navigation system that combines real-time data with a predetermined map.</p>
<p>As you move, the car's navigation system constantly adjusts its route to the destination, while you rely on your senses to drive the car safely.</p>
<p>This process is very similar to how a Kalman Filter works.</p>
<p>It constantly updates and refines its estimates based on incoming data, even when that data is full of noise and uncertainty.</p>
<p>By integrating past information with current information, a Kalman Filter gives you a clear picture of where you are and where you're headed.</p>
<h2 id="heading-what-are-kalman-filters">What are Kalman Filters?</h2>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-mikebirdy-170811.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/blue-bmw-sedan-near-green-lawn-grass-170811/">Mike Bird on Pexels</a></em></p>
<p>A Kalman filter is a mathematical algorithm that estimates the state of a dynamic system from a series of noisy measurements.</p>
<p>It is often used for systems that change over time – like tracking the position of a moving object.</p>
<h3 id="heading-how-does-a-kalman-filter-work">How Does a Kalman Filter Work?</h3>
<p>The Kalman filter predicts your current state based on past data, like the map and your previous location.</p>
<p>When new data appears, like new GPS signals, the filter compares the new data with its prediction and adjusts its estimate.</p>
<p>Even if the data is noisy, the Kalman filter uses a smart averaging process to improve the estimation. Like how you balance what your navigation system tells you and what you see on the road.</p>
<p>By always integrating new data with past data, Kalman filters help you know where you are and where you are going. This way, it is possible to predict things even in uncertain conditions.</p>
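<p>The predict-and-correct cycle described above can be sketched in a few lines of plain Python. This is a hypothetical one-dimensional example (not from the article): we track a constant true value through noisy readings, and the filter's gain decides how much to trust each new measurement versus the running estimate.</p>

```python
import random

random.seed(42)

true_value = 10.0                   # the quantity we are trying to estimate
estimate, variance = 0.0, 1000.0    # start with a wild guess and low confidence
measurement_noise = 4.0             # variance of each noisy reading
process_noise = 0.01                # how much the system may drift between steps

for _ in range(50):
    # Simulate a noisy sensor reading around the true value.
    z = true_value + random.gauss(0.0, measurement_noise ** 0.5)

    # Predict: the state is assumed constant, but uncertainty grows a little.
    variance += process_noise

    # Update: the Kalman gain blends prediction and measurement,
    # weighting whichever is more certain.
    gain = variance / (variance + measurement_noise)
    estimate += gain * (z - estimate)
    variance *= (1.0 - gain)

print(f"Estimate after 50 steps: {estimate:.2f} (true value {true_value})")
```

<p>Notice that early on the gain is close to 1 (the filter trusts the measurements), and it shrinks as the estimate becomes confident.</p>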
<h3 id="heading-why-are-kalman-filters-used-in-engineering">Why are Kalman Filters used in engineering?</h3>
<p>Since Kalman filters are able to handle incomplete data, they are widely used to make good predictions even when the measurements are not certain.</p>
<p>This makes them very useful for:</p>
<ul>
<li><strong>Navigation Systems</strong>: Estimating the position and velocity of vehicles.</li>
<li><strong>Robotics</strong>: Helping robots understand their environment and position.</li>
<li><strong>Finance</strong>: Filtering out noise from stock price data to predict trends.</li>
</ul>
<p>This is because they are highly adaptive and can process information in real time.</p>
<h3 id="heading-what-problem-did-kalman-filters-solve">What problem did Kalman Filters solve?</h3>
<p>Kalman filters were developed by Rudolf Kalman in the early 1960s to solve the problem of managing uncertainty and noise in data.</p>
<p>Nowadays, they are great for extracting meaningful information from noisy data.</p>
<p>Mathematically, Kalman Filters are called linear quadratic estimators.</p>
<p>This is because, in the process of estimating the future based on current and past data, Kalman filters use:</p>
<ul>
<li>Linear algebra: The study of vectors and matrices used to solve linear equations.</li>
<li>Quadratic optimization: Finding the optimal solution for problems with squared terms.</li>
</ul>
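<p>The "quadratic" part can be made concrete with a small numeric check (an illustration of ours, not from the article). For a prior estimate with variance <code>P</code> and a measurement with variance <code>R</code>, the error variance after blending with gain <code>k</code> is a quadratic in <code>k</code>, and its minimizer is the classic Kalman gain <code>P / (P + R)</code>:</p>

```python
import numpy as np

# Illustrative numbers (not from the article): prior and measurement variances.
P, R = 2.0, 0.5

# Posterior error variance as a function of the blending gain k:
# var(k) = (1 - k)^2 * P + k^2 * R   (a quadratic in k)
k = np.linspace(0.0, 1.0, 10001)
posterior_var = (1 - k) ** 2 * P + k ** 2 * R

best_k = k[np.argmin(posterior_var)]    # gain found by brute-force search
kalman_gain = P / (P + R)               # the closed-form minimizer

print(f"Numerically best gain: {best_k:.4f}")
print(f"Kalman gain P/(P+R):   {kalman_gain:.4f}")
```

<p>Both values agree, which is why the filter is called a linear <em>quadratic</em> estimator: it is the linear update that minimizes squared error.</p>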
<h2 id="Kalman">Kalman Filters in Action: A Step-by-Step Code Tutorial</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-captainsopon-3402846.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by capt.sopon: https://www.pexels.com/photo/gray-airplane-control-panel-3402846/</em></p>
<p>Kalman Filters were created to handle linear systems – that is, systems that follow predictable patterns.</p>
<p>In this code example, we will implement an Extended Kalman Filter. This is a variant that was created to handle non-linear data (in other words, systems that have unpredictable or changing patterns).</p>
<p>Here's the full code (which we'll break down below):</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> filterpy.kalman <span class="hljs-keyword">import</span> ExtendedKalmanFilter <span class="hljs-keyword">as</span> EKF
<span class="hljs-keyword">from</span> filterpy.common <span class="hljs-keyword">import</span> Q_discrete_white_noise

def fx(x, dt):
    <span class="hljs-string">""</span><span class="hljs-string">" State transition function for the nonlinear system. "</span><span class="hljs-string">""</span>
    # Example: x<span class="hljs-string">' = [x[0] + x[1]*dt, x[1]]
    F = np.array([x[0] + x[1]*dt, x[1]])
    return F

def hx(x):
    """ Measurement function for the nonlinear system. """
    # Example: z = [x[0]]
    return np.array([x[0]])

def jacobian_F(x, dt):
    """ Jacobian of the state transition function. """
    return np.array([[1, dt],
                     [0, 1]])

def jacobian_H(x):
    """ Jacobian of the measurement function. """
    return np.array([[1, 0]])

# Initialize EKF
ekf = EKF(dim_x=2, dim_z=1)

# Initial state
ekf.x = np.array([0, 1])

# Initial state covariance
ekf.P = np.eye(2)

# Process noise covariance
ekf.Q = Q_discrete_white_noise(dim=2, dt=1, var=0.1)

# Measurement noise covariance
ekf.R = np.array([[0.1]])

# Define the state transition and measurement functions
ekf.F = jacobian_F
ekf.H = jacobian_H

# Control input
dt = 1.0  # time step

# Simulated measurements
measurements = [1, 2, 3, 4, 5]

for z in measurements:
    # Predict step
    ekf.predict_update(z, HJacobian=jacobian_H, Hx=hx, Fx=fx, args=(dt,), hx_args=())

    # Print the current state estimate
    print("Estimated state:", ekf.x)</span>
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/1-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full code</em></p>
<p>Let's see the code block by block.</p>
<h3 id="heading-import-the-libraries">Import the Libraries</h3>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> filterpy.kalman <span class="hljs-keyword">import</span> ExtendedKalmanFilter <span class="hljs-keyword">as</span> EKF
<span class="hljs-keyword">from</span> filterpy.common <span class="hljs-keyword">import</span> Q_discrete_white_noise
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/2-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing Libraries</em></p>
<p>In this part of the code we import the Python libraries we need:</p>
<ul>
<li><strong><code>import numpy as np</code></strong>: This imports a tool called <a target="_blank" href="https://numpy.org/">NumPy</a>, which helps us work with numbers and lists of numbers (like a spreadsheet).</li>
<li><strong><code>from filterpy.kalman import ExtendedKalmanFilter as EKF</code></strong>: This brings in the <code>ExtendedKalmanFilter</code> class from the <a target="_blank" href="https://filterpy.readthedocs.io/en/latest/">filterpy</a> library. We will use this tool, named <code>EKF</code> here, to track things that change over time in a way that's not straight-line simple.</li>
<li><strong><code>from filterpy.common import Q_discrete_white_noise</code></strong>: This imports a function that helps us set up noise, which is like the natural "fuzziness" or uncertainty in our system.</li>
</ul>
<h3 id="heading-define-how-the-system-works">Define How the System Works</h3>
<pre><code>def fx(x, dt):
    """ State transition function for the nonlinear system. """
    # Example: x' = [x[0] + x[1]*dt, x[1]]
    return np.array([x[0] + x[1]*dt, x[1]])

def hx(x):
    """ Measurement function for the nonlinear system. """
    # Example: z = [x[0]]
    return np.array([x[0]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/3-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define How the System Works</em></p>
<p>In this code we define how the system will work:</p>
<ul>
<li><strong><code>fx(x, dt)</code></strong>: This function describes how our system changes over time. It says the new position is the old position plus speed times time (<code>x[0] + x[1]*dt</code>). The speed (<code>x[1]</code>) stays the same.</li>
<li><strong><code>hx(x)</code></strong>: This function tells us what we can measure from the system. Here, it says we can measure the position (<code>x[0]</code>).</li>
</ul>
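<p>To make the transition function concrete, we can step it forward by hand (a small check of ours, using the same <code>fx</code> as above). Starting from position 0 and speed 1 with <code>dt = 1.0</code>, each application of <code>fx</code> advances the position by the speed:</p>

```python
import numpy as np

def fx(x, dt):
    """State transition: new position = position + speed * dt; speed unchanged."""
    return np.array([x[0] + x[1] * dt, x[1]])

state = np.array([0.0, 1.0])   # position 0, speed 1
positions = [state[0]]
for _ in range(4):
    state = fx(state, 1.0)
    positions.append(state[0])

print(positions)   # the position advances by 1 each step
```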
<h3 id="heading-define-how-changes-affect-the-system">Define How Changes Affect the System</h3>
<pre><code>def jacobian_F(x, dt):
    """ Jacobian of the state transition function. """
    return np.array([[1, dt],
                     [0, 1]])

def jacobian_H(x):
    """ Jacobian of the measurement function. """
    return np.array([[1, 0]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/4-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define How Changes Affect the System</em></p>
<p>In this code we define how changes affect the system:</p>
<ul>
<li><strong><code>jacobian_F(x, dt)</code></strong>: This function shows us how sensitive the system is to changes in time and position. It helps the filter predict changes more accurately by considering these sensitivities.</li>
<li><strong><code>jacobian_H(x)</code></strong>: This function tells us how sensitive our measurement is to changes in position. It helps the filter adjust the prediction based on new measurements.</li>
</ul>
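<p>One way to sanity-check hand-written Jacobians like these (a common trick, not shown in the article) is to compare them against a finite-difference approximation of the original function. Since our <code>fx</code> is linear, the two should match almost exactly:</p>

```python
import numpy as np

def fx(x, dt):
    # Same transition function as in the article.
    return np.array([x[0] + x[1] * dt, x[1]])

def jacobian_F(x, dt):
    # Hand-written Jacobian of fx.
    return np.array([[1.0, dt], [0.0, 1.0]])

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate df/dx column by column with central differences."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.column_stack(cols)

x, dt = np.array([0.5, 1.2]), 1.0
analytic = jacobian_F(x, dt)
numeric = numerical_jacobian(lambda v: fx(v, dt), x)

print(np.allclose(analytic, numeric, atol=1e-6))  # prints True
```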
<h3 id="heading-set-up-the-kalman-filter">Set Up the Kalman Filter</h3>
<pre><code># Initialize EKF
ekf = EKF(dim_x=<span class="hljs-number">2</span>, dim_z=<span class="hljs-number">1</span>)

# Initial state
ekf.x = np.array([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
print(<span class="hljs-string">"Initial state:"</span>, ekf.x)

# Initial state covariance
ekf.P = np.eye(<span class="hljs-number">2</span>)
print(<span class="hljs-string">"Initial state covariance:\n"</span>, ekf.P)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/5-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Set Up the Kalman Filter</em></p>
<p>In this part of the code, we create a very simple Kalman filter:</p>
<ul>
<li><strong><code>ekf = EKF(dim_x=2, dim_z=1)</code></strong>: This creates an Extended Kalman Filter that tracks two things (position and speed) and one measurement (position).</li>
<li><strong><code>ekf.x = np.array([0, 1])</code></strong>: This sets the starting position to <code>0</code> and speed to <code>1</code>.</li>
</ul>
<p>It prints out:</p>
<pre><code>Initial state: [<span class="hljs-number">0</span> <span class="hljs-number">1</span>]
</code></pre><ul>
<li><strong><code>ekf.P = np.eye(2)</code></strong>: This is a way of saying we aren't very sure about our starting guesses. It's like saying "let's start from here, but we are open to changes."</li>
</ul>
<p>It prints out:</p>
<pre><code>Initial state covariance:
 [[<span class="hljs-number">1.</span> <span class="hljs-number">0.</span>]
 [<span class="hljs-number">0.</span> <span class="hljs-number">1.</span>]]
</code></pre><h3 id="heading-describe-uncertainty-in-the-system">Describe Uncertainty in the System</h3>
<pre><code># Process noise covariance
ekf.Q = Q_discrete_white_noise(dim=<span class="hljs-number">2</span>, dt=<span class="hljs-number">1</span>, <span class="hljs-keyword">var</span>=<span class="hljs-number">0.1</span>)
print(<span class="hljs-string">"Process noise covariance:\n"</span>, ekf.Q)

# Measurement noise covariance
ekf.R = np.array([[<span class="hljs-number">0.1</span>]])
print(<span class="hljs-string">"Measurement noise covariance:\n"</span>, ekf.R)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/6-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Describe Uncertainty in the System</em></p>
<ul>
<li><strong><code>ekf.Q = Q_discrete_white_noise(dim=2, dt=1, var=0.1)</code></strong>: This sets how much randomness or unpredictability we expect in the system itself. It's like saying, "things might not move exactly as we think."</li>
</ul>
<p>It prints out:</p>
<pre><code>Process noise covariance:
 [[<span class="hljs-number">0.025</span> <span class="hljs-number">0.05</span> ]
 [<span class="hljs-number">0.05</span>  <span class="hljs-number">0.1</span>  ]]
</code></pre><ul>
<li><strong><code>ekf.R = np.array([[0.1]])</code></strong>: This sets how much we trust our measurements. A smaller number means we trust them more.</li>
</ul>
<pre><code>Measurement noise covariance:
 [[<span class="hljs-number">0.1</span>]]
</code></pre><h3 id="heading-simulate-data-and-initial-state">Simulate Data and Initial State</h3>
<pre><code># Control input
dt = <span class="hljs-number">1.0</span>  # time step

# Simulated measurements
measurements = [<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>]

# True initial state <span class="hljs-keyword">for</span> comparison (not used <span class="hljs-keyword">in</span> the EKF)
true_state = np.array([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
print(<span class="hljs-string">"\nTrue initial state:"</span>, true_state)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/7-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simulate Data and Initial State</em></p>
<ul>
<li><strong><code>dt = 1.0</code></strong>: This is the time between each step of our simulation.</li>
<li><strong><code>measurements = [1, 2, 3, 4, 5]</code></strong>: These are the pretend measurements we will use to test the filter.</li>
<li><strong><code>true_state = np.array([0, 1])</code></strong>: This is the real starting position and speed of our system, used for comparison.</li>
</ul>
<p>It gives:</p>
<pre><code>True initial state: [<span class="hljs-number">0</span> <span class="hljs-number">1</span>]
</code></pre><h3 id="heading-simulate-real-system-changes">Simulate Real System Changes</h3>
<pre><code># Simulate the <span class="hljs-literal">true</span> state evolution (<span class="hljs-keyword">for</span> comparison)
true_states = [true_state[<span class="hljs-number">0</span>]]
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(len(measurements) - <span class="hljs-number">1</span>):
    true_state = fx(true_state, dt)
    true_states.append(true_state[<span class="hljs-number">0</span>])

print(<span class="hljs-string">"\nSimulated true states (for reference):"</span>, true_states)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/8-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simulate Real System Changes</em></p>
<ul>
<li><strong>Simulating True States</strong>: This part calculates what the real position should be over time using the way the system works (<code>fx</code>). It's like having a perfect GPS to check against our estimates.</li>
</ul>
<pre><code>Simulated <span class="hljs-literal">true</span> states (<span class="hljs-keyword">for</span> reference): [<span class="hljs-number">0</span>, <span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>, <span class="hljs-number">4.0</span>]
</code></pre><h3 id="heading-filter-steps-to-estimate-the-state">Filter Steps to Estimate the State</h3>
<pre><code><span class="hljs-keyword">for</span> i, z <span class="hljs-keyword">in</span> enumerate(measurements):
    print(f<span class="hljs-string">"\nStep {i+1}:"</span>)
    print(<span class="hljs-string">"Measurement:"</span>, z)

    # Predict step
    ekf.predict(u=<span class="hljs-number">0</span>)  # Use predict_x <span class="hljs-keyword">if</span> you need to customize the prediction
    print(<span class="hljs-string">"Predicted state before update:"</span>, ekf.x)

    # Update step
    ekf.update(z, HJacobian=jacobian_H, Hx=hx, args=(), hx_args=())
    print(<span class="hljs-string">"Updated state after measurement:"</span>, ekf.x)
    print(<span class="hljs-string">"State covariance after update:\n"</span>, ekf.P)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/9.png" alt="Image" width="600" height="400" loading="lazy">
<em>Filter Steps to Estimate the State</em></p>
<p><strong>Loop Through Measurements</strong>: This loop goes through each fake measurement one by one.</p>
<ul>
<li><strong>Predict Step (<code>ekf.predict(u=0)</code>)</strong>: Before looking at the new measurement, the filter makes a guess about where the position and speed are now.</li>
<li><strong>Update Step (<code>ekf.update</code>)</strong>: After the guess, the filter sees the new measurement and adjusts its guess to be closer to this measurement, balancing the new information with what it previously predicted.</li>
</ul>
<p>Here are the results:</p>
<pre><code>Step <span class="hljs-number">1</span>:
Measurement: <span class="hljs-number">1</span>
Predicted state before update: [<span class="hljs-number">0.</span> <span class="hljs-number">1.</span>]
Updated state after measurement: [<span class="hljs-number">0.91111111</span> <span class="hljs-number">1.04444444</span>]
State covariance after update:
 [[<span class="hljs-number">0.09111111</span> <span class="hljs-number">0.00444444</span>]
 [<span class="hljs-number">0.00444444</span> <span class="hljs-number">1.09777778</span>]]

Step <span class="hljs-number">2</span>:
Measurement: <span class="hljs-number">2</span>
Predicted state before update: [<span class="hljs-number">0.91111111</span> <span class="hljs-number">1.04444444</span>]
Updated state after measurement: [<span class="hljs-number">1.49614396</span> <span class="hljs-number">1.31876607</span>]
State covariance after update:
 [[<span class="hljs-number">0.05372751</span> <span class="hljs-number">0.0251928</span> ]
 [<span class="hljs-number">0.0251928</span>  <span class="hljs-number">1.1840617</span> ]]

Step <span class="hljs-number">3</span>:
Measurement: <span class="hljs-number">3</span>
Predicted state before update: [<span class="hljs-number">1.49614396</span> <span class="hljs-number">1.31876607</span>]
Updated state after measurement: [<span class="hljs-number">2.15857605</span> <span class="hljs-number">1.95145631</span>]
State covariance after update:
 [[<span class="hljs-number">0.0440489</span>  <span class="hljs-number">0.0420712</span> ]
 [<span class="hljs-number">0.0420712</span>  <span class="hljs-number">1.25242718</span>]]

Step <span class="hljs-number">4</span>:
Measurement: <span class="hljs-number">4</span>
Predicted state before update: [<span class="hljs-number">2.15857605</span> <span class="hljs-number">1.95145631</span>]
Updated state after measurement: [<span class="hljs-number">2.91071524</span> <span class="hljs-number">2.95437384</span>]
State covariance after update:
 [[<span class="hljs-number">0.04084552</span> <span class="hljs-number">0.05446424</span>]
 [<span class="hljs-number">0.05446424</span> <span class="hljs-number">1.30228131</span>]]

Step <span class="hljs-number">5</span>:
Measurement: <span class="hljs-number">5</span>
Predicted state before update: [<span class="hljs-number">2.91071524</span> <span class="hljs-number">2.95437384</span>]
Updated state after measurement: [<span class="hljs-number">3.74022237</span> <span class="hljs-number">4.27039095</span>]
State covariance after update:
 [[<span class="hljs-number">0.03970292</span> <span class="hljs-number">0.06298888</span>]
 [<span class="hljs-number">0.06298888</span> <span class="hljs-number">1.33648045</span>]]
</code></pre><h2 id="Beyond">Conclusion: Navigating Nonlinear Data with Advanced Techniques</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/pexels-noellegracephotos-906055.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/close-up-photography-of-magnifying-glass-906055/">Noelle Otto on Pexels</a></em></p>
<p>Kalman Filters are a powerful tool for extracting accurate estimates from noisy and incomplete data. </p>
<p>Variants like the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) have been developed to address non-linearities in data. </p>
<p>However, these variants can still face challenges related to stability and accuracy when applied to complex non-linear systems. </p>
<p>This is due to their reliance on linear approximations, which may not capture the full dynamics of highly non-linear processes.</p>
<p>To overcome these limitations, alternative methods such as Neural Network-based approaches have gained attention.</p>
<p>Neural Networks can learn complex patterns directly from data, offering a robust solution for highly non-linear scenarios.</p>
<p>Despite these advancements, Kalman Filters remain an important tool in various fields of science and economics due to their simplicity, efficiency, and effectiveness in a wide range of applications. </p>
<p>As technology continues to evolve, the integration of Kalman Filters with other advanced techniques will likely enhance their capability to navigate the challenges of non-linear data more effectively.</p>
<p>Here is the full code:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Rocket Control System: Basic Control Theory with Python ]]>
                </title>
                <description>
                    <![CDATA[ Building any control systems, including a rocket control system, involves combining control theory with programming. Control theory is the study of how to make systems behave in a desired way using inputs. Planes, cars, trains, circuits, rockets and ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/basic-control-theory-with-python/</link>
                <guid isPermaLink="false">66ba531cf77647345442b9cf</guid>
                
                    <category>
                        <![CDATA[ control theory ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Control Systems ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ System Design ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Tue, 06 Aug 2024 14:26:44 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/08/pexels-pixabay-2159.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building any control systems, including a rocket control system, involves combining control theory with programming.</p>
<p>Control theory is the study of how to make systems behave in a desired way using inputs.</p>
<p>Planes, cars, trains, circuits, rockets and many more systems need to have a brain or an architecture inside them.</p>
<p>Control theory is the study of the control architectures of these complex systems.</p>
<p>In this article, we will explore how to apply control theory to create a rocket control system using Python.</p>
<p>This is a simple guide to how the architecture of complex systems is created. In this case, it's based on a rocket.</p>
<p>In this article, you will learn about:</p>
<ul>
<li><a class="post-section-overview" href="#heading-rocket-systems-and-cake-baking-a-fun-comparison">Rocket Systems and Cake Baking: A Fun Comparison</a></li>
<li><a class="post-section-overview" href="#heading-rocket-control-made-simple-understanding-pid-controllers">Rocket Control Made Simple: Understanding PID Controllers</a></li>
<li><a class="post-section-overview" href="#heading-code-example-designing-a-simple-pid-controller">Code example: Designing a simple PID controller</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-non-linear-control-systems">Conclusion: Non-linear control systems</a></li>
</ul>
<p><strong>Note:</strong> We'll assume the rocket is time-invariant, meaning its behavior doesn't change over time. Addressing time-varying dynamics would complicate this tutorial more than I'd like. </p>
<h2 id="Cake">Rocket Systems and Cake Baking: A Fun Comparison</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/cake.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/white-icing-cover-cake-1702373/">Brent Keane on Pexels</a></em></p>
<h3 id="heading-what-is-a-rocket-control-system">What is a Rocket Control System?</h3>
<p>Imagine you are baking a cake. Your recipe provides the steps and ingredients needed to bake the cake.</p>
<p>In this analogy:</p>
<ul>
<li>The cake is the rocket</li>
<li>The recipe is the rocket flight plan</li>
<li>The baker's actions are the rocket control system</li>
</ul>
<p>Just as you change the oven temperature or mixing time to get the best cake, a control system adjusts the rocket's parameters to ensure it stays on its course and remains stable.</p>
<h3 id="heading-why-are-control-systems-important-in-programming">Why are control systems important in programming?</h3>
<p>By understanding control systems, you'll become better at algorithmic design and systems thinking.</p>
<p>It also enables you to figure out how to adjust processes in feedback loops. This is very important in many areas of programming.</p>
<p>You'll mainly use control theory and control systems when creating software for:</p>
<ul>
<li><strong>Robotics and Automation</strong>: Control systems enable precise movement and adaptability in robots using feedback loops based on sensor input.</li>
<li><strong>Signal Processing and Communication</strong>: They optimize data transmission, error correction, and filtering for reliable communication.</li>
<li><strong>Embedded Systems and IoT</strong>: Control systems manage device interactions with environments, processing sensor inputs and adjusting outputs efficiently.</li>
</ul>
<h3 id="heading-how-to-create-a-rocket-control-system">How to Create a Rocket Control System</h3>
<p>In terms of our cake baking analogy:</p>
<ol>
<li><strong>Choose the Cake and Recipe</strong>: Select a simple control strategy, like choosing a basic cake recipe. A common choice is a PID controller because it's simple and effective.</li>
<li><strong>Understanding the Ingredients</strong>: Derive a mathematical model of the characteristics and trajectory of the rocket. Like studying the recipe and ingredients. This way, we get a clear understanding of the system.</li>
<li><strong>Gathering Initial Ingredients</strong>: Set initial parameters, similar to gathering your basic ingredients. </li>
<li><strong>Mixing and Baking</strong>: Adjust and test the system, much like mixing ingredients and baking. This involves using various graphs to check stability and performance.</li>
<li><strong>Adding Final Touches</strong>: Fine-tune the parameters, just like adding final decorations to your cake, to optimize the control system for efficiency.</li>
<li><strong>Following the Recipe</strong>: Convert your design into a practical form, like carefully following a cake recipe.</li>
</ol>
<h2 id="Rocket">Rocket Control Made Simple: Understanding PID Controllers</h2>

<h3 id="heading-a-simple-control-system-the-pid-controller">A simple control system: The PID controller</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/M6_ControlSystemsdiagram.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example of control system diagram (<a target="_blank" href="https://edtech.engineering.utoronto.ca/object/control-systems-diagrams">source</a>)</em></p>
<p>Every control system has a controller that runs it. One of the most used controllers is the PID controller.</p>
<p>In the code example here, we will use the PID controller. This is because it's simple and effective for simple control systems.</p>
<p>In a rocket control system, the rocket's PID controller constantly adjusts the rocket's path (processing block) by comparing its current position to where it should be (feedback block).</p>
<p>This way, the rocket stays on course and reaches its final destination.</p>
<p>The PID controller has three key parts that work in the processing and feedback part of the system: proportional gain (Kp), integral gain (Ki), and derivative gain (Kd).</p>
<ul>
<li><strong>The proportional gain (Kp):</strong> Reacts immediately to any error, making the system respond quickly but sometimes causing it to overshoot the target.</li>
<li><strong>The integral gain (Ki):</strong> Fixes past errors by adding them up over time, getting rid of any leftover errors, but it can make the system unstable if set too high.</li>
<li><strong>The derivative gain (Kd):</strong> Predicts future errors to help prevent overshooting and smooth out the system's response.</li>
</ul>
<p>This is why it's called a PID (Proportional-Integral-Derivative) controller.</p>
<p>These three parts work together to create a control signal that changes the rocket's setting. This ensures that it's stable, accurate and effective.</p>
<p>With the PID controller, we can control how the inputs like thrust and altitude change the position and speed to ensure the rocket is stable and on its intended path.</p>
<h3 id="heading-analyzing-stability">Analyzing Stability</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/stability.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/closeup-photography-of-stacked-stones-1051449/">Shiva Smyth on Pexels</a></em></p>
<p>To design a PID controller means to design a stable control system.</p>
<p>The process of designing a stable control system is called stability analysis.</p>
<p>There are many methods, but in the code example we will use:</p>
<ul>
<li><strong>Root locus:</strong> Shows system stability and response</li>
<li><strong>Bode plot:</strong> Displays system <em>gain</em> and <em>phase margins</em></li>
<li><strong>Nyquist plot:</strong> Illustrates stability and potential oscillations</li>
</ul>
<p>In this case, the gain and phase margins simply mean that the control system can tolerate changes.</p>
<p>The gain margin tells us how much the system gain can increase without losing stability. Gain is how much the input signal is amplified to produce the output signal.</p>
<p>The phase margin tells us how much delay is tolerable without losing stability. Delay in control theory means how much time it takes for the output to respond to the input.</p>
<p>This tells us how to change the Kp, Ki, and Kd so that the PID controller can control the rocket in an effective manner.</p>
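<p>As a rough sketch, the phase margin can be estimated numerically by sweeping the open-loop frequency response and finding where the gain crosses 0 dB. The transfer functions below are taken from the later code example (reusing them here is an assumption for illustration):</p>

```python
import numpy as np

def open_loop(w):
    # L(jw) = C(jw) * G(jw), with C and G from the later code example
    s = 1j * w
    C = (s**2 + 5*s + 2) / s         # PID: Kd=1, Kp=5, Ki=2
    G = 10 / (2*s**2 + 2*s + 1)      # rocket transfer function
    return C * G

w = np.logspace(-2, 3, 200_000)          # frequency sweep
mag = np.abs(open_loop(w))
k = np.argmin(np.abs(mag - 1.0))         # gain crossover: |L(jw)| = 1 (0 dB)
pm = 180.0 + np.degrees(np.angle(open_loop(w[k])))
print(f"phase margin ~ {pm:.0f} degrees")
```

<p>A positive phase margin of a few tens of degrees suggests the loop can tolerate extra delay before going unstable.</p>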
<h3 id="heading-the-need-for-transfer-functions-controlling-the-rocket-and-determining-component-values">The Need for Transfer Functions: Controlling the Rocket and Determining Component Values</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/Transfer-function-v2.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>To implement any control system, we need two transfer functions: one theoretical and one physical.</p>
<p>Transfer functions tell us how inputs convert to outputs in a mathematical way.</p>
<p>The theoretical function is, in this case, the PID controller.</p>
<p>The physical system transfer function represents real-world dynamics and behavior of the physical components in the system.</p>
<p>By combining both, we can understand the behavior of materials and component values such as:</p>
<ul>
<li>Capacitor values for energy storage</li>
<li>Sensor calibration values for accurate data measurement and feedback</li>
<li>Spring constants for shock absorption systems</li>
<li>Pressure ratings for fuel and oxidizer tanks</li>
</ul>
<p>This way, the PID controller is not only the brain of the rocket but also can tell us the values of the components needed so that the rocket can fly its path.</p>
<h3 id="heading-how-do-engineers-find-the-physical-transfer-function-equation">How do engineers find the physical transfer function equation?</h3>
<p>First, we need to understand what the rocket is for.</p>
<p>Will it send a LEO (Low Earth Orbit) or MEO (Medium Earth Orbit) satellite to space, or a rocket to the moon?</p>
<p>After knowing its use case, we can, with math and physics, find the physical equation of the transfer function.</p>
<p>There is actually an entire field of engineering called <strong>system identification</strong> dedicated to this.</p>
<p>Now let's see how to find, for any control system, its physical transfer function.</p>
<h2 id="Code">Code example: Designing a simple PID controller</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/rocket.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/space-rocket-launching-73871/">Pixabay</a></em></p>
<p>Now with this code example, we will create a simple control system for a rocket.</p>
<p>Before we dive into the code, let's talk about decibels.</p>
<p>Decibels are a logarithmic unit, best known for measuring sound. In control theory, they measure gain in a way that's easier to visualize on graphs.</p>
<p>This way, both very large and very small values fit in a manageable range.</p>
<p>In other words, the gain in decibels tells us, at a glance, how much the input is amplified to produce the output.</p>
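<p>The conversion itself is one line. A quick sketch of the formula used on the Bode magnitude plot:</p>

```python
import math

def to_db(gain):
    # Gain expressed in decibels, as on a Bode magnitude plot
    return 20 * math.log10(gain)

print(to_db(10.0))   # a 10x amplification is +20 dB
print(to_db(0.5))    # attenuation (gain below 1) comes out negative
```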
<p>I'll also explain how root locus, Bode plot, and Nyquist plots assist engineers in stability analysis.</p>
<p>Let's see the code – and then we'll analyze it block by block:</p>
<pre><code># Step 1: Import libraries
import matplotlib.pyplot as plt
import control as ctrl

# Step 2: Define the rocket transfer function
num = [10]
den = [2, 2, 1]
G = ctrl.TransferFunction(num, den)

# Step 3: Design a PID controller
Kp = 5
Ki = 2
Kd = 1
C = ctrl.TransferFunction([Kd, Kp, Ki], [1, 0])

# Step 4: Apply the PID controller to the rocket transfer function
CL = ctrl.feedback(C * G, 1)

# Step 5: Plot the root locus
plt.figure(figsize=(10, 6))
ctrl.root_locus(C * G, grid=True)
plt.title("Root Locus Plot (Closed-Loop)")

# Step 6: Plot the Bode plot of the closed-loop system
plt.figure(figsize=(10, 6))
ctrl.bode_plot(CL, dB=True, Hz=False, deg=True)
plt.suptitle("Bode Plot (Closed-Loop)", fontsize=16)

# Step 7: Plot the Nyquist plot of the closed-loop system
plt.figure(figsize=(10, 6))
ctrl.nyquist_plot(CL)
plt.title("Nyquist Plot (Closed-Loop)")

plt.show()
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full Code</em></p>
<h3 id="heading-step-1-import-libraries">Step 1: Import libraries</h3>
<pre><code><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> control <span class="hljs-keyword">as</span> ctrl
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries</em></p>
<p>Here we import two libraries:</p>
<ul>
<li><a target="_blank" href="https://matplotlib.org/">matplotlib</a>: A plotting library for creating various types of visualizations</li>
<li><a target="_blank" href="https://python-control.readthedocs.io/en/0.10.0/">Control</a>: A library for analyzing and designing control systems</li>
</ul>
<h3 id="heading-step-2-define-the-transfer-function-of-the-rocket-system">Step 2: Define the Transfer Function of the Rocket System</h3>
<pre><code>num = [<span class="hljs-number">10</span>] 
den = [<span class="hljs-number">2</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>] 
G = ctrl.TransferFunction(num, den)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define the Transfer Function of the Rocket System</em></p>
<p>In this code we define the transfer function of the physical system:</p>
<ul>
<li><strong><code>num=[10]</code></strong>: Sets the system gain to 10.</li>
<li><strong><code>den=[2,2,1]</code></strong>: Defines the denominator coefficients, representing the polynomial 2s&#178; + 2s + 1.</li>
<li><strong><code>G = ctrl.TransferFunction(num, den)</code></strong>: Constructs the transfer function.</li>
</ul>
<p>This is the transfer function we are going to control with PID:</p>
    <div class="card">
        <div class="card-body">
            <p>
                $$G(s) = \frac{10}{2s^2 + 2s + 1}$$
            </p>
            <h5 id="heading-rocket-transfer-function">Rocket transfer function
          </h5>
        </div>
    </div>
<p>In this code example, the transfer function rocket equation is very simple. But in real life, rocket transfer functions are not time-invariant linear systems. Usually, they are very complex non-linear systems.</p>
<h3 id="heading-step-3-design-a-pid-controller-with-new-parameters">Step 3: Design a PID controller with new parameters</h3>
<pre><code>Kp = <span class="hljs-number">5</span>
Ki = <span class="hljs-number">2</span>
Kd = <span class="hljs-number">1</span>
C = ctrl.TransferFunction([Kd, Kp, Ki], [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Design a PID controller with new parameters</em></p>
<p>This code sets up a PID controller with specific gains and creates a transfer function:</p>
<ul>
<li><strong><code>Kp = 5</code></strong>: Sets the proportional gain to 5.</li>
<li><strong><code>Ki = 2</code></strong>: Sets the integral gain to 2.</li>
<li><strong><code>Kd = 1</code></strong>: Sets the derivative gain to 1.</li>
<li><strong><code>C = ctrl.TransferFunction([Kd, Kp, Ki], [1, 0])</code></strong>: Creates a transfer function of the PID controller</li>
</ul>
<h3 id="heading-step-4-applying-the-pid-controller-to-the-rocket-transfer-function">Step 4: Applying the PID controller to the rocket transfer function</h3>
<pre><code>CL = ctrl.feedback(C * G, <span class="hljs-number">1</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/5.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying the PID controller to the rocket transfer function</em></p>
<ul>
<li><strong><code>C * G</code></strong>: Multiplies the PID controller <code>C</code> with the system <code>G</code> (the rocket) to form the open-loop transfer function, which models the system's behavior without feedback and relies on predefined settings.</li>
<li><strong><code>ctrl.feedback(C * G, 1)</code></strong>: Computes the closed-loop transfer function by applying feedback and representing the system's behavior with feedback. This allows it to adjust inputs and automatically correct errors.</li>
<li><strong><code>CL</code></strong>: Stores the resulting closed-loop system, integrating the controller with the rocket to maintain desired performance through feedback, and is used for further analysis or simulation.</li>
</ul>
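<p><code>ctrl.feedback(C * G, 1)</code> implements the standard unity-feedback formula CL = CG / (1 + CG). A toy sketch with static, frequency-independent gains (the numbers are hypothetical) shows why closing the loop helps: the closed-loop gain barely moves even when the plant gain is halved.</p>

```python
def closed_loop_gain(C, G):
    # Unity-feedback formula with plain numbers instead of transfer functions
    return (C * G) / (1 + C * G)

nominal = closed_loop_gain(C=100.0, G=10.0)    # plant gain 10
perturbed = closed_loop_gain(C=100.0, G=5.0)   # plant gain halved

print(nominal, perturbed)   # both stay close to 1
```

<p>Without feedback, halving the plant gain would halve the output; with feedback, the output changes by a fraction of a percent.</p>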
<h3 id="heading-step-5-root-locus-for-gain-analysis">Step 5: Root Locus for gain analysis</h3>
<p>In this code:</p>
<pre><code>plt.figure(figsize=(<span class="hljs-number">10</span>, <span class="hljs-number">6</span>))
ctrl.root_locus(C * G, grid=True)
plt.title(<span class="hljs-string">"Root Locus Plot (Closed-Loop)"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/6.png" alt="Image" width="600" height="400" loading="lazy">
<em>Create the Root Locus Graph</em></p>
<p>We generate this plot:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/root-locus.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simple Root Locus Graph</em></p>
<p>This is a root locus graph. It was invented to help engineers study the stability of control systems.</p>
<p>The crosses on the graph, called poles, are very important.</p>
<p>If they are on the left side of the graph, the system is stable. If they are on the right side, the system is unstable.</p>
<p>The more to the left they are, the quicker the system will return to normal after a disturbance, and thus, the more stable it will be.</p>
<p>But moving more to the left can cause too many oscillations, depending on their specific locations.</p>
<p>The key point is:</p>
<ul>
<li>Tuning <strong><code>Kp</code></strong>, <strong><code>Ki</code></strong>, and <strong><code>Kd</code></strong> moves the poles as far left as possible without causing oscillations.</li>
</ul>
<p>However, the root locus graph is not enough to ensure stability. We need to use the Bode and Nyquist plots as well. Only with them can we get the best PID controller values for the rocket control system.</p>
<h3 id="heading-step-6-bode-plot-for-stability-analysis">Step 6: Bode Plot for Stability Analysis</h3>
<p>In this code:</p>
<pre><code>plt.figure(figsize=(<span class="hljs-number">10</span>, <span class="hljs-number">6</span>))
ctrl.bode_plot(CL, dB=True, Hz=False, deg=True)
plt.suptitle(<span class="hljs-string">"Bode Plot (Closed-Loop)"</span>, fontsize=<span class="hljs-number">16</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/7.png" alt="Image" width="600" height="400" loading="lazy">
<em>Create the Bode Plot Graph</em></p>
<p>We generate this plot:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/bode.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simple Bode Plot</em></p>
<p>The Bode plot was invented to help engineers understand how a system responds to changes and how stable it will be under different conditions.</p>
<p>The Bode plot also shows the system's stability and safety margins.</p>
<p>Let's understand how it works:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/detail-bode.png" alt="Image" width="600" height="400" loading="lazy">
<em>Bode Plot in detail</em></p>
<p>The graph on top is called the Magnitude Plot and the one below it is called the Phase Plot.</p>
<p>The magnitude plot measures the gain of a system across different frequencies. Higher gain means quicker and stronger reactions, which is good for precise control.</p>
<p>The phase plot measures the phase shift introduced by the system across different frequencies. The phase shift is read at the frequency where the gain crosses 0 dB.</p>
<p>In this case, the green line marks where the gain is zero, and the red line shows the phase shift at that frequency: approximately 63 degrees.</p>
<p>An ideal range is a phase shift of 30 to 60 degrees, which balances stability and response speed.</p>
<p>Over 60 degrees, the system is very stable, but might slow down the system response to changes.</p>
<p>So after analyzing the plot, we can conclude this PID controller is stable.</p>
<h3 id="heading-step-7-nyquist-plot-for-stability-analysis">Step 7: Nyquist Plot for Stability Analysis</h3>
<p>In this code:</p>
<pre><code>plt.figure(figsize=(<span class="hljs-number">10</span>, <span class="hljs-number">6</span>))
ctrl.nyquist_plot(CL)
plt.title(<span class="hljs-string">"Nyquist Plot (Closed-Loop)"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/08/8.png" alt="Image" width="600" height="400" loading="lazy">
<em>Create the Nyquist Plot Graph</em></p>
<p>We generate this plot:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/nyquist.png" alt="Image" width="600" height="400" loading="lazy">
<em>Nyquist Plot Graph</em></p>
<p>The Nyquist Plot is a tool to help engineers quickly check if a control system is stable or not.</p>
<p>It is very simple:</p>
<ul>
<li>If the plot does not encircle the red cross at the point (-1, 0), the system is stable.</li>
<li>If the plot makes clockwise encirclements of the red cross at the point (-1, 0), the system is unstable.</li>
</ul>
<p>Since there are no encirclements of the red cross, this control system is stable.</p>
<h3 id="heading-last-step-after-designing-the-rocket-control-system">Last step after designing the rocket control system</h3>
<p>After completing the design of this PID control system, we can use tools like <a target="_blank" href="https://www.mathworks.com/products/simulink.html">Simulink</a> to find the necessary values for many components.</p>
<p>In other words, after getting the best PID controller variables, it's time to find the physical component values of the rocket.</p>
<p>Some of these values are:</p>
<ul>
<li>Resistor values for controlling current flow</li>
<li>Capacitor values for energy storage</li>
<li>Inductor values for managing electromagnetic interference</li>
<li>Sensor calibration values for accurate data measurement and feedback</li>
<li>Strength and durability of materials for the rocket's body and fins</li>
<li>Torque and speed requirements for servo motors</li>
<li>Spring constants for shock absorption systems</li>
<li>Pressure ratings for fuel and oxidizer tanks</li>
</ul>
<p>Thanks to Simulink, we can get all these values needed to design a rocket according to its mission.</p>
<p>With a stable control system, based on a PID controller to control the physical transfer function of a rocket, we can get all the values needed for each component.</p>
<h2 id="Conclusion">Conclusion: Non-linear control systems</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/moon.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/photo-of-full-moon-975012/">Peter de Vink on Pexels</a></em></p>
<p>There are many methods available to us to optimize a Linear Time-Invariant (LTI) control system:</p>
<ol>
<li><strong>Root Locus Method</strong>: Adjust system poles to reduce oscillations.</li>
<li><strong>Bode Plot Analysis</strong>: Maintain phase margin and stability.</li>
<li><strong>Nyquist Plot</strong>: Confirm overall system stability.</li>
</ol>
<p>With these tools, it's possible to create a control system.</p>
<p>However, in this process, it is good practice to use methods like the Ziegler-Nichols method to more quickly find the best PID controller variables.</p>
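<p>For reference, the classic Ziegler-Nichols rules turn two experimentally measured quantities, the ultimate gain Ku (the proportional gain at which the loop oscillates steadily) and the oscillation period Tu, into PID gains. A sketch with hypothetical measurements:</p>

```python
def ziegler_nichols_pid(Ku, Tu):
    # Classic Ziegler-Nichols tuning rules for a full PID controller
    Kp = 0.6 * Ku
    Ki = 1.2 * Ku / Tu     # Kp / (Tu / 2)
    Kd = 0.075 * Ku * Tu   # Kp * (Tu / 8)
    return Kp, Ki, Kd

# Hypothetical measurements: loop oscillates at Ku = 10 with period Tu = 2 s
Kp, Ki, Kd = ziegler_nichols_pid(Ku=10.0, Tu=2.0)
print(Kp, Ki, Kd)   # starting-point gains, to be refined by hand
```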
<p>In our exploration, we worked with a very simple rocket system.</p>
<p>In real life, non-linear tools are used because real rocket systems are non-linear.</p>
<p>One example is adaptive control, where the control system adjusts itself in real time to handle changing conditions.</p>
<p>Another is Lyapunov's method, which is used for stability analysis in place of the three plots above.</p>
<p>Still, the process of making these control systems is always the same. This article explained how this process works and how it is applied in a time-invariant system.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build an Interpretable Artificial Intelligence Model – Simple Python Code Example ]]>
                </title>
                <description>
                    <![CDATA[ Artificial Intelligence is being used everywhere these days. And many of the groundbreaking applications come from Machine Learning, a subfield of AI. Within Machine Learning, a field called Deep Learning represents one of the main areas of research.... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-an-interpretable-ai-deep-learning-model/</link>
                <guid isPermaLink="false">66ba533a79bcbbffd5d70c56</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Tue, 23 Jul 2024 22:11:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/pexels-dmitry-demidov-515774-3852577.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Artificial Intelligence is being used everywhere these days. And many of the groundbreaking applications come from Machine Learning, a subfield of AI.</p>
<p>Within Machine Learning, a field called Deep Learning represents one of the main areas of research. It is from Deep Learning that most new, truly effective AI systems are born.</p>
<p>But typically, the AI systems born from Deep Learning are quite narrow, focused systems. They can outperform humans in one very specific area for which they were made.</p>
<p>Because of this, many new developments in AI come from specialized systems or a combination of systems working together.</p>
<p>One of the bigger problems in the field of Deep Learning models is their lack of interpretability. Interpretability means understanding how decisions are made. </p>
<p>This is a big problem that has its own field, called explainable AI. This is the field within AI that focuses on making an AI model's decisions more easily understandable.</p>
<p>Here's what we'll cover in this article:</p>
<ul>
<li><a class="post-section-overview" href="#heading-artificial-intelligence-and-the-rise-of-deep-learning">Artificial Intelligence and the Rise of Deep Learning</a></li>
<li><a class="post-section-overview" href="#heading-a-big-problem-in-deep-learning-lack-of-interpretability">A big problem in deep learning: Lack of interpretability</a></li>
<li><a class="post-section-overview" href="#heading-a-solution-to-interpretability-glass-box-models">A solution to interpretability: Glass Box models</a></li>
<li><a class="post-section-overview" href="#heading-code-example-solving-the-problem-with-explainable-ai">Code example: Solving the problem with Explainable AI</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-kan-kolmogorovarnold-networks">Conclusion: KAN (Kolmogorov–Arnold Networks)</a></li>
</ul>
<p>This article won't cover dropout or other regularization techniques, hyperparameter optimization, complex architectures like CNNs, or detailed differences in gradient descent variants.</p>
<p>We'll just discuss the basics of deep learning, the lack of interpretability problem, and a code example.</p>
<h2 id="Artificial">Artificial Intelligence and the Rise of Deep Learning</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/AI.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/robot-pointing-on-a-wall-8386440/">Tara Winstead</a></em></p>
<h3 id="heading-what-is-deep-learning-in-artificial-intelligence">What is Deep Learning in Artificial Intelligence?</h3>
<p>Deep Learning is a subfield of artificial intelligence. It uses neural networks to process complex patterns, just like the strategies a sports team uses to win a match.</p>
<p>The bigger the neural network, the more capable it is of doing awesome things – like ChatGPT, for example, which uses natural language processing to answer questions and interact with users.</p>
<p>To truly understand the basics of neural networks – what every single AI model has in common that enables it to work – we need to understand activation layers.</p>
<h3 id="heading-deep-learning-training-neural-networks">Deep Learning = Training Neural Networks</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/4-2.png" alt="4-2" width="600" height="400" loading="lazy">
<em>Simple neural network</em></p>
<p>At the core of deep learning is the training of neural networks.</p>
<p>That basically means using data to find the right values for each neuron so the network can predict what we want.</p>
<p>Neural networks are made of neurons organized in layers. Each layer extracts unique features from the data.</p>
<p>This layered structure allows deep learning models to analyze and interpret complex data.</p>
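<p>As a small illustration of that layered structure (a sketch with random placeholder weights, not a trained model), here is a forward pass through a two-layer network in NumPy:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: one sample with 4 features
x = rng.normal(size=(1, 4))

# Layer 1: 4 inputs -> 3 hidden neurons, ReLU activation extracts nonlinear features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
hidden = np.maximum(0, x @ W1 + b1)

# Layer 2: 3 hidden neurons -> 1 output, sigmoid squashes it to a probability-like score
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))

print(output.shape)  # (1, 1): one prediction per sample
```

<p>Training is the process of adjusting <code>W1</code>, <code>b1</code>, <code>W2</code>, and <code>b2</code> so the output matches the labels in the data.</p>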
<h2 id="problem">A Big Problem in Deep Learning: Lack of Interpretability</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/interptret.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/crop-unrecognizable-woman-reading-book-on-soft-bed-4170628/">Koshevaya_k</a></em></p>
<p>Deep Learning has revolutionized many fields by achieving great results in very complex tasks.</p>
<p>However, there is a big problem: the lack of interpretability.</p>
<p>While it is true that neural networks can perform very well, we don't understand how they achieve those results internally.</p>
<p>In other words, we know they do very well on the tasks we give them, but not how they do them in detail.</p>
<p>It is important to know how the model thinks in fields such as healthcare and autonomous driving.</p>
<p>By understanding how a model thinks, we can be more confident in its reliability in certain critical areas.</p>
<p>So in fields with strict regulations, interpretable models are more transparent to regulators and build more trust.</p>
<p>Models that allow interpretability are called <strong>glass box models</strong>. On the other hand, models that do not have this capability (that is, most of them) are called <strong>black box models.</strong></p>
<h2 id="solution">A Solution to Interpretability: Glass Box Models</h2>

<h3 id="heading-glass-box-models">Glass Box Models</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/glass-pixabay-416528.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Pixabay: https://www.pexels.com/photo/fluid-pouring-in-pint-glass-416528/</em></p>
<p>Glass box models are machine learning models designed to be easily understood by humans.</p>
<p>Glass box models provide clear insights into how they make their decisions.</p>
<p>This transparency in the decision-making process is important for trust, compliance, and improvement.</p>
<p>Below, we'll see a code example of an AI model that achieves 97% accuracy on a breast cancer prediction dataset.</p>
<p>We'll also find which characteristics of the data were most important in predicting the cancer.</p>
<h3 id="heading-black-box-models">Black Box Models</h3>
<p>In addition to glass box models, there are also black box models. </p>
<p>These models are essentially different neural network architectures used in various datasets. Some examples are:</p>
<ul>
<li><strong>CNN (Convolutional Neural Networks)</strong>: Designed specifically for image classification and interpretation.</li>
<li><strong>RNN (Recurrent Neural Networks) and LSTM (Long Short Term Memory)</strong>: Primarily used for sequential data – text and time series data. In 2017, they were surpassed by a neural network architecture called transformers in a paper called <a target="_blank" href="https://arxiv.org/abs/1706.03762">Attention is All You Need.</a></li>
<li><strong>Transformer-based architectures</strong>: Revolutionized AI in 2017 due to their ability to handle sequential data more efficiently. RNN and LSTM have limited capabilities in this regard.</li>
</ul>
<p>Nowadays, most models that process text are transformer-based models.</p>
<p>For instance, in ChatGPT, <strong>GPT</strong> stands for <strong>Generative Pre-trained Transformer</strong>, indicating a transformer neural network architecture that generates text.</p>
<p>All these models—CNN, RNN, LSTM and Transformers—are examples of narrow artificial intelligence (AI).</p>
<p>Achieving general intelligence, in my view, involves combining many of these narrow AI models to mimic human behavior.</p>
<h2 id="example">Code Example: Solving the Problem with Explainable AI</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/cancer-chokniti-khongchum-1197604-2280571.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Chokniti Khongchum: https://www.pexels.com/photo/person-holding-laboratory-flask-2280571/</em></p>
<p>In this code example, we will create an interpretable AI model based on 30 characteristics.</p>
<p>We'll also learn which five characteristics are most important in detecting breast cancer, based on this dataset.</p>
<p>We will use a machine learning glass box model called the Explainable Boosting Machine (EBM).</p>
<p>Here is the full code, which we'll then walk through block by block:</p>
<pre><code><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
<span class="hljs-keyword">from</span> interpret.glassbox <span class="hljs-keyword">import</span> ExplainableBoostingClassifier
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

# Load a sample dataset
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_breast_cancer
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)

# Train an EBM model
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Make predictions
y_pred = ebm.predict(X_test)
print(f<span class="hljs-string">"Accuracy: {accuracy_score(y_test, y_pred)}"</span>)

# Interpret the model
ebm_global = ebm.explain_global(name=<span class="hljs-string">'EBM'</span>)

# Extract feature importances
feature_names = ebm_global.data()[<span class="hljs-string">'names'</span>]
importances = ebm_global.data()[<span class="hljs-string">'scores'</span>]

# Sort features by importance
sorted_idx = np.argsort(importances)
sorted_feature_names = np.array(feature_names)[sorted_idx]
sorted_importances = np.array(importances)[sorted_idx]

# Increase spacing between the feature names
y_positions = np.arange(len(sorted_feature_names)) * <span class="hljs-number">1.5</span>  # Increase multiplier <span class="hljs-keyword">for</span> more space

# Plot feature importances
plt.figure(figsize=(<span class="hljs-number">12</span>, <span class="hljs-number">14</span>))  # Increase figure height <span class="hljs-keyword">if</span> necessary
plt.barh(y_positions, sorted_importances, color=<span class="hljs-string">'skyblue'</span>, align=<span class="hljs-string">'center'</span>)
plt.yticks(y_positions, sorted_feature_names)
plt.xlabel(<span class="hljs-string">'Importance'</span>)
plt.title(<span class="hljs-string">'Feature Importances from Explainable Boosting Classifier'</span>)
plt.gca().invert_yaxis()

# Adjust spacing
plt.subplots_adjust(left=<span class="hljs-number">0.3</span>, right=<span class="hljs-number">0.95</span>, top=<span class="hljs-number">0.95</span>, bottom=<span class="hljs-number">0.08</span>)  # Fine-tune the margins <span class="hljs-keyword">if</span> needed

plt.show()
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/1-4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full Code</em></p>
<p>Alright, now let's break it down.</p>
<h3 id="heading-importing-libraries">Importing Libraries</h3>
<p>First, we'll import the libraries we need for our example. You can do that with the following code:</p>
<pre><code><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
<span class="hljs-keyword">from</span> interpret.glassbox <span class="hljs-keyword">import</span> ExplainableBoostingClassifier
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/2-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries</em></p>
<p>These are the libraries we are going to use:</p>
<ul>
<li><a target="_blank" href="https://pandas.pydata.org/">Pandas</a>: This is a Python library used for data manipulation and analysis.</li>
<li><a target="_blank" href="https://scikit-learn.org/stable/index.html">sklearn</a>: The <a target="_blank" href="https://scikit-learn.org/stable/index.html">scikit-learn library</a> implements machine learning algorithms. We're using it for data preprocessing and model evaluation.</li>
<li><a target="_blank" href="https://interpret.ml/">Interpret</a>: The <a target="_blank" href="https://interpret.ml/">InterpretML</a> Python library provides the glass box model we'll use.</li>
<li><a target="_blank" href="https://matplotlib.org/">Matplotlib</a>: A Python library used to make graphs.</li>
<li><a target="_blank" href="https://numpy.org/">Numpy</a>: Used for very fast numerical computations.</li>
</ul>
<h3 id="heading-loading-preparing-the-dataset-and-splitting-the-data">Loading, Preparing the Dataset, and Splitting the Data</h3>
<pre><code># Load a sample dataset
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_breast_cancer
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/3-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Loading, Preparing the Dataset, and Splitting the Data</em></p>
<p><strong>First, we load a sample dataset</strong>: We import a breast cancer dataset using scikit-learn's built-in datasets.</p>
<p><strong>Next, we prepare the data</strong>: The features (data points) from the dataset are organized into a table format, where each column is labeled with a specific feature name. The target outcomes (labels) from the dataset are stored separately.</p>
<p><strong>Then we split the data into training and testing sets</strong>: The data is divided into two parts: one for training the model and one for testing the model. 80% of the data is used for training, while 20% is reserved for testing.</p>
<p>A specific random seed is set to ensure that the data split is consistent every time the code is run.</p>
<p>Quick note: In real life, the dataset is pre-processed with data manipulation techniques to make the AI model faster and smaller.</p>
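<p>One of the most common of those preprocessing steps is standardizing each feature to zero mean and unit variance. Here's a minimal sketch in plain NumPy with made-up data (scikit-learn's <code>StandardScaler</code> does the same computation):</p>

```python
import numpy as np

# Toy feature matrix: 4 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

# Standardize: subtract the per-feature mean, divide by the per-feature std
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```

<p>After standardizing, both features contribute on the same scale, which generally helps models train faster.</p>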
<h3 id="heading-training-the-model-making-predictions-and-evaluating-the-model">Training the Model, Making Predictions, and Evaluating the Model</h3>
<pre><code># Train an EBM model
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Make predictions
y_pred = ebm.predict(X_test)
print(f<span class="hljs-string">"Accuracy: {accuracy_score(y_test, y_pred)}"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/4-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Training the Model, Making Predictions and Evaluating the Model</em></p>
<p><strong>First, we train an EBM model</strong>: We initialize an Explainable Boosting Machine model and then train it using the training data. In this step, with the data we have, we create the model. </p>
<p>This way, with one line of code, we create the AI model based on the dataset that will predict breast cancer.</p>
<p><strong>Then we make our predictions</strong>: The trained EBM model is used to make predictions on the test data. Next, we calculate and print the accuracy of the model's predictions.</p>
<h3 id="heading-interpreting-the-model-extracting-and-sorting-feature-importances">Interpreting the Model, Extracting, and Sorting Feature Importances</h3>
<pre><code># Interpret the model
ebm_global = ebm.explain_global(name=<span class="hljs-string">'EBM'</span>)

# Extract feature importances
feature_names = ebm_global.data()[<span class="hljs-string">'names'</span>]
importances = ebm_global.data()[<span class="hljs-string">'scores'</span>]

# Sort features by importance
sorted_idx = np.argsort(importances)
sorted_feature_names = np.array(feature_names)[sorted_idx]
sorted_importances = np.array(importances)[sorted_idx]
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/5-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Interpreting the Model, Extracting and Sorting Feature Importances</em></p>
<p><strong>At this point, we need to interpret the model</strong>: The global explanation of the trained Explainable Boosting Machine (EBM) model is obtained, providing an overview of how the model makes decisions.</p>
<p>For this model, the accuracy is approximately 0.974 – which means the model is correct about 97% of the time.</p>
<p>Of course, this only applies to the breast cancer data from <strong>this dataset</strong> – not for every single case of breast cancer detection. Since this is a sample, the dataset does not represent the full population of people seeking to detect breast cancer.</p>
<p>Quick note: In the real world, for classification we'd typically use the <strong>F1 score</strong> instead of accuracy to evaluate a model, since it considers both <strong>precision</strong> and <strong>recall</strong>.</p>
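<p>To make the F1 score concrete, here's a small sketch with made-up confusion-matrix counts (illustrative numbers, not results from this dataset):</p>

```python
# Hypothetical counts for a binary classifier
tp, fp, fn = 8, 2, 1  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of predicted positives, how many are right
recall = tp / (tp + fn)     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.889 0.842
```

<p>Because the harmonic mean punishes imbalance, a model can't get a high F1 score by being good at precision alone or recall alone.</p>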
<p><strong>Next, we extract feature importances</strong>: We extract the names and corresponding importance scores of the features used by the model from the global explanation.</p>
<p><strong>Then we sort the features by importance</strong>: The features are sorted based on their importance scores, resulting in a list of feature names and their respective importance scores ordered from least to most important.</p>
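<p>As a tiny sketch of what <code>np.argsort</code> does in that sorting step (with made-up feature names and scores, not the model's real output):</p>

```python
import numpy as np

names = np.array(["feature A", "feature B", "feature C"])
scores = np.array([0.2, 0.9, 0.5])

order = np.argsort(scores)  # indices that would sort scores ascending

print(list(names[order]))   # ['feature A', 'feature C', 'feature B']
print(list(scores[order]))  # [0.2, 0.5, 0.9]
```

<p>Applying the same index array to both <code>names</code> and <code>scores</code> keeps each feature paired with its importance after sorting.</p>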
<h3 id="heading-plotting-feature-importances">Plotting Feature Importances</h3>
<pre><code># Increase spacing between the feature names
y_positions = np.arange(len(sorted_feature_names)) * <span class="hljs-number">1.5</span>  # Increase multiplier <span class="hljs-keyword">for</span> more space

# Plot feature importances
plt.figure(figsize=(<span class="hljs-number">12</span>, <span class="hljs-number">14</span>))  # Increase figure height <span class="hljs-keyword">if</span> necessary
plt.barh(y_positions, sorted_importances, color=<span class="hljs-string">'skyblue'</span>, align=<span class="hljs-string">'center'</span>)
plt.yticks(y_positions, sorted_feature_names)
plt.xlabel(<span class="hljs-string">'Importance'</span>)
plt.title(<span class="hljs-string">'Feature Importances from Explainable Boosting Classifier'</span>)
plt.gca().invert_yaxis()

# Adjust spacing
plt.subplots_adjust(left=<span class="hljs-number">0.3</span>, right=<span class="hljs-number">0.95</span>, top=<span class="hljs-number">0.95</span>, bottom=<span class="hljs-number">0.08</span>)  # Fine-tune the margins <span class="hljs-keyword">if</span> needed

plt.show()
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/6-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Plotting Feature Importances</em></p>
<p><strong>Now we need to increase the spacing between feature names</strong>: The positions of the feature names on the y-axis are adjusted to increase the spacing between them.</p>
<p><strong>Then we plot feature importances</strong>: A horizontal bar plot is created to visualize the feature importances. The plot's size is set to ensure it is clear and readable.</p>
<p>The bars represent the importance scores of the features, and the feature names are displayed along the y-axis.</p>
<p>The plot's x-axis is labeled "Importance," and the title "Feature Importances from Explainable Boosting Classifier" is added. The y-axis is inverted to have the most important features at the top.</p>
<p><strong>Then we adjust the spacing</strong>: The margins around the plot are fine-tuned to ensure proper spacing and a neat appearance.</p>
<p><strong>Finally, we display the plot</strong>: The plot is displayed to visualize the feature importances effectively.</p>
<p>The final result should look like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/interpret-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Features importance graph</em></p>
<p>This way, using an interpretable artificial intelligence model with an accuracy of 97%, we can conclude that the most important factors in detecting breast tumors are: </p>
<ul>
<li>Worst concave points </li>
<li>Worst texture </li>
<li>Worst area </li>
<li>Mean concave points </li>
<li>Area error &amp; worst concavity</li>
</ul>
<p>Again, this is according to the provided dataset.</p>
<p>So according to the population that this sample dataset represents, we can conclude in a <strong>data-driven way</strong> that these factors are key indicators for breast cancer tumor detection. </p>
<p>In this way, an interpretable artificial intelligence model gives us clear insights into which features matter most for its predictions.</p>
<h2 id="Conclusion">Conclusion: KAN (Kolmogorov–Arnold Networks) </h2>

<p>Thanks to explainable AI, we can study populations using new data-driven methods.</p>
<p>Instead of only using traditional statistics, surveys, and manual data analysis, we can draw conclusions more accurately using an AI programming library and a database or Excel file.</p>
<p>But this is not the only way to have models built with explainable AI.</p>
<p>In April 2024, a paper called <a target="_blank" href="https://arxiv.org/html/2404.19756v1">KAN: Kolmogorov–Arnold Networks</a> was published that might shake up the field even more.</p>
<p>Kolmogorov–Arnold Networks (KANs) promise to be both more accurate and easier to understand than traditional models.</p>
<p>They are also easier to visualize and interact with. So we'll see what happens with them.</p>
<p>You can find the full code here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Quantum Artificial Intelligence Model – With Python Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ Machine learning (ML) is one of the most important subareas of AI used in building great AI systems. In ML, deep learning is a narrow area focused solely on neural networks. Through the field of deep learning, systems like ChatGPT and many other AI m... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-quantum-ai-model/</link>
                <guid isPermaLink="false">66ba5330f77647345442b9d1</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Tue, 23 Jul 2024 18:28:43 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/article_cover.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Machine learning (ML) is one of the most important subareas of AI used in building great AI systems.</p>
<p>In ML, deep learning is a narrow area focused solely on neural networks. Through the field of deep learning, systems like ChatGPT and many other AI models can be created. In other words, ChatGPT is just a giant system based on neural networks. </p>
<p>However, there is a big problem with deep learning: computational efficiency. Creating big and effective AI systems with neural networks often requires a lot of energy, which is expensive.</p>
<p>So, the more efficient the hardware is, the better. There are many solutions to solve this problem, one of which is quantum computing.</p>
<p>This article aims to show, in plain English, the connection between quantum computing and artificial intelligence.</p>
<p>We'll talk about these:</p>
<ul>
<li><a class="post-section-overview" href="#heading-artificial-intelligence-and-the-rise-of-deep-learning">Artificial Intelligence and the Rise of Deep Learning</a></li>
<li><a class="post-section-overview" href="#heading-a-big-problem-in-deep-learning-computational-efficiency">A Big Problem in Deep Learning: Computational Efficiency</a></li>
<li><a class="post-section-overview" href="#heading-a-solution-quantum-computing">A Solution: Quantum Computing</a></li>
<li><a class="post-section-overview" href="#heading-code-example-a-quantum-ai-model-for-quantum-chemistry">Code Example: A Quantum AI Model for Quantum Chemistry</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-limitations-of-quantum-computing-and-development">Conclusion: Limitations of Quantum Computing and Development</a></li>
</ul>
<h2 id="Artificial">Artificial Intelligence and the Rise of Deep Learning</h2>

<h3 id="heading-what-is-deep-learning-in-artificial-intelligence">What is Deep Learning in Artificial Intelligence?</h3>
<p>Deep learning is a subfield of artificial intelligence. It uses neural networks to process complex patterns, just like the strategies a sports team uses to win a match.</p>
<p>The bigger the neural network, the more capable it is of doing awesome things – like ChatGPT, for example, which uses natural language processing to answer questions and interact with users.</p>
<p>To truly understand the basics of neural networks – what every single AI model has in common that enables it to work – we need to understand activation layers.</p>
<h3 id="heading-deep-learning-training-neural-networks">Deep Learning = Training Neural Networks</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/4-2.png" alt="4-2" width="600" height="400" loading="lazy">
<em>Simple neural network</em></p>
<p>At the core of deep learning is the training of neural networks. That means using data to get the right values for each neuron to be able to predict what we want.</p>
<p>Neural networks are made of neurons organized in layers. Each layer extracts unique features from the data.</p>
<p>This layered structure allows deep learning models to analyze and interpret complex data.</p>
<h2 id="problem">A Big Problem in Deep Learning: Computational Efficiency</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/data-brett-sayles-4597280.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Brett Sayles: https://www.pexels.com/photo/black-hardwares-on-data-server-room-4597280/</em></p>
<p>Deep learning powers much of the transformation AI is driving in society. However, it comes with a big problem: computational efficiency.</p>
<p>Training deep learning AI systems requires massive amounts of data and computational power. Training can take anywhere from minutes to weeks, and in the process it consumes a lot of energy and computational resources.</p>
<p>There are many solutions to this problem, such as better algorithmic efficiency.</p>
<p>In large language models, this has been the focus of much AI research: making smaller models match the performance of larger ones.</p>
<p>Another solution, besides algorithmic efficiency, is better computational efficiency. Quantum computing is one of the solutions related to better computational efficiency.</p>
<h2 id="Solution">A Solution: Quantum Computing</h2>

<p>Quantum computing is a promising solution to the computational efficiency problem in deep learning.</p>
<p>While normal computers work in bits (either 0 or 1), quantum computers work with qubits, which can be 0 and 1 at the same time.</p>
<p>With the qubits representing 0 and 1 at the same time, it is possible to process many possibilities simultaneously, thanks to a property called superposition in quantum physics.</p>
<p>This makes quantum computers far more efficient than normal computers for certain tasks.</p>
<p>It also makes it possible to design quantum algorithms that are more efficient than classical ones, reducing the energy consumed when creating AI models.</p>
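<p>To make superposition concrete, we can simulate the math of a single qubit in plain NumPy (a sketch of the linear algebra, not real quantum hardware): applying a Hadamard gate to the |0⟩ state puts it into an equal superposition, giving a 50/50 chance of measuring 0 or 1.</p>

```python
import numpy as np

# The |0> state as a 2-component state vector
ket0 = np.array([1.0, 0.0])

# Hadamard gate: maps a basis state to an equal superposition
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

state = H @ ket0            # (|0> + |1>) / sqrt(2)
probs = np.abs(state) ** 2  # measurement probabilities for 0 and 1

print(probs)  # [0.5 0.5]
```

<p>Each extra qubit doubles the size of this state vector, which is why classical simulation quickly becomes expensive – and why real quantum hardware is appealing.</p>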
<h3 id="heading-why-are-quantum-computers-not-so-widely-used">Why Are Quantum Computers Not So Widely Used?</h3>
<p>The problem with quantum computation is that there isn't a good, cheap physical representation of qubits.</p>
<p>Bits are created and managed with logic gates made from tiny transistors, which can be easily created by the billions.</p>
<p>Qubits are created and managed using superconducting circuits, trapped ions, or topological qubits, all of which are very expensive.</p>
<p>This is the biggest problem in quantum computation. However, cloud services from IBM, Amazon, and many others let people run code on real quantum computers.</p>
<h2 id="Code">Code Example: A Quantum AI Model for Quantum Chemistry</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/chemnistry-pixabay-248152.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Pixabay: https://www.pexels.com/photo/two-clear-glass-jars-beside-several-flasks-248152/</em></p>
<p>In this code example, we'll solve a quantum chemistry problem:</p>
<p><em>What is the lowest energy level of the H₂ molecule using quantum computing?</em></p>
<p>Before understanding the problem at hand, let's discuss quantum chemistry.</p>
<h3 id="heading-what-is-quantum-chemistry">What is Quantum Chemistry?</h3>
<p>Quantum chemistry is a field of science that looks at how electrons behave in atoms and molecules.</p>
<p>It is about using quantum physics to understand how electrons, atoms, molecules and many more tiny particles interact and form different chemical substances.</p>
<h4 id="heading-the-problem-we-want-to-solve">The Problem We Want to Solve</h4>
<p>We want to find the "ground state energy" of the H₂ molecule. </p>
<p>The H₂ molecule means hydrogen gas, which is present in:</p>
<ul>
<li>Water</li>
<li>Organic compounds</li>
<li>Stars</li>
</ul>
<p>Actually, life on Earth would not be possible without it. </p>
<p>By finding the "ground state energy," which is the lowest possible energy that the molecule can have, we can know its most stable form and properties. </p>
<p>This allows scientists to better understand chemical reactions related to H₂. </p>
<p>With classical computers, this problem can be very complex due to a huge number of possibilities and intricate interactions. </p>
<p>With quantum computers, qubits are good representations of electrons, which can directly simulate the behavior of electrons in molecules.</p>
<h3 id="heading-approximating-with-the-vqe-variational-quantum-eigensolver-vqe">Approximating with the Variational Quantum Eigensolver (VQE)</h3>
<p>The Variational Quantum Eigensolver (VQE) is a hybrid algorithm that leverages both quantum and classical computing. </p>
<p>In this example, the VQE algorithm is used to find the ground state energy of a simple H₂ molecule. </p>
<p>The code is designed to run on a quantum simulator (which is a classical computer running a quantum algorithm).</p>
<p>However, it can be adapted to run on actual quantum hardware through a cloud-based quantum computing service. </p>
<p>This would involve using both quantum and classical resources in practice. Let’s go through the code step by step!</p>
<pre><code>import pennylane as qml
from pennylane import numpy as np  # PennyLane's NumPy wrapper supports requires_grad
import matplotlib.pyplot as plt

# Define the molecule (H2 at bond length of 0.74 Å)
symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.74])

# Generate the Hamiltonian for the molecule
hamiltonian, qubits = qml.qchem.molecular_hamiltonian(
    symbols, coordinates
)

# Define the quantum device
dev = qml.device("default.qubit", wires=qubits)

# Define the ansatz (variational quantum circuit)
def ansatz(params, wires):
    qml.BasisState(np.array([0] * qubits), wires=wires)
    for i in range(qubits):
        qml.RY(params[i], wires=wires[i])
    for i in range(qubits - 1):
        qml.CNOT(wires=[wires[i], wires[i + 1]])

# Define the cost function
@qml.qnode(dev)
def cost_fn(params):
    ansatz(params, wires=range(qubits))
    return qml.expval(hamiltonian)

# Set a fixed seed for reproducibility
np.random.seed(42)

# Set the initial parameters
params = np.random.random(qubits, requires_grad=True)

# Choose an optimizer
optimizer = qml.GradientDescentOptimizer(stepsize=0.4)

# Number of optimization steps
max_iterations = 100
conv_tol = 1e-06

# Optimization loop
energies = []

for n in range(max_iterations):
    params, prev_energy = optimizer.step_and_cost(cost_fn, params)

    energy = cost_fn(params)
    energies.append(energy)
    if np.abs(energy - prev_energy) &lt; conv_tol:
        break

    print(f"Step = {n}, Energy = {energy:.8f} Ha")

print(f"Final ground state energy = {energy:.8f} Ha")

# Visualize the results
iterations = range(len(energies))

plt.plot(iterations, energies)
plt.xlabel('Iteration')
plt.ylabel('Energy (Ha)')
plt.title('Convergence of VQE for H2 Molecule')
plt.show()
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/1-5.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full Code Image</em></p>
<h3 id="heading-importing-libraries">Importing Libraries</h3>
<pre><code>import pennylane as qml
from pennylane import numpy as np  # PennyLane's NumPy wrapper supports requires_grad
import matplotlib.pyplot as plt
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/2-4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries</em></p>
<ul>
<li><a target="_blank" href="https://pennylane.ai/">pennylane</a>: A library for quantum computing that provides tools for creating and optimizing quantum circuits, and for running quantum machine learning algorithms.</li>
<li><a target="_blank" href="https://numpy.org/">numpy</a>: A library for numerical operations in Python, used here for handling arrays and mathematical computations.</li>
<li><a target="_blank" href="https://matplotlib.org/">matplotlib</a>: A library for creating visualizations and plots in Python, used here to graph the convergence of the VQE algorithm.</li>
</ul>
<h3 id="heading-defining-the-molecule-and-generating-the-hamiltonian">Defining the Molecule and Generating the Hamiltonian</h3>
<pre><code># Define the molecule (H2 at bond length <span class="hljs-keyword">of</span> <span class="hljs-number">0.74</span> Å)
symbols = [<span class="hljs-string">"H"</span>, <span class="hljs-string">"H"</span>]
coordinates = np.array([<span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.74</span>])

# Generate the Hamiltonian <span class="hljs-keyword">for</span> the molecule
hamiltonian, qubits = qml.qchem.molecular_hamiltonian(
    symbols, coordinates
)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/3-4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining the Molecule and generating the Hamiltonian</em></p>
<p><strong>Defining the Molecule</strong>:</p>
<ul>
<li>We define a hydrogen molecule (H₂).</li>
<li><code>symbols = ["H", "H"]</code>: This means the molecule consists of two hydrogen (H) atoms.</li>
<li><code>coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.74])</code>: This gives the positions of the two hydrogen atoms. The first hydrogen atom is at the origin (0.0, 0.0, 0.0), and the second hydrogen atom is at (0.0, 0.0, 0.74), which means it is 0.74 angstroms away from the first atom along the z-axis.</li>
</ul>
<p><strong>Generating the Hamiltonian</strong>:</p>
<ul>
<li><code>hamiltonian, qubits = qml.qchem.molecular_hamiltonian(symbols, coordinates)</code>: This line generates the Hamiltonian for the hydrogen molecule. The Hamiltonian is a mathematical object used to describe the energy of the molecule.</li>
<li><code>hamiltonian</code>: Represents the energy operator for the molecule.</li>
<li><code>qubits</code>: Represents the number of quantum bits (qubits) needed to simulate the molecule on a quantum computer.</li>
</ul>
<h3 id="heading-defining-the-quantum-device-and-ansatz-variational-quantum-circuit">Defining the Quantum Device and Ansatz (Variational Quantum Circuit)</h3>
<pre><code># Define the quantum device
dev = qml.device(<span class="hljs-string">"default.qubit"</span>, wires=qubits)

# Define the ansatz (variational quantum circuit)
def ansatz(params, wires):
    qml.BasisState(np.array([<span class="hljs-number">0</span>] * qubits), wires=wires)
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(qubits):
        qml.RY(params[i], wires=wires[i])
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(qubits - <span class="hljs-number">1</span>):
        qml.CNOT(wires=[wires[i], wires[i + <span class="hljs-number">1</span>]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/4-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining the Quantum Device and Ansatz (Variational Quantum Circuit)</em></p>
<p><strong>Defining the Quantum Device</strong>:</p>
<ul>
<li><code>dev = qml.device("default.qubit", wires=qubits)</code>: This line sets up a quantum computing device to simulate our molecule.</li>
<li><code>"default.qubit"</code>: This specifies the type of quantum simulator we are using (a default qubit-based simulator).</li>
<li><code>wires=qubits</code>: This tells the simulator how many qubits (quantum bits) it needs to use, based on the number of qubits we determined earlier.</li>
</ul>
<p><strong>Defining the Ansatz (Variational Quantum Circuit)</strong>:</p>
<ul>
<li><code>def ansatz(params, wires)</code>: This defines a function named <code>ansatz</code> which describes the variational quantum circuit. This circuit will be used to find the ground state energy of the molecule.</li>
<li><code>qml.BasisState(np.array([0] * qubits), wires=wires)</code>: This initializes the qubits in the state 0. The <code>np.array([0] * qubits)</code> creates an array with zeros, one for each qubit.</li>
<li><code>for i in range(qubits): qml.RY(params[i], wires=wires[i])</code>: This loop applies a rotation around the Y-axis to each qubit. <code>params[i]</code> provides the angle for each rotation.</li>
<li><code>for i in range(qubits - 1): qml.CNOT(wires=[wires[i], wires[i + 1]])</code>: This loop applies Controlled-NOT (CNOT) gates between consecutive qubits, entangling them.</li>
</ul>
<h3 id="heading-defining-the-cost-function-setting-initial-parameters-and-optimizer">Defining the Cost Function, Setting Initial Parameters and Optimizer</h3>
<pre><code># Define the cost function
@qml.qnode(dev)
def cost_fn(params):
    ansatz(params, wires=range(qubits))
    return qml.expval(hamiltonian)

# Set a fixed seed for reproducibility
np.random.seed(42)

# Set the initial parameters
params = np.random.random(qubits, requires_grad=True)

# Choose an optimizer
optimizer = qml.GradientDescentOptimizer(stepsize=0.4)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/5-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining the Cost Function, Setting Initial Parameters and Optimizer</em></p>
<p><strong>Defining the Cost Function</strong>:</p>
<ul>
<li><code>@qml.qnode(dev)</code>: This line is a decorator that transforms the <code>cost_fn</code> function into a quantum node, allowing it to run on the quantum device <code>dev</code>.</li>
<li><code>def cost_fn(params)</code>: This defines a function named <code>cost_fn</code> that takes some parameters (<code>params</code>) as input.</li>
<li><code>ansatz(params, wires=range(qubits))</code>: Inside this function, we call the previously defined <code>ansatz</code> function, passing in the parameters and specifying that it should use all the qubits.</li>
<li><code>return qml.expval(hamiltonian)</code>: This line returns the expected value of the Hamiltonian, which represents the energy of the molecule. The cost function is what we aim to minimize to find the ground state energy.</li>
</ul>
<p><strong>Setting a Fixed Seed for Reproducibility</strong>:</p>
<ul>
<li><code>np.random.seed(42)</code>: This line sets a fixed seed for the random number generator. This ensures that the random numbers generated will be the same each time the code is run, making the results reproducible.</li>
</ul>
<p><strong>Setting the Initial Parameters</strong>:</p>
<ul>
<li><code>params = np.random.random(qubits, requires_grad=True)</code>: This line initializes the parameters for the ansatz with random values. The number of parameters is equal to the number of qubits. The <code>requires_grad=True</code> part indicates that these parameters can be adjusted during optimization.</li>
</ul>
<p><strong>Choosing an Optimizer</strong>:</p>
<ul>
<li><code>optimizer = qml.GradientDescentOptimizer(stepsize=0.4)</code>: This line creates an optimizer that will adjust the parameters to minimize the cost function. Specifically, it uses gradient descent with a step size of 0.4.</li>
</ul>
<h3 id="heading-optimization-loop">Optimization Loop</h3>
<pre><code># <span class="hljs-built_in">Number</span> <span class="hljs-keyword">of</span> optimization steps
max_iterations = <span class="hljs-number">100</span>
conv_tol = <span class="hljs-number">1e-06</span>

# Optimization loop
energies = []

<span class="hljs-keyword">for</span> n <span class="hljs-keyword">in</span> range(max_iterations):
    params, prev_energy = optimizer.step_and_cost(cost_fn, params)

    energy = cost_fn(params)
    energies.append(energy)
    <span class="hljs-keyword">if</span> np.abs(energy - prev_energy) &lt; conv_tol:
        <span class="hljs-keyword">break</span>

    print(f<span class="hljs-string">"Step = {n}, Energy = {energy:.8f} Ha"</span>)

print(f<span class="hljs-string">"Final ground state energy = {energy:.8f} Ha"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/6-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Optimization Loop</em></p>
<p><strong>Setting the Number of Optimization Steps</strong>:</p>
<ul>
<li><code>max_iterations = 100</code>: This sets the maximum number of steps the optimization will take. In this case, it is 100 steps.</li>
<li><code>conv_tol = 1e-06</code>: This defines the convergence tolerance. If the change in energy between steps is less than this value, the optimization will stop.</li>
</ul>
<p><strong>Optimization Loop</strong>:</p>
<ul>
<li><code>energies = []</code>: This initializes an empty list to store the energies calculated at each step.</li>
</ul>
<p><strong>Looping Through Optimization Steps</strong>:</p>
<ul>
<li><code>for n in range(max_iterations):</code>: This starts a loop that will run up to <code>max_iterations</code> times.</li>
<li><code>params, prev_energy = optimizer.step_and_cost(cost_fn, params)</code>: This line performs one step of optimization. It updates the parameters and returns the new parameters and the previous energy.</li>
<li><code>energy = cost_fn(params)</code>: This calculates the current energy using the updated parameters.</li>
<li><code>energies.append(energy)</code>: This adds the current energy to the <code>energies</code> list.</li>
<li><code>if np.abs(energy - prev_energy) &lt; conv_tol: break</code>: This checks if the absolute difference between the current energy and the previous energy is less than the convergence tolerance. If it is, the loop stops early because the optimization has converged.</li>
<li><code>print(f"Step = {n}, Energy = {energy:.8f} Ha")</code>: This prints the current step number and the energy in Hartree (Ha) to eight decimal places.</li>
</ul>
<p><strong>Printing the Final Energy</strong>:</p>
<ul>
<li><code>print(f"Final ground state energy = {energy:.8f} Ha")</code>: After the loop, this prints the final ground state energy.</li>
</ul>
<h3 id="heading-visualizing-the-results">Visualizing the Results</h3>
<pre><code># Visualize the results
iterations = range(len(energies))

plt.plot(iterations, energies)
plt.xlabel(<span class="hljs-string">'Iteration'</span>)
plt.ylabel(<span class="hljs-string">'Energy (Ha)'</span>)
plt.title(<span class="hljs-string">'Convergence of VQE for H2 Molecule'</span>)
plt.show()
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/7.png" alt="Image" width="600" height="400" loading="lazy">
<em>Visualizing the Results</em></p>
<p><strong>Setting Up the Data for Visualization</strong>:</p>
<ul>
<li><code>iterations = range(len(energies))</code>: This creates a range object representing the number of iterations (steps) the optimization went through. <code>len(energies)</code> gives the number of energy values recorded.</li>
</ul>
<p><strong>Plotting the Results</strong>:</p>
<ul>
<li><code>plt.plot(iterations, energies)</code>: This line creates a plot with the iteration numbers on the x-axis and the corresponding energy values on the y-axis.</li>
<li><code>plt.xlabel('Iteration')</code>: This sets the label for the x-axis to "Iteration".</li>
<li><code>plt.ylabel('Energy (Ha)')</code>: This sets the label for the y-axis to "Energy (Ha)", where "Ha" stands for Hartree, a unit of energy.</li>
<li><code>plt.title('Convergence of VQE for H2 Molecule')</code>: This sets the title of the plot to "Convergence of VQE for H2 Molecule".</li>
<li><code>plt.show()</code>: This displays the plot.</li>
</ul>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/H2H.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The graph titled "Convergence of VQE for H2 Molecule" shows the energy (in Hartree, Ha) of the H2 molecule plotted against the number of iterations of the Variational Quantum Eigensolver (VQE) algorithm.</p>
<ul>
<li><strong>X-Axis (Iteration):</strong> Number of VQE iterations.</li>
<li><strong>Y-Axis (Energy (Ha)):</strong> Energy of the H2 molecule in Hartree.</li>
</ul>
<h3 id="heading-key-points">Key Points:</h3>
<ul>
<li><strong>Initial Energy:</strong> Approximately 1.4 Ha at iteration 0.</li>
<li><strong>Rapid Decrease:</strong> Energy quickly drops within the first 20 iterations.</li>
<li><strong>Plateau:</strong> Energy stabilizes around 0.4 Ha after 20 iterations, indicating convergence to an optimal or near-optimal solution.</li>
</ul>
<h2 id="Conclusion">Conclusion: Limitations of Quantum Computing and Development</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/PC-richasharma96-4247412.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Richa Sharma: https://www.pexels.com/photo/ceramic-mug-on-black-laptop-on-table-in-office-4247412/</em></p>
<p>Besides making AI algorithms far more computationally efficient, quantum computing can revolutionize many fields like:</p>
<ul>
<li>Drug discovery</li>
<li>Material science</li>
<li>Cryptography</li>
<li>Financial modeling</li>
<li>Optimization problems</li>
<li>Climate modeling</li>
<li>Machine learning</li>
</ul>
<p>However, for quantum computing to become widely accessible, we need a way to physically implement qubits in hardware small and stable enough for everyday devices. That will take years.</p>
<p>The full code can be found here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What Are Monte Carlo Methods? How to Predict the Future with Python Simulations ]]>
                </title>
                <description>
                    <![CDATA[ Monte Carlo methods have revolutionized programming and engineering. These methods use the power of randomness, which makes them effective tools that help developers solve difficult problems in many fields. Monte Carlo methods have been adopted in ph... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-are-monte-carlo-methods/</link>
                <guid isPermaLink="false">66ba534a80dbd3f269f5887d</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Tue, 16 Jul 2024 21:42:38 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/pexels-matej-117839-716661.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Monte Carlo methods have revolutionized programming and engineering.</p>
<p>These methods use the power of randomness, which makes them effective tools that help developers solve difficult problems in many fields.</p>
<p>Monte Carlo methods have been adopted in physics, finance, engineering, and many other areas where deterministic methods are often impractical.</p>
<p>With Monte Carlo methods, simulations and very complex computations have become efficient and easy to manage.</p>
<p>There are many variants of Monte Carlo methods. But all of them share the idea of using randomness to approximate solutions to hard problems. In this article, you'll learn all about Monte Carlo methods.</p>
<h2 id="heading-what-well-cover">What we'll cover:</h2>
<ul>
<li><a class="post-section-overview" href="#heading-understanding-monte-carlo-methods-through-an-analogy">Understanding Monte Carlo Methods Through an Analogy</a></li>
<li><a class="post-section-overview" href="#heading-what-are-monte-carlo-methods-a-plain-english-guide">What Are Monte Carlo Methods? A Plain English Guide</a></li>
<li><a class="post-section-overview" href="#heading-real-world-applications-of-monte-carlo-methods">Real-World Applications of Monte Carlo Methods</a></li>
<li><a class="post-section-overview" href="#heading-exploring-different-types-of-monte-carlo-methods">Different Types of Monte Carlo Methods</a></li>
<li><a class="post-section-overview" href="#heading-practical-implementation-monte-carlo-methods-in-python">Practical Implementation: Monte Carlo Methods in Python</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-the-future-of-monte-carlo-methods">The Future of Monte Carlo Methods</a></li>
</ul>
<h3 id="heading-pre-requisites">Pre-requisites</h3>
<p>You should have a basic knowledge of statistics to understand everything in this article.</p>
<p>If you need to brush up on your stats skills, I recommend checking out this freeCodeCamp course:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/xxpc-HPKN28" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h2 id="understanding">Understanding Monte Carlo Methods Through an Analogy</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/2.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/green-leafed-tree-38136/">veeterzy on Pexels</a></em></p>
<p>Imagine you want to find the average height of trees in a big forest.</p>
<p>Measuring every tree is impossible and impractical. But with Monte Carlo methods, it's possible to randomly select a few spots in the forest and measure the height of all the trees in those spots.</p>
<p>By doing this many times and averaging all these measurements, we can estimate the average height of all the trees in the forest.</p>
<p>This way, it's possible to make great estimations in large and complex populations by finding small and manageable samples and averaging them out.</p>
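<p>Here is a minimal Python sketch of the forest analogy. The "forest" here is synthetic data invented purely for illustration, so all the specific numbers are assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forest: 100,000 trees with heights we pretend not to know
# (synthetic data, normally distributed around 25 m for illustration).
forest = rng.normal(loc=25.0, scale=5.0, size=100_000)

# Monte Carlo estimate: measure trees in a few randomly chosen plots
# instead of measuring every tree in the forest.
n_plots, trees_per_plot = 50, 30
plot_means = [rng.choice(forest, trees_per_plot).mean() for _ in range(n_plots)]
estimate = np.mean(plot_means)

print(f"Estimated mean height: {estimate:.2f} m")
print(f"True mean height:      {forest.mean():.2f} m")
```

Each random plot gives only a rough estimate, but averaging many of them lands close to the true mean without ever measuring every tree.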
<h2 id="what">What Are Monte Carlo Methods? A Plain English Guide</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/pexels.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/photo-of-two-red-dices-965879/">Jonathan Petersson on Pexels</a></em></p>
<p>Monte Carlo methods are a type of computer algorithm that uses repeated random measurements to obtain approximate results for a given problem.</p>
<p>They are a part of the mathematical field called <a target="_blank" href="https://www.freecodecamp.org/news/numerical-analysis-explained-how-to-apply-math-with-python/">numerical analysis</a> – the use of approximation methods to find solutions where deterministic methods are impractical.</p>
<p>The main idea is to find good enough approximate solutions to solve problems that are too hard or impossible to solve directly.</p>
<p>These solutions are obtained by getting an average of many randomly chosen samples from the population of the problem at hand.</p>
<p>This way, in systems with many uncertain factors and interacting parts, Monte Carlo methods are able to provide insights into how the system behaves and performs.</p>
<p>They are based on the mathematical idea of the <a target="_blank" href="https://www.investopedia.com/terms/l/lawoflargenumbers.asp">Law of large numbers</a> in probability theory:</p>
<blockquote>
<p>The average of many independent, identically distributed random variables converges to the expected value, if it exists.</p>
</blockquote>
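<p>The law of large numbers is easy to see in code. In this small sketch we roll a fair six-sided die (expected value 3.5) and watch the running average converge:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# 100,000 rolls of a fair six-sided die; the expected value is 3.5.
rolls = rng.integers(1, 7, size=100_000)

# Running average after each roll: converges toward 3.5 as rolls accumulate.
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

print(f"Mean after 100 rolls:     {running_mean[99]:.3f}")
print(f"Mean after 100,000 rolls: {running_mean[-1]:.3f}")
```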
<p>The main drawback of Monte Carlo methods is that they need substantial computing resources: many simulations must be run to get accurate results.</p>
<h3 id="heading-why-are-they-called-monte-carlo">Why are they called "Monte Carlo"?</h3>
<p>Monte Carlo methods are named after the <a target="_blank" href="https://www.montecarlosbm.com/en/casino-monaco/casino-monte-carlo">Monte Carlo Casino in Monaco</a>. The name was coined by mathematicians working on the Manhattan Project in the 1940s.</p>
<p><a target="_blank" href="https://www.britannica.com/biography/Stanislaw-Ulam">Stanislaw Ulam</a>, <a target="_blank" href="https://www.britannica.com/biography/John-von-Neumann">John von Neumann</a>, and others were involved in this project, which developed the American nuclear bomb. </p>
<p>The name reflects the randomness in their simulations, akin to the random outcomes in casino gambling.</p>
<h2 id="real">Real-World Applications of Monte Carlo Methods</h2>

<h3 id="heading-circuit-design-in-electrical-engineering">Circuit design in electrical engineering</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/circuit.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo from <a target="_blank" href="https://www.pexels.com/photo/close-up-photography-of-computer-motherboard-163125/">Pixabay</a></em></p>
<p>Circuits have many components. Here are some of them:</p>
<ul>
<li>Resistors</li>
<li>Inductors</li>
<li>Capacitors</li>
<li>Diodes</li>
<li>Transistors</li>
</ul>
<p>Because of the temperature of the environment they operate in, circuits can sometimes fail.</p>
<p>So, how do engineers design temperature-resilient circuits?</p>
<p>In other words: how can we test a circuit's performance at different temperatures?</p>
<p>With Monte Carlo methods, we can simulate many randomly sampled temperature conditions and measure their effects on circuit components and on overall circuit performance.</p>
<p>This gives us data on how the components perform under different thermal stresses.</p>
<p>We can then optimize the circuit – by changing the design or choosing different components – to work across many environmental conditions.</p>
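<p>A toy version of this idea, using a simple voltage divider with made-up 1% resistor tolerances and an assumed 200 ppm/°C temperature coefficient, might look like this:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 10_000

# Hypothetical voltage divider: Vout = Vin * R2 / (R1 + R2).
vin, r1_nom, r2_nom = 5.0, 1_000.0, 1_000.0
tempco = 200e-6  # assumed 200 ppm/°C temperature coefficient

# Randomly sample operating temperatures and 1% manufacturing tolerances.
temps = rng.uniform(-40.0, 85.0, n_trials)  # degrees Celsius
r1 = r1_nom * (1 + rng.normal(0, 0.01, n_trials)) * (1 + tempco * (temps - 25))
r2 = r2_nom * (1 + rng.normal(0, 0.01, n_trials)) * (1 + tempco * (temps - 25))
vout = vin * r2 / (r1 + r2)

print(f"Mean Vout: {vout.mean():.3f} V, std: {vout.std():.4f} V")
print(f"Observed range: {vout.min():.3f} V to {vout.max():.3f} V")
```

The distribution of <code>vout</code> across trials tells us how much the output can drift under combined thermal and manufacturing variation.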
<h3 id="heading-rocket-design-in-aerospace-engineering">Rocket design in aerospace engineering</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/rocket.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo from <a target="_blank" href="https://www.pexels.com/photo/white-rocket-2159/">Pixabay</a></em></p>
<p>Rocket design involves many different variables, such as: </p>
<ul>
<li>Material properties </li>
<li>Aerodynamic forces </li>
<li>Propulsion efficiency </li>
<li>Environmental conditions</li>
</ul>
<p>Monte Carlo methods allow for numerous simulations with varying material properties, propulsion efficiency, and more design variables.</p>
<p>This helps in deeply understanding rocket behavior under diverse conditions.</p>
<p>In essence, this stochastic way of solving a big problem is key in understanding the probability behavior of the rocket's performance, like:</p>
<ul>
<li>Trajectory</li>
<li>Stability </li>
<li>Structural integrity </li>
</ul>
<p>By analyzing how these design variables affect the probability behavior of crucial rocket flying performance metrics, engineers can make rockets safer and more reliable.</p>
<h3 id="heading-financial-portfolio-optimization-in-finance-and-investing">Financial Portfolio optimization in finance and investing</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/finance.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a target="_blank" href="https://www.pexels.com/photo/close-up-photo-of-monitor-159888/">energepic.com</a></em></p>
<p>In financial portfolio optimization, the question is: what is the best mix of assets in a portfolio to maximize returns while minimizing risk?</p>
<p>Monte Carlo methods are used to <a target="_blank" href="https://www.quantconnect.com/learning/articles/introduction-to-options/stochastic-processes-and-monte-carlo-method">simulate</a> how good a portfolio is at maximizing returns while minimizing risk under various market conditions.</p>
<p>By generating many random scenarios for asset prices and returns, banks and financial institutions can know, under different conditions, portfolio outcomes and manage risk.</p>
<p>This way, it's possible to make data-driven decisions to find a balance between risk and rewards.</p>
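<p>Here is a minimal sketch of that idea: simulating one-year returns of a two-asset portfolio. The expected returns, volatilities, and weights are invented for illustration, and the assets are assumed independent – a real model would use historical data and correlations.</p>

```python
import random

random.seed(1)

def simulate_portfolio(w_stock=0.6, n=50_000):
    """Draw n random yearly return scenarios for a stock/bond mix."""
    results = []
    for _ in range(n):
        stock = random.gauss(0.08, 0.15)  # risky asset: 8% mean, 15% vol (assumed)
        bond = random.gauss(0.03, 0.05)   # safer asset: 3% mean, 5% vol (assumed)
        results.append(w_stock * stock + (1 - w_stock) * bond)
    return results

returns = simulate_portfolio()
mean_r = sum(returns) / len(returns)
loss_prob = sum(r < 0 for r in returns) / len(returns)
print(f"expected return: {mean_r:.3f}, probability of a loss: {loss_prob:.3f}")
```

<p>By repeating this for different weights, one could trace out the risk/return trade-off and pick the mix that best balances the two.</p>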
<h2 id="exploring">Exploring Different Types of Monte Carlo Methods</h2>

<p>There are many variants of Monte Carlo methods. Here are some of the most important:</p>
<h3 id="heading-classical-monte-carlo">Classical Monte Carlo:</h3>
<p>Classical Monte Carlo uses random samples to estimate values and simulate systems. It's useful for tasks where direct solutions are hard to find, like numerical integration.</p>
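<p>The textbook example of classical Monte Carlo is estimating π by integration: sample points uniformly in the unit square and count how many land inside the quarter circle.</p>

```python
import random

random.seed(7)
n = 200_000
# A point (u, v) with u, v uniform in [0, 1] lies inside the quarter circle
# when u^2 + v^2 <= 1; that happens with probability pi / 4.
inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0 for _ in range(n))
pi_estimate = 4 * inside / n
print(f"pi is approximately {pi_estimate:.4f}")
```

<p>The estimate sharpens as <code>n</code> grows – the error shrinks roughly with the square root of the number of samples.</p>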
<h3 id="heading-bayesian-monte-carlo">Bayesian Monte Carlo:</h3>
<p>Bayesian Monte Carlo improves estimations by using existing information with new observations to make better predictions.</p>
<p>It is called Bayesian Monte Carlo because it uses <a target="_blank" href="https://www.freecodecamp.org/news/bayes-rule-explained/">Bayes' theorem</a>.</p>
<p>Bayes' theorem was created by the mathematician Thomas Bayes and it's very important in probability theory.</p>
<p>The main idea of the theorem is to <strong>revise existing beliefs with new data.</strong></p>
<p>This method is ideal when you have some existing information about the problem.</p>
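<p>A hedged sketch of the Bayesian idea: start from a prior belief about a coin's bias, observe new flips, and draw samples from the updated (posterior) belief. Here I use simple rejection sampling with a uniform prior; the observed flips are made up for illustration.</p>

```python
import random

random.seed(3)
observed = [1, 1, 0, 1, 1, 1, 0, 1]  # 6 heads out of 8 flips (illustrative data)

def likelihood(theta, data):
    """Probability of the observed flips if the coin's heads-bias is theta."""
    p = 1.0
    for flip in data:
        p *= theta if flip else (1 - theta)
    return p

# Rejection sampling: draw theta from the uniform prior, keep it with
# probability proportional to the likelihood of the observed data.
max_like = likelihood(6 / 8, observed)  # likelihood peaks at the sample mean
posterior = []
while len(posterior) < 5_000:
    theta = random.random()
    if random.random() < likelihood(theta, observed) / max_like:
        posterior.append(theta)

posterior_mean = sum(posterior) / len(posterior)
print(f"posterior mean bias: {posterior_mean:.2f}")
```

<p>The posterior samples concentrate around the observed frequency of heads, but are pulled by the prior – exactly the "revise existing beliefs with new data" idea of Bayes' theorem.</p>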
<h3 id="heading-markov-chain-monte-carlo-mcmc">Markov Chain Monte Carlo (MCMC):</h3>
<p>For large datasets, Monte Carlo methods often take too long to compute results.</p>
<p>One way to solve this problem is to use a smaller version of big datasets. This is kind of like how a summary <strong>represents</strong> the content of a book because it is quicker to read.</p>
<p>This smaller version is called a <a target="_blank" href="https://www.freecodecamp.org/news/what-is-a-markov-chain/">Markov Chain</a>.</p>
<p>In simple words, Markov Chains are models that show how a system moves between states.</p>
<p>A large dataset can be seen as a system and the states as patterns of data.</p>
<p>This way, Markov Chains are simple models that can <strong>represent</strong> a large dataset because they show how things change from one state to another.</p>
<p>This state change can represent, with fewer numbers, the important patterns in the data.</p>
<p>This way, the Monte Carlo method computes its results from the Markov Chain rather than from the full dataset.</p>
<p>Essentially, the Monte Carlo method makes its predictions <strong>indirectly</strong> from the original data. The Markov Chain acts as a <strong>data preprocessing</strong> step before the Monte Carlo results are computed.</p>
<p>In the end, MCMC is just a regular but far more computationally efficient Monte Carlo method.</p>
<h3 id="heading-other-variants">Other variants</h3>
<p>Other methods, like <em>Gradient, Semi-Gradient, and Quasi Monte Carlo</em>, also focus on computational efficiency. But in this article, I only seek to highlight the importance of Monte Carlo methods in science, engineering, and programming.</p>
<h2 id="pratical">Practical Implementation: Monte Carlo Methods in Python</h2>

<p>In the code below, you will see how to implement an MCMC variant in Python.</p>
<p>I'll demo a popular variant of MCMC called Hamiltonian Monte Carlo (HMC).</p>
<p>It is called Hamiltonian because it uses concepts from Hamiltonian mechanics to propose new states for the Markov chains in the data pre-processing step.</p>
<h3 id="heading-what-is-hamiltonian-mechanics">What is Hamiltonian Mechanics?</h3>
<p>To answer this, you need to know a bit about classical mechanics.</p>
<p>Classical mechanics uses Newton's laws of motion to explain how physical systems behave and change over time. </p>
<p>Hamiltonian mechanics is another way to look at these systems. It often emphasizes the role of energy and its conservation by using different variables like generalized positions and momenta.</p>
<p>This unique way of describing a system's state and evolution is used in HMC.</p>
<h3 id="heading-main-code-example-objective">Main code example objective</h3>
<p>We will create a target distribution from a 2D Gaussian distribution using TensorFlow Probability. This means that the HMC will model this target distribution.</p>
<p>The 2D Gaussian distribution is created with synthetic data to demonstrate the approximation process using Hamiltonian Monte Carlo.</p>
<p>In other words, HMC will represent this 2D Gaussian distribution accurately.</p>
<p>In real-life scenarios, from circuits to finance, all systems can be described as a probability distribution. </p>
<p>The Monte Carlo methods approximate these complex distributions. And the MCMC makes this process far faster.</p>
<p>In this simple code example, I am approximating a simple target distribution so that you can understand how this would be applied in a real-life scenario.</p>
<p>Here is the full code (we'll walk through it step by step below):</p>
<pre><code>import tensorflow as tf
import tensorflow_probability as tfp

# Define the target distribution (2D Gaussian)
def target_log_prob(x, y):
    return -0.5 * (x**2 + y**2)

# Initialize the HMC transition kernel
num_results = 1000
num_burnin_steps = 500

hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=lambda x, y: target_log_prob(x, y),
    num_leapfrog_steps=3,
    step_size=0.1
)

# Define the trace function to record the state and kernel results
@tf.function
def run_chain(initial_state, kernel, num_results, num_burnin_steps):
    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel,
        trace_fn=lambda _, pkr: pkr
    )

# Run the MCMC chain
initial_state = [tf.zeros([]), tf.zeros([])]
samples, kernel_results = run_chain(initial_state, hmc, num_results, num_burnin_steps)

# Extract the samples and log
samples_ = [s.numpy() for s in samples]
samples_x, samples_y = samples_

print("Acceptance rate: ", kernel_results.is_accepted.numpy().mean())
print("Mean of x: ", samples_x.mean())
print("Mean of y: ", samples_y.mean())
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/1-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Practical implementation of the Markov Chain Monte Carlo method</em></p>
<p>Let's understand how the code works step by step.</p>
<h3 id="heading-import-the-libraries">Import the libraries</h3>
<pre><code><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">import</span> tensorflow_probability <span class="hljs-keyword">as</span> tfp
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/2-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries</em></p>
<p>In this code, we import two Python libraries: </p>
<ul>
<li><a target="_blank" href="https://www.tensorflow.org/">TensorFlow</a>: Building and training machine learning models </li>
<li><a target="_blank" href="https://www.tensorflow.org/probability">TensorFlow Probability</a>: Probabilistic reasoning and statistical modeling</li>
</ul>
<h3 id="heading-create-a-target-distribution">Create a target distribution</h3>
<pre><code>def target_log_prob(x, y):
    <span class="hljs-keyword">return</span> <span class="hljs-number">-0.5</span> * (x**<span class="hljs-number">2</span> + y**<span class="hljs-number">2</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/3-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating target distribution</em></p>
<p>In this code, we define a 2D Gaussian distribution:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/output-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>2D Gaussian distribution</em></p>
<p>This graph is defined by:</p>
<div class="equation">
    -0.5 × (x<sup>2</sup> + y<sup>2</sup>)
</div>
<p>Because this is a 2D Gaussian distribution, each data point is represented by two variables that follow a joint Gaussian distribution.</p>
<p>If this were a real-life scenario, we would be modeling a system by finding its probability distribution based on two variables.</p>
<p>In many practical applications, such as circuits, there can be dozens of variables involved. </p>
<p>To model such systems correctly, we often use multivariate probability distributions, which generalize the concept of the Gaussian distribution to many dimensions.</p>
<h3 id="heading-initialize-the-markov-chain-monte-carlo">Initialize the Markov Chain Monte Carlo</h3>
<pre><code>num_results = <span class="hljs-number">1000</span>
num_burnin_steps = <span class="hljs-number">500</span>

hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=lambda x, <span class="hljs-attr">y</span>: target_log_prob(x, y),
    num_leapfrog_steps=<span class="hljs-number">3</span>,
    step_size=<span class="hljs-number">0.1</span>
)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/4-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Initializing the Markov Chain Monte Carlo</em></p>
<p>This block of code sets up a Hamiltonian Monte Carlo (HMC) transition kernel using TensorFlow Probability. </p>
<p>It first defines two variables:</p>
<ul>
<li><code>num_results</code> as 1000, indicating the number of samples to generate</li>
<li><code>num_burnin_steps</code> as 500, representing the number of initial samples to discard (burn-in period).</li>
</ul>
<p>The HMC transition kernel is set up with:</p>
<ul>
<li>A target log probability function that takes two inputs and returns their log probability. In our case, this is the 2D Gaussian distribution. The log probability measures how likely a particular pair of values is.</li>
<li><code>num_leapfrog_steps=3</code>: the algorithm takes 3 leapfrog steps per proposal.</li>
<li><code>step_size=0.1</code>: each leapfrog step moves by an amount of 0.1.</li>
</ul>
<h3 id="heading-create-the-trace-function-to-record-the-state-and-kernel-results">Create the trace function to record the state and kernel results</h3>
<pre><code>@tf.function
def run_chain(initial_state, kernel, num_results, num_burnin_steps):
    <span class="hljs-keyword">return</span> tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel,
        trace_fn=lambda _, <span class="hljs-attr">pkr</span>: pkr
    )
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/5-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Creating the trace function to record the state and kernel results</em></p>
<p>The function is decorated with <code>@tf.function</code>, which optimizes it for performance by compiling it into a TensorFlow graph.</p>
<p>The function <code>run_chain</code> takes four arguments:</p>
<ol>
<li><code>initial_state</code>: The initial state of the Markov Chain.</li>
<li><code>kernel</code>: The MCMC transition kernel to use (such as Hamiltonian Monte Carlo).</li>
<li><code>num_results</code>: The number of samples to generate.</li>
<li><code>num_burnin_steps</code>: The number of initial samples to discard (burn-in period).</li>
</ol>
<p>The function calls <code>tfp.mcmc.sample_chain</code> to perform the MCMC sampling:</p>
<ul>
<li><code>num_results</code>: The number of samples to draw.</li>
<li><code>num_burnin_steps</code>: The number of burn-in steps.</li>
<li><code>current_state</code>: The starting state of the Markov Chain.</li>
<li><code>kernel</code>: The transition kernel that defines the sampling process.</li>
<li><code>trace_fn</code>: A function that specifies what to trace during sampling. In this case, it returns the previous kernel results (<code>pkr</code>), effectively tracing the internal state of the MCMC algorithm.</li>
</ul>
<h3 id="heading-run-the-mcmc-chain">Run the MCMC chain</h3>
<pre><code># Run the MCMC chain
initial_state = [tf.zeros([]), tf.zeros([])]
samples, kernel_results = run_chain(initial_state, hmc, num_results, num_burnin_steps)

# Extract the samples and log
samples_ = [s.numpy() <span class="hljs-keyword">for</span> s <span class="hljs-keyword">in</span> samples]
samples_x, samples_y = samples_

print(<span class="hljs-string">"Acceptance rate: "</span>, kernel_results.is_accepted.numpy().mean())
print(<span class="hljs-string">"Mean of x: "</span>, samples_x.mean())
print(<span class="hljs-string">"Mean of y: "</span>, samples_y.mean())
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/07/6.png" alt="Image" width="600" height="400" loading="lazy">
<em>Running the MCMC chain</em></p>
<p>Alright let's break this down as there's a lot going on here:</p>
<p><strong>Initialize the State</strong>:</p>
<ul>
<li><code>initial_state</code> is set to a list containing two zero tensors, which serves as the starting point for the Markov Chain.</li>
</ul>
<p><strong>Run the MCMC Chain</strong>:</p>
<ul>
<li>The <code>run_chain</code> function is called with the initial state, the HMC kernel, the number of results, and the number of burn-in steps.</li>
<li>The function returns two values: <code>samples</code>, which are the generated samples, and <code>kernel_results</code>, which contain the results from the kernel (including diagnostic information).</li>
</ul>
<p><strong>Extract and Convert Samples</strong>:</p>
<ul>
<li>The samples are converted from TensorFlow tensors to NumPy arrays for easier manipulation and analysis.</li>
<li><code>samples_</code> is a list comprehension that converts each sample tensor to a numpy array.</li>
<li><code>samples_x</code> and <code>samples_y</code> are the extracted samples for the two dimensions.</li>
</ul>
<p><strong>Print Diagnostics</strong>:</p>
<ul>
<li>The acceptance rate of the MCMC sampler is calculated and printed. It shows the proportion of accepted proposals during sampling.</li>
<li>The means of the samples for both dimensions (<code>x</code> and <code>y</code>) are calculated and printed to provide a summary of the sampling results.</li>
</ul>
<p>Running the code gives these results:</p>
<ul>
<li>Acceptance rate: 1.0. This means all proposals made during sampling were accepted.</li>
<li>Mean of x: -0.11450629 and mean of y:  -0.23079416. In a perfect 2D Gaussian distribution, the means of x and y are 0.</li>
</ul>
<p>With this MCMC variant, we are approximating the 2D Gaussian distribution. The sample means are not exactly zero, but they are close. Given more samples, they would likely shrink further, until they were so small they could be considered zero.</p>
<h2 id="conclusion">Conclusion: The future of Monte Carlo methods</h2>

<p>The future of Monte Carlo methods lies in the creation of variants that require fewer computational resources and save time.</p>
<p>With these advancements, Monte Carlo methods will find additional applications in more fields.</p>
<p>Thanks to Monte Carlo methods, we are able to model complex systems and phenomena that were previously impossible to handle efficiently.</p>
<p>If you want to know more, you can <a target="_blank" href="https://www.freecodecamp.org/news/solve-the-unsolvable-with-monte-carlo-methods-294de03c80cd/">read this article on Monte Carlo methods</a>.</p>
<p>You can also check out the full code here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How Does a CPU Work Internally? From Transistors to Instruction Set Architecture ]]>
                </title>
                <description>
<![CDATA[ The CPU (Central Processing Unit) is the brain of a computer, and the main connection between software and hardware. It makes it possible to operate software on hardware. However, how does it work in deep detail? And how can it connect programs to certa... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-does-cpu-work-internally/</link>
                <guid isPermaLink="false">66ba5326f4ac8da2b2c2e847</guid>
                
                    <category>
                        <![CDATA[ Computers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ cpu ]]>
                    </category>
                
                    <category>
                        <![CDATA[ hardware ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 10 Jul 2024 19:14:05 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/1.jpg" medium="image" />
                <content:encoded>
<![CDATA[ <p>The CPU (Central Processing Unit) is the brain of a computer, and the main connection between software and hardware. It makes it possible to operate software on hardware.</p>
<p>However, how does it work in deep detail? And how can it connect programs to certain computer hardware?</p>
<p>This article aims to make you understand this connection by deeply explaining how a CPU works. This topic is often familiar only to those with a background in computer hardware design from college.</p>
<p>Often, many computer science graduates never have a class in advanced digital logic. So even very experienced programmers may lack an understanding of how a CPU actually processes information.</p>
<p>Although we won't be designing <a target="_blank" href="https://www.homemade-circuits.com/how-to-make-logic-gates-using-transistors/">logic gates from transistors</a> or <a target="_blank" href="https://www.techspot.com/article/1830-how-cpus-are-designed-and-built-part-2/">CPU components from logic gates</a>, we will cover the key concepts needed to understand how a CPU processes data created by a program written in a programming language.</p>
<p>We will see:</p>
<ul>
<li><a class="post-section-overview" href="#heading-analogy-introduction-to-what-makes-cpus-work">Analogy: Introduction to What Makes CPUs Work</a></li>
<li><a class="post-section-overview" href="#heading-the-memory-hubs-understanding-ram-and-rom">The Memory Hubs: Understanding RAM and ROM</a></li>
<li><a class="post-section-overview" href="#heading-the-roadways-of-data-navigating-the-cpu-data-path">The Roadways of Data: Navigating the CPU Data Path</a></li>
<li><a class="post-section-overview" href="#heading-traffic-controllers-the-role-of-state-machines-in-cpus">Traffic Controllers: The Role of State Machines in CPUs</a></li>
<li><a class="post-section-overview" href="#heading-daily-routines-the-fetch-execute-cycle-explained">Daily Routines: The Fetch-Execute Cycle Explained</a></li>
<li><a class="post-section-overview" href="#heading-the-rulebook-decoding-the-instruction-set-architecture-isa">The Rulebook: Decoding the Instruction Set Architecture (ISA)</a></li>
<li><a class="post-section-overview" href="#heading-from-programming-languages-to-machine-code">From programming languages to machine code</a></li>
<li><a class="post-section-overview" href="#heading-city-challenges-addressing-cpu-problems">City Challenges: Addressing CPU Problems</a></li>
<li><a class="post-section-overview" href="#heading-conclusion-better-control-units-and-data-parts">Conclusion: Better control units and data parts</a></li>
</ul>
<p>I will use the Intel 8008 as a reference. </p>
<h2 id="analogy">Analogy: Introduction to What Makes CPUs Work</h2>

<p>To understand deeply how a computer works, let's imagine a city as our real-life scenario. We'll compare computer elements to parts of this city.</p>
<p>This way, you get a clearer view of different CPU parts and why they are important. Afterwards, we will look in depth at each of the components.</p>
<h3 id="heading-the-memory-hubs-understanding-ram-and-rom">The Memory Hubs: Understanding RAM and ROM</h3>
<p>RAM (Random access memory) is like a city public library: it stores books and information for people to borrow and return as needed.</p>
<p>In a computer, the RAM loads data and instructions from the computer memory needed by the CPU to process data.</p>
<p>ROM (Read-Only Memory) is like a historical archive in the city: it only stores records that will never change and can never be borrowed by the public.</p>
<h3 id="heading-the-roadways-of-data-navigating-the-cpu-data-path">The Roadways of Data: Navigating the CPU Data Path</h3>
<p>The CPU data path is the network of roads in the city. The buses and registers of the CPU data path act like the city's road network.</p>
<p>Just as roads help cars and people move, the CPU data path ensures that data travels efficiently within the CPU.</p>
<h3 id="heading-traffic-controllers-the-role-of-state-machines-in-cpus">Traffic Controllers: The Role of State Machines in CPUs</h3>
<p>State machines act as the traffic control systems.</p>
<p>The traffic control system manages the flow of vehicles, and the state machines manage the flow of data according to the instructions provided to the CPU.</p>
<h3 id="heading-daily-routines-the-fetch-execute-cycle-explained">Daily Routines: The Fetch-Execute Cycle Explained</h3>
<p>The fetch-execute cycle is the daily commute for city residents.</p>
<p>Every day, people decide where they are going, travel there, perform their tasks and return home. This process is always repeated.</p>
<p>In the same way, the CPU fetches instructions, decodes them, and executes them in a repetitive cycle.</p>
<h3 id="heading-the-rulebook-decoding-the-instruction-set-architecture-isa">The Rulebook: Decoding the Instruction Set Architecture (ISA)</h3>
<p>The instruction set architecture is like the city transportation law.</p>
<p>The city's transportation laws define what is legal in the city when it comes to moving people around.</p>
<p>The instruction set architecture is the set of rules and instructions that the CPU can execute.</p>
<h2 id="memory">The Memory Hubs: Understanding RAM and ROM</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/3.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Valentine Tanasovich: https://www.pexels.com/photo/black-and-gray-computer-motherboard-2588757/</em></p>
<p><a target="_blank" href="https://www.freecodecamp.org/news/how-to-access-and-read-ram-contents/">RAM stands for Random Access Memory and can be used to read and write data.</a></p>
<p>Data is loaded from the computer's main storage into RAM first so that the CPU avoids long waiting times.</p>
<p>Then, it uses the data from the RAM to complete the instructions.</p>
<p>RAM is used in computers and many other electronic devices even though it is volatile memory. This means the data only exists while the device is powered on, which makes RAM ideal for temporary storage while the device works.</p>
<p>ROM stands for Read-Only Memory. It holds only data that was written during the device's manufacturing.</p>
<p>It is widely used in <a target="_blank" href="https://www.freecodecamp.org/news/what-is-firmware/">firmware</a> for devices, the BIOS, and small embedded systems.</p>
<p>This is because ROM is non-volatile memory: its contents remain when the device is powered off, making it very important for permanent data storage.</p>
<h2 id="roadways">The Roadways of Data: Navigating the CPU Data Path</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Rogeer Marques: https://www.pexels.com/photo/close-up-shot-of-a-chip-processor-11272008/</em></p>
<p>The CPU data path is a complex digital circuit with many components that work with one another, such as:</p>
<ul>
<li><strong>Arithmetic Logic Unit (ALU):</strong> Performs arithmetic and logical operations inside the CPU data path.</li>
<li><strong>Registers:</strong> Small, fast storage locations for temporary data retrieved from the RAM.</li>
<li><strong>Buses:</strong> Data, control and address buses are wires used inside the CPU data path to transfer information.</li>
</ul>
<p>While CPUs have changed a lot since the Intel 8008, these are some of the components that still serve as the foundation for all CPUs.</p>
<p>Thanks to them, data can flow, but something must still control that flow. This is the job of the CPU's control unit, implemented in the Intel 8008 as state machines.</p>
<h2 id="traffic">Traffic Controllers: The Role of State Machines in CPUs</h2>

<p><a target="_blank" href="https://www.freecodecamp.org/news/state-machines-basics-of-computer-science-d42855debc66/">A state machine is a system that transitions between different states in order to perform tasks.</a></p>
<p>They are composed of a number of states and transitions. They were used in the Intel 8008 to create the control unit because of their structured and effective way of managing the sequence of operations needed to process instructions.</p>
<p>Each of the states can activate one or many CPU components to process a certain assembly instruction.</p>
<p>This way, certain CPU data path parts are activated for an instruction to be completed.</p>
<p>Additionally, thanks to these state machines, the CPU is complete and can perform all instructions a user wants in a continuous loop called the fetch-execute cycle.</p>
<h2 id="fetch">Daily Routines: The Fetch-Execute Cycle Explained</h2>

<p>The state machine in the CPU controls how the CPU data path works together to perform a given instruction.</p>
<p>Nowadays, a computer processes millions of instructions per second. The state machines therefore act as a loop that fetches instructions and executes them.</p>
<p>This process is known as the fetch-execute cycle, where the CPU retrieves and executes instructions:</p>
<ul>
<li><strong>Fetch:</strong> The CPU fetches the instruction from memory.</li>
<li><strong>Decode:</strong> The fetched instruction is decoded to determine the required action.</li>
<li><strong>Execute:</strong> The decoded instruction is executed using the appropriate CPU components.</li>
<li><strong>Write-back:</strong> The result of the execution is written back to memory or a register.</li>
</ul>
<p>In the fetch stage, the control unit tells the RAM to give the next instruction to the CPU.</p>
<p>In the decode stage, the CPU interprets the instruction, and in the execution stage, it performs the operation. Afterwards, the write-back stage ensures the result is stored correctly.</p>
<p>This cycle continues for as long as the computer is on, allowing modern processors to execute billions of instructions per second.</p>
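<p>The four stages above can be sketched as a toy Python loop. This is an illustrative model, not a real ISA: "memory" holds made-up <code>(opcode, operand)</code> pairs, and a single accumulator register stands in for the CPU's register file.</p>

```python
# A toy fetch-decode-execute loop. Each "instruction" is an (opcode, operand)
# pair, and the program counter (pc) tracks which instruction comes next.
memory = [("LOAD", 5), ("ADD", 3), ("SUB", 2), ("HALT", 0)]
accumulator = 0
pc = 0  # program counter

while True:
    opcode, operand = memory[pc]   # fetch the next instruction
    pc += 1
    if opcode == "HALT":           # decode: decide what action is required
        break
    elif opcode == "LOAD":         # execute + write-back into the register
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "SUB":
        accumulator -= operand

print(f"final accumulator value: {accumulator}")
```

<p>A real CPU does the same thing in hardware: the control unit's states drive the fetch, decode, execute, and write-back phases for each instruction.</p>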
<h3 id="heading-but-what-about-data-from-the-keyboard-or-mouse">But What About Data from the Keyboard or Mouse?</h3>
<p>This data does not come from RAM but is handled through a mechanism called interrupts. While the CPU runs instructions, it can detect when data comes from peripherals.</p>
<p>If this happens, the CPU stops its current task and prioritizes the instructions from the peripherals. Afterwards, the CPU resumes its previous tasks.</p>
<p>There are many ways to manage interrupts, with some of the most popular being:</p>
<ol>
<li><strong>Polled Interrupts</strong>: The CPU periodically checks if an interrupt has occurred.</li>
<li><strong>Vectored Interrupts</strong>: The interrupting device directs the CPU to the appropriate interrupt service routine.</li>
<li><strong>Prioritized Interrupts</strong>: Interrupts are assigned different priority levels, ensuring critical tasks are handled first.</li>
</ol>
<p>With these mechanisms, the CPU maintains its performance while interacting with peripherals.</p>
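<p>The prioritized scheme can be sketched in software with a priority queue. This is an illustration of the concept, not how real interrupt controllers are implemented; the devices and priority numbers are made up.</p>

```python
import heapq

# Sketch of prioritized interrupts: pending interrupts sit in a priority
# queue, and the CPU services the lowest-numbered (most urgent) one first.
pending = []
heapq.heappush(pending, (2, "keyboard"))
heapq.heappush(pending, (0, "power failure"))  # priority 0 = most critical
heapq.heappush(pending, (1, "disk I/O"))

service_order = []
while pending:
    priority, device = heapq.heappop(pending)  # always pops the smallest priority
    service_order.append(device)

print(service_order)
```

<p>No matter the order in which interrupts arrive, the most critical one is always handled first – which is exactly the point of prioritization.</p>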
<h2 id="instruction">The Rulebook: Decoding the Instruction Set Architecture (ISA)</h2>

<p>With the control unit, the complete CPU and RAM, it is possible to perform many instructions.</p>
<p>But what instructions can be performed on a given CPU? And how many? This is what the Instruction Set Architecture (ISA) solves.</p>
<p>The ISA defines a set of instructions that a certain CPU can execute. It is what allows programmers to understand what a processor can and cannot do without having to understand all the digital logic hardware inside it.</p>
<p>This way, it acts as an interface between software and hardware.</p>
<p><strong>Key Aspects of ISA:</strong></p>
<ul>
<li><strong>Instruction Types:</strong> Includes arithmetic, logical, control, and data transfer instructions.</li>
<li><strong>Addressing Modes:</strong> Methods for specifying operands of instructions.</li>
<li><strong>Registers:</strong> The set of registers available for use by instructions.</li>
</ul>
<p><strong>Common ISAs:</strong></p>
<ul>
<li><strong>x86:</strong> Widely used in desktop and server processors.</li>
<li><strong>ARM:</strong> Dominant in mobile and embedded devices due to its power efficiency.</li>
<li><strong>RISC-V:</strong> An open standard ISA designed for a wide range of applications.</li>
</ul>
<p>Each CPU family typically implements its own instruction set architecture, and each ISA is usually exposed to programmers through its own assembly language.</p>
<p>This is why there are so <a target="_blank" href="https://www.freecodecamp.org/news/what-are-assembly-languages/">many versions</a> of the assembly programming language.</p>
<p>Since each CPU has its own hardware specifications, CPUs that share similar components also tend to have similar assembly languages.</p>
<p>The choice of ISA impacts the CPU's design, performance, and compatibility with software.</p>
<p>For instance, the complexity of x86 allows for powerful desktop applications, while ARM's simplicity favors energy-efficient mobile devices.</p>
<h2 id="programming">From Programming Languages to Machine Code</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/3-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by luis gomes: https://www.pexels.com/photo/close-up-photo-of-programming-of-codes-546819/</em></p>
<p>While each processor has its own assembly language, writing and maintaining large programs directly in assembly is complex.</p>
<p>Assembly is error-prone, and it is easy to waste time correcting low-level details instead of actually designing and developing the program.</p>
<p>To solve this problem, many programming languages were created from assembly. We write the code in the programming languages, and it is then converted to assembly.</p>
<p>This way, instead of spending time on details, it is possible to focus on more important things like system development and algorithm design.</p>
<p>This is the process by which most programming languages convert their code into assembly:</p>
<ol>
<li>A compiler or interpreter translates the source code into assembly (or an intermediate form such as bytecode).</li>
<li>The assembly code is then converted to raw machine code.</li>
<li>The CPU fetches, decodes, and executes each machine instruction in its fetch-execute cycle.</li>
<li>Afterward, the CPU fetches and executes the next instruction until the program ends.</li>
</ol>
<p>Let's see two examples of programming languages doing this!</p>
<h3 id="heading-c-programming-language">C Programming Language</h3>
<p>The C programming language was created from assembly in the early 1970s. It was created to provide a higher-level language for efficient system-level programming that also allows hardware manipulation.</p>
<p>With a compiler, the C code is converted to assembly and then processed by the complete CPU.</p>
<p>Thanks to this conversion, writing programs in C helps us avoid or manage many common problems more efficiently, such as:</p>
<ul>
<li>Memory management errors</li>
<li>Buffer overflows</li>
<li>Manual optimization issues</li>
</ul>
<p>Nowadays, even for simpler tasks, the assembly code generated by a C compiler is usually more efficient and reliable than assembly written by hand.</p>
<p>If you want to learn more about the C compiler you can check out:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.freecodecamp.org/news/what-is-a-compiler-in-c/">https://www.freecodecamp.org/news/what-is-a-compiler-in-c/</a></div>
<h3 id="heading-python-programming-language">Python Programming Language</h3>
<p>The Python programming language was created from C in the late 1980s.</p>
<p>Its goal was to provide a user-friendly, high-level programming language that emphasizes readability and simplicity, allowing for rapid application development.</p>
<p>In Python, the interpreter first compiles the source code into an intermediate form called bytecode.</p>
<p>The interpreter then executes this bytecode instruction by instruction; the machine code that the CPU ultimately processes in its fetch-execute cycle is that of the interpreter itself.</p>
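<p>You can peek at this bytecode yourself with Python's built-in <code>dis</code> module:</p>

```python
import dis

# A trivial function to inspect.
def add(a, b):
    return a + b

# Print the bytecode the interpreter compiled add() into.
# The listing shows low-level instructions such as LOAD_FAST,
# which the interpreter executes one at a time.
dis.dis(add)
```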
<p>This way, it is possible for people to program in an easier way and focus on bigger programs, such as:</p>
<ul>
<li>Artificial intelligence models</li>
<li>Web apps</li>
<li>Data analysis</li>
<li>Scientific computing</li>
</ul>
<p>However, regardless of the programming language, a traditional CPU core processes instructions sequentially.</p>
<h2 id="problems">City Challenges: Addressing CPU Problems</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/4-1.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Peng LIU: https://www.pexels.com/photo/timelapse-photography-of-vehicle-on-concrete-road-near-in-high-rise-building-during-nighttime-169677/</em></p>
<p>The traditional single-core CPU processes data sequentially, instruction after instruction. This becomes a limitation when there are many instructions to process.</p>
<p>This is what GPUs (Graphics Processing Units) were designed to address. Thanks to GPUs, we can process many instructions in parallel, thereby reducing computing time significantly.</p>
<p>With these parallel processing capabilities, it is possible to achieve a much faster computation and improved efficiency in a wide range of applications.</p>
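<p>A small taste of this speed-up is visible even on a CPU: NumPy's vectorized operations process whole arrays at once instead of one Python-level instruction per element. This is only a sketch of the data-parallel idea; GPUs push the same principle much further:</p>

```python
import time

import numpy as np

n = 1_000_000
data = np.arange(n, dtype=np.float64)

# Sequential: one Python-level operation per element.
start = time.perf_counter()
total_loop = 0.0
for x in data:
    total_loop += x * 2.0
loop_time = time.perf_counter() - start

# Vectorized: NumPy applies the operation across the whole array at once,
# inside optimized C code that can use SIMD data parallelism.
start = time.perf_counter()
total_vec = (data * 2.0).sum()
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s")
```

On a typical machine the vectorized version is orders of magnitude faster, even though both compute the same sum.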
<h2 id="conclusion">Conclusion: Better Control Units and Data Parts</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/07/5.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by Miguel Á. Padriñán: https://www.pexels.com/photo/green-circuit-board-343457/</em></p>
<p>In addition to modern CPUs being multicore, advancements in control units and data paths play a critical role in improving processor performance. </p>
<p>Control units are often designed using microprogramming or hardwired control units. </p>
<p>Microprogramming offers greater flexibility and easier updates to the control logic, while hardwired control units provide faster performance by directly implementing control signals.</p>
<p>Another significant advancement is the exploration of new materials for transistors in logic gates. </p>
<p>Instead of relying solely on silicon, researchers are investigating alternative materials to create faster and more efficient processors.</p>
<p>As technology continues to advance, understanding these fundamental concepts will remain essential for both enthusiasts and professionals in the field.</p>
<p>Keeping up with these developments ensures the continued innovation and improvement of CPU design and functionality.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What are Markov Chains? Explained With Python Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ There are various mathematical tools that can be used to predict the near future based on a current state. One of the most widely used are Markov chains. Markov chains allow you to predict the uncertainty of future events under certain conditions. Fo... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-a-markov-chain/</link>
                <guid isPermaLink="false">66ba5357cccc49d721b6ea23</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ statistics ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Mon, 08 Jul 2024 12:53:27 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/miltiadis-fragkidis-2zGTh-S5moM-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>There are various mathematical tools that can be used to predict the near future based on a current state. One of the most widely used are Markov chains.</p>
<p>Markov chains allow you to predict the uncertainty of future events under certain conditions. For this reason, they are widely used in science, engineering, economics, and many other areas.</p>
<p>However, there are many types of Markov chains, and each has its own applications.</p>
<p>This guide introduces what Markov chains are, different types of Markov chains, including Discrete-Time, Continuous-Time, Reversible, and a code example of Hidden Markov Models (HMMs).</p>
<p>We will see:</p>
<ul>
<li><a class="post-section-overview" href="#analogy">Analogy</a></li>
<li><a class="post-section-overview" href="#heading-markov-chain-explained-in-plain-english">Markov Chain Explained in plain English</a></li>
<li><a class="post-section-overview" href="#heading-applications-of-markov-chains">Applications of Markov Chains</a></li>
<li><a class="post-section-overview" href="#types-of-markov-chains">Types of Markov Chains</a></li>
<li><a class="post-section-overview" href="#hidden-markov-chains-code-example">Hidden Markov Chains Code Example</a></li>
</ul>
<h2 id="heading-analogy"> Analogy </h2>

<p>Imagine that you want to predict the weather tomorrow, and it <strong>only</strong> depends on the weather today. The weather can be either sunny or rainy.</p>
<p>Here are the probabilities:</p>
<ul>
<li>If it's sunny today, there's an 80% chance that it will be sunny again tomorrow, and a 20% chance that it will be rainy.</li>
<li>If it's rainy today, there's a 50% chance that it will be sunny tomorrow, and a 50% chance that it will be rainy.</li>
</ul>
<p>In this scenario, we can predict future states of the weather based on current states using probabilities.</p>
<p>This idea of predicting the future based solely on the present state is called a Markov chain.</p>
<p>Here, the states are either sunny or rainy and the probabilities describe the chances of the weather changing based on the current state.</p>
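<p>The analogy above can be simulated directly. This short sketch uses NumPy to walk the sunny/rainy chain with exactly the probabilities listed:</p>

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

states = ["sunny", "rainy"]
# Transition probabilities from the weather analogy:
# rows = today's weather, columns = tomorrow's weather.
P = np.array([[0.8, 0.2],   # sunny -> sunny, sunny -> rainy
              [0.5, 0.5]])  # rainy -> sunny, rainy -> rainy

def simulate(days, start=0):
    """Simulate the chain for a number of days, starting from `start`."""
    state = start
    history = [state]
    for _ in range(days):
        # Tomorrow depends only on today: sample from today's row of P.
        state = np.random.choice(2, p=P[state])
        history.append(state)
    return [states[s] for s in history]

print(simulate(7))
```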
<h2 id="heading-markov-chain-explained-in-plain-english">Markov Chain Explained in Plain English</h2>
<p>A Markov chain describes random processes where systems move between states, and a new state only depends on the current state, not on how it got there.</p>
<p>Mathematically, Markov chains are called stochastic models because they model (simulate) real life events that are random by nature (stochastic).</p>
<p>Markov chains are very easy to implement and efficient at modeling complex systems.</p>
<p>Another key advantage is their "memoryless" property. This makes them fast to run on computers and powerful for studying random processes and making predictions based on current conditions.</p>
<h2 id="heading-applications-of-markov-chains">Applications of Markov Chains</h2>
<p>At some level, almost all real-life events are stochastic. In other words, they involve randomness and uncertainty.</p>
<p>This is exactly why Markov chains are so widely used: they can predict the behavior of systems based on current conditions.</p>
<p>In finance, they are used to model changes in credit ratings and to forecast market regimes.</p>
<p>In genetics, they help us understand how proteins change over time, which is important when studying genetic variations.</p>
<p>In robotics, they assist with decision-making by predicting the robot's next move based on current observation.</p>
<p>These real-life examples show how effectively Markov chains can be used to solve problems in different fields.</p>
<h2 id="heading-types-of-markov-chains"> Types of Markov Chains </h2>

<p>There are many types of Markov chains. In this section, we'll only discuss the most important variants of Markov chains.</p>
<h3 id="heading-discrete-time-markov-chains-dtmcs">Discrete-Time Markov Chains (DTMCs)</h3>
<p>In DTMCs, the system changes state at specific time steps. They are called discrete because the state transitions occur at distinct, separate time intervals.</p>
<p>They are used in queuing theory (study of the behavior of waiting lines), genetics, and economics because they are simple to analyze.</p>
<h3 id="heading-continuous-time-markov-chains-ctmcs">Continuous-Time Markov Chains (CTMCs)</h3>
<p>CTMCs differ from DTMCs in that state transitions can occur at any continuous time point, not at fixed intervals.</p>
<p>This makes them stochastic models where state changes happen continuously. This is important in chemical reactions and reliability engineering.</p>
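<p>As an illustrative sketch (the rates below are made up), a two-state CTMC can be simulated by drawing exponentially distributed holding times between jumps:</p>

```python
import numpy as np

np.random.seed(1)

# Rate (generator) matrix for a two-state CTMC: off-diagonal entries are
# transition rates and each row sums to zero. These rates are invented.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])

def simulate_ctmc(t_end, start=0):
    """Jump between states at exponentially distributed random times."""
    t, state, path = 0.0, start, [(0.0, start)]
    while True:
        rate = -Q[state, state]                 # total exit rate of the state
        t += np.random.exponential(1.0 / rate)  # random holding time
        if t >= t_end:
            break
        state = 1 - state                       # only two states here
        path.append((t, state))
    return path

path = simulate_ctmc(10.0)
print(path[:3])
```

Unlike the discrete-time example, transitions here can happen at any real-valued time, which is exactly what distinguishes CTMCs from DTMCs.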
<h3 id="heading-reversible-markov-chains">Reversible Markov Chains</h3>
<p>Reversible Markov chains are special. The process of state change is the same whether the direction is forwards or backwards, like rewinding a video and playing it again.</p>
<p>This property makes it easier to know when a system is stable and to study how a system behaves over time. They are widely used in statistical physics and economics.</p>
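<p>Reversibility can be checked numerically through the detailed balance condition: for the stationary distribution pi, the quantity pi[i] * P[i, j] must equal pi[j] * P[j, i] for every pair of states. Using the sunny/rainy probabilities from the weather analogy as an example:</p>

```python
import numpy as np

# Transition matrix of the weather analogy (sunny/rainy).
P = np.array([[0.8, 0.2],
              [0.5, 0.5]])

# Stationary distribution pi: the left eigenvector of P for eigenvalue 1,
# normalized so its entries sum to 1 (so pi @ P == pi).
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

# Detailed balance check: every two-state chain satisfies it,
# so this chain is reversible with respect to pi.
balanced = np.allclose(pi[0] * P[0, 1], pi[1] * P[1, 0])
print(pi, balanced)
```

Here pi works out to (5/7, 2/7), and both sides of the balance condition equal 1/7.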
<h3 id="heading-doubly-stochastic-markov-chains">Doubly Stochastic Markov Chains</h3>
<p>Doubly stochastic Markov chains are defined by a transition probability matrix. In the matrix, the sum of the probabilities in each row and each column equals 1.</p>
<p>This means each row and each column represents a valid probability distribution. In other words, each row and column is a list of chances for different outcomes.</p>
<p>This property is crucial in quantum computing and statistical mechanics.</p>
<p>Thanks to doubly stochastic Markov chains, systems evolve in a way that preserves probabilities and symmetry, making the modeling and analysis of quantum computing systems far more accurate.</p>
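<p>It's easy to verify this property numerically. The matrix below is a made-up example whose rows and columns each sum to 1; for such chains the uniform distribution is stationary:</p>

```python
import numpy as np

# A doubly stochastic matrix: every row AND every column sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

row_sums = P.sum(axis=1)
col_sums = P.sum(axis=0)
print(row_sums, col_sums)  # both all ones

# Consequence: the uniform distribution is stationary for this chain.
uniform = np.full(3, 1 / 3)
print(np.allclose(uniform @ P, uniform))  # True
```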
<h2 id="heading-hidden-markov-chains-code-example"> Hidden Markov Chains Code Example </h2>

<p>Before we jump into code examples, let's first understand what Hidden Markov Chains are.</p>
<h3 id="heading-hidden-markov-chains-modeling-unseen-states">Hidden Markov Chains: Modeling Unseen States</h3>
<p>The main idea behind hidden Markov chains is to model systems that have hidden states (states whose values we cannot observe directly) which can only be discovered through observable events.</p>
<p>In other words, hidden Markov chains allow us to predict the behavior of a system by:</p>
<ul>
<li>Considering the likelihood of moving from one state to another.</li>
<li>Knowing the probability of observing a certain event from each state.</li>
</ul>
<p>We can understand this by observing how the states change from an indirect point of view.</p>
<p>We may not know the states' original values.</p>
<p>But by knowing the way they change, we can predict what their values will be in the future.</p>
<p>This way, hidden Markov chains are flexible in modeling sequences, capturing both the transitions between hidden states and the observable outcomes.</p>
<p>Because of this, hidden Markov models are used in fields such as engineering, financial modeling, speech recognition, bioinformatics, and many more.</p>
<h3 id="heading-code-example">Code Example</h3>
<p>In this code example, we will see a simple example with synthetic data.</p>
<p>Here is the full code:</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> hmmlearn <span class="hljs-keyword">import</span> hmm

# <span class="hljs-built_in">Set</span> random seed <span class="hljs-keyword">for</span> reproducibility
np.random.seed(<span class="hljs-number">42</span>)

# Define the HMM parameters
n_components = <span class="hljs-number">2</span>  # <span class="hljs-built_in">Number</span> <span class="hljs-keyword">of</span> states
n_features = <span class="hljs-number">1</span>    # <span class="hljs-built_in">Number</span> <span class="hljs-keyword">of</span> observation features

# Create a Gaussian HMM
model = hmm.GaussianHMM(n_components=n_components, covariance_type=<span class="hljs-string">"diag"</span>)

# Define transition matrix (rows must sum to <span class="hljs-number">1</span>)
model.startprob_ = np.array([<span class="hljs-number">0.6</span>, <span class="hljs-number">0.4</span>])
model.transmat_ = np.array([[<span class="hljs-number">0.7</span>, <span class="hljs-number">0.3</span>],
                            [<span class="hljs-number">0.4</span>, <span class="hljs-number">0.6</span>]])

# Define means and covariances <span class="hljs-keyword">for</span> each state
model.means_ = np.array([[<span class="hljs-number">0.0</span>], [<span class="hljs-number">3.0</span>]])
model.covars_ = np.array([[<span class="hljs-number">0.5</span>], [<span class="hljs-number">0.5</span>]])

# Generate synthetic observation data
X, Z = model.sample(<span class="hljs-number">100</span>)  # <span class="hljs-number">100</span> samples

# Create a <span class="hljs-keyword">new</span> HMM instance
new_model = hmm.GaussianHMM(n_components=n_components, covariance_type=<span class="hljs-string">"diag"</span>, n_iter=<span class="hljs-number">100</span>)

# Fit the model to the data
new_model.fit(X)

# Print the learned parameters
print(<span class="hljs-string">"Transition matrix:"</span>)
print(new_model.transmat_)
print(<span class="hljs-string">"Means:"</span>)
print(new_model.means_)
print(<span class="hljs-string">"Covariances:"</span>)
print(new_model.covars_)

# Predict the hidden states <span class="hljs-keyword">for</span> the observed data
hidden_states = new_model.predict(X)

print(<span class="hljs-string">"Hidden states:"</span>)
print(hidden_states)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Full code</em></p>
<p>Let's go through the code block by block!</p>
<h3 id="heading-import-libraries-and-set-random-seed">Import libraries and set random seed</h3>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> hmmlearn <span class="hljs-keyword">import</span> hmm

np.random.seed(<span class="hljs-number">42</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Import libraries and set random seed</em></p>
<p>In this block of code, we imported two Python libraries:</p>
<ul>
<li><a target="_blank" href="https://numpy.org/">NumPy</a>: For numerical operations.</li>
<li><a target="_blank" href="https://hmmlearn.readthedocs.io/en/latest/index.html">hmmlearn</a>: For hidden Markov model implementation.</li>
</ul>
<p>Next, we used the <code>numpy</code> library to set a random seed.</p>
<h4 id="heading-what-is-a-random-seed">What is a Random Seed?</h4>
<p>A random seed is a value used to start a pseudorandom number generator.</p>
<p>With a fixed random seed, we ensure that the sequence of pseudorandom numbers generated is always the same.</p>
<p>This allows us to duplicate experiments and verify results.</p>
<p>The specific value of the seed does not matter as long as it remains consistent.</p>
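<p>A quick demonstration: resetting the seed reproduces exactly the same "random" numbers.</p>

```python
import numpy as np

# Same seed -> same pseudorandom sequence, run after run.
np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)  # reset to the same seed
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```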
<h3 id="heading-define-the-hmm-parameters-and-create-a-gaussian-hmm">Define the HMM parameters and create a Gaussian HMM</h3>
<pre><code>n_components = <span class="hljs-number">2</span>  # <span class="hljs-built_in">Number</span> <span class="hljs-keyword">of</span> states
n_features = <span class="hljs-number">1</span>    # <span class="hljs-built_in">Number</span> <span class="hljs-keyword">of</span> observation features

model = hmm.GaussianHMM(n_components=n_components, covariance_type=<span class="hljs-string">"diag"</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define the HMM parameters and create a Gaussian HMM</em></p>
<p>In this code block, we created an HMM with two hidden states and a single observed variable.</p>
<p><code>covariance_type="diag"</code> means the matrices that represent covariance (how two variables change together) are diagonal. In other words, each observation feature is assumed to be independent of the others.</p>
<p>With only one feature, as in this example, this simply means each state has its own variance.</p>
<p>However, there is still one unfamiliar term in how we defined this hidden Markov model: "Gaussian".</p>
<h4 id="heading-what-does-gaussian-mean">What Does "Gaussian" Mean?</h4>
<p>This is a very big topic in statistics, but in a few words, Markov chains can only be created when we specify the transition probabilities—chances of moving from one state to another in a Markov chain—and an initial probability distribution.</p>
<p>A Gaussian HMM assumes events are initially modeled by a Gaussian distribution, also called a normal distribution.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/06/normal-distribution.png" alt="Image" width="600" height="400" loading="lazy">
<em>Normal distribution</em></p>
<p>A normal distribution is like a bell-shaped curve that describes how things are often spread out in nature.</p>
<p>The normal distribution is crucial because it describes many natural occurrences like human heights, measurement errors, how likely a disease might spread and many more.</p>
<p>And even when natural events are not themselves normally distributed, the <a target="_blank" href="https://www.investopedia.com/terms/c/central_limit_theorem.asp">central limit theorem</a> often lets us approximate them with a normal distribution.</p>
<p>This way, many hidden Markov models (HMMs) are defined by a normal distribution, which represents many phenomena in nature and society.</p>
<p>In the hmmlearn library, there is also the possibility of creating Markov chains based on Poisson distributions.</p>
<p>In simple words, Poisson distributions model probabilities that describe the occurrence of events over a fixed interval of time or space. This is widely used in telecommunications.</p>
<p>HMMs based on a Poisson distribution are suited to predicting events that occur randomly and independently over a specified interval.</p>
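<p>For a quick feel for the Poisson distribution itself, here is a small sketch (the call-center scenario and its rate are made up):</p>

```python
import numpy as np

np.random.seed(0)

# Poisson distribution: counts of independent random events per interval.
# Hypothetical example: a call center receiving on average 4 calls per hour.
calls_per_hour = np.random.poisson(lam=4, size=1000)

print(calls_per_hour[:10])
print(calls_per_hour.mean())  # close to the rate lam=4
```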
<h3 id="heading-define-transition-matrix-means-and-covariances-for-each-state">Define transition matrix, means, and covariances for each state</h3>
<pre><code>model.startprob_ = np.array([<span class="hljs-number">0.6</span>, <span class="hljs-number">0.4</span>])
model.transmat_ = np.array([[<span class="hljs-number">0.7</span>, <span class="hljs-number">0.3</span>],
                            [<span class="hljs-number">0.4</span>, <span class="hljs-number">0.6</span>]])

model.means_ = np.array([[<span class="hljs-number">0.0</span>], [<span class="hljs-number">3.0</span>]])
model.covars_ = np.array([[<span class="hljs-number">0.5</span>], [<span class="hljs-number">0.5</span>]])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Define transition matrix, means, and covariances for each state</em></p>
<p><strong><code>model.startprob_ = np.array([0.6, 0.4])</code></strong>:</p>
<ul>
<li>This line sets the initial state probabilities for a Hidden Markov Model (HMM). It indicates that there is a 60% probability of starting in state 0 and a 40% probability of starting in state 1.</li>
</ul>
<p><strong><code>model.transmat_ = np.array([[0.7, 0.3], [0.4, 0.6]])</code></strong>:</p>
<ul>
<li>This line sets the state transition probability matrix for the HMM. The matrix specifies the probabilities of moving from one state to another:</li>
<li>From state 0, there is a 70% chance of staying in state 0 and a 30% chance of transitioning to state 1.</li>
<li>From state 1, there is a 40% chance of transitioning to state 0 and a 60% chance of staying in state 1.</li>
</ul>
<p><strong><code>model.means_ = np.array([[0.0], [3.0]])</code></strong>:</p>
<ul>
<li>This line sets the mean values for the observation distributions in each state. It indicates that the observations are normally distributed with a mean of 0.0 in state 0 and a mean of 3.0 in state 1.</li>
</ul>
<p><strong><code>model.covars_ = np.array([[0.5], [0.5]])</code></strong>:</p>
<ul>
<li>This line sets the covariance values for the observation distributions in each state. It specifies that the variance (covariance in this 1-dimensional case) of the observations is 0.5 for both state 0 and state 1.</li>
</ul>
<h3 id="heading-create-data-new-hmm-instance-and-fit-the-model-with-the-data">Create data, new HMM instance and fit the model with the data</h3>
<pre><code>X, Z = model.sample(<span class="hljs-number">100</span>)  # <span class="hljs-number">100</span> samples

new_model = hmm.GaussianHMM(n_components=n_components, covariance_type=<span class="hljs-string">"diag"</span>, n_iter=<span class="hljs-number">100</span>)

new_model.fit(X)

print(<span class="hljs-string">"Transition matrix:"</span>)
print(new_model.transmat_)
print(<span class="hljs-string">"Means:"</span>)
print(new_model.means_)
print(<span class="hljs-string">"Covariances:"</span>)
print(new_model.covars_)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/5.png" alt="Image" width="600" height="400" loading="lazy">
<em>Create data, new HMM instance and fit the model with the data</em></p>
<p>In this code, we drew 100 samples from the model, trained a new model for up to 100 iterations, and printed the new state transition matrix, means, and covariances.</p>
<p>In other words, we generated 100 samples from the original model, fit a new Hidden Markov Model (HMM) to these samples, and then printed the learned parameters of this new model.</p>
<ul>
<li><strong>X</strong> contains the observed data samples generated by the original model.</li>
<li><strong>Z</strong> contains the hidden state sequences corresponding to those samples.</li>
</ul>
<p><strong>The transition matrix prints out:</strong></p>
<pre><code>[[<span class="hljs-number">0.8100804</span>  <span class="hljs-number">0.1899196</span> ]
 [<span class="hljs-number">0.49398918</span> <span class="hljs-number">0.50601082</span>]]
</code></pre><p>This means that the model tends to stay in state 0 and has nearly equal chances of switching or staying when in state 1.</p>
<p><strong>The means print out:</strong></p>
<pre><code>[[<span class="hljs-number">0.01577373</span>]
 [<span class="hljs-number">3.06245496</span>]]
</code></pre><p>This means that the average observed value is approximately 0.016 in state 0 and 3.062 in state 1.</p>
<p><strong>The covariances print out:</strong></p>
<pre><code>[[[<span class="hljs-number">0.41987084</span>]]
 [[<span class="hljs-number">0.53146802</span>]]]
</code></pre><p>This means that the observed values vary by about 0.420 in state 0 and 0.531 in state 1.</p>
<p>This way, we may never know exactly the values of the states, but we know:</p>
<ul>
<li>How they tend to change with each other</li>
<li>Their average observed value</li>
<li>How they vary</li>
</ul>
<h3 id="heading-predict-the-hidden-states-for-the-observed-data">Predict the hidden states for the observed data</h3>
<pre><code>hidden_states = new_model.predict(X)

print(<span class="hljs-string">"Hidden states:"</span>)
print(hidden_states)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/06/6.png" alt="Image" width="600" height="400" loading="lazy">
<em>Predict the hidden states for the observed data</em></p>
<p>In this code, based on the observed data samples X, we predicted the sequence of hidden states of the Markov model.</p>
<p>The hidden states print out:</p>
<pre><code>[<span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span>
 <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span>
 <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">1</span> <span class="hljs-number">1</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span>]
</code></pre><p>This means that the hidden states switch between state 0 and state 1, showing how the system changes states over time.</p>
<h2 id="heading-conclusion-the-future-of-markov-chains">Conclusion: The Future of Markov Chains</h2>
<p>Markov chains are widely used in STEM fields due to their ability to predict the future based on the present.</p>
<p>Markov chains are increasingly being integrated with artificial intelligence, improving automation and the predictive analytics of systems.</p>
<p>Additionally, the development of more computationally efficient Markov chains is a big priority, making them more accessible for real-time processing and large-scale simulations.</p>
<p>In summary, Markov chains are an important tool in science because they let us predict a system's next state from its current one.</p>
<p>With AI and more computational efficiency, Markov chains can be applied in many other fields and solve many problems.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How the Black-Scholes Equation Works – Explained with Python Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ The Black-Scholes Equation is probably one of the most influential equations that nobody has heard about. It's particularly important in finance, especially in these areas: Securitized debt Exchange-traded options Credit default swaps Over-the-count... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-the-black-scholes-equation-works-python-examples/</link>
                <guid isPermaLink="false">66ba532b8e44e0cdf128126f</guid>
                
                    <category>
                        <![CDATA[ finance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MathJax ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Mon, 17 Jun 2024 16:42:59 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/dan-cristian-padure-h3kuhYUCE9A-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The Black-Scholes Equation is probably one of the most influential equations that nobody has heard about.</p>
<p>It's particularly important in finance, especially in these areas:</p>
<ul>
<li>Securitized debt</li>
<li>Exchange-traded options</li>
<li>Credit default swaps</li>
<li>Over-the-counter derivatives securities</li>
</ul>
<p>In this article, you'll learn why the Black-Scholes Equation is so important in finance, what problems it solves, and the industries it created.</p>
<h3 id="heading-heres-what-well-cover">Here's what we'll cover:</h3>
<ul>
<li><a class="post-section-overview" href="#heading-prerequisite-knowledge-of-finance">Prerequisite knowledge of finance</a></li>
<li><a class="post-section-overview" href="#heading-analogy-predict-the-price-of-a-ticket-for-a-concert">Analogy: Predict the price of a ticket for a concert</a></li>
<li><a class="post-section-overview" href="#heading-black-scholes-in-plain-english-with-a-code-example">Plain English explanation with code example</a></li>
<li><a class="post-section-overview" href="#heading-implications-in-the-real-world">Implications in the real world</a></li>
</ul>
<p>Note: In the code example, we will be working with European call and put options.</p>
<h2 id="pre">Prerequisite Knowledge of Finance</h2>

<p>To get the most out of this article and understand the Black-Scholes Equation, you just need to know what <strong>financial derivatives</strong> and <strong>options</strong> are in finance.</p>
<p>Essentially, financial derivatives are tools investors use to manage risks and improve returns.</p>
<p>There are many types of financial derivatives. One of these is called options. </p>
<p>Options are like financial choices. With options, you can get the right to buy or sell something at a certain time and price, but only if you want to.</p>
<p>The main idea is that they help manage risk so you can make better investments in the future.</p>
<h2 id="analogy"> Analogy: Predict the Price of a Ticket for a Concert </h2>

<p>Imagine you are planning to buy a ticket for a concert.</p>
<p>The ticket prices change depending on the popularity of the artist, demand, and time until the concert.</p>
<p>Depending on that, you will make the best possible decision to buy the ticket at the lowest price.</p>
<p>Just as you think about the <strong>risk</strong> of buying the ticket at a certain time, investors use the Black-Scholes Equation to estimate the fair value of financial derivatives.</p>
<p>This way, they can make wise investment choices in ever-changing markets.</p>
<h2 id="plain"> Black-Scholes in Plain English – with a Code Example </h2>

<p>Essentially, the Black-Scholes Equation solved the problem of <a target="_blank" href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=500303">how to price options correctly in financial markets.</a></p>
<p>This is very important, because it helps banks and financial institutions effectively manage risk.</p>
<p>However, it was not always like this. Before 1973, when the equation was created (<a target="_blank" href="https://www.nobelprize.org/prizes/economic-sciences/1997/press-release/">its creators won a Nobel prize</a>), determining the price of options was much more complicated and difficult.</p>
<p>Before the creation of the Black–Scholes equation, there wasn't a standardized mathematical method to predict the prices of options.</p>
<p>Traders often relied on personal experience and market conditions, which led to unreliable option prices.</p>
<p>And earlier mathematical methods did not fully consider factors like volatility, time decay, and interest rates. So there was a lot of error when pricing options.</p>
<h3 id="heading-here-is-the-black-scholes-equation">Here is the Black-Scholes Equation:</h3>
<p>$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = rV - r S \frac{\partial V}{\partial S}$$</p><p>While we won't look very deeply at the equation itself, we will outline its key components and implications.</p>
<p>Essentially, the Black-Scholes equation predicts how an option's value changes over time based on several variables:</p>
<ul>
<li>V – Price of the option as a function of stock price <em>S</em> and time <em>t</em></li>
<li>S – Price of the underlying asset</li>
<li>t – Time</li>
<li>σ – Volatility</li>
<li>r – Risk-free interest rate</li>
</ul>
<p>The left side of the equation explains how the option's value changes over time and how market ups and downs affect it. </p>
<p>The right side of the equation shows how the option's value increases due to interest rates and how changes in the asset's price impact it. </p>
<p>By making these two sides equal, we figure out the fair price of the option. </p>
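<p>Solving the equation with the appropriate boundary conditions yields a closed-form formula for European options. Here is a minimal sketch of the call-price formula using only the Python standard library (the function names <code>norm_cdf</code> and <code>bs_call_price</code> are illustrative, not from any particular package):</p>

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF, written with the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_price(S, K, T, r, sigma):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Same parameters as the worked example below: S=100, K=105, T=1, r=0.05%, sigma=20%
print(round(bs_call_price(100, 105, 1.0, 0.0005, 0.20), 6))
```

<p>Run with the parameters used later in this article, this should closely match the call price computed by the library.</p>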
<h3 id="heading-python-code-example">Python Code Example</h3>
<p>In this code example, we will find, based on many parameters, the theoretical market value of an option.</p>
<p>For our example, let's assume the following:</p>
<ul>
<li>Current stock price (S) = $100. This is the price of the stock right now.</li>
<li>Strike price (K) = $105. This is the specific price at which the option holder can buy (call) or sell (put) the underlying asset.</li>
<li>Time to expiration (T) = 1 year (or 1.0 when expressed in years). This is the time left until the option expires.</li>
<li>Risk-free interest rate (r) = 0.05% (or 0.0005 when expressed as a decimal). This is the interest rate on a risk-free investment.</li>
<li>Volatility (sigma) = 20% (or 0.2 when expressed as a decimal). This is how much the stock price is expected to fluctuate.</li>
</ul>
<pre><code><span class="hljs-keyword">from</span> blackscholes <span class="hljs-keyword">import</span> BlackScholesCall, BlackScholesPut

def calculate_option_prices(S, K, T, r, sigma, q):
    <span class="hljs-string">""</span><span class="hljs-string">"
    Calculate the Black-Scholes option prices for European call and put options using the 'blackscholes' package.

    Parameters:
    S : float - current stock price
    K : float - strike price of the option
    T : float - time to maturity (in years)
    r : float - risk-free interest rate (annual as a decimal)
    sigma : float - volatility of the underlying stock (annual as a decimal)
    q : float - annual dividend yield (as a decimal)

    Returns:
    tuple - (call price, put price)
    "</span><span class="hljs-string">""</span>
    # Creating instances <span class="hljs-keyword">of</span> BlackScholesCall and BlackScholesPut
    call_option = BlackScholesCall(S=S, K=K, T=T, r=r, sigma=sigma, q=q)
    put_option = BlackScholesPut(S=S, K=K, T=T, r=r, sigma=sigma, q=q)

    # Get call and put prices
    call_price = call_option.price()
    put_price = put_option.price()

    <span class="hljs-keyword">return</span> call_price, put_price


call_price, put_price = calculate_option_prices(<span class="hljs-number">100</span>, <span class="hljs-number">105</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0.0005</span>, <span class="hljs-number">0.20</span>, <span class="hljs-number">0.0</span>)
print(<span class="hljs-string">"Call Price: {:.6f}, Put Price: {:.6f}"</span>.format(call_price, put_price))
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/05/1.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Now let's examine the code more closely and see what's really going on here:</p>
<h4 id="heading-step-1-import-the-library">Step 1: Import the Library</h4>
<p>This is the Python library we are using in this article:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://pypi.org/project/blackscholes/">https://pypi.org/project/blackscholes/</a></div>
<pre><code><span class="hljs-keyword">from</span> blackscholes <span class="hljs-keyword">import</span> BlackScholesCall, BlackScholesPut
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/05/2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing functions</em></p>
<h4 id="heading-step-2-create-the-function-to-calculate-options-prices">Step 2: Create the Function to Calculate Options Prices</h4>
<p>In the code below, we define the function that calculates the call and put prices of the options.</p>
<pre><code>def calculate_option_prices(S, K, T, r, sigma, q):

    call_option = BlackScholesCall(S=S, K=K, T=T, r=r, sigma=sigma, q=q)
    put_option = BlackScholesPut(S=S, K=K, T=T, r=r, sigma=sigma, q=q)

    call_price = call_option.price()
    put_price = put_option.price()

    <span class="hljs-keyword">return</span> call_price, put_price
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/05/3.png" alt="Image" width="600" height="400" loading="lazy">
<em>Function to calculate call and put prices</em></p>
<p>The main parameters of the function are:</p>
<ul>
<li>S : float – current stock price</li>
<li>K : float – strike price of the option</li>
<li>T : float – time to maturity (in years)</li>
<li>r : float – risk-free interest rate (annual as a decimal)</li>
<li>sigma : float – volatility of the underlying stock (annual as a decimal)</li>
<li>q : float – annual dividend yield (as a decimal)</li>
</ul>
<p>And it returns:</p>
<ul>
<li>tuple – (call price, put price)</li>
</ul>
<p>First, we calculate the call and put options. Then we extract the price from it. We can also get other characteristics like the charm or the delta of these financial contracts according to the library documentation.</p>
<h4 id="heading-step-3-calculate-the-options-pricing">Step 3: Calculate the Options Pricing</h4>
<p>The call and put prices of an option are the costs to buy the respective option contracts. </p>
<pre><code>call_price, put_price = calculate_option_prices(<span class="hljs-number">100</span>, <span class="hljs-number">105</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0.0005</span>, <span class="hljs-number">0.20</span>, <span class="hljs-number">0.0</span>)
print(<span class="hljs-string">"Call Price: {:.6f}, Put Price: {:.6f}"</span>.format(call_price, put_price))
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/05/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Applying the function</em></p>
<p>We use as examples:</p>
<ul>
<li>Current stock price: $100</li>
<li>Strike price: $105</li>
<li>Time to maturity: 1 year</li>
<li>Risk-free interest rate: 0.05% (as a decimal: 0.0005)</li>
<li>Volatility: 20% (as a decimal: 0.20)</li>
<li>Dividend yield: 0%</li>
</ul>
<p>Based on these factors, we get the following prices:</p>
<ul>
<li>Call Option Price: 5.924799</li>
<li>Put Option Price: 10.872312</li>
</ul>
<p>Which means that, given these parameters:</p>
<ul>
<li>The call option (the right, but not the obligation, to buy the stock at the strike price) costs 5.924799 dollars</li>
<li>The put option (the right, but not the obligation, to sell the stock at the strike price) costs 10.872312 dollars</li>
</ul>
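<p>As a sanity check on these numbers: European call and put prices must satisfy put-call parity, C − P = S − Ke<sup>−rT</sup>. A minimal sketch, plugging in the values from the example above:</p>

```python
from math import exp

S, K, T, r = 100, 105, 1.0, 0.0005
call_price, put_price = 5.924799, 10.872312  # values computed above

# Put-call parity: C - P must equal S - K * e^(-r*T)
lhs = call_price - put_price
rhs = S - K * exp(-r * T)
print(abs(lhs - rhs) < 1e-4)  # prints True: parity holds up to rounding
```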
<h2 id="implications"> Implications in the Real World </h2>

<p>The equation has had a massive impact in the world of finance.</p>
<p>Below are some of the industries the Black-Scholes Equation has changed greatly:</p>
<h3 id="heading-securitized-debt">Securitized Debt</h3>
<p>In simple terms, securitized debt refers to turning loans into something that can be bought and sold.</p>
<p>The Black-Scholes equation changed the way banks price grouped-up debt, like mortgages.</p>
<p>Before the Black-Scholes equation, it was very hard to know the worth of these debts. But with the equation, banks can understand their value much better. This made it easier to buy and sell these debts while knowing the potential benefits and risks.</p>
<p>This way, the market for these mortgage debts grew. Which in turn helped grow the housing market.</p>
<h3 id="heading-exchange-traded-options">Exchange Traded Options</h3>
<p>Trading options was a very uncertain business. There was no way of truly knowing how to correctly price them.</p>
<p>However, with the Black-Scholes equation, option pricing became far easier. It allowed people to calculate an option based on an underlying asset's price, volatility, time to expiration, and interest rates.</p>
<p>The newfound precision helped grow the options market.</p>
<h3 id="heading-credit-default-swaps">Credit Default Swaps</h3>
<p>Credit default swaps are like insurance policies for loans. With a credit default swap, you are protected if the borrower fails to pay back.</p>
<p>Credit default swaps are very important in managing credit risk. But it was only after the Black-Scholes equation was created that they could be accurately priced.</p>
<p>This way, credit default swaps became a very important tool for financial institutions for financial risk management.</p>
<h3 id="heading-over-the-counter-derivatives-securities">Over the Counter Derivatives Securities</h3>
<p>Over-the-counter (OTC) derivatives are private deals made between two parties without a stock exchange.</p>
<p>Before Black-Scholes, negotiating the terms and prices of OTC derivatives was very hard. But then the Black-Scholes equation offered a standard way of finding the price of derivatives.</p>
<p>This allowed market participants to negotiate contracts more efficiently and accurately.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The Black-Scholes equation helped create more precision in the way certain things are priced.</p>
<p>This precision helped create more stable institutions, which in turn helped create a more resilient economy.</p>
<p>If you're interested in learning more, see this video:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/A5w-dEgIU1M" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>If you are interested in learning more about finance:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.freecodecamp.org/news/fundamentals-of-finance-economics-for-businesses/">https://www.freecodecamp.org/news/fundamentals-of-finance-economics-for-businesses/</a></div>
<h2 id="heading-full-code">Full code</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How AI Models Think: The Key Role of Activation Functions with Code Examples ]]>
                </title>
                <description>
                    <![CDATA[ In Artificial Intelligence, Machine Learning is the foundation of most revolutionary AI applications. From language processing to image recognition, Machine Learning is everywhere. Machine Learning relies on algorithms, statistical models, and neural... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/activation-functions-in-neural-networks/</link>
                <guid isPermaLink="false">66ba5319ba2ef92905bfa81b</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ neural networks ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Wed, 10 Apr 2024 15:44:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/abigail-keenan-8-s5QuUBtyM-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In Artificial Intelligence, Machine Learning is the foundation of most revolutionary AI applications. From language processing to image recognition, Machine Learning is everywhere.</p>
<p>Machine Learning relies on algorithms, statistical models, and neural networks. And Deep Learning is the subfield of Machine Learning focused only on neural networks.</p>
<p>A key piece of any neural network are activation functions. But understanding exactly why they are essential to any neural network system is a common question, and it can be a difficult one to answer.</p>
<p>This tutorial focuses on explaining, in a simple manner with analogies, why exactly activation functions are necessary.</p>
<p>By understanding this, you will understand the process of how AI models think.</p>
<p>Before that, we will explore neural networks in AI. We will also explore the most commonly used activation functions.</p>
<p>We're also going to analyze every line of a very simple PyTorch code example of a neural network.</p>
<h3 id="heading-in-this-article-we-will-explore">In this article, we will explore:</h3>
<ul>
<li><a class="post-section-overview" href="#artificial">Artificial Intelligence and the Rise of Deep Learning</a></li>
<li><a class="post-section-overview" href="#heading-understanding-activation-functions-simplifying-neural-network-mechanics">Understanding Activation Functions: Simplifying Neural Network Mechanics</a></li>
<li><a class="post-section-overview" href="#heading-simple-analogy-why-activation-functions-are-necessary">Simple Analogy: The Necessity of Activation Functions</a></li>
<li><a class="post-section-overview" href="#heading-what-happens-without-activation-functions">What Happens Without Activation Functions?</a></li>
<li><a class="post-section-overview" href="#heading-pytorch-activation-function-code-example">PyTorch Activation Function Code Example</a> </li>
<li><a class="post-section-overview" href="#heading-conclusion-the-unsung-heroes-of-ai-neural-networks">Conclusion: The Unsung Heroes of AI Neural Networks</a></li>
</ul>
<p>This article won't cover dropout or other regularization techniques, hyperparameter optimization, complex architectures like CNNs, or detailed differences in gradient descent variants.</p>
<p>I just want to showcase <strong>why activation functions are needed</strong> and what happens when they are not applied to neural networks.</p>
<p>If you don't know much about deep learning, I personally recommend this Deep Learning crash course on freeCodeCamp's YouTube channel:</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/VyWAvY2CF9c" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<h2 id="Artificial">Artificial Intelligence and the Rise of Deep Learning</h2>

<h3 id="heading-what-is-deep-learning-in-artificial-intelligence">What is Deep Learning in Artificial Intelligence?</h3>
<p>Deep learning is a subfield of artificial intelligence. It uses neural networks to process complex patterns, just like the strategies a sports team uses to win a match.</p>
<p>The bigger the neural network, the more capable it is of doing awesome things – like ChatGPT, for example, which uses natural language processing to answer questions and interact with users.</p>
<p>To truly understand the basics of neural networks – what every single AI model has in common that enables it to work – we need to understand activation layers.</p>
<h3 id="heading-deep-learning-training-neural-networks">Deep Learning = Training Neural Networks</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/4-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Simple neural network</em></p>
<p>At the core of deep learning is the training of neural networks.</p>
<p>That means basically using data to get the right values of the weights to be able to predict what we want.</p>
<p>Neural networks are made of neurons organized in layers. Each layer extracts unique features from the data.</p>
<p>This layered structure allows deep learning models to analyze and interpret complex data.</p>
<h2 id="activation_functions_explanation">Understanding Activation Functions: Simplifying Neural Network Mechanics</h2>

<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/aaaaaaaaaaaaaaaaaaa.png" alt="Image" width="600" height="400" loading="lazy">
<em>Leaky reLU activation function</em></p>
<p>Activation functions help neural networks handle complex data. They change the neuron value based on the data they receive.</p>
<p>It is almost like a filter every neuron has before sending its value to the next neuron.</p>
<p>Essentially, activation functions control the information flow of neural networks – they decide which data is relevant and which is not.</p>
<p>This helps prevent vanishing gradients, ensuring the network learns properly.</p>
<p>The vanishing gradients problem happens when the neural network's learning signals are too weak to make the weight values change. This makes learning from data very difficult.</p>
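<p>You can see this numerically. A sigmoid's derivative is at most 0.25, so a learning signal that passes back through many sigmoid layers shrinks geometrically. A minimal sketch:</p>

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# Best case: the gradient is multiplied by 0.25 at each of 10 sigmoid layers
grad = 1.0
for _ in range(10):
    grad *= sigmoid_derivative(0.0)

print(grad)  # 0.25**10, roughly 9.5e-07 -- the signal has all but vanished
```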
<h2 id="simple">Simple Analogy: Why Activation Functions are Necessary</h2>

<p>In a soccer game, players decide whether to pass, dribble, or shoot the ball.</p>
<p>These decisions are based on the current game situation, just as neurons in a neural network process data.</p>
<p>In this case, activation functions act like this in the decision-making process.</p>
<p>Without them, neurons would pass data <strong>without any selective analysis</strong> – like players <strong>mindlessly kicking the ball</strong> regardless of the game context.</p>
<p>In this way, activation functions introduce complexity into a neural network, allowing it to learn complex patterns.</p>
<h2 id="what">What Happens Without Activation Functions?</h2>

<p>To understand what would happen without activation functions, let's first think about what happens if players mindlessly kick the ball in a soccer match.</p>
<p>They'd likely lose the match because there would be no decision-making processes as a team. That ball still goes somewhere – but most of the time it will not go where it's intended.</p>
<p>This is similar to what happens in a neural network without activation functions: the neural network doesn't make good predictions because the neurons were just passing data to each other randomly.</p>
<p>We still get a prediction. Just not what we wanted, or what's helpful.</p>
<p>This dramatically limits the capability – of both the soccer team and the neural network.</p>
<h3 id="heading-intuitive-explanation-of-activation-functions">Intuitive Explanation of Activation Functions</h3>
<p>Let's now look at an example so you can understand this intuitively.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/7-3.png" alt="Image" width="600" height="400" loading="lazy">
<em>reLU activation function</em></p>
<p>Let's start with the most widely used activation function in deep learning (it's also one of the simpler ones).</p>
<p>This is a ReLU activation function. It basically acts as a filter before a neuron sends its value to the next neuron.</p>
<p>This filter is essentially two conditions:</p>
<ul>
<li>If the input value is negative, the output becomes 0</li>
<li>If the input value is positive, it passes through unchanged</li>
</ul>
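<p>In code, this filter is a one-liner. A minimal sketch:</p>

```python
def relu(x):
    # Zero out negative inputs; pass positive inputs through unchanged
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```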
<p>With this, we are adding a decision-making process to each neuron. It decides which data to send and which not to send.</p>
<p>Now let's look at some examples of other activation functions.</p>
<h3 id="heading-sigmoid-activation-functions">Sigmoid Activation Functions</h3>
<p>This activation function squashes the input value into the range between 0 and 1. Sigmoids are widely used on the last neuron in binary classification problems.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/9-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Sigmoid activation function</em></p>
<p>There is a problem with sigmoid activation functions, though. Take the output values from a given linear transformation:</p>
<ul>
<li>0.00000003</li>
<li>0.99999992</li>
<li>0.00000247</li>
<li>0.99993320</li>
</ul>
<p>There are some questions about these values we can ask:</p>
<ul>
<li>Are values like 0.00000003 and 0.00000247 really important? Can't they just be 0 so that we have fewer things to compute? Remember, many of today's models contain millions of weights. Can't millions of values like 0.00000003 and 0.00000247 simply be 0?</li>
<li>And for large positive values, how do we distinguish a <strong>big value</strong> from a <strong>very big value</strong>? Outputs like 0.99993320 and 0.99999992 could come from inputs as different as <em>7 and 13</em> or <em>7 and 55</em>; they do not <strong>accurately</strong> describe their input values.</li>
</ul>
<p>How can we distinguish the subtle differences in outputs so that accuracy is maintained?</p>
<p>This is what the ReLU activation function solved: setting negative values to zero while keeping positive ones unchanged boosts computational efficiency and preserves the differences between large inputs.</p>
<h3 id="heading-tanh-hyperbolic-tangent-activation-functions">Tanh (Hyperbolic Tangent) Activation Functions</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/10-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>tanh activation function</em></p>
<p>These activation functions output values between -1 and 1, similar to Sigmoid.</p>
<p>They're often used in <a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-recurrent-neural-networks-in-python/">recurrent neural networks (RNNs) and long short-term memory networks (LSTMs).</a></p>
<p>Tanh is also used because it is zero-centered. This means that the mean of the output values is around zero. This property helps when dealing with the vanishing gradient problem.</p>
<h3 id="heading-leaky-relu">Leaky reLU</h3>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/11-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Leaky reLU activation function</em></p>
<p>Instead of <strong>ignoring</strong> negative values, the Leaky ReLU activation function outputs a small negative value for them.</p>
<p>This way, negative values are also used when training neural networks.</p>
<p>With the ReLU activation function, neurons with negative values are inactive and do not contribute to the learning process.</p>
<p>With the Leaky ReLU activation function, neurons with negative values are active and contribute to the learning process.</p>
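<p>Leaky ReLU can be sketched the same way. The slope used for negative inputs (0.01 here, which is also PyTorch's default for <code>nn.LeakyReLU</code>) is a hyperparameter:</p>

```python
def leaky_relu(x, negative_slope=0.01):
    # Negative inputs are scaled down instead of zeroed, so they still
    # contribute a small gradient during training
    return x if x > 0 else negative_slope * x

print([leaky_relu(v) for v in [-2.0, -0.5, 1.5]])  # [-0.02, -0.005, 1.5]
```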
<p>This decision-making process is implemented by activation functions. Without them, each neuron would simply pass its data to the next neuron (just like a player mindlessly kicking the ball).</p>
<h3 id="heading-mathematical-explanation-of-activation-functions">Mathematical Explanation of Activation Functions</h3>
<p>Neurons do two things:</p>
<ul>
<li>They apply a linear transformation to the values received from the previous layer, using their weights</li>
<li>They apply an activation function to selectively filter which values are passed on</li>
</ul>
<p>Without activation functions, the neural network just does one thing: <strong>Linear transformations.</strong></p>
<p>If it <strong>only</strong> does linear transformations, it is a <strong>linear system</strong>.</p>
<p>If it is a linear system, in very simple terms without being too technical, the <a target="_blank" href="https://www.allaboutcircuits.com/textbook/direct-current/chpt-10/superposition-theorem/">superposition theorem</a> tells us that any mixture of two or more linear transformations can be simplified into one single transformation.</p>
<p>Essentially, it means that, without activation functions, this complex neural network:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/12-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Long neural network without activation functions</em></p>
<p>Is the same as this simple one:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/13-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Short neural network without activation functions</em></p>
<p>This is because each layer in its matrix form is a product of linear transformations of previous layers.</p>
<p>And according to the theorem, since any mixture of two or more linear transformations can be simplified into a single transformation, any stack of hidden layers (the layers between the inputs and outputs) in a neural network can likewise be simplified into a single layer.</p>
<p><strong>What does this all mean?</strong></p>
<p>It means that it can only model data linearly. But in real life with real data, every system is non-linear. So we need activation functions.</p>
<p>We introduce non-linearity into a neural network so that it learns non-linear patterns.</p>
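<p>You can check the collapse directly: stacking two linear layers with no activation in between is exactly one linear layer, because matrix multiplication is associative. A minimal NumPy sketch (the layer sizes mirror the network in the next section):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)          # a 10-feature input
W1 = rng.standard_normal((18, 10))   # "hidden layer" weights
W2 = rng.standard_normal((1, 18))    # "output layer" weights

# Two stacked linear layers, no activation function in between...
deep = W2 @ (W1 @ x)
# ...collapse into a single linear layer with weights W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # prints True
```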
<h2 id="pytorch">PyTorch Activation Function Code Example </h2>

<p>In this section, we are going to train the neural network below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/01/nn-1.svg" alt="Image" width="600" height="400" loading="lazy">
<em>Simple feed forward neural network</em></p>
<p>This is a simple neural network AI model with four fully connected layers:</p>
<ul>
<li>An input layer with 10 neurons</li>
<li>Two hidden layers with 18 neurons each</li>
<li>One hidden layer with 4 neurons</li>
<li>One output layer with 1 neuron</li>
</ul>
<p>In the code, we can choose any of the four activation functions mentioned in this tutorial. </p>
<p>Here is the full code – we'll discuss it in detail below:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim

<span class="hljs-comment">#Choose which activation function to use in code</span>
defined_activation_function = <span class="hljs-string">'relu'</span>

activation_functions = {
    <span class="hljs-string">'relu'</span>: nn.ReLU(),
    <span class="hljs-string">'sigmoid'</span>: nn.Sigmoid(),
    <span class="hljs-string">'tanh'</span>: nn.Tanh(),
    <span class="hljs-string">'leaky_relu'</span>: nn.LeakyReLU()
}

<span class="hljs-comment"># Initializing hyperparameters</span>
num_samples = <span class="hljs-number">100</span>
batch_size = <span class="hljs-number">10</span>
num_epochs = <span class="hljs-number">150</span>
learning_rate = <span class="hljs-number">0.001</span>

<span class="hljs-comment"># Define a simple synthetic dataset</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_data</span>(<span class="hljs-params">num_samples</span>):</span>
    X = torch.randn(num_samples, <span class="hljs-number">10</span>)
    y = torch.randn(num_samples, <span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> X, y

<span class="hljs-comment"># Generate synthetic data</span>
X, y = generate_data(num_samples)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SimpleModel</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, activation=defined_activation_function</span>):</span>
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(in_features=<span class="hljs-number">10</span>, out_features=<span class="hljs-number">18</span>)
        self.fc2 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">18</span>)
        self.fc3 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">4</span>)
        self.fc4 = nn.Linear(in_features=<span class="hljs-number">4</span>, out_features=<span class="hljs-number">1</span>)
        self.activation = activation_functions[activation]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x) 
        x = self.activation(x)
        x = self.fc3(x) 
        x = self.activation(x)  
        x = self.fc4(x) 
        <span class="hljs-keyword">return</span> x

<span class="hljs-comment"># Initialize the model, define loss function and optimizer</span>
model = SimpleModel(activation=defined_activation_function)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

<span class="hljs-comment"># Training loop</span>
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, num_samples, batch_size):
        <span class="hljs-comment"># Get the mini-batch</span>
        inputs = X[i:i+batch_size]
        labels = y[i:i+batch_size]

        <span class="hljs-comment"># Zero the parameter gradients</span>
        optimizer.zero_grad()

        <span class="hljs-comment"># Forward pass</span>
        outputs = model(inputs)

        <span class="hljs-comment"># Compute the loss</span>
        loss = criterion(outputs, labels)

        <span class="hljs-comment"># Backward pass and optimize</span>
        loss.backward()
        optimizer.step()

    print(<span class="hljs-string">f'Epoch <span class="hljs-subst">{epoch+<span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{num_epochs}</span>, Loss: <span class="hljs-subst">{loss}</span>'</span>)

print(<span class="hljs-string">"Training complete."</span>)
</code></pre>
<p>Looks like a lot, doesn't it? Don't worry – we'll take it piece by piece.</p>
<h3 id="heading-1-importing-libraries-and-defining-activation-functions">1: Importing libraries and defining activation functions</h3>
<pre><code><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim

<span class="hljs-comment">#Choose which activation function to use in code</span>
defined_activation_function = <span class="hljs-string">'relu'</span>

activation_functions = {
    <span class="hljs-string">'relu'</span>: nn.ReLU(),
    <span class="hljs-string">'sigmoid'</span>: nn.Sigmoid(),
    <span class="hljs-string">'tanh'</span>: nn.Tanh(),
    <span class="hljs-string">'leaky_relu'</span>: nn.LeakyReLU()
}
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries and defining dictionary with activation functions</em></p>
<p>In this code:</p>
<ul>
<li><code>import torch</code>: <a target="_blank" href="https://pytorch.org/docs/stable/torch.html">Imports the PyTorch library.</a></li>
<li><code>import torch.nn as nn</code>: <a target="_blank" href="https://pytorch.org/docs/stable/nn.html">Imports the neural network module from PyTorch.</a></li>
<li><code>import torch.optim as optim</code>: <a target="_blank" href="https://pytorch.org/docs/stable/optim.html">Imports the optimization module from PyTorch.</a></li>
</ul>
<p>The variable and the dictionary above help you easily define, for this deep learning model, the activation function you want to use.</p>
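<p>The four activations in the dictionary can also be written out directly. Here is a NumPy sketch of their formulas, mirroring what <code>nn.ReLU</code>, <code>nn.Sigmoid</code>, <code>nn.Tanh</code>, and <code>nn.LeakyReLU</code> (with its default negative slope of 0.01) compute:</p>

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def leaky_relu(x, negative_slope=0.01):
    return np.where(x > 0, x, negative_slope * x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))        # negatives clipped to 0
print(leaky_relu(x))  # negatives scaled by 0.01 instead of clipped
print(sigmoid(x))     # squashed into (0, 1)
print(tanh(x))        # squashed into (-1, 1)
```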
<h3 id="heading-2-defining-hyperparameters-and-generating-a-dataset">2: Defining hyperparameters and generating a dataset</h3>
<pre><code># Initializing hyperparameters
num_samples = <span class="hljs-number">100</span>
batch_size = <span class="hljs-number">10</span>
num_epochs = <span class="hljs-number">150</span>
learning_rate = <span class="hljs-number">0.001</span>

# Define a simple synthetic dataset
def generate_data(num_samples):
    X = torch.randn(num_samples, <span class="hljs-number">10</span>)
    y = torch.randn(num_samples, <span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> X, y

# Generate synthetic data
X, y = generate_data(num_samples)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/2.png" alt="Image" width="600" height="400" loading="lazy">
<em>Initializing hyperparameters and creating, with a function, a synthetic dataset</em></p>
<p>In this code:</p>
<ul>
<li><code>num_samples</code> is the number of samples in the synthetic dataset.</li>
<li><code>batch_size</code> is the size of each mini-batch during training.</li>
<li><code>num_epochs</code> is the number of iterations over the entire dataset during training.</li>
<li><code>learning_rate</code> is the learning rate used by the optimization algorithm.</li>
</ul>
<p>We also define a <code>generate_data</code> function that creates two tensors of random values, then call it to generate the inputs <code>X</code> and targets <code>y</code>.</p>
<h3 id="heading-3-creating-the-deep-learning-model">3: Creating the deep learning model</h3>
<pre><code><span class="hljs-keyword">class</span> SimpleModel(nn.Module):
    <span class="hljs-keyword">def</span> __init__(self, activation=defined_activation_function):
        <span class="hljs-built_in">super</span>(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(in_features=<span class="hljs-number">10</span>, out_features=<span class="hljs-number">18</span>)
        self.fc2 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">18</span>)
        self.fc3 = nn.Linear(in_features=<span class="hljs-number">18</span>, out_features=<span class="hljs-number">4</span>)
        self.fc4 = nn.Linear(in_features=<span class="hljs-number">4</span>, out_features=<span class="hljs-number">1</span>)
        self.activation = activation_functions[activation]

    def forward(self, x):
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x) 
        x = self.activation(x)
        x = self.fc3(x) 
        x = self.activation(x)  
        x = self.fc4(x) 
        <span class="hljs-keyword">return</span> x
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/3.png" alt="Image" width="600" height="400" loading="lazy">
<em>A simple feed forward neural network deep learning AI model</em></p>
<p>The <code>__init__</code> method in the <code>SimpleModel</code> class <strong>initializes</strong> the neural network architecture. It initializes four fully connected layers and defines the activation function we are going to use.</p>
<p><a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.nn.Linear.html">We create each layer using</a> <code>nn.Linear</code>, while the <code>forward</code> method defines how the data flows through the neural network.</p>
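<p>As a quick sanity check on the architecture, we can count the learnable parameters by hand – each <code>nn.Linear(in_features, out_features)</code> layer holds <code>in_features * out_features</code> weights plus <code>out_features</code> biases:</p>

```python
# Parameters per nn.Linear(in_features, out_features): in*out weights + out biases
layers = [(10, 18), (18, 18), (18, 4), (4, 1)]

total = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(total)  # 621
```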
<h3 id="heading-4-initializing-the-model-and-defining-the-loss-function-and-optimizer">4: Initializing the model and defining the loss function and optimizer</h3>
<pre><code>model = SimpleModel(activation=defined_activation_function)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/4.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining activation function, loss function and gradient descend variation to be used</em></p>
<p>In this code:</p>
<ol>
<li><code>model = SimpleModel(activation=defined_activation_function)</code> creates a neural network model with a specified activation function.</li>
<li><code>criterion = nn.MSELoss()</code> defines the <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html">Mean Squared Error (MSE) Loss function</a>.</li>
<li><code>optimizer = optim.Adam(model.parameters(), lr=learning_rate)</code> sets up the <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Adam.html">Adam optimizer</a> for updating the model parameters during training, with a specified learning rate.</li>
</ol>
<h3 id="heading-5-training-the-deep-learning-model">5: Training the deep learning model</h3>
<pre><code><span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">0</span>, num_samples, batch_size):
        # Get the mini-batch
        inputs = X[i:i+batch_size]
        labels = y[i:i+batch_size]

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

    print(f<span class="hljs-string">'Epoch {epoch+1}/{num_epochs}, Loss: {loss}'</span>)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/5.png" alt="Image" width="600" height="400" loading="lazy">
<em>Training the model</em></p>
<ul>
<li>The outer loop, based on <code>num_epochs</code> (the number of epochs), controls how many times the entire dataset is processed.</li>
<li>The inner loop divides the dataset into mini-batches using the <code>range</code> function.</li>
</ul>
<p>In each mini-batch iteration:</p>
<ol>
<li>With <code>inputs</code> and <code>labels</code>, we get the data from the mini-batch we want to process.</li>
<li>We <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html">eliminate with <code>optimizer.zero_grad()</code>, the gradients</a> – variables that tell us how to adjust weights for accurate predictions – of the previous mini-batch iteration. This is important to prevent mixing gradient information between mini-batches.</li>
<li>The forward pass gets us the model predictions (<code>outputs</code>), and the loss is calculated using the specified loss function (<code>criterion</code>). </li>
<li>With <code>loss.backward()</code>, we <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html">calculate the gradients for the weights</a>. </li>
<li>Finally, <code>optimizer.step()</code> <a target="_blank" href="https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html">updates the model's weights</a> based on those gradients to minimize the loss function.</li>
</ol>
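<p>To make the roles of <code>zero_grad</code>, <code>backward</code>, and <code>step</code> concrete, here is one gradient-descent step computed by hand for a single weight – a plain-Python sketch of the idea, not PyTorch's actual machinery:</p>

```python
# One weight w, one training sample: prediction = w * x, loss = (w*x - y)**2
w, x, y, lr = 2.0, 3.0, 9.0, 0.01

grad = 0.0                      # optimizer.zero_grad(): clear the old gradient
pred = w * x                    # forward pass: outputs = model(inputs)
loss = (pred - y) ** 2          # criterion: MSE on a single sample
grad = 2 * (pred - y) * x       # loss.backward(): d(loss)/dw by the chain rule
w = w - lr * grad               # optimizer.step(): move against the gradient

print(loss, w)                  # loss is 9.0; w is nudged from 2.0 toward 3.0
```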
<p>This is the full code to train a very simple deep learning model on a very simple dataset.</p>
<p>It does not have anything more advanced like convolutional neural networks.</p>
<h2 id="conclusion">Conclusion: The Unsung Heroes of AI Neural Networks</h2>

<p>Activation functions are like gatekeepers. By restricting the flow of information, the neural network can learn better.</p>
<p>Activation functions are just like people when they study, or soccer players when deciding what to do with a ball.</p>
<p>These functions give neural networks the ability to learn and predict correctly.</p>
<p>Mathematically, activation functions are what allow neural networks to approximate any linear or non-linear function. Without them, neural networks can approximate only linear functions.</p>
<p>And I leave you with this:</p>
<p>The mathematical idea of a neural network being able to approximate any non-linear function is called the <a target="_blank" href="https://towardsai.net/p/deep-learning/understanding-the-universal-approximation-theorem">Universal Approximation Theorem</a>.</p>
<p>You can find the full code on GitHub here:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code</a></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Apply Math with Python – Numerical Analysis Explained ]]>
                </title>
                <description>
                    <![CDATA[ Numerical analysis is the bridge between math and computer science.  Essentially, it is the development of algorithms that approximate solutions that pure math would also solve, but using less computational resources and faster. This field is very im... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/numerical-analysis-explained-how-to-apply-math-with-python/</link>
                <guid isPermaLink="false">66ba533e80dbd3f269f5887b</guid>
                
                    <category>
                        <![CDATA[ Math ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mathematics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tiago Capelo Monteiro ]]>
                </dc:creator>
                <pubDate>Thu, 29 Feb 2024 11:41:59 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/07/maxim-hopman-fiXLQXAhCfk-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Numerical analysis is the bridge between math and computer science. </p>
<p>Essentially, it is the development of algorithms that approximate solutions that pure math would also solve, but using less computational resources and faster.</p>
<p>This field is very important, because for most real-world problems we only need good approximations, not exact solutions.</p>
<p>In this article, we will explore:</p>
<ul>
<li><a class="post-section-overview" href="#analogy">An Analogy that Illustrates the Importance of Numerical Analysis</a></li>
<li><a class="post-section-overview" href="#fundamentals">Fundamentals of Numerical Analysis</a></li>
<li><a class="post-section-overview" href="#application">Application of Numerical Analysis in Real-World Problems</a></li>
<li><a class="post-section-overview" href="#intro-PDE">An Introduction to Partial Differential Equations (PDEs)</a></li>
<li><a class="post-section-overview" href="#intro-optimization">An Introduction to Optimization in Numerical Analysis</a></li>
</ul>
<h2 id="analogy">An Analogy that Illustrates the Importance of Numerical Analysis</h2>

<p>How can we measure the coastline of an island?</p>
<p>If we tried to measure every centimeter of every small segment, it would be impractical and extremely time-consuming.</p>
<p>Because of the sea, the coastline is always changing at that level of detail.</p>
<p>However, by approximating and measuring in larger segments, we can get a practical measurement of the coastline.</p>
<p>This situation mirrors numerical analysis.</p>
<p>Approximation gives insights in situations where precise measurement is impossible or impractical.</p>
<p>Just as we accept a good estimation of the coastline length, numerical analysis uses approximation to solve hard problems.</p>
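<p>The coastline idea can be made concrete with code. As a stand-in for a coastline, a NumPy sketch that approximates the length of the curve y&nbsp;=&nbsp;sin(x) on [0,&nbsp;2π] by summing straight segments – a modest number of segments already comes close to the value a very fine subdivision gives:</p>

```python
import numpy as np

def segment_length(n):
    """Approximate the length of y = sin(x) on [0, 2*pi] with n straight segments."""
    x = np.linspace(0.0, 2 * np.pi, n + 1)
    y = np.sin(x)
    return np.sum(np.hypot(np.diff(x), np.diff(y)))

coarse = segment_length(8)       # just 8 segments
fine = segment_length(10_000)    # effectively the "true" length (about 7.64)
print(coarse, fine)              # the coarse estimate is already within ~1%
```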
<h2 id="fundamentals">Fundamentals of Numerical Analysis</h2>

<p>Numerical analysis is all about approximation. It is like using binoculars to see a landscape that is very far away. We can't see every leaf. But we get a good enough picture to understand the terrain.</p>
<p>This is crucial in numerical analysis.</p>
<p>In numerical analysis, we solve hard math problems where exact solutions are either impossible or extremely resource-intensive to obtain.</p>
<p>By approximating, we get sufficiently good results with less computational effort.</p>
<h2 id="application">Application of Numerical Analysis in Real-World Problems</h2>

<p>There are many applications of numerical analysis:</p>
<ul>
<li>In engineering, it enables simulation of structures and fluids.</li>
<li>In finance, it supports risk assessment and portfolio optimization.</li>
<li>In environmental science, it predicts climate patterns.</li>
</ul>
<p>In each field, numerical analysis is a toolkit for solving problems where pure math either takes too much time or cannot give usable results.</p>
<h2 id="intro-PDE">An Introduction to Partial Differential Equations (PDEs)</h2>

<p>Partial Differential Equations (PDEs) are equations that describe how quantities like heat, sound, or electricity change in different places and as time goes on.</p>
<p>Solving PDEs is very important. Because it allows us to control these changes.</p>
<p>By allowing us to control them, we can:</p>
<ul>
<li>Predict weather patterns.</li>
<li>Understand sound propagation in different environments.</li>
<li>Design efficient transportation systems.</li>
<li>Optimize energy distribution.</li>
</ul>
<p>However, most PDEs can only be approximated with numerical methods: their exact solutions are either too hard or impossible to find analytically.</p>
<p>Numerical methods let us solve these PDEs, which in turn allows us to solve many real-life problems.</p>
<h3 id="heading-numerical-solutions-of-pdes-with-scipy">Numerical Solutions of PDEs with SciPy</h3>
<p>Solving PDEs with numerical methods often involves dividing the PDE into small, manageable parts, solving each one, and then combining the results.</p>
<p>SciPy, a Python library for scientific and technical computing, gives many tools for this purpose.</p>
<p>Now, let's solve a heat transfer problem in a rod.</p>
<p>In the code below, we will see how it lets us model how heat spreads along the rod:</p>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> scipy.integrate <span class="hljs-keyword">import</span> solve_bvp

def heat_equation(x, y):
    <span class="hljs-keyword">return</span> np.vstack((y[<span class="hljs-number">1</span>], -y[<span class="hljs-number">0</span>]))

def boundary_conditions(ya, yb):
    <span class="hljs-keyword">return</span> np.array([ya[<span class="hljs-number">0</span>], yb[<span class="hljs-number">0</span>] - <span class="hljs-number">1</span>])

x = np.linspace(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">5</span>)
y = np.zeros((<span class="hljs-number">2</span>, x.size))

sol = solve_bvp(heat_equation, boundary_conditions, x, y)
</code></pre><p>Let's see how the code works, block by block, in the following sections.</p>
<h3 id="heading-how-to-importing-libraries">How to import the libraries</h3>
<pre><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> scipy.integrate <span class="hljs-keyword">import</span> solve_bvp
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/5-1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing libraries</em></p>
<p>Here we import two Python libraries:</p>
<ul>
<li><a target="_blank" href="https://numpy.org/">NumPy</a></li>
<li><a target="_blank" href="https://scipy.org/">SciPy</a></li>
</ul>
<p>These are two of the most used Python libraries in data science.</p>
<h3 id="heading-how-to-define-the-head-equation-and-boundary-conditions">How to define the heat equation and boundary conditions</h3>
<pre><code>def heat_equation(x, y):
    <span class="hljs-keyword">return</span> np.vstack((y[<span class="hljs-number">1</span>], -y[<span class="hljs-number">0</span>]))

def boundary_conditions(ya, yb):
    <span class="hljs-keyword">return</span> np.array([ya[<span class="hljs-number">0</span>], yb[<span class="hljs-number">0</span>] - <span class="hljs-number">1</span>])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/6.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining heat equation and boundary conditions</em></p>
<p>We create <code>heat_equation(x, y)</code> and <code>boundary_conditions(ya, yb)</code>.</p>
<p>In <code>heat_equation(x, y)</code> we are defining the differential equation we want to solve.</p>
<p>The <code>boundary_conditions(ya, yb)</code> function defines constraints at the start and end of the solution. Here, the conditions are that the solution equals 0 at the start (<code>ya[0] = 0</code>) and 1 at the end (<code>yb[0] = 1</code>).</p>
<h3 id="heading-how-to-solve-the-equation">How to solve the equation</h3>
<pre><code>x = np.linspace(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">5</span>)
y = np.zeros((<span class="hljs-number">2</span>, x.size))

sol = solve_bvp(heat_equation, boundary_conditions, x, y)
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/7.png" alt="Image" width="600" height="400" loading="lazy">
<em>Solving equation</em></p>
<p>The line <code>sol = solve_bvp(heat_equation, boundary_conditions, x, y)</code> is the solution.</p>
<p><a target="_blank" href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.solve_bvp.html"><code>solve_bvp</code> stands for "solve boundary value problem"</a>.</p>
<p>It takes four arguments:</p>
<ul>
<li><code>heat_equation</code>: The differential equation we are trying to solve.</li>
<li><code>boundary_conditions</code>: The mathematical constraints at the start and end of the solution.</li>
<li><code>x</code>: The initial mesh – the points at which we want to evaluate the solution.</li>
<li><code>y</code>: The initial guess for the solution at the chosen <code>x</code> values.</li>
</ul>
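<p>This particular boundary value problem (y''&nbsp;=&nbsp;-y with y(0)&nbsp;=&nbsp;0 and y(1)&nbsp;=&nbsp;1) happens to have the closed-form solution y(x)&nbsp;=&nbsp;sin(x)/sin(1), so we can check a numerical method against the exact answer. A minimal finite-difference sketch, independent of SciPy:</p>

```python
import numpy as np

# Finite differences: y'' = -y becomes
# y[i-1] + (h**2 - 2) * y[i] + y[i+1] = 0 at each interior grid point.
n = 50                                # number of interior points
h = 1.0 / (n + 1)
x = np.linspace(0.0, 1.0, n + 2)

main = np.full(n, h**2 - 2.0)         # main diagonal
off = np.ones(n - 1)                  # sub- and super-diagonals
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

b = np.zeros(n)
b[-1] = -1.0                          # boundary value y(1) = 1 moved to the RHS

y = np.zeros(n + 2)                   # boundary y(0) = 0 already in place
y[-1] = 1.0
y[1:-1] = np.linalg.solve(A, b)

exact = np.sin(x) / np.sin(1.0)       # closed-form solution of the BVP
print(np.max(np.abs(y - exact)))      # small discretization error
```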
<h2 id="intro-optimization">An Introduction to Optimization in Numerical Analysis</h2>

<p>Optimization is finding the best solution among all possible solutions. It is like finding the most efficient route in a complex network of roads.</p>
<p>Numerical optimization methods find the most efficient or cost-effective solution to a problem, whether that is:</p>
<ul>
<li>Minimizing waste in production.</li>
<li>Maximizing efficiency in a logistics network.</li>
<li>Finding the best fit for a certain data model.</li>
</ul>
<h3 id="heading-an-overview-of-numerical-optimization-techniques-with-scipy">An Overview of Numerical Optimization Techniques with SciPy</h3>
<p>For this example, let's consider an optimization problem in logistics, where the goal is to minimize transportation cost across a network. </p>
<p>SciPy's <code>minimize</code> function can be used to find the strategy that minimizes cost while meeting all constraints:</p>
<pre><code><span class="hljs-keyword">from</span> scipy.optimize <span class="hljs-keyword">import</span> minimize

def objective_function(x):
    <span class="hljs-keyword">return</span> x[<span class="hljs-number">0</span>]**<span class="hljs-number">2</span> + x[<span class="hljs-number">1</span>]**<span class="hljs-number">2</span>

def constraint_eq(x):
    <span class="hljs-keyword">return</span> x[<span class="hljs-number">0</span>] + x[<span class="hljs-number">1</span>] - <span class="hljs-number">10</span>

con_eq = {<span class="hljs-string">'type'</span>: <span class="hljs-string">'eq'</span>, <span class="hljs-string">'fun'</span>: constraint_eq}

bounds = [(<span class="hljs-number">0</span>, <span class="hljs-number">10</span>), (<span class="hljs-number">0</span>, <span class="hljs-number">10</span>)]

x0 = [<span class="hljs-number">5</span>, <span class="hljs-number">5</span>]

result = minimize(objective_function, x0, method=<span class="hljs-string">'SLSQP'</span>, bounds=bounds, constraints=[con_eq])
</code></pre><p>Let's explain how the code works, block by block.</p>
<h3 id="heading-how-to-importing-the-library">How to import the library</h3>
<pre><code><span class="hljs-keyword">from</span> scipy.optimize <span class="hljs-keyword">import</span> minimize
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/8.png" alt="Image" width="600" height="400" loading="lazy">
<em>Importing scipy</em></p>
<p>Once again we import the necessary library:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html">https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html</a></div>
<h3 id="heading-how-to-defining-objective-and-constraint-equation">How to define the objective and constraint equations</h3>
<pre><code>def objective_function(x):
    <span class="hljs-keyword">return</span> x[<span class="hljs-number">0</span>]**<span class="hljs-number">2</span> + x[<span class="hljs-number">1</span>]**<span class="hljs-number">2</span>

def constraint_eq(x):
    <span class="hljs-keyword">return</span> x[<span class="hljs-number">0</span>] + x[<span class="hljs-number">1</span>] - <span class="hljs-number">10</span>

con_eq = {<span class="hljs-string">'type'</span>: <span class="hljs-string">'eq'</span>, <span class="hljs-string">'fun'</span>: constraint_eq}
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/9.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining the objective and constraint equations</em></p>
<ul>
<li>The objective function is the function we want to minimize to find the best answer.</li>
<li>The constraint equation is the equation that limits the search space to those <code>x</code> values that fulfill this equation.</li>
</ul>
<p><code>con_eq</code> is defined by the following:</p>
<ul>
<li><code>'type': 'eq'</code> indicates the type of constraint.  <code>'eq'</code> means equality, in other words, the function must equal zero at the solution.</li>
<li><code>'fun': constraint_eq</code> assigns the constraint function.</li>
</ul>
<p>As we will see in the next block of code, this is where we constrain the possible solutions of the problem.</p>
<h3 id="heading-how-to-define-an-initial-condition-and-result">How to define an initial condition and result</h3>
<pre><code>bounds = [(<span class="hljs-number">0</span>, <span class="hljs-number">10</span>), (<span class="hljs-number">0</span>, <span class="hljs-number">10</span>)]

x0 = [<span class="hljs-number">5</span>, <span class="hljs-number">5</span>]

result = minimize(objective_function, x0, method=<span class="hljs-string">'SLSQP'</span>, bounds=bounds, constraints=[con_eq])
</code></pre><p><img src="https://www.freecodecamp.org/news/content/images/2024/02/10.png" alt="Image" width="600" height="400" loading="lazy">
<em>Defining initial condition and solving equation</em></p>
<p>To understand this block of code, let's understand each parameter of <code>result = minimize(objective_function, x0, method='SLSQP', bounds=bounds, constraints=[con_eq])</code>:</p>
<ul>
<li><code>objective_function</code>: Is the function to be minimized.</li>
<li><code>x0</code>: Is the initial guess for the variables.</li>
<li><code>method='SLSQP'</code>: This specifies the optimization algorithm we are using. In this case, we use <a target="_blank" href="https://docs.scipy.org/doc/scipy/reference/optimize.minimize-slsqp.html">SLSQP (Sequential Least SQuares Programming)</a>.</li>
<li><code>bounds=bounds</code>: This parameter specifies the bounds for each of the decision variables. </li>
<li><code>constraints=[con_eq]</code>: This parameter applies the constraints to the optimization problem.</li>
</ul>
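<p>For this particular problem we can check the answer by hand: on the constraint x0&nbsp;+&nbsp;x1&nbsp;=&nbsp;10, substituting x1&nbsp;=&nbsp;10&nbsp;-&nbsp;x0 turns the objective into x0² + (10&nbsp;-&nbsp;x0)², which is smallest at x0&nbsp;=&nbsp;5, giving the minimum value 50. A plain-Python sketch verifying that:</p>

```python
# On the constraint x0 + x1 = 10, substitute x1 = 10 - x0:
def objective_on_constraint(x0):
    return x0**2 + (10 - x0)**2

# Scan candidate x0 values within the bounds [0, 10]
candidates = [i / 100 for i in range(0, 1001)]
best_x0 = min(candidates, key=objective_on_constraint)

print(best_x0, objective_on_constraint(best_x0))  # → 5.0 50.0
```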
<h2 id="heading-this-is-how-many-real-life-problems-are-solved">This is how many real life problems are solved</h2>
<p>Many things in real life are modeled with partial differential equations.</p>
<p>Then, with optimization methods developed in numerical analysis, those models are optimized.</p>
<p>I am writing this because I know math can be boring for some people, and they may not be aware of where it is applied to solve real problems. The calculus they learn can be applied in non-ideal situations outside of exam exercises.</p>
<p>Here, we can finally see why math is important in two scenarios:</p>
<ul>
<li>To model systems to get solutions from it</li>
<li>To optimize a certain system</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Numerical analysis is one of the most important areas of applied math in STEM.</p>
<p>From solving PDEs to optimizing systems, numerical analysis is everywhere.</p>
<p>As problems grow more complex, numerical analysis grows in importance, providing faster algorithms that approximate pure-math solutions.</p>
<p>In this way, it is a bridge between theoretical mathematics and practical application.</p>
<p>If you want to, you can get the full code used in this article on <a target="_blank" href="https://github.com/tiagomonteiro0715/freecodecamp-my-articles-source-code">GitHub</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
