<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ nlp - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ nlp - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Mon, 25 May 2026 22:37:28 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/nlp/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ AI Paper Review: Language Models are Few-Shot Learners (GPT-3) ]]>
                </title>
                <description>
                    <![CDATA[ After GPT-2, it became clear that language models could do much more than researchers originally expected. Simply training a model to predict the next word had already started producing surprising abi ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ai-paper-review-language-models-are-few-shot-learners-gpt-3/</link>
                <guid isPermaLink="false">6a0b76a04e81b730489aea6f</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mohammed Fahd Abrah ]]>
                </dc:creator>
                <pubDate>Mon, 18 May 2026 20:29:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/9fd8e279-ebf3-4662-b204-737dd38b7648.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>After GPT-2, it became clear that language models could do much more than researchers originally expected. Simply training a model to predict the next word had already started producing surprising abilities like translation, summarization, and question answering without task-specific training.</p>
<p>But there was still a major limitation. Even though GPT-2 could generalize across tasks, it still struggled to adapt reliably. Performance often depended on carefully written prompts, and for many real-world applications, fine-tuning was still necessary. AI systems were becoming more flexible, but they still were not truly learning tasks from context the way humans do.</p>
<p>Then GPT-3 pushed the idea much further. Instead of asking whether language models could perform tasks without fine-tuning, the paper explored something even more ambitious:</p>
<p>What happens if we scale language models to an extreme size? The answer surprised almost everyone in the AI community.</p>
<p>GPT-3 showed that a sufficiently large language model could often learn new tasks directly from examples inside the prompt itself. No retraining. No gradient updates. Just a few demonstrations written in natural language.</p>
<p>For example, if you showed the model a few English-to-French translations, it could continue the pattern correctly for a new sentence. If you gave it examples of questions and answers, it could often infer the task immediately and generate reasonable responses.</p>
<p>This became known as <em>few-shot learning</em> and <em>in-context learning</em>.</p>
<p>More importantly, GPT-3 suggested a completely different way of interacting with AI systems. Instead of training a separate model for every task, the same model could dynamically adapt depending on the instructions and examples it received.</p>
<p>That idea eventually became the foundation for modern AI systems like ChatGPT.</p>
<p>Now, like many influential AI papers, the GPT-3 paper can be difficult to read because of its scale, technical experiments, and long benchmark evaluations. So in this article, I’ll break everything down in a clear and practical way.</p>
<p>We’ll explore what problem the paper was trying to solve, how few-shot learning works, why scaling became so important, how GPT-3 was trained, and why this paper fundamentally changed the direction of modern AI research.</p>
<p>By the end, you should understand the core ideas behind GPT-3 and why this paper became one of the most important milestones in the history of large language models (LLMs).</p>
<h2 id="heading-paper-overview">Paper Overview</h2>
<p>In this article, we’ll review the paper <a href="https://arxiv.org/pdf/2005.14165"><em>Language Models are Few-Shot Learners</em></a> by Tom Brown et al. from OpenAI.</p>
<p>This paper introduced GPT-3 and demonstrated something that changed the direction of modern AI research: large language models could learn tasks directly from prompts and examples, without the task-specific fine-tuning that GPT-1 relied on.</p>
<p>Instead of retraining the model for every new task, GPT-3 could often adapt dynamically through natural language instructions, one-shot examples, or few-shot prompting.</p>
<p>The paper also introduced the idea of <em>in-context learning</em>, where the model effectively learns from patterns inside the prompt itself during inference.</p>
<p>Here’s the original paper if you want to explore it directly: <a href="https://arxiv.org/pdf/2005.14165"><em>Language Models are Few-Shot Learners (PDF)</em></a></p>
<p>And here’s a quick infographic of what we’ll cover throughout this review:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/871201a8-de4c-4a1c-8b75-4bab09fdb1fc.png" alt="GPT-3 Quick Insight" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-table-of-content">Table of Contents</h2>
<ul>
<li><p><a href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a href="#heading-goals-of-the-paper">Goals of the Paper</a></p>
</li>
<li><p><a href="#heading-core-idea">Core Idea</a></p>
</li>
<li><p><a href="#heading-methodology">Methodology</a></p>
</li>
<li><p><a href="#heading-fine-tuning-vs-zero-shot-vs-few-shot">Fine-tuning vs Zero-Shot vs Few-Shot</a></p>
</li>
<li><p><a href="#heading-model-architecture">Model Architecture</a></p>
</li>
<li><p><a href="#heading-experiments">Experiments</a></p>
</li>
<li><p><a href="#heading-key-findings">Key Findings</a></p>
</li>
<li><p><a href="#heading-task-specific-observations">Task-Specific Observations</a></p>
</li>
<li><p><a href="#heading-generalization-vs-memorization">Generalization vs Memorization</a></p>
</li>
<li><p><a href="#heading-discussion">Discussion</a></p>
</li>
<li><p><a href="#heading-limitations">Limitations</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-final-insight">Final Insight</a></p>
</li>
<li><p><a href="#heading-gpt-1-vs-gpt-2-vs-gpt-3-key-differences">GPT-1 vs GPT-2 vs GPT-3: Key Differences</a></p>
</li>
<li><p><a href="#heading-pytorch-implementations-of-the-gpt-architecture-evolution">PyTorch Implementations of the GPT Architecture Evolution</a></p>
</li>
<li><p><a href="#heading-resources">Resources:</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this breakdown, it helps to already be familiar with a few foundational ideas.</p>
<p>Reading the previous reviews in this series will be especially helpful:</p>
<ul>
<li><p><a href="https://www.freecodecamp.org/news/ai-paper-review-improving-language-understanding-by-generative-pre-training-gpt-1/"><em>AI Paper Review: Improving Language Understanding by Generative Pre-Training (GPT-1)</em></a></p>
</li>
<li><p><a href="https://www.freecodecamp.org/news/ai-paper-review-language-models-are-unsupervised-multitask-learners-gpt-2/"><em>AI Paper Review: Language Models are Unsupervised Multitask Learners (GPT-2)</em></a></p>
</li>
</ul>
<p>GPT-3 directly builds on many of the ideas introduced in those earlier papers, especially pre-training, zero-shot learning, and large-scale language modeling.</p>
<p>It also helps to have:</p>
<ul>
<li><p>A general understanding of natural language processing (NLP) and how machines work with text</p>
</li>
<li><p>A high-level idea of what a Transformer model is (you do not need deep mathematical details)</p>
</li>
<li><p>Familiarity with supervised learning, unsupervised learning, and zero-shot learning</p>
</li>
<li><p>A basic understanding of prompts and how language models generate text</p>
</li>
<li><p>General machine learning concepts like training data, parameters, scaling, and inference</p>
</li>
</ul>
<p>You do not need to be an AI researcher to follow this article, though.</p>
<p>I’ll keep the explanations practical and intuitive, focusing more on understanding the core ideas behind GPT-3 rather than getting lost in dense mathematical details or academic terminology.</p>
<h2 id="heading-executive-summary"><strong>Executive Summary</strong></h2>
<p>Before GPT-3, models like GPT-2 had already shown something surprising: a language model trained only to predict the next word could still perform many tasks it was never directly trained for. Translation, summarization, and question answering all started appearing naturally as models became larger.</p>
<p>But there was still a limitation.</p>
<p>Even with GPT-2, strong performance often depended on careful prompting or additional fine-tuning. In practice, most NLP systems still followed the same pattern: train a large model first, then retrain or fine-tune it separately for every new task.</p>
<p>GPT-3 challenges that entire workflow.</p>
<p>According to the authors, if a language model becomes large enough, it can begin learning tasks directly from context alone. Instead of updating the model’s parameters, you simply show it a few examples inside the prompt, and the model continues the pattern.</p>
<p>This idea is what the paper calls <em>few-shot learning</em>.</p>
<p>For example, rather than training a separate translation model, you could write something like:</p>
<ul>
<li><p>dog → chien</p>
</li>
<li><p>cat → chat</p>
</li>
<li><p>house → ?</p>
</li>
</ul>
<p>And GPT-3 would often continue with the correct answer: <em>maison</em>.</p>
<p>What makes this important is that the model is not learning through gradient updates during inference. There is no retraining happening in the traditional sense. The learning happens inside the context window itself, through the examples provided in the prompt.</p>
<p>This marks a major shift in how language models are used.</p>
<p>Instead of building a specialized system for every task, GPT-3 suggests that a single sufficiently large model can adapt dynamically just by reading instructions and examples. The paper refers to this behavior as <em>in-context learning</em>, and much of GPT-3’s contribution revolves around showing how powerful this idea becomes at scale.</p>
<h2 id="heading-goals-of-the-paper"><strong>Goals of the Paper</strong></h2>
<p>According to the authors, one of the biggest limitations of existing NLP systems is that they depend too heavily on task-specific training. Even though models had become increasingly powerful by the time GPT-3 was introduced, most systems still required a separate fine-tuning process for every new task.</p>
<p>In practice, this created several problems.</p>
<p>First, every task needed labeled data. If you wanted a model to summarize articles, answer questions, classify sentiment, or translate text, you usually needed thousands, sometimes millions, of carefully prepared examples. Collecting that data was expensive, time-consuming, and often unrealistic for smaller or niche tasks.</p>
<p>Second, every new capability required additional training. Even when the underlying model was already pretrained on massive amounts of text, developers still had to retrain or fine-tune it again and again for specific use cases.</p>
<p>The paper argues that this workflow is fundamentally inefficient. More importantly, the authors point out that it does not resemble how humans learn. Humans can often understand a task after seeing only a few demonstrations or simple instructions. We do not usually need thousands of labeled examples to figure out what is being asked.</p>
<p>This becomes the central question behind GPT-3:</p>
<p>Can a language model learn new tasks directly from context instead of relying on parameter updates and task-specific retraining?</p>
<p>That question drives nearly every experiment in the paper. Rather than testing whether GPT-3 can master one carefully optimized benchmark, the authors are exploring something broader: whether scaling language models can produce systems that adapt dynamically just from prompts, examples, and natural language instructions.</p>
<h2 id="heading-core-idea"><strong>Core Idea</strong></h2>
<p>At its core, GPT-3 is still built around the same fundamental idea used in GPT-2: train a language model to predict the next token in a sequence. The training objective itself is surprisingly simple. Given some text, the model learns to guess what comes next, one token at a time.</p>
<p>On the surface, GPT-3 may look like nothing more than a much larger version of GPT-2. And in some ways, that is true. The model scales dramatically in size, growing to 175 billion parameters, and it is trained on a far larger and more diverse dataset gathered from sources like Common Crawl, WebText, books, and Wikipedia.</p>
<p>But the paper argues that something more interesting begins to happen as language models scale.</p>
<p>Instead of simply memorizing text patterns better, GPT-3 starts showing the ability to learn tasks directly from prompts. When the model sees examples inside the input itself, it can often continue the pattern correctly without any additional training or parameter updates.</p>
<p>For example, if the prompt contains a few question-answer pairs or translation examples, GPT-3 can infer the structure of the task and generate similar outputs for new inputs. In other words, the prompt becomes a temporary learning environment.</p>
<p>This is the key conceptual shift in the paper.</p>
<p>Traditional machine learning usually separates training from inference. First the model learns by updating its weights, then later it is deployed to make predictions. GPT-3 blurs that boundary. The model still learns during pretraining, of course, but during inference it can also adapt behavior dynamically based on the context it receives.</p>
<p>The authors describe this behavior as <em>in-context learning</em>.</p>
<p>What makes this idea important is that the model is not retrained for each task. There are no gradient updates happening while the prompt is processed. Instead, GPT-3 learns from the examples embedded inside the context window itself.</p>
<p>This marks a subtle but important change in how we think about language models. The prompt is no longer just an input. It effectively becomes a lightweight interface for teaching the model what to do.</p>
<h2 id="heading-methodology"><strong>Methodology</strong></h2>
<p>One reason GPT-3 became so influential is that the underlying training process is actually very familiar. Unlike many research papers that introduce entirely new architectures or complicated learning algorithms, GPT-3 mostly builds on ideas that already existed before it. The difference is how aggressively those ideas are scaled.</p>
<p>According to the authors, the core training objective remains standard autoregressive language modeling. In simple terms, the model reads text and repeatedly learns to predict the next token in the sequence. This is the same general approach used in GPT-2.</p>
<p>The process itself is conceptually straightforward:</p>
<ul>
<li><p>Train a very large Transformer model</p>
</li>
<li><p>Feed it enormous amounts of internet text</p>
</li>
<li><p>Optimize it to predict the next word over and over again</p>
</li>
</ul>
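<p>To make "predict the next token" concrete, here is a minimal sketch of the training loss in plain Python. It assumes a hypothetical model that has already produced one row of logits (unnormalized scores over the vocabulary) for each position; the loss is the average cross-entropy between those predictions and the token that actually comes next. A real GPT computes the logits with a Transformer and minimizes this loss with gradient descent, which the sketch leaves out.</p>
<pre><code class="language-python">import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(token_ids: list[int], logits_per_position: list[list[float]]) -> float:
    """Average cross-entropy of predicting token t+1 from the logits at position t."""
    inputs = token_ids[:-1]   # positions the model reads
    targets = token_ids[1:]   # the token that actually comes next at each position
    assert len(logits_per_position) == len(inputs)
    total = 0.0
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        total += -math.log(probs[target])  # large penalty when the true next token is unlikely
    return total / len(targets)


# Toy vocabulary of 3 token ids and a 4-token sequence, so there are 3 predictions.
sequence = [0, 2, 1, 2]
uniform_logits = [[0.0, 0.0, 0.0]] * 3  # a hypothetical untrained model
print(next_token_loss(sequence, uniform_logits))
</code></pre>
<p>A model that spreads probability uniformly over a 3-token vocabulary scores log(3), about 1.0986; training pushes this number down by putting more probability on the true next token.</p>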
<p>What changes dramatically is the scale.</p>
<p>GPT-3 is trained on hundreds of billions of tokens collected from sources such as Common Crawl, WebText, books, and Wikipedia. The paper also explains that OpenAI filtered and cleaned large portions of the Common Crawl dataset to improve quality and reduce duplication.</p>
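<p>The paper describes deduplicating documents fuzzily rather than only removing exact copies. The sketch below is a simplified, hypothetical stand-in: it compares documents by the Jaccard similarity of their overlapping three-word "shingles" and drops any document that is at least 80% similar to one already kept. Both the shingle size and the threshold are assumptions, and the pairwise comparison is quadratic, so web-scale pipelines rely on approximate techniques such as MinHash instead.</p>
<pre><code class="language-python">def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word windows ("shingles") in a document."""
    words = text.lower().split()
    # max(..., 1) keeps very short documents as a single shingle instead of none
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a.intersection(b)) / len(a.union(b))


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a document only if it is less similar than `threshold` to every kept one."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in docs:
        s = shingles(doc)
        if all(threshold > jaccard(s, other) for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "the cat sat on the mat today",
    "The cat sat on the mat today",  # same text, different casing
    "gradient descent updates the model weights",
]
print(deduplicate(docs))  # the second document is dropped as a near-duplicate
</code></pre>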
<p>But the most important part of the methodology is not just how the model is trained. It is how the model is <em>used after training</em>.</p>
<p>Traditionally, NLP systems relied heavily on fine-tuning. After pretraining a language model, developers would train it again on a smaller labeled dataset for each individual task. GPT-3 experiments with a different approach entirely.</p>
<p>Instead of retraining the model, tasks are described directly inside the prompt.</p>
<p>The paper studies three main settings:</p>
<ul>
<li><p><em>Zero-shot learning</em>: the model receives only a natural language instruction</p>
</li>
<li><p><em>One-shot learning</em>: the model receives a single example of the task</p>
</li>
<li><p><em>Few-shot learning</em>: the model receives several examples before solving a new case</p>
</li>
</ul>
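<p>The three settings differ only in how many solved examples appear before the new input. Here is a minimal sketch of a prompt builder in which a single parameter <code>k</code> switches between zero-shot, one-shot, and few-shot. The separator and line layout are illustrative assumptions, not a format the paper prescribes for every task.</p>
<pre><code class="language-python">def build_prompt(task_description: str,
                 demonstrations: list[tuple[str, str]],
                 query: str,
                 k: int) -> str:
    """k = 0 gives zero-shot, k = 1 gives one-shot, k greater than 1 gives few-shot."""
    lines = [task_description]
    for source, target in demonstrations[:k]:
        lines.append(f"{source} => {target}")  # a solved example the model can imitate
    lines.append(f"{query} =>")                # the model is expected to continue from here
    return "\n".join(lines)


examples = [("dog", "chien"), ("cat", "chat"), ("bird", "oiseau")]
for k in (0, 1, 3):
    print(f"--- k = {k} ---")
    print(build_prompt("Translate English to French:", examples, "house", k))
</code></pre>
<p>With <code>k = 0</code> the prompt contains only the instruction and the query; with <code>k = 3</code> it also contains all three worked translations. In every case the model's weights stay the same, and only the text it conditions on changes.</p>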
<p>For example, a translation prompt might look like this:</p>
<p>dog → chien<br>cat → chat<br>house → ?</p>
<p>GPT-3 then continues the pattern and predicts:</p>
<p>maison</p>
<p>What makes this remarkable is that no retraining happens during this process. The model’s weights remain completely unchanged. It is simply using the information inside the prompt to infer what kind of task is being requested.</p>
<p>In practice, this transforms the prompt into something much more powerful than an ordinary input. It becomes a temporary workspace where the model can recognize patterns, adapt behavior, and apply learned knowledge dynamically.</p>
<p>The paper repeatedly emphasizes that this behavior emerges through scale rather than task-specific engineering. GPT-3 is not trained separately for translation, summarization, reasoning, or question answering. Instead, the same general language modeling objective appears to produce all of these abilities when the model becomes sufficiently large.</p>
<h2 id="heading-fine-tuning-vs-zero-shot-vs-few-shot"><strong>Fine-tuning vs Zero-Shot vs Few-Shot</strong></h2>
<table style="min-width:100px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>Fine-Tuning</strong></p></td><td><p><strong>Zero-Shot Learning</strong></p></td><td><p><strong>Few-Shot Learning</strong></p></td></tr><tr><td><p><strong>Definition</strong></p></td><td><p>The model is additionally trained on labeled data for a specific task</p></td><td><p>The model performs a task using only instructions, without examples</p></td><td><p>The model learns the task from a small number of examples inside the prompt</p></td></tr><tr><td><p><strong>Training Requirement</strong></p></td><td><p>Requires supervised task-specific datasets</p></td><td><p>No task-specific training or examples</p></td><td><p>No retraining, but requires a few demonstrations in the prompt</p></td></tr><tr><td><p><strong>How Tasks Are Given</strong></p></td><td><p>Through a separate training phase</p></td><td><p>Through natural language instructions</p></td><td><p>Through instructions plus a few input-output examples</p></td></tr><tr><td><p><strong>Learning Process</strong></p></td><td><p>Model weights are updated during training</p></td><td><p>No weight updates</p></td><td><p>No weight updates; learning happens inside the context window</p></td></tr><tr><td><p><strong>Flexibility</strong></p></td><td><p>Usually specialized for one task</p></td><td><p>Highly flexible across many tasks</p></td><td><p>Flexible while still benefiting from demonstrations</p></td></tr><tr><td><p><strong>Adaptability</strong></p></td><td><p>Requires retraining for new tasks</p></td><td><p>Adapts instantly through prompting</p></td><td><p>Adapts quickly from contextual examples</p></td></tr><tr><td><p><strong>Data Dependency</strong></p></td><td><p>Depends heavily on labeled datasets</p></td><td><p>Depends mostly on pretraining knowledge</p></td><td><p>Depends on both 
pretraining and prompt examples</p></td></tr><tr><td><p><strong>Performance</strong></p></td><td><p>Often strongest on narrow benchmark tasks</p></td><td><p>Usually weaker than fine-tuning</p></td><td><p>Often much stronger than zero-shot and sometimes close to fine-tuning</p></td></tr><tr><td><p><strong>Scalability Across Tasks</strong></p></td><td><p>Expensive and difficult to scale</p></td><td><p>Extremely scalable</p></td><td><p>Scalable without retraining</p></td></tr><tr><td><p><strong>Compute Cost</strong></p></td><td><p>High because every task may require new training</p></td><td><p>Low during usage</p></td><td><p>Low during usage</p></td></tr><tr><td><p><strong>Example</strong></p></td><td><p>Fine-tune a model on a sentiment analysis dataset</p></td><td><p>“Classify the sentiment of this sentence”</p></td><td><p>“Positive: I loved the movie. Negative: The film was boring. Sentence: The story was amazing →”</p></td></tr><tr><td><p><strong>Main Strength</strong></p></td><td><p>High accuracy on carefully trained tasks</p></td><td><p>Simplicity and broad generalization</p></td><td><p>Strong balance between flexibility and performance</p></td></tr><tr><td><p><strong>Main Weakness</strong></p></td><td><p>Poor scalability across many tasks</p></td><td><p>Can misunderstand task format or intent</p></td><td><p>Sensitive to prompt quality and example selection</p></td></tr><tr><td><p><strong>Most Associated With</strong></p></td><td><p>Traditional NLP systems, GPT-1 era</p></td><td><p>GPT-2 style prompting</p></td><td><p>GPT-3 and in-context learning</p></td></tr><tr><td><p><strong>Core Idea</strong></p></td><td><p>Train specifically for each task</p></td><td><p>Infer the task from instructions</p></td><td><p>Infer the task from examples in context</p></td></tr></tbody></table>

<h2 id="heading-model-architecture"><strong>Model Architecture</strong></h2>
<p>Architecturally, GPT-3 does not introduce a radically new design. In fact, one of the most interesting aspects of the paper is that the core architecture is almost identical to GPT-2. OpenAI continues using a decoder-only Transformer model trained with an autoregressive objective.</p>
<p>At a high level, the Transformer architecture processes text using a mechanism called <em>attention</em>. Instead of reading words strictly one at a time like older recurrent models, Transformers can look across the entire sequence and determine which words are most relevant to each other.</p>
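<p>More specifically, GPT-3 relies on <em>self-attention</em>, which allows the model to weigh different parts of the context while generating text. This helps the model capture long-range relationships between words, sentences, and ideas. Because GPT-3 is a decoder-only model, its attention is also masked: each position can attend only to itself and to earlier tokens, never to tokens that come later.</p>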
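<p>The masking idea can be sketched in a few lines of plain Python. This is a single-head, simplified illustration, assuming that queries, keys, and values are the raw input vectors; a real GPT layer applies learned projection matrices first and runs many heads in parallel. It also uses ordinary dense attention, whereas the paper notes that GPT-3 alternates dense and locally banded sparse attention patterns across layers.</p>
<pre><code class="language-python">import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def causal_self_attention(x: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention with a causal mask (Q = K = V = x)."""
    d = len(x[0])
    outputs = []
    for i, query in enumerate(x):
        # Position i may only look at positions 0..i; future tokens are masked out.
        visible = x[: i + 1]
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in visible]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the visible value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_self_attention(tokens):
    print([round(value, 3) for value in row])
</code></pre>
<p>Because each position only sees earlier positions, changing the last input vector leaves the first two outputs untouched, which is what lets the model be trained on next-token prediction without "seeing the answer".</p>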
<p>The model is also <em>autoregressive</em>, meaning it generates text sequentially by predicting the next token based on everything that came before it. This next-token prediction objective remains the foundation of GPT-3, just as it was for GPT-2.</p>
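<p>The generation loop itself is simple. In this sketch, <code>next_token_scores</code> and its bigram table are hypothetical stand-ins for a trained model, and greedy decoding (always picking the highest-scoring token) is just one of several possible decoding strategies. The toy model only looks at the last token for brevity, whereas a Transformer conditions on the full context.</p>
<pre><code class="language-python"># Hypothetical "model": next-token scores that depend only on the last token.
BIGRAM_SCORES = {
    "once": {"upon": 2.0, "more": 1.0},
    "upon": {"a": 2.0},
    "a": {"time": 2.0, "hill": 1.5},
    "time": {"END": 2.0},
}


def next_token_scores(context: list[str]) -> dict[str, float]:
    """Stand-in for a trained model: score every candidate next token."""
    return BIGRAM_SCORES.get(context[-1], {"END": 0.0})


def generate_greedy(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)   # condition on everything generated so far
        best = max(scores, key=scores.get)   # greedy: take the highest-scoring token
        if best == "END":
            break
        tokens.append(best)                  # the new token becomes part of the context
    return tokens


print(" ".join(generate_greedy(["once"])))
</code></pre>
<p>Starting from <code>["once"]</code>, the loop appends "upon", "a", and "time", then stops when the model's best choice is the end marker. Each newly generated token is fed back in as context for the next step.</p>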
<p>So if the architecture is mostly the same, what actually changed?</p>
<p>The answer is scale.</p>
<p>GPT-3 dramatically increases the size of the model, the amount of training data, and the computational resources used during training. The largest version of GPT-3 contains 175 billion parameters, more than 100 times as many as GPT-2’s largest model, which has 1.5 billion.</p>
<p>The paper trains eight model sizes, ranging from 125 million parameters all the way to 175 billion. This was important because the authors wanted to study how capabilities evolve as models grow larger.</p>
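<p>A quick way to sanity-check these sizes is the common rule of thumb that each Transformer layer holds roughly <code>12 * d_model**2</code> weights when the feed-forward layer is four times wider than the model. The sketch below applies that approximation and ignores biases and layer norms. The layer counts and widths are the ones listed for the smallest and largest models in the paper's Table 2.1; the 50,257-token vocabulary is an assumption that GPT-3 reuses GPT-2's BPE tokenizer.</p>
<pre><code class="language-python">def approx_params(n_layers: int, d_model: int, vocab: int = 50257, n_ctx: int = 2048) -> int:
    """Rough parameter count for a decoder-only Transformer.

    Assumes a feed-forward width of 4 * d_model, so each layer has about
    4 * d_model**2 attention weights plus 8 * d_model**2 MLP weights.
    """
    per_layer = 12 * d_model ** 2
    token_embeddings = vocab * d_model
    position_embeddings = n_ctx * d_model
    return n_layers * per_layer + token_embeddings + position_embeddings


configs = {
    "GPT-3 Small": (12, 768),    # (n_layers, d_model)
    "GPT-3 175B": (96, 12288),
}
for name, (layers, width) in configs.items():
    print(f"{name}: ~{approx_params(layers, width) / 1e9:.3f}B parameters")
</code></pre>
<p>This prints about 0.125B for GPT-3 Small and about 174.589B for the largest configuration, close to the 125M and 175B labels in the paper. Almost all of the growth comes from the per-layer term, which scales with the square of the model width.</p>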
<p>The architecture includes:</p>
<ul>
<li><p>A decoder-only Transformer design</p>
</li>
<li><p>A context window of 2048 tokens</p>
</li>
<li><p>Multiple model scales trained under similar objectives</p>
</li>
<li><p>Attention layers that alternate between dense and locally banded sparse patterns (similar to the Sparse Transformer), letting the model process contextual relationships efficiently</p>
</li>
</ul>
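<p>The 2048-token context window puts a hard limit on few-shot prompting: the paper notes that the number of demonstrations can range from zero up to whatever fits in the window, typically 10 to 100. The sketch below greedily packs demonstrations until a token budget runs out. It assumes whitespace-separated words as a crude, hypothetical proxy for tokens; GPT-3 actually uses byte-level BPE, so real counts would differ.</p>
<pre><code class="language-python">CONTEXT_WINDOW = 2048


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: one token per word."""
    return len(text.split())


def pack_demonstrations(demos: list[str], query: str, reserve_for_answer: int = 64,
                        window: int = CONTEXT_WINDOW) -> list[str]:
    """Greedily add demonstrations, in order, until the next one would not fit."""
    budget = window - count_tokens(query) - reserve_for_answer
    chosen = []
    for demo in demos:
        cost = count_tokens(demo)
        if cost > budget:
            break
        chosen.append(demo)
        budget -= cost
    return chosen


demos = [f"Q: example question {i} ? A: answer {i}" for i in range(500)]
packed = pack_demonstrations(demos, "Q: new question ? A:")
print(len(packed), "demonstrations fit")
</code></pre>
<p>Each toy demonstration costs 8 "tokens" and the query costs 5, so after reserving 64 tokens for the answer, 247 demonstrations fit. Longer examples, such as reading-comprehension passages, would leave room for far fewer.</p>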
<p>One of the paper’s most important observations is that performance improves smoothly as scale increases. Larger models consistently perform better across a wide range of tasks, including translation, question answering, reasoning, and few-shot learning.</p>
<p>This idea becomes central to the entire GPT-3 paper.</p>
<p>Rather than relying on handcrafted task-specific systems, the authors suggest that many advanced capabilities emerge naturally when language models become sufficiently large and are trained on enough diverse data. In other words, scaling itself starts acting like a research strategy.</p>
<p>What makes this shift important is that GPT-3 does not achieve its results through complicated architectural innovations. The paper’s argument is much simpler, and in some ways more surprising:</p>
<p>A relatively standard Transformer architecture, when scaled aggressively enough, begins to display entirely new behaviors.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/4ab1a945-4379-4f2a-b8a5-3dd15ddbcebb.png" alt="Transformer-Decoder-Architecture" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Note:</strong> The original figure illustrates the complete Transformer architecture (Encoder–Decoder) from <em>Attention Is All You Need</em>. For clarity and relevance to GPT-style models, the image used here was cropped to focus only on the decoder side of the architecture, since GPT models are based on a decoder-only Transformer design.</p>
<p><strong>Reference:</strong> Brownlee, J. <a href="https://machinelearningmastery.com/encoders-and-decoders-in-transformer-models/?utm_source=chatgpt.com">Encoders and Decoders in Transformer Models</a> Machine Learning Mastery.</p>
<h2 id="heading-experiments"><strong>Experiments</strong></h2>
<p>To understand whether GPT-3 could truly learn from context alone, the authors evaluated the model across a very broad range of NLP tasks. Rather than focusing on a single benchmark, the paper tests whether the same pretrained model can adapt to many different kinds of problems using only prompts and examples.</p>
<p>The experiments cover a wide variety of domains, including:</p>
<ul>
<li><p>Language modeling and text completion</p>
</li>
<li><p>Question answering</p>
</li>
<li><p>Translation between languages</p>
</li>
<li><p>Reading comprehension</p>
</li>
<li><p>Commonsense reasoning</p>
</li>
<li><p>Winograd-style reasoning tasks</p>
</li>
<li><p>Cloze and sentence completion tasks</p>
</li>
<li><p>Synthetic reasoning problems such as arithmetic and word manipulation</p>
</li>
</ul>
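<p>Synthetic tasks like arithmetic are useful because fresh problems can be generated on demand, which makes it less likely that the exact question appeared in the training data. Here is a sketch of a few-shot addition prompt generator. The question wording is modeled loosely on the paper's arithmetic tasks; the number of shots, digit count, and random seed are assumptions.</p>
<pre><code class="language-python">import random


def make_addition_prompt(n_shots: int = 3, digits: int = 2, seed: int = 0) -> tuple[str, int]:
    """Return a few-shot addition prompt and the correct answer to its final question."""
    rng = random.Random(seed)  # seeded so the same prompt can be regenerated
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    lines = []
    for _ in range(n_shots):
        a, b = rng.randint(low, high), rng.randint(low, high)
        lines.append(f"Q: What is {a} plus {b}? A: {a + b}")  # solved demonstration
    a, b = rng.randint(low, high), rng.randint(low, high)
    lines.append(f"Q: What is {a} plus {b}? A:")  # the model must supply the sum
    return "\n".join(lines), a + b


prompt, expected = make_addition_prompt()
print(prompt)
print("expected answer:", expected)
</code></pre>
<p>Because the generator also returns the correct sum, the model's completion can be graded automatically.</p>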
<p>What makes these experiments especially important is the evaluation setup itself.</p>
<p>Instead of fine-tuning GPT-3 separately for each benchmark, the model is tested entirely through prompting. The authors evaluate GPT-3 in three different settings:</p>
<ul>
<li><p><em>Zero-shot learning</em>, where the model receives only a task description</p>
</li>
<li><p><em>One-shot learning</em>, where it receives a single example</p>
</li>
<li><p><em>Few-shot learning</em>, where several demonstrations are included inside the prompt</p>
</li>
</ul>
<p>For example, in translation tasks, the prompt may contain a few English-to-French examples before asking the model to continue the pattern. In question-answering tasks, the model might see several example questions and answers before attempting a new one.</p>
<p>Importantly, the model’s parameters never change during these evaluations. There are no gradient updates, no retraining steps, and no task-specific optimization. GPT-3 performs every task using the exact same pretrained weights.</p>
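<p>For multiple-choice tasks, the paper compares how likely the frozen model finds each candidate completion, usually normalizing by the number of tokens so that longer answers are not penalized unfairly. The sketch below shows that scoring rule. <code>token_logprobs</code> is a hypothetical callback standing in for a real model, and the toy word probabilities are invented purely for illustration.</p>
<pre><code class="language-python">import math
from typing import Callable


def pick_answer(context: str, candidates: list[str],
                token_logprobs: Callable[[str, str], list[float]]) -> str:
    """Return the candidate with the highest average per-token log-probability."""
    def avg_logprob(candidate: str) -> float:
        lps = token_logprobs(context, candidate)
        return sum(lps) / len(lps)  # dividing by length normalizes for answer length
    return max(candidates, key=avg_logprob)


# Toy stand-in model: every word has a fixed, invented probability.
FAKE_WORD_PROBS = {"Paris": 0.6, "Lyon": 0.2, "the": 0.5, "city": 0.3, "of": 0.4, "lights": 0.1}


def fake_token_logprobs(context: str, candidate: str) -> list[float]:
    # A real model would condition on `context`; this toy ignores it.
    return [math.log(FAKE_WORD_PROBS.get(word, 0.01)) for word in candidate.split()]


question = "Q: What is the capital of France? A:"
print(pick_answer(question, ["Paris", "Lyon", "the city of lights"], fake_token_logprobs))
print(pick_answer(question, ["the city", "Lyon"], fake_token_logprobs))
</code></pre>
<p>With the candidates "the city" and "Lyon", the per-token average picks "the city" even though its summed log-probability is lower; this is exactly the length bias the normalization is meant to remove.</p>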
<p>This is one of the paper’s biggest departures from traditional NLP systems.</p>
<p>At the time, most state-of-the-art models achieved strong benchmark results through supervised fine-tuning on carefully prepared datasets. GPT-3 instead tests whether a single large language model can generalize across tasks simply by understanding patterns inside prompts.</p>
<p>The paper also evaluates how performance changes as model size increases. OpenAI trained multiple versions of GPT-3, ranging from 125 million parameters up to 175 billion parameters, then compared how scaling affected zero-shot, one-shot, and few-shot behavior.</p>
<p>According to the authors, larger models become noticeably better at using contextual information. Few-shot learning improves especially strongly with scale, suggesting that bigger models are not just memorizing more information. They are becoming better at adapting to new tasks dynamically.</p>
<h2 id="heading-key-findings"><strong>Key Findings</strong></h2>
<p>This is the section where GPT-3 stops feeling like “just a bigger language model” and starts looking like something fundamentally different.</p>
<p>According to the paper, one of the clearest patterns across nearly all experiments is that performance improves consistently as model size increases. As GPT-3 scales from 125 million to 175 billion parameters, the model becomes dramatically better at understanding prompts, adapting to context, and performing tasks it was never explicitly trained for.</p>
<p>But the most surprising result is not simply higher benchmark scores.</p>
<p>The real breakthrough is that <em>few-shot learning actually works at scale</em>.</p>
<p>Across many tasks, GPT-3’s few-shot performance approaches strong fine-tuned systems, and in some cases even matches or surpasses them. This is remarkable because GPT-3 achieves these results without updating its weights for individual tasks. Everything happens through prompting alone.</p>
<p>One of the strongest examples appears in question answering benchmarks.</p>
<p>On TriviaQA, GPT-3 improves significantly as more examples are provided in the prompt. The paper reports that zero-shot performance is already competitive, but one-shot and few-shot prompting push results even further, eventually reaching or exceeding some state-of-the-art fine-tuned systems in the same closed-book setting.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/1b4bfb72-6cbe-4af9-ba1c-5ddb1afa47eb.png" alt="ZeroShot-OneShot-FewShot learning" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Source: Brown et al. (2020), <em>Language Models are Few-Shot Learners</em>, Figure 1.2.</p>
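<p>In closed-book question answering, the model's free-form answer has to be compared against one or more accepted answers. A common approach, sketched below, normalizes both strings (lowercasing, stripping punctuation and English articles, collapsing whitespace) before checking for an exact match against any accepted alias. This follows the widely used SQuAD-style normalization and is an assumption here, not the paper's exact evaluation script.</p>
<pre><code class="language-python">import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)  # strip punctuation
    text = re.sub(r"\b(a|an|the)\b", " ", text)                # drop English articles
    return " ".join(text.split())                               # collapse whitespace


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized accepted answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower", "La Tour Eiffel"]))  # True
print(exact_match("Paris", ["Eiffel Tower", "La Tour Eiffel"]))              # False
</code></pre>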
<p>The same pattern appears repeatedly throughout the paper:</p>
<ul>
<li><p>Few-shot prompting consistently outperforms zero-shot prompting</p>
</li>
<li><p>Larger models make better use of contextual examples</p>
</li>
<li><p>Scaling improves not only accuracy, but adaptability itself</p>
</li>
</ul>
<p>This last point is especially important.</p>
<p>The paper suggests that scaling does more than help the model memorize facts or generate more fluent text. As models become larger, they appear to develop stronger <em>in-context learning</em> abilities. In other words, bigger models become better at inferring patterns and task structures directly from prompts.</p>
<p>The authors even observe that the gap between zero-shot and few-shot performance grows with model size. Smaller models struggle to learn effectively from prompts, while larger models can often infer the task from only a handful of examples.</p>
<p>What makes this finding historically important is that it changes how researchers think about capability growth in AI systems.</p>
<p>Before GPT-3, scaling was often viewed mainly as a way to improve existing performance metrics. GPT-3 introduces a different possibility: that entirely new behaviors can emerge as models become sufficiently large.</p>
<p>This is why the paper became so influential. It was not just reporting better benchmark numbers. It was presenting evidence that scale itself can unlock qualitatively new forms of learning behavior.</p>
<h2 id="heading-task-specific-observations"><strong>Task-Specific Observations</strong></h2>
<p>When you look beyond the headline results, the paper reveals something more nuanced about GPT-3: its abilities are highly uneven. The model performs surprisingly well in some areas, yet still struggles badly in others.</p>
<p>GPT-3 shows particularly strong performance on tasks that align closely with pattern recognition and language continuation.</p>
<p>Translation is one notable example. While GPT-3 was never trained specifically as a translation system, the model can still produce impressive results when given a few examples in the prompt. According to the paper, few-shot translation performance improves substantially as model size increases, especially when translating into English.</p>
<p>The model also performs well on question answering benchmarks, especially in closed-book settings where the answer must come directly from information stored inside the model’s parameters. Tasks like TriviaQA show strong gains as GPT-3 moves from zero-shot to few-shot prompting.</p>
<p>Text completion and cloze-style tasks are another major strength. GPT-3 demonstrates a strong ability to continue patterns, complete paragraphs, and infer missing words from context. On LAMBADA, where the model must predict the final word of a passage, accuracy rises from 76.2% zero-shot to 86.4% few-shot.</p>
<p>But the paper is also careful about documenting weaknesses.</p>
<p>GPT-3 struggles noticeably on certain reasoning-heavy benchmarks, particularly tasks involving natural language inference. Datasets like ANLI remain difficult even for the largest model.</p>
<p>Some reading comprehension tasks also expose limitations. In several cases, GPT-3 generates answers that sound plausible but fail to demonstrate deep understanding of the passage. This becomes a recurring theme throughout the paper: fluent language generation does not always mean reliable reasoning.</p>
<p>One of the most interesting observations is how sensitive GPT-3 is to prompt design.</p>
<p>Performance often changes dramatically depending on how examples are written, formatted, or ordered inside the context window. In many tasks, adding just a few demonstrations significantly improves accuracy.</p>
<p>This suggests something important about how GPT-3 operates.</p>
<p>The model is not simply retrieving fixed knowledge from memory. Instead, it relies heavily on contextual cues to infer what kind of behavior is expected. Small prompt changes can reshape the model’s interpretation of the task itself.</p>
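<p>A practical way to probe this sensitivity is to build several variants of the same prompt and score each one on identical validation examples. The sketch below only generates the variants, covering every ordering of the demonstrations under a few templates. The templates and the helper name <code>prompt_variants</code> are hypothetical, and scoring each variant would require calling a model, which is left out here.</p>
<pre><code class="language-python">from itertools import permutations

# Hypothetical prompt templates; the paper does not prescribe these.
TEMPLATES = [
    "{x} => {y}",
    "Q: {x}\nA: {y}",
    "Input: {x}\nOutput: {y}",
]


def prompt_variants(demos, query, templates=TEMPLATES):
    """Yield (template, order, prompt) for every template and demo ordering."""
    for template in templates:
        for order in permutations(range(len(demos))):
            shots = [template.format(x=demos[i][0], y=demos[i][1]) for i in order]
            # The query gets an empty answer so the model continues from there.
            query_line = template.format(x=query, y="").rstrip()
            yield template, order, "\n\n".join(shots + [query_line])


demos = [("2 + 2", "4"), ("3 + 5", "8"), ("7 + 1", "8")]
variants = list(prompt_variants(demos, "6 + 3"))
print(len(variants))  # 18: 3 templates x 3! orderings
print(variants[0][2])
</code></pre>
<p>Even three demonstrations and three templates yield 18 distinct prompts, so systematic prompt comparisons grow expensive quickly.</p>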
<p>In practice, this paper helped introduce an entirely new idea to the AI community: that <em>how you ask the model</em> can matter almost as much as the model itself.</p>
<p>That insight eventually evolved into what we now call <em>prompt engineering</em>.</p>
<h2 id="heading-generalization-vs-memorization"><strong>Generalization vs Memorization</strong></h2>
<p>One of the biggest questions surrounding GPT-3 is whether the model is genuinely learning useful patterns, or simply memorizing enormous portions of the internet.</p>
<p>This concern becomes especially important because GPT-3 is trained on massive web-scale datasets, including Common Crawl. With a model this large, it is reasonable to ask whether strong benchmark performance comes from real generalization or from accidentally seeing parts of the evaluation data during training.</p>
<p>The authors take this issue seriously and dedicate an entire section of the paper to studying what they call <em>data contamination</em>.</p>
<p>According to the paper, the authors tried to remove overlaps between the training data and the benchmark datasets before training. However, a bug in the filtering step left some overlaps in place, and the cost of training made it infeasible to retrain the model. When they measured what remained, they found that some contamination did exist: portions of certain evaluation datasets appear somewhere in the model’s training corpus.</p>
<p>However, the authors argue that this overlap is not large enough to fully explain GPT-3’s results.</p>
<p>For many benchmarks, performance improvements remain consistent even after accounting for contamination effects. The paper also notes that some tasks specifically designed to test adaptation and reasoning still show strong few-shot behavior despite being unlikely to appear directly in the training data.</p>
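<p>The overlap search can be pictured as an N-gram intersection between each benchmark example and the training corpus. The sketch below is a simplified in-memory version that assumes lowercased, whitespace-split words. The paper's actual procedure operated on the full training corpus and differed in its details, and the default of N = 13 here is an illustrative choice. Examples sharing any N-gram with the training text are flagged as potentially contaminated ("dirty"), and the rest form a "clean" subset that can be scored separately.</p>
<pre><code class="language-python">def ngrams(text, n):
    """Set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def split_clean_dirty(benchmark, training_docs, n=13):
    """Split benchmark examples by whether they share any n-gram with the training docs."""
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)

    clean, dirty = [], []
    for example in benchmark:
        # Examples shorter than n words have no n-grams and always count as clean here.
        if ngrams(example, n).isdisjoint(train_ngrams):
            clean.append(example)
        else:
            dirty.append(example)
    return clean, dirty


train_docs = ["the quick brown fox jumps over the lazy dog"]
benchmark = ["A quick brown fox jumps high", "completely unrelated question text here"]
clean, dirty = split_clean_dirty(benchmark, train_docs, n=4)  # small n for a toy corpus
print("clean:", clean)
print("dirty:", dirty)
</code></pre>
<p>Comparing a model's score on the clean subset with its score on the full benchmark is the kind of check the paper relies on: if the two are close, contamination is unlikely to explain the result.</p>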
<p>Another important observation is that GPT-3 still <em>underfits</em> the training data. This means the model has not perfectly memorized everything it has seen, even after extremely large-scale training.</p>
<p>That detail matters because it suggests the model is learning statistical structures and linguistic patterns rather than storing an exact copy of the dataset.</p>
<p>Of course, memorization does still happen to some extent. Large language models can reproduce fragments of training text, especially passages that are repeated many times in the training data. The paper does not deny this. Instead, the authors argue that memorization alone cannot explain GPT-3’s broad performance across translation, reasoning, question answering, and in-context learning tasks.</p>
<p>In practice, the evidence points toward something more complex.</p>
<p>GPT-3 appears to absorb patterns, relationships, and task structures from large-scale text data, then reuse those patterns flexibly in new contexts. That is very different from simply copying stored answers.</p>
<p>This distinction becomes one of the central debates in modern AI research. GPT-3 forced researchers to think more carefully about what it actually means for a language model to “understand” something, and where the boundary lies between memorization, pattern recognition, and genuine generalization.</p>
<h2 id="heading-discussion"><strong>Discussion</strong></h2>
<p>This is the point in the paper where the broader implications of GPT-3 start becoming clear.</p>
<p>According to the authors, large language models may be doing something more general than simply predicting text. By training on enormous amounts of language data, the model appears to learn patterns associated with tasks themselves.</p>
<p>That idea changes how we think about language modeling.</p>
<p>Traditionally, NLP systems were designed around explicit supervision. If you wanted a model to translate text, answer questions, summarize documents, or classify sentiment, you trained it specifically for that task using labeled examples.</p>
<p>GPT-3 suggests a different possibility.</p>
<p>The paper argues that many tasks are already implicitly embedded inside natural language data. During pretraining, the model encounters countless examples of explanations, translations, conversations, reasoning patterns, instructions, and question-answer pairs scattered across the internet. As scale increases, the model begins learning these behaviors indirectly.</p>
<p>In practice, this means the model does not always require explicit retraining to perform a new task. Instead, prompts and examples can activate behaviors the model has already absorbed during pretraining.</p>
<p>This is why prompting becomes so powerful in GPT-3.</p>
<p>The prompt is not merely providing information. It is guiding the model toward a behavior pattern that already exists somewhere inside its learned representations.</p>
<p>At the same time, the authors are careful not to overstate the results.</p>
<p>Throughout the paper, they repeatedly acknowledge that GPT-3 is still inconsistent. Some outputs are remarkably convincing, while others are obviously incorrect, nonsensical, or logically flawed.</p>
<p>This becomes one of GPT-3’s defining characteristics.</p>
<p>The model often sounds far more confident than it actually is. It can generate fluent explanations and persuasive answers even when the underlying reasoning is weak or factually wrong. In some tasks, especially deeper reasoning and reading comprehension benchmarks, GPT-3 still struggles significantly.</p>
<p>So the paper does not present GPT-3 as a solved form of intelligence.</p>
<p>Instead, it presents evidence that scaling language models unlocks new capabilities that were previously weak or absent. The results are impressive enough to suggest a major shift in direction, but not strong enough to eliminate the need for further research.</p>
<p>That balance is part of what makes the paper influential. It is ambitious, but also surprisingly honest about the limitations that still remain.</p>
<h2 id="heading-limitations"><strong>Limitations</strong></h2>
<p>One reason the GPT-3 paper remained credible despite the excitement surrounding it is that the authors were unusually open about the model’s weaknesses. The paper does not claim that few-shot learning solves NLP, nor does it pretend that GPT-3 works reliably on every task.</p>
<p>In many cases, traditional fine-tuned systems still perform better.</p>
<p>Although GPT-3 achieves impressive few-shot results across a wide range of benchmarks, the model continues to struggle on several reasoning-heavy tasks, especially natural language inference and certain reading comprehension datasets.</p>
<p>The paper also emphasizes that GPT-3’s success depends heavily on scale. Smaller versions of the model show far weaker few-shot capabilities, while the strongest results appear only at extremely large parameter counts.</p>
<p>This creates a major practical problem.</p>
<p>Training GPT-3 required enormous computational resources, specialized infrastructure, and vast amounts of data. The largest model contains 175 billion parameters and was trained on roughly 300 billion tokens using a high-bandwidth cluster of NVIDIA V100 GPUs provided by Microsoft.</p>
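<p>A common back-of-the-envelope rule from the scaling-laws literature estimates training compute as roughly 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. Treating that rule as an approximation (it ignores attention-specific costs), plugging in GPT-3's 175 billion parameters and roughly 300 billion training tokens gives:</p>
<pre><code class="language-python">def training_flops(n_params, n_tokens):
    """Approximate training compute with the common C ≈ 6 * N * D rule.

    The factor 6 counts ~2 FLOPs per parameter per token for the forward pass
    and ~4 for the backward pass. Attention-specific costs are ignored.
    """
    return 6 * n_params * n_tokens


n_params = 175e9   # GPT-3 175B
n_tokens = 300e9   # approximate number of training tokens
flops = training_flops(n_params, n_tokens)
print(f"{flops:.2e} FLOPs")  # 3.15e+23 FLOPs

# Convert to petaflop/s-days: 1e15 FLOP/s sustained for 86,400 seconds
pf_days = flops / (1e15 * 86400)
print(f"{pf_days:,.0f} petaflop/s-days")  # 3,646 petaflop/s-days
</code></pre>
<p>The paper's own estimate for the 175B model is about 3.14 × 10<sup>23</sup> FLOPs, or roughly 3,640 petaflop/s-days, so the rough rule lands very close.</p>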
<p>In practice, very few organizations in the world could realistically reproduce this work at the time.</p>
<p>The paper also discusses broader concerns around bias and fairness. Since GPT-3 learns from large internet datasets, it inevitably absorbs social biases, stereotypes, and problematic language patterns present in the data itself.</p>
<p>This becomes especially concerning because the model can generate highly convincing text. Incorrect or biased outputs may sound authoritative even when they are misleading or harmful.</p>
<p>Another issue the authors examine is <em>data contamination</em>. Because GPT-3 is trained on web-scale corpora, parts of benchmark datasets may accidentally appear in the training data. The paper investigates this directly and acknowledges that some overlap exists, although the authors argue that contamination alone does not explain the overall results.</p>
<p>There is also an environmental and economic cost to scaling models this aggressively.</p>
<p>Training systems at the scale of GPT-3 consumes enormous amounts of compute and energy, raising questions about sustainability and accessibility in AI research. As models become larger, cutting-edge progress increasingly depends on access to industrial-scale infrastructure.</p>
<p>This creates a tension that still exists today.</p>
<p>GPT-3 demonstrated that scaling works extraordinarily well, but it also highlighted how concentrated advanced AI research was becoming. The future of large language models was clearly promising, but also increasingly expensive.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>The paper ends with a surprisingly simple conclusion: scaling language models changes what they are capable of doing.</p>
<p>According to the authors, GPT-3 demonstrates that a sufficiently large language model can learn tasks directly from context without requiring gradient updates or task-specific fine-tuning.</p>
<p>That idea represents a major shift in the direction of NLP.</p>
<p>For years, the standard workflow in machine learning looked something like this:</p>
<ul>
<li><p>Pretrain a model</p>
</li>
<li><p>Fine-tune it for a specific task</p>
</li>
<li><p>Deploy the specialized system</p>
</li>
</ul>
<p>GPT-3 introduces a different paradigm.</p>
<p>Instead of retraining the model repeatedly for new tasks, the same pretrained model can often adapt through prompts alone. Instructions and examples inside the context window become enough to guide the model toward useful behavior.</p>
<p>In other words, the workflow starts looking more like this:</p>
<ul>
<li><p>Train once</p>
</li>
<li><p>Adapt dynamically through prompting</p>
</li>
</ul>
<p>What makes this important is not just convenience. It changes how researchers think about generalization itself.</p>
<p>The paper suggests that many capabilities traditionally associated with supervised learning can emerge naturally from large-scale language modeling. Translation, question answering, simple arithmetic, common-sense reasoning, and even task adaptation begin appearing inside a single unified system trained only with next-token prediction.</p>
<p>At the same time, the authors remain careful in their conclusions.</p>
<p>GPT-3 is clearly powerful, but it is not reliable enough to be considered a complete solution to intelligence or reasoning. The paper repeatedly acknowledges weaknesses involving logic, factual accuracy, bias, and consistency.</p>
<p>Still, the broader message is difficult to ignore.</p>
<p>GPT-3 showed that scaling language models does not simply improve fluency. It can produce entirely new behaviors that were weak or absent in smaller systems. That realization reshaped the trajectory of modern AI research and laid the foundation for the prompt-driven systems that would soon follow.</p>
<h2 id="heading-final-insight"><strong>Final Insight</strong></h2>
<p>If GPT-1 introduced the idea of large-scale pretraining followed by fine-tuning, and GPT-2 showed that language models could generalize surprisingly well without task-specific training, then GPT-3 pushes the idea even further.</p>
<p>It suggests that language models can begin learning <em>during inference itself</em>.</p>
<p>That is the real conceptual shift behind this paper.</p>
<p>Before GPT-3, most AI systems were still fundamentally task-specific. Even powerful pretrained models usually needed additional supervised training before they became useful for a particular application.</p>
<p>GPT-3 starts breaking that pattern.</p>
<p>Instead of building a separate model for translation, summarization, question answering, or reasoning, the same model can adapt dynamically depending on the prompt it receives. Examples inside the context window effectively become temporary instructions for behavior.</p>
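<p>Because the demonstrations live inside the context window, the number of examples is capped by its size: 2,048 tokens for every GPT-3 model in the paper. The sketch below packs demonstrations in order until that budget runs out. It uses a crude whitespace word count as a stand-in for a real tokenizer (GPT-3 actually uses byte-level BPE, so real token counts are usually higher), and <code>max_shots</code> and <code>reserve_for_answer</code> are hypothetical names.</p>
<pre><code class="language-python">def count_tokens(text):
    # Crude stand-in for a real BPE tokenizer: one token per whitespace-separated word.
    return len(text.split())


def max_shots(demos, query, context_length=2048, reserve_for_answer=32):
    """Return how many demonstrations (taken in order) fit alongside the query."""
    budget = context_length - reserve_for_answer - count_tokens(query)
    used, k = 0, 0
    for demo in demos:
        cost = count_tokens(demo)
        if used + cost > budget:
            break
        used += cost
        k += 1
    return k


demos = ["word " * 100] * 50   # 50 demonstrations of 100 "tokens" each
print(max_shots(demos, query="word " * 20))  # 19
</code></pre>
<p>In the paper, K ranges from zero up to however many examples fit in the window, which typically means between 10 and 100 depending on the task.</p>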
<p>In practice, this moves AI systems away from narrow specialization and toward something more flexible:</p>
<ul>
<li><p>From task-specific systems</p>
</li>
<li><p>To general-purpose models that adapt on the fly</p>
</li>
</ul>
<p>What makes this especially important is that GPT-3 did not achieve this through complicated symbolic reasoning systems or handcrafted pipelines. The model was still trained using a relatively simple next-token prediction objective. Yet at sufficient scale, entirely new behaviors started emerging.</p>
<p>Looking back, this paper feels less like the end of the GPT series and more like the beginning of a new era.</p>
<p>Many ideas that now define modern AI trace directly back to GPT-3:</p>
<ul>
<li><p>Prompt engineering</p>
</li>
<li><p>Instruction-following systems</p>
</li>
<li><p>In-context learning</p>
</li>
<li><p>Conversational AI assistants</p>
</li>
<li><p>General-purpose foundation models</p>
</li>
</ul>
<p>And ultimately, systems like ChatGPT build on what GPT-3 demonstrated: that prompting itself can serve as a powerful interface for interacting with AI models.</p>
<p>That is why this paper became historically important.</p>
<p>It did not just scale language models. It changed how people imagined using them.</p>
<h2 id="heading-gpt-1-vs-gpt-2-vs-gpt-3-key-differences"><strong>GPT-1 vs GPT-2 vs GPT-3: Key Differences</strong></h2>
<table style="min-width:100px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>GPT-1</strong></p></td><td><p><strong>GPT-2</strong></p></td><td><p><strong>GPT-3</strong></p></td></tr><tr><td><p><strong>Core Idea</strong></p></td><td><p>Pre-training followed by fine-tuning</p></td><td><p>Pre-training alone enables zero-shot behavior</p></td><td><p>Large-scale pre-training enables few-shot and in-context learning</p></td></tr><tr><td><p><strong>Training Approach</strong></p></td><td><p>Two-stage pipeline: pretrain then fine-tune</p></td><td><p>Single-stage language modeling</p></td><td><p>Same language modeling approach, but massively scaled</p></td></tr><tr><td><p><strong>Supervision</strong></p></td><td><p>Requires labeled data for downstream tasks</p></td><td><p>Can perform tasks without supervised fine-tuning</p></td><td><p>Can adapt from prompts and examples without retraining</p></td></tr><tr><td><p><strong>Task Handling</strong></p></td><td><p>Separate fine-tuning for each task</p></td><td><p>Tasks handled mainly through zero-shot prompts</p></td><td><p>Tasks handled through zero-shot, one-shot, and few-shot prompting</p></td></tr><tr><td><p><strong>Learning Style</strong></p></td><td><p>Learns representations, then specializes</p></td><td><p>Learns general language patterns</p></td><td><p>Learns to infer tasks directly from context</p></td></tr><tr><td><p><strong>Generalization</strong></p></td><td><p>Limited outside fine-tuned tasks</p></td><td><p>Stronger cross-task generalization</p></td><td><p>Much stronger contextual adaptation and in-context learning</p></td></tr><tr><td><p><strong>Prompt Usage</strong></p></td><td><p>Minimal importance</p></td><td><p>Prompts become useful</p></td><td><p>Prompts become central to system behavior</p></td></tr><tr><td><p><strong>Inference Behavior</strong></p></td><td><p>Mostly static after training</p></td><td><p>Can generalize during inference</p></td><td><p>Can adapt dynamically during inference</p></td></tr><tr><td><p><strong>Architecture</strong></p></td><td><p>Decoder-only Transformer (12 layers)</p></td><td><p>Decoder-only Transformer with Pre-LN blocks</p></td><td><p>Decoder-only Transformer with alternating dense and locally banded sparse attention</p></td></tr><tr><td><p><strong>Model Size</strong></p></td><td><p>~117M parameters</p></td><td><p>Up to 1.5B parameters</p></td><td><p>Up to 175B parameters</p></td></tr><tr><td><p><strong>Context Window</strong></p></td><td><p>512 tokens</p></td><td><p>1024 tokens</p></td><td><p>2048 tokens</p></td></tr><tr><td><p><strong>Training Data</strong></p></td><td><p>BooksCorpus (7,000+ unpublished books)</p></td><td><p>WebText (about 40 GB of web pages)</p></td><td><p>Multi-source mix: filtered Common Crawl, WebText2, Books1, Books2, and Wikipedia</p></td></tr><tr><td><p><strong>Key Capability</strong></p></td><td><p>Transfer learning</p></td><td><p>Zero-shot learning</p></td><td><p>Few-shot and in-context learning</p></td></tr><tr><td><p><strong>Performance Style</strong></p></td><td><p>Strong after fine-tuning</p></td><td><p>Strong without task-specific training</p></td><td><p>Often competitive with fine-tuned systems using prompts alone</p></td></tr><tr><td><p><strong>Scaling Importance</strong></p></td><td><p>Moderate</p></td><td><p>Important</p></td><td><p>Central research strategy of the paper</p></td></tr><tr><td><p><strong>Main Limitation</strong></p></td><td><p>Requires labeled datasets and retraining</p></td><td><p>Weak reasoning and inconsistent zero-shot behavior</p></td><td><p>Extremely expensive compute requirements and persistent reasoning limitations</p></td></tr><tr><td><p><strong>Main Contribution</strong></p></td><td><p>Introduced modern NLP pre-training paradigm</p></td><td><p>Demonstrated multitask zero-shot behavior</p></td><td><p>Demonstrated emergent in-context learning at scale</p></td></tr><tr><td><p><strong>Historical Impact</strong></p></td><td><p>Foundation of modern Transformer NLP</p></td><td><p>Shift toward general-purpose language models</p></td><td><p>Foundation for prompt-driven AI systems and modern LLM applications</p></td></tr><tr><td><p><strong>What Changed in the Field</strong></p></td><td><p>Pre-training became standard</p></td><td><p>Prompting became viable</p></td><td><p>Prompting became the primary interface for AI systems</p></td></tr><tr><td><p><strong>Legacy</strong></p></td><td><p>Inspired modern transfer learning pipelines</p></td><td><p>Inspired large-scale generative models</p></td><td><p>Directly influenced ChatGPT, instruction tuning, and foundation models</p></td></tr></tbody></table>

<h2 id="heading-pytorch-implementations-of-the-gpt-architecture-evolution">PyTorch Implementations of the GPT Architecture Evolution</h2>
<p><strong>GPT-1: Pre-training + Fine-Tuning Architecture</strong></p>
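<pre><code class="language-python">import torch
import torch.nn as nn

# TransformerBlock (masked self-attention + feed-forward network) is assumed
# to be defined elsewhere; it is not part of this snippet.


class GPT1(nn.Module):
    def __init__(self, vocab_size, d_model, n_layers):
        super().__init__()

        self.token_embedding = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings for a 512-token context
        self.position_embedding = nn.Embedding(512, d_model)

        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model)
            for _ in range(n_layers)
        ])

        # Final LayerNorm. The original GPT-1 used post-LN blocks and had no
        # separate final LayerNorm (GPT-2 added one); it is kept here so the
        # three classes in this article share the same structure.
        self.ln_f = nn.LayerNorm(d_model)

        # Language modeling head: hidden states -> vocabulary scores
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        # input_ids: (batch, seq_len) tensor of token IDs, seq_len at most 512
        positions = torch.arange(input_ids.size(1), device=input_ids.device)

        # Token + position embeddings: (batch, seq_len, d_model)
        x = (
            self.token_embedding(input_ids)
            + self.position_embedding(positions)
        )

        # Apply the stacked decoder blocks in order
        for block in self.transformer_blocks:
            x = block(x)

        x = self.ln_f(x)

        logits = self.lm_head(x)  # (batch, seq_len, vocab_size)

        return logits
</code></pre>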
<p><code>GPT1</code> inherits from <code>nn.Module</code>, the base class for neural networks in PyTorch. The constructor (<code>__init__</code>) defines all the trainable layers the model uses.</p>
<p><code>nn.Embedding(vocab_size, d_model)</code> creates a learnable lookup table that converts token IDs into dense vectors. Each token in the vocabulary is mapped to a vector of size <code>d_model</code>.</p>
<p>The positional embedding layer adds information about token order. Since Transformers process tokens in parallel, they need explicit positional information to understand sequence structure.</p>
<p><code>nn.ModuleList([...])</code> stores the stack of <code>TransformerBlock</code> modules so that PyTorch registers their parameters for training. Each <code>TransformerBlock</code> (not shown in the snippet) typically contains masked self-attention followed by a feed-forward network.</p>
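<p>Since <code>TransformerBlock</code> is never shown, here is one minimal way to write it, assuming standard causal multi-head self-attention followed by a position-wise feed-forward network with a 4× hidden width. The keyword arguments mirror the ones the GPT classes in this article pass (<code>n_heads</code>, <code>pre_layer_norm</code>, <code>sparse_attention</code>). The default of 8 heads and the <code>local_window</code> size are arbitrary assumptions, and <code>sparse_attention=True</code> is approximated with a locally banded causal mask rather than the full Sparse Transformer pattern.</p>
<pre><code class="language-python">import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """A minimal GPT-style decoder block (illustrative sketch)."""

    def __init__(self, d_model, n_heads=8, pre_layer_norm=False,
                 sparse_attention=False, local_window=256, dropout=0.0):
        super().__init__()
        self.pre_layer_norm = pre_layer_norm
        self.sparse_attention = sparse_attention
        self.local_window = local_window

        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def _attention_mask(self, seq_len, device):
        # True marks key positions a query may NOT attend to.
        i = torch.arange(seq_len, device=device).unsqueeze(1)  # query index
        j = torch.arange(seq_len, device=device).unsqueeze(0)  # key index
        mask = j > i                                  # causal: hide future tokens
        if self.sparse_attention:
            mask |= (i - j) >= self.local_window     # banded: hide the distant past
        return mask

    def _self_attention(self, x):
        mask = self._attention_mask(x.size(1), x.device)
        out, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
        return out

    def forward(self, x):
        if self.pre_layer_norm:
            # Pre-LN (GPT-2/GPT-3): normalize the input of each sub-block.
            x = x + self._self_attention(self.ln_1(x))
            x = x + self.mlp(self.ln_2(x))
        else:
            # Post-LN (GPT-1, original Transformer): normalize after the residual add.
            x = self.ln_1(x + self._self_attention(x))
            x = self.ln_2(x + self.mlp(x))
        return x
</code></pre>
<p>The Post-LN branch follows the ordering of the original Transformer used by GPT-1, and the Pre-LN branch follows GPT-2 and GPT-3. With the default head count, <code>d_model</code> must be divisible by 8; GPT-1 itself used 12 heads, so pass <code>n_heads</code> explicitly when you need a different value.</p>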
<p><code>nn.LayerNorm(d_model)</code> applies layer normalization before the output projection. This helps stabilize training and improves gradient flow in deep Transformer architectures.</p>
<p>The language modeling head (<code>nn.Linear</code>) projects the hidden representations back into vocabulary space. The output size equals <code>vocab_size</code>, producing a prediction score for every possible next token.</p>
<p>Inside the <code>forward()</code> method, <code>input_ids.size(1)</code> retrieves the sequence length, and <code>torch.arange(...)</code> generates positional indices for each token position.</p>
<p>The token embeddings and positional embeddings are added together to produce the initial Transformer input representation.</p>
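<p>A quick shape check shows why this addition works even though the two lookups return different shapes: the positional embeddings have no batch dimension, so PyTorch broadcasts them across the batch. The sizes below are arbitrary, and creating the position indices with <code>device=input_ids.device</code> keeps them on the same device as the token IDs.</p>
<pre><code class="language-python">import torch
import torch.nn as nn

vocab_size, d_model, max_positions = 100, 8, 512
token_embedding = nn.Embedding(vocab_size, d_model)
position_embedding = nn.Embedding(max_positions, d_model)

input_ids = torch.randint(0, vocab_size, (2, 10))                      # (batch=2, seq_len=10)
positions = torch.arange(input_ids.size(1), device=input_ids.device)  # (10,)

tok = token_embedding(input_ids)      # (2, 10, 8)
pos = position_embedding(positions)   # (10, 8), broadcast over the batch
x = tok + pos                         # (2, 10, 8)
print(tok.shape, pos.shape, x.shape)
# torch.Size([2, 10, 8]) torch.Size([10, 8]) torch.Size([2, 10, 8])
</code></pre>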
<p>The model then passes the representation through each Transformer block sequentially:</p>
<pre><code class="language-python">for block in self.transformer_blocks:
    x = block(x)
</code></pre>
<p>This iterative stacking is what allows GPT models to learn increasingly abstract contextual representations.</p>
<p>After normalization, the final hidden states are passed into <code>lm_head</code>, producing <code>logits</code>. These logits are unnormalized prediction scores used to compute probabilities for next-token generation.</p>
<p>The model finally returns the logits tensor, which is typically passed through <code>softmax</code> during inference or used directly with <code>CrossEntropyLoss</code> during training.</p>
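<p>During training, the logits at position t are scored against the token at position t + 1, so the targets are the inputs shifted left by one. The sketch below shows that shift with <code>F.cross_entropy</code> on random tensors; the shapes are arbitrary and stand in for a real model's output.</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

batch, seq_len, vocab_size = 2, 6, 50
logits = torch.randn(batch, seq_len, vocab_size)            # stand-in model output
input_ids = torch.randint(0, vocab_size, (batch, seq_len))

# Position t predicts token t + 1: drop the last logit and the first target.
shift_logits = logits[:, :-1, :]   # (batch, seq_len - 1, vocab_size)
shift_targets = input_ids[:, 1:]   # (batch, seq_len - 1)

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),   # cross_entropy expects (N, C) scores
    shift_targets.reshape(-1),              # and (N,) class indices
)

# At inference time, softmax over the last position gives next-token probabilities.
next_token_probs = F.softmax(logits[:, -1, :], dim=-1)
print(loss.item(), next_token_probs.sum(dim=-1))
</code></pre>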
<p><strong>GPT-2: Zero-Shot Multitask Architecture</strong></p>
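<pre><code class="language-python">import torch
import torch.nn as nn

# TransformerBlock is assumed to be defined elsewhere and to accept a
# pre_layer_norm flag; it is not part of this snippet.


class GPT2(nn.Module):
    def __init__(self, vocab_size, d_model, n_layers):
        super().__init__()

        self.token_embedding = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings for a 1024-token context
        self.position_embedding = nn.Embedding(1024, d_model)

        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(
                d_model=d_model,
                pre_layer_norm=True  # LayerNorm at the input of each sub-block
            )
            for _ in range(n_layers)
        ])

        # Extra LayerNorm after the last block
        self.final_layer_norm = nn.LayerNorm(d_model)

        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the output projection shares the token embedding matrix
        self.lm_head.weight = self.token_embedding.weight

    def forward(self, input_ids):
        # input_ids: (batch, seq_len) tensor of token IDs, seq_len at most 1024
        positions = torch.arange(input_ids.size(1), device=input_ids.device)

        x = (
            self.token_embedding(input_ids)
            + self.position_embedding(positions)
        )

        for block in self.transformer_blocks:
            x = block(x)

        x = self.final_layer_norm(x)

        logits = self.lm_head(x)  # (batch, seq_len, vocab_size)

        return logits
</code></pre>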
<p>Like GPT-1, the model begins with token embeddings and positional embeddings. <code>nn.Embedding</code> converts token IDs into dense vectors, while positional embeddings provide information about token order in the sequence.</p>
<p>One noticeable difference is the larger positional embedding table (<code>1024</code> positions instead of <code>512</code>), which lets GPT-2 process contexts twice as long.</p>
<p>The Transformer layers are stored using <code>nn.ModuleList</code>, but each <code>TransformerBlock</code> now uses:</p>
<pre><code class="language-python">pre_layer_norm=True
</code></pre>
<p>This means layer normalization is applied before attention and feed-forward operations rather than after them. This “Pre-LN” design significantly improves gradient flow and training stability in deeper Transformer models.</p>
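<p>The difference comes down to where the normalization sits relative to the residual connection. In the sketch below, <code>sublayer</code> stands in for either the attention or the feed-forward module, and the function names <code>post_ln</code> and <code>pre_ln</code> are illustrative.</p>
<pre><code class="language-python">import torch
import torch.nn as nn


def post_ln(x, sublayer, norm):
    # GPT-1 / original Transformer: add the residual, then normalize.
    return norm(x + sublayer(x))


def pre_ln(x, sublayer, norm):
    # GPT-2 / GPT-3: normalize the sub-block input; the residual path stays untouched.
    return x + sublayer(norm(x))


torch.manual_seed(0)
d_model = 16
sublayer = nn.Linear(d_model, d_model)
norm = nn.LayerNorm(d_model)
x = torch.randn(2, 4, d_model)

print(post_ln(x, sublayer, norm).shape, pre_ln(x, sublayer, norm).shape)
# torch.Size([2, 4, 16]) torch.Size([2, 4, 16])
</code></pre>
<p>In the Pre-LN form, the identity path <code>x</code> passes through every block unchanged, which is the usual explanation for its smoother gradients in deep stacks. Because the block outputs are no longer normalized, GPT-2 also adds a layer normalization after the final block, which is the role of <code>final_layer_norm</code> in the class above.</p>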
<p>The forward pass follows the same overall pipeline:</p>
<ol>
<li><p>Generate positional indices with <code>torch.arange()</code></p>
</li>
<li><p>Add token and positional embeddings</p>
</li>
<li><p>Pass representations through stacked Transformer blocks</p>
</li>
<li><p>Apply final normalization</p>
</li>
<li><p>Project outputs into vocabulary space</p>
</li>
</ol>
<p>The sequential block processing happens here:</p>
<pre><code class="language-python">for block in self.transformer_blocks:
    x = block(x)
</code></pre>
<p>The output layer is also defined without a bias term:</p>
<pre><code class="language-python">self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
</code></pre>
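<p>Dropping the bias matches how the released GPT models build their output layer: the projection back to vocabulary space reuses the token embedding matrix (weight tying), so there is no separate output bias. Tying also avoids storing a second <code>vocab_size × d_model</code> matrix, which is a sizable saving with GPT-2's 50,257-token vocabulary.</p>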
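<p>In PyTorch, weight tying takes a single assignment. <code>nn.Linear(d_model, vocab_size)</code> stores its weight with shape <code>(vocab_size, d_model)</code>, the same shape as <code>nn.Embedding(vocab_size, d_model).weight</code>, so the two layers can share one parameter. The sketch uses toy sizes and then computes the saving for GPT-2 small's dimensions (a 50,257-token vocabulary and width 768).</p>
<pre><code class="language-python">import torch.nn as nn

vocab_size, d_model = 1000, 64   # toy sizes for the demo

token_embedding = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size, bias=False)
print(token_embedding.weight.shape == lm_head.weight.shape)  # True: both (vocab_size, d_model)

untied = token_embedding.weight.numel() + lm_head.weight.numel()

# Weight tying: the output projection reuses the embedding matrix.
lm_head.weight = token_embedding.weight

model = nn.ModuleDict({"wte": token_embedding, "lm_head": lm_head})
tied = sum(p.numel() for p in model.parameters())  # the shared tensor is counted once
print(untied - tied)   # 64000

# The same saving at GPT-2 small's sizes (vocab 50,257, width 768):
print(50257 * 768)     # 38597376
</code></pre>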
<p>Finally, the model returns <code>logits</code>, which contain prediction scores for every token in the vocabulary at each sequence position.</p>
<p><strong>GPT-3: Few-Shot / In-Context Learning Architecture</strong></p>
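<pre><code class="language-python">import torch
import torch.nn as nn

# TransformerBlock is assumed to be defined elsewhere and to accept
# d_model, n_heads, pre_layer_norm, and sparse_attention; it is not part
# of this snippet.


class GPT3(nn.Module):
    # The defaults are the 175B configuration. Instantiating them allocates
    # roughly 700 GB of float32 weights, so pass far smaller values to
    # experiment on ordinary hardware.
    def __init__(
        self,
        vocab_size=50257,
        d_model=12288,
        n_layers=96,
        n_heads=96,
        context_length=2048
    ):
        super().__init__()

        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.position_embedding = nn.Embedding(context_length, d_model)

        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(
                d_model=d_model,
                n_heads=n_heads,
                pre_layer_norm=True,
                sparse_attention=True
            )
            for _ in range(n_layers)
        ])

        self.final_layer_norm = nn.LayerNorm(d_model)

        self.lm_head = nn.Linear(
            d_model,
            vocab_size,
            bias=False
        )

    def forward(self, input_ids):
        # input_ids: (batch, seq_len), with seq_len at most context_length
        positions = torch.arange(input_ids.size(1), device=input_ids.device)

        x = (
            self.token_embedding(input_ids)
            + self.position_embedding(positions)
        )

        for block in self.transformer_blocks:
            x = block(x)

        x = self.final_layer_norm(x)

        logits = self.lm_head(x)  # (batch, seq_len, vocab_size)

        return logits
</code></pre>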
<p>Compared to earlier GPT versions, this model dramatically increases scale. The default depth, width, and head count match the largest configuration in Table 2.1 of the paper: <code>96</code> Transformer layers with a hidden size of <code>d_model=12288</code>. This gives the network far more capacity for modeling complex language patterns.</p>
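<p>These settings can be sanity-checked against the 175B headline figure. A common approximation, assumed here rather than taken from the paper, is that each decoder layer holds about 12 × d_model² weights: 4 × d_model² in the attention projections (query, key, value, output) and 8 × d_model² in the feed-forward network with its 4 × d_model hidden width. Biases and LayerNorm parameters are comparatively tiny and ignored.</p>
<pre><code class="language-python">def approx_gpt_params(vocab_size, d_model, n_layers, context_length):
    """Rough parameter count for a GPT-style decoder (biases and LayerNorms ignored)."""
    attention = 4 * d_model * d_model            # W_q, W_k, W_v, W_o
    feed_forward = 2 * d_model * (4 * d_model)   # up- and down-projection
    per_layer = attention + feed_forward         # = 12 * d_model**2
    embeddings = vocab_size * d_model + context_length * d_model
    return n_layers * per_layer + embeddings


total = approx_gpt_params(vocab_size=50257, d_model=12288, n_layers=96, context_length=2048)
print(f"{total / 1e9:.1f}B parameters")  # 174.6B parameters
</code></pre>
<p>The estimate lands within about a quarter of a percent of 175 billion, which is why this 96-layer, 12,288-wide configuration is described as the 175B model.</p>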
<p>The model also uses <code>96</code> attention heads:</p>
<pre><code class="language-python">n_heads=96
</code></pre>
<p>Multi-head attention allows the model to focus on different relationships between tokens simultaneously, improving contextual understanding.</p>
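<p>With <code>d_model=12288</code> and <code>n_heads=96</code>, each head works in a 12288 / 96 = 128-dimensional subspace. The sketch below shows the reshape that splits a projected tensor into heads and merges them back; it uses toy sizes so it runs instantly, and the helper names <code>split_heads</code> and <code>merge_heads</code> are illustrative.</p>
<pre><code class="language-python">import torch


def split_heads(x, n_heads):
    """(batch, seq, d_model) -> (batch, n_heads, seq, d_head); d_model must divide evenly."""
    batch, seq, d_model = x.shape
    d_head = d_model // n_heads
    return x.view(batch, seq, n_heads, d_head).transpose(1, 2)


def merge_heads(x):
    """(batch, n_heads, seq, d_head) -> (batch, seq, n_heads * d_head)."""
    batch, n_heads, seq, d_head = x.shape
    return x.transpose(1, 2).reshape(batch, seq, n_heads * d_head)


print(12288 // 96)  # 128, the per-head dimension in GPT-3 175B

x = torch.randn(2, 5, 32)          # toy input with d_model=32
heads = split_heads(x, n_heads=4)
print(heads.shape)                 # torch.Size([2, 4, 5, 8])
print(torch.equal(merge_heads(heads), x))  # True: the round trip is lossless
</code></pre>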
<p>The positional embedding table is expanded to <code>2048</code> positions, letting the model process sequences twice as long as GPT-2's 1024-token limit.</p>
<p>Each Transformer block is configured with:</p>
<pre><code class="language-python">pre_layer_norm=True,
sparse_attention=True
</code></pre>
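<p>Pre-layer normalization improves training stability in very deep networks, while sparse attention reduces the computational cost of attention by limiting which tokens each position can attend to. This matters at GPT-3 scale, because the cost of full attention grows quadratically with sequence length. In the paper, GPT-3 alternates dense and locally banded sparse attention patterns across layers, similar to the Sparse Transformer, so a faithful implementation would enable the sparse pattern on every other block rather than on all of them.</p>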
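<p>The sketch below builds boolean attention masks for a dense causal pattern and a locally banded one, then alternates them across layers. It is a simplification under stated assumptions: the window size is arbitrary, and a plain local band is only one of the patterns used in Sparse Transformers.</p>
<pre><code class="language-python">import torch


def attention_mask(seq_len, window=None):
    """Boolean mask where True means 'query i may attend to key j'.

    Dense causal: token i sees tokens 0..i.
    Banded causal: token i sees only the last `window` tokens up to and including i.
    (nn.MultiheadAttention uses the opposite convention, so pass ~mask there.)
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    allowed = i >= j                          # causal
    if window is not None:
        allowed = torch.logical_and(allowed, window > (i - j))  # local band
    return allowed


n_layers, seq_len, window = 4, 6, 3
# Alternate dense (even layers) and banded (odd layers) patterns
masks = [attention_mask(seq_len, window if layer % 2 else None) for layer in range(n_layers)]
dense, banded = masks[0], masks[1]
print(dense.sum().item(), banded.sum().item())  # 21 15
</code></pre>
<p>A dense causal layer over n tokens scores about n²/2 query-key pairs, while a band of width w scores at most n × w, which is where the savings at long context lengths come from.</p>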
<p>The forward pass follows the standard GPT pipeline:</p>
<ol>
<li><p>Convert token IDs into embeddings</p>
</li>
<li><p>Add positional information</p>
</li>
<li><p>Pass representations through stacked Transformer blocks</p>
</li>
<li><p>Apply final layer normalization</p>
</li>
<li><p>Generate vocabulary logits</p>
</li>
</ol>
<p>The core iterative processing happens here:</p>
<pre><code class="language-python">for block in self.transformer_blocks:
    x = block(x)
</code></pre>
<p>Finally, the output layer projects the hidden states into vocabulary space, producing <code>logits</code> used for next-token prediction during training and text generation.</p>
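<p>At generation time, those logits drive a simple loop: take the logits at the last position, pick or sample the next token, append it, and repeat. The sketch below assumes any model with the same interface as the classes above (token IDs in, logits of shape <code>(batch, seq_len, vocab_size)</code> out). The tiny stand-in model, the temperature values, and the helper name <code>generate</code> are illustrative assumptions, and the context is cropped to the model's maximum length.</p>
<pre><code class="language-python">import torch
import torch.nn as nn


@torch.no_grad()
def generate(model, input_ids, max_new_tokens, context_length, temperature=1.0):
    """Autoregressive decoding: append one token per step."""
    for _ in range(max_new_tokens):
        context = input_ids[:, -context_length:]   # stay within the context window
        logits = model(context)[:, -1, :]          # logits for the next token only
        if temperature == 0:
            next_token = logits.argmax(dim=-1, keepdim=True)       # greedy
        else:
            probs = torch.softmax(logits / temperature, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)  # sampled
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids


# Stand-in "language model" with the same interface: (batch, seq) -> (batch, seq, vocab)
vocab_size = 20
toy_model = nn.Sequential(nn.Embedding(vocab_size, 16), nn.Linear(16, vocab_size))

prompt = torch.tensor([[1, 2, 3]])
out = generate(toy_model, prompt, max_new_tokens=5, context_length=8, temperature=0)
print(out.shape)  # torch.Size([1, 8])
</code></pre>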
<h2 id="heading-resources"><strong>Resources:</strong></h2>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD/Pytorch-Collections/tree/main/GPT">Pytorch Projects for GPT series</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1706.03762">Attention Is All You Need</a></p>
</li>
<li><p><a href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">Improving Language Understanding by Generative Pre-Training (GPT-1)</a></p>
</li>
<li><p><a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">Language Models are Unsupervised Multitask Learners (GPT-2)</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1810.04805">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1906.08237">XLNet: Generalized Autoregressive Pretraining for Language Understanding</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1907.11692">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1909.08053">Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2009.08366?utm_source=chatgpt.com">Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/1904.10509?utm_source=chatgpt.com">Sparse Transformers</a></p>
</li>
<li><p><a href="https://arxiv.org/abs/2001.08361?utm_source=chatgpt.com">Scaling Laws for Neural Language Models</a></p>
</li>
</ul>
<p><strong>Contact Me</strong></p>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD"><strong>GitHub</strong></a></p>
</li>
<li><p><a href="https://x.com/programmingoce"><strong>X</strong></a></p>
</li>
<li><p><a href="https://www.linkedin.com/in/mohammed-abrah-6435a63ba/"><strong>LinkedIn</strong></a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ AI Paper Review: Language Models are Unsupervised Multitask Learners (GPT-2) ]]>
                </title>
                <description>
                    <![CDATA[ Before models like ChatGPT became part of everyday life, AI systems were already getting surprisingly good at generating text. But there was still a major limitation: most models could only perform ta ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ai-paper-review-language-models-are-unsupervised-multitask-learners-gpt-2/</link>
                <guid isPermaLink="false">6a01fbeffca21b0d4b40ae1d</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mohammed Fahd Abrah ]]>
                </dc:creator>
                <pubDate>Mon, 11 May 2026 15:55:27 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/be6d96bd-c687-4fac-a3e2-ea68ba622c51.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Before models like ChatGPT became part of everyday life, AI systems were already getting surprisingly good at generating text. But there was still a major limitation: most models could only perform tasks they were specifically trained for.</p>
<p>If you wanted a model to translate text, summarize an article, or answer questions, you usually had to collect labeled data and train it separately for each task. AI was powerful, but still very narrow.</p>
<p>Then GPT-2 introduced a different idea.</p>
<p>Instead of teaching a model every task individually, researchers explored whether simply training a model to predict the next word on a massive amount of internet text could be enough for useful abilities to emerge on their own.</p>
<p>And surprisingly, it worked.</p>
<p>The model began showing early signs of generalization. It could answer questions, summarize text, translate between languages, and complete prompts, all without task-specific training or fine-tuning on downstream tasks.</p>
<p>Now, research papers like the one that introduced these new ideas can be difficult and time-consuming to read, especially when they’re filled with technical terminology and experimental details. So in this article, I’ll break the paper down in a simple and practical way.</p>
<p>We’ll look at what problem the paper was trying to solve, the main ideas behind GPT-2, how zero-shot learning works, and why this paper became such an important step toward modern large language models.</p>
<p>By the end, you should understand the key insights of GPT-2 without needing to read the full paper yourself.</p>
<h2 id="heading-paper-overview"><strong>Paper Overview</strong></h2>
<p>In this article, we’ll review the paper <em>Language Models are Unsupervised Multitask Learners</em> by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.</p>
<p>The paper introduced GPT-2 and showed how a language model trained on massive amounts of text could perform multiple tasks without task-specific training.</p>
<p>Here’s the actual paper if you want to read it yourself:</p>
<p><a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">Language Models are Unsupervised Multitask Learners (PDF)</a></p>
<p>And here’s a quick infographic of what we’ll cover in this review:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/0a814405-f634-4251-a1be-b3b02d785691.png" alt="AI paper quick insights" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-table-of-contents">Table of Contents:</h2>
<ul>
<li><p><a href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a href="#heading-goals-of-the-paper">Goals of the Paper</a></p>
</li>
<li><p><a href="#heading-core-idea">Core Idea</a></p>
</li>
<li><p><a href="#heading-methodology">Methodology</a></p>
</li>
<li><p><a href="#heading-zero-shot-setup">Zero-Shot Setup</a></p>
</li>
<li><p><a href="#heading-fine-tuning-vs-zero-shot-learning">Fine-tuning vs Zero-Shot Learning</a></p>
</li>
<li><p><a href="#heading-training-data-web-text">Training Data (WebText)</a></p>
</li>
<li><p><a href="#heading-input-representation">Input Representation</a></p>
</li>
<li><p><a href="#heading-model-architecture">Model Architecture</a></p>
</li>
<li><p><a href="#heading-experiments">Experiments</a></p>
</li>
<li><p><a href="#heading-key-findings">Key Findings</a></p>
</li>
<li><p><a href="#heading-task-specific">Task-Specific</a></p>
</li>
<li><p><a href="#heading-generalization-vs-memorization">Generalization vs Memorization</a></p>
</li>
<li><p><a href="#heading-discussion">Discussion</a></p>
</li>
<li><p><a href="#heading-limitations">Limitations</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a href="#heading-final-insight">Final Insight</a></p>
</li>
<li><p><a href="#heading-gpt-1-vs-gpt-2-key-differences">GPT-1 vs GPT-2 — Key Differences</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this breakdown, it helps to be familiar with a few basic ideas:</p>
<ul>
<li><p>Reading the previous review, <a href="https://www.freecodecamp.org/news/ai-paper-review-improving-language-understanding-by-generative-pre-training-gpt-1/">AI Paper Review: Improving Language Understanding by Generative Pre-Training (GPT-1)</a>, will be helpful and will give you some solid background info and context (since GPT-2 directly builds on many of the ideas introduced there).</p>
</li>
<li><p>A general understanding of <a href="https://www.freecodecamp.org/news/natural-language-processing-with-spacy-python-full-course/">natural language processing (NLP)</a> and how machines work with text</p>
</li>
<li><p>A high-level idea of what a <a href="https://www.freecodecamp.org/news/how-transformer-models-work-for-language-processing/">Transformer model</a> is (you don’t need deep technical details, just the basic concept)</p>
</li>
<li><p>The difference between supervised learning, unsupervised learning, and zero-shot learning</p>
</li>
<li><p>Basic <a href="https://www.freecodecamp.org/news/learn-the-foundations-of-machine-learning-and-artificial-intelligence/">machine learning concepts</a> like training data, models, and scaling</p>
</li>
</ul>
<p>If you’re not fully comfortable with all of these, that’s completely okay. I’ll keep the explanations as simple and intuitive as possible, focusing more on understanding the ideas than getting lost in heavy technical details.</p>
<h2 id="heading-executive-summary"><strong>Executive Summary</strong></h2>
<p>Before GPT-2, most NLP systems depended heavily on supervised learning. Each task, whether it was translation, question answering, or summarization, typically required its own labeled dataset and a model trained specifically for it.</p>
<p>This paper challenges that approach.</p>
<p>According to the authors, a single large language model, trained only to predict the next word in a sequence of text, can learn to perform many different tasks without any task-specific training.</p>
<p>Instead of being explicitly taught how to solve each problem, the model picks up these abilities from patterns in the data.</p>
<p>In simple terms, the model is not directly trained to translate, answer questions, or summarize. Rather, it learns to do these things implicitly through exposure to large amounts of text.</p>
<p>This marks an important shift. Rather than relying on supervised learning for every task, the paper shows that models can begin to generalize across tasks in what is now known as a zero-shot setting.</p>
<h2 id="heading-goals-of-the-paper"><strong>Goals of the Paper</strong></h2>
<p>To understand the motivation behind this work, it helps to look at the limitations of traditional NLP systems.</p>
<p>According to the authors, most existing approaches rely heavily on labeled datasets, require separate training for each task, and struggle to generalize beyond the specific problems they were designed for.</p>
<p>In practice, this makes systems powerful but narrow: they perform well on what they are trained for, but don’t easily transfer that knowledge elsewhere.</p>
<p>This paper explores a different direction.</p>
<p>The authors ask whether a model can learn to perform multiple tasks without explicit supervision, simply by training on large amounts of text.</p>
<p>They also investigate whether language modeling alone is enough to capture general capabilities, and whether increasing the size of the model and the amount of data can improve this behavior.</p>
<p>At its core, the goal is to move toward more general systems that learn from language itself, rather than from carefully labeled datasets.</p>
<h2 id="heading-core-idea"><strong>Core Idea</strong></h2>
<p>At the heart of the paper is a simple but powerful idea: instead of training models in the traditional supervised way (mapping inputs directly to outputs), the authors train a model to do just one thing: predict the next word in a sequence of text.</p>
<p>At first, this might sound limited. But the key insight is that natural language already contains many examples of tasks embedded within it.</p>
<p>Text on the internet includes questions followed by answers, translations between languages, summaries of longer content, and detailed explanations.</p>
<p>According to the paper, by learning to predict and generate text, the model is indirectly learning how these tasks work. In other words, it begins to model relationships like <em>p(output | input, task)</em> without ever being explicitly told what the task is.</p>
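<p>The paper illustrates this with training examples written as plain sequences, such as (translate to french, english text, french text) or (answer the question, document, question, answer). The sketch below shows the idea with a hypothetical <code>as_text</code> helper and made-up strings. The "task: fields" layout is an assumption chosen for readability, not the paper's exact format. Everything before the last field becomes context, so predicting the last field amounts to modeling <em>p(output | input, task)</em>.</p>
<pre><code class="language-python">def as_text(task, *fields):
    # Flatten a task description and its fields into one plain-text sequence
    return " ".join([task + ":"] + list(fields))


def split_for_conditioning(task, *fields):
    # All fields but the last form the context; the last field is the target
    return as_text(task, *fields[:-1]), fields[-1]


context, target = split_for_conditioning(
    "translate to french", "the cat sleeps", "le chat dort")
print(context)  # translate to french: the cat sleeps
print(target)   # le chat dort

qa_context, qa_target = split_for_conditioning(
    "answer the question", "Paris is the capital of France.",
    "What is the capital of France?", "Paris")
print(qa_context)
</code></pre>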
<p>This is what allows the model to move beyond a single objective and start behaving like a general system.</p>
<h2 id="heading-methodology"><strong>Methodology</strong></h2>
<p>To understand how this idea works in practice, it helps to look at how the model is trained.</p>
<p>According to the authors, everything starts with a standard language modeling objective.</p>
<p>The model is trained to predict the next token in a sequence based on the tokens that come before it.</p>
<p>While this may seem simple, it allows the model to learn the underlying structure of language over time.</p>
<p>Formally, this means the model is learning probabilities over sequences of text. In practice, this ability enables it to generate coherent text, complete sentences, and even mimic patterns that resemble specific tasks.</p>
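<p>Spelled out, the paper factorizes the probability of a text as a product of next-token conditionals: <em>p(x) = p(s<sub>1</sub>) × p(s<sub>2</sub> | s<sub>1</sub>) × … × p(s<sub>n</sub> | s<sub>1</sub>, …, s<sub>n−1</sub>)</em>. GPT-2 estimates each conditional with a Transformer. The sketch below swaps in a toy bigram model (each token conditioned only on the previous one, with a hypothetical <code>BOS</code> start marker) purely to show the chain-rule arithmetic.</p>
<pre><code class="language-python">import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    # Count how often each token follows each previous token
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["BOS"] + sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def cond_prob(counts, prev, nxt):
    # Maximum-likelihood estimate of p(nxt | prev)
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def sequence_log_prob(counts, sentence):
    # Chain rule: add the log-probability of each token given its context
    # (for this toy bigram model, the context is just the previous token)
    tokens = ["BOS"] + sentence.split()
    log_p = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = cond_prob(counts, prev, nxt)
        if p == 0.0:
            return float("-inf")  # unseen pair; this toy model has no smoothing
        log_p += math.log(p)
    return log_p


model = train_bigram(["the cat sleeps", "the dog sleeps", "the cat eats"])
print(math.exp(sequence_log_prob(model, "the cat sleeps")))  # about 0.333
</code></pre>
<p>With this three-sentence corpus, p("the cat sleeps") = 1 × 2/3 × 1/2 = 1/3, while any sentence containing an unseen word pair gets probability 0, one reason real language models need far more data and smoother estimates.</p>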
<p>This is what makes the approach powerful. Even though the model is only trained to predict the next word, it ends up capturing much richer behavior that can be applied to a variety of tasks.</p>
<h2 id="heading-zero-shot-setup"><strong>Zero-Shot Setup</strong></h2>
<p>One of the most important differences from earlier approaches is how the model is used after training.</p>
<p>Unlike GPT-1, there's no fine-tuning or task-specific training. The model isn't adapted or retrained for each new task. Instead, everything is handled through the input itself.</p>
<p>According to the authors, tasks are expressed directly as text prompts. For example, you might write something like “Translate to French:” followed by a sentence, or “Answer the question:” followed by a prompt. The model then continues the text in a way that reflects the task.</p>
<p>In practice, this means the model isn't explicitly told what to do through training – it infers the task from the structure of the input and responds accordingly.</p>
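<p>Because the model only continues text, zero-shot inference is just a decoding loop over the prompt. The sketch below uses greedy decoding, which the paper uses for tasks such as translation, where the prompt ends in the form "english sentence =" (in the paper, after a few example pairs). The scoring function <code>toy_scores</code> and its lookup table are hypothetical stand-ins for a trained model, which would score the whole vocabulary given the full context.</p>
<pre><code class="language-python">def greedy_continue(prompt_tokens, next_token_scores, max_new_tokens, stop_token=None):
    # Repeatedly append the single highest-scoring next token
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # dict mapping candidate token to score
        best = max(scores, key=scores.get)
        if best == stop_token:
            break
        tokens.append(best)
    return tokens[len(prompt_tokens):]  # keep only the generated continuation


# Hypothetical stand-in for a trained model: scores depend only on the last token
TOY_TABLE = {
    "=": {"le": 0.9, "the": 0.1},
    "le": {"chat": 0.8, ".": 0.2},
    "chat": {"dort": 0.7, ".": 0.3},
    "dort": {".": 0.95, "le": 0.05},
}


def toy_scores(tokens):
    return TOY_TABLE.get(tokens[-1], {".": 1.0})


prompt = "the cat sleeps =".split()
print(greedy_continue(prompt, toy_scores, max_new_tokens=10, stop_token="."))
# ['le', 'chat', 'dort']
</code></pre>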
<h2 id="heading-fine-tuning-vs-zero-shot-learning"><strong>Fine-tuning vs Zero-Shot Learning</strong></h2>
<table style="min-width:75px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>Fine-tuning (Task-Specific Training)</strong></p></td><td><p><strong>Zero-Shot Learning</strong></p></td></tr><tr><td><p><strong>Definition</strong></p></td><td><p>The model is trained further on labeled data for a specific task</p></td><td><p>The model performs tasks without any additional training</p></td></tr><tr><td><p><strong>Training Requirement</strong></p></td><td><p>Requires task-specific labeled datasets</p></td><td><p>No labeled data needed for the task</p></td></tr><tr><td><p><strong>Setup</strong></p></td><td><p>Separate training phase for each task</p></td><td><p>Tasks are given as natural language prompts</p></td></tr><tr><td><p><strong>Flexibility</strong></p></td><td><p>Limited to trained tasks</p></td><td><p>Can generalize to many unseen tasks</p></td></tr><tr><td><p><strong>Performance</strong></p></td><td><p>Usually higher on specific tasks</p></td><td><p>Lower, but improving with scale</p></td></tr><tr><td><p><strong>Cost</strong></p></td><td><p>Expensive (training per task)</p></td><td><p>Efficient (no retraining needed)</p></td></tr><tr><td><p><strong>Adaptability</strong></p></td><td><p>Needs retraining for new tasks</p></td><td><p>Adapts instantly via prompts</p></td></tr><tr><td><p><strong>Example (NLP)</strong></p></td><td><p>Train a model on a sentiment analysis dataset</p></td><td><p>“Classify sentiment: …” prompt</p></td></tr><tr><td><p><strong>Used in</strong></p></td><td><p>GPT-1, traditional NLP systems</p></td><td><p>GPT-2, GPT-3, modern LLMs</p></td></tr><tr><td><p><strong>Main Advantage</strong></p></td><td><p>High accuracy on defined tasks</p></td><td><p>High flexibility and generalization</p></td></tr><tr><td><p><strong>Main Limitation</strong></p></td><td><p>Not scalable across many tasks</p></td><td><p>Less precise than fine-tuned models</p></td></tr></tbody></table>

<h2 id="heading-training-data-web-text"><strong>Training Data (WebText)</strong></h2>
<p>Another key part of this work is the dataset used to train the model.</p>
<p>Instead of relying on traditional sources like Wikipedia, books, or news articles alone, the authors created a new dataset called <strong>WebText</strong>.</p>
<p>It consists of slightly over 8 million documents, about 40 GB of text, scraped from outbound links posted on Reddit that received at least 3 karma. The authors also removed all Wikipedia documents, since Wikipedia is a common source for other datasets and could overlap with the evaluation tasks.</p>
<p>According to the paper, this filtering step helps improve the overall quality of the data, since the content is more likely to be interesting or useful to readers.</p>
<p>What makes this dataset important is its diversity. It contains real-world language from many domains, and more importantly, it includes natural examples of tasks, such as explanations, question–answer pairs, and translations, embedded within the text itself.</p>
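<p>The paper describes the collection step only at a high level: outbound Reddit links with at least 3 karma, followed by de-duplication, heuristic cleaning, and removal of Wikipedia documents. The sketch below covers just the link-selection rule. The input format (pairs of URL and karma) and the function name <code>select_links</code> are hypothetical, and de-duplication here is by URL only, which is much simpler than de-duplicating documents.</p>
<pre><code class="language-python">from urllib.parse import urlparse


def select_links(links, min_karma=3):
    # links: iterable of (url, karma) pairs from Reddit posts (hypothetical input)
    seen = set()
    kept = []
    for url, karma in links:
        host = urlparse(url).netloc.lower()
        is_wikipedia = host.endswith("wikipedia.org")
        if karma >= min_karma and not is_wikipedia and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


links = [
    ("https://example.com/a", 5),
    ("https://example.com/a", 7),             # duplicate URL, dropped
    ("https://en.wikipedia.org/wiki/X", 40),  # Wikipedia, dropped
    ("https://example.org/b", 1),             # below the karma threshold
    ("https://example.net/c", 3),
]
print(select_links(links))  # ['https://example.com/a', 'https://example.net/c']
</code></pre>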
<h2 id="heading-input-representation"><strong>Input Representation</strong></h2>
<p>To process text, the model uses a technique called <strong>Byte Pair Encoding (BPE)</strong>.</p>
<p>According to the authors, BPE works as a middle ground between word-level and character-level representations.</p>
<p>Instead of treating text strictly as full words or individual characters, it breaks it into smaller units that can adapt depending on how frequently patterns appear in the data.</p>
<p>In practice, this allows the model to handle a wide range of text more effectively, including rare words and different languages. It also improves generalization, since the model isn't limited to a fixed vocabulary of complete words.</p>
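<p>The core of BPE is a loop that repeatedly merges the most frequent adjacent pair of symbols. GPT-2 actually applies BPE to raw UTF-8 bytes and, according to the paper, prevents merges across character categories (with an exception for spaces). The sketch below simplifies to the characters of a tiny made-up word list so the merges are easy to read. It follows the classic merge-learning procedure of Sennrich et al., not OpenAI's tokenizer code, and the function names are hypothetical.</p>
<pre><code class="language-python">from collections import Counter


def get_pair_counts(vocab):
    # vocab maps each word (a tuple of symbols) to its corpus frequency
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    # Rewrite every word, fusing each adjacent occurrence of `pair` into one symbol
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i != len(word):
            if word[i:i + 2] == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words, num_merges):
    # Start from single characters and greedily merge the most frequent pair
    vocab = dict(Counter(tuple(w) for w in words))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


merges, vocab = learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], 4)
print(merges[:2])  # [('l', 'o'), ('lo', 'w')]
</code></pre>
<p>Frequent words like "low" quickly become single tokens, while rarer words stay split into reusable pieces, which is how BPE avoids a fixed word-level vocabulary.</p>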
<h2 id="heading-model-architecture"><strong>Model Architecture</strong></h2>
<p>The model used in this paper is based on a <strong>Transformer (decoder-only)</strong> architecture, similar to GPT-1 but significantly scaled up.</p>
<p>According to the authors, the model relies on <strong>masked self-attention</strong>, which allows it to look at previous tokens in a sequence while predicting the next one.</p>
<p>This means it processes text step by step, always using past context to generate the next token.</p>
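<p>Masked (causal) self-attention can be sketched in a few lines. This single-head, pure-Python version is a simplification: it skips the learned query, key, and value projections (so all three are the raw input vectors), and it applies the mask by letting position <code>i</code> score only positions <code>0..i</code>. That has the same effect as setting the scores for future positions to negative infinity before the softmax, which is how batched implementations usually do it.</p>
<pre><code class="language-python">import math


def softmax(scores):
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    # x: one vector per position. Simplification: queries, keys, and values
    # are the raw input vectors (no learned projections, a single head).
    d = len(x[0])
    outputs, all_weights = [], []
    for i, query in enumerate(x):
        visible = x[: i + 1]  # causal mask: position i sees positions 0..i only
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in visible]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the visible value vectors
        outputs.append([sum(w * v[c] for w, v in zip(weights, visible))
                        for c in range(d)])
    return outputs, all_weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = causal_self_attention(x)
print([len(w) for w in weights])  # [1, 2, 3]: each position sees itself and earlier ones
</code></pre>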
<p>Compared to GPT-1, several important changes were introduced.</p>
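<p>The model can handle longer context, with sequences of up to 1024 tokens (up from 512 in GPT-1), and uses a larger vocabulary of 50,257 tokens. Layer normalization is moved to the input of each sub-block, and an extra layer normalization is added after the final self-attention block. The model is also much deeper, with more layers and significantly more parameters.</p>
<p>The authors trained four versions of the model, with 117M, 345M, 762M, and 1542M (about 1.5 billion) parameters.</p>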
<p>The largest of these is what we now refer to as GPT-2, and it's the one responsible for most of the strong results reported in the paper.</p>
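<p>A back-of-the-envelope calculation shows why adding deeper, wider layers drives the parameter count up so quickly. The sketch assumes the standard GPT-2 layout: each block has about 12·d<sup>2</sup> weights (4·d<sup>2</sup> for the query, key, value, and output projections plus 8·d<sup>2</sup> for an MLP four times as wide), the output layer shares weights with the token embedding, and biases and layer-norm weights are ignored. The layer counts and widths are the four configurations in the paper's model-size table. The estimates come out a little above the paper's reported counts; OpenAI's later code release names the same checkpoints 124M, 355M, 774M, and 1558M.</p>
<pre><code class="language-python">def approx_params(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    # Attention (Q, K, V, output: 4 * d^2) plus MLP (d to 4d and back: 8 * d^2)
    per_block = 12 * d_model ** 2
    # Token and position embeddings; the output projection reuses the token
    # embedding matrix, so it adds no extra weights
    embeddings = (vocab_size + n_ctx) * d_model
    # Biases and layer-norm parameters are left out of this estimate
    return n_layer * per_block + embeddings


# (layers, d_model) for the four model sizes listed in the paper
for n_layer, d_model in [(12, 768), (24, 1024), (36, 1280), (48, 1600)]:
    millions = approx_params(n_layer, d_model) / 1e6
    print(f"{n_layer} layers, d_model={d_model}: ~{millions:.0f}M")
# roughly 124M, 355M, 773M, and 1557M
</code></pre>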
<p><strong>Transformer (decoder-only)</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/602d56bd-dbf1-4eec-b11d-6d82b3dcd04d.png" alt="Transformer (decoder-only)" style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Note:</strong> The original figure illustrates the complete Transformer architecture (Encoder–Decoder) from <em>Attention Is All You Need</em>. For clarity and relevance to GPT-style models, the image used here was cropped to focus only on the decoder side of the architecture, since GPT models are based on a decoder-only Transformer design.</p>
<p><strong>Reference:</strong> Brownlee, J. <a href="https://machinelearningmastery.com/encoders-and-decoders-in-transformer-models/">Encoders and Decoders in Transformer Models</a>. Machine Learning Mastery.</p>
<h2 id="heading-experiments">Experiments</h2>
<p>To evaluate the model, the authors tested it across a wide range of tasks – but with an important constraint: according to the paper, the model wasn't trained or fine-tuned on any of these tasks.</p>
<p>Instead, everything was evaluated in a zero-shot setting, where the model is simply given a prompt and asked to continue the text.</p>
<p>They applied this setup to different types of problems, including language modeling benchmarks, reading comprehension, translation, summarization, question answering, and commonsense reasoning.</p>
<p>The goal here was not just to measure performance, but to see how far a single model (trained only on raw text) could generalize across tasks without any additional training.</p>
<h2 id="heading-key-findings">Key Findings</h2>
<p>After evaluating the model across different tasks, the results were stronger than many would have expected.</p>
<p>According to the authors, GPT-2 achieves state-of-the-art results on 7 out of 8 language modeling benchmarks in a zero-shot setting.</p>
<p>One of the most important observations is that performance consistently improves as the model size increases, following a roughly log-linear trend.</p>
<p>In other words, scaling up the model leads to better results across tasks.</p>
<p>The paper also shows that larger models display more consistent multitask behavior.</p>
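<p>For example, GPT-2 performs well on tasks that require long-range understanding, such as LAMBADA, where it improves the state-of-the-art perplexity from 99.8 to 8.6. In reading comprehension, it reaches 55 F1 on CoQA, matching or exceeding 3 of the 4 baseline systems without using their 127,000+ training examples.</p>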
<p>It even demonstrates early capabilities in translation and can answer factual questions without being explicitly trained for those tasks.</p>
<p>In practice, the key takeaway is clear: increasing model size and data plays a major role in unlocking these capabilities.</p>
<h2 id="heading-task-specific">Task-Specific</h2>
<p>Looking more closely at individual tasks, the paper gives a clearer picture of where the model performs well and where it still struggles.</p>
<p>GPT-2 shows surprisingly strong results in reading comprehension, even without any task-specific training. But its performance on summarization is still limited.</p>
<p>Its summaries often look reasonable, but on ROUGE metrics they only begin to approach classic neural baselines and barely outperform simply picking 3 random sentences from the article.</p>
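<p>According to the paper, summaries are induced by appending the text "TL;DR:" after the article, generating 100 tokens with top-k random sampling (k = 2), and keeping the first three generated sentences. The sketch below shows only the prompt construction and a single top-k sampling step. The names <code>summarization_prompt</code> and <code>top_k_sample</code> and the example scores are hypothetical.</p>
<pre><code class="language-python">import math
import random


def summarization_prompt(article):
    # Append the "TL;DR:" cue so the model continues with a summary
    return article.rstrip() + "\nTL;DR:"


def top_k_sample(logits, k, rng):
    # logits: dict mapping each candidate token to an unnormalized score
    # Keep the k highest-scoring tokens, softmax over them, then sample one
    top = sorted(logits, key=logits.get, reverse=True)[:k]
    best = logits[top[0]]
    weights = [math.exp(logits[t] - best) for t in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
scores = {"The": 2.0, "A": 1.5, "Banana": -3.0}
print(summarization_prompt("Article text goes here."))
print(top_k_sample(scores, k=2, rng=rng))  # "The" or "A", never "Banana"
</code></pre>
<p>Restricting sampling to the top k candidates keeps some variety while cutting off low-probability tokens that would derail the output.</p>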
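<p>For translation, the model demonstrates some ability, but the results are far from competitive: it reaches 5 BLEU on WMT-14 English-to-French and 11.5 BLEU on French-to-English, well below the 33.5 BLEU of the best unsupervised machine translation approach at the time.</p>
<p>On the other hand, question answering improves noticeably as the model size increases. On Natural Questions, the largest model answers 4.1% of questions exactly right, about 5.3 times as many as the smallest model, which suggests that scale plays an important role in this capability.</p>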
<p>Overall, the model is far from perfect. But what stands out is that it's clearly beginning to learn general skills across tasks, even without being explicitly trained for them.</p>
<h2 id="heading-generalization-vs-memorization">Generalization vs Memorization</h2>
<p>A natural question that comes up is whether the model is actually learning useful patterns or simply memorizing the training data.</p>
<p>The authors address this directly. They analyze overlap between the training dataset and evaluation benchmarks using n-gram comparisons, looking for signs that the model might be copying rather than generalizing.</p>
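<p>According to the paper, the check used Bloom filters built from 8-grams of WebText training tokens, with text normalized to lowercase alphanumeric words separated by single spaces. The sketch below keeps the 8-gram idea but swaps the Bloom filter for an exact Python set, which works for small examples but would not scale to a 40 GB corpus. The function names and sample sentences are hypothetical.</p>
<pre><code class="language-python">import re


def normalize(text):
    # Lowercase and keep only alphanumeric words (approximating the paper's normalization)
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n=8):
    # All contiguous n-word windows, stored as tuples so they fit in a set
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(test_text, train_ngrams, n=8):
    # Share of the test text's n-grams that also occur in the training n-grams
    test_ngrams = ngrams(normalize(test_text), n)
    if not test_ngrams:
        return 0.0
    return len(test_ngrams.intersection(train_ngrams)) / len(test_ngrams)


train_ngrams = ngrams(normalize(
    "The quick brown fox jumps over the lazy dog near the river bank today."))
test = "A quick brown fox jumps over the lazy dog near the park."
print(overlap_fraction(test, train_ngrams))  # 0.6 (3 of the 5 test 8-grams were seen)
</code></pre>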
<p>According to the paper, while some overlap does exist (as is common in large datasets), it's not enough to explain the model’s performance.</p>
<p>They also observe that the model still underfits the data, meaning it hasn’t fully captured everything in the training set.</p>
<p>This is an important point: if the model were mainly memorizing, we would expect it to fit the training data much more closely.</p>
<p>In practice, this suggests that the improvements are coming from genuine learning rather than simple memorization, even though some overlap is unavoidable.</p>
<h2 id="heading-discussion">Discussion</h2>
<p>This section is where the authors step back and reflect on what these results actually mean.</p>
<p>According to the paper, language models trained on large and diverse datasets aren't just learning representations of text. They're beginning to learn how to perform tasks directly, even without supervision.</p>
<p>In other words, pre-training is doing more than providing useful features: it's capturing patterns that resemble real task behavior.</p>
<p>At the same time, the authors are careful not to overstate the results.</p>
<p>While the zero-shot capabilities are impressive, performance is still far from practical on many tasks.</p>
<p>Some outputs look convincing on the surface but lack accuracy when measured more carefully.</p>
<p>In practice, this section highlights both sides of the story. The approach is clearly promising, but it's still an early step toward more general systems.</p>
<h2 id="heading-limitations">Limitations</h2>
<p>Despite the progress shown in the paper, the approach still has several important limitations.</p>
<p>According to the authors, zero-shot performance, while impressive, is generally weaker than fully supervised models on many tasks.</p>
<p>The results also depend heavily on scale, both in terms of model size and the amount of data used. This means that smaller models don't show the same level of capability.</p>
<p>In addition, some tasks, such as summarization, remain relatively weak.</p>
<p>The model can produce outputs that look plausible, but they often lack accuracy or consistency when evaluated more carefully.</p>
<p>Another practical challenge is the cost. Training these models requires significant computational resources and large datasets, which makes this approach difficult to reproduce or scale for many researchers.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The paper ends with a simple but powerful idea.</p>
<p>According to the authors, when a language model is trained on a sufficiently large and diverse dataset – and with enough capacity – it begins to generalize across tasks and perform them without explicit training.</p>
<p>This suggests that the model isn't just learning language, but also the structure of the tasks embedded within it.</p>
<p>In practice, this points to a different way of thinking about AI systems. Instead of designing and training a model for each specific task, we can focus on training a single model on large-scale language data&nbsp;– and allow useful capabilities to emerge naturally from that process.</p>
<h2 id="heading-final-insight">Final Insight</h2>
<p>If GPT-1 introduced the idea of combining pre-training with fine-tuning, GPT-2 takes that idea a step further.</p>
<p>According to the paper, pre-training alone, when done at a large enough scale, can already produce models that begin to perform a wide range of tasks without any additional training.</p>
<p>This is a subtle but important shift, because it suggests that general capabilities can emerge directly from exposure to large amounts of text.</p>
<p>In my view, this is the point where things start to change direction.</p>
<p>The focus moves away from designing task-specific systems and toward building more general models that can adapt on their own.</p>
<p>This idea directly sets the stage for what comes next: models like GPT-3, ChatGPT, and modern large language systems that build on this same principle.</p>
<h2 id="heading-gpt-1-vs-gpt-2-key-differences"><strong>GPT-1 vs GPT-2 — Key Differences</strong></h2>
<table style="min-width:75px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>GPT-1</strong></p></td><td><p><strong>GPT-2</strong></p></td></tr><tr><td><p><strong>Core Idea</strong></p></td><td><p>Pre-training + fine-tuning</p></td><td><p>Pre-training alone (zero-shot)</p></td></tr><tr><td><p><strong>Training Approach</strong></p></td><td><p>Two stages: learn language, then adapt to tasks</p></td><td><p>Single stage: learn language and infer tasks</p></td></tr><tr><td><p><strong>Supervision</strong></p></td><td><p>Requires labeled data for fine-tuning</p></td><td><p>No labeled data needed for tasks</p></td></tr><tr><td><p><strong>Task Handling</strong></p></td><td><p>Tasks require separate fine-tuning</p></td><td><p>Tasks handled via prompts (zero-shot)</p></td></tr><tr><td><p><strong>Generalization</strong></p></td><td><p>Limited, depends on fine-tuning</p></td><td><p>Stronger generalization across tasks</p></td></tr><tr><td><p><strong>Model Role</strong></p></td><td><p>Learns language, then adapts</p></td><td><p>Learns language and tasks together</p></td></tr><tr><td><p><strong>Architecture</strong></p></td><td><p>Transformer (decoder-only)</p></td><td><p>Transformer (decoder-only, scaled up)</p></td></tr><tr><td><p><strong>Model Size</strong></p></td><td><p>Smaller (~117M parameters)</p></td><td><p>Much larger (up to 1.5B parameters)</p></td></tr><tr><td><p><strong>Context Length</strong></p></td><td><p>Shorter context (512 tokens)</p></td><td><p>Longer context (up to 1024 tokens)</p></td></tr><tr><td><p><strong>Dataset</strong></p></td><td><p>BooksCorpus for pre-training, plus labeled task datasets for fine-tuning</p></td><td><p>WebText (large, diverse internet data)</p></td></tr><tr><td><p><strong>Key Capability</strong></p></td><td><p>Transfer learning</p></td><td><p>Zero-shot learning</p></td></tr><tr><td><p><strong>Performance Style</strong></p></td><td><p>Strong after fine-tuning</p></td><td><p>Competitive on some tasks without any task training</p></td></tr><tr><td><p><strong>Limitations</strong></p></td><td><p>Depends on labeled data</p></td><td><p>Depends heavily on scale (data + compute)</p></td></tr><tr><td><p><strong>Main Contribution</strong></p></td><td><p>Introduced generative pre-training + fine-tuning for Transformers</p></td><td><p>Showed emergence of multitask behavior</p></td></tr><tr><td><p><strong>Impact</strong></p></td><td><p>Foundation of modern NLP pipelines</p></td><td><p>Shift toward general-purpose models</p></td></tr></tbody></table>

<h2 id="heading-resources">Resources:</h2>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD/Pytorch-Collections/tree/main/GPT">PyTorch Projects for GPT series</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1706.03762">Attention Is All You Need</a></p>
</li>
<li><p><a href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">Improving Language Understanding by Generative Pre-Training</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1810.04805">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a></p>
</li>
<li><p><a href="https://papers.nips.cc/paper_files/paper/2015/file/7137debd45ae4d0ab9aa953017286b20-Paper.pdf">Semi-supervised Sequence Learning</a></p>
</li>
<li><p><a href="https://aclanthology.org/P18-1031.pdf">Universal Language Model Fine-tuning for Text Classification</a></p>
</li>
<li><p><a href="https://aclanthology.org/N18-1202.pdf">Deep Contextualized Word Representations</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1508.07909">Neural Machine Translation of Rare Words with Subword Units</a></p>
</li>
<li><p><a href="https://papers.nips.cc/paper_files/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf">Distributed Representations of Words and Phrases and Their Compositionality</a></p>
</li>
<li><p><a href="https://aclanthology.org/D14-1162.pdf">GloVe: Global Vectors for Word Representation</a></p>
</li>
</ul>
<h3 id="heading-contact-me"><strong>Contact Me</strong></h3>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD"><strong>GitHub</strong></a></p>
</li>
<li><p><a href="https://x.com/programmingoce"><strong>X</strong></a></p>
</li>
<li><p><a href="https://www.linkedin.com/in/mohammed-abrah-6435a63ba/"><strong>LinkedIn</strong></a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ AI Paper Review: Improving Language Understanding by Generative Pre-Training (GPT-1)
 ]]>
                </title>
                <description>
                    <![CDATA[ We use AI tools all the time, whether it’s asking questions, generating images, or getting help with everyday tasks. But most of these tools didn’t appear out of nowhere. They were developed based on  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/ai-paper-review-improving-language-understanding-by-generative-pre-training-gpt-1/</link>
                <guid isPermaLink="false">69fb84ad50ecad45335e5367</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ academic writing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ transformers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mohammed Fahd Abrah ]]>
                </dc:creator>
                <pubDate>Wed, 06 May 2026 18:13:01 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/0998e844-4017-49b9-a68d-2d6c73fceb78.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>We use AI tools all the time, whether it’s asking questions, generating images, or getting help with everyday tasks. But most of these tools didn’t appear out of nowhere. They grew out of research papers where the original ideas were first proposed and tested.</p>
<p>Now, not everyone enjoys reading research papers or has the time to comb through and digest all that (sometimes very dense) info. So I decided to do the hard work for you and share the key insights in a series of AI paper reviews.</p>
<p>The goal isn’t to turn this into a heavy academic discussion, but to explain the main ideas in a clear and practical way. You'll learn what problem the paper was trying to solve, what approach it introduced, and why it mattered.</p>
<p>In each article, you’ll get a simple breakdown of the paper, how it works, and what you should take away from it. By the end, you should understand the key idea without needing to go through the full research paper yourself.</p>
<h2 id="heading-paper-overview">Paper Overview</h2>
<p>The first paper I'll be reviewing is "Improving Language Understanding by Generative Pre-Training", by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever.</p>
<p>Here's the actual paper if you want to read it yourself: <a href="https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf">Read the paper</a>.</p>
<p>And here's a little infographic of what we'll cover here:</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/0466e09f-c2a3-41fa-939d-f67d53f900e1.png" alt="0466e09f-c2a3-41fa-939d-f67d53f900e1" style="display:block;margin:0 auto" width="1414" height="2000" loading="lazy">

<h3 id="heading-table-of-contents">Table of Contents</h3>
<ul>
<li><p><a href="#heading-executive-summary">Executive Summary</a></p>
</li>
<li><p><a href="#heading-goals-of-the-paper">Goals of the Paper</a></p>
</li>
<li><p><a href="#heading-methodology">Methodology</a></p>
</li>
<li><p><a href="#heading-transformer-vs-bert-vs-gpt">Transformer vs. BERT vs. GPT</a></p>
</li>
<li><p><a href="#heading-model-architecture">Model Architecture</a></p>
</li>
<li><p><a href="#heading-key-techniques">Key Techniques</a></p>
</li>
<li><p><a href="#heading-key-findings">Key Findings</a></p>
</li>
<li><p><a href="#heading-conclusions">Conclusions</a></p>
</li>
<li><p><a href="#heading-limitations">Limitations</a></p>
</li>
<li><p><a href="#heading-related-work-amp-context">Related Work &amp; Context</a></p>
</li>
<li><p><a href="#heading-final-insight">Final Insight</a></p>
</li>
<li><p><a href="#heading-resources">Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this breakdown, it helps to be familiar with a few basic ideas:</p>
<ul>
<li><p>A general understanding of natural language processing (NLP) and how machines work with text</p>
</li>
<li><p>A high-level idea of what a Transformer model is (you don’t need deep details, just the concept)</p>
</li>
<li><p>The difference between supervised and unsupervised learning</p>
</li>
<li><p>Basic machine learning concepts like training data and models</p>
</li>
</ul>
<p>If you’re not fully comfortable with all of these, that’s okay: you can still follow along. The goal here is to keep things clear and intuitive.</p>
<h2 id="heading-executive-summary">Executive Summary</h2>
<p>Before models like GPT became what we know today, there was a key limitation: AI systems were good at specific tasks, but struggled with general understanding.</p>
<p>In this paper, the authors introduce a simple but powerful idea. Instead of training a model separately for each task, they first train it on a large amount of unlabeled text to learn the structure of language. Then, they adapt it to specific tasks using smaller labeled datasets.</p>
<p>According to the authors, this two-step approach (pre-training followed by fine-tuning) allows a single model to handle many different tasks with minimal changes.</p>
<p>In practice, this marked a major shift: rather than building a new model for every problem, we can train one general model that learns language itself and then reuse it across tasks.</p>
<h2 id="heading-goals-of-the-paper">Goals of the Paper</h2>
<p>To understand the motivation behind this work, it helps to look at the main limitations in NLP at the time.</p>
<p>Most models depended heavily on large labeled datasets, which weren’t always available. Many tasks simply didn’t have enough labeled data to train effective systems. On top of that, existing models were usually designed for a single task, making them hard to reuse or adapt.</p>
<p>Because of this, the authors aimed to reduce the reliance on labeled data and move toward a more general approach. Their goal was to build a language model that could learn from large amounts of raw text and then be applied across different tasks.</p>
<p>According to the paper, they also wanted to enable transfer learning, meaning the ability to take knowledge learned from one task and apply it to others. The aim was to improve performance without having to redesign the model for each new task.</p>
<h2 id="heading-methodology">Methodology</h2>
<p>To understand how the authors approached this problem, let’s look at the core idea behind their method.</p>
<h3 id="heading-pre-training">Pre-Training</h3>
<p>At the heart of the paper is a simple but powerful approach built in two stages. The first stage is pre-training, where the model learns directly from raw text.</p>
<p>According to the authors, the model is trained on a large corpus of unlabeled text using a standard language modeling objective: maximizing the likelihood of each token given the <em>k</em> tokens that come before it (the context window). Predicting one word at a time from its preceding context keeps the problem manageable, because the model never has to estimate a <a href="https://en.wikipedia.org/wiki/High-dimensional_statistics">high-dimensional</a> probability over entire sequences directly. Through this process, the model gradually learns important aspects of language, such as grammar, context, structure, and general patterns.</p>
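<p>To make this objective concrete, here’s a minimal sketch in plain Python. It scores a token sequence with the same quantity the paper maximizes, the sum of log P(u<sub>i</sub> | u<sub>i-k</sub>, …, u<sub>i-1</sub>) over every position i. The “model” here is a hypothetical add-one-smoothed count table standing in for the Transformer, and the toy corpus is invented for illustration.</p>
<pre><code class="lang-python">import math
from collections import Counter

def train_counts(corpus, k):
    """Count (context, next_token) pairs using a window of k previous tokens."""
    pair_counts = Counter()
    context_counts = Counter()
    for i in range(len(corpus)):
        context = tuple(corpus[max(0, i - k):i])
        pair_counts[(context, corpus[i])] += 1
        context_counts[context] += 1
    return pair_counts, context_counts

def log_likelihood(tokens, pair_counts, context_counts, vocab_size, k):
    """L1 = sum_i log P(u_i | u_{i-k}, ..., u_{i-1}), estimated with add-one smoothing."""
    total = 0.0
    for i in range(len(tokens)):
        context = tuple(tokens[max(0, i - k):i])
        numerator = pair_counts[(context, tokens[i])] + 1
        denominator = context_counts[context] + vocab_size
        total += math.log(numerator / denominator)
    return total

corpus = "the cat sat on the mat the cat sat on the rug".split()
vocab_size = len(set(corpus))
k = 2  # toy context window; the real model conditions on far more tokens
pairs, contexts = train_counts(corpus, k)

seen = log_likelihood("the cat sat on the mat".split(), pairs, contexts, vocab_size, k)
unseen = log_likelihood("mat the on sat cat the".split(), pairs, contexts, vocab_size, k)
print(round(seen, 3), round(unseen, 3))  # prints: -6.834 -11.059
</code></pre>
<p>The fluent sentence gets a higher log-likelihood than the scrambled one, which is exactly what training pushes the model toward. In GPT-1, the probabilities come from a Transformer with a softmax over the vocabulary rather than from counts, but the quantity being maximized has the same form.</p>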
<p>For this stage, the paper uses the BooksCorpus dataset, which contains over 7,000 unpublished books. Books contain long stretches of continuous text, which helps the model learn relationships across sentences rather than just within short fragments.</p>
<h3 id="heading-fine-tuning-adapting-to-tasks">Fine-Tuning (Adapting to Tasks)</h3>
<p>Once the model has learned general language patterns, the next step is fine-tuning, where it is adapted to specific tasks using labeled data.</p>
<p>According to the authors, this includes tasks like question answering, text classification, natural language inference, and semantic similarity. Instead of building a new model for each task, the same pre-trained model is reused with only small adjustments.</p>
<p>In practice, this is what makes the approach powerful: the model already understands language at a general level, so it can quickly adapt to different tasks without needing to be redesigned from scratch.</p>
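<p>In the paper, the only major new component added for fine-tuning is a linear output layer (parameters W<sub>y</sub>). It takes the final Transformer block’s activation at the last input token and turns it into label probabilities with a softmax. The sketch below shows only that added head in plain Python. The hidden state and weights are made-up numbers standing in for real model outputs, and the two-label setup is an assumption for illustration.</p>
<pre><code class="lang-python">import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(final_hidden, W_y):
    """P(y | x) = softmax(h W_y): one linear layer on the last token's hidden state."""
    num_labels = len(W_y[0])
    logits = [
        sum(final_hidden[d] * W_y[d][j] for d in range(len(final_hidden)))
        for j in range(num_labels)
    ]
    return softmax(logits)

# Hypothetical 4-dim hidden state for the final token (GPT-1's real states are 768-dim)
h_last = [0.5, -1.0, 0.25, 2.0]
# Hypothetical weights for 2 labels (for example negative / positive): shape 4 x 2
W_y = [[0.1, -0.2], [0.3, 0.1], [-0.5, 0.4], [0.2, 0.6]]

probs = classify(h_last, W_y)
label = max(range(len(probs)), key=lambda j: probs[j])
loss = -math.log(probs[1])  # supervised loss for an example whose true label is 1
print([round(p, 3) for p in probs], label, round(loss, 3))
</code></pre>
<p>During fine-tuning, this loss is backpropagated through both W<sub>y</sub> and the pre-trained Transformer, which is why the model adapts quickly instead of learning language from scratch.</p>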
<h2 id="heading-transformer-vs-bert-vs-gpt">Transformer vs. BERT vs. GPT</h2>
<p>Before diving into GPT-1, it helps to understand how modern language models are structured. Most of them are based on the Transformer architecture, but they use it in different ways: encoder-only models (like BERT), decoder-only models (like GPT), or full encoder–decoder models.</p>
<p>The original encoder–decoder Transformer was mainly used for tasks like machine translation. Encoder-only models are typically used for understanding tasks such as text classification and sentiment analysis, while decoder-only models are designed for generation tasks like text creation, powering systems such as ChatGPT, Gemini, and Claude.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/e7348479-5fa0-4adf-92e1-644ae2039b03.png" alt="e7348479-5fa0-4adf-92e1-644ae2039b03" style="display:block;margin:0 auto" width="700" height="449" loading="lazy">

<p><em>Illustration comparing Transformer, GPT, and BERT architectures, adapted from</em> <a href="https://automotivevisions.wordpress.com/2025/03/21/comparing-large-language-models-gpt-vs-bert-vs-t5/">Comparing Large Language Models: GPT vs. BERT vs. T5</a> <em>showing encoder-decoder, decoder-only, and encoder-only designs</em></p>
<h3 id="heading-transformer-vs-bert-vs-gpt-key-differences">Transformer vs BERT vs GPT: Key Differences</h3>
<table style="min-width:100px"><colgroup><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"><col style="min-width:25px"></colgroup><tbody><tr><td><p><strong>Aspect</strong></p></td><td><p><strong>Transformer (Original)</strong></p></td><td><p><strong>BERT</strong></p></td><td><p><strong>GPT</strong></p></td></tr><tr><td><p><strong>Paper</strong></p></td><td><p>Attention Is All You Need (2017)</p></td><td><p>BERT (2018)</p></td><td><p>GPT (2018–2019)</p></td></tr><tr><td><p><strong>Architecture Type</strong></p></td><td><p>Encoder + Decoder</p></td><td><p>Encoder-only</p></td><td><p>Decoder-only</p></td></tr><tr><td><p><strong>Primary Goal</strong></p></td><td><p>Sequence-to-sequence tasks (for example, translation)</p></td><td><p>Language understanding</p></td><td><p>Language generation</p></td></tr><tr><td><p><strong>Training Objective</strong></p></td><td><p>Predict next token (seq2seq setup)</p></td><td><p>Masked language modeling (fill in blanks)</p></td><td><p>Predict next token (autoregressive)</p></td></tr><tr><td><p><strong>Directionality</strong></p></td><td><p>Bidirectional (encoder) + left-to-right (decoder)</p></td><td><p>Fully bidirectional</p></td><td><p>Left-to-right only</p></td></tr><tr><td><p><strong>Context Understanding</strong></p></td><td><p>Strong (via attention)</p></td><td><p>Very strong (full bidirectional context)</p></td><td><p>Strong (but only past context)</p></td></tr><tr><td><p><strong>Input/Output Style</strong></p></td><td><p>Input → Output sequence</p></td><td><p>Input → Representation</p></td><td><p>Input → Generated text</p></td></tr><tr><td><p><strong>Fine-tuning</strong></p></td><td><p>Required for each task</p></td><td><p>Required for each task</p></td><td><p>Optional (GPT-2+ supports zero-shot)</p></td></tr><tr><td><p><strong>Typical Tasks</strong></p></td><td><p>Translation, summarization</p></td><td><p>Classification, QA, NLI</p></td><td><p>Text generation, QA, 
chat</p></td></tr><tr><td><p><strong>Strength</strong></p></td><td><p>Flexible architecture foundation</p></td><td><p>Deep understanding of text</p></td><td><p>General-purpose generation</p></td></tr><tr><td><p><strong>Limitation</strong></p></td><td><p>Not directly usable without adaptation</p></td><td><p>Cannot generate text naturally</p></td><td><p>Limited bidirectional context</p></td></tr><tr><td><p><strong>Key Innovation</strong></p></td><td><p>Self-attention mechanism</p></td><td><p>Deep bidirectional encoding</p></td><td><p>Scaled generative pre-training</p></td></tr><tr><td><p><strong>Evolution Role</strong></p></td><td><p>Foundation of all modern LLMs</p></td><td><p>Specialized understanding models</p></td><td><p>Path to general-purpose AI</p></td></tr></tbody></table>
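<p>The “Training Objective” and “Directionality” rows are easiest to see by building training examples from one sentence. The sketch below assumes whitespace tokenization and simplifies BERT’s masking: real BERT masks about 15% of tokens at random and sometimes substitutes a random or unchanged token. Here, a single position is chosen by hand.</p>
<pre><code class="lang-python">def next_token_examples(tokens):
    """GPT-style (autoregressive): each prefix predicts the token right after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_example(tokens, positions):
    """BERT-style (simplified): hide tokens at `positions` and predict them
    from the remaining context on BOTH sides of each gap."""
    inputs = ["[MASK]" if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return inputs, targets

sentence = "the cat sat on the mat".split()

for context, target in next_token_examples(sentence):
    print(" ".join(context), "->", target)

inputs, targets = masked_lm_example(sentence, positions={2})
print(" ".join(inputs), "->", targets)

# Output:
# the -> cat
# the cat -> sat
# the cat sat -> on
# the cat sat on -> the
# the cat sat on the -> mat
# the cat [MASK] on the mat -> {2: 'sat'}
</code></pre>
<p>GPT only ever sees the words to the left of its target, which is what lets it generate text one token at a time. BERT sees both sides of the gap, which gives it richer context for understanding tasks but no natural way to generate.</p>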

<h2 id="heading-model-architecture">Model Architecture</h2>
<p>To support this pre-training and fine-tuning approach, the GPT-1 model is built on a Transformer (decoder) architecture.</p>
<p>According to the authors, this choice is important for a few reasons. Unlike older models such as LSTMs, Transformers handle long-range dependencies more effectively, meaning they can better understand relationships between words that are far apart in a sentence.</p>
<p>They also rely on self-attention, a mechanism that allows the model to focus on the most relevant parts of the text when processing each word. This helps the model capture context more accurately.</p>
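<p>GPT-1’s decoder blocks use <em>masked</em> (causal) self-attention, so each position can attend only to itself and to earlier positions. Here’s a minimal single-head sketch in plain Python. For simplicity, it uses the input vectors directly as queries, keys, and values and leaves out the learned projection matrices, the multiple heads (GPT-1 has 12 per block), residual connections, and layer normalization that a real block includes. The input vectors are hypothetical.</p>
<pre><code class="lang-python">import math

def causal_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    Queries, keys, and values are the input vectors themselves here;
    a real Transformer first applies learned projection matrices.
    """
    n, d = len(x), len(x[0])
    outputs, all_weights = [], []
    for i in range(n):
        # Causal mask: position i only scores positions 0..i
        scores = [
            sum(x[i][c] * x[j][c] for c in range(d)) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Softmax over the visible positions gives the attention weights
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted sum of the visible value vectors
        outputs.append([sum(weights[j] * x[j][c] for j in range(i + 1)) for c in range(d)])
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dim token vectors
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = causal_self_attention(x)
for i, w in enumerate(weights):
    print(i, [round(v, 3) for v in w])
</code></pre>
<p>The first token can only attend to itself, so its single weight is 1.0. Later tokens spread their attention over everything before them, weighting the most similar vectors most heavily.</p>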
<p>Another key advantage is that Transformers make transfer learning more effective, since the same learned representations can be reused across different tasks with minimal changes.</p>
<p>The paper’s ablation study supports this. Replacing the Transformer with a single-layer, 2048-unit LSTM lowered the average score across tasks by 5.6 points.</p>
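<p>The paper describes GPT-1 as a 12-layer decoder-only Transformer with 768-dimensional states, 12 attention heads, 3,072-dimensional feed-forward layers, and learned position embeddings over sequences of 512 tokens. The back-of-the-envelope sketch below turns those numbers into a parameter count. The vocabulary size of 40,478 is an assumption taken from Hugging Face’s <code>openai-gpt</code> configuration (the paper describes a byte-pair encoding with 40,000 merges). The count also assumes the output layer reuses the token embedding matrix, as the paper’s formula does, and it leaves out task-specific heads.</p>
<pre><code class="lang-python">from dataclasses import dataclass

@dataclass
class GPT1Config:
    n_layer: int = 12        # Transformer decoder blocks
    n_embd: int = 768        # hidden state size
    n_head: int = 12         # attention heads per block
    d_ff: int = 3072         # inner size of the position-wise feed-forward network
    n_ctx: int = 512         # maximum context length in tokens
    vocab_size: int = 40478  # assumption: taken from Hugging Face's "openai-gpt" config

def count_parameters(cfg):
    """Rough parameter count; the output layer reuses the token embeddings (tied weights)."""
    d = cfg.n_embd
    embeddings = cfg.vocab_size * d + cfg.n_ctx * d   # token + learned position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # fused Q/K/V projection + output projection
    feed_forward = (d * cfg.d_ff + cfg.d_ff) + (cfg.d_ff * d + d)
    layer_norms = 2 * (2 * d)                        # two LayerNorms, each with gain and bias
    per_block = attention + feed_forward + layer_norms
    return embeddings + cfg.n_layer * per_block

cfg = GPT1Config()
print(f"{count_parameters(cfg):,}")  # prints 116,534,784 (roughly the commonly cited 117M)
</code></pre>
<p>More than a quarter of the parameters sit in the embedding table. The rest are spread evenly across the 12 identical blocks, so depth is the main lever for scaling the model up.</p>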
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/59df10f6-d843-4db7-9def-e302594d0b7e.png" alt="59df10f6-d843-4db7-9def-e302594d0b7e" style="display:block;margin:0 auto" width="1793" height="831" loading="lazy">

<p><em>Figure 1 from</em> “Improving Language Understanding by Generative Pre-Training” <em>(Radford et al., 2018), showing the Transformer architecture and task-specific input transformations.</em></p>
<h2 id="heading-key-techniques">Key Techniques</h2>
<p>Along with the main approach, the authors introduce a few practical techniques that make the model more flexible across tasks.</p>
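<p>According to the paper, structured inputs (such as a premise and a hypothesis, or a document, a question, and candidate answers) are converted into a single ordered sequence of tokens, with special start, delimiter, and extract tokens marking the boundaries. Because every task is reduced to the same kind of input, the same model can be reused across problems without redesigning it each time.</p>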
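<p>Here’s a minimal sketch of those input transformations in Python, assuming whitespace tokenization. The strings <code>[START]</code>, <code>[DELIM]</code>, and <code>[EXTRACT]</code> are hypothetical names for the randomly initialized special tokens the paper adds. The per-task layouts follow the paper’s description: similarity tasks use both orderings of the pair, and multiple-choice tasks build one sequence per candidate answer.</p>
<pre><code class="lang-python"># Hypothetical names for GPT-1's special tokens (the paper adds randomly
# initialized start, delimiter, and extract tokens; these strings are illustrative)
START, DELIM, EXTRACT = "[START]", "[DELIM]", "[EXTRACT]"

def classification(text):
    return [[START, *text.split(), EXTRACT]]

def entailment(premise, hypothesis):
    return [[START, *premise.split(), DELIM, *hypothesis.split(), EXTRACT]]

def similarity(text_a, text_b):
    # No natural order between the two texts, so both orderings are built;
    # the model's representations of the two sequences are then added together
    return [
        [START, *text_a.split(), DELIM, *text_b.split(), EXTRACT],
        [START, *text_b.split(), DELIM, *text_a.split(), EXTRACT],
    ]

def multiple_choice(context, answers):
    # context = document + question; one sequence per candidate answer,
    # and the per-answer scores are normalized with a softmax
    return [[START, *context.split(), DELIM, *answer.split(), EXTRACT] for answer in answers]

for seq in entailment("a man is sleeping", "a person rests"):
    print(" ".join(seq))
# [START] a man is sleeping [DELIM] a person rests [EXTRACT]
</code></pre>
<p>The pre-trained Transformer itself never changes shape. Only the token sequence it reads differs from task to task, which is what keeps the architectural changes so small.</p>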
<p>Another important point is that the model requires only minimal architectural changes when switching between tasks. Most of the knowledge learned during pre-training is reused as-is.</p>
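<p>The authors also include an auxiliary language modeling objective during fine-tuning, which helps the model retain its general understanding of language while adapting to specific tasks. According to the paper, this improves the generalization of the fine-tuned model and speeds up convergence.</p>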
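<p>The paper writes the combined fine-tuning objective as L<sub>3</sub> = L<sub>2</sub> + λ·L<sub>1</sub>. Here, L<sub>2</sub> is the supervised log-likelihood of the labels, L<sub>1</sub> is the language modeling objective computed on the task’s own text, and λ is set to 0.5. Maximizing log-likelihoods is equivalent to minimizing negative log-likelihoods, so the sketch below uses the loss form. The probabilities are made-up stand-ins for real model outputs.</p>
<pre><code class="lang-python">import math

AUX_WEIGHT = 0.5  # the lambda the paper reports using during fine-tuning

def negative_log_likelihood(probs_of_correct):
    """Sum of -log p over the probabilities the model gave the correct answers."""
    return -sum(math.log(p) for p in probs_of_correct)

def fine_tuning_loss(label_prob, next_token_probs, aux_weight=AUX_WEIGHT):
    """Loss form of L3 = L2 + lambda * L1: task loss plus weighted language-modeling loss."""
    task_loss = negative_log_likelihood([label_prob])      # corresponds to -L2
    lm_loss = negative_log_likelihood(next_token_probs)    # corresponds to -L1 on the same text
    return task_loss + aux_weight * lm_loss

# Hypothetical model outputs for one labeled example
label_prob = 0.8                # probability assigned to the correct label
next_token_probs = [0.5, 0.25]  # probabilities assigned to each actual next token
print(round(fine_tuning_loss(label_prob, next_token_probs), 4))  # prints 1.2629
</code></pre>
<p>Setting <code>aux_weight=0</code> recovers plain supervised fine-tuning. Keeping the language modeling term nudges the model to stay a good language model while it learns the task.</p>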
<h2 id="heading-key-findings">Key Findings</h2>
<p>After training and evaluation, the results weren't just strong – they were surprisingly competitive.</p>
<p>According to the authors, the model improved on the state of the art in 9 of the 12 tasks they studied. This included absolute gains of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).</p>
<p>Another important observation is that the model performed well across datasets of different sizes, although performance was weaker on some smaller datasets.</p>
<p>This suggests that the pre-training step helped it generalize better, even when labeled data was limited.</p>
<p>In practice, what makes these results significant is that a single model was able to compete with specialized systems that were specifically designed for each individual task.</p>
<img src="https://cdn.hashnode.com/uploads/covers/69ce92860ff860b6de01ed93/14e5a9dd-9919-4b2a-ad42-6b011770b7fe.png" alt="14e5a9dd-9919-4b2a-ad42-6b011770b7fe" style="display:block;margin:0 auto" width="1866" height="815" loading="lazy">

<p><em>Figure 2 from</em> “Improving Language Understanding by Generative Pre-Training” <em>(Radford et al., 2018), illustrating performance gains from layer transfer and zero-shot learning behavior.</em></p>
<h2 id="heading-conclusions">Conclusions</h2>
<p>To wrap things up, this paper introduced a major shift in how AI systems are built.</p>
<p>According to the authors, instead of training a new model from scratch for every task, we can first teach a model the structure of language through pre-training, and then adapt it to specific tasks through fine-tuning. This simple idea turns out to be highly effective.</p>
<p>The key takeaway is that language models can develop a general understanding of text, especially when combined with Transformer architectures and large-scale data. This makes transfer learning practical across many different tasks.</p>
<p>In my view, this is what makes the paper so impactful. It doesn’t just improve performance on a few benchmarks. It changes the overall approach to building AI systems.</p>
<p>This idea later became the foundation for models like GPT-2, GPT-3, and ChatGPT, and continues to shape modern large language models today.</p>
<h2 id="heading-limitations">Limitations</h2>
<p>Like any approach, this method comes with its own limitations.</p>
<p>According to the paper, one of the main challenges is the need for large amounts of unlabeled data during the pre-training stage, which may not always be easy to get. The model’s performance also depends heavily on how well the fine-tuning step is done.</p>
<p>The authors also note that multi-task learning was not fully explored in this work, leaving some open questions about how well the model can handle multiple tasks at the same time.</p>
<p>In practice, another limitation is that performance can be weaker when working with very small datasets, especially if the fine-tuning process is not carefully handled.</p>
<h2 id="heading-related-work-amp-context">Related Work &amp; Context</h2>
<p>To better understand where this paper fits, it helps to look at the ideas it builds on.</p>
<p>According to the authors, earlier approaches such as word embeddings (like Word2Vec and GloVe), LSTM-based language models, and semi-supervised learning had already made progress in understanding language. But these methods were often limited to learning representations at the word level or required more task-specific design.</p>
<p>What this paper does differently is move beyond that. Instead of focusing only on individual words, it learns broader language representations that capture context and meaning across entire sequences. This shift is what enables the model to generalize better across different tasks.</p>
<h2 id="heading-final-insight">Final Insight</h2>
<p>If there’s one idea to take away from this paper, it’s this: you don’t need to teach an AI system every task separately.</p>
<p>According to the authors, once a model learns the structure of language, it can adapt to a wide range of tasks with minimal changes. That shift – from task-specific models to general language understanding – is what makes this work so important.</p>
<p>In my view, this is the moment where things really changed. What started here with GPT-1 became the foundation for the systems we use today, including ChatGPT and other modern language models.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD/Pytorch-Collections/tree/main/GPT">Pytorch Projects for GPT series</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1301.3781">Word2Vec (Mikolov et al., 2013)</a></p>
</li>
<li><p><a href="https://aclanthology.org/D14-1162.pdf">GloVe (Pennington et al., 2014)</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1706.03762">Attention Is All You Need (Vaswani et al., 2017)</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1511.01432">Semi-supervised Sequence Learning (Dai and Le, 2015)</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1801.06146">Universal Language Model Fine-tuning for Text Classification (Howard and Ruder, 2018)</a></p>
</li>
<li><p><a href="https://aclanthology.org/N18-1202.pdf">Deep Contextualized Word Representations (Peters et al., 2018)</a></p>
</li>
<li><p><a href="https://aclanthology.org/P17-1194.pdf">Semi-supervised Multitask Learning for Sequence Labeling (Rei, 2017)</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1506.06726">Skip-Thought Vectors (Kiros et al., 2015)</a></p>
</li>
<li><p><a href="https://arxiv.org/pdf/1705.02364">Supervised Learning of Universal Sentence Representations (Conneau et al., 2017)</a></p>
</li>
</ul>
<h3 id="heading-contact-me">Contact Me</h3>
<ul>
<li><p><a href="https://github.com/MOHAMMEDFAHD"><strong>Github</strong></a></p>
</li>
<li><p><a href="https://x.com/programmingoce"><strong>X</strong></a></p>
</li>
<li><p><a href="https://www.linkedin.com/in/mohammed-abrah-6435a63ba/"><strong>Linkedin</strong></a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use NLP Techniques and Tools in Your Projects [Full Handbook] ]]>
                </title>
                <description>
                    <![CDATA[ Nowadays, computers can comprehend and produce human-like language thanks to Natural Language Processing. And this opens up numerous opportunities for you as a developer. This guide will teach you how to create NLP projects from scratch. It includes ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-nlp-techniques-and-tools-in-your-projects-full-handbook/</link>
                <guid isPermaLink="false">692096d4afb994c2aecc26e9</guid>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ handbook ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Oleh Romanyuk ]]>
                </dc:creator>
                <pubDate>Fri, 21 Nov 2025 16:44:04 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763743424066/393a4384-ce7a-4ff8-9e98-1edaaa322bc6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Nowadays, computers can comprehend and produce human-like language thanks to Natural Language Processing. And this opens up numerous opportunities for you as a developer.</p>
<p>This guide will teach you how to create NLP projects from scratch. It includes details on how to organize your workflow, utilize the appropriate tools, and perform typical NLP tasks.</p>
<p>After reading this article, you will understand how to:</p>
<ul>
<li><p>Configure your environment for NLP development.</p>
</li>
<li><p>Select the appropriate frameworks and libraries for your project.</p>
</li>
<li><p>Execute fundamental NLP tasks such as sentiment analysis and text classification.</p>
</li>
<li><p>Create and implement a functional NLP application.</p>
</li>
<li><p>Diagnose and fix common problems in NLP projects.</p>
</li>
</ul>
<p>Before you begin, you should have a few basics in place: a solid understanding of Python programming, familiarity with general machine learning concepts, and a working knowledge of algorithms and data structures. Your system should also have Python 3.8 or higher installed so you can run the example snippets.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-natural-language-processing">What is Natural Language Processing?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-nlp-systems-interpret-speech">How NLP Systems Interpret Speech</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-typical-nlp-tasks">Typical NLP tasks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conventional-machine-learning-methods-for-nlp">Conventional Machine Learning Methods for NLP</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-nlp-in-various-industries">How to Use NLP in Various Industries</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-choose-the-most-effective-nlp-tools-and-libraries">How to Choose the Most Effective NLP Tools and Libraries</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-prepare-and-train-nlp-systems">How to Prepare and Train NLP systems</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-establishing-and-labeling-datasets">Establishing and Labeling Datasets</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-natural-language-processing">What is Natural Language Processing?</h2>
<p>NLP (natural language processing) is a set of methodologies that allow computers to learn to comprehend human language and produce relevant outputs. </p>
<p>NLP manages the intricacy of human communication. Whereas many conventional machine learning applications work with structured, tabular data, NLP handles unstructured text.</p>
<p>Specifically, to comprehend language more accurately, NLP systems simultaneously analyze syntax (the arrangement of words and grammar), semantics (the meanings of specific words and phrases), and context (how surrounding information affects meaning). This allows them to distinguish between different interpretations of identical words, grasp implied messages, and produce responses that are as relevant as possible.</p>
<p>Early experiments such as the Georgetown-IBM translation experiment in 1954 and the ELIZA chatbot in 1966 demonstrated that machines could process language (Sources: <a target="_blank" href="https://www.mdpi.com/2078-2489/15/8/443">Szmurlo and Akhtar, MDPI; Hutchins, ResearchGate</a>). Today, any developer can tap into these capabilities with readily available NLP libraries.</p>
<p>So why is this important for you? In 2025, the market for NLP, which currently powers chatbots, translation software, and content creation platforms, has reached $42.47 billion. (Source: <a target="_blank" href="https://www.precedenceresearch.com/natural-language-processing-market">Precedence Research</a>)</p>
<p>The growth is only accelerating. By 2030, the global NLP market is expected to grow to $439.85 billion. (Source: <a target="_blank" href="https://www.grandviewresearch.com/industry-analysis/natural-language-processing-market-report#:~:text=The%20global%20natural%20language%20processing,38.7%25%20from%202025%20to%202030.">GrandviewResearch</a>). </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762161849212/b2e0b1b5-8b91-4061-a647-2488a7396548.png" alt="NLP market size" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-important-nlp-concepts-to-know">Important NLP Concepts to Know</h3>
<p>Five interconnected layers generally make up NLP systems. Every layer addresses a distinct language processing problem (Source: <a target="_blank" href="https://www.researchgate.net/publication/350058919_Natural_Language_Processing_History_Evolution_Application_and_Future_Work">Khatri and others, ResearchGate</a>).</p>
<ul>
<li><p><a target="_blank" href="https://www.researchgate.net/publication/350058919_Natural_Language_Processing_History_Evolution_Application_and_Future_Work"><strong>Analysis of morphology</strong></a> is where you break words down into their smallest meaningful components: prefixes, roots, and suffixes. For instance, "working" becomes "work" plus "ing." This makes it easier for your system to recognize related words even when they change form.</p>
</li>
<li><p><strong>Analysis of syntactic structure</strong> is where you use grammar rules to determine sentence structure. Here, you construct parse trees that map the grammatical relationships between words. Individual words are represented as leaves, phrases as intermediate nodes, and sentences as roots in the tree.</p>
</li>
<li><p><strong>Analysis of semantics</strong> is where you derive the true meaning from the parsed structure. You deal with synonyms, antonyms, and homophones, as well as word ambiguity. This transforms grammatical structure into meaning.</p>
</li>
<li><p><strong>Analysis of discourse</strong> is where you connect sentences within longer text structures. You'll observe how ideas flow from one paragraph to the next and spot recurring themes. This connects meaning at the sentence level to meaning at the document level.</p>
</li>
<li><p><strong>Analysis of pragmatics</strong> is where you decipher intent and context. You'll be able to resolve references, understand dialogue structure, and decipher implied meanings. At this layer, you can process sarcasm, cultural background, and other aspects of everyday communication.</p>
</li>
</ul>
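<p>To get a feel for the morphology layer, you can use a stemmer, a simple rule-based tool that strips suffixes so that related word forms share a root. This sketch uses NLTK’s <code>PorterStemmer</code>, which needs no extra data download. Keep in mind that stemming only handles suffixes, so treat it as a rough approximation of full morphological analysis.</p>
<pre><code class="lang-python">from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["working", "worked", "works"]

for word in words:
    stem = stemmer.stem(word)
    suffix = word[len(stem):]  # whatever the stemmer removed from the end
    print(f"{word}: {stem} + {suffix}")

# Output:
# working: work + ing
# worked: work + ed
# works: work + s
</code></pre>
<p>Because all three forms map to "work," a system that counts or compares stems treats them as the same underlying word.</p>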
<p>Understanding these layers gives you the ability to build NLP systems that can manage challenging language tasks in a variety of contexts.</p>
<h2 id="heading-how-nlp-systems-interpret-speech">How NLP Systems Interpret Speech</h2>
<p>NLP systems use a pipeline to convert raw text into a representation a computer can work with. Each step builds on the previous one, allowing for better analysis of unstructured language data. In this section, I’ll provide real code snippets that you can paste into an editor and run for practice.</p>
<h3 id="heading-step-1-text-input">Step 1: Text Input</h3>
<p>To start, your system takes in raw text, which can come from many sources: emails, social media posts, articles, documents, or speech transcripts. This raw data will often contain misspellings, informal language, and grammatical mistakes that your pipeline needs to handle.</p>
<h3 id="heading-step-2-text-preprocessing">Step 2: Text Preprocessing</h3>
<p>Next, you’ll need to clean and standardize the input text before your system analyzes it. Your preprocessing will likely include some or all of these steps:</p>
<ul>
<li><p>Tokenizing text into single words or subwords</p>
</li>
<li><p>Removing punctuation marks from the text</p>
</li>
<li><p>Lowercasing all the text</p>
</li>
<li><p>Removing stop words like "the," "and," and "is"</p>
</li>
</ul>
<p>For example, you can implement this kind of basic preprocessing in Python with the NLTK library (we’ll discuss libraries in more detail later):</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> nltk
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize

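<span class="hljs-comment"># Download required NLTK data ('punkt_tab' is needed by word_tokenize in newer NLTK releases)</span>
nltk.download(<span class="hljs-string">'punkt'</span>)
nltk.download(<span class="hljs-string">'punkt_tab'</span>)
nltk.download(<span class="hljs-string">'stopwords'</span>)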

<span class="hljs-comment"># Raw text input</span>
text = <span class="hljs-string">"The quick brown fox jumps over the lazy dog!"</span>

<span class="hljs-comment"># Tokenization</span>
tokens = word_tokenize(text.lower())

<span class="hljs-comment"># Remove punctuation and stop words</span>
stop_words = set(stopwords.words(<span class="hljs-string">'english'</span>))
filtered_tokens = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> tokens <span class="hljs-keyword">if</span> word.isalnum() <span class="hljs-keyword">and</span> word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stop_words]

print(filtered_tokens)
<span class="hljs-comment"># Output: ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']</span>
</code></pre>
<h3 id="heading-step-3-syntactic-parsing-and-analysis">Step 3: Syntactic Parsing and Analysis</h3>
<p>After cleaning, you’ll analyze the text’s grammatical structure by constructing parse trees. While parse trees can vary in complexity, they map the relationships between words, phrases, and clauses. Part-of-speech tagging assigns grammatical roles (noun, verb, adjective, and so on) to words, and dependency parsing identifies how words are linked syntactically, such as which noun is the subject of a verb.</p>
<p>For example, the code below illustrates how to perform part-of-speech tagging with spaCy, which determines the grammatical function of each word within a sentence.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> spacy

<span class="hljs-comment"># Load the small English model (install it first with: python -m spacy download en_core_web_sm)</span>
nlp = spacy.load(<span class="hljs-string">"en_core_web_sm"</span>)

<span class="hljs-comment"># Process text</span>
doc = nlp(<span class="hljs-string">"The cat sat on the mat"</span>)

<span class="hljs-comment"># Part-of-speech tagging</span>
<span class="hljs-keyword">for</span> token <span class="hljs-keyword">in</span> doc:
    print(<span class="hljs-string">f"<span class="hljs-subst">{token.text}</span>: <span class="hljs-subst">{token.pos_}</span>"</span>)

<span class="hljs-comment"># Output:</span>
<span class="hljs-comment"># The: DET</span>
<span class="hljs-comment"># cat: NOUN</span>
<span class="hljs-comment"># sat: VERB</span>
<span class="hljs-comment"># on: ADP</span>
<span class="hljs-comment"># the: DET</span>
<span class="hljs-comment"># mat: NOUN</span>
</code></pre>
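<p>The snippet above covers part-of-speech tagging. Dependency parsing produces a tree instead: every word points to a head word through a labeled relation. The minimal sketch below stores a hand-written parse of the same sentence as (word, relation, head index) triples and walks it. The parse is an illustrative assumption, not real model output; it borrows label names used by spaCy's English models (det, nsubj, prep, pobj) and spaCy's convention that the root is its own head. With spaCy itself, you would read the same information from <code>token.dep_</code> and <code>token.head</code>.</p>
<pre><code class="lang-python"># Hand-written dependency parse of "The cat sat on the mat" (illustrative, not model output).
# Each entry: (word, relation to its head, index of the head word); the root points to itself.
parse = [
    ("The", "det", 1),
    ("cat", "nsubj", 2),
    ("sat", "ROOT", 2),
    ("on", "prep", 2),
    ("the", "det", 5),
    ("mat", "pobj", 3),
]

def children(parse, head_index):
    # Indices of the words attached to the given head (ignoring the root's self-link)
    return [i for i, (_, _, head) in enumerate(parse) if head == head_index and i != head_index]

def path_to_root(parse, index):
    # Follow head links upward until reaching the root word
    path = [parse[index][0]]
    while parse[index][1] != "ROOT":
        index = parse[index][2]
        path.append(parse[index][0])
    return path

root = next(i for i, (_, rel, _) in enumerate(parse) if rel == "ROOT")
subject = next(parse[i][0] for i in children(parse, root) if parse[i][1] == "nsubj")
print(subject)                 # cat
print(path_to_root(parse, 5))  # ['mat', 'on', 'sat']
</code></pre>
<p>Walking head links like this is one way you might pull out the subject of a sentence's main verb, or the object of a preposition, for downstream information extraction.</p>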
<h3 id="heading-step-4-feature-engineering-and-text-representation">Step 4: Feature Engineering and Text Representation</h3>
<p>Here, you convert text into numerical vectors (embeddings) using word-embedding or transformer-based techniques, so that a model can work with it. Good embeddings capture semantic relationships: words with similar meanings, such as "kid" and "child", end up close together in the vector space. The example below uses the sentence-transformers library to embed two whole sentences:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sentence_transformers <span class="hljs-keyword">import</span> SentenceTransformer

<span class="hljs-comment"># Load pre-trained model</span>
model = SentenceTransformer(<span class="hljs-string">'all-MiniLM-L6-v2'</span>)

<span class="hljs-comment"># Convert sentences to embeddings</span>
sentences = [<span class="hljs-string">"The cat sits on the mat"</span>, <span class="hljs-string">"The feline rests on the rug"</span>]
embeddings = model.encode(sentences)

print(<span class="hljs-string">f"Embedding shape: <span class="hljs-subst">{embeddings.shape}</span>"</span>)
<span class="hljs-comment"># Output: Embedding shape: (2, 384)</span>
</code></pre>
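<p>To check whether two embeddings are close in meaning, a common measure is cosine similarity: the cosine of the angle between the two vectors, which is near 1 for vectors pointing the same way. The minimal sketch below computes it in plain Python on tiny, hand-picked 4-dimensional vectors. The numbers are hypothetical stand-ins for real embeddings (<code>all-MiniLM-L6-v2</code> produces 384-dimensional ones), and the sentence-transformers library also ships a ready-made <code>util.cos_sim</code> helper for real embeddings.</p>
<pre><code class="lang-python">import math

# Hypothetical 4-dimensional "embeddings", hand-picked for illustration only
embeddings = {
    "kid": [0.9, 0.8, 0.1, 0.0],
    "child": [0.85, 0.75, 0.2, 0.05],
    "car": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

kid_child = cosine_similarity(embeddings["kid"], embeddings["child"])
kid_car = cosine_similarity(embeddings["kid"], embeddings["car"])
print(f"kid vs child: {kid_child:.2f}")  # kid vs child: 0.99
print(f"kid vs car: {kid_car:.2f}")      # kid vs car: 0.12
</code></pre>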
<h3 id="heading-step-5-modeling-and-pattern-recognition">Step 5: Modeling and Pattern Recognition</h3>
<p>In this part of the process, you’ll use machine learning algorithms to identify patterns in the vectorized text. You can use traditional machine learning algorithms or deep learning methods such as transformers. Depending on the task, your model learns patterns in the language so it can classify content, extract entities, and more.</p>
<p>To see this in action, here’s a straightforward example of a text classification model that uses a transformer to identify sentiment.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

<span class="hljs-comment"># Load a pre-trained sentiment analysis model</span>
classifier = pipeline(<span class="hljs-string">"sentiment-analysis"</span>)

<span class="hljs-comment"># Classify text sentiment</span>
texts = [<span class="hljs-string">"I love this product!"</span>, <span class="hljs-string">"This is terrible and disappointing"</span>]
results = classifier(texts)

<span class="hljs-keyword">for</span> text, result <span class="hljs-keyword">in</span> zip(texts, results):
    print(<span class="hljs-string">f"Text: <span class="hljs-subst">{text}</span>"</span>)
    print(<span class="hljs-string">f"Sentiment: <span class="hljs-subst">{result[<span class="hljs-string">'label'</span>]}</span>, Confidence: <span class="hljs-subst">{result[<span class="hljs-string">'score'</span>]:<span class="hljs-number">.2</span>f}</span>\n"</span>)

<span class="hljs-comment"># Output:</span>
<span class="hljs-comment"># Text: I love this product!</span>
<span class="hljs-comment"># Sentiment: POSITIVE, Confidence: 0.99</span>
<span class="hljs-comment">#</span>
<span class="hljs-comment"># Text: This is terrible and disappointing</span>
<span class="hljs-comment"># Sentiment: NEGATIVE, Confidence: 0.99</span>
</code></pre>
<p>This illustrates how the model detects linguistic patterns for sentiment categorization, which is a typical task in NLP. In subsequent sections, we’ll delve into more specialized modeling techniques tailored for various NLP applications.</p>
<h3 id="heading-step-6-evaluation-and-deployment">Step 6: Evaluation and Deployment</h3>
<p>Next, you’ll evaluate your model with metrics such as precision, recall, and F1 score. Once it performs well enough, you deploy it to production, where you can monitor how it handles real-world text and periodically retrain it on new data. Here’s how you might compute the evaluation metrics with scikit-learn:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> classification_report

<span class="hljs-comment"># Example predictions vs actual labels</span>
y_true = [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>]
y_pred = [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>]

<span class="hljs-comment"># Generate evaluation metrics</span>
print(classification_report(y_true, y_pred))
</code></pre>
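<p>To see what <code>classification_report</code> is measuring, here is a minimal plain-Python sketch that computes per-class precision, recall, and F1 for the same <code>y_true</code> and <code>y_pred</code> lists. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean; these per-class values should match the corresponding rows of the report.</p>
<pre><code class="lang-python">y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

def precision_recall_f1(y_true, y_pred, positive):
    # Treat `positive` as the class of interest and count hits and misses
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores = {label: precision_recall_f1(y_true, y_pred, label) for label in (0, 1)}
for label, (p, r, f) in scores.items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# class 0: precision=0.67 recall=1.00 f1=0.80
# class 1: precision=1.00 recall=0.67 f1=0.80
</code></pre>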
<h2 id="heading-typical-nlp-tasks">Typical NLP Tasks</h2>
<h3 id="heading-natural-language-understanding-nlu-tasks">Natural Language Understanding (NLU) tasks</h3>
<p>Natural Language Understanding (NLU) tasks focus on working out what a piece of text actually means. Several tasks fall into this category.</p>
<h4 id="heading-sentiment-analysis-and-text-classification">Sentiment analysis and text classification</h4>
<p>Here, you sort documents into categories, most commonly by the sentiment they express. A sentiment analysis engine identifies whether a text conveys a positive, negative, or neutral sentiment, and classifiers like it can then filter or route content automatically across digital platforms. Consider this example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

<span class="hljs-comment"># Load sentiment analysis pipeline</span>
classifier = pipeline(<span class="hljs-string">"sentiment-analysis"</span>)

<span class="hljs-comment"># Analyze sentiment</span>
result = classifier(<span class="hljs-string">"I love this product! It works great."</span>)
print(result)
<span class="hljs-comment"># Output: [{'label': 'POSITIVE', 'score': 0.9998}]</span>
</code></pre>
<h4 id="heading-named-entity-recognition-ner">Named Entity Recognition (NER)</h4>
<p>NER is the task of automatically identifying and classifying named entities in a body of text, such as the names of people, locations, and organizations, as well as dates and monetary amounts.</p>
<p>Your NER system labels these entities in unstructured text, converting raw text into a structured format that is easy to analyze. Combined with downstream steps such as relation extraction, it can also help you uncover relationships among entities and gain insights from large amounts of text.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> spacy

nlp = spacy.load(<span class="hljs-string">"en_core_web_sm"</span>)
doc = nlp(<span class="hljs-string">"Apple Inc. was founded by Steve Jobs in Cupertino, California."</span>)

<span class="hljs-keyword">for</span> ent <span class="hljs-keyword">in</span> doc.ents:
    print(<span class="hljs-string">f"<span class="hljs-subst">{ent.text}</span>: <span class="hljs-subst">{ent.label_}</span>"</span>)

<span class="hljs-comment"># Output:</span>
<span class="hljs-comment"># Apple Inc.: ORG</span>
<span class="hljs-comment"># Steve Jobs: PERSON</span>
<span class="hljs-comment"># Cupertino: GPE</span>
<span class="hljs-comment"># California: GPE</span>
</code></pre>
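<p>A common way to frame NER is as token-level tagging with the BIO scheme: each token is tagged B-TYPE at the beginning of an entity, I-TYPE inside one, and O outside any entity. The minimal sketch below shows the decoding step that turns such tags back into entity spans. The tags are hand-written stand-ins for a model's predictions on the sentence above, chosen to mirror spaCy's output labels; they are not produced by spaCy.</p>
<pre><code class="lang-python"># Hand-written tokens and BIO tags standing in for a model's predictions (illustrative)
tokens = ["Apple", "Inc.", "was", "founded", "by", "Steve", "Jobs",
          "in", "Cupertino", ",", "California", "."]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-PERSON", "I-PERSON",
        "O", "B-GPE", "O", "B-GPE", "O"]

def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity text, label) pairs."""
    spans = []
    words, label = [], None

    def close_entity():
        # Emit the entity collected so far, if any
        if label is not None:
            spans.append((" ".join(words), label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            close_entity()           # a new entity begins
            words, label = [token], tag[2:]
        elif tag.startswith("I-"):
            words.append(token)      # continue the current entity
        else:
            close_entity()           # "O": outside any entity
            words, label = [], None
    close_entity()
    return spans

print(bio_to_spans(tokens, tags))
# [('Apple Inc.', 'ORG'), ('Steve Jobs', 'PERSON'), ('Cupertino', 'GPE'), ('California', 'GPE')]
</code></pre>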
<h4 id="heading-question-answering">Question answering</h4>
<p>You can create systems that consume natural language questions and retrieve appropriate answers. Your system can also use entailment and contradiction detection to analyze the logical relationships between text blocks. </p>
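<p>Modern question answering systems typically rely on neural models, for example a transformer that extracts an answer span from a passage. As a much simpler illustration of the retrieval idea, the sketch below picks the passage sentence that shares the most content words with the question. It is a lexical-overlap baseline, not how neural QA models work, and the stop word list and passage are made up for the example.</p>
<pre><code class="lang-python">import re

# Small, illustrative stop word list (an assumption; real systems use larger ones)
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "in", "for", "which", "what", "who", "does"}

def content_words(text):
    # Lowercase, keep alphabetic tokens, drop stop words
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS

def answer(question, passage):
    # Naive sentence split on periods, then pick the sentence sharing the most content words
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    question_words = content_words(question)
    return max(sentences, key=lambda s: len(question_words.intersection(content_words(s))))

passage = (
    "spaCy is a library for NLP in Python. "
    "NLTK provides tokenizers and stop word lists. "
    "Transformers provides pretrained models such as BERT."
)
print(answer("Which library provides stop word lists?", passage))
# NLTK provides tokenizers and stop word lists
</code></pre>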
<h4 id="heading-intent-recognition">Intent recognition</h4>
<p>Intent recognition identifies what a user wants to accomplish in a conversation, such as booking a flight or checking an order’s status, so that your dialog system (for example, a chatbot or voice assistant) can respond appropriately.</p>
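<p>Production intent recognizers are usually trained classifiers, but the core idea can be sketched without any libraries: compare the user’s utterance with a few example utterances per intent and pick the intent of the closest match. The intent names and examples below are hypothetical, and Jaccard word overlap is a deliberately simple stand-in for a learned similarity such as comparing sentence embeddings.</p>
<pre><code class="lang-python"># Hypothetical intents with a few example utterances each
EXAMPLES = {
    "book_flight": ["book a flight to paris", "i need a plane ticket", "reserve a flight for tomorrow"],
    "check_order": ["where is my order", "track my package", "has my order shipped"],
    "greeting": ["hello there", "hi", "good morning"],
}

def words(text):
    # Strip simple punctuation, lowercase, and split on whitespace
    for mark in "?!.,":
        text = text.replace(mark, "")
    return set(text.lower().split())

def jaccard(a, b):
    # Shared words divided by all distinct words in either set
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0

def recognize_intent(utterance):
    u = words(utterance)
    # Score each intent by its best-matching example utterance
    scores = {
        intent: max(jaccard(u, words(example)) for example in examples)
        for intent, examples in EXAMPLES.items()
    }
    return max(scores, key=scores.get)

print(recognize_intent("Can you track my order?"))  # check_order
</code></pre>
<p>A real system would also need a confidence threshold, so that an utterance matching none of the examples is handled as "unknown" instead of being assigned to whichever intent happens to come first.</p>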
<p>Now, let’s move on to some general natural language-related tasks.</p>
<h3 id="heading-general-natural-language-tasks">General Natural Language Tasks</h3>
<p>These tasks combine aspects of understanding with generation: several of them produce new text or speech rather than only interpreting it.</p>
<h4 id="heading-machine-translation">Machine translation</h4>
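<p>Machine translation converts text from one language to another while preserving its context and meaning. Neural machine translation models typically use an encoder-decoder architecture: the encoder reads the source sentence, and the decoder generates the translation in the target language.</p>
<p>Let’s see how it’s done with the <code>MarianMTModel</code> and <code>MarianTokenizer</code> classes from the Hugging Face Transformers library:</p>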
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> MarianMTModel, MarianTokenizer

<span class="hljs-comment"># Load translation model</span>
model_name = <span class="hljs-string">'Helsinki-NLP/opus-mt-en-de'</span>  <span class="hljs-comment"># English-to-German model</span>
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

<span class="hljs-comment"># Translate English to German</span>
text = <span class="hljs-string">"Hello, how are you?"</span>
translated = model.generate(**tokenizer(text, return_tensors=<span class="hljs-string">"pt"</span>, padding=<span class="hljs-literal">True</span>))
print(tokenizer.decode(translated[<span class="hljs-number">0</span>], skip_special_tokens=<span class="hljs-literal">True</span>))
<span class="hljs-comment"># Example output (exact wording may vary by model version): Hallo, wie geht's dir?</span>
</code></pre>
<h4 id="heading-text-summarization">Text summarization</h4>
<p>Often you’ll need to shorten a long document into a more accessible summary – this is text summarization, and it’s a common NLP task. Your system retains key details and coherence while reducing the length of a document.</p>
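<p>Summarizers come in two broad flavors: extractive ones select the most important sentences from the original text, while abstractive ones (usually neural models) write new sentences. The minimal sketch below is a classic frequency-based extractive approach: it scores each sentence by the average document-wide frequency of its non-stop words and keeps the top-scoring sentences in their original order. The stop word list and the naive punctuation-based sentence split are simplifying assumptions.</p>
<pre><code class="lang-python">import re
from collections import Counter

# Small, illustrative stop word list (an assumption)
STOP_WORDS = {"a", "an", "and", "the", "to", "into", "of", "was", "is", "in"}

def content_words(text):
    # Lowercase alphabetic tokens with stop words removed
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, num_sentences=2):
    # Naive sentence split on ., ! and ?
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        # Average frequency of the sentence's content words across the whole document
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Emit the selected sentences in their original order
    return " ".join(s + "." for s in sentences if s in top)

document = (
    "NLP systems process text. "
    "Tokenization splits sentences into tokens. "
    "The weather was pleasant today. "
    "Text classification assigns labels to text."
)
print(summarize(document))
# NLP systems process text. Text classification assigns labels to text.
</code></pre>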
<h4 id="heading-speech-recognition-and-text-to-speech">Speech recognition and text-to-speech</h4>
<p>Using these techniques, you can turn speech into text (speech recognition) or text into natural audio (text-to-speech). These tasks close the gap between text and audio modalities. </p>
<h4 id="heading-syntactic-parsing">Syntactic parsing</h4>
<p>Here, you examine the grammatical construction to determine the syntactic relationships between words in the sentence. This critical task gives a structural analysis of the text to support more complex understanding tasks.</p>
<p>These tasks, when combined, create powerful applications for different industries and use cases in Natural Language Processing.</p>
<h2 id="heading-conventional-machine-learning-methods-for-nlp">Conventional Machine Learning Methods for NLP</h2>
<p>Instead of relying on manually created linguistic rules (where programmers specify patterns like "if a word ends with '-ing', it is likely a verb" or "sentences containing 'not' followed by positive words suggest negative sentiment"), ML approaches apply statistical methods to discover patterns automatically within the data.</p>
<p>These methods learn through examples and don’t require human experts to define every potential language structure explicitly. As a result, they are more scalable and adaptable across different languages and fields. Let’s look at some of them now.</p>
<h3 id="heading-logistic-regression">Logistic Regression</h3>
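<p>For binary classification tasks, you can use logistic regression. It computes a weighted sum of the input features, passes it through the sigmoid function to estimate the probability of the positive class, and thereby learns a decision boundary that is linear in the feature space. Consider the following example:</p>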
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> TfidfVectorizer
<span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LogisticRegression
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Sample data</span>
texts = [<span class="hljs-string">"This is spam"</span>, <span class="hljs-string">"Normal email"</span>, <span class="hljs-string">"Buy now!"</span>, <span class="hljs-string">"Meeting tomorrow"</span>]
labels = [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>]  <span class="hljs-comment"># 1 = spam, 0 = not spam</span>

<span class="hljs-comment"># Convert text to features</span>
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

<span class="hljs-comment"># Train model</span>
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=<span class="hljs-number">0.25</span>)
model = LogisticRegression()
model.fit(X_train, y_train)

<span class="hljs-comment"># Predict</span>
new_text = vectorizer.transform([<span class="hljs-string">"Free money now"</span>])
prediction = model.predict(new_text)
print(<span class="hljs-string">f"Prediction: <span class="hljs-subst">{<span class="hljs-string">'Spam'</span> <span class="hljs-keyword">if</span> prediction[<span class="hljs-number">0</span>] == <span class="hljs-number">1</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'Not Spam'</span>}</span>"</span>)
</code></pre>
<p>Typical uses include toxicity classification, sentiment analysis, and spam detection.</p>
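<p>Once trained, a logistic regression model predicts by adding a bias to the weighted sum of the input features and squashing the result through the sigmoid function, sigmoid(z) = 1 / (1 + exp(-z)), which maps any score to a probability between 0 and 1. The minimal sketch below shows only that prediction step in plain Python. Its weights are hypothetical and hand-picked, and it uses raw word occurrences instead of TF-IDF values; in the scikit-learn example, <code>fit</code> is what learns the weights from data.</p>
<pre><code class="lang-python">import math

# Hypothetical "learned" weights for a bag-of-words spam model (illustration only)
weights = {"free": 1.5, "money": 1.2, "now": 0.8, "meeting": -1.4, "tomorrow": -1.0}
bias = -0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spam_probability(text):
    # Linear score: bias plus the weight of each word occurrence (unknown words add 0)
    z = bias + sum(weights.get(word, 0.0) for word in text.lower().split())
    return sigmoid(z)

print(f"{spam_probability('Free money now'):.3f}")    # 0.953
print(f"{spam_probability('Meeting tomorrow'):.3f}")  # 0.052
</code></pre>
<p>Because the score is linear in the features, each weight tells you how strongly a word pushes the prediction toward spam (positive weight) or away from it (negative weight), which is one reason logistic regression is easy to interpret.</p>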
<h3 id="heading-naive-bayes">Naive Bayes</h3>
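<p><a target="_blank" href="https://www.freecodecamp.org/news/how-naive-bayes-classifiers-work/">Naive Bayes</a> applies <a target="_blank" href="https://www.freecodecamp.org/news/bayes-rule-explained/">Bayes' Theorem</a> under the simplifying ("naive") assumption that words are conditionally independent of one another given the label.</p>
<p>To classify documents, it computes:</p>
<p>$$P(\text{label} \mid \text{text}) = \frac{P(\text{label}) \, P(\text{text} \mid \text{label})}{P(\text{text})}$$</p>
<p>where the independence assumption lets it approximate the likelihood as a product over the words in the text:</p>
<p>$$P(\text{text} \mid \text{label}) \approx \prod_{i=1}^{n} P(w_i \mid \text{label})$$</p><pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.naive_bayes <span class="hljs-keyword">import</span> MultinomialNB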
<span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> CountVectorizer

<span class="hljs-comment"># Training data</span>
texts = [<span class="hljs-string">"I love this product"</span>, <span class="hljs-string">"Terrible service"</span>, <span class="hljs-string">"Amazing quality"</span>, <span class="hljs-string">"Waste of money"</span>]
labels = [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>]  <span class="hljs-comment"># 1 = positive, 0 = negative</span>

<span class="hljs-comment"># Vectorize text</span>
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

<span class="hljs-comment"># Train Naive Bayes</span>
clf = MultinomialNB()
clf.fit(X, labels)

<span class="hljs-comment"># Predict sentiment</span>
new_review = vectorizer.transform([<span class="hljs-string">"Great purchase"</span>])
print(<span class="hljs-string">f"Sentiment: <span class="hljs-subst">{<span class="hljs-string">'Positive'</span> <span class="hljs-keyword">if</span> clf.predict(new_review)[<span class="hljs-number">0</span>] == <span class="hljs-number">1</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'Negative'</span>}</span>"</span>)
</code></pre>
<p>Common uses for this algorithm that you can try are spam detection and bug detection in software.</p>
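<p>To make the formula concrete, here is a minimal plain-Python sketch of multinomial Naive Bayes trained on the same four reviews. It drops P(text), which is identical for every label and so does not change which label wins; it works with log-probabilities to avoid multiplying many small numbers; and it applies add-one (Laplace) smoothing, the default in <code>MultinomialNB</code>, so that unseen words don't zero out a class. Its whitespace tokenizer is a simplification of <code>CountVectorizer</code>'s default token pattern, so its scores won't exactly match scikit-learn's.</p>
<pre><code class="lang-python">import math
from collections import Counter

train_texts = ["I love this product", "Terrible service", "Amazing quality", "Waste of money"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

def tokenize(text):
    # Simplified whitespace tokenizer (CountVectorizer's default differs slightly)
    return text.lower().split()

# "Training": count documents per label and word occurrences per label
doc_counts = Counter(train_labels)
word_counts = {label: Counter() for label in doc_counts}
for text, label in zip(train_texts, train_labels):
    word_counts[label].update(tokenize(text))
vocab = set().union(*word_counts.values())

def log_score(text, label, alpha=1.0):
    # log P(label) + sum over words of log P(word | label), with add-alpha smoothing
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(train_labels))
    for word in tokenize(text):
        score += math.log((word_counts[label][word] + alpha) / (total_words + alpha * len(vocab)))
    return score

def predict(text):
    # P(text) is the same for every label, so comparing the numerators is enough
    return max(doc_counts, key=lambda label: log_score(text, label))

print(predict("love the quality"))  # 1
</code></pre>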
<h3 id="heading-decision-trees">Decision Trees</h3>
<p>Decision trees recursively partition a dataset, at each step choosing the feature split that best separates the classes, measured by information gain or a similar impurity criterion (scikit-learn’s <code>DecisionTreeClassifier</code> uses Gini impurity by default). The result is an interpretable, tree-like model: each internal node tests a feature, each branch is an outcome of that test, and each leaf node assigns a class.</p>
<p>Decision trees are especially useful for text classification and feature selection because the decision tree allows you to trace exactly how the model made the predicted classification. </p>
<p>Let’s see a code example that shows how the decision tree learns which words, converted to TF-IDF features, predict whether the sentiment of the text in question is positive or negative:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier
<span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> TfidfVectorizer
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-comment"># Sample text data with labels</span>
texts = [
    <span class="hljs-string">"I love this movie, it's fantastic"</span>,
    <span class="hljs-string">"Terrible film, waste of time"</span>,
    <span class="hljs-string">"Amazing performance and great story"</span>,
    <span class="hljs-string">"Boring and disappointing"</span>,
    <span class="hljs-string">"Excellent cinematography and acting"</span>,
    <span class="hljs-string">"Awful, would not recommend"</span>
]
labels = [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>]  <span class="hljs-comment"># 1 = positive, 0 = negative</span>

<span class="hljs-comment"># Convert text to TF-IDF features</span>
vectorizer = TfidfVectorizer(max_features=<span class="hljs-number">20</span>)
X = vectorizer.fit_transform(texts)

<span class="hljs-comment"># Split data</span>
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=<span class="hljs-number">0.3</span>, random_state=<span class="hljs-number">42</span>)

<span class="hljs-comment"># Train decision tree</span>
clf = DecisionTreeClassifier(max_depth=<span class="hljs-number">3</span>, random_state=<span class="hljs-number">42</span>)
clf.fit(X_train, y_train)

<span class="hljs-comment"># Make predictions</span>
test_text = [<span class="hljs-string">"This movie is wonderful"</span>]
test_vector = vectorizer.transform(test_text)
prediction = clf.predict(test_vector)

print(<span class="hljs-string">f"Text: <span class="hljs-subst">{test_text[<span class="hljs-number">0</span>]}</span>"</span>)
print(<span class="hljs-string">f"Predicted sentiment: <span class="hljs-subst">{<span class="hljs-string">'Positive'</span> <span class="hljs-keyword">if</span> prediction[<span class="hljs-number">0</span>] == <span class="hljs-number">1</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'Negative'</span>}</span>"</span>)
print(<span class="hljs-string">f"Model accuracy: <span class="hljs-subst">{clf.score(X_test, y_test):<span class="hljs-number">.2</span>f}</span>"</span>)
</code></pre>
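<p>At each node, the decision tree asks a question such as "Is the TF-IDF score for 'amazing' above some threshold?" and follows the matching branch until it reaches a leaf that holds the classification. The tree can only ask about words in the vectorizer’s vocabulary: 'wonderful' never appears in the training texts, so it has no TF-IDF column, and the tree cannot use it when classifying the test sentence.</p>
<p>One key parameter in the code above is <code>max_depth=3</code>, which limits how many questions the tree can ask along any path from the root to a leaf. Without it, the tree can keep splitting until it memorizes the training data and overfits.</p>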
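<p>To see what "maximizes information gain" means, the minimal sketch below computes it by hand for two candidate splits of the six reviews above: whether a review contains the word "terrible", and whether it contains "and". Information gain is the parent node’s entropy minus the size-weighted entropy of the child nodes, so a split that isolates one class more cleanly scores higher. It uses simple word presence instead of TF-IDF thresholds to keep the arithmetic readable; passing <code>criterion="entropy"</code> to <code>DecisionTreeClassifier</code> makes scikit-learn use this criterion instead of Gini impurity.</p>
<pre><code class="lang-python">import math
from collections import Counter

texts = [
    "I love this movie, it's fantastic",
    "Terrible film, waste of time",
    "Amazing performance and great story",
    "Boring and disappointing",
    "Excellent cinematography and acting",
    "Awful, would not recommend",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

def entropy(ys):
    # Shannon entropy in bits of a list of class labels
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def information_gain(ys, mask):
    # Parent entropy minus the size-weighted entropy of the two child nodes
    left = [y for y, m in zip(ys, mask) if m]
    right = [y for y, m in zip(ys, mask) if not m]
    children = sum(len(part) / len(ys) * entropy(part) for part in (left, right) if part)
    return entropy(ys) - children

gains = {}
for word in ["terrible", "and"]:
    # Split on whether the review contains the word (commas stripped first)
    mask = [word in text.lower().replace(",", "").split() for text in texts]
    gains[word] = information_gain(labels, mask)
    print(f"{word}: {gains[word]:.3f}")
# terrible: 0.191
# and: 0.082
</code></pre>
<p>The "terrible" split wins because one child node is pure (a single negative review), while splitting on "and" leaves both children just as mixed as each other.</p>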
<h3 id="heading-latent-dirichlet-allocation-lda">Latent Dirichlet Allocation (LDA)</h3>
<p>Latent Dirichlet Allocation (LDA) automatically discovers thematic structure in large collections of text by treating each document as a probabilistic mixture of topics and each topic as a probability distribution over words. It is an unsupervised method, so it needs no labeled training data to uncover these hidden themes, which makes it well suited to exploratory analysis and to organizing large volumes of text.</p>
<p>The code below builds a word-count (document-term) matrix from a few documents and fits LDA to it. LDA then looks for two underlying topics based on which words tend to co-occur, which you can think of as a kind of clustering for text documents.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.decomposition <span class="hljs-keyword">import</span> LatentDirichletAllocation
<span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> CountVectorizer

<span class="hljs-comment"># Document collection</span>
documents = [
    <span class="hljs-string">"Machine learning algorithms process data"</span>,
    <span class="hljs-string">"Deep learning uses neural networks"</span>,
    <span class="hljs-string">"Python is great for data science"</span>,
    <span class="hljs-string">"Neural networks learn from examples"</span>
]

<span class="hljs-comment"># Create document-term matrix</span>
vectorizer = CountVectorizer(max_features=<span class="hljs-number">50</span>)
doc_term_matrix = vectorizer.fit_transform(documents)

<span class="hljs-comment"># Train LDA model</span>
lda = LatentDirichletAllocation(n_components=<span class="hljs-number">2</span>, random_state=<span class="hljs-number">42</span>)
lda.fit(doc_term_matrix)

<span class="hljs-comment"># Display topics</span>
feature_names = vectorizer.get_feature_names_out()
<span class="hljs-keyword">for</span> topic_idx, topic <span class="hljs-keyword">in</span> enumerate(lda.components_):
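    top_words_idx = topic.argsort()[<span class="hljs-number">-5</span>:][::<span class="hljs-number">-1</span>]  <span class="hljs-comment"># 5 highest-weighted words, highest first</span>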
    top_words = [feature_names[i] <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> top_words_idx]
    print(<span class="hljs-string">f"Topic <span class="hljs-subst">{topic_idx}</span>: <span class="hljs-subst">{<span class="hljs-string">', '</span>.join(top_words)}</span>"</span>)
</code></pre>
<p>With a corpus this small, the exact topics depend on the data and the random seed, but you might interpret one topic as "data science and algorithms" and the other as "neural networks and deep learning." LDA also assigns each document a probability distribution over the topics (available through <code>lda.transform</code>). For instance, a document such as "neural networks for data processing" might come out as, say, 60% the neural-networks topic and 40% the data-science topic.</p>
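<p>The mixture idea can be written as a sum over topics k:</p>
<p>$$P(w \mid d) = \sum_{k} P(k \mid d) \, P(w \mid k)$$</p>
<p>The minimal sketch below evaluates that sum in plain Python for a hypothetical document that is 40% a data-science topic and 60% a neural-networks topic. The topic names, vocabulary, and probabilities are made up for illustration; LDA would estimate them from the corpus (in scikit-learn, roughly <code>lda.transform</code> for the document-topic weights and the row-normalized <code>lda.components_</code> for the topic-word weights).</p>
<pre><code class="lang-python"># Hypothetical topic-word distributions (each topic's probabilities sum to 1)
topic_word = {
    "data_science": {"data": 0.5, "python": 0.3, "networks": 0.0, "learning": 0.2},
    "neural_networks": {"data": 0.1, "python": 0.0, "networks": 0.5, "learning": 0.4},
}
# Hypothetical topic mixture for one document
doc_topics = {"data_science": 0.4, "neural_networks": 0.6}

def word_distribution(doc_topics, topic_word):
    # P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)
    vocab = next(iter(topic_word.values())).keys()
    return {
        word: sum(weight * topic_word[topic][word] for topic, weight in doc_topics.items())
        for word in vocab
    }

dist = word_distribution(doc_topics, topic_word)
for word, prob in dist.items():
    print(f"{word}: {prob:.2f}")
# data: 0.26
# python: 0.12
# networks: 0.30
# learning: 0.32
</code></pre>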
<h3 id="heading-deep-learning-models">Deep Learning Models</h3>
<p>Deep learning models automatically extract hierarchical representations from raw text without manual feature engineering. Applying deep learning to language processing is important because language understanding requires modeling not just individual words, but also phrases, sentences, and the context as a whole.</p>
<p>Neural architectures achieve this by learning multiple layers of abstraction, which lets them capture sentence-level properties such as sentiment, intent, or topic.</p>
<p>The example below shows a simplified TensorFlow/Keras model for text classification. An <code>Embedding</code> layer maps each word to a dense vector that captures aspects of its meaning, an <code>LSTM</code> layer reads the sequence of vectors in order and summarizes it in its final state, and a <code>Dense</code> layer with a sigmoid activation turns that summary into a binary prediction. (Wrapping the LSTM in a <code>Bidirectional</code> layer, which also reads the sequence from right to left, is a common upgrade.)</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> Embedding, LSTM, Dense
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.sequence <span class="hljs-keyword">import</span> pad_sequences

<span class="hljs-comment"># Example sentences and labels</span>
texts = [<span class="hljs-string">"I like this movie"</span>, <span class="hljs-string">"I hate this movie"</span>]
labels = [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>]  <span class="hljs-comment"># 1 = positive, 0 = negative</span>

<span class="hljs-comment"># Tokenize text and pad sequences</span>
tokenizer = Tokenizer(num_words=<span class="hljs-number">50</span>)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=<span class="hljs-number">5</span>)

<span class="hljs-comment"># Simple model: embedding + LSTM + output</span>
model = Sequential([
    Embedding(input_dim=<span class="hljs-number">50</span>, output_dim=<span class="hljs-number">8</span>, input_length=<span class="hljs-number">5</span>),
    LSTM(<span class="hljs-number">4</span>),
    Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)
])

model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>)
model.fit(X, labels, epochs=<span class="hljs-number">5</span>, verbose=<span class="hljs-number">0</span>)

<span class="hljs-comment"># Predict sentiment for new sentence</span>
test_text = [<span class="hljs-string">"I love this"</span>]
test_seq = pad_sequences(tokenizer.texts_to_sequences(test_text), maxlen=<span class="hljs-number">5</span>)
pred = model.predict(test_seq)[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]

print(<span class="hljs-string">f"Sentiment score: <span class="hljs-subst">{pred:<span class="hljs-number">.2</span>f}</span> (1=positive, 0=negative)"</span>)
</code></pre>
<p>The model learns patterns from example sentences labeled as positive or negative, then uses those patterns to predict the sentiment of new input. With only two training sentences, the score it prints is not meaningful (and "love" never appears in training, so the tokenizer simply drops it); the point is the pipeline itself: the network learns its own representation of the text instead of relying on hand-crafted features.</p>
<h3 id="heading-convolutional-neural-networks-cnns">Convolutional Neural Networks (CNNs)</h3>
<p>CNNs apply the same pattern-detecting approach to text that they use in image recognition. A CNN treats a document as a sequence of word vectors and slides convolutional filters across it; each filter responds to a local pattern, such as an n-gram (a run of n adjacent words) or a short, meaningful phrase.</p>
<p>A convolutional layer typically uses many filters, each detecting a different feature, and stacking layers lets the network build progressively more abstract features, from simple word combinations up to recurring semantic patterns. This makes CNNs effective for text classification. (Source: <a target="_blank" href="https://arxiv.org/abs/1408.5882">Yoon Kim</a>)</p>
<p>In the example below, a <code>Conv1D</code> layer with <code>kernel_size=3</code> scans the embedded text, so each filter looks at three consecutive words at a time. During training, filters can learn to respond to sentiment-bearing patterns such as "excellent and entertaining" or "complete waste", and the final classification step turns those responses into a positive or negative prediction.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> Embedding, Conv1D, GlobalMaxPooling1D, Dense
<span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.sequence <span class="hljs-keyword">import</span> pad_sequences

<span class="hljs-comment"># Sample training data</span>
texts = [
    <span class="hljs-string">"This movie is excellent and entertaining"</span>,
    <span class="hljs-string">"Terrible film, complete waste"</span>,
    <span class="hljs-string">"Amazing story and great acting"</span>,
    <span class="hljs-string">"Boring and poorly made"</span>
]
labels = [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>]  <span class="hljs-comment"># 1 = positive, 0 = negative</span>

<span class="hljs-comment"># Tokenize and pad sequences</span>
tokenizer = Tokenizer(num_words=<span class="hljs-number">100</span>)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
X = pad_sequences(sequences, maxlen=<span class="hljs-number">10</span>)

<span class="hljs-comment"># Build CNN model</span>
model = Sequential([
    Embedding(input_dim=<span class="hljs-number">100</span>, output_dim=<span class="hljs-number">32</span>, input_length=<span class="hljs-number">10</span>),  <span class="hljs-comment"># Convert words to dense vectors</span>
    Conv1D(filters=<span class="hljs-number">64</span>, kernel_size=<span class="hljs-number">3</span>, activation=<span class="hljs-string">'relu'</span>),  <span class="hljs-comment"># Detect 3-word patterns</span>
    GlobalMaxPooling1D(),  <span class="hljs-comment"># Extract most important features</span>
    Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)  <span class="hljs-comment"># Binary classification</span>
])

model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
model.fit(X, labels, epochs=<span class="hljs-number">10</span>, verbose=<span class="hljs-number">0</span>)

<span class="hljs-comment"># Test prediction</span>
test_text = [<span class="hljs-string">"wonderful movie with great plot"</span>]
test_seq = tokenizer.texts_to_sequences(test_text)
test_pad = pad_sequences(test_seq, maxlen=<span class="hljs-number">10</span>)
prediction = model.predict(test_pad)

print(<span class="hljs-string">f"Sentiment probability: <span class="hljs-subst">{prediction[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)
print(<span class="hljs-string">f"Classification: <span class="hljs-subst">{<span class="hljs-string">'Positive'</span> <span class="hljs-keyword">if</span> prediction[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>] &gt; <span class="hljs-number">0.5</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'Negative'</span>}</span>"</span>)
</code></pre>
<p>The <code>GlobalMaxPooling1D</code> layer keeps only the strongest response of each filter across the whole sentence, so the classifier sees whether a pattern occurred anywhere in the text, regardless of its position.</p>
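<p>Under the hood, a <code>Conv1D</code> filter with <code>kernel_size=3</code> computes a weighted sum over each window of three consecutive word vectors, and global max pooling keeps each filter’s largest response. The minimal NumPy sketch below reproduces those two steps, with random weights standing in for learned ones and a random matrix standing in for an embedded sentence. It assumes "valid" padding (the Keras default for <code>Conv1D</code>) and a ReLU activation, as in the model above.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
seq_len, embed_dim, kernel_size, n_filters = 6, 4, 3, 2

x = rng.normal(size=(seq_len, embed_dim))                 # one embedding row per word (random stand-in)
W = rng.normal(size=(kernel_size, embed_dim, n_filters))  # filter weights (random stand-in)
b = np.zeros(n_filters)

def conv1d_relu(x, W, b):
    # Slide every filter over each window of kernel_size consecutive words ("valid" padding)
    k = W.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, W.shape[2]))
    for t in range(steps):
        window = x[t:t + k]  # a 3-word window (trigram)
        out[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)  # ReLU

feature_maps = conv1d_relu(x, W, b)  # one row per window position, one column per filter
pooled = feature_maps.max(axis=0)    # global max pooling: strongest response per filter
print(feature_maps.shape, pooled.shape)  # (4, 2) (2,)
</code></pre>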
<h3 id="heading-recurrent-neural-networks-rnns">Recurrent Neural Networks (RNNs)</h3>
<p>RNNs handle sequential data by maintaining a hidden state that carries information from earlier in the sequence. At each time step, the RNN takes the current word and the previous hidden state as input and produces an updated hidden state that reflects the accumulated context.</p>
<p>In the example below, the RNN reads each sentence word by word from left to right, updating its hidden state at every step; the final hidden state summarizes the sentence and feeds the classification layer.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> Embedding, SimpleRNN, Dense
<span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.text <span class="hljs-keyword">import</span> Tokenizer
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing.sequence <span class="hljs-keyword">import</span> pad_sequences

<span class="hljs-comment"># Training data</span>
texts = [
    <span class="hljs-string">"I really enjoyed this book"</span>,
    <span class="hljs-string">"The plot was confusing and dull"</span>,
    <span class="hljs-string">"Fantastic read, highly recommend"</span>,
    <span class="hljs-string">"Disappointing and poorly written"</span>
]
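labels = tf.constant([<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>])  <span class="hljs-comment"># A tensor rather than a plain list, which some Keras versions reject</span>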

<span class="hljs-comment"># Prepare data</span>
tokenizer = Tokenizer(num_words=<span class="hljs-number">100</span>)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
X = pad_sequences(sequences, maxlen=<span class="hljs-number">10</span>)

<span class="hljs-comment"># Build RNN model</span>
model = Sequential([
    Embedding(input_dim=<span class="hljs-number">100</span>, output_dim=<span class="hljs-number">32</span>),  <span class="hljs-comment"># Sequence length (10) is inferred from the padded input</span>
    SimpleRNN(units=<span class="hljs-number">64</span>, return_sequences=<span class="hljs-literal">False</span>),  <span class="hljs-comment"># Process sequence and maintain hidden state</span>
    Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)
])

model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
model.fit(X, labels, epochs=<span class="hljs-number">20</span>, verbose=<span class="hljs-number">0</span>)

<span class="hljs-comment"># Test</span>
test_text = [<span class="hljs-string">"amazing story highly engaging"</span>]
test_seq = tokenizer.texts_to_sequences(test_text)
test_pad = pad_sequences(test_seq, maxlen=<span class="hljs-number">10</span>)
prediction = model.predict(test_pad)

print(<span class="hljs-string">f"Sentiment probability: <span class="hljs-subst">{prediction[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)
</code></pre>
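<p>Under the hood, the SimpleRNN layer in the model above repeats the same update at every time step. The following NumPy sketch shows that update for a single layer, assuming random weights and the tanh activation that SimpleRNN uses by default. The names <code>rnn_step</code> and <code>run_rnn</code> are illustrative, not part of any library:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 3

# Randomly initialized weights (a trained model would learn these)
W_x = rng.normal(size=(hidden_dim, embed_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current word vector with the previous context
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def run_rnn(word_vectors):
    h = np.zeros(hidden_dim)  # the context starts out empty
    states = []
    for x_t in word_vectors:  # read the words left to right
        h = rnn_step(x_t, h)
        states.append(h)
    return np.array(states)

# Five "words", each represented by a 4-dimensional embedding
sentence = rng.normal(size=(5, embed_dim))
states = run_rnn(sentence)
print(states.shape)  # (5, 3): one hidden state per word
print(states[-1])    # final state, analogous to return_sequences=False
</code></pre>
<p>The final hidden state is the only thing the next layer sees when <code>return_sequences=False</code>, so everything the model knows about the sentence has to fit into that one vector.</p>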
<p>Long sentences are harder for a simple RNN because the information stored in the hidden state fades as it's carried across more and more time steps. That's the motivation for more sophisticated architectures such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks, which use gates to control what information to keep and what to forget.</p>
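<p>The gates are what set these architectures apart. Below is a minimal NumPy sketch of one GRU step in a common textbook formulation, with biases omitted for brevity. Framework layers such as Keras's <code>GRU</code> differ in small details, so treat it as an illustration rather than a drop-in equivalent:</p>
<pre><code class="lang-python">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 3

# One input matrix (W) and one recurrent matrix (U) per gate; biases omitted
W_z, W_r, W_c = (rng.normal(size=(hidden_dim, embed_dim)) for _ in range(3))
U_z, U_r, U_c = (rng.normal(size=(hidden_dim, hidden_dim)) for _ in range(3))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)  # update gate: how much old state to keep
    r = sigmoid(W_r @ x_t + U_r @ h_prev)  # reset gate: how much old state to use
    h_candidate = np.tanh(W_c @ x_t + U_c @ (r * h_prev))
    # Blend old state and candidate; z close to 1 carries the old context forward
    return z * h_prev + (1 - z) * h_candidate

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(20, embed_dim)):  # a 20-step sequence
    h = gru_step(x_t, h)
print(h)
</code></pre>
<p>Because the update gate can stay close to 1, the cell can pass information across many time steps almost unchanged, which is what helps it cope with long sentences.</p>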
<h3 id="heading-encoder-decoder-architectures">Encoder-Decoder Architectures</h3>
<p>These architectures combine two neural networks. The encoder takes the input text and compresses it into a dense representation that captures its essential meaning. The decoder then generates the output text from that representation.</p>
<p>These architectures learn a compressed representation of the input data, and they are often used for:</p>
<ul>
<li><p>Dimensionality reduction.</p>
</li>
<li><p>Feature learning.</p>
</li>
<li><p>Document clustering.</p>
</li>
<li><p>Sequence-to-sequence tasks (for example, translation or summarization).</p>
</li>
</ul>
<p>The following example illustrates how to use a Text-to-Text Transfer Transformer (T5) encoder-decoder model to translate English into German. The encoder takes the input English sentence and builds its internal representation of the text, while the decoder generates the German translation based on the representation:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> T5Tokenizer, T5ForConditionalGeneration

<span class="hljs-comment"># Load T5 model for text generation</span>
tokenizer = T5Tokenizer.from_pretrained(<span class="hljs-string">"t5-small"</span>)
model = T5ForConditionalGeneration.from_pretrained(<span class="hljs-string">"t5-small"</span>)

<span class="hljs-comment"># Translate text</span>
input_text = <span class="hljs-string">"translate English to German: Hello, how are you?"</span>
input_ids = tokenizer(input_text, return_tensors=<span class="hljs-string">"pt"</span>).input_ids

<span class="hljs-comment"># Generate translation</span>
outputs = model.generate(input_ids)
translation = tokenizer.decode(outputs[<span class="hljs-number">0</span>], skip_special_tokens=<span class="hljs-literal">True</span>)
print(<span class="hljs-string">f"Translation: <span class="hljs-subst">{translation}</span>"</span>)
</code></pre>
<p>This architecture handles variable-length input and output elegantly. The encoder turns an input of any length into a representation: in classic RNN-based designs, a single fixed-size vector; in transformer models such as T5, one vector per input token that the decoder attends to. The decoder then generates output one token at a time until it emits a special end-of-sequence token, so the output can be one sentence or six, regardless of how long the input was.</p>
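<p>The variable-length mechanics can be sketched with a toy RNN encoder-decoder in NumPy. This is a hypothetical, untrained model with random weights, so the tokens it produces are meaningless. What it shows is the shape of the computation: inputs of any length fold into one fixed-size vector, and the decoder loop stops either at an end-of-sequence token (ID 0 here, an arbitrary choice) or at a length limit:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 10, 8
EOS_ID = 0  # hypothetical end-of-sequence token, also used here as the start token

embeddings = rng.normal(size=(vocab_size, hidden_dim))
W_enc = rng.normal(size=(hidden_dim, hidden_dim)) * 0.5
W_dec = rng.normal(size=(hidden_dim, hidden_dim)) * 0.5
W_out = rng.normal(size=(vocab_size, hidden_dim))

def encode(token_ids):
    # Fold a sequence of any length into one fixed-size vector
    h = np.zeros(hidden_dim)
    for t in token_ids:
        h = np.tanh(embeddings[t] + W_enc @ h)
    return h

def decode(context, max_steps=6):
    # Generate greedily, one token at a time, starting from the encoder's summary
    h, token, output = context, EOS_ID, []
    for _ in range(max_steps):
        h = np.tanh(embeddings[token] + W_dec @ h)
        token = int(np.argmax(W_out @ h))
        if token == EOS_ID:  # the decoder itself decides when to stop
            break
        output.append(token)
    return output

short_seq, long_seq = [3, 5, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(encode(short_seq).shape, encode(long_seq).shape)  # (8,) (8,)
print(decode(encode(short_seq)), decode(encode(long_seq)))
</code></pre>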
<h3 id="heading-transformer-models">Transformer Models</h3>
<p>Unlike RNNs, which process text sequentially (one word at a time), transformers use a mechanism called self-attention that processes the whole sequence in parallel. This means a transformer can consider all the words in a sentence at once and directly compute the relationship between any two words, no matter how far apart they are.</p>
<p>In the sentence "The girl didn't go to school because she was ill," the model can directly connect "she" with "girl" even though other words sit between them. This makes training faster, because the computation runs in parallel, and avoids the information loss that builds up over time steps in RNNs. (Source: <a target="_blank" href="https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf">Vaswani and others).</a></p>
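<p>The mechanism behind those direct connections is scaled dot-product attention, as defined in the paper cited above: each word's query vector is compared with every word's key vector, the scores are scaled and turned into weights with a softmax, and each output is a weighted sum of value vectors. Here's a minimal single-head NumPy sketch. The embeddings and projection matrices are random stand-ins, so the printed weights carry no linguistic meaning; the point is that every word attends to every other word in one matrix multiplication, with no time steps in between:</p>
<pre><code class="lang-python">import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len): every word vs. every word
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

words = "The girl didn't go to school because she was ill".split()
rng = np.random.default_rng(0)
d_model = 8
X = rng.normal(size=(len(words), d_model))  # stand-in embeddings, one row per word

# Random projections; a trained transformer learns these matrices
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)

she = words.index("she")
print(output.shape)  # (10, 8): one context-aware vector per word
print(dict(zip(words, weights[she].round(2))))  # how much "she" attends to each word
</code></pre>
<p>In a trained model, the row for "she" would place a large weight on "girl"; real transformers also run several of these attention heads side by side.</p>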
<p>In the following example, BERT, one of the best-known transformer models, is used for sentiment classification. BERT learns general language patterns during pre-training, so it needs only a relatively small amount of additional training (fine-tuning) on labeled examples to classify text. The code below shows the prediction step:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> BertTokenizer, BertForSequenceClassification
<span class="hljs-keyword">import</span> torch

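<span class="hljs-comment"># Load pre-trained BERT</span>
<span class="hljs-comment"># Note: 'bert-base-uncased' isn't fine-tuned for sentiment, so its classification head</span>
<span class="hljs-comment"># starts with random weights and the scores below are meaningless until you fine-tune it</span>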
tokenizer = BertTokenizer.from_pretrained(<span class="hljs-string">'bert-base-uncased'</span>)
model = BertForSequenceClassification.from_pretrained(<span class="hljs-string">'bert-base-uncased'</span>)

<span class="hljs-comment"># Prepare input</span>
text = <span class="hljs-string">"This movie was fantastic!"</span>
inputs = tokenizer(text, return_tensors=<span class="hljs-string">"pt"</span>, padding=<span class="hljs-literal">True</span>, truncation=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Get predictions</span>
<span class="hljs-keyword">with</span> torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=<span class="hljs-number">-1</span>)

print(<span class="hljs-string">f"Prediction scores: <span class="hljs-subst">{predictions}</span>"</span>)
</code></pre>
<p>In the above code, the tokenizer converts the text into numerical token IDs that BERT understands and adds special tokens: [CLS] at the beginning, whose final representation is used for classification, and [SEP] at the end. BERT then processes the entire sentence through multiple layers, each of which learns increasingly abstract representations of meaning. Finally, the softmax turns the model's output logits into a probability for each class.</p>
<h2 id="heading-how-to-use-nlp-in-various-industries">How to Use NLP in Various Industries</h2>
<p>You can use NLP to solve issues in almost any sector, and there are many sector-specific implementations. You can choose to try the snippets below depending on the area you’re most interested in.</p>
<h3 id="heading-tourism-and-hospitality">Tourism and Hospitality</h3>
<p>You can use NLP techniques to build intelligent booking systems that understand natural language requests from clients. Key applications include:</p>
<ul>
<li><p><strong>Sentiment analysis</strong> monitors consumer feedback to spot patterns in satisfaction and problems with customer service.</p>
</li>
<li><p><strong>NER-enabled chatbots</strong> retrieve dates and locations from consumer inquiries such as "I need a flight to Paris next Tuesday."</p>
</li>
</ul>
<p>Here’s an example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

<span class="hljs-comment"># Load NER model</span>
ner = pipeline(<span class="hljs-string">"ner"</span>, aggregation_strategy=<span class="hljs-string">"simple"</span>)  <span class="hljs-comment"># Merge sub-word tokens into whole entities</span>

<span class="hljs-comment"># Extract booking information</span>
query = <span class="hljs-string">"I need a hotel in London from December 15 to December 20"</span>
entities = ner(query)

<span class="hljs-keyword">for</span> entity <span class="hljs-keyword">in</span> entities:
    print(<span class="hljs-string">f"<span class="hljs-subst">{entity[<span class="hljs-string">'entity_group'</span>]}</span>: <span class="hljs-subst">{entity[<span class="hljs-string">'word'</span>]}</span>"</span>)
<span class="hljs-comment"># Output: LOC: London (this model doesn't tag dates)</span>
</code></pre>
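<p>At the time of writing, the default model behind <code>pipeline("ner")</code> is fine-tuned on the CoNLL-2003 dataset, whose labels cover persons, locations, organizations, and miscellaneous entities, so the check-in and check-out dates aren't returned. One lightweight way to fill that gap is a rule-based pass. The sketch below uses a hypothetical regular expression that only handles the "Month day" pattern from the example; production systems typically rely on an NER model with a DATE label or a dedicated date parser:</p>
<pre><code class="lang-python">import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Hypothetical pattern: a capitalized month name followed by a 1- or 2-digit day
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}}\b")

def extract_dates(text):
    return DATE_PATTERN.findall(text)

query = "I need a hotel in London from December 15 to December 20"
dates = extract_dates(query)
print(dates)  # ['December 15', 'December 20']
if len(dates) == 2:
    check_in, check_out = dates
    print(f"Check-in: {check_in}, check-out: {check_out}")
</code></pre>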
<p>Machine translation lets you support customers in multiple languages. A BERT-based intent classification model can also identify what a customer wants, so the request is automatically routed to the right service or handled by an automated booking flow.</p>
<h3 id="heading-logistics-and-supply-chain">Logistics and Supply Chain</h3>
<p>You can automate document processing via NLP and optimize delivery routing using predictive algorithms. Let’s see the common areas of application:</p>
<ul>
<li><strong>You can use OCR to process documents</strong> to automatically extract shipping information from invoices and customs forms. Here’s an example:</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pytesseract
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image

<span class="hljs-comment"># Extract text from shipping document</span>
image = Image.open(<span class="hljs-string">'invoice.png'</span>)
text = pytesseract.image_to_string(image)

<span class="hljs-comment"># Parse extracted information</span>
<span class="hljs-comment"># (Add parsing logic based on document structure)</span>
</code></pre>
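<p>The parsing step depends entirely on your documents' layout. As a minimal sketch, assuming the OCR output contains "Label: value" lines, you could pull out fields with regular expressions. The field labels, keys, and sample text here are hypothetical:</p>
<pre><code class="lang-python">import re

# Hypothetical OCR output; real text from pytesseract is usually noisier
ocr_text = """INVOICE
Invoice Number: INV-20417
Ship To: Rotterdam, NL
Tracking Number: TRK-000123
Total Weight: 12.5 kg
"""

# Hypothetical field labels mapped to the keys you want in the result
FIELDS = {
    "invoice_number": r"Invoice Number:\s*(\S+)",
    "destination": r"Ship To:\s*(.+)",
    "tracking_number": r"Tracking Number:\s*(\S+)",
    "weight_kg": r"Total Weight:\s*([\d.]+)\s*kg",
}

def parse_fields(text):
    result = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # Store None when a field is missing so downstream code can flag it
        result[key] = match.group(1).strip() if match else None
    return result

print(parse_fields(ocr_text))
</code></pre>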
<ul>
<li><p><strong>Text classification</strong> can place shipments into categories based on their descriptions, allowing shipments to be sorted automatically for transport.</p>
</li>
<li><p><strong>Predictive routing models</strong> can use historical delivery data and weather reports to create delivery schedules.</p>
</li>
<li><p><strong>Natural Language Generation</strong> turns technical logistics data into user-friendly tracking updates.</p>
</li>
</ul>
<h3 id="heading-retail-and-ecommerce">Retail and eCommerce</h3>
<p>In eCommerce, you can use NLP techniques to personalize your customers’ shopping experience and optimize pricing.</p>
<p>Some key applications that you can benefit from:</p>
<ul>
<li><strong>Recommendation engines</strong> use embeddings of product descriptions and user reviews to suggest relevant items. Here’s an example:</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sentence_transformers <span class="hljs-keyword">import</span> SentenceTransformer, util

<span class="hljs-comment"># Load embedding model</span>
model = SentenceTransformer(<span class="hljs-string">'all-MiniLM-L6-v2'</span>)

<span class="hljs-comment"># Product descriptions</span>
products = [
    <span class="hljs-string">"Wireless Bluetooth headphones with noise cancellation"</span>,
    <span class="hljs-string">"USB-C charging cable for smartphones"</span>,
    <span class="hljs-string">"Noise-cancelling earbuds with long battery life"</span>
]

<span class="hljs-comment"># User query</span>
query = <span class="hljs-string">"I need headphones that block outside noise"</span>

<span class="hljs-comment"># Calculate similarities</span>
query_embedding = model.encode(query)
product_embeddings = model.encode(products)
similarities = util.cos_sim(query_embedding, product_embeddings)

<span class="hljs-comment"># Find best match</span>
best_match_idx = similarities.argmax()
print(<span class="hljs-string">f"Recommended product: <span class="hljs-subst">{products[best_match_idx]}</span>"</span>)
</code></pre>
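<p>The <code>util.cos_sim</code> function computes cosine similarity: the dot product of two vectors divided by the product of their lengths, so only the direction of the embeddings matters, not their magnitude. Here's a NumPy sketch of the same calculation, using made-up 3-dimensional vectors in place of the 384-dimensional embeddings that all-MiniLM-L6-v2 produces:</p>
<pre><code class="lang-python">import numpy as np

def cos_sim(a, b):
    # Normalize rows to unit length, then take dot products between every pair of rows
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Made-up embeddings for illustration only
query_embedding = np.array([0.9, 0.1, 0.3])
product_embeddings = np.array([
    [0.8, 0.2, 0.4],  # headphones
    [0.1, 0.9, 0.0],  # charging cable
    [0.7, 0.1, 0.5],  # earbuds
])

similarities = cos_sim(query_embedding, product_embeddings)
print(similarities.round(3))
print(int(similarities.argmax()))  # index of the most similar product
</code></pre>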
<ul>
<li><p><strong>Chatbots that include dialogue management</strong> can respond to inquiries from customers about products, orders, and returns.</p>
</li>
<li><p><strong>Sentiment analysis</strong> on social media tracks brand health and customer sentiment in real time.</p>
</li>
<li><p><strong>Price optimization algorithms</strong> analyze competitors' pricing and market signals to adjust prices in real time.</p>
</li>
<li><p><strong>Demand forecasting</strong> analyzes news and social sentiment to predict inventory needs.</p>
</li>
</ul>
<h3 id="heading-healthcare">Healthcare</h3>
<p>With its vast amount of data from patient records, healthcare is a natural fit for NLP. You can support clinical decision-making and process medical records using specialized NLP systems.</p>
<p>Here are a few of the possible uses and an example:</p>
<ul>
<li><strong>Clinical NER</strong> identifies conditions, medications, and treatments mentioned in the clinicians' notes within electronic health records. For instance:</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> spacy

<span class="hljs-comment"># Load medical NER model (requires installation of scispacy)</span>
<span class="hljs-comment"># pip install scispacy</span>
<span class="hljs-comment"># pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/en_core_sci_sm-0.5.1.tar.gz</span>

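<span class="hljs-comment"># Note: en_core_sci_sm tags every span with the generic label ENTITY; for labels</span>
<span class="hljs-comment"># such as DISEASE and CHEMICAL, try a scispaCy model like en_ner_bc5cdr_md</span>
nlp = spacy.load(<span class="hljs-string">"en_core_sci_sm"</span>)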

<span class="hljs-comment"># Process clinical note</span>
text = <span class="hljs-string">"Patient presents with hypertension and type 2 diabetes. Prescribed metformin 500mg."</span>
doc = nlp(text)

<span class="hljs-keyword">for</span> ent <span class="hljs-keyword">in</span> doc.ents:
    print(<span class="hljs-string">f"<span class="hljs-subst">{ent.text}</span>: <span class="hljs-subst">{ent.label_}</span>"</span>)
</code></pre>
<ul>
<li><p><strong>Clinical decision support systems</strong> scan descriptions of symptoms and provide suggestions for potential diagnoses to help a physician's decision-making.</p>
</li>
<li><p><strong>Literature mining</strong> scans clinical studies and identifies new treatment patterns or potential drug discovery targets.</p>
</li>
</ul>
<p>Of course, NLP can also be used in patient assistance chatbots, as they can comprehend natural language and its nuances.</p>
<h3 id="heading-financial-services">Financial Services</h3>
<p>In the finance sector, you may face some unique challenges. Gaps in financial data security and the risk of fraud are among the most serious, along with the regulatory fines these issues can bring.</p>
<p>With NLP, you can improve security mechanisms and create systems for detecting fraud. </p>
<p>With ML classifiers and NLP, including models that combine CNNs and RNNs, you can also detect spam and phishing messages with high accuracy. (Source: <a target="_blank" href="https://www.researchgate.net/publication/385251725_ScienceDirect_Advancements_of_SMS_Spam_Detection_A_Comprehensive_Survey_of_NLP_and_ML_Techniques">Saidat and others, ResearchGate).</a></p>
<p>Some other use cases include:</p>
<ul>
<li><p><strong>Document analysis</strong> automates the review of loan applications and contracts to help assess credit risk.</p>
</li>
<li><p><strong>Fraud detection systems analyze transaction data</strong> and communication data to identify suspicious activity. For example:</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

<span class="hljs-comment"># Load zero-shot classification model</span>
classifier = pipeline(<span class="hljs-string">"zero-shot-classification"</span>)

<span class="hljs-comment"># Analyze transaction description</span>
description = <span class="hljs-string">"Wire transfer to offshore account for investment opportunity"</span>
candidate_labels = [<span class="hljs-string">"legitimate transaction"</span>, <span class="hljs-string">"potential fraud"</span>, <span class="hljs-string">"suspicious activity"</span>]

result = classifier(description, candidate_labels)
print(<span class="hljs-string">f"Classification: <span class="hljs-subst">{result[<span class="hljs-string">'labels'</span>][<span class="hljs-number">0</span>]}</span> (Score: <span class="hljs-subst">{result[<span class="hljs-string">'scores'</span>][<span class="hljs-number">0</span>]:<span class="hljs-number">.4</span>f}</span>)"</span>)
</code></pre>
<ul>
<li><p><strong>Automated compliance monitoring</strong> scans messages for adherence with regulations.</p>
</li>
<li><p><strong>Robo-advisors leverage natural language interfaces</strong> to engage with clients while providing investment advice.</p>
</li>
</ul>
<p>Conventional chatbots also use NLP techniques to assist customers in finance, and OCR is widely used for document analysis. Since we’ve covered those use cases above, we won’t discuss them further here.</p>
<h3 id="heading-legal-industry-and-compliance-regulations">Legal Industry and Compliance Regulations</h3>
<p>Even more than financial services, the legal sector depends on strict requirements, laws, and regulations. NLP techniques can help you improve safety, security, and efficiency in processing legal documents.</p>
<p>Key examples of how it can be applied:</p>
<ul>
<li><p><strong>Multimodal authentication</strong> is a secure identity verification process consisting of a combination of facial recognition, voice recognition, and natural language processing.</p>
</li>
<li><p><strong>Speaker recognition</strong> verifies a person's identity from the characteristics of their voice. Combined with speech-to-text and intent recognition, it can process spoken answers to security questions.</p>
</li>
<li><p><strong>Contract analysis</strong> scans legal documents to identify key terms, deliverables, and dates, extracting that information automatically. As an example, you can try the following snippet with the spaCy library installed:</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> spacy

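<span class="hljs-comment"># Requires the English model: python -m spacy download en_core_web_sm</span>
nlp = spacy.load(<span class="hljs-string">"en_core_web_sm"</span>)

<span class="hljs-comment"># Extract dates, durations, and numbers from a contract clause</span>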
contract_text = <span class="hljs-string">"The agreement shall commence on January 1, 2026 and continue for a period of 12 months."</span>
doc = nlp(contract_text)

<span class="hljs-keyword">for</span> ent <span class="hljs-keyword">in</span> doc.ents:
    <span class="hljs-keyword">if</span> ent.label_ <span class="hljs-keyword">in</span> [<span class="hljs-string">"DATE"</span>, <span class="hljs-string">"CARDINAL"</span>]:
        print(<span class="hljs-string">f"<span class="hljs-subst">{ent.label_}</span>: <span class="hljs-subst">{ent.text}</span>"</span>)
</code></pre>
<ul>
<li><strong>Compliance monitoring</strong> looks for possible regulatory infractions in legal communications.</li>
</ul>
<p>These real-world examples show how NLP can solve practical business problems and improve operational efficiency. You can adapt the samples to your own use case or explore other applications; these are simply some of the most common ones to start with.</p>
<h2 id="heading-how-to-choose-the-most-effective-nlp-tools-and-libraries">How to Choose the Most Effective NLP Tools and Libraries</h2>
<p>Many tools and libraries can help you learn NLP or implement it in a project. Choose them based on your project’s needs and your familiarity with the underlying technologies.</p>
<p>Below are some popular tools you can choose to learn or check out, along with tips about when they’re most useful.</p>
<h3 id="heading-hugging-face-transformershttpshuggingfacecodocstransformersenindex"><a target="_blank" href="https://huggingface.co/docs/transformers/en/index">Hugging Face Transformers</a></h3>
<p>Hugging Face Transformers gives you access to thousands of pre-trained models for tasks such as text generation, classification, and question answering. It supports models in more than 100 languages and works with both PyTorch and TensorFlow.</p>
<p>The wider Hugging Face ecosystem also provides model hosting, datasets, and tools for collaborating with the community. It’s a strong choice for deep learning NLP applications that need state-of-the-art models.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762166821354/51df5644-2713-46c8-b24d-e0d7a51bd61d.png" alt="Hugging Face" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-nltkhttpswwwnltkorg-natural-language-toolkit"><a target="_blank" href="https://www.nltk.org/">NLTK</a> (Natural Language Toolkit)</h3>
<p>NLTK is one of the most widely used Python packages for NLP education and research. Originally developed at the University of Pennsylvania, it provides extensive tools for tokenization, stemming, parsing, and semantic reasoning. It’s a great choice if you’re learning NLP concepts or working on research projects.</p>
<h3 id="heading-spacyhttpsspacyio"><a target="_blank" href="https://spacy.io/">spaCy</a></h3>
<p>spaCy is a production-ready Python library known for its fast syntactic parser. It’s written in Cython for performance and offers excellent named entity recognition. It’s a good fit if you need strong dependency parsing and a developer-friendly API, and it makes quick prototyping easy.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762166854654/b563410d-b978-498a-a52e-9e6bc58f732c.png" alt="SpaCy" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-google-cloud-nlphttpscloudgooglecomnatural-language"><a target="_blank" href="https://cloud.google.com/natural-language">Google Cloud NLP</a></h3>
<p>Google Cloud NLP offers enterprise-level API services. It fits your project if you need sentiment analysis, entity recognition, syntax analysis, automatic language identification, and simple, trouble-free scaling. It’s an especially good fit if you’re already on Google Cloud and need to analyze large volumes of customer feedback.</p>
<h3 id="heading-amazon-comprehendhttpsawsamazoncomcomprehend"><a target="_blank" href="https://aws.amazon.com/comprehend/">Amazon Comprehend</a></h3>
<p>Comprehend is a fully managed AWS service for text analysis in the cloud. It supports the major functions you might need: sentiment analysis, entity recognition, topic modeling, built-in detection of personally identifiable information (PII), and auto-scaling. It’s a strong choice if you need built-in integration with other AWS services.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762167136170/6cc6a502-adbc-401f-a9e7-957d7facbe0c.png" alt="Amazon Comprehend" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-ibm-watsonhttpswwwibmcomdocsenwatsonxsaastopicscripts-watson-natural-language-processing"><a target="_blank" href="https://www.ibm.com/docs/en/watsonx/saas?topic=scripts-watson-natural-language-processing">IBM Watson</a></h3>
<p>Watson has NLP models tailored to regulated industries (healthcare, finance, and so on). Its library offers pre-trained models in 20 languages. Its top features are strong data controls, reliable REST API access, and compliance-ready outputs. These make it a great choice if you work in healthcare, finance, or the legal industry.</p>
<h3 id="heading-textblobhttpstextblobreadthedocsio"><a target="_blank" href="https://textblob.readthedocs.io/">TextBlob</a></h3>
<p>TextBlob is a beginner-friendly Python library for common NLP tasks. Its API is deliberately simple, yet it still provides decent sentiment analysis, translation, spelling correction, and noun phrase extraction. Beyond beginner projects, it’s also handy for building quick prototypes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762167252639/9961064e-1311-4423-bb6c-57aa15b11fc4.png" alt="TextBlob" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-prepare-and-train-nlp-systems">How to Prepare and Train NLP Systems</h2>
<p>Before you train your NLP model, you’ll need to prepare your data carefully so that its quality doesn’t hold back the results. Poor-quality data produces a poorly performing model, so make sure your data is solid.</p>
<h3 id="heading-understanding-data-quality-and-preprocessing">Understanding Data Quality and Preprocessing</h3>
<p>Raw text data is messy and unstructured. It contains typos, slang, and irrelevant information that degrades the performance of your model. </p>
<p>Preprocessing is the operation that takes messy data and converts it into clean, structured text that models can accept as input. </p>
<p>One review of NLP studies found that 85.4% of them used some form of restructuring or preprocessing so that NLP models could process raw text. The key data quality dimensions it identified were accuracy (68.3%), relevance (34.1%), and comparability (31.7%). (Source: <a target="_blank" href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10476151/">Nesca and others, NCBI).</a></p>
<p>Preprocessing comes down to a specific list of tasks you’ll need to perform. Let’s break them down.</p>
<h3 id="heading-text-cleaning">Text Cleaning</h3>
<p>Text cleaning is the process of standardizing the text format by removing anything that may affect model training. Raw text often contains extra elements (HTML tags, URLs, special characters, inconsistent use of capitalization, and excess whitespace) that add noise to your data.</p>
<p>The following example shows a cleaning function that removes these elements in several steps:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> re

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">clean_text</span>(<span class="hljs-params">text</span>):</span>
    <span class="hljs-comment"># Convert to lowercase</span>
    text = text.lower()

    <span class="hljs-comment"># Remove HTML tags</span>
    text = re.sub(<span class="hljs-string">r'&lt;[^&gt;]+&gt;'</span>, <span class="hljs-string">''</span>, text)    

    <span class="hljs-comment"># Remove URLs</span>
    text = re.sub(<span class="hljs-string">r'http\S+|www\.\S+'</span>, <span class="hljs-string">''</span>, text)

    <span class="hljs-comment"># Remove special characters and numbers</span>
    text = re.sub(<span class="hljs-string">r'[^a-zA-Z\s]'</span>, <span class="hljs-string">''</span>, text)    

    <span class="hljs-comment"># Remove extra whitespace</span>
    text = <span class="hljs-string">' '</span>.join(text.split())    

    <span class="hljs-keyword">return</span> text

<span class="hljs-comment"># Example</span>
raw_text = <span class="hljs-string">"Check out https://example.com! It's &lt;b&gt;AMAZING&lt;/b&gt; :-)"</span>
cleaned = clean_text(raw_text)
print(cleaned)
<span class="hljs-comment"># Output: check out its amazing</span>
</code></pre>
<p>The function first converts everything to lowercase for consistency. It then uses regular expressions to remove HTML tags, URLs, and any character that isn’t a letter or whitespace, which also strips numbers and punctuation. Finally, it normalizes whitespace by splitting the text and joining it back together with single spaces. The result is clean, standardized text that’s ready for tokenization.</p>
<h3 id="heading-tokenization">Tokenization</h3>
<p>The next step is to divide the text into smaller digestible chunks that are easier for ML models to understand. These chunks are known as tokens.</p>
<p>There are three common types of tokenization:</p>
<ul>
<li><p><strong>Word tokenization</strong> separates text according to punctuation and whitespace.</p>
</li>
<li><p><strong>Sentence tokenization</strong> uses punctuation cues to divide text into sentences.</p>
</li>
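<li><p><strong>Subword tokenization</strong> breaks words into smaller, frequently occurring units (for example, "unhappiness" might become "un", "happi", and "ness"), which lets a model handle rare and unseen words.</p>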
</li>
</ul>
<p>The example below demonstrates word and sentence tokenization using NLTK (Natural Language Toolkit):</p>
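<pre><code class="lang-python"><span class="hljs-keyword">import</span> nltk
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize, sent_tokenize

<span class="hljs-comment"># Download the Punkt tokenizer models (newer NLTK versions use 'punkt_tab')</span>
nltk.download(<span class="hljs-string">'punkt'</span>, quiet=<span class="hljs-literal">True</span>)
nltk.download(<span class="hljs-string">'punkt_tab'</span>, quiet=<span class="hljs-literal">True</span>)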

text = <span class="hljs-string">"Natural language processing is exciting! It helps computers understand text."</span>

<span class="hljs-comment"># Word tokenization</span>
words = word_tokenize(text)
print(<span class="hljs-string">f"Words: <span class="hljs-subst">{words}</span>"</span>)

<span class="hljs-comment"># Sentence tokenization</span>
sentences = sent_tokenize(text)
print(<span class="hljs-string">f"Sentences: <span class="hljs-subst">{sentences}</span>"</span>)
</code></pre>
<p>Notice that word tokenization treats punctuation marks such as '!' and '.' as separate tokens, since punctuation can carry meaning. Sentence tokenization correctly finds the boundary between the two sentences even though the first one ends with an exclamation mark, which shows that it uses more than a simple split on periods.</p>
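<p>NLTK covers word and sentence tokenization, while subword tokenizers, the third type listed above, are usually learned from a corpus. Byte-pair encoding (BPE), used in byte-level form by models such as GPT-2, starts from individual characters and repeatedly merges the most frequent adjacent pair of symbols. The following is a simplified, pure-Python sketch of that merge-learning loop on a tiny made-up corpus. Real tokenizers add byte-level handling, special tokens, and far larger vocabularies:</p>
<pre><code class="lang-python">from collections import Counter

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by how often each word occurs
    pairs = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(symbols, pair):
    # Greedily merge each left-to-right occurrence of `pair` into one symbol
    out = []
    for s in symbols:
        if out and (out[-1], s) == pair:
            out[-1] = out[-1] + s
        else:
            out.append(s)
    return tuple(out)

def learn_bpe(word_counts, num_merges):
    # Start with every word split into single characters
    vocab = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; the first one seen wins ties
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges, vocab

# Tiny made-up corpus: word -> frequency
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(corpus, num_merges=4)
print(merges)  # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
print(list(vocab))
</code></pre>
<p>After four merges, the frequent endings "est" and "low" have become single symbols, while rarer words stay split into smaller pieces.</p>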
<h3 id="heading-stop-word-removal">Stop Word Removal</h3>
<p>Next, you can strip the text down to its core meaning by removing common words that carry little semantic value on their own, known as “stop words”.</p>
<p>Common stop words include articles, prepositions, pronouns, auxiliary verbs and conjunctions. Here’s how to do it:</p>
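<pre><code class="lang-python"><span class="hljs-keyword">import</span> nltk
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize

<span class="hljs-comment"># Stop word list plus the Punkt models that word_tokenize relies on</span>
nltk.download(<span class="hljs-string">'stopwords'</span>, quiet=<span class="hljs-literal">True</span>)
nltk.download(<span class="hljs-string">'punkt'</span>, quiet=<span class="hljs-literal">True</span>)
nltk.download(<span class="hljs-string">'punkt_tab'</span>, quiet=<span class="hljs-literal">True</span>)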

text = <span class="hljs-string">"The quick brown fox jumps over the lazy dog"</span>
tokens = word_tokenize(text.lower())

stop_words = set(stopwords.words(<span class="hljs-string">'english'</span>))
filtered_tokens = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> tokens <span class="hljs-keyword">if</span> word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stop_words]

print(<span class="hljs-string">f"Original: <span class="hljs-subst">{tokens}</span>"</span>)
print(<span class="hljs-string">f"Filtered: <span class="hljs-subst">{filtered_tokens}</span>"</span>)
<span class="hljs-comment"># Filtered: ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']</span>
</code></pre>
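<p>Be careful with stop word removal in tasks like sentiment analysis: NLTK’s English list includes negations such as "not", "no", and "nor", and dropping them can flip a sentence’s meaning. A common workaround is to take those words out of the stop set before filtering. To keep the sketch self-contained, it uses a small hypothetical stop list; in practice you’d start from <code>set(stopwords.words('english'))</code> as above:</p>
<pre><code class="lang-python"># Small illustrative stop list (hypothetical, not NLTK's full list)
STOP_WORDS = {"the", "a", "an", "was", "is", "and", "to", "of", "not", "no", "nor"}
NEGATIONS = {"not", "no", "nor"}

def remove_stop_words(tokens, stop_words, keep=frozenset()):
    # Drop stop words, except any listed in `keep`
    effective = stop_words - set(keep)
    return [t for t in tokens if t not in effective]

tokens = "the movie was not good".split()
print(remove_stop_words(tokens, STOP_WORDS))                  # ['movie', 'good']
print(remove_stop_words(tokens, STOP_WORDS, keep=NEGATIONS))  # ['movie', 'not', 'good']
</code></pre>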
<h3 id="heading-stemming-and-lemmatization">Stemming and Lemmatization</h3>
<p>In this next step, you’ll reduce words to their root or base form, so that different variations of the same word are treated as a single token.</p>
<ul>
<li><p><strong>Stemming uses heuristic rules</strong> to chop off word endings. For example, running, runs → run.</p>
</li>
<li><p><strong>Lemmatization uses morphological analysis</strong> and a vocabulary to return a word’s dictionary form (its lemma). For example, children → child and mice → mouse.</p>
</li>
</ul>
<p>Lemmatization typically produces more accurate results, but it also requires more computation.</p>
<p>Here’s how you can apply both:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> nltk.stem <span class="hljs-keyword">import</span> PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = [<span class="hljs-string">"running"</span>, <span class="hljs-string">"runs"</span>, <span class="hljs-string">"ran"</span>, <span class="hljs-string">"children"</span>, <span class="hljs-string">"better"</span>]

print(<span class="hljs-string">"Stemming:"</span>)
<span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words:
    print(<span class="hljs-string">f"<span class="hljs-subst">{word}</span> -&gt; <span class="hljs-subst">{stemmer.stem(word)}</span>"</span>)

print(<span class="hljs-string">"\nLemmatization:"</span>)
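<span class="hljs-comment"># pos='v' lemmatizes every word as a verb, so "ran" -&gt; "run", while "children" and</span>
<span class="hljs-comment"># "better" come back unchanged; pass pos='n' or pos='a' for nouns and adjectives</span>
<span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words: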
    print(<span class="hljs-string">f"<span class="hljs-subst">{word}</span> -&gt; <span class="hljs-subst">{lemmatizer.lemmatize(word, pos=<span class="hljs-string">'v'</span>)}</span>"</span>)
</code></pre>
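<p>To make the difference between the two techniques concrete, here is a toy sketch in plain Python, independent of NLTK. It is not how PorterStemmer or WordNetLemmatizer work internally: the suffix rules (SUFFIXES) and the lemma table (LEMMAS) are hypothetical and deliberately tiny. The point is that a stemmer only applies string rules, while a lemmatizer looks words up in a vocabulary.</p>

```python
# Toy contrast between rule-based stemming and dictionary-based lemmatization.
# Both the suffix rules and the lemma table are hypothetical and tiny.

SUFFIXES = ["ing", "s"]  # hypothetical rules, tried in this order


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant ("runn" -> "run"), except l, s, z
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word


# Hypothetical vocabulary mapping inflected forms to their lemmas
LEMMAS = {"running": "run", "runs": "run", "ran": "run",
          "children": "child", "mice": "mouse"}


def toy_lemmatize(word):
    """Return the dictionary form if the word is known, else the word itself."""
    return LEMMAS.get(word, word)


for w in ["running", "runs", "ran", "children", "mice"]:
    print(f"{w}: stem={toy_stem(w)}, lemma={toy_lemmatize(w)}")
```

<p>The rule-based stemmer turns running and runs into run but leaves the irregular forms ran, children, and mice untouched. The lookup handles those irregular forms, but only for words that are in its vocabulary, which is why real lemmatizers ship with large lexical databases such as WordNet.</p>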
<h3 id="heading-expanding-contractions">Expanding Contractions</h3>
<p>NLP models benefit from standardized input. Without normalization, “don’t” and “do not” end up as different tokens even though they mean the same thing. That’s why you’ll typically expand contractions into their full forms. Here’s how you can do that with the contractions library:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> contractions

text = <span class="hljs-string">"I can't believe it's already 2025. We'll see what happens."</span>
expanded = contractions.fix(text)
print(expanded)
<span class="hljs-comment"># Output: I cannot believe it is already 2025. We will see what happens.</span>
</code></pre>
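<p>Conceptually, contraction expansion is a lookup-and-replace step. The following minimal sketch uses only Python’s re module. It is not the contractions library’s implementation: the CONTRACTIONS map is a small hypothetical sample, and a real expander needs a much larger list plus handling for ambiguous forms such as “it’s” (which can mean “it is” or “it has”).</p>

```python
import re

# Hypothetical, deliberately small contraction map; real lists are much longer
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",  # ambiguous: could also mean "it has"
    "we'll": "we will",
}

# One regex matching any listed contraction as a whole word, case-insensitively
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expansion, keeping a leading capital."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes first

    def replace(match):
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        if original[0].isupper():  # "We'll" -> "We will"
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return PATTERN.sub(replace, text)


print(expand_contractions("I can't believe it's already 2025. We'll see what happens."))
```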
<h3 id="heading-correcting-spelling-errors">Correcting Spelling Errors</h3>
<p>Spelling and grammar errors add noise to the data you feed to your ML models. You can correct them with statistical language models, which predict the most likely intended word; edit-distance algorithms, which find the closest valid word; or neural approaches, which learn patterns of common errors.</p>
<p>For example, let’s see how TextBlob, a library whose spelling correction combines a dictionary of known words with word-frequency statistics, detects and corrects misspellings.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textblob <span class="hljs-keyword">import</span> TextBlob

text = <span class="hljs-string">"Natral languag procesing is powrful"</span>
corrected = TextBlob(text).correct()
print(<span class="hljs-string">f"Original: <span class="hljs-subst">{text}</span>"</span>)
print(<span class="hljs-string">f"Corrected: <span class="hljs-subst">{corrected}</span>"</span>)
</code></pre>
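<p>TextBlob checks each word against its dictionary, generates candidate words within a small edit distance of any unknown word, and picks the candidate that occurs most frequently in its reference data. Because it corrects words one at a time rather than in context, it can miss errors that happen to form valid words, so review its corrections before relying on them.</p>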
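<p>TextBlob’s documentation credits Peter Norvig’s “How to Write a Spelling Corrector” as the basis of its spelling correction. Here is a minimal sketch of that idea in plain Python: generate every string one edit away, keep the ones that are known words, fall back to two edits if needed, and pick the most frequent candidate. The WORD_COUNTS table is hypothetical; a real corrector counts words in a large corpus.</p>

```python
from collections import Counter

# Hypothetical word counts; a real corrector would count words in a large text corpus
WORD_COUNTS = Counter({"natural": 50, "language": 80, "languages": 10,
                       "processing": 40, "is": 500, "powerful": 30})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Every string one edit away: a deletion, transposition, replacement, or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known word 1 edit away, then 2 edits."""
    if word in WORD_COUNTS:
        return word
    candidates = {w for w in edits1(word) if w in WORD_COUNTS}
    if not candidates:
        candidates = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in WORD_COUNTS}
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


text = "Natral languag procesing is powrful"
print(" ".join(correct(w) for w in text.lower().split()))
```

<p>Each misspelling here is one inserted letter away from a known word, so the sketch prints “natural language processing is powerful”. Words with no known candidate within two edits are returned unchanged.</p>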
<h3 id="heading-parts-of-speech-tagging">Parts-of-Speech Tagging</h3>
<p>Part-of-speech (POS) tagging assigns a grammatical category to each word based on its role in a sentence. This matters because the same word can act as different parts of speech depending on context. For example, "walk" can be a noun (as in "an evening walk") or a verb (as in "I walk to the office").</p>
<p>POS taggers rely on statistical models trained to predict the most likely grammatical role of a word given its context. The following code shows POS tagging with NLTK, whose pos_tag() function applies a pre-trained tagger to each token.</p>
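<pre><code class="lang-python"><span class="hljs-keyword">import</span> nltk

<span class="hljs-comment"># One-time downloads of the tokenizer and tagger models</span>
<span class="hljs-comment"># (recent NLTK releases use the 'punkt_tab' and 'averaged_perceptron_tagger_eng' names)</span>
nltk.download(<span class="hljs-string">'punkt'</span>)
nltk.download(<span class="hljs-string">'punkt_tab'</span>)
nltk.download(<span class="hljs-string">'averaged_perceptron_tagger'</span>)
nltk.download(<span class="hljs-string">'averaged_perceptron_tagger_eng'</span>)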

text = <span class="hljs-string">"The cat sat on the mat"</span>
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)

<span class="hljs-keyword">for</span> word, tag <span class="hljs-keyword">in</span> pos_tags:
    print(<span class="hljs-string">f"<span class="hljs-subst">{word}</span>: <span class="hljs-subst">{tag}</span>"</span>)
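<span class="hljs-comment"># Output (tag meanings in parentheses added for clarity):</span>
<span class="hljs-comment"># The: DT (Determiner)</span>
<span class="hljs-comment"># cat: NN (Noun)</span>
<span class="hljs-comment"># sat: VBD (Verb, past tense)</span>
<span class="hljs-comment"># on: IN (Preposition)</span>
<span class="hljs-comment"># the: DT</span>
<span class="hljs-comment"># mat: NN</span>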
</code></pre>
<p>The pos_tag() function assigns each token a tag from the Penn Treebank tag set. For example, DT marks determiners (such as "the"), NN marks singular nouns, VBD marks past-tense verbs, and IN marks prepositions. The tagger also uses context: it labels "sat" as VBD rather than NN partly because the word follows a noun and precedes a preposition, a typical pattern in English sentences.</p>
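<p>To see why context matters, compare the simplest statistical tagger: a most-frequent-tag (unigram) baseline that gives each word the tag it received most often in training data and ignores its neighbors. This is a sketch for illustration, not how NLTK’s tagger works; the small TRAINING corpus below is hypothetical.</p>

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training sentences using Penn Treebank tags
TRAINING = [
    [("The", "DT"), ("dog", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("rug", "NN")],
    [("A", "DT"), ("cat", "NN"), ("walks", "VBZ")],
    [("I", "PRP"), ("walk", "VBP"), ("to", "TO"), ("the", "DT"), ("office", "NN")],
    [("an", "DT"), ("evening", "NN"), ("walk", "NN")],
    [("They", "PRP"), ("walk", "VBP"), ("home", "NN")],
]


def train_unigram_tagger(sentences):
    """Map each lowercased word to the tag it was given most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NN"):
    """Tag each token independently; words never seen in training get `default`."""
    return [(token, model.get(token.lower(), default)) for token in tokens]


model = train_unigram_tagger(TRAINING)
print(tag("The cat sat on the mat".split(), model))
print(tag("an evening walk".split(), model))
```

<p>The baseline gets "The cat sat on the mat" right (the unseen word "mat" falls back to NN), but it tags "walk" in "an evening walk" as VBP, because VBP is the tag "walk" received most often in training. Context-aware taggers like NLTK’s exist precisely to resolve this kind of ambiguity.</p>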
<h2 id="heading-establishing-and-labeling-datasets">Establishing and Labeling Datasets</h2>
<p>Supervised learning tasks such as sentiment analysis, NER, or classification need labeled examples: unlabeled data alone can’t tell a model which output is correct.</p>
<p>To create training datasets, you annotate raw data with relevant labels. These labels act as the "ground truth" from which models learn patterns and make predictions. Let’s look at the most common labeling methods.</p>
<h3 id="heading-automated-labeling-based-on-libraries">Automated Labeling Based on Libraries</h3>
<p>You don’t always have to create labels from scratch. Libraries like TextBlob ship with built-in sentiment analyzers that can label text for you: they compute a polarity score (a number that represents sentiment), which you then map to a categorical label.</p>
<p>TextBlob’s default analyzer is lexicon-based. It looks at word choice, including modifiers such as "very" (which intensifies) and "not" (which negates), to compute a polarity score from -1 (most negative) to +1 (most positive), where 0 means neutral.</p>
<p>In this example, we automatically label sentiment using TextBlob’s built-in sentiment analyzer:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> textblob <span class="hljs-keyword">import</span> TextBlob

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">label_sentiment</span>(<span class="hljs-params">text</span>):</span>
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity

    <span class="hljs-keyword">if</span> polarity &lt; <span class="hljs-number">0</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"negative"</span>
    <span class="hljs-keyword">elif</span> polarity == <span class="hljs-number">0</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"neutral"</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"positive"</span>

<span class="hljs-comment"># Example</span>
texts = [
    <span class="hljs-string">"I love this product!"</span>,
    <span class="hljs-string">"It's okay, nothing special"</span>,
    <span class="hljs-string">"Terrible experience, very disappointed"</span>
]

<span class="hljs-keyword">for</span> text <span class="hljs-keyword">in</span> texts:
    label = label_sentiment(text)
    print(<span class="hljs-string">f"<span class="hljs-subst">{text}</span> -&gt; <span class="hljs-subst">{label}</span>"</span>)
</code></pre>
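<p>To illustrate how a lexicon-based polarity score with modifiers can work, here is a toy scorer in plain Python. It is not TextBlob’s algorithm: the LEXICON values, the intensifier multiplier, and the negation factor of -0.5 are hypothetical choices. It averages the scores of known sentiment words, adjusts each one by the word right before it, and applies the same thresholds as label_sentiment().</p>

```python
# Hypothetical mini-lexicon of word polarities in [-1, 1]; real lexicons hold thousands of entries
LEXICON = {"love": 0.5, "good": 0.7, "bad": -0.7, "terrible": -1.0, "disappointed": -0.75}
INTENSIFIERS = {"very": 1.3}  # hypothetical multiplier for the following sentiment word
NEGATIONS = {"not", "never", "no"}


def polarity(text):
    """Average the polarities of known words, adjusted by the word right before each one."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = []
    for i, word in enumerate(words):
        if word not in LEXICON:
            continue
        score = LEXICON[word]
        previous = words[i - 1] if i > 0 else ""
        if previous in INTENSIFIERS:
            score *= INTENSIFIERS[previous]
        elif previous in NEGATIONS:
            score *= -0.5  # flip and dampen: "not bad" is mildly positive, not "good"
        scores.append(max(-1.0, min(1.0, score)))  # clamp to [-1, 1]
    return sum(scores) / len(scores) if scores else 0.0


def label(text):
    """Below 0 is negative, exactly 0 is neutral, above 0 is positive."""
    score = polarity(text)
    if score < 0:
        return "negative"
    if score == 0:
        return "neutral"
    return "positive"


for text in ["I love this product!", "Not bad at all", "Terrible experience, very disappointed"]:
    print(f"{text} -> {label(text)}")
```

<p>Note how "Not bad at all" comes out positive even though "bad" is a negative word, and how a text with no lexicon words scores exactly 0 and is labeled neutral.</p>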
<h3 id="heading-manual-labeling">Manual Labeling</h3>
<p>In many cases, automated library-based labeling isn’t an option. When labels must follow domain-specific standards, annotate your data by hand to ensure accuracy and relevance.</p>
<p><strong>For projects involving manual labeling:</strong></p>
<ul>
<li><p><strong>Establish precise labeling standards.</strong> Write an annotation guideline that defines each label with clear criteria and edge cases. For example, when annotating customer support tickets, the guideline might specify that "I need help resetting my password" is labeled "Technical support," while "When will my order arrive?" is labeled "Order inquiry."</p>
</li>
<li><p><strong>For quality control, use several annotators</strong>. Have 2-3 people label the same data samples independently. For example, when annotating medical symptoms, multiple annotators reduce the risk that one person’s bias skews the labels and help catch clerical errors.</p>
</li>
<li><p><strong>Determine the inter-annotator agreement.</strong> Calculate <a target="_blank" href="https://numiqo.com/tutorial/cohens-kappa">Cohen's kappa</a> (for two annotators) or <a target="_blank" href="https://numiqo.com/tutorial/fleiss-kappa">Fleiss' kappa</a> (for more than two) to measure how consistently annotators agree beyond what chance alone would produce. As a common rule of thumb, a score above 0.80 signals strong agreement, while a score below 0.60 suggests the labeling guidelines aren’t clear enough for the annotators.</p>
</li>
<li><p><strong>Give instructions and illustrations</strong>. Create a reference document with 20-30 pre-labeled examples that cover both typical cases and edge cases. For example, in sentiment analysis, you might show that "This product is fine, I guess" should be labeled neutral even though it may read as slightly negative, and that "Not bad at all" is positive even though it contains two negative words.</p>
</li>
</ul>
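<p>Here is what Cohen’s kappa actually computes, as a minimal plain-Python sketch for two annotators: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of items the annotators agree on and p_e is the agreement you would expect by chance given each annotator’s label frequencies. The annotator labels below are made up for illustration. In practice, scikit-learn’s sklearn.metrics.cohen_kappa_score computes the same statistic.</p>

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two annotators agree
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label if each labeled
    # at random according to their own label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both used one identical label everywhere; kappa is undefined, treat as full agreement
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for the same six reviews
annotator_1 = ["pos", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neu", "pos", "neg"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))
```

<p>The two annotators agree on 5 of 6 items (p_o ≈ 0.83), chance agreement is 13/36, and kappa comes out at about 0.74: raw agreement looks high, but once chance is accounted for, the score falls between the 0.60 and 0.80 thresholds mentioned above.</p>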
<p>Manual labeling is the gold standard for high-quality datasets, but it’s also time-consuming and expensive – labeling 10,000 customer reviews by hand might take a week.</p>
<h3 id="heading-approaches-with-semi-supervision">Approaches with Semi-Supervision</h3>
<p>Sometimes it makes sense to combine manual and automated labeling. Semi-supervised learning uses a small, manually labeled dataset (high quality but expensive) together with a large pool of unlabeled data (cheap and plentiful).</p>
<p>A common method is iterative self-training: you train a model on the small labeled dataset, use it to predict labels for the unlabeled data, add the most confident predictions to the training set, and retrain. Repeating this loop gradually expands and improves your labeled dataset.</p>
<p>Here’s a minimal example of this workflow using scikit-learn’s SelfTrainingClassifier:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.semi_supervised <span class="hljs-keyword">import</span> SelfTrainingClassifier
<span class="hljs-keyword">from</span> sklearn.svm <span class="hljs-keyword">import</span> SVC

<span class="hljs-comment"># Small labeled dataset + large unlabeled dataset</span>
X_labeled = [[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>], [<span class="hljs-number">5</span>, <span class="hljs-number">6</span>]]
y_labeled = [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>]
X_unlabeled = [[<span class="hljs-number">2</span>, <span class="hljs-number">3</span>], [<span class="hljs-number">4</span>, <span class="hljs-number">5</span>], [<span class="hljs-number">6</span>, <span class="hljs-number">7</span>]]

<span class="hljs-comment"># Combine datasets (-1 represents unlabeled)</span>
X_train = X_labeled + X_unlabeled
y_train = y_labeled + [<span class="hljs-number">-1</span>, <span class="hljs-number">-1</span>, <span class="hljs-number">-1</span>]

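<span class="hljs-comment"># Self-training classifier (default confidence threshold: 0.75)</span>
base_classifier = SVC(probability=<span class="hljs-literal">True</span>, gamma=<span class="hljs-string">'auto'</span>)
self_training = SelfTrainingClassifier(base_classifier)
self_training.fit(X_train, y_train)

<span class="hljs-comment"># Labels used for each training sample after self-training (-1 = never pseudo-labeled)</span>
print(self_training.transduction_)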
</code></pre>
<p>SelfTrainingClassifier first fits the SVM on the three labeled examples, then predicts labels for the unlabeled points. It keeps only the predictions whose probability exceeds its confidence threshold (0.75 by default, configurable through the threshold parameter), adds those points to the training set as newly labeled data, and retrains. The process repeats until no new predictions pass the threshold or the iteration limit is reached.</p>
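<p>To make the self-training loop explicit, here is a from-scratch sketch in plain Python. It swaps the SVM for a simple nearest-centroid classifier, and its "confidence" is a softmax over negative distances, an assumption chosen for simplicity rather than a calibrated probability. The threshold=0.75 and max_iter=10 defaults mirror scikit-learn’s, and the data points are made up.</p>

```python
import math


def centroids(points, labels):
    """Return the mean point of each class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        total = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict_with_confidence(point, cents):
    """Nearest-centroid label plus a softmax over negative distances as a confidence score."""
    distances = {label: math.dist(point, c) for label, c in cents.items()}
    nearest = min(distances, key=distances.get)
    # Subtract the smallest distance before exponentiating to avoid underflow
    weights = {label: math.exp(-(d - distances[nearest])) for label, d in distances.items()}
    return nearest, weights[nearest] / sum(weights.values())


def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.75, max_iter=10):
    """Fit, pseudo-label confident points, and refit until nothing passes the threshold."""
    X, y, pool = list(X_labeled), list(y_labeled), list(X_unlabeled)
    for _ in range(max_iter):
        cents = centroids(X, y)                 # 1. train on the current labeled set
        confident, remaining = [], []
        for point in pool:                      # 2. predict labels for the unlabeled pool
            label, conf = predict_with_confidence(point, cents)
            if conf > threshold:
                confident.append((point, label))
            else:
                remaining.append(point)
        if not confident:                       # stop when no prediction is confident enough
            break
        for point, label in confident:          # 3. add confident pseudo-labels
            X.append(point)
            y.append(label)
        pool = remaining                        # 4. repeat with the points still unlabeled
    return X, y, pool


X, y, leftover = self_train([[0.0, 0.0], [10.0, 10.0]], ["a", "b"],
                            [[1.0, 1.0], [9.0, 9.0], [5.0, 5.2]])
print(y, leftover)
```

<p>The points near each cluster are pseudo-labeled in the first round, while [5.0, 5.2] sits almost halfway between the classes, never passes the threshold, and stays unlabeled. Leaving ambiguous points out is the whole purpose of the threshold: adding uncertain pseudo-labels can reinforce the model’s own mistakes.</p>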
<p>So how do you decide which approach will fit your needs? In most cases, the optimal one will depend on the following aspects:</p>
<ul>
<li><p>Available budget and time.</p>
</li>
<li><p>Desired accuracy.</p>
</li>
<li><p>Size of the dataset.</p>
</li>
<li><p>Complexity of the domain.</p>
</li>
</ul>
<p>As you can see, these approaches vary and can be combined. What matters most is that the data you finally feed to your model is clean, standardized, and correctly labeled. Remember the principle of “garbage in, garbage out”: feed your model gold instead. Good luck!</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>At this point, you should know the basics of working with NLP projects.</p>
<p><strong>Throughout this article, you've learned:</strong></p>
<ul>
<li><p>How to set up your NLP development environment using a set of tools and libraries.</p>
</li>
<li><p>The five parts of NLP systems and how they are used to process language.</p>
</li>
<li><p>How to conduct common tasks like NER, sentiment analysis, and text classification.</p>
</li>
<li><p>How to choose a library that fits your project’s needs.</p>
</li>
<li><p>How to prepare and label datasets for training.</p>
</li>
<li><p>How to identify practical NLP applications for your industry and use case.</p>
</li>
</ul>
<h3 id="heading-next-steps">Next steps</h3>
<ul>
<li><p>Start with a simple project, such as sentiment analysis with pre-trained models.</p>
</li>
<li><p>Practice preprocessing methods with your own text data.</p>
</li>
<li><p>Experiment with different libraries to see which gives the best results for your project.</p>
</li>
<li><p>Build a full pipeline from preparing text data to deploying models.</p>
</li>
<li><p>Keep practicing, then explore advanced topics such as transformer models and fine-tuning.</p>
</li>
</ul>
<p>Most importantly, keep in mind that NLP is an iterative process. Start small, test until your pipeline works, and add complexity as you grow more comfortable with the tools and practices.</p>
<h3 id="heading-about-the-author">About the author</h3>
<p>Hope you enjoyed the article and found it helpful. I’ve been a contributor to freeCodeCamp for more than 8 years, and to make this piece more precise and detailed, I used some expert help.</p>
<p>I’m grateful for the technical ideas of my co-workers at <a target="_blank" href="https://coaxsoft.com/">COAX Software</a> who wished to stay anonymous. The company is a well-regarded <a target="_blank" href="https://coaxsoft.com/services/ai-development-services">AI/ML development company.</a></p>
<p>To find out more about me and read more content on tech and digital, you can <a target="_blank" href="https://www.linkedin.com/in/oleg-romanyuk/">visit my LinkedIn page</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How I Built a Makaton AI Companion Using Gemini Nano and the Gemini API ]]>
                </title>
                <description>
                    <![CDATA[ When I started my research on AI systems that could translate Makaton (a sign and symbol language designed to support speech and communication), I wanted to bridge a gap in accessibility for learners with speech or language difficulties. Over time, t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-i-built-a-makaton-ai-companion-using-gemini-nano-and-the-gemini-api/</link>
                <guid isPermaLink="false">690e1f43cb50ea9684f6d9aa</guid>
                
                    <category>
                        <![CDATA[ geminiAPI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Computer Vision ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ gemini-nano ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ OMOTAYO OMOYEMI ]]>
                </dc:creator>
                <pubDate>Fri, 07 Nov 2025 16:33:07 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762533154134/e2209ade-6971-464b-aeef-f05abd0a30d7.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When I started my research on AI systems that could translate Makaton (a sign and symbol language designed to support speech and communication), I wanted to bridge a gap in accessibility for learners with speech or language difficulties.</p>
<p>Over time, this academic interest evolved into a working prototype that combines on-device AI and cloud AI to describe images and translate them into English meanings. The idea was simple: I wanted to build a lightweight web app that recognized Makaton gestures or symbols and instantly provided an English interpretation.</p>
<p>In this article, I’ll walk you through how I built my Makaton AI Companion, a single-page web app powered by Gemini Nano (on-device) and the Gemini API (cloud). You’ll see how it works, how I solved common issues like CORS and API model errors, and how this small project became part of my journey toward AI for accessibility.</p>
<p>By the end of this article, you will be able to:</p>
<ul>
<li><p>Understand the core concept behind Makaton and why it’s important in accessibility and inclusive education.</p>
</li>
<li><p>Learn how to combine on-device AI (Gemini Nano) and cloud-based AI (Gemini API) in a single web project.</p>
</li>
<li><p>Build a functional AI-powered web app that can describe images and map them to predefined English meanings.</p>
</li>
<li><p>Discover how to handle common errors such as model endpoint issues, missing API keys, and CORS restrictions when working with generative AI APIs.</p>
</li>
<li><p>Learn how to store API keys locally for user privacy using <code>localStorage</code>.</p>
</li>
<li><p>Use browser speech synthesis to convert the AI-generated English meanings into spoken output.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-tools-and-tech-stack">Tools and Tech Stack</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-building-the-app-step-by-step">Building the App Step by Step</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-fix-the-common-issues">How to Fix the Common Issues</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-demo-the-makaton-ai-companion-in-action">Demo: The Makaton AI Companion in Action</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-broader-reflections">Broader Reflections</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-tools-and-tech-stack">Tools and Tech Stack</h2>
<p>To build the Makaton AI Companion, I wanted something lightweight, fast to prototype, and easy for anyone to run without complicated dependencies. I chose a plain web stack with a focus on accessibility and transparency.</p>
<p>Here’s what I used:</p>
<h3 id="heading-frontend">Frontend</h3>
<ul>
<li><p><strong>HTML + CSS + JavaScript (Vanilla):</strong> No frameworks, just clean and understandable code that any beginner can follow.</p>
</li>
<li><p>A single <code>index.html</code> page handles the upload interface, output display, and AI logic.</p>
</li>
</ul>
<h3 id="heading-ai-components">AI Components</h3>
<ul>
<li><p><strong>Gemini Nano</strong> runs locally in Chrome Canary. This on-device model lets users generate short text without calling the cloud API.</p>
</li>
<li><p><strong>Gemini API (Cloud)</strong> is used as a fallback when on-device AI isn’t available or when image analysis is required.</p>
<ul>
<li><p>Models tested: <code>gemini-1.5-flash</code> and <code>gemini-pro-vision</code>.</p>
</li>
<li><p>Fallback logic ensures the app checks multiple model endpoints if one returns a 404 error.</p>
</li>
</ul>
</li>
</ul>
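<p>The article implements its fallback in JavaScript, and that code isn’t shown at this point, so here is only a language-neutral sketch of the control flow, written in Python for brevity. The call_model function is hypothetical (any function that takes a model name and a prompt and raises urllib.error.HTTPError on failure); the model list is the one the article tested. The key design choice is to move to the next model only on a 404, since other errors such as an invalid key won’t be fixed by switching models.</p>

```python
from urllib.error import HTTPError

# The two models the article tested, in the order to try them
MODELS = ["gemini-1.5-flash", "gemini-pro-vision"]


def generate_with_fallback(call_model, prompt, models=MODELS):
    """Try each model in turn, moving on only when its endpoint returns HTTP 404.

    call_model is a hypothetical function (model_name, prompt) -> text that
    raises urllib.error.HTTPError when a request fails.
    """
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except HTTPError as err:
            if err.code != 404:
                raise  # e.g. 400/403: a bad key or request won't be fixed by switching models
            last_error = err  # 404: this model endpoint isn't available, try the next one
    raise RuntimeError("None of the configured models are available") from last_error


def fake_call(model, prompt):
    """Stand-in for a real API call: pretend the first model's endpoint is missing."""
    if model == "gemini-1.5-flash":
        raise HTTPError("https://example.invalid", 404, "Not Found", None, None)
    return f"[{model}] description of: {prompt}"


print(generate_with_fallback(fake_call, "an open raised hand"))
```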
<h3 id="heading-local-storage">Local Storage</h3>
<ul>
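<li>The Gemini API key is stored in the browser’s localStorage, so it stays on the user’s own machine instead of being saved on a server; it is only sent to the Gemini API when the app makes a request. Keep in mind that any script running on the same page can read localStorage.</li>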
</ul>
<h3 id="heading-browser-speechsynthesis-api">Browser SpeechSynthesis API</h3>
<ul>
<li>Converts the translated English meaning into spoken audio with one click.</li>
</ul>
<h3 id="heading-mapping-logic">Mapping Logic</h3>
<ul>
<li>A small custom dictionary (<code>mapping.js</code>) links AI-generated descriptions to likely Makaton meanings. For example: <code>{ keywords: ["open hand", "raised hand", "wave"], meaning: "Hello / Stop" }</code></li>
</ul>
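<p>The matching logic behind such a dictionary can be very simple. Since the article’s <code>mapping.js</code> isn’t shown here, the following is only a sketch of one possible approach, written in Python for brevity: count how many of each entry’s keywords appear in the AI-generated description and return the meaning with the most hits. The first entry is the example above; the "Drink" entry is hypothetical.</p>

```python
# The first entry is the example from the article; the second is hypothetical
MAPPINGS = [
    {"keywords": ["open hand", "raised hand", "wave"], "meaning": "Hello / Stop"},
    {"keywords": ["cup", "drinking", "hand to mouth"], "meaning": "Drink"},
]


def map_to_meaning(description, mappings=MAPPINGS):
    """Return the meaning whose keywords occur most often in the description, or None."""
    text = description.lower()
    best_meaning, best_hits = None, 0
    for entry in mappings:
        # Simple substring matching: "wave" also matches "waves" (and "microwave")
        hits = sum(1 for keyword in entry["keywords"] if keyword in text)
        if hits > best_hits:  # ties keep the earlier entry
            best_meaning, best_hits = entry["meaning"], hits
    return best_meaning


print(map_to_meaning("A person with a raised hand, palm facing out, as if to wave"))
```

<p>Returning None when nothing matches lets the interface show an "unknown sign" message instead of guessing. Substring matching is crude, so a real mapping would benefit from word-boundary checks or synonyms.</p>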
<h3 id="heading-local-server">Local Server</h3>
<ul>
<li><p>The app is served locally using Python’s built-in HTTP server, because browsers block ES module scripts loaded from <code>file://</code> URLs with a CORS error:</p>
<p>  <code>python -m http.server 8080</code></p>
</li>
</ul>
<p>Then open <code>http://localhost:8080</code> in Chrome Canary.</p>
<h2 id="heading-building-the-app-step-by-step">Building the App Step by Step</h2>
<p>Now let’s dive into how the Makaton AI Companion works under the hood. This project follows a simple but effective flow: Upload an image → Describe (AI) → Map to Meaning → Speak or Copy the result.</p>
<p>We’ll go through each part step by step.</p>
<h3 id="heading-1-setting-up-the-project-folder">1. Setting Up the Project Folder</h3>
<p>You don’t need any complex setup. Just create a new folder and add these files:</p>
<pre><code class="lang-plaintext">makaton-ai-companion/
│
├── index.html
├── styles.css
├── app.js
└── lib/
    ├── mapping.js
    └── ai.js
</code></pre>
<p>If you prefer a ready-to-run version, you can serve everything from one zip (I’ll share a GitHub link at the end).</p>
<h3 id="heading-2-creating-the-basic-html-structure">2. Creating the Basic HTML Structure</h3>
<p>Your <code>index.html</code> file defines the interface where users upload an image, click <em>Describe</em>, and view the results.</p>
<pre><code class="lang-html"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span> /&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>/&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Makaton AI Companion<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">link</span> <span class="hljs-attr">rel</span>=<span class="hljs-string">"stylesheet"</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"styles.css"</span>/&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">header</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"app-header"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>🧩 Makaton AI Companion<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnSettings"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn secondary"</span>&gt;</span>Settings<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">header</span>&gt;</span>

  <span class="hljs-tag">&lt;<span class="hljs-name">main</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"container"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">section</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"card"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>1) Upload an image (Makaton sign/symbol)<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">for</span>=<span class="hljs-string">"file"</span>&gt;</span>
        Choose an image file
        <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"file"</span> <span class="hljs-attr">accept</span>=<span class="hljs-string">"image/*"</span> <span class="hljs-attr">title</span>=<span class="hljs-string">"Select an image file"</span>/&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"preview"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"preview hidden"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"status"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"status"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"actions"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnDescribe"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn"</span>&gt;</span>Describe (Cloud or Nano)<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnType"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn ghost"</span>&gt;</span>Type a description instead<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"typedBox"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"typed hidden"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">textarea</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"typed"</span> <span class="hljs-attr">rows</span>=<span class="hljs-string">"3"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Describe what you see..."</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">textarea</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnUseTyped"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn"</span>&gt;</span>Use this description<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">section</span>&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">section</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"card"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>2) AI Output<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"grid"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>Image Description<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"output"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"output"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>English Meaning (Mapped)<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"meaning"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"meaning"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"actions"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnSpeak"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn ghost"</span> <span class="hljs-attr">disabled</span>&gt;</span>🔊 Speak<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnCopy"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn ghost"</span> <span class="hljs-attr">disabled</span>&gt;</span>📋 Copy<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">section</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">main</span>&gt;</span>

  <span class="hljs-tag">&lt;<span class="hljs-name">dialog</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"settings"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">method</span>=<span class="hljs-string">"dialog"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"settings-form"</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Settings<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">label</span>&gt;</span>Gemini API key (optional)<span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"apiKey"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"password"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"AIza..."</span>/&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"settings-actions"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnSaveKey"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn"</span>&gt;</span>Save<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"btnCloseSettings"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"btn secondary"</span>&gt;</span>Close<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"apiStatus"</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"api-status"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">dialog</span>&gt;</span>

  <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"module"</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"lib/mapping.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"module"</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"lib/ai.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"module"</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"app.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>This interface is intentionally minimal: no frameworks, no build tools, just clear HTML.</p>
<h3 id="heading-3-mapping-descriptions-to-makaton-meanings">3. Mapping Descriptions to Makaton Meanings</h3>
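<p>The <code>mapping.js</code> file holds a simple keyword-based dictionary. When the AI describes an image (like <em>“a raised open hand”</em>), <code>mapDescriptionToMeaning</code> lowercases the description and returns the meaning of the first entry whose keywords appear in it. If nothing matches but the description mentions a hand, it asks for clarification. Otherwise it reports that no mapping was found.</p>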
<pre><code class="lang-javascript"><span class="hljs-comment">// lib/mapping.js</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> MAKATON_GLOSSES = [
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"open hand"</span>, <span class="hljs-string">"raised hand"</span>, <span class="hljs-string">"wave"</span>, <span class="hljs-string">"hand up"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Hello / Stop"</span> },
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"eat"</span>, <span class="hljs-string">"food"</span>, <span class="hljs-string">"spoon"</span>, <span class="hljs-string">"hand to mouth"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Eat"</span> },
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"drink"</span>, <span class="hljs-string">"cup"</span>, <span class="hljs-string">"glass"</span>, <span class="hljs-string">"bottle"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Drink"</span> },
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"home"</span>, <span class="hljs-string">"house"</span>, <span class="hljs-string">"roof"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Home"</span> },
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"sleep"</span>, <span class="hljs-string">"bed"</span>, <span class="hljs-string">"eyes closed"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Sleep"</span> },
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"book"</span>, <span class="hljs-string">"reading"</span>, <span class="hljs-string">"pages"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Book / Read"</span> },
  <span class="hljs-comment">// Added so the "Help" sign in the demo screenshot maps correctly:</span>
  { <span class="hljs-attr">keywords</span>: [<span class="hljs-string">"help"</span>, <span class="hljs-string">"assist"</span>, <span class="hljs-string">"thumb on palm"</span>, <span class="hljs-string">"hand over hand"</span>, <span class="hljs-string">"assisting"</span>], <span class="hljs-attr">meaning</span>: <span class="hljs-string">"Help"</span> },
];

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">mapDescriptionToMeaning</span>(<span class="hljs-params">desc</span>) </span>{
  <span class="hljs-keyword">if</span> (!desc) <span class="hljs-keyword">return</span> <span class="hljs-string">""</span>;
  <span class="hljs-keyword">const</span> d = desc.toLowerCase();
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> entry <span class="hljs-keyword">of</span> MAKATON_GLOSSES) {
    <span class="hljs-keyword">if</span> (entry.keywords.some(<span class="hljs-function"><span class="hljs-params">k</span> =&gt;</span> d.includes(k))) <span class="hljs-keyword">return</span> entry.meaning;
  }
  <span class="hljs-keyword">if</span> (d.includes(<span class="hljs-string">"hand"</span>)) <span class="hljs-keyword">return</span> <span class="hljs-string">"Gesture / Hand sign (clarify)"</span>;
  <span class="hljs-keyword">return</span> <span class="hljs-string">"No direct mapping found."</span>;
}
</code></pre>
<p>It’s simple but effective enough to simulate real symbol-to-language translation for demo purposes.</p>
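<p>Because <code>mapDescriptionToMeaning</code> uses plain substring checks, short keywords can fire inside longer words. For example, <code>"eat"</code> also matches <em>great</em> and <em>sweater</em>. The sketch below is a hypothetical stricter variant that is not part of the original project. It uses a trimmed copy of the glossary so it stays self-contained, and it pads the text with spaces so that only whole words and phrases match:</p>
<pre><code class="lang-javascript">// Hypothetical stricter variant of mapDescriptionToMeaning: whole words and
// phrases only, so "eat" no longer fires inside "great" or "sweater".
const GLOSSES = [
  { keywords: ["open hand", "raised hand", "wave", "hand up"], meaning: "Hello / Stop" },
  { keywords: ["eat", "food", "spoon", "hand to mouth"], meaning: "Eat" },
  { keywords: ["drink", "cup", "glass", "bottle"], meaning: "Drink" },
];

// Lowercase, turn every run of non-alphanumeric characters into one space,
// and pad both ends so each word is surrounded by spaces.
function normalize(text) {
  return " " + text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() + " ";
}

function mapDescriptionToMeaningStrict(desc, glosses = GLOSSES) {
  if (!desc) return "";
  const d = normalize(desc);
  for (const entry of glosses) {
    // " eat " can only appear in d if "eat" is a whole word there.
    if (entry.keywords.some(k => d.includes(normalize(k)))) return entry.meaning;
  }
  if (d.includes(" hand ")) return "Gesture / Hand sign (clarify)";
  return "No direct mapping found.";
}

console.log(mapDescriptionToMeaningStrict("A woman in a great big sweater")); // No direct mapping found.
console.log(mapDescriptionToMeaningStrict("A child eats with a spoon"));      // Eat
</code></pre>
<p>The trade-off is that inflected forms no longer match automatically. For example, “hands” does not match <code>hand</code>, so any plural or verb forms you care about need their own keywords.</p>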
<h3 id="heading-4-adding-gemini-ai-logic">4. Adding Gemini AI Logic</h3>
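<p>The <code>ai.js</code> file provides helpers for Gemini Nano (on-device, through the browser’s experimental Prompt API) and for the Gemini API (cloud). The fallback order itself lives in <code>app.js</code>. If an API key is saved, the app calls the cloud model first. Without a key, it uses Gemini Nano when the browser exposes it. This path is text-only, so Nano never sees the image and can only produce a generic guess. If neither option is available, or the call fails, the app lets users type a description manually.</p>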
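<p>Whatever order you choose, the pattern is a chain of strategies tried one after another. A hypothetical generic helper could express it as shown below. The name <code>firstSuccessful</code> is not part of the original code, and the strategies in the usage example are stand-ins rather than real API calls. Unlike the original <code>app.js</code>, which shows the typing box as soon as the cloud call throws, this version moves on to the next strategy:</p>
<pre><code class="lang-javascript">// Hypothetical helper: run async strategies in order and return the first
// non-empty result. Failures are collected so the caller can report them.
async function firstSuccessful(strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run();
      if (result) return { name, result, errors };
      errors.push(name + ": empty result");
    } catch (err) {
      errors.push(name + ": " + (err?.message || err));
    }
  }
  // Nothing worked: "manual" tells the caller to show the typing box.
  return { name: "manual", result: null, errors };
}

// Usage with stand-in strategies; no real API calls are made.
firstSuccessful([
  { name: "cloud", run: async () => { throw new Error("404 model not found"); } },
  { name: "nano", run: async () => "A raised open hand" },
]).then(({ name, result, errors }) => {
  console.log(name, result); // nano A raised open hand
  console.log(errors);       // [ 'cloud: 404 model not found' ]
});
</code></pre>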
<pre><code class="lang-javascript"><span class="hljs-comment">// lib/ai.js — dynamic model discovery (try-all version)</span>

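<span class="hljs-comment">// --- On-device availability (Gemini Nano) ---</span>
<span class="hljs-comment">// The self.ai.* shapes probed below come from early experimental Prompt API</span>
<span class="hljs-comment">// previews and may not exist in newer browser builds; the catch keeps that safe.</span>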
<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">checkAvailability</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> res = { <span class="hljs-attr">nanoTextPossible</span>: <span class="hljs-literal">false</span> };
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> canCreate = self.ai?.canCreateTextSession || self.ai?.languageModel?.canCreate;
    <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> canCreate === <span class="hljs-string">"function"</span>) {
      <span class="hljs-keyword">const</span> ok = <span class="hljs-keyword">await</span> (self.ai.canCreateTextSession?.() || self.ai.languageModel.canCreate?.());
      res.nanoTextPossible = ok === <span class="hljs-string">"readily"</span> || ok === <span class="hljs-string">"after-download"</span> || ok === <span class="hljs-literal">true</span>;
    }
  } <span class="hljs-keyword">catch</span> {}
  <span class="hljs-keyword">return</span> res;
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createNanoTextSession</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">if</span> (self.ai?.createTextSession) <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> self.ai.createTextSession();
  <span class="hljs-keyword">if</span> (self.ai?.languageModel?.create) <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> self.ai.languageModel.create();
  <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"Gemini Nano text session not available"</span>);
}

<span class="hljs-comment">// --- Cloud: dynamically discover models for this key ---</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">listModels</span>(<span class="hljs-params">key</span>) </span>{
  <span class="hljs-keyword">const</span> url = <span class="hljs-string">"https://generativelanguage.googleapis.com/v1/models?key="</span> + <span class="hljs-built_in">encodeURIComponent</span>(key);
  <span class="hljs-keyword">const</span> r = <span class="hljs-keyword">await</span> fetch(url);
  <span class="hljs-keyword">if</span> (!r.ok) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"ListModels failed: "</span> + (<span class="hljs-keyword">await</span> r.text()));
  <span class="hljs-keyword">const</span> j = <span class="hljs-keyword">await</span> r.json();
  <span class="hljs-keyword">return</span> (j.models || []).map(<span class="hljs-function"><span class="hljs-params">m</span> =&gt;</span> m.name).filter(<span class="hljs-built_in">Boolean</span>);
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">rankModels</span>(<span class="hljs-params">names</span>) </span>{
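  <span class="hljs-comment">// Heuristic ranking: each matching substring adds points, so names matching several terms</span>
  <span class="hljs-comment">// can outrank the intended favourites (e.g. "gemini-pro-vision" scores 18, "gemini-1.5-pro" 16).</span>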
  <span class="hljs-keyword">return</span> names
    .filter(<span class="hljs-function"><span class="hljs-params">n</span> =&gt;</span> n.startsWith(<span class="hljs-string">"models/"</span>))              <span class="hljs-comment">// ignore tunedModels, etc.</span>
    .filter(<span class="hljs-function"><span class="hljs-params">n</span> =&gt;</span> !n.includes(<span class="hljs-string">"experimental"</span>))          <span class="hljs-comment">// skip experimental</span>
    .sort(<span class="hljs-function">(<span class="hljs-params">a, b</span>) =&gt;</span> score(b) - score(a));

  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">score</span>(<span class="hljs-params">n</span>) </span>{
    <span class="hljs-keyword">let</span> s = <span class="hljs-number">0</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"1.5"</span>)) s += <span class="hljs-number">10</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"flash"</span>)) s += <span class="hljs-number">8</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"pro-vision"</span>)) s += <span class="hljs-number">7</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"pro"</span>)) s += <span class="hljs-number">6</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"vision"</span>)) s += <span class="hljs-number">5</span>;
    <span class="hljs-keyword">if</span> (n.includes(<span class="hljs-string">"latest"</span>)) s += <span class="hljs-number">2</span>;
    <span class="hljs-keyword">return</span> s;
  }
}

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">tryGenerateForModels</span>(<span class="hljs-params">imageDataUrl, key, models, mimeType</span>) </span>{
  <span class="hljs-keyword">const</span> base64 = imageDataUrl.split(<span class="hljs-string">","</span>)[<span class="hljs-number">1</span>];
  <span class="hljs-keyword">const</span> body = {
    <span class="hljs-attr">contents</span>: [{
      <span class="hljs-attr">parts</span>: [
        { <span class="hljs-attr">text</span>: <span class="hljs-string">"Describe this image briefly in one sentence focusing on the main gesture or symbol."</span> },
        { <span class="hljs-attr">inline_data</span>: { <span class="hljs-attr">mime_type</span>: mimeType || <span class="hljs-string">"image/png"</span>, <span class="hljs-attr">data</span>: base64 } }
      ]
    }]
  };
  <span class="hljs-keyword">let</span> lastErr = <span class="hljs-string">""</span>;
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> model <span class="hljs-keyword">of</span> models) {
    <span class="hljs-keyword">const</span> endpoint = <span class="hljs-string">"https://generativelanguage.googleapis.com/v1/"</span> + model + <span class="hljs-string">":generateContent?key="</span> + <span class="hljs-built_in">encodeURIComponent</span>(key);
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> r = <span class="hljs-keyword">await</span> fetch(endpoint, { <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>, <span class="hljs-attr">headers</span>: { <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span> }, <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify(body)});
      <span class="hljs-keyword">if</span> (!r.ok) { lastErr = <span class="hljs-keyword">await</span> r.text().catch(<span class="hljs-function">()=&gt;</span><span class="hljs-built_in">String</span>(r.status)); <span class="hljs-keyword">continue</span>; }
      <span class="hljs-keyword">const</span> j = <span class="hljs-keyword">await</span> r.json();
      <span class="hljs-keyword">const</span> text = j?.candidates?.[<span class="hljs-number">0</span>]?.content?.parts?.map(<span class="hljs-function"><span class="hljs-params">p</span>=&gt;</span>p.text).join(<span class="hljs-string">" "</span>).trim();
      <span class="hljs-keyword">if</span> (text) <span class="hljs-keyword">return</span> text;
      lastErr = <span class="hljs-string">"Empty response from "</span> + model;
    } <span class="hljs-keyword">catch</span> (e) {
      lastErr = <span class="hljs-built_in">String</span>(e?.message || e);
    }
  }
  <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"All discovered models failed. Last error: "</span> + lastErr);
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">describeImageWithGemini</span>(<span class="hljs-params">imageDataUrl, apiKey, mimeType = <span class="hljs-string">"image/png"</span></span>) </span>{
  <span class="hljs-keyword">if</span> (!apiKey) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"No API key provided"</span>);

  <span class="hljs-keyword">const</span> models = <span class="hljs-keyword">await</span> listModels(apiKey);
  <span class="hljs-keyword">if</span> (!models.length) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"No models returned for this key. Ensure Generative Language API is enabled and T&amp;Cs accepted in AI Studio."</span>);

  <span class="hljs-keyword">const</span> ranked = rankModels(models);
  <span class="hljs-keyword">if</span> (!ranked.length) <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">"No usable model names returned (models/*)."</span>);

  <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> tryGenerateForModels(imageDataUrl, apiKey, ranked, mimeType);
}

<span class="hljs-comment">// --- Key storage (local only) ---</span>
<span class="hljs-keyword">const</span> KEY = <span class="hljs-string">"makaton_demo_gemini_key"</span>;
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">saveApiKey</span>(<span class="hljs-params">k</span>) </span>{ <span class="hljs-built_in">localStorage</span>.setItem(KEY, k || <span class="hljs-string">""</span>); }
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">loadApiKey</span>(<span class="hljs-params"></span>) </span>{ <span class="hljs-keyword">return</span> <span class="hljs-built_in">localStorage</span>.getItem(KEY) || <span class="hljs-string">""</span>; }
</code></pre>
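<p>Note: This discover-and-retry approach matters because not every Gemini model is available to every API key, and model names change as versions are released and retired. Hard-coding a single model name is a common source of 404 errors when that model isn’t available to the key. Listing the models the key can actually see, then trying them in ranked order, avoids that.</p>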
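<p><code>rankModels</code> only sees model names. As a result, models that cannot generate content (such as embedding models) stay in the retry list. If the real candidates fail, the loop spends requests on them, and the final error message comes from a model that could never have worked. Each entry in the ListModels response also carries a <code>supportedGenerationMethods</code> array, which can be used to filter first. The sketch below assumes a response object shaped like the v1 ListModels JSON. The helper name and the sample model names are hypothetical:</p>
<pre><code class="lang-javascript">// Hypothetical helper: from a ListModels-style response, keep models that
// support generateContent, skip experimental names, and try "flash" first.
function usableModels(listModelsJson) {
  return (listModelsJson.models || [])
    .filter(m => (m.supportedGenerationMethods || []).includes("generateContent"))
    .map(m => m.name)
    .filter(name => name.startsWith("models/"))
    .filter(name => !name.includes("experimental"))
    // sort is stable, so models of equal rank keep their original order.
    .sort((a, b) => Number(b.includes("flash")) - Number(a.includes("flash")));
}

// Sample response; the model names are made up for illustration.
const sample = {
  models: [
    { name: "models/example-embedding", supportedGenerationMethods: ["embedContent"] },
    { name: "models/example-pro", supportedGenerationMethods: ["generateContent", "countTokens"] },
    { name: "models/example-flash", supportedGenerationMethods: ["generateContent"] },
  ],
};
console.log(usableModels(sample)); // [ 'models/example-flash', 'models/example-pro' ]
</code></pre>
<p>To use this helper, <code>listModels</code> would need to return the parsed JSON (or the model objects) rather than mapping each entry to its name first.</p>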
<h3 id="heading-5-the-main-logic-appjs">5. The Main Logic (app.js)</h3>
<p>This script ties everything together: file upload, AI call, meaning mapping, and output display.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// app.js</span>
<span class="hljs-keyword">import</span> { mapDescriptionToMeaning } <span class="hljs-keyword">from</span> <span class="hljs-string">'./lib/mapping.js'</span>;
<span class="hljs-keyword">import</span> { checkAvailability, createNanoTextSession, describeImageWithGemini, saveApiKey, loadApiKey } <span class="hljs-keyword">from</span> <span class="hljs-string">'./lib/ai.js'</span>;

<span class="hljs-built_in">document</span>.addEventListener(<span class="hljs-string">'DOMContentLoaded'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton] DOM ready'</span>);

  <span class="hljs-keyword">const</span> $ = <span class="hljs-function">(<span class="hljs-params">s</span>) =&gt;</span> <span class="hljs-built_in">document</span>.querySelector(s);

  <span class="hljs-comment">// Elements</span>
  <span class="hljs-keyword">const</span> fileInput   = $(<span class="hljs-string">'#file'</span>);
  <span class="hljs-keyword">const</span> preview     = $(<span class="hljs-string">'#preview'</span>);
  <span class="hljs-keyword">const</span> meaningEl   = $(<span class="hljs-string">'#meaning'</span>);
  <span class="hljs-keyword">const</span> outputEl    = $(<span class="hljs-string">'#output'</span>);
  <span class="hljs-keyword">const</span> btnDescribe = $(<span class="hljs-string">'#btnDescribe'</span>);
  <span class="hljs-keyword">const</span> btnType     = $(<span class="hljs-string">'#btnType'</span>);
  <span class="hljs-keyword">const</span> typedBox    = $(<span class="hljs-string">'#typedBox'</span>);
  <span class="hljs-keyword">const</span> typed       = $(<span class="hljs-string">'#typed'</span>);
  <span class="hljs-keyword">const</span> btnUseTyped = $(<span class="hljs-string">'#btnUseTyped'</span>);
  <span class="hljs-keyword">const</span> btnSpeak    = $(<span class="hljs-string">'#btnSpeak'</span>);
  <span class="hljs-keyword">const</span> btnCopy     = $(<span class="hljs-string">'#btnCopy'</span>);
  <span class="hljs-keyword">const</span> statusEl    = $(<span class="hljs-string">'#status'</span>);

  <span class="hljs-keyword">const</span> settings        = $(<span class="hljs-string">'#settings'</span>);
  <span class="hljs-keyword">const</span> btnSettings     = $(<span class="hljs-string">'#btnSettings'</span>);
  <span class="hljs-keyword">const</span> btnCloseSettings= $(<span class="hljs-string">'#btnCloseSettings'</span>);
  <span class="hljs-keyword">const</span> btnSaveKey      = $(<span class="hljs-string">'#btnSaveKey'</span>);
  <span class="hljs-keyword">const</span> apiKeyInput     = $(<span class="hljs-string">'#apiKey'</span>);
  <span class="hljs-keyword">const</span> apiStatus       = $(<span class="hljs-string">'#apiStatus'</span>);

  <span class="hljs-keyword">let</span> currentImageDataUrl = <span class="hljs-literal">null</span>;
  <span class="hljs-keyword">let</span> currentImageMime    = <span class="hljs-string">"image/png"</span>;

  <span class="hljs-comment">// Sanity logs</span>
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton] Elements:'</span>, {
    <span class="hljs-attr">fileInput</span>: !!fileInput, <span class="hljs-attr">preview</span>: !!preview, <span class="hljs-attr">outputEl</span>: !!outputEl,
    <span class="hljs-attr">meaningEl</span>: !!meaningEl, <span class="hljs-attr">btnDescribe</span>: !!btnDescribe, <span class="hljs-attr">statusEl</span>: !!statusEl
  });

  <span class="hljs-comment">// Init API key</span>
  <span class="hljs-keyword">if</span> (apiKeyInput) apiKeyInput.value = loadApiKey() || <span class="hljs-string">""</span>;

  <span class="hljs-comment">// --- Helpers ---</span>
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setStatus</span>(<span class="hljs-params">text</span>) </span>{
    <span class="hljs-keyword">if</span> (statusEl) statusEl.textContent = text || <span class="hljs-string">''</span>;
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton][Status]'</span>, text);
  }
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">clearOutputs</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">if</span> (outputEl) outputEl.textContent = <span class="hljs-string">''</span>;
    <span class="hljs-keyword">if</span> (meaningEl) meaningEl.textContent = <span class="hljs-string">''</span>;
    <span class="hljs-keyword">if</span> (btnSpeak) btnSpeak.disabled = <span class="hljs-literal">true</span>;
    <span class="hljs-keyword">if</span> (btnCopy)  btnCopy.disabled  = <span class="hljs-literal">true</span>;
  }
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setOutput</span>(<span class="hljs-params">desc</span>) </span>{
    <span class="hljs-keyword">if</span> (outputEl) outputEl.textContent = desc || <span class="hljs-string">''</span>;
    <span class="hljs-keyword">const</span> meaning = mapDescriptionToMeaning(desc || <span class="hljs-string">''</span>);
    <span class="hljs-keyword">if</span> (meaningEl) meaningEl.textContent = meaning;
    <span class="hljs-keyword">if</span> (btnSpeak) btnSpeak.disabled = !meaning || meaning.includes(<span class="hljs-string">'No direct mapping'</span>);
    <span class="hljs-keyword">if</span> (btnCopy)  btnCopy.disabled  = !meaning;
    setStatus(<span class="hljs-string">'Done.'</span>);
  }
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">fileToDataURL</span>(<span class="hljs-params">file</span>) </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve, reject</span>) =&gt;</span> {
      <span class="hljs-keyword">const</span> reader = <span class="hljs-keyword">new</span> FileReader();
      reader.onload  = <span class="hljs-function">() =&gt;</span> resolve(reader.result);
      reader.onerror = <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> reject(e);
      reader.readAsDataURL(file);
    });
  }
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleFiles</span>(<span class="hljs-params">files</span>) </span>{
    <span class="hljs-keyword">const</span> file = files?.[<span class="hljs-number">0</span>];
    <span class="hljs-keyword">if</span> (!file) { setStatus(<span class="hljs-string">'No file selected.'</span>); <span class="hljs-keyword">return</span>; }
    currentImageMime = file.type || <span class="hljs-string">"image/png"</span>;
    fileToDataURL(file)
      .then(<span class="hljs-function">(<span class="hljs-params">dataUrl</span>) =&gt;</span> {
        currentImageDataUrl = dataUrl;
        <span class="hljs-keyword">if</span> (preview) {
          preview.innerHTML = <span class="hljs-string">`&lt;img alt="preview" src="<span class="hljs-subst">${dataUrl}</span>" /&gt;`</span>;
          preview.classList.remove(<span class="hljs-string">'hidden'</span>);
        }
        setStatus(<span class="hljs-string">'Image loaded. Click "Describe" to continue.'</span>);
      })
      .catch(<span class="hljs-function">(<span class="hljs-params">err</span>) =&gt;</span> {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'[Makaton] fileToDataURL error'</span>, err);
        setStatus(<span class="hljs-string">'Could not read the image.'</span>);
      });
  }

  <span class="hljs-comment">// --- File input change ---</span>
  <span class="hljs-keyword">if</span> (fileInput) {
    fileInput.addEventListener(<span class="hljs-string">'change'</span>, <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton] file input change'</span>);
      handleFiles(e.target.files);
    });
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-built_in">console</span>.warn(<span class="hljs-string">'[Makaton] #file input not found in DOM.'</span>);
  }

  <span class="hljs-comment">// --- Drag &amp; drop support on preview area ---</span>
  <span class="hljs-keyword">if</span> (preview) {
    preview.addEventListener(<span class="hljs-string">'dragover'</span>, <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> { e.preventDefault(); preview.classList.add(<span class="hljs-string">'drag'</span>); });
    preview.addEventListener(<span class="hljs-string">'dragleave'</span>, <span class="hljs-function">() =&gt;</span> preview.classList.remove(<span class="hljs-string">'drag'</span>));
    preview.addEventListener(<span class="hljs-string">'drop'</span>, <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> {
      e.preventDefault();
      preview.classList.remove(<span class="hljs-string">'drag'</span>);
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton] drop'</span>);
      handleFiles(e.dataTransfer?.files);
    });
  }

  <span class="hljs-comment">// --- Describe click ---</span>
  <span class="hljs-keyword">if</span> (btnDescribe) {
    btnDescribe.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-keyword">async</span> () =&gt; {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'[Makaton] Describe clicked'</span>);
      <span class="hljs-keyword">if</span> (!currentImageDataUrl) { setStatus(<span class="hljs-string">'Please upload an image first.'</span>); <span class="hljs-keyword">return</span>; }
      clearOutputs();
      setStatus(<span class="hljs-string">'Checking on-device AI availability…'</span>);

      <span class="hljs-keyword">const</span> avail = <span class="hljs-keyword">await</span> checkAvailability().catch(<span class="hljs-function">() =&gt;</span> ({ <span class="hljs-attr">nanoTextPossible</span>: <span class="hljs-literal">false</span> }));
      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">const</span> apiKey = loadApiKey();
        <span class="hljs-keyword">if</span> (apiKey) {
          setStatus(<span class="hljs-string">'Using Gemini cloud for image description…'</span>);
          <span class="hljs-keyword">const</span> desc = <span class="hljs-keyword">await</span> describeImageWithGemini(currentImageDataUrl, apiKey, currentImageMime);
          setOutput(desc);
          <span class="hljs-keyword">return</span>;
        }
        <span class="hljs-keyword">if</span> (avail.nanoTextPossible) {
          setStatus(<span class="hljs-string">'No API key found. Using on-device AI (text) for best guess…'</span>);
          <span class="hljs-keyword">const</span> session = <span class="hljs-keyword">await</span> createNanoTextSession();
          <span class="hljs-keyword">const</span> desc = <span class="hljs-keyword">await</span> session.prompt(<span class="hljs-string">'Given an image is uploaded by the user (not directly visible to you), infer a likely one-sentence description of a common Makaton sign or symbol a teacher might upload. Keep it generic and safe.'</span>);
          setOutput(desc);
          <span class="hljs-keyword">return</span>;
        }
        setStatus(<span class="hljs-string">'No AI available. Please type a brief description.'</span>);
        <span class="hljs-keyword">if</span> (typedBox) typedBox.classList.remove(<span class="hljs-string">'hidden'</span>);
      } <span class="hljs-keyword">catch</span> (err) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'[Makaton] Describe error'</span>, err);
        setStatus(<span class="hljs-string">'Description failed: '</span> + (err?.message || err));
        <span class="hljs-keyword">if</span> (typedBox) typedBox.classList.remove(<span class="hljs-string">'hidden'</span>);
      }
    });
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-built_in">console</span>.warn(<span class="hljs-string">'[Makaton] Describe button not found.'</span>);
  }

  <span class="hljs-comment">// --- Manual typing flow ---</span>
  <span class="hljs-keyword">if</span> (btnType) {
    btnType.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
      <span class="hljs-keyword">if</span> (typedBox) typedBox.classList.remove(<span class="hljs-string">'hidden'</span>);
      <span class="hljs-keyword">if</span> (typed) typed.focus();
    });
  }
  <span class="hljs-keyword">if</span> (btnUseTyped) {
    btnUseTyped.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
      <span class="hljs-keyword">const</span> text = (typed?.value || <span class="hljs-string">''</span>).trim();
      <span class="hljs-keyword">if</span> (!text) { setStatus(<span class="hljs-string">'Type a description first.'</span>); <span class="hljs-keyword">return</span>; }
      setOutput(text);
    });
  }

  <span class="hljs-comment">// --- Utilities ---</span>
  <span class="hljs-keyword">if</span> (btnSpeak) {
    btnSpeak.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
      <span class="hljs-keyword">const</span> text = meaningEl?.textContent?.trim();
      <span class="hljs-keyword">if</span> (!text) <span class="hljs-keyword">return</span>;
      <span class="hljs-keyword">const</span> u = <span class="hljs-keyword">new</span> SpeechSynthesisUtterance(text);
      speechSynthesis.cancel();
      speechSynthesis.speak(u);
    });
  }
  <span class="hljs-keyword">if</span> (btnCopy) {
    btnCopy.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-keyword">async</span> () =&gt; {
      <span class="hljs-keyword">const</span> text = meaningEl?.textContent?.trim();
      <span class="hljs-keyword">if</span> (!text) <span class="hljs-keyword">return</span>;
      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">await</span> navigator.clipboard.writeText(text);
        setStatus(<span class="hljs-string">'Copied meaning to clipboard.'</span>);
      } <span class="hljs-keyword">catch</span> {
        setStatus(<span class="hljs-string">'Copy failed.'</span>);
      }
    });
  }

  <span class="hljs-comment">// --- Settings modal ---</span>
  <span class="hljs-keyword">if</span> (btnSettings &amp;&amp; settings) btnSettings.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> settings.showModal());
  <span class="hljs-keyword">if</span> (btnCloseSettings &amp;&amp; settings) btnCloseSettings.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> settings.close());
  <span class="hljs-keyword">if</span> (btnSaveKey) {
    btnSaveKey.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> {
      e.preventDefault();
      <span class="hljs-keyword">const</span> k = apiKeyInput?.value?.trim() || <span class="hljs-string">""</span>;
      saveApiKey(k);
      <span class="hljs-keyword">if</span> (apiStatus) apiStatus.textContent = k ? <span class="hljs-string">"API key saved locally. Try Describe again."</span> : <span class="hljs-string">"Cleared API key. You can still use on-device or typed mode."</span>;
    });
  }

  <span class="hljs-comment">// First status</span>
  setStatus(<span class="hljs-string">'Ready. Upload an image to begin.'</span>);
});
</code></pre>
<p>Let's break down the main sections of the <code>app.js</code> script for the Makaton AI Companion, as there’s a lot going on here:</p>
<ol>
<li><p><strong>Imports and Initial Setup:</strong></p>
<ul>
<li><p>The script imports functions from <code>mapping.js</code> and <code>ai.js</code> to handle mapping descriptions to meanings and AI interactions.</p>
</li>
<li><p>It wraps its setup code in a <code>DOMContentLoaded</code> event listener, so every element exists before the script tries to use it.</p>
</li>
</ul>
</li>
<li><p><strong>Element Selection:</strong></p>
<ul>
<li>It uses a helper function <code>$</code> to select DOM elements by their CSS selectors. This includes file inputs, buttons, and display areas for image previews and outputs.</li>
</ul>
</li>
<li><p><strong>Sanity Logs:</strong></p>
<ul>
<li>It logs whether each key element was found, so a missing ID or selector shows up in the console while debugging.</li>
</ul>
</li>
<li><p><strong>API Key Initialization:</strong></p>
<ul>
<li>It loads any saved API key from local storage and sets it in the input field for user convenience.</li>
</ul>
</li>
<li><p><strong>Helper Functions:</strong></p>
<ul>
<li><p><code>setStatus</code>: Updates the status message displayed to the user.</p>
</li>
<li><p><code>clearOutputs</code>: Clears the output and meaning display areas and disables buttons for speaking and copying.</p>
</li>
<li><p><code>setOutput</code>: Displays the AI-generated description and maps it to a Makaton meaning, enabling buttons if a valid meaning is found.</p>
</li>
<li><p><code>fileToDataURL</code>: Converts an uploaded file to a data URL for image preview and processing.</p>
</li>
<li><p><code>handleFiles</code>: Handles file selection, updating the preview and setting the current image data URL.</p>
</li>
</ul>
</li>
<li><p><strong>File Input Change Handling:</strong></p>
<ul>
<li>It listens for changes in the file input, processes the selected file, and updates the preview area.</li>
</ul>
</li>
<li><p><strong>Drag &amp; Drop Support:</strong></p>
<ul>
<li>It adds drag-and-drop functionality to the preview area, allowing users to drag files directly onto the app for processing.</li>
</ul>
</li>
<li><p><strong>Describe Button Click:</strong></p>
<ul>
<li><p>It handles the "Describe" button click event, checking for an uploaded image and attempting to describe it using either the Gemini API or on-device AI.</p>
</li>
<li><p>If no AI is available, it prompts the user to type a description manually.</p>
</li>
</ul>
</li>
<li><p><strong>Manual Typing Flow:</strong></p>
<ul>
<li>It allows users to manually type a description if AI processing is unavailable or fails, updating the output with the typed text.</li>
</ul>
</li>
<li><p><strong>Utilities:</strong></p>
<ul>
<li><p><code>btnSpeak</code>: Uses the browser's SpeechSynthesis API to read aloud the mapped meaning.</p>
</li>
<li><p><code>btnCopy</code>: Copies the mapped meaning to the clipboard for easy sharing.</p>
</li>
</ul>
</li>
<li><p><strong>Settings Modal:</strong></p>
<ul>
<li>It manages the settings modal for entering and saving the API key, providing feedback on the key's status.</li>
</ul>
</li>
<li><p><strong>Initial Status:</strong></p>
<ul>
<li>It sets the initial status message to guide the user to upload an image to begin the process.</li>
</ul>
</li>
</ol>
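<p>The <code>fileToDataURL</code> helper and the Describe handler boil down to turning an image into Base64 and attaching it to a request. Here's a minimal Python sketch of that data flow, not the app's actual <code>ai.js</code>: it splits a data URL into its MIME type and Base64 payload and places them in a body shaped like the <code>generateContent</code> examples in Google's Gemini REST documentation (<code>contents</code> / <code>parts</code> / <code>inline_data</code>), as I understand that format. The helper names and the prompt text are hypothetical.</p>

```python
import base64


def split_data_url(data_url: str) -> tuple[str, str]:
    """Split 'data:image/png;base64,AAAA...' into ('image/png', 'AAAA...')."""
    header, payload = data_url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, payload


def build_describe_request(data_url: str, prompt: str) -> dict:
    """Build a generateContent-style JSON body: one text part plus one inline image part."""
    mime_type, b64 = split_data_url(data_url)
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": b64}},
            ]
        }]
    }


# Usage: fake a tiny "image" so the example runs without a real file.
fake_png = b"\x89PNG fake bytes"
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")
body = build_describe_request(data_url, "Describe the hand shape in this image.")
```

<p>Storing the image once as a data URL lets the same string drive both the preview and the request, which matches how <code>handleFiles</code> is described above.</p>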
<p>This script effectively ties together the user interface, file handling, AI processing, and output display, providing a seamless experience for translating Makaton signs into English meanings.</p>
<h4 id="heading-how-vision-and-language-work-together-here">How Vision and Language Work Together Here</h4>
<p>While working on this project, I started appreciating how computer vision and language understanding complement each other in multimodal systems like this one.</p>
<ul>
<li><p>The vision model (Gemini or Nano) interprets <em>what it sees</em>, such as hand shapes, gestures, or layout, and turns that visual context into descriptive language.</p>
</li>
<li><p>The language mapping logic then interprets those words, infers intent, and finds the closest semantic match (e.g., “help,” “friend,” “eat”).</p>
</li>
<li><p>It’s a collaboration between two forms of understanding (<em>perceptual</em> and <em>semantic</em>) that together allow the AI to bridge the gap between gesture and meaning.</p>
</li>
</ul>
<p>This realization reshaped how I think about accessibility: the best assistive technologies often emerge not from smarter models alone, but from the interaction between modalities like seeing, describing, and reasoning in context.</p>
<h3 id="heading-6-optional-speak-and-copy">6. Optional — Speak and Copy</h3>
<p>To make the app more accessible, I added speech output and a quick copy button:</p>
<pre><code class="lang-javascript">btnSpeak.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> text = meaningEl.textContent.trim();
  <span class="hljs-keyword">if</span> (text) speechSynthesis.speak(<span class="hljs-keyword">new</span> SpeechSynthesisUtterance(text));
});

btnCopy.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">const</span> text = meaningEl.textContent.trim();
  <span class="hljs-keyword">if</span> (text) <span class="hljs-keyword">await</span> navigator.clipboard.writeText(text);
});
</code></pre>
<p>This gives users both visual and auditory feedback, which is especially helpful for learners and educators.</p>
<h2 id="heading-how-to-fix-the-common-issues">How to Fix the Common Issues</h2>
<p>No AI or web integration project runs smoothly the first time – and that’s okay. Here’s a breakdown of the main issues I faced while building the Makaton AI Companion, how I diagnosed them, and how I fixed each one.</p>
<p>These lessons will help anyone trying to integrate Gemini APIs, on-device AI, or local web apps without a full backend.</p>
<h3 id="heading-1-the-cors-error-when-running-with-file">1. The “CORS” Error When Running With <code>file://</code></h3>
<p>When I first opened my <code>index.html</code> directly from my file explorer, Chrome threw several CORS policy errors:</p>
<pre><code class="lang-plaintext">Access to script at 'file:///lib/ai.js' from origin 'null' has been blocked by CORS policy.
</code></pre>
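<p>At first this looked confusing, but the reason is simple: browsers load JavaScript modules (<code>import</code>/<code>export</code>) under CORS rules, and a page opened from a <code>file://</code> path has an opaque <code>null</code> origin, so those module requests are blocked for security reasons.</p>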
<p>✅ <strong>Fix:</strong> I realized I needed to serve the files over <strong>HTTP</strong>, not from the file system. So I ran a quick local web server using Python:</p>
<pre><code class="lang-bash">python -m http.server 8080
</code></pre>
<p>Then I opened:</p>
<pre><code class="lang-plaintext">http://localhost:8080/index.html
</code></pre>
<p>That single step fixed all the CORS errors and allowed my modules to load correctly.</p>
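<p><code>python -m http.server</code> is usually all you need. One related snag worth knowing about: browsers also refuse to run a module script unless the server sends a JavaScript MIME type, and on some Windows machines Python's server can pick up <code>text/plain</code> for <code>.js</code> files from the system registry. Here's a minimal sketch of a server that pins the MIME type, assuming Python 3.7 or later; <code>ModuleFriendlyHandler</code> and <code>make_server</code> are hypothetical names.</p>

```python
import functools
import http.server


class ModuleFriendlyHandler(http.server.SimpleHTTPRequestHandler):
    # The handler looks up file extensions in this map first, so .js and .mjs
    # files are always served as JavaScript, whatever the OS registry says.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".js": "text/javascript",
        ".mjs": "text/javascript",
    }


def make_server(directory: str = ".", port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Create (but don't start) a server on localhost that serves files from `directory`."""
    handler = functools.partial(ModuleFriendlyHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("localhost", port), handler)
```

<p>To use it, call <code>make_server().serve_forever()</code> from a script in the project folder and open <code>http://localhost:8080/index.html</code> as before.</p>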
<h3 id="heading-2-model-not-found-404-from-the-gemini-api">2. “Model Not Found” (404) From the Gemini API</h3>
<p>The next big challenge came from the Gemini API. Even though I had a valid API key, my console showed this error:</p>
<pre><code class="lang-plaintext">"models/gemini-1.5-flash" is not found for API version v1beta, or is not supported for generateContent.
</code></pre>
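<p>It turns out that which models you can call depends on the API version (<code>v1</code> vs <code>v1beta</code>) and on your project and key, and it changes over time as Google renames or retires older models. A hard-coded model name can therefore start returning 404.</p>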
<p>✅ <strong>Fix:</strong> I rewrote my <code>lib/ai.js</code> script to automatically <strong>try multiple Gemini model endpoints</strong> until it found one that worked. Something like this:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> GEMINI_IMAGE_ENDPOINTS = [
  <span class="hljs-string">"https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash:generateContent"</span>,
  <span class="hljs-string">"https://generativelanguage.googleapis.com/v1/models/gemini-1.5-pro:generateContent"</span>,
  <span class="hljs-string">"https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash-latest:generateContent"</span>,
];
</code></pre>
<p>And I wrapped it in a loop that stopped once one endpoint succeeded.</p>
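<p>The try-each-endpoint loop is a general fallback pattern. Here's a minimal Python sketch of it rather than the author's JavaScript: the HTTP call is passed in as a function so the logic can be tested without a network, the URLs are placeholders, and <code>first_successful</code> and <code>EndpointError</code> are hypothetical names. It returns the first endpoint that succeeds and collects every error so you can see why the others failed.</p>

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class EndpointError(Exception):
    """Raised when every candidate endpoint fails."""


def first_successful(endpoints: List[str], call: Callable[[str], T]) -> Tuple[str, T]:
    """Try each endpoint in order and return (endpoint, result) for the first one that works."""
    errors = []
    for url in endpoints:
        try:
            return url, call(url)
        except Exception as exc:  # e.g. an HTTP 404 for a model that isn't available
            errors.append(f"{url}: {exc}")
    raise EndpointError("All endpoints failed:\n" + "\n".join(errors))


# Usage with a fake caller: the "flash" model fails, the "pro" model works.
def fake_call(url: str) -> str:
    if "flash" in url:
        raise RuntimeError("404 model not found")
    return "A hand with the palm facing up."


endpoint, description = first_successful(
    ["https://example.test/models/model-flash", "https://example.test/models/model-pro"],
    fake_call,
)
```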
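<p>Later, I improved it further by listing the available models dynamically using<br><code>https://generativelanguage.googleapis.com/v1/models?key=YOUR_KEY</code> and automatically trying the ones that list <code>generateContent</code> among their supported generation methods, which is the method used to send an image and get a text description back.</p>
<p>That dynamic discovery approach fixed the 404 errors, and it is more resilient than a hard-coded list when models are renamed or retired.</p>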
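<p>The discovery step can be sketched as a filter over the models list. This Python sketch assumes the JSON shape returned by the Gemini API's <code>models</code> list endpoint, where each entry has a <code>name</code> and a <code>supportedGenerationMethods</code> array; the sample response is made up, and <code>pick_generate_content_models</code> is a hypothetical helper. It ignores pagination (<code>nextPageToken</code>), and supporting <code>generateContent</code> doesn't by itself guarantee a model accepts images, so a fallback loop is still worth keeping.</p>

```python
def pick_generate_content_models(list_response: dict, prefer: str = "flash") -> list:
    """Return model names that support generateContent, with preferred names first."""
    names = [
        m["name"]
        for m in list_response.get("models", [])
        if "generateContent" in m.get("supportedGenerationMethods", [])
    ]
    # sorted() is stable: preferred models move to the front, otherwise order is kept.
    return sorted(names, key=lambda n: prefer not in n)


# Made-up sample in the shape of a models-list response.
sample = {
    "models": [
        {"name": "models/text-embedding-x", "supportedGenerationMethods": ["embedContent"]},
        {"name": "models/gemini-pro-x", "supportedGenerationMethods": ["generateContent", "countTokens"]},
        {"name": "models/gemini-flash-x", "supportedGenerationMethods": ["generateContent"]},
    ]
}
candidates = pick_generate_content_models(sample)
```

<p>Because each <code>name</code> already includes the <code>models/</code> prefix, it can be dropped straight into <code>https://generativelanguage.googleapis.com/v1/{name}:generateContent</code> and tried in order.</p>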
<h3 id="heading-3-packaging-a-local-single-file-version">3. Packaging a Local Single-File Version</h3>
<p>Once I got everything working, I wanted a version that others could test easily without installing Node.js or running build tools.</p>
<p>✅ <strong>Fix:</strong> I bundled the project into a simple zip file containing:</p>
<pre><code class="lang-plaintext">index.html
app.js
lib/ai.js
lib/mapping.js
styles.css
</code></pre>
<p>That way, anyone can just unzip and run:</p>
<pre><code class="lang-bash">python -m http.server 8080
</code></pre>
<p>and open <code>localhost:8080</code>.</p>
<p>Everything runs in the browser with no server-side code of your own (the optional cloud mode still calls the Gemini API). That makes it handy for demos, classrooms, and similar settings.</p>
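<p>If you'd rather script the packaging step than zip the folder by hand, Python's standard <code>zipfile</code> module is enough. A minimal sketch, assuming the file layout listed above; <code>bundle_app</code> is a hypothetical helper and the archive name is up to you.</p>

```python
import zipfile
from pathlib import Path

APP_FILES = ("index.html", "app.js", "lib/ai.js", "lib/mapping.js", "styles.css")


def bundle_app(project_dir: str, zip_path: str, files=APP_FILES) -> None:
    """Zip the listed files, keeping their relative paths (e.g. lib/ai.js)."""
    root = Path(project_dir)
    # Fail early rather than shipping an archive with a missing module.
    missing = [f for f in files if not (root / f).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in files:
            zf.write(root / rel, arcname=rel)
```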
<h3 id="heading-4-debugging-script-import-errors-in-the-console">4. Debugging Script Import Errors in the Console</h3>
<p>Another subtle issue showed up as this red error in the console:</p>
<pre><code class="lang-plaintext">The requested module './lib/mapping.js' does not provide an export named 'mapDescriptionToMeaning'
</code></pre>
<p>That line told me exactly what was wrong: my import and export function names didn’t match. The fix was straightforward:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// app.js</span>
<span class="hljs-keyword">import</span> { mapDescriptionToMeaning } <span class="hljs-keyword">from</span> <span class="hljs-string">'./lib/mapping.js'</span>;
</code></pre>
<p>And then ensuring the mapping file exported it:</p>
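<pre><code class="lang-javascript"><span class="hljs-comment">// mapping.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">function</span> mapDescriptionToMeaning(desc) { ... }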
</code></pre>
<p>After that, all the pieces connected smoothly.</p>
<p>Using the browser console <strong>as my debugging dashboard</strong> turned out to be the most powerful tool of all. Every fix started by reading and reasoning about those red error lines.</p>
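<p>For a project this small, you can also catch import/export name mismatches before opening the browser. The Python sketch below is a rough regex heuristic, not a real JavaScript parser: it only understands <code>import { a, b } from './x.js'</code> and <code>export function/const/let/var/class</code> declarations, and <code>missing_named_exports</code> is a hypothetical helper.</p>

```python
import re

# Named imports: import { a, b as c } from './path.js'
IMPORT_RE = re.compile(r"import\s*\{([^}]*)\}\s*from\s*['\"]([^'\"]+)['\"]")
# Declaration exports: export [async] function|const|let|var|class name
EXPORT_RE = re.compile(r"export\s+(?:async\s+)?(?:function\*?|const|let|var|class)\s+([A-Za-z_$][\w$]*)")


def missing_named_exports(importer_src: str, module_path: str, module_src: str) -> list:
    """Names that importer_src imports from module_path but module_src never exports."""
    exported = set(EXPORT_RE.findall(module_src))
    missing = []
    for names, path in IMPORT_RE.findall(importer_src):
        if path != module_path:
            continue
        for item in names.split(","):
            # For 'name as alias', the exported name is the part before 'as'.
            name = item.strip().split(" as ")[0].strip()
            if name and name not in exported:
                missing.append(name)
    return missing


app_js = "import { mapDescriptionToMeaning } from './lib/mapping.js';"
broken = "export function mapDescriptionToMeanings(desc) { return desc; }"
fixed = "export function mapDescriptionToMeaning(desc) { return desc; }"
```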
<h2 id="heading-demo-the-makaton-ai-companion-in-action">Demo: The Makaton AI Companion in Action</h2>
<p>Let’s see the Makaton AI Companion in action and understand what’s happening under the hood.</p>
<h3 id="heading-step-1-run-the-app-locally">Step 1: Run the app locally</h3>
<p>Once you’ve downloaded or cloned the project folder, open your terminal in that directory and start a local development server: <code>python -m http.server 8080</code>. Then open your browser and visit: <code>http://localhost:8080/index.html</code></p>
<p>You should see the Makaton AI Companion interface:</p>
<p><img src="https://github.com/tayo4christ/makaton-ai-companion/blob/9cc834fa75f6dcd39866c538ed42255f9006bb51/assets/app-interface.jpg?raw=true" alt="Main interface of the Makaton AI Companion app" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-2-get-your-gemini-api-key">Step 2: Get Your Gemini API Key</h3>
<p>To enable cloud-based image description, you’ll need a <a target="_blank" href="https://aistudio.google.com/welcome?utm_source=PMAX&amp;utm_medium=display&amp;utm_campaign=FY25-global-DR-pmax-1710442&amp;utm_content=pmax&amp;gclsrc=aw.ds&amp;gad_source=1&amp;gad_campaignid=21521981511&amp;gbraid=0AAAAACn9t66nbeHlpP_VYvpWIrX7IJGEW&amp;gclid=EAIaIQobChMIqf-KiIHbkAMV1ZFQBh0KHA8wEAAYASAAEgKLA_D_BwE"><strong>Gemini API key</strong></a> from Google AI Studio.</p>
<p><strong>Here’s how to generate one:</strong></p>
<ol>
<li><p>Visit: <code>https://aistudio.google.com/welcome</code></p>
</li>
<li><p>Click <strong>“Create API key”</strong> and link it to your Google Cloud project (or create a new one).</p>
</li>
<li><p>Copy the key. It will look something like this: <code>AIzaSyA...XXXXXXXXXXXX</code></p>
</li>
<li><p>Open the Makaton AI Companion in your browser and click the <strong>Settings</strong> button (top left).</p>
</li>
<li><p>Paste your key in the input box and click <strong>Save</strong>.</p>
</li>
</ol>
<p><img src="https://github.com/tayo4christ/makaton-ai-companion/blob/9cc834fa75f6dcd39866c538ed42255f9006bb51/assets/api-key-setting.jpg?raw=true" alt="Setting up the Gemini API key in the app's Settings dialog" width="600" height="400" loading="lazy"></p>
<p>You’ll see a confirmation message like this:</p>
<blockquote>
<p><em>“API key saved locally. Try Describe again.”</em></p>
</blockquote>
<p>This means your key is stored in your browser's <code>localStorage</code> (in plain text, scoped to this page's origin) rather than on a server, so it is only accessible from your browser.</p>
<h3 id="heading-step-3-enable-gemini-nano-for-on-device-ai">Step 3: Enable Gemini Nano for On-Device AI</h3>
<p>If you’re using <a target="_blank" href="https://www.google.com/intl/en_uk/chrome/canary/"><strong>Chrome Canary</strong></a>, you can run Gemini Nano locally without internet access. This allows the Makaton AI Companion to generate text even when the API key isn’t set.</p>
<h4 id="heading-download-and-install-chrome-canary">Download and Install Chrome Canary:</h4>
<p>Visit the official Chrome Canary download page and install it on your Windows or macOS system. Chrome Canary is a special version of Chrome designed for developers and early adopters, offering the latest features and updates.</p>
<h4 id="heading-enable-gemini-nano">Enable Gemini Nano:</h4>
<p>Open Chrome Canary and type <code>chrome://flags/#prompt-api-for-gemini-nano</code> in the address bar.</p>
<p>Locate the "Prompt API for Gemini Nano" flag in the list. Set this flag to <strong>Enabled</strong>. This action allows Chrome Canary to support the Gemini Nano model for on-device AI processing.</p>
<p>After enabling the flag, relaunch Chrome Canary to apply the changes.</p>
<h4 id="heading-download-the-gemini-nano-model">Download the Gemini Nano Model:</h4>
<p>Open a new tab in Chrome Canary and enter <code>chrome://components</code> in the address bar.</p>
<p>Scroll down to find the <strong>“Optimization Guide On Device Model”</strong> component and click <strong>Check for update</strong>. This starts the download of the Gemini Nano model, which the app needs to run AI tasks locally without an internet connection.</p>
<h4 id="heading-verify-installation">Verify Installation:</h4>
<p>Once the Gemini Nano model is installed, the Makaton AI Companion app will automatically detect it. You should see a message indicating that the app is using on-device AI: <em>“No API key found. Using on-device AI (text) for best guess…”</em></p>
<p>This confirmation means that the app can now generate text descriptions using the Gemini Nano model without needing an API key or internet access.</p>
<p>By following these detailed steps, you ensure that the Gemini Nano model is correctly set up and ready to use for on-device AI processing in the Makaton AI Companion.</p>
<h3 id="heading-step-4-upload-a-makaton-sign-or-symbol">Step 4: Upload a Makaton sign or symbol</h3>
<p>Click <strong>Choose File</strong> to upload any Makaton image (for example, the “help” sign), then press <strong>Describe (Cloud or Nano)</strong>. You’ll immediately see console logs confirming that the app is running correctly and connecting to the Gemini API:</p>
<p><img src="https://github.com/tayo4christ/makaton-ai-companion/blob/9cc834fa75f6dcd39866c538ed42255f9006bb51/assets/console.jpg?raw=true" alt="Console output showing real-time translation logs" width="600" height="400" loading="lazy"></p>
<h3 id="heading-step-5-ai-description-and-mapping">Step 5: AI Description and Mapping</h3>
<p>Here’s what happens next:</p>
<ol>
<li><p>The image is read and encoded as Base64.</p>
</li>
<li><p>Gemini, either through the cloud API or Gemini Nano on-device, generates a short visual description.</p>
</li>
<li><p>The description is passed to the <code>mapDescriptionToMeaning()</code> function.</p>
</li>
<li><p>If keywords match an entry in the <code>MAKATON_GLOSSES</code> dictionary, the app displays the corresponding English meaning.</p>
</li>
<li><p>Finally, users can click <strong>Speak</strong> or <strong>Copy</strong> to hear or reuse the translation.</p>
</li>
</ol>
<p>Example outputs:</p>
<p><strong>When no mapping is found:</strong><br>The AI description is accurate but doesn’t yet match a known Makaton keyword.</p>
<p><img src="https://github.com/tayo4christ/makaton-ai-companion/blob/9cc834fa75f6dcd39866c538ed42255f9006bb51/assets/Incorrect-demonstration.jpg?raw=true" alt="Demonstration where the AI description does not yet match any Makaton keyword" width="600" height="400" loading="lazy"></p>
<p><strong>After updating the mapping list:</strong><br>Adding new keywords like <code>"help"</code>, <code>"assist"</code>, or <code>"hand over hand"</code> enables correct translation.</p>
<p><img src="https://github.com/tayo4christ/makaton-ai-companion/blob/9cc834fa75f6dcd39866c538ed42255f9006bb51/assets/correct-demonstration.jpg?raw=true" alt="Correct demonstration where the AI accurately recognizes the Makaton sign" width="600" height="400" loading="lazy"></p>
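<p>Steps 3 and 4 above are keyword matching, which is why adding keywords fixes the missed case. Here's a minimal Python sketch of the idea. The real <code>MAKATON_GLOSSES</code> lives in <code>lib/mapping.js</code>; the entries, the "most keyword hits wins" rule, and the <code>map_description_to_meaning</code> name below are illustrative assumptions rather than the app's actual code.</p>

```python
import re

# Hypothetical glossary: English meaning -> keywords/phrases that suggest it.
MAKATON_GLOSSES = {
    "help": ["help"],
    "eat": ["eat", "food", "fingers to mouth"],
    "friend": ["friend", "hands clasped"],
}


def map_description_to_meaning(description: str, glosses=MAKATON_GLOSSES):
    """Return the meaning whose keywords appear most often in the description, or None."""
    text = description.lower()
    best, best_hits = None, 0
    for meaning, keywords in glosses.items():
        # Whole-word/phrase matches only, so "eat" does not match "seat".
        hits = sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text))
        if hits > best_hits:
            best, best_hits = meaning, hits
    return best


desc = "One hand rests on the other in a hand over hand gesture, as if to assist."
before = map_description_to_meaning(desc)  # no keyword matches yet
MAKATON_GLOSSES["help"] += ["assist", "hand over hand"]
after = map_description_to_meaning(desc)
```

<p>Matching on whole words keeps short keywords from firing inside longer ones, at the cost of missing synonyms you haven't listed, which is exactly the gap that updating the mapping list closes.</p>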
<h3 id="heading-why-this-matters">Why this matters</h3>
<p>This demonstrates how accessible, AI-assisted tools can support communication for people who rely on Makaton. Even when a gesture isn’t recognized, the system provides a structured output and allows users or educators to expand the mapping list, making the tool smarter over time.</p>
<h2 id="heading-broader-reflections">Broader Reflections</h2>
<p>Building this project turned out to be much more than a coding exercise for me.<br>It was a meaningful experiment in combining accessibility, natural language processing, and computer vision. These three fields, when brought together, can create real social impact.</p>
<p>While working on it, I began to understand how computer vision and language understanding complement each other in practice. The vision model perceives the world by identifying shapes, gestures, and spatial patterns, while the language model interprets what those visuals mean in human terms.<br>In this project, the artificial intelligence system first sees the Makaton sign, then describes it, and finally maps it to an English word that carries intent and meaning.</p>
<p>This interaction between perception and semantics is what makes multimodal artificial intelligence so powerful. It is not only about recognizing an image or generating text; it is about building systems that connect understanding across different forms of information to make technology more inclusive and human-centered.</p>
<p>This realization changed how I think about accessibility technology. True innovation happens not only through smarter models but through the harmony between seeing and understanding, between what an artificial intelligence system observes and how it communicates that observation to help people.</p>
<h3 id="heading-accessibility-meets-ai">Accessibility Meets AI</h3>
<p>Working on this project reminded me that accessibility isn’t just about compliance or assistive devices. It’s also about inclusion. A simple AI system that can describe a hand gesture or symbol in real time can empower teachers, parents, and students who communicate using Makaton or similar systems.</p>
<p>By mapping AI-generated descriptions to meaningful phrases, the app demonstrates how AI can support inclusive education, even at small scales. It bridges the communication gap between verbal and nonverbal learners, which is something that traditional translation systems often overlook.</p>
<h3 id="heading-integrating-nlp-and-computer-vision">Integrating NLP and Computer Vision</h3>
<p>On the technical side, this project showed me how naturally computer vision and language understanding complement each other. The Gemini API’s multimodal models were able to analyze an image and produce coherent natural-language sentences, something that older APIs couldn’t do without chaining multiple tools.</p>
<p>By feeding that output into a lightweight NLP mapping function, I was able to simulate a very early-stage symbol-to-language translator, which is the core of my broader research interest in automatic Makaton-to-English translation.</p>
<h3 id="heading-why-local-ai-gemini-nano-matters">Why Local AI (Gemini Nano) Matters</h3>
<p>While the cloud models are powerful, experimenting with Gemini Nano revealed something exciting:<br>on-device AI can make accessibility tools faster, safer, and more private.</p>
<p>In classrooms or therapy sessions, you often can’t rely on stable internet connections or share sensitive student data. Running inference locally means learners’ gestures or symbol images never leave the device, a crucial step toward privacy-preserving accessibility AI.</p>
<p>And since Nano runs directly inside Chrome Canary, it shows how AI is becoming embedded at the browser level, lowering barriers for teachers and developers to build inclusive solutions without needing large infrastructure.</p>
<h3 id="heading-looking-forward">Looking Forward</h3>
<p>This prototype is just a starting point. Future iterations could integrate gesture recognition directly from camera input, support multiple symbol sets, or even learn from user feedback to expand the dictionary automatically.</p>
<p>Most importantly, it reinforces a central belief in my research and teaching journey:</p>
<p><strong>Accessibility innovation doesn’t require massive systems. It starts with curiosity, empathy, and a few lines of purposeful code.</strong></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building the Makaton AI Companion has been one of the most rewarding projects in my AI journey – not just because it worked, but because it proved how accessible innovation can be.</p>
<p>With just a browser, a few lines of JavaScript, and the right API, I was able to combine computer vision, language understanding, and accessibility design into a working system that translates symbols into meaning. It’s a small step toward a future where anyone, regardless of speech or language ability, can be understood through technology.</p>
<p>The project also reinforced something deeply personal to me as a researcher and educator: that AI for accessibility doesn’t need to be complex, expensive, or centralized. It can be lightweight, open, and built with empathy by anyone who’s willing to learn and experiment.</p>
<h3 id="heading-join-the-conversation">Join the Conversation</h3>
<p>If this project inspires you, I’d love to see your own experiments and improvements. Can you make it support live webcam gestures? Could you adapt it for other symbol systems, like PECS or BSL?</p>
<p>Share your ideas in the comments or tag me if you publish your own version. Together, we can grow a small prototype into a community-driven accessibility tool and continue exploring how AI can give more people a voice.</p>
<p>Full source code on GitHub: <a target="_blank" href="https://github.com/tayo4christ/makaton-ai-companion">Makaton-ai-companion</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Perform Sentence Similarity Check Using Sentence Transformers ]]>
                </title>
                <description>
                    <![CDATA[ Sentence similarity plays an important role in many natural language processing (NLP) applications.  Whether you build chatbots, recommendation systems, or search engines, understanding how close two sentences are in meaning can improve user experien... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-perform-sentence-similarity-check-using-sentence-transformers/</link>
                <guid isPermaLink="false">68b86d04b7b16a9a0d9ce2d2</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 03 Sep 2025 16:29:56 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756916978057/de0bda62-c9ea-48d1-b1ac-b78eb10e82d2.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Sentence similarity plays an important role in many natural language processing (NLP) applications. </p>
<p>Whether you build chatbots, recommendation systems, or search engines, understanding how close two sentences are in meaning can improve user experience – and this is what sentence similarity allows you to do.</p>
<p><a target="_blank" href="https://sbert.net/">Sentence Transformers</a> make this process simple and efficient. In this guide, you will learn what sentence similarity is, how Sentence Transformers work, and how to write code to measure similarity between two sets of sentences.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-is-sentence-similarity">What Is Sentence Similarity?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-sentence-transformers">Why Use Sentence Transformers</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-loading-a-pre-trained-model">Loading a Pre-trained Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-defining-sentences-to-compare">Defining Sentences to Compare</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-converting-sentences-into-embeddings">Converting Sentences into Embeddings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-calculating-cosine-similarity">Calculating Cosine Similarity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-printing-the-results">Printing the Results</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-sample-output">Sample Output</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-interpret-the-scores">How to Interpret the Scores</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-real-world-applications-of-sentence-similarity">Real-World Applications of Sentence Similarity</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-semantic-search">Semantic Search</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-duplicate-detection">Duplicate Detection</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-recommendation-systems">Recommendation Systems</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-chatbots-and-virtual-assistants">Chatbots and Virtual Assistants</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-improving-performance-with-larger-models">Improving Performance with Larger Models</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-is-sentence-similarity">What Is Sentence Similarity?</h2>
<p>Sentence similarity is the process of comparing two sentences to see how close they are in meaning. It does not look at the exact words but focuses on the meaning behind them.</p>
<p>For example:</p>
<ul>
<li><p>“The cat is sitting outside”</p>
</li>
<li><p>“The dog is playing in the garden”</p>
</li>
</ul>
<p>Both sentences talk about animals outdoors, so they share some similarity even though they use different words.</p>
<p>This kind of understanding is essential for tasks like document clustering, duplicate detection, or semantic search.</p>
<h2 id="heading-why-use-sentence-transformers">Why Use Sentence Transformers</h2>
<p>Traditional methods like <a target="_blank" href="https://www.freecodecamp.org/news/how-bag-of-words-works/">Bag of Words</a> relied on simple word matching or frequency counts. But these fail when words differ but the meaning stays the same.</p>
<p>Sentence Transformers solve this by using transformer-based language models like <a target="_blank" href="https://en.wikipedia.org/wiki/BERT_%28language_model%29">BERT</a> or RoBERTa to create embeddings.</p>
<p>An <a target="_blank" href="https://www.freecodecamp.org/news/understanding-word-embeddings-the-building-blocks-of-nlp-and-gpts/">embedding</a> is a list of numbers that represents the meaning of a sentence. When two embeddings are close together in this high-dimensional space, their sentences are similar in meaning.</p>
<p>The Sentence Transformers library in Python makes this easy by providing pre-trained models that can generate embeddings for sentences.</p>
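<p>To see why plain word matching falls short, here's a small pure-Python bag-of-words comparison: lowercased word counts plus cosine similarity, written from scratch for illustration. <code>bow_vector</code> and <code>bow_cosine</code> are hypothetical helpers, not part of any library.</p>

```python
import math
import re
from collections import Counter


def bow_vector(sentence: str) -> Counter:
    """Bag of words: lowercase word counts, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    va, vb = bow_vector(a), bow_vector(b)
    dot = sum(va[w] * vb[w] for w in set(va).intersection(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


paraphrase = bow_cosine("The movie was great", "That film is excellent")
same_words = bow_cosine("The dog bit the man", "The man bit the dog")
```

<p>The paraphrase pair shares no words, so it scores exactly 0 despite meaning nearly the same thing. The second pair shows the opposite failure: swapping who bit whom changes the meaning, but the word counts are identical, so bag of words rates them as a perfect match (1.0, up to floating-point rounding).</p>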
<h3 id="heading-installing-the-required-libraries">Installing the Required Libraries</h3>
<p>Before you start coding, make sure you install the required packages. Run this command to do so:</p>
<pre><code class="lang-bash">pip install -U sentence-transformers
</code></pre>
<p>This will install the Sentence Transformers library along with its dependencies.</p>
<h2 id="heading-loading-a-pre-trained-model">Loading a Pre-trained Model</h2>
<p>Sentence Transformers offers several pre-trained models. For this example, you will use the <strong>all-MiniLM-L6-v2</strong> model. It’s lightweight, fast, and works well for most applications.</p>
<p>Here is how to load it in Python:</p>
<pre><code class="lang-plaintext">from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("all-MiniLM-L6-v2")
</code></pre>
<p>Once loaded, this model can convert any sentence into its corresponding embedding.</p>
<h2 id="heading-defining-sentences-to-compare">Defining Sentences to Compare</h2>
<p>You need two lists of sentences for comparison. Here is an example:</p>
<pre><code class="lang-plaintext">sentences1 = [
    'The cat sits outside',
    'A man is playing guitar',
    'The movies are awesome'
]

sentences2 = [
    'The dog plays in the garden',
    'A woman watches TV',
    'The new movie is so great'
]
</code></pre>
<p>Each sentence in <code>sentences1</code> will be compared with the sentence at the same position in <code>sentences2</code>.</p>
<h2 id="heading-converting-sentences-into-embeddings">Converting Sentences into Embeddings</h2>
<p>Now that you have sentences, you must convert them into embeddings using the model.</p>
<p>Add this code:</p>
<pre><code class="lang-plaintext"># Convert sentences to embeddings
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)
</code></pre>
<p>The <code>convert_to_tensor=True</code> argument tells the model to return <a target="_blank" href="https://docs.pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html">PyTorch tensors</a>, which work well with similarity calculations.</p>
<h2 id="heading-calculating-cosine-similarity">Calculating Cosine Similarity</h2>
<p>Once you have embeddings, you need a way to measure similarity. The <a target="_blank" href="https://www.youtube.com/watch?v=zcUGLp5vwaQ">cosine similarity</a> metric is commonly used for this.</p>
<p>Cosine similarity looks at the angle between two vectors in a high-dimensional space. If the angle is small, the vectors are similar.</p>
<p>Add this code to compute similarity:</p>
<pre><code class="lang-plaintext">from sentence_transformers import util
# Compute cosine similarity
cosine_scores = util.cos_sim(embeddings1, embeddings2)
</code></pre>
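<p>Now <code>cosine_scores</code> is a matrix that compares every sentence in <code>sentences1</code> with every sentence in <code>sentences2</code>: the entry at <code>[i][j]</code> is the score for <code>sentences1[i]</code> and <code>sentences2[j]</code>. The scores for the matching pairs sit on the diagonal, at <code>[i][i]</code>.</p>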
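<p>Under the hood, <code>util.cos_sim(a, b)</code> computes cos θ = (a · b) / (‖a‖ ‖b‖) for every row of <code>a</code> against every row of <code>b</code>. The pure-Python sketch below reproduces that on made-up 2-dimensional vectors so you can check the diagonal by hand. The helper names are illustrative, and real <strong>all-MiniLM-L6-v2</strong> embeddings have 384 dimensions rather than 2.</p>
<pre><code class="lang-python">import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cos_sim_matrix(rows_a, rows_b):
    """Score every vector in rows_a against every vector in rows_b."""
    return [[cosine_similarity(a, b) for b in rows_b] for a in rows_a]

# Made-up 2-dimensional vectors standing in for real embeddings
toy1 = [[1.0, 0.0], [0.0, 1.0]]
toy2 = [[1.0, 1.0], [0.0, 2.0]]

scores = cos_sim_matrix(toy1, toy2)
for i in range(len(toy1)):
    # The matching pairs sit on the diagonal: scores[i][i]
    print(f"pair {i}: {scores[i][i]:.4f}")
</code></pre>
<p>This prints <code>pair 0: 0.7071</code> and <code>pair 1: 1.0000</code>. The second pair scores 1 even though <code>[0.0, 2.0]</code> is twice as long as <code>[0.0, 1.0]</code>: cosine similarity only cares about direction, not length.</p>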
<h2 id="heading-printing-the-results">Printing the Results</h2>
<p>To see the results clearly, format them like this:</p>
<pre><code class="lang-plaintext"># Print formatted results
for i in range(len(sentences1)):
    print(f"{sentences1[i]} \t\t {sentences2[i]} \t\t Score: {cosine_scores[i][i]:.4f}")
</code></pre>
<p>This will print each sentence pair along with its similarity score.</p>
<h2 id="heading-sample-output">Sample Output</h2>
<p>If you run this code, you will see output similar to the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756385160047/576750a6-3c65-45e7-a634-f1e7375e7e16.png" alt="Output showing each sentence pair and its cosine similarity score" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>The third pair has the highest score because both sentences talk about movies in a positive way.</p>
<h2 id="heading-how-to-interpret-the-scores">How to Interpret the Scores</h2>
<p>The cosine similarity score ranges between <strong>-1</strong> and <strong>1</strong>.</p>
<ul>
<li><p>A score close to <strong>1</strong> means the sentences are very similar.</p>
</li>
<li><p>A score near <strong>0</strong> means they are unrelated.</p>
</li>
<li><p>Negative values mean the embeddings point in opposite directions, which indicates the sentences are dissimilar.</p>
</li>
</ul>
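<p>Because the score is the cosine of the angle between two embeddings, you can read the range directly off the angle. This small sketch (the helper <code>score_for_angle</code> is just for illustration) prints the score for a few angles:</p>
<pre><code class="lang-python">import math

def score_for_angle(degrees):
    """Cosine similarity of two vectors separated by the given angle."""
    return math.cos(math.radians(degrees))

for degrees in (0, 60, 90, 180):
    print(f"{degrees:3d} degrees: {score_for_angle(degrees):.4f}")
</code></pre>
<p>Vectors pointing the same way (0 degrees) score 1, perpendicular vectors (90 degrees) score 0, and opposite vectors (180 degrees) score -1.</p>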
<p>In most real-world cases, you focus on values between 0 and 1. The higher the value, the closer the meanings.</p>
<h2 id="heading-real-world-applications-of-sentence-similarity">Real-World Applications of Sentence Similarity</h2>
<p>Sentence similarity has become a core part of many modern applications because it helps systems understand meaning rather than relying on exact words. This shift makes search, analysis, and recommendations far more accurate and useful.</p>
<h3 id="heading-semantic-search"><strong>Semantic Search</strong></h3>
<p>Traditional search engines depend on keyword matches. If the exact words are missing, results often become irrelevant. <a target="_blank" href="https://en.wikipedia.org/wiki/Semantic_search">Semantic search</a> solves this problem by looking at the meaning behind a query. </p>
<p>For example, if someone searches for “best ways to learn guitar,” the system can return results for “top tips to play the guitar” even though the keywords differ. This makes search experiences smoother and more intelligent.</p>
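<p>A minimal sketch of the ranking step behind semantic search, assuming the query and documents have already been turned into embeddings. The 3-dimensional vectors here are hand-made stand-ins for <code>model.encode(...)</code> output, and <code>semantic_search</code> is a hypothetical helper; the Sentence Transformers library also ships its own <code>util.semantic_search</code> for this job.</p>
<pre><code class="lang-python">import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Hypothetical helper: return (index, score) for the top_k closest documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-made stand-ins for model.encode(...) output
docs = ["top tips to play the guitar", "healthy cooking for beginners", "guitar chords explained"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.3, 0.1]]
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "best ways to learn guitar"

for index, score in semantic_search(query_vec, doc_vecs):
    print(f"{score:.4f}  {docs[index]}")
</code></pre>
<p>With these made-up vectors, the two guitar documents come out on top and the cooking document is left out.</p>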
<h3 id="heading-duplicate-detection"><strong>Duplicate Detection</strong></h3>
<p>Large datasets often contain repeated or near-duplicate content, and checking them manually is impractical when dealing with millions of records.</p>
<p>Sentence similarity automates this by detecting texts that carry the same meaning even if the wording changes slightly. This is especially useful in data cleaning, web scraping pipelines, or managing user-generated content.</p>
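<p>One simple way to flag near-duplicates is to compare every pair of embeddings and keep the pairs whose score clears a threshold. The sketch below assumes pre-computed embeddings (the vectors are invented for illustration) and an arbitrary threshold of 0.9, which you would tune on your own data. The helper <code>find_near_duplicates</code> is hypothetical.</p>
<pre><code class="lang-python">import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_near_duplicates(vectors, threshold=0.9):
    """Return (i, j, score) for every pair scoring at or above the threshold."""
    duplicates = []
    for i, j in combinations(range(len(vectors)), 2):
        score = cosine_similarity(vectors[i], vectors[j])
        if score >= threshold:
            duplicates.append((i, j, round(score, 4)))
    return duplicates

# Invented embeddings: texts 0 and 1 are paraphrases, text 2 is unrelated
texts = ["How do I reset my password?", "I forgot my password, how can I change it?", "What are your opening hours?"]
vectors = [[0.9, 0.1, 0.1], [0.85, 0.15, 0.1], [0.1, 0.2, 0.95]]

for i, j, score in find_near_duplicates(vectors):
    print(f"{score}: {texts[i]!r} ~ {texts[j]!r}")
</code></pre>
<p>Only the two password questions are flagged. Comparing every pair grows quadratically with the number of texts, so at the scale of millions of records you would typically switch to an approximate nearest-neighbor index instead.</p>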
<h3 id="heading-recommendation-systems"><strong>Recommendation Systems</strong></h3>
<p>Recommendation engines work best when they understand context. For instance, if a user likes articles about “healthy cooking,” the system can recommend content on “nutritious recipes” or “quick healthy meals” using similarity scores. This approach goes beyond surface-level keywords and finds deeper connections in the text.</p>
<h3 id="heading-chatbots-and-virtual-assistants"><strong>Chatbots and Virtual Assistants</strong></h3>
<p>Many chatbots, especially FAQ-style ones, store a large set of possible user questions and their answers. When someone types a new question, the system must find the most relevant response. Using sentence similarity, the chatbot matches the input with the closest stored question by meaning, not just by shared words, which leads to more accurate and natural conversations.</p>
<h2 id="heading-improving-performance-with-larger-models">Improving Performance with Larger Models</h2>
<p>The <a target="_blank" href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2">all-MiniLM-L6-v2</a> model produces 384-dimensional embeddings and is fast and accurate enough for most small to medium tasks.</p>
<p>For higher accuracy, you can try larger models like <a target="_blank" href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2">all-mpnet-base-v2</a>, which produces 768-dimensional embeddings but needs more memory and takes longer to run.</p>
<p>Replace the model name in your code to use a different pre-trained model:</p>
<pre><code class="lang-plaintext">model = SentenceTransformer("all-mpnet-base-v2")
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Sentence Transformers make it easy to measure sentence similarity using pre-trained models. By converting sentences into embeddings and comparing them with cosine similarity, you can build systems that understand meaning rather than relying on simple word matching.</p>
<p>With just a few lines of code, you can integrate this into chatbots, search engines, or recommendation systems and create more intelligent applications.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><em>visit my website</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Semantic Matching? How to Find Words in a Document Using NLP ]]>
                </title>
                <description>
                    <![CDATA[ Have you ever found yourself searching a document for a specific word or phrase just to discover that the term you're looking for isn't there? It can be frustrating, right? Sometimes, even though you might not see the exact term you’re looking for, t... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-semantic-matching-find-words-in-a-document-using-nlp/</link>
                <guid isPermaLink="false">67802329a9edea9df0053dd7</guid>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ibrahim Ogunbiyi ]]>
                </dc:creator>
                <pubDate>Thu, 09 Jan 2025 19:27:37 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Dh7gzpVpdWQ/upload/4e1e504663acda31b980e6fba0c2d661.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Have you ever found yourself searching a document for a specific word or phrase just to discover that the term you're looking for isn't there? It can be frustrating, right?</p>
<p>Sometimes, even though you might not see the exact term you’re looking for, the document might contain similar words or phrases that have the same meaning or context but don’t have the exact same form (such as differences in spelling).</p>
<p>Traditional NLP search approaches have relied on using exact forms to search for words or phrases in a particular document. But this fails at finding words based on semantic or contextual meaning.</p>
<p>To solve this, semantic matching comes into play. It’s an advanced way of searching that takes advantage of traditional search methods while also focusing more on locating or matching words or phrases based on their meaning or context (rather than solely on their exact form).</p>
<p>In this article, you will learn how to perform semantic matching using NLP. Without further ado, let’s get started.</p>
<h2 id="heading-requirements">Requirements</h2>
<p>To make sure that you can reproduce the experiment in this tutorial, you’ll need to have a few things.</p>
<p>First, you’ll need to have Python 3.x (preferably Python 3.10) installed on your PC. You’ll also need some libraries, which you can install using the Pip package manager.</p>
<p>You should also have basic knowledge of NLP, such as text preprocessing and text representation techniques. You can learn more <a target="_blank" href="https://www.freecodecamp.org/news/natural-language-processing-techniques-for-beginners/">here</a>.</p>
<p>You can also <a target="_blank" href="https://github.com/ibrahim-ogunbiyi/Semantic_Matching">fork the repo</a> which contains all the code in this article so you can follow along.</p>
<p>To install everything using Pip, type the following command:</p>
<pre><code class="lang-bash"># to install with pip
pip install pypdf2 keybert sentence-transformers
</code></pre>
<h2 id="heading-problem-definition">Problem Definition</h2>
<p>Suppose you’re a data scientist on a curriculum development team, and you want to know whether a particular concept (word or phrase), say <strong>birth control</strong>, is taught in a curriculum stored as a PDF document.</p>
<p>One way to do this is to open the PDF in a PDF viewer and use Ctrl + F (find) to check whether the phrase “birth control” appears in it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736408224052/2e6dacef-ef92-4113-a574-ec355e99e6f6.png" alt="The PDF we're working with here" class="image--center mx-auto" width="1600" height="853" loading="lazy"></p>
<p>You could also do it programmatically, as shown below: </p>
<pre><code class="lang-python"><span class="hljs-comment"># import library</span>
<span class="hljs-keyword">import</span> PyPDF2

<span class="hljs-comment"># use PdfReader from PyPDF2 to read the PDF content</span>
pdf_reader = PyPDF2.PdfReader(<span class="hljs-string">"Relationships_Education_RSE_and_Health_Education.pdf"</span>)

<span class="hljs-comment"># join all the content in the pdf pages together and lowercase the letters</span>
pdf_document = <span class="hljs-string">" "</span>.join([page.extract_text().lower() <span class="hljs-keyword">for</span> page <span class="hljs-keyword">in</span> pdf_reader.pages])

<span class="hljs-comment"># check whether the string 'birth control' is in the document (prints False)</span>
print(<span class="hljs-string">"birth control"</span> <span class="hljs-keyword">in</span> pdf_document)
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python"><span class="hljs-literal">False</span>
</code></pre>
<p>As you can see, both the programmatic search and the PDF viewer report that the phrase “birth control” doesn't appear in the document.</p>
<p>This might be true, but because this traditional search only matches text in its exact form, let’s not fully trust it. As I explained earlier, some words might take a different form or spelling and still mean the same thing contextually or semantically.</p>
<p>So how do we solve this issue? This is where semantic matching comes into play.</p>
<h2 id="heading-what-is-semantic-matching">What is Semantic Matching?</h2>
<p>Semantic Matching is a technique used to determine if two elements have the same meaning. An element can be a word, phrase, sentence, document, or even a corpus. It refers to matching elements based on meaning or context and not just matching based on exact form.</p>
<p>In order to perform semantic matching in NLP, there are certain things you need to know and do. Let’s go through them now:</p>
<h3 id="heading-what-is-word-embedding">What is Word Embedding?</h3>
<p>Word embedding is an advanced text representation technique that represents words as lower-dimensional vectors. These vectors capture inter-word semantic and syntactic information, which means that words with similar meanings, even if they are spelled differently, have similar vector representations.</p>
<h4 id="heading-what-does-lower-dimensional-vector-representation-mean">What does Lower-Dimensional Vector representation mean?</h4>
<p>In NLP, traditional ways of representing text in a form machines can understand (that is, as numerical vectors) include Bag of Words, Term Frequency–Inverse Document Frequency (TF-IDF), and one-hot encoding. But these techniques usually produce high-dimensional vectors (typically the size of the vocabulary) that are sparse (meaning most of their values are zeros).</p>
<p>For example, if the document or corpus a word belongs to has a vocabulary of 10,000 words, the vector representing that word would have 10,000 dimensions.</p>
<p>The disadvantages of these techniques are high dimensionality, sparsity, and their inability to capture semantic information. Advances in NLP led to word embedding techniques, which create lower-dimensional, dense vector representations of words that capture inter-word semantic information.</p>
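<p>To make the sparsity and the missing semantics concrete, here is a minimal one-hot encoding sketch over a hypothetical three-word vocabulary:</p>
<pre><code class="lang-python"># A hypothetical toy vocabulary; real vocabularies often have tens of thousands of words
vocabulary = ["cat", "man", "woman"]

def one_hot(word):
    """Vector of len(vocabulary) with a single 1 at the word's position."""
    return [1 if entry == word else 0 for entry in vocabulary]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

man, woman, cat = one_hot("man"), one_hot("woman"), one_hot("cat")
print(man)              # [0, 1, 0]
# One-hot vectors have length 1, so the dot product equals the cosine similarity
print(dot(man, woman))  # 0
print(dot(man, cat))    # 0
</code></pre>
<p>Every pair of different words scores 0, so one-hot vectors can’t tell us that “man” is closer to “woman” than to “cat”. With a 10,000-word vocabulary, each vector would hold 9,999 zeros and a single 1.</p>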
<p>Word embeddings are a cornerstone of modern NLP and language technology, and they serve as the foundation for advanced language representation models such as GPT (Generative Pre-trained Transformer).</p>
<p>Sentence embeddings apply the same idea to whole sentences, representing each sentence as a single lower-dimensional vector.</p>
<h3 id="heading-how-do-we-measure-if-two-vectors-are-similar">How do we measure if two vectors are similar?</h3>
<p>This is where cosine similarity comes into play. Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them.</p>
<p>Its value ranges from -1 to 1, and for text embeddings it usually falls between 0 and 1. A value close to 1 means that the two vectors are highly similar.</p>
<p>For example, to understand how cosine similarity works, let’s create a word embedding vector representation for three words: Man, Woman, and Cat. Then we’ll use cosine similarity to figure out which vectors are similar.</p>
<p>Based on our own instincts, we know that Man should be closer to Woman than to Cat. So, let’s use NLP to help us validate this.</p>
<p>Thanks to advancements in NLP, there are numerous models we can use to create word embeddings, many of which you can find on the Hugging Face Hub. In this article, we are going to use the <code>all-mpnet-base-v2</code> model from the Sentence Transformers library (<code>sentence-transformers</code>). According to the Sentence Transformers documentation, it provides the best quality among its general-purpose pre-trained sentence embedding models, and you can also use it to create word embeddings.</p>
<p>The code below lets us validate our claim. First, we initialize <code>SentenceTransformer</code> with <code>all-mpnet-base-v2</code> and use its <code>encode</code> method to get the embedding of each word. Finally, we use the <code>cos_sim</code> function from <code>sentence_transformers.util</code> to measure which vectors are similar.</p>
<pre><code class="lang-python"><span class="hljs-comment"># import library</span>
<span class="hljs-keyword">from</span> sentence_transformers <span class="hljs-keyword">import</span> SentenceTransformer <span class="hljs-comment"># sentence transformer</span>
<span class="hljs-keyword">from</span> sentence_transformers.util <span class="hljs-keyword">import</span> cos_sim <span class="hljs-comment"># cosine similarity</span>

<span class="hljs-comment"># initialize sentence transformer with the 'all-mpnet-base-v2' model</span>
model = SentenceTransformer(<span class="hljs-string">"all-mpnet-base-v2"</span>)
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># get the embedding vector of the man, woman, and cat words.</span>
man_vector = model.encode(<span class="hljs-string">"man"</span>)
woman_vector = model.encode(<span class="hljs-string">"woman"</span>)
cat_vector = model.encode(<span class="hljs-string">"cat"</span>)

<span class="hljs-comment"># get the similarity between man and woman</span>
similarity = cos_sim(man_vector, woman_vector)

<span class="hljs-comment"># get the similarity between man and cat</span>
cat_similarity = cos_sim(man_vector, cat_vector)

print(<span class="hljs-string">"The Similarity between Man vector and Woman Vector:"</span>, similarity, <span class="hljs-string">"\n"</span>)

print(<span class="hljs-string">"The Similarity between Man vector and Cat Vector:"</span>, cat_similarity)
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-plaintext">The Similarity between Man vector and Woman Vector: tensor([[0.3501]]) 

The Similarity between Man vector and Cat Vector: tensor([[0.2553]])
</code></pre>
<p>As you can see, the similarity score between man and woman (0.35) is higher than that of man and cat (0.26). This shows the beauty of word embedding and cosine similarity together.</p>
<p>Now let’s get back to our business.</p>
<h2 id="heading-how-to-perform-semantic-matching-on-a-pdf-document">How to Perform Semantic Matching on a PDF Document</h2>
<p>Now we are going to use semantic matching to look for a word or phrase in the document that matches the <strong>birth control</strong> phrase.</p>
<h3 id="heading-how-to-get-words-from-the-pdf-using-keybert">How to Get Words from the PDF using KeyBERT</h3>
<p>Word embedding generates embeddings for individual words. Our PDF document contains a <strong>large volume of textual components</strong>, including digits, special characters, symbols, stopwords, and the actual words we want to match. So, to save time on preprocessing, we are going to use <code>KeyBERT</code>, a library that extracts meaningful keywords (words or phrases) from a document with minimal code.</p>
<p>Keep in mind that by default, <code>KeyBERT</code> extracts single keywords – but we can also tell it to extract phrases with two or more words. We’ll use it here to extract single-word and 2-word phrases. Below is the implementation of using <code>KeyBERT</code> to extract keywords from our document:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> keybert <span class="hljs-keyword">import</span> KeyBERT
<span class="hljs-comment"># initialize model</span>
keybert_model =  KeyBERT()

<span class="hljs-comment"># extract all keywords (single word and 2 word phrase) from the pdf</span>
all_keywords = keybert_model.extract_keywords(docs=pdf_document, top_n=<span class="hljs-number">-1</span>, keyphrase_ngram_range=(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>))
<span class="hljs-comment"># print length of keywords extracted                                             </span>
print(len(all_keywords))
<span class="hljs-comment"># show the first 5 keywords</span>
print(all_keywords[:<span class="hljs-number">5</span>])
</code></pre>
<p>The above code imports <code>KeyBERT</code> from the <code>keybert</code> library, initializes it, and extracts all keywords (that is, single words and 2-word phrases) from the document. It then prints the number of keywords extracted and, lastly, the first 5 of them.</p>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python"><span class="hljs-number">8669</span>
[(<span class="hljs-string">'education guidance'</span>, <span class="hljs-number">0.5954</span>),
 (<span class="hljs-string">'schools guidance'</span>, <span class="hljs-number">0.5542</span>),
 (<span class="hljs-string">'education policies'</span>, <span class="hljs-number">0.5405</span>),
 (<span class="hljs-string">'sex education'</span>, <span class="hljs-number">0.5228</span>),
 (<span class="hljs-string">'education safeguarding'</span>, <span class="hljs-number">0.5001</span>)]
</code></pre>
<p>As you can see above, KeyBERT extracted 8,669 keywords from the PDF. KeyBERT returns each keyword as a tuple together with its score. We don’t need the scores, so we will keep only the keyword from each tuple.</p>
<pre><code class="lang-python"><span class="hljs-comment"># remove score from each keyword</span>

all_keywords = [keyword[<span class="hljs-number">0</span>] <span class="hljs-keyword">for</span> keyword <span class="hljs-keyword">in</span> all_keywords]
all_keywords[:<span class="hljs-number">5</span>]
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python">[<span class="hljs-string">'education guidance'</span>,
 <span class="hljs-string">'schools guidance'</span>,
 <span class="hljs-string">'education policies'</span>,
 <span class="hljs-string">'sex education'</span>,
 <span class="hljs-string">'education safeguarding'</span>]
</code></pre>
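<p>To make <code>keyphrase_ngram_range=(1, 2)</code> concrete, the sketch below lists the 1-word and 2-word candidate phrases such a range covers in a short text. It is a simplified illustration, not KeyBERT’s implementation: KeyBERT also removes English stop words by default and ranks the candidates by how similar their embeddings are to the document’s embedding. The helper <code>candidate_ngrams</code> is hypothetical.</p>
<pre><code class="lang-python">def candidate_ngrams(text, ngram_range=(1, 2)):
    """List every contiguous n-word phrase for n in ngram_range (inclusive)."""
    words = text.lower().split()
    low, high = ngram_range
    phrases = []
    for n in range(low, high + 1):
        for start in range(len(words) - n + 1):
            phrases.append(" ".join(words[start:start + n]))
    return phrases

print(candidate_ngrams("sex education guidance"))
# ['sex', 'education', 'guidance', 'sex education', 'education guidance']
</code></pre>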
<h3 id="heading-embedding-of-the-birth-control-phrase-and-the-keywords-extracted-from-the-pdf">Embedding of the Birth Control Phrase and the Keywords Extracted from the PDF</h3>
<p>Now that we’ve extracted these keywords from the document, the next step is to get the embedding of our phrase and the keywords from the document.</p>
<p>The below code lets us do this:</p>
<pre><code class="lang-python"><span class="hljs-comment"># initialize sentence transformer with the 'all-mpnet-base-v2' model</span>
model = SentenceTransformer(<span class="hljs-string">"all-mpnet-base-v2"</span>)

<span class="hljs-comment"># get the embedding of the 'birth control' phrase</span>
birth_control_embedding = model.encode(<span class="hljs-string">"birth control"</span>)

<span class="hljs-comment"># get the embedding of all the keywords in the document</span>
keywords_embedding =  model.encode(all_keywords)
</code></pre>
<h3 id="heading-cosine-similarity-of-birth-control-phrase-and-keywords-in-pdf">Cosine Similarity of Birth Control Phrase and Keywords in PDF</h3>
<p>After getting the embedding of the phrase and the keywords, the next step is to get the similarity score of the phrase and the keywords. This will help us know which keyword in the document is highly similar to the phrase.</p>
<p>The below code allows us to get the cosine similarity of the phrase and the keywords’ embedding vector.</p>
<pre><code class="lang-python"><span class="hljs-comment"># calculate the cosine similarity of the birth control word and each word in the document</span>
cosine_similarity_result = cos_sim(birth_control_embedding, keywords_embedding)
<span class="hljs-comment"># print the shape: one row for the phrase, one column per keyword</span>
print(cosine_similarity_result.shape)
<span class="hljs-comment"># show the scores (PyTorch abbreviates long tensors with "...")</span>
print(cosine_similarity_result)
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python">torch.Size([<span class="hljs-number">1</span>, <span class="hljs-number">2034</span>])
tensor([[<span class="hljs-number">0.2166</span>, <span class="hljs-number">0.1977</span>, <span class="hljs-number">0.0998</span>,  ..., <span class="hljs-number">0.1634</span>, <span class="hljs-number">0.1082</span>, <span class="hljs-number">0.2194</span>]])
</code></pre>
<p>The resulting tensor has one row (for the phrase) and one column per keyword, so each column holds the similarity between “birth control” and one keyword. We can then use the <code>argmax()</code> method to get the index of the highest score and use that index to look up the matching keyword in the <code>all_keywords</code> list. The code below does this:</p>
<pre><code class="lang-python"><span class="hljs-comment"># return the index number of the high similarity score</span>
index = cosine_similarity_result.argmax()
print(index)
</code></pre>
<p>Below is the output of the above code. It tells us that the keyword with the highest similarity to the <strong>Birth Control phrase</strong> is at index 1490.</p>
<pre><code class="lang-python">tensor(<span class="hljs-number">1490</span>)
</code></pre>
<p>Now, let’s look at the keyword at index 1490 in the <code>all_keywords</code> variable.</p>
<pre><code class="lang-python"><span class="hljs-comment"># print the keyword at index 1490 </span>
print(all_keywords[index])
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python">contraceptive
</code></pre>
<p>After examining it, we found that "contraceptive" was the word with the highest similarity, which makes sense because "birth control" and "contraceptive" mean the same thing. This demonstrates the elegance of semantic matching in finding similar words.</p>
<h3 id="heading-lets-also-explore-top-5-keywords-in-the-pdf-that-match-with-the-phrase-birth-control">Let’s Also Explore Top 5 Keywords in the PDF that Match with the Phrase “Birth Control”</h3>
<p>Let’s explore the 5 top keywords with the highest similarity score to “birth control” to see what the result would look like.</p>
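<p>To do that, we can use the <code>topk()</code> method, which returns a pair of tensors: the top values and their indices. Indexing with <code>[1]</code> selects the indices, and <code>.tolist()[0]</code> turns the single row into a Python list. We can then loop through these indices to get the actual keywords:</p>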
<pre><code class="lang-python"><span class="hljs-comment"># extract the top 5 indices</span>
top_5_indices = cosine_similarity_result.topk(<span class="hljs-number">5</span>)[<span class="hljs-number">1</span>].tolist()[<span class="hljs-number">0</span>]

print(top_5_indices)
</code></pre>
<p>Below is the result of the above code:</p>
<pre><code class="lang-python">[<span class="hljs-number">1490</span>, <span class="hljs-number">1972</span>, <span class="hljs-number">871</span>, <span class="hljs-number">1199</span>, <span class="hljs-number">1944</span>]
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># get top 5 keywords</span>
top_5_keywords = [all_keywords[index] <span class="hljs-keyword">for</span> index <span class="hljs-keyword">in</span> top_5_indices]
print(top_5_keywords)
</code></pre>
<p>Below is the output of the above code:</p>
<pre><code class="lang-python">[<span class="hljs-string">'contraceptive'</span>, <span class="hljs-string">'contraception'</span>, <span class="hljs-string">'contraceptive choices'</span>, <span class="hljs-string">'range contraceptive'</span>, <span class="hljs-string">'cover contraception'</span>]
</code></pre>
<p>There, we can see that the top five results relate to contraception and contraceptives. This demonstrates that semantic matching is an effective way to find related elements in a document.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>In this article, you learned what semantic matching is and its advantages compared to traditional NLP search methods. You also encountered concepts such as word embeddings and cosine similarity and learned how they help us perform semantic matching. Then we implemented semantic matching by finding a phrase in a document.</p>
<p>Thank you for reading this article, and I will see you in the next one.</p>
<h3 id="heading-references">References</h3>
<ol>
<li><p><a target="_blank" href="https://sbert.net/">https://sbert.net/</a></p>
</li>
<li><p><a target="_blank" href="https://maartengr.github.io/KeyBERT/guides/quickstart.html">https://maartengr.github.io/KeyBERT/guides/quickstart.html</a></p>
</li>
<li><p><a target="_blank" href="https://huggingface.co/spaces/mteb/leaderboard">https://huggingface.co/spaces/mteb/leaderboard</a></p>
</li>
</ol>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ BERT Explained – The Key to Advanced Language Models ]]>
                </title>
                <description>
                    <![CDATA[ Have you ever wondered how Google seems to understand exactly what you mean, even when your search terms are a bit off? Or how your favorite voice assistant can comprehend complex questions? The secret behind much of this smart technology is a powerf... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/bert-explained-the-key-to-advanced-language-models/</link>
                <guid isPermaLink="false">66d035b564be048ac359a2fd</guid>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Mon, 04 Mar 2024 12:06:57 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/02/1_cQCu_BIAJyTw8d5G_2Igzg.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Have you ever wondered how Google seems to understand exactly what you mean, even when your search terms are a bit off?</p>
<p>Or how your favorite voice assistant can comprehend complex questions?</p>
<p>The secret behind much of this smart technology is a powerful tool called BERT.</p>
<p>In this article, I’ll break down what BERT is, why it’s a game-changer in the world of natural language processing (NLP), and how you can get started with a simple code example.</p>
<h2 id="heading-what-is-bert">What is BERT?</h2>
<p>BERT stands for Bidirectional Encoder Representations from Transformers. It is an advanced method developed by Google for natural language processing (NLP).</p>
<p>It represents a shift in how computers understand human language.</p>
<p>Imagine you’re trying to understand a sentence with a word that has multiple meanings. For example, the word “bank” could refer to the side of a river or a financial institution. This is where BERT shines.</p>
<p>Instead of looking only at the words that come before a word, the way we usually read, BERT looks at the words before and after it at the same time.</p>
<p>This way, the model gets a fuller picture of what the word means based on the entire sentence, not just part of it. It’s like having a conversation with someone who listens to everything you say before and after a question before answering it.</p>
<p>This bidirectional approach allows BERT to grasp the nuanced meanings of words within their specific context, leading to more accurate interpretations of text.</p>
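<p>To make the difference concrete, here is a small plain-Python sketch (not BERT's actual code) of the attention masks that decide which words each word may look at. A left-to-right model uses a "causal" mask, while BERT's encoder lets every token attend to every other token. The function names are illustrative only.</p>

```python
def attention_mask(num_tokens, bidirectional):
    """Return a num_tokens x num_tokens matrix where mask[i][j] == 1
    means token i is allowed to look at (attend to) token j."""
    if bidirectional:
        # BERT-style encoder: every token can see every other token.
        return [[1] * num_tokens for _ in range(num_tokens)]
    # Left-to-right ("causal") style: token i only sees tokens 0..i.
    return [[1 if i >= j else 0 for j in range(num_tokens)] for i in range(num_tokens)]


def visible_context(tokens, index, bidirectional):
    """Return the other tokens that the token at `index` can use as context."""
    row = attention_mask(len(tokens), bidirectional)[index]
    return [tok for j, (tok, allowed) in enumerate(zip(tokens, row)) if allowed and j != index]


tokens = ["he", "sat", "on", "the", "bank", "of", "the", "river"]
bank = tokens.index("bank")  # position 4

# Left-to-right: "river", the word that disambiguates "bank", is not visible yet.
print(visible_context(tokens, bank, bidirectional=False))
# ['he', 'sat', 'on', 'the']

# Bidirectional: both sides of "bank" are visible at once.
print(visible_context(tokens, bank, bidirectional=True))
# ['he', 'sat', 'on', 'the', 'of', 'the', 'river']
```

<p>In the left-to-right view, "bank" has to be interpreted before the model ever sees "river", which is exactly the situation the bidirectional mask avoids.</p>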
<p>BERT supports many of the recent improvements in search engines, language translation services, and conversational AI.</p>
<h2 id="heading-why-bert-matters">Why BERT Matters</h2>
<p>BERT excels at understanding context, which helps computers grasp the meaning of ambiguous language.</p>
<p>This has big implications for search engines, translation services, question-answering systems, and other applications that need to interpret what people actually mean.</p>
<ul>
<li><strong>Understands context</strong>: BERT reads the words around each word in both directions, which leads to more accurate interpretations of a text's meaning. This is crucial for understanding human language.</li>
<li><strong>Improves search engines</strong>: BERT has been used to enhance search engine algorithms, allowing them to better understand the intent behind users’ queries. This means that search results are more relevant to what people are looking for.</li>
<li><strong>Enhances language-based applications</strong>: Applications like language translation, question-answering systems, and virtual assistants benefit significantly from BERT. They become more accurate and conversational, improving user experience and making technology more accessible.</li>
<li><strong>Handles ambiguity in language</strong>: BERT’s deep understanding of context helps it deal with ambiguity in language, distinguishing between different meanings of the same word based on context. This is crucial for accurate language interpretation and translation.</li>
<li><strong>Advances AI research</strong>: BERT represents a significant step forward in machine learning and AI research, pushing the boundaries of what’s possible in understanding human language. It opened up new possibilities for AI applications and set a new standard in the field of NLP.</li>
</ul>
<p>Overall, BERT matters because it represents a leap forward in how machines understand and interact with human language, making technology more intuitive and effective at processing text.</p>
<h2 id="heading-how-bert-works">How BERT Works</h2>
<p><img src="https://miro.medium.com/v2/resize:fit:1050/0*mJctRJFhAipb58Ck" alt="Image" width="1578" height="949" loading="lazy">
<em>BERT architecture</em></p>
<p>BERT is built on the <a target="_blank" href="https://towardsdatascience.com/transformers-141e32e69591">Transformer</a>, a neural network architecture that uses an attention mechanism to learn contextual relations between words (or sub-words) in a text.</p>
<p>In its original form, a Transformer has two parts: an encoder that reads the text input and a decoder that produces a prediction for the task. BERT uses only the encoder.</p>
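<p>For intuition, here is a minimal pure-Python sketch of scaled dot-product self-attention, the operation at the heart of each encoder layer: softmax(Q K^T / sqrt(d)) V. It is a simplification under stated assumptions: real BERT layers compute queries, keys, and values with learned projections, use multiple attention heads, and run on tensors, whereas here the same toy vectors serve as all three.</p>

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each argument is a list of vectors, one per token. The result has one
    vector per token: a weighted average of all the value vectors, where
    the weights say how relevant each token is to the current one.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity between this token's query and every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors using those weights.
        dim = len(values[0])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return outputs


# Three toy 2-D token vectors, used directly as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(x, x, x):
    print([round(value, 3) for value in row])
```

<p>Because every query is compared with every key, each output vector mixes information from the whole sentence, which is what lets the encoder build context-aware representations.</p>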
<p>Because BERT is first pre-trained on large amounts of unlabeled text (for example, by learning to predict words that have been hidden, or “masked”, in a sentence), it can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial modifications to the underlying model.</p>
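<p>The "one additional output layer" can be pictured as a single linear layer followed by a softmax, applied to a vector that summarizes the input (in BERT, this vector is derived from the special [CLS] token placed at the start of every input). The sketch below uses made-up numbers purely for illustration; in practice the head's weights are learned during fine-tuning along with BERT's own weights, and BERT-base vectors have 768 values.</p>

```python
import math


def softmax(logits):
    """Convert raw class scores (logits) into probabilities."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(sentence_vector, weights, bias):
    """A single linear output layer followed by softmax.

    logits[c] = dot(weights[c], sentence_vector) + bias[c]
    """
    logits = [
        sum(w * h for w, h in zip(row, sentence_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical numbers for illustration only (not taken from a real model).
sentence_vector = [0.2, -1.0, 0.5, 0.3]
weights = [
    [0.1, 0.4, -0.2, 0.0],   # class 0: negative
    [-0.3, -0.5, 0.6, 0.2],  # class 1: positive
]
bias = [0.0, 0.1]

probs = classification_head(sentence_vector, weights, bias)
labels = ["negative", "positive"]
best = max(range(len(probs)), key=lambda c: probs[c])
print(labels[best], round(probs[best], 3))
```

<p>Swapping tasks mostly means changing the number of rows in <code>weights</code> (one per class) and retraining, which is why so little has to change in the underlying model.</p>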
<h2 id="heading-how-to-work-with-bert">How to Work with BERT</h2>
<p>Let’s build a simple sentiment analyzer using BERT. We will use the Hugging Face Transformers library to load a pre-trained BERT model and build a sentiment analyzer with it:</p>
<pre><code>from transformers import pipeline

# Load BERT for text classification.
# Note: "bert-base-uncased" is only pre-trained, not fine-tuned for sentiment.
# Its classification layer is randomly initialized, so its predictions are
# essentially arbitrary. For meaningful results, use a checkpoint fine-tuned
# for sentiment, such as "distilbert-base-uncased-finetuned-sst-2-english"
# (its labels are "POSITIVE" and "NEGATIVE", so the mapping below would change).
classifier = pipeline("sentiment-analysis", model="bert-base-uncased")

# Define input text
text = "It was a fantastic movie and I loved it!"

# Perform sentiment analysis; the pipeline returns a list of dicts
result = classifier(text)

# Map the raw output label to a human-readable sentiment (LABEL_1 is treated as positive)
if result[0]['label'] == 'LABEL_1':
    sentiment_label = 'Positive'
else:
    sentiment_label = 'Negative'

# Print result
print("Sentiment:", sentiment_label)
print("Score:", result[0]['score'])
</code></pre><p>Let’s look at what this code does:</p>
<ul>
<li>First, we import the required modules from the <code>transformers</code> library.</li>
<li>Next, we create a sentiment analysis pipeline. The <code>pipeline</code> function from Hugging Face Transformers abstracts away much of the manual work of preprocessing and applying the model. We pass <code>"sentiment-analysis"</code> as the task so that it automatically handles tokenization, model inference, and output interpretation.</li>
<li>We also pass a model name (<code>bert-base-uncased</code>). Behind the scenes, the pipeline loads both this pre-trained model and its matching tokenizer, which preprocesses text the way BERT expects (for example, converting text to lowercase, since this is an “uncased” model).</li>
<li>We then give it an input, which in our case is a sentence whose sentiment we want to analyze. You can replace this with any text you want to analyze.</li>
<li>Next, the example text is sent to the pipeline to get the sentiment analysis results.</li>
<li>We map the raw label returned by the model (<code>LABEL_0</code> or <code>LABEL_1</code>) to a human-readable sentiment.</li>
<li>Finally, we print the sentiment along with the confidence score (how confident the model is about the result).</li>
</ul>
<p>Here is the output for the above code:</p>
<pre><code>Sentiment: Negative
<span class="hljs-attr">Score</span>: <span class="hljs-number">0.5871706604957581</span>
</code></pre><p>The <code>pipeline</code> function makes it very straightforward to apply pre-trained models to specific tasks, including sentiment analysis. The label and score give you a quick understanding of the model's sentiment prediction and its confidence in that prediction, respectively.</p>
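<p>This example provides a basic understanding of how you can use BERT for a sentiment analysis task: the pipeline takes in a sentence, processes it, and returns a label with a score. Notice, though, that the model labeled a clearly positive sentence as negative, with a score close to 0.5. That’s because <code>bert-base-uncased</code> is only pre-trained: its classification layer has not been fine-tuned on sentiment data, so its predictions are essentially arbitrary. To get meaningful results, fine-tune the model on labeled examples or load a checkpoint that has already been fine-tuned for sentiment, such as <code>distilbert-base-uncased-finetuned-sst-2-english</code>.</p>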
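<p>Under the hood, the pipeline in the example above first runs BERT's tokenizer, which lowercases the text (for an uncased model) and splits words into sub-word units with the WordPiece algorithm. The sketch below reproduces only the core greedy longest-match-first idea with a tiny made-up vocabulary; the real tokenizer also splits off punctuation, strips accents, and adds special tokens. The function names are illustrative, not part of the transformers API.</p>

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces with greedy longest-match-first.

    Pieces that continue a word (rather than start it) carry a "##" prefix.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shorten it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word becomes the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase (like an "uncased" model), split on spaces, then apply WordPiece."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Tiny hypothetical vocabulary; the real bert-base-uncased one has about 30,000 entries.
vocab = {"it", "was", "a", "fan", "##tas", "##tic", "movie", "play", "##ing"}
print(tokenize("It was a Fantastic movie", vocab))
# ['it', 'was', 'a', 'fan', '##tas', '##tic', 'movie']
```

<p>Splitting rare words into known pieces lets a fixed-size vocabulary cover almost any input, with <code>[UNK]</code> as a last resort.</p>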
<h2 id="heading-conclusion">Conclusion</h2>
<p>BERT represents a significant leap forward in the ability of machines to understand and interact with human language. Its bidirectional training and context-aware capabilities enable a wide range of applications, from enhancing search engine results to creating more powerful chatbots.</p>
<p>By experimenting with BERT and other NLP models, you can begin to explore the vast potential of language understanding technologies. Whether you’re a seasoned developer or just starting, the world of NLP offers endless opportunities for innovation and improvement.</p>
<p>Remember, this example is just the beginning. As you dive deeper into BERT and NLP, you’ll discover more complex and powerful ways to use these tools. Happy coding!</p>
<p>Hope you enjoyed this article. <a target="_blank" href="https://www.turingtalks.ai/">Visit turingtalks.ai</a> for daily byte-sized AI tutorials.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build A Text Summarizer Using Huggingface Transformers ]]>
                </title>
                <description>
                    <![CDATA[ In today’s fast-paced world, we’re bombarded with information. It’s like trying to drink water from a fire hose. That's where a text summarizer comes in. Imagine it as a filter that separates the essential bits from the overwhelming flood of words. I... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-text-summarizer-using-huggingface-transformers/</link>
                <guid isPermaLink="false">66d035dbba54db009200dc87</guid>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 28 Feb 2024 10:13:47 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/02/summary-1.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In today’s fast-paced world, we’re bombarded with information. It’s like trying to drink water from a fire hose.</p>
<p>That's where a text summarizer comes in. Imagine it as a filter that separates the essential bits from the overwhelming flood of words.</p>
<p>In this article, I'll walk you through what a summarizer is, its use cases, what Hugging Face Transformers are, and how you can build your own text summarizer using Hugging Face Transformers. Let's dive in.</p>
<h2 id="heading-what-is-a-summarizer">What is a Summarizer?</h2>
<p>A summarizer does exactly what its name suggests. It takes a large block of text and condenses it into a shorter version.</p>
<p>This shorter version keeps only the key points. Think of it as the difference between reading a whole novel and glancing at its back cover. The aim is to save time while still getting the essence of the content.</p>
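<p>To make the idea concrete, here is a deliberately simple extractive summarizer in plain Python: it scores each sentence by how frequent its words are across the whole text and keeps the top-scoring sentences. This is a classic word-frequency heuristic, not the approach used later in this article; models like BART are abstractive, meaning they can write new sentences rather than only copying existing ones. The stopword list and scoring rule are illustrative assumptions.</p>

```python
import re
from collections import Counter

# Common words that carry little meaning on their own (illustrative list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "it", "of", "on", "or", "that", "the", "this", "to", "with"}


def content_words(text):
    """Lowercase words in `text`, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Toy extractive summarizer: keep the sentences built from the most frequent words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average (not total) frequency, so long sentences aren't automatically favored.
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))


text = ("Cats are popular pets. Cats sleep a lot and cats purr. "
        "The weather was mild yesterday. Many people love cats.")
print(summarize(text))
# Cats are popular pets. Cats sleep a lot and cats purr.
```

<p>The off-topic sentence about the weather is dropped because its words rarely appear elsewhere in the text.</p>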
<h2 id="heading-use-cases-for-a-summarizer">Use Cases for a Summarizer</h2>
<p>Summarizers are not just cool tech tricks. They serve real-world needs.</p>
<p>Journalists use them to quickly sift through reports and studies. Students use them to summarize lengthy readings. Businesses use them to condense market analyses or lengthy reports.</p>
<p>In essence, anyone who needs to process large amounts of text quickly can benefit from a summarizer.</p>
<h2 id="heading-what-are-hugging-face-transformers">What are Hugging Face Transformers?</h2>
<p><a target="_blank" href="https://huggingface.co/">Hugging Face</a> is a company that has created a state-of-the-art platform for natural language processing (NLP).</p>
<p>Their Transformers library is like a treasure trove for NLP tasks. It includes pre-trained models that can do everything from translation and sentiment analysis, to yes, summarization.</p>
<p>These models have learned from vast amounts of text and can understand and generate language in a surprisingly human-like way.</p>
<h2 id="heading-how-to-build-a-summarizer-with-hugging-face-transformers">How to Build a Summarizer with Hugging Face Transformers</h2>
<p>Now, let's roll up our sleeves and start building. We will use the Hugging Face pipeline to implement our summarization model using <a target="_blank" href="https://huggingface.co/facebook/bart-large">Facebook’s BART model</a>.</p>
<p>The BART model is pre-trained on English text. It is a sequence-to-sequence model and is great for text generation tasks (such as summarization and translation). It also works well for comprehension tasks (for example, text classification and question answering).</p>
<p><a target="_blank" href="https://huggingface.co/docs/transformers/en/main_classes/pipelines">Hugging Face Pipelines</a> offer a simpler way to run these tasks. Instead of preparing a dataset, training a model, and then writing code to tokenize the input and decode the output yourself, you call a single function and the pipeline handles those steps for you with a pre-trained model.</p>
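<p>Conceptually, a pipeline chains three stages: preprocess the text into token ids, run the model, and postprocess the output back into text. The toy class below only illustrates that flow; it is not the transformers implementation, and its vocabulary and "model" (which simply keeps the first three tokens) are hypothetical stand-ins. The real <code>Pipeline</code> class is organized around similar preprocessing, forward, and postprocessing steps, but also handles real tokenizers, batching, and devices.</p>

```python
class ToyPipeline:
    """Illustrative stand-in for the steps a pipeline automates:
    preprocess (tokenize) -> run the model -> postprocess (decode)."""

    def __init__(self, vocab, model_fn):
        self.vocab = vocab                                  # word -> token id
        self.id_to_word = {i: w for w, i in vocab.items()}  # token id -> word
        self.model_fn = model_fn                            # token ids -> token ids

    def preprocess(self, text):
        # Tokenize: map known words to integer ids (unknown words are dropped here).
        return [self.vocab[w] for w in text.lower().split() if w in self.vocab]

    def postprocess(self, ids):
        # Decode: turn ids back into text.
        return " ".join(self.id_to_word[i] for i in ids)

    def __call__(self, text):
        return self.postprocess(self.model_fn(self.preprocess(text)))


# A hypothetical "model" that just keeps the first three tokens,
# standing in for a real summarization model.
vocab = {"generic": 0, "drugs": 1, "lower": 2, "costs": 3, "for": 4, "patients": 5}
toy_summarizer = ToyPipeline(vocab, model_fn=lambda ids: ids[:3])
print(toy_summarizer("Generic drugs lower costs for patients"))
# generic drugs lower
```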
<h3 id="heading-how-to-set-up-your-environment">How to set up your environment</h3>
<p>First, you need to set up your coding environment. I prefer to use a Google Colab notebook instead of installing everything on my local machine. <a target="_blank" href="https://colab.research.google.com/drive/1Urxh0anruXP6HTbmi5B5TM0UuK3pQHzI?usp=sharing">Here is the notebook for this tutorial.</a></p>
<p>Let’s start by installing the Transformers library. Add a <code>!</code> before the command if you are running it in a Colab notebook:</p>
<pre><code>pip install transformers
</code></pre><p>Now let’s initialize a text summarization pipeline using the Hugging Face <code>transformers</code> library:</p>
<pre><code>from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
</code></pre><p>Let's break down what each part does:</p>
<ul>
<li><code>pipeline</code>: This is a function provided by the Hugging Face <code>transformers</code> library to make it easy to apply different types of Natural Language Processing (NLP) tasks, such as text classification, translation, summarization, and so on. The function returns a ready-to-use pipeline object for the specified task.</li>
<li><code>"summarization"</code>: This is the first argument to the <code>pipeline</code> function and specifies the type of task you want the pipeline to perform. In this case, <code>"summarization"</code> means that the pipeline will be configured to summarize text.</li>
<li><code>model="facebook/bart-large-cnn"</code>: This argument specifies the pre-trained model to be used for the summarization task. This model is provided by Facebook and is based on the BART (Bidirectional and Auto-Regressive Transformers) architecture, which is effective for tasks that require understanding and generating natural language. In the name, <code>large</code> refers to the size of the model, and <code>cnn</code> indicates that it was fine-tuned on the CNN/Daily Mail dataset of news articles paired with summaries, so it works best on news-style text.</li>
</ul>
<p>When this line of code is executed, it creates a <code>summarizer</code> object. This object can then be used to perform text summarization by passing text data to it. The model will generate a shorter version of the input text, capturing the most important or relevant information, according to its training on the summarization task.</p>
<p>Now we are ready to use the model to summarize our text (yeah, really!). Let's use an excerpt from an <a target="_blank" href="https://www.fda.gov/drugs/generic-drugs/office-generic-drugs-2023-annual-report">annual report from the FDA</a> as input to get our summary.</p>
<pre><code>text = <span class="hljs-string">""</span><span class="hljs-string">"In 2023 generic drugs continued to play a critical role in the U.S. health care system allowing patients greater access to needed medicines. Generic drugs are generally lower cost than their brand-name equivalent and the approval of generic drugs often means multiple manufacturers for generic medicines, which can help stabilize the supply chain and reduce drug shortage risks.

The mission of the Office of Generic Drugs is to ensure high-quality, safe, and effective generic medicines are available to the American public. Our 2023 Annual Report provides highlights of activities and accomplishments including generic drug approvals, first generic approvals, science and research innovations for generic medicines – including complex generics, and international collaboration, as well as how we are doing on agreements made under the third iteration of the Generic Drug User Fee Amendments."</span><span class="hljs-string">""</span>
</code></pre><p>Now let’s use this text as input and call our summarizer:</p>
<pre><code>summary = summarizer(text, max_length=<span class="hljs-number">150</span>, min_length=<span class="hljs-number">40</span>, do_sample=False)
</code></pre><p>This line of code is using the <code>summarizer</code> object created from a Hugging Face <code>pipeline</code> to generate a summary of the input text. Here's a breakdown of the function call and its parameters:</p>
<ul>
<li><code>summarizer</code>: This is the object initialized previously with the <code>pipeline</code> function.</li>
<li><code>text</code>: This is the input text that you want to summarize.</li>
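<li><code>max_length=150</code>: This parameter specifies the maximum length of the summary in tokens. Tokens are the units the model works with (words, parts of words, and punctuation marks), so one word can be split into several tokens.</li>
<li><code>min_length=40</code>: Similarly, this parameter sets the minimum length of the summary, in tokens.</li>
<li><code>do_sample=False</code>: This turns off random sampling, so the model chooses each next token deterministically (using greedy or beam search). Running the same input again produces the same summary.</li>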
</ul>
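<p>To see how <code>min_length</code> and <code>max_length</code> interact, here is a toy greedy decoder in plain Python. It mirrors the general idea (block the end-of-sequence token until the minimum length is reached, and stop once the maximum is hit), but it is a simplified sketch with a made-up scoring function, not the transformers generation code, which has many more options such as beam search.</p>

```python
def greedy_decode(next_token_scores, eos_token, min_length, max_length):
    """Toy greedy decoder showing how min_length and max_length shape the output.

    next_token_scores(prefix) returns {token: score} for the next position.
    """
    output = []
    while len(output) < max_length:  # never generate more than max_length tokens
        scores = dict(next_token_scores(output))
        if len(output) < min_length:
            # Too short to stop yet: take end-of-sequence off the table.
            scores.pop(eos_token, None)
        token = max(scores, key=scores.get)  # greedy: pick the best-scoring token
        if token == eos_token:
            break
        output.append(token)
    return output


def eager_to_stop(prefix):
    """Hypothetical model that wants to end the summary after two tokens."""
    if len(prefix) >= 2:
        return {"[EOS]": 0.9, "more": 0.1}
    return {"word": 0.8, "[EOS]": 0.2}


print(greedy_decode(eager_to_stop, "[EOS]", min_length=0, max_length=10))
# ['word', 'word']
print(greedy_decode(eager_to_stop, "[EOS]", min_length=4, max_length=10))
# ['word', 'word', 'more', 'more']
print(greedy_decode(eager_to_stop, "[EOS]", min_length=0, max_length=1))
# ['word']
```

<p>A <code>min_length</code> that is too high for a short input can force the model to keep writing after it would naturally stop, so it is worth tuning both limits to the length of your texts.</p>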
<p>Finally, we will print our summary:</p>
<pre><code>print(summary[<span class="hljs-number">0</span>][<span class="hljs-string">'summary_text'</span>])
</code></pre><p>And here is the response:</p>
<pre><code>In <span class="hljs-number">2023</span> generic drugs continued to play a critical role <span class="hljs-keyword">in</span> the U.S. health care system allowing patients greater access to needed medicines. Generic drugs are generally lower cost than their brand-name equivalent and the approval <span class="hljs-keyword">of</span> generic drugs often means multiple manufacturers <span class="hljs-keyword">for</span> generic medicines.
</code></pre><p>In short, this code:</p>
<ul>
<li>loads a summarization pipeline that is pre-configured to use the <code>facebook/bart-large-cnn</code> model.</li>
<li>feeds the text to the summarizer.</li>
<li>outputs a summary with specified minimum and maximum lengths.</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building a text summarizer with Hugging Face Transformers is not just about playing with cool technology. It's about harnessing the power of AI to make our lives a bit easier. Whether you're a student, a professional, or just someone curious about NLP, the ability to quickly condense information is invaluable.</p>
<p>With Hugging Face Transformers, you're standing on the shoulders of giants, leveraging some of the most advanced NLP models available today. So, give it a try. Who knows? It might just change the way you deal with text forever.</p>
<p>Thanks for reading this article. Find more AI tutorials at <a target="_blank" href="https://www.turingtalks.ai/">TuringTalks.ai</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use the Hugging Face Transformer Library ]]>
                </title>
                <description>
                    <![CDATA[ In this article, I'll talk about why I think Hugging Face’s Transformer Library is a game-changer in NLP for developers and researchers alike. Have you ever wondered how modern AI achieves such remarkable feats, like understanding human language ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/hugging-face-transformer-library-overview/</link>
                <guid isPermaLink="false">66d035f915ea3036a953992e</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 31 Jan 2024 00:36:42 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/01/hugging-face.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this article, I'll talk about why I think Hugging Face’s Transformer Library is a game-changer in NLP for developers and researchers alike.</p>
<p>Have you ever wondered how modern AI achieves such remarkable feats, like understanding human language or generating text that sounds like it was written by a person?</p>
<p>A significant part of this magic stems from a groundbreaking neural network architecture called <a target="_blank" href="https://blogs.nvidia.com/blog/what-is-a-transformer-model/">the Transformer</a>. Many tools released in the Natural Language Processing (NLP) space are built around Transformer models, and an important one is the <a target="_blank" href="https://huggingface.co/docs/transformers/index">Hugging Face Transformer Library</a>.</p>
<p>In this article, I’ll walk you through why this library is not just another piece of software, but a powerful tool for engineers and researchers alike. Then you'll see a practical example of how to use it.</p>
<h2 id="heading-what-is-the-hugging-face-transformer-library">What is the Hugging Face Transformer Library?</h2>
<p>The Hugging Face Transformer Library is an open-source library that provides a vast array of pre-trained models primarily focused on NLP. It’s built on PyTorch and TensorFlow, making it incredibly versatile and powerful.</p>
<p>One of the first reasons the Hugging Face library stands out is its remarkable user-friendliness. Even if you’re not a deep learning expert, you can use this library with relative ease.</p>
<p>It offers straightforward interfaces that allow you to implement complex models with just a few lines of code. This simplicity opens the doors of advanced AI to a broader range of developers and researchers.</p>
<h2 id="heading-pre-trained-and-ready-to-go">Pre-Trained and Ready to Go</h2>
<p>The beauty of today’s deep learning models is that you don't have to train a model from scratch. Many models are available pre-trained, and your job as an AI engineer is often to fine-tune one of them on your own data.</p>
<p>So imagine having access to a toolbox where each tool is tailored for a specific job. That’s what Hugging Face offers with its wide range of pre-trained models.</p>
<p>Whether you’re working on text classification, question answering, or language generation, there’s a model ready for you to use. This saves an enormous amount of time and resources as you don’t have to start from scratch.</p>
<p>While pre-trained models are fantastic, they might not fit every specific need. This is where Hugging Face truly shines. The library allows you to fine-tune models on your dataset, making it possible to customize the models to your specific requirements.</p>
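<p>As a rough illustration of customizing a model, the sketch below trains only a small logistic-regression “head” on top of fixed feature vectors, an approach often called feature extraction or linear probing. The two-dimensional vectors are made-up stand-ins for the sentence embeddings a pre-trained model would produce. Full fine-tuning, which the library supports through its <code>Trainer</code> class or a plain PyTorch training loop, also updates the pre-trained model’s own weights, so treat this as a simplified picture rather than the library’s method.</p>

```python
import math


def train_head(features, labels, epochs=200, lr=0.5):
    """Train a logistic-regression head on fixed (frozen) feature vectors with SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - y                    # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the head scores x as positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Hypothetical 2-D "embeddings" standing in for a pre-trained model's sentence vectors.
features = [[2.0, 0.5], [1.5, 1.0], [-1.0, -0.5], [-2.0, 0.0]]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

weights, bias = train_head(features, labels)
print([predict(weights, bias, x) for x in features])
```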
<h2 id="heading-community-support">Community Support</h2>
<p>What sets Hugging Face apart is not just its technical capabilities but also its vibrant community. By engaging with this community, you gain access to a wealth of knowledge and support.</p>
<p>Users continuously contribute to the library, adding new models and features, making it a living, evolving ecosystem. This collaborative spirit ensures that the library stays at the cutting edge of AI research and application.</p>
<h2 id="heading-performance-and-scalability">Performance and Scalability</h2>
<p>In the world of AI, performance is key, and the Hugging Face library doesn’t disappoint. It’s designed to handle large-scale models efficiently, which means you can work with some of the most advanced AI models without needing a supercomputer at your disposal.</p>
<p>Hugging Face is also not just about English. It supports multiple languages, which is essential for organizations and developers aiming to create AI applications for a diverse user base.</p>
<h2 id="heading-popular-hugging-face-models">Popular Hugging Face Models</h2>
<ol>
<li><a target="_blank" href="https://huggingface.co/docs/transformers/model_doc/bert"><strong>BERT (Bidirectional Encoder Representations from Transformers)</strong></a><strong>:</strong> BERT excels in understanding the context of a word in a sentence, making it effective for tasks like sentiment analysis, question-answering, and language understanding. It’s widely used in chatbots, search engines, and to enhance user interaction with AI systems.</li>
<li><a target="_blank" href="https://huggingface.co/gpt2"><strong>GPT (Generative Pretrained Transformer)</strong></a><strong>:</strong> Known for its ability to generate human-like text, GPT is used for creative writing, generating conversational responses, and even writing code. It’s particularly popular in chatbots, automated content creation tools, and customer service applications.</li>
<li><a target="_blank" href="https://huggingface.co/docs/transformers/model_doc/distilbert"><strong>DistilBERT</strong></a>: A streamlined version of BERT, DistilBERT offers similar capabilities but is faster and requires less computational power. It’s ideal for environments where resources are limited, like mobile applications, and is used in tasks like text classification and information extraction.</li>
<li><a target="_blank" href="https://huggingface.co/docs/transformers/model_doc/roberta"><strong>RoBERTa (Robustly Optimized BERT Approach)</strong></a>: An optimized version of BERT, RoBERTa is trained on a larger dataset and for a longer time, leading to improved performance. It’s used in more complex NLP tasks like sentiment analysis, language inference, and text classification.</li>
<li><a target="_blank" href="https://huggingface.co/docs/transformers/model_doc/t5"><strong>T5 (Text-To-Text Transfer Transformer)</strong></a>: T5 converts all NLP problems into a text-to-text format, providing a versatile approach to tasks like translation, summarization, and question answering. Its adaptability makes it valuable in diverse applications, from automated translation services to information summarization tools.</li>
</ol>
<p>Each of these models has its unique strengths, and you should choose them based on the specific requirements of your tasks. Make sure to balance factors like computational resources, complexity of the task, and the desired level of performance.</p>
<h2 id="heading-how-to-use-the-hugging-face-transformers-library">How to Use the Hugging Face Transformers Library</h2>
<p>Let me show you how easy it is to work with the Hugging Face Transformers library. We will implement a simple summarization script that takes in a large text and returns a short summary.</p>
<p>We will first import <code>pipeline</code> from the transformers library. In Hugging Face, a “pipeline” bundles the steps needed for a task (preparing the input, running the model, and formatting the output) behind a single function call.</p>
<pre><code><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
</code></pre><p>The pipeline makes it simple to use these tools for different jobs, without needing to know all the complex details about how these tools work on the inside. For this example, we will use the "summarization" pipeline. </p>
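<pre><code># No model is specified, so the library falls back to a default
# summarization model and prints a warning recommending that you name one.
summarizer = pipeline("summarization")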
</code></pre><p>And we are now ready to start using the summarization pipeline. Let's pass in a long chunk of text and see what the response is. </p>
<pre><code>text = <span class="hljs-string">""</span><span class="hljs-string">"
The development of the internet has been one of the most transformative events in human history, altering virtually every aspect of modern life. Initially conceived as a military and academic network in the late 1960s, the internet evolved rapidly through the 1970s and 1980s, expanding its reach and capabilities with each passing year. The introduction of the World Wide Web in the early 1990s was a critical moment, making the internet much more accessible and user-friendly, sparking a global revolution in communication, business, and entertainment. As a tool for information dissemination, the internet has been unparalleled, allowing instant access to vast amounts of data from all over the world. It has democratized information, breaking down barriers that once existed due to geography or social status. The internet has also had a profound impact on commerce, giving rise to e-commerce and transforming traditional business models. The ease of online shopping and the rise of digital marketplaces have reshaped consumer habits and expectations. Socially and culturally, the internet has connected people across the globe, facilitating the exchange of ideas and cultures in a way that was previously unimaginable. However, it has also raised concerns about privacy, data security, and the digital divide. The rapid dissemination of information has sometimes led to the spread of misinformation, posing challenges for societies in discerning truth from falsehood. As the internet continues to evolve, it poses new challenges and opportunities, shaping the future of human interaction, governance, and technology.
"</span><span class="hljs-string">""</span>

summary = summarizer(text)
print(summary[<span class="hljs-number">0</span>][<span class="hljs-string">'summary_text'</span>])
</code></pre><p>Here is a sample response:</p>
<pre><code> The introduction <span class="hljs-keyword">of</span> the internet <span class="hljs-keyword">in</span> the <span class="hljs-number">1970</span>s and <span class="hljs-number">1980</span>s was a major event <span class="hljs-keyword">for</span> the world<span class="hljs-string">'s first time . As a result, the internet has been able to connect people across the globe . The internet has also raised concerns about privacy and security in the digital age of 21.</span>
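</code></pre><p>That's how easy it is to work with the Hugging Face Transformers library. Note that the summary is not perfect: it says the internet was introduced in the 1970s and 1980s, while the input says it was first conceived in the late 1960s. Summarization models can introduce errors like this, so review generated summaries before relying on them.</p>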
<h2 id="heading-ethical-ai-and-transparency-a-step-towards-responsible-ai">Ethical AI and Transparency: A Step Towards Responsible AI</h2>
<p>Since AI ethics are increasingly under the spotlight, Hugging Face commits to transparency and responsible AI development. The open-source nature of the library promotes a level of transparency that’s essential for ethical AI development. Users can see exactly how models are built and make informed decisions about their use.</p>
<p>AI is a field that never stands still, and neither does the Hugging Face Transformer Library. It’s continuously updated with the latest breakthroughs in AI research, so new models often become available through the library soon after they are released.</p>
<p>Finally, the real test of any tool is its applications in the real world, and here, Hugging Face excels. It’s used by academics for cutting-edge research and by companies for practical applications like sentiment analysis, content generation, and language translation.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In summary, the Hugging Face Transformer Library is more than just a collection of AI models. It’s a gateway to advanced AI for people of all skill levels. Its ease of use and the availability of a comprehensive range of models make it a standout library in the world of AI.</p>
<p>Whether you’re a seasoned AI expert or just starting, the Hugging Face library is a useful resource that can help you achieve your AI goals.</p>
<p>Hope you enjoyed this article. Find more beginner-friendly tutorials on AI at <strong><a target="_blank" href="https://www.turingtalks.ai/">turingtalks.ai</a>.</strong> </p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Simple Sentiment Analyzer Using Hugging Face Transformer ]]>
                </title>
                <description>
                    <![CDATA[ In this article, we will look at writing a sentiment analyzer using Hugging Face Transformer, a powerful tool in the world of NLP.  Imagine you’re running a business and you want to know what your customers think about your product. Or maybe you’re a... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-a-simple-sentiment-analyzer-using-hugging-face-transformer/</link>
                <guid isPermaLink="false">66d035d812c679876b0602de</guid>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Sentiment analysis ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Fri, 26 Jan 2024 00:32:04 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/01/pngtree-facial-emotions-illustration-in-black-outline-on-white-background-vector-picture-image_10574137.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this article, we will look at writing a sentiment analyzer using Hugging Face Transformer, a powerful tool in the world of NLP. </p>
<p>Imagine you’re running a business and you want to know what your customers think about your product. Or maybe you’re a movie director wanting to gauge the public reaction to your latest release.</p>
<p>This is where sentiment analysis comes into play.</p>
<blockquote>
<p>Sentiment analysis is a technique used in text analysis that helps in identifying and categorizing opinions expressed in a piece of text.</p>
</blockquote>
<p>Sentiment analysis determines whether the opinion expressed in a document, a sentence, or about a specific feature or aspect of an entity is positive, negative, or neutral.</p>
<p>In a world where data is king, sentiment analysis is a crown jewel. It’s like having a superpower to understand the emotional tone behind words at scale.</p>
<p>Companies use it to understand customer feedback on products and services. Governments and organizations use it to get a sense of public opinion.</p>
<p>In social media management, sentiment analysis is used for brand monitoring, customer service, and market research.</p>
<p>It’s not just about understanding how many people are talking about your brand or product, but how they feel about it.</p>
<h2 id="heading-what-is-hugging-face">What is Hugging Face?</h2>
<p>Now, let’s talk about Hugging Face. No, it’s not what you think. You don’t go around hugging faces.</p>
<p>In the world of AI, <a target="_blank" href="https://huggingface.co/">Hugging Face</a> is quite the star. It’s an AI community and platform that provides state-of-the-art tools and models for Natural Language Processing (NLP).</p>
<p>Think of it as a toolbox that gives you the power to understand and generate human language. It’s like having a linguistic wizard by your side.</p>
<p>Hugging Face’s most popular offering is the ‘Transformers’ library. It comes packed with APIs and tools that let you easily download top-notch pre-trained models and fine-tune them on your own data.</p>
<p>When you pick these pre-trained models, you’re cutting down on compute costs and carbon footprint. Plus, you save loads of time and resources that you’d otherwise spend training a model from scratch.</p>
<p>These models solve common tasks across various domains, like:</p>
<ul>
<li><strong>Natural Language Processing (NLP)</strong>: Here, you can do a bunch of cool stuff like text classification, spotting names or entities in text, answering questions, language modelling, summarizing, translating, handling multiple-choice questions, and even generating text.</li>
<li><strong>Computer Vision:</strong> This involves image classification, spotting and outlining objects in images, and more.</li>
<li><strong>Audio:</strong> You can work on recognizing speech automatically and classifying different types of sounds.</li>
<li><strong>Multimodal Tasks:</strong> These are tasks that mix it up, like answering questions based on tables, recognizing text in images (like scanned documents), pulling out information from these documents, classifying videos, and answering questions based on images.</li>
</ul>
<p>The neat thing about Transformers is that they’re flexible with different frameworks. Whether you’re into <a target="_blank" href="https://turingtalks.substack.com/p/pytorch-vs-tensorflow-for-deep-learning">PyTorch</a>, TensorFlow, or JAX, Transformers has got you covered.</p>
<p>Its ease of use and comprehensive nature make it a go-to for researchers, developers, and businesses alike.</p>
<h2 id="heading-code-for-sentiment-analysis">Code for Sentiment Analysis</h2>
<p>Now that you know what sentiment analysis and Hugging Face are, let’s write some code. We’ll use Python and the Hugging Face <code>transformers</code> library to build a simple sentiment analyzer.</p>
<p>You can either install Python on your machine and run the code from your terminal, or use a <a target="_blank" href="https://colab.research.google.com/">Google Colab notebook</a>. I recommend the latter, since Colab comes with Python pre-installed. If you run the code locally, you'll also need a deep learning backend such as PyTorch, which the pipeline uses to run the model.</p>
<p>Install the <code>transformers</code> library with this command:</p>
<pre><code>pip install transformers
</code></pre><p>If you are using a Colab notebook, use a <strong>!</strong> symbol before the command for the notebook to treat it as a shell command (Colab executes code as Python by default).</p>
<pre><code>!pip install transformers
</code></pre><p>Once the installation is complete, you can start using the library. First, let's import <code>pipeline</code> from the transformers library.</p>
<pre><code><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
</code></pre><p>In the Transformers library, a “pipeline” bundles together the steps needed for a task: preprocessing the input (for text, that means tokenizing it), running the model, and post-processing the model's raw output into a readable result. This lets you use a model for many different tasks without needing to know the details of how each step works.</p>
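<p>To make the post-processing step concrete: for a text-classification task like this one, the model outputs one raw score (logit) per label, and the pipeline turns these into probabilities with a softmax and reports the most likely label. The plain-Python sketch below imitates only that final step. The logits are made-up numbers, and the label order (index 0 is NEGATIVE, index 1 is POSITIVE) is an assumption for illustration, not read from a real model's configuration.</p>

```python
import math

# Assumed label order for illustration: index 0 = NEGATIVE, index 1 = POSITIVE.
LABELS = ["NEGATIVE", "POSITIVE"]


def postprocess(logits):
    """Turn raw model scores (logits) into a pipeline-style {'label', 'score'} dict."""
    # Subtract the largest logit before exponentiating, for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Softmax: every probability is between 0 and 1, and they sum to 1.
    probs = [e / total for e in exps]
    # Report the most probable label together with its probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


# Made-up logits, not real model output.
print(postprocess([-4.3, 4.7]))
```

<p>The <code>score</code> in the pipeline's output is this kind of probability, which is why it always falls between 0 and 1.</p>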
<p>Now let’s load the <code>sentiment-analysis</code> pipeline.</p>
<pre><code>sentiment_pipeline = pipeline(<span class="hljs-string">"sentiment-analysis"</span>)
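</code></pre><p>Would you believe me if I said we are pretty much done? Because we didn't name a model, the pipeline falls back to a default English sentiment model and downloads it the first time it runs (you can pick a specific model with the <code>model</code> argument). The analyzer is now ready: pass it some text and it returns a label along with a confidence score.</p>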
<pre><code># Run sentiment analysis
result = sentiment_pipeline(<span class="hljs-string">"Every new day brings a chance to create joyful memories and embrace new opportunities."</span>)

# Print the result
print(result)
</code></pre><p>This is the output of the above code:</p>
<pre><code>[{<span class="hljs-string">'label'</span>: <span class="hljs-string">'POSITIVE'</span>, <span class="hljs-string">'score'</span>: <span class="hljs-number">0.9998821020126343</span>}]
</code></pre><p>To analyze several sentences at once, pass a list of strings to the pipeline. It returns one result per input, in the same order.</p>
<pre><code>result = sentiment_pipeline([<span class="hljs-string">"Every new day brings a chance to create joyful memories and embrace new opportunities."</span>,<span class="hljs-string">"Despite the effort, the project failed to meet expectations, leading to disappointment and frustration among the team."</span>])
print(result)
</code></pre><p>Here is the output of the above code:</p>
<pre><code>[{<span class="hljs-string">'label'</span>: <span class="hljs-string">'POSITIVE'</span>, <span class="hljs-string">'score'</span>: <span class="hljs-number">0.9998821020126343</span>}, {<span class="hljs-string">'label'</span>: <span class="hljs-string">'NEGATIVE'</span>, <span class="hljs-string">'score'</span>: <span class="hljs-number">0.9997937083244324</span>}]
</code></pre><p>With just a few lines of code, we have a working sentiment analyzer. And this is only one of the many pre-trained models that Hugging Face provides: unless you are working on an unusual problem, there's a good chance you'll find a pre-trained model on Hugging Face that you can start from.</p>
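<p>In a real project, you'll usually run the pipeline over many reviews or posts and then aggregate the results. Here is a minimal plain-Python sketch that assumes only the list-of-dictionaries output format shown above. The <code>summarize</code> helper and its <code>min_score</code> cutoff are hypothetical, not part of the <code>transformers</code> API, and the hard-coded results are copied from the two-sentence example.</p>

```python
from collections import Counter

# Results in the shape the pipeline returns: one dict per input text.
# Values copied from the two-sentence example above.
results = [
    {"label": "POSITIVE", "score": 0.9998821020126343},
    {"label": "NEGATIVE", "score": 0.9997937083244324},
]


def summarize(results, min_score=0.0):
    """Count labels, skipping predictions whose score is below min_score (hypothetical helper)."""
    kept = [r for r in results if r["score"] >= min_score]
    counts = Counter(r["label"] for r in kept)
    # Fraction of the kept predictions that are POSITIVE (0.0 if nothing was kept).
    positive_share = counts["POSITIVE"] / len(kept) if kept else 0.0
    return counts, positive_share


counts, share = summarize(results, min_score=0.9)
print(dict(counts), share)
```

<p>In practice you would replace the hard-coded list with <code>sentiment_pipeline(texts)</code>, where <code>texts</code> is your list of reviews. Raising <code>min_score</code> drops predictions the model is less sure about before counting.</p>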
<h2 id="heading-summary">Summary</h2>
<p>In this article, we’ve learned about sentiment analysis and Hugging Face, a powerful tool in the world of NLP. Most importantly, you’ve taken your first steps in performing sentiment analysis by using the Hugging Face Transformers library.</p>
<p>Remember, what we’ve covered is just the tip of the iceberg. The field of NLP is vast and constantly evolving. The Hugging Face Transformers library is a powerful ally in your journey through AI. It simplifies complex tasks and gives you access to pre-trained models, saving you time and resources.</p>
<p>Hope you enjoyed this article. Find more beginner-friendly articles on AI at <strong><a target="_blank" href="https://www.turingtalks.ai/">turingtalks.ai</a></strong></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Natural Language Processing Techniques for Topic Identification – Explained with Examples ]]>
                </title>
                <description>
                    <![CDATA[ There's a lot of textual information available these days. It ranges from articles to social media posts and research papers. So our ability to distill meaningful insights is key. This helps us make informed decisions in a wide array of contexts. For... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/topic-identification-using-natural-language-processing/</link>
                <guid isPermaLink="false">66d45f44052ad259f07e4af0</guid>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ibrahim Ogunbiyi ]]>
                </dc:creator>
                <pubDate>Thu, 25 Jan 2024 16:16:15 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/01/pexels-wallace-chuck-3109168.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>There's a lot of textual information available these days. It ranges from articles to social media posts and research papers. So our ability to distill meaningful insights is key. This helps us make informed decisions in a wide array of contexts.</p>
<p>For example, you can analyze a large volume of textual content to extract a common theme. Companies and businesses utilize this technique to understand public opinion about their brand. This lets them make informed decisions and improve their services.</p>
<p>The ability to extract themes from a large amount of textual data is referred to as topic identification.</p>
<p>In this article, you will learn how to utilize NLP techniques for topic identification, enhancing your skillset as a data scientist. So sit back, because it's gonna be an interesting journey.</p>
<h2 id="heading-what-is-topic-identification">What is Topic Identification?</h2>
<p>Topic identification is a subfield of natural language processing. It involves automatically discovering and organizing the main themes or topics present in a collection of textual data.</p>
<p>There are several NLP techniques you can use to identify themes in text, ranging from simple word counting to more advanced algorithmic approaches. In this article, we'll look at some of the most common ones.</p>
<p>I recently tweeted about the essence of NLP: at its core, much of it is statistics. Computers don't understand text, so it first has to be converted into numbers, and statistical methods then operate on those numerical representations.</p>
<div class="embed-wrapper">
        <blockquote class="twitter-tweet">
          <a href="https://twitter.com/Ibrahim_Geek/status/1742877290227187989?s=20"></a>
        </blockquote>
        <script defer="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></div>
<h2 id="heading-requirements-for-this-project">Requirements for this Project</h2>
<p>To follow along and get hands-on practice while learning, you should have Python 3.x installed on your machine.</p>
<p>We'll also use the following libraries: Gensim, Scikit-Learn, and NLTK. You can install them using the Pip package installer with the following command:</p>
<pre><code class="lang-bash">pip install gensim nltk scikit-learn
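</code></pre>
<p>NLTK's tokenizers and stop-word list rely on data files that are downloaded separately. You can fetch them once from Python with <code>nltk.download("punkt")</code> and <code>nltk.download("stopwords")</code>; recent NLTK releases may also ask for <code>nltk.download("punkt_tab")</code>.</p>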
<h2 id="heading-techniques-used-in-nlp-for-topic-identification">Techniques Used in NLP for Topic Identification</h2>
<p>There are various techniques you can use for topic identification. In this article, you will learn about some common NLP techniques that work quite well, from simple and effective methods to more advanced ones.</p>
<h3 id="heading-bag-of-words">Bag of Words</h3>
<p>Bag of Words (BoW) is a common way to represent textual data in NLP. It describes a document by how often each word occurs in it, ignoring grammar and word order.</p>
<p>BoW, in the context of topic identification, is based on the assumption that the more frequently a word occurs in a document, the more important it is. Then you can use those more common words to infer what the document is all about.</p>
<p>Bag of Words is one of the simplest techniques for identifying topics. While it is simple and efficient, its counts are easily dominated by stop words: very common words such as "the," "and," and "is" that carry little meaning on their own.</p>
<p>But once you remove stop words and normalize the text (for example, by converting it to lowercase), BoW can still be effective at surfacing some of the main topics.</p>
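<p>To see why stop words matter, the sketch below counts words in two sentences taken from the article used in the next section, first as-is and then with stop words removed. It uses only the Python standard library: a simple regular expression stands in for NLTK's tokenizer, and <code>STOP_WORDS</code> is a tiny hand-picked list for illustration, much shorter than NLTK's.</p>

```python
import re
from collections import Counter

text = (
    "The US's trade body had barred imports and sales of Apple watches "
    "with technology for reading blood-oxygen level. It comes after the White "
    "House declined to overturn a ban on sales and imports of the Series 9 "
    "and Ultra 2 watches which came into effect this week."
)

# A tiny, hand-picked stop-word list for illustration; NLTK's list is much longer.
STOP_WORDS = {"the", "and", "of", "a", "on", "to", "it", "had", "after", "s"}

# Lowercase the text and keep runs of letters only (a crude stand-in for word_tokenize).
tokens = re.findall(r"[a-z]+", text.lower())

# Raw counts: stop words dominate.
print(Counter(tokens).most_common(3))

# Counts after removing stop words: content words surface.
print(Counter(t for t in tokens if t not in STOP_WORDS).most_common(3))
```

<p>With raw counts, "the" and "and" top the list. After filtering, content words such as "imports", "sales", and "watches" rise to the top.</p>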
<p>Let's look at how you can use BoW to identify the topic below.</p>
<h4 id="heading-how-to-implement-of-bag-of-words-in-python">How to Implement Bag of Words in Python</h4>
<p>A bit of background about the example article we'll use here: I got it from the BBC, and it's titled "US lifts ban on imports of latest Apple watch." The article discusses the lifted ban on Apple's latest watches, Ultra 2 and Series 9.</p>
<p>Now let's go over how to implement the bag of words in Python. I'll break this code block up into sections and explain each part as I go to make it easier to digest.</p>
<pre><code class="lang-python"><span class="hljs-comment">#import necessary libraries</span>
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> Counter
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords

article = <span class="hljs-string">"Apple's latest smart watches can resume being sold in the US after the tech company filed an emergency appeal with authorities.\
Sales of the Series 9 and Ultra 2 watches had been halted in the US over a patent row.\
The US's trade body had barred imports and sales of Apple watches with technology for reading blood-oxygen level.\
Device maker Masimo had accused Apple of poaching its staff and technology. \
It comes after the White House declined to overturn a ban on sales and imports of the Series 9 and Ultra 2 watches which came into effect this week.\
Apple had said it strongly disagrees with the ruling.\
The iPhone maker made an emergency request to the US Court of Appeals, which proved successful in getting the ban lifted."</span>
</code></pre>
<p>In the above code, we import the libraries we'll use to implement BoW and define the article text.</p>
<p>We'll use the <code>Counter</code> class from Python's <code>collections</code> module to count how often each word occurs, and NLTK's <code>word_tokenize</code> function to split the article into individual word tokens so they can be counted. Lastly, NLTK's <code>stopwords</code> corpus provides the list of English stop words that we'll filter out.</p>
<pre><code class="lang-python">
<span class="hljs-comment"># Initialize english stopwords</span>
english_stopwords = stopwords.words(<span class="hljs-string">"english"</span>)

<span class="hljs-comment">#convert article to tokens</span>
tokens = word_tokenize(article)

<span class="hljs-comment">#extract alpha words and convert to lowercase</span>
alpha_lower_tokens = [word.lower() <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> tokens <span class="hljs-keyword">if</span> word.isalpha()]

<span class="hljs-comment">#remove stopwords</span>
alpha_no_stopwords = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> alpha_lower_tokens <span class="hljs-keyword">if</span> word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> english_stopwords]

<span class="hljs-comment"># Count word frequencies</span>
BoW = Counter(alpha_no_stopwords)

<span class="hljs-comment"># Print the 4 most common words</span>
print(BoW.most_common(<span class="hljs-number">4</span>))
</code></pre>
<p>In the above code, the first line loads NLTK's list of English stop words. The second line tokenizes the article string into individual words. The third line keeps only alphabetic tokens and converts them to lowercase, and the fourth removes the stop words. The last two lines count how often each remaining word occurs and print the four most common ones.</p>
<p>Below is the output of the BoW model:</p>
<pre><code class="lang-javascript">[(<span class="hljs-string">'watches'</span>, <span class="hljs-number">4</span>), (<span class="hljs-string">'us'</span>, <span class="hljs-number">4</span>), (<span class="hljs-string">'apple'</span>, <span class="hljs-number">3</span>), (<span class="hljs-string">'emergency'</span>, <span class="hljs-number">2</span>)]
</code></pre>
<p>From this, we can infer that the article is about Apple's watches in the US. Despite the simple reasoning behind it, bag of words still tells us something useful about the article.</p>
<h3 id="heading-latent-dirichlet-allocation">Latent Dirichlet Allocation</h3>
<p>Latent Dirichlet Allocation, or LDA for short, is a popular probabilistic model used in NLP and machine learning for topic modeling (using algorithms to identify topics). It is based on the assumption that documents are mixtures of topics, and topics are mixtures of words.</p>
<p>Simply put, LDA infers which topics a document contains, and in what proportions, based on the words that appear in it.</p>
<p>LDA operates on the bag-of-words representation of documents, where each document is represented as a vector of word frequencies. You can implement LDA using the Gensim library in Python (an open-source library for topic modeling and document similarity analysis).</p>
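<p>LDA's core assumption (documents are mixtures of topics, and topics are distributions over words) is easiest to see as the generative story the model assumes produced your documents. The plain-Python sketch below runs that story forward with two invented topics: draw a topic mixture from a Dirichlet distribution (sampled here by normalizing Gamma draws), then, for every word, pick a topic from the mixture and a word from that topic. Real LDA, as implemented in Gensim, works in the opposite direction: it infers the topics and each document's mixture from the observed words.</p>

```python
import random

# Two invented topics, each a probability distribution over a few words.
TOPICS = {
    "legal": {"ban": 0.4, "court": 0.3, "appeal": 0.3},
    "product": {"watches": 0.5, "apple": 0.3, "series": 0.2},
}


def sample_dirichlet(alpha, k, rng):
    """Sample k proportions from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, alpha=0.5, seed=None):
    """Run LDA's generative story forward to produce one toy document."""
    rng = random.Random(seed)
    names = list(TOPICS)
    # 1. Draw this document's topic mixture.
    mixture = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(length):
        # 2. For each word position, pick a topic according to the mixture...
        topic = rng.choices(names, weights=mixture)[0]
        # 3. ...then pick a word from that topic's word distribution.
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return dict(zip(names, mixture)), words


mixture, doc = generate_document(8, seed=1)
print(mixture)
print(doc)
```

<p>A small <code>alpha</code> tends to produce documents dominated by a single topic, while a larger value mixes the topics more evenly.</p>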
<p>Steps for implementing LDA include:</p>
<ul>
<li><p><strong>Import Libraries:</strong> The first step is to import the libraries you'll use.</p>
</li>
<li><p><strong>Data Preparation:</strong> Split the raw text into documents, then tokenize them, remove stop words, and optionally apply stemming or lemmatization.</p>
</li>
<li><p><strong>Create Dictionary and Corpus</strong>: Build a dictionary with unique word IDs. Then form a bag of words corpus representing document-word frequency.</p>
</li>
<li><p><strong>Train LDA Model</strong>: Use the document-word frequency and dictionary to train the LDA model, setting the desired number of topics.</p>
</li>
<li><p><strong>Print Topics</strong>: Explore and print the discovered topics.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># Import the necessary libraries</span>
<span class="hljs-keyword">from</span> gensim.corpora.dictionary <span class="hljs-keyword">import</span> Dictionary
<span class="hljs-keyword">from</span> gensim.models <span class="hljs-keyword">import</span> LdaModel
<span class="hljs-keyword">from</span> nltk <span class="hljs-keyword">import</span> sent_tokenize, word_tokenize
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords

article = <span class="hljs-string">"Apple's latest smart watches can resume being sold in the US after the tech company filed an emergency appeal with authorities. \
Sales of the Series 9 and Ultra 2 watches had been halted in the US over a patent row. \
The US's trade body had barred imports and sales of Apple watches with technology for reading blood-oxygen level. \
Device maker Masimo had accused Apple of poaching its staff and technology. \
It comes after the White House declined to overturn a ban on sales and imports of the Series 9 and Ultra 2 watches which came into effect this week. \
Apple had said it strongly disagrees with the ruling. \
The iPhone maker made an emergency request to the US Court of Appeals, which proved successful in getting the ban lifted."</span>
</code></pre>
<p>The above code imports the libraries we'll use to implement LDA and defines the article.</p>
<p>The first line imports Gensim's <code>Dictionary</code> class, and the second imports the <code>LdaModel</code> class. The third line imports NLTK's <code>sent_tokenize</code>, which we'll use to split the article into sentences (each treated as a separate document), and <code>word_tokenize</code>, which splits each document into individual words. Lastly, we import NLTK's <code>stopwords</code> corpus.</p>
<pre><code class="lang-python"><span class="hljs-comment"># load the English stop words</span>
english_stopwords = stopwords.words(<span class="hljs-string">"english"</span>)

<span class="hljs-comment"># convert article to documents (one document per sentence)</span>
documents = sent_tokenize(article)

<span class="hljs-comment"># tokenize and normalize (lowercase) each document</span>
tokenized_words = [word_tokenize(doc.lower()) <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> documents]

<span class="hljs-comment"># remove stop words and keep only alphanumeric tokens (words and numbers)</span>
cleaned_token = [[word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> sentence <span class="hljs-keyword">if</span> word <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> english_stopwords <span class="hljs-keyword">and</span> word.isalnum()]
                 <span class="hljs-keyword">for</span> sentence <span class="hljs-keyword">in</span> tokenized_words]

<span class="hljs-comment"># create a dictionary</span>
dictionary = Dictionary(cleaned_token)

<span class="hljs-comment"># Create a corpus from the document</span>
corpus = [dictionary.doc2bow(text) <span class="hljs-keyword">for</span> text <span class="hljs-keyword">in</span> cleaned_token]
</code></pre>
<p>The above code preprocesses the article: it splits the article into sentences (each treated as a separate document), converts them to lowercase, and tokenizes them into individual words.</p>
<p>The next part removes stop words and keeps only tokens made up of letters or digits. After that, we create a Gensim <code>Dictionary</code>, which maps each unique word to an integer ID. The last line converts each document into its bag-of-words form with <code>doc2bow</code>, a list of <code>(word_id, count)</code> pairs, and together these lists form the corpus.</p>
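<p>Gensim's <code>Dictionary</code> and <code>doc2bow</code> handle this bookkeeping for you, but the idea is simple. The plain-Python sketch below mimics both steps on two made-up token lists. The IDs themselves are arbitrary, so Gensim may number the words differently than this sketch does.</p>

```python
from collections import Counter

# Hypothetical, already-cleaned token lists (not the article's real output).
toy_docs = [
    ["apple", "watches", "us", "watches"],
    ["ban", "us", "lifted"],
]

# Dictionary step: give each unique word an integer ID, in order of first appearance.
token2id = {}
for doc in toy_docs:
    for word in doc:
        token2id.setdefault(word, len(token2id))

# doc2bow step: each document becomes a sorted list of (word_id, count) pairs.
toy_corpus = [sorted(Counter(token2id[w] for w in doc).items()) for doc in toy_docs]

print(token2id)
print(toy_corpus)
```

<p>This prints <code>{'apple': 0, 'watches': 1, 'us': 2, 'ban': 3, 'lifted': 4}</code> and <code>[[(0, 1), (1, 2), (2, 1)], [(2, 1), (3, 1), (4, 1)]]</code>. The first document contains "watches" (ID 1) twice, so its pair is <code>(1, 2)</code>.</p>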
<pre><code class="lang-javascript"># Build the LDA model
lda_model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=<span class="hljs-number">3</span>)

# Print the topics
print(<span class="hljs-string">"Identified Topics:"</span>)
<span class="hljs-keyword">for</span> idx, topic <span class="hljs-keyword">in</span> lda_model.print_topics():
    print(f<span class="hljs-string">"Topic {idx + 1}: {topic}"</span>)
</code></pre>
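<p>The above code trains an LDA model with three topics on the corpus, then prints each topic as its ten highest-weighted words. Because no <code>random_state</code> is set, the topics you get may differ from run to run.</p>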
<p>Below is the output of the LDA Model:</p>
<pre><code class="lang-javascript">Identified Topics:
Topic <span class="hljs-number">1</span>: <span class="hljs-number">0.045</span>*<span class="hljs-string">"9"</span> + <span class="hljs-number">0.045</span>*<span class="hljs-string">"ultra"</span> + <span class="hljs-number">0.044</span>*<span class="hljs-string">"sales"</span> + <span class="hljs-number">0.044</span>*<span class="hljs-string">"2"</span> + <span class="hljs-number">0.043</span>*<span class="hljs-string">"series"</span> + <span class="hljs-number">0.043</span>*<span class="hljs-string">"watches"</span> + <span class="hljs-number">0.029</span>*<span class="hljs-string">"apple"</span> + <span class="hljs-number">0.028</span>*<span class="hljs-string">"ruling"</span> + <span class="hljs-number">0.028</span>*<span class="hljs-string">"disagrees"</span> + <span class="hljs-number">0.028</span>*<span class="hljs-string">"said"</span>
Topic <span class="hljs-number">2</span>: <span class="hljs-number">0.051</span>*<span class="hljs-string">"maker"</span> + <span class="hljs-number">0.035</span>*<span class="hljs-string">"ban"</span> + <span class="hljs-number">0.035</span>*<span class="hljs-string">"us"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"emergency"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"made"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"successful"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"court"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"lifted"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"request"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"proved"</span>
Topic <span class="hljs-number">3</span>: <span class="hljs-number">0.055</span>*<span class="hljs-string">"apple"</span> + <span class="hljs-number">0.054</span>*<span class="hljs-string">"us"</span> + <span class="hljs-number">0.054</span>*<span class="hljs-string">"watches"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"sales"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"technology"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"imports"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"authorities"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"barred"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"appeal"</span> + <span class="hljs-number">0.031</span>*<span class="hljs-string">"filed"</span>
</code></pre>
<p>The LDA results are an improvement over BoW. They give us more detail: the article is about a ban involving Apple's Series 9 and Ultra 2 watches in the US.</p>
<h3 id="heading-non-negative-matrix-factorization">Non-Negative Matrix Factorization</h3>
<p>Non-Negative Matrix Factorization (NMF), just like LDA, is another topic modeling technique that uncovers latent topics in a collection of documents.</p>
<p>But instead of raw word counts, NMF is commonly applied to a Term Frequency-Inverse Document Frequency (TF-IDF) representation, which is what we'll do here, to capture hidden themes or topics in the documents.</p>
<p>TF-IDF down-weights words that appear in many documents and gives more weight to words that are distinctive to a few, so NMF can pick up patterns that raw counts would drown out. You can perform NMF using the Scikit-learn library.</p>
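<p>To see how TF-IDF weighs terms, here is a small plain-Python sketch. It is meant to follow the defaults that scikit-learn documents for <code>TfidfVectorizer</code> (raw term counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document vector), but it is an illustration rather than a drop-in replacement, and the three token lists are made up.</p>

```python
import math
from collections import Counter

# Made-up, already-tokenized documents.
docs = [
    ["apple", "watches", "sales"],
    ["apple", "ban", "lifted"],
    ["apple", "masimo", "accused"],
]


def tfidf(docs):
    """TF-IDF with smoothed IDF and L2-normalized document vectors."""
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    # Smoothed IDF: a word found in every document still gets weight 1.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}
    vectors = []
    for doc in docs:
        # Raw term count multiplied by the word's IDF.
        weights = {w: c * idf[w] for w, c in Counter(doc).items()}
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({w: v / norm for w, v in weights.items()})
    return vectors


for vec in tfidf(docs):
    print({w: round(v, 3) for w, v in vec.items()})
```

<p>Because "apple" appears in every document, it gets the lowest weight in each vector, while words that appear in only one document, such as "masimo", score higher.</p>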
<h3 id="heading-steps-for-performing-nmf">Steps for performing NMF</h3>
<ul>
<li><p>Import necessary libraries</p>
</li>
<li><p>Data Preparation: Split the text into documents, then perform any necessary cleaning, such as removing stop words. Scikit-learn's <code>TfidfVectorizer</code> has a <code>stop_words</code> argument that does this for you.</p>
</li>
<li><p>Convert the documents into a TF-IDF matrix using <code>TfidfVectorizer</code></p>
</li>
<li><p>Apply NMF to the TF-IDF matrix, specifying the number of topics you want, then choose how many top words to display for each topic</p>
</li>
<li><p>Lastly, interpret your result.</p>
</li>
</ul>
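<p>Under the hood, NMF looks for two non-negative matrices, W (documents by topics) and H (topics by terms), whose product approximates the document-term matrix V. The dependency-free sketch below uses the classic multiplicative update rules of Lee and Seung on a made-up 4-by-4 matrix. Scikit-learn's <code>NMF</code> uses a coordinate descent solver by default, so treat this as an illustration of the idea rather than what scikit-learn runs internally.</p>

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x k) and H (k x terms).

    Multiplicative updates only ever scale entries by non-negative factors,
    so W and H stay non-negative throughout.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H := H * (W^T V) / (W^T W H), element-wise; eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W := W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def error(v, w, h):
    """Squared Frobenius norm of V - WH (how far the approximation is from V)."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for row_v, row_wh in zip(v, wh) for a, b in zip(row_v, row_wh))


# Made-up document-term matrix: rows are documents, columns are four terms.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
]
W, H = nmf(V, k=2)
print(round(error(V, W, H), 3))
for i, row in enumerate(H):
    print(f"Topic {i + 1} term weights:", [round(x, 2) for x in row])
```

<p>Each row of H plays the role of a topic: its largest entries point to the terms that define that topic, which is how scikit-learn's <code>components_</code> attribute is typically read to list each topic's top words.</p>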
<pre><code class="lang-python"><span class="hljs-comment"># import the necessary libraries</span>
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> sent_tokenize
<span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> TfidfVectorizer
<span class="hljs-keyword">from</span> sklearn.decomposition <span class="hljs-keyword">import</span> NMF

article = <span class="hljs-string">"Apple's latest smart watches can resume being sold in the US after the tech company filed an emergency appeal with authorities. \
Sales of the Series 9 and Ultra 2 watches had been halted in the US over a patent row. \
The US's trade body had barred imports and sales of Apple watches with technology for reading blood-oxygen level. \
Device maker Masimo had accused Apple of poaching its staff and technology. \
It comes after the White House declined to overturn a ban on sales and imports of the Series 9 and Ultra 2 watches which came into effect this week. \
Apple had said it strongly disagrees with the ruling. \
The iPhone maker made an emergency request to the US Court of Appeals, which proved successful in getting the ban lifted."</span>
</code></pre>
<p>The above code imports the libraries we'll use to implement NMF and defines the article.</p>
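<pre><code class="lang-python"><span class="hljs-comment"># convert article to documents (one document per sentence)</span>
documents = sent_tokenize(article)

<span class="hljs-comment"># Build the TF-IDF matrix, removing English stop words</span>
tfidf_vectorizer = TfidfVectorizer(stop_words=<span class="hljs-string">'english'</span>)
tfidf = tfidf_vectorizer.fit_transform(documents)

<span class="hljs-comment"># Apply NMF</span>
num_topics = <span class="hljs-number">5</span>  <span class="hljs-comment"># Set the number of topics you want to identify</span>
nmf_model = NMF(n_components=num_topics, init=<span class="hljs-string">'random'</span>, random_state=<span class="hljs-number">42</span>)
nmf_matrix = nmf_model.fit_transform(tfidf)

<span class="hljs-comment"># Print the 10 highest-weighted words in each topic</span>
feature_names = tfidf_vectorizer.get_feature_names_out()
<span class="hljs-keyword">for</span> idx, topic <span class="hljs-keyword">in</span> enumerate(nmf_model.components_):
    top_words = [feature_names[i] <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> topic.argsort()[::<span class="hljs-number">-1</span>][:<span class="hljs-number">10</span>]]
    print(f<span class="hljs-string">"Topic #{idx + 1}: {', '.join(top_words)}"</span>)
</code></pre>
<p>The above code splits the article into sentence-level documents and builds a TF-IDF matrix from them, with English stop words removed. It then fits an NMF model with five topics and prints the ten highest-weighted words in each topic (each row of <code>nmf_model.components_</code> holds one topic's word weights).</p>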
<p>Below is the output of the NMF Model:</p>
<pre><code class="lang-javascript">Topic #<span class="hljs-number">1</span>: ultra, series, sales, watches, row, halted, patent, white, house, effect
Topic #<span class="hljs-number">2</span>: lifted, court, iphone, getting, request, successful, proved, appeals, ban, maker
Topic #<span class="hljs-number">3</span>: disagrees, strongly, ruling, said, apple, body, blood, level, trade, oxygen
Topic #<span class="hljs-number">4</span>: filed, resume, appeal, latest, tech, authorities, sold, smart, company, emergency
Topic #<span class="hljs-number">5</span>: technology, apple, accused, masimo, device, staff, poaching, maker, trade, level
</code></pre>
<p>You can see that NMF reveals more about the themes of the document. For example, Topic 5 shows that another company, Masimo, accused Apple of poaching its staff and technology.</p>
<h2 id="heading-how-to-choose-which-technique-to-use">How to Choose Which Technique to Use?</h2>
<p>I recommend experimenting with all the approaches in order to gain different perspectives concerning the contents of your document.</p>
<p>Bag of Words and LDA are based on how frequently words occur, which makes them useful for inferring the broadest, most general themes in a document.</p>
<p>On the other hand, when using NMF, which is based on TF-IDF, less frequent words can be used to infer additional topics and provide a different perspective on the document.</p>
<p>For example, in this run NMF surfaced key terms like "Masimo" and "accused," whereas LDA did not. So depending on your needs, experiment with all the approaches to see which one yields the most useful results.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, you've learned about topic identification and how you can use it to extract themes or topics from a large document.</p>
<p>We covered several techniques you can use to identify topics, including simple ones like BoW and more advanced ones like LDA and NMF.</p>
<p>Happy learning, and see you in the next one.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Get Started with Hugging Face – Open Source AI Models and Datasets ]]>
                </title>
                <description>
                    <![CDATA[ By Ambreen Khan What is Hugging Face 🤗? If you are interested in Artificial Intelligence and Natural Language Processing, you have probably heard of Hugging Face – the company named after a cute emoji.  Hugging Face is not only a company, but also a... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/get-started-with-hugging-face/</link>
                <guid isPermaLink="false">66d45d97706b9fb1c166b918</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ open source ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 10 Jan 2024 21:05:36 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/01/HuggingFace_Title-1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Ambreen Khan</p>
<h2 id="heading-what-is-hugging-face"><strong>What is Hugging Face 🤗?</strong></h2>
<p>If you are interested in Artificial Intelligence and Natural Language Processing, you have probably heard of Hugging Face – the company named after a cute emoji. </p>
<p>Hugging Face is not only a company, but also a platform that is transforming the fields of AI and NLP through open source and open science.</p>
<p>Hugging Face offers a platform called the Hugging Face Hub, where you can find and share thousands of AI models, datasets, and demo apps. The Hub is like the GitHub of AI, where you can collaborate with other machine learning enthusiasts and experts, and learn from their work and experience.</p>
<p>Hugging Face’s mission is to democratize good machine learning, one commit at a time. Whether you are a beginner or a professional, you can benefit from the amazing resources and tools that Hugging Face provides.</p>
<p>In this post, I'll guide you through the basics of Hugging Face. You'll learn how to create your Hugging Face account, set up your development environment, and use some of the pre-trained models that are available on the Hub. Let’s get started! 🚀</p>
<h2 id="heading-heres-what-well-cover">Here's what we'll cover:</h2>
<ol>
<li><a class="post-section-overview" href="#heading-what-can-you-do-on-the-hugging-face-platform">What can you do on the Hugging Face Platform?</a><ul>
<li><a class="post-section-overview" href="#heading-download-and-fine-tune-existing-open-source-models">Download and fine-tune existing Open Source models</a></li>
<li><a class="post-section-overview" href="#heading-run-models-directly-from-hugging-face">Run models directly from Hugging Face</a></li>
<li><a class="post-section-overview" href="#heading-addcreate-your-own-model">Add/create your own model</a></li>
<li><a class="post-section-overview" href="#heading-use-existing-datasets">Use existing datasets</a></li>
<li><a class="post-section-overview" href="#heading-createbrowse-demo-apps-also-known-as-spaces">Create/browse demo apps (also known as Spaces)</a></li>
<li><a class="post-section-overview" href="#heading-join-or-create-an-organization">Join or create an organization</a></li>
<li><a class="post-section-overview" href="#heading-create-a-portfolio">Create a portfolio</a></li>
<li><a class="post-section-overview" href="#heading-learn-ai-skills">Learn AI skills</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-hugging-face-terminology">Hugging Face terminology</a></li>
<li><a class="post-section-overview" href="#heading-how-to-get-started-with-hugging-face">How to get started with Hugging Face</a><ul>
<li><a class="post-section-overview" href="#heading-create-a-hugging-face-account">Create a Hugging Face account</a></li>
<li><a class="post-section-overview" href="#heading-set-up-your-environment">Set up your environment</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-how-to-use-pre-trained-models-in-hugging-face">How to use pre-trained models in Hugging Face</a></li>
<li><a class="post-section-overview" href="#heading-how-to-find-the-right-pre-trained-model">How to find the right pre-trained model</a></li>
<li><a class="post-section-overview" href="#heading-whats-next">What's next?</a></li>
</ol>
<h2 id="heading-what-can-you-do-on-the-hugging-face-platform">What Can You Do on the Hugging Face Platform?</h2>
<p>Here are some of the awesome things you can do on Hugging Face:</p>
<h3 id="heading-download-and-fine-tune-existing-open-source-models">Download and fine-tune existing Open Source models:</h3>
<p>Why start from scratch when you can leverage the power of over 450k models that are already available on the Hugging Face model library? </p>
<p>You can easily download these models and fine-tune them on your own custom dataset with just a few lines of code. This way, you can save time and resources, and still get a model that suits your specific needs.</p>
<p>You can use these models to perform various tasks, such as:</p>
<ol>
<li>Natural language processing (for example, translation, summarization, and text generation)</li>
<li>Audio-related functions (for example, automatic speech recognition, voice activity detection, and text-to-speech)</li>
<li>Computer vision tasks (for example, depth estimation, image classification, and image-to-image processing)</li>
<li>Multimodal tasks, where models handle diverse data types (text, images, audio) and can produce multiple types of output</li>
</ol>
<h3 id="heading-run-models-directly-from-hugging-face">Run Models directly from Hugging Face:</h3>
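<p>If you don’t want to set up these models on your own machine, you can use Hugging Face’s hosted Inference API instead (for example, through the <code>InferenceClient</code> class in the <code>huggingface_hub</code> library) to send requests to these models and receive their outputs.</p>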
<h3 id="heading-addcreate-your-own-model">Add/create your own model:</h3>
<p>If you have a brilliant idea for a new model, or you want to improve an existing one, you can also add/create your own model on Hugging Face. </p>
<p>The platform will host your model, and allow you to provide additional information, upload essential files, and manage different versions. You can also choose whether your models are public or private, so you can decide when or if you want to share them with the world. </p>
<p>Once your model is ready, you can access it directly from Hugging Face, send requests, and retrieve the outputs for integration into any applications you are developing.</p>
<h3 id="heading-use-existing-datasets">Use existing datasets:</h3>
<p>A good model needs a good dataset. Hugging Face provides a repository of over 90,000 datasets that you can use and feed into your models. </p>
<p>You can take an in-depth look inside the dataset using the dataset viewer. You can also contribute your own datasets to the repository, and help the machine learning community grow.</p>
<p><img src="https://lh7-us.googleusercontent.com/tYogXTtF_pOn4dIRAFUDP20kpbf4yzTvkWdINjnFqjka6N5b4xfDRT_ssvVqQCig09SlSfb3voil16yE37YOPLDmsHj508xkPtYWKHF63rX8ozOW21BQH2dKQL5jEuhq5Yn-m1xyU9pKKHOimOlDqHk" alt="Image" width="600" height="400" loading="lazy">
<em>Screenshot of dataset viewer</em></p>
<h3 id="heading-createbrowse-demo-apps-also-known-as-spaces">Create/browse demo apps (also known as Spaces):</h3>
<p>Hugging Face’s Spaces are Git repositories that allow you to showcase your machine learning applications. You can also browse and try out the Spaces created by other users, and find inspiration for your next AI app. </p>
<p>With thousands of ML apps to choose from, you will never run out of fun and interesting things to do.</p>
<p>Here are a few cool Spaces you can check out:</p>
<ul>
<li><a target="_blank" href="https://huggingface.co/spaces/openai/whisper">OpenAI's Whisper</a>: Transcribe long-form microphone or audio inputs with the click of a button.</li>
<li><a target="_blank" href="https://huggingface.co/spaces/jbilcke-hf/ai-comic-factory">AI Comic Factory</a>: Create your own comic books.</li>
<li><a target="_blank" href="https://huggingface.co/spaces/huggingface-projects/QR-code-AI-art-generator">QR Code AI Art Generator</a>: Generate beautiful QR codes using AI.</li>
<li><a target="_blank" href="https://huggingface.co/spaces/multimodalart/stable-video-diffusion">Stable Video Diffusion</a> (Img2Vid - XT): Generate a 4-second video from a single image.</li>
<li><a target="_blank" href="https://huggingface.co/spaces/DAMO-NLP-SG/Video-LLaMA">Video-LLaMA</a>: Audio-Visual Language Model for Video Understanding.</li>
</ul>
<h3 id="heading-join-or-create-an-organization">Join or create an organization:</h3>
<p>You can join or create your own organization on Hugging Face. This allows you to showcase your work and collaborate with other members from your university, lab, or company. You can also work on private datasets, models, and spaces with your organization.</p>
<h3 id="heading-create-a-portfolio">Create a portfolio:</h3>
<p>You can create a professional portfolio on Hugging Face to showcase your work and start building your reputation. This can help you land jobs related to AI model training, integration, and development. </p>
<p>For demo apps hosted as Spaces, Hugging Face provides basic computing resources for free, including 16 GB of RAM, 2 CPU cores, and 50 GB of disk space. You can also upgrade to paid hardware for faster performance.</p>
<h3 id="heading-learn-ai-skills">Learn AI Skills:</h3>
<p>Hugging Face is an excellent platform for learning AI skills. It offers a comprehensive set of tools and resources for training and using models. This includes demos, use cases, documentation, and tutorials that guide you through the entire process of using these tools and training models.</p>
<p>You can also learn from the experts and the community on Hugging Face, and improve your AI knowledge and skills.</p>
<h2 id="heading-hugging-face-terminology">Hugging Face Terminology</h2>
<p>There are some terms you'll need to know to get the most out of working with Hugging Face.</p>
<p><strong>Pretrained model:</strong> A model that has been trained on a large dataset for a specific task before being made available for use. </p>
<p><strong>Inference:</strong> Inference is the process of using a trained model to make predictions or draw conclusions about new, unseen data based on the learned patterns from the training data.</p>
<p><strong>Transformers:</strong> Transformers are a neural network architecture that relies on attention mechanisms to capture the relationships between words (tokens) in a sequence. They were first used for text-based tasks, such as translation, summarization, and text generation, and are now also applied to images and audio.</p>
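<p>To make "attention" less abstract, here is a minimal sketch of scaled dot-product self-attention in plain Python, the core operation inside a Transformer layer. It is simplified on purpose: the token embeddings are made-up 2-dimensional vectors, and the queries, keys, and values are the raw vectors themselves, whereas a real Transformer first projects them with learned weight matrices and uses many attention heads.</p>
<pre><code class="lang-python">import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention: softmax(X X^T / sqrt(d)) X.

    Simplification: queries, keys and values are all the raw vectors in x;
    a real Transformer first multiplies x by learned weight matrices.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # New representation: weighted average of all value vectors
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights

# Hypothetical 2-dimensional embeddings for a 3-token sentence
tokens = ["the", "cat", "sat"]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x)
for token, w in zip(tokens, weights):
    print(token, [round(p, 2) for p in w])
</code></pre>
<p>Each row of weights sums to 1 and shows how much one token "attends" to every token in the sentence; the output for that token is the corresponding weighted average of the vectors.</p>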
<p><strong>Tokenizer</strong>: A tokenizer breaks text down into smaller units called tokens and maps each token to a numeric ID that a model can process. Tokens are usually words or subwords (pieces of words).</p>
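<p>Many Hugging Face models use subword tokenizers. As an illustration, here is a minimal sketch of greedy longest-match-first tokenisation in the style of WordPiece (the scheme used by BERT, where <code>##</code> marks a piece that continues a word). The tiny vocabulary is hypothetical; real tokenizers learn vocabularies of tens of thousands of pieces from data and ship them with the model.</p>
<pre><code class="lang-python"># Hypothetical toy vocabulary; "##" marks a piece that continues a word
VOCAB = {"[UNK]": 0, "hug": 1, "##ging": 2, "face": 3, "token": 4,
         "##izer": 5, "##s": 6, "are": 7, "fun": 8}

def wordpiece_tokenize(text, vocab):
    """Greedy longest-match-first subword tokenisation (WordPiece-style)."""
    tokens = []
    for word in text.lower().split():
        pieces, start = [], 0
        while word[start:]:  # loop until the whole word is consumed
            piece = None
            # Try the longest remaining substring first, then shrink it
            for end in range(len(word), start, -1):
                candidate = word[start:end] if start == 0 else "##" + word[start:end]
                if candidate in vocab:
                    piece = candidate
                    break
            if piece is None:  # no vocabulary match: the whole word becomes [UNK]
                pieces = ["[UNK]"]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens

tokens = wordpiece_tokenize("Hugging Face tokenizers are fun", VOCAB)
ids = [VOCAB[t] for t in tokens]
print(tokens)  # ['hug', '##ging', 'face', 'token', '##izer', '##s', 'are', 'fun']
print(ids)     # [1, 2, 3, 4, 5, 6, 7, 8]
</code></pre>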
<h2 id="heading-how-to-get-started-with-hugging-face"><strong>How to Get Started with Hugging Face</strong></h2>
<p>To get started with Hugging Face, you will need to set up an account and install the necessary libraries and dependencies. Don’t worry, it’s easy and fun! </p>
<p>Here are the steps you need to follow:</p>
<h3 id="heading-create-a-hugging-face-account">Create a Hugging Face Account</h3>
<p>Signing up as a Community individual contributor is free of charge. You can also opt for a ‘Pro’ plan or a customized plan for Organizations if you need more features and resources.</p>
<p>Go to the Hugging Face website and click on “Sign Up” to create a free account.</p>
<p>Then enter your email address and a password. Click next and complete your profile and security check.</p>
<p><img src="https://lh7-us.googleusercontent.com/OQA0CUGvs2Dg4LKI3X5mPVjNj7LYIbeUDF0q46sC2p39n-Ca56OwiGNYYdPJU4NrcZG4s-G_KKYX1YADa9QL2yyjHcMDoQ43BBllp6SHgq6P_33XG7ta4nVDTsjierUonbH3YYwuj7CploOW2tpAopo" alt="Image" width="600" height="400" loading="lazy">
<em>Setting up a Hugging Face account</em></p>
<p>Congratulations, you are now a Hugging Face member! 🎉 You will be directed to the Hugging Face ‘Welcome’ page, where you can find more information and tips on how to use the platform.</p>
<p>As a bonus, you also get a Git-based hosted repository where you can create your Models, Datasets and Spaces. You can do this directly using the website or using the CLI. If you prefer the latter, you can check the detailed instructions on the ‘Welcome’ page under the ‘Programmatic access’ section.</p>
<p><img src="https://lh7-us.googleusercontent.com/PhM1PcZxLn4jgchRlU2J6ZEemobdrBTBq0ypqFM3Y2mZsTwtvFUg7nhJ4KBL4HfvYJz4Zp2KsZa7SvbfJMe8o9ARKvy1NOdCGSn4WEJ0JUivxT2Lp4nnWrU21cCjjGl5yJMG7BqfaGzvqVGd9z06Mrg" alt="Image" width="600" height="400" loading="lazy">
<em>Hugging Face welcome screen showing options to create a new model, browse the docs, and set up programmatic access</em></p>
<h3 id="heading-set-up-your-environment">Set Up Your Environment</h3>
<p>Before you start using the Hugging Face hub programmatically, you will need to set up your environment.  </p>
<h4 id="heading-step-1-install-python-and-pip">Step 1: Install Python and Pip:</h4>
<p>Make sure you have Python 3.8 or higher installed on your system. You will also need Pip, the package manager for Python, to install the Hugging Face libraries. If you don’t have Python, you can install it by following the instructions <a target="_blank" href="https://www.python.org/downloads/">here</a>.</p>
<h4 id="heading-step-2-install-huggingface-libraries">Step 2: Install Hugging Face libraries:</h4>
<p>Open a terminal or command prompt and run the following command to install the Hugging Face Transformers library: </p>
<pre><code class="lang-shell">pip install transformers
</code></pre>
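<p>This installs the core Hugging Face library along with its dependencies, including <code>tokenizers</code>. It does not install a deep learning framework, so to run models you will also need one, such as PyTorch (<code>pip install torch</code>). To have the full capability, you should also install the <code>datasets</code> library (listing <code>tokenizers</code> as well does no harm):</p>
<pre><code class="lang-shell">pip install tokenizers datasets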
</code></pre>
<h4 id="heading-step-3-set-up-a-development-environment">Step 3: Set up a development environment:</h4>
<p>Choose a code editor or IDE of your choice, such as Jupyter Notebook, PyCharm, or Visual Studio Code. Create a new project directory and set up a virtual environment to isolate your project dependencies. You can find more information on how to do this <a target="_blank" href="https://docs.python.org/3/library/venv.html">here</a>.</p>
<p>With these steps completed, you have successfully set up Hugging Face on your system and are ready to start exploring its features and capabilities. Let’s go! 🚀</p>
<h2 id="heading-how-to-use-pre-trained-models-in-hugging-face">How to Use Pre-Trained Models in Hugging Face</h2>
<p>One of the best things about Hugging Face is that it gives you access to thousands of pre-trained models that can perform various tasks on different types of data. Whether you are working with text, vision, audio, or a combination of them, you can find a model that suits your needs.</p>
<p>Hugging Face has two main libraries that provide access to pre-trained models: <strong>Transformers</strong> and <strong>Diffusers</strong>. The Transformers library covers text-based tasks, such as translation, summarization, and text generation, as well as many audio and vision tasks (such as the speech recognition example below). Diffusers provides diffusion models for generative tasks, such as image synthesis and image editing.</p>
<p>You have already installed the transformers library during the environment setup. Let’s see how you can use it to work with pre-trained models.</p>
<h3 id="heading-step-1-visit-the-pypi-page">Step 1: Visit the PyPI page</h3>
<p>To learn more about the transformers library, you can visit its page on PyPI, the Python Package Index. </p>
<p>Go to <a target="_blank" href="https://pypi.org/">PyPI</a> and search for ‘transformers’. Click on the latest version of the transformers library in the search results. You will see a brief introduction to the library, as well as some useful links and information.</p>
<h3 id="heading-step-2-download-and-use-pre-trained-models">Step 2: Download and use pre-trained models</h3>
<p>The transformers library provides APIs to quickly download and use pre-trained models on a given text, fine-tune them on your own datasets, and then share them with the community on Hugging Face’s <a target="_blank" href="https://huggingface.co/models">model hub</a>.</p>
<h3 id="heading-step-3-use-the-pipeline-method">Step 3: Use the <code>pipeline()</code> method</h3>
<p>To use a pre-trained model on a given input, Hugging Face provides a <code>pipeline()</code> method, an easy-to-use API for performing a wide variety of tasks. </p>
<p>The <a target="_blank" href="https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/pipelines#transformers.pipeline">pipeline()</a> method makes it simple to use any <a target="_blank" href="https://huggingface.co/models">model</a> from the Hub for inference on any language, computer vision, speech, and multimodal tasks.</p>
<p>Let’s try to perform a task using the pipeline() method.</p>
<h4 id="heading-task-sentiment-analysis">Task: Sentiment analysis:</h4>
<p>Let’s use the <code>pipeline()</code> method to classify positive versus negative texts provided by the user:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

<span class="hljs-comment"># Load the pre-trained sentiment analysis model</span>
sentiment_analysis = pipeline(
    <span class="hljs-string">"sentiment-analysis"</span>, model=<span class="hljs-string">"distilbert-base-uncased-finetuned-sst-2-english"</span>)

input_text = [
    <span class="hljs-string">"It’s a great app, my biggest problem is the card readers regularly do not connect. Which is very poor customer service for us because we have to manually enter our customers debit cards, which takes time. This slows down our efficiency."</span>
]

<span class="hljs-comment"># Perform sentiment analysis on the input text</span>
result = sentiment_analysis(input_text)

<span class="hljs-comment"># Print the result</span>
print(result)
</code></pre>
<p>The call to <code>pipeline()</code> downloads and caches the pretrained model the first time you run it, while the statement <code>result = sentiment_analysis(input_text)</code> runs the model on the given text. </p>
<p><strong>Output:</strong></p>
<pre><code class="lang-shell">[{'label': 'NEGATIVE', 'score': 0.9996176958084106}]
</code></pre>
<p>Here, the answer is "NEGATIVE" with a confidence of 99.96%.</p>
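<p>Where does the 99.96% come from? For a classifier with two labels like this one, the pipeline turns the model's raw output scores (logits) into probabilities with a softmax function and reports the most likely label along with its probability. The minimal sketch below reproduces that last step in plain Python; the logit values are made up for illustration and are not the real model's outputs.</p>
<pre><code class="lang-python">import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the labels [NEGATIVE, POSITIVE]; these numbers are
# made up for illustration and are not the real model's outputs
labels = ["NEGATIVE", "POSITIVE"]
logits = [4.2, -3.6]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = [{"label": labels[best], "score": probs[best]}]  # same shape as the pipeline output
print(result)
print(f"{result[0]['label']} with a confidence of {result[0]['score']:.2%}")
</code></pre>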
<h4 id="heading-task-automatic-speech-recognition">Task: Automatic speech recognition</h4>
<p>Let’s try another task that involves speech recognition.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

transcriber = pipeline(task=<span class="hljs-string">"automatic-speech-recognition"</span>,
                       model=<span class="hljs-string">"openai/whisper-small"</span>)
result = transcriber(
    <span class="hljs-string">"https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"</span>)

print(result)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-shell">{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
</code></pre>
<p>You can see how easy it is to get a pre-trained model up and running using Hugging Face's libraries. </p>
<h3 id="heading-how-to-find-the-right-pre-trained-model">How to Find the Right Pre-Trained Model</h3>
<p>But how can you find the right pre-trained model if you want to perform a specific task? </p>
<p>This is actually quite easy. You can browse the models on the Hugging Face website, and filter them by task, language, framework, and more. You can also search for models and datasets by keyword and sort them by trending, most likes, most downloads, or by recent updates.</p>
<p><img src="https://lh7-us.googleusercontent.com/e94ThjikQ7rAFXu-LUx6a0ZosgWFKqjfSION915OcA9fQweqZO62wLdyPkAH657OFOlO-Zw4O9WLvtQ1auZl8Oo9inxtul7J1hkuXs1Bqs10n_FRy8P6o-mhGVB_QKVEz4CHL7-mOm9wTGzbqr6gJJY" alt="Image" width="600" height="400" loading="lazy">
<em>Searching for models</em></p>
<p>Each model has a model card that contains important information, such as model details, an inference example, the training procedure, community interaction features, and links to the files. You can also try the model directly on the model card page using the Inference API section.</p>
<p><img src="https://lh7-us.googleusercontent.com/Fs-OKp8zUOF4WIN9-dFBYQIQDL5loPowHzEzIr7T8mWZltyGSDGEj8K-U-CrTZwPK3D1RjkFZwSfhNex_BhWYCYW4AkUFuADkefneuJtyHSYkDoTqAU24zqvUFdTjx978g8jfVkoajhZ9PF_lTi2Ekg" alt="Image" width="600" height="400" loading="lazy">
<em>Inference API</em></p>
<p>You can also check the list of spaces that are using that particular model and further explore the spaces by clicking on the space link.</p>
<p><img src="https://lh7-us.googleusercontent.com/z2abf18c-bvqWM82OJz7ua_sebywG4DHXQQbWE4QD0Vmv1tIOw35Okw56Va5nBrJlVRWJArC_L6RWdgYIl1nadcaRlMfbt_fyZyK6hFpDkhXAgURyDiU24hzRy91W8jQbwMbs4tavsAv2r3Di-Qjpo0" alt="Image" width="600" height="400" loading="lazy">
<em>Spaces</em></p>
<h2 id="heading-whats-next">What's Next?</h2>
<p>In this guide, you have learned the basics of Hugging Face, and how to use its libraries, models, datasets, and spaces. But there is so much more to discover and enjoy!</p>
<p>Here are some tips on how to make the most of Hugging Face:</p>
<ul>
<li>Dive into Hugging Face’s Spaces: Spaces are where the magic happens. You can find and try out thousands of machine learning applications created by the community, and see what’s trending and popular. You can also create your own spaces and showcase your work to the world.</li>
<li>Explore the Hugging Face documentation and tutorials: If you want to learn more about the Hugging Face platform and its features, you can check out the documentation and tutorials. They provide detailed information and guidance on how to use the tools and resources that Hugging Face offers. You can also find information about common ML/AI tasks, such as text classification, image generation, and speech recognition, on the tasks page.</li>
<li>Visit the <a target="_blank" href="https://huggingface.co/learn">learn</a> section:  If you are interested in acquiring new skills and knowledge in AI and NLP, you can visit the ‘learn’ page that displays courses from Hugging Face. Here, you can learn from the experts and the best practices in the field, and apply them to your own projects.</li>
<li>Join the Hugging Face community: Machine learning is more fun when collaborating! You can join the Hugging Face community on platforms like GitHub, Discord, and Twitter to connect with other users and stay updated on the latest developments. You can also share your feedback, questions, and ideas with the community, and help Hugging Face grow and improve.</li>
</ul>
<p>Hugging Face is not just a platform for AI and NLP – it's also a playground for your curiosity and creativity. You can experiment with new models, expand your AI knowledge, and enrich your AI toolkit with various tools and resources. So, keep learning, keep exploring. There is always something new and exciting to discover with Hugging Face. 😊</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ NLP Tutorial – Text Pre-Processing Techniques for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ Natural Language Processing (NLP) is a branch of Machine learning (ML) that is focused on making computers understand the human language. It is used to create language models, language translation apps like Google translate, and virtual assistants, a... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/natural-language-processing-techniques-for-beginners/</link>
                <guid isPermaLink="false">66bccb244a4c0beb784641d4</guid>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Crypt(iq) ]]>
                </dc:creator>
                <pubDate>Wed, 12 Jul 2023 14:31:24 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/07/NLP-6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that focuses on making computers understand human language, and today it relies heavily on Machine Learning (ML). It is used to create language models, language translation apps like Google Translate, and virtual assistants, among other things.</p>
<p>This article takes you through one of the most basic steps in NLP: text pre-processing. It is a must-know topic for anyone interested in language models, and in NLP in general, which is a core part of the AI and ML field.</p>
<h2 id="heading-what-is-text-pre-processing">What is text pre-processing?</h2>
<p>Text pre-processing is the process of transforming unstructured text to structured text to prepare it for analysis.</p>
<p>When you pre-process text before feeding it to algorithms, you can improve their accuracy and efficiency by removing noise and other inconsistencies that make the text hard for the computer to work with.</p>
<p>Cleaner, more consistent text also reduces the time and resources the computer needs to process the data.</p>
<h2 id="heading-processes-involved-in-text-pre-processing">Processes involved in text pre-processing</h2>
<p>To get your text into the right state for further analysis, you need to apply several operations to it, following a few steps, until you have well-structured text.</p>
<p>Let's go over these processes in the following sub-sections.</p>
<h3 id="heading-tokenization">Tokenization</h3>
<p>Tokenization is the first stage of the process. </p>
<p>Here your text is analysed and then broken down into chunks called ‘tokens’ which can either be words or phrases. This allows the computer to work on your text token by token rather than working on the entire text in the following stages.</p>
<p>The two main types of tokenisation are word and sentence tokenisation.</p>
<p><strong>Word tokenisation</strong> is the most common kind of tokenisation. </p>
<p>Here, each token is a word, meaning the algorithm breaks down the entire text into individual words:</p>
<pre><code class="lang-python">text = <span class="hljs-string">'Wisdoms daughter walks alone. The mark of Athena burns through rome'</span>

words = text.split()
print(words)

<span class="hljs-comment"># Output:</span>
<span class="hljs-comment"># ['Wisdoms', 'daughter', 'walks', 'alone.', 'The', 'mark', 'of', 'Athena', 'burns', 'through', 'rome']</span>
</code></pre>
<p>On the other hand, <strong>sentence tokenisation</strong> breaks text down into sentences instead of words. It is less common and is used in fewer Natural Language Processing (NLP) tasks.</p>
<p>There are various tokenisation algorithms, such as whitespace tokenisation, regular expression (regex) tokenisation, and statistical tokenisation. The <code>split()</code> example above is whitespace tokenisation, which is why the full stop stays attached to <code>'alone.'</code>.</p>
<p>The type of algorithm you use will depend on the particular task you are working on and what you aim to achieve with it.</p>
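<p>As a rough illustration of regex tokenisation, the sketch below uses Python's built-in <code>re</code> module on the same sentence. Unlike <code>split()</code>, the word pattern separates punctuation into its own tokens, and a second, deliberately naive pattern splits the text into sentences. These patterns are simple assumptions for illustration: a naive sentence splitter will break on abbreviations like "Dr.", which is why libraries such as NLTK provide trained sentence tokenisers.</p>
<pre><code class="lang-python">import re

text = 'Wisdoms daughter walks alone. The mark of Athena burns through rome'

# Regex word tokenisation: a run of word characters, or a single punctuation mark
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Naive sentence tokenisation: text up to and including any ., ! or ?
sentence_tokens = [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text) if s.strip()]

print(word_tokens)
# ['Wisdoms', 'daughter', 'walks', 'alone', '.', 'The', 'mark', 'of', 'Athena', 'burns', 'through', 'rome']
print(sentence_tokens)
# ['Wisdoms daughter walks alone.', 'The mark of Athena burns through rome']
</code></pre>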
<h3 id="heading-normalisation">Normalisation</h3>
<p>In normalisation, your text is converted to a standard form. </p>
<p>Examples include converting all text to lowercase, removing numbers, and removing punctuation. Normalisation helps to make the text more consistent.</p>
<p>There are several normalisation techniques. Below are some of the most commonly used ones.</p>
<h4 id="heading-case-normalisation">Case normalisation</h4>
<p>This technique converts all the letters in your text to a single case, either uppercase or lowercase.</p>
<p>Case normalisation ensures that your data is stored in a consistent format and makes it easier to work with the data. </p>
<p>For example, suppose you want to find every instance of a word in your text. Without case normalisation, searching for the word ‘Boy’ would give different results from searching for ‘boy’.</p>
<p>You can use the following code to perform case normalisation:</p>
<pre><code class="lang-python">text = <span class="hljs-string">'To Sleep Or NOT to SLEep, THAT is THe Question'</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lower_case</span>(<span class="hljs-params">text</span>):</span>
    <span class="hljs-comment"># str.lower() returns a copy of the string with every uppercase letter lowercased</span>
    <span class="hljs-keyword">return</span> text.lower()

lowered = lower_case(text)  <span class="hljs-comment"># converts everything to lowercase</span>
print(lowered)

<span class="hljs-comment"># Output:</span>
<span class="hljs-comment"># to sleep or not to sleep, that is the question</span>
</code></pre>
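<p>To see why this matters for the search example above, here is a small sketch that counts whole-word matches of "boy" before and after lowercasing. The sample sentence and the <code>count_word()</code> helper are hypothetical, written for illustration.</p>
<pre><code class="lang-python">import re

text = "The boy waved. Boy, was he happy! Another BOY waved back."

def count_word(text, word):
    # \b marks a word boundary, so "boy" is not counted inside words like "boys"
    return len(re.findall(rf"\b{re.escape(word)}\b", text))

before = count_word(text, "boy")          # only the lowercase "boy" matches
after = count_word(text.lower(), "boy")   # every spelling matches after normalisation
print(before, after)  # 1 3
</code></pre>
<p>For text in languages other than English, Python's <code>str.casefold()</code> is a more aggressive alternative to <code>lower()</code>; for example, it maps the German "ß" to "ss".</p>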
<h4 id="heading-stemming">Stemming</h4>
<p>Stemming reduces words to their base form, or stem. For example, words like coding, coder, and coded all share the same base word: <em>code</em>. </p>
<p>For many tasks, ML models don't need to tell these forms apart. They can work with your text without the tenses, prefixes, and suffixes that we as humans normally need to make sense of it.</p>
<p>Stemming your text reduces the number of distinct words the model has to work with and, by extension, improves the efficiency of the model.</p>
<p>Although this technique increases a model's efficiency, it also removes important information from your text and can cause some words to be wrongly categorised by the model. </p>
<p>An example of this is the difference between <em>writing</em> and <em>writes</em> in the sentences below:</p>
<pre><code>
💡 Writing makes me happy.

💡 He writes regularly.
</code></pre><p>In the first sentence the word <em>writing</em> represents a noun, while <em>writes</em> in the second sentence represents a verb. </p>
<p>If your ML model stems both <em>writing</em> and <em>writes</em> to the base <em>write</em>, the difference in their parts of speech is overlooked, and some information is lost in the process of analysing the text.</p>
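<p>To make this concrete, here is a minimal sketch of a hypothetical toy stemmer that strips one common suffix and a trailing "e" using regular expressions. It is not the Porter algorithm or any library's implementation (real stemmers such as NLTK's <code>PorterStemmer</code> apply many more rules), but it shows both the benefit and the cost described above: related words collapse to one stem, and the noun <em>writing</em> and the verb <em>writes</em> become indistinguishable.</p>
<pre><code class="lang-python">import re

# Hypothetical toy rules, applied in order. Each keeps a stem of at least 3 letters:
# 1. strip one common suffix (the lazy \w{3,}? picks the shortest stem,
#    i.e. the longest matching suffix)
# 2. drop a trailing "e" so that "code" and "coded" end up with the same stem
RULES = [
    (re.compile(r"^(\w{3,}?)(ing|ed|er|es|s)$"), r"\1"),
    (re.compile(r"^(\w{3,})e$"), r"\1"),
]

def toy_stem(word):
    word = word.lower()
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word

words = ["coding", "coder", "coded", "code", "writing", "writes"]
print({w: toy_stem(w) for w in words})
# {'coding': 'cod', 'coder': 'cod', 'coded': 'cod', 'code': 'cod', 'writing': 'writ', 'writes': 'writ'}
</code></pre>
<p>Notice that the stem <code>cod</code> is not a real word. Stems only need to be consistent, which is one of the key differences from lemmatisation.</p>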
<h4 id="heading-lemmatisation">Lemmatisation</h4>
<p>This method is very similar to stemming in that it is also used to identify the base form of words. However, it is a more complex and more accurate technique than stemming.</p>
<p>Unlike stemming, lemmatisation takes into account the structure of a word and its part of speech before mapping it to its dictionary form, known as its <em>lemma</em>.</p>
<p>Because of this extra analysis, lemmatisation has higher computational requirements and is therefore more expensive than stemming.</p>
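<p>The difference from stemming is easiest to see with a dictionary lookup. The sketch below is a toy: the <code>LEMMAS</code> table and the <code>lemmatise</code> function are hypothetical, and the part-of-speech tags are passed in by hand. Real lemmatisers, such as WordNet-based ones, use a full lexicon plus morphological rules, and usually rely on a part-of-speech tagger to supply the tags.</p>
<pre><code class="lang-python"># Hypothetical mini-lexicon mapping (word, part of speech) to a lemma.
LEMMAS = {
    ("writing", "noun"): "writing",
    ("writing", "verb"): "write",
    ("writes", "verb"): "write",
    ("wrote", "verb"): "write",
    ("better", "adj"): "good",
    ("mice", "noun"): "mouse",
}

def lemmatise(word, pos):
    """Return the lemma for (word, pos), or the lowercased word if it is unknown."""
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())

print(lemmatise("Writing", "noun"))  # writing  (the noun keeps its form)
print(lemmatise("writes", "verb"))   # write
print(lemmatise("better", "adj"))    # good
</code></pre>
<p>Because the lookup uses the part of speech, the noun <em>writing</em> is kept as is while the verb forms map to <em>write</em>, and irregular forms like <em>better</em> or <em>mice</em> are handled, which suffix stripping alone cannot do.</p>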
<h4 id="heading-punctuation-removal">Punctuation removal</h4>
<p>In human conversation, punctuation marks like <code>‘’</code>, <code>!</code>, <code>[</code>, <code>}</code>, <code>*</code>, <code>#</code>, <code>/</code>, and <code>?</code> are relevant and necessary. They help to fully convey the writer's message.</p>
<p>ML models, on the other hand, can find punctuation distracting.</p>
<p>Its presence could interfere with text analysis and other natural language processing (NLP) tasks.</p>
<p>By removing punctuation marks from our text we allow the model to focus on the text alone rather than distracting it with symbols. This makes it easier for the text to be analysed.</p>
<p>To perform punctuation removal on text the following code can be used:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> re

text = <span class="hljs-string">' (to love is to destroy, and to be loved, is to be "the" one &lt;destroyed&gt;} '</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_punctuations</span>(<span class="hljs-params">text</span>):</span>
    punctuation = re.compile(<span class="hljs-string">r'[{};():,."/&lt;&gt;-]'</span>)
    text = punctuation.sub(<span class="hljs-string">' '</span>, text)
    <span class="hljs-keyword">return</span> text

clean_text = remove_punctuations(text)
print(clean_text)

<span class="hljs-comment">#the output of this is given below</span>
&gt;&gt;&gt;&gt; to love <span class="hljs-keyword">is</span> to destroy  <span class="hljs-keyword">and</span> to be loved  <span class="hljs-keyword">is</span> to be  the  one  destroyed
</code></pre>
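<p>The character class in the regular expression above only covers the symbols listed in it, so marks such as <code>!</code>, <code>?</code>, or <code>#</code> would survive. As an alternative sketch, you can delete every ASCII punctuation character in Python's built-in <code>string.punctuation</code> with <code>str.translate</code>, then collapse the leftover whitespace. The function name <code>strip_punctuation</code> is just an illustrative choice.</p>
<pre><code class="lang-python">import string

def strip_punctuation(text):
    """Delete ASCII punctuation characters, then collapse runs of whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.translate(table).split())

print(strip_punctuation("Hello, world! Is this (really) #1?"))
# Hello world Is this really 1
</code></pre>
<p>Note the trade-offs: characters are deleted rather than replaced with spaces, so a contraction like <em>don't</em> becomes <em>dont</em>, and <code>string.punctuation</code> contains only ASCII symbols, so curly quotes such as <code>‘’</code> would need to be added to the table separately.</p>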
<h4 id="heading-accent-removal">Accent removal</h4>
<p>This process removes language-specific accent marks (diacritics) from characters.</p>
<p>Some characters are written with accents either to indicate a different pronunciation or to signal that a word has a different meaning from its unaccented form.</p>
<p>An example of this would be the difference in both meaning and pronunciation between the words <em>résumé</em> and <em>resume</em>.</p>
<p>The former refers to a document that highlights your professional skills and achievements, whereas the latter means ‘to take on something again, or to continue a previous task or action’.</p>
<p>You can use the code below to perform accent removal on your text: </p>
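<pre><code class="lang-python"><span class="hljs-keyword">import</span> unicodedata

text = <span class="hljs-string">"her fiancé's résumé is beautiful"</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">remove_accents</span>(<span class="hljs-params">text</span>):</span>
    <span class="hljs-comment"># Decompose each accented character into a base letter plus a combining mark</span>
    <span class="hljs-comment"># (for example, "é" becomes "e" followed by U+0301), then drop the marks</span>
    decomposed = unicodedata.normalize(<span class="hljs-string">"NFD"</span>, text)
    <span class="hljs-keyword">return</span> <span class="hljs-string">""</span>.join(ch <span class="hljs-keyword">for</span> ch <span class="hljs-keyword">in</span> decomposed <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> unicodedata.combining(ch))

cleaned_text = remove_accents(text)
print(cleaned_text)

<span class="hljs-comment">#the output of this is given below</span>
&gt;&gt;&gt;&gt; her fiance's resume is beautiful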
</code></pre>
<h4 id="heading-stop-word-removal">Stop-word removal</h4>
<p>Stop-words are very common words that carry little meaning on their own. For many NLP tasks, they don't add much value to the data.</p>
<p>Words like <em>a, the, and, of</em>, and so on are called stop-words.</p>
<p>Like all the previous processes, stop-word removal also helps to increase the efficiency of your model.</p>
<p>Because it reduces the size of your dataset, it makes the data more manageable and can improve the accuracy of some NLP tasks.</p>
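<p>Here is a minimal sketch of stop-word removal. The <code>STOP_WORDS</code> set below is a small hypothetical list chosen for illustration; libraries ship much fuller lists, for example scikit-learn's <code>ENGLISH_STOP_WORDS</code> or NLTK's stop-word corpus.</p>
<pre><code class="lang-python"># Hypothetical, hand-picked stop-word list for illustration only
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is", "or", "that", "in"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop tokens found in STOP_WORDS."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(remove_stop_words("The history of the Roman Empire and its fall"))
# ['history', 'roman', 'empire', 'its', 'fall']
</code></pre>
<p>Which words count as stop-words depends on the task. In sentiment analysis, for instance, removing a negation such as <em>not</em> can change the meaning of a sentence, so it is worth checking any stop-word list before you use it.</p>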
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article you learned the basics of NLP.  </p>
<p>You are now familiar with the proper procedure to follow when pre-processing your text for NLP tasks. Feel free to go ahead and practice this on your own and work on a few NLP projects.</p>
<p>Note that choosing the right pre-processing techniques for your text will depend largely on the type of text you’re working with, the source of your data, and the goal you aim to achieve with it.</p>
<p>To learn more about NLP, you can check out <a target="_blank" href="https://www.freecodecamp.org/news/tag/nlp/">freeCodeCamp</a> for more articles and courses on NLP and ML in general.</p>
<p>Connect with me on Twitter <a target="_blank" href="https://twitter.com/Iqma__">@Iqma__</a> and follow <a target="_blank" href="https://iqmacodes.hashnode.dev/">my Hashnode blog</a> to read more content like this and to learn more about all things AI and Machine Learning.</p>
<p>Happy learning!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Topic Modeling Tutorial – How to Use SVD and NMF in Python ]]>
                </title>
                <description>
                    <![CDATA[ In the context of Natural Language Processing (NLP), topic modeling is an unsupervised learning problem whose goal is to find abstract topics in a collection of documents.  Topic Modeling answers the question: "Given a text corpus of many documents, ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/advanced-topic-modeling-how-to-use-svd-nmf-in-python/</link>
                <guid isPermaLink="false">66bb8b17c332a9c775d15b66</guid>
                
                    <category>
                        <![CDATA[ natural language processing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ nlp ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ topic modeling ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Bala Priya C ]]>
                </dc:creator>
                <pubDate>Tue, 21 Feb 2023 18:32:38 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/02/brett-jordan-M3cxjDNiLlQ-unsplash-cover-img.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In the context of Natural Language Processing (NLP), <strong>topic modeling</strong> is an unsupervised learning problem whose goal is to find abstract topics in a collection of documents. </p>
<p><strong>Topic Modeling</strong> answers the question: "Given a text corpus of many documents, can we find the abstract topics that the text is talking about?"</p>
<p>In this tutorial, you’ll:</p>
<ul>
<li>Learn about two powerful matrix factorization techniques - <strong>Singular Value Decomposition (SVD)</strong> and <strong>Non-negative Matrix Factorization (NMF)</strong></li>
<li>Use them to find topics in a collection of documents</li>
</ul>
<p>By the end of this tutorial, you'll be able to build your own topic models to find topics in any piece of text.📚📑 </p>
<p>Let's get started.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ol>
<li><a class="post-section-overview" href="#heading-what-is-topic-modeling">What is Topic Modeling?</a></li>
<li><a class="post-section-overview" href="#heading-tf-idf-score-equation">TF-IDF Score Equation</a></li>
<li><a class="post-section-overview" href="#heading-topic-modeling-using-singular-value-decomposition-svd">Topic Modeling Using Singular Value Decomposition (SVD)</a></li>
<li><a class="post-section-overview" href="#heading-what-is-truncated-svd-or-k-svd">What is Truncated SVD or k-SVD?</a></li>
<li><a class="post-section-overview" href="#heading-topic-modeling-using-non-negative-matrix-factorization-nmf">Topic Modeling Using Non-Negative Matrix Factorization (NMF)</a></li>
<li><a class="post-section-overview" href="#heading-7-steps-to-use-svd-for-topic-modeling">7 Steps to Use SVD for Topic Modeling</a></li>
<li><a class="post-section-overview" href="#heading-how-to-visualize-topics-as-word-clouds">How to Visualize Topics as Word Clouds</a></li>
<li><a class="post-section-overview" href="#heading-how-to-use-nmf-for-topic-modeling">How to Use NMF for Topic Modeling</a></li>
<li><a class="post-section-overview" href="#heading-svd-vs-nmf-an-overview-of-the-differences">SVD vs NMF – An Overview of the Differences</a></li>
</ol>
<h2 id="heading-what-is-topic-modeling">What is Topic Modeling?</h2>
<p>Let's start by understanding what topic modeling is.</p>
<p>Suppose you're given a large text corpus containing several documents. You'd like to know the <strong>key topics</strong> that reside in the given collection of documents without reading through each document.</p>
<p>Topic Modeling helps you distill the information in the large text corpus into a certain number of topics. Topics are groups of words that are <em>similar in context</em> and are indicative of the information in the collection of documents.</p>
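<p>To find topics, you first need to represent the corpus as a Document-Term Matrix (DTM), with one row per document and one column per term. The general structure of the DTM for a text corpus containing <code>M</code> documents and <code>N</code> terms in all is shown below:</p>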
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/1.png" alt="Image" width="600" height="400" loading="lazy">
<em>Structure of the Document-Term Matrix</em></p>
<p>Let's parse the matrix representation:</p>
<ul>
<li>D1, D2, ..., DM are the M documents.</li>
<li>T1, T2, ..., TN are the N terms</li>
</ul>
<p>To populate the Document-Term Matrix, let’s use a widely used metric: the TF-IDF score.</p>
<h2 id="heading-tf-idf-score-equation">TF-IDF Score Equation</h2>
<p>The TF-IDF score is given by the following equation:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/2.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>where,</p>
<ul>
<li><code>TF_ij</code> is the number of times the term <code>Tj</code> occurs in the document <code>Di</code>.</li>
<li><code>df_j</code> is the number of documents containing the term <code>Tj</code>.</li>
</ul>
<p>A term that occurs frequently in a particular document, but rarely across the entire corpus, has a higher TF-IDF score.</p>
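<p>To see how these quantities combine, here is a small hand-rolled sketch that fills a Document-Term Matrix with TF-IDF scores. It assumes the common unsmoothed form, <code>TF_ij</code> multiplied by log(M / <code>df_j</code>), where M is the number of documents; this may not match every detail of the pictured equation. The function name <code>tf_idf_matrix</code> and the toy documents are made up for illustration. Libraries usually use variants: scikit-learn's <code>TfidfVectorizer</code>, for example, smooths the IDF by default and L2-normalizes each row, so its numbers will differ.</p>
<pre><code class="lang-python">import math

def tf_idf_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] is the TF-IDF of term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    M = len(tokenized)
    # df_j: the number of documents that contain term j at least once
    df = {term: sum(term in doc for doc in tokenized) for term in vocab}
    rows = []
    for doc in tokenized:
        # TF_ij (raw count in this document) times IDF_j = log(M / df_j)
        rows.append([doc.count(term) * math.log(M / df[term]) for term in vocab])
    return vocab, rows

vocab, rows = tf_idf_matrix(["code code test", "code deploy"])
print(vocab)  # ['code', 'deploy', 'test']
for row in rows:
    print([round(score, 3) for score in row])
# [0.0, 0.0, 0.693]
# [0.0, 0.693, 0.0]
</code></pre>
<p>Here <em>code</em> appears in every document, so its IDF is log(1) = 0 and it scores zero everywhere, while <em>test</em> and <em>deploy</em> each score highest in the one document that contains them.</p>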
<p>I hope you’ve now gained a cursory understanding of the DTM and the TF-IDF score. Let’s now go over the matrix factorization techniques.</p>
<h2 id="heading-topic-modeling-using-singular-value-decomposition-svd">Topic Modeling Using Singular Value Decomposition (SVD)</h2>
<p>The use of Singular Value Decomposition (SVD) for topic modeling is explained in the figure below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/3.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Singular Value Decomposition on the Document-Term Matrix D gives the following three matrices:</p>
<ul>
<li>The left singular vector matrix <strong>U</strong>. This matrix is obtained by the eigendecomposition of the Gram matrix <strong>D.D_T</strong>, also called the document similarity matrix. The i,j-th entry of the document similarity matrix signifies how similar document <code>i</code> is to document <code>j</code>.</li>
<li>The diagonal matrix of singular values <strong>S</strong>, whose values signify the relative importance of the topics.</li>
<li>The right singular vector matrix <strong>V_T</strong>, which is also called the term-topic matrix. The topics in the text reside along the rows of this matrix.</li>
</ul>
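<p>If you'd like to check these properties numerically, the sketch below runs NumPy's <code>np.linalg.svd</code> on a tiny, made-up 4 × 5 document-term matrix. It confirms that the three factors multiply back to the original matrix, that the singular values come out in decreasing order, and that the columns of <strong>U</strong> are eigenvectors of the document similarity matrix <strong>D.D_T</strong>. The matrix values are arbitrary.</p>
<pre><code class="lang-python">import numpy as np

# Made-up document-term matrix: 4 documents (rows) x 5 terms (columns)
D = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
])

# full_matrices=False returns the compact SVD: U is 4x4, S has 4 values, Vt is 4x5
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# 1) SVD is exact: U . diag(S) . V_T rebuilds D (up to floating-point error)
print(np.allclose(U @ np.diag(S) @ Vt, D))   # True

# 2) Singular values are sorted from largest to smallest
print(np.all(S[:-1] >= S[1:]))               # True

# 3) Columns of U are eigenvectors of D.D_T, with eigenvalues S**2
print(np.allclose(D @ D.T @ U, U * S**2))    # True

# Each row of Vt is a "topic": a weighting over the 5 terms
print(Vt.shape)                              # (4, 5)
</code></pre>
<p>In general, the rows of <strong>V_T</strong> also contain negative entries, which is one reason SVD topics can be harder to read than the non-negative NMF topics discussed later.</p>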
<p>If you'd like to refresh the concept of eigendecomposition, here's an excellent tutorial by <a target="_blank" href="https://www.youtube.com/c/3blue1brown">Grant Sanderson from 3Blue1Brown</a>. It explains eigenvectors and eigenvalues visually.</p>
<p><a target="_blank" href="https://www.youtube.com/embed/PFDu9oVAE-g">Watch the video on YouTube</a></p>
<p>It's totally fine if you find how SVD works a bit difficult to understand. 🙂 For now, you may think of SVD as a black box that operates on your Document-Term Matrix (DTM) and yields three matrices, <strong>U, S</strong>, and <strong>V_T</strong>. The topics reside along the rows of the matrix <strong>V_T</strong>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/4.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><strong>Note</strong>: Topic modeling with SVD on a document-term matrix is also known as <strong>Latent Semantic Indexing (LSI)</strong> or Latent Semantic Analysis (LSA).</p>
<h2 id="heading-what-is-truncated-svd-or-k-svd">What is Truncated SVD or k-SVD?</h2>
<p>Suppose you have a text corpus of 150 documents. Would you prefer skimming through 150 different topics that describe the corpus, or would you be happy reading through 10 topics that can convey the content of the corpus?</p>
<p>Well, it's often helpful to fix a small number of topics that best convey the content of the text. And this is what motivates <strong>k-SVD</strong>.</p>
<p>Computing and storing every singular value and singular vector is expensive for large corpora, so in practice you keep only the <strong>k largest singular values</strong> and the topics corresponding to them. The working of k-SVD is illustrated below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/5.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
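<p>Truncation itself is just slicing. The sketch below, again using NumPy, keeps the k largest singular values and their vectors; the helper name <code>truncated_svd</code> and the random 150 × 40 matrix (standing in for a 150-document corpus with 40 terms) are illustrative assumptions. It also checks the Eckart-Young result: the Frobenius-norm error of the rank-k approximation equals the square root of the sum of the squared singular values that were dropped.</p>
<pre><code class="lang-python">import numpy as np

def truncated_svd(D, k):
    """Keep only the k largest singular values and their singular vectors."""
    U, S, Vt = np.linalg.svd(D, full_matrices=False)  # S is sorted largest first
    return U[:, :k], S[:k], Vt[:k, :]

rng = np.random.default_rng(0)
D = rng.random((150, 40))        # random stand-in for a 150-document, 40-term DTM
k = 10

U_k, S_k, Vt_k = truncated_svd(D, k)
D_k = U_k @ np.diag(S_k) @ Vt_k  # rank-k approximation of D
print(Vt_k.shape)                # (10, 40): 10 topics, each a weighting over 40 terms

# Eckart-Young: the approximation error equals the norm of the discarded singular values
S_all = np.linalg.svd(D, compute_uv=False)
error = np.linalg.norm(D - D_k)  # Frobenius norm by default for matrices
print(np.isclose(error, np.sqrt(np.sum(S_all[k:] ** 2))))  # True
</code></pre>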
<h2 id="heading-topic-modeling-using-non-negative-matrix-factorization-nmf">Topic Modeling Using Non-Negative Matrix Factorization (NMF)</h2>
<p>Non-negative Matrix Factorization (NMF) works as shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/6.jpeg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Non-negative Matrix Factorization acts on the Document-Term Matrix and yields the following:</p>
<ul>
<li>The matrix <strong>W</strong> which is called the <strong>document-topic matrix</strong>. This matrix shows the distribution of the topics across the documents in the corpus.</li>
<li>The matrix <strong>H</strong> which is also called the <strong>term-topic matrix</strong>. This matrix captures the significance of terms across the topics.</li>
</ul>
<p>NMF is easier to interpret as all the elements of the matrices <strong>W</strong> and <strong>H</strong> are now non-negative. So a higher score corresponds to greater relevance.</p>
<p><strong>But how do we get matrices W and H?</strong> </p>
<p>NMF is a <em>non-exact</em> matrix factorization technique. This means that multiplying W and H generally gives only an approximation of the original document-term matrix (denoted V here), not V itself.</p>
<p>The matrices W and H are initialized randomly, and the algorithm updates them iteratively until it finds a W and H that minimize the cost function.</p>
<p>The cost function is the Frobenius norm of the matrix <strong>V - W.H</strong>, as shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/e2.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The Frobenius norm of a matrix A with m rows and n columns is given by the following equation:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/e3.png" alt="Image" width="600" height="400" loading="lazy"></p>
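<p>One classic way to minimize this cost is with Lee and Seung's multiplicative update rules, sketched below in NumPy. This is a swapped-in illustration, not necessarily what a given library does: scikit-learn's <code>NMF</code> class, for instance, uses a coordinate-descent solver by default. The function names, the random non-negative 9 × 20 matrix <strong>V</strong>, and the small <code>eps</code> guard against division by zero are assumptions made for the sketch.</p>
<pre><code class="lang-python">import numpy as np

def frobenius_cost(V, W, H):
    """The cost ||V - W.H||_F: square root of the sum of squared entries."""
    return np.sqrt(np.sum((V - W @ H) ** 2))

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k))  # random non-negative initialization
    H = rng.random((k, V.shape[1]))
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so entries stay non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((9, 20))  # made-up non-negative "DTM"
W, H = nmf_multiplicative(V, k=4)
print(W.shape, H.shape)               # (9, 4) (4, 20)
print(W.min() >= 0 and H.min() >= 0)  # True
print(frobenius_cost(V, W, H))        # small but non-zero: the fit is approximate
</code></pre>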
<h2 id="heading-7-steps-to-use-svd-for-topic-modeling">7 Steps to Use SVD for Topic Modeling</h2>
<p>1️⃣ To use SVD to get topics, let's first get a text corpus. The following code cell contains a piece of text on <a target="_blank" href="https://en.wikipedia.org/wiki/Computer_programming">computer programming</a>.</p>
<pre><code class="lang-python">text=[<span class="hljs-string">"Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task."</span>,

      <span class="hljs-string">"Programming involves tasks such as: analysis, generating algorithms, profiling algorithms' accuracy and resource consumption, and the implementation of algorithms in a chosen programming language (commonly referred to as coding)."</span>,

      <span class="hljs-string">"The source program is written in one or more languages that are intelligible to programmers, rather than machine code, which is directly executed by the central processing unit."</span>,

      <span class="hljs-string">"The purpose of programming is to find a sequence of instructions that will automate the performance of a task (which can be as complex as an operating system) on a computer, often for solving a given problem."</span>,

      <span class="hljs-string">"Proficient programming thus often requires expertise in several different subjects, including knowledge of the application domain, specialized algorithms, and formal logic."</span>,

      <span class="hljs-string">"Tasks accompanying and related to programming include: testing, debugging, source code maintenance, implementation of build systems, and management of derived artifacts, such as the machine code of computer programs."</span>,

      <span class="hljs-string">"These might be considered part of the programming process, but often the term software development is used for this larger process with the term programming, implementation, or coding reserved for the actual writing of code."</span>,

      <span class="hljs-string">"Software engineering combines engineering techniques with software development practices."</span>,

    <span class="hljs-string">"Reverse engineering is a related process used by designers, analysts and programmers to understand and re-create/re-implement"</span>]
</code></pre>
<p>The text for which you need to find topics is now ready.</p>
<p>2️⃣ The next step is to import the <code>TfidfVectorizer</code> class from scikit-learn's feature extraction module for text data:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.feature_extraction.text <span class="hljs-keyword">import</span> TfidfVectorizer
</code></pre>
<p>You'll use the <code>TfidfVectorizer</code> class to get the DTM populated with the TF-IDF scores for the text corpus.</p>
<p>3️⃣ To use <strong>Truncated SVD (k-SVD)</strong> discussed earlier, you need to import the <code>TruncatedSVD</code> class from scikit-learn's <code>decomposition</code> module:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.decomposition <span class="hljs-keyword">import</span> TruncatedSVD
</code></pre>
<p>▶ Now that you've imported all the necessary modules, it's time to start your quest for topics in the text.</p>
<p>4️⃣ In this step, you'll instantiate a <code>TfidfVectorizer</code> object. Let's call it <code>vectorizer</code>.</p>
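<pre><code class="lang-python">vectorizer = TfidfVectorizer(stop_words=<span class="hljs-string">'english'</span>, smooth_idf=<span class="hljs-literal">True</span>)
<span class="hljs-comment"># under the hood: lowercases the text, keeps tokens of two or more word characters</span>
<span class="hljs-comment"># (so punctuation and single characters are dropped), and removes English stop words</span>
input_matrix = vectorizer.fit_transform(text)  <span class="hljs-comment"># sparse DTM; TruncatedSVD and NMF accept it directly</span>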
</code></pre>
<p>So far, you've:</p>
<p>☑ collected the text,<br>☑ imported the necessary modules, and<br>☑ obtained the input DTM.</p>
<p>Now you'll proceed with using SVD to obtain topics.</p>
<p>5️⃣ You'll now use the <code>TruncatedSVD</code> class that you imported in step 3️⃣.</p>
<pre><code class="lang-python">svd_modeling = TruncatedSVD(n_components=<span class="hljs-number">4</span>, algorithm=<span class="hljs-string">'randomized'</span>, n_iter=<span class="hljs-number">100</span>, random_state=<span class="hljs-number">122</span>)
svd_modeling.fit(input_matrix)
components = svd_modeling.components_  <span class="hljs-comment"># shape (n_components, n_terms); each row is a topic</span>
vocab = vectorizer.get_feature_names_out()  <span class="hljs-comment"># get_feature_names() was removed in scikit-learn 1.2</span>
</code></pre>
<p>6️⃣ Let’s write a function that gets the topics for us.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_topics</span>(<span class="hljs-params">components, n_words=<span class="hljs-number">7</span></span>):</span>
    <span class="hljs-string">"""Print and return the n_words highest-weighted terms of each topic."""</span>
    topic_word_list = []
    <span class="hljs-keyword">for</span> i, comp <span class="hljs-keyword">in</span> enumerate(components):
        <span class="hljs-comment"># pair every vocabulary term with its weight in this topic (row)</span>
        terms_comp = zip(vocab, comp)
        sorted_terms = sorted(terms_comp, key=<span class="hljs-keyword">lambda</span> x: x[<span class="hljs-number">1</span>], reverse=<span class="hljs-literal">True</span>)[:n_words]
        topic = <span class="hljs-string">" "</span>.join(t[<span class="hljs-number">0</span>] <span class="hljs-keyword">for</span> t <span class="hljs-keyword">in</span> sorted_terms)
        topic_word_list.append(topic)
        print(<span class="hljs-string">f"Topic <span class="hljs-subst">{i+<span class="hljs-number">1</span>}</span>: \n  <span class="hljs-subst">{topic}</span>"</span>)
    <span class="hljs-keyword">return</span> topic_word_list

topic_word_list = get_topics(components)
</code></pre>
<p>7️⃣ And it's time to view the topics, and see if they make sense. When you call the <code>get_topics()</code> function with the components obtained from SVD as the argument, you'll get a list of topics, and the top words in each of those topics.</p>
<pre><code class="lang-python">Topic <span class="hljs-number">1</span>: 
  code programming process software term computer engineering

Topic <span class="hljs-number">2</span>: 
  engineering software development combines practices techniques used

Topic <span class="hljs-number">3</span>: 
  code machine source central directly executed intelligible

Topic <span class="hljs-number">4</span>: 
  computer specific task automate complex given instructions
</code></pre>
<p>And you have your topics in just 7 steps. Do the topics look good?</p>
<h2 id="heading-how-to-visualize-topics-as-word-clouds">How to Visualize Topics as Word Clouds</h2>
<p>In the previous section, you printed out the topics, and made sense of the topics using the top words in each topic.</p>
<p>Another popular visualization method for topics is the <strong>word cloud</strong>. In a word cloud, the terms in a particular topic are sized according to their <strong>relative significance</strong>: the most important word has the largest font size, and so on.</p>
<pre><code class="lang-python">!pip install wordcloud
<span class="hljs-keyword">from</span> wordcloud <span class="hljs-keyword">import</span> WordCloud
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">4</span>):
  wc = WordCloud(width=<span class="hljs-number">1000</span>, height=<span class="hljs-number">600</span>, margin=<span class="hljs-number">3</span>,  prefer_horizontal=<span class="hljs-number">0.7</span>,scale=<span class="hljs-number">1</span>,background_color=<span class="hljs-string">'black'</span>, relative_scaling=<span class="hljs-number">0</span>).generate(topic_word_list[i])
  plt.imshow(wc)
  plt.title(<span class="hljs-string">f"Topic <span class="hljs-subst">{i+<span class="hljs-number">1</span>}</span>"</span>)
  plt.axis(<span class="hljs-string">"off"</span>)
  plt.show()
</code></pre>
<p>The word clouds for topics 1 through 4 are shown in the image grid below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/wc1.jpeg" alt="Image" width="600" height="400" loading="lazy">
<em>Topic Clouds from SVD</em></p>
<p>As you can see, the font size of each word indicates its relative importance in the topic. These word clouds are also called topic clouds.</p>
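<p>In the code above, each topic is passed to <code>WordCloud.generate()</code> as a plain string, so the numeric weights from SVD are not used directly. If you want font sizes to follow those weights, one option is to build a term-to-weight dictionary and pass it to <code>WordCloud.generate_from_frequencies()</code>. The helper below, <code>topic_weights</code>, is a hypothetical sketch of the dictionary-building step; it keeps only positive weights, since SVD components can contain negative values. The toy vocabulary and weights are made up.</p>
<pre><code class="lang-python">def topic_weights(components, vocab, topic_idx, n_words=7):
    """Map the n_words highest-weighted terms of one topic to their positive weights."""
    pairs = sorted(zip(vocab, components[topic_idx]), key=lambda x: x[1], reverse=True)
    return {term: float(weight) for term, weight in pairs[:n_words] if weight > 0}

# Toy example with made-up weights for 2 topics over 4 terms
toy_vocab = ["code", "machine", "software", "task"]
toy_components = [[0.9, 0.1, 0.5, -0.2], [-0.3, 0.8, 0.0, 0.4]]
print(topic_weights(toy_components, toy_vocab, 0, n_words=3))
# {'code': 0.9, 'software': 0.5, 'machine': 0.1}
print(topic_weights(toy_components, toy_vocab, 1, n_words=3))
# {'machine': 0.8, 'task': 0.4}
</code></pre>
<p>With the article's variables, a call such as <code>WordCloud(background_color='black').generate_from_frequencies(topic_weights(components, vocab, 0))</code> would then draw the first topic with larger weights shown in larger fonts.</p>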
<h2 id="heading-how-to-use-nmf-for-topic-modeling">How to Use NMF for Topic Modeling</h2>
<p>In this section, you'll run through the same steps as in SVD. You need to first import the <code>NMF</code> class from scikit-learn's <code>decomposition</code> module.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.decomposition <span class="hljs-keyword">import</span> NMF
NMF_model = NMF(n_components=<span class="hljs-number">4</span>, random_state=<span class="hljs-number">1</span>)
W = NMF_model.fit_transform(input_matrix)
H = NMF_model.components_
</code></pre>
<p>And then you may call the <code>get_topics()</code> function on the matrix <strong>H</strong> to get the topics.</p>
<pre><code class="lang-python">Topic <span class="hljs-number">1</span>: 
  code machine source central directly executed intelligible

Topic <span class="hljs-number">2</span>: 
  engineering software process development used term combines

Topic <span class="hljs-number">3</span>: 
  algorithms programming application different domain expertise formal

Topic <span class="hljs-number">4</span>: 
  computer specific task programming automate complex given
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/02/wc2.jpeg" alt="Image" width="600" height="400" loading="lazy">
<em>Topic Clouds from NMF</em></p>
<p>For the given piece of text, you can see that both SVD and NMF give similar topic clouds.</p>
<h2 id="heading-svd-vs-nmf-an-overview-of-the-differences">SVD vs NMF – An Overview of the Differences</h2>
<p>Now, let's put together the differences between these two matrix factorization techniques for topic modeling.</p>
<ul>
<li>SVD is an exact matrix factorization technique – you can reconstruct the input DTM from the resultant matrices.</li>
<li>If you choose to use k-SVD, it gives the best possible rank-k approximation to the input DTM (measured by the Frobenius norm).</li>
<li>Though NMF gives only an approximation of the input DTM, it's known to capture more diverse topics than SVD, and its non-negative weights are easier to interpret.</li>
</ul>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>I hope you enjoyed this tutorial. As a next step, you may spin up your own Colab notebook using the code cells from this tutorial. You only have to plug in the piece of text that you'd like to find topics for, and you'd have your topics and word clouds ready!</p>
<p>Thank you for reading, and happy coding!</p>
<h3 id="heading-references-and-further-reading-on-topic-modeling">References and Further Reading on Topic Modeling</h3>
<ul>
<li><a target="_blank" href="https://www.fast.ai/2019/07/08/fastai-nlp/">A Code-First Approach to Natural Language Processing</a> by fast.ai</li>
<li><a target="_blank" href="https://www.fast.ai/2017/07/17/num-lin-alg/">Computational Linear Algebra</a> by fast.ai</li>
</ul>
<p>Cover Image: Photo by <a target="_blank" href="https://unsplash.com/ja/@brett_jordan?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Brett Jordan</a> on <a target="_blank" href="https://unsplash.com/photos/M3cxjDNiLlQ?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
