<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Accessibility - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Accessibility - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 22 May 2026 17:39:38 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/accessibility/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build Responsive and Accessible UI Designs with React and Semantic HTML ]]>
                </title>
                <description>
                    <![CDATA[ Building modern React applications requires more than just functionality. It also demands responsive layouts and accessible user experiences. By combining semantic HTML, responsive design techniques,  ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-responsive-accessible-ui-with-react-and-semantic-html/</link>
                <guid isPermaLink="false">69d539975da14bc70e76871d</guid>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Responsive Web Design ]]>
                    </category>
                
                    <category>
                        <![CDATA[ semantichtml ]]>
                    </category>
                
                    <category>
                        <![CDATA[ aria ]]>
                    </category>
                
                    <category>
                        <![CDATA[ UI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Gopinath Karunanithi ]]>
                </dc:creator>
                <pubDate>Tue, 07 Apr 2026 17:06:31 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/d2651d02-040d-4c4f-bbfe-ef92097edab4.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building modern React applications requires more than just functionality. It also demands responsive layouts and accessible user experiences.</p>
<p>By combining semantic HTML, responsive design techniques, and accessibility best practices (like ARIA roles and keyboard navigation), developers can create interfaces that work across devices and for all users, including those with disabilities.</p>
<p>This article shows how to design scalable, inclusive React UIs using real-world patterns and code examples.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-overview">Overview</a></p>
</li>
<li><p><a href="#heading-why-accessibility-and-responsiveness-matter">Why Accessibility and Responsiveness Matter</a></p>
</li>
<li><p><a href="#heading-core-principles-of-accessible-and-responsive-design">Core Principles of Accessible and Responsive Design</a></p>
</li>
<li><p><a href="#heading-using-semantic-html-in-react">Using Semantic HTML in React</a></p>
</li>
<li><p><a href="#heading-structuring-a-page-with-semantic-elements">Structuring a Page with Semantic Elements</a></p>
</li>
<li><p><a href="#heading-building-responsive-layouts">Building Responsive Layouts</a></p>
</li>
<li><p><a href="#heading-accessibility-with-aria">Accessibility with ARIA</a></p>
</li>
<li><p><a href="#heading-keyboard-navigation">Keyboard Navigation</a></p>
</li>
<li><p><a href="#heading-focus-management">Focus Management</a></p>
</li>
<li><p><a href="#heading-forms-and-accessibility">Forms and Accessibility</a></p>
</li>
<li><p><a href="#heading-responsive-typography-and-images">Responsive Typography and Images</a></p>
</li>
<li><p><a href="#heading-building-a-fully-accessible-responsive-component-end-to-end-example">Building a Fully Accessible Responsive Component (End-to-End Example)</a></p>
</li>
<li><p><a href="#heading-testing-accessibility">Testing Accessibility</a></p>
</li>
<li><p><a href="#heading-best-practices">Best Practices</a></p>
</li>
<li><p><a href="#heading-when-not-to-overuse-accessibility-features">When NOT to Overuse Accessibility Features</a></p>
</li>
<li><p><a href="#heading-future-enhancements">Future Enhancements</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before following along, you should be familiar with:</p>
<ul>
<li><p>React fundamentals (components, hooks, JSX)</p>
</li>
<li><p>Basic HTML and CSS</p>
</li>
<li><p>JavaScript ES6 features</p>
</li>
<li><p>Basic understanding of accessibility concepts (helpful but not required)</p>
</li>
</ul>
<h2 id="heading-overview">Overview</h2>
<p>Modern web applications must serve a diverse audience across a wide range of devices, screen sizes, and accessibility needs. Users today expect seamless experiences whether they are browsing on a desktop, tablet, or mobile device – and they also expect interfaces that are usable regardless of physical or cognitive limitations.</p>
<p>Two essential principles help achieve this:</p>
<ul>
<li><p>Responsive design, which ensures layouts adapt to different screen sizes</p>
</li>
<li><p>Accessibility, which ensures applications are usable by people with disabilities</p>
</li>
</ul>
<p>In React applications, these principles are often implemented incorrectly or treated as afterthoughts. Developers may rely heavily on div-based layouts, ignore semantic HTML, or overlook accessibility features such as keyboard navigation and screen reader support.</p>
<p>This article will show you how to build responsive and accessible UI designs in React using semantic HTML. You'll learn how to:</p>
<ul>
<li><p>Structure components using semantic HTML elements</p>
</li>
<li><p>Build responsive layouts using modern CSS techniques</p>
</li>
<li><p>Improve accessibility with ARIA attributes and proper roles</p>
</li>
<li><p>Ensure keyboard navigation and screen reader compatibility</p>
</li>
<li><p>Apply best practices for scalable and inclusive UI design</p>
</li>
</ul>
<p>By the end of this guide, you'll be able to create React interfaces that are not only visually responsive but also accessible to all users.</p>
<h2 id="heading-why-accessibility-and-responsiveness-matter">Why Accessibility and Responsiveness Matter</h2>
<p>Responsive and accessible design isn't just about compliance. It directly impacts usability, performance, and reach.</p>
<p><strong>Accessibility benefits:</strong></p>
<ul>
<li><p>Supports users with visual, motor, or cognitive impairments</p>
</li>
<li><p>Improves SEO and content discoverability</p>
</li>
<li><p>Enhances usability for all users</p>
</li>
</ul>
<p><strong>Responsiveness benefits:</strong></p>
<ul>
<li><p>Ensures consistent UX across devices</p>
</li>
<li><p>Reduces bounce rates on mobile</p>
</li>
<li><p>Improves performance and scalability</p>
</li>
</ul>
<p>Ignoring these principles can result in broken layouts on smaller screens, poor screen reader compatibility, and limited reach and usability.</p>
<h2 id="heading-core-principles-of-accessible-and-responsive-design">Core Principles of Accessible and Responsive Design</h2>
<p>Before diving into the code, it’s important to understand the foundational principles.</p>
<h3 id="heading-1-semantic-html-first">1. Semantic HTML First</h3>
<p>Semantic HTML refers to using HTML elements that clearly describe their meaning and role in the interface, rather than relying on generic containers like <code>&lt;div&gt;</code> or <code>&lt;span&gt;</code>. These elements provide built-in accessibility, improve SEO, and make code more readable.</p>
<p>For example:</p>
<p><strong>Non-semantic:</strong></p>
<pre><code class="language-html">&lt;div onClick={handleClick}&gt;Submit&lt;/div&gt;
</code></pre>
<p><strong>Semantic:</strong></p>
<pre><code class="language-html">&lt;button type="button" onClick={handleClick}&gt;Submit&lt;/button&gt;
</code></pre>
<p>Another example:</p>
<p><strong>Non-semantic:</strong></p>
<pre><code class="language-html">&lt;div className="header"&gt;My App&lt;/div&gt;
</code></pre>
<p><strong>Semantic:</strong></p>
<pre><code class="language-html">&lt;header&gt;My App&lt;/header&gt;
</code></pre>
<p>Using semantic elements such as <code>&lt;header&gt;</code>, <code>&lt;nav&gt;</code>, <code>&lt;main&gt;</code>, <code>&lt;section&gt;</code>, <code>&lt;article&gt;</code>, and <code>&lt;button&gt;</code> helps browsers and assistive technologies (like screen readers) understand the structure and purpose of your UI without additional configuration.</p>
<p>Why this matters:</p>
<ul>
<li><p>Screen readers understand semantic elements automatically</p>
</li>
<li><p>It supports built-in accessibility (keyboard, focus, roles)</p>
</li>
<li><p>There's less need for ARIA attributes</p>
</li>
<li><p>It gives you better SEO and maintainability</p>
</li>
</ul>
<h3 id="heading-2-mobile-first-design">2. Mobile-First Design</h3>
<p>Mobile-first design means starting your UI design with the smallest screen sizes (typically mobile devices) and progressively enhancing the layout for larger screens such as tablets and desktops.</p>
<p>This approach makes sure that core content and functionality are prioritized, layouts remain simple and performant, and users on mobile devices get a fully usable experience.</p>
<p>In practice, mobile-first design involves:</p>
<ul>
<li><p>Using a single-column layout initially</p>
</li>
<li><p>Applying minimal styling and spacing</p>
</li>
<li><p>Avoiding complex UI patterns on small screens</p>
</li>
</ul>
<p>Then, you scale up using CSS media queries:</p>
<pre><code class="language-css">.container {
  display: flex;
  flex-direction: column;
}
@media (min-width: 768px) {
  .container {
    flex-direction: row;
  }
}
</code></pre>
<p>Here, the default layout is optimized for mobile, and enhancements are applied only when the screen size increases.</p>
<p><strong>Why this approach works:</strong></p>
<ul>
<li><p>Prioritizes essential content</p>
</li>
<li><p>Improves performance on mobile devices</p>
</li>
<li><p>Reduces layout bugs when scaling up</p>
</li>
<li><p>Aligns with how most users access web apps today</p>
</li>
</ul>
<h3 id="heading-3-progressive-enhancement">3. Progressive Enhancement</h3>
<p>Progressive enhancement is the practice of building a baseline user experience that works for all users (regardless of their device, browser capabilities, or network conditions) and then layering on advanced features for more capable environments.</p>
<p>This approach ensures that core functionality is always accessible, users on older devices or slow networks aren't blocked, and accessibility is preserved even when advanced features fail.</p>
<p>In practice, this means:</p>
<ul>
<li><p>Start with semantic HTML that delivers content and functionality</p>
</li>
<li><p>Add basic styling with CSS for layout and readability</p>
</li>
<li><p>Enhance interactivity using JavaScript (React) only where needed</p>
</li>
</ul>
<p>For example, a form should still be usable with plain HTML:</p>
<pre><code class="language-html">&lt;form&gt;
  &lt;!-- In JSX, write htmlFor instead of for --&gt;
  &lt;label for="email"&gt;Email&lt;/label&gt;
  &lt;input id="email" name="email" type="email" /&gt;
  &lt;button type="submit"&gt;Submit&lt;/button&gt;
&lt;/form&gt;
</code></pre>
<p>Then, React can enhance it with validation, dynamic feedback, or animations.</p>
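<p>As a rough sketch of that enhancement layer, the plain JavaScript below checks the email value and builds the ARIA attributes that connect the input to its error message. The function names, the error text, and the pattern are assumptions for illustration: the pattern is a deliberately loose check (some text, one <code>@</code>, more text) rather than a complete email validator, and the browser's built-in <code>type="email"</code> validation still applies underneath.</p>
<pre><code class="language-javascript">// Hypothetical helper: returns an error message, or '' when the value looks valid.
function getEmailError(value) {
  const trimmed = value.trim();
  if (trimmed === '') {
    return 'Email is required.';
  }
  // Deliberately loose check: text, a single @, more text, and no spaces.
  if (!/^[^\s@]+@[^\s@]+$/.test(trimmed)) {
    return 'Enter an email address such as name@example.com.';
  }
  return '';
}

// Hypothetical helper: ARIA attributes that tie the input to its error text.
// aria-invalid tells screen readers the value failed validation, and
// aria-describedby points at the element that contains the message.
function getEmailInputA11yProps(error, errorId) {
  const hasError = error !== '';
  return {
    'aria-invalid': hasError,
    'aria-describedby': hasError ? errorId : undefined,
  };
}
</code></pre>
<p>In a React component, you might store the result of <code>getEmailError</code> in state, spread <code>getEmailInputA11yProps(error, 'email-error')</code> onto the <code>&lt;input&gt;</code>, and render the message in an element with <code>id="email-error"</code> so screen readers announce it when the field receives focus.</p>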
<p>By prioritizing functionality first and enhancements later, you ensure your application remains usable in a wide range of real-world scenarios.</p>
<h3 id="heading-4-keyboard-accessibility">4. Keyboard Accessibility</h3>
<p>Keyboard accessibility ensures that users can navigate and interact with your application using only a keyboard. This is critical for users with motor disabilities and also improves usability for power users.</p>
<p>Key aspects of keyboard accessibility include:</p>
<ul>
<li><p>Ensuring all interactive elements (buttons, links, inputs) are focusable</p>
</li>
<li><p>Maintaining a logical tab order across the page</p>
</li>
<li><p>Providing visible focus indicators (for example, outline styles)</p>
</li>
<li><p>Supporting keyboard events such as Enter and Space</p>
</li>
</ul>
<p><strong>Bad Example (Not Accessible)</strong></p>
<pre><code class="language-html">&lt;div onClick={handleClick}&gt;Submit&lt;/div&gt;
</code></pre>
<p>This element:</p>
<ul>
<li><p>Cannot be focused with a keyboard</p>
</li>
<li><p>Does not respond to Enter/Space</p>
</li>
<li><p>Is not announced as a button (or as anything interactive) by screen readers</p>
</li>
</ul>
<p><strong>Good Example</strong></p>
<pre><code class="language-html">&lt;button type="button" onClick={handleClick}&gt;Submit&lt;/button&gt;
</code></pre>
<p>This automatically supports:</p>
<ul>
<li><p>Keyboard interaction</p>
</li>
<li><p>Focus management</p>
</li>
<li><p>Screen reader announcements</p>
</li>
</ul>
<p><strong>Custom Component Example (if needed)</strong></p>
<pre><code class="language-html">&lt;div
  role="button"
  tabIndex={0}
  onClick={handleClick}
  onKeyDown={(e) =&gt; {
    if (e.key === 'Enter' || e.key === ' ') {
      e.preventDefault();
      handleClick();
    }
  }}
&gt;
  Submit
&lt;/div&gt;
</code></pre>
<p>But only use this when native elements aren't sufficient.</p>
<p>These principles form the foundation of accessible and responsive design:</p>
<ul>
<li><p>Use semantic HTML to communicate intent</p>
</li>
<li><p>Design for mobile first, then scale up</p>
</li>
<li><p>Enhance progressively for better compatibility</p>
</li>
<li><p>Ensure full keyboard accessibility</p>
</li>
</ul>
<p>Applying these early prevents major usability and accessibility issues later in development.</p>
<h2 id="heading-using-semantic-html-in-react">Using Semantic HTML in React</h2>
<p>As we briefly discussed above, semantic HTML plays a critical role in both accessibility (a11y) and code readability. Semantic elements clearly describe their purpose to both developers and browsers, which allows assistive technologies like screen readers to interpret and navigate the UI correctly.</p>
<p>For example, when you use a <code>&lt;button&gt;</code> element, browsers automatically provide keyboard support, focus behavior, and accessibility roles. In contrast, non-semantic elements like <code>&lt;div&gt;</code> require additional attributes and manual handling to achieve the same functionality.</p>
<p>From a readability perspective, semantic HTML makes your code easier to understand and maintain. Developers can quickly identify the structure and intent of a component without relying on class names or external documentation.</p>
<p><strong>Bad Example (Non-semantic)</strong></p>
<pre><code class="language-html">&lt;div onClick={handleClick}&gt;Submit&lt;/div&gt;
</code></pre>
<p>Why this is problematic:</p>
<ul>
<li><p>The <code>&lt;div&gt;</code> element has no inherent meaning or role</p>
</li>
<li><p>It is not focusable by default, so keyboard users can't access it</p>
</li>
<li><p>It does not respond to keyboard events like Enter or Space unless explicitly coded</p>
</li>
<li><p>Screen readers do not recognize it as an interactive element</p>
</li>
</ul>
<p>To make this accessible, you would need to add:</p>
<ul>
<li><p><code>role="button"</code></p>
</li>
<li><p><code>tabIndex={0}</code></p>
</li>
<li><p>Keyboard event handlers for Enter and Space</p>
</li>
</ul>
<p><strong>Good Example (Semantic)</strong></p>
<pre><code class="language-html">&lt;button type="button" onClick={handleClick}&gt;Submit&lt;/button&gt;
</code></pre>
<p>Why this is better:</p>
<ul>
<li><p>The <code>&lt;button&gt;</code> element is inherently interactive</p>
</li>
<li><p>It is automatically focusable and keyboard accessible</p>
</li>
<li><p>It supports Enter and Space key activation by default</p>
</li>
<li><p>Screen readers correctly announce it as a button</p>
</li>
</ul>
<p>This reduces complexity while improving accessibility and usability.</p>
<h3 id="heading-why-all-this-matters">Why all this matters:</h3>
<p>There are many reasons to use semantic HTML.</p>
<p>First, semantic elements like <code>&lt;button&gt;</code>, <code>&lt;a&gt;</code>, and <code>&lt;form&gt;</code> come with default accessibility behaviors such as focus management and keyboard interaction.</p>
<p>Semantic HTML also reduces complexity: you don’t need to manually implement roles, keyboard handlers, or tab navigation.</p>
<p>It provides better screen reader support as well. Assistive technologies can correctly interpret the purpose of elements and announce them appropriately.</p>
<p>It also improves maintainability, helping other developers quickly understand the intent of your code without reverse-engineering behavior from event handlers.</p>
<p>Finally, you'll generally have fewer bugs in your code. Relying on native browser behavior reduces the risk of missing critical accessibility features.</p>
<p>Here's another example:</p>
<p><strong>Non-semantic:</strong></p>
<pre><code class="language-html">&lt;div className="nav"&gt;
  &lt;div onClick={goHome}&gt;Home&lt;/div&gt;
&lt;/div&gt;
</code></pre>
<p><strong>Semantic:</strong></p>
<pre><code class="language-html">&lt;nav&gt;
  &lt;a href="/"&gt;Home&lt;/a&gt;
&lt;/nav&gt;
</code></pre>
<p>Here, <code>&lt;nav&gt;</code> clearly defines a navigation region, and <code>&lt;a&gt;</code> provides built-in link behavior, including keyboard navigation and proper screen reader announcements.</p>
<h2 id="heading-structuring-a-page-with-semantic-elements">Structuring a Page with Semantic Elements</h2>
<p>When building a React application, structuring your layout with semantic HTML elements helps define clear regions of your interface. Instead of relying on generic containers like <code>&lt;div&gt;</code>, semantic elements communicate the purpose of each section to both developers and assistive technologies.</p>
<p>In the example below, we're creating a basic page layout using commonly used semantic elements such as <code>&lt;header&gt;</code>, <code>&lt;nav&gt;</code>, <code>&lt;main&gt;</code>, <code>&lt;section&gt;</code>, and <code>&lt;footer&gt;</code>. Each of these elements represents a specific part of the UI and contributes to better accessibility and maintainability.</p>
<pre><code class="language-javascript">function Layout() {
  return (
    &lt;&gt;
      {/* Skip link for keyboard and screen reader users */}
      &lt;a href="#main-content" className="skip-link"&gt;
        Skip to main content
      &lt;/a&gt;

      &lt;header&gt;
        &lt;h1&gt;My App&lt;/h1&gt;
      &lt;/header&gt;

      &lt;nav&gt;
        &lt;ul&gt;
          &lt;li&gt;&lt;a href="/"&gt;Home&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/nav&gt;

      &lt;main id="main-content"&gt;
        &lt;section&gt;
          &lt;h2&gt;Dashboard&lt;/h2&gt;
        &lt;/section&gt;
      &lt;/main&gt;

      &lt;footer&gt;
        &lt;p&gt;© 2026&lt;/p&gt;
      &lt;/footer&gt;
    &lt;/&gt;
  );
}
</code></pre>
<p>Each element in this layout has a specific role:</p>
<ul>
<li><p>The skip link lets keyboard and screen reader users bypass the header and navigation and jump straight to the main content</p>
</li>
<li><p><code>&lt;header&gt;</code>: Represents introductory content or branding</p>
</li>
<li><p><code>&lt;nav&gt;</code>: Contains navigation links</p>
</li>
<li><p><code>&lt;main&gt;</code>: Holds the primary content of the page</p>
</li>
<li><p><code>&lt;section&gt;</code>: Groups related content within the page</p>
</li>
<li><p><code>&lt;footer&gt;</code>: Contains closing or supplementary information</p>
</li>
</ul>
<p>Using these elements correctly ensures your UI is both logically structured and accessible by default.</p>
<h3 id="heading-why-this-structure-is-important">Why this structure is important:</h3>
<p>Properly structuring a page like this brings with it many benefits.</p>
<p>For example, it improves screen reader navigation. Semantic elements allow screen readers to identify different regions of the page (for example, navigation, main content, and footer), so users can quickly jump between these sections instead of reading the page linearly.</p>
<p>It also gives you better document structure. Elements like <code>&lt;main&gt;</code> and <code>&lt;section&gt;</code> define a logical hierarchy, making content easier to parse for both browsers and assistive technologies.</p>
<p>Search engines also use semantic structure to better understand page content and prioritize important sections, which can help with SEO.</p>
<p>It also makes your code more readable, so other devs can immediately understand the layout and purpose of each section without relying on class names or comments.</p>
<p>And it provides built-in accessibility landmarks using elements like <code>&lt;nav&gt;</code> and <code>&lt;main&gt;</code>, allowing assistive technologies to provide shortcuts for users.</p>
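<p>To make the landmark idea concrete, here is a simplified sketch of how a few of these elements map to the landmark roles that assistive technologies expose, based on the HTML Accessibility API Mappings (HTML-AAM) specification. It is illustrative only: the function and option names are hypothetical, and browsers implement the real mapping, including cases omitted here. It does highlight two details that are easy to miss: a <code>&lt;header&gt;</code> or <code>&lt;footer&gt;</code> nested inside an <code>&lt;article&gt;</code>, <code>&lt;aside&gt;</code>, <code>&lt;main&gt;</code>, <code>&lt;nav&gt;</code>, or <code>&lt;section&gt;</code> is not a page-level landmark, and a <code>&lt;section&gt;</code> only becomes a <code>region</code> landmark when it has an accessible name.</p>
<pre><code class="language-javascript">// Simplified, illustrative mapping of a few HTML elements to the landmark
// roles browsers expose (not a complete or authoritative implementation).
// nestedInSectioningElement: true when the element sits inside an
// article, aside, main, nav, or section element.
function getImplicitLandmark(tagName, options = {}) {
  const { nestedInSectioningElement = false, hasAccessibleName = false } = options;
  switch (tagName.toLowerCase()) {
    case 'main':
      return 'main';
    case 'nav':
      return 'navigation';
    case 'header':
      // Only a page-level header is exposed as the "banner" landmark.
      return nestedInSectioningElement ? null : 'banner';
    case 'footer':
      // Only a page-level footer is exposed as the "contentinfo" landmark.
      return nestedInSectioningElement ? null : 'contentinfo';
    case 'section':
      // A section is a "region" landmark only when it has an accessible name.
      return hasAccessibleName ? 'region' : null;
    default:
      return null; // for example, div and span carry no landmark role
  }
}
</code></pre>
<p>Under this mapping, the <code>&lt;section&gt;</code> in the <code>Layout</code> example above is not a landmark on its own. If you want it to appear in a screen reader's landmark list, you could give its heading an <code>id</code> and reference that <code>id</code> with <code>aria-labelledby</code> on the section.</p>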
<h2 id="heading-building-responsive-layouts">Building Responsive Layouts</h2>
<p>Responsive layouts ensure that your UI adapts smoothly across different screen sizes, from mobile devices to large desktop displays. Instead of building separate layouts for each device, modern CSS techniques like Flexbox, Grid, and media queries allow you to create flexible, fluid designs.</p>
<p>In this section, we’ll look at how layout behavior changes based on screen size, starting with a mobile-first approach and progressively enhancing the layout for larger screens.</p>
<p><strong>Using CSS Flexbox:</strong></p>
<pre><code class="language-css">.container {
  display: flex;
  flex-direction: column;
}

@media (min-width: 768px) {
  .container {
    flex-direction: row;
  }
}
</code></pre>
<p>On smaller screens (mobile), elements are stacked vertically using <code>flex-direction: column</code>, making content easier to read and scroll.</p>
<p>On larger screens (768px and above), the layout switches to a horizontal row, utilizing available screen space more efficiently.</p>
<p><strong>Why this helps:</strong></p>
<ul>
<li><p>Ensures content is readable on small devices without horizontal scrolling</p>
</li>
<li><p>Improves layout efficiency on larger screens</p>
</li>
<li><p>Supports a mobile-first design strategy by defining the default layout for smaller screens first and enhancing it progressively</p>
</li>
</ul>
<p><strong>Using CSS Grid:</strong></p>
<pre><code class="language-css">.grid {
  display: grid;
  grid-template-columns: 1fr;
  gap: 16px;
}

@media (min-width: 768px) {
  .grid {
    grid-template-columns: repeat(3, 1fr);
  }
}
</code></pre>
<p>On mobile devices, content is displayed in a single-column layout (<code>1fr</code>), ensuring each item takes full width.</p>
<p>On larger screens, the layout shifts to three equal columns using <code>repeat(3, 1fr)</code>, creating a grid structure.</p>
<p><strong>Why this helps:</strong></p>
<ul>
<li><p>Provides a clean and consistent way to manage complex layouts</p>
</li>
<li><p>Makes it easy to scale from simple to multi-column designs</p>
</li>
<li><p>Improves visual balance and spacing across different screen sizes</p>
</li>
</ul>
<p><strong>React Example:</strong></p>
<pre><code class="language-javascript">function CardGrid() {
  return (
    &lt;div className="grid"&gt;
      &lt;div className="card"&gt;Item 1&lt;/div&gt;
      &lt;div className="card"&gt;Item 2&lt;/div&gt;
      &lt;div className="card"&gt;Item 3&lt;/div&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p>The React component uses the <code>.grid</code> class to apply the responsive Grid behavior. Each card automatically adjusts its position based on screen size.</p>
<p><strong>Why this is effective:</strong></p>
<ul>
<li><p>Separates structure (React JSX) from layout (CSS)</p>
</li>
<li><p>Allows you to reuse the same component across different screen sizes without modification</p>
</li>
<li><p>Ensures consistent responsiveness across your application with minimal code</p>
</li>
</ul>
<p>By combining Flexbox for one-dimensional layouts and Grid for two-dimensional layouts, you can build highly adaptable interfaces that respond efficiently to different devices and screen sizes.</p>
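<p>CSS should handle the layout itself, but occasionally a component needs to know which layout is active in JavaScript (for example, to render fewer items on small screens). A minimal sketch, assuming a browser that supports <code>MediaQueryList.addEventListener</code>, is to subscribe to the same <code>(min-width: 768px)</code> breakpoint used in the CSS above. <code>watchBreakpoint</code> is a hypothetical helper, and the <code>matchMedia</code> function is passed in so the helper can be tested outside a browser.</p>
<pre><code class="language-javascript">// Hypothetical helper: reports whether a media query matches now and whenever
// that changes. Returns a cleanup function that removes the listener.
function watchBreakpoint(matchMediaFn, query, onChange) {
  const mediaQueryList = matchMediaFn(query);
  const listener = (event) => onChange(event.matches);

  onChange(mediaQueryList.matches); // report the initial state
  mediaQueryList.addEventListener('change', listener);

  return () => mediaQueryList.removeEventListener('change', listener);
}
</code></pre>
<p>In a React component, you could call it inside an effect and return its cleanup: <code>useEffect(() =&gt; watchBreakpoint((q) =&gt; window.matchMedia(q), '(min-width: 768px)', setIsWide), [])</code>. Wrapping <code>window.matchMedia</code> in an arrow function avoids calling it without its <code>window</code> receiver.</p>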
<h2 id="heading-accessibility-with-aria">Accessibility with ARIA</h2>
<p>ARIA (Accessible Rich Internet Applications) is a set of attributes that enhance the accessibility of web content, especially when building custom UI components that cannot be fully implemented using native HTML elements.</p>
<p>ARIA works by providing additional semantic information to assistive technologies such as screen readers. It does this through:</p>
<ul>
<li><p>Roles, which define what an element is (for example, button, dialog, menu)</p>
</li>
<li><p>States and properties, which describe the current condition or behavior of an element (for example, expanded, hidden, live updates)</p>
</li>
</ul>
<p>For example, when you create a custom dropdown using <code>&lt;div&gt;</code> elements, browsers don't inherently understand its purpose. By applying ARIA roles and attributes, you can communicate what the component is and its current state (for example, whether it is expanded), so assistive technologies interpret it correctly.</p>
<p>Just make sure you use ARIA carefully. Incorrect or unnecessary usage can reduce accessibility. Here's a key rule to follow: use native HTML first. Only use ARIA when necessary.</p>
<p>ARIA is especially useful for:</p>
<ul>
<li><p>Custom UI components (modals, tabs, dropdowns)</p>
</li>
<li><p>Dynamic content updates</p>
</li>
<li><p>Complex interactions not covered by standard HTML</p>
</li>
</ul>
<p>Something to note before we get into the examples here: real-world accessibility is complex. For production apps, you should typically prefer well-tested libraries like react-aria, Radix UI, or Headless UI. These examples are primarily for educational purposes and aren't production-ready.</p>
<p><strong>Example: Accessible Modal</strong></p>
<pre><code class="language-javascript">import * as React from 'react';

function Modal({ isOpen, onClose }) {
  const dialogRef = React.useRef(null);

  React.useEffect(() =&gt; {
    if (isOpen) {
      dialogRef.current?.focus();
    }
  }, [isOpen]);

  if (!isOpen) return null;

  return (
    &lt;div
      role="dialog"
      aria-modal="true"
      aria-labelledby="modal-title"
      tabIndex={-1}
      ref={dialogRef}
      onKeyDown={(e) =&gt; {
        if (e.key === 'Escape') onClose();
      }}
    &gt;
      &lt;h2 id="modal-title"&gt;Modal Title&lt;/h2&gt;
      &lt;button type="button" onClick={onClose}&gt;Close&lt;/button&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p><strong>How this works:</strong></p>
<ul>
<li><p><code>role="dialog"</code> identifies the element as a modal dialog</p>
</li>
<li><p><code>aria-modal="true"</code> tells assistive technologies that content outside the dialog is unavailable while it is open (it does not make that content inert on its own)</p>
</li>
<li><p><code>aria-labelledby</code> connects the dialog to its visible title for screen readers</p>
</li>
<li><p><code>tabIndex={-1}</code> allows the dialog container to receive focus programmatically</p>
</li>
<li><p>Focus is moved to the dialog when it opens</p>
</li>
<li><p>Pressing Escape closes the modal, which is a standard accessibility expectation</p>
</li>
</ul>
<p>This ensures that users can understand, navigate, and exit the modal using both keyboard and assistive technologies.</p>
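<p>One gap worth knowing about: this modal moves focus in and closes on Escape, but pressing Tab can still move focus to content behind the dialog. A minimal sketch of keeping focus inside while the dialog is open follows. The names <code>FOCUSABLE_SELECTOR</code>, <code>getWrappedFocusIndex</code>, and <code>handleDialogKeyDown</code> are hypothetical, and the selector is a common approximation that does not filter out hidden or <code>inert</code> elements.</p>
<pre><code class="language-javascript">// A common approximation of "elements that can receive keyboard focus".
// It does not filter out hidden, inert, or contenteditable elements.
const FOCUSABLE_SELECTOR = [
  'a[href]',
  'button:not([disabled])',
  'input:not([disabled])',
  'select:not([disabled])',
  'textarea:not([disabled])',
  '[tabindex]:not([tabindex="-1"])',
].join(', ');

// Index that should receive focus next, wrapping at either end.
// currentIndex is -1 when focus is on the dialog container itself.
function getWrappedFocusIndex(currentIndex, count, backwards) {
  if (currentIndex === -1) {
    return backwards ? count - 1 : 0;
  }
  const step = backwards ? -1 : 1;
  // Adding count before the modulo keeps the result non-negative.
  return (currentIndex + step + count) % count;
}

// onKeyDown handler for the dialog: Escape closes it, and Tab / Shift+Tab
// cycle through the dialog's focusable elements instead of leaving it.
function handleDialogKeyDown(event, dialogElement, onClose) {
  if (event.key === 'Escape') {
    onClose();
    return;
  }
  if (event.key !== 'Tab') {
    return;
  }
  const focusable = Array.from(dialogElement.querySelectorAll(FOCUSABLE_SELECTOR));
  event.preventDefault(); // take over Tab so focus cannot leave the dialog
  if (focusable.length === 0) {
    return;
  }
  const currentIndex = focusable.indexOf(event.target);
  const nextIndex = getWrappedFocusIndex(currentIndex, focusable.length, event.shiftKey);
  focusable[nextIndex].focus();
}
</code></pre>
<p>In the <code>Modal</code> component, the inline handler could become <code>onKeyDown={(e) =&gt; handleDialogKeyDown(e, dialogRef.current, onClose)}</code>. A complete implementation would typically also return focus to the element that opened the dialog once it closes, which is one reason well-tested libraries are preferable in production.</p>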
<h3 id="heading-key-aria-attributes">Key ARIA Attributes</h3>
<h4 id="heading-1-role">1. role</h4>
<p>Defines the type of element and its purpose. For example, <code>role="dialog"</code> tells assistive technologies that the element behaves like a modal dialog.</p>
<h4 id="heading-2-aria-label">2. aria-label</h4>
<p>Provides an accessible name for an element when visible text is not sufficient. Screen readers use this label to describe the element to users.</p>
<h4 id="heading-3-aria-hidden">3. aria-hidden</h4>
<p>Indicates whether an element should be ignored by assistive technologies. For example, <code>aria-hidden="true"</code> hides decorative elements from screen readers.</p>
<h4 id="heading-4-aria-live">4. aria-live</h4>
<p>Used for dynamic content updates. It tells screen readers to announce changes automatically without requiring user interaction (for example, form validation messages or notifications).</p>
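<p>As a sketch of how <code>aria-live</code> is commonly driven from JavaScript, the helper below writes messages into a live region that already exists in the page, such as an empty <code>&lt;div aria-live="polite"&gt;&lt;/div&gt;</code> rendered once when the app loads. Some screen readers ignore a live region that is inserted into the page together with its content, which is why the region comes first. <code>createAnnouncer</code> is a hypothetical name, and clearing the text before setting it is a common workaround so that the same message can be announced twice in a row; exact behavior varies between screen readers.</p>
<pre><code class="language-javascript">// Hypothetical helper. region: an element that already has aria-live set.
// schedule: runs a callback later, e.g. (fn) => setTimeout(fn, 100) in a browser.
function createAnnouncer(region, schedule) {
  return function announce(message) {
    // Clear first so the same message can be announced twice in a row.
    region.textContent = '';
    schedule(() => {
      region.textContent = message;
    });
  };
}
</code></pre>
<p>In a browser you might call <code>createAnnouncer(regionElement, (fn) =&gt; setTimeout(fn, 100))</code>, where the 100 ms delay is an arbitrary assumption, and then call <code>announce('Settings saved')</code> after a successful save.</p>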
<p><strong>Example: Accessible Dropdown (Custom Component)</strong></p>
<pre><code class="language-javascript">function Dropdown({ isOpen, toggle }) {
  return (
    &lt;div&gt;
      &lt;button
        type="button"
        aria-expanded={isOpen}
        aria-controls="dropdown-menu"
        onClick={toggle}
      &gt;
        Menu
      &lt;/button&gt;

      {isOpen &amp;&amp; (
        &lt;ul id="dropdown-menu"&gt;
          &lt;li&gt;
            &lt;button type="button" onClick={() =&gt; console.log('Item 1')}&gt;
              Item 1
            &lt;/button&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;button type="button" onClick={() =&gt; console.log('Item 2')}&gt;
              Item 2
            &lt;/button&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      )}
    &lt;/div&gt;
  );
}
</code></pre>
<p><strong>How this works:</strong></p>
<ul>
<li><p><code>aria-expanded</code> indicates whether the dropdown is open or closed</p>
</li>
<li><p><code>aria-controls</code> links the button to the dropdown content via its id</p>
</li>
<li><p>The <code>&lt;button&gt;</code> element acts as the trigger and is fully keyboard accessible</p>
</li>
<li><p>The <code>&lt;ul&gt;</code> and <code>&lt;li&gt;</code> elements provide a natural list structure</p>
</li>
<li><p>Using <code>&lt;button&gt;</code> elements for the items makes each one focusable and keyboard operable (use <code>&lt;a&gt;</code> instead when an item navigates to another page)</p>
</li>
</ul>
<p>Why this approach is correct:</p>
<ul>
<li><p>It follows standard web patterns instead of application-style menus</p>
</li>
<li><p>It avoids misusing ARIA roles like <code>role="menu"</code>, which require complex keyboard handling</p>
</li>
<li><p>Screen readers can correctly interpret the structure without additional roles</p>
</li>
<li><p>It keeps the implementation simple, accessible, and maintainable</p>
</li>
</ul>
<p>If you need advanced menu behavior (like arrow key navigation), then ARIA menu roles may be appropriate –&nbsp;but only when fully implemented according to the ARIA Authoring Practices.</p>
<p>Note: Most dropdowns in web applications are not true "menus" in the ARIA sense. Avoid using role="menu" unless you are implementing full keyboard navigation (arrow keys, focus management, and so on).</p>
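<p>A common convenience for a disclosure-style dropdown like this one is letting Escape close the list and move focus back to the trigger button, so keyboard users are not left on an item that disappears. The handler below is a minimal sketch; <code>handleDropdownKeyDown</code> and its options object are hypothetical, and <code>triggerElement</code> would come from a ref on the <code>Menu</code> button.</p>
<pre><code class="language-javascript">// Hypothetical onKeyDown handler for the element that wraps the dropdown.
// Returns true when it handled the key, false otherwise.
function handleDropdownKeyDown(event, { isOpen, close, triggerElement }) {
  if (event.key !== 'Escape' || !isOpen) {
    return false; // let every other key behave normally
  }
  event.preventDefault();
  close();
  // Without this, focus is lost when the focused item is removed from the DOM.
  triggerElement.focus();
  return true;
}
</code></pre>
<p>You could attach it to the wrapping <code>&lt;div&gt;</code> with <code>onKeyDown={(e) =&gt; handleDropdownKeyDown(e, { isOpen, close: toggle, triggerElement: buttonRef.current })}</code>. Passing <code>toggle</code> as <code>close</code> works here because the handler only calls it while the list is open.</p>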
<h2 id="heading-keyboard-navigation">Keyboard Navigation</h2>
<p>Keyboard navigation ensures that users can fully interact with your application using only a keyboard, without relying on a mouse. This is essential for users with motor disabilities, but it also benefits power users and developers who prefer keyboard-based workflows.</p>
<p>In a well-designed interface, users should be able to:</p>
<ul>
<li><p>Navigate through interactive elements using the Tab key</p>
</li>
<li><p>Activate buttons and links using Enter or Space</p>
</li>
<li><p>Clearly see which element is currently focused</p>
</li>
</ul>
<p>In the example below, we’ll look at common mistakes in keyboard handling and why relying on native HTML elements is usually the better approach.</p>
<p><strong>Example:</strong></p>
<p>Avoid adding custom keyboard handlers to native elements like <code>&lt;button&gt;</code>, as they already support keyboard interaction by default.</p>
<p>For example, this is all you need:</p>
<pre><code class="language-html">&lt;button type="button" onClick={handleClick}&gt;Submit&lt;/button&gt;
</code></pre>
<p>Why this works:</p>
<ul>
<li><p>Supports both Enter and Space key activation by default</p>
</li>
<li><p>Is focusable and participates in natural tab order</p>
</li>
<li><p>Provides built-in accessibility roles and screen reader announcements</p>
</li>
<li><p>Reduces the need for additional logic or ARIA attributes</p>
</li>
</ul>
<p>Adding custom keyboard handlers (like onKeyDown) to native elements is unnecessary and can introduce bugs or inconsistent behavior. Always prefer native HTML elements for interactivity whenever possible.</p>
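<p>To see what the native element saves you, here is a sketch of the activation logic you would have to re-implement for a custom element such as <code>&lt;div role="button"&gt;</code>. It mirrors how browsers activate a native button: Enter on key down, Space on key up. The helper name <code>shouldActivateCustomButton</code> is hypothetical:</p>
<pre><code class="language-javascript">// Hypothetical helper: decides whether a key event should activate a custom
// element with role="button". A native button element already does this for you.
function shouldActivateCustomButton(event) {
  // Enter activates as soon as the key goes down.
  if (event.type === 'keydown') return event.key === 'Enter';
  // Space activates when the key is released.
  if (event.type === 'keyup') return event.key === ' ';
  return false;
}
</code></pre>
<p>That still isn't enough. The custom element would also need <code>tabIndex={0}</code> to be focusable, and a <code>preventDefault()</code> call on Space key down to stop the page from scrolling. A native <code>&lt;button&gt;</code> handles all of this for you.</p>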
<h3 id="heading-avoiding-common-keyboard-traps">Avoiding Common Keyboard Traps</h3>
<p>One of the most common keyboard accessibility issues is trapping users inside interactive components, such as modals or custom dropdowns. This happens when focus moves into a component but can't leave it using Tab, Shift+Tab, or other keyboard controls. Keyboard users become stuck and can't reach other parts of the page. (A modal that deliberately keeps Tab focus inside it is different, as long as Escape or a Close button lets users leave.)</p>
<p>In the example below, you'll see a simple modal that tries to set focus, but doesn’t manage Tab behavior properly.</p>
<pre><code class="language-javascript">function Modal({ isOpen }) {
  const ref = React.useRef();

  React.useEffect(() =&gt; {
    if (isOpen) ref.current?.focus();
  }, [isOpen]);

  return (
    &lt;div role="dialog"&gt;
      &lt;button type="button" ref={ref}&gt;Close&lt;/button&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p>What this code shows:</p>
<ul>
<li><p>When the modal opens, focus is moved to the Close button using <code>ref.current?.focus()</code></p>
</li>
<li><p>The modal uses <code>role="dialog"</code> to communicate its purpose</p>
</li>
</ul>
<p>There are some issues with this code that you should be aware of. First, if the modal contains more focusable elements, pressing Tab can move focus out of the modal and into the page behind it.</p>
<p>Second, nothing returns focus to the triggering element when the modal closes, so keyboard users lose their place on the page.</p>
<p>Third, there's no handling of Shift+Tab and no focus cycling.</p>
<p>This demonstrates <strong>partial focus management</strong>, but it’s not fully accessible yet.</p>
<p>To improve focus management, you can trap focus within the modal by ensuring that Tab and Shift+Tab cycle only through elements inside the modal.</p>
<p>You can also return focus to the trigger: when the modal closes, return focus to the element that opened it.</p>
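<p>The Tab cycling described above can be sketched in plain JavaScript. This assumes the modal container is available as a DOM element, for example through a ref. <code>getNextFocusIndex</code> and <code>handleTrapKeyDown</code> are hypothetical helpers written for this illustration, and the selector covers only common focusable elements. Treat it as a sketch rather than a complete focus trap:</p>
<pre><code class="language-javascript">// Hypothetical helper (not from a library): given the index of the focused
// element, the number of focusable elements, and whether Shift is held,
// return the index that should receive focus next, wrapping at both ends.
function getNextFocusIndex(currentIndex, count, shiftKey) {
  // Nothing focusable inside the modal: tell the caller to focus the dialog itself.
  if (count === 0) return -1;
  // Focus is not on a listed element (for example, it is on the dialog container).
  if (currentIndex === -1) return shiftKey ? count - 1 : 0;
  const step = shiftKey ? -1 : 1;
  // Adding count before % keeps the result non-negative when stepping back from 0.
  return (currentIndex + step + count) % count;
}

// Browser-only sketch of a keydown handler for the modal container.
function handleTrapKeyDown(event, container) {
  if (event.key !== 'Tab') return;
  const focusable = Array.from(container.querySelectorAll(
    'a[href], button:not([disabled]), input:not([disabled]), ' +
      'select:not([disabled]), textarea:not([disabled]), [tabindex]:not([tabindex="-1"])'
  ));
  const next = getNextFocusIndex(
    focusable.indexOf(document.activeElement),
    focusable.length,
    event.shiftKey
  );
  // Take over Tab so focus can't leave the modal.
  event.preventDefault();
  if (next === -1) container.focus();
  else focusable[next].focus();
}
</code></pre>
<p>In React, you could attach it to the dialog with something like <code>onKeyDown={(e) =&gt; handleTrapKeyDown(e, modalRef.current)}</code>. Libraries such as focus-trap-react cover edge cases this sketch ignores, such as elements that are hidden.</p>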
<p><strong>Example improvement (conceptual):</strong></p>
<pre><code class="language-javascript">function Modal({ isOpen, onClose, triggerRef }) {
  const modalRef = React.useRef();
  // Remembers whether the modal was open, so focus only returns after closing
  const wasOpen = React.useRef(false);

  React.useEffect(() =&gt; {
    if (isOpen) {
      modalRef.current?.focus();
      // Add focus trap logic here
    } else if (wasOpen.current) {
      triggerRef.current?.focus();
    }
    wasOpen.current = isOpen;
  }, [isOpen, triggerRef]);

  return (
    &lt;div role="dialog" ref={modalRef} tabIndex={-1}&gt;
      &lt;button type="button" onClick={onClose}&gt;Close&lt;/button&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p>Remember that this modal is not fully accessible without focus trapping. In production, use a library like <code>focus-trap-react</code>, <code>react-aria</code>, or Radix UI.</p>
<p><strong>Key points:</strong></p>
<ul>
<li><p><code>tabIndex={-1}</code> allows the div to receive programmatic focus</p>
</li>
<li><p>Focus trap ensures users cannot tab out unintentionally</p>
</li>
<li><p>Returning focus preserves context, so users can continue where they left off</p>
</li>
</ul>
<p><strong>Best practices:</strong></p>
<ul>
<li><p>Always move focus into modals</p>
</li>
<li><p>Return focus to the trigger element when closed</p>
</li>
<li><p>Ensure Tab and Shift+Tab cycle through the modal's elements while it's open</p>
</li>
</ul>
<p>As a general rule, always prefer native HTML elements for interactivity. Only implement custom keyboard handling when building advanced components that cannot be achieved with standard elements.</p>
<h2 id="heading-focus-management">Focus Management</h2>
<p>Focus management is the practice of controlling where keyboard focus goes when users interact with components such as modals, forms, or interactive widgets. Proper focus management ensures that:</p>
<ul>
<li><p>Users relying on keyboards or assistive technologies can navigate seamlessly</p>
</li>
<li><p>Focus does not get lost or trapped in unexpected places</p>
</li>
<li><p>Users maintain context when content updates dynamically</p>
</li>
</ul>
<p>The example below shows a common approach that only partially handles focus:</p>
<p><strong>Bad Example:</strong></p>
<pre><code class="language-javascript">// Bad Example: Automatically focusing input without context
function NameField() {
  const ref = React.useRef();

  React.useEffect(() =&gt; {
    ref.current?.focus();
  }, []);

  return &lt;input ref={ref} placeholder="Name" /&gt;;
}
</code></pre>
<p>In the above code, the input receives focus as soon as the component mounts, but there’s no handling for returning focus when the user navigates away.</p>
<p>If this input is inside a modal or dynamic content, users may get lost or trapped. The input also has no label, only a placeholder, so assistive technologies get little context about where focus has landed.</p>
<p>This is a minimal solution that can cause confusion in real applications.</p>
<p><strong>Improved Example:</strong></p>
<pre><code class="language-javascript">// Improved Example: Managing focus in a modal context
function Modal({ isOpen, onClose, triggerRef }) {
  const dialogRef = React.useRef();
  // Tracks whether the modal was open, so the trigger isn't focused on first render
  const wasOpen = React.useRef(false);

  React.useEffect(() =&gt; {
    if (isOpen) {
      dialogRef.current?.focus();
    } else if (wasOpen.current) {
      triggerRef?.current?.focus();
    }
    wasOpen.current = isOpen;
  }, [isOpen, triggerRef]);

  React.useEffect(() =&gt; {
    function handleKeyDown(e) {
      if (e.key === 'Escape') {
        onClose();
      }
    }

    if (isOpen) {
      document.addEventListener('keydown', handleKeyDown);
    }

    return () =&gt; {
      document.removeEventListener('keydown', handleKeyDown);
    };
  }, [isOpen, onClose]);

  if (!isOpen) return null;

  return (
    &lt;div
      role="dialog"
      aria-modal="true"
      aria-labelledby="modal-title"
      tabIndex={-1}
      ref={dialogRef}
    &gt;
      &lt;h2 id="modal-title"&gt;Modal Title&lt;/h2&gt;
      &lt;button type="button" onClick={onClose}&gt;Close&lt;/button&gt;
      &lt;input type="text" placeholder="Name" /&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p><strong>Explanation:</strong></p>
<ul>
<li><p><code>tabIndex={-1}</code> enables the dialog container to receive focus</p>
</li>
<li><p>Focus is moved to the modal when it opens, ensuring keyboard users start in the correct context</p>
</li>
<li><p>Focus is returned to the trigger element when the modal closes, preserving user flow</p>
</li>
<li><p><code>aria-labelledby</code> provides an accessible name for the dialog</p>
</li>
<li><p>Escape key handling allows users to close the modal without a mouse</p>
</li>
</ul>
<p>Note: For full accessibility, you should also implement focus trapping so users cannot tab outside the modal while it is open.</p>
<p>Tip: In production applications, use libraries like react-aria, focus-trap-react, or Radix UI to handle focus trapping and accessibility edge cases reliably.</p>
<p>Also keep in mind that a document-level keydown listener is global. It runs for key presses anywhere on the page and can conflict with other components that also listen for Escape.</p>
<pre><code class="language-javascript">document.addEventListener('keydown', handleKeyDown);
</code></pre>
<p>A safer alternative is to scope it to the modal:</p>
<pre><code class="language-javascript">&lt;div
  role="dialog"
  tabIndex={-1}
  ref={dialogRef}
  onKeyDown={(e) =&gt; {
    if (e.key === 'Escape') onClose();
  }}
&gt;
  {/* dialog content */}
&lt;/div&gt;
</code></pre>
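<p>For simple cases, attach <code>onKeyDown</code> to the dialog instead of the document. This works because key events from elements inside the dialog bubble up to it, as long as focus is inside the dialog.</p>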
<h4 id="heading-best-practice">Best Practice:</h4>
<p>For complex components, use libraries like <code>focus-trap-react</code> or <code>react-aria</code> to manage focus reliably, especially for modals, dropdowns, and popovers.</p>
<h2 id="heading-forms-and-accessibility">Forms and Accessibility</h2>
<p>Forms are critical points of interaction in web applications, and proper accessibility ensures that all users – including those using screen readers or other assistive technologies – can understand and interact with them effectively.</p>
<p>Proper labeling means that every input field, checkbox, radio button, or select element has an associated label that clearly describes its purpose. This allows screen readers to announce the input meaningfully and helps keyboard-only users understand what information is expected.</p>
<p>In addition to labeling, form accessibility includes:</p>
<ul>
<li><p>Providing clear error messages when input is invalid</p>
</li>
<li><p>Ensuring error messages are announced to assistive technologies</p>
</li>
<li><p>Maintaining logical focus order so users can navigate inputs easily</p>
</li>
</ul>
<p><strong>Bad Example:</strong></p>
<pre><code class="language-html">&lt;input type="text" placeholder="Name" /&gt;
</code></pre>
<p>Why this isn't good:</p>
<ul>
<li><p>This input relies only on a placeholder for context</p>
</li>
<li><p>Screen readers may not announce the purpose of the field clearly</p>
</li>
<li><p>Once a user starts typing, the placeholder disappears, leaving no guidance</p>
</li>
<li><p>Keyboard-only users may not have enough context to know what to enter</p>
</li>
</ul>
<p><strong>Good Example:</strong></p>
<pre><code class="language-html">&lt;label htmlFor="name"&gt;Name&lt;/label&gt;
&lt;input id="name" type="text" /&gt;
</code></pre>
<p>Why this is better:</p>
<ul>
<li><p>The <code>&lt;label&gt;</code> is explicitly associated with the input via <code>htmlFor / id</code></p>
</li>
<li><p>Screen readers announce "Name" before the input, providing clear context</p>
</li>
<li><p>Users navigating with Tab understand the field’s purpose</p>
</li>
<li><p>The label persists even when the user types, unlike a placeholder</p>
</li>
</ul>
<p><strong>Error Handling:</strong></p>
<pre><code class="language-html">&lt;label htmlFor="name"&gt;Name&lt;/label&gt;
&lt;input
  id="name"
  type="text"
  aria-describedby="name-error"
  aria-invalid="true"
/&gt;

&lt;p id="name-error" role="alert"&gt;
  Name is required
&lt;/p&gt;
</code></pre>
<p><strong>Explanation</strong></p>
<ul>
<li><p><code>aria-describedby</code> links the input to the error message using the element’s id</p>
</li>
<li><p>Screen readers announce the error message when the input is focused</p>
</li>
<li><p><code>aria-invalid="true"</code> indicates that the field currently contains an error</p>
</li>
<li><p><code>role="alert"</code> makes screen readers announce the error message as soon as it's added to the page or its text changes</p>
</li>
</ul>
<p>This creates a clear relationship between the input and its validation message, improving usability for screen reader users.</p>
<p>Tip: Only apply aria-invalid and error messages when validation fails. Avoid marking fields as invalid before user interaction.</p>
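<p>One way to follow this tip in React is to derive the ARIA attributes from the field's validation state. The sketch below uses a hypothetical <code>validateName</code> rule and a <code>touched</code> flag that you would set once the user has interacted with the field (for example, on blur). Neither comes from a specific library:</p>
<pre><code class="language-javascript">// Hypothetical validation rule for the Name field.
function validateName(value) {
  return value.trim() === '' ? 'Name is required' : '';
}

// Hypothetical helper: builds ARIA props so the field is only marked
// invalid after the user has interacted with it and validation fails.
function getFieldA11yProps(id, errorMessage, touched) {
  const showError = touched ? errorMessage !== '' : false;
  const errorId = id + '-error';
  return {
    input: {
      id: id,
      'aria-invalid': showError ? 'true' : undefined,
      'aria-describedby': showError ? errorId : undefined,
    },
    // null means "don't render the error paragraph"
    error: showError ? { id: errorId, role: 'alert', text: errorMessage } : null,
  };
}
</code></pre>
<p>You could then spread <code>props.input</code> onto the <code>&lt;input&gt;</code> and render the <code>role="alert"</code> paragraph only when <code>props.error</code> is not <code>null</code>. React omits attributes whose value is <code>undefined</code>, so a valid field carries no <code>aria-invalid</code> or <code>aria-describedby</code> at all.</p>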
<h2 id="heading-responsive-typography-and-images">Responsive Typography and Images</h2>
<p>Responsive typography and images ensure that your content remains readable and visually appealing across a wide range of devices, from small smartphones to large desktop monitors.</p>
<p>This matters because text should scale so it stays legible on every screen, and images should adapt to their container to avoid layout issues or overflow. Both contribute to a better user experience and better accessibility.</p>
<p>In this section, we’ll cover practical ways to implement responsive typography and images in React and CSS.</p>
<pre><code class="language-css">h1 {
  font-size: clamp(1.5rem, 2vw, 3rem);
}
</code></pre>
<p>In this code, the <code>clamp()</code> function lets text scale fluidly:</p>
<ul>
<li><p>The first value (<code>1.5rem</code>) is the minimum font size</p>
</li>
<li><p>The second value (<code>2vw</code>) is the preferred size, based on the viewport width</p>
</li>
<li><p>The third value (<code>3rem</code>) is the maximum font size</p>
</li>
</ul>
<p>This keeps headings readable on small screens without letting them grow too large on desktops.</p>
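<p>To see what <code>clamp(1.5rem, 2vw, 3rem)</code> actually resolves to, here is a small worked example. It assumes the default 16px root font size, so 1.5rem is 24px and 3rem is 48px. It also uses the definition of <code>clamp(MIN, VAL, MAX)</code> as <code>max(MIN, min(VAL, MAX))</code>. The function name is only for illustration:</p>
<pre><code class="language-javascript">// Worked example: the computed px size of font-size: clamp(1.5rem, 2vw, 3rem).
// Assumes a 16px root font size unless another value is passed.
function resolveClampPx(viewportWidthPx, rootFontPx) {
  const root = rootFontPx === undefined ? 16 : rootFontPx;
  const min = 1.5 * root;                   // 1.5rem
  const preferred = 0.02 * viewportWidthPx; // 2vw is 2% of the viewport width
  const max = 3 * root;                     // 3rem
  // clamp(MIN, VAL, MAX) is defined as max(MIN, min(VAL, MAX))
  return Math.max(min, Math.min(preferred, max));
}
</code></pre>
<p>A 375px phone gets the 24px minimum, a 1920px screen gets 38.4px, and a 2560px screen is capped at 48px. Because 2vw only exceeds 24px on viewports wider than 1200px (24 / 0.02), phones and most tablets stay at the minimum size. If you want the size to start growing earlier, one common option is to add a rem term to the preferred value, such as <code>clamp(1.5rem, 1rem + 2vw, 3rem)</code>.</p>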
<p>Alternatively, you can use media queries to adjust font sizes at specific breakpoints.</p>
<p><strong>Responsive Images:</strong></p>
<pre><code class="language-html">&lt;img src="image.jpg" alt="Description" loading="lazy" /&gt;
</code></pre>
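<p>This image already has alt text and uses <code>loading="lazy"</code> to defer loading until it's near the viewport. To make images responsive, so they adapt to different screen sizes and resolutions without causing layout issues or slow loading times, use these key techniques:</p>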
<h3 id="heading-1-fluid-images-using-css">1. Fluid images using CSS:</h3>
<pre><code class="language-css">img {
  max-width: 100%;
  height: auto;
}
</code></pre>
<p>This keeps images from overflowing their container and preserves their aspect ratio automatically.</p>
<h3 id="heading-2-using-srcset-for-multiple-resolutions">2. Using <code>srcset</code> for multiple resolutions:</h3>
<pre><code class="language-html">&lt;img src="image-small.jpg"
     srcset="image-small.jpg 480w,
             image-medium.jpg 1024w,
             image-large.jpg 1920w"
     sizes="(max-width: 600px) 480px,
            (max-width: 1200px) 1024px,
            1920px"
     alt="Description"&gt;
</code></pre>
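<p>This lets the browser choose an appropriately sized file for the screen size and resolution, which reduces loading times and improves performance on smaller devices. The <code>sizes</code> attribute tells the browser how wide the image will be displayed, so it can pick a file before it has laid out the page.</p>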
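<p>It can help to see how the browser reads these attributes. According to the HTML specification, each <code>w</code> descriptor is converted into a pixel density by dividing it by the slot width taken from <code>sizes</code>. The browser then picks one candidate in its own, implementation-defined way. The sketch below models only the first step, and <code>candidateDensities</code> is a hypothetical helper:</p>
<pre><code class="language-javascript">// Simplified model of one step the HTML spec describes for srcset with
// w descriptors: each candidate's width divided by the slot width from sizes
// gives the pixel density that candidate provides.
function candidateDensities(candidates, slotWidthPx) {
  return candidates.map(function (candidate) {
    return { url: candidate.url, density: candidate.width / slotWidthPx };
  });
}

// The candidates from the srcset example above.
const candidates = [
  { url: 'image-small.jpg', width: 480 },
  { url: 'image-medium.jpg', width: 1024 },
  { url: 'image-large.jpg', width: 1920 },
];

// Viewport 600px wide or narrower: sizes resolves to a 480px slot.
console.log(candidateDensities(candidates, 480));
</code></pre>
<p>With a 480px slot, the densities come out at 1x, about 2.13x, and 4x. A phone with a 2x display would therefore most likely download <code>image-medium.jpg</code> rather than <code>image-small.jpg</code>, because the small file only covers 1x.</p>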
<h3 id="heading-3-always-include-descriptive-alt-text">3. Always include descriptive alt text</h3>
<p>This is critical for screen readers and accessibility. It also helps users understand the image if it cannot be loaded.</p>
<p>Tip: Combine responsive typography, images, and flexible layout containers (like CSS Grid or Flexbox) to create interfaces that scale gracefully across all devices and maintain accessibility.</p>
<h3 id="heading-4-ensure-sufficient-color-contrast">4. Ensure Sufficient Color Contrast</h3>
<p>Low contrast text can make content unreadable for many users.</p>
<pre><code class="language-css">.bad-text {
  color: #aaa;
}

.good-text {
  color: #222;
}
</code></pre>
<p>Use tools like the WebAIM Contrast Checker and the Chrome DevTools Accessibility panel to check your color contrast. WCAG AA requires a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text.</p>
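<p>The contrast ratio itself comes from a simple WCAG formula. You convert each color to its relative luminance, then compute (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color. The sketch below implements the WCAG 2.1 definition for hex colors. The function names are illustrative and don't come from a library:</p>
<pre><code class="language-javascript">// Relative luminance of a hex color such as "#aaa" or "#222222",
// following the WCAG 2.1 definition.
function relativeLuminance(hex) {
  const value = hex.replace('#', '');
  // Expand shorthand like "aaa" to "aaaaaa".
  const full = value.length === 3
    ? value.split('').map(function (ch) { return ch + ch; }).join('')
    : value;
  const [r, g, b] = [0, 2, 4].map(function (i) {
    const c = parseInt(full.slice(i, i + 2), 16) / 255;
    // Undo sRGB gamma encoding to get linear light.
    return c > 0.03928 ? Math.pow((c + 0.055) / 1.055, 2.4) : c / 12.92;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1:1 up to 21:1.
function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA: at least 4.5:1 for normal text, 3:1 for large text.
function meetsAA(ratio, isLargeText) {
  return ratio >= (isLargeText ? 3 : 4.5);
}

console.log(contrastRatio('#aaa', '#fff').toFixed(2));
console.log(contrastRatio('#222', '#fff').toFixed(2));
</code></pre>
<p>With this formula, <code>#aaa</code> on a white background comes out at roughly 2.3:1, which fails AA for normal text. <code>#222</code> on white is roughly 15.9:1.</p>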
<h2 id="heading-building-a-fully-accessible-responsive-component-end-to-end-example">Building a Fully Accessible Responsive Component (End-to-End Example)</h2>
<p>To understand how responsiveness and accessibility work together in practice, let’s build a reusable accessible card component that adapts to screen size and supports keyboard and screen reader users.</p>
<h3 id="heading-step-1-component-structure-semantic-html">Step 1: Component Structure (Semantic HTML)</h3>
<pre><code class="language-javascript">function ProductCard({ title, description, onAction }) {
  return (
    &lt;article className="card"&gt;
      &lt;h3&gt;{title}&lt;/h3&gt;
      &lt;p&gt;{description}&lt;/p&gt;
      &lt;button type="button" onClick={onAction}&gt;
        View Details
      &lt;/button&gt;
    &lt;/article&gt;
  );
}
</code></pre>
<p><strong>Why This Works</strong></p>
<ul>
<li><p><code>&lt;article&gt;</code> provides semantic meaning for standalone content</p>
</li>
<li><p><code>&lt;h3&gt;</code> establishes a proper heading hierarchy</p>
</li>
<li><p><code>&lt;button&gt;</code> ensures built-in keyboard and accessibility support</p>
</li>
</ul>
<h3 id="heading-step-2-responsive-styling">Step 2: Responsive Styling</h3>
<pre><code class="language-css">.card {
  padding: 16px;
  border: 1px solid #ddd;
  border-radius: 8px;
}

@media (min-width: 768px) {
  .card {
    padding: 24px;
  }
}
</code></pre>
<p>This ensures comfortable spacing on mobile and improved readability on larger screens.</p>
<h3 id="heading-step-3-accessibility-enhancements">Step 3: Accessibility Enhancements</h3>
<pre><code class="language-html">&lt;button type="button" onClick={onAction}&gt;
  View Details
&lt;/button&gt;
</code></pre>
<p>The visible button text provides a clear and accessible label, so no additional ARIA attributes are needed.</p>
<h3 id="heading-step-4-keyboard-focus-styling">Step 4: Keyboard Focus Styling</h3>
<pre><code class="language-css">button:focus {
  outline: 2px solid blue;
  outline-offset: 2px;
}
</code></pre>
<p>Focus indicators are essential for keyboard users.</p>
<h3 id="heading-step-5-using-the-component">Step 5: Using the Component</h3>
<pre><code class="language-javascript">function App() {
  return (
    &lt;div className="grid"&gt;
      &lt;ProductCard
        title="Product 1"
        description="Accessible and responsive"
        onAction={() =&gt; alert('Clicked')}
      /&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p><strong>Key Takeaways</strong></p>
<p>This simple component demonstrates:</p>
<ul>
<li><p>Semantic HTML structure</p>
</li>
<li><p>Responsive design</p>
</li>
<li><p>Built-in accessibility via native elements</p>
</li>
<li><p>Minimal ARIA usage</p>
</li>
</ul>
<p>In real-world applications, this pattern scales into entire design systems.</p>
<h2 id="heading-testing-accessibility">Testing Accessibility</h2>
<p>Accessibility should be validated continuously, not just at the end of development. There are various automated tools you can use to help you with this process:</p>
<ul>
<li><p>Lighthouse (built into Chrome DevTools)</p>
</li>
<li><p>axe DevTools for detailed audits</p>
</li>
<li><p>ESLint plugins for accessibility rules</p>
</li>
</ul>
<h3 id="heading-manual-testing">Manual Testing</h3>
<p>Automated tools can't catch everything, though. Manual testing is essential. Make sure users can navigate using only the keyboard, and try the app with a screen reader such as NVDA or VoiceOver. You should also test zoom levels up to 200% and check color contrast manually.</p>
<p><strong>Example: ESLint Accessibility Plugin</strong></p>
<pre><code class="language-shell">npm install eslint-plugin-jsx-a11y --save-dev
</code></pre>
<p>This helps catch accessibility issues during development.</p>
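<p>After installing the plugin, you still need to enable it in your ESLint configuration. A minimal setup could look like the following. It assumes ESLint 9 with a flat config file and eslint-plugin-jsx-a11y 6.9 or later, which ships a <code>flatConfigs.recommended</code> preset:</p>
<pre><code class="language-javascript">// eslint.config.mjs (ESLint flat config)
import jsxA11y from 'eslint-plugin-jsx-a11y';

export default [
  {
    // Include .jsx files; ESLint 9 only picks up .js, .mjs, and .cjs by default
    files: ['**/*.{js,jsx}'],
    ...jsxA11y.flatConfigs.recommended,
  },
];
</code></pre>
<p>Running <code>npx eslint .</code> then reports issues such as images without alt text.</p>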
<h2 id="heading-best-practices">Best Practices</h2>
<ul>
<li><p>Use semantic HTML first</p>
</li>
<li><p>Avoid unnecessary ARIA</p>
</li>
<li><p>Test keyboard navigation</p>
</li>
<li><p>Design mobile-first</p>
</li>
<li><p>Ensure color contrast</p>
</li>
<li><p>Use consistent spacing</p>
</li>
</ul>
<h2 id="heading-when-not-to-overuse-accessibility-features">When NOT to Overuse Accessibility Features</h2>
<ul>
<li><p>Avoid adding ARIA when native HTML works</p>
</li>
<li><p>Do not override browser defaults unnecessarily</p>
</li>
<li><p>Avoid complex custom components without accessibility support</p>
</li>
</ul>
<h2 id="heading-future-enhancements">Future Enhancements</h2>
<ul>
<li><p>Design systems with accessibility built-in</p>
</li>
<li><p>Automated accessibility testing in CI/CD</p>
</li>
<li><p>Advanced focus management libraries</p>
</li>
<li><p>Accessibility-first component libraries</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Building responsive and accessible React applications is not a one-time effort—it is a continuous design and engineering practice. Instead of treating accessibility as a checklist, developers should integrate it into the core of their component design process.</p>
<p>If you are starting out, focus on using semantic HTML and mobile-first layouts. These two practices alone solve a large percentage of accessibility and responsiveness issues. As your application grows, introduce ARIA enhancements, keyboard navigation, and automated accessibility testing.</p>
<p>The key is to build interfaces that work for everyone by default. When responsiveness and accessibility are treated as first-class concerns, your React applications become more usable, scalable, and future-proof.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a Table of Contents for Your Article ]]>
                </title>
                <description>
                    <![CDATA[ When you create an article, such as a blog post for freeCodeCamp, Hashnode, Medium, or DEV.to, you can help guide the reader by creating a Table of Contents (ToC). In this article, I'll explain how to ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-create-a-table-of-contents-for-your-article/</link>
                <guid isPermaLink="false">69b27bc5f22e712aaa45f840</guid>
                
                    <category>
                        <![CDATA[ blog ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ devtools ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Jakub T. Jankiewicz ]]>
                </dc:creator>
                <pubDate>Thu, 12 Mar 2026 08:39:33 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/ff72c490-a57b-46c4-b0d9-8c2654853b7c.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When you create an article, such as a blog post for freeCodeCamp, Hashnode, Medium, or DEV.to, you can help guide the reader by creating a <a href="https://en.wikipedia.org/wiki/Table_of_contents">Table of Contents</a> (ToC). In this article, I'll explain how to create one with the help of JavaScript and browser DevTools. I'll use Google Chrome DevTools, but the same steps work in any modern browser.</p>
<p>The process in this article needs to be done once per platform. Once you have the code, you can apply it every time to create a ToC. Note that if the platform changes something, you may need to adjust the script.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-browser-dev-tools">Browser Dev Tools</a></p>
</li>
<li><p><a href="#heading-javascript-console">JavaScript Console</a></p>
</li>
<li><p><a href="#heading-understanding-the-dom-structure">Understanding the DOM Structure</a></p>
</li>
<li><p><a href="#heading-how-to-create-the-toc-in-markdown">How to Create the ToC in Markdown</a></p>
</li>
<li><p><a href="#heading-how-to-create-an-html-toc">How to create an HTML TOC?</a></p>
</li>
<li><p><a href="#heading-copy-the-html-code-for-the-editor">Copy the HTML code for the editor</a></p>
</li>
<li><p><a href="#heading-what-to-do-if-i-dont-have-headers">What to do if I don’t have headers?</a></p>
<ul>
<li><a href="#heading-create-table-of-contents-for-devto">Create Table of Contents for DEV.to</a></li>
</ul>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-browser-dev-tools">Browser Dev Tools</h2>
<p>DevTools is a set of developer tools built into the browser that lets you inspect and manipulate the DOM (<a href="https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model">Document Object Model</a>), the tree representation of the HTML that the browser keeps in memory. It also gives you access to the JavaScript console, where you can run short code snippets to test something. DevTools has many more features, but we'll only use these two.</p>
<p>To open Dev Tools (in Google Chrome), you can press F12 or right-click on the page with your mouse and click Inspect.</p>
<div>
<div>⚠</div>
<div>In Safari, the browser DevTools are disabled by default. To enable them, read: <a target="_self" rel="noopener" href="https://support.apple.com/guide/safari/use-the-developer-tools-in-the-develop-menu-sfri20948/mac">Use the developer tools in the Develop menu in Safari on Mac</a>.</div>
</div>

<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763748137160/7e4df24f-6d25-4d43-a67b-c671bd85789a.png" alt="A browser window split in half. On the left is an illustration of a laptop with a freeCodeCamp article; on the right are the browser DevTools with the DOM tree and CSS panel." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Above is the screenshot of DevTools with a preview of this article. On the right, you can see a selected <code>h1</code> HTML tag (the title) and CSS applied to that tag. The tree structure you see is the DOM.</p>
<div>
<div>💡</div>
<div>When creating a ToC for <strong>freeCodeCamp</strong>, you should open the preview in a new tab.</div>
</div>

<h2 id="heading-javascript-console">JavaScript Console</h2>
<p>We will need access to the JavaScript console. To open DevTools in Google Chrome, press F12, right-click the page and select Inspect from the context menu, or use the shortcut CTRL+SHIFT+C (Windows, Linux) or CMD+OPTION+C (Mac).</p>
<p>In Chrome DevTools, you can pick the Console tab at the top of DevTools, but this hides the DOM tree. It’s better to open the console drawer at the bottom: click the three dots in the top-right corner and pick “Show console drawer”, or press Esc.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763749509540/7968ace9-624e-4037-b09a-fe298ba9b865.png" alt="Screenshot of a menu that allows docking the DevTools to the right, left, bottom, or in a standalone window." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>The Dev Tools will look like this:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763749614077/640a2467-ac85-4788-9836-3431a1c503bb.png" alt="Screenshot of Browser DevTools showing DOM Tree, CSS panel, and Console Drawer." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<div>
<div>💡</div>
<div>You can ignore any errors or warnings in the console. You can click this icon 🚫 on the left side of the drawer, and it will clear the console.</div>
</div>

<p>The console is a so-called <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">Read-Eval-Print-Loop</a>: a classic interface where you type commands (here, JavaScript code), and when you press Enter, the code runs in the context of the page that DevTools is inspecting.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763749997534/b61f8cc0-62eb-4586-9898-d41a7519cf3e.png" alt="Screenshot showing a browser alert popup and the JavaScript code in the DevTools console that opens the alert." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Above, you can see a page alert executed from the console.</p>
<h2 id="heading-understanding-the-dom-structure">Understanding the DOM Structure</h2>
<p>The first step to create a ToC is to inspect the DOM and find the headers. They are usually <strong>H1…H6</strong> tags. H1 is often the title of the page. In an ideal world, it would always be.</p>
<p>In my case, the header looks like this:</p>
<pre><code class="language-xml">&lt;h2 id="heading-dev-tools"&gt;Dev Tools&lt;/h2&gt;
</code></pre>
<p>The article only has H2 tags, but later in the article, I will also explain how to create a nested ToC.</p>
<div>
<div>💡</div>
<div>Your headers need an “id” attribute. The structure can differ (for example, the id may be on a different element), but the id has to be in the DOM. Later in the article, I will explain a few different structures and how to handle them.</div>
</div>

<p>Now with DevTools, we can write code that will find every header:</p>
<pre><code class="language-javascript">document.querySelectorAll('h2[id], h3[id], h4[id]');
</code></pre>
<p>In the case of my article on freeCodeCamp, it returned this output:</p>
<pre><code class="language-plaintext">NodeList(5)&nbsp;[h2#heading-dev-tools, h2#heading-javascript-console, h2#heading-understanding-the-dom-structure, h2#trending-guides.col-header, h2#mobile-app.col-header]
</code></pre>
<p>There are two problems with this result. First, it’s a NodeList, which we need to convert to an Array. Second, besides the article’s headers, it also matched two headers that belong to the website layout rather than the main content. So we need to find the element that contains only the headers we need.</p>
<p>You can right-click the white area that contains the article and pick <strong>Inspect</strong>. In our case, the article sits inside a <code>&lt;main&gt;</code> element, so we can rewrite our selector as:</p>
<pre><code class="language-javascript">document.querySelectorAll('main h2[id], main h3[id], main h4[id]');
</code></pre>
<p>And now it returns our headers and nothing more.</p>
<div>
<div>💡</div>
<div>The <code>[id]</code> attribute selector is not needed here, actually. At least not on freeCodeCamp.</div>
</div>

<h2 id="heading-how-to-create-the-toc-in-markdown">How to Create the ToC in Markdown</h2>
<p>A lot of blogging platforms support Markdown, so it'll be the first thing we'll create.</p>
<p>First, we'll convert the NodeList to an array. We can use the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax">spread syntax</a>:</p>
<pre><code class="language-javascript">[...document.querySelectorAll('main h2[id], main h3[id], main h4[id]')];
</code></pre>
<p>Then we can map over the array and create the Markdown links that point to the given header.</p>
<pre><code class="language-javascript">const headers = [...document.querySelectorAll('main h2[id], main h3[id], main h4[id]')];

headers.map(function(node) {
    // H2 header should have 0 indent
    const level = parseInt(node.nodeName.replace('H', '')) - 2;
    const hash = node.getAttribute('id');
    const indent = ' '.repeat(level * 2);
    return `${indent}* [${node.innerText}](#${hash})`;
});
</code></pre>
<p>The output looks like this:</p>
<pre><code class="language-plaintext">(4)&nbsp;['* [Dev Tools](#heading-dev-tools)', '* [JavaScript Console](#heading-javascript-console)', '* [Understanding the DOM Structure](#heading-understanding-the-dom-structure)', '* [What to do if I don’t have headers?](#heading-what-to-do-if-i-dont-have-headers)']
</code></pre>
<p>To get the text, join the array with a newline character and use <code>console.log</code> to display the output. Without <code>console.log</code>, the console shows the string with escaped <code>\n</code> characters.</p>
<pre><code class="language-javascript">const headers = [...document.querySelectorAll('main h2[id], main h3[id], main h4[id]')];

console.log(headers.map(function(node) {
    // H2 header should have 0 indent
    const level = parseInt(node.nodeName.replace('H', '')) - 2;
    const hash = node.getAttribute('id');
    const indent = ' '.repeat(level * 2);
    return `${indent}* [${node.innerText}](#${hash})`;
}).join('\n'));
</code></pre>
<p>The output for this article will look like this:</p>
<pre><code class="language-markdown">* [Dev Tools](#heading-dev-tools)
* [JavaScript Console](#heading-javascript-console)
* [Understanding the DOM Structure](#heading-understanding-the-dom-structure)
* [Creating TOC in Markdown](#heading-creating-toc-in-markdown)
  * [This is fake header](#heading-this-is-fake-header)
</code></pre>
<p>I added one fake subheader to show nesting. Even platforms that don't let you write articles in Markdown often convert Markdown when you paste it in. I created the ToC at the top of this article by pasting the Markdown generated by the last JavaScript snippet.</p>
<h2 id="heading-how-to-create-an-html-toc">How to Create an HTML ToC</h2>
<p>If your platform doesn’t support Markdown (like Medium), you can generate HTML instead, open it in the browser, and copy the rendered output to the clipboard. Pasting that into your platform's editor should keep the formatting.</p>
<div>
<div>💡</div>
<div>On Medium, the content is inside a <code>&lt;section&gt;</code> element, so the selector must be updated.</div>
</div>

<p>You could convert the Markdown to HTML with any online tool, but the snippet below generates the HTML directly. Once you have the code, that's faster than converting by hand.</p>
<pre><code class="language-javascript">const headers = [...document.querySelectorAll('main h2[id], main h3[id], main h4[id]')]

function indent(state) {
    return ' '.repeat((state.level - 1) * 2);
}

function closeUlTags(state, targetLevel) {
    while (state.level &gt; targetLevel) {
        state.level--;
        state.lines.push(`${indent(state)}&lt;/ul&gt;`);
    }
}

function openUlTags(state, targetLevel) {
    while (state.level &lt; targetLevel) {
        state.lines.push(`${indent(state)}&lt;ul&gt;`);
        state.level++;
    }
}

const result = headers.reduce((state, node) =&gt; {
    const level = parseInt(node.nodeName.replace('H', ''));

    closeUlTags(state, level);
    openUlTags(state, level);
    
    const hash = node.getAttribute('id');
    state.lines.push(`${indent(state)}&lt;li&gt;&lt;a href="#${hash}"&gt;${node.innerText}&lt;/a&gt;&lt;/li&gt;`);
    return state;
}, { lines: [], level: 1 });

closeUlTags(result, 1);

console.log(result.lines.join('\n'));
</code></pre>
<p>Running the code on this article produces this output:</p>
<pre><code class="language-html">&lt;ul&gt;
  &lt;li&gt;&lt;a href="#heading-table-of-contents"&gt;Table of Contents&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#heading-dev-tools"&gt;Dev Tools&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#heading-javascript-console"&gt;JavaScript Console&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#heading-understanding-the-dom-structure"&gt;Understanding the DOM Structure&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#heading-creating-toc-in-markdown"&gt;Creating TOC in Markdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#heading-how-to-create-html-toc"&gt;How to create HTML TOC&lt;/a&gt;&lt;/li&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href="#heading-level-3"&gt;Level 3&lt;/a&gt;&lt;/li&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href="#heading-level-4"&gt;Level 4&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/ul&gt;
  &lt;li&gt;&lt;a href="#heading-what-to-do-if-i-dont-have-headers"&gt;What to do if I don’t have headers?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</code></pre>
<p>I added a few headers at the end (Level 3 and Level 4) so you can see that the code works for any depth of nesting. Note that the ToC heading itself is also the first item in the list.</p>
<div>
<div>💡</div>
<div>Note that the HTML above includes a link to the Table of Contents itself. This happens when you run the script after the ToC has already been added. You can remove that line by hand, or add a filter to the code.</div>
</div>
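<p>If you want to add the filter mentioned above, one option is to drop headers by <code>id</code> before generating the list. This is a sketch that assumes your ToC heading has the id <code>heading-table-of-contents</code>, as it does in this article's output; <code>EXCLUDED_IDS</code> and <code>withoutExcluded</code> are names made up for this example.</p>

```javascript
// Ids of headers that should never appear in the generated ToC.
// "heading-table-of-contents" matches this article; adjust it for your platform.
const EXCLUDED_IDS = new Set(["heading-table-of-contents"]);

// Keep only headers whose id is not excluded.
// Works on DOM elements and on plain objects that have an `id` property.
function withoutExcluded(headers) {
  return headers.filter((node) => !EXCLUDED_IDS.has(node.id));
}

// In the browser console, wrap the selector used by the snippets above.
if (typeof document !== "undefined") {
  const headers = withoutExcluded([
    ...document.querySelectorAll("main h2[id], main h3[id], main h4[id]"),
  ]);
  console.log(headers.map((node) => node.innerText));
}
```

<p>You can then pass the filtered array to either the Markdown or the HTML snippet in place of <code>headers</code>.</p>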

<h2 id="heading-copy-the-html-code-for-the-editor">How to Copy the HTML Code into the Editor</h2>
<p>Most <a href="https://en.wikipedia.org/wiki/WYSIWYG">WYSIWYG</a> editors work with HTML, so you should be able to copy the rendered HTML, formatting included, and paste it into the editor. The easiest way is to save the output to an <code>.html</code> file, open that file in the browser, and select the text:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763758802247/7e4fa0cd-377d-44ca-9cdb-b53ec14da4b8.png" alt="Screenshot of the browser window with file open. The page in the browser shows the table of content where all text is highlighted by selection." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<h2 id="heading-what-to-do-if-i-dont-have-headers">What to Do If I Don’t Have Headers?</h2>
<p>You need something that a CSS selector can target. If the headers are <code>p</code> tags with a specific class (like <code>header</code>), you can use <code>p.header</code> instead of <code>h2</code>.</p>
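<p>Classes like these usually don't encode the nesting level the way <code>h2</code> and <code>h3</code> do, so <code>nodeName</code> can't give you the indent. A minimal sketch, assuming hypothetical <code>p.header</code> and <code>p.subheader</code> classes, is to map each selector to a level and ask every element which selector it matches with <code>Element.matches()</code>. Each element still needs an <code>id</code> for its link to work.</p>

```javascript
// Hypothetical mapping from CSS selectors to nesting levels (0 = top level).
// Replace these class names with whatever your platform actually uses.
const LEVELS = [
  ["p.header", 0],
  ["p.subheader", 1],
];

// Return the nesting level of an element, or -1 if no selector matches.
function levelOf(node) {
  for (const [selector, level] of LEVELS) {
    if (node.matches(selector)) return level;
  }
  return -1;
}

// Build Markdown list lines; elements without an id are skipped
// because there would be nothing to link to.
function toMarkdownLines(nodes) {
  return nodes
    .filter((node) => node.id && levelOf(node) >= 0)
    .map((node) => `${"  ".repeat(levelOf(node))}* [${node.innerText}](#${node.id})`);
}

// In the browser console: select every mapped class inside main.
if (typeof document !== "undefined") {
  const selector = LEVELS.map(([s]) => `main ${s}`).join(", ");
  console.log(toMarkdownLines([...document.querySelectorAll(selector)]).join("\n"));
}
```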
<h3 id="heading-how-to-create-a-table-of-contents-for-devto">How to Create a Table of Contents for DEV.to</h3>
<p>If you have a different DOM structure, you can use different DOM methods to extract the element you need. For example, on DEV.to, the headers look like this:</p>
<pre><code class="language-xml">&lt;h2&gt;
  &lt;a name="overview" href="#overview"&gt;
  &lt;/a&gt;
  Overview
&lt;/h2&gt;
</code></pre>
<p>So the selector needs to be just <code>main h2</code>. But when you execute this code:</p>
<pre><code class="language-javascript">[...document.querySelectorAll('main h2, main h3, main h4')];
</code></pre>
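<p>You'll see that it matches many more headers than the article contains, because other parts of <code>main</code> (outside the article body) contain headings too. Luckily, we can use the relatively new CSS <code>:has()</code> pseudo-class to keep only headers that contain an anchor with a <code>name</code> attribute. The final selector for one header level looks like this: <code>main h2:has(a[name])</code>.</p>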
<p>Here is the full code:</p>
<pre><code class="language-javascript">const selector = 'main h2:has(a[name]), main h3:has(a[name]), main h4:has(a[name])';
const headers = [...document.querySelectorAll(selector)];

console.log(headers.map(function(node) {
    // H2 header should have 0 indent
    const level = parseInt(node.nodeName.replace('H', '')) - 2;
    // this is how you get the hash
    // you can also access href attribute and remove # from the output string
    const hash = node.querySelector('a').getAttribute('name');
    // two spaces per level so nested items render as a nested Markdown list
    const indent = ' '.repeat(level * 2);
    return `${indent}* [${node.innerText}](#${hash})`;
}).join('\n'));
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Creating a table of contents helps readers digest your article. Most people don’t read a whole article; they scan for what they need. You can also find plenty of articles about a ToC’s impact on SEO. So it’s worth adding one to any longer article.</p>
<p>And as you can see, creating a ToC is not that hard with a bit of web development knowledge.</p>
<p>If you like this article, you may want to follow me on social media (<a href="https://x.com/jcubic">Twitter/X</a>, <a href="https://github.com/jcubic">GitHub</a>, and/or <a href="https://www.linkedin.com/in/jakubjankiewicz/">LinkedIn</a>). You can also check my <a href="https://jakub.jankiewicz.org/">personal website</a> and my <a href="https://jakub.jankiewicz.org/blog/">new blog</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Production-Ready Voice Agent Architecture with WebRTC ]]>
                </title>
                <description>
                    <![CDATA[ In this tutorial, you'll build a production-ready voice agent architecture: a browser client that streams audio over WebRTC (Web Real-Time Communication), a backend that mints short-lived session toke ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-production-ready-voice-agents/</link>
                <guid isPermaLink="false">69ab2f260bca1a3976458b2a</guid>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ llm ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Voice ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nataraj Sundar ]]>
                </dc:creator>
                <pubDate>Fri, 06 Mar 2026 19:46:46 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/c61b4358-66d9-434d-8555-d8921313e573.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this tutorial, you'll build a production-ready voice agent architecture: a browser client that streams audio over WebRTC (Web Real-Time Communication), a backend that mints short-lived session tokens, an agent runtime that orchestrates speech and tools safely, and a post-call step that generates artifacts for downstream workflows.</p>
<p>This article is intentionally vendor-neutral. You can implement these patterns using any AI voice platform that supports WebRTC (directly or via an SFU, a selective forwarding unit) and server-side token minting. The goal is to help you ship a voice agent architecture that is secure, observable, and operable in production.</p>
<blockquote>
<p><em>Disclosure: This article reflects my personal views and experience. It does not represent the views of my employer or any vendor mentioned.</em></p>
</blockquote>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-what-youll-build">What You'll Build</a></p>
</li>
<li><p><a href="#heading-how-to-avoid-common-production-failures-in-voice-agents">How to Avoid Common Production Failures in Voice Agents</a></p>
</li>
<li><p><a href="#heading-how-to-design-a-latency-budget-for-a-real-time-voice-agent">How to Design a Latency Budget for a Real-Time Voice Agent</a></p>
</li>
<li><p><a href="#heading-how-to-design-a-production-voice-agent-architecture-vendor-neutral">How to Design a Production Voice Agent Architecture (Vendor-Neutral)</a></p>
<ul>
<li><p><a href="#heading-step-0-set-up-the-project">Step 0: Set Up the Project</a></p>
</li>
<li><p><a href="#heading-step-1-keep-credentials-server-side">Step 1: Keep Credentials Server-side</a></p>
</li>
<li><p><a href="#heading-step-2-build-a-backend-token-endpoint">Step 2: Build a Backend Token Endpoint</a></p>
</li>
<li><p><a href="#heading-step-3-connect-from-the-web-client-webrtc-sfu">Step 3: Connect from the Web Client (WebRTC + SFU)</a></p>
</li>
<li><p><a href="#heading-step-4-add-client-actions-agent-suggests-app-executes">Step 4: Add Client Actions (Agent Suggests, App Executes)</a></p>
</li>
<li><p><a href="#heading-step-5-add-tool-integrations-safely">Step 5: Add Tool Integrations Safely</a></p>
</li>
<li><p><a href="#step-6-add-post-call-processing-where-durable-value-appears">Step 6: Add post-call processing (where durable value appears)</a></p>
</li>
</ul>
</li>
<li><p><a href="#production-readiness-checklist">Production readiness checklist</a></p>
</li>
<li><p><a href="#closing">Closing</a></p>
</li>
</ul>
<h2 id="heading-what-youll-build">What You'll Build</h2>
<p>By the end, you'll have:</p>
<ul>
<li><p>A web client that streams microphone audio and plays agent audio.</p>
</li>
<li><p>A backend token endpoint that keeps credentials server-side.</p>
</li>
<li><p>A safe coordination channel between the agent and the application.</p>
</li>
<li><p>Structured messages between the application and the agent.</p>
</li>
<li><p>A production checklist for security, reliability, observability, and cost control.</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You should be comfortable with:</p>
<ul>
<li><p>JavaScript or TypeScript</p>
</li>
<li><p>Node.js 18+ (so <code>fetch</code> works server-side) and an HTTP framework (Express in examples)</p>
</li>
<li><p>Browser microphone permissions</p>
</li>
<li><p>Basic WebRTC concepts (high level is fine)</p>
</li>
</ul>
<h2 id="heading-tldr">TL;DR</h2>
<p>A <strong>production-ready voice agent</strong> needs:</p>
<ul>
<li><p>A <strong>server-side token service</strong> (no secrets in the browser)</p>
</li>
<li><p>A <strong>real-time media plane</strong> (WebRTC) for low-latency audio</p>
</li>
<li><p>A <strong>data channel</strong> for structured messages between your app and the agent</p>
</li>
<li><p><strong>Tool guardrails</strong> (allowlists, confirmations, timeouts, audit logs)</p>
</li>
<li><p><strong>Post-call processing</strong>: summary, action items, tickets, and CRM (customer relationship management) updates</p>
</li>
<li><p><strong>Observability-first</strong> implementation (state transitions + metrics)</p>
</li>
</ul>
<h2 id="heading-how-to-avoid-common-production-failures-in-voice-agents">How to Avoid Common Production Failures in Voice Agents</h2>
<p>If you've operated distributed systems, you've seen that most failures happen at boundaries:</p>
<ul>
<li><p>timeouts and partial connectivity</p>
</li>
<li><p>retries that amplify load</p>
</li>
<li><p>unclear ownership between components</p>
</li>
<li><p>missing observability</p>
</li>
<li><p>“helpful automation” that becomes unsafe</p>
</li>
</ul>
<p>Voice agents amplify those risks because:</p>
<p><strong>Latency is User Experience</strong>: A slow agent feels broken. Conversational UX is less forgiving than web UX.</p>
<p><strong>Audio + UI + Tools is a Distributed System</strong>: You coordinate browser audio capture, WebRTC transport, STT (speech-to-text), model reasoning, tool calls, TTS (text-to-speech), and playback buffering. Each stage has different clocks and failure modes.</p>
<p><strong>Security Boundaries are Non-negotiable</strong>: A leaked API key is catastrophic. A tool misfire can trigger real-world side effects.</p>
<p><strong>Debuggability Determines Whether You Can Ship</strong>: If you don't log state transitions and capture post-call artifacts, you can't operate or improve the system safely.</p>
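<p>To make "log state transitions" concrete, here is a minimal sketch of a call-state machine that rejects unexpected transitions and emits one structured log line per change. The state names and the <code>CallState</code> class are assumptions for illustration, not part of any SDK.</p>

```javascript
// Hypothetical call states and the transitions allowed between them.
const TRANSITIONS = {
  idle: ["connecting"],
  connecting: ["connected", "failed"],
  connected: ["reconnecting", "ended"],
  reconnecting: ["connected", "failed"],
  failed: ["connecting"],
  ended: [],
};

class CallState {
  constructor(callId, log = console.log) {
    this.callId = callId;
    this.state = "idle";
    this.changedAt = Date.now();
    this.log = log;
  }

  // Move to `next`; throw on a transition the table doesn't allow,
  // otherwise log one JSON line including time spent in the previous state.
  transition(next, reason = "") {
    const allowed = TRANSITIONS[this.state] || [];
    if (!allowed.includes(next)) {
      throw new Error(`Invalid transition ${this.state} -> ${next}`);
    }
    const now = Date.now();
    this.log(JSON.stringify({
      call_id: this.callId,
      from: this.state,
      to: next,
      reason,
      ms_in_previous_state: now - this.changedAt,
    }));
    this.state = next;
    this.changedAt = now;
  }
}
```

<p>For example, <code>new CallState("call_123").transition("connecting", "user clicked start")</code> logs a single line you can later filter by <code>call_id</code>.</p>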
<h2 id="heading-how-to-design-a-latency-budget-for-a-real-time-voice-agent">How to Design a Latency Budget for a Real-Time Voice Agent</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/8bb5c6d5-4250-457b-94a2-fcb748050731.png" alt="Latency budget for a real-time voice agent showing mic capture, network RTT, STT, reasoning, tools, TTS, and playback buffering." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Conversations have a “feel.” That feel is mostly latency.</p>
<p>A practical guideline:</p>
<ul>
<li><p>Under <strong>~200ms</strong> feels instant</p>
</li>
<li><p><strong>300–500ms</strong> feels responsive</p>
</li>
<li><p>Over <strong>~700ms</strong> feels broken</p>
</li>
</ul>
<p>Your end-to-end latency is the sum of mic capture, network RTT (round-trip time), STT, reasoning, tool execution, TTS, and playback buffering. Budget for it explicitly or you’ll ship a technically correct system that users perceive as unintelligent.</p>
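<p>One way to make the budget explicit is to write it down per stage and check the sum against the guideline above. The per-stage numbers below are placeholders, not measurements; replace them with values from your own tracing. The guideline doesn't name the 500 to 700 ms band, so "sluggish" is a label chosen for this sketch.</p>

```javascript
// Placeholder per-stage budgets in milliseconds (assumed, not measured).
const budgetMs = {
  micCapture: 20,
  networkRtt: 60,
  stt: 100,
  reasoning: 150,
  tools: 0, // many turns call no tool; budget tool-calling turns separately
  tts: 80,
  playbackBuffer: 40,
};

// Sum the stages and classify the total using the guideline above.
function evaluateBudget(stages) {
  const totalMs = Object.values(stages).reduce((sum, ms) => sum + ms, 0);
  let feel;
  if (totalMs < 200) feel = "instant";
  else if (totalMs <= 500) feel = "responsive";
  else if (totalMs <= 700) feel = "sluggish"; // unnamed band in the guideline
  else feel = "broken";
  return { totalMs, feel };
}

console.log(evaluateBudget(budgetMs));
```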
<h2 id="heading-how-to-design-a-production-voice-agent-architecture-vendor-neutral">How to Design a Production Voice Agent Architecture (Vendor-Neutral)</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/f0411ddc-d3fb-48e4-be72-37d9765bf0a7.png" alt="Production-ready voice agent architecture showing web client, token service, WebRTC real-time plane, agent runtime, tool layer, and post-call processing." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A scalable <strong>voice agent architecture</strong> typically has these layers:</p>
<ol>
<li><p><strong>Web client</strong>: mic capture, audio playback, UI state</p>
</li>
<li><p><strong>Token service</strong>: short-lived session tokens (secrets stay server-side)</p>
</li>
<li><p><strong>Real-time plane</strong>: WebRTC media + a data channel</p>
</li>
<li><p><strong>Agent runtime</strong>: STT → reasoning → TTS, plus tool orchestration</p>
</li>
<li><p><strong>Tool layer</strong>: external actions behind safety controls</p>
</li>
<li><p><strong>Post-call processor</strong>: summary + structured outputs after the session ends</p>
</li>
</ol>
<p>This separation makes failure domains and trust boundaries explicit.</p>
<h2 id="heading-step-0-set-up-the-project">Step 0: Set Up the Project</h2>
<p>Create a new project directory:</p>
<pre><code class="language-shell">mkdir voice-agent-app
cd voice-agent-app
npm init -y
npm pkg set type=module
npm pkg set scripts.start="node server.js"
</code></pre>
<p>Install dependencies:</p>
<pre><code class="language-shell">npm install express dotenv
</code></pre>
<p>Create this folder structure:</p>
<pre><code class="language-plaintext">voice-agent-app/
├── server.js
├── .env
└── public/
    ├── index.html
    └── client.js
</code></pre>
<p>Add a <code>.env</code> file:</p>
<pre><code class="language-shell">VOICE_PLATFORM_URL=https://your-provider.example
VOICE_PLATFORM_API_KEY=your_api_key_here
</code></pre>
<p>Now you’re ready to implement each part of the system.</p>
<h2 id="heading-step-1-keep-credentials-server-side">Step 1: Keep Credentials Server-side</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/d522fdf2-bb96-4531-b4ff-3a364336178c.png" alt="Security trust boundary diagram showing browser as untrusted zone and backend/tooling as trusted zone with secrets server-side." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>Treat every API key like production credentials:</p>
<ul>
<li><p>store it in environment variables or a secrets manager</p>
</li>
<li><p>rotate it if exposed</p>
</li>
<li><p>never embed it in browser or mobile apps</p>
</li>
<li><p>avoid logging secrets (log only a short suffix if necessary)</p>
</li>
</ul>
<p>Even if a vendor supports CORS, the browser is not a safe place for long-lived credentials.</p>
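<p>For the "log only a short suffix" rule above, a small helper keeps the raw value out of your logs. This is a sketch; <code>maskSecret</code> is a hypothetical name, and showing four characters is an arbitrary choice.</p>

```javascript
// Return a log-safe representation of a secret: only its last `visible` chars.
// Missing or short values are fully masked so nothing useful leaks.
function maskSecret(secret, visible = 4) {
  if (typeof secret !== "string" || secret.length <= visible * 2) {
    return "****";
  }
  return `****${secret.slice(-visible)}`;
}

// Example: confirm which key is configured without printing it.
console.log("Using API key", maskSecret(process.env.VOICE_PLATFORM_API_KEY));
```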
<h2 id="heading-step-2-build-a-backend-token-endpoint">Step 2: Build a Backend Token Endpoint</h2>
<p>Your backend should:</p>
<ul>
<li><p>authenticate the user</p>
</li>
<li><p>mint a short-lived session token using your platform API</p>
</li>
<li><p>return only what the client needs (URL + token + expiry)</p>
</li>
</ul>
<h3 id="heading-create-serverjs-nodejs-express">Create server.js (Node.js + Express)</h3>
<pre><code class="language-javascript">import express from "express";
import dotenv from "dotenv";
import path from "path";
import { fileURLToPath } from "url";

dotenv.config();

const app = express();
app.use(express.json());

// Serve the web client from /public
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
app.use(express.static(path.join(__dirname, "public")));

const VOICE_PLATFORM_URL = process.env.VOICE_PLATFORM_URL;
const VOICE_PLATFORM_API_KEY = process.env.VOICE_PLATFORM_API_KEY;

app.post("/api/voice-token", async (req, res) =&gt; {
  res.setHeader("Cache-Control", "no-store");

  try {
    if (!VOICE_PLATFORM_URL || !VOICE_PLATFORM_API_KEY) {
      return res.status(500).json({
        error: "Missing VOICE_PLATFORM_URL or VOICE_PLATFORM_API_KEY in .env",
      });
    }

    // TODO: Authenticate the caller before minting tokens.

    const r = await fetch(`${VOICE_PLATFORM_URL}/api/v1/token`, {
      method: "POST",
      headers: {
        "X-API-Key": VOICE_PLATFORM_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ participant_name: "Web User" }),
    });

    if (!r.ok) {
      const detail = await r.text().catch(() =&gt; "");
      return res.status(r.status).json({ error: "Token request failed", detail });
    }

    const data = await r.json();

    res.json({
      rtc_url: data.rtc_url || data.livekit_url,
      token: data.token,
      expires_in: data.expires_in,
    });
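  } catch (err) {
    // Log server-side for debugging; return a generic message to the client
    console.error("Token mint failed:", err);
    res.status(500).json({ error: "Failed to mint token" });
  }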
});

app.listen(3000, () =&gt; console.log("Open http://localhost:3000"));
</code></pre>
<h3 id="heading-run-the-server">Run the server</h3>
<pre><code class="language-shell">npm start
</code></pre>
<p>Then open: <a href="http://localhost:3000">http://localhost:3000</a></p>
<h3 id="heading-how-this-code-works">How this code works</h3>
<ul>
<li><p>You load credentials from environment variables so secrets never enter the browser.</p>
</li>
<li><p>The <code>/api/voice-token</code> endpoint calls the voice platform’s token API.</p>
</li>
<li><p>You return only the <code>rtc_url</code>, <code>token</code>, and expiration time.</p>
</li>
<li><p>The browser never sees the API key.</p>
</li>
<li><p>If the provider returns an error, you forward a structured error response.</p>
</li>
</ul>
<h3 id="heading-production-notes">Production Notes</h3>
<ul>
<li><p>rate-limit <code>/api/voice-token</code> (cost and abuse control)</p>
</li>
<li><p>instrument token mint latency and error rate</p>
</li>
<li><p>keep TTL short and handle refresh/reconnect</p>
</li>
<li><p>return minimal fields</p>
</li>
</ul>
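<p>For the rate limit, production setups usually rely on a shared store or an existing rate-limiting middleware. As a dependency-free illustration of the idea, here is a fixed-window limiter kept in process memory. It assumes a single server instance and keys on <code>req.ip</code>; both are simplifications, and <code>createRateLimiter</code> is a hypothetical helper.</p>

```javascript
// Fixed-window rate limiter: at most `limit` requests per key per `windowMs`.
// State lives in memory: it resets on restart, isn't shared across instances,
// and old keys are never evicted in this sketch.
function createRateLimiter({ limit = 10, windowMs = 60_000, now = Date.now } = {}) {
  const windows = new Map(); // key -> { start, count }

  function allow(key) {
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  }

  // Express-style middleware around allow().
  function middleware(req, res, next) {
    if (allow(req.ip)) return next();
    res.setHeader("Retry-After", String(Math.ceil(windowMs / 1000)));
    return res.status(429).json({ error: "Too many token requests" });
  }

  return { allow, middleware };
}

// Usage in server.js:
// const tokenLimiter = createRateLimiter({ limit: 10, windowMs: 60_000 });
// app.post("/api/voice-token", tokenLimiter.middleware, async (req, res) => { ... });
```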
<h2 id="heading-step-3-connect-from-the-web-client-webrtc-sfu">Step 3: Connect from the Web Client (WebRTC + SFU)</h2>
<p>In this step, you'll build a minimal web UI that:</p>
<ul>
<li><p>Requests a short-lived token from your backend</p>
</li>
<li><p>Connects to a real-time WebRTC room (often via an SFU)</p>
</li>
<li><p>Plays the agent's audio track</p>
</li>
<li><p>Captures and publishes microphone audio</p>
</li>
</ul>
<h3 id="heading-create-publicindexhtml">Create <code>public/index.html</code></h3>
<pre><code class="language-html">&lt;!doctype html&gt;
&lt;html lang="en"&gt;
  &lt;head&gt;
    &lt;meta charset="UTF-8" /&gt;
    &lt;meta name="viewport" content="width=device-width,initial-scale=1" /&gt;
    &lt;title&gt;Voice Agent Demo&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;h1&gt;Voice Agent Demo&lt;/h1&gt;

    &lt;button id="startBtn"&gt;Start Call&lt;/button&gt;
    &lt;button id="endBtn" disabled&gt;End Call&lt;/button&gt;

    &lt;p id="status"&gt;Idle&lt;/p&gt;

    &lt;script type="module" src="/client.js"&gt;&lt;/script&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<h3 id="heading-create-publicclientjs">Create <code>public/client.js</code></h3>
<p>Note: This uses a LiveKit-style client SDK to demonstrate the pattern. If you're using a different provider, swap this import and the connect/publish calls for your provider's WebRTC client.</p>
<pre><code class="language-javascript">import { Room, RoomEvent, Track } from "https://unpkg.com/livekit-client@2.10.1/dist/livekit-client.esm.mjs";

const startBtn = document.getElementById("startBtn");
const endBtn = document.getElementById("endBtn");
const statusEl = document.getElementById("status");

let room = null;
let intentionallyDisconnected = false;
let audioEls = [];

function setStatus(text) {
  statusEl.textContent = text;
}

function detachAllAudio() {
  for (const el of audioEls) {
    try { el.pause?.(); } catch {}
    el.remove();
  }
  audioEls = [];
}

async function mintToken() {
  const res = await fetch("/api/voice-token", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ participant_name: "Web User" }),
    cache: "no-store",
  });

  if (!res.ok) {
    const detail = await res.text().catch(() =&gt; "");
    throw new Error(`Token request failed: ${detail || res.status}`);
  }

  const { rtc_url, token } = await res.json();
  if (!rtc_url || !token) throw new Error("Token response missing rtc_url or token");
  return { rtc_url, token };
}

function wireRoomEvents(r) {
  // 1) Play the agent audio track when subscribed
  r.on(RoomEvent.TrackSubscribed, (track) =&gt; {
    if (track.kind !== Track.Kind.Audio) return;

    const el = track.attach();
    audioEls.push(el);
    document.body.appendChild(el);

    // Autoplay restrictions vary by browser/device.
    el.play?.().catch(() =&gt; {
      setStatus("Connected (audio may be blocked — click the page to enable)");
    });
  });

  // 2) Reconnect on disconnect (token expiry often shows up this way).
  //    Ignore events from rooms that were replaced or torn down on purpose.
  r.on(RoomEvent.Disconnected, async () =&gt; {
    if (intentionallyDisconnected || r !== room) return;
    setStatus("Disconnected (reconnecting...)");
    await attemptReconnect();
  });
}

async function connectOnce() {
  const { rtc_url, token } = await mintToken();

  const r = new Room();
  wireRoomEvents(r);

  await r.connect(rtc_url, token);

  // Mic permission + publish mic
  try {
    await r.localParticipant.setMicrophoneEnabled(true);
  } catch {
    try { r.disconnect(); } catch {}
    throw new Error("Microphone access denied. Allow mic permission and try again.");
  }

  return r;
}

async function startCall() {
  if (room) return;

  intentionallyDisconnected = false;
  setStatus("Connecting...");

  room = await connectOnce();

  setStatus("Connected");
  startBtn.disabled = true;
  endBtn.disabled = false;
}

async function stopCall() {
  intentionallyDisconnected = true;

  try {
    await room?.localParticipant?.setMicrophoneEnabled(false);
  } catch {}

  try {
    await room?.disconnect();
  } catch {}

  room = null;
  detachAllAudio();

  setStatus("Disconnected");
  startBtn.disabled = false;
  endBtn.disabled = true;
}

async function attemptReconnect() {
  // Simplified exponential backoff reconnect.
  // In production, add jitter, max attempts, and better error classification.
  const delaysMs = [250, 500, 1000, 2000];

  for (const delay of delaysMs) {
    if (intentionallyDisconnected) return;

    try {
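      // Tear down current state before reconnecting. Clear `room` first so
      // any Disconnected event from the old room is ignored.
      const stale = room;
      room = null;
      detachAllAudio();
      try { await stale?.disconnect(); } catch {}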

      await new Promise((r) =&gt; setTimeout(r, delay));

      room = await connectOnce();
      setStatus("Reconnected");
      startBtn.disabled = true;
      endBtn.disabled = false;
      return;
    } catch {
      // keep retrying
    }
  }

  setStatus("Disconnected (reconnect failed)");
  startBtn.disabled = false;
  endBtn.disabled = true;
}

startBtn.addEventListener("click", async () =&gt; {
  try {
    await startCall();
  } catch (err) {
    setStatus(err?.message || "Connection failed");
    startBtn.disabled = false;
    endBtn.disabled = true;
    room = null;
    detachAllAudio();
  }
});

endBtn.addEventListener("click", async () =&gt; {
  await stopCall();
});
</code></pre>
<h3 id="heading-how-this-step-works-and-why-these-details-matter">How this step works (and why these details matter)</h3>
<ul>
<li><p>The Start button gives you a user gesture so browsers are more likely to allow audio playback.</p>
</li>
<li><p>Mic permission is handled explicitly: if the user denies access, you show a clear error and avoid a half-connected session.</p>
</li>
<li><p>Disconnect cleanup removes audio elements so you don't leak resources across retries.</p>
</li>
<li><p>The reconnect loop demonstrates the production pattern: if a disconnect happens (often due to token expiry or network churn), the client re-mints a token and reconnects.</p>
</li>
</ul>
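<p>The reconnect loop uses a fixed list of delays, and its comment suggests adding jitter and a cap on attempts. A minimal sketch of exponential backoff with "full jitter" (each delay is random between zero and an exponentially growing ceiling) could look like this; <code>backoffDelays</code> is a hypothetical helper, and the base, cap, and attempt values are arbitrary.</p>

```javascript
// Generate up to `maxAttempts` reconnect delays using exponential backoff with
// full jitter: each delay is a random value in [0, min(capMs, baseMs * 2^attempt)).
// Randomizing spreads out reconnects when many clients drop at the same time.
function backoffDelays({ baseMs = 250, capMs = 5000, maxAttempts = 5, random = Math.random } = {}) {
  const delays = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
    delays.push(Math.floor(random() * ceiling));
  }
  return delays;
}

// In attemptReconnect(), the fixed array could become:
// const delaysMs = backoffDelays();
```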
<p>In the next step, you'll add a structured data-channel handler to safely process agent-suggested “client actions”.</p>
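<h3 id="heading-handle-these-explicitly">Handle These Explicitly</h3>
<p>The client above already handles the following cases. The smaller snippets below isolate each one so you can adapt it to a different client.</p>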
<h3 id="heading-autoplay-restriction-example">Autoplay Restriction Example</h3>
<p>Audio playback should start from a user gesture, which is why <code>index.html</code> has explicit Start and End buttons:</p>
<pre><code class="language-html">&lt;button id="startBtn"&gt;Start Call&lt;/button&gt;
&lt;button id="endBtn" disabled&gt;End Call&lt;/button&gt;
&lt;div id="status"&gt;&lt;/div&gt;
</code></pre>
<p>In <code>client.js</code>:</p>
<pre><code class="language-javascript">const startBtn = document.getElementById("startBtn");
const endBtn = document.getElementById("endBtn");
const statusEl = document.getElementById("status");

let room;

startBtn.addEventListener("click", async () =&gt; {
  try {
    room = await connectOnce(); // defined in client.js above
    statusEl.textContent = "Connected";
    startBtn.disabled = true;
    endBtn.disabled = false;
  } catch (err) {
    statusEl.textContent = "Connection failed";
  }
});
</code></pre>
<h3 id="heading-microphone-denial">Microphone denial</h3>
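<pre><code class="language-javascript">try {
  // Probe for permission, then release the stream so the browser's mic
  // indicator doesn't stay on; the SDK opens its own track when publishing.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  stream.getTracks().forEach((track) =&gt; track.stop());
} catch (err) {
  statusEl.textContent =
    err.name === "NotAllowedError" ? "Microphone access denied" : "Microphone unavailable";
  throw err;
}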
</code></pre>
<h3 id="heading-disconnect-cleanup">Disconnect cleanup</h3>
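<pre><code class="language-javascript">endBtn.addEventListener("click", () =&gt; {
  if (room) {
    intentionallyDisconnected = true; // keep the Disconnected handler from reconnecting
    room.disconnect();
    room = null;
    detachAllAudio(); // remove the &lt;audio&gt; elements created by track.attach()
    statusEl.textContent = "Disconnected";
    startBtn.disabled = false;
    endBtn.disabled = true;
  }
});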
</code></pre>
<h3 id="heading-token-refresh-simplified">Token refresh (simplified)</h3>
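<pre><code class="language-javascript">room.on(RoomEvent.Disconnected, async () =&gt; {
  if (intentionallyDisconnected) return; // the user clicked End Call
  // /api/voice-token only accepts POST (see server.js)
  const res = await fetch("/api/voice-token", { method: "POST", cache: "no-store" });
  if (!res.ok) return;
  const { rtc_url, token } = await res.json();
  await room.connect(rtc_url, token);
});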
</code></pre>
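<p>Reacting to <code>Disconnected</code> means the user hears a gap first. Since the server above forwards <code>expires_in</code>, you could also fetch a new token shortly before expiry. This sketch assumes <code>expires_in</code> is in seconds, which is common but provider-specific, and that you extend <code>mintToken()</code> to return it; <code>refreshDelayMs</code> and <code>scheduleRefresh</code> are hypothetical helpers. Whether a live session needs the new token at all depends on your provider.</p>

```javascript
// Delay before refreshing: `marginSec` ahead of expiry, clamped at zero.
// Assumes expires_in is in seconds; confirm with your provider's docs.
function refreshDelayMs(expiresInSec, marginSec = 30) {
  if (!Number.isFinite(expiresInSec)) return null; // no expiry info: don't schedule
  return Math.max(0, (expiresInSec - marginSec) * 1000);
}

// Fetch a fresh token before the current one expires and pass it to onToken
// (for example, to keep it ready for the next reconnect). Returns the timer id
// so stopCall() can clearTimeout() it, or null if nothing was scheduled.
function scheduleRefresh(expiresInSec, onToken) {
  const delay = refreshDelayMs(expiresInSec);
  if (delay === null) return null;
  return setTimeout(async () => {
    try {
      const res = await fetch("/api/voice-token", { method: "POST", cache: "no-store" });
      if (res.ok) onToken(await res.json());
    } catch (err) {
      console.warn("Token refresh failed", err);
    }
  }, delay);
}
```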
<h2 id="heading-step-4-add-client-actions-agent-suggests-app-executes">Step 4: Add Client Actions (Agent Suggests, App Executes)</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/2304be1c-3451-45f8-ae44-2519fa92c82a.png" alt="Sequence diagram showing agent requesting a client action, app validating allowlist, user confirming, and app executing the side effect." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>A production voice agent often needs to:</p>
<ul>
<li><p>open a runbook/dashboard URL</p>
</li>
<li><p>show a checklist in the UI</p>
</li>
<li><p>request confirmation for an irreversible action</p>
</li>
<li><p>receive structured context (account, region, incident ID)</p>
</li>
</ul>
<p>The key safety rule:</p>
<p><strong>The agent suggests actions. The application validates and executes them.</strong></p>
<p>Use structured messages over the data channel:</p>
<pre><code class="language-json">{
  "type": "client_action",
  "action": "open_url",
  "payload": { "url": "https://internal.example.com/runbook" },
  "id": "action_123"
}
</code></pre>
<p><strong>Add guardrails</strong>:</p>
<ul>
<li><p>allowlist permitted actions</p>
</li>
<li><p>validate payload shape</p>
</li>
<li><p>confirmation gates for irreversible actions</p>
</li>
<li><p>idempotency via the <code>id</code> field</p>
</li>
<li><p>audit logs for every request and outcome</p>
</li>
</ul>
<p>This boundary limits damage from hallucinations or prompt injection.</p>
<pre><code class="language-javascript">// Guardrails: allowlist + validation + idempotency + confirmation

const ALLOWED_ACTIONS = new Set(["open_url", "request_confirm"]);
const EXECUTED_ACTION_IDS = new Set();
const ALLOWED_HOSTS = new Set(["internal.example.com"]);

function parseClientAction(text) {
  let msg;
  try {
    msg = JSON.parse(text);
  } catch {
    return null;
  }

  if (msg?.type !== "client_action") return null;
  if (typeof msg.id !== "string") return null;
  if (!ALLOWED_ACTIONS.has(msg.action)) return null;

  return msg;
}

async function handleClientAction(msg, room) {
  if (EXECUTED_ACTION_IDS.has(msg.id)) return; // idempotency
  EXECUTED_ACTION_IDS.add(msg.id);

  console.log("[client_action]", msg); // audit log (demo)

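  if (msg.action === "open_url") {
    const url = msg.payload?.url;
    if (typeof url !== "string") return;

    // new URL() throws on malformed input, so treat that as a blocked action
    let u;
    try {
      u = new URL(url);
    } catch {
      console.warn("Blocked invalid URL:", url);
      return;
    }

    // Only allow HTTPS links to allowlisted hosts
    if (u.protocol !== "https:" || !ALLOWED_HOSTS.has(u.host)) {
      console.warn("Blocked navigation to:", u.href);
      return;
    }

    window.open(u.href, "_blank", "noopener,noreferrer");
    return;
  }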

  if (msg.action === "request_confirm") {
    const question = msg.payload?.prompt || "Confirm this action?";
    const ok = window.confirm(question);

    // Send the user's answer back to the agent/app
    await room.localParticipant.publishData(
      new TextEncoder().encode(
        JSON.stringify({ type: "user_confirmed", id: msg.id, ok })
      ),
      { topic: "client_events", reliable: true }
    );
  }
}
</code></pre>
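<pre><code class="language-javascript">// RoomEvent is exported by the LiveKit JS client SDK ("livekit-client")
room.on(RoomEvent.DataReceived, (payload, participant, kind, topic) =&gt; {
  if (topic !== "client_actions") return;

  const text = new TextDecoder().decode(payload);
  const msg = parseClientAction(text);
  if (!msg) return;

  // handleClientAction is async: surface failures instead of dropping them
  handleClientAction(msg, room).catch((err) =&gt; {
    console.error("[client_action_error]", msg.id, err);
  });
});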
</code></pre>
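<p>The guardrail list calls for validating payload shape, but <code>parseClientAction</code> only checks the message envelope. Below is a minimal sketch of per-action validators. <code>PAYLOAD_VALIDATORS</code> and <code>hasValidPayload</code> are hypothetical helpers, not part of any SDK. The idea is that <code>parseClientAction</code> could call <code>hasValidPayload(msg)</code> after its allowlist check and return <code>null</code> when it fails.</p>
<pre><code class="language-javascript">// Sketch: per-action payload validators (hypothetical helpers, not an SDK API).

function isPlainObject(value) {
  if (typeof value !== "object" || value === null) return false;
  return !Array.isArray(value);
}

const PAYLOAD_VALIDATORS = {
  open_url(payload) {
    if (!isPlainObject(payload)) return false;
    // Host and protocol checks still happen in handleClientAction.
    return typeof payload.url === "string";
  },
  request_confirm(payload) {
    if (!isPlainObject(payload)) return false;
    // prompt is optional, but if present it must be a string.
    return payload.prompt === undefined || typeof payload.prompt === "string";
  },
};

function hasValidPayload(msg) {
  // Own-property lookup, so inherited names like "toString" never match.
  if (!Object.hasOwn(PAYLOAD_VALIDATORS, msg.action)) return false;
  return PAYLOAD_VALIDATORS[msg.action](msg.payload);
}
</code></pre>
<p>Keeping one validator per allowlisted action means that adding a new action forces you to state what its payload looks like.</p>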
<h2 id="heading-step-5-add-tool-integrations-safely">Step 5: Add Tool Integrations Safely</h2>
<p>Tools turn a voice agent into automation. Regardless of vendor, enforce these rules:</p>
<ul>
<li><p>timeouts on every tool call</p>
</li>
<li><p>circuit breakers for flaky dependencies</p>
</li>
<li><p>audit logs (inputs, outputs, duration, trace IDs)</p>
</li>
<li><p>explicit confirmation for destructive actions</p>
</li>
<li><p>credentials stored server-side (never in prompts or clients)</p>
</li>
</ul>
<p>If a tool fails, degrade gracefully: have the agent say something like “I can’t access that system right now. Here’s the manual fallback.” Silence reads as failure.</p>
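<p>One way to guarantee the agent always has something to say is to map tool errors to spoken fallback lines wherever the agent handles a failed tool call. This sketch assumes the <code>"timeout"</code> and <code>"circuit_open"</code> error messages thrown by the tool runner below. <code>spokenFallback</code>, its wording, and the <code>manualStep</code> argument are illustrative.</p>
<pre><code class="language-javascript">// Sketch: map tool-runner error codes to something the agent can say out loud.
// The codes match the Error messages thrown by the server-side tool runner
// ("timeout", "circuit_open"); the wording is only an example.
const FALLBACK_LINES = {
  timeout: "That system is taking too long to respond.",
  circuit_open: "That system is having trouble, so I've paused requests to it for now.",
};

function spokenFallback(err, systemName, manualStep) {
  const code = err instanceof Error ? err.message : "";
  const reason = Object.hasOwn(FALLBACK_LINES, code)
    ? FALLBACK_LINES[code]
    : `I can't access ${systemName} right now.`;
  // Always end with a concrete next step so the user isn't left in silence.
  return `${reason} In the meantime, ${manualStep}`;
}

// Example:
// spokenFallback(new Error("timeout"), "the ticketing system",
//   "you can file the ticket from the runbook page.")
</code></pre>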
<p><strong>Create a server-side tool runner (example)</strong></p>
<p>Paste this into <code>server.js</code>:</p>
<pre><code class="language-javascript">// Assumes an Express `app` with JSON body parsing (app.use(express.json())).
const TOOL_ALLOWLIST = {
  get_status: { destructive: false },
  create_ticket: { destructive: true },
};

// Simple circuit breaker shared by all tools (demo): 3 consecutive failures
// open the circuit for 10 seconds.
let failures = 0;
let circuitOpenUntil = 0;

function circuitOpen() {
  return Date.now() &lt; circuitOpenUntil;
}

async function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) =&gt; {
    timer = setTimeout(() =&gt; reject(new Error("timeout")), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer running after the race settles
  }
}

async function runToolSafely(tool, args) {
  if (circuitOpen()) throw new Error("circuit_open");

  try {
    // Placeholder: replace with the real call to your tool/integration.
    const result = await withTimeout(Promise.resolve({ ok: true, tool, args }), 2000);
    failures = 0;
    return result;
  } catch (err) {
    failures++;
    if (failures &gt;= 3) circuitOpenUntil = Date.now() + 10_000;
    throw err;
  }
}

app.post("/api/tools/run", async (req, res) =&gt; {
  const { tool, args, user_confirmed } = req.body || {};

  // Own-property check: names like "constructor" must not pass the allowlist.
  if (typeof tool !== "string" || !Object.hasOwn(TOOL_ALLOWLIST, tool)) {
    return res.status(400).json({ error: "Tool not allowed" });
  }

  if (TOOL_ALLOWLIST[tool].destructive &amp;&amp; user_confirmed !== true) {
    return res.status(400).json({ error: "Confirmation required" });
  }

  const started = Date.now();
  try {
    const result = await runToolSafely(tool, args);
    console.log("[tool_call]", { tool, args, ms: Date.now() - started }); // audit log
    res.json({ ok: true, result });
  } catch (err) {
    console.log("[tool_error]", { tool, ms: Date.now() - started, err: String(err) });
    const status = err?.message === "circuit_open" ? 503 : 500;
    res.status(status).json({ ok: false, error: "Tool call failed" });
  }
});
</code></pre>
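<p>The runner above uses one global failure counter, so a flaky ticketing API would also block <code>get_status</code>. Below is a minimal sketch of one breaker per dependency. The <code>CircuitBreaker</code> class and <code>breakerFor</code> helper are hypothetical, and the clock is injectable so the breaker can be tested without waiting.</p>
<pre><code class="language-javascript">// Sketch: a per-dependency circuit breaker (hypothetical helper).
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 10_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openUntil = 0;
  }

  isOpen() {
    return this.openUntil > this.now();
  }

  async call(fn) {
    if (this.isOpen()) throw new Error("circuit_open");
    try {
      const result = await fn();
      this.failures = 0; // a success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      // After the cooldown one call is let through; if it fails, failures is
      // still at or above the threshold, so the circuit reopens immediately.
      if (this.failures >= this.failureThreshold) {
        this.openUntil = this.now() + this.cooldownMs;
      }
      throw err;
    }
  }
}

// One breaker per tool/dependency, created on first use.
const breakers = new Map();
function breakerFor(tool) {
  if (!breakers.has(tool)) breakers.set(tool, new CircuitBreaker());
  return breakers.get(tool);
}
</code></pre>
<p>Inside <code>runToolSafely</code>, the module-level counters could then give way to something like <code>breakerFor(tool).call(() =&gt; withTimeout(callTool(tool, args), 2000))</code>, where <code>callTool</code> stands in for your real integration.</p>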
<h2 id="heading-step-6-add-post-call-processing-where-durable-value-appears">Step 6: Add Post-Call Processing (Where Durable Value Appears)</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/65d350ff-8f20-489f-b5de-9cd59dda5b8c.png" alt="Post-call processing workflow showing transcript storage, queue/worker, summaries/action items, and integration updates." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p>After a call ends, generate structured artifacts:</p>
<ul>
<li><p>summary</p>
</li>
<li><p>action items</p>
</li>
<li><p>follow-up email draft</p>
</li>
<li><p>CRM entry or ticket creation</p>
</li>
</ul>
<p>A production pattern:</p>
<ul>
<li><p>store transcript + metadata</p>
</li>
<li><p>enqueue a background job (queue/worker)</p>
</li>
<li><p>produce outputs as JSON + a human-readable report</p>
</li>
<li><p>apply integrations with retries + idempotency</p>
</li>
<li><p>store a “call report” for audits and incident reviews</p>
</li>
</ul>
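<p>The "retries + idempotency" item can be sketched as a single helper. <code>applyOnce</code> and the in-memory <code>appliedKeys</code> map are hypothetical. A real deployment would keep the keys in a durable store shared by all workers. If the downstream system accepts an idempotency key, it would also pass the key along.</p>
<pre><code class="language-javascript">// Sketch: apply a post-call side effect (e.g. creating a ticket) with retries
// and an idempotency key. appliedKeys is an in-memory stand-in for a durable store.
const appliedKeys = new Map();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function applyOnce(idempotencyKey, fn, { retries = 3, baseDelayMs = 200 } = {}) {
  // Replayed job (webhook retry, worker restart): return the earlier result.
  if (appliedKeys.has(idempotencyKey)) return appliedKeys.get(idempotencyKey);

  for (let attempt = 0; ; attempt++) {
    try {
      const result = await fn();
      appliedKeys.set(idempotencyKey, result); // remember success so replays are no-ops
      return result;
    } catch (err) {
      if (attempt >= retries) throw err; // give up after `retries` retries
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
</code></pre>
<p>A natural key is the call ID plus the action, for example <code>call_123:create_ticket</code>. That way, a duplicate <code>call-ended</code> webhook cannot create a second ticket.</p>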
<p><strong>Create a post-call webhook endpoint (example)</strong></p>
<p>Paste into <code>server.js</code>:</p>
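<pre><code class="language-javascript">app.post("/webhooks/call-ended", async (req, res) =&gt; {
  const payload = req.body || {};
  if (typeof payload.call_id !== "string") {
    return res.status(400).json({ ok: false, error: "call_id is required" });
  }

  console.log("[call_ended]", {
    call_id: payload.call_id,
    ended_at: payload.ended_at,
  });

  // Acknowledge quickly, then process after the response is sent.
  // (In production, enqueue a job on a durable queue instead of setImmediate.)
  setImmediate(() =&gt; {
    try {
      processPostCall(payload);
    } catch (err) {
      console.error("[post_call_error]", { call_id: payload.call_id, err: String(err) });
    }
  });
  res.json({ ok: true });
});

function processPostCall(payload) {
  const transcript = Array.isArray(payload.transcript) ? payload.transcript : [];
  // Placeholder "summary": the first three turns. Swap in a real summarization step here.
  const summary = transcript
    .slice(0, 3)
    .map((t) =&gt; `- ${t.speaker}: ${t.text}`)
    .join("\n");

  const report = {
    call_id: payload.call_id,
    summary,
    action_items: Array.isArray(payload.action_items) ? payload.action_items : [],
    created_at: new Date().toISOString(),
  };

  console.log("[call_report]", report);
}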
</code></pre>
<h3 id="heading-test-it-locally">Test it locally</h3>
<pre><code class="language-shell">curl -X POST http://localhost:3000/webhooks/call-ended \
  -H "Content-Type: application/json" \
  -d '{
    "call_id": "call_123",
    "ended_at": "2026-02-26T00:10:00Z",
    "transcript": [
      {"speaker": "user", "text": "I need help resetting my password."},
      {"speaker": "agent", "text": "Sure — I can help with that."}
    ],
    "action_items": ["Send password reset link", "Verify account email"]
  }'
</code></pre>
<h2 id="heading-production-readiness-checklist">Production readiness checklist</h2>
<h3 id="heading-security"><strong>Security</strong></h3>
<ul>
<li><p>no API keys in the browser</p>
</li>
<li><p>strict allowlist for client actions</p>
</li>
<li><p>confirmation gates for destructive actions</p>
</li>
<li><p>schema validation on all inbound messages</p>
</li>
<li><p>audit logging for actions and tool calls</p>
</li>
</ul>
<h3 id="heading-reliability"><strong>Reliability</strong></h3>
<ul>
<li><p>reconnect strategy for expired tokens</p>
</li>
<li><p>timeouts + circuit breakers for tools</p>
</li>
<li><p>graceful degradation when dependencies fail</p>
</li>
<li><p>idempotent side effects</p>
</li>
</ul>
<h3 id="heading-observability"><strong>Observability</strong></h3>
<p>Log state transitions (for example):<br><strong>listening → thinking → speaking → ended</strong></p>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/694ca88d5ac09a5d68c63854/a1302294-4338-4a3a-ab0d-c50fd34c117f.png" alt="Voice agent state machine showing listening, thinking, speaking, and ended states for observability." style="display:block;margin:0 auto" width="600" height="400" loading="lazy">

<p><strong>Track:</strong></p>
<ul>
<li><p>connect failure rate</p>
</li>
<li><p>end-to-end latency (STT + reasoning + TTS)</p>
</li>
<li><p>tool error rate</p>
</li>
<li><p>reconnect frequency</p>
</li>
</ul>
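<p>One way to log these transitions and derive a latency number from them is a small per-call tracker. In this sketch, <code>CallStateTracker</code> and the allowed-transition table are assumptions. One example of such an assumption is that <code>speaking</code> returns to <code>listening</code> for the next turn. The recorded "response latency" is only the time between entering <code>thinking</code> and entering <code>speaking</code>, so it excludes network and client-side audio delays.</p>
<pre><code class="language-javascript">// Sketch: per-call state tracker (hypothetical; the transition table is an assumption).
// It rejects unexpected transitions, logs each accepted one, and records
// "response latency": time from entering "thinking" to entering "speaking".
const TRANSITIONS = {
  listening: ["thinking", "ended"],
  thinking: ["speaking", "listening", "ended"], // "listening": e.g. the user keeps talking
  speaking: ["listening", "ended"], // back to "listening" for the next turn
  ended: [],
};

class CallStateTracker {
  constructor(callId, now = Date.now) {
    this.callId = callId;
    this.now = now;
    this.state = "listening";
    this.enteredAt = now();
    this.thinkingStartedAt = null;
    this.responseLatenciesMs = [];
  }

  transition(next) {
    if (!TRANSITIONS[this.state].includes(next)) {
      throw new Error(`invalid transition ${this.state} -> ${next}`);
    }
    const t = this.now();
    console.log("[call_state]", {
      call_id: this.callId,
      from: this.state,
      to: next,
      ms_in_previous_state: t - this.enteredAt,
    });
    if (next === "thinking") this.thinkingStartedAt = t;
    // "speaking" is only reachable from "thinking", so thinkingStartedAt is set.
    if (next === "speaking") this.responseLatenciesMs.push(t - this.thinkingStartedAt);
    this.state = next;
    this.enteredAt = t;
  }
}
</code></pre>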
<h3 id="heading-cost-control"><strong>Cost control</strong></h3>
<ul>
<li><p>rate-limit token minting and sessions</p>
</li>
<li><p>cap max call duration</p>
</li>
<li><p>bound context growth (summarize or truncate)</p>
</li>
<li><p>track per-call usage drivers (STT/TTS minutes, tool calls)</p>
</li>
</ul>
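<p>The first two cost controls can be sketched with an in-memory fixed-window limiter and a call timer. <code>FixedWindowLimiter</code>, <code>startCallTimer</code>, and the limits shown are illustrative assumptions. With more than one server instance, the counters would need a shared store.</p>
<pre><code class="language-javascript">// Sketch: in-memory fixed-window rate limiter plus a hard cap on call length.
// Names and limits are illustrative; multiple server instances would need a
// shared store (for example, Redis) instead of this Map.
class FixedWindowLimiter {
  constructor({ limit, windowMs, now = Date.now }) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
    this.windows = new Map(); // key -> { start, count }
  }

  allow(key) {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      this.windows.set(key, { start: t, count: 1 }); // start a new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

const MAX_CALL_MS = 15 * 60 * 1000; // example cap: 15 minutes

// Ends the call after maxMs unless the returned cancel function runs first.
function startCallTimer(endCall, maxMs = MAX_CALL_MS) {
  const timer = setTimeout(endCall, maxMs);
  return () => clearTimeout(timer);
}

// e.g. at most 5 token mints per user per minute
const tokenMintLimiter = new FixedWindowLimiter({ limit: 5, windowMs: 60_000 });
</code></pre>
<p>In your token-minting route, you might respond with HTTP 429 when <code>tokenMintLimiter.allow(userId)</code> returns <code>false</code>. When a session starts, call <code>startCallTimer</code> with a function that ends it.</p>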
<h2 id="heading-optional-resources">Optional resources</h2>
<h3 id="heading-how-to-try-a-managed-voice-platform-quickly">How to Try a Managed Voice Platform Quickly</h3>
<p>If you want a managed provider to test quickly, you can sign up for a <a href="https://vocalbridgeai.com/">Vocal Bridge account</a> and implement these steps using their token minting + real-time session APIs.</p>
<p>But the core production voice agent architecture in this article is vendor-agnostic. You can replace any component (SFU, STT/TTS, agent runtime, tool layer) as long as you preserve the boundaries: secure token service, real-time media, safe tool execution, and strong observability.</p>
<h3 id="heading-watch-a-full-demo-and-explore-a-complete-reference-repo">Watch a full demo and explore a complete reference repo</h3>
<p>If you'd like to see these patterns working together in a realistic scenario (incident triage), here are two optional resources:</p>
<ul>
<li><p><strong>Demo video:</strong> <a href="https://youtu.be/TqrtOKd8Zug">Voice-First Incident Triage (end-to-end run)</a><br>This is a hackathon run-through showing client actions, decision boundaries for irreversible actions, and a structured post-call summary.</p>
</li>
<li><p><strong>GitHub repo (architecture + design + working code):</strong> <a href="https://github.com/natarajsundar/voice-first-incident-triage">https://github.com/natarajsundar/voice-first-incident-triage</a></p>
</li>
</ul>
<p>These links are optional; you can follow the tutorial end-to-end without them.</p>
<h2 id="heading-closing">Closing</h2>
<p>Production-ready voice agents work when you treat them like real-time distributed systems.</p>
<p>Start with the baseline:</p>
<ul>
<li>token service + web client + real-time audio</li>
</ul>
<p>Then layer in:</p>
<ul>
<li><p>controlled client actions</p>
</li>
<li><p>safe tools</p>
</li>
<li><p>post-call automation</p>
</li>
<li><p>observability and cost controls</p>
</li>
</ul>
<p>That’s how you ship a voice agent architecture you can operate. You now have a vendor-neutral reference architecture you can adapt to your stack, with clear trust boundaries, safe tool execution, and operational visibility.</p>
<p>If you’re shipping real-time AI systems, what’s been your biggest production bottleneck so far: <strong>latency, reliability, or tool safety</strong>? I’d love to hear what you’re seeing in the wild. Connect with me on <a href="https://www.linkedin.com/in/natarajsundar/">LinkedIn</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Add Multi-Language Support in Flutter: Manual and AI-Automated Translations for Flutter Apps ]]>
                </title>
                <description>
                    <![CDATA[ As Flutter applications scale beyond a single market, language support becomes a critical requirement. A well-designed app should feel natural to users regardless of their locale, automatically adapting to their language preferences while still givin... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-add-multi-language-support-in-flutter-manual-and-ai-automated-translations-for-flutter-apps/</link>
                <guid isPermaLink="false">697d5a754655a071649990c6</guid>
                
                    <category>
                        <![CDATA[ Flutter ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Dart ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Atuoha Anthony ]]>
                </dc:creator>
                <pubDate>Sat, 31 Jan 2026 01:27:17 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769822678736/98b19125-c06e-4e00-8694-5c2c23abb15f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>As Flutter applications scale beyond a single market, language support becomes a critical requirement. A well-designed app should feel natural to users regardless of their locale, automatically adapting to their language preferences while still giving them control.</p>
<p>This article provides a comprehensive, production-focused guide to supporting multiple languages in a Flutter application using Flutter’s localization system, the <code>intl</code> package, and Bloc for state management. We’ll support English, French, and Spanish, implement automatic language detection, and allow users to manually switch languages from settings, while also exploring the use of AI to automate text translations.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-localization-matters-in-flutter-applications">Why Localization Matters in Flutter Applications</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-flutter-localization-architecture-overview">Flutter Localization Architecture Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-dependencies">How to Set Up Dependencies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-define-supported-languages">How to Define Supported Languages</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-add-localized-text-with-arb-files">How to Add Localized Text with ARB Files</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-generate-localization-code">How to Generate Localization Code</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-configure-materialapp-for-localization">How to Configure MaterialApp for Localization</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-auto-detecting-the-users-device-language">Auto-Detecting the User’s Device Language</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-manage-localization-with-bloc">How to Manage Localization with Bloc</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-display-localized-text-in-widgets">How to Display Localized Text in Widgets</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-language-switching-from-settings">Language Switching from Settings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-add-parameters-to-localized-strings">How to Add Parameters to Localized Strings</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pluralization-and-quantities">Pluralization and Quantities</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-format-dates-numbers-and-currency">How to Format Dates, Numbers, and Currency</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-localization-data-flow">Localization Data Flow</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-common-pitfalls-and-how-to-avoid-them">Common Pitfalls and How to Avoid Them</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-automate-translations-with-ai">How to Automate Translations with AI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-and-considerations">Best Practices and Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-references">References</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before proceeding, you should be comfortable with the following concepts:</p>
<ul>
<li><p><strong>Dart programming language</strong>: variables, classes, functions, and null safety</p>
</li>
<li><p><strong>Flutter fundamentals</strong>: widgets, <code>BuildContext</code>, and widget trees</p>
</li>
<li><p><strong>State management basics</strong>: familiarity with Bloc or similar patterns</p>
</li>
<li><p><strong>Terminal usage</strong>: running Flutter CLI commands</p>
</li>
</ul>
<p>If you have prior experience working with Flutter widgets and basic app architecture, you are well prepared to follow along.</p>
<h2 id="heading-why-localization-matters-in-flutter-applications">Why Localization Matters in Flutter Applications</h2>
<p>Localization (often abbreviated as l10n) is the process of adapting an application for different languages and regions. It goes beyond translating text: it also affects accessibility, user trust, and overall usability. From a technical perspective, localization introduces several challenges: text must be dynamically resolved at runtime, the UI must update instantly when the language changes, language preferences must persist across sessions, and device locale detection must gracefully fall back when a language is unsupported.</p>
<p>Flutter’s localization framework, when combined with <code>intl</code> and Bloc, solves these challenges cleanly and predictably.</p>
<h2 id="heading-flutter-localization-architecture-overview">Flutter Localization Architecture Overview</h2>
<p>Flutter localization is built around three key ideas:</p>
<ol>
<li><p><strong>ARB files</strong> as the source of truth for translated strings</p>
</li>
<li><p><strong>Code generation</strong> to provide type-safe access to translations</p>
</li>
<li><p><strong>Locale-driven rebuilds</strong> of the widget tree</p>
</li>
</ol>
<p>At runtime, the active <code>Locale</code> determines which translation file is used. When the locale changes, Flutter automatically rebuilds dependent widgets.</p>
<h2 id="heading-how-to-set-up-dependencies">How to Set Up Dependencies</h2>
<p>Add the required dependencies to your <code>pubspec.yaml</code>:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">dependencies:</span>
  <span class="hljs-attr">flutter:</span>
    <span class="hljs-attr">sdk:</span> <span class="hljs-string">flutter</span>

  <span class="hljs-attr">flutter_localizations:</span>
    <span class="hljs-attr">sdk:</span> <span class="hljs-string">flutter</span>

  <span class="hljs-attr">intl:</span> <span class="hljs-string">^0.20.2</span>
  <span class="hljs-attr">flutter_bloc:</span> <span class="hljs-string">^8.1.3</span>
  <span class="hljs-attr">arb_translate:</span> <span class="hljs-string">^1.1.0</span>
</code></pre>
<p>Enable localization code generation:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">flutter:</span>
  <span class="hljs-attr">generate:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>This instructs Flutter to generate localization classes from ARB files.</p>
<h2 id="heading-how-to-define-supported-languages">How to Define Supported Languages</h2>
<p>For this guide, the application will support:</p>
<ul>
<li><p>English (<code>en</code>)</p>
</li>
<li><p>French (<code>fr</code>)</p>
</li>
<li><p>Spanish (<code>es</code>)</p>
</li>
</ul>
<p>These locales will be declared centrally and used throughout the app.</p>
<h2 id="heading-how-to-add-localized-text-with-arb-files">How to Add Localized Text with ARB Files</h2>
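<p>Flutter uses <strong>Application Resource Bundle (ARB)</strong> files to store localized strings. Each supported language has its own ARB file. By default, <code>flutter gen-l10n</code> reads them from <code>lib/l10n/</code> and uses <code>app_en.arb</code> as the template file. You can change both in an <code>l10n.yaml</code> file at the project root.</p>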
<h3 id="heading-english-appenarb">English – <code>app_en.arb</code></h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"en"</span>,
  <span class="hljs-attr">"enter_email_address_to_reset"</span>: <span class="hljs-string">"Enter your email address to reset"</span>
}
</code></pre>
<h3 id="heading-french-appfrarb">French – <code>app_fr.arb</code></h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"fr"</span>,
  <span class="hljs-attr">"enter_email_address_to_reset"</span>: <span class="hljs-string">"Entrez votre adresse e-mail pour réinitialiser"</span>
}
</code></pre>
<h3 id="heading-spanish-appesarb">Spanish – <code>app_es.arb</code></h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"es"</span>,
  <span class="hljs-attr">"enter_email_address_to_reset"</span>: <span class="hljs-string">"Ingrese su dirección de correo electrónico para restablecer"</span>
}
</code></pre>
<p>Each key must be identical across files. Only the values change per language.</p>
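<p><code>flutter gen-l10n</code> already warns about untranslated messages. If you also want a CI check that flags stray keys, a small Node.js script can compare the ARB files. This is a JavaScript tooling sketch, not Flutter code. <code>compareArbKeys</code> is a hypothetical helper that takes ARB files that have already been parsed as JSON.</p>
<pre><code class="lang-javascript">// Sketch (Node.js tooling, not Flutter code): report message keys that are in the
// template ARB but missing from a translation, and keys a translation adds.
// Keys starting with "@" are metadata ("@@locale", "@greetingMessage") and are ignored.
function messageKeys(arb) {
  return new Set(Object.keys(arb).filter((key) => !key.startsWith("@")));
}

function compareArbKeys(template, translations) {
  const templateKeys = messageKeys(template);
  const report = {};
  for (const [locale, arb] of Object.entries(translations)) {
    const keys = messageKeys(arb);
    report[locale] = {
      missing: [...templateKeys].filter((k) => !keys.has(k)),
      extra: [...keys].filter((k) => !templateKeys.has(k)),
    };
  }
  return report;
}
</code></pre>
<p>To run it against real files, read each <code>app_*.arb</code> file from your ARB directory and parse it with <code>JSON.parse</code>. Fail the build when any <code>missing</code> or <code>extra</code> list is non-empty.</p>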
<h2 id="heading-how-to-generate-localization-code">How to Generate Localization Code</h2>
<p>Run the following command in your terminal:</p>
<pre><code class="lang-bash">flutter gen-l10n
</code></pre>
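<p>Flutter generates a strongly typed localization class. Older Flutter versions place it in the synthetic <code>flutter_gen</code> package at the path below. Newer versions deprecate that synthetic package and generate the file inside your project instead (by default, next to your ARB files):</p>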
<pre><code class="lang-dart">.dart_tool/flutter_gen/gen_l10n/app_localizations.dart
</code></pre>
<p>Each ARB key becomes a getter on the generated <code>AppLocalizations</code> class, accessed like this:</p>
<pre><code class="lang-dart">AppLocalizations.of(context)!.enter_email_address_to_reset
</code></pre>
<h2 id="heading-how-to-configure-materialapp-for-localization">How to Configure <code>MaterialApp</code> for Localization</h2>
<p>The <code>MaterialApp</code> widget must be configured with localization delegates and supported locales:</p>
<pre><code class="lang-dart">MaterialApp(
  localizationsDelegates: <span class="hljs-keyword">const</span> [
    AppLocalizations.delegate,
    GlobalMaterialLocalizations.delegate,
    GlobalWidgetsLocalizations.delegate,
    GlobalCupertinoLocalizations.delegate,
  ],
  supportedLocales: <span class="hljs-keyword">const</span> [
    Locale(<span class="hljs-string">'en'</span>),
    Locale(<span class="hljs-string">'fr'</span>),
    Locale(<span class="hljs-string">'es'</span>),
  ],
  locale: state.locale,
  home: <span class="hljs-keyword">const</span> MyHomePage(),
)
</code></pre>
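<p>The <code>locale</code> property is controlled by Bloc, allowing dynamic updates at runtime. Here, <code>state</code> is the current <code>AppLocalizationState</code>, so this <code>MaterialApp</code> is assumed to be built inside a <code>BlocBuilder</code> that listens to <code>AppLocalizationBloc</code>. The <code>GlobalMaterialLocalizations</code>, <code>GlobalWidgetsLocalizations</code>, and <code>GlobalCupertinoLocalizations</code> delegates come from <code>package:flutter_localizations/flutter_localizations.dart</code>.</p>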
<h2 id="heading-auto-detecting-the-users-device-language">Auto-Detecting the User’s Device Language</h2>
<p>Flutter exposes the device locale via <code>PlatformDispatcher</code> (from <code>dart:ui</code>). We can use this to automatically select the most appropriate supported language.</p>
<pre><code class="lang-dart"><span class="hljs-keyword">void</span> detectLanguageAndSet() {
  Locale deviceLocale = PlatformDispatcher.instance.locale;

  Locale selectedLocale = AppLocalizations.supportedLocales.firstWhere(
    (supported) =&gt; supported.languageCode == deviceLocale.languageCode,
    orElse: () =&gt; <span class="hljs-keyword">const</span> Locale(<span class="hljs-string">'en'</span>),
  );

  <span class="hljs-built_in">print</span>(<span class="hljs-string">'Using Locale: <span class="hljs-subst">${selectedLocale.languageCode}</span>'</span>);

  GlobalConfig.storageService.setStringValue(
    AppStrings.DETECTED_LANGUAGE,
    selectedLocale.languageCode,
  );

  context.read&lt;AppLocalizationBloc&gt;().add(
    SetLocale(locale: selectedLocale),
  );
}
</code></pre>
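<p>This approach reads the device language, matches it against supported locales, falls back to English when the language is unsupported, persists the detected language, and updates the UI instantly. <code>GlobalConfig.storageService</code> and <code>AppStrings.DETECTED_LANGUAGE</code> are this app’s own storage helper and key, not Flutter APIs; any key-value store, such as the <code>shared_preferences</code> package, works. Because the function reads <code>context</code>, call it from a widget’s <code>State</code>.</p>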
<h2 id="heading-how-to-manage-localization-with-bloc">How to Manage Localization with Bloc</h2>
<p>Bloc provides a predictable and testable way to manage application-wide locale changes.</p>
<h3 id="heading-localization-state">Localization State</h3>
<pre><code class="lang-dart"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AppLocalizationState</span> </span>{
  <span class="hljs-keyword">final</span> Locale locale;
  <span class="hljs-keyword">const</span> AppLocalizationState(<span class="hljs-keyword">this</span>.locale);
}
</code></pre>
<h3 id="heading-localization-event">Localization Event</h3>
<pre><code class="lang-dart"><span class="hljs-keyword">abstract</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AppLocalizationEvent</span> </span>{}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SetLocale</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">AppLocalizationEvent</span> </span>{
  <span class="hljs-keyword">final</span> Locale locale;
  SetLocale({<span class="hljs-keyword">required</span> <span class="hljs-keyword">this</span>.locale});
}
</code></pre>
<h3 id="heading-localization-bloc">Localization Bloc</h3>
<pre><code class="lang-dart"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AppLocalizationBloc</span>
    <span class="hljs-keyword">extends</span> <span class="hljs-title">Bloc</span>&lt;<span class="hljs-title">AppLocalizationEvent</span>, <span class="hljs-title">AppLocalizationState</span>&gt; </span>{
  AppLocalizationBloc()
      : <span class="hljs-keyword">super</span>(<span class="hljs-keyword">const</span> AppLocalizationState(Locale(<span class="hljs-string">'en'</span>))) {
    <span class="hljs-keyword">on</span>&lt;SetLocale&gt;((event, emit) {
      emit(AppLocalizationState(event.locale));
    });
  }
}
</code></pre>
<p>The <code>AppLocalizationBloc</code> manages the app’s language state. It starts with English (<code>Locale('en')</code>) as the default. When it receives a <code>SetLocale</code> event, it emits a new state with the locale from that event. Because <code>MaterialApp</code> reads its <code>locale</code> from this state, dispatching <code>SetLocale</code> rebuilds the app in the new language.</p>
<h2 id="heading-how-to-display-localized-text-in-widgets">How to Display Localized Text in Widgets</h2>
<p>Once localization is configured, using translated text is straightforward:</p>
<pre><code class="lang-dart">Text(
  AppLocalizations.of(context)!.enter_email_address_to_reset,
  style: getRegularStyle(
    color: Colors.white,
    fontSize: FontSize.s16,
  ),
)
</code></pre>
<p><code>AppLocalizations.of(context)!.enter_email_address_to_reset</code> looks up the <code>enter_email_address_to_reset</code> message for the active locale in the generated localization class. The <code>!</code> assumes <code>AppLocalizations.delegate</code> is registered in <code>MaterialApp</code>; without it, <code>of(context)</code> returns <code>null</code>.</p>
<h2 id="heading-language-switching-from-settings">Language Switching from Settings</h2>
<p>Users should always be able to override automatic language detection.</p>
<pre><code class="lang-dart">ListTile(
  title: <span class="hljs-keyword">const</span> Text(<span class="hljs-string">'French'</span>),
  onTap: () {
    context.read&lt;AppLocalizationBloc&gt;().add(
      SetLocale(locale: <span class="hljs-keyword">const</span> Locale(<span class="hljs-string">'fr'</span>)),
    );
  },
)
</code></pre>
<p>This <code>ListTile</code> displays the text <strong>"French"</strong>. When tapped, it dispatches a <code>SetLocale</code> event that switches the app’s locale to French (<code>'fr'</code>). The Bloc shown above only updates in-memory state, so the choice is lost when the app restarts. To restore it on the next launch, also save the language code (for example, with the same storage service used in <code>detectLanguageAndSet</code>) and read it back at startup.</p>
<h2 id="heading-how-to-add-parameters-to-localized-strings">How to Add Parameters to Localized Strings</h2>
<p>Real-world applications rarely display static text. Messages often include <strong>dynamic values</strong> such as user names, counts, dates, or prices. Flutter’s localization system, powered by <code>intl</code>, supports <strong>parameterized (interpolated) strings</strong> in a type-safe way.</p>
<h3 id="heading-where-parameters-are-defined">Where Parameters Are Defined</h3>
<p>Parameters are defined inside ARB files alongside the localized string itself, with each parameterized message consisting of the message string containing placeholders and a corresponding metadata entry that describes those placeholders.</p>
<h3 id="heading-example-parameterized-text">Example: Parameterized Text</h3>
<p>Suppose we want to display a greeting message that includes a user’s name.</p>
<h4 id="heading-english-appenarb-1">English – <code>app_en.arb</code></h4>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"en"</span>,
  <span class="hljs-attr">"greetingMessage"</span>: <span class="hljs-string">"Hello {username}!"</span>,
  <span class="hljs-attr">"@greetingMessage"</span>: {
    <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Greeting message shown on the home screen"</span>,
    <span class="hljs-attr">"placeholders"</span>: {
      <span class="hljs-attr">"username"</span>: {
        <span class="hljs-attr">"type"</span>: <span class="hljs-string">"String"</span>
      }
    }
  }
}
</code></pre>
<p>This defines a parameterized English message (<code>"@@locale": "en"</code>). The <code>"greetingMessage"</code> value, <code>"Hello {username}!"</code>, contains a <code>{username}</code> placeholder that is replaced with the user’s name at runtime. The <code>"@greetingMessage"</code> entry holds metadata: a description noting that the string appears on the home screen, and a <code>"placeholders"</code> section declaring <code>username</code> as a <code>String</code>. For example, if the username is <code>"Alice"</code>, the message renders as <code>"Hello Alice!"</code>. Placeholder metadata is only required in the template (English) file, which is why the French and Spanish files below omit it.</p>
<h4 id="heading-french-appfrarb-1">French – <code>app_fr.arb</code></h4>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"fr"</span>,
  <span class="hljs-attr">"greetingMessage"</span>: <span class="hljs-string">"Bonjour {username} !"</span>
}
</code></pre>
<h4 id="heading-spanish-appesarb-1">Spanish – <code>app_es.arb</code></h4>
<pre><code class="lang-json">{
  <span class="hljs-attr">"@@locale"</span>: <span class="hljs-string">"es"</span>,
  <span class="hljs-attr">"greetingMessage"</span>: <span class="hljs-string">"¡Hola {username}!"</span>
}
</code></pre>
<p>The placeholder name (<code>{username}</code>) <strong>must be identical across all ARB files</strong>.</p>
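<p>A small Node.js tooling script (not Flutter code) can catch a renamed placeholder, such as <code>{userName}</code> in one file and <code>{username}</code> in another. <code>placeholdersIn</code> and <code>comparePlaceholders</code> are hypothetical helpers. The regex is a rough heuristic rather than a full ICU message parser.</p>
<pre><code class="lang-javascript">// Sketch (Node.js tooling): compare placeholder names between the template ARB
// and a translation. Heuristic: "{name}" and "{name, plural, ...}" count as
// placeholders; a single-word plural branch such as =1{item} would be misread.
function placeholdersIn(message) {
  const names = new Set();
  for (const match of message.matchAll(/\{\s*([A-Za-z_]\w*)\s*[,}]/g)) {
    names.add(match[1]);
  }
  return names;
}

function comparePlaceholders(templateArb, translatedArb) {
  const problems = [];
  for (const [key, message] of Object.entries(templateArb)) {
    if (key.startsWith("@")) continue; // metadata entries
    if (typeof translatedArb[key] !== "string") continue; // missing keys: a separate check
    const expected = [...placeholdersIn(message)].sort().join(",");
    const actual = [...placeholdersIn(translatedArb[key])].sort().join(",");
    if (expected !== actual) problems.push({ key, expected, actual });
  }
  return problems;
}
</code></pre>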
<h3 id="heading-generated-dart-api">Generated Dart API</h3>
<p>After running:</p>
<pre><code class="lang-bash">flutter gen-l10n
</code></pre>
<p>Flutter generates a strongly typed method instead of a simple getter:</p>
<pre><code class="lang-dart"><span class="hljs-built_in">String</span> greetingMessage(<span class="hljs-built_in">String</span> username)
</code></pre>
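<p>Because the argument is part of the method signature, calling <code>greetingMessage</code> without a username, or with a non-<code>String</code> value, is caught at compile time instead of producing a broken string at runtime.</p>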
<h3 id="heading-how-to-use-parameterized-strings-in-widgets">How to Use Parameterized Strings in Widgets</h3>
<pre><code class="lang-dart">Text(
  AppLocalizations.of(context)!.greetingMessage(<span class="hljs-string">'Tony'</span>),
)
</code></pre>
<p>If the locale is set to French, the output becomes:</p>
<pre><code class="lang-bash">Bonjour Tony !
</code></pre>
<h2 id="heading-pluralization-and-quantities">Pluralization and Quantities</h2>
<p>Another common localization requirement is <strong>pluralization</strong>. Languages differ significantly in how they express quantities, and hardcoding plural logic in Dart quickly becomes error-prone.</p>
<h3 id="heading-defining-plural-messages-in-arb">Defining Plural Messages in ARB</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"itemsCount"</span>: <span class="hljs-string">"{count, plural, =0{No items} =1{1 item} other{{count} items}}"</span>,
  <span class="hljs-attr">"@itemsCount"</span>: {
    <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Displays the number of items"</span>,
    <span class="hljs-attr">"placeholders"</span>: {
      <span class="hljs-attr">"count"</span>: {
        <span class="hljs-attr">"type"</span>: <span class="hljs-string">"int"</span>
      }
    }
  }
}
</code></pre>
<p>This defines a <strong>pluralized message</strong> for <code>itemsCount</code>. The string <code>{count, plural, =0{No items} =1{1 item} other{{count} items}}</code> dynamically changes based on the value of <code>count</code>: it shows <strong>"No items"</strong> when <code>count</code> is 0, <strong>"1 item"</strong> when <code>count</code> is 1, and <strong>"{count} items"</strong> for all other values. The metadata entry <code>"@itemsCount"</code> provides a description and specifies that the placeholder <code>count</code> is of type <code>int</code>.</p>
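<p>Each ARB file uses the same <code>itemsCount</code> key but can supply the plural forms its language needs. The plural syntax supports the categories <code>zero</code>, <code>one</code>, <code>two</code>, <code>few</code>, <code>many</code>, and <code>other</code>, plus exact matches such as <code>=0</code>.</p>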
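<p>You can see how differently languages group numbers by querying the Unicode CLDR plural rules directly. The snippet below uses JavaScript’s built-in <code>Intl.PluralRules</code> purely as an illustration. Dart’s <code>intl</code> package also bases its plural logic on CLDR, though its details (such as exact-match handling) may differ.</p>
<pre><code class="lang-javascript">// Illustration only: JavaScript's built-in Intl.PluralRules exposes Unicode CLDR
// plural categories. The same number can land in different categories per language.
const samples = [0, 1, 2, 5, 21];

for (const locale of ["en", "fr", "es", "ru", "ar"]) {
  const rules = new Intl.PluralRules(locale);
  const row = samples.map((n) => `${n}=${rules.select(n)}`).join("  ");
  console.log(`${locale}: ${row}`);
}
</code></pre>
<p>With full ICU data (the default in current Node.js builds), this shows several differences. For example, 0 is <code>other</code> in English and Spanish but <code>one</code> in French, Russian uses <code>few</code> and <code>many</code>, and Arabic has a separate <code>zero</code> category.</p>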
<h3 id="heading-using-pluralized-messages">Using Pluralized Messages</h3>
<pre><code class="lang-dart">Text(
  AppLocalizations.of(context)!.itemsCount(<span class="hljs-number">3</span>),
)
</code></pre>
<p>Flutter automatically applies the correct plural form based on the active locale.</p>
<h2 id="heading-how-to-format-dates-numbers-and-currency">How to Format Dates, Numbers, and Currency</h2>
<p>The <code>intl</code> package also provides locale-aware formatting utilities. These should be used <strong>in combination with localized strings</strong>, not as replacements.</p>
<h3 id="heading-date-formatting-example">Date Formatting Example</h3>
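<pre><code class="lang-dart"><span class="hljs-keyword">import</span> <span class="hljs-string">'package:intl/intl.dart'</span>;

<span class="hljs-comment">// Inside build(): format today's date for the active locale</span>
<span class="hljs-keyword">final</span> formattedDate = DateFormat.yMMMMd(
  Localizations.localeOf(context).toString(),
).format(<span class="hljs-built_in">DateTime</span>.now());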
</code></pre>
<pre><code class="lang-dart">Text(
  AppLocalizations.of(context)!.lastLoginDate(formattedDate),
)
</code></pre>
<p>This ensures that both language and formatting rules align with the user’s locale.</p>
<h2 id="heading-localization-data-flow">Localization Data Flow</h2>
<p>Localization is handled as an explicit data flow, with locale resolution modeled as application state rather than a static configuration passed into <code>MaterialApp</code>.</p>
<p>The process starts with the <strong>device locale</strong>, obtained from the platform layer at startup. This value represents the system’s preferred language and region but is not applied directly to the UI.</p>
<p>Instead, it flows through a <code>detectLanguageAndSet</code> step responsible for applying application-specific rules. This layer typically handles locale normalization and fallback logic, such as mapping unsupported locales to supported ones, restoring a user-selected language from persistent storage, or enforcing product constraints around available translations.</p>
<p>The resolved locale is then emitted into a <strong>Localization Bloc</strong>, which acts as the single source of truth for localization state. By centralizing locale management, the application can support runtime language changes, ensure predictable rebuilds, and keep localization logic decoupled from both the widget tree and platform APIs.</p>
<p>The Bloc feeds into the <code>locale</code> property of <code>MaterialApp</code>, which is the integration point with Flutter’s localization system. Updating this value triggers a rebuild of the <code>Localizations</code> scope and causes all dependent widgets to resolve strings for the active locale.</p>
<p>At the edge of the system, <strong>localized widgets</strong> consume the generated localization classes produced by <code>flutter gen-l10n</code>. These widgets remain agnostic to how the locale was selected or updated. They simply react to the localization context provided by the framework.</p>
<p>This architecture cleanly separates:</p>
<ul>
<li><p>Locale detection</p>
</li>
<li><p>Business logic and state management</p>
</li>
<li><p>Framework-level localization</p>
</li>
<li><p>UI rendering</p>
</li>
</ul>
<p>As a result, localization behavior remains explicit, maintainable, and compatible with automated translation workflows and CI-driven localization updates.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769595931473/c2b082be-d3f8-4dc5-90cf-a61712cb9f8f.png" alt="Localization Data Flow" class="image--center mx-auto" width="617" height="690" loading="lazy"></p>
<h2 id="heading-common-pitfalls-and-how-to-avoid-them"><strong>Common Pitfalls and How to Avoid Them</strong></h2>
<ol>
<li><p><strong>Avoid manual string concatenation</strong>. For example, do not build text with <code>'Hello ' + name</code>. Word order differs between languages, so use a localized template with a placeholder instead.</p>
</li>
<li><p><strong>Never hardcode plural logic in Dart</strong>. Always use <code>intl</code>’s pluralization features to handle different languages correctly.</p>
</li>
<li><p><strong>Avoid locale-specific formatting outside <code>intl</code> utilities</strong>. Format dates, numbers, and currencies with the <code>intl</code> formatters (such as <code>DateFormat</code> and <code>NumberFormat</code>) so they follow the active locale.</p>
</li>
<li><p><strong>Always regenerate localization files after updating ARB files</strong>. This ensures the app reflects all the latest translations.</p>
</li>
</ol>
<h2 id="heading-how-to-automate-translations-with-ai">How to Automate Translations with AI</h2>
<p>In Flutter applications that rely on ARB files for localization, translation maintenance becomes increasingly costly as the application grows. Each new message must be manually propagated across locale files, often resulting in missing keys, inconsistent phrasing, or delayed updates. This problem is amplified in projects that do not use a Translation Management System (TMS) and instead keep ARB files directly in the repository.</p>
<p>While many TMS platforms have begun adding AI-assisted translation features, not all projects use a TMS at all, particularly small teams, internal tools, or personal projects. In these cases, developers frequently resort to copying strings into AI chat tools and pasting results back into ARB files, which is inefficient and difficult to scale.</p>
<p>To address this workflow gap, <strong>LeanCode</strong> published the <code>arb_translate</code> package, a Dart-based CLI tool that automates missing ARB translations using large language models.</p>
<h3 id="heading-design-approach">Design Approach</h3>
<p>The design of <code>arb_translate</code> aligns with Flutter’s existing localization pipeline rather than replacing it:</p>
<ul>
<li><p>English ARB files remain the source of truth</p>
</li>
<li><p>Only missing keys are translated</p>
</li>
<li><p>Output is written back as standard ARB files</p>
</li>
<li><p><code>flutter gen-l10n</code> is still responsible for code generation</p>
</li>
</ul>
<p>This design makes the tool suitable for both local development and CI usage, without introducing new runtime dependencies or localization abstractions.</p>
<p>At a high level, the flow is:</p>
<ol>
<li><p>Parse the base (typically English) ARB file</p>
</li>
<li><p>Identify missing keys in target locale ARB files</p>
</li>
<li><p>Send key–value pairs to an LLM via API</p>
</li>
<li><p>Receive translated strings</p>
</li>
<li><p>Update or generate locale-specific ARB files</p>
</li>
<li><p>Run <code>flutter gen-l10n</code> to regenerate localized resources</p>
</li>
</ol>
<h3 id="heading-gemini-based-setup">Gemini-Based Setup</h3>
<p>To use Gemini for ARB translation:</p>
<ol>
<li><p>Generate a Gemini API key<br> <a target="_blank" href="https://ai.google.dev/tutorials/setup">https://ai.google.dev/tutorials/setup</a></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769596589542/596648f3-11ca-4768-befe-341b38e8c1f1.png" alt="Gemini API Dashboard" class="image--center mx-auto" width="1534" height="953" loading="lazy"></p>
</li>
<li><p>Install the CLI:</p>
</li>
</ol>
<pre><code class="lang-bash">dart pub global activate arb_translate
</code></pre>
<ol start="3">
<li>Export the API key:</li>
</ol>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> ARB_TRANSLATE_API_KEY=your-api-key
</code></pre>
<ol start="4">
<li>Run the tool from the Flutter project root:</li>
</ol>
<pre><code class="lang-bash">arb_translate
</code></pre>
<p>The tool scans existing ARB files, generates missing translations, and writes them back to disk.</p>
<h3 id="heading-openaichatgpt-support">OpenAI/ChatGPT Support</h3>
<p>As of version <strong>1.0.0</strong>, <code>arb_translate</code> also supports OpenAI ChatGPT models. This allows teams to standardize on OpenAI infrastructure or switch providers without changing their localization workflow.</p>
<ol>
<li><p>Generate an OpenAI API key<br> <a target="_blank" href="https://platform.openai.com/api-keys">https://platform.openai.com/api-keys</a></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769596780166/28b6ef5d-3ff2-4c31-b8a4-fa3505459977.png" alt="OpenAI Platform" class="image--center mx-auto" width="1519" height="751" loading="lazy"></p>
</li>
<li><p>Install the tool:</p>
</li>
</ol>
<pre><code class="lang-bash">dart pub global activate arb_translate
</code></pre>
<ol start="3">
<li>Export the API key:</li>
</ol>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> ARB_TRANSLATE_API_KEY=your-api-key
</code></pre>
<ol start="4">
<li>Select OpenAI as the provider:</li>
</ol>
<p>Via <code>l10n.yaml</code>:</p>
<pre><code class="lang-yaml">arb-translate-model-provider: open-ai
</code></pre>
<p>Or via CLI:</p>
<pre><code class="lang-bash">arb_translate --model-provider open-ai
</code></pre>
<ol start="5">
<li>Execute:</li>
</ol>
<pre><code class="lang-bash">arb_translate
</code></pre>
<h3 id="heading-practical-use-cases">Practical Use Cases</h3>
<p>This approach is not intended to replace professional translation or review workflows. Instead, it serves as a <strong>repeatable automation layer</strong> that:</p>
<ul>
<li><p>Eliminates manual copy-paste workflows</p>
</li>
<li><p>Keeps ARB files structurally consistent</p>
</li>
<li><p>Enables translation generation in CI</p>
</li>
<li><p>Allows downstream review in a TMS if required</p>
</li>
</ul>
<p>For content-heavy Flutter applications or teams without a dedicated localization platform, this provides a pragmatic and maintainable solution.</p>
<h2 id="heading-best-practices-and-considerations"><strong>Best Practices and Considerations</strong></h2>
<ol>
<li><p>Always define a fallback locale to ensure the app remains usable.</p>
</li>
<li><p>Avoid hardcoding user-facing strings; rely on localized resources.</p>
</li>
<li><p>Use semantic and stable ARB keys for maintainability.</p>
</li>
<li><p>Persist user language preferences to provide a consistent experience.</p>
</li>
<li><p>Test your app with long translations and multiple locales to catch layout or UI issues.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Localization is a foundational requirement for modern Flutter applications. By combining Flutter’s built-in localization framework, the <code>intl</code> package, and Bloc for state management, you gain a robust and scalable solution.</p>
<p>With automatic device language detection, runtime switching, and clean architecture, your application becomes globally accessible without sacrificing maintainability.</p>
<h2 id="heading-references">References</h2>
<p>Here are official links you can use as references for Flutter localization:</p>
<ul>
<li><p><strong>Flutter Internationalization Guide</strong> – Official Flutter guide on how to internationalize your app:<br>  <a target="_blank" href="https://docs.flutter.dev/ui/accessibility-and-internationalization/internationalization">https://docs.flutter.dev/ui/accessibility-and-internationalization/internationalization</a></p>
</li>
<li><p><strong>Dart</strong> <code>intl</code> Package Documentation – API reference for the <code>intl</code> library used for formatting and localization utilities:<br>  <a target="_blank" href="https://api.flutter.dev/flutter/package-intl_intl/index.html">https://api.flutter.dev/flutter/package-intl_intl/index.html</a></p>
</li>
<li><p><strong>Flutter</strong> <code>flutter_localizations</code> API – API docs for the <code>flutter_localizations</code> library that provides localized strings and resources for Flutter widgets:<br>  <a target="_blank" href="https://api.flutter.dev/flutter/flutter_localizations/">https://api.flutter.dev/flutter/flutter_localizations/</a></p>
</li>
<li><p><strong>Flutter App Localization with AI (LeanCode)</strong> – A guide on speeding up Flutter localization using AI and tools like Gemini or ChatGPT, including details on the <code>arb_translate</code> package.<br>  <a target="_blank" href="https://leancode.co/blog/flutter-app-localization-with-ai">https://leancode.co/blog/flutter-app-localization-with-ai</a></p>
</li>
<li><p><code>arb_translate</code> package (pub.dev) – A tool for automating ARB file translations in Flutter:<br>  <a target="_blank" href="https://pub.dev/packages/arb_translate">https://pub.dev/packages/arb_translate</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Turn Your Favorite Tech Blogs into a Personal Podcast ]]>
                </title>
                <description>
                    <![CDATA[ These days it feels almost impossible to keep up with tech news. I step away for three days, and suddenly there is a new AI model, a new framework, and a new tool everyone says I must learn. Reading everything no longer scales, but I still want to st... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-turn-your-favorite-blogs-into-personal-podcast/</link>
                <guid isPermaLink="false">6971493162f064cd7502688f</guid>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Spruce Emmanuel ]]>
                </dc:creator>
                <pubDate>Wed, 21 Jan 2026 21:46:25 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769029504274/8900a8bf-73cd-4944-b0d6-e440efd1bc96.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>These days it feels almost impossible to keep up with tech news. I step away for three days, and suddenly there is a new AI model, a new framework, and a new tool everyone says I must learn. Reading everything no longer scales, but I still want to stay informed.</p>
<p>So I decided to change the format instead of giving up. I took a few tech blogs I already enjoy reading, picked the best articles, converted them to audio using my own voice, and turned the result into a private podcast. Now I can stay up to date while walking, running, or driving.</p>
<p>In this tutorial, you’ll learn how to build a simplified version of that pipeline step by step.</p>
<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-you-are-going-to-build">What You Are Going to Build</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-getting-started">Getting Started</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-get-the-content">How to Get the Content</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-filter-the-content">How to Filter the Content</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-clean-up-the-content">How to Clean Up the Content</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-convert-content-to-audio">How to Convert Content to Audio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-upload-the-audio-to-cloudflare-r2">How to Upload the Audio to Cloudflare R2</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-make-the-podcast">How to Make the Podcast</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-automate-the-pipeline">How to Automate the Pipeline</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-you-are-going-to-build"><strong>What You Are Going to Build</strong></h2>
<p>You will build a Node.js script that does the following:</p>
<ul>
<li><p>Fetches articles from RSS feeds.</p>
</li>
<li><p>Extracts clean, readable text from each article.</p>
</li>
<li><p>Filters out content you do not want to listen to.</p>
</li>
<li><p>Cleans the text so it sounds good when spoken.</p>
</li>
<li><p>Converts the text to natural-sounding audio using your own voice.</p>
</li>
<li><p>Uploads the audio to Cloudflare R2.</p>
</li>
<li><p>Generates a podcast RSS feed.</p>
</li>
<li><p>Runs automatically on a schedule.</p>
</li>
</ul>
<p>At the end, you will have a real podcast feed you can subscribe to on your phone.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768711883596/a35c2a6b-6f9f-4f3d-898f-0f9bff798e6e.png" alt="The generated podcast showing converted blog posts as episodes." class="image--center mx-auto" width="1536" height="1024" loading="lazy"></p>
<p>If you want to skip the tutorial and jump straight into using the finished tool, you can find the complete version and instructions on <a target="_blank" href="https://github.com/iamspruce/postcast">GitHub</a>.</p>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>To follow along, you need basic JavaScript knowledge.</p>
<p>You also need:</p>
<ul>
<li><p>Node.js 22 or newer.</p>
</li>
<li><p>A place to store audio files (<a target="_blank" href="https://dash.cloudflare.com/">Cloudflare</a> R2 in this tutorial).</p>
</li>
<li><p>A text-to-speech API (<a target="_blank" href="http://orangeclone.com">OrangeClone</a> in this tutorial).</p>
</li>
</ul>
<h2 id="heading-project-overview"><strong>Project Overview</strong></h2>
<p>Before writing code, it helps to understand the idea clearly.</p>
<p>This project is a pipeline:</p>
<pre><code class="lang-text">Fetch content -&gt; Filter content -&gt; Clean up content -&gt; Convert to audio -&gt; Repeat
</code></pre>
<p>Each step takes the output of the previous one. Keeping the flow linear makes the project easier to reason about, debug, and automate.</p>
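<p>Because each stage only consumes the previous stage’s output, the flow can also be written as an ordered list of functions. The sketch below is not part of the original script: <code>runPipeline</code> is a hypothetical helper, and it assumes every step takes one value and returns (or resolves to) the next one.</p>
<pre><code class="lang-javascript">// Hypothetical helper: feed each step the previous step's output, in order.
// Steps may be sync or async; awaiting a plain value simply returns it.
async function runPipeline(input, steps) {
  let value = input;
  for (const step of steps) {
    value = await step(value);
  }
  return value;
}
</code></pre>
<p>Keeping the stages in an array makes it easy to add, remove, or reorder one (for example, a new filter) without touching the others.</p>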
<p>All code in this tutorial lives in a single file called <code>index.js</code>.</p>
<h2 id="heading-getting-started"><strong>Getting Started</strong></h2>
<p>Create a new project folder and your main file.</p>
<pre><code class="lang-bash">mkdir podcast-pipeline
<span class="hljs-built_in">cd</span> podcast-pipeline
touch index.js
</code></pre>
<p>Initialize the project and install dependencies.</p>
<pre><code class="lang-bash">npm init -y
npm install rss-parser @mozilla/readability jsdom node-fetch uuid xmlbuilder @aws-sdk/client-s3
</code></pre>
<p>Enable ESM so <code>import</code> syntax works in Node 22.</p>
<pre><code class="lang-bash">npm pkg <span class="hljs-built_in">set</span> <span class="hljs-built_in">type</span>=module
</code></pre>
<p>Here is what each dependency is used for:</p>
<ul>
<li><p><code>rss-parser</code> reads RSS feeds.</p>
</li>
<li><p><code>@mozilla/readability</code> extracts readable article text.</p>
</li>
<li><p><code>jsdom</code> provides a DOM for Readability.</p>
</li>
<li><p><code>node-fetch</code> fetches remote content.</p>
</li>
<li><p><code>uuid</code> generates unique filenames.</p>
</li>
<li><p><code>xmlbuilder</code> creates the podcast RSS feed.</p>
</li>
<li><p><code>@aws-sdk/client-s3</code> uploads audio to Cloudflare R2.</p>
</li>
</ul>
<h2 id="heading-how-to-get-the-content"><strong>How to Get the Content</strong></h2>
<p>The first decision is where your content comes from.</p>
<p>Avoid scraping websites directly. Scraped HTML is noisy and inconsistent. RSS feeds are structured and reliable. Most serious blogs provide one.</p>
<p>Open <code>index.js</code> and define your sources.</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> Parser <span class="hljs-keyword">from</span> <span class="hljs-string">"rss-parser"</span>;
<span class="hljs-keyword">import</span> fetch <span class="hljs-keyword">from</span> <span class="hljs-string">"node-fetch"</span>;
<span class="hljs-keyword">import</span> { JSDOM } <span class="hljs-keyword">from</span> <span class="hljs-string">"jsdom"</span>;
<span class="hljs-keyword">import</span> { Readability } <span class="hljs-keyword">from</span> <span class="hljs-string">"@mozilla/readability"</span>;

<span class="hljs-keyword">const</span> parser = <span class="hljs-keyword">new</span> Parser();

<span class="hljs-keyword">const</span> NUMBER_OF_ARTICLES_TO_FETCH = <span class="hljs-number">15</span>;

<span class="hljs-keyword">const</span> SOURCES = [
  <span class="hljs-string">"https://www.freecodecamp.org/news/rss/"</span>,
  <span class="hljs-string">"https://hnrss.org/frontpage"</span>,
];
</code></pre>
<p>Now fetch articles and extract readable content.</p>
<pre><code class="lang-js"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">fetchArticles</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> articles = [];

  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> source <span class="hljs-keyword">of</span> SOURCES) {
    <span class="hljs-keyword">const</span> feed = <span class="hljs-keyword">await</span> parser.parseURL(source);

    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> item <span class="hljs-keyword">of</span> feed.items.slice(<span class="hljs-number">0</span>, NUMBER_OF_ARTICLES_TO_FETCH)) {
      <span class="hljs-keyword">if</span> (!item.link) <span class="hljs-keyword">continue</span>;

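      <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(item.link);
      <span class="hljs-comment">// Skip pages that fail to load (for example, a 404 or 500)</span>
      <span class="hljs-keyword">if</span> (!response.ok) <span class="hljs-keyword">continue</span>;
      <span class="hljs-keyword">const</span> html = <span class="hljs-keyword">await</span> response.text();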

      <span class="hljs-keyword">const</span> dom = <span class="hljs-keyword">new</span> JSDOM(html, { <span class="hljs-attr">url</span>: item.link });
      <span class="hljs-keyword">const</span> reader = <span class="hljs-keyword">new</span> Readability(dom.window.document);
      <span class="hljs-keyword">const</span> content = reader.parse();

      <span class="hljs-keyword">if</span> (!content) <span class="hljs-keyword">continue</span>;

      articles.push({
        <span class="hljs-attr">title</span>: item.title,
        <span class="hljs-attr">link</span>: item.link,
        <span class="hljs-attr">content</span>: content.content,
        <span class="hljs-attr">text</span>: content.textContent,
      });
    }
  }

  <span class="hljs-keyword">return</span> articles.slice(<span class="hljs-number">0</span>, NUMBER_OF_ARTICLES_TO_FETCH);
}
</code></pre>
<p>This function:</p>
<ul>
<li><p>Reads RSS feeds.</p>
</li>
<li><p>Downloads each article.</p>
</li>
<li><p>Extracts clean text using Readability.</p>
</li>
<li><p>Returns a list of articles ready for processing.</p>
</li>
</ul>
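<p>Since <code>fetchArticles</code> reads several feeds, the same article can appear more than once, for example when a post from one source is also linked from the Hacker News front page. A small deduplication step can drop the repeats. This is a hypothetical addition, not part of the original script; it assumes each article has the <code>link</code> field set above and treats URLs that differ only by a <code>#fragment</code> or a trailing slash as the same page.</p>
<pre><code class="lang-javascript">// Hypothetical helper: keep the first article seen for each normalized URL.
function dedupeByLink(articles) {
  const seen = new Set();
  const unique = [];
  for (const article of articles) {
    const url = new URL(article.link);
    url.hash = ""; // ignore #fragments
    const key = url.href.replace(/\/$/, ""); // ignore a trailing slash
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(article);
  }
  return unique;
}
</code></pre>
<p>You could call it on the result of <code>fetchArticles()</code> before any of the filters run.</p>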
<h2 id="heading-how-to-filter-the-content"><strong>How to Filter the Content</strong></h2>
<p>Not every article deserves your attention. Start by filtering out topics you do not want to hear about.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> BLOCKED_KEYWORDS = [<span class="hljs-string">"crypto"</span>, <span class="hljs-string">"nft"</span>, <span class="hljs-string">"giveaway"</span>];

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">filterByKeywords</span>(<span class="hljs-params">articles</span>) </span>{
  <span class="hljs-keyword">return</span> articles.filter(
    <span class="hljs-function">(<span class="hljs-params">article</span>) =&gt;</span>
      !BLOCKED_KEYWORDS.some(<span class="hljs-function">(<span class="hljs-params">keyword</span>) =&gt;</span>
        article.text.toLowerCase().includes(keyword)
      )
  );
}
</code></pre>
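<p><code>includes</code> does plain substring matching, so <code>"crypto"</code> also blocks articles that mention “cryptography”, and <code>"nft"</code> matches “nftables”. If that is too aggressive, a word-boundary regular expression is one alternative. This is a sketch, not the author’s code, and it assumes the keywords are plain words without regex special characters.</p>
<pre><code class="lang-javascript">// Hypothetical variant of filterByKeywords that matches whole words only,
// so "crypto" no longer blocks an article about "cryptography".
// Assumes keywords are plain words with no regex special characters.
function filterByWholeWords(articles, keywords) {
  const patterns = keywords.map((keyword) => new RegExp(`\\b${keyword}\\b`, "i"));
  return articles.filter(
    (article) => !patterns.some((pattern) => pattern.test(article.text))
  );
}
</code></pre>
<p>The trade-off is that whole-word matching also lets plurals such as “NFTs” through, so list those forms explicitly if you want them blocked.</p>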
<p>Next, remove promotional content.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">removePromotionalContent</span>(<span class="hljs-params">articles</span>) </span>{
  <span class="hljs-keyword">return</span> articles.filter(
    <span class="hljs-function">(<span class="hljs-params">article</span>) =&gt;</span> !article.text.toLowerCase().includes(<span class="hljs-string">"sponsored"</span>)
  );
}
</code></pre>
<p>Finally, remove articles that are too short.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">filterByWordCount</span>(<span class="hljs-params">articles, minWords = <span class="hljs-number">700</span></span>) </span>{
  <span class="hljs-keyword">return</span> articles.filter(
    <span class="hljs-function">(<span class="hljs-params">article</span>) =&gt;</span> article.text.split(<span class="hljs-regexp">/\s+/</span>).length &gt;= minWords
  );
}
</code></pre>
<p>After these steps, you are left with articles you actually want to listen to.</p>
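<p>The <code>minWords</code> threshold is easier to tune if you think in minutes of audio. The sketch below assumes a narration pace of roughly 150 words per minute, a common rule of thumb rather than an OrangeClone figure, so measure your own cloned voice and adjust <code>WORDS_PER_MINUTE</code>.</p>
<pre><code class="lang-javascript">// Assumed narration pace; measure your own cloned voice and adjust.
const WORDS_PER_MINUTE = 150;

// Estimate how long an article takes to listen to, in minutes (one decimal).
function estimateListeningMinutes(text, wordsPerMinute = WORDS_PER_MINUTE) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round((words / wordsPerMinute) * 10) / 10;
}
</code></pre>
<p>At the assumed pace, the 700-word minimum in <code>filterByWordCount</code> works out to about 4.7 minutes of audio.</p>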
<h2 id="heading-how-to-clean-up-the-content"><strong>How to Clean Up the Content</strong></h2>
<p>Raw article text still needs some cleanup before it sounds good when spoken. First, replace images with spoken placeholders.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">replaceImages</span>(<span class="hljs-params">html</span>) </span>{
  <span class="hljs-keyword">return</span> html.replace(<span class="hljs-regexp">/&lt;img[^&gt;]*alt="([^"]*)"[^&gt;]*&gt;/gi</span>, <span class="hljs-function">(<span class="hljs-params">_, alt</span>) =&gt;</span> {
    <span class="hljs-keyword">return</span> alt ? <span class="hljs-string">`[Image: <span class="hljs-subst">${alt}</span>]`</span> : <span class="hljs-string">`[Image omitted]`</span>;
  });
}
</code></pre>
<p>Next, replace code blocks with a spoken placeholder.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">replaceCodeBlocks</span>(<span class="hljs-params">html</span>) </span>{
  <span class="hljs-keyword">return</span> html.replace(
    <span class="hljs-regexp">/&lt;pre&gt;&lt;code&gt;[\s\S]*?&lt;\/code&gt;&lt;\/pre&gt;/gi</span>,
    <span class="hljs-string">"[Code example omitted]"</span>
  );
}
</code></pre>
<p>Strip URLs and replace them with spoken text.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">replaceUrls</span>(<span class="hljs-params">text</span>) </span>{
  <span class="hljs-keyword">return</span> text.replace(<span class="hljs-regexp">/https?:\/\/\S+/gi</span>, <span class="hljs-string">"link removed"</span>);
}
</code></pre>
<p>Normalize common symbols.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">normalizeSymbols</span>(<span class="hljs-params">text</span>) </span>{
  <span class="hljs-keyword">return</span> text
    .replace(<span class="hljs-regexp">/&amp;/g</span>, <span class="hljs-string">"and"</span>)
    .replace(<span class="hljs-regexp">/%/g</span>, <span class="hljs-string">"percent"</span>)
    .replace(<span class="hljs-regexp">/\$/g</span>, <span class="hljs-string">"dollar"</span>);
}
</code></pre>
<p>Convert HTML to text so TTS does not read tags.</p>
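<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">stripHtml</span>(<span class="hljs-params">html</span>) </span>{
  <span class="hljs-comment">// Swap tags for spaces so words from adjacent elements don't merge</span>
  <span class="hljs-keyword">const</span> withoutTags = html.replace(<span class="hljs-regexp">/&lt;[^&gt;]+&gt;/g</span>, <span class="hljs-string">" "</span>);
  <span class="hljs-comment">// Decode entities (&amp;amp;, &amp;nbsp;) via jsdom so normalizeSymbols won't produce "andamp;"</span>
  <span class="hljs-keyword">return</span> JSDOM.fragment(withoutTags).textContent;
}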
</code></pre>
<p>Combine everything into one cleanup step.</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">cleanArticle</span>(<span class="hljs-params">article</span>) </span>{
  <span class="hljs-keyword">let</span> cleaned = replaceImages(article.content);
  cleaned = replaceCodeBlocks(cleaned);
  cleaned = stripHtml(cleaned);
  cleaned = replaceUrls(cleaned);
  cleaned = normalizeSymbols(cleaned);

  <span class="hljs-keyword">return</span> {
    ...article,
    <span class="hljs-attr">cleanedText</span>: cleaned,
  };
}
</code></pre>
<p>At this point, the text is ready for audio generation.</p>
<h2 id="heading-how-to-convert-content-to-audio"><strong>How to Convert Content to Audio</strong></h2>
<p>Browser speech APIs sound robotic. I wanted something that sounded human and familiar. After trying several tools, I settled on OrangeClone. It was the only option that actually sounded like me.</p>
<p>Create a free account and copy your API key from the dashboard.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768712061376/cd437cea-8957-4cb6-98c8-b5f6e520b57b.png" alt="OrangeClone dashboard with API key visible." class="image--center mx-auto" width="2624" height="1822" loading="lazy"></p>
<p>Record 10 to 15 seconds of clean audio and save it as <code>SAMPLE_VOICE.wav</code> in the project root. Then create a voice character (one-time setup).</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> fs <span class="hljs-keyword">from</span> <span class="hljs-string">"node:fs/promises"</span>;

<span class="hljs-keyword">const</span> ORANGECLONE_API_KEY = process.env.ORANGECLONE_API_KEY;
<span class="hljs-keyword">const</span> ORANGECLONE_BASE_URL =
  process.env.ORANGECLONE_BASE_URL || <span class="hljs-string">"https://orangeclone.com/api"</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createVoiceCharacter</span>(<span class="hljs-params">{ name, avatarStyle, voiceSamplePath }</span>) </span>{
  <span class="hljs-keyword">const</span> audioBuffer = <span class="hljs-keyword">await</span> fs.readFile(voiceSamplePath);
  <span class="hljs-keyword">const</span> audioBase64 = audioBuffer.toString(<span class="hljs-string">"base64"</span>);

  <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(
    <span class="hljs-string">`<span class="hljs-subst">${ORANGECLONE_BASE_URL}</span>/characters/create`</span>,
    {
      <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
      <span class="hljs-attr">headers</span>: {
        <span class="hljs-attr">Authorization</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${ORANGECLONE_API_KEY}</span>`</span>,
        <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
      },
      <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify({
        name,
        avatarStyle,
        <span class="hljs-attr">voiceSample</span>: {
          <span class="hljs-attr">format</span>: <span class="hljs-string">"wav"</span>,
          <span class="hljs-attr">data</span>: audioBase64,
        },
      }),
    }
  );

  <span class="hljs-keyword">if</span> (!response.ok) {
    <span class="hljs-keyword">const</span> errorText = <span class="hljs-keyword">await</span> response.text();
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Failed to create character: <span class="hljs-subst">${errorText}</span>`</span>);
  }

  <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> response.json();

  <span class="hljs-keyword">return</span> (
    data.data?.id ||
    data.data?.characterId ||
    data.id ||
    data.characterId
  );
}
</code></pre>
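<p>Creating the character is a one-time setup, so you probably don’t want <code>createVoiceCharacter</code> to run on every execution. One option, which is an assumption and not something OrangeClone requires, is to save the returned ID in an environment variable (hypothetically named <code>ORANGECLONE_CHARACTER_ID</code>) and only create a character when it is missing. <code>resolveCharacterId</code> and <code>YOUR_AVATAR_STYLE</code> are hypothetical names.</p>
<pre><code class="lang-javascript">// Hypothetical helper: reuse a saved character ID, or create the character
// once and print the new ID so you can save it for the next run.
async function resolveCharacterId(savedId, createCharacter) {
  if (savedId) return savedId;
  const newId = await createCharacter();
  console.log(`Created voice character ${newId}. Save it as ORANGECLONE_CHARACTER_ID.`);
  return newId;
}

// Usage sketch (ORANGECLONE_CHARACTER_ID and YOUR_AVATAR_STYLE are placeholders):
// const characterId = await resolveCharacterId(
//   process.env.ORANGECLONE_CHARACTER_ID,
//   () =>
//     createVoiceCharacter({
//       name: "My Voice",
//       avatarStyle: YOUR_AVATAR_STYLE,
//       voiceSamplePath: "./SAMPLE_VOICE.wav",
//     })
// );
</code></pre>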
<p>Generate audio from text.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">generateAudio</span>(<span class="hljs-params">characterId, text</span>) </span>{
  <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`<span class="hljs-subst">${ORANGECLONE_BASE_URL}</span>/voices_clone`</span>, {
    <span class="hljs-attr">method</span>: <span class="hljs-string">"POST"</span>,
    <span class="hljs-attr">headers</span>: {
      <span class="hljs-attr">Authorization</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${ORANGECLONE_API_KEY}</span>`</span>,
      <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
    },
    <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify({
      characterId,
      text,
    }),
  });

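  <span class="hljs-keyword">if</span> (!response.ok) {
    <span class="hljs-keyword">const</span> errorText = <span class="hljs-keyword">await</span> response.text();
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Failed to generate audio: <span class="hljs-subst">${errorText}</span>`</span>);
  }

  <span class="hljs-keyword">return</span> response.json();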
}
</code></pre>
<p>Wait for the job to complete.</p>
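<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">waitForAudio</span>(<span class="hljs-params">jobId, maxAttempts = <span class="hljs-number">60</span></span>) </span>{
  <span class="hljs-comment">// Poll every 5 seconds and give up after maxAttempts (about 5 minutes by default)</span>
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> attempt = <span class="hljs-number">0</span>; attempt &lt; maxAttempts; attempt++) {
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`<span class="hljs-subst">${ORANGECLONE_BASE_URL}</span>/voices/<span class="hljs-subst">${jobId}</span>`</span>, {
      <span class="hljs-comment">// Assumes the status endpoint accepts the same Bearer token as the other calls</span>
      <span class="hljs-attr">headers</span>: { <span class="hljs-attr">Authorization</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${ORANGECLONE_API_KEY}</span>`</span> },
    });

    <span class="hljs-keyword">if</span> (!response.ok) {
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Status check failed: <span class="hljs-subst">${response.status}</span>`</span>);
    }

    <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> response.json();

    <span class="hljs-keyword">if</span> (data.status === <span class="hljs-string">"completed"</span>) {
      <span class="hljs-keyword">return</span> data.audioUrl;
    }

    <span class="hljs-keyword">await</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> <span class="hljs-built_in">setTimeout</span>(r, <span class="hljs-number">5000</span>));
  }

  <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Audio job <span class="hljs-subst">${jobId}</span> did not complete in time`</span>);
}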
</code></pre>
<h2 id="heading-how-to-upload-the-audio-to-cloudflare-r2">How to Upload the Audio to Cloudflare R2</h2>
<p>OrangeClone returns an audio URL, but podcast apps need a stable, public file that will not expire.<br>That is where Cloudflare R2 comes in.</p>
<p>R2 is S3-compatible storage, which means we can upload files using the AWS SDK and serve them publicly for podcast apps.</p>
<h2 id="heading-how-to-set-up-credentials">How to Set Up Credentials</h2>
<p>Create an R2 bucket in your Cloudflare dashboard and set the following environment variables:</p>
<ul>
<li><p><code>R2_ACCOUNT_ID</code></p>
</li>
<li><p><code>R2_ACCESS_KEY_ID</code></p>
</li>
<li><p><code>R2_SECRET_ACCESS_KEY</code></p>
</li>
<li><p><code>R2_BUCKET_NAME</code></p>
</li>
<li><p><code>R2_PUBLIC_URL</code></p>
</li>
</ul>
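<p>The account ID, access key pair, and bucket name let the script connect to R2 and upload files. <code>R2_PUBLIC_URL</code> is the base URL your bucket is publicly served from (its <code>r2.dev</code> subdomain or a custom domain), so public access must be enabled on the bucket for podcast apps to fetch the audio.</p>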
<h2 id="heading-how-to-initialize-the-r2-client">How to Initialize the R2 Client</h2>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { S3Client, PutObjectCommand } <span class="hljs-keyword">from</span> <span class="hljs-string">"@aws-sdk/client-s3"</span>;

<span class="hljs-keyword">const</span> r2 = <span class="hljs-keyword">new</span> S3Client({
  <span class="hljs-attr">region</span>: <span class="hljs-string">"auto"</span>,
  <span class="hljs-attr">endpoint</span>: <span class="hljs-string">`https://<span class="hljs-subst">${process.env.R2_ACCOUNT_ID}</span>.r2.cloudflarestorage.com`</span>,
  <span class="hljs-attr">credentials</span>: {
    <span class="hljs-attr">accessKeyId</span>: process.env.R2_ACCESS_KEY_ID,
    <span class="hljs-attr">secretAccessKey</span>: process.env.R2_SECRET_ACCESS_KEY,
  },
});
</code></pre>
<p>This creates an S3-compatible client that connects directly to your Cloudflare R2 account instead of AWS.</p>
<h2 id="heading-how-to-download-the-audio">How to Download the Audio</h2>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">downloadAudio</span>(<span class="hljs-params">audioUrl</span>) </span>{
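  <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(audioUrl);
  <span class="hljs-keyword">if</span> (!response.ok) {
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Failed to download audio: <span class="hljs-subst">${response.status}</span>`</span>);
  }
  <span class="hljs-keyword">const</span> buffer = <span class="hljs-keyword">await</span> response.arrayBuffer();
  <span class="hljs-keyword">return</span> Buffer.from(buffer);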
}
</code></pre>
<p>OrangeClone gives us a URL, not a file.<br>This function downloads the audio and converts it into a Node.js buffer so it can be uploaded to R2.</p>
<h2 id="heading-how-to-upload-to-r2">How to Upload to R2</h2>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { v4 <span class="hljs-keyword">as</span> uuid } <span class="hljs-keyword">from</span> <span class="hljs-string">"uuid"</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">uploadToR2</span>(<span class="hljs-params">audioBuffer</span>) </span>{
  <span class="hljs-keyword">const</span> fileName = <span class="hljs-string">`<span class="hljs-subst">${uuid()}</span>.mp3`</span>;

  <span class="hljs-keyword">const</span> command = <span class="hljs-keyword">new</span> PutObjectCommand({
    <span class="hljs-attr">Bucket</span>: process.env.R2_BUCKET_NAME,
    <span class="hljs-attr">Key</span>: fileName,
    <span class="hljs-attr">Body</span>: audioBuffer,
    <span class="hljs-attr">ContentType</span>: <span class="hljs-string">"audio/mpeg"</span>,
  });

  <span class="hljs-keyword">await</span> r2.send(command);

  <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${process.env.R2_PUBLIC_URL}</span>/<span class="hljs-subst">${fileName}</span>`</span>;
}
</code></pre>
<p>This function uploads the audio buffer to R2 using a unique filename and returns a public URL that podcast apps can access.</p>
<h2 id="heading-putting-it-together">Putting It Together</h2>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> audioUrl = <span class="hljs-keyword">await</span> waitForAudio(jobId);
<span class="hljs-keyword">const</span> audioBuffer = <span class="hljs-keyword">await</span> downloadAudio(audioUrl);
<span class="hljs-keyword">const</span> publicAudioUrl = <span class="hljs-keyword">await</span> uploadToR2(audioBuffer);
</code></pre>
<p>At the end of this step, <code>publicAudioUrl</code> holds the public URL of the final audio file, which is what the podcast RSS feed points to.</p>
<h2 id="heading-how-to-make-the-podcast"><strong>How to Make the Podcast</strong></h2>
<p>With public audio URLs, you can now generate an RSS feed.</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> xmlbuilder <span class="hljs-keyword">from</span> <span class="hljs-string">"xmlbuilder"</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">generatePodcastFeed</span>(<span class="hljs-params">episodes</span>) </span>{
  <span class="hljs-keyword">const</span> feed = xmlbuilder
    .create(<span class="hljs-string">"rss"</span>, { <span class="hljs-attr">version</span>: <span class="hljs-string">"1.0"</span> })
    .att(<span class="hljs-string">"version"</span>, <span class="hljs-string">"2.0"</span>)
    .ele(<span class="hljs-string">"channel"</span>);

  feed.ele(<span class="hljs-string">"title"</span>, <span class="hljs-string">"My Tech Podcast"</span>);
  feed.ele(<span class="hljs-string">"description"</span>, <span class="hljs-string">"Tech articles converted to audio"</span>);
  feed.ele(<span class="hljs-string">"link"</span>, <span class="hljs-string">"https://your-site.com"</span>);

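  episodes.forEach(<span class="hljs-function">(<span class="hljs-params">ep</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> item = feed.ele(<span class="hljs-string">"item"</span>);
    item.ele(<span class="hljs-string">"title"</span>, ep.title);
    <span class="hljs-comment">// A stable, unique ID lets podcast apps tell episodes apart</span>
    item.ele(<span class="hljs-string">"guid"</span>, { <span class="hljs-attr">isPermaLink</span>: <span class="hljs-string">"false"</span> }, ep.audioUrl);
    <span class="hljs-comment">// RSS 2.0 requires url, length (file size in bytes), and type on enclosures; 0 means unknown</span>
    item.ele(<span class="hljs-string">"enclosure"</span>, {
      <span class="hljs-attr">url</span>: ep.audioUrl,
      <span class="hljs-attr">length</span>: ep.length ?? <span class="hljs-number">0</span>,
      <span class="hljs-attr">type</span>: <span class="hljs-string">"audio/mpeg"</span>,
    });
  });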

  <span class="hljs-keyword">return</span> feed.end({ <span class="hljs-attr">pretty</span>: <span class="hljs-literal">true</span> });
}
</code></pre>
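<p>RSS 2.0 requires <code>url</code>, <code>length</code>, and <code>type</code> on every <code>&lt;enclosure&gt;</code>, and podcast apps may reject or skip items that are missing one. Any language works for a pre-publish check like this; the sketch below uses only Python's standard library (<code>xml.etree.ElementTree</code>). <code>check_podcast_feed</code> is a hypothetical helper, and it only covers a few RSS 2.0 basics, not the extra tags that directories such as Apple Podcasts expect.</p>

```python
import xml.etree.ElementTree as ET

# RSS 2.0 requires these attributes on every enclosure element
REQUIRED_ENCLOSURE_ATTRS = ("url", "length", "type")
# RSS 2.0 requires these child elements on the channel
REQUIRED_CHANNEL_TAGS = ("title", "link", "description")


def check_podcast_feed(xml_text: str) -> list[str]:
    """Return human-readable problems found in an RSS 2.0 podcast feed."""
    problems: list[str] = []
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        problems.append(f"root element is {root.tag!r}, expected 'rss'")
    channel = root.find("channel")
    if channel is None:
        problems.append("missing channel element")
        return problems
    for tag in REQUIRED_CHANNEL_TAGS:
        if channel.find(tag) is None:
            problems.append(f"channel is missing {tag}")
    for index, item in enumerate(channel.findall("item"), start=1):
        enclosure = item.find("enclosure")
        if enclosure is None:
            problems.append(f"item {index} has no enclosure")
            continue
        for attr in REQUIRED_ENCLOSURE_ATTRS:
            # A missing or empty attribute both count as a problem
            if not enclosure.get(attr):
                problems.append(f"item {index} enclosure is missing {attr}")
    return problems


if __name__ == "__main__":
    with open("public/feed.xml", encoding="utf-8") as f:
        for problem in check_podcast_feed(f.read()):
            print(problem)
```

<p>Saved as a file such as <code>check_feed.py</code> (a hypothetical name) and run after <code>node index.js</code>, it prints one line per problem and nothing when the basic checks pass.</p>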
<h2 id="heading-how-to-automate-the-pipeline"><strong>How to Automate the Pipeline</strong></h2>
<p>Automation in this project happens in two stages. First, the code itself must be able to process multiple articles in one run. Second, the script must run automatically on a schedule. We’ll start with the code-level automation.</p>
<h3 id="heading-automating-inside-the-code"><strong>Automating Inside the Code</strong></h3>
<p>Earlier, we fetched up to fifteen articles. Now we need to make sure every article that passes our filters goes through the full pipeline.</p>
<p>Add the following function near the bottom of <code>index.js</code>.</p>
<pre><code class="lang-js"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">runPipeline</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> rawArticles = <span class="hljs-keyword">await</span> fetchArticles();

  <span class="hljs-keyword">const</span> filteredArticles = filterByWordCount(
    removePromotionalContent(filterByKeywords(rawArticles))
  );

  <span class="hljs-keyword">if</span> (filteredArticles.length === <span class="hljs-number">0</span>) {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"No articles passed the filters"</span>);
    <span class="hljs-keyword">return</span> [];
  }

  <span class="hljs-keyword">const</span> characterId = <span class="hljs-keyword">await</span> createVoiceCharacter({
    <span class="hljs-attr">name</span>: <span class="hljs-string">"My Voice"</span>,
    <span class="hljs-attr">avatarStyle</span>: <span class="hljs-string">"realistic"</span>,
    <span class="hljs-attr">voiceSamplePath</span>: <span class="hljs-string">"./SAMPLE_VOICE.wav"</span>,
  });

  <span class="hljs-keyword">const</span> episodes = [];

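  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> article <span class="hljs-keyword">of</span> filteredArticles) {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Processing: <span class="hljs-subst">${article.title}</span>`</span>);

    <span class="hljs-comment">// One failing article should not stop the rest of the batch</span>
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> cleaned = cleanArticle(article);
      <span class="hljs-keyword">const</span> job = <span class="hljs-keyword">await</span> generateAudio(characterId, cleaned.cleanedText);

      <span class="hljs-keyword">const</span> audioUrl = <span class="hljs-keyword">await</span> waitForAudio(job.id);
      <span class="hljs-keyword">const</span> audioBuffer = <span class="hljs-keyword">await</span> downloadAudio(audioUrl);
      <span class="hljs-keyword">const</span> publicAudioUrl = <span class="hljs-keyword">await</span> uploadToR2(audioBuffer);

      episodes.push({
        <span class="hljs-attr">title</span>: article.title,
        <span class="hljs-attr">audioUrl</span>: publicAudioUrl,
        <span class="hljs-attr">length</span>: audioBuffer.length, <span class="hljs-comment">// file size in bytes, for the RSS enclosure</span>
      });
    } <span class="hljs-keyword">catch</span> (err) {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Skipping "<span class="hljs-subst">${article.title}</span>": <span class="hljs-subst">${err.message}</span>`</span>);
    }
  }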

  <span class="hljs-keyword">return</span> episodes;
}
</code></pre>
<p>This function does all the heavy lifting:</p>
<ul>
<li><p>Fetches articles</p>
</li>
<li><p>Applies all filters</p>
</li>
<li><p>Creates the voice character once</p>
</li>
<li><p>Loops through every valid article</p>
</li>
<li><p>Converts each article into audio</p>
</li>
<li><p>Uploads the audio to Cloudflare R2</p>
</li>
<li><p>Collects podcast episode data</p>
</li>
</ul>
<p>At this point, one script run can generate multiple podcast episodes.</p>
<h3 id="heading-running-the-pipeline-and-generating-the-feed"><strong>Running the Pipeline and Generating the Feed</strong></h3>
<p>Now we need a single entry point that runs the pipeline and writes the podcast feed. Add this below the pipeline function.</p>
<pre><code class="lang-js"><span class="hljs-keyword">import</span> fs <span class="hljs-keyword">from</span> <span class="hljs-string">"node:fs/promises"</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">main</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> episodes = <span class="hljs-keyword">await</span> runPipeline();

  <span class="hljs-keyword">if</span> (episodes.length === <span class="hljs-number">0</span>) {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"No episodes generated"</span>);
    <span class="hljs-keyword">return</span>;
  }

  <span class="hljs-keyword">const</span> rss = generatePodcastFeed(episodes);

  <span class="hljs-keyword">await</span> fs.mkdir(<span class="hljs-string">"./public"</span>, { <span class="hljs-attr">recursive</span>: <span class="hljs-literal">true</span> });
  <span class="hljs-keyword">await</span> fs.writeFile(<span class="hljs-string">"./public/feed.xml"</span>, rss);

  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Podcast feed generated at public/feed.xml"</span>);
}

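main().catch(<span class="hljs-function">(<span class="hljs-params">err</span>) =&gt;</span> {
  <span class="hljs-built_in">console</span>.error(err);
  <span class="hljs-comment">// Exit non-zero so a failed scheduled run is reported as failed</span>
  process.exitCode = <span class="hljs-number">1</span>;
});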
</code></pre>
<p>When you run <code>node index.js</code>, this now:</p>
<ul>
<li><p>Processes all selected articles</p>
</li>
<li><p>Creates multiple audio files</p>
</li>
<li><p>Generates a valid podcast RSS feed</p>
</li>
</ul>
<p>This is the core automation.</p>
<h3 id="heading-scheduling-the-pipeline-with-github-actions"><strong>Scheduling the Pipeline with GitHub Actions</strong></h3>
<p>The final step is to make this script run automatically. Create a GitHub Actions workflow file at <code>.github/workflows/podcast.yml</code>.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Podcast</span> <span class="hljs-string">Pipeline</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">schedule:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">cron:</span> <span class="hljs-string">"0 6 * * *"</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">run:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/setup-node@v4</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">node-version:</span> <span class="hljs-number">22</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">npm</span> <span class="hljs-string">install</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">run:</span> <span class="hljs-string">node</span> <span class="hljs-string">index.js</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">ORANGECLONE_API_KEY:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.ORANGECLONE_API_KEY</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">R2_ACCOUNT_ID:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.R2_ACCOUNT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">R2_ACCESS_KEY_ID:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.R2_ACCESS_KEY_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">R2_SECRET_ACCESS_KEY:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.R2_SECRET_ACCESS_KEY</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">R2_BUCKET_NAME:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.R2_BUCKET_NAME</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">R2_PUBLIC_URL:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.R2_PUBLIC_URL</span> <span class="hljs-string">}}</span>
</code></pre>
<p>This workflow runs the pipeline every day at 6:00 AM UTC. GitHub Actions evaluates cron schedules in UTC, so adjust the hour for your time zone.</p>
<p>Each run:</p>
<ul>
<li><p>Fetches new articles</p>
</li>
<li><p>Generates fresh audio</p>
</li>
<li><p>Updates the podcast feed</p>
</li>
</ul>
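<p>One caveat: the runner’s file system is discarded when the job ends, so <code>public/feed.xml</code> only reaches podcast apps if the workflow also publishes it, for example by uploading it to your R2 bucket next to the audio files. Each run also rebuilds the feed from that run’s episodes only, so a longer-lived podcast needs to keep earlier episodes as well. With those pieces in place, your podcast updates itself without manual work.</p>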
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>This is a basic version of my full production pipeline, <a target="_blank" href="https://github.com/iamspruce/postcast">PostCast</a>, but the core idea is the same.</p>
<p>You now know how to turn blogs into a personal podcast. Be mindful of copyright, and only convert content you have permission to use this way.</p>
<p>If you have questions, reach me on X at <a target="_blank" href="https://x.com/sprucekhalifa"><code>@sprucekhalifa</code></a>. I write practical tech articles like this regularly.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build and Deploy a Blog-to-Audio Service Using OpenAI ]]>
                </title>
                <description>
                    <![CDATA[ Turning written blog posts into audio is a simple way to reach more people. Many users prefer listening during travel or workouts. Others enjoy having both reading and listening options.  With OpenAI’s text-to-speech models, you can build a clean ser... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-and-deploy-blog-to-audio-openai/</link>
                <guid isPermaLink="false">69671ceac3577e1210128477</guid>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ openai ]]>
                    </category>
                
                    <category>
                        <![CDATA[ FastAPI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 14 Jan 2026 04:34:50 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768359861591/69bc8279-f882-4af1-9375-5576f7043b48.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Turning written blog posts into audio is a simple way to reach more people. Many users prefer listening during travel or workouts. Others enjoy having both reading and listening options. </p>
<p>With OpenAI’s <a target="_blank" href="https://platform.openai.com/docs/guides/text-to-speech">text-to-speech</a> models, you can build a clean service that takes a blog URL or pasted text and produces a natural-sounding audio file. </p>
<p>In this article, you’ll build this system end-to-end: fetch blog content, send it to OpenAI’s audio API, save the output as an MP3 file, and serve everything through a small <a target="_blank" href="https://fastapi.tiangolo.com/">FastAPI</a> app.</p>
<p>At the end, you’ll also build a minimal user interface and deploy it to Sevalla so that anyone can upload text and download audio without touching code.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-understanding-the-core-idea">Understanding the Core Idea</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-your-project">How to Set Up Your Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-fetch-and-clean-blog-content">How to Fetch and Clean Blog Content</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-send-text-to-openai-for-audio">How to Send Text to OpenAI for Audio</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-a-fastapi-backend">How to Build a FastAPI Backend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-add-a-simple-user-interface">How to Add a Simple User Interface</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-deploy-your-service-to-sevalla">How to Deploy Your Service to Sevalla</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-understanding-the-core-idea">Understanding the Core Idea</h2>
<p>A blog-to-audio service has only three important parts. The first part takes a blog link or text and cleans it. The second part sends the clean text to OpenAI’s text-to-speech model. The third part gives the final MP3 file back to the user.</p>
<p>OpenAI’s speech generation is simple to use: you send text, choose a voice, and get audio back. The quality is high out of the box, so you don’t need to train models or tune voices.</p>
<p>The only job left is to make the system easy to use. That is where FastAPI and a small HTML form help. They wrap your code into a web service so anyone can try it.</p>
<h2 id="heading-how-to-set-up-your-project">How to Set Up Your Project</h2>
<p>Create a folder for your project. Inside it, create a file called <code>main.py</code>. You will also need a basic HTML file later.</p>
<p>Install the libraries you need with pip:</p>
<pre><code class="lang-bash">pip install fastapi uvicorn requests beautifulsoup4 python-multipart
</code></pre>
<p>FastAPI gives you a simple backend, and Uvicorn is the server that runs it. The Requests library downloads blog pages. <a target="_blank" href="https://pypi.org/project/beautifulsoup4/">BeautifulSoup</a> parses the HTML so you can extract readable text. python-multipart lets FastAPI parse the form data sent by the HTML form.</p>
<p>You must also install the OpenAI client:</p>
<pre><code class="lang-bash">pip install openai
</code></pre>
<p>Make sure you have your OpenAI API key ready. Set it in your terminal before running the app:</p>
<pre><code class="lang-bash">export OPENAI_API_KEY=<span class="hljs-string">"your-key"</span>
</code></pre>
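<p>On Windows, run the following command, then open a new terminal, because <code>setx</code> only affects terminals started after it runs:</p>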
<pre><code class="lang-cmd">setx OPENAI_API_KEY <span class="hljs-string">"your-key"</span>
</code></pre>
<h2 id="heading-how-to-fetch-and-clean-blog-content">How to Fetch and Clean Blog Content</h2>
<p>To convert a blog into audio, you must first extract the main article text. You can fetch the page with requests and parse it with BeautifulSoup. </p>
<p>Below is a simple function that does this. </p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">extract_text_from_url</span>(<span class="hljs-params">url: str</span>) -&gt; str:</span>
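    response = requests.get(url, timeout=<span class="hljs-number">10</span>)
    <span class="hljs-comment"># Fail on 4xx/5xx responses instead of parsing an error page</span>
    response.raise_for_status()
    html = response.text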
    soup = BeautifulSoup(html, <span class="hljs-string">"html.parser"</span>)
    paragraphs = soup.find_all(<span class="hljs-string">"p"</span>)
    text = <span class="hljs-string">" "</span>.join(p.get_text(strip=<span class="hljs-literal">True</span>) <span class="hljs-keyword">for</span> p <span class="hljs-keyword">in</span> paragraphs)
    <span class="hljs-keyword">return</span> text
</code></pre>
<p>Here is what happens step by step. </p>
<ul>
<li><p>The function downloads the page. </p>
</li>
<li><p>BeautifulSoup reads the HTML and finds all paragraph tags. </p>
</li>
<li><p>It pulls out the text in each paragraph and joins them into one long string. </p>
</li>
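<li><p>This gives you a plain-text version of the post without scripts or layout markup. Because it collects every <code>&lt;p&gt;</code> tag on the page, text from footers, sidebars, or comment sections can still slip in.</p>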
</li>
</ul>
<p>If the user pastes text instead of a URL, you can skip this part and use the text as it is.</p>
<h2 id="heading-how-to-send-text-to-openai-for-audio">How to Send Text to OpenAI for Audio</h2>
<p>OpenAI’s text-to-speech API makes this step straightforward. You send the text, pick a voice such as <code>alloy</code> or <code>verse</code>, and the API returns the audio as binary data (MP3 by default), which you can save directly to a file.</p>
<p>Here is a helper function to convert text into audio:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
client = OpenAI()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">text_to_audio</span>(<span class="hljs-params">text: str, output_path: str</span>):</span>
    audio = client.audio.speech.create(
        model=<span class="hljs-string">"gpt-4o-mini-tts"</span>,
        voice=<span class="hljs-string">"alloy"</span>,
        input=text
    )
    <span class="hljs-keyword">with</span> open(output_path, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> f:
        f.write(audio.read())
</code></pre>
<p>This function calls the OpenAI client and passes the text, model name, and voice choice. The <code>.read()</code> method extracts the binary audio stream. Writing this to an MP3 file completes the process.</p>
<p>Watch the input length, though. OpenAI’s API reference limits the <code>input</code> text to 4096 characters per request, and many blog posts are longer than that. For those posts, split the text into chunks, generate audio for each chunk, and join the results.</p>
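<p>Here is one way to handle long posts, as a sketch rather than a definitive implementation. It assumes the 4096-character limit on the speech endpoint’s <code>input</code> and splits on sentence endings so each request ends cleanly. <code>chunk_text</code> and <code>synthesize_long_text</code> are hypothetical helper names, not part of the OpenAI SDK. Joining the pieces by concatenating their MP3 bytes is a shortcut that common players usually handle; re-encoding with a tool such as ffmpeg gives a cleaner file.</p>

```python
import re
from typing import Callable

# OpenAI's API reference caps the speech endpoint's input at 4096 characters
MAX_TTS_CHARS = 4096


def chunk_text(text: str, max_chars: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after ".", "!" or "?" followed by whitespace, keeping the punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: hard-split a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(text: str, synthesize_chunk: Callable[[str], bytes]) -> bytes:
    """Generate audio chunk by chunk and join the MP3 bytes in order.

    synthesize_chunk is any function that turns one chunk of text into MP3 bytes.
    """
    return b"".join(synthesize_chunk(chunk) for chunk in chunk_text(text))
```

<p>With the <code>client</code> from <code>text_to_audio</code>, you could pass <code>lambda chunk: client.audio.speech.create(model="gpt-4o-mini-tts", voice="alloy", input=chunk).read()</code> as <code>synthesize_chunk</code> and write the returned bytes to your MP3 file.</p>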
<h2 id="heading-how-to-build-a-fastapi-backend">How to Build a FastAPI Backend</h2>
<p>Now you can wrap both steps into a simple FastAPI server. This server will accept either a URL or pasted text. It will convert the content into audio and return the MP3 file as a response.</p>
<p>Here is the full backend code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> uuid

<span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI, Form, HTTPException
<span class="hljs-keyword">from</span> fastapi.responses <span class="hljs-keyword">import</span> FileResponse
<span class="hljs-keyword">from</span> starlette.background <span class="hljs-keyword">import</span> BackgroundTask

<span class="hljs-comment"># extract_text_from_url() and text_to_audio() are the functions defined earlier in main.py</span>
app = FastAPI()

<span class="hljs-meta">@app.post("/convert")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">convert</span>(<span class="hljs-params">url: Optional[str] = Form(<span class="hljs-params">None</span>), text: Optional[str] = Form(<span class="hljs-params">None</span>)</span>):</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> url <span class="hljs-keyword">and</span> <span class="hljs-keyword">not</span> text:
        <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">400</span>, detail=<span class="hljs-string">"Please provide a URL or text"</span>)
    <span class="hljs-keyword">if</span> url:
        <span class="hljs-keyword">try</span>:
            text_content = extract_text_from_url(url)
        <span class="hljs-keyword">except</span> Exception:
            <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">400</span>, detail=<span class="hljs-string">"Could not fetch the URL"</span>)
    <span class="hljs-keyword">else</span>:
        text_content = text
    file_id = uuid.uuid4().hex
    output_path = <span class="hljs-string">f"audio_<span class="hljs-subst">{file_id}</span>.mp3"</span>
    text_to_audio(text_content, output_path)
    <span class="hljs-comment"># filename= makes browsers download the file; the background task deletes it after sending</span>
    <span class="hljs-keyword">return</span> FileResponse(
        output_path,
        media_type=<span class="hljs-string">"audio/mpeg"</span>,
        filename=<span class="hljs-string">"blog-audio.mp3"</span>,
        background=BackgroundTask(os.remove, output_path),
    )
</code></pre>
<p>Here is how it works. The user sends form data with either <code>url</code> or <code>text</code>. The server checks which one exists. </p>
<p>If there is a URL, it extracts text with the earlier function. If there is no URL, it uses the provided text directly. A unique file name is created for every request. Then the audio file is generated and returned as an MP3 download.</p>
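<p>The branching described above can be pulled into a small, testable helper. This is a sketch, not the article’s code: <code>resolve_input</code> is a hypothetical name, and the http/https check is an added assumption that keeps the server from trying to fetch other URL schemes. It also treats whitespace-only fields as empty, since a browser form submits blank fields as empty strings.</p>

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_input(url: Optional[str], text: Optional[str]) -> tuple[str, str]:
    """Pick the form field to use: returns ("url", value) or ("text", value).

    Raises ValueError when both fields are blank or the URL is not http(s).
    """
    # Treat None and whitespace-only values the same as an empty field
    url = (url or "").strip()
    text = (text or "").strip()
    if url:
        parsed = urlparse(url)
        # Only fetch ordinary web pages
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("URL must start with http:// or https://")
        return ("url", url)
    if text:
        return ("text", text)
    raise ValueError("Please provide a URL or text")
```

<p>Inside <code>convert</code>, you could call it in a <code>try</code> block, turn a <code>ValueError</code> into <code>HTTPException(status_code=400, detail=str(err))</code>, and then branch on the returned mode.</p>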
<p>You can run the server like this:</p>
<pre><code class="lang-bash">uvicorn main:app --reload
</code></pre>
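<p>If you open <code>http://localhost:8000</code> in your browser now, you’ll get a <code>{"detail":"Not Found"}</code> response, because there is no route for the home page yet. The <code>/convert</code> endpoint is working, though. You can test it with a tool like Postman, through FastAPI’s interactive docs at <code>http://localhost:8000/docs</code>, or by building the front end next.</p>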
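<p>You can also exercise the endpoint from a short script that uses only Python’s standard library. This is a sketch that assumes the server is running locally on port 8000; <code>build_convert_request</code> and <code>convert_to_file</code> are hypothetical helpers, and the 300-second timeout is an arbitrary allowance for long posts.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumes the FastAPI server from this article is running locally on port 8000
API_URL = "http://localhost:8000/convert"


def build_convert_request(url: str = "", text: str = "") -> Request:
    """Build a form-encoded POST request for the /convert endpoint."""
    form = urlencode({"url": url, "text": text}).encode("utf-8")
    return Request(
        API_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def convert_to_file(output_path: str, url: str = "", text: str = "") -> None:
    """Send the request and save the returned MP3 bytes to output_path."""
    # Speech generation can take a while, so allow a generous timeout
    with urlopen(build_convert_request(url, text), timeout=300) as response:
        with open(output_path, "wb") as f:
            f.write(response.read())


if __name__ == "__main__":
    convert_to_file("test.mp3", text="Hello from the blog-to-audio service.")
```

<p>Running it while Uvicorn is serving the app should write <code>test.mp3</code> to the current folder.</p>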
<h2 id="heading-how-to-add-a-simple-user-interface">How to Add a Simple User Interface</h2>
<p>A service is much easier to use when it has a clean UI. Below is a simple HTML page that sends either a URL or text to your FastAPI backend. Save this file as <code>index.html</code> in the same folder:</p>
<pre><code class="lang-xml"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>Blog to Audio<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
        <span class="hljs-selector-tag">body</span> { <span class="hljs-attribute">font-family</span>: Arial, sans-serif; <span class="hljs-attribute">padding</span>: <span class="hljs-number">40px</span>; <span class="hljs-attribute">max-width</span>: <span class="hljs-number">600px</span>; <span class="hljs-attribute">margin</span>: auto; }
        <span class="hljs-selector-tag">input</span>, <span class="hljs-selector-tag">textarea</span> { <span class="hljs-attribute">width</span>: <span class="hljs-number">100%</span>; <span class="hljs-attribute">padding</span>: <span class="hljs-number">10px</span>; <span class="hljs-attribute">margin-top</span>: <span class="hljs-number">10px</span>; }
        <span class="hljs-selector-tag">button</span> { <span class="hljs-attribute">padding</span>: <span class="hljs-number">12px</span> <span class="hljs-number">20px</span>; <span class="hljs-attribute">margin-top</span>: <span class="hljs-number">20px</span>; <span class="hljs-attribute">cursor</span>: pointer; }
    </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Convert Blog to Audio<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">action</span>=<span class="hljs-string">"/convert"</span> <span class="hljs-attr">method</span>=<span class="hljs-string">"post"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">label</span>&gt;</span>Blog URL<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"url"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Enter a blog link"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>or paste text below<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">textarea</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"text"</span> <span class="hljs-attr">rows</span>=<span class="hljs-string">"10"</span> <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Paste blog text here"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">textarea</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>&gt;</span>Convert to Audio<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p>This page gives the user two options. They can type a URL or paste text. The form sends the data to <code>/convert</code> using a POST request. The response will be the MP3 file, so the browser will download it.</p>
<p>To serve the HTML file, add this route to your <code>main.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi.responses <span class="hljs-keyword">import</span> HTMLResponse

<span class="hljs-meta">@app.get("/", response_class=HTMLResponse)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">home</span>():</span>
    <span class="hljs-comment"># index.html is resolved relative to the directory you start the server from</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">"index.html"</span>, <span class="hljs-string">"r"</span>, encoding=<span class="hljs-string">"utf-8"</span>) <span class="hljs-keyword">as</span> f:
        html = f.read()
    <span class="hljs-keyword">return</span> HTMLResponse(html)
</code></pre>
<p>Now, when you visit the root URL (<code>http://localhost:8000</code> when running locally), you will see a simple form.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191346855/7ac2b182-7c19-408b-8af9-5b696bad8cec.png" alt="Blog to Audio UI" class="image--center mx-auto" width="1000" height="352" loading="lazy"></p>
<p>When you submit a URL, the server will process your request and give you an audio file.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191378838/3fedbbba-0ae0-45a4-a0af-5565a78a0884.png" alt="Blog to Audio Result" class="image--center mx-auto" width="1000" height="409" loading="lazy"></p>
<p>Great. Our text-to-audio service is working. Now let’s get it into production.</p>
<h2 id="heading-how-to-deploy-your-service-to-sevalla">How to Deploy Your Service to Sevalla</h2>
<p>You can choose any cloud provider, like AWS, DigitalOcean, or others, to host your service. I will be using Sevalla for this example.</p>
<p><a target="_blank" href="https://sevalla.com/">Sevalla</a> is a developer-friendly PaaS provider. It offers application hosting, databases, object storage, and static site hosting for your projects.</p>
<p>Cloud platforms charge for the resources you create. Sevalla comes with a $50 credit for us to use, so we won’t incur any costs for this example.</p>
<p>Let’s push this project to GitHub so that we can connect our repository to Sevalla. We can also enable auto-deployments so that any new change to the repository is automatically deployed.</p>
<p>Alternatively, you can <a target="_blank" href="https://github.com/manishmshiva/blog-to-audio">fork my repository</a>.</p>
<p><a target="_blank" href="https://app.sevalla.com/login">Log in</a> to Sevalla and click on Applications -&gt; Create new application. You can see the option to link your GitHub repository to create a new application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191422806/85b3398b-9be7-4956-be4e-05c72b5dd6ae.png" alt="Sevalla Create Application" class="image--center mx-auto" width="1000" height="620" loading="lazy"></p>
<p>Use the default settings. Click “Create application”. Now we have to add our OpenAI API key to the environment variables. Click on the “Environment variables” section once the application is created, and save the <code>OPENAI_API_KEY</code> value as an environment variable.</p>
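<p>The OpenAI Python client reads <code>OPENAI_API_KEY</code> from the environment by default when you create it without an explicit <code>api_key</code>. If you'd like the app to fail fast with a clear message when the variable is missing, a minimal startup check could look like the sketch below. The <code>require_env</code> helper is hypothetical and not part of the original project.</p>
<pre><code class="lang-python">import os


def require_env(name):
    """Return an environment variable's value, or raise a clear error.

    Hypothetical helper: call it once at startup so a missing key fails
    immediately instead of on the first request.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your hosting platform's "
            "environment variables."
        )
    return value


# Example usage near the top of main.py (assumes the variable was configured):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
</code></pre>
<p>With this in place, a misconfigured deployment shows the error in the logs as soon as the app starts.</p>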
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191454748/2c19f048-74e3-46d0-90e2-44128be19201.png" alt="Sevalla Environment Variables" class="image--center mx-auto" width="1000" height="293" loading="lazy"></p>
<p>Now we are ready to deploy our application. Click on “Deployments” and click “Deploy now”. It will take 2–3 minutes for the deployment to complete.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191493335/cb789b5e-ff51-4ffb-b398-3b1ccd6bc137.png" alt="Sevalla Deployment" class="image--center mx-auto" width="1000" height="520" loading="lazy"></p>
<p>Once done, click on “Visit app”. You will see the application served via a URL ending with <code>sevalla.app</code>. This is your new root URL. You can replace <code>localhost:8000</code> with this URL and start using it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768191518487/591394e4-de93-43bf-ac5a-6492e45f1e60.png" alt="Application UI" class="image--center mx-auto" width="902" height="586" loading="lazy"></p>
<p>Congrats! Your blog-to-audio service is now live. You can extend it with new capabilities, and if you enabled auto-deployments, Sevalla will automatically deploy every change you push to GitHub.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You now know how to build a full blog-to-audio service using OpenAI. You learned how to fetch blog text, convert it into speech, and serve it with FastAPI. You also learned how to create a simple user interface, allowing people to try it with no setup. </p>
<p>With this foundation, you can turn any written content into smooth, natural audio. This can help creators reach a wider audience, enhance accessibility, and provide users with more ways to enjoy content.</p>
<p><em>Hope you enjoyed this article. Sign up for my free newsletter</em> <a target="_blank" href="https://www.turingtalks.ai/"><strong><em>TuringTalks.ai</em></strong></a> <em>for more hands-on tutorials on AI. You can also</em> <a target="_blank" href="https://manishshivanandhan.com/"><strong><em>visit my website</em></strong></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ A Game Developer’s Guide to Understanding Screen Resolution ]]>
                </title>
                <description>
                    <![CDATA[ Every game developer obsesses over performance, textures, and frame rates, but resolution is the quiet foundation that makes or breaks visual quality.  Whether you are building a pixel-art indie game or a high-fidelity 3D world, understanding how res... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/a-game-developers-guide-to-understanding-screen-resolution/</link>
                <guid isPermaLink="false">691de96a0dec4f292a0f8ff0</guid>
                
                    <category>
                        <![CDATA[ Game Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ optimization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ performance ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Wed, 19 Nov 2025 15:59:38 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763567809746/3fb2c926-9602-4765-9ef4-5ea565e0e148.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Every game developer obsesses over performance, textures, and frame rates, but resolution is the quiet foundation that makes or breaks visual quality. </p>
<p>Whether you are building a pixel-art indie game or a high-fidelity 3D world, understanding how resolution works is essential. </p>
<p>It affects how your art assets scale, how your UI appears, and how your game feels on different screens. Yet, many developers still treat resolution as a simple number instead of a design decision.</p>
<p>Let’s look at what resolution is and why it matters for game developers.</p>
<h2 id="heading-what-we-will-cover">What We Will Cover</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-what-resolution-really-means">What Resolution Really Means</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-evolution-of-resolution-in-gaming">The Evolution of Resolution in Gaming</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-dpi-scaling-and-texture-clarity">DPI, Scaling, and Texture Clarity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-resolution-vs-performance">Resolution vs. Performance</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-aspect-ratio-and-display-diversity">Aspect Ratio and Display Diversity</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-art-of-testing-in-4k-and-hdr">The Art of Testing in 4K and HDR</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-preparing-for-next-gen-displays">Preparing for Next-Gen Displays</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-what-resolution-really-means">What Resolution Really Means</h2>
<p>Resolution defines how many pixels a screen can display horizontally and vertically.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763470514266/2ba4689a-6e8d-423d-8da7-694bf7bc6d9e.png" alt="Screen Resolution Sizes" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>A monitor labelled 1920x1080 has 1920 pixels across and 1080 down, which equals over two million pixels in total. More pixels mean more visual detail but also more rendering work for the GPU.</p>
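<p>The pixel arithmetic is easy to check yourself. This short sketch multiplies width by height for a few common 16:9 resolutions and compares each one against 1080p. The resolutions are standard figures; the dictionary and helper names are just for illustration.</p>
<pre><code class="lang-python"># Common 16:9 resolutions as (width, height) in pixels.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}


def pixel_count(width, height):
    """Total pixels the GPU has to shade for one full-resolution frame."""
    return width * height


base = pixel_count(*RESOLUTIONS["1080p"])
for name, (w, h) in RESOLUTIONS.items():
    total = pixel_count(w, h)
    print(f"{name:7} {w}x{h} = {total:,} pixels ({total / base:.2f}x 1080p)")
</code></pre>
<p>1080p works out to 2,073,600 pixels, and 4K UHD to exactly four times that.</p>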
<p>In game development, that tradeoff is constant. Rendering at higher resolutions improves clarity but reduces frame rates unless your code and assets are optimized. </p>
<p>Many developers solve this by offering resolution scaling options in their games, letting players balance visual quality and performance.</p>
<p>It’s also important to distinguish between screen size and resolution. A 27-inch monitor and a 15-inch laptop can both run at 1080p, but the larger display will have bigger, less dense pixels. </p>
<p>This is where pixel density comes in. High-density displays pack more pixels per inch, creating smoother edges and sharper textures even at the same resolution.</p>
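<p>Pixel density is usually quoted in pixels per inch (PPI): the length of the diagonal in pixels divided by the diagonal size in inches. As a sketch, here is that calculation for the 27-inch monitor and 15-inch laptop from the example above, assuming both run at 1920x1080 with square pixels.</p>
<pre><code class="lang-python">import math


def ppi(width_px, height_px, diagonal_in):
    """Pixels per inch, assuming square pixels.

    Pythagoras gives the diagonal length in pixels; dividing it by the
    physical diagonal gives how many pixels fit into each inch of screen.
    """
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in


monitor = ppi(1920, 1080, 27)  # 27-inch desktop monitor
laptop = ppi(1920, 1080, 15)   # 15-inch laptop panel
print(f"27-inch 1080p: {monitor:.1f} PPI")
print(f"15-inch 1080p: {laptop:.1f} PPI")
</code></pre>
<p>At the same resolution, the laptop packs about 1.8 times as many pixels into each inch (roughly 147 vs. 82 PPI), which is why the same game looks noticeably sharper on it.</p>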
<h2 id="heading-the-evolution-of-resolution-in-gaming">The Evolution of Resolution in Gaming</h2>
<p>Games have evolved alongside display technology. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763514379811/7a5bef4e-5441-4b40-99cb-3d925865ac87.jpeg" alt="Gameplay Resolution" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Early consoles ran at 240p, then 480p during the SD era. The jump to HD with 720p and 1080p transformed game visuals. Suddenly, developers had to think about anti-aliasing, texture resolution, and UI scaling in new ways.</p>
<p>Today, 4K and HDR are standard targets for current consoles and high-end PCs. Developers now design with higher fidelity in mind, building lighting systems, shaders, and art pipelines that scale up to Ultra HD. </p>
<p>That’s why testing on different display resolutions isn’t just good practice; it’s critical for a consistent player experience.</p>
<p>If you want to see how your game performs on large high-resolution displays, try testing it on a modern TV built for consoles like the PS5. Many of these TVs support 4K at 120Hz, giving you a realistic look at how your game will appear in a living-room setup. </p>
<p>They also help you spot UI scaling issues, frame pacing problems, and HDR color mismatches that might go unnoticed on a typical monitor.</p>
<h2 id="heading-dpi-scaling-and-texture-clarity">DPI, Scaling, and Texture Clarity</h2>
<p>For web developers, <a target="_blank" href="https://en.wikipedia.org/wiki/Dots_per_inch">DPI</a> mostly affects how images scale. But for game developers, DPI connects directly to texture resolution and how art assets are perceived at different screen sizes. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763470672635/57795a33-7700-4aee-8dd4-aceb8b71dd49.jpeg" alt="DPI Levels" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>A sprite that looks crisp on a 1080p monitor might appear tiny or blurry on a 4K display if not properly scaled. Engines like <a target="_blank" href="https://www.freecodecamp.org/news/game-development-for-beginners-unity-course/">Unity</a> and Unreal handle this with dynamic scaling options, but understanding the underlying math helps. </p>
<p>When your display density doubles, each asset needs four times as many pixels to appear at the same size and sharpness. If you do not plan for this, your carefully crafted textures might look soft or misaligned on higher-resolution displays.</p>
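<p>The four-times rule applies to memory too. Here is a rough sketch, assuming uncompressed RGBA8 textures (4 bytes per pixel) with a full mipmap chain: doubling a texture's width and height to match a display with twice the density quadruples its footprint. Engines typically use block-compressed formats, so treat these figures as upper bounds rather than what your engine will report.</p>
<pre><code class="lang-python">BYTES_PER_PIXEL = 4  # assumption: uncompressed RGBA8


def texture_bytes(width, height, with_mips=True):
    """Approximate GPU memory for one uncompressed texture.

    Each mip level halves both dimensions (never below 1 pixel), so the
    full chain adds roughly one third on top of the base level.
    """
    total = 0
    w, h = width, height
    while True:
        total += w * h * BYTES_PER_PIXEL
        if not with_mips or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


for size in (1024, 2048):
    mib = texture_bytes(size, size) / (1024 * 1024)
    print(f"{size}x{size}: {mib:.2f} MiB with mips")
</code></pre>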
<p>This is why UI systems in modern engines rely on resolution-independent units. In Unity, Canvas Scaler helps ensure your interface looks the same on every device. In Unreal, DPI scaling rules allow developers to maintain consistent HUD layouts. Getting this right means your game remains legible on everything from handhelds to 8K TVs.</p>
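<p>The idea behind resolution-independent UI fits in a few lines: you design against a reference resolution, then compute one scale factor for the current screen. The version below blends the width and height ratios in log space using a 0 to 1 <code>match</code> weight. It is similar in spirit to Unity's "Scale With Screen Size" Canvas Scaler mode, but it is a simplified illustration, not Unity's code; the function name and the 1920x1080 reference are assumptions.</p>
<pre><code class="lang-python">import math

REFERENCE = (1920, 1080)  # assumed design resolution for the UI layout


def ui_scale(screen_w, screen_h, match=0.5, reference=REFERENCE):
    """Scale factor for UI elements authored at the reference resolution.

    match = 0.0 scales purely by width, 1.0 purely by height. Values in
    between blend the two ratios in log2 space, so a screen that is 2x
    wider and 0.5x taller than the reference averages to a neutral 1.0.
    """
    ref_w, ref_h = reference
    log_w = math.log2(screen_w / ref_w)
    log_h = math.log2(screen_h / ref_h)
    blended = log_w + (log_h - log_w) * match
    return 2 ** blended


print(ui_scale(3840, 2160))             # 4K: UI drawn at twice the reference size
print(ui_scale(1280, 720))              # 720p handheld
print(ui_scale(3440, 1440, match=1.0))  # ultrawide, following height only
</code></pre>
<p>Matching by height is a common choice for landscape games, because it keeps HUD elements the same relative size when extra width appears on ultrawide screens.</p>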
<h2 id="heading-resolution-vs-performance">Resolution vs. Performance</h2>
<p>The biggest cost of higher resolution is GPU load. Rendering in 4K means pushing four times as many pixels as 1080p. Without proper optimization, frame rates can drop sharply. </p>
<p>That’s why many <a target="_blank" href="https://en.wikipedia.org/wiki/AAA_%28video_game_industry%29">AAA games</a> use resolution scaling techniques like temporal upsampling or DLSS. These methods render frames at a lower internal resolution and then use AI or temporal reconstruction to upscale them to the output resolution with minimal visible loss of clarity.</p>
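<p>Upscalers like DLSS are vendor libraries, but the "render lower, present higher" idea they build on is simple to show. The sketch below is a hypothetical dynamic resolution controller, not any engine's API. Because shading cost grows roughly with pixel count, it nudges the per-axis render scale by the square root of the ratio between the frame-time budget and the measured GPU time, then clamps the result.</p>
<pre><code class="lang-python">import math


class DynamicResolution:
    """Hypothetical controller that keeps GPU frame time near a target.

    Assumes GPU cost is roughly proportional to rendered pixel count, so the
    per-axis render scale moves with the square root of the time ratio.
    """

    def __init__(self, target_ms=16.7, min_scale=0.5, max_scale=1.0, damping=0.5):
        self.target_ms = target_ms  # frame-time budget in milliseconds
        self.min_scale = min_scale  # lowest per-axis render scale allowed
        self.max_scale = max_scale  # 1.0 means native resolution
        self.damping = damping      # fraction of the correction applied per update
        self.scale = max_scale

    def update(self, gpu_ms):
        """Adjust the render scale from the last measured GPU frame time."""
        ideal = self.scale * math.sqrt(self.target_ms / gpu_ms)
        # Move only part of the way toward the ideal to avoid oscillation.
        new_scale = self.scale + (ideal - self.scale) * self.damping
        self.scale = min(self.max_scale, max(self.min_scale, new_scale))
        return self.scale

    def render_size(self, out_w, out_h):
        """Internal resolution to render at before upscaling to the output size."""
        return round(out_w * self.scale), round(out_h * self.scale)


drs = DynamicResolution(target_ms=16.7)
for measured in (25.0, 22.0, 18.0, 16.5, 16.8):
    drs.update(measured)
    print(f"GPU {measured:.1f} ms: render at {drs.render_size(3840, 2160)}")
</code></pre>
<p>The square root comes from scaling both axes: a per-axis scale of 0.5 renders a quarter of the pixels, so 4K output is rendered internally at 1920x1080 and upscaled.</p>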
<p>As a developer, you should test your game across multiple resolutions and aspect ratios. This helps ensure your render pipeline, shaders, and assets adapt smoothly. Tools like <a target="_blank" href="https://developer.nvidia.com/nsight-systems">NVIDIA Nsight</a> or Unreal’s built-in profiler show how resolution affects frame time and GPU usage.</p>
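<p>Profilers report frame time in milliseconds rather than frames per second, and the per-frame budget is simply 1000 divided by the target frame rate (16.67 ms at 60 FPS, 8.33 ms at 120 FPS). Here is a small sketch for turning captured frame times into the numbers you usually care about; the sample values are made up for illustration.</p>
<pre><code class="lang-python">import bisect


def frame_budget_ms(target_fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / target_fps


def summarize(frame_times_ms, target_fps):
    """Average FPS, worst frame, and how many frames missed the budget."""
    budget = frame_budget_ms(target_fps)
    times = sorted(frame_times_ms)
    avg_ms = sum(times) / len(times)
    # Frames at or under budget sit left of bisect_right; the rest missed it.
    missed = len(times) - bisect.bisect_right(times, budget)
    return {
        "budget_ms": round(budget, 2),
        "avg_fps": round(1000.0 / avg_ms, 1),
        "missed_frames": missed,
        "worst_ms": times[-1],
    }


# Hypothetical capture at 4K, targeting 60 FPS.
print(summarize([15.9, 16.2, 17.8, 16.0, 21.4, 16.4], target_fps=60))
</code></pre>
<p>Looking at missed frames and the worst frame, not just the average, is what exposes the stutter that higher resolutions tend to introduce.</p>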
<p>If your game includes video content or cinematic sequences, also remember that video compression behaves differently at higher resolutions. Encoding 4K video requires significantly more bandwidth and storage, which can affect your build size and performance during playback.</p>
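<p>The storage cost of pre-rendered video is straightforward arithmetic: file size is bitrate times duration, divided by 8 to go from bits to bytes. The bitrates below are placeholder assumptions, not recommended encoder settings; plug in the values from your own encoder to estimate how cutscenes affect build size.</p>
<pre><code class="lang-python">def video_size_mb(bitrate_mbps, duration_s):
    """Approximate file size in megabytes (10**6 bytes).

    bitrate_mbps is in megabits per second; dividing by 8 converts bits to bytes.
    """
    return bitrate_mbps * duration_s / 8


# Hypothetical bitrates for a 2-minute cutscene at each resolution.
cutscene_s = 120
for label, mbps in (("1080p", 12), ("4K", 40)):
    print(f"{label}: {video_size_mb(mbps, cutscene_s):.0f} MB")
</code></pre>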
<h2 id="heading-aspect-ratio-and-display-diversity">Aspect Ratio and Display Diversity</h2>
<p>Aspect ratio determines the shape of the display.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763476458560/52decf37-c4f4-4927-96b8-1c6fd9be074c.jpeg" alt="Aspect Ratios" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>Most modern games target 16:9, but 21:9 ultrawide and 32:9 super-ultrawide displays are becoming more popular. Developers must ensure their camera framing and UI layouts adapt accordingly.</p>
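<p>Marketing names like "21:9" are rounded. Reducing a resolution by its greatest common divisor shows the exact ratio, which is what your camera and UI code actually have to handle. A quick sketch using only the standard library:</p>
<pre><code class="lang-python">import math


def aspect_ratio(width, height):
    """Exact aspect ratio as a reduced (w, h) pair plus its decimal value."""
    d = math.gcd(width, height)
    return (width // d, height // d), width / height


for w, h in ((1920, 1080), (2560, 1080), (3440, 1440), (5120, 1440)):
    (rw, rh), value = aspect_ratio(w, h)
    print(f"{w}x{h} reduces to {rw}:{rh} ({value:.2f}:1)")
</code></pre>
<p>The popular 2560x1080 and 3440x1440 ultrawide panels sold as "21:9" reduce to 64:27 and 43:18, both slightly wider than a true 21:9 (about 2.33:1).</p>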
<p>When a game is locked to one ratio, black bars or stretching can occur. To fix this, adjust your camera’s field of view dynamically or provide safe viewport settings.</p>
<p>Engines like Unreal let you script these adjustments easily, while Unity’s Cinemachine system handles FOV scaling automatically.</p>
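<p>Both fixes can be sketched with plain math. "Hor+" scaling keeps the vertical field of view fixed and derives the horizontal one from the aspect ratio, so wider screens show more of the scene. Letterboxing instead keeps a fixed target ratio and computes a centered viewport with bars around it. The function names below are illustrative, not engine APIs, and the 60-degree vertical FOV is an assumption.</p>
<pre><code class="lang-python">import math


def horizontal_fov(vertical_fov_deg, aspect):
    """Hor+ scaling: keep vertical FOV fixed, widen horizontal FOV with aspect."""
    half_v = math.radians(vertical_fov_deg) / 2
    return math.degrees(2 * math.atan(math.tan(half_v) * aspect))


def letterbox_viewport(screen_w, screen_h, target_aspect):
    """Centered (x, y, w, h) viewport that preserves target_aspect.

    Wider screens get bars left and right (pillarbox); taller screens get
    bars top and bottom (letterbox). Taking the min of both candidates
    picks whichever dimension is the limiting one.
    """
    view_w = min(screen_w, round(screen_h * target_aspect))
    view_h = min(screen_h, round(screen_w / target_aspect))
    x = (screen_w - view_w) // 2
    y = (screen_h - view_h) // 2
    return x, y, view_w, view_h


vfov = 60.0  # assumed vertical FOV in degrees
for aspect in (16 / 9, 64 / 27, 32 / 9):
    print(f"aspect {aspect:.2f}: horizontal FOV {horizontal_fov(vfov, aspect):.1f} deg")

# A game locked to 16:9 on a 3440x1440 ultrawide gets pillarbox bars.
print(letterbox_viewport(3440, 1440, 16 / 9))
</code></pre>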
<p>Even TVs now vary in aspect ratio capabilities, especially with new mini LED and OLED technologies. Testing across multiple ratios ensures your game looks balanced and cinematic on every screen.</p>
<h2 id="heading-the-art-of-testing-in-4k-and-hdr">The Art of Testing in 4K and HDR</h2>
<p>4K and HDR introduce new layers of visual complexity. HDR displays show a wider range of brightness and color depth, which means lighting and textures can look completely different compared to SDR monitors. To handle this, calibrate your color grading pipeline and use tone mapping tools within your engine.</p>
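<p>Tone mapping compresses the wide brightness range of an HDR render into what a display can show. Engines ship more sophisticated curves (filmic, ACES-style curves are common), but the classic Reinhard operator is enough to show the idea: <code>L / (1 + L)</code> maps any non-negative luminance into the 0 to 1 range, compressing highlights far more than shadows. This sketch assumes linear luminance input and a simple 2.2 gamma for 8-bit SDR output.</p>
<pre><code class="lang-python">def reinhard(luminance, exposure=1.0):
    """Classic Reinhard tone mapping: maps [0, inf) into [0, 1)."""
    x = luminance * exposure
    return x / (1.0 + x)


def encode_sdr(value, gamma=2.2):
    """Approximate 8-bit display encoding, assuming a plain 2.2 gamma."""
    return round(255 * value ** (1.0 / gamma))


# Linear scene luminance: deep shadow, mid-grey, bright sky, sun highlight.
for lum in (0.05, 0.18, 4.0, 100.0):
    mapped = reinhard(lum)
    print(f"{lum:7.2f} maps to {mapped:.3f}, 8-bit code {encode_sdr(mapped)}")
</code></pre>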
<p>When working with HDR assets, always test your output on real hardware. Emulators and standard (SDR) monitors often fail to reproduce true HDR contrast. A proper HDR-certified TV helps you identify overexposure, color clipping, and banding issues before release.</p>
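<p>Banding comes from quantization: a smooth gradient stored with too few code values turns into visible steps. The sketch below quantizes a full-width 4K gradient at 8 and 10 bits per channel and reports the number of distinct levels and the average band width. It is a simplified model that ignores dithering.</p>
<pre><code class="lang-python">def band_stats(width_px, bits):
    """Quantize a 0..1 horizontal gradient; return (distinct levels, band width in px)."""
    max_code = 2 ** bits - 1
    codes = {round(x / (width_px - 1) * max_code) for x in range(width_px)}
    return len(codes), width_px / len(codes)


for bits in (8, 10):
    levels, band = band_stats(3840, bits)
    print(f"{bits}-bit: {levels} levels, each band about {band:.1f} px wide")
</code></pre>
<p>On a real scene the gradient usually covers only a small slice of the value range, such as a dark sky, so the bands get much wider than this best case.</p>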
<h2 id="heading-preparing-for-next-gen-displays">Preparing for Next-Gen Displays</h2>
<p>The display industry continues to evolve quickly. 8K and high-refresh-rate panels are already entering mainstream markets.</p>
<p>For developers, this means thinking ahead. Designing scalable rendering systems, supporting dynamic resolution, and maintaining flexible UI layouts are now essential parts of modern game design.</p>
<p>As displays get sharper, player expectations rise too. Textures, shaders, and post-processing all need to support higher levels of detail without compromising performance. By understanding how resolution interacts with your pipeline, you can future-proof your games for years to come.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Resolution is more than a number on a settings menu. It is a design constraint, a performance factor, and a creative opportunity. As a game developer, mastering resolution helps you build experiences that look sharp, play smoothly, and scale across every device.</p>
<p>The next time you polish your textures or fine-tune your rendering settings, remember that every pixel counts. Understanding how resolution, scaling, and density interact will not only make your games more beautiful but also more accessible to every player, whether they’re gaming on a laptop, a monitor, or the living-room TV that brings your visuals to life in stunning detail.</p>
<p><em>Hope you enjoyed this article. Find me on</em> <a target="_blank" href="https://linkedin.com/in/manishmshiva"><em>LinkedIn</em></a> <em>or</em> <a target="_blank" href="https://manishshivanandhan.com/"><em>visit my website</em></a><em>.</em></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Transformers for Real-Time Gesture Recognition ]]>
                </title>
                <description>
                    <![CDATA[ Gesture and sign recognition is a growing field in computer vision, powering accessibility tools and natural user interfaces. Most beginner projects rely on hand landmarks or small CNNs, but these often miss the bigger picture because gestures are no... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/using-transformers-for-real-time-gesture-recognition/</link>
                <guid isPermaLink="false">68e3c692aa82abf4b593114c</guid>
                
                    <category>
                        <![CDATA[ Computer Vision ]]>
                    </category>
                
                    <category>
                        <![CDATA[ transformers ]]>
                    </category>
                
                    <category>
                        <![CDATA[ pytorch ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ONNX ]]>
                    </category>
                
                    <category>
                        <![CDATA[ gradio ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Deep Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Gesture Recognition ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Tutorial ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ OMOTAYO OMOYEMI ]]>
                </dc:creator>
                <pubDate>Mon, 06 Oct 2025 13:39:30 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1759757931295/5f19fd4e-93c0-4bd7-a75c-a7858e061ecd.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Gesture and sign recognition is a growing field in computer vision, powering accessibility tools and natural user interfaces. Most beginner projects rely on hand landmarks or small CNNs, but these often miss the bigger picture because gestures are not static images. Rather, they unfold over time. To build more robust, real-time systems, we need models that can capture both spatial details and temporal context.</p>
<p>This is where Transformers come in. Originally built for language, they’ve become state-of-the-art in vision tasks thanks to models like the Vision Transformer (ViT) and video-focused variants such as TimeSformer.</p>
<p>In this tutorial, we’ll use a Transformer backbone to create a lightweight real-time gesture recognition tool, optimized for small datasets and deployable on a regular laptop with a webcam.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-transformers-for-gestures">Why Transformers for Gestures?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-youll-learn">What You’ll Learn</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-setup">Project Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-generate-a-gesture-dataset">Generate a Gesture Dataset</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-1-generate-a-synthetic-dataset">Option 1: Generate a Synthetic Dataset</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-training-script-trainpy">Training Script: train.py</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-export-the-model-to-onnx">Export the Model to ONNX</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-evaluate-accuracy-latency">Evaluate Accuracy + Latency</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-option-2-use-small-samples-from-public-gesture-datasets">Option 2: Use Small Samples from Public Gesture Datasets</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-accessibility-notes-amp-ethical-limits">Accessibility Notes &amp; Ethical Limits</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-next-steps">Next Steps</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-why-transformers-for-gestures">Why Transformers for Gestures?</h2>
<p>Transformers are powerful because they use self-attention to model relationships across a sequence. For gestures, this means the model doesn’t just see isolated frames, but also learns how movements evolve over time. A wave, for example, looks different from a raised hand only when viewed as a sequence.</p>
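<p>To make "self-attention across a sequence" concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention over a clip of frame embeddings. It is not the model trained later in this tutorial: the embedding size, the random projection matrices, and the function name are assumptions for illustration. Each output row is a weighted mix of all frames, so every frame's representation can draw on context from the rest of the clip.</p>
<pre><code class="lang-python">import numpy as np


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over T frame embeddings.

    frames: (T, D) array, one embedding per video frame.
    w_q, w_k, w_v: (D, D) projections (random here, learned in a real model).
    Returns the attended (T, D) features and the (T, T) attention weights.
    """
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over frames
    return weights @ v, weights


rng = np.random.default_rng(0)
T, D = 8, 16  # 8 frames, 16-dim embeddings (illustrative sizes)
frames = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
out, attn = self_attention(frames, w_q, w_k, w_v)
print(out.shape, attn.shape)  # each row of attn sums to 1
</code></pre>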
<p>Vision Transformers process images as patches, while video Transformers extend this to multiple frames with temporal attention. Even a simple approach, like applying ViT to each frame and pooling across time, can outperform traditional CNN-based methods for small datasets.</p>
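<p>The "ViT per frame, then pool over time" approach boils down to three steps: embed each frame, average the embeddings across the time axis, and classify the pooled vector. Here is a minimal NumPy sketch of the pooling and linear-head step, with random stand-ins for the ViT embeddings and classifier weights. The class names match the synthetic dataset used later; the 16-frame clip length is an assumption, and 768 is the embedding width of ViT-Base.</p>
<pre><code class="lang-python">import numpy as np

CLASSES = ["swipe_left", "swipe_right", "stop"]


def classify_clip(frame_embeddings, head_w, head_b):
    """Mean-pool (T, D) per-frame embeddings over time, then apply a linear head.

    Returns softmax probabilities over CLASSES.
    """
    pooled = frame_embeddings.mean(axis=0)  # (D,)
    logits = pooled @ head_w + head_b       # (num_classes,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


rng = np.random.default_rng(42)
T, D = 16, 768
embeddings = rng.normal(size=(T, D))  # stand-in for per-frame ViT outputs
head_w = rng.normal(size=(D, len(CLASSES))) * 0.01
head_b = np.zeros(len(CLASSES))
probs = classify_clip(embeddings, head_w, head_b)
print({c: round(float(p), 3) for c, p in zip(CLASSES, probs)})
</code></pre>
<p>Note that a plain mean ignores frame order: reversing the clip gives the same prediction. Temporal attention with positional information is the usual way to recover ordering.</p>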
<p>Combined with Hugging Face’s pre-trained models and ONNX Runtime for optimization, Transformers make it possible to train on a modest dataset and still achieve smooth real-time recognition.</p>
<h2 id="heading-what-youll-learn">What You’ll Learn</h2>
<p>In this tutorial, you’ll build a gesture recognition system using Transformers. By the end, you’ll know how to:</p>
<ul>
<li><p>Create (or record) a tiny gesture dataset</p>
</li>
<li><p>Train a Vision Transformer (ViT) with temporal pooling</p>
</li>
<li><p>Export the model to ONNX for faster inference</p>
</li>
<li><p>Build a real-time Gradio app that classifies gestures from your webcam</p>
</li>
<li><p>Evaluate your model’s accuracy and latency with simple scripts</p>
</li>
<li><p>Understand the accessibility potential and ethical limits of gesture recognition</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along, you should have:</p>
<ul>
<li><p>Basic Python knowledge (functions, scripts, virtual environments)</p>
</li>
<li><p>Familiarity with PyTorch (tensors, datasets, training loops) – helpful but not required</p>
</li>
<li><p>Python 3.8+ installed on your system</p>
</li>
<li><p>A webcam (for the live demo in Gradio)</p>
</li>
<li><p>Optionally: GPU access (training on CPU works, but is slower)</p>
</li>
</ul>
<h2 id="heading-project-setup">Project Setup</h2>
<p>Create a new project folder and install the required libraries.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create a new project directory and navigate into it</span>
mkdir transformer-gesture &amp;&amp; <span class="hljs-built_in">cd</span> transformer-gesture

<span class="hljs-comment"># Set up a Python virtual environment</span>
python -m venv .venv

<span class="hljs-comment"># Activate the virtual environment</span>
<span class="hljs-comment"># Windows PowerShell</span>
.venv\Scripts\Activate.ps1

<span class="hljs-comment"># macOS/Linux</span>
<span class="hljs-built_in">source</span> .venv/bin/activate
</code></pre>
<p>These commands set up a new Python project with a virtual environment. Here's what each part does:</p>
<ol>
<li><p><code>mkdir transformer-gesture &amp;&amp; cd transformer-gesture</code>: This command creates a new directory named "transformer-gesture" and then navigates into it.</p>
</li>
<li><p><code>python -m venv .venv</code>: This command creates a new virtual environment in the current directory. The virtual environment is stored in a folder named ".venv".</p>
</li>
<li><p>Activating the virtual environment:</p>
<ul>
<li><p>For Windows PowerShell, you can use <code>.venv\Scripts\Activate.ps1</code> to activate the virtual environment.</p>
</li>
<li><p>For macOS/Linux, use <code>source .venv/bin/activate</code> to activate the virtual environment.</p>
</li>
</ul>
</li>
</ol>
<p>Activating a virtual environment ensures that the Python interpreter and any packages you install are isolated to this specific project, preventing conflicts with other projects or system-wide packages.</p>
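<p>If you're unsure whether the environment is active, Python can tell you: inside a virtual environment created with <code>venv</code>, <code>sys.prefix</code> points at the environment while <code>sys.base_prefix</code> still points at the base interpreter. A quick check you can save as a tiny script (the function name is illustrative):</p>
<pre><code class="lang-python">import sys


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
</code></pre>
<p>If it prints <code>False</code>, rerun the activation command for your shell before installing packages.</p>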
<p>Create a <code>requirements.txt</code> file:</p>
<pre><code class="lang-plaintext">torch&gt;=2.0
torchvision
torchaudio
timm
huggingface_hub

onnx
onnxruntime

gradio

numpy
opencv-python
pillow

matplotlib
seaborn
scikit-learn
</code></pre>
<p>Here's a brief explanation of each package in the <code>requirements.txt</code> file:</p>
<ol>
<li><p><strong>torch&gt;=2.0</strong>: PyTorch is a popular open-source deep learning framework that provides a flexible and efficient platform for building and training neural networks. Version 2.0 introduced <code>torch.compile</code> along with other performance improvements.</p>
</li>
<li><p><strong>torchvision</strong>: This library is part of the PyTorch ecosystem and provides tools for computer vision tasks, including datasets, model architectures, and image transformations.</p>
</li>
<li><p><strong>torchaudio</strong>: Also part of the PyTorch ecosystem, Torchaudio provides audio processing tools and datasets, making it easier to work with audio data in deep learning projects.</p>
</li>
<li><p><strong>timm</strong>: The PyTorch Image Models (timm) library offers a collection of pre-trained models and utilities for computer vision tasks, facilitating quick experimentation and deployment.</p>
</li>
<li><p><strong>huggingface_hub</strong>: This library allows easy access to models and datasets hosted on the Hugging Face Hub, a platform for sharing and collaborating on machine learning models and datasets.</p>
</li>
<li><p><strong>onnx</strong>: The Open Neural Network Exchange (ONNX) format is used to represent machine learning models, enabling interoperability between different frameworks.</p>
</li>
<li><p><strong>onnxruntime</strong>: This is a high-performance runtime for executing ONNX models, allowing for efficient deployment across various platforms.</p>
</li>
<li><p><strong>gradio</strong>: Gradio is a library for creating user interfaces for machine learning models, making them accessible through a web interface for easy interaction and testing.</p>
</li>
<li><p><strong>numpy</strong>: A fundamental package for numerical computing in Python, providing support for arrays and a wide range of mathematical functions.</p>
</li>
<li><p><strong>opencv-python</strong>: OpenCV is a library for computer vision and image processing tasks, widely used for real-time applications.</p>
</li>
<li><p><strong>pillow</strong>: A fork of the Python Imaging Library (PIL), Pillow provides tools for opening, manipulating, and saving many different image file formats.</p>
</li>
<li><p><strong>matplotlib</strong>: A plotting library for Python, Matplotlib is used for creating static, interactive, and animated visualizations in Python.</p>
</li>
<li><p><strong>seaborn</strong>: Built on top of Matplotlib, Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.</p>
</li>
<li><p><strong>scikit-learn</strong>: A machine learning library in Python that provides simple and efficient tools for data analysis and modeling, including classification, regression, clustering, and dimensionality reduction.</p>
</li>
</ol>
<p>Install dependencies:</p>
<pre><code class="lang-bash">pip install -r requirements.txt
</code></pre>
<p>This command tells <code>pip</code>, the Python package installer, to read <code>requirements.txt</code> and install every package listed there, respecting any version constraints such as <code>torch&gt;=2.0</code>.</p>
<p>Keeping dependencies in a requirements file is common practice in Python projects, because it makes the environment easy to reproduce and share.</p>
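<p>After installing, you can confirm that every requirement is present using only the standard library's <code>importlib.metadata</code>. This sketch reads package names from <code>requirements.txt</code>, keeps the leading name from each line, and reports anything missing. It does not check version constraints and only handles simple <code>name</code> or <code>name&gt;=x</code> lines like the ones above; the function names are illustrative.</p>
<pre><code class="lang-python">import re
from importlib import metadata

# A package name is the leading run of letters, digits, ".", "_" and "-".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines):
    """Yield bare package names, skipping blank lines and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = NAME_RE.match(line)
        if match:
            yield match.group(0)


def missing_packages(names):
    """Return the names that importlib.metadata cannot find."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def report(path="requirements.txt"):
    """Summarize which requirements are not installed in the active environment."""
    with open(path, encoding="utf-8") as f:
        gaps = missing_packages(requirement_names(f))
    return "All requirements installed." if not gaps else "Missing: " + ", ".join(gaps)


# Run from the project root with the virtual environment active:
# print(report())
</code></pre>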
<h2 id="heading-generate-a-gesture-dataset">Generate a Gesture Dataset</h2>
<p>To train our Transformer-based gesture recognizer, we need some data. Instead of downloading a huge dataset, we’ll start with a tiny synthetic dataset you can generate in seconds. This makes the tutorial lightweight and ensures that everyone can follow along without dealing with multi-gigabyte downloads.</p>
<h2 id="heading-option-1-generate-a-synthetic-dataset">Option 1: Generate a Synthetic Dataset</h2>
<p>We’ll use a small Python script that creates short <code>.mp4</code> clips of a moving (or still) coloured box. Each class represents a gesture:</p>
<ul>
<li><p><strong>swipe_left</strong> – box moves from right to left</p>
</li>
<li><p><strong>swipe_right</strong> – box moves from left to right</p>
</li>
<li><p><strong>stop</strong> – box stays still in the center</p>
</li>
</ul>
<p>Save this script as <code>generate_synthetic_gestures.py</code> in your project root:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> argparse
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> random

<span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ensure_dir</span>(<span class="hljs-params">p</span>):</span>
    os.makedirs(p, exist_ok=<span class="hljs-literal">True</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">make_clip</span>(<span class="hljs-params">mode, out_path, seconds=<span class="hljs-number">1.5</span>, fps=<span class="hljs-number">16</span>, size=<span class="hljs-number">224</span>, box_size=<span class="hljs-number">60</span>, seed=<span class="hljs-number">0</span>, codec=<span class="hljs-string">"mp4v"</span></span>):</span>
    rng = random.Random(seed)
    frames = int(seconds * fps)
    H = W = size

    <span class="hljs-comment"># background + box color</span>
    bg_val = rng.randint(<span class="hljs-number">160</span>, <span class="hljs-number">220</span>)
    bg = np.full((H, W, <span class="hljs-number">3</span>), bg_val, dtype=np.uint8)
    color = (rng.randint(<span class="hljs-number">20</span>, <span class="hljs-number">80</span>), rng.randint(<span class="hljs-number">20</span>, <span class="hljs-number">80</span>), rng.randint(<span class="hljs-number">20</span>, <span class="hljs-number">80</span>))

    <span class="hljs-comment"># path of motion</span>
    y = rng.randint(<span class="hljs-number">40</span>, H - <span class="hljs-number">40</span> - box_size)
    <span class="hljs-keyword">if</span> mode == <span class="hljs-string">"swipe_left"</span>:
        x_start, x_end = W - <span class="hljs-number">20</span> - box_size, <span class="hljs-number">20</span>
    <span class="hljs-keyword">elif</span> mode == <span class="hljs-string">"swipe_right"</span>:
        x_start, x_end = <span class="hljs-number">20</span>, W - <span class="hljs-number">20</span> - box_size
    <span class="hljs-keyword">elif</span> mode == <span class="hljs-string">"stop"</span>:
        x_start = x_end = (W - box_size) // <span class="hljs-number">2</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">f"Unknown mode: <span class="hljs-subst">{mode}</span>"</span>)

    fourcc = cv2.VideoWriter_fourcc(*codec)
    vw = cv2.VideoWriter(out_path, fourcc, fps, (W, H))
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> vw.isOpened():
        <span class="hljs-keyword">raise</span> RuntimeError(
            <span class="hljs-string">f"Could not open VideoWriter with codec '<span class="hljs-subst">{codec}</span>'. "</span>
            <span class="hljs-string">"Try --codec XVID and use .avi extension, e.g. out.avi"</span>
        )

    <span class="hljs-keyword">for</span> t <span class="hljs-keyword">in</span> range(frames):
        alpha = t / max(<span class="hljs-number">1</span>, frames - <span class="hljs-number">1</span>)
        x = int((<span class="hljs-number">1</span> - alpha) * x_start + alpha * x_end)
        <span class="hljs-comment"># small jitter to avoid being too synthetic</span>
        jitter_x, jitter_y = rng.randint(<span class="hljs-number">-2</span>, <span class="hljs-number">2</span>), rng.randint(<span class="hljs-number">-2</span>, <span class="hljs-number">2</span>)
        frame = bg.copy()
        cv2.rectangle(frame, (x + jitter_x, y + jitter_y),
                      (x + jitter_x + box_size, y + jitter_y + box_size),
                      color, thickness=<span class="hljs-number">-1</span>)
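        <span class="hljs-comment"># overlay the class name (note: this draws the label into every frame, so the model may learn to read it; drop the two putText lines for a harder test)</span>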
        cv2.putText(frame, mode, (<span class="hljs-number">8</span>, <span class="hljs-number">24</span>), cv2.FONT_HERSHEY_SIMPLEX, <span class="hljs-number">0.7</span>, (<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>), <span class="hljs-number">2</span>, cv2.LINE_AA)
        cv2.putText(frame, mode, (<span class="hljs-number">8</span>, <span class="hljs-number">24</span>), cv2.FONT_HERSHEY_SIMPLEX, <span class="hljs-number">0.7</span>, (<span class="hljs-number">255</span>, <span class="hljs-number">255</span>, <span class="hljs-number">255</span>), <span class="hljs-number">1</span>, cv2.LINE_AA)
        vw.write(frame)

    vw.release()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">write_labels</span>(<span class="hljs-params">labels, out_dir</span>):</span>
    <span class="hljs-keyword">with</span> open(os.path.join(out_dir, <span class="hljs-string">"labels.txt"</span>), <span class="hljs-string">"w"</span>, encoding=<span class="hljs-string">"utf-8"</span>) <span class="hljs-keyword">as</span> f:
        <span class="hljs-keyword">for</span> c <span class="hljs-keyword">in</span> labels:
            f.write(c + <span class="hljs-string">"\n"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    ap = argparse.ArgumentParser(description=<span class="hljs-string">"Generate a tiny synthetic gesture dataset."</span>)
    ap.add_argument(<span class="hljs-string">"--out"</span>, default=<span class="hljs-string">"data"</span>, help=<span class="hljs-string">"Output directory (default: data)"</span>)
    ap.add_argument(<span class="hljs-string">"--classes"</span>, nargs=<span class="hljs-string">"+"</span>,
                    default=[<span class="hljs-string">"swipe_left"</span>, <span class="hljs-string">"swipe_right"</span>, <span class="hljs-string">"stop"</span>],
                    help=<span class="hljs-string">"Class names (default: swipe_left swipe_right stop)"</span>)
    ap.add_argument(<span class="hljs-string">"--clips"</span>, type=int, default=<span class="hljs-number">16</span>, help=<span class="hljs-string">"Clips per class (default: 16)"</span>)
    ap.add_argument(<span class="hljs-string">"--seconds"</span>, type=float, default=<span class="hljs-number">1.5</span>, help=<span class="hljs-string">"Seconds per clip (default: 1.5)"</span>)
    ap.add_argument(<span class="hljs-string">"--fps"</span>, type=int, default=<span class="hljs-number">16</span>, help=<span class="hljs-string">"Frames per second (default: 16)"</span>)
    ap.add_argument(<span class="hljs-string">"--size"</span>, type=int, default=<span class="hljs-number">224</span>, help=<span class="hljs-string">"Frame size WxH (default: 224)"</span>)
    ap.add_argument(<span class="hljs-string">"--box"</span>, type=int, default=<span class="hljs-number">60</span>, help=<span class="hljs-string">"Box size (default: 60)"</span>)
    ap.add_argument(<span class="hljs-string">"--codec"</span>, default=<span class="hljs-string">"mp4v"</span>, help=<span class="hljs-string">"Codec fourcc (mp4v or XVID)"</span>)
    ap.add_argument(<span class="hljs-string">"--ext"</span>, default=<span class="hljs-string">".mp4"</span>, help=<span class="hljs-string">"File extension (.mp4 or .avi)"</span>)
    args = ap.parse_args()

    ensure_dir(args.out)
    write_labels(args.classes, <span class="hljs-string">"."</span>)  <span class="hljs-comment"># writes labels.txt to project root</span>

    print(<span class="hljs-string">f"Generating synthetic dataset -&gt; <span class="hljs-subst">{args.out}</span>"</span>)
    <span class="hljs-keyword">for</span> cls <span class="hljs-keyword">in</span> args.classes:
        cls_dir = os.path.join(args.out, cls)
        ensure_dir(cls_dir)
        mode = <span class="hljs-string">"stop"</span> <span class="hljs-keyword">if</span> cls == <span class="hljs-string">"stop"</span> <span class="hljs-keyword">else</span> (<span class="hljs-string">"swipe_left"</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"left"</span> <span class="hljs-keyword">in</span> cls <span class="hljs-keyword">else</span> (<span class="hljs-string">"swipe_right"</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"right"</span> <span class="hljs-keyword">in</span> cls <span class="hljs-keyword">else</span> <span class="hljs-string">"stop"</span>))
        <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(args.clips):
            filename = os.path.join(cls_dir, <span class="hljs-string">f"<span class="hljs-subst">{cls}</span>_<span class="hljs-subst">{i+<span class="hljs-number">1</span>:<span class="hljs-number">03</span>d}</span><span class="hljs-subst">{args.ext}</span>"</span>)
            make_clip(
                mode=mode,
                out_path=filename,
                seconds=args.seconds,
                fps=args.fps,
                size=args.size,
                box_size=args.box,
                seed=i + <span class="hljs-number">1</span>,
                codec=args.codec
            )
        print(<span class="hljs-string">f"  <span class="hljs-subst">{cls}</span>: <span class="hljs-subst">{args.clips}</span> clips"</span>)

    print(<span class="hljs-string">"Done. You can now run: python train.py, python export_onnx.py, python app.py"</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>The script generates a synthetic gesture dataset by rendering short video clips of a moving or stationary coloured box to simulate the "swipe left", "swipe right", and "stop" gestures. It saves the clips in one subfolder per class inside the output directory and writes <code>labels.txt</code> (one class name per line) to the project root.</p>
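<p>The core of <code>make_clip</code> is a linear interpolation between a start and an end x-position. The sketch below isolates that step, without jitter or drawing, so you can inspect the box positions for each mode. <code>box_x_positions</code> is a hypothetical helper written for illustration, and it uses the same default frame size, box size, and 20-pixel margins as the script.</p>
<pre><code class="lang-python">import numpy as np

def box_x_positions(mode, frames=24, size=224, box_size=60):
    """Left-edge x coordinate of the box in each frame, mirroring make_clip (no jitter)."""
    W = size
    if mode == "swipe_left":
        x_start, x_end = W - 20 - box_size, 20
    elif mode == "swipe_right":
        x_start, x_end = 20, W - 20 - box_size
    elif mode == "stop":
        x_start = x_end = (W - box_size) // 2
    else:
        raise ValueError(f"Unknown mode: {mode}")
    # alpha runs from 0 to 1 across the clip, just like in make_clip
    alpha = np.arange(frames) / max(1, frames - 1)
    # astype(int) truncates, matching int(...) in make_clip
    return ((1 - alpha) * x_start + alpha * x_end).astype(int)

print(box_x_positions("swipe_left")[:4])
print(box_x_positions("stop")[:4])
</code></pre>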
<p>Now run it inside your virtual environment:</p>
<pre><code class="lang-bash">python generate_synthetic_gestures.py --out data --clips 16 --seconds 1.5
</code></pre>
<p>This command generates 16 clips per gesture, each 1.5 seconds long (24 frames at the default 16 fps), and saves them in a directory named <code>data</code>.</p>
<p>This creates a dataset like:</p>
<pre><code class="lang-plaintext">data/
  swipe_left/*.mp4
  swipe_right/*.mp4
  stop/*.mp4
labels.txt
</code></pre>
<p>Each folder contains short clips of a moving (or still) box that simulate a gesture. That's enough to test the whole pipeline end to end.</p>
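<p>To double-check the layout after generation, you can count the clips in each class folder. This sketch assumes the directory structure shown above (one subfolder per class under <code>data/</code>) and the default <code>.mp4</code> extension. <code>count_clips</code> is a hypothetical helper, not part of the tutorial's scripts.</p>
<pre><code class="lang-python">import os

def count_clips(root="data", ext=".mp4"):
    """Return {class_name: number_of_clips} for each subfolder of root."""
    counts = {}
    for cls in sorted(os.listdir(root)):
        cls_dir = os.path.join(root, cls)
        if os.path.isdir(cls_dir):
            counts[cls] = sum(1 for f in os.listdir(cls_dir) if f.endswith(ext))
    return counts

if __name__ == "__main__" and os.path.isdir("data"):
    # With --clips 16 you would expect 16 per class, 48 in total for 3 classes.
    counts = count_clips("data")
    print(counts, "total:", sum(counts.values()))
</code></pre>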
<h3 id="heading-training-script-trainpy">Training Script: <code>train.py</code></h3>
<p>Now that we have our dataset, let’s fine-tune a Vision Transformer with temporal pooling. This model applies ViT frame-by-frame, averages embeddings across time, and trains a classification head on your gestures.</p>
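<p>The reshape-and-average step that <code>ViTTemporal.forward</code> performs in the script below can be sketched in NumPy. Here a fixed random linear projection stands in for the ViT encoder (an assumption for illustration only), and tiny 3×8×8 frames replace 3×224×224 ones so the sketch runs instantly. It also shows a property of mean pooling worth keeping in mind: averaging ignores frame order, so a clip and its time-reversed copy produce the same pooled embedding.</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
B, T, C, H, W, D = 2, 16, 3, 8, 8, 32   # small stand-in sizes; D is the embedding size

# Stand-in "encoder": a fixed random projection from C*H*W pixels to D features.
proj = rng.standard_normal((C * H * W, D))

def encode_frames(frames):
    """(N, C, H, W) to (N, D), playing the role of self.vit(x)."""
    return frames.reshape(frames.shape[0], -1) @ proj

def temporal_mean_pool(video):
    """(B, T, C, H, W) to (B, D): encode every frame, then average over time."""
    b, t = video.shape[:2]
    feats = encode_frames(video.reshape(b * t, *video.shape[2:]))  # (B*T, D)
    return feats.reshape(b, t, -1).mean(axis=1)                    # (B, D)

video = rng.standard_normal((B, T, C, H, W))
pooled = temporal_mean_pool(video)
pooled_reversed = temporal_mean_pool(video[:, ::-1])   # same frames, reversed in time
print(pooled.shape, np.allclose(pooled, pooled_reversed))
</code></pre>
<p>Because of this, a model built this way cannot rely on the direction of motion by itself, which matters for gestures like swipe left and swipe right that pass through similar positions in opposite order. If you need to separate such gestures, one option is an order-aware temporal layer (for example, position embeddings plus a small Transformer over the frame features) in place of the plain average.</p>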
<p>Here’s the full training script:</p>
<pre><code class="lang-python"><span class="hljs-comment"># train.py</span>
<span class="hljs-keyword">import</span> torch, torch.nn <span class="hljs-keyword">as</span> nn, torch.optim <span class="hljs-keyword">as</span> optim
<span class="hljs-keyword">from</span> torch.utils.data <span class="hljs-keyword">import</span> DataLoader
<span class="hljs-keyword">import</span> timm
<span class="hljs-keyword">from</span> dataset <span class="hljs-keyword">import</span> GestureClips, read_labels

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ViTTemporal</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-string">"""Frame-wise ViT encoder -&gt; mean pool over time -&gt; linear head."""</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, num_classes, vit_name=<span class="hljs-string">"vit_tiny_patch16_224"</span></span>):</span>
        super().__init__()
        self.vit = timm.create_model(vit_name, pretrained=<span class="hljs-literal">True</span>, num_classes=<span class="hljs-number">0</span>, global_pool=<span class="hljs-string">"avg"</span>)
        feat_dim = self.vit.num_features
        self.head = nn.Linear(feat_dim, num_classes)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>  <span class="hljs-comment"># x: (B,T,C,H,W)</span>
        B, T, C, H, W = x.shape
        x = x.view(B * T, C, H, W)
        feats = self.vit(x)                  <span class="hljs-comment"># (B*T, D)</span>
        feats = feats.view(B, T, <span class="hljs-number">-1</span>).mean(dim=<span class="hljs-number">1</span>)  <span class="hljs-comment"># (B, D)</span>
        <span class="hljs-keyword">return</span> self.head(feats)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>():</span>
    device = <span class="hljs-string">"cuda"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>
    labels, _ = read_labels(<span class="hljs-string">"labels.txt"</span>)
    n_classes = len(labels)

    train_ds = GestureClips(train=<span class="hljs-literal">True</span>)
    val_ds   = GestureClips(train=<span class="hljs-literal">False</span>)
    print(<span class="hljs-string">f"Train clips: <span class="hljs-subst">{len(train_ds)}</span> | Val clips: <span class="hljs-subst">{len(val_ds)}</span>"</span>)

    <span class="hljs-comment"># Windows/CPU friendly</span>
    train_dl = DataLoader(train_ds, batch_size=<span class="hljs-number">2</span>, shuffle=<span class="hljs-literal">True</span>,  num_workers=<span class="hljs-number">0</span>, pin_memory=<span class="hljs-literal">False</span>)
    val_dl   = DataLoader(val_ds,   batch_size=<span class="hljs-number">2</span>, shuffle=<span class="hljs-literal">False</span>, num_workers=<span class="hljs-number">0</span>, pin_memory=<span class="hljs-literal">False</span>)

    model = ViTTemporal(num_classes=n_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=<span class="hljs-number">3e-4</span>, weight_decay=<span class="hljs-number">0.05</span>)

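    best_acc = <span class="hljs-number">-1.0</span>  <span class="hljs-comment"># below any real accuracy, so the first epoch always saves a checkpoint</span>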
    epochs = <span class="hljs-number">5</span>
    <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, epochs + <span class="hljs-number">1</span>):
        <span class="hljs-comment"># ---- Train ----</span>
        model.train()
        total, correct, loss_sum = <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0.0</span>
        <span class="hljs-keyword">for</span> x, y <span class="hljs-keyword">in</span> train_dl:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            logits = model(x)
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()

            loss_sum += loss.item() * x.size(<span class="hljs-number">0</span>)
            correct += (logits.argmax(<span class="hljs-number">1</span>) == y).sum().item()
            total += x.size(<span class="hljs-number">0</span>)

        train_acc = correct / total <span class="hljs-keyword">if</span> total <span class="hljs-keyword">else</span> <span class="hljs-number">0.0</span>
        train_loss = loss_sum / total <span class="hljs-keyword">if</span> total <span class="hljs-keyword">else</span> <span class="hljs-number">0.0</span>

        <span class="hljs-comment"># ---- Validate ----</span>
        model.eval()
        vtotal, vcorrect = <span class="hljs-number">0</span>, <span class="hljs-number">0</span>
        <span class="hljs-keyword">with</span> torch.no_grad():
            <span class="hljs-keyword">for</span> x, y <span class="hljs-keyword">in</span> val_dl:
                x, y = x.to(device), y.to(device)
                vcorrect += (model(x).argmax(<span class="hljs-number">1</span>) == y).sum().item()
                vtotal += x.size(<span class="hljs-number">0</span>)
        val_acc = vcorrect / vtotal <span class="hljs-keyword">if</span> vtotal <span class="hljs-keyword">else</span> <span class="hljs-number">0.0</span>

        print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch:<span class="hljs-number">02</span>d}</span> | train_loss <span class="hljs-subst">{train_loss:<span class="hljs-number">.4</span>f}</span> "</span>
              <span class="hljs-string">f"| train_acc <span class="hljs-subst">{train_acc:<span class="hljs-number">.3</span>f}</span> | val_acc <span class="hljs-subst">{val_acc:<span class="hljs-number">.3</span>f}</span>"</span>)

        <span class="hljs-keyword">if</span> val_acc &gt; best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), <span class="hljs-string">"vit_temporal_best.pt"</span>)

    print(<span class="hljs-string">"Best val acc:"</span>, best_acc)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    train()
</code></pre>
<p>Running the command <code>python train.py</code> initiates the training process for your gesture recognition model. Here's a breakdown of what happens:</p>
<ol>
<li><p><strong>Load your dataset from data/</strong>: The script will access and load the gesture dataset stored in the "data" directory. This dataset is used to train the model.</p>
</li>
<li><p><strong>Fine-tune a pre-trained Vision Transformer</strong>: The training script will take a Vision Transformer model that has been pre-trained on a larger dataset and fine-tune it using your specific gesture dataset. Fine-tuning helps the model adapt to the nuances of your data, improving its performance on the specific task of gesture recognition.</p>
</li>
<li><p><strong>Save the best checkpoint as vit_temporal_best.pt</strong>: After each epoch, the script measures accuracy on the validation set and saves the model weights to "vit_temporal_best.pt" whenever validation accuracy beats the best value so far. You'll use this file to export the model for inference.</p>
</li>
</ol>
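<p>The accuracy figures in the training loop come from counting, batch by batch, how many clips have their highest-scoring logit on the correct class, then dividing by the total number of clips seen. A NumPy sketch of that bookkeeping, using made-up logits for two batches of size 2 (the batch size in <code>train.py</code>); <code>batch_correct</code> is a hypothetical helper:</p>
<pre><code class="lang-python">import numpy as np

def batch_correct(logits, labels):
    """Number of rows whose highest-scoring class equals the label."""
    return int((logits.argmax(axis=1) == labels).sum())

# Two made-up batches of logits for 3 classes (batch_size=2, as in train.py).
batches = [
    (np.array([[2.0, 0.1, -1.0], [0.3, 0.2, 0.9]]), np.array([0, 1])),
    (np.array([[0.0, 1.5, 0.2], [0.1, 0.1, 3.0]]), np.array([1, 2])),
]

correct = total = 0
for logits, labels in batches:
    correct += batch_correct(logits, labels)   # like (logits.argmax(1) == y).sum().item()
    total += len(labels)                       # like x.size(0)
print(f"accuracy = {correct}/{total} = {correct / total:.3f}")
</code></pre>
<p>Accumulating counts rather than averaging per-batch accuracies keeps the result correct even when the last batch is smaller than the others.</p>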
<h4 id="heading-what-training-looks-like">What Training Looks Like</h4>
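<p>You should see logs similar to the following. Your numbers will differ from run to run, and with <code>epochs = 5</code> you will see five epoch lines rather than the three shown here:</p>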
<pre><code class="lang-plaintext">Train clips: 38 | Val clips: 10
Epoch 01 | train_loss 1.4508 | train_acc 0.395 | val_acc 0.200
Epoch 02 | train_loss 1.2466 | train_acc 0.263 | val_acc 0.200
Epoch 03 | train_loss 1.1361 | train_acc 0.368 | val_acc 0.200
Best val acc: 0.200
</code></pre>
<p>Don’t worry if your accuracy is low at first; with a tiny synthetic dataset and only a few epochs, that's expected. The goal here is to prove that the Transformer pipeline runs end to end. You can improve results later by:</p>
<ul>
<li><p>Adding more clips per class</p>
</li>
<li><p>Training for more epochs</p>
</li>
<li><p>Switching to real recorded gestures</p>
</li>
</ul>
<p><img src="https://github.com/tayo4christ/transformer-gesture/blob/07c7071bdb17bc08585baeb60d787eadc3936ef5/images/training-logs.png?raw=true" alt="Training logs" width="600" height="400" loading="lazy"></p>
<p>Figure 1. Example training logs from <code>train.py</code>, where the Vision Transformer with temporal pooling is fine-tuned on a tiny synthetic dataset.</p>
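<p>It helps to read these numbers against chance. In the example log, a <code>val_acc</code> of 0.200 with 10 validation clips means 2 clips were classified correctly, which is below the roughly 0.333 you would expect from guessing at random among three classes. With so few validation clips, each clip moves accuracy by 0.1, so treat these values as a smoke test rather than a benchmark. The arithmetic, using the numbers from that log:</p>
<pre><code class="lang-python">n_classes = 3
n_val_clips = 10      # "Val clips: 10" in the example log
val_acc = 0.200       # best val_acc in the example log

chance = 1 / n_classes                     # expected accuracy of uniform random guessing
step = 1 / n_val_clips                     # smallest possible change in val_acc
n_correct = round(val_acc * n_val_clips)   # validation clips classified correctly

print(f"chance ~ {chance:.3f}, step = {step:.1f}, correct = {n_correct}/{n_val_clips}")
</code></pre>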
<h3 id="heading-export-the-model-to-onnx">Export the Model to ONNX</h3>
<p>To make our model easier to run in real time (and lighter on CPU), we’ll export it to the ONNX format.</p>
<p><strong>Note:</strong> ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It lets you train a model in one framework, such as PyTorch or TensorFlow, and run it with a different framework or runtime, such as ONNX Runtime or TensorRT, without rewriting the model. It does this by providing a standardized representation of the model's computation graph and parameters.</p>
<p>ONNX operators are versioned through <em>opsets</em>, and newer opsets add operators over time. The export script below targets opset 13, which current versions of ONNX Runtime support.</p>
<p>Create a file called <code>export_onnx.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> train <span class="hljs-keyword">import</span> ViTTemporal
<span class="hljs-keyword">from</span> dataset <span class="hljs-keyword">import</span> read_labels

labels, _ = read_labels(<span class="hljs-string">"labels.txt"</span>)
n_classes = len(labels)

<span class="hljs-comment"># Load trained model</span>
model = ViTTemporal(num_classes=n_classes)
model.load_state_dict(torch.load(<span class="hljs-string">"vit_temporal_best.pt"</span>, map_location=<span class="hljs-string">"cpu"</span>))
model.eval()

<span class="hljs-comment"># Dummy input: batch=1, 16 frames, 3x224x224</span>
dummy = torch.randn(<span class="hljs-number">1</span>, <span class="hljs-number">16</span>, <span class="hljs-number">3</span>, <span class="hljs-number">224</span>, <span class="hljs-number">224</span>)

<span class="hljs-comment"># Export</span>
torch.onnx.export(
    model, dummy, <span class="hljs-string">"vit_temporal.onnx"</span>,
    input_names=[<span class="hljs-string">"video"</span>], output_names=[<span class="hljs-string">"logits"</span>],
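    dynamic_axes={<span class="hljs-string">"video"</span>: {<span class="hljs-number">0</span>: <span class="hljs-string">"batch"</span>}, <span class="hljs-string">"logits"</span>: {<span class="hljs-number">0</span>: <span class="hljs-string">"batch"</span>}},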
    opset_version=<span class="hljs-number">13</span>
)

print(<span class="hljs-string">"Exported vit_temporal.onnx"</span>)
</code></pre>
<p>Run it with <code>python export_onnx.py</code>.</p>
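<p>This generates a file <code>vit_temporal.onnx</code> in your project folder. With the model in ONNX format, the app can run it with ONNX Runtime (the <code>onnxruntime</code> package), which is often faster than eager PyTorch for CPU inference, though the gain depends on your hardware.</p>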
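<p>The export marks only the batch dimension (axis 0) as dynamic, so the ONNX graph expects exactly 16 frames of 3×224×224 per clip, as <code>float32</code>. A small guard like the hypothetical <code>check_onnx_input</code> below (an illustrative helper, not part of ONNX Runtime) can catch shape or dtype mistakes with a clear message before you call <code>session.run</code>:</p>
<pre><code class="lang-python">import numpy as np

T, C, SIZE = 16, 3, 224   # must match the dummy input used in export_onnx.py

def check_onnx_input(clip):
    """Raise ValueError unless clip has shape (batch, 16, 3, 224, 224) and dtype float32."""
    expected_tail = (T, C, SIZE, SIZE)
    if clip.ndim != 5 or clip.shape[1:] != expected_tail:
        raise ValueError(f"expected (batch, {T}, {C}, {SIZE}, {SIZE}), got {clip.shape}")
    if clip.dtype != np.float32:
        raise ValueError(f"expected float32, got {clip.dtype}")
    return clip

ok = check_onnx_input(np.zeros((2, T, C, SIZE, SIZE), dtype=np.float32))  # batch of 2 is fine
try:
    check_onnx_input(np.zeros((1, 8, C, SIZE, SIZE), dtype=np.float32))   # only 8 frames
except ValueError as err:
    print("rejected:", err)
</code></pre>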
<p>Create a file called <code>app.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os, tempfile, cv2, torch, onnxruntime, numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> gradio <span class="hljs-keyword">as</span> gr
<span class="hljs-keyword">from</span> dataset <span class="hljs-keyword">import</span> read_labels

T = <span class="hljs-number">16</span>
SIZE = <span class="hljs-number">224</span>
MODEL_PATH = <span class="hljs-string">"vit_temporal.onnx"</span>

labels, _ = read_labels(<span class="hljs-string">"labels.txt"</span>)

<span class="hljs-comment"># --- ONNX session + auto-detect names ---</span>
ort_session = onnxruntime.InferenceSession(MODEL_PATH, providers=[<span class="hljs-string">"CPUExecutionProvider"</span>])
<span class="hljs-comment"># detect first input and first output names to avoid mismatches</span>
INPUT_NAME = ort_session.get_inputs()[<span class="hljs-number">0</span>].name   <span class="hljs-comment"># e.g. "input" or "video"</span>
OUTPUT_NAME = ort_session.get_outputs()[<span class="hljs-number">0</span>].name <span class="hljs-comment"># e.g. "logits" or something else</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">preprocess_clip</span>(<span class="hljs-params">frames_rgb</span>):</span>
    <span class="hljs-keyword">if</span> len(frames_rgb) == <span class="hljs-number">0</span>:
        frames_rgb = [np.zeros((SIZE, SIZE, <span class="hljs-number">3</span>), dtype=np.uint8)]
    <span class="hljs-keyword">if</span> len(frames_rgb) &lt; T:
        frames_rgb = frames_rgb + [frames_rgb[<span class="hljs-number">-1</span>]] * (T - len(frames_rgb))
    frames_rgb = frames_rgb[:T]
    clip = [cv2.resize(f, (SIZE, SIZE), interpolation=cv2.INTER_AREA) <span class="hljs-keyword">for</span> f <span class="hljs-keyword">in</span> frames_rgb]
    clip = np.stack(clip, axis=<span class="hljs-number">0</span>)                                    <span class="hljs-comment"># (T,H,W,3)</span>
    clip = np.transpose(clip, (<span class="hljs-number">0</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>)).astype(np.float32) / <span class="hljs-number">255</span> <span class="hljs-comment"># (T,3,H,W)</span>
    clip = (clip - <span class="hljs-number">0.5</span>) / <span class="hljs-number">0.5</span>
    clip = np.expand_dims(clip, <span class="hljs-number">0</span>)                                   <span class="hljs-comment"># (1,T,3,H,W)</span>
    <span class="hljs-keyword">return</span> clip

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_extract_path_from_gradio_video</span>(<span class="hljs-params">inp</span>):</span>
    <span class="hljs-keyword">if</span> isinstance(inp, str) <span class="hljs-keyword">and</span> os.path.exists(inp):
        <span class="hljs-keyword">return</span> inp
    <span class="hljs-keyword">if</span> isinstance(inp, dict):
        <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> (<span class="hljs-string">"video"</span>, <span class="hljs-string">"name"</span>, <span class="hljs-string">"path"</span>, <span class="hljs-string">"filepath"</span>):
            v = inp.get(key)
            <span class="hljs-keyword">if</span> isinstance(v, str) <span class="hljs-keyword">and</span> os.path.exists(v):
                <span class="hljs-keyword">return</span> v
        <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> (<span class="hljs-string">"data"</span>, <span class="hljs-string">"video"</span>):
            v = inp.get(key)
            <span class="hljs-keyword">if</span> isinstance(v, (bytes, bytearray)):
                tmp = tempfile.NamedTemporaryFile(delete=<span class="hljs-literal">False</span>, suffix=<span class="hljs-string">".mp4"</span>)
                tmp.write(v); tmp.flush(); tmp.close()
                <span class="hljs-keyword">return</span> tmp.name
    <span class="hljs-keyword">if</span> isinstance(inp, (list, tuple)) <span class="hljs-keyword">and</span> inp <span class="hljs-keyword">and</span> isinstance(inp[<span class="hljs-number">0</span>], str) <span class="hljs-keyword">and</span> os.path.exists(inp[<span class="hljs-number">0</span>]):
        <span class="hljs-keyword">return</span> inp[<span class="hljs-number">0</span>]
    <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_read_uniform_frames</span>(<span class="hljs-params">video_path</span>):</span>
    cap = cv2.VideoCapture(video_path)
    frames = []
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) <span class="hljs-keyword">or</span> <span class="hljs-number">1</span>
    idxs = np.linspace(<span class="hljs-number">0</span>, total - <span class="hljs-number">1</span>, max(T, <span class="hljs-number">1</span>)).astype(int)
    want = set(int(i) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> idxs.tolist())
    j = <span class="hljs-number">0</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        ok, bgr = cap.read()
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ok: <span class="hljs-keyword">break</span>
        <span class="hljs-keyword">if</span> j <span class="hljs-keyword">in</span> want:
            rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
            frames.append(rgb)
        j += <span class="hljs-number">1</span>
    cap.release()
    <span class="hljs-keyword">return</span> frames

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict_from_video</span>(<span class="hljs-params">gradio_video</span>):</span>
    video_path = _extract_path_from_gradio_video(gradio_video)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> video_path <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> os.path.exists(video_path):
        <span class="hljs-keyword">return</span> {}
    frames = _read_uniform_frames(video_path)

    <span class="hljs-comment"># If OpenCV choked on the codec (common with recorded webm), re-encode once:</span>
    <span class="hljs-keyword">if</span> len(frames) == <span class="hljs-number">0</span>:
        tmp = tempfile.NamedTemporaryFile(delete=<span class="hljs-literal">False</span>, suffix=<span class="hljs-string">".mp4"</span>); tmp_name = tmp.name; tmp.close()
        cap = cv2.VideoCapture(video_path)
        fourcc = cv2.VideoWriter_fourcc(*<span class="hljs-string">"mp4v"</span>)
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) <span class="hljs-keyword">or</span> <span class="hljs-number">640</span>
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) <span class="hljs-keyword">or</span> <span class="hljs-number">480</span>
        out = cv2.VideoWriter(tmp_name, fourcc, <span class="hljs-number">20.0</span>, (w, h))
        <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
            ok, frame = cap.read()
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ok: <span class="hljs-keyword">break</span>
            out.write(frame)
        cap.release(); out.release()
        frames = _read_uniform_frames(tmp_name)

    clip = preprocess_clip(frames)
    <span class="hljs-comment"># &gt;&gt;&gt; use the detected ONNX input/output names &lt;&lt;&lt;</span>
    logits = ort_session.run([OUTPUT_NAME], {INPUT_NAME: clip})[<span class="hljs-number">0</span>]  <span class="hljs-comment"># (1, C)</span>
    probs = torch.softmax(torch.from_numpy(logits), dim=<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>].numpy().tolist()
    <span class="hljs-keyword">return</span> {labels[i]: float(probs[i]) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(labels))}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict_from_image</span>(<span class="hljs-params">image</span>):</span>
    <span class="hljs-keyword">if</span> image <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
        <span class="hljs-keyword">return</span> {}
    clip = preprocess_clip([image] * T)
    logits = ort_session.run([OUTPUT_NAME], {INPUT_NAME: clip})[<span class="hljs-number">0</span>]
    probs = torch.softmax(torch.from_numpy(logits), dim=<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>].numpy().tolist()
    <span class="hljs-keyword">return</span> {labels[i]: float(probs[i]) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(labels))}

<span class="hljs-keyword">with</span> gr.Blocks() <span class="hljs-keyword">as</span> demo:
    gr.Markdown(<span class="hljs-string">"# Gesture Classifier (ONNX)\nRecord or upload a short video, then click **Classify Video**."</span>)
    <span class="hljs-keyword">with</span> gr.Tab(<span class="hljs-string">"Video (record or upload)"</span>):
        vid_in = gr.Video(label=<span class="hljs-string">"Record from webcam or upload a short clip"</span>)
        vid_out = gr.Label(num_top_classes=<span class="hljs-number">3</span>, label=<span class="hljs-string">"Prediction"</span>)
        gr.Button(<span class="hljs-string">"Classify Video"</span>).click(fn=predict_from_video, inputs=vid_in, outputs=vid_out)
    <span class="hljs-keyword">with</span> gr.Tab(<span class="hljs-string">"Single Image (fallback)"</span>):
        img_in = gr.Image(label=<span class="hljs-string">"Upload an image frame"</span>, type=<span class="hljs-string">"numpy"</span>)
        img_out = gr.Label(num_top_classes=<span class="hljs-number">3</span>, label=<span class="hljs-string">"Prediction"</span>)
        gr.Button(<span class="hljs-string">"Classify Image"</span>).click(fn=predict_from_image, inputs=img_in, outputs=img_out)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    demo.launch()
</code></pre>
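<p><code>_read_uniform_frames</code> picks 16 evenly spaced frame indices with <code>np.linspace</code>. Because those indices go into a set, a video with fewer than 16 frames yields fewer than 16 frames, and <code>preprocess_clip</code> then pads the clip by repeating the last frame. The sketch below shows an alternative, assuming the frames are already decoded into a list: it samples evenly spaced indices with repetition, so a short clip is stretched across time instead of being padded at the end. <code>sample_uniform</code> is a hypothetical helper, not part of the app.</p>
<pre><code class="lang-python">import numpy as np

def sample_uniform(frames, T=16):
    """Pick T evenly spaced frames, repeating frames if there are fewer than T."""
    if not frames:
        raise ValueError("no frames to sample from")
    idxs = np.linspace(0, len(frames) - 1, T).round().astype(int)
    return [frames[i] for i in idxs]

short_clip = list(range(5))            # stand-ins for 5 decoded frames
print(sample_uniform(short_clip, T=16))
</code></pre>
<p>The trade-off is memory: this reads every frame before sampling, which is fine for the short 2 to 4 second clips used here but not for long videos.</p>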
<p>Running <code>python app.py</code> starts a local Gradio app and prints a URL (by default <code>http://127.0.0.1:7860</code>) that you can open in your browser. Here's what the app does:</p>
<ol>
<li><p><strong>Record or upload a clip</strong>: The Video tab can record a short clip from your webcam or accept an uploaded file, so you can perform a gesture in front of the camera.</p>
</li>
<li><p><strong>Classify on demand</strong>: When you click <strong>Classify Video</strong>, the app samples 16 frames from the clip, preprocesses them, and runs them through the ONNX model. Predictions are made once per clip, not continuously on a live stream.</p>
</li>
<li><p><strong>Top 3 gesture classes displayed with probabilities</strong>: The app shows the three most likely gesture classes with their probabilities, which gives you a sense of the model's confidence.</p>
</li>
</ol>
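<p>The probabilities come from a softmax over the model's logits: <code>app.py</code> uses <code>torch.softmax</code> for this, and <code>gr.Label(num_top_classes=3)</code> then displays the three highest. A NumPy-only sketch of the same computation, subtracting the maximum logit first for numerical stability, with made-up logits and the tutorial's three labels:</p>
<pre><code class="lang-python">import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

labels = ["swipe_left", "swipe_right", "stop"]
logits = np.array([[0.2, -1.0, 2.5]])          # made-up output of shape (1, num_classes)
probs = softmax(logits)[0]

# Same dict shape that predict_from_video returns to gr.Label
result = {labels[i]: float(probs[i]) for i in range(len(labels))}
top3 = sorted(result.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top3)
</code></pre>
<p>Since <code>app.py</code> only uses PyTorch for this softmax, a NumPy version like this would let the app run without importing <code>torch</code> at all.</p>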
<p>When you open the app, you'll find two tabs. In the <strong>Video tab</strong>, record a short clip of your gesture from the webcam (typically 2–4 seconds) or upload one, then click <strong>Classify Video</strong> to see the predicted gesture probabilities. The <strong>Single Image</strong> tab is a fallback: it repeats one uploaded frame 16 times to form a clip and classifies that. Together, the two tabs let you test and demonstrate the gesture recognizer interactively.</p>
<p>Here’s an example where I raised my hand for a <strong>stop</strong> gesture, and the model predicts “stop” as the top class:</p>
<p><img src="https://github.com/tayo4christ/transformer-gesture/blob/07c7071bdb17bc08585baeb60d787eadc3936ef5/images/realtime-demo.png?raw=true" alt="Gradio demo output" width="600" height="400" loading="lazy"></p>
<p>Figure 2. The Gradio app running locally. After recording a short clip, the Transformer model predicts the gesture with class probabilities.</p>
<h3 id="heading-evaluate-accuracy-latency">Evaluate Accuracy + Latency</h3>
<p>Now that the model runs in a demo app, let’s check how well it performs. There are two sides to this:</p>
<ul>
<li><p><strong>Accuracy</strong>: does the model predict the right gesture class?</p>
</li>
<li><p><strong>Latency</strong>: how fast does it respond, especially on CPU vs GPU?</p>
</li>
</ul>
<h4 id="heading-1-quick-accuracy-check">1. Quick Accuracy Check</h4>
<p>Save this as <code>eval.py</code> in the same folder as your other scripts:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> dataset <span class="hljs-keyword">import</span> GestureClips, read_labels
<span class="hljs-keyword">from</span> train <span class="hljs-keyword">import</span> ViTTemporal

labels, _ = read_labels(<span class="hljs-string">"labels.txt"</span>)
n_classes = len(labels)

<span class="hljs-comment"># Load validation data</span>
val_ds = GestureClips(train=<span class="hljs-literal">False</span>)
val_dl = torch.utils.data.DataLoader(val_ds, batch_size=<span class="hljs-number">2</span>, shuffle=<span class="hljs-literal">False</span>)

<span class="hljs-comment"># Load trained model</span>
model = ViTTemporal(num_classes=n_classes)
model.load_state_dict(torch.load(<span class="hljs-string">"vit_temporal_best.pt"</span>, map_location=<span class="hljs-string">"cpu"</span>))
model.eval()

correct, total = <span class="hljs-number">0</span>, <span class="hljs-number">0</span>
all_preds, all_labels = [], []

<span class="hljs-keyword">with</span> torch.no_grad():
    <span class="hljs-keyword">for</span> x, y <span class="hljs-keyword">in</span> val_dl:
        logits = model(x)
        preds = logits.argmax(dim=<span class="hljs-number">1</span>)
        correct += (preds == y).sum().item()
        total += y.size(<span class="hljs-number">0</span>)
        all_preds.extend(preds.tolist())
        all_labels.extend(y.tolist())

print(<span class="hljs-string">f"Validation accuracy: <span class="hljs-subst">{correct/total:<span class="hljs-number">.2</span>%}</span>"</span>)
</code></pre>
<h4 id="heading-2-confusion-matrix">2. Confusion Matrix</h4>
<p>Let’s also visualize which gestures are confused. Add this snippet at the bottom of <code>eval.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> seaborn <span class="hljs-keyword">as</span> sns
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> confusion_matrix

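<span class="hljs-comment"># labels= keeps the matrix n_classes x n_classes even if some class never appears</span>
cm = confusion_matrix(all_labels, all_preds, labels=list(range(n_classes)))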

plt.figure(figsize=(<span class="hljs-number">6</span>,<span class="hljs-number">6</span>))
sns.heatmap(cm, annot=<span class="hljs-literal">True</span>, fmt=<span class="hljs-string">"d"</span>, xticklabels=labels, yticklabels=labels, cmap=<span class="hljs-string">"Blues"</span>)
plt.xlabel(<span class="hljs-string">"Predicted"</span>)
plt.ylabel(<span class="hljs-string">"True"</span>)
plt.title(<span class="hljs-string">"Confusion Matrix"</span>)
plt.tight_layout()
plt.show()
</code></pre>
<p>When you run <code>python eval.py</code>, a heatmap like this will pop up:</p>
<p><img src="https://github.com/tayo4christ/transformer-gesture/blob/07c7071bdb17bc08585baeb60d787eadc3936ef5/images/confusion-matrix.png?raw=true" alt="Confusion matrix" width="600" height="400" loading="lazy"></p>
<p>Figure 3. Confusion matrix on the validation set. Correct predictions appear along the diagonal. Off-diagonal counts show gesture confusions.</p>
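<p>To turn the heatmap into numbers, you can compute per-class recall: each diagonal count divided by its row total. scikit-learn's <code>confusion_matrix</code> places true labels on rows and predictions on columns, which matches the axis labels in the plot. The helper below is a small sketch that you could call at the end of <code>eval.py</code>, where <code>cm</code> and <code>labels</code> are already defined.</p>
<pre><code class="lang-python">import numpy as np


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (diagonal / row sum)."""
    cm = np.asarray(cm, dtype=np.float64)
    true_counts = cm.sum(axis=1)  # row sums: how many clips truly belong to each class
    correct = np.diag(cm)         # diagonal: clips classified correctly
    # Classes with no validation clips get 0.0 instead of a division-by-zero warning
    return np.divide(correct, true_counts,
                     out=np.zeros_like(correct), where=true_counts &gt; 0)


# In eval.py:
# for name, recall in zip(labels, per_class_recall(cm)):
#     print(f"{name}: {recall:.0%}")
</code></pre>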
<h4 id="heading-3-latency-benchmark">3. Latency Benchmark</h4>
<p>Finally, let’s see how fast inference runs. Save the following as <code>benchmark.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time, numpy <span class="hljs-keyword">as</span> np, onnxruntime
<span class="hljs-keyword">from</span> dataset <span class="hljs-keyword">import</span> read_labels

labels, _ = read_labels(<span class="hljs-string">"labels.txt"</span>)

ort = onnxruntime.InferenceSession(<span class="hljs-string">"vit_temporal.onnx"</span>, providers=[<span class="hljs-string">"CPUExecutionProvider"</span>])
INPUT_NAME = ort.get_inputs()[<span class="hljs-number">0</span>].name
OUTPUT_NAME = ort.get_outputs()[<span class="hljs-number">0</span>].name

dummy = np.random.randn(<span class="hljs-number">1</span>, <span class="hljs-number">16</span>, <span class="hljs-number">3</span>, <span class="hljs-number">224</span>, <span class="hljs-number">224</span>).astype(np.float32)

<span class="hljs-comment"># Warmup</span>
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">3</span>):
    ort.run([OUTPUT_NAME], {INPUT_NAME: dummy})

<span class="hljs-comment"># Benchmark</span>
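t0 = time.perf_counter()  <span class="hljs-comment"># high-resolution clock meant for timing short durations</span>
<span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">50</span>):
    ort.run([OUTPUT_NAME], {INPUT_NAME: dummy})
t1 = time.perf_counter()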

print(<span class="hljs-string">f"Average latency: <span class="hljs-subst">{(t1 - t0)/<span class="hljs-number">50</span>:<span class="hljs-number">.3</span>f}</span> seconds per clip"</span>)
</code></pre>
<p>Run: <code>python benchmark.py</code></p>
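<p>On CPU, you might see roughly 0.05–0.15 s per clip, depending on your hardware. This script only uses <code>CPUExecutionProvider</code>. To measure GPU latency, install <code>onnxruntime-gpu</code> and pass <code>providers=["CUDAExecutionProvider", "CPUExecutionProvider"]</code>. Inference on a GPU is typically much faster.</p>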
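<p>An average can hide occasional slow runs, and those matter in an interactive demo. As an optional extension, the helper below times any callable with <code>time.perf_counter()</code> and reports the median and 95th percentile alongside the mean. The function name and defaults are illustrative.</p>
<pre><code class="lang-python">import statistics
import time


def measure_latency(fn, runs=50, warmup=3):
    """Call fn() repeatedly and summarise per-call latency in seconds.

    runs must be at least 2 so that percentiles can be computed.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[18]
    return {"mean": statistics.mean(samples),
            "p50": statistics.median(samples),
            "p95": p95}


# With the session from benchmark.py:
# stats = measure_latency(lambda: ort.run([OUTPUT_NAME], {INPUT_NAME: dummy}))
# print({k: f"{v * 1000:.1f} ms" for k, v in stats.items()})
</code></pre>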
<p><strong>Note</strong>: If latency is high, you can <strong>quantize</strong> the ONNX model (for example, to INT8) to shrink it and speed up inference.</p>
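<p>One way to do this is ONNX Runtime's dynamic quantization. It stores weights as INT8 and needs no calibration data. The sketch below assumes the <code>vit_temporal.onnx</code> file from the export step; the output filename and helper name are hypothetical. Quantization can lower accuracy slightly, so compare the quantized model's predictions with the original's on a few clips before you switch.</p>
<pre><code class="lang-python">def quantize_onnx_model(src="vit_temporal.onnx", dst="vit_temporal_int8.onnx"):
    """Write an INT8 dynamically quantized copy of an ONNX model and return its path."""
    # Imported inside the function so the helper can be defined even where
    # onnxruntime is not installed
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Weights are converted to INT8 ahead of time; activations are
    # quantized on the fly at inference time
    quantize_dynamic(src, dst, weight_type=QuantType.QInt8)
    return dst


# quantized_path = quantize_onnx_model()
# Then point benchmark.py's InferenceSession at quantized_path and re-run it.
</code></pre>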
<h2 id="heading-option-2-use-small-samples-from-public-gesture-datasets">Option 2: Use Small Samples from Public Gesture Datasets</h2>
<p>If you’d prefer to train your model on <em>real</em> gesture clips instead of synthetic moving boxes, you can grab a handful of videos from open datasets. You don’t need to download an entire dataset (which can be several GB); a few short clips are enough to follow along.</p>
<h3 id="heading-recommended-sources">Recommended sources</h3>
<ul>
<li><p><strong>20BN Jester Dataset</strong>: Contains short clips of people performing hand gestures such as swiping left or right, thumbs up, and a stop sign.</p>
</li>
<li><p><strong>WLASL</strong>: A large-scale dataset of isolated American Sign Language (ASL) words.</p>
</li>
</ul>
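<p>WLASL provides short <code>.mp4</code> videos that you can use directly as realistic training examples. Jester distributes each clip as a folder of JPEG frames, so if you use it, convert those frames to <code>.mp4</code> first (for example, with <code>ffmpeg</code>). Both datasets are linked in the Next Steps section below.</p>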
<h3 id="heading-setting-up-your-dataset-folder">Setting up your dataset folder</h3>
<p>Once you download a few clips, place them in the <code>data/</code> folder under subfolders named after each gesture class. For example:</p>
<pre><code class="lang-plaintext">data/
├── swipe_left/
│   ├── clip1.mp4
│   └── clip2.mp4
├── swipe_right/
│   ├── clip1.mp4
│   └── clip2.mp4
└── stop/
    ├── clip1.mp4
    └── clip2.mp4
</code></pre>
<p>And update <code>labels.txt</code> to match the folder names:</p>
<pre><code class="lang-plaintext">swipe_left
swipe_right
stop
</code></pre>
<p>Now your dataset is ready, and the same training scripts from earlier (<code>train.py</code>, <code>eval.py</code>) will work without modification.</p>
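<p>Class names must match between <code>labels.txt</code> and the folder names, so a quick consistency check can spare you a confusing training run. This is a hypothetical helper. It assumes one class name per line in <code>labels.txt</code> (as above) and <code>.mp4</code> clips inside each class folder, and it reports problems rather than fixing them.</p>
<pre><code class="lang-python">from pathlib import Path


def check_dataset(data_dir="data", labels_path="labels.txt"):
    """Return a list of mismatches between class folders and labels.txt (empty if none)."""
    data_dir = Path(data_dir)
    labels = [line.strip() for line in Path(labels_path).read_text().splitlines()
              if line.strip()]
    folders = sorted(p.name for p in data_dir.iterdir() if p.is_dir())

    problems = []
    for name in labels:
        if name not in folders:
            problems.append(f"label '{name}' has no folder in {data_dir}")
    for name in folders:
        if name not in labels:
            problems.append(f"folder '{name}' is missing from {labels_path}")
        elif not any((data_dir / name).glob("*.mp4")):
            problems.append(f"folder '{name}' contains no .mp4 clips")
    return problems


# for problem in check_dataset():
#     print("Dataset problem:", problem)
</code></pre>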
<h3 id="heading-why-choose-this-option">Why choose this option?</h3>
<ul>
<li><p>Gives more realistic results than synthetic coloured boxes</p>
</li>
<li><p>Lets you see how the model handles <em>actual human hand movements</em></p>
</li>
<li><p>Requires only a bit more effort (downloading clips and trimming them if needed)</p>
</li>
</ul>
<p><strong>Tip:</strong> If downloading from these datasets feels too heavy, you can also record your own short gestures using your laptop webcam. Just save them as <code>.mp4</code> files and organize them in the same folder structure.</p>
<h2 id="heading-accessibility-notes-amp-ethical-limits">Accessibility Notes &amp; Ethical Limits</h2>
<p>While this project shows the technical workflow for gesture recognition with Transformers, it’s important to step back and consider the <strong>human context</strong>:</p>
<ul>
<li><p><strong>Accessibility first</strong>: Tools like this can help students with speech or motor difficulties, but they should always be co-designed with the people who will use them. Don’t assume one-size-fits-all.</p>
</li>
<li><p><strong>Dataset sensitivity</strong>: Using publicly available sign or gesture datasets is fine for prototyping, but deploying such a system requires careful consideration of consent and representation.</p>
</li>
<li><p><strong>Error tolerance</strong>: Even small misclassifications can have big consequences in accessibility contexts (for example, confusing <em>stop</em> with <em>go</em>). Always plan for fallback options (like manual input or confirmation).</p>
</li>
<li><p><strong>Bias and inclusivity</strong>: Models trained on narrow datasets may fail for different skin tones, lighting conditions, or cultural gesture variations. Broad and diverse training data is essential for fairness.</p>
</li>
</ul>
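<p>The fallback idea in the <em>Error tolerance</em> point can be made concrete with a confidence gate. The system acts on a prediction only when the model is clearly sure and otherwise asks the user to confirm. In this sketch, the function name, the 0.8 threshold, and the 0.2 margin are illustrative assumptions. You would tune them together with the people who use the tool.</p>
<pre><code class="lang-python">def decide(probs, threshold=0.8, margin=0.2):
    """Decide whether to act on a prediction or ask for confirmation.

    probs: dict mapping gesture label -&gt; probability (e.g. the gr.Label dict).
    Returns ("accept", label) or ("confirm", label).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) &gt; 1 else 0.0
    # Ask for confirmation if confidence is low or two classes are too close to call
    if best_p &lt; threshold or best_p - runner_up_p &lt; margin:
        return ("confirm", best_label)
    return ("accept", best_label)


print(decide({"stop": 0.55, "go": 0.40, "wave": 0.05}))  # ('confirm', 'stop')
</code></pre>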
<p>In other words: this demo is a <strong>teaching scaffold</strong>, not a production-ready accessibility tool. Responsible deployment requires collaboration with educators, therapists, and end users.</p>
<h2 id="heading-next-steps">Next Steps</h2>
<p>If you’d like to push this project further, here are some directions to explore:</p>
<ul>
<li><p><strong>Better models</strong>: Try video-focused Transformers like <a target="_blank" href="https://arxiv.org/abs/2102.05095">TimeSformer</a> or <a target="_blank" href="https://arxiv.org/abs/2203.12602">VideoMAE</a> for stronger temporal reasoning.</p>
</li>
<li><p><strong>Larger vocabularies</strong>: Add more gesture classes, build your own dataset, or use portions of public datasets like <a target="_blank" href="https://www.kaggle.com/datasets/toxicmender/20bn-jester">20BN Jester</a> or <a target="_blank" href="https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed">WLASL</a>.</p>
</li>
<li><p><strong>Pose fusion</strong>: Combine gesture video with human pose keypoints from <a target="_blank" href="https://mediapipe.readthedocs.io/en/latest/solutions/hands.html">MediaPipe</a> or <a target="_blank" href="https://github.com/CMU-Perceptual-Computing-Lab/openpose">OpenPose</a> for more robust predictions.</p>
</li>
<li><p><strong>Real-time smoothing</strong>: Implement temporal smoothing or debounce logic in the app so predictions are more stable during live use.</p>
</li>
<li><p><strong>Quantization + edge devices</strong>: Convert your ONNX model to an INT8 quantized version and deploy it on a Raspberry Pi or Jetson Nano for classroom-ready prototypes.</p>
</li>
</ul>
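<p>For the real-time smoothing idea, one simple option is a majority vote over the last few predictions. The displayed label changes only after a new class wins enough of the recent votes. The class name, the window size of 5, and the requirement of 3 votes are illustrative assumptions. Larger values give steadier output at the cost of slower reactions.</p>
<pre><code class="lang-python">from collections import Counter, deque


class PredictionSmoother:
    """Majority vote over recent predictions, with a minimum vote count before switching."""

    def __init__(self, window=5, min_votes=3):
        self.history = deque(maxlen=window)  # most recent raw predictions
        self.min_votes = min_votes           # votes a label needs before it is shown
        self.current = None                  # stable label currently displayed

    def update(self, label):
        self.history.append(label)
        winner, votes = Counter(self.history).most_common(1)[0]
        # Switch the displayed label only when one class clearly dominates the window
        if votes &gt;= self.min_votes:
            self.current = winner
        return self.current


smoother = PredictionSmoother()
stream = ["stop", "stop", "go", "stop", "go", "go", "go"]
print([smoother.update(p) for p in stream])
# [None, None, None, 'stop', 'stop', 'go', 'go']
</code></pre>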
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, you built a gesture recognition system with a Transformer model. You prepared a small dataset, trained a Vision Transformer with temporal pooling, exported the model to ONNX for efficient inference, and wrapped it in a Gradio app for interactive testing. Measuring accuracy and latency then showed you how to check whether the system predicts the right gestures and responds quickly enough for interactive use.</p>
<p>This project illustrates how you can leverage advanced ML methods to enhance accessibility and communication, paving the way for more inclusive learning environments.</p>
<p>Remember: while this demo works with small datasets, real-world applications need larger, more diverse data and careful consideration of accessibility, inclusivity, and ethics.</p>
<p>Here’s the GitHub repo for full source code: <a target="_blank" href="https://github.com/tayo4christ/transformer-gesture">transformer-gesture</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Multimodal Makaton-to-English Translator for Accessible Education ]]>
                </title>
                <description>
                    <![CDATA[ A year nine student walks into class full of ideas, but when it is time to contribute, the tools around them do not listen. Their speech is difficult for standard voice systems to recognise, typing feels slow and exhausting, and the lesson moves on w... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-multimodal-translator-for-accessible-education/</link>
                <guid isPermaLink="false">68cb5e6df1766dffdd20f610</guid>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ OMOTAYO OMOYEMI ]]>
                </dc:creator>
                <pubDate>Thu, 18 Sep 2025 01:20:45 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758158024064/bf3d7dac-0231-450a-9b40-6abf43085e49.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>A year nine student walks into class full of ideas, but when it is time to contribute, the tools around them do not listen. Their speech is difficult for standard voice systems to recognise, typing feels slow and exhausting, and the lesson moves on without their voice being heard. The challenge is not a lack of ability but a lack of access.</p>
<p>Across the world, millions of learners face communication barriers. Some live with apraxia of speech or dysarthria, others with limited mobility, hearing differences, or neurodiverse needs. When speaking, writing, or pointing is unreliable or tiring, participation becomes limited, feedback is lost, and confidence slowly erodes. This is not a rare exception but an everyday reality in classrooms.</p>
<p>These barriers appear in very practical ways. Students are skipped or misunderstood when they cannot respond quickly. Their ability is under-measured because their means of expression are constrained. Teachers struggle to maintain the pace of lessons while making individual accommodations. Peers interact less often, reducing opportunities for social belonging.</p>
<p>Assistive technologies have helped over the years, with tools like text-to-speech, symbol boards, and simple gesture inputs. Yet most of these tools are designed for a single mode of interaction. They assume the learner will either speak, or type, or tap. Real communication, however, is fluid. Learners naturally combine gestures, partial speech, symbols, and context to share meaning, especially when fatigue, anxiety, or motor challenges come into play.</p>
<p>This is where modern AI changes the picture. We are beginning to move beyond single-solution tools into multimodal systems that can understand speech, even when it is disordered, interpret gestures and visual symbols, combine signals to infer intent, and adapt in real time as the learner’s abilities develop or change.</p>
<p>AI is reshaping accessibility in education by shifting from isolated tools to multimodal and adaptive systems. These systems combine gesture, speech, and intelligent feedback to meet learners where they are, while also supporting their growth over time.</p>
<p>In this article, we will explore what this shift looks like in practice, how it can unlock participation, and how adaptive feedback personalises support. We will also build a hands-on multimodal demo that turns these ideas into a classroom-ready tool.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><strong>An Operating System:</strong> Windows, macOS, or Linux</p>
</li>
<li><p><strong>Python:</strong> 3.9 or later, along with <code>pip</code> for installing packages</p>
</li>
<li><p><strong>Editor:</strong> Visual Studio Code or any integrated development environment (IDE)</p>
</li>
<li><p><strong>Basics:</strong> Comfortable running commands in a terminal</p>
</li>
<li><p><strong>Optional hardware:</strong> A microphone (speech input), a webcam (single-frame tab), and speakers (TTS playback)</p>
</li>
<li><p><strong>Internet:</strong> Required for the default SpeechRecognition (Google Web Speech API) and gTTS</p>
</li>
<li><p><strong>No dataset/model needed:</strong> A stub gesture classifier is provided so the demo runs end-to-end</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-weve-achieved-so-far">What We’ve Achieved So Far</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-case-study-1-translating-makaton-to-english">Case Study 1: Translating Makaton to English</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-case-study-2-aura-prototype-adaptive-speech-assistant">Case Study 2: AURA Prototype (Adaptive Speech Assistant)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-the-bigger-picture-multimodal-accessibility-tools">The Bigger Picture: Multimodal Accessibility Tools</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-a-multimodal-makaton-to-english-translator-gesture-speech">How to Build a Multimodal Makaton to English Translator (Gesture + Speech)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-project-overview">Project Overview</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-challenges-and-ethical-considerations">Challenges and Ethical Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-where-were-heading-next">Where We’re Heading Next</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion-building-an-inclusive-future-with-ai">Conclusion: Building an Inclusive Future with AI</a></p>
</li>
</ul>
<h2 id="heading-what-weve-achieved-so-far">What We’ve Achieved So Far</h2>
<p>The past few years have shown how AI can make classrooms more inclusive when we focus on accessibility. Developers, educators, and researchers are already experimenting with tools that bridge communication gaps.</p>
<p>In <a target="_blank" href="https://www.freecodecamp.org/news/create-a-real-time-gesture-to-text-translator/">my first freeCodeCamp tutorial</a>, I built a gesture-to-text translator using MediaPipe. This project demonstrated how computer vision can track hand movements and convert them into text in real time. For learners who rely on gestures, this kind of system can provide a bridge to participation.</p>
<p>Here is a simplified example of how MediaPipe detects hand landmarks:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> mediapipe <span class="hljs-keyword">as</span> mp
<span class="hljs-keyword">import</span> cv2

<span class="hljs-comment"># Initialize MediaPipe Hands</span>
mp_hands = mp.solutions.hands
hands = mp_hands.Hands()

<span class="hljs-comment"># Start capturing video from the webcam</span>
cap = cv2.VideoCapture(<span class="hljs-number">0</span>)

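<span class="hljs-comment"># Capture a single frame, then release the camera</span>
ret, frame = cap.read()
cap.release()
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ret:
    <span class="hljs-keyword">raise</span> RuntimeError(<span class="hljs-string">"Could not read a frame from the webcam"</span>)

<span class="hljs-comment"># MediaPipe expects RGB, but OpenCV captures BGR</span>
results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
hands.close()

<span class="hljs-comment"># Print the detected hand landmarks (None if no hand was found)</span>
print(<span class="hljs-string">"Hand landmarks:"</span>, results.multi_hand_landmarks)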
</code></pre>
<p>This small piece of code shows how MediaPipe processes a video frame and extracts hand landmarks. From there, you can classify gestures and map them to text.</p>
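<p>Before classifying, it usually helps to make the landmarks independent of where the hand sits in the frame and how close it is to the camera. Here is a minimal sketch. It assumes you've converted one detected hand into a (21, 3) array of <code>x, y, z</code> values; MediaPipe Hands reports 21 landmarks per hand, with landmark 0 at the wrist. The helper name is hypothetical.</p>
<pre><code class="lang-python">import numpy as np


def landmarks_to_features(points):
    """Turn 21 (x, y, z) hand landmarks into a position- and scale-invariant vector.

    points: array-like of shape (21, 3), for example
    [(lm.x, lm.y, lm.z) for lm in results.multi_hand_landmarks[0].landmark]
    """
    pts = np.asarray(points, dtype=np.float32)
    if pts.shape != (21, 3):
        raise ValueError(f"expected shape (21, 3), got {pts.shape}")
    # Landmark 0 is the wrist: express every point relative to it
    pts = pts - pts[0]
    # Divide by the largest wrist-to-landmark distance so hand size doesn't matter
    scale = np.linalg.norm(pts, axis=1).max()
    if scale &gt; 0:
        pts = pts / scale
    return pts.flatten()  # 63 numbers, ready for a classifier such as an MLP


features = landmarks_to_features(np.random.rand(21, 3))
</code></pre>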
<p>👉 You can explore the full project on <a target="_blank" href="https://github.com/tayo4christ/Gesture_Article">GitHub</a> or read the complete tutorial on <a target="_blank" href="https://www.freecodecamp.org/news/create-a-real-time-gesture-to-text-translator/">freeCodeCamp</a>.</p>
<p>In another <a target="_blank" href="https://www.freecodecamp.org/news/build-ai-accessibility-tools-with-python/">freeCodeCamp article</a>, I demonstrated how to build AI accessibility tools with Python, such as speech recognition and text-to-speech. These projects provided readers with a foundation for building their own inclusive tools, and you can find the full source code in the <a target="_blank" href="https://github.com/tayo4christ/inclusive-ai-toolkit">repository</a>.</p>
<p>Beyond these individual projects, the wider field has also made significant progress. Advances in sign language recognition have improved accuracy in capturing complex hand shapes and movements. Text-to-speech systems have become more natural and adaptive, giving users voices that sound closer to human speech. Mobile and desktop accessibility apps have brought these capabilities into everyday classrooms.</p>
<p>These achievements are encouraging, but they remain limited. Most of today’s tools are still designed for a single mode of communication. A system may work for gestures, or for speech, or for text, but not all of them together.</p>
<p>The next step is clear: we need multimodal, adaptive AI tools that can blend gestures, speech, and feedback into unified systems. This is where the most exciting opportunities in accessibility lie, and it is where we will turn next.</p>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/main/single-vs-multimodal.png?raw=true" alt="Single vs Multimodal Systems" width="600" height="400" loading="lazy"></p>
<p><em>Figure 1: Comparison of isolated single-modality systems with unified multimodal AI systems.</em></p>
<h2 id="heading-case-study-1-translating-makaton-to-english">Case Study 1: Translating Makaton to English</h2>
<p>One of my first projects in this area focused on translating Makaton into English.</p>
<p>Makaton is a language programme that uses signs and symbols to support people with speech and language difficulties. It is widely used in classrooms where learners may not rely fully on speech. The challenge is that while a learner communicates in Makaton, their teachers and peers often work in English, which creates a communication gap.</p>
<h3 id="heading-the-ai-workflow">The AI Workflow</h3>
<p>The system followed a clear pipeline:</p>
<p><em>Camera Input → Hand Landmark Detection → Gesture Classification → English Translation Output</em></p>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/main/makaton-workflow.png?raw=true" alt="Makaton Workflow" width="600" height="400" loading="lazy"></p>
<p><em>Figure 2: AI workflow for translating Makaton gestures into English.</em></p>
<ul>
<li><p><strong>Camera Input</strong>: captures the learner’s Makaton sign.</p>
</li>
<li><p><strong>Hand Landmark Detection</strong>: a hand-tracking library such as MediaPipe identifies the positions of the fingers and hands, while OpenCV typically handles camera capture and image conversion.</p>
</li>
<li><p><strong>Gesture Classification</strong>: a trained machine learning model classifies which Makaton sign was made.</p>
</li>
<li><p><strong>English Translation Output</strong>: the system maps that gesture to its English word or phrase and displays it.</p>
</li>
</ul>
<h3 id="heading-example-in-python">Example in Python</h3>
<p>Here is a simplified version of how this workflow might look in code:</p>
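<pre><code class="lang-python"><span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">import</span> mediapipe <span class="hljs-keyword">as</span> mp

<span class="hljs-comment"># Hypothetical stand-in for a trained classifier; replace with your own model</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">classify_gesture</span>(<span class="hljs-params">hand_landmarks</span>):</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"hello_sign"</span>

<span class="hljs-comment"># Step 1: Capture input</span>
cap = cv2.VideoCapture(<span class="hljs-number">0</span>)
ret, frame = cap.read()
cap.release()
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ret:
    <span class="hljs-keyword">raise</span> RuntimeError(<span class="hljs-string">"Could not read a frame from the webcam"</span>)

<span class="hljs-comment"># Step 2: Detect hand landmarks (MediaPipe expects RGB)</span>
<span class="hljs-keyword">with</span> mp.solutions.hands.Hands(max_num_hands=<span class="hljs-number">1</span>) <span class="hljs-keyword">as</span> hands:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

<span class="hljs-comment"># Step 3: Classify gesture (only if a hand was detected)</span>
gesture = <span class="hljs-literal">None</span>
<span class="hljs-keyword">if</span> results.multi_hand_landmarks:
    gesture = classify_gesture(results.multi_hand_landmarks[<span class="hljs-number">0</span>])

<span class="hljs-comment"># Step 4: Translate to English</span>
translation_map = {
    <span class="hljs-string">"hello_sign"</span>: <span class="hljs-string">"Hello"</span>,
    <span class="hljs-string">"thank_you_sign"</span>: <span class="hljs-string">"Thank you"</span>
}
text = translation_map.get(gesture, <span class="hljs-string">"Unknown sign"</span>)

print(<span class="hljs-string">"Makaton sign:"</span>, gesture, <span class="hljs-string">" -&gt; English:"</span>, text)
</code></pre>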
<p>This is a simplified example, but it shows the core idea: map gestures to meaning and then bridge that meaning into English.</p>
<h3 id="heading-why-this-matters">Why This Matters</h3>
<p>Imagine a student signing <em>thank you</em> in Makaton and the system instantly displaying the words on screen. Teachers can check understanding, peers can respond naturally, and the learner’s contribution becomes visible to everyone.</p>
<p>The key takeaway is that AI can bridge symbol- and gesture-based languages with mainstream spoken and written communication. Instead of forcing learners to adapt to rigid systems, we can design systems that adapt to the way they already communicate.</p>
<h2 id="heading-case-study-2-aura-prototype-adaptive-speech-assistant">Case Study 2: AURA Prototype (Adaptive Speech Assistant)</h2>
<p>Another project I worked on is called <a target="_blank" href="https://aura-apraxia-aac-a8qejouwasaqequrhetbfw.streamlit.app/"><strong>AURA</strong></a>, the <em>Apraxia of Speech Adaptive Understanding and Relearning Assistant</em>. The idea was to design a system that not only recognises speech but also supports learners with speech disorders by detecting errors, adapting feedback, and offering multimodal alternatives.</p>
<h3 id="heading-the-challenge">The Challenge</h3>
<p>Most commercial speech recognition systems fail when a person’s speech does not follow typical patterns. This is especially true for people with apraxia of speech, where motor planning difficulties make pronunciation inconsistent. The result is frequent misrecognition, frustration, and exclusion from tools that rely on voice input.</p>
<h3 id="heading-the-ai-workflow-1">The AI Workflow</h3>
<p>The AURA prototype used a layered architecture:</p>
<p><em>Speech Input → Wav2Vec2 (fine-tuned for disordered speech) → CNN + BiLSTM Error Detection → Reinforcement Learning Feedback → Multimodal Output (Speech + Gesture)</em></p>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/main/aura-workflow.png?raw=true" alt="AURA Workflow" width="600" height="400" loading="lazy"></p>
<p><em>Figure 3: Workflow of the AURA prototype, combining speech, error detection, adaptive feedback, and multimodal outputs.</em></p>
<ul>
<li><p><strong>Wav2Vec2 Speech Recognition</strong>: fine-tuned on disordered speech to improve transcription accuracy.</p>
</li>
<li><p><strong>CNN + BiLSTM Model</strong>: classifies articulation or phonological errors in real time.</p>
</li>
<li><p><strong>Reinforcement Learning Engine</strong>: adapts feedback loops so therapy suggestions improve as the learner progresses.</p>
</li>
<li><p><strong>Gesture-to-Speech Multimodal Input</strong>: when speech is too difficult, MediaPipe gestures can be used to trigger spoken outputs.</p>
</li>
<li><p><strong>Streamlit Interface</strong>: integrates everything into a single accessible app for testing.</p>
</li>
</ul>
<p>Here’s a simplified view of how an error detection module could be structured:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Example: Error classification using CNN + BiLSTM</span>
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn

<span class="hljs-comment"># Define the ErrorClassifier model</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ErrorClassifier</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super(ErrorClassifier, self).__init__()
        self.cnn = nn.Conv1d(in_channels=<span class="hljs-number">40</span>, out_channels=<span class="hljs-number">64</span>, kernel_size=<span class="hljs-number">3</span>)
        self.lstm = nn.LSTM(<span class="hljs-number">64</span>, <span class="hljs-number">128</span>, batch_first=<span class="hljs-literal">True</span>, bidirectional=<span class="hljs-literal">True</span>)
        self.fc = nn.Linear(<span class="hljs-number">256</span>, <span class="hljs-number">3</span>)  <span class="hljs-comment"># Output classes: e.g. correct, substitution, omission</span>

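    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>
        <span class="hljs-comment"># x: (batch, 40 features, time), e.g. 40 MFCCs per audio frame</span>
        x = torch.relu(self.cnn(x))    <span class="hljs-comment"># (batch, 64, time - 2)</span>
        x = x.permute(<span class="hljs-number">0</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>)  <span class="hljs-comment"># LSTM (batch_first) expects (batch, time, features)</span>
        x, _ = self.lstm(x)            <span class="hljs-comment"># (batch, time - 2, 256): 128 per direction</span>
        <span class="hljs-keyword">return</span> self.fc(x[:, <span class="hljs-number">-1</span>, :])  <span class="hljs-comment"># classify from the last time step</span>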

<span class="hljs-comment"># Instantiate the model</span>
model = ErrorClassifier()
</code></pre>
<p>This snippet shows the heart of the error detection pipeline: combining CNN layers for feature extraction with BiLSTMs for sequence modeling. The model can flag articulation errors, which then guide the feedback loop.</p>
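<p>The article doesn't detail how AURA's reinforcement learning engine works, so the following is an assumption rather than the prototype's code. It is a minimal epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups. The bandit picks a feedback strategy and receives a reward, for example 1.0 if an error detector like the one above classifies the learner's next attempt as <em>correct</em>. Over time, it favours the strategies that work for that learner. The strategy names are hypothetical.</p>
<pre><code class="lang-python">import random


class FeedbackBandit:
    """Epsilon-greedy selection among feedback strategies (a simple bandit)."""

    def __init__(self, strategies, epsilon=0.1, seed=None):
        self.strategies = list(strategies)
        self.epsilon = epsilon                           # chance of exploring
        self.counts = {s: 0 for s in self.strategies}    # times each was tried
        self.values = {s: 0.0 for s in self.strategies}  # running mean reward
        self.rng = random.Random(seed)

    def choose(self):
        # Explore a random strategy occasionally; otherwise exploit the best so far
        if self.rng.random() &lt; self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.values[s])

    def update(self, strategy, reward):
        # Incremental mean: move the estimate 1/n of the way toward the new reward
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n


bandit = FeedbackBandit(["model_word", "suggest_gesture", "slow_repeat"], seed=42)
strategy = bandit.choose()
# ...deliver that feedback, then score the learner's next attempt...
bandit.update(strategy, reward=1.0)
</code></pre>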
<h3 id="heading-why-this-matters-1">Why This Matters</h3>
<p>With AURA, the goal was not just to recognise what someone said, but to help them communicate more effectively. The prototype adapted in real time, offering corrective feedback, suggesting gestures, or switching modes when speech became difficult.</p>
<p>The takeaway is that AI can evolve from being a passive recognition tool into an active partner in learning and communication.</p>
<h2 id="heading-the-bigger-picture-multimodal-accessibility-tools">The Bigger Picture: Multimodal Accessibility Tools</h2>
<p>The two projects we explored, translating Makaton into English and building the AURA prototype, highlight a much larger transformation underway. Accessibility technology is moving away from isolated, single-purpose applications toward multimodal platforms that bring together speech, gestures, text, and adaptive AI into one seamless system.</p>
<h3 id="heading-why-this-shift-matters">Why This Shift Matters</h3>
<p>The benefits of this shift are profound:</p>
<ul>
<li><p><strong>Greater inclusivity in classrooms</strong>: learners who rely on different modes of communication can participate equally.</p>
</li>
<li><p><strong>Real-time support</strong>: systems that detect errors or adapt to gestures give learners immediate feedback rather than delayed corrections.</p>
</li>
<li><p><strong>Lower frustration</strong>: multimodal options mean if one channel breaks down (for example, speech), others like gesture or text can take over smoothly.</p>
</li>
<li><p><strong>Confidence and independence</strong>: learners express themselves more fully, without depending heavily on support staff or interpreters.</p>
</li>
</ul>
<h3 id="heading-beyond-the-classroom">Beyond the Classroom</h3>
<p>The impact of multimodal accessibility extends across many sectors:</p>
<ul>
<li><p>In <strong>healthcare</strong>, patients with communication difficulties can use multimodal AI assistants to express their needs clearly, which may help reduce misunderstandings, misdiagnosis, and stress.</p>
</li>
<li><p>In the <strong>workplace</strong>, employees with speech or motor impairments can collaborate effectively using adaptive AI tools.</p>
</li>
<li><p>In <strong>community settings</strong>, individuals can participate more freely in conversations, services, and digital platforms, strengthening social inclusion.</p>
</li>
</ul>
<h3 id="heading-visualising-the-shift">Visualising the Shift</h3>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/main/multimodal-applications.png?raw=true" alt="Multimodal Applications" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-build-a-multimodal-makaton-to-english-translator-gesture-speech">How to Build a Multimodal Makaton to English Translator (Gesture + Speech)</h2>
<p>This demo combines both use cases: a Makaton to English classroom tool and the AURA assistive speech path. It prioritizes gesture when a sign is detected, falls back to speech when it isn’t, and produces a unified English output (with optional text-to-speech). We’ll focus on the translation layer, multimodal fusion, and a simple Streamlit UI.</p>
<h3 id="heading-project-structure">Project structure</h3>
<pre><code class="lang-plaintext">makaton_multimodal_demo/
├─ .streamlit/
│   └─ config.toml
├─ assets/
│   └─ README.txt
├─ tests/
│   └─ test_fuse.py
└─ streamlit_app.py
</code></pre>
<p>The tree above shows how the multimodal Makaton to English demo is organized. Here's what each part does:</p>
<ul>
<li><p><code>makaton_multimodal_demo/</code>: This is the root directory of the project.</p>
</li>
<li><p><code>.streamlit/</code>: This directory contains configuration files for Streamlit, which is a framework used to build web apps in Python. The <code>config.toml</code> file is optional and can be used to customize the Streamlit app's settings.</p>
</li>
<li><p><code>assets/</code>: This directory is intended to store models or other necessary files for the project. The <code>README.txt</code> serves as a placeholder to indicate where these files should be placed.</p>
</li>
<li><p><code>tests/</code>: This directory holds test scripts. <code>test_fuse.py</code> is intended for tests of the <code>fuse</code> function, the step that combines gesture and speech input into a single English output.</p>
</li>
<li><p><code>streamlit_app.py</code>: The main application file. It builds the Streamlit user interface and contains the logic for translating Makaton gestures and speech into English.</p>
</li>
</ul>
<h3 id="heading-install-amp-run">Install &amp; run</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># (optional) create and activate a virtualenv</span>
python -m venv .venv

<span class="hljs-comment"># Windows</span>
.\.venv\Scripts\activate

<span class="hljs-comment"># macOS/Linux</span>
<span class="hljs-built_in">source</span> .venv/bin/activate
</code></pre>
<p>The commands above create and activate a Python virtual environment: an isolated directory with its own Python interpreter and its own set of installed packages, so the demo's dependencies don't interfere with other projects on your machine.</p>
<ol>
<li><p><code>python -m venv .venv</code>: This command creates a new virtual environment in a directory named <code>.venv</code>. The <code>venv</code> module is used to create lightweight virtual environments.</p>
</li>
<li><p><code>.\.venv\Scripts\activate</code> (Windows): This command activates the virtual environment on Windows. Once activated, the environment's Python interpreter and installed packages will be used.</p>
</li>
<li><p><code>source .venv/bin/activate</code> (macOS/Linux): This command activates the virtual environment on macOS or Linux. Similar to Windows, activating the environment ensures that the specific Python interpreter and packages within the environment are used.</p>
</li>
</ol>
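<p>If you're not sure which interpreter is active, a quick check from Python can confirm it. The sketch below relies on documented behavior of the standard <code>venv</code> module: inside a virtual environment, <code>sys.prefix</code> points at the environment directory, while <code>sys.base_prefix</code> still points at the base Python installation. The file name <code>check_venv.py</code> is only a suggestion.</p>
<pre><code class="lang-python">import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a venv-style environment."""
    # venv changes sys.prefix to the environment folder (for example .venv),
    # but leaves sys.base_prefix pointing at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
</code></pre>
<p>Run <code>python check_venv.py</code> after activating <code>.venv</code>; it should report <code>True</code>. If it prints <code>False</code>, the packages you install next will go into your global Python rather than the project environment.</p>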
<h3 id="heading-install-dependencies">Install dependencies</h3>
<pre><code class="lang-bash">pip install streamlit opencv-python mediapipe SpeechRecognition gTTS pydub numpy
</code></pre>
<p>The command above is used to install multiple Python packages at once. Here's what each package does:</p>
<ul>
<li><p><strong>streamlit</strong>: A framework for building interactive web applications in Python, often used for data science and machine learning projects.</p>
</li>
<li><p><strong>opencv-python</strong>: Provides OpenCV, a library for computer vision tasks such as image processing and video analysis.</p>
</li>
<li><p><strong>mediapipe</strong>: A library developed by Google for building cross-platform, customizable machine learning solutions for live and streaming media, including hand and face detection.</p>
</li>
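<li><p><strong>SpeechRecognition</strong>: A wrapper around several speech recognition engines and APIs. The demo uses its <code>recognize_google</code> method, which sends audio to Google's web speech API, and its <code>Microphone</code> class, which requires PyAudio.</p>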
</li>
<li><p><strong>gTTS</strong>: Google Text-to-Speech, a library and CLI tool to interface with Google Translate's text-to-speech API, enabling text-to-speech conversion.</p>
</li>
<li><p><strong>pydub</strong>: A library for audio processing, such as converting between audio formats. The <code>streamlit_app.py</code> script below does not import it, so it is only needed if you extend the demo with audio processing.</p>
</li>
<li><p><strong>numpy</strong>: A fundamental package for scientific computing in Python, providing support for arrays and matrices, along with a collection of mathematical functions.</p>
</li>
</ul>
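<p>Because <code>streamlit_app.py</code> treats OpenCV, MediaPipe, SpeechRecognition, and gTTS as optional, it helps to confirm what actually got installed. Some pip package names differ from the module names you import (for example, <code>opencv-python</code> is imported as <code>cv2</code>). The hypothetical helper below uses <code>importlib.util.find_spec</code> from the standard library, which locates a top-level module without importing it.</p>
<pre><code class="lang-python">import importlib.util

# pip package name -> module name used in import statements
PACKAGES = {
    "streamlit": "streamlit",
    "opencv-python": "cv2",
    "mediapipe": "mediapipe",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",  # needed by speech_recognition.Microphone
    "gTTS": "gtts",
    "pydub": "pydub",
    "numpy": "numpy",
}


def check_packages(packages=PACKAGES):
    """Return {pip_name: True/False} depending on whether each module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for pip_name, ok in check_packages().items():
        print(f"{pip_name:20} {'OK' if ok else 'missing'}")
</code></pre>
<p>Any package reported as missing here will simply disable the matching tab or button in the app, because of the <code>try</code>/<code>except</code> import guards at the top of the script.</p>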
<h3 id="heading-create-streamlitapppy">Create <code>streamlit_app.py</code></h3>
<pre><code class="lang-python"><span class="hljs-comment"># streamlit_app.py</span>
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> BytesIO
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
<span class="hljs-keyword">import</span> streamlit <span class="hljs-keyword">as</span> st

<span class="hljs-comment"># Optional deps (kept optional so readers can still run the core demo)</span>
<span class="hljs-keyword">try</span>:
    <span class="hljs-keyword">import</span> cv2
    <span class="hljs-keyword">import</span> mediapipe <span class="hljs-keyword">as</span> mp
    MP_OK = <span class="hljs-literal">True</span>
<span class="hljs-keyword">except</span> Exception:
    MP_OK = <span class="hljs-literal">False</span>

<span class="hljs-keyword">try</span>:
    <span class="hljs-keyword">import</span> speech_recognition <span class="hljs-keyword">as</span> sr
    SR_OK = <span class="hljs-literal">True</span>
<span class="hljs-keyword">except</span> Exception:
    SR_OK = <span class="hljs-literal">False</span>

<span class="hljs-keyword">try</span>:
    <span class="hljs-keyword">from</span> gtts <span class="hljs-keyword">import</span> gTTS
    GTTS_OK = <span class="hljs-literal">True</span>
<span class="hljs-keyword">except</span> Exception:
    GTTS_OK = <span class="hljs-literal">False</span>

<span class="hljs-comment"># --- 1) Minimal Makaton dictionary (extend as needed)</span>
MAKATON_DICT = {
    <span class="hljs-string">"hello_sign"</span>: <span class="hljs-string">"Hello"</span>,
    <span class="hljs-string">"thank_you_sign"</span>: <span class="hljs-string">"Thank you"</span>,
    <span class="hljs-string">"help_sign"</span>: <span class="hljs-string">"Help"</span>,
    <span class="hljs-string">"toilet_sign"</span>: <span class="hljs-string">"Toilet"</span>,
    <span class="hljs-string">"stop_sign"</span>: <span class="hljs-string">"Stop"</span>,
}

<span class="hljs-comment"># --- 2) Gesture classifier (stub for the demo)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">classify_gesture</span>(<span class="hljs-params">landmarks</span>) -&gt; Optional[str]:</span>
    <span class="hljs-string">"""
    Return a canonical label like 'hello_sign' or None if unknown.
    Replace this stub with your trained model + confidence threshold.
    """</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"hello_sign"</span> <span class="hljs-keyword">if</span> landmarks <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>

<span class="hljs-comment"># --- 3) Speech recognizer (fallback path)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">transcribe_speech</span>(<span class="hljs-params">seconds: int = <span class="hljs-number">3</span></span>) -&gt; Optional[str]:</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> SR_OK:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
    r = sr.Recognizer()
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">with</span> sr.Microphone() <span class="hljs-keyword">as</span> source:
            st.info(<span class="hljs-string">"Listening..."</span>)
            audio = r.listen(source, phrase_time_limit=seconds)
        <span class="hljs-keyword">return</span> r.recognize_google(audio)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        st.warning(<span class="hljs-string">f"Speech recognition error: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-comment"># --- 4) Fusion logic (gesture first, speech fallback)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fuse</span>(<span class="hljs-params">gesture_label: Optional[str], speech_text: Optional[str]</span>) -&gt; str:</span>
    <span class="hljs-keyword">if</span> gesture_label <span class="hljs-keyword">and</span> gesture_label <span class="hljs-keyword">in</span> MAKATON_DICT:
        <span class="hljs-keyword">return</span> MAKATON_DICT[gesture_label]
    <span class="hljs-keyword">if</span> speech_text:
        <span class="hljs-keyword">return</span> speech_text
    <span class="hljs-keyword">return</span> <span class="hljs-string">"No input detected"</span>

<span class="hljs-comment"># --- 5) Optional: extract single-frame hand landmarks using MediaPipe</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">extract_hand_landmarks_from_image</span>(<span class="hljs-params">image_bytes: bytes</span>):</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> MP_OK:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
        np_arr = np.frombuffer(image_bytes, dtype=np.uint8)
        img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
        <span class="hljs-keyword">if</span> img <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

        mp_hands = mp.solutions.hands
        <span class="hljs-keyword">with</span> mp_hands.Hands(static_image_mode=<span class="hljs-literal">True</span>, max_num_hands=<span class="hljs-number">1</span>, min_detection_confidence=<span class="hljs-number">0.5</span>) <span class="hljs-keyword">as</span> hands:
            img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            result = hands.process(img_rgb)

        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> result.multi_hand_landmarks:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

        hand_landmarks = result.multi_hand_landmarks[<span class="hljs-number">0</span>]
        <span class="hljs-keyword">return</span> [(lm.x, lm.y, lm.z) <span class="hljs-keyword">for</span> lm <span class="hljs-keyword">in</span> hand_landmarks.landmark]
    <span class="hljs-keyword">except</span> Exception:
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-comment"># --- 6) Streamlit UI</span>
st.set_page_config(page_title=<span class="hljs-string">"Makaton → English (Multimodal Demo)"</span>)
st.title(<span class="hljs-string">"Makaton → English (Multimodal Demo)"</span>)
st.caption(<span class="hljs-string">"Combines a classroom Makaton translator with an assistive speech path (AURA-style)."</span>)

<span class="hljs-keyword">with</span> st.expander(<span class="hljs-string">"What this demo shows"</span>):
    st.write(
        <span class="hljs-string">"- **Translation layer:** small Makaton dictionary you can extend.\n"</span>
        <span class="hljs-string">"- **Multimodal fusion:** gesture prioritized, speech as fallback.\n"</span>
        <span class="hljs-string">"- **UI:** one page, clear output, optional text-to-speech."</span>
    )

tabs = st.tabs([<span class="hljs-string">"Simulated Sign"</span>, <span class="hljs-string">"Single-Frame Webcam (Optional)"</span>, <span class="hljs-string">"About"</span>])

<span class="hljs-comment"># Tab 1: Simulated (no CV model required)</span>
<span class="hljs-keyword">with</span> tabs[<span class="hljs-number">0</span>]:
    st.subheader(<span class="hljs-string">"Simulated Gesture + Speech"</span>)
    col1, col2 = st.columns(<span class="hljs-number">2</span>)

    <span class="hljs-keyword">with</span> col1:
        simulate = st.selectbox(
            <span class="hljs-string">"Pick a sign"</span>,
            [<span class="hljs-string">""</span>, <span class="hljs-string">"hello_sign"</span>, <span class="hljs-string">"thank_you_sign"</span>, <span class="hljs-string">"help_sign"</span>, <span class="hljs-string">"toilet_sign"</span>, <span class="hljs-string">"stop_sign"</span>],
            index=<span class="hljs-number">0</span>
        )
        gesture_label = simulate <span class="hljs-keyword">or</span> <span class="hljs-literal">None</span>

    <span class="hljs-keyword">with</span> col2:
        <span class="hljs-keyword">if</span> st.button(<span class="hljs-string">"Transcribe 3s"</span>):
            <span class="hljs-keyword">if</span> SR_OK:
                st.session_state[<span class="hljs-string">"speech_text"</span>] = transcribe_speech(<span class="hljs-number">3</span>)
            <span class="hljs-keyword">else</span>:
                st.warning(<span class="hljs-string">"SpeechRecognition not installed."</span>)
        <span class="hljs-comment"># Show the transcript after the button so a new result appears on the same run</span>
        st.write(<span class="hljs-string">"Current speech:"</span>, st.session_state.get(<span class="hljs-string">"speech_text"</span>) <span class="hljs-keyword">or</span> <span class="hljs-string">"None"</span>)

    output = fuse(gesture_label, st.session_state.get(<span class="hljs-string">"speech_text"</span>))
    st.markdown(<span class="hljs-string">f"### Output: **<span class="hljs-subst">{output}</span>**"</span>)

    <span class="hljs-keyword">if</span> output <span class="hljs-keyword">and</span> output != <span class="hljs-string">"No input detected"</span>:
        <span class="hljs-keyword">if</span> st.button(<span class="hljs-string">"Speak output"</span>):
            <span class="hljs-keyword">if</span> GTTS_OK:
                mp3 = BytesIO()
                <span class="hljs-keyword">try</span>:
                    gTTS(output).write_to_fp(mp3)
                    st.audio(mp3.getvalue(), format=<span class="hljs-string">"audio/mp3"</span>)
                <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
                    st.warning(<span class="hljs-string">f"TTS failed: <span class="hljs-subst">{e}</span>"</span>)
            <span class="hljs-keyword">else</span>:
                st.warning(<span class="hljs-string">"gTTS not installed."</span>)

<span class="hljs-comment"># Tab 2: Optional single-frame webcam capture</span>
<span class="hljs-keyword">with</span> tabs[<span class="hljs-number">1</span>]:
    st.subheader(<span class="hljs-string">"Single-Frame Hand Detection (Webcam)"</span>)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> MP_OK:
        st.warning(<span class="hljs-string">"Install MediaPipe + OpenCV to enable this tab."</span>)
    <span class="hljs-keyword">else</span>:
        img = st.camera_input(<span class="hljs-string">"Capture a frame"</span>)
        captured_label = <span class="hljs-literal">None</span>
        <span class="hljs-keyword">if</span> img <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>:
            landmarks = extract_hand_landmarks_from_image(img.getvalue())
            <span class="hljs-keyword">if</span> landmarks:
                captured_label = classify_gesture(landmarks)
                st.success(<span class="hljs-string">"Hand detected."</span>)
            <span class="hljs-keyword">else</span>:
                st.info(<span class="hljs-string">"No hand landmarks found. Try better lighting/framing."</span>)

        <span class="hljs-keyword">if</span> st.button(<span class="hljs-string">"Transcribe 3s (webcam tab)"</span>):
            st.session_state[<span class="hljs-string">"speech_text2"</span>] = transcribe_speech(<span class="hljs-number">3</span>) <span class="hljs-keyword">if</span> SR_OK <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>

        speech_text2 = st.session_state.get(<span class="hljs-string">"speech_text2"</span>)
        st.write(<span class="hljs-string">"Current speech:"</span>, speech_text2 <span class="hljs-keyword">or</span> <span class="hljs-string">"None"</span>)

        output2 = fuse(captured_label, speech_text2)
        st.markdown(<span class="hljs-string">f"### Output: **<span class="hljs-subst">{output2}</span>**"</span>)

        <span class="hljs-keyword">if</span> output2 <span class="hljs-keyword">and</span> output2 != <span class="hljs-string">"No input detected"</span>:
            <span class="hljs-keyword">if</span> st.button(<span class="hljs-string">"Speak output (webcam tab)"</span>):
                <span class="hljs-keyword">if</span> GTTS_OK:
                    mp3 = BytesIO()
                    <span class="hljs-keyword">try</span>:
                        gTTS(output2).write_to_fp(mp3)
                        st.audio(mp3.getvalue(), format=<span class="hljs-string">"audio/mp3"</span>)
                    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
                        st.warning(<span class="hljs-string">f"TTS failed: <span class="hljs-subst">{e}</span>"</span>)
                <span class="hljs-keyword">else</span>:
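                    st.warning(<span class="hljs-string">"gTTS not installed."</span>)

<span class="hljs-comment"># Tab 3: About</span>
<span class="hljs-keyword">with</span> tabs[<span class="hljs-number">2</span>]:
    st.subheader(<span class="hljs-string">"About"</span>)
    st.write(
        <span class="hljs-string">"Gesture input takes priority; speech is used as a fallback when no known sign is detected. "</span>
        <span class="hljs-string">"The gesture classifier is a stub: replace classify_gesture() with a trained model."</span>
    )
    st.write(<span class="hljs-string">f"MediaPipe/OpenCV: <span class="hljs-subst">{MP_OK}</span> | SpeechRecognition: <span class="hljs-subst">{SR_OK}</span> | gTTS: <span class="hljs-subst">{GTTS_OK}</span>"</span>)
</code></pre>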
<p>The code above creates a Streamlit application that combines gesture recognition and speech recognition to translate Makaton signs into English. Here's a brief explanation of how it works:</p>
<ol>
<li><p><strong>Dependencies and Setup</strong>: The code attempts to import optional dependencies like OpenCV, MediaPipe, SpeechRecognition, and gTTS. These are used for gesture detection, speech recognition, and text-to-speech functionalities.</p>
</li>
<li><p><strong>Makaton Dictionary</strong>: A minimal dictionary that maps Makaton signs to English words. This can be extended to include more signs.</p>
</li>
<li><p><strong>Gesture Classifier</strong>: <code>classify_gesture</code> is a placeholder that returns <code>hello_sign</code> whenever any hand landmarks are found. In a real application, it would be replaced with a trained model and a confidence threshold.</p>
</li>
<li><p><strong>Speech Recognizer</strong>: The <code>transcribe_speech</code> function uses the SpeechRecognition library to convert spoken words into text, serving as a fallback when gestures are not detected.</p>
</li>
<li><p><strong>Fusion Logic</strong>: The <code>fuse</code> function prioritizes gesture recognition over speech. If a gesture is recognized, it translates it using the dictionary; otherwise, it uses the transcribed speech.</p>
</li>
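<li><p><strong>Hand Landmark Extraction</strong>: <code>extract_hand_landmarks_from_image</code> decodes a captured frame with OpenCV, runs MediaPipe Hands on it, and returns the 21 <code>(x, y, z)</code> landmarks of the first detected hand. These landmarks are the input to <code>classify_gesture</code>.</p>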
</li>
<li><p><strong>Streamlit UI</strong>: The user interface is built with Streamlit, featuring tabs for simulated gestures, webcam-based gesture detection, and additional information. Users can simulate gestures, capture gestures via webcam, and use speech input. The output is displayed and can be converted to speech using gTTS.</p>
</li>
</ol>
<p>This application demonstrates a multimodal approach by integrating both gesture and speech recognition to facilitate communication for users who rely on Makaton.</p>
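<p>The project tree lists <code>tests/test_fuse.py</code>, but its contents aren't shown here. Below is a minimal pytest-style sketch of what such a file might contain. To stay self-contained, it copies <code>MAKATON_DICT</code> and <code>fuse</code> instead of importing <code>streamlit_app.py</code>, because importing that module would also execute its top-level Streamlit UI code. The test names are illustrative.</p>
<pre><code class="lang-python">"""Sketch of tests/test_fuse.py (run with: pytest tests/test_fuse.py)."""
from typing import Optional

# Copied from streamlit_app.py so the tests don't execute the Streamlit UI.
MAKATON_DICT = {
    "hello_sign": "Hello",
    "thank_you_sign": "Thank you",
    "help_sign": "Help",
    "toilet_sign": "Toilet",
    "stop_sign": "Stop",
}


def fuse(gesture_label: Optional[str], speech_text: Optional[str]):
    """Gesture first, speech as fallback (same logic as the app)."""
    if gesture_label and gesture_label in MAKATON_DICT:
        return MAKATON_DICT[gesture_label]
    if speech_text:
        return speech_text
    return "No input detected"


def test_known_sign_takes_priority_over_speech():
    assert fuse("hello_sign", "goodbye") == "Hello"


def test_unknown_sign_falls_back_to_speech():
    assert fuse("wave_sign", "I need help") == "I need help"


def test_speech_only():
    assert fuse(None, "Thank you") == "Thank you"


def test_no_input():
    assert fuse(None, None) == "No input detected"
    assert fuse("", "") == "No input detected"
</code></pre>
<p>In a larger project, moving <code>MAKATON_DICT</code> and <code>fuse</code> into their own module would let both the app and the tests import the same code instead of keeping a copy.</p>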
<h3 id="heading-run">Run</h3>
<pre><code class="lang-bash">streamlit run streamlit_app.py
</code></pre>
<p>This command launches the Streamlit app. Streamlit runs the script, starts a local web server (by default at <code>http://localhost:8501</code>), and opens the app in your browser so you can interact with it. Run it from the project folder with your virtual environment activated.</p>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/8117234b9dc032aa0f4ff32abad92e7ad3344b81/ui-home-simulated-tab.jpg?raw=1" alt="Streamlit app ‘Makaton to English (Multimodal Demo)’ showing the Simulated Sign tab with ‘Pick a sign’, ‘Transcribe 3s’, and ‘Output: No input detected’." width="600" height="400" loading="lazy"></p>
<p><em>Figure — App interface: the Simulated Sign tab before any input.</em></p>
<p><img src="https://github.com/tayo4christ/ai-accessibility-articles-assets/blob/8117234b9dc032aa0f4ff32abad92e7ad3344b81/ui-simulated-hello-output.jpg?raw=1" alt="Simulated sign ‘hello_sign’ selected in the Streamlit app; Output shows “Hello”." width="600" height="400" loading="lazy"></p>
<p><em>Figure — Selecting</em> <code>hello_sign</code> <em>produces “Output: Hello”.</em></p>
<h2 id="heading-project-overview">Project Overview</h2>
<p>You have developed a multimodal translator that integrates both gesture recognition (specifically Makaton signs) and speech recognition to produce a unified English output. The system is designed to prioritize gesture input, using speech as a fallback when gestures are not detected.</p>
<p><strong>User Interface</strong></p>
<p>The application is built using Streamlit, featuring two main tabs:</p>
<ul>
<li><p><strong>Simulated Sign Tab</strong>: Allows users to simulate gestures without requiring computer vision (CV) capabilities.</p>
</li>
<li><p><strong>Single-Frame Webcam (Optional) Tab</strong>: Captures a single webcam frame, detects hand landmarks with MediaPipe, and passes them to the gesture classifier.</p>
</li>
</ul>
<p><strong>Use Case Integration</strong></p>
<ul>
<li><p><strong>Makaton to English Translation</strong>: In a classroom setting, detected Makaton signs are translated into short English phrases, facilitating communication.</p>
</li>
<li><p><strong>AURA-style Assistive Path</strong>: If no gesture is detected, the system relies on speech input to generate an output, ensuring continuous communication support.</p>
</li>
</ul>
<p><strong>Design Limitations</strong></p>
<ul>
<li><p>The gesture classifier is currently a placeholder and should be replaced with a trained model that includes a confidence threshold for better accuracy.</p>
</li>
<li><p>The Makaton dictionary is minimal and can be expanded to include more phrases and templates.</p>
</li>
<li><p>The speech recognition component uses <code>recognize_google</code>, which sends audio to Google's web speech API and needs an internet connection. For improved robustness, consider models like Wav2Vec2 or an offline automatic speech recognition (ASR) system.</p>
</li>
</ul>
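<p>The first limitation matters most. As a rough illustration of what could replace the <code>classify_gesture</code> stub, the sketch below normalizes the 21 landmarks returned by <code>extract_hand_landmarks_from_image</code> (wrist as origin, scaled to unit size) and compares them with recorded examples of each sign. The <code>TemplateClassifier</code> class, the nearest-template approach, and the default threshold of <code>0.5</code> are assumptions for illustration, not the author's model; a real system would train on many signers and tune the threshold on held-out data.</p>
<pre><code class="lang-python">import numpy as np


def normalize_landmarks(landmarks):
    """Make a landmark list independent of hand position and size."""
    pts = np.asarray(landmarks, dtype=np.float64)  # shape (21, 3): x, y, z per landmark
    pts = pts - pts[0]                             # landmark 0 (the wrist) becomes the origin
    scale = np.linalg.norm(pts, axis=1).max()      # distance to the farthest landmark
    if scale == 0:
        return pts.ravel()
    return (pts / scale).ravel()


class TemplateClassifier:
    """Hypothetical nearest-template classifier with a distance threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed max distance for a match (must be positive)
        self.templates = {}         # label -> list of normalized example vectors

    def add_example(self, label, landmarks):
        self.templates.setdefault(label, []).append(normalize_landmarks(landmarks))

    def predict(self, landmarks):
        """Return (label, confidence), or (None, 0.0) when no template is close enough."""
        if landmarks is None or not self.templates:
            return None, 0.0
        vec = normalize_landmarks(landmarks)
        # Smallest distance from the input to any stored example of each label
        distances = {
            label: min(float(np.linalg.norm(vec - ex)) for ex in examples)
            for label, examples in self.templates.items()
        }
        best_label = min(distances, key=distances.get)
        best_dist = distances[best_label]
        if best_dist > self.threshold:
            return None, 0.0
        # Distance 0 maps to confidence 1.0; distance equal to the threshold maps to 0.0
        return best_label, 1.0 - best_dist / self.threshold
</code></pre>
<p>To plug this into the app, <code>classify_gesture</code> could call <code>predict</code> and return the label only when it is not <code>None</code>, so low-confidence frames fall through to the speech path in <code>fuse</code>.</p>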
<p><strong>Suggested Extensions</strong></p>
<ul>
<li><p>Implement a confidence threshold to display both gesture and speech inputs when the system is uncertain.</p>
</li>
<li><p>Expand the dictionary to support slot templates, such as "I want [item]".</p>
</li>
<li><p>Introduce a toggle to switch between speech-first and gesture-first input priorities.</p>
</li>
<li><p>Enable logging of outputs for teachers and provide an option to export these logs as CSV files.</p>
</li>
<li><p>Consider replacing gTTS with an offline text-to-speech solution for better reliability.</p>
</li>
</ul>
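<p>The confidence threshold and the priority toggle only change the fusion step, and teacher logging is a small addition on top. The sketch below shows one possible way to combine them; it is not part of the published demo. <code>fuse_with_options</code>, its <code>priority</code> and <code>threshold</code> parameters, the <code>0.7</code> default, and the log helpers are hypothetical names, and the code assumes the gesture classifier reports a confidence between 0 and 1.</p>
<pre><code class="lang-python">import csv
import io
from datetime import datetime, timezone

# Trimmed copy of the demo's dictionary so this sketch runs on its own.
MAKATON_DICT = {"hello_sign": "Hello", "help_sign": "Help", "stop_sign": "Stop"}


def fuse_with_options(gesture_label, gesture_conf, speech_text,
                      priority="gesture", threshold=0.7):
    """Choose an output and flag uncertain gestures so the UI can show both inputs."""
    gesture_text = MAKATON_DICT.get(gesture_label) if gesture_label else None
    confident = gesture_text is not None and gesture_conf >= threshold

    if priority == "speech" and speech_text:
        output = speech_text              # speech-first mode
    elif confident:
        output = gesture_text             # gesture-first (default) mode
    elif speech_text:
        output = speech_text              # low-confidence or missing gesture: use speech
    else:
        output = gesture_text or "No input detected"

    return {
        "output": output,
        "gesture": gesture_text,
        "speech": speech_text,
        "uncertain": gesture_text is not None and not confident,
    }


def log_result(log, result):
    """Append one timestamped row to an in-memory log (a plain list of dicts)."""
    log.append({
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "output": result["output"],
        "uncertain": result["uncertain"],
    })


def log_to_csv(log):
    """Serialize the log to CSV text, ready to offer as a download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["time", "output", "uncertain"])
    writer.writeheader()
    writer.writerows(log)
    return buf.getvalue()
</code></pre>
<p>In the Streamlit app, the log list could live in <code>st.session_state</code> and be exported with <code>st.download_button("Download log", log_to_csv(log), file_name="makaton_log.csv", mime="text/csv")</code>. When <code>result["uncertain"]</code> is true, the UI could display both <code>result["gesture"]</code> and <code>result["speech"]</code> instead of a single answer.</p>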
<p><strong>Troubleshooting Tips</strong></p>
<ul>
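<li><p>If you encounter microphone errors, make sure PyAudio is installed (<code>pip install pyaudio</code>), since SpeechRecognition's <code>Microphone</code> class depends on it. On macOS, you may need to install PortAudio first (for example, <code>brew install portaudio</code>). On Windows, if no prebuilt wheel is available for your Python version, use <code>pip install pipwin</code> followed by <code>pipwin install pyaudio</code>.</p>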
</li>
<li><p>If the webcam is not detected, check your browser permissions. The Simulated Sign tab can still be used without a webcam.</p>
</li>
<li><p>If there are issues with package imports, verify that they are installed in your active virtual environment.</p>
</li>
</ul>
<p>The link to the full code: <a target="_blank" href="https://github.com/tayo4christ/makaton-multimodal-demo/tree/main/makaton_multimodal_demo">Multimodal_Makaton</a></p>
<h2 id="heading-challenges-and-ethical-considerations">Challenges and Ethical Considerations</h2>
<p>While the promise of multimodal accessibility tools is exciting, building them responsibly requires us to confront several challenges. These are not only technical problems but also ethical ones that affect how learners, teachers, and communities experience AI.</p>
<h3 id="heading-data-scarcity">Data Scarcity</h3>
<p>Training AI systems requires large, diverse datasets. But when it comes to disordered speech or symbol systems like Makaton, the data is limited. Without enough examples, models risk being inaccurate or biased toward a narrow group of users. Collecting more data is essential, but it must be done ethically, with consent and respect for the communities involved.</p>
<h3 id="heading-fairness-and-inclusion">Fairness and Inclusion</h3>
<p>AI systems often work better for some groups than others. A model trained mostly on fluent English speakers may fail to recognise learners with strong accents or speech difficulties. Similarly, gesture recognition may not account for differences in motor ability. Fairness means designing models that work across abilities, accents, and cultures, so that no group is excluded by design.</p>
<h3 id="heading-privacy-and-security">Privacy and Security</h3>
<p>Speech and video data are highly sensitive, especially when collected in schools. Protecting this data is not optional; it is a requirement. Systems must anonymize or encrypt recordings and store them securely. Transparency is also key: learners, parents, and teachers should know exactly how data is being used and who has access to it.</p>
<h3 id="heading-accessibility-of-the-tools-themselves">Accessibility of the Tools Themselves</h3>
<p>Ironically, many “accessibility tools” remain inaccessible because they are expensive, require powerful hardware, or are too complex to use. For AI to truly reduce barriers, solutions must be affordable, lightweight, and easy for teachers to set up in real classrooms, not just in research labs.</p>
<h3 id="heading-takeaway">Takeaway</h3>
<p>These challenges remind us that accessibility in AI is not only a technical question but also an ethical and social responsibility. To build tools that genuinely help learners, we need collaboration between developers, educators, policymakers, and the communities who will use the systems.</p>
<h2 id="heading-where-were-heading-next">Where We’re Heading Next</h2>
<p>Predicting the future of AI accessibility tools is speculative, but the possibilities are both exciting and needed. What we have now are prototypes and early systems. What lies ahead are tools that could reshape how classrooms, and society more broadly, approach communication and inclusion.</p>
<h3 id="heading-multilingual-makaton-translation">Multilingual Makaton Translation</h3>
<p>One promising direction is the ability to translate Makaton across multiple languages. A learner in the UK could sign in Makaton and see their contribution appear not just in English but in French, Spanish, or Yoruba. This would open up international classrooms and give learners access to global opportunities that are often closed off by language barriers.</p>
<h3 id="heading-ai-tutors-with-dynamic-adaptation">AI Tutors with Dynamic Adaptation</h3>
<p>Imagine a classroom assistant powered by AI that adapts in real time. If a learner struggles with speech, it could switch to gesture recognition. If gestures become tiring, it could prompt the learner with symbol-based options. These AI tutors would not only support communication but also guide learning, adapting to each student’s strengths and challenges over time.</p>
<h3 id="heading-wearable-multimodal-devices">Wearable Multimodal Devices</h3>
<p>The rise of lightweight hardware makes it possible to imagine wearable AI assistants that provide instant translation and support. Glasses could capture gestures and overlay text, while earbuds could translate disordered speech into clear audio for peers and teachers. Instead of bulky setups, accessibility would become portable, personal, and ever-present.</p>
<h3 id="heading-a-broader-impact">A Broader Impact</h3>
<p>These innovations go beyond technology alone. They align with the United Nations Sustainable Development Goals (SDGs), especially:</p>
<ul>
<li><p><strong>Quality Education (Goal 4):</strong> ensuring that every learner, regardless of ability, has equal access to education.</p>
</li>
<li><p><strong>Reduced Inequalities (Goal 10):</strong> breaking down barriers so that disability or difference is not a cause of exclusion.</p>
</li>
</ul>
<p>The journey from single-modality tools to multimodal, adaptive systems is still in its early stages. But if we continue to push forward with creativity, ethics, and inclusivity at the center, AI accessibility tools will not only change classrooms; they will change lives.</p>
<h2 id="heading-conclusion-building-an-inclusive-future-with-ai">Conclusion: Building an Inclusive Future with AI</h2>
<p>AI accessibility tools are no longer just optional add-ons for a few learners. They are becoming core enablers of inclusion in education, healthcare, workplaces, and daily life.</p>
<p>The journey from early gesture recognition systems to multimodal, adaptive prototypes like Makaton translation and AURA shows what is possible when technology is designed around people rather than forcing people to adapt to technology. These innovations break down communication barriers and open up new opportunities for learners who have too often been left on the margins.</p>
<p>But the future of accessibility is not automatic. It depends on choices we make now as developers, educators, researchers, and policymakers. Building tools that are open, ethical, and affordable requires collaboration and commitment.</p>
<p>The vision is clear: a world where every learner, regardless of ability, can express themselves fully, be understood by others, and participate with confidence.</p>
<p><strong>The future of education is inclusive, and with thoughtful design, AI can help us get there.</strong></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Design Accessible Browser Extensions ]]>
                </title>
                <description>
                    <![CDATA[ Building a browser extension is easy, but ensuring that it’s accessible to everyone takes deliberate care and skill. Your extension might fetch data flawlessly and have a beautiful interface, but if screen reader users or keyboard navigators can’t us... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-design-accessible-browser-extensions/</link>
                <guid isPermaLink="false">68c169e7d950044818727fd6</guid>
                
                    <category>
                        <![CDATA[ browser ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Browser Extension ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Browsers ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ophy Boamah ]]>
                </dc:creator>
                <pubDate>Wed, 10 Sep 2025 12:07:03 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757460414092/f3a9f3ec-f520-4627-b839-a28f15574ba6.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Building a browser extension is easy, but ensuring that it’s accessible to everyone takes deliberate care and skill.</p>
<p>Your extension might fetch data flawlessly and have a beautiful interface, but if screen reader users or keyboard navigators can’t use it, you’ve unintentionally excluded many potential users.</p>
<p>In this article, we will audit a Chrome browser extension for accessibility issues and transform it into an inclusive experience that works for everyone.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-why-accessibility-matters-in-browser-extensions">Why Accessibility Matters in Browser Extensions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-perform-manual-browser-extension-accessibility-tests">How to Perform Manual Browser Extension Accessibility Tests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-implement-browser-extension-accessibility-improvements">How to Implement Browser Extension Accessibility Improvements</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-perform-automated-browser-extension-accessibility-tests">How to Perform Automated Browser Extension Accessibility Tests</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-best-practices-for-accessible-browser-extensions">Best Practices for Accessible Browser Extensions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-why-accessibility-matters-in-browser-extensions">Why Accessibility Matters in Browser Extensions</h2>
<p>Every click in your browser extension is an opportunity to empower users or exclude them if accessibility isn’t part of your design.</p>
<p>Browser extensions face unique accessibility challenges: they inject functionality into existing web pages while also maintaining their own accessible interfaces, a dual responsibility that can introduce barriers. For example, a popup that traps keyboard users or fails to communicate with screen readers can render an extension unusable.</p>
<p>According to the World Health Organization, over one billion people live with a disability. Accessible design reaches that vast user base and creates better experiences for everyone.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757242166628/da2f87e2-5903-4bae-a2f4-071b2a339c69.png" alt="An infographic showing browser extension common accessibility barriers" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>For browser extensions, accessibility barriers commonly emerge as:</p>
<ul>
<li><p><strong>Keyboard navigation dead-ends</strong>: Popups and interfaces that trap or exclude keyboard users.</p>
</li>
<li><p><strong>Silent interactions</strong>: Missing labels and descriptions, like a button with only an icon announced as “unlabelled button” by screen readers, leaving users guessing about its purpose.</p>
</li>
<li><p><strong>Unannounced dynamic content updates</strong>: Content changes that assistive technology never learns about, such as a quote updating without notifying screen readers, or loading states and errors that give no feedback.</p>
</li>
<li><p><strong>Context integration conflicts</strong>: Extensions that modify existing web pages can inadvertently break the page's accessibility features or introduce elements that clash with established navigation patterns.</p>
</li>
</ul>
<p>By understanding these barriers, developers can take targeted steps to test and improve their extensions’ accessibility.</p>
<h2 id="heading-how-to-perform-manual-browser-extension-accessibility-tests">How to Perform Manual Browser Extension Accessibility Tests</h2>
<p>While automated tools catch obvious issues, manual testing reveals the real user experience. Here's how to systematically evaluate your extension's accessibility.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">You can use any unpublished browser extension to follow along. For this test, we’ll be using the <a target="_self" href="https://www.freecodecamp.org/news/how-to-build-an-advice-generator-chrome-extension-with-manifest-v3/">browser extension built in this article</a>, which uses <a target="_self" href="https://www.frontendmentor.io/challenges/advice-generator-app-QdUG-13db?via=ophyboamah">this Advice generator app design</a>.</div>
</div>

<h3 id="heading-keyboard-navigation-test">Keyboard Navigation Test</h3>
<p>Disconnect your mouse and try to use your extension completely with the keyboard only. Navigate using <code>Tab</code> to move between elements, <code>Enter</code> or <code>Space</code> to activate buttons, and arrow keys within components. </p>
<ul>
<li><p>Is it always clear which element has focus?</p>
</li>
<li><p>Can you activate buttons with <code>Enter</code> or <code>Space</code> as expected?</p>
</li>
<li><p>Can users exit modal dialogs or dropdown menus (for example, with <code>Escape</code>)?</p>
</li>
</ul>
<p>If you encounter any dead-ends or confusion points, keyboard users will face the same barriers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757242828152/b1555a79-a810-4d02-a995-6bf101ca2564.png" alt="A screenshot of an advice interface with a focused button" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h3 id="heading-screen-reader-evaluation">Screen Reader Evaluation</h3>
<p>Use your operating system's built-in screen reader to navigate your extension and listen to what is announced. On macOS, enable VoiceOver; on Windows, use Narrator; on Linux, try Orca. </p>
<ul>
<li><p>Does each element’s purpose come through clearly, such as a button announced as “Generate new advice” rather than just “button”?</p>
</li>
<li><p>Are headings, lists, and other structures properly conveyed?</p>
</li>
<li><p>Do users understand when content is loading, selected, or has changed?</p>
</li>
</ul>
<p>This testing phase often reveals the gap between what you intended to communicate and what actually reaches users.</p>
<h3 id="heading-visual-accessibility-review">Visual Accessibility Review</h3>
<p>Examine your extension in different visual contexts. Use a tool such as WebAIM’s Contrast Checker to verify that text meets WCAG’s minimum contrast ratio of 4.5:1 (3:1 for large text), and test how your extension appears with your operating system's high-contrast settings turned on. Ensure that:</p>
<ul>
<li><p>Functionality remains usable at 200% zoom.</p>
</li>
<li><p>Information isn’t conveyed through colour alone, such as using text labels alongside colour-coded indicators.</p>
</li>
</ul>
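<p>If you want to check many colour pairs from your extension’s stylesheet at once, the ratio itself is straightforward to compute. The following is a minimal Python sketch of the relative-luminance and contrast-ratio formulas from the WCAG 2.x definitions, assuming colours are written as six-digit hex codes. It ignores transparency and does not know whether your text counts as large, so treat it as a quick supplement to a dedicated checker, not a replacement.</p>
<pre><code class="lang-python">def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB colour written as '#rrggbb'."""
    hex_color = hex_color.lstrip("#")
    linear = []
    for i in (0, 2, 4):
        c = int(hex_color[i:i + 2], 16) / 255
        # Undo the sRGB gamma curve to get a linear-light channel value
        if c > 0.03928:
            c = ((c + 0.055) / 1.055) ** 2.4
        else:
            c = c / 12.92
        linear.append(c)
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), L1 being the lighter colour."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for fg in ("#767676", "#777777"):
        ratio = contrast_ratio(fg, "#ffffff")
        verdict = "passes" if ratio >= 4.5 else "fails"
        print(f"{fg} on #ffffff: {ratio:.2f}:1 ({verdict} the 4.5:1 minimum)")
</code></pre>
<p>Run as a script, it reports about 4.54:1 for <code>#767676</code> on white (a pass) and about 4.48:1 for <code>#777777</code>, which falls just short. Greys that look almost identical can land on opposite sides of the threshold, which is why measuring beats eyeballing.</p>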
<p>These manual tests will uncover critical accessibility issues, paving the way for targeted improvements to make your extension inclusive.</p>
<h2 id="heading-how-to-implement-browser-extension-accessibility-improvements">How to Implement Browser Extension Accessibility Improvements</h2>
<p>Imagine refreshing a page without knowing it happened, or clicking a button with no clear purpose. The manual tests above revealed that this is exactly what screen reader users of our extension experience, through three key accessibility issues:</p>
<ul>
<li><p><strong>Missing button label</strong>: The dice button contains only an image with the alt text “Dice icon,” which lacks the context screen readers need.</p>
</li>
<li><p><strong>Silent dynamic updates</strong>: When new advice loads, screen readers don't know the content has changed.</p>
</li>
<li><p><strong>No loading states</strong>: When fetching advice, users receive no feedback that something is happening.</p>
</li>
</ul>
<p>Let's address the issues before conducting automated tests.</p>
<h3 id="heading-how-to-address-missing-button-label-and-alt-text">How to Address Missing Button Label and Alt Text</h3>
<p>We’ll add an <code>aria-label</code> that clearly explains the button's purpose and mark the dice image as decorative with an empty <code>alt</code> attribute (<code>alt=""</code>, reinforced here by <code>role="presentation"</code>), so screen readers announce the button by its label alone. Because <code>aria-label</code> overrides the button’s content when its accessible name is computed, descriptive alt text on the icon would never be read anyway, and ARIA in HTML does not allow <code>role="presentation"</code> on an image that has non-empty alt text.</p>
<pre><code class="lang-xml"><span class="hljs-comment">&lt;!--Before: Unclear Button Purpose and icon alt text--&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"dice-button"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"generate-advice-btn"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"/icons/icon-dice.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">"Dice icon"</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>

<span class="hljs-comment">&lt;!--After: Clear, Accessible Button and Decorative Icon--&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">button</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"dice-button"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"generate-advice-btn"</span> <span class="hljs-attr">aria-label</span>=<span class="hljs-string">"Generate new advice"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"/icons/icon-dice.png"</span> <span class="hljs-attr">alt</span>=<span class="hljs-string">""</span> <span class="hljs-attr">role</span>=<span class="hljs-string">"presentation"</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
</code></pre>
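<p>This kind of problem is easy to catch automatically. Below is a hypothetical Python helper (it is not part of the extension or of any tool mentioned in this article) that approximates the accessible name of every button in a popup’s HTML using the standard library’s <code>html.parser</code>. It deliberately simplifies the W3C accessible-name computation: <code>aria-label</code> wins; otherwise, the button’s text plus any image <code>alt</code> text is used.</p>
<pre><code class="lang-python">import sys
from html.parser import HTMLParser


class ButtonNameCollector(HTMLParser):
    """Collects a rough accessible name for every button in a page.

    Simplified on purpose: aria-label wins, otherwise the button's text plus
    the alt text of any img inside it. It ignores aria-labelledby, CSS and
    the rest of the full W3C accessible-name algorithm.
    """

    def __init__(self):
        super().__init__()
        self.results = []    # (button id, approximate accessible name)
        self._button = None  # state for the button currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "button":
            self._button = {"id": attrs.get("id") or "(no id)",
                            "label": attrs.get("aria-label"),
                            "parts": []}
        elif tag == "img" and self._button is not None:
            # An image's alt text contributes to the name of its button
            self._button["parts"].append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._button is not None:
            self._button["parts"].append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            name = self._button["label"]
            if name is None:
                name = " ".join(p.strip() for p in self._button["parts"] if p.strip())
            self.results.append((self._button["id"], name.strip()))
            self._button = None


def button_names(html: str):
    parser = ButtonNameCollector()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for button_id, name in button_names(f.read()):
            print(f"{button_id}: {name!r}" if name else f"{button_id}: NO ACCESSIBLE NAME")
</code></pre>
<p>Saved as, say, <code>check_buttons.py</code> and run with <code>python check_buttons.py popup.html</code>, it reports <code>'Dice icon'</code> for the “Before” markup above: a name that exists but says nothing about what the button does. That is why a check like this complements, rather than replaces, listening with a real screen reader.</p>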
<h3 id="heading-how-to-address-silent-dynamic-updates">How to Address Silent Dynamic Updates</h3>
<p>We’ll add <code>aria-live="polite"</code> so screen readers announce new advice without interrupting what they are currently reading, and <code>aria-atomic="true"</code> so the entire quote is read rather than only the part that changed:</p>
<pre><code class="lang-xml"><span class="hljs-comment">&lt;!--Before: Silent Dynamic Updates--&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"advice-quote"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"advice-quote"</span>&gt;</span>
    "It is easy to sit up and take notice, what's difficult is getting up and taking action."
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>

<span class="hljs-comment">&lt;!--After: Announced Content Changes--&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"advice-quote"</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"advice-quote"</span> <span class="hljs-attr">aria-live</span>=<span class="hljs-string">"polite"</span> <span class="hljs-attr">aria-atomic</span>=<span class="hljs-string">"true"</span>&gt;</span>
    "It is easy to sit up and take notice, what's difficult is getting up and taking action."
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
</code></pre>
<h3 id="heading-how-to-address-no-loading-states">How to Address No Loading States</h3>
<p>We’ll add a <code>setLoadingState</code> function to provide loading indicators, ensuring screen reader users are notified when content is being fetched:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Before: No Loading Feedback</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">requestNewAdvice</span>(<span class="hljs-params"></span>) </span>{
  chrome.runtime.sendMessage({ <span class="hljs-attr">action</span>: <span class="hljs-string">"fetchAdvice"</span> }, <span class="hljs-function">(<span class="hljs-params">response</span>) =&gt;</span> {
    <span class="hljs-comment">// No loading indicators...</span>
  });
}

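<span class="hljs-comment">// After: Accessible Loading States</span>
<span class="hljs-comment">// Look up the elements defined in the popup markup shown earlier</span>
<span class="hljs-keyword">const</span> generateAdviceBtn = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'generate-advice-btn'</span>);
<span class="hljs-keyword">const</span> adviceQuoteElement = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'advice-quote'</span>);

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">requestNewAdvice</span>(<span class="hljs-params"></span>) </span>{
  setLoadingState(<span class="hljs-literal">true</span>);
  chrome.runtime.sendMessage({ <span class="hljs-attr">action</span>: <span class="hljs-string">"fetchAdvice"</span> }, <span class="hljs-function">(<span class="hljs-params">response</span>) =&gt;</span> {
    setLoadingState(<span class="hljs-literal">false</span>);
    <span class="hljs-keyword">if</span> (chrome.runtime.lastError) {
      <span class="hljs-comment">// Report the failure through the aria-live quote element so it is announced</span>
      adviceQuoteElement.textContent = <span class="hljs-string">"Failed to load advice. Please try again."</span>;
      <span class="hljs-keyword">return</span>;
    }
    <span class="hljs-comment">// Handle response with proper announcements...</span>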
  });
}
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">setLoadingState</span>(<span class="hljs-params">isLoading</span>) </span>{
  <span class="hljs-keyword">if</span> (isLoading) {
    <span class="hljs-comment">// Disable button and show loading text</span>
    generateAdviceBtn.disabled = <span class="hljs-literal">true</span>;
    generateAdviceBtn.setAttribute(<span class="hljs-string">'aria-label'</span>, <span class="hljs-string">'Loading new advice...'</span>);
    <span class="hljs-comment">// Show loading text in the advice quote element</span>
    adviceQuoteElement.textContent = <span class="hljs-string">"Loading new advice..."</span>;
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-comment">// Re-enable button</span>
    generateAdviceBtn.disabled = <span class="hljs-literal">false</span>;
    generateAdviceBtn.setAttribute(<span class="hljs-string">'aria-label'</span>, <span class="hljs-string">'Generate new advice'</span>);
  }
}
</code></pre>
<p>With the manual testing issues addressed, we can now move on to performing an automated test of the same extension.</p>
<h2 id="heading-how-to-perform-automated-browser-extension-accessibility-tests">How to Perform Automated Browser Extension Accessibility Tests</h2>
<p>Manual testing provides crucial insights, but automated tools can efficiently catch common issues and provide ongoing monitoring. </p>
<p>This <a target="_blank" href="https://extensiona11ychecker.vercel.app/">Extension Accessibility Checker</a> simplifies testing by analyzing browser extension interfaces, such as popups and content scripts, for WCAG compliance, addressing unique challenges like popup constraints and content injection conflicts.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757239257443/42918662-1465-4c01-8f07-ada5d9adb174.gif" alt="A GIF showing how to test an extension zip file with the Extension accessibility checker tool" class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<p>To use the Extension Accessibility Checker:</p>
<ol>
<li><p>Compress your browser extension folder into a .zip file.</p>
</li>
<li><p>Upload the .zip file to <a target="_blank" href="https://extensiona11ychecker.vercel.app/">https://extensiona11ychecker.vercel.app/</a>.</p>
</li>
<li><p>Review the generated report for specific accessibility violations and implement the suggested fixes.</p>
</li>
</ol>
<p>The GIF above shows the full workflow. Running this check whenever you change your extension’s interface helps make accessibility a routine part of your development process rather than an afterthought.</p>
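<p>If you rebuild the .zip often, step 1 can be scripted. Here is a minimal Python sketch using the standard library’s <code>shutil.make_archive</code>. The function name and the check for <code>manifest.json</code> at the archive root are our own assumptions about what a well-formed upload looks like, not documented requirements of the checker.</p>
<pre><code class="lang-python">import shutil
import sys
from pathlib import Path


def zip_extension(src_dir, out_base):
    """Zip an extension folder so that manifest.json sits at the archive root.

    out_base is the archive path without the .zip suffix; keep it outside
    src_dir so the archive does not end up zipping itself.
    """
    src = Path(src_dir)
    if not (src / "manifest.json").is_file():
        raise FileNotFoundError(f"No manifest.json found in {src}")
    # root_dir=src stores paths relative to the folder, e.g. "manifest.json"
    archive = shutil.make_archive(str(out_base), "zip", root_dir=str(src))
    return Path(archive)


if __name__ == "__main__" and len(sys.argv) > 2:
    print(f"Created {zip_extension(sys.argv[1], sys.argv[2])}")
</code></pre>
<p>For example, <code>python zip_extension.py advice-extension advice-extension-build</code> would produce <code>advice-extension-build.zip</code> next to the source folder, ready to upload.</p>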
<p>With automated testing in place, let’s explore best practices to ensure that your extension remains accessible throughout development.</p>
<h2 id="heading-best-practices-for-accessible-browser-extensions">Best Practices for Accessible Browser Extensions</h2>
<p>We've transformed our <a target="_blank" href="https://www.frontendmentor.io/challenges/advice-generator-app-QdUG-13db?via=ophyboamah">sample advice-generating browser extension</a> from a functional but inaccessible tool into an inclusive one that works for everyone. </p>
<p>Based on our improvements, here are four key principles for designing accessible browser extensions:</p>
<ol>
<li><h3 id="heading-semantic-html-and-clear-descriptive-labels">Semantic HTML and Clear, Descriptive Labels</h3>
</li>
</ol>
<p>Always start with proper HTML structure, using appropriate elements (for example, a <code>&lt;button&gt;</code> for a “Generate Advice” action and a proper heading hierarchy) before adding ARIA attributes.</p>
<p>Ensure that every interactive element has a clear purpose via <code>aria-label</code>, <code>aria-labelledby</code>, or visible text that explains its action.</p>
<ol start="2">
<li><h3 id="heading-clear-communication-at-every-step">Clear Communication at Every Step</h3>
</li>
</ol>
<p>Your extension must keep users informed as its state changes. Users need to understand:</p>
<ul>
<li><p>What’s happening (for example, “Loading new advice…” for loading states)</p>
</li>
<li><p>What went wrong (for example, “Failed to load advice” for errors)</p>
</li>
<li><p>What changed (for example, <code>aria-live</code> regions for updated content)</p>
</li>
</ul>
<ol start="3">
<li><h3 id="heading-complete-keyboard-accessibility">Complete Keyboard Accessibility</h3>
</li>
</ol>
<p>All functionality must be available through keyboard navigation. This requires testing with <code>Tab</code>, <code>Enter</code>, <code>Space</code>, and arrow keys as appropriate.</p>
<p>Provide clearly visible focus indicators, make focus move through your interface in a predictable order, and offer obvious ways to exit modals or complex interactions.</p>
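<p>One common source of unpredictable focus order is a positive <code>tabindex</code>. The simplified Python model below is an illustration only (it ignores hidden and disabled elements and other edge cases). It follows the HTML rule that elements with a positive <code>tabindex</code> are visited first, in ascending order, before everything else in document order, while a negative <code>tabindex</code> removes an element from the Tab sequence. The popup elements in it are hypothetical.</p>
<pre><code class="lang-python">def tab_order(elements):
    """Approximate sequential focus order for a simple popup.

    elements: (name, tabindex) pairs in document order. tabindex is None for a
    natively focusable element (button, link, input) without the attribute.
    """
    # Positive tabindex values are visited first, lowest value first,
    # with ties broken by document order (the index i).
    positive = sorted((ti, i, name) for i, (name, ti) in enumerate(elements)
                      if ti is not None and ti > 0)
    # Then tabindex="0" and natively focusable elements, in document order.
    # A negative tabindex (e.g. -1) is focusable by script but skipped by Tab.
    natural = [name for name, ti in elements if ti is None or ti == 0]
    return [name for _, _, name in positive] + natural


if __name__ == "__main__":
    popup = [("Close", None), ("Generate new advice", None), ("Settings", 1)]
    print(tab_order(popup))
</code></pre>
<p>For this hypothetical popup, it prints <code>['Settings', 'Close', 'Generate new advice']</code>: a single <code>tabindex="1"</code> pulls the settings control ahead of everything else, even though it appears last in the markup. Relying on document order, with native controls or <code>tabindex="0"</code>, avoids this surprise.</p>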
<ol start="4">
<li><h3 id="heading-user-preferences-and-content-script-considerations">User Preferences and Content Script Considerations</h3>
</li>
</ol>
<p>Respect user choices by supporting system font size settings and not overriding user-defined colour schemes unnecessarily.</p>
<p>When your extension modifies existing web pages, make sure you don't break the page's established accessibility features, focus management and navigation patterns. Ensure any new elements you inject follow accessibility standards.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>As we’ve seen with our <a target="_blank" href="https://www.frontendmentor.io/challenges/advice-generator-app-QdUG-13db?via=ophyboamah">advice-generating extension</a>, addressing accessibility issues transforms a functional tool into an inclusive one.</p>
<p>However, while fixing issues in existing extensions is helpful, the most effective approach is letting accessibility guide your design and development decisions from the first line of code.</p>
<p>When starting your next browser extension project, ask:</p>
<ul>
<li><p>How would someone navigate this using only a keyboard?</p>
</li>
<li><p>Is the purpose of every interactive element immediately clear to screen readers?</p>
</li>
<li><p>How will users understand what's happening during loading states?</p>
</li>
</ul>
<p>Here are some helpful resources:</p>
<ul>
<li><p><a target="_blank" href="https://developer.chrome.com/docs/extensions/mv3/a11y/">Chrome Extension Accessibility Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://extensiona11ychecker.vercel.app/">Extension Accessibility Checker</a></p>
</li>
<li><p><a target="_blank" href="https://www.w3.org/WAI/WCAG21/quickref/">Web Content Accessibility Guidelines (WCAG) 2.1</a></p>
</li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build AI Speech-to-Text and Text-to-Speech Accessibility Tools with Python ]]>
                </title>
                <description>
                    <![CDATA[ Classrooms today are more diverse than ever before. Among the students are neurodiverse learners with different learning needs. While these learners bring unique strengths, traditional teaching methods don’t always meet their needs. This is where AI-... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-ai-accessibility-tools-with-python/</link>
                <guid isPermaLink="false">68b5f910f596271023ce3698</guid>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ AI ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ OMOTAYO OMOYEMI ]]>
                </dc:creator>
                <pubDate>Mon, 01 Sep 2025 19:50:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1756755907758/3568b7ab-f659-45c9-8c1a-e877d1a0a166.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Classrooms today are more diverse than ever before. Among the students are neurodiverse learners with different learning needs. While these learners bring unique strengths, traditional teaching methods don’t always meet their needs.</p>
<p>This is where AI-driven accessibility tools can make a difference. From real-time captioning to adaptive reading support, artificial intelligence is transforming classrooms into more inclusive spaces.</p>
<p>In this article, you’ll:</p>
<ul>
<li><p>Understand what inclusive education means in practice.</p>
</li>
<li><p>See how AI can support neurodiverse learners.</p>
</li>
<li><p>Try two hands-on Python demos:</p>
<ul>
<li><p><strong>Speech-to-Text</strong> using local Whisper (free, no API key).</p>
</li>
<li><p><strong>Text-to-Speech</strong> using Hugging Face SpeechT5.</p>
</li>
</ul>
</li>
<li><p>Get a ready-to-use project structure, requirements, and troubleshooting tips for Windows and macOS/Linux users.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-a-note-on-missing-files">A Note on Missing Files</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-inclusive-education-really-means">What Inclusive Education Really Means</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-toolbox-five-ai-accessibility-tools-teachers-can-try-today">Toolbox: Five AI Accessibility Tools Teachers Can Try Today</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-platform-notes-windows-vs-macoslinux">Platform Notes (Windows vs macOS/Linux)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-hands-on-build-a-simple-accessibility-toolkit-python">Hands-On: Build a Simple Accessibility Toolkit (Python)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-quick-setup-cheatsheet">Quick Setup Cheatsheet</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-from-code-to-classroom-impact">From Code to Classroom Impact</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-developer-challenge-build-for-inclusion">Developer Challenge: Build for Inclusion</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-challenges-and-considerations">Challenges and Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-looking-ahead">Looking Ahead</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you start, make sure you have the following:</p>
<ul>
<li><p><strong>Python 3.8</strong> or later installed. Windows users who don’t have it can download the latest version from <a target="_blank" href="http://python.org">python.org</a>; macOS users usually already have <code>python3</code>.</p>
</li>
<li><p><strong>Virtual environment</strong> (<code>venv</code>) set up. This is recommended to keep the project’s dependencies isolated.</p>
</li>
<li><p><a target="_blank" href="https://www.gyan.dev/ffmpeg/builds/#"><strong>FFmpeg</strong></a> installed. Whisper requires it to read audio files.</p>
</li>
<li><p><strong>PowerShell</strong> (Windows) or <strong>Terminal</strong> (macOS/Linux).</p>
</li>
<li><p><strong>Basic familiarity</strong> with running Python scripts.</p>
</li>
</ul>
<p><strong>Tip</strong>: If you’re new to Python environments, don’t worry. The setup commands are included with each step below.</p>
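<p>To confirm the Python version and virtual environment prerequisites from inside Python, you could use a small script like the following (a hypothetical <code>check_env.py</code>, not part of the repository). It relies on the fact that inside a virtual environment, <code>sys.prefix</code> differs from <code>sys.base_prefix</code>.</p>
<pre><code class="lang-python">import sys


def check_python(min_version=(3, 8)):
    """Return a dict describing whether this interpreter suits the toolkit."""
    return {
        # Tuple comparison: (3, 12, 1, 'final', 0) >= (3, 8) is True
        "python_ok": sys.version_info >= min_version,
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # still points at the interpreter the venv was created from.
        "in_venv": sys.prefix != sys.base_prefix,
        "version": ".".join(map(str, sys.version_info[:3])),
    }


if __name__ == "__main__":
    for key, value in check_python().items():
        print(f"{key}: {value}")
</code></pre>
<p>Run it with the same interpreter you plan to use for the demos. If <code>in_venv</code> is <code>False</code>, activate your virtual environment first.</p>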
<h2 id="heading-a-note-on-missing-files">A Note on Missing Files</h2>
<p>Some files are not included in the <a target="_blank" href="https://github.com/tayo4christ/inclusive-ai-toolkit">GitHub repository</a>. This is intentional, since they are either generated automatically or should be created or installed locally:</p>
<ul>
<li><p><code>.venv/</code> → Your virtual environment folder. Each reader should create their own locally with:</p>
<pre><code class="lang-bash">python -m venv .venv
</code></pre>
</li>
<li><p><strong>FFmpeg Installation</strong>:</p>
<ul>
<li><p><strong>Windows</strong>: FFmpeg is not included in the project files because it is large (approximately 90 MB). Download an FFmpeg build separately, as described in the Platform Notes section.</p>
</li>
<li><p><strong>macOS</strong>: Install FFmpeg with the Homebrew package manager: <code>brew install ffmpeg</code>.</p>
</li>
<li><p><strong>Linux</strong>: Install FFmpeg with your distribution's package manager, for example <code>sudo apt install ffmpeg</code> on Debian/Ubuntu.</p>
</li>
</ul>
</li>
<li><p><strong>Output File</strong>: <code>output.wav</code> is generated when you run the Text-to-Speech script. It is not included in the GitHub repository; it is created locally on your machine when you execute the script.</p>
</li>
</ul>
<p>To keep the repo clean, these are excluded using <code>.gitignore</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Ignore virtual environments</span>
.venv/
env/
venv/

<span class="hljs-comment"># Ignore binary files</span>
ffmpeg.exe
*.dll
*.lib

<span class="hljs-comment"># Ignore generated audio (but keep sample input)</span>
*.wav
*.mp3
!lesson_recording.mp3
</code></pre>
<p>The repository does include all essential files needed to follow along:</p>
<ul>
<li><p><code>requirements.txt</code> (see below)</p>
</li>
<li><p><code>transcribe.py</code> and <code>tts.py</code> (covered step-by-step in the Hands-On section).</p>
</li>
</ul>
<p><code>requirements.txt</code></p>
<pre><code class="lang-python">openai-whisper
transformers
torch
soundfile
sentencepiece
numpy
</code></pre>
<p>This way, you’ll have everything you need to reproduce the project.</p>
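<p>After running <code>pip install -r requirements.txt</code>, you can check that each package is importable. In this sketch, the mapping from pip names to import names is written by hand; the only one that differs is <code>openai-whisper</code>, which is imported as <code>whisper</code>. <code>importlib.util.find_spec</code> locates a module without actually importing it, so the check stays quick even for large packages like <code>torch</code>.</p>
<pre><code class="lang-python">from importlib.util import find_spec

# pip package name -> module name used in `import` (they differ for Whisper)
REQUIRED_MODULES = {
    "openai-whisper": "whisper",
    "transformers": "transformers",
    "torch": "torch",
    "soundfile": "soundfile",
    "sentencepiece": "sentencepiece",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose modules cannot be found, without importing them."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing), "(run: pip install -r requirements.txt)")
    else:
        print("All requirements are installed.")
</code></pre>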
<h2 id="heading-what-inclusive-education-really-means">What Inclusive Education Really Means</h2>
<p>Inclusive education goes beyond placing students with diverse needs in the same classroom. It’s about designing learning environments where every student can thrive.</p>
<p>Common barriers include:</p>
<ul>
<li><p>Reading difficulties (for example, dyslexia).</p>
</li>
<li><p>Communication challenges (speech/hearing impairments).</p>
</li>
<li><p>Sensory overload or attention struggles (autism, ADHD).</p>
</li>
<li><p>Note-taking and comprehension difficulties.</p>
</li>
</ul>
<p>AI can help reduce these barriers with captioning, reading aloud, adaptive pacing, and alternative communication tools.</p>
<h2 id="heading-toolbox-five-ai-accessibility-tools-teachers-can-try-today">Toolbox: Five AI Accessibility Tools Teachers Can Try Today</h2>
<ol>
<li><p><a target="_blank" href="https://support.microsoft.com/en-gb/office/use-immersive-reader-in-word-a857949f-c91e-4c97-977c-a4efcaf9b3c1"><strong>Microsoft Immersive Reader</strong></a> – Text-to-speech, reading guides, and translation.</p>
</li>
<li><p><a target="_blank" href="https://cloud.google.com/speech-to-text"><strong>Google Live Transcribe</strong></a> – Real-time captions for speech/hearing support.</p>
</li>
<li><p><a target="_blank" href="http://Otter.ai"><strong>Otter.ai</strong></a> – Automatic note-taking and summarization.</p>
</li>
<li><p><a target="_blank" href="https://www.grammarly.com/"><strong>Grammarly</strong></a> <strong>/</strong> <a target="_blank" href="https://quillbot.com/login?returnUrl=%2F&amp;triggerOneTap=true"><strong>Quillbot</strong></a> – Writing assistance for readability and clarity.</p>
</li>
<li><p><a target="_blank" href="https://blogs.microsoft.com/accessibility/seeing-ai/"><strong>Seeing AI (Microsoft)</strong></a> – Describes text and scenes for visually impaired learners.</p>
</li>
</ol>
<h3 id="heading-real-world-examples">Real-World Examples</h3>
<p>A student with dyslexia can use Immersive Reader to listen to a textbook while following along visually. Another student with hearing loss can use Live Transcribe to follow class discussions. These are small technology shifts that create big inclusion wins.</p>
<h2 id="heading-platform-notes-windows-vs-macoslinux">Platform Notes (Windows vs macOS/Linux)</h2>
<p>Most code works the same across systems, but setup commands differ slightly:</p>
<p><strong>Creating a virtual environment</strong></p>
<p>On Windows, you can create and activate a virtual environment in PowerShell with these steps. The example uses the <code>py</code> launcher with Python 3.12; substitute any installed version from 3.8 onward.</p>
<ol>
<li><p><strong>Create a virtual environment</strong>:</p>
<pre><code class="lang-powershell"> py <span class="hljs-literal">-3</span>.<span class="hljs-number">12</span> <span class="hljs-literal">-m</span> venv .venv
</code></pre>
</li>
<li><p><strong>Activate the virtual environment</strong>:</p>
<pre><code class="lang-powershell"> .\.venv\Scripts\Activate
</code></pre>
</li>
</ol>
<p>Once activated, your PowerShell prompt should change to indicate that you are now working within the virtual environment. This setup helps manage dependencies and keep your project environment isolated.</p>
<p>On macOS (and Linux), create and activate a virtual environment from Terminal using Python 3 with these steps:</p>
<ol>
<li><p><strong>Create a virtual environment</strong>:</p>
<pre><code class="lang-bash"> python3 -m venv .venv
</code></pre>
</li>
<li><p><strong>Activate the virtual environment</strong>:</p>
<pre><code class="lang-bash"> <span class="hljs-built_in">source</span> .venv/bin/activate
</code></pre>
</li>
</ol>
<p>Once activated, your prompt should change in the same way, and any packages you install go into the virtual environment rather than your system Python.</p>
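<p>If you prefer to script this setup, for example in a bootstrap file for a classroom of machines, the standard library’s <code>venv</code> module does the same job as <code>python -m venv</code>. The sketch below is a hypothetical helper, not part of the repository; note the different interpreter locations on Windows and macOS/Linux.</p>
<pre><code class="lang-python">import os
import venv
from pathlib import Path


def create_project_venv(env_dir=".venv", with_pip=True):
    """Create a virtual environment, like `python -m venv .venv`, and
    return the path of the Python interpreter inside it."""
    venv.create(env_dir, with_pip=with_pip)
    # Windows keeps the executables in Scripts\, macOS/Linux in bin/
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
</code></pre>
<p>Calling <code>create_project_venv()</code> creates <code>.venv</code> in the current folder with pip available. You still activate it in your shell as shown above, or run the returned interpreter directly.</p>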
<p><strong>To install FFmpeg on Windows, follow these steps:</strong></p>
<ol>
<li><p><strong>Download FFmpeg Build</strong>: Visit the <a target="_blank" href="https://www.gyan.dev/ffmpeg/builds/#">gyan.dev FFmpeg builds page</a> (one of the Windows build providers linked from the official FFmpeg website) to download the latest FFmpeg build for Windows.</p>
</li>
<li><p><strong>Unzip the Downloaded File</strong>: Once downloaded, unzip the file to extract its contents. Among the extracted files, you will find <code>ffmpeg.exe</code> inside the <code>bin</code> folder.</p>
</li>
<li><p><strong>Copy</strong> <code>ffmpeg.exe</code>: You have two options for using <code>ffmpeg.exe</code>:</p>
<ul>
<li><p><strong>Project Folder</strong>: Copy <code>ffmpeg.exe</code> directly into your project folder. This way, your project can access FFmpeg without modifying system settings, as long as you run your scripts from that folder.</p>
</li>
<li><p><strong>Add to PATH</strong>: Alternatively, you can add the directory containing <code>ffmpeg.exe</code> to your system's PATH environment variable. This allows you to use FFmpeg from any command prompt window without specifying its location.</p>
</li>
</ul>
</li>
</ol>
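<p>Whisper starts FFmpeg as a separate program, so at least one of the two options above has to work. The sketch below checks both locations in order: the project folder first, then the PATH. It is a hypothetical helper (<code>find_ffmpeg</code>), not part of Whisper or FFmpeg, and it assumes you run your scripts from the project folder.</p>

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_dir: Path) -> Optional[str]:
    """Return the path to FFmpeg, checking the project folder first, then PATH."""
    # Option 1: ffmpeg.exe (Windows) or ffmpeg copied into the project folder
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = project_dir / name
        if candidate.is_file():
            return str(candidate)
    # Option 2: a directory listed in the PATH environment variable
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    location = find_ffmpeg(Path.cwd())
    print(location or "FFmpeg not found: copy ffmpeg.exe here or add it to PATH.")
```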
<p>Additionally, the full project folder, including all necessary files and instructions, is available for <a target="_blank" href="https://github.com/tayo4christ/inclusive-ai-toolkit">download on GitHub</a>. You can also find the link to the GitHub repository at the end of the article.</p>
<p>For macOS users:</p>
<p>To install FFmpeg on macOS, you can use Homebrew, a popular package manager for macOS. Here’s how:</p>
<ol>
<li><p><strong>Open Terminal</strong>: You can find Terminal in the Utilities folder within Applications.</p>
</li>
<li><p><strong>Install Homebrew</strong> (if not already installed): Paste the following command in Terminal, press Enter, and follow the on-screen instructions: <code>/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"</code></p>
</li>
<li><p><strong>Install FFmpeg</strong>: Once Homebrew is installed, run the following command in Terminal:</p>
<pre><code class="lang-bash"> brew install ffmpeg
</code></pre>
<p> This command will download and install FFmpeg, making it available for use on your system.</p>
</li>
</ol>
<p>For Linux users (Debian/Ubuntu):</p>
<p>To install FFmpeg on Debian-based systems like Ubuntu, you can use the APT package manager. Here’s how:</p>
<ol>
<li><p><strong>Open Terminal</strong>: You can usually find Terminal in your system’s applications menu.</p>
</li>
<li><p><strong>Update Package List</strong>: Before installing new software, it’s a good idea to update your package list. Run:</p>
<pre><code class="lang-bash"> sudo apt update
</code></pre>
</li>
<li><p><strong>Install FFmpeg</strong>: After updating, install FFmpeg by running:</p>
<pre><code class="lang-bash"> sudo apt install ffmpeg
</code></pre>
<p> This command will download and install FFmpeg, allowing you to use it from the command line.</p>
</li>
</ol>
<p>These steps will ensure that FFmpeg is installed and ready to use on your macOS or Linux system.</p>
<p><strong>Running Python scripts</strong></p>
<ul>
<li><p>Windows: <code>python script.py</code> or <code>py script.py</code></p>
</li>
<li><p>macOS/Linux: <code>python3 script.py</code></p>
</li>
</ul>
<p>I will mark these differences with a <strong>macOS/Linux note</strong> in the relevant steps so you can follow along smoothly on your system.</p>
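<p>If you later launch one script from another (for example, in a small app that calls <code>transcribe.py</code>), you can avoid the <code>python</code>/<code>py</code>/<code>python3</code> difference entirely. Python exposes the interpreter that is currently running as <code>sys.executable</code>, and inside an activated <code>.venv</code> that is the environment's own interpreter. This is a minimal sketch. <code>run_with_current_python</code> is a hypothetical helper.</p>

```python
import subprocess
import sys
from typing import List


def run_with_current_python(args: List[str]) -> subprocess.CompletedProcess:
    """Run a command with the same interpreter that is running this code."""
    # sys.executable is the full path of the active interpreter, so there is
    # no need to guess between "python", "py", or "python3".
    return subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    result = run_with_current_python(["--version"])
    print(result.stdout.strip() or result.stderr.strip())
```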
<h2 id="heading-hands-on-build-a-simple-accessibility-toolkit-python">Hands-On: Build a Simple Accessibility Toolkit (Python)</h2>
<p>You’ll build two small demos:</p>
<ul>
<li><p><strong>Speech-to-Text</strong> with Whisper (local, free).</p>
</li>
<li><p><strong>Text-to-Speech</strong> with Hugging Face SpeechT5.</p>
</li>
</ul>
<h3 id="heading-1-speech-to-text-with-whisper-local-and-free">1) Speech-to-Text with Whisper (Local and free)</h3>
<p><strong>What you’ll build:</strong><br>A Python script that takes a short MP3 recording and prints the transcript to your terminal.</p>
<p><strong>Why Whisper?</strong><br>It’s a robust open-source speech-to-text (STT) model. The local version suits beginners because it needs no API key or quota. It also works offline once the model weights have been downloaded on the first run.</p>
<p><strong>How to Install Whisper (PowerShell):</strong></p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Activate your virtual environment</span>
<span class="hljs-comment"># Example: .\.venv\Scripts\Activate</span>

<span class="hljs-comment"># Install the openai-whisper package</span>
pip install openai<span class="hljs-literal">-whisper</span>

<span class="hljs-comment"># Check if FFmpeg is available</span>
ffmpeg <span class="hljs-literal">-version</span>

<span class="hljs-comment"># If FFmpeg is not available, download and install it, then add it to PATH or place ffmpeg.exe next to your script</span>
<span class="hljs-comment"># Example: Move ffmpeg.exe to the script directory or update PATH environment variable</span>
</code></pre>
<p><img src="https://github.com/tayo4christ/inclusive-ai-toolkit/blob/a285ef9fd724d5221e1d7090c0d88713d1e5accb/Images/ffmpeg-version.jpg?raw=true" alt="PowerShell confirming FFmpeg is installed" width="600" height="400" loading="lazy"></p>
<p>You should see a version string here before running Whisper.</p>
<p><strong>Note:</strong> macOS and Linux users can run the same commands in Terminal. Activate the environment with <code>source .venv/bin/activate</code> instead.</p>
<p>If FFmpeg is not installed, you can install it using the following commands:</p>
<p>For macOS:</p>
<pre><code class="lang-bash">brew install ffmpeg
</code></pre>
<p>For Ubuntu/Debian Linux:</p>
<pre><code class="lang-bash">sudo apt install ffmpeg
</code></pre>
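<p>The same <code>ffmpeg -version</code> check can run from Python, which is handy if you want a script to fail early with a clear message. The sketch below reads the first line of the output, which normally begins with <code>ffmpeg version</code> followed by the version. Both function names are hypothetical, and the parsing assumes that first-line format.</p>

```python
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_version(output: str) -> Optional[str]:
    """Extract the version token from the first line of `ffmpeg -version`."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    # Expected first line: "ffmpeg version <version> Copyright ..."
    tokens = lines[0].split()
    if len(tokens) >= 3 and tokens[0] == "ffmpeg" and tokens[1] == "version":
        return tokens[2]
    return None


def ffmpeg_version() -> Optional[str]:
    """Run `ffmpeg -version` and return the version, or None if unavailable."""
    if shutil.which("ffmpeg") is None:
        return None
    completed = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(completed.stdout or completed.stderr)


if __name__ == "__main__":
    version = ffmpeg_version()
    print(f"FFmpeg {version}" if version else "FFmpeg not found on PATH.")
```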
<h3 id="heading-create-transcribepy">Create <code>transcribe.py</code>:</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> whisper

<span class="hljs-comment"># Load the Whisper model</span>
model = whisper.load_model(<span class="hljs-string">"base"</span>)  <span class="hljs-comment"># Use "tiny" for faster speed or "small" for better accuracy</span>

<span class="hljs-comment"># Transcribe the audio file</span>
result = model.transcribe(<span class="hljs-string">"lesson_recording.mp3"</span>, fp16=<span class="hljs-literal">False</span>)

<span class="hljs-comment"># Print the transcript</span>
print(<span class="hljs-string">"Transcript:"</span>, result[<span class="hljs-string">"text"</span>])
</code></pre>
<p><strong>How the code works:</strong></p>
<ul>
<li><p><code>whisper.load_model("base")</code>: downloads the model weights on first use and caches them, so later runs load from the local cache.</p>
</li>
<li><p><code>model.transcribe(...)</code>: decodes the audio (via FFmpeg), detects the language, and runs the transcription.</p>
</li>
<li><p><code>fp16=False</code>: runs inference in full (FP32) precision. Half precision (FP16) isn’t supported on CPU, so this avoids a warning and runs cleanly there.</p>
</li>
<li><p><code>result["text"]</code>: the final transcript string.</p>
</li>
</ul>
<p>Run it:</p>
<pre><code class="lang-bash">python transcribe.py
</code></pre>
<p>Expected output:</p>
<p><img src="https://github.com/tayo4christ/inclusive-ai-toolkit/blob/a285ef9fd724d5221e1d7090c0d88713d1e5accb/Images/whisper-transcript.jpg?raw=true" alt="Whisper successfully transcribed audio to text" width="600" height="400" loading="lazy"></p>
<p>Successful Speech-to-Text: Whisper prints the recognized sentence from <code>lesson_recording.mp3</code></p>
<p>To run the <code>transcribe.py</code> script on macOS or Linux, use the following command in your Terminal:</p>
<pre><code class="lang-bash">python3 transcribe.py
</code></pre>
<p><strong>Common hiccups (and fixes):</strong></p>
<ul>
<li><p><code>FileNotFoundError</code> during transcribe → <strong>FFmpeg</strong> isn’t found. Install it and confirm with <code>ffmpeg -version</code>.</p>
</li>
<li><p>Very slow on CPU → switch to the smaller <code>tiny</code> model: <code>whisper.load_model("tiny")</code>. (<code>small</code> is larger than <code>base</code>, so it is slower, not faster.)</p>
</li>
</ul>
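<p>Besides <code>result["text"]</code>, Whisper’s result dictionary also includes a <code>"segments"</code> list. Each segment has <code>"start"</code> and <code>"end"</code> times in seconds plus its own <code>"text"</code>, which is enough to produce captions. The sketch below turns those segments into SubRip (<code>.srt</code>) caption text. <code>segments_to_srt</code> and <code>format_timestamp</code> are hypothetical helpers written for this example, and the demo segment is made up.</p>

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Turn Whisper-style segments into the text of an .srt caption file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates caption blocks with a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segment; with Whisper you would pass result["segments"]
    demo = [{"start": 0.0, "end": 2.5, "text": " Welcome to inclusive education with AI."}]
    print(segments_to_srt(demo))
```

<p>Writing the returned string to a file such as <code>lesson_recording.srt</code> gives you captions that most video players can load.</p>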
<h3 id="heading-2-text-to-speech-with-speecht5">2) Text-to-Speech with SpeechT5</h3>
<p><strong>What you’ll build:</strong><br>A Python script that converts a short string into a spoken WAV file called <code>output.wav</code>.</p>
<p><strong>Why SpeechT5?</strong><br>It’s a widely used open model that runs on a CPU, is easy to demo, and needs no API key.</p>
<p><strong>Install the required packages on Windows (PowerShell):</strong></p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Activate your virtual environment</span>
<span class="hljs-comment"># Example: .\.venv\Scripts\Activate</span>

<span class="hljs-comment"># Install the required packages</span>
pip install transformers torch soundfile sentencepiece
</code></pre>
<p><strong>Note:</strong> macOS and Linux users can run the same <code>pip</code> command in Terminal after activating the environment with <code>source .venv/bin/activate</code>.</p>
<p>Create <code>tts.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
<span class="hljs-keyword">import</span> soundfile <span class="hljs-keyword">as</span> sf
<span class="hljs-keyword">import</span> torch

<span class="hljs-comment"># Load models</span>
processor = SpeechT5Processor.from_pretrained(<span class="hljs-string">"microsoft/speecht5_tts"</span>)
model = SpeechT5ForTextToSpeech.from_pretrained(<span class="hljs-string">"microsoft/speecht5_tts"</span>)
vocoder = SpeechT5HifiGan.from_pretrained(<span class="hljs-string">"microsoft/speecht5_hifigan"</span>)

<span class="hljs-comment"># Speaker embedding (fixed random seed for a consistent synthetic voice)</span>
g = torch.Generator().manual_seed(<span class="hljs-number">42</span>)
speaker_embeddings = torch.randn((<span class="hljs-number">1</span>, <span class="hljs-number">512</span>), generator=g)

<span class="hljs-comment"># Text to synthesize</span>
text = <span class="hljs-string">"Welcome to inclusive education with AI."</span>
inputs = processor(text=text, return_tensors=<span class="hljs-string">"pt"</span>)

<span class="hljs-comment"># Generate speech</span>
<span class="hljs-keyword">with</span> torch.no_grad():
    speech = model.generate_speech(inputs[<span class="hljs-string">"input_ids"</span>], speaker_embeddings, vocoder=vocoder)

<span class="hljs-comment"># Save to WAV</span>
sf.write(<span class="hljs-string">"output.wav"</span>, speech.numpy(), samplerate=<span class="hljs-number">16000</span>)
print(<span class="hljs-string">"✅ Audio saved as output.wav"</span>)
</code></pre>
<p>Expected Output:</p>
<p><img src="https://github.com/tayo4christ/inclusive-ai-toolkit/blob/a285ef9fd724d5221e1d7090c0d88713d1e5accb/Images/tts-saved-ok.jpg?raw=true" alt="Text-to-Speech complete — Audio saved as output.wav" width="600" height="400" loading="lazy"></p>
<p>Text-to-Speech complete. SpeechT5 generated the audio and saved it as <code>output.wav</code></p>
<p><strong>How the code works:</strong></p>
<ul>
<li><p><code>SpeechT5Processor</code>: prepares your text for the model.</p>
</li>
<li><p><code>SpeechT5ForTextToSpeech</code>: generates a <em>mel-spectrogram</em> (the speech content).</p>
</li>
<li><p><code>SpeechT5HifiGan</code>: a vocoder that turns the spectrogram into a waveform you can play.</p>
</li>
<li><p><code>speaker_embeddings</code>: a 512-dimensional vector representing a “voice.” We seed it for a consistent (synthetic) voice across runs. Because it’s random rather than taken from a real speaker, the voice may sound less natural. Hugging Face’s SpeechT5 examples load speaker x-vectors from the CMU ARCTIC dataset instead.</p>
</li>
</ul>
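<p>The vocoder’s output is a plain 1-D array of amplitude values. With <code>samplerate=16000</code>, every 16,000 values make one second of sound. <code>sf.write</code> handles the file format for you and, by default, stores WAV files as 16-bit PCM. The standard-library sketch below only makes that conversion explicit. <code>write_pcm16_wav</code> is a hypothetical helper, the tone stands in for real speech, and in <code>tts.py</code> you would keep using <code>soundfile</code>.</p>

```python
import math
import wave
from typing import Sequence


def write_pcm16_wav(path: str, samples: Sequence[float], sample_rate: int = 16000) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # keep values in the valid range
        # Scale to a signed 16-bit integer, stored little-endian as WAV expects
        frames += int(clipped * 32767).to_bytes(2, "little", signed=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono, like SpeechT5's output
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


if __name__ == "__main__":
    # One second of a 440 Hz tone standing in for the vocoder's waveform
    rate = 16000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    write_pcm16_wav("tone.wav", tone, rate)
```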
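<p>Note: The fixed seed reproduces the same vector on the same setup, but PyTorch doesn’t guarantee identical random numbers across releases or platforms. To keep exactly the same voice whenever you reopen the project, save the embedding to a file once and load it afterward:</p>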
<pre><code class="lang-python">import os

import numpy as np
import torch

EMB_PATH = "speaker_emb.npy"

if os.path.exists(EMB_PATH):
    # Reuse the voice saved on an earlier run
    speaker_embeddings = torch.from_numpy(np.load(EMB_PATH))
else:
    # First run: create the seeded embedding (as in tts.py) and save it
    g = torch.Generator().manual_seed(42)
    speaker_embeddings = torch.randn((1, 512), generator=g)
    np.save(EMB_PATH, speaker_embeddings.numpy())
</code></pre>
<p>Run it:</p>
<pre><code class="lang-bash">python tts.py
</code></pre>
<p><strong>Note:</strong> On macOS/Linux, run <code>python3 tts.py</code> instead.</p>
<p><strong>Expected result:</strong></p>
<ul>
<li><p>Terminal prints: <code>✅ Audio saved as output.wav</code></p>
</li>
<li><p>A new file appears in your folder: <code>output.wav</code></p>
</li>
</ul>
<p><img src="https://github.com/tayo4christ/inclusive-ai-toolkit/blob/a285ef9fd724d5221e1d7090c0d88713d1e5accb/Images/output-wav-explorer.png.jpg?raw=true" alt="Explorer showing the generated output.wav file" width="600" height="400" loading="lazy"></p>
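<p>To confirm the file contains audio without opening a player, you can read its length with Python’s built-in <code>wave</code> module. This assumes <code>output.wav</code> was written as PCM, which is <code>soundfile</code>’s default for WAV. If you change the subtype to floating point, <code>wave</code> can’t read it. <code>wav_duration_seconds</code> is a hypothetical helper.</p>

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of frames / frames per second
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    seconds = wav_duration_seconds("output.wav")
    print(f"output.wav is {seconds:.2f} seconds long")
```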
<p><strong>Common hiccups (and fixes):</strong></p>
<ul>
<li><p><code>ImportError: sentencepiece not found</code> → <code>pip install sentencepiece</code></p>
</li>
<li><p>Torch install issues on Windows → install the CPU-only build from the PyTorch package index:</p>
</li>
</ul>
<pre><code class="lang-powershell"><span class="hljs-comment"># Activate your virtual environment</span>
<span class="hljs-comment"># Example: .\.venv\Scripts\Activate</span>

<span class="hljs-comment"># Install the torch package using the specified index URL for CPU</span>
pip install torch -<span class="hljs-literal">-index</span><span class="hljs-literal">-url</span> https://download.pytorch.org/whl/cpu
</code></pre>
<p>Note: The first run is usually slow because the model weights are downloaded and cached. That’s normal, and later runs start faster.</p>
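<p>Import errors like the ones above usually mean a package is missing from the <em>active</em> virtual environment. A quick pre-check can list what’s missing before you run <code>tts.py</code>. This sketch uses <code>importlib.util.find_spec</code>, which locates a top-level module without importing it. The helper name <code>missing_packages</code> is hypothetical, and the list assumes the import names match the pip names (true for these four packages).</p>

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages installed with pip for tts.py
REQUIRED = ["transformers", "torch", "soundfile", "sentencepiece"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the module names that can't be found in this environment."""
    # find_spec only locates the module, so even large packages such as
    # torch are not actually imported by this check.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All TTS dependencies are installed.")
```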
<h3 id="heading-3-optional-whisper-via-openai-api">3) Optional: Whisper via OpenAI API</h3>
<p><strong>What this does:</strong><br>Instead of running Whisper locally, you can call the <strong>OpenAI Whisper API</strong> (<code>whisper-1</code>). Your audio file is uploaded to OpenAI’s servers, transcribed there, and the text is returned.</p>
<p><strong>Why use the API?</strong></p>
<ul>
<li><p>No need to install or run Whisper models locally (saves disk space &amp; setup time).</p>
</li>
<li><p>Runs on OpenAI’s infrastructure (faster if your computer is slow).</p>
</li>
<li><p>Great if you’re already using OpenAI services in your classroom or app.</p>
</li>
</ul>
<p><strong>What to watch out for:</strong></p>
<ul>
<li><p>Requires an <strong>API key</strong>.</p>
</li>
<li><p>Requires <strong>billing enabled</strong> (the free trial quota is usually small).</p>
</li>
<li><p>Needs internet access (unlike the local Whisper demo).</p>
</li>
</ul>
<p><strong>How to get an API key:</strong></p>
<ol>
<li><p>Go to <a target="_blank" href="https://platform.openai.com/api-keys">OpenAI’s API Keys page</a>.</p>
</li>
<li><p>Log in with your OpenAI account (or create one).</p>
</li>
<li><p>Click <strong>“Create new secret key”</strong>.</p>
</li>
<li><p>Copy the key — it looks like <code>sk-xxxxxxxx...</code>. Treat this like a password: don’t share it publicly or push it to GitHub.</p>
</li>
</ol>
<h4 id="heading-step-1-set-your-api-key">Step 1: Set your API key</h4>
<p>In PowerShell (session only):</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Set the OpenAI API key in the environment variable</span>
<span class="hljs-variable">$env:OPENAI_API_KEY</span>=<span class="hljs-string">"your_api_key_here"</span>
</code></pre>
<p>To set the variable permanently, use the <code>setx</code> command:</p>
<pre><code class="lang-powershell">setx OPENAI_API_KEY <span class="hljs-string">"your_api_key_here"</span>
</code></pre>
<p>This stores <code>OPENAI_API_KEY</code> for your user account. Replace <code>"your_api_key_here"</code> with your actual API key. <code>setx</code> only affects <em>new</em> sessions, so open a new PowerShell window before using the key. The current window won’t see it.</p>
<p>Verify that it’s set with <code>echo</code>:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">echo</span> <span class="hljs-variable">$env:OPENAI_API_KEY</span>
</code></pre>
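<p>This prints the key if the variable is set, or an empty line if it isn’t. Because it shows the secret in plain text, avoid running it while screen sharing.</p>
<p><strong>macOS/Linux note:</strong> Set the key with <code>export OPENAI_API_KEY="your_api_key_here"</code> and check it with <code>echo $OPENAI_API_KEY</code>. To make it permanent, add the <code>export</code> line to <code>~/.zshrc</code> or <code>~/.bashrc</code>.</p>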
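<p>Because <code>echo</code> prints the full secret, you might prefer to confirm the key from Python without revealing it. The OpenAI client reads the same <code>OPENAI_API_KEY</code> environment variable. This sketch prints only a masked form (first and last four characters). <code>masked_api_key</code> is a hypothetical helper.</p>

```python
import os
from typing import Optional


def masked_api_key(env_var: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return a masked version of the key, or None if the variable is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        return None
    # Show only the first and last four characters so the secret isn't exposed
    if len(key) > 8:
        return f"{key[:4]}...{key[-4:]}"
    return "*" * len(key)


if __name__ == "__main__":
    masked = masked_api_key()
    print(f"OPENAI_API_KEY is set: {masked}" if masked else "OPENAI_API_KEY is not set")
```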
<h4 id="heading-step-2-install-the-openai-python-client">Step 2: Install the OpenAI Python client</h4>
<p>Install the client with <code>pip</code>. The same command works in PowerShell and in a macOS/Linux terminal:</p>
<pre><code class="lang-bash">pip install openai
</code></pre>
<p>Run it with your virtual environment activated so the package is installed into the project’s environment.</p>
<h4 id="heading-step-3-create-transcribeapipy">Step 3: Create <code>transcribe_api.py</code></h4>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

<span class="hljs-comment"># Initialize the OpenAI client (reads API key from environment)</span>
client = OpenAI()

<span class="hljs-comment"># Open the audio file and create a transcription</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">"lesson_recording.mp3"</span>, <span class="hljs-string">"rb"</span>) <span class="hljs-keyword">as</span> f:
    transcript = client.audio.transcriptions.create(
        model=<span class="hljs-string">"whisper-1"</span>,
        file=f
    )

<span class="hljs-comment"># Print the transcript</span>
print(<span class="hljs-string">"Transcript:"</span>, transcript.text)
</code></pre>
<h4 id="heading-step-4-run-it">Step 4: Run it</h4>
<pre><code class="lang-bash">python transcribe_api.py
</code></pre>
<p>Expected output:</p>
<p><code>Transcript: Welcome to inclusive education with AI.</code></p>
<h4 id="heading-common-hiccups-and-fixes">Common hiccups (and fixes):</h4>
<ul>
<li><p><strong>Error: insufficient_quota</strong> → You’ve run out of free credits. Add billing to continue.</p>
</li>
<li><p><strong>Slow upload</strong> → If your audio is large, compress it first (for example, MP3 instead of WAV).</p>
</li>
<li><p><strong>Key not found</strong> → Check that <code>$env:OPENAI_API_KEY</code> (PowerShell) or <code>$OPENAI_API_KEY</code> (macOS/Linux) is set in your current terminal session.</p>
</li>
</ul>
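<p>Before uploading, you can check whether the file is small enough to send at all. OpenAI’s speech-to-text guide has documented a 25 MB upload limit. Treat that figure as an assumption here and confirm the current value in the official docs. This is a minimal sketch, and <code>fits_upload_limit</code> and <code>MAX_UPLOAD_BYTES</code> are hypothetical names.</p>

```python
import os

# Assumption: a 25 MB upload limit, as documented by OpenAI at the time of
# writing; confirm the current value in the official API docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file is small enough to send in one request."""
    return limit >= os.path.getsize(path)


if __name__ == "__main__":
    path = "lesson_recording.mp3"
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if fits_upload_limit(path):
        print(f"{path}: {size_mb:.1f} MB, OK to upload")
    else:
        print(f"{path}: {size_mb:.1f} MB, compress or split it before uploading")
```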
<p><strong>Local Whisper vs API Whisper — Which Should You Use?</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Local Whisper (on your machine)</td><td>OpenAI Whisper API (cloud)</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Setup</strong></td><td>Needs Python packages + FFmpeg</td><td>Just install <code>openai</code> client + set API key</td></tr>
<tr>
<td><strong>Hardware</strong></td><td>Runs on your CPU (slower) or GPU (faster)</td><td>Runs on OpenAI’s servers (no local compute needed)</td></tr>
<tr>
<td><strong>Cost</strong></td><td>✅ Free after initial download</td><td>💳 Pay per minute of audio (after free trial quota)</td></tr>
<tr>
<td><strong>Internet required</strong></td><td>❌ No (fully offline once installed)</td><td>✅ Yes (uploads audio to OpenAI servers)</td></tr>
<tr>
<td><strong>Accuracy</strong></td><td>Very good - depends on model size (tiny → large)</td><td>Consistently strong - optimized by OpenAI</td></tr>
<tr>
<td><strong>Speed</strong></td><td>Slower on CPU, faster with GPU</td><td>Fast (uses OpenAI’s infrastructure)</td></tr>
<tr>
<td><strong>Privacy</strong></td><td>Audio never leaves your machine</td><td>Audio is sent to OpenAI (data handling per policy)</td></tr>
</tbody>
</table>
</div><p><strong>Rule of thumb:</strong></p>
<ul>
<li><p>Use <strong>Local Whisper</strong> if you want free, offline transcription or you’re working with sensitive data.</p>
</li>
<li><p>Use the <strong>Whisper API</strong> if you prefer convenience, don’t mind usage-based billing, and want speed without local setup.</p>
</li>
</ul>
<h2 id="heading-quick-setup-cheatsheet"><strong>Quick Setup Cheatsheet</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Task</td><td>Windows (PowerShell)</td><td>macOS / Linux (Terminal)</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Create venv</strong></td><td><code>py -3.12 -m venv .venv</code></td><td><code>python3 -m venv .venv</code></td></tr>
<tr>
<td><strong>Activate venv</strong></td><td><code>.\.venv\Scripts\Activate</code></td><td><code>source .venv/bin/activate</code></td></tr>
<tr>
<td><strong>Install Whisper</strong></td><td><code>pip install openai-whisper</code></td><td><code>pip install openai-whisper</code></td></tr>
<tr>
<td><strong>Install FFmpeg</strong></td><td><a target="_blank" href="https://www.gyan.dev/ffmpeg/builds/">Download build</a> → unzip → add to PATH or copy <code>ffmpeg.exe</code> into the project folder</td><td><code>brew install ffmpeg</code> (macOS)<br><code>sudo apt install ffmpeg</code> (Linux)</td></tr>
<tr>
<td><strong>Run STT script</strong></td><td><code>python transcribe.py</code></td><td><code>python3 transcribe.py</code></td></tr>
<tr>
<td><strong>Install TTS deps</strong></td><td><code>pip install transformers torch soundfile sentencepiece</code></td><td><code>pip install transformers torch soundfile sentencepiece</code></td></tr>
<tr>
<td><strong>Run TTS script</strong></td><td><code>python tts.py</code></td><td><code>python3 tts.py</code></td></tr>
<tr>
<td><strong>Install OpenAI client (API)</strong></td><td><code>pip install openai</code></td><td><code>pip install openai</code></td></tr>
<tr>
<td><strong>Run API script</strong></td><td><code>python transcribe_api.py</code></td><td><code>python3 transcribe_api.py</code></td></tr>
</tbody>
</table>
</div><p><strong>Pro tip for Apple Silicon (M1/M2) Mac users:</strong> Recent PyTorch wheels for macOS include Metal (MPS) GPU support. Check the <a target="_blank" href="https://pytorch.org/get-started/locally/">PyTorch install guide</a> for the right install command for your setup.</p>
<h2 id="heading-from-code-to-classroom-impact">From Code to Classroom Impact</h2>
<p>Whether you used <strong>local Whisper</strong> or the <strong>cloud API</strong> for transcription, together with SpeechT5 for <strong>text-to-speech</strong>, you should now have a working prototype that can:</p>
<ul>
<li><p>Convert spoken lessons into text.</p>
</li>
<li><p>Read text aloud for students who prefer auditory input.</p>
</li>
</ul>
<p>That’s the technical foundation. But the real question is: how can these building blocks empower teachers and learners in real classrooms?</p>
<h2 id="heading-developer-challenge-build-for-inclusion">Developer Challenge: Build for Inclusion</h2>
<p>Try combining the two snippets into a simple <strong>classroom companion app</strong> that:</p>
<ul>
<li><p><strong>Captions</strong> what the teacher says in real time.</p>
</li>
<li><p><strong>Reads aloud</strong> transcripts or textbook passages on demand.</p>
</li>
</ul>
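<p>One way to start is to wrap each script as a function and plug both into a small coordinating class. The sketch below assumes hypothetical wrappers with the shapes shown in the comments: a transcriber takes an audio path and returns text, and a speaker takes text and an output path. <code>ClassroomCompanion</code> is an illustrative name, not an existing library. It handles recorded clips. True real-time captioning would additionally need to record short audio chunks and feed them in one at a time.</p>

```python
from typing import Callable, List

# Hypothetical signatures for wrappers around transcribe.py and tts.py
Transcriber = Callable[[str], str]  # audio path -> transcript text
Speaker = Callable[[str, str], None]  # (text, output path) -> writes audio


class ClassroomCompanion:
    """Caption recorded audio and read text aloud using pluggable back ends."""

    def __init__(self, transcribe: Transcriber, speak: Speaker) -> None:
        self.transcribe = transcribe
        self.speak = speak
        self.captions: List[str] = []

    def caption(self, audio_path: str) -> str:
        """Transcribe one recording and keep it in the caption history."""
        text = self.transcribe(audio_path).strip()
        self.captions.append(text)
        return text

    def read_aloud(self, text: str, out_path: str = "output.wav") -> str:
        """Synthesize speech for any text (a transcript or a textbook passage)."""
        self.speak(text, out_path)
        return out_path


if __name__ == "__main__":
    # Stand-in back ends so the flow can be tried without any models installed
    def fake_stt(path: str) -> str:
        return f"(transcript of {path})"

    def fake_tts(text: str, out: str) -> None:
        print(f"Would speak {text!r} into {out}")

    app = ClassroomCompanion(fake_stt, fake_tts)
    line = app.caption("lesson_recording.mp3")
    app.read_aloud(line)
```

<p>Keeping the back ends pluggable lets you swap local Whisper for the API version (or SpeechT5 for another voice) without touching the rest of the app.</p>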
<p>Then think about how to expand it further:</p>
<ul>
<li><p>Add <strong>symbol recognition</strong> for non-verbal communication.</p>
</li>
<li><p>Add <strong>multi-language translation</strong> for diverse classrooms.</p>
</li>
<li><p>Add <strong>offline support</strong> for schools with poor connectivity.</p>
</li>
</ul>
<p>These are not futuristic ideas; they are achievable with today’s open-source AI tools.</p>
<h2 id="heading-challenges-and-considerations">Challenges and Considerations</h2>
<p>Of course, building for inclusion isn’t just about code. There are important challenges to address:</p>
<ul>
<li><p><strong>Privacy</strong>: Student data must be safeguarded, especially when recordings are involved.</p>
</li>
<li><p><strong>Cost</strong>: Solutions must be affordable and scalable for schools of all sizes.</p>
</li>
<li><p><strong>Teacher Training</strong>: Educators need support to confidently use these tools.</p>
</li>
<li><p><strong>Balance</strong>: AI should assist teachers, not replace the vital human element in learning.</p>
</li>
</ul>
<h2 id="heading-looking-ahead">Looking Ahead</h2>
<p>The future of inclusive education will likely involve multimodal AI: systems that combine speech, gestures, symbols, and even emotion recognition. We may also see brain–computer interfaces and wearable devices that enable seamless communication for learners who are currently excluded.</p>
<p>But one principle is clear: inclusion works best when teachers, developers, and neurodiverse learners co-design solutions together.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>AI isn’t here to replace teachers; it’s here to help them reach every student. By embracing AI-driven accessibility, classrooms can become spaces where neurodiverse learners aren’t left behind but are empowered to thrive.</p>
<p>📢 <strong>Your turn:</strong></p>
<ul>
<li><p><strong>Teachers</strong>: You can try one of the tools in your next lesson.</p>
</li>
<li><p><strong>Developers</strong>: You can use the code snippets above to prototype your own inclusive classroom tool.</p>
</li>
<li><p><strong>Policymakers</strong>: You can support initiatives that make accessibility central to education.</p>
</li>
</ul>
<p>Inclusive education isn’t just a dream; it’s becoming a reality. With thoughtful use of AI, it can become the new norm.</p>
<p><strong>Full source code on GitHub:</strong> <a target="_blank" href="https://github.com/tayo4christ/inclusive-ai-toolkit">Inclusive AI Toolkit</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a Real-Time Gesture-to-Text Translator Using Python and Mediapipe ]]>
                </title>
                <description>
                    <![CDATA[ Sign and symbol languages, like Makaton and American Sign Language (ASL), are powerful communication tools. However, they can create challenges when communicating with people who don't understand them. As a researcher working on AI for accessibility,... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/create-a-real-time-gesture-to-text-translator/</link>
                <guid isPermaLink="false">68a331edf6c19271552e2ac7</guid>
                
                    <category>
                        <![CDATA[ Computer Vision ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ OMOTAYO OMOYEMI ]]>
                </dc:creator>
                <pubDate>Mon, 18 Aug 2025 14:00:13 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755525484024/9f4c42e0-dbfd-4f04-9223-0a2169abd1fb.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Sign and symbol languages, like Makaton and American Sign Language (ASL), are powerful communication tools. However, they can create challenges when communicating with people who don't understand them.</p>
<p>As a researcher working on AI for accessibility, I wanted to explore how machine learning and computer vision could bridge that gap. The result was a real-time gesture-to-text translator built with Python and Mediapipe, capable of detecting hand gestures and instantly converting them to text.</p>
<p>In this tutorial, you’ll learn how to build your own version from scratch, even if you’ve never used Mediapipe before.</p>
<p>By the end, you’ll know how to:</p>
<ul>
<li><p>Detect and track hand movements in real time.</p>
</li>
<li><p>Classify gestures using a simple machine learning model.</p>
</li>
<li><p>Convert recognized gestures into text output.</p>
</li>
<li><p>Extend the system for accessibility-focused applications.</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before following along with this tutorial, you should have:</p>
<ul>
<li><p><strong>Basic Python knowledge</strong> – You should be comfortable writing and running Python scripts.</p>
</li>
<li><p><strong>Familiarity with the command line</strong> – You’ll use it to run scripts and install dependencies.</p>
</li>
<li><p><strong>A working webcam</strong> – This is required for capturing and recognizing gestures in real time.</p>
</li>
<li><p><strong>Python installed (3.8 or later)</strong> – Along with <code>pip</code> for installing packages.</p>
</li>
<li><p><strong>Some understanding of machine learning basics</strong> – Knowing what training data and models are will help, but I’ll explain the key parts along the way.</p>
</li>
<li><p><strong>An internet connection</strong> – To install libraries such as Mediapipe and OpenCV.</p>
</li>
</ul>
<p>If you’re completely new to Mediapipe or OpenCV, don’t worry: I’ll walk through the core parts you need to know to get this project working.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-this-matters">Why This Matters</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tools-and-technologies">Tools and Technologies</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-how-to-install-the-required-libraries">Step 1: How to Install the Required Libraries</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-how-mediapipe-tracks-hands">Step 2: How Mediapipe Tracks Hands</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-project-pipeline">Step 3: Project Pipeline</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-how-to-collect-gesture-data">Step 4: How to Collect Gesture Data</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-how-to-train-a-gesture-classifier">Step 5: How to Train a Gesture Classifier</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-6-real-time-gesture-to-text-translation">Step 6: Real-Time Gesture-to-Text Translation</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-7-extending-the-project">Step 7: Extending the Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-ethical-and-accessibility-considerations">Ethical and Accessibility Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p>Accessible communication is a right, not a privilege. Gesture-to-text translators can:</p>
<ul>
<li><p>Help non-signers communicate with sign/symbol language users.</p>
</li>
<li><p>Assist in educational contexts for children with communication challenges.</p>
</li>
<li><p>Support people with speech impairments.</p>
</li>
</ul>
<p><strong>Note:</strong> This project is a proof-of-concept and should be tested with diverse datasets before real-world deployment.</p>
<h2 id="heading-tools-and-technologies">Tools and Technologies</h2>
<p>We’ll be using:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Tool</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Python</strong></td><td>Primary programming language</td></tr>
<tr>
<td><strong>Mediapipe</strong></td><td>Real-time hand tracking and gesture detection</td></tr>
<tr>
<td><strong>OpenCV</strong></td><td>Webcam input and video display</td></tr>
<tr>
<td><strong>NumPy</strong></td><td>Data processing</td></tr>
<tr>
<td><strong>Scikit-learn</strong></td><td>Gesture classification</td></tr>
<tr>
<td><strong>Pandas</strong></td><td>Loading and saving the gesture dataset as CSV</td></tr>
</tbody>
</table>
</div><h2 id="heading-step-1-how-to-install-the-required-libraries">Step 1: How to Install the Required Libraries</h2>
<p>Before installing the dependencies, make sure you have Python 3.8 or newer installed. You can check your current Python version by opening a terminal (Command Prompt on Windows, or Terminal on macOS/Linux) and typing:</p>
<pre><code class="lang-bash">python --version
</code></pre>
<p>or</p>
<pre><code class="lang-bash">python3 --version
</code></pre>
<p>Confirm that the reported version is 3.8 or higher: Mediapipe and some other dependencies require modern language features and prebuilt binary wheels.</p>
<p><strong>Windows:</strong></p>
<ol>
<li><p>Press <strong>Windows Key + R</strong></p>
</li>
<li><p>Type <code>cmd</code> and press Enter to open Command Prompt</p>
</li>
<li><p>Type one of the above commands and press Enter</p>
</li>
</ol>
<p><strong>macOS/Linux:</strong></p>
<ol>
<li><p>Open your <strong>Terminal</strong> application</p>
</li>
<li><p>Type one of the above commands and press Enter</p>
</li>
</ol>
<p>If your Python version is older than 3.8, you’ll need to <a target="_blank" href="https://www.python.org/downloads/">download and install a newer version from the official Python website</a>.</p>
<p>Once Python is ready, you can install the required libraries using <code>pip</code>:</p>
<pre><code class="lang-bash">pip install mediapipe opencv-python numpy scikit-learn pandas
</code></pre>
<p>This command installs all the libraries you’ll need for the project:</p>
<ul>
<li><p><strong>Mediapipe</strong> – real-time hand tracking and landmark detection.</p>
</li>
<li><p><strong>OpenCV</strong> – reading frames from your webcam and drawing overlays.</p>
</li>
<li><p><strong>NumPy</strong> – numerical array operations on the landmark data.</p>
</li>
<li><p><strong>Pandas</strong> – storing our collected landmark data in a CSV for training.</p>
</li>
<li><p><strong>Scikit-learn</strong> – training and evaluating the gesture classification model.</p>
</li>
</ul>
<h2 id="heading-step-2-how-mediapipe-tracks-hands">Step 2: How Mediapipe Tracks Hands</h2>
<p>Mediapipe’s Hand Tracking solution detects 21 key landmarks on each hand, including the fingertips, joints, and wrist, and typically runs in real time (around <strong>30 FPS</strong> or more) even on modest hardware.</p>
<p>Here’s a conceptual diagram of the landmarks:</p>
<p><img src="https://github.com/tayo4christ/Gesture_Article/blob/7598826bb530d5bd1cd40251d6f56f35653b6b51/images/landmarks_concept.png?raw=true" alt="Diagram showing Mediapipe hand landmark numbering and connections between joints" width="600" height="400" loading="lazy"></p>
<p>And here’s what real‑time tracking looks like:</p>
<p><img src="https://github.com/tayo4christ/Gesture_Article/blob/7598826bb530d5bd1cd40251d6f56f35653b6b51/images/hand_tracking_3d_android_gpu.gif?raw=true" alt="Animated GIF showing Mediapipe 3D hand tracking detecting finger joints and bones in real-time" width="600" height="400" loading="lazy"></p>
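<p>Each landmark has <code>(x, y, z)</code> coordinates: <code>x</code> and <code>y</code> are normalized to the range [0, 1] by the image width and height, and <code>z</code> is depth relative to the wrist (smaller values are closer to the camera). This makes it easy to measure angles and positions for gesture classification.</p>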
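<p>As an illustration of the kind of measurement these coordinates enable, here is a minimal sketch that converts normalized landmarks to pixel positions and computes the angle at a joint (for example, at the index finger’s middle joint, using landmarks 5, 6, and 7). The <code>Landmark</code> named tuple is a hypothetical stand-in for Mediapipe’s landmark objects, which expose the same <code>x</code>, <code>y</code>, and <code>z</code> attributes, and the helper names are illustrative. Converting to pixels before measuring avoids distorting angles on non-square frames, since <code>x</code> and <code>y</code> are normalized by different dimensions.</p>
<pre><code class="lang-python">import numpy as np
from collections import namedtuple

# Hypothetical stand-in for a Mediapipe landmark (x and y normalized to [0, 1])
Landmark = namedtuple("Landmark", ["x", "y", "z"])


def to_pixels(lm, width, height):
    """Convert a normalized landmark to integer pixel coordinates."""
    return int(lm.x * width), int(lm.y * height)


def joint_angle(a, b, c):
    """Return the angle in degrees at point b, between the segments from b to a and from b to c."""
    ba = np.subtract(a, b)
    bc = np.subtract(c, b)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip to [-1, 1] to guard against floating-point rounding before arccos
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


# Example: three vertically aligned landmarks (a straight finger) in a 640 x 480 frame
width, height = 640, 480
base = to_pixels(Landmark(0.50, 0.60, 0.0), width, height)
joint = to_pixels(Landmark(0.50, 0.50, 0.0), width, height)
tip = to_pixels(Landmark(0.50, 0.40, 0.0), width, height)
print(joint, round(joint_angle(base, joint, tip)))  # (320, 240) 180
</code></pre>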
<h2 id="heading-step-3-project-pipeline">Step 3: Project Pipeline</h2>
<p>Here’s how the system works, from webcam to text output:</p>
<p><img src="https://github.com/tayo4christ/Gesture_Article/blob/7598826bb530d5bd1cd40251d6f56f35653b6b51/diagrams/pipeline_flowchart.png?raw=true" alt="Pipeline Flowchart showing how gesture input flows through hand tracking, feature extraction, gesture classification, and final text output" width="600" height="400" loading="lazy"></p>
<ul>
<li><p><strong>Capture</strong>: Webcam frames are captured using OpenCV.</p>
</li>
<li><p><strong>Detection</strong>: Mediapipe locates hand landmarks.</p>
</li>
<li><p><strong>Vectorization</strong>: Landmarks are flattened into a numeric vector.</p>
</li>
<li><p><strong>Classification</strong>: A machine learning model predicts the gesture.</p>
</li>
<li><p><strong>Output</strong>: The recognized gesture is displayed as text.</p>
</li>
</ul>
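<p>The vectorization step can be sketched in a few lines with NumPy. This is a minimal sketch, not necessarily how the project’s scripts do it: <code>landmarks_to_vector</code> is a hypothetical helper, and the <code>Landmark</code> named tuple stands in for Mediapipe’s landmark objects. With a real detection result you would pass <code>hand_landmarks.landmark</code> instead of the dummy list.</p>
<pre><code class="lang-python">import numpy as np
from collections import namedtuple

# Hypothetical stand-in for Mediapipe's landmark objects, which expose .x, .y and .z
Landmark = namedtuple("Landmark", ["x", "y", "z"])


def landmarks_to_vector(landmarks):
    """Flatten 21 (x, y, z) landmarks into 63 values ordered x0, y0, z0, ..., x20, y20, z20."""
    vector = np.array([[lm.x, lm.y, lm.z] for lm in landmarks]).flatten()
    if vector.shape != (63,):
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    return vector


# Dummy hand: landmark i sits at (i / 100, i / 100, 0)
hand = [Landmark(i / 100, i / 100, 0.0) for i in range(21)]
vector = landmarks_to_vector(hand)
print(vector.shape)  # (63,)
</code></pre>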
<p>Basic hand detection example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">import</span> mediapipe <span class="hljs-keyword">as</span> mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(<span class="hljs-number">0</span>)

<span class="hljs-keyword">with</span> mp_hands.Hands(max_num_hands=<span class="hljs-number">1</span>) <span class="hljs-keyword">as</span> hands:
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        ret, frame = cap.read()
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ret:
            <span class="hljs-keyword">break</span>

        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        <span class="hljs-keyword">if</span> results.multi_hand_landmarks:
            <span class="hljs-keyword">for</span> hand_landmarks <span class="hljs-keyword">in</span> results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

        cv2.imshow(<span class="hljs-string">"Hand Tracking"</span>, frame)
        <span class="hljs-keyword">if</span> cv2.waitKey(<span class="hljs-number">1</span>) &amp; <span class="hljs-number">0xFF</span> == ord(<span class="hljs-string">"q"</span>):
            <span class="hljs-keyword">break</span>

cap.release()
cv2.destroyAllWindows()
</code></pre>
<p>The code above opens the webcam and processes each frame with Mediapipe’s Hands solution. Each frame is converted from OpenCV’s BGR format to RGB (which Mediapipe expects) and passed to the detector; if a hand is found, the script draws the 21 landmarks and their connections on top of the frame. Press <code>q</code> to close the window. Running this first verifies your setup and confirms that landmark tracking works before you move on.</p>
<h2 id="heading-step-4-how-to-collect-gesture-data">Step 4: How to Collect Gesture Data</h2>
<p>Before we can train our model, we need a dataset of <strong>labelled gestures</strong>. Each sample is stored as a row in a CSV file (<code>gesture_data.csv</code>) containing the 3D landmark coordinates of all detected hand points, plus the gesture’s label.</p>
<p>For example, we’ll collect data for three gestures:</p>
<ul>
<li><p><strong>thumbs_up</strong> – the classic thumbs-up pose.</p>
</li>
<li><p><strong>open_palm</strong> – a flat hand, fingers extended (like a “high five”).</p>
</li>
<li><p><strong>ok</strong> – the “OK” sign, made by touching the thumb and index finger.</p>
</li>
</ul>
<p>You can collect samples for each gesture by running:</p>
<pre><code class="lang-bash">python src/collect_data.py --label thumbs_up --samples 200
</code></pre>
<pre><code class="lang-bash">python src/collect_data.py --label open_palm --samples 200
</code></pre>
<pre><code class="lang-bash">python src/collect_data.py --label ok --samples 200
</code></pre>
<p><strong>Explanation of the command:</strong></p>
<ul>
<li><p><code>--label</code> → the name of the gesture you’re recording. This label will be stored alongside each row of coordinates in the CSV.</p>
</li>
<li><p><code>--samples</code> → the number of frames to capture for that gesture. More samples generally lead to better accuracy.</p>
</li>
</ul>
<p><strong>How the process works:</strong></p>
<ol>
<li><p>When you run a command, your webcam will open.</p>
</li>
<li><p>Make the specified gesture in front of the camera.</p>
</li>
<li><p>The script will use MediaPipe Hands to detect 21 hand landmarks (each with <code>x</code>, <code>y</code>, <code>z</code> coordinates).</p>
</li>
<li><p>These 63 numbers (21 × 3) are stored in a row of the CSV file, along with the gesture label.</p>
</li>
<li><p>The counter at the top will track how many samples have been collected.</p>
</li>
<li><p>When the sample count reaches your target (<code>--samples</code>), the script will close automatically.</p>
</li>
</ol>
<p><strong>Example of what the CSV looks like:</strong></p>
<p><img src="https://raw.githubusercontent.com/tayo4christ/Gesture_Article/26db13366407e5b5d230a6c7dd7923e34a9f2a19/screenshots/gesture_data.webp" alt="Sample of gesture_data.csv" width="600" height="400" loading="lazy"></p>
<p>Each row contains:</p>
<ul>
<li><p><strong>x0, y0, z0 … x20, y20, z20</strong> → coordinates of each hand landmark.</p>
</li>
<li><p><strong>label</strong> → the gesture name.</p>
</li>
</ul>
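<p>The row layout above can be produced with Python’s built-in <code>csv</code> module. The sketch below is an assumption about how a script like <code>collect_data.py</code> might write samples, not its actual code; <code>append_sample</code> and <code>HEADER</code> are hypothetical names. It writes the header only when the file doesn’t exist yet, so repeated runs with different <code>--label</code> values append to the same dataset.</p>
<pre><code class="lang-python">import csv
import os

# Column names in the layout described above: x0, y0, z0, ..., x20, y20, z20, label
HEADER = [f"{axis}{i}" for i in range(21) for axis in ("x", "y", "z")] + ["label"]


def append_sample(csv_path, coords, label):
    """Append one 63-value landmark sample and its label, writing the header if the file is new."""
    if len(coords) != 63:
        raise ValueError("expected 63 values (21 landmarks x 3 coordinates)")
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADER)
        writer.writerow(list(coords) + [label])
</code></pre>
<p>Inside a capture loop, you would call something like <code>append_sample("data/gesture_data.csv", coords, "thumbs_up")</code> for each frame where a hand is detected, and stop once the number of rows written for that label reaches the <code>--samples</code> target. The <code>data/</code> folder must already exist.</p>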
<p><strong>Example of data collection in progress:</strong></p>
<p><img src="https://github.com/tayo4christ/Gesture_Article/blob/7598826bb530d5bd1cd40251d6f56f35653b6b51/screenshots/detection_example.jpg?raw=true" alt="Screenshot of data collection interface capturing hand gesture landmarks from webcam" width="600" height="400" loading="lazy"></p>
<p>In the screenshot above, the script has captured <strong>10 out of 10</strong> <code>thumbs_up</code> samples (a short run with a target of 10 samples).</p>
<p>📌 <strong>Tip:</strong> Make sure your hand is clearly visible and well-lit. Repeat the process for all gestures you want to train.</p>
<h2 id="heading-step-5-how-to-train-a-gesture-classifier">Step 5: How to Train a Gesture Classifier</h2>
<p>Once you have enough samples for each gesture, train a model:</p>
<pre><code class="lang-bash">python src/train_model.py --data data/gesture_data.csv
</code></pre>
<p>This script:</p>
<ul>
<li><p>Loads the CSV dataset.</p>
</li>
<li><p>Splits into training and testing sets.</p>
</li>
<li><p>Trains a Random Forest Classifier.</p>
</li>
<li><p>Prints accuracy and a classification report.</p>
</li>
<li><p>Saves the trained model.</p>
</li>
</ul>
<p>Core training logic:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier
<span class="hljs-keyword">import</span> pickle

<span class="hljs-comment"># Load the dataset</span>
df = pd.read_csv(<span class="hljs-string">"data/gesture_data.csv"</span>)

<span class="hljs-comment"># Separate features and labels</span>
X = df.drop(<span class="hljs-string">"label"</span>, axis=<span class="hljs-number">1</span>)
y = df[<span class="hljs-string">"label"</span>]

<span class="hljs-comment"># Initialize and train the Random Forest Classifier</span>
model = RandomForestClassifier()
model.fit(X, y)

<span class="hljs-comment"># Save the trained model to a file</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">"data/gesture_model.pkl"</span>, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> f:
    pickle.dump(model, f)
</code></pre>
<p>This block loads the gesture dataset from <code>data/gesture_data.csv</code> and separates it into:</p>
<ul>
<li><p><code>X</code> – the input features (the 3D landmark coordinates for each gesture sample).</p>
</li>
<li><p><code>y</code> – the labels (gesture names like <code>thumbs_up</code>, <code>open_palm</code>, <code>ok</code>).</p>
</li>
</ul>
<p>It then creates a Random Forest classifier, which suits numerical features like these and works reliably without much tuning. The model learns patterns in the landmark positions that correspond to each gesture.<br>Finally, it saves the trained model as <code>data/gesture_model.pkl</code> so it can be loaded later for real-time gesture recognition without retraining.</p>
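<p>The script description above mentions a train/test split and a classification report, while the core block trains on every row. Here is a minimal sketch of that evaluation step, assuming an 80/20 stratified split; to keep it runnable without your CSV, it uses a synthetic dataset with the same shape (63 features, three gesture labels). With real data, you would set <code>X = df.drop("label", axis=1)</code> and <code>y = df["label"]</code> instead. The split ratio and <code>random_state</code> values are illustrative choices, not taken from the project.</p>
<pre><code class="lang-python">import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for gesture_data.csv: 3 gestures x 100 samples x 63 coordinates
rng = np.random.default_rng(0)
labels = ["thumbs_up", "open_palm", "ok"]
centers = rng.random((len(labels), 63))
X = np.vstack([center + rng.normal(0, 0.02, size=(100, 63)) for center in centers])
y = np.repeat(labels, 100)

# Hold out 20% of the samples; stratify keeps each gesture's share equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Score only on samples the model has never seen
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))
</code></pre>
<p>Evaluating on held-out samples gives a more honest estimate than scoring on the training data. Because consecutive webcam frames are nearly identical, even this split can be optimistic; recording a separate session for testing is a stricter check.</p>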
<h2 id="heading-step-6-real-time-gesture-to-text-translation">Step 6: Real-Time Gesture-to-Text Translation</h2>
<p>Load the model and run the translator:</p>
<pre><code class="lang-bash">python src/gesture_to_text.py --model data/gesture_model.pkl
</code></pre>
<p>This command runs the real-time gesture recognition script.</p>
<ul>
<li><p>The <code>--model</code> argument tells the script which trained model file to load — in this case, <code>gesture_model.pkl</code> that we saved earlier.</p>
</li>
<li><p>Once running, the script opens your webcam, detects your hand landmarks, and uses the model to predict the gesture.</p>
</li>
<li><p>The predicted gesture name appears as text on the video feed.</p>
</li>
<li><p>Press <code>q</code> to exit the window when you’re done.</p>
</li>
</ul>
<p>Core prediction logic:</p>
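<pre><code class="lang-python"><span class="hljs-keyword">import</span> pickle

<span class="hljs-keyword">import</span> cv2

<span class="hljs-comment"># Load the classifier saved in Step 5</span>
<span class="hljs-keyword">with</span> open(<span class="hljs-string">"data/gesture_model.pkl"</span>, <span class="hljs-string">"rb"</span>) <span class="hljs-keyword">as</span> f:
    model = pickle.load(f)

<span class="hljs-comment"># Inside the capture loop from Step 3, after results = hands.process(...)</span>
<span class="hljs-keyword">if</span> results.multi_hand_landmarks:
    <span class="hljs-keyword">for</span> hand_landmarks <span class="hljs-keyword">in</span> results.multi_hand_landmarks:
        <span class="hljs-comment"># Flatten the 21 landmarks into 63 values, in the same order as the CSV columns</span>
        coords = []
        <span class="hljs-keyword">for</span> lm <span class="hljs-keyword">in</span> hand_landmarks.landmark:
            coords.extend([lm.x, lm.y, lm.z])
        <span class="hljs-comment"># Predict the most likely gesture and draw it in the top-left corner</span>
        gesture = model.predict([coords])[<span class="hljs-number">0</span>]
        cv2.putText(frame, gesture, (<span class="hljs-number">10</span>, <span class="hljs-number">50</span>), cv2.FONT_HERSHEY_SIMPLEX, <span class="hljs-number">1</span>, (<span class="hljs-number">0</span>, <span class="hljs-number">255</span>, <span class="hljs-number">0</span>), <span class="hljs-number">2</span>)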
</code></pre>
<p>This code loads the trained gesture recognition model from <code>gesture_model.pkl</code>.<br>If any hands are detected (<code>results.multi_hand_landmarks</code>), it loops through each detected hand and:</p>
<ol>
<li><p><strong>Extracts the coordinates</strong> – for each of the 21 landmarks, it appends the <code>x</code>, <code>y</code>, and <code>z</code> values to the <code>coords</code> list.</p>
</li>
<li><p><strong>Makes a prediction</strong> – passes <code>coords</code> to the model’s <code>predict</code> method to get the most likely gesture label.</p>
</li>
<li><p><strong>Displays the result</strong> – uses <code>cv2.putText</code> to draw the predicted gesture name on the video feed.</p>
</li>
</ol>
<p>This is the real-time decision-making step that turns raw Mediapipe landmark data into a readable gesture label.</p>
<p>You should see the recognized gesture at the top of the video feed:</p>
<p><img src="https://github.com/tayo4christ/Gesture_Article/blob/7598826bb530d5bd1cd40251d6f56f35653b6b51/screenshots/text_output.jpg?raw=true" alt="Screenshot of the real-time gesture recognition output overlaying the 'palm_open' label on the video feed" width="600" height="400" loading="lazy"></p>
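<p>Because the classifier predicts each frame independently, the displayed label can flicker when your hand is between poses. One common remedy, not part of the original project, is a majority vote over the last few predictions. The <code>PredictionSmoother</code> class below is a hypothetical helper sketching that idea with <code>collections.deque</code> and <code>Counter</code>; the default window of 10 frames is an arbitrary starting point.</p>
<pre><code class="lang-python">from collections import Counter, deque


class PredictionSmoother:
    """Report the most frequent label among the last `window` per-frame predictions."""

    def __init__(self, window=10):
        # A deque with maxlen drops the oldest prediction automatically
        self.recent = deque(maxlen=window)

    def update(self, label):
        self.recent.append(label)
        return Counter(self.recent).most_common(1)[0][0]
</code></pre>
<p>To use it, create <code>smoother = PredictionSmoother()</code> before the capture loop and replace the prediction line with <code>gesture = smoother.update(model.predict([coords])[0])</code>. A larger window gives a steadier label but reacts more slowly when you change gestures.</p>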
<h2 id="heading-step-7-extending-the-project">Step 7: Extending the Project</h2>
<p>You can take this project further by:</p>
<ul>
<li><p><strong>Adding Text-to-Speech</strong>: Use <code>pyttsx3</code> to speak recognized words.</p>
</li>
<li><p><strong>Supporting More Gestures</strong>: Expand your dataset.</p>
</li>
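<li><p><strong>Deploying in the Browser</strong>: Use TensorFlow.js (or MediaPipe’s JavaScript solutions) for web-based recognition. The scikit-learn model can’t run in the browser as-is, so you’d need to retrain or reimplement the classifier there.</p>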
</li>
<li><p><strong>Testing with Real Users</strong>: Especially in accessibility contexts.</p>
</li>
</ul>
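<p>For the text-to-speech idea, a minimal sketch might look like the class below. It assumes <code>pyttsx3</code> is installed (<code>pip install pyttsx3</code>) and uses its <code>init()</code>, <code>say()</code>, and <code>runAndWait()</code> calls; <code>GestureAnnouncer</code> is a hypothetical name. It speaks a label only when it changes, so holding a gesture doesn’t repeat the word on every frame. You can pass your own <code>speak</code> function, which also makes it easy to test without audio.</p>
<pre><code class="lang-python">class GestureAnnouncer:
    """Speak a gesture label only when it differs from the last one spoken."""

    def __init__(self, speak=None):
        if speak is None:
            # Default backend: pyttsx3 (offline text-to-speech), imported only when needed
            import pyttsx3

            engine = pyttsx3.init()

            def pyttsx3_speak(text):
                engine.say(text)
                engine.runAndWait()  # blocks until the phrase has been spoken

            speak = pyttsx3_speak
        self.speak = speak
        self.last_spoken = None

    def announce(self, label):
        # Skip frames with no prediction and repeats of the current gesture
        if label is not None and label != self.last_spoken:
            self.speak(label.replace("_", " "))  # "thumbs_up" is spoken as "thumbs up"
            self.last_spoken = label
</code></pre>
<p>Note that <code>runAndWait()</code> blocks until speech finishes, so the video loop pauses briefly each time a new gesture is announced; moving speech to a background thread is one way to avoid that.</p>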
<h2 id="heading-ethical-and-accessibility-considerations">Ethical and Accessibility Considerations</h2>
<p>Before deploying:</p>
<ul>
<li><p><strong>Dataset Diversity</strong>: Collect training samples from people with different skin tones and hand sizes, under varied lighting conditions.</p>
</li>
<li><p><strong>Privacy</strong>: Store only landmark coordinates unless you have consent for video storage.</p>
</li>
<li><p><strong>Cultural Context</strong>: Some gestures have different meanings in different cultures.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this tutorial, we explored how to use Python, Mediapipe, and machine learning to build a real-time gesture-to-text translator. This technology has exciting potential for accessibility and inclusive communication, and with further development, could become a powerful tool for breaking down language barriers.</p>
<p>You can find the full code and resources here:</p>
<p><strong>GitHub Repo</strong> – <a target="_blank" href="https://github.com/tayo4christ/Gesture_Article">Gesture_Article</a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Improve Web Accessibility with Landmarks - Explained with Examples ]]>
                </title>
                <description>
                    <![CDATA[ If you’re reading this article on the freeCodeCamp publication, you should see some visual clues in different sections of the page. The header is at the top of the page. If you scroll all the way to the bottom of the page, you can see the footer sect... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/improve-web-accessibility-with-landmarks/</link>
                <guid isPermaLink="false">68926eb93b7f491bf54aef85</guid>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ilknur Eren ]]>
                </dc:creator>
                <pubDate>Tue, 05 Aug 2025 20:51:05 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754425989581/1302898e-439b-4666-af27-cf1b091c6975.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you’re reading this article on the freeCodeCamp publication, you should see some visual clues in different sections of the page. The header is at the top of the page. If you scroll all the way to the bottom of the page, you can see the footer section with its grey background, which is clearly separated from the white background of the body.</p>
<p>freeCodeCamp, like other websites, visually separates the sections of the page to give users clues so they can easily navigate between sections.</p>
<p>While sighted users have visual clues about the sections, people who use assistive technology, such as a screen reader, rely on landmarks to navigate the page.</p>
<p>Simply put, landmarks are semantic regions in a web page that define the purpose of its sections. Landmarks allow assistive technologies to jump between major parts of the page, just like sighted users visually scan headings or menus.</p>
<p>Common HTML landmarks include:</p>
<ul>
<li><p><code>&lt;header&gt;</code> – Represents introductory content or a page header.</p>
</li>
<li><p><code>&lt;nav&gt;</code> – Identifies navigation links.</p>
</li>
<li><p><code>&lt;main&gt;</code> – Marks the main content area of the page.</p>
</li>
<li><p><code>&lt;aside&gt;</code> – Contains complementary or related information.</p>
</li>
<li><p><code>&lt;footer&gt;</code> – Represents a page or section footer.</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-to-navigate-landmarks-in-any-browser">How to Navigate Landmarks in Any Browser</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-navigate-through-landmarks-on-a-mac-voice-over">How to Navigate Through Landmarks with VoiceOver on a Mac</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-why-landmarks-matter-for-accessibility">Why Landmarks Matter for Accessibility</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-landmarks">How to Use Landmarks</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-concrete-examples-of-each-landmark">Concrete Examples of Each Landmark</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-how-to-navigate-landmarks-in-any-browser">How to Navigate Landmarks in Any Browser</h2>
<h3 id="heading-general-browser-support">General Browser Support</h3>
<p>Most screen readers support landmark navigation with shortcut keys. Here's a basic overview:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Screen Reader</td><td>OS</td><td>Shortcut</td></tr>
</thead>
<tbody>
<tr>
<td>VoiceOver</td><td>macOS</td><td><code>Control + Option + U</code> (to open the rotor), then Left/Right Arrow to choose Landmarks and Up/Down Arrow to move between them</td></tr>
<tr>
<td>NVDA</td><td>Windows</td><td><code>D</code> to move to the next landmark</td></tr>
<tr>
<td>JAWS</td><td>Windows</td><td><code>R</code> to cycle through regions</td></tr>
<tr>
<td>Narrator</td><td>Windows</td><td><code>D</code> (in scan mode) to move to the next landmark</td></tr>
<tr>
<td>ChromeVox</td><td>Chrome OS</td><td><code>Search + ;</code> to move to the next landmark</td></tr>
</tbody>
</table>
</div><p>These shortcuts let users jump between regions—for example, from the <code>&lt;main&gt;</code> content directly to the <code>&lt;footer&gt;</code>—without tabbing through every interactive element.</p>
<h2 id="heading-how-to-navigate-through-landmarks-on-a-mac-voice-over"><strong>How to Navigate Through Landmarks with VoiceOver on a Mac</strong></h2>
<ol>
<li><p><strong>Turn on VoiceOver:</strong> Open Finder, search for “VoiceOver”, and switch the VoiceOver toggle on. You can also press Command+F5 to turn VoiceOver on or off.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753898345534/2fba73d7-b102-41ce-8731-f865a71631e6.png" alt="Finder searching the word, &quot;voiceOver&quot; Underneath is a VoiceOver with a toggle to turn on. Underneath that is VoiceOver Utility" class="image--center mx-auto" width="1160" height="256" loading="lazy"></p>
</li>
<li><p><strong>Open the rotor:</strong> Once VoiceOver is on, press Control+Option+U to open the VoiceOver rotor. Press the Left and Right Arrow keys to switch between rotor lists, such as headings, links, and landmarks. The screenshot below shows the rotor’s Landmarks list on a freeCodeCamp article, which is divided into navigation, search, main, article, and footer regions.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753897939501/ddef4a40-047d-469c-80e3-add29ff8f297.png" alt="Landmarks title followed up by navigation, search, main, article and footer" class="image--center mx-auto" width="1032" height="406" loading="lazy"></p>
<ol start="3">
<li><strong>Press the Up and Down Arrow keys to move between landmarks:</strong> Once you are in the rotor’s Landmarks list, press the Up and Down Arrow keys to move to different sections of the page. To go to the footer, for example, press the Down Arrow until you reach footer, then press Enter.</li>
</ol>
<h2 id="heading-why-landmarks-matter-for-accessibility"><strong>Why Landmarks Matter for Accessibility</strong></h2>
<h3 id="heading-1-easier-navigation-for-screen-reader-users"><strong>1. Easier Navigation for Screen Reader Users</strong></h3>
<p>Screen readers provide shortcuts for jumping between landmarks. Without landmarks, users may have to tab through every link or interactive element, which is frustrating and time-consuming. In the freeCodeCamp article example, a user who wants to reach the donation link in the footer would have to tab through the entire article first. Landmarks let screen reader users skip straight to the section they need.</p>
<h3 id="heading-2-consistent-structure-across-pages"><strong>2. Consistent Structure Across Pages</strong></h3>
<p>When every page uses the same landmark structure, users can predict where navigation menus, main content, and sidebars are located. This predictability reduces cognitive load: once the page is organized into clear sections, it’s easy to figure out where to go next.</p>
<h3 id="heading-3-clear-context-and-orientation"><strong>3. Clear Context and Orientation</strong></h3>
<p>Landmarks communicate the <strong>role</strong> of content. For instance:</p>
<ul>
<li><p>The <code>main</code> landmark signals: <em>“This is the core content of the page.”</em></p>
</li>
<li><p>The <code>aside</code> landmark signals: <em>“This is supplementary or related content.”</em></p>
</li>
</ul>
<p>This helps users decide which areas to skip or focus on.</p>
<h2 id="heading-how-to-use-landmarks">How to Use Landmarks</h2>
<h3 id="heading-basic-landmark-structure">✅ <strong>Basic Landmark Structure</strong></h3>
<p>Here’s an example of a page using HTML5 landmarks:</p>
<pre><code class="lang-html">&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
&lt;head&gt;
  &lt;meta charset="UTF-8"&gt;
  &lt;title&gt;Accessible Landmark Example&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;

  &lt;header&gt;
    &lt;h1&gt;Website Logo&lt;/h1&gt;
    &lt;nav&gt;
      &lt;ul&gt;
        &lt;li&gt;&lt;a href="#home"&gt;Home&lt;/a&gt;&lt;/li&gt;
      &lt;/ul&gt;
    &lt;/nav&gt;
  &lt;/header&gt;

  &lt;main&gt;
    &lt;h2&gt;Main Content Area&lt;/h2&gt;
    &lt;p&gt;This is the primary content of the page.&lt;/p&gt;
  &lt;/main&gt;

  &lt;aside&gt;
    &lt;h3&gt;Related Links&lt;/h3&gt;
    &lt;ul&gt;
      &lt;li&gt;&lt;a href="#resource1"&gt;Resource 1&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/aside&gt;

  &lt;footer&gt;
    &lt;p&gt;2025 Example Company&lt;/p&gt;
  &lt;/footer&gt;

&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>The HTML is divided into five landmark sections: header, navigation, main, aside, and footer. A screen reader user who wants to skip the header and go directly to the main content can open the landmarks list (for example, the VoiceOver rotor) and select the main landmark. Landmarks let screen reader users move through the page quickly.</p>
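<p>To get a feel for how assistive technology derives a landmark list from markup like this, here is a simplified sketch using Python’s standard-library <code>html.parser</code>. It approximates the implicit ARIA role mappings for the five elements covered here, plus explicit <code>role</code> attributes. It ignores accessible names (so <code>section</code> and <code>form</code> elements never become region or form landmarks without an explicit role), hidden content, and other rules that real browsers apply. <code>LandmarkOutline</code> and <code>landmark_outline</code> are hypothetical names, and no screen reader works exactly this way.</p>
<pre><code class="lang-python">from html.parser import HTMLParser

# Landmark roles that can also be set explicitly with a role attribute
LANDMARK_ROLES = {"banner", "complementary", "contentinfo", "form", "main", "navigation", "region", "search"}

# Simplified implicit roles for the elements discussed in this article
IMPLICIT_ROLES = {
    "header": "banner",
    "nav": "navigation",
    "main": "main",
    "aside": "complementary",
    "footer": "contentinfo",
}

# header and footer only act as banner/contentinfo when not nested inside these elements
SCOPING_ELEMENTS = {"article", "aside", "main", "nav", "section"}

# Void elements have no end tag, so they are never kept on the open-element stack
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"}


class LandmarkOutline(HTMLParser):
    """Collect landmark roles in document order, similar to a screen reader's landmark list."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.landmarks = []

    def handle_starttag(self, tag, attrs):
        explicit_role = dict(attrs).get("role")
        if explicit_role:
            # An explicit non-landmark role (e.g. "presentation") removes the landmark
            role = explicit_role if explicit_role in LANDMARK_ROLES else None
        else:
            role = IMPLICIT_ROLES.get(tag)
            if tag in ("header", "footer") and SCOPING_ELEMENTS.intersection(self.open_elements):
                role = None
        if role:
            self.landmarks.append(role)
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner elements
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def landmark_outline(html):
    """Return the list of landmark roles found in an HTML string."""
    parser = LandmarkOutline()
    parser.feed(html)
    parser.close()
    return parser.landmarks
</code></pre>
<p>Feeding the page above to <code>landmark_outline</code> should return <code>["banner", "navigation", "main", "complementary", "contentinfo"]</code>, matching the five sections described. A <code>header</code> or <code>footer</code> nested inside an <code>article</code> is skipped, because those elements only map to banner and contentinfo when they aren’t nested inside elements such as <code>article</code>, <code>aside</code>, <code>main</code>, <code>nav</code>, or <code>section</code>.</p>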
<p>Here’s a breakdown of what each landmark is and how it's typically used:</p>
<h3 id="heading-navigation-section"><code>&lt;nav&gt;</code> – Navigation Section</h3>
<p>Used for menus, site-wide links, or breadcrumbs.</p>
<pre><code class="lang-html">&lt;nav&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href="/about"&gt;About&lt;/a&gt;&lt;/li&gt;
    &lt;li&gt;&lt;a href="/courses"&gt;Courses&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/nav&gt;
</code></pre>
<p><strong>Real-world use</strong>: Jump straight to the navigation to find the “Contact” page without browsing through all the content.</p>
<h3 id="heading-primary-page-content"><code>&lt;main&gt;</code> – Primary Page Content</h3>
<p>Used once per page to wrap the most important content.</p>
<pre><code class="lang-html">&lt;main&gt;
  &lt;h1&gt;Learn Accessibility&lt;/h1&gt;
  &lt;p&gt;This article explains how to use landmarks...&lt;/p&gt;
&lt;/main&gt;
</code></pre>
<p><strong>Real-world use</strong>: Skip past the header and sidebar to dive into the tutorial or article.</p>
<h3 id="heading-complementary-information"><code>&lt;aside&gt;</code> – Complementary Information</h3>
<p>Used for sidebars, ads, related links, or pull quotes.</p>
<pre><code class="lang-html">&lt;aside&gt;
  &lt;h3&gt;Related Tutorials&lt;/h3&gt;
  &lt;ul&gt;
    &lt;li&gt;&lt;a href="/accessibility/forms"&gt;Accessible Forms&lt;/a&gt;&lt;/li&gt;
  &lt;/ul&gt;
&lt;/aside&gt;
</code></pre>
<p><strong>Real-world use</strong>: Users can skip the aside if they don’t want extra content, or jump to it for helpful resources.</p>
<h3 id="heading-page-footer"><code>&lt;footer&gt;</code> – Page Footer</h3>
<p>Used for closing content like copyright.</p>
<pre><code class="lang-html">&lt;footer&gt;
  &lt;p&gt;&amp;copy; 2025 freeCodeCamp. All rights reserved.&lt;/p&gt;
&lt;/footer&gt;
</code></pre>
<p><strong>Real-world use</strong>: Quickly navigate to support links, licensing info, or a newsletter sign-up.</p>
<h3 id="heading-top-of-page-or-section-header"><code>&lt;header&gt;</code> – Top-of-Page or Section Header</h3>
<p>Used for introductory content, such as logos or search bars.</p>
<pre><code class="lang-html">&lt;header&gt;
  &lt;img src="logo.png" alt="Site Logo" /&gt;
  &lt;form role="search"&gt;
    &lt;input type="text" aria-label="Search site" /&gt;
  &lt;/form&gt;
&lt;/header&gt;
</code></pre>
<p><strong>Real-world use</strong>: Quickly access the search input or return to the homepage.</p>
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>Landmarks aren’t just an accessibility bonus; they’re a fundamental part of good UX. By implementing landmarks properly, you make your site easier to navigate for people with disabilities, help meet WCAG success criteria such as 2.4.1 Bypass Blocks, and create a more predictable structure for everyone.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Audit Android Accessibility with the Accessibility Scanner App ]]>
                </title>
                <description>
                    <![CDATA[ The Web Content Accessibility Guidelines (WCAG 2.1 Level AA) is an internationally recognized standard for digital accessibility. Meeting these guidelines helps you make sure that your website is usable by people with visual, motor, hearing, and cogn... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-audit-android-accessibility-with-the-accessibility-scanner-app/</link>
                <guid isPermaLink="false">6862d146cc277a35bb68ec20</guid>
                
                    <category>
                        <![CDATA[ Android ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Mobile app accessibility testing ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Ilknur Eren ]]>
                </dc:creator>
                <pubDate>Mon, 30 Jun 2025 18:02:46 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751301060182/df4d483a-8dd6-45ce-a665-76cbf45ef945.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The Web Content Accessibility Guidelines (WCAG) are an internationally recognized standard for digital accessibility. Meeting WCAG 2.1 Level AA helps make sure your website or app is usable by people with visual, motor, hearing, and cognitive impairments.</p>
<p>Google’s <a target="_blank" href="https://play.google.com/store/apps/details?id=com.google.android.apps.accessibility.auditor&amp;hl=en_US">Accessibility Scanner</a> is a free app on Google Play that lets developers, designers, and product leaders audit their apps for accessibility issues. It is designed to highlight elements that might not meet WCAG 2.1 Level AA.</p>
<p>Once installed, the Accessibility Scanner allows you to take screenshots or video recordings of your app, then highlights areas that may not meet accessibility requirements, like small touch targets, low color contrast, or missing content labels.</p>
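<p>WCAG measures color contrast as a ratio between the relative luminance of the foreground and background colors. The minimal TypeScript sketch below illustrates the WCAG 2.1 contrast-ratio formula with the Level AA thresholds: 4.5:1 for normal text and 3:1 for large text. It follows the WCAG definition, not Accessibility Scanner’s own code, and the function names are hypothetical.</p>
<pre><code class="lang-typescript">// Sketch of the WCAG 2.1 contrast-ratio formula (hypothetical helpers,
// not Accessibility Scanner's implementation).
type Rgb = { r: number; g: number; b: number };

// Parse a 6-digit "#rrggbb" string into channel values from 0 to 255.
function parseHex(hex: string): Rgb {
  const h = hex.replace("#", "");
  return {
    r: parseInt(h.slice(0, 2), 16),
    g: parseInt(h.slice(2, 4), 16),
    b: parseInt(h.slice(4, 6), 16),
  };
}

// Convert an 8-bit sRGB channel to linear light, using WCAG 2.1's 0.03928 threshold.
function linearize(channel: number): number {
  const c = channel / 255;
  return c > 0.03928 ? Math.pow((c + 0.055) / 1.055, 2.4) : c / 12.92;
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  const { r, g, b } = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio from 1 (identical colors) to 21 (black on white).
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// WCAG 2.1 Level AA: at least 4.5:1 for normal text, 3:1 for large text.
function meetsAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
</code></pre>
<p>For example, <code>#767676</code> text on a white background comes out at about 4.54:1 and passes. <code>#777777</code> on white falls just short at roughly 4.48:1.</p>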
<h3 id="heading-heres-what-well-cover">Here’s what we’ll cover:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-how-to-download-and-enable-the-accessibility-scanner">How to Download and Enable the Accessibility Scanner</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-the-accessibility-scanner">How to Use the Accessibility Scanner</a></p>
<ul>
<li><p><a class="post-section-overview" href="#heading-how-to-use-the-snapshot-feature">How to Use the Snapshot Feature</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-use-the-record-feature">How to Use the Record Feature</a></p>
</li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-why-use-the-accessibility-scanner">Why Use the Accessibility Scanner?</a></p>
</li>
</ol>
<h2 id="heading-how-to-download-and-enable-the-accessibility-scanner"><strong>How to Download and Enable the Accessibility Scanner</strong></h2>
<p>In five quick steps, you can download Accessibility Scanner and enable it on your Android device.</p>
<ol>
<li><p>Search for “Accessibility Scanner” in the Google Play Store and install it.</p>
</li>
<li><p>Find the downloaded app on your device and open it.</p>
</li>
<li><p>Turn on Accessibility Scanner by tapping the “Turn on” button at the bottom right of the screen. This takes you to your device’s Accessibility settings.</p>
</li>
<li><p>On the Accessibility settings page, tap Accessibility Scanner. This takes you to the Accessibility Scanner settings.</p>
</li>
<li><p>Find the Accessibility Scanner toggle and turn it on. A dialog will ask whether you want to allow “Accessibility Scanner” to have full control of your device. Tap Allow.</p>
</li>
</ol>
<p>After step five, a blue checkmark icon will appear on the right side of your screen (see image below). This floating icon gives you quick access to start scanning any screen for accessibility issues.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750821547116/75f49863-7f19-4db5-ada1-45483c0df70b.png" alt="Facebook Log in Page with Accessibility Scanner toggle on the right with arrow pointing to it" class="image--center mx-auto" width="2680" height="1472" loading="lazy"></p>
<h2 id="heading-how-to-use-the-accessibility-scanner"><strong>How to Use the Accessibility Scanner</strong></h2>
<p>To scan or record your app for accessibility issues, tap the blue checkmark icon. You’ll then see these options:</p>
<ul>
<li><p><strong>Record</strong>: Captures a short video of user interaction and generates a report of potential accessibility issues.</p>
</li>
<li><p><strong>Snapshot</strong>: Takes a static screenshot and flags issues found on that screen.</p>
</li>
<li><p><strong>Turn off</strong>: Turns the Accessibility Scanner off.</p>
</li>
<li><p><strong>Collapse</strong>: Collapses the options to show the initial blue checkmark.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750895121001/9673c7d5-5182-4c99-b36a-1b2a2e27986b.png" alt="Facebook Log in Page with Accessibility Scanner toggle opened on the right with arrow pointing to it" class="image--center mx-auto" width="1694" height="1038" loading="lazy"></p>
<p>You can choose between taking a single <strong>Snapshot</strong> or recording user flow using <strong>Record</strong> to evaluate multiple screens.</p>
<h3 id="heading-how-to-use-the-snapshot-feature">How to Use the Snapshot Feature</h3>
<p>The Snapshot button captures the screen you’re currently on and reports any potential accessibility issues on it. Each issue is highlighted with a red box.</p>
<p>The image below shows the result of taking a snapshot of the Facebook login page. Accessibility Scanner reports 10 accessibility suggestions on this page alone.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750898582440/76cc763c-e6db-46a9-b062-2e29a57e7022.jpeg" alt="Facebook log in page with red boxes around several elements, highlighting accessibility issues." class="image--center mx-auto" width="1080" height="2400" loading="lazy"></p>
<p>You can tap a highlighted area to get more details about that potential issue. For example, tap the red box around the “Mobile number or email” field in the image above to see additional information.</p>
<p>The image below shows the result of tapping the “Mobile number or email” field. Accessibility Scanner lists the issues it found on this field.</p>
<p>The first suggestion is to fix the item label, because the field may not have a label that screen readers can read. The second concerns the touch target, which the scanner suggests should be larger. The final suggestion is about unexposed text: the scanner detected the possible text “Mobile number or email” on screen, and that text may not be exposed to accessibility services.</p>
<p>Snapshots allow us to take screenshots of our pages and highlight accessibility issues.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750898563142/ce93909e-b351-405c-8367-dd47d7d19c9f.jpeg" alt="Email form field is selected from Accessibility Scanner. Scanner shows three areas to fix." class="image--center mx-auto" width="1080" height="2400" loading="lazy"></p>
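<p>Numbers help when acting on the touch target suggestion. Android’s accessibility guidance recommends touch targets of at least 48 × 48 dp, where 1 dp equals 1 px on a 160 dpi screen. The TypeScript sketch below is a hypothetical helper, not part of Accessibility Scanner. It converts a view’s size in pixels to dp and checks it against that assumed 48 dp minimum.</p>
<pre><code class="lang-typescript">// Hypothetical helper illustrating the touch target suggestion.
// Assumes the commonly recommended Android minimum of 48 x 48 dp.
const MIN_TOUCH_TARGET_DP = 48;

// 1 dp equals 1 px at 160 dpi (mdpi), so dp = px * 160 / dpi.
function pxToDp(px: number, densityDpi: number): number {
  return (px * 160) / densityDpi;
}

function isTouchTargetLargeEnough(
  widthPx: number,
  heightPx: number,
  densityDpi: number,
  minDp = MIN_TOUCH_TARGET_DP,
): boolean {
  // Both dimensions must reach the minimum, so test the smaller one.
  const smallestDp = Math.min(pxToDp(widthPx, densityDpi), pxToDp(heightPx, densityDpi));
  return smallestDp >= minDp;
}

// On a 480 dpi (xxhdpi) screen, 48 dp is 144 px.
console.log(isTouchTargetLargeEnough(144, 144, 480)); // true
console.log(isTouchTargetLargeEnough(600, 120, 480)); // false: 120 px is only 40 dp tall
</code></pre>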
<h3 id="heading-how-to-use-the-record-feature">How to Use the Record Feature</h3>
<p>If you choose Record, Accessibility Scanner takes snapshots at intervals as you move through your app’s screens. To end the recording, tap the blue pause button (which replaces the checkmark while recording).</p>
<p>Once you stop recording, Accessibility Scanner shows you the snapshots it captured, with issues highlighted. The image below is the result of a recording under a minute long that started on the Facebook login page.</p>
<p>While recording, I navigated to other pages within the app, and the recording produced 5 snapshots of the pages I visited. The snapshots are listed at the top of the page. In the image below, I’m on screen one of five. I can tap the other snapshots underneath the words “Screen 1 of 5” to see the issues found in each snapshot taken during my recording. As with a single snapshot, you can tap the red boxes to get more information about each issue.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750898542344/a390f512-262d-40c1-87ad-35e36c31def4.jpeg" alt="Facebook Log in Page with Accessibility Scanner highlighting elements with accessibility issues." class="image--center mx-auto" width="1080" height="2400" loading="lazy"></p>
<h2 id="heading-why-use-the-accessibility-scanner"><strong>Why Use the Accessibility Scanner?</strong></h2>
<p>The Accessibility Scanner is a valuable tool for teams throughout the app development lifecycle. Engineers can use it early in the process to scan the app locally, identify accessibility issues, and resolve them before release. During the QA phase, designers and product managers can use the scanner to audit user interfaces and flag potential accessibility concerns. Even after an app is in production, all teams can continue to use the scanner to monitor and improve accessibility.</p>
<p>But it’s important to note that the Accessibility Scanner is just one part of an accessibility strategy – it’s not a complete replacement for manual testing or audits. And it won’t catch all types of accessibility barriers – especially those that require keyboard navigation, screen reader testing, or cognitive usability reviews. But it is a simple and effective starting point for improving accessibility in Android apps.</p>
<p>You should use it alongside other tools, such as Android’s TalkBack for screen reader testing. Most importantly, real-world feedback from people who use assistive technologies is essential to identifying usability barriers that automated tools may miss.</p>
<p>With just a few taps, Accessibility Scanner helps surface issues that might otherwise be missed. It’s a free, lightweight, and essential tool for anyone building inclusive mobile experiences.</p>
<h2 id="heading-thanks-for-reading">Thanks for Reading!</h2>
<p>You should now know how to get started using the Accessibility Scanner to check your apps’ accessibility and make sure they’re usable by everyone.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create Accessible and User-Friendly Forms in React ]]>
                </title>
                <description>
                    <![CDATA[ When designing web applications, you’ll often be asked the age old question “How accessible is your website” and “Does it offer the best user experience?”. These are both very valid questions, but they are often overlooked in favour of rich or fancy ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-create-accessible-and-user-friendly-forms-in-react/</link>
                <guid isPermaLink="false">6810f59e8deee87383c1bc46</guid>
                
                    <category>
                        <![CDATA[ React ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Accessibility ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Tutorial ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Grant Riordan ]]>
                </dc:creator>
                <pubDate>Tue, 29 Apr 2025 15:51:58 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745789677789/c386af23-39d6-4421-9f26-f98d75a30d61.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When designing web applications, you’ll often be asked two age-old questions: “How accessible is your website?” and “Does it offer the best user experience?” Both are valid questions, but they are often overlooked in favour of rich or fancy-looking features, which reduces the site’s potential audience.</p>
<p>In this article, I’ll teach you about the React Hook Form library, HTML attributes, and development considerations to make sure your site’s available for all, focusing on:</p>
<ul>
<li><p>blind or visually impaired users, who may use a screen reader</p>
</li>
<li><p>better user feedback</p>
</li>
<li><p>visual cues for all</p>
</li>
<li><p>design considerations for all</p>
</li>
</ul>
<p>Whilst following along with this tutorial, you can either pull down the code from the GitHub repo (by visiting this <a target="_blank" href="https://github.com/grant-dot-dev/form_accessibility_ux">page</a>), or you can use the inline code snippets within the article.</p>
<h3 id="heading-pre-requisites-for-this-article"><strong>Pre-requisites for this article:</strong></h3>
<ul>
<li><p>Knowledge of React</p>
</li>
<li><p>Knowledge of writing TypeScript and HTML / JSX.</p>
</li>
<li><p>Familiarity with Tailwind CSS (not required in order to follow this tutorial)</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-the-initial-basic-form">The Initial Basic Form</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-error-handling-with-react-hook-form">Error Handling With React-Hook-Form</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-hooking-up-the-useform-methods-to-our-form">Hooking Up The useForm Methods To Our Form</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-showing-error-messages">Showing Error Messages</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-aria-required">Adding aria-required</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-fieldset-and-legend">Adding fieldset and legend</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-adding-labels-and-using-htmlfor">Adding Labels and Using htmlFor</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-do-not-rely-on-placeholders-only">Do Not Rely on Placeholders Only!</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-give-additional-information-with-aria-describedby">Give Additional Information With aria-describedBy</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-avoid-tooltips-for-critical-information">Avoid Tooltips for Critical Information</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-tell-me-something-important">Tell Me Something Important</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-focus-states-and-colouring">Focus States and Colouring</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-make-buttons-descriptive">Make Buttons Descriptive</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-final-thoughts">Final Thoughts</a></p>
</li>
</ul>
<h2 id="heading-the-initial-basic-form">The Initial Basic Form</h2>
<p>So if we take a look at the form in its current state, you may think it looks fine. But it’s actually not very accessible, nor does it offer a great user experience.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { TvIcon } <span class="hljs-keyword">from</span> <span class="hljs-string">"@heroicons/react/24/outline"</span>;

<span class="hljs-keyword">type</span> FormData = {
    fullName: <span class="hljs-built_in">string</span>;
    email: <span class="hljs-built_in">string</span>;
    password: <span class="hljs-built_in">string</span>;
    confirmPassword: <span class="hljs-built_in">string</span>;
    agreeToTerms: <span class="hljs-built_in">boolean</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> RegistrationForm = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> onSubmit = <span class="hljs-function">() =&gt;</span> {
        alert(<span class="hljs-string">`Form submitted`</span>);
    };

    <span class="hljs-keyword">return</span> (
        &lt;div className=<span class="hljs-string">"flex justify-center items-center w-screen h-screen bg-gray-900"</span>&gt;
            &lt;div className=<span class="hljs-string">"w-full max-w-md p-8 bg-black bg-opacity-75 rounded-lg"</span>&gt;
                &lt;div className=<span class="hljs-string">"flex flex-row justify-center items-center gap-x-4"</span>&gt;
                    &lt;TvIcon className=<span class="hljs-string">"h-12 w-12 text-white"</span> /&gt;
                    &lt;h1 className=<span class="hljs-string">"text-7xl font-bold text-center text-red-600 mb-4"</span>&gt;Getflix&lt;/h1&gt;
                &lt;/div&gt;
                &lt;h2 className=<span class="hljs-string">"text-3xl font-bold text-white mb-6 text-center"</span>&gt;
                    Sign Up
                &lt;/h2&gt;

                &lt;form onSubmit={onSubmit} className=<span class="hljs-string">"space-y-6"</span>&gt;

                    {<span class="hljs-comment">/* Full Name */</span>}
                    &lt;div&gt;
                        &lt;input
                            <span class="hljs-keyword">type</span>=<span class="hljs-string">"text"</span>
                            placeholder=<span class="hljs-string">"Full Name"</span>
                            className=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 "</span>
                        /&gt;

                    &lt;/div&gt;

                    {<span class="hljs-comment">/* Email */</span>}
                    &lt;div&gt;
                        &lt;input
                            <span class="hljs-keyword">type</span>=<span class="hljs-string">"email"</span>
                            placeholder=<span class="hljs-string">"Email Address"</span>
                            className=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 "</span>
                        /&gt;

                    &lt;/div&gt;

                    {<span class="hljs-comment">/* Password */</span>}
                    &lt;div&gt;
                        &lt;input
                            <span class="hljs-keyword">type</span>=<span class="hljs-string">"password"</span>
                            placeholder=<span class="hljs-string">"Password"</span>
                            className=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400"</span>
                        /&gt;

                    &lt;/div&gt;

                    {<span class="hljs-comment">/* Confirm Password */</span>}
                    &lt;div&gt;
                        &lt;input
                            <span class="hljs-keyword">type</span>=<span class="hljs-string">"password"</span>
                            placeholder=<span class="hljs-string">"Confirm Password"</span>
                            className=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 "</span>
                        /&gt;

                    &lt;/div&gt;

                    {<span class="hljs-comment">/* Agree to Terms */</span>}
                    &lt;div className=<span class="hljs-string">"flex items-center text-gray-400 text-sm"</span>&gt;
                        &lt;input

                            <span class="hljs-keyword">type</span>=<span class="hljs-string">"checkbox"</span>
                            id=<span class="hljs-string">"agreeToTerms"</span>
                            className=<span class="hljs-string">"mr-2"</span>
                        /&gt;
                        &lt;label htmlFor=<span class="hljs-string">"agreeToTerms"</span> className=<span class="hljs-string">"select-none"</span>&gt;
                            I agree to the Terms and Conditions
                        &lt;/label&gt;
                    &lt;/div&gt;


                    {<span class="hljs-comment">/* Submit */</span>}
                    &lt;button
                        <span class="hljs-keyword">type</span>=<span class="hljs-string">"submit"</span>
                        className=<span class="hljs-string">"w-full py-3 bg-red-600 hover:bg-red-700 text-white rounded font-semibold transition"</span>
                    &gt;
                        Sign Up
                    &lt;/button&gt;


                &lt;/form&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    );
};
</code></pre>
<h3 id="heading-whats-wrong-with-the-form">What’s Wrong With The Form?</h3>
<ul>
<li><p>Lack of action feedback – without feedback, users can become confused about whether an action has happened. Without error messages, users get no insight into what they need to do to correct the form.</p>
</li>
<li><p>No labels for form inputs – without labels, screen readers can’t convey each input’s purpose. Some screen readers may skip placeholders. Once a user types into an input, the placeholder disappears. That loses context and makes it hard to find and fix erroneous inputs.</p>
</li>
<li><p>Lack of accessibility markup to make the form optimised for screen readers and accessibility tools.</p>
</li>
</ul>
<p>So how do we make this better? Let’s jump right in.</p>
<h2 id="heading-error-handling-with-react-hook-form">Error Handling With React-Hook-Form</h2>
<p>Error handling on forms is a critical aspect of any form submission flow. Without it, the process becomes both chaotic and frustrating for the user. We can alleviate this frustration by adding some useful error messages which explain the issues.</p>
<p>A popular library for working with forms in React is the <code>react-hook-form</code> library. It’s used by over 1.4 million people according to their GitHub statistics.</p>
<p>Go ahead and install it if you don’t have it already:</p>
<pre><code class="lang-bash">npm install react-hook-form
</code></pre>
<p>We’ll then use the <code>useForm()</code> hook from the <code>react-hook-form</code> package to get the basic functions we need, like so:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { useForm } <span class="hljs-keyword">from</span> <span class="hljs-string">"react-hook-form"</span>;

<span class="hljs-comment">// define our type structure to use within the form</span>
<span class="hljs-keyword">type</span> FormData = {
    fullName: <span class="hljs-built_in">string</span>;
    email: <span class="hljs-built_in">string</span>;
    password: <span class="hljs-built_in">string</span>;
    confirmPassword: <span class="hljs-built_in">string</span>;
    agreeToTerms: <span class="hljs-built_in">boolean</span>;
};

<span class="hljs-comment">// basic usage of `useForm()`, called inside your component</span>
<span class="hljs-comment">// (the generic parameter is the FormData type defined above)</span>
<span class="hljs-keyword">const</span> {
    register,
    handleSubmit,
    watch,
    formState: { errors },
} = useForm&lt;FormData&gt;();
</code></pre>
<p><strong>Quick Explanation:</strong></p>
<ul>
<li><p><code>register</code>: One of the key concepts in React Hook Form is to “register” your component / HTML element. This lets React Hook Form access the element’s value, both for form validation and when submitting the form.</p>
</li>
<li><p><code>handleSubmit</code>: This is the key function needed to submit the form, run validation, and any other configured checks. It can take up to two arguments:</p>
<ol>
<li><p><code>handleSubmit(onSuccess)</code> – <code>onSuccess</code> is called with the form data when the form passes validation.</p>
</li>
<li><p><code>handleSubmit(onSuccess, onFail)</code> – here you pass the <code>handleSubmit()</code> method two functions. The first runs when React Hook Form deems the form to be valid, and allows you to continue. The second runs when the form has errors, such as a failed validation rule.</p>
</li>
</ol>
</li>
<li><p><code>watch</code>: Watch is a function that monitors a specified input for changes and returns its value. For instance, if you’re watching an input element, you can display what the user types in real time, or validate another field against its value. A good example is a “confirm password” field that must match the password field.</p>
</li>
<li><p><code>formState</code>: this is an object which holds information about your form. The <code>formState</code> object keeps track of the state of the form, like:</p>
<ol>
<li><p><strong>isDirty</strong> – <code>true</code> if the user has changed <em>any</em> input.</p>
</li>
<li><p><strong>isValid</strong> – <code>true</code> if the form passes all validations.</p>
</li>
<li><p><strong>errors</strong> – an object holding any validation errors per field.</p>
</li>
<li><p><strong>isSubmitting</strong> – <code>true</code> while the form is being submitted (useful for showing loading spinners)</p>
</li>
<li><p><strong>isSubmitted</strong> – <code>true</code> after the form has been submitted.</p>
</li>
<li><p><strong>touchedFields</strong> – which fields the user has interacted with.</p>
</li>
<li><p><strong>dirtyFields</strong> – which fields the user has modified.</p>
</li>
</ol>
</li>
</ul>
<p>We can use any of these properties by destructuring them from the <code>formState</code> object. Here we destructure the <code>errors</code> property so we can use it later in our form, either to show error messages or to check that there are no errors on the page.</p>
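<p>A framework-free TypeScript sketch makes the roles of <code>handleSubmit</code>, <code>watch</code>, and <code>errors</code> more concrete. <code>miniHandleSubmit</code>, <code>required</code>, <code>pattern</code>, and <code>matches</code> are hypothetical stand-ins, loosely modelled on the options you pass to <code>register()</code>. This is not React Hook Form’s implementation. It only illustrates the control flow: validate every field, then call either the success callback or the failure callback, passing an <code>errors</code> object keyed by field name.</p>
<pre><code class="lang-typescript">// Hypothetical, simplified stand-ins; NOT React Hook Form's own code.
type Values = { [field: string]: string };
type FieldError = { type: string; message: string };
type FieldErrors = { [field: string]: FieldError };
type Rule = (value: string, values: Values) => FieldError | undefined;

// Loosely modelled on the `required` option passed to register().
const required = (message: string): Rule => (value) =>
  value.trim() === "" ? { type: "required", message } : undefined;

// Loosely modelled on `pattern`; empty values are left to the required rule.
const pattern = (regex: RegExp, message: string): Rule => (value) =>
  value === "" || regex.test(value) ? undefined : { type: "pattern", message };

// Compares against another field's current value, much like validate + watch("password").
const matches = (other: string, message: string): Rule => (value, values) =>
  value === values[other] ? undefined : { type: "validate", message };

function miniHandleSubmit(
  rules: { [field: string]: Rule[] },
  onSuccess: (values: Values) => void,
  onFail?: (errors: FieldErrors) => void,
): (values: Values) => void {
  return (values) => {
    const errors: FieldErrors = {};
    for (const field of Object.keys(rules)) {
      // Keep only the first failing rule, so each field has a single message.
      for (const rule of rules[field]) {
        const error = rule(values[field] ?? "", values);
        if (error) {
          errors[field] = error;
          break;
        }
      }
    }
    if (Object.keys(errors).length === 0) {
      onSuccess(values);
    } else if (onFail) {
      onFail(errors);
    }
  };
}

// Example: rules mirroring the registration form's email and password fields.
const submit = miniHandleSubmit(
  {
    email: [required("Email is required"), pattern(/^\S+@\S+$/i, "Invalid email address")],
    password: [required("Please enter your password")],
    confirmPassword: [
      required("Please enter your password"),
      matches("password", "Passwords do not match"),
    ],
  },
  (values) => console.log("Form submitted", values),
  (errors) => console.log("Form has errors", errors),
);

// Calls the failure callback with errors for email (pattern) and confirmPassword (validate).
submit({ email: "not-an-email", password: "secret1", confirmPassword: "secret2" });
</code></pre>
<p>Each entry in the resulting <code>errors</code> object carries a <code>message</code>. That is why a form can render <code>errors.email.message</code> directly underneath the matching input.</p>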
<h2 id="heading-hooking-up-the-useform-methods-to-our-form">Hooking Up the <code>useForm</code> Methods to Our Form</h2>
<p>Now that we know more about the <code>useForm()</code> method and react-hook-form, we need to integrate this with our existing <code>&lt;form/&gt;</code> element. Doing so will allow us to use all the react-hook-form features we’ve discussed so far in our form.</p>
<pre><code class="lang-typescript">import { TvIcon } from "@heroicons/react/24/outline";
import { useState } from "react";
import { useForm } from "react-hook-form";

type FormData = {
    fullName: string;
    email: string;
    password: string;
    confirmPassword: string;
    agreeToTerms: boolean;
};

export const RegistrationForm = () =&gt; {
    const {
        register,
        handleSubmit,
        formState: { errors },
        watch,
    } = useForm<span class="hljs-tag">&lt;<span class="hljs-name">FormData</span>&gt;</span>();

    const onSubmit = () =&gt; {
        alert(`Form submitted`);
    };

    return (
        <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"flex justify-center items-center w-screen h-screen bg-gray-900"</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full max-w-md p-8 bg-black bg-opacity-75 rounded-lg"</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"flex flex-row justify-center items-center gap-x-4"</span>&gt;</span>
                    <span class="hljs-tag">&lt;<span class="hljs-name">TvIcon</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"h-12 w-12 text-red-500"</span> /&gt;</span>
                    <span class="hljs-tag">&lt;<span class="hljs-name">h1</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-7xl font-bold text-center text-white mb-4"</span>&gt;</span>Getflix<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
                <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">h2</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-3xl font-bold text-white mb-6 text-center"</span>&gt;</span>
                    Sign Up
                <span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>


                <span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">onSubmit</span>=<span class="hljs-string">{handleSubmit(onSubmit)}</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"space-y-6"</span>&gt;</span>

                    {/* Full Name */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                            {<span class="hljs-attr">...register</span>("<span class="hljs-attr">fullName</span>", {
                                <span class="hljs-attr">required:</span> "<span class="hljs-attr">Full</span> <span class="hljs-attr">Name</span> <span class="hljs-attr">is</span> <span class="hljs-attr">required</span>"
                            })}
                            <span class="hljs-attr">aria-required</span>
                            <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span>
                            <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Full name"</span>
                            <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
                        /&gt;</span>
                        {errors.fullName &amp;&amp; (
                            <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.fullName.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                        )}
                    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

                    {/* Email */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                            {<span class="hljs-attr">...register</span>("<span class="hljs-attr">email</span>", {
                                <span class="hljs-attr">required:</span> "<span class="hljs-attr">Email</span> <span class="hljs-attr">is</span> <span class="hljs-attr">required</span>",
                                <span class="hljs-attr">pattern:</span> {
                                    <span class="hljs-attr">value:</span> /^\<span class="hljs-attr">S</span>+@\<span class="hljs-attr">S</span>+$/<span class="hljs-attr">i</span>,
                                    <span class="hljs-attr">message:</span> "<span class="hljs-attr">Invalid</span> <span class="hljs-attr">email</span> <span class="hljs-attr">address</span>",
                                },
                            })}
                            <span class="hljs-attr">type</span>=<span class="hljs-string">"email"</span>
                            <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Email Address"</span>
                            <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
                        /&gt;</span>
                        {errors.email &amp;&amp; (
                            <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.email.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                        )}

                    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

                    {/* Password */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                            {<span class="hljs-attr">...register</span>("<span class="hljs-attr">password</span>", {
                                <span class="hljs-attr">required:</span> "<span class="hljs-attr">Please</span> <span class="hljs-attr">enter</span> <span class="hljs-attr">your</span> <span class="hljs-attr">password</span>",
                            })}
                            <span class="hljs-attr">type</span>=<span class="hljs-string">"password"</span>
                            <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Password"</span>
                            <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
                        /&gt;</span>
                        {errors.password &amp;&amp; (
                            <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.password.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                        )}
                    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

                    {/* Confirm Password */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                            {<span class="hljs-attr">...register</span>("<span class="hljs-attr">confirmPassword</span>", {
                                <span class="hljs-attr">required:</span> "<span class="hljs-attr">Please</span> <span class="hljs-attr">confirm</span> <span class="hljs-attr">your</span> <span class="hljs-attr">password</span>",
                                <span class="hljs-attr">validate:</span> (<span class="hljs-attr">value</span>) =&gt;</span>
                                    value === watch("password") || "Passwords do not match",
                            })}
                            type="password"
                            placeholder="Confirm Password"
                            className="w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"
                        /&gt;
                        {errors.confirmPassword &amp;&amp; (
                            <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.confirmPassword.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                        )}
                    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

                    {/* Agree to Terms */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"flex items-center text-gray-400 text-sm"</span>&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
                            {<span class="hljs-attr">...register</span>("<span class="hljs-attr">agreeToTerms</span>", {
                                <span class="hljs-attr">required:</span> "<span class="hljs-attr">You</span> <span class="hljs-attr">must</span> <span class="hljs-attr">agree</span> <span class="hljs-attr">to</span> <span class="hljs-attr">the</span> <span class="hljs-attr">terms</span> <span class="hljs-attr">and</span> <span class="hljs-attr">conditions</span>"
                            })}
                            <span class="hljs-attr">type</span>=<span class="hljs-string">"checkbox"</span>
                            <span class="hljs-attr">id</span>=<span class="hljs-string">"agreeToTerms"</span>
                            <span class="hljs-attr">className</span>=<span class="hljs-string">"mr-2"</span>
                        /&gt;</span>
                        <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"select-none"</span>&gt;</span>
                            I agree to the Terms and Conditions
                        <span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>

                    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
                    {errors.agreeToTerms &amp;&amp; (
                        <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.agreeToTerms.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                    )}


                    {/* Submit */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">button</span>
                        <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>
                        <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full py-3 bg-red-600 hover:bg-red-700 text-white rounded font-semibold transition"</span>
                    &gt;</span>
                        Sign Up
                    <span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>

                    {/* Already have account */}
                    <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-center text-gray-400 text-sm mt-4"</span>&gt;</span>
                        Already have an account?{" "}
                        <span class="hljs-tag">&lt;<span class="hljs-name">a</span> <span class="hljs-attr">href</span>=<span class="hljs-string">"#"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 hover:underline"</span>&gt;</span>
                            Sign In
                        <span class="hljs-tag">&lt;/<span class="hljs-name">a</span>&gt;</span>
                    <span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
                <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">div</span> &gt;</span>
    );
};
</code></pre>
<p>So in the updated form code, we’ve made a few adjustments:</p>
<h3 id="heading-registered-each-our-elements">Registering Each of Our Elements</h3>
<p>For each of our elements, we’ve spread the props returned by <code>register()</code> onto the input and passed it some validation rules.</p>
<p>We added the <strong>required</strong> rule to every input field. It checks whether the field has a value; if it doesn't, React Hook Form adds an entry to the <code>errors</code> object under the field's registered name, using the <em>required</em> message we provided.</p>
<pre><code class="lang-typescript"> {...register(<span class="hljs-string">"fullName"</span>, {
    required: <span class="hljs-string">"Full Name is required"</span>
  })}
</code></pre>
<p>We’ve also added a <code>pattern</code> rule to the email’s <code>register</code> options. This lets us specify a regular expression the value must match, which is useful for emails, passwords, and other inputs with format restrictions or requirements.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// valid email pattern</span>
pattern: {
    value: <span class="hljs-regexp">/^\S+@\S+$/i</span>,
    message: <span class="hljs-string">"Invalid email address"</span>,
},
</code></pre>
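<p>We have also added the <code>validate</code> rule to the confirm password element. It takes a custom function that receives the field's current value and returns <code>true</code> if the value is valid, or an error message string if it isn't. Here it compares the value with the current password, read using <code>watch("password")</code>.</p>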
<pre><code class="lang-typescript">validate: <span class="hljs-function">(<span class="hljs-params">value</span>) =&gt;</span> value === watch(<span class="hljs-string">"password"</span>) || <span class="hljs-string">"Passwords do not match"</span>
</code></pre>
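<p>To make the behaviour of these three rules concrete, here's a minimal framework-free sketch in TypeScript. It is not React Hook Form's implementation: <code>FieldRules</code>, <code>validateField</code>, and <code>currentPassword</code> are hypothetical names, and the sketch assumes string inputs and returns only the first failing rule's message, roughly like the library's default behaviour.</p>
<pre><code class="lang-typescript">// Hypothetical, framework-free model of three React Hook Form-style rules.
interface FieldRules {
  required?: string; // message to show when the value is empty
  pattern?: { value: RegExp; message: string }; // value must match the regex
  validate?: (value: string) =&gt; true | string; // true = valid, string = error
}

// Returns the first failing rule's message, or undefined if the value passes.
function validateField(value: string, rules: FieldRules): string | undefined {
  if (rules.required &amp;&amp; value === "") {
    return rules.required;
  }
  // An empty optional field is not checked against the pattern.
  if (rules.pattern &amp;&amp; value !== "" &amp;&amp; !rules.pattern.value.test(value)) {
    return rules.pattern.message;
  }
  if (rules.validate) {
    const result = rules.validate(value);
    if (result !== true) {
      return result;
    }
  }
  return undefined;
}

const emailRules: FieldRules = {
  required: "Email is required",
  pattern: { value: /^\S+@\S+$/i, message: "Invalid email address" },
};

// Stand-in for watch("password"): the current value of the password field.
const currentPassword = "correct-horse";
const confirmRules: FieldRules = {
  required: "Please confirm your password",
  validate: (value) =&gt; value === currentPassword || "Passwords do not match",
};

console.log(validateField("", emailRules)); // → Email is required
console.log(validateField("grant", emailRules)); // → Invalid email address
console.log(validateField("grant@example.com", emailRules)); // → undefined
console.log(validateField("correct-hose", confirmRules)); // → Passwords do not match
</code></pre>
<p>Note that the email pattern is deliberately loose: it only checks for non-space characters on either side of an <code>@</code>.</p>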
<p>The <code>validate</code> function inside <code>register</code> runs <strong>automatically</strong>, at the moments set by the <code>mode</code> option passed to <code>useForm()</code>.</p>
<p>By default (if you don't specify <code>mode</code>), React Hook Form uses <code>"onSubmit"</code>. After the first submission, fields are re-validated according to <code>reValidateMode</code>, which defaults to <code>"onChange"</code>. This means that:</p>
<ul>
<li><p>When the user submits the form → it triggers <code>validate</code>.</p>
</li>
<li><p>After that first submission, when the user types into the input → it triggers <code>validate</code> again.</p>
</li>
</ul>
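<p>You can change when validation first runs with the <code>mode</code> option in <code>useForm()</code>. Besides <code>"onSubmit"</code>, it accepts <code>"onBlur"</code>, <code>"onChange"</code>, <code>"onTouched"</code>, and <code>"all"</code>. For example, to set the form-wide mode explicitly:</p>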
<pre><code class="lang-typescript"> <span class="hljs-keyword">const</span> { register, handleSubmit, formState, trigger } = useForm({
    mode: <span class="hljs-string">"onSubmit"</span>,
  });
</code></pre>
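<p>It can help to have a mental model of how <code>mode</code> interacts with <code>reValidateMode</code>, the option that controls re-validation after the first submission (its documented default is <code>"onChange"</code>). The sketch below is a simplified model written for illustration, not React Hook Form's source: <code>shouldValidateOnEvent</code> and its parameters are hypothetical, and it only answers whether a change or blur event on a field would run that field's rules. Submitting the form always validates.</p>
<pre><code class="lang-typescript">type Mode = "onSubmit" | "onBlur" | "onChange" | "onTouched" | "all";
type ReValidateMode = "onSubmit" | "onBlur" | "onChange";
type FieldEvent = "change" | "blur";

interface FieldStatus {
  isSubmitted: boolean; // has the form been submitted at least once?
  isTouched: boolean; // has this field been blurred at least once?
}

// Simplified model (hypothetical, not library code) of when a change or blur
// event on a field runs its validation rules.
function shouldValidateOnEvent(
  event: FieldEvent,
  status: FieldStatus,
  mode: Mode = "onSubmit",
  reValidateMode: ReValidateMode = "onChange",
): boolean {
  if (mode === "all") {
    return true; // validate on every change and every blur
  }
  // Before the first submission `mode` decides; afterwards `reValidateMode` does.
  const active = status.isSubmitted ? reValidateMode : mode;
  if (active === "onTouched") {
    return status.isTouched || event === "blur"; // first blur, then every change
  }
  if (active === "onBlur") {
    return event === "blur";
  }
  if (active === "onChange") {
    return event === "change";
  }
  return false; // "onSubmit": only the submit handler validates
}

const fresh: FieldStatus = { isSubmitted: false, isTouched: false };
const afterSubmit: FieldStatus = { isSubmitted: true, isTouched: true };

console.log(shouldValidateOnEvent("change", fresh)); // → false (default "onSubmit")
console.log(shouldValidateOnEvent("change", afterSubmit)); // → true (reValidateMode "onChange")
console.log(shouldValidateOnEvent("blur", fresh, "onBlur")); // → true
</code></pre>
<p>With the defaults, typing into a field does nothing until the first submit; after that, every keystroke re-validates it, so errors disappear as soon as the user fixes them.</p>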
<p>If you want a particular field to validate at an extra moment, on top of the form-wide <code>mode</code> you just set, you can call the <code>trigger()</code> method returned by <code>useForm</code>. One way is to pass an <code>onBlur</code> callback in that field's <code>register</code> options, which keeps React Hook Form's own blur handler in place (an <code>onBlur</code> prop written after the <code>register</code> spread would replace it):</p>
<pre><code class="lang-xml">&lt;input
  {...register("email", {
    required: "Email is required",
    // validate this field manually whenever it loses focus
    onBlur: () =&gt; trigger("email"),
  })}
/&gt;
</code></pre>
<p>This keeps <code>onSubmit</code> as the form-wide <code>mode</code>, while the email field is also validated every time it loses focus.</p>
<p>Just adding these simple settings within the react-hook-form library already gives us a much better user experience than before – but it isn’t everything. Let’s explore more settings, HTML, and attributes we can add to increase accessibility and user experience.</p>
<h2 id="heading-showing-error-messages">Showing Error Messages</h2>
<p>Form errors are stored in the <code>formState</code> object we mentioned earlier, but they’re no use to anyone sitting there: we need to display them to our users. We can do this by reading from the destructured <code>errors</code> object, like below:</p>
<pre><code class="lang-xml">{errors.password &amp;&amp; (
    <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>{errors.password.message}<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
)}
</code></pre>
<p>The code uses conditional rendering to show the <code>&lt;p&gt;</code> tag only when <code>errors.password</code> exists, which means the <code>useForm()</code> checks found a problem with the password field. We then display the message from <code>errors.password.message</code> in a colour commonly used for errors, like red, to highlight the problem. The same pattern can be applied to every other input field, as in the code above.</p>
<h2 id="heading-adding-aria-required">Adding <code>aria-required</code></h2>
<p>So we’ve told the form that certain elements are required and should be checked on submission. But this alone doesn’t tell visually impaired users that a field is required.</p>
<p>To help screen reader users, we can add the <code>aria-required</code> attribute to the element. When the screen reader announces the input, it will then tell the user that a value is required for the form to submit successfully. In JSX, writing <code>aria-required</code> on its own is shorthand for <code>aria-required={true}</code>, which React renders as <code>aria-required="true"</code>.</p>
<pre><code class="lang-xml"> <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
    {<span class="hljs-attr">...register</span>("<span class="hljs-attr">fullName</span>", {
        <span class="hljs-attr">required:</span> "<span class="hljs-attr">Full</span> <span class="hljs-attr">Name</span> <span class="hljs-attr">is</span> <span class="hljs-attr">required</span>"
    })}
    <span class="hljs-attr">aria-required</span>
    <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span>
    <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Full name"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
/&gt;</span>
</code></pre>
<h2 id="heading-adding-fieldset-and-legend">Adding <code>fieldset</code> and <code>legend</code></h2>
<p>Fieldset elements group <code>&lt;form&gt;</code> controls together, while legend elements provide a description for the grouped controls.</p>
<p>Imagine you have one big form that spans two "sections": for example, a "<em>User Details</em>" section for username, email, and passwords, and an "<em>Address Details</em>" section asking for your shipping and billing information. Wrapping each section in a <code>&lt;fieldset&gt;</code> with its own <code>&lt;legend&gt;</code> tells assistive technology which section each control belongs to.</p>
<p>In this tutorial, we’re using TailwindCSS, which provides a utility class called <code>sr-only</code>. You can apply <code>sr-only</code> to your legends so they are hidden visually but still available to screen readers.</p>
<p>This way, the legend will be read aloud when users navigate into a section of the form, making it clear which part of the form they are interacting with.</p>
<p><strong>Important Note:</strong> A <code>&lt;legend&gt;</code> must be placed inside a <code>&lt;fieldset&gt;</code>, as its first child, for your HTML to be valid and accessible.</p>
<p>Here's a short standalone example, separate from our sign-up form:</p>
<pre><code class="lang-xml">
  <span class="hljs-tag">&lt;<span class="hljs-name">fieldset</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">legend</span>&gt;</span>Payment Method<span class="hljs-tag">&lt;/<span class="hljs-name">legend</span>&gt;</span>    
    <span class="hljs-tag">&lt;<span class="hljs-name">label</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"radio"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"payment"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"card"</span> /&gt;</span>
      Credit Card
    <span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>

    <span class="hljs-tag">&lt;<span class="hljs-name">label</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"radio"</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"payment"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"paypal"</span> /&gt;</span>
      PayPal
    <span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">fieldset</span>&gt;</span>
</code></pre>
<p>You can see that the payment option inputs have been grouped within a fieldset, and then described by the <code>legend</code> element, informing the user that these elements relate to “<em>Payment Method</em>”. You as the developer can then decide whether to show the legend to everyone, or hide it visually with <code>sr-only</code> so that only screen reader users get it.</p>
<p>A screen reader user would hear something like the following (the exact wording varies between screen readers):</p>
<blockquote>
<p>"Group: Payment Method. Credit Card radio button. PayPal radio button."</p>
</blockquote>
<h2 id="heading-do-not-rely-on-placeholders-only">Do Not Rely on Placeholders Only!</h2>
<p>Placeholders are a great way to hint at what an input is for and show helpful information. But on their own they aren’t very user-friendly, especially for screen reader users.</p>
<p>The main reasons for this are:</p>
<ul>
<li><p>Placeholders disappear as soon as the user starts typing. If a user types “<em>Grant</em>”, tabs away, and later comes back to the input, nothing on screen says what the field is for, and without a label a screen reader may only read out the value, not what it relates to.</p>
</li>
<li><p>Developers often style placeholders in a light grey, low-contrast colour. This can make them hard to see, especially for users who are colour blind or visually impaired.</p>
</li>
</ul>
<p>So what can we do instead? This leads to our next point: use a standard HTML element, the <code>&lt;label&gt;</code>.</p>
<h2 id="heading-adding-labels-and-using-htmlfor">Adding Labels and Using <code>htmlFor</code></h2>
<p>Another feature we can add to boost accessibility and user experience for everyone is the <code>&lt;label&gt;</code> element, combined with the <code>htmlFor</code> attribute.</p>
<p>Labels are important for both sighted and visually impaired users. They make it clear what each input is for, and screen reader users rely on them to navigate the form.</p>
<p>The <code>htmlFor</code> attribute links a <code>&lt;label&gt;</code> to its input: its value must match the input's <code>id</code>. It's React's name for the HTML <code>for</code> attribute, because <code>for</code> is a reserved word in JavaScript.</p>
<p><strong>Note:</strong> <em>in HTML, the <code>for</code> attribute (<code>htmlFor</code> in React) is only valid on <code>&lt;label&gt;</code> and <code>&lt;output&gt;</code> elements.</em></p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">htmlFor</span>=<span class="hljs-string">"fullname"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-white"</span>&gt;</span>Full Name<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">input</span>
    {<span class="hljs-attr">...register</span>("<span class="hljs-attr">fullName</span>", {
        <span class="hljs-attr">required:</span> "<span class="hljs-attr">Full</span> <span class="hljs-attr">Name</span> <span class="hljs-attr">is</span> <span class="hljs-attr">required</span>"
    })}
    <span class="hljs-attr">id</span>=<span class="hljs-string">"fullname"</span>
    <span class="hljs-attr">aria-required</span>
    <span class="hljs-attr">type</span>=<span class="hljs-string">"text"</span>
    <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Full name"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
/&gt;</span>
</code></pre>
<p>Why this is important for accessibility:</p>
<h4 id="heading-1-screen-readers">1. Screen readers:</h4>
<p>When a screen reader lands on the <code>&lt;input&gt;</code>, it automatically reads the associated label ("Full Name"). Even if the label is not visually right next to the input, the screen reader still knows which text describes the input, giving you some freedom when designing your forms.</p>
<h4 id="heading-2-click-behaviour">2. Click behaviour:</h4>
<p>With <code>htmlFor</code> in place, clicking the <code>&lt;label&gt;</code> focuses the <code>&lt;input&gt;</code>, and for checkboxes and radio buttons it also toggles them.</p>
<p>Users don’t have to click exactly on a tiny input, which is especially useful for checkboxes and radio buttons.</p>
<p>In short, big click targets = better usability and faster form filling.</p>
<p>This is also very helpful for mobile users where precision tapping is hard, especially on smaller screens.</p>
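<p>Because the link between a label and its input depends entirely on <code>htmlFor</code> matching an <code>id</code> exactly (ids are case-sensitive), it can be worth checking. Here's a minimal sketch of such a check in plain TypeScript. The <code>LabelInfo</code> and <code>InputInfo</code> shapes and the <code>auditLabels</code> function are hypothetical; in a real app you would collect the same data from your rendered form or a test.</p>
<pre><code class="lang-typescript">// Hypothetical shapes: just the props that matter for label/input pairing.
interface LabelInfo {
  htmlFor: string; // must equal the id of the input it labels
  text: string;
}

interface InputInfo {
  id: string;
  name: string; // the name passed to register(); independent of id
}

interface LabelAudit {
  unlabelledInputs: string[]; // inputs that no label points at
  orphanLabels: string[]; // labels whose htmlFor matches no input id
}

function auditLabels(labels: LabelInfo[], inputs: InputInfo[]): LabelAudit {
  const labelTargets = new Set(labels.map((label) =&gt; label.htmlFor));
  const inputIds = new Set(inputs.map((input) =&gt; input.id));
  return {
    unlabelledInputs: inputs
      .filter((input) =&gt; !labelTargets.has(input.id))
      .map((input) =&gt; input.name),
    orphanLabels: labels
      .filter((label) =&gt; !inputIds.has(label.htmlFor))
      .map((label) =&gt; label.text),
  };
}

// "Email" and "email" differ only in case, so they do not match.
const audit = auditLabels(
  [
    { htmlFor: "fullname", text: "Full Name" },
    { htmlFor: "Email", text: "Email" },
  ],
  [
    { id: "fullname", name: "fullName" },
    { id: "email", name: "email" },
  ],
);

console.log(audit.unlabelledInputs); // → [ 'email' ]
console.log(audit.orphanLabels); // → [ 'Email' ]
</code></pre>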
<h2 id="heading-give-additional-information-with-aria-describedby"><strong>Give Additional Information With</strong> <code>aria-describedby</code></h2>
<p>Now that we’ve added clear labels to our form fields, we can take accessibility a step further by providing additional guidance for users when errors occur. By using <code>aria-describedby</code> and <code>aria-invalid</code>, we can link helpful error messages to the input fields and ensure screen readers communicate validation issues clearly. Let’s look at how to implement this:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">label</span> <span class="hljs-attr">htmlFor</span>=<span class="hljs-string">"email"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-white"</span>&gt;</span>Email<span class="hljs-tag">&lt;/<span class="hljs-name">label</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">input</span>
    {<span class="hljs-attr">...register</span>("<span class="hljs-attr">email</span>", {
          <span class="hljs-attr">required:</span> "<span class="hljs-attr">You</span> <span class="hljs-attr">must</span> <span class="hljs-attr">enter</span> <span class="hljs-attr">an</span> <span class="hljs-attr">email</span> <span class="hljs-attr">address</span>",
      <span class="hljs-attr">pattern:</span> {
        <span class="hljs-attr">value:</span> /^\<span class="hljs-attr">S</span>+@\<span class="hljs-attr">S</span>+$/<span class="hljs-attr">i</span>,
        <span class="hljs-attr">message:</span> "<span class="hljs-attr">Invalid</span> <span class="hljs-attr">email</span> <span class="hljs-attr">address</span>",
      },
    })}
    <span class="hljs-attr">id</span>=<span class="hljs-string">"email"</span>
    <span class="hljs-attr">type</span>=<span class="hljs-string">"email"</span>
    <span class="hljs-attr">aria-invalid</span>=<span class="hljs-string">{errors.email</span> ? "<span class="hljs-attr">true</span>" <span class="hljs-attr">:</span> "<span class="hljs-attr">false</span>"}
    <span class="hljs-attr">aria-describedby</span>=<span class="hljs-string">{errors.email</span> ? "<span class="hljs-attr">email-error</span>" <span class="hljs-attr">:</span> <span class="hljs-attr">undefined</span>}
    <span class="hljs-attr">placeholder</span>=<span class="hljs-string">"Enter your email address"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full p-3 rounded bg-gray-700 text-white placeholder-gray-400 focus:outline-none focus:ring-2 focus:ring-red-500"</span>
  /&gt;</span>
  {errors.email &amp;&amp; (
    <span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"email-error"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-red-500 text-sm mt-1"</span>&gt;</span>
      {errors.email.message}
    <span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
  )}
<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
</code></pre>
<p>Notice the two new attributes we’ve added:</p>
<ul>
<li><p><code>aria-describedby</code>: this attribute links the error message to our input, so screen readers read out the error message along with the input's other information when it is focused.</p>
</li>
<li><p><code>aria-invalid</code>: this attribute tells screen reader users that the input’s value is invalid and needs correcting. Combined with <code>aria-describedby</code>, it gives visually impaired users all the information they need to fix their mistake.</p>
</li>
</ul>
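<p>Repeating these two ternaries on every field gets noisy. One option is a small helper that derives both attributes from the field name and whether the field has an error. This is a sketch under assumptions: <code>getErrorAriaProps</code> is a hypothetical helper, and it assumes each error paragraph follows the <code>{fieldName}-error</code> id convention used for <code>email-error</code> in the example above.</p>
<pre><code class="lang-typescript">// The two ARIA attributes from the email example, as a props object.
interface ErrorAriaProps {
  "aria-invalid": "true" | "false";
  "aria-describedby"?: string;
}

// Hypothetical helper: aria-describedby is only set while the error
// element (id `${fieldName}-error`) is actually rendered.
function getErrorAriaProps(fieldName: string, hasError: boolean): ErrorAriaProps {
  if (!hasError) {
    return { "aria-invalid": "false" };
  }
  return {
    "aria-invalid": "true",
    "aria-describedby": `${fieldName}-error`,
  };
}

console.log(getErrorAriaProps("email", true));
console.log(getErrorAriaProps("email", false));
</code></pre>
<p>In the JSX you could then spread the result onto the input after the <code>register</code> spread, for example <code>{...getErrorAriaProps("email", Boolean(errors.email))}</code>.</p>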
<h2 id="heading-avoid-tooltips-for-critical-information">Avoid Tooltips for Critical Information</h2>
<p>When developing your form, try to avoid putting critical information in tooltips (small pop-ups that appear when you hover over another element for a moment, like the one below).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745685939025/ce427cf1-ef44-4021-b8d3-0095e7a091c6.png" alt="Example of a tooltip showing text that appears when a user hovers over a term." class="image--center mx-auto" width="824" height="228" loading="lazy"></p>
<p>The problems with using tooltips are:</p>
<ol>
<li><p>They often require <strong>mouse hover</strong>, which doesn't work on touch devices (for example mobile phones, or tablets).</p>
</li>
<li><p>Screen readers don’t announce them reliably unless the right ARIA attributes are added.</p>
</li>
<li><p>They can disappear too quickly to read.</p>
</li>
</ol>
<p>Instead, we can use inline helper text or descriptions combined with <code>aria-describedby</code> like below:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"passwordHint"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"text-xs text-gray-500"</span>&gt;</span>
  Must be at least 8 characters and include a number.
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
</code></pre>
<p>We can then reference this from our input with <code>aria-describedby</code>. But wait, we already have <code>aria-describedby</code> pointing at the error message! That’s fine: the attribute accepts a space-separated list of ids, so we can link multiple elements, as in the brief example below:</p>
<pre><code class="lang-xml">{/* references both passwordHint and passwordError: multiple ids are separated by a space */}
<span class="hljs-tag">&lt;<span class="hljs-name">input</span> 
  <span class="hljs-attr">id</span>=<span class="hljs-string">"password"</span>
  <span class="hljs-attr">aria-describedby</span>=<span class="hljs-string">"passwordHint passwordError"</span>
/&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"passwordHint"</span>&gt;</span>
  Must be at least 8 characters long.
<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">id</span>=<span class="hljs-string">"passwordError"</span>&gt;</span>
  Passwords do not match!
<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
</code></pre>
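<p>In React, the error id should only appear in that list while the error element is rendered; otherwise <code>aria-describedby</code> points at an id that doesn't exist. Here's a minimal sketch of a helper that builds the value from whichever ids are currently present (<code>describedBy</code> is a hypothetical name, not part of React or React Hook Form):</p>
<pre><code class="lang-typescript">// Hypothetical helper: joins the ids that are present into one
// space-separated aria-describedby value.
function describedBy(
  ...ids: Array&lt;string | false | null | undefined&gt;
): string | undefined {
  const present = ids.filter(
    (id): id is string =&gt; typeof id === "string" &amp;&amp; id.length &gt; 0,
  );
  // Return undefined (React then omits the attribute) rather than "".
  return present.length &gt; 0 ? present.join(" ") : undefined;
}

console.log(describedBy("passwordHint", "passwordError")); // → passwordHint passwordError
console.log(describedBy("passwordHint", false)); // → passwordHint
console.log(describedBy(undefined, false)); // → undefined
</code></pre>
<p>On the password input, that might look like <code>aria-describedby={describedBy("passwordHint", errors.password &amp;&amp; "passwordError")}</code>, so the hint is always linked and the error only when it exists.</p>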
<h2 id="heading-tell-me-something-important">Tell Me Something Important</h2>
<p><code>aria-live</code> is an ARIA attribute you can add to an element to tell screen readers:</p>
<blockquote>
<p>"Hey, if the content inside me changes, announce it automatically."</p>
</blockquote>
<p>It makes dynamic content updates audible without needing the user to re-focus anything.</p>
<p>A basic example is a status message that is updated when the form is submitted. Its content might change like this:</p>
<blockquote>
<p>“Loading” → “<em>Hurray, registration complete</em>”</p>
<p>or</p>
<p>“Pending” → “<em>Registration failed due to multiple errors</em>”</p>
</blockquote>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">aria-live</span>=<span class="hljs-string">"polite"</span>&gt;</span>
  {formSubmissionResultMessage}
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
</code></pre>
<p>When <code>formSubmissionResultMessage</code> changes, screen readers will automatically announce the updated message.</p>
<p>The timing of when it is read out depends on the value of the <code>aria-live</code> attribute – with <code>polite</code>, the announcement waits for a natural pause. With <code>assertive</code>, it interrupts immediately.</p>
<h3 id="heading-real-world-examples">Real-World Examples</h3>
<h4 id="heading-polite-update-good-for-passive-notifications">Polite update: good for passive notifications</h4>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">aria-live</span>=<span class="hljs-string">"polite"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"mt-2 text-green-500"</span>&gt;</span>
  Form saved successfully.
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
</code></pre>
<p>The screen reader waits for a good moment to say it.</p>
<h4 id="heading-assertive-update-good-for-urgent-errors">Assertive update: good for urgent errors</h4>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">p</span> <span class="hljs-attr">aria-live</span>=<span class="hljs-string">"assertive"</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"mt-2 text-red-500"</span>&gt;</span>
  Passwords do not match!
<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
</code></pre>
<p>The screen reader <strong>immediately</strong> interrupts and announces it.</p>
<h4 id="heading-good-things-to-know">Good things to know:</h4>
<ul>
<li><p>The element needs to <strong>already exist</strong> in the DOM when the update happens. So it’s smart to always render the <code>&lt;p aria-live&gt;</code> – just update its content.</p>
</li>
<li><p>Don’t overuse <code>assertive</code>, or you’ll annoy users and make apps feel super noisy and overwhelming.</p>
</li>
</ul>
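<p>One way to follow both tips is to keep two live regions rendered at all times, one <code>polite</code> and one <code>assertive</code>, and only change their text. Below is a minimal sketch of the text each region would hold for each submission state. The <code>SubmissionStatus</code> values, the messages, and <code>liveRegionText</code> are all hypothetical, and only failures are routed to the assertive region.</p>
<pre><code class="lang-typescript">type SubmissionStatus = "idle" | "pending" | "success" | "error";

// Text for two live regions that stay in the DOM the whole time:
//   &lt;p aria-live="polite"&gt;{text.polite}&lt;/p&gt;
//   &lt;p aria-live="assertive"&gt;{text.assertive}&lt;/p&gt;
interface LiveRegionText {
  polite: string;
  assertive: string;
}

// Hypothetical mapping: routine progress is announced politely,
// failures interrupt; the region not in use is left empty.
function liveRegionText(status: SubmissionStatus): LiveRegionText {
  switch (status) {
    case "idle":
      return { polite: "", assertive: "" };
    case "pending":
      return { polite: "Pending", assertive: "" };
    case "success":
      return { polite: "Registration complete", assertive: "" };
    case "error":
      return { polite: "", assertive: "Registration failed. Please fix the errors and try again." };
  }
}

console.log(liveRegionText("pending"));
console.log(liveRegionText("error"));
</code></pre>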
<h2 id="heading-focus-states-and-colouring">Focus States and Colouring</h2>
<p>You may have noticed that I added some custom colouring to the input elements using TailwindCSS classes prefixed with <code>focus:</code>. But what are these doing?</p>
<p>Well, this allows us to control the focus colour of the inputs. Without this, the browser will apply its own default styling which may not be as accessible to our users, especially those with colour-blindness.</p>
<p>For example, within our form, without the styling the input with focus looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745687500364/4bfa9c86-d908-4f2b-a674-485a1ed15bc3.png" alt="image of the form with a faint white and blue outline around the focussed input" class="image--center mx-auto" width="514" height="673" loading="lazy"></p>
<p>Here you can see it has applied a subtle white and blue outline, but it’s not that clear that the input is focused. You could argue it’s different enough from the other input elements, but for some users this may not be enough.</p>
<p>To combat this and improve usability, we can override this with our own custom colouring. When using TailwindCSS, we can apply the following class names:</p>
<pre><code class="lang-xml">focus:outline-none focus:ring-2 focus:ring-red-500
</code></pre>
<h3 id="heading-what-does-this-do">What Does This Do?</h3>
<p>This applies a much thicker red ring in our brand colour, which also stands out more clearly against the darker background.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Class name</strong></td><td><strong>Meaning (CSS equivalent)</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>focus:outline-none</code></td><td>Remove the outline when the element is focused</td></tr>
<tr>
<td><code>focus:ring-2</code></td><td>On focus, apply a <strong>2px wide ring</strong> (like a border/shadow)</td></tr>
<tr>
<td><code>focus:ring-red-500</code></td><td>Set the ring colour to Tailwind’s <code>red-500</code> colour</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745687470893/c003c194-5f93-491d-a16e-4d983ea557ac.png" alt="image of form with thick red outline around the focussed input" class="image--center mx-auto" width="537" height="677" loading="lazy"></p>
<p>If you’re not using TailwindCSS, you can accomplish the same with plain CSS like so:</p>
<pre><code class="lang-css"><span class="hljs-selector-tag">input</span><span class="hljs-selector-pseudo">:focus</span> {
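  <span class="hljs-attribute">outline</span>: <span class="hljs-number">2px</span> solid transparent; <span class="hljs-comment">/* hides the default outline, but keeps one in Windows high-contrast mode, where box-shadow isn't drawn */</span>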
  <span class="hljs-attribute">box-shadow</span>: <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">0</span> <span class="hljs-number">2px</span> <span class="hljs-number">#ef4444</span>; <span class="hljs-comment">/* 2px red ring around input */</span>
}
</code></pre>
<h2 id="heading-make-buttons-descriptive">Make Buttons Descriptive</h2>
<p>A super simple way to level up your form’s user experience is to make sure your buttons use clear, descriptive text.</p>
<p>Let’s take a look at a few examples of buttons that don’t quite achieve this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745688682031/54a84908-e0fd-4781-ab83-6c34ab360cae.png" alt="image showing different poor input buttons" class="image--center mx-auto" width="545" height="336" loading="lazy"></p>
<p>The buttons above are poor examples because:</p>
<ul>
<li><p>“Click Here”: this doesn’t give any context. Screen reader users, and even sighted users, have no idea what "click here" does without reading the nearby text.</p>
</li>
<li><p>Icon only: sighted users <em>might</em> guess what the icon means, but unless you add <code>aria-label</code>, the button has no accessible name for a screen reader to announce. Either way, it’s unclear what the button does. You may also see websites that use a clickable icon that isn’t wrapped in a button at all, which can be even more confusing and is often unreachable by keyboard.</p>
</li>
<li><p>“Submit”: if you have several "Submit" buttons (for example, one for a payment and one for a contact form), users don’t know which one does what.</p>
</li>
</ul>
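<p>These problems are simple enough to flag automatically. Here's a minimal sketch, assuming you can get each button's visible text and <code>aria-label</code>. It uses a simplified accessible-name rule (a non-empty <code>aria-label</code> wins over the text content) rather than the full accessible name computation, and the <code>ButtonInfo</code> shape, the <code>VAGUE_NAMES</code> list, and the <code>buttonNameProblem</code> function are hypothetical.</p>
<pre><code class="lang-typescript">interface ButtonInfo {
  text: string; // visible text content ("" for icon-only buttons)
  ariaLabel?: string; // value of aria-label, if set
}

// Names that say nothing about the action. "Submit" is only ambiguous when a
// page has several of them, but it is flagged here as a prompt to check.
const VAGUE_NAMES = new Set(["click here", "click", "here", "submit"]);

// Simplified rule: a non-empty aria-label replaces the text content.
// The full accessible name computation has more steps (e.g. aria-labelledby).
function accessibleName(button: ButtonInfo): string {
  const label = button.ariaLabel?.trim() ?? "";
  return label !== "" ? label : button.text.trim();
}

// Returns a description of the problem, or null if the name looks descriptive.
function buttonNameProblem(button: ButtonInfo): string | null {
  const name = accessibleName(button);
  if (name === "") {
    return "No accessible name: add visible text or an aria-label";
  }
  if (VAGUE_NAMES.has(name.toLowerCase())) {
    return `"${name}" does not describe what the button does`;
  }
  return null;
}

console.log(buttonNameProblem({ text: "Click Here" }));
console.log(buttonNameProblem({ text: "" }));
console.log(buttonNameProblem({ text: "", ariaLabel: "Go to homepage" })); // → null
console.log(buttonNameProblem({ text: "Pay Now" })); // → null
</code></pre>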
<h3 id="heading-improvements">Improvements</h3>
<p>Instead, we can improve those buttons to be more accessible and user-friendly by doing the following:</p>
<ul>
<li><p><strong>Use descriptive button text</strong> – for example: "Pay Now", "Sign Up", or "Save Changes".</p>
</li>
<li><p><strong>Use both an icon and text</strong> – combining an icon with text can be the perfect blend for both accessibility and design.</p>
</li>
<li><p><strong>Use</strong> <code>aria-label</code> – if you really must use an icon-only button (like a basket or home icon in a navigation bar), make sure to add an <code>aria-label</code> attribute to clearly describe the button’s purpose, like so:</p>
</li>
</ul>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">button</span> 
    <span class="hljs-attr">type</span>=<span class="hljs-string">"submit"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full py-3 px-6 rounded-lg bg-red-600 hover:bg-red-700 focus:outline-none focus:ring-2 focus:ring-red-500 text-white text-lg font-semibold transition"</span>
&gt;</span> Pay Now <span class="hljs-tag">&lt;<span class="hljs-name">button</span>&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">button</span>
    <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full py-3 px-6 rounded-lg bg-blue-600 hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 text-white text-lg font-semibold flex justify-center items-center gap-2 transition"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">HomeIcon</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"h-6 w-6"</span> /&gt;</span>
        Home
<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>

<span class="hljs-tag">&lt;<span class="hljs-name">button</span>
    <span class="hljs-attr">type</span>=<span class="hljs-string">"button"</span>
    <span class="hljs-attr">aria-label</span>=<span class="hljs-string">"Go to homepage"</span>
    <span class="hljs-attr">className</span>=<span class="hljs-string">"w-full py-3 px-6 rounded-lg bg-blue-600 hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 text-white text-lg font-semibold flex justify-center items-center transition"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">HomeIcon</span> <span class="hljs-attr">className</span>=<span class="hljs-string">"h-6 w-6"</span> /&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
</code></pre>
<p>That code would generate the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745787493605/e5259e87-f8ee-463f-bc57-662779eea698.png" alt="image showing more accessible buttons from above html" class="image--center mx-auto" width="513" height="326" loading="lazy"></p>
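<p>If your buttons live in reusable React components, TypeScript can enforce the "icon-only buttons need a label" rule for you. Here’s a minimal sketch, written in plain TypeScript so it runs without React: the <code>ButtonProps</code> type only allows a button without visible text if it also has a <code>label</code>, and a <code>buttonAttributes</code> helper turns that label into <code>aria-label</code>. All of these names, and the string used to identify the icon, are hypothetical and not part of any library.</p>

```typescript
// Either the button shows visible text (an icon is optional)...
type TextButtonProps = {
  text: string;
  icon?: string;  // hypothetical icon identifier, e.g. "home"
  label?: never;  // the visible text already provides the accessible name
};

// ...or it's icon-only, in which case the type checker requires a label.
type IconOnlyButtonProps = {
  text?: never;
  icon: string;
  label: string;  // becomes the aria-label
};

type ButtonProps = TextButtonProps | IconOnlyButtonProps;

// Attributes you would spread onto a <button> element.
interface ButtonAttributes {
  type: "button" | "submit";
  "aria-label"?: string;
}

function buttonAttributes(
  props: ButtonProps,
  buttonType: "button" | "submit" = "button"
): ButtonAttributes {
  // Only icon-only buttons carry a label; text buttons are named by their visible text.
  if (props.label !== undefined) {
    return { type: buttonType, "aria-label": props.label };
  }
  return { type: buttonType };
}

// Compiles: descriptive visible text.
const payNow = buttonAttributes({ text: "Pay Now" }, "submit");
// Compiles: icon-only, but labelled.
const home = buttonAttributes({ icon: "home", label: "Go to homepage" });
// Would NOT compile: icon-only with no label.
// buttonAttributes({ icon: "home" });
```

<p>Assuming a component’s props are typed as <code>ButtonProps</code>, you could spread the result onto the element, for example <code>&lt;button {...buttonAttributes(props)}&gt;</code>, so forgetting the label on an icon-only button becomes a compile-time error rather than something a screen reader user discovers.</p>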
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>In this tutorial, we’ve covered various ways to make your forms more accessible and user-friendly, from simple changes like clearer button text and more user-friendly colours to HTML attributes like <code>aria-describedby</code> and <code>aria-live</code>.</p>
<p>I hope you found this tutorial helpful and that you’re ready to take your development skills to the next level. These small changes can have a big impact on your users’ experience, making them more likely to stick around and less likely to get frustrated.</p>
<p>As always, if you’d like to share feedback on the article, discuss it further, or just hear about future articles or content, you can drop me a follow on X (Twitter) via my handle <a target="_blank" href="https://x.com/grantdotdev">@grantdotdev</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
