Classrooms today are more diverse than ever before. Among the students are neurodiverse learners with different learning needs. While these learners bring unique strengths, traditional teaching methods don’t always meet their needs.
This is where AI-driven accessibility tools can make a difference. From real-time captioning to adaptive reading support, artificial intelligence is transforming classrooms into more inclusive spaces.
In this article, you’ll:
Understand what inclusive education means in practice.
See how AI can support neurodiverse learners.
Try two hands-on Python demos:
Speech-to-Text using local Whisper (free, no API key).
Text-to-Speech using Hugging Face SpeechT5.
Get a ready-to-use project structure, requirements, and troubleshooting tips for Windows and macOS/Linux users.
Prerequisites
Before you start, make sure you have the following:
Python 3.8 or later installed. Windows users can download the latest version from python.org; macOS users usually already have python3.
A virtual environment (venv) set up — recommended to keep things clean.
FFmpeg installed (required for Whisper to read audio files).
PowerShell (Windows) or Terminal (macOS/Linux).
Basic familiarity with running Python scripts.
Tip: If you’re new to Python environments, don’t worry — the setup commands are included with each step below.
A Note on Missing Files
Some files are not included in the GitHub repository. This is intentional: they are either generated automatically or should be created or installed locally:
.venv/
→ Your virtual environment folder. Each reader should create their own locally with: python -m venv .venv
FFmpeg Installation:
Windows: FFmpeg is not included in the project files because it is large (approximately 90 MB). Download the FFmpeg build separately (steps below).
macOS: Install FFmpeg with the Homebrew package manager: brew install ffmpeg
Linux: Install FFmpeg with your distribution’s package manager: sudo apt install ffmpeg
Output File:
output.wav is generated when you run the Text-to-Speech script. It is not included in the GitHub repository; it is created locally on your machine when you execute the script.
To keep the repo clean, these are excluded using .gitignore:
# Ignore virtual environments
.venv/
env/
venv/
# Ignore binary files
ffmpeg.exe
*.dll
*.lib
# Ignore generated audio (but keep sample input)
*.wav
*.mp3
!lesson_recording.mp3
The repository does include all essential files needed to follow along:
requirements.txt (see below)
transcribe.py and tts.py (covered step-by-step in the Hands-On section).
requirements.txt
openai-whisper
transformers
torch
soundfile
sentencepiece
numpy
This way, you’ll have everything you need to reproduce the project.
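With your virtual environment active, one command installs everything listed in this file:
# Install all dependencies from requirements.txt
pip install -r requirements.txt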
What Inclusive Education Really Means
Inclusive education goes beyond placing students with diverse needs in the same classroom. It’s about designing learning environments where every student can thrive.
Common barriers include:
Reading difficulties (for example, dyslexia).
Communication challenges (speech/hearing impairments).
Sensory overload or attention struggles (autism, ADHD).
Note-taking and comprehension difficulties.
AI can help reduce these barriers with captioning, reading aloud, adaptive pacing, and alternative communication tools.
Toolbox: Five AI Accessibility Tools Teachers Can Try Today
Microsoft Immersive Reader – Text-to-speech, reading guides, and translation.
Google Live Transcribe – Real-time captions for speech/hearing support.
Otter.ai – Automatic note-taking and summarization.
Grammarly / Quillbot – Writing assistance for readability and clarity.
Seeing AI (Microsoft) – Describes text and scenes for visually impaired learners.
Real-World Examples
A student with dyslexia can use Immersive Reader to listen to a textbook while following along visually. Another student with hearing loss can use Live Transcribe to follow class discussions. These are small technology shifts that create big inclusion wins.
Platform Notes (Windows vs macOS/Linux)
Most code works the same across systems, but setup commands differ slightly:
Creating a virtual environment
To create and activate a virtual environment in PowerShell using Python 3.8 or higher, you can follow these steps:
Create a virtual environment:
py -3.12 -m venv .venv
Activate the virtual environment:
.\.venv\Scripts\Activate
Once activated, your PowerShell prompt should change to indicate that you are now working within the virtual environment. This setup helps manage dependencies and keep your project environment isolated.
For macOS and Linux users, to create and activate a virtual environment in a bash shell using Python 3, follow these steps:
Create a virtual environment:
python3 -m venv .venv
Activate the virtual environment:
source .venv/bin/activate
Once activated, your bash prompt should change to indicate that you are now working within the virtual environment. This setup helps manage dependencies and keep your project environment isolated.
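If you want an extra sanity check on either platform, you can confirm that Python now resolves to the interpreter inside .venv (an optional step, not required for the rest of the tutorial):
# Print the path of the active Python interpreter; it should point inside .venv
python -c "import sys; print(sys.executable)"
# macOS/Linux: python3 -c "import sys; print(sys.executable)"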
To install FFmpeg on Windows, follow these steps:
Download FFmpeg Build: Visit the official FFmpeg website to download the latest FFmpeg build for Windows.
Unzip the Downloaded File: Once downloaded, unzip the file to extract its contents. You will find several files, including ffmpeg.exe.
Copy ffmpeg.exe: You have two options for using ffmpeg.exe:
Project Folder: Copy ffmpeg.exe directly into your project folder. This way, your project can access FFmpeg without modifying system settings.
Add to PATH: Alternatively, add the directory containing ffmpeg.exe to your system's PATH environment variable. This allows you to use FFmpeg from any command prompt window without specifying its location.
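If you prefer not to change system settings at all, a third option is to make FFmpeg visible for the current PowerShell session only. This sketch assumes you unzipped the build to C:\ffmpeg — adjust the path to wherever you extracted it:
# Add the FFmpeg bin folder to PATH for this PowerShell session only (example path)
$env:Path += ";C:\ffmpeg\bin"
# Confirm FFmpeg is now found
ffmpeg -version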
Additionally, the full project folder, including all necessary files and instructions, is available for download on GitHub. You can also find the link to the GitHub repository at the end of the article.
For macOS users:
To install FFmpeg on macOS, you can use Homebrew, a popular package manager for macOS. Here’s how:
Open Terminal: You can find Terminal in the Utilities folder within Applications.
Install Homebrew (if not already installed): Paste the following command in Terminal and press Enter, then follow the on-screen instructions:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install FFmpeg: Once Homebrew is installed, run the following command in Terminal:
brew install ffmpeg
This command will download and install FFmpeg, making it available for use on your system.
For Linux users (Debian/Ubuntu):
To install FFmpeg on Debian-based systems like Ubuntu, you can use the APT package manager. Here’s how:
Open Terminal: You can usually find Terminal in your system’s applications menu.
Update Package List: Before installing new software, it’s a good idea to update your package list. Run:
sudo apt update
Install FFmpeg: After updating, install FFmpeg by running:
sudo apt install ffmpeg
This command will download and install FFmpeg, allowing you to use it from the command line.
These steps will ensure that FFmpeg is installed and ready to use on your macOS or Linux system.
Running Python scripts
Windows:
python script.py or py script.py
macOS/Linux:
python3 script.py
I will mark these differences with a macOS/Linux note in the relevant steps so you can follow along smoothly on your system.
Hands-On: Build a Simple Accessibility Toolkit (Python)
You’ll build two small demos:
Speech-to-Text with Whisper (local, free).
Text-to-Speech with Hugging Face SpeechT5.
1) Speech-to-Text with Whisper (Local and free)
What you’ll build:
A Python script that takes a short MP3 recording and prints the transcript to your terminal.
Why Whisper?
It’s a robust open-source STT model. The local version is perfect for beginners because it avoids API keys/quotas and works offline after the first install.
How to Install Whisper (PowerShell):
# Activate your virtual environment
# Example: .\.venv\Scripts\Activate
# Install the openai-whisper package
pip install openai-whisper
# Check if FFmpeg is available
ffmpeg -version
# If FFmpeg is not available, download and install it, then add it to PATH or place ffmpeg.exe next to your script
# Example: Move ffmpeg.exe to the script directory or update PATH environment variable
You should see a version string here before running Whisper.
Note: macOS users can run the same commands as above in their Terminal.
If FFmpeg is not installed, you can install it using the following commands:
For macOS:
brew install ffmpeg
For Ubuntu/Debian Linux:
sudo apt install ffmpeg
Create transcribe.py:
import whisper
# Load the Whisper model
model = whisper.load_model("base") # Use "tiny" or "small" for faster speed
# Transcribe the audio file
result = model.transcribe("lesson_recording.mp3", fp16=False)
# Print the transcript
print("Transcript:", result["text"])
How the code works:
whisper.load_model("base") — downloads/loads the model once; it is cached afterward.
model.transcribe(...) — handles audio decoding, language detection, and text inference.
fp16=False — avoids half-precision GPU math so it runs fine on CPU.
result["text"] — the final transcript string.
Run it:
python transcribe.py
Expected output:
Successful Speech-to-Text: Whisper prints the recognized sentence from lesson_recording.mp3
To run the transcribe.py script on macOS or Linux, use the following command in your Terminal:
python3 transcribe.py
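If you also want rough timestamps — useful later for classroom captions — Whisper’s result includes a segments list alongside the full text. Here is a small optional extension of transcribe.py:
import whisper

model = whisper.load_model("base")
result = model.transcribe("lesson_recording.mp3", fp16=False)

# Each segment carries start/end times (in seconds) plus its text
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text'].strip()}")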
Common hiccups (and fixes):
FileNotFoundError during transcribe → FFmpeg isn’t found. Install it and confirm with ffmpeg -version.
Super slow on CPU → switch to the tiny or small model: whisper.load_model("small").
2) Text-to-Speech with SpeechT5
What you’ll build:
A Python script that converts a short string into a spoken WAV file called output.wav.
Why SpeechT5?
It’s a widely used open model that runs on your CPU. Easy to demo and no API key needed.
Install the required packages on Windows (PowerShell):
# Activate your virtual environment
# Example: .\.venv\Scripts\Activate
# Install the required packages
pip install transformers torch soundfile sentencepiece
Note: macOS users can run the same commands as above in their Terminal.
Create tts.py:
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
import soundfile as sf
import torch
import numpy as np
# Load models
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
# Speaker embedding (fixed random seed for a consistent synthetic voice)
g = torch.Generator().manual_seed(42)
speaker_embeddings = torch.randn((1, 512), generator=g)
# Text to synthesize
text = "Welcome to inclusive education with AI."
inputs = processor(text=text, return_tensors="pt")
# Generate speech
with torch.no_grad():
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
# Save to WAV
sf.write("output.wav", speech.numpy(), samplerate=16000)
print("✅ Audio saved as output.wav")
Expected Output:
Text-to-Speech complete. SpeechT5 generated the audio and saved it as output.wav
How the code works:
SpeechT5Processor — prepares your text for the model.
SpeechT5ForTextToSpeech — generates a mel-spectrogram (the speech content).
SpeechT5HifiGan — a vocoder that turns the spectrogram into a waveform you can play.
speaker_embeddings — a 512-dim vector representing a “voice.” We seed it for a consistent (synthetic) voice across runs.
Note: If you want the same voice every time you reopen the project, you need to save the embedding once using the snippet below:
import numpy as np
import torch
# Save the speaker embeddings
np.save("speaker_emb.npy", speaker_embeddings.numpy())
# Later, load the speaker embeddings
speaker_embeddings = torch.tensor(np.load("speaker_emb.npy"))
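If the randomly seeded voice sounds too robotic, another option is to load a real speaker x-vector from the CMU Arctic set on the Hugging Face Hub. This sketch assumes you are willing to pip install datasets and let it download a small dataset on first run:
from datasets import load_dataset
import torch

# Pre-computed 512-dim speaker x-vectors hosted on the Hugging Face Hub
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")

# Index 7306 is a commonly used US-English voice in SpeechT5 examples
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)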
Run it:
python tts.py
Note: macOS/Linux users run the same script with python3 tts.py.
Expected result:
Terminal prints:
✅ Audio saved as output.wav
A new file appears in your folder:
output.wav
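If you’d like to hear the result without leaving the terminal, one option (it needs one extra package: pip install sounddevice) looks like this:
import soundfile as sf
import sounddevice as sd

# Load the generated audio and play it through the default output device
data, samplerate = sf.read("output.wav")
sd.play(data, samplerate)
sd.wait()  # block until playback finishes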
Common hiccups (and fixes):
ImportError: sentencepiece not found → pip install sentencepiece
Torch install issues on Windows → install the CPU-only build:
# Activate your virtual environment
# Example: .\.venv\Scripts\Activate
# Install the torch package using the specified index URL for CPU
pip install torch --index-url https://download.pytorch.org/whl/cpu
Note: The first run is usually slow because the models are still downloading — that’s normal.
3) Optional: Whisper via OpenAI API
What this does:
Instead of running Whisper locally, you can call the OpenAI Whisper API (whisper-1
). Your audio file is uploaded to OpenAI’s servers, transcribed there, and the text is returned.
Why use the API?
No need to install or run Whisper models locally (saves disk space & setup time).
Runs on OpenAI’s infrastructure (faster if your computer is slow).
Great if you’re already using OpenAI services in your classroom or app.
What to watch out for:
Requires an API key.
Requires billing enabled (the free trial quota is usually small).
Needs internet access (unlike the local Whisper demo).
How to get an API key:
Go to OpenAI’s API Keys page.
Log in with your OpenAI account (or create one).
Click “Create new secret key”.
Copy the key — it looks like sk-xxxxxxxx.... Treat it like a password: don’t share it publicly or push it to GitHub.
Step 1: Set your API key
In PowerShell (session only):
# Set the OpenAI API key in the environment variable
$env:OPENAI_API_KEY="your_api_key_here"
Or set the environment variable permanently in PowerShell using the setx command:
setx OPENAI_API_KEY "your_api_key_here"
This command sets the OPENAI_API_KEY
environment variable to the specified value. Note that you should replace "your_api_key_here"
with your actual API key. This change will apply to future PowerShell sessions, but you may need to restart your current session or open a new one to see the changes take effect.
Verify it’s set:
To check the value of an environment variable in PowerShell, use the echo command:
echo $env:OPENAI_API_KEY
This command will display the current value of the OPENAI_API_KEY
environment variable in your PowerShell session. If the variable is set, it will print the value. Otherwise, it will return nothing or an empty line.
Step 2: Install the OpenAI Python client
To install the OpenAI Python client with pip, run the following command in PowerShell:
pip install openai
This command will download and install the OpenAI package, allowing you to interact with OpenAI's API in your Python projects. Make sure you have Python and pip installed on your system before running this command.
Step 3: Create transcribe_api.py
from openai import OpenAI
# Initialize the OpenAI client (reads API key from environment)
client = OpenAI()
# Open the audio file and create a transcription
with open("lesson_recording.mp3", "rb") as f:
transcript = client.audio.transcriptions.create(
model="whisper-1",
file=f
)
# Print the transcript
print("Transcript:", transcript.text)
Step 4: Run it
python transcribe_api.py
Expected output:
Transcript: Welcome to inclusive education with AI.
Common hiccups (and fixes):
Error: insufficient_quota → You’ve run out of free credits. Add billing to continue.
Slow upload → If your audio is large, compress it first (for example, MP3 instead of WAV).
Key not found → double-check that $env:OPENAI_API_KEY is set in your terminal session.
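If the key-not-found error keeps appearing, you can add a small guard at the top of transcribe_api.py (an optional tweak, placed before the client is created) so the failure is explicit rather than a cryptic stack trace:
import os

# Fail early with a clear message if the API key isn't set in this session
if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set. Set it in your shell before running this script.")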
Local Whisper vs API Whisper — Which Should You Use?
| Feature | Local Whisper (on your machine) | OpenAI Whisper API (cloud) |
| --- | --- | --- |
| Setup | Needs Python packages + FFmpeg | Just install the openai client + set an API key |
| Hardware | Runs on your CPU (slower) or GPU (faster) | Runs on OpenAI’s servers (no local compute needed) |
| Cost | ✅ Free after initial download | 💳 Pay per minute of audio (after free trial quota) |
| Internet required | ❌ No (fully offline once installed) | ✅ Yes (uploads audio to OpenAI servers) |
| Accuracy | Very good - depends on model size (tiny → large) | Consistently strong - optimized by OpenAI |
| Speed | Slower on CPU, faster with GPU | Fast (uses OpenAI’s infrastructure) |
| Privacy | Audio never leaves your machine | Audio is sent to OpenAI (data handling per policy) |
Rule of thumb:
Use Local Whisper if you want free, offline transcription or you’re working with sensitive data.
Use the API Whisper if you prefer convenience, don’t mind usage billing, and want speed without local setup.
Quick Setup Cheatsheet
| Task | Windows (PowerShell) | macOS / Linux (Terminal) |
| --- | --- | --- |
| Create venv | py -3.12 -m venv .venv | python3 -m venv .venv |
| Activate venv | .\.venv\Scripts\Activate | source .venv/bin/activate |
| Install Whisper | pip install openai-whisper | pip install openai-whisper |
| Install FFmpeg | Download build → unzip → add to PATH or copy ffmpeg.exe | brew install ffmpeg (macOS); sudo apt install ffmpeg (Linux) |
| Run STT script | python transcribe.py | python3 transcribe.py |
| Install TTS deps | pip install transformers torch soundfile sentencepiece | pip install transformers torch soundfile sentencepiece |
| Run TTS script | python tts.py | python3 tts.py |
| Install OpenAI client (API) | pip install openai | pip install openai |
| Run API script | python transcribe_api.py | python3 transcribe_api.py |
Pro tip for macOS users on Apple Silicon (M1/M2): You may need a special PyTorch build for Metal GPU acceleration. Check the PyTorch install guide for the right wheel.
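A quick way to check whether your installed PyTorch build can actually use the Metal (MPS) backend, on reasonably recent PyTorch versions, is:
import torch

# True only if PyTorch was built with MPS support and an Apple GPU is available
print("MPS available:", torch.backends.mps.is_available())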
From Code to Classroom Impact
Whether you chose the local Whisper, the cloud API, or SpeechT5 for text-to-speech, you should now have a working prototype that can:
Convert spoken lessons into text.
Read text aloud for students who prefer auditory input.
That’s the technical foundation. But the real question is: how can these building blocks empower teachers and learners in real classrooms?
Developer Challenge: Build for Inclusion
Try combining the two snippets into a simple classroom companion app that:
Captions what the teacher says in real time.
Reads aloud transcripts or textbook passages on demand.
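As a starting point, here is a minimal sketch that chains the two scripts you already built: it transcribes a recording with local Whisper, then reads the transcript back with SpeechT5. It works on files rather than live audio, and the output file name is just an example:
import whisper
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# 1) Speech-to-Text: transcribe the lesson recording
stt_model = whisper.load_model("base")
transcript = stt_model.transcribe("lesson_recording.mp3", fp16=False)["text"]
print("Transcript:", transcript)

# 2) Text-to-Speech: read the transcript back aloud
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
tts_model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

speaker_embeddings = torch.randn((1, 512), generator=torch.Generator().manual_seed(42))
inputs = processor(text=transcript, return_tensors="pt")  # very long transcripts may need splitting into sentences

with torch.no_grad():
    speech = tts_model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

sf.write("companion_output.wav", speech.numpy(), samplerate=16000)
print("Saved spoken transcript as companion_output.wav")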
Then think about how to expand it further:
Add symbol recognition for non-verbal communication.
Add multi-language translation for diverse classrooms.
Add offline support for schools with poor connectivity.
These are not futuristic ideas; they are achievable with today’s open-source AI tools.
Challenges and Considerations
Of course, building for inclusion isn’t just about code. There are important challenges to address:
Privacy: Student data must be safeguarded, especially when recordings are involved.
Cost: Solutions must be affordable and scalable for schools of all sizes.
Teacher Training: Educators need support to confidently use these tools.
Balance: AI should assist teachers, not replace the vital human element in learning.
Looking Ahead
The future of inclusive education will likely involve multimodal AI: systems that combine speech, gestures, symbols, and even emotion recognition. We may even see brain–computer interfaces and wearable devices that enable seamless communication for learners who are currently excluded.
But one principle is clear: inclusion works best when teachers, developers, and neurodiverse learners co-design solutions together.
Conclusion
AI isn’t here to replace teachers; it’s here to help them reach every student. By embracing AI-driven accessibility, classrooms can become spaces where neurodiverse learners aren’t left behind but are empowered to thrive.
📢 Your turn:
Teachers: You can try one of the tools in your next lesson.
Developers: You can use the code snippets above to prototype your own inclusive classroom tool.
Policymakers: You can support initiatives that make accessibility central to education.
Inclusive education isn’t just a dream; it’s becoming a reality. With thoughtful use of AI, it can become the new norm.
Full source code on GitHub: Inclusive AI Toolkit