Ever wondered how Gmail knows that an email promising you $10 million is spam? Or how it catches those "You've won a free iPhone!" messages before they reach your inbox?
In this tutorial, you'll build your own spam email classifier from scratch using the Naive Bayes algorithm. By the end, you'll have a working model that achieves over 97% accuracy—and you'll understand exactly how it works under the hood.
This project was inspired by the Python Machine Learning Workbook for Beginners by AI Publishing, which offers excellent hands-on ML projects for those starting their journey. (Note: I have no affiliation with the authors — I simply found it a useful resource.)
What You'll Learn
How email spam filters actually work
The intuition behind the Naive Bayes algorithm
Text preprocessing techniques for machine learning
How to evaluate classification models
Building a complete spam detection pipeline in Python
Prerequisites
You should have basic familiarity with Python and some understanding of fundamental machine learning concepts. Don't worry if you're still learning—I'll explain everything as we go.
Why Naive Bayes for Spam Detection?
Before we dive into code, let's understand why Naive Bayes is particularly well-suited for this task.
Imagine you receive an email containing words like "free," "winner," "click here," and "limited time offer." Your brain immediately flags this as suspicious. The Naive Bayes algorithm does something similar—it calculates the probability that an email is spam based on the words it contains.
The algorithm is called "naive" because it makes a simplifying assumption: it treats each word as independent of every other word. In reality, word combinations matter (think "free trial" vs. "free money"), but this simplification works remarkably well in practice.
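To make this concrete, here is a toy sketch of the naive scoring rule. The word likelihoods and class priors below are invented for illustration — a real model estimates them from training data:

```python
# Hypothetical per-class word probabilities (made up for illustration)
p_word_given_spam = {"free": 0.30, "winner": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.02, "winner": 0.001, "meeting": 0.15}
p_spam, p_ham = 0.24, 0.76  # class priors

def score(words, prior, likelihoods):
    # Naive Bayes score: prior * product of per-word likelihoods,
    # treating each word as independent of the others
    s = prior
    for w in words:
        s *= likelihoods[w]
    return s

email = ["free", "winner"]
spam_score = score(email, p_spam, p_word_given_spam)
ham_score = score(email, p_ham, p_word_given_ham)
print("spam" if spam_score > ham_score else "ham")  # → spam
```

Even though spam is the rarer class (prior 0.24), the high likelihoods of "free" and "winner" under the spam class dominate the product, so the email is scored as spam.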
Why Choose Naive Bayes?
Speed: It trains incredibly fast, even on large datasets
Efficiency: Requires minimal training data to produce reliable results
Simplicity: Easy to implement and interpret
Performance: Despite its simplicity, it often outperforms more complex algorithms for text classification
Limitations to keep in mind:
The independence assumption means it can't capture relationships between words
If a word appears in the test data but never appeared in training, the algorithm assigns it zero probability (though there are ways to handle this)
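The standard fix for the zero-probability problem is Laplace (additive) smoothing, which scikit-learn's MultinomialNB applies by default through its alpha parameter. A minimal sketch of the idea, using made-up word counts for the spam class:

```python
# Hypothetical word counts observed in spam training emails
counts = {"free": 40, "winner": 10, "unicorn": 0}  # "unicorn" never seen in spam
total = sum(counts.values())
vocab_size = len(counts)
alpha = 1.0  # add-one (Laplace) smoothing, MultinomialNB's default

# Add alpha to every count so no word ends up with zero probability
smoothed = {
    word: (c + alpha) / (total + alpha * vocab_size)
    for word, c in counts.items()
}
print(smoothed["unicorn"])  # small but non-zero
```

Without smoothing, a single unseen word would zero out the entire probability product; with it, the word just contributes a small penalty.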
Now let's build our spam detector.
How to Set Up Your Environment
First, install the required libraries. Open your terminal or run this in a Jupyter notebook cell:
%pip install nltk wordcloud numpy pandas seaborn matplotlib scikit-learn
Here's a quick summary of what each library does:
re — Python's built-in regular expressions module, for cleaning text using pattern matching (no install needed)
nltk — provides the list of English stop words used in preprocessing
wordcloud — for visualizing which words appear most frequently
numpy and pandas — for data loading and manipulation
seaborn and matplotlib — for charts and visualizations
scikit-learn — provides the Naive Bayes classifier, vectorizer, and evaluation tools
Once installation is complete, import everything at the top of your script or notebook. Grouping all imports at the top is a Python best practice — it makes dependencies easy to spot at a glance.
# Data manipulation and analysis
import pandas as pd
import numpy as np
# Data visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Natural language processing
import nltk
import re
from nltk.corpus import stopwords
# Machine learning
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Word cloud visualization
from wordcloud import WordCloud

# Download NLTK's stop word list (only needed once per environment)
nltk.download("stopwords")
How to Load and Explore the Dataset
We'll use a dataset of labeled emails. You can download it from Kaggle or use any similar email dataset with text and spam columns.
Use pandas' read_csv() function to load the dataset from a CSV file into a DataFrame — a table-like structure that makes it easy to inspect and manipulate data. The head() method then displays the first 5 rows so you can confirm the data loaded correctly and understand its structure.
message_dataset = pd.read_csv('emails.csv')
message_dataset.head()
Output:
| | text | spam |
|---|---|---|
| 0 | Subject: naturally irresistible your corporate... | 1 |
| 1 | Subject: the stock trading gunslinger fanny i... | 1 |
| 2 | Subject: unbelievable new homes made easy im ... | 1 |
| 3 | Subject: 4 color printing special request add... | 1 |
| 4 | Subject: do not have money , get software cds ... | 1 |
Next, call shape on the DataFrame to check its dimensions — this returns a tuple of (rows, columns) and is a quick way to confirm you loaded the full dataset without truncation.
# Get the dimensions of our dataset (rows, columns)
message_dataset.shape
Output:
(5728, 2)
The dataset contains 5,728 emails with two columns: text (the email content) and spam (1 for spam, 0 for legitimate emails).
How to Visualize the Data Distribution
Before training any model, it's crucial to understand your data. Let's see how spam and legitimate emails are distributed.
value_counts() tallies how many emails belong to each class (spam vs. legitimate). Chaining .plot(kind="pie") on the result converts those counts directly into a pie chart. The autopct="%1.0f%%" argument tells matplotlib to label each slice with its percentage, rounded to the nearest whole number.
plt.rcParams["figure.figsize"] = [8, 10]
message_dataset.spam.value_counts().plot(kind="pie", autopct="%1.0f%%")
Output:
You'll see that approximately 24% of emails in the dataset are spam, while 76% are legitimate. This is a moderately imbalanced dataset, which we'll keep in mind when evaluating our model.
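You can confirm the exact proportions with value_counts(normalize=True). The toy labels below stand in for the real message_dataset["spam"] column, which would print roughly 0.76 for class 0 and 0.24 for class 1:

```python
import pandas as pd

# Toy stand-in for message_dataset["spam"]
labels = pd.Series([1, 0, 0, 0, 1, 0, 0, 0])

# normalize=True returns proportions instead of raw counts
proportions = labels.value_counts(normalize=True)
print(proportions)
```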
How to Analyze Word Patterns with Word Clouds
Word clouds provide an intuitive visualization of the most frequent words in a text corpus. Words that appear more often are rendered larger. Let's create separate word clouds for spam and legitimate emails to identify distinguishing patterns.
First, we need to remove stop words — common words like "the," "is," and "at" that appear everywhere and carry no meaningful signal for classification. NLTK's stopwords.words("english") returns a pre-built list of these words. The apply() method runs a function across every row in the column, and the lambda inside it splits each email into individual words, filters out any stop words, then rejoins the remaining words into a clean string.
stop = stopwords.words("english")
message_dataset["text_without_sw"] = message_dataset["text"].apply(
    lambda x: " ".join([item for item in x.split() if item not in stop])
)
Now let's visualize the spam emails. We filter the DataFrame to rows where spam == 1, join all that text into a single large string, and pass it to WordCloud().generate(). The imshow() function renders the resulting image, and axis("off") hides the x/y axes since they're not meaningful for an image display.
message_dataset_spam = message_dataset[message_dataset["spam"] == 1]
plt.rcParams["figure.figsize"] = [8, 10]
text = ' '.join(message_dataset_spam['text_without_sw'])
wordcloud2 = WordCloud().generate(text)
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
Output:
Now do the same for legitimate emails by filtering to rows where spam == 0:
message_dataset_ham = message_dataset[message_dataset["spam"] == 0]
plt.rcParams["figure.figsize"] = [8, 10]
text = ' '.join(message_dataset_ham['text_without_sw'])
wordcloud2 = WordCloud().generate(text)
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
Output:
Key observations:
Spam emails frequently contain promotional language: "free," "money," "offer," "click," "please"
Legitimate emails contain more conversational and work-related terms: "company," "time," "thanks"
You'll also notice the word "enron" appearing prominently in the legitimate emails cloud. This is because the non-spam emails in this dataset are drawn from the publicly available Enron email corpus — a large collection of real internal emails from Enron Corporation that was made public during the investigation that followed the company's 2001 collapse. It has since become one of the most widely used benchmark datasets in NLP research, which is why "enron" shows up so frequently as a word in legitimate email content.
These patterns give us confidence that word-based classification will work well.
How to Preprocess the Text Data
Raw text needs cleaning before machine learning algorithms can process it effectively. Let's first separate our features from our labels. In ML terminology, X holds the inputs (the email text we use to make predictions) and y holds the target labels (1 for spam, 0 for legitimate).
X = message_dataset["text"]
y = message_dataset["spam"]
Now we'll define a function to clean the text. The re.sub() function from Python's built-in re module performs pattern-based substitution using regular expressions. We call it three times in sequence:
re.sub('[^a-zA-Z]', ' ', doc) — replaces anything that isn't a letter (numbers, punctuation, symbols) with a space. This strips noise that doesn't help with classification.
re.sub(r'\s+[a-zA-Z]\s+', ' ', document) — removes isolated single characters (like "I" or "a" left behind after removing punctuation) by matching any single letter surrounded by whitespace.
re.sub(r'\s+', ' ', document) — collapses multiple consecutive spaces into a single space, tidying up any extra gaps created by the previous two steps.
def clean_text(doc):
document = re.sub('[^a-zA-Z]', ' ', doc)
document = re.sub(r'\s+[a-zA-Z]\s+', ' ', document)
document = re.sub(r'\s+', ' ', document)
return document
Apply this cleaning function to every email in the dataset. We first convert the pandas Series to a plain Python list using list(), then loop through each email, clean it, and collect the results in X_sentences.
# Create an empty list to store cleaned emails
X_sentences = []
# Convert the pandas Series to a list for iteration
reviews = list(X)
# Clean each email and add it to our list
for rev in reviews:
X_sentences.append(clean_text(rev))
How to Convert Text to Numerical Features
Machine learning algorithms work with numbers, not text. We need to transform our cleaned text into a numerical representation.
TF-IDF (Term Frequency-Inverse Document Frequency) is a great choice for this. It assigns each word a score that reflects how important it is to a particular document relative to the entire dataset. A word that appears often in one email but rarely across all emails gets a high score — meaning it's distinctive and likely meaningful. Common words that appear everywhere get a lower score.
TfidfVectorizer from scikit-learn handles this transformation. The parameters we set control what gets included:
max_features=2500 — only keeps the 2,500 most frequent words, discarding rare ones that don't generalize well
min_df=5 — ignores words that appear in fewer than 5 emails (too rare to be useful)
max_df=0.7 — ignores words that appear in more than 70% of all emails (too common to be distinctive)
stop_words=stopwords.words('english') — removes common English words like "the" and "is"
fit_transform() does two things in one step: it learns the vocabulary from our text (fit), then converts each email into a numerical vector based on that vocabulary (transform). Calling .toarray() on the result converts the sparse matrix output — which stores only non-zero values for efficiency — into a regular dense NumPy array that scikit-learn classifiers expect.
vectorizer = TfidfVectorizer(
max_features=2500,
min_df=5,
max_df=0.7,
stop_words=stopwords.words('english')
)
X = vectorizer.fit_transform(X_sentences).toarray()
Each email is now represented as a vector of 2,500 numbers, where each number is the TF-IDF score for a specific word.
How to Train the Naive Bayes Classifier
Now comes the exciting part — training our model! First, split the data into training and test sets using train_test_split(). This function randomly shuffles and divides both X and y simultaneously, keeping labels aligned with their corresponding emails. Setting test_size=0.20 reserves 20% of the data for testing. Setting random_state=42 seeds the random number generator so you get the same split every time you run the code, making your results reproducible.
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.20,
random_state=42
)
Now train the Multinomial Naive Bayes classifier. We use MultinomialNB specifically because it's designed for features that represent counts or frequencies — exactly what TF-IDF scores are. Calling fit(X_train, y_train) trains the model by having it calculate the probability of each word appearing in spam versus legitimate emails across the training set. Those probability tables are what the model uses later to classify new emails.
spam_detector = MultinomialNB()
spam_detector.fit(X_train, y_train)
That's it! The Naive Bayes algorithm is remarkably fast—training completes in milliseconds even with thousands of emails.
How to Evaluate Model Performance
Let's see how well our spam detector performs on emails it has never seen before. The predict() method takes the test set features and returns a predicted label (0 or 1) for each email, based on the probability tables the model learned during training.
y_pred = spam_detector.predict(X_test)
Now evaluate the predictions using three different tools from scikit-learn's metrics module:
confusion_matrix() — produces a 2×2 grid comparing actual vs. predicted labels, showing exactly where the model gets things right and wrong
classification_report() — prints precision, recall, and F1-score for each class, giving a more complete picture than accuracy alone
accuracy_score() — returns the overall percentage of correct predictions
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
Output:
[[849 7]
[ 18 272]]
precision recall f1-score support
0 0.98 0.99 0.99 856
1 0.97 0.94 0.96 290
accuracy 0.98 1146
macro avg 0.98 0.96 0.97 1146
weighted avg 0.98 0.98 0.98 1146
0.9781849912739965
Our model achieves 97.82% accuracy! Let's break down what the confusion matrix tells us:
849: Legitimate emails correctly identified as legitimate (True Negatives)
7: Legitimate emails incorrectly marked as spam (False Positives)
18: Spam emails that slipped through as legitimate (False Negatives)
272: Spam emails correctly caught (True Positives)
The classification report shows:
For legitimate emails (class 0): 98% precision, 99% recall
For spam emails (class 1): 97% precision, 94% recall
These numbers are impressive, especially considering the simplicity of our approach.
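You can recover the spam-class numbers in the report directly from the four confusion-matrix counts, which is a good way to internalize what precision and recall actually measure:

```python
# The four counts from the confusion matrix above
tn, fp, fn, tp = 849, 7, 18, 272

precision = tp / (tp + fp)  # of emails flagged spam, how many really were spam
recall = tp / (tp + fn)     # of actual spam emails, how many were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tn + fp + fn + tp)          # overall fraction correct

print(f"precision={precision:.2f}, recall={recall:.2f}, "
      f"f1={f1:.2f}, accuracy={accuracy:.4f}")
# → precision=0.97, recall=0.94, f1=0.96, accuracy=0.9782
```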
How to Test on Individual Emails
Let's verify our model works by testing it on a specific email. We'll first print the cleaned text at index 56 and its actual label to see what we're working with. Then we'll ask the model to predict it.
print(X_sentences[56])
print(y[56])
Output:
Subject localized software all languages available hello we would like to offer localized software versions german french spanish uk and many others aii iisted software is available for immediate downioad no need to wait week for cd deiivery just few exampies norton lnternet security pro windows xp professionai with sp fuil version corei draw graphics suite dreamweaver mx homesite inciudinq macromedia studio mx just browse our site and find any software you need in your native ianguaqe best reqards kayieen
1
This is clearly a spam email trying to sell pirated software. The actual label is 1 (spam). Now pass this single email through the same pipeline — first transforming it into a TF-IDF vector using the already-fitted vectorizer, then calling predict() on the result. It's important to use the same vectorizer that was fitted on the training data, so the word-to-index mapping is consistent.
print(spam_detector.predict(vectorizer.transform([X_sentences[56]])))
Output:
[1]
The model correctly identifies this promotional email as spam.
Key Takeaways
Naive Bayes is powerful for text classification despite its simplifying assumptions. For spam detection, it achieves excellent accuracy with minimal computational cost.
Text preprocessing matters. Removing noise (special characters, numbers, extra spaces) helps the algorithm focus on meaningful patterns.
TF-IDF captures word importance effectively. It gives higher weight to distinctive words that help differentiate spam from legitimate emails.
Always evaluate with multiple metrics. Accuracy alone can be misleading, especially with imbalanced datasets. Precision, recall, and F1-score give a complete picture.
Start simple. Before reaching for complex deep learning models, try classical algorithms like Naive Bayes. They're interpretable, fast, and often surprisingly effective.
Next Steps
Want to improve this spam detector further? Here are some ideas:
Experiment with different vectorizers: Try CountVectorizer or word embeddings (Word2Vec, GloVe)
Handle class imbalance: Use techniques like SMOTE or adjust class weights
Feature engineering: Add features like email length, number of links, or sender domain
Try other algorithms: Compare with SVM, Random Forest, or gradient boosting
Deploy the model: Build a simple API using Flask or FastAPI
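As a starting point for the first idea, CountVectorizer is a near drop-in replacement for TfidfVectorizer — it uses raw word counts instead of TF-IDF weights. A minimal sketch on invented toy data (the documents and labels below are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the cleaned emails and labels
docs = ["free money now", "project meeting notes", "win free prize", "lunch tomorrow"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()  # raw word counts instead of TF-IDF scores
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

# Classify a new message with the same fitted vectorizer
print(clf.predict(vec.transform(["free prize money"])))  # → [1]
```

Because the rest of the pipeline is unchanged, comparing the two vectorizers is just a one-line swap followed by re-running the evaluation step.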
Conclusion
You've built a spam email classifier that achieves over 97% accuracy using the Naive Bayes algorithm. Along the way, you learned about text preprocessing, feature extraction with TF-IDF, and model evaluation techniques.
The beauty of this approach is its simplicity. With just a few dozen lines of code, you've created something that actually works—and now you understand the principles behind commercial spam filters.
Feel free to experiment with the code, try different parameters, and see how the results change. That's the best way to deepen your understanding.