Rakshath Naik - freeCodeCamp.org

Data Science Insights: Why the Mean Lies When Handling Messy Retail Data

Rakshath Naik — Tue, 05 May 2026 16:59:17 +0000

In our daily life, we use the word "average" all the time: average salary, average marks, average age, and so on.

Let's take the case of a retail shop. If we're looking at the average order value to understand customer spending, we'd load the data, run the code, and get a result of $20 per order.

Done.

Except something looks odd.

When we take a closer look, we see that most customers are buying items worth $8 - $15. So where's $20 coming from?

In that case, the problem isn’t data – it’s the average. This is a clean textbook trap where everything works perfectly in the textbook, but real-world data doesn’t behave nicely.

Some customers buy in bulk (very large orders), some return orders (negative quantities), and a few anomalies distort the entire picture.

In this article, we'll use the Online Retail Dataset to answer a simple but tricky question: What does “average” really mean in the real world?

Prerequisites
The Dataset
Mean: The Sensitive Giant
Median: The Robust Middle
Beyond Averages: Understanding Spread with Quartiles
Applying IQR to Our Dataset
Final Comparison and Insights
Conclusion
Connect with me

Prerequisites

To follow along here, you'll need:

Basic Python knowledge: Understanding of variables and functions.

The Pandas library: Familiarity with loading data and basic DataFrame operations.

A development environment: Access to a tool like Jupyter Notebook, VS Code, or Google Colab.

A Dataset: For this analysis, I used the Online Retail Dataset, which is available for download here.

The Dataset

We'll work with the Online Retail Dataset, a real-world transactional dataset containing purchase records from a UK-based online retail store.

Source: UCI Machine Learning Repository
Collected by: UK-based online retail company (2010–2011)
Size: 541,909 transactions
Features: 8 attributes (InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country)
Ownership: Public dataset hosted by UCI
License: Open for research and educational use

Mean: The Sensitive Giant

In statistics and data analysis, the terms "average" and "arithmetic mean" are often used interchangeably. We aim to find the mean total price in our dataset. Mean in the context of the Online Retail Dataset is given as:

$$\text{Average Order Value} = \frac{\text{Sum of all TotalPrice values}}{\text{Number of transactions}}$$

In our dataset, the mean is calculated by summing all transaction values (including bulk purchases and returns) and dividing by the total number of transactions. This means every value, irrespective of unusually high or any negative values, directly influences the final average.

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
df = pd.read_excel(url, engine='openpyxl')

# Clean and Feature Engineering
df = df.dropna(subset=['CustomerID'])
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Calculate the Mean (Average Order Value)
mean_value = df['TotalPrice'].mean()
print(f"Average Order Value (Mean): {mean_value:.2f}")

The results are as follows:

Average Order Value (Mean): 20.40

At first glance, the results may look promising: every transaction contributes equally. But that’s where the problem lies. Sometimes a few transactions, which are extremely high or low, affect the mean for all customers who lie in the closer range.

Take a look at the graph for the mean below.

The graph shows the mean Total Price for the Online Retail Dataset. We get a mean of 20.42. (Image by Author)

The graph shows a right-skewed distribution where the calculated mean of 20.40 is actually a textbook trap. The tallest bar clearly shows that the majority of transactions lie in the range of $8 - $15 range, but the red line is being dragged to the right by the long tail of high-value bulk orders by some customers.

In this scenario, the average price is well above what a typical customer actually spends because it's highly sensitive to outliers – and in reality, the bulk of the data lives in the lower price range.

In simple words, the mean is being pulled by some extreme values to the right, especially by some lying in the range of 200–300, which is noticeable in the graph.

Median: The Robust Middle

When the mean is distorted by extreme values, we need a metric that remains unaffected by such outliers. This is where the median comes into play.

Median is defined as the middle value after sorting the data.

In our dataset, we sort all the transactions and pick the middle one.

The formula for calculating the median is:

$$\text{Median} = \begin{cases} X_{\left[ \frac{n+1}{2} \right]} & \text{if } n \text{ is odd} \ \frac{X_{\left[ \frac{n}{2} \right]} + X_{\left[ \frac{n}{2} + 1 \right]}}{2} & \text{if } n \text{ is even} \end{cases}$$

Unlike the mean, the median doesn't depend on extreme values, and it cares only about the position of the data, not the magnitude.

# Clean and Feature Engineering
df = df.dropna(subset=['CustomerID'])
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Calculate only the Median
median_value = df['TotalPrice'].median()
print(f"Typical Order Value (Median): {median_value:.2f}")

The results are as follows:

Typical Order Value (Median): 11.10

Now you'll notice that the result lies in the $8 — $15 range, where most of the transactions lie.

The figure demonstrates the graph for the median, where we get an accurate value of the transactions by the customers. (Image by Author)

In the previous graph, the mean was pulled to the right by large orders, but the median just asks what the middle customer spends. So even if someone spends $300 or some transactions are negative, the median stays stable.

In the above figure the median graph accurately highlights the range where most of the customers lie.

Beyond Averages: Understanding Spread with Quartiles

So far, we've studied the median, but knowing the center is not enough.

To truly understand how customer spending is, we need to understand how the data is spread, and this is where quartiles come into play.

Quartiles divide the dataset into the following parts:

Q1(25th percentile): 25% of transactions are below this.
Q2 (50th percentile): Median
Q3 (75th percentile): 75% of transactions are below this.

This is formally expressed as the Interquartile Range (IQR):

$$IQR = Q_3 - Q_1$$

The IQR: Detecting Outliers

The IQR measures the spread of the middle 50%.

If the IQR is small, then the data is concentrated. If it's large, the data is spread out. The IQR also helps us identify outliers mathematically.

Outlier Rule:

Lower Bound = Q1 — 1.5 * IQR
Upper Bound = Q3 + 1.5 * IQR

A Simple Example to Understand IQR

Consider the following transaction values:

$$\left[ 5, 8, 10, 12, 15, 18, 20 \right]$$

Step 1: Find the Median (Q2):

The middle value is:

$$Q_2 = 12$$

Step 2: Find Q1 (Lower Quartile):

The lower half is [5, 8, 10]. The median of the lower half is:

$$Q_1 = 8$$

Step 3: Find Q3 (Upper Quartile):

The upper half is [15, 18, 20]. The median of the upper half is:

$$Q_3 = 18$$

Step 4: Calculate IQR:

$$IQR = Q_3 - Q_1 = 18 - 8 = 10$$

Step 5: Find Outlier Bounds:

$$\begin{aligned} \text{Lower Bound} &= Q_1 - 1.5 \times IQR = 8 - 15 = -7 \ \text{Upper Bound} &= Q_3 + 1.5 \times IQR = 18 + 15 = 33 \end{aligned}$$

Any value below -7 or above 33 is an outlier (but in this demo problem, no outliers exist).

Applying IQR to Our Dataset

In our retail dataset, instead of neat values, we have bulk values and even negative returns.

# 1. Calculate IQR and Bounds
Q1 = df['TotalPrice'].quantile(0.25)
Q3 = df['TotalPrice'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

When we calculate IQR for our dataset, we get:

Lower Bound: -18.75
Upper Bound: 42.45
Number of Outliers: 33180

The graph demonstrates outliers, which are any values falling outside the range of -18.75 to 42.45. (Image by Author)

As the graph shows, the values outside the range -18.75 to 42.45 are considered outliers. These values will be removed.

Revisiting the Mean After Removing Outliers

Using the IQR method, we've removed extreme transactions that fell outside the typical spending range.

# Clean and Feature Engineering
df = df.dropna(subset=['CustomerID'])
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Original Mean
mean_value = df['TotalPrice'].mean()
print(f"Original Mean: {mean_value:.2f}")

# IQR Calculation
Q1 = df['TotalPrice'].quantile(0.25)
Q3 = df['TotalPrice'].quantile(0.75)
IQR = Q3 - Q1

# Define bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

print(f"Lower Bound: {lower_bound:.2f}")
print(f"Upper Bound: {upper_bound:.2f}")

# Remove Outliers
df_no_outliers = df[(df['TotalPrice'] >= lower_bound) & (df['TotalPrice'] <= upper_bound)]

# New Mean after removing outliers
new_mean = df_no_outliers['TotalPrice'].mean()
print(f"Mean after removing outliers: {new_mean:.2f}")

After recomputing, we get:

Original Mean: 20.40
Lower Bound: -18.75
Upper Bound: 42.45
Mean after removing outliers: 11.63

Removing outliers significantly shifts the mean toward the region where most transactions occur. We now have a much better mean of 11.63 as opposed to the right-stretched mean of 20.40 we got with outliers.

Final Comparison and Insights

Looking at the results from all the graphs, we get a complete understanding of the dataset. The original mean was 20.40, which appeared to be significantly higher than the most transactions that actually occurred. In that case, the mean was pulled upward by some of the high-valued transactions and was distorted by the outliers.

The median, on the other hand, was 11.10, which lies within the range where most transactions are concentrated. This shows that the median is a much better representation of what a typical customer spends, as it's not affected by extreme values.

After removing the outliers using the IQR, the mean dropped to 11.63, bringing it very close to the median. This confirms that the earlier mean was not inherently wrong, but was simply influenced by extreme values in the data. Once those values were handled, the mean became a much more reliable measure of central tendency.

Conclusion

The results show that the mean can be misleading when data contains outliers. In our dataset, the original mean of 20.40 overstated customer spending, while the median (11.10) gave a more realistic picture. After removing outliers, the mean shifted to 11.63, aligning closely with the median.

This highlights a key lesson: The mean isn't wrong, but it must be used with an understanding of the data.

Choosing the right measure of average depends on the dataset, and in messy real-world scenarios, the median or a cleaned mean often tells the true story.

Connect with me

If you want to dive deeper, you can visit: Mean vs Median vs Mode: Understanding Central Tendency in Data Analysis.

How to Deploy a Serverless Spam Classifier Using Scikit-Learn, AWS Lambda, & API Gateway

Rakshath Naik — Thu, 30 Apr 2026 05:06:15 +0000

In today's digital world, spam is no longer just an annoyance - it's a growing security threat. To combat this, developers often turn to machine learning to build intelligent filters that can distinguish legitimate emails from malicious ones.

While building a machine learning model in a notebook is relatively straightforward, the real challenge lies in the last mile: deploying that model into a scalable, production-ready system that users can actually interact with.

In this project, I built an end-to-end serverless spam classifier, combining Scikit-learn for model development with AWS Lambda, Amazon S3, and Amazon API Gateway for deployment. The result is a lightweight, scalable API that can classify messages in real time.

The system is designed to be modular and cost-efficient, allowing the model to be retrained and updated independently without affecting the live API. From detecting "free iPhone" scams to identifying phishing attempts, this project demonstrates how to bridge the gap between machine learning experimentation and real-world deployment.

Prerequisites
Building the Brain: The Model
Deploying the Model to AWS
How to Run The Project Locally
Our Project Architecture
Conclusion: The Power of Serverless AI
Acknowledgment / References

1. Prerequisites

Fundamental skills: Basic proficiency in Python and understanding of Machine Learning concepts like classification.
AWS account: Access to an AWS account with permissions for Lambda, S3, and API Gateway.
Environment: Python 3.11 installed, along with libraries like scikit-learn, pandas, and joblib.
AWS CLI: Configured on your local machine for file uploads.
HuggingFace account: You can directly download the model from my account.

2. Building the Brain: The Model

Photo by Steve A Johnson on Unsplash

At the heart of this project lies a supervised learning approach. Instead of simply specifying which words are considered spam, we'll provide the computer with a dataset and an algorithm, enabling it to learn and identify spam patterns on its own.

1. Vectorization: Turning Text into Math

Machine Learning models can't read text. They require numerical input. To solve this, we used the TF-IDF (Term Frequency-Inverse Document Frequency) Vectorizer.

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train

Here's the mathematical formula:

$$w_{i,j} = tf_{i,j} \times \log \left( \frac{N}{df_i} \right)$$

TF-IDF term definitions:

wᵢ,ⱼ (Weight): The final importance score of a specific word in a document.
tfᵢ,ⱼ (Term Frequency): How often a word appears in a single email.
N (Total Documents): The total count of all emails in your dataset.
dfᵢ (Document Frequency): The number of different emails that contain this specific word.
log(N/dfᵢ) (IDF): A penalty that lowers the score of common words like the or is that appear everywhere.

It cleans the data by removing common words, converts all text to lowercase for consistency, and assigns more importance to rare and meaningful words while giving less importance to frequently used words.

2. Training: The Logistic Regression Engine

We'll use Logistic Regression here, a classification algorithm that predicts the probability of an outcome.

In this stage, we feed our vectorized training data into the Logistic Regression algorithm. The goal is to establish a mathematical relationship between specific word weights and the Spam or Ham label.

During training, the model iteratively adjusts its internal parameters to minimize error, eventually learning that words like winner or free correlate highly with spam, while conversational language correlates with legitimate messages.

model = LogisticRegression()
model.fit(X_train_features, Y_train)

In our case, it calculates the probability that an email belongs to spam or HAM.

The algorithm uses the Sigmoid function to map any real-valued number into a value between 0 and 1.

$$P(y=1|x) = \frac{1}{1 + e^{-(z)}}$$

where z = β₀ + β₁x₁ + … + βₙxₙ.

3. Evaluation: Testing the Intelligence

After training, we need to verify if the brain actually works on data it hasn't seen before.

prediction_on_test_data = model.predict(X_test_features)
accuracy_on_test_data = accuracy_score(Y_test, prediction_on_test_data)

By comparing the model’s predictions against the actual labels in our test set, we calculate an Accuracy Score. This gives us the confidence that the model is ready for the real world (achieving ~94% accuracy in our tests).

4. Exporting the Logic (Serialization)

To move this brain from our local Python environment to the AWS Cloud, we'll use Joblib to save our work into binary files (.pkl).

joblib.dump(model, 'spam_model.pkl')
joblib.dump(feature_extraction, 'vectorizer.pkl')

We use the Pickle format because it allows us to freeze complex Python objects (mathematical weights and word mappings) into a portable binary format that can be instantly re-animated in the cloud.

We need the Vectorizer to translate new user text into the exact numerical coordinates the Model was trained to understand. Using one without the other is like having a key but no lock.

The trained Logistic Regression model and TF-IDF vectorizer are openly available for the community on Hugging Face here: Get the model on HuggingFace.

3. Deploying the Model to AWS

Training a model is science, while deploying it is engineering. To make this classifier accessible to the world, we'll use a serverless stack that scales automatically and incurs nearly no maintenance costs.

1. Model Storage: Amazon S3

First, we'll uploade our .pkl files to an S3 bucket. By decoupling the model from the code, we can update the AI's intelligence (simply by overwriting the file in S3) without redeploying the backend code. It makes the system highly maintainable.

2. The Production Backend: AWS Lambda

To make the AI accessible, we'll move from a local script to a Serverless Cloud Architecture. This ensures the model is always available without the cost of a 24/7 server.

The deployment environment is AWS Lambda (Python 3.11). Since Lambda is a lightweight environment, it doesn't include Scikit-Learn or Joblib. To provide these, we'll download and store them in our S3 bucket and import them through the layers.

Commands in AWS CLI:


# 1. Create a workspace
mkdir ml_layer && cd ml_layer

# 2. Install scikit-learn and its dependencies into a folder
pip install \
    --platform manylinux2014_x86_64 \
    --target=python/lib/python3.11/site-packages \
    --implementation cp \
    --python-version 3.11 \
    --only-binary=:all: \
    scikit-learn joblib

# 3. Zip the folder
zip -r sklearn_lib.zip python

# 4. Upload to S3 (Using AWS CLI)
aws s3 cp sklearn_lib.zip s3://YOUR-BUCKET-NAME/

We store the Scikit-Learn library as a ZIP in S3 to bypass the AWS Lambda deployment package size limit. This allows the function to dynamically load heavy dependencies only when needed without bloating the core code.

The Lambda Function:


import json
import boto3
import os
import sys
from io import BytesIO

# Ensures the custom Lambda layer(containing sklearn/joblib)
sys.path.append('/opt/python')

try:
    import joblib
except ImportError:
    # Fallback for specific Scikit-Learn distributions
    from sklearn.utils import _joblib as joblib

# Initialize S3 client
s3 = boto3.client('s3')

# Use placeholders for the article so readers can insert their own values
BUCKET_NAME = 'YOUR_S3_BUCKET_NAME' 
MODEL_KEY = 'spam_model.pkl'
VECTORIZER_KEY = 'vectorizer.pkl'

# Global variables for 'Warm Start' caching (improves performance by keeping model in RAM)
model = None
vectorizer = None

def load_model():
    """Downloads model files from S3 only if they aren't already in RAM"""
    global model, vectorizer
    if model is None or vectorizer is None:
        try:
            # 1. Load the Logistic Regression Model from S3
            m_obj = s3.get_object(Bucket=BUCKET_NAME, Key=MODEL_KEY)
            model = joblib.load(BytesIO(m_obj['Body'].read()))
            
            # 2. Load the TF-IDF Vectorizer directly from S3
            v_obj = s3.get_object(Bucket=BUCKET_NAME, Key=VECTORIZER_KEY)
            vectorizer = joblib.load(BytesIO(v_obj['Body'].read()))
        except Exception as e:
            raise Exception(f"Failed to load .pkl files from S3: {str(e)}")

def lambda_handler(event, context):
    try:
        # Ensure model and vectorizer are ready before processing
        load_model()
        
        # Handles both direct Lambda tests and API Gateway POST requests
        body = event.get('body', event)
        if isinstance(body, str):
            body = json.loads(body)
            
        text = body.get('text', '')
            
        if not text:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'No text provided.'})
              }

        # 1. Transform input text to numeric features using the trained Vectorizer
        data_vec = vectorizer.transform([text])
        
        # 2. Predict using the Logistic Regression Model 
        prediction = int(model.predict(data_vec)[0])
        
      # 3. Map numeric result to human-readable label
        result_label = "HAM" if prediction == 1 else "SPAM"
        
        # RESPONSE WITH CORS
        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*' # needed for cross-domain web integration
            },
            'body': json.dumps({
                'status': 'success',
                'classification': result_label,
                'input_text': text
            })
        }
        
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error_message': f"Inference Error: {str(e)}"})
        }

Key features of the Lambda function:

Warm start caching: By defining the model and vectorizer variables outside the lambda_handler, we store them in the container's memory. This significantly reduces cold start latency for subsequent requests.
Dynamic dependency loading: The sys.path.append('/opt/python') line allows us to import heavy libraries from S3/Layers without exceeding the upload limit.
Bimodal input handling: The function is designed to handle both direct JSON testing from the AWS console and stringified payloads sent via API Gateway.

3. The API Gateway - The Bridge to the Web

Photo by Growtika on Unsplash

Creating the REST API

Next we'll create a REST API with a single POST method. Why POST, you might be wondering? Well, we need to securely send a JSON payload containing the user’s text message to our model.

First navigate to the Amazon API Gateway console and select Create API -> REST API.
Give your API a name, such as EmailSpamPredictor-API, and set the Endpoint Type to Regional.
Then in the left sidebar, click Resources and enter a resource name (e.g: / predict as entered by me)
Next click the create method and select POST and then select Lambda Function for integration type
Ensure Lambda Proxy integration is enabled (this allows the full request to pass through to your code).

The CORS Configuration (The Troubleshooting Hub)
This is where many developers encounter the dreaded Connection Error. Since our API is hosted on AWS, and if your front-end is on a separate website, the browser’s Same-Origin Policy will block the request by default.

To fix this, we'll enable CORS:

Access-Control-Allow-Origin: Set to * (or specifically to your domain) to tell the browser that the API is allowed to talk to your front-end.
The OPTIONS method: API Gateway creates an OPTIONS method automatically. This handles the Preflight request where the browser asks, “Are you allowed to receive data from me?” before sending the actual text.
Access-Control-Allow-Headers: In the screenshot, you'll notice headers like Content-Type and Authorization are allowed. This ensures that when our JavaScript fetch() call sets the content type to application/json, the API Gateway doesn't reject it.

Image illustrates the CORS configuration for our project. (Image by author)

Deployment Stages

Once the API is deployed to a production stage, AWS generates a permanent Invoke URL. This acts as the public gateway to our model and typically follows this structure: https://[api-id].execute-api.[region].amazonaws.com/prod/classify.

Connecting the Frontend (The JavaScript Layer)

With the API live, we can now write a simple JavaScript function to talk to our model. This script runs whenever a user clicks the Analyze button on your site.


async function checkSpam() {
    const message = document.getElementById("userInput").value;
    const apiUrl = "YOUR_API_GATEWAY_INVOKE_URL";

    try {
        const response = await fetch(apiUrl, {
            method: "POST",
            headers: {
                "Content-Type": "application/json"
            },
            body: JSON.stringify({ "text": message })
        });

        const data = await response.json();
        
        // Display result on the webpage
        const resultElement = document.getElementById("result");
        resultElement.innerText = `Prediction: ${data.classification}`;
        resultElement.style.color = data.classification === "SPAM" ? "red" : "green";

    } catch (error) {
        console.error("Error:", error);
        alert("Could not connect to the Spam Detector API.");
    }
}

4. How to Run The Project Locally

You can store the front-end as an HTML file. Once it's ready, you shouldn’t just double-click the .html file. Opening it as a file in your browser can cause security restrictions. Instead, you should host it using a simple local server.

Step 1: Open the terminal or Command Prompt.

Step 2: Navigate to your project folder

cd [PATH_TO_YOUR_FOLDER]

Step 3: Start a local Python web server.

python -m http.server 8000

Step 4: Access the application.

Open your browser and navigate to:
http://localhost:8000/your-file-name.html

Watch the Demo:

5. Our Project Architecture

The image illustrates the architecture of our project (Building a Serverless Spam Classifier). It shows the process that takes place from the client input to the final model output. (Image by Author)

Client Front-End Interaction: The process starts on the far left. A user interacts with the web interface (for example, a website or a desktop app). They input text like WIN free iPhone now and trigger a request.
The Entry Point: API Gateway: The request hits the Amazon API Gateway, which acts as the security guard and translator.
(a) CORS OPTIONS handles the pre-flight handshake to ensure the browser has permission to talk to the AWS cloud.
(b) Classification Request (POST) routes the actual message data to your backend logic.
The Engine: AWS Lambda (Python 3.11): The central “lightbulb” represents your Lambda function. This is where the code you wrote lives. It doesn’t run 24/7 – it only wakes up when a request arrives.
Storage & Retrieval: S3 Bucket: Since Lambda is lightweight, it doesn’t store your heavy Machine Learning files internally.
Dependency and Model Download: The function reaches out to the S3 Bucket to pull in the sklearn_lib.zip (the engine) and the .pkl files (the intelligence).
Required Dependency and Model: These assets are loaded into the Lambda’s temporary memory to prepare for the prediction.
The Inference Pipeline: Inside the Lambda, a three-step mathematical cycle occurs:
(a) Text Vectorizer: Translates the words into numbers.
(b) Logistic Regression: Calculates the probability of spam based on those numbers.
(c) Label: Assigns a final result (Spam or Ham).
The Result Delivery: The result is sent back through the API Gateway, including the necessary CORS Headers to ensure the browser accepts it. The front-end then updates to show the “Result: SPAM” with a visual indicator.

6. Conclusion: The Power of Serverless AI

By merging the mathematical simplicity of Logistic Regression with the industrial strength of AWS Serverless Architecture, we have transformed a static Python script into a globally accessible, scalable API.

This project demonstrates that you don’t need a massive budget or a 24/7 dedicated server to deploy high-quality Machine Learning.

Using the S3-to-Lambda workaround allowed us to bypass common storage hurdles, ensuring that our Brain (the model) and its Muscle (Scikit-Learn) could function seamlessly within the cloud’s ephemeral environment. It bridges the gap between experimentation and real-world applications, making AI systems practical, efficient, and accessible.

7. Acknowledgment / References

Pre-trained spam classification model: View on Hugging Face (rakshath1/mail-spam-detector · Hugging Face)
Scikit-learn Documentation
AWS Lambda Documentation
Amazon S3 Documentation
Amazon API Gateway Documentation

Connect With Me

You may also like