Bala Priya C - freeCodeCamp.org

Efficient Data Processing in Python: Batch vs Streaming Pipelines Explained

Bala Priya C — Mon, 13 Apr 2026 13:51:23 +0000

Every data pipeline makes a fundamental choice before any code is written: does it process data in chunks on a schedule, or does it process data continuously as it arrives?

This choice — batch versus streaming — shapes the architecture of everything downstream. The tools you use, the guarantees you can make about data freshness, the complexity of your error handling, and the infrastructure you need to run it all follow directly from this decision.

Getting it wrong is expensive. Teams that build streaming pipelines when batch would have sufficed end up maintaining complex infrastructure for a problem that didn't require it.

Teams that build batch pipelines when their use case demands real-time processing discover the gap at the worst possible moment — when a stakeholder asks why the dashboard is six hours out of date.

In this article, you'll learn what batch and streaming pipelines actually are, how they differ in terms of architecture and tradeoffs, and how to implement both patterns in Python. By the end, you'll have a clear framework for choosing the right approach for any data engineering problem you solve.

Prerequisites

To follow along comfortably, make sure you have:

Experience writing Python functions and working with modules
Familiarity with pandas DataFrames and basic data manipulation
A general understanding of what ETL pipelines do — extract, transform, load

Prerequisites
What Is a Batch Pipeline?
- Implementing a Batch Pipeline in Python
- When Batch Works Well
What Is a Streaming Pipeline?
- Implementing a Streaming Pipeline in Python
- When Streaming Works Well
The Key Differences at a Glance
Choosing Between Batch and Streaming
The Hybrid Pattern: Lambda and Kappa Architectures

What Is a Batch Pipeline?

A batch pipeline processes a bounded, finite collection of records together: a file, a database snapshot, a day's worth of transactions. It runs on a schedule (hourly, nightly, or weekly, say), reads all the data for that period, transforms it, and writes the result somewhere. Then it stops and waits until the next run.

The mental model is simple: collect, then process. Nothing happens between runs.

In a retail ETL context, a typical batch pipeline might look like this:

At midnight, extract all orders placed in the last 24 hours from the transactional database
Join with the product catalogue and customer dimension tables
Compute daily revenue aggregates by region and product category
Load the results into the data warehouse for reporting

The pipeline runs, finishes, and produces a complete, consistent snapshot of yesterday's business. By the time analysts arrive in the morning, the warehouse is up to date.

Implementing a Batch Pipeline in Python

A batch pipeline in its simplest form is a Python script with three clearly separated stages: extract, transform, load.

import pandas as pd

def extract(filepath: str) -> pd.DataFrame:
    """Load raw orders from a daily export file."""
    df = pd.read_csv(filepath, parse_dates=["order_timestamp"])
    return df

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and aggregate orders into daily revenue by region."""
    # Filter to completed orders only
    df = df[df["status"] == "completed"].copy()

    # Extract date from timestamp for grouping
    df["order_date"] = df["order_timestamp"].dt.date

    # Aggregate: total revenue and order count per region per day
    summary = (
        df.groupby(["order_date", "region"])
        .agg(
            total_revenue=("order_value_gbp", "sum"),
            order_count=("order_id", "count"),
            avg_order_value=("order_value_gbp", "mean"),
        )
        .reset_index()
    )
    return summary

def load(df: pd.DataFrame, output_path: str) -> None:
    """Write the aggregated result to the warehouse (here, a CSV)."""
    df.to_csv(output_path, index=False)
    print(f"Loaded {len(df)} rows to {output_path}")

# Run the pipeline
raw = extract("orders_2024_06_01.csv")
aggregated = transform(raw)
load(aggregated, "warehouse/daily_revenue_2024_06_01.csv")

Let's walk through what this code is doing:

extract reads a CSV file representing a daily order export. The parse_dates argument tells pandas to interpret the order_timestamp column as a datetime object rather than a plain string — this matters for the date extraction step in transform.
transform does two things: it filters out any orders that didn't complete (returns, cancellations), and then groups the remaining orders by date and region to produce revenue aggregates. The .agg() call computes three metrics per group in a single pass.
load writes the result to a destination — in production this would be a database insert or a cloud storage upload, but the pattern is the same regardless.

The three functions are deliberately kept separate. This separation — extract, transform, load — makes each stage independently testable, replaceable, and debuggable. If the transform logic changes, you don't need to modify the extract or load code.
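The run above hardcodes one day's file names. A scheduled job would more likely derive them from the run date. Here is a minimal sketch, assuming the same naming scheme as the example; `paths_for_run` is a hypothetical helper and not part of the pipeline above:

```python
from datetime import date, timedelta


def paths_for_run(run_date: date) -> tuple[str, str]:
    """Return (input_path, output_path) for the day before run_date.

    Assumes the orders_YYYY_MM_DD.csv naming used in the example above;
    adjust it to match however your exports are actually named.
    """
    target = run_date - timedelta(days=1)  # a midnight run covers "yesterday"
    stamp = target.strftime("%Y_%m_%d")
    return f"orders_{stamp}.csv", f"warehouse/daily_revenue_{stamp}.csv"


in_path, out_path = paths_for_run(date(2024, 6, 2))
print(in_path)   # orders_2024_06_01.csv
print(out_path)  # warehouse/daily_revenue_2024_06_01.csv
```

Because the output path depends only on the run date, re-running a failed day overwrites the same file instead of producing a duplicate.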

When Batch Works Well

Batch pipelines are the right choice when:

Data freshness requirements are measured in hours, not seconds. A daily sales report doesn't need to be updated every minute. A weekly marketing attribution model certainly doesn't.
You're processing large historical datasets. Backfilling two years of transaction history into a new data warehouse is inherently a batch job — the data exists, it's bounded, and you want to process it as efficiently as possible in one run.
Consistency matters more than latency. Batch pipelines produce complete, point-in-time snapshots. Every row in the output was computed from the same input state. This consistency is valuable for financial reporting, regulatory compliance, and any downstream process that requires a stable, reproducible dataset.

What Is a Streaming Pipeline?

A streaming pipeline processes data continuously, record by record or in small micro-batches, as it arrives. There is no "end" to the dataset — the pipeline runs indefinitely, consuming events from a source like a message queue, a Kafka topic, or a webhook, and processing each one as it comes in.

The mental model is: process as you collect. The pipeline is always running.

In the same retail ETL context, a streaming pipeline might handle order events as they're placed:

An order is placed on the website and an event is published to a message queue
The streaming pipeline consumes the event within milliseconds
It validates, enriches, and routes the event to downstream systems
The fraud detection service, the inventory system, and the real-time dashboard all receive updated information immediately

The difference from batch is fundamental: the data isn't sitting in a file waiting to be processed. It's flowing, and the pipeline has to keep up.

Implementing a Streaming Pipeline in Python

Python's generator functions are the natural building block for streaming pipelines. A generator produces values one at a time and pauses between yields — which maps directly onto the idea of processing records as they arrive without loading everything into memory.

import json
import time
from typing import Generator, Dict

def event_source(filepath: str) -> Generator[Dict, None, None]:
    """
    Simulate a stream of order events from a file.
    In production, this would consume from Kafka or a message queue.
    """
    with open(filepath, "r") as f:
        for line in f:
            event = json.loads(line.strip())
            yield event
            time.sleep(0.01)  # simulate arrival delay between events

def validate(event: Dict) -> bool:
    """Check that the event has the required fields and valid values."""
    required_fields = ["order_id", "customer_id", "order_value_gbp", "region"]
    if not all(field in event for field in required_fields):
        return False
    if event["order_value_gbp"] <= 0:
        return False
    return True

def enrich(event: Dict) -> Dict:
    """Add derived fields to the event before routing downstream."""
    event["processed_at"] = time.strftime("%Y-%m-%dT%H:%M:%S")
    event["value_tier"] = (
        "high"   if event["order_value_gbp"] >= 500
        else "mid"    if event["order_value_gbp"] >= 100
        else "low"
    )
    return event

def run_streaming_pipeline(source_file: str) -> None:
    """Process each event as it arrives from the source."""
    processed = 0
    skipped = 0

    for raw_event in event_source(source_file):
        if not validate(raw_event):
            skipped += 1
            continue

        enriched_event = enrich(raw_event)

        # In production: publish to downstream topic or write to sink
        print(f"[{enriched_event['processed_at']}] "
              f"Order {enriched_event['order_id']} | "
              f"£{enriched_event['order_value_gbp']:.2f} | "
              f"tier={enriched_event['value_tier']}")
        processed += 1

    print(f"\nDone. Processed: {processed} | Skipped: {skipped}")

run_streaming_pipeline("order_events.jsonl")

Here's what's happening:

event_source is a generator function — note the yield keyword instead of return. Each call to yield event pauses the function and hands one event to the caller. The pipeline processes that event before the generator resumes and fetches the next one. This means only one event is in memory at a time, regardless of how large the stream is. The time.sleep(0.01) simulates the real-world delay between events arriving from a message queue.
validate checks each event for required fields and valid values before doing anything else with it. In a streaming context, bad events are super common — network issues, upstream bugs, and schema changes all produce malformed records. Validating early and skipping invalid events is far safer than letting them propagate into downstream systems.
enrich adds derived fields to the event. Here, those are a processing timestamp and a value tier classification. In production, this step might also join against a lookup table, call an external API, or apply a model prediction.
run_streaming_pipeline ties it together. The for loop over event_source consumes events one at a time, processes each through the validate → enrich → route stages, and keeps a running count of processed and skipped events.
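The loop above handles one event at a time. The micro-batch variant mentioned earlier groups a few events before processing them. A rough sketch using only the standard library follows; `micro_batches` and the sample events are illustrative assumptions, and a production version would usually also flush a partial batch after a timeout:

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group a (possibly endless) stream into lists of at most `size` events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, size))
        if not batch:  # source exhausted
            return
        yield batch


# An in-memory stand-in for event_source(), so the sketch runs without a file
sample_events = [{"order_id": i, "order_value_gbp": 10.0 * i} for i in range(1, 8)]

for batch in micro_batches(sample_events, size=3):
    total = sum(e["order_value_gbp"] for e in batch)
    print(f"{len(batch)} events, batch total £{total:.2f}")
```

Since `micro_batches` is itself a generator, it still pulls events lazily and never holds more than one batch in memory.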

When Streaming Works Well

Streaming pipelines are the right choice when:

Data freshness is measured in seconds or milliseconds. Fraud detection, real-time inventory updates, live dashboards, and alerting systems all require data to be processed immediately — a batch job running every hour would make them useless.
The data volume is too large to accumulate. High-frequency IoT sensor data, clickstream events, and financial tick data can generate millions of records per hour. Accumulating all of that before processing is often impractical – you'd need enormous storage and the processing job would take too long to be useful.
You need to react, not just report. Streaming pipelines can trigger downstream actions — send a notification, block a transaction, update a recommendation — in response to individual events. Batch pipelines can only report on what already happened.

The Key Differences at a Glance

Here is an overview of the differences between batch and stream processing we've discussed thus far:

DIMENSION	BATCH	STREAMING
Data model	Bounded, finite dataset	Unbounded, continuous flow
Processing trigger	Schedule (time or event)	Arrival of each record
Latency	Minutes to hours	Milliseconds to seconds
Throughput	High (optimized for bulk processing)	Lower (overhead paid per record)
Complexity	Lower	Higher
State management	Stateless per run	Often stateful across events
Error handling	Retry the whole job	Per-event dead-letter queues
Consistency	Strong (point-in-time snapshot)	Eventually consistent
Best for	Reporting, ML training, backfills	Alerting, real-time features, event routing
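
The error-handling row is worth a concrete illustration. Instead of failing the whole run, a streaming pipeline typically sets a bad event aside and keeps going. The sketch below uses a plain list as a stand-in for a dead-letter queue; `process_with_dead_letters` and `to_pence` are hypothetical names, and the events are made up:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def process_with_dead_letters(
    events: Iterable[Dict],
    handler: Callable[[Dict], Dict],
) -> Tuple[List[Dict], List[Dict]]:
    """Apply handler to each event; failures are set aside, not fatal."""
    results: List[Dict] = []
    dead_letters: List[Dict] = []
    for event in events:
        try:
            results.append(handler(event))
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original event and the reason so it can be
            # inspected or replayed later.
            dead_letters.append({"event": event, "error": repr(exc)})
    return results, dead_letters


def to_pence(event: Dict) -> Dict:
    """Hypothetical handler: convert the order value to integer pence."""
    return {"order_id": event["order_id"], "pence": round(event["order_value_gbp"] * 100)}


events = [
    {"order_id": 1, "order_value_gbp": 12.50},
    {"order_id": 2},                            # missing value -> KeyError
    {"order_id": 3, "order_value_gbp": None},   # wrong type -> TypeError
]
ok, dead = process_with_dead_letters(events, to_pence)
print(f"processed={len(ok)} dead_lettered={len(dead)}")
```

In a real deployment the dead-letter list would be a separate topic or table, so failed events survive a restart.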

Choosing Between Batch and Streaming

Okay, all of this info is great. But how do you choose between batch and stream processing? The decision comes down to three questions:

How fresh does the data need to be? If stakeholders can tolerate results that are hours old, batch is simpler and more cost-effective. If they need results within seconds, streaming is unavoidable.

How complex is your processing logic? Batch jobs can join across large datasets, run expensive aggregations, and apply complex business logic without worrying about latency. Streaming pipelines must process each event quickly, which constrains how much work you can do per record.

What's your operational capacity? Streaming infrastructure — Kafka clusters, Flink or Spark Streaming jobs, dead-letter queues, exactly-once delivery guarantees — is significantly more complex to operate than a scheduled Python script. If your team is small or your use case doesn't demand real-time results, that complexity is cost without benefit.

Start with batch. It's simpler to build, simpler to test, simpler to debug, and simpler to maintain. Move to streaming when a specific, concrete requirement — not a hypothetical future one — makes batch insufficient. Most data problems are batch problems, and the ones that genuinely require streaming are usually obvious when you run into them.

And as you might have guessed, you may need to combine them for some data processing systems. Which is why hybrid approaches exist.

The Hybrid Pattern: Lambda and Kappa Architectures

In practice, many production data systems use both patterns together. The two most common hybrid architectures are Lambda and Kappa.

Lambda architecture runs a batch layer and a streaming layer in parallel. The batch layer processes complete historical data and produces accurate, consistent results on a delay. The streaming layer processes live data and produces approximate results immediately. Downstream consumers merge both outputs — using the streaming result for freshness and the batch result for correctness.

The tradeoff is operational complexity: you're maintaining two separate processing codebases that must produce semantically equivalent results.
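
The merge step can be sketched in a few lines. This is a simplified illustration, not a reference design: `merge_views` is a hypothetical name, the revenue figures are made up, and a real serving layer would query two stores rather than two dictionaries.

```python
from typing import Dict


def merge_views(batch_view: Dict[str, float], speed_view: Dict[str, float]) -> Dict[str, float]:
    """Serve batch results where they exist, streaming results elsewhere."""
    merged = dict(speed_view)   # start with the fresh, approximate numbers
    merged.update(batch_view)   # batch results win for any day they cover
    return merged


# Revenue by day. The batch layer has finished up to 2024-06-02.
batch_view = {"2024-06-01": 1520.00, "2024-06-02": 1710.50}
# The streaming layer has approximate numbers for recent days.
speed_view = {"2024-06-02": 1698.25, "2024-06-03": 403.10}

print(merge_views(batch_view, speed_view))
```

For 2024-06-02 both layers have a value, and the batch figure replaces the streaming one. Only 2024-06-03, which the batch layer has not reached yet, is served from the streaming layer.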

Kappa architecture simplifies this by using only a streaming layer, but with the ability to replay historical data through the same pipeline when you need batch-style reprocessing. This works well when your streaming stack (for example, Apache Kafka with Apache Flink) supports log retention and replay. You get one codebase, one set of logic, and the ability to reprocess history when your pipeline changes.
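
To make the replay idea concrete, here is a toy sketch in which a Python list stands in for a retained log. The names `event_log`, `run_from`, `tier_v1`, and `tier_v2` are assumptions made for illustration, and the events are made up:

```python
from typing import Callable, Dict, List

# Stand-in for a retained log (for example, a Kafka topic)
event_log: List[Dict] = [
    {"order_id": 1, "order_value_gbp": 40.0},
    {"order_id": 2, "order_value_gbp": 250.0},
    {"order_id": 3, "order_value_gbp": 900.0},
]


def run_from(offset: int, logic: Callable[[Dict], str]) -> List[str]:
    """Run the pipeline logic over every event from `offset` onward."""
    return [logic(event) for event in event_log[offset:]]


def tier_v1(event: Dict) -> str:
    return "high" if event["order_value_gbp"] >= 500 else "low"


def tier_v2(event: Dict) -> str:
    # Revised logic: adds a "mid" tier
    value = event["order_value_gbp"]
    return "high" if value >= 500 else "mid" if value >= 100 else "low"


live_results = run_from(0, tier_v1)
reprocessed = run_from(0, tier_v2)   # same code path, replayed from offset 0
print(live_results)
print(reprocessed)
```

Reprocessing history is just another run of the same function from an earlier offset, which is the property that lets Kappa drop the separate batch codebase.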

Neither architecture is universally better. Lambda is more common in organizations that adopted batch processing first and added streaming incrementally. Kappa is more common in systems designed with streaming as the primary pattern.

Conclusion

Batch and streaming are tools with different tradeoffs, each suited to a different class of problems. Batch pipelines excel at consistency, simplicity, and bulk throughput. Streaming pipelines excel at latency, reactivity, and continuous processing.

Understanding both patterns at the architectural level — before reaching for specific frameworks like Apache Spark, Kafka, or Flink — gives you the judgment to choose the right one and explain that choice clearly. The frameworks implement these patterns, while the judgment about which pattern fits your problem is yours to make first.

How to Use the Command Pattern in Python

Bala Priya C — Mon, 23 Mar 2026 21:08:03 +0000

Have you ever used an undo button in an app or scheduled tasks to run later? Both of these rely on the same idea: turning actions into objects.

That's the command pattern. Instead of calling a method directly, you package the call – the action, its target, and any arguments – into an object. That object can be stored, passed around, executed later, or undone.

In this tutorial, you'll learn what the command pattern is and how to implement it in Python with a practical text editor example that supports undo.

You can find the code for this tutorial on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Basic understanding of Python classes and methods
Familiarity with object-oriented programming (OOP) concepts

Let's get started!

What Is the Command Pattern?
Setting Up the Receiver
Defining Commands
The Invoker: Running and Undoing Commands
Putting It All Together
When to Use the Command Pattern

What Is the Command Pattern?

The command pattern is a behavioral design pattern that encapsulates a request as an object. This lets you:

Parameterize callers with different operations
Queue or schedule operations for later execution
Support undo/redo by keeping a history of executed commands

The pattern has four key participants:

Command: an interface with an execute() method (and optionally undo())
Concrete Command: implements execute() and undo() for a specific action
Receiver: the object that actually does the work (for example, a document)
Invoker: triggers commands and manages history

Think of a restaurant. The customer (client) tells the waiter (invoker) what they want. The waiter writes it on a ticket (command) and hands it to the kitchen (receiver). The waiter doesn't cook; they only manage tickets. If the customer changes their mind, the waiter can cancel the ticket before it reaches the kitchen.

Setting Up the Receiver

We'll build a simple document editor. The receiver here is the Document class. It knows how to insert and delete text, but it has no idea who's calling it or why.

class Document:
    def __init__(self):
        self.content = ""

    def insert(self, text: str, position: int) -> None:
        self.content = (
            self.content[:position] + text + self.content[position:]
        )

    def delete(self, position: int, length: int) -> None:
        self.content = (
            self.content[:position] + self.content[position + length:]
        )

    def show(self) -> None:
        print(f'Document: "{self.content}"')

insert places text at a given position. delete removes length characters from a given position. Both are plain methods with no history or awareness of commands. And that's intentional.

Defining Commands

Now let's define a base Command interface using an abstract class:

from abc import ABC, abstractmethod

class Command(ABC):
    @abstractmethod
    def execute(self) -> None:
        pass

    @abstractmethod
    def undo(self) -> None:
        pass

Any concrete command must implement both execute and undo. This is what makes a full history possible.

InsertCommand

InsertCommand stores the text and position at creation time:

class InsertCommand(Command):
    def __init__(self, document: Document, text: str, position: int):
        self.document = document
        self.text = text
        self.position = position

    def execute(self) -> None:
        self.document.insert(self.text, self.position)

    def undo(self) -> None:
        self.document.delete(self.position, len(self.text))

When execute() is called, it inserts the text. When undo() is called, it deletes exactly what was inserted. Notice that undo is the inverse of execute – this is the key design requirement.

DeleteCommand

Now let's code the DeleteCommand:

class DeleteCommand(Command):
    def __init__(self, document: Document, position: int, length: int):
        self.document = document
        self.position = position
        self.length = length
        self._deleted_text = ""  # stored on execute, used on undo

    def execute(self) -> None:
        self._deleted_text = self.document.content[
            self.position : self.position + self.length
        ]
        self.document.delete(self.position, self.length)

    def undo(self) -> None:
        self.document.insert(self._deleted_text, self.position)

DeleteCommand has one important detail: it captures the deleted text during execute(), not at creation time. This is because we don't know what text is at that position until the command actually runs. Without this, undo() wouldn't know what to restore.

The Invoker: Running and Undoing Commands

The invoker is the object that executes commands and keeps a history stack. It has no idea what a document is or how text editing works. It just manages command objects.

class EditorInvoker:
    def __init__(self):
        self._history: list[Command] = []

    def run(self, command: Command) -> None:
        command.execute()
        self._history.append(command)

    def undo(self) -> None:
        if not self._history:
            print("Nothing to undo.")
            return
        command = self._history.pop()
        command.undo()
        print("Undo successful.")

run() executes the command and pushes it onto the history stack. undo() pops the last command and calls its undo() method. The stack naturally gives you the right order: last in, first undone.

Putting It All Together

Let's put it all together and walk through a real editing session:

doc = Document()
editor = EditorInvoker()

# Type a title
editor.run(InsertCommand(doc, "Quarterly Report", 0))
doc.show()

# Add a subtitle
editor.run(InsertCommand(doc, " - Finance", 16))
doc.show()

# Oops, wrong subtitle — undo it
editor.undo()
doc.show()

# Delete "Quarterly" and replace with "Annual"
editor.run(DeleteCommand(doc, 0, 9))
doc.show()

editor.run(InsertCommand(doc, "Annual", 0))
doc.show()

# Undo the insert
editor.undo()
doc.show()

# Undo the delete (restores "Quarterly")
editor.undo()
doc.show()

This outputs:

Document: "Quarterly Report"
Document: "Quarterly Report - Finance"
Undo successful.
Document: "Quarterly Report"
Document: " Report"
Document: "Annual Report"
Undo successful.
Document: " Report"
Undo successful.
Document: "Quarterly Report"

Here's a breakdown of why this works:

Each InsertCommand and DeleteCommand carries its own instructions for both doing and undoing.
EditorInvoker never looks inside a command. It only calls execute() and undo().
The document (Document) never thinks about history. It mutates its content when told to.

Each participant has a single, clear responsibility.

Extending with Macros

One of the lesser-known benefits of the command pattern is that commands are just objects. So you can group them. Here's a MacroCommand that batches several commands and undoes them as a unit:

class MacroCommand(Command):
    def __init__(self, commands: list[Command]):
        self.commands = commands

    def execute(self) -> None:
        for cmd in self.commands:
            cmd.execute()

    def undo(self) -> None:
        for cmd in reversed(self.commands):
            cmd.undo()

# Apply a heading format in one shot: clear content, insert formatted title
macro = MacroCommand([
    DeleteCommand(doc, 0, len(doc.content)),
    InsertCommand(doc, "== Annual Report ==", 0),
])

editor.run(macro)
doc.show()

editor.undo()
doc.show()

This gives the following output:

Document: "== Annual Report =="
Undo successful.
Document: "Quarterly Report"

The macro undoes its commands in reverse order. This is correct since the last thing done should be the first thing undone.

When to Use the Command Pattern

The command pattern is a good fit when:

You need undo/redo: the pattern is practically made for this. Store executed commands in a stack and reverse them.
You need to queue or schedule operations: commands are objects, so you can put them in a queue, serialize them, or delay execution.
You want to decouple the caller from the action: the invoker doesn't need to know what the command does. It just runs it.
You need to support macros or batched operations: group commands into a composite and run them together, as shown above.

Avoid it when:

The operations are simple and will never need undo or queuing. The pattern adds classes and indirection that may not be worth it for a simple CRUD action.
Commands would need to share so much state that the "encapsulate the request" idea breaks down.
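
The queueing use case from the list above can be sketched with a `deque`. `CallCommand` is a hypothetical, deliberately tiny command (it has no undo) so the example stays self-contained:

```python
from collections import deque
from typing import Callable, Deque, List


class CallCommand:
    """Hypothetical command: packages a callable and its arguments."""

    def __init__(self, func: Callable[..., None], *args) -> None:
        self.func = func
        self.args = args

    def execute(self) -> None:
        self.func(*self.args)


log: List[str] = []
queue: Deque[CallCommand] = deque()

# Enqueue now...
queue.append(CallCommand(log.append, "send welcome email"))
queue.append(CallCommand(log.append, "generate invoice"))
print(f"Queued: {len(queue)}, executed so far: {len(log)}")

# ...execute later, in first-in, first-out order
while queue:
    queue.popleft().execute()

print(log)
```

Nothing runs at enqueue time. The work happens only when something drains the queue, which could be a worker thread or a scheduler.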

Conclusion

I hope you found this tutorial useful. To summarize, the command pattern turns actions into objects. And that single idea unlocks a lot: undo/redo, queuing, macros, and clean separation between who triggers an action and what the action does.

We built a document editor from scratch using InsertCommand, DeleteCommand, an EditorInvoker with a history stack, and a MacroCommand for batched edits. Each class knew exactly one thing and did it well.

As a next step, try extending the editor with redo support. You'll need a second stack alongside the history to bring back undone commands.
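
One possible shape for that extension is sketched below. It puts redo on the invoker as a method rather than in a separate command class, which is a design choice of this sketch. `AppendCommand` and `RedoableInvoker` are hypothetical stand-ins so the example runs on its own:

```python
from typing import List, Protocol


class Undoable(Protocol):
    def execute(self) -> None: ...
    def undo(self) -> None: ...


class AppendCommand:
    """Hypothetical stand-in for InsertCommand: appends to a list."""

    def __init__(self, target: List[str], item: str) -> None:
        self.target = target
        self.item = item

    def execute(self) -> None:
        self.target.append(self.item)

    def undo(self) -> None:
        self.target.pop()


class RedoableInvoker:
    def __init__(self) -> None:
        self._history: List[Undoable] = []
        self._redo_stack: List[Undoable] = []

    def run(self, command: Undoable) -> None:
        command.execute()
        self._history.append(command)
        self._redo_stack.clear()  # a new action invalidates old redos

    def undo(self) -> None:
        if self._history:
            command = self._history.pop()
            command.undo()
            self._redo_stack.append(command)

    def redo(self) -> None:
        if self._redo_stack:
            command = self._redo_stack.pop()
            command.execute()
            self._history.append(command)


words: List[str] = []
editor = RedoableInvoker()
editor.run(AppendCommand(words, "Quarterly"))
editor.run(AppendCommand(words, "Report"))
editor.undo()
print(words)   # ['Quarterly']
editor.redo()
print(words)   # ['Quarterly', 'Report']
```

Clearing the redo stack in `run()` matches how most editors behave: once you make a new edit, the undone branch is gone.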

Happy coding!

Recursion in Python – A Practical Introduction for Beginners

Bala Priya C — Thu, 12 Mar 2026 15:57:38 +0000

Recursion is when a function solves a problem by calling itself.

It sounds odd at first — why would a function call itself? — but once it clicks, you'll find it's often the most natural way to express certain kinds of problems in code.

In this article, you'll learn what recursion is, how it works under the hood, and how to use it in Python with examples that go from the basics all the way to practical real-world use cases.

You can get the code on GitHub.

Prerequisites

Before we get started, make sure you have:

Python 3.10 or higher installed
Basic understanding of Python functions and how they work
Familiarity with loops and conditionals

What Is Recursion?
The Two Rules of Every Recursive Function
Your First Recursive Function
How Python Handles Recursive Calls
Recursion vs Iteration
Working with Nested Data
Recursive Tree Traversal
Memoization: Fixing Slow Recursion
Python's Recursion Limit
When to Use Recursion

What Is Recursion?

Recursion is a technique where a function solves a problem by breaking it into a smaller version of the same problem, and calling itself on that smaller version.

Think of a set of Russian nesting dolls (Matryoshka dolls). To find the smallest doll, you open the outer doll, then open the next one inside it, and keep going until there's nothing left to open. Each step is the same action (open the doll), just on a smaller doll than before.

That's recursion in a nutshell: the same action, applied to a shrinking problem, until you hit a point where there's nothing left to do.

The Two Rules of Every Recursive Function

Every correct recursive function must have exactly two things:

1. A base case — the condition where the function stops calling itself and returns a result directly.

2. A recursive case — the part where the function calls itself with a smaller or simpler version of the input.

If you forget the base case, the function will keep calling itself until Python stops it with a RecursionError. We'll talk more about that later.

Your First Recursive Function

Let's start with the classic example: calculating a factorial.

The factorial of n (written as n!) is the product of all integers from 1 to n. So 5! = 5 × 4 × 3 × 2 × 1 = 120.

Notice the pattern: 5! = 5 × 4!. And 4! = 4 × 3!. Each factorial is just n multiplied by the factorial of the number below it. That's a perfect fit for recursion.

def factorial(n):
    # Base case: factorial of 0 or 1 is 1
    if n <= 1:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)

print(factorial(5))
print(factorial(10))

This outputs:

120
3628800

The base case is n <= 1. When we hit 0 or 1, we stop and return 1. The recursive case is n * factorial(n - 1). We multiply n by the factorial of the number below it, trusting the function to figure out the rest.

How Python Handles Recursive Calls

When a function calls itself, Python doesn't just replace the current call – it stacks them. Each call waits for the one below it to return a value before it can finish.

Let's trace factorial(4) step by step:

factorial(4)
  └── 4 * factorial(3)
            └── 3 * factorial(2)
                      └── 2 * factorial(1)
                                └── returns 1   ← base case
                      └── 2 * 1 = 2
            └── 3 * 2 = 6
  └── 4 * 6 = 24

Each call is pushed onto Python's call stack. Once the base case returns, the stack unwinds: each waiting call gets its answer and finishes. This is why deep recursion can be a problem: stack up too many calls and Python hits its recursion limit and raises a RecursionError.
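
You can watch the stack grow and unwind by adding a depth argument and a few print statements. `factorial_traced` is a hypothetical variant of the function above, written only for this demonstration:

```python
def factorial_traced(n, depth=0):
    """Same logic as factorial, with prints showing the stack growing and unwinding."""
    indent = "  " * depth
    print(f"{indent}factorial({n}) called")
    if n <= 1:
        print(f"{indent}base case, returning 1")
        return 1
    result = n * factorial_traced(n - 1, depth + 1)
    print(f"{indent}factorial({n}) returning {result}")
    return result


factorial_traced(4)
```

The "called" lines appear with increasing indentation on the way down, and the "returning" lines appear in the opposite order as each waiting call finishes.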

Recursion vs Iteration

Most problems you can solve recursively, you can also solve with a loop. Let's compare both approaches for summing a list of numbers.

Iterative approach:

def sum_iterative(numbers):
    total = 0
    for n in numbers:
        total += n
    return total

Recursive approach:

def sum_recursive(numbers):
    if not numbers:       # base case: empty list
        return 0
    return numbers[0] + sum_recursive(numbers[1:])

print(sum_recursive([10, 20, 30, 40]))

The recursive function call gives the following output:

100

The recursive version says: the sum of a list is the first element plus the sum of everything else. The base case is an empty list, which sums to 0.

Both the iterative and recursive approaches work. Recursion tends to be more expressive. It's closer to how you'd describe the problem in plain English. Iteration tends to be more efficient in Python. Knowing when to use which one is a skill you'll develop with practice.
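
Part of the efficiency gap in this particular example comes from numbers[1:], which copies the rest of the list on every call. A variant that passes an index avoids the copying; `sum_from` is a hypothetical name used for this sketch:

```python
def sum_from(numbers, start=0):
    """Recursive sum that walks an index instead of slicing the list."""
    if start == len(numbers):      # base case: nothing left to add
        return 0
    return numbers[start] + sum_from(numbers, start + 1)


print(sum_from([10, 20, 30, 40]))  # 100
```

This still uses one stack frame per element, so the loop remains the better choice for long lists.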

Working with Nested Data

Here's where recursion really starts to make sense. Loops are great for flat data, but nested structures — like a folder tree or a deeply nested dictionary — are often awkward to handle with loops alone.

Let's say you have a nested dictionary representing a product catalog, and you want to find all the prices buried inside it:

catalog = {
    "electronics": {
        "laptops": {
            "ThinkPad X1": 1299.99,
            "MacBook Air": 1099.99
        },
        "accessories": {
            "USB-C Hub": 49.99,
            "Laptop Stand": 34.99
        }
    },
    "stationery": {
        "Notebook A5": 8.99,
        "Gel Pen Set": 12.49
    }
}

def find_all_prices(data):
    prices = []
    for value in data.values():
        if isinstance(value, dict):
            # It's a nested dict — recurse into it
            prices.extend(find_all_prices(value))
        else:
            # It's a price — collect it
            prices.append(value)
    return prices

all_prices = find_all_prices(catalog)
print(f"All prices: {all_prices}")
print(f"Total inventory value: ${sum(all_prices):.2f}")

The function checks each value. If it's another dictionary, it recurses into it. If it's a number, it collects it.

Writing this with nested loops would require you to know the depth of the structure in advance. Recursion doesn't care how deep it goes.

Output:

All prices: [1299.99, 1099.99, 49.99, 34.99, 8.99, 12.49]
Total inventory value: $2506.44
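
Recursion is not the only way to handle unknown depth. An explicit stack does the same job, at some cost in readability. The sketch below is an alternative for comparison, not a replacement for the function above. `find_all_prices_iterative` is a hypothetical name, and the sample is a smaller catalog so the block is self-contained:

```python
def find_all_prices_iterative(data):
    """Collect every non-dict value using an explicit stack instead of recursion."""
    prices = []
    stack = [data]                      # dictionaries still waiting to be scanned
    while stack:
        current = stack.pop()
        for value in current.values():
            if isinstance(value, dict):
                stack.append(value)     # visit later
            else:
                prices.append(value)
    return prices


sample = {
    "electronics": {"laptops": {"ThinkPad X1": 1299.99}, "USB-C Hub": 49.99},
    "stationery": {"Notebook A5": 8.99},
}
print(sorted(find_all_prices_iterative(sample)))
```

The prices come back in a different order than in the recursive version, because the stack visits the most recently discovered dictionary first. That is why the example sorts them before printing.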

Recursive Tree Traversal

A tree is a structure where each node can have child nodes, and each child node is itself a tree. That self-similar structure maps directly to a recursive function.

Let's build a simple file system tree and calculate the total size of all files:

class FileNode:
    def __init__(self, name, size=0, children=None):
        self.name = name
        self.size = size  # 0 for folders
        self.children = children or []

def total_size(node):
    # Base case: it's a file (no children)
    if not node.children:
        return node.size
    # Recursive case: sum this node's size + all children's sizes
    return node.size + sum(total_size(child) for child in node.children)

# Build a small file tree
project = FileNode("project", children=[
    FileNode("src", children=[
        FileNode("main.py", size=12400),
        FileNode("utils.py", size=5800),
    ]),
    FileNode("data", children=[
        FileNode("sales_jan.parquet", size=302914),
        FileNode("sales_feb.parquet", size=289000),
    ]),
    FileNode("README.md", size=3200)
])

print(f"Total project size: {total_size(project):,} bytes")
print(f"Source files only: {total_size(project.children[0]):,} bytes")

For each node, we either return its size directly (base case: it's a file) or add its size to the sum of all its children (recursive case: it's a folder). The structure of the code mirrors the structure of the tree.

Running the above should give the following output:

Total project size: 613,314 bytes
Source files only: 18,200 bytes
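
The same recursive shape works for other questions about the tree. As a rough sketch, here's a list_paths helper (a name invented for this example) that collects the full path of every file. Like total_size, it assumes any node without children is a file.

```python
class FileNode:
    def __init__(self, name, size=0, children=None):
        self.name = name
        self.size = size
        self.children = children or []

def list_paths(node, prefix=""):
    """Return the full path of every leaf node under `node`."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    # Base case: no children, so we treat the node as a file
    if not node.children:
        return [path]
    # Recursive case: collect paths from each child
    paths = []
    for child in node.children:
        paths.extend(list_paths(child, path))
    return paths

project = FileNode("project", children=[
    FileNode("src", children=[FileNode("main.py", size=12400)]),
    FileNode("README.md", size=3200),
])

file_paths = list_paths(project)
for file_path in file_paths:
    print(file_path)
```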

Memoization: Fixing Slow Recursion

Recursion can sometimes do a lot of repeated work. The classic example is the Fibonacci sequence, where each number is the sum of the two before it.

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

This works, but fib(35) already takes a noticeable pause. The problem is repeated work: within a single fib(35) call, fib(30) is computed 8 times, and small values like fib(10) are computed more than 100,000 times across different branches.
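
If you'd like to see the repeated work for yourself, one way is to count the calls. This is a small sketch. fib_counted and call_counts are names introduced here, and it uses fib(20) so it finishes quickly.

```python
from collections import Counter

call_counts = Counter()  # maps n -> number of times fib_counted(n) ran

def fib_counted(n):
    call_counts[n] += 1
    if n <= 1:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)

result = fib_counted(20)
total_calls = sum(call_counts.values())

print(f"fib(20) = {result}")
print(f"Total calls: {total_calls}")
print(f"fib(10) was computed {call_counts[10]} times")
```

Even for n = 20, the function is called tens of thousands of times to produce a single number.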

The fix is memoization, which involves caching results so each value is computed only once. Python makes this super simple with functools.lru_cache, which you can use like so:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_fast(n):
    if n <= 1:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)

print(fib_fast(10))
print(fib_fast(50))
print(fib_fast(100))

Output:

55
12586269025
354224848179261915075

Adding @lru_cache stores the result of each call. The next time fib_fast(30) is needed, it returns the cached value instantly instead of recalculating the entire subtree.

Note: Memoization is worth reaching for any time your recursive function might solve the same subproblem more than once.
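
To check that the cache is actually doing its job, you can inspect it with cache_info(), which lru_cache adds to the decorated function. Here's a minimal sketch on a fresh cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_fast(n):
    if n <= 1:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)

value = fib_fast(30)
info = fib_fast.cache_info()

print(value)
# One miss per distinct n from 0 to 30; every other call is a cache hit
print(info)
```

Each distinct input is computed once (a miss), and every repeat request is served from the cache (a hit).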

Python's Recursion Limit

Python sets a default recursion limit of 1000 calls. If your function goes deeper than that, you'll get a RecursionError:

def countdown(n):
    if n == 0:
        return "Done"
    return countdown(n - 1)

print(countdown(5))     # works fine
print(countdown(2000))  # raises RecursionError

Here countdown(5) works and returns "Done", while countdown(2000) raises a RecursionError because it exceeds the default limit of 1000 nested calls.

Done
RecursionError: maximum recursion depth exceeded

You can raise the limit with sys.setrecursionlimit(), but needing to do so is usually a sign that iteration is the better tool for that particular problem. Memoization cuts down repeated work, but it doesn't reduce how deep the calls go.

import sys
sys.setrecursionlimit(5000)  # use with caution

For most tree traversals and divide-and-conquer algorithms, the default limit is more than enough. You'll only hit it when working with very deep input structures.
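
When the input really can be that deep, one option is to replace the call stack with an explicit list. The sketch below is one possible iterative rewrite of the price finder, not the only way to do it. build_nested is a helper made up here to produce a dictionary nested 5000 levels deep.

```python
def build_nested(depth):
    """Build a dict nested `depth` levels deep, with one price per level."""
    root = {}
    current = root
    for _ in range(depth):
        current["price"] = 1.0
        current["child"] = {}
        current = current["child"]
    return root

def find_all_prices_iterative(data):
    prices = []
    stack = [data]  # dicts we still need to visit
    while stack:
        current = stack.pop()
        for value in current.values():
            if isinstance(value, dict):
                stack.append(value)  # visit later instead of recursing
            else:
                prices.append(value)
    return prices

deep = build_nested(5000)
prices = find_all_prices_iterative(deep)
print(len(prices))  # 5000
```

Because nothing recurses, the recursion limit never comes into play. Note that on branching data this version may return prices in a different order than the recursive one.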

When to Use Recursion

Recursion is a good fit when:

The problem has a naturally self-similar structure — trees, graphs, nested data, file systems
You're implementing divide-and-conquer algorithms — merge sort, binary search, quicksort
The recursive solution is significantly clearer than the iterative equivalent
The depth of the structure isn't known in advance

Prefer iteration when:

You're working with flat sequences — summing a list, searching an array
Performance is critical — Python doesn't optimise tail calls, so deep recursion has overhead
The input could be very large or deeply nested — risking a RecursionError

Conclusion

Recursion takes a little time to feel natural, but the core idea is simple: solve a small version of the problem, trust the function to handle the rest, and always define a base case to stop.

You've covered the fundamentals: how the call stack works, how to handle nested data, how to traverse trees, and how to speed things up with memoization. These patterns come up repeatedly in practice, especially when working with file systems, parsers, and hierarchical data.

The best way to get comfortable with recursion is to pick a suitable problem and try writing it recursively before reaching for a loop. The thinking gets easier every time. You can also solve related programming challenges on HackerRank or LeetCode.

Happy coding!

How to Implement the Strategy Pattern in Python

Bala Priya C — Wed, 11 Mar 2026 20:40:29 +0000

Have you ever opened a food delivery app and chosen between "fastest route", "cheapest option", or "fewest stops"? Or picked a payment method at checkout like credit card, PayPal, or wallet balance? Behind both of these, there's a good chance the strategy pattern is at work.

The strategy pattern lets you define a family of algorithms, put each one in its own class, and make them interchangeable at runtime. Instead of writing a giant if/elif chain every time behavior needs to change, you swap in the right strategy for the job.

In this tutorial, you'll learn what the strategy pattern is, why it's useful, and how to implement it in Python with practical examples.

You can get the code on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Basic understanding of Python classes and methods
Familiarity with object-oriented programming (OOP) concepts

Let's get started!

What Is the Strategy Pattern?
A Simple Strategy Pattern Example
Swapping Strategies at Runtime
Using Abstract Base Classes
When to Use the Strategy Pattern

What Is the Strategy Pattern?

The strategy pattern defines a way to encapsulate a group of related algorithms so they can be used interchangeably. The object that uses the algorithm, called the context, doesn't need to know how it works. It just delegates the work to whichever strategy is currently set.

Think of it like a GPS app. The destination is the same, but you can switch between "avoid highways", "shortest distance", or "least traffic" without changing the destination or the app itself. Each routing option is a separate strategy.

The pattern is useful when:

You have multiple variations of an algorithm or behavior
You want to eliminate long if/elif conditionals based on type
You want to swap behavior at runtime without changing the context class
Different parts of your app need different variations of the same operation

Now let's look at examples to understand this better.

A Simple Strategy Pattern Example

Let's build a simple e-commerce order system where different discount strategies can be applied at checkout.

First, let's create the three discount strategies:

class RegularDiscount:
    def apply(self, price):
        return price * 0.95  # 5% off

class SeasonalDiscount:
    def apply(self, price):
        return price * 0.80  # 20% off

class NoDiscount:
    def apply(self, price):
        return price  # no change

Each class has a single apply method that takes a price and returns the discounted price. They share the same interface but implement different logic: that's the key concept in the strategy pattern.

Now let's create the Order class that uses one of these strategies:

class Order:
    def __init__(self, product, price, discount_strategy):
        self.product = product
        self.price = price
        self.discount_strategy = discount_strategy

    def final_price(self):
        return self.discount_strategy.apply(self.price)

    def summary(self):
        print(f"Product : {self.product}")
        print(f"Original: ${self.price:.2f}")
        print(f"Final   : ${self.final_price():.2f}")
        print("-" * 30)

The Order class is our context. It doesn't contain any discount logic itself – it delegates that entirely to discount_strategy.apply(). Whichever strategy object you pass in, that's the one that runs.

Now let's place some orders:

order1 = Order("Mechanical Keyboard", 120.00, NoDiscount())
order2 = Order("Laptop Stand", 45.00, RegularDiscount())
order3 = Order("USB-C Hub", 35.00, SeasonalDiscount())

order1.summary()
order2.summary()
order3.summary()

Running the above code should give you the following output:

Product : Mechanical Keyboard
Original: $120.00
Final   : $120.00
------------------------------
Product : Laptop Stand
Original: $45.00
Final   : $42.75
------------------------------
Product : USB-C Hub
Original: $35.00
Final   : $28.00
------------------------------

Notice how Order never checks if discount_type == "seasonal". It just calls apply() and trusts the strategy to handle it. Adding a new discount type in the future means creating one new class; nothing else changes.
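
As a quick illustration, here's what adding a new strategy might look like. LoyaltyDiscount and its percent parameter are hypothetical and not part of the original example. The Order class is a trimmed copy of the one above so the snippet runs on its own.

```python
class Order:
    def __init__(self, product, price, discount_strategy):
        self.product = product
        self.price = price
        self.discount_strategy = discount_strategy

    def final_price(self):
        return self.discount_strategy.apply(self.price)

class LoyaltyDiscount:
    """Hypothetical strategy: a configurable percentage off."""
    def __init__(self, percent):
        self.percent = percent

    def apply(self, price):
        return price * (1 - self.percent / 100)

# Order is untouched; it only needs an object with an apply() method
order = Order("Webcam", 60.00, LoyaltyDiscount(10))
print(f"Final: ${order.final_price():.2f}")
```

Strategies can also carry their own configuration, as the percent argument shows, without the context knowing about it.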

Swapping Strategies at Runtime

One of the biggest advantages of the strategy pattern is that you can change the strategy while the program is running. Let's say a user upgrades to a premium membership mid-session:

class ShoppingCart:
    def __init__(self):
        self.items = []
        self.discount_strategy = NoDiscount()  # default

    def add_item(self, name, price):
        self.items.append({"name": name, "price": price})

    def set_discount(self, strategy):
        self.discount_strategy = strategy
        print(f"Discount updated to: {strategy.__class__.__name__}")

    def checkout(self):
        print("\n--- Checkout Summary ---")
        total = 0
        for item in self.items:
            discounted = self.discount_strategy.apply(item["price"])
            print(f"{item['name']}: ${discounted:.2f}")
            total += discounted
        print(f"Total: ${total:.2f}\n")

The set_discount method lets us replace the strategy at any point. Let's see it in action:

cart = ShoppingCart()
cart.add_item("Notebook", 15.00)
cart.add_item("Desk Lamp", 40.00)
cart.add_item("Monitor Riser", 25.00)

# Checkout with the default strategy (no discount)
cart.checkout()

# User upgrades mid-session: switch to the seasonal discount
cart.set_discount(SeasonalDiscount())
cart.checkout()

This outputs:

--- Checkout Summary ---
Notebook: $15.00
Desk Lamp: $40.00
Monitor Riser: $25.00
Total: $80.00

Discount updated to: SeasonalDiscount

--- Checkout Summary ---
Notebook: $12.00
Desk Lamp: $32.00
Monitor Riser: $20.00
Total: $64.00

The cart itself didn't change – only the strategy did. This is the advantage of keeping behavior separate from the context that uses it.

Using Abstract Base Classes

So far, nothing enforces that every strategy has an apply method. If someone creates a strategy and forgets it, they'll get a cryptic AttributeError at runtime. We can prevent that using Python's Abstract Base Classes.

from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, price: float) -> float:
        pass

Now let's rewrite our strategies to inherit from it:

class RegularDiscount(DiscountStrategy):
    def apply(self, price):
        return price * 0.95

class SeasonalDiscount(DiscountStrategy):
    def apply(self, price):
        return price * 0.80

class NoDiscount(DiscountStrategy):
    def apply(self, price):
        return price

Now if someone creates a broken strategy without apply, Python raises a TypeError as soon as they try to instantiate it, before the strategy is ever used. That's a much cleaner failure.

class BrokenStrategy(DiscountStrategy):
    pass  # forgot to implement apply()

s = BrokenStrategy()  # raises TypeError right here

Using ABCs is especially helpful on larger teams or in shared codebases, where you want to make the contract explicit: every strategy must implement apply. Otherwise, instantiating the class fails with an error like the one below (the exact wording varies by Python version):

      2     pass  # forgot to implement apply()
      3 
----> 4 s = BrokenStrategy()  # raises TypeError right here

TypeError: Can't instantiate abstract class BrokenStrategy without an implementation for abstract method 'apply'
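
If you want to see this behavior without crashing your script, you can wrap the instantiation in a try/except. This is a small self-contained sketch. HalfOff is a made-up strategy added only to show that a complete subclass still works.

```python
from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, price: float) -> float:
        pass

class BrokenStrategy(DiscountStrategy):
    pass  # apply() is missing

class HalfOff(DiscountStrategy):
    def apply(self, price):
        return price * 0.5

try:
    BrokenStrategy()
    instantiated = True
except TypeError as err:
    instantiated = False
    print(f"Rejected: {err}")

# A subclass that implements apply() instantiates normally
print(HalfOff().apply(80.0))  # 40.0
```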

When to Use the Strategy Pattern

The Strategy pattern is a good fit when:

You have branching logic based on type — long if/elif blocks that check a "mode" or "type" variable are a signal that Strategy might help.
Behavior needs to change at runtime — when users or config values should be able to switch algorithms without restarting.
You're building extensible systems — new behavior can be added as a new class without touching existing code.
You want to test algorithms independently — each strategy is its own class, making unit tests straightforward.

Avoid it when:

You only have two variations that will never grow — a simple if/else is perfectly fine there.
The strategies share so much state that separating them into classes adds complexity without benefit.
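
To make the "config values" case above more concrete, here's one possible sketch of picking a strategy from a string, such as a value read from a settings file. The STRATEGIES mapping, its keys, and strategy_from_config are all assumptions made for this example.

```python
class NoDiscount:
    def apply(self, price):
        return price

class RegularDiscount:
    def apply(self, price):
        return price * 0.95

class SeasonalDiscount:
    def apply(self, price):
        return price * 0.80

# Hypothetical mapping from config values to strategy classes
STRATEGIES = {
    "none": NoDiscount,
    "regular": RegularDiscount,
    "seasonal": SeasonalDiscount,
}

def strategy_from_config(name):
    """Look up and instantiate the strategy named in the config."""
    try:
        return STRATEGIES[name]()
    except KeyError:
        raise ValueError(f"Unknown discount strategy: {name!r}") from None

strategy = strategy_from_config("seasonal")
print(strategy.apply(100.0))  # 80.0
```

The lookup is the only place that knows the strategy names. The code that applies the discount just calls apply().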

Conclusion

I hope you found this tutorial useful. To sum up, the strategy pattern gives you a clean way to manage varying behavior without polluting your classes with conditional logic. The context stays simple and stable, and the strategies handle the complexity.

We covered the basic pattern, runtime strategy swapping, and enforcing contracts with abstract base classes. As with most design patterns, start simple: even without ABCs, separating your algorithms into their own classes immediately makes your code easier to read, test, and extend.

Happy coding!

How to Implement the Observer Pattern in Python

Bala Priya C — Tue, 17 Feb 2026 19:42:28 +0000

Have you ever wondered how YouTube notifies you when your favorite channel uploads a new video? Or how your email client alerts you when new messages arrive? These are perfect examples of the observer pattern in action.

The observer pattern is a design pattern where an object (called the subject) maintains a list of dependents (called observers) and notifies them automatically when its state changes. It's like having a newsletter subscription: when new content is published, all subscribers get notified.

In this tutorial, you'll learn what the observer pattern is, why it's useful, and how to implement it in Python with practical examples.

You can find the code on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Understanding of how Python classes and methods work
Familiarity with object-oriented programming (OOP) concepts

Let's get started!

Prerequisites
What Is the Observer Pattern?
A Simple Observer Pattern Example
Handling Unsubscribes
Different Types of Observers
Using Abstract Base Classes
When to Use the Observer Pattern

What Is the Observer Pattern?

The observer pattern defines a one-to-many relationship between objects. When one object changes state, all its dependents are notified and updated automatically.

Think of it like a news agency and reporters. When breaking news happens, the agency (the subject) notifies all subscribed reporters (the observers) immediately. Each reporter can then handle the news in their own way: some might tweet it, others might write articles, and some might broadcast it on TV.

The pattern is useful when:

You need to notify multiple objects about state changes
You want loose coupling between objects
You don't know how many objects need to be notified in advance
Objects should be able to subscribe and unsubscribe dynamically

A Simple Observer Pattern Example

Let's start with a basic example: a blog that notifies readers when a new article is published.

We'll create a blog (subject) and email subscribers (observers) who get notified automatically when new content is posted.

First, let's build the Blog class that will manage subscribers and send notifications:

class Blog:
    def __init__(self, name):
        self.name = name
        self._subscribers = []
        self._latest_post = None

    def subscribe(self, subscriber):
        """Add a subscriber to the blog"""
        if subscriber not in self._subscribers:
            self._subscribers.append(subscriber)
            print(f"✓ {subscriber.email} subscribed to {self.name}")

    def unsubscribe(self, subscriber):
        """Remove a subscriber from the blog"""
        if subscriber in self._subscribers:
            self._subscribers.remove(subscriber)
            print(f"✗ {subscriber.email} unsubscribed from {self.name}")

    def notify_all(self):
        """Send notifications to all subscribers"""
        print(f"\nNotifying {len(self._subscribers)} subscribers...")
        for subscriber in self._subscribers:
            subscriber.receive_notification(self.name, self._latest_post)

    def publish_post(self, title):
        """Publish a new post and notify subscribers"""
        print(f"\n📝 {self.name} published: '{title}'")
        self._latest_post = title
        self.notify_all()

The Blog class is our subject. It maintains a list of subscribers in _subscribers and stores the latest post title in _latest_post. The subscribe method adds subscribers to the list, checking for duplicates. The notify_all method loops through all subscribers and calls their receive_notification method. When we call publish_post, it updates the latest post and automatically notifies all subscribers.

Now let's create the observer class that receives notifications:

class EmailSubscriber:
    def __init__(self, email):
        self.email = email

    def receive_notification(self, blog_name, post_title):
        print(f"📧 Email sent to {self.email}: New post on {blog_name} - '{post_title}'")

The EmailSubscriber class is our observer. It has one method, receive_notification, which handles incoming notifications from the blog.

Now let's use these classes together:

# Create a blog
tech_blog = Blog("DevDaily")

# Create subscribers
reader1 = EmailSubscriber("anna@example.com")
reader2 = EmailSubscriber("betty@example.com")
reader3 = EmailSubscriber("cathy@example.com")

# Subscribe to the blog
tech_blog.subscribe(reader1)
tech_blog.subscribe(reader2)
tech_blog.subscribe(reader3)

# Publish posts
tech_blog.publish_post("10 Python Tips for Beginners")
tech_blog.publish_post("Understanding Design Patterns")

Output:

✓ anna@example.com subscribed to DevDaily
✓ betty@example.com subscribed to DevDaily
✓ cathy@example.com subscribed to DevDaily

📝 DevDaily published: '10 Python Tips for Beginners'

Notifying 3 subscribers...
📧 Email sent to anna@example.com: New post on DevDaily - '10 Python Tips for Beginners'
📧 Email sent to betty@example.com: New post on DevDaily - '10 Python Tips for Beginners'
📧 Email sent to cathy@example.com: New post on DevDaily - '10 Python Tips for Beginners'

📝 DevDaily published: 'Understanding Design Patterns'

Notifying 3 subscribers...
📧 Email sent to anna@example.com: New post on DevDaily - 'Understanding Design Patterns'
📧 Email sent to betty@example.com: New post on DevDaily - 'Understanding Design Patterns'
📧 Email sent to cathy@example.com: New post on DevDaily - 'Understanding Design Patterns'

Notice how the Blog class doesn't need to know the details of how each subscriber handles the notification. It just calls their receive_notification method.

Note: Think of all the examples here as placeholder functions that explain how the observer pattern works. In your projects, you’ll have functions that connect to email and other services.

Handling Unsubscribes

In real applications, users need to be able to unsubscribe. Here's how that works:

blog = Blog("CodeMaster")

user1 = EmailSubscriber("john@example.com")
user2 = EmailSubscriber("jane@example.com")

# Subscribe users
blog.subscribe(user1)
blog.subscribe(user2)

# Publish a post
blog.publish_post("Getting Started with Python")

# User1 unsubscribes
blog.unsubscribe(user1)

# Publish another post - only user2 gets notified
blog.publish_post("Advanced Python Techniques")

Output:

✓ john@example.com subscribed to CodeMaster
✓ jane@example.com subscribed to CodeMaster

📝 CodeMaster published: 'Getting Started with Python'

Notifying 2 subscribers...
📧 Email sent to john@example.com: New post on CodeMaster - 'Getting Started with Python'
📧 Email sent to jane@example.com: New post on CodeMaster - 'Getting Started with Python'
✗ john@example.com unsubscribed from CodeMaster

📝 CodeMaster published: 'Advanced Python Techniques'

Notifying 1 subscribers...
📧 Email sent to jane@example.com: New post on CodeMaster - 'Advanced Python Techniques'

After user1 unsubscribes, only user2 receives the notification for the second post. The observer pattern makes it easy to add and remove observers dynamically.
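
One thing to watch for: if an observer unsubscribes while notifications are being sent, the subscriber list changes in the middle of the loop. A common way to handle this is to loop over a copy of the list. The sketch below is a trimmed variation of the Blog class, not the version above. It passes the blog object itself to receive_notification, and OneTimeSubscriber is a hypothetical observer added for illustration.

```python
class Blog:
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, subscriber):
        if subscriber not in self._subscribers:
            self._subscribers.append(subscriber)

    def unsubscribe(self, subscriber):
        if subscriber in self._subscribers:
            self._subscribers.remove(subscriber)

    def publish_post(self, title):
        # list(...) takes a snapshot, so removals during the loop are safe
        for subscriber in list(self._subscribers):
            subscriber.receive_notification(self, title)

class EmailSubscriber:
    def __init__(self, email):
        self.email = email
        self.received = []

    def receive_notification(self, blog, post_title):
        self.received.append(post_title)

class OneTimeSubscriber(EmailSubscriber):
    """Hypothetical observer that unsubscribes after its first notification."""
    def receive_notification(self, blog, post_title):
        super().receive_notification(blog, post_title)
        blog.unsubscribe(self)

blog = Blog("CodeMaster")
trial_user = OneTimeSubscriber("trial@example.com")
regular_user = EmailSubscriber("jane@example.com")
blog.subscribe(trial_user)
blog.subscribe(regular_user)

blog.publish_post("Getting Started with Python")
blog.publish_post("Advanced Python Techniques")

print(trial_user.received)
print(regular_user.received)
```

Without the copy, removing trial_user during the first loop would cause regular_user to be skipped for that post.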

Different Types of Observers

One super useful aspect of the observer pattern is that different observers can react differently to the same event. Let's create a stock price tracker where multiple observer types respond to price changes.

First, let's create the Stock class that will notify observers when the price changes:

class Stock:
    def __init__(self, symbol, price):
        self.symbol = symbol
        self._price = price
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)
        print(f"Observer added: {observer.__class__.__name__}")

    def remove_observer(self, observer):
        self._observers.remove(observer)

    def notify_observers(self):
        for observer in self._observers:
            observer.update(self.symbol, self._price)

    def set_price(self, price):
        print(f"\n {self.symbol} price changed: ${self._price} → ${price}")
        self._price = price
        self.notify_observers()

The Stock class maintains the current price and notifies all observers whenever set_price is called.

Now let's create three different observer types that respond differently to price updates:

class EmailAlert:
    def __init__(self, email):
        self.email = email

    def update(self, symbol, price):
        print(f"📧 Sending email to {self.email}: {symbol} is now ${price}")

class SMSAlert:
    def __init__(self, phone):
        self.phone = phone

    def update(self, symbol, price):
        print(f"📱 Sending SMS to {self.phone}: {symbol} price update ${price}")

class Logger:
    def update(self, symbol, price):
        print(f"📝 Logging: {symbol} = ${price} at system time")

Each observer has a different implementation of the update method. EmailAlert sends emails, SMSAlert sends text messages, and Logger records the change.

Now let's use them together:

# Create a stock
apple_stock = Stock("AAPL", 150.00)

# Create different types of observers
email_notifier = EmailAlert("investor@example.com")
sms_notifier = SMSAlert("+1234567890")
price_logger = Logger()

# Add all observers
apple_stock.add_observer(email_notifier)
apple_stock.add_observer(sms_notifier)
apple_stock.add_observer(price_logger)

# Update the stock price
apple_stock.set_price(155.50)
apple_stock.set_price(152.25)

Output:

Observer added: EmailAlert
Observer added: SMSAlert
Observer added: Logger

 AAPL price changed: $150.0 → $155.5
📧 Sending email to investor@example.com: AAPL is now $155.5
📱 Sending SMS to +1234567890: AAPL price update $155.5
📝 Logging: AAPL = $155.5 at system time

 AAPL price changed: $155.5 → $152.25
📧 Sending email to investor@example.com: AAPL is now $152.25
📱 Sending SMS to +1234567890: AAPL price update $152.25
📝 Logging: AAPL = $152.25 at system time

The Stock class doesn't care what each observer does. It simply calls update on each one and passes the necessary data. You can mix and match observers however you want.
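
Observers can also decide whether to react at all. Here's a rough sketch of a ThresholdAlert observer, a name and behavior invented for this example, that only responds when the price rises above a limit. The Stock class is a trimmed copy so the snippet runs on its own.

```python
class Stock:
    def __init__(self, symbol, price):
        self.symbol = symbol
        self._price = price
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def set_price(self, price):
        self._price = price
        for observer in self._observers:
            observer.update(self.symbol, self._price)

class ThresholdAlert:
    """Hypothetical observer: reacts only when the price exceeds `limit`."""
    def __init__(self, limit):
        self.limit = limit
        self.triggered = []

    def update(self, symbol, price):
        if price > self.limit:
            self.triggered.append((symbol, price))
            print(f"Alert: {symbol} is above ${self.limit}: now ${price}")

apple_stock = Stock("AAPL", 150.00)
alert = ThresholdAlert(limit=155.00)
apple_stock.add_observer(alert)

apple_stock.set_price(155.50)  # above the limit, so the alert fires
apple_stock.set_price(152.25)  # below the limit, so nothing happens
```

The filtering logic lives entirely in the observer, so Stock stays unchanged.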

Using Abstract Base Classes

To enforce a consistent interface across all observers, we can use Python's Abstract Base Classes. An observer class that doesn't implement the required method can't be instantiated.

First, let's create the base classes that define our interface:

from abc import ABC, abstractmethod

class Subject(ABC):
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def notify(self, data):
        for observer in self._observers:
            observer.update(data)

class Observer(ABC):
    @abstractmethod
    def update(self, data):
        pass

The Subject class provides standard observer management methods. The Observer class defines the interface with the @abstractmethod decorator ensuring all observers implement update.

Now let's create an order system that uses these base classes:

class OrderSystem(Subject):
    def __init__(self):
        super().__init__()
        self._order_id = None

    def place_order(self, order_id, items):
        print(f"\n🛒 Order #{order_id} placed with {len(items)} items")
        self._order_id = order_id
        self.notify({"order_id": order_id, "items": items})

The OrderSystem inherits from Subject and can manage observers without implementing that logic itself.

Next, let's create concrete observers for different departments:

class InventoryObserver(Observer):
    def update(self, data):
        print(f"📦 Inventory: Updating stock for order #{data['order_id']}")

class ShippingObserver(Observer):
    def update(self, data):
        print(f"🚚 Shipping: Preparing shipment for order #{data['order_id']}")

class BillingObserver(Observer):
    def update(self, data):
        print(f"💳 Billing: Processing payment for order #{data['order_id']}")

Each observer must implement the update method. Now let's put it all together:

# Create the order system
order_system = OrderSystem()

# Create observers
inventory = InventoryObserver()
shipping = ShippingObserver()
billing = BillingObserver()

# Attach observers
order_system.attach(inventory)
order_system.attach(shipping)
order_system.attach(billing)

# Place an order
order_system.place_order("ORD-12345", ["Laptop", "Mouse", "Keyboard"])

Output:

🛒 Order #ORD-12345 placed with 3 items
📦 Inventory: Updating stock for order #ORD-12345
🚚 Shipping: Preparing shipment for order #ORD-12345
💳 Billing: Processing payment for order #ORD-12345

Using abstract base classes makes the interface explicit: any Observer subclass that doesn't implement update can't be instantiated.
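
Keep in mind that the ABC only constrains classes that inherit from Observer. If you also want the subject to reject unrelated objects, one option is an isinstance check in attach. The sketch below shows that idea under that assumption. AuditObserver and DuckObserver are made-up names, and the check is an addition, not part of the example above.

```python
from abc import ABC, abstractmethod

class Observer(ABC):
    @abstractmethod
    def update(self, data):
        pass

class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        # Assumption for this sketch: only Observer subclasses are allowed
        if not isinstance(observer, Observer):
            raise TypeError(f"{type(observer).__name__} is not an Observer")
        if observer not in self._observers:
            self._observers.append(observer)

    def notify(self, data):
        for observer in self._observers:
            observer.update(data)

class AuditObserver(Observer):
    def __init__(self):
        self.seen = []

    def update(self, data):
        self.seen.append(data["order_id"])

class DuckObserver:
    """Has an update() method but does not inherit from Observer."""
    def update(self, data):
        pass

subject = Subject()
audit = AuditObserver()
subject.attach(audit)
subject.notify({"order_id": "ORD-12345"})
print(audit.seen)

try:
    subject.attach(DuckObserver())
    rejected = False
except TypeError as err:
    rejected = True
    print(f"Rejected: {err}")
```

Whether to add this check is a design choice. Skipping it keeps duck typing, so any object with an update method works.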

When to Use the Observer Pattern

The observer pattern is suitable for:

Event-driven systems – GUI frameworks, game engines, or any system where actions trigger updates elsewhere.
Real-time notifications – Chat apps, social media feeds, stock tickers, or push notification systems.
Decoupled architecture – When you want the subject independent of its observers for flexibility.
Multiple listeners – When multiple objects need to react to the same event differently.

Avoid the observer pattern when you have simple one-to-one relationships, or when performance is critical and there are many observers, since every notification calls each observer in turn.

Conclusion

The observer pattern creates a clean separation between objects that produce events and objects that respond to them. It promotes loose coupling – the subject doesn't need to know anything about its observers except that they have an update method.

We've covered the basic implementation, handling subscriptions, using different observer types, and abstract base classes. Start simple with the basic subject-observer relationship and add complexity only when needed.

Happy coding!

How to Use the Factory Pattern in Python - A Practical Guide

Bala Priya C — Mon, 09 Feb 2026 15:03:55 +0000

Design patterns are proven solutions to common problems in software development. If you've ever found yourself writing repetitive object creation code or struggling to manage different types of objects, the factory pattern might be exactly what you need.

In this tutorial, you'll learn what the factory pattern is, why it's useful, and how to implement it in Python. We'll build practical examples that show you when and how to use this pattern in real-world applications.

You can find the code on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Understanding of Python classes and methods
Familiarity with object-oriented programming (OOP) concepts

Let’s get started!

What Is the Factory Pattern?
A Simple Factory Example
Using a Dictionary for Cleaner Code
Factory Pattern with Parameters
Using Abstract Base Classes
A More Helpful Example: Database Connection Factory
When to Use the Factory Pattern

What Is the Factory Pattern?

The factory pattern is a creational design pattern that provides an interface for creating objects without specifying their exact classes. Instead of calling a constructor directly, you call a factory method that decides which class to instantiate.

Think of it like ordering food at a restaurant. You don't go into the kitchen and make the food yourself. You tell the waiter what you want, and the kitchen (the factory) creates it for you. You get your meal without worrying about the recipe or cooking process.

The factory pattern is useful when:

You have multiple related classes and need to decide which one to instantiate at runtime
Object creation logic is complex and you want to encapsulate it
You want to make your code more maintainable and testable

A Simple Factory Example

Let's start with a basic example. Say you're building a notification system that can send messages via email, SMS, or push notifications.

Without a factory, you might write code like this everywhere in your application:

# Bad approach - tight coupling
if notification_type == "email":
    notifier = EmailNotifier()
elif notification_type == "sms":
    notifier = SMSNotifier()
elif notification_type == "push":
    notifier = PushNotifier()

This gets messy quickly. Let's use a factory instead:

class EmailNotifier:
    def send(self, message):
        return f"Sending email: {message}"

class SMSNotifier:
    def send(self, message):
        return f"Sending SMS: {message}"

class PushNotifier:
    def send(self, message):
        return f"Sending push notification: {message}"

class NotificationFactory:
    @staticmethod
    def create_notifier(notifier_type):
        if notifier_type == "email":
            return EmailNotifier()
        elif notifier_type == "sms":
            return SMSNotifier()
        elif notifier_type == "push":
            return PushNotifier()
        else:
            raise ValueError(f"Unknown notifier type: {notifier_type}")

In this code, we define three notifier classes, each with a send method.

Note: In a real application, these would have different implementations for sending notifications.

The NotificationFactory class has a static method called create_notifier. This is our factory method. It takes a string parameter and returns the appropriate notifier object.

The @staticmethod decorator means we can call this method without creating an instance of the factory. We just use NotificationFactory.create_notifier().

# Using the factory
notifier = NotificationFactory.create_notifier("email")
result = notifier.send("Hello, World!")

Now, whenever we need a notifier, we call the factory instead of instantiating classes directly. This centralizes our object creation logic in one place.

Using a Dictionary for Cleaner Code

The if-elif chain in our factory can get unwieldy as we add more notifier types. Let's refactor using a dictionary:

class NotificationFactory:
    notifier_types = {
        "email": EmailNotifier,
        "sms": SMSNotifier,
        "push": PushNotifier
    }

    @staticmethod
    def create_notifier(notifier_type):
        notifier_class = NotificationFactory.notifier_types.get(notifier_type)
        if notifier_class:
            return notifier_class()
        else:
            raise ValueError(f"Unknown notifier type: {notifier_type}")

This approach is much cleaner. We store a dictionary that maps strings to classes, not instances. The keys are notifier type names, and the values are the actual class references.

The get method retrieves the class from the dictionary. If the key doesn't exist, it returns None. We then instantiate the class by calling it with parentheses: notifier_class().

# Test with different types
email_notifier = NotificationFactory.create_notifier("email")
sms_notifier = NotificationFactory.create_notifier("sms")
push_notifier = NotificationFactory.create_notifier("push")

This makes adding new notifier types easier. You just add another entry to the dictionary.
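
If you'd rather not edit the factory class each time, one possible extension is a register method that adds entries at runtime. This is a sketch, not part of the original factory. The register method and SlackNotifier are hypothetical, and it uses @classmethod instead of @staticmethod so the methods can reach the dictionary through cls.

```python
class EmailNotifier:
    def send(self, message):
        return f"Sending email: {message}"

class NotificationFactory:
    notifier_types = {"email": EmailNotifier}

    @classmethod
    def register(cls, name, notifier_class):
        """Hypothetical helper: add a new notifier type at runtime."""
        cls.notifier_types[name] = notifier_class

    @classmethod
    def create_notifier(cls, notifier_type):
        notifier_class = cls.notifier_types.get(notifier_type)
        if notifier_class is None:
            raise ValueError(f"Unknown notifier type: {notifier_type}")
        return notifier_class()

class SlackNotifier:
    """Made-up notifier used only for this example."""
    def send(self, message):
        return f"Sending Slack message: {message}"

NotificationFactory.register("slack", SlackNotifier)
slack = NotificationFactory.create_notifier("slack")
print(slack.send("Deploy finished"))
```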

Factory Pattern with Parameters

Real-world objects often need configuration. Let's extend the factory idea to handle objects that require initialization parameters.

We'll create a document generator that produces different file formats with custom settings:

class PDFDocument:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.format = "PDF"

    def generate(self):
        return f"Generating {self.format}: '{self.title}' by {self.author}"

class WordDocument:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.format = "DOCX"

    def generate(self):
        return f"Generating {self.format}: '{self.title}' by {self.author}"

class MarkdownDocument:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.format = "MD"

    def generate(self):
        return f"Generating {self.format}: '{self.title}' by {self.author}"

class DocumentFactory:
    document_types = {
        "pdf": PDFDocument,
        "word": WordDocument,
        "markdown": MarkdownDocument
    }

    @staticmethod
    def create_document(doc_type, title, author):
        document_class = DocumentFactory.document_types.get(doc_type)
        if document_class:
            return document_class(title, author)
        else:
            raise ValueError(f"Unknown document type: {doc_type}")

The key difference here is that our factory method now accepts additional parameters.

The create_document method takes doc_type, title, and author as arguments. When we instantiate the class, we pass the title and author on to the selected class's constructor: document_class(title, author).

# Create different documents with parameters
pdf = DocumentFactory.create_document("pdf", "Python Guide", "Tutorial Team")
word = DocumentFactory.create_document("word", "Meeting Notes", "Grace Dev")
markdown = DocumentFactory.create_document("markdown", "README", "DevTeam")

This lets us create fully configured objects through the factory while keeping the creation logic centralized.
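If the document classes do not all share the same constructor signature, one option is to forward extra arguments unchanged. The sketch below assumes a hypothetical page_size option that only PDFDocument accepts; it is a variation on the factory above, not the original code.

```python
class PDFDocument:
    def __init__(self, title, author, page_size="A4"):
        # page_size is a hypothetical PDF-only option
        self.title = title
        self.author = author
        self.page_size = page_size


class MarkdownDocument:
    def __init__(self, title, author):
        self.title = title
        self.author = author


class DocumentFactory:
    document_types = {"pdf": PDFDocument, "markdown": MarkdownDocument}

    @staticmethod
    def create_document(doc_type, *args, **kwargs):
        document_class = DocumentFactory.document_types.get(doc_type)
        if document_class is None:
            raise ValueError(f"Unknown document type: {doc_type}")
        # Forward everything else unchanged to the chosen constructor
        return document_class(*args, **kwargs)


pdf = DocumentFactory.create_document("pdf", "Python Guide", "Tutorial Team", page_size="Letter")
readme = DocumentFactory.create_document("markdown", "README", "DevTeam")
print(pdf.page_size, readme.title)
```

The tradeoff is that the factory no longer documents which parameters are expected, so a wrong keyword only fails inside the target constructor.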

Using Abstract Base Classes

To make our factory more robust, we can use Python's Abstract Base Classes (ABC) to enforce a common interface.

Let's create a super simple payment processing system:

from abc import ABC, abstractmethod

class PaymentProcessor(ABC):
    @abstractmethod
    def process_payment(self, amount):
        pass

    @abstractmethod
    def refund(self, transaction_id):
        pass

class CreditCardProcessor(PaymentProcessor):
    def process_payment(self, amount):
        return f"Processing ${amount} via Credit Card"

    def refund(self, transaction_id):
        return f"Refunding credit card transaction {transaction_id}"

class PayPalProcessor(PaymentProcessor):
    def process_payment(self, amount):
        return f"Processing ${amount} via PayPal"

    def refund(self, transaction_id):
        return f"Refunding PayPal transaction {transaction_id}"

class PaymentFactory:
    processors = {
        "credit_card": CreditCardProcessor,
        "paypal": PayPalProcessor
    }

    @staticmethod
    def create_processor(processor_type):
        processor_class = PaymentFactory.processors.get(processor_type)
        if processor_class:
            return processor_class()
        else:
            raise ValueError(f"Unknown processor type: {processor_type}")

Here, the PaymentProcessor class defines an interface that all payment processors must implement. The @abstractmethod decorator marks methods that subclasses must override.

You cannot instantiate PaymentProcessor directly. It only serves as a blueprint. All concrete processors (CreditCardProcessor, PayPalProcessor) must implement both the process_payment and refund methods. If a subclass leaves one out, Python raises a TypeError as soon as you try to instantiate it. This guarantees that any object created by our factory will have the expected methods, making our code more predictable and safer.

You can use the factory like so:

processor = PaymentFactory.create_processor("paypal")
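To see the enforcement in action, here is a small sketch. IncompleteProcessor and try_create are hypothetical names introduced only for this demonstration.

```python
from abc import ABC, abstractmethod


class PaymentProcessor(ABC):
    @abstractmethod
    def process_payment(self, amount):
        pass

    @abstractmethod
    def refund(self, transaction_id):
        pass


class IncompleteProcessor(PaymentProcessor):
    # Hypothetical subclass that forgets to implement refund()
    def process_payment(self, amount):
        return f"Processing ${amount}"


def try_create(cls):
    """Return the instance, or the TypeError message if creation fails."""
    try:
        return cls()
    except TypeError as exc:
        return f"TypeError: {exc}"


base_result = try_create(PaymentProcessor)
incomplete_result = try_create(IncompleteProcessor)
print(base_result)
print(incomplete_result)
```

Both calls fail with a TypeError. Note that the check happens at instantiation time: defining IncompleteProcessor is allowed, but creating an instance of it is not.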

A More Helpful Example: Database Connection Factory

Let's build something practical: a factory that creates different database connection objects based on configuration.

class MySQLConnection:
    def __init__(self, host, database):
        self.host = host
        self.database = database
        self.connection_type = "MySQL"

    def connect(self):
        return f"Connected to {self.connection_type} at {self.host}/{self.database}"

    def execute_query(self, query):
        return f"Executing on MySQL: {query}"

class PostgreSQLConnection:
    def __init__(self, host, database):
        self.host = host
        self.database = database
        self.connection_type = "PostgreSQL"

    def connect(self):
        return f"Connected to {self.connection_type} at {self.host}/{self.database}"

    def execute_query(self, query):
        return f"Executing on PostgreSQL: {query}"

class SQLiteConnection:
    def __init__(self, host, database):
        self.host = host
        self.database = database
        self.connection_type = "SQLite"

    def connect(self):
        return f"Connected to {self.connection_type} at {self.host}/{self.database}"

    def execute_query(self, query):
        return f"Executing on SQLite: {query}"

class DatabaseFactory:
    db_types = {
        "mysql": MySQLConnection,
        "postgresql": PostgreSQLConnection,
        "sqlite": SQLiteConnection
    }

    @staticmethod
    def create_connection(db_type, host, database):
        db_class = DatabaseFactory.db_types.get(db_type)
        if db_class:
            return db_class(host, database)
        else:
            raise ValueError(f"Unknown database type: {db_type}")

    @staticmethod
    def create_from_config(config):
        """Create a database connection from a configuration dictionary"""
        return DatabaseFactory.create_connection(
            config["type"],
            config["host"],
            config["database"]
        )

This example shows a more realistic use case. We have multiple database connection classes, each with the same interface but different implementations.

The factory has two creation methods: create_connection for direct parameters and create_from_config for configuration dictionaries.

The create_from_config method is particularly useful because it lets you load database settings from a config file or environment variables and create the appropriate connection object.

This pattern makes it easy to switch between different databases without changing your application code. You just change the configuration as shown:

# Use with direct parameters
db1 = DatabaseFactory.create_connection("mysql", "localhost", "myapp_db")
print(db1.connect())
print(db1.execute_query("SELECT * FROM users"))

# Use with configuration dictionary
config = {
    "type": "postgresql",
    "host": "db.example.com",
    "database": "production_db"
}
db2 = DatabaseFactory.create_from_config(config)
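The configuration dictionary could also come from environment variables. The sketch below assumes hypothetical variable names (DB_TYPE, DB_HOST, DB_NAME) and a hypothetical config_from_env helper, and uses a trimmed-down factory so it runs on its own.

```python
import os


class PostgreSQLConnection:
    # Trimmed stand-in for the connection class defined above
    def __init__(self, host, database):
        self.host = host
        self.database = database

    def connect(self):
        return f"Connected to PostgreSQL at {self.host}/{self.database}"


class DatabaseFactory:
    db_types = {"postgresql": PostgreSQLConnection}

    @staticmethod
    def create_from_config(config):
        db_class = DatabaseFactory.db_types.get(config["type"])
        if db_class is None:
            raise ValueError(f"Unknown database type: {config['type']}")
        return db_class(config["host"], config["database"])


def config_from_env(environ=os.environ):
    """Build a config dict from environment variables.

    DB_TYPE, DB_HOST and DB_NAME are hypothetical variable names.
    """
    return {
        "type": environ.get("DB_TYPE", "postgresql"),
        "host": environ.get("DB_HOST", "localhost"),
        "database": environ["DB_NAME"],  # required: raises KeyError if missing
    }


# A plain dict stands in for os.environ so the example is reproducible
fake_env = {"DB_TYPE": "postgresql", "DB_HOST": "db.example.com", "DB_NAME": "production_db"}
db = DatabaseFactory.create_from_config(config_from_env(fake_env))
print(db.connect())
```

Passing the environment mapping as a parameter keeps the helper easy to test without touching the real process environment.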

When to Use the Factory Pattern

The factory pattern is useful when you have the following:

Multiple related classes: When you have several classes that share a common interface but have different implementations (like the payment processors or database connections we had in the examples).
Runtime decisions: When you need to decide which class to instantiate based on user input, configuration, or other runtime conditions.
Complex object creation: When creating an object involves multiple steps or requires specific logic that you want to encapsulate.

However, don't use the factory pattern when:

You only have one or two simple classes
Object creation is straightforward with no special logic
The added abstraction makes your code harder to understand

Wrapping Up

The factory pattern is a useful tool for managing object creation in Python. It helps you write cleaner, more maintainable code by centralizing creation logic and decoupling your code from specific class implementations. We've covered:

Basic factory implementation with simple examples
Using dictionaries for cleaner factory code
Passing parameters to factory-created objects
Using abstract base classes for cleaner interfaces

The key takeaway is this: whenever you find yourself writing repetitive object creation code or need to decide which class to instantiate at runtime, consider using the factory pattern. Start simple and add complexity only when needed. The basic dictionary-based factory is all most applications need.

Happy coding!

How to Use the Builder Pattern in Python – A Practical Guide for Developers

Bala Priya C — Wed, 28 Jan 2026 16:39:14 +0000

Creating complex objects can get messy. You've probably written constructors with too many parameters, struggled with optional arguments, or created objects that require multiple setup steps. The builder pattern solves these problems by separating object construction from representation.

In this tutorial, I'll show you how to implement the builder pattern in Python. I’ll also explain when it's useful, and show practical examples you can use in your projects.

You can find the code on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Understanding of Python classes and methods
Familiarity with object-oriented programming (OOP) concepts

Let’s get started!

Understanding the Builder Pattern
The Problem: Complex Object Construction
Basic Builder Pattern Implementation
A More Helpful Example: SQL Query Builder
Validation and Error Handling
The Pythonic Builder Pattern
When to Use the Builder Pattern

Understanding the Builder Pattern

The builder pattern addresses the problem of constructing complex objects. Instead of cramming all construction logic into a constructor, you create a separate builder class that constructs the object incrementally.

Consider building a SQL query. A simple query might be SELECT * FROM users, but most queries have WHERE clauses, JOINs, ORDER BY, GROUP BY, and LIMIT clauses. You could pass all these as constructor parameters, but that becomes unwieldy fast. The builder pattern lets you construct the query piece by piece.

The pattern separates two concerns: what the final object should be (the product) and how to build it (the builder). This separation gives you flexibility because you can now have multiple builders that create the same type of object in different ways, or one builder that creates different variations.

Python's dynamic features, such as keyword arguments and default values, let us implement builders with less ceremony than in languages like Java or C++. We'll explore both traditional and Pythonic approaches.

The Problem: Complex Object Construction

Let's start with a problem that shows why builders are useful. We'll create an HTTP request configuration – something complex enough to show the pattern's value without being overwhelming.

# The naive approach - constructor with many parameters
class HTTPRequest:
    def __init__(self, url, method="GET", headers=None, body=None, 
                 timeout=30, auth=None, verify_ssl=True, allow_redirects=True,
                 max_redirects=5, cookies=None, proxies=None):
        self.url = url
        self.method = method
        self.headers = headers or {}
        self.body = body
        self.timeout = timeout
        self.auth = auth
        self.verify_ssl = verify_ssl
        self.allow_redirects = allow_redirects
        self.max_redirects = max_redirects
        self.cookies = cookies or {}
        self.proxies = proxies or {}

# Using it is messy
request = HTTPRequest(
    "https://api.example.com/users",
    method="POST",
    headers={"Content-Type": "application/json"},
    body='{"name": "John"}',
    timeout=60,
    auth=("username", "password"),
    verify_ssl=True,
    allow_redirects=False,
    max_redirects=0,
    cookies={"session": "abc123"},
    proxies={"http": "proxy.example.com"}
)

print(f"Request to: {request.url}")
print(f"Method: {request.method}")
print(f"Timeout: {request.timeout}s")

Output:

Request to: https://api.example.com/users
Method: POST
Timeout: 60s

This constructor is hard to use. You need to remember parameter order, pass None for things you don't want, and it's unclear what the defaults are. When creating the request, you can't tell which parameters are required without checking the documentation. This is where the builder pattern comes in handy.

Basic Builder Pattern Implementation

Let's rebuild this using the builder pattern. The builder provides methods for setting each property, making construction explicit and readable.

First, we define the product class, which is the object we want to build:

class HTTPRequest:
    """The product - what we're building"""
    def __init__(self, url):
        self.url = url
        self.method = "GET"
        self.headers = {}
        self.body = None
        self.timeout = 30
        self.auth = None
        self.verify_ssl = True
        self.allow_redirects = True
        self.max_redirects = 5
        self.cookies = {}
        self.proxies = {}

    def execute(self):
        """Simulate executing the request"""
        auth_str = f" (auth: {self.auth[0]})" if self.auth else ""
        return f"{self.method} {self.url}{auth_str} - timeout: {self.timeout}s"

Now we create the builder class. Each method modifies the request and returns self to enable method chaining:

class HTTPRequestBuilder:
    """The builder - constructs HTTPRequest step by step"""
    def __init__(self, url):
        self._request = HTTPRequest(url)

    def method(self, method):
        """Set HTTP method (GET, POST, etc.)"""
        self._request.method = method.upper()
        return self  # Return self for method chaining

    def header(self, key, value):
        """Add a header"""
        self._request.headers[key] = value
        return self

    def headers(self, headers_dict):
        """Add multiple headers at once"""
        self._request.headers.update(headers_dict)
        return self

    def body(self, body):
        """Set request body"""
        self._request.body = body
        return self

    def timeout(self, seconds):
        """Set timeout in seconds"""
        self._request.timeout = seconds
        return self

    def auth(self, username, password):
        """Set basic authentication"""
        self._request.auth = (username, password)
        return self

    def disable_ssl_verification(self):
        """Disable SSL certificate verification"""
        self._request.verify_ssl = False
        return self

    def disable_redirects(self):
        """Disable automatic redirects"""
        self._request.allow_redirects = False
        self._request.max_redirects = 0
        return self

    def build(self):
        """Return the constructed request"""
        return self._request

Now let's use the builder to create a request:

# Now using the builder is much cleaner and more readable
request = (HTTPRequestBuilder("https://api.example.com/users")
    .method("POST")
    .header("Content-Type", "application/json")
    .header("Accept", "application/json")
    .body('{"name": "John", "email": "john@example.com"}')
    .timeout(60)
    .auth("username", "password")
    .disable_redirects()
    .build())

print(request.execute())
print(f"\nHeaders: {request.headers}")
print(f"SSL verification: {request.verify_ssl}")
print(f"Allow redirects: {request.allow_redirects}")

Output:

POST https://api.example.com/users (auth: username) - timeout: 60s

Headers: {'Content-Type': 'application/json', 'Accept': 'application/json'}
SSL verification: True
Allow redirects: False

The builder makes construction much clearer. Each method describes what it does, and method chaining creates a fluent interface that reads almost like English. You only specify what you need – everything else gets sensible defaults. The construction process is explicit and self-documenting.

Notice that each builder method returns self. This enables method chaining where you can call multiple methods in sequence. The final build() method returns the constructed object. This separation between building and the final product is the core of the pattern.
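One consequence of this design is worth knowing: build() above returns the builder's own internal object, so reusing the same builder afterwards also changes a request that was already built. A possible variation, sketched here with trimmed-down stand-in classes (Request and RequestBuilder are illustrative names), is to return a copy from build().

```python
import copy


class Request:
    # Trimmed stand-in for HTTPRequest
    def __init__(self, url):
        self.url = url
        self.headers = {}
        self.timeout = 30


class RequestBuilder:
    def __init__(self, url):
        self._request = Request(url)

    def header(self, key, value):
        self._request.headers[key] = value
        return self  # returning self is what makes chaining possible

    def timeout(self, seconds):
        self._request.timeout = seconds
        return self

    def build(self):
        # Return a deep copy so later builder calls cannot alter this result
        return copy.deepcopy(self._request)


builder = RequestBuilder("https://api.example.com/users").timeout(60)
first = builder.build()

# Reusing the builder affects only requests built afterwards
second = builder.header("Accept", "application/json").timeout(10).build()

print(first.timeout, first.headers)    # 60 {}
print(second.timeout, second.headers)  # 10 {'Accept': 'application/json'}
```

Whether the copy is worth it depends on usage. If each builder is used once and discarded, returning the internal object as the original does is simpler.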

A More Helpful Example: SQL Query Builder

Let's build something more useful that also helps us understand how the pattern works: a SQL query builder. This is a practical tool you might actually use in projects.

First, we define the SQL query product class:

class SQLQuery:
    """The product - represents a SQL query"""
    def __init__(self):
        self.select_columns = []
        self.from_table = None
        self.joins = []
        self.where_conditions = []
        self.group_by_columns = []
        self.having_conditions = []
        self.order_by_columns = []
        self.limit_value = None
        self.offset_value = None

    def to_sql(self):
        """Convert the query object to SQL string"""
        if not self.from_table:
            raise ValueError("FROM clause is required")

        # Build SELECT clause
        columns = ", ".join(self.select_columns) if self.select_columns else "*"
        sql = f"SELECT {columns}"

        # Add FROM clause
        sql += f"\nFROM {self.from_table}"

        # Add JOINs
        for join in self.joins:
            sql += f"\n{join}"

        # Add WHERE clause
        if self.where_conditions:
            conditions = " AND ".join(self.where_conditions)
            sql += f"\nWHERE {conditions}"

        # Add GROUP BY
        if self.group_by_columns:
            columns = ", ".join(self.group_by_columns)
            sql += f"\nGROUP BY {columns}"

        # Add HAVING
        if self.having_conditions:
            conditions = " AND ".join(self.having_conditions)
            sql += f"\nHAVING {conditions}"

        # Add ORDER BY
        if self.order_by_columns:
            columns = ", ".join(self.order_by_columns)
            sql += f"\nORDER BY {columns}"

        # Add LIMIT and OFFSET
        if self.limit_value is not None:  # allow LIMIT 0
            sql += f"\nLIMIT {self.limit_value}"
        if self.offset_value is not None:
            sql += f"\nOFFSET {self.offset_value}"

        return sql

Now we create the query builder with methods for each SQL clause:

class QueryBuilder:
    """Builder for SQL queries"""
    def __init__(self):
        self._query = SQLQuery()

    def select(self, *columns):
        """Add columns to SELECT clause"""
        self._query.select_columns.extend(columns)
        return self

    def from_table(self, table):
        """Set the FROM table"""
        self._query.from_table = table
        return self

    def join(self, table, on_condition, join_type="INNER"):
        """Add a JOIN clause"""
        join_clause = f"{join_type} JOIN {table} ON {on_condition}"
        self._query.joins.append(join_clause)
        return self

    def left_join(self, table, on_condition):
        """Convenience method for LEFT JOIN"""
        return self.join(table, on_condition, "LEFT")

    def where(self, condition):
        """Add a WHERE condition"""
        self._query.where_conditions.append(condition)
        return self

    def group_by(self, *columns):
        """Add GROUP BY columns"""
        self._query.group_by_columns.extend(columns)
        return self

    def having(self, condition):
        """Add a HAVING condition"""
        self._query.having_conditions.append(condition)
        return self

    def order_by(self, *columns):
        """Add ORDER BY columns"""
        self._query.order_by_columns.extend(columns)
        return self

    def limit(self, value):
        """Set LIMIT"""
        self._query.limit_value = value
        return self

    def offset(self, value):
        """Set OFFSET"""
        self._query.offset_value = value
        return self

    def build(self):
        """Return the constructed query"""
        return self._query

Let's use the builder to create queries:

# Example 1: Simple query
simple_query = (QueryBuilder()
    .select("id", "name", "email")
    .from_table("users")
    .where("status = 'active'")
    .order_by("name")
    .limit(10)
    .build())

print("Simple Query:")
print(simple_query.to_sql())

Output:

Simple Query:
SELECT id, name, email
FROM users
WHERE status = 'active'
ORDER BY name
LIMIT 10

Now let's create a more complex query with joins and aggregations:

# Example 2: Complex query with joins and aggregations
complex_query = (QueryBuilder()
    .select("u.name", "COUNT(o.id) as order_count", "SUM(o.total) as total_spent")
    .from_table("users u")
    .left_join("orders o", "u.id = o.user_id")
    .where("u.created_at >= '2024-01-01'")
    .where("u.country = 'US'")
    .group_by("u.id", "u.name")
    .having("COUNT(o.id) > 5")
    .order_by("total_spent DESC")
    .limit(20)
    .build())

print("Complex Query:")
print(complex_query.to_sql())

Output:

Complex Query:
SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01' AND u.country = 'US'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY total_spent DESC
LIMIT 20

This SQL builder shows where the builder pattern earns its keep. Building SQL queries programmatically is complex: there are many optional clauses, and they must appear in a specific order. The builder handles this complexity and gives you a clean API that prevents errors like putting WHERE after GROUP BY.

The design also guards against invalid queries: if you forget the FROM clause, to_sql() raises a ValueError instead of producing broken SQL. The API stays flexible at the same time. You can chain methods in any order during construction, and the to_sql() method puts the clauses in the correct order. This separation of construction from representation is exactly what the builder pattern provides.
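The order independence can be checked directly. The sketch below uses a reduced version of the classes above with only three clauses (MiniQuery and MiniQueryBuilder are illustrative names) so that it runs on its own.

```python
class MiniQuery:
    # Trimmed stand-in for SQLQuery with only three clauses
    def __init__(self):
        self.columns = []
        self.table = None
        self.conditions = []

    def to_sql(self):
        if not self.table:
            raise ValueError("FROM clause is required")
        sql = f"SELECT {', '.join(self.columns) or '*'}"
        sql += f"\nFROM {self.table}"
        if self.conditions:
            sql += f"\nWHERE {' AND '.join(self.conditions)}"
        return sql


class MiniQueryBuilder:
    def __init__(self):
        self._query = MiniQuery()

    def select(self, *columns):
        self._query.columns.extend(columns)
        return self

    def from_table(self, table):
        self._query.table = table
        return self

    def where(self, condition):
        self._query.conditions.append(condition)
        return self

    def build(self):
        return self._query


# Same clauses, called in two different orders
in_order = MiniQueryBuilder().select("id").from_table("users").where("active = 1").build()
shuffled = MiniQueryBuilder().where("active = 1").from_table("users").select("id").build()
print(in_order.to_sql() == shuffled.to_sql())  # True

# Forgetting from_table() is only detected when to_sql() runs
missing_from_error = None
try:
    MiniQueryBuilder().select("id").build().to_sql()
except ValueError as exc:
    missing_from_error = str(exc)
print(missing_from_error)
```

Because the product stores each clause separately and renders them in a fixed sequence, the call order during construction does not affect the generated SQL.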

Validation and Error Handling

Good builders validate data during construction. Let's improve our HTTP request builder with validation.

class HTTPRequestBuilder:
    """Enhanced builder with validation"""
    VALID_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS"}

    def __init__(self, url):
        if not url:
            raise ValueError("URL cannot be empty")
        if not url.startswith(("http://", "https://")):
            raise ValueError("URL must start with http:// or https://")

        self._request = HTTPRequest(url)

    def method(self, method):
        """Set HTTP method with validation"""
        method = method.upper()
        if method not in self.VALID_METHODS:
            raise ValueError(f"Invalid HTTP method: {method}")
        self._request.method = method
        return self

    def timeout(self, seconds):
        """Set timeout with validation"""
        if seconds <= 0:
            raise ValueError("Timeout must be positive")
        if seconds > 300:
            raise ValueError("Timeout cannot exceed 300 seconds")
        self._request.timeout = seconds
        return self

    def header(self, key, value):
        """Add header with validation"""
        if not key or not value:
            raise ValueError("Header key and value cannot be empty")
        self._request.headers[key] = value
        return self

    def body(self, body):
        """Set request body"""
        self._request.body = body
        return self

    def build(self):
        """Validate and return the request"""
        # Final validation before building
        if self._request.method in {"POST", "PUT", "PATCH"} and not self._request.body:
            raise ValueError(f"{self._request.method} requests typically require a body")

        return self._request

Now let's test the validation:

# Valid request
try:
    valid_request = (HTTPRequestBuilder("https://api.example.com/data")
        .method("POST")
        .body('{"key": "value"}')
        .timeout(45)
        .build())
    print("✓ Valid request created successfully")
except ValueError as e:
    print(f"✗ Error: {e}")

# Invalid request - bad method
try:
    invalid_request = (HTTPRequestBuilder("https://api.example.com/data")
        .method("INVALID")
        .build())
except ValueError as e:
    print(f"✓ Caught error: {e}")

# Invalid request - POST without body
try:
    invalid_request = (HTTPRequestBuilder("https://api.example.com/data")
        .method("POST")
        .build())
except ValueError as e:
    print(f"✓ Caught error: {e}")

Output:

✓ Valid request created successfully
✓ Caught error: Invalid HTTP method: INVALID
✓ Caught error: POST requests typically require a body

Validation in the builder catches errors early, during construction rather than at execution time. This is much better than discovering problems when you try to use the object. The builder becomes a gatekeeper that ensures only valid objects are created.

Each builder method validates its input immediately. The final build() method performs cross-field validation, that is, checks that require looking at multiple properties together. This layered validation approach catches errors at the most appropriate point.

The Pythonic Builder Pattern

Python's flexibility allows for more concise builder implementations. Here's a Pythonic version using keyword arguments (**kwargs).

First, let's define our email message class:

class EmailMessage:
    """Email message with builder pattern using kwargs"""
    def __init__(self, **kwargs):
        self.to = kwargs.get('to', [])
        self.cc = kwargs.get('cc', [])
        self.bcc = kwargs.get('bcc', [])
        self.subject = kwargs.get('subject', '')
        self.body = kwargs.get('body', '')
        self.attachments = kwargs.get('attachments', [])
        self.priority = kwargs.get('priority', 'normal')

    def send(self):
        """Simulate sending the email"""
        recipients = len(self.to) + len(self.cc) + len(self.bcc)
        attachments = f" with {len(self.attachments)} attachment(s)" if self.attachments else ""
        return f"Sending '{self.subject}' to {recipients} recipient(s){attachments}"

Now we create a builder that accumulates parameters:

class EmailBuilder:
    """Pythonic email builder"""
    def __init__(self):
        self._params = {}

    def to(self, *addresses):
        """Add TO recipients"""
        self._params.setdefault('to', []).extend(addresses)
        return self

    def cc(self, *addresses):
        """Add CC recipients"""
        self._params.setdefault('cc', []).extend(addresses)
        return self

    def subject(self, subject):
        """Set email subject"""
        self._params['subject'] = subject
        return self

    def body(self, body):
        """Set email body"""
        self._params['body'] = body
        return self

    def attach(self, *files):
        """Attach files"""
        self._params.setdefault('attachments', []).extend(files)
        return self

    def priority(self, level):
        """Set priority (low, normal, high)"""
        if level not in ('low', 'normal', 'high'):
            raise ValueError("Priority must be low, normal, or high")
        self._params['priority'] = level
        return self

    def build(self):
        """Build the email message"""
        if not self._params.get('to'):
            raise ValueError("At least one recipient is required")
        if not self._params.get('subject'):
            raise ValueError("Subject is required")

        return EmailMessage(**self._params)

Let's use it to build and send an email:

# Build and send an email
email = (EmailBuilder()
    .to("alice@example.com", "bob@example.com")
    .cc("manager@example.com")
    .subject("Q4 Sales Report")
    .body("Please find the Q4 sales report attached.")
    .attach("q4_report.pdf", "sales_data.xlsx")
    .priority("high")
    .build())

print(email.send())
print(f"To: {email.to}")
print(f"CC: {email.cc}")
print(f"Priority: {email.priority}")
print(f"Attachments: {email.attachments}")

Output:

Sending 'Q4 Sales Report' to 3 recipient(s) with 2 attachment(s)
To: ['alice@example.com', 'bob@example.com']
CC: ['manager@example.com']
Priority: high
Attachments: ['q4_report.pdf', 'sales_data.xlsx']

This Pythonic version uses **kwargs to pass parameters to the product, making the builder more flexible. The builder accumulates parameters in a dictionary and passes them all at once during build(), so the product is created only after the required fields have been checked.

The key here is that Python doesn't need the boilerplate that other languages require. We still keep the builder pattern's core advantages: readable construction, validation, and separation of concerns.

When to Use the Builder Pattern

The builder pattern is useful in specific situations. Understanding when to use it helps you avoid overengineering.

Use the builder pattern when:

You're creating objects with many optional parameters. If your constructor has more than 3-4 parameters, especially if many are optional, consider a builder. The pattern makes construction explicit and self-documenting.
Object construction requires multiple steps or specific ordering. If you need to set up an object through several method calls in a particular sequence, a builder can enforce and simplify this process.
You need to create different variations of an object. Builders can create different representations of the same type, like different SQL query types or different HTTP request configurations.

However, don't use builders when:

Your objects are simple. If a regular constructor with 2-3 parameters works fine, don't add builder complexity. Python's keyword arguments already make simple construction readable.
You're only setting attributes. Python objects can have attributes set directly. If there's no validation or complex construction logic, a builder adds unnecessary complexity.

The pattern is useful for complex configuration objects, query builders, document generators, or any object that requires careful step-by-step construction. For simple data containers, stick with straightforward constructors.

Conclusion

I hope you found this tutorial useful. The builder pattern separates object construction from representation, making complex objects easier to create and maintain. You've learned how to implement builders in Python, from traditional approaches to more Pythonic variants using the language's dynamic features.

Remember that the builder pattern is a tool, not a requirement. Use it when construction is genuinely complex and the pattern adds clarity. For simple objects, Python's flexibility provides simpler solutions. Choose the right tool for your specific problem, and you'll write clearer, more maintainable code.

Happy coding!

How to Build a Singleton in Python (and Why You Probably Shouldn't)

Bala Priya C — Thu, 22 Jan 2026 18:13:33 +0000

The singleton pattern ensures that a class has exactly one instance throughout your application. You've probably seen it in configuration managers, database connections, or logging systems. While singletons seem useful, they often create more problems than they solve.

In this tutorial, I'll show you how to implement singletons in Python, explain when they might be appropriate, and discuss better alternatives for most use cases.

You can find the code on GitHub.

Prerequisites

Before we start, make sure you have:

Python 3.10 or higher installed
Understanding of Python classes and decorators
Familiarity with object-oriented programming concepts

No external libraries are needed, as we'll use only Python's standard library.

What Is a Singleton?
The Classic Singleton Pattern
The Decorator Pattern
The Metaclass Approach
Thread-Safe Singleton
Why You Probably Shouldn't Use Singletons
Better Alternatives to the Singleton Pattern
When Singletons Are Acceptable

What Is a Singleton?

A singleton is a design pattern that restricts a class to a single instance. No matter how many times you try to create an object from that class, you always get the same instance back.

The classic use case is a configuration object. You want all parts of your application to share the same configuration, not create separate copies. Instead of passing the config object everywhere, the singleton pattern lets you access it globally.

Here's the problem: global state is problematic. When any part of your code can modify shared state, debugging becomes difficult. You lose the ability to reason about code in isolation. Tests become harder because they share state between runs.
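The test problem is easy to reproduce. In this sketch, Settings, first_test, and second_test are hypothetical names; the two functions stand in for two independent test cases.

```python
class Settings:
    # Hypothetical singleton-style settings object
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.debug = False
        return cls._instance


def first_test():
    settings = Settings()
    settings.debug = True  # mutates the one shared instance
    return settings.debug


def second_test():
    # Written assuming a fresh object with debug == False
    return Settings().debug


first_result = first_test()
second_result = second_test()
print(first_result, second_result)  # True True
```

The second function sees debug set to True only because the first one ran before it. Its result now depends on execution order, which is the kind of hidden coupling that makes shared state hard to debug.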

Despite these issues, there are a few genuine use cases. Let's explore how to build singletons properly, then discuss when you actually need them.

The Classic Singleton Pattern

The traditional approach uses a class variable to store the single instance. When you try to create a new instance, the class checks if one already exists.

class DatabaseConnection:
    """
    Classic Singleton pattern using __new__
    """
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            print("Creating new database connection")
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        # Only initialize once
        if not self._initialized:
            print("Initializing database connection")
            self.connection_string = "postgresql://localhost/mydb"
            self.pool_size = 10
            self._initialized = True

    def query(self, sql):
        return f"Executing: {sql}"

Let’s now test the singleton behavior:

db1 = DatabaseConnection()
print(f"db1 connection: {db1.connection_string}")

print("\nCreating second instance:")
db2 = DatabaseConnection()
print(f"db2 connection: {db2.connection_string}")

print(f"\nAre they the same object? {db1 is db2}")

Output:

Creating new database connection
Initializing database connection
db1 connection: postgresql://localhost/mydb

Creating second instance:
db2 connection: postgresql://localhost/mydb

Are they the same object? True

The __new__ method controls object creation in Python. By overriding it, we intercept instance creation and return our stored instance if it exists. The __init__ method still runs each time, so we add an _initialized flag to prevent re-initialization.
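
To see why the flag matters, here is a minimal sketch of the same pattern without it. The class name NaiveSingleton and the init_calls counter are made up for illustration and are not part of the example above.

```python
class NaiveSingleton:
    """Singleton via __new__, but WITHOUT an _initialized guard (hypothetical)."""
    _instance = None
    init_calls = 0  # counts how often __init__ runs

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Runs on every NaiveSingleton() call, even when __new__ reuses the instance
        NaiveSingleton.init_calls += 1
        self.pool_size = 10


a = NaiveSingleton()
a.pool_size = 50          # customize the shared instance

b = NaiveSingleton()      # same object, but __init__ runs again
print(a is b)             # True
print(b.pool_size)        # 10, the customization was silently reset
print(NaiveSingleton.init_calls)  # 2
```

The second call returns the same object yet wipes its state, which is exactly what the _initialized flag prevents.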

This pattern works, but it's verbose and easy to mess up. The _initialized flag feels like a hack. Let's look at cleaner approaches.

The Decorator Pattern

A more Pythonic approach uses a decorator to handle the singleton logic. This keeps the class clean and moves the singleton behavior to a reusable decorator.

def singleton(cls):
    """
    Decorator that converts a class into a singleton
    """
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance

@singleton
class AppConfig:
    """
    Application configuration as a singleton
    """
    def __init__(self):
        print("Loading configuration...")
        self.debug_mode = True
        self.api_key = "secret-key-12345"
        self.max_connections = 100
        self.timeout = 30

    def update_setting(self, key, value):
        setattr(self, key, value)
        print(f"Updated {key} = {value}")

As with the earlier approach, let’s test the decorator approach:

# First access
config1 = AppConfig()
print(f"Debug mode: {config1.debug_mode}")

# Second access - no re-initialization
print("\nAccessing config again:")
config2 = AppConfig()
config2.update_setting("timeout", 60)

print(f"\nconfig1 timeout: {config1.timeout}")
print(f"Same instance? {config1 is config2}")

Output:

Loading configuration...
Debug mode: True

Accessing config again:
Updated timeout = 60

config1 timeout: 60
Same instance? True

The decorator pattern is cleaner. The @singleton decorator wraps the class and maintains instances in a closure. This keeps singleton logic separate from the class implementation. The class itself remains simple and testable.

Notice how modifying config2 affects config1 as they're the same object. This shared state can be useful but also dangerous. Any code that gets the config can modify it, potentially breaking other parts of your application.
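
The decorator has two side effects that are easy to miss. The sketch below repeats the decorator with a hypothetical Settings class (not from the example above) to show them: arguments passed after the first call are ignored, and the decorated name is now a function rather than a class.

```python
def singleton(cls):
    """Same decorator as above, repeated so this snippet runs on its own."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class Settings:  # hypothetical class for illustration
    def __init__(self, timeout=30):
        self.timeout = timeout


first = Settings(timeout=30)
second = Settings(timeout=99)   # arguments are ignored: the instance already exists
print(second.timeout)           # 30
print(first is second)          # True

# The name Settings now refers to get_instance, a plain function
print(type(Settings).__name__)  # function

isinstance_error = None
try:
    isinstance(first, Settings)  # a function is not a valid type argument
except TypeError as exc:
    isinstance_error = exc
    print(f"isinstance failed: {exc}")
```

If you need isinstance checks or subclassing to keep working, the metaclass approach in the next section avoids this limitation.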

The Metaclass Approach

For more control, you can use a metaclass. Metaclasses control class creation itself, making them a natural fit for singletons.

class SingletonMeta(type):
    """
    Metaclass that creates singleton instances
    """
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]

class Logger(metaclass=SingletonMeta):
    """
    Simple logging singleton using metaclass
    """
    def __init__(self):
        self.logs = []

    def log(self, message):
        self.logs.append(message)
        print(f"[LOG] {message}")

    def get_logs(self):
        return self.logs

Let’s test the above metaclass approach to building a singleton:

# Use the logger from different parts of code
logger1 = Logger()
logger1.log("Application started")
logger1.log("User logged in")

# Another part of code gets the same logger
logger2 = Logger()
logger2.log("Processing request")

print(f"\nTotal logs in logger1: {len(logger1.get_logs())}")
print(f"Total logs in logger2: {len(logger2.get_logs())}")
print(f"Same logger? {logger1 is logger2}")

Output:

[LOG] Application started
[LOG] User logged in
[LOG] Processing request

Total logs in logger1: 3
Total logs in logger2: 3
Same logger? True

The metaclass approach is elegant if you're comfortable with metaclasses. The __call__ method on the metaclass intercepts Logger() before __new__ and __init__ run, so we can return the existing instance without running __init__ again. That is why no _initialized flag is needed here.

However, metaclasses add complexity. Most Python developers don't work with them regularly, making code harder to understand. Use this approach only if you need the additional control metaclasses provide.
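
As a quick check of what the metaclass buys you, the sketch below reuses SingletonMeta with two hypothetical classes, Cache and Registry. It is meant to show that __init__ runs only once, that isinstance still works, and that each class gets its own instance.

```python
class SingletonMeta(type):
    """Same metaclass as above, repeated so this snippet runs on its own."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Cache(metaclass=SingletonMeta):  # hypothetical class
    init_calls = 0

    def __init__(self):
        Cache.init_calls += 1  # counts how often __init__ actually runs


class Registry(metaclass=SingletonMeta):  # hypothetical class
    pass


cache1 = Cache()
cache2 = Cache()
registry = Registry()

print(cache1 is cache2)           # True
print(Cache.init_calls)           # 1, __init__ was not re-run
print(isinstance(cache1, Cache))  # True, Cache is still a real class
print(registry is cache1)         # False, one instance per class
```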

Thread-Safe Singleton

The previous implementations aren't thread-safe. In multi-threaded applications, two threads might create instances simultaneously. Let's fix that.

import threading

class ThreadSafeSingleton:
    """
    Thread-safe singleton using a lock
    """
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                # Double-check pattern
                if cls._instance is None:
                    print(f"Thread {threading.current_thread().name}: Creating instance")
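                    instance = super().__new__(cls)
                    instance._initialized = False
                    # Publish the instance only after the flag is set, so a
                    # thread that skips the lock never sees it without the flag
                    cls._instance = instance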
        return cls._instance

    def __init__(self):
        if not self._initialized:
            with self._lock:
                if not self._initialized:
                    print(f"Thread {threading.current_thread().name}: Initializing")
                    self.data = {}
                    self._initialized = True

Now let’s test the above singleton with multiple threads and verify that it’s a singleton with only one instance:

# Test with multiple threads
def create_singleton(thread_id):
    instance = ThreadSafeSingleton()
    instance.data[thread_id] = f"Data from thread {thread_id}"

threads = []
for i in range(5):
    t = threading.Thread(target=create_singleton, args=(i,), name=f"Thread-{i}")
    threads.append(t)
    t.start()

for t in threads:
    t.join()

# Verify it's a singleton
final = ThreadSafeSingleton()
print(f"\nShared data across all threads: {final.data}")

Output:

Thread Thread-0: Creating instance
Thread Thread-0: Initializing

Shared data across all threads: 
{0: 'Data from thread 0', 
1: 'Data from thread 1', 
2: 'Data from thread 2', 
3: 'Data from thread 3', 
4: 'Data from thread 4'}

The lock ensures that only one thread creates the instance. The double-check pattern avoids acquiring the lock on every access. We only lock when the instance might be None. This is more efficient than locking every time.
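
One way to gain some confidence in the locking is a small stress test. The sketch below is an assumption-laden illustration, not a proof of correctness: StressSingleton is a hypothetical stripped-down class, and a threading.Barrier releases all threads at once so they hit __new__ at roughly the same moment.

```python
import threading


class StressSingleton:
    """Hypothetical minimal class using the same double-checked locking."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance


NUM_THREADS = 16
barrier = threading.Barrier(NUM_THREADS)
results = [None] * NUM_THREADS


def worker(index):
    barrier.wait()  # block until every thread is ready, then start together
    results[index] = StressSingleton()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

unique_instances = {id(obj) for obj in results}
print(f"Distinct instances seen: {len(unique_instances)}")
```

A passing run does not guarantee the absence of races, but a result other than one distinct instance would reveal a problem immediately.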

Note on Python 3.13+: Python 3.13 introduced an experimental free-threaded build, and Python 3.14 made that build officially supported. The global interpreter lock (GIL) previously masked some race conditions by preventing threads from running Python code in parallel. Without it, explicit locking like this is required for correctness, not just good practice. If you're writing code for free-threaded Python, always protect shared mutable state with synchronization primitives such as locks.

Why You Probably Shouldn't Use Singletons

Now that you know how to build singletons, let me explain why you often shouldn't.

Singletons are global state in disguise. Global state makes code harder to understand, test, and maintain. When any code can access and modify a singleton, you lose the ability to reason about your code locally. Changes in one module can break another through shared state.

Singletons make testing difficult. Tests should be independent, but singletons carry state between tests. You need to reset the singleton before each test, which is error-prone. Worse, you can't easily mock a singleton for testing.

Singletons violate the Single Responsibility Principle. The class handles both its core logic and the singleton mechanism. This mixing of concerns makes code harder to maintain.

Python has better alternatives. Module-level objects are natural singletons. Dependency injection provides better control. Context managers handle resource lifetime cleanly.

Better Alternatives to the Singleton Pattern

Instead of singletons, consider these patterns.

Module-level Instances

Module-level instances are Python's natural singleton. Import a module, and you get the same instance every time. Here’s how you can do it:

# config.py
class Config:
    def __init__(self):
        self.debug = True
        self.api_key = "secret-key"

    def update(self, key, value):
        setattr(self, key, value)

# Create a single instance at module level
config = Config()

This is simpler and more Pythonic. The module system ensures you get the same instance. No special patterns needed. You can use it like so:

# main.py
from config import config

config.update("debug", False)
print(f"Debug mode: {config.debug}")

Let’s now take a closer look at how and why this works. Python's module system is itself a singleton mechanism: when you import a module, Python executes it once and caches the result in sys.modules. Every subsequent import returns the cached module object, not a new one.

When config.py runs for the first time, it creates the Config instance and assigns it to the module-level variable config. This happens only once, during the initial import. Any other file that imports config from this module gets a reference to that same object, not a new instance. So from config import config in multiple files will always give you the exact same Config instance, achieving singleton behavior without any special patterns, metaclasses, or decorators.
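
You can observe this caching directly. The sketch below uses the standard library's json module as a stand-in for config.py (any importable module behaves the same way) and checks that a repeated import returns the cached object from sys.modules.

```python
import importlib
import sys

import json  # stand-in for config.py; chosen only because it is always available

# A second import, by any route, returns the cached module object
again = importlib.import_module("json")

print(again is json)                # True
print(sys.modules["json"] is json)  # True
```

Because the module object is shared, any module-level variable inside it, such as the config instance above, is shared as well.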

Dependency Injection

Dependency injection gives you control without global state. It solves the singleton problem by making dependencies explicit parameters instead of hidden global state. Here’s an example:

class DatabaseConnection:
    def __init__(self, connection_string):
        self.connection_string = connection_string

    def query(self, sql):
        return f"Executing {sql}"

class UserRepository:
    def __init__(self, db):
        self.db = db

    def get_user(self, user_id):
        return self.db.query(f"SELECT * FROM users WHERE id = {user_id}")

Instead of UserRepository creating or accessing a global database singleton internally, it receives the database connection through its constructor (__init__). This means you control exactly which database instance gets used. In production you pass a real DatabaseConnection, but in tests you can pass a mock object that doesn't actually connect to a database.

The key here is that UserRepository doesn't know or care whether it's getting a singleton, a mock, or a fresh instance each time. It just knows it received something that has a query method.

# Create dependencies explicitly
db = DatabaseConnection("postgresql://localhost/mydb")
user_repo = UserRepository(db)

result = user_repo.get_user(123)

This makes the code's dependencies visible in the function signature, eliminates hidden global state, makes testing trivial (just pass different objects), and gives you complete control over object lifecycles without needing any singleton patterns at all.
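
To make the testing benefit concrete, here is a minimal sketch of a test double. FakeDatabase is a hypothetical class invented for this illustration; UserRepository is repeated from above so the snippet runs on its own.

```python
class UserRepository:
    """Same repository as above."""
    def __init__(self, db):
        self.db = db

    def get_user(self, user_id):
        return self.db.query(f"SELECT * FROM users WHERE id = {user_id}")


class FakeDatabase:
    """Hypothetical test double: records queries instead of hitting a database."""
    def __init__(self):
        self.queries = []

    def query(self, sql):
        self.queries.append(sql)
        return {"id": 123, "name": "Test User"}


fake_db = FakeDatabase()
repo = UserRepository(fake_db)

user = repo.get_user(123)
print(user)             # {'id': 123, 'name': 'Test User'}
print(fake_db.queries)  # ['SELECT * FROM users WHERE id = 123']
```

No patching or singleton reset is needed: each test builds its own FakeDatabase, so no state carries over between tests.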

When Singletons Are Acceptable

Despite the drawbacks, some cases justify the use of singletons. Here are some of them:

Hardware interfaces that represent unique physical resources. You might have one camera, one printer, or one GPIO interface. A singleton models this accurately.

Caching layers where you want a single shared cache across your application. Though even here, dependency injection might be cleaner.

Thread pools or connection pools where you want to limit and share expensive resources. The pool itself might be a singleton, though the resources it manages aren't.

Even in these cases, ask yourself: could I use dependency injection instead? Could I make this a module-level instance? The answer is often yes.

Conclusion

I hope you found this tutorial helpful. You've learned several ways to implement singletons in Python: the classic pattern, decorators, metaclasses, and thread-safe variants. Each approach has trade-offs in complexity, readability, and thread safety.

More importantly, you've learned why singletons often aren't the best solution. Global state, testing difficulties, and violation of design principles make singletons problematic. Module-level instances and dependency injection usually provide better alternatives.

When you reach for a singleton, pause and ask: do I really need shared global state? Often the answer is no. But when you do need it, now you know how to implement it properly.

Use singletons sparingly, if at all.

How to Work with the ORC File Format in Python – A Guide with Examples

Bala Priya C — Wed, 14 Jan 2026 01:24:44 +0000

If you've worked with big data or analytics platforms, you've probably heard about ORC files. But what exactly are they, and how can you work with them in Python?

In this tutorial, I'll walk you through the basics of reading, writing, and manipulating ORC files using Python. By the end, you'll understand when to use ORC and how to integrate it into your data pipelines.

You can find the code on GitHub.

What is the ORC File Format?
Prerequisites
Reading ORC Files in Python
Writing ORC Files with Compression
Working with Complex Data Types
A More Helpful Example: Processing Log Data
When Should You Use ORC?

What Is the ORC File Format?

ORC stands for Optimized Row Columnar. It's a columnar storage file format designed for Hadoop workloads. Unlike traditional row-based formats like CSV, ORC stores data by columns, which makes it incredibly efficient for analytical queries.

Here's why ORC is popular:

ORC files are highly compressed, often 75% smaller than text files
Columnar format means you only read the columns you need
Schema evolution lets compatible readers handle added columns without rewriting existing files
ORC includes lightweight indexes for faster queries

Many organizations use ORC for big data processing because it works well with Apache Hive, Spark, and Presto.

Prerequisites

Before we get started, make sure you have:

Python 3.10 or a later version installed
Basic understanding of DataFrames (pandas or similar)
Familiarity with file I/O operations

You'll need to install these libraries:

pip install pyarrow pandas

So why do we need PyArrow? PyArrow is the Python library for Apache Arrow, and it provides excellent support for columnar formats like ORC and Parquet. It's fast, memory-efficient, and actively maintained.

Reading ORC Files in Python

Let's start by reading an ORC file. First, I'll show you how to create a sample ORC file so we have something to work with.

Creating a Sample ORC File

Here's how we'll create a simple employee dataset and save it as ORC:

import pandas as pd
import pyarrow as pa
import pyarrow.orc as orc

# Create sample employee data
data = {
    'employee_id': [101, 102, 103, 104, 105],
    'name': ['Alice Johnson', 'Bob Smith', 'Carol White', 'David Brown', 'Eve Davis'],
    'department': ['Engineering', 'Sales', 'Engineering', 'HR', 'Sales'],
    'salary': [95000, 65000, 88000, 72000, 71000],
    'years_experience': [5, 3, 7, 4, 3]
}

df = pd.DataFrame(data)

# Convert to PyArrow Table and write as ORC
table = pa.Table.from_pandas(df)
orc.write_table(table, 'employees.orc')

print("ORC file created successfully!")

This outputs:

ORC file created successfully!

Let me break down what's happening here. We start with a pandas DataFrame containing employee information. Then we convert it to a PyArrow table, which is PyArrow's in-memory representation of columnar data. Finally, we use orc.write_table() to write it to disk in ORC format.

The conversion to a PyArrow table is necessary because orc.write_table() expects a PyArrow Table, not a pandas DataFrame. PyArrow then handles encoding the columns into ORC's on-disk layout.

Reading the ORC File

Now that we have an ORC file, let's read it back:

# Read ORC file
table = orc.read_table('employees.orc')

# Convert to pandas DataFrame for easier viewing
df_read = table.to_pandas()

print(df_read)
print(f"\nData types:\n{df_read.dtypes}")

Output:

   employee_id           name   department  salary  years_experience
0          101  Alice Johnson  Engineering   95000                 5
1          102      Bob Smith        Sales   65000                 3
2          103    Carol White  Engineering   88000                 7
3          104    David Brown           HR   72000                 4
4          105      Eve Davis        Sales   71000                 3

Data types:
employee_id          int64
name                object
department          object
salary               int64
years_experience     int64
dtype: object

The orc.read_table() function loads the entire ORC file into memory as a PyArrow table. We then convert it back to pandas for familiar DataFrame operations.

Notice how the data types are preserved. ORC maintains schema information, so your integers stay integers and strings stay strings.

Reading Specific Columns

Here's where ORC really shines. When working with large datasets, you often don't need all columns. ORC lets you read only what you need:

# Read only specific columns
table_subset = orc.read_table('employees.orc', columns=['name', 'salary'])
df_subset = table_subset.to_pandas()

print(df_subset)

Output:

            name  salary
0  Alice Johnson   95000
1      Bob Smith   65000
2    Carol White   88000
3    David Brown   72000
4      Eve Davis   71000

This is called column pruning, and it's a massive performance optimization. If your ORC file has 50 columns but you only need 3, you're reading a fraction of the data. This translates to faster load times and lower memory usage.

Writing ORC Files with Compression

ORC supports multiple compression codecs. Let's explore how to use compression when writing files:

# Create a larger dataset
large_data = {
    'id': range(10000),
    'value': [f"data_{i}" for i in range(10000)],
    'category': ['A', 'B', 'C', 'D'] * 2500
}

df_large = pd.DataFrame(large_data)
table_large = pa.Table.from_pandas(df_large)

# Write with ZLIB compression (PyArrow's own default is uncompressed)
orc.write_table(table_large, 'data_zlib.orc', compression='ZLIB')

# Write with SNAPPY compression (faster but less compression)
orc.write_table(table_large, 'data_snappy.orc', compression='SNAPPY')

# Write with ZSTD compression (good balance)
orc.write_table(table_large, 'data_zstd.orc', compression='ZSTD')

import os
print(f"ZLIB size: {os.path.getsize('data_zlib.orc'):,} bytes")
print(f"SNAPPY size: {os.path.getsize('data_snappy.orc'):,} bytes")
print(f"ZSTD size: {os.path.getsize('data_zstd.orc'):,} bytes")

Output:

ZLIB size: 23,342 bytes
SNAPPY size: 44,978 bytes
ZSTD size: 6,380 bytes

Different compression codecs offer different trade-offs. ZLIB usually compresses better than SNAPPY but is slower. SNAPPY is faster but produces larger files. ZSTD offers a good balance between compression ratio and speed, and in this run it also produced the smallest file.

For most use cases, I recommend ZSTD. It's fast enough for real-time processing and provides excellent compression.

Working with Complex Data Types

ORC handles nested data structures well. Here's how to work with lists and nested data:

# Create data with complex types
complex_data = {
    'user_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Carol'],
    'purchases': [
        ['laptop', 'mouse'],
        ['keyboard'],
        ['monitor', 'cable', 'stand']
    ],
    'ratings': [
        [4.5, 5.0],
        [3.5],
        [4.0, 4.5, 5.0]
    ]
}

df_complex = pd.DataFrame(complex_data)
table_complex = pa.Table.from_pandas(df_complex)
orc.write_table(table_complex, 'complex_data.orc')

# Read it back
table_read = orc.read_table('complex_data.orc')
df_read = table_read.to_pandas()

print(df_read)
print(f"\nType of 'purchases' column: {type(df_read['purchases'][0])}")

Output:

   user_id   name                purchases          ratings
0        1  Alice          [laptop, mouse]       [4.5, 5.0]
1        2    Bob               [keyboard]            [3.5]
2        3  Carol  [monitor, cable, stand]  [4.0, 4.5, 5.0]

Type of 'purchases' column: <class 'numpy.ndarray'>

ORC preserves list structures, which is incredibly useful for storing JSON-like data or aggregated information. Each cell can contain a list, and ORC handles the variable-length storage efficiently.

A More Helpful Example: Processing Log Data

Let's put this together with a practical example. Imagine you're processing web server logs:

from datetime import datetime, timedelta
import random

# Generate sample log data
log_data = []
start_date = datetime(2025, 1, 1)

for i in range(1000):
    log_data.append({
        'timestamp': start_date + timedelta(minutes=i),
        'user_id': random.randint(1000, 9999),
        'endpoint': random.choice(['/api/users', '/api/products', '/api/orders']),
        'status_code': random.choice([200, 200, 200, 404, 500]),
        'response_time_ms': random.randint(50, 2000)
    })

df_logs = pd.DataFrame(log_data)

# Write logs to ORC
table_logs = pa.Table.from_pandas(df_logs)
orc.write_table(table_logs, 'server_logs.orc', compression='ZSTD')

# Later, query only failed requests, reading just the columns we need
table_subset = orc.read_table('server_logs.orc', columns=['status_code', 'response_time_ms'])
df_subset = table_subset.to_pandas()

# Filter for errors
errors = df_subset[df_subset['status_code'] >= 400]
print(f"Total errors: {len(errors)}")
print(f"\nError breakdown:\n{errors['status_code'].value_counts()}")
print(f"\nSlowest error response: {errors['response_time_ms'].max()}ms")

Output:

Total errors: 387

Error breakdown:
status_code
404    211
500    176
Name: count, dtype: int64

Slowest error response: 1994ms

This example shows why ORC suits log storage. You can write logs in batches, compress them efficiently, and query them quickly. Because the format is columnar, a reader can load just the status code and response time columns without touching the endpoint or user ID data.

When Should You Use ORC?

Use ORC when you:

Work with big data platforms (Hadoop, Spark, Hive)
Need efficient storage for analytics workloads
Have wide tables where you often query specific columns
Want built-in compression and indexing

Don't use ORC when you:

Need row-by-row processing – use Avro instead
Work with small datasets – CSV is simpler in such cases
Need human-readable files – use JSON
Don't have big data infrastructure

Conclusion

ORC is a powerful format for data engineering and analytics. With PyArrow, working with ORC in Python is both straightforward and performant.

You've learned how to read and write ORC files, use compression, handle complex data types, and apply these concepts to real-world scenarios. The columnar storage and compression make ORC an excellent choice for big data pipelines.

Try integrating ORC into your next data project. You'll likely see significant improvements in storage costs and query performance.

Happy coding!

How to Perform Secure Hashing Using Python's hashlib Module

Bala Priya C — Mon, 15 Dec 2025 22:56:01 +0000

Hashing is a fundamental technique in programming that converts data into a fixed-size string of characters. Unlike encryption, hashing is a one-way process: you can't reverse it to get the original data back.

This makes hashing perfect for storing passwords, verifying file integrity, and creating unique identifiers. In this tutorial, you'll learn how to use Python's built-in hashlib module to implement secure hashing in your applications.

By the end of this tutorial, you'll understand:

How to create basic hashes with different algorithms
Why simple hashing isn't enough for passwords
How to add salt to prevent rainbow table attacks
How to use key derivation functions for password storage

You can find the code on GitHub.

Prerequisites

To follow this tutorial, you should have:

Basic Python: Variables, data types, functions, and control structures
Understanding of strings and bytes: How to encode strings and work with byte data

No external libraries are required, as hashlib and os are both part of Python's standard library.

Basic Hashing with Python's hashlib
Why Simple Hashing Isn't Enough for Passwords
Adding Salt to Your Hashes
Verifying Salted Passwords
Using Key Derivation Functions

Basic Hashing with Python’s hashlib

Let's start with the fundamentals. The hashlib module provides access to several hashing algorithms like MD5, SHA-1, SHA-256, and more.

Here's how to create a simple SHA-256 hash:

import hashlib

# Create a simple hash
message = "Hello, World!"
hash_object = hashlib.sha256(message.encode())
hex_digest = hash_object.hexdigest()

print(f"Original: {message}")
print(f"SHA-256 Hash: {hex_digest}")

Output:

Original: Hello, World!
SHA-256 Hash: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

Here, we import the hashlib module and encode our string to bytes using .encode(), because hashlib requires bytes, not strings.

Then we create a hash object using hashlib.sha256() and get the hexadecimal representation with .hexdigest().

The resulting hash is always 64 characters long regardless of input size. SHA-256 produces a 256-bit digest, and each hexadecimal character encodes 4 bits, so the output has 256/4 = 64 hexadecimal characters. Even changing one character of the input produces a completely different hash.

Let's verify that:

import hashlib

# Small change, big difference
message1 = "Hello, World!"
message2 = "Hello, World?"  # Only changed ! to ?

hash1 = hashlib.sha256(message1.encode()).hexdigest()
hash2 = hashlib.sha256(message2.encode()).hexdigest()

print(f"Message 1: {message1}")
print(f"Hash 1:    {hash1}")
print(f"\nMessage 2: {message2}")
print(f"Hash 2:    {hash2}")
print(f"\nAre they the same? {hash1 == hash2}")

Output:

Message 1: Hello, World!
Hash 1:    dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

Message 2: Hello, World?
Hash 2:    f16c3bb0532537acd5b2e418f2b1235b29181e35cffee7cc29d84de4a1d62e4d

Are they the same? False

This property is called the avalanche effect: a tiny change in the input creates a completely different output.
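
To put a number on "completely different", you can count how many bits differ between the two digests. The helper name differing_bits below is made up for this sketch.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


digest1 = hashlib.sha256(b"Hello, World!").digest()
digest2 = hashlib.sha256(b"Hello, World?").digest()

changed = differing_bits(digest1, digest2)
total_bits = len(digest1) * 8  # 32 bytes = 256 bits

print(f"{changed} of {total_bits} bits differ ({changed / total_bits:.0%})")
```

For a well-behaved hash function, you would expect roughly half of the output bits to flip for any small change in the input.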

Why Simple Hashing Isn't Enough for Passwords

You might think you can just hash passwords and store them in your database. But there's a problem: attackers use rainbow tables, which are precomputed databases of hashes for common passwords.

Here's what happens:

import hashlib

# Simple password hashing (DON'T USE THIS!)
password = "password123"
hashed = hashlib.sha256(password.encode()).hexdigest()

print(f"Password: {password}")
print(f"Hash: {hashed}")

Output:

Password: password123
Hash: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

If two users have the same password, they'll have identical hashes. An attacker who cracks one hash knows the password for all users with that hash.
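
Here is a deliberately simplified sketch of the idea behind such an attack. It uses a plain dictionary of precomputed hashes rather than a real rainbow table (which uses hash chains to save space), and the short password list is made up for illustration.

```python
import hashlib

# Attacker side: precompute hashes for a list of common passwords (illustrative list)
common_passwords = ["123456", "qwerty", "password123", "letmein"]
lookup_table = {
    hashlib.sha256(p.encode()).hexdigest(): p
    for p in common_passwords
}

# A hash leaked from a database that stores unsalted SHA-256 hashes
leaked_hash = hashlib.sha256("password123".encode()).hexdigest()

# Reversing it is a single dictionary lookup, with no brute force needed
recovered = lookup_table.get(leaked_hash)
print(f"Recovered password: {recovered}")  # Recovered password: password123
```

The expensive work happens once, ahead of time, and the resulting table can be reused against every unsalted hash the attacker obtains.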

So how do we handle this? Let’s learn in the next section.

Adding Salt to Your Hashes

The solution is salting: adding random data to each password before hashing. This way, even identical passwords produce different hashes.

Here's how to implement salted hashing:

import hashlib
import os

def hash_password_with_salt(password):
    # Generate a random salt (16 bytes = 128 bits)
    salt = os.urandom(16)

    # Combine password and salt, then hash
    hash_object = hashlib.sha256(salt + password.encode())
    password_hash = hash_object.hexdigest()

    # Return both salt and hash (you need the salt to verify later)
    return salt.hex(), password_hash

# Hash the same password twice
password = "password123"

salt1, hash1 = hash_password_with_salt(password)
salt2, hash2 = hash_password_with_salt(password)

print(f"Password: {password}\n")
print(f"First attempt:")
print(f"  Salt: {salt1}")
print(f"  Hash: {hash1}\n")
print(f"Second attempt:")
print(f"  Salt: {salt2}")
print(f"  Hash: {hash2}\n")
print(f"Same password, different hashes? {hash1 != hash2}")

Output:

Password: password123

First attempt:
  Salt: fc24b2d2245ff65b80c5bced38744171
  Hash: 5ce634c05941d25871e7ee334b5c24c75f64c4f6d557db66909fcaa793d869f9

Second attempt:
  Salt: bc8a1f79b07e56b51285557211f88bb0
  Hash: 043599d90b2aa0556265869cead35724c7d9d9d37129d897c6b68bade9e737e6

Same password, different hashes? True

How this works:

os.urandom(16) generates 16 random bytes, which is our salt
We concatenate the salt and password bytes before hashing
We return both the salt (as hex) and the hash
You must store both the salt and hash in your database

When a user logs in, you retrieve their salt, hash the entered password with that salt, and compare the result to the stored hash.

Verifying Salted Passwords

Now let's create a function to verify passwords against salted hashes:

import hashlib
import os

def hash_password(password, salt=None):
    """Hash a password with a salt. Generate new salt if not provided."""
    if salt is None:
        salt = os.urandom(16)
    else:
        # Convert hex string back to bytes if needed
        if isinstance(salt, str):
            salt = bytes.fromhex(salt)

    password_hash = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt.hex(), password_hash

def verify_password(password, stored_salt, stored_hash):
    """Verify a password against a stored salt and hash."""
    # Hash the provided password with the stored salt
    _, new_hash = hash_password(password, stored_salt)

    # Compare the hashes
    return new_hash == stored_hash

Here’s how you can use the above:

print("=== User Registration ===")
user_password = "mySecurePassword!"
salt, password_hash = hash_password(user_password)
print(f"Password: {user_password}")
print(f"Salt: {salt}")
print(f"Hash: {password_hash}")

# Simulate user login attempts
print("\n=== Login Attempts ===")
correct_attempt = "mySecurePassword!"
wrong_attempt = "wrongPassword"

print(f"Attempt 1: '{correct_attempt}'")
print(f"  Valid? {verify_password(correct_attempt, salt, password_hash)}")

print(f"\nAttempt 2: '{wrong_attempt}'")
print(f"  Valid? {verify_password(wrong_attempt, salt, password_hash)}")

Output:

=== User Registration ===
Password: mySecurePassword!
Salt: 381779b5262deea84183e4b9454b98b1
Hash: 9756e1f0bc4c1aa4a72f35b0be8d3c8f430d31613371cf7de3c615bc475de98f

=== Login Attempts ===
Attempt 1: 'mySecurePassword!'
  Valid? True

Attempt 2: 'wrongPassword'
  Valid? False

This implementation shows a complete registration and login flow.
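
One refinement worth considering is the comparison itself. The == operator can return as soon as it finds a mismatch, which may leak timing information. The sketch below is a variant, not the verification function above: it uses hmac.compare_digest from the standard library, and the function name verify_password_constant_time is made up for this example.

```python
import hashlib
import hmac
import os


def verify_password_constant_time(password, stored_salt_hex, stored_hash_hex):
    """Recompute the salted hash and compare it in constant time."""
    salt = bytes.fromhex(stored_salt_hex)
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    # compare_digest takes time independent of where the strings first differ
    return hmac.compare_digest(candidate, stored_hash_hex)


# Simulate a stored credential
salt = os.urandom(16)
stored_hash = hashlib.sha256(salt + "mySecurePassword!".encode()).hexdigest()

ok = verify_password_constant_time("mySecurePassword!", salt.hex(), stored_hash)
bad = verify_password_constant_time("wrongPassword", salt.hex(), stored_hash)
print(f"Correct password valid? {ok}")   # True
print(f"Wrong password valid? {bad}")    # False
```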

Using Key Derivation Functions

While salted SHA-256 is better than plain hashing, modern applications should use key derivation functions (KDFs) specifically designed for password hashing. These include PBKDF2 (Password-Based Key Derivation Function 2), bcrypt, scrypt, and Argon2.

These algorithms are intentionally slow and require more computational resources, making brute-force attacks much harder. Let's implement PBKDF2, which is built into Python:

import hashlib
import os

def hash_password_pbkdf2(password, salt=None, iterations=600000):
    """Hash password using PBKDF2 with SHA-256."""
    if salt is None:
        salt = os.urandom(32)  # 32 bytes = 256 bits
    elif isinstance(salt, str):
        salt = bytes.fromhex(salt)

    # PBKDF2 with 600,000 iterations (OWASP recommendation for 2024)
    password_hash = hashlib.pbkdf2_hmac(
        'sha256',          # Hash algorithm
        password.encode(), # Password as bytes
        salt,              # Salt as bytes
        iterations,        # Number of iterations
        dklen=32           # Desired key length (32 bytes = 256 bits)
    )

    return salt.hex(), password_hash.hex(), iterations

def verify_password_pbkdf2(password, stored_salt, stored_hash, iterations):
    """Verify password against PBKDF2 hash."""
    _, new_hash, _ = hash_password_pbkdf2(password, stored_salt, iterations)
    return new_hash == stored_hash

# Hash a password
print("=== PBKDF2 Password Hashing ===")
password = "SuperSecure123!"
salt, hash_value, iterations = hash_password_pbkdf2(password)

print(f"Password: {password}")
print(f"Salt: {salt}")
print(f"Hash: {hash_value}")
print(f"Iterations: {iterations:,}")

This outputs:

=== PBKDF2 Password Hashing ===
Password: SuperSecure123!
Salt: b388aecd774f6a7ddd95405091548bb50102c99beb1a10326a4c54070da4a3a5
Hash: c681450f41d0cec9ea2aad1108efe2a430b9c3d9fc3af621071be10ac9b3615a
Iterations: 600,000

Now let’s verify the password and also compare the speeds of SHA-256 vs. PBKDF2:

print("\n=== Verification ===")
is_valid = verify_password_pbkdf2(password, salt, hash_value, iterations)
print(f"Password valid? {is_valid}")

# Show time comparison
import time

print("\n=== Speed Comparison ===")
test_password = "test123"

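# perf_counter() is a high-resolution clock, so the short SHA-256 loop
# doesn't measure as zero (which would break the division below)

# Simple SHA-256
start = time.perf_counter()
for _ in range(100):
    hashlib.sha256(test_password.encode()).hexdigest()
sha256_time = time.perf_counter() - start

# PBKDF2
start = time.perf_counter()
for _ in range(100):
    hash_password_pbkdf2(test_password)
pbkdf2_time = time.perf_counter() - start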

print(f"100 SHA-256 hashes: {sha256_time:.3f} seconds")
print(f"100 PBKDF2 hashes: {pbkdf2_time:.3f} seconds")
print(f"PBKDF2 is {pbkdf2_time/sha256_time:.1f}x slower")

Output:


=== Verification ===
Password valid? True

=== Speed Comparison ===
100 SHA-256 hashes: 0.000 seconds
100 PBKDF2 hashes: 53.631 seconds
PBKDF2 is 240068.1x slower

How PBKDF2 works:

Takes your password and salt
Applies HMAC with the chosen hash function (SHA-256) repeatedly – 600,000 times in this example
Each iteration makes the computation slower and harder to brute-force
You store the salt, hash, AND iteration count (so you can verify later)

The iteration count can be increased over time as computers get faster. Modern recommendations (2024) suggest 600,000 iterations for PBKDF2-SHA256.
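To make the "store the salt, hash, and iteration count together" point concrete, here is a minimal sketch. The `iterations$salt$hash` record format and the names `make_record` and `check_record` are assumptions for illustration, not part of the code above. The sketch also compares hashes with `hmac.compare_digest()`, which runs in constant time, rather than with `==`.

```python
import hashlib
import hmac
import os

def make_record(password, iterations=600_000):
    """Return 'iterations$salt_hex$hash_hex' (a hypothetical storage format)."""
    salt = os.urandom(32)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)
    return f"{iterations}${salt.hex()}${digest.hex()}"

def check_record(password, record):
    """Recompute the hash from the stored parameters and compare in constant time."""
    iterations_text, salt_hex, hash_hex = record.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode(),
        bytes.fromhex(salt_hex),
        int(iterations_text),
        dklen=32,
    )
    # compare_digest avoids leaking timing information, unlike ==
    return hmac.compare_digest(digest.hex(), hash_hex)

# A low iteration count is used here only to keep the demo fast
record = make_record("SuperSecure123!", iterations=10_000)
print(check_record("SuperSecure123!", record))
print(check_record("wrongPassword", record))
```

Because the iteration count travels with each record, older records can still be verified after you raise the default for new passwords.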

Conclusion

You've learned how to implement secure password hashing in Python using the hashlib module. Here are the key takeaways:

Basic hashing with SHA-256 is useful for data integrity, not passwords
Salting prevents rainbow table attacks by making each hash unique
PBKDF2 adds computational cost through iterations, slowing down attackers
Always store the salt, hash, and iteration count together
Use key derivation functions (PBKDF2, bcrypt, Argon2) for passwords

The code examples in this tutorial provide a solid foundation for implementing authentication in your projects. But remember, security is an ongoing process. Stay updated on best practices and regularly review your security implementations.

Happy (secure) coding!

How to Work with YAML in Python – A Guide with Examples

Bala Priya C — Wed, 10 Dec 2025 22:58:47 +0000

If you've ever worked with configuration files, Docker Compose, Kubernetes, or CI/CD pipelines, you've probably used YAML. It's everywhere in modern development, and for good reason: it’s human-readable, simple, and powerful.

In this guide, you'll learn how to work with YAML files in Python. We'll cover reading, writing, and manipulating YAML data in practice.

🔗 You can find the code on GitHub.

Prerequisites

Before working with YAML in Python, you should have:

Python 3.8 or a later version installed
Basic Python knowledge: Variables, data types, functions, and control structures
Understanding of data structures: Dictionaries, lists, and nested data structures
File handling basics: Reading from and writing to files in Python
Command line familiarity: Running Python scripts and installing packages with pip

You'll also need to install the PyYAML library:

pip install pyyaml

What Is YAML and Why Should You Care?
How to Read YAML Files
How to Write YAML Files
How to Work with Lists in YAML
Build a YAML Config Manager

What Is YAML and Why Should You Care?

YAML (YAML Ain't Markup Language) is a data serialization format designed to be easy to read and write. Think of it as JSON's more readable cousin. :)

Here's the same data in JSON and YAML:

JSON:

{
  "database": {
    "host": "localhost",
    "port": 5432,
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  }
}

YAML:

database:
  host: localhost
  port: 5432
  credentials:
    username: admin
    password: secret

The YAML version is cleaner and easier to read, especially for configuration files.

How to Read YAML Files

Let's say you have a configuration file for a web application. We'll create a simple config.yaml file and learn how to read it in Python.

First, let's understand what we're trying to do. You have configuration data stored in a YAML file, and you want to load it into Python so you can use it in your application. Here’s how you can do it:

import yaml

# Open and read the YAML file
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Access the data
print(config['database']['host'])

Output:

localhost

Here's what's happening in this code:

We import the yaml module.
Then we open the file using a context manager (with statement), which automatically closes the file when we're done.
We use yaml.safe_load() to parse the YAML content into a Python dictionary so we can access the data just like any Python dictionary.

⚠️ Note that you should always use yaml.safe_load() instead of yaml.load(). The safe_load() function protects you from arbitrary code execution vulnerabilities. Unless you have a very specific reason (and you probably don't), stick with safe_load().
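The reading example assumes a config.yaml file already exists in the working directory. If you want to follow along, the sketch below creates one using only the standard library. The file contents are an assumption that mirrors the database example from earlier in this guide.

```python
from pathlib import Path

# Hypothetical sample contents; the keys mirror the earlier database example
sample_config = """\
database:
  host: localhost
  port: 5432
  credentials:
    username: admin
    password: secret
"""

# Write the sample file next to your script, then show what was written
Path("config.yaml").write_text(sample_config, encoding="utf-8")
print(Path("config.yaml").read_text(encoding="utf-8"))
```

With this file in place, config['database']['host'] in the reading example resolves to localhost.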

How to Write YAML Files

Now let's go in the opposite direction. You have Python data structures and you want to save them as YAML files. This is useful when you're generating configuration files or exporting data.

import yaml

# Your configuration data as Python dictionaries
config = {
    'database': {
        'host': 'localhost',
        'port': 5432,
        'name': 'myapp_db',
        'credentials': {
            'username': 'admin',
            'password': 'secret123'
        }
    },
    'server': {
        'host': '0.0.0.0',
        'port': 8000,
        'debug': True
    },
    'features': {
        'enable_cache': True,
        'cache_ttl': 3600
    }
}

# Write to a YAML file
with open('generated_config.yaml', 'w') as file:
    yaml.dump(config, file, default_flow_style=False)

Let's break down what's happening:

We create a nested Python dictionary with our configuration.
We open a file in write mode ('w').
We use yaml.dump() to convert the Python dictionary to YAML format and write it to the file.
The default_flow_style=False parameter ensures the output uses block style (the readable, indented format) instead of inline style.

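The resulting generated_config.yaml file will be properly formatted and ready to use. Note that yaml.dump() sorts keys alphabetically by default, so the keys in the file may not appear in the order you defined them in the dictionary.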

How to Work with Lists in YAML

YAML handles lists elegantly, and they're common in configuration files. Suppose you're building a microservices application and need to configure multiple service endpoints. Here's how you'd work with that data:

import yaml

# Configuration with lists
services_config = {
    'services': [
        {
            'name': 'auth-service',
            'url': 'http://auth.example.com',
            'timeout': 30
        },
        {
            'name': 'payment-service',
            'url': 'http://payment.example.com',
            'timeout': 60
        },
        {
            'name': 'notification-service',
            'url': 'http://notification.example.com',
            'timeout': 15
        }
    ],
    'retry_policy': {
        'max_attempts': 3,
        'backoff_seconds': 5
    }
}

# Write to file
with open('services.yaml', 'w') as file:
    yaml.dump(services_config, file, default_flow_style=False, sort_keys=False)

# Read it back
with open('services.yaml', 'r') as file:
    loaded_services = yaml.safe_load(file)

# Access list items
for service in loaded_services['services']:
    print(f"Service: {service['name']}, URL: {service['url']}")

Output:

Service: auth-service, URL: http://auth.example.com
Service: payment-service, URL: http://payment.example.com
Service: notification-service, URL: http://notification.example.com

This code helps us understand a few key concepts.

We can nest lists and dictionaries freely in our Python data structures. The sort_keys=False parameter preserves the order of keys as we defined them. When we read the YAML back, we can iterate over lists just like any Python list. The data structures in Python match the structures in YAML.

Build a YAML Config Manager

Let's put everything together with a practical example. We'll build a simple configuration manager class that handles environment-specific configs (a common need in real projects):

import yaml
import os

class ConfigManager:
    def __init__(self, config_dir='configs'):
        self.config_dir = config_dir
        self.config = {}

    def load_config(self, environment='development'):
        """Load configuration for a specific environment"""
        config_file = os.path.join(self.config_dir, f'{environment}.yaml')

        try:
            with open(config_file, 'r') as file:
                self.config = yaml.safe_load(file)
            print(f"✓ Loaded configuration for {environment}")
            return self.config
        except FileNotFoundError:
            print(f"✗ Configuration file not found: {config_file}")
            return None
        except yaml.YAMLError as e:
            print(f"✗ Error parsing YAML: {e}")
            return None

    def get(self, key_path, default=None):
        """Get a configuration value using dot notation"""
        keys = key_path.split('.')
        value = self.config

        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return default

        return value

    def save_config(self, environment, config_data):
        """Save configuration to a file"""
        config_file = os.path.join(self.config_dir, f'{environment}.yaml')

        os.makedirs(self.config_dir, exist_ok=True)

        with open(config_file, 'w') as file:
            yaml.dump(config_data, file, default_flow_style=False)

        print(f"✓ Saved configuration for {environment}")

This ConfigManager class shows you how to build a practical utility:

Initialization: We set up a directory for config files.
Loading: The load_config() method reads environment-specific YAML files with proper error handling.
Accessing data: The get() method lets you access nested values using dot notation (like 'database.host').
Saving: The save_config() method writes configuration data to YAML files.

This is the kind of pattern you might actually use in projects. You can extend it further by adding validation, environment variable overrides, or configuration merging. Here’s how you can use the ConfigManager class we’ve coded:

if __name__ == '__main__':
    # Create config manager
    config_mgr = ConfigManager()

    # Create a sample development config
    dev_config = {
        'database': {
            'host': 'localhost',
            'port': 5432,
            'name': 'dev_db'
        },
        'api': {
            'base_url': 'http://localhost:8000',
            'timeout': 30
        }
    }

    # Save it
    config_mgr.save_config('development', dev_config)

    # Load and use it
    config_mgr.load_config('development')
    print(f"Database host: {config_mgr.get('database.host')}")
    print(f"API timeout: {config_mgr.get('api.timeout')}")

Running the above code should give you the following output:

✓ Saved configuration for development
✓ Loaded configuration for development
Database host: localhost
API timeout: 30
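Configuration merging, one of the extensions mentioned earlier, could be sketched as a recursive dictionary merge. The function name `merge_configs` and the sample values are hypothetical. The idea is that an environment-specific config only lists the values that differ from a shared base.

```python
import copy

def merge_configs(base, override):
    """Return a new dict where values in override take precedence over base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key
            merged[key] = merge_configs(merged[key], value)
        else:
            # Otherwise the override value replaces the base value
            merged[key] = copy.deepcopy(value)
    return merged

base = {"database": {"host": "localhost", "port": 5432}, "api": {"timeout": 30}}
production = {"database": {"host": "db.internal.example.com"}}

merged = merge_configs(base, production)
print(merged["database"])
print(merged["api"])
```

The function works on plain dictionaries, so you could call it on the results of two load_config() calls without changing the class itself.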

Conclusion

YAML is a powerful tool in your developer toolkit. It comes in handy when you’re configuring applications, defining CI/CD pipelines, or working with infrastructure as code.

In this article, you learned how to work with YAML files in Python. You can read configuration files, write data to YAML format, handle lists and nested structures, and build practical utilities like the ConfigManager we coded.

Start small. Try replacing a JSON config file in one of your projects with YAML. You'll quickly appreciate how much more readable it is, and you'll be comfortable working with YAML across the tools and platforms that use it.

Happy coding!

How to Parse XML in Python Without Using External Libraries

Bala Priya C — Wed, 12 Nov 2025 20:29:56 +0000

In software development, you’ll run into XML (Extensible Markup Language) when working with configuration files, API responses, data exports, and more. While there are powerful third-party libraries for parsing XML, Python's standard library already includes everything you need.

In this tutorial, you'll learn how to parse XML using Python's built-in xml.etree.ElementTree module. No pip installs required.

🔗 You can find the code on GitHub.

Prerequisites

To follow along with this tutorial, you should have:

Python 3.7 or later installed on your system
Basic understanding of Python syntax and data structures
Familiarity with basic programming concepts like loops and conditionals
A text editor or IDE for writing Python code

No external libraries are required as we'll use Python's built-in xml.etree.ElementTree module.

How to Read an XML String
How to Read an XML File
How to Find Elements in an XML Tree
How to Extract Text and Attributes from XML
How to Build a Simple XML Parser
How to Handle Missing Data

How to Read an XML String

Let's start simple. We'll parse XML directly from a string to understand the fundamental concepts.

import xml.etree.ElementTree as ET

xml_string = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
</catalog>
"""

root = ET.fromstring(xml_string)
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")

How this works:

We import xml.etree.ElementTree and give it the alias ET (this is the convention)
ET.fromstring() parses the XML string and returns the root element
Every element has a .tag property (the element name) and .attrib dictionary (its attributes)
The root object represents the <catalog> element in our XML

For the above example, you’ll see the following output:

Root tag: catalog
Root attributes: {}

Here, root.attrib is empty because the root element in the provided xml_string does not have any attributes defined. Attributes are key-value pairs within the opening tag of an XML element, like id="101" or currency="USD" in the <product> and <price> elements. Since <catalog> only has a tag and no additional information within its opening tag, its attributes dictionary is empty.

How to Read an XML File

In real applications, you'll usually read XML from files. Say you have a products.xml file. Here's how you can read from the XML file:

# Parse an XML file
tree = ET.parse('products.xml')
root = tree.getroot()

print(f"Root element: {root.tag}")

Before we proceed to run and check the output, let’s note the differences between reading XML strings vs files:

ET.parse() reads from a file and returns an ElementTree object
We call .getroot() to get the root element
Use ET.parse() for files, ET.fromstring() for strings

Running the above code should give you:

Root element: catalog
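If you don't have a products.xml file yet, the sketch below creates one. The file contents are an assumption, reconstructed from the product data that appears in the outputs later in this tutorial.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical catalog; values match the outputs shown later in the tutorial
products_xml = """\
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
        <stock>45</stock>
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <price currency="USD">15.99</price>
        <stock>120</stock>
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""

Path("products.xml").write_text(products_xml, encoding="utf-8")

# Confirm the file parses and contains the expected root element
root = ET.parse("products.xml").getroot()
print(root.tag, len(root.findall("product")))
```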

How to Find Elements in an XML Tree

ElementTree gives you three main ways to search for elements. Understanding when to use each is important.

import xml.etree.ElementTree as ET

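xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <name>USB Mouse</name>
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""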

root = ET.fromstring(xml_data)

# Method 1: find() - returns the FIRST matching element
first_product = root.find('product')
print(f"First product ID: {first_product.get('id')}")

# Method 2: findall() - returns ALL direct children that match
all_products = root.findall('product')
print(f"Total products: {len(all_products)}")

# Method 3: iter() - recursively finds ALL matching elements
all_categories = root.iter('category')
category_list = [cat.text for cat in all_categories]
print(f"All categories: {category_list}")

Now let’s understand how the three methods work:

find() returns the first matching direct child and stops there. Use it when you only need one element.
findall() only searches direct children (one level deep). Use for immediate child elements.
iter() searches recursively through the entire tree. Use when elements might be nested anywhere.

This is important: findall('category') on root won't find anything because <category> isn't a direct child of <catalog>. But iter('category') will find all categories no matter how deeply nested. So when you run the above code, you’ll get:

First product ID: 101
Total products: 2
All categories: ['Electronics', 'Accessories', 'Electronics']
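ElementTree also supports a limited subset of XPath in find() and findall(). As a small sketch on a similar hypothetical catalog, prefixing the tag with .// makes findall() search all descendants instead of only direct children:

```python
import xml.etree.ElementTree as ET

xml_data = """
<catalog>
    <product id="101">
        <categories>
            <category>Electronics</category>
            <category>Accessories</category>
        </categories>
    </product>
    <product id="102">
        <categories>
            <category>Electronics</category>
        </categories>
    </product>
</catalog>
"""

root = ET.fromstring(xml_data)

# A plain tag name only matches direct children of the root
direct = root.findall("category")

# The .// prefix matches the tag at any depth below the root
nested = root.findall(".//category")

print(len(direct))
print([category.text for category in nested])
```

Unlike iter(), which returns an iterator, findall() returns a list, which is convenient when you need len() or indexing.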

How to Extract Text and Attributes from XML

Now let's extract actual data from our XML. This is where you turn structured XML into Python data you can work with.

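xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
        <stock>45</stock>
    </product>
</catalog>
"""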

root = ET.fromstring(xml_data)
product = root.find('product')

# Get element text content
product_name = product.find('name').text
price_text = product.find('price').text
stock_text = product.find('stock').text

# Get attributes (two ways)
product_id = product.get('id')  # Method 1: .get()
product_id_alt = product.attrib['id']  # Method 2: .attrib dictionary

# Get nested attributes
price_element = product.find('price')
currency = price_element.get('currency')

print(f"Product: {product_name}")
print(f"ID: {product_id}")
print(f"Price: {currency} {price_text}")
print(f"Stock: {stock_text}")

This outputs:

Product: Wireless Keyboard
ID: 101
Price: USD 29.99
Stock: 45

What's happening here:

.text gets the text content between opening and closing tags
.get('attribute_name') safely retrieves an attribute (returns None if missing)
.attrib['attribute_name'] accesses the attribute dictionary directly (raises KeyError if missing)
Use .get() when an attribute might be optional, use .attrib[] when it's required
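The difference between the two access styles is easiest to see with an attribute that doesn't exist. In this short sketch, the sku attribute is hypothetical and deliberately absent:

```python
import xml.etree.ElementTree as ET

product = ET.fromstring('<product id="101"><name>Wireless Keyboard</name></product>')

# .get() returns None, or a default you supply, when the attribute is missing
print(product.get("id"))
print(product.get("sku"))
print(product.get("sku", "unknown"))

# .attrib[...] raises KeyError for a missing attribute
try:
    product.attrib["sku"]
except KeyError as error:
    print(f"Missing attribute: {error}")
```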

How to Build a Simple XML Parser

Let's put it all together with a practical example. We'll parse the full product catalog and convert it to a Python list of dictionaries.

def parse_product_catalog(xml_file):
    """Parse an XML product catalog and return a list of product dictionaries."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    products = []

    for product_element in root.findall('product'):
        # Extract product data
        product = {
            'id': product_element.get('id'),
            'name': product_element.find('name').text,
            'price': float(product_element.find('price').text),
            'currency': product_element.find('price').get('currency'),
            'stock': int(product_element.find('stock').text),
            'categories': []
        }

        # Extract categories (nested elements)
        categories_element = product_element.find('categories')
        if categories_element is not None:
            for category in categories_element.findall('category'):
                product['categories'].append(category.text)

        products.append(product)

    return products

Breaking down this parser:

We iterate through all <product> elements using findall()
For each product, we extract text and attributes into a dictionary. We convert numeric strings to proper types (float for price, int for stock)
For nested categories, we first check if the <categories> element exists. Then we iterate through its <category> child elements and collect their text

The result is clean Python data structures you can easily work with. You can now use the parser like so:

products = parse_product_catalog('products.xml')

for product in products:
    print(f"\nProduct: {product['name']}")
    print(f"  ID: {product['id']}")
    print(f"  Price: {product['currency']} {product['price']}")
    print(f"  Stock: {product['stock']}")
    print(f"  Categories: {', '.join(product['categories'])}")

Output:

Product: Wireless Keyboard
  ID: 101
  Price: USD 29.99
  Stock: 45
  Categories: Electronics, Accessories

Product: USB Mouse
  ID: 102
  Price: USD 15.99
  Stock: 120
  Categories: Electronics

How to Handle Missing Data

Real-world XML is messy (no surprises there!). Elements might be missing, text might be empty, or attributes might not exist. Here's how to handle that gracefully.

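xml_data = """
<catalog>
    <product id="101">
        <name>Wireless Keyboard</name>
        <price currency="USD">29.99</price>
    </product>
    <product id="102">
        <name>USB Mouse</name>
    </product>
</catalog>
"""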

root = ET.fromstring(xml_data)

for product in root.findall('product'):
    name = product.find('name').text

    # Safe way to handle potentially missing elements
    price_element = product.find('price')
    if price_element is not None:
        price = float(price_element.text)
        currency = price_element.get('currency', 'USD')  # Default value
        print(f"{name}: {currency} {price}")
    else:
        print(f"{name}: Price not available")

Here, we handle potential missing data by:

Using product.find('price') to search for the <price> element within the current <product> element.
Checking if the result of find() is None. If an element is not found, find() returns None.
Using an if price_element is not None: condition to only attempt to access the text (price_element.text) and attributes (price_element.get('currency', 'USD')) of the <price> element if it was actually found.
Adding an else block to handle the case where the <price> element is missing, printing "Price not available".

This approach prevents errors that would occur if you tried to access .text or .get() on a None object. For the above code snippet, you’ll get:

Wireless Keyboard: USD 29.99
USB Mouse: Price not available

Here are a few more error-handling strategies:

Always check if find() returns None before accessing .text or .get()
Use .get('attr', 'default') to provide default values for missing attributes
Consider wrapping parsing in try-except blocks for production code
Validate your data after parsing rather than assuming XML structure is correct
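Several of these strategies can be combined in one small function. The sketch below is one possible approach, with `parse_prices` as a hypothetical name. It catches ET.ParseError for malformed XML, uses findtext() to read text with a default, and treats a missing or empty <price> as None.

```python
import xml.etree.ElementTree as ET

def parse_prices(xml_text):
    """Return {name: price or None}; return {} if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        print(f"Could not parse XML: {error}")
        return {}

    prices = {}
    for product in root.findall("product"):
        name = product.findtext("name", default="(unnamed)")
        price_text = product.findtext("price")  # None when <price> is absent
        try:
            prices[name] = float(price_text)
        except (TypeError, ValueError):
            # TypeError: element missing; ValueError: empty or non-numeric text
            prices[name] = None
    return prices

sample = """
<catalog>
    <product><name>Wireless Keyboard</name><price>29.99</price></product>
    <product><name>USB Mouse</name></product>
    <product><name>Webcam</name><price></price></product>
</catalog>
"""

print(parse_prices(sample))
print(parse_prices("<catalog><product>"))
```

Returning None for a bad price keeps the product in the result, so the calling code can decide whether to skip it or report it.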

Conclusion

You now know how to parse XML in Python without installing any external libraries. You learned:

How to read XML from strings and files
The difference between find(), findall(), and iter()
How to extract text content and attributes safely
How to handle nested elements and missing data

The xml.etree.ElementTree module works well enough for most XML parsing needs, and it's always available in Python's standard library.

For more advanced XML navigation and selection, you can explore XPath expressions. XPath works well for selecting nodes in an XML document and can be very useful for complex structures. We’ll cover this in another tutorial.

Until then, happy parsing!

How to Parse JSON in Python – A Complete Guide With Examples

Bala Priya C — Wed, 29 Oct 2025 21:54:41 +0000

JSON has become the standard format for data exchange on the web. So you'll run into JSON all the time when working with REST APIs, configuration files, database exports, and more. As a developer, you should know how to parse, manipulate, and generate JSON efficiently.

Python's built-in json module provides a straightforward interface for working with JSON data. You'll use it to convert JSON strings into Python dictionaries and lists that you can manipulate with familiar syntax, and then convert your Python data structures back into JSON when you need to send data to an API or save it to a file.

Beyond basic parsing, you'll often need to handle nested structures, validate data integrity, and transform data formats. This guide covers practical JSON parsing techniques you can use in your projects right away. Let’s get started!

🔗 You can find the code examples on GitHub.

Prerequisites

To follow along with this tutorial, you should have:

Python 3.7 or later installed on your system
Basic understanding of Python dictionaries and lists
Familiarity with Python file operations (opening and reading files)
A text editor or IDE for writing Python code

Understanding JSON Structure and Basic Parsing
How to Work with Nested JSON Objects
How to Parse JSON Arrays
How to Read JSON from Files
How to Handle JSON Parsing Errors

Understanding JSON Structure and Basic Parsing

JSON represents data using a simple syntax with six data types: objects (key-value pairs), arrays, strings, numbers, Booleans, and null.

When Python parses JSON, these types map directly to Python equivalents:

JSON objects become dictionaries,
arrays become lists,
strings remain strings,
numbers become int or float,
true and false become True and False, and
null becomes None.

This direct mapping makes working with JSON in Python intuitive once you understand the correspondence.

Before you start, import the json module that’s built into the Python standard library.

The basic operation in JSON parsing is converting a JSON string into a Python data structure you can work with. Here's how to perform this basic conversion:

import json

json_string = '{"name": "Sarah Chen", "age": 28, "city": "Portland"}'
person = json.loads(json_string)

print(person["name"]) 
print(person["age"])   
print(type(person))

Output:

Sarah Chen
28
<class 'dict'>

Here, the json.loads() function takes a string containing JSON and returns a Python object. The 's' in 'loads' stands for 'string', indicating it works with string data. After parsing, you have a regular Python dictionary that you can access with bracket notation using the JSON keys.
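You can check the type mapping described earlier by parsing a string that contains one value of each JSON type. The key names in this sketch are arbitrary:

```python
import json

sample = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "text": "hi", '
    '"whole": 7, "frac": 2.5, "flag": true, "nothing": null}'
)

parsed = json.loads(sample)

# Print the Python type that each JSON value was converted to
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

Whole numbers become int and numbers with a fractional part become float, which matters if you later compare or format these values.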

How to Work with Nested JSON Objects

Real-world JSON data rarely comes in flat structures. APIs typically return deeply nested objects containing multiple levels of data. Understanding how to navigate these structures is essential for extracting the information you need.

Consider this example of parsing a weather API response that contains nested objects for location data and current conditions:

import json

weather_data = '''
{
    "location": {
        "city": "Seattle",
        "state": "WA",
        "coordinates": {
            "latitude": 47.6062,
            "longitude": -122.3321
        }
    },
    "current": {
        "temperature_f": 58,
        "conditions": "Partly Cloudy",
        "humidity": 72,
        "wind": {
            "speed_mph": 8,
            "direction": "NW"
        }
    }
}
'''

weather = json.loads(weather_data)

After parsing the JSON string with json.loads(), you can access nested values by chaining dictionary keys together:

city = weather["location"]["city"]
temp = weather["current"]["temperature_f"]
wind_speed = weather["current"]["wind"]["speed_mph"]

print(f"{city}: {temp}°F, Wind {wind_speed} mph")

Output:

Seattle: 58°F, Wind 8 mph

In this example, each level of nesting requires another set of brackets. The expression weather["location"]["city"] first accesses the "location" object, then retrieves the "city" value from within it. You can drill down as many levels as needed, like weather["current"]["wind"]["speed_mph"] which traverses three levels deep. This chaining syntax mirrors how you would access the data in the original JSON structure.
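Chained brackets raise a KeyError as soon as one level is missing, which is common with optional API fields. One possible way to guard against that is a small helper that walks the keys and returns a default. Both `get_nested` and the gust_mph key are hypothetical and used only for illustration.

```python
import json

weather = json.loads('{"current": {"temperature_f": 58, "wind": {"speed_mph": 8}}}')

def get_nested(data, *keys, default=None):
    """Follow keys through nested dictionaries, returning default if any is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

print(get_nested(weather, "current", "wind", "speed_mph"))
print(get_nested(weather, "current", "wind", "gust_mph", default="n/a"))
```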

How to Parse JSON Arrays

JSON arrays represent ordered lists of values and appear frequently in API responses when returning collections of items. Python converts JSON arrays into lists, which you can iterate through or access by index.

Here's an example parsing a list of products from an inventory system:

import json

products_json = '''
[
    {
        "id": "PROD-001",
        "name": "Wireless Mouse",
        "price": 24.99,
        "in_stock": true
    },
    {
        "id": "PROD-002",
        "name": "Mechanical Keyboard",
        "price": 89.99,
        "in_stock": false
    },
    {
        "id": "PROD-003",
        "name": "USB-C Hub",
        "price": 34.99,
        "in_stock": true
    }
]
'''

products = json.loads(products_json)

The JSON string starts with a square bracket, indicating an array at the root level. After parsing, products is a Python list containing three dictionaries.

You can now use standard Python list operations on the parsed data. The len() function returns the number of items, and you can iterate through the list with a for loop. Each iteration gives you a dictionary representing one product, which you access using dictionary syntax.

print(f"Total products: {len(products)}")

for product in products:
    status = "Available" if product["in_stock"] else "Out of stock"
    print(f"{product['name']}: ${product['price']} - {status}")

Output:

Total products: 3
Wireless Mouse: $24.99 - Available
Mechanical Keyboard: $89.99 - Out of stock
USB-C Hub: $34.99 - Available

You can also access specific array elements by index and filter the data. List indexing works exactly as it does with any Python list, starting at zero.

first_product = products[0]
print(f"First product ID: {first_product['id']}")

Output:

First product ID: PROD-001

You can also use list comprehensions to filter the parsed data, creating a new list containing only products where the "in_stock" value is True.

available_products = [p for p in products if p["in_stock"]]
print(f"Available: {len(available_products)} products")

Output:

Available: 2 products

How to Read JSON from Files

Most applications read JSON from files rather than hardcoded strings. Configuration files, data exports, and cached API responses typically live in JSON files that your application needs to load at runtime.

The json module provides the load() function, which reads from an open file object and parses its contents in one step.

This code creates a sample configuration file to demonstrate file reading:

import json

# First, let's create a sample config 
config_data = {
    "api_url": "https://api.example.com/v2",
    "timeout": 30,
    "retry_attempts": 3,
    "enable_logging": True
}

with open('config.json', 'w') as f:
    json.dump(config_data, f, indent=2)

The json.dump() function writes Python data to a file, and the indent=2 parameter formats the JSON with 2-space indentation to make it human-readable. The 'w' mode opens the file for writing, creating it if it doesn't exist or overwriting it if it does.

Now you can read that file back into your application. The json.load() function (without the 's') reads from a file object and parses the JSON in one operation.

with open('config.json', 'r') as f:
    config = json.load(f)

print(f"API URL: {config['api_url']}")
print(f"Timeout: {config['timeout']} seconds")
print(f"Logging: {'Enabled' if config['enable_logging'] else 'Disabled'}")

Note the difference: json.loads() parses strings, while json.load() reads from files.

The with statement ensures that the file closes properly even if an error occurs during reading. After the with block completes, you have a Python dictionary containing all the parsed configuration data.

Output:

API URL: https://api.example.com/v2
Timeout: 30 seconds
Logging: Enabled

How to Handle JSON Parsing Errors

JSON parsing can fail for many reasons: malformed syntax, unexpected data types, corrupted files, or network issues when fetching from APIs. Your code must handle these errors gracefully rather than crashing.

The json module raises a JSONDecodeError when it runs into invalid JSON. Here's how to catch and handle these errors appropriately.

The try-except block catches any JSON parsing errors:

The JSONDecodeError exception provides detailed information about what went wrong: e.msg describes the error, e.lineno indicates which line contains the problem, and e.colno shows the character position. This information helps you debug malformed JSON quickly.
The function returns None when parsing fails, allowing calling code to check for this and handle the error appropriately.
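For reference, a minimal sketch of a parse_json_safely function matching this description might look like the following. The exact wording of the printed messages is an assumption, inferred from the output shown later in this section.

```python
import json

def parse_json_safely(json_string):
    """Parse a JSON string, returning None (and printing details) on failure."""
    try:
        return json.loads(json_string)
    except json.JSONDecodeError as e:
        # msg, lineno, and colno are attributes of JSONDecodeError
        print(f"JSON parsing failed: {e.msg}")
        print(f"Error at line {e.lineno}, column {e.colno}")
        return None
```

Returning None keeps the function simple, but it also means a valid JSON null is indistinguishable from a failure. If that matters in your application, re-raising the exception or returning a separate success flag are alternatives.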

Let's test this with a few JSON examples:

# Missing closing quote
bad_json1 = '{"name": "Sarah, "age": 28}'
result1 = parse_json_safely(bad_json1)
print(f"Result 1: {result1}\n")

# Missing closing brace
bad_json2 = '{"name": "Sarah", "age": 28'
result2 = parse_json_safely(bad_json2)
print(f"Result 2: {result2}\n")

# Extra comma
bad_json3 = '{"name": "Sarah", "age": 28,}'
result3 = parse_json_safely(bad_json3)
print(f"Result 3: {result3}\n")

# Valid JSON for comparison
good_json = '{"name": "Sarah", "age": 28}'
result4 = parse_json_safely(good_json)
print(f"Result 4: {result4}")

Each malformed JSON string triggers a different error message indicating the specific syntax problem. The error messages help pinpoint exactly where the JSON is invalid. The final example shows that valid JSON parses successfully and returns a dictionary instead of None.

JSON parsing failed: Expecting ',' delimiter
Error at line 1, column 19
Result 1: None

JSON parsing failed: Expecting ',' delimiter
Error at line 1, column 28
Result 2: None

JSON parsing failed: Expecting property name enclosed in double quotes
Error at line 1, column 29
Result 3: None

Result 4: {'name': 'Sarah', 'age': 28}

When reading JSON files, you should also handle file-related errors. The following function load_json_file_safely handles three types of errors:

FileNotFoundError when the file doesn't exist,
PermissionError when the application can't read the file, and
JSONDecodeError when the file contains invalid JSON. Each error type gets its own except block with an appropriate message.

The calling code checks if the result is None and falls back to default values, ensuring the application continues running even when the file can't be loaded.

import json

def load_json_file_safely(filepath):
    try:
        with open(filepath, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"Error: File '{filepath}' not found")
        return None
    except PermissionError:
        print(f"Error: Permission denied reading '{filepath}'")
        return None
    except json.JSONDecodeError as e:
        print(f"Error: Invalid JSON in '{filepath}'")
        print(f"  {e.msg} at line {e.lineno}")
        return None

data = load_json_file_safely('missing_file.json')
if data is None:
    print("Using default configuration")
    data = {"timeout": 30, "retries": 3}

If you run the above code, you’ll get the following output:

Error: File 'missing_file.json' not found
Using default configuration

And that’s a wrap! Thank you for making it this far if you’ve been following along! 🥳

Conclusion

The json module provides everything you need for working with JSON data in Python. Here’s a summary of what we covered:

The core functions handle the most common operations: json.loads() parses JSON strings into Python objects, and json.load() reads and parses JSON from files.
JSON parsing automatically converts between JSON and Python data types. This conversion lets you work with parsed JSON using standard Python syntax.
You can navigate nested JSON by chaining dictionary keys and list indices together. Access nested values like data['section']['subsection']['field'] by following the structure down through each level.
Always wrap JSON parsing in try-except blocks when working with external data. The JSONDecodeError exception provides specific information about parsing failures including the error location, helping you debug issues quickly. When reading files, also catch FileNotFoundError and PermissionError to handle common file access problems gracefully.

Get comfortable with these fundamentals and you'll be able to handle most of the JSON parsing tasks in your Python projects. Happy coding!

How to Work with TOML Files in Python

Bala Priya C — Fri, 24 Oct 2025 19:50:44 +0000

TOML (Tom's Obvious Minimal Language) has become the modern standard for configuration files in Python projects. It's more expressive than INI files and cleaner than JSON or YAML.

Since Python 3.11, the standard library includes the tomllib module for reading and parsing TOML files. TOML offers several advantages over other configuration formats. It supports complex data types like arrays and nested tables while remaining human-readable. Many Python projects, including Poetry and setuptools, use pyproject.toml for configuration.

In this tutorial, you’ll learn how to read and parse TOML files in Python with tomllib.

🔗 Here’s the code on GitHub.

Prerequisites

To follow along with this tutorial, you'll need:

Python 3.11 or higher: The tomllib module is part of the standard library starting from Python 3.11
Basic Python knowledge: Familiarity with dictionaries, file I/O, and basic syntax
A text editor or IDE: Any editor to create and edit TOML and Python files

Understanding the TOML Format
How to Read TOML Files with tomllib
How to Work with TOML Data Types
How to Build a TOML Config Manager
How to Handle Missing Values Safely

Understanding the TOML Format

TOML files organize data into tables (similar to INI sections) but with more powerful features. Let's create a sample configuration to understand the syntax.

Create config.toml:

# Application configuration
title = "My Application"
version = "1.0.0"

[database]
host = "localhost"
port = 5432
username = "app_user"
password = "secure_password"
databases = ["myapp_db", "myapp_cache"]
pool_size = 10
ssl_enabled = true

[server]
host = "0.0.0.0"
port = 8000
debug = false
allowed_hosts = ["localhost", "127.0.0.1", "example.com"]

[logging]
level = "INFO"
format = "%(asctime)s - %(levelname)s - %(message)s"
handlers = ["console", "file"]

[cache]
enabled = true
ttl = 3600
max_size = 1000

[features]
enable_api = true
enable_webhooks = false
rate_limit = 100

This TOML file shows key features: simple key-value pairs, tables (sections in brackets), arrays (square brackets with comma-separated values), and different data types including strings, integers, booleans, and arrays.

How to Read TOML Files with `tomllib`

The tomllib module is part of Python's standard library starting from version 3.11. It provides a simple interface for loading TOML files like so:

import tomllib

with open('config.toml', 'rb') as f:
    config = tomllib.load(f)

# Access values
app_title = config['title']
db_host = config['database']['host']
db_port = config['database']['port']

print(f"Application: {app_title}")
print(f"Database: {db_host}:{db_port}")
print(f"Config keys: {config.keys()}")

Output:

Application: My Application
Database: localhost:5432
Config keys: dict_keys(['title', 'version', 'database', 'server', 'logging', 'cache', 'features'])

Note that tomllib requires opening files in binary mode ('rb'). The load() function parses the TOML file and returns a regular Python dictionary.

Values are automatically converted to appropriate Python types: strings remain strings, integers become ints, booleans become True/False, and arrays become lists. Next, let’s take a closer look at working with different data types.

How to Work with TOML Data Types

TOML's type system maps cleanly to Python's built-in types. Here's how to work with different value types:

import tomllib

with open('config.toml', 'rb') as f:
    config = tomllib.load(f)

# Strings
app_title = config['title']

# Integers
db_port = config['database']['port']
cache_ttl = config['cache']['ttl']

# Booleans
debug_mode = config['server']['debug']
cache_enabled = config['cache']['enabled']

# Arrays (become Python lists)
databases = config['database']['databases']
allowed_hosts = config['server']['allowed_hosts']

print(f"Databases: {databases}")
print(f"Type of databases: {type(databases)}")
print(f"Debug mode: {debug_mode}, type: {type(debug_mode)}")

With tomllib, you don't need special getter methods like the ones ConfigParser provides. The returned dictionary contains properly typed Python objects that are ready to use, as the output shows:

Databases: ['myapp_db', 'myapp_cache']
Type of databases: <class 'list'>
Debug mode: False, type: <class 'bool'>

How to Build a TOML Config Manager

For production applications, wrapping TOML loading in a configuration class provides better error handling and validation. Here’s how you can do it:

import tomllib
from pathlib import Path

class TOMLConfig:
    def __init__(self, config_file='config.toml'):
        self.config_file = Path(config_file)

        if not self.config_file.exists():
            raise FileNotFoundError(f"Config file not found: {config_file}")

        with open(self.config_file, 'rb') as f:
            self.config = tomllib.load(f)

    def get(self, key, default=None):
        """Get a top-level configuration value"""
        return self.config.get(key, default)

    def get_section(self, section):
        """Get an entire configuration section"""
        if section not in self.config:
            raise ValueError(f"Section '{section}' not found")
        return self.config[section]

You can use the TOMLConfig class like so:

config = TOMLConfig('config.toml')

# Get top-level values
app_title = config.get('title')
version = config.get('version')

# Get entire sections
db_config = config.get_section('database')
server_config = config.get_section('server')

print(f"{app_title} v{version}")
print(f"Database config: {db_config}")

This configuration class provides a clean interface to your TOML file. It validates that the file exists before trying to parse it and provides methods to safely access configuration values.

Running the above code gives this output:

My Application v1.0.0
Database config: {'host': 'localhost', 'port': 5432, 'username': 'app_user', 'password': 'secure_password', 'databases': ['myapp_db', 'myapp_cache'], 'pool_size': 10, 'ssl_enabled': True}

How to Handle Missing Values Safely

Your code needs to handle missing configuration gracefully. Here's how to fall back to defaults when the file, a section, or a key is missing:

import tomllib

def load_config_safe(config_file='config.toml'):
    try:
        with open(config_file, 'rb') as f:
            return tomllib.load(f)
    except FileNotFoundError:
        print(f"Config file {config_file} not found, using defaults")
        return {}
    except tomllib.TOMLDecodeError as e:
        print(f"Error parsing TOML: {e}")
        raise

config = load_config_safe('config.toml')

# Get with defaults
db_host = config.get('database', {}).get('host', 'localhost')
db_port = config.get('database', {}).get('port', 5432)
debug = config.get('server', {}).get('debug', False)

print(f"Database: {db_host}:{db_port}")
print(f"Debug: {debug}")

Output:

Database: localhost:5432
Debug: False

This pattern uses chained .get() calls with defaults. If a section or key doesn't exist, you get the default value instead of a KeyError.

Conclusion

When working with TOML files in Python, follow these guidelines:

Always open in binary mode: The tomllib module requires binary mode ('rb') when opening files.
Use nested tables for organization: Take advantage of TOML's ability to nest tables for complex configurations.
Provide defaults for optional settings: Use .get() with default values to make your application more flexible.

If you're starting a new project, TOML is a great choice for Python configuration. Happy coding!

How to Parse INI Config Files in Python with Configparser

Bala Priya C — Fri, 17 Oct 2025 15:10:37 +0000

Configuration files provide a structured way to manage application settings that's more organized than environment variables alone.

INI files (short for initialization files) use a simple section-based format that is easy to read and parse. Python's built-in configparser module makes working with these files straightforward.

This tutorial will teach you how to read and parse .ini config files using the configparser module.

🔗 Here’s the code on GitHub.

Prerequisites

To follow along with this tutorial, you should have:

Python 3.7 or later installed on your system
Basic understanding of Python syntax and data structures (dictionaries, strings)
Familiarity with file operations in Python
A text editor or IDE for writing Python code
Basic knowledge of configuration files and why they're used in applications

No external packages are required, as we'll be using Python's built-in configparser module.

Understanding the INI File Format
Basic ConfigParser Usage
Type Conversion and Default Values
How to Create a Simple Config Manager
How to Work with Multiple Sections in INI Files
How to Write Configuration Files

Understanding the INI File Format

INI files organize configuration into sections, where each section contains key-value pairs. This structure is useful for applications with multiple components or environments. Let's look at what an INI file looks like before we parse it.

Create a file named app.ini:

[database]
host = localhost
port = 5432
username = app_user
password = secure_password
pool_size = 10
ssl_enabled = true

[server]
host = 0.0.0.0
port = 8000
debug = false

[logging]
level = INFO
file = app.log

This file contains three sections: database, server, and logging. Each section groups related settings together, making the configuration easy to understand and maintain.

Basic ConfigParser Usage

The configparser module provides the ConfigParser class, which handles all the parsing work. Here's how to read and access configuration values:

import configparser

config = configparser.ConfigParser()
config.read('app.ini')

# Access values from sections
db_host = config['database']['host']
db_port = config['database']['port']

print(f"Database: {db_host}:{db_port}")
print(f"Sections: {config.sections()}")

This code shows the basic workflow:

create a ConfigParser object,
read your INI file,
then access values using dictionary-like syntax.

The first bracket contains the section name, and the second contains the key.

Create the app.ini file and run the above code. You should see the following output:

Database: localhost:5432
Sections: ['database', 'server', 'logging']
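One detail worth knowing: config.read() does not raise an error when the file is missing. It returns a list of the files it managed to parse, so an empty list signals that nothing was loaded. A small sketch (the filename does_not_exist.ini is hypothetical and assumed to be absent):

```python
import configparser

config = configparser.ConfigParser()

# read() returns the list of files that were successfully parsed
loaded = config.read('does_not_exist.ini')

if not loaded:
    print("No config file was loaded")

# With nothing loaded, there are no sections
print(config.sections())
```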

Type Conversion and Default Values

Configuration values in INI files are stored as strings, but you often need them as integers, booleans, or floats. ConfigParser provides convenient methods for type conversion as shown here:

import configparser

config = configparser.ConfigParser()
config.read('app.ini')

# Automatic type conversion
db_port = config.getint('database', 'port')
ssl_enabled = config.getboolean('database', 'ssl_enabled')

# With fallback defaults
max_retries = config.getint('database', 'max_retries', fallback=3)
timeout = config.getfloat('database', 'timeout', fallback=30.0)

print(f"Port: {db_port}, SSL: {ssl_enabled}")

In this code, the getint(), getboolean(), and getfloat() methods convert string values to the appropriate type. The fallback parameter supplies a default when the section or key doesn't exist, so the lookup returns that value instead of raising an exception.

When you run the above code, you’ll get:

Port: 5432, SSL: True
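getboolean() accepts more than true and false. It recognizes 1, yes, true, and on as True and 0, no, false, and off as False, ignoring case, and raises a ValueError for anything else. A quick sketch using read_string() with a made-up [flags] section:

```python
import configparser

config = configparser.ConfigParser()

# Hypothetical section used only to try out different boolean spellings
config.read_string("""
[flags]
a = yes
b = Off
c = 1
d = maybe
""")

print(config.getboolean('flags', 'a'))
print(config.getboolean('flags', 'b'))
print(config.getboolean('flags', 'c'))

# Anything outside the recognized spellings raises ValueError
try:
    config.getboolean('flags', 'd')
except ValueError as e:
    print(f"ValueError: {e}")
```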

How to Create a Simple Config Manager

A practical approach is to wrap ConfigParser in a class that validates configuration and provides easy access to settings:

import configparser
from pathlib import Path

class ConfigManager:
    def __init__(self, config_file='app.ini'):
        self.config = configparser.ConfigParser()

        if not Path(config_file).exists():
            raise FileNotFoundError(f"Config file not found: {config_file}")

        self.config.read(config_file)

    def get_database_config(self):
        db = self.config['database']
        return {
            'host': db.get('host'),
            'port': db.getint('port'),
            'username': db.get('username'),
            'password': db.get('password'),
            'pool_size': db.getint('pool_size', fallback=5)
        }

This manager class validates that the file exists and provides clean methods to access configuration. It returns dictionaries with properly typed values.

And you can use it like so:

config = ConfigManager('app.ini')
db_config = config.get_database_config()
print(db_config)

This outputs:

{'host': 'localhost', 'port': 5432, 'username': 'app_user', 'password': 'secure_password', 'pool_size': 10}

How to Work with Multiple Sections in INI Files

You can organize different parts of your application into separate sections and access them independently:

import configparser

config = configparser.ConfigParser()
config.read('app.ini')

# Get all options in a section as a dictionary
db_settings = dict(config['database'])
server_settings = dict(config['server'])

# Check if a section exists
if config.has_section('cache'):
    cache_enabled = config.getboolean('cache', 'enabled')
else:
    cache_enabled = False

print(f"Database settings: {db_settings}")
print(f"Caching enabled: {cache_enabled}")

The dict() conversion gives you all key-value pairs from a section at once. The has_section() method lets you conditionally handle optional configuration sections.

Running the above code should give you the following output:

Database settings: {'host': 'localhost', 'port': '5432', 'username': 'app_user', 'password': 'secure_password', 'pool_size': '10', 'ssl_enabled': 'true'}
Caching enabled: False

How to Write Configuration Files

ConfigParser can also create and modify INI files, which is useful for saving user preferences or generating config templates:

import configparser

config = configparser.ConfigParser()

# Add sections and values
config['database'] = {
    'host': 'localhost',
    'port': '5432',
    'username': 'myapp'
}

config['server'] = {
    'host': '0.0.0.0',
    'port': '8000',
    'debug': 'false'
}

# Write to file
with open('generated.ini', 'w') as configfile:
    config.write(configfile)

print("Configuration file created!")

This code creates a new INI file from scratch. The write() method saves the configuration in the proper INI format with sections and key-value pairs.
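Modifying an existing file follows the same idea: read it, change values with set() or dictionary-style assignment, and write it back. Here is a minimal sketch that works in a temporary directory so it doesn't touch your real files (the settings.ini name is only for illustration):

```python
import configparser
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a small INI file to modify
    path = Path(tmp) / 'settings.ini'
    path.write_text("[server]\nport = 8000\ndebug = false\n")

    config = configparser.ConfigParser()
    config.read(path)

    # Values must be strings when you set them
    config.set('server', 'debug', 'true')
    config['server']['port'] = '9000'

    # Add a new section if it isn't there yet
    if not config.has_section('cache'):
        config.add_section('cache')
    config['cache']['enabled'] = 'true'

    # Write the updated configuration back to the same file
    with open(path, 'w') as f:
        config.write(f)

    updated = path.read_text()

print(updated)
```

Keep in mind that ConfigParser does not preserve comments from the original file when it writes the configuration back.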

Conclusion

When environment variables aren't enough and you need grouped settings for different components, INI files are your answer.

The format is human-readable, ConfigParser provides getint(), getboolean(), and getfloat() for type conversion, and it's built into Python's standard library. Wrap it in a configuration class for validation and clean access patterns.

Also remember:

Organize by component. Use sections to group related settings.
Use type conversion methods. Always use getint(), getboolean(), and getfloat() rather than manual conversion. A manual bool('false') evaluates to True, for example, while getboolean() correctly returns False.
Provide sensible defaults. Use the fallback parameter for optional settings so your application works with minimal configuration.
Validate early. Check that required sections and keys exist at startup before attempting to use them.
Keep secrets separate. Don't commit INI files with passwords to version control. Use .ini.example files with dummy values as templates.
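The "validate early" guideline can be sketched as a startup check. The REQUIRED mapping and the validate_config function below are hypothetical names, not part of configparser:

```python
import configparser

# Hypothetical settings the application cannot run without
REQUIRED = {
    'database': ['host', 'port', 'username'],
    'server': ['host', 'port'],
}

def validate_config(config, required=REQUIRED):
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    for section, keys in required.items():
        if not config.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            if not config.has_option(section, key):
                problems.append(f"missing key '{key}' in [{section}]")
    return problems

# Sample config that lacks a username and the whole [server] section
config = configparser.ConfigParser()
config.read_string("""
[database]
host = localhost
port = 5432
""")

for problem in validate_config(config):
    print(problem)
```

Running the check once at startup surfaces every missing setting together, rather than failing on the first lookup deep inside the application.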

In the next article, you’ll learn how to work with TOML files in Python. Until then, keep coding!
