Chirag Agrawal - freeCodeCamp.org

How to Integrate Vector Search in Columnar Storage

Chirag Agrawal — Wed, 12 Nov 2025 21:43:04 +0000

Integrating vector search into traditional data platforms is becoming a common task in the current AI-driven landscape. When Google announced general availability for vector search in BigQuery in early 2024, it joined a growing list of established databases that have added capabilities for similarity search on high-dimensional embeddings.

But if you examine BigQuery's implementation more closely, you’ll find an approach that goes beyond a simple feature addition. Instead of bolting on a vector library, Google has deeply integrated vector search into its existing distributed, columnar architecture.

In this article, we’ll take a technical deep dive into the engineering decisions behind BigQuery's vector search. We’ll explore how foundational Google technologies like Dremel, Borg, and Colossus, combined with a proprietary columnar format and a novel indexing algorithm, create a highly scalable and efficient platform for AI workloads.

This analysis will give you insights into the architectural trade-offs involved in building vector search at scale. It also demonstrates how you can adapt a system designed for large-scale analytics so that it excels at modern AI tasks.

Table of Contents

Prerequisites
The Unique Challenge of Vector Search
BigQuery's Foundational Distributed Architecture
The Role of Columnar Storage in Vector Operations
- Accelerating Computations with SIMD
The TreeAH Indexing Algorithm
The End-to-End Vector Search Query Flow
Practical Implications for Engineering Teams
Conclusion
Further Reading

Prerequisites

This article assumes that you have a solid foundation in distributed systems and database internals, including familiarity with concepts like columnar storage, query execution plans, and distributed query processing.

You should understand the basics of vector embeddings and similarity search, though we'll briefly review the fundamentals. Experience with at least one vector database or search system (such as pgvector, Pinecone, or Elasticsearch) will help contextualize the architectural comparisons.

While deep knowledge of Google Cloud Platform isn't required, basic familiarity with cloud data warehouses and their typical architectures will be beneficial. The article includes discussions of SIMD operations and CPU-level optimizations, so comfort with low-level performance considerations is helpful, though not mandatory.

Code examples assume working knowledge of SQL, with some sections referencing implementation details in languages like Python or Java. Most importantly, you should have experience building or operating production data systems at scale, as many insights focus on practical engineering trade-offs rather than theoretical concepts.

The Unique Challenge of Vector Search

Vector search fundamentally differs from traditional database operations in ways that challenge our existing infrastructure assumptions. Where conventional queries leverage decades of optimization around exact matching and range scans, vector similarity search requires computing distances between high-dimensional points at massive scale.

Consider the numbers. Modern embedding models produce vectors with 768 or more dimensions. At 4 bytes per float32 value, a single embedding consumes roughly 3KB. A modest corpus of 100 million items translates to 300GB of vector data.

But the real challenge isn't storage. The killer is computation. Finding the nearest neighbors to a query vector means computing distance metrics across all those dimensions. For 100 million vectors, a brute-force search requires 76.8 billion multiply-add operations per query (100 million vectors × 768 dimensions) just for the distance calculations. Even with modern SIMD instructions processing 16 floats at once, you're looking at billions of CPU cycles per search.
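The figures above are easy to check by hand. Here is a minimal sketch of the arithmetic, using only the numbers quoted in the text (768 dimensions, float32 values, 100 million vectors, 16 SIMD lanes); the variable names are ours.

```python
# Back-of-the-envelope sizing for the numbers quoted in the text.
DIMENSIONS = 768            # embedding width
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 100_000_000   # 100 million items
SIMD_LANES = 16             # float32 values per 512-bit register

bytes_per_vector = DIMENSIONS * BYTES_PER_FLOAT32
corpus_bytes = bytes_per_vector * NUM_VECTORS
multiply_adds_per_query = DIMENSIONS * NUM_VECTORS
simd_steps_per_query = multiply_adds_per_query // SIMD_LANES

print(f"bytes per vector:        {bytes_per_vector:,}")
print(f"corpus size (GB):        {corpus_bytes / 1e9:,.1f}")
print(f"multiply-adds per query: {multiply_adds_per_query:,}")
print(f"16-wide SIMD steps:      {simd_steps_per_query:,}")
```

Even after dividing by 16 lanes, a single brute-force query still needs several billion vector steps, which is why the next paragraphs turn to approximate methods.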

This computational reality forces a fundamental compromise: we abandon exact solutions for approximate ones. Approximate Nearest Neighbor (ANN) algorithms trade perfect accuracy for practical query times. They work by partitioning the vector space cleverly, building graphs of nearest neighbors, or using hashing schemes to avoid examining every vector. The engineering challenge becomes balancing query latency, recall accuracy, and resource consumption.

Most purpose-built vector databases address this through specialized in-memory indexes like HNSW or IVF. These work well for single queries but require keeping massive indexes in RAM. In case you are not familiar with these vector indexes, you can read this article.

BigQuery took a different path. Rather than optimizing for single-query latency, they asked what vector search would look like when built for analytical workloads at warehouse scale. The answer required rethinking basic assumptions about index design, storage layout, and query execution.

BigQuery's Foundational Distributed Architecture

BigQuery's vector search runs on the same infrastructure that's been processing SQL queries since 2011. No new cluster type. No specialized vector nodes. Just four core technologies that power most of Google's data processing, now handling a workload they weren't originally designed for.

This isn't the obvious choice. Most vector databases build specialized infrastructure optimized for similarity search. Graph-based indexes need fast random access. In-memory systems require careful memory management. BigQuery took its existing distributed SQL engine and asked: can we make this work for vectors, too?

The answer required leveraging four foundational systems in new ways:

Dremel, the query engine that normally handles SQL, now orchestrates vector similarity computations.
Borg, which allocates resources for everything from Search to YouTube, dynamically assigns thousands of workers to vector queries.
Colossus stores embeddings in the same distributed filesystem that holds petabytes of analytics data.
And Jupiter's datacenter network, built for bulk data processing, now shuttles vector data between computation nodes.

What's surprising isn't that it works, but how well it works. The same architecture that runs aggregate queries over trillion-row tables can search billion-scale vector collections. Understanding how requires examining each component and how they've been adapted for this new workload.

Dremel: The Distributed Query Engine

At its core, BigQuery is powered by Dremel, a distributed query execution engine developed at Google since 2006.

Dremel processes SQL queries using a hierarchical serving tree. A root server receives the query and orchestrates the execution, while mixer nodes break down the work and distribute it to hundreds or thousands of leaf nodes. These leaf nodes perform the actual computations in parallel on segments of the data.

This architecture allows BigQuery to dynamically allocate a massive number of execution threads, known as slots, to a single query, enabling it to process petabytes of data in seconds.

Borg: Cluster Management and Resource Orchestration

The serverless nature of BigQuery is made possible by Borg, Google's cluster management system that predates and inspired Kubernetes.

When a vector search query is submitted, Borg is responsible for finding available machines across Google's global data centers, allocating the precise amount of CPU and memory resources needed for the query's Dremel slots, and managing fault tolerance by automatically rescheduling work if a machine fails. This dynamic resource allocation means users do not need to provision or scale infrastructure, whether they are searching 1,000 vectors or 10 billion.

Colossus: The Distributed Storage Layer

Data in BigQuery is stored in Colossus, Google's next-generation distributed file system. Colossus is designed for exabyte-scale storage, provides high availability through automatic cross-datacenter replication, and is optimized for the high-throughput parallel reads required by Dremel's leaf nodes.

During a vector search, Colossus can deliver data to thousands of nodes simultaneously without creating a storage bottleneck.

Jupiter: The High-Speed Network Fabric

These compute and storage systems are interconnected by Jupiter, Google's internal datacenter network, which provides petabit-per-second bisection bandwidth. The network's design ensures that data can move between Colossus storage and Dremel compute nodes at extremely high speeds, making the data shuffling and aggregation phases of a query efficient.

The Role of Columnar Storage in Vector Operations

Storing vectors in columns sounds wrong. Vectors are arrays. They belong together. Why split them across columnar storage?

BigQuery does it anyway, and it works brilliantly. Here's why.

When you search a million vectors, you need exactly one thing from each row: the embedding. Not the product name, price, or category. Just the vector. Row-oriented storage forces you to read entire records and throw away 90% of the data. Columnar storage reads only what you need.

The performance impact is dramatic. A table with 768-dimensional embeddings plus 20 other columns might total 3TB. Reading just the embedding column? 300GB. That's a 10x reduction in I/O before you've done any actual computation.

But the real magic happens at the CPU level. Columnar storage naturally aligns vector data for SIMD processing. Instead of jumping around memory gathering vector components, the CPU finds them laid out sequentially, ready for bulk operations. Modern processors can load 16 floating-point values into a single register and process them simultaneously.

Compression becomes almost trivial, too. BigQuery's Capacitor format applies techniques like Product Quantization directly to the column data, shrinking vectors from 3KB to under 300 bytes. Try doing that with row-oriented storage where vectors are scattered across pages.

The lesson? Sometimes the "wrong" abstraction at one level enables the right optimizations at another.

Accelerating Computations with SIMD

SIMD instructions are a form of hardware-level parallelism available in modern CPUs that provide significant speedups for vector arithmetic. This is achieved through special instruction sets built into the processor.

For example, AVX-512 (Advanced Vector Extensions 512-bit) is an instruction set found in modern high-performance CPUs, such as those from Intel, that allows a single instruction to operate on 512 bits of data at once.

Since a standard single-precision floating-point number is 32 bits, a CPU with AVX-512 can process 16 floating-point numbers in a single operation. This leads to dramatic performance gains.

The difference between scalar and SIMD processing for vector distance calculations is stark:

Scalar approach: Loop through each dimension, multiply corresponding components, accumulate results. For 768 dimensions, that's 768 multiplications, 768 additions, and terrible cache performance as you jump between two different memory locations for each iteration.
SIMD approach: Load 16 components from each vector into 512-bit registers. Execute a single multiply instruction that handles all 16 pairs. Execute a single vector add that folds the products into a running accumulator register. Repeat 48 times, then sum the accumulator's 16 lanes once at the end. The CPU's pipeline stays full, the cache prefetcher knows exactly what data you need next, and you've turned 1,536 operations into 96 vector instructions.

The columnar storage pays off here, too. Vectors stored contiguously in memory align perfectly with SIMD register loads. No gather operations, no wasted cycles. Just pure throughput.
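To make the operation count concrete, here is a small pure-Python simulation of the two approaches. It is only a sketch: Python does not issue SIMD instructions here, and the function names and the 16-lane accumulator list are our own stand-ins for what a 512-bit register would do in hardware.

```python
def scalar_dot(a, b):
    """One multiply and one add per dimension."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def lanewise_dot(a, b, lanes=16):
    """Mimic a SIMD loop: each step handles `lanes` components at once.

    Returns the dot product and the number of simulated vector
    instructions (one multiply plus one add per step).
    """
    if len(a) != len(b) or len(a) % lanes != 0:
        raise ValueError("vectors must have equal length divisible by lanes")
    accumulators = [0.0] * lanes  # stands in for one 512-bit register
    vector_instructions = 0
    for start in range(0, len(a), lanes):
        # One simulated vector multiply over 16 pairs.
        products = [a[start + i] * b[start + i] for i in range(lanes)]
        # One simulated vector add into the accumulator register.
        accumulators = [acc + p for acc, p in zip(accumulators, products)]
        vector_instructions += 2
    # A single cross-lane sum happens once, after the loop.
    return sum(accumulators), vector_instructions


a = [float(i % 7) for i in range(768)]
b = [float(i % 5) for i in range(768)]
result, instructions = lanewise_dot(a, b)
print(instructions, result == scalar_dot(a, b))
```

For 768 dimensions the loop runs 48 times, so the simulated instruction count matches the 96 quoted above while producing the same dot product as the scalar loop.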

BigQuery's query engine is designed to leverage SIMD extensively. It automatically detects and uses the optimal instruction set available on the underlying hardware (for example, AVX-512 for Intel, NEON for ARM). The columnar storage format ensures that vector data is laid out in memory in a way that is friendly to SIMD registers, and the engine processes query vectors in large batches to maximize the utilization of these parallel instructions.

The TreeAH Indexing Algorithm

While brute-force search can be effective at smaller scales due to BigQuery's massive parallelism, efficient search over billions of vectors requires an index. BigQuery's primary vector index is TreeAH (Tree with Asymmetric Hashing), which is based on Google's open-sourced ScaNN (Scalable Nearest Neighbors) algorithm. TreeAH combines three techniques to achieve high performance and memory efficiency.

1. Hierarchical Tree Structure

The algorithm first partitions the entire vector space into thousands of smaller lists. You can think of this like organizing a massive library. Instead of having one giant room with a million books, a library has floors, sections, and shelves. This hierarchy allows you to find a book without scanning every single one.

Similarly, TreeAH groups semantically similar vectors together into partitions and arranges them in a tree. During a query, the search navigates this tree by comparing the query vector to "centroid" vectors that represent the center of each partition, effectively following a path to the most relevant partitions and pruning away large, irrelevant branches of the search space.
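The pruning idea can be sketched in a few lines. This is not TreeAH itself: it uses a single flat level of partitions rather than a tree, hand-picked centroids rather than learned ones, and hypothetical names throughout. It only illustrates how comparing the query to centroids lets you skip most of the data.

```python
def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def build_partitions(vectors, centroids):
    """Assign every vector to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(vec, centroids[i]),
        )
        partitions[nearest].append(vec_id)
    return partitions


def search(query, vectors, centroids, partitions, top_k=2, partitions_to_probe=1):
    """Scan only the partitions whose centroids are closest to the query."""
    ranked = sorted(
        range(len(centroids)),
        key=lambda i: squared_distance(query, centroids[i]),
    )
    candidates = [vid for i in ranked[:partitions_to_probe] for vid in partitions[i]]
    candidates.sort(key=lambda vid: squared_distance(query, vectors[vid]))
    return candidates[:top_k], len(candidates)


# Toy 2-D data: two obvious clusters (illustrative values only).
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = {
    "a": (0.0, 1.0), "b": (1.0, 0.0), "c": (1.0, 1.0),
    "d": (9.0, 10.0), "e": (10.0, 11.0), "f": (11.0, 9.0),
}
partitions = build_partitions(vectors, centroids)
results, scanned = search((9.5, 10.2), vectors, centroids, partitions)
print(results, scanned)
```

Only three of the six vectors are examined here. Raising `partitions_to_probe` trades speed for recall, which is the same knob the latency/recall discussion in this article keeps returning to.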

2. Product Quantization (PQ)

Within TreeAH, PQ serves a different purpose than just compression. The index doesn't just store smaller vectors – it fundamentally changes how distance calculations work.

TreeAH learns partition-specific codebooks that capture the local structure of vectors in each tree node. This means vectors that end up in the "shoes" partition get quantized differently than those in "electronics." The compression becomes semantic-aware.

When combined with the tree structure, this creates a powerful effect: not only are you searching fewer vectors (thanks to the tree), but you're computing distances faster on the vectors you do search (thanks to PQ).

3. Asymmetric Hashing

The "asymmetric" aspect refers to the fact that the query vector is kept in its full-precision form, while the database vectors are compared in their compressed, quantized form.

The vectors are not of different dimensions, but of different precision. The semantic matching works because the comparison is not direct. The compressed database vector is a code that points to a region in the original vector space. The distance calculation uses the full-precision query vector to look up a pre-computed distance to the center of that region. This way, the rich information in the query vector is used to accurately estimate the distance, avoiding the significant information loss that would occur if both vectors were compressed.
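Here is a minimal sketch of that asymmetric comparison, under simplifying assumptions: tiny hand-written codebooks instead of learned ones, one shared codebook per sub-space rather than partition-specific ones, and function names of our own choosing. The database vector is reduced to a list of codeword indexes, while the query stays in full precision and is only used to fill a lookup table.

```python
def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codeword."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_dim:(m + 1) * sub_dim]
        codes.append(
            min(range(len(codebook)), key=lambda k: squared_distance(sub, codebook[k]))
        )
    return codes


def build_lookup_table(query, codebooks):
    """Distances from each full-precision query sub-vector to every codeword."""
    sub_dim = len(query) // len(codebooks)
    return [
        [squared_distance(query[m * sub_dim:(m + 1) * sub_dim], codeword)
         for codeword in codebook]
        for m, codebook in enumerate(codebooks)
    ]


def asymmetric_distance(codes, table):
    """Estimate the distance with one table lookup per sub-vector."""
    return sum(table[m][code] for m, code in enumerate(codes))


# Two sub-spaces of two dimensions each, two codewords per sub-space.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (2.0, 2.0)],
]
database_vector = [0.9, 1.2, 0.1, -0.1]
codes = encode(database_vector, codebooks)   # compressed form stored in the index

query = [1.0, 1.0, 0.0, 1.0]                 # never compressed
table = build_lookup_table(query, codebooks)
print(codes, asymmetric_distance(codes, table))
```

The lookup table is built once per query and reused for every database vector, so each comparison costs a handful of additions instead of a full-width distance computation.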

Architectural Comparison: TreeAH vs. HNSW

To better understand the design philosophy behind TreeAH, it’s useful to compare it with HNSW (Hierarchical Navigable Small World), a popular graph-based algorithm used in many dedicated vector databases.

HNSW constructs a multi-layered graph where vectors are nodes and edges connect them to their nearest neighbors. It’s known for excellent single-query latency.

But this performance comes with significant memory overhead, as the graph structure must be stored in addition to the full-precision vectors. HNSW index builds can also be time-consuming, and frequent data updates can lead to memory fragmentation and performance degradation.

TreeAH, in contrast, makes different architectural trade-offs that align with BigQuery's nature as a distributed analytics system.

The comparison reveals a fundamental design choice: TreeAH prioritizes batch throughput, memory efficiency, and scalability over absolute single-query latency. This makes it well-suited for analytical workloads where thousands of searches are performed simultaneously.

The End-to-End Vector Search Query Flow

The execution timeline of a BigQuery vector search demonstrates how parallel processing eliminates traditional bottlenecks. When a VECTOR_SEARCH query arrives, the system initiates multiple operations concurrently rather than executing them sequentially.

The root server begins query planning immediately upon receiving the request. In parallel, Borg starts allocating compute slots across the cluster, targeting 1,000 slots distributed across 50 or more nodes. Borg prioritizes slots that are physically close to the data in Colossus to minimize data movement costs. This allocation typically completes within 10 milliseconds.

Query planning and resource allocation overlap significantly. The mixer nodes receive partial execution plans and begin partitioning the search space before Borg completes all slot allocations. When TreeAH indexes are available, mixers use them to assign specific vector partitions to leaf nodes. This streaming approach ensures that leaf nodes receive work assignments as soon as they come online.

The parallel execution phase showcases the architecture's efficiency. Hundreds or thousands of leaf nodes simultaneously read their assigned vector partitions from Colossus. Jupiter's high-bandwidth network prevents I/O congestion even with thousands of concurrent reads. Each leaf node operates independently: loading compressed vectors, executing SIMD operations for distance calculations, and maintaining local top-k results.

Aggregation begins before all leaf nodes complete their local searches. Mixers implement a streaming merge algorithm that processes results as they arrive. This approach means that by the time the slowest leaf node reports its results, the mixers have already processed most of the data. The final global top-k emerges from this continuous merging process.
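A streaming top-k merge like the one described can be sketched with a bounded heap. This is our own illustration, not Dremel's implementation: the function name is hypothetical, and the per-leaf results are modelled as lists of `(distance, item_id)` pairs consumed one batch at a time.

```python
import heapq


def merge_top_k(partial_results, k):
    """Fold per-leaf (distance, item_id) batches into one global top-k.

    A bounded max-heap (distances negated) keeps at most k entries, so
    each batch can be merged as soon as it arrives.
    """
    heap = []  # holds (-distance, item_id); the worst kept match sits at heap[0]
    for batch in partial_results:
        for distance, item_id in batch:
            if len(heap) < k:
                heapq.heappush(heap, (-distance, item_id))
            elif -distance > heap[0][0]:
                # Closer than the current worst match: swap it in.
                heapq.heapreplace(heap, (-distance, item_id))
    return sorted((-neg, item_id) for neg, item_id in heap)


def leaf_batches():
    """Yield local top-k lists in whatever order the leaves finish."""
    yield [(0.12, "p1"), (0.40, "p7")]
    yield [(0.05, "p3"), (0.33, "p9")]
    yield [(0.20, "p4")]


global_top = merge_top_k(leaf_batches(), k=3)
print(global_top)
```

Because the heap never holds more than `k` entries, the merger's memory use stays constant no matter how many leaf nodes report in.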

The measured 40-millisecond execution time represents the longest path through the parallel execution graph, not the sum of individual operations. Most operations complete much faster, but the overall latency is bounded by the slowest component. This design trades single-query latency for massive throughput, enabling BigQuery to process thousands of vector searches concurrently across billions of vectors.

Practical Implications for Engineering Teams

The architectural choices behind BigQuery's vector search create specific trade-offs that engineering teams need to understand before committing to this approach.

1. Query Latency vs. Throughput

BigQuery vector searches typically complete in 1-10 seconds, not the sub-100ms latency of specialized vector databases. But you can run thousands of searches concurrently without degradation. This makes BigQuery ideal for batch recommendation generation, similarity analysis across product catalogs, or embedding-based data enrichment pipelines. It's the wrong choice for autocomplete features or real-time personalization that requires immediate responses.

2. Cost Model Considerations

Under on-demand pricing, BigQuery charges for data scanned, not query execution time. A vector search that scans 1TB costs the same whether it completes in 2 seconds or 20 seconds. This model favors workloads where you search large datasets infrequently rather than small datasets continuously. Running vector search on a 10GB table thousands of times per day can end up more expensive than a dedicated vector database with fixed infrastructure costs.
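A rough way to reason about this is to total the bytes scanned per day. The sketch below assumes every query scans the whole table (an index may reduce that) and takes the price per terabyte as an input; the value used here is a placeholder for illustration, not a quoted BigQuery rate.

```python
def daily_scan_tb(table_gb, queries_per_day):
    """Terabytes scanned per day if every query reads the whole table."""
    return table_gb * queries_per_day / 1000  # decimal units: 1 TB = 1000 GB


def daily_scan_cost(table_gb, queries_per_day, price_per_tb):
    """Scan-based cost; `price_per_tb` must come from your own pricing."""
    return daily_scan_tb(table_gb, queries_per_day) * price_per_tb


HYPOTHETICAL_PRICE_PER_TB = 5.0  # placeholder value, not a real rate

small_and_frequent = daily_scan_tb(table_gb=10, queries_per_day=1000)
large_and_rare = daily_scan_tb(table_gb=1000, queries_per_day=2)
print(small_and_frequent, large_and_rare)
print(daily_scan_cost(10, 1000, HYPOTHETICAL_PRICE_PER_TB))
```

The small table queried a thousand times a day scans five times more data than the 1TB table queried twice, which is the asymmetry the paragraph above describes.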

3. Index Management Trade-offs

TreeAH indexes update automatically in the background when new data arrives, typically within 5-15 minutes. You cannot force immediate index updates, and you have far less control over index parameters than you do with self-managed HNSW or IVF indexes. This simplicity reduces operational overhead but limits optimization options. If your use case requires fine-tuning recall/latency trade-offs or immediate consistency after updates, you'll need a different solution.

4. Integration Benefits That Actually Matter

The ability to JOIN vector search results with business data in a single query is more powerful than it initially appears. Consider this query pattern:

WITH semantic_matches AS (
  -- VECTOR_SEARCH returns each matched row as a struct named base
  SELECT base.item_id AS item_id, distance
  FROM VECTOR_SEARCH(
    TABLE products,
    'embedding',
    (SELECT embedding FROM queries WHERE query_id = @query_id),
    top_k => 100  -- over-fetch so the filters below can still leave 20 rows
  )
)
SELECT p.*, s.distance
FROM semantic_matches s
JOIN products p USING (item_id)
WHERE p.in_stock = TRUE
  AND p.price BETWEEN 50 AND 200
  AND p.category_restrictions IS NULL
ORDER BY s.distance
LIMIT 20

This combines semantic search with business logic, inventory status, and access controls in one atomic operation. Implementing this with a separate vector database requires complex synchronization between systems.

Conclusion

BigQuery's vector search implementation challenges our assumptions about what a data warehouse can do. Instead of building another specialized vector database, Google pushed their existing infrastructure to handle a fundamentally different workload.

The key insight is recognizing that vector search at scale is a data processing problem. And processing data at scale is what BigQuery was built for.

By leveraging its columnar architecture and hardware-aware algorithms like TreeAH, BigQuery makes a deliberate trade-off. It exchanges the sub-100ms latency of in-memory systems for massive batch throughput and incredible resource efficiency. An index that uses 10x less memory than HNSW is a trade-off many teams building analytical AI systems would gladly make.

The real power emerges when vectors live alongside business data. Complex queries that would require multiple systems and synchronization nightmares become simple SQL. "Find similar products, but only from reliable suppliers, in stock locally, with no recent quality issues." One query, one system, no architectural gymnastics.

This approach validates a broader trend: vector capabilities are becoming table stakes for data platforms. The question isn't whether your data platform will support vectors, but how well it integrates them into existing workflows.

For teams building analytical AI applications, BigQuery offers a pragmatic path. It won't win latency benchmarks against dedicated vector databases. But for batch processing, integrated analytics, and operational simplicity at scale, it demonstrates that sometimes the best vector database isn't a vector database at all. It's your data warehouse, evolved.

How to Persist State in Time-Series Models with Docker and Redis

Chirag Agrawal — Thu, 09 Oct 2025 01:18:59 +0000

Have you ever built a brilliant time-series model, one that could forecast sales or predict stock prices, only to watch it fail in the real world? Well, this is a common frustration. Your model works perfectly on your machine, but the moment you deploy it in a Docker container, it seems to develop amnesia. It forgets everything it knew yesterday, making its predictions for tomorrow useless.

Don’t worry. This likely isn't a flaw in your model. It's a clash between how time-series models and Docker containers are designed to work.

Time-series models are all about memory. They need to remember the past to predict the future. But Docker containers are built to be stateless and forgetful, wiping their memory clean with every restart. This fundamental conflict can turn a powerful model into a worthless one in production.

In this article, we’ll solve that problem. We're going to give your time-series model a permanent memory. You'll learn how to build a production-ready prediction service that uses Redis as an external brain and Docker volumes to ensure that memory survives any restart. We'll walk through a hands-on example, step-by-step, so you can learn how to build a system that is both intelligent and incredibly reliable.

Who is This Guide For?

To get the most out of this tutorial, it’ll be helpful to have a few things under your belt. We’ll be diving into some code and command-line work, so a little preparation will go a long way.

The main tools for this project are Docker and Docker Compose. Make sure you have them installed and running on your computer.
You’ll also find it easier to follow along if you’re comfortable with the basics of Docker, Python, and the Flask web framework. A bit of command-line experience will also be handy for running the commands in the tutorial.
But don't worry if you've never used Redis before. All you need to know is that it’s a fast, in-memory database. We’ll handle the rest along the way.

Think of this as a guided tour. As long as you're curious and have the basic tools ready, you'll be in great shape.

Understanding the Problem

Before jumping into solutions, let's first clarify what a time-series model is and then explore why containerizing it is so tricky.

So, what is a time-series model?

Simply put, a time-series model is a type of model that analyzes data points collected over time to predict future values. Think of it like predicting the weather. A meteorologist doesn't just look at the sky right now. They look at the temperature, pressure, and wind patterns from the last few hours and days to forecast what will happen tomorrow.

Time-series models do the same thing with data, whether it's website traffic, stock prices, or energy consumption. The key takeaway is that history matters. The sequence of past events provides the context needed to make an intelligent prediction about the future.

Now, here’s what breaks when you put these models in Docker.

1. Containers are ephemeral by design

Docker containers are meant to be stateless. This works great for most APIs. A user profile endpoint? Stateless. A sentiment analysis model? Stateless. They take an input, return an output, and forget everything in between.

Time-series models don't work this way. They need context from previous predictions. Without it, your model is essentially blind.

2. Lost context between predictions

Each prediction happens in isolation. Your model receives a single data point and makes a guess without knowing what came before. This defeats the entire purpose of time-series modeling.

You may think: "I'll just load all historical data on every request." But that approach fails for two reasons:

It's slow. Really slow if you have thousands of data points.
It doesn't scale. When you have multiple series or high request volume, you'll hit performance walls fast.

3. Model amnesia on restart

Every time you deploy a new version or the container crashes, all accumulated state disappears. Your model starts from scratch. In production, this is unacceptable.

The Solution: External State Store

Instead of keeping state inside the container, we’ll move it outside. Redis becomes the model's memory.

The pattern looks like this:

Client Request → Flask API → Redis → Prediction with Context

Your container stays stateless and replaceable. But the system as a whole maintains state through Redis.
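Before wiring up Redis, it can help to see the pattern in isolation. The sketch below uses an in-memory class as a stand-in for Redis and a deliberately naive forecast (repeat the last observed step); both are assumptions for illustration, and none of these names come from the demo repository.

```python
class ExternalStore:
    """In-memory stand-in for Redis; it outlives any single handler."""

    def __init__(self):
        self._series = {}

    def append(self, series_id, values):
        self._series.setdefault(series_id, []).extend(values)

    def history(self, series_id):
        return list(self._series.get(series_id, []))


class PredictionHandler:
    """Holds no series data itself, so it can be replaced at any time."""

    def __init__(self, store):
        self.store = store

    def predict(self, series_id, new_values):
        self.store.append(series_id, new_values)
        values = self.store.history(series_id)
        if not values:
            raise ValueError("no data available for this series")
        # Naive forecast (an assumption): extend the last observed step.
        step = values[-1] - values[-2] if len(values) > 1 else 0
        return {"prediction": values[-1] + step, "data_points_used": len(values)}


store = ExternalStore()
first = PredictionHandler(store).predict("demo", [10, 20, 30])
# Simulate a container restart: a brand-new handler, same external store.
second = PredictionHandler(store).predict("demo", [40])
print(first, second)
```

The second handler has never seen the first three points, yet its prediction uses all four because the history lives in the store rather than in the handler.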

Hands-On Implementation

Let's build this. Clone the demo repository:

git clone https://github.com/ag-chirag/docker-redis-time-series
cd docker-redis-time-series

Start with the broken approach

The docker-compose.initial.yml file shows what NOT to do:

services:
  api:
    build: ./flask-api
    ports:
      - "5000:5000"

  redis:
    image: redis:alpine

Notice what's missing? No volumes. Redis stores data in the container's filesystem, which means that data is temporary.

Run it:

docker compose -f docker-compose.initial.yml up

Make a few predictions:

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "series_id": "demo",
    "historical_data": [
      {"timestamp": "2024-01-01T12:00:00", "value": 10},
      {"timestamp": "2024-01-01T12:01:00", "value": 20},
      {"timestamp": "2024-01-01T12:02:00", "value": 30}
    ]
  }'

You'll get a response showing Redis is working:

{
  "data_points_used": 3,
  "prediction": 40,
  "redis_connected": true
}

Now restart the services:

docker compose -f docker-compose.initial.yml down
docker compose -f docker-compose.initial.yml up

Make another prediction. Check the data_points_used field. It reset. All your historical data is gone. This is exactly what we're trying to avoid.

How to fix it with volumes

The correct docker-compose.yml adds persistence:

services:
  api:
    build: ./flask-api
    ports:
      - "5000:5000"
    environment:
      - REDIS_HOST=redis

  redis:
    image: redis:alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data

volumes:
  redis_data:

So, what is a volume and how does it work?

Think of a Docker volume as a dedicated external hard drive for your container. By default, when a container writes data, it does so to a temporary layer that gets destroyed when the container is removed. A volume provides a way to save that data permanently.

Here’s how it works:

Docker creates and manages a special storage area on the host machine, completely separate from any container's filesystem. In our docker-compose.yml, the volumes: redis_data: section at the bottom tells Docker to create a named volume called redis_data.
When the Redis container starts, the volumes: - redis_data:/data line tells Docker to "plug in" this external hard drive. It connects the redis_data volume to the /data directory inside the container.
Now, whenever the Redis process inside the container writes data to its /data directory (which we've configured it to do), it's actually writing to the redis_data volume on the host machine.
When you run docker compose down, the Redis container is destroyed, but the redis_data volume is untouched. It's like unplugging the external hard drive, and the data is still safe. The next time you run docker compose up, a brand new Redis container is created, the volume is re-attached, and Redis finds all its old data right where it left it.

This mechanism is the key to giving our stateful service a memory that survives restarts.

Run the corrected version:

docker compose up --build

Send several predictions to build up state:

for i in {1..5}; do
  curl -X POST http://localhost:5000/predict \
    -H "Content-Type: application/json" \
    -d "{
      \"series_id\": \"demo\",
      \"historical_data\": [{\"timestamp\": \"2024-01-01T12:0$i:00\", \"value\": $((i*10))}]
    }"
done

Now comes the test. Restart everything:

docker compose down
docker compose up

Make another prediction. Look at data_points_used. It includes all previous points. The model picks up exactly where it left off.

This works because the volume exists independently of the container lifecycle.

How the code handles state

The Flask API in flask-api/app.py stores each data point in Redis using sorted sets:

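from datetime import datetime  # json and redis_client are set up elsewhere in app.py

def store_data_point(series_id, timestamp, value):
    key = f"ts:{series_id}"
    # Sorted-set scores must be numeric, so convert the ISO-8601 timestamp
    # (the format used in the curl examples) to epoch seconds.
    score = datetime.fromisoformat(timestamp).timestamp()
    redis_client.zadd(key, {json.dumps({"ts": timestamp, "val": value}): score})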

When making predictions, it retrieves recent history:

def get_recent_data(series_id, limit=100):
    key = f"ts:{series_id}"
    # Negative indexes count from the end: fetch the newest `limit` members, oldest first.
    data = redis_client.zrange(key, -limit, -1)
    return [json.loads(d) for d in data]

Redis sorted sets give you automatic time ordering. The volume ensures this data survives restarts.
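To see why sorted sets give you time ordering for free, here is a small in-memory imitation of the two calls used above. It is not Redis and not the repository's code: the class and helper names are ours, and it assumes each member is scored by its timestamp converted to epoch seconds.

```python
import bisect
import json
from datetime import datetime


class SortedSetStandIn:
    """In-memory imitation of the two sorted-set calls used above (not Redis itself)."""

    def __init__(self):
        self._entries = []  # always kept sorted as (score, member)

    def zadd(self, mapping):
        for member, score in mapping.items():
            # Re-adding a member replaces its old score rather than duplicating it.
            self._entries = [e for e in self._entries if e[1] != member]
            bisect.insort(self._entries, (float(score), member))

    def zrange(self, start, end):
        members = [member for _, member in self._entries]
        n = len(members)
        if start < 0:
            start = max(n + start, 0)
        if end < 0:
            end = n + end
        if end < 0:
            return []
        return members[start:end + 1]


def score_for(timestamp):
    """Numeric score derived from an ISO-8601 timestamp."""
    return datetime.fromisoformat(timestamp).timestamp()


series = SortedSetStandIn()
arrivals = [  # deliberately out of order
    ("2024-01-01T12:02:00", 30),
    ("2024-01-01T12:00:00", 10),
    ("2024-01-01T12:01:00", 20),
]
for ts, value in arrivals:
    series.zadd({json.dumps({"ts": ts, "val": value}): score_for(ts)})

recent = [json.loads(m)["val"] for m in series.zrange(-2, -1)]
print(recent)
```

Points that arrive late still land in the right position, so reading the last N members always returns the most recent history in chronological order.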

Test the health endpoint

Check that everything is connected properly:

curl http://localhost:5000/health

You should see:

{
  "model_loaded": true,
  "redis_connected": true,
  "status": "healthy"
}

If redis_connected is false, check your Docker logs. Common issues are network configuration or Redis not starting properly.

What About Scaling?

This setup works well for single-instance deployments. When traffic increases, you have a few options.

Horizontal scaling with Redis Cluster

For high throughput, distribute your data across multiple Redis nodes. Redis Cluster handles sharding automatically.

High availability with Redis Sentinel

Add failover capability so your state store doesn't become a single point of failure. Sentinel monitors Redis instances and promotes replicas when the primary fails.

Use managed Redis services

AWS ElastiCache, Azure Cache for Redis, or Google Cloud Memorystore handle the operational burden. You focus on your model, they handle Redis reliability.

The key insight: your API containers remain stateless. You scale the state store independently.

Common Pitfalls to Avoid

I can't emphasize this enough: test your persistence before deploying to production.

Don't assume volumes work

Actually restart your containers and verify state persists. I've seen deployments fail because someone forgot to mount the volume in production.

Don't ignore Redis memory limits

Redis keeps everything in memory. Monitor your memory usage. Set maxmemory policies appropriate for your workload. If you run out of memory, Redis will start evicting keys or refuse writes.
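One simple way to keep memory bounded is to cap how much history each series retains. The helper below is a sketch of that idea and is not part of the demo repository: `trim_series` and `RecordingClient` are hypothetical names, and the only Redis call used is redis-py's `zremrangebyrank`. The recording client exists so the example runs without a Redis server.

```python
def trim_series(client, series_id, max_points):
    """Keep only the newest `max_points` entries of a series' sorted set.

    `client` is expected to be a redis-py client (or anything exposing
    the same `zremrangebyrank(name, min, max)` method).
    """
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    key = f"ts:{series_id}"
    # Ranks run from oldest (0) to newest (-1), so this drops everything
    # except the last `max_points` members.
    return client.zremrangebyrank(key, 0, -(max_points + 1))


class RecordingClient:
    """Test double that records the call instead of talking to Redis."""

    def __init__(self):
        self.calls = []

    def zremrangebyrank(self, name, min, max):
        self.calls.append((name, min, max))
        return 0


client = RecordingClient()
trim_series(client, "demo", max_points=1000)
print(client.calls)
```

Calling a helper like this after each write keeps every series at a fixed size, which makes total memory use a function of the number of series rather than of how long the service has been running.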

Don't skip monitoring

Add health checks. Monitor Redis connection status. Track prediction latency. You want to know when things break, not learn about it from angry users.

Conclusion

Time-series models need memory. Docker containers lose memory by default. The solution is simple: separate state from compute.

Use Redis as an external state store. Use Docker volumes to persist that state. Your model stays smart, your containers stay replaceable, and your deployments become reliable.

The full working code is available at github.com/ag-chirag/docker-redis-time-series. Clone it, run it, break it, learn from it.

And remember: the simplest solution that works is usually the right one. You don't always need Kubernetes and StatefulSets. Sometimes Docker Compose and a volume are enough.
