indexing - freeCodeCamp.org

How Database Indexes Work – A Practical Guide with PostgreSQL Examples

iyiola — Thu, 16 Apr 2026 17:27:44 +0000

Every developer eventually runs into a slow query. The table has grown from a few hundred rows to a few million, and what used to take milliseconds now takes seconds — or worse.

The fix, more often than not, is an index.

A database index is a data structure that helps the database find rows faster without scanning the entire table. It works a lot like the index at the back of a textbook: instead of reading every page to find a topic, you look it up in the index, get the page number, and go straight there.

In this tutorial, you'll learn how indexes work under the hood, how to create and use them effectively in PostgreSQL, and how to avoid the common mistakes that make indexes useless or even harmful.

Prerequisites
Why Do You Need Indexes?
How Indexes Work Under the Hood
How to Create Your First Index
How to Use EXPLAIN ANALYZE to Measure Performance
Types of Indexes in PostgreSQL
How to Create a Composite Index
How to Create a Partial Index
How to Create an Expression Index
How to Create a Unique Index
How to Manage Indexes
When Indexes Hurt Instead of Help
Common Mistakes That Prevent Index Usage
Best Practices for Indexing
Conclusion

Prerequisites

To follow along with the examples, you'll need:

Basic knowledge of SQL (SELECT, INSERT, UPDATE, DELETE, WHERE, JOIN)
A running PostgreSQL instance (version 12 or later)
A SQL client like psql, pgAdmin, or DBeaver

If you don't have PostgreSQL installed locally, you can use a free cloud-hosted instance from services like Neon or Supabase.

Why Do You Need Indexes?

When you run a query like SELECT * FROM users WHERE email = 'jane@example.com', the database needs to find the matching row. Without an index, PostgreSQL performs a sequential scan — it reads every single row in the table and checks whether the email column matches.

For a table with 100 rows, this is fine. For a table with 10 million rows, it's painfully slow.

An index solves this by creating a separate, sorted data structure that maps column values to their row locations. Instead of scanning 10 million rows, PostgreSQL can look up the value in the index and jump directly to the matching row. This can reduce query time from seconds to milliseconds.

But indexes aren't free. They come with trade-offs you need to understand before adding them everywhere. You'll learn about those trade-offs throughout this tutorial.

How Indexes Work Under the Hood

PostgreSQL's default index type is the B-tree (balanced tree). Understanding how a B-tree works will help you make smarter decisions about when and how to index.

A B-tree organizes data into a sorted, hierarchical structure made up of three kinds of nodes:

Root node — the top of the tree. It holds a few values that divide the data into broad ranges.
Internal nodes — each one further narrows down the range.
Leaf nodes — the bottom level. These hold the actual indexed values along with pointers to the corresponding rows in the table.

When PostgreSQL uses a B-tree index to find a value, it starts at the root and follows the path that matches the target value, moving through internal nodes until it reaches the correct leaf node. This path is called a tree traversal, and it typically requires only 3–4 steps even for tables with millions of rows.

Think of it like a phone book. You don't start at page one and read every name. You open to roughly the right section (root), narrow it down to the right page (internal nodes), and scan the entries on that page (leaf node).

This sorted structure is also why B-tree indexes work well for range queries like WHERE price > 50 AND price < 100. The database finds the starting point in the tree and then scans forward through the leaf nodes, which are already in order.
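To make the idea of "find the starting point, then scan forward" concrete, here is a minimal JavaScript sketch. It is not how PostgreSQL implements a B-tree: a single sorted array stands in for the leaf level, and the names buildIndex, lowerBound, and rangeLookup are made up for illustration. The range here includes the lower bound and excludes the upper one.

```javascript
// Simplified stand-in for an index: [value, rowId] pairs kept sorted by value.
const buildIndex = (rows, column) =>
  rows
    .map((row, rowId) => [row[column], rowId])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

// Binary search: returns the first position whose value is >= target.
const lowerBound = (index, target) => {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
};

// Range lookup: find the start, then walk forward while values stay below max.
const rangeLookup = (index, min, max) => {
  const rowIds = [];
  for (let i = lowerBound(index, min); i < index.length && index[i][0] < max; i++) {
    rowIds.push(index[i][1]);
  }
  return rowIds;
};

// Hypothetical table: the array position plays the role of the row location.
const products = [{ price: 120 }, { price: 60 }, { price: 45 }, { price: 99 }, { price: 75 }];
const priceIndex = buildIndex(products, 'price');

console.log(rangeLookup(priceIndex, 50, 100)); // [ 1, 4, 3 ] (prices 60, 75, 99)
```

The lookup never touches the rows priced 45 or 120 after the binary search has located the start, which is the property that makes sorted indexes efficient for range predicates.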

How to Create Your First Index

Let's build a practical example. You'll create a table, load it with data, and see the difference an index makes.

Step 1 – Create the Table and Insert Sample Data

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    city VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW()
);

Now insert a large number of rows so the performance difference is visible. This generates 500,000 rows of sample data:

INSERT INTO customers (first_name, last_name, email, city)
SELECT
    'User' || gs,
    'Last' || gs,
    'user' || gs || '@example.com',
    (ARRAY['Lagos', 'London', 'New York', 'Berlin', 'Tokyo'])[1 + (gs % 5)]
FROM generate_series(1, 500000) AS gs;

Step 2 – Query Without an Index

EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'user250000@example.com';

You'll see output similar to this:

Seq Scan on customers  (cost=0.00..11374.00 rows=1 width=52) (actual time=45.123..91.456 rows=1 loops=1)
  Filter: ((email)::text = 'user250000@example.com'::text)
  Rows Removed by Filter: 499999
Planning Time: 0.085 ms
Execution Time: 91.502 ms

The key detail here is Seq Scan — PostgreSQL scanned all 500,000 rows to find a single match. It filtered out 499,999 rows. That's a lot of wasted work.

Step 3 – Create an Index

CREATE INDEX idx_customers_email ON customers (email);

This creates a B-tree index on the email column. The name idx_customers_email follows a common naming convention: idx_ prefix, then the table name, then the column name.

Step 4 – Query With the Index

Run the same query again:

EXPLAIN ANALYZE
SELECT * FROM customers WHERE email = 'user250000@example.com';

Now you'll see something like this:

Index Scan using idx_customers_email on customers  (cost=0.42..8.44 rows=1 width=52) (actual time=0.034..0.036 rows=1 loops=1)
  Index Cond: ((email)::text = 'user250000@example.com'::text)
Planning Time: 0.112 ms
Execution Time: 0.058 ms

The scan type changed from Seq Scan to Index Scan. The execution time dropped from ~91ms to ~0.06ms. That's roughly a 1,500x improvement — from one line of SQL.

How to Use EXPLAIN ANALYZE to Measure Performance

EXPLAIN ANALYZE is your most important tool for understanding how PostgreSQL executes a query. You already saw it in the previous section, but let's break down what the output means.

EXPLAIN ANALYZE SELECT * FROM customers WHERE city = 'Lagos';

The output will tell you several things:

Scan type — whether PostgreSQL used a sequential scan, index scan, bitmap index scan, or another access method
Cost — the estimated cost in arbitrary units. The first number is the startup cost, the second is the total cost
Rows — how many rows PostgreSQL estimated it would find versus how many it actually found
Actual time — the real time in milliseconds to execute the query
Rows Removed by Filter — how many rows were scanned but didn't match the condition

If you see Seq Scan on a large table with a selective WHERE clause, that's usually a sign you need an index. If you see Index Scan or Index Only Scan, your index is working.

One thing to keep in mind: EXPLAIN without ANALYZE shows the plan without actually running the query. EXPLAIN ANALYZE runs the query and shows real timing data. Always use EXPLAIN ANALYZE when you're investigating performance, but be careful with it on destructive queries — EXPLAIN ANALYZE DELETE FROM ... will actually delete the rows. Wrap those in a transaction and roll back:

BEGIN;
EXPLAIN ANALYZE DELETE FROM customers WHERE city = 'Berlin';
ROLLBACK;

Types of Indexes in PostgreSQL

PostgreSQL supports several index types, each optimized for different query patterns.

B-tree (Default)

B-tree is the default index type and covers the vast majority of use cases. It supports equality checks (=), range queries (<, >, <=, >=, BETWEEN), sorting (ORDER BY), and IS NULL / IS NOT NULL checks.

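-- These two statements are equivalent because B-tree is the default.
-- Run only one of them: both use the name idx_name, so the second would fail.
CREATE INDEX idx_name ON customers (last_name);
CREATE INDEX idx_name ON customers USING btree (last_name);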

Use B-tree when you don't have a specific reason to use something else.

Hash

Hash indexes are optimized purely for equality comparisons (=). They don't support range queries or sorting. In practice, B-tree handles equality checks almost as fast, so hash indexes are rarely necessary.

CREATE INDEX idx_email_hash ON customers USING hash (email);

Consider a hash index only if you have a very large table with frequent equality-only lookups and want to save a small amount of index space.

GIN (Generalized Inverted Index)

GIN indexes are designed for values that contain multiple elements — like arrays, JSONB documents, or full-text search vectors. Instead of indexing a single value per row, GIN indexes every element within the value.

-- Add a JSONB column
ALTER TABLE customers ADD COLUMN preferences JSONB DEFAULT '{}';

-- Index the JSONB column
CREATE INDEX idx_preferences ON customers USING gin (preferences);

-- Now this query uses the GIN index
SELECT * FROM customers WHERE preferences @> '{"newsletter": true}';

Use GIN when you're querying inside JSONB data, searching arrays with @> or &&, or doing full-text search with tsvector.

GiST (Generalized Search Tree)

GiST indexes support geometric data, ranges, and full-text search. They're commonly used with PostGIS for geospatial queries.

-- Range type example
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    duration TSRANGE
);

CREATE INDEX idx_event_duration ON events USING gist (duration);

-- Find overlapping events
SELECT * FROM events WHERE duration && '[2025-01-01, 2025-01-31]'::tsrange;

Use GiST when you're working with spatial data, range types, or need overlap/containment operators.

BRIN (Block Range Index)

BRIN indexes are extremely small and work well on large tables where the physical row order correlates with the indexed column's value. A common example is a timestamp column on an append-only table where new rows always have later timestamps.

CREATE INDEX idx_created_at_brin ON customers USING brin (created_at);

BRIN stores summary information (min/max values) for each block of rows rather than indexing every row individually. This makes the index much smaller than a B-tree, but it only works well when the data is naturally ordered.

Use BRIN for very large, append-only tables with naturally ordered data — like logs, events, or time-series data.

How to Create a Composite Index

A composite index (also called a multi-column index) covers more than one column. It's useful when your queries frequently filter or sort by multiple columns together.

CREATE INDEX idx_city_lastname ON customers (city, last_name);

The order of columns in a composite index matters. PostgreSQL can use this index for queries that filter on city alone, or on both city and last_name. But it can't efficiently use this index for queries that filter only on last_name.

Think of it like a phone book sorted by city first, then by last name within each city. You can easily look up everyone in Lagos. You can also look up everyone named "Adeyemi" in Lagos. But finding all people named "Adeyemi" across all cities requires scanning the whole book.

This principle is called the leftmost prefix rule: PostgreSQL can use a composite index for queries that include the leftmost column(s) of the index, but not for queries that skip them.

-- ✅ Uses the index (matches leftmost column)
SELECT * FROM customers WHERE city = 'Lagos';

-- ✅ Uses the index (matches both columns, left to right)
SELECT * FROM customers WHERE city = 'Lagos' AND last_name = 'Adeyemi';

-- ❌ Cannot use this index efficiently (skips the leftmost column)
SELECT * FROM customers WHERE last_name = 'Adeyemi';

When deciding column order, place the most selective column first — the one that narrows down the results the most.
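The leftmost prefix rule follows directly from how a composite index is sorted. The JavaScript sketch below is an illustration only, with hypothetical sample rows and helper names. It sorts entries by city and then last name, and checks whether the matching entries form one unbroken run that a single range scan could cover.

```javascript
// Hypothetical sample rows.
const people = [
  { city: 'London', last_name: 'Adeyemi' },
  { city: 'Lagos', last_name: 'Okafor' },
  { city: 'Tokyo', last_name: 'Adeyemi' },
  { city: 'Lagos', last_name: 'Adeyemi' },
  { city: 'Berlin', last_name: 'Schmidt' },
];

const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Composite "index": entries sorted by city first, then by last_name.
const compositeIndex = people
  .map((p) => [p.city, p.last_name])
  .sort((a, b) => compare(a[0], b[0]) || compare(a[1], b[1]));

// Positions in the sorted index where a predicate matches.
const positions = (predicate) =>
  compositeIndex.flatMap((entry, i) => (predicate(entry) ? [i] : []));

// True when the positions form one unbroken run (a single range to scan).
const isContiguous = (pos) => pos.every((p, i) => i === 0 || p === pos[i - 1] + 1);

const byCity = positions(([city]) => city === 'Lagos');
const byLastName = positions(([, lastName]) => lastName === 'Adeyemi');

console.log(byCity, isContiguous(byCity));         // [ 1, 2 ] true
console.log(byLastName, isContiguous(byLastName)); // [ 1, 3, 4 ] false
```

Matches on the leading column sit next to each other, so they can be read as one range. Matches on the second column alone are scattered across the whole structure.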

How to Create a Partial Index

A partial index covers only a subset of rows in a table. You define the subset with a WHERE clause in the index definition.

This is useful when you only query a specific portion of the data. For example, if you have an orders table and you frequently query for pending orders but rarely look at completed ones:

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total NUMERIC(10, 2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Only index rows where status is 'pending'
CREATE INDEX idx_orders_pending ON orders (customer_id)
WHERE status = 'pending';

This index is smaller than a full index because it skips all rows that don't match the WHERE condition. Smaller indexes use less disk space, consume less memory, and are faster to maintain during writes.

For the index to be used, your query's WHERE clause must match the index's condition:

-- ✅ Uses the partial index
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 42;

-- ❌ Cannot use the partial index (different status)
SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42;

How to Create an Expression Index

Sometimes you need to index the result of a function or expression rather than a raw column value. Expression indexes (also called functional indexes) handle this.

A common scenario is case-insensitive email lookups. If your queries use LOWER(email), a regular index on email won't help — PostgreSQL sees the function call as a different expression.

-- Regular index on email – won't help with LOWER() queries
CREATE INDEX idx_email ON customers (email);

-- This query does NOT use the index above
SELECT * FROM customers WHERE LOWER(email) = 'user100@example.com';

To fix this, create an index on the expression itself:

CREATE INDEX idx_email_lower ON customers (LOWER(email));

Now queries that use LOWER(email) in their WHERE clause will use this index:

-- ✅ Uses the expression index
SELECT * FROM customers WHERE LOWER(email) = 'user100@example.com';

The rule is straightforward: the expression in your query must match the expression in the index exactly. If the index is on LOWER(email), your query must also use LOWER(email).
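One way to picture why the expressions must match is to treat an index as a lookup table keyed by whatever was indexed. This JavaScript sketch uses a Map as a rough stand-in for an index (an assumption made purely for illustration), with hypothetical customer rows.

```javascript
// Hypothetical rows with a mixed-case email.
const customers = [
  { id: 1, email: 'User100@Example.com' },
  { id: 2, email: 'user200@example.com' },
];

// A plain index is keyed by the stored value; an expression index by the computed one.
const plainIndex = new Map(customers.map((c) => [c.email, c.id]));
const lowerIndex = new Map(customers.map((c) => [c.email.toLowerCase(), c.id]));

const searchTerm = 'user100@example.com';

console.log(plainIndex.get(searchTerm)); // undefined: no stored key equals the lowercased value
console.log(lowerIndex.get(searchTerm)); // 1
```

The plain lookup table has no key for the lowercased value, so the only way to answer the query with it would be to lowercase every entry in turn, which is the equivalent of a full scan.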

How to Create a Unique Index

A unique index guarantees that no two rows have the same value (or combination of values) in the indexed columns. It serves a dual purpose: it enforces data integrity and provides fast lookups.

CREATE UNIQUE INDEX idx_customers_email_unique ON customers (email);

If you try to insert a duplicate value, PostgreSQL will reject the operation:

INSERT INTO customers (first_name, last_name, email, city)
VALUES ('Test', 'User', 'user1@example.com', 'Lagos');
-- ERROR: duplicate key value violates unique constraint "idx_customers_email_unique"

You might wonder how this differs from a UNIQUE constraint. Under the hood, PostgreSQL implements UNIQUE constraints by creating a unique index. The two are functionally identical.

The difference is intent — a UNIQUE constraint expresses a data integrity rule, while a unique index explicitly focuses on query performance with uniqueness as a bonus.

How to Manage Indexes

As your database grows, you'll need to inspect, monitor, and maintain your indexes.

How to List All Indexes on a Table

SELECT
    indexname,
    indexdef
FROM pg_indexes
WHERE tablename = 'customers';

This shows the name and full definition of every index on the table.

How to Check Index Size

SELECT
    pg_size_pretty(pg_relation_size('idx_customers_email')) AS index_size;

For a broader view of all indexes and their sizes:

SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
ORDER BY pg_relation_size(indexrelid) DESC;

How to Find Unused Indexes

Indexes that are never used waste disk space and slow down writes. You can find them by checking pg_stat_user_indexes:

SELECT
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
AND idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

If an index has idx_scan = 0 after a reasonable period of normal usage, it's a candidate for removal. Just make sure to check across a full business cycle — some indexes are only used during monthly reports or seasonal operations.

How to Drop an Index

DROP INDEX IF EXISTS idx_customers_email;

If you're dropping an index on a production table and want to avoid locking writes, use CONCURRENTLY:

DROP INDEX CONCURRENTLY IF EXISTS idx_customers_email;

How to Rebuild an Index

Over time, indexes can become bloated as rows are inserted, updated, and deleted. You can rebuild an index to reclaim space:

REINDEX INDEX idx_customers_email;

Or rebuild all indexes on a table:

REINDEX TABLE customers;

On production systems, use REINDEX CONCURRENTLY (PostgreSQL 12+) to avoid locking the table:

REINDEX INDEX CONCURRENTLY idx_customers_email;

When Indexes Hurt Instead of Help

Indexes aren't free. Every index you add comes with costs:

Write overhead — every INSERT, UPDATE, or DELETE must also update every index on the table. If a table has 10 indexes and you insert a row, PostgreSQL performs 11 write operations (one for the table and one for each index). On write-heavy tables, excessive indexes can significantly slow down data modification.
Storage cost — indexes consume disk space. On large tables, indexes can take up as much space as the table itself, sometimes more. You can check this with pg_relation_size.
Memory consumption — PostgreSQL caches frequently used indexes in memory. More indexes means more memory pressure, which can push useful data out of the cache and slow down other queries.
Maintenance burden — indexes need periodic maintenance (vacuuming, reindexing) and add complexity to schema migrations.

The question to ask is not "should I add an index?" but rather "does the read performance gain justify the write performance cost for this table's workload?"

Common Mistakes That Prevent Index Usage

You can have the perfect index and PostgreSQL might still ignore it. Here are the most common reasons.

Wrapping the Indexed Column in a Function

-- Index on email
CREATE INDEX idx_email ON customers (email);

-- ❌ PostgreSQL cannot use the index because of LOWER()
SELECT * FROM customers WHERE LOWER(email) = 'user1@example.com';

-- ✅ Fix: create an expression index on LOWER(email)
CREATE INDEX idx_email_lower ON customers (LOWER(email));

Any function applied to the indexed column in a WHERE clause prevents the standard index from being used. You need an expression index that matches the function.

Implicit Type Casting

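-- id is an INTEGER column with an index
-- ❌ Comparing against a NUMERIC value casts the column (id::numeric), so the integer index can't be used
SELECT * FROM customers WHERE id = 42::numeric;

-- ✅ Use the column's own type
SELECT * FROM customers WHERE id = 42;

When the value's type doesn't match the column type, PostgreSQL may cast the column to match, which prevents index usage. A quoted literal such as '42' is not a problem by itself, because PostgreSQL resolves untyped literals to the column's type. The trouble starts with explicitly typed values, such as a numeric or text parameter sent by a driver or ORM.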

Using OR Conditions Across Different Columns

-- ❌ OR across different columns can prevent index usage
SELECT * FROM customers WHERE email = 'user1@example.com' OR city = 'Lagos';

-- ✅ Rewrite as UNION for better index utilization
SELECT * FROM customers WHERE email = 'user1@example.com'
UNION
SELECT * FROM customers WHERE city = 'Lagos';

Leading Wildcards in LIKE Queries

-- ❌ Leading wildcard cannot use a B-tree index
SELECT * FROM customers WHERE email LIKE '%@example.com';

-- ✅ Trailing wildcard CAN use a B-tree index
SELECT * FROM customers WHERE email LIKE 'user1%';

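A B-tree index is sorted from left to right. A leading wildcard (%something) means the database can't use the sorted structure and falls back to a sequential scan. The trailing-wildcard case has a catch too: unless the database uses the C collation, the index must be created with a pattern operator class such as varchar_pattern_ops before LIKE can use it. If you need to search by suffix or substring, consider a GIN index with the pg_trgm extension.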

Low Selectivity

If a column has very few distinct values relative to the number of rows (low selectivity), PostgreSQL may decide a sequential scan is faster than using the index.

For example, if a status column has only three possible values ('pending', 'shipped', 'delivered') and each value covers roughly a third of the table, an index on status alone provides little benefit. PostgreSQL would still need to read a large portion of the table, and the extra index lookup adds overhead.

A partial index is often the better solution in these cases.

Best Practices for Indexing

Here's a summary of the key principles to follow:

Index columns that appear in WHERE, JOIN, and ORDER BY clauses. These are the columns the database needs to search, match, or sort by. Start with the queries that run most frequently or take the longest.
Measure before and after with EXPLAIN ANALYZE. Never add an index based on guesswork. Run your query with EXPLAIN ANALYZE, add the index, and run it again. If the execution time doesn't improve meaningfully, the index isn't helping.
Don't index every column. Each index slows down writes and consumes storage. Be deliberate about which columns you index based on actual query patterns.
Use composite indexes for multi-column filters. If your queries commonly filter on city and last_name together, a composite index on (city, last_name) is more efficient than two separate single-column indexes.
Put the most selective column first in composite indexes. The column that narrows the results the most should come first.
Use partial indexes when you only query a subset of data. If 90% of your queries target rows where status = 'active', a partial index on that subset is smaller and faster than a full index.
Monitor index usage regularly. Query pg_stat_user_indexes to find unused indexes and remove them.
Rebuild bloated indexes periodically. On tables with heavy update/delete activity, indexes can become bloated. Use REINDEX CONCURRENTLY on production systems.

Conclusion

In this tutorial, you learned what database indexes are and why they matter for query performance. You explored how B-tree indexes work under the hood, created several types of indexes (single-column, composite, partial, expression, and unique), and used EXPLAIN ANALYZE to measure the impact.

You also learned about the trade-offs indexes introduce — write overhead, storage cost, and memory pressure — and the common mistakes that silently prevent PostgreSQL from using your indexes.

The core principle is simple: index deliberately based on your actual query patterns, measure the results, and remove anything that isn't pulling its weight.


How to Streamline Search in Web Applications with Elasticsearch

Oluwatobi — Thu, 16 Apr 2026 16:25:38 +0000

They say data is the new gold. But navigating through a large dataset to meet the demands of consumers in record time still gives backend devs a headache.

Conventional database queries often aren't totally reliable in getting accurate search results fast. But fortunately, Elasticsearch comes to the rescue.

In this article, I'll walk you through how to use Elasticsearch to enhance database searches and analytics while still maintaining efficiency.

Here are the prerequisites for this tutorial:

A Node.js environment
Basic backend knowledge

With that, let's get started. But first of all, what is Elasticsearch?

Table of Contents

What is Elasticsearch?
Elasticsearch Key Terms
How to Set Up Elasticsearch
How to Set Up the Demo Project
How to Set Up Elasticsearch in Your Project
How to Work with Indexes in Elasticsearch
Search Implementation
Full Code
Wrapping Up
Conclusion

What is Elasticsearch?

Elasticsearch is a search engine built on Apache Lucene that can index words and phrases, providing advanced text and vector search capabilities. It also has other useful features such as search analytics and autocomplete.

Note that Elasticsearch isn't a database, even though it does provide indexing features (which popular databases also do).

Other popular alternatives to this tool used in production environments include Algolia, OpenSearch and MeiliSearch.

Elasticsearch Key Terms

In this section, we'll go over some important terminology used in Elasticsearch. To make it easier to follow, I'll refer to common database terms along the way.

Index: This is where the data you're going to search is stored. It plays roughly the role a database plays in a relational system, and each index name must be unique within a cluster.
Document: This is the smallest unit of information stored within an index. It's a JSON object, structurally similar to a MongoDB document and comparable to a row in an SQL database.
Mapping: This is the set of rules that defines how documents and their fields are stored and indexed in an Elasticsearch index.
Score: This is a number Elasticsearch generates to show how relevant each matching document is to the search query.
Analyzer: When text is sent to Elasticsearch for indexing, it first passes through an analyzer, which processes the text before it's indexed. An analyzer does this with a tokenizer and filters.
Tokenizer: This splits the raw, unstructured text sent to Elasticsearch into individual tokens for further processing and storage.
Aggregation: This performs analysis across the documents stored in the index to generate actionable data insights, such as counts or averages per category. MongoDB's aggregation framework offers similar functions.
Filter: A set of instructions that modifies the tokens generated during analysis. This could entail removing stop words, applying capitalization rules, and so on.
Bulk index: This refers to indexing more than one document at once. You typically do this when indexing a database with pre-existing content.
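The relationship between analyzers, tokenizers, and filters is easier to see in code. The sketch below is plain JavaScript and is not Elasticsearch's implementation: the tokenizer is a rough approximation, the stop-word list is invented for the example, and all function names are hypothetical.

```javascript
// Tokenizer: split raw text into word tokens (a rough stand-in for a real tokenizer).
const tokenize = (text) => text.match(/[\p{L}\p{N}]+/gu) || [];

// Token filters: each takes a list of tokens and returns a modified list.
const lowercaseFilter = (tokens) => tokens.map((t) => t.toLowerCase());
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'to']); // illustrative list only
const stopWordFilter = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));

// Analyzer: the tokenizer followed by the filters, applied in order.
const analyze = (text, filters = [lowercaseFilter, stopWordFilter]) =>
  filters.reduce((tokens, filter) => filter(tokens), tokenize(text));

console.log(analyze('How to Index a Post')); // [ 'how', 'index', 'post' ]
```

The order of filters matters in this sketch: the stop-word list is lowercase, so the lowercase filter has to run first for "To" or "A" to be removed.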

How to Set Up Elasticsearch

For the purpose of this tutorial, we'll use Elasticsearch's installable software on our local machine. Online hosted versions of Elasticsearch also exist which work hitch-free as well.

Here is a link detailing how to set up Elasticsearch on Windows. If you're not on Windows, you can also install Elasticsearch on Linux/macOS or use Docker.

If you're on Windows, make sure you run Elasticsearch as an Administrator to avoid installation errors.

After successful installation, you can test that it's working by navigating to localhost:9200, the default local endpoint for Elasticsearch. There you'll see a JSON response containing details such as the cluster name and version number.

With that, we'll move on to setting up the demo project and integrating Elasticsearch into it.

How to Set Up the Demo Project

For this tutorial, we'll use a ready-built forum backend application built with Node.js and Express. Here is the link to the project.

To get the project up and running, clone the repository and run:

npm start

MySQL will serve as the default database for this tutorial. Let's now proceed to the next section.

How to Set Up Elasticsearch in Your Project

The existing demo project is a backend implementation of a forum site which allows users to post text content and facilitate discussions through category-based threads.

Elasticsearch is great for ensuring that users can sift through these posts and threads to accurately locate key content using distinct keywords. This is more effective than using traditional database search queries which can be cumbersome.

To set up Elasticsearch, start by installing the Elasticsearch npm package. To do this, run the command below in your project directory:

npm install @elastic/elasticsearch

After successful installation, create a config.js file where you'll set up the client that connects to your Elasticsearch instance.

const { Client } = require('@elastic/elasticsearch');

const esClient = new Client({
  node: 'http://localhost:9200',
  auth: {
    username: process.env.ELASTICSEARCH_USERNAME,
    password: process.env.ELASTICSEARCH_PASSWORD
  },
  maxRetries: 5,
  requestTimeout: 60000,
  tls: {
    rejectUnauthorized: process.env.NODE_ENV !== 'development'
  }
});

module.exports = esClient;

To use Elasticsearch's capabilities within your backend application, you need to set up and configure the Elasticsearch client. The config file above specifies the details.

As mentioned earlier, Elasticsearch listens on localhost:9200 by default, so the node option points there. A hosted Elasticsearch node works the same way: set node to its URL.

Next in the config file, you provide the authentication credentials required to access Elasticsearch. The username and password go in the auth object. If you're running Elasticsearch locally, authentication may not be required unless security is enabled.

maxRetries is the maximum number of times the client retries a failed request. In this case, we've set it to 5. requestTimeout is the time in milliseconds after which a request is abandoned if it hasn't completed.

Once you've completed the config file, import it and initialize the Elasticsearch client when your backend starts.
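To illustrate what a retry limit means in practice, here is a small standalone JavaScript sketch. It does not show the Elasticsearch client's internals. withRetries and flaky are hypothetical names, and the sketch assumes that a limit of 5 means one initial attempt plus up to five retries.

```javascript
// Illustrative only: run an async operation, retrying up to maxRetries times on failure.
const withRetries = async (operation, maxRetries = 5) => {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
};

// Hypothetical operation that fails twice before succeeding.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error('connection refused');
  return 'connected';
};

const pending = withRetries(flaky).then((value) => {
  console.log(value, 'after', calls, 'calls'); // connected after 3 calls
  return value;
});
```

A production retry loop would usually wait between attempts. That is left out here to keep the sketch short.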

How to Work with Indexes in Elasticsearch

Before we start harnessing the full power of Elasticsearch, we need to customize its search capabilities within the backend of the project. This involves setting up an index within the Elasticsearch Engine that indexes all posts made to the backend application.

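const esClient = require('./config');

// INDEX_NAME (the index's name) and indexMapping (covered later in this article)
// are assumed to be defined elsewhere in this module.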

const setupIndex = async () => {
  try {
    const indexExists = await esClient.indices.exists({
      index: INDEX_NAME
    });

    if (indexExists) {
      console.log(`Index "${INDEX_NAME}" already exists`);
      return;
    }

    await esClient.indices.create({
      index: INDEX_NAME,
      ...indexMapping
    });

    console.log(`Index "${INDEX_NAME}" created`);
  } catch (err) {
    console.error(err);
    throw err;
  }
};

The code above creates a new index. When you call setupIndex(), it first asks Elasticsearch whether an index named INDEX_NAME already exists.

If it does, the function returns early to avoid creating a duplicate. If it doesn't, the function creates the index with that name and applies the index mapping rules (which we'll discuss shortly).

After creating the index, you'll see a success message in your application console.

How to Delete an Index

After a while, an index may no longer serve its purpose and you may need to remove it from Elasticsearch.

You can do this by executing the esClient.indices.delete() command as shown below:

const deleteIndex = async () => {
  try {
    await esClient.indices.delete({ index: INDEX_NAME });
    console.log(`${INDEX_NAME} deleted`);
  } catch (err) {
    console.error("Error deleting index:", err);
  }
};

How to Delete a Post within an Index

Posts sometimes get deleted or modified. Users may also get banned, after which you'd want to remove their content from the database.

In these cases, you'll want to ensure true deletion: that is, from both the database and the Elasticsearch index.

To do this, call the esClient.delete() function, passing it an object with the name of the index and the unique ID of the post you want to delete.

const deletePost = async (postId) => {
  try {
    await esClient.delete({
      index: INDEX_NAME,
      id: postId.toString(),
    });

    console.log("Post successfully deleted");
    return { success: true, postId };
  } catch (err) {
    console.error(err);
    throw err;
  }
};
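One case worth considering is a post that was never indexed, or was already removed from the index. The sketch below is one possible way to treat that case as a harmless no-op rather than an error. It assumes the client's errors expose the HTTP status as err.meta.statusCode (with err.statusCode as a fallback). The names isNotFound and deletePostIfIndexed are hypothetical, and the client is passed in as a parameter so the function can be exercised without a running cluster.

```javascript
// Assumption: a missing document surfaces as an error carrying HTTP status 404.
const isNotFound = (err) => (err?.meta?.statusCode ?? err?.statusCode) === 404;

// Delete a post from the index, treating "already gone" as success.
const deletePostIfIndexed = async (client, indexName, postId) => {
  try {
    await client.delete({ index: indexName, id: String(postId) });
    return { success: true, deleted: true, postId };
  } catch (err) {
    if (isNotFound(err)) return { success: true, deleted: false, postId };
    throw err; // anything else is a real failure
  }
};

// Stub client standing in for esClient, for demonstration only.
const stubClient = {
  delete: async () => {
    const err = new Error('document missing');
    err.meta = { statusCode: 404 };
    throw err;
  },
};

const outcome = deletePostIfIndexed(stubClient, 'posts', 7);
outcome.then((result) => console.log(result));
```

Whether a missing document should count as success depends on the application. Rethrowing, as deletePost does above, is the stricter choice.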

How to Index a Post

After setting up the Elasticsearch Index, you'll want to automatically index posts made to the database into the Elasticsearch index.

To do this, you'll need to make sure that the post matches your index schema by passing it through the transformPostToESDoc function. This function extracts and formats the post data so it matches the Elasticsearch document structure.

const transformPostToESDoc = (post) => {
  // Map database column names to the field names used in the index mapping
  return {
    id: post.id,
    title: post.title,
    content: post.body,
    author: post.author,
    category: post.category,
    tags: post.tags,
    views: post.views || 0,
    published_at: post.created_at
  };
};

const indexPost = async (postId) => {
  try {
    const postRepo = await getPostRepo();
    const post = await postRepo.findOne({ where: { id: postId } });

    if (!post) {
      throw new Error("Post not available");
    }

    const esDocument = transformPostToESDoc(post);

    await esClient.index({
      index: INDEX_NAME,
      id: post.id.toString(),
      document: esDocument
    });

    console.log("Post successfully indexed");
    return { success: true, postId };
  } catch (err) {
    console.error(err);
    throw err;
  }
};

The post to be indexed must have a unique ID. For ease of use, we reused the post's unique ID from the database (its primary key) as the Elasticsearch document ID. Optionally, you can also use UUID libraries to generate unique post IDs.

The post data is then passed to the esClient.index() function as the document to be indexed. The try/catch block logs any failure and rethrows the error so that the calling code can decide how to handle it.
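To see what actually ends up in Elasticsearch, it can help to run the transform on a sample record. The snippet below is a small illustration: toESDoc repeats the mapping logic of transformPostToESDoc so it runs on its own, and samplePost is a made-up record, not data from the demo project.

```javascript
// Same mapping logic as transformPostToESDoc, repeated so the
// snippet runs on its own.
const toESDoc = (post) => ({
  id: post.id,
  title: post.title,
  content: post.body,            // database column "body" becomes "content"
  author: post.author,
  category: post.category,
  tags: post.tags,
  views: post.views || 0,        // missing view counts default to 0
  published_at: post.created_at  // "created_at" becomes "published_at"
});

// Made-up post record, for illustration only.
const samplePost = {
  id: 42,
  title: "Getting Started with Elasticsearch",
  body: "Elasticsearch is a distributed search engine.",
  author: "Jane Doe",
  category: "search",
  tags: ["elasticsearch", "node"],
  created_at: "2026-01-15T10:00:00Z"
};

const sampleDoc = toESDoc(samplePost);
console.log(sampleDoc.content);      // Elasticsearch is a distributed search engine.
console.log(sampleDoc.views);        // 0
console.log(sampleDoc.published_at); // 2026-01-15T10:00:00Z
```

Note that the renamed fields (content and published_at) are the ones declared in the index mapping, so the document and the mapping stay in sync.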

How to Define Elasticsearch Mapping Rules

Elasticsearch mappings define how your data is stored and indexed. They specify the data type of each field and how text is analyzed for search.

In the example below, we'll define an index configuration that includes custom analyzers for autocomplete and mappings for each post field (like title, content, and author).

const indexMapping = {
  settings: {
    analysis: {
      analyzer: {
        autocomplete: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase', 'autocomplete_filter']
        },
        autocomplete_search: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase']
        }
      },
      filter: {
        autocomplete_filter: {
          type: 'edge_ngram',
          min_gram: 2,
          max_gram: 10
        }
      }
    }
  },
  mappings: {
    properties: {
      id: { type: 'integer' },
      title: {
        type: 'text',
        analyzer: 'autocomplete',
        search_analyzer: 'autocomplete_search',
        fields: {
          keyword: { type: 'keyword' },
          standard: { type: 'text' }
        }
      },
      content: {
        type: 'text',
        analyzer: 'standard'
      },
      category: {
        type: 'keyword'
      },
      tags: { type: 'keyword' },
      author: {
        type: 'text',
        fields: {
          keyword: { type: 'keyword' }
        }
      },
      views: { type: 'integer' },
      published_at: { type: 'date' }
    }
  }
};

The indexMapping object defines how Elasticsearch should store and process your data. It consists of two main parts: settings and mappings.

The mappings section defines the structure of your documents. Each field (like title, content, or author) has a type such as text, keyword, integer, or date. This tells Elasticsearch how to store and search that field.

For text fields, we can also define analyzers. Analyzers control how text is broken into smaller pieces (tokens) during indexing and search.

In the settings section, we defined a custom analyzer for autocomplete. This uses an edge_ngram filter to generate partial word matches (prefixes between 2 and 10 characters long), so users can find results as they type. We also defined a separate search_analyzer that only lowercases the query, so the search term itself isn't split into prefixes and is matched as typed against the stored prefixes.

Together, these settings allow you to support features like autocomplete while keeping search results accurate and efficient.
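To make the effect of the edge_ngram filter more concrete, here is a rough imitation of the autocomplete analyzer in plain JavaScript. It is an illustration only: edgeNgrams is a hypothetical function, and splitting on non-alphanumeric characters is a simplification of what the standard tokenizer really does.

```javascript
// Rough imitation of the "autocomplete" analyzer, for illustration only.
// It lowercases the text, splits on non-alphanumeric characters (a
// simplification of the standard tokenizer), and emits edge n-grams
// between minGram and maxGram characters long.
const edgeNgrams = (text, minGram = 2, maxGram = 10) => {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const grams = [];

  for (const word of words) {
    const longest = Math.min(word.length, maxGram);
    for (let len = minGram; len <= longest; len++) {
      grams.push(word.slice(0, len));
    }
  }
  return grams;
};

console.log(edgeNgrams("Elastic Search").join(" "));
// el ela elas elast elasti elastic se sea sear searc search
```

Because the search analyzer leaves the query whole, a user who has typed "elas" matches the stored token "elas" directly, which is what makes results appear as they type.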

Search Implementation

To implement your search functionality, you'll need to build out the API. This involves building the business logic service and the API route. The route accepts GET requests with the search term attached as a query parameter, and it returns the results as JSON.
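As a sketch of what the route logic could look like, the function below validates the query parameters and delegates to a search service. Everything here is an assumption made for illustration: handleSearchRequest, the parameter names q, page and size, and the cap of 50 results per page are hypothetical. The search function is passed in as an argument, so you can plug in the searchElastic service shown later in this section.

```javascript
// Hypothetical, framework-agnostic handler for GET /search?q=...&page=...&size=...
// `params` is the parsed query string; `searchFn` is the search service.
const handleSearchRequest = async (params, searchFn) => {
  const q = (params.q || "").trim();
  if (!q) {
    return { status: 400, body: { error: "Query parameter 'q' is required" } };
  }

  // Fall back to defaults when page or size are missing or invalid.
  // The upper bound of 50 is an arbitrary cap chosen for this sketch.
  const page = Math.max(1, parseInt(params.page, 10) || 1);
  const size = Math.min(50, Math.max(1, parseInt(params.size, 10) || 10));

  try {
    const results = await searchFn(q, page, size);
    return { status: 200, body: { page, size, results } };
  } catch (err) {
    console.error(err);
    return { status: 500, body: { error: "Search failed" } };
  }
};
```

Returning a plain { status, body } object keeps the logic independent of any web framework. In your route, you would pass in the request's query parameters and send the returned status and JSON body back to the client.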

Then you'll implement the search post service function. In this scenario, you'll be using the search engine capabilities to search for phrases within the index. In line with best practices, you'll paginate the results so that each request returns only one page of matches rather than everything at once.

The search query will consist of the index name and the pagination parameters: from (how many results to skip) and size (the maximum number of results to return). You'll also attach a query object specifying the type of search that the Elasticsearch engine should run.

const searchElastic = async (query, page = 1, size = 10) => {
  const searchQuery = {
    index: INDEX_NAME,
    from: (page - 1) * size,
    size,
    query: {
      bool: {
        must: [
          {
            multi_match: {
              query,
              fields: ["title^3", "content"],
              type: "best_fields",
              fuzziness: "AUTO"
            }
          }
        ]
      }
    }
  };

  const result = await esClient.search(searchQuery);
  return result.hits.hits;
};

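In the code above, the function is named searchElastic. It takes three parameters: query, page, and size. Only query is required, because page and size have default values.

The size parameter specifies the maximum number of documents to return per search request. It defaults to 10 here, but you can pass any positive integer. The page parameter (which defaults to 1) is converted into the from offset, which tells Elasticsearch how many results to skip.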
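The from value is computed as (page - 1) * size. The short sketch below works through that arithmetic. toFromSize and fitsResultWindow are hypothetical helper names used only for this illustration. The second helper reflects Elasticsearch's default index.max_result_window setting of 10,000, which limits how deep from + size can go.

```javascript
// Translate a 1-based page number into Elasticsearch's from/size pair.
const toFromSize = (page = 1, size = 10) => ({
  from: (page - 1) * size,
  size
});

console.log(toFromSize(1, 10)); // { from: 0, size: 10 }
console.log(toFromSize(3, 10)); // { from: 20, size: 10 }
console.log(toFromSize(2, 25)); // { from: 25, size: 25 }

// With the default index.max_result_window of 10,000, from + size must
// stay at or below that value unless the index setting is changed.
const fitsResultWindow = (page, size, maxWindow = 10000) =>
  (page - 1) * size + size <= maxWindow;

console.log(fitsResultWindow(1000, 10)); // true
console.log(fitsResultWindow(1001, 10)); // false
```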

The query uses a multi_match clause to search across multiple fields, such as title and content. The title^3 syntax boosts matches in the title, making them more relevant than matches in other fields.

We also included a must clause which defines conditions that documents must match to be included in the results.

By default, Elasticsearch sorts the search results by their relevance score (_score) for the query, with the highest-scoring documents first.
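Each element of result.hits.hits wraps the stored document in a _source property, alongside metadata such as _id and _score. If you'd rather return flat objects from your API, you could reshape the hits along these lines. formatHits is a hypothetical helper, and the sample hits (including the "posts" index name) are made up for illustration.

```javascript
// Flatten raw Elasticsearch hits into plain objects for an API response.
// _source is spread first so that id and score always come from the hit
// metadata, even if the document has fields with the same names.
const formatHits = (hits) =>
  hits.map((hit) => ({
    ...hit._source,
    id: hit._id,
    score: hit._score
  }));

// Made-up hits in the shape Elasticsearch returns, for illustration only.
const sampleHits = [
  { _index: "posts", _id: "7", _score: 4.2, _source: { id: 7, title: "Intro to Elasticsearch", views: 120 } },
  { _index: "posts", _id: "3", _score: 1.1, _source: { id: 3, title: "Node.js Tips", views: 45 } }
];

for (const post of formatHits(sampleHits)) {
  console.log(`${post.id}: ${post.title} (score ${post.score})`);
}
// 7: Intro to Elasticsearch (score 4.2)
// 3: Node.js Tips (score 1.1)
```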

Full Code

With this, you've completed this tutorial and have configured Elasticsearch to index posts made to your database. Here's the full code:

Elasticsearch Client (config.js):

const { Client } = require('@elastic/elasticsearch');

const esClient = new Client({
  node: 'http://localhost:9200',
  auth: {
    username: process.env.ELASTICSEARCH_USERNAME,
    password: process.env.ELASTICSEARCH_PASSWORD
  },
  maxRetries: 5,
  requestTimeout: 60000,
  tls: {
    rejectUnauthorized: process.env.NODE_ENV !== 'development'
  }
});

module.exports = esClient;

Index mapping:

const indexMapping = {
  settings: {
    analysis: {
      analyzer: {
        autocomplete: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase', 'autocomplete_filter']
        },
        autocomplete_search: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase']
        }
      },
      filter: {
        autocomplete_filter: {
          type: 'edge_ngram',
          min_gram: 2,
          max_gram: 10
        }
      }
    }
  },
  mappings: {
    properties: {
      id: { type: 'integer' },
      title: {
        type: 'text',
        analyzer: 'autocomplete',
        search_analyzer: 'autocomplete_search',
        fields: {
          keyword: { type: 'keyword' },
          standard: { type: 'text' }
        }
      },
      content: {
        type: 'text',
        analyzer: 'standard'
      },
      category: {
        type: 'keyword'
      },
      tags: { type: 'keyword' },
      author: {
        type: 'text',
        fields: {
          keyword: { type: 'keyword' }
        }
      },
      views: { type: 'integer' },
      published_at: { type: 'date' }
    }
  }
};

Create index:

const setupIndex = async () => {
  try {
    const indexExists = await esClient.indices.exists({
      index: INDEX_NAME
    });

    if (indexExists) {
      console.log(`Index "${INDEX_NAME}" already exists`);
      return;
    }

    await esClient.indices.create({
      index: INDEX_NAME,
      ...indexMapping
    });

    console.log(`Index "${INDEX_NAME}" created`);
  } catch (err) {
    console.error(err);
    throw err;
  }
};

Delete index:

const deleteIndex = async () => {
  try {
    await esClient.indices.delete({ index: INDEX_NAME });
    console.log(`${INDEX_NAME} deleted`);
  } catch (err) {
    console.error("Error deleting index:", err);
  }
};

Delete document (post):

const deletePost = async (postId) => {
  try {
    await esClient.delete({
      index: INDEX_NAME,
      id: postId.toString()
    });

    console.log("Post successfully deleted");
    return { success: true, postId };
  } catch (err) {
    console.error(err);
    throw err;
  }
};

Transform and index post:

const transformPostToESDoc = (post) => {
  // Map database column names to the field names used in the index mapping
  return {
    id: post.id,
    title: post.title,
    content: post.body,
    author: post.author,
    category: post.category,
    tags: post.tags,
    views: post.views || 0,
    published_at: post.created_at
  };
};

const indexPost = async (postId) => {
  try {
    const postRepo = await getPostRepo();
    const post = await postRepo.findOne({ where: { id: postId } });

    if (!post) {
      throw new Error("Post not available");
    }

    const esDocument = transformPostToESDoc(post);

    await esClient.index({
      index: INDEX_NAME,
      id: post.id.toString(),
      document: esDocument
    });

    console.log("Post successfully indexed");
    return { success: true, postId };
  } catch (err) {
    console.error(err);
    throw err;
  }
};

Search function:

const searchElastic = async (query, page = 1, size = 10) => {
  const searchQuery = {
    index: INDEX_NAME,
    from: (page - 1) * size,
    size,
    query: {
      bool: {
        must: [
          {
            multi_match: {
              query,
              fields: ["title^3", "content"],
              type: "best_fields",
              fuzziness: "AUTO"
            }
          }
        ]
      }
    }
  };

  const result = await esClient.search(searchQuery);
  return result.hits.hits;
};

Wrapping Up

Now you know how to use Elasticsearch to improve user search in your web applications. Elasticsearch is language-agnostic, which allows you to use it across programming languages and frameworks. Its large community also provides helpful user guides to make onboarding easier.

To further harness Elasticsearch's power, you can explore the other tools within the ELK stack (Elasticsearch, Logstash, and Kibana), which will help you generate high-quality visualizations of your data, especially for enterprise applications.

Conclusion

A fast and reliable search engine isn’t negotiable in your web applications these days. Elasticsearch is your go-to for getting this done.

If you would like to read other articles that will enhance your tech journey, feel free to check out my website here. Stay active!

How to Use `EXPLAIN ANALYZE` to Measure Performance