database - freeCodeCamp.org

How to Use PostgreSQL as a Cache, Queue, and Search Engine

Aaron Yong — Tue, 21 Apr 2026 16:58:55 +0000

"Just use Postgres" has been circulating as advice for years, but most articles arguing for it are opinion pieces. I wanted hard numbers.

So I built a benchmark suite that pits vanilla PostgreSQL against a feature-optimized PostgreSQL instance — measuring caching, message queues, full-text search, and pub/sub under controlled conditions.

In this article, you'll learn how to use PostgreSQL's built-in features for caching, job queues, full-text search, and pub/sub. You'll see actual benchmark results (latency percentiles, throughput, and error rates) comparing naive PostgreSQL patterns against optimized ones, and understand where PostgreSQL's limits are so you can decide whether you really need that extra service in your stack.

Prerequisites
The Setup
Benchmark 1: Caching with UNLOGGED Tables
Benchmark 2: Job Queues with SKIP LOCKED
Benchmark 3: Full-Text Search with tsvector
Benchmark 4: Pub/Sub with LISTEN/NOTIFY
The Combined Workload: The Honest Test
What I Learned

Prerequisites

To follow along or reproduce the benchmarks, you'll need:

Docker and Docker Compose
Node.js 20+ (for the Express TypeScript API layer)
k6 for load testing
Basic familiarity with SQL and PostgreSQL

The full benchmark project is open source on GitHub — you can clone it and run every test yourself.

The Setup

The benchmark uses two identical PostgreSQL 17 instances running in Docker containers, each with fixed resource constraints (2 CPUs, 2 GB RAM). Both share the same Express TypeScript API layer — the only difference is which PostgreSQL features are enabled.

┌─────────┐     ┌──────────────────┐     ┌─────────────────┐
│   k6    │────>│  Express API     │────>│  PG Baseline    │
│  (load  │     │  (TypeScript)    │     │  (vanilla PG17) │
│  test)  │────>│  Port 3001/3002  │────>│  PG Modded      │
└─────────┘     └──────────────────┘     │  (features on)  │
                                         └─────────────────┘

The baseline instance uses naïve approaches (regular tables, ILIKE search, polling). The modded instance uses PostgreSQL's built-in features (UNLOGGED tables, tsvector with GIN indexes, LISTEN/NOTIFY, partial indexes). Same hardware, same API code, same data. Only the database features differ.

Both instances share this tuned postgresql.conf:

# Memory allocation
shared_buffers = 512MB           # 25% of available RAM
effective_cache_size = 1536MB    # 75% of RAM — helps the query planner
work_mem = 16MB                  # per-sort/hash operation memory

# SSD-optimized planner settings
random_page_cost = 1.1           # default 4.0 assumes spinning disks
effective_io_concurrency = 200   # allow parallel I/O on SSDs

These settings matter. The defaults assume spinning disks from the early 2000s. Setting random_page_cost = 1.1 tells the query planner that random reads are nearly as fast as sequential reads on SSDs, which encourages index usage over sequential scans.

Benchmark 1: Caching with UNLOGGED Tables

The idea: Use an UNLOGGED table as an in-database cache. UNLOGGED tables skip PostgreSQL's Write-Ahead Log (WAL) — the mechanism that guarantees durability. Since cache data is ephemeral by nature, losing it on a crash is acceptable, and skipping WAL removes the biggest write bottleneck.

-- Modded: UNLOGGED table for cache entries
CREATE UNLOGGED TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);

-- Baseline: same schema, but a regular (logged) table
CREATE TABLE cache_entries (
    key TEXT PRIMARY KEY,
    value JSONB NOT NULL,
    expires_at TIMESTAMPTZ
);
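
For illustration, here is a minimal sketch of how the API layer might read and write this table. It is not taken from the benchmark project: the `Queryable` interface and the `cacheSet` and `cacheGet` helpers are hypothetical names, and the sketch assumes a client shaped like node-postgres' `Pool`, whose `query(text, values)` call resolves to an object with a `rows` array.

```typescript
// Minimal query interface (an assumption for this sketch). node-postgres'
// Pool exposes a compatible query(text, values) method.
interface Queryable {
  query(
    text: string,
    values: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Store (or overwrite) a cache entry that expires ttlSeconds from now.
async function cacheSet(
  db: Queryable,
  key: string,
  value: unknown,
  ttlSeconds: number,
): Promise<void> {
  await db.query(
    `INSERT INTO cache_entries (key, value, expires_at)
     VALUES ($1, $2::jsonb, NOW() + make_interval(secs => $3))
     ON CONFLICT (key) DO UPDATE
       SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at`,
    [key, JSON.stringify(value), ttlSeconds],
  );
}

// Return the cached value, or undefined on a miss or an expired entry.
async function cacheGet(db: Queryable, key: string): Promise<unknown> {
  const result = await db.query(
    `SELECT value FROM cache_entries
     WHERE key = $1 AND (expires_at IS NULL OR expires_at > NOW())`,
    [key],
  );
  const row = result.rows[0];
  return row ? row.value : undefined;
}
```

Expired rows are only filtered out on read here. Deleting them is left to a separate periodic job.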

Results (200 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (regular table)	1.87ms	6.00ms	2.50ms	1,754/s
Modded (UNLOGGED table)	1.71ms	5.24ms	2.17ms	1,760/s

Latency drops by about 9% at p50 and about 13% at p95 and on average, while throughput is essentially unchanged. Not dramatic, but free: you change one keyword in your CREATE TABLE statement.
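
The relative improvements can be recomputed directly from the table above. The sketch below does that arithmetic; `improvementPct` is an illustrative helper name, and the numbers are copied from the 200-VU results.

```typescript
// Relative latency reduction of `modded` versus `baseline`, in percent.
function improvementPct(baseline: number, modded: number): number {
  return ((baseline - modded) / baseline) * 100;
}

// [metric, baseline ms, modded ms], copied from the 200-VU table.
const cacheLatencies: Array<[string, number, number]> = [
  ["p50", 1.87, 1.71],
  ["p95", 6.0, 5.24],
  ["avg", 2.5, 2.17],
];

cacheLatencies.forEach(([metric, baseline, modded]) => {
  console.log(`${metric}: ${improvementPct(baseline, modded).toFixed(1)}%`);
});
// p50: 8.6%
// p95: 12.7%
// avg: 13.2%
```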

Under Stress (1,000 Virtual Users, No Sleep)

Mode	p50	p95	req/s	Total Requests
Baseline	83.38ms	143.23ms	7,663/s	728,021
Modded	77.69ms	126.39ms	8,062/s	765,934

The p95 improvement holds at about 12% under stress, while p50 improves by about 7% and throughput by about 5%. The UNLOGGED advantage is a per-write optimization: it saves the same amount of I/O per write whether you are doing 100 or 10,000 writes per second. The modded instance served roughly 38,000 more requests in the same time window.

The Verdict

UNLOGGED tables won't match Redis for sub-millisecond hot-path caching (real-time bidding, gaming leaderboards). But for web applications where the difference between 2ms and 5ms is invisible to users, they eliminate an entire infrastructure dependency for zero additional complexity.

You do give up Redis data structures (sorted sets, HyperLogLog, streams). If you need those, a dedicated cache is still the right call.

Benchmark 2: Job Queues with SKIP LOCKED

The idea: Use PostgreSQL as a job queue with SELECT ... FOR UPDATE SKIP LOCKED. Multiple workers poll the same table, and SKIP LOCKED ensures each worker gets a different row — no duplicates, no contention.

-- Queue table with a partial index on pending jobs only
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    payload JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Partial index: only indexes pending jobs
-- As jobs complete, they leave the index — it stays small forever
CREATE INDEX idx_pending_jobs ON job_queue (created_at)
    WHERE status = 'pending';

The dequeue pattern:

-- Atomic dequeue: select + update in one statement
UPDATE job_queue SET status = 'processing'
WHERE id = (
    SELECT id FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- skip rows locked by other workers
) RETURNING *;

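How SKIP LOCKED works: Worker A locks row 1. Worker B tries row 1, sees the lock, skips it, and takes row 2 instead. No blocking, no duplicates. If a worker crashes before its transaction commits, the transaction rolls back and the row becomes available again. Once the change to 'processing' has committed, a crashed worker leaves the row in that state, so long-running jobs need a timeout or a sweeper that returns stale rows to 'pending'.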
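
A worker loop around that statement might look like the sketch below. It is an illustration, not the benchmark's code: `Queryable` and `processNextJob` are hypothetical names, the `'done'` status and the requeue-on-failure policy are assumptions, and the client is assumed to be shaped like node-postgres' `Pool`. Each statement runs in its own transaction here, so a worker that dies mid-job leaves its row in `'processing'`.

```typescript
// Minimal query interface (an assumption for this sketch).
interface Queryable {
  query(
    text: string,
    values?: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// The atomic dequeue statement from the article.
const DEQUEUE_SQL = `
  UPDATE job_queue SET status = 'processing'
  WHERE id = (
      SELECT id FROM job_queue
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
  ) RETURNING *`;

// Claim and run one job. Resolves to false when no job could be claimed.
async function processNextJob(
  db: Queryable,
  handler: (payload: unknown) => Promise<void>,
): Promise<boolean> {
  const claimed = await db.query(DEQUEUE_SQL);
  const job = claimed.rows[0];
  if (!job) return false; // nothing pending, or every pending row is locked

  // 'done' is an assumed status value; the schema only names 'pending'.
  let nextStatus = "done";
  let failure: unknown;
  try {
    await handler(job.payload);
  } catch (err) {
    // Assumed retry policy: hand the job back to the queue.
    nextStatus = "pending";
    failure = err;
  }
  await db.query("UPDATE job_queue SET status = $2 WHERE id = $1", [
    job.id,
    nextStatus,
  ]);
  if (nextStatus === "pending") throw failure;
  return true;
}
```

Libraries such as pg-boss add retries with backoff, timeouts, and dead-letter handling on top of this basic shape.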

Results (100 Producers + 50 Consumers)

Mode	p50	p95	avg	req/s
Baseline (full index)	1.90ms	5.01ms	2.30ms	1,053/s
Modded (partial index)	1.81ms	5.28ms	2.29ms	1,052/s

They're virtually identical. The partial index doesn't show its value in a 60-second benchmark because the table doesn't accumulate enough completed rows for the index size difference to matter. In a production system with millions of completed jobs, the partial index keeps the index at kilobytes while a full index grows to gigabytes.

The Verdict

SKIP LOCKED is production-ready for job queues. Libraries like pg-boss (Node.js) and river (Go) build on this exact pattern.

You do give up exchange/routing patterns (fan-out, topic-based routing) and consumer groups with message replay. If you need those, a dedicated message broker is still the right tool. For simple "process this job once" workloads, PostgreSQL handles it.

Benchmark 3: Full-Text Search with tsvector

The idea: Use PostgreSQL's built-in full-text search instead of a separate search service. A tsvector column stores pre-processed search tokens, and a GIN (Generalized Inverted Index) enables fast lookups using the same inverted index concept that powers Elasticsearch.

-- Search-optimized article table
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    search_vector tsvector  -- pre-computed search tokens
);

-- GIN index for full-text search
CREATE INDEX idx_search ON articles USING GIN (search_vector);

-- Auto-update search_vector on insert/update
CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english',
        COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_search
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_search_vector();

The baseline uses ILIKE with a leading wildcard — the approach most developers reach for first:

-- Baseline: sequential scan on every query
SELECT * FROM articles
WHERE title ILIKE '%postgresql%' OR body ILIKE '%postgresql%';

-- Modded: GIN index lookup with relevance ranking
SELECT id, title,
    ts_rank(search_vector, plainto_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql')
ORDER BY rank DESC LIMIT 20;
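
From the API layer, the modded query would normally be parameterized rather than built by string concatenation. A minimal sketch, assuming a client shaped like node-postgres' `Pool`; `Queryable`, `ArticleHit`, and `searchArticles` are hypothetical names, not the benchmark's code:

```typescript
// Minimal query interface (an assumption for this sketch).
interface Queryable {
  query(
    text: string,
    values: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

interface ArticleHit {
  id: number;
  title: string;
  rank: number;
}

// Ranked full-text search over articles.search_vector.
async function searchArticles(
  db: Queryable,
  term: string,
  limit = 20,
): Promise<ArticleHit[]> {
  const trimmed = term.trim();
  if (trimmed === "") return []; // an empty query matches nothing anyway

  // Build the tsquery once in FROM and reuse it for matching and ranking.
  const result = await db.query(
    `SELECT id, title, ts_rank(search_vector, query) AS rank
     FROM articles, plainto_tsquery('english', $1) AS query
     WHERE search_vector @@ query
     ORDER BY rank DESC
     LIMIT $2`,
    [trimmed, limit],
  );
  return result.rows.map((row) => ({
    id: Number(row.id),
    title: String(row.title),
    rank: Number(row.rank),
  }));
}
```

Passing the search term as `$1` keeps user input out of the SQL text, and `plainto_tsquery` treats it as plain words rather than tsquery syntax.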

Results (500 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (ILIKE)	1.96ms	101.83ms	25.22ms	561/s
Modded (tsvector + GIN)	2.76ms	10.39ms	3.76ms	675/s

This is the standout result. The baseline's p95 of 101ms versus the modded's 10ms is a 10x improvement.

Why the baseline's p50 (1.96ms) is slightly better than the modded's (2.76ms): simple ILIKE queries on small result sets can be fast when the data fits in shared_buffers. But as load increases and the buffer cache is contested, sequential scans degrade dramatically. The GIN index stays stable.

Under Stress (500 Virtual Users, No Sleep)

Mode	p50	p95	req/s	Total Requests
Baseline (ILIKE)	599ms	1,000ms	558/s	50,212
Modded (tsvector)	209ms	396ms	1,441/s	129,679

ILIKE collapses to 1-second p95 latencies. Each query forces a sequential scan of all 10,000 articles, churning shared buffers and starving concurrent queries. The tsvector approach serves 2.6x more requests in the same time window because a GIN index lookup reads only the entries for the search terms instead of every row.

The Verdict

This is the strongest argument in the entire benchmark. The fix requires zero extensions — to_tsvector(), plainto_tsquery(), and CREATE INDEX USING GIN are all built into core PostgreSQL. If you're doing WHERE column ILIKE '%term%' on any table with more than a few thousand rows, you're leaving massive performance on the table.

You do give up distributed search across shards, complex analyzers for CJK languages, and aggregation/faceted search pipelines. For a product search bar, blog search, or internal tool — PostgreSQL is enough.

Benchmark 4: Pub/Sub with LISTEN/NOTIFY

The idea: Use PostgreSQL's native LISTEN/NOTIFY for pub/sub messaging, triggered automatically on INSERT via a database trigger.

-- Trigger that fires pg_notify on every new message
CREATE OR REPLACE FUNCTION notify_message() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify(NEW.channel, NEW.payload::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_notify
    AFTER INSERT ON messages
    FOR EACH ROW EXECUTE FUNCTION notify_message();
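
The trigger covers the publishing side. On the receiving side, a dedicated connection issues LISTEN and handles notification events. The sketch below assumes a client shaped like node-postgres' `Client`, which emits a `notification` event carrying `channel` and `payload`. The `ListenClient` interface and the `quoteIdent` and `subscribe` helpers are hypothetical names for illustration.

```typescript
interface PgNotification {
  channel: string;
  payload?: string;
}

// Minimal listener interface (an assumption for this sketch).
interface ListenClient {
  query(text: string): Promise<unknown>;
  on(event: "notification", listener: (msg: PgNotification) => void): unknown;
}

// LISTEN takes an identifier, not a bind parameter, so quote it explicitly.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Subscribe to one channel on a dedicated, long-lived connection.
async function subscribe(
  client: ListenClient,
  channel: string,
  onMessage: (payload: string) => void,
): Promise<void> {
  // Register the handler first so nothing is missed once LISTEN takes effect.
  client.on("notification", (msg) => {
    if (msg.channel === channel) onMessage(msg.payload ?? "");
  });
  await client.query(`LISTEN ${quoteIdent(channel)}`);
}
```

Notifications are only delivered to sessions that are listening at the time, so a subscriber that reconnects has to catch up from the messages table itself.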

Results (200 Virtual Users)

Mode	p50	p95	avg	req/s
Baseline (poll-based)	1.99ms	6.04ms	2.84ms	1,116/s
Modded (LISTEN/NOTIFY)	1.65ms	4.80ms	2.13ms	1,131/s

Here we have a 20% improvement at p95. The trigger-based approach does more work per INSERT (INSERT + NOTIFY), but the reduced round trips and better connection reuse patterns offset the overhead.

The Verdict

LISTEN/NOTIFY works for real-time features where you would otherwise reach for Redis pub/sub. The main limitations are payload size (just under 8,000 bytes in the default configuration) and the requirement for dedicated connections (incompatible with PgBouncer in transaction mode).

The Combined Workload: The Honest Test

Individual benchmarks are flattering. The real question: can one PostgreSQL instance handle caching, queues, search, and pub/sub simultaneously without degrading?

Results (All Four Workloads Running Together)

Mode	p50	p95	avg	req/s
Baseline	1.65ms	5.24ms	2.17ms	1,424/s
Modded	1.86ms	6.05ms	2.47ms	1,417/s

Under combined load, the baseline marginally outperforms the modded setup. The modded PostgreSQL does more work per operation — maintaining GIN indexes, firing triggers, running pg_cron in the background. When all these features are active simultaneously, the overhead is measurable: about 15% higher p95 latency.

But both setups stay comfortably under 10ms at p95. For most web applications, that's more than good enough.

What I Learned

After running all these benchmarks, here's what I would tell a team evaluating whether to "just use Postgres":

Do it for full-text search: Switching from ILIKE to tsvector with a GIN index is a roughly 10x improvement in p95 latency that requires zero extensions. This is the single highest-ROI change in the entire PostgreSQL ecosystem, and most developers don't know it exists.
Do it for job queues: SKIP LOCKED is production-ready and eliminates RabbitMQ for simple "process this job" workloads. Use a library like pg-boss or river rather than rolling your own.
Consider it for caching: UNLOGGED tables cut p95 latency by about 12-13% compared with regular tables in these benchmarks. If sub-millisecond latency is not a hard requirement (and for most web apps, it is not), you can drop Redis entirely.
Be honest about the overhead: With all four workloads running together, the modded instance showed about 15% higher p95 latency than the baseline. Whether that matters depends on your latency budget.
Know where to stop: PostgreSQL won't match Redis for sub-millisecond caching, Kafka for millions of messages per second, or Elasticsearch for distributed multi-node search with complex analyzers. The line is at extreme throughput or extreme specialization.

The honest conclusion is not "PostgreSQL does everything." It is: for most applications, a single well-configured PostgreSQL instance handles 80% of what you would otherwise need three to five additional services for. That is less infrastructure to deploy, monitor, and maintain — and fewer things to break at 3 AM.

Enterprise-scale applications processing millions of messages per second, serving sub-millisecond cache hits to millions of concurrent users, or running distributed search across terabytes of documents will still need specialized tools. Those tools exist for a reason, and at that scale the operational cost of running them is justified by the performance you get back.

But most of us aren't building at that scale — and may never need to. Starting with PostgreSQL for these roles means you ship faster with fewer moving parts. If and when you outgrow what PostgreSQL can handle, your benchmarks will tell you exactly which role needs to be extracted into a dedicated service. That is a much better position than starting with five services on day one because you assumed you would need them.

The benchmark project is open source if you want to reproduce these results or adapt the tests for your own workload.

You can find more of my writing at site.aaronhsyong.com.

How to Store Data Locally with Isar in Flutter

Atuoha Anthony — Fri, 19 Sep 2025 13:09:48 +0000

When building Flutter applications, managing local data efficiently is critical. You want a database that is lightweight, fast, and easy to integrate, especially if your app will work offline. Isar is one such database. It is a high-performance, easy-to-use NoSQL embedded database tailored for Flutter. With features like reactive queries, indexes, relationships, migrations, and transactions, Isar makes local data persistence both powerful and developer-friendly.

In this article, you’ll learn how to integrate Isar into a Flutter project, set up a data model, and perform the full range of CRUD (Create, Read, Update, Delete) operations. To make this practical, you’ll build a simple to-do app that allows users to create, view, update, and delete tasks.

Prerequisites
What We Are Building
How to Set Up Isar in a Flutter Project
How to Create the Task Model
How to Build the Repository for CRUD Operations
How to Integrate CRUD into the Flutter UI
Beyond CRUD: Advanced Features of Isar
Conclusion

Prerequisites

Before starting, ensure you have the following:

Flutter SDK installed (version 3.0 or above recommended).
Check your version with:
```
 flutter --version
```
Dart knowledge: Familiarity with Dart syntax, classes, and async programming.
Flutter basics: You should know how to set up a Flutter project, build widgets, and use FutureBuilder or setState for state management.
Code editor: VS Code or Android Studio is recommended.

If these are in place, we are ready to begin.

What We Are Building

We will create a Task Manager App that lets users:

Add new tasks.
View all tasks in a list.
Update existing tasks.
Delete tasks.

By the end, you will have a fully functioning CRUD app built with Flutter and Isar.

How to Set Up Isar in a Flutter Project

Step 1: Add dependencies

Open your pubspec.yaml file and add the following:

dependencies:
  flutter:
    sdk: flutter
  isar: ^3.1.0
  isar_flutter_libs: ^3.1.0
  path_provider: ^2.1.0

dev_dependencies:
  isar_generator: ^3.1.0
  build_runner: any

isar: The core Isar package.
isar_flutter_libs: Required for Flutter integration.
path_provider: Locates the app's documents directory, where the database file is stored.
isar_generator: Used to generate code for your models.
build_runner: Runs the code generator.

Run:

flutter pub get

Step 2: Create and initialize Isar

Create a file named isar_setup.dart. This will handle the opening of the Isar database.

import 'package:isar/isar.dart';
import 'package:path_provider/path_provider.dart';
import 'task.dart'; // we will create this model soon

late final Isar isar;

Future<void> initializeIsar() async {
  final dir = await getApplicationDocumentsDirectory();
  isar = await Isar.open(
    [TaskSchema],
    directory: dir.path,
  );
}

Explanation:

getApplicationDocumentsDirectory() provides a storage location for the database file.
Isar.open() initializes the database and registers our Task schema.
late final Isar isar; ensures we can access the database instance globally after initialization.

How to Create the Task Model

Now let’s define our data model for tasks. Create a file named task.dart.

import 'package:isar/isar.dart';

part 'task.g.dart';

@Collection()
class Task {
  Id id = Isar.autoIncrement; // auto-incrementing primary key

  late String name;

  late DateTime createdAt;

  Task(this.name) : createdAt = DateTime.now();
}

Explanation:

@Collection() tells Isar this class represents a database collection.
Id id = Isar.autoIncrement; creates a unique identifier automatically.
late String name; stores the task name.
late DateTime createdAt; stores the creation timestamp.
part 'task.g.dart'; links to the generated code, which will be created after running the code generator.

Generate the code with:

flutter pub run build_runner build

This generates task.g.dart, which contains the necessary schema code.

How to Build the Repository for CRUD Operations

Create a new file called task_repository.dart. This will house the methods for interacting with the database.

import 'package:isar/isar.dart';
import 'task.dart';
import 'isar_setup.dart';

class TaskRepository {
  Future<void> addTask(String name) async {
    final task = Task(name);
    await isar.writeTxn(() async {
      await isar.tasks.put(task);
    });
  }

  Future<List<Task>> getAllTasks() async {
    return await isar.tasks.where().findAll();
  }

  Future<void> updateTask(Task task) async {
    await isar.writeTxn(() async {
      await isar.tasks.put(task);
    });
  }

  Future<void> deleteTask(Task task) async {
    await isar.writeTxn(() async {
      await isar.tasks.delete(task.id);
    });
  }
}

Explanation:

addTask: Creates a new task and saves it.
getAllTasks: Reads all tasks from the database.
updateTask: Updates an existing task by calling .put() again.
deleteTask: Removes a task by its id.
isar.writeTxn: Ensures operations run inside a transaction for safety and consistency.

How to Integrate CRUD into the Flutter UI

Now, let’s connect everything inside main.dart.

import 'package:flutter/material.dart';
import 'isar_setup.dart';
import 'task_repository.dart';
import 'task.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await initializeIsar(); // initialize Isar before runApp
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: TaskListScreen(),
    );
  }
}

class TaskListScreen extends StatefulWidget {
  @override
  _TaskListScreenState createState() => _TaskListScreenState();
}

class _TaskListScreenState extends State<TaskListScreen> {
  final TaskRepository _taskRepository = TaskRepository();
  late Future<List> _tasksFuture;

  @override
  void initState() {
    super.initState();
    _tasksFuture = _taskRepository.getAllTasks();
  }

  Future<void> _addTask() async {
    await _taskRepository.addTask('New Task');
    setState(() {
      _tasksFuture = _taskRepository.getAllTasks();
    });
  }

  Future<void> _deleteTask(Task task) async {
    await _taskRepository.deleteTask(task);
    setState(() {
      _tasksFuture = _taskRepository.getAllTasks();
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Isar CRUD Example')),
      body: FutureBuilder<List>(
        future: _tasksFuture,
        builder: (context, snapshot) {
          if (snapshot.connectionState == ConnectionState.waiting) {
            return Center(child: CircularProgressIndicator());
          } else if (snapshot.hasError) {
            return Center(child: Text('Error: ${snapshot.error}'));
          } else {
            final tasks = snapshot.data ?? [];
            if (tasks.isEmpty) {
              return Center(child: Text('No tasks yet.'));
            }
            return ListView.builder(
              itemCount: tasks.length,
              itemBuilder: (context, index) {
                final task = tasks[index];
                return ListTile(
                  title: Text(task.name),
                  subtitle: Text('Created at: ${task.createdAt}'),
                  trailing: IconButton(
                    icon: Icon(Icons.delete),
                    onPressed: () => _deleteTask(task),
                  ),
                );
              },
            );
          }
        },
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _addTask,
        child: Icon(Icons.add),
      ),
    );
  }
}

Explanation:

initializeIsar(): Ensures the database is ready before the app runs.
_tasksFuture: Holds a future of the list of tasks.
_addTask: Adds a new task and refreshes the list.
_deleteTask: Deletes a task and refreshes the list.
FutureBuilder: Automatically rebuilds the UI when the future completes.
ListView.builder: Displays all tasks dynamically.

This gives you a simple yet complete CRUD app using Isar.

Beyond CRUD: Advanced Features of Isar

Once you are comfortable with CRUD, Isar provides advanced tools to optimize and extend your application:

Reactive Queries:
Instead of using FutureBuilder, you can listen for changes directly.
```
 final stream = isar.tasks.where().watch(fireImmediately: true);
```

Indexes:
Improve query performance by indexing fields.

 @Collection()
 class Task {
   Id id = Isar.autoIncrement;

   @Index()
   late String name;
 }

Relations:
Link one collection to another (for example, Project with many Tasks).
Custom Queries:
Perform complex filtering, sorting, and pagination.
Migrations:
Safely evolve your schema as the app grows.
Batch Operations:
Insert or update many records in one transaction.

Conclusion

We built a simple Flutter to-do app with Isar that supports creating, reading, updating, and deleting tasks. Along the way, we learned how to:

Add Isar dependencies.
Define a model with annotations.
Generate schema code.
Implement CRUD operations in a repository.
Connect Isar to the Flutter UI.

With its performance, developer-friendly API, and advanced features, Isar is an excellent choice for local persistence in Flutter applications.

For further learning, consult the official Isar documentation.

How to Handle MongoDB Migrations with ts-migrate-mongoose

Orim Dominic Adah — Wed, 27 Nov 2024 14:06:04 +0000

Database migrations are modifications made to a database. These modifications may include changing the schema of a table, updating the data in a set of records, seeding data or deleting a range of records.

Database migrations are usually run before an application starts, and each migration is applied only once per database. Database migration tools save a history of the migrations that have run in a database so that they can be tracked later.

In this article, you’ll learn how to set up and run database migrations in a minimal Node.js API application. We will use ts-migrate-mongoose and an npm script to create a migration and seed data into a MongoDB database. ts-migrate-mongoose supports running migration scripts from TypeScript code as well as CommonJS code.

ts-migrate-mongoose is a migration framework for Node.js projects that use mongoose as the object-data mapper. It provides a template for writing migration scripts. It also provides a configuration to run the scripts programmatically and from the CLI.

How to Set Up the Project
How to Configure ts-migrate-mongoose for the Project
How to Seed User Data with ts-migrate-mongoose
How to Build an API Endpoint to Fetch Seeded Data
Conclusion

How to Set Up the Project

To use ts-migrate-mongoose for database migrations, you need to have the following:

A Node.js project with mongoose installed as a dependency.
A MongoDB database connected to the project.
MongoDB Compass (optional, for viewing the changes in the database).

For convenience, a starter repository is available to clone from ts-migrate-mongoose-starter-repo. Clone the repository, fill in the environment variables, and start the application by running the npm start command.

Visit http://localhost:8000 with a browser or an API client such as Postman and the server will return a "Hello there!" text to show that the starter application runs as expected.

How to Configure ts-migrate-mongoose for the Project

To configure ts-migrate-mongoose for the project, install ts-migrate-mongoose with this command:

npm install ts-migrate-mongoose

ts-migrate-mongoose allows configuration with a JSON file, a TypeScript file, a .env file or via the CLI. It is advisable to use a .env file because the content of the configuration may contain a database password and it is not proper to have that exposed to the public. .env files are usually hidden via .gitignore files so they are more secure to use. This project will use a .env file for the ts-migrate-mongoose configuration.

The file should contain the following keys and their values:

MIGRATE_MONGO_URI - the URI of the Mongo database. It is the same as the database URL.
MIGRATE_MONGO_COLLECTION - the name of the collection (or table) which migrations should be saved in. The default value is migrations which is what is used in this project. ts-migrate-mongoose saves migrations to MongoDB.
MIGRATE_MIGRATIONS_PATH - the path to the folder for storing and reading migration scripts. The default value is ./migrations which is what is used in this project.

How to Seed User Data with ts-migrate-mongoose

We have been able to create a project and connect it successfully to a Mongo database. At this point, we want to seed user data into the database. We need to:

Create a users collection (or table)
Use ts-migrate-mongoose to create a migration script to seed data
Use ts-migrate-mongoose to run the migration to seed the user data into the database before the application starts

1. Create a users Collection using Mongoose

A Mongoose schema can be used to create a users collection (or table). User documents (or records) will have the following fields (or columns): email, favouriteEmoji and yearOfBirth.

To create a Mongoose schema for the user collection, create a user.model.js file in the root of the project containing the following code snippet:

const mongoose = require("mongoose");

const userSchema = new mongoose.Schema(
  {
    email: {
      type: String,
      lowercase: true,
      required: true,
    },
    favouriteEmoji: {
      type: String,
      required: true,
    },
    yearOfBirth: {
      type: Number,
      required: true,
    },
  },
  {
    timestamps: true,
  }
);

module.exports.UserModel = mongoose.model("User", userSchema);

2. Create a Migration Script with ts-migrate-mongoose

ts-migrate-mongoose provides CLI commands which can be used to create migration scripts.

Running npx migrate create <name-of-script> in the root folder of the project will create a script in the MIGRATE_MIGRATIONS_PATH folder (./migrations in our case). <name-of-script> is the name we want the migration script file to have when it is created.

To create a migration script to seed user data, run:

npx migrate create seed-users

The command will create a file in the ./migrations folder with a name in the form <timestamp>-seed-users.ts. The file will have the following code snippet content:

// Import your models here

export async function up (): Promise<void> {
  // Write migration here
}

export async function down (): Promise<void> {
  // Write migration here
}

The up function is used to run the migration. The down function is used to reverse whatever the up function executes, if need be. In our case, we are trying to seed users into the database. The up function will contain code to seed users into the database and the down function will contain code to delete users created in the up function.

If the database is inspected with MongoDB Compass, the migrations collection will have a document that looks like this:

{
  "_id": ObjectId("6744740465519c3bd9c1a7d1"),
  "name": "seed-users",
  "state": "down",
  "createdAt": 2024-11-25T12:56:36.316+00:00,
  "updatedAt": 2024-11-25T12:56:36.316+00:00,
  "__v": 0
}

The state field of the migration document is set to down. After it runs successfully, it changes to up.

You can update the code in ./migrations/<timestamp>-seed-users.ts to the one in the snippet below:

require("dotenv").config() // load env variables
const db = require("../db.js")
const { UserModel } = require("../user.model.js");

const seedUsers = [
  { email: "john@email.com", favouriteEmoji: "🏃", yearOfBirth: 1997 },
  { email: "jane@email.com", favouriteEmoji: "🍏", yearOfBirth: 1998 },
];

export async function up (): Promise<void> {
  await db.connect(process.env.MONGO_URI)
  await UserModel.create(seedUsers);
}

export async function down (): Promise<void> {
  await db.connect(process.env.MONGO_URI)
  // deleteMany removes every seeded user matched by email
  await UserModel.deleteMany({
    email: {
      $in: seedUsers.map((u) => u.email),
    },
  });
}

3. Run the Migration Before the Application Starts

ts-migrate-mongoose provides us with CLI commands to run the up and down function of migration scripts.

With npx migrate up <name-of-script> we can run the up function of a specific script. With npx migrate up we can run the up function of all scripts in the ./migrations folder with a state of down in the database.
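
As a rough mental model of that state check (a simplified in-memory sketch, not ts-migrate-mongoose's actual implementation), running all pending migrations amounts to the loop below. `MigrationRecord` and `migrateUp` are hypothetical names used only for illustration.

```typescript
type MigrationState = "up" | "down";

interface MigrationRecord {
  name: string;
  state: MigrationState;
  up: () => Promise<void>;
}

// Run up() for every record still in state "down", in the order given,
// and return the names of the migrations that ran.
async function migrateUp(records: MigrationRecord[]): Promise<string[]> {
  const ran: string[] = [];
  for (const record of records) {
    if (record.state !== "down") continue; // already applied, skip it
    await record.up();
    record.state = "up"; // flipped only after up() resolves
    ran.push(record.name);
  }
  return ran;
}
```

Calling `migrateUp` a second time on the same records returns an empty list, which mirrors why a migration whose state is already up is not run again.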

To run the migration before the application starts, we make use of npm scripts. npm scripts with a prefix of pre will run before a script without the pre prefix. For example, if there is a dev script and a predev script, whenever the dev script is run with npm run dev, the predev script will automatically run before the dev script is run.

We will use this feature of npm scripts to place the ts-migrate-mongoose command in a prestart script so that the migration will run before the start script.

Update the package.json file to have a prestart script that runs the ts-migrate-mongoose command for running the up function of migration scripts in the project.

  "scripts": {
    "prestart": "npx migrate up",
    "start": "node index.js"
  },

With this setup, when npm run start is executed to start the application, the prestart script will run to execute the migration using ts-migrate-mongoose and seed the database before the application starts.

You should have something similar to the snippet below after running npm run start:

Synchronizing database with file system migrations...
MongoDB connection successful
up: 1732543529744-seed-users.ts 
All migrations finished successfully

> ts-migrate-mongoose-starter-repo@1.0.0 start
> node index.js

MongoDB connection successful                      
Server listening on port 8000

Check out the seed-users branch of the repository to see the current status of the codebase at this point in the article.

How to Build an API Endpoint to Fetch Seeded Data

We can build an API endpoint to fetch the seeded users data in our database. In the server.js file, update the code to the one in the snippet below:

const { UserModel } = require("./user.model.js")

module.exports = async function (req, res) {
  const users = await UserModel.find({}) // fetch all the users in the database

  res.writeHead(200, { "Content-Type": "application/json" });
  return res.end(JSON.stringify({ // return a JSON representation of the fetched users data
    users: users.map((u) => ({
      email: u.email,
      favouriteEmoji: u.favouriteEmoji,
      yearOfBirth: u.yearOfBirth,
      createdAt: u.createdAt
    }))
  }, null, 2));
};

If we start the application and visit http://localhost:8000 using Postman or a browser, we get a JSON response similar to the one below:

{
  "users": [
    {
      "email": "john@email.com",
      "favouriteEmoji": "🏃",
      "yearOfBirth": 1997,
      "createdAt": "2024-11-25T14:18:55.416Z"
    },
    {
      "email": "jane@email.com",
      "favouriteEmoji": "🍏",
      "yearOfBirth": 1998,
      "createdAt": "2024-11-25T14:18:55.416Z"
    }
  ]
}

Notice that if the application is run again, the migration script does not run anymore because the state of the migration will now be up after it has run successfully.

Check out the fetch-users branch of the repository to see the current status of the codebase at this point in the article.

Conclusion

Migrations are useful when an application needs to seed initial data for testing, create administrative users, update the database schema by adding or removing columns, or update the values of columns in many records at once.

ts-migrate-mongoose can help provide a framework for running migrations for your Node.js applications if you use Mongoose with MongoDB.

How to Use Object Relational Mapping in Node.js – Optimize Database Interactions With Sequelize ORM

Oluwatobi — Wed, 16 Oct 2024 11:26:26 +0000

Databases play a vital role in the development of applications across mobile and web platforms. Adequate knowledge of data interactions between the application structure and the database is essential for storing relevant application data.

Object-relational mapping, as a programming concept, is an efficient standard protocol for facilitating seamless connection with databases. But what does it really mean, and how do you set it up as a developer? We’ll answer these questions and highlight more about object-relational mapping.

Here are the prerequisites:

Knowledge of Node.js
Familiarity with the Express framework
An installed MySQL database

Table of Contents
What is an ORM?
How to Set Up Your Node.js Server
How to Integrate Relevant Packages
Demo Project
Additional Information

What is an ORM?

Object Relational Mapping (ORM) is a database communication concept in programming that involves the abstraction of data types as compatible object-oriented programming variables. It simply eliminates the use of database-defined queries and storage types to allow ease of creating databases via the programming languages.

Its use has been widely adopted in the tech space because it has several advantages over conventional database query methods. Here are some of them:

It reduces the risk of data manipulation: SQL and NoSQL injection attacks supply malicious query syntax as input, which can compromise database security. Having an ORM in place adds input validation: it spells out the expected syntax of each input value and processes it accordingly.
Ease of database communication: an ORM simplifies working with a database because you don't have to learn a separate query language. The schema is described in an object-oriented fashion in the application language, and the ORM translates that code into queries compatible with the database.
This feature also makes code easier to port: you maintain a single database integration code base and can change the underlying database with little rework. It is flexible and can be used with many databases.
It also includes additional features for database interactions, such as database migrations and version control. Now that we've seen some of its benefits, let's highlight popular ORM tools used globally.

Here are the popular ORM tools:

For this article, we’ll be streamlining our ORM use cases to a basic Node.js project linked to a MySQL database. We’ll use the Sequelize ORM as the tool of choice.

With an average package download of 8.5 million monthly and an active development community, Sequelize boasts robust features that integrate databases with backend applications. It also provides user-oriented documentation that guides you through setting up and using the tool.

Here is a link to the documentation. It also supports MySQL, DB2, SQLite, and Microsoft SQL Server, and it offers features such as read replication, lazy loading, and transaction support.

Next, we’ll set up our web application and install Sequelize to connect us to a MySQL database hosted locally.

How to Set Up Your Node.js Server

In this section, you'll set up the Node server. Navigate to the command line and execute npm init. This command creates a new Node project structure for you.

Next, install the Express package – this will serve as the backend framework. You can do this by running the npm i express command.

How to Integrate Relevant Packages

For the purpose of this tutorial, we'll install the Sequelize package from npm in our Node application in order to set up the ORM communication with the database.

To set this up, execute npm i sequelize.

We'll use a locally hosted MySQL database. To connect to it, we need an npm database driver package. In this case, we'll install mysql2. Here is a link to the package.

Run npm i mysql2 to install it.

Let’s move on to configuring the connection to the database and building our demo project.

Demo Project

In this section we’ll build a simple backend server that performs Create-Read-Update-Delete operations, with the Sequelize library serving as the connection pipeline.

In order to begin the project, we’ll have to set up the database connection for our application. We’ll create a database connection file and set up our database credentials. You can name the file SequelizeConfig.

module.exports = {
    HOST: "localhost",
    USER: "root",
    PASSWORD: "",
    DB: "sequel",
    dialect: "mysql"
}

In the code above, the database credentials were specified, along with the host address. In our case, the database is locally hosted, so localhost is the default host.

The database login details were also provided. The user here is root, while the password was set to an empty string. This should be changed to ensure database security. I also created an empty database named "sequel" beforehand.

The dialect refers to the type of database the user intends to use. In our case, the dialect is MySQL. Note that this can also be replicated on a cloud hosted database with the credentials obtained. With that, let's integrate the connection file with the application.

const SequelConfig = require('../config/sequelize');
const Sequelize = require('sequelize');

// Create the Sequelize instance from the credentials in the config file
const sequelize = new Sequelize(SequelConfig.DB, SequelConfig.USER, SequelConfig.PASSWORD, {
    host: SequelConfig.HOST,
    dialect: SequelConfig.dialect
});

In order to facilitate a connection to the database, the variables in the config file were imported and initialized in the Sequelize setup file.



const db = {};

db.Sequelize = Sequelize;

db.sequelize = sequelize;

db.user = require('../model/user.model')(sequelize, Sequelize);

db.token = require('../model/token.model')(sequelize, Sequelize)

module.exports= db;

The file above imports the config file created previously and initializes the Sequelize library. The code then reads the database details from the config file and, when executed, connects to that database.

Furthermore, the various database models, which will be discussed subsequently, are registered against the database, and each one generates a SQL database table.

To get this up and running, the sequelize.sync() method is invoked on the exported db object. It creates the tables for the registered models if they don't exist yet. Any error encountered is logged.

// Create any missing tables for the registered models
db.sequelize.sync().then(() => {
  console.log('Database synchronized');
}).catch(err => {
  console.error(err)
})

We’ll go on to discuss the database models.

Models

const Sequelize = require("sequelize");

module.exports = (sequelize) => {
  // Return the model so the caller (db.user) receives it
  return sequelize.define("user", {
    firstName: {
      type: Sequelize.DataTypes.STRING,
      allowNull: false
    },
    lastName: {
      type: Sequelize.DataTypes.STRING,
      allowNull: false
    },
    email: {
      type: Sequelize.DataTypes.STRING,
      allowNull: false,
      unique: true
    },
    password: {
      type: Sequelize.DataTypes.STRING,
      allowNull: false
    },
    role: {
      type: Sequelize.DataTypes.STRING,
      allowNull: false
    }
  });
};

In the code above, the user model was defined in the Sequelize ORM and its fields were specified: firstName, lastName, email, password, and role. The type of data each field accepts was also specified.

The model also uses unique: true to keep email addresses unique, and allowNull: false to prevent required fields from being left empty.

On execution of the application, the Sequelize ORM creates an SQL equivalent of the model as a data table.

Next, we’ll work on the CRUD functions in Node.js.

Create Operation

const createUser = async (userInfo) => {
  // Check if the email already exists in the database
  const ifEmailExists = await User.findOne({ where: { email: userInfo.email } });

  if (ifEmailExists) {
    throw new ApiError('Email has already been registered');
  }

  // Create the new user; validation or uniqueness errors propagate to the caller
  const newUser = await User.create(userInfo);

  return newUser; // Return the created user object
};

The function above shows the controller function for creating user entries in the Express server.

The function is asynchronous, so it can await each database call before moving on to the next step. The code checks that the user's email doesn't already exist in the database before creating a new user.

In addition, the model itself enforces that each email is unique. If the user details are stored successfully, the newly created user object is returned to the caller. Any error encountered stops the function and propagates to the caller.

Read Operation

const FetchUser = async (userId) => {
  let userDets;

  if (userId) {
    // Fetch a single user by ID if userId is provided
    userDets = await User.findOne({ where: { id: userId } });

    // Check if the user exists
    if (!userDets) {
      throw new ApiError(httpStatus.NOT_FOUND, 'User not found');
    }
  } else {
    // Fetch all users if no userId is provided
    userDets = await User.findAll();

    // Check if any users were found
    if (userDets.length === 0) {
      throw new ApiError(httpStatus.NOT_FOUND, 'No users found');
    }
  }

  return userDets; // Return the user (or the list of users)
};

The read operation fetches the requested data and returns it without modification. The user ID, which should be unique, is used to look up a specific user. When no ID is passed, the function returns all the users stored in the database.

In case the requested query is not found, an appropriate error code is generated.

Update Operation



const updateUser = async (userId, userDetails) => {
  // First, find the user by their ID
  const user = await User.findOne({ where: { id: userId } });

  if (!user) {
    throw new ApiError(httpStatus.BAD_REQUEST, "User doesn't exist");
  }

  // Update the user with the new details
  await User.update(userDetails, { where: { id: userId } });

  // Fetch the updated user to return it
  const updatedUser = await User.findOne({ where: { id: userId } });

  console.log('Updated user:', updatedUser); // Log the updated user

  return updatedUser; // Return the updated user object
};

The update operation aims to modify the data entered in previous operations. That is, to update some data fields.

In the case of Sequelize, the update method is invoked. For this to succeed, the particular user to be edited must first be identified. The code above then fetches the updated record again and returns it as the result of a successful request.

Delete Operation



const deleteUser = async (userId) => {
  const user = await User.findOne({ where: { id: userId } });

  if (!user) {
    throw new ApiError(httpStatus.BAD_REQUEST, "User doesn't exist");
  }

  // Delete the user
  await user.destroy();

  console.log('Deleted user:', user); // Log the deleted user

  return user; // Return the deleted user object (useful for confirmation)
};

The delete operation is invoked when data in the database table needs to be deleted. Sequelize provides the destroy method for this. Called on a model instance, it deletes that specific user. The function above then logs and returns the deleted user as confirmation.

Additional Information

So far, we have integrated an ORM library to serve as a connection between our backend application and our relational database. We also touched on database migrations and walked through CRUD operations. To learn more, you can explore the documentation and use it to build more complex projects, as hands-on learning is much encouraged.

Feel free to reach out to me on my blog and check out my other articles here. Till next time, keep on coding!

How to Work with SQLite in Python – A Handbook for Beginners

Ashutosh Krishna — Wed, 02 Oct 2024 09:44:37 +0000

SQLite is one of the most popular relational database management systems (RDBMS). It’s lightweight, meaning that it doesn’t take up much space on your system. One of its best features is that it’s serverless, so you don’t need to install or manage a separate server to use it.

Instead, it stores everything in a simple file on your computer. It also requires zero configuration, so there’s no complicated setup process, making it perfect for beginners and small projects.

SQLite is a great choice for small to medium applications because it’s easy to use, fast, and can handle most tasks that bigger databases can do, but without the hassle of managing extra software. Whether you're building a personal project or prototyping a new app, SQLite is a solid option to get things up and running quickly.

In this tutorial, you'll learn how to work with SQLite using Python. Here’s what we’re going to cover in this tutorial:

How to Set Up Your Python Environment
How to Create an SQLite Database
How to Create Database Tables
How to Insert Data into a Table
How to Query Data
How to Update and Delete Data
How to Use Transactions
How to Optimize SQLite Query Performance with Indexing
How to Handle Errors and Exceptions
How to Export and Import Data [Bonus Section]
Wrapping Up

This tutorial is perfect for anyone who wants to get started with databases without diving into complex setups.

How to Set Up Your Python Environment

Before working with SQLite, let’s ensure your Python environment is ready. Here’s how to set everything up.

Installing Python

If you don’t have Python installed on your system yet, you can download it from the official Python website. Follow the installation instructions for your operating system (Windows, macOS, or Linux).

To check if Python is installed, open your terminal (or command prompt) and type:

python --version

This should show the current version of Python installed. If it’s not installed, follow the instructions on the Python website.

Installing SQLite3 Module

The good news is that SQLite3 comes built-in with Python! You don’t need to install it separately because it’s included in the standard Python library. This means you can start using it right away without any additional setup.

How to Create a Virtual Environment (Optional but Recommended)

It’s a good idea to create a virtual environment for each project to keep your dependencies organized. A virtual environment is like a clean slate where you can install packages without affecting your global Python installation.

To create a virtual environment, follow these steps:

First, open your terminal or command prompt and navigate to the directory where you want to create your project.
Run the following command to create a virtual environment:

python -m venv env

Here, env is the name of the virtual environment. You can name it anything you like.

Activate the virtual environment:

# Use this command on Windows:
env\Scripts\activate

# Use this command on macOS/Linux:
source env/bin/activate

After activating the virtual environment, you’ll notice that your terminal prompt changes, showing the name of the virtual environment. This means you’re now working inside it.

Installing Necessary Libraries

We’ll need a few additional libraries for this project. Specifically, we’ll use:

pandas: This is an optional library for handling and displaying data in tabular format, useful for advanced use cases.
faker: This library will help us generate fake data, like random names and addresses, which we can insert into our database for testing.

To install pandas and faker, run the following command:

pip install pandas faker

This installs both pandas and faker into your virtual environment. With this, your environment is set up, and you’re ready to start creating and managing your SQLite database in Python!

How to Create an SQLite Database

A database is a structured way to store and manage data so that it can be easily accessed, updated, and organized. It’s like a digital filing system that allows you to efficiently store large amounts of data, whether it’s for a simple app or a more complex system. Databases use tables to organize data, with rows and columns representing individual records and their attributes.

How SQLite Databases Work

Unlike most other database systems, SQLite is a serverless database. This means that it doesn’t require setting up or managing a server, making it lightweight and easy to use. All the data is stored in a single file on your computer, which you can easily move, share, or back up. Despite its simplicity, SQLite is powerful enough to handle many common database tasks and is widely used in mobile apps, embedded systems, and small to medium-sized projects.

How to Create a New SQLite Database

Let’s create a new SQLite database and learn how to interact with it using Python’s sqlite3 library.

Connecting to the Database

Since sqlite3 is pre-installed, you just need to import it in your Python script. To create a new database or connect to an existing one, we use the sqlite3.connect() method. This method takes the name of the database file as an argument. If the file doesn’t exist, SQLite will automatically create it.

import sqlite3

# Connect to the SQLite database (or create it if it doesn't exist)
connection = sqlite3.connect('my_database.db')

In this example, a file named my_database.db is created in the same directory as your script. If the file already exists, SQLite will just open the connection to it.

Creating a Cursor

Once you have a connection, the next step is to create a cursor object. The cursor is responsible for executing SQL commands and queries on the database.

# Create a cursor object
cursor = connection.cursor()

Closing the Connection

After you’ve finished working with the database, it’s important to close the connection to free up any resources. You can close the connection with the following command:

# Close the database connection
connection.close()

However, you should only close the connection once you’re done with all your operations.

When you run your Python script, a file named my_database.db will be created in your current working directory. You’ve now successfully created your first SQLite database!

How to Use Context Manager to Open and Close Connections

Python's with statement, also known as a context manager, gives you a cleaner way to handle the transaction on a database connection. When the block finishes without errors, any pending changes are committed automatically. If an exception is raised inside the block, they are rolled back. Note that for sqlite3 the with statement manages the transaction only: it does not close the connection, so you still need to call connection.close() (or wrap the connection in contextlib.closing()) once you're done with it.

Here’s how you can use the with statement to handle database connections:

import sqlite3

# Step 1: Use 'with' to connect to the database (or create one); the block commits on success and rolls back on error
with sqlite3.connect('my_database.db') as connection:

    # Step 2: Create a cursor object to interact with the database
    cursor = connection.cursor()

    print("Database created and connected successfully!")

# The 'with' block does not close the connection, so close it explicitly when you're done
connection.close()

From now on, we'll use the with statement in our upcoming code examples. This keeps the code concise and ensures that changes are committed or rolled back consistently.
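To make the behavior of with on a sqlite3 connection concrete, here is a minimal sketch. It uses an in-memory database and a throwaway demo table (both are assumptions chosen for illustration, not part of the tutorial's project) and wraps the connection in contextlib.closing() so that it is actually closed at the end.

```python
import sqlite3
from contextlib import closing

# closing() guarantees connection.close() runs when the outer block exits
with closing(sqlite3.connect(":memory:")) as connection:
    connection.execute("CREATE TABLE demo (value TEXT)")

    # Successful block: the pending INSERT is committed on exit
    with connection:
        connection.execute("INSERT INTO demo (value) VALUES (?)", ("kept",))

    # Failing block: the pending INSERT is rolled back on exit
    try:
        with connection:
            connection.execute("INSERT INTO demo (value) VALUES (?)", ("discarded",))
            raise ValueError("simulated failure")
    except ValueError:
        pass

    rows = connection.execute("SELECT value FROM demo").fetchall()
    print(rows)

# The connection is closed now, so further use raises ProgrammingError
try:
    connection.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True
print(connection_closed)
```

Only the row from the successful block survives, and the final check confirms that closing(), not the inner with, is what closed the connection.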

How to Create Database Tables

Now that we’ve created an SQLite database and connected to it, the next step is to create tables inside the database. A table is where we’ll store our data, organized in rows (records) and columns (attributes). For this example, we’ll create a table called Students to store information about students, which we’ll reuse in upcoming sections.

To create a table, we use SQL's CREATE TABLE statement. This command defines the table structure, including the column names and the data types for each column.

Here’s a simple SQL command to create a Students table with the following fields:

id: A unique identifier for each student (an integer).
name: The student's name (text).
age: The student's age (an integer).
email: The student's email address (text).

The SQL command to create this table would look like this:

CREATE TABLE Students (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    age INTEGER,
    email TEXT
);

We can execute this CREATE TABLE SQL command in Python using the sqlite3 library. Let’s see how to do that.

import sqlite3

# Use 'with' to connect to the SQLite database; changes are committed when the block exits without errors
with sqlite3.connect('my_database.db') as connection:

    # Create a cursor object
    cursor = connection.cursor()

    # Write the SQL command to create the Students table
    create_table_query = '''
    CREATE TABLE IF NOT EXISTS Students (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        age INTEGER,
        email TEXT
    );
    '''

    # Execute the SQL command
    cursor.execute(create_table_query)

    # Commit the changes
    connection.commit()

    # Print a confirmation message
    print("Table 'Students' created successfully!")

IF NOT EXISTS: This ensures that the table is only created if it doesn’t already exist, preventing errors if the table has been created before.
connection.commit(): This saves (commits) the changes to the database.

When you run the Python code above, it will create the Students table in the my_database.db database file. You’ll also see a message in the terminal confirming that the table has been created successfully.
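If you'd rather confirm the table from Python itself, one option is to query SQLite's built-in sqlite_master catalog and the PRAGMA table_info command. The sketch below uses an in-memory database (an assumption, so it runs without touching my_database.db) and also shows that running CREATE TABLE IF NOT EXISTS twice is harmless.

```python
import sqlite3

create_table_query = '''
CREATE TABLE IF NOT EXISTS Students (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    age INTEGER,
    email TEXT
);
'''

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Running the statement twice does not raise, thanks to IF NOT EXISTS
cursor.execute(create_table_query)
cursor.execute(create_table_query)

# sqlite_master lists every table stored in the database
cursor.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
    ("Students",),
)
table_row = cursor.fetchone()

# PRAGMA table_info returns one row per column; index 1 holds the column name
cursor.execute("PRAGMA table_info(Students)")
columns = [row[1] for row in cursor.fetchall()]

print(table_row)
print(columns)
connection.close()
```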

If you’re using Visual Studio Code, you can install the SQLite Viewer extension to view SQLite databases.

Data Types in SQLite and Their Mapping to Python

SQLite supports several data types, which we need to understand when defining our tables. Here’s a quick overview of common SQLite data types and how they map to Python types:

SQLite Data Type	Description	Python Equivalent
INTEGER	Whole numbers	`int`
TEXT	Text strings	`str`
REAL	Floating-point numbers	`float`
BLOB	Binary data (e.g., images, files)	`bytes`
NULL	Represents no value or missing data	`None`

In our Students table:

id is of type INTEGER, which maps to Python’s int.
name and email are of type TEXT, which map to Python’s str.
age is also of type INTEGER, mapping to Python’s int.
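The mapping can be checked with a quick round trip. The following sketch stores one value of each kind in a hypothetical samples table (the table and column names are made up for this illustration), then reads back both the Python types and the storage classes reported by SQLite's typeof() function.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Hypothetical table with one column per storage class
cursor.execute("CREATE TABLE samples (i INTEGER, t TEXT, r REAL, b BLOB, n TEXT)")
cursor.execute(
    "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
    (42, "hello", 3.14, b"\x00\x01", None),
)

# Python types of the values that come back from the database
row = cursor.execute("SELECT i, t, r, b, n FROM samples").fetchone()
python_types = [type(value).__name__ for value in row]

# Storage classes as SQLite itself reports them
storage_classes = cursor.execute(
    "SELECT typeof(i), typeof(t), typeof(r), typeof(b), typeof(n) FROM samples"
).fetchone()

print(python_types)
print(storage_classes)
connection.close()
```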

How to Insert Data into a Table

Now that we have our Students table created, it’s time to start inserting data into the database. In this section, we’ll cover how to insert both single and multiple records using Python and SQLite, and how to avoid common security issues like SQL injection by using parameterized queries.

How to Insert a Single Record

To insert data into the database, we use the INSERT INTO SQL command. Let’s start by inserting a single record into our Students table.

Here’s the basic SQL syntax for inserting a single record:

INSERT INTO Students (name, age, email) 
VALUES ('John Doe', 20, 'johndoe@example.com');

However, instead of writing SQL directly in our Python script with hardcoded values, we’ll use parameterized queries to make our code more secure and flexible. Parameterized queries help prevent SQL injection, a common attack where malicious users can manipulate the SQL query by passing harmful input.

Here’s how we can insert a single record into the Students table using a parameterized query:

import sqlite3

# Use 'with' so the transaction is committed on success and rolled back on error
with sqlite3.connect('my_database.db') as connection:
    cursor = connection.cursor()

    # Insert a record into the Students table
    insert_query = '''
    INSERT INTO Students (name, age, email) 
    VALUES (?, ?, ?);
    '''
    student_data = ('Jane Doe', 23, 'jane@example.com')

    cursor.execute(insert_query, student_data)

    # Commit the changes (the 'with' block would also commit on a successful exit)
    connection.commit()

    # The 'with' block handles commit/rollback; it does not close the connection
    print("Record inserted successfully!")

The ? placeholders represent the values to be inserted into the table. The actual values are passed as a tuple (student_data) in the cursor.execute() method.
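After a single insert, you often want the id that SQLite generated for the new row. A minimal sketch, assuming an in-memory copy of the Students table, reads it from the cursor's lastrowid attribute and uses it to fetch the row back.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute('''
CREATE TABLE Students (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    age INTEGER,
    email TEXT
);
''')

# Values are passed separately from the SQL text, one per ? placeholder
cursor.execute(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?);",
    ("Jane Doe", 23, "jane@example.com"),
)
new_id = cursor.lastrowid  # id generated for the row just inserted
connection.commit()

# A one-element tuple needs the trailing comma: (new_id,)
cursor.execute("SELECT name, age, email FROM Students WHERE id = ?;", (new_id,))
inserted = cursor.fetchone()

print(new_id, inserted)
connection.close()
```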

How to Insert Multiple Records

If you want to insert multiple records at once, you can use the executemany() method in Python. This method takes a list of tuples, where each tuple represents one record.

To make our example more dynamic, we can use the Faker library to generate random student data. This is useful for testing and simulating real-world scenarios.

from faker import Faker
import sqlite3

# Initialize Faker
fake = Faker(['en_IN'])

# Use 'with' so the transaction is committed on success and rolled back on error
with sqlite3.connect('my_database.db') as connection:
    cursor = connection.cursor()

    # Parameterized INSERT that is reused for every record
    insert_query = '''
    INSERT INTO Students (name, age, email) 
    VALUES (?, ?, ?);
    '''
    students_data = [(fake.name(), fake.random_int(
        min=18, max=25), fake.email()) for _ in range(5)]

    # Execute the query for multiple records
    cursor.executemany(insert_query, students_data)

    # Commit the changes
    connection.commit()

    # Print confirmation message
    print("Fake student records inserted successfully!")

In this code:

Faker(['en_IN']) creates a generator for the Indian English locale; passing a locale is optional. Its name(), random_int(), and email() methods produce the random names, ages, and emails for students.
cursor.executemany(): This method allows us to insert multiple records at once, making the code more efficient.
students_data: A list of tuples where each tuple represents one student’s data.

How to Handle Common Issues: SQL Injection

SQL injection is a security vulnerability where attackers can insert or manipulate SQL queries by providing harmful input. For example, an attacker might try to inject code like '; DROP TABLE Students; -- to delete the table.

By using parameterized queries (as demonstrated above), we avoid this issue. The ? placeholders in parameterized queries ensure that input values are treated as data, not as part of the SQL command, so text injected into a value cannot change the structure of the statement.
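Here is a small sketch of that protection in action. It assumes an in-memory copy of the Students table and passes the malicious string from the example above as the name value; the attacker email is made up for the illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute('''
CREATE TABLE Students (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    age INTEGER,
    email TEXT
);
''')

# Input an attacker might submit in a "name" field
malicious_name = "'; DROP TABLE Students; --"

# The placeholder binds the whole string as a value; it is never parsed as SQL
cursor.execute(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?);",
    (malicious_name, 20, "attacker@example.com"),
)
connection.commit()

# The string was stored verbatim and the table still exists
stored_name = cursor.execute("SELECT name FROM Students;").fetchone()[0]
table_count = cursor.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Students';"
).fetchone()[0]

print(stored_name == malicious_name, table_count)
connection.close()
```

By contrast, building the same statement with string formatting or concatenation would place the attacker's text directly into the SQL, which is exactly what parameterized queries avoid.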

How to Query Data

Now that we’ve inserted some data into our Students table, let’s learn how to retrieve the data from the table. We'll explore different methods for fetching data in Python, including fetchone(), fetchall(), and fetchmany().

To query data from a table, we use the SELECT statement. Here’s a simple SQL command to select all columns from the Students table:

SELECT * FROM Students;

This command retrieves all records and columns from the Students table. We can execute this SELECT query in Python and fetch the results.

How to Fetch All Records

Here’s how we can fetch all records from the Students table:

import sqlite3

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:

    # Create a cursor object
    cursor = connection.cursor()

    # Write the SQL command to select all records from the Students table
    select_query = "SELECT * FROM Students;"

    # Execute the SQL command
    cursor.execute(select_query)

    # Fetch all records
    all_students = cursor.fetchall()

    # Display results in the terminal
    print("All Students:")
    for student in all_students:
        print(student)

In this example, the fetchall() method retrieves all rows returned by the query as a list of tuples.

All Students:
(1, 'Jane Doe', 23, 'jane@example.com')
(2, 'Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net')
(3, 'Zayyan Arya', 20, 'yashawinibhakta@example.org')
(4, 'Hemani Shukla', 18, 'gaurikanarula@example.com')
(5, 'Warda Kara', 20, 'npatil@example.net')
(6, 'Mitali Nazareth', 19, 'sparekh@example.org')

How to Fetch a Single Record

If you want to retrieve only one record, you can use the fetchone() method:

import sqlite3

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:

    # Create a cursor object
    cursor = connection.cursor()

    # Write the SQL command to select all records from the Students table
    select_query = "SELECT * FROM Students;"

    # Execute the SQL command
    cursor.execute(select_query)

    # Fetch one record
    student = cursor.fetchone()

    # Display the result
    print("First Student:")
    print(student)

Output:

First Student:
(1, 'Jane Doe', 23, 'jane@example.com')

How to Fetch Multiple Records

To fetch a specific number of records, you can use fetchmany(size):

import sqlite3

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:

    # Create a cursor object
    cursor = connection.cursor()

    # Write the SQL command to select all records from the Students table
    select_query = "SELECT * FROM Students;"

    # Execute the SQL command
    cursor.execute(select_query)

    # Fetch three records
    three_students = cursor.fetchmany(3)

    # Display results
    print("Three Students:")
    for student in three_students:
        print(student)

Output:

Three Students:
(1, 'Jane Doe', 23, 'jane@example.com')
(2, 'Bahadurjit Sabharwal', 18, 'tristanupadhyay@example.net')
(3, 'Zayyan Arya', 20, 'yashawinibhakta@example.org')
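fetchmany() is most useful in a loop, where it lets you process a large result set in fixed-size batches instead of loading everything at once. The sketch below assumes an in-memory table with seven placeholder students (the names are invented for the example) and reads them three at a time until an empty list signals the end.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cursor.executemany(
    "INSERT INTO Students (name) VALUES (?)",
    [(f"Student {n}",) for n in range(1, 8)],
)
connection.commit()

cursor.execute("SELECT id, name FROM Students ORDER BY id")

batch_sizes = []
while True:
    # Each call continues where the previous one stopped
    batch = cursor.fetchmany(3)
    if not batch:  # an empty list means no rows are left
        break
    batch_sizes.append(len(batch))
    for student in batch:
        print(student)

print(batch_sizes)
connection.close()
```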

How to Use `pandas` for Better Data Presentation

For better data presentation, we can use the pandas library to create a DataFrame from our query results. This makes it easier to manipulate and visualize the data.

Here’s how to fetch all records and display them as a pandas DataFrame:

import sqlite3
import pandas as pd

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:
    # Write the SQL command to select all records from the Students table
    select_query = "SELECT * FROM Students;"

    # Use pandas to read SQL query directly into a DataFrame
    df = pd.read_sql_query(select_query, connection)

# Display the DataFrame
print("All Students as DataFrame:")
print(df)

Output:

All Students as DataFrame:
   id                  name  age                        email
0   1              Jane Doe   23             jane@example.com
1   2  Bahadurjit Sabharwal   18  tristanupadhyay@example.net
2   3           Zayyan Arya   20  yashawinibhakta@example.org
3   4         Hemani Shukla   18    gaurikanarula@example.com
4   5            Warda Kara   20           npatil@example.net
5   6       Mitali Nazareth   19          sparekh@example.org

The pd.read_sql_query() function executes the SQL query and directly returns the results as a pandas DataFrame.

How to Update and Delete Data

In this section, we’ll learn how to update existing records and delete records from our Students table using SQL commands in Python. This is essential for managing and maintaining your data effectively.

Updating Existing Records

To modify existing records in a database, we use the SQL UPDATE command. This command allows us to change the values of specific columns in one or more rows based on a specified condition.

For example, if we want to update a student's age, the SQL command would look like this:

UPDATE Students 
SET age = 21 
WHERE name = 'Jane Doe';

Now, let’s write Python code to update a specific student's age in our Students table.

import sqlite3

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:
    cursor = connection.cursor()

    # SQL command to update a student's age
    update_query = '''
    UPDATE Students 
    SET age = ? 
    WHERE name = ?;
    '''

    # Data for the update
    new_age = 21
    student_name = 'Jane Doe'

    # Execute the SQL command with the data
    cursor.execute(update_query, (new_age, student_name))

    # Commit the changes to save the update
    connection.commit()

    # Print a confirmation message
    print(f"Updated age for {student_name} to {new_age}.")

In this example, we used parameterized queries to prevent SQL injection.
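One thing the confirmation message does not tell you is whether any row actually matched the WHERE clause. As a hedged sketch, cursor.rowcount reports how many rows the last UPDATE changed. The in-memory table, the update_age() helper, and the sample names below are assumptions for illustration:

```python
import sqlite3

# Illustrative in-memory database with a single student
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);")
connection.execute(
    "INSERT INTO Students (name, age) VALUES (?, ?);", ("Jane Doe", 23))
connection.commit()


def update_age(connection, student_name, new_age):
    """Update a student's age and return the number of rows changed."""
    cursor = connection.cursor()
    cursor.execute(
        "UPDATE Students SET age = ? WHERE name = ?;", (new_age, student_name))
    connection.commit()
    return cursor.rowcount


matched = update_age(connection, "Jane Doe", 21)
missing = update_age(connection, "No Such Student", 30)
# The placeholder treats this whole string as a name, not as SQL
injected = update_age(connection, "x' OR '1'='1", 99)
print(matched, missing, injected)
```

The last call shows why the placeholders matter: the injection attempt is compared as a literal name, so it matches no rows instead of rewriting every age in the table.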

How to Delete Records from the Table

To remove records from a database, we use the SQL DELETE command. This command allows us to delete one or more rows based on a specified condition.

For example, if we want to delete a student named 'Jane Doe', the SQL command would look like this:

DELETE FROM Students 
WHERE name = 'Jane Doe';

Let’s write Python code to delete a specific student from our Students table using the with statement.

import sqlite3

# Use 'with' to connect to the SQLite database
with sqlite3.connect('my_database.db') as connection:
    cursor = connection.cursor()

    # SQL command to delete a student
    delete_query = '''
    DELETE FROM Students 
    WHERE name = ?;
    '''

    # Name of the student to be deleted
    student_name = 'Jane Doe'

    # Execute the SQL command with the data
    cursor.execute(delete_query, (student_name,))

    # Commit the changes to save the deletion
    connection.commit()

    # Print a confirmation message
    print(f"Deleted student record for {student_name}.")

Important Considerations

Conditions: Always use the WHERE clause when updating or deleting records to avoid modifying or removing all rows in the table. Without a WHERE clause, the command affects every row in the table.
Backup: It’s good practice to back up your database before performing updates or deletions, especially in production environments.

How to Use Transactions

A transaction is a sequence of one or more SQL operations that are treated as a single unit of work. In the context of a database, a transaction allows you to perform multiple operations that either all succeed or all fail together. This ensures that your database remains in a consistent state, even in the face of errors or unexpected issues.

For example, if you are transferring money between two bank accounts, you would want both the debit from one account and the credit to the other to succeed or fail together. If one operation fails, the other should not be executed to maintain consistency.

Why Use Transactions?

Atomicity: Transactions ensure that a series of operations are treated as a single unit. If one operation fails, none of the operations will be applied to the database.
Consistency: Transactions help maintain the integrity of the database by ensuring that all rules and constraints are followed.
Isolation: Each transaction operates independently of others, preventing unintended interference.
Durability: Once a transaction is committed, the changes are permanent, even in the event of a system failure.

When to Use Transactions?

You should use transactions when:

Performing multiple related operations that must succeed or fail together.
Modifying critical data that requires consistency and integrity.
Working with operations that can potentially fail, such as financial transactions or data migrations.

How to Manage Transactions in Python

In SQLite, transactions are managed using the BEGIN, COMMIT, and ROLLBACK commands. However, when using the sqlite3 module in Python, you typically manage transactions through the connection object.

Starting a Transaction

With the sqlite3 module's default settings, a transaction begins implicitly just before a data-modifying statement (INSERT, UPDATE, DELETE, or REPLACE) runs. To start a transaction explicitly, you can use the BEGIN command:

cursor.execute("BEGIN;")

However, it's usually unnecessary to start a transaction manually, because the sqlite3 module opens one automatically before statements that modify data.
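To see exactly when a transaction is open, you can inspect the connection's in_transaction attribute. This sketch assumes the sqlite3 module's default transaction handling and uses an in-memory database with a made-up Notes table:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Notes (body TEXT);")

states = []
states.append(connection.in_transaction)  # After CREATE TABLE

connection.execute("SELECT * FROM Notes;").fetchall()
states.append(connection.in_transaction)  # After a SELECT

connection.execute("INSERT INTO Notes (body) VALUES (?);", ("hello",))
states.append(connection.in_transaction)  # After an INSERT

connection.commit()
states.append(connection.in_transaction)  # After commit()

print(states)
```

Under those assumptions, only the INSERT leaves a transaction open, and commit() closes it again.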

How to Commit a Transaction

To save all changes made during a transaction, you use the commit() method. This makes all modifications permanent in the database.

connection.commit()

We've already used the commit() method in the earlier examples.

Rolling Back a Transaction

If something goes wrong and you want to revert the changes made during a transaction, you can use the rollback() method. This will undo all changes made since the transaction started.

connection.rollback()
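Here's a small sketch of rollback() in action, using an in-memory database and a hypothetical Notes table. It also shows that using the connection as a context manager rolls back automatically when an exception escapes the with block:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Notes (body TEXT);")
connection.execute("INSERT INTO Notes (body) VALUES ('kept');")
connection.commit()

# This insert is never committed, so rollback() discards it
connection.execute("INSERT INTO Notes (body) VALUES ('discarded');")
connection.rollback()

# 'with connection' commits on success and rolls back on an exception
try:
    with connection:
        connection.execute("INSERT INTO Notes (body) VALUES ('also discarded');")
        raise ValueError("simulated failure")
except ValueError:
    pass

rows = connection.execute("SELECT body FROM Notes;").fetchall()
print(rows)
```

Only the committed row survives both the manual rollback and the failed with block.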

Example of Using Transactions in Python

To illustrate the use of transactions in a real-world scenario, we’ll create a new table called Customers to manage customer accounts. In this example, we’ll assume each customer has a balance. We will add two customers to this table and perform a funds transfer operation between them.

First, let's create the Customers table and insert two customers:

import sqlite3

# Create the Customers table and add two customers
with sqlite3.connect('my_database.db') as connection:
    cursor = connection.cursor()

    # Create Customers table
    create_customers_table = '''
    CREATE TABLE IF NOT EXISTS Customers (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE,
        balance REAL NOT NULL
    );
    '''
    cursor.execute(create_customers_table)

    # Insert two customers
    cursor.execute(
        "INSERT INTO Customers (name, balance) VALUES (?, ?);", ('Ashutosh', 100.0))
    cursor.execute(
        "INSERT INTO Customers (name, balance) VALUES (?, ?);", ('Krishna', 50.0))

    connection.commit()

Now, let’s perform the funds transfer operation between Ashutosh and Krishna:

import sqlite3


def transfer_funds(from_customer, to_customer, amount):
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        try:
            # Start a transaction
            cursor.execute("BEGIN;")

            # Deduct amount from the sender
            cursor.execute(
                "UPDATE Customers SET balance = balance - ? WHERE name = ?;", (amount, from_customer))
            # Add amount to the receiver
            cursor.execute(
                "UPDATE Customers SET balance = balance + ? WHERE name = ?;", (amount, to_customer))

            # Commit the changes
            connection.commit()
            print(
                f"Transferred {amount} from {from_customer} to {to_customer}.")

        except Exception as e:
            # If an error occurs, rollback the transaction
            connection.rollback()
            print(f"Transaction failed: {e}")


# Example usage
transfer_funds('Ashutosh', 'Krishna', 80.0)

In this example, we first created a Customers table and inserted two customers, Ashutosh with a balance of ₹100, and Krishna with a balance of ₹50. We then performed a funds transfer of ₹80 from Ashutosh to Krishna. By using transactions, we ensure that both the debit from Ashutosh's account and the credit to Krishna's account are executed as a single atomic operation, maintaining data integrity in the event of any errors. If either UPDATE raises an exception, the except block rolls the transaction back, leaving both accounts unchanged. Note that this code does not check for insufficient funds: nothing stops a balance from going negative unless you add a check in Python or a CHECK constraint on the balance column.
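If you want a transfer to fail when the sender can't cover it, one option is to check the balance inside the transaction and raise an exception so the debit is rolled back. The sketch below is one way to do that, not the only one. InsufficientFunds and transfer_with_check() are hypothetical names, and the in-memory table stands in for the Customers table above:

```python
import sqlite3


class InsufficientFunds(Exception):
    """Hypothetical error raised when a debit would overdraw an account."""


def transfer_with_check(connection, from_customer, to_customer, amount):
    """Move funds, rolling back if the sender's balance would go negative."""
    try:
        with connection:  # Commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE Customers SET balance = balance - ? WHERE name = ?;",
                (amount, from_customer))
            row = connection.execute(
                "SELECT balance FROM Customers WHERE name = ?;",
                (from_customer,)).fetchone()
            if row is None or row[0] < 0:
                raise InsufficientFunds(f"{from_customer} cannot send {amount}")
            connection.execute(
                "UPDATE Customers SET balance = balance + ? WHERE name = ?;",
                (amount, to_customer))
        return True
    except InsufficientFunds as error:
        print(f"Transaction failed: {error}")
        return False


# Illustrative in-memory copy of the two customers
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Customers (name TEXT UNIQUE, balance REAL NOT NULL);")
connection.executemany(
    "INSERT INTO Customers VALUES (?, ?);",
    [("Ashutosh", 100.0), ("Krishna", 50.0)])
connection.commit()

first = transfer_with_check(connection, "Ashutosh", "Krishna", 80.0)
second = transfer_with_check(connection, "Ashutosh", "Krishna", 80.0)
balances = dict(connection.execute("SELECT name, balance FROM Customers;"))
print(first, second, balances)
```

The second transfer debits the account, sees a negative balance, and raises, so the with block undoes the debit and both balances stay where the first transfer left them.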

How to Optimize SQLite Query Performance with Indexing

Indexing is a powerful technique used in databases to improve query performance. An index is essentially a data structure that stores the location of rows based on specific column values, much like an index at the back of a book helps you quickly locate a topic.

Without an index, SQLite has to scan the entire table row by row to find the relevant data, which becomes inefficient as the dataset grows. By using an index, SQLite can jump directly to the rows you need, significantly speeding up query execution.

How to Populate the Database with Fake Data

To effectively test the impact of indexing, we need a sizable dataset. Instead of manually adding records, we can use the faker library to quickly generate fake data. In this section, we’ll generate 10,000 fake records and insert them into our Students table. This will simulate a real-world scenario where databases grow large, and query performance becomes important.

We will use the executemany() method to insert the records as below:

import sqlite3
from faker import Faker

# Initialize the Faker library
fake = Faker(['en_IN'])


def insert_fake_students(num_records):
    """Generate and insert fake student data into the Students table."""
    fake_data = [(fake.name(), fake.random_int(min=18, max=25),
                  fake.email()) for _ in range(num_records)]

    # Use 'with' to handle the database connection
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Insert fake data into the Students table
        cursor.executemany('''
        INSERT INTO Students (name, age, email) 
        VALUES (?, ?, ?);
        ''', fake_data)

        connection.commit()

    print(f"{num_records} fake student records inserted successfully.")


# Insert 10,000 fake records into the Students table
insert_fake_students(10000)

By running this script, 10,000 fake student records will be added to the Students table. In the next section, we'll query the database and compare the performance of queries with and without indexing.

How to Query Without Indexes

In this section, we’ll query the Students table without any indexes to observe how SQLite performs when there are no optimizations in place. This will serve as a baseline to compare the performance when we add indexes later.

Without indexes, SQLite performs a full table scan, which means that it must check every row in the table to find matching results. For small datasets, this is manageable, but as the number of records grows, the time taken to search increases dramatically. Let’s see this in action by running a basic SELECT query to search for a specific student by name and measure how long it takes.

First, we’ll query the Students table by looking for a student with a specific name. We’ll log the time taken to execute the query using Python’s time module to measure the performance.

import sqlite3
import time


def query_without_index(search_name):
    """Query the Students table by name without an index and measure the time taken."""

    # Connect to the database using 'with'
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Measure the start time
        start_time = time.perf_counter_ns()

        # Perform a SELECT query to find a student by name
        cursor.execute('''
        SELECT * FROM Students WHERE name = ?;
        ''', (search_name,))

        # Fetch all results (there should be only one or a few in practice)
        results = cursor.fetchall()

        # Measure the end time
        end_time = time.perf_counter_ns()

        # Calculate the total time taken
        elapsed_time = (end_time - start_time) / 1000

        # Display the results and the time taken
        print(f"Query completed in {elapsed_time:.5f} microseconds.")
        print("Results:", results)


# Example: Searching for a student by name
query_without_index('Ojasvi Dhawan')

Here’s the output:

Query completed in 1578.10000 microseconds.
Results: [(104, 'Ojasvi Dhawan', 21, 'lavanya26@example.com')]

By running the above script, you'll see how long it takes to search the Students table without any indexes. With roughly 10,000 records in the table, the query might take 1000-2000 microseconds, depending on your hardware. This may not seem slow for a small dataset, but performance will degrade as more records are added.

We use time.perf_counter_ns() to measure the query execution time in nanoseconds. This clock has high resolution, which makes it well suited to benchmarking short intervals. We divide the result by 1,000 to convert it to microseconds (µs) for readability.
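A single timing can be noisy because of caching and whatever else your machine is doing. As a rough sketch, you can repeat the query several times and report the median instead. The time_query() helper, the repeat count of 20, and the generated in-memory data are all assumptions for illustration:

```python
import sqlite3
import statistics
import time


def time_query(connection, sql, params=(), repeats=20):
    """Return the median run time of a query in microseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        connection.execute(sql, params).fetchall()
        timings.append((time.perf_counter_ns() - start) / 1000)
    return statistics.median(timings)


# Illustrative in-memory table with 10,000 generated rows
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT);")
connection.executemany(
    "INSERT INTO Students (name) VALUES (?);",
    ((f"Student {i}",) for i in range(10000)))
connection.commit()

median_us = time_query(
    connection, "SELECT * FROM Students WHERE name = ?;", ("Student 9999",))
print(f"Median over 20 runs: {median_us:.1f} microseconds")
```

The absolute numbers will differ from the ones in this article, since they depend on your hardware and on the database living in memory rather than on disk.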

Introducing the Query Plan

When working with databases, understanding how queries are executed can help you identify performance bottlenecks and optimize your code. SQLite provides a helpful tool for this called EXPLAIN QUERY PLAN, which allows you to analyze the steps SQLite takes to retrieve data.

In this section, we’ll introduce how to use EXPLAIN QUERY PLAN to visualize and understand the inner workings of a query—specifically, how SQLite performs a full table scan when no index is present.

Let’s use EXPLAIN QUERY PLAN to see how SQLite retrieves data from the Students table without any indexes. We’ll search for a student by name, and the query plan will reveal the steps SQLite takes to find the matching rows.

import sqlite3


def explain_query(search_name):
    """Explain the query execution plan for a SELECT query without an index."""

    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Use EXPLAIN QUERY PLAN to analyze how the query is executed
        cursor.execute('''
        EXPLAIN QUERY PLAN
        SELECT * FROM Students WHERE name = ?;
        ''', (search_name,))

        # Fetch and display the query plan
        query_plan = cursor.fetchall()

        print("Query Plan:")
        for step in query_plan:
            print(step)


# Example: Analyzing the query plan for searching by name
explain_query('Ojasvi Dhawan')

When you run this code, SQLite will return a breakdown of how it plans to execute the query. Here’s an example of what the output might look like:

Query Plan:
(2, 0, 0, 'SCAN Students')

This indicates that SQLite is scanning the entire Students table (a full table scan) to find the rows where the name column matches the provided value (Ojasvi Dhawan). Since there is no index on the name column, SQLite must examine each row in the table.
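The human-readable part of each plan row is its last element. If you only care about that description, a small helper can pull it out. This is a sketch against an in-memory table, and plan_details() is a hypothetical helper name:

```python
import sqlite3


def plan_details(connection, sql, params=()):
    """Return only the description text from EXPLAIN QUERY PLAN rows."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Illustrative in-memory table with no index on name
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);")

details = plan_details(
    connection, "SELECT * FROM Students WHERE name = ?;", ("Ojasvi Dhawan",))
print(details)
```

The exact wording of the description can vary between SQLite versions, so it's safer to look for the keyword SCAN or SEARCH than to compare the full string.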

How to Create an Index

Creating an index on a column allows SQLite to find rows more quickly during query operations. Instead of scanning the entire table, SQLite can use the index to jump directly to the relevant rows, significantly speeding up queries—especially those involving large datasets.

To create an index, use the following SQL command:

CREATE INDEX IF NOT EXISTS index_name ON table_name (column1, column2, ...);

In this example, we will create an index on the name column of the Students table. Here’s how you can do it using Python:

import sqlite3
import time


def create_index():
    """Create an index on the name column of the Students table."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # SQL command to create an index on the name column
        create_index_query = '''
        CREATE INDEX IF NOT EXISTS idx_name ON Students (name);
        '''

        # Measure the start time
        start_time = time.perf_counter_ns()

        # Execute the SQL command to create the index
        cursor.execute(create_index_query)

        # Measure the end time
        end_time = time.perf_counter_ns()

        # Commit the changes
        connection.commit()

        print("Index on 'name' column created successfully!")

        # Calculate the total time taken
        elapsed_time = (end_time - start_time) / 1000

        # Display the results and the time taken
        print(f"Query completed in {elapsed_time:.5f} microseconds.")


# Call the function to create the index
create_index()

Output:

Index on 'name' column created successfully!
Query completed in 102768.60000 microseconds.

Creating the index takes about 103 milliseconds (102768.6 microseconds), but it's a one-time cost, and every later query that filters on name benefits from it. In the following sections, we will query the database again to observe the performance improvements made possible by this index.
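If you want to confirm that the index exists, SQLite's PRAGMA index_list and PRAGMA index_info statements report the indexes on a table and the columns they cover. Here's a brief sketch using an in-memory copy of the table's schema, which is an assumption for illustration:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, email TEXT);")
connection.execute("CREATE INDEX IF NOT EXISTS idx_name ON Students (name);")

# Each index_list row is (seq, name, unique, origin, partial)
index_names = [
    row[1] for row in connection.execute("PRAGMA index_list('Students');")]

# Each index_info row is (seqno, cid, name)
indexed_columns = [
    row[2] for row in connection.execute("PRAGMA index_info('idx_name');")]

print(index_names, indexed_columns)
```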

How to Query with Indexes

In this section, we will perform the same SELECT query we executed earlier, but this time we will take advantage of the index we created on the name column of the Students table. We'll measure and log the execution time to observe the performance improvements provided by the index.

import sqlite3
import time


def query_with_index(student_name):
    """Query the Students table using an index on the name column."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # SQL command to select a student by name
        select_query = 'SELECT * FROM Students WHERE name = ?;'

        # Measure the execution time
        start_time = time.perf_counter_ns()  # Start the timer

        # Execute the query with the provided student name
        cursor.execute(select_query, (student_name,))
        result = cursor.fetchall()  # Fetch all results

        end_time = time.perf_counter_ns()  # End the timer

        # Calculate the elapsed time in microseconds
        execution_time = (end_time - start_time) / 1000

        # Display results and execution time
        print(f"Query result: {result}")
        print(f"Execution time with index: {execution_time:.5f} microseconds")


# Example: Searching for a student by name
query_with_index('Ojasvi Dhawan')

Here’s what we get in the output:

Query result: [(104, 'Ojasvi Dhawan', 21, 'lavanya26@example.com')]
Execution time with index: 390.70000 microseconds

We can observe a significant reduction in execution time compared to when the query was performed without an index.

Let's analyze the execution plan for the same query now that the index on the name column exists. If you run the explain_query() script again, you'll get the following output:

Query Plan:
(3, 0, 0, 'SEARCH Students USING INDEX idx_name (name=?)')

The plan now shows that the query uses the index idx_name, significantly reducing the number of rows that need to be scanned, which leads to faster query execution.

Comparing Performance Results

Now, let's summarize the performance results we obtained when querying with and without indexes.

Execution Time Comparison

Query Type	Execution Time (microseconds)
Without Index	1578.1
With Index	390.7

Performance Improvement Summary

The query with the index is approximately 4.04 times faster than the query without the index.
The execution time improved by about 75.24% after adding the index.
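Both figures come straight from the two timings in the table above. Here's the arithmetic as a short snippet (the variable names are just for illustration):

```python
without_index_us = 1578.1
with_index_us = 390.7

# How many times faster the indexed query ran
speedup = without_index_us / with_index_us

# Percentage of the original time that was saved
reduction_pct = (1 - with_index_us / without_index_us) * 100

print(f"{speedup:.2f}x faster, {reduction_pct:.2f}% less time")
```

Keep in mind that these numbers come from a single run of each query, so treat them as indicative rather than exact.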

Best Practices for Using Indexes

Indexes can significantly enhance the performance of your SQLite database, but they should be used judiciously. Here are some best practices to consider when working with indexes:

When and Why to Use Indexes

Frequent Query Columns: Use indexes on columns that are frequently used in SELECT queries, especially those used in WHERE, JOIN, and ORDER BY clauses. This is because indexing these columns can drastically reduce query execution time.
Uniqueness Constraints: When you have columns that must hold unique values (like usernames or email addresses), a unique index (CREATE UNIQUE INDEX) enforces this constraint efficiently. A plain index speeds up lookups but does not prevent duplicates.
Large Datasets: For tables with a large number of records, indexes become increasingly beneficial. They enable quick lookups, which is essential for maintaining performance as your data grows.
Composite Indexes: Consider creating composite indexes for queries that filter or sort by multiple columns. For example, if you often search for students by both name and age, an index on both columns can optimize such queries.
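To make the last two points concrete, here's a sketch that creates a composite index on name and age and a unique index on email. The index names idx_name_age and idx_email_unique, and the in-memory table, are assumptions chosen for this example:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, email TEXT);")

# Composite index for queries that filter on both name and age
connection.execute("CREATE INDEX idx_name_age ON Students (name, age);")

# Unique index that rejects duplicate email addresses
connection.execute("CREATE UNIQUE INDEX idx_email_unique ON Students (email);")

plan = connection.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Students WHERE name = ? AND age = ?;",
    ("Jane Doe", 23)).fetchall()
plan_detail = plan[0][-1]
print(plan_detail)

connection.execute(
    "INSERT INTO Students (name, age, email) VALUES (?, ?, ?);",
    ("Jane Doe", 23, "jane@example.com"))
try:
    connection.execute(
        "INSERT INTO Students (name, age, email) VALUES (?, ?, ?);",
        ("Another Jane", 30, "jane@example.com"))
    duplicate_rejected = False
except sqlite3.IntegrityError as error:
    duplicate_rejected = True
    print(f"Rejected duplicate: {error}")
```

The query plan should mention idx_name_age, and the second insert fails because the unique index refuses a repeated email address.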

Potential Downsides of Indexes

While indexes provide significant advantages, there are some potential downsides:

Slower Insert/Update Operations: When you insert or update records in a table with indexes, SQLite must also update the index, which can slow down these operations. This is because each insert or update requires additional overhead to maintain the index structure.
Increased Storage Requirements: Indexes consume additional disk space. For large tables, the storage cost can be substantial. Consider this when designing your database schema, especially for systems with limited storage resources.
Complex Index Management: Having too many indexes can complicate database management. It may lead to situations where you have redundant indexes, which can degrade performance rather than enhance it. Regularly reviewing and optimizing your indexes is a good practice.

Indexes are powerful tools for optimizing database queries, but they require careful consideration. Striking a balance between improved read performance and the potential overhead on write operations is key. Here are some strategies for achieving this balance:

Monitor Query Performance: Use SQLite’s EXPLAIN QUERY PLAN to analyze how your queries perform with and without indexes. This can help identify which indexes are beneficial and which may be unnecessary.
Regular Maintenance: Periodically review your indexes and assess whether they are still needed. Remove redundant or rarely used indexes to streamline your database operations.
Test and Evaluate: Before implementing indexes in a production environment, conduct thorough testing to understand their impact on both read and write operations.
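If a review shows that an index is no longer pulling its weight, you can remove it with DROP INDEX. A minimal sketch, again using an in-memory table and a hypothetical list_indexes() helper:

```python
import sqlite3


def list_indexes(connection, table_name):
    """Return the names of all indexes on the given table."""
    rows = connection.execute(f"PRAGMA index_list('{table_name}');").fetchall()
    return [row[1] for row in rows]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT);")
connection.execute("CREATE INDEX idx_name ON Students (name);")

before = list_indexes(connection, "Students")

# IF EXISTS makes the statement safe to run even if the index is already gone
connection.execute("DROP INDEX IF EXISTS idx_name;")

after = list_indexes(connection, "Students")
print(before, after)
```

Dropping an index never deletes table data. It only removes the lookup structure, so you can recreate it later if queries slow down.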

By following these best practices, you can leverage the benefits of indexing while minimizing potential drawbacks, ultimately enhancing the performance and efficiency of your SQLite database.

How to Handle Errors and Exceptions

In this section, we’ll discuss how to handle errors and exceptions when working with SQLite in Python. Proper error handling is crucial for maintaining the integrity of your database and ensuring that your application behaves predictably.

Common Errors in SQLite Operations

When interacting with an SQLite database, several common errors may arise:

Constraint Violations: This occurs when you try to insert or update data that violates a database constraint, such as primary key uniqueness or foreign key constraints. For example, trying to insert a duplicate primary key will trigger an error.
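Data Type Mismatches: SQLite uses flexible typing, so inserting a string into a column declared as a number usually succeeds and simply stores the value as text. Errors arise when the table is declared STRICT, when a CHECK constraint rejects the value, or when you pass a Python object that sqlite3 cannot convert (such as a list).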
Database Locked Errors: If a database is being written to by another process or connection, trying to access it can result in a "database is locked" error.
Syntax Errors: Mistakes in your SQL syntax will result in errors when you try to execute your commands.
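Two of these are easy to try out for yourself. The sketch below, which uses an in-memory table made up for this example, shows how SQLite stores a non-numeric string in an INTEGER column and how a typo in a SQL keyword surfaces as an OperationalError:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, age INTEGER);")

# A non-numeric string goes into the INTEGER column without an error
connection.execute("INSERT INTO Students (age) VALUES (?);", ("twenty",))
stored_type = connection.execute(
    "SELECT typeof(age) FROM Students;").fetchone()[0]
print(f"Stored as: {stored_type}")

# A misspelled keyword raises sqlite3.OperationalError
try:
    connection.execute("SELEC * FROM Students;")
    syntax_error = None
except sqlite3.OperationalError as error:
    syntax_error = str(error)
    print(f"Syntax problem: {syntax_error}")
```

Because the bad value is accepted silently, validating types in Python before inserting is often the more reliable safeguard.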

How to Use Python's Exception Handling

Python’s built-in exception handling mechanisms (try and except) are essential for managing errors in SQLite operations. By using these constructs, you can catch exceptions and respond appropriately without crashing your program.

Here’s a basic example of how to handle errors when inserting data into the database:

import sqlite3


def add_customer_with_error_handling(name, balance):
    """Add a new customer with error handling."""
    try:
        with sqlite3.connect('my_database.db') as connection:
            cursor = connection.cursor()
            cursor.execute(
                "INSERT INTO Customers (name, balance) VALUES (?, ?);", (name, balance))
            connection.commit()
            print(f"Added customer: {name} with balance: {balance}")

    except sqlite3.IntegrityError as e:
        print(f"Error: Integrity constraint violated - {e}")

    except sqlite3.OperationalError as e:
        print(f"Error: Operational issue - {e}")

    except Exception as e:
        print(f"An unexpected error occurred: {e}")


# Example usage
add_customer_with_error_handling('Vishakha', 100.0)  # Valid
add_customer_with_error_handling('Vishakha', 150.0)  # Duplicate entry

In this example:

We catch IntegrityError, which is raised for violations like unique constraints.
We catch OperationalError for general database-related issues (like database locked errors).
We also have a generic except block to handle any unexpected exceptions.

Output:

Added customer: Vishakha with balance: 100.0
Error: Integrity constraint violated - UNIQUE constraint failed: Customers.name

Best Practices for Ensuring Database Integrity

Use Transactions: Always use transactions (as discussed in the previous section) when performing multiple related operations. This helps ensure that either all operations succeed or none do, maintaining consistency.
Validate Input Data: Before executing SQL commands, validate the input data to ensure it meets the expected criteria (for example, correct types, within allowable ranges).
Catch Specific Exceptions: Always catch specific exceptions to handle different types of errors appropriately. This allows for clearer error handling and debugging.
Log Errors: Instead of just printing errors to the console, consider logging them to a file or monitoring system. This will help you track issues in production.
Graceful Degradation: Design your application to handle errors gracefully. If an operation fails, provide meaningful feedback to the user rather than crashing the application.
Regularly Backup Data: Regularly back up your database to prevent data loss in case of critical failures or corruption.
Use Prepared Statements: Prepared statements help prevent SQL injection attacks and can also provide better performance for repeated queries.
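Several of these practices fit naturally into one function. The following is a rough sketch, not a prescribed design: it validates input first, wraps the insert in a transaction, catches a specific exception, and logs it with Python's logging module. The add_customer() function, the "customers" logger name, and the in-memory table are assumptions for illustration:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("customers")


def add_customer(connection, name, balance):
    """Validate input, insert inside a transaction, and log failures."""
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    if not isinstance(balance, (int, float)) or balance < 0:
        raise ValueError("balance must be a non-negative number")

    try:
        with connection:  # Commits on success, rolls back on an exception
            connection.execute(
                "INSERT INTO Customers (name, balance) VALUES (?, ?);",
                (name, balance))
        return True
    except sqlite3.IntegrityError:
        # logger.exception records the message along with the traceback
        logger.exception("Could not add customer %r", name)
        return False


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Customers (name TEXT NOT NULL UNIQUE, balance REAL NOT NULL);")

added = add_customer(connection, "Vishakha", 100.0)
duplicate = add_customer(connection, "Vishakha", 150.0)
print(added, duplicate)
```

In a real application you would probably send the log output to a file or a monitoring service rather than the console.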

How to Export and Import Data [Bonus Section]

In this section, we will learn how to export data from an SQLite database to common formats like CSV and JSON, as well as how to import data into SQLite from these formats using Python. This is useful for data sharing, backup, and integration with other applications.

Exporting Data from SQLite to CSV

Exporting data to a CSV (Comma-Separated Values) file is straightforward with Python’s built-in libraries. CSV files are widely used for data storage and exchange, making them a convenient format for exporting data.

Here’s how to export data from an SQLite table to a CSV file:

import sqlite3
import csv

def export_to_csv(file_name):
    """Export data from the Customers table to a CSV file."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Execute a query to fetch all customer data
        cursor.execute("SELECT * FROM Customers;")
        customers = cursor.fetchall()

        # Write data to CSV
        with open(file_name, 'w', newline='') as csv_file:
            csv_writer = csv.writer(csv_file)
            csv_writer.writerow(['ID', 'Name', 'Balance'])  # Writing header
            csv_writer.writerows(customers)  # Writing data rows

        print(f"Data exported successfully to {file_name}.")

# Example usage
export_to_csv('customers.csv')
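The header row above is typed by hand, so it must be kept in sync with the table. As an alternative sketch, you can read the column names from cursor.description. The export_query_to_csv() helper is a hypothetical name, and the example writes to an in-memory buffer instead of a file so that it has no side effects:

```python
import csv
import io
import sqlite3


def export_query_to_csv(connection, query, csv_file):
    """Write a query's results to csv_file, using column names as the header."""
    cursor = connection.execute(query)
    # The first item of each description entry is the column name
    header = [column[0] for column in cursor.description]
    writer = csv.writer(csv_file)
    writer.writerow(header)
    writer.writerows(cursor)  # Streams rows without loading them all at once
    return header


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL);")
connection.execute(
    "INSERT INTO Customers (name, balance) VALUES (?, ?);", ("Ashutosh", 100.0))

buffer = io.StringIO()
header = export_query_to_csv(connection, "SELECT * FROM Customers;", buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

To write to disk instead, pass a file opened with open(file_name, 'w', newline='') in place of the buffer.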

How to Export Data to JSON

Similarly, you can export data to a JSON (JavaScript Object Notation) file, which is a popular format for data interchange, especially in web applications.

Here’s an example of how to export data to JSON:

import json
import sqlite3


def export_to_json(file_name):
    """Export data from the Customers table to a JSON file."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Execute a query to fetch all customer data
        cursor.execute("SELECT * FROM Customers;")
        customers = cursor.fetchall()

        # Convert data to a list of dictionaries
        customers_list = [{'ID': customer[0], 'Name': customer[1],
                           'Balance': customer[2]} for customer in customers]

        # Write data to JSON
        with open(file_name, 'w') as json_file:
            json.dump(customers_list, json_file, indent=4)

        print(f"Data exported successfully to {file_name}.")


# Example usage
export_to_json('customers.json')
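Instead of building each dictionary by index, you can let sqlite3 map column names for you by setting the connection's row_factory to sqlite3.Row. The sketch below assumes an in-memory Customers table and produces a JSON string rather than a file:

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # Rows can now be accessed by column name
connection.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL);")
connection.execute(
    "INSERT INTO Customers (name, balance) VALUES (?, ?);", ("Krishna", 50.0))

rows = connection.execute("SELECT id, name, balance FROM Customers;").fetchall()

# dict(row) uses the column names as keys
customers_list = [dict(row) for row in rows]
json_text = json.dumps(customers_list, indent=4)
print(json_text)
```

Note that the keys follow the column names (id, name, balance), so they are lowercase here, unlike the hand-written keys in the example above.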

How to Import Data into SQLite from CSV

You can also import data from a CSV file into an SQLite database. This is useful for populating your database with existing datasets.

Here's how to import data from a CSV file:

import csv
import sqlite3


def import_from_csv(file_name):
    """Import data from a CSV file into the Customers table."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Open the CSV file for reading
        with open(file_name, 'r', newline='') as csv_file:
            csv_reader = csv.reader(csv_file)
            next(csv_reader)  # Skip the header row

            # Insert each row into the Customers table
            for row in csv_reader:
                cursor.execute(
                    "INSERT INTO Customers (name, balance) VALUES (?, ?);", (row[1], row[2]))

        connection.commit()
        print(f"Data imported successfully from {file_name}.")


# Example usage
import_from_csv('customer_data.csv')
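If your CSV has a header row, csv.DictReader lets you refer to columns by name rather than by position, and executemany() can insert all the rows in one call. This is a sketch under a few assumptions: the header names match the ID, Name, Balance header written by the export example, import_rows_from_csv() is a hypothetical helper, and the data comes from an in-memory string instead of a file:

```python
import csv
import io
import sqlite3


def import_rows_from_csv(connection, csv_file):
    """Insert rows from an open CSV file that has Name and Balance columns."""
    reader = csv.DictReader(csv_file)
    # CSV values are strings, so convert the balance explicitly
    rows = [(record["Name"], float(record["Balance"])) for record in reader]
    with connection:  # Commit all rows together, or none if one fails
        connection.executemany(
            "INSERT INTO Customers (name, balance) VALUES (?, ?);", rows)
    return len(rows)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, balance REAL);")

sample_csv = io.StringIO("ID,Name,Balance\n1,Ashutosh,100.0\n2,Krishna,50.0\n")
imported = import_rows_from_csv(connection, sample_csv)
stored = connection.execute(
    "SELECT name, balance FROM Customers ORDER BY id;").fetchall()
print(imported, stored)
```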

How to Import Data into SQLite from JSON

Similarly, importing data from a JSON file is simple. You can read the JSON file and insert the data into your SQLite table.

Here's how to do it:

import json
import sqlite3


def import_from_json(file_name):
    """Import data from a JSON file into the Customers table."""
    with sqlite3.connect('my_database.db') as connection:
        cursor = connection.cursor()

        # Open the JSON file for reading
        with open(file_name, 'r') as json_file:
            customers_list = json.load(json_file)

            # Insert each customer into the Customers table
            for customer in customers_list:
                cursor.execute("INSERT INTO Customers (name, balance) VALUES (?, ?);", (customer['Name'], customer['Balance']))

        connection.commit()
        print(f"Data imported successfully from {file_name}.")


# Example usage
import_from_json('customer_data.json')
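Since each JSON record is already a dictionary, you can also pass the whole list to executemany() and use named placeholders. Here's a sketch that assumes the same Name and Balance keys as the export example and parses a JSON string instead of reading a file. The import_rows_from_json() name is hypothetical:

```python
import json
import sqlite3


def import_rows_from_json(connection, json_text):
    """Insert customers from a JSON array of objects with Name and Balance keys."""
    customers_list = json.loads(json_text)
    with connection:  # Commit all rows together, or none if one fails
        # :Name and :Balance are looked up in each dictionary
        connection.executemany(
            "INSERT INTO Customers (name, balance) VALUES (:Name, :Balance);",
            customers_list)
    return len(customers_list)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, balance REAL);")

sample_json = '[{"Name": "Ashutosh", "Balance": 100.0}, {"Name": "Krishna", "Balance": 50.0}]'
imported = import_rows_from_json(connection, sample_json)
stored = connection.execute(
    "SELECT name, balance FROM Customers ORDER BY id;").fetchall()
print(imported, stored)
```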

Wrapping Up

And that’s a wrap! This guide has introduced you to the fundamentals of working with SQLite in Python, covering everything from setting up your environment to querying and manipulating data, as well as exporting and importing information. I hope you found it helpful and that it has sparked your interest in using SQLite for your projects.

Now it's time to put your newfound knowledge into practice! I encourage you to create your project using SQLite and Python. Whether it’s a simple application for managing your library, a budgeting tool, or something unique, the possibilities are endless.

Once you’ve completed your project, share it on Twitter and tag me! I’d love to see what you’ve created and celebrate your accomplishments.

You can find all the code from this tutorial on GitHub. Thank you for following along, and happy coding!


Full Stack Development with Next.js, Clerk, and Neon Postgres

Ankur Tyagi — Wed, 10 Jul 2024 15:31:12 +0000

Full stack development is constantly evolving, with new developer tools and products being introduced that allow us to build secure and reliable applications more efficiently.

In this tutorial, I’ll walk you through how to build highly performant web applications with Neon – a serverless PostgreSQL database designed for the cloud. You'll also learn how to perform CRUD (Create, Read, Update, and Delete) operations with Neon.

By the end of this tutorial, you will have the basic knowledge required to start building advanced and scalable web applications with Neon.

What is Neon?

Neon is an open-source, scalable, and efficient Postgres DB that separates compute from storage. This means that database computation processes (queries, transactions, and so on) are handled by one set of resources (compute), while the data itself is stored on a separate set of resources (storage).

This architecture allows for greater scalability and performance, making Neon a solid choice for modern web applications.

Neon - a serverless Postgres database

3 Things to Remember About Neon:

🐘 Postgres: Neon is built on the foundation of Postgres. It supports the same extensions, drivers, and SQL syntax as Postgres, ensuring familiarity and ease of use.
☁️ Serverless: Neon operates on a serverless model. Your database is represented as a simple URL, and Neon automatically scales up and down based on workload demands. Say goodbye to over-provisioning.
🌱 Branching: Just like version control for code, Neon allows you to create instant, isolated copies of your data. This feature is invaluable for development, testing, and maintaining separate environments.

Why Neon?

Neon brings the serverless experience to Postgres. Developers can build faster and scale their products effortlessly, without the need to dedicate big teams or big budgets to the database.

Neon supports multiple languages and frameworks – but what are the unique features that make Neon stand out?

Instant branching and auto-scaling

Neon allows you to create database branches instantly for testing, development, and staging environments. This lets you experiment without affecting the production database.

It also provides an auto-scaling capability that automatically adjusts resources based on the application's workload, ensuring optimal performance and cost-efficiency.

Neon DB Main Dashboard

Support for AI applications

Neon supports AI and machine learning applications by providing a high-performance and scalable infrastructure. It enables you to perform semantic and similarity searches in Postgres and handles complex queries and large datasets efficiently, making it ideal for AI or LLM applications.

Open-source

Neon is backed by a vibrant community of Postgres hackers, systems engineers, and cloud engineers who are all huge fans of Postgres.

As an open-source platform, Neon offers transparency and flexibility. You can also reach out to the team and contributors to ask questions, contribute, and help improve the software.

Serverless Architecture

Neon eliminates the need for manual server management, allowing you to focus on building applications rather than maintaining infrastructure. Its serverless nature provides on-demand scalability, ensuring that your application can handle varying loads without manual intervention.

Built upon Postgres

Postgres is one of the most reliable open-source relational database systems. Neon inherits all the advanced features, stability, and performance optimizations of Postgres, including support for ACID transactions, advanced SQL, and NoSQL/JSON, to create a cheaper and more efficient database for cloud environments.

How to Add Neon to a Next.js App

Neon supports multiple frameworks and libraries and provides clear and detailed documentation on adding Neon to them. The Neon serverless driver enables us to connect and interact with Neon in a Next.js application.

Before we proceed, let’s create a Neon account and project.

Neon DB Projects Overview: View and manage all your projects in one place.

Within your project dashboard, you'll find a database connection string. You'll use this to interact with your Neon database.

Neon DB Project Dashboard: Manage database settings with ease from the project dashboard.

Create a TypeScript Next.js project by running the following code snippet in your terminal:

npx create-next-app neon-blog-with-clerk

Next, install the Neon Serverless package:

npm install @neondatabase/serverless

Create a .env.local file and copy your database connection string into the file:

NEON_DATABASE_URL="postgres://<user>:<password>@<endpoint_hostname>.neon.tech:<port>/<dbname>?sslmode=require"

Create a 'db' folder containing an index.ts file within the Next.js app directory and copy the code snippet below into the file:

import { neon } from '@neondatabase/serverless';

if (!process.env.NEON_DATABASE_URL) {
  throw new Error('NEON_DATABASE_URL must be a Neon postgres connection string')
}

export const getDBVersion = async() => {
    const sql = neon(process.env.NEON_DATABASE_URL!);
    const response = await sql`SELECT version()`;
    return { version: response[0].version }
}

Convert the app/page.tsx file to a server component and execute the getDBVersion() function:

import { getDBVersion } from "./db";

export default async function Home() {
    const { version } = await getDBVersion();
    console.log({version})

    return (<div>{/** -- UI elements -- */}</div>)

}

The getDBVersion() function establishes a connection with the Neon database and allows us to run SQL queries using the Postgres client. This function returns the database version, which is then logged to the console.

{
version: 'PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit'
}
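To see why the tagged-template form of sql is convenient, here is a minimal sketch of how a template tag can keep interpolated values out of the query text. The toParameterized function and the ParameterizedQuery type are hypothetical names made up for this illustration. This is not the Neon driver's code, only the general idea of turning interpolations into numbered placeholders.

```typescript
// Hypothetical illustration only: NOT the Neon driver's implementation.
interface ParameterizedQuery {
    text: string;
    params: unknown[];
}

// A tag function receives the literal chunks and the interpolated values separately.
function toParameterized(
    strings: TemplateStringsArray,
    ...values: unknown[]
): ParameterizedQuery {
    let text = strings[0];
    for (let i = 0; i < values.length; i++) {
        // Each interpolated value becomes a numbered placeholder ($1, $2, ...)
        text += "$" + (i + 1) + strings[i + 1];
    }
    return { text, params: values };
}

const wantedSlug = "hello-world-abcde";
const parameterized = toParameterized`SELECT * FROM posts WHERE slug = ${wantedSlug}`;

console.log(parameterized.text);   // SELECT * FROM posts WHERE slug = $1
console.log(parameterized.params); // [ 'hello-world-abcde' ]
```

Because the value travels separately from the SQL text, user input is never spliced into the query string.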

Congratulations – you’ve successfully added Neon to your Next.js application.

But interacting with the Neon database by writing SQL queries directly can require extra learning or introduce complexities for developers who are not familiar with SQL. It can also lead to errors or performance issues when performing complex queries.

This is why Neon supports database ORMs such as Drizzle ORM, which provide a higher-level interface for interacting with the database. Drizzle ORM enables you to write complex query functions and interact with the database easily using TypeScript.

How to Set Up Neon Serverless Driver with Drizzle ORM in Next.js

Drizzle ORM lets you query data and perform various operations on the database using simple TypeScript query commands. It is lightweight, typesafe, and easy to use.

First, you'll need to install the Drizzle Kit and the Drizzle ORM package.

Drizzle Kit lets you manage the database schema and migrations.

npm i drizzle-orm
npm i -D drizzle-kit

Inside the db folder, add actions.ts and schema.ts files:

cd db
touch actions.ts schema.ts

Add the code snippet below into the db/schema.ts file. It contains the database schema.

import {  text, serial, pgTable, timestamp } from "drizzle-orm/pg-core";

export const postsTable = pgTable("posts", {
    id: serial("id").primaryKey().notNull(),
    content: text("content").notNull(),
    author: text("author").notNull(),
    author_id: text("author_id").notNull(),
    title: text("title").notNull(),
    created_at: timestamp("created_at").defaultNow(),
    slug: text("slug").notNull(),
});

Update the db/index.ts file to connect to the Neon database and export the Drizzle instance (db). This will be used to execute typesafe SQL queries against your Postgres database hosted by Neon.

import { neon } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-http';
import { postsTable } from './schema';


if (!process.env.NEON_DATABASE_URL) {
  throw new Error('NEON_DATABASE_URL must be a Neon postgres connection string')
}
const sql = neon(process.env.NEON_DATABASE_URL!);

export const db = drizzle(sql, {
  schema: { postsTable }
});

Next, create a drizzle.config.ts file at the root of the Next.js folder and add the following configuration:

import type { Config } from 'drizzle-kit';
import * as dotenv from "dotenv";

// load the variables from .env.local (dotenv reads .env by default)
dotenv.config({ path: ".env.local" });

if (!process.env.NEON_DATABASE_URL) throw new Error('NEON_DATABASE_URL not found in environment');

export default {
  schema: './src/app/db/schema.ts',
  out: './src/app/db/migrations',
  dialect: "postgresql",
  dbCredentials: {
    url: process.env.NEON_DATABASE_URL,
  },
  strict: true,
} satisfies Config;

The drizzle.config.ts file contains all the information about your database connection, migration folder, and schema files.

Finally, update the package.json file to include the Drizzle Kit commands for generating database migrations and updating the tables.

{
  "scripts": {
    "migrate": "npx drizzle-kit generate -- dotenv_config_path='.env.local'",
    "db-create": "npx drizzle-kit push -- dotenv_config_path='.env.local'"
  }
}

Neon DB Tables Dashboard: Effortlessly manage your database tables and view all data.

How to Build the Application Interface with Next.js

In this section, you’ll learn how to build a blog application that allows users to read posts and authenticate authors, enabling them to create and delete posts from the Neon database.

The application is divided into 3 pages:

Home Page: displays all the available blog posts.
Post Details Page (/posts/[slug]): displays the content of a particular blog post.
Create Post Page (/posts/create): allows authors to create new blog posts.

Install the following packages:

npm install date-fns react-simplemde-editor easymde react-markdown remark-gfm dotenv

The date-fns package converts the posts' timestamps into a human-readable form for display within the application. React SimpleMDE Editor provides an interactive editor for writing content in Markdown, and the React Markdown package (with the remark-gfm plugin) renders that Markdown as formatted content on the page.

Next, create a utils.ts file within the Next.js app folder and copy the code snippet below into the file:

import { format } from "date-fns";

export const formatDateString = (dateString: Date | null): string => {
    if (!dateString) return "";
    const date = new Date(dateString);
    const formattedDate = format(date, "MMMM do yyyy, h:mma");
    return formattedDate;
};

export const slugifySentences = (sentence: string): string => {
    const slug = sentence
        .toLowerCase()
        .replace(/[^a-z0-9\s-]/g, "")
        .replace(/\s+/g, "-");

    // Generate 5 random letters
    const randomLetters = Array.from({ length: 5 }, () =>
        String.fromCharCode(97 + Math.floor(Math.random() * 26))
    ).join("");

    return `${slug}-${randomLetters}`;
};

The formatDateString function accepts a Date object and returns the date and time in a human-readable format using the date-fns package. The slugifySentences function creates a slug for each post using the post's title, which is useful for implementing the routes for each post.
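To make the slug logic easier to follow, here is a small sketch that splits it into its two halves. The helper names slugBase and randomSuffix are hypothetical and exist only for this illustration. The logic mirrors slugifySentences above.

```typescript
// Deterministic half: lowercase, strip punctuation, turn whitespace into hyphens.
const slugBase = (sentence: string): string =>
    sentence
        .toLowerCase()
        .replace(/[^a-z0-9\s-]/g, "")
        .replace(/\s+/g, "-");

// Random half: n lowercase letters, so two posts with the same title get different slugs.
const randomSuffix = (n: number): string =>
    Array.from({ length: n }, () =>
        String.fromCharCode(97 + Math.floor(Math.random() * 26))
    ).join("");

const exampleSlug = `${slugBase("Welcome to Neon Tutorial!")}-${randomSuffix(5)}`;

console.log(slugBase("Welcome to Neon Tutorial!")); // welcome-to-neon-tutorial
console.log(exampleSlug); // same base plus a 5-letter suffix that varies per call
```

Because the suffix is random, a slug should be generated once when the post is created and then stored, rather than recomputed from the title each time.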

Copy the code snippet below into the app/page.tsx file:

import Link from "next/link";
import { formatDateString, slugifySentences } from "./utils";

interface Post {
    author_id: string;
    title: string;
    content: string;
    author: string;
    slug: string;
    id: number | null;
    created_at: Date | null;
}

export default async function Home() {
    // dummy posts
    const posts: Post[] = [
        {
            author_id: "1",
            title: "Welcome to Neon Tutorial",
            content: "This is a test post",
            author: "John Doe",
            slug: slugifySentences("Welcome to Neon Tutorial"),
            id: 1,
            created_at: new Date(),
        },
        {
            author_id: "1",
            title: "Hello World",
            content: "This is a test post",
            author: "Jane Doe",
            slug: slugifySentences("Hello World"),
            id: 2,
            created_at: new Date(),
        },
    ];

    // shorten posts with longer title
    const shortenText = (text: string): string => {
        return text.length <= 55 ? text : text.slice(0, 55) + "...";
    };

    return (
        <div>
            <main className='md:px-8 py-8 px-4 w-full bg-white'>
                {posts?.map((post) => (
                    <Link
                        href={`/posts/${post.slug}`}
                        className='rounded w-full border-[1px] p-4 text-blue-500 hover:bg-blue-50 hover:drop-shadow-md transition-all duration-200 ease-in-out flex items-center justify-between gap-4 mb-4'
                        key={post.id}
                    >
                        <h3 className='text-lg font-semibold'>{shortenText(post.title)}</h3>
                        <div className='flex items-center justify-between'>
                            <p className='text-xs text-gray-500'>
                                {formatDateString(post?.created_at)}
                            </p>
                        </div>
                    </Link>
                ))}
            </main>
        </div>
    );
}

The app/page.tsx file represents the home page of the application and displays all the available posts.

It's live - see the power of serverless PostgreSQL and Next.js

Next, add the routes for creating posts and reading the contents of each post. Within the Next.js app folder, create a posts directory containing /posts/create and /posts/[slug] subdirectories.

Create a page.tsx file within the /posts/create folder and copy the code snippet below into the file:

"use client";
import { useState, useCallback } from "react";
import { useRouter } from "next/navigation";
import SimpleMDE from "react-simplemde-editor";
import "easymde/dist/easymde.min.css";
import { slugifySentences } from "@/app/utils";

export default function PostCreate() {
    const [publishing, setPublishing] = useState(false);
    const [content, setContent] = useState("");
    const [title, setTitle] = useState("");
    const router = useRouter();

    const onChangeContent = useCallback((value: string) => {
        setContent(value);
    }, []);

    const handleCreatePost = async (e: React.FormEvent) => {
        e.preventDefault();
        console.log({ title, content });
        router.push("/");
    };

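    return (
        <div>
            <main>
                <form onSubmit={handleCreatePost}>
                    <label htmlFor='title'>
                        Title
                    </label>
                    <input
                        type='text'
                        id='title'
                        value={title}
                        required
                        onChange={(e) => setTitle(e.target.value)}
                        className='px-4 py-3 border-2 rounded-md text-lg mb-4'
                    />

                    <label htmlFor='content'>
                        Content
                    </label>
                    <SimpleMDE value={content} onChange={onChangeContent} id='content' />

                    <button type='submit' disabled={publishing}>Create Post</button>
                </form>
            </main>
        </div>
    );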
}

The /posts/create page renders a form that accepts the title and content of the post, allowing authors to create new blog posts.

Create your next blog post with ease

Finally, update the /posts/[slug] page to display each post's content and include a button that allows only the posts' authors to delete posts. (You'll learn how to implement this later in the tutorial.)

"use client";
import { useRouter, useParams } from "next/navigation";
import ReactMarkdown from "react-markdown";
import { useEffect, useState, useCallback } from "react";
import remarkGfm from "remark-gfm";
import { formatDateString } from "@/app/utils";

export default function Post() {
    const router = useRouter();
    const [loading, setLoading] = useState(true);
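    // typed so that post?.title and the other fields compile
    const [post, setPost] = useState<{
        title: string;
        author: string;
        content: string;
        created_at: Date | null;
    } | null>(null);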
    const params = useParams<{ slug: string }>();

    const deletePost = async () => {
        if (confirm("Are you sure you want to delete this post?")) {
            alert(`Delete ${params.slug}`);
            router.push("/");
        }
    };

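    return (
        <div>
            <main>
                <header>
                    <div>
                        <h2>{post?.title}</h2>

                        <div>
                            <button onClick={deletePost}>Delete</button>
                        </div>
                    </div>

                    <div>
                        <p>
                            Author: {post?.author}
                        </p>
                        <p>
                            Posted on:{" "}
                            <span>
                                {formatDateString(post?.created_at!)}
                            </span>
                        </p>
                    </div>
                </header>

                <div>
                    <ReactMarkdown remarkPlugins={[remarkGfm]}>
                        {post?.content!}
                    </ReactMarkdown>
                </div>
            </main>
        </div>
    );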
}

The /posts/[slug] page accepts the unique slug for each blog post, fetches the post's content, and allows post authors to delete their own posts.

Blog Post

Congratulations! You've completed the user interface for the application.

How to Authenticate Users with Clerk

Clerk is a complete user management platform that enables you to add various forms of authentication to your software applications. It provides easy-to-use, flexible UI components and APIs that can be integrated seamlessly into your application.

Install the Clerk Next.js SDK by running the following code snippet in your terminal.

npm install @clerk/nextjs

Create a middleware.ts file within the Next.js src folder and copy the code snippet below into the file:

import { clerkMiddleware, createRouteMatcher } from "@clerk/nextjs/server";


// the createRouteMatcher function accepts an array of routes to be protected
const protectedRoutes = createRouteMatcher(["/posts/create"]);

// protects the route
export default clerkMiddleware((auth, req) => {
    if (protectedRoutes(req)) {
        auth().protect();
    }
});

export const config = {
    matcher: ["/((?!.*\\..*|_next).*)", "/", "/(api|trpc)(.*)"],
};

The createRouteMatcher function accepts an array containing routes to be protected from unauthenticated users and the clerkMiddleware() function ensures the routes are protected.

Next, import the following Clerk components into the app/layout.tsx file and update the RootLayout function as shown below:

import {
    ClerkProvider,
    SignInButton,
    SignedIn,
    SignedOut,
    UserButton,
} from "@clerk/nextjs";
import Link from "next/link";

export default function RootLayout({
    children,
}: {
    children: React.ReactNode;
}) {
    return (
        <ClerkProvider>
            <html lang='en'>
                <body className={inter.className}>
                    <nav className='w-full py-4 border-b-[1px] md:px-8 px-4 text-center flex items-center justify-between sticky top-0 bg-white z-10 '>
                        <Link href='/' className='text-xl font-extrabold text-blue-700'>
                            Neon Blog
                        </Link>

                        <div className='flex items-center gap-5'>
                            {/*-- if user is signed out --*/}
                            <SignedOut>
                                <SignInButton mode='modal' />
                            </SignedOut>
                            {/*-- if user is signed in --*/}
                            <SignedIn>
                                <Link href='/posts/create' className=''>
                                    Create Post
                                </Link>
                                <UserButton showName />
                            </SignedIn>
                        </div>
                    </nav>

                    {children}
                </body>
            </html>
        </ClerkProvider>
    );
}

When a user is not signed in, the Sign in button component is rendered.

Seamless sign-ups redefined with Clerk UI

Then, after signing into the application, the Clerk User Button component and a link to create a new post are displayed.

After sign-in: Use Clerk's User Button to create a new post

Next, create a Clerk account and add a new application project.

Clerk's sleek UI dashboard

Select username as the authentication method and create the Clerk project.

Clerk's sleek UI dashboard

Finally, add your Clerk publishable and secret keys into the .env.local file.

NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=

Clerk provides various ways to read the signed-in user's data on both the client and the server, which is essential for identifying users within the application.

CRUD Operations with the Neon Database

In this section, you’ll learn how to perform CRUD (Create, Read, Update, Delete) operations with the Neon database. These fundamental operations are essential for interacting with and managing data within any application.

The db/actions.ts file will contain the CRUD operations. Add the following code snippet to the file:

import { db } from ".";
import { postsTable } from './schema';
import { desc, eq } from "drizzle-orm";

// the shape of a new row, inferred from the table schema
type NewPost = typeof postsTable.$inferInsert;

// add a new row to the posts table
export const createPost = async (post: NewPost) => {
    await db.insert(postsTable).values({
        content: post.content,
        author: post.author,
        author_id: post.author_id,
        title: post.title,
        slug: post.slug,
    });
};

// get all the posts
export const getAllPosts = async () => {
    return await db.select().from(postsTable).orderBy(desc(postsTable.created_at));
};

// get a post using its slug
export const getSinglePost = async (slug: string) => {
    return await db.query.postsTable.findFirst({
        where: (post, { eq }) => eq(post.slug, slug)
    });
};

// delete a post
export const deletePost = async (id: number) => {
    await db.delete(postsTable).where(eq(postsTable.id, id));
};

// update a post's content
export const updatePost = async (content: string, id: number) => {
    await db.update(postsTable)
        .set({ content: content })
        .where(eq(postsTable.id, id));
};

From the code snippet above:

The createPost function takes a post object as an argument and inserts a new row into the postsTable with the specified post content, author, author ID, title, and slug.
The getAllPosts function retrieves all the posts from the postsTable and sorts them in descending order by their creation date (created_at).
The getSinglePost function takes a slug as an argument and retrieves the first post that matches the given slug from the postsTable. Each slug ends with a random suffix, so in practice it identifies a single post and the function returns a single object.
The deletePost function takes an id as an argument and deletes the post with the matching ID from the postsTable.
The updatePost function accepts content and a post's id as arguments and updates the content of the post with the matching ID in the postsTable.
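Before calling createPost, the form handler has to assemble a row that matches the schema. Here is a minimal sketch of that step. The NewPostRow type, the buildNewPost helper, and the author fields are hypothetical names for illustration. In the real app the author details would come from your authentication provider and the slug from slugifySentences.

```typescript
// Hypothetical shape mirroring the insertable columns of the posts table.
interface NewPostRow {
    title: string;
    content: string;
    author: string;
    author_id: string;
    slug: string;
}

// Hypothetical helper: validates the form input and assembles the row.
// makeSlug is passed in so the random slug generator can be swapped out in tests.
function buildNewPost(
    input: { title: string; content: string },
    author: { id: string; displayName: string },
    makeSlug: (title: string) => string
): NewPostRow {
    const title = input.title.trim();
    const content = input.content.trim();
    if (title.length === 0 || content.length === 0) {
        throw new Error("Title and content are required");
    }
    return {
        title,
        content,
        author: author.displayName,
        author_id: author.id,
        slug: makeSlug(title),
    };
}

const draftPost = buildNewPost(
    { title: " Hello World ", content: "First post" },
    { id: "user_123", displayName: "Jane Doe" },
    (t) => t.toLowerCase().replace(/\s+/g, "-") // simplified slug for the demo
);

console.log(draftPost.slug); // hello-world
```

Validating and trimming before the insert keeps empty titles out of the table, since the schema only enforces NOT NULL.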

Finally, you can execute the CRUD functions on the server via API endpoints or Next.js server fetch requests.

For instance, you can fetch all the existing blog posts within the Neon database and display them within the application using the Next.js server data fetching method:

import { getAllPosts } from "./db/actions";

const getPosts = async () => await getAllPosts()

export default async function Home() {
    const posts = await getPosts()

    return (<div>{/** -- UI elements --*/}</div>)
}

You can also create a Next.js API endpoint that returns all the available blog posts. Create a /api/posts/all endpoint that returns the posts:

import { getAllPosts } from "@/app/db/actions";
import { NextResponse } from "next/server";

export async function POST() {
    try {
        // getAllPosts is the function exported from db/actions.ts
        const data = await getAllPosts();
        return NextResponse.json({ message: "Post fetched", data }, { status: 200 });
    } catch (err) {
        return NextResponse.json(
            { message: "Post not available", err },
            { status: 400 }
        );
    }
}

Congratulations! You’ve completed the project for this tutorial.

You can find the code for the app we built here.

Conclusion

In this tutorial, you’ve learned what a Neon database is, how to create one, and how to perform CRUD operations with Neon and Drizzle ORM in a Next.js application.

Neon's serverless architecture, combined with its scalability and performance optimizations, makes it an excellent choice for modern web applications. Neon also provides a smooth developer experience and a community of passionate individuals ready to help you achieve your application goals. Thank you for reading.

Next Steps

By now, you should have a good understanding of how to build full-stack applications with Neon and Next.js.

If you'd like to learn more about how you can leverage Neon to build advanced and scalable applications, you can check out the following resources:

Thanks for Reading!

That's it for this tutorial. I hope you learned something new today.

If you did, please share so that it reaches others as well.

You can connect with me on Twitter or subscribe to my newsletter.

Want to read more interesting blog posts?

You can read more tutorials like this one on my blog.

How to Optimize Your Database – Optimization Principles and Best Practices

Oluwatobi — Fri, 10 May 2024 15:21:56 +0000

Databases are an integral component of building applications, whether web, desktop, or mobile. You can think of them as the mitochondria of an application: their primary function is to store and manage the data that everything else depends on.

Database management is a critical skill a developer must possess in building scalable applications that have a high level of efficiency. If not handled properly, it can result in data loss and mismanagement on the part of the database developer.

Hence, databases must be structured and built with the users in mind and built utilizing the best practices available.

This article aims to highlight general principles of database best practices and also explain each peculiarity. But before we discuss that in detail, let’s review what database transactions are all about.

What are Database Transactions?

A database transaction is a group of operations that the database management system treats as a single unit of work.

A transaction can wrap anything from a single CRUD statement to a longer sequence of reads and writes that must succeed or fail together.

With so many users performing many transactions at the same time, it’s important to ensure that the database is concurrency-enabled to prevent data interference between two or more users accessing the same resource.

Hence, there is the need for the ACID principle. What then does ACID represent?

Atomicity
Consistency
Isolation
Durability

Subsequently, we will be discussing each point in detail. First on our list is atomicity.

What is the Database Atomicity Principle?

What does database atomicity entail? Atomicity means that a transaction is treated as an indivisible unit. Either every operation in the transaction is executed, or, if an error comes up partway through, the entire transaction is cancelled and none of its changes are kept. There is no room for partial execution.

If the database isn’t atomic, it can expose misleading, incomplete data and ultimately throw the entire system into chaos. How does the database ensure atomicity? Typically, it records enough information about each change (for example, in an undo log or write-ahead log) to reverse a transaction that fails partway through. It uses the same records during crash recovery to roll back anything that was never committed.

It is also important to note that other database principles such as consistency and durability rely on the need for the database to be atomic to be truly fulfilled.
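Here is a minimal in-memory sketch of the all-or-nothing behavior. The runAtomically, debit, and credit names are hypothetical, and the plain object stands in for a real table. Real databases use far more sophisticated logging, so treat this only as an illustration of the principle.

```typescript
type Accounts = { [id: string]: number };

// Hypothetical in-memory "transaction": apply every step or none of them.
function runAtomically(
    accounts: Accounts,
    steps: Array<(draft: Accounts) => void>
): boolean {
    // Record the original values (a tiny stand-in for an undo log).
    const undo: Accounts = {};
    for (const id in accounts) {
        undo[id] = accounts[id];
    }
    try {
        for (let i = 0; i < steps.length; i++) {
            steps[i](accounts);
        }
        return true; // commit: all steps succeeded
    } catch (err) {
        // Roll back: drop keys added by the failed transaction, restore the rest.
        for (const id in accounts) {
            if (!(id in undo)) delete accounts[id];
        }
        for (const id in undo) {
            accounts[id] = undo[id];
        }
        return false;
    }
}

const debit = (id: string, amount: number) => (draft: Accounts): void => {
    if (draft[id] < amount) throw new Error("insufficient funds");
    draft[id] -= amount;
};
const credit = (id: string, amount: number) => (draft: Accounts): void => {
    draft[id] += amount;
};

const ledger: Accounts = { alice: 100, bob: 50 };

// Succeeds: both steps are applied.
const firstTransfer = runAtomically(ledger, [debit("alice", 30), credit("bob", 30)]);

// Fails on the second step: the credit that already ran is undone.
const secondTransfer = runAtomically(ledger, [credit("bob", 500), debit("alice", 500)]);

console.log(firstTransfer, secondTransfer, ledger); // true false { alice: 70, bob: 80 }
```

In the second transfer, bob's balance is briefly increased, but because the debit fails, the rollback leaves the ledger exactly as it was before the transaction started.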

Having discussed this, let’s move on to the database consistency principle.

What is the Database Consistency Principle?

This principle means that the database enforces certain rules, such as constraints, cascades, and triggers, which every change to an established database must satisfy. A transaction that violates these rules fails with a consistency error, and the database returns to its previous stable state.

Consistency also ensures that once a user's update is committed, it is what other users see as the latest version of that data. Having this in place eliminates inconsistencies between what different users read.

In short, a database is consistent when every operation passes its integrity checks before it is applied. With that covered, let's discuss the database isolation principle.

What is the Database Isolation Principle?

Why should we isolate a database and how does one make a database operation independent from other database operations?

Isolation is necessary in a database management system to ensure that the user's access to information on the database is not interfered with by other concurrent transactions undertaken by other users on the database. To enforce this, the use of isolation levels in each database operation helps to preserve information integrity.

To effectively guarantee database integrity, specific isolation levels must be used. Here are the standard isolation levels, ordered from least to most strict:

Read uncommitted
Read committed
Repeatable read
Serializability

Read Uncommitted Isolation Level

The read uncommitted isolation level lets a transaction read changes made by other transactions that have not yet been committed. Reading such uncommitted data is called a dirty read, and it is one of the data inconsistencies this level permits. For that reason, this isolation level isn’t advised.

Read Committed Isolation Level

This isolation level prevents a transaction from reading changes that other transactions have not yet committed. Other users can't see or overwrite a transaction's changes until it has been committed.

Repeatable Read Isolation Level

This isolation level guarantees that if a transaction reads the same row twice, it gets the same values both times, even if other transactions modify and commit that row in between. It prevents both dirty reads and non-repeatable reads.

Serializability Isolation Level

This is the highest and strictest isolation level. Concurrent transactions produce the same result as if they had been executed one after another, in series. This prevents dirty reads, non-repeatable reads, and phantom reads.
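One way to compare the four levels is by the read anomalies each one still permits. The sketch below encodes the SQL standard's definitions as a lookup table. The permittedAnomalies and weakestLevelPreventing names are hypothetical. Real engines can be stricter than the standard requires, so check your database's documentation before relying on this table.

```typescript
type IsolationLevel =
    | "read uncommitted"
    | "read committed"
    | "repeatable read"
    | "serializable";
type Anomaly = "dirty read" | "non-repeatable read" | "phantom read";

// Anomalies each level still permits under the SQL standard's definitions.
const permittedAnomalies: Record<IsolationLevel, Anomaly[]> = {
    "read uncommitted": ["dirty read", "non-repeatable read", "phantom read"],
    "read committed": ["non-repeatable read", "phantom read"],
    "repeatable read": ["phantom read"],
    "serializable": [],
};

// Ordered from least to most strict.
const levelsInOrder: IsolationLevel[] = [
    "read uncommitted",
    "read committed",
    "repeatable read",
    "serializable",
];

// Pick the weakest level that rules out every anomaly the caller cares about.
function weakestLevelPreventing(unwanted: Anomaly[]): IsolationLevel {
    for (let i = 0; i < levelsInOrder.length; i++) {
        const allowed = permittedAnomalies[levelsInOrder[i]];
        const safe = unwanted.every((a) => allowed.indexOf(a) === -1);
        if (safe) return levelsInOrder[i];
    }
    return "serializable";
}

console.log(weakestLevelPreventing(["dirty read"]));          // read committed
console.log(weakestLevelPreventing(["non-repeatable read"])); // repeatable read
console.log(weakestLevelPreventing(["phantom read"]));        // serializable
```

Stricter levels usually cost more in locking or retries, so choosing the weakest level that still prevents the anomalies you care about is a common trade-off.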

Without these levels in place, inconsistent database mishaps such as dirty reads, non-repeatable reads, phantom reads and many others may be experienced. With this, let's move on to the last point about database durability and discuss it in detail.

What is the Database Durability Principle?

What does it mean for a database to be durable, and how do we ensure the durability of a database? Durability is the principle that once a transaction has been committed, its changes are permanent.

Irrespective of any adverse outcomes that the database management system might face such as outages and crashes, there shouldn't be any loss of database information.

How do databases try to achieve this? The database records each change in a transaction log on durable storage before the change is considered committed. After a crash or outage, it replays the log to restore every committed change, which protects against data loss.
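The recovery step can be sketched in a few lines. The LogEntry type and the recover function below are hypothetical, and an in-memory array stands in for a log file on disk. The point is only that writes from transactions without a commit record are ignored when the state is rebuilt.

```typescript
// Hypothetical log format: writes are tagged with a transaction id,
// and a separate record marks the transaction as committed.
type LogEntry =
    | { kind: "write"; txId: number; key: string; value: string }
    | { kind: "commit"; txId: number };

// Rebuild the state after a crash by replaying only committed writes, in log order.
function recover(log: LogEntry[]): { [key: string]: string } {
    const committed: { [txId: number]: boolean } = {};
    for (let i = 0; i < log.length; i++) {
        const entry = log[i];
        if (entry.kind === "commit") committed[entry.txId] = true;
    }

    const state: { [key: string]: string } = {};
    for (let i = 0; i < log.length; i++) {
        const entry = log[i];
        if (entry.kind === "write" && committed[entry.txId]) {
            state[entry.key] = entry.value;
        }
    }
    return state;
}

const walEntries: LogEntry[] = [
    { kind: "write", txId: 1, key: "title", value: "Hello" },
    { kind: "commit", txId: 1 },
    { kind: "write", txId: 2, key: "title", value: "Half-finished" },
    // crash here: transaction 2 never committed
];

const recovered = recover(walEntries);
console.log(recovered["title"]); // Hello
```

The committed value survives the simulated crash, while the uncommitted write from transaction 2 is discarded.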

We'll also highlight other helpful database operations best practices that can also be implemented.

Other Database Operations Best Practices

One of these is the BASE principle, which is better suited to NoSQL databases such as MongoDB, Redis, and Cassandra. It requires a database to be:

Basically available
Existing in a soft state
And be eventually consistent.

Basically Available

This means that the database prioritizes availability over immediate consistency: it keeps responding to requests even when some nodes are unreachable or hold stale data. This suits distributed systems that must stay responsive at all times.

Soft State

Soft state means that the data in the system may change over time even without new input, because updates are still propagating between nodes. Applications therefore can't assume that a value they have just read is final.

Eventually Consistent

This means that, regardless of the order in which updates reach each node, all nodes eventually converge to the same state once new writes stop arriving. This is achieved through conflict resolution and reconciliation, and it contributes to building a resilient data system.
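As an illustration of conflict resolution, here is a minimal sketch of a last-write-wins merge. This is one simple reconciliation strategy among many, not the method any particular database uses, and the Versioned type and mergeVersions function are hypothetical names.

```typescript
// Hypothetical record carrying the metadata needed to resolve conflicts.
interface Versioned {
    value: string;
    updatedAt: number; // logical or wall-clock timestamp of the write
    replicaId: string; // which node accepted the write
}

// Last-write-wins: keep the newer write. Ties on the timestamp are broken
// by replica id so that every node picks the same winner.
function mergeVersions(a: Versioned, b: Versioned): Versioned {
    if (a.updatedAt !== b.updatedAt) {
        return a.updatedAt > b.updatedAt ? a : b;
    }
    return a.replicaId > b.replicaId ? a : b;
}

const fromReplicaA: Versioned = { value: "draft", updatedAt: 1, replicaId: "A" };
const fromReplicaB: Versioned = { value: "published", updatedAt: 2, replicaId: "B" };

// The merge gives the same answer no matter which side is applied first.
console.log(mergeVersions(fromReplicaA, fromReplicaB).value); // published
console.log(mergeVersions(fromReplicaB, fromReplicaA).value); // published
```

Because the merge result does not depend on argument order, two nodes that exchange their copies end up with the same value, which is the convergence that eventual consistency promises.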

Conclusion

With this, we have come to the end of the tutorial. We hope you’ve learned the essentials of keeping database operations reliable and efficient using the ACID principle and other best practices.

Feel free to drop comments and questions in the box below, and also check out my other articles here. Till next time, keep on coding!

How to Run a Postgres Database in Azure Kubernetes Service and Integrate it with a Node.js Express Application

Ayomide Wilfred Adeyemi — Wed, 08 May 2024 20:43:04 +0000

Hey everyone! Today, you're going to learn about deploying a Postgres container in Azure Kubernetes Service (AKS) and connecting it to a Node.js application.

In this fast-paced development landscape, deploying via containers, particularly with Kubernetes, is becoming increasingly popular. Some companies perform numerous deployments daily, so it's crucial for you to learn these technologies.

Kubernetes is a popular choice to deploy containerized applications like web servers, databases, and APIs. You can set up Kubernetes either locally or in the cloud. In this tutorial, we'll explore setting up Kubernetes on a cloud platform, specifically Azure.

I'll walk you through the process of setting up Kubernetes using Azure Kubernetes Service (AKS). You'll configure your YAML file using StatefulSet, Persistent Volume, and Services to deploy a PostgreSQL database on Azure Kubernetes. Then, you'll obtain the PostgreSQL database credentials running inside the AKS and use them to establish a connection with a Node.js application.

We'll cover key concepts such as deployment, stateful sets, persistent volumes, and services, preparing you to deploy a Postgres container effectively on AKS. I'll also help you connect your Node.js Express app to the Postgres container within the AKS cluster.

So find a comfortable seat and get ready, as we're about to dive in.

Prerequisites

Before you begin, it's important to understand some basic concepts in Kubernetes like pods, deployments, services, and nodes.

If you're new to this, I recommend checking out the Stashchuk freeCodeCamp video for a beginner-friendly tutorial.

You'll also need an active Azure account and subscription to follow along.

Challenges We're Trying to Solve
– Deployments
– StatefulSets
– Persistent Volumes
Azure Kubernetes Service (AKS)
– Sign in to Your Azure Portal
– Create a Resource
– Create a new container
– Create a new Azure Kubernetes Service(AKS)
– Create a new resource group
– Give your Kubernetes cluster a name
– Navigate to the node pool page
– Enable container logs and set up alerts
– Advanced Section
– Tags
Connect to Your AKS Cluster
– Download Azure CLI and kubectl
– Verify if Azure CLI is installed
– Verify if kubectl is installed
– Login to Azure account
– Configure kubectl
How to Create Resources with YAML
– Clone the Repository
– Open the Repository
– Install Dependencies
YAML Configuration
– StorageClass
– PersistentVolumeClaim
– ConfigMap
– StatefulSet
– Service
How to Deploy YAML Resource to Azure
– Deploy the YAML resource
Node.js Application
– Configure Nodejs
– Run Nodejs Application
– Test the Application
– Open Postman
– Confirm the Data
– Delete Pod
– Data Persistence
Conclusion

Challenges We're Trying to Solve

Firstly, what is Kubernetes? Well, it's like a manager for your software containers. It helps you run and manage lots of containers like web servers, databases, microservices, and APIs which are like little packages holding your applications.

Kubernetes takes care of things like starting, stopping, and scaling these containers, so your apps run smoothly even when there is more load on your application. It's popular because it makes running software in the cloud easier and more reliable.

Now, let's talk about how to tackle some challenges you might face with a real-world application running Postgres in a Kubernetes production cluster.

Imagine that the infrastructure hosting your Postgres crashes, causing you to lose all the services and data stored in the database. Or, picture a scenario where the Postgres database becomes corrupted, leading to data loss.

In both cases, you need a way to back up your application so you can restore it to a working state if disaster strikes.

So, how do you capture a comprehensive application backup that includes all the necessary data? This backup should allow you to restore the entire application, including the database, if you lose your cluster or encounter data loss.

In Kubernetes, think of a Pod as the tiniest unit that you can deploy. It's like a small box that holds one thing, like a web server or a database. So, if your Pod isn't running, your web server or database isn't either.

This means that if the cluster where your Pod runs gets destroyed, all the data in the Pod disappears too. All the nodes (virtual machines that run your application over the network) will also be wiped out.

How can you make a pod stay on one specific node where the data is and never move? And how can you make sure that each pod can be found separately when you're using a load balancer?

One solution is to consider how you deploy your application on Kubernetes. Typically, you create a deployment and expose it using a service, specifying the service type as either ClusterIP, NodePort, or LoadBalancer.

But not all applications are the same when it comes to state. Some applications, known as stateless applications, don't rely on storing data locally, so losing their state isn't a big issue.

But for applications like databases or caches, maintaining state is crucial because they rely on storage. In Kubernetes, deploying stateful applications like databases using just deployment isn't ideal. You need a solution that ensures your application's data is safely stored and can be recovered in case of failure.

Deployments

You might be wondering why we can't just use a Kubernetes deployment to deploy Postgres in the Kubernetes cluster? Well, the thing is, many people aren't aware of the difference between a deployment and a stateful set.

Let's imagine you have a pod running in your cluster that you created using a deployment. Then you scaled up to two pods, so you now have Pod A and Pod B.

The problem arises because pods created as part of the same deployment come from a single pod template, so they all reference the same persistent volume claim and therefore share the same persistent volume (PV). So, when you scale up, both instances of Postgres would write to the same storage, which could lead to data corruption.

Another issue arises from a networking perspective. Pods A and B don't have a dependable way to communicate with each other over the network. By default, Kubernetes pods don't have stable DNS names of their own. Instead, you rely on services to expose ports to other applications in the cluster.

If you take a closer look at pod names, you'll notice that pods are assigned a random hash at the end of their names. Because of this, pods lack a consistent network identity. Every time a pod is destroyed and recreated, it receives a new randomized name. This inconsistency isn't ideal for reliable networking.

Postgres isn't naturally made for Kubernetes, and Kubernetes can be tough when handling stateful tasks. To set up a Postgres instance, you've got to know the right Kubernetes setup. You can't just throw it in a pod, because if the pod goes down, so does your data. But, for a quick integration, a pod could work fine.

Deployments aren't ideal either, since you don't want your pod randomly placed on a node. But for testing, deployments are handy if you just need a Postgres instance to run temporarily.

What you really want is a pod that sticks to a particular node where your data resides, and stays put. Plus, you also want your pod to be individually addressable. For this, we need what's called a StatefulSet.

StatefulSets

When you update your deployment to become a StatefulSet, Kubernetes introduces some improvements for deploying stateful workloads. One major change is how it handles scaling.

If you specify that you want three replicas of your StatefulSet, Kubernetes won't create all three pods at once. Instead, it creates them one by one. Each pod gets its own unique DNS name, which starts with the StatefulSet's name followed by an ordinal number starting from zero. So, when you scale up, the ordinal number increases for each new pod.

Here's the cool part: if a pod like Pod-0 is destroyed and needs to be remade, it will return with the same name. This means each pod has a specific address, even if it's replaced.
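The naming scheme can be sketched in a few lines. This assumes the StatefulSet is backed by a headless governing Service and that the cluster uses the default cluster.local domain; the function name statefulSetPods is made up for illustration.

```javascript
// Sketch of how StatefulSet pod names and DNS names are derived.
// Assumes a headless governing Service and the default cluster domain.
function statefulSetPods(name, serviceName, namespace, replicas) {
  const pods = [];
  for (let ordinal = 0; ordinal < replicas; ordinal++) {
    const podName = `${name}-${ordinal}`;
    pods.push({
      podName,
      dnsName: `${podName}.${serviceName}.${namespace}.svc.cluster.local`,
    });
  }
  return pods;
}

const pods = statefulSetPods("postgres", "postgres", "database", 3);
console.log(pods.map((p) => p.podName).join(", "));
// postgres-0, postgres-1, postgres-2
console.log(pods[0].dnsName);
// postgres-0.postgres.database.svc.cluster.local
```

Since the name depends only on the StatefulSet name and the ordinal, a recreated pod comes back with the same identity.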

And here's another cool feature: each pod in a StatefulSet can get its own persistent volume (PV) when the StatefulSet defines volumeClaimTemplates. This lets you keep the same storage even if you scale up or down. This brings us to another concept called persistent volumes.

Persistent Volumes

Let's forget about pods, deployment, and containers for a moment. What exactly is "state"? In simple terms, state is the data that your applications need to work properly.

Now, when we talk about processes, there are two types: stateless and stateful. Stateless processes don't rely on any data to work. They just do their thing without needing any specific information. On the other hand, stateful processes need data or state to function properly.

Now, where do you store this state? There are two main places: memory and disk. Memory allows for quick access to data, which is great for applications like Redis, MongoDB, Postgres, or MySQL, so they keep working data in memory. But for persistence, they store it on disk, on the file system.

Why the file system? Because it's the only way to keep the state persistent even when the system reboots. So, when a process dies and gets recreated, it can read its state from the file system.

I like breaking things down because I used to teach tech stuff. Now, let's get into setting up Kubernetes in Azure.

Azure Kubernetes Service (AKS)

In this section, I'll guide you through setting up a Kubernetes cluster on Azure.

Step 1: Sign in to your Azure portal

To begin, sign in to your Azure portal. Once logged in, you should see a dashboard similar to this:

Azure portal homepage

Step 2: Create a resource

Click on "create a resource" to create a resource.

Resources are the various services, components, and assets that you can create and manage within the Azure cloud platform. These resources can include virtual machines, databases, storage accounts, networking components, web applications, and more.

Creating a resource in Azure portal

Step 3: Create a new container

Next, navigate to the "Containers" category from the options available on the left pane. Click on Containers as shown by the arrow in the screenshot.

Again, Kubernetes is a container orchestration platform. It manages and orchestrates the deployment, scaling, and operation of application containers across clusters of machines. Kubernetes provides a framework for automating the deployment, scaling, and management of containerized applications.

Creating new container (Kubernetes) in Azure

Step 4: Create a new Azure Kubernetes Service (AKS)

Select "Azure Kubernetes Service (AKS)" from the list of available container services and click Create. This will take you to the AKS creation page.

Creating a new Azure Kubernetes service

Step 5: Create a new resource group

In the "Resource group" section, click on "Create new" to create a new resource group for your Azure Kubernetes Service (AKS) deployment.

In Azure, a "resource group" is a logical container used to group together related Azure resources. It serves as a way to organize and manage these resources collectively, rather than individually.

When you create resources such as virtual machines, databases, storage accounts, or any other Azure service, you typically associate them with a resource group.

Creating a new resource group in azure portal

Let's name the resource group "AZURE-POSTGRES-RG" as shown below. You can name it anything you like. Then click ok.

Inputting name for the resource group

Step 6: Give your Kubernetes cluster a name

Now, in the cluster details section, fill in the "Kubernetes cluster name" field.

In Azure, a Kubernetes cluster is a managed container orchestration service provided by Azure Kubernetes Service (AKS). It allows you to deploy, manage, and scale containerized applications using Kubernetes without having to manage the underlying infrastructure.

Give it a name like "AZURE-POSTGRES-KC" and select a region that's close to you. In my case, I selected (Asia Pacific) East Asia. Then click "Next."

Naming the Kubernetes cluster name

Step 7: Navigate to the node pool page

Now it's time to configure the node pool section by clicking on agentpool.

In Azure, a node pool is a group of virtual machines (VMs) that are provisioned and managed together within an Azure Kubernetes Service (AKS) cluster. Each node pool runs a specific version of Kubernetes and has its own set of configurations, such as VM size, OS image, and node count.

Editing agentpool

Set the minimum node count to 1, maximum node count to 2, and the maximum pods per node to 30 to minimise cost. Then click update.

These parameters help control the size and behavior of the node pool in an Azure Kubernetes Service (AKS) cluster:

Minimum Node Count: Ensures a minimum number of nodes are always available for consistent performance and availability, even during low-demand periods.
Maximum Node Count: Sets an upper limit on the number of nodes in the node pool to manage costs and prevent over-provisioning.
Maximum Pods per Node: Defines the maximum number of pods that can run on each node, optimizing resource utilization and preventing overcrowding.
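To see what these limits mean in numbers, here is a small arithmetic sketch using the values chosen above. The helper names clampNodeCount and maxSchedulablePods are made up for illustration and are not part of any Azure API.

```javascript
// Illustrative arithmetic for the node pool settings above.
// clampNodeCount and maxSchedulablePods are hypothetical helper names.
const nodePool = { minCount: 1, maxCount: 2, maxPodsPerNode: 30 };

// The node count never goes below minCount or above maxCount.
function clampNodeCount(desired, pool) {
  return Math.min(pool.maxCount, Math.max(pool.minCount, desired));
}

// Upper bound on pods the pool can hold at a given node count
// (system pods count toward this limit too).
function maxSchedulablePods(nodeCount, pool) {
  return nodeCount * pool.maxPodsPerNode;
}

console.log(clampNodeCount(0, nodePool)); // 1
console.log(clampNodeCount(5, nodePool)); // 2
console.log(maxSchedulablePods(2, nodePool)); // 60
```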

Updating agentpool details

Once you've clicked "Update," you'll be directed to the "Networking" section as shown below. Keep the page as is and proceed by clicking "Next." This will take you to the Integrations section.

Navigating to Next Page

Azure Container Registry (ACR) is a fully managed private Docker registry service provided by Microsoft Azure. It enables developers to store, manage, and deploy Docker container images securely within their Azure environment.

You will need a place to store the Docker image that's pulled.

To begin, select "Create New" to set up a new container registry. This action will bring up a page where you can input the necessary details, as illustrated on the right side of the image below. Enter the details as indicated by the arrows and then click "Okay." Once you're done, proceed by clicking "Next."

Naming and editing Azure Container Registry details

Step 8: Enable container logs and set up alerts

The Enable Container Logs option allows you to turn on logging for your containers. Logging records important information about what's happening inside your containers, like errors, warnings, and other events. It's useful for troubleshooting and monitoring your applications.

Choosing container logs

Step 9: Advanced section

Keep the Advanced section unchanged and proceed by clicking "Next."

Navigating to Next Page

Step 10: Tags

Keep the Tags section unchanged and proceed by clicking "Next."

Navigating to Next Page

Step 11: Click "Review + create" to finalize the deployment

Once completed, your resource group, Azure Kubernetes Service (AKS), Azure Container Registry, and Kubernetes cluster will be created.

Completing Azure Kubernetes Setup

The screenshot below shows that the deployment was successful.

Successful Deployment

You've just successfully created an Azure Kubernetes Service from the Azure portal. Congrats!

How to Connect to Your AKS Cluster Using the Command Line

After successfully creating a new AKS in the Azure portal, the next step is to establish a connection to that cluster.

In this section, I'll guide you through Azure login, configuring kubectl to use the current context, and creating the YAML file for our Postgres container. This file will define a StatefulSet, persistent volume, persistent volume claim, and config map, and it will use Azure Disk for data storage.

I'll also show you how to run a Node.js Express application locally, use Postman to test the endpoints, and receive a response confirming that data was sent to the database successfully.

Download Azure CLI and kubectl

To start, you'll need to download the Azure CLI and kubectl.

Azure CLI (Command-Line Interface): a command-line tool provided by Microsoft for managing Azure resources. It allows users to interact with Azure services and resources directly from the command line, making it easy to automate tasks, create scripts, and manage Azure resources programmatically.
kubectl: a command-line tool for managing Kubernetes clusters, used to deploy, scale, and manage containerized applications. It allows users to perform operations like deploying applications, managing pods, services, and deployments, inspecting cluster resources, scaling applications, and debugging issues, simplifying management of containerized workloads in a Kubernetes environment.

I'm using the warp terminal. Warp is the terminal reimagined with AI and collaborative tools for better productivity. You can run the command using PowerShell on Windows or Terminal on Mac. I'm using a MacBook.

Verify if the Azure CLI is installed by typing the command `az --version`

Once the download finishes, verify whether Azure CLI is installed on your computer by running the command az --version. If the installation is successful, you should see an output similar to this:

Verifying Azure CLI Installation

Verify if kubectl is installed

To check if kubectl is installed, just type kubectl version in the command line.

Verifying Kubectl Installation

Enter az login in the command line. This will open your browser and prompt you to sign in to your Azure account.

Logging into Azure

After signing in, it shows details about your Azure subscription, including the subscription name, ID, and user information.

Select an Azure subscription

Azure subscriptions are logical containers used to provision resources in Azure. You'll need to locate the subscription that you plan to use in this tutorial. Use the following command to list your Azure subscriptions:

az account list --output table

Use the following command to make sure you're using an Azure subscription that allows you to create resources for this tutorial, substituting your subscription name or ID:

az account set --subscription "Name of the subscription"

Configure kubectl to connect to your Azure Kubernetes

Replace [Your_Azure_Resource_groups_name] in the command below (including the square brackets) with the name you chose when creating a resource group. Also, replace [your_azure_kubernetes_service_name] with the name of your Kubernetes cluster. Then, execute the following command:

az aks get-credentials --resource-group [Your_Azure_Resource_groups_name] --name [your_azure_kubernetes_service_name]

The output should look like this:

Merging kubectl with Azure Kubernetes Service

Verify if kubectl has been merged successfully

Run the following command kubectl get nodes:

Verifying if merged is successful

When you run this command, Kubernetes communicates with the cluster's control plane to fetch a list of all the nodes that are part of the cluster you created. As you can see, this is the node that was running in the Kubernetes cluster we created inside Azure.

Virtual Node running in AKS cluster

Run the command `kubectl get pods`

Displaying Pod Information

When you run the command kubectl get pods, Kubernetes attempts to retrieve information about all pods within the default namespace of your cluster. But in this case, the output indicates that there are no resources (pods) found within the default namespace, implying that no pods currently exist in that namespace.

A namespace in Kubernetes is a virtual cluster environment within which resources like pods, services, and deployments are organized and isolated. It's a way to divide cluster resources between multiple users, teams, or projects. Namespaces provide a scope for names and make it easier to manage and control access to resources.

By default, Kubernetes starts with a "default" namespace, but you can create additional namespaces to organize and manage resources more effectively. Namespaces help prevent naming conflicts and provide a logical separation of resources, allowing different teams or projects to work independently within the same Kubernetes cluster.

Create a namespace

Creating Namespace

When you run the command kubectl create namespace database, Kubernetes creates a new namespace named "database." The output "namespace/database created" confirms that the namespace has been successfully created.

You can now use this namespace to organize and manage resources related to databases within the Kubernetes cluster.

Confirm the namespace

The command kubectl get namespace lists all namespaces in the Kubernetes cluster including the database namespace we just created, showing their names, status (active), and age.

Confirming namespace

Get pod information in database namespace

Displaying Pod Information related to database namespace

This command, kubectl get pods -n database, attempts to fetch information about pods specifically within the "database" namespace. But the output No resources found in database namespace indicates that there are currently no pods deployed in the "database" namespace.

How to Create Resources with YAML

Let's explore creating resources with YAML to provision our PostgreSQL database running in an Azure Kubernetes cluster. But first, what exactly is YAML?

Kubernetes YAML files are configuration files written in YAML (YAML Ain't Markup Language). They define how Kubernetes resources like pods, deployments, and services should be set up within a cluster. These files are easy to read and specify details like resource names, types, specifications, labels, and annotations. They're crucial for deploying applications and infrastructure on Kubernetes clusters.

YAML is what you will use to create Kubernetes resources that will run Postgres.

First, you need to clone this GitHub repository. Inside, you'll find a Node.js Express application and a YAML file. The Node.js app allows users to register with their email, password, and full name, and also enables them to log in by verifying their details in the database. If their details are found, it displays a success message.

Clone the repository

Create a new folder on your computer and then clone this repository into it.

Open your terminal or PowerShell, go to the folder you want, and use the command below to clone the repository into your computer in that location.

git clone https://github.com/ayowilfred95/Azure-k8s-postgres.git

Open the cloned repository in any text editor

I'm using Visual Studio Code, but feel free to use any text editor you prefer. Here's the structure of the project:

Project folder structure

Install project dependencies

Open the terminal in VS Code and go to the main directory of the project. Next, execute the command npm install to install all the required packages and dependencies for the project:

npm install

Since the backend application is a Node.js Express app, you use npm to install dependencies (similar to running mvn clean install in a Java project).

After the dependencies are installed, open the file named "postgres.yaml". It holds all the YAML configurations required to set up your PostgreSQL database that will run in the Kubernetes cluster.

YAML Configuration

In the postgres.yaml file, there are five configurations separated by ---. This "---" line is the YAML document separator, and you need it whenever you declare more than one Kubernetes resource in the same file. If you forget it, you'll encounter an error.
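If you want to check which resources a multi-document file declares, a naive sketch like the one below can help. It only splits on separator lines and reads the top-level kind field with a regular expression, so treat it as an illustration rather than a real YAML parser. The function name listKinds and the shortened sample are assumptions.

```javascript
// Naive sketch: split a multi-document YAML string on "---" lines and
// read each document's top-level "kind". A real YAML parser should be
// used for anything beyond illustration.
function listKinds(yamlText) {
  return yamlText
    .split(/^---\s*$/m)                 // document separators on their own line
    .map((doc) => doc.match(/^kind:\s*(\S+)/m))
    .filter(Boolean)                    // skip empty or kind-less documents
    .map((match) => match[1]);
}

const sample = [
  "kind: StorageClass",
  "apiVersion: storage.k8s.io/v1",
  "---",
  "apiVersion: v1",
  "kind: PersistentVolumeClaim",
  "---",
  "apiVersion: v1",
  "kind: ConfigMap",
].join("\n");

console.log(listKinds(sample).join(", "));
// StorageClass, PersistentVolumeClaim, ConfigMap
```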

StorageClass

The first one is the StorageClass. This YAML configuration defines a StorageClass in Kubernetes for managing storage resources.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azuredisk-premium-retain
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Retain   # Retain or Delete
volumeBindingMode: WaitForFirstConsumer   # WaitForFirstConsumer or Immediate
allowVolumeExpansion: true    # true or false
parameters:
  storageaccounttype: Premium_LRS   # Premium or Standard
  kind: Managed

Let's break down what each part means:

kind: StorageClass: Indicates the type of Kubernetes resource being defined, which is a StorageClass. A StorageClass defines the class of storage offered by a cluster.
apiVersion: storage.k8s.io/v1: Specifies the Kubernetes API version being used for this resource.
metadata: name: azuredisk-premium-retain: Provides metadata for the StorageClass, including its name, which in this case is "azuredisk-premium-retain".
provisioner: kubernetes.io/azure-disk: Specifies the provisioner responsible for provisioning storage. In this case, it's "kubernetes.io/azure-disk", indicating that Azure Disk will be used as the storage provisioner.
reclaimPolicy: Retain: Defines the reclaim policy for the storage resources. It specifies what action should be taken when the associated persistent volume is released. Here, it's set to "Retain", meaning the volume is retained even after it's no longer used by a pod.
volumeBindingMode: WaitForFirstConsumer: Specifies the volume binding mode, which determines when volume binding should occur. In this case, it's set to "WaitForFirstConsumer", meaning the volume will be bound when the first pod using it is created.
allowVolumeExpansion: true: Indicates whether volume expansion is allowed. Setting it to "true" means that the size of the volume can be increased if needed.
parameters: Contains additional parameters specific to the provisioner. Here, it specifies the storage account type as "Premium_LRS" and the kind of storage as "Managed".

Overall, this configuration sets up a StorageClass named "azuredisk-premium-retain" using Azure Disk as the provisioner, with specific policies and parameters tailored for Azure storage.

PersistentVolumeClaim

The second configuration in the postgres.yaml file is the persistent volume claim.

This YAML configuration defines a PersistentVolumeClaim (PVC) in Kubernetes, which is used to request storage resources.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk-pvc
spec:
  accessModes:
  - ReadWriteOnce   # ReadWriteOnce, ReadOnlyMany or ReadWriteMany
  storageClassName: azuredisk-premium-retain
  resources:
    requests:
      storage: 4Gi

Let's break down what each part means:

apiVersion: v1: Specifies the Kubernetes API version being used for this resource.
kind: PersistentVolumeClaim: Indicates the type of Kubernetes resource being defined, which is a PersistentVolumeClaim. A PVC is used by pods to request storage resources.
metadata: name: azure-managed-disk-pvc: Provides metadata for the PersistentVolumeClaim, including its name, which is "azure-managed-disk-pvc".
spec: Describes the desired state of the PersistentVolumeClaim.
accessModes: - ReadWriteOnce: Specifies the access mode for the volume. Here, it's set to "ReadWriteOnce", meaning the volume can be mounted as read-write by a single node at a time.
storageClassName: azuredisk-premium-retain: Specifies the StorageClass to use for provisioning the volume. This PVC will use the StorageClass named "azuredisk-premium-retain" defined previously.
resources: requests: storage: 4Gi: Specifies the desired storage capacity for the volume. Here, it requests 4 gibibytes (Gi) of storage.

Overall, this configuration sets up a PersistentVolumeClaim named "azure-managed-disk-pvc" requesting storage resources with specific access modes, storage class, and storage capacity.
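The Gi suffix is a binary unit (powers of 1024), which is easy to confuse with the decimal G suffix. The sketch below converts a few common suffixes to bytes to show the difference. It covers only a subset of the suffixes Kubernetes accepts, and quantityToBytes is a made-up helper name.

```javascript
// Sketch: convert a Kubernetes storage quantity such as "4Gi" to bytes.
// Only a few common suffixes are handled here.
const MULTIPLIERS = {
  Ki: 1024,
  Mi: 1024 ** 2,
  Gi: 1024 ** 3,
  Ti: 1024 ** 4,
  M: 1000 ** 2,
  G: 1000 ** 3,
  T: 1000 ** 4,
};

function quantityToBytes(quantity) {
  const match = /^(\d+)(Ki|Mi|Gi|Ti|M|G|T)?$/.exec(quantity);
  if (!match) throw new Error(`Unsupported quantity: ${quantity}`);
  const [, amount, suffix] = match;
  return Number(amount) * (suffix ? MULTIPLIERS[suffix] : 1);
}

console.log(quantityToBytes("4Gi")); // 4294967296
console.log(quantityToBytes("4G"));  // 4000000000
```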

ConfigMap

The third configuration in the postgres.yaml file is the config map. This YAML configuration defines a ConfigMap in Kubernetes, which is used to store configuration data in key-value pairs.

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: freecodecamp
  POSTGRES_USER: freecodecamp1
  POSTGRES_PASSWORD: freecodecamp@
  PGDATA: /var/lib/postgresql/data/pgdata

Let's break down what each part means:

apiVersion: v1: Specifies the Kubernetes API version being used for this resource.
kind: ConfigMap: Indicates the type of Kubernetes resource being defined, which is a ConfigMap. A ConfigMap is used to store non-confidential data in key-value pairs.
metadata: name: postgres-config: Provides metadata for the ConfigMap, including its name, which is "postgres-config".
labels: app: postgres: Labels are key-value pairs used to organize and select resources. Here, a label "app" with the value "postgres" is applied to the ConfigMap.
data: Contains the key-value pairs of configuration data.
POSTGRES_DB: freecodecamp: Specifies the name of the PostgreSQL database as "freecodecamp".
POSTGRES_USER: freecodecamp1: Specifies the username for accessing the PostgreSQL database as "freecodecamp1".
POSTGRES_PASSWORD: freecodecamp@: Specifies the password for accessing the PostgreSQL database as "freecodecamp@".
PGDATA: /var/lib/postgresql/data/pgdata: Specifies the location of PostgreSQL data directory as "/var/lib/postgresql/data/pgdata".

Overall, this configuration sets up a ConfigMap named "postgres-config" containing key-value pairs of configuration data, such as database name, username, password, and data directory location, which can be used by other Kubernetes resources.
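These values are what a client needs in order to connect. As a rough sketch, here is how they could be combined into a PostgreSQL connection URI. The host "localhost" is only a placeholder, and buildConnectionUri is a made-up helper, not code from the repository. Note that the password contains an "@", which has to be percent-encoded inside a URI.

```javascript
// Sketch: build a PostgreSQL connection URI from the ConfigMap values.
// The host and port arguments are placeholders, not values from the article.
function buildConnectionUri(env, host, port = 5432) {
  // Percent-encode credentials so characters like "@" don't break the URI.
  const user = encodeURIComponent(env.POSTGRES_USER);
  const password = encodeURIComponent(env.POSTGRES_PASSWORD);
  return `postgresql://${user}:${password}@${host}:${port}/${env.POSTGRES_DB}`;
}

const env = {
  POSTGRES_DB: "freecodecamp",
  POSTGRES_USER: "freecodecamp1",
  POSTGRES_PASSWORD: "freecodecamp@",
};

console.log(buildConnectionUri(env, "localhost"));
// postgresql://freecodecamp1:freecodecamp%40@localhost:5432/freecodecamp
```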

Note: It's recommended to avoid hardcoding sensitive values such as POSTGRES_PASSWORD in a ConfigMap and to store them in a Kubernetes Secret instead. For the sake of simplicity in this tutorial, we'll keep them hardcoded.
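If you later move the password out of the ConfigMap, the values in a Secret's data field must be base64-encoded. The sketch below builds such a manifest as a plain object. The name "postgres-secret" and the helper toSecretManifest are hypothetical and do not exist in the repository.

```javascript
// Sketch: build a Kubernetes Secret manifest object with base64-encoded data.
// "postgres-secret" is a hypothetical name, not a resource from the repository.
function toSecretManifest(name, values) {
  const data = {};
  for (const [key, value] of Object.entries(values)) {
    // Secret "data" values must be base64-encoded strings.
    data[key] = Buffer.from(value, "utf8").toString("base64");
  }
  return { apiVersion: "v1", kind: "Secret", metadata: { name }, type: "Opaque", data };
}

const secret = toSecretManifest("postgres-secret", {
  POSTGRES_PASSWORD: "freecodecamp@",
});

console.log(JSON.stringify(secret, null, 2));
// Decoding recovers the original value, so base64 is encoding, not encryption.
console.log(Buffer.from(secret.data.POSTGRES_PASSWORD, "base64").toString("utf8"));
// freecodecamp@
```

A pod can then load such a Secret as environment variables with a secretRef entry under envFrom, the same way configMapRef is used for a ConfigMap.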

StatefulSet

The fourth configuration is the stateful set. This YAML configuration defines a StatefulSet in Kubernetes, which is used to manage stateful applications like databases.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.4
          imagePullPolicy: "IfNotPresent"
          ports:
          - containerPort: 5432
          envFrom:
          - configMapRef:
              name: postgres-config
          volumeMounts:
          - name: azure-managed-disk-pvc
            mountPath: /var/lib/postgresql/data
      volumes:
      - name: azure-managed-disk-pvc
        persistentVolumeClaim:
          claimName: azure-managed-disk-pvc

Let's break down what each part means:

apiVersion: apps/v1: Specifies the Kubernetes API version being used for this resource.
kind: StatefulSet: Indicates the type of Kubernetes resource being defined, which is a StatefulSet. StatefulSets are used to manage stateful applications by providing a unique, stable network identity to each pod.
metadata: name: postgres: Provides metadata for the StatefulSet, including its name, which is "postgres".
spec: Describes the desired state of the StatefulSet.
serviceName: postgres: Specifies the name of the Kubernetes service that will be used to access the StatefulSet pods.
selector: matchLabels: app: postgres: Selects the pods controlled by this StatefulSet based on the label "app: postgres".
replicas: 1: Specifies the desired number of replicas (instances) of the StatefulSet, which is 1 in this case.
template: Defines the pod template used to create pods managed by the StatefulSet.
metadata: labels: app: postgres: Labels applied to the pods created from this template.
spec: Describes the specification of the containers within the pod.
containers: Specifies the containers running in the pod.
name: postgres: Defines the name of the container as "postgres".
image: postgres:10.4: Specifies the Docker image used for the container, which is "postgres:10.4".
imagePullPolicy: "IfNotPresent": Specifies the policy for pulling the container image, which is "IfNotPresent", meaning it will only pull the image if it's not already present on the node.
ports: containerPort: 5432: Specifies the port that the PostgreSQL service inside the container is listening on.
envFrom: configMapRef: name: postgres-config: Injects environment variables from the ConfigMap named "postgres-config" that you defined earlier.
volumeMounts: name: azure-managed-disk-pvc mountPath: /var/lib/postgresql/data: Mounts the persistent volume claim named "azure-managed-disk-pvc" to the container at the specified path.
volumes: name: azure-managed-disk-pvc persistentVolumeClaim: claimName: azure-managed-disk-pvc: Defines the persistent volume claim named "azure-managed-disk-pvc" to be used by the pod.

Overall, this configuration sets up a StatefulSet named "postgres" with one replica, running a PostgreSQL container with specific settings and mounted persistent storage.

### Service

The fifth configuration is the **service**. This YAML configuration defines a **Service** in Kubernetes, which is used to expose the `StatefulSet` we declared earlier as a network service.

```bash
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  type: LoadBalancer
  selector:
    app: postgres
  ports:
    - protocol: TCP
      name: https
      port: 5432
      targetPort: 5432
```

Let's break down what each part means:

apiVersion: v1: Specifies the Kubernetes API version being used for this resource.
kind: Service: Indicates the type of Kubernetes resource being defined, which is a Service. Services allow pods to be accessed by other pods or external users.
metadata: name: postgres: Provides metadata for the Service, including its name, which is "postgres".
labels: app: postgres: Labels are key-value pairs used to organize and select resources. Here, a label "app" with the value "postgres" is applied to the Service.
spec: Describes the desired state of the Service.
type: LoadBalancer: Specifies the type of Service, which is "LoadBalancer". This type allows the Service to be exposed externally with a cloud provider's load balancer.
selector: app: postgres: Selects the pods controlled by the Service based on the label "app: postgres".
ports: Specifies the ports that the Service will listen on.
protocol: TCP: Specifies the protocol used for the port, which is TCP.
name: https: Specifies a name for the port, which is "https".
port: 5432: Specifies the port number on which the Service will listen, which is 5432.
targetPort: 5432: Specifies the target port on the pods to which traffic will be forwarded, which is also 5432. This means that traffic received on port 5432 of the Service will be forwarded to port 5432 on the pods.

Overall, this configuration sets up a Service named "postgres" with a LoadBalancer type, forwarding traffic on port 5432 to pods labeled with "app: postgres".
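
To make the selector idea concrete, here is a minimal TypeScript sketch of equality-based label matching, the rule the Service uses to find the "app: postgres" pods. The `matchesSelector` helper and the "web-0" pod are hypothetical names for illustration. This is not how Kubernetes is implemented, only the matching rule in miniature.

```typescript
// Labels and selectors are plain key/value maps, as in the YAML above.
type Labels = Record<string, string>;

// A pod matches an equality-based selector when every selector key
// is present on the pod with exactly the same value.
function matchesSelector(selector: Labels, podLabels: Labels): boolean {
  return Object.entries(selector).every(
    ([key, value]) => podLabels[key] === value
  );
}

// Hypothetical pods: only the first carries the label the Service selects on.
const pods: { podName: string; labels: Labels }[] = [
  { podName: "postgres-0", labels: { app: "postgres" } },
  { podName: "web-0", labels: { app: "web" } },
];

// The selector from the Service definition.
const serviceSelector: Labels = { app: "postgres" };

// Names of the pods that would receive traffic sent to the Service.
const selected: string[] = pods
  .filter((pod) => matchesSelector(serviceSelector, pod.labels))
  .map((pod) => pod.podName);
```

Here `selected` contains only "postgres-0", which is why traffic on port 5432 reaches the database pod and nothing else.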

How to Deploy YAML Resource to Azure Kubernetes Service (AKS)

You've previously connected "kubectl" with the Azure Kubernetes Service (AKS) you set up. Let's double-check it.

In your VS Code terminal, rerun the command kubectl get nodes. You'll see an output like this, though your node's value will be different.

Displaying node information running in Azure Kubernetes cluster

Next, verify the namespace you previously created by executing the command: kubectl get namespace database. Your output should resemble this:

Retrieving Namespace Information

Deploy the YAML resource

Once you've confirmed everything is set, you can deploy the YAML resource. This will establish your PostgreSQL database in the Azure Kubernetes cluster you've configured.

Run the command below from the directory that contains the configuration file. I'm currently in the project's root directory (azure-k8s-postgres). To deploy the database, execute:

kubectl apply -n database -f postgres.yaml

Your output should look like this. This output confirms that all these components have been successfully created in Kubernetes.

Applying Configuration to Namespace

Execute the command below to verify that the pod is running:

kubectl get pods -n database

Your output should look like this:

Fetching Pods in Namespace

This output confirms that a pod named "postgres-0" is running in your Azure Kubernetes cluster. But the pod is not the only resource you created. As I said earlier, to connect to a pod you need what is called a Service. You declared a Service resource in the configuration file, and it has also been deployed to your cluster.

To get the status of the service, run this command:

kubectl get services -n database

Your output should look like this:

Retrieving Services in Namespace

This output displays the services in the "database" namespace, including a service named "postgres" with the type "LoadBalancer," its internal cluster IP, external IP, and port mappings. You'll utilize the external IP along with the Postgres port "5432" to connect your database with the Node.js application. Note that your external IP will differ from mine.

Node.js Application

In this section, I'll guide you through setting up your Node.js app to connect to a PostgreSQL database in your Azure Kubernetes Service.

We'll cover sending data into the database and retrieving it using Postman. Also, I'll demonstrate how to check if the data remains in the database even if the pod running PostgreSQL in the cluster is deleted.

Configure your Node.js application

Go to the database folder and open the database.js file. Replace the host with your EXTERNAL-IP obtained from the service, and leave the rest unchanged since you've already defined those variables in your config map.

Your database.js file should resemble the CodeSnap below:

CodeSnap of database.js Configuration
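
Since the configuration is only shown as an image, here is a rough TypeScript sketch of what such a configuration could look like. Every value is a placeholder I made up: the host stands in for your EXTERNAL-IP, and the user, password, and database names stand in for whatever you defined in your config map. The `DbConfig` shape and `toConnectionString` helper are assumptions for illustration, not the contents of the actual database.js file.

```typescript
// Hypothetical shape of the connection settings.
interface DbConfig {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

// Builds a PostgreSQL connection URI. The user, password, and database
// name are percent-encoded so characters like "@" do not break the URI.
function toConnectionString(cfg: DbConfig): string {
  const user = encodeURIComponent(cfg.user);
  const password = encodeURIComponent(cfg.password);
  const database = encodeURIComponent(cfg.database);
  return `postgresql://${user}:${password}@${cfg.host}:${cfg.port}/${database}`;
}

const exampleConfig: DbConfig = {
  host: "203.0.113.10", // placeholder: use the EXTERNAL-IP of your Service
  port: 5432, // the port exposed by the Service
  user: "example_user", // placeholder: value from your config map
  password: "example_p@ss", // placeholder: value from your config map
  database: "example_db", // placeholder: value from your config map
};

const connectionString: string = toConnectionString(exampleConfig);
```

Client libraries such as node-postgres accept either the individual fields or a single connection string like this one, so use whichever form your database.js already follows.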

Run your Node.js application

In your VS Code terminal, execute this command to start the Node.js application locally:

npm start

Your output should look like this if the connection is established successfully.

Server Listening on Port 4000

If your output looks the same as mine, it indicates that you've successfully connected your Node.js application to the PostgreSQL database running in your Azure Kubernetes cluster. Congratulations! 🎉

Test the application

Testing is a fundamental principle in DevOps operations. It helps us understand the state of the application we've built before releasing it to users. Any application that doesn't pass the testing stage will not be deployed. This is a rule in DevOps.

For this tutorial, you'll be using Postman. You can download Postman here. Postman enables you to test API endpoints by receiving status responses.

Check out this post on how to use Postman to test APIs. If you want to learn more, here's a full course on the subject.

Open your Postman application

To begin using Postman, start by creating a new API request in your preferred workspace. Choose POST. POST requests add new data to the database or server. Then, paste the endpoint URL (localhost:4000/api/v1/admin/register) for your Postman test.

The below screenshot illustrates how you will create a POST request.

Postman URL Endpoint Configuration

In the request body, paste the JSON data shown below:

{   
    "fullName":"Azure postgres freecodecamp",
    "email":"freecodecamp@gmail.com",
    "password":"freecodecamp"
}

Postman Request Body

Once you've set up the request, just click the "Send" button to send it. Postman will then show you status codes, and the response payload as shown below.

Postman API Response

Confirm the data

To confirm that the data you sent into the database exists, make a GET request to this endpoint URL: localhost:4000/api/v1/admin/freecodecamp@gmail.com

Postman GET Request URL Endpoint

When you click send, Postman will then show you status codes and the response payload as shown below. Notice that we didn't put anything in the body because this is a GET request.

Postman GET Request by Email Retrieval Response
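
If you'd rather script these two requests than click through Postman, the sketch below shows one way to do it in TypeScript. The endpoint paths and JSON fields come from the steps above. The helper names (`buildRegisterRequest`, `buildLookupRequest`, `send`) are my own, and I'm assuming Node.js 18 or later, where `fetch` is available globally.

```typescript
const BASE_URL = "http://localhost:4000/api/v1/admin";

interface RegisterBody {
  fullName: string;
  email: string;
  password: string;
}

interface PreparedRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// POST request that registers a new admin, mirroring the Postman body.
function buildRegisterRequest(body: RegisterBody): PreparedRequest {
  return {
    url: `${BASE_URL}/register`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// GET request that looks a record up by email. The email is percent-encoded
// because it is part of the URL path. A GET request carries no body.
function buildLookupRequest(email: string): PreparedRequest {
  return {
    url: `${BASE_URL}/${encodeURIComponent(email)}`,
    method: "GET",
    headers: {},
  };
}

// Sends a prepared request and returns the status code and JSON payload.
// Assumes the API is running locally and responds with JSON.
async function send(
  req: PreparedRequest
): Promise<{ statusCode: number; payload: unknown }> {
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  return { statusCode: res.status, payload: await res.json() };
}
```

Calling `send(buildRegisterRequest({ ... }))` followed by `send(buildLookupRequest("freecodecamp@gmail.com"))` reproduces the two Postman steps, provided the server from `npm start` is still running.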

Delete the pod to confirm data persistence

We chose to create our PostgreSQL database using a StatefulSet to ensure that data persists even if the pod is destroyed. Let's test this by deleting the pod and checking if the data remains intact.

In your VS Code terminal, execute the command: kubectl delete pod -n database postgres-0.

This command deletes a pod named "postgres-0" in the "database" namespace from your Kubernetes cluster. Your output should look like this.

Deleting Pod in Namespace:

Pod recreation

Kubernetes controllers continuously compare the desired state of the cluster with its actual state. Because the StatefulSet declares replicas: 1, its controller automatically recreates the deleted pod to maintain the desired number of replicas.

If you run kubectl get pods -n database, you'll notice that Kubernetes has created a new pod with the same name, "postgres-0", to replace the one that was deleted. This ensures that the application remains available and continues to function as expected.

Pod Recreated back in Namespace
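
The behaviour you just observed can be pictured as a small reconciliation loop. The TypeScript sketch below is a toy model I wrote for illustration, not Kubernetes code: `desiredPodNames` and `reconcile` are hypothetical helpers that show why the replacement pod comes back under the same name.

```typescript
interface StatefulSetSpec {
  setName: string;
  replicas: number;
}

// A StatefulSet names its pods <name>-0, <name>-1, ... up to replicas - 1.
function desiredPodNames(spec: StatefulSetSpec): string[] {
  return Array.from(
    { length: spec.replicas },
    (_, index) => `${spec.setName}-${index}`
  );
}

// Compares desired pods with running pods and "creates" the missing ones.
// Returns the names of the pods that had to be recreated.
function reconcile(spec: StatefulSetSpec, runningPods: Set<string>): string[] {
  const created: string[] = [];
  for (const podName of desiredPodNames(spec)) {
    if (!runningPods.has(podName)) {
      runningPods.add(podName);
      created.push(podName);
    }
  }
  return created;
}

const postgresSet: StatefulSetSpec = { setName: "postgres", replicas: 1 };
const runningPods = new Set<string>(["postgres-0"]);

// Simulates `kubectl delete pod -n database postgres-0`.
runningPods.delete("postgres-0");

// The next reconciliation pass notices the gap and fills it.
const recreated: string[] = reconcile(postgresSet, runningPods);
```

After the pass, `recreated` holds "postgres-0" and the set of running pods matches the desired state again.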

Data persistence

Navigate back to Postman and make a GET request to the endpoint URL localhost:4000/api/v1/admin/freecodecamp@gmail.com.

You should get the same response as before. Under the hood, deleting the pod did not delete its storage: the data lives on an Azure managed disk that exists independently of the pod. How do we know that? If you run this command:

kubectl get pvc -n database

you should get this output:

Persistent Volume Claim details in namespace

This shows details about a persistent volume claim called "azure-managed-disk-pvc" in your cluster. It's currently bound (in use) and has a capacity of 4 gigabytes. It's set up to be read and written to by one node at a time. This storage is provisioned by the storage class called "azuredisk-premium-retain" that we configured earlier.

Clean up resources

In this tutorial, you created Azure resources in a resource group. If you won't need these resources later, delete the resource group from the Azure portal or run the following command in your terminal:

az group delete --name AZURE-POSTGRES-RG --yes

This command might take a minute to run.

Conclusion

We've gone on quite a journey here! You've learned how to deploy a Postgres container in Azure Kubernetes Service (AKS) and integrate it with a Node.js application.

In this tutorial, I guided you through the process of configuring Kubernetes using Azure Kubernetes Service (AKS). You learned to customize YAML files utilizing StatefulSet, Persistent Volume, and Services to deploy a PostgreSQL database on Azure Kubernetes. You also acquired PostgreSQL database credentials running within AKS to establish connectivity with a Node.js application. I then provided detailed instructions on connecting your Node.js Express app to the Postgres container within the AKS cluster.

Thank you for reading!

How to Implement Relationship Based Access Control (ReBAC)

freeCodeCamp — Fri, 08 Mar 2024 14:08:21 +0000

By Imran

In today's digital age, managing who can access what resources is more critical than ever. That's where ReBAC comes in. It's a fresh take on authorization, focusing on the relationships between different entities rather than just assigning static roles or attributes.

Traditional access control methods, like Role-Based Access Control (RBAC), assign specific roles to users. While this works in many cases, it can become difficult, especially in dynamic environments where roles and permissions need to adapt quickly. On the other hand, Attribute-Based Access Control (ABAC) offers flexibility based on user attributes, but it can get complex to manage.

Now, ReBAC is all about understanding the intricate web of relationships between entities. Whether it's within an organization, a social media platform, or a project management tool, ReBAC ensures that access control remains dynamic and context-aware.

By the end of this tutorial, you'll have a clear understanding of ReBAC and be able to model a ReBAC scenario.

Key Takeaways

ReBAC Principles: Understand how ReBAC uses relationships between entities for access control, differing from traditional models.
Policy Visualization: Learn about representing policies as graphs for clearer management.
Real-World Examples: Explore ReBAC's application in scenarios like social media platforms and project management tools.
Benefits of ReBAC: Discover the advantages like granular control and dynamic policy adaptation.
Permission Models: Get familiar with ReBAC's common models such as Ownership and Hierarchical Models.
Permify Implementation: Step-by-step guide to implement ReBAC in Permify, including entity definition, relationship establishment, and permissions setup.

What is Relationship-Based Access Control (ReBAC)?

Traditional access control methods, like Role-Based Access Control (RBAC), assign specific roles to users, like giving someone a badge that says "manager" or "employee". But what if the roles aren't so clear-cut, and relationships between people and resources matter more?

That's where Relationship-Based Access Control (ReBAC) steps in. Instead of relying solely on predefined roles or attributes, ReBAC considers the intricate web of connections between users, resources, and other entities. It's like saying, "You can access this because you're connected to it in this specific way", rather than just based on a generic label.

But how does ReBAC actually do this? ReBAC examines the relationships between entities, such as users and resources, and uses these connections to determine access.

Let's break it down further. In our everyday lives, we have relationships that matter. Think about social media – you can see certain posts because you're friends with someone or because someone you follow liked it. ReBAC takes this idea and applies it to access control in systems.

Policy as a Graph

At the core of ReBAC lies the concept of "Policy as a Graph". This idea shows the importance of visualizing access policies through relationships.

Imagine that you have a detailed map of a bustling city. It doesn't just show buildings but also the connections between them – the roads, bridges, and pathways that link everything together.

Now, picture this map as a representation of your organization. Instead of buildings, it represents team members, departments, and their roles. The connections between them symbolize the relationships that dictate access.

This is what we mean by "Policy as a Graph" in ReBAC.

In simpler terms, access policies are like interconnected dots on a graph. Each dot represents an entity, and the lines between them signify the relationships influencing authorization. It's a visual representation that helps us understand the complex web of connections that govern access.
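
To make the graph picture a little more tangible, here is a minimal TypeScript sketch. The entities ("user:alice", "team:design", and so on) are invented for illustration, and the check is deliberately naive: it follows every edge, whereas a real ReBAC engine only follows the relations a policy allows.

```typescript
// One line on the graph: `from` is connected to `to` through `relation`.
interface GraphEdge {
  from: string;
  relation: string;
  to: string;
}

// Hypothetical policy graph: alice is on the design team, the team can
// edit the brand folder, and the folder is the parent of the logo file.
const policyGraph: GraphEdge[] = [
  { from: "user:alice", relation: "member", to: "team:design" },
  { from: "team:design", relation: "editor", to: "folder:brand" },
  { from: "folder:brand", relation: "parent", to: "file:logo" },
];

// Breadth-first search: is there any chain of relationships that
// connects `start` to `target`?
function canReach(start: string, target: string, graph: GraphEdge[]): boolean {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    if (current === target) return true;
    for (const edge of graph) {
      if (edge.from === current && !visited.has(edge.to)) {
        visited.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return false;
}

const aliceCanReachLogo = canReach("user:alice", "file:logo", policyGraph);
const bobCanReachLogo = canReach("user:bob", "file:logo", policyGraph);
```

Alice reaches the file through the team and the folder, while Bob has no connecting path, so only the first check succeeds.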

How is ReBAC Different from Other Control Models?

Now, let's explore how ReBAC sets itself apart from other access control models, such as Role-Based Access Control (RBAC).

Unlike traditional models, ReBAC doesn't rely solely on rigid roles or attributes. Instead, it works on deriving permissions from existing relationships. Here's how it stands out:

Role Derivation: ReBAC allows the creation of authorization policies based on pre-existing relationships. This means that assigning a user a certain role in one context might automatically extend that role to related entities, saving the need for manual assignment.
Resource Roles: Unlike global roles in traditional models, ReBAC introduces the concept of resource-specific roles (for example: Folder#Owner). These roles are exclusive to the context of a particular resource, ensuring that permissions are relevant and tailored to that specific entity.

Real-World Examples

To better understand how Relationship-Based Access Control (ReBAC) functions in the real world, let's explore two scenarios that mimic everyday complexities.

These examples will help illustrate how ReBAC excels in managing intricate access dynamics.

Consider an Instagram-inspired platform where users hold individual accounts. Each account consists of user-generated content, namely pictures (Pic 1 and Pic 2), chat interactions with different users, and project collaboration.

The user account possesses a list of blocked users who are restricted from viewing pictures. Here's a detailed breakdown of the entities and permissions:

1. Account Entities

User Account: Represents individual user accounts on the platform.
Pictures (Pic 1 and Pic 2): Depict user-generated visual content.
Chats: Captures interaction histories with different users.
Blocked Users List: Maintains a list of users who are blocked from viewing pictures.

2. Permissions Dynamics

Account Access Permissions:

"Account#Owner" grants ownership, allowing the user account holder to manage all aspects.
"Account#Viewer" enables others to view the user's account.

Picture Management Permissions:

"Picture#Owner" designates ownership at the picture level, allowing the user to edit, delete, and upload pictures.
"Picture#Viewer" permits normal viewers to only view pictures.
"BlockedUser#CannotView" ensures that blocked users cannot view pictures.

Chat Interaction Permissions:

"Chat#Participant" allows users to participate in chat interactions.
"Chat#BlockedUser" restricts certain users from participating in chats.

Account Editing Permissions:

"Account#Edit" grants the ability to update account details and preferences.

Instagram-like Social Platform Entities

In this scenario, the "#" symbol represents the relationship between entities when defining permissions. For example, "Account#Owner" signifies ownership of the user account, allowing the account holder to manage all aspects of their account.

Project Management Tool

Imagine a project management tool where teams collaborate on various projects. Entities like "Teams", "Projects", and "Tasks" play central roles, showcasing ReBAC's adaptability:

1. Team Entities:

Teams: Represent collaborative groups within the project management tool.
Projects: Encompass various ongoing initiatives.
Tasks: Break down project activities into manageable tasks.

2. Permissions Dynamics:

Team Leadership Permissions:

"Team#Lead" designates team leadership, allowing leaders to manage team-related activities.

Project Ownership Permissions:

"Project#Owner" signifies ownership at the project level, granting comprehensive control over project-related actions.

Task Assignment Permissions:

"Task#Assignee" designates individuals responsible for specific tasks.

Project Management Tool Entities

These real-world scenarios demonstrate ReBAC's versatility and effectiveness in managing access control in different settings.

Advantages of ReBAC

Now, let's understand why Relationship-Based Access Control (ReBAC) stands out from traditional methods like Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC). ReBAC brings a host of benefits to the table, enhancing scalability, flexibility, and adaptability in complex organizational setups. Let's take a closer look at its key advantages:

Granular and Contextual Control

ReBAC allows organizations to define granular access controls tailored to the specific relationships between users, resources, and entities. This ensures that permissions are contextually relevant, providing a nuanced level of control.

Efficient Management of Hierarchies

In scenarios with hierarchical structures, ReBAC simplifies the management of access control. By allowing permissions to be inherited based on relationships, it reduces the need for manual role assignments.

This simplifies the creation of natural connections between different business units, resources, and entities, making it easier to navigate complex hierarchies.

Scalability and Adaptability

ReBAC is designed to scale with organizational growth and changes in relationships. It easily accommodates the introduction of new entities or connections. However, it's crucial to address the challenge of role explosion, where the number of roles grows exponentially alongside asset growth.

Without proper management, this can lead to security risks and administrative overhead. Yet, ReBAC's scalability ensures that access controls remain effective, mitigating these challenges and avoiding the need for extensive modifications.

With these advantages in mind, ReBAC offers a robust framework for access control that meets the evolving needs of modern organizations. Now, let's delve into the common Relationship Type Permission Models to further understand how ReBAC operates.

Common Relationship Type Permission Models

Let’s look at some of the permission models:

Ownership Model

The Ownership Model in ReBAC is a fundamental concept where ownership relationships streamline authorization within hierarchical structures.

In this model, the act of owning a higher-level entity automatically extends ownership over its subordinate entities.

Imagine a scenario in a cloud storage platform where users create folders to organize their files. In the Ownership Model, the user who creates a folder is designated as the owner.

Consequently, this ownership relation automatically grants the user ownership permissions for all files within that folder.

This hierarchical ownership structure simplifies permission management and mirrors real-world ownership dynamics.

Parent-Child & Hierarchical Model

The Parent-Child & Hierarchical Model is a powerful tool for managing access control in hierarchical structures such as organizational frameworks or file systems.

In this model, permissions granted at the parent level flow down to its child entities, ensuring a cohesive and efficient authorization system.

Consider a corporate environment where organizations have multiple departments. Using ReBAC's Parent-Child & Hierarchical Model, permissions granted at the organization level, such as admin privileges, seamlessly extend to the organization's departments and their respective members.

This hierarchical flow of permissions reflects the organizational structure, making it easy to manage access control across different levels.
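
As a rough illustration of this downward flow, the TypeScript sketch below walks from a department up through its parents until it finds an admin grant. The `OrgUnit` shape, the `hasAdminAccess` helper, and the sample IDs are all assumptions I made for the example, and it assumes the hierarchy has no cycles.

```typescript
// A node in the hierarchy: an organization (no parent) or a department.
interface OrgUnit {
  id: string;
  parentId: string | null;
  admins: string[];
}

// A user has admin access to a unit if they are an admin of that unit
// or of any ancestor above it.
function hasAdminAccess(
  userId: string,
  unitId: string,
  units: Map<string, OrgUnit>
): boolean {
  let current = units.get(unitId);
  while (current !== undefined) {
    if (current.admins.includes(userId)) return true;
    current =
      current.parentId === null ? undefined : units.get(current.parentId);
  }
  return false;
}

// Hypothetical hierarchy: one organization with one department.
const units = new Map<string, OrgUnit>([
  ["org:acme", { id: "org:acme", parentId: null, admins: ["user:1"] }],
  ["dept:engineering", { id: "dept:engineering", parentId: "org:acme", admins: [] }],
]);

// user:1 was never assigned to the department directly.
const orgAdminOnDepartment = hasAdminAccess("user:1", "dept:engineering", units);
const strangerOnDepartment = hasAdminAccess("user:2", "dept:engineering", units);
```

The organization admin gets access to the department without any department-level assignment, which is the manual work this model saves.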

User Groups & Teams Model

The User Groups & Teams Model allows for efficient permission management by grouping users based on shared attributes or project affiliations.

In this model, permissions assigned to a group leader, for instance, can be effortlessly applied to all members of that group.

In a collaborative project management tool, teams serve as user groups. Applying ReBAC's User Groups & Teams Model, the team lead's permissions, like editing or deleting project tasks, can be automatically inherited by all team members.

This streamlined approach simplifies access control in collaborative environments, where team-based permissions are crucial for project efficiency.

These three Relationship-Based Access Control models demonstrate the flexibility and adaptability of ReBAC in handling diverse organizational structures and application domains.

By aligning permissions with inherent relationships among entities, ReBAC provides an intuitive and powerful access control framework.

How to Implement ReBAC with Permify

Now, let's practically implement ReBAC using Permify.

Permify is an open-source authorization as a service platform that allows developers to model, manage, and enforce access control in applications. It provides tools for defining complex authorization rules and relationships between entities, such as users, organizations, and resources.

Permify uses a domain-specific language for creating authorization models and offers a Playground environment for testing these models.

It also supports the creation of relational tuples and attributes for managing dynamic access control scenarios, streamlining the process of implementing robust and flexible authorization systems in software applications.

We'll create a scenario covering both Ownership and Parent-Child & Hierarchical models.

We'll use the Permify Playground for modeling.

Modeling

Modeling in Permify involves creating a schema that defines the relationships and permissions between different entities in your system.

Here's a simplified process:

Define Entities: Start by creating entities that represent the resources in your system (for example: users, organizations, teams).
Define Relations: Establish relationships between these entities. For example, an organization can have members and admins, or a team can be part of an organization.
Define Actions and Permissions: Specify the actions that can be performed on each entity and the conditions under which they are allowed. For example, only admins can delete an organization.

Permify uses its own language for modeling authorization logic, allowing for complex structures using set-algebraic operators. The modeling process includes defining entities, relations, actions, permissions, and, if needed, attributes for more advanced scenarios like ABAC (Attribute-Based Access Control).

Modeling in Permify is about creating a clear blueprint of your organization's structure and defining who gets to do what.

Let's break down how to model a schema in Permify.

Step 1: Define Entities

Entities are the core objects in your model. In this case, we have user, organization, department, project, file, and task.

entity user {}
entity organization {}
entity department {}
entity project {}
entity file {}
entity task {}

Step 2: Establish Relationships

Next, we specify relationships between these entities. This defines how they are connected.

Organization:

Has admins, who are users.

entity organization {
    relation admin @user
}

Department:

Belongs to an organization (parent).
Has head, manager, and employee roles, all of which are users.

entity department {
    relation parent @organization
    relation head @user
    relation manager @user
    relation employee @user
}

Project:

Belongs to a department (parent).

entity project {
    relation parent @department
}

File:

Belongs to a department (parent).
Has an owner who is a user.

entity file {
    relation parent @department
    relation owner @user
}

Task:

Belongs to a project (parent).
Has an assignee who is a user.

entity task {
    relation parent @project
    relation assignee @user
}

Step 3: Define Permissions

Permissions determine what actions specific roles can perform on each entity.

Project:

contribute_to_project permission is granted to employee or manager of the parent department.

entity project {
    // ... (existing relations)
    permission contribute_to_project = parent.employee or parent.manager
}

File:

read, edit, and delete permissions are controlled based on the manager of the parent department and the owner.

entity file {
    // ... (existing relations)
    permission read   = parent.manager or owner
    permission edit   = parent.manager or owner
    permission delete = owner
}

Task:

view_task permission is given to the assignee.

entity task {
    // ... (existing relations)
    permission view_task = assignee
}

Full Schema:

// Define entities
entity user {}

entity organization {
    // Organizational roles
    relation admin @user
}

entity department {
    // Department roles
    relation parent @organization
    relation head @user
    relation manager @user
    relation employee @user
}

entity project {
    // Project roles
    relation parent @department

    // Permissions
    permission contribute_to_project = parent.employee or parent.manager
}

entity file {
    // Represents files' parent entity (department)
    relation parent @department

    // Represents the owner of the file
    relation owner @user

    // Permissions
    permission read   = parent.manager or owner
    permission edit   = parent.manager or owner
    permission delete = owner
}

entity task {
    // Represents tasks' parent entity (project)
    relation parent @project

    // Represents the assignee of the task
    relation assignee @user

    // Permissions
    permission view_task   = assignee
}
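
To see how the file permissions in this schema behave, here is a small TypeScript sketch that evaluates them by hand. It is not Permify's engine or API. The relationship records and IDs are hypothetical, and the helper functions are my own: `isParentManager` mirrors `parent.manager` and the `filePermissions` object mirrors the three permission lines.

```typescript
// A stored relationship: `subject` has `relation` on `entity`.
interface Tuple {
  entity: string;
  relation: string;
  subject: string;
}

// Hypothetical data: user:2 manages department:1, which is the parent
// of file:1, and user:3 owns file:1.
const tuples: Tuple[] = [
  { entity: "department:1", relation: "manager", subject: "user:2" },
  { entity: "file:1", relation: "parent", subject: "department:1" },
  { entity: "file:1", relation: "owner", subject: "user:3" },
];

function subjectsOf(entity: string, relation: string): string[] {
  return tuples
    .filter((t) => t.entity === entity && t.relation === relation)
    .map((t) => t.subject);
}

function hasRelation(entity: string, relation: string, user: string): boolean {
  return subjectsOf(entity, relation).includes(user);
}

// parent.manager: the user manages a department that is the file's parent.
function isParentManager(file: string, user: string): boolean {
  return subjectsOf(file, "parent").some((department) =>
    hasRelation(department, "manager", user)
  );
}

type PermissionCheck = (file: string, user: string) => boolean;

// One function per permission line in the `file` entity.
const filePermissions: Record<"read" | "edit" | "delete", PermissionCheck> = {
  read: (file, user) => isParentManager(file, user) || hasRelation(file, "owner", user),
  edit: (file, user) => isParentManager(file, user) || hasRelation(file, "owner", user),
  delete: (file, user) => hasRelation(file, "owner", user),
};
```

With this data, the department manager (user:2) can read and edit file:1 but cannot delete it, while the owner (user:3) can do all three.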

Relation Tuples

The creation of relation tuples for the organization schema can be accomplished through the Permify Playground and API.

Here's how the relationship tuples would be structured according to the schema:

User and Organization Relationships:

For assigning a user as an admin in an organization, the tuple would be: organization:ID#admin@user:ID.
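To denote a user as a member of an organization: organization:ID#member@user:ID. Note that the schema above defines only the admin relation on organization, so this tuple is valid only if you also add relation member @user to that entity.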

User and Department Relationships:

Assigning a head to a department: department:ID#head@user:ID.
Assigning a manager to a department: department:ID#manager@user:ID.
Associating an employee with a department: department:ID#employee@user:ID.
To set a department's parent organization: department:ID#parent@organization:ID.

Project and Department Relationships:

To define the parent department of a project: project:ID#parent@department:ID.

File Management:

Associating a file with its parent department: file:ID#parent@department:ID.
Defining the owner of a file: file:ID#owner@user:ID.

Task Management:

Linking a task to its parent project: task:ID#parent@project:ID.
Assigning a user as the assignee of a task: task:ID#assignee@user:ID.

In each of these tuples, ID is a placeholder that should be replaced with the actual identifier of the entity or user in your system.

For instance, if you have an organization with an ID of 1 and a user with an ID of 3, and you want to assign this user as an admin of that organization, the tuple would be organization:1#admin@user:3.
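
If it helps to see the pieces of a tuple separately, the TypeScript sketch below splits the text form used in this section into its five parts. The `parseTuple` helper is my own and only handles the simple entity:ID#relation@subject:ID shape shown here. It is not part of Permify.

```typescript
interface RelationTuple {
  entityType: string;
  entityId: string;
  relation: string;
  subjectType: string;
  subjectId: string;
}

// entityType:entityId#relation@subjectType:subjectId
const TUPLE_PATTERN = /^([^:#@]+):([^:#@]+)#([^:#@]+)@([^:#@]+):([^:#@]+)$/;

// Parses the text form of a tuple, or throws if the text does not match.
function parseTuple(text: string): RelationTuple {
  const match = TUPLE_PATTERN.exec(text);
  if (match === null) {
    throw new Error(`Not a relation tuple: ${text}`);
  }
  // A successful match always has exactly five capture groups.
  const parts = match.slice(1) as [string, string, string, string, string];
  const [entityType, entityId, relation, subjectType, subjectId] = parts;
  return { entityType, entityId, relation, subjectType, subjectId };
}

const adminTuple: RelationTuple = parseTuple("organization:1#admin@user:3");
```

For the example above, `adminTuple` reads as: on the organization with ID 1, the admin relation points at the user with ID 3.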

Relationships dashboard

These tuples are created and managed using the Permify API. The API allows for creating, updating, and deleting these tuples as needed, reflecting the dynamic nature of relationships and permissions in an organization. This flexibility ensures that your authorization data is always up-to-date and consistent with the current state of your system's entities and their relationships.

Enforcement

To enforce access control in your schema using Permify, you can create scenarios in the Permify Playground's Enforcement section. This is done using YAML to define various test scenarios.

Here's an example based on your schema and the created relation tuples:

Check if a user (for example: user:2) can contribute to a project:

Entity: project:1
Subject: user:2
Assertion: contribute_to_project: true or false (depending on whether user:2 is an employee or manager in the parent department of project:1).

- name: user_access_test
  checks:
    - entity: project:1
      subject: user:2
      context: null
      assertions:
        contribute_to_project: false
  entity_filters: []
  subject_filters: []

Check if a user (for example: user:4) can view a task:

Entity: task:1
Subject: user:4
Assertion: view_task: true or false (true if user:4 is the assignee of the task).

- name: user_access_test
  checks:
    - entity: task:1
      subject: user:4
      context: null
      assertions:
        view_task: false
  entity_filters: []
  subject_filters: []

These scenarios will help you validate the permissions as per your schema in a controlled environment.

Each assertion in the YAML scenario will define the expected outcome (true or false) for a particular action or permission based on your schema and data tuples.

YAML representation
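
To show what such assertions boil down to, here is a toy TypeScript version of the two checks above. The Playground does this for you, so treat this purely as an illustration: the relationship data is hypothetical, and `evaluate` is a hand-written stand-in for the two permissions, not Permify's engine.

```typescript
interface Tuple {
  entity: string;
  relation: string;
  subject: string;
}

interface Check {
  entity: string;
  subject: string;
  permission: string;
  expected: boolean;
}

// Hypothetical data: user:1 works in department:1, which is the parent
// of project:1. task:1 belongs to project:1 and is assigned to user:3.
const tuples: Tuple[] = [
  { entity: "department:1", relation: "employee", subject: "user:1" },
  { entity: "project:1", relation: "parent", subject: "department:1" },
  { entity: "task:1", relation: "parent", subject: "project:1" },
  { entity: "task:1", relation: "assignee", subject: "user:3" },
];

function subjectsOf(entity: string, relation: string): string[] {
  return tuples
    .filter((t) => t.entity === entity && t.relation === relation)
    .map((t) => t.subject);
}

function related(entity: string, relation: string, subject: string): boolean {
  return subjectsOf(entity, relation).includes(subject);
}

// Hand-written versions of the two permissions from the schema.
function evaluate(permission: string, entity: string, subject: string): boolean {
  if (permission === "contribute_to_project") {
    return subjectsOf(entity, "parent").some(
      (dept) => related(dept, "employee", subject) || related(dept, "manager", subject)
    );
  }
  if (permission === "view_task") {
    return related(entity, "assignee", subject);
  }
  throw new Error(`Unknown permission: ${permission}`);
}

// The two scenarios from the YAML, plus one that is expected to pass.
const checks: Check[] = [
  { entity: "project:1", subject: "user:2", permission: "contribute_to_project", expected: false },
  { entity: "task:1", subject: "user:4", permission: "view_task", expected: false },
  { entity: "project:1", subject: "user:1", permission: "contribute_to_project", expected: true },
];

const results = checks.map((check) => ({
  ...check,
  passed: evaluate(check.permission, check.entity, check.subject) === check.expected,
}));
```

Each entry in `results` records whether the actual decision matched the expected one, which is the same pass or fail signal the Playground reports for a scenario.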

For detailed steps and examples, refer to the Permify Modeling Documentation.

Conclusion

By following these steps, you can effectively implement a sophisticated ReBAC system using Permify.

This implementation will provide a robust, flexible, and secure access control framework tailored to the unique needs and relationships within your organization.

How Databases Guarantee Isolation – Pessimistic vs Optimistic Concurrency Control Explained

Daniel Adetunji — Mon, 05 Feb 2024 22:41:18 +0000

ACID (Atomicity, Consistency, Isolation, and Durability) is a set of guarantees when working with a DBMS. Pessimistic and optimistic concurrency control explains how databases achieve the “I” in ACID.

Isolation is a guarantee that concurrently running transactions should not interfere with each other. This is arguably the most important ACID property, because different DBMS can often have different default isolation levels. And you may need to change this based on what is needed for your application.

In a previous article, I explained the two main isolation levels used by most DBMS. These are the read committed and repeatable read isolation levels.

Pessimistic and optimistic concurrency controls essentially explain some of the ways a database is able to achieve these two isolation guarantees.

Pessimistic Concurrency Control
Pessimistic Concurrency Control Analogy
Real-World Example of Pessimistic Concurrency Control
Pros and Cons of Pessimistic Concurrency Control
How it Guarantees the Read Committed Isolation Level
Optimistic Concurrency Control
Real-World Example of Optimistic Concurrency Control
Pros and Cons of Optimistic Concurrency Control
How it Guarantees the Repeatable Read Isolation Level
Bringing it Together

Pessimistic Concurrency Control

With pessimistic concurrency control, the DBMS assumes that conflicts between transactions are likely to occur. It is pessimistic – that is, it assumes that if something can go wrong, it will go wrong. This pessimism prevents conflicts from occurring by blocking them before they get a chance to start.

To prevent these conflicts, it locks the data that a transaction is using until the transaction is completed. This approach is 'pessimistic' because it assumes the worst-case scenario – that every transaction might lead to a conflict. The data is therefore locked in order to prevent conflicts from happening.

I've mentioned two technical terms here that need clarification: locks and conflict.

What are locks?

A lock is a mechanism used to control access to a database item, like a row or table. Locks ensure data integrity when multiple transactions occur at the same time.

In very simple terms, a lock is analogous to a reservation on the database item. A reservation, be it a restaurant, hotel, or a train, prevents other people from using the resource you reserved for a fixed duration of time. Locks work in a similar way.

There are two types of locks: a read lock and a write lock.

A read lock can be shared by multiple transactions trying to read the same database item. But it blocks other transactions from updating that database item.

A write lock is exclusive – that is, it can only be held by a single transaction. A transaction with a write lock on a database item blocks every other transaction from reading or updating that database item.
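The compatibility rules for the two lock types can be sketched as a small function. This is only an illustration of the rules described above, not how any particular DBMS implements its lock manager; the names `LockMode` and `can_grant` are made up for this example.

```python
from enum import Enum


class LockMode(Enum):
    READ = "read"    # shared lock
    WRITE = "write"  # exclusive lock


def can_grant(requested: LockMode, held_by_others: list) -> bool:
    """Decide whether a lock can be granted, given locks other transactions hold on the same item."""
    if not held_by_others:
        # Nothing is locked, so any lock can be granted.
        return True
    if requested is LockMode.READ:
        # A read lock can be shared, but only with other read locks.
        return all(mode is LockMode.READ for mode in held_by_others)
    # A write lock is exclusive: it cannot coexist with any other lock.
    return False


print(can_grant(LockMode.READ, [LockMode.READ]))   # True: readers share
print(can_grant(LockMode.WRITE, [LockMode.READ]))  # False: readers block the writer
print(can_grant(LockMode.READ, [LockMode.WRITE]))  # False: the writer blocks readers
```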

What are conflicts?

A conflict refers to a situation where multiple transactions are attempting to access and modify the same data concurrently, in a way that could lead to inconsistencies or errors in the database.

A Library Analogy for Pessimistic Concurrency Control

First, let us describe an analogy for a write lock.

Imagine you're at a library, and you want to borrow a hard copy of a popular book, say, The Great Gatsby by F. Scott Fitzgerald.

Write locks are analogous to borrowing a physical book from the library

With a write lock, the librarian assumes that there can be conflicts over who gets to borrow the book. So, they implement a strict rule to avoid conflicts: only one person can hold the reservation for a physical book at a time.

When you reserve the book, no one else can borrow it. The book is available to be reserved again only once it is returned. This is similar to how a write lock works.

Write locks are exclusive. This means that they can only be held by a single transaction at any time. Similarly, reserving a physical book from the library means no one else has access to it. Only the person with the reservation can read the book, or write in it (although writing in a library book is bad form).

Read locks work a bit differently.

A read lock is analogous to someone making a reservation to borrow an e-book. Borrowing an e-book is not a very popular thing to do, but some libraries do have such a service.

Many people can make the same reservation for the same e-book without any conflict. One person borrowing an e-book version of The Great Gatsby does not stop others from doing the same. But no one who borrows an e-book can update it, by scribbling notes in it that can be seen by others, for example.

Read locks are analogous to borrowing an e-book from the library

Pessimistic concurrency control is very safe because it prevents conflicts from occurring by blocking them before they get a chance to start. A write lock on a database item prevents other transactions from reading or updating that item while that lock is held, similar to how a library stops more than one person from trying to borrow the same physical book at the same time.

A read lock on a database item allows other transactions to also obtain a read lock for that item, but prevents transactions from updating that item. This is analogous to borrowing an e-book, where multiple people can borrow the same e-book at the same time, but can’t make any updates to it.

A Simple Real-World Example of Pessimistic Concurrency Control in Action

Let's illustrate how pessimistic concurrency control works using a simple example involving a bank balance database table. Assume we have a table named Accounts with the following columns: AccountID and Balance.

Database columns for AccountID and Balance

Two transactions, T1 and T2, intend to update the balance of account 12345, which starts at $1500. T1 wants to withdraw $300, and T2 wants to deposit $400. At the end of these two transactions, the account balance should read $1600.

Here are the steps of how this will work using write locks:

Start of T1 (Withdrawal): T1 requests to update the balance of AccountID 12345. The database system places an exclusive write lock on the row for AccountID 12345, preventing other transactions from reading or writing to this row until T1 is completed. T1 reads the balance ($1500).
T1 Processing: T1 calculates the new balance as $1200 ($1500 - $300).
Commit T1: T1 writes the new balance ($1200) back to the database. Upon successful commit, T1 releases the exclusive lock on AccountID 12345.
Start of T2 (Deposit) After T1 Completes: Now that T1 has completed and the lock is released, T2 can start. T2 attempts to read and update the balance for AccountID 12345. The database system places an exclusive lock on the row for AccountID 12345 for T2, ensuring no other transactions can interfere. T2 reads the updated balance ($1200).
T2 Processing: T2 calculates the new balance as $1600 ($1200 + $400).
Commit T2: T2 writes the new balance ($1600) back to the database. Upon successful commit, T2 releases the exclusive lock on AccountID 12345.
Result: The Accounts table is updated using locks. After T1: $1200. After T2: $1600.

Without a write lock in this example, T1 and T2 could read the original balance of $1500 at the same time. So, instead of a balance of $1200 after T1 has committed, T2 still reads the original balance of $1500 and adds $400. This would cause the final balance to be $1500 + $400 = $1900 (instead of $1600).
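Here is a minimal Python sketch of both outcomes. It uses an in-memory dictionary in place of the Accounts table and a `threading.Lock` as a stand-in for the exclusive row lock, so it illustrates the idea rather than how a real DBMS does it. The function names are hypothetical.

```python
import threading

row_lock = threading.Lock()  # stands in for the exclusive lock on the account row


def unsafe_interleaving() -> int:
    """Both transactions read before either writes, so T1's update is lost."""
    account = {"12345": 1500}
    t1_read = account["12345"]        # T1 reads $1500
    t2_read = account["12345"]        # T2 also reads $1500
    account["12345"] = t1_read - 300  # T1 writes $1200
    account["12345"] = t2_read + 400  # T2 overwrites it with $1900
    return account["12345"]


def apply_with_lock(account: dict, account_id: str, delta: int) -> None:
    """Read, compute, and write while holding the lock, so nothing can interleave."""
    with row_lock:
        current = account[account_id]
        account[account_id] = current + delta


def locked_run() -> int:
    """Run the withdrawal and the deposit concurrently, each under the lock."""
    account = {"12345": 1500}
    t1 = threading.Thread(target=apply_with_lock, args=(account, "12345", -300))
    t2 = threading.Thread(target=apply_with_lock, args=(account, "12345", 400))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return account["12345"]


print(unsafe_interleaving())  # 1900
print(locked_run())           # 1600
```

Whichever thread takes the lock first, the other one waits and then reads the already updated balance, so the final balance is $1600 in either order.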

Absence of locking has created free money, which is never a bad thing for a customer. But, if money can be conjured out of thin air because of these conflicts, it can also vanish, and accidentally shrinking bank balances are a quick way to make customers unhappy.

Benefits and Challenges of Pessimistic Concurrency Control

Just like reserving a book ensures that it's set aside for one person, pessimistic concurrency control locks data for a single transaction. Other transactions cannot access or modify this data until the lock is released.

This method prevents two people from trying to take out the same popular book at the same time, thereby avoiding disputes. Similarly, in databases, it stops conflicts due to concurrent transactions before they get a chance to start.

But this approach can be inefficient. The reserved book might sit on the reserved shelf for a while, stopping other people from reading it.

In databases, this locking mechanism can lead to underutilisation of resources and a slowdown in the speed transactions take to complete, since a subset of the data is locked and inaccessible to other transactions.

How Pessimistic Concurrency Controls Guarantee the Read Committed Isolation Level

So, how exactly does pessimistic concurrency control work in ensuring the isolation guarantee, that is the “I” in ACID? The implementation details can vary across different DBMS. But the explanation here shows the general approach.

Recall that the read committed isolation level prevents dirty writes and dirty reads.

Preventing Dirty Writes

Overwriting data that has already been written by another transaction but not yet committed is called a dirty write. A common approach to preventing dirty writes is to use pessimistic concurrency control, for example, by using a write lock at the row level.

When a transaction wants to modify a row, it acquires a lock on that row and holds it until the transaction is complete. Recall that write locks can only be held by a single transaction. This prevents another transaction from acquiring a lock to modify that row.

Preventing Dirty Reads

Reading data from another transaction that has not yet been committed is called a dirty read. Dirty reads are prevented using either a read or write lock. Once a transaction acquires a read lock on a database item, it will prevent updates to that item.

But what happens if you are trying to read something that is already being updated but the transaction has not yet committed? In this instance, the write lock saves the day again.

Since write locks are exclusive (can’t be shared with other transactions), any transaction wanting to read the same database item will have to wait until the transaction with the write lock is committed (or aborted, if it fails). This prevents other transactions from reading uncommitted changes.

Optimistic Concurrency Control

With optimistic concurrency control, transactions do not obtain locks on data when they read or write. The "Optimistic" in the name comes from assuming that conflicts are unlikely to occur, so locks are not needed. If something does go wrong though, conflicts will still be prevented and everything will be OK.

Unlike pessimistic concurrency control – which prevents conflicts from occurring by blocking them before they get a chance to start – optimistic concurrency control checks for conflicts at the end of a transaction.

With optimistic concurrency control, multiple transactions can read or update the same database item without acquiring locks. How exactly does this work?

Every time a transaction wants to update a database item, say a row, it will also read two additional columns added to every table by the DBMS – the timestamp and the version number. Before that transaction is committed, it checks if another transaction has made any change(s) to that row by confirming if the version number and timestamp are the same.

If they have changed, that means another transaction has updated that row, so the initial transaction will have to be retried.

A Simple Real-World Example of Optimistic Concurrency Control in Action

Let's illustrate how optimistic concurrency control works using a simple example involving a bank balance database table. Assume we have a table named Accounts with the following columns: AccountID, Balance, VersionNumber, and Timestamp.

Table showing AccountID, Balance, VersionNumber, and Timestamp columns

Two transactions, T1 and T2, intend to update the balance of account 12345, which starts at $1000, at the same time. T1 wants to withdraw $200, and T2 wants to deposit $300. At the end of these two transactions, the account balance should read $1100.

Here are the steps of how this will work:

Start of Transactions: T1 reads the balance, version number, and timestamp for AccountID 12345. Simultaneously, T2 reads the same row with the same balance, version number, and timestamp.
Processing: T1 calculates the new balance as $800 ($1000 - $200) but does not write it back immediately. T2 calculates the new balance as $1300 ($1000 + $300) but also waits to commit.
Attempt to Commit T1: Before committing, T1 checks the current VersionNumber and Timestamp of AccountID 12345 in the database. Since no other transaction has modified the row, T1 updates the balance to $800, increments the VersionNumber to 2, updates the Timestamp, and commits successfully.
Attempt to Commit T2: T2 attempts to commit by first verifying the VersionNumber and Timestamp. T2 finds that the VersionNumber and Timestamp have changed (now VersionNumber is 2, and Timestamp is updated), indicating another transaction (T1) has updated the row. Since the version number and timestamp have changed, T2 realises there was a conflict.
Resolution for T2: T2 must restart its transaction. It re-reads the updated balance of $800, the new VersionNumber 2, and the updated Timestamp. T2 recalculates the new balance as $1100 ($800 + $300), updates the VersionNumber to 3, updates the Timestamp, and commits successfully.

Result: The Accounts table is updated sequentially and safely without any locks: After T1: $800, VersionNumber: 2. After T2: $1100, VersionNumber: 3.
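The steps above can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration under a few assumptions: it tracks only a version number (no timestamp), the table and helper names are made up for the example, and the version check is expressed as a conditional `UPDATE` that changes the row only if the version still matches what the transaction read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (12345, 1000, 1)")
conn.commit()


def read_row(account_id):
    """Return (balance, version) for the account."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE account_id = ?", (account_id,)
    ).fetchone()


def try_commit(account_id, new_balance, expected_version):
    """Write the new balance only if the version is unchanged since it was read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE account_id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means another transaction got there first


# T1 and T2 both read the row at version 1.
t1_balance, t1_version = read_row(12345)
t2_balance, t2_version = read_row(12345)

t1_ok = try_commit(12345, t1_balance - 200, t1_version)  # succeeds, version becomes 2
t2_ok = try_commit(12345, t2_balance + 300, t2_version)  # fails, version is no longer 1

# T2 detects the conflict, re-reads the row, and retries.
t2_retry_ok = False
if not t2_ok:
    t2_balance, t2_version = read_row(12345)
    t2_retry_ok = try_commit(12345, t2_balance + 300, t2_version)

print(read_row(12345))  # (1100, 3)
```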

Benefits and Challenges of Optimistic Concurrency Control

On the positive side, avoiding locks allows for high levels of concurrency. This is particularly beneficial in read-heavy workloads where transactions are less likely to conflict, such as database backups and the analytical queries typically run in a data warehouse, because it allows the system to handle more transactions in a given period.

But in scenarios where conflicts are frequent, the cost of repeatedly rolling back and retrying transactions can outweigh the benefits of avoiding locks, making optimistic concurrency control less efficient.

How Optimistic Concurrency Controls Guarantee the Repeatable Read Isolation Level

Repeatable read is a stricter isolation level: it has the same guarantees as read committed isolation, plus it guarantees that reads are repeatable.

A repeatable read guarantees that if a transaction reads a row of data, any subsequent reads of that same row of data within the same transaction will yield the same result, regardless of changes made by other transactions. This consistency is maintained throughout the duration of the transaction.

How can a repeatable read be achieved? Pessimistic control using a read lock can help with this, since a transaction with a read lock on a database item will prevent that item from being updated. But this can be inefficient, since a long running read transaction can block updates from happening to that database item.

Multi-Version Concurrency Control (MVCC) is a concurrency control method used by some DBMS to allow multiple transactions to access the same data simultaneously without locking the data. This makes it a popular choice for reducing lock contention and improving the scalability of databases.

MVCC achieves this by keeping multiple versions of data objects, which helps to manage different visibility levels for transactions depending on their timestamps or version numbers.
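To make the idea of multiple versions concrete, here is a toy in-memory sketch. It is not how any real DBMS implements MVCC: the `VersionedStore` class is hypothetical, every write commits immediately, and a simple counter serves as the timestamp. It only shows how a reader with a fixed snapshot keeps seeing the same value while newer versions are added.

```python
class VersionedStore:
    """Toy multi-version store: each write appends a new (commit_ts, value) version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical timestamp, incremented on every write

    def write(self, key, value):
        """Commit a new version of the key without touching older versions."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def begin(self):
        """Start a read transaction; its snapshot is the current logical time."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [value for ts, value in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("balance", 1000)

snapshot = store.begin()                        # a long-running read transaction starts
first_read = store.read("balance", snapshot)    # 1000

store.write("balance", 800)                     # another transaction commits an update

second_read = store.read("balance", snapshot)   # still 1000: the read is repeatable
latest = store.read("balance", store.begin())   # a new transaction sees 800

print(first_read, second_read, latest)
```

The writer was never blocked by the reader, and the reader never saw a value change in the middle of its transaction.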

Bringing it Together

A lock is a mechanism used to control access to a database item, like a row or table. In very simple terms, it is analogous to a reservation on a database item.

Pessimistic concurrency control assumes the worst. It assumes that conflicts are likely to happen, so locks are used to block transactions that can cause conflicts before they even get a chance to start.

In situations where conflicts are common, such as a write-heavy application, this approach can prevent the overhead associated with frequent rollbacks and retries (which happen in optimistic concurrency control) by ensuring exclusive access to database items during transactions.

Optimistic concurrency control assumes the best. It assumes that conflicts are unlikely to occur, so locks are not needed to stop transactions before they start. Instead, potential conflicts are checked at the end of a transaction and if any are found, the transaction is aborted or retried.

Optimistic concurrency control is useful for read-heavy workloads with infrequent writes, as it allows multiple transactions to proceed without the need to use a lock, which can be inefficient.

How to Back Up and Restore Azure SQL Databases

Alex Tray — Wed, 24 Jan 2024 22:34:09 +0000

Microsoft's Azure provides many services via a single cloud, which lets them offer one solution for multiple corporate infrastructures. Development teams often use Azure because they value the opportunity to run SQL databases in the cloud and complete simple operations via the Azure portal.

But you'll need to have a way to back up your data, as it's crucial to ensuring the functionality of the production site and the stability of everyday workflows. So creating Azure SQL backups can help you and your team avoid data loss emergencies and have the shortest possible downtime while maintaining control over the infrastructure.

Another reason to have a current Azure database backup is Microsoft’s policy. Microsoft uses the shared responsibility model, which makes the user responsible for data integrity and recovery while Microsoft only ensures the availability of its services. Microsoft directly recommends using third-party solutions to create database backups.

In case you run a local SQL Server, you'll need to prepare for the possibility of hardware failures that may result in data loss and downtime. An SQL database on Azure helps mitigate that risk, although it's still prone to human errors or cloud-specific threats like malware.

These and other threats make enabling Azure SQL database backups necessary for any organization using Microsoft’s service to manage and process data.

In this tutorial, you'll learn about backing up Azure databases and restoring your data on demand with native instruments provided by Microsoft, including methods like:

Built-in Azure database backup functionality
Cloud archiving
Secondary database and table management
Linked server
Stretch Database

Why Backup Your SQL Azure Database?

Although I covered this briefly in the intro, there are many reasons to back up your SQL Azure database data.

Disaster Recovery

Data centers can be damaged or destroyed by planned cyberattacks, random malware infiltration (check out this article to discover more on ransomware protection), and natural disasters like floods or hurricanes, among others. Backups can be used to swiftly recover data and restore operations after various disaster cases.

Data Loss Prevention

Data corruption, hardware failure, and accidental or malicious deletion lead to data loss and can threaten an organization. Backup workflows set up to run regularly mean you can quickly recover the data that was lost or corrupted.

Compliance and Regulations

Compliance requirements and legislative regulations can be severe regardless of your organization’s industry. Mostly, laws require you to keep up with security and perform regular backups for compliance.

Testing and Development

You can use backups to create Azure database copies for development, troubleshooting, or testing. Thus, you can fix, develop, or improve your organization’s workflows without involving the production environment.

How to Back Up Your Azure SQL Database

Backing up your Azure SQL database can be challenging if you go through the process without preparation. So that's why I wrote this guide – to help you be prepared. Here's what we'll cover in the following sections:

Requirements for SQL Azure database backup
How to configure database backups in Azure with native tools
Cloud archiving
Backup verification and data restoration

SQL Azure Database Backup Requirements

Before backing up your SQL Azure databases, you need to create and configure Azure storage. Before you do that, you'll need to go through the following steps:

First, open the Azure management portal and find Create a Resource.

Then, go to Storage > Storage account. Provide the information, including the location and names of a storage account and resource group according to your preferences. After you enter the information, hit Next.

Storage account config

Then go to the advanced section for additional settings. The optimal choice is to set "Secure transfer required" as Enabled and "Allow access" from All networks. For more resilience in case of human error, you can set "Blob soft delete" as Enabled. With that setting, you can quickly correct accidental deletions in the storage account.

After that, specify the tags you need to simplify navigating through your infrastructure.

Azure backup storage tags

Check the settings once more. If everything is configured correctly, hit Create. Your new storage account is now created.

Once the storage volume is created, it's time to configure a backup data storage container.

Go to the storage account, find Containers, then hit the + Container tab there. After that, specify a name for the new container and switch Public access level to Private (no anonymous access).

Container Azure storage account

You can then use the container as a backup storage (.bak files will be stored there in that case).

Azure Database Backup Configuration

Now, everything is set up for you to back up your SQL Azure database. Do the following to create a database backup:

First, go to SQL Server Management Studio (SSMS), and establish a connection with the SQL server. After that, right-click the database that should be backed up. In the context menu that appears, go to Tasks, then hit Back Up….

SQL server tasks backup

Then find the Destination section, and set the "Back up to" option to URL there. After that, hit New container.

Next, sign in to Azure. Pick the container you created before. Provide your credentials, then hit OK.

You’ll see a message asking you to sign in to your Azure subscription. Then, choose the container and hit OK.

Now, you'll see the configured backup destination URL listed. To start the workflow to back up your Azure data, hit OK once again.

When your SQL Azure database backup is completed, the message shows up: "The backup of database ‘your database name’ completed successfully."

The backup file in the target container should now be visible from the Azure portal.

Keep in mind that, when uploading backups to any cloud storage, you may face issues if your network connection is not fast enough.

In case that’s true for you, you can reorganize your backup workflows: send backup data to a physical storage drive first, and then send another copy to the cloud. Thus, you can prevent operational challenges that might appear due to network bandwidth deficiency.

Cloud Archiving for Azure Database Backups

Databases tend to grow in volume as the organization grows. This means that the storage space required to fit the data and that data's backup increases significantly. Also, the original data volume prolongs the duration of full backup workflows, posing another challenge.

Of course, the first way to get more storage space is to revise your data regularly and erase records that are irrelevant, outdated, or unnecessary otherwise. Still, it's sometimes difficult to determine if data will be or become unnecessary or irrelevant, especially when dealing with issues of compliance.

To keep your organization compliant in any case, data archiving can help you solve two problems at once: you can ensure data accessibility on one hand, and save storage space on the other hand.

To archive your SQL database in the cloud, you should first save that database copy to an Azure blob container. Then, to move a newly created blob to the archive tier in the Azure portal, do the following:

Go to the required container where the SQL database is stored.
Choose the blob that you need to move.
Hit Change tier.

Azure blob container change tier

In the Access tier dropdown menu, choose Archive.

Azure blob change tier

Hit Save.

Additionally, the Archive storage tier is the most affordable one in Azure, meaning that you can reduce your database data TCO with it.

Secondary Database and Table Management

There exist several workflows that can help you set up Azure database backup archiving for your organization. When you need the data to stay in the initial database, for instance, creating a separate table and moving that data there can be your choice. However, the filegroup of that table should stay apart from the main database and be moved to a separate disk whenever possible.

Most probably, you’ll want to let users access the data you send to a separate table. To make that happen, you can create a view merging the relevant tables and redirect the requests to that view, not to the original table. Doing things that way, you can keep the data accessible while dealing with maintenance faster.
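As a rough illustration of the view approach, the sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server. The table names (`orders`, `orders_archive`), the view name (`all_orders`), and the cutoff date are assumptions made for this example, and filegroup placement is not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_on TEXT, total INTEGER);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed_on TEXT, total INTEGER);
INSERT INTO orders VALUES (1, '2019-03-01', 40), (2, '2024-01-10', 75);

-- Move old rows into the separate archive table, then remove them from the main table.
INSERT INTO orders_archive SELECT * FROM orders WHERE placed_on < '2023-01-01';
DELETE FROM orders WHERE placed_on < '2023-01-01';

-- Readers query the view, which merges the main table and the archive table.
CREATE VIEW all_orders AS
    SELECT id, placed_on, total FROM orders
    UNION ALL
    SELECT id, placed_on, total FROM orders_archive;
""")

main_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
all_ids = [row[0] for row in conn.execute("SELECT id FROM all_orders ORDER BY id")]

print(main_count)  # 1: the main table only holds recent rows
print(all_ids)     # [1, 2]: the view still exposes every row
```

In a production system you would likely run the move and the delete inside a single transaction so that a failure halfway through cannot lose or duplicate rows.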

SQL Server Linking

If you can’t move the data to another database for internal reasons such as special Azure backup policies, you can consider maintaining your primary database accordingly.

Here, the outcome is likely to be that of the previous case, but you need to link the SQL servers or configure apps so they can send direct requests to your second server.

The downside here is that your SQL database, which was supposed to be a backup one, becomes a production database and gains appropriate importance for an organization.

There are two ways to create linked servers via SQL Server Management Studio (SSMS):

sp_addlinkedserver (Transact-SQL) system stored procedure that creates a linked server
SSMS GUI

After you've ensured that you have appropriate access rights on both server instances you need to link, the network is configured appropriately to access them, and SSMS is installed, you'll need to go through the following steps:

First, open SSMS.

Microsoft SSMS

Connect to the instance where you need to establish a linked server. Then find Object Explorer > Server Objects, then right-click Linked Servers.

Pick New Linked Server from the dropdown:

New linked server SSMS

Then configure the server properties, including name, server type, provider and product name:

Linked server configuration SSMS

Then you'll just need to complete the security configuration, set up the server options, and complete connection testing.

Original Data Deletion

When you don’t need 24/7 data availability but need the data stored due to internal policies or compliance requirements, you can choose what's probably the simplest solution to increase storage space efficiency. Just back up the data that can stay unavailable and then delete the originals from the main database. Accessing any records you may need will still be possible via the backup.

Stretch Database

Aiming to make data management of organizations’ databases simpler, Microsoft implemented a Stretch Database feature in SQL Server 2016. With this feature, you can get an SQL backup to Azure after you send the data from the hosted database to an Azure SQL database. The method enables you to increase overall infrastructure cost-efficiency by simplifying backup workflows.

To enable this workflow in your environment, develop the policy specifying the data on a hosted server to send to Azure. You don’t need to introduce any changes in applications that use the production database: SQL Server can independently get the records from the Azure SQL Database.

Azure Database Backups Verification and Restoration

During an SQL Azure database backup, you can choose to create such backups WITH CHECKSUM or without it. When the workflow is complete, I recommend you use the RESTORE VERIFYONLY command, which lets you check the recoverability of backup files.

To access the data, you can restore records from a backup to a different database. With Azure Automation scripts on backups, you can accelerate the restoration process, thus minimizing downtime and increasing the overall resilience of your Azure infrastructure.

You need to follow only a few steps to restore an Azure SQL database to a required recovery point from a backup. Still, keep in mind that your subscription can define the available retention period, which can vary from 7 to 35 days. A native tool for backup restoration to SQL servers is SQL Server Management Studio.

To Conclude

The critical nature of Azure SQL database data makes Azure SQL backups obligatory for any organization that uses this Microsoft solution. In this guide, we reviewed the process of creating SQL Azure database backup using native Microsoft tools.

These tools provide data backup, backup verification, and recovery functionality along with some automation.

You can also implement a specialized all-in-one data protection solution, such as NAKIVO, the company where I work. It can help you make your data backup workflows more efficient.

ACID Databases – Atomicity, Consistency, Isolation & Durability Explained

Daniel Adetunji — Wed, 17 Jan 2024 17:45:53 +0000

ACID stands for Atomicity, Consistency, Isolation and Durability. These are four key properties that most database management systems (DBMS) offer as guarantees when handling transactions.

Most popular DBMS like MySQL, PostgreSQL and Oracle have ACID guarantees out of the box. Others, like Redis, DynamoDB, and Cassandra, have partial ACID guarantees. The trend, however, seems to be that more and more DBMS are offering ACID compliance.

It is important to note that while a lot of DBMS may say they are ACID compliant, the implementation of this compliance can vary.

So, for example, if isolation is a key property that you need for an application you are building, you need to understand how exactly your chosen DBMS implements isolation.

This article will explain what transactions are, and go through, in detail, what atomicity, consistency, isolation and durability mean, using analogies and real world examples.

What are Transactions?

Lots of things can go wrong when using a database:

the database hardware or software can fail
the application calling the database can fail mid-operation
the network can be flooded with more traffic than it can handle (rendering it inoperable)
several clients can make writes at the same time that overwrite the other’s changes
clients can read phantom data that should not be in the database

And so on – this is in no way an exhaustive list of things that can go wrong.

Since things can fail in more ways than we can possibly anticipate, trying to prevent every possible failure can become unnecessarily expensive and complicated. Instead, it is better to design a system that can continue to operate in spite of a failure. Transactions allow us to do this.

Transactions serve a single purpose: they make sure a system is fault tolerant. If a failure in a system occurs, can the system continue to operate without complete catastrophe? Phrased differently, can the system tolerate faults? An answer of ‘yes’ to this question means that such a system is fault tolerant.

So, what exactly is a transaction?

Not this kind of transaction

A transaction is an abstraction. It is a collection of operations (reads and writes) that are treated as a single logical operation.

Imagine you want to buy a single book from an online store, say amazon.com. The steps below show a simplified view of what needs to happen:

First, you select the book, which adds the item to your basket.
The inventory quantity of the book is checked to ensure it is valid (that is, the inventory value for the title you are buying needs to be greater than 0).
You click ‘buy’, which updates Amazon’s inventory for the book and decreases it by 1 (since you are buying a single book).
Also, your bank account balance is updated to account for the cost of the book.

A transaction ensures that all operations related to the purchase are treated as a single operation. If any part of the transaction fails, the entire transaction is rolled back, leaving the database in a state as if the customer had never attempted the purchase, thus maintaining the integrity of the data.

The transaction is committed when all the operations within the transaction are successfully completed and their results are permanently recorded. This permanence is typically achieved by writing the changes to the database's storage, which could be on disk for traditional databases or in memory for in-memory databases like Redis.

By treating all of these different operations as a single logical operation, the database is able to offer some guarantees as to how it can be fault tolerant. These guarantees are atomicity, consistency, isolation and durability.

What Does Atomicity Mean?

Atomicity simply means that all queries in a transaction must succeed for the transaction to succeed. If one query fails, the entire transaction fails.

An Atomic Restaurant

Imagine using a self-service machine at a fast-food restaurant. The transaction in this case is ordering food, and consists of two separate operations:

Select food
Make payment

Both of these must succeed for the transaction to succeed. If either fails, the transaction fails.

Customer making an order in an "atomic" restaurant

You select your burger, fries, and a drink from the touchscreen menu. The machine prompts you to pay, and only after your payment is processed successfully, it sends your order to the kitchen. Moments later, your entire order is ready, and you pick it up from the counter.

This is an atomic operation: the transaction (ordering food) is either entirely completed (if you select your food item and make a payment) or not completed at all.

Either part of the transaction failing means the entire transaction will fail. If your payment fails, the machine won't process any part of the order, so the transaction fails. If you make a payment without selecting a food item, the transaction also fails, as there is nothing for the kitchen to prepare.

A Non-Atomic Restaurant

Now consider the alternative, a traditional sit-down restaurant where you order several dishes. As each dish is prepared, it is brought to your table.

Customer making an order in a "non-atomic" restaurant

Again, the transaction is ordering food, and consists of two separate operations:

Select food
Make payment

In this non-atomic restaurant, failure to make a payment does not stop the transaction from completing, since you pay after you have finished your meal. Partial failures do not cause a transaction to fail.

This creates a risk for the restaurant. Customers that choose to dine and dash can order food to their heart’s delight and then simply leave without paying, causing a financial loss for the restaurant.

Non-atomic restaurants are at risk of customers doing a dine and dash

Atomic Transactions

If several SQL queries are grouped together in a transaction, atomicity is a guarantee that, should any of the queries fail for any reason (hardware, application or networking problems) then the transaction is aborted and the database returns to its previous state, as if nothing had happened.

Without atomicity, if a failure occurs while some queries are running, it is difficult to know which queries have been committed (that is, completed) and which have not. Running the queries again after a failure can compound the problem, since you risk introducing incorrect data to the database by re-running queries that previously succeeded.

Atomic transactions prevent such uncertainty, since you know that if the previous transaction failed, it failed in its entirety, and you can simply retry without worrying about introducing inconsistent data.
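
The all-or-nothing behaviour described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the article's own code: the inventory and accounts tables, the buy_book helper, and the CHECK constraints that make the payment fail are all assumptions made for the example.

```python
import sqlite3

def buy_book(conn, title, price):
    """Decrement stock and charge the buyer as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE title = ?", (title,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = 'buyer'", (price,)
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (title TEXT PRIMARY KEY, quantity INTEGER CHECK (quantity >= 0))"
)
conn.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO inventory VALUES ('SQL Basics', 3)")
conn.execute("INSERT INTO accounts VALUES ('buyer', 10)")
conn.commit()

# The book costs more than the balance, so the second UPDATE violates its CHECK.
ok = buy_book(conn, "SQL Basics", 25)
quantity = conn.execute("SELECT quantity FROM inventory").fetchone()[0]
balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
```

The inventory update succeeded on its own, but because the payment failed inside the same transaction, the rollback undoes it as well. The stock and the balance are both left exactly as they were, so the purchase can be retried safely.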

What Does Consistency Mean?

Consistency can mean different things in cloud/software engineering, depending on the context. In the case of ACID, the “C” was most likely added to make the acronym work.

Consistency in the context of ACID means consistency in data, which is defined by the creator of the database. The technical term for this kind of consistency is referential integrity. Referential integrity is a method of ensuring that relationships between tables remain consistent. It's usually enforced through the use of foreign keys.

To understand referential integrity, consider the following.

Imagine a library system with two types of cards: a book card and a borrower's card.

The book card lists all the books available in the library.
The borrower's card tracks which books are borrowed by which members.

A book card and borrower card for a library

The rule of the library is that a book can only be listed on a borrower's card if it exists on a book card. This is referential integrity. If someone tries to list a book on a borrower's card that isn't on the book card (that is a book that doesn’t exist in the library), the system will not allow it.

While atomicity, isolation and durability are properties intrinsic to the database itself, consistency in data, or referential integrity, is not a property intrinsic to the database.

Consistency is defined by the creator of the database. The application calling the database relies on the atomicity and isolation properties of the database to maintain that consistency.

What Does Isolation Mean?

Isolation is a guarantee that concurrently running transactions should not interfere with each other. Concurrency here refers to two or more transactions trying to modify or read the same database record(s) at the same time.

The SQL standard defines four levels of transaction isolation: read uncommitted, read committed, repeatable read, and serializable. I'll just explain two commonly used ones below, arranged in order from the least strict to most strict.

Read Committed

This gives two guarantees. It prevents dirty reads and dirty writes.

No Dirty Reads: Reading data from another transaction that has not yet been committed is called a dirty read. With the read committed isolation level, you will only see data that has been committed by another transaction.

No Dirty Writes: Overwriting data that has already been written by another transaction but not yet committed is called a dirty write.

To understand how read committed isolation works, consider the following example.

Imagine a fast-food restaurant with only one last special burger available, and two hungry customers, Marie and Marko, are trying to buy it simultaneously.

Two customers ordering a burger at the same time

1. Marie checks the availability of burgers and sees the last one available. Unknown to her, Marko’s order is being processed but hasn't been finalised in the system, as he has not paid. Since his order has not yet been finalised, Marie is not aware that his order conflicts with her own. This is similar to a transaction reading the most recently committed data, where it does not see uncommitted changes (like Marko's pending order).
2. Marie places an order based on this incomplete information, thinking a burger is available.
3. Once Marko pays, the system updates to show that there are no burgers left. This is similar to a transaction being committed.
4. Marie’s order will have to be aborted since there are no burgers left.

The key point here is step #3. What if Marko’s payment failed at this stage? Then the transaction will not be committed and there would still be a burger available for Marie.

In this example, read committed isolation ensures that Marie is not prematurely excluded from buying the burger just because someone else said they wanted it. Only committed transactions can be read. Therefore, the burger is available to be ordered as long as no one has paid for it.
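
The "no dirty reads" guarantee can be observed with two connections to the same database file. The sketch below uses Python's sqlite3 module; the burgers table and the connection names marko and marie are made up for the illustration. Note that SQLite's own isolation is stricter than read committed, but it shares this particular guarantee: one connection never sees another connection's uncommitted changes.

```python
import os
import sqlite3
import tempfile

def burger_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "restaurant.db")
        marko = sqlite3.connect(path)
        marie = sqlite3.connect(path)
        try:
            marko.execute("CREATE TABLE burgers (id INTEGER PRIMARY KEY, stock INTEGER)")
            marko.execute("INSERT INTO burgers VALUES (1, 1)")
            marko.commit()
            query = "SELECT stock FROM burgers WHERE id = 1"

            # Marko's order is pending: the change is made but not yet committed.
            marko.execute("UPDATE burgers SET stock = 0 WHERE id = 1")
            seen_while_pending = marie.execute(query).fetchall()[0][0]

            # Marko pays, the transaction commits, and Marie now sees the change.
            marko.commit()
            seen_after_commit = marie.execute(query).fetchall()[0][0]
            return seen_while_pending, seen_after_commit
        finally:
            marko.close()
            marie.close()

pending, committed = burger_demo()
```

While Marko's update is uncommitted, Marie still reads a stock of 1. Only after the commit does her query return 0. Had Marko called rollback() instead of commit(), she would have kept seeing 1.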

Repeatable Read

Repeatable read is a stricter isolation level: it has the same guarantees as read committed isolation, plus it guarantees that reads are repeatable.

When a transaction reads the same data twice, but sees a different value in each read because a committed transaction has updated the value between the two reads, this is called a fuzzy read. The repeatable read isolation level prevents fuzzy reads.

Fuzzy reads are neither inherently good nor bad. It all depends on what you are trying to achieve.

Fuzzy reads are bad for long-running, read-only transactions, since new writes are likely to occur during the transaction and this can cause inconsistencies in the data. Examples of long running, read-only transactions are a database backup and analytical queries typically used in a data warehouse.

Repeatable reads are usually implemented by the DBMS by reading from a snapshot of the database which remains unchanged for the duration of the transaction, thereby ignoring any new committed writes in that period.
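
Snapshot-based reads can be sketched with SQLite in write-ahead-log (WAL) mode, where a read transaction keeps seeing the database as it was when the transaction first read from it. This is an illustration under assumptions rather than a statement about how every DBMS implements repeatable read: the stock table and the reader and writer connections are invented for the example.

```python
import os
import sqlite3
import tempfile

def snapshot_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "inventory.db")
        writer = sqlite3.connect(path)
        # isolation_level=None lets us issue BEGIN and COMMIT ourselves.
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            writer.execute("PRAGMA journal_mode=WAL").fetchall()
            writer.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
            writer.execute("INSERT INTO stock VALUES ('book', 5)")
            writer.commit()
            query = "SELECT quantity FROM stock WHERE item = 'book'"

            # The long-running read transaction starts and takes its first read.
            reader.execute("BEGIN")
            first_read = reader.execute(query).fetchall()[0][0]

            # Another transaction commits a new value in the meantime.
            writer.execute("UPDATE stock SET quantity = 4 WHERE item = 'book'")
            writer.commit()

            # Same transaction, same query: the snapshot hides the new write.
            second_read = reader.execute(query).fetchall()[0][0]
            reader.execute("COMMIT")

            # A fresh read after the transaction ends sees the committed value.
            after_commit = reader.execute(query).fetchall()[0][0]
            return first_read, second_read, after_commit
        finally:
            writer.close()
            reader.close()

first_read, second_read, after_commit = snapshot_demo()
```

Both reads inside the transaction agree with each other, which is exactly what a backup or an analytical query needs. The newly committed value only becomes visible once the read transaction has finished.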

What Does Durability Mean?

Durability is a guarantee that changes made by a committed transaction must not be lost. All committed transactions must be persisted on durable, non-volatile storage, that is on disk. This ensures that any committed transactions are protected even if the database crashes.

Naturally, durability cannot protect against destruction of the disk which stores the data. Additional redundancy can be added by having backups of your database stored separately from the original.

Bringing it Together

ACID (Atomicity, Consistency, Isolation, and Durability) provides a set of guarantees when working with a DBMS. While most relational DBMS are ACID compliant, the implementation of this compliance can vary.

Atomicity ensures that all parts of a transaction are completed or none at all. Partial failures are not allowed.

Consistency, or referential integrity, ensures that data remains accurate and reliable, adhering to predefined rules. Unlike the other properties, consistency is not intrinsic to the DBMS itself. Instead, the application calling the database relies on the atomicity and isolation properties of the database to maintain consistency.

Isolation is a guarantee that concurrently running transactions should not interfere with each other. This is arguably the most important property because a DBMS can often have different default isolation levels, which may need to be changed based on what is needed for your application.

Finally, durability is a guarantee that changes made by a committed transaction must not be lost.

The SQL Handbook – A Free Course for Web Developers

Lane Wagner — Tue, 05 Sep 2023 13:57:37 +0000

SQL is everywhere these days. Whether you're learning backend development, data engineering, DevOps, or data science, SQL is a skill you'll want in your toolbelt.

This is a free and open text-based handbook. If you want to get started, just scroll down and start reading. That said, there are two other options for following along:

Try the interactive version of this SQL course on Boot.dev, complete with coding challenges and projects
Watch the video walkthrough of this course on freeCodeCamp's YouTube channel

Chapter 1: Introduction
Chapter 2: SQL Tables
Chapter 3: Constraints
Chapter 4: CRUD Operations
Chapter 5: Basic SQL Queries
Chapter 6: How to Structure Return Data in SQL
Chapter 7: How to Perform Aggregations in SQL
Chapter 8: SQL Subqueries
Chapter 9: Database Normalization
Chapter 10: How to Join Tables in SQL
Chapter 11: Database Performance

Chapter 1: Introduction

Structured Query Language, or SQL, is the primary programming language used to manage and interact with relational databases. SQL can perform various operations such as creating, updating, reading, and deleting records within a database.

What is a SQL Select Statement?

Let's write our own SQL statement from scratch. A SELECT statement is the most common operation in SQL – often called a "query". SELECT retrieves data from one or more tables. Standard SELECT statements do not alter the state of the database.

SELECT id from users;

How to select a single field

A SELECT statement begins with the keyword SELECT followed by the fields you want to retrieve.

SELECT id from users;

How to select multiple fields

If you want to select more than one field, you can specify multiple fields separated by commas like this:

SELECT id, name from users;

How to select all fields

If you want to select every field in a record, you can use the shorthand * syntax.

SELECT * from users;

After specifying fields, you need to indicate which table you want to pull the records from using the FROM keyword followed by the name of the table.

We'll talk more about tables later, but for now, you can think about them like structs or objects. For example, the users table might have 3 fields:

id
name
balance

And finally, all statements end with a semi-colon ;.
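
To try the three SELECT forms above without installing anything, you can run them against an in-memory SQLite database from Python's standard library. This is a small sketch: the two sample users are made up, and the ORDER BY clause (not covered yet) is only there to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Linus", 250)],
)

# A single field, multiple fields, and every field.
ids = conn.execute("SELECT id FROM users ORDER BY id;").fetchall()
pairs = conn.execute("SELECT id, name FROM users ORDER BY id;").fetchall()
everything = conn.execute("SELECT * FROM users ORDER BY id;").fetchall()
```

Each result is a list of tuples, one tuple per record, with one element per selected field.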

Which Databases Use SQL?

SQL is just a query language. You typically use it to interact with a specific database technology, such as SQLite, PostgreSQL, or MySQL, among others.

Although many different databases use the SQL language, most of them will have their own dialect. It's critical to understand that not all databases are created equal. Just because one SQL-compatible database does things a certain way, doesn't mean every SQL-compatible database will follow those exact same patterns.

We're using SQLite

In this course, we'll be using SQLite specifically. SQLite is great for embedded projects, web browsers, and toy projects. It's lightweight, but has limited functionality compared to the likes of PostgreSQL or MySQL – two of the more common production SQL technologies.

And I'll make sure to point out to you whenever some functionality we're working with is unique to SQLite.

NoSQL vs SQL

When talking about SQL databases, we also have to mention the elephant in the room: NoSQL.

To put it simply, a NoSQL database is a database that does not use SQL (Structured Query Language). Each NoSQL database typically has its own way of writing and executing queries. For example, MongoDB uses MQL (MongoDB Query Language) and ElasticSearch simply has a JSON API.

While most relational databases are fairly similar, NoSQL databases tend to be fairly unique and are used for more niche purposes. Some of the main differences between a SQL and NoSQL database are:

NoSQL databases are usually non-relational, SQL databases are usually relational (we'll talk more about what this means later).
SQL databases usually have a defined schema, NoSQL databases usually have dynamic schema.
SQL databases are table-based, NoSQL databases have a variety of different storage methods, such as document, key-value, graph, wide-column, and more.

Types of NoSQL databases

Two popular NoSQL databases, both mentioned above, are MongoDB and ElasticSearch.

Comparing SQL Databases

Let's dive deeper and talk about some of the popular SQL Databases and what makes them different from one another. Some of the most popular SQL Databases right now are:

Source: db-engines.com

While all of these Databases use SQL, each database defines specific rules, practices, and strategies that separate them from their competitors.

SQLite vs PostgreSQL

Personally, SQLite and PostgreSQL are my favorites from the list above. Postgres is a very powerful, open-source, production-ready SQL database. SQLite is a lightweight, embeddable, open-source database. I usually choose one of these technologies if I'm doing SQL work.

SQLite is a serverless database management system (DBMS) that has the ability to run within applications, whereas PostgreSQL uses a Client-Server model and requires a server to be installed and listening on a network, similar to an HTTP server.

See a full comparison here.

Again, in this course we will be working with SQLite, a lightweight and simple database. For most backend web servers, PostgreSQL is a more production-ready option, but SQLite is great for learning and for small systems.

Chapter 2: SQL Tables

The CREATE TABLE statement is used to create a new table in a database.

How to use the `CREATE TABLE` statement

To create a table, use the CREATE TABLE statement followed by the name of the table and the fields you want in the table.

CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, is_manager BOOLEAN, salary INTEGER);

Each field name is followed by its datatype. We'll get to data types in a minute.

It's also acceptable and common to break up the CREATE TABLE statement with some whitespace like this:

CREATE TABLE employees(
    id INTEGER,
    name TEXT,
    age INTEGER,
    is_manager BOOLEAN,
    salary INTEGER
);

How to Alter Tables

We often need to alter our database schema without deleting it and re-creating it. Imagine if Twitter deleted its database each time it needed to add a feature. That would be a disaster! Your account and all your tweets would be wiped out on a daily basis.

Instead, we can use the ALTER TABLE statement to make changes in place without deleting any data.

How to use `ALTER TABLE`

With SQLite an ALTER TABLE statement allows you to:

Rename a table or column, which you can do like this:

ALTER TABLE employees
RENAME TO contractors;

ALTER TABLE contractors
RENAME COLUMN salary TO invoice;

ADD or DROP a column, which you can do like this:

ALTER TABLE contractors
ADD COLUMN job_title TEXT;

ALTER TABLE contractors
DROP COLUMN is_manager;
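
Here is a small sketch of these statements running against an in-memory SQLite database from Python. The column_names helper and the sample row are assumptions for the illustration. DROP COLUMN is left out on purpose, since it is only available in newer SQLite releases and the version bundled with your Python may differ.

```python
import sqlite3

def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # The table name is hardcoded by the caller here, never taken from user input.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, "
    "is_manager BOOLEAN, salary INTEGER)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Allan', 40, 0, 90000)")

conn.execute("ALTER TABLE employees RENAME TO contractors")
conn.execute("ALTER TABLE contractors RENAME COLUMN salary TO invoice")
conn.execute("ALTER TABLE contractors ADD COLUMN job_title TEXT")

columns = column_names(conn, "contractors")
row = conn.execute("SELECT * FROM contractors").fetchone()
```

The existing record survives every change. The renamed column keeps its value, and the newly added job_title column is filled with NULL (None in Python) for rows that already existed.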

Intro to Migrations

A database migration is a set of changes to a relational database. In fact, the ALTER TABLE statements in the previous section were examples of migrations.

Migrations are helpful when transitioning from one state to another, fixing mistakes, or adapting a database to changes.

Good migrations are small, incremental and ideally reversible changes to a database. As you can imagine, when working with large databases, making changes can be scary. We have to be careful when writing database migrations so that we don't break any systems that depend on the old database schema.

Example of a bad migration

If a backend server periodically runs a query like SELECT * FROM people, and we execute a database migration that alters the table name from people to users without updating the code, the application will break. It will try to grab data from a table that no longer exists.

A simple solution to this problem would be to deploy new code that uses a new query:

SELECT * FROM users;

And we would deploy that code to production immediately following the migration.

SQL Data Types

SQL as a language can support many different data types. But the datatypes that your database management system (DBMS) supports will vary depending on the specific database you're using.

SQLite only supports the most basic types, and we're using SQLite in this course.

SQLite Data Types

Let's go over the data types supported by SQLite and how they are stored.

NULL - Null value.
INTEGER - A signed integer stored in 0, 1, 2, 3, 4, 6, or 8 bytes, depending on the magnitude of the value.
REAL - A floating point value stored as a 64-bit IEEE floating point number.
TEXT - A text string stored using the database encoding, such as UTF-8.
BLOB - Short for binary large object, typically used for images, audio, or other multimedia.

For example:

CREATE TABLE employees (
    id INTEGER,
    name TEXT,
    age INTEGER,
    is_manager BOOLEAN,
    salary INTEGER
);

Boolean values

It's important to note that SQLite does not have a separate BOOLEAN storage class. Instead, boolean values are stored as integers:

0 = false
1 = true

It's not actually all that weird – boolean values are just binary bits after all!

SQLite will still let you write your queries using boolean expressions and true/false keywords, but it will convert the booleans to integers under-the-hood.
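
You can check this behaviour yourself with SQLite's typeof() function. The snippet below is a quick sketch using Python's sqlite3 module, with two made-up employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, is_manager BOOLEAN)")

# The true/false keywords are accepted in the query...
conn.execute("INSERT INTO employees VALUES ('Allan', true), ('Grace', false)")

# ...but what gets stored is an integer.
rows = conn.execute(
    "SELECT name, is_manager, typeof(is_manager) FROM employees ORDER BY name"
).fetchall()
```

Each row comes back with 1 or 0 in the is_manager field, and typeof() reports the storage class as integer.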

Chapter 3: Constraints

A constraint is a rule we create on a database that enforces some specific behavior. For example, setting a NOT NULL constraint on a column ensures that the column will not accept NULL values.

If we try to insert a NULL value into a column with the NOT NULL constraint, the insert will fail with an error message. Constraints are extremely useful when we need to ensure that certain kinds of data exist within our database.

NOT NULL constraint

The NOT NULL constraint can be added directly to the CREATE TABLE statement.

CREATE TABLE employees(
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    title TEXT NOT NULL
);
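
To see the constraints in this table reject bad data, here is a short sketch using Python's sqlite3 module. The try_insert helper and the sample rows are assumptions for the illustration. sqlite3 reports constraint violations as IntegrityError.

```python
import sqlite3

def try_insert(conn, row):
    """Attempt an insert and return 'ok' or the database's error message."""
    try:
        conn.execute("INSERT INTO employees (id, name, title) VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as err:
        return str(err)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT UNIQUE, title TEXT NOT NULL)"
)

first = try_insert(conn, (1, "Allan", "Engineer"))
missing_title = try_insert(conn, (2, "Grace", None))      # violates NOT NULL
duplicate_name = try_insert(conn, (3, "Allan", "Manager"))  # violates UNIQUE
total = conn.execute("SELECT count(*) FROM employees").fetchone()[0]
```

Only the first insert succeeds. The other two fail with messages naming the constraint that was violated, and the table still holds a single record.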

SQLite limitation

In other dialects of SQL you can ADD CONSTRAINT within an ALTER TABLE statement. SQLite does not support this feature, so when we create our tables we need to make sure we specify all the constraints we want.

Here's a list of SQL Features SQLite does not implement in case you're curious.

Primary Key Constraints

A key defines and protects relationships between tables. A primary key is a special column that uniquely identifies records within a table. Each table can have one, and only one primary key.

Your primary key will almost always be the "id" column

It's very common to have a column named id on each table in a database, and that id is the primary key for that table. No two rows in that table can share an id.

A PRIMARY KEY constraint can be explicitly specified on a column to ensure uniqueness, rejecting any inserts where you attempt to create a duplicate ID.

Foreign Key Constraints

Foreign keys are what makes relational databases relational! Foreign keys define the relationships between tables. Simply put, a FOREIGN KEY is a field in one table that references another table's PRIMARY KEY.

Creating a Foreign Key in SQLite

Creating a FOREIGN KEY in SQLite happens at table creation! After we define the table fields and constraints we add an additional CONSTRAINT where we define the FOREIGN KEY and its REFERENCES.

Here's an example:

CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL
);

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department_id INTEGER,
    CONSTRAINT fk_departments
    FOREIGN KEY (department_id)
    REFERENCES departments(id)
);

In this example, an employee has a department_id. The department_id must be the same as the id field of a record from the departments table.
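
One SQLite detail worth knowing when you try this yourself: SQLite does not enforce foreign keys unless you switch enforcement on for each connection with PRAGMA foreign_keys = ON. The sketch below, using Python's sqlite3 module and made-up sample rows, shows the constraint rejecting an employee whose department does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite; enable it per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE departments (id INTEGER PRIMARY KEY, department_name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER,
        CONSTRAINT fk_departments
        FOREIGN KEY (department_id)
        REFERENCES departments(id)
    )
    """
)
conn.execute("INSERT INTO departments VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employees VALUES (1, 'Allan', 1)")  # department 1 exists

try:
    conn.execute("INSERT INTO employees VALUES (2, 'Grace', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT count(*) FROM employees").fetchone()[0]
```

Without the PRAGMA line, the second insert would be accepted and the table would contain an employee pointing at a department that does not exist.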

Schema

We've used the word schema a few times now, let's talk about what that word means. A database's schema describes how data is organized within it.

Data types, table names, field names, constraints, and the relationships between all of those entities are part of a database's schema.

There is no perfect way to architect a database schema

When designing a database schema there typically isn't a "correct" solution. We do our best to choose a sane set of tables, fields, constraints, etc that will accomplish our project's goals. Like many things in programming, different schema designs come with different tradeoffs.

How do we decide on a sane schema architecture?

One very important decision that needs to be made is to decide which table will store a user's balance! As you can imagine, ensuring our data is accurate when dealing with money is super important. We want to be able to:

Keep track of a user's current balance
See the historical balance at any point in the past
See a log of which transactions changed the balance over time

There are many ways to approach this problem. For our first attempt, let's try the simplest schema that fulfills our project's needs.

Chapter 4: CRUD Operations in SQL

What is CRUD?

CRUD is an acronym that stands for CREATE, READ, UPDATE, and DELETE. These four operations are the bread and butter of nearly every database you will create.

HTTP and CRUD

The CRUD operations correlate nicely with the HTTP methods you may have already learned:

HTTP POST - CREATE
HTTP GET - READ
HTTP PUT - UPDATE
HTTP DELETE - DELETE

SQL Insert Statement

Tables are pretty useless without data in them. In SQL we can add records to a table using an INSERT INTO statement. When using an INSERT statement we must first specify the table we are inserting the record into, followed by the fields within that table we want to add VALUES to.

Here's an example of an INSERT INTO statement:

INSERT INTO employees(id, name, title)
VALUES (1, 'Allan', 'Engineer');

HTTP CRUD Database lifecycle

It's important to understand how data flows through a typical web application.

The front-end processes some data from user input - maybe a form is submitted.
The front-end sends that data to the server through an HTTP request - maybe a POST.
The server makes a SQL query to its database to create an associated record - probably using an INSERT statement.
Once the server has processed that the database query was successful, it responds to the front-end with a status code! Hopefully a 200-level code (success)!

Manual Entry

Manually INSERTing every single record in a database would be an extremely time-consuming task! Working with raw SQL as we are now is not super common when designing backend systems.

When working with SQL within a software system, like a backend web application, you'll typically have access to a programming language such as Go or Python.

For example, a backend server written in Go can build SQL statements dynamically from values in the program. A naive version uses string formatting:

sqlQuery := fmt.Sprintf(`
INSERT INTO users(name, age, country_code)
VALUES ('%s', %v, '%s');
`, user.Name, user.Age, user.CountryCode)

SQL Injection

The example above is an oversimplification of what really happens when you access a database using Go code. Building queries from program values is how production systems access databases, but splicing raw values directly into the SQL string is a security vulnerability known as SQL injection. Real code passes the values separately as query parameters instead. We'll talk more about that later!
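
As a rough sketch of the safer approach, here is the same insert written with query parameters, using Python's sqlite3 module instead of Go. The ? placeholders are sqlite3's parameter syntax, and the hostile name is a made-up value showing what an attacker might submit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, country_code TEXT)")

# A value crafted to break out of a quoted string if it were pasted into the SQL.
hostile_name = "Robert'); DROP TABLE users;--"

# The SQL text and the values travel separately, so the value is never parsed as SQL.
conn.execute(
    "INSERT INTO users (name, age, country_code) VALUES (?, ?, ?)",
    (hostile_name, 30, "US"),
)

stored = conn.execute("SELECT name FROM users").fetchone()[0]
user_count = conn.execute("SELECT count(*) FROM users").fetchone()[0]
```

The hostile string is stored as an ordinary name and the users table is untouched, because the database treats parameters strictly as data.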

Count

We can use a SELECT statement to get a count of the records within a table. This can be very useful when we need to know how many records there are, but we don't particularly care what's in them.

Here's an example in SQLite:

SELECT count(*) from employees;

The * in this case is a wildcard that stands in for a column name. We don't care about the count of a specific column. We want to know the total number of records, so we can use the wildcard (*).

HTTP CRUD database lifecycle

We talked about how a "create" operation flows through a web application. Let's talk about a "read".

Let's talk through an example. Our product manager wants to show profile data on a user's settings page. Here's how we could engineer that feature request:

First, the front-end webpage loads.
The front-end sends an HTTP GET request to a /users endpoint on the back-end server.
The server receives the request.
The server uses a SELECT statement to retrieve the user's record from the users table in the database.
The server converts the row of SQL data into a JSON object and sends it back to the front-end.

WHERE clause

In order to keep learning about CRUD operations in SQL, we need to learn how to make the instructions we send to the database more specific. SQL accepts a WHERE clause within a query that allows us to be very specific with our instructions.

If we were unable to specify the specific record we wanted to READ, UPDATE, or DELETE making queries to a database would be very frustrating, and very inefficient.

Using a WHERE clause

Say we had over 9000 records in our users table. We often want to look at specific user data within that table without retrieving all the other records in the table. We can use a SELECT statement followed by a WHERE clause to specify which records to retrieve. The SELECT statement stays the same, we just add the WHERE clause to the end of the SELECT.

Here's an example:

SELECT name FROM users WHERE power_level >= 9000;

This will select only the name field of any user within the users table WHERE the power_level field is greater than or equal to 9000.

Finding NULL values

You can use a WHERE clause to filter values by whether or not they're NULL.

IS NULL

SELECT name FROM users WHERE first_name IS NULL;

IS NOT NULL

SELECT name FROM users WHERE first_name IS NOT NULL;
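
The sketch below runs these filters from Python's sqlite3 module against three made-up users. It also shows a common pitfall: comparing with = NULL never matches anything, which is why SQL has the separate IS NULL form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, first_name TEXT, power_level INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("ada99", "Ada", 9001), ("anon", None, 1770), ("root", None, 9000)],
)

strong = conn.execute(
    "SELECT name FROM users WHERE power_level >= 9000 ORDER BY name"
).fetchall()
no_first_name = conn.execute(
    "SELECT name FROM users WHERE first_name IS NULL ORDER BY name"
).fetchall()
has_first_name = conn.execute(
    "SELECT name FROM users WHERE first_name IS NOT NULL ORDER BY name"
).fetchall()

# Pitfall: NULL is never equal to anything, not even to NULL.
equals_null = conn.execute("SELECT name FROM users WHERE first_name = NULL").fetchall()
```

The = NULL query returns no rows at all, even though two users have no first name.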

DELETE

When a user deletes their account on Twitter, or deletes a comment on a YouTube video, that data needs to be removed from its respective database.

DELETE statement

A DELETE statement removes all records from a table that match the WHERE clause. As an example:

DELETE from employees
    WHERE id = 251;

This DELETE statement removes all records from the employees table that have an id of 251!

The danger of deleting data

Deleting data can be a dangerous operation. Once removed, data can be really hard if not impossible to restore! Let's talk about a couple of common ways back-end engineers protect against losing valuable customer data.

Strategy 1 - Backups

If you're using a cloud-service like GCP's Cloud SQL or AWS's RDS you should always turn on automated backups. They take an automatic snapshot of your entire database on some interval, and keep it around for some length of time.

For example, the Boot.dev database has a backup snapshot taken daily and we retain those backups for 30 days. If I ever accidentally run a query that deletes valuable data, I can restore it from the backup.

You should have a backup strategy for production databases.

Strategy 2 - Soft deletes

A "soft delete" is when you don't actually delete data from your database, but instead just "mark" the data as deleted.

For example, you might set a deleted_at date on the row you want to delete. Then, in your queries you ignore anything that has a deleted_at date set. The idea is that this allows your application to behave as if it's deleting data, but you can always go back and restore any data that's been removed.

You should probably only soft-delete if you have a specific reason to do so. Automated backups should be "good enough" for most applications that are just interested in protecting against developer mistakes.
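
If you do decide to soft-delete, the pattern can be sketched like this with Python's sqlite3 module. The comments table and its rows are made up for the example, and deleted_at is stored as text because SQLite has no dedicated date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO comments (id, body) VALUES (?, ?)",
    [(1, "First!"), (2, "Great video")],
)

# "Delete" comment 1 by marking it rather than removing the row.
conn.execute("UPDATE comments SET deleted_at = datetime('now') WHERE id = ?", (1,))

# Normal queries ignore anything that has deleted_at set.
visible = conn.execute(
    "SELECT id FROM comments WHERE deleted_at IS NULL ORDER BY id"
).fetchall()

# Restoring the comment is just a matter of clearing the marker.
conn.execute("UPDATE comments SET deleted_at = NULL WHERE id = ?", (1,))
restored = conn.execute(
    "SELECT id FROM comments WHERE deleted_at IS NULL ORDER BY id"
).fetchall()
```

The cost of this approach is that every query reading the table has to remember the deleted_at IS NULL filter.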

Update query in SQL

Whenever you update your profile picture or change your password online, you are changing the data in a field on a table in a database. Imagine if every time you accidentally messed up a Tweet on Twitter you had to delete the entire tweet and post a new one instead of just editing it...

...Well, that's a bad example.

Update statement

The UPDATE statement in SQL allows us to update the fields of a record. We can even update many records depending on how we write the statement.

An UPDATE statement specifies the table that needs to be updated, followed by the fields and their new values by using the SET keyword. Lastly a WHERE clause indicates the record(s) to update.

UPDATE employees
SET job_title = 'Backend Engineer', salary = 150000
WHERE id = 251;
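
Here is a sketch of that UPDATE running from Python's sqlite3 module, with two made-up employees. The cursor's rowcount attribute reports how many records a statement changed, which is a handy check that the WHERE clause matched what you expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, job_title TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(251, "Allan", "Engineer", 120000), (252, "Grace", "Manager", 130000)],
)

update_sql = "UPDATE employees SET job_title = ?, salary = ? WHERE id = ?"
matched = conn.execute(update_sql, ("Backend Engineer", 150000, 251)).rowcount
unmatched = conn.execute(update_sql, ("Backend Engineer", 150000, 999)).rowcount

allan = conn.execute("SELECT job_title, salary FROM employees WHERE id = 251").fetchone()
grace = conn.execute("SELECT job_title, salary FROM employees WHERE id = 252").fetchone()
```

Only the record with id 251 changes. The second statement matches no records, so it succeeds but updates nothing.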

Object-Relational Mapping (ORMs)

An Object-Relational Mapper, or ORM for short, is a tool that allows you to perform CRUD operations on a database using a traditional programming language. These typically come in the form of a library or framework that you would use in your backend code.

The primary benefit an ORM provides is that it maps your database records to in-memory objects. For example, in Go we might have a struct that we use in our code:

type User struct {
    ID int
    Name string
    IsAdmin bool
}

This struct definition conveniently represents a database table called users, and an instance of the struct represents a row in the table.

Example: Using an ORM

Using an ORM we might be able to write simple code like this:

user := User{
    ID: 10,
    Name: "Lane",
    IsAdmin: false,
}

// generates a SQL statement and runs it,
// creating a new record in the users table
db.Create(user)

Example: Using straight SQL

Using straight SQL we might have to do something a bit more manual:

user := User{
    ID: 10,
    Name: "Lane",
    IsAdmin: false,
}

db.Exec("INSERT INTO users (id, name, is_admin) VALUES (?, ?, ?);",
    user.ID, user.Name, user.IsAdmin)

Should you use an ORM?

That depends. An ORM typically gives you simplicity at the cost of control.

Using straight SQL you can take full advantage of the power of the SQL language. Using an ORM, you're limited by whatever functionality the ORM has.

If you run into issues with a specific query, it can be harder to debug with an ORM because you have to dig through the framework's code and documentation to figure out how the underlying queries are being generated.

I recommend doing projects both ways so that you can learn about the tradeoffs. At the end of the day, when you're working on a team of developers, it will be a team decision.
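
To get a feel for what an ORM does under the hood, here is a deliberately tiny hand-rolled mapping layer in Python. It is not a real ORM and does not reflect any particular library's API. The User dataclass and the create and find helpers are hypothetical names, mirroring the Go struct shown earlier.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Optional

@dataclass
class User:
    id: int
    name: str
    is_admin: bool

def create(conn: sqlite3.Connection, user: User) -> None:
    """Turn an in-memory object into a row (what db.Create would do for us)."""
    conn.execute("INSERT INTO users (id, name, is_admin) VALUES (?, ?, ?)", astuple(user))

def find(conn: sqlite3.Connection, user_id: int) -> Optional[User]:
    """Turn a row back into an in-memory object, or None if no row matches."""
    row = conn.execute(
        "SELECT id, name, is_admin FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(row[0], row[1], bool(row[2])) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_admin BOOLEAN)")

create(conn, User(id=10, name="Lane", is_admin=False))
loaded = find(conn, 10)
missing = find(conn, 11)
```

A real ORM generates this kind of SQL for every table and query shape. That is where the convenience comes from, and it is also why the generated queries can be harder to inspect.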

Chapter 5: Basic SQL Queries

How to use the `AS` Clause in SQL

Sometimes we need to structure the data we return from our queries in a specific way. An AS clause allows us to "alias" a piece of data in our query. The alias only exists for the duration of the query.

`AS` keyword

The following queries return the same data:

SELECT employee_id AS id, employee_name AS name
FROM employees;

and:

SELECT employee_id, employee_name
FROM employees;

The difference is that the results from the aliased query would have column names id and name instead of employee_id and employee_name.
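
To see the alias take effect, here's a minimal sketch using Python's built-in sqlite3 module. The in-memory employees table and its two rows are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table mirroring the employees example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, employee_name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Lane'), (2, 'Allan')")

aliased = conn.execute(
    "SELECT employee_id AS id, employee_name AS name "
    "FROM employees ORDER BY employee_id"
)
# cursor.description holds one tuple per result column; item 0 is its name.
aliased_columns = [column[0] for column in aliased.description]
aliased_rows = aliased.fetchall()

plain_rows = conn.execute(
    "SELECT employee_id, employee_name FROM employees ORDER BY employee_id"
).fetchall()

print(aliased_columns)             # column names reported for the aliased query
print(aliased_rows == plain_rows)  # the data itself is identical
```

The alias changes only the labels on the result set. The table's real column names are untouched.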

SQL Functions

At the end of the day, SQL is a programming language, and it's one that supports functions. We can use functions and aliases to calculate new columns in a query. This is similar to how you might use formulas in Excel.

IIF function

In SQLite, the IIF function works like a ternary. For example:

IIF(carA > carB, 'Car a is bigger', 'Car b is bigger')

If carA is greater than carB, this expression evaluates to the string 'Car a is bigger'. Otherwise, it evaluates to 'Car b is bigger'.

Here's how we can use IIF() and a directive alias to add a new calculated column to our result set:

SELECT quantity,
    IIF(quantity < 10, 'Order more', 'In Stock') AS directive
    FROM products;
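
IIF is shorthand for a CASE expression, and older SQLite builds may not ship with IIF at all. As a hedged sketch, the same calculated column can be written with CASE, which is standard SQL. The products rows below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, quantity INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", 3), ("banana", 10), ("cherry", 25)],
)

# CASE WHEN ... THEN ... ELSE ... END is the long form of IIF(cond, a, b).
directives = conn.execute(
    """
    SELECT product_name,
        CASE WHEN quantity < 10 THEN 'Order more' ELSE 'In Stock' END AS directive
    FROM products
    ORDER BY product_name
    """
).fetchall()

for product_name, directive in directives:
    print(product_name, directive)
```

Note that a quantity of exactly 10 falls into the 'In Stock' branch, because the condition is strictly less than 10.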

How to Use `BETWEEN` with `WHERE`

We can check whether a value falls between two numbers by using BETWEEN inside a WHERE clause. The WHERE clause doesn't always have to match specific IDs or values. We can also use it to narrow down our result set by range. Here's an example:

SELECT employee_name, salary
FROM employees
WHERE salary BETWEEN 30000 AND 60000;

This query returns the employee_name and salary fields for every row where the salary is between 30,000 and 60,000, inclusive. We can also query for values that are NOT BETWEEN two specified values:

SELECT product_name, quantity
FROM products
WHERE quantity NOT BETWEEN 20 AND 100;

This query returns all the product names where the quantity was not between 20 and 100. We can use conditionals to make the results of our query as specific as we need them to be.
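
One detail worth checking for yourself is that BETWEEN includes both endpoints. Here's a small sketch with made-up employees whose salaries sit right on and just outside the boundaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
# Hypothetical salaries chosen to sit on and around the range boundaries.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ava", 29999), ("Ben", 30000), ("Cy", 45000), ("Dee", 60000), ("Eli", 60001)],
)

between_names = [
    row[0]
    for row in conn.execute(
        "SELECT employee_name FROM employees "
        "WHERE salary BETWEEN 30000 AND 60000 ORDER BY salary"
    )
]
not_between_names = [
    row[0]
    for row in conn.execute(
        "SELECT employee_name FROM employees "
        "WHERE salary NOT BETWEEN 30000 AND 60000 ORDER BY salary"
    )
]

print(between_names)      # Ben (30000) and Dee (60000) are included
print(not_between_names)  # only the salaries outside the range remain
```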

How to return distinct values

Sometimes we want to retrieve records from a table without getting back any duplicates.

For example, we may want to know all the different companies our employees have worked at previously, but we don't want to see the same company multiple times in the report.

`SELECT DISTINCT`

SQL offers us the DISTINCT keyword that removes duplicate records from the resulting query.

SELECT DISTINCT previous_company
    FROM employees;

This only returns one row for each unique previous_company value.

Logical Operators

We often need to use multiple conditions to retrieve the exact information we want. We can begin to structure much more complex queries by using multiple conditions together to narrow down the search results of our query.

The logical AND operator can be used to narrow down our result sets even more.

`AND` operator

SELECT product_name, quantity, shipment_status
    FROM products
    WHERE shipment_status = 'pending'
    AND quantity BETWEEN 0 AND 10;

This only retrieves records where both the shipment_status is "pending" AND the quantity is between 0 and 10.

Equality operators

All of the following operators are supported in SQL. The = is the main one to watch out for: it's not == like in many other languages.

=
<
>
<=
>=

For example, in Python you might compare two values like this:

if name == "age":

Whereas in SQL you would do:

WHERE name = 'age'

`OR` operator

As you've probably guessed, if the logical AND operator is supported, the OR operator is probably supported as well.

SELECT product_name, quantity, shipment_status
    FROM products
    WHERE shipment_status = 'out of stock'
    OR quantity BETWEEN 10 AND 100;

This query retrieves records where either the shipment_status condition OR the quantity condition is met.

Order of operations matters when using these operators.

You can group logical operations with parentheses to specify the order of operations.

(this AND that) OR the_other
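
To make the order of operations concrete: SQL evaluates AND before OR, so leaving out the parentheses can change which rows come back. This is a minimal sketch with invented product rows, not data from a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_name TEXT, quantity INTEGER, shipment_status TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("widget", 5, "pending"),
        ("gadget", 50, "out of stock"),
        ("gizmo", 50, "pending"),
        ("doohickey", 5, "out of stock"),
    ],
)

# Without parentheses, AND binds first:
# out of stock OR (pending AND quantity < 10)
without_parens = [
    row[0]
    for row in conn.execute(
        "SELECT product_name FROM products "
        "WHERE shipment_status = 'out of stock' "
        "OR shipment_status = 'pending' AND quantity < 10 "
        "ORDER BY product_name"
    )
]

# With parentheses, the OR is evaluated first:
# (out of stock OR pending) AND quantity < 10
with_parens = [
    row[0]
    for row in conn.execute(
        "SELECT product_name FROM products "
        "WHERE (shipment_status = 'out of stock' OR shipment_status = 'pending') "
        "AND quantity < 10 "
        "ORDER BY product_name"
    )
]

print(without_parens)
print(with_parens)
```

The first query keeps gadget even though its quantity is 50, because the quantity check only applies to the pending rows.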

The `IN` operator

Another variation on the WHERE clause we can use is the IN operator. IN returns true if the first operand matches any of the values in the second operand, and false otherwise. The IN operator is shorthand for multiple OR conditions.

These two queries are equivalent:

SELECT product_name, shipment_status
    FROM products
    WHERE shipment_status IN ('shipped', 'preparing', 'out of stock');

SELECT product_name, shipment_status
    FROM products
    WHERE shipment_status = 'shipped'
        OR shipment_status = 'preparing'
        OR shipment_status = 'out of stock';

Hopefully, you're starting to see how querying specific data using fine-tuned SQL clauses helps reveal important insights. The larger a table becomes the harder it becomes to analyze without proper queries.

The `LIKE` keyword

Sometimes we don't have the luxury of knowing exactly what it is we need to query. Have you ever wanted to look up a song or a video but you only remember part of the name? SQL provides us an option for when we're in situations LIKE this.

The LIKE keyword allows for the use of the % and _ wildcard operators. Let's focus on % first.

`%` Operator

The % operator will match zero or more characters. We can use this operator within our query string to find more than just exact matches depending on where we place it.

Here are some examples that show how these work:

Product starts with "banana":

SELECT * FROM products
WHERE product_name LIKE 'banana%';

Product ends with "banana":

SELECT * FROM products
WHERE product_name LIKE '%banana';

Product contains "banana":

SELECT * FROM products
WHERE product_name LIKE '%banana%';

Underscore Operator

As discussed, the % wildcard operator matches zero or more characters. Meanwhile, the _ wildcard operator only matches a single character.

SELECT * FROM products
    WHERE product_name LIKE '_oot';

The query above matches products like:

boot
root
foot

SELECT * FROM products
    WHERE product_name LIKE '__oot';

The query above matches products like:

shoot
groot
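
Here's a quick way to try both wildcards against the same data. It's a sketch with a made-up products table, run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT)")
# Hypothetical product names.
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("boot",), ("root",), ("foot",), ("shoot",), ("groot",),
     ("banana bread",), ("green banana",)],
)


def names_like(pattern):
    """Return product names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT product_name FROM products "
        "WHERE product_name LIKE ? ORDER BY product_name",
        (pattern,),
    )
    return [row[0] for row in rows]


one_char = names_like("_oot")       # exactly one character before "oot"
two_chars = names_like("__oot")     # exactly two characters before "oot"
starts_with = names_like("banana%")
ends_with = names_like("%banana")
contains = names_like("%banana%")

print(one_char, two_chars, starts_with, ends_with, contains)
```

Notice that '_oot' does not match shoot or groot: each underscore stands for exactly one character, no more and no fewer.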

Chapter 6: How to Structure Return Data in SQL

The `LIMIT` keyword

Sometimes we don't want to retrieve every record from a table. For example, it's common for a production database table to have millions of rows, and SELECTing all of them might crash your system. This is where the LIMIT keyword enters the chat.

The LIMIT keyword can be used at the end of a select statement to reduce the number of records returned.

SELECT * FROM products
    WHERE product_name LIKE '%berry%'
    LIMIT 50;

Without the LIMIT, the query above would retrieve all the records from the products table where the name contains the word berry. If we ran it on the Facebook database, it would almost certainly return a lot of records.

The LIMIT statement only allows the database to return up to 50 records matching the query. This means that if there aren't that many records matching the query, the LIMIT statement will not have an effect.

The SQL `ORDER BY` keyword

SQL also offers us the ability to sort the results of a query using ORDER BY. By default, the ORDER BY keyword sorts records by the given field in ascending order, or ASC for short. However, ORDER BY does support descending order as well with the keyword DESC.

Examples

This query returns the name, price, and quantity fields from the products table sorted by price in ascending order:

SELECT name, price, quantity FROM products
    ORDER BY price;

This query returns the name, price, and quantity of the products ordered by the quantity in descending order:

SELECT name, price, quantity FROM products
    ORDER BY quantity DESC;

Order By and Limit

When using both ORDER BY and LIMIT, the ORDER BY clause must come first.
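
Combining the two gives us a "top N" query. The sketch below uses an invented products table to fetch the two most expensive items, and also shows that SQLite rejects the clauses in the wrong order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER, quantity INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("pen", 2, 100), ("book", 15, 20), ("lamp", 40, 5), ("desk", 150, 2)],
)

# Sort first, then keep only the first two rows of the sorted result.
top_two = conn.execute(
    "SELECT name, price FROM products ORDER BY price DESC LIMIT 2"
).fetchall()

# Putting LIMIT before ORDER BY is a syntax error.
try:
    conn.execute("SELECT name FROM products LIMIT 2 ORDER BY price")
    limit_first_rejected = False
except sqlite3.OperationalError:
    limit_first_rejected = True

print(top_two)
print(limit_first_rejected)
```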

Chapter 7: How to Perform Aggregations in SQL

An "aggregation" is a single value that's derived by combining several other values. We performed an aggregation earlier when we used the count statement to count the number of records in a table.

Why use aggregations?

Data stored in a database should generally be stored raw. When we need to calculate some additional data from the raw data, we can use an aggregation.

Take the following count aggregation as an example:

SELECT COUNT(*)
FROM products
WHERE quantity = 0;

This query returns the number of products that have a quantity of 0. We could store a count of the products in a separate database table, and increment/decrement it whenever we make changes to the products table - but that would be redundant.

It's much simpler to store the products in a single place (we call this a single source of truth) and run an aggregation when we need to derive additional information from the raw data.

The `SUM` function

The sum aggregation function returns the sum of a set of values.

For example, the query below returns a single record containing a single field. The returned value is equal to the total salary being collected by all of the employees in the employees table.

SELECT sum(salary)
FROM employees;

Which returns:

SUM(SALARY)
2483

The `MAX` function

As you may expect, the max function retrieves the largest value from a set of values. For example:

SELECT max(price)
FROM products;

This query looks through all of the prices in the products table and returns the price with the largest price value. Remember it only returns the price, not the rest of the record. You always need to specify each field you want a query to return.

A note on schema

The sender_id will be present for any transactions where the user in question (user_id) is receiving money (from the sender).
The recipient_id will be present for any transactions where the user in question (user_id) is sending money (to the recipient).

In other words, a transaction can only have a sender_id or a recipient_id - not both. The presence of one or the other indicates whether money is going into or out of the user's account.

This user_id, recipient_id, sender_id schema we've designed is only one way to design a transactions database - there are other valid ways to do it. It's the one we're using, and later we'll talk more about the tradeoffs in different database design options.

The `MIN` function

The min function works the same as the max function but finds the lowest value instead of the highest value.

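SELECT product_name, min(price)
FROM products;

This query returns the product_name and the price fields of the record with the lowest price. Note that selecting a non-aggregated column like product_name alongside min() this way works in SQLite, but many other SQL dialects reject it unless the column appears in a GROUP BY clause.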

The `GROUP BY` clause

There are times we need to group data based on specific values.

SQL offers the GROUP BY clause which can group rows that have similar values into "summary" rows. It returns one row for each group. The interesting part is that each group can have an aggregate function applied to it that operates only on the grouped data.

Example of `GROUP BY`

Imagine that we have a database with songs and albums, and we want to see how many songs are on each album. We can use a query like this:

SELECT album_id, count(song_id)
FROM songs
GROUP BY album_id;

This query retrieves a count of all the songs on each album. One record is returned per album, and they each have their own count.

The `AVG()` function

Just like we may want to find the minimum or maximum values within a dataset, sometimes we need to know the average!

SQL offers us the AVG() function. Similar to MAX(), AVG() calculates the average of all non-NULL values.

SELECT avg(song_length)
FROM songs;

This query returns the average song_length in the songs table.

The `HAVING` clause

When we need to filter the results of a GROUP BY query even further, we can use the HAVING clause. The HAVING clause specifies a search condition for a group.

The HAVING clause is similar to the WHERE clause, but it operates on groups after they've been grouped, rather than rows before they've been grouped.

SELECT album_id, count(id) as count
FROM songs
GROUP BY album_id
HAVING count > 5;

This query returns the album_id and count of its songs, but only for albums with more than 5 songs.

`HAVING` vs `WHERE` in SQL

It's fairly common for developers to get confused about the difference between the HAVING and the WHERE clauses - they're pretty similar after all.

The difference is fairly simple in actuality:

A WHERE condition is applied to all the data in a query before it's grouped by a GROUP BY clause.
A HAVING condition is only applied to the grouped rows that are returned after a GROUP BY is applied.

This means that if you want to filter on the result of an aggregation, you need to use HAVING. If you want to filter on a value that's present in the raw data, you should use a simple WHERE clause.
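
Here's a sketch that uses both clauses in one query, so you can see each filter act at its own stage. The songs rows are invented, and song_length is assumed to be in seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (id INTEGER PRIMARY KEY, album_id INTEGER, song_length INTEGER)"
)
# Hypothetical sample data: (album_id, song_length in seconds).
conn.executemany(
    "INSERT INTO songs (album_id, song_length) VALUES (?, ?)",
    [(1, 200), (1, 210), (1, 90), (2, 300), (2, 60), (3, 250), (3, 260), (3, 270)],
)

# WHERE drops short songs row by row, before grouping.
# HAVING then drops whole albums based on the aggregated count.
long_song_counts = conn.execute(
    """
    SELECT album_id, count(id) AS song_count
    FROM songs
    WHERE song_length > 100
    GROUP BY album_id
    HAVING count(id) >= 2
    ORDER BY album_id
    """
).fetchall()

print(long_song_counts)
```

Album 2 disappears from the result because only one of its songs survives the WHERE filter, so its group fails the HAVING condition.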

The `ROUND` function

Sometimes we need to round some numbers, particularly when working with the results of an aggregation. We can use the ROUND() function to get the job done.

The SQL round() function allows you to specify both the value you wish to round and the precision to which you wish to round it:

round(value, precision)

If no precision is given, SQL will round the value to the nearest whole value. Here's an example that passes a precision of 1:

SELECT round(avg(song_length), 1)
FROM songs;

This query returns the average song_length from the songs table, rounded to one decimal place.
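
To compare rounding with and without a precision, here's a small sketch. The three song lengths are made up and are assumed to be in minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (song_name TEXT, song_length REAL)")
# Hypothetical song lengths in minutes; their average is about 3.9167.
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("first", 3.25), ("second", 4.0), ("third", 4.5)],
)

raw_average = conn.execute("SELECT avg(song_length) FROM songs").fetchone()[0]
rounded_to_one = conn.execute(
    "SELECT round(avg(song_length), 1) FROM songs"
).fetchone()[0]
rounded_to_whole = conn.execute(
    "SELECT round(avg(song_length)) FROM songs"
).fetchone()[0]

print(raw_average, rounded_to_one, rounded_to_whole)
```

In SQLite, round() returns a floating-point value even when no precision is given, so the whole-number result comes back as 4.0 rather than 4.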

Chapter 8: SQL Subqueries

Subqueries

Sometimes a single query is not enough to retrieve the specific records we need.

It is possible to run a query on the result set of another query - a query within a query! This is called "query-ception"... erm... I mean a "subquery".

Subqueries can be very useful in a number of situations when trying to retrieve specific data that wouldn't be accessible by simply querying a single table.

How to retrieve data from multiple tables

Here is an example of a subquery:

SELECT id, song_name, artist_id
FROM songs
WHERE artist_id IN (
    SELECT id
    FROM artists
    WHERE artist_name LIKE 'Rick%'
);

In this hypothetical database, the query above selects the id, song_name, and artist_id fields from the songs table for every song written by an artist whose name starts with "Rick". Notice that the subquery allows us to use information from a different table - in this case the artists table.

Subquery syntax

The only syntax unique to a subquery is the parentheses surrounding the nested query. The operator doesn't have to be IN. For example, we could use the = operator if we expect the subquery to return a single value.

Here's an example:

SELECT id, song_name, artist_id
FROM songs
WHERE artist_id = (
    SELECT id
    FROM artists
    WHERE artist_name LIKE 'Rick%'
    LIMIT 1
);
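
To watch a subquery run end to end, here's a sketch with two tiny made-up tables. The artist and song names are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, artist_name TEXT)")
conn.execute(
    "CREATE TABLE songs (id INTEGER PRIMARY KEY, song_name TEXT, artist_id INTEGER)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [(1, "Rick A"), (2, "Ricky B"), (3, "Dana C")],
)
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?)",
    [(1, "Song One", 1), (2, "Song Two", 2), (3, "Song Three", 3)],
)

# The inner query produces a list of artist ids; the outer query filters on it.
rick_songs = conn.execute(
    """
    SELECT id, song_name, artist_id
    FROM songs
    WHERE artist_id IN (
        SELECT id
        FROM artists
        WHERE artist_name LIKE 'Rick%'
    )
    ORDER BY id
    """
).fetchall()

print(rick_songs)
```

Both "Rick A" and "Ricky B" match the pattern, so the inner query returns two ids, which is why IN is the safe operator here.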

No tables necessary

When working on a back-end application, this doesn't come up often, but it's important to remember that SQL is a full programming language. We usually use it to interact with data stored in tables, but it's quite flexible and powerful.

For example, you can SELECT information that's simply calculated, with no tables necessary.

SELECT 5 + 10 as sum;

Chapter 9: Database Normalization

Table Relationships

Relational databases are powerful because of the relationships between the tables. These relationships help us to keep our databases clean and efficient.

A relationship between tables assumes that one of these tables has a foreign key that references the primary key of another table.

Types of Relationships

There are 3 primary types of relationships in a relational database:

One-to-one
One-to-many
Many-to-many

One-to-one

A one-to-one relationship most often manifests as a field or set of fields on a row in a table. For example, a user will have exactly one password.

Settings fields might be another example of a one-to-one relationship. A user will have exactly one email_preference and exactly one birthday.

One-to-many

When talking about the relationships between tables, a one-to-many relationship is probably the most commonly used relationship.

A one-to-many relationship occurs when a single record in one table is related to potentially many records in another table.

Note that the one-to-many relationship only goes one way: a record in the second table cannot be related to multiple records in the first table!

Examples of one-to-many relationships

A customers table and an orders table. Each customer has 0, 1, or many orders that they've placed.
A users table and a transactions table. Each user has 0, 1, or many transactions that they've taken part in.

Many-to-many

A many-to-many relationship occurs when multiple records in one table can be related to multiple records in another table.

Examples of many-to-many relationships

A products table and a suppliers table - Products may have 0 to many suppliers, and suppliers can supply 0 to many products.
A classes table and a students table - Students can take potentially many classes and classes can have many students enrolled.

Joining tables

A joining table helps define a many-to-many relationship between data in a database. As an example, when defining the relationship above between products and suppliers, we would define a joining table called product_suppliers that contains the primary keys from the tables to be joined.

Then, when we want to see if a supplier supplies a specific product, we can look in the joining table to see if the ids share a row.

Unique constraints across 2 fields

When enforcing specific schema constraints we may need to enforce the UNIQUE constraint across two different fields.

CREATE TABLE product_suppliers (
  product_id INTEGER,
  supplier_id INTEGER,
  UNIQUE(product_id, supplier_id)
);

This ensures that we can have multiple rows with the same product_id or supplier_id, but we can't have two rows where both the product_id and supplier_id are the same.
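
We can watch the constraint do its job with a short sketch. The ids inserted below are arbitrary illustration values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_suppliers (
        product_id INTEGER,
        supplier_id INTEGER,
        UNIQUE(product_id, supplier_id)
    )
    """
)

# Repeating a product_id or a supplier_id on its own is fine.
conn.executemany(
    "INSERT INTO product_suppliers VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1)],
)

# Repeating the same pair violates the two-column UNIQUE constraint.
try:
    conn.execute("INSERT INTO product_suppliers VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM product_suppliers").fetchone()[0]

print(duplicate_rejected, row_count)
```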

Database normalization

Database normalization is a method for structuring your database schema in a way that helps:

Improve data integrity
Reduce data redundancy

What is data integrity?

"Data integrity" refers to the accuracy and consistency of data. For example, if a user's age is stored in a database, rather than their birthday, that data becomes incorrect automatically with the passage of time.

It would be better to store a birthday and calculate the age as needed.

What is data redundancy?

"Data redundancy" occurs when the same piece of data is stored in multiple places. For example: saving the same file multiple times to different hard drives.

Data redundancy can be problematic, especially when data in one place is changed such that the data is no longer consistent across all copies of that data.

Normal Forms

The creator of "database normalization", Edgar F. Codd, described different "normal forms" a database can adhere to. We'll talk about the most common ones.

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)

In short, 1st normal form is the least "normalized" form, and Boyce-Codd is the most "normalized" form.

The more normalized a database, the better its data integrity, and the less duplicate data you'll have.

In the context of normal forms, "primary key" means something a bit different

In the context of database normalization, we're going to use the term "primary key" slightly differently. When we're talking about SQLite, a "primary key" is a single column that uniquely identifies a row.

When we're talking more generally about data normalization, the term "primary key" means the collection of columns that uniquely identify a row. That can be a single column, but it can actually be any number of columns. A primary key is the minimum number of columns needed to uniquely identify a row in a table.

If you think back to the many-to-many joining table product_suppliers, that table's "primary key" was actually a combination of the 2 ids, product_id and supplier_id:

CREATE TABLE product_suppliers (
    product_id INTEGER,
    supplier_id INTEGER,
    UNIQUE(product_id, supplier_id)
);

1st Normal Form (1NF)

To be compliant with first normal form, a database table simply needs to follow 2 rules:

It must have a unique primary key.
A cell can't have a nested table as its value (depending on the database you're using, this may not even be possible)

Example of NOT 1st normal form

name	age	email
Lane	27	lane@boot.dev
Lane	27	lane@boot.dev
Allan	27	allan@boot.dev

This table does not adhere to 1NF. It has two identical rows, so there isn't a unique primary key for each row.

Example of 1st normal form

The simplest way (but not the only way) to get into first normal form is to add a unique id column.

id	name	age	email
1	Lane	27	lane@boot.dev
2	Lane	27	lane@boot.dev
3	Allan	27	allan@boot.dev

It's worth noting that if you create a "primary key" by ensuring that two columns are always "unique together" that works too.

You should almost never design a table that doesn't adhere to 1NF

First normal form is simply a good idea. I've never built a database schema where each table isn't at least in first normal form.

2nd Normal Form (2NF)

A table in second normal form follows all the rules of 1st normal form, and one additional rule:

All columns that are not part of the primary key are dependent on the entire primary key, and not just one of the columns in the primary key.

Example of 1st NF, but not 2nd NF

In this table, the primary key is a combination of first_name + last_name.

first_name	last_name	first_initial
Lane	Wagner	l
Lane	Small	l
Allan	Wagner	a

This table does not adhere to 2NF. The first_initial column is entirely dependent on the first_name column, rendering it redundant.

Example of 2nd normal form

One way to convert the table above to 2NF is to add a new table that maps a first_name directly to its first_initial. This removes any duplicates:

first_name	last_name
Lane	Wagner
Lane	Small
Allan	Wagner

first_name	first_initial
Lane	l
Allan	a

2NF is usually a good idea

You should probably default to keeping your tables in second normal form. That said, there are good reasons to deviate from it, particularly for performance. When you have to query a second table to get additional data, it can take a bit longer.

My rule of thumb is:

Optimize for data integrity and data de-duplication first. If you have speed issues, de-normalize accordingly.

3rd Normal Form (3NF)

A table in 3rd normal form follows all the rules of 2nd normal form, and one additional rule:

All columns that aren't part of the primary key are dependent solely on the primary key.

Notice that this is only slightly different from second normal form. In second normal form we can't have a column completely dependent on a part of the primary key, and in third normal form we can't have a column that is entirely dependent on anything that isn't the entire primary key.

Example of 2nd NF, but not 3rd NF

In this table, the primary key is simply the id column.

id	name	first_initial	email
1	Lane	l	lane.works@example.com
2	Breanna	b	breanna@example.com
3	Lane	l	lane.right@example.com

This table is in 2nd normal form because first_initial is not dependent on a part of the primary key. However, because it is dependent on the name column it doesn't adhere to 3rd normal form.

Example of 3rd normal form

The way to convert the table above to 3NF is to add a new table that maps a name directly to its first_initial. Notice how similar this solution is to 2NF.

id	name	email
1	Lane	lane.works@example.com
2	Breanna	breanna@example.com
3	Lane	lane.right@example.com

name	first_initial
Lane	l
Breanna	b

3NF is usually a good idea

The same exact rule of thumb applies to the second and third normal forms.

Optimize for data integrity and data de-duplication first by adhering to 3NF. If you have speed issues, de-normalize accordingly.

Boyce-Codd Normal Form (BCNF)

A table in Boyce-Codd normal form (created by Raymond F Boyce and Edgar F Codd) follows all the rules of 3rd normal form, plus one additional rule:

A column that's part of a primary key can not be entirely dependent on a column that's not part of that primary key.

This only comes into play when there are multiple possible primary key combinations that overlap. Another name for this is "overlapping candidate keys".

Only in rare cases does a table in third normal form not meet the requirements of Boyce-Codd normal form.

Example of 3rd NF, but not Boyce-Codd NF

release_year	release_date	sales	name
2001	2001-01-02	100	Kiss me tender
2001	2001-02-04	200	Bloody Mary
2002	2002-04-14	100	I wanna be them
2002	2002-06-24	200	He got me

The interesting thing here is that there are 3 possible primary keys:

release_year + sales
release_date + sales
name

This means that by definition this table is in 2nd and 3rd normal form because those forms only restrict how dependent a column that is not part of a primary key can be.

This table is not in Boyce-Codd normal form because release_year is entirely dependent on release_date.

Example of Boyce-Codd normal form

The easiest way to fix the table in our example is to simply remove the duplicate data from release_date. Let's make that column release_day_and_month.

release_year	release_day_and_month	sales	name
2001	01-02	100	Kiss me tender
2001	02-04	200	Bloody Mary
2002	04-14	100	I wanna be them
2002	06-24	200	He got me

BCNF is usually a good idea

The same exact rule of thumb applies to the 2nd, 3rd and Boyce-Codd normal forms. That said, it's unlikely you'll see BCNF-specific issues in practice.

Optimize for data integrity and data de-duplication first by adhering to Boyce-Codd normal form. If you have speed issues, de-normalize accordingly.

Normalization Review

In my opinion, the exact definitions of 1st, 2nd, 3rd and Boyce-Codd normal forms simply are not all that important in your work as a back-end developer.

However, what is important is to understand the basic principles of data integrity and data redundancy that the normal forms teach us.

Let's go over some rules of thumb that you should commit to memory - they'll serve you well when you design databases and even just in coding interviews.

Rules of thumb for database design

Every table should always have a unique identifier (primary key)
90% of the time, that unique identifier will be a single column named id
Avoid duplicate data
Avoid storing data that is completely dependent on other data. Instead, compute it on the fly when you need it.
Keep your schema as simple as you can. Optimize for a normalized database first. Only denormalize for speed's sake when you start to run into performance problems.

We'll talk more about speed optimization in a later chapter.
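
As a sketch of the "compute it on the fly" rule, here's the first_initial column from the 3NF example derived at query time instead of being stored. The table reuses the names and emails from that example, and substr() and lower() are built-in SQLite functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Lane", "lane.works@example.com"),
        (2, "Breanna", "breanna@example.com"),
        (3, "Lane", "lane.right@example.com"),
    ],
)

# first_initial is never stored; it is derived from name whenever we ask for it.
initials = conn.execute(
    """
    SELECT name, lower(substr(name, 1, 1)) AS first_initial
    FROM users
    ORDER BY id
    """
).fetchall()

print(initials)
```

Because the initial is always calculated from the current name, it can never drift out of sync with it.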

Chapter 10: How to Join Tables in SQL

Joins are one of the most important features that SQL offers. Joins allow us to make use of the relationships we have set up between our tables. In short, joins allow us to query multiple tables at the same time.

`INNER JOIN`

The simplest and most common type of join in SQL is the INNER JOIN. By default, a JOIN command is an INNER JOIN.

An INNER JOIN returns all of the records in table_a that have matching records in table_b, as demonstrated by the following Venn diagram.

The `ON` clause

In order to perform a join, we need to tell the database which fields should be "matched up". The ON clause is used to specify these columns to join.

SELECT *
FROM employees
INNER JOIN departments 
ON employees.department_id = departments.id;

The query above returns all the fields from both tables. The INNER keyword doesn't have anything to do with the number of columns returned - it only affects the number of rows returned.
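
Here's a minimal sketch of that query running against two small made-up tables. One employee has no department and one department has no employees, so we can see which rows an INNER JOIN leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)"
)
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Lane", 1), (2, "Allan", 2), (3, "Breanna", None)],
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Engineering"), (2, "Sales"), (3, "Support")],
)

inner_rows = conn.execute(
    """
    SELECT *
    FROM employees
    INNER JOIN departments
    ON employees.department_id = departments.id
    ORDER BY employees.id
    """
).fetchall()

for row in inner_rows:
    print(row)
```

Each result row has five fields (three from employees, two from departments). Breanna and the Support department are both missing because neither has a match on the other side.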

Namespacing on Tables

When working with multiple tables, you can specify which table a field exists on using a dot (.). For example:

table_name.column_name

SELECT students.name, classes.name
FROM students
INNER JOIN classes ON classes.class_id = students.class_id;

The above query returns the name field from the students table and the name field from the classes table.

`LEFT JOIN`

A LEFT JOIN will return every record from table_a regardless of whether or not any of those records have a match in table_b. A left join will also return any matching records from table_b.

Here is a Venn diagram to help visualize the effect of a LEFT JOIN.

A small trick you can do to make writing the SQL query easier is define an alias for each table. Here's an example:

SELECT e.name, d.name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.id;

Notice the simple alias declarations e and d for employees and departments respectively.

Some developers do this to make their queries less verbose. That said, I personally hate it because single-letter variables are harder to understand the meaning of.
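
To see how a LEFT JOIN treats unmatched rows, here's a sketch with invented employees and departments, where one employee has no department assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)"
)
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Lane", 1), (2, "Allan", 2), (3, "Breanna", None)],
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Engineering"), (2, "Sales"), (3, "Support")],
)

left_rows = conn.execute(
    """
    SELECT employees.name, departments.name
    FROM employees
    LEFT JOIN departments
    ON employees.department_id = departments.id
    ORDER BY employees.id
    """
).fetchall()

print(left_rows)
```

Breanna still appears in the result, and her missing department comes back as NULL, which Python shows as None. The Support department has no employees, so it doesn't appear at all.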

`RIGHT JOIN`

A RIGHT JOIN is, as you may expect, the opposite of a LEFT JOIN. It returns all records from table_b regardless of matches, and all matching records between the two tables.

SQLite Restriction

SQLite did not support right joins before version 3.39.0, although many other dialects of SQL do. If you think about it, a RIGHT JOIN is just a LEFT JOIN with the order of the tables switched, so it's not a big deal if the version you're using doesn't support the syntax.

`FULL JOIN`

A FULL JOIN combines the result set of the LEFT JOIN and RIGHT JOIN commands. It returns all records from both table_a and table_b regardless of whether or not they have matches.

SQLite

Like RIGHT JOINs, FULL JOINs weren't supported by SQLite before version 3.39.0, but they are still important to know.

Chapter 11: Database Performance

SQL Indexes

An index is a data structure that helps the queries we run on a database stay performant, that is to say, run quickly.

If you've learned about data structures, most database indexes are B-trees, a close relative of the binary search tree in which each node can hold many keys. The tree can be stored in RAM as well as on disk, and it makes it easy to look up the location of an entire row.

PRIMARY KEY columns are indexed by default, ensuring you can look up a row by its id very quickly. But if you have other columns that you want to be able to do quick lookups on, you'll need to index them.

`CREATE INDEX`

CREATE INDEX index_name on table_name (column_name);

It's fairly common to name an index after the column it's created on with a suffix of _idx.
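
One way to check whether an index is actually being used is SQLite's EXPLAIN QUERY PLAN. The sketch below assumes a made-up users table and an email_idx index, and compares the plan for the same lookup before and after the index exists. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")


def query_plan(sql):
    """Return the human-readable plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last field of each plan row is the description of that step.
    return " | ".join(row[3] for row in rows)


lookup = "SELECT * FROM users WHERE email = 'lane@example.com'"

plan_before = query_plan(lookup)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX email_idx ON users (email)")
plan_after = query_plan(lookup)   # the planner can now search email_idx

print(plan_before)
print(plan_after)
```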

Index Review

As we discussed, an index is a data structure that can perform quick lookups. By indexing a column, we create a new data structure, usually a B-tree, where the values in the indexed column are kept in sorted order to keep lookups fast.

In terms of Big-O complexity, a tree-based index like this ensures that lookups are O(log(n)).

Shouldn't we index everything? We can make the database ultra-fast!

While indexes make specific kinds of lookups much faster, they also add performance overhead - they can slow down a database in other ways.

Think about it: if you index every column, you could have hundreds of trees to store and maintain. That needlessly bloats the memory and disk usage of your database. It also means that each time you insert a record, that record needs to be added to many trees - slowing down your insert speed.

The rule of thumb is simple:

Add an index to columns you know you'll be doing frequent lookups on. Leave everything else un-indexed. You can always add indexes later.

Multi-column indexes

Multi-column indexes are useful for the exact reason you might think - they speed up lookups that depend on multiple columns.

`CREATE INDEX`

CREATE INDEX first_name_last_name_age_idx
ON users (first_name, last_name, age);

A multi-column index is sorted by the first column first, the second column next, and so forth. A lookup on only the first column in a multi-column index gets almost all of the performance improvements that it would get from its own single-column index. But lookups on only the second or third column will have very degraded performance.
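
The effect of column order shows up in the query plan. This is a sketch under assumptions: the users table and its columns are invented, and the plans are what SQLite's planner picks for a table with no collected statistics, so other databases (or SQLite after running ANALYZE) may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        age INTEGER,
        email TEXT
    )
    """
)
conn.execute(
    "CREATE INDEX first_name_last_name_age_idx "
    "ON users (first_name, last_name, age)"
)


def query_plan(sql):
    """Return the human-readable plan text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Filtering on the leading column can use the multi-column index.
plan_first_column = query_plan("SELECT * FROM users WHERE first_name = 'Lane'")

# Filtering only on the second column cannot, so the table is scanned.
plan_second_column = query_plan("SELECT * FROM users WHERE last_name = 'Wagner'")

print(plan_first_column)
print(plan_second_column)
```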

Rule of thumb

Unless you have specific reasons to do something special, only add multi-column indexes if you're doing frequent lookups on a specific combination of columns.

Denormalizing for speed

I left you with a cliffhanger in the "normalization" chapter. As it turns out, data integrity and deduplication come at a cost, and that cost is usually speed.

Joining tables together, using subqueries, performing aggregations, and running post-hoc calculations all take time. At very large scales these advanced techniques can actually take a huge performance toll on an application - sometimes grinding the database server to a halt.

Storing duplicate information can drastically speed up an application that needs to look it up in different ways. For example, if you store a user's country information right on their user record, no expensive join is required to load their profile page.

That said, denormalize at your own risk. Denormalizing a database incurs a large risk of inaccurate and buggy data.
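
Here is a small JavaScript sketch of that trade-off, using in-memory objects in place of tables. Everything in it (the sample data and the `loadProfileNormalized` and `loadProfileDenormalized` functions) is hypothetical and only meant to show the shape of the problem.

```javascript
// Normalized: the user row stores only a country_id.
const countries = new Map([[1, { id: 1, name: 'Canada' }]]);
const usersNormalized = [{ id: 7, name: 'Ada', country_id: 1 }];

// Loading a profile needs a second lookup (the "join").
function loadProfileNormalized(userId) {
  const user = usersNormalized.find((u) => u.id === userId);
  const country = countries.get(user.country_id);
  return { name: user.name, country: country.name };
}

// Denormalized: the country name is copied onto the user row,
// so one lookup is enough.
const usersDenormalized = [{ id: 7, name: 'Ada', country: 'Canada' }];

function loadProfileDenormalized(userId) {
  const user = usersDenormalized.find((u) => u.id === userId);
  return { name: user.name, country: user.country };
}

// The cost: a change to the country must now reach every copy.
countries.get(1).name = 'Kanada';

console.log(loadProfileNormalized(7).country);   // Kanada
console.log(loadProfileDenormalized(7).country); // Canada (stale copy)
```

The denormalized read skips the join, but the copied value goes stale the moment the source changes unless the application remembers to update it everywhere.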

In my opinion, it should be used as a kind of "last resort" in the name of speed.

SQL Injection

SQL injection is a very common way hackers attempt to cause damage or breach a database. One of my favorite XKCD comics of all time demonstrates the problem:

The joke here is that if someone was using this query:

INSERT INTO students(name) VALUES (?);

And the "name" of a student was 'Robert'); DROP TABLE students;-- then the resulting SQL query would look like this:

INSERT INTO students(name) VALUES ('Robert'); DROP TABLE students;--)

As you can see, this is actually 2 queries! The first one inserts "Robert" into the database, and the second one deletes the students table!
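
To see how this happens in application code, here is a minimal JavaScript sketch that only builds strings and never touches a database. The `unsafeInsert` and `parameterizedInsert` helpers are hypothetical, and the sketch assumes the vulnerable code wraps the value in quotes itself.

```javascript
// UNSAFE: pastes the user-supplied value straight into the SQL text.
function unsafeInsert(name) {
  return `INSERT INTO students(name) VALUES ('${name}');`;
}

// Safer shape: the SQL text never changes, and the value travels
// separately so the database library can bind it as data.
function parameterizedInsert(name) {
  return { text: 'INSERT INTO students(name) VALUES (?);', params: [name] };
}

const malicious = "Robert'); DROP TABLE students;--";

const sql = unsafeInsert(malicious);
console.log(sql);
// INSERT INTO students(name) VALUES ('Robert'); DROP TABLE students;--');

const safe = parameterizedInsert(malicious);
console.log(safe.text);   // INSERT INTO students(name) VALUES (?);
console.log(safe.params); // the whole string is passed as a single value
```

In the unsafe version the input closes the string early and smuggles in a second statement. In the parameterized version the query text is fixed, so the input can only ever be treated as a name.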

How do we protect against SQL injection?

You need to be aware of SQL injection attacks, but to be honest the solution these days is to simply use a modern SQL library that sanitizes SQL inputs. We don't often need to sanitize inputs by hand at the application level anymore.

For example, the Go standard library's SQL package automatically protects your inputs against SQL injection attacks if you use it properly. In short, don't interpolate user input into raw strings yourself - make sure your database library has a way to sanitize inputs, and pass it those raw values.

Congratulations on making it to the end!

If you're interested in doing the interactive coding assignments and quizzes for this course, you can check out the Learn SQL course over on Boot.dev.

This course is a part of my full back-end developer career path, made up of other courses and projects if you're interested in checking those out.

If you want to see the other content I'm creating related to web development, check out some of my links below:

Lane's Podcast: Backend Banter
Lane on Twitter
Lane on YouTube

How to Perform CRUD Operations – JavaScript and SQL Example

David Clinton — Thu, 03 Aug 2023 20:41:40 +0000

For the most part, interactive website architectures will involve generating or dispensing data of one sort or another. You can certainly use HTML forms to collect user input. But the kind of web form that's described here will only take you so far.

What we really need is a way to reliably store and manipulate our data within the application environment.

In this article, I'm going to show you how to connect a back end database to your data collection process. The plan involves tossing some HTML, JavaScript, and the tiny database engine SQLite into a bowl, mixing vigorously, and seeing what comes out.

This article comes from my Complete LPI Web Development Essentials Study Guide course. If you'd like, you can follow the video version here:

As you may already know, the SQL in SQLite stands for structured query language. This means that the syntax you'll use for interacting with a SQLite database will closely parallel how you'd do it with databases like MariaDB, Amazon Aurora, Oracle, or Microsoft's SQL Server. If you've got experience with any of those, you'll be right at home here.

Why are we going to use SQLite here? Because it's a very popular choice for the kind of work you're likely to undertake in a web environment.

You'll need to create a new directory on your machine along with some files with JavaScript code. We'll learn how to create, modify, and delete records in a SQLite database.

I could incorporate all those actions into a single file, of course, but I think breaking them out into multiple files will make it easier to understand what's going on.

Connecting to a Database and Creating a Table

Here's what the first file will look like:

const sqlite3 = require('sqlite3').verbose();

// Create/connect to the database
const db = new sqlite3.Database('mydatabase.db');

// Create a table
db.run(`CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER
)`);

// Insert data
const insertQuery = `INSERT INTO users (name, age) VALUES (?, ?)`;
const name = 'Trevor';
const age = 25;
db.run(insertQuery, [name, age], function (err) {
    if (err) {
        console.error(err.message);
    } else {
        console.log(`Inserted data with id ${this.lastID}`);
    }
});

// Close the database connection
db.close();

We begin by loading the sqlite3 module as sqlite3 and then creating the db variable to represent our new database instance. The database will be called mydatabase.db.

const sqlite3 = require('sqlite3').verbose();
const db = new sqlite3.Database('mydatabase.db');

If there isn't a database using that name in our local directory, the code will create one, otherwise it'll just connect to the one that's there already.

Since this is our first run, I'll create a new table within the mydatabase.db database. There will be three columns in our table: id, name, and age.

db.run(`CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER
)`);

As you can see, id will be the primary key that we'll use to reference individual records.

We defined the data type of each column: integer, text and, again, integer. This definition is something we only need to do once. But we do want to get it right, because changing it later, after we've already added data, can be tricky.

Inserting New Data into a Table

In this section, we'll add a new record to the table using the SQL INSERT command.

const insertQuery = `INSERT INTO users (name, age) VALUES (?, ?)`;
const name = 'Trevor';
const age = 25;
db.run(insertQuery, [name, age], function (err) {
    if (err) {
        console.error(err.message);
    } else {
        console.log(`Inserted data with id ${this.lastID}`);
    }
});

You'll probably discover that official SQL documentation always capitalizes key syntax terms like INSERT and SELECT. That's a useful best practice, but it's not actually necessary. As a rule, I'm way too lazy to bother.

The query itself is templated as insertQuery, with the name and age details added as constants in the lines that follow.

The db.run method is then executed with the insertQuery constant and those two values (name and age) as arguments. Based on the success or failure of the operation, log messages will be generated.

But hang on for a moment. What's with those question marks after declaring insertQuery? And why did we need to break this process into two parts?

This is actually an important security practice. The question marks are placeholders in a parameterized query, and the real values are passed in separately. With this in place, when the db.run() method executes the prepared statement, it'll bind the values as data rather than as part of the SQL text, preventing SQL injection.

Lastly, we close down the connection:

db.close();

Modifying Data

Now let's see how the "modify" code works. Like before, we create a SQLite3 constant and then connect to our database.

This time, however, our table already exists, so we can go straight to the "modify" section.

const sqlite3 = require('sqlite3').verbose();

// Create/connect to the database
const db = new sqlite3.Database('mydatabase.db');

// Modify data
const updateQuery = `UPDATE users SET age = ? WHERE name = ?`;
const updatedAge = 30;
const updatedName = 'Trevor';
db.run(updateQuery, [updatedAge, updatedName], function (err) {
    if (err) {
        console.error(err.message);
    } else {
        console.log(`Modified ${this.changes} row(s)`);
    }
});

// Close the database connection
db.close();

The pattern is similar. We define an updateQuery constant that holds an UPDATE statement. This operation will change the age value for an entry whose name equals Trevor.

You may recall that Trevor's age was earlier listed as 25. We're going to update that to 30. Everything else will work the same as before, including closing the connection when we're done.

This section of code from the third file will delete a record:

const deleteQuery = `DELETE FROM users WHERE name = ?`;
const deletedName = 'Trevor';
db.run(deleteQuery, [deletedName], function (err) {
    if (err) {
        console.error(err.message);
    } else {
        console.log(`Deleted ${this.changes} row(s)`);
    }
});

The code above will delete the record where the name equals Trevor.

You can run any of those files using the node command. But you should first make sure that you've installed the sqlite3 module:

$ npm install sqlite3

Next I'll use node to run the first file (that you could choose to call db.js).

$ node db.js
Inserted data with id 1

We'll see that a new record has been successfully inserted. If you list the directory contents, you'll also see that a new mydatabase.db file has been created.

You can always manually log into sqlite3 to see how things might have changed. I'll reference the mydatabase.db file so we can open it up right away.

$ sqlite3 mydatabase.db

Typing .tables within the SQLite interface will list all the existing tables in this database. In our case, it'll be the users table we created.

sqlite> .tables
users
sqlite>

Now I'll use the SQL SELECT command to display a record. Here I'll use the asterisk to represent all columns and specify the users table.

sqlite> SELECT * FROM users;
1|Trevor|25
sqlite>

We can see that record 1 containing Trevor who is 25 years old has been created. Great!

Finally, we can run the delete code which should remove Trevor altogether:

const deleteQuery = `DELETE FROM users WHERE name = ?`;
const deletedName = 'Trevor';
db.run(deleteQuery, [deletedName], function (err) {
    if (err) {
        console.error(err.message);
    } else {
        console.log(`Deleted ${this.changes} row(s)`);
    }
});

I should note that the db.run and db.close format I used for those methods can also be referred to as Database.run() and Database.close(). It's just a matter of preference - or, in my case, laziness. I'm a Linux admin, after all, and the very best admins are, in principle, lazy.

Summary

We've seen how to use JavaScript to connect to a back end database, create a new table, and then add, modify, and delete records in that table. And we seem to have gotten away with it, too!

Now try this on your own computer. But play around with the values. Even better: build something practical.

This article comes from my Complete LPI Web Development Essentials Study Guide course. And there's much more technology goodness available at bootstrap-it.com

How to Install MySQL and MySQL Workbench on Windows

Md. Fahim Bin Amin — Fri, 30 Jun 2023 18:23:15 +0000

If you want to learn MySQL, starting with a good client is super helpful – especially when you are just beginning your journey.

There are a lot of clients out there for your MySQL-based needs, like XAMPP, DataGrip, and others. Among all of them, I prefer the MySQL Workbench. It is completely free, by the way.

In this tutorial, I will show you how to install and configure MySQL and MySQL Workbench on your Windows machine from scratch.

If you enjoy learning from videos as well, then don't worry as I have also created a step-by-step video just for you:

How to Install MySQL Workbench

➡️ Download MySQL Workbench

Make sure to visit only the official website for downloading the MySQL Workbench. You do not want to get into shoddy websites and download the wrong file that infects your favorite machine, right?

Find the official website for MySQL Workbench: https://www.mysql.com/products/workbench/

Now click on the "DOWNLOADS" tab.

Scroll down until you find MySQL Community (GPL) Downloads ».

Click on MySQL Community (GPL) Downloads ». After that, on the new page, click "MySQL installer for Windows".

From the dropdown menu, select your operating system as "Microsoft Windows". Then download the file which is larger in size.

A .msi file will be downloaded. That is our installer file to install MySQL and MySQL workbench.

➡️ Install both MySQL and MySQL Workbench

Simply double click on the installer file. It will reload the necessary components and open the installer GUI selection window. Choose the setup type as custom and click "Next".

A new page will appear. Make sure to select the latest "MySQL Server", "MySQL Workbench", and "MySQL Shell". Select each product and click the right-pointing arrow to move it into the "Products to be installed" section. Then click "Next".

Click "Execute" to install the three necessary components. The process might take some time depending on your internet speed and computer configuration. After it gets finished, simply click "Next".

In the Product Configuration window, simply click "Next". It will install the three selected components for us.

Keep everything as it is and simply click "Next". It will configure the MySQL Server.

Keep everything as it is and simply click "Next". It will apply the TCP/IP connectivity for our MySQL server.

Now give it a Root password. For testing purposes, I am using a very simple "1111" as my password, but I would recommend not doing the same. Also, make sure to remember the password as you will need it when you want to work in MySQL Workbench. Click "Next".

Keep everything as it is, and simply click "Next". It will set up our root password for the MySQL Workbench.

We want to run the service as a Standard System Account for our operating system. Therefore, keep everything as it is, and simply click "Next".

Select the option to grant full access to the user running the Windows Service and then click "Next."

Then click "Execute". This grants full access only to the user running the Windows service and to the administrators group. Other users and groups will not have access.

So if you have multiple user accounts on your computer, they will not be able to access the MySQL server or Workbench. If you want, you can change the settings here based on your needs.

As I have only one user account in my Windows machine, I can safely keep the first option selected.

It might take some time. When you see a green check mark on every configuration step, simply click "Finish".

The configuration has been applied successfully. Simply click "Next".

Click "Finish" to complete the installation.

It will open the MySQL Workbench and MySQL Shell. Simply close all of them now.

➡️ Configuration

Now we need to configure the path variables for our operating system. Go to the drive where you have installed your Windows operating system. Like most people, I have installed my operating system on the "C" drive.

Therefore, I am going to the "C" drive and opening the "Program Files" directory.

Go to the "MySQL" folder.

Then go to the MySQL Server folder.

Go to the "bin" folder.

Copy the path/address.

Now open the Environment Variables settings. Simply click on the Windows button and type "env".

Click "Environment Variables".

Select the "Path" and click "Edit".

Click "New". A new blank box will appear. Paste the path/address that you copied earlier. Do not close the window now as we need to do the same thing for the MySQL Shell folder.

Now, we need to do the same thing for the MySQL Shell also. Open the MySQL Shell folder now.

Go to the "bin" folder.

Copy the path/directory.

Now apply the same process as you did earlier. Click "New" on the Edit environment variable window. Paste the path/directory in the new blank box.

Now click "OK".

Click "OK" again.

And click "OK" one more time.

➡️ Finishing Up

Our task is now finished. You can now open the MySQL Workbench.

Simply click on the Local instance. It will ask for the root password. Enter the password. If you do not want the hassle of entering a password every time, check the "Save password in vault" box. Click "OK".

This is your default MySQL Workbench workspace.

If you want, you can also hide the SQL Additions tab by clicking on the colored box.

To see the schemas, click on the "Schemas" tab in the navigator.

Your MySQL Workbench is now ready for any kind of development work. You can also use MySQL from your terminal.

Conclusion

Thank you for reading the entire article.

If you have any questions, feel free to reach out to me using Twitter or LinkedIn.

Also, make sure to follow me on GitHub!

You can also subscribe to my YouTube channel for more helpful video content.

If you are interested then you can also check my website: https://fahimbinamin.com/

Have a great day! 😊

Cover: Photo by Boitumelo Phetla on Unsplash

