<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ NoSQL - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ NoSQL - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Tue, 09 Jun 2026 10:25:38 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/nosql/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Store Data Locally with Isar in Flutter ]]>
                </title>
                <description>
                    <![CDATA[ When building Flutter applications, managing local data efficiently is critical. You want a database that is lightweight, fast, and easy to integrate, especially if your app will work offline. Isar is one such database. It is a high-performance, easy... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/store-data-locally-with-isar-in-flutter/</link>
                <guid isPermaLink="false">68cd561cebc0d959d789d679</guid>
                
                    <category>
                        <![CDATA[ Flutter ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Dart ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Atuoha Anthony ]]>
                </dc:creator>
                <pubDate>Fri, 19 Sep 2025 13:09:48 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1758287132737/7886bedc-374f-401d-b59c-04c59590e81f.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When building Flutter applications, managing local data efficiently is critical. You want a database that is lightweight, fast, and easy to integrate, especially if your app will work offline. Isar is one such database. It is a high-performance, easy-to-use NoSQL embedded database tailored for Flutter. With features like reactive queries, indexes, relationships, migrations, and transactions, Isar makes local data persistence both powerful and developer-friendly.</p>
<p>In this article, you’ll learn how to integrate Isar into a Flutter project, set up a data model, and perform the full range of CRUD (Create, Read, Update, Delete) operations. To make this practical, you’ll build a simple to-do app that allows users to create, view, update, and delete tasks.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-we-are-building">What We Are Building</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-set-up-isar-in-a-flutter-project">How to Set Up Isar in a Flutter Project</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-create-the-task-model">How to Create the Task Model</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-build-the-repository-for-crud-operations">How to Build the Repository for CRUD Operations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-integrate-crud-into-the-flutter-ui">How to Integrate CRUD into the Flutter UI</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-beyond-crud-advanced-features-of-isar">Beyond CRUD: Advanced Features of Isar</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, ensure you have the following:</p>
<ol>
<li><p><strong>Flutter SDK</strong> installed (version 3.0 or above recommended).<br> Check your version with:</p>
<pre><code class="lang-bash"> flutter --version
</code></pre>
</li>
<li><p><strong>Dart knowledge</strong>: Familiarity with Dart syntax, classes, and async programming.</p>
</li>
<li><p><strong>Flutter basics</strong>: You should know how to set up a Flutter project, build widgets, and use <code>FutureBuilder</code> or <code>setState</code> for state management.</p>
</li>
<li><p><strong>Code editor</strong>: VS Code or Android Studio is recommended.</p>
</li>
</ol>
<p>If these are in place, we are ready to begin.</p>
<h2 id="heading-what-we-are-building">What We Are Building</h2>
<p>We will create a Task Manager App that lets users:</p>
<ul>
<li><p>Add new tasks.</p>
</li>
<li><p>View all tasks in a list.</p>
</li>
<li><p>Update existing tasks.</p>
</li>
<li><p>Delete tasks.</p>
</li>
</ul>
<p>By the end, you will have a fully functioning CRUD app built with Flutter and Isar.</p>
<h2 id="heading-how-to-set-up-isar-in-a-flutter-project">How to Set Up Isar in a Flutter Project</h2>
<h3 id="heading-step-1-add-dependencies">Step 1: Add dependencies</h3>
<p>Open your <code>pubspec.yaml</code> file and add the following:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">dependencies:</span>
  <span class="hljs-attr">flutter:</span>
    <span class="hljs-attr">sdk:</span> <span class="hljs-string">flutter</span>
  <span class="hljs-attr">isar:</span> <span class="hljs-string">^3.1.0</span>
  <span class="hljs-attr">isar_flutter_libs:</span> <span class="hljs-string">^3.1.0</span>
  <span class="hljs-attr">path_provider:</span> <span class="hljs-string">^2.0.0</span>

<span class="hljs-attr">dev_dependencies:</span>
  <span class="hljs-attr">isar_generator:</span> <span class="hljs-string">^3.1.0</span>
  <span class="hljs-attr">build_runner:</span> <span class="hljs-string">any</span>
</code></pre>
<ul>
<li><p><code>isar</code>: The core Isar package.</p>
</li>
<li><p><code>isar_flutter_libs</code>: Bundles the native Isar binaries that Flutter apps need.</p>
</li>
<li><p><code>path_provider</code>: Locates the directory where the database file is stored (imported in Step 2).</p>
</li>
<li><p><code>isar_generator</code>: Used to generate code for your models.</p>
</li>
<li><p><code>build_runner</code>: Runs the code generator.</p>
</li>
</ul>
<p>Run:</p>
<pre><code class="lang-bash">flutter pub get
</code></pre>
<h3 id="heading-step-2-create-and-initialize-isar">Step 2: Create and initialize Isar</h3>
<p>Create a file named <code>isar_setup.dart</code>. This will handle the opening of the Isar database.</p>
<pre><code class="lang-dart"><span class="hljs-keyword">import</span> <span class="hljs-string">'package:isar/isar.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'package:path_provider/path_provider.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'task.dart'</span>; <span class="hljs-comment">// we will create this model soon</span>

<span class="hljs-keyword">late</span> <span class="hljs-keyword">final</span> Isar isar;

Future&lt;<span class="hljs-keyword">void</span>&gt; initializeIsar() <span class="hljs-keyword">async</span> {
  <span class="hljs-keyword">final</span> dir = <span class="hljs-keyword">await</span> getApplicationDocumentsDirectory();
  isar = <span class="hljs-keyword">await</span> Isar.open(
    [TaskSchema],
    directory: dir.path,
  );
}
</code></pre>
<p><strong>Explanation</strong>:</p>
<ul>
<li><p><code>getApplicationDocumentsDirectory()</code> provides a storage location for the database file.</p>
</li>
<li><p><code>Isar.open()</code> initializes the database and registers our <code>Task</code> schema.</p>
</li>
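<li><p><code>late final Isar isar;</code> declares a top-level variable that holds the database instance. You can read it from anywhere once <code>initializeIsar()</code> has completed; reading it earlier throws a <code>LateInitializationError</code>.</p>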
</li>
</ul>
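<p>If initialization can run more than once, it may help to reuse an instance that is already open instead of opening it again. Here is a minimal sketch of such a guard, assuming Isar 3.x and the <code>TaskSchema</code> generated from the model in the next section. The <code>openIsar</code> name is hypothetical and not part of the original setup:</p>
<pre><code class="lang-dart">import 'package:isar/isar.dart';
import 'package:path_provider/path_provider.dart';
import 'task.dart'; // assumed: the Task model with its generated TaskSchema

/// Hypothetical helper: returns the default instance if one is already open,
/// otherwise opens the database in the app's documents directory.
Future&lt;Isar&gt; openIsar() async {
  final existing = Isar.getInstance(); // null when nothing is open yet
  if (existing != null) {
    return existing;
  }
  final dir = await getApplicationDocumentsDirectory();
  return Isar.open(
    [TaskSchema],
    directory: dir.path,
  );
}
</code></pre>
<p>Returning the instance, rather than assigning a <code>late final</code> global, means a second call cannot fail on reassignment.</p>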
<h2 id="heading-how-to-create-the-task-model">How to Create the Task Model</h2>
<p>Now let’s define our data model for tasks. Create a file named <code>task.dart</code>.</p>
<pre><code class="lang-dart"><span class="hljs-keyword">import</span> <span class="hljs-string">'package:isar/isar.dart'</span>;

<span class="hljs-keyword">part</span> <span class="hljs-string">'task.g.dart'</span>;

<span class="hljs-meta">@Collection</span>()
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Task</span> </span>{
  Id id = Isar.autoIncrement; <span class="hljs-comment">// auto-incrementing primary key</span>

  <span class="hljs-keyword">late</span> <span class="hljs-built_in">String</span> name;

  <span class="hljs-keyword">late</span> <span class="hljs-built_in">DateTime</span> createdAt;

  Task(<span class="hljs-keyword">this</span>.name) : createdAt = <span class="hljs-built_in">DateTime</span>.now();
}
</code></pre>
<p><strong>Explanation</strong>:</p>
<ul>
<li><p><code>@Collection()</code> tells Isar this class represents a database collection.</p>
</li>
<li><p><code>Id id = Isar.autoIncrement;</code> creates a unique identifier automatically.</p>
</li>
<li><p><code>late String name;</code> stores the task name.</p>
</li>
<li><p><code>late DateTime createdAt;</code> stores the creation timestamp.</p>
</li>
<li><p><code>part 'task.g.dart';</code> links to the generated code, which will be created after running the code generator.</p>
</li>
</ul>
<p>Generate the code with:</p>
<pre><code class="lang-bash">dart run build_runner build
</code></pre>
<p>This generates <code>task.g.dart</code>, which contains the necessary schema code.</p>
<h2 id="heading-how-to-build-the-repository-for-crud-operations">How to Build the Repository for CRUD Operations</h2>
<p>Create a new file called <code>task_repository.dart</code>. This will house the methods for interacting with the database.</p>
<pre><code class="lang-dart"><span class="hljs-keyword">import</span> <span class="hljs-string">'package:isar/isar.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'task.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'isar_setup.dart'</span>;

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TaskRepository</span> </span>{
  Future&lt;<span class="hljs-keyword">void</span>&gt; addTask(<span class="hljs-built_in">String</span> name) <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">final</span> task = Task(name);
    <span class="hljs-keyword">await</span> isar.writeTxn(() <span class="hljs-keyword">async</span> {
      <span class="hljs-keyword">await</span> isar.tasks.put(task);
    });
  }

  Future&lt;<span class="hljs-built_in">List</span>&lt;Task&gt;&gt; getAllTasks() <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> isar.tasks.where().findAll();
  }

  Future&lt;<span class="hljs-keyword">void</span>&gt; updateTask(Task task) <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">await</span> isar.writeTxn(() <span class="hljs-keyword">async</span> {
      <span class="hljs-keyword">await</span> isar.tasks.put(task);
    });
  }

  Future&lt;<span class="hljs-keyword">void</span>&gt; deleteTask(Task task) <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">await</span> isar.writeTxn(() <span class="hljs-keyword">async</span> {
      <span class="hljs-keyword">await</span> isar.tasks.delete(task.id);
    });
  }
}
</code></pre>
<p><strong>Explanation</strong>:</p>
<ul>
<li><p><code>addTask</code>: Creates a new task and saves it.</p>
</li>
<li><p><code>getAllTasks</code>: Reads all tasks from the database.</p>
</li>
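<li><p><code>updateTask</code>: Updates an existing task by calling <code>.put()</code> again. Because the task already has an <code>id</code>, <code>.put()</code> overwrites the stored record instead of inserting a new one.</p>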
</li>
<li><p><code>deleteTask</code>: Removes a task by its <code>id</code>.</p>
</li>
<li><p><code>isar.writeTxn</code>: Ensures operations run inside a transaction for safety and consistency.</p>
</li>
</ul>
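<p>To make the update path more concrete, here is a sketch of a rename helper that reads a task by <code>id</code>, changes it, and writes it back in the same transaction. It assumes the <code>Task</code> model and the global <code>isar</code> instance defined earlier. The <code>TaskRenaming</code> extension and <code>renameTask</code> method are hypothetical names added for illustration:</p>
<pre><code class="lang-dart">import 'package:isar/isar.dart';
import 'isar_setup.dart'; // assumed: exposes the global `isar` instance
import 'task.dart';
import 'task_repository.dart';

/// Hypothetical extension, not part of the original repository.
extension TaskRenaming on TaskRepository {
  /// Renames the task with the given id.
  /// Returns false when no task with that id exists.
  Future&lt;bool&gt; renameTask(Id id, String newName) {
    return isar.writeTxn(() async {
      final task = await isar.tasks.get(id); // null if the id is unknown
      if (task == null) {
        return false;
      }
      task.name = newName;
      await isar.tasks.put(task); // same id, so the stored record is replaced
      return true;
    });
  }
}
</code></pre>
<p>Doing the read and the write inside one <code>writeTxn</code> keeps the two steps together, so the task cannot change between them.</p>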
<h2 id="heading-how-to-integrate-crud-into-the-flutter-ui">How to Integrate CRUD into the Flutter UI</h2>
<p>Now, let’s connect everything inside <code>main.dart</code>.</p>
<pre><code class="lang-dart"><span class="hljs-keyword">import</span> <span class="hljs-string">'package:flutter/material.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'isar_setup.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'task_repository.dart'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'task.dart'</span>;

<span class="hljs-keyword">void</span> main() <span class="hljs-keyword">async</span> {
  WidgetsFlutterBinding.ensureInitialized();
  <span class="hljs-keyword">await</span> initializeIsar(); <span class="hljs-comment">// initialize Isar before runApp</span>
  runApp(MyApp());
}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyApp</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">StatelessWidget</span> </span>{
  <span class="hljs-meta">@override</span>
  Widget build(BuildContext context) {
    <span class="hljs-keyword">return</span> MaterialApp(
      home: TaskListScreen(),
    );
  }
}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TaskListScreen</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">StatefulWidget</span> </span>{
  <span class="hljs-meta">@override</span>
  _TaskListScreenState createState() =&gt; _TaskListScreenState();
}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">_TaskListScreenState</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">State</span>&lt;<span class="hljs-title">TaskListScreen</span>&gt; </span>{
  <span class="hljs-keyword">final</span> TaskRepository _taskRepository = TaskRepository();
  <span class="hljs-keyword">late</span> Future&lt;<span class="hljs-built_in">List</span>&lt;Task&gt;&gt; _tasksFuture;

  <span class="hljs-meta">@override</span>
  <span class="hljs-keyword">void</span> initState() {
    <span class="hljs-keyword">super</span>.initState();
    _tasksFuture = _taskRepository.getAllTasks();
  }

  Future&lt;<span class="hljs-keyword">void</span>&gt; _addTask() <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">await</span> _taskRepository.addTask(<span class="hljs-string">'New Task'</span>);
    setState(() {
      _tasksFuture = _taskRepository.getAllTasks();
    });
  }

  Future&lt;<span class="hljs-keyword">void</span>&gt; _deleteTask(Task task) <span class="hljs-keyword">async</span> {
    <span class="hljs-keyword">await</span> _taskRepository.deleteTask(task);
    setState(() {
      _tasksFuture = _taskRepository.getAllTasks();
    });
  }

  <span class="hljs-meta">@override</span>
  Widget build(BuildContext context) {
    <span class="hljs-keyword">return</span> Scaffold(
      appBar: AppBar(title: Text(<span class="hljs-string">'Isar CRUD Example'</span>)),
      body: FutureBuilder&lt;<span class="hljs-built_in">List</span>&lt;Task&gt;&gt;(
        future: _tasksFuture,
        builder: (context, snapshot) {
          <span class="hljs-keyword">if</span> (snapshot.connectionState == ConnectionState.waiting) {
            <span class="hljs-keyword">return</span> Center(child: CircularProgressIndicator());
          } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (snapshot.hasError) {
            <span class="hljs-keyword">return</span> Center(child: Text(<span class="hljs-string">'Error: <span class="hljs-subst">${snapshot.error}</span>'</span>));
          } <span class="hljs-keyword">else</span> {
            <span class="hljs-keyword">final</span> tasks = snapshot.data ?? [];
            <span class="hljs-keyword">if</span> (tasks.isEmpty) {
              <span class="hljs-keyword">return</span> Center(child: Text(<span class="hljs-string">'No tasks yet.'</span>));
            }
            <span class="hljs-keyword">return</span> ListView.builder(
              itemCount: tasks.length,
              itemBuilder: (context, index) {
                <span class="hljs-keyword">final</span> task = tasks[index];
                <span class="hljs-keyword">return</span> ListTile(
                  title: Text(task.name),
                  subtitle: Text(<span class="hljs-string">'Created at: <span class="hljs-subst">${task.createdAt}</span>'</span>),
                  trailing: IconButton(
                    icon: Icon(Icons.delete),
                    onPressed: () =&gt; _deleteTask(task),
                  ),
                );
              },
            );
          }
        },
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _addTask,
        child: Icon(Icons.add),
      ),
    );
  }
}
</code></pre>
<p><strong>Explanation</strong>:</p>
<ul>
<li><p><code>initializeIsar()</code>: Ensures the database is ready before the app runs.</p>
</li>
<li><p><code>_tasksFuture</code>: Holds a future of the list of tasks.</p>
</li>
<li><p><code>_addTask</code>: Adds a new task and refreshes the list.</p>
</li>
<li><p><code>_deleteTask</code>: Deletes a task and refreshes the list.</p>
</li>
<li><p><code>FutureBuilder</code>: Automatically rebuilds the UI when the future completes.</p>
</li>
<li><p><code>ListView.builder</code>: Displays all tasks dynamically.</p>
</li>
</ul>
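<p>This gives you a simple CRUD app using Isar. Note that the screen wires up create, read, and delete. The repository's <code>updateTask</code> method is available but is not yet called from the UI.</p>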
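<p>One way to bring <code>updateTask</code> into the UI is a small rename dialog. The following is a sketch under the assumption that the <code>Task</code> model and <code>TaskRepository</code> are defined as above. The <code>promptAndRenameTask</code> function is a hypothetical helper, not part of the original app:</p>
<pre><code class="lang-dart">import 'package:flutter/material.dart';
import 'task.dart';
import 'task_repository.dart';

/// Hypothetical helper: asks for a new name in a dialog and saves it
/// through TaskRepository.updateTask. Returns true if the task was changed.
Future&lt;bool&gt; promptAndRenameTask(
  BuildContext context,
  TaskRepository repository,
  Task task,
) async {
  var draft = task.name; // tracks what the user has typed so far
  final newName = await showDialog&lt;String&gt;(
    context: context,
    builder: (dialogContext) =&gt; AlertDialog(
      title: const Text('Rename task'),
      content: TextFormField(
        initialValue: task.name,
        autofocus: true,
        onChanged: (value) =&gt; draft = value,
      ),
      actions: [
        TextButton(
          onPressed: () =&gt; Navigator.pop(dialogContext), // cancel: returns null
          child: const Text('Cancel'),
        ),
        TextButton(
          onPressed: () =&gt; Navigator.pop(dialogContext, draft.trim()),
          child: const Text('Save'),
        ),
      ],
    ),
  );
  if (newName == null || newName.isEmpty) {
    return false; // cancelled or left blank: nothing to save
  }
  task.name = newName;
  await repository.updateTask(task);
  return true;
}
</code></pre>
<p>You could call this from a <code>ListTile</code>'s <code>onTap</code> and, when it returns <code>true</code>, reassign <code>_tasksFuture</code> inside <code>setState</code> just as <code>_addTask</code> does.</p>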
<h2 id="heading-beyond-crud-advanced-features-of-isar">Beyond CRUD: Advanced Features of Isar</h2>
<p>Once you are comfortable with CRUD, Isar provides advanced tools to optimize and extend your application:</p>
<ol>
<li><p><strong>Reactive Queries</strong>:<br> Instead of using <code>FutureBuilder</code>, you can listen for changes directly.</p>
<pre><code class="lang-dart"> <span class="hljs-keyword">final</span> stream = isar.tasks.where().watch(fireImmediately: <span class="hljs-keyword">true</span>);
</code></pre>
</li>
<li><p><strong>Indexes</strong>:<br> Improve query performance by indexing fields.</p>
<pre><code class="lang-dart"> <span class="hljs-meta">@Collection</span>()
 <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Task</span> </span>{
   Id id = Isar.autoIncrement;

   <span class="hljs-meta">@Index</span>()
   <span class="hljs-keyword">late</span> <span class="hljs-built_in">String</span> name;
 }
</code></pre>
</li>
<li><p><strong>Relations</strong>:<br> Link one collection to another (for example, <code>Project</code> with many <code>Tasks</code>).</p>
</li>
<li><p><strong>Custom Queries</strong>:<br> Perform complex filtering, sorting, and pagination.</p>
</li>
<li><p><strong>Migrations</strong>:<br> Safely evolve your schema as the app grows.</p>
</li>
<li><p><strong>Batch Operations</strong>:<br> Insert or update many records in one transaction.</p>
</li>
</ol>
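<p>Three of the features above (reactive queries, custom queries, and batch operations) can be sketched in a few lines. This assumes Isar 3.x, the <code>Task</code> model with its generated query helpers, and the global <code>isar</code> instance from <code>isar_setup.dart</code>. The function names are hypothetical:</p>
<pre><code class="lang-dart">import 'package:isar/isar.dart';
import 'isar_setup.dart'; // assumed: exposes the global `isar` instance
import 'task.dart'; // assumed: Task model plus generated query helpers

/// Reactive query: emits the current task list immediately, then again
/// whenever the collection changes.
Stream&lt;List&lt;Task&gt;&gt; watchAllTasks() {
  return isar.tasks.where().watch(fireImmediately: true);
}

/// Custom query: case-insensitive name search, newest first, one page at a time.
Future&lt;List&lt;Task&gt;&gt; searchTasks(
  String text, {
  int page = 0,
  int pageSize = 20,
}) {
  return isar.tasks
      .filter()
      .nameContains(text, caseSensitive: false)
      .sortByCreatedAtDesc()
      .offset(page * pageSize)
      .limit(pageSize)
      .findAll();
}

/// Batch operation: inserts many tasks inside a single write transaction.
Future&lt;void&gt; addTasks(List&lt;String&gt; names) async {
  final tasks = names.map((name) =&gt; Task(name)).toList();
  await isar.writeTxn(() async {
    await isar.tasks.putAll(tasks);
  });
}
</code></pre>
<p>Passing <code>watchAllTasks()</code> to a <code>StreamBuilder</code> in place of the <code>FutureBuilder</code> would remove the need to refresh <code>_tasksFuture</code> by hand after each write.</p>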
<h2 id="heading-conclusion">Conclusion</h2>
<p>We built a simple Flutter to-do app with Isar that supports creating, reading, updating, and deleting tasks. Along the way, we learned how to:</p>
<ol>
<li><p>Add Isar dependencies.</p>
</li>
<li><p>Define a model with annotations.</p>
</li>
<li><p>Generate schema code.</p>
</li>
<li><p>Implement CRUD operations in a repository.</p>
</li>
<li><p>Connect Isar to the Flutter UI.</p>
</li>
</ol>
<p>With its performance, developer-friendly API, and advanced features, Isar is an excellent choice for local persistence in Flutter applications.</p>
<p>For further learning, consult the official docs:</p>
<ol>
<li><p><a target="_blank" href="https://pub.dev/packages/isar">Isar on pub.dev</a></p>
</li>
<li><p><a target="_blank" href="https://isar.dev/">Isar documentation</a></p>
</li>
</ol>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ SQL vs NoSQL: When to Use Which ]]>
                </title>
                <description>
                    <![CDATA[ When should you use a SQL database and when should you use a NoSQL database? We just published a course on the freeCodeCamp.org YouTube channel that will teach you the differences between NoSQL and SQL databases as well as when and why to use each ki... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/sql-vs-nosql-tutorial/</link>
                <guid isPermaLink="false">66b20680125aeccef6f65d12</guid>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ SQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Beau Carnes ]]>
                </dc:creator>
                <pubDate>Wed, 14 Sep 2022 03:38:22 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/09/maxresdefault.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When should you use a SQL database and when should you use a NoSQL database?</p>
<p>We just published a course on the freeCodeCamp.org YouTube channel that will teach you the differences between NoSQL and SQL databases as well as when and why to use each kind of database.</p>
<p>Ania Kubow developed this course. Ania is one of the most popular tutorial creators on the freeCodeCamp.org YouTube channel.</p>
<p>In this course, you will go back to basics and learn exactly what a database is and how it's defined. You will then learn about database design and why it's important, as well as what a database management system (DBMS) is.</p>
<p>You'll then learn about relational databases followed by a SQL crash course. You will learn about non-relational databases and then learn the pros and cons of using relational databases versus non-relational databases. Finally, you will learn some use cases followed by a NoSQL crash course.</p>
<p>Here are the sections in this course:</p>
<ul>
<li>What actually is a database</li>
<li>What is a database management system</li>
<li>Demo: Creating a database</li>
<li>Common Database Models</li>
<li>Relational databases</li>
<li>SQL</li>
<li>Non-relational databases</li>
<li>Pros and Cons: Comparing RDBMS and NoSQL</li>
<li>Wide Column Database</li>
<li>Document Database</li>
<li>Key-Value Database</li>
<li>Multi-Model Databases</li>
<li>Use cases: When to use RDBMS or NoSQL</li>
</ul>
<p>Watch the full course below or <a target="_blank" href="https://youtu.be/FzlpwoeSrE0">on the freeCodeCamp.org YouTube channel</a> (1.5-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/FzlpwoeSrE0" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Start Using MongoDB – Database Setup for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ MongoDB is an increasingly popular open source NoSQL database. And it has many advantages over traditional SQL databases.  It offers high scalability, reliability, and performance even with a huge amount of data.  This article covers the basics that ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-start-using-mongodb/</link>
                <guid isPermaLink="false">66ba61190013ba5d5012bcbf</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MongoDB ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ valentine Gatwiri ]]>
                </dc:creator>
                <pubDate>Mon, 25 Jul 2022 21:42:56 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/07/pexels-tom-fisk-3285715.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>MongoDB is an increasingly popular open source NoSQL database, and for many workloads it offers advantages over traditional SQL databases.</p>
<p>It offers high scalability, reliability, and performance even with a huge amount of data. </p>
<p>This article covers the basics that you need to know to get started with MongoDB and how to use it properly.</p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li>A suitable IDE such as VS Code</li>
<li>A terminal</li>
</ul>
<h3 id="heading-what-youll-learn">What You'll Learn</h3>
<ul>
<li>What is MongoDB?</li>
<li>What is NoSQL?</li>
<li>How to install MongoDB</li>
<li>How to set up MongoDB</li>
<li>How to run MongoDB</li>
</ul>
<h2 id="heading-what-is-a-nosql-database">What is a NoSQL Database?</h2>
<p>A NoSQL database is a non-relational database that does not use the traditional table-based schema of a relational database. </p>
<p>NoSQL databases are often used for big data and real-time web applications. MongoDB is one of the most popular NoSQL databases. It's fast, scalable, and stores data as JSON-like documents (in a binary format called BSON).</p>
<h2 id="heading-why-should-i-use-no-sql">Why Should I Use NoSQL?</h2>
<p>NoSQL databases are powerful tools that can help you work with large amounts of data. They're especially good at handling unstructured data, so they can be a good choice if you're dealing with a lot of data that doesn't fit neatly into the tables of a traditional relational database.</p>
<p>NoSQL databases can also be easier to scale than relational databases, which is important if you're expecting your data to grow over time.</p>
<h2 id="heading-how-to-get-started-with-mongodb-install-guide">How to Get Started with MongoDB – Install Guide</h2>
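<p>Install MongoDB using <a target="_blank" href="https://www.mongodb.com/docs/manual/administration/install-community/">this link</a> or use the instructions below if you are using Ubuntu. Note that the key and repository line below target MongoDB 3.6 on Ubuntu 16.04 (xenial). For a newer release, take both from the linked guide.</p>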
<ul>
<li>Import the public key</li>
</ul>
<pre><code class="lang-bash">sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
</code></pre>
<ul>
<li>Create a list file for Ubuntu </li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse"</span> | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
</code></pre>
<ul>
<li>Run the following command to update:</li>
</ul>
<pre><code class="lang-bash">sudo apt-get update
</code></pre>
<ul>
<li>Install the latest package</li>
</ul>
<pre><code class="lang-bash">sudo apt-get install -y mongodb-org
</code></pre>
<ul>
<li>Then run:</li>
</ul>
<pre><code class="lang-bash">sudo service mongod start
</code></pre>
<h2 id="heading-how-to-create-and-populate-the-mongodb-database">How to Create and Populate the MongoDB Database</h2>
<p>Once you have MongoDB installed, create a data directory where MongoDB will store its data files when you run <code>mongod</code> by hand. By default, this is <code>/data/db</code>, but you can specify a different location with the <code>--dbpath</code> option. Once the directory exists, you can start the MongoDB server by running <code>mongod</code> from the command line.</p>
<p>Make a directory for <code>dbPath</code> with the following command: </p>
<pre><code class="lang-bash">sudo mkdir -p /data/db 
sudo chown -R `id -un` /data/db
</code></pre>
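<p>If the service you started with <code>sudo service mongod start</code> is still running, stop it first, because both servers would try to listen on port 27017. Then run <code>sudo mongod --port 27017</code> or simply <code>mongod</code> (27017 is the default port) in a different terminal:</p>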
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/image-214.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In MongoDB 4.4 and later, the server writes its logs as JSON (known as <code>structured logging</code>), so your output should look like the above. Although the JSON format may initially seem intimidating, it is designed to work with common JSON tools and frameworks.</p>
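<p>Enter the MongoDB shell using the command below. On MongoDB 6.0 and later, the legacy <code>mongo</code> shell is no longer included, so use <code>mongosh</code> instead:</p>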
<pre><code class="lang-bash">mongo
</code></pre>
<p>After running the command above, you will get the output shown below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/07/Screenshot-from-2022-07-24-18-37-20.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-how-to-create-a-new-mongodb-database">How to Create a New MongoDB Database</h2>
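<p>The first step in using MongoDB is switching to a new database with the <code>use</code> command, for example <code>use mydatabase</code>. MongoDB creates the database when you first store data in it. You can then create collections inside this database and populate them. The example below uses a database called <code>record</code>:</p>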
<pre><code> use record
 db.users.insert({<span class="hljs-attr">username</span>: <span class="hljs-string">"myname"</span>, <span class="hljs-attr">password</span>: <span class="hljs-string">"mypassword"</span>})
</code></pre><p>The <code>use record</code> command switches to the <code>record</code> database. The <code>db.users.insert(...)</code> command adds a document to the <code>users</code> collection within the <code>record</code> database.</p>
<p>Below is the output of the commands above:</p>
<pre><code>WriteResult({ <span class="hljs-string">"nInserted"</span> : <span class="hljs-number">1</span> })
</code></pre><p>Run the following command to view the record you created in the previous step:</p>
<pre><code> db.users.find()
</code></pre><p>The <code>db.users.find()</code> command returns every document in the <code>users</code> collection.<br>Your output should look like this:</p>
<pre><code>{ <span class="hljs-string">"_id"</span> : ObjectId(<span class="hljs-string">"62dd6ab4a7d1ab0948574778"</span>), <span class="hljs-string">"username"</span> : <span class="hljs-string">"myname"</span>, <span class="hljs-string">"password"</span> : <span class="hljs-string">"mypassword"</span> }
</code></pre><h2 id="heading-how-to-add-new-records-to-your-database">How to Add New Records to Your Database</h2>
<p>To add new records, do the following:</p>
<pre><code> use record
 db.commerce.save({<span class="hljs-attr">scriptname</span>: <span class="hljs-string">"dygraph.min.js"</span>, <span class="hljs-attr">version</span>: <span class="hljs-string">"2.1.0"</span>})
 db.commerce.save({<span class="hljs-attr">scriptname</span>: <span class="hljs-string">"sortable.min.js"</span>, <span class="hljs-attr">version</span>: <span class="hljs-string">"0.8.0"</span>})
</code></pre><p>We've added two documents to the <code>commerce</code> collection, each with the data specified by its <code>scriptname</code> and <code>version</code> fields.</p>
<p>Each <code>save</code> call should print something like this:</p>
<pre><code>WriteResult({ <span class="hljs-string">"nInserted"</span> : <span class="hljs-number">1</span> })
</code></pre><p>To view all the collections stored in your MongoDB database, run the following commands:</p>
<pre><code> use record
 show collections
</code></pre><p>You should see output similar to this:</p>
<pre><code>commerce
users
</code></pre><h2 id="heading-conclusion">Conclusion</h2>
<p>MongoDB is a powerful database system you can use for a variety of applications. It is easy to set up and use, and its scalability makes it a good choice for large-scale projects. </p>
<p>If you are new to database systems, MongoDB is a good place to start.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Relational VS Nonrelational Databases – the Difference Between a SQL DB and a NoSQL DB ]]>
                </title>
                <description>
                    <![CDATA[ This article is an overview of relational and non-relational databases.  Besides learning the fundamental differences between the two types of databases, you will also learn how to decide which one to use for your next project by going over their str... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/relational-vs-nonrelational-databases-difference-between-sql-db-and-nosql-db/</link>
                <guid isPermaLink="false">66b1e4a798966ccde43c3c59</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ SQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Dionysia Lemonaki ]]>
                </dc:creator>
                <pubDate>Mon, 18 Apr 2022 17:56:26 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/04/valeriia-svitlini-5w0ZbF8P5-4-unsplash.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>This article is an overview of relational and non-relational databases. </p>
<p>Besides learning the fundamental differences between the two types of databases, you will also learn how to decide which one to use for your next project by going over their strengths and weaknesses. </p>
<p>Here is what we'll cover:</p>
<ol>
<li><a class="post-section-overview" href="#definition">Defining a database</a><ol>
<li><a class="post-section-overview" href="#sql">What is SQL?</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#relational">Relational databases</a><ol>
<li><a class="post-section-overview" href="#characteristics">Characteristics</a></li>
<li><a class="post-section-overview" href="#acid">ACID properties</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#non-relational">Non-relational databases</a><ol>
<li><a class="post-section-overview" href="#types">Types</a></li>
<li><a class="post-section-overview" href="#base">BASE properties</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#pick">Relational VS Non-relational databases</a></li>
<li><a class="post-section-overview" href="#extra">Further Learning</a></li>
</ol>
<h2 id="heading-what-is-a-database-a-definition-for-beginners">What Is A Database? A Definition for Beginners <a></a></h2>
<p>When it comes to computing, data are pieces of information that come in different forms. Data can be text, numbers, images, audio snippets, or videos. </p>
<p>Collections of information need to be stored somewhere, processed, and interpreted. </p>
<p>You need a way to effortlessly search, access, extract and retrieve the saved resources whenever you need them. </p>
<p>This allows both computers and humans to analyze the accessed data, perform calculations and comparisons, make logical decisions, and reach a conclusion.</p>
<p>You can store the data in a file of some kind, using a software program like an Excel spreadsheet – and this can get the job done.</p>
<p>But what if there are large amounts of data, and you need to be sure they are accurate? </p>
<p>Or what if you need to retrieve large data sets quickly?  </p>
<p>Or what if the data needs to have a predefined structure that it should adhere to?</p>
<p>Databases are a much more accessible, efficient, and organized way of storing and working with information over a long period of time.</p>
<p>The ability to store data logically and systematically and retrieve it for use at a later date makes databases a critical part of all web applications.</p>
<p>Databases power most applications. They save and store user information such as usernames, email addresses, encrypted passwords, and physical addresses. </p>
<p>They also store user behavior. For example, in an e-commerce store, the database saves and keeps track of the items you have marked as 'favorites'.</p>
<p>You'll need a <strong>Database Management System</strong> (or DBMS for short) to manage your databases.</p>
<p>A Database Management System is a software program that serves as an intermediary between end-users and the database itself.</p>
<p>It allows its users to create and manage databases. It also allows them to access, modify, and manipulate the data stored in the database by performing operations known as queries. </p>
<p>Users can easily store, retrieve, update, and delete data with the help of a few commands.</p>
<p>When it comes to Database Management Systems, there are generally <strong>two</strong> types to choose from:</p>
<ul>
<li><strong>Relational Databases</strong> (also known as <strong>SQL Databases</strong>)</li>
<li><strong>Non-relational Databases</strong> (also known as <strong>NoSQL Databases</strong>)</li>
</ul>
<h3 id="heading-what-is-sql">What is SQL? <a></a></h3>
<p>SQL is short for <strong>S</strong>tructured <strong>Q</strong>uery <strong>L</strong>anguage. </p>
<p>You will likely hear it pronounced one of two ways – "<em>S. Q. L.</em>" (ess-kew-ell), or "<em>se-quel</em>" (like a sequel to a movie).</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/04/Screenshot-2022-04-13-at-6.25.32-PM.png" alt="Image" width="600" height="400" loading="lazy">
<em>https://i.imgur.com/NtGaNA8.png</em></p>
<p>Either way, SQL is a language used for dealing with databases. </p>
<p>Specifically, with SQL, you can write database queries to communicate with the database. These can be commands for performing any of the CRUD (Create Read Update Delete) operations.</p>
<p>SQL is the language of choice for Relational Database Management Systems, which you will learn all about in the following section.</p>
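<p>To make the four CRUD operations concrete, here is a minimal sketch that runs them through Python's built-in <code>sqlite3</code> module. The <code>notes</code> table and its contents are made up for illustration, and other database systems may differ in small details of syntax.</p>

```python
import sqlite3

# In-memory SQLite database; the "notes" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# Create: add a new row.
conn.execute("INSERT INTO notes (id, body) VALUES (?, ?)", (1, "first draft"))

# Read: fetch the row back.
created = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0]

# Update: change the stored value.
conn.execute("UPDATE notes SET body = ? WHERE id = ?", ("final version", 1))
updated = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0]

# Delete: remove the row.
conn.execute("DELETE FROM notes WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

print(created, updated, remaining)
```

<p>Each operation maps to one SQL keyword: <code>INSERT</code>, <code>SELECT</code>, <code>UPDATE</code>, and <code>DELETE</code>.</p>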
<h2 id="heading-what-is-a-relational-database">What Is A Relational Database? <a></a></h2>
<p>Relational databases (or SQL databases) have been around for a while. The relational model was first proposed in 1970, and relational databases are still popular to this day. Some of the most commonly used ones are:</p>
<ul>
<li><a target="_blank" href="https://www.postgresql.org/">PostgreSQL</a>  </li>
<li><a target="_blank" href="https://www.microsoft.com/en-us/sql-server/sql-server-downloads">Microsoft SQL Server</a></li>
<li><a target="_blank" href="https://www.mysql.com/">MySQL</a></li>
<li><a target="_blank" href="https://www.oracle.com/index.html">Oracle</a></li>
<li><a target="_blank" href="https://sqlite.org/index.html">SQLite</a></li>
</ul>
<p>A Relational database stores data in a structured and tabular way. That is,  it stores information in <strong>tables</strong>, which you can think of as storage containers for the data. For example, a company could have an <code>employees</code> table to store data on its employees.</p>
<p>Relational databases have a strict, static, and pre-defined logical <strong>schema</strong>. You can think of a database schema as an organizational blueprint – a set of rules for what can and cannot enter the table and the conditions for how to configure data.</p>
<p>In each table, there is at least one <strong>column</strong>. These columns have a specific data type, such as <code>INTEGER</code> or <code>VARCHAR</code>.  In the <code>employees</code> table, some columns could be <code>employee_id</code>, <code>name</code>, <code>department</code>, <code>email</code>, and <code>salary</code>.</p>
<p>The columns and the data types allowed in each column make up the schema.</p>
<pre><code class="lang-sql">             EMPLOYEES

+<span class="hljs-comment">-------------+------+------------+-------+--------+</span>
| employee_id | name | department | email | salary |
+<span class="hljs-comment">-------------+------+------------+-------+--------+</span>
</code></pre>
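<p>As a rough sketch of how a schema like this could be declared, the example below creates the <code>employees</code> table with Python's built-in <code>sqlite3</code> module. The column sizes and the <code>NOT NULL</code> and <code>UNIQUE</code> rules are assumptions added for illustration. Note that SQLite is more lenient about column types than most relational systems, so the sketch demonstrates enforcement with a constraint instead.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: column names, data types, and (assumed) constraints.
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        department  VARCHAR(50),
        email       VARCHAR(100) UNIQUE,
        salary      INTEGER
    )
""")

# A row that follows the schema is accepted.
conn.execute(
    "INSERT INTO employees VALUES (1, 'John Doe', 'IT', 'johndoe@company.com', 3500)"
)

# A row that breaks a rule (a missing name) is rejected.
try:
    conn.execute(
        "INSERT INTO employees VALUES (2, NULL, 'Marketing', 'kelly@company.com', 1500)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, row_count)
```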
<p>A table will also have <strong>rows</strong>, or <em>records</em>. A record is a single data value entry that needs to adhere to the pre-defined schema. Essentially, it is a single item.</p>
<pre><code class="lang-sql">             EMPLOYEES
+<span class="hljs-comment">-------------+------------------+------------+-----------------------+--------+</span>
| employee_id |       name       | department |         email         | salary |
+<span class="hljs-comment">-------------+------------------+------------+-----------------------+--------+</span>
|           1 |  John Doe        | IT         | johndoe@company.com   |   3500 |
|           2 |  Kelly Kellinson | Marketing  | kelly@company.com     |   1500 |
|           3 |  Mike Manson     | Product    | mikekane@company.com  |   2300 |
+<span class="hljs-comment">-------------+------------------+------------+-----------------------+--------+</span>
</code></pre>
<p>And since Relational Databases support SQL, you can perform queries. For example, if you wanted to <code>view</code> the <code>names</code> of the <code>employees</code> , whose monthly salary is <code>greater than 2000 dollars</code>, then you would write the following SQL query:</p>
<pre><code class="lang-SQL"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">FROM</span> employees
<span class="hljs-keyword">WHERE</span> salary &gt; <span class="hljs-number">2000</span>;
</code></pre>
<p>From the above query, you would get the following output:</p>
<pre><code class="lang-SQL">+<span class="hljs-comment">-------------+</span>
|    name     |
+<span class="hljs-comment">-------------+</span>
| John Doe    |
| Mike Manson |
+<span class="hljs-comment">-------------+</span>
</code></pre>
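<p>If you want to try this query yourself, the following sketch loads the three example rows into an in-memory SQLite database using Python's built-in <code>sqlite3</code> module. The column types are assumptions, and the <code>ORDER BY</code> clause is added only to make the order of the results predictable.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT, "
    "email TEXT, salary INTEGER)"
)

# The same three rows shown in the EMPLOYEES table above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", "johndoe@company.com", 3500),
        (2, "Kelly Kellinson", "Marketing", "kelly@company.com", 1500),
        (3, "Mike Manson", "Product", "mikekane@company.com", 2300),
    ],
)

# Names of employees whose salary is greater than 2000.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employees WHERE salary > 2000 ORDER BY employee_id"
    )
]
print(names)  # ['John Doe', 'Mike Manson']
```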
<h3 id="heading-characteristics-of-relational-databases">Characteristics of Relational Databases <a></a></h3>
<p>So far, you know that Relational Databases:</p>
<ul>
<li>are tabular in format,</li>
<li>are very organized, and the data stored is well-structured,</li>
<li>have a strict, rigid, and pre-defined schema,</li>
<li>use SQL for performing database queries and manipulating data.</li>
</ul>
<p>Additionally, a relational database can have more than one table, and as the name of this type of Database Management System suggests, the tables are <em>related</em> to one another.</p>
<p>For example, an e-commerce company may have a <code>products</code> table, a <code>users</code> table, an <code>emails</code> table, and an <code>orders</code> table.</p>
<p>Since there is a link and connection between the tables and the information stored in them, you can even join tables using a few commands.</p>
<p>There is a <em>primary key</em>, which acts as an identifier and ensures that each item in the table is unique, therefore making sure there is no duplicate and redundant data in tables. </p>
<p>And there is a <em>foreign key</em> that creates those pre-established relationships between tables.</p>
<p>Data points in different tables can have distinct relationships:</p>
<ul>
<li><strong>One-to-one relationships</strong>. In such cases, a record in one table is related only to one record in another table. An example of a one-to-one relationship in an e-commerce store is that one user can have only one email address, and one email address can belong only to one user.</li>
<li><strong>One-to-many relationships</strong>.  In such cases, one record in one table is related to many other records in another table. For example, in an e-commerce store, a single user can make many orders, but each of those orders is made by a single user.</li>
<li><strong>Many-to-many relationships</strong>. In such cases, one or more records in one table can be related to one or more records in another table. For example, in an e-commerce store, one order can have many products and a product can be ordered many times.</li>
</ul>
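<p>Here is a small sketch of a one-to-many relationship, again using Python's built-in <code>sqlite3</code> module. The <code>users</code> and <code>orders</code> tables, their columns, and the sample rows are hypothetical. The foreign key on <code>orders.user_id</code> points at the primary key of <code>users</code>, and a <code>JOIN</code> combines the two tables.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        total    INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("INSERT INTO orders VALUES (10, 1, 40), (11, 1, 25), (12, 2, 90)")

# One user, many orders: join the tables on the shared key and count.
orders_per_user = conn.execute("""
    SELECT users.name, COUNT(orders.order_id)
    FROM users JOIN orders ON orders.user_id = users.user_id
    GROUP BY users.user_id
    ORDER BY users.user_id
""").fetchall()

# An order that points at a user who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_per_user, orphan_rejected)
```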
<h3 id="heading-acid-properties-in-relational-databases">ACID Properties in Relational Databases <a></a></h3>
<p>Relational Databases offer the ACID database consistency model. </p>
<p>ACID is an acronym for <strong>A</strong>tomicity,  <strong>C</strong>onsistency, <strong>I</strong>solation,  <strong>D</strong>urability.</p>
<p><strong>Atomicity</strong> means that transactions are atomic and take an "all or nothing" approach. </p>
<p>For example, either the entire operation is successful and is completed from start to finish, or it is unsuccessful, and there is an entire operation "rollback". </p>
<p>All operations are guaranteed to end with either a success or a failure, and none are just partially successful. </p>
<p><strong>Consistency</strong> is the property that ensures that the database structure remains intact from the start of a transaction to the end. It makes sure that any data entering the database follows the rules and constraints that are set in place. It is what secures and maintains the integrity of data in relational databases.</p>
<p><strong>Isolation</strong> means that despite the number of transactions taking place at any moment in time, each transaction is treated as an atomic, separate unit, and transactions seem to occur in sequential order. </p>
<p>For example, if two transactions are happening at the same time, this property ensures that one transaction, and the changes occurring there, will not affect in any way the other transaction.</p>
<p>And finally, <strong>Durability</strong> means that any results and changes from the transactions are committed and thus permanent and will persist, even if there is a system failure.</p>
<p>The ACID model ensures that databases are reliable and secure.</p>
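<p>Atomicity is easiest to see with a failed transaction. The sketch below uses Python's built-in <code>sqlite3</code> module and a hypothetical <code>accounts</code> table. The <code>transfer</code> helper and the rule that balances cannot go negative are assumptions made for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(amount):
    """Move money from alice to bob as a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,)
        )


try:
    transfer(500)  # would overdraw alice, so the CHECK constraint fails
except sqlite3.IntegrityError:
    pass

# Bob's credit was rolled back together with the failed debit.
after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))

transfer(30)  # a valid transfer is applied in full
after_success = dict(conn.execute("SELECT name, balance FROM accounts"))
print(after_failure, after_success)
```

<p>After the failed transfer, neither balance has changed, even though the first <code>UPDATE</code> had already run. That is the "all or nothing" behavior described above.</p>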
<h2 id="heading-what-is-a-non-relational-database">What Is A Non-Relational Database? <a></a></h2>
<p>A non-relational database is also referred to as a NoSQL database. You will often see NoSQL expanded as either "<strong>N</strong>ot <strong>o</strong>nly  <strong>SQL</strong>" or "Non-SQL".</p>
<p>Either way, a non-relational database refers to a database that doesn't use the relational data model.</p>
<p>Although this term and this type of database have been around for decades, NoSQL databases started gaining momentum in the late 1990s, when the Internet increased in popularity. </p>
<p>Relational databases alone could not handle the speed – along with the large amounts and size of diverse and complex data – that this rise in internet use and the newly developed web applications required and demanded.</p>
<p>Some of the most popular Non-relational databases are:</p>
<ul>
<li><a target="_blank" href="https://www.mongodb.com/">MongoDB</a>,</li>
<li><a target="_blank" href="https://redis.io/">Redis</a>,</li>
<li><a target="_blank" href="https://cassandra.apache.org/_/index.html">Apache Cassandra</a>,</li>
<li><a target="_blank" href="https://cloud.google.com/bigtable">Google Cloud Bigtable</a>,</li>
<li><a target="_blank" href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a>.</li>
</ul>
<p>A non-relational database does not store and organize data in a tabular format. There are no tables, rows, columns, or relationships between different data points.</p>
<p>Instead, data is stored in <strong>collections</strong>. The database is typically unstructured and uses a dynamic schema.</p>
<h3 id="heading-types-of-non-relationional-databases">Types of Non-Relational Databases <a></a></h3>
<p>There are four major types of non-relational databases:</p>
<ul>
<li><strong>Column-oriented databases</strong>,</li>
<li><strong>Key-value stores</strong>,</li>
<li><strong>Document-oriented stores</strong>,</li>
<li><strong>Graph databases</strong>.</li>
</ul>
<p><strong>Column-oriented databases</strong> are similar in concept to relational databases. But they use groups, or sets of columns (also known as column families) instead of rows to logically organize related data.</p>
<p>You can access a column family independently by using a unique row key associated with an individual column. Searching for specific data is much faster and saves significant time since there is no need to go through rows of unrelated information to find what you are searching for.</p>
<p><strong>Key-value stores</strong> are one of the simplest types of non-relational databases.</p>
<p>Data is stored in dictionaries or hash tables in the form of key-value pair collections. </p>
<p>This type of database has keys that need to be unique. </p>
<p>Keys act as a pointer to a specific value and are associated with that value.</p>
<p>The value assigned to a key can be any piece of information and data type. </p>
<p>To retrieve and access the value, you use the unique key as a reference.</p>
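<p>The idea can be sketched with a plain Python dictionary standing in for the store. The <code>put</code> and <code>get</code> helpers and the key names are made up for illustration and are not the API of any particular product.</p>

```python
# A plain dict stands in for a key-value store (illustration only).
store = {}


def put(key, value):
    """Save a value under a unique key, replacing any previous value."""
    store[key] = value


def get(key, default=None):
    """Look a value up by its key."""
    return store.get(key, default)


# Values can be any kind of data: a nested structure, a number, and so on.
put("session:42", {"user": "kelly", "cart": ["keyboard"]})
put("visits:home", 1)
put("visits:home", 2)  # same key, so the old value is overwritten

print(get("session:42"), get("visits:home"), get("missing"))
```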
<p><strong>Document-oriented stores</strong> also store data in key-value pair fashion. But in this case, the value is a document that has a unique key as its identifier.</p>
<p>The document can have any format, such as XML, YAML, or binary, but typically it is JSON.</p>
<p>This type of database stores data in a semi-structured way.</p>
<p>There is no schema or predefined structure. Because of this, it offers flexibility and the ability to re-arrange and re-work the structure of the database if the project's requirements change.</p>
<p>It also provides a SQL-like type of query language or an API to perform queries and CRUD operations on the data.</p>
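<p>As a rough sketch of the document model, the example below keeps JSON-like documents in a Python dictionary keyed by a unique identifier. The <code>products</code> collection and the <code>find</code> helper are hypothetical stand-ins for what a real document database provides.</p>

```python
import json

# Each document has a unique key, and documents need not share the same fields.
products = {
    "p1": {"name": "Keyboard", "price": 45, "tags": ["usb", "mechanical"]},
    "p2": {"name": "E-book", "price": 12, "download": {"format": "epub"}},
}


def find(collection, **criteria):
    """Return every document whose fields match all of the given values."""
    return [
        doc
        for doc in collection.values()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


cheap = find(products, price=12)

# Documents are typically exchanged as JSON text.
as_json = json.dumps(products["p2"], sort_keys=True)
print(cheap, as_json)
```

<p>Notice that the second document has a <code>download</code> field the first one lacks. Nothing had to be declared up front for that to work.</p>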
<p><strong>Graph databases</strong> are the most complex type of non-relational database, and they can handle large sets of data. </p>
<p>They focus on the connections and relationships between data elements and use graph theory to store, search, and manage those relations. </p>
<p>They use <em>nodes</em> to store data and represent an individual entity or piece of data. One node is connected and linked to another node. </p>
<p>To represent the connections or relationships between entities, graph databases use <em>edges</em>.</p>
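<p>Nodes and edges can be sketched in a few lines of Python. The names below and the "follows" relationship are invented for the example, and the <code>reachable</code> helper uses a simple breadth-first search, which is only one of many ways a graph can be traversed.</p>

```python
from collections import deque

# Nodes hold the entities; edges hold the relationships between them.
nodes = {
    "ann": {"kind": "user"},
    "ben": {"kind": "user"},
    "cat": {"kind": "user"},
    "dan": {"kind": "user"},
}
edges = [("ann", "ben"), ("ben", "cat"), ("cat", "dan")]  # (source FOLLOWS target)


def reachable(start):
    """Return every node that can be reached from start by following edges."""
    neighbours = {name: [] for name in nodes}
    for source, target in edges:
        neighbours[source].append(target)

    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(reachable("ann")))
```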
<h3 id="heading-base-properties-in-non-relational-databases">BASE Properties in Non-relational Databases <a></a></h3>
<p>Non-relational databases offer the BASE database consistency model. This model is not as rigid as the ACID model of relational databases.</p>
<p>BASE is an acronym for:</p>
<ul>
<li><strong>B</strong>asic <strong>A</strong>vailability. This model does not focus on the immediate consistency of data. However, the system appears to be continuously working and guarantees the availability of data at all times.</li>
<li><strong>S</strong>oft state. Because of the lack of immediate consistency, the state of the system may change over time. A soft state means the system doesn't need to be write-consistent.</li>
<li><strong>E</strong>ventual consistency. The main priority is the constant availability of data and not that of data consistency. However, eventually and at some point, you can expect data to be consistent. This may occur when the system stops receiving input.</li>
</ul>
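<p>Eventual consistency can be illustrated with a toy simulation of two copies of the same data. The <code>Replica</code> class and the <code>write</code> and <code>sync</code> helpers are assumptions made for this sketch. Real systems replicate data in far more sophisticated ways.</p>

```python
class Replica:
    """One copy of the data, held on one (imaginary) server."""

    def __init__(self):
        self.data = {}


primary, secondary = Replica(), Replica()
pending = []  # writes accepted by the primary but not yet copied over


def write(key, value):
    """Accept the write immediately; replication happens later."""
    primary.data[key] = value
    pending.append((key, value))


def sync():
    """Copy all outstanding writes to the secondary replica."""
    while pending:
        key, value = pending.pop(0)
        secondary.data[key] = value


write("stock:keyboard", 7)
stale_read = secondary.data.get("stock:keyboard")  # the replica has not caught up

sync()
fresh_read = secondary.data.get("stock:keyboard")  # now both copies agree
print(stale_read, fresh_read)
```

<p>Both replicas stay available the whole time, but a read from the secondary can return out-of-date data until the copies converge.</p>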
<h2 id="heading-how-to-choose-between-sql-and-nosql-databases">How to Choose Between SQL and NoSQL Databases <a></a></h2>
<p>After learning the basics of SQL and NoSQL databases, you might be wondering which one of the two to choose for your project.</p>
<p>Well, there isn't a clear answer to that question. </p>
<p>Both databases have advantages and disadvantages, and it largely depends on the type of application you are building, the kind of data you will be working with, and your future goals.</p>
<p>It is common for companies to use both types of databases for their products.</p>
<p>Below is a quick summary of their characteristics to help you decide which one might be the right fit for you.</p>
<h3 id="heading-when-to-use-an-sql-database">When to use an SQL database:</h3>
<ul>
<li>You need highly structured data distributed across multiple tables. You need your data to adhere to a strict, predictable, predefined, and already planned schema.</li>
<li>Your data will remain relatively the same. SQL databases are convenient if you don't plan on frequently changing the structure of the database and don't need to regularly update items. Keep in mind that they offer little flexibility.</li>
<li>You need consistent data.</li>
<li>Data integrity and security are a priority.</li>
<li>You want accurate results for complex queries.</li>
</ul>
<p>A disadvantage of SQL databases is that they scale vertically.</p>
<p>As you gather and store more data, you will need to increase the hardware and computing power of your current machine. </p>
<p>This can be costly. </p>
<p>An increase in processing power and memory storage is needed to handle an increase in load to improve performance.</p>
<h3 id="heading-when-to-use-a-nosql-database">When to use a NoSQL database:</h3>
<ul>
<li>You are working in a fast development environment that requires frequent adaptations of requirements and constant changes to the database structure.</li>
<li>You are working with large amounts of data that are diverse in nature but do not require a lot of structure or accuracy.</li>
<li>You are working with data that needs frequent updates. NoSQL databases offer a loose, flexible, and dynamic schema that allows for regular changes to the data.</li>
<li>You want speedy query results and continuous availability of the system.</li>
<li>You don't want to perform any upfront planning, preparing, or designing of the database, but want to immediately start building instead.</li>
</ul>
<p>A big advantage of NoSQL databases is that they scale horizontally.</p>
<p>They are designed so that the load can be spread across additional machines (such as cloud servers). This behavior is often more desirable than vertical scaling, which requires adding CPU (Central Processing Unit) or RAM (Random Access Memory) resources to a single machine.</p>
<p>But of course, a disadvantage of NoSQL databases is that they do not ensure data integrity and consistency.</p>
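<p>One common way to spread data across several machines is to hash each key and use the result to pick a machine. The sketch below is a simplified illustration of that idea. The <code>shard_for</code> helper, the three dictionaries standing in for servers, and the user IDs are all made up.</p>

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three dictionaries stand in for three separate database servers.
shards = [dict() for _ in range(3)]

for user_id in ("user-1", "user-2", "user-3", "user-4", "user-5", "user-6"):
    shards[shard_for(user_id, len(shards))][user_id] = {"id": user_id}

total = sum(len(shard) for shard in shards)
print([len(shard) for shard in shards], total)
```

<p>With this simple scheme, changing the number of shards moves most keys to a different machine, which is why production systems often use more elaborate techniques such as consistent hashing.</p>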
<h2 id="heading-further-learning">Further Learning <a></a></h2>
<p>This article has just scratched the surface, and the best way to learn is by doing.</p>
<p>Here are some learning resources to learn more about databases and SQL:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-free-relational-database-courses-for-beginners/">Learn SQL – Free Relational Database Courses for Beginners</a>. Bookmark this article for a list of free SQL courses.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/learn/relational-database/">freeCodeCamp's Relational Database Certification</a>. In this course, you will learn the necessary developer tools. Then you will learn how to use a code editor, the command line, and Git. You will also learn to work with PostgreSQL (a relational database management system) and SQL – its query language.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-nosql-in-3-hours/">Learn About NoSQL Databases in This 3-hour Course</a>. In this course, you will learn about the four different NoSQL database types. Besides just learning the theory, you will also practice building all four of them.
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You have made it to the end of the article!</p>
<p>Hopefully, it has helped you understand the primary differences between Relational and Non-Relational databases. You also have some extra resources to start learning and to put your new skills into practice.</p>
<p>Thanks for reading, and happy coding!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ AWS DynamoDB – NoSQL Database Guide for Beginners ]]>
                </title>
                <description>
                    <![CDATA[ What is DynamoDB? DynamoDB is a fully managed NoSQL database from AWS. DynamoDB is similar to other NoSQL databases like MongoDB, except for the fact that you don’t have to do any maintenance or scaling on your part. DynamoDB can handle more than 10... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/aws-dynamodb-database-guide-for-beginners/</link>
                <guid isPermaLink="false">66d035af15ea3036a9539922</guid>
                
                    <category>
                        <![CDATA[ AWS ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Manish Shivanandhan ]]>
                </dc:creator>
                <pubDate>Tue, 11 Jan 2022 16:50:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/01/dynamodb.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <h2 id="heading-what-is-dynamodb">What is DynamoDB?</h2>
<p>DynamoDB is a fully managed <a target="_blank" href="https://www.mongodb.com/nosql-explained">NoSQL database</a> from AWS. DynamoDB is similar to other NoSQL databases like MongoDB, except for the fact that you don’t have to do any maintenance or scaling on your part.</p>
<blockquote>
<p>DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second — via AWS Documentation.</p>
</blockquote>
<p>DynamoDB offers built-in security, on-demand and point-in-time backups, cross-region replication, in-memory caching, and many other features that support business-critical workloads. </p>
<p>Most importantly, DynamoDB works seamlessly with other AWS applications like S3 and Lambda.</p>
<p>But before we get into the article, it's important that you understand the concept of NoSQL databases.</p>
<h2 id="heading-what-are-nosql-databases">What are NoSQL Databases?</h2>
<p>NoSQL stands for “<strong>not only SQL</strong>”. Simply put, NoSQL databases store documents in a format similar to JSON, while relational databases store data in the form of a table. </p>
<p>NoSQL offers more flexibility in terms of data modeling and does not force you to have a schema to store documents. </p>
<p>A few types of NoSQL databases include pure document databases (like MongoDB), key-value stores (like DynamoDB), wide-column databases (like Cassandra), and graph databases (like Neo4j). <a target="_blank" href="https://www.couchbase.com/resources/why-nosql">Learn more about NoSQL databases here</a>.</p>
<p>Great. Now let’s look at some of the features of DynamoDB.</p>
<h2 id="heading-core-features-of-dynamodb">Core Features of DynamoDB</h2>
<h3 id="heading-autoscaling">Autoscaling</h3>
<p>Probably the most important feature of DynamoDB is that it delivers automatic scaling of throughput and storage based on the performance or usage of your application. </p>
<p>In a typical database server, the sysadmin takes care of scaling when the application encounters higher than usual traffic. </p>
<p>With DynamoDB, you can create database tables that can store and retrieve any amount of data, and the scaling is automatically managed by AWS. This includes scaling up for higher traffic and scaling down for lower traffic, so you only pay for what you use.</p>
<h3 id="heading-data-models">Data Models</h3>
<p>DynamoDB supports both key-value and document data models. This enables you to have a flexible schema, so each item (row) can have any number of attributes (columns) at any point in time. This is crucial for growing businesses that have ever-changing requirements.</p>
<p>Re-defining database schema is a nightmare that many developers/database admins go through in a growing application. This data model flexibility offers a robust database solution for small as well as large businesses.</p>
<h3 id="heading-replication">Replication</h3>
<p>AWS takes care of DynamoDB table replication automatically based on your choice of AWS regions (cross-region replication). Even distributed applications can have single-digit millisecond read and write performance using DynamoDB.</p>
<p>With replication in place, you don't have to worry about data availability. In the event of the primary source failure, you can easily access the data from a secondary reserve, reducing the probability of application downtime.</p>
<h3 id="heading-backups-amp-recovery">Backups &amp; Recovery</h3>
<p>DynamoDB provides on-demand backups for your tables that you can enable within the AWS console. You can also enable automatic backup and archiving of your data to other AWS solutions like S3.</p>
<p>DynamoDB also offers Point-in-time recovery. This protects your data from accidental write/delete operations. </p>
<p>With Point-in-time recovery, you can restore your database to any point in time for the last 35 days. Point-in-time recovery is achieved by storing incremental backups of your database and that is managed automatically by AWS.</p>
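<p>To get an intuition for how restoring to a point in time can work, here is a toy sketch that rebuilds a table by replaying a log of changes up to a chosen moment. This is only an illustration of the general idea and not how AWS implements the feature. The <code>change_log</code> format, the timestamps, and the <code>restore</code> helper are all hypothetical.</p>

```python
# Each entry: (timestamp, action, key, item). The log is assumed to be in time order.
change_log = [
    (1, "put", "user#1", {"name": "Ana"}),
    (2, "put", "user#2", {"name": "Raj"}),
    (3, "delete", "user#1", None),  # the accidental delete
]


def restore(log, as_of):
    """Rebuild the table as it looked at the given timestamp."""
    table = {}
    for timestamp, action, key, item in log:
        if timestamp > as_of:
            break  # ignore everything that happened later
        if action == "put":
            table[key] = item
        else:
            table.pop(key, None)
    return table


before_mistake = restore(change_log, as_of=2)
latest = restore(change_log, as_of=3)
print(before_mistake, latest)
```

<p>Restoring to timestamp 2 brings back the item that was deleted by mistake at timestamp 3.</p>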
<h3 id="heading-security">Security</h3>
<p>DynamoDB encrypts data at rest by default and also in transit using the keys stored in AWS Key Management Service (or customer-provided keys). </p>
<p>With encryption in place, you can build security-sensitive applications that meet compliance and regulatory requirements. DynamoDB also provides access control via <a target="_blank" href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html">AWS IAM roles.</a></p>
<h3 id="heading-monitoring">Monitoring</h3>
<p>Monitoring is crucial to any business-critical application. It helps maintain reliability and also notifies personnel in case of an event or failure. </p>
<p>AWS offers detailed monitoring tools like CloudWatch Logs, CloudWatch Events, and CloudTrail Logs that will help you to watch, notify, and debug all types of events in DynamoDB. You can also set custom triggers based on metrics like system errors, capacity usage, and so on.</p>
<p>Now let’s compare DynamoDB with two of the popular database alternatives — MySQL and MongoDB.</p>
<h2 id="heading-dynamodb-vs-mysql">DynamoDB vs MySQL</h2>
<p>There is a major difference between MySQL and DynamoDB because MySQL is a relational database. In terms of benefits, I think MySQL is limited because of the requirement of having a schema before you can start pushing data.</p>
<p>But MySQL is great for many use cases as well. It is often called “The world’s most popular open-source database” and it delivers a fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server.</p>
<p>But being a NoSQL database gives DynamoDB much more flexibility in terms of data modeling. </p>
<p>Even though AWS provides managed services for MySQL and other relational databases, DynamoDB is a database designed by AWS and not just a hosted database solution. As a result, it offers improvements and features that MySQL and other relational databases can’t match.</p>
<h2 id="heading-dynamodb-vs-mongodb">DynamoDB vs MongoDB</h2>
<p>DynamoDB and MongoDB are closely related to each other since both are NoSQL databases. But since DynamoDB is built and maintained by AWS it offers many more features and integrations, especially with other Amazon services like S3, compared to MongoDB.</p>
<p>If I were running a growing company I would prefer using DynamoDB solely because of its scalability and cross-region replication features. AWS does not offer a managed MongoDB service but if you are looking for one, <a target="_blank" href="https://www.mongodb.com/atlas/database">MongoDB Atlas</a> would be a great alternative.</p>
<p>Another important advantage of DynamoDB over MongoDB is security. MongoDB is not secure by default, so you have to configure security yourself. DynamoDB is secure by default, so it might be a better option if security is a deal-breaker for you.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>AWS DynamoDB is a fully managed NoSQL database that can scale in and scale out based on demand. AWS takes care of typical functions including software patching, replication, and maintenance. </p>
<p>DynamoDB also offers encryption at rest, point-in-time snapshots, and powerful monitoring capabilities. In a nutshell, it is a great option when you are building an application that needs a high-performance scalable NoSQL database.</p>
<p><em>Loved this article?</em> <a target="_blank" href="http://tinyletter.com/manishmshiva"><strong><em>Join my Newsletter</em></strong></a> <em>and get a summary of my articles and videos every Monday</em> morning<em>.</em> You can also <a target="_blank" href="https://www.hardcoder.io/"><strong>visit my blog here.</strong></a></p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Learn About NoSQL Databases in This 3-hour Course ]]>
                </title>
                <description>
                    <![CDATA[ NoSQL Databases can sometimes seem confusing and overwhelming, partly because of their flexibility. This is why we have put together a 3-hour video course to help you understand exactly what a NoSQL Database is, as well as the different types availab... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-nosql-in-3-hours/</link>
                <guid isPermaLink="false">66b0a8b46428eb897141f8bf</guid>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ youtube ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ ania kubow ]]>
                </dc:creator>
                <pubDate>Mon, 29 Nov 2021 15:47:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/11/nosql.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>NoSQL Databases can sometimes seem confusing and overwhelming, partly because of their flexibility.</p>
<p>This is why we have put together a 3-hour video course to help you understand exactly what a NoSQL Database is, as well as the different types available to you. </p>
<p>By the end of this course, you will have built 4 databases based on the 4 main types, and you'll have practised your learnings by building out projects. </p>
<p>But first, let's start with the basics.</p>
<h2 id="heading-what-is-nosql">What is NoSQL?</h2>
<p>So the first thing you need to know is that NoSQL is an <strong>approach</strong> to database management.</p>
<p>It’s considered to be super flexible as it allows for a variety of data models, such as  'key-value', 'document', 'wide-column or tabular' and 'graph' formats.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/nosql-types.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>These are the 4 we will be looking at closely in the <strong>video course</strong>, as well as the new emerging trend of Multi Model Databases. </p>
<p>With each deep-dive on the 4 NoSQL database types, we will be approaching each learning as an explanation, example, and exercise – so the 3 E’s – in order to fully grasp the topic we are discussing.</p>
<h2 id="heading-how-do-databases-work">How do Databases Work?</h2>
<p>Databases have multiple layers. The first layer is an interface, or in other words a visual platform where you can visit and interact with data. This is where you'll find the format, the language, and the transport. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/Screenshot-2021-09-14-at-19.17.49.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In this video course, the interface we are going to use is the DataStax Astra database management system. This is where we will be creating all 4 of our database types for the example and exercise parts. </p>
<p>DataStax Astra DB is an autoscaling database-as-a-service built on Apache Cassandra, designed to simplify cloud-native application development. </p>
<p>Because it is built on Apache Cassandra, you will see us using the Cassandra Query Language, or CQL, a few times in this course. CQL offers a model close to SQL in the sense that data is put in tables containing rows of columns. These languages are how we interact with the data in our database.</p>
<p>The next layer of a database is the execution layer. This is where we parse the incoming queries, coming from our interface. It is also used as an analyzer and a dispatcher.</p>
<p>And finally we have the storage layer, where the indexing of data happens.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/database.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Using DataStax Astra will allow us to create all 4 database types for this tutorial, so I won’t have to sign up for a separate database management system for each section. But you don't have to use it. There are dozens of alternatives to choose from, so feel free to take your pick. </p>
<h2 id="heading-lets-get-to-it">Let's get to it!</h2>
<p>Now that you know which NoSQL database types we will be learning about, as well as how databases work, let's learn more about each one in detail.</p>
<p>Here are the topics this course will cover:</p>
<ul>
<li>What is NoSQL?</li>
<li>Why use NoSQL?</li>
<li>SQL vs NoSQL</li>
<li>How to set up our Database</li>
<li>Tabular Type</li>
<li>Document Type</li>
<li>Key-value Type</li>
<li>Graph Type</li>
<li>Multi-Model Type explained</li>
<li>Project – How to use the Document API</li>
<li>Project – How to use the GraphQL API</li>
<li>Where to go next</li>
</ul>
<p>Watch the course below or <a target="_blank" href="https://www.youtube.com/watch?v=xh4gy1lbL2k">on the freeCodeCamp.org YouTube channel</a> (3-hour watch).</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/xh4gy1lbL2k" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>Follow me on Youtube for more videos on Software Development:</p>
<figure><a class="kg-bookmark-container" href="https://www.youtube.com/channel/UC5DNytAJ6_FISueUfzZCVsw"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Code with Ania Kubów</div><div class="kg-bookmark-description">Hello everyone. This channel is run by Ania Kubow. In this channel, I will be teaching you JavaScript,React, HTML, CSS, React-native, Node.js and so much more! A little bit about me:My background is in the financial markets, where I worked as a derivates broker our of University. After starting m…</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.youtube.com/s/desktop/6b151e52/img/favicon_144.png" width="144" height="144" alt="favicon_144" loading="lazy"><span class="kg-bookmark-publisher">YouTube</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://yt3.ggpht.com/ytc/AAUvwnjSRt8sIbeM7P--pHoUDh67sDhaNTCMF_XiNOCvUw=s900-c-k-c0x00ffffff-no-rj" width="900" height="900" alt="AAUvwnjSRt8sIbeM7P--pHoUDh67sDhaNTCMF_XiNOCvUw=s900-c-k-c0x00ffffff-no-rj" loading="lazy"></div></a></figure>


 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The Apache Cassandra Beginner Tutorial ]]>
                </title>
                <description>
                    <![CDATA[ By Sebastian Sigl There are lots of data-storage options available today. You have to choose between managed or unmanaged, relational or NoSQL, write- or read-optimized, proprietary or open-source — and it doesn't end there. Once you begin your searc... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-apache-cassandra-beginner-tutorial/</link>
                <guid isPermaLink="false">66d461053bc3ab877dae2232</guid>
                
                    <category>
                        <![CDATA[ apache ]]>
                    </category>
                
                    <category>
                        <![CDATA[ backend ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cassandra ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 15 Jul 2021 13:13:02 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/07/cassandra-welcome.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Sebastian Sigl</p>
<p>There are lots of data-storage options available today. You have to choose between managed or unmanaged, relational or NoSQL, write- or read-optimized, proprietary or open-source — and it doesn't end there.</p>
<p>Once you begin your search, you will end up in the universe that is database marketing. All of the vendors will tell you why their database is fantastic. </p>
<p>Unfortunately, it's difficult to find out when not to use a specific database, because this is not an attractive selling point.</p>
<p>If you know what questions to ask, you will eventually understand all the essential properties of a given system. In the end, your choice will depend on your expertise and your requirements.</p>
<p>In this tutorial I will introduce you to Apache Cassandra, a distributed, horizontally scalable, open-source database. Or as Cassandra users like to describe Cassandra: "It's a database that puts you in the driver seat."</p>
<p>I will share the essential gotchas and provide references to documentation. I’ll also provide insights based on my experience of running Cassandra on a large scale at work, with executable examples wherever possible.</p>
<p>Here’s an overview of everything you'll learn:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-61.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Along the way, you will learn to ask fundamental questions that will help you to choose a database that suits your needs. You'll also learn about other popular databases like Spanner, Cockroach, or FaunaDB, and how they can serve different use-cases.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><a class="post-section-overview" href="#heading-how-to-set-up-a-cassandra-cluster">How to Set Up a Cassandra Cluster</a></li>
<li><a class="post-section-overview" href="#heading-cassandra-architecture">Cassandra Architecture</a><ul>
<li><a class="post-section-overview" href="#heading-decentralization">Decentralization</a></li>
<li><a class="post-section-overview" href="#heading-every-node-is-a-coordinator">Every Node Is a Coordinator</a></li>
<li><a class="post-section-overview" href="#heading-data-partitioning">Data Partitioning</a></li>
<li><a class="post-section-overview" href="#heading-replication">Replication</a></li>
<li><a class="post-section-overview" href="#heading-consistency-level">Consistency Level</a></li>
<li><a class="post-section-overview" href="#heading-tune-for-consistency-by-setting-up-a-strong-consistency-application">Tune for Consistency by Setting up a Strong Consistency Application</a></li>
<li><a class="post-section-overview" href="#heading-tune-for-performance-by-using-eventual-consistency">Tune for Performance by Using Eventual Consistency</a></li>
<li><a class="post-section-overview" href="#heading-understanding-compaction">Understanding Compaction</a></li>
<li><a class="post-section-overview" href="#heading-presorting-data-on-cassandra-nodes">Presorting Data on Cassandra Nodes</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-data-modeling">Data Modeling</a><ul>
<li><a class="post-section-overview" href="#heading-keep-data-in-sync-using-batch-statements">Keep Data in Sync Using <code>BATCH</code> Statements</a></li>
<li><a class="post-section-overview" href="#heading-use-foreign-keys-instead-of-duplicating-data-in-cassandra">Use Foreign Keys Instead of Duplicating Data in Cassandra</a></li>
<li><a class="post-section-overview" href="#heading-indexes-in-cassandra">Indexes in Cassandra</a></li>
<li><a class="post-section-overview" href="#heading-materialized-views">Materialized Views</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-running-a-cluster">Running a Cluster</a><ul>
<li><a class="post-section-overview" href="#heading-fully-managed-cassandra">Fully Managed Cassandra</a></li>
<li><a class="post-section-overview" href="#heading-self-managed-cassandra">Self-Managed Cassandra</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-other-learnings">Other Learnings</a><ul>
<li><a class="post-section-overview" href="#heading-data-migrations">Data Migrations</a></li>
<li><a class="post-section-overview" href="#heading-tombstones">Tombstones</a></li>
<li><a class="post-section-overview" href="#heading-updates-are-just-inserts-and-vice-versa"><code>UPDATE</code>s Are Just <code>INSERT</code>s, and Vice Versa</a></li>
<li><a class="post-section-overview" href="#heading-lightweight-transactions">Lightweight Transactions</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
<li><a class="post-section-overview" href="#heading-references">References</a></li>
</ul>
<h2 id="heading-how-to-set-up-a-cassandra-cluster">How to Set Up a Cassandra Cluster</h2>
<p>To execute the examples of this tutorial, you'll need a running Cassandra cluster. You can get this up and running quickly by using <a target="_blank" href="https://docs.docker.com/get-docker/">Docker</a>.</p>
<blockquote>
<p><strong>Required Docker settings</strong>  </p>
<p>Your device should have a minimum of 8GB of memory and at least 8GB of free disk space. Your Docker settings should be updated to be able to use at least 6GB of memory, or better, 8GB.  </p>
<p>To apply these suggestions, open your Docker preferences, go to Resources, and increase your memory threshold.</p>
</blockquote>
<p>Cassandra is built for scale, and some features only work on a multi-node Cassandra cluster, so let’s start one locally.</p>
<p>For Linux and Mac, run the following commands:</p>
<pre><code class="lang-shell"># Run the first node and keep it running in the background
docker run --name cassandra-1 -p 9042:9042 -d cassandra:3.7
INSTANCE1=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-1)
echo "Instance 1: ${INSTANCE1}"

# Run the second node
docker run --name cassandra-2 -p 9043:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1 cassandra:3.7
INSTANCE2=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-2)
echo "Instance 2: ${INSTANCE2}"

echo "Wait 60s until the second node joins the cluster"
sleep 60

# Run the third node
docker run --name cassandra-3 -p 9044:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1,$INSTANCE2 cassandra:3.7
INSTANCE3=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-3)
</code></pre>
<p>For Windows, run the following commands in PowerShell:</p>
<pre><code class="lang-shell"># Run the first node and keep it running in the background
docker run --name cassandra-1 -p 9042:9042 -d cassandra:3.7
$INSTANCE1=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-1)
echo "Instance 1: ${INSTANCE1}"

# Run the second node
docker run --name cassandra-2 -p 9043:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1 cassandra:3.7
$INSTANCE2=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-2)
echo "Instance 2: ${INSTANCE2}"

echo "Wait 60s until the second node joins the cluster"
sleep 60

# Run the third node
docker run --name cassandra-3 -p 9044:9042 -d -e "CASSANDRA_SEEDS=$INSTANCE1,$INSTANCE2" cassandra:3.7
$INSTANCE3=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-3)
</code></pre>
<p>The startup process can take a few minutes.</p>
<p>You can verify if everything is done and ready by executing a Cassandra utility tool called <code>nodetool</code> via <code>docker exec</code> on a node:</p>
<pre><code class="lang-shell">$ docker exec cassandra-3 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.17.0.3  112.69 KiB  256          68.7%             bb5ef231-0dd2-4762-a447-806a45f710ac  rack1
UN  172.17.0.2  107.96 KiB  256          68.3%             d7392374-8daa-4292-b724-cb790b0ee6ad  rack1
UN  172.17.0.4  93.93 KiB  256          63.0%             386d094f-5483-4945-a1a7-2bb3975d6167  rack1
</code></pre>
<p>UN means <strong>U</strong>p and <strong>N</strong>ormal. Here, all 3 nodes are running and healthy.</p>
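<p>If you want to script this check instead of reading the table by eye, the status codes can be parsed from the text. The following is a minimal sketch, not part of the original setup: the function name <code>all_nodes_up_and_normal</code> and the sample text are hypothetical, and the sample deliberately shows one node that is still joining (<code>UJ</code>).</p>

```python
NODE_STATES = {"UN", "UL", "UJ", "UM", "DN", "DL", "DJ", "DM"}


def all_nodes_up_and_normal(status_output, expected_nodes):
    """Return True when exactly `expected_nodes` rows exist and all are UN."""
    codes = []
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows start with a two-letter code: Up/Down, then the state.
        if fields and fields[0] in NODE_STATES:
            codes.append(fields[0])
    return len(codes) == expected_nodes and all(code == "UN" for code in codes)


# Hand-written sample in the shape of `nodetool status` output (hypothetical).
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.3  112.69 KiB  256     68.7%             bb5ef231-0dd2-4762-a447-806a45f710ac  rack1
UN  172.17.0.2  107.96 KiB  256     68.3%             d7392374-8daa-4292-b724-cb790b0ee6ad  rack1
UJ  172.17.0.4  93.93 KiB   256     63.0%             386d094f-5483-4945-a1a7-2bb3975d6167  rack1
"""

print(all_nodes_up_and_normal(SAMPLE, expected_nodes=3))  # prints: False
```

<p>To feed it real output, you could capture the text of <code>docker exec cassandra-3 nodetool status</code> with Python's <code>subprocess.run(..., capture_output=True, text=True)</code> and pass its <code>stdout</code> to the function.</p>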
<p>In this tutorial we will send lots of queries to Cassandra. I recommend starting a new shell and connecting to one node using <code>cqlsh</code>. Here's how to start a <code>cqlsh</code> shell in Docker:</p>
<pre><code class="lang-shell">$ docker exec -it cassandra-1 cqlsh

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh&gt;
</code></pre>
<p>And to execute your first query:</p>
<pre><code class="lang-shell">cqlsh&gt; DESCRIBE keyspaces;

system_traces  system_schema  system_auth  system  system_distributed
</code></pre>
<p>The response shows all the existing keyspaces. Keyspaces group tables and are similar to a database in a traditional relational database system. In other systems, groups of certain items are also known as namespaces.</p>
<p>Before you begin creating tables and inserting data, first create a keyspace in your local datacenter, which should replicate data 3 times:</p>
<pre><code class="lang-shell">cqlsh&gt; CREATE KEYSPACE learn_cassandra
  WITH REPLICATION = { 
   'class' : 'NetworkTopologyStrategy',
   'datacenter1' : 3 
  };
</code></pre>
<p>This creates a keyspace with a replication factor of 3 using the <code>NetworkTopologyStrategy</code>. The strategy defines how data is replicated in different datacenters. This is the recommended strategy for all user-created keyspaces.</p>
<blockquote>
<p><strong>Why should you start with 3 nodes?</strong>  </p>
<p>It’s recommended to have at least 3 nodes. One reason is that if you need strong consistency, you need confirmation from at least 2 nodes. Another is availability: if 1 node goes down, your cluster remains available because the 2 remaining nodes are up and running.  </p>
<p>You don’t need to fully understand this yet. After reading through the rest of this tutorial, things should be clearer.</p>
</blockquote>
<p>Now, all the nodes are up and healthy. You have a 3-node Cassandra setup listening on ports 9042, 9043, and 9044 for client requests. This is a realistic setup for a small cluster.  </p>
<p>In production, the instances would run on different machines to maximize performance. </p>
<p>Before you start creating tables, reading, and writing data, it's helpful to understand the basics of designing tables for scalability.  </p>
<p>In this tutorial, you will create tables with different settings for a to-do list application. If you want to get your hands dirty straight away, you can jump directly to the next <code>cqlsh</code> example.</p>
<h2 id="heading-cassandra-architecture">Cassandra Architecture</h2>
<p>Cassandra is a decentralized multi-node database that can physically span separate locations and uses replication and partitioning to scale reads and writes horizontally.</p>
<h3 id="heading-decentralization">Decentralization</h3>
<p>Cassandra is decentralized because no node is superior to other nodes, and every node acts in different roles as needed without any central controller. We'll get into examples of decentralization a bit later in this section.</p>
<p>Cassandra's decentralized property is what allows it to handle situations easily in case one node becomes unavailable or a new node is added.</p>
<h3 id="heading-every-node-is-a-coordinator">Every Node Is a Coordinator</h3>
<p>Data is replicated to different nodes. If certain data is requested, a request can be processed from any node.</p>
<p>This initial request receiver becomes the coordinator node for that request. If other nodes need to be checked to ensure consistency then the coordinator requests the required data from replica nodes.</p>
<p>The coordinator can calculate which node contains the data using a so-called <a target="_blank" href="https://cassandra.apache.org/doc/latest/architecture/dynamo.html?highlight=consistency#dataset-partitioning-consistent-hashing">consistent hashing algorithm</a>.</p>
<p><img src="https://lh6.googleusercontent.com/uSbZsiHVeCQ4Vqm_ow9951lfr1a-ZBaNqJWc03rhCn_Wn85qTYVhU3E0pXIU3giWC1juYN2ro8BRejURNu9J4NHcsin2vae3TPLvdeniOur2h1KZgPzmOKPaZMZ6KnIfm6jp1see" alt="Image" width="1600" height="930" loading="lazy">
<em>Every node can be a coordinator</em></p>
<p>The coordinator is responsible for many things, such as request batching, repairing data, or retries for reads and writes.</p>
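<p>To make the lookup step more concrete, here is a minimal sketch of a consistent hashing ring. It is an illustration under stated assumptions rather than Cassandra's implementation: the <code>Ring</code> class and <code>token</code> helper are hypothetical names, MD5 stands in for Cassandra's real partitioner, and each node gets a single token instead of many.</p>

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node owns the token range that ends at its own token.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [node_token for node_token, _ in self.entries]

    def owner(self, partition_key):
        # First node clockwise from the key's token; wrap around at the end.
        index = bisect.bisect_left(self.tokens, token(partition_key))
        return self.entries[index % len(self.entries)][1]


ring = Ring(["172.17.0.2", "172.17.0.3", "172.17.0.4"])
for key in ("US", "UK"):
    print(key, "is owned by", ring.owner(key))
```

<p>Because every node can run the same calculation, any node can act as coordinator without asking a central directory where the data lives. Adding a node to this ring only takes over keys from one neighbor, which is what makes the scheme "consistent".</p>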
<h3 id="heading-data-partitioning">Data Partitioning</h3>
<blockquote>
<p>“[Partitioning] is a method of splitting and storing a single logical dataset in multiple databases. By distributing the data among multiple machines, a cluster of database systems can store larger datasets and handle additional requests.”  </p>
<p><a target="_blank" href="https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6">How Sharding Works</a> by <a target="_blank" href="https://medium.com/@jeeyoungk">Jeeyoung Kim</a></p>
</blockquote>
<p>As with many other databases, you store data in Cassandra in a predefined schema. You need to define a table with columns and types for each column. </p>
<p>Additionally, you need to think about the primary key of your table. A primary key is mandatory and ensures data is uniquely identifiable by one or multiple columns. </p>
<p>The concept of primary keys is more complex in Cassandra than in traditional databases like MySQL. In Cassandra, the primary key consists of 2 parts: </p>
<ul>
<li>a mandatory partition key and</li>
<li>an optional set of clustering columns.</li>
</ul>
<p>You will learn more about the partition key and clustering columns in the data modeling section.</p>
<p>For now, let's focus on the partition key and its impact on data partitioning.</p>
<p>Consider the following table:</p>
<pre><code class="lang-shell">Table Users | Legend: p - Partition-Key, c - Clustering Column

country (p) | user_email (c)  | first_name | last_name | age
----------------------------------------------------------------
US          | john@email.com  | John       | Wick      | 55  
UK          | peter@email.com | Peter      | Clark     | 65  
UK          | bob@email.com   | Bob        | Sandler   | 23 
UK          | alice@email.com | Alice      | Brown     | 26
</code></pre>
<p>Together, the columns <code>user_email</code> and <code>country</code> make up the primary key.</p>
<p>The <code>country</code> column is the partition key (p). The <code>CREATE</code>-statement for the table looks like this:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.users_by_country (
    country text,
    user_email text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY ((country), user_email)
);
</code></pre>
<p>The first group of the primary key defines the partition key. All other elements of the primary key are clustering columns:</p>
<p><img src="https://lh4.googleusercontent.com/6WeEN0k3xnVfyOsFkZQctzCzUitUSPpM-kev6u5AvnzxCycPudQqfTX6XkiYwupwZ8XHCRJSwcGw1tB4BJe8qhZFybxshs1BZs6DlRg-Re0UCkyvS0oDRkUJhriqSYbjU7sdzMaK" alt="Image" width="1600" height="1087" loading="lazy"></p>
<p>Let’s fill the table with some data:</p>
<pre><code class="lang-shell">cqlsh&gt; 
INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
  VALUES('US', 'john@email.com', 'John','Wick',55);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
  VALUES('UK', 'peter@email.com', 'Peter','Clark',65);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
  VALUES('UK', 'bob@email.com', 'Bob','Sandler',23);

INSERT INTO learn_cassandra.users_by_country (country,user_email,first_name,last_name,age)
  VALUES('UK', 'alice@email.com', 'Alice','Brown',26);
</code></pre>
<p>If you’re used to designing traditional relational database tables the way it’s taught in school or university, you might be surprised. Why would you use <code>country</code> as an essential part of the primary key? </p>
<p>This example will make sense after you understand the basics of partitioning in Cassandra.</p>
<p>Partitioning is the foundation for scalability, and it is based on the partition key. In this example, partitions are created based on <code>country</code>. All rows with the <code>country</code> <code>US</code> are placed in a partition. All other rows with the country <code>UK</code> will be stored in another partition. </p>
<p>In the context of partitioning, the words partition and shard can be used interchangeably.</p>
<p><img src="https://lh4.googleusercontent.com/_APEp3Q3ugdLt1SR53Dej2x5_zOd17QrDFoBzVw9EFx6a0buHe9-A6eBZSAPRlPx-nyd_qU9WpUBcQIxN8uQDSFA_D3hWsFVb5TagJu3Y0fyRdpV0zdBTp8xZE4QWHIgfUg58AZo" alt="Image" width="1600" height="730" loading="lazy"></p>
<p>Partitions are created and filled based on partition key values. They are used to distribute data to different nodes. By distributing data to other nodes, you get scalability. You read and write data to and from different nodes by their partition key. </p>
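<p>The grouping step can be sketched in a few lines. This is only a model of the idea, assuming a hypothetical helper called <code>partition_rows</code>; it ignores how rows are ordered inside a partition and which node ends up holding each partition.</p>

```python
from collections import defaultdict

ROWS = [
    {"country": "US", "user_email": "john@email.com"},
    {"country": "UK", "user_email": "peter@email.com"},
    {"country": "UK", "user_email": "bob@email.com"},
    {"country": "UK", "user_email": "alice@email.com"},
]


def partition_rows(rows, partition_key):
    """Group rows by the value of their partition key column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row["user_email"])
    return dict(partitions)


for value, emails in partition_rows(ROWS, "country").items():
    print(value, emails)
# prints:
# US ['john@email.com']
# UK ['peter@email.com', 'bob@email.com', 'alice@email.com']
```

<p>Every row that shares a partition key value lands in the same group, so a skewed key (for example, most users living in one country) produces one very large partition.</p>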
<p>The distribution of data is a crucial point to understand when designing applications that store data based on partitions. It may take a while to get fully accustomed to this concept, especially if you are used to relational databases. </p>
<p>Rather than starting from normalized tables, think about how you read and write data and how partitioning should be done to scale horizontally.</p>
<blockquote>
<p><strong>What does horizontal scaling mean?</strong>  </p>
<p>Horizontal scaling means you can increase throughput by adding more nodes. If your data is distributed to more servers, then more CPU, memory, and network capacity is available.</p>
</blockquote>
<p>You might ask, then why do you even need <code>user_email</code> in the primary key?</p>
<p>The answer is that the primary key defines what columns are used to identify rows. You need to add all columns that are required to identify a row uniquely to the primary key. Using only the country would not identify rows uniquely.</p>
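<p>A small sketch shows what goes wrong when the key is too narrow. It models a table as a Python dict keyed by the primary key columns, which is a simplification: the <code>write_rows</code> helper is hypothetical, and a real Cassandra write replaces the written columns rather than a whole Python object.</p>

```python
ROWS = [
    {"country": "US", "user_email": "john@email.com", "first_name": "John"},
    {"country": "UK", "user_email": "peter@email.com", "first_name": "Peter"},
    {"country": "UK", "user_email": "bob@email.com", "first_name": "Bob"},
    {"country": "UK", "user_email": "alice@email.com", "first_name": "Alice"},
]


def write_rows(rows, key_columns):
    """Store rows in a dict keyed by the chosen primary key columns."""
    table = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        table[key] = row  # a second write with the same key replaces the first
    return table


by_country_only = write_rows(ROWS, ["country"])
by_country_and_email = write_rows(ROWS, ["country", "user_email"])
print(len(by_country_only), len(by_country_and_email))  # prints: 2 4
```

<p>With <code>country</code> alone as the key, the three UK users collapse into a single entry and only the last write survives. Adding <code>user_email</code> keeps all four rows distinct.</p>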
<p>The partition key is vital to distribute data evenly between nodes and essential when reading the data. The previously defined schema is designed to be queried by <code>country</code> because <code>country</code> is the partition key. </p>
<p>A query that selects rows by <code>country</code> performs well:</p>
<pre><code class="lang-shell">cqlsh&gt; 
  SELECT * FROM learn_cassandra.users_by_country WHERE country='US';
</code></pre>
<p>In your <code>cqlsh</code> shell, you will send a request only to a single Cassandra node by default. This is called a consistency level of one, which enables excellent performance and scalability.</p>
<p>If you access Cassandra differently, the default consistency level might not be one.</p>
<blockquote>
<p><strong>What does consistency level of one mean?</strong>  </p>
<p>A consistency level of one means that only a single node is asked to return the data. With this approach, you will lose strong consistency guarantees and instead experience eventual consistency.  </p>
<p>We’ll dive deeper into consistency levels later on.</p>
</blockquote>
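<p>As a rough illustration of what eventual consistency can look like, the toy model below keeps three copies of one value and lets a write reach only some of them before a read happens. It is a sketch under loud assumptions: the replica list, <code>write_age</code>, and <code>read_age</code> are hypothetical, and a real cluster propagates the write to the remaining replicas in the background.</p>

```python
# Three replicas that all start with the same copy of John's age.
replicas = [{"age": 55}, {"age": 55}, {"age": 55}]


def write_age(age, acknowledged_by):
    """Apply the write only on the replicas that have acknowledged it so far."""
    for index in acknowledged_by:
        replicas[index]["age"] = age


def read_age(replica_index):
    """Consistency level one: trust whatever a single replica returns."""
    return replicas[replica_index]["age"]


write_age(56, acknowledged_by=[0])
print(read_age(0), read_age(2))  # prints: 56 55
```

<p>The read from the third replica returns the old value until that replica catches up. That gap is the price of asking only one node.</p>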
<p>Let's create another table. This one has a partition defined only by the <code>user_email</code> column:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.users_by_email (
    user_email text,
    country text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY (user_email)
);
</code></pre>
<p>Now let’s fill this table with some records:</p>
<pre><code class="lang-shell">cqlsh&gt; 
INSERT INTO learn_cassandra.users_by_email (user_email, country,first_name,last_name,age)
  VALUES('john@email.com', 'US', 'John','Wick',55);

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
  VALUES('peter@email.com', 'UK', 'Peter','Clark',65); 

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
  VALUES('bob@email.com', 'UK', 'Bob','Sandler',23);

INSERT INTO learn_cassandra.users_by_email (user_email,country,first_name,last_name,age)
  VALUES('alice@email.com', 'UK', 'Alice','Brown',26);
</code></pre>
<p>This time, each row is put in its own partition.</p>
<p><img src="https://lh3.googleusercontent.com/idG07l3IB5r_XmkI2drNIpOkB9fAhq4N9VNi_yiI6pLZFgDrFUrXizLSpO41-2RYfb_pUHqGdY641SkpUhHwz9zgWb5tQRJnccAkv0fVy4gr2wAx4orr0FPa_IaMfhkp1bmDi_5q" alt="Image" width="1600" height="817" loading="lazy"></p>
<p>This is not bad, per se. If you want to optimize for getting data by <code>user_email</code> only, it's a good idea:</p>
<pre><code class="lang-shell">cqlsh&gt; 
  SELECT * FROM learn_cassandra.users_by_email WHERE user_email='alice@email.com';
</code></pre>
<p>If you set up your table with a partition key for <code>user_email</code> and want to get all users by <code>age</code>, you would need to get the data from all partitions because the partitions were created by <code>user_email</code>.</p>
<p>Talking to all nodes is expensive and can cause performance issues on a large cluster.</p>
<p>Cassandra tries to avoid harmful queries. If you want to filter by a column that is not a partition key, you need to tell Cassandra explicitly that you want to filter by a non-partition key column:</p>
<pre><code class="lang-shell">cqlsh&gt; 
SELECT * FROM learn_cassandra.users_by_email WHERE age=26 ALLOW FILTERING;
</code></pre>
<p>Without <code>ALLOW FILTERING</code>, Cassandra refuses to execute the query, which protects the cluster from accidentally running expensive queries. Queries without conditions (for example, without a <code>WHERE</code> clause) or with conditions that don’t use the partition key are costly and should be avoided to prevent performance bottlenecks.</p>
<p>But how do you get all the rows from the table in a scalable way?</p>
<p>If you can, partition by a value like <code>country</code>. If you know all the countries, you can then iterate over all available countries, send a query for each one, and collect the results in your application.</p>
<p>In terms of scalability, it’s worse to just select all rows: when you use a table partitioned by <code>user_email</code>, all the data is collected in 1 request by a single coordinator.</p>
<p>This is OK as long as you have no performance issues.</p>
<p>By comparison, sending multiple requests by <code>country</code> distributes the effort to different coordinator nodes, which scales a lot better.</p>
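<p>The per-country approach could look roughly like the sketch below. The function <code>fetch_all_users</code> and its <code>execute</code> parameter are hypothetical; <code>fake_execute</code> is a stand-in so the example runs without a cluster, and the list of countries is assumed to be known by your application.</p>

```python
QUERY = "SELECT * FROM learn_cassandra.users_by_country WHERE country = %s"

# Stand-in data so the sketch runs without a cluster (hypothetical).
FAKE_DATA = {
    "US": ["john@email.com"],
    "UK": ["peter@email.com", "bob@email.com", "alice@email.com"],
}


def fake_execute(query, parameters):
    """Pretend to run a single-partition query and return its rows."""
    return FAKE_DATA.get(parameters[0], [])


def fetch_all_users(execute, countries):
    """Run one single-partition query per country and merge the rows."""
    users = []
    for country in countries:
        users.extend(execute(QUERY, (country,)))
    return users


print(len(fetch_all_users(fake_execute, ["US", "UK"])))  # prints: 4
```

<p>Each call touches one partition only, so different requests can be handled by different coordinators. With the DataStax Python driver, passing a session's <code>execute</code> method as the <code>execute</code> argument should fit this shape, though that wiring is an assumption and is not tested here.</p>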
<p>If you still need access to all of the data, there is an excellent <a target="_blank" href="https://github.com/datastax/spark-cassandra-connector">integration between Spark and Cassandra</a> that allows efficient reads and writes for massive datasets. The Spark connector for Cassandra groups your data by partition key and can execute queries very efficiently.</p>
<h3 id="heading-replication">Replication</h3>
<p>Scalability using partitioning alone is limited.</p>
<p>Consider a lot of write requests arriving for a single partition. All requests would be sent to a single node with technical limitations such as CPU, memory, and bandwidth. Additionally, you want to handle read and write requests if this node is not available.</p>
<p>That is where the concept of replication comes in. By duplicating data to different nodes, so-called replicas, you can serve more data simultaneously from other nodes to improve latency and throughput. It also enables your cluster to perform reads and writes in case a replica is not available.</p>
<p>In Cassandra, you need to define a replication factor for every keyspace. At the beginning of our example, you created a keyspace with a replication factor of 3 for our default datacenter:</p>
<pre><code class="lang-shell">cqlsh&gt; CREATE KEYSPACE learn_cassandra
  WITH REPLICATION = { 
   'class' : 'NetworkTopologyStrategy',
   'datacenter1' : 3 
  };
</code></pre>
<p>A replication factor of one means there’s only one copy of each row in the cluster. If the node containing the row goes down, the row cannot be retrieved.</p>
<p>A replication factor of two means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica.</p>
<p>As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.</p>
<p>Usually, it's recommended to use a replication factor of 3 for production use cases. It makes sure your data is very unlikely to get lost or become inaccessible because there are three copies available. Also, if data is not consistent between replicas at any point in time, you can ask what information state is held by the majority.</p>
<p>In your local cluster setup, the majority means 2 out of 3 replicas. This allows us to use some powerful query options that you will see in the next section.</p>
<h3 id="heading-consistency-level">Consistency Level</h3>
<p>Now that you know about partitioning and replication, you are ready to think about consistency levels. Cassandra has a truly outstanding feature called tunable consistency. </p>
<p>You can define the consistency level of your read and write queries. You can check the <a target="_blank" href="https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/dml/dmlConfigConsistency.html">Cassandra docs</a> for all available settings.</p>
<p>Let’s focus on the most popular settings and try to understand when to choose each consistency level.</p>
<p>Let’s assume you have 3 replicas defined.</p>
<p>The first question you need to answer is, do you need strong consistency?</p>
<blockquote>
<p><strong>What does strong consistency mean?</strong>  </p>
<p>In contrast to eventual consistency, strong consistency means only one state of your data can be observed at any time in any location.  </p>
<p>For example, when consistency is critical, like in a banking domain, you want to be sure that everything is correct. You would rather accept a decrease in availability and increase of latency to ensure correctness.</p>
</blockquote>
<p>It all comes down to the <a target="_blank" href="https://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a>. You cannot be both available and consistent when there are connection issues between the nodes of your cluster.</p>
<p>Let's think through the following example:</p>
<p>You want to write a single value to a table. The data is replicated in 2 nodes, and the connection between the nodes is interrupted. First, a write-request is sent to node 1. Then, data is read from node 2.</p>
<p>How do you manage this situation?</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/image-62.png" alt="Image" width="600" height="400" loading="lazy"></p>
<ol>
<li>Should you disallow writes to all nodes to ensure consistency? This means availability would be sacrificed to ensure consistency and correctness.</li>
<li>Or should you accept the write to node 1 and keep serving reads from both nodes? This would keep the system available, but depending on which node you read from, the answer will be different, which means sacrificing consistency for availability.</li>
</ol>
<p>You can simplify the problem to make crucial decisions for your application: Do you want consistency or availability? </p>
<p>Another factor is latency. By talking to more nodes to ensure consistency, you need to wait longer to receive all nodes’ responses.</p>
<h3 id="heading-tune-for-consistency-by-setting-up-a-strong-consistency-application">Tune for Consistency by Setting up a Strong Consistency Application</h3>
<p>There is a very important formula that, if true, guarantees strong consistency:</p>
<pre><code>[read-consistency-level] + [write-consistency-level] &gt; [replication-factor]
</code></pre><blockquote>
<p><strong>What does consistency level mean?</strong>  </p>
<p>Consistency level means how many nodes need to acknowledge a read or a write query.</p>
</blockquote>
<p>You can shift read and write consistency levels in your favor if you want to keep strong consistency. Or you can even give up strong consistency for better performance, which is called eventual consistency:</p>
<p><img src="https://lh4.googleusercontent.com/TTm1Mgq3koomlkP5QWTzfdGrFwcII88ltYepXg5dVeF1JKaCp1K22qJHfhZN_WuG6B-MV3sWw8wNpOv26PtmlUbYTL001HPDPcQnS0wwgkSR4QxmP32_inoYa3gDcb6oUsmGSLPv" alt="Image" width="1600" height="488" loading="lazy"></p>
<p>For a read-heavy system, it’s recommended to keep read consistency low because reads happen more often than writes. Let's say you have a replication factor of 3. The formula would look like this:</p>
<pre><code><span class="hljs-number">1</span> + [write-consistency-level] &gt; <span class="hljs-number">3</span>
</code></pre><p>Therefore, the write consistency has to be set to 3 to have a strongly consistent system.</p>
<p>For a write-heavy system, you can do the same. Set the write consistency level to 1 and the read consistency level to 3.</p>
<p>You either check every node for a read to ensure all nodes have received the last updated state, or, for a write, you ensure that all nodes have written the update to their local storage. Both will make sure that data for reading and writing is correct.</p>
<p>This decision needs to be reflected in all the applications that access your Cassandra data because, on a query level, you need to set the required consistency level.</p>
<p>You set a replication factor of 3. Therefore, you can use a consistency level of <code>ALL</code> or <code>THREE</code>:</p>
<pre><code class="lang-shell">cqlsh&gt; 
   CONSISTENCY ALL;
   SELECT * FROM learn_cassandra.users_by_country WHERE country='US';
</code></pre>
<p>If just one of your applications violates the required consistency strategy, you quickly risk either losing consistency or putting more pressure on the cluster than required.</p>
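<p>The formula also allows a middle ground between 1 and 3. With a replication factor of 3, reading and writing with <code>QUORUM</code> (2 out of 3 replicas) gives 2 + 2 &gt; 3, so every read overlaps with the latest write while one replica may be unavailable. A minimal sketch, reusing the <code>users_by_country</code> table and assuming every application that writes to it also uses <code>QUORUM</code>:</p>
<pre><code class="lang-shell">cqlsh&gt; 
   CONSISTENCY QUORUM;
   SELECT * FROM learn_cassandra.users_by_country WHERE country='US';
</code></pre>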
<h3 id="heading-tune-for-performance-by-using-eventual-consistency">Tune for Performance by Using Eventual Consistency</h3>
<p>If you don't need to be strongly consistent, you can reduce the consistency level for queries to 1 to gain performance:</p>
<pre><code class="lang-shell">cqlsh&gt; 
   CONSISTENCY ONE;
   SELECT * FROM learn_cassandra.users_by_country WHERE country='US';
</code></pre>
<p>Eventually, the data will be spread to all replicas and this will ensure <em>eventual</em> consistency. How fast data will be made consistent depends on different mechanics that sync data between nodes.</p>
<p>Various features can be tuned in Cassandra, like read-repairs and external processes that repair data continuously.</p>
<h3 id="heading-optimize-data-storage-for-reading-or-writing">Optimize Data Storage for Reading or Writing</h3>
<p>Writes are cheaper than reads in Cassandra due to its storage engine. Writing data means simply appending something to a so-called commit-log.</p>
<p>Commit-logs are append-only logs of all mutations local to a Cassandra node and reduce the required I/O to a minimum.</p>
<p>Reading is more expensive, because it might require checking different disk locations until all the query data is eventually found. </p>
<p>But this does not mean Cassandra is terrible at reading. Instead, Cassandra's storage engine can be tuned for reading performance or writing performance.</p>
<h3 id="heading-understanding-compaction">Understanding Compaction</h3>
<p>For every write operation, data is written to disk to provide durability. This means that if something goes wrong, like a power outage, data is not lost.</p>
<p>The so-called <a target="_blank" href="https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html">SSTables</a> are the foundation for storing data. SSTables are immutable data files that Cassandra uses to persist data on disk.</p>
<p>You can set various strategies for a table that define how data should be merged and compacted. These strategies affect read and write performance:</p>
<ul>
<li><code>SizeTieredCompactionStrategy</code> is the default, and is especially performant if you have more writes than reads</li>
<li><code>LeveledCompactionStrategy</code> optimizes for reads over writes. This optimization can be costly and needs to be tried out carefully in production</li>
<li><code>TimeWindowCompactionStrategy</code> is for time-series data</li>
</ul>
<p>By default, tables use the <code>SizeTieredCompactionStrategy</code>:</p>
<pre><code class="lang-shell">cqlsh&gt; 
   DESCRIBE TABLE learn_cassandra.users_by_country;

CREATE TABLE learn_cassandra.users_by_country (
    country text,
    user_email text,
    age smallint,
    first_name text,
    last_name text,
    PRIMARY KEY (country, user_email)
) WITH CLUSTERING ORDER BY (user_email ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
</code></pre>
<p>Although you can alter the compaction strategy of an existing table, I would not suggest doing so, because all Cassandra nodes start this migration simultaneously. This will lead to significant performance issues in a production system.</p>
<p>Instead, define the compaction strategy explicitly when you create a new table:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.users_by_country_with_leveled_compaction (
    country text,
    user_email text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY ((country), user_email)
) WITH
  compaction = { 'class' :  'LeveledCompactionStrategy'  };
</code></pre>
<p>Let’s check the result:</p>
<pre><code class="lang-shell">cqlsh&gt; 
   DESCRIBE TABLE learn_cassandra.users_by_country_with_leveled_compaction;

CREATE TABLE learn_cassandra.users_by_country_with_leveled_compaction (
    country text,
    user_email text,
    age smallint,
    first_name text,
    last_name text,
    PRIMARY KEY (country, user_email)
) WITH CLUSTERING ORDER BY (user_email ASC)
    AND bloom_filter_fp_chance = 0.1
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
</code></pre>
<p>The strategies define when and how compaction is executed. Compaction means rearranging data on disk to remove old data and keep performance as good as possible when more data needs to be stored.</p>
<p>Check out the excellent <a target="_blank" href="https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbInternals/dbIntHowDataMaintain.html#dbIntHowDataMaintain__dml_types_of_compaction">DataStax documentation about compaction</a> for details. There may even be better strategies in the future for the performance of your use-case.</p>
<h3 id="heading-presorting-data-on-cassandra-nodes">Presorting Data on Cassandra Nodes</h3>
<p>A table always requires a primary key. A primary key consists of 2 parts:</p>
<ul>
<li>At least one column as the partition key, and</li>
<li>Zero or more clustering columns that sort the rows within a partition.</li>
</ul>
<p>All columns of the partition key together are used to identify partitions. All primary key columns, meaning partition key and clustering columns, identify a specific row within a partition.</p>
<p>In Cassandra, data is already sorted on disk. So if you want to avoid sorting data later, you can make sure sorting is applied as needed. This can be ensured on the table level and avoids having to sort data in the client applications that query Cassandra.</p>
<p>In our <code>users_by_country</code> table, you can define <code>age</code> as another clustering column to sort stored data:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.users_by_country_sorted_by_age_asc (
    country text,
    user_email text,
    first_name text,
    last_name text,
    age smallint,
    PRIMARY KEY ((country), age, user_email)
) WITH CLUSTERING ORDER BY (age ASC);
</code></pre>
<p>Let’s add the same data again:</p>
<pre><code class="lang-shell">cqlsh&gt; 
INSERT INTO learn_cassandra.users_by_country_sorted_by_age_asc (country,user_email,first_name,last_name,age)
  VALUES('US','john@email.com', 'John','Wick',10);

INSERT INTO learn_cassandra.users_by_country_sorted_by_age_asc (country,user_email,first_name,last_name,age)
  VALUES('UK', 'peter@email.com', 'Peter','Clark',30);

INSERT INTO learn_cassandra.users_by_country_sorted_by_age_asc (country,user_email,first_name,last_name,age)
  VALUES('UK', 'bob@email.com', 'Bob','Sandler',20);

INSERT INTO learn_cassandra.users_by_country_sorted_by_age_asc (country,user_email,first_name,last_name,age)
  VALUES('UK', 'alice@email.com', 'Alice','Brown',40);
</code></pre>
<p>And get the data by country:</p>
<pre><code class="lang-shell">cqlsh&gt; 
      SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK';

 country | age | user_email       | first_name | last_name
---------+-----+------------------+------------+-----------
      UK |  20 | bob@email.com    |        Bob |   Sandler
      UK |  30 | peter@email.com  |      Peter |     Clark
      UK |  40 | alice@email.com  |      Alice |     Brown

(3 rows)
</code></pre>
<p>In this example, the clustering columns are <code>age</code> and <code>user_email</code>. So the data is first sorted by age and then by <code>user_email</code>. At its core, Cassandra is still like a key-value store. Therefore, you can only query the table by:</p>
<ul>
<li><code>country</code></li>
<li><code>country</code> and <code>age</code></li>
<li><code>country</code>, <code>age</code>, and <code>user_email</code></li>
</ul>
<p>But never by <code>country</code> and <code>user_email</code>.</p>
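<p>To make these rules concrete, here is a short sketch against the <code>users_by_country_sorted_by_age_asc</code> table from above. The first four queries follow the primary key from left to right. The last one skips the <code>age</code> clustering column and should be rejected unless you add <code>ALLOW FILTERING</code>:</p>
<pre><code class="lang-shell">cqlsh&gt; 
-- Partition key only
SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK';

-- Partition key plus the first clustering column
SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK' AND age=30;

-- Rows are stored sorted by age, so a range on age is also possible
SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK' AND age&gt;=20 AND age&lt;40;

-- Partition key plus all clustering columns, which identifies a single row
SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK' AND age=30 AND user_email='peter@email.com';

-- Skips the age column: not allowed without ALLOW FILTERING
SELECT * FROM learn_cassandra.users_by_country_sorted_by_age_asc WHERE country='UK' AND user_email='peter@email.com';
</code></pre>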
<p>After learning about partitioning, replication and consistency levels, let's head into data modeling and have more fun with the Cassandra cluster.</p>
<h2 id="heading-data-modeling">Data Modeling</h2>
<p>You've already learned a lot about the fundamentals of Cassandra.</p>
<p>Let's put your knowledge into practice and design a to-do list application that receives many more reads than writes.</p>
<p>The best approach is to analyze some user stories you want to fulfill with your table design:</p>
<ol>
<li>As a user, I want to create a to-do element   </li>
</ol>
<p>Note: This is only about creating data. For now, you can delay some decisions because you want to focus on how data is read.</p>
<ol start="2">
<li>As a user, I want to list all my to-do elements in ascending order  </li>
</ol>
<p>First, you need to query by <code>user_email</code>. Create a table called <code>todo_by_user_email</code>.</p>
<p>You need 1 table that contains all the information of a to-do element of a user. Data should be partitioned by <code>user_email</code> for efficient reads and writes by <code>user_email</code>.</p>
<p>Also, the oldest records should be displayed first, which means using the creation date as a clustering column. The <code>creation_date</code> also ensures uniqueness, as long as a user never creates two to-dos with exactly the same timestamp:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.todo_by_user_email (
    user_email text,
    name text,
    creation_date timestamp,
    PRIMARY KEY ((user_email), creation_date)
) WITH CLUSTERING ORDER BY (creation_date ASC)
AND compaction = { 'class' :  'LeveledCompactionStrategy'  };
</code></pre>
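<p>Reading the list for user story 2 could then look like the following sketch. The explicit <code>ORDER BY</code> is not required when the table's clustering order already matches, but it documents the intent and is valid for a query on a single partition:</p>
<pre><code class="lang-shell">cqlsh&gt; 
SELECT creation_date, name FROM learn_cassandra.todo_by_user_email
  WHERE user_email='alice@email.com'
  ORDER BY creation_date ASC;
</code></pre>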
<ol start="3">
<li>As a user, I want to share a to-do element with another user</li>
</ol>
<p>To get all the to-dos shared with a user, you need to create a table called <code>todos_shared_by_target_user_email</code> to display all shared to-dos for the target user. </p>
<p>The table contains the to-do name to display it.</p>
<p>But the user also wants to see the to-dos they shared with other users. This is another table, <code>todos_shared_by_source_user_email</code>.</p>
<p>Both tables have, according to the use-case, the required <code>user_email</code> as partition keys to allow efficient queries. Also, <code>creation_date</code> is added as a clustering column for sorting and uniqueness:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE TABLE learn_cassandra.todos_shared_by_target_user_email (
    target_user_email text,
    source_user_email text,
    creation_date timestamp,
    name text,
    PRIMARY KEY ((target_user_email), creation_date)
) WITH CLUSTERING ORDER BY (creation_date DESC)
AND compaction = { 'class' :  'LeveledCompactionStrategy'  };

CREATE TABLE learn_cassandra.todos_shared_by_source_user_email (
    target_user_email text,
    source_user_email text,
    creation_date timestamp,
    name text,
    PRIMARY KEY ((source_user_email), creation_date)
) WITH CLUSTERING ORDER BY (creation_date DESC)
AND compaction = { 'class' :  'LeveledCompactionStrategy'  };
</code></pre>
<p>This type of modeling is different from thinking about foreign keys and primary keys that you might know from traditional databases. In the beginning, it's all about defining tables and thinking about what values you want to filter and need to display.</p>
<p>You need to set a partition key to ensure the data is organized for efficient read and write operations. Also, you need to set clustering columns to ensure uniqueness, sort order, and optional query parameters.</p>
<h3 id="heading-keep-data-in-sync-using-batch-statements">Keep Data in Sync Using <code>BATCH</code> Statements</h3>
<p>Due to the duplication, you need to take care to keep data consistent. In Cassandra, you can do that by using <code>BATCH</code> statements that give you an all-at-once guarantee, also called atomicity.</p>
<p>This might sound like a lot of work, and yes, it is a lot of work! If you have a table schema with many relationships, you will have more work compared to a normalized table schema.</p>
<blockquote>
<p><strong>What is a normalized table schema?</strong>  </p>
<p>A normalized table schema is optimized to contain no duplications. Instead, data is referenced by ID and needs to be joined later.  </p>
<p>In Cassandra, you try to avoid normalized tables. It is not even possible to write a query that contains a join.</p>
</blockquote>
<p>Batch statements are cheap on a single partition, but dangerous when you execute them on different partitions, because:</p>
<ul>
<li>Data mutations are not applied to all partitions at the same time, and there is no isolation between them</li>
<li>It is expensive for the coordinator node, because it has to talk to multiple nodes and first store the batch in a batch log so that it can be replayed if something goes wrong</li>
<li>There is a batch query size limit of 50 KB to avoid overloading the coordinator. This limit can be increased, but this is not recommended</li>
</ul>
<p>In general, batches are costly.</p>
<p>There are other ways to apply changes eventually. If you need to execute them very often, consider using async queries instead with a proper retry mechanism. </p>
<p>Depending on the way you access your Cassandra, the driver might already offer you retry capabilities.</p>
<p>Still, this approach requires thinking about what will happen if a query is never executed. If every query really needs to be executed eventually, how can you make sure that it does not get lost if your service goes down?</p>
<p>The topic itself needs much more time to explain, and might be the main topic of another Cassandra tutorial.</p>
<p>The key learning here is: </p>
<ul>
<li>Single partition batches are cheap and should be used</li>
<li>Batches that include different partitions are expensive, and if there are a lot of reads/writes, this might be the reason why a Cassandra cluster is exhausted.  </li>
</ul>
<p>Let’s create a <code>BATCH</code> statement that contains a to-do element that is shared with a user:</p>
<pre><code class="lang-shell">cqlsh&gt; 

BEGIN BATCH
  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('alice@email.com', toTimestamp(now()), 'My first todo entry')

  INSERT INTO learn_cassandra.todos_shared_by_target_user_email (target_user_email, source_user_email,creation_date,name) VALUES('bob@email.com', 'alice@email.com',toTimestamp(now()), 'My first todo entry')

  INSERT INTO learn_cassandra.todos_shared_by_source_user_email (target_user_email, source_user_email,creation_date,name) VALUES('bob@email.com', 'alice@email.com', toTimestamp(now()), 'My first todo entry')

APPLY BATCH;
</code></pre>
<p>Let’s look into one of the tables:</p>
<pre><code class="lang-shell">cqlsh&gt;          
 SELECT * FROM learn_cassandra.todos_shared_by_target_user_email WHERE target_user_email='bob@email.com';

 target_user_email | creation_date  | name                | source_user_email
-------------------+----------------+---------------------+-------------------
     bob@email.com | 2021-05-24 ... | My first todo entry |   alice@email.com
</code></pre>
<p>All the data exists and can be accessed in a performant way using all the defined tables.</p>
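<p>For contrast, here is a sketch of the cheap case: a batch whose statements all target the same partition (<code>user_email='alice@email.com'</code>) of the same table. The to-do names and the explicit timestamps are made up for illustration. Fixed timestamps are used so that the two rows cannot end up with the same clustering key:</p>
<pre><code class="lang-shell">cqlsh&gt; 

BEGIN BATCH
  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('alice@email.com', '2021-05-24 10:00:00+0000', 'Buy milk');

  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('alice@email.com', '2021-05-24 10:05:00+0000', 'Call Bob');

APPLY BATCH;
</code></pre>
<p>Because both rows belong to one partition, the batch is sent to the replicas of that partition only and is applied there as a single mutation.</p>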
<h3 id="heading-use-foreign-keys-instead-of-duplicating-data-in-cassandra">Use Foreign Keys Instead of Duplicating Data in Cassandra</h3>
<p>You might consider using foreign keys instead of duplicating data.</p>
<p>Traditionally, a foreign key is a reference to the ID of an entity that is stored in another table. In a relational database, it guarantees that the referenced ID exists.</p>
<p>In Cassandra, this might feel good because you have less duplicated data. At this point, think again about why you use Cassandra. Usually, the answer is high traffic and scalability.</p>
<p>Cassandra can scale enormously and comes with top performance when used correctly.</p>
<p>Normalizing tables is against a lot of principles in Cassandra. You can reference data by ID, but keep in mind this means you need to join the data yourself. This also means reading and writing data to multiple partitions at once.</p>
<p>Cassandra is built for scale. If you start normalizing your schema to reduce duplication, then you sacrifice horizontal scalability.</p>
<p>If you still want to use foreign keys instead of data duplication, you might want to use another database. But, everything comes with trade-offs.</p>
<p>Instead of using Cassandra, you could use a database that sacrifices performance and availability, and gives more consistency guarantees. In cases like this, I can recommend Cloud Spanner or CockroachDB as a scalable relational database.</p>
<h3 id="heading-indexes-in-cassandra">Indexes in Cassandra</h3>
<p>There are index-like features in Cassandra that can reduce the number of tables you need to maintain on your own. One feature is called secondary indexes.</p>
<p>I cannot recommend them because they only operate locally to a node.</p>
<p>Using a secondary index means talking to all nodes because the coordinator doesn’t know which nodes contain the data if you use other columns to query data than the actual partition key.</p>
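<p>For reference, a secondary index is created as sketched below. The index name <code>users_by_country_age_idx</code> is hypothetical. Restricting the query to a partition key in addition to the indexed column is generally the less risky way to use an index, because only the replicas of that partition are involved:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE INDEX users_by_country_age_idx ON learn_cassandra.users_by_country (age);

-- Indexed column only: works without ALLOW FILTERING, but may have to contact every node
SELECT * FROM learn_cassandra.users_by_country WHERE age=26;

-- Partition key plus indexed column: limited to the 'US' partition
SELECT * FROM learn_cassandra.users_by_country WHERE country='US' AND age=26;
</code></pre>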
<h3 id="heading-materialized-views">Materialized Views</h3>
<p>Materialized views were designed with scalability in mind.</p>
<p>They make it easier to duplicate tables with different partition keys so you can query data by different column combinations. They also simplify the process of creating a new table and ensuring data integrity for mutations.</p>
<p>There is one main restriction: the materialized view's primary key must contain all primary key columns of the source table, plus, optionally, one other column.</p>
<p>The columns that act as partition keys can be different.</p>
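<p>As a sketch, a materialized view that exposes <code>users_by_country</code> by email could look like this. The view name <code>users_by_email_view</code> is hypothetical, and depending on your Cassandra version, materialized views may be flagged as experimental and need to be enabled in the node configuration first:</p>
<pre><code class="lang-shell">cqlsh&gt; 
CREATE MATERIALIZED VIEW learn_cassandra.users_by_email_view AS
  SELECT country, user_email, first_name, last_name, age
  FROM learn_cassandra.users_by_country
  WHERE user_email IS NOT NULL AND country IS NOT NULL
  PRIMARY KEY ((user_email), country);

-- The view can now be queried by its own partition key
SELECT * FROM learn_cassandra.users_by_email_view WHERE user_email='alice@email.com';
</code></pre>
<p>Both primary key columns of the source table (<code>country</code> and <code>user_email</code>) appear in the view's primary key, but <code>user_email</code> now acts as the partition key.</p>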
<h2 id="heading-running-a-cluster">Running a Cluster</h2>
<p>Running a Cassandra cluster can be intense. It contains your business-critical data and is usually under heavy pressure.</p>
<p>I won't go into details because I am more of a Cassandra user than an expert in cluster maintenance. Still, I want to share what I know.</p>
<h3 id="heading-fully-managed-cassandra">Fully Managed Cassandra</h3>
<p>DataStax started a fully managed Cassandra product called <a target="_blank" href="https://www.datastax.com/products/datastax-astra">Astra</a>. They promise a lot:</p>
<blockquote>
<ul>
<li>Start in minutes with a free tier, no credit card needed.  </li>
<li>Eliminate the overhead to install, operate, and scale Cassandra clusters.  </li>
<li>Build faster with REST, GraphQL, CQL, and JSON/Document APIs.  </li>
<li>Built on open-source Apache Cassandra™, used by the best of the internet.  </li>
<li>Scale elastically — apps are viral ready from Day 1.  </li>
<li>Deploy multi-cloud, multi-tenant or dedicated clusters on AWS, Azure, or GCP.  </li>
<li>Ensure enterprise-level reliability, security, and management.  </li>
</ul>
<p>Quoted from the <a target="_blank" href="https://www.datastax.com/products/datastax-astra">Astra docs</a></p>
</blockquote>
<p>I have no experience with their offering. But I would give it a try! Their <a target="_blank" href="https://www.datastax.com/products/datastax-astra/pricing">pricing</a> sounds reasonable.</p>
<h3 id="heading-self-managed-cassandra">Self-Managed Cassandra</h3>
<p>Cassandra is built with Java. So knowing the basics of running JVM applications is very beneficial.</p>
<p>If you run Kubernetes, then definitely check out <a target="_blank" href="https://k8ssandra.io/">K8ssandra</a>. It bundles all the helpful tools around Cassandra like:</p>
<ul>
<li><a target="_blank" href="https://stargate.io/">Stargate.io</a> for REST, GraphQL, and API documentation</li>
<li><a target="_blank" href="http://cassandra-reaper.io/">Reaper</a> for easier repair management</li>
<li><a target="_blank" href="https://github.com/spotify/cassandra-medusa">Medusa</a> for backups</li>
<li><a target="_blank" href="https://github.com/datastax/metric-collector-for-apache-cassandra">Metrics collector</a> for monitoring</li>
<li><a target="_blank" href="https://docs.k8ssandra.io/tasks/connect/ingress/">Traefik</a> for ingress</li>
</ul>
<p>This stack of tools is fully open source and can be used without any additional monetary costs.</p>
<p>For developers, there is one very beneficial tool called <a target="_blank" href="https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsNodetool.html">nodetool</a>. It can inspect and provide insights into how many nodes are up, what size certain tables have, how many SSTables and tombstones exist. Nodetool can also repair your data to enforce eventual consistency.</p>
<h2 id="heading-other-learnings">Other Learnings</h2>
<p>Even after years of using Cassandra, there are still things to learn that let you use Cassandra more efficiently. In this section, I want to share various topics that you will encounter eventually.</p>
<h3 id="heading-data-migrations">Data Migrations</h3>
<p>If you have worked with other databases before, you might know database migration tools like Flyway or Liquibase. Since version 4.0 RC-1, there is basic <a target="_blank" href="https://docs.liquibase.com/workflows/database-setup-tutorials/cassandra.html">Liquibase support</a>.</p>
<p>Additionally, the community worked on something similar with <a target="_blank" href="https://github.com/patka/cassandra-migration">Cassandra-migration</a>. It already supports advanced features such as leader election, for when multiple services start at the same time.</p>
<p>Any type of export and import can be done using <a target="_blank" href="https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkCmd.html">DSBulk</a>, which allows loading data into and unloading data from Cassandra in CSV and JSON formats.</p>
<h3 id="heading-tombstones">Tombstones</h3>
<p>Cassandra is a multi-node cluster that contains replicated data on different nodes. Therefore, a delete cannot simply remove a particular record.</p>
<p>For a delete operation, a new entry is added to the commit-log like for any other insert and update mutation. These deletes are called tombstones, and they flag a specific value for deletion.</p>
<p>Tombstones exist only on disk and can be analyzed and traced as described in this blog post: <a target="_blank" href="https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html">About Deletes and Tombstones in Cassandra</a>.</p>
<p>In Cassandra, you can set a time to live (TTL) on inserted data. After the time has passed, the record is automatically deleted. When you set a TTL, a tombstone is created with a date in the future.</p>
<p>A regular delete query works the same way, except that the timestamp of the tombstone is set to the moment the delete is executed.</p>
<p>Let’s create a tombstone by setting a TTL in seconds, which basically functions as a delayed delete:</p>
<pre><code class="lang-shell">cqlsh&gt;     
  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('john@email.com', toTimestamp(now()), 'This entry should be removed soon') USING TTL 60;
</code></pre>
<p>And the data is stored like regular data:</p>
<pre><code class="lang-shell">cqlsh&gt;      
 SELECT * FROM learn_cassandra.todo_by_user_email WHERE user_email='john@email.com';

  user_email    | creation_date | name
----------------+---------------+--------------------
 john@email.com | 2021-05-30... | This entry should be removed soon

(1 rows)
</code></pre>
<p>You can also read the TTL from the database for a given column:</p>
<pre><code class="lang-shell">cqlsh&gt; 
 SELECT TTL(name) FROM learn_cassandra.todo_by_user_email WHERE user_email='john@email.com';

 ttl(name)
-----------
        43

(1 rows)
</code></pre>
<p>After 60 seconds, the row is gone.</p>
<pre><code class="lang-shell">cqlsh&gt;  
 SELECT * FROM learn_cassandra.todo_by_user_email WHERE user_email='john@email.com';                                  

 user_email | creation_date | name
------------+---------------+------

(0 rows)
</code></pre>
<p>Setting a TTL is one of several ways to create tombstones.</p>
<p>Unfortunately, there are also others.</p>
<p>For example, when you insert a null value, a tombstone is created for the given cell. And as mentioned for delete requests, different types of tombstones are stored. </p>
<p>By default, data marked by a tombstone becomes eligible for removal after 10 days and is then freed by the next compaction that processes it. You can change this period per table with the <code>gc_grace_seconds</code> table option.</p>
<blockquote>
<p><strong>When is a compaction executed?</strong>  </p>
<p>When the operation is executed depends mainly on the selected strategy. In general, a compaction execution takes <code>SSTables</code> and creates new <code>SSTables</code> out of them.  </p>
<p>The most common executions are:  </p>
<ul>
<li>An automatic compaction, triggered when data is written and the conditions of the compaction strategy are met</li>
<li>A manually executed major compaction using <code>nodetool</code></li>
</ul>
</blockquote>
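<p>To make the timing concrete, here is a minimal Java sketch of the rule described above. It is a simplified model and not Cassandra's actual implementation: the class <code>TombstoneGc</code> and the method <code>isPurgeable</code> are hypothetical names, and the only assumption is that a compaction may drop a tombstone once the grace period since the delete has elapsed.</p>
<pre><code class="lang-java">import java.time.Duration;
import java.time.Instant;

// Simplified model for illustration only, not Cassandra's implementation.
final class TombstoneGc {
    // The default grace period of 10 days, expressed in seconds.
    static final Duration DEFAULT_GC_GRACE = Duration.ofSeconds(864_000);

    // A compaction may drop the tombstone only once the grace period
    // after the delete has elapsed.
    static boolean isPurgeable(Instant deletedAt, Duration gcGrace, Instant now) {
        return !now.isBefore(deletedAt.plus(gcGrace));
    }

    public static void main(String[] args) {
        Instant deletedAt = Instant.parse("2021-05-30T12:00:00Z");
        Instant sixDaysLater = Instant.parse("2021-06-05T12:00:00Z");
        Instant elevenDaysLater = Instant.parse("2021-06-10T12:00:00Z");

        System.out.println(isPurgeable(deletedAt, DEFAULT_GC_GRACE, sixDaysLater));    // false
        System.out.println(isPurgeable(deletedAt, DEFAULT_GC_GRACE, elevenDaysLater)); // true
    }
}
</code></pre>
<p>Even when the check returns <code>true</code>, the data only disappears from disk when a compaction actually rewrites the affected <code>SSTables</code>.</p>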
<p>Sometimes, tombstones are created unintentionally, for the following reasons:</p>
<ul>
<li><strong>Null values</strong> mark values to be deleted and are stored as tombstones. This can be avoided by either replacing null with a static value, or not setting the value at all if the value is null</li>
<li><strong>Empty lists and sets</strong> are similar to null for Cassandra and create a tombstone, so don’t insert them if they’re empty. Take care to avoid null pointer exceptions when storing and retrieving data in your application</li>
<li><strong>Updated lists and sets</strong> create tombstones. If you update an entity and the list or set does not change, Cassandra still writes a tombstone to empty the collection and then sets the same values again. Therefore, update only the fields that changed. The good thing is that these tombstones are compacted away because of the new values</li>
</ul>
<p>If you have many tombstones, you might run into another Cassandra issue that prevents a query from being executed.</p>
<p>This happens when the <code>tombstone_failure_threshold</code> is reached, which is set by default to 100,000 tombstones. This means that, when a query has iterated over more than 100,000 tombstones, it will be aborted.</p>
<p>The issue here is, once a query stops executing, it’s not easy to tidy things up because Cassandra will stop even when you execute a delete, as it has reached the tombstone limit.</p>
<p>Usually you would never have that many tombstones. But mistakes happen, and you should take care to avoid this case.</p>
<p>There is a handy <a target="_blank" href="https://cassandra.apache.org/doc/latest/operating/metrics.html">operation metric</a> that you should observe called <code>TombstoneScannedHistogram</code> to avoid unexpected issues in production.</p>
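<p>The abort behavior can be pictured with a small Java sketch. This is only an illustration of the idea, not Cassandra's read path: the class <code>TombstoneScan</code> is a hypothetical name, and a <code>null</code> entry in the array stands in for a tombstone.</p>
<pre><code class="lang-java">// Illustration only: a null entry stands in for a tombstone.
final class TombstoneScan {
    // Counts live cells and aborts once more tombstones than the
    // threshold have been scanned.
    static int countLive(String[] cells, int failureThreshold) {
        int tombstones = 0;
        int live = 0;
        for (String cell : cells) {
            if (cell == null) {
                tombstones++;
                if (tombstones &gt; failureThreshold) {
                    throw new IllegalStateException(
                        "Scanned more than " + failureThreshold + " tombstones, query aborted");
                }
            } else {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        String[] cells = {"todo 1", null, "todo 2", null, null};

        // With the default threshold, three tombstones are harmless.
        System.out.println(countLive(cells, 100_000)); // 2

        // With a threshold of 2, the third tombstone aborts the scan.
        try {
            countLive(cells, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
</code></pre>
<p>The sketch also shows why the limit is painful: the scan fails as a whole, even though live data sits between the tombstones.</p>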
<h3 id="heading-updates-are-just-inserts-and-vice-versa"><code>UPDATE</code>s Are Just <code>INSERT</code>s, and Vice Versa</h3>
<p>In Cassandra, everything is append-only. There is no difference between an update and insert.</p>
<p>You already learned that a primary key defines the uniqueness of a row. If there is no entry yet, a new row will appear, and if there is already an entry, the entry will be updated. It does not matter whether you execute an update or an insert query.</p>
<p>The primary key in our example consists of <code>user_email</code> and <code>creation_date</code>, which together define record uniqueness.</p>
<p>Let’s insert a new record:</p>
<pre><code class="lang-shell">cqlsh&gt;      
  INSERT INTO learn_cassandra.todo_by_user_email (user_email, creation_date, name) VALUES('john@email.com', '2021-03-14 16:07:19.622+0000', 'Insert query');
</code></pre>
<p>And execute an update with a different <code>creation_date</code>:</p>
<pre><code class="lang-shell">cqlsh&gt;    
  UPDATE learn_cassandra.todo_by_user_email SET 
    name = 'Update query'
  WHERE user_email = 'john@email.com' AND creation_date = '2021-03-14 16:10:19.622+0000';
</code></pre>
<p>Two new rows appear in our table:</p>
<pre><code class="lang-shell">cqlsh&gt;    
 SELECT * FROM learn_cassandra.todo_by_user_email WHERE user_email='john@email.com';                                                                                                            

  user_email     | creation_date                   | name
----------------+---------------------------------+--------------
 john@email.com | 2021-03-14 16:10:19.622000+0000 | Update query
 john@email.com | 2021-03-14 16:07:19.622000+0000 | Insert query

(2 rows)
</code></pre>
<p>So you inserted a row using an update, and you can also use an insert to update:</p>
<pre><code class="lang-shell">cqlsh&gt;       
  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('john@email.com', '2021-03-14 16:07:19.622+0000', 'Insert query updated');
</code></pre>
<p>Let’s check our updated row:</p>
<pre><code class="lang-shell">cqlsh&gt;   
 SELECT * FROM learn_cassandra.todo_by_user_email WHERE user_email='john@email.com';

 user_email     | creation_date            | name
----------------+--------------------------+----------------------
 john@email.com | 2021-03-14 16:10:19.62   |         Update query
 john@email.com | 2021-03-14 16:07:19.62   | Insert query updated


(2 rows)
</code></pre>
<p>So <code>UPDATE</code> and <code>INSERT</code> are technically the same. Don’t think that an <code>INSERT</code> fails if there is already a row with the same primary key.</p>
<p>The same applies to an <code>UPDATE</code> — it will be executed, even if the row doesn’t exist.</p>
<p>The reason is that, by design, Cassandra rarely reads before writing, to keep performance high. The only exceptions are described in the next section about lightweight transactions.</p>
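<p>If it helps, you can picture this behavior as a map keyed by the primary key. The following Java sketch is an analogy under that assumption, not a description of Cassandra's storage engine, and <code>UpsertTable</code> is a hypothetical name. Both operations end up in the same write path, and neither one checks whether the row exists first.</p>
<pre><code class="lang-java">import java.util.HashMap;
import java.util.Map;

// Analogy only: a table as a map from primary key to row value.
final class UpsertTable {
    private final Map&lt;String, String&gt; rows = new HashMap&lt;&gt;();

    // The single write path: no read happens before the write.
    private void write(String userEmail, String creationDate, String name) {
        rows.put(userEmail + "|" + creationDate, name);
    }

    void insert(String userEmail, String creationDate, String name) {
        write(userEmail, creationDate, name);
    }

    void update(String userEmail, String creationDate, String name) {
        write(userEmail, creationDate, name);
    }

    String nameOf(String userEmail, String creationDate) {
        return rows.get(userEmail + "|" + creationDate);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        UpsertTable table = new UpsertTable();

        // An update on a missing key creates the row.
        table.update("john@email.com", "2021-03-14 16:10:19", "Update query");
        System.out.println(table.rowCount()); // 1

        // An insert on an existing key overwrites the row.
        table.insert("john@email.com", "2021-03-14 16:10:19", "Insert query updated");
        System.out.println(table.rowCount()); // 1
        System.out.println(table.nameOf("john@email.com", "2021-03-14 16:10:19")); // Insert query updated
    }
}
</code></pre>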
<p>However, some operations are restricted to either an update or an insert:</p>
<ul>
<li>Counters can only be changed with <code>UPDATE</code>, not with <code>INSERT</code></li>
<li><code>IF NOT EXISTS</code> can only be used in combination with an <code>INSERT</code></li>
<li><code>IF EXISTS</code> can only be used in combination with an <code>UPDATE</code></li>
</ul>
<p>You will learn more about conditions in queries within the next section.</p>
<h3 id="heading-lightweight-transactions">Lightweight Transactions</h3>
<p>You can use conditions in queries using a feature called lightweight transactions (LWTs), which execute a read to check a certain condition before executing the write.</p>
<p>Let’s only update if an entry already exists, by using <code>IF EXISTS</code>:</p>
<pre><code class="lang-shell">cqlsh&gt;     
  UPDATE learn_cassandra.todo_by_user_email SET
    name = 'Update query with LWT'
  WHERE user_email = 'john@email.com' AND creation_date = '2021-03-14 16:07:19.622+0000' IF EXISTS;

 [applied]
-----------
      True
</code></pre>
<p>The same works for an insert query using <code>IF NOT EXISTS</code>:</p>
<pre><code class="lang-shell">cqlsh&gt;      
  INSERT INTO learn_cassandra.todo_by_user_email (user_email,creation_date,name) VALUES('john@email.com', toTimestamp(now()), 'Yet another entry') IF NOT EXISTS;

 [applied]
-----------
      True
</code></pre>
<p>Those executions are expensive compared to simple <code>UPDATE</code> and <code>INSERT</code> queries. Still, if it’s business-critical, they are an excellent way to achieve transactional safety.</p>
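<p>The two conditions behave much like the compare-and-set methods of a concurrent map in Java. The sketch below is a single-process analogy with hypothetical names (<code>ConditionalWrites</code>, <code>insertIfNotExists</code>, <code>updateIfExists</code>). It says nothing about how Cassandra coordinates the check across replicas, which is where the extra cost comes from.</p>
<pre><code class="lang-java">import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Single-process analogy for conditional writes.
final class ConditionalWrites {
    private final ConcurrentMap&lt;String, String&gt; rows = new ConcurrentHashMap&lt;&gt;();

    // Like INSERT ... IF NOT EXISTS: applied only if no row exists yet.
    boolean insertIfNotExists(String key, String name) {
        return rows.putIfAbsent(key, name) == null;
    }

    // Like UPDATE ... IF EXISTS: applied only if a row already exists.
    boolean updateIfExists(String key, String name) {
        return rows.replace(key, name) != null;
    }

    public static void main(String[] args) {
        ConditionalWrites table = new ConditionalWrites();
        String key = "john@email.com|2021-03-14 16:07:19";

        System.out.println(table.updateIfExists(key, "Too early"));             // false
        System.out.println(table.insertIfNotExists(key, "Yet another entry"));  // true
        System.out.println(table.insertIfNotExists(key, "Duplicate"));          // false
        System.out.println(table.updateIfExists(key, "Update query with LWT")); // true
    }
}
</code></pre>
<p>The returned boolean plays the role of the <code>[applied]</code> column in the <code>cqlsh</code> output.</p>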
<h2 id="heading-conclusion">Conclusion</h2>
<p>I hope you enjoyed the article.</p>
<p>If you liked it and feel the need to give me a round of applause, or just want to get in touch, <a target="_blank" href="https://twitter.com/sesigl">follow me on Twitter</a>.</p>
<p>I work at eBay Kleinanzeigen, one of the world’s biggest classifieds companies. By the way, <a target="_blank" href="https://jobs.ebayclassifiedsgroup.com/ebay-kleinanzeigen">we are hiring</a>!</p>
<p>Special thanks go to <a target="_blank" href="https://twitter.com/infotexture">Roger Sheen</a>, <a target="_blank" href="https://twitter.com/michaeldlfx">Michael de la Fontaine</a>, <a target="_blank" href="https://twitter.com/donut1987">Christian Baer</a>, <a target="_blank" href="https://twitter.com/thomasuebel">Thomas Uebel</a> and Swen Fuhrmann for excellent feedback and proofreading.</p>
<h2 id="heading-references">References</h2>
<ul>
<li><a target="_blank" href="https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archDataDistributeReplication.html">Cassandra docs about replication factor</a></li>
<li><a target="_blank" href="https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshConsistency.html?hl=consistency%2Clevel">Cassandra docs about consistency</a></li>
<li><a target="_blank" href="https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbInternals/dbIntHowDataMaintain.html#dbIntHowDataMaintain__dml_types_of_compaction">Compaction strategy overview</a></li>
<li><a target="_blank" href="https://www.slideshare.net/DataStax/the-missing-manual-for-leveled-compaction-strategy-wei-deng-datastax-cassandra-summit-2016,%20%20https://www.youtube.com/watch?v=-5sNVvL8RwI">Details on Leveled Compaction Strategy</a></li>
<li><a target="_blank" href="https://www.datastax.com/blog/materialized-view-performance-cassandra-3x">How materialized views work</a></li>
<li><a target="_blank" href="https://issues.apache.org/jira/browse/CASSANDRA-15071?jql=status%20%3D%20Open%20AND%20priority%20in%20(Blocker%2C%20Urgent%2C%20Critical%2C%20High)%20AND%20text%20~%20%22materialized%20views%22">Known bugs with materialized views</a></li>
<li><a target="_blank" href="https://gist.github.com/irajhedayati/e5efba87c59d6bfca9550a039e84169b">Start multi-node cassandra base</a></li>
<li><a target="_blank" href="https://cassandra.apache.org/doc/latest/operating/metrics.html">Cassandra operation metrics</a></li>
<li><a target="_blank" href="https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbInternals/dbIntAboutDeletes.html">How is data deleted in Cassandra</a></li>
<li><a target="_blank" href="https://youtu.be/a84-UOGZiEg">How the Spark Cassandra connector works</a></li>
<li><a target="_blank" href="https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6">How sharding works</a></li>
<li><a target="_blank" href="https://www.baeldung.com/java-uuid">UUIDs in Java</a></li>
<li><a target="_blank" href="https://en.wikipedia.org/wiki/Universally_unique_identifier">Definition and history of UUIDs</a></li>
<li><a target="_blank" href="https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html">Deletes and Tombstones in Cassandra</a></li>
<li><a target="_blank" href="https://www.datastax.com/blog/basic-rules-cassandra-data-modeling">Basic rules of Cassandra modeling</a></li>
<li><a target="_blank" href="https://www.datastax.com/dev">DataStax</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a NoSQL Database with RavenDB ]]>
                </title>
                <description>
                    <![CDATA[ By Nahla Davies If you look at any website or application today, somewhere under the hood there is a database. After all, we live in the world of Big Data. And the volume of data is growing exponentially.  With so much data at hand, we need ever more... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-create-a-nosql-database-with-ravendb/</link>
                <guid isPermaLink="false">66d46043e39d8b5612bc0dc8</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 09 Jul 2021 15:06:24 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/07/Screen-Shot-2021-07-05-at-9.18.04-AM.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Nahla Davies</p>
<p>If you look at any website or application today, somewhere under the hood there is a database. After all, we live in the world of Big Data. And the volume of data is growing exponentially. </p>
<p>With so much data at hand, we need ever more sophisticated ways to store it and process it. </p>
<p>So job markets continue to be strong for most computer professionals <a target="_blank" href="https://www.freecodecamp.org/news/my-personal-tips-on-working-from-home-during-this-covid-19-season/">working remotely from home</a>, including database architects and database administrators. </p>
<p>There are even more opportunities in data science and analytics. But you need a solid foundation in database programming to take advantage of these opportunities.</p>
<p>Knowledge of data programming is also crucial for web and software development, which has become one of the <a target="_blank" href="https://www.waveapps.com/freelancing/web-development/back-end-web-developer-salary">most lucrative remote working jobs</a> in the United States today. </p>
<p>In this article, I'll introduce you to the RavenDB database management system. We'll review some essential RavenDB features and after that I'll walk you through setting up your first RavenDB database.</p>
<h2 id="heading-what-is-ravendb">What is RavenDB?</h2>
<p>RavenDB is a cross-platform, distributed, ACID-compliant, document-based, NoSQL database that offers high performance while remaining fairly easy to use. </p>
<h2 id="heading-ravendb-features">RavenDB Features</h2>
<p>To use RavenDB effectively, you should understand how each of its features works and why they're important.</p>
<h3 id="heading-cross-platform">Cross-platform</h3>
<p>RavenDB is available for Windows, Linux, and Raspberry Pi. Mac users can run RavenDB within the Docker container system. </p>
<p>This gives developers great flexibility when developing databases and associated applications.</p>
<h3 id="heading-distributed-database">Distributed database</h3>
<p>Generally speaking, a distributed database hosts data in multiple physical locations (for example, different sites or computers). </p>
<p>While the specifics of RavenDB's distributed architecture are beyond the scope of this article, you should understand two of its fundamental elements: clusters and nodes.</p>
<p><strong>Clusters</strong> are collections of an odd number of machines, with a minimum of three. Each machine in the cluster is a <strong>node</strong>. Databases can spread across one or more nodes in the cluster. In some instances, an entire database may be present on each node in a cluster. </p>
<p>In addition to data distribution, clusters self-manage distribution of work, along with failure and recovery efforts.</p>
<p>Distributed database architecture allows for high transaction throughput, that is, high performance. RavenDB can handle up to 150,000 writes and 1 million reads per second. </p>
<p>A distributed architecture is also more resilient to failures than that of a traditional relational database. </p>
<p>The distributed architecture of NoSQL databases (see below) makes them useful for developing mobile applications. Still, you should remain vigilant against mobile security risks, as <a target="_blank" href="https://tokenist.com/mobile-device-security/">89% of mobile device vulnerabilities</a> do not require physical access to the mobile device.</p>
<h3 id="heading-acid-compliant">ACID-compliant</h3>
<p>ACID is an acronym for a set of database properties that help ensure the reliable processing of database transactions: </p>
<ul>
<li><strong>Atomicity</strong> ensures that every database transaction is treated as a single unit, no matter how many statements the transaction includes. Atomicity prevents problematic partial updates. During processing, transactions either succeed or fail as units. If a single statement within the transaction fails, the entire transaction fails. Other database clients can never perceive a transaction to be partially resolved. </li>
<li><strong>Consistency</strong> ensures that transactions comply with all data validation rules in the database. If a transaction generates non-compliant data, the database rolls back to the prior valid version. </li>
<li><strong>Isolation</strong> ensures that when multiple transactions take place concurrently, the transactions do not affect each other and do not attempt to use data from an in-process transaction. The final database update for a set of concurrent transactions is the same as if each transaction was processed in series.</li>
<li><strong>Durability</strong> prevents the loss of completed transaction data, even in the event of post-processing system failures. Completed transaction data becomes permanent in the database system, typically in non-volatile memory.</li>
</ul>
<p>Most NoSQL databases are not ACID-compliant. RavenDB is an exception, <a target="_blank" href="https://www.ibm.com/docs/en/cics-ts/5.4?topic=processing-acid-properties-transactions">using ACID principles to drive high performance</a> while also ensuring data integrity and reliability.</p>
<h3 id="heading-nosql">NoSQL</h3>
<p>The value of NoSQL versus SQL is often debated. For our purposes, we can simplify the difference. </p>
<p>In traditional relational databases, SQL programming dominates. In non-relational, distributed databases, NoSQL reigns. </p>
<p>SQL databases rely on tables. NoSQL databases can use other bases, including documents (as RavenDB does), dynamic tables, key-value pairs, and more.</p>
<p>NoSQL databases rely on distributed architecture to scale horizontally. As the database size increases, it is split among several different nodes in a cluster. SQL databases scale vertically – more data requires larger servers.</p>
<p>Searches are also frequently faster in NoSQL databases. Whereas SQL database queries rely on joins or combinations of data from multiple tables into a new table, NoSQL queries typically do not need joins. </p>
<p>Since many NoSQL implementations are cloud-based, developers must always keep <a target="_blank" href="https://www.freecodecamp.org/news/understanding-encryption-algorithms/">encryption of their databases</a> and applications front of mind for security purposes.</p>
<h3 id="heading-document-based">Document-based</h3>
<p>Document-based does not mean that Raven only stores PDFs or word processing documents. For the purposes of NoSQL databases, a document is a collection of structured (actually semi-structured) <a target="_blank" href="https://ravendb.net/articles/nosql-document-oriented-databases-detailed-overview">self-contained data</a>. </p>
<p>You can use one of several languages to code the documents that will eventually reside in the NoSQL database, including Extensible Markup Language (XML) and JavaScript Object Notation (JSON). RavenDB primarily uses JSON documents.</p>
<p>Document-based databases are generally more efficient than their relational counterparts because they store all information about an object in a single document instance rather than spread across multiple tables. This structure increases database efficiency, as it does not require object-relational mapping.</p>
<h2 id="heading-how-to-create-a-new-ravendb-database">How to Create a New RavenDB Database</h2>
<p>It's relatively simple to create a new RavenDB database. But before creating a database, you first need to install the RavenDB system. </p>
<p>You can <a target="_blank" href="https://ravendb.net/">download RavenDB on its website</a> depending on your chosen operating system (Windows, Linux, or <a target="_blank" href="https://www.raspberrypi.org/software/">Raspberry Pi</a>), and there's a Docker version for Mac users. </p>
<p>Installation is quick and easy. You must select whether you want to use a secure or non-secure version. </p>
<p>Secure versions require you to either have or obtain a security certificate, but getting one through RavenDB is also painless. Free certificate licenses are available for the entry-level version of RavenDB.</p>
<p>Once you have installed RavenDB, only a few steps remain before you are working in your first database:</p>
<ol>
<li>Log in to your RavenDB application and go to your dashboard.</li>
<li>You will see a menu item for Databases on the dashboard, which you will click to start the process.</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/Screen-Shot-2021-07-05-at-9.18.04-AM-1.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<ol start="3">
<li><p>The window that opens includes a dropdown to search for existing databases, a search box, and a New database button. Click the New database button.</p>
</li>
<li><p>Once you have opened the new database, you must name it. Names may be as long as 128 characters, including letters, numbers and a limited selection of special characters (“-”, “_”, “.”).</p>
</li>
<li><p>After naming your database, you must assign a replication factor, which specifies distribution of your data across nodes. A replication factor of one means all data is in a single node. For settings above 1, you can choose between dynamic distribution or manual replication node setting (with the appropriate license).</p>
</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/Screen-Shot-2021-07-05-at-9.22.36-AM.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<ol start="6">
<li>After completing these steps, you will return to the main database window. All that is left to do is click on the database name, and you are ready to begin creating documents.</li>
</ol>
<p>For true beginners, RavenDB offers users the option to populate an empty database with sample data so that you can get a better feel for how to work in the database.</p>
<h2 id="heading-wrap-up">Wrap-up</h2>
<p>RavenDB is a powerful, robust, easy-to-use and easy-to-learn <a target="_blank" href="https://www.freecodecamp.org/news/nosql-databases-5f6639ed9574/">NoSQL database system</a>. </p>
<p>For users looking to improve their database design and administration skills, RavenDB is a user-friendly training ground.  </p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use Transactions in MongoDB to Prevent Inconsistencies in Your Java Code ]]>
                </title>
                <description>
                    <![CDATA[ By Haritha Yahathugoda The Latest MongoDB version 4.2 introduced multi-document transactions. This was a key feature that was missing from most NoSQL databases (and which SQL DBs bragged about).  A transaction, which can be composed of one or more op... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/mongodb-transactions-in-java/</link>
                <guid isPermaLink="false">66d45f319208fb118cc6cfb9</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Java ]]>
                    </category>
                
                    <category>
                        <![CDATA[ MongoDB ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 12 Jan 2021 06:33:00 +0000</pubDate>
                <media:content url="https://cdn-media-2.freecodecamp.org/w1280/5ff9296a75d5f706921ca9ae.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Haritha Yahathugoda</p>
<p>MongoDB 4.0 introduced <a target="_blank" href="https://docs.mongodb.com/v4.2/core/transactions/">multi-document transactions</a> for replica sets, and version 4.2 extended them to sharded clusters. This was a key feature that was missing from most NoSQL databases (and which SQL DBs bragged about). </p>
<p>A transaction, which can be composed of one or more operations, acts as an atomic operation. If all sub-operations succeed, that transaction is considered to be completed. Otherwise it fails. </p>
<p>This is called atomicity. This is an important concept to understand to keep your data consistent when reading/writing data concurrently.</p>
<h2 id="heading-article-scope-and-goals">Article Scope And Goals</h2>
<p>The goal of this article is to present you with a real life example where data inconsistencies occur without transactions. Then we will build a solution in Java using MongoDB Transactions to prevent them. </p>
<p>By doing so, you will learn to:</p>
<ol>
<li>Avoid <a target="_blank" href="https://en.wikipedia.org/wiki/Race_condition">Race Conditions</a> that could result in data inconsistencies</li>
<li>Build more resilient applications by using Mongo's built-in Retryable Writes</li>
</ol>
<p>Also, I added one wrapper function, <code>static &lt;R&gt; R withTransaction(final Function&lt;ClientSession, R&gt; executeFn);</code>,  that you can use to improve code readability. </p>
<h2 id="heading-example-how-to-handle-concurrent-transactions-against-the-same-bank-account">Example: How to Handle Concurrent Transactions Against the Same Bank Account</h2>
<p>Assume you and your spouse share a joint bank account. Each of you goes to the ATM at the same time and starts withdrawing money. </p>
<pre><code class="lang-markdown">t1 -&gt; You: Press check balance. ATM shows 100 dollars
t2 -&gt; Spouse: Press check balance. ATM shows 100 dollars
t3 -&gt; You &amp; Spouse: withdraw 10 dollars
t4 -&gt; Bank: initializes P1 and P2 to handle your and your spouse's requests.
t5 -&gt; P1 and P2 checked the balance and saw 100 dollars
t6 -&gt; P1 and P2 subtracted 10 dollars from the balance
t7 -&gt; P1 updated the DB with the new balance of 90
t8 -&gt; P2 updated the DB with the new balance of 90
</code></pre>
<p>In the above example, the operations did not occur sequentially. The bank's process P2 did not wait for P1 to complete its tasks. If P2 had waited for P1 to finish reading the balance, calculating the new balance, and writing the updated balance back to the DB before reading the balance itself, the bank wouldn't have lost 10 dollars.</p>
<p>The solution to this problem is <strong>transactions</strong>. You can think of them as somewhat similar to <a target="_blank" href="https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/concurrent/locks/package-summary.html">Locks</a>, Semaphores, and Synchronized blocks in Java. In Java, a lock guarantees that only the thread holding it executes the code the lock protects.</p>
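<p>Before involving the database, the lost update and the lock analogy can be reproduced in plain Java. The following is a minimal sketch with hypothetical names (<code>JointAccount</code>, <code>lostUpdate</code>, <code>lockedWithdrawals</code>). It replays the interleaving from the timeline by hand so that the result is deterministic, and then repeats the withdrawals with a <code>ReentrantLock</code>.</p>
<pre><code class="lang-java">import java.util.concurrent.locks.ReentrantLock;

// Plain-Java illustration of the lost update, without any database.
final class JointAccount {
    private int balance;
    private final ReentrantLock lock = new ReentrantLock();

    JointAccount(int balance) {
        this.balance = balance;
    }

    int read() {
        return balance;
    }

    void write(int newBalance) {
        balance = newBalance;
    }

    // Read, compute, and write as one unit: no other withdrawal can interleave.
    void withdrawAtomically(int amount) {
        lock.lock();
        try {
            balance = balance - amount;
        } finally {
            lock.unlock();
        }
    }

    // Replays steps t5 to t8: both processes read before either one writes.
    static int lostUpdate() {
        JointAccount account = new JointAccount(100);
        int seenByP1 = account.read();  // t5
        int seenByP2 = account.read();  // t5
        account.write(seenByP1 - 10);   // t7
        account.write(seenByP2 - 10);   // t8 overwrites the result of P1
        return account.read();
    }

    // Runs both withdrawals on separate threads, each holding the lock.
    static int lockedWithdrawals() {
        JointAccount account = new JointAccount(100);
        Thread p1 = new Thread(() -&gt; account.withdrawAtomically(10));
        Thread p2 = new Thread(() -&gt; account.withdrawAtomically(10));
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return account.read();
    }

    public static void main(String[] args) {
        System.out.println(lostUpdate());        // 90
        System.out.println(lockedWithdrawals()); // 80
    }
}
</code></pre>
<p>A lock like this only protects threads inside one JVM. Once several application instances talk to the same database, the guarantee has to come from the database, which is what transactions provide.</p>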
<h2 id="heading-how-to-set-up-helper-functions">How to Set Up Helper Functions</h2>
<p>Now let's get to the coding part. I'm going to assume you already have a MongoClient setup. You will need <a target="_blank" href="https://mongodb.github.io/mongo-java-driver/4.0/whats-new/#what-s-new-in-3-8">Java Mongo Driver 3.8 or higher</a>.</p>
<pre><code class="lang-java"><span class="hljs-keyword">final</span> <span class="hljs-keyword">static</span> MongoClient client; <span class="hljs-comment">// assumed you initialized this somewhere</span>

<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> ClientSession <span class="hljs-title">getNewClientSession</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">return</span> client.startSession();
}

<span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> TransactionOptions <span class="hljs-title">getTransactionOptions</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">return</span> TransactionOptions.builder()
        .readPreference(ReadPreference.primary())
        .readConcern(ReadConcern.LOCAL)
        .writeConcern(WriteConcern.MAJORITY)
        .build();
}
</code></pre>
<p><code>getNewClientSession</code> simply returns a session for a transaction. <code>ClientSession</code> is an identifier for a particular transaction. This is an important piece of data that you pass into all following Mongo operations so that it can isolate the operations. </p>
<p><code>getTransactionOptions</code> provides options for the Transaction. <code>ReadPreference.primary()</code> gives us the most up to date info on a cluster when we are reading data. <code>WriteConcern.MAJORITY</code> results in the DB acknowledging a commit after it successfully writes to the majority of the servers.</p>
<p>Instead of creating client sessions and transaction options everywhere, we should do it in a single method and pass in the functions that need atomicity.</p>
<pre><code class="lang-java"><span class="hljs-keyword">static</span> &lt;R&gt; <span class="hljs-function">R <span class="hljs-title">withTransaction</span><span class="hljs-params">(<span class="hljs-keyword">final</span> Function&lt;ClientSession, R&gt; executeFn)</span> </span>{
    <span class="hljs-keyword">final</span> ClientSession clientSession = getNewClientSession();
    TransactionOptions txnOptions = getTransactionOptions(); <span class="hljs-comment">// static context, so no "this"</span>

    TransactionBody&lt;R&gt; txnBody = <span class="hljs-keyword">new</span> TransactionBody&lt;R&gt;() {
        <span class="hljs-function"><span class="hljs-keyword">public</span> R <span class="hljs-title">execute</span><span class="hljs-params">()</span> </span>{
            <span class="hljs-keyword">return</span> executeFn.apply(clientSession);
        }
    };

    <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">return</span> clientSession.withTransaction(txnBody, txnOptions);
    } <span class="hljs-keyword">catch</span> (RuntimeException e) {
        e.printStackTrace();
    } <span class="hljs-keyword">finally</span> {
        clientSession.close();
    }
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">null</span>;
}
</code></pre>
<p>The above function runs the operations inside the passed-in function, the <code>executeFn</code> argument, as an atomic operation or a transaction. Let's implement our cash withdrawal function using transactions. </p>
<p>Note that I am returning <code>null</code>. You could just throw a new exception to let the caller know that the transaction has failed. For the sake of this example, returning null implies transaction failure.</p>
<h2 id="heading-bank-account-example-in-java">Bank Account Example In Java</h2>
<pre><code class="lang-java"><span class="hljs-keyword">import</span> com.mongodb.client.ClientSession;
<span class="hljs-keyword">import</span> com.mongodb.client.MongoCollection;
<span class="hljs-keyword">import</span> com.mongodb.client.model.Filters;
<span class="hljs-keyword">import</span> com.mongodb.client.model.FindOneAndUpdateOptions;
<span class="hljs-keyword">import</span> com.mongodb.client.model.ReturnDocument;
<span class="hljs-keyword">import</span> com.mongodb.client.model.Updates;
<span class="hljs-keyword">import</span> org.bson.codecs.pojo.annotations.BsonId;
<span class="hljs-keyword">import</span> org.bson.types.ObjectId;

<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Account</span> </span>{
    <span class="hljs-meta">@BsonId</span>
    ObjectId _id;
    <span class="hljs-keyword">int</span> balance;

    ... getters and setters
}

<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AccountService</span> </span>{
    <span class="hljs-comment">// dbClient is assumed to be an initialized MongoDatabase with a POJO codec registered</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> MongoCollection&lt;Account&gt; <span class="hljs-title">getAccounts</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> dbClient.getCollection(<span class="hljs-string">"account"</span>, Account.class);
    }

    <span class="hljs-comment">// Reads the account document inside the given session</span>
    <span class="hljs-function"><span class="hljs-keyword">private</span> Account <span class="hljs-title">findAccount</span><span class="hljs-params">(ClientSession session, ObjectId accountId)</span> </span>{
        <span class="hljs-keyword">return</span> getAccounts().find(session, Filters.eq(<span class="hljs-string">"_id"</span>, accountId)).first();
    }

    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">int</span> <span class="hljs-title">currentBalance</span><span class="hljs-params">(ClientSession session, ObjectId accountId)</span> </span>{
        <span class="hljs-keyword">return</span> findAccount(session, accountId).balance;
    }

    <span class="hljs-comment">// Writes the new balance and returns the account as stored after the update</span>
    <span class="hljs-function"><span class="hljs-keyword">private</span> Account <span class="hljs-title">updateBalance</span><span class="hljs-params">(ClientSession session, ObjectId accountId, <span class="hljs-keyword">int</span> newBalance)</span> </span>{
        <span class="hljs-keyword">return</span> getAccounts().findOneAndUpdate(session, Filters.eq(<span class="hljs-string">"_id"</span>, accountId),
            Updates.set(<span class="hljs-string">"balance"</span>, newBalance),
            <span class="hljs-keyword">new</span> FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER));
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">drawCash</span><span class="hljs-params">(ClientSession session, ObjectId accountId, <span class="hljs-keyword">int</span> amount)</span> </span>{
        <span class="hljs-keyword">int</span> currentBalance = <span class="hljs-keyword">this</span>.currentBalance(session, accountId);
        <span class="hljs-keyword">int</span> newBalance = currentBalance - amount;
        <span class="hljs-keyword">return</span> updateBalance(session, accountId, newBalance); <span class="hljs-comment">// store the new balance, not the amount</span>
    }
}
</code></pre>
<p>In the above code snippet, the <code>Account</code> class is a plain Java class model for the user's account. <code>AccountService</code> is a database accessor for the accounts collection. The <code>drawCash</code> method completes the set of operations executed by a single process (P1 or P2) described in the first example to dispense money to either you or your spouse. </p>
<p>Now we use the <code>withTransaction</code> function to call <code>drawCash</code>:</p>
<pre><code class="lang-java"><span class="hljs-comment">// ... Some REST API</span>
AccountService accountService = ...; <span class="hljs-comment">// Dependency injected</span>

<span class="hljs-meta">@Path("/account/withdraw")</span> <span class="hljs-comment">// Endpoint to withdraw money</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">withdrawMoney</span><span class="hljs-params">()</span> </span>{
    ObjectId accountId = ...<span class="hljs-comment">// some method to get the current user's account ID</span>
    Account account = withTransaction(<span class="hljs-keyword">new</span> Function&lt;ClientSession, Account&gt;() {
        <span class="hljs-meta">@Override</span>
        <span class="hljs-function"><span class="hljs-keyword">public</span> Account <span class="hljs-title">apply</span><span class="hljs-params">(ClientSession clientSession)</span> </span>{
            <span class="hljs-comment">// Everything inside this block runs within the same transaction as long as you pass the argument clientSession to mongo</span>
            <span class="hljs-keyword">return</span> accountService.drawCash(clientSession, accountId, <span class="hljs-number">10</span>);
        }
    });

    <span class="hljs-keyword">if</span>(Objects.isNull(account)){
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Failed to withdraw money"</span>;
    }
    <span class="hljs-keyword">return</span> <span class="hljs-string">"New account balance is "</span> + account.balance;
}
</code></pre>
<p>Now if you call this endpoint twice, concurrently, one user will see the final balance as 90 and the second one will see 80. </p>
<p>You might have guessed that the second user's transaction should have failed. Yes, it did. But MongoDB has a built-in retry mechanism, so it automatically retried our second operation, which then succeeded.</p>
<h2 id="heading-a-real-world-example-use-case">A Real-World Example Use Case</h2>
<p>We use transactions on our <a target="_blank" href="https://www.ps2pdf.com/video-converter">PS2PDF.com online video converter</a> to prevent one thread from overriding process states updated by another. </p>
<p>For example, for each video conversion process, we create a document called Job in the DB. It has a status field which can take values such as <code>STARTED</code>, <code>IN_PROGRESS</code>, and <code>COMPLETED</code>. </p>
<p>Once a thread has updated the Job.status in the DB to <code>COMPLETED</code>, we don't want any slow thread reverting that status to <code>IN_PROGRESS</code>. Once a job has completed, it cannot be changed. </p>
<p>We use the above-mentioned <code>withTransaction</code> method to guarantee that no operation overrides the <code>COMPLETED</code> status.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>I hope you can now use transactions to avoid race conditions in your applications. Plus, use the built-in <code>retryWrites</code> and <code>retryReads</code> options to improve fault tolerance. </p>
<p>I should point out that MongoDB transactions are pretty new, and there are articles out there that identify some inconsistencies that occur in special circumstances. But it is highly unlikely that you will run into these issues.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The JavaScript + Firestore Tutorial for 2020: Learn by Example ]]>
                </title>
                <description>
                    <![CDATA[ Cloud Firestore is a blazing-fast, serverless NoSQL database, perfect for powering web and mobile apps of any size. Grab the complete guide to learning Firestore, created to show you how to use Firestore as the engine for your own amazing projects fr... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-firestore-tutorial-for-2020-learn-by-example/</link>
                <guid isPermaLink="false">66d037f1871ae63f179f6bc8</guid>
                
                    <category>
                        <![CDATA[ cheatsheet ]]>
                    </category>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Firebase ]]>
                    </category>
                
                    <category>
                        <![CDATA[ JavaScript ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Tutorial ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Reed ]]>
                </dc:creator>
                <pubDate>Thu, 16 Jul 2020 13:00:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/07/The-Firestore-Tutorial-2020.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Cloud Firestore is a blazing-fast, serverless NoSQL database, perfect for powering web and mobile apps of any size. <a target="_blank" href="https://reedbarger.com/resources/javascript-firestore-2020/">Grab the complete guide to learning Firestore</a>, created to show you how to use Firestore as the engine for your own amazing projects from front to back.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<p>Getting Started with Firestore</p>
<ul>
<li>What is Firestore? Why Should You Use It?</li>
<li>Setting Up Firestore in a JavaScript Project</li>
<li>Firestore Documents and Collections</li>
<li>Managing our Database with the Firebase Console</li>
</ul>
<p>Fetching Data with Firestore</p>
<ul>
<li>Getting Data from a Collection with .get()</li>
<li>Subscribing to a Collection with .onSnapshot()</li>
<li>Difference between .get() and .onSnapshot()</li>
<li>Unsubscribing from a collection</li>
<li>Getting individual documents</li>
</ul>
<p>Changing Data with Firestore</p>
<ul>
<li>Adding a document to a collection with .add()</li>
<li>Adding a document to a collection with .set()</li>
<li>Updating existing data</li>
<li>Deleting data</li>
</ul>
<p>Essential Patterns</p>
<ul>
<li>Working with subcollections</li>
<li>Useful methods for Firestore fields</li>
<li>Querying with .where()</li>
<li>Ordering and limiting data</li>
</ul>
<p><a target="_blank" href="https://reedbarger.com/resources/javascript-firestore-2020/">Note: you can download a PDF version of this tutorial so you can read it offline.</a></p>
<h3 id="heading-what-is-firestore-why-should-you-use-it">What is Firestore? Why Should You Use It?</h3>
<p>Firestore is a very flexible, easy-to-use database for mobile, web, and server development. If you're familiar with Firebase's realtime database, Firestore has many similarities, but with a different (arguably more declarative) API.</p>
<p>Here are some of the features that Firestore brings to the table:</p>
<h4 id="heading-easily-get-data-in-realtime">⚡️Easily get data in realtime</h4>
<p>Like the Firebase realtime database, Firestore provides useful methods such as .onSnapshot() which make it a breeze to listen for updates to your data in real time. It makes Firestore an ideal choice for projects that place a premium on displaying and using the most recent data (chat applications, for instance).</p>
<h4 id="heading-flexibility-as-a-nosql-database">Flexibility as a NoSQL Database</h4>
<p>Firestore is a very flexible option for a backend because it is a NoSQL database. NoSQL means that the data isn't stored in tables and columns as it would be in a standard SQL database. Instead, it is structured like a key-value store, as if it were one big JavaScript object. </p>
<p>In other words, there's no schema or need to describe what data our database will store. As long as we provide valid keys and values, Firestore will store them. </p>
<h4 id="heading-effortlessly-scalable">↕️ Effortlessly scalable</h4>
<p>One great benefit of choosing Firestore for your database is the very powerful infrastructure it builds upon, which enables you to scale your application very easily, both vertically and horizontally. Whether you have hundreds or millions of users, Google's servers will be able to handle whatever load you place upon them.</p>
<p>In short, Firestore is a great option for applications both small and large. For small applications it's powerful because we can do a lot without much setup and create projects very quickly with it. Firestore is well-suited for large projects due to its scalability.</p>
<h3 id="heading-setting-up-firestore-in-a-javascript-project">Setting Up Firestore in a JavaScript Project</h3>
<blockquote>
<p>We're going to be using the Firestore SDK for JavaScript. Throughout this cheatsheet, we'll cover how to use Firestore within the context of a JavaScript project. Even so, the concepts we'll cover here are easily transferable to any of the available Firestore client libraries. </p>
</blockquote>
<p>To get started with Firestore, we'll head to the Firebase console. You can get there by going to <a target="_blank" href="https://firebase.com">firebase.google.com</a>. You'll need to have a Google account to sign in. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/firebase.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Once we're signed in, we'll create a new project and give it a name. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/create-a-project.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Once our project is created, we'll select it. After that, on our project's dashboard, we'll select the code button. </p>
<p>This will give us the code we need to integrate Firestore with our JavaScript project. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/firebase-integration.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Usually if you're setting this up in any sort of JavaScript application, you'll want to put this in a dedicated file called firebase.js. If you're using any JavaScript library that has a package.json file, you'll want to install the Firebase dependency with npm or yarn.</p>
<pre><code class="lang-bash"># with npm
npm i firebase

# with yarn
yarn add firebase
</code></pre>
<p>Firestore can be used either on the client or server. If you are using Firestore with Node, you'll need to use the CommonJS syntax with require. Otherwise, if you're using JavaScript in the client, you'll import firebase using ES Modules.</p>
<pre><code class="lang-js"><span class="hljs-comment">// with Commonjs syntax (if using Node)</span>
<span class="hljs-keyword">const</span> firebase = <span class="hljs-built_in">require</span>(<span class="hljs-string">"firebase/app"</span>);
<span class="hljs-built_in">require</span>(<span class="hljs-string">"firebase/firestore"</span>);

<span class="hljs-comment">// with ES Modules (if using client-side JS, like React)</span>
<span class="hljs-keyword">import</span> firebase <span class="hljs-keyword">from</span> <span class="hljs-string">'firebase/app'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-string">'firebase/firestore'</span>;

<span class="hljs-keyword">var</span> firebaseConfig = {
  <span class="hljs-attr">apiKey</span>: <span class="hljs-string">"AIzaSyDpLmM79mUqbMDBexFtOQOkSl0glxCW_ds"</span>,
  <span class="hljs-attr">authDomain</span>: <span class="hljs-string">"lfasdfkjkjlkjl.firebaseapp.com"</span>,
  <span class="hljs-attr">databaseURL</span>: <span class="hljs-string">"https://lfasdlkjkjlkjl.firebaseio.com"</span>,
  <span class="hljs-attr">projectId</span>: <span class="hljs-string">"lfasdlkjkjlkjl"</span>,
  <span class="hljs-attr">storageBucket</span>: <span class="hljs-string">"lfasdlkjkjlkjl.appspot.com"</span>,
  <span class="hljs-attr">messagingSenderId</span>: <span class="hljs-string">"616270824980"</span>,
  <span class="hljs-attr">appId</span>: <span class="hljs-string">"1:616270824990:web:40c8b177c6b9729cb5110f"</span>,
};
<span class="hljs-comment">// Initialize Firebase</span>
firebase.initializeApp(firebaseConfig);
</code></pre>
<h3 id="heading-firestore-collections-and-documents">Firestore Collections and Documents</h3>
<p>There are two key terms that are essential to understanding how to work with Firestore: <strong>documents</strong> and <strong>collections</strong>. </p>
<p>Documents are individual pieces of data in our database. You can think of documents as being much like simple JavaScript objects. They consist of key-value pairs, which we refer to as <strong>fields</strong>. The values of these fields can be strings, numbers, Booleans, objects, arrays, and even binary data.</p>
<pre><code class="lang-js"><span class="hljs-built_in">document</span> -&gt; { <span class="hljs-attr">key</span>: value }
</code></pre>
<p>Sets of these documents are known as collections. Collections are very much like arrays of objects. Within a collection, each document is linked to a given identifier (id). </p>
<pre><code class="lang-js">collection -&gt; [{ <span class="hljs-attr">id</span>: doc }, { <span class="hljs-attr">id</span>: doc }]
</code></pre>
<h3 id="heading-managing-our-database-with-the-firestore-console">Managing our database with the Firestore Console</h3>
<p>Before we can actually start working with our database we need to create it.</p>
<p>Within our Firebase console, go to the 'Database' tab and create your Firestore database. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/firestore.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Once you've done that, we will start in test mode and enable all reads and writes to our database. In other words, we will have open access to get and change data in our database. If we were to add Firebase authentication, we could restrict access only to authenticated users. </p>
<p>After that, we'll be taken to our database itself, where we can start creating collections and documents. The root of our database will be a series of collections, so let's make our first collection. </p>
<p>We can select 'Start collection' and give it an id. Every collection is going to have an id or a name. For our project, we're going to keep track of our users' favorite books. We'll give our first collection the id 'books'. </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/collection-id.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Next, we'll add our first document to our newly created 'books' collection. </p>
<p>Each document is going to have an id as well, linking it to the collection in which it exists. </p>
<p>In most cases we're going to use the option to give it an automatically generated ID. So we can hit the button 'auto id' to do so, after which we need to provide a field, give it a type, as well as a value. </p>
<p>For our first book, we'll make a 'title' field of type 'string', with the value 'The Great Gatsby', and hit save. </p>
<p>After that, we should see our first item in our database.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/09/first-item.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h3 id="heading-getting-data-from-a-collection-with-get">Getting data from a collection with .get()</h3>
<p>To access Firestore and use all of the methods it provides, we use <code>firebase.firestore()</code>. This method needs to be called every time we want to interact with our Firestore database. </p>
<p>I would recommend creating a dedicated variable to store a single reference to Firestore. Doing so helps to cut down on the amount of code you write across your app. </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> db = firebase.firestore();
</code></pre>
<blockquote>
<p>In this cheatsheet, however, I'm going to stick to using the firestore method each time to be as clear as possible.</p>
</blockquote>
<p>To reference a collection, we use the <code>.collection()</code> method and provide a collection's id as an argument. To get a reference to the books collection we created, just pass in the string 'books'.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> booksRef = firebase.firestore().collection(<span class="hljs-string">'books'</span>);
</code></pre>
<p>To get all of the document data from a collection, we can chain on the <code>.get()</code> method. </p>
<p><code>.get()</code> returns a promise, which means we can resolve it either using a <code>.then()</code> callback or we can use the async-await syntax if we're executing our code within an async function. </p>
<p>Once our promise is resolved in one way or another, we get back what's known as a <strong>snapshot</strong>. </p>
<p>For a collection query that snapshot is going to consist of a number of individual documents. We can access them by saying <code>snapshot.docs</code>. </p>
<p>From each document, we can get the id as a separate property, and the rest of the data using the <code>.data()</code> method. </p>
<p>Here's what our entire query looks like:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> booksRef = firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>);

booksRef
  .get()
  .then(<span class="hljs-function">(<span class="hljs-params">snapshot</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> data = snapshot.docs.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> ({
      <span class="hljs-attr">id</span>: doc.id,
      ...doc.data(),
    }));
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"All data in 'books' collection"</span>, data); 
    <span class="hljs-comment">// [ { id: 'glMeZvPpTN1Ah31sKcnj', title: 'The Great Gatsby' } ]</span>
  });
</code></pre>
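<p>Since this id-plus-data mapping comes up in nearly every query, it can be pulled into a small helper. The sketch below is one way to do that, not part of the Firestore SDK: <code>snapshotToArray</code> is a made-up name, and <code>fakeSnapshot</code> is a hand-built stand-in that only mimics the <code>docs</code>, <code>id</code>, and <code>data()</code> shape so the snippet can run without a Firebase project.</p>

```javascript
// Hypothetical helper: turn a query snapshot into an array of plain objects
function snapshotToArray(snapshot) {
  return snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));
}

// Stand-in for a real snapshot (assumption: only the fields used above are mimicked)
const fakeSnapshot = {
  docs: [
    { id: "glMeZvPpTN1Ah31sKcnj", data: () => ({ title: "The Great Gatsby" }) },
  ],
};

const allBooks = snapshotToArray(fakeSnapshot);
console.log("All data in 'books' collection", allBooks);
```

<p>With a real collection reference, the same helper could be passed straight to the promise, as in <code>booksRef.get().then(snapshotToArray)</code>.</p>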
<h3 id="heading-subscribing-to-a-collection-with-onsnapshot">Subscribing to a collection with .onSnapshot()</h3>
<p>The <code>.get()</code> method simply returns all the data within our collection. </p>
<p>To leverage some of Firestore's realtime capabilities we can subscribe to a collection, which gives us the current value of the documents in that collection, whenever they are updated. </p>
<p>Instead of using the <code>.get()</code> method, which is for querying a single time, we use the <code>.onSnapshot()</code> method. </p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .onSnapshot(<span class="hljs-function">(<span class="hljs-params">snapshot</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> data = snapshot.docs.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> ({
      <span class="hljs-attr">id</span>: doc.id,
      ...doc.data(),
    }));
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"All data in 'books' collection"</span>, data);
  });
</code></pre>
<p>In the code above, we're using what's known as method chaining instead of creating a separate variable to reference the collection.</p>
<p>What's powerful about using Firestore is that we can chain a bunch of methods one after another, making for more declarative, readable code.</p>
<p>Within onSnapshot's callback, we get direct access to the snapshot of our collection, both now and whenever it's updated in the future. Try manually updating our one document and you'll see that <code>.onSnapshot()</code> is listening for any changes in this collection.</p>
<h3 id="heading-difference-between-get-and-onsnapshot">Difference between .get() and .onSnapshot()</h3>
<p>The difference between the get and the snapshot methods is that get returns a promise, which needs to be resolved, and only then do we get the snapshot data.</p>
<p><code>.onSnapshot()</code>, however, takes a callback function, which is called with the snapshot directly, both right away and again on every later change. </p>
<p>This is important to keep in mind when it comes to these different methods: we have to know which of them return a promise and which take a callback. </p>
<h3 id="heading-unsubscribing-from-a-collection-with-unsubscribe">Unsubscribing from a collection with unsubscribe()</h3>
<p>Note additionally that <code>.onSnapshot()</code> returns a function which we can use to unsubscribe and stop listening on a given collection. </p>
<p>This is important in cases where the user, for example, navigates away from a page where we're displaying a collection's data. Here's an example using the React library, where we are calling unsubscribe within the useEffect hook. </p>
<p>Doing so makes sure that when our component is unmounted (no longer displayed within the context of our app), we're no longer listening to the collection data that we're using in this component.</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">App</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> [books, setBooks] = React.useState([]);

  React.useEffect(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> unsubscribe = firebase
      .firestore()
      .collection(<span class="hljs-string">"books"</span>)
      .onSnapshot(<span class="hljs-function">(<span class="hljs-params">snapshot</span>) =&gt;</span> {
        <span class="hljs-keyword">const</span> data = snapshot.docs.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> ({
          <span class="hljs-attr">id</span>: doc.id,
          ...doc.data(),
        }));
        setBooks(data);
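      });

    <span class="hljs-comment">// Stop listening when the component unmounts</span>
    <span class="hljs-keyword">return</span> <span class="hljs-function">() =&gt;</span> unsubscribe();
  }, []);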

  <span class="hljs-keyword">return</span> books.map(<span class="hljs-function"><span class="hljs-params">book</span> =&gt;</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">BookList</span> <span class="hljs-attr">key</span>=<span class="hljs-string">{book.id}</span> <span class="hljs-attr">book</span>=<span class="hljs-string">{book}</span> /&gt;</span></span>)
}
</code></pre>
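<p>To see the subscribe/unsubscribe contract in isolation, here is a rough sketch using a tiny in-memory stand-in rather than Firestore itself. <code>createFakeCollection</code> and its <code>add</code> method are invented for this illustration, and the callback receives a plain array instead of a real snapshot. One simplification to keep in mind: this stand-in calls the listener synchronously, whereas the real SDK delivers snapshots asynchronously.</p>

```javascript
// Tiny in-memory stand-in for a collection (not the Firestore SDK)
function createFakeCollection() {
  let docs = [];
  const listeners = new Set();
  return {
    // Like .onSnapshot(): call back with the current data, then again after
    // every change, and return a function that stops the listening
    onSnapshot(callback) {
      listeners.add(callback);
      callback([...docs]);
      return () => listeners.delete(callback);
    },
    // Add a document and notify every active listener
    add(doc) {
      docs = [...docs, doc];
      listeners.forEach((listener) => listener([...docs]));
    },
  };
}

const fakeBooks = createFakeCollection();
const seenCounts = [];
const stopListening = fakeBooks.onSnapshot((docs) => seenCounts.push(docs.length));

fakeBooks.add({ title: "The Great Gatsby" }); // listener fires again
stopListening();
fakeBooks.add({ title: "Of Mice and Men" }); // listener no longer fires

console.log(seenCounts); // [ 0, 1 ]
```

<p>The second <code>add</code> call never reaches the listener, which is the behavior we rely on when a component unmounts.</p>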
<h3 id="heading-getting-individual-documents-with-doc">Getting Individual Documents with .doc()</h3>
<p>When it comes to getting a single document within a collection, the process is much the same as getting an entire collection: we first need to create a reference to that document, and then use the get method to grab it.</p>
<p>This time, however, we chain the <code>.doc()</code> method onto the collection method and pass it the document's id. If the id was auto-generated, we need to grab it from the database first. After that, we can chain on <code>.get()</code> and resolve the promise. </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> bookRef = firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .doc(<span class="hljs-string">"glMeZvPpTN1Ah31sKcnj"</span>);

bookRef.get().then(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> {
  <span class="hljs-keyword">if</span> (!doc.exists) <span class="hljs-keyword">return</span>;
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Document data:"</span>, doc.data());
  <span class="hljs-comment">// Document data: { title: 'The Great Gatsby' }</span>
});
</code></pre>
<p>Notice the conditional <code>if (!doc.exists) return;</code> in the code above.</p>
<p>Once we get the document back, it's essential to check to see whether it exists. </p>
<p>If we don't, we may run into an error when we try to use our document data. The way to check whether our document exists is with <code>doc.exists</code>, which is either true or false. </p>
<p>If it is false, we want to return from the function or maybe throw an error. If <code>doc.exists</code> is true, we can get the data from <code>doc.data()</code>.</p>
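<p>One way to keep this check in a single place is a small guard function. The following is only a sketch: <code>dataOrNull</code> is a hypothetical helper, and <code>foundDoc</code> and <code>missingDoc</code> are hand-built objects that imitate the <code>id</code>, <code>exists</code>, and <code>data()</code> members of a document snapshot so the code runs on its own.</p>

```javascript
// Hypothetical helper: return the document's data with its id, or null if it doesn't exist
function dataOrNull(doc) {
  if (!doc.exists) return null;
  return { id: doc.id, ...doc.data() };
}

// Hand-built stand-ins for document snapshots (assumption: same shape as the real ones)
const foundDoc = {
  id: "glMeZvPpTN1Ah31sKcnj",
  exists: true,
  data: () => ({ title: "The Great Gatsby" }),
};
const missingDoc = { id: "does-not-exist", exists: false, data: () => undefined };

console.log(dataOrNull(foundDoc)); // { id: 'glMeZvPpTN1Ah31sKcnj', title: 'The Great Gatsby' }
console.log(dataOrNull(missingDoc)); // null
```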
<h3 id="heading-adding-document-to-a-collection-with-add">Adding a document to a collection with .add()</h3>
<p>Next, let's move on to changing data. The easiest way to add a new document to a collection is with the <code>.add()</code> method. </p>
<p>All you need to do is select a collection reference (with <code>.collection()</code>) and chain on <code>.add()</code>. </p>
<p>Going back to our definition of documents as being like JavaScript objects, we need to pass an object to the <code>.add()</code> method and specify all the fields we want to be on the document. </p>
<p>Let's say we want to add another book, 'Of Mice and Men':</p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .add({
    <span class="hljs-attr">title</span>: <span class="hljs-string">"Of Mice and Men"</span>,
  })
  .then(<span class="hljs-function">(<span class="hljs-params">ref</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Added doc with ID: "</span>, ref.id);
    <span class="hljs-comment">// Added doc with ID:  ZzhIgLqELaoE3eSsOazu</span>
  });
</code></pre>
<p>The <code>.add()</code> method returns a promise, and from this resolved promise we get back a reference to the created document, which gives us information such as the created id. </p>
<p>The <code>.add()</code> method auto-generates an id for us. Note that this ref doesn't contain the document's data itself. We can, however, call <code>.get()</code> on the ref to fetch it.</p>
<h3 id="heading-adding-a-document-to-a-collection-with-set">Adding a document to a collection with .set()</h3>
<p>Another way to add a document to a collection is with the <code>.set()</code> method. </p>
<p>Where set differs from add is that we need to specify our own id when adding the data. </p>
<p>This requires chaining on the <code>.doc()</code> method with the id that you want to use. Also, note how when the promise is resolved from <code>.set()</code>, we don't get a reference to the created document:</p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .doc(<span class="hljs-string">"another book"</span>)
  .set({
    <span class="hljs-attr">title</span>: <span class="hljs-string">"War and Peace"</span>,
  })
  .then(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Document created"</span>);
  });
</code></pre>
<p>Additionally, when we use <code>.set()</code> with an existing document, it will, by default, overwrite that document. </p>
<p>If we want to merge an old document with a new document instead of overwriting it, we need to pass an additional argument to <code>.set()</code> and provide the property <code>merge</code> set to true.</p>
<pre><code class="lang-js"><span class="hljs-comment">// use .set() to merge data with existing document, not overwrite</span>

<span class="hljs-keyword">const</span> bookRef = firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .doc(<span class="hljs-string">"another book"</span>);

bookRef
  .set({
    <span class="hljs-attr">author</span>: <span class="hljs-string">"Lev Nikolaevich Tolstoy"</span>
  }, { <span class="hljs-attr">merge</span>: <span class="hljs-literal">true</span> })
  .then(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Document merged"</span>);

    bookRef
      .get()
      .then(<span class="hljs-function"><span class="hljs-params">doc</span> =&gt;</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Merged document: "</span>, doc.data());
      <span class="hljs-comment">// Merged document:  { title: 'War and Peace', author: 'Lev Nikolaevich Tolstoy' }</span>
    });
  });
</code></pre>
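<p>The difference between the two behaviors can be pictured with plain objects. This is only an analogy for top-level fields, written without the Firestore SDK; the variable names are made up, and nested fields are not covered by this simple spread.</p>

```javascript
// Plain-object analogy of overwrite vs. merge (top-level fields only)
const existingDoc = { title: "War and Peace" };
const incomingData = { author: "Lev Nikolaevich Tolstoy" };

// Roughly what .set(incomingData) leaves behind: the old fields are gone
const overwritten = { ...incomingData };

// Roughly what .set(incomingData, { merge: true }) leaves behind: old and new fields combined
const merged = { ...existingDoc, ...incomingData };

console.log(overwritten); // { author: 'Lev Nikolaevich Tolstoy' }
console.log(merged); // { title: 'War and Peace', author: 'Lev Nikolaevich Tolstoy' }
```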
<h3 id="heading-updating-existing-data-with-update">Updating existing data with .update()</h3>
<p>When it comes to updating data, we use the <code>.update()</code> method. Like <code>.add()</code> and <code>.set()</code>, it returns a promise.</p>
<p>What's helpful about using <code>.update()</code> is that, unlike <code>.set()</code>, it won't overwrite the entire document. Also like <code>.set()</code>, we need to reference an individual document. </p>
<p>When you use <code>.update()</code>, it's important to use some error handling, such as the <code>.catch()</code> callback in the event that the document doesn't exist. </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> bookRef = firebase.firestore().collection(<span class="hljs-string">"books"</span>).doc(<span class="hljs-string">"another book"</span>);

bookRef
  .update({
    <span class="hljs-attr">year</span>: <span class="hljs-number">1869</span>,
  })
  .then(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Document updated"</span>); <span class="hljs-comment">// Document updated</span>
  })
  .catch(<span class="hljs-function">(<span class="hljs-params">error</span>) =&gt;</span> {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error updating doc"</span>, error);
  });
</code></pre>
<h3 id="heading-deleting-data-with-delete">Deleting data with .delete()</h3>
<p>We can delete a given document from a collection by referencing it by its id and executing the <code>.delete()</code> method, simple as that. It also returns a promise.</p>
<p>Here is a basic example of deleting a book with the id "another book":</p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"books"</span>)
  .doc(<span class="hljs-string">"another book"</span>)
  .delete()
  .then(<span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Document deleted"</span>)) <span class="hljs-comment">// Document deleted</span>
  .catch(<span class="hljs-function">(<span class="hljs-params">error</span>) =&gt;</span> <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Error deleting document"</span>, error));
</code></pre>
<blockquote>
<p>Note that the official Firestore documentation does not recommend deleting entire collections, only individual documents.</p>
</blockquote>
<h3 id="heading-working-with-subcollections">Working with Subcollections</h3>
<p>Let's say that we made a misstep in creating our application, and instead of just adding books we also want to connect them to the users that made them.</p>
<p>The way that we want to restructure the data is by making a collection called 'users' in the root of our database, and have 'books' be a subcollection of 'users'. This will allow users to have their own collections of books. How do we set that up? </p>
<p>References to the subcollection 'books' should look something like this: </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> userBooksRef = firebase
  .firestore()
  .collection(<span class="hljs-string">'users'</span>)
  .doc(<span class="hljs-string">'user-id'</span>)
  .collection(<span class="hljs-string">'books'</span>);
</code></pre>
<p>Note additionally that we can write this all within a single <code>.collection()</code> call using forward slashes. </p>
<p>The above code is equivalent to the following. Note that a collection reference must have an odd number of segments; if it doesn't, Firestore will throw an error.</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> userBooksRef = firebase
  .firestore()
  .collection(<span class="hljs-string">'users/user-id/books'</span>);
</code></pre>
<p>To create the subcollection itself, with one document (another Steinbeck novel, 'East of Eden') run the following. </p>
<pre><code class="lang-js">firebase.firestore().collection(<span class="hljs-string">"users/user-1/books"</span>).add({
  <span class="hljs-attr">title</span>: <span class="hljs-string">"East of Eden"</span>,
});
</code></pre>
<p>Then, getting that newly created subcollection would look like the following based off of the user's ID.</p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"users/user-1/books"</span>)
  .get()
  .then(<span class="hljs-function">(<span class="hljs-params">snapshot</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> data = snapshot.docs.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> ({
      <span class="hljs-attr">id</span>: doc.id,
      ...doc.data(),
    }));
    <span class="hljs-built_in">console</span>.log(data); 
    <span class="hljs-comment">// [ { id: 'UO07aqpw13xvlMAfAvTF', title: 'East of Eden' } ]</span>
  });
</code></pre>
<h3 id="heading-useful-methods-for-firestore-fields">Useful methods for Firestore fields</h3>
<p>Firestore provides some useful tools that make it a little easier to work with our field values.</p>
<p>For example, we can generate a timestamp for whenever a given document is created or updated with the following helper from the <code>FieldValue</code> property. </p>
<p>We can of course create our own date values using JavaScript, but using a server timestamp lets us know exactly when data is changed or created from Firestore itself. </p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"users"</span>)
  .doc(<span class="hljs-string">"user-2"</span>)
  .set({
    <span class="hljs-attr">created</span>: firebase.firestore.FieldValue.serverTimestamp(),
  })
  .then(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Added user"</span>); <span class="hljs-comment">// Added user</span>
  });
</code></pre>
<p>Additionally, say we have a field on a document which keeps track of a certain number, say the number of books a user has created. Whenever a user creates a new book we want to increment that by one. </p>
<p>An easy way to do this, instead of having to first make a <code>.get()</code> request, is to use another field value helper called <code>.increment()</code>:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> userRef = firebase.firestore().collection(<span class="hljs-string">"users"</span>).doc(<span class="hljs-string">"user-2"</span>);

userRef
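  .set(
    {
      <span class="hljs-attr">count</span>: firebase.firestore.FieldValue.increment(<span class="hljs-number">1</span>),
    },
    { <span class="hljs-attr">merge</span>: <span class="hljs-literal">true</span> } <span class="hljs-comment">// keep existing fields such as "created"</span>
  )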
  .then(<span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Updated user"</span>);

    userRef.get().then(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Updated user data: "</span>, doc.data());
    });
  });
</code></pre>
<h3 id="heading-querying-with-where">Querying with .where()</h3>
<p>What if we want to get data from our collections based on certain conditions? For example, say we want to get all of the users that have submitted one or more books?</p>
<p>We can write such a query with the help of the <code>.where()</code> method. First we reference a collection and then chain on <code>.where()</code>. </p>
<p>The where method takes three arguments: first, the field that we're filtering on; second, a comparison operator; and third, the value that we want to filter our collection by.</p>
<p>We can use any of the following operators and the fields we use can be primitive values as well as arrays.</p>
<p><code>&lt;</code>, <code>&lt;=</code>, <code>==</code>, <code>&gt;</code>, <code>&gt;=</code>, <code>array-contains</code>, <code>in</code>, or <code>array-contains-any</code></p>
<p>To fetch all the users who have submitted at least one book, we can use the following query.</p>
<p>After <code>.where()</code> we need to chain on <code>.get()</code>. Upon resolving our promise we get back what's known as a <strong>querySnapshot</strong>. </p>
<p>Just like getting a collection, we can iterate over the querySnapshot's docs with <code>.map()</code> to get each document's id and data (fields):</p>
<pre><code class="lang-js">firebase
  .firestore()
  .collection(<span class="hljs-string">"users"</span>)
  .where(<span class="hljs-string">"count"</span>, <span class="hljs-string">"&gt;="</span>, <span class="hljs-number">1</span>)
  .get()
  .then(<span class="hljs-function">(<span class="hljs-params">querySnapshot</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> data = querySnapshot.docs.map(<span class="hljs-function">(<span class="hljs-params">doc</span>) =&gt;</span> ({
      <span class="hljs-attr">id</span>: doc.id,
      ...doc.data(),
    }));
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Users with &gt;= 1 book: "</span>, data);
    <span class="hljs-comment">// Users with &gt;= 1 book:  [ { id: 'user-1', count: 1 } ]</span>
  });
</code></pre>
<blockquote>
<p>Note that you can chain on multiple <code>.where()</code> methods to create compound queries.</p>
</blockquote>
<h3 id="heading-limiting-and-ordering-queries">Limiting and ordering queries</h3>
<p>Another method for effectively querying our collections is to limit them. Let's say we want to limit a given query to a certain number of documents.</p>
<p>If we only want to return a few items from our query, we just need to add on the <code>.limit()</code> method, after a given reference. </p>
<p>If we wanted to do that through our query for fetching users that have submitted at least one book, it would look like the following. </p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> usersRef = firebase
  .firestore()
  .collection(<span class="hljs-string">"users"</span>)
  .where(<span class="hljs-string">"count"</span>, <span class="hljs-string">"&gt;="</span>, <span class="hljs-number">1</span>);

usersRef.limit(<span class="hljs-number">3</span>);
</code></pre>
<p>Another powerful feature is to order our queried data according to document fields using <code>.orderBy()</code>. </p>
<p>If we want to order our created users by when they were first made, we can use the <code>orderBy</code> method with the 'created' field as the first argument. For the second argument, we specify whether it should be in ascending or descending order. </p>
<p>To get the three most recently created users, ordered from newest to oldest, we can execute the following query:</p>
<pre><code class="lang-js"><span class="hljs-keyword">const</span> usersRef = firebase.firestore().collection(<span class="hljs-string">"users"</span>);

<span class="hljs-comment">// No .where() here: combining a range filter on "count" with an</span>
<span class="hljs-comment">// orderBy on a different field may be rejected by some SDK versions</span>
usersRef.orderBy(<span class="hljs-string">"created"</span>, <span class="hljs-string">"desc"</span>).limit(<span class="hljs-number">3</span>);
</code></pre>
<p>We can chain <code>.orderBy()</code> with <code>.limit()</code>. For this to work properly, <code>.limit()</code> should be called last and not before <code>.orderBy()</code>.</p>
<h2 id="heading-want-your-own-copy">Want your own copy?</h2>
<p>If you would like to have this guide for future reference, <a target="_blank" href="https://reedbarger.com/resources/javascript-firestore-2020/">download a cheatsheet of this entire tutorial here</a>. </p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How looking back can help us move forward: a retrospective on software gems and fads ]]>
                </title>
                <description>
                    <![CDATA[ By Pakal de Bonchamp Maybe one of the most important qualities of a developer is the ability to pick the right tool for the right job, without hopping onto bandwagons or reinventing the wheel. This might require a bit of technology analysis, but even... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/software-fads-and-gems/</link>
                <guid isPermaLink="false">66d460939f2bec37e2da0664</guid>
                
                    <category>
                        <![CDATA[ asyncio ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Bitcoin ]]>
                    </category>
                
                    <category>
                        <![CDATA[ formats ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 30 Aug 2019 18:30:49 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/08/hidden-gem.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Pakal de Bonchamp</p>
<p>Maybe one of the most important qualities of a developer is the ability to pick the right tool for the right job, without hopping onto bandwagons or <a target="_blank" href="https://en.wikipedia.org/wiki/Not_invented_here">reinventing the wheel</a>. This might require a bit of technology analysis, but even more, a touch of critical thinking.</p>
<p>Here is a review of a few exaggerated trends and underrated niceties, in different areas of the marvelous world of computer science: <strong>databases, asynchronicity, cryptocurrency, and data formats</strong>. I won't touch on the subject of REST webservices, which I already <a target="_blank" href="https://www.freecodecamp.org/news/rest-is-the-new-soap-97ff6c09896d/">ranted about at great length</a>. </p>
<p><em>As usual, your feedback is more than welcome if any factual errors slipped into this (not entirely unbiased) article.</em></p>
<h2 id="heading-databases-nosql-amp-zodb">Databases: NoSQL &amp; ZODB</h2>
<p>Few moments, in the history of computer science, were as ironically lit as the arrival of No-SQL databases, around 2009. A tidal wave struck the shores of backend development and system administration: SQL databases were too rigid, too slow, too hard to replicate. </p>
<p>So new projects massively ditched them in favor of key-value stores like Redis, document-oriented databases like MongoDB/CouchDB, or graph-oriented databases like Neo4j. And we must acknowledge one thing: these new databases shone in benchmarks; they shone about as much... as any SQL database would after dropping all its <a target="_blank" href="https://www.geeksforgeeks.org/acid-properties-in-dbms/">ACID constraints</a> and query language flexibility.</p>
<p>But the horizon was grim for numerous programmers. They learned, the hard way, that data persistence was not a minor concern. And that they needed, for example, to explicitly activate "Write Concerns" in MongoDB, to ensure that data would not get lost before reaching disk oxide. </p>
<p>They learned that "eventual consistency" was a pretty word for "temporary inconsistency", opening the door to nasty, silent, hard-to-reproduce bugs in production. And that transactions - and their implicit locking - were precious features, and that mimicking them by hand, with awkward flags stuffed into documents, was anything but easy and robust.</p>
<p>And they learned that data schemas, and referential integrity, were more than welcome to prevent databases from becoming heaps of incoherent objects. And that the lack of advanced indexing capabilities (on multiple keys, on deep document fields) in key-value stores could become quite embarrassing.</p>
<p>Thus, people began reinventing SQL features on top of NoSQL databases, by mimicking data schemas, foreign keys, advanced aggregation, in language-specific "ORM" libraries (mongoengine, mongoid, mongomapper...). In this context, this "Object-Relational Mapper" acronym should have, by itself, been a hint that something had gone wild.</p>
<p>There was something surreal in watching NoSQL databases, which were honed for specific use cases (highly replicated or heterogeneous data, <a target="_blank" href="https://docs.mongodb.com/manual/core/capped-collections/">capped-size collections</a> or <a target="_blank" href="https://docs.mongodb.com/manual/tutorial/expire-data/">TTLs</a>, pub/sub systems...), be used just to store a bunch of same-shape objects in a single server instance. </p>
<p>A standard SQL database would have done the job perfectly well, and offered many more tooling options and plugins (different storage engines, Percona Toolkit scripts, IDEs like HeidiSQL or MySQL Workbench, DB schema migration processes integrated into web frameworks...). Even if it meant stuffing extra unstructured data into a serialized text field (or, nowadays, dedicated <a target="_blank" href="https://www.postgresql.org/docs/current/datatype-json.html">PostgreSQL JSON fields</a>).</p>
<p>With time, NoSQL databases themselves improved a lot, among other things by borrowing features from the SQL world. But reinventing SQL is not an easy task. Relational databases deal with query language parsing, character sets and collations, data aggregation and conversion, transactions and isolation levels, views and query caches, triggers, embedded procedures, <a target="_blank" href="http://wiki.gis.com/wiki/index.php/Geographic_information_system">GIS</a>, fine-grained permissions, replication and clustering... complex and sensitive features, driven by hundreds of settings spread on multiple levels (per database, per table, per connection). </p>
<p>So despite their great progress (multi-document transactions, better data aggregation, stored JavaScript functions, pluggable storage, role-based access control in MongoDB), NoSQL DBs still have trouble challenging major SQL databases, purely feature-wise. </p>
<p>Luckily, most projects only need a tiny subset of these SQL database features: a few schema validations, a few proper indices, and business can get rolling; so for teams lacking SQL expertise, the relative simplicity of many NoSQL DBs could indeed be, to be honest, a relevant factor.</p>
<p>The wave seems to have faded by now, and projects seem more inclined to combine different databases according to actual needs. They thus separate user accounts, job queues and similar caches, logging and stats data... each into the most relevant storage.</p>
<p>All these cited NoSQL databases, and their countless alternatives, shine in their intended use cases. But I'd like to mention a too-little-known, too-little-used gem of the Python ecosystem. Have you ever wanted to persist your data in a really, reaaaalllly easy way? Then let me point you to the <a target="_blank" href="http://www.zodb.org/en/latest/">ZODB</a>. You open it like a dictionary, you push whatever data you want into it, you commit the transaction, and you're good to go.</p>
<p><em>Example of simple local ZODB instance:</em></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> ZODB <span class="hljs-keyword">import</span> FileStorage, DB
<span class="hljs-keyword">import</span> transaction

<span class="hljs-comment"># Open (or create) a file-backed database and fetch its root mapping</span>
storage = FileStorage.FileStorage(<span class="hljs-string">'mydatabase.fs'</span>)
root = DB(storage).open().root()
print(<span class="hljs-string">"ROOT:"</span>, root)
<span class="hljs-comment"># Use the root like a dictionary, then commit to persist the change</span>
root[<span class="hljs-string">'employees'</span>] = [<span class="hljs-string">'Mary'</span>, <span class="hljs-string">'Jo'</span>, <span class="hljs-string">'Bob'</span>]
transaction.commit()
</code></pre>
<p>Graphs of data are handled gracefully (no recursion error), objects are lazily loaded on access, special "bucket tree" types are provided to browse huge amounts of data while keeping memory low, and several storage backends exist, including <a target="_blank" href="https://relstorage.readthedocs.io/en/latest/install.html">relstorage</a> which leverages the power of SQL databases. Perfect, isn't it?</p>
<p>Alright, I'm lying, there are a few gotchas. There is no built-in indexing system (one must use ZCatalog or the like instead). Using dedicated "persistent" types is highly advised, so that mutations of objects are automatically detected and persisted. The overall tooling is quite limited compared to mainstream databases. And the concurrency model based on "optimistic locking" might force you, under heavy load, to retry an operation several times until it manages to get applied.</p>
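<p>To give an idea of what "retry until it gets applied" means in practice, here is a minimal, library-free sketch of an optimistic-locking retry loop. Everything in it (<code>ConflictError</code>, <code>VersionedCell</code>, <code>update_with_retry</code>, the <code>before_commit</code> hook) is a hypothetical stand-in written for this illustration; it is not ZODB's API, and ZODB's own conflict detection is more elaborate.</p>

```python
class ConflictError(Exception):
    """Stand-in for the error an optimistic store raises at commit time."""


class VersionedCell:
    """Toy in-memory value with a version number (not ZODB's mechanism)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, expected_version):
        # Optimistic check: refuse the write if someone else committed first.
        if expected_version != self.version:
            raise ConflictError("value changed since it was read")
        self.value = new_value
        self.version += 1


def update_with_retry(cell, transform, max_attempts=5, before_commit=None):
    """Apply transform to the cell, re-reading and retrying on conflict."""
    for attempt in range(1, max_attempts + 1):
        value, version = cell.read()
        new_value = transform(value)
        if before_commit is not None:
            before_commit(attempt)  # hook used below to simulate another writer
        try:
            cell.commit(new_value, version)
            return attempt
        except ConflictError:
            continue  # start over from a fresh read
    raise ConflictError(f"gave up after {max_attempts} attempts")


cell = VersionedCell(10)


def interfere(attempt):
    # Simulate a concurrent writer sneaking in during the first attempt only.
    if attempt == 1:
        cell.commit(100, cell.version)


attempts = update_with_retry(cell, lambda v: v + 1, before_commit=interfere)
print("attempts:", attempts, "final value:", cell.value)
```

<p>The important design point is that the whole read-compute-write sequence is replayed from a fresh read, which is why the operation being retried should be free of side effects outside the database.</p>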
<p>The extreme amount of integration with the Python language has an additional drawback: if you introduce breaking changes into your data model, your database might not load anymore, so you must handle schema migrations carefully. </p>
<p>But context is everything: ZODB is not meant for long-term and interoperable data persistence, but for effortless storage of (possibly very heterogeneous) Python objects. It can make long-running scripts able to resume after interruption, it can store player data of online game sessions... if you really want to store blog articles or personal accounts in ZODB, you had better limit yourself to native Python types, and implement your own sanity checks. But whatever happens, do not use a very limited <a target="_blank" href="https://docs.python.org/3.7/library/shelve.html">stdlib shelf</a>, if you can have a nifty ZODB at hand to store your work-in-progress data.</p>
<h2 id="heading-asynchronicity-asyncio-trio-and-green-threads">Asynchronicity: Asyncio, Trio and Green Threads</h2>
<p>There has been an immemorial challenge between synchronous and asynchronous programming models, in all IO-bound programs. Kernels have provided asynchronous modes for disk operations, with more or less success (<em>overlapped</em> non-blocking IO on Windows, limited <code>io_submit()</code> API on Linux...).</p>
<p>Networking code has made the issue still more acute, with the need for huge numbers of long-term connections, each performing only minor CPU operations. </p>
<p>Some languages, like Erlang, confronted this by being asynchronous from the start, and letting different tasks communicate by message passing (a.k.a. <a target="_blank" href="https://en.wikipedia.org/wiki/Actor_model">Actor Model</a>).</p>
<p>In other languages, several design patterns emerged to tackle the problem:</p>
<ul>
<li>callbacks</li>
<li>async/await syntax</li>
<li>lightweight threads</li>
</ul>
<p>Callbacks were previously the major solution in mainstream frameworks. For example in jQuery or Twisted, the developer would provide callables as arguments or as instance methods, and these would be called on IO completion/cancellation, in a pattern called <a target="_blank" href="https://en.wikipedia.org/wiki/Inversion_of_control">Inversion of Control</a>. It works, for sure, but it makes program flows quite hard to predict and debug, hence the term "callback soup" often used in this context.</p>
<p>For the last few years, the <a target="_blank" href="https://docs.python.org/3/library/asyncio-task.html">async/await</a> syntax has become highly trendy, especially in the Python world. But there is a problem: like Inversion of Control, it's a whole new way of programming, almost a new language. The vast amount of packages currently available, made of modules, classes and methods, just does NOT work with async/await. </p>
<p>Any IO, any expensive operation, hidden deep inside a subdependency, could ruin your day. So we're currently gazing at thousands of great modules being happily reimplemented, with a whole new world of bugs and missing features.</p>
<p>Is it all worth it? Python developers have massively jumped onto the train of the <a target="_blank" href="https://docs.python.org/3/library/asyncio.html">asyncio</a> package, which has become part of the stdlib. But this technology has scary issues, like the difficulty of socket backpressure, the fragile handling of exceptions and ctrl-C, the unsafe cancellation of (leaking) tasks, and the steep learning curve of an API full of gotchas and redundant concepts. Other frameworks like Trio/Curio, seemed <a target="_blank" href="https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/">much more careful on these subjects</a>. </p>
<p>If we have to recode tons of existing libraries, why base new versions on an engine that some developers have - not without arguments - called a <a target="_blank" href="https://veriny.tf/asyncio-a-dumpster-fire-of-bad-design/">dumpster fire of bad design</a>? But the <a target="_blank" href="https://en.wikipedia.org/wiki/Network_effect">network effect</a> is huge in such cases, and alternative async/await-based frameworks will have a hard time challenging the standard.</p>
<p>And what about the third pattern quoted above, lightweight threads? Long before this async/await trend, Python developers thought: we already have some perfectly fine synchronous business code, so let's change the way it is run, not the way it is written. Thus appeared lightweight threads, or "greenlets". They work like a bunch of tiny tasks scheduled on top of a few native threads, tasks which yield control to each other only when they block on IO or explicitly do so; and with much greater performance than native threads, in terms of memory usage and switching delay. </p>
<p>In the end, this system can quickly boost just about any existing codebase so that it supports thousands of long-term concurrent tasks. And this is not an isolated mad experiment: Python lightweight threads were originally used in the Eve Online game (via Stackless Python), and have since been successfully ported to CPython (Gevent, Eventlet...) and PyPy. And they have actually <a target="_blank" href="https://en.wikipedia.org/wiki/Green_threads">existed for a long time</a> in lots of programming languages, under different names (green processes, green threads, fibers...).</p>
<p>The drawbacks of this system?</p>
<ul>
<li>Libraries must play nice with green threads, by yielding control instead of blocking on IO, and launching green threads instead of native threads. In Python, core modules (socket, time.sleep(), threading) are forcibly made green-friendly via monkey-patching; but compiled extensions must be checked especially carefully, since they can bypass these patches and block on their own system calls.</li>
<li>No heavy computation, or otherwise time-consuming tasks, must be performed, else all other tasks get impacted by the delay. For such needs, just delegate work to a pool of <a target="_blank" href="http://www.gevent.org/api/gevent.threadpool.html">native threads</a> (or a <a target="_blank" href="http://www.celeryproject.org/">celery</a>-like worker queue).</li>
</ul>
<p>As we see, these drawbacks are similar to those of async/await, except that you almost don't have to touch the original, synchronous code. An "except" which can mean months or years of work avoided; your CTO and CEO should be highly pleased about this.</p>
<p>Now, you'll sometimes hear strange rationalizations from people who ditched lightweight threads in favor of a whole async/await reimplementation. Something along the lines of "<em>Explicit is better than implicit, and all these awaits show me exactly where my code could switch context, whereas green threads might switch discreetly if a third-party function performs any kind of IO or explicit switch</em>".</p>
<p>But the thing is...</p>
<p>FIRST, why do you need to know at which points exactly the program will switch to another task? For all the past years, with native (preemptive) threads, a switch could happen anywhere, anytime, even right in the middle of a simple increment.</p>
<p>But we learned to deal with this invisible threat properly, by protecting critical sections with locks and other synchronization primitives (Recursive Locks, Event, Condition, Semaphore...), keeping a proper order when nesting locks, and using thread-safe data structures (Queues and the likes) which handle concurrency for us. </p>
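<p>As a reminder of how little ceremony this "good old way" requires, here is a minimal sketch using only the standard library. The <code>SafeCounter</code> class and the <code>worker</code> function are names made up for this illustration; the point is simply that the read-modify-write sequence lives inside a <code>threading.Lock</code>.</p>

```python
import threading


class SafeCounter:
    """Counter whose increment is protected by a lock (a critical section)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can run the read-modify-write below.
        with self._lock:
            self.value += 1


def worker(shared_counter, times):
    for _ in range(times):
        shared_counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=worker, args=(counter, 10_000))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print("final value:", counter.value)
```

<p>Whatever the scheduler decides, no increment is lost, because the switch can no longer land between the read and the write of <code>value</code>.</p>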
<p>Green threads are a middle ground between (implicit) preemptive threads and (explicit) async/await, but all of these technologies had better stick to the good old way of protecting concurrent operations. </p>
<p>Locks can be dangerous when misused (especially since most implementations stall, instead of detecting deadlocks and reporting them as exceptions), but they are cheap and robust. What is the point of attempting lock-less concurrency, by checking the position of each potentially switch-triggering call, when at any time you could have to add a new operation (even a simple logging output) in the middle of your carefully crafted lock-less sequence, and thus ruin its safety?</p>
<p><em>This naive code shows how a recently added call to log_counter_value() breaks otherwise safe asynchronous code.</em></p>
<pre><code class="lang-python"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">increment_counter</span>(<span class="hljs-params">counter</span>):</span>
    current = counter.current_value
    <span class="hljs-keyword">await</span> log_counter_value(current)  <span class="hljs-comment"># Unwanted context switch happens here</span>
    counter.current_value = current + <span class="hljs-number">1</span>
</code></pre>
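<p>The snippet above can be turned into a small runnable experiment. The following is only a sketch under assumptions: the <code>Counter</code> class, the body of <code>log_counter_value()</code> (a bare <code>asyncio.sleep(0)</code> standing in for real logging IO) and the <code>run_many()</code> helper are invented for the demonstration. It runs 100 concurrent increments, first without protection, then with an <code>asyncio.Lock</code> around the same sequence.</p>

```python
import asyncio


class Counter:
    """Tiny shared counter used by both variants below."""

    def __init__(self):
        self.current_value = 0
        self.lock = asyncio.Lock()


async def log_counter_value(value):
    # Hypothetical logger: any awaited call lets other tasks run.
    await asyncio.sleep(0)


async def unsafe_increment(counter):
    current = counter.current_value
    await log_counter_value(current)  # other tasks may run here
    counter.current_value = current + 1  # may overwrite their updates


async def safe_increment(counter):
    # The lock makes the read-log-write sequence atomic between tasks.
    async with counter.lock:
        current = counter.current_value
        await log_counter_value(current)
        counter.current_value = current + 1


async def run_many(increment, n=100):
    counter = Counter()
    await asyncio.gather(*(increment(counter) for _ in range(n)))
    return counter.current_value


unsafe_total = asyncio.run(run_many(unsafe_increment))
safe_total = asyncio.run(run_many(safe_increment))
print("without lock:", unsafe_total)
print("with lock:", safe_total)
```

<p>Without the lock, increments get lost and the total ends up below 100; with it, every increment is counted, and adding or removing awaits inside the protected block no longer changes the outcome.</p>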
<p>SECOND, do you really have to deal with synchronization? In the web world especially, where HTTP requests are not supposed to interact, we want parallelization, not concurrency. Persistent data (and transactions) are supposed to be handled by external databases and caches, not in process memory heap. </p>
<p>So usual thread-safety good practices (using thread-safe initialization of the process via locks, read-only structures for global data, and read-write data only local to stack frames) are enough to make the whole system "thread/greenlet/asynctask safe". </p>
<p>If one day you need to implement highly concurrent algorithms inside a process, you'll choose the best tool for that, but no need for hammer-building factories if all you have to do is thrust one nail.</p>
<h2 id="heading-money-bitcoins-amp-alternatives">Money: Bitcoins &amp; Alternatives</h2>
<p>Let's ponder for a moment. What are the biggest challenges of our 21st century? Climate change? Tax evasion? Legitimacy of state power? So candid minds could think that energy sobriety, financial traceability, and (really) democratic organizations, would be goals to pursue.</p>
<p>But a group of smart hackers decided that existing currencies were a major issue, and came up with Bitcoins: an energy-devouring "proof of work" system, easy anonymity of money holders, and fuzzy (to say the least) governance.</p>
<p>With such a match between needs and offering, it's no wonder that Bitcoins became what they became: a product of (almost) pure speculation, praised by <a target="_blank" href="https://cointelegraph.com/news/research-suggests-russian-based-hackers-behind-ryuk-ransomwares-25-million-gains">ransomware</a> gangs and miscellaneous mafias, mass-mined by factories of graphics cards, with an especially high propensity for <a target="_blank" href="https://cryptosec.info/exchange-hacks/">being stolen</a> (or lost).</p>
<p>This money, and its soon-emerged siblings, have a history already full of bewildering moments, with accidental chain splits, <a target="_blank" href="https://en.bitcoin.it/wiki/Softfork">soft forks</a> blocked for <a target="_blank" href="https://en.bitcoin.it/wiki/Segregated_Witness">political reasons</a>, <a target="_blank" href="https://en.wikipedia.org/wiki/List_of_bitcoin_forks">hard forks</a> quite arbitrarily decided by miscellaneous people (or forced by <a target="_blank" href="https://news.bitcoin.com/verge-is-forced-to-fork-after-suffering-a-51-attack/">cyber attacks</a>), and endless battles between different currencies, or different versions of the same currency (Bitcoin Core, Cash, Gold, SV...). Algorithms (cryptography, consensus, transaction code...) were praised as the foundations of a bullet-proof and self-governing system, but some actors had to <a target="_blank" href="https://www.coindesk.com/crypto-developer-komodo-hacks-wallet-users-to-foil-13-million-hack">hack their own users</a> to protect them from theft, while even the so glorified "smart contracts" showed loads of <a target="_blank" href="https://blog.sigmaprime.io/solidity-security.html#">scary security weaknesses</a>, and <a target="_blank" href="https://www.coindesk.com/three-smart-contract-misconceptions">not as many use cases</a> as some expected.</p>
<p>Let's make it clear: the blockchain, a public ledger based on Merkle trees, is far from a bad idea. But when decisions are not based on the needs of society, and carefulness regarding bugs, but on ideology and greed, the outcome can be predicted. And the decline in hype is proportional to unduly invested hopes.</p>
<p>What is the "better" counterpart of Bitcoin, Ethereum, and the like? Lots of alternative cryptocurrencies exist, with lighter forms of authorization, with different crypto algorithms, with different privacy settings, with different adoption rates too... But if you ask me, what we would really need is "<strong>an easily traceable money for State finances and NGOs</strong>"; a public ledger designed so that any citizen could easily audit how public money is used, from the moment it's gathered via taxes and donations, to the moment it gets back into private circuits by paying for goods or employee salaries. <em>Does anything like this exist yet, anyone? Couldn't find it...</em></p>
<p>One could also mention non-cryptographic but <em>local</em> currencies (e.g. the "<a target="_blank" href="https://translate.google.com/translate?sl=fr&amp;tl=en&amp;u=http%3A%2F%2Fwww.lagonette.org%2F">Gonette</a>" in Lyon, France), kept at parity with national currencies, which have the advantage of favoring local businesses and thus lowering the collateral damage of international trade.</p>
<h2 id="heading-data-formats-text-and-binary">Data Formats: Text and Binary</h2>
<p>A witty passerby once defined XML as "<em>the readability of binary data with the efficiency of text</em>". Indeed, XML parsers tend to be sluggish and to clutter memory (when in DOM mode) compared to binary data loaders, and editing XML configurations and documents by hand is not the best user experience one might have.</p>
<p>We easily understand why XML, as a metalanguage that allows creating new tags and properties for all kinds of uses, needs to be so verbose. But why such enthusiasm for text-based formats when the goal is to transmit information between servers using well-defined data types?</p>
<p>Parsing an HTTP payload into an internal representation, and then parsing, for example, its JSON body, ends up adding significant overhead to web service requests. For what gain? Binary formats like <a target="_blank" href="http://bsonspec.org/">BSON</a> would make serialization and deserialization much faster, and semantically equivalent text formats could be used for debugging (auto-converted by web browser dev tools, Wireshark, cURL and the like) and for manually crafting test payloads.</p>
<p>For sure, handling these dual representations of the same data would add a bit of complexity to the system, but in an era when startups love exposing web services to thousands of simultaneous clients, the performance boost can be real, without much effort.</p>
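<p>To make the trade-off concrete, here is a minimal Kotlin sketch that serializes the same record as text and as binary. The <code>Reading</code> type and its fields are hypothetical, the text form is a hand-written JSON string, and the binary form is a simple fixed-width layout written with <code>DataOutputStream</code>. It is not BSON, only an illustration of the principle.</p>
<pre><code class="lang-kotlin">import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream

// Hypothetical payload type, used only for this illustration.
data class Reading(val sensorId: Int, val timestamp: Long, val value: Double)

// Text form: a hand-written JSON rendering, convenient for debugging.
fun toText(r: Reading): String =
    """{"sensorId":${r.sensorId},"timestamp":${r.timestamp},"value":${r.value}}"""

// Binary form: fixed-width fields (4 + 8 + 8 bytes), no digits to parse on the way back.
fun toBinary(r: Reading): ByteArray {
    val buffer = ByteArrayOutputStream()
    DataOutputStream(buffer).use { stream ->
        stream.writeInt(r.sensorId)
        stream.writeLong(r.timestamp)
        stream.writeDouble(r.value)
    }
    return buffer.toByteArray()
}

// Reads the fields back in the same order they were written.
fun fromBinary(data: ByteArray): Reading =
    DataInputStream(ByteArrayInputStream(data)).use { stream ->
        Reading(stream.readInt(), stream.readLong(), stream.readDouble())
    }

fun main() {
    val reading = Reading(sensorId = 42, timestamp = 1565715600000L, value = 21.5)
    val text = toText(reading)
    val binary = toBinary(reading)
    println("text:   ${text.toByteArray().size} bytes, $text")
    println("binary: ${binary.size} bytes")
    println("round trip ok: ${fromBinary(binary) == reading}")
}
</code></pre>
<p>The binary form always takes 20 bytes and is read back without parsing any digits, while the text form stays readable for debugging and for crafting test payloads by hand.</p>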
<h2 id="heading-conclusion">Conclusion</h2>
<p>What's the moral of all this? Always the same: "<em>use the right tool for the right job, and beware of irrational fads</em>". It can take lots of reading before one has enough depth of view on a specific matter to make educated decisions, but this investment quickly pays off.</p>
<p>Guessing how well a framework will be supported in the long term, or which protocol/format will win a standardization war, is a different problem, but at least we can have our opinions firmly founded when it comes to purely technical aspects, and this is Gold.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Powerful tools for Elasticsearch data visualization & analysis ]]>
                </title>
                <description>
                    <![CDATA[ By Veronika Rovnik The goal is to turn data into information, and information into insight. ―Carly Fiorina About Kibana Kibana is a piece of data visualization software that provides a browser-based interface for exploring Elasticsearch data and n... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/powerful-tools-for-elasticsearch-data-visualization-analysis/</link>
                <guid isPermaLink="false">66d4617cd14641365a05098d</guid>
                
                    <category>
                        <![CDATA[ big data ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data visualization ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Developer Tools ]]>
                    </category>
                
                    <category>
                        <![CDATA[ elasticsearch ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 13 Aug 2019 17:00:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/08/Copy-of-designing-a-scandinavian-style-home--1--1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Veronika Rovnik</p>
<blockquote>
<p>The goal is to turn data into information, and information into insight.</p>
<p>―Carly Fiorina</p>
</blockquote>
<h1 id="heading-about-kibana">About Kibana</h1>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/Kibana-Color-Lockup.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><a target="_blank" href="https://www.elastic.co/products/kibana/?r=fr4">Kibana</a> is a piece of <strong>data visualization software</strong> that provides a browser-based interface for <em>exploring Elasticsearch data</em> and <em>navigating the Elastic Stack</em> — a collection of open-source products (Elasticsearch, Logstash, Beats, and others).</p>
<p>While Logstash and Beats deliver data to Elasticsearch, <strong>Kibana</strong> <em>opens the window into the Elastic Stack</em>, allowing you to track the <em>health of your cluster</em>, perform <em>log</em> and <em>time-series analysis</em>, detect anomalies in the data with <em>unsupervised machine learning</em>, discover relationships using <em>graphs</em> and, most importantly, extract insights from the Elasticsearch data with <strong>visualizations</strong> that can be combined in a <em>custom interactive dashboard</em>.</p>
<p>Today I’d like to show you how to create a stunning <strong>dashboard</strong> and a tabular <strong>report</strong> based on the Elasticsearch data.</p>
<p>Roll up your sleeves and let’s start!</p>
<h1 id="heading-where-to-start">Where to start</h1>
<p>The <strong>Home</strong> page is the place where everything starts.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/0_fpQgMCmvLqiFhur2.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Here you can decide which actions to take next. The available functionality can be divided into two logical sections:</p>
<ul>
<li><strong>Visualizing</strong> and <strong>exploring</strong> the data. Here you can create a new dashboard, visualization or presentation, build a machine learning model, analyze relationships in your data using <strong>graphs</strong>, and more.</li>
<li><strong>Managing</strong> the <strong>Elastic Stack</strong>: configure your spaces, analyze logs of an application, configure security settings, etc.</li>
</ul>
<p>We’ll focus on the process of creating visualizations and adding them to the dashboard.</p>
<h1 id="heading-how-to-create-a-dashboard-in-kibana">How to create a dashboard in Kibana</h1>
<p>Let me give you a feel for how easy it is to set up a <em>rich dashboard and start reporting.</em></p>
<p>The first essential step to take is to <em>import your data</em> into Kibana. Multiple options for adding data are at your disposal — you can choose the one that works best for you:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/0_sRsqKuv7Ptw0Clt1.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>For demonstration purposes, I’ve selected the sample data.</p>
<p>To design your first data visualizations and combine them into the dashboard, open the <strong>Visualize</strong> page. Here you can create, modify and view the existing visualizations.</p>
<p>What will strike you at once is the abundance of <strong>visualization types</strong> you can choose from.</p>
<p>After you’ve selected the one you need, choose an index pattern as a source so as to inform Kibana about your index. Let’s choose <code>kibana_sample_data_flights</code> and start creating a horizontal bar chart.</p>
<p>Now you can apply a metric aggregation for the Y-axis and a bucket aggregation for the X-axis. Here is a <a target="_blank" href="https://www.elastic.co/guide/en/kibana/7.1/xy-chart.html">list</a> of all available aggregations for charts.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/HorizontalBarChartKibana.gif" alt="Image" width="600" height="400" loading="lazy">
<em>Creating a horizontal bar chart in Kibana</em></p>
<p>Optionally, you can customize the colors of the visualization.</p>
<p><strong>Filtering</strong> is another mighty feature of Elasticsearch and Kibana. It provides a way to visualize only a selected subset of documents.</p>
<p>See how you can apply filters to the fields based on logical conditions:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/FilteringBarChartKibana.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>As you see, Kibana provides a straightforward way of filtering the data via the comfy interface. Along with that, you can choose how to filter the data — either by using the <strong>Kibana Query Language</strong> (a simplified query syntax) or <strong>Lucene</strong>.</p>
<p>To allow end-users to filter the data interactively, you can add <strong>control</strong> widgets — special elements of the dashboard which allow filtering the data simply by clicking them.</p>
<p>Another feature I’d like to highlight is the <strong>advanced filtering by dates</strong> and the ability to set time intervals for refreshing the data in the dashboard.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/0_dO63HLLppucTAw4M.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The good thing is that visualizations are <strong>reusable</strong>. After creating one, you can <strong>save your result</strong>, add it to a dashboard at any time, and <strong>share</strong> it with your colleagues, provided they have access to your Kibana instance.</p>
<p><img src="https://miro.medium.com/max/38/0*sIPxndN5TdA8xOEH?q=20" alt="Image" width="600" height="400" loading="lazy">
<em>Saving a visualization in Kibana</em></p>
<p>After arranging all the visualization elements on a single page, you can export the final dashboard to <strong>PNG</strong> or <strong>PDF</strong> format. This is what makes the dashboards portable — it’s easy to share them across departments in no time.</p>
<p>Let’s look at an example of the dashboard you can create:</p>
<p><img src="https://miro.medium.com/max/38/0*N3TOSp4x8RObP9O-?q=20" alt="Image" width="600" height="400" loading="lazy">
<em>Interacting with the dashboard in Kibana</em></p>
<p>To my mind, the principal features which make each dashboard special are <strong>interactivity</strong> and <strong>expressiveness</strong>. With them, you can communicate business metrics efficiently.</p>
<h1 id="heading-personal-impression">Personal impression</h1>
<p>The visualizations in Kibana do the jobs they are designed for very well. What is more, all the visualizations are <strong>eye-catching</strong> and you can tailor them according to your design ideas. The entire process of creating a dashboard in Kibana is meant to be <em>fast</em> and <em>efficient</em>, and it is, thanks to Kibana’s user-friendly and intuitive interface.</p>
<p>On the other hand, I’ve felt that some functionality is missing here.</p>
<p>When working with data, one of the effective exploratory techniques you can apply is <strong>slicing</strong> and <strong>dicing</strong> your data before getting to know which aspects of the data to pay attention to. To my mind, the data table widget isn’t the best option — it presents the data in a flat table which doesn’t support a multi-dimensional view of the data. But playing with data should be done interactively and fast.</p>
<p>And this is where a <strong>pivot table control</strong> comes into play. After searching for available solutions, my choice fell on one open-source <strong>plugin</strong> called <a target="_blank" href="https://www.flexmonster.com/?r=fr4">Flexmonster</a>. It handles connecting to the <em>Elasticsearch index</em> and allows creating <strong>tabular reports</strong> based on the data from its documents. Along with that, integrating with Kibana is smooth — the only thing required to get started is to install a plugin by running one line of code in the command line. You can find more details on <a target="_blank" href="https://github.com/flexmonster/pivot-kibana">GitHub</a>. Before using it, I recommend making sure that your Kibana and Elasticsearch instances are of the same version.</p>
<p>Once you have set up the tool, you are ready to use all of its features to dig for in-depth insights.</p>
<h1 id="heading-features-for-analytics-and-reporting">Features for analytics and reporting</h1>
<p>Flexmonster Pivot provides fast access to the most essential reporting functionality. Its toolbar allows connecting to the data source, loading previously saved reports, and exporting reports to <strong>PDF</strong>, <strong>Excel</strong>, <strong>HTML</strong>, <strong>CSV</strong>, and images. Besides, I’ve managed to quickly switch between two different modes, the grid and the charts. Cell formatting options include <strong>conditional</strong> and <strong>number formatting</strong>. The field list deserves particular attention: here you can assign hierarchies to rows, columns, measures, and report filters. There is also a <em>search input field</em>, which is helpful if the index has a long list of fields.</p>
<p>One of the features I’d like to highlight is the ability to <strong>drag and drop</strong> the hierarchies right on the grid. Thereby, you can change the slice completely via the UI.</p>
<p>Another one is the <strong>drill-through</strong> feature, which shows which records lie behind the aggregated values.</p>
<h1 id="heading-working-with-a-pivot-table">Working with a pivot table</h1>
<p>Let me show you how to create a report based on the Elasticsearch data:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/08/ReportInKibanaDevTo2.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>While testing the tool, I’ve managed to <em>aggregate</em> and <em>filter</em> the data, <em>sort</em> the values on the grid and save the results to continue working with the report later. Plus, exporting works well — it’s easy to share the reports with teammates.</p>
<h1 id="heading-bringing-it-all-together">Bringing it all together</h1>
<p>Today I’ve covered the benefits Kibana provides for visualizing Elasticsearch data. You’ve seen how dashboards can empower the analysis process.</p>
<p>To my mind, a pivot table is a good tool which enables you to benefit from exploring data before teasing out the answers to complex questions.</p>
<p>Flexmonster nicely complements the available functionality of Kibana: the reports you create with it are insightful, customizable and can be easily shared across departments.</p>
<p>Working together, both tools have all the potential to boost your storytelling. </p>
<p>I encourage you to give such a combination a try.</p>
<h2 id="heading-whats-next">What’s next?</h2>
<ul>
<li><a target="_blank" href="https://www.elastic.co/products/stack/reporting/?r=fr4">Reporting with Kibana</a></li>
<li><a target="_blank" href="https://www.elastic.co/guide/en/kibana/current/createvis.html">Creating a visualization in Kibana</a></li>
<li><a target="_blank" href="https://www.flexmonster.com/demos/connect-elasticsearch/?r=fr4">Pivot Table for Elasticsearch</a></li>
<li><a target="_blank" href="https://www.flexmonster.com/blog/new-pivot-table-for-kibana/?r=fr4">How to add a Pivot Table to Kibana</a></li>
</ul>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to use the Xodus database in Kotlin applications ]]>
                </title>
                <description>
                    <![CDATA[ By Mariya Davydova I want to show you how to use one of my favorite database choices for Kotlin applications. Namely, Xodus. Why do I like using Xodus for Kotlin applications? Well, here are a couple of its selling points: Transactional Embedded Sch... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-the-xodus-database-in-kotlin-applications-3f899896b9df/</link>
                <guid isPermaLink="false">66d46017d14641365a050917</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Kotlin ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 10 Apr 2019 16:12:41 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/0*1jikfdFxD_A5SK6z" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Mariya Davydova</p>
<p>I want to show you how to use one of my favorite database choices for <a target="_blank" href="https://kotlinlang.org/">Kotlin</a> applications. Namely, <a target="_blank" href="https://github.com/JetBrains/xodus">Xodus</a>. Why do I like using Xodus for Kotlin applications? Well, here are a couple of its selling points:</p>
<ul>
<li><strong>Transactional</strong></li>
<li><strong>Embedded</strong></li>
<li><strong>Schema-less</strong></li>
<li><strong>Pure JVM-based</strong></li>
<li>Has an additional <strong>Kotlin DSL</strong> — <a target="_blank" href="https://github.com/JetBrains/xodus-dnq">Xodus-DNQ</a>.</li>
</ul>
<p>What does this mean to you?</p>
<ul>
<li>ACID on-board — all database operations are atomic, consistent, isolated, and durable.</li>
<li>No need to manage an external database — everything is inside your application.</li>
<li>Painless refactorings — if you need to add a couple of properties you won’t have to then rebuild the tables.</li>
<li>Cross-platform database — Xodus can run on any platform that can run a Java virtual machine.</li>
<li>Kotlin language benefits — take the best from using types, nullable values and delegates for properties declaration and constraints description.</li>
</ul>
<p><a target="_blank" href="https://github.com/JetBrains/xodus">Xodus</a> is an open-source product from <a target="_blank" href="https://www.jetbrains.com/">JetBrains</a>. Originally it was developed for internal use, but it was subsequently released to the public back in July 2016. <a target="_blank" href="https://www.jetbrains.com/youtrack">YouTrack issue tracker</a> and <a target="_blank" href="https://www.jetbrains.com/hub/">Hub team tool</a> use it as their data storage. If you are curious about the performance, you can check out the <a target="_blank" href="https://github.com/JetBrains/xodus/wiki/Benchmarks">benchmarks</a>. As for a real-life example, take a look at the <a target="_blank" href="https://youtrack.jetbrains.com/issues">JetBrains YouTrack installation</a>, which at the time of writing has over 1.6 million issues, not even counting all the comments and time tracking entries stored there.</p>
<p><a target="_blank" href="https://github.com/JetBrains/xodus-dnq">Xodus-DNQ</a> is a Kotlin library that contains the data definition language and queries for Xodus. It was also developed first as a part of the product and then later released publicly. YouTrack and Hub both use it for persistent layer definition.</p>
<h3 id="heading-setup">Setup</h3>
<p>Let’s write a small application which stores books and their authors.</p>
<p>I will use Gradle as a build tool, as it simplifies dependency management and project compilation. If you have never worked with Gradle, I recommend taking a look at the official guides on <a target="_blank" href="https://gradle.org/install/">installation</a> and <a target="_blank" href="https://guides.gradle.org/creating-new-gradle-builds/">creating new builds</a>.</p>
<p>So first, we need to start by creating a new directory for our example, and then run <code>gradle init</code> there. This will initialize the project structure and add some directories and build scripts.</p>
<p>Now, create a <code>bookstore.kt</code> file in the <code>src/main/kotlin</code> directory. Fill it with the never-going-out-of-fashion classic:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
  println(<span class="hljs-string">"Hello World"</span>)
}
</code></pre>
<p>Then, update the <code>build.gradle</code> file using code similar to this:</p>
<pre><code class="lang-kotlin">plugins {
  id <span class="hljs-string">'application'</span>
  id <span class="hljs-string">'org.jetbrains.kotlin.jvm'</span> version <span class="hljs-string">'1.3.21'</span>
}
group <span class="hljs-string">'mariyadavydova'</span>
version <span class="hljs-string">'1.0-SNAPSHOT'</span>
sourceCompatibility = <span class="hljs-number">1.8</span>
targetCompatibility = <span class="hljs-number">1.8</span>
tasks.withType(org.jetbrains.kotlin.gradle.tasks.KotlinCompile).all {
  kotlinOptions {
    jvmTarget = <span class="hljs-string">"1.8"</span>
  }
}
repositories {
  mavenCentral()
}
dependencies {
  implementation <span class="hljs-string">'org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.3.21'</span>
  implementation <span class="hljs-string">'org.jetbrains.xodus:dnq:1.2.420'</span>
}
mainClassName = <span class="hljs-string">'BookstoreKt'</span>
</code></pre>
<p>There are a few things that are happening here:</p>
<ol>
<li>We add the Kotlin plugin and claim that the compilation output is targeted for JVM 1.8.</li>
<li>We add dependencies to the Kotlin standard library and Xodus-DNQ.</li>
<li>We also add the application plugin and define the main class. In a Kotlin application, we do not need a class with a static <code>main</code> method, as in Java. Instead, we define a standalone function <code>main</code>. However, under the hood, Kotlin still generates a class containing this function, and the name of the class is derived from the name of the file. For example, <code>bookstore.kt</code> becomes <code>BookstoreKt</code>.</li>
</ol>
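<p>If you would rather choose the generated class name yourself, Kotlin's <code>@file:JvmName</code> annotation lets you do so. The sketch below is an optional variation, not part of the setup used in this article: it assumes you rename the class to <code>Bookstore</code>, in which case <code>mainClassName</code> in <code>build.gradle</code> would have to become <code>'Bookstore'</code> as well. The <code>fileClassName</code> helper is hypothetical and only shows which class the top-level functions end up in.</p>
<pre><code class="lang-kotlin">// Must come first in bookstore.kt, before any package or import lines.
@file:JvmName("Bookstore")

// Hypothetical helper: an anonymous object declared in a top-level function
// is enclosed by the class that Kotlin generated for this file.
fun fileClassName(): String = object {}.javaClass.enclosingClass.name

fun main() {
    println("Hello World")
    println("Running from class ${fileClassName()}")
}
</code></pre>
<p>Without the annotation, the same helper would report the default name derived from the file name.</p>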
<p>We can actually safely remove <code>settings.gradle</code>, as we don’t need it in this example.</p>
<p>Now, execute <code>./gradlew run</code>; you should see “Hello World” in your console:</p>
<pre><code>&gt; Task :run
Hello World
</code></pre><h3 id="heading-data-definition">Data definition</h3>
<p><img src="https://cdn-media-1.freecodecamp.org/images/VQdCPUo-UlHYulNuJGemzF98MzBCfgfsq3k7" alt="Image" width="800" height="469" loading="lazy">
<em>Photo by <a target="_blank" href="https://unsplash.com/@alfonsmc10?utm_source=medium&amp;utm_medium=referral">Alfons Morales</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></em></p>
<p>Xodus provides three different ways to deal with data, namely <a target="_blank" href="https://github.com/JetBrains/xodus/wiki/Environments">Environments</a>, <a target="_blank" href="https://github.com/JetBrains/xodus/wiki/Entity-Stores">Entity Stores</a> and the <a target="_blank" href="https://github.com/JetBrains/xodus/wiki/Virtual-File-Systems">Virtual File System</a>. However, Xodus-DNQ supports only the Entity Stores, which describe a data model as a set of typed entities with named properties (attributes) and named entity links (relations). An entity is similar to a row in an SQL database table.</p>
<p>As my goal is to demonstrate how simple it is to operate Xodus via Kotlin DSL, I’ll stick to the entity types API for this story.</p>
<p>Let’s start with an <code>XdAuthor</code>:</p>
<pre><code><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdAuthor</span>(<span class="hljs-title">entity</span>: <span class="hljs-title">Entity</span>) : <span class="hljs-title">XdEntity</span>(<span class="hljs-title">entity</span>) </span>{
  companion object : XdNaturalEntityType&lt;XdAuthor&gt;()
  <span class="hljs-keyword">var</span> name by xdRequiredStringProp()
  <span class="hljs-keyword">var</span> countryOfBirth by xdStringProp()
  <span class="hljs-keyword">var</span> yearOfBirth by xdRequiredIntProp()
  <span class="hljs-keyword">var</span> yearOfDeath by xdNullableIntProp()
  val books by xdLink0_N(XdBook::authors)
}
</code></pre><p>From my point of view, this declaration looks pretty natural: we say that our authors always have a name and a year of birth, and may have a country of birth and a year of death (the latter is irrelevant for authors who are still alive); also, there could be any number of books from each author in our bookstore.</p>
<p>There are several things worth mentioning in this code snippet:</p>
<ul>
<li>The <code>companion</code> object declares the <code>entityType</code> property for each class (which is used by the database engine).</li>
<li>The data fields are declared with the help of the delegates, which encapsulate the types, properties, and constraints for these fields.</li>
<li>Links are values, not variables; that is, you don’t set them with <code>=</code>, but access them as a collection. (Pay attention to <code>val books</code> versus <code>var name</code>; I spent quite a bit of time trying to figure out why the compilation with <code>var books</code> kept failing.)</li>
</ul>
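<p>If property delegates are new to you, the following plain-Kotlin sketch shows the mechanism in isolation. It is not how Xodus-DNQ implements its delegates: <code>RequiredString</code> and this map-backed <code>Author</code> class are hypothetical and only illustrate how a delegate can bundle storage and a constraint behind an ordinary-looking property.</p>
<pre><code class="lang-kotlin">import kotlin.properties.ReadWriteProperty
import kotlin.reflect.KProperty

// Hypothetical delegate, NOT part of Xodus-DNQ: it keeps the value in a plain map
// and rejects blank strings, mimicking a "required" constraint.
class RequiredString(private val storage: MutableMap&lt;String, String&gt;) :
    ReadWriteProperty&lt;Any?, String&gt; {

    override fun getValue(thisRef: Any?, property: KProperty&lt;*&gt;): String =
        storage[property.name]
            ?: throw IllegalStateException("Required property '${property.name}' is not set")

    override fun setValue(thisRef: Any?, property: KProperty&lt;*&gt;, value: String) {
        require(value.isNotBlank()) { "Property '${property.name}' must not be blank" }
        storage[property.name] = value
    }
}

// Hypothetical entity backed by a map instead of a database.
class Author {
    private val fields = mutableMapOf&lt;String, String&gt;()
    var name: String by RequiredString(fields)
}

fun main() {
    val author = Author()
    author.name = "Charlotte Brontë"
    println(author.name)
    // Assigning a blank name fails the delegate's check.
    println(runCatching { author.name = " " }.isFailure) // prints true
}
</code></pre>
<p>The calling code reads and writes <code>name</code> like any other property, while the delegate decides where the value lives and which values are acceptable.</p>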
<p>The second type is an <code>XdBook</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdBook</span></span>(entity: Entity) : XdEntity(entity) {
  <span class="hljs-keyword">companion</span> <span class="hljs-keyword">object</span> : XdNaturalEntityType&lt;XdBook&gt;()
  <span class="hljs-keyword">var</span> title <span class="hljs-keyword">by</span> xdRequiredStringProp()
  <span class="hljs-keyword">var</span> year <span class="hljs-keyword">by</span> xdNullableIntProp()
  <span class="hljs-keyword">val</span> genres <span class="hljs-keyword">by</span> xdLink1_N(XdGenre)
  <span class="hljs-keyword">val</span> authors : XdMutableQuery&lt;XdAuthor&gt; <span class="hljs-keyword">by</span> xdLink1_N(XdAuthor::books)
}
</code></pre>
<p>The thing to pay attention to here is the declaration of the <code>authors</code> field:</p>
<ul>
<li>Notice that we write down the type explicitly (<code>XdMutableQuery&lt;XdAuthor&gt;</code>). For a bidirectional link, we have to help the compiler resolve the types by leaving a hint on one of the link ends.</li>
<li>Also, notice that <code>XdAuthor::books</code> references <code>XdBook::authors</code> and vice versa. We have to add these references if we want the link to be bidirectional; so if you add an author to the book, the book will appear in the list of the books of this author, and vice versa.</li>
</ul>
<p>The third entity type is an <code>XdGenre</code> enumeration, which is pretty trivial:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdGenre</span></span>(entity: Entity) : XdEnumEntity(entity) {
 <span class="hljs-keyword">companion</span> <span class="hljs-keyword">object</span> : XdEnumEntityType&lt;XdGenre&gt;() {
   <span class="hljs-keyword">val</span> FANTASY <span class="hljs-keyword">by</span> enumField {}
   <span class="hljs-keyword">val</span> ROMANCE <span class="hljs-keyword">by</span> enumField {}
 }
}
</code></pre>
<h3 id="heading-database-initialization">Database initialization</h3>
<p>Now that we have declared the entity types, we have to initialize the database:</p>
<pre><code>fun initXodus(): TransientEntityStore {
  XdModel.registerNodes(
      XdAuthor,
      XdBook,
      XdGenre
  )
  val databaseHome = File(System.getProperty(<span class="hljs-string">"user.home"</span>), <span class="hljs-string">"bookstore"</span>)
  val store = StaticStoreContainer.init(
      dbFolder = databaseHome,
      environmentName = <span class="hljs-string">"db"</span>
  )
  initMetaData(XdModel.hierarchy, store)
  <span class="hljs-keyword">return</span> store
}
fun main() {
  val store = initXodus()
}
</code></pre><p>This code shows the most basic setup:</p>
<ul>
<li>We define the data model. Here we list all entity types manually, but it is possible to <a target="_blank" href="https://jetbrains.github.io/xodus-dnq/meta-model.html">auto scan the classpath</a> as well.</li>
<li>We initialize the database store in the <code>{user.home}/bookstore</code> folder.</li>
<li>We link the metadata with the store.</li>
</ul>
<h3 id="heading-filling-the-data-in">Filling the data in</h3>
<p><img src="https://cdn-media-1.freecodecamp.org/images/X41x19KiXNapMX3lR5wDETktCtpl1prZQiHD" alt="Image" width="800" height="533" loading="lazy">
<em>Photo by <a target="_blank" href="https://unsplash.com/@anniespratt?utm_source=medium&amp;utm_medium=referral">Annie Spratt</a> on <a target="_blank" href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral">Unsplash</a></em></p>
<p>Now that we have initialized the database, it’s time to put something inside. Before doing this, let’s add <code>toString</code> methods to our entity classes. Their only purpose is to allow us to output the database content in a human-readable format.</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdAuthor</span></span>(entity: Entity) : XdEntity(entity) {
  ...
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">toString</span><span class="hljs-params">()</span></span>: String {
    <span class="hljs-keyword">val</span> bibliography = books.asSequence().joinToString(<span class="hljs-string">"\n"</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-string">"<span class="hljs-variable">$name</span> (<span class="hljs-variable">$yearOfBirth</span>-<span class="hljs-subst">${yearOfDeath ?: <span class="hljs-string">"???"</span>}</span>):\n<span class="hljs-variable">$bibliography</span>"</span>
  }
}
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdBook</span></span>(entity: Entity) : XdEntity(entity) {
  ...
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">toString</span><span class="hljs-params">()</span></span>: String {
    <span class="hljs-keyword">val</span> genres = genres.asSequence().joinToString(<span class="hljs-string">", "</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-string">"<span class="hljs-variable">$title</span> (<span class="hljs-subst">${year ?: <span class="hljs-string">"Unknown"</span>}</span>) - <span class="hljs-variable">$genres</span>"</span>
  }
}
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">XdGenre</span></span>(entity: Entity) : XdEnumEntity(entity) {
  ...
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">toString</span><span class="hljs-params">()</span></span>: String {
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">this</span>.name.toLowerCase().capitalize()
  }
}
</code></pre>
<p>Notice the <code>books.asSequence().joinToString("\n")</code> and <code>genres.asSequence().joinToString(", ")</code> calls: here we use the <code>asSequence()</code> method to convert an <code>XdQuery</code> into a Kotlin <code>Sequence</code>.</p>
<p>Right, let’s now add several books from our collection inside the main function. We perform all database operations (creating, reading, updating and removing entities) inside transactions: atomic database modifications that are guaranteed to preserve consistency.</p>
<p>In the case of our bookstore, there are plenty of ways to fill it with stuff:</p>
<ol>
<li>Add an author and a book separately:</li>
</ol>
<pre><code class="lang-kotlin"> <span class="hljs-keyword">val</span> bronte = store.transactional {
   XdAuthor.new {
     name = <span class="hljs-string">"Charlotte Brontë"</span>
     countryOfBirth = <span class="hljs-string">"England"</span>
     yearOfBirth = <span class="hljs-number">1816</span>
     yearOfDeath = <span class="hljs-number">1855</span>
   } 
 }
 store.transactional {
   XdBook.new {
     title = <span class="hljs-string">"Jane Eyre"</span>
     year = <span class="hljs-number">1847</span>
     genres.add(XdGenre.ROMANCE)
     authors.add(bronte)
   }
 }
</code></pre>
<ol start="2">
<li>Add an author and put several books in their list:</li>
</ol>
<pre><code class="lang-kotlin"> <span class="hljs-keyword">val</span> tolkien = store.transactional {
   XdAuthor.new {
     name = <span class="hljs-string">"J. R. R. Tolkien"</span>
     countryOfBirth = <span class="hljs-string">"England"</span>
     yearOfBirth = <span class="hljs-number">1892</span>
     yearOfDeath = <span class="hljs-number">1973</span>
   }
 }
 store.transactional {
   tolkien.books.add(XdBook.new {
     title = <span class="hljs-string">"The Hobbit"</span>
     year = <span class="hljs-number">1937</span>
     genres.add(XdGenre.FANTASY)
   })
   tolkien.books.add(XdBook.new {
     title = <span class="hljs-string">"The Lord of the Rings"</span>
     year = <span class="hljs-number">1955</span>
     genres.add(XdGenre.FANTASY)
   })
 }
</code></pre>
<ol start="3">
<li>Add an author with books:</li>
</ol>
<pre><code class="lang-kotlin"> store.transactional {
   XdAuthor.new {
     name = <span class="hljs-string">"George R. R. Martin"</span>
     countryOfBirth = <span class="hljs-string">"USA"</span>
     yearOfBirth = <span class="hljs-number">1948</span>
     books.add(XdBook.new {
       title = <span class="hljs-string">"A Game of Thrones"</span>
       year = <span class="hljs-number">1996</span>
       genres.add(XdGenre.FANTASY)
     })
   }
 }
</code></pre>
<p>To check that everything is created, all we need to do is to print the content of our database:</p>
<pre><code class="lang-kotlin">store.transactional(readonly = <span class="hljs-literal">true</span>) {
  println(XdAuthor.all().asSequence().joinToString(<span class="hljs-string">"\n***\n"</span>))
}
</code></pre>
<p>Now, if you execute <code>./gradlew run</code>, you should see the following output:</p>
<pre><code>Charlotte Brontë (<span class="hljs-number">1816</span><span class="hljs-number">-1855</span>):
Jane Eyre (<span class="hljs-number">1847</span>) - Romance
***
J. R. R. Tolkien (<span class="hljs-number">1892</span><span class="hljs-number">-1973</span>):
The Hobbit (<span class="hljs-number">1937</span>) - Fantasy
The Lord <span class="hljs-keyword">of</span> the Rings (<span class="hljs-number">1955</span>) - Fantasy
***
George R. R. Martin (<span class="hljs-number">1948</span>-???):
A Game <span class="hljs-keyword">of</span> Thrones (<span class="hljs-number">1996</span>) - Fantasy
</code></pre><h3 id="heading-constraints">Constraints</h3>
<p>As mentioned, transactions guarantee data consistency. One of the operations Xodus performs before saving changes is checking the constraints. In Xodus-DNQ, some constraints are encoded in the name of the delegate which provides a property of a given type. For example, <code>xdRequiredIntProp</code> always has to be set to some value, whereas <code>xdNullableIntProp</code> can remain empty.</p>
<p>Beyond this, Xodus-DNQ lets you define more complex constraints, which are described in the <a target="_blank" href="https://jetbrains.github.io/xodus-dnq/properties.html#simple-property-constraints">official documentation</a>. I have added several examples to the <code>XdAuthor</code> entity type:</p>
<pre><code class="lang-kotlin">  <span class="hljs-keyword">var</span> name <span class="hljs-keyword">by</span> xdRequiredStringProp { containsNone(<span class="hljs-string">"?!"</span>) }
  <span class="hljs-keyword">var</span> countryOfBirth <span class="hljs-keyword">by</span> xdStringProp {
    length(min = <span class="hljs-number">3</span>, max = <span class="hljs-number">56</span>)
    regex(Regex(<span class="hljs-string">"[A-Za-z.,]+"</span>))
  }
  <span class="hljs-keyword">var</span> yearOfBirth <span class="hljs-keyword">by</span> xdRequiredIntProp { max(<span class="hljs-number">2019</span>) }
  <span class="hljs-keyword">var</span> yearOfDeath <span class="hljs-keyword">by</span> xdNullableIntProp { max(<span class="hljs-number">2019</span>) }
</code></pre>
<p>You may be wondering why I have limited the <code>countryOfBirth</code> property length to 56 characters. Well, the longest official country name which I <a target="_blank" href="https://www.worldatlas.com/articles/what-is-the-longest-country-name-in-the-world.html">found</a> is “The United Kingdom of Great Britain and Northern Ireland” — precisely 56 characters!</p>
<h3 id="heading-queries">Queries</h3>
<p>We have already used database queries above. Do you remember? We printed the list of authors using <code>XdAuthor.all().asSequence()</code>. As you may guess, the <code>all()</code> method returns all the entries of a given entity type.</p>
<p>More often than not though, we will prefer filtering data. Here are some examples:</p>
<pre><code class="lang-kotlin">store.transactional(readonly = <span class="hljs-literal">true</span>) {
  <span class="hljs-keyword">val</span> fantasyBooks = XdBook.filter { 
    it.genres contains XdGenre.FANTASY }
  <span class="hljs-keyword">val</span> booksOf20thCentury = XdBook.filter { 
    (it.year ge <span class="hljs-number">1900</span>) and (it.year lt <span class="hljs-number">2000</span>) }
  <span class="hljs-keyword">val</span> authorsFromEngland = XdAuthor.filter { 
    it.countryOfBirth eq <span class="hljs-string">"England"</span> }

  <span class="hljs-keyword">val</span> booksSortedByYear = XdBook.all().sortedBy(XdBook::year)
  <span class="hljs-keyword">val</span> allGenres = XdBook.all().flatMapDistinct(XdBook::genres)
}
</code></pre>
<p>Again, there are plenty of options for building data queries, so I strongly recommend taking a look at the <a target="_blank" href="https://jetbrains.github.io/xodus-dnq/queries.html">documentation</a>.</p>
<p>I hope this story is as useful for you as it was for me when I wrote it :) Any feedback is highly appreciated!</p>
<p>You can find the <a target="_blank" href="https://github.com/mariyadavydova/bookstore-xodus-example">source code</a> for this tutorial here.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ The basics of NoSQL databases — and why we need them ]]>
                </title>
                <description>
                    <![CDATA[ By Nandhini Saravanan A beginner’s guide to the NoSQL world Organizing data is a very difficult task. When we say organise, we are actually categorising stuff depending on its type and function. _[Source](https://bitnine.net/wp-content/uploads/2016/... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/nosql-databases-5f6639ed9574/</link>
                <guid isPermaLink="false">66c35c1acf1314a450f0d72d</guid>
                
                    <category>
                        <![CDATA[ database ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ startup ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Thu, 31 Jan 2019 18:33:53 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/0*e6sondpXX3eeM_Tv" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Nandhini Saravanan</p>
<h4 id="heading-a-beginners-guide-to-the-nosql-world">A beginner’s guide to the NoSQL world</h4>
<p>Organizing data is a very difficult task. When we say organise, we are actually categorising stuff depending on its type and function.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*HYo3nxIVQPPy6mPRJ-RbVw.jpeg" alt="Image" width="640" height="400" loading="lazy">
<em><a href="https://bitnine.net/wp-content/uploads/2016/12/SQL-vs.-NoSQL-Comparative-Advantages-and-Disadvantages.jpg" rel="noopener" target="_blank">Source</a></em></p>
<p>One option is an RDBMS (relational database management system). It is like an Excel sheet: you categorise data in the form of tables, and you can form relationships between the tables.</p>
<p>A <strong><em>query</em></strong> questions the database, which gives you a relevant answer in return. This querying language is <strong>SQL</strong> or <strong>Structured Query Language.</strong></p>
<p>For example,</p>
<pre><code>select * <span class="hljs-keyword">from</span> Employee_Data;
</code></pre><p>selects all the Employee Data from the Employee_Data table.</p>
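<p>To try this query without installing a database server, here is a minimal sketch using Python’s built-in <code>sqlite3</code> module. Only the table name <code>Employee_Data</code> comes from the query above. The columns (<code>id</code>, <code>name</code>, <code>department</code>) and the two sample rows are assumptions made up for illustration.</p>
<pre><code class="lang-python">import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: these columns are illustrative assumptions.
conn.execute(
    "CREATE TABLE Employee_Data (id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO Employee_Data (id, name, department) VALUES (?, ?, ?)",
    [(1, "Alice", "Engineering"), (2, "Bob", "Sales")],
)

# The query from the text: fetch every row and every column.
rows = conn.execute("SELECT * FROM Employee_Data ORDER BY id").fetchall()
for row in rows:
    print(row)
</code></pre>
<p>Each result row comes back as a tuple whose positions follow the column order of the table’s schema.</p>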
<p>Relational databases follow a <a target="_blank" href="https://en.wikipedia.org/wiki/Database_schema"><strong><em>schema</em></strong></a>, a detailed blueprint of how your tables work.</p>
<p>You use Amazon, Facebook and so many networking applications. They release updates, add new functionalities and even extra modules. So how does one change the schema each time? Isn’t it time consuming for such huge companies to devote their time and labour to changing the schema?</p>
<p>This is where <strong>SQL could not work</strong>.</p>
<h3 id="heading-the-cons-of-rdbms">The Cons of RDBMS</h3>
<p>Relational databases aren’t as bad as people portray these days. They are still in use by plenty of organisations. The introduction of NoSQL into the picture is to fill up the spaces where RDBMS can’t be of use anymore.</p>
<p>I am going to show you examples so that you have a clear understanding.</p>
<h4 id="heading-1-rdbms-can-not-handle-data-variety">1. RDBMS can not handle ‘Data Variety’.</h4>
<p>The amount of unstructured data continues to increase yearly and managing it is hard. RDBMS can’t force all types of data under a unified schema of tables.</p>
<p><strong>Data Silos</strong> are also a problem for developers.</p>
<p>According to <a target="_blank" href="https://www.techtarget.com/">Tech Target</a>, a <strong>data silo</strong> is a repository of data that remains under the control of one department. It is isolated from the rest of the organisation.</p>
<p>This means that when more silos exist for the same data, their contents are likely to differ. It creates confusion on which repository represents the most up-to-date version.</p>
<p>The increase of data from the year 2013 to 2020 is visible in the image below.</p>
<blockquote>
<p>About 44 zettabytes of data will be generated in the year 2020.</p>
</blockquote>
<p>Handling such diverse data which aren’t related to each other could be much harder in RDBMS.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*JdDNyv7ujiszKRC2rsYfJw.jpeg" alt="Image" width="595" height="277" loading="lazy">
<em><a href="https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm" rel="noopener" target="_blank">Source</a></em></p>
<p><strong>Example:</strong> It is difficult to store the details of a patient, who has varying body conditions. Categorisation of such diverse data is difficult in RDBMS.</p>
<h4 id="heading-2-difficult-to-change-tables-and-relationships">2. Difficult to change tables and relationships.</h4>
<p>Alteration of the relationships between tables or addition of a new table could affect the existing relations. This means changing the schema.</p>
<p>Change of the schema would be like eliminating the existing one and devising a new schema.</p>
<p>Addition of a new functionality would need all the elements to support the new structure. Change is inevitable.</p>
<p><strong>Example:</strong> In an RDBMS, adding a column changes the table for every existing row, and each of those rows must then hold a value (or NULL) for it. Whereas in <strong>Cassandra</strong> (a NoSQL database), you can add a column to specific row partitions.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*-tu66cPX8XHUkhqQ" alt="Image" width="638" height="479" loading="lazy">
<em>In an RDBMS, every entry should have the same number of columns. But in Cassandra, each row can have a different number of columns. As you can see, 104 has name only whereas 103 has email, name, tel and tel2. Credit: <a href="https://www.slideshare.net/yellow7?utm_campaign=profiletracking&amp;utm_medium=sssite&amp;utm_source=ssslideview" rel="noopener" target="_blank">Markus Klems</a></em></p>
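<p>Here is a rough sketch of that difference. It uses Python’s built-in <code>sqlite3</code> module for the relational side and plain dictionaries as a stand-in for a wide column store. This is not Cassandra’s API. The table name <code>users</code>, the row keys 103 and 104 (borrowed from the caption above), and all values are illustrative assumptions.</p>
<pre><code class="lang-python">import sqlite3

# Relational side: one schema shared by every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(103, "Ann"), (104, "Ben")])

# Adding a column alters the whole table; existing rows get NULL for it.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
relational_rows = conn.execute("SELECT * FROM users ORDER BY id").fetchall()
print(relational_rows)

# Wide column side (modelled with dicts): each row keeps only its own columns.
wide_rows = {
    103: {"name": "Ann", "email": "ann@example.com", "tel": "555-0100"},
    104: {"name": "Ben"},
}
wide_rows[103]["tel2"] = "555-0101"  # add a column to one row only
print(sorted(wide_rows[103]), sorted(wide_rows[104]))
</code></pre>
<p>On the relational side both rows gain an <code>email</code> slot (empty for now). In the dictionary model, row 104 is untouched when row 103 gains <code>tel2</code>.</p>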
<h4 id="heading-3-rdbms-follow-the-acid-properties-of-the-database">3. RDBMS follow the ACID properties of the database.</h4>
<p>The ACID properties of a database are Atomicity, Consistency, Isolation and Durability.</p>
<p><strong>Atomicity</strong> — An “all or nothing” approach. If any statement in the transaction fails, the entire transaction is rolled back.</p>
<p><strong>Consistency —</strong> The transaction must meet all protocols defined by the system. No half completed transactions.</p>
<p><strong>Isolation —</strong> No transaction has access to any other transaction that is in an intermediate or unfinished state. Each transaction is independent.</p>
<p><strong>Durability</strong> — Ensures that once a transaction commits to the database, it is preserved through the use of backups and transaction logs.</p>
<p>The ACID properties aren’t flexible.</p>
<p>For example, RDBMS follows <a target="_blank" href="https://en.wikipedia.org/wiki/Database_normalization"><strong>Normalization</strong></a> or <strong>a single point of truth</strong> concept. For every change you make, you should ensure strict ACID properties. The <a target="_blank" href="https://en.wikipedia.org/wiki/Entity_integrity">entity integrity</a> and <a target="_blank" href="https://en.wikipedia.org/wiki/Referential_integrity">referential integrity</a> rules also apply.</p>
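<p>The “all or nothing” behaviour of atomicity can be seen in a small experiment. The sketch below uses Python’s built-in <code>sqlite3</code> module. The <code>accounts</code> table, the <code>transfer</code> helper, and the simulated crash are all hypothetical, invented only to show a rollback.</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(amount, fail_midway):
    """Move money from alice to bob inside one transaction."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
                (amount,),
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer(30, fail_midway=True)
after_failure = balances()
transfer(30, fail_midway=False)
after_success = balances()
print(after_failure, after_success)
</code></pre>
<p>The failed transfer leaves both balances exactly as they were, because the debit is undone together with the rest of the transaction. Only the second call moves the money.</p>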
<h3 id="heading-the-cap-theorem">The CAP Theorem</h3>
<p>According to <a target="_blank" href="https://en.wikipedia.org/wiki/CAP_theorem">Wikipedia</a>, the <strong>CAP theorem</strong> (Brewer’s theorem) states that it is impossible for a distributed data store to <strong>simultaneously provide more than two</strong> out of the following three guarantees:</p>
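<p><strong>Consistency:</strong> Every read returns the most recent write or an error. Note that this is a narrower notion than the C in ACID.</p>
<p><strong>Availability</strong>: Every request receives a non-error response, even if that response does not reflect the most recent write.</p>
<p><strong>Partition tolerance</strong>: The system keeps working even when the network drops or delays messages between nodes.</p>
<p>A distributed system cannot rule out network partitions, so in practice the compromise is between consistency and availability for as long as a partition lasts.</p>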
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*DXG4bVXnJ6xhHbJe6SmkFA.png" alt="Image" width="437" height="400" loading="lazy">
<em><a href="https://www.dummies.com/wp-content/uploads/423504.image0.jpg" rel="noopener" target="_blank">Source</a></em></p>
<h3 id="heading-base-to-the-rescue">BASE to the rescue!</h3>
<p>NoSQL relies upon a softer model known as <strong>BASE</strong> (<strong>B</strong>asically <strong>A</strong>vailable, <strong>S</strong>oft state, <strong>E</strong>ventual consistency).</p>
<p><strong>Basically Available:</strong> Guarantees the availability of the data. There will be a response to any request, although that response can be a failure.</p>
<p><strong>Soft state</strong>: The state of the system could change over time.</p>
<p><strong>Eventual consistency:</strong> The system will eventually become consistent once it stops receiving input.</p>
<p>NoSQL databases give up the A, C and/or D requirements, and in return they improve scalability.</p>
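<p>Eventual consistency is easier to picture with a toy model. The following sketch is not how any particular database replicates data. It simply assumes three replicas held as dictionaries, a queue of undelivered updates, and a “higher version wins” rule, all of which are invented for illustration.</p>
<pre><code class="lang-python"># Toy model: three replicas, each a dict mapping key to (version, value).
replicas = [{}, {}, {}]
pending = []  # updates that have not reached their target replica yet


def write(replica_index, key, value, version):
    """Apply a write on one replica and queue it for the others."""
    replicas[replica_index][key] = (version, value)
    for other in range(len(replicas)):
        if other != replica_index:
            pending.append((other, key, version, value))


def read(replica_index, key):
    """Return whatever this replica currently holds (possibly stale)."""
    _version, value = replicas[replica_index].get(key, (0, None))
    return value


def sync():
    """Deliver every queued update; the entry with the higher version wins."""
    while pending:
        target, key, version, value = pending.pop()
        current = replicas[target].get(key, (0, None))
        replicas[target][key] = max(current, (version, value), key=lambda pair: pair[0])


write(0, "cart", ["book"], version=1)
stale_read = read(2, "cart")  # replica 2 has not heard about the write yet
sync()
fresh_reads = [read(i, "cart") for i in range(3)]
print(stale_read, fresh_reads)
</code></pre>
<p>Before <code>sync()</code> runs, replica 2 still answers (it is available) but returns nothing for the cart. Once the writes stop and the queue drains, all three replicas agree.</p>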
<h3 id="heading-nosql">NoSQL</h3>
<p>This is where NoSQL comes in. The name stands for <strong>“Not Only SQL”</strong> and refers to “non-relational” databases.</p>
<p>Characteristics of NoSQL:</p>
<ul>
<li>Schema free</li>
<li>Eventually consistent (as in the BASE property)</li>
<li>Replication of data stores to avoid Single Point of Failure.</li>
<li>Can handle Data variety and huge amounts of data.</li>
</ul>
<h3 id="heading-types-of-nosql-databases">Types of NoSQL databases</h3>
<p>NoSQL databases fall into four main categories:</p>
<p><strong>Key value Stores —</strong> Riak, Voldemort, and Redis</p>
<p><strong>Wide Column Stores —</strong> Cassandra and HBase.</p>
<p><strong>Document databases —</strong> MongoDB</p>
<p><strong>Graph databases</strong> — Neo4J and HyperGraphDB.</p>
<p>The names on the right-hand side are examples of each type of NoSQL database.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*k7VI_3bUow1CvXHxSBKaww.jpeg" alt="Image" width="800" height="800" loading="lazy">
<em><a href="https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/nosql-quadrant.jpg" rel="noopener" target="_blank">Source</a></em></p>
<h3 id="heading-1-key-value-stores">1. <strong>Key Value Stores</strong></h3>
<p>A key value store uses a <strong>hash table</strong> in which there exists a <strong>unique key</strong> and a <strong>pointer</strong> to a particular item of data.</p>
<p>Imagine key value stores to be like a phone directory where the names of the individual and their numbers are mapped together.</p>
<p>Key value stores have no default query language. You store, retrieve, and remove data using <em>put, get, and delete</em> commands. This simplicity is the reason they offer <strong>high performance.</strong></p>
<p><strong>Applications</strong>: Useful for storage of comments and session information. Pinterest uses Redis to store lists of users, followers, unfollowers, and boards.</p>
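<p>The get, put, and delete interface is small enough to sketch in a few lines. The class below is an in-memory illustration built on a Python dictionary (itself a hash table). It is not the API of Redis, Riak, or Voldemort, and the class name, method names, and session data are assumptions chosen for this example.</p>
<pre><code class="lang-python">class KeyValueStore:
    """In-memory sketch of a key value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Report whether the key was present before removing it.
        existed = key in self._data
        self._data.pop(key, None)
        return existed


sessions = KeyValueStore()
sessions.put("session:42", {"user": "alice", "cart_items": 3})
found = sessions.get("session:42")
removed = sessions.delete("session:42")
missing = sessions.get("session:42")
print(found, removed, missing)
</code></pre>
<p>Every operation is a single lookup by key. There is no way to ask “which sessions belong to alice?” without scanning everything, which is the trade-off that keeps the model fast.</p>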
<h3 id="heading-2-wide-column-stores"><strong>2. Wide column stores</strong></h3>
<p>In a wide column store, the columns of each row are contained within that row.</p>
<p>Each <strong>column family</strong> is a container of rows, similar to a table in an RDBMS. The <strong>key</strong> identifies a row consisting of multiple columns.</p>
<p>Rows do not need to have the <strong>same number</strong> of columns. Columns can be added to any row at any time without having to add it to other rows. It is a <strong>partitioned row store.</strong></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*wJdqYEIbxD63059_UNWUYA.png" alt="Image" width="800" height="474" loading="lazy">
<em><a href="https://studio3t.com/wp-content/uploads/2017/12/cassandra-column-family-example.png" rel="noopener" target="_blank">Source</a></em></p>
<h4 id="heading-how-does-a-columnar-database-store-data"><strong>How does a columnar database store data?</strong></h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*p9QNl8LCMfluRlqq7SmV4g.png" alt="Image" width="800" height="449" loading="lazy">
<em>How columnar stores store data</em></p>
<p><strong>Applications</strong>: <a target="_blank" href="https://www.spotify.com/"><strong>Spotify</strong></a> uses Cassandra to store user profile attributes and metadata.</p>
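<p>As a generic illustration of the question “How does a columnar database store data?”, the sketch below holds the same records in a row layout and in a column layout. It is only a mental model in plain Python, not the storage format of Cassandra or HBase, and the records are made up.</p>
<pre><code class="lang-python"># The same three records in two layouts. All names and values are made up.
row_layout = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": 2, "name": "Ben", "age": 45},
    {"id": 3, "name": "Cleo", "age": 27},
]

# Column layout: one list per column, positions line up across lists.
column_layout = {
    column: [record[column] for record in row_layout]
    for column in ("id", "name", "age")
}

# Averaging one column only touches that column's list.
ages = column_layout["age"]
average_age = sum(ages) / len(ages)
print(column_layout)
print(average_age)
</code></pre>
<p>Reading a whole record is cheapest in the row layout, while an aggregate over a single column is cheapest in the column layout because the other columns are never read.</p>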
<h3 id="heading-3-document-databases"><strong>3. Document Databases</strong></h3>
<p>Document stores use JSON, XML, or BSON (a binary encoding of JSON) documents to store data.</p>
<p>A document store is like a key-value database, but its values are <strong>semi-structured data</strong>.</p>
<p>A single document stores a record together with all of its data.</p>
<p>A document store typically <strong>does not enforce relations or rely on joins</strong> the way an RDBMS does.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*65P8gE1JkCGgZDHWhVt3gg.png" alt="Image" width="800" height="430" loading="lazy">
<em>An example of a JSON document. <a href="https://webassets.mongodb.com/_com_assets/cms/JSON_Example_Python_MongoDB-mzqqz0keng.png" rel="noopener" target="_blank">Source</a></em></p>
<p>If we want to store the customer details and their orders, we can use document stores to do it.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/0*MrhoMn_ewuvDcbOO.png" alt="Image" width="738" height="607" loading="lazy">
<em>The Customer database is stored as a set of documents (which can be JSON) that is mapped to the Orders database. Source: <a href="https://blogs.msdn.microsoft.com/usisvde/2012/04/05/getting-acquainted-with-nosql-on-windows-azure/" rel="noopener" target="_blank">MSDN Microsoft Blog</a></em></p>
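<p>One common way to model customers and their orders in a document store is to embed the orders inside the customer document. The sketch below shows that idea with Python’s built-in <code>json</code> module. The field names and values are assumptions for illustration, and embedding is only one option; documents can also reference each other by id, as in the figure above.</p>
<pre><code class="lang-python">import json

# One customer document with its orders embedded. Field names are illustrative.
customer_doc = {
    "_id": "customer-1",
    "name": "Alice",
    "orders": [
        {"order_id": "A-100", "items": ["keyboard"], "total": 49.0},
        {"order_id": "A-101", "items": ["monitor", "cable"], "total": 180.5},
    ],
}

# Documents are serialised (here as JSON text) when they are stored.
stored = json.dumps(customer_doc)

# Reading the customer back returns the orders too, with no join needed.
loaded = json.loads(stored)
order_total = sum(order["total"] for order in loaded["orders"])
print(loaded["name"], len(loaded["orders"]), order_total)
</code></pre>
<p>Because the whole record travels as one document, a single read is enough to show a customer page. The cost is that an order embedded in this way cannot be shared between two customers without duplicating it.</p>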
<p>Applications: <a target="_blank" href="https://www.sega.com/games"><strong>SEGA</strong></a> uses MongoDB to handle 11 million in-game accounts.</p>
<h3 id="heading-4-graph-databases"><strong>4. Graph databases</strong></h3>
<p>Nodes and relationships are the essential constituents of graph databases. A <strong>node represents an entity.</strong> A <strong>relationship</strong> represents how two nodes are associated.</p>
<p>In an RDBMS, adding another relation results in a lot of schema changes.</p>
<p>A graph database only needs to store the data (nodes) once. The different types of relationships (edges) are then attached to the stored data.</p>
<p>The relationships between the nodes are predetermined, that is, they are not determined at query time.</p>
<p>Traversing <strong>persisted relationships</strong> is faster than computing joins at query time.</p>
<p>It is difficult to change a relation between two nodes. It would result in regressive changes in the database.</p>
<p><strong>Example</strong>: The image below shows how <strong>MySQL</strong> has to perform many operations to find the correct result for Alice.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*0MjP3w9EuC6AK6JOt2FsoA.png" alt="Image" width="800" height="424" loading="lazy">
<em><a href="https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/from_relational_model.png" rel="noopener" target="_blank">Source</a></em></p>
<p><strong>A graph database</strong>, by contrast, <strong>predetermines relationships:</strong></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*q2s8IGzh-dF-A_qE5HCT5g.png" alt="Image" width="800" height="276" loading="lazy">
<em><a href="https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/relational_to_graph.png" rel="noopener" target="_blank">Source</a></em></p>
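<p>The idea of stored, predetermined relationships can be sketched with an adjacency list. This is plain Python rather than a Neo4j or HyperGraphDB query. The node names, the <code>WORKS_IN</code> relationship type, and the <code>neighbours</code> helper are assumptions made up for this example.</p>
<pre><code class="lang-python"># Each node stores its outgoing relationships directly as (type, target) pairs.
graph = {
    "Alice": [("WORKS_IN", "Engineering"), ("WORKS_IN", "Research")],
    "Bob": [("WORKS_IN", "Sales")],
    "Engineering": [],
    "Research": [],
    "Sales": [],
}


def neighbours(node, relationship):
    """Follow the stored edges of one type from a node; no table scan or join."""
    return [target for kind, target in graph[node] if kind == relationship]


alice_departments = neighbours("Alice", "WORKS_IN")
print(alice_departments)
</code></pre>
<p>Finding Alice’s departments only walks the edges already attached to her node. In the relational version, the same answer requires matching rows across a person table, a join table, and a department table at query time.</p>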
<p>This is some of the basic information you will need to start exploring NoSQL. New databases are being invented for specific uses.</p>
<p>Learn the type of data your application generates, and then it is easy to choose the right database.</p>
<h4 id="heading-i-write-stories-on-life-lessons-coding-and-technology-to-read-more-follow-me-on-twitterhttpstwittercomsnandhini98-and-mediumhttpmediumcomnandhus05">I write stories on Life Lessons, Coding and Technology. To read more, follow me on <a target="_blank" href="https://twitter.com/snandhini98">Twitter</a> and <a target="_blank" href="http://medium.com/@nandhus05">Medium.</a></h4>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to get started with MongoDB in 10 minutes ]]>
                </title>
                <description>
                    <![CDATA[ By Navindu Jayatilake MongoDB is a rich document-oriented NoSQL database. If you are a complete beginner to NoSQL, I recommend you to have a quick look at my NoSQL article published previously. Today, I wanted to share some of the basic stuff about M... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/learn-mongodb-a4ce205e7739/</link>
                <guid isPermaLink="false">66c359c57ef110ecbf367b46</guid>
                
                    <category>
                        <![CDATA[ MongoDB ]]>
                    </category>
                
                    <category>
                        <![CDATA[ NoSQL ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Productivity ]]>
                    </category>
                
                    <category>
                        <![CDATA[ General Programming ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sun, 27 Jan 2019 05:14:21 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*Ta4qktHtO--RMUpnR08mBg.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Navindu Jayatilake</p>
<p>MongoDB is a rich document-oriented NoSQL database.</p>
<p>If you are a complete beginner to NoSQL, I recommend you to have a quick look at my <a target="_blank" href="https://medium.com/@navindushane/say-no-to-sql-ab1e49aa7299">NoSQL article</a> published previously.</p>
<p>Today, I wanted to share some of the basic stuff about MongoDB commands such as querying, filtering data, deleting, updating and so on.</p>
<p><strong>Okay, enough of the talk, let’s get to work!</strong></p>
<h2 id="heading-project-configuration">Project Configuration</h2>
<p>In order to work with MongoDB, first you need to install MongoDB on your computer. To do this, visit <a target="_blank" href="https://www.mongodb.com/download-center/community">the official download center</a> and download the version for your specific OS. Here, I’ve used Windows.</p>
<p>After downloading MongoDB community server setup, you’ll go through a ‘next after next’ installation process. Once done, head over to the C drive in which you have installed MongoDB. Go to program files and select the MongoDB directory.</p>
<pre><code>C: -&gt; Program Files -&gt; MongoDB -&gt; Server -&gt; <span class="hljs-number">4.0</span>(version) -&gt; bin
</code></pre><p>In the bin directory, you’ll find an interesting couple of executable files.</p>
<ul>
<li>mongod</li>
<li>mongo</li>
</ul>
<p>Let’s talk about these two files.</p>
<p><code>mongod</code> stands for “Mongo Daemon”. mongod is a background process used by MongoDB. The main purpose of mongod is to manage all the MongoDB server tasks. For instance, accepting requests, responding to client, and memory management.</p>
<p><code>mongo</code> is a command line shell that clients (for example, system administrators and developers) use to interact with the MongoDB server.</p>
<p>Now let’s see how we can get this server up and running. To do that on Windows, first you need to create a couple of directories in your C drive. Open up your command prompt inside your C drive and do the following:</p>
<pre><code>C:\&gt; mkdir data
C:\&gt; cd data
C:\data&gt; mkdir db
</code></pre><p>MongoDB requires a folder in which to store all of its data. Its default data directory path is <code>\data\db</code> on the drive from which you start <code>mongod</code>, so we need to create those directories ourselves.</p>
<p>If you start the MongoDB server without those directories, you’ll probably see this following error:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/r04FRmRGqKUaclGh4ZDo3YsMwOlXMVm2T3bJ" alt="Image" width="800" height="275" loading="lazy">
<em>trying to start mongodb server without \data\db directories</em></p>
<p>After creating those two directories, head back to the bin folder inside your MongoDB directory and open a shell there. Run the following command:</p>
<pre><code>mongod
</code></pre><p>Voilà! Now our MongoDB server is up and running!</p>
<p>In order to work with this server, we need a mediator. So open another command window inside the bin folder and run the following command:</p>
<pre><code>mongo
</code></pre><p>After running this command, go back to the shell in which we ran the <code>mongod</code> command (our server). You’ll see a ‘connection accepted’ message at the end. That means our installation and configuration were successful!</p>
<p>To confirm, run the following in the mongo shell:</p>
<pre><code>db
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/TK2JGg4JXAj0eG9JBzl89ABEF3JuKAwnw2dx" alt="Image" width="561" height="141" loading="lazy">
<em>initially you have a db called ‘test’</em></p>
<h3 id="heading-how-to-set-up-environment-variables">How to Set Up Environment Variables</h3>
<p>To save time, you can set up your environment variables. In Windows, this is done by following the menus below:</p>
<pre><code>Advanced System Settings -&gt; Environment Variables -&gt; Path(Under System Variables) -&gt; Edit
</code></pre><p>Add the path of our bin folder as a new entry and hit OK! In my case it’s <code>C:\Program Files\MongoDB\Server\4.0\bin</code>.</p>
<p>Now you’re all set!</p>
<h2 id="heading-how-to-work-with-mongodb">How to Work with MongoDB</h2>
<p>There are a number of GUIs (Graphical User Interfaces) for working with a MongoDB server, such as MongoDB Compass, Studio 3T and so on.</p>
<p>They provide a graphical interface so you can easily work with your database and perform queries instead of using a shell and typing queries manually.</p>
<p>But in this article we’ll be using command prompt to do our work.</p>
<p>Now it’s time for us to dive into the MongoDB commands that you can use in your future projects.</p>
<ol>
<li><p>Open up your command prompt and type <code>mongod</code> to start the MongoDB server.</p>
</li>
<li><p>Open up another shell and type <code>mongo</code> to connect to MongoDB database server.</p>
</li>
</ol>
<h3 id="heading-1-find-the-current-database-youre-in">1. Find the current database you’re in</h3>
<pre><code>db
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/o6puQoPSpGCW8-AgizHzAv3Qpywtzsgwd26N" alt="Image" width="561" height="126" loading="lazy"></p>
<p>This command will show the current database you are in. <code>test</code> is the initial database that comes by default.</p>
<h3 id="heading-2-list-databases">2. List databases</h3>
<pre><code>show databases
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/Q-G8NzP5OAXh0Y3OfdOtqFxlFG-tLErPlPSi" alt="Image" width="562" height="205" loading="lazy"></p>
<p>I currently have four databases. They are: <code>CrudDB</code>, <code>admin</code>, <code>config</code> and <code>local</code>.</p>
<h3 id="heading-3-go-to-a-particular-database">3. Go to a particular database</h3>
<pre><code>use &lt;your_db_name&gt;
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/UIRueBuX-r6nRXA-qd6Uv95IBd0UbhVvMZtZ" alt="Image" width="219" height="100" loading="lazy"></p>
<p>Here I’ve moved to the <code>local</code> database. You can check this if you try the command <code>db</code> to print out the current database name.</p>
<h3 id="heading-4-create-a-database">4. Create a Database</h3>
<p>With RDBMS (Relational Database Management Systems) we have databases, tables, rows and columns.</p>
<p>But in MongoDB, a NoSQL database, data is stored as documents in BSON format (a binary version of JSON). Documents are grouped into structures called “collections”.</p>
<p>Collections are roughly the equivalent of tables in SQL databases, and documents are the equivalent of rows.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/e7ygVKXaPcqcqCyvurAeUzAbmmREoA6p72V2" alt="Image" width="650" height="450" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/oxeGaPqbZ2pmmZx3WcDo8CXIL4J09PbecBWW" alt="Image" width="702" height="362" loading="lazy">
<em>SQL terms and NoSQL terms by <a href="https://www.blogger.com/profile/18437865869379626284" rel="noopener" target="_blank" title="author profile">Victoria Malaya</a></em></p>
<p>Alright, let’s talk about how we create a database in the Mongo shell.</p>
<pre><code>use &lt;your_db_name&gt;
</code></pre><p>Wait, we had this command before! Why am I using it again?!</p>
<p>If the database already exists, this command switches to it.</p>
<p>If the database does not exist yet, the shell still switches to it, and MongoDB creates the database as soon as you store data in it.</p>
<p>This is why running the <code>show databases</code> command right after <code>use</code> will not list your new database: it does not appear in the list until it contains at least one document.</p>
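<p>To see this for yourself, here is a short sketch you could try in the shell. The database name <code>demoDB</code> is just a placeholder made up for illustration:</p>
<pre><code>// Switch to a database that does not exist yet
use demoDB
// demoDB is not listed at this point, because it has no data
show databases
// Storing the first document creates both the collection and the database
db.myCollection.insertOne({"name": "john"})
// Now demoDB appears in the list
show databases
</code></pre>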
<h3 id="heading-5-create-a-collection">5. Create a Collection</h3>
<p>Navigate into your newly created database with the <code>use</code> command.</p>
<p>Actually, there are two ways to create a collection. Let’s see both.</p>
<p>One way is to insert data into the collection:</p>
<pre><code>db.myCollection.insert({<span class="hljs-string">"name"</span>: <span class="hljs-string">"john"</span>, <span class="hljs-string">"age"</span> : <span class="hljs-number">22</span>, <span class="hljs-string">"location"</span>: <span class="hljs-string">"colombo"</span>})
</code></pre><p>If <code>myCollection</code> does not exist yet, this command creates it and then inserts a document with <code>name</code>, <code>age</code> and <code>location</code> fields. Collections created this way are non-capped.</p>
<p>The second way is shown below:</p>
<p>2.1 Creating a Non-Capped Collection</p>
<pre><code>db.createCollection(<span class="hljs-string">"myCollection"</span>)
</code></pre><p>2.2 Creating a Capped Collection</p>
<pre><code>db.createCollection(<span class="hljs-string">"mySecondCollection"</span>, {<span class="hljs-attr">capped</span> : <span class="hljs-literal">true</span>, <span class="hljs-attr">size</span> : <span class="hljs-number">2</span>, <span class="hljs-attr">max</span> : <span class="hljs-number">2</span>})
</code></pre><p>In this way, you’re going to create a collection without inserting data.</p>
<p>A “capped collection” is a fixed-size collection: once it reaches its size or document limit, the oldest documents are overwritten to make room for new ones.</p>
<p>In this example, I have enabled capping by setting <code>capped</code> to <code>true</code>.</p>
<p>The <code>size : 2</code> option sets the maximum size of the collection in bytes (not megabytes; MongoDB raises very small values like this one to its minimum capped-collection size), and <code>max : 2</code> sets the maximum number of documents to two.</p>
<p>Now if you try to insert more than two documents into <code>mySecondCollection</code> and use the <code>find</code> command (which we will talk about soon), you’ll only see the two most recently inserted documents. Keep in mind that the older documents are not just hidden: a capped collection removes its oldest documents once the limit is reached.</p>
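<p>Here is a minimal sketch of what happens when the limit is exceeded, assuming <code>mySecondCollection</code> was created with the command above. The <code>n</code> field is only a placeholder for this example:</p>
<pre><code>db.mySecondCollection.insertOne({"n": 1})
db.mySecondCollection.insertOne({"n": 2})
// The collection already holds its maximum of two documents,
// so this insert pushes out the oldest one ({"n": 1})
db.mySecondCollection.insertOne({"n": 3})
// Confirms that the collection is capped
db.mySecondCollection.isCapped()
// Lists only the documents with n: 2 and n: 3
db.mySecondCollection.find()
</code></pre>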
<h4 id="heading-6-inserting-data"><strong>6. Inserting Data</strong></h4>
<p>We can insert data to a new collection, or to a collection that has been created before.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/uO4agHbI85kMJrQmF1L9pMmhn0WcgngmoPsI" alt="Image" width="605" height="698" loading="lazy">
<em>ways data can be stored in a JSON</em></p>
<p>There are three methods of inserting data.</p>
<ol>
<li><code>insertOne()</code> is used to insert a single document only.</li>
<li><code>insertMany()</code> is used to insert more than one document.</li>
<li><code>insert()</code> accepts either a single document or an array of documents. It is deprecated in newer shells in favor of the two methods above.</li>
</ol>
<p>Below are some examples:</p>
<ul>
<li><strong>insertOne()</strong></li>
</ul>
<pre><code>db.myCollection.insertOne(
  {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"navindu"</span>, 
    <span class="hljs-string">"age"</span>: <span class="hljs-number">22</span>
  }
)
</code></pre><ul>
<li><strong>insertMany()</strong></li>
</ul>
<pre><code>db.myCollection.insertMany([
  {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"navindu"</span>, 
    <span class="hljs-string">"age"</span>: <span class="hljs-number">22</span>
  },
  {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"kavindu"</span>, 
    <span class="hljs-string">"age"</span>: <span class="hljs-number">20</span>
  },

  {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"john doe"</span>, 
    <span class="hljs-string">"age"</span>: <span class="hljs-number">25</span>,
    <span class="hljs-string">"location"</span>: <span class="hljs-string">"colombo"</span>
  }
])
</code></pre><p>The <code>insert()</code> method is similar to the <code>insertMany()</code> method.</p>
<p>Also, notice that we added a new property called <code>location</code> to the document for <code>john doe</code>. If you run <code>find</code>, you’ll see that only the <code>john doe</code> document has the <code>location</code> property.</p>
<p>This is one of the advantages of NoSQL databases such as MongoDB: documents in the same collection do not need to share the same set of fields, so the schema can evolve as your application grows.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/QyCgwWUHWoporNunUvoRgdVry-x0QyA8qSxd" alt="Image" width="403" height="40" loading="lazy">
<em>Successfully inserted data</em></p>
<h4 id="heading-7-querying-data"><strong>7. Querying Data</strong></h4>
<p>Here’s how you can query all data from a collection:</p>
<pre><code>db.myCollection.find()
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/rzcViLqDrTy5gqSFoY6n3N7dciNxFTY62eRL" alt="Image" width="578" height="64" loading="lazy">
<em>result</em></p>
<p>If you want to see this data in a cleaner way, just add <code>.pretty()</code> to the end of it. This will display the documents in pretty-printed JSON format.</p>
<pre><code>db.myCollection.find().pretty()
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/gMIbpNqjr9jmJ3YVDZruX1skX0PCvSruuZWB" alt="Image" width="406" height="172" loading="lazy">
<em>result</em></p>
<p>Wait...In these examples did you just notice something like <code>_id</code>? How did that get there?</p>
<p>Well, whenever you insert a document, MongoDB automatically adds an <code>_id</code> field which uniquely identifies each document. If you do not want it to be displayed, pass a projection as the second argument to <code>find</code>:</p>
<pre><code>db.myCollection.find({}, {<span class="hljs-attr">_id</span>: <span class="hljs-number">0</span>}).pretty()
</code></pre><p>Next, we’ll look at filtering data.</p>
<p>If you want to display only specific documents, pass a filter that specifies the field values they must match.</p>
<pre><code>db.myCollection.find(
  {
    <span class="hljs-attr">name</span>: <span class="hljs-string">"john"</span>
  }
)
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/TiBBNNp9gmxtPXaHd5BSZ7MkSrv1JkRzkMI1" alt="Image" width="710" height="44" loading="lazy">
<em>result</em></p>
<p>Let’s say you want only to display people whose age is less than 25. You can use <code>$lt</code> to filter for this.</p>
<pre><code>db.myCollection.find(
  {
    <span class="hljs-attr">age</span> : {<span class="hljs-attr">$lt</span> : <span class="hljs-number">25</span>}
  }
)
</code></pre><p>Similarly, <code>$gt</code> stands for greater than, <code>$lte</code> is “less than or equal to”, <code>$gte</code> is “greater than or equal to” and <code>$ne</code> is “not equal”.</p>
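<p>These operators can also be combined on the same field. A small sketch, assuming the documents inserted earlier in <code>myCollection</code>:</p>
<pre><code>// People aged 20 to 25, inclusive
db.myCollection.find({age: {$gte: 20, $lte: 25}})
// People whose location is anything other than "colombo".
// Note that $ne also matches documents that have no location field at all
db.myCollection.find({location: {$ne: "colombo"}})
</code></pre>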
<h4 id="heading-8-updating-documents"><strong>8. Updating documents</strong></h4>
<p>Let’s say you want to update someone’s address or age. How could you do it? Well, see the next example:</p>
<pre><code>db.myCollection.update({<span class="hljs-attr">age</span> : <span class="hljs-number">20</span>}, {<span class="hljs-attr">$set</span>: {<span class="hljs-attr">age</span>: <span class="hljs-number">23</span>}})
</code></pre><p>The first argument is a filter that selects the document you want to update, and the second describes the change to make. Here, I filter on <code>age</code> for simplicity. In a production environment, you would use something like the <code>_id</code> field.</p>
<p>It is always better to use something like <code>_id</code> to target a single document. This is because several documents can have the same <code>age</code> and <code>name</code>. By default, <code>update()</code> changes only the first document that matches the filter, so with a non-unique filter you cannot be sure which document is modified (and with the <code>multi: true</code> option, every matching document is changed).</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/qQH53vM6-peOzS-z9k5YjMoS9R2z1APJrXvB" alt="Image" width="674" height="95" loading="lazy">
<em>result</em></p>
<p>If you update a document this way with a new property, let’s say <code>location</code> for example, the document will be updated with the new attribute. And if you do a <code>find</code>, then the result will be:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/YqJpPAw7d5NPSTzStCevUmgoDTm6FkgPLZ-7" alt="Image" width="516" height="76" loading="lazy">
<em>result</em></p>
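<p>If you want to be explicit about how many documents an update may touch, the shell also provides <code>updateOne()</code> and <code>updateMany()</code>. Below is a rough sketch; the <code>group</code> field and its value are made up for illustration:</p>
<pre><code>// Changes at most one document: the first one that matches the filter
db.myCollection.updateOne({name: "navindu"}, {$set: {location: "colombo"}})
// Changes every document that matches the filter
db.myCollection.updateMany({age: {$lt: 25}}, {$set: {group: "under-25"}})
</code></pre>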
<p>If you need to remove a property from a single document, you could do something like this (let’s say you want <code>age</code> to be gone):</p>
<pre><code>db.myCollection.update({<span class="hljs-attr">name</span>: <span class="hljs-string">"navindu"</span>}, {<span class="hljs-attr">$unset</span>: {<span class="hljs-attr">age</span>: <span class="hljs-string">""</span>}});
</code></pre><h4 id="heading-9-removing-a-document"><strong>9. Removing a document</strong></h4>
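<p>As I mentioned earlier, when you update or delete a document, it is safer to filter on <code>_id</code> rather than on <code>name</code>, <code>age</code> or <code>location</code>. The example below filters on <code>name</code> for simplicity, so it removes every document whose <code>name</code> is <code>navindu</code>.</p>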
<pre><code>db.myCollection.remove({<span class="hljs-attr">name</span>: <span class="hljs-string">"navindu"</span>});
</code></pre><h4 id="heading-10-removing-a-collection"><strong>10. Removing a collection</strong></h4>
<pre><code>db.myCollection.remove({});
</code></pre><p>Note that this is not the same as the <code>drop()</code> method. Calling <code>remove({})</code> deletes all the documents inside a collection but keeps the collection itself (along with its indexes), whereas <code>drop()</code> deletes the documents along with the collection itself.</p>
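<p>To make the difference concrete, here is a short sketch using <code>myCollection</code> from the earlier examples. It uses <code>deleteMany()</code>, which newer shells recommend over <code>remove()</code>:</p>
<pre><code>// Deletes every document, but the (now empty) collection still exists
db.myCollection.deleteMany({})
// myCollection is still listed
show collections
// Removes the collection itself, including its indexes
db.myCollection.drop()
// myCollection is no longer listed
show collections
</code></pre>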
<h3 id="heading-logical-operators">Logical Operators</h3>
<p>MongoDB provides logical operators. The picture below summarizes the different types of logical operators.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/xO27jGeclafiAUt0a0VYRifhDpISvZcIkhRD" alt="Image" width="659" height="223" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/VsHbrchxUETWqCFhZc6QvmSPUdrbfOHYEH3L" alt="Image" width="745" height="255" loading="lazy">
<em>reference: MongoDB manual</em></p>
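<p>As a quick illustration of one of these operators, here is a sketch of <code>$or</code>, which matches documents that satisfy at least one of the listed conditions. It assumes the documents inserted earlier in <code>myCollection</code>:</p>
<pre><code>// People who are younger than 21 or who live in Colombo (or both)
db.myCollection.find({$or: [{age: {$lt: 21}}, {location: "colombo"}]})
</code></pre>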
<p>Let’s say you want to display people whose age is less than 25 and whose location is Colombo. What could we do?</p>
<p>We can use the <code>$and</code> operator!</p>
<pre><code>db.myCollection.find({<span class="hljs-attr">$and</span>:[{<span class="hljs-attr">age</span> : {<span class="hljs-attr">$lt</span> : <span class="hljs-number">25</span>}}, {<span class="hljs-attr">location</span>: <span class="hljs-string">"colombo"</span>}]});
</code></pre><p>Last but not least, let’s talk about aggregation.</p>
<h3 id="heading-aggregation">Aggregation</h3>
<p>A quick reminder on what we learned about aggregation functions in SQL databases:</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/JHcuA7YLBiFiCBn1QiOS8NYCUELbGg-LKDSN" alt="Image" width="800" height="533" loading="lazy">
<em>aggregation functions in SQL databases. ref : Tutorial Gateway</em></p>
<p>Simply put, aggregation groups values from multiple documents and summarizes them in some way.</p>
<p>Imagine we had male and female students in a <code>recordBook</code> collection and we wanted a total count of each. To count the males and females, we could use the <code>$group</code> aggregation stage with the <code>$sum</code> accumulator.</p>
<pre><code>db.recordBook.aggregate([
  {
    <span class="hljs-attr">$group</span> : {<span class="hljs-attr">_id</span> : <span class="hljs-string">"$gender"</span>, <span class="hljs-attr">result</span>: {<span class="hljs-attr">$sum</span>: <span class="hljs-number">1</span>}}
  }  
]);
</code></pre><p><img src="https://cdn-media-1.freecodecamp.org/images/NeK7Wx3lQ1AaUhGD1VERqmaluAl9qrsXpDMs" alt="Image" width="527" height="61" loading="lazy">
<em>result</em></p>
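<p>Stages can be chained, with each stage receiving the output of the previous one. The sketch below assumes that each <code>recordBook</code> document also has a numeric <code>age</code> field, which the original example does not show:</p>
<pre><code>db.recordBook.aggregate([
  // Keep only students aged 18 or older
  {$match: {age: {$gte: 18}}},
  // Count the remaining students per gender and average their ages
  {$group: {_id: "$gender", count: {$sum: 1}, averageAge: {$avg: "$age"}}},
  // Show the largest group first
  {$sort: {count: -1}}
]);
</code></pre>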
<h4 id="heading-wrapping-up">Wrapping up</h4>
<p>So, we have discussed the basics of MongoDB that you might need in the future to build an application. I hope you enjoyed this article – thanks for reading!</p>
<p>If you have any questions about this tutorial, feel free to leave a comment below or contact me on <a target="_blank" href="https://www.facebook.com/navinduuu">Facebook</a>, <a target="_blank" href="https://twitter.com/NavinduJay">Twitter</a> or <a target="_blank" href="https://www.instagram.com/iamnavindu/">Instagram</a>.</p>
<p>See you guys in the next article! ❤️ ✌️</p>
<p>Link to my previous article: <a target="_blank" href="https://medium.com/@navindushane/say-no-to-sql-ab1e49aa7299">NoSQL</a></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
