What makes Node.js so performant and scalable? Why is Node the technology of choice for so many companies? In this article, we will answer these questions and look at some of the advanced concepts that make Node.js unique. We will discuss:

  1. Event Loops ➰
  2. Concurrency Models 🚈
  3. Child Processes 🎛️
  4. Threads and Worker Threads 🧵

JavaScript developers with a deeper understanding of Node.js reportedly earn 20-30% more than their peers. If you are looking to grow your knowledge of Node.js then this blog post is for you. Let’s dive in 🤿!!

What happens when you run a Node.js Program?

When we run our Node.js app it creates

  • 1 Process 🤖
  • 1 Thread 🧵
  • 1 Event Loop ➰

A process is an executing program or a part of an executing program. An application can be made up of many processes. The Node.js runtime, however, starts only one process.

A thread is a basic unit to which the operating system allocates processor time. Think of threads as a unit that lets you use part of your processor.

An event loop is a continuously running loop (just like a while loop). It executes one task at a time (more on this later). For now, let’s think of it as a while loop that will run until Node has executed every line of code.

Now, let’s take a look at how our code runs inside a Node.js instance.
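
As a rough mental model only (this is not how Node or libuv is actually implemented), the “while loop” idea can be sketched with a plain array acting as the queue. The names taskQueue, enqueue, and runToyLoop are made up for this illustration.

```javascript
// Toy model of an event loop: NOT Node's real implementation, just the idea.
const taskQueue = []; // hypothetical queue of pending tasks
const log = [];       // records the order in which tasks ran

function enqueue(task) {
  taskQueue.push(task);
}

function runToyLoop() {
  // Keep looping until there is nothing left to run.
  while (taskQueue.length > 0) {
    const task = taskQueue.shift(); // take the oldest task
    task();                         // run it to completion, one at a time
  }
}

enqueue(() => log.push('Task 1'));
enqueue(() => {
  log.push('Task 2');
  // A task may schedule more work; the loop picks it up after the tasks already queued.
  enqueue(() => log.push('Task 2 follow-up'));
});
enqueue(() => log.push('Task 3'));

runToyLoop();
console.log(log);
```

The follow-up task runs last because it joins the back of the queue, which is the same reason callbacks in real Node code run after the code that scheduled them.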

console.log('Task 1');
console.log('Task 2');
// some time consuming for loop
for(let i = 0; i < 1000000000; i++) {
}
console.log('Task 3');

What happens when we run this code? It will first print Task 1, then Task 2, and then it will run the time-consuming for loop (we won’t see anything in the terminal for a couple of seconds), and finally it will print Task 3.

Let’s look at a diagram of what’s actually happening.

[Diagram: demonstration of the event queue in Node.js]

Node puts all our tasks into an events queue and sends them one by one to the event loop. The event loop is single-threaded and can only run one thing at a time. So it goes through Task 1 and Task 2, then the very big for loop, and then Task 3. This is why we see a pause in the terminal after Task 2: the event loop is busy running the for loop.
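
One way to see this blocking for ourselves is to schedule a timer and then keep the thread busy. The sketch below assumes nothing beyond built-in timers; blockFor is a made-up helper that busy-waits, standing in for the big for loop.

```javascript
// A timer scheduled for 0 ms cannot fire while synchronous code is still running.
const start = Date.now();
let timerDelayMs = null;

setTimeout(() => {
  timerDelayMs = Date.now() - start;
  console.log(`timer fired after ${timerDelayMs} ms`);
}, 0);

// Hypothetical helper: keeps the single thread busy for roughly `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait, just like the time-consuming for loop
  }
}

blockFor(200);
console.log('loop finished');
```

Even though the timer asked for 0 ms, its callback only runs once the busy loop has finished, so the measured delay is at least the 200 ms we spent blocking.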

Now let’s do something different. Let’s replace that for loop with an I/O event.

const fs = require('fs');

console.log('Task 1');
console.log('Task 2');
// asynchronous read: the callback runs later, once the file has been read
fs.readFile('./ridiculously_large_file.txt', (err, data) => {
    if (err) throw err;
    console.log('done reading file');
    process.exit();
});
console.log('Task 3');

Pro tip: you can generate a 100 MB file on Linux or macOS just by running this command: dd if=/dev/urandom of=ridiculously_large_file.txt bs=1048576 count=100

We might naturally assume that the output will be similar: just like the for loop, reading a big file takes time, so the event loop should stall before printing Task 3. However, we get something totally different.

Task 1
Task 2
Task 3
done reading file

But what caused this? How did Task 3 get executed before the file was read? Well, let’s take a look at the diagram below to see what’s happening.

[Diagram: event loop demonstrating a blocking operation]

File I/O, network requests, and database calls are slow operations that would block the thread if Node waited for them. So whenever the event loop encounters one of these tasks, it hands it off (to a thread in libuv’s thread pool for file system work, or to the operating system’s asynchronous I/O facilities for network sockets) and moves on to the next task in the events queue.

For work that goes to the thread pool, a thread from the pool handles each task. When it is done, it puts the result in a callback queue.

When the event loop is done executing everything in the events queue, it starts executing the tasks in the callback queue. That’s why we see done reading file at the end.
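
Here is a small sketch of the thread pool at work, using crypto.pbkdf2 from Node’s built-in crypto module, whose asynchronous version runs on libuv’s thread pool (four threads by default, configurable through the UV_THREADPOOL_SIZE environment variable). The hashPassword wrapper, the salt, and the iteration count are just illustrative choices.

```javascript
const crypto = require('crypto');

// Wrap the callback-style API in a promise. The hashing itself happens
// on a thread-pool thread, not on the main JavaScript thread.
function hashPassword(password) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, 'demo-salt', 100000, 32, 'sha256', (err, key) => {
      if (err) reject(err);
      else resolve(key.toString('hex'));
    });
  });
}

const order = ['start hashing'];
const pending = ['a', 'b', 'c', 'd'].map((password) =>
  hashPassword(password).then((hex) => {
    order.push(`hashed ${password}`);
    return hex;
  })
);
order.push('main thread is free'); // runs before any hash has finished

Promise.all(pending).then((hashes) => {
  console.log(order);
  console.log(`${hashes.length} hashes computed`);
});
```

The 'main thread is free' entry always lands before any 'hashed ...' entry, because the results only come back through the callback queue after the synchronous code has finished.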

What makes the Single Threaded Event Loop Model efficient? ⚙️

JavaScript was created to do simple things in web browsers, such as form validation or simple animations. This is why it was built with a single-threaded event loop model. But running everything in one thread is often considered a disadvantage.

In 2009 Ryan Dahl, the creator of Node, saw this simple event loop model as an opportunity to build a lightweight web server.

To better understand what problem Node.js solves we should look at what typical web servers were like before Node.js came into play.

This is how a traditional multi-threaded web application model handles a request:

  1. It maintains a thread pool (a collection of available threads)
  2. When a client request comes in, a thread is assigned to it
  3. This thread takes care of reading the client request, processing it, performing any blocking I/O operations (if required), and preparing a response
  4. This thread is not free until a response is sent back

The main drawback of this model is handling concurrent users. If more users visit our site than there are available threads, some of them will have to wait until a thread frees up before they get a response.

If a lot of users are performing blocking I/O tasks, this wait time increases further. The model is also very resource-heavy: if we expect one million concurrent users, we had better make sure we have enough threads to handle those requests.

Moreover, the server itself starts to slow down because of the increasing load. There’s also the overhead of context switching between threads, and writing applications to optimize threads' resource sharing can be painful.

Because of its single-threaded model, Node.js doesn’t need to spin up a new thread for every single request. Node.js also delegates blocking tasks to other components, as we saw earlier. Since we don’t have to manage many threads, Node.js is very lightweight and well suited to microservice-based architectures.
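
To make that concrete, here is a simulation (not a real web server) of one thread juggling many requests. fakeDatabaseCall, handleRequest, and serveMany are hypothetical names, and a 100 ms timer stands in for a database query.

```javascript
// Simulated non-blocking I/O: resolves after `ms` without occupying the thread.
function fakeDatabaseCall(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical request handler: waits on I/O, then builds a response.
async function handleRequest(id) {
  await fakeDatabaseCall(100);
  return `response for request ${id}`;
}

// Start every request right away and wait for all of them together.
async function serveMany(count) {
  const startedAt = Date.now();
  const requests = [];
  for (let id = 1; id <= count; id++) {
    requests.push(handleRequest(id));
  }
  const responses = await Promise.all(requests);
  return { responses, elapsedMs: Date.now() - startedAt };
}

serveMany(50).then(({ responses, elapsedMs }) => {
  console.log(`${responses.length} responses in about ${elapsedMs} ms`);
});
```

Fifty requests that each wait 100 ms finish in roughly the time of one, not fifty times that, because the single thread is never parked waiting on any individual request.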

Drawbacks of Node’s Single Threaded Model

The single-threaded event loop architecture uses resources efficiently but it does have some drawbacks.

A Node.js instance cannot immediately benefit from multiple cores in your CPU. A Java application can start using additional cores as soon as we upgrade our hardware, but Node runs our JavaScript on a single thread.

This is 2020 and we are seeing more and more complicated web applications. What if our application needs to do complex computations, or run a machine learning algorithm? Or what if we want to run a complicated crypto algorithm? In this case we have to harness the power of multiple cores to increase performance.

Languages like Java and C# can programmatically start threads and harness the power of multiple cores. In Node.js, as we saw earlier, our JavaScript runs on a single thread. Node’s original way of solving this problem is the child_process module.

Child Process in Node

The child_process module gives Node the ability to spawn child processes, which can run operating system commands or other Node scripts.
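
As a minimal sketch of spawning a child process, we can use execFile from child_process to launch a second copy of Node itself. runInChild is a made-up helper name; process.execPath is the path to the Node binary that is currently running, so the example doesn’t assume any other program is installed.

```javascript
const { execFile } = require('child_process');

// Run a snippet of JavaScript in a separate Node process and return what it prints.
function runInChild(code) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', code], (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout.trim());
    });
  });
}

// Ask the child for its process id and compare it with ours.
runInChild('console.log(process.pid)').then((childPid) => {
  console.log(`parent pid: ${process.pid}, child pid: ${childPid}`);
});
```

The two process ids differ because the snippet really does run in a separate operating system process, with its own thread and its own event loop.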

Let’s assume we have a REST endpoint that has a long-running function and we would like to use multiple cores in our processor to execute this function.

Here’s our code:

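// server.js (parent process); `app` is assumed to be an Express app
const express = require('express');
const { fork } = require('child_process');

const app = express();
app.use(express.json()); // parse JSON bodies so request.body is populated

app.get('/endpoint', (request, response) => {
   // fork another Node process that runs process_data.js
   const process_ml_algo = fork('./process_data.js');
   const data = request.body.data;
   // send the data to the forked process
   process_ml_algo.send({ data });
   // listen for the result on the child handle (the event is always 'message')
   process_ml_algo.once('message', ({ result }) => {
     console.log(`ml_algo executed with ${result}`);
     process_ml_algo.kill(); // the child has done its job
   });
   return response.json({ status: true, sent: true });
});

// process_data.js (child process)
// receive message from the parent process
process.on('message', async (message) => {
    // runMachineLearningProcess is assumed to be defined elsewhere in this file
    const result = await runMachineLearningProcess(message.data);

    // send response to the parent process
    process.send({ result });
});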

In the example above we demonstrate how we can spin up a new process and pass data between the parent and the child. Using the forked process, we can take advantage of multiple CPU cores.

You can take a look at all the methods of child processes in the official Node.js docs.

Here is a diagram of how child processes work:
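
If you’d like to try the round trip without setting up a web server, here is a self-contained sketch. To keep everything in one file it writes the child script to a temporary file before forking it; in a real project the child would simply live in its own file. sumInChildProcess and the summing “work” are stand-ins for illustration.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Source of the child script; it sums the numbers it receives.
const childSource = `
  process.on('message', (message) => {
    const result = message.data.reduce((sum, n) => sum + n, 0);
    process.send({ result });
  });
`;

let nextChildId = 0; // keeps temp file names unique within this process

function sumInChildProcess(numbers) {
  const childPath = path.join(
    os.tmpdir(),
    `fork-demo-${process.pid}-${nextChildId++}.js`
  );
  fs.writeFileSync(childPath, childSource);

  return new Promise((resolve, reject) => {
    const child = fork(childPath);
    child.once('message', (message) => {
      child.kill(); // otherwise the child keeps waiting for more messages
      resolve(message.result);
    });
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code) reject(new Error(`child exited with code ${code}`));
    });
    child.send({ data: numbers });
  }).finally(() => fs.unlinkSync(childPath)); // clean up the temp file
}

sumInChildProcess([1, 2, 3, 4]).then((total) => {
  console.log('child computed', total);
});
```

Note that both sides listen for the 'message' event: the parent on the object returned by fork, the child on its own process object.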

[Diagram: pictogram showing how a child process works]

child_process is a good solution, but there’s another option. The child_process module spins up new instances of Node to distribute the workload, and each of these instances has 1 event loop, 1 thread, and 1 process.

In 2018, Node.js introduced the worker_threads module (initially as an experimental feature). This module gives Node the ability to have:

  • 1 Process
  • Multiple threads
  • 1 Event Loop per thread

Yes! You read that right 😄.

const { Worker, workerData, isMainThread, parentPort } = require('worker_threads');
 
if (isMainThread) {
  const worker1 = new Worker(__filename, { workerData: 'Worker Data 1'});
  worker1.once('message', message => console.log(message));
  const worker2 = new Worker(__filename, { workerData: 'Worker Data 2' });
  worker2.once('message', message => console.log(message));
} else {
  parentPort.postMessage('I am ' + workerData);
}

We check whether we are on the main thread and, if so, create two workers from the same file, passing each one some workerData. Each worker thread then sends a message back to the main thread through parentPort.postMessage, and the main thread logs it.

Since worker_threads creates new threads inside the same process, it requires fewer resources than spawning whole new processes. Messages between threads are copied by default, but because the threads live in the same process they can also share memory directly, for example through a SharedArrayBuffer.

As of January 2020, worker_threads is fully supported in Node 12, the current LTS version. I highly recommend reading this post if you want to learn more about worker_threads.
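
Here is a small sketch of that shared memory, assuming a SharedArrayBuffer as the shared region and Atomics for safe updates. runSharedCounter is a made-up helper, and the worker code is passed as a string with the eval option only so that the example fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker code: bump the shared counter `increments` times, then report back.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic, so concurrent updates are not lost
  }
  parentPort.postMessage('done');
`;

function runSharedCounter(workerCount, increments) {
  const shared = new SharedArrayBuffer(4); // room for one 32-bit integer
  const counter = new Int32Array(shared);
  const jobs = [];

  for (let i = 0; i < workerCount; i++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true, // treat the first argument as code, not a file path
        workerData: { shared, increments },
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    }));
  }

  // Every worker wrote into the same memory, so the main thread sees the total.
  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

runSharedCounter(2, 1000).then((total) => {
  console.log('shared counter:', total);
});
```

The buffer is not copied when it is handed to the workers, which is why all the increments end up in the same counter that the main thread reads at the end.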

Wrapping up

And that’s it!

In this article we looked at how the event loop model works in Node.js, and we discussed some of the pros and cons of the single-threaded model and looked at a couple of solutions.

We didn’t go over all the functionalities of child_process and worker_threads. But I hope that this article provided you with a brief introduction to these concepts and why they exist. Please let me know if you have any feedback. Until next time 👋👋