How to Use OpenTelemetry to Understand Software Performance

If you want good performance in your application, you need to collect data to figure out where to make improvements. That's where OpenTelemetry comes in.

OpenTelemetry offers a single set of APIs and libraries that standardize how you collect and transfer telemetry data. OpenTelemetry provides a specification for instrumentation so that you can send data to distinct backends of your choice.

We just released a full OpenTelemetry course on the freeCodeCamp.org YouTube channel.

This course was made possible by a grant from New Relic, an observability platform that helps developers monitor and debug their applications.

Ania Kubów created the course. Ania is a popular software educator and she will help you get a full understanding of how to use OpenTelemetry in your projects.

In this course, you will learn how to use OpenTelemetry to get full stack observability on the performance and behavior of your software projects.

Here are the topics covered in this course:

What is OpenTelemetry?
What are Microservices?
What is Observability?
M.E.L.T
History
Setting up our Project
What is Tracing?
Context and Propagation
Setting up our Tracing
What are Metrics?
Use cases for OpenTelemetry
Setting up Distributed Tracing
Using other Analysis Tools - New Relic
Where to go next

Watch the course below or on the freeCodeCamp.org YouTube channel (1-hour watch).

Video Transcript

(Autogenerated)

Hello, internet, and welcome to this course all about OpenTelemetry.

In this course, I'm going to be showing you how you can get full stack observability on the performance and behavior of your software project by using OpenTelemetry with your analysis tool of choice.

If you want to be able to tell why your project or app is running too slow or is broken, or you just want to improve its code quality, this is the video for you.

But first, what exactly do we mean by OpenTelemetry? Well, let's start off with the name itself. We have open, so like open source, and telemetry, which is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment for monitoring.

The word telemetry actually comes from the Greek roots tele, for remote, and metron, for measure.

And that's exactly what we are going to be doing, we are going to be facilitating a way to measure the performance of everything we use in our app remotely.

With any app, when you want to start looking at this kind of data, you have two parts that need to come together.

The first is figuring out how to generate and transmit that data.

And then the second part is deciding what you are going to do with that data.

So in other words, how are you going to analyze it? OpenTelemetry deals with that first part.

Up until now there has been no real standardized way of describing what your system is doing.

This is down to the fact that we all like to think differently and use different programming languages and different machines, combined in different ways.

This was a problem, especially for those wanting to build observability tools.

At the heart of the OpenTelemetry project is exactly that: a standardization for describing what distributed systems are doing, no matter what programming language or computer systems you are using.

Today, the OpenTelemetry project can be described as a collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data so that we can analyze it later on with whatever platform we wish. By standardizing our data, it means we are not tied to anything in the long run.

And it makes moving from one analysis tool to the next super easy, and it will not affect your historic data.

As I mentioned at the beginning, this is an open source project.

It is made up of many, many developers and their inputs.

If we have a look at GitHub here for the project, you will see it is super transparent.

You can see the Governance Committee, you can see the technical committee.

If you are interested in getting involved, please do join their mailing list and attend the community meetings.

You can even see all the meetings that you are welcome to join.

Before we get started, though, I just want to go into a little bit of what we're going to learn on this course.

So let's break it down.

Okay, so in this course, first we're going to look at what microservices are, followed by what observability is. Then we're going to look at MELT, followed by the history of OpenTelemetry. Then we're going to actually start by setting up a project.

We are then going to talk about tracing, distributed tracing, and context and propagation before adding tracing into our project.

We will then move on to looking at metrics before finally ending on our distributed project.

I will end this course with where to go next.

We expect our websites, apps, and online services to load almost instantaneously, right? Think of the frustration some of us feel when websites take more than two seconds to load.

On the backend, a feat of engineering is needed to keep a global system running to make sure that you have access to Netflix or Instagram only a tap away.

Let's have a look at Netflix for a minute.

Netflix at its peak consumes 37% of internet bandwidth in the US. There are thousands of people clicking play at the same time, with activity peaking in the evening.

Though, as a global platform, it's a constant peak.

The challenge is how to run a service with zero loss while processing over 400 billion events daily and 17 gigabytes per second during peak.

In today's video, we will dive deeper into what types of systems power these amazing services and how we can use data to get deeper visibility into the insides of these complex systems.

Historically, developers built applications as monoliths with large, complex code bases. A single monolith would contain all the code for all the business activities an application performed.

This is all fine and dandy if you have a small app.

But what happens if your application turns out to be successful? Users will like it and begin to depend on it. Traffic increases dramatically, and almost inevitably users request improvements and additional features.

So more developers are roped in to work on the growing application.

Before too long, your application becomes a big ball of mud, a situation where no single developer understands the entirety of the application.

Your once simple application has now become large and complex.

Multiple independent development teams are simultaneously working on the same codebase and simultaneously changing the same sections of code. Then it becomes virtually impossible to know who is working on what. Code changes collide, code quality suffers, and it becomes harder and harder for individual development teams to make changes without having to calculate what the impact will be on other teams. Teams can also lose sight of how their code might be incompatible with others', among other issues.

This generally results in slower, less reliable applications, not to mention longer and longer development schedules.

In come microservices. The main principle behind a microservice architecture is that applications are simpler to build and maintain when broken down into smaller pieces.

When using microservices, you isolate software functionality into multiple independent modules that are individually responsible for performing precisely defined standalone tasks.

These modules communicate with each other through simple APIs.

Microservice architectures let you split applications into distinct independent services, each managed by different teams.

It leads to naturally delegating the responsibilities for building highly scaled applications, allowing work to be done independently on individual services without impacting the work of other developers in other groups working on the same overall application.

However, trying to get visibility into this system can be very difficult.

When you have hundreds of services and applications that your request travels through, debugging and troubleshooting can be a nightmare.

This is where OpenTelemetry comes in.

As we already touched on, telemetry is defined as the science or process of collecting information about objects that are far away and sending the information somewhere electronically.

Now, observability means how well you can understand what is going on internally in a system based on its outputs.

Especially as systems become more distributed and complex, it's hard to see what's going on inside your application and why things may be going wrong.

When talking about observability, we need to define the data types necessary to understand the performance and health of our application: broadly, metrics, events, logs, and traces.

MELT. Metrics are measurements collected at regular intervals. They must have a timestamp and a name, one or more numeric values, and a count of how many events are represented.

These include error rate, response time, or throughput.

An event is a discrete action happening at any moment in time.

Take a vending machine.

For instance, an event could be the moment when a user makes a purchase from the machine.

Adding metadata to events makes them much more powerful.

With the vending machine example we could add additional attributes such as item category, and payment type.

This allows questions to be asked such as how much money was made from each item category, or what is the most common payment type used.
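
To make the vending machine example concrete, here is a minimal sketch of events carrying attributes and the two questions being answered from them. The field names (itemCategory, paymentType, amount) and the sample values are assumptions made up for illustration; they do not come from the course.

```javascript
// Hypothetical purchase events from the vending machine example.
// Field names and values are illustrative assumptions.
const purchaseEvents = [
  { timestamp: "2021-03-01T09:15:00Z", itemCategory: "snacks", paymentType: "card", amount: 1.5 },
  { timestamp: "2021-03-01T09:20:00Z", itemCategory: "drinks", paymentType: "cash", amount: 2.0 },
  { timestamp: "2021-03-01T10:05:00Z", itemCategory: "snacks", paymentType: "card", amount: 1.0 },
  { timestamp: "2021-03-01T11:30:00Z", itemCategory: "drinks", paymentType: "card", amount: 2.5 },
];

// How much money was made from each item category?
function revenueByCategory(events) {
  const totals = {};
  for (const event of events) {
    totals[event.itemCategory] = (totals[event.itemCategory] || 0) + event.amount;
  }
  return totals;
}

// What is the most common payment type used?
function mostCommonPaymentType(events) {
  const counts = {};
  let best = null;
  for (const event of events) {
    counts[event.paymentType] = (counts[event.paymentType] || 0) + 1;
    if (best === null || counts[event.paymentType] > counts[best]) {
      best = event.paymentType;
    }
  }
  return best;
}

console.log(revenueByCategory(purchaseEvents));
console.log(mostCommonPaymentType(purchaseEvents));
```

With the sample data above, this reports 2.5 for snacks and 4.5 for drinks, and card as the most common payment type. Without the attributes, the events could only tell us that four purchases happened.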

Logs come directly from your app, exporting detailed data and detailed context around an event, so engineers can recreate what happened millisecond by millisecond.

You have probably logged something when you used things like System.out.println or console.log.

Traces follow a request from the initial request to the returned output.

Tracing requires the causal chain of events to determine relationships between different entities.

Traces are very valuable for highlighting inefficiencies, bottlenecks, and roadblocks in the user experience, as they can be used to show the end-to-end latency of individual calls in a distributed architecture.

However, getting that data is very difficult.

You would have to manually instrument every single service one by one layer by layer.

This will take as much time as writing the code itself, which is annoying.

Luckily, there are some awesome open source projects, as well as companies, that make this a lot easier.

In 2016, OpenTracing was released as a CNCF project focused only on distributed tracing.

Because the libraries were lightweight and simple, it could be used to fit any use case.

While it made it easy to instrument data, it made it hard to instrument software that was shipped as binaries without a lot of manual engineering work.

In 2018, a similar project called OpenCensus was open sourced out of Google, which supported the capturing of both tracing information and metrics.

While it made it easier to get telemetry data from software that was shipped as binaries like Kubernetes, and databases, it made it hard to use the API to instrument custom implementations, not part of the default use case.

Both projects were able to make observability easy for modern applications and expedite wide adoption of distributed tracing by the software industry.

However, developers had to choose between two options with pros and cons.

It turns out that the approaches of the two projects were complementary rather than contradictory.

There was no reason why we couldn't have both the abstract vendor-neutral API and a well-supported default implementation.

In late 2019, the two projects merged to form OpenTelemetry.

This brought forward the idea of having a single standard for observability instead of two competing standards.

Okay, so first things first, let's go a bit meta and understand exactly what is going to happen in order for us to view what is happening in our app.

Here we have our project.

Think about this project as a project we have made with our code editor of choice, for example. If we run it, it runs locally on our machines.

Let's also say this project is built to listen out for requests.

So, like listening out for a GET request, for example. We have decided that we want to measure our app's performance based on the requests that it makes.

To do this, the first step, as we mentioned at the start, would be to implement OpenTelemetry into the project.

We are doing this to help us standardize the data.

Once we have implemented OpenTelemetry and standardized the data, we need to think about what we are going to do with the data, how we're going to view it, and so on.

For this, we can use an analysis tool.

By analysis tool, I mean, any type of tool that gives you observability, we are going to look at a few of these tools, one that focuses specifically on tracing, one that focuses specifically on metrics, and one that looks at everything in one platform.

We are then going to send our data to our analysis tool of choice.

Here we have an example of what our data can look like in a tracing app such as Zipkin, a metrics app such as Prometheus, and an observability tool such as New Relic that will give us an overview of everything, as well as other more bespoke insights, in one platform.

We will go into each of these in their dedicated sections.

Let's start with implementing OpenTelemetry first.

Okay, so before we start, the only prerequisite I am going to ask of you is that you have Docker downloaded onto your machine.

Docker is a container platform for rapid app and microservices development and delivery.

And if you don't know much about containers and microservices, just stick with me for now.

I have a section dedicated to both of these topics coming up after this section.

For those of you that don't have it, please navigate to the Docker website and follow the instructions in order to get set up.

I personally would choose to download Docker Desktop.

As I am working on a Mac, I choose the Mac option.

Once you are done, make sure Docker Desktop is running. On a Mac, I would simply hover over the icon like so.

You will see, if you open up the platform, I currently have no containers running.

Having a container is going to be important for when we get to using the analysis tools, such as a tracing backend like Zipkin.

Once we have all got Docker up and running, let's get going.

Let's get our terminal up.

Now I'm going to navigate to a folder where I like to store all my projects.

It's called development.

Please go ahead and choose any directory that you would like to work in.

Now, once here, using the mkdir command, I'm going to make a folder for the new project that I'm about to start.

I'm going to call it open telemetry starting out.

Please be aware that if you are using a different terminal, other commands may need to be used.

Now, let's go into the project using the command cd.

The first thing I'm going to do is start our container, so it's ready for the next section.

So, as mentioned, the first thing that you need before you can start collecting traces is a tracing backend like Zipkin that you can export traces to. To set up Zipkin as quickly as possible, run the latest Docker Zipkin container, exposing port 9411.

If you can't, or don't want to, run a Docker container, you could also choose to download Zipkin directly.

So just so you know, that is an option too.

If you want to explore that option more, I would suggest visiting the Zipkin website.

So this is the command to run our container.

And there we go, we now have this container ID, it has worked.

Next up, we need to get a package.json file in our project. We can do so by typing npm init.

Having a package.json file will make it easier for others to manage and install all the packages we need for the project.

If you are getting errors, it could be because you don't have Node.js and npm installed.

If that is the case, please visit nodejs.org in order to get set up with that by following the download instructions.

Okay, so we have initialized the utility. I am just going to press enter for all these fields it's prompting me to answer.

So enter, enter, enter, enter, and enter.

OK, we are done creating our package.json file for now.

If we list out all the files in our project using the ls command, you will see the file right there.

Now finally, I need to create an app.js file, so a JavaScript file, and put that in the project too.

Some of you might have a different approach to adding files to a project, so that is totally up to you, however you'd like to create that file.

But once we are done, we now need to open up our project.

As I am using VS Code, I'm going to use the command code . in order to open up our project.

And there we go.

There is a folder with an app.js file and a package.json file.

You will see the package.json file has all the prompts we were asked. As I skipped all the prompts, there are just the standard default entries or no entries at all.

If you go to the app.js file, you will also see that it's currently empty. It has nothing in it.

Okay, so the first thing I need to do is change this to point to the app.js file we created, not index.js, as there is no index.js file.

And just for fun, this is not necessary, I'm going to fill out the description of my project.

So OpenTelemetry course.

The next thing I want to do is just add a start script.

So for this script, I'm going to run node app.js.
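
After those three edits, the relevant part of package.json might look roughly like the fragment below. This is only a sketch of the fields mentioned in the video; the other fields that npm init generates (name, version, and so on) are left out here and should stay as they are in your own file.

```json
{
  "description": "OpenTelemetry course",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  }
}
```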

Okay, now for the fun part, let's get to adding some packages.

To start using OpenTelemetry, we are going to need to install some of its packages.

Using npm i, or install for short, I'm going to install the OpenTelemetry core package, the OpenTelemetry node package, OpenTelemetry plugin-http, OpenTelemetry plugin-https, the OpenTelemetry exporter-zipkin package and the OpenTelemetry tracing package, to get us ready for the next section, and Express, the only non-OpenTelemetry one.

Okay, so we need all of those to install.

Okay, and great.

So these are looking good.

So if we look back here, here are the packages that we just installed.

They have automatically populated our package JSON file.

So this is looking good.

We just missed one.

So let's go back and use npm i, or install for short, and install the OpenTelemetry plugin-express package.

And we are done.

Okay, now let's move on to our app.js file.

The first thing I'm going to do is define a port for us.

So const PORT, process.env.PORT or a string of 8080.

Next up, we are going to need Express.

So const Express.

And Express is one of the packages we installed.

So this one right here.

So we need to tell our file that this const requires Express.

We are then going to call express and store it as the const app.

The first thing I want to do is just get a message in our console log to let us know all is good, and that we are listening out and on which port.

So just as a recap, what I am doing here is getting the app to start a server.

And then with this code, I'm getting it to listen out to our defined port for any requests.

Okay, next up, I'm going to paste this super basic piece of code.

This code is an example of a very basic route.

Routing refers to how an application's endpoints respond to client requests.

We have defined the route using methods of the Express app object that correspond to HTTP methods.

For example, app.get handles GET requests.

These routing methods specify a callback function that is called when the application receives a request to the specified route and HTTP method.

In other words, it listens out for requests that match the specified routes and methods.

In this case, the route is our homepage. We essentially want to trace every single time a GET request is made to the homepage.

We will do this in the next section.
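
The code pasted on screen is not captured in the transcript, so here is a hedged sketch of what app.js could look like after these steps. The port fallback, the homepage route, and the listening message follow what was described. The registerRoutes and start helper functions are my own assumptions, added so the route can be tried without starting a server. The start function needs the express package installed in the project.

```javascript
"use strict";

// Port from the environment, falling back to the string "8080" as described.
const PORT = process.env.PORT || "8080";

// Registers the homepage route on an Express-style app object.
// Splitting this into a function is an assumption of this sketch.
function registerRoutes(app) {
  // app.get handles GET requests that match the given path.
  app.get("/", (req, res) => {
    res.send("Hello World");
  });
  return app;
}

// Starts the server. Requires `npm install express` in the project.
function start() {
  const express = require("express");
  const app = registerRoutes(express());
  app.listen(parseInt(PORT, 10), () => {
    console.log(`Listening for requests on http://localhost:${PORT}`);
  });
}

// start(); // uncomment in app.js to actually start listening
```

With start() uncommented, running npm start and visiting the homepage in a browser should show the Hello World response.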

Now that we have the basic installation done, it is now time for us to talk about tracing.

So as you know, OpenTelemetry allows us to essentially standardize our data.

The next part is actually viewing the data in a way that we can analyze what is happening behind the scenes. We will do this with a tracing system.

In software engineering, tracing involves a specialized use of logging to record information about a program's execution.

This information is typically used by programmers to debug, by using the information contained in a trace log to diagnose any problems that might arise with a particular piece of software or app.

Distributed tracing, however, also called distributed request tracing, is a method used to debug and monitor applications built using a microservice architecture.

Distributed tracing helps pinpoint where failures occur and what causes poor performance.

So as we can now see, being able to get tracing telemetry is pretty important to the overall performance of an app.

However, as we discussed in the introduction, due to systems using all types of different languages, frameworks, and infrastructures, it's a hard thing to do without some sort of common approach.

That's why OpenTelemetry can help so much with distributed tracing.

By providing a common set of APIs, SDKs, and wire protocols, it gives organizations a single, well-supported integration surface for end-to-end distributed tracing telemetry.

For this course, the tracer we're going to be using is called Zipkin.

Zipkin is a distributed tracing system that helps gather the timing data needed to troubleshoot latency problems in service architectures.

Zipkin was originally created by Twitter.

And it's currently run by the OpenZipkin volunteer organization.

I am using zipkin for no other reason than I had to pick one.

But please do feel free to choose any back end tracing system that you wish, the choice is completely up to you.

In general, once the tracers are implemented into applications, they essentially record timing and metadata about operations that take place.

An example of this is when a web server records exactly when it receives a request and when it sends a response.

This data is usually represented by a bar, just like this, and goes by the official name of span.

So in this example, we have two services and a bunch of spans.

To explain this, imagine this represents your favorite food delivery app.

So imagine you make an order, right?

Now a few things will happen, each represented by a span.

You send information back and forth between services in order to make the payment, find the delivery driver closest to you, and notify that driver of your order.

Each of these operations generates a span showing you the work being done to make this happen.

In this case, the spans have implicit relationships, so parent and child, but also with the individual services and the trace.

As you can see, each of the spans starts at a different point and takes a different amount of time.

We call this latency and network latency.

In a nutshell, latency is the delay between an action and a response to that action.

Network latency refers to specific delays that take place within a network.

Latency is generally measured in milliseconds and is unavoidable due to the way networks communicate with each other.

It depends on several aspects of a network and can vary if any of them are changed.

Errors on most systems are usually quite easy to spot.

If your bar ends in red or similar, for example, you know that an error has occurred.
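
To picture what sits behind those bars, here is a small sketch of a trace as plain data, loosely based on the food delivery example. The span fields, service names, and timings are all made up for illustration and are not the format Zipkin or OpenTelemetry actually stores.

```javascript
// A hypothetical trace for the food delivery example. Times are in
// milliseconds relative to the start of the trace; every name and number
// here is an assumption made for illustration.
const spans = [
  { id: "a", parentId: null, service: "orders", name: "place order", start: 0, duration: 120, error: false },
  { id: "b", parentId: "a", service: "payments", name: "make payment", start: 10, duration: 40, error: false },
  { id: "c", parentId: "a", service: "dispatch", name: "find driver", start: 55, duration: 50, error: false },
  { id: "d", parentId: "c", service: "dispatch", name: "notify driver", start: 80, duration: 20, error: true },
];

// End-to-end latency: from the earliest span start to the latest span end.
function traceDuration(all) {
  const begin = Math.min(...all.map((s) => s.start));
  const end = Math.max(...all.map((s) => s.start + s.duration));
  return end - begin;
}

// Children of a span, found through the parent/child relationship.
function childrenOf(all, parentId) {
  return all.filter((s) => s.parentId === parentId);
}

// Names of spans that ended in an error (the bars a UI would colour red).
function failedSpans(all) {
  return all.filter((s) => s.error).map((s) => s.name);
}

console.log(traceDuration(spans));
console.log(childrenOf(spans, "a").map((s) => s.name));
console.log(failedSpans(spans));
```

Each span only knows its own start, duration, and parent, yet that is enough to rebuild the whole picture: the overall latency, which operation called which, and where the error happened.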

Now it's time for us to look at context and propagation.

These two concepts will allow us to understand the topic of tracing a lot better.

So as we know distributed tracing allows us to correlate events across service boundaries.

But how do we find these correlations? For this, components in our distributed system need to be able to collect, store, and transfer metadata. We refer to this metadata as context.

Context is divided into two types: span context and correlation context.

Span context represents the data required for moving trace information across boundaries.

It contains the following metadata.

We have a trace ID, a span ID, the trace flags, and the trace state of this span context.

And then we have the correlation context.

A correlation context carries user-defined properties.

These are usually things such as a customer ID, a provider's host name, data region, and other telemetry that gives you application-specific performance insights.

Correlation context is not required, and components may choose to not carry or store this information.

A context will usually have information so we can identify the current span and trace, and propagation is the mechanism we use to bundle up our context and transfer it across services. So there we have it: context and propagation.

Together, these two concepts represent the engine behind distributed tracing.
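
As a rough illustration of propagation, the sketch below bundles a span context into an HTTP header on the sending side and rebuilds it on the receiving side. The header layout follows the W3C Trace Context traceparent format (version, trace ID, span ID, flags). The inject and extract functions are hand-written for this example and are not the OpenTelemetry API, which does this work for you. The trace state and the correlation context are left out to keep it short.

```javascript
// Span context fields named in the text. The IDs are sample values.
const spanContext = {
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736", // 16 bytes as 32 hex characters
  spanId: "00f067aa0ba902b7", // 8 bytes as 16 hex characters
  traceFlags: 1, // 1 means the trace is sampled
};

// Sending side: bundle the context into the outgoing HTTP headers.
function inject(context, headers) {
  const flags = context.traceFlags.toString(16).padStart(2, "0");
  headers.traceparent = `00-${context.traceId}-${context.spanId}-${flags}`;
  return headers;
}

// Receiving side: rebuild the context from the incoming headers.
// Returns null when the header is missing or malformed.
function extract(headers) {
  const pattern = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
  const match = pattern.exec(headers.traceparent || "");
  if (!match) return null;
  return { traceId: match[1], spanId: match[2], traceFlags: parseInt(match[3], 16) };
}

const outgoing = inject(spanContext, {});
console.log(outgoing.traceparent);
console.log(extract(outgoing));
```

Because the receiving service ends up with the same trace ID, any span it creates can be attached to the same trace, which is what lets a tracing backend line the bars up across services.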

If you would like to learn more about these two topics and do a deep dive, please visit the OpenTelemetry website.

For the purpose of this tutorial, however, the basic knowledge above will suffice.

So just as a bit of a heads up in this next section, we will go through how to first initialize a global tracer and after that initialize and register a trace exporter.

Okay, so here we are where we left off.

Now in the last section, we ran the latest Docker Zipkin container, exposing port 9411.

If we actually visit localhost:9411, we will see the Zipkin UI that comes as part of doing that.

So here we are, this is what we are going to use to view our traces.

Okay, let's carry on.

Next, let's create a file named tracing.js and add the following code.

This is just the sample code provided to us by OpenTelemetry.

If you visit their website, you can see I am just copying this code right here and pasting it into my project.

You will also see that this file uses two of the packages we installed in the initial setup.

Once we have pasted that we need to initialize and register a trace exporter.

We have already done part of this, as we installed two packages necessary for this part in the initial setup section.

And those are the OpenTelemetry tracing and OpenTelemetry exporter-zipkin packages.

So first off, let's make a new Zipkin exporter.

So provider.addSpanProcessor, new SimpleSpanProcessor.

This is from the OpenTelemetry tracing package, as you can see that has shown up at the top. Then new ZipkinExporter.

And that also comes from another package.

So the OpenTelemetry exporter-zipkin package, as you can see appearing at the top. This might not automatically show up for you.

It's just the code editor I'm using.

So you might have to type those two out. And then write serviceName and choose a service name.

I'm going to put Getting Started, but you can replace it with your own service name.

Okay, that is looking good.

And so is the app.js file.

Okay, great.

All tracing initialization should happen before your application code runs.

The easiest way to do this is to initialize tracing in a separate file that is required using the node -r option before your application code runs. I will show you what I mean by this.

So now, if you run your application with node -r ./tracing.js app.js, your application will create and propagate traces over HTTP.

So let's run that.

And now let's send requests to our application over HTTP.

We can do so simply by refreshing the localhost:8080 page. You'll see traces exported to our tracing backend. They look like this.

So here we will see we are making a GET request to the homepage, which is responding with Hello World.

And here we have getting started.

As that is what we called our service name.

We also get a start time and a duration as a span.

Now, as you use this more and more often, some spans might appear to be duplicated, but they are not.

This is because some applications can be both the client and the server for these requests.

If this is the case, you will see one span which is the client-side request timing and one span which is the server-side request timing. Anywhere that they don't overlap is network time.
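
As a small worked example of that last point, the sketch below takes a client-side span and a server-side span for the same request and works out the part that does not overlap. The timings are invented, and the calculation assumes both clocks are in sync and that the server span lies inside the client span.

```javascript
// Hypothetical timings in milliseconds for one HTTP call seen from both sides.
const clientSpan = { kind: "client", start: 100, end: 180 };
const serverSpan = { kind: "server", start: 115, end: 170 };

// Time covered by the client span but not by the server span.
function networkTime(client, server) {
  const outbound = server.start - client.start; // request travelling to the server
  const inbound = client.end - server.end; // response travelling back
  return { outbound, inbound, total: outbound + inbound };
}

console.log(networkTime(clientSpan, serverSpan));
```

Here the client waited 80 milliseconds in total, the server worked for 55 of them, and the remaining 25 were spent on the network.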

Okay, now I'm going to show you one more example.

So just to differentiate, I'm going to change my service name to get date.

Now I am going to go into my app.js file.

I'm just going to copy this piece of code right here.

And paste.

Now I want to essentially listen out for any time a GET request is made to the path /date.

So in other words, if someone right now went to localhost:8080/date, that is us making a GET request. We will also be able to see it respond with the actual date.

So let's go ahead. Let's go back to our localhost:8080 and type /date.

Oops, I stopped my app running, let's make sure it's running.

So I'm just going to start this up again.

Okay, and refresh our page.

And great, there is our object with today's date.

Amazing.

So now if we visit localhost:9411, so the port we exposed, and click run a query, we will see all the requests that have been made.

So there we go, we can now see our get date service.

And at the moment, the only request we have that it's listening out for is the get request.

Now, I have actually renamed the service, remember, so if I visit the homepage, you will see that request is also being stored under the get date service name. You can tell which one it is by the timestamp.

Okay, let's move on.

Okay, we are now done with a basic implementation of tracing.

However, we are literally just touching the surface.

In the project portion of our course, I will show you how to use OpenTelemetry to instrument a distributed system.

By this I mean I will show you how to trace multiple services and their interactions with each other, if those exist.

In this next section, we're going to learn how to collect metrics with OpenTelemetry and Prometheus. Prometheus is a monitoring platform that collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets.

In this section, I will show you how to install, configure, and monitor our first app with Prometheus and OpenTelemetry.

We will download, install, and run Prometheus to expose time series data on hosts and services.

Unlike tracing, which works in spans, metrics are a numeric representation of data measured over intervals of time.

Metrics can harness the power of mathematical modeling and prediction to derive knowledge of the behavior of a system over intervals of time in the present and future.

Since numbers are optimized for storage, processing, compression and retrieval, metrics enable longer retention of data as well as easier querying.

This makes metrics perfectly suited to building dashboards that reflect historical trends.

Metrics also allow for gradual reduction of data resolution.

After a certain period of time, data can be aggregated into daily or weekly frequency.
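
Here is a minimal sketch of that kind of resolution reduction: several samples per day are collapsed into one data point per day. The sample values and the shape of the rolled-up point (count, sum, min, max, average) are assumptions chosen for illustration, not how Prometheus stores data internally.

```javascript
// Hypothetical response-time samples (milliseconds) with ISO timestamps.
const samples = [
  { timestamp: "2021-03-01T08:00:00Z", value: 120 },
  { timestamp: "2021-03-01T20:00:00Z", value: 80 },
  { timestamp: "2021-03-02T09:30:00Z", value: 200 },
  { timestamp: "2021-03-02T15:45:00Z", value: 100 },
  { timestamp: "2021-03-02T23:10:00Z", value: 150 },
];

// Reduce resolution: collapse all samples from the same UTC day into one point.
function rollUpDaily(points) {
  const days = new Map();
  for (const { timestamp, value } of points) {
    const day = timestamp.slice(0, 10); // the "YYYY-MM-DD" part
    const bucket = days.get(day) || { day, count: 0, sum: 0, min: Infinity, max: -Infinity };
    bucket.count += 1;
    bucket.sum += value;
    bucket.min = Math.min(bucket.min, value);
    bucket.max = Math.max(bucket.max, value);
    days.set(day, bucket);
  }
  return [...days.values()].map((b) => ({ ...b, average: b.sum / b.count }));
}

console.log(rollUpDaily(samples));
```

Five samples become two daily points, and because each point keeps its count and sum, daily points can later be combined again into weekly ones without going back to the raw data.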

Let's have a look at this in action.

In this section, I'm going to be using Prometheus as my metrics backend.

Now that we have set up end-to-end traces, we can collect and export some basic metrics.

First, I'm going to stop this from running.

Let's go to the Prometheus download page and download the latest release of Prometheus for your operating system.

As I'm using a Mac, I'm going to click on this one right here.

Once that has downloaded, open a command line and use the command cd to go into the directory where you downloaded the Prometheus tarball.

In my case, it will be the Downloads directory.

Now I need to untar it into a newly created directory. Make sure that you replace the file name with your downloaded tarball.

So don't necessarily use this one.

And now let's go into the directory.

If I list out all the files and folders, you will see a file named prometheus.yml. This is the file used to configure Prometheus. For now, just make sure Prometheus starts by running the ./prometheus binary in the folder, and browse to localhost:9090.

Okay, great.

And that is our Prometheus user interface.

So you will see this, server is ready to receive web requests.

So that should be all good.

I am just going to go ahead and open up a new tab right here so I can keep that running.

And I'm going to open up our directory using the VS code shortcut.

Once we have confirmed that Prometheus has started, we need to replace the contents of the prometheus.yml file with the following.

So literally just delete everything and put in this much shorter piece of code.

This will set the scrape interval to every 15 seconds.
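The shorter configuration is not captured in the transcript, so here is a sketch of what a minimal prometheus.yml with a 15 second scrape interval can look like. The job name and the target localhost:9464 are assumptions. The port is taken from the metrics endpoint that appears later in this section.

```yaml
global:
  # Scrape every 15 seconds instead of the default of one minute.
  scrape_interval: 15s

scrape_configs:
  # The job name is an assumption for this sketch.
  - job_name: 'opentelemetry'
    # metrics_path defaults to /metrics and scheme defaults to http.
    static_configs:
      # Assumed target: the port where the app exposes its metrics later on.
      - targets: ['localhost:9464']
```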

We are now ready to monitor our Node.js application.

In this next section, we need to initialize the required OpenTelemetry metrics library, initialize a meter and collect metrics, and initialize and register a metrics exporter.

To do this, we are going to have to install some libraries, we are going to need the open telemetry metrics package.

So let's go ahead and install that.

Let's navigate back to our project first.

So don't do this in the directory we just downloaded.

And in here, type npm i, or npm install, followed by @opentelemetry/metrics.

Great, we are now ready to initialize a meter and collect metrics.

We first need a meter to create and monitor metrics. A meter in OpenTelemetry is the mechanism used to create and manage metrics, labels, and metric exporters. Create a file named monitoring.js.

So a JavaScript file in the root of your folder, and add the following code.

So we're going to need the OpenTelemetry metrics package for this const.

And we're going to get the MeterProvider from it, and by it I mean the OpenTelemetry metrics package. We are then going to make a new const called meter.

And we're going to use the MeterProvider: we're going to make a new MeterProvider and use getMeter, and I'm just going to put your-meter-name for now. We can change this whenever we want.

Now we can require this file from your application code and use the meter to create and manage metrics.

The simplest of these metrics is a counter.

In this next part, we're going to create and export from our monitoring.js file a middleware function that Express can use to count all requests made by route.

So first off, we need to modify our monitoring.js file.

So once again, I'm just going to copy this sample code from the OpenTelemetry open source project in order to help us count the requests, and paste that into my monitoring.js file.

Next, we need to import and use this middleware in our application code.

So our app.js file.

So we need to get countAllRequests from our monitoring.js file.

So the module export. We do so by typing const countAllRequests and requiring monitoring.js.

So literally, we are using this right here.

Now let's get to using it.

Type app.use, pass in countAllRequests, and call it. Now, when you make requests to your service, your meter will count all the requests.
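The sample code itself is not captured in the transcript. To show the shape of such a middleware, here is a minimal sketch that runs without any packages installed. It uses a plain Map as a hypothetical stand-in for the OpenTelemetry counter, so treat it as an illustration of the idea and not as the code from the course.

```javascript
// Hypothetical stand-in for an OpenTelemetry counter: a Map keyed by route.
const requestCounts = new Map();

// Returns Express-style middleware: (req, res, next). It counts the request
// for its route, then hands control to the next handler.
function countAllRequests() {
  return (req, res, next) => {
    const route = req.path;
    requestCounts.set(route, (requestCounts.get(route) || 0) + 1);
    next();
  };
}

// In app.js the usage would look like: app.use(countAllRequests());
// Here the middleware is called by hand with fake request objects.
const middleware = countAllRequests();
middleware({ path: '/' }, {}, () => {});
middleware({ path: '/' }, {}, () => {});
middleware({ path: '/date' }, {}, () => {});
console.log(Object.fromEntries(requestCounts));
```

Because the middleware always calls next(), it can sit in front of every route without changing what the routes return.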

Perfect.

Next up, let's look at initializing and registering a metrics exporter. Counter metrics are only useful if you can export them somewhere where you can see them.

For this, we're going to use Prometheus. Creating and registering a metrics exporter is much like the tracing exporter above. First, you need to install the Prometheus exporter by running the following command.

So I'm just going to use npm install and @opentelemetry/exporter-prometheus. Next, we need to add some more code to our monitoring.js file.

So once again, I'm going to copy the code given to us by OpenTelemetry to get started, and paste it into my monitoring.js file.

Don't worry, I will share this repo with all of you so that if you do get stuck, you can refer to my finished project.

Now in a separate tab, so just leave this running, ensure Prometheus is running by running the Prometheus binary from earlier, and start your application.

We do so by using the script we wrote.

So npm start. You should see the Prometheus scrape endpoint at http://localhost:9464/metrics, as well as listening for requests on localhost:8080.

Now, each time you browse to localhost:8080, you should see Hello in your browser, and your metrics in Prometheus should update. You can verify the current metrics by browsing to localhost:9464/metrics, which should look like this. You should also be able to see the gathered metrics in your Prometheus web UI. We can also add more routes in our app.js file.
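The metrics page is plain text in the Prometheus exposition format. As a rough sketch of what the exporter produces for a per-route counter, the function below renders counts in that format. The metric name "requests", the help text and the label name "route" are assumptions for this example. The exact names on your page depend on the sample code you pasted in.

```javascript
// Render per-route counts in the Prometheus text exposition format.
// The metric name "requests" and the label "route" are assumptions for this sketch.
function renderCounter(name, help, countsByRoute) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const [route, count] of Object.entries(countsByRoute)) {
    // One line per label value: name{label="value"} number
    lines.push(`${name}{route="${route}"} ${count}`);
  }
  return lines.join('\n') + '\n';
}

const text = renderCounter('requests', 'Count all incoming requests', {
  '/': 2,
  '/date': 1,
});
console.log(text);
```

Prometheus reads this page on every scrape, so the numbers it stores are whatever the counter holds at that moment.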

Let's go ahead and do that to see what that would look like.

So I'm just going to add some pre written code here.

And that is a middle tier route, and another route that has the path backend.

So now we have our date and homepage routes from the previous section, a backend route, a middle tier route, as well as a new homepage route.

I'm also going to need axios for this.

So another package that will help me in making these requests.

So let's go ahead and import that into my project.

And nice that is done.

Let's run npm start.

Okay, now let's check everything is working as expected.

The home page now responds with Hello backend.

This is actually because we have two homepage routes.

So I'm just going to get rid of the other one in a bit.

The backend route responds with Hello back end, the date route responds with today's date.

So that looks good.

So I'm just going to delete the initial homepage route that we had that responds with hello world and keep the new one.

Okay, and now let's visit middle tier.

And we get a response of Hello backend.

And finally, let's visit /metrics, where we get a request counter for all the routes we have visited.

Okay.

So this looks like it's all working. We visited all the routes, and our counter seems to be working fine.

Now let's go ahead and see that in the Prometheus UI.

So we're going to have to pick something to execute.

And there we go.

Now, before we move on to our project, I thought let's take a little bit of time to understand exactly what issues can be detected with OpenTelemetry.

Here's just a list that I'm going to go through with you.

Starting with the backend.

In the backend, with OpenTelemetry you can pick up bad logic or user input leading to exceptions being thrown.

You can pick up poorly implemented downstream calls, for example to infrastructure like databases or downstream APIs, leading to exceptionally long response times.

Or you can pick up poorly performing code on a single API, leading to exceptionally long response times.

On the front end, with OpenTelemetry, you can detect bad logic or user input leading to JavaScript errors.

You can also use it to find poorly implemented JavaScript making your UI prohibitively slow, despite performant APIs.

And you can even use it to locate geo-specific slowness requiring geo-distribution.

And finally, for infrastructure, you can use it to identify noisy neighbors running on a host and sapping resources from other apps, or configuration changes leading to performance degradation.

You can use it for version audits, so zero-day vulnerability checks and ensuring config changes went through, or to spot a misconfiguration with your DNS making your apps inaccessible.

So that is a list to get you thinking about the issues that you can detect with OpenTelemetry.

Now that we have that covered, let's move on to our project.

In this part of the course, I want to show you what happens when you are building out an app with a more complicated back end that deals with two services.

It is a hypothetical project that you can adapt to anything that you wish. It is an app that gets movies from your database.

By the end of the project, you will be able to trace exactly how we got the movies, and how long each step took.

Okay, so here is a project I have pre-made, thanks to the open source community and inspired by an OpenTelemetry contributor, Alan Storm.

It is a project that has two services with one service relying on the other, but not the other way around.

In this project, I have a main dashboard service, as well as the movies service, which will return all the movies for our app.

The layout of this project is similar to what we did in the tracing setup.

However, instead of having a separate tracing js file to trace each service, the code is directly in each file.

So as you can see, each service is in the root of our project.

I'm just going to minimize this so we can see the code a little better now.

So from the beginning of our course, this should look familiar.

So as a reminder, OpenTelemetry requires that we instantiate a trace provider, configure that trace provider with an exporter, and install OpenTelemetry plugins to instrument specific Node modules.

Let's talk through the code.

So we can see it here as a refresher, especially as it's organized in a different way than what we saw in the basic implementation.

So first, we are going to get the NodeTracerProvider from the @opentelemetry/node package.

A trace provider is what will help us create traces in Node.js.

Next, we will get the ConsoleSpanExporter and the SimpleSpanProcessor from the @opentelemetry/tracing package.

And then we need to get the ZipkinExporter from the @opentelemetry/exporter-zipkin package.

Now that we have what is necessary, let's move on.

So at the moment, we have made a tracing program.

And to actually generate the spans, if you remember, we installed a plugin called @opentelemetry/plugin-http. The NodeTracerProvider object is smart enough to notice the presence of a plugin and load it.

This code creates a trace provider and adds a span processor to it.

The span processor requires an exporter.

So we instantiate that as well.

Both are responsible for getting the telemetry data out of your service and into another system.

With this code, we create an exporter, then use that exporter when creating a span processor, and then add that span processor to our trace provider.

Okay, so that is what that code does.
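If the order of that wiring is hard to picture, here is a toy model of the exporter, span processor and provider chain. These three classes are hypothetical and written only for this illustration. They are not the OpenTelemetry API, and the real classes do far more, but the direction in which a finished span travels is the same.

```javascript
// Toy model only: these classes are hypothetical and are NOT the OpenTelemetry API.
class ToyExporter {
  constructor() { this.exported = []; }
  // An exporter gets finished spans out of the process (here: into an array).
  export(spans) { this.exported.push(...spans); }
}

class ToySpanProcessor {
  constructor(exporter) { this.exporter = exporter; }
  // A "simple" processor forwards each span as soon as it ends.
  onEnd(span) { this.exporter.export([span]); }
}

class ToyTracerProvider {
  constructor() { this.processors = []; }
  addSpanProcessor(processor) { this.processors.push(processor); }
  // Record a finished span and notify every registered processor.
  endSpan(name, startMs, endMs) {
    const span = { name, durationMs: endMs - startMs };
    this.processors.forEach((processor) => processor.onEnd(span));
    return span;
  }
}

// Same wiring order as described above: exporter, then processor, then provider.
const exporter = new ToyExporter();
const provider = new ToyTracerProvider();
provider.addSpanProcessor(new ToySpanProcessor(exporter));
provider.endSpan('GET /movies', 0, 12);
console.log(exporter.exported);
```

Swapping the exporter for a different one changes where the spans end up without touching the code that creates them.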

Let's actually name our service.

Now, I have left this blank.

So let's go ahead and fill that in.

As this service is going to deal with the main service, I'm just going to call it dashboard service.

Here we're instantiating a Zipkin exporter, and then adding it to the trace provider.

We of course need to get Express from the express package.

And start a server and listen out on port 3001 for connections.

The app is currently not going to respond with anything for requests to any path, as we haven't written anything yet.

I want it to respond with the dashboard itself and the movies from the movies service.

But before we do that, we need to build out our movies.js file.

So this file is exactly the same as the other file, just perhaps with some code in different places, and it uses a different port.

In this file, I want to deal with the movies.

So I'm just going to rename this service name to movies service.

If I ran this service, we would be listening out on port 3000.

Now, I'm going to determine how our app responds to a get request to the movies endpoint.

So I am just going to write app.get, and then I'm simply going to put the movies path.

So I'm making this up.

This is an async function, and I'm going to pass through a request and response.

Okay, so there we go.

And then I'm simply going to type res.type with json as a string.

And res.send with JSON.stringify.

And I'm just going to send a movie object with, let's say, an array of movies.

So let's put some objects in our array, I'm going to make the array have movie objects.

And each movie object is going to have a name.

So a name like Jaws, and a genre.

So for example, Jaws is a thriller, as a string, and that is my first object.

And then I'm just going to make a quick other object.

This time, let's put a different type of film.

So I'm just going to put the string Annie, and once again, let's put a genre.

So I'm just making the same movie object just with a different name and genre.

I'm going to put family and make another object.

I'm going to stop after this one, because this is just for illustration purposes.

And let's put Jurassic Park as the name and let's put action as the genre.

Okay, that is it.

That's our array of three movie objects.
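Put together, the route described above can be sketched like this. So that it runs without Express installed, the handler is called with a small fake response object. The second title is hard to make out in the recording, so "Annie" is a guess, and the capitalization of the genres is my own choice.

```javascript
// The payload described above: an object holding an array of three movies.
// "Annie" is a guess at the second title, which is unclear in the recording.
const movies = {
  movies: [
    { name: 'Jaws', genre: 'Thriller' },
    { name: 'Annie', genre: 'Family' },
    { name: 'Jurassic Park', genre: 'Action' },
  ],
};

// Express-style handler. In movies.js it would be registered with
// app.get('/movies', moviesHandler).
async function moviesHandler(req, res) {
  res.type('json');
  res.send(JSON.stringify(movies));
}

// Fake response object so the sketch runs without Express installed.
const fakeRes = {
  type(value) { this.contentType = value; },
  send(body) { this.body = body; },
};

moviesHandler({}, fakeRes).then(() => console.log(fakeRes.body));
```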

Okay, now let's run our app.

So let's actually go to localhost 3000, and put the movies path.

So what I'm doing is I am requesting the URL.

And of course, our app is going to listen out for it and make a trace.

So now let's go over to the Zipkin UI and search for recent traces. We will see we recorded a single service trace.

However, if we look through into the trace details, we'll see these traces do not look like the traces we've previously seen.

In our previous examples, one span equaled one service.

However, a span is just a span of time that's related to other spans of time. We can use a span to measure any individual part of our service as well.

The Express auto-instrumentation plugin creates spans that measure actions within the Express framework. We can use it to find out how long each middleware took to execute, how long your Express handler took to execute, and so on.

It gives you insights not just into what's going on with the service as a whole, but also individual parts of your Express system.
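One way to picture this is as a small tree of time slices inside a single service. The sketch below is a toy trace with made-up span names, field names and timings. It only shows how child spans break one request down into its parts.

```javascript
// Toy trace: each span is a named slice of time with an optional parent.
// The field names, span names and timings are made up for illustration.
const spans = [
  { id: 'a', parentId: null, name: 'GET /movies', startMs: 0, endMs: 20 },
  { id: 'b', parentId: 'a', name: 'middleware - query', startMs: 1, endMs: 3 },
  { id: 'c', parentId: 'a', name: 'middleware - expressInit', startMs: 3, endMs: 5 },
  { id: 'd', parentId: 'a', name: 'request handler - /movies', startMs: 5, endMs: 19 },
];

// Describe the trace as an indented tree with the duration of every span.
function describeTrace(allSpans, parentId = null, depth = 0) {
  const lines = [];
  for (const span of allSpans.filter((s) => s.parentId === parentId)) {
    lines.push(`${'  '.repeat(depth)}${span.name}: ${span.endMs - span.startMs}ms`);
    lines.push(...describeTrace(allSpans, span.id, depth + 1));
  }
  return lines;
}

const traceLines = describeTrace(spans);
console.log(traceLines.join('\n'));
```

Reading the tree top down tells you which part of the request took the most time, which is the view the trace details page gives you.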

This is the role most of the contrib plugins play in the OpenTelemetry project.

The core plugins are concerned with ensuring each request is bound to a single trace.

But the contrib plugins will create spans specific to the behavior of a particular framework.

Okay, great.

Let's carry on.

So now that we've done that, I want to show you how to use open telemetry to instrument a distributed system.

This is what our dashboard.js file is for. I essentially want my dashboard.js file to call the movies service as well.

So let's get to writing that code.

The first thing I am going to do is actually use the node-fetch library, which we haven't installed yet.

So this service will use the node-fetch library to call our movies service.

Let's go ahead and install that.

So I'm just going to get my terminal and type npm i, for install, and node-fetch.

Okay, now, once again, I'm going to have to type app.get, and then use the route of dashboard, and then write an async function, so async function, and pass through a request and response.

Now I am going to write here I need to fetch data from a second service.

And that's the movie service.

I'm just gonna write some pseudocode to remind us of that: fetch data running from second service. And my second service is the movies service.

Okay, now, I am going to write a function that will essentially help us get all the URL content from the movies service.

So essentially our object with the three movies in an array, or movie objects in an array.

So let's get to writing this function. I'm going to write this function and pass through two parameters.

So I pass through a URL into this function and a fetch. The fetch is actually going to use our node-fetch library. Then I'm going to use these two parameters to essentially get the body of that URL.

So I'm going to use a promise to do this. New Promise, and I'm going to pass through resolve and reject.

And then I'm going to use fetch to fetch the URL.

And then whatever is in the body, I'm going to use. So fetch url, resolve, reject, then res, res.text, then body.

So that is my function for getting the URL content.
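Written out, a function along those lines could look like the sketch below. The name getUrlContents is an assumption, and I have added a catch so that network errors reject the promise. A fake fetch stands in for node-fetch here so that the example runs without the package or a running server.

```javascript
// Wrap a fetch call in a Promise that resolves with the response body as text.
// "fetch" is passed in as a parameter, as described above. In the course it is
// the node-fetch function. The name getUrlContents is an assumption.
const getUrlContents = (url, fetch) =>
  new Promise((resolve, reject) => {
    fetch(url)
      .then((res) => res.text())
      .then((body) => resolve(body))
      .catch((err) => reject(err));
  });

// Hypothetical stand-in for node-fetch so the sketch runs without a network.
const fakeFetch = (url) =>
  Promise.resolve({ text: () => Promise.resolve(`body of ${url}`) });

getUrlContents('http://localhost:3000/movies', fakeFetch).then((body) =>
  console.log(body)
);
```

Passing fetch in as a parameter is what makes it easy to swap in a fake like this one.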

Let's get to using that.

Okay, I'm just going to actually change this.

I don't like the way this is written.

And I want to make it consistent with the bottom.

So I'm just going to change this up.

So it looks a bit neater. It's just a different way of writing functions.

Just so it's consistent with that.

Okay.

So once again, let's go down to the code I've pre written.

And in here, I'm going to fetch data running from the second service.

So the movies service, I am going to actually save the contents of the URL as movies.

So const movies, await, and use the function that we pre-wrote and pass through the URL.

So the URL we want to get the content of is http://localhost:3000/movies.

So there we go, that is the URL.

It is the same URL that I have written here.

So url, U-R-L, that's what we have done.

And then I'm going to need to require node-fetch.

So I'm going to write require and the package node-fetch. I'm going to put res.type json, and res.send JSON.stringify.

And now I'm going to write the word dashboard.

Okay, so I'm making an object, and I'm gonna write dashboard, and then whatever we've saved as movies should show up here.

So essentially, the contents of the URL will show up here.
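Here is a sketch of that dashboard route. The function that loads the movies is injected, which is my own choice for this example, so that the handler can run here without the movies service, Express or node-fetch. The names makeDashboardHandler and fakeLoadMovies are hypothetical.

```javascript
// Build an Express-style handler for /dashboard. The function that loads the
// movies is injected so the sketch can run without the movies service.
function makeDashboardHandler(loadMovies) {
  return async (req, res) => {
    // Fetch data from the second service (the movies service).
    const movies = await loadMovies('http://localhost:3000/movies');
    res.type('json');
    res.send(JSON.stringify({ dashboard: movies }));
  };
}

// Stand-ins for the movies service and for the Express response object.
const fakeLoadMovies = async () =>
  '{"movies":[{"name":"Jaws","genre":"Thriller"}]}';
const fakeResponse = {
  type(value) { this.contentType = value; },
  send(body) { this.body = body; },
};

const dashboardDone = makeDashboardHandler(fakeLoadMovies)({}, fakeResponse);
dashboardDone.then(() => console.log(fakeResponse.body));
```

Because the movies arrive as text, they show up as a string inside the dashboard object. Parsing them first with JSON.parse would nest them as real objects instead.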

Okay, so now I cannot run this file.

So I could run this file, but we will see an error.

This is because our file relies on the movies service being up and running.

So I'm just going to show you now. I'm going to type node dashboard.js.

So it's listening at localhost:3001.

However, if I visit localhost:3001/dashboard, I get an error.

This is because we need our movies service to be running.

So let's go ahead and make that true.

I'm just going to open up a new tab and type node movies.js.

Okay, that is now running and listening at localhost:3000.

So now, here's our movies.

And let's refresh or let's rerun the dashboard.

So once again, node dashboard.js.

And then refresh our page.

Amazing, we will see our dashboard object with the contents of the URL from the /movies path.

Amazing.

So this is working fine.

Our code is working as it should.

Now, let's see how this looks in our zipkin UI.

So I'm just going to rerun the query and look at our latest.

And there we go.

So once again, to reiterate that, in a nutshell, the dashboard service is dependent on a movie service to populate it.

This is true for many apps you interact with today, which is why this example is the one I wanted to show you for our project.

Now, we can see here that the spans from each service are linked together.

The OpenTelemetry HTTP plugin took care of this for us. The node-fetch library uses the underlying functionality of Node.js's http and https modules to make requests.

So that's how to instrument an application using OpenTelemetry.

This is pretty cool, as you can see our dashboard service, and then you can see exactly at what time it goes to the get movies service and then comes back.
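The linking works because the calling service sends its trace ID along with the outgoing HTTP request, and the receiving service reuses it. Here is a minimal sketch of that idea using the W3C Trace Context traceparent header format. Which propagation format the plugin version used in this course sends by default is an assumption on my part, and the plugin does all of this for you.

```javascript
const crypto = require('crypto');

// Build a W3C Trace Context "traceparent" header value:
// version, trace id (32 hex chars), parent span id (16 hex chars), flags.
function makeTraceparent(traceId, spanId, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Parse the header on the receiving service so its spans can join the trace.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// The dashboard service starts a trace and sends the header to the movies service.
const traceId = crypto.randomBytes(16).toString('hex');
const dashboardSpanId = crypto.randomBytes(8).toString('hex');
const header = makeTraceparent(traceId, dashboardSpanId);

// The movies service reads it and reuses the same trace id for its own spans.
const incoming = parseTraceparent(header);
console.log(header, incoming.traceId === traceId);
```

Since both services report spans with the same trace ID, the tracing backend can stitch them into one trace.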

Okay, that brings us to the end of our project.

I hope you've enjoyed this section.

This is, of course, just the surface of what you can do with OpenTelemetry.

There is a lot more to it.

But until you get the fundamentals, I sincerely hope you can go through this course again and again until you feel more comfortable building your own projects.

So far, in this project, we have directly gotten data from an app and sent the data to Zipkin.

But what happens if we want to try sending it to another back end to process our telemetry data? Does that mean we would have to re-instrument our whole app? Well, the amazing contributors to OpenTelemetry have come up with a solution to fix this.

The OpenTelemetry Collector is a way for developers to receive, process and export telemetry data to multiple backends.

It supports open source observability data formats like Zipkin, Jaeger, Prometheus, or Fluent Bit, sending it to one or more open source or commercial backends.

In this next section, I'm going to show you how to use it.

Okay, so for this section on using the OpenTelemetry Collector, we're going to be using New Relic as our observability tool of choice.

All I'm going to do is head over to New Relic and sign up.

Next, you'll see some questions, please answer these to the best of your ability.

So for example, where you store your data, and just click save, your account will then be set up.

Once you get to this page right here, I'm just going to ask you to not interact with anything for now, and head over to one.newrelic.com.

Once here, I'm going to ask you to go on the drop down of your profile and click API keys.

Once here, I'm going to ask you to navigate to creating a new key and just select Ingest - License.

I'm going to name this otel example.

And I'm just gonna give it some notes just so we can keep track of our API keys.

And great, our API key is now created.

Let's copy it and move on.

The next thing we're going to do is actually get our open telemetry collector.

For this, I'm going to head over to the New Relic GitHub account, from which I can get the OpenTelemetry examples.

So I'm just going to ask you to clone this repo into your local machine.

I have already done this.

So I'm just going to go ahead and head over to that repo now.

And here it is.

Once here, I'm going to ask you to navigate to the collector, nr-exporter-docker, otel-config.yaml file, because we're gonna have to change this up a little bit.

So please do head over here now.

And I'm just going to ask you to add a little line of code.

This line of code will add Zipkin as a receiver. A receiver is a way that data gets into the OpenTelemetry Collector.

So because we already have our app configured to use Zipkin, we will be telling the OpenTelemetry Collector: hey, we will be sending you data in the Zipkin format.

So that is what is happening. We are adding Zipkin as a receiver and then giving it an endpoint, which in this case is 0.0.0.0:9411.

Now, because Zipkin reports tracing data, we are going to add Zipkin as a tracing receiver underneath service.

And that's it.

The next thing we need to do is go over to the docker-compose.yaml file, and make sure that the Docker container that runs this OpenTelemetry Collector is actually able to receive the data through the same port that it would have if it was Zipkin.

So we're going to add the port 9411, just like so, and save it.
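As a sketch of the two additions just described, the relevant parts of the collector config could look like this. Only the zipkin entries come from this section. The otlp receiver shown in the pipeline is an assumption about what the file already lists, and the rest of the file stays as it is in the repo.

```yaml
receivers:
  # Added: accept spans in Zipkin format on the usual Zipkin port.
  zipkin:
    endpoint: 0.0.0.0:9411

service:
  pipelines:
    traces:
      # Added "zipkin" next to the receivers already listed here.
      # "otlp" is an assumption about the existing entry.
      receivers: [otlp, zipkin]
      # The existing processors and exporters entries stay unchanged.
```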

Now, according to the readme to run this, we need to use the API key we just created.

So in my terminal, I'm just going to export the New Relic API key I just created with this command.

Next, we need to spin up the Docker container with this command, making sure of course that we are in the correct directory.

So that is my fault.

Let's go into that directory.

So the nr-exporter-docker directory.

And once again, we run the Docker container.

And wonderful.

Now let's go back to our movies dashboard project that we have been working on in this course.

So now I just need to modify our app ever so slightly to work with the OpenTelemetry Collector.

For this to work, we need to change two things. We need to change the URL that it is reporting to.

So this one right here, I'm simply going to use it like so.

So we need to do this for the dashboard and also for the movies service.

And that's it.

Now, I'm just going to reinstall all the dependencies, for those who are just joining us here and have taken this project from the description below.

And then let's run the two services, like we have been doing previously in this tutorial.

Wonderful.

Okay, now I'm just going to call the services by visiting the dashboard service.

As a reminder, the dashboard service relies on the movies service, I'm going to call it multiple times, so we can get lots and lots of data to work with.

So maybe just a few more.

Okay, done.

Let's move on.

Now that we have our app and have successfully instrumented the open telemetry collector that is forwarding our data to New Relic, we should now be able to visualize our data.

For this, we need to go to the Explorer tab on New Relic.

And once here, we will see the two services, the dashboard service and the movies service, just like in the previous distributed tracing section.

Let's deep dive further.

As you can see in the dashboard service, there was a spike.

Let's find out what was happening.

So it looks like there were 18 traces with nine spans and two entities for the dashboard service.

That sounds right.

Great.

And if we dig deeper, you will see the micro service that our dashboard service was communicating with to actually solve the final result.

Wonderful.

As you can see, we can get a lot of awesome data to do things such as get to the root cause analysis of what could be going wrong in your app, check how your microservices are performing, and so, so much more.

Okay, and there we have it.

That was our OpenTelemetry course. Before we finish, I just want to take a moment to recap what we have learned in this course.

So to recap, in this course, we learnt how to set up the back end for a project.

Then we learnt how to implement tracing into our project, as well as metrics if we need to.

And then we also looked at two services and how they communicate, thanks to distributed tracing.

I hope you now feel comfortable knowing the benefits of using OpenTelemetry, as well as have a good understanding of how you would go about implementing it into your Node.js projects.

If you are wondering where to go next, with your newfound knowledge, I would suggest learning about infrastructure monitoring and digital experience monitoring.

These are two other ways to visualize, analyze and troubleshoot your entire software stack.

I will leave you with this and a link to New Relic to find out more and get a free account with no expiration date.

Thanks so much again for watching, and I'll see you soon.