Salim Oyinlola - freeCodeCamp.org

How to Become a Microsoft Learn Student Ambassador

Salim Oyinlola — Mon, 14 Aug 2023 16:11:33 +0000

The Microsoft Learn Student Ambassadors program is a student-focused program designed to empower students to become advocates and leaders in the tech ecosystem.

Microsoft Learn Student Ambassadors are student leaders who collaborate with Microsoft to help fellow students learn about technology and engage with various Microsoft technologies and services.

It is important to note that Microsoft Learn Student Ambassadors are not employees of Microsoft. While they work closely with Microsoft and have access to certain resources, training, and support from the company, they do not receive regular employee compensation.

Instead, student ambassadors often receive benefits such as networking opportunities, skill development, and recognition within the tech community.

Responsibilities of Microsoft Learn Student Ambassadors

At a high level, the responsibilities of student ambassadors include encouraging their peers to use Microsoft Learn, Microsoft's educational platform. It offers free resources, tutorials, and courses on various Microsoft technologies to help students build skills that open doors.

Student Ambassadors also represent Microsoft technologies on their campuses. In this capacity, Ambassadors engage with faculty and students to showcase the benefits of these tools for education and innovation.

Student Ambassadors are required to complete a variety of pre-determined activities in order to advance through the well-defined program milestones. As you complete these program milestones you can unlock additional program benefits.

You go through these milestones in the following order: new to alpha, alpha to beta, and beta to gold. With each milestone achievement, program certificates are updated.

The different milestones

To attain the Alpha milestone, new student ambassadors are required to complete a learn path on Microsoft Learn within four months of joining the program.

To reach the Beta milestone, you first need to complete the program's technical onboarding as an Alpha student ambassador. Then you must host and report an approved event or actively participate in a Student Ambassadors project within twelve months of joining the program.

Finally, to achieve the highest milestone in the program (Gold), you are required to go above and beyond program expectations by demonstrating significant reach and impact.

How to Apply to Be a Microsoft Learn Student Ambassador

Before applying, it is important to note the following:

The minimum age requirement for applicants is 16 years.
Full-time enrollment in an accredited academic institution (such as a College or University) is required for applicants.
Applications are restricted to individual persons and not open to corporate entities.
Individuals who are Microsoft employees or current contractors are ineligible to apply.

If you meet these qualifications, you can apply to become a Student Ambassador.

The most important thing to know about the application process for the Microsoft Learn Student Ambassador program is that the application is open all year long. But applications are only reviewed, and applicants only hear back, quarterly: in January, April, July and October.

You can find the application form here. The application consists of four major parts – privacy and terms, personal information, academic institution, and application questions. Let's go through each one now.

Privacy and terms

In the privacy and terms section, you are required to agree to the Microsoft Privacy Statement and confirm your eligibility for the program.

By completing this section, you accept that as you submit your application, your information will be shared with the Microsoft program lead who manages Microsoft Learn Student Ambassadors selection in your region.

Also in this section, you are informed that if your application is not accepted, all the information you provided will be removed from Microsoft's servers within twelve months of the application submission date. You can also request removal of your application data at any time without constraints.

Personal information

In the personal information section, the required prompts include your name, contact email address, date of birth, gender, pronoun and country or region.

You can also enter your preferred programming language in this section (but this particular question is not required).

In this section, you can also add the social media profiles you use for community leadership as an indication of your online influence and network.

Academic institution

In the academic institution section, as the name implies, the prompts are aimed at getting the country and region of your academic institution, its name, your degree program and your expected graduation date.

The available options for the degree program include Computer Science, Electrical Engineering, Software Engineering, Systems Engineering, Engineering - Other, Information Systems, Cyber Security, Business and Others. You'll also be asked to list your degree level (the available selections are Bachelor, Master and PhD).

Application questions

In the application questions section, as an applicant, you are required to answer the following three questions.

Tell us about one specific technical skill you have learned, and how you were guided to learning it. How would you use that experience to be a better Student Ambassador? (Guide)
Tell us about a specific time you welcomed a new person into one of your communities. How would you use that experience to be a better Student Ambassador? (Welcome)
Tell us about a specific event you have planned that connected diverse audiences together. How would you use that experience to be a better Student Ambassador? (Connect)

Now, here's the catch. While every applicant answers all three questions, you choose which two to answer with written responses and which one to answer as a video response. I think this is an attempt to get applicants to demonstrate their public speaking skills.

The video must be under two minutes and should be recorded in a quiet space without background noise. The written responses, on the other hand, must be a minimum of 100 characters and maximum of 2000 characters each.

Also, the video response's link has to be publicly accessible to anyone with the link. If an applicant chooses to record their video in any language that is not English, they must include English subtitles.

There's an option to pick from a list of technologies and highlight which ones interest you.

Finally, in more than 100 characters and fewer than 1000 characters, you can share the link to your Microsoft Learn profile (if you have one) and provide the reviewing team with details about your engagements within your community, both online and on campus.

In that space, you can also let them know about any organizations you are part of, particularly if you hold leadership roles. Feel free to share any additional information you consider important in this section.

The list of technologies to choose from

The application process/essays may seem quite lengthy. But the good thing about it is that the essay portal automatically saves your application as you go, so you can leave at any time and come back later to continue your application.

Benefits of Being a Microsoft Learn Student Ambassador

The benefits of being a Microsoft Learn Student Ambassador are many. First, you benefit from comprehensive technical and leadership training and gain access to Microsoft's extensive array of learning materials. You also get to connect with a worldwide community of students who share your enthusiasm for building the future you envision.

In my opinion, the biggest thing student ambassadors stand to gain is the community they become a part of alongside fellow ambassadors. This community offers an extensive network of peers, mentors, and professionals who are all invested in the same mission. It becomes a platform for collaboration, knowledge-sharing, and personal growth.

Here are a few other benefits that are accessible as you progress through the milestones:

Benefits of the New Milestone

Microsoft 365: This is a product family of productivity software, collaboration and cloud-based services owned by Microsoft. As a student ambassador in the new milestone, you get Microsoft 365 subscriptions.

The Microsoft 365 subscriptions include full Office desktop apps such as Word, Excel, Outlook, PowerPoint, Access and Publisher for Windows PCs, as well as access to additional OneNote features.

Benefits of the Alpha Milestone

In addition to the benefits that come with the New milestone, student ambassadors get access to Visual Studio Enterprise. This annual subscription provides Student Ambassadors with $150 monthly Azure credit, license keys to various Microsoft products, and many other benefits.

In addition, in line with the program's dedication to the growth of student ambassadors, student ambassadors in the Alpha milestone and beyond get free access to LinkedIn Learning.

Benefits of the Beta Milestone

In addition to the existing benefits, upon reaching this milestone, student ambassadors get a Beta swag box.

The Microsoft Learn Student Ambassador Beta Swag Box

Benefits of the Gold Milestone

On top of the previous benefits, student ambassadors who achieve this milestone get a Gold swag box, more program leadership opportunities, consideration for special events and activities, and consideration for MVP mentorship and nomination. This is the highest milestone in the program.

The Microsoft Learn Student Ambassador Gold Swag Box

Windows Insider for Student Ambassadors

Finally, one last benefit I want to touch on is the Windows Insider Program for Student Ambassadors.

Thanks to the Windows Insider Program's collaboration with Microsoft Learn Student Ambassadors, when Student Ambassadors become part of the Windows Insider Program, they will get a chance to preview upcoming Windows features and offer input to influence the direction Windows takes in the future.

Wrapping Up

Once you graduate from the Microsoft Learn Student Ambassador program at the Gold milestone, you will be eligible for nomination to the Windows Insider Most Valuable Professional (MVP) program.

This prestigious status comes with exclusive interaction opportunities with the Windows engineering team, special event invitations, and complimentary subscriptions to tools such as Visual Studio Enterprise, Office 365, and LinkedIn Premium. You will also receive a personalized award kit to commemorate this achievement.

I hope this guide helps you apply for the program and know what to expect! Thanks for reading.

How to Get a GitHub Student Developer Pack

Salim Oyinlola — Fri, 14 Jul 2023 17:35:43 +0000

For many students who are passionate about technology, limited access to crucial tools and resources is a common challenge.

Fortunately, the GitHub Student Developer Pack comes to the rescue. This pack provides students with exclusive access to a variety of top-notch developer tools, allowing them to learn through practical experience.

What is the GitHub Student Developer Pack?

The Student Developer Pack is made up of a selection of benefits generously provided by GitHub's partners and collaborators. These partners joined forces with GitHub to extend these valuable offers to verified students.

Why? Because they understand that the most effective way to support future developers is by offering hands-on experience with industry-standard products and tools.

In essence, the GitHub Student Developer Pack eliminates the obstacles faced by students who are passionate about technology but may not have the resources to pay full price for all these tools.

The pack grants students access to a collection of resources and empowers them to acquire firsthand knowledge and skills through the use of cutting-edge tools.

In this article, I will touch on some of the highlights of the GitHub Student Developer Pack and how you can get access to it.

Highlights of the Student Developer Pack

This pack comes with a plethora of offers across different domains in the tech space. For every speciality, there's likely an offer that'll help you get ahead in your career and access the best developer tools used in that industry for free.

These tools cover many topics, including Cloud Computing, Design, Game Development, Infrastructure and APIs, Internet of Things (IoT), Marketing, Mobile Development, Security Analytics and more. So it's fair to say no stone is left unturned with this pack.

Here are some examples of what you'll have access to:

Microsoft Azure

The pack comes with free access to 25+ Microsoft Azure cloud services plus $100 in Azure credit.

Namecheap

As part of the GitHub Student Developer Pack, Namecheap offers free one-year registration on a .me domain that comes with an SSL certificate.

DigitalOcean

With the pack, you will get access to $200 credit valid for one year to pay for cloud infrastructure services such as virtual servers, storage, and networking resources on DigitalOcean on a pay-as-you-go basis.

Educative

With the GitHub Student Developer pack, as a student, you will get access to six free months of 60+ courses covering in-demand topics. This can help you learn to code, grow your skills, and even succeed in tech interviews.

1Password

The importance of security cannot be overemphasized. With access to 1Password, you can ensure the security of your accounts by monitoring password breaches and identifying other security issues.

As a student, this enables you to maintain the safety of your online accounts. With the Student Developer Pack, you will get 1Password free for a year.

And then, my personal favourite...

The GitHub Campus Experts Program

The GitHub Campus Expert program is geared towards empowering University students to become student leaders while advocating for technology and open-source collaboration.

Campus Experts are trained to organize events, mentor other students, and foster/build a strong tech community on their respective campuses.

But it's important to note that the GitHub Student Developer Pack is a prerequisite for being able to apply for the program.

In all, the pack gives students free access to a wide range of essential developer tools and resources that are typically expensive or require paid subscriptions. Being able to use and learn these tools can help take your career to the next level.

If you're curious, you can take a look at my GitHub Campus Experts profile here.

What Are the Requirements to Get a Student Developer Pack?

To be eligible for the GitHub Student Developer Pack, you must meet the following requirements:

You must be a minimum of 13 years of age.
You must have a user account on GitHub.
You must possess a school-issued email address that can be verified, or alternatively, provide documents as evidence of your current student status.
At the point of your application, you must be currently registered in a program that grants a degree or diploma.

To gain further insights into the application process and the types of documents that are accepted, I encourage you to check the comprehensive Education documentation.

How Do I Get this Pack?

First, you'll need to visit GitHub Education. Sign in with your GitHub Account by clicking the Sign-in button on the top-right corner.

Then click on the Sign up for Student Developer Pack button.

You will be required to fill out a form. The questions in the form are as follows:

What e-mail address do you use for school?

If you have a school-issued email, select (or add) it. It is important to note that these email addresses must be verified.

From my experience, selecting a school-issued email address gives you the best chance of a speedy review.

If you don’t have a school-issued email, follow the prompts to fill out some additional information. You will still be eligible even if you only have a personal email address, as long as you can provide alternative documentation to verify your current student status.

What is the name of your school?

This prompt is automatically filled out if you provided a valid student email. However, if your school is not listed, then you need to enter your school's full name and continue. You will be asked to provide further information about your school on the next page.

How do you plan to use GitHub?

Answer this prompt, and then click 'Continue'.

And boom!

Yes, it is that easy. By following the steps I have outlined, you can unlock a world of opportunities and resources that can elevate your career to new heights. Armed with the GitHub Student Developer Pack, you now have access to a variety of tools, software licenses, and educational resources that can accelerate your learning and growth.

Finally, I share my articles on Twitter if you enjoyed this article and want to see more.

Gradient Descent – Machine Learning Algorithm Example

Salim Oyinlola — Mon, 24 Oct 2022 13:53:51 +0000

What is the Gradient Descent Algorithm?

Gradient descent is probably the most popular machine learning algorithm. At its core, the algorithm exists to minimize errors as much as possible.

The aim of gradient descent as an algorithm is to minimize the cost function of a model. We can tell this from the meanings of the words ‘Gradient’ and ‘Descent’.

Gradient refers to the slope of a function (in this context, how steeply the cost function rises or falls as the parameters change), while descent refers to downward motion in general (in this context, moving toward lower values of the cost function).

So in the context of machine learning, Gradient Descent refers to the iterative attempt to minimize the prediction error of a machine learning model by adjusting its parameters to yield the smallest possible error.

This error is known as the Cost Function. The cost function answers the question “by how much does the predicted value differ from the actual value?”. While the way to evaluate cost functions often differs for different machine learning models, in a simple linear regression model, it is usually the mean squared error of the model.
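
As a rough illustration, here is a minimal sketch of a squared-error cost in plain Python. It uses the "half mean squared error" convention, (1 / 2m) times the sum of squared differences, which is a common choice for linear regression. The function name and the toy values are made up for this example.

```python
def half_mean_squared_error(predicted, actual):
    """Return (1 / 2m) * sum((p - a)^2) over m paired values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    m = len(actual)
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / (2 * m)


# Toy values, invented purely for illustration.
actual = [2.0, 4.0, 6.0]
perfect = [2.0, 4.0, 6.0]
off_by_one = [3.0, 5.0, 7.0]

print(half_mean_squared_error(perfect, actual))     # 0.0
print(half_mean_squared_error(off_by_one, actual))  # 0.5
```

A perfect prediction gives a cost of zero, and the cost grows as the predictions drift away from the actual values.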

A 3D plot of the cost function of a simple linear regression model with M representing the minimum point

It is important to note that for simpler models like linear regression, a plot of the cost function is usually bowl-shaped, which makes it easier to ascertain the minimum point. However, this is not always the case. For more complex models (for instance neural networks), the plot might not be bowl-shaped. It is possible for the cost function to have multiple minimum points as shown in the image below.

A 3D plot of the cost function of a neural network. Source: Coursera

How Does Gradient Descent Work?

Firstly, it is important to note that like most machine learning processes, the gradient descent algorithm is an iterative process.

Assuming you have the cost function for a simple linear regression model as j(w,b) where j is a function of w and b, the gradient descent algorithm works such that it starts off with some initial random guess for w and b. The algorithm will keep tweaking the parameters w and b in an attempt to optimize the cost function, j.

In linear regression, the choice for the initial values does not matter much. A common choice is zero.

The perfect analogy for the gradient descent algorithm that minimizes the cost function j(w, b) and reaches its local minimum by adjusting the parameters w and b is a hiker walking down to the bottom of a mountain or hill (as shown in the 3D plot of the cost function of a simple linear regression model shown earlier), or a person trying to get to the lowest point of a golf course. In either case, they take repeated short steps until they make it to the bottom.

The Gradient Descent Formula

Here's the formula for gradient descent: b = a - γ ∇f(a)

The equation above describes what the gradient descent algorithm does.

Here, b is the next position of the hiker while a represents the current position. The minus sign is for the minimization part of the gradient descent algorithm, since the goal is to minimize the error as much as possible. γ in the middle is a factor known as the learning rate, and the term ∇f(a) is the gradient of f at a, which points in the direction of steepest ascent.

As such, this formula gives the next position for the hiker/the person on the golf course: one step in the direction of steepest descent. It is important to note that the term γ∇f(a) is subtracted from a because the goal is to move against the gradient, toward the local minimum.
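
To make the update rule concrete, here is a small sketch that applies it to f(x) = x**2, whose gradient is 2x and whose minimum is at x = 0. The function, the starting position of 10 and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def gradient_descent_step(a, learning_rate, gradient):
    """Apply b = a - learning_rate * gradient(a) once and return b."""
    return a - learning_rate * gradient(a)


def f_gradient(x):
    # Gradient of f(x) = x**2, whose minimum is at x = 0.
    return 2 * x


position = 10.0
history = [position]
for _ in range(3):
    position = gradient_descent_step(position, 0.1, f_gradient)
    history.append(position)

print(history)
```

Each step moves the position roughly 20% closer to zero (about 10, 8, 6.4, then 5.12), which is the hiker taking repeated short steps downhill.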

What is the Learning Rate?

The learning rate determines how big a step gradient descent takes in the direction of the local minimum. It determines the speed with which the algorithm moves towards the optimum values of the cost function.

Because of this, the choice of the learning rate, γ, is important and has a significant impact on the effectiveness of the algorithm.

If the learning rate is too big, the algorithm can overshoot the minimum: in a bid to find the optimal point, it jumps from a point on one side of the minimum all the way to a point on the other side. In that case, the cost function can get worse instead of better.

On the other hand, if the learning rate is too small, then gradient descent will work, albeit very slowly.

It is important to pick the learning rate carefully.
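
You can see both failure modes with a tiny experiment. The sketch below minimizes f(x) = x**2 from the same starting point with three learning rates. The specific values (1.1, 0.001 and 0.1) and the 20 steps are assumptions picked to make the contrast obvious, not recommended settings.

```python
def run_gradient_descent(start, learning_rate, steps):
    """Minimize f(x) = x**2 with a fixed learning rate and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * (2 * x)  # 2 * x is the gradient of x**2
    return x


too_big = run_gradient_descent(10.0, 1.1, 20)
too_small = run_gradient_descent(10.0, 0.001, 20)
reasonable = run_gradient_descent(10.0, 0.1, 20)

print("too big:   ", too_big)
print("too small: ", too_small)
print("reasonable:", reasonable)
```

With the large rate, every step overshoots zero and lands further away than before, so the value blows up. With the tiny rate, the value has barely moved from 10 after 20 steps. The middle rate ends up close to the minimum.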

How to Implement Gradient Descent in Linear Regression

import numpy as np
import matplotlib.pyplot as plt

class Linear_Regression:
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
        self.b = [0, 0]

    def update_coeffs(self, learning_rate):
        Y_pred = self.predict()
        Y = self.Y
        m = len(Y)
        self.b[0] = self.b[0] - (learning_rate * ((1/m) * np.sum(Y_pred - Y)))
        self.b[1] = self.b[1] - (learning_rate * ((1/m) * np.sum((Y_pred - Y) * self.X)))

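    def predict(self, X=None):
        # Default to the training inputs. Using None avoids a mutable default
        # argument and the ambiguous truth value of a NumPy array.
        if X is None:
            X = self.X
        # Vectorized prediction: intercept + slope * x for every x
        return self.b[0] + self.b[1] * np.asarray(X)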

    def get_current_accuracy(self, Y_pred):
        p, e = Y_pred, self.Y
        n = len(Y_pred)
        return 1-sum(
            [
                abs(p[i]-e[i])/e[i]
                for i in range(n)
                if e[i] != 0]
        )/n

    def compute_cost(self, Y_pred):
        m = len(self.Y)
        # Half mean squared error: square each residual, then sum
        J = (1 / (2 * m)) * np.sum((Y_pred - self.Y) ** 2)
        return J

    def plot_best_fit(self, Y_pred, fig):
        f = plt.figure(fig)
        plt.scatter(self.X, self.Y, color='b')
        plt.plot(self.X, Y_pred, color='g')
        f.show()


def main():
    X = np.array([i for i in range(11)])
    Y = np.array([2*i for i in range(11)])

    regressor = Linear_Regression(X, Y)

    iterations = 0
    steps = 100
    learning_rate = 0.01
    costs = []

    #original best-fit line
    Y_pred = regressor.predict()
    regressor.plot_best_fit(Y_pred, 'Initial Best Fit Line')


    while True:
        Y_pred = regressor.predict()
        cost = regressor.compute_cost(Y_pred)
        costs.append(cost)
        regressor.update_coeffs(learning_rate)

        iterations += 1
        if iterations % steps == 0:
            print(iterations, "epochs elapsed")
            print("Current accuracy is :",
                regressor.get_current_accuracy(Y_pred))

            stop = input("Do you want to stop (y/*)??")
            if stop == "y":
                break

    #final best-fit line
    regressor.plot_best_fit(Y_pred, 'Final Best Fit Line')

    #plot to verify cost function decreases
    h = plt.figure('Verification')
    plt.plot(range(iterations), costs, color='b')
    h.show()

    # if user wants to predict using the regressor:
    regressor.predict([i for i in range(10)])

if __name__ == '__main__':
    main()

At its core, you can see that the block of code runs gradient descent on a simple linear regression model using 0.01 as its learning rate. Every 100 iterations, it reports the current accuracy and asks whether you want to stop.
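
The two update lines in update_coeffs come from differentiating the cost with respect to each coefficient. Here is a short derivation, assuming the cost is the half mean squared error and writing the prediction as ŷ_i = b_0 + b_1 x_i. The symbols b_0, b_1 and γ are my notation for self.b[0], self.b[1] and learning_rate in the code.

```latex
\begin{align*}
J(b_0, b_1) &= \frac{1}{2m} \sum_{i=1}^{m} \left( b_0 + b_1 x_i - y_i \right)^2 \\
\frac{\partial J}{\partial b_0} &= \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right) \\
\frac{\partial J}{\partial b_1} &= \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right) x_i \\
b_0 &\leftarrow b_0 - \gamma \, \frac{\partial J}{\partial b_0}, \qquad
b_1 \leftarrow b_1 - \gamma \, \frac{\partial J}{\partial b_1}
\end{align*}
```

The factor of 1/2 in the cost cancels the 2 that appears when the square is differentiated, which is why the updates only carry a factor of 1/m.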

Upon running the code above, the output shown is given below:

Conclusion

In conclusion, it is important to note that the gradient descent algorithm is especially important in the artificial intelligence and machine learning domains as the models must be optimized for accuracy.

In this article, you learnt what the Gradient Descent algorithm is, how it works, its formula, what learning rate is, and the importance of picking the right learning rate. You also saw a code illustration of how Gradient Descent works.

Finally, I share my writings on Artificial Intelligence, Machine Learning and Microsoft Azure on Twitter if you enjoyed this article and want to see more.

How to Validate your Machine Learning Models Using TensorFlow Model Analysis

Salim Oyinlola — Wed, 05 Oct 2022 14:37:09 +0000

My first deployed Machine Learning model was a failure. It was a simple Diabetes Diagnosis Model for potential diabetes mellitus patients – and quite frankly, I was beyond excited on deployment.

But the excitement soon disappeared when I received feedback from users. Simply put, the users felt the model was bad.

I was saddened by this, but looking back, they were correct. The model may have performed well in terms of top-level metrics. But from the perspective of the consumer, if a machine learning model provides a poor forecast, that person's experience with the model will be bad.

The issue was that specific model features, or slices of data, were causing the model to perform poorly.

In short, before deploying any machine learning model, the onus is on machine learning engineers to assess it, make sure it satisfies strict quality standards, and acts as predicted for all pertinent slices of data.

What is TensorFlow Model Analysis?

To enable Machine Learning engineers to look at the performance of their models at a deeper level, Google created TensorFlow Model Analysis (TFMA). According to the docs, "TFMA performs its computations in a distributed manner over large amounts of data using Apache Beam."

TFMA, as a tool, enables you to really dig into the model's performance and understand how it varies on different slices of data. It provides support for calculating metrics that were used at training time (that is built-in metrics) as well as metrics defined after the model was saved as part of the TFMA configuration settings.
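
To get a feel for what a "slice" is before touching TFMA, here is a hand-rolled sketch in plain Python. It is not the TFMA API. It simply groups a handful of invented rows by one feature and computes accuracy per group, which is the idea TFMA applies at scale. The helper name and the rows are assumptions made up for this illustration.

```python
from collections import defaultdict


def accuracy_by_slice(rows, slice_key):
    """Group rows by rows[slice_key] and return accuracy for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        value = row[slice_key]
        total[value] += 1
        if row["label"] == row["prediction"]:
            correct[value] += 1
    return {value: correct[value] / total[value] for value in total}


# Invented rows: the overall accuracy looks fine, but one slice does not.
rows = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 23, "label": 1, "prediction": 0},
    {"trip_start_hour": 23, "label": 0, "prediction": 0},
]

overall = sum(r["label"] == r["prediction"] for r in rows) / len(rows)
print("overall:", overall)
print("by hour:", accuracy_by_slice(rows, "trip_start_hour"))
```

Here the overall accuracy is about 83%, yet the late-night slice is right only half the time. A top-level metric alone would hide that.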

In this tutorial, you will analyze and evaluate results on a previously trained machine learning model. The model you will use is trained for a Chicago Taxi Example, which uses the Taxi Trips dataset released by the city of Chicago. You can check out the full dataset here.

When you are done with this tutorial, you will be able to use Apache Beam to do a full pass over the specified evaluation dataset. Also, you will not only have a more accurate calculation of metrics, but you'll be able to scale up to massive evaluation datasets, since Beam pipelines can be run using distributed processing back-ends.

Prerequisites

Fundamental knowledge of Apache Beam. The Beam Programming Guide is a great place to start.
Fundamental understanding of the workings of machine learning models.
A new Google Colab notebook to run the Python code in your Google Drive. You can set this up by following this tutorial.

Step 1 – How to Install TensorFlow Model Analysis (TFMA)

With your Google Colab notebook ready, the first thing to do is to pull in all the dependencies. This will take a while.

A blank (new) notebook in dark mode

Rename the file from Untitled.ipynb to TFMA.ipynb.

!pip install -U pip
!pip install tensorflow-model-analysis

The first line upgrades pip to the latest version. pip is the package management system used to install and manage software packages written in Python; the Python documentation describes it as the preferred installer program. The second line will install TensorFlow Model Analysis, TFMA.

Once that is done, restart the runtime before running the cells below.

import sys
assert sys.version_info.major==3 
import tensorflow as tf
import apache_beam as beam
import tensorflow_model_analysis as tfma

This block of code imports the needed libraries – sys, tensorflow, apache_beam and tensorflow_model_analysis. You use the assert sys.version_info.major==3 command to verify that the notebook is being run using Python 3.

Step 2 – How to Load the dataset

You will download the tar file and extract it.

import io, os, tempfile
TAR_NAME = 'saved_models-2.2'
BASE_DIR = tempfile.mkdtemp()
DATA_DIR = os.path.join(BASE_DIR, TAR_NAME, 'data')
MODELS_DIR = os.path.join(BASE_DIR, TAR_NAME, 'models')
SCHEMA = os.path.join(BASE_DIR, TAR_NAME, 'schema.pbtxt')
OUTPUT_DIR = os.path.join(BASE_DIR, 'output')

!curl -O https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/{TAR_NAME}.tar
!tar xf {TAR_NAME}.tar
!mv {TAR_NAME} {BASE_DIR}
!rm {TAR_NAME}.tar

The dataset downloaded is in the tar file format. It includes the training datasets, evaluation datasets, the data schema and the training and serving saved models along with eval saved models. You will need all of them in this tutorial.
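
If you want to check what was extracted, you can walk the directory from Python. The helper below, list_tree, is a hypothetical name of my own and only uses the standard library. The demo builds a small stand-in directory so the sketch runs anywhere. In the notebook you would pass os.path.join(BASE_DIR, TAR_NAME) instead.

```python
import os
import tempfile


def list_tree(root):
    """Return the sorted file paths under root, relative to root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full_path, root))
    return sorted(paths)


# Stand-in directory (made up for this demo) that mimics part of the layout.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "data", "eval"))
open(os.path.join(demo_root, "data", "eval", "data.csv"), "w").close()
open(os.path.join(demo_root, "schema.pbtxt"), "w").close()

for path in list_tree(demo_root):
    print(path)
```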

Step 3 – How to Parse the Schema

You need to parse the downloaded schema so that you can use it with TFMA.

import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2
from tensorflow.core.example import example_pb2

schema = schema_pb2.Schema()
contents = file_io.read_file_to_string(SCHEMA)
schema = text_format.Parse(contents, schema)

You parse the schema with the text_format module of the google.protobuf library, which reads the text-format contents of the schema file into a Schema protobuf message from tensorflow_metadata's schema_pb2 module.

Step 4 – How to Use the Schema to Create TFRecords

The next step is to give TFMA access to your dataset. For this, you need to create a TFRecords file. You use the schema to create it, since it gives you the correct type for each feature.

import csv
datafile = os.path.join(DATA_DIR, 'eval', 'data.csv')
reader = csv.DictReader(open(datafile, 'r'))
examples = []
for line in reader:
  example = example_pb2.Example()
  for feature in schema.feature:
    key = feature.name
    if feature.type == schema_pb2.FLOAT:
      example.features.feature[key].float_list.value[:] = (
          [float(line[key])] if len(line[key]) > 0 else [])
    elif feature.type == schema_pb2.INT:
      example.features.feature[key].int64_list.value[:] = (
          [int(line[key])] if len(line[key]) > 0 else [])
    elif feature.type == schema_pb2.BYTES:
      example.features.feature[key].bytes_list.value[:] = (
          [line[key].encode('utf8')] if len(line[key]) > 0 else [])
  # Add a new column 'big_tipper' that indicates if the tip was > 20% of the fare. 
  # TODO(b/157064428): Remove after label transformation is supported for Keras.
  big_tipper = float(line['tips']) > float(line['fare']) * 0.2
  example.features.feature['big_tipper'].float_list.value[:] = [big_tipper]
  examples.append(example)
tfrecord_file = os.path.join(BASE_DIR, 'train_data.rio')
with tf.io.TFRecordWriter(tfrecord_file) as writer:
  for example in examples:
    writer.write(example.SerializeToString())
!ls {tfrecord_file}
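
The conversion logic in that cell can be easier to follow without the protobuf plumbing. The sketch below mirrors two of its ideas in plain Python: an empty CSV cell becomes an empty list, and big_tipper is true when the tip is more than 20% of the fare. The helper names, the "company" column and the two sample rows are assumptions invented for this example.

```python
import csv
import io


def to_value_list(raw, kind):
    """Convert one CSV cell to a list, using [] for an empty cell."""
    if len(raw) == 0:
        return []
    if kind == "float":
        return [float(raw)]
    if kind == "int":
        return [int(raw)]
    return [raw.encode("utf8")]


def is_big_tipper(row):
    # Same rule as the tutorial code: tip greater than 20% of the fare.
    return float(row["tips"]) > float(row["fare"]) * 0.2


# Two invented rows standing in for data.csv.
sample = io.StringIO("fare,tips,company\n10.0,3.0,Acme\n10.0,1.0,\n")
for row in csv.DictReader(sample):
    print(to_value_list(row["fare"], "float"),
          to_value_list(row["company"], "bytes"),
          is_big_tipper(row))
```

The first row is a big tipper (3.0 on a fare of 10.0) and the second is not, and the missing company in the second row shows up as an empty list rather than an empty string.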

It is worth noting that TFMA supports a number of different model types including TF Keras models, models based on generic TF2 signature APIs, as well as TF estimator-based models. However, for this tutorial, you will configure a Keras-based model.

In your Keras setup, you will add your metrics and plots manually as part of the configuration (see the metrics guide for information on the metrics and plots that are supported).

Step 5 – How to Set Up and Run TFMA using Keras

import tensorflow_model_analysis as tfma

With tfma imported, you can now set up the evaluation configuration.

# Set up the tfma.EvalConfig settings
keras_eval_config = text_format.Parse("""
  ## Model information
  model_specs {
    # For keras (and serving models) we need to add a `label_key`.
    label_key: "big_tipper"
  }

  ## Post-training metric information. These will be merged with any built-in
  ## metrics from training.
  metrics_specs {
    metrics { class_name: "ExampleCount" }
    metrics { class_name: "BinaryAccuracy" }
    metrics { class_name: "BinaryCrossentropy" }
    metrics { class_name: "AUC" }
    metrics { class_name: "AUCPrecisionRecall" }
    metrics { class_name: "Precision" }
    metrics { class_name: "Recall" }
    metrics { class_name: "MeanLabel" }
    metrics { class_name: "MeanPrediction" }
    metrics { class_name: "Calibration" }
    metrics { class_name: "CalibrationPlot" }
    metrics { class_name: "ConfusionMatrixPlot" }
    # ... add additional metrics and plots ...
  }

  ## Slicing information
  slicing_specs {}  # overall slice
  slicing_specs {
    feature_keys: ["trip_start_hour"]
  }
  slicing_specs {
    feature_keys: ["trip_start_day"]
  }
  slicing_specs {
    feature_values: {
      key: "trip_start_month"
      value: "1"
    }
  }
  slicing_specs {
    feature_keys: ["trip_start_hour", "trip_start_day"]
  }
""", tfma.EvalConfig())

It's also important that you create a tfma.EvalSharedModel that points at the Keras model.

keras_model_path = os.path.join(MODELS_DIR, 'keras', '2')
keras_eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=keras_model_path,
    eval_config=keras_eval_config)

keras_output_path = os.path.join(OUTPUT_DIR, 'keras')

And then you finally run TFMA, ending this step.

keras_eval_result = tfma.run_model_analysis(
    eval_shared_model=keras_eval_shared_model,
    eval_config=keras_eval_config,
    data_location=tfrecord_file,
    output_path=keras_output_path)
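
To make some of the metric names from metrics_specs more concrete, here is a minimal plain-Python sketch of what a few of them measure. It is not TFMA's implementation. The function name binary_metrics, the 0.5 decision threshold, and the sample values are assumptions made for illustration, and TFMA's own computations (for example, how it weights examples) may differ.

```python
def binary_metrics(labels, predictions, threshold=0.5):
    """Compute a few binary-classification metrics from labels and predicted scores."""
    predicted = [1 if p > threshold else 0 for p in predictions]
    pairs = list(zip(labels, predicted))

    true_pos = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
    false_pos = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)
    false_neg = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 0)
    correct = sum(1 for y, y_hat in pairs if y == y_hat)
    count = len(labels)

    return {
        "example_count": count,
        "binary_accuracy": correct / count,
        # Of everything predicted positive, how much really was positive?
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        # Of everything really positive, how much did the model find?
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
        "mean_label": sum(labels) / count,
        "mean_prediction": sum(predictions) / count,
    }


# Made-up labels (1 = big tipper) and model scores.
labels = [1, 0, 1, 0]
predictions = [0.75, 0.25, 0.25, 0.75]
metrics = binary_metrics(labels, predictions)
```

Comparing mean_label with mean_prediction gives a rough sense of whether the model's scores are too high or too low on average.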

Now that you have run the evaluation, look at the visualizations using TFMA. For the following examples, you can visualize the results from running the evaluation on the Keras model.

To view metrics, you will use [tfma.view.render_slicing_metrics](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/view/render_slicing_metrics). By default, the views will display the Overall slice. To view a particular slice, you can either use the name of the column (by setting slicing_column) or provide a tfma.SlicingSpec.

Step 6 – How to Visualize the Metrics and Plots

At this point, note that the dataset contains the following columns:

pickup_community_area
fare
trip_start_month
trip_start_hour
trip_start_day
trip_start_timestamp
pickup_latitude
pickup_longitude
dropoff_latitude
dropoff_longitude
trip_miles
pickup_census_tract
dropoff_census_tract
payment_type
company
trip_seconds
dropoff_community_area
tips

For a first trial and as an example, you can set slicing_column to look at the trip_start_hour feature from our previous slicing_specs. You are then able to visualize the column.

tfma.view.render_slicing_metrics(keras_eval_result, slicing_column='trip_start_hour')

On running this, you will see that the metrics visualization supports the following interactions:

Click and drag to pan
Scroll to zoom
Right click to reset the view
Hover over the desired data point to see more details.
Select from four different types of views using the selections at the bottom.

Note that your initial tfma.EvalConfig has created a whole list of slicing_specs, which you can visualize by updating slice information passed to tfma.view.render_slicing_metrics. Here you can select the trip_start_day slice (days of the week).

tfma.view.render_slicing_metrics(keras_eval_result, slicing_column='trip_start_day')

TFMA also supports creating feature crosses to analyze combinations of features. To test this, you will create a cross between trip_start_hour and trip_start_day.

tfma.view.render_slicing_metrics(
    keras_eval_result,
    slicing_spec=tfma.SlicingSpec(
        feature_keys=['trip_start_hour', 'trip_start_day']))

Now, crossing the two columns creates a lot of combinations! But you will narrow down your cross to only look at trips that start at 1pm. Then, you will select binary_accuracy from the visualization as shown below.

tfma.view.render_slicing_metrics(
    keras_eval_result,
    slicing_spec=tfma.SlicingSpec(
        feature_keys=['trip_start_day'], feature_values={'trip_start_hour': '13'}))
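
The slicing options used above can be pictured with a small plain-Python sketch. It assumes that feature_keys groups examples by the values of those features, while feature_values first filters the examples down to an exact match. The slice_accuracy helper and the four sample examples are hypothetical and only illustrate the idea. This is not how TFMA is implemented.

```python
from collections import defaultdict


def slice_accuracy(examples, feature_keys=(), feature_values=None):
    """Return accuracy per slice, keyed by a tuple of (feature, value) pairs."""
    feature_values = feature_values or {}
    totals = defaultdict(lambda: [0, 0])  # slice key -> [correct, count]
    for example in examples:
        # feature_values acts as a filter: skip examples that do not match.
        if any(example[k] != v for k, v in feature_values.items()):
            continue
        # feature_keys defines the grouping; an empty tuple is the overall slice.
        key = tuple((k, example[k]) for k in feature_keys)
        totals[key][0] += int(example["label"] == example["prediction"])
        totals[key][1] += 1
    return {key: correct / count for key, (correct, count) in totals.items()}


# Made-up examples with already-thresholded predictions.
examples = [
    {"trip_start_hour": 13, "trip_start_day": 1, "label": 1, "prediction": 1},
    {"trip_start_hour": 13, "trip_start_day": 1, "label": 0, "prediction": 1},
    {"trip_start_hour": 13, "trip_start_day": 2, "label": 0, "prediction": 0},
    {"trip_start_hour": 7, "trip_start_day": 2, "label": 1, "prediction": 0},
]

overall = slice_accuracy(examples)
crossed = slice_accuracy(examples, feature_keys=["trip_start_hour", "trip_start_day"])
by_day_at_1pm = slice_accuracy(
    examples,
    feature_keys=["trip_start_day"],
    feature_values={"trip_start_hour": 13},
)
```

The cross produces one slice per hour and day combination that appears in the data, which is why narrowing it with a feature value keeps the visualization manageable.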

Step 7 – How to Track Your Model's Performance Over Time

You'll use your training dataset for training your model. It will hopefully be representative of your test dataset and the data that will be sent to your model in production.

But while the data in inference requests may remain the same as your training data, in many cases it will start to change enough so that the performance of your model will change.

That means that you need to monitor and measure your model's performance on an ongoing basis, so that you can be aware of and react to changes.

Let's look at how TFMA can help.

output_paths = []
for i in range(3):
  # Create a tfma.EvalSharedModel that points to our saved model.
  eval_shared_model = tfma.default_eval_shared_model(
      eval_saved_model_path=os.path.join(MODELS_DIR, 'keras', str(i)),
      eval_config=keras_eval_config)

  output_path = os.path.join(OUTPUT_DIR, 'time_series', str(i))
  output_paths.append(output_path)

  # Run TFMA
  tfma.run_model_analysis(eval_shared_model=eval_shared_model,
                          eval_config=keras_eval_config,
                          data_location=tfrecord_file,
                          output_path=output_path)

# Load the first two results once every evaluation has finished.
eval_results_from_disk = tfma.load_eval_results(output_paths[:2])

tfma.view.render_time_series(eval_results_from_disk)

Using TFMA, you can validate and evaluate your machine learning models across different slices of data.

You can see from the image above that you can evaluate the auc (area under the curve), auc_precision_recall, binary_accuracy, binary_crossentropy, calibration, example_count, mean_label, mean_prediction, precision, and recall metrics of the machine learning model.

Conclusion

Finally, it is important to note that TFMA can be configured to evaluate multiple models at the same time. Typically, you do this to compare a new model against a baseline (such as the currently serving model) to determine how its metrics (for example, AUC) differ from the baseline's.

When thresholds are configured, TFMA will produce a tfma.ValidationResult record indicating whether the performance matches expectations.
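
The idea behind validating a candidate model against thresholds and a baseline can be sketched in plain Python. This is not TFMA's API. The validate function, the threshold dictionaries, and the metric values below are all hypothetical and only illustrate the kind of check involved.

```python
def validate(candidate, baseline, lower_bounds, min_change):
    """Check candidate metrics against absolute bounds and against a baseline.

    lower_bounds: metric -> smallest acceptable value for the candidate.
    min_change:   metric -> smallest acceptable (candidate - baseline) difference.
    Returns (ok, failures), where failures lists a message per failed check.
    """
    failures = []
    for metric, bound in lower_bounds.items():
        if candidate[metric] < bound:
            failures.append(f"{metric} is below the lower bound {bound}")
    for metric, delta in min_change.items():
        if candidate[metric] - baseline[metric] < delta:
            failures.append(f"{metric} changed by less than {delta} versus the baseline")
    return (not failures, failures)


# Made-up metric values for a serving (baseline) model and two candidates.
baseline = {"auc": 0.92}
worse_candidate = {"auc": 0.90}
better_candidate = {"auc": 0.93}

# Require AUC of at least 0.8 and allow it to drop by at most 0.01.
lower_bounds = {"auc": 0.8}
min_change = {"auc": -0.01}

worse_ok, worse_failures = validate(worse_candidate, baseline, lower_bounds, min_change)
better_ok, better_failures = validate(better_candidate, baseline, lower_bounds, min_change)
```

In this sketch the first candidate passes the absolute bound but fails the comparison with the baseline, which is the kind of regression a validation step is meant to catch.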

If at this point you are wondering about the difference between evaluating machine learning models using TensorBoard and TensorFlow Model Analysis (TFMA), that is a valid question. Both are tools that provide the measurements and visualizations needed during the Machine Learning workflow.

But it is important to note that you use them in different stages of the development process. At a high level, you use TensorBoard to analyze the training process itself while TFMA is concerned with the deep analysis of the 'finished' trained model.

Thank you for reading!

How to Create Serverless Logic with Azure Functions

Salim Oyinlola — Mon, 26 Sep 2022 22:35:00 +0000

What is Serverless Computing?

Serverless computing is a cloud computing model where backend services are provided on a pay-as-you-use basis.

In this model, developers can write and run code without having to manage or provision servers. As such, they get to focus solely on writing the business logic (or front-end) code instead.

Microsoft Azure provides a wide range of options for designing this kind of architecture. The most frequently used are Azure Logic Apps and Azure Functions. This article focuses on Azure Functions.

What are Azure Functions?

Azure Functions is the serverless computing service on the Microsoft Azure platform for creating serverless applications. It enables developers to host their business logic without having to manage infrastructure.

The code for Azure Functions can be written in a wide range of programming languages including C#, JavaScript, and Python, amongst others. Like other cloud services, it operates on a pay-per-use basis, where you only pay for the resources you consume.

By the end of this tutorial, you will be able to:

Create an Azure function app in the Azure portal.
Exercise a function using triggers.
Monitor and test your Azure function from the Azure portal.

Prerequisites

You will need a valid and active Microsoft Azure account to follow along with this tutorial. You can use either:

Free Azure Trial: With this option, you will start with $200 Azure credit and will have 30 days to use it, in addition to free services.
Azure for Students: This offer is available for students only. With this option, you will start with $100 Azure credit with no credit card required and access to popular services for free whilst you have your credit.

Step 1 – Create your Azure Function App

To establish a serverless computing resource for your business logic using Azure Functions, it is essential to create an Azure Function application. Having created a valid and active Microsoft Azure account, navigate to the portal.

Select Create a resource
Select the Create button underneath the Function App pane.
On clicking the button, a page where you enter the details of your project appears as shown below.

Enter the configuration details for this tutorial under the Basics tab before clicking the Review + create button.

Select the Subscription available on your account. Note that your subscription might not be Visual Studio Enterprise Subscription as shown in mine.
For the Resource group, if you have created one before, you could use it. However, if you have not created one or are not familiar with Azure resource groups, create a new one by using the Create new button.

At its core, a resource group is used to group similar Azure services on your Azure subscription together to make managing them easier.

For the Function App name option, enter a globally unique app name. It is important to note that this name will become part of the base URL for your service.
For the Publish option, select Code.
Choose Node.js for the Runtime stack option since this tutorial's function examples were implemented using that language.
Leave the Version option as default.
Select the region that is geographically nearest to you when filling out the Region choice. A region is a collection of physical server data centers. Given that my base of operations is in Lagos, Nigeria, I chose South Africa North.
For the Operating System, select one that conforms with your selection of runtime.
For the Plan option, select Consumption (Serverless). The plan you choose will determine how your app scales and what features are enabled.
At this point, you can then click the Review + create button.

The validation and deployment process usually takes up to three minutes. When the validation and deployment processes are both completed for the Azure Function set up, you can then verify that your Azure function app is running.

Step 2 – Verify that your Azure Function App is Running

Once the deployment process is done, select Go to resource. Your Function App's pane will now appear. To open the app in a browser, select the URL link in the Essentials section.

You will see a standard Azure Functions web app page appear with the message: Your Function 4.0 app is up and running.

Step 3 – Run your Code On-Demand with Azure Functions

With the function app created, the next step is to build and configure the function. To do this, you have to understand what Triggers and Bindings are.

Triggers cause a function to run. A trigger defines how a function is invoked and a function must have exactly one trigger. Triggers have associated data, which is often provided as the payload of the function. Binding to a function is a way of declaratively connecting another resource to the function; bindings may be connected as input bindings, output bindings, or both. Data from bindings is provided to the function as parameters. – Microsoft Learn Documentation

In general, Azure Functions run in response to an event. This event is known as a trigger, while bindings are used to connect other resources to a function. The typical triggers on Azure range from a blob storage trigger that runs a function when a blob is created or updated to a timer that runs a function on a schedule.

To run your code on Azure Functions, you need to first create your function to run your code within the function app using the template.

To do this, click the Functions tab on the menu bar on the left of your function app's home page.

Click the + Create button to create the function.
Select the Azure Queue Storage trigger template. This trigger runs the function when it detects that a message has been added to an Azure storage queue.
Leave everything else as its default value.
Click the Create button to create the function.

Given that the function was created with a template, several other files will be created automatically. These files comprise a source code file and a configuration file.

Click on the Code + Test button on the left pane and open the function.json file by selecting it from the dropdown.
Replace the block of code it shows with the one below.

{
  "bindings": [
    {
      "name": "order",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "myqueue-items",
      "connection": "MY_STORAGE_ACCT_APP_SETTING"
    },
    {
      "name": "$return",
      "type": "table",
      "direction": "out",
      "tableName": "outTable",
      "connection": "MY_TABLE_STORAGE_ACCT_APP_SETTING"
    }
  ]
}

This configuration sets the function's trigger to fire whenever a message is added to the queue named myqueue-items. It also sends the function's return value to the table named outTable.
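
To see how the trigger and output binding are laid out in this file, here is a minimal Python sketch that reads the same JSON and separates the two. The describe_bindings helper is hypothetical, and it assumes that a trigger can be recognized by a type name ending in "Trigger" (as queueTrigger does). It only inspects the configuration and does not run anything on Azure.

```python
import json

# The same configuration shown in the function.json file above.
FUNCTION_JSON = """
{
  "bindings": [
    {
      "name": "order",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "myqueue-items",
      "connection": "MY_STORAGE_ACCT_APP_SETTING"
    },
    {
      "name": "$return",
      "type": "table",
      "direction": "out",
      "tableName": "outTable",
      "connection": "MY_TABLE_STORAGE_ACCT_APP_SETTING"
    }
  ]
}
"""


def describe_bindings(config_text):
    """Return (trigger, outputs) from a function.json document."""
    bindings = json.loads(config_text)["bindings"]
    # Assumption: trigger bindings have a type that ends in "Trigger".
    triggers = [b for b in bindings if b["type"].endswith("Trigger")]
    if len(triggers) != 1:
        raise ValueError("a function must have exactly one trigger")
    outputs = [b for b in bindings if b["direction"] == "out"]
    return triggers[0], outputs


trigger, outputs = describe_bindings(FUNCTION_JSON)
```

The "$return" name on the output binding is what routes the function's return value to the table.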

Save the file, then select Test/Run and run the function, leaving the testing details as their default values.
You will see the output shown below.

The outcome displays automatically on the Output tab, which suggests that the function app works as expected. The output shown above is blank because no business logic has been added to the function yet.

Conclusion

In this tutorial, you have seen that serverless computing is a great option for hosting business logic code in the cloud. You have seen that with serverless offers such as Azure Functions, you can write your business logic in the language of your choice.

Also, it is important to note that serverless computing solutions not only avoid the over-allocation of infrastructure (because they can be created and destroyed on demand), but are also event driven. This means that they run only in response to an event (called a 'trigger'), such as a message being added to a queue or an HTTP request being received.

Finally, I share my writings on Twitter if you enjoyed this article and want to see more.

How to Evaluate Machine Learning Models using TensorBoard with TensorFlow

Salim Oyinlola — Wed, 14 Sep 2022 18:31:32 +0000

A key part of the Machine Learning pipeline is finding a model that best represents your data and will function effectively on future datasets.

By their very nature, Machine Learning models improve iteratively. There is hardly any machine learning model that is trained perfectly on the first try. Usually, several iterations are required.

As you would imagine, these models have to be evaluated to make them better. In other words, a machine learning model needs to be assessed before it can be improved on.

TensorBoard was developed to give machine learning engineers a more in-depth look at the performance of their models.

What is TensorBoard?

TensorBoard's basic functionality is to deliver the metrics and visualizations you need for your Machine Learning workflow. It allows you to monitor loss and accuracy, view and assess error graphs, and perform many other tasks.

TensorBoard uses graph concepts to represent the data flow and model actions whilst allowing you to see the graph topologies and parameters of complex, huge models. It also has a very user-friendly and basic UI.

In this tutorial, you will analyze and evaluate the results of a trained machine learning model. The model will be trained on the MNIST (Modified National Institute of Standards and Technology) database, which contains a large collection of handwritten digits. This dataset is commonly used for training various image processing systems.

Prerequisites

To complete this tutorial, you will need:

Fundamental understanding of the workings of Machine Learning models.
A new Google Colab notebook to run the Python code in your Google Drive. You can set this up by following this tutorial.

Step 1 – How to Set Up TensorBoard

Since TensorBoard comes automatically with TensorFlow, you don't need to install it using pip in this setup. Also, since TensorFlow comes pre-installed when you create a new notebook on Google Colab, TensorBoard comes pre-installed as well. So, when setting TensorBoard up, you only need to import tensorflow.

A blank (new) notebook in dark mode

Load the tensorboard extension using the %load_ext magic in your notebook.
After doing this, import the necessary libraries (that is, tensorflow and datetime) as shown below:

%load_ext tensorboard

import tensorflow as tf
import datetime

At this point, you have loaded the TensorBoard extension and imported the libraries you need. You can now get started.

Step 2 – How to Create and Train the Model

In this tutorial, you will use the MNIST dataset, which includes small 28 x 28-pixel greyscale images of handwritten single digits. The dataset, which is one of the pre-installed datasets offered by Keras, is frequently used to develop Machine Learning models for digit recognition.

Create an instance of the dataset and name it mnist.
Split the data into train sets and test sets. A train set is a subset of the original data that is used to train the machine learning model while a test set is the subset that is used to check the accuracy of the model.
Normalize all the values of your train and test sets. This means scaling the pixel values of each image to the [0, 1] range.
Define a function that will be used to train the machine learning model on your dataset. The Sequential Keras model will be used.

mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

x_train, x_test = x_train/255.0, x_test/255.0

def create_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
  ])

You will use the Sequential Keras model. At its core, it groups a linear stack of layers into tf.keras.Model whilst providing training and inference features on this model.

The .Flatten() layer flattens the input without affecting the batch size. The input shape in this example is 28 x 28 since the images from the dataset are 28×28-pixel grayscale images of handwritten single-digits. The first .Dense() layer is a regular densely connected NN layer.

The activation function used is 'relu' and the dimensionality of its output space is 512. The .Dropout() layer drops some of the input with the fraction of the input units dropped in this tutorial given as 0.2.

Finally, like the first one, the second .Dense() layer is also a regular densely connected NN layer. The activation function we're using is 'softmax' and the dimensionality of its output space is ten.
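
To give a feel for the two activation functions named above, here is a minimal plain-Python sketch of what 'relu' and 'softmax' compute. The helper functions and the input values are written only for illustration. Keras applies the same ideas to whole tensors with its own implementation.

```python
import math


def relu(values):
    """Replace every negative value with zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that are positive and sum to one."""
    largest = max(values)
    # Subtracting the largest score keeps exp() from overflowing.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


hidden = relu([-2.0, 0.0, 3.0])
equal_probs = softmax([1.0, 1.0, 1.0, 1.0])

# With ten outputs, the index of the largest probability is the predicted digit.
scores = [0.1, 0.2, 3.0, 0.0, 0.5, 0.3, 0.1, 0.0, 0.2, 0.4]
digit_probs = softmax(scores)
predicted_digit = max(range(len(digit_probs)), key=digit_probs.__getitem__)
```

This is why the final layer has ten units: each one ends up holding the model's estimated probability for one of the digits 0 to 9.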

Call the function you defined to create the model.
Compile the model with a suitable optimizer, loss function, and metrics.
Using the datetime library you previously imported, place the logs in a timestamped subdirectory to allow easy selection of different training runs.

model = create_model()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

The logs are important because TensorBoard reads from them to display its visualizations, and the timestamp in the directory name identifies each training run.

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

Finally, train (or fit) the machine learning model for three epochs (iterations).

model.fit(x=x_train, 
          y=y_train, 
          epochs=3, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])

Step 3 – How to Evaluate the Model

To start TensorBoard within your notebook, run the code below:

%tensorboard --logdir logs/fit

You can now view the dashboards showing the metrics for the model on tabs at the top and evaluate and improve your machine learning models accordingly.

Step 4 – How to Improve the Model

Since the point of evaluating your Machine Learning models is to gain better insight to improve the algorithm, it is imperative that we enhance our model. With these visuals, you can now see the in-depth performance of the model.

The Scalars dashboard shows how the loss and metrics change with each epoch. You can also use it to observe other scalar values such as training efficiency and learning rate.
As the name implies, the Graphs dashboard is used to visualize your model.

The Graph with the tensorboard

To improve this model, you will adjust the number of epochs from 3 to 6 and see how the model performs.

In general, the number of epochs is the number of iterations over the entire training dataset the machine learning model is trained on.

Intuitively, increasing this number often improves the performance of your machine learning model, although too many epochs can lead to overfitting. To do this, you will run the code as follows:

model.fit(x=x_train, 
          y=y_train, 
          epochs=6, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])

With the change we made, you can then generate another TensorBoard like this:

%tensorboard --logdir logs/fit

From the newly generated visuals, you can see that there is a remarkable improvement in the model's performance.
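
One thing to keep in mind when comparing runs: calling model.fit() again on the same model continues training from the weights it already has, and reusing the same callback writes to the same log directory. If you would rather compare a 3-epoch run and a 6-epoch run side by side, one option is to give each run its own log directory and start each from a fresh create_model() call. The make_log_dir helper below is a hypothetical sketch of the naming part only, following the "logs/fit/" pattern used earlier.

```python
import datetime


def make_log_dir(run_name, root="logs/fit", now=None):
    """Return a log directory path that is unique to one named training run."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return f"{root}/{run_name}-{stamp}"


# A fixed timestamp is used here only to make the example reproducible.
fixed_time = datetime.datetime(2022, 9, 14, 18, 31, 32)
log_dir_3_epochs = make_log_dir("epochs3", now=fixed_time)
log_dir_6_epochs = make_log_dir("epochs6", now=fixed_time)
```

Each path can then be passed as log_dir to its own tf.keras.callbacks.TensorBoard callback, so that both runs appear as separate entries under logs/fit in TensorBoard.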

Conclusion

In this article, you learned how you can use TensorBoard to assess and improve your Machine Learning model's performance.

If at this point you are wondering about the difference between TensorBoard and TensorFlow Model Analysis (TFMA), that is a valid question. After all, both are tools that provide the measurements and visualizations needed during the Machine Learning workflow.

But it is important to note that you use each of these tools in distinct stages of the development process. At its core, TensorBoard is used to analyze the training process itself, while TFMA is concerned with the analysis of the 'finished' trained model.

Finally, I share my writings on Twitter if you enjoyed this article and want to see more.

Thank you for reading :)

How to Read and Write Data using Azure Databricks

Salim Oyinlola — Mon, 12 Sep 2022 18:49:37 +0000

Azure Databricks is a data analytics platform hosted on Microsoft Azure that helps you analyze data using Apache Spark.

Databricks helps you create data apps more quickly. This in turn brings to light valuable insights from your data and helps you create robust Artificial Intelligence solutions.

Azure Databricks also combines the strength of Databricks as an end-to-end Apache Spark platform with the scalability and security of Microsoft's Azure platform.

In this tutorial, you will learn how to get started with the platform in Microsoft Azure and see how to perform data interactions including reading, writing, and analyzing datasets. By the end of this tutorial, you will be able to use Azure Databricks to read multiple file types, both with and without a schema.

Prerequisites

You will need a valid and active Microsoft Azure account.

Free Azure Trial: With this option, you will start with $200 Azure credit and will have 30 days to use it in addition to free services.
Azure for Students: This offer is available for students only. With this option, you will start with $100 Azure credit with no credit card required. You'll get access to popular services for free whilst you have your credit.

How to Create Your Databricks Workspace

You must create an Azure Databricks workspace in your Azure subscription before you can use Azure Databricks. To do this, go to the Azure portal. You will need a valid and active Microsoft Azure account to sign in.

The Microsoft Azure Home Page

Once there, click the Create a resource button.

On the search prompt in the Create a resource page, search for Azure Databricks and select the Azure Databricks option.

The Microsoft Azure page showing the list of popular resources

Open the Azure Databricks tab and create an instance.

The Azure Databricks pane.

Click the blue Create button (arrow pointed at it) to create an instance.

Then enter the project details before clicking the Review + create button.

The Azure Databricks configuration page

It is important to note that the Subscription option shown above will differ from yours. It will depend on the Azure subscription you have available on your account.

Fill the Workspace name field with a globally unique name. Mine is named salim-freeCodeCamp-databricks1.

For the Region option, select the location closest to where you are. A region is a set of physical data centers that host the servers. Since I am based in Lagos, Nigeria, I selected South Africa North.

For the Pricing Tier option, select Standard, which includes Apache Spark with Azure AD.

With all the configurations set, click the Review + create button. The validation process usually takes about two minutes.

With the validation and deployment processes completed for the workspace, launch the workspace using the Launch Workspace button that appears.

The home page for the created instance of Azure databricks - salim-freeCodeCamp-databricks

Click on the button and you will automatically be signed in using Azure Active Directory single sign-on.

Signing into the workspace of the integration of Microsoft Azure and Databricks

The Microsoft Azure Databricks home page will come up in a new tab as shown below:

The Microsoft Azure Databricks home page

With the workspace launched, create a cluster using the Create a cluster option on the left of the page.

After clicking it, if you have created any clusters before, you can pick one and build on it. Otherwise, you will have to create a new cluster using the Create Cluster button.

Set the configurations for the Azure Databricks cluster

To create the cluster, you have to set the configurations. Choose the Single node option, changing from the Multi node default option, and maintain the other options as default.

Click the Create Cluster button at the bottom of the page. Note that this will take a few minutes and that if the dataset is large, you can explore the Multi node option.

With the cluster created, import some ready-to-use notebooks. Using the left taskbar, navigate to Workspace > Users > your_account. Then right-click and select the Import option on the dropdown menu.

The import button will be used to import the dataset to be used

Once you click on the Import button, you will then select the URL option and paste the following URL:

https://github.com/salimcodes/microsoft-learning-paths-databricks-notebooks/blob/master/data-engineering/DBC/03-Reading-and-writing-data-in-Azure-Databricks.dbc

The database folder named 03-Reading-and-writing-data-in-Azure-Databricks.dbc will be used.

You will see the list of files in the 03-Reading-and-writing-data-in-Azure-Databricks.dbc database folder

The image above is what the workspace will look like after importing the file. You have now created a Databricks workspace.

How to Read the Data in CSV Format

Open the file named Reading Data - CSV.

Upon opening the file, you will see the notebook shown below:

You will see that the cluster created earlier has not been attached.

On the top left corner, you will change the dropdown which initially shows Detached to your cluster's name. Mine is named Salim Oyinlola's freeCodeCamp Cluster.

The cluster initially created is now attached to the Python notebook

With your cluster attached, you will then run all the cells one after the other.

Running the first cell of the python notebook will initialize the classroom variables & function, mount the dataset and create user-specific database

At its core, the notebook simply reads the data in CSV format. Then it adds an option that tells the reader that the data contains a header and to use that header to determine the column names.

You can also add an option that tells the reader to infer each column's data types (also known as a schema).
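
To illustrate what these two options mean, here is a minimal sketch that uses only the Python standard library rather than Spark. The function names, the tab delimiter, and the three-column sample are assumptions made for illustration. In PySpark the corresponding reader options are named header and inferSchema, and the notebook's own cells may differ from this sketch.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type name that parses every non-empty value."""
    for caster, type_name in ((int, "int"), (float, "double")):
        try:
            for value in values:
                if value != "":
                    caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def read_with_header(text, delimiter="\t"):
    """Use the first line as column names, then infer a type for each column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    schema = {
        column: infer_type([row[column] for row in rows])
        for column in reader.fieldnames
    }
    return rows, schema


# Made-up tab-separated sample data with a header line.
sample = "name\tvisits\trating\nhome\t120\t4.5\nabout\t30\t3.0\n"
rows, schema = read_with_header(sample)
```

Inferring types requires an extra pass over the data, which is the trade-off to weigh against supplying a schema yourself.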

It is important to note that data can be read in different formats such as JSON (with or without schemas), Parquet, and tables and views. To do this, you can simply run the respective notebooks for each format.

How to Write Data into a Parquet File

Just as there are many ways to read data, there are many ways to write data. But in this notebook, we'll get a quick peek of how to write data back out to Parquet files.

Apache Parquet is a columnar storage file format used by Hadoop ecosystem tools (such as Spark and Hive). The file format is cross-platform and language independent, and it stores data in a column layout using a binary representation.

Parquet files, which effectively store large datasets, have the extension .parquet.
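
The difference between a row layout and a column layout can be sketched in a few lines of plain Python. This is only a conceptual illustration with made-up data and a hypothetical to_columns helper. It does not produce a real Parquet file, which also involves a binary encoding.

```python
def to_columns(rows):
    """Regroup a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Row layout: each record keeps all of its fields together.
rows = [
    {"site": "mobile", "requests": 1595},
    {"site": "desktop", "requests": 2460},
]

# Column layout: each field keeps all of its values together.
columns = to_columns(rows)

# An aggregate over one field now touches only that field's values.
total_requests = sum(columns["requests"])
```

Keeping each column's values together is what lets analytical queries read only the columns they need.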

Like what you did when reading data, you will also run the cells one after the other.

The cell to write data into a parquet file

Integral to writing into the parquet file is creating a DataFrame. You will be creating one by running this cell.

This cell shows that the existing files are being overwritten

The .mode("overwrite") call shown below means that writing the DataFrame to Parquet files replaces any existing files.

The file has been written and saved in an output location.

At its core, the notebook reads a .tsv file (the same one used in the CSV notebook) and writes it back out as a Parquet file.
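
To clarify what the overwrite mode does compared with the other save modes Spark offers (append, ignore, and error, which is the default), here is a minimal plain-Python simulation. The write function and the dictionary that stands in for a storage location are hypothetical. Real Spark writes go through df.write.mode(...) instead.

```python
def write(store, path, data, mode="error"):
    """Simulate saving data to a path under one of four save modes."""
    exists = path in store
    if exists and mode == "error":
        raise FileExistsError(path)           # refuse to touch existing data
    if exists and mode == "ignore":
        return store                          # silently keep the existing data
    if exists and mode == "append":
        store[path] = store[path] + list(data)  # add to the existing data
    else:
        store[path] = list(data)              # first write, or overwrite
    return store


store = {}
write(store, "out.parquet", [1, 2])                    # nothing exists yet
write(store, "out.parquet", [3], mode="overwrite")     # replaces [1, 2]
after_overwrite = list(store["out.parquet"])
write(store, "out.parquet", [4], mode="append")        # extends the data
write(store, "out.parquet", [5], mode="ignore")        # leaves it unchanged
final_contents = store["out.parquet"]
```

Overwrite is convenient when re-running a notebook, but it does discard whatever was previously at the output location.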

How to Delete the Azure Databricks Instance (Optional)

Finally, the Azure resources that you created in this tutorial can incur ongoing costs. To avoid such costs, it is important to delete the resource or resource group that contains all those resources. You can do that by using the Azure portal.

Navigate to the Azure portal.
Navigate to the resource group that contains your Azure Databricks instance.
Select Delete resource group.
Type the name of the resource group in the confirmation text box.
Select Delete.

Conclusion

In this tutorial, you have learned the basics about reading and writing data in Azure Databricks.

You now understand the basics of Azure Databricks, including what it is, how to set it up, how to read CSV files, and how to write Parquet files to the Databricks File System (DBFS) using compression options.

Finally, I share my writings on Twitter if you enjoyed this article and want to see more.

Thank you for reading :)