TensorFlow Lite is an open source deep learning framework that can be used on small devices.

We just published a TensorFlow Lite course on the freeCodeCamp.org YouTube channel.

Bhavesh Bhatt created this course. Bhavesh has created many courses on his own channel and is a great teacher.

TensorFlow Lite is developed by Google and is used to deploy and run machine learning models on mobile, IoT (Internet of Things), and embedded devices.

When you use TensorFlow Lite, the machine learning all happens within the device, which avoids sending data back and forth to a server.

Here are the topics covered in this course:

  • Why do we need TensorFlow Lite?
  • What is Edge Computing?
  • Why is Edge Computing gaining popularity?
  • Challenges in deploying models on Edge devices
  • What is TensorFlow Lite or TFLite?
  • TensorFlow Lite Workflow
  • Creating a TensorFlow or Keras model
  • Converting a TensorFlow or Keras model to TFLite
  • Validating the TFLite model performance
  • What is Quantization?
  • Compressing the TFLite model further
  • Compressing the TFLite model even further
  • Validating the most compressed TFLite model performance

Watch the full course below or on the freeCodeCamp.org YouTube channel (1-hour watch).

Transcript

(autogenerated)

TensorFlow Lite allows you to do machine learning on small devices.

Bhavesh is an experienced instructor, and he will teach you all about TensorFlow Lite in this course.

Hello, everyone.

In this tutorial, you will learn the basics of TensorFlow Lite, and how TensorFlow Lite can help you create really efficient models that you can deploy on edge devices.

So without wasting any further time, let's kick start the tutorial.

Let us kick start today's discussion about TF Lite with a small story.

I have a friend whose name is John.

John really likes traveling to different places.

One of his favorite applications is Google Lens.

Whenever he visits a new country, he takes out his cell phone and takes a photograph of the monument that is in front of him.

And Google essentially tells him which monument it is.

So now, just out of sheer fascination with the tool, John goes forward and creates his own neural network for detecting landmarks.

He goes through a rigorous process of collecting data, labeling data, cleaning data.

And finally he creates a machine learning model that can tell John which monument it is.

So everything looks good, he is able to reach a very high accuracy score as well.

Now the only challenge that he has is where he should deploy the model that he has created.

So technically, he has two options.

The option that he explores first is cloud computing.

He takes his trained model, and he deploys it on the cloud.

He exposes the API that he has created.

And essentially, he creates an Android application that queries the API and fetches the result once he has passed in an image.

Now one of the major challenges that he saw when he created the solution is the network latency.

So the images that are captured by cell phones today generally range anywhere between 3 and 10 MB.

So transporting such huge files takes up a lot of time in the entire process of making a prediction.

The second piece that adds complexity to the entire solution is the cost.

The neural network that John has created requires two resources: a storage resource where he can save the model weights, and a compute resource for making inferences. The overall project becomes a costly affair for John.

What does he do next? He creates an Android application.

And he finds a way to make an inference on Android using this huge model that he has created.

So John now faces a new issue.

The issue is the model is really huge.

And the cell phone is not capable of storing and processing such a big model.

So now John is really confused.

He's tried out different techniques to make this work.

But nothing is solving this issue.

Well, it is here that TF Lite comes into the picture.

Before we jump in and discuss TF Lite, I want to shed some light on what edge devices are.

So edge devices are your normal cell phones that you use.

So if you're planning to create some amazing TensorFlow-based applications, then essentially one of the main platforms that you can utilize is your cell phone, be it an Android cell phone or an iOS-powered cell phone; anything works.

The other hardware devices that you can classify as edge devices are microcontrollers.

So there are some amazing applications that have been built using very small compute power.

And all of it is thanks to microcontrollers.

Given how wearable devices have recently increased their computation power by using more and faster CPUs, you can also put wearable devices such as smartwatches into the edge computing bracket.

Now let me go forward and give you a formal definition of edge computing.

So edge computing is basically the practice of moving compute and storage resources closer to the location at which it is needed.

So that is where your edge devices would come into the picture.

Now, if you recall, John had two options to deploy the model.

The first option was to deploy the model on the cloud.

Now many of you would have the impression that a machine learning model running on a server using a large GPU is much better than running it on the device itself.

Well, the truth is edge devices have become an important platform for machine learning.

Why are they gaining so much popularity that you would want to run your entire machine learning model on edge devices? Well, let me share the details one at a time.

The first and foremost reason why edge computing as a whole is gaining popularity is because of latency.

The use cases that require real time speed, definitely require models to run on the device.

For example, you might be able to reduce the inference latency of ResNet-50 from 30 milliseconds to 20 milliseconds.

But the network latency can go up to seconds.

So essentially, it also depends on where your model is deployed.

Where are you hitting the API from?

So if you take into account all of these factors, then essentially deploying a model on a server would not be the best possible option when you want inferences in near real time, or in exactly real time.

The second reason why creating machine learning models on edge devices is important is network connectivity.

So if you go back to our earlier example, wherein John wanted to create his own version of Google Lens for detecting landmarks: if he happens to visit a country where there is little to no connectivity, it is here that creating a model that sits on the device would be much better than deploying it on a server, because there is that additional dependency on the network that comes into the picture.

The third reason why it's very important for you to create machine learning models that run on edge devices, is user privacy.

Putting your machine learning models on the edge devices is also appealing when you are handling sensitive user data.

Machine learning on the cloud means that your systems might have to send user data over networks, making it susceptible to being intercepted.

Cloud computing also means storing the data of many users in the same place or location, which means a data breach can affect many people at once.

So it becomes really important to create machine learning models that can run on edge devices, which can preserve user privacy.

Let me now go forward and show you some examples of on device machine learning use cases.

The first one that I'm showing you right now is the feature to try out various cosmetics using AR on YouTube.

The entire computation piece that you're seeing here, is essentially happening on the device itself.

Now the second example that I want to show you is something that you might be already aware of, which is Google Translate.

Google Translate has a feature that allows you to capture text with your phone camera, and translate them in real time without any internet connection.

All of this is essentially possible using edge computing, and more specifically, TF Lite.

Now that you've seen the amazing applications that you can create on your end as well, you might be wondering about the second alternative that John had taken initially, which was to create an entire huge TensorFlow model and then run the inferences on the device.

Why did that fail? Well, to answer that question, here are some of the challenges that you might face when you create Keras or TensorFlow models and deploy them directly onto edge devices.

Edge devices, not only mobile phones but your microcontrollers as well, have limited compute power.

Limited memory.

Battery consumption is also a factor that you have to account for, as well as the application size.

If I consider a simple microcontroller as well, the processing power isn't enough that you can run inferences on a three or four GB model.

If you consider the storage capacity of the majority of edge devices, well, then ideally you wouldn't have a lot of storage that you can utilize for just one model.

So these are the challenges that john faced when he took the second approach as well.

So what is the solution? Well, the solution is TensorFlow Lite.

TensorFlow Lite is a production-ready, cross-platform framework for deploying machine learning models on mobile devices and embedded systems.

TensorFlow Lite at this point in time supports Android, iOS, and any IoT device that can run Linux.

So essentially, if you have any of these hardware devices handy with you, then you can quickly create a TensorFlow model, convert it to an equivalent TF Lite model, and start using the amazing TF Lite model.

Now, you might wonder, what exactly is the workflow to create a TF Lite model? Well, let's look at that as well.

So the workflow is fairly simple: you start by creating a TensorFlow/Keras model.

So in the entire process, you would have to collect data, clean data, preprocess data, then create models, iterate over multiple models, and, based on the metric that you're chasing, if you are chasing a higher accuracy score, then you would choose a model that would give you the best possible accuracy.

And that's about it, you have your TensorFlow model ready.

Now from that TensorFlow model, you convert it to a TensorFlow Lite model.

So there is a format change that happens.

I'll talk more about it as we go along.

Once you've converted your model from TensorFlow to TensorFlow Lite, then you go forward and deploy the entire TF Lite model and run your inferences on the edge device.

When I say inferences, I mean predictions.

Okay.

Let me now explain this using a block diagram.

So this essentially is the workflow that I've just mentioned: you start off with your high-level Keras APIs to create a model.

Once you have the model ready, then you can essentially use the TF Lite converter.

And the converter basically takes your saved TensorFlow-format file and converts it into a FlatBuffer file.

I'll give you an idea of what I mean by a FlatBuffer file.

So let's move forward.

So TensorFlow Lite represents your model in the FlatBuffer format.

Now FlatBuffers is an efficient cross-platform serialization library for C++, C#, Go, Java, Kotlin, JavaScript, Python, and so on.

It was originally created at Google for game development and other performance critical applications.

But slowly, Google realized that you can use the FlatBuffer format in deploying models on edge devices.

Now you might have an obvious question: why not use our old, tried-and-tested protocol buffers?

And why shift to something as new as FlatBuffers? Just to give you context, all the Keras models and TensorFlow models that you create are essentially in protocol buffer format.

Protocol buffers are essentially very similar to the FlatBuffer format that Google has created.

The major difference is that FlatBuffers do not need a parsing or unpacking step to a secondary representation before you can access the data.

And the generated code is also larger in the case of protocol buffers.

It is for this reason that, while using TF Lite, we make use of the FlatBuffer format and not the protocol buffer format.

So now let's move forward.

So far, we've looked at various aspects of edge computing, and we've looked at how deploying models on edge devices can be better compared to deploying them on the cloud.

We've also looked at what TensorFlow Lite is, and what all it can support at this point in time.

Now is where things get interesting, when I show you through code how TF Lite can actually compress your model size without compromising on the accuracy piece.

So now let's go forward and witness the magic of TF Lite.

Now that we've understood the basics of TensorFlow Lite and edge computing, let me show you the power of TensorFlow Lite using Python.

So for this example, I'm using Google Colab.

For those of you who don't know, Google Colab is an online environment where you can write Python code, create machine learning and deep learning models, and also make use of the GPUs that Google gives you access to for a good amount of time.

So this is the interface that I'll be using.

I'll be attaching the link to the GitHub repository in the description section of the video; feel free to access the code from there.

Also, inside the GitHub repository, I'll give you a link that can open a Google Colab notebook directly.

With all the groundwork done, let me now go forward and show you the magic of TensorFlow Lite.

So the process that I'll follow in this particular tutorial is: I'll create a deep learning model using TensorFlow/Keras.

I will scale it down to a TF Lite equivalent model.

Once that is accomplished, I will show you the size difference between the original model and the compressed model.

I will show you techniques for how you can keep compressing the model even further without having to compromise on the accuracy piece.

With that, let me create an instance on Google Colab by pressing Connect.

So currently, Google is allocating some space for my computations.

If you're planning to replicate everything that I show in today's video on your local machine, then you will require some installations as well.

Given that I'm working with Google Colab, all the dependencies that I require for this example are already met.

So now let me go through the different things that are required for this entire tutorial.

First things first, I require the os module to essentially read my files.

Next up, I'll import NumPy as np; I will require this particular library for mathematical operations.

I will also require the h5py library.

The h5py library is a Pythonic interface to the HDF5 binary data format.

So technically, whatever models I create in Keras, I will basically save in the h5 format.

Next up, I require matplotlib.

This is again used for visualization.

I require TensorFlow, and I'll import Keras from TensorFlow.

These are some of the layers that we will require when we come to the deep learning model creation aspect.

If I want to calculate how well my model is performing in terms of accuracy score, this is where the sklearn.metrics module will give me the accuracy_score functionality.

And from the sys library, I'll also require the function getsizeof.

So these are some of the things that I require in order to create a TF Lite model.
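As a rough sketch, the imports cell being described might look like this (reconstructed from the narration; the exact cell in the notebook may differ):

```python
# Sketch of the imports cell; exact contents of the notebook may differ.
import os                        # reading file sizes from disk
import numpy as np               # mathematical operations
import h5py                      # Pythonic interface to the HDF5 (.h5) format used to save Keras models
import matplotlib.pyplot as plt  # visualization
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Flatten, Dense   # layers for the model later on
from sklearn.metrics import accuracy_score           # to score the TF Lite predictions
from sys import getsizeof
```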

So now, let me go forward and run the cell.

So when I run the cell, what happens is that Python runs that piece of code.

So let me now go forward and run the cell.

So I don't see any error.

That means all of our imports are in place.

Let me go forward and show you the TensorFlow version that I'm using for this particular tutorial.

I'm currently using TensorFlow 2.6.0.

If there are changes that creep up with respect to the API, feel free to refer to the TensorFlow documentation.

There are two functions that I have created; the first function's name is get_file_size.

Essentially, I'm passing in the file location; using the os library, and specifically the getsize function, I'm able to get the size of a particular file that I pass in, in bytes. So let me now go forward and run this cell.

In the previous function, that is get_file_size, the value returned would be in bytes.

So now, rather than comprehending the values of a file size in bytes, I've created a helper function called convert_bytes, which essentially takes in the input size in bytes and converts it to either KB or MB.

So this is something that I've created.

I've not included GB because this is more of an explainer video wherein I intend to create a smaller model, or rather a proof of concept, rather than a huge model.

So that is why I have restricted my units to KBs or MBs.

So let me go forward and run the cell.
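A sketch of those two helper functions, with the bodies reconstructed from the narration (get_file_size and convert_bytes are the names used in the video; the exact implementations may differ):

```python
def get_file_size(file_path):
    """Return the size of a file in bytes, via os.path.getsize."""
    return os.path.getsize(file_path)

def convert_bytes(size, unit=None):
    """Print a byte count as KB or MB for easier reading."""
    if unit == "KB":
        print('File size:', round(size / 1024, 3), 'Kilobytes')
    elif unit == "MB":
        print('File size:', round(size / (1024 * 1024), 3), 'Megabytes')
    else:
        print('File size:', size, 'bytes')
```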

Now for this particular example, I'll basically be using a very famous dataset in deep learning, the Keras Fashion MNIST dataset. So let me unhide the cell. The Fashion MNIST dataset contains 70,000 grayscale images, which belong to 10 different categories.

So the categories include T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

So these are the different categories that are part of this data set.

The entire activity is a supervised learning task, I will have a set of images, and every image will have a label associated with it.

And I'm trying to train a deep learning model.

So now let me go forward.

So you don't have to worry about the download pieces.

Well, if you have installed TensorFlow correctly, then essentially you just have to call keras.datasets.fashion_mnist and save the entire dataset into a variable called fashion_mnist.

So that is the first step that I've done here.

Once you've done that, then essentially what you have to do next is you have to split your data set into training and testing.

The way you can achieve that is ideally by calling a function called load_data.

So this is what you have in terms of the function.

Once you call this function from fashion_mnist, the variable that you just created, you would be able to split your data into training images, training labels, test images, and test labels.

So that's how simple it is.

So let me go forward and run the cell.

So we've downloaded the data files, we've split our data into training and testing.

I've also created a list variable called class_names, which contains all the names of the classes that are part of this entire activity.

So let me go forward and run the cell.
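A sketch of the data-loading cells just described, with class_names filled in from the standard Fashion MNIST labels:

```python
# Load Fashion MNIST and split it into training and testing sets.
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Human-readable names for the 10 class indices (0-9).
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```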

So if you recollect, we had 70,000 samples in our data set, we've already split that entire data set into training and testing.

So let me go forward and show you how many images are part of the training data set.

So let me run this cell.

So the shape of my training dataset is (60000, 28, 28).

So I have 60,000 images.

Each image has a size of 28 cross 28.

So 28 rows, 28 columns represent each image, and I have 60,000 such images.

Given this is a supervised learning task, I would also require 60,000 labels.

So let me check if the total number of labels in my training data set are 60,000.

So let me run this cell.

So as you can clearly see, I have 60,000 labels as well.

Now given that, I've already mentioned that there are 10 unique classes, let me verify that as well.

So as you can clearly see I have class numbers ranging from zero to nine.

And the mapping is what I've created here, which is contained in the variable class_names.

Let's now go forward and explore that testing data set as well.

So let me quickly unhide this let me show you the total number of images in the testing data set.

So that comes out to be 10,000: 60,000 images for training and 10,000 for testing. Each image is again of the size 28 cross 28.

Similarly, if I look at test_labels, I'll have 10,000 samples.

What I intend to show you next is I want to show you a sample image.

So let me show that to you.

So this is a sample image that is part of this data set.

This is clearly an ankle boot.

The size of the image is 28 cross 28.

So 28 rows 28 columns is what you see here.

Before we go forward and train a neural network, a good practice is to scale the intensity values of the images, which range between 0 and 255, down to the range 0 to 1.

So that is what I have done in this piece of code.

So let me run this.

So now my train images will have values ranging from zero to one, and not zero to 255.
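In code, that scaling step is just a division by 255 (a sketch; the notebook may or may not scale the test images in the same cell):

```python
# Scale pixel intensities from the 0-255 range down to 0-1.
train_images = train_images / 255.0
test_images = test_images / 255.0
```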

So far, we've downloaded the data set, we've split our data set into training and testing.

And we've done some sort of preprocessing as well. Now is the time when we'll create a simple neural network that can classify the images into one of the 10 categories that are there.

So in this piece of code, I'm calling the Sequential class from the Keras library.

I'm passing in the first layer as a Flatten layer.

Now, if you recollect, the images were 28 cross 28.

If I have to pass an image through a dense layer, then I have to basically flatten it first. I am not creating a convolutional neural network; given that the dataset is fairly simple, I will stick to a normal deep neural network.

So the first layer that I add is a Flatten layer, wherein I pass the input shape, which is (28, 28).

The second layer is a Dense layer.

And the activation supplied to this Dense layer is ReLU.

The final layer is again a Dense layer with 10 units, given that I have 10 different classes to classify between.

So that is what I have here.

So let me quickly create an instance of the model.

So let me run this.
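A minimal sketch of the model being described. The narration doesn't state the hidden-layer width; 128 units is an assumption that is consistent with the roughly 100k trainable parameters mentioned shortly, and the softmax output is assumed because the predictions are later described as probability scores:

```python
# Flatten the 28x28 image into a 784-length vector, pass it through one
# hidden Dense layer, and produce one score per class.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),    # hidden width of 128 is an assumption
    keras.layers.Dense(10, activation='softmax')   # 10 output classes
])
```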

Before we go forward and compile the model, I'll show you the structure of the model as well.

So I'll say model.summary().

So this essentially is the model summary.

So for the given architecture, we have close to 100k trainable parameters.

Now the next step is to compile the model.

I'm passing in the optimizer, and I am passing in the sparse categorical cross-entropy loss.

Given that our labels are plain integers rather than one-hot encoded, I'm using the sparse categorical cross-entropy loss instead of the normal categorical cross-entropy loss.

And the metric I am chasing here is accuracy.

So I want to create a model that is fairly accurate.

So let me now go forward and run the cell.

So I've created an instance of the model.

I've also compiled the model. Now is the time when I'll pass in the training images as well as the training labels to train all the trainable parameters.

So let me now call the model.fit function, wherein I'll pass in the train images and train labels, and I'll run the entire exercise for 10 epochs.

So let me run the cell.
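The compile-and-train cells could look roughly like this; the Adam optimizer is an assumption, since the narration doesn't name the optimizer:

```python
# Compile with an optimizer (Adam assumed), sparse categorical cross-entropy
# for integer labels, and accuracy as the metric, then train for 10 epochs.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)
```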

So with every epoch, you can see that the accuracy is increasing.

So we've successfully trained our model and we've reached a training accuracy score of around 91%, which is something that's reasonably good given that I've trained the model for only 10 epochs.

So let's go forward.

And remember one thing: the objective of the video is not to train the most accurate classifier at this point in time, but to show you the power of TF Lite, so that is why I've stopped at 10 epochs.

Now the next thing that I do is create a variable called keras_model_name.

This is something that will be used as a reference, or this will be the baseline model performance that I compare later on with the TF Lite models as well.

So the file name stored in this particular variable is tf_model_fashion_mnist.h5.

So let me quickly run the cell.

Now let me go forward and call the model.save function and pass in the filename that I just created.

So let me run the cell.

So as soon as you run the cell, you would have a file created in your Google Colab session or in your local directory, which is essentially your saved model file.

So let me show that is well.

So this is our saved model file that has been created.

So let me quickly hide the cell again.

Now I'll go forward, and I'll show you the size of this particular file that we've created.

So let me call the two functions that I've created, that is convert_bytes and get_file_size: I pass in the same file name, and I want the file size to be in MB.

So let me run the cell.

So currently, I have a model that occupies 1.2 MB.

So I'll go forward and create a variable called keras_model_size, and save the byte-equivalent size into this particular variable.

So let me run the cell.
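A sketch of the save-and-measure cells; the file name comes from the narration, while the variable casing is a guess:

```python
# Save the trained Keras model in HDF5 (.h5) format, then report its size.
keras_model_name = 'tf_model_fashion_mnist.h5'
model.save(keras_model_name)

convert_bytes(get_file_size(keras_model_name), 'MB')   # about 1.2 MB in the video
keras_model_size = get_file_size(keras_model_name)     # kept in bytes for later comparisons
```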

We know for a fact that the model is performing really well on the training data set.

But the essential litmus test is to check how well the model performs on unseen data, that is, my testing dataset.

So let me go forward and evaluate how good our model's performance is on the testing dataset.

So I call the function model.evaluate and pass in the test images and test labels.

And I save the results into two variables called test_loss and test_accuracy.

So let me quickly run the cell.
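The evaluation cell, roughly:

```python
# Check the baseline performance on unseen data.
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_accuracy)   # around 88% in the video
```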

So as you can clearly see, the loss is at a very small value, which is around 0.37.

And I've reached a testing accuracy score of around 88%.

So we've completed the first part; now it's time to move on to the next part, that is, creating a TF Lite equivalent of the same model.

So let's go forward.

So I start the activity by creating a variable called tf_lite_model_file_name.

And I pass in an equivalent name for this particular TF Lite model, which in our case is currently tf_lite_model.tflite.

So let me quickly run the cell.

Now the process of converting a TensorFlow model or a Keras model into a TF Lite model essentially requires just a couple of steps.

So this is what I'll highlight right now.

So the first step is to call tf.lite.TFLiteConverter.from_keras_model, and I pass in the model that I created.

So if you remember, the name of the model variable was simply model, so that is what I'm passing in here in the first line.

Once I've created an instance of the TF Lite converter from the Keras model, I save the entire piece into a variable called tf_lite_converter.

And finally, what I do next is call the convert function.

Once the conversion happens, I want the result to be saved into a variable called tflite_model.

So let me quickly run the cell.

So if you look at the output, it says that assets are written into a particular temporary file.

So from that temporary file, I basically have to retrieve the model weights and save them into a TF Lite equivalent file.

So that is what I've done using this piece of code.

I've created the first variable, which is tflite_model_name.

And I'm passing in the initial name that I've created in the first line of this particular section.

I open the file name with write access, and I write this particular temporary file into this file that I've created.

So that's how simple it is.

So let me quickly run this.
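A sketch of the conversion cells just described; variable names are reconstructed from the narration:

```python
# Convert the Keras model to a TF Lite FlatBuffer and write it to disk.
tf_lite_model_file_name = 'tf_lite_model.tflite'

tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = tf_lite_converter.convert()

# write() returns the number of bytes written, which is the output shown below.
open(tf_lite_model_file_name, 'wb').write(tflite_model)
```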

So there is a particular output that is displayed.

This tells me the total number of bytes that have been written to this particular file.

Now let me go forward and show you the exact size of this TF Lite model in kilobytes.

So let me run this.

So the overall file size is close to 400 kilobytes.

So we started off with a model which was occupying around 1.2 MB.

And after just running a couple of lines of code, we have brought down the file size to around 400 KB.

Now, let me go forward and save this file size into a variable called tf_lite_file_size.

This is something that will make a lot of sense once we go forward.

So let me quickly run this cell.

Now we've already converted a model from Keras to TF Lite.

But one thing that we've not validated currently is how good the model is in terms of performance.

Is it actually good on unseen data? Or has it dropped in terms of the accuracy score?

So that is what I want to check next: after compressing the model using TF Lite, are we losing out on accuracy or not?

So in this section, I'll go over how you can validate how well your TF Lite model is performing.

So now, let me quickly unhide the cell.

Now, don't get scared by looking at this piece of code, I'll help you understand what I'm trying to achieve.

Now, loading a TensorFlow or a Keras model into a TensorFlow session is fairly easy.

But here, what we have done is created a TF Lite model.

If you go back to the discussion that we had, TF Lite models are essentially FlatBuffer-format files and not your usual protocol buffer files.

So in order for us to actually make an inference from TF Lite files in our TensorFlow or Python session, we require something called an interpreter.

So it is here that we will basically be making use of TensorFlow's Interpreter to load the TF Lite file, and then make inferences or predictions.

So let me now take you through each and every line of code.

So in the first line, I create an instance of the Interpreter class, and I pass in the TF Lite model name that we've just created.

So if you look at this particular section, you will also have a TF lite file.

This is what I'm passing in here.

Now once we've created the interpreter object, the interpreter object saves details about the model.

It will have details about the input that it expects, that is, the shape and the type of values it expects.

In terms of the output, it will tell you the output shape as well as the output type, that is, what values it will predict and what the type of those values is.

So all of that is what this particular interpreter will actually have details about.

The details being fetched come again from the interpreter object that we created when we passed in the TF Lite file.

So all of the details would be captured in this particular TF lite file, which is what is read by this interpreter object.

And that is what we are trying to accumulate from input_details and output_details.

Once we have the input and output details, I am also interested in the shape of the input that is expected and the type of the inputs that are there.

So let me quickly run this cell to make more sense.
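A sketch of the interpreter setup being described:

```python
# Load the .tflite file into the TF Lite Interpreter and inspect its expected I/O.
interpreter = tf.lite.Interpreter(model_path=tf_lite_model_file_name)

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print('Input shape:', input_details[0]['shape'])     # e.g. [ 1 28 28]
print('Input type:', input_details[0]['dtype'])      # numpy.float32
print('Output shape:', output_details[0]['shape'])   # e.g. [ 1 10]
print('Output type:', output_details[0]['dtype'])    # numpy.float32
```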

So if you look closely, the input shape is (1, 28, 28).

The input type that it expects is numpy.float32; the output shape is (1, 10), that is, it basically has one row and 10 columns; and the output type is again numpy.float32.

So this is essentially what the TF lite file contains.

Now if you look at this particular one, it denotes that the TF Lite model is expecting one input at a time.

Now I want to check how good it is performing for 10,000 inputs.

That is where I'll have to reshape the input shape to a particular value, which is what I'll be achieving in this piece of code.

Just to reiterate, the input shape is (1, 28, 28).

So ideally, I have to pass in just one image sample.

And essentially, I would get a corresponding output for it.

But essentially, in my use case, I want to validate how well the TF Lite model performs on the testing dataset that I have, which contains 10,000 images.

So if this idea is clear to you, let's go forward.

So now I want to validate how good my TF light model is performing on my testing data set.

So I call the resize_tensor_input function.

I pass in the index of the tensor that I want to resize, and I pass in how I have to resize it.

So currently, I have 10,000 samples.

So that is what I have entered here, that is, (10000, 28, 28).

Similar resize operation is what I'm doing at the output side.

So you can see here (10000, 10), from the initial (1, 10).

So that is what I've done here.

Now, once the resize operation has happened, I want to call allocate_tensors to actually change the entire structure of the interpreter.

This is what it has read using the TF Lite file.

And now when I print the input details and output details, I should be able to see that the TF Lite input and output shapes have changed.

So let me quickly run this piece of code.
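A sketch of the resize step. The narration mentions a matching resize on the output side, but in current TensorFlow versions resizing the input and re-allocating is enough for the output shape to follow, so only the input resize is shown here:

```python
# Resize the input tensor so the interpreter can take all 10,000 test images
# at once, then re-allocate; the output batch dimension follows automatically.
interpreter.resize_tensor_input(input_details[0]['index'], (10000, 28, 28))
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print('Input shape:', input_details[0]['shape'])    # now [10000 28 28]
print('Output shape:', output_details[0]['shape'])  # now [10000 10]
```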

So as you can clearly see, the input shape has changed from (1, 28, 28) to (10000, 28, 28).

So this essentially will help me validate how well my TF Lite model is performing.

The other thing that I want to highlight right now is that test_images.dtype is float64.

So if you look, the input type that the model expects is numpy.float32.

So now the only other change that I have to make in order to validate my TF Lite model is to create a new array called test_images_numpy.

I pass in the original array and change the dtype.

I can do it in the same array as well.

But I'm essentially choosing to create two different arrays.

So let me quickly run the cell.

So now if I show you the dtype of test_images_numpy, it will be numpy.float32.

Now that we have the entire interpreter object set up correctly for our set of inputs, that is the testing data set.

All I have to do right now is first call the set_tensor function, passing in the test_images_numpy array that I just created, and then call the invoke function.

What the invoke function would essentially do is pass in the inputs, get the output.

And once you have the output ready, you call the get_tensor function, which will have the output ready for you, and save it into a variable called tf_lite_model_predictions.

So let me quickly run the cell.

Now the output that you see here, which is the shape of the prediction results, is 10,000 rows and 10 columns.

So every column would essentially contain a probability score.

So what I have to do next is pick out the index between zero and nine that has the maximum probability, which is what I've done using the function np.argmax.

So this will help me get numbers directly that is zero to nine rather than having 10 different columns with probability scores.

Now let me calculate the accuracy score.

And let me print it out for you.
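Putting those steps together, the inference-and-scoring cells might look like this:

```python
# The interpreter expects float32, but the scaled test images are float64,
# so cast them first, then run the whole batch through the interpreter.
test_images_numpy = np.array(test_images, dtype=np.float32)

interpreter.set_tensor(input_details[0]['index'], test_images_numpy)
interpreter.invoke()
tf_lite_model_predictions = interpreter.get_tensor(output_details[0]['index'])

# Each row holds 10 scores; argmax picks the predicted class index (0-9).
predicted_classes = np.argmax(tf_lite_model_predictions, axis=1)
print('TF Lite test accuracy:', accuracy_score(test_labels, predicted_classes))
```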

So the testing accuracy of the TF Lite model is exactly the same as what you see when you compare it with your normal Keras model.

Now, how much space have you saved in this entire process? Well, let me calculate a ratio between the TF Lite file size and the Keras model file size.

So overall, the TF Lite model occupies close to 32% of the file size that my normal Keras model occupies.

But the uniqueness is that I'm not losing out on any accuracy.

So this is the power of TF Lite.

I've been able to compress my entire model from 1.2 MB to around 400 KB.

And I haven't compromised a bit on the accuracy piece.

Well, isn't this amazing? If you think the story has ended here, hang on for a second; there is more to come.

So far, what we've done is I've taken a TensorFlow model.

And without any optimization, I've basically converted that into an equivalent TF Lite model.

Now I'll show you how you can compress your model even further.

without losing out on accuracy as such.

So now let me introduce you to a new concept called quantization.

So what exactly is this term that I've just mentioned, that is quantization.

So for a given weight value that can be represented in float32 or float64 format, wouldn't it be great if we could bring down the size of those particular values and see very little change in accuracy? Well, this essentially is the concept of quantization: I'm reducing the total number of bits for every weight value, so that the overall size of the entire array reduces.

Just to be more clear, if I have a neural network something like this, where this particular weight value is 5.31345, this particular weight value is 3.8958.

And you have the other weight values as well. What if I can change these representations that occupy so many bits to something like this? There will be a small hit in the accuracy.

But overall, I'll be able to compress my model even further.

How, when, or whatever other questions you have in mind, just wait for some time.

If this entire idea is clear to you, let us go back to the coding section.

And I'll show you how you can compress your TF Lite model even further.

So by default, in the previous example, wherein we took a Keras model and converted it to a TF Lite model, every weight value is essentially in float32 format.

Wouldn't it be great if I could compress it from float32 to float16?

This is the activity that I'll be performing next.

So I create a variable named tf_lite_model_float_16_file_name.

And I basically give it a .tflite name, which essentially represents that the weights inside it would be float16.

So that is what I have here.

So let me quickly run the cell.

If you look at the previous section in terms of how we created a TF Lite model, the first line of code is something that is pretty much familiar to you: you pass in your Keras model, you call TFLiteConverter.from_keras_model, and you save it into a variable.

Even the last piece of code is also something that you've already looked at.

What you haven't seen so far is the optimization.

So when you create an instance of the TF Lite converter, there is a flag called optimizations.

So you have the optimizations flag here.

I set it to tf.lite.Optimize.DEFAULT, so I want the default optimizations to take place.

And there is one other thing that I do here.

So I'll speak more about the optimizations in the next section.

So hold on to that thought as well.

Now here there is one more flag, target_spec with supported_types.

Here is where I set every weight value from float32 to float16.

So that is what I'm doing here.

So let me quickly run this piece of code.

So now I have a TF Lite model wherein every weight value would be in float16 format. I follow the same process again, wherein I'm fetching data from the temporary file and saving it into a TF Lite model.

So let me run this.
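A sketch of the float16 conversion being described; the file name is a guess:

```python
# Convert again, this time with default optimizations and float16 weights.
tf_lite_model_float_16_file_name = 'tf_lite_float_16_model.tflite'

tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tf_lite_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tf_lite_converter.target_spec.supported_types = [tf.float16]
tflite_float_16_model = tf_lite_converter.convert()

open(tf_lite_model_float_16_file_name, 'wb').write(tflite_float_16_model)
convert_bytes(get_file_size(tf_lite_model_float_16_file_name), 'KB')  # roughly 200 KB in the video
```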

I don't know if you've guessed it already or not.

This essentially is the file size in bytes for the newly converted TF Lite model.

So if I now show you the size of this newly converted TF Lite model, then the size has drastically reduced from 400 kilobytes to 200 kilobytes.

The only thing that I changed here was the individual representation of every weight value; that's about it.

Isn't this amazing? I'm able to save so much memory just by changing a few values here and there.

I again save the file size into a variable called tf_lite_float_16_file_size.

So let me run this.

Now if I compare it to the original Keras model, this particular model occupies 16% of the size that the original model occupied.

And if I compare it to the previously created version, then I can see almost 50% compression that I'm able to achieve by changing the weights from float32 to float16.

So this is the power of optimizations in TF Lite.

If you think this is it, wait for the next section, wherein I compress the model even further.

I'm not showing you the accuracy piece right now; you might be wondering, why isn't he showing us the accuracy? Has accuracy taken a hit? Well, the answer is no.

I'll show you the accuracy of an even more compressed model.

So that will give you a fair sense in terms of how much compression is changing the overall accuracy values as well.

Okay.

So now we have reached the final section wherein I will compress the model even further.

Okay, so here I've created a variable called tf_lite_size_quant_model_file_name.

And here I want to save the TF Lite file with this particular name.

So I'll quickly run the cell.

In the previous example, I changed every weight value from float32 to float16.

Rather than you deciding what is good for your model, I would rather let TF light decide that for me.

So if you've been following along so far, then this piece of code is something that we've already covered.

This piece of code is also something that we've already covered.

This is something that is unique.

So here, I set the optimizations flag.

And here I just mention optimize for size; there are different values that you can go through in the documentation.

So based on your needs, you can optimize for size, or use the other optimizations that are also available for TF Lite.

So I'm not specifying what data type I want; I just want the most optimized version, wherein the size occupied by this particular TF Lite model is the most compressed version.

Okay, so I'll quickly run the cell.
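A sketch of this final, most-compressed conversion. Note that tf.lite.Optimize.OPTIMIZE_FOR_SIZE is what the narration refers to; in newer TensorFlow releases it is an alias for Optimize.DEFAULT, which applies dynamic-range quantization:

```python
# Let the converter pick the representation: optimize purely for size
# (dynamic-range quantization of the weights).
tf_lite_size_quant_model_file_name = 'tf_lite_size_quant_model.tflite'

tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tf_lite_converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_size_quant_model = tf_lite_converter.convert()

open(tf_lite_size_quant_model_file_name, 'wb').write(tflite_size_quant_model)
convert_bytes(get_file_size(tf_lite_size_quant_model_file_name), 'KB')  # roughly 100 KB in the video
```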

So I catch the file that is saved into a temporary location, and I save it into this particular variable that I created.

And if you've guessed by now, this is the new file size in bytes.

If I go to the kilobyte section, then my file occupies around 100 KB.

So if you remember, we started from 1.2 MB, and we have brought down the file size of a deep neural network to around 100 kilobytes.

Isn't this amazing?

Just to give you some numbers again: if I compare this particular file size with my original file size, then my current file is almost 8% the size of my original file, that is, the Keras model that I created.

If I compare it to the previous model as well, I'm basically able to achieve 50% compression, all because of optimizing for size.

This essentially is the power of TF Lite.

Now I'm really happy with the compression: I have a 1.2 MB file that I've compressed down to around 100 KB.

But is the accuracy still the same? Well, we'll again follow the same process, wherein I load the newly quantized model into the interpreter object.

I'll get details of those objects, and I'll again reshape the values and pass the testing dataset entirely through the interpreter object.

So I'll quickly run the cell.

So as you can clearly see, (1, 28, 28) is the input shape that the interpreter object expects, and I have the output as (1, 10).

The input and output values that are expected are numpy.float32.

So that is all good as well.

Coming to the final section, I follow the same process.

Again, no change in the process.

I have a testing data set of 10,000 images, which is what I pass in here.

I allocate tensors, I get the details, and I'll show the details to you as well.

So let me quickly run this: (10000, 28, 28), from (1, 28, 28).

So we've resized the tensor input values that we are expecting to validate our testing data set.

You essentially don't need this step again, but I've copied it from the initial part.

So I'm running it again.

I pass in the values again.

Now I calculate the accuracy score.

Now is the litmus test for this highly compressed TF light model.

So let me quickly run the cell.

So the accuracy of a TF Lite model that occupies almost 8% of the size of the original Keras model is equivalent to the original Keras model.

If I go up, I'm recording this video in one go.

So if I go up, where did this value go? Here the value was at 87.66%.

Here it is at 87.59%.

So this is what you can achieve using TF Lite.

I started off with a very simple neural network, and the entire model occupied around 1.2 MB.

You might also argue that 1.2 MB is kind of small.

But the problem statement was fairly simple.

If you have like a really complex example, wherein you have to classify images into 1000 or 10,000 categories, the model size would eventually increase.

So the objective then becomes can we compress the model size? And the answer is yes, TF lite will help you compress the model size without having to compromise any bit on the accuracy front.

If you've reached this point, then I'm assuming you've seen the entire video, or you've randomly landed here; whichever way you've reached this point.

I hope you enjoyed today's video, I keep creating such amazing videos on data science, machine learning, and Python.

So feel free to check out my channel in the description section of the video as well.

Thank you so much for watching this video.