by Gant Laborde
Perf Machine Learning on Rasp Pi
3 Frameworks for Machine Learning on the Raspberry Pi
The AI revolution is reaching new heights through new mediums. We’re all enjoying new tools on the edge, but what are they? Which products and frameworks will fuel the inventions of tomorrow?
If you’re unfamiliar with why Machine Learning is changing our lives, have a read here.
If you’re already excited about Machine Learning and you’re interested in utilizing it on devices like the Raspberry Pi, enjoy!
Simple object detection on the Raspberry Pi
I’ve implemented three different tools for detection on the Pi camera. While it’s a modern miracle that all three work, it’s important for creators to know “how well” because of #perfmatters.
Our three contenders are as follows:
- Vanilla Raspberry Pi 3 B+: No optimizations, just a TensorFlow framework running on the device for simple recognition.
- Intel’s Neural Compute Stick 2: Intel’s latest USB interface device for neural networks, boasting 8x the performance of the first stick! Around $80 USD.
- Xnor.ai: A proprietary framework that reconfigures your model to run efficiently on smaller hardware. Xnor’s binary logic shrinks 32-bit floats to 1-bit operations, allowing you to optimize deep learning models for simple devices.
Let’s evaluate all three with simple object detection on a camera!
Vanilla Raspberry Pi 3 B+
A Raspberry Pi is like a small, wimpy Linux machine for $40. It lets you run high-level applications and code on small devices, which makes IoT easy. Though it sounds like I can basically run laptop-style machine learning on the device, there’s one big gotcha: the RPi has an ARM processor, which means we’ll need to recompile our framework, i.e. TensorFlow, to get everything running.
⚠️ While this is not hard, this is SLOW. Expect this to take a very… very… long time. This is pretty much the fate of anything compiled on the Raspberry Pi.
Here are all the steps I followed, including setting up the Pi camera for object detection. I’m simply including them for posterity, so feel free to skip them.
Special thanks to Edje Electronics for sharing their wisdom on setup, an indispensable resource for my own setup and code.
Once I got TensorFlow running, I was able to run object recognition (with the provided sample code) on MobileNet at 1 to 3 frames per second.
Vanilla Pi Results
For basic detection, 1 to 3 frames per second isn’t bad. Removing the GUI or lowering the camera input quality speeds up detection, so this setup could be an excellent choice for simple detection tasks. What a great baseline! Let’s see if we can make it better with the tools available.
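The post doesn’t say exactly how each demo computes its frame rate, but a common approach is a rolling average over recent frames. Below is a minimal sketch of that idea, assuming you control your own capture-and-detect loop; `FpsMeter` is a hypothetical helper for illustration, not part of TensorFlow or the Edje Electronics sample code.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames.

    Hypothetical helper for illustration; not part of any library used here.
    """

    def __init__(self, window=30, clock=time.perf_counter):
        self.clock = clock
        # Keep only the timestamps of the most recent `window` frames.
        self.stamps = deque(maxlen=window)

    def tick(self):
        """Record that one frame finished processing; return the current FPS."""
        self.stamps.append(self.clock())
        if len(self.stamps) < 2:
            return 0.0  # Not enough data for an interval yet.
        elapsed = self.stamps[-1] - self.stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Call `meter.tick()` once per loop iteration, after detection and any overlay drawing, so the number reflects the whole pipeline rather than inference alone. The `clock` parameter exists only so the meter can be tested with fake timestamps.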
Intel’s Neural Compute Stick 2
This concept excites me. For those of us without GPUs readily available, running models on the edge instead of the cloud and bringing that kind of speed to the Raspberry Pi is just exciting. I missed the original stick, the “Movidius”, but from this graph, it looks like I chose a great time to buy!
My Intel NCS2 arrived quickly and I enjoyed unboxing actual hardware for accelerating my models. That was probably the last moment I was excited.
First, the stick is bulky and takes up a lot of space around the USB ports. You’ll want an extension cable to keep it away from the board.
That’s a little annoying but fine. The really annoying part was trying to get my NCS 2 working.
There are lots of tutorials for the NCS by third parties, and following them got me to a point where I thought the USB stick might be broken!
Everything I found on the NCS didn’t work (telling me the stick wasn’t plugged in!), and everything I found on NCS2 was pretty confusing. For a while, NCS2 didn’t even work on ARM processors!
After a lot of false trails, I finally found and began compiling C++ examples (sorry, Python) that only understood USB cameras (sorry, PiCam). Compiling the examples was painful. Often the entire Raspberry Pi would become unusable, and I’d have to reboot.
The whole onboarding experience was more painful than recompiling TensorFlow on the raw Pi. Fortunately, I got everything working!
The result!? 🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁🥁
NCS2 Stick Results
6 to 8 frames per second… ARE YOU SERIOUS!? After all that?
It must be a mistake, let me run the
10 frames per second…
From videos of the original NCS running Python, I saw around 10 FPS. So where’s the 8x boost? Where’s the reason for $80 hardware attached to a $40 device? To say I was let down by Intel’s NCS2 is an understatement. The user experience and final results were frustrating, to put it lightly.
Xnor.ai
Xnor.ai is a self-contained software solution for deploying fast and accurate deep learning models to low-cost devices. As discrete logic enthusiasts might have noticed, XNOR is the logical complement of the bitwise XOR operator. If that doesn’t mean anything to you, that’s fine. Just know that the name, from the people who created the YOLO algorithm, alludes to using this inexpensive logical operation, together with bit counting, to compress complex 32-bit floating-point computations down to 1-bit operations.
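To make the 1-bit idea concrete: if weights and activations are restricted to +1 and -1 and packed as bits (1 for +1, 0 for -1), a dot product reduces to an XNOR followed by a bit count. Matching bits contribute +1 and mismatching bits contribute -1, so the result is 2 * popcount(XNOR(a, b)) - n. The sketch below illustrates that XNOR-Net-style trick in plain Python; it is not Xnor.ai’s actual implementation, which isn’t public, and it omits the scaling factors real binarized networks use.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int, one bit per value (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors stored as packed bits."""
    mask = (1 << n) - 1
    # XNOR: a 1 bit wherever the two signs agree.
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # popcount
    # Each agreement contributes +1, each disagreement -1.
    return 2 * matches - n


a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
float_dot = sum(x * y for x, y in zip(a, b))  # ordinary multiply-and-add
bit_dot = binary_dot(pack_signs(a), pack_signs(b), len(a))
print(float_dot, bit_dot)  # prints: 2 2
```

Both paths give the same answer, but the bit version replaces eight multiplications and additions with one XOR, one NOT, one AND, and a bit count, and on real hardware a single machine word can hold 32 or 64 of these signs at once.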
In theory, avoiding the expensive floating-point math that usually calls for a GPU should speed up execution on edge devices. Let’s see if it works!
Setup was insanely easy. I had an object detection demo up and running in 5 minutes. 5 MINUTES!
The trick with Xnor.ai is that, much like the NCS2 Stick, the model is modified and optimized for the underlying hardware fabric. Unlike Intel’s haphazard setup, everything is wrapped in friendly Python (or C) code.
import xnornet
model = xnornet.Model.load_built_in()  # load the model bundled with the installed xnornet package
That’s nice and simple.
But it means nothing if the performance isn’t there. Let’s load their object detection model.
Again, no complexity: they provide one demo with no overlay and one with an overlay. Since the other tests (except for the perfcheck on NCS2) used overlays, let’s use that one.
JAW… DROPPING… PERFORMANCE. I not only get a stat on how fast inference could work, but I also get an overall FPS with my overlay that blew everything else out of the water.
OVER 12 FPS overall and an inference speed over 34 FPS!?
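The gap between those two numbers is easier to see as time per frame. Here’s a back-of-envelope calculation, treating “over 12” and “over 34” as exactly 12 and 34 and assuming capture, inference, and overlay drawing run one after another in a single loop (if the demo pipelines them, the split would differ). Under those assumptions, inference takes about 29 ms of each frame and everything else takes about 54 ms.

```python
def frame_budget_ms(overall_fps, inference_fps):
    """Split each frame's time into inference and everything else, in ms.

    Assumes the stages run sequentially, so their times simply add up.
    """
    total_ms = 1000.0 / overall_fps
    inference_ms = 1000.0 / inference_fps
    return total_ms, inference_ms, total_ms - inference_ms


total, infer, other = frame_budget_ms(overall_fps=12, inference_fps=34)
print(f"total {total:.1f} ms, inference {infer:.1f} ms, other {other:.1f} ms")
# prints: total 83.3 ms, inference 29.4 ms, other 53.9 ms
```

If that split holds, the overlay and camera side of the loop, not the model, is where most of each frame goes.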
This amazing throughput is achieved with no extra hardware purchase!? I’d call Xnor the winner at this point, but it seems a little too obvious.
I was able to heat up my device and open a browser in the background to get it down to 8+ FPS, but even then, it’s a clear winner!
The only negative I can give you on Xnor.ai is that I have no idea how much it costs. The Evaluation model has a limit of 13,500 inferences per startup.
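To put the 13,500-inference limit in perspective, here’s a quick estimate of how long one run lasts before hitting it, assuming one inference per frame at the frame rates measured above.

```python
LIMIT = 13_500  # evaluation-model inferences per startup, from the text above


def minutes_until_limit(fps, limit=LIMIT):
    """Minutes of continuous detection before the evaluation limit is reached."""
    return limit / fps / 60


for fps in (8, 12, 34):
    print(f"{fps:>2} FPS -> {minutes_until_limit(fps):.1f} min")
# prints:
#  8 FPS -> 28.1 min
# 12 FPS -> 18.8 min
# 34 FPS -> 6.6 min
```

That’s plenty for demos and prototyping sessions, though a camera left running all day would need to be restarted.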
When I emailed them about pricing, I learned they are just breaking into non-commercial use, so they haven’t created a pricing system yet. Fortunately, the evaluation model should be fine for most hobbyists and prototypes.
If you need to work with a variety of models, you might be just fine setting up your Raspberry Pi from scratch. That makes it a great platform for testing new models and really customizing your experience.
When you’re ready to ship, there’s no doubt that both the NCS2 and the Xnor.ai frameworks speed things up. There’s also no doubt that Xnor.ai outperformed the NCS2 in both onboarding and performance. I’m not sure what Xnor.ai’s pricing model is, but that will be the final factor in choosing what is clearly a superior framework.
Post Publish Updates:
This is an excellent blog post on setting up the NCS2
Additionally, if you’re looking to play around with Xnor.ai, the link is www.xnor.ai/ai2go
Gant Laborde is Chief Technology Strategist at Infinite Red, a published author, adjunct professor, worldwide public speaker, and mad scientist in training. Clap/follow/tweet or visit him at a conference.
Expect more awesome edge blog posts coming soon!