by Dan Ruta
How you can use AI, AR, and WebGL shaders to assist the visually impaired
Today, about 4% of the world’s population is visually impaired. Tasks such as navigating across a room or walking down a street pose real dangers that they face every day. Current technology-based solutions are either too inaccessible or too difficult to use.
As part of a university assignment, we (myself, Louis, and Tom) devised and implemented a new solution. We used configurable WebGL shaders to augment a video feed of a user’s surroundings in real time, and rendered the output in an AR/VR format with effects such as edge detection and color adjustments. Later, we added color blindness simulation for designers to use, along with some AI experiments.
We did a more in-depth literature review in our original research paper. ACM published a shorter, two-page version here. This article focuses more on the technologies used, as well as some further uses and experiments, such as AI integration.
A popular approach we found in our study of existing solutions was the use of edge detection to detect obstacles in the environment. Most of these solutions fell short in terms of usability, or of hardware accessibility and portability.
The most intuitive approach we could think of as feedback to the user was through the use of a VR headset. While this meant that the system would not be of help to very severely visually impaired people, it would be a much more intuitive system for those with partial sight, especially for those with blurry vision.
Feature detection, such as edge detection, is best done using 2D convolutions, which are also used in deep learning (convolutional neural networks). Simply put, a convolution takes the dot product of a grid of image data (pixels) with the weights in a kernel/filter, repeated at every pixel position. In edge detection, the output is higher (more white) when the pixel values line up with the filter values, representing an edge.
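To make the dot-product idea concrete, here is a minimal CPU-side sketch in TypeScript, using the standard 3x3 Sobel kernels as the example filter. It is an illustration only, not the project's shader code: the function names, the grayscale `Grid` type, and the choice to clamp at the image borders are all assumptions.

```typescript
// Grayscale image as rows of intensity values in [0, 1] (assumed representation).
type Grid = number[][];

// Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
const SOBEL_X: Grid = [
  [-1, 0, 1],
  [-2, 0, 2],
  [-1, 0, 1],
];
const SOBEL_Y: Grid = [
  [-1, -2, -1],
  [0, 0, 0],
  [1, 2, 1],
];

// Dot product of the kernel with the neighbourhood centred on (row, col).
// The kernel is applied without flipping, which only changes the sign of the
// Sobel response. Out-of-bounds reads are clamped to the nearest border pixel.
function applyKernel(image: Grid, kernel: Grid, row: number, col: number): number {
  const half = Math.floor(kernel.length / 2);
  let sum = 0;
  for (let i = 0; i < kernel.length; i++) {
    for (let j = 0; j < kernel[i].length; j++) {
      const r = Math.min(Math.max(row + i - half, 0), image.length - 1);
      const c = Math.min(Math.max(col + j - half, 0), image[0].length - 1);
      sum += image[r][c] * kernel[i][j];
    }
  }
  return sum;
}

// Edge strength per pixel: magnitude of the horizontal and vertical responses.
function sobelEdges(image: Grid): Grid {
  return image.map((rowValues, row) =>
    rowValues.map((_, col) => {
      const gx = applyKernel(image, SOBEL_X, row, col);
      const gy = applyKernel(image, SOBEL_Y, row, col);
      return Math.sqrt(gx * gx + gy * gy);
    })
  );
}
```

Flat regions produce zero, while a pixel sitting on a dark-to-light boundary produces a large value, which is what shows up as a white line in the output.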
There are a few available options for edge detection filters. The ones we included as configurations are Frei-Chen, and the 3x3 and 5x5 variants of Sobel. They all achieve the same goal, but with slight differences. For example, the 3x3 Sobel filter was sharper than the 5x5 filter, but picked up more noise from textures such as fabric.
The web platform
The primary reason we chose the web as a platform was its wide availability and its compatibility with almost all mobile devices. It is also easier to access than a native app. However, this came with a trade-off, mostly in the form of set-up steps that a user would need to take:
- Ensure network connectivity
- Navigate to the web page
- Turn the device to landscape mode
- Configure the effect
- Enable VR mode
- Activate full screen mode (by tapping the screen)
- Slot the phone into a VR headset
To avoid confusing a non-technical user, we built the website as a PWA (progressive web app), allowing the user to save it to their Android home screen. This ensures that the app always starts on the correct page, is forced into landscape mode, always runs full screen, and does not rely on a network connection.
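As a rough sketch of how a PWA can pin down the start page, orientation, and full screen mode, the web app manifest could look something like the object below, written here in TypeScript and serialized to JSON. The `start_url`, icon path, and the `WebAppManifest` interface are hypothetical placeholders rather than the project's actual values. Offline use additionally needs a service worker that caches the page's assets, which is not shown.

```typescript
// Subset of the web app manifest members relevant here (assumed shape).
interface WebAppManifest {
  name: string;
  short_name: string;
  start_url: string;
  display: "fullscreen" | "standalone" | "minimal-ui" | "browser";
  orientation: "landscape" | "portrait" | "any";
  icons: { src: string; sizes: string; type: string }[];
}

const manifest: WebAppManifest = {
  name: "WebSight",
  short_name: "WebSight",
  start_url: "/index.html", // hypothetical: the page the app should always open on
  display: "fullscreen", // hide the browser UI when launched from the home screen
  orientation: "landscape", // lock to landscape for use in a headset
  icons: [{ src: "/icon-192.png", sizes: "192x192", type: "image/png" }], // placeholder icon
};

// The JSON text that would be served as the manifest file.
const manifestJson: string = JSON.stringify(manifest, null, 2);
```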
For the image processing itself, we turned to WebGL shaders. Shaders are awesome because a small piece of code (the shader) is run in parallel across all the pixels of a texture (the video feed). To maintain high performance while keeping a high level of customization, the shader code had to be spliced together and re-compiled at run-time whenever the configuration changed. With this, we managed to stay within the 16.7ms frame budget needed for 60fps.
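The splicing idea can be sketched as a function that assembles GLSL source from a configuration object. This is a guess at the general shape, not the project's code: `EffectConfig`, the uniform names (`uFrame`, `uTexel`), and the snippet contents are all made up for illustration.

```typescript
// Hypothetical configuration shape; the project's real options are not shown in the article.
interface EffectConfig {
  edgeDetection: boolean;
  invert: boolean;
  intensity: number; // multiplier applied to the edge strength
}

// Shared declarations for a GLSL ES 1.0 fragment shader.
const SHADER_HEADER = `
precision mediump float;
uniform sampler2D uFrame; // current video frame
uniform vec2 uTexel;      // size of one pixel in texture coordinates
varying vec2 vUv;

float luma(vec2 offset) {
  vec3 rgb = texture2D(uFrame, vUv + offset * uTexel).rgb;
  return dot(rgb, vec3(0.299, 0.587, 0.114));
}
`;

// 3x3 Sobel edge detection; INTENSITY is substituted when the shader is built.
const SOBEL_SNIPPET = `
  float gx = luma(vec2(1.0, -1.0)) + 2.0 * luma(vec2(1.0, 0.0)) + luma(vec2(1.0, 1.0))
           - luma(vec2(-1.0, -1.0)) - 2.0 * luma(vec2(-1.0, 0.0)) - luma(vec2(-1.0, 1.0));
  float gy = luma(vec2(-1.0, 1.0)) + 2.0 * luma(vec2(0.0, 1.0)) + luma(vec2(1.0, 1.0))
           - luma(vec2(-1.0, -1.0)) - 2.0 * luma(vec2(0.0, -1.0)) - luma(vec2(1.0, -1.0));
  color = vec3(length(vec2(gx, gy)) * INTENSITY);
`;

// Splice together only the snippets the current configuration needs.
function buildFragmentShader(config: EffectConfig): string {
  const parts: string[] = [
    SHADER_HEADER,
    "void main() {",
    "  vec3 color = texture2D(uFrame, vUv).rgb;",
  ];
  if (config.edgeDetection) {
    // Bake the intensity in as a float literal (GLSL ES 1.0 has no implicit int-to-float conversion).
    parts.push(SOBEL_SNIPPET.replace("INTENSITY", config.intensity.toFixed(2)));
  }
  if (config.invert) {
    parts.push("  color = 1.0 - color;");
  }
  parts.push("  gl_FragColor = vec4(color, 1.0);", "}");
  return parts.join("\n");
}
```

Whenever a setting changes, the new source string would be handed to `gl.shaderSource` and `gl.compileShader` and the program re-linked. Disabled effects cost nothing per frame, because their code is simply absent from the compiled shader.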
We carried out some user testing. We tested some basic tasks like navigation, and collected some qualitative feedback. This included adjustments to the UI, a suggestion to add an option to configure the colors of the edges and surfaces, and a remark that the field of view (FoV) was too low.
Both software improvement suggestions were applied. The FoV was not something which could have been fixed through software, due to camera hardware limitations. However, we managed to find a solution for this in the form of cheaply available phone-camera fish-eye lenses. The lenses expanded the FoV optically, instead of digitally.
Other than that, the system surpassed initial expectations, but fell short on reading text. This was because edge detection produced two sets of edges for each character. Low-light performance was also usable, despite the introduction of more noise.
Some other configurations we included were the radius of the effect, its intensity, and color inversion.
Other use cases
One idea we had was to add shader effects that simulate various types of color blindness, giving designers an easy way to detect color-blindness-related accessibility issues in their products, be they software or otherwise.
Using RGB ratio values found here, and with edge detection turned off, we were able to add basic simulations of all major types of color blindness through extra, toggleable components in the shaders.
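A ratio-based simulation boils down to mixing each output channel from the three input channels, in other words a 3x3 matrix multiplied by the RGB vector. The TypeScript sketch below shows the arithmetic on the CPU. The protanopia values are a widely circulated approximation used purely as an assumption here, and are not necessarily the ratios the project used. The type and function names are also illustrative.

```typescript
type Rgb = [number, number, number];
type Matrix3 = [Rgb, Rgb, Rgb];

// Assumed example ratios for protanopia (a commonly circulated approximation).
// Each row says how much of the input R, G, B goes into one output channel.
const PROTANOPIA: Matrix3 = [
  [0.567, 0.433, 0.0],
  [0.558, 0.442, 0.0],
  [0.0, 0.242, 0.758],
];

// Multiply the ratio matrix by an RGB color with channels in [0, 1].
function simulateColorBlindness(color: Rgb, ratios: Matrix3): Rgb {
  const mix = (row: Rgb): number =>
    row[0] * color[0] + row[1] * color[1] + row[2] * color[2];
  return [mix(ratios[0]), mix(ratios[1]), mix(ratios[2])];
}

// Pure red loses most of its distinction from green under these ratios.
const simulatedRed: Rgb = simulateColorBlindness([1, 0, 0], PROTANOPIA);
```

Because each row sums to 1, greys pass through unchanged. In a fragment shader the same operation would be a single `mat3` multiplication per pixel, which is why it can be toggled on cheaply.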
AI and future work
Although it’s an experiment still in its very early stages, higher-level object detection can be done using TensorFlow.js and tfjs-yolo-tiny, a TensorFlow.js port of Tiny YOLO, a smaller and faster version of the YOLO object detection model.
The next step is to get instance segmentation working in a browser, with something similar to Mask R-CNN (though it may need to be smaller, like Tiny YOLO), and add it to WebSight to highlight items with a color mask instead of boxes with labels.
Of course, I had some extra fun with this as well. Being able to edit what you can see around you in real time opens up a world of opportunities.
For example, using a Matrix shader, you can feel like The One.
Or maybe you just enjoy watching the world burn.
You can tweet more shader ideas at me here: @DanRuta