Machine vision technology continues to advance rapidly, performing object recognition faster and more accurately and making a growing range of vision-based applications a reality. This has been demonstrated most visibly in navigation systems (such as Tesla’s Autopilot 2.0), but also in other automated processes, from agricultural harvesting to cashier-less stores such as Amazon Go. As cameras take on more of the work, systems can often reduce their reliance on costly radar, LiDAR, and ultrasonic sensors. However, cameras are subject to the same inherent visual obstacles that we humans are: rain, dust, dirt, fog, snow, and frost all challenge our own vision as we navigate the world, and camera-based systems are no different. While missing an apple in an automated harvester because of dust on a camera lens is bad for business, failing to see a pedestrian or a stop sign can have dangerous, even deadly consequences.
Clarity Is Needed, but Tends to Add Unwanted Complexity
With this increased reliance on cameras, particularly in the autonomous driving market, image obfuscation has become a serious problem that needs to be solved. And although this application of machine vision is on the bleeding edge of modern technology, the solutions being considered are often decidedly unsophisticated. From wipers to washer nozzles to high-frequency vibration generators, there is a multitude of potential clearing mechanisms. While many of these are effective, most have basic drawbacks. They add cost and weight. They take up valuable packaging space. They can be unreliable, and they are yet another subsystem that may need to be serviced or replaced. They are also fundamentally physical in nature, tending to fix the problem from the outside in. This leads us to look at the problem from a perspective useful in many potential AI applications: the original human intelligence perspective.
Like a vision-based navigation system, our eyes often encounter obstructions that prevent us from seeing and processing a flawless series of moving images. While there are certainly instances where we remove the worst of the hindrances using goggles, glasses, or simply by cycling our eyelids to clean our lenses and start anew, more often than not our brain connects the dots and fills in the blanks based on what we know we should see. And we can discern what that is so accurately because our eyes have been trained for years on reality. That is, we use human intelligence that has been trained on real-life datasets to make judgments about what we see and react accordingly. AI can be used to do just that, eliminating the need for costly and cumbersome clearing mechanisms.
Teaching AI to See, Process, and Judge Like the Most Intelligent Vision System Around: Humans
Using this insight, we have developed a demonstration of how this could work, called SharpWave. SharpWave is a system trained using generative adversarial networks (GANs) to repair obfuscated and damaged images in real time. GANs pit two neural networks against each other: one proposes a candidate fix (the generator), and the other evaluates it and points out flaws, driving the suggested data toward greater realism (the discriminator). Iterate intelligently, and one can recreate a useful image or video stream from one that might otherwise have caused a system to break down. There are a number of use cases for a technology like this. One can imagine a scenario where a video stream that would otherwise be useless is repaired to the point where graphics processors can identify objects in it and make decisions based on those identifications.
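To make the adversarial mechanic concrete, here is a minimal toy sketch of a GAN training loop in plain NumPy. This is not SharpWave's implementation — image repair uses deep convolutional networks — but the same alternating generator/discriminator updates, shrunk to a one-dimensional problem: a generator learns to shift noise until the discriminator can no longer tell its samples from "real" data. All names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator: shifts unit Gaussian noise by a learned offset b.
# Discriminator: logistic classifier D(x) = sigmoid(w*x + c).
b, w, c = 0.0, 0.0, 0.0
lr, decay, batch = 0.05, 0.01, 128   # small weight decay damps oscillation

for step in range(4000):
    real = 4.0 + 1.5 * rng.standard_normal(batch)   # "real" data ~ N(4, 1.5)
    fake = b + rng.standard_normal(batch)           # generator samples

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake)) - decay * w
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator update (non-saturating loss): push D(fake) toward 1.
    d_fake = sigmoid(w * fake + c)
    b += lr * np.mean((1 - d_fake) * w)

print(f"learned generator offset: {b:.2f} (target mean 4.0)")
```

After training, the generator's output mean has moved toward the real data's mean — the same "propose, critique, improve" cycle that, at image scale, can fill in an occluded region of a frame.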
Imagine: what if we could use AI to repair the data the vision systems do collect, rather than employing costly and often cumbersome mechanical solutions to improve what is captured in the first place? That would be the definition of an elegant solution. And since most vision-based systems already have the graphical processing horsepower required to run this algorithm onboard, it could be as simple as routine real-time processing of the video (or even LiDAR) data, ensuring that the quality of the data being processed by the system is appropriate and usable, no matter what the surrounding environment throws at (or on) it.
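One plausible shape for such a routine processing step — sketched here under our own assumptions, not taken from SharpWave — is a cheap per-frame quality gate that only invokes the expensive learned restoration model when a frame actually looks degraded. The variance-of-the-Laplacian sharpness score below is a standard focus measure; `restore_frame` is a hypothetical stand-in for a trained repair model and simply passes the frame through.

```python
import numpy as np

# 3x3 Laplacian kernel: responds strongly to edges and fine detail.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def sharpness(frame: np.ndarray) -> float:
    """Variance of the Laplacian response; low values suggest blur or obscuration."""
    h, w = frame.shape
    resp = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            resp += LAPLACIAN[dy, dx] * frame[dy:dy + h - 2, dx:dx + w - 2]
    return float(resp.var())

def restore_frame(frame: np.ndarray) -> np.ndarray:
    # Placeholder for a learned GAN-based restoration model.
    return frame

def preprocess(frame: np.ndarray, threshold: float):
    """Gate: pass sharp frames through untouched, repair degraded ones."""
    if sharpness(frame) < threshold:
        return restore_frame(frame), True    # frame was sent for repair
    return frame, False                      # frame was clean enough already
```

Because the quality check is a handful of array operations, it can run on every frame; the heavier restoration network fires only when dirt, fog, or blur actually drags the score below the threshold.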
Synapse is a product development firm. We work with the best companies in the world to drive innovation and introduce cutting-edge devices that positively impact our lives. Fueled by a desire to solve complex engineering challenges, we develop products that transform brands and accelerate advances in technology.