MIT is using computational imaging to view the unseen

The combination of cameras and computers can do anything from fighting wildfires in California to finding survivors in natural disasters. And MIT is using them to help people see around corners. Seven years ago, MIT researchers created a new imaging system that used floors, doors and walls as mirrors to gather information about scenes outside the normal line of sight. The system relied on special lasers to produce 3D images.

MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has built on that original work to develop a simple computational imaging method that can reconstruct hidden video, according to a press release. The findings will be presented in a paper next week at the Conference on Neural Information Processing Systems in Vancouver.

Using shadows and light reflections observed on a pile of clutter, MIT researchers are able to reconstruct video of an unseen area of the same room, even if it is outside the camera's view, according to research team member and CSAIL PhD student Prafull Sharma. Sharma is one of the authors of the paper, along with Lukas Murmann, researcher Adam Yedidia, and MIT professors Fredo Durand, Bill Freeman, and Gregory Wornell.

"If you just look at the pile of clutter, you would not imagine that we can reconstruct something meaningful out of it," Sharma said. "Just being able to do it is very fascinating, but if you talk about applications, there's this entire field of use cases: Search-and-rescue missions, elderly care, autonomous vehicles, etc."

How they revealed hidden info

To set the scene, Sharma provided an example. Imagine you are standing in a room, looking at an object or pile of objects. You aren't able to see what is behind you, or perhaps what is going on outside your field of vision in another part of the room. However, you might be able to see some faint shadows caused by the pile of objects, which could indicate something is happening outside of your view.

"The objective was, is this enough information to deduce something about the hidden [situation], and can we actually reconstruct it? It's a linear problem, which means that whatever you're observing is basically the multiplication of two matrices," Sharma said.

The two matrices are light transport and hidden video. Light transport involves the way light travels in a scene, which is used to estimate the hidden content out of view, he said.
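
As a rough illustration of the linear relationship Sharma describes, the Python sketch below multiplies a made-up light transport matrix by a made-up hidden video to get the observed video. The matrix sizes, the values, and the names `transport`, `hidden`, `observed` and `matmul` are assumptions chosen for illustration and are not taken from the paper.

```python
# Toy illustration of the linear model: observed = transport x hidden.
# All numbers here are invented for illustration; they are not from the MIT paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# transport[i][k]: how much light from hidden pixel k reaches observed pixel i.
transport = [
    [0.5, 0.0],
    [0.2, 0.3],
    [0.0, 0.4],
]

# hidden[k][t]: brightness of hidden pixel k at frame t (2 pixels, 3 frames).
hidden = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]

# observed[i][t]: what the camera would record on the clutter (3 pixels, 3 frames).
observed = matmul(transport, hidden)

for row in observed:
    print(row)
```

Each column of `observed` is one recorded frame, and each of its pixels is a weighted mix of the hidden pixels in that frame. The hard part in practice is that only `observed` is known.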

The process begins by turning on a video camera in a room and pointing it at a pile of clutter, which becomes the camera's field of view. The pile of clutter acts almost like a pinhole camera one would build in an elementary school science class, according to the press release.

"[The clutter] blocks some light rays, but allows others to pass through, and these paint an image of the surroundings wherever they hit," according to the press release. "But where a pinhole camera is designed to let through just the right amount of rays to form a readable picture, a general pile of clutter produces an image that's scrambled (by the light transport) beyond recognition, into a complex play of shadows and shading."

The clutter acts almost like a mirror, providing a scrambled view of its surroundings. By estimating a light transport matrix and a hidden video whose product matches what the camera observed, the team was able to produce a rough idea of what was happening in the hidden scene, Sharma said.

To estimate both of these matrices, the team used convolutional neural networks. "These are a type of neural network [that] produces image-like structures," Sharma said. "If you think about it, these two matrices have image-like structures along different dimensions. For example, the hidden video, each frame is an image. On the light transport matrix side, for each pixel when it was lit up, you were getting an image.

"We have convolution operations upon these dimensions to create image-like structures. We jointly optimize it such that the output from these two neural networks should multiply together and equate to whatever was observed," he said.

One neural network produces the scrambling pattern of light and shadows, the other estimates the hidden video, and the combination is able to reproduce an idea of hidden information, according to Sharma.
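
The joint optimization idea can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: plain gradient descent on the raw matrix entries stands in for the team's two convolutional neural networks, the observation is a tiny invented matrix, and the names `factorize` and `loss`, the rank, the step count and the learning rate are all hypothetical choices for illustration.

```python
# Toy sketch: find two factors whose product matches an observation.
# ASSUMPTION: plain gradient descent on matrix entries replaces the two
# convolutional networks described in the article. Not the authors' method.
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def loss(observed, transport, hidden):
    """Sum of squared differences between transport x hidden and observed."""
    pred = matmul(transport, hidden)
    return sum((p - o) ** 2
               for pr, orow in zip(pred, observed)
               for p, o in zip(pr, orow))

def factorize(observed, rank, steps=2000, lr=0.05, seed=0):
    """Jointly update both factors so their product approaches `observed`."""
    rng = random.Random(seed)
    n, t = len(observed), len(observed[0])
    transport = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    hidden = [[rng.uniform(0.1, 1.0) for _ in range(t)] for _ in range(rank)]
    for _ in range(steps):
        pred = matmul(transport, hidden)
        resid = [[p - o for p, o in zip(pr, orow)]
                 for pr, orow in zip(pred, observed)]
        # Gradients of the squared error (constant factor 2 folded into lr).
        grad_t = matmul(resid, transpose(hidden))
        grad_h = matmul(transpose(transport), resid)
        transport = [[v - lr * g for v, g in zip(vr, gr)]
                     for vr, gr in zip(transport, grad_t)]
        hidden = [[v - lr * g for v, g in zip(vr, gr)]
                  for vr, gr in zip(hidden, grad_h)]
    return transport, hidden

# Invented 3-pixel, 3-frame observation that has an exact rank-2 factorization.
observed = [
    [0.5, 0.0, 0.5],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.4],
]
transport_est, hidden_est = factorize(observed, rank=2)
print(loss(observed, transport_est, hidden_est))
```

A factorization like this is not unique, since many different pairs of matrices multiply to the same observation. That is presumably where the image-like structure imposed by the convolutional networks helps, by steering the estimate toward factors that look like plausible video frames.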

After the camera records a video of the clutter, the team transfers the footage to a graphics processing unit (GPU). The team then runs the networks on that footage. The networks are implemented in PyTorch, an open source machine learning library from Facebook, Sharma said.

Use cases

Currently, the team's process takes two hours. If the hidden information could be recovered in real time, however, the applications could be very impactful, Sharma said.

"One could imagine that if in a search-and-rescue operation, [and] you wanted to know what was in the room without even going inside, you could use such a technique," Sharma said. "Or for fall detection for the elderly, because you cannot place cameras everywhere, but cameras can most likely observe parts of the room or areas that could possibly have these characteristics. Using this you can see the hidden part of the room and you could basically estimate if someone has fallen and so on."

An extremely powerful application could involve autonomous vehicles, according to Sharma.

"Imagine a parking lot and you cannot always see what's around the corner, but there could possibly just be some pile of clutter which could be observed," he continued. "If our technique can be optimized to run in real time, we can reconstruct the hidden part or situations around the corner. This could avoid collisions, or if there are people walking by, it could basically raise an alert that there might be some people."

As a next step, the team hopes to improve the overall resolution of the system and eventually test it in an uncontrolled environment, according to the press release.

