Structure from Motion

3D from standard cameras for automotive applications

The World Health Organization estimates that 1.2 million people die in traffic accidents worldwide each year. Drivers in the USA spend about five years of their lives in a car, and due to the high cost of traffic incidents, insurance has risen to about $0.10 per mile. Automotive Advanced Driver Assistance Systems (ADAS) have the potential to make a big impact: saving lives, time, and cost by aiding the driver in operating the vehicle, or even taking over control of the vehicle completely.
One of the key tasks of an ADAS is to build an understanding of the vehicle's surroundings. Besides recognizing pedestrians, other vehicles, lanes and obstacles, the system should know where these objects are in full 3D space. This 3D information lets the ADAS determine the distance of objects, along with their size, direction and speed, allowing it to take appropriate action.

It's common to think that we humans use our two eyes to sense depth. At the same time, though, we can easily catch a ball with one eye closed. Research has shown that humans primarily use monocular vision to sense depth, exploiting motion parallax: a depth cue that results from movement. As we move, objects that are closer to us move farther across our field of view than objects that are more distant. The same mechanism, called Structure from Motion, can be used to sense depth with standard video cameras.

There are also ways to sense depth using special cameras. Lidar measures distance by illuminating a target with a laser and analyzing the reflected light. Time-of-flight cameras measure the delay of a light signal between the camera and the subject for each point of the image. Another method is to project a pattern of light onto the scene; capturing this distorted pattern with a camera allows the extraction of depth information. Structure from Motion has a few key advantages over these approaches. Firstly, no active illumination of the scene is required; such active lighting limits range and outdoor use. In addition, a standard off-the-shelf camera suffices instead of a specialized depth-sensing camera. This reduces cost, since the standard rear-view or surround-view cameras can be reused and no active lighting components are needed.

Structure from Motion algorithm

The Structure from Motion algorithm consists of three steps:

  1. Detection of feature points in view
  2. Tracking of feature points from one frame to the next
  3. Robust estimation of the 3D position of these points, based on their motion

The first step is to identify points in the image that can be robustly tracked from one frame to the next. Features on textureless patches, like blank walls, are nearly impossible to localize. Areas with large contrast changes (gradients), such as lines, are easier to localize, but lines suffer from the aperture problem: a patch can only be aligned along the direction of the line, not to a single position, which makes lines unsuitable for frame-to-frame tracking as well. Locations with strong gradients in two significantly different orientations make good feature points that can be tracked from one frame to the next. Such features show up in the image as corners, where two lines come together. Feature detection algorithms have been widely researched in the computer vision community; in our application, we use the Harris feature detector.

The next step is to track these feature points from frame to frame, to find how much they moved in the image. We use the Lucas-Kanade optical flow algorithm for this. The algorithm first builds a multiscale image pyramid, where each level is a smaller, scaled-down version of the originally captured image. It then searches around the previous frame's feature point location for a match at the coarsest level. Once a match is found, that position is reused as the initial estimate at the next, larger level, traveling down the pyramid until the original image resolution is reached. This way, larger displacements can also be tracked. The result is two lists of feature points: one for the previous image and one for the current image.

Based on these point pairs, you can define and solve a linear system of equations that finds the camera motion and, consequently, the distance of each point from the camera. The result is a sparse 3D point cloud covering the camera's viewpoint. This point cloud can then be used for different applications such as automated parking, obstacle detection, or even accurate indoor positioning for mobile phone applications.
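To make the three steps concrete, below is a minimal sketch of such a pipeline in Python using OpenCV. It is purely illustrative and not the videantis/Viscoda implementation: Harris corner detection stands in for step 1, pyramidal Lucas-Kanade tracking for step 2, and a RANSAC essential-matrix estimate with triangulation for the robust 3D estimation of step 3. The camera matrix K is a placeholder that would normally come from calibration.

    import numpy as np
    import cv2

    # Placeholder camera intrinsics; a real system would use calibrated values.
    K = np.array([[700.0,   0.0, 640.0],
                  [  0.0, 700.0, 360.0],
                  [  0.0,   0.0,   1.0]])

    def sfm_step(prev_gray, curr_gray):
        # Step 1: detect corner features (Harris) in the previous frame.
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                           qualityLevel=0.01, minDistance=8,
                                           useHarrisDetector=True)

        # Step 2: track the features into the current frame with pyramidal
        # Lucas-Kanade optical flow (3 pyramid levels, 21x21 search window).
        curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts, None,
            winSize=(21, 21), maxLevel=3)
        ok = status.ravel() == 1
        p0 = prev_pts[ok].reshape(-1, 2)
        p1 = curr_pts[ok].reshape(-1, 2)

        # Step 3: robustly estimate the camera motion with RANSAC on the
        # essential matrix, then triangulate the point pairs into a sparse
        # 3D point cloud (monocular, so only defined up to an unknown scale).
        E, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)

        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # previous camera
        P1 = K @ np.hstack([R, t])                          # current camera
        pts4d = cv2.triangulatePoints(P0, P1, p0.T, p1.T)
        cloud = (pts4d[:3] / pts4d[3]).T                    # N x 3 points
        return cloud, R, t

Because a single moving camera only recovers depth up to an unknown global scale, a production system typically resolves that scale from additional information such as vehicle odometry or the known mounting height of the camera above the road.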

Vision processor

Videantis has been working together with Viscoda to implement the Structure from Motion algorithm on the videantis v-MP4280HDX vision processor. The Viscoda Structure from Motion algorithm has proven to be very robust in reconstructing a 3D point cloud under a wide variety of conditions, from low light to complex scenes. The videantis vision processor is licensed to semiconductor companies for integration into their systems-on-chips targeting a wide variety of automotive applications. The processor architecture has been optimized specifically to run computer vision algorithms at high performance with very low power consumption. The multi-core architecture scales from just a few cores to many cores, enabling it to address different performance points: from chips that can be integrated into low-cost cameras, all the way up to very high-performance applications such as multi-camera systems that run many computer vision algorithms concurrently. The combined solution runs the Viscoda Structure from Motion algorithm on the videantis processor architecture, and the resulting implementation is small and low-power enough to be integrated into smart cameras for automotive applications, making our rides safer and enabling us to let go of the wheel and pedals.
