Monthly Computer Vision Meetup Roundup #4


This February we hosted our fourth Computer Vision Meetup in a row, and as always we were happy to be given this opportunity. Our small Viennese Computer Vision community is growing steadily; this time there were already 50 participants! Whether enthusiasts, students, researchers, or professionals working in Computer Vision, all of us are drawn in by the relaxed atmosphere as well as by the great talks we had prepared this time. The first presentation, held by René Donner from the Computational Imaging Research Lab at the Medical University of Vienna, was a practical introduction to Deep Learning and the existing frameworks, which perfectly complemented his theoretical introduction from December. In the second presentation, Stephan Schraml from the Austrian Institute of Technology (AIT) introduced the audience to the world of Event-Driven Stereo for 3D 360° Panoramic Vision, and brought his little robot friend with him.

A TUTORIAL TO DEEP LEARNING BY RENÉ DONNER – PART 2

At the beginning of his talk, René gave a short review of what we had learned about Deep Neural Networks in December, just in case someone had missed it. Back then we were shown a Deep Convolutional Neural Network trained to distinguish between two classes: a human face and a cat's face. René dissected the network and showed us its individual convolutional layers to demonstrate how it first learns low-level features similar to Gabor filters and other edge-detection filters, and later combines this low-level information into more complex shapes resembling eyes, faces, and so on. Even though hierarchical approaches based on Gabor filters and other edge detectors have been known for a few decades, the most fascinating part is that instead of these filters being manually fine-tuned, the Convolutional Neural Network learns them by itself! If you are interested in the theory, also take a look at René's first talk here: Deep Learning by René Donner Part I

Presentation by René Donner

While last time was more theoretical, this time the focus was on the existing frameworks used for Deep Learning. Based on his experience, René explained the advantages and disadvantages of the individual frameworks and concluded that he prefers the recently open-sourced TensorFlow framework by Google for its fast high-level scripting in Python, as well as the ability to make low-level modifications in C++. Another advantage he mentioned is that you can also train your model on a GPU and thus immensely decrease the training time. Let's hope Google also releases its multi-GPU training support anytime soon 😉

TensorFlow describes the mathematical computation of a network as a directed graph of nodes and edges. Nodes implement mathematical operations such as "plus" or "softmax", but can also represent endpoints that are fed with data, push out results, or read and write persistent variables. Edges describe the input/output relationships between the individual nodes and carry dynamically sized multidimensional data arrays, or tensors. This flow of tensors through the graph is where TensorFlow gets its name.
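To make this concrete, here is a minimal sketch of the idea using the graph-based TensorFlow API of the time (the names and toy shapes are ours, not from the talk): defining the graph describes the computation, and nothing actually runs until a session executes it.

```python
import tensorflow as tf

# Nodes are operations ("matmul", "softmax", ...) or endpoints:
# placeholders that are fed with data, and variables that persist state.
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")  # data endpoint
W = tf.Variable(tf.ones([2, 3]), name="W")                 # persistent variable
y = tf.matmul(x, W, name="matmul")                         # operation node
out = tf.nn.softmax(y, name="softmax")                     # operation node

# Defining the graph computes nothing yet; tensors only flow along the
# edges once a session executes the graph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out, feed_dict={x: [[1.0, 2.0]]}))
```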

Afterwards, René showed us practical implementations of an MNIST dataset classifier. The first classifier was a simple fully-connected neural network with one hidden layer. Looking at the weights connected to individual hidden neurons, one could identify responses very closely modeling the shapes of the ten digits (the classes in the MNIST dataset). Training was very fast and ended up accurate on both the training and the evaluation set. However, the disadvantage of this simple approach is that it is not robust to small translations and rotations of the input. For this reason, René then showed us Convolutional Neural Networks. Here the network learns individual filters with which the input image is convolved, making the system robust to translation and rotation to some degree.

Fully-connected network vs. Convolutional Neural Network
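We did not keep René's exact code, but a minimal fully-connected MNIST classifier with one hidden layer looks roughly like the following sketch in the graph-based TensorFlow API of the time (layer sizes, learning rate, and iteration counts are illustrative choices, not his):

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# 784 input pixels -> 128 hidden units -> 10 digit classes.
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
hidden = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, W2) + b2

# Cross-entropy loss, trained with plain gradient descent.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
    # Evaluate on the held-out test set.
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))
```

Inspecting the columns of `W1` after training is what lets you see the digit-shaped responses mentioned above.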

It was evident that René has gathered a lot of experience with Neural Networks over the years, and we are very thankful that he is so kind to share it with our meetup community. He showed us how to program such a fully-connected network and a Convolutional Neural Network with TensorFlow, commenting on each line or block of code. We are sure René inspired one or two attendees to try out TensorFlow for themselves and get into Deep Learning!
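The convolutional variant replaces hand-tuned Gabor-style filters with learned ones. A sketch of a single convolution-plus-pooling stage (filter counts and sizes are again our illustrative choices):

```python
import tensorflow as tf

# The network learns the filters itself instead of using hand-tuned
# Gabor-style edge detectors.
x = tf.placeholder(tf.float32, [None, 784])
image = tf.reshape(x, [-1, 28, 28, 1])          # NHWC layout

# First conv layer: 32 learned 5x5 filters, then 2x2 max-pooling.
W_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv = tf.Variable(tf.zeros([32]))
conv = tf.nn.relu(tf.nn.conv2d(image, W_conv,
                               strides=[1, 1, 1, 1],
                               padding='SAME') + b_conv)
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')

# Flatten and classify; a real model would stack more conv layers here.
flat = tf.reshape(pool, [-1, 14 * 14 * 32])
W_fc = tf.Variable(tf.truncated_normal([14 * 14 * 32, 10], stddev=0.1))
b_fc = tf.Variable(tf.zeros([10]))
logits = tf.matmul(flat, W_fc) + b_fc
```

The pooling step is what buys the small amount of translation robustness mentioned above.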

Event-Driven Stereo for 3D 360° Panoramic Vision by Stephan Schraml

Presentation by Stephan Schraml

In this talk, Stephan introduced us to his current research at the AIT in the field of sparse stereo reconstruction using the lesser-known Dynamic Vision Sensor (DVS). The DVS is a bio-inspired optical sensor which, in contrast to CCD or CMOS sensors, produces spike-like signals when certain events are registered. In particular, these events are illumination changes: the sensor produces a positive signal if the illumination at a sensor element increases and the change is above a certain threshold, and analogously a negative signal if the illumination decreases and the change is above a certain threshold. Another main difference from "classical" optical sensors is that the image processing is feature-based and event-driven instead of working on a per-frame basis. The advantages of the Dynamic Vision Sensor are its wide dynamic range, efficient encoding of local changes in the scene, and high temporal resolution, such that the input to the processing software is a continuous stream of events. This makes the DVS a viable option for applications like quality control, people counting, traffic monitoring, and others. Have a look at the presentation.

Continuous stream of events from the DVS

Accumulated positive and negative events over 20 ms
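As a rough illustration of how such a stream might be represented in software, here is a hypothetical sketch (the record layout and field names are our assumption, not AIT's actual format), including the kind of accumulation shown in the figure above:

```python
from collections import namedtuple

# One DVS event: pixel position, timestamp in microseconds, and polarity
# (+1 = illumination rose above the threshold, -1 = it fell below).
Event = namedtuple("Event", ["x", "y", "t_us", "polarity"])

def accumulate(events, window_us=20000):
    """Accumulate positive and negative events over a time window
    (e.g. 20 ms) to render the sparse stream as a pseudo-frame."""
    pos, neg = {}, {}
    t0 = events[0].t_us
    for e in events:
        if e.t_us - t0 > window_us:
            break
        counts = pos if e.polarity > 0 else neg
        counts[(e.x, e.y)] = counts.get((e.x, e.y), 0) + 1
    return pos, neg
```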

360° Panoramic Dynamic Stereo Vision System

Later, Stephan explained his motivation for using the Dynamic Vision Sensor for 360° panoramic stereo vision: 360° sensing is already used in Google's self-driving cars, mobile robots can take advantage of the full 360° view, and so can video surveillance, among other applications. Also, the currently existing systems have disadvantages that may be overcome by using the DVS. Laser-based panoramic systems have a 360° field of view and are accurate, but they are expensive (€30,000-70,000). Systems using parabolic mirrors capture the panoramic scene in one image, but suffer from image distortions that need to be accounted for, a complex geometry, and a lower image resolution. Multi-perspective systems use two rotating parallel stereo cameras, but the amount of captured information that needs to be processed is not feasible.

For this reason, Stephan Schraml, Ahmed Nabil Belbachir (Austrian Institute of Technology), and Horst Bischof (Graz University of Technology) adapted the Dynamic Vision Sensor technology for a multi-perspective system, enabling it to reconstruct a sparse 3D 360° panoramic image [1]. Here the stereo system utilizes a pair of rotating DVS whose sensor elements are aligned in a column. Since the rotation itself induces illumination changes at the sensor elements, the rotating Dynamic Vision Sensors detect edges even in a static scene.

Panoramic stereo Dynamic Vision Sensor system

However, there are still challenges: finding a stereo matching method for sparse data, and the fact that the input of the two DVS is non-simultaneous (see image below).

Non-simultaneous capturing of the same point in the scene by the stereo DVS system

As a solution, Stephan et al. assume that sequences of events in the scene are correlated and will therefore have a similar event count, and that, once corrected for the time difference between the two line DVS, corresponding events will occur very close to each other. For this reason, the stereo matching algorithm looks for events on a horizontal line with the minimal distance.

Sparse event-driven stereo matching algorithm looking for the closest correspondences
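To give a feel for the matching idea, here is a toy sketch (the data layout, offset correction, and distance threshold are our assumptions; the published algorithm matches correlated event sequences rather than single timestamps): after shifting one stream by the known time offset between the two line sensors, each event is matched to the closest remaining event on the same horizontal line.

```python
def match_events(left, right, delta_t_us, max_dist_us=500):
    """Match events from the two rotating line sensors row by row.

    left, right: dict mapping row index -> sorted list of event timestamps.
    delta_t_us: known time offset between the two line sensors.
    """
    matches = []
    for row, left_times in left.items():
        # Correct the right stream for the sensors' time offset.
        candidates = sorted(t - delta_t_us for t in right.get(row, []))
        for t_left in left_times:
            if not candidates:
                break
            # Closest corrected event on the same horizontal line.
            t_right = min(candidates, key=lambda t: abs(t - t_left))
            if abs(t_right - t_left) <= max_dist_us:
                matches.append((row, t_left, t_right + delta_t_us))
                candidates.remove(t_right)  # enforce one-to-one matching
    return matches
```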

The advantages of this approach are that no actual images are required, which saves computation time, that the temporal information of the events is exploited, and that the core algorithm can be implemented with fast matrix functions. The limitations of the system are that it is sensitive to noise in low-light conditions, and that increasing the sensitivity also increases the amount of data.

Example sparse 3D reconstruction (red = near, dark blue = far)

At the end of his talk, Stephan brought out the TUCO-3D, a prototype of the multi-perspective Dynamic Vision Sensor system, as well as some 3D glasses, for a great look and feel of the 360° 3D panoramic reconstruction. All in all, it's a great step forward, and we are very happy that Stephan shared his knowledge with our Computer Vision community.

As you can see in the images below, we had a fascinating 360° panoramic 3D experience 🙂

Anyliners testing

[1] S. Schraml, A. N. Belbachir, and H. Bischof, "Event-Driven Stereo Matching for Real-Time 3D Panoramic Vision," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 466-474, June 2015.

Hold a talk yourself!

Do you have a project or topic you would like to talk about, or do you know someone who would like to share their experience and knowledge? Please contact us!
Oh, and don't forget to join our meetup group! 😉

 

QUESTIONS? LET US KNOW!

If you have questions, suggestions, or feedback on this, please don't hesitate to reach out to us via Facebook, Twitter, or simply via [email protected]! Cheers!