Computer Vision Meetup 1st Anniversary
We did it! We’ve finished our first year as founders and hosts of the first Computer Vision Meetup in Vienna! A year ago we thought it couldn’t be that hard to host a meetup because all you need for it is already there – or almost there: a good location, interesting people, thrilling talks, great marketing and free drinks. We found out it’s NOT that easy. There are interesting people and there is a community. But it was not so easy to get to the right people or find speakers who love to talk about their special knowledge or project.
In the end, though, we found speakers with cutting-edge topics and our community grew to almost 500 members!
I think it’s time to say THANK YOU to all the speakers and members who participated in the Meetup the past year!
We had so many different and engaging topics, ranging from Deep Learning to Augmented Reality and from a demo of a 360° Panoramic Robot to the new Microsoft HoloLens!
Martin Čerman, our specialist for Deep Learning, presented the top 3 talks of 2016. If you want to read more about them, just click on the title and you will get to the whole roundup of the talks.
1st Place: HoloLens Intro + Demo by Peter Sperl
After celebrating our 1st birthday, Georg Krispel presented his master’s thesis. The technology of text recognition on mobile devices is already very sophisticated, but in real life there are still some circumstances that can disturb the processing. In the typical case you have an almost orthogonal view and a cutout that shows where to place the text. If no text is recognized, the scan can be repeated to validate the result. But if the problems persist, repetition doesn’t always help. These problems are mostly reflections and glare, poor lighting conditions, or a phone camera whose resolution is too low.
| Perfect conditions during text recognition | Usual field conditions during text recognition | Really bad conditions during text recognition |
So the objectives of Georg’s master’s thesis were to:
- Evaluate the possibilities of mitigating these effects to improve overall text recognition results
- Exploit multiple frames available in the camera stream and their redundant information (Multiple Frame Integration)
- Implement the resulting pipeline on mobile hardware
For all experiments, Georg assumed that the text is written on a nearly planar surface, that the surface is sufficiently textured for feature-based tracking, and that the camera moves slightly during processing.
The modular framework
Georg designed a modular framework that makes it possible to integrate the redundant information contained in the individual frames. Its initialization process sets up two threads so that computationally expensive tasks can be offloaded: the main thread and the text detection thread.
The text detection thread rectifies the planar surface and finds the position of the desired text within an entire frame. The obtained information is passed to the main thread, which tracks the planar surface and thus the position of the text over time. The individual text patches are extracted and integrated, as shown in image 2.
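To make the division of work between the two threads more concrete, here is a minimal sketch using Python's standard `threading` and `queue` modules. It is not Georg's implementation: `detect_text`, `detection_worker` and `run_pipeline` are hypothetical names, the detection result is a made-up box, and the tracking step is only a placeholder.

```python
import queue
import threading


def detect_text(frame):
    """Hypothetical stand-in for the expensive rectification and detection step.

    Returns the keyframe it ran on and a made-up text box (x, y, w, h).
    """
    return {"keyframe": frame, "box": (10, 20, 100, 30)}


def detection_worker(frames_in, results_out):
    """Text detection thread: process keyframes until a None sentinel arrives."""
    while True:
        frame = frames_in.get()
        if frame is None:
            break
        results_out.put(detect_text(frame))


def run_pipeline(frames):
    """Main thread: request one detection, then follow the text frame by frame."""
    frames_in, results_out = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=detection_worker, args=(frames_in, results_out))
    worker.start()

    frames_in.put(frames[0])        # hand the first frame to the detector
    detection = results_out.get()   # initialization: wait for the first result

    patches = []
    for frame in frames[1:]:
        # Placeholder for tracking: a real tracker would update the box from
        # the estimated motion of the plane instead of reusing it unchanged.
        patches.append((frame, detection["box"]))

    frames_in.put(None)             # tell the worker to stop
    worker.join()
    return detection, patches


detection, patches = run_pipeline([0, 1, 2, 3])
```

The queues keep the expensive detection out of the per-frame loop, so the main thread only has to do the cheap tracking work on every frame.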
As already implied, the framework consists of several exchangeable modules:
- Visual Tracking
- Scene Text Detection
- Multiple Frame Integration
- Text Recognition
During his evaluation he used and compared the approaches summarised below.
For visual tracking, Georg integrated a pyramidal implementation of the KLT (Kanade-Lucas-Tomasi) feature tracker to estimate the homography for different keyframes. A second tracking approach combined AKAZE features with FLANN (Fast Library for Approximate Nearest Neighbors): the features of each frame were extracted with AKAZE and matched against the keyframe with FLANN.
The scene text detection includes:
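Once a homography is known, following the text from the keyframe into the current frame boils down to mapping the corners of the text box through that matrix. The sketch below shows only this step in plain Python. The helper names and the example matrix are our own assumptions; in practice the homography would be estimated from the tracked or matched features, for example with OpenCV's `cv2.findHomography`.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)  # divide by w to leave homogeneous coordinates


def track_box(H, corners):
    """Move all corners of a text box from the keyframe into the current frame."""
    return [apply_homography(H, corner) for corner in corners]


# Text box detected in the keyframe (illustrative values).
keyframe_box = [(10.0, 20.0), (110.0, 20.0), (110.0, 50.0), (10.0, 50.0)]

# Hypothetical homography: a pure translation by (5, -3) pixels.
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -3.0],
     [0.0, 0.0, 1.0]]

current_box = track_box(H, keyframe_box)
```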
– SWT (Stroke Width Transform), which uses the constant stroke width of characters,
– TS (TextSpotter), which classifies and groups based on the ER (Extremal Region) and
– ANPR (Automatic Number Plate Recognition), which recognises close edges of text by using morphological operations.
For the multiple frame integration (MFI), Georg first used histogram voting, which groups similar results and categorises them by their recognition certainty. Second, he used the minimum operator, which keeps the lowest pixel value at each position across all extracted images. Finally, he implemented the Yi integration to get more contrast and distinguish between text and background.
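The minimum operator is simple enough to sketch in a few lines. The example assumes dark text on a brighter background and patches that are already registered to each other, so that a glare spot (a very bright value) in one frame is replaced by the darker value seen in another frame. The function name and pixel values are made up for illustration.

```python
def minimum_integration(patches):
    """Pixel-wise minimum over equally sized grayscale patches (lists of rows)."""
    if not patches:
        raise ValueError("need at least one patch")
    height, width = len(patches[0]), len(patches[0][0])
    return [
        [min(patch[r][c] for patch in patches) for c in range(width)]
        for r in range(height)
    ]


# Three registered 2x3 patches; the value 255 simulates a glare spot.
patch_a = [[200, 40, 210], [255, 45, 205]]
patch_b = [[255, 42, 208], [198, 255, 207]]
patch_c = [[201, 41, 255], [199, 44, 206]]

integrated = minimum_integration([patch_a, patch_b, patch_c])
```

Every glare value disappears from the integrated patch, but only because the patches line up pixel by pixel. With poor registration the minimum would smear the dark strokes instead.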
Multiple frame integration (MFI) methods in general can be divided into two approaches, image enhancement and result fusion methods. They are illustrated in the following image.
Georg tried both: two image enhancement methods (a simple minimum operator as well as a contrast-increasing integration method introduced by Yi et al.) and a result fusion method, which performs a simple histogram-based voting and chooses the recognition result with the highest number of occurrences.
To evaluate the impact of these MFI approaches on the actual recognition results, the great Anyline SDK was utilized.
The impact of MFI approaches on the recognition results
For the evaluation, the use case of energy meter readings was assumed and a suitable dataset was created. To compare the detection and tracking performance with full-detection approaches, Georg utilized the CLEAR-MOT evaluation framework. It provides meaningful and intuitive measurements, which describe the performance of a multi-object tracker.
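The result fusion idea can be sketched with Python's `collections.Counter`: collect the OCR string of every frame and keep the one that occurs most often. Breaking ties by the summed confidence is our own assumption, and the readings below are invented.

```python
from collections import Counter


def histogram_vote(readings):
    """Pick the most frequent OCR string; break ties by summed confidence.

    `readings` is a list of (text, confidence) pairs, one per frame.
    """
    counts = Counter(text for text, _ in readings)
    confidence = Counter()
    for text, conf in readings:
        confidence[text] += conf
    # Rank by number of occurrences first, then by accumulated confidence.
    return max(counts, key=lambda text: (counts[text], confidence[text]))


readings = [
    ("01234", 0.9),
    ("01284", 0.6),  # a misread digit in one frame
    ("01234", 0.8),
    ("O1234", 0.4),  # letter O instead of zero
    ("01234", 0.7),
]
best = histogram_vote(readings)
```

A single bad frame cannot change the outcome as long as most frames agree, which is why this approach needs a recognition step that is fast enough to run on many frames.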
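The two main CLEAR-MOT scores are easy to compute once the per-frame matching is done. MOTA subtracts the ratio of all errors (misses, false positives and identity mismatches) to the number of ground-truth objects from 1, and MOTP averages the localisation error over all matched pairs. The sketch below assumes the matching has already happened. The dictionary keys and example numbers are hypothetical.

```python
def clear_mot(frames):
    """Compute (MOTA, MOTP) from per-frame counts.

    Each frame is a dict with the keys `gt`, `misses`, `false_positives`,
    `mismatches` and `distances` (one localisation error per matched pair).
    """
    total_gt = sum(f["gt"] for f in frames)
    errors = sum(
        f["misses"] + f["false_positives"] + f["mismatches"] for f in frames
    )
    distances = [d for f in frames for d in f["distances"]]
    if total_gt == 0 or not distances:
        raise ValueError("need ground-truth objects and at least one match")
    mota = 1.0 - errors / total_gt
    motp = sum(distances) / len(distances)
    return mota, motp


frames = [
    {"gt": 2, "misses": 0, "false_positives": 0, "mismatches": 0,
     "distances": [0.5, 0.25]},
    {"gt": 2, "misses": 1, "false_positives": 0, "mismatches": 0,
     "distances": [0.75]},
]
mota, motp = clear_mot(frames)
```

Whether a lower or a higher MOTP is better depends on whether the per-match value is a distance or an overlap. Here it is treated as a distance, so lower is better.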
Further, the recognition results were compared with ground truth data. For this, the individual text patches (left) were passed to the OCR engine as well as their integrated counterparts (right).
Even though the minimum operator and Yi integration are able to increase the recognition rates (see Image), they highly depend on good image registration (ECC in Figure). However, in all cases the histogram voting delivered the best results.
The thesis shows that it is possible to perform MFI while achieving real-time performance on mobile hardware. Furthermore, the multi-thread detection and tracking approach can keep up with full-detection approaches in terms of tracking accuracy. Most importantly, an increase of the recognition rates is possible: result fusion methods are more promising (and practical if the recognition is fast enough) compared to the tested image enhancement methods.
We hope you had just as much fun as we had in our first year of the Computer Vision Meetup in Vienna and we hope we will see all of you and your friends at our first 2017 edition on the 25th of January!
Hold a talk yourself!
Do you have a project or topic you would like to talk about, or do you know someone who would like to share their experiences and knowledge at our Computer Vision Meetup? Do you want to sponsor a meetup? Please contact us!
It is great to see how our community is growing each month, so if you haven’t done so already, don’t forget to join our meetup group! ;)
We’ve just recently started live streaming the meetup, and we usually post the link once the meetup starts! Please let us know if you have suggestions about how we could improve the meetup. We’re happy to hear your feedback!