Project Description

Computer Vision Meetup 1st Anniversary

Computer Vision Meetup Anyline

We did it! We’ve finished our first year as founders and hosts of the first Computer Vision Meetup in Vienna! A year ago we thought it couldn’t be that hard to host a meetup, because everything you need is already there – or almost there: a good location, interesting people, thrilling talks, great marketing and free drinks. We found out it’s NOT that easy. There are interesting people and there is a community, but it was not so easy to reach the right people or to find speakers who love to talk about their special knowledge or projects.

Computer Vision Meetup - Introduction by Daniel Albertini

Our CTO Daniel Albertini welcomed everyone to the first Computer Vision Meetup a year ago!

But after all we found speakers with cutting-edge topics and our community grew to almost 500 members!

I think it’s time to say THANK YOU to all the speakers and members who participated in the Meetup the past year!

We had so many different and engaging topics, ranging from Deep Learning to Augmented Reality, and from a demo of a 360° panoramic robot to the new Microsoft HoloLens!


Martin Čerman, our specialist for Deep Learning, presented the top 3 talks of 2016. If you want to read more about it, just click on the title and you will get to the whole roundup of the talks.

3rd Place: Architectural Style Classification by Dr. Gayane Shalunts

2nd Place: Deep Learning Introduction by René Donner & 360° Panoramic Vision with DVS by Stephan Schraml

1st Place: HoloLens Intro + Demo by Peter Sperl

Multiple Frame Integration for OCR on Mobile Devices

After celebrating our 1st birthday, Georg Krispel presented his master’s thesis. Text recognition on mobile devices is already very sophisticated, but in real life there are still circumstances that can disturb the processing. In the general case you have an almost orthogonal view and a cutout in which to place the text. If no text is recognized, a repetition is possible to get a validation. But if the problems persist, repetition doesn’t always help. These problems are mostly reflections and glare, poor lighting conditions, or a phone camera whose resolution is too low.

Perfect conditions, usual field conditions, and really bad conditions during text recognition

So the objectives of Georg’s master’s thesis were to:

  • Evaluate the possibilities of mitigating these effects to improve overall text recognition results
  • Exploit multiple frames available in the camera stream and their redundant information (Multiple Frame Integration)
  • Implement the resulting pipeline on mobile hardware

For all experiments Georg assumed that the text is written on a nearly planar surface, that the surface is sufficiently textured for feature-based tracking, and that the camera moves slightly during processing.

Pipeline Processing Example

The modular framework

Georg designed a modular framework that integrates the redundant information across the single frames. Its initialization process spawns two threads in order to offload computationally expensive tasks: the main thread and the text detection thread.

The text detection thread rectifies the planar surface and locates the desired text within an entire frame. The obtained information is passed to the main thread, which tracks the planar surface and thus the position of the text over time. The individual text patches are extracted and integrated, as you can see in image 2.
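The division of labour between the two threads can be sketched in a few lines of Python. This is a hypothetical illustration of the layout described above, not the thesis code: the slow detection work runs in a background thread and publishes text positions through a queue, while the main thread stays free for tracking.

```python
import queue
import threading

class DetectionThread(threading.Thread):
    """Runs the expensive rectification + text detection asynchronously."""
    def __init__(self, frames, results):
        super().__init__(daemon=True)
        self.frames = frames      # queue of keyframes to process
        self.results = results    # queue of detected text positions

    def run(self):
        while True:
            frame_id, frame = self.frames.get()
            if frame is None:     # sentinel: stop the thread
                break
            # placeholder for rectification + scene text detection
            text_box = (10, 20, 100, 40)
            self.results.put((frame_id, text_box))

frames, results = queue.Queue(), queue.Queue()
worker = DetectionThread(frames, results)
worker.start()

# Main thread: hand a keyframe to the detector, keep tracking meanwhile,
# then pick up the detection result once it is ready.
frames.put((0, "keyframe-pixels"))
frame_id, box = results.get()
frames.put((None, None))
worker.join()
```

The queue decouples the two threads, so a slow detection never stalls the frame-to-frame tracking loop.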

As already implied, the framework consists of several exchangeable modules:

  • Visual Tracking
  • Rectification
  • Scene Text Detection
  • Multiple Frame Integration
  • Text Recognition

During his evaluation he utilized and compared the approaches summarised below.

Modular Frameworks

For visual tracking Georg integrated a pyramidal implementation of the KLT (Kanade-Lucas-Tomasi) feature tracker to estimate the homography between keyframes. A second tracking approach combined AKAZE features with FLANN (Fast Library for Approximate Nearest Neighbors): features are extracted from each frame (AKAZE) and matched against the keyframe (FLANN).
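Once point correspondences between a frame and the keyframe are available (from KLT tracks or AKAZE/FLANN matches), the homography can be estimated with the standard DLT algorithm. A minimal numpy sketch of that step, as an illustration only; in practice a robust estimator such as OpenCV’s cv2.findHomography with RANSAC would be used:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit H so that dst ~ H @ src (homogeneous).
    src, dst: (N, 2) arrays of matched points, N >= 4, no 3 collinear."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The homography is the null-space direction of A:
    # the last right singular vector from the SVD.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]          # normalize so H[2, 2] == 1

# Toy check: points shifted by (5, -3) should yield a pure translation H
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
dst = src + np.array([5.0, -3.0])
H = estimate_homography(src, dst)
```

With the homography in hand, the tracked text position in the keyframe can be mapped into every new frame.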

The scene text detection includes:

  • SWT (Stroke Width Transform), which exploits the constant stroke width of characters,
  • TS (TextSpotter), which classifies and groups based on ERs (Extremal Regions), and
  • ANPR (Automatic Number Plate Recognition), which detects the closely spaced edges of text using morphological operations.
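The morphological idea behind the ANPR-style detector can be illustrated in a few lines of numpy: dilating an edge map with a horizontal structuring element merges nearby character edges into a single text blob. This is a toy sketch with a hypothetical helper, not the thesis implementation; real code would apply cv2.morphologyEx to a gradient edge map:

```python
import numpy as np

def dilate_horizontal(mask, radius):
    """Binary dilation with a (1 x 2*radius+1) horizontal element:
    a pixel becomes True if any pixel within `radius` columns is True."""
    out = mask.copy()
    for shift in range(1, radius + 1):
        out[:, shift:] |= mask[:, :-shift]   # spread edges to the right
        out[:, :-shift] |= mask[:, shift:]   # spread edges to the left
    return out

# Toy "edge map": three character strokes separated by small gaps
edges = np.zeros((1, 12), dtype=bool)
edges[0, [1, 4, 7]] = True
blob = dilate_horizontal(edges, 2)
# the gaps between the strokes are closed into one connected run
```

Connected regions in the dilated map are then treated as candidate text areas.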

For the multiple frame integration (MFI) Georg first used histogram voting, which groups similar recognition results and ranks them by their recognition certainty. Secondly, he used the minimum operator, which keeps the lowest pixel value at each position across all extracted images. Finally, he implemented the Yi integration to increase contrast and better distinguish between text and background.
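Two of these integration steps are easy to sketch in numpy. This is a toy illustration under the usual assumption of dark text on a bright background, not the thesis code: the minimum operator takes the per-pixel minimum over the stack of aligned text patches, and histogram voting simply keeps the most frequent recognition result.

```python
import numpy as np
from collections import Counter

# Stack of aligned text patches (frames x height x width), uint8 grayscale.
# The per-pixel minimum keeps the darkest (most text-like) value and
# suppresses bright glare that appears in individual frames.
patches = np.array([
    [[200,  40, 200]],   # frame 1: clean
    [[255,  40, 200]],   # frame 2: glare on the first pixel
    [[200, 120, 255]],   # frame 3: washed-out stroke
], dtype=np.uint8)
integrated = patches.min(axis=0)   # minimum operator -> [[200, 40, 200]]

# Result fusion: run OCR on every frame, keep the most frequent reading.
readings = ["12345", "12845", "12345", "12345", "I2345"]
best, votes = Counter(readings).most_common(1)[0]   # "12345", 3 votes
```

Note the difference in where the integration happens: the minimum operator fuses pixels before recognition, while histogram voting fuses the recognition results themselves.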

Multiple frame integration (MFI) methods in general can be divided into two approaches, image enhancement and result fusion methods. They are illustrated in the following image.

MFI Approaches

Georg tried both: two image enhancement methods (the simple minimum operator as well as a contrast-increasing integration method introduced by Yi) and a result fusion method, which performs a simple histogram-based voting and chooses the recognition result with the highest number of occurrences.

To evaluate the impact from these MFI approaches on the actual recognition results, the great Anyline SDK was utilized.

The impact from MFI approaches on the recognition results

For the evaluation, the use case of energy meter reading was assumed and a proper dataset was created. To compare the detection and tracking performance with full-detection approaches, Georg utilized the CLEAR MOT evaluation framework. It provides meaningful and intuitive measurements that describe the performance of a multi-object tracker.
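The central CLEAR MOT score, MOTA (Multiple Object Tracking Accuracy), aggregates misses, false positives and identity switches over all frames. A quick sketch of the standard formula (the variable names are ours, not from the thesis):

```python
def mota(misses, false_positives, id_switches, ground_truth_objects):
    """MOTA = 1 - (misses + false positives + ID switches)
                  / (total ground-truth objects over all frames).
    All arguments are per-frame lists of counts."""
    errors = sum(misses) + sum(false_positives) + sum(id_switches)
    total_gt = sum(ground_truth_objects)
    return 1.0 - errors / total_gt

# Toy run: 3 frames with one ground-truth object each, one miss in frame 2
score = mota(misses=[0, 1, 0], false_positives=[0, 0, 0],
             id_switches=[0, 0, 0], ground_truth_objects=[1, 1, 1])
# score ≈ 0.667
```

A perfect tracker scores 1.0; the score can go negative when errors outnumber the ground-truth objects.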

Further, the recognition results were compared with ground-truth data. The individual text patches (left) were passed to the OCR engine, as were their integrated counterparts (right).

Single Frames vs. Respective Integrations

The Conclusion

Even though the minimum operator and the Yi integration are able to increase the recognition rates (see image), they depend heavily on good image registration (ECC in the figure). In all cases, however, the histogram voting delivered the best results.

OCR Reading Accuracy

The thesis shows that it is possible to perform MFI while achieving real-time performance on mobile hardware. Furthermore, the multi-threaded detection and tracking approach can keep up with full-detection approaches in terms of tracking accuracy. And most importantly, an increase in the recognition rates is possible: result fusion methods are more promising (and practical, if the recognition is fast enough) than the tested image enhancement methods.

We hope you had just as much fun as we had in our first year of the Computer Vision Meetup in Vienna and we hope we will see all of you and your friends at our first 2017 edition on the 25th of January!

Hold a talk yourself!

Do you have a project or topic you would like to talk about, or do you know someone who would like to share their experiences and knowledge at our Computer Vision Meetup? Would you like to sponsor a meetup? Please contact us!
It is great to see how our community is growing each month, so if you haven’t done it already – don’t forget to join our meetup group! ;)

We’ve just recently started live streaming the meetup, we usually post the link once the meetup starts! Please let us know if you have suggestions about how we could improve the meetup, happy to hear your feedback!


If you have questions, suggestions or feedback on this, please don’t hesitate to reach out to us via Facebook, Twitter or simply via [email protected]! Cheers!