Thresholding for Mobile OCR – An Introduction: Part 3
In the first and second parts of our blog post series on thresholding, we presented an overview of different thresholding techniques, including a basic global threshold, the Otsu method, and Adaptive Thresholding.
While these methods are suitable for many use cases, some scenarios require more sophisticated algorithms to properly separate the background and foreground information of a scanned image.
This blog post gives a first introduction to one of the advanced thresholding techniques, namely the Contrast Threshold.
Unlike in the previous blog posts, we will focus more on explaining the general idea behind the algorithm than on implementation details.
But enough introduction for now, let’s get started.
Setting the scene
Picture yourself in a situation where you want to scan something like… a debit card. This could be useful for a lot of things. For example, you could share your account information with a friend, who could then quickly transfer the money for the last round of beer you just paid for.
Unfortunately, the banks issuing debit cards like to put extravagant designs on them, as can be seen in the image below. This happy little debit card is a real headscratcher in terms of thresholding, as we will see in the following sections. But since you are already interested in thresholding, you might have an idea why.
Our example card
For the impatient developers among you who do not care at all about what doesn’t work and why (which is partially true for me as well): you can either get yourself some coffee and wait until we are done with this part, or skip directly to the section where we explain what works best.
Okay, you are still here. Great.
In the following part we will see that while some of the results look astonishingly good at first glance, there are usually some details that ruin the pretty picture, especially if you consider that the resulting image has to be fed to an OCR engine, which needs clearly distinct character shapes. But then again, maybe we are just too demanding, who knows?!
First, let’s give the global threshold a shot. Maybe we are lucky, this blog post is over, and we can all go home with the profound knowledge that global thresholding is almighty.
Short answer: no. Long answer: still no, but see below.
Global Thresholding, not quite it.
In this case, with a threshold of 140 (which is already the best-fitting value for this image, and for this image only), the M is misshapen, the G in the last name is not fully connected, and the 4 is not really a 4, because it is missing its lower part.
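As a reminder of what is happening here, a global threshold can be sketched in a few lines of NumPy. This is only an illustration: the function name and the tiny sample array are made up for this example, and we assume dark text on a brighter background, so everything below the threshold becomes black.

```python
import numpy as np

def global_threshold(gray: np.ndarray, threshold: int = 140) -> np.ndarray:
    """Pixels darker than `threshold` become black (0), all others white (255)."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)

# Tiny synthetic greyscale patch (hypothetical values, not taken from the card).
sample = np.array([[200,  90, 210],
                   [150, 130, 139],
                   [ 60, 141, 255]], dtype=np.uint8)
binary = global_threshold(sample)
```

Every pixel is compared against the same value, no matter what the background around it looks like. That is exactly why a constantly changing card design causes trouble.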
Okay, that would have been too easy anyway.
The next method on the list is the Otsu method. It shows the same effects as the global threshold, only the problems have become even worse.
The result of the Otsu method. Is that HATTHIAS?
I guess you can see why. Let’s not waste more time on this.
So, let’s go on with the next method on the list, which is the Adaptive Threshold. At first glance, we get some fairly good results too. The M looks more or less like an M, the G is connected, and the 4 looks like… well, like a 4 is supposed to look.
But then again, it is almost impossible to get rid of the strokes between the symbols. These strokes could be misinterpreted by the OCR engine as symbols, and would therefore ruin our whole process.
Adaptive Threshold, looking good, but all those strokes….
So what now, is that it?
Isn’t the Adaptive Threshold the most powerful thresholding method? Is there any other method that is able to solve that problem? The answers are: No, No, and Of course there is. Actually there are plenty. And quite frankly, this blog post would be rather useless if there weren’t.
And that’s the part where it gets interesting. Because now, after all this waffle about what doesn’t work, we finally get to the point where we talk about what works.
Note: We are aware that with the best and most specialized parameter settings, this card could be thresholded to an acceptable result with the previous methods. But keep in mind that we don’t only need to threshold this one card under these exact lighting conditions. We have to find a universal method that can be used for a variety of situations and cards.
The contrast thresholding technique we present here is an adaptation of the method described by Bolan Su, Shijian Lu, and Chew Lim Tan in Binarization of Historical Document Images Using the Local Maximum and Minimum [http://dl.acm.org/citation.cfm?id=1815351].
We will first give a general overview of the algorithm, and then go into detail about how each step works and what output it produces.
An overview of Contrast Thresholding
First things first: a general overview of the algorithm.
In the first and most important step, a contrast image is created. It uses normalisation to ensure that the resulting contrast is more or less independent of changes in the background of the image.
In the next step, this contrast image is thresholded with the Otsu method. This way, only the so-called high contrast pixels are highlighted in the image. Sounds good already. But it doesn’t stop here.
After that, each pixel in the original (greyscale) image is revisited, and the number of high contrast pixels in a window around it is counted. If that number is above a threshold value, and the pixel value lies within one standard deviation of the mean of the high contrast pixels’ values, it is considered a text pixel.
Yes, it is as simple as that. I have good news for those of you who, like me, didn’t get it on the first reading (I know… I wrote this): we will go into detail now.
For the next steps, we will look closely at two pixels in the image. While the red pixel will try to fool us into thinking it is a text pixel, the blue one actually is one. The challenge is to identify and eliminate the pretender, while keeping the legitimate pixel.
Our example pixels. red = hypocrite, blue = text
Note: The graphical representations in this post are not 100% scientifically accurate. They are used as visual guidance, and would not withstand a scientific validation.
Step 1: Creating the contrast image
- Create an empty image (we’ll call it contrast image for now)
- Walk through the original image, and for every pixel I(x,y):
  - Find the maximum pixel value f_max(x,y) in a window of size WxW around the pixel
  - Find the minimum pixel value f_min(x,y) in a window of size WxW around the pixel
  - Set the value I_c(x,y) in the contrast image to I_c(x,y) = (f_max(x,y) - f_min(x,y)) / (f_max(x,y) + f_min(x,y) + epsilon)
This way, each pixel in the newly created contrast image will represent the contrast in a window around it.
The denominator in the formula above is the part that does all the magic. If you look at our card, the background is changing constantly, while the value of text pixels stays more or less the same. Without the normalisation by the denominator, text on bright background areas would have a much higher contrast value than text on darker background areas. The denominator compensates for this difference in the background.
For text pixels in bright background areas, the denominator is large, which compensates for the large numerator. For text pixels in darker background areas, the denominator is smaller, which reduces the effect of the numerator being small as well.
Oh, and epsilon is really just there to avoid a division by 0. No magic there.
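To make this more tangible, here is a minimal NumPy sketch of the first step. It assumes the contrast is computed as (f_max - f_min) / (f_max + f_min + epsilon), as in the paper by Su et al. The function name, the default window size and the border handling (the border pixels are simply replicated) are our own choices for illustration, not a definitive implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def contrast_image(gray: np.ndarray, window: int = 3, eps: float = 1e-6) -> np.ndarray:
    """Normalised local contrast per pixel; `window` is assumed to be odd."""
    pad = window // 2
    # Replicate the border so that every pixel gets a full WxW window.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    windows = sliding_window_view(padded, (window, window))
    f_max = windows.max(axis=(-2, -1))
    f_min = windows.min(axis=(-2, -1))
    return (f_max - f_min) / (f_max + f_min + eps)

# The same dark "text" pixel (value 40) on a bright and on a darker background.
bright = np.full((5, 5), 200, dtype=np.uint8)
bright[2, 2] = 40
dark = np.full((5, 5), 100, dtype=np.uint8)
dark[2, 2] = 40

c_bright = contrast_image(bright)[2, 2]  # 160 / 240
c_dark = contrast_image(dark)[2, 2]      # 60 / 140
```

The raw differences are 160 and 60, so without normalisation the text on the bright background would have almost three times the contrast. After normalisation the two values are roughly 0.67 and 0.43, which is a lot closer together.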
So much for the math. The good news is that the hardest part is already over. Well done!
Let’s get back to our red and blue pixel, and see how they are doing.
And we can see that, while both of them are still present in the contrast image, the blue pixel on the text boundary is much brighter than the red pixel on the background stroke.
The red and blue pixels with their corresponding windows. Both are still present after the first step.
Step 2: Creating the high contrast image
There is not much to say about this step. Otsu just thresholds the contrast image, and that is it. The result however is quite impressive, as it eliminates lower contrast pixels and leaves only the high contrast pixels.
In our example, the red pixel is gone and the blue pixel shines brighter than ever. It is looking good, but we are not quite there yet.
The red and blue pixels before and after the Otsu threshold.
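For completeness, here is a small sketch of how Otsu could be applied to the contrast image. Since the contrast values are floats rather than 8-bit grey values, this version builds a histogram with 256 bins first. The function names, the number of bins and the handling of a completely flat image are assumptions made for this example.

```python
import numpy as np

def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Threshold that maximises the between-class variance of `values`."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_low = np.cumsum(hist)               # pixels in this bin or below
    weight_high = weight_low[-1] - weight_low  # pixels in the bins above
    sum_low = np.cumsum(hist * centers)
    sum_high = sum_low[-1] - sum_low
    valid = (weight_low > 0) & (weight_high > 0)
    if not valid.any():
        # Flat image: there is nothing to split, so nothing is high contrast.
        return float("inf")
    mean_low = sum_low[valid] / weight_low[valid]
    mean_high = sum_high[valid] / weight_high[valid]
    between = weight_low[valid] * weight_high[valid] * (mean_low - mean_high) ** 2
    # Use the upper edge of the best bin as the threshold.
    return float(edges[1:][valid][np.argmax(between)])

def high_contrast_mask(contrast: np.ndarray) -> np.ndarray:
    """Boolean mask that is True for the high contrast pixels."""
    return contrast >= otsu_threshold(contrast)

# Hypothetical contrast values: a few strong edges among weak background contrast.
contrast = np.array([[0.02, 0.05, 0.60],
                     [0.04, 0.65, 0.03],
                     [0.70, 0.05, 0.01]])
mask = high_contrast_mask(contrast)
```

In this toy example only the three values around 0.6 to 0.7 survive. That is the same effect we see for the red and the blue pixel above.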
Step 3: Could you get to the point, please?
The initial image and the final result
Now that we have created the high contrast image, we put it aside for a moment and switch back to the original image. Back there, each pixel now has to answer some questions.
The questions are:
- Are there any high contrast pixels nearby?
- If so, how many are there?
- And how does my pixel value compare to the high contrast pixels nearby?
Mathematically speaking this means:
- Define a window size W
- Define a threshold N_min, the minimum number of high contrast pixels that have to be within the window WxW
- For each pixel in the original image:
  - Count the number N_e of actual high contrast pixels within WxW
  - If N_e > N_min:
    - Calculate the mean E_mean and the standard deviation E_std of the values of all high contrast pixels in the window (the pixel values in the original image)
    - If the pixel value I(x,y) lies within one standard deviation E_std of E_mean, it is a text pixel, and therefore black
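The steps in the list above could be sketched like this. Please treat it as one possible reading rather than the reference implementation: we interpret "within the standard deviation" as |I(x,y) - E_mean| <= E_std, the function names and default parameters are made up for this example, and the high contrast mask in the usage part is set by hand instead of coming from a real image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_sums(image: np.ndarray, window: int) -> np.ndarray:
    """Sum over a WxW neighbourhood of every pixel (zero padding at the border)."""
    pad = window // 2
    padded = np.pad(image.astype(np.float64), pad, mode="constant")
    return sliding_window_view(padded, (window, window)).sum(axis=(-2, -1))

def classify_text_pixels(gray: np.ndarray, high_contrast: np.ndarray,
                         window: int = 5, n_min: int = 3) -> np.ndarray:
    """Binary image: text pixels are black (0), everything else is white (255)."""
    mask = high_contrast.astype(np.float64)
    values = gray.astype(np.float64)
    n_e = window_sums(mask, window)   # number of high contrast pixels per window
    safe_n = np.maximum(n_e, 1.0)     # avoids a division by zero in empty windows
    e_mean = window_sums(mask * values, window) / safe_n
    e_var = window_sums(mask * values ** 2, window) / safe_n - e_mean ** 2
    e_std = np.sqrt(np.maximum(e_var, 0.0))
    # Assumption: "within the standard deviation" means |I - E_mean| <= E_std.
    is_text = (n_e > n_min) & (np.abs(values - e_mean) <= e_std)
    return np.where(is_text, 0, 255).astype(np.uint8)

# Toy example: a dark vertical stroke (value 50) on a bright background (200).
gray = np.full((5, 5), 200, dtype=np.uint8)
gray[:, 2] = 50
high_contrast = np.zeros((5, 5), dtype=bool)
high_contrast[:, 2] = True   # the stroke itself
high_contrast[0, 1] = True   # two bright pixels that also ended up
high_contrast[4, 3] = True   # in the high contrast image
result = classify_text_pixels(gray, high_contrast)
```

In this toy example only the stroke in the middle column turns black. The two bright pixels have plenty of high contrast neighbours, but their own value is too far away from the local mean, so they stay white.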
And…we are done. That is all about the theory.
Let us have a look at our hypocrite pixel and first check how many high contrast pixels are within its window. If we are kind and define a low threshold N_min, it might have enough high contrast pixels in its window. But the pixel value itself is rather bright, so it fails that test. There is no way around it: this pixel is not part of a character or number or anything similar. So long and thanks for all the fish!
The blue pixel is doing a lot better. First, it has many high contrast pixels in its window, so it would even survive a strict N_min threshold. Second, its pixel value is darker than the red one, so it passes this test as well. After this, it is considered a text pixel and therefore black in the resulting image.
A closeup on the final result. The red pixel was eliminated, but the blue pixel is still there.
Are we done now, please?
We are almost done with our blog post for today.
As can be seen in the image above, the Contrast Threshold produces impressive results compared to the previous three thresholding methods when used on more difficult images with a noisier background.
By taking the normalized contrast of an image into account, the algorithm is mostly robust against background changes. This makes it a very good candidate for the debit card example we have seen here.
On the downside, it takes more time to compute the final image than, for example, the Adaptive Threshold. And there are more parameters that have to be fine-tuned in order to achieve a good result.
We may revisit this thresholding technique in a later post, to show some possible enhancements. But I think that is enough for today.
I hope you enjoyed reading it and will be back soon for further insights on the Anyline Dev blog.