Thresholding for Mobile OCR: An Introduction – Part 2
Last week we gave you an introduction to Binary, Truncate & To Zero Thresholding, which we hope you found useful! This blog post dives a little deeper into the thresholding topic with Otsu Thresholding and Adaptive Thresholding. So let’s get started!
Otsu’s method of thresholding, named after Nobuyuki Otsu who first published this thresholding method in 1979, is used to automatically perform clustering-based image thresholding.
But what does that mean?
In global thresholding, one arbitrary value is used as the threshold. So in order to get a good result image, we need to find the right threshold value, which is basically a trial and error process. Since we want an automated thresholding algorithm, we need a better method to find the right threshold.
Consider a bi-modal image, i.e. an image whose histogram has two peaks (aka clusters). A good threshold value for such an image is a value between those two peaks, and that is exactly what the Otsu method finds: it automatically calculates a threshold value from the image histogram of a bi-modal image.
[Figure: Otsu threshold image; the red line marks the threshold]
In case you are interested in more detailed information on how Otsu thresholding works, continue reading. Otherwise skip this section and continue directly with the code examples.
Variance is a measure of region homogeneity, which means regions with high homogeneity will have a low variance. Otsu’s algorithm searches for the threshold that minimizes the intra-class variance. In order to do so, one has to consider all possible thresholds and compute the variance for each of the two classes of pixels (i.e., the class below and above threshold).
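Computing the intra-class variance of both classes for every possible threshold involves a lot of computation, but luckily there is a much faster way. If the intra-class variance $\sigma_w^2(t)$ is subtracted from the total variance $\sigma^2$ of the combined distribution, the so-called inter-class variance is the result. In the usual notation:

$$\sigma_b^2(t) = \sigma^2 - \sigma_w^2(t) = \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2$$

Here $\omega_0(t)$ and $\omega_1(t)$ are the fractions of pixels below and above the threshold $t$, and $\mu_0(t)$ and $\mu_1(t)$ are the mean gray levels of the two classes. Since the total variance does not depend on the threshold, minimizing the intra-class variance is the same as maximizing the inter-class variance, which only requires the class weights and class means. With $p(i)$ being the fraction of pixels with gray level $i$ and $L$ the number of gray levels, the class weights are

$$\omega_0(t) = \sum_{i=0}^{t-1} p(i), \qquad \omega_1(t) = \sum_{i=t}^{L-1} p(i) = 1 - \omega_0(t)$$

while the class means are computed like:

$$\mu_0(t) = \frac{1}{\omega_0(t)} \sum_{i=0}^{t-1} i\,p(i), \qquad \mu_1(t) = \frac{1}{\omega_1(t)} \sum_{i=t}^{L-1} i\,p(i)$$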
Otsu’s method has a few limitations to keep in mind:
- The method assumes that the histogram of the image is bi-modal.
- It breaks down when the two classes are very unequal (i.e. differ greatly in size), which can result in two maxima for the inter-class variance.
- In that case, the correct maximum is not necessarily the global one.
- The selected threshold should correspond to a valley of the histogram.
- The method does not work well with variable illumination.
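To make the search for the best threshold more concrete, here is a minimal sketch in plain C++ without OpenCV. It builds the histogram of an 8-bit image and picks the threshold with the largest inter-class variance. The function name `otsuThreshold` and the convention that gray levels below t form one class and gray levels of t and above form the other are assumptions of this sketch; OpenCV's own implementation may differ in such details.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration: returns the threshold t (0..255)
// that maximizes the inter-class variance w0 * w1 * (mu0 - mu1)^2.
// Class 0 holds gray levels < t, class 1 holds gray levels >= t.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    // Histogram of the 256 possible gray levels
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;  // sum of all gray levels in the image
    for (int i = 0; i < 256; ++i) sumAll += static_cast<double>(i) * hist[i];

    double count0 = 0.0;  // number of pixels in class 0
    double sum0 = 0.0;    // sum of gray levels in class 0
    double bestVar = -1.0;
    int bestT = 0;

    for (int t = 1; t < 256; ++t) {
        // Move gray level t-1 from class 1 into class 0
        count0 += static_cast<double>(hist[t - 1]);
        sum0 += static_cast<double>(t - 1) * hist[t - 1];

        const double count1 = total - count0;
        if (count0 == 0.0 || count1 == 0.0) continue;  // one class is empty

        const double mu0 = sum0 / count0;
        const double mu1 = (sumAll - sum0) / count1;
        const double w0 = count0 / total;
        const double w1 = count1 / total;
        const double interClassVar = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);

        // Keep the first threshold that reaches the maximum
        if (interClassVar > bestVar) {
            bestVar = interClassVar;
            bestT = t;
        }
    }
    return bestT;
}
```

Because the class counts and sums are updated incrementally, every candidate threshold is evaluated in a single pass over the histogram instead of recomputing both class variances each time.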
To execute Otsu thresholding with OpenCV, pass an additional flag (THRESH_OTSU) to the threshold() function, combined with one of the five threshold types explained in the previous section. Simply pass 0 as the threshold value; it is ignored anyway. The algorithm will then find the optimal threshold value, which is returned as a value of type double. For maxValue it is possible to pass any non-zero value. This value will be assigned to every pixel greater than the threshold value. In this example we used 255 to get a black and white binary image.
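    #include <opencv2/opencv.hpp>
    using namespace cv;

    // Read the image as 8-bit grayscale
    Mat src = imread("threshold.png", IMREAD_GRAYSCALE);
    Mat dst;

    // Otsu thresholding: the threshold argument (0) is ignored,
    // the threshold found by the algorithm is returned
    double thresh = threshold(src, dst, 0, 255, THRESH_BINARY | THRESH_OTSU);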
[Figure: Otsu threshold image; the red line marks the threshold]
In the previous algorithms we used one global threshold to binarize the image, which works fine if you have a relatively uniform background. However, a single threshold will not work well if there is a large variation in the background intensity due to shadows or the direction of illumination.
In that case it is better to use Adaptive Thresholding (aka local, dynamic or areal thresholding).
[Figure: input image and the result of binary thresholding with thresh = 100]
The idea of this algorithm is to partition the image into smaller sub-images and then calculate a different threshold for each sub-image. This approach might lead to sub-images having simpler histograms which will usually generate better results for images with uneven illumination.
OpenCV provides a function to perform adaptive thresholding:
    void cv::adaptiveThreshold(
        cv::InputArray src,    // input image (8 bit, single channel)
        cv::OutputArray dst,   // result image
        double maxValue,       // the maximal (non-zero) value that can be assigned to output
        int adaptiveMethod,    // adaptive thresholding algorithm (see Table 2)
        int thresholdType,     // use THRESH_BINARY or THRESH_BINARY_INV only
        int blockSize,         // size of pixel neighborhood, e.g. 3, 5, 7, 9, etc.
        double C               // constant subtracted from mean or weighted mean,
                               // usually positive but may be 0 or negative as well
    );
There are two methods to calculate the threshold from the blockSize * blockSize neighborhood (Table 2):
- ADAPTIVE_THRESH_MEAN_C: The threshold value T(x,y) is the mean of the blockSize * blockSize neighborhood of pixel (x,y) minus a constant value C.
- ADAPTIVE_THRESH_GAUSSIAN_C: The threshold value T(x,y) is a weighted mean of the blockSize * blockSize neighborhood of pixel (x,y) minus a constant value C. The pixel values closer to the center of the neighborhood have a higher weight when calculating the mean value.
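The mean-based variant can be sketched in a few lines of plain C++ without OpenCV. The `GrayImage` struct and the function `adaptiveMeanThreshold` are hypothetical helpers for illustration only, and handling the image border by repeating the nearest edge pixel is an assumption of this sketch, not a statement about how OpenCV does it.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper type: 8-bit grayscale image stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // width * height gray values
};

// Mean-based adaptive threshold (sketch):
// T(x,y) = mean of the neighborhood around (x,y) minus c.
// Pixels greater than T(x,y) become maxValue, all others become 0.
// blockSize is expected to be odd (3, 5, 7, ...).
GrayImage adaptiveMeanThreshold(const GrayImage& src, std::uint8_t maxValue,
                                int blockSize, double c) {
    const int radius = blockSize / 2;
    const int side = 2 * radius + 1;  // actual window width and height
    GrayImage dst{src.width, src.height,
                  std::vector<std::uint8_t>(src.data.size(), 0)};

    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            // Sum the neighborhood; coordinates outside the image are
            // clamped to the nearest edge pixel (assumption of this sketch).
            double sum = 0.0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), src.height - 1);
                    const int xx = std::min(std::max(x + dx, 0), src.width - 1);
                    sum += src.data[yy * src.width + xx];
                }
            }
            const double localThreshold = sum / (side * side) - c;
            if (src.data[y * src.width + x] > localThreshold) {
                dst.data[y * src.width + x] = maxValue;
            }
        }
    }
    return dst;
}
```

This direct version recomputes every neighborhood sum from scratch, which is fine for understanding the idea but slow for large images or block sizes; a production implementation would typically use a box filter or an integral image instead.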
    #include <opencv2/opencv.hpp>
    using namespace cv;

    // Read the image as 8-bit grayscale
    Mat src = imread("threshold.png", IMREAD_GRAYSCALE);
    Mat dst;

    // Set maxValue, blockSize and c (constant value)
    double maxValue = 255;
    int blockSize = 9;
    double c = 41;

    // Adaptive threshold
    adaptiveThreshold(src, dst, maxValue, ADAPTIVE_THRESH_GAUSSIAN_C, THRESH_BINARY, blockSize, c);
The following table shows the results of applying adaptive thresholding on the input image with different values.
|blockSize = 5, c = 41|blockSize = 7, c = 41|blockSize = 9, c = 41|
So far we only discussed thresholding based on grayscale images. However, it is also possible to threshold color images. This approach is called multilevel, multiband or simply multi thresholding and gradually gains more relevance with the increasing number of color documents. One approach is to designate a separate threshold for each of the RGB channels and then combine them with an AND operation.
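As a rough illustration of the per-channel approach, the following plain C++ sketch thresholds each RGB channel separately and combines the three results with a logical AND. The `Rgb` struct, the function name `thresholdRgbAnd` and the choice of "greater than" as the comparison are assumptions made for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pixel type for this sketch.
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Thresholds every channel with its own value and ANDs the results:
// a pixel is set to maxValue only if all three channels exceed their
// respective thresholds, otherwise it is set to 0.
std::vector<std::uint8_t> thresholdRgbAnd(const std::vector<Rgb>& pixels,
                                          const Rgb& thresholds,
                                          std::uint8_t maxValue = 255) {
    std::vector<std::uint8_t> mask;
    mask.reserve(pixels.size());
    for (const Rgb& p : pixels) {
        const bool aboveAll = p.r > thresholds.r &&
                              p.g > thresholds.g &&
                              p.b > thresholds.b;
        mask.push_back(aboveAll ? maxValue : 0);
    }
    return mask;
}
```

With thresholds of 100 for all three channels, a light gray pixel (200, 200, 200) ends up white in the mask, while a pixel with a weak green channel such as (200, 50, 200) ends up black.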
The RGB color model reflects the way the camera works and how the data is stored, but it does not correspond to the way that people recognize color. Therefore the HSL, HSV or CMYK color models are used more often; they mostly require more sophisticated thresholding algorithms, which results in higher computational complexity.
These approaches are rather complicated and would be too extensive for this blog post but don’t hesitate to contact us if you have any questions!
This was our introduction to thresholding for mobile OCR. We hope we were able to give you a good and concise overview of this topic. Stay tuned for more!