Get the Right Number: Image Processing for Object Counting

Thursday, August 7th, 2014

Automated counting applications for production lines are designed to track, identify, separate and count products and other objects in a bounded image area, and to provide fast and highly accurate results. Many of today’s systems follow traditional approaches to image processing, which lowers their efficiency and accuracy. The automated counting system developed by A-Grade IT’s dedicated development team for a bakery factory is designed to locate, identify and count objects in the input images we get from video processing. One of the major advantages of this system is that it is capable of separating touching products prior to counting, so the results are highly accurate: the counting accuracy reaches 99.5%.

Today’s post from one of our image processing experts describes the problems of the traditional approach, introduces a new approach that eliminates them, and touches upon other areas where it could be applied to identify and count objects.

Generally speaking, image processing is any form of signal processing whose input is an image, such as a photo or a video frame. The counting problem in image/video processing is the estimation of the number of objects in a still image or video frame. It arises in many real-world applications, including cell counting in microscopic images and crowd monitoring in surveillance systems; in our case, it is the problem of counting baked products on production lines.

The initial version of the production counting system was designed and developed in MATLAB and had certain limitations:

  1. Rather low counting accuracy, which dropped even further when the bakery products were touching or lay in uneven rows on the production line.
  2. Poor optimization: the system required too many resources.
  3. A limit of only 4 video channels processed at once.

So, how can we solve these problems and get an accurate result?


Fig.1. Montage image built from the first 25 video frames

I. Traditional image processing approach

Here we cover object counting with the traditional approach to image processing and the challenges we would face if we followed it.

The first thing we need to do is separate the foreground from the background. The easiest way would be color segmentation, but then we need to resolve the problem of uneven illumination. To avoid it, it is better to apply adaptive thresholding with Otsu’s segmentation algorithm, which is enough for simple cases, on one of the RGB channels (the red channel in Fig.2). As a result we get a binary image, where every pixel equals 1 if it belongs to an object and 0 otherwise. This binary image will contain many artifact objects of small area (about 1-10 pixels), which is why we have to remove them using median filtering or the methods of mathematical morphology. In our case, for counting objects on the production line, the morphological opening operation is the better choice.
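The thresholding and cleaning steps can be sketched in a few lines of plain Python. This is only an illustration, not the production code: the image is assumed to be a nested list of 0..255 integers from a single channel, the names `otsu_threshold` and `opening` are our own, and the opening uses a 3x3 cross as the structuring element, with pixels outside the image treated as background. A real system would more likely rely on an image processing library.

```python
def otsu_threshold(channel):
    """Return the level that maximizes the between-class variance (Otsu)."""
    hist = [0] * 256
    for row in channel:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * n for level, n in enumerate(hist))
    best_level, best_var = 0, -1.0
    count_dark, sum_dark = 0, 0          # class of pixels with value <= level
    for level in range(256):
        count_dark += hist[level]
        sum_dark += level * hist[level]
        count_bright = total - count_dark
        if count_dark == 0 or count_bright == 0:
            continue                      # one class is empty, nothing to compare
        mean_dark = sum_dark / count_dark
        mean_bright = (sum_all - sum_dark) / count_bright
        var_between = count_dark * count_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_level = var_between, level
    return best_level


CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # 3x3 cross element


def _morph(img, erode):
    """Erode (all neighbors set) or dilate (any neighbor set) with CROSS."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx]
                    if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dy, dx in CROSS]
            out[y][x] = int(all(vals)) if erode else int(any(vals))
    return out


def opening(img):
    """Morphological opening: erosion followed by dilation."""
    return _morph(_morph(img, erode=True), erode=False)


# Synthetic example: dark background, one 3x3 product and one 1-pixel speck.
channel = [[10] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        channel[y][x] = 200
channel[5][5] = 200

level = otsu_threshold(channel)
binary = [[1 if v > level else 0 for v in row] for row in channel]
cleaned = opening(binary)   # the speck at (5, 5) is removed
```

Note that the opening also rounds off the corners of the product, which is the usual price for removing small artifacts this way.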

At this stage it is crucial to separate all objects into individual 4-connected areas. We define a 4-connected area as a set of pixels in which any two pixels can be linked by a chain of pixels from the same set, where every step of the chain goes to a northern, southern, eastern or western neighbor.

Now that we have a cleaned binary image that corresponds to the given frame, we move on to counting the objects themselves. If we had only one video frame, we could use a wave algorithm to count the products. However, we need to count them over a period of time, so every object will be present in several consecutive frames. Therefore, we have to count every object that crosses a certain imaginary line perpendicular to the direction of movement.
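For a single binary frame, the counting step could look like the sketch below. We read "wave algorithm" here as a breadth-first flood fill, which is an interpretation on our part; the function name `count_objects` and the nested-list image format are assumptions made for the example.

```python
from collections import deque


def count_objects(binary):
    """Count 4-connected areas of 1-pixels with a breadth-first wave."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            count += 1                      # a new, not yet visited area
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            while queue:                    # spread the wave over the area
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Two blobs that touch only diagonally are separate 4-connected areas.
frame = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
objects_in_frame = count_objects(frame)
```

With 4-connectivity, products that touch only at a corner are still counted separately, which is why this neighborhood is preferred over the 8-connected one here.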


Fig.2. Image processing:

a) source color image; b) red channel of the source image; c) Otsu threshold applied; d) cleaned binary image

The main challenge for the traditional approach is to identify every object and track it to determine whether or not it has crossed the given line. In other words, instead of counting objects we need to track object trajectories. To do that, we can apply one of the standard tracking techniques, such as the Kalman filter, but this requires large computational resources and the results are not that accurate. For example, a change in illumination causes errors in tracking or in recapturing objects. These are clear disadvantages of the standard approach.

Sometimes rectangular areas of interest are suggested instead of lines. The rectangle height is greater than the object height, which allows us to count the objects that are completely inside the area. Still, this method works only for counting similar objects that lie in a single row. Therefore, we have to look for another way to count the objects on the production line.



Fig.3. A line of counting (a line of interest)

II. Our approach to image processing for object counting

Let’s consider the video as a sequence of frames arranged like a stack of paper sheets. As mentioned before, we have to count every object that crosses the given line. Therefore, only the pixels that lie on this line in each frame need to be taken into account. That is, we need to consider the cross-section of the frame stack along the line. This method is known as the slit-scan camera algorithm and is widely used in sports for registering the winner crossing the finish line, as well as for artistic purposes.

How does it work?

1) Over a certain period of time we simply copy the pixels along the given line from every frame and append them to the resulting image. To keep this simple, we should place the camera so that it records objects moving strictly horizontally or vertically. In this case it is enough to copy a column (or a row, respectively) from every frame.
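The formation of the resulting image can be sketched as follows. The function `slit_scan`, its parameters and the synthetic frames are illustrative assumptions; frames are nested lists here, and each copied line becomes one row of the resulting image, so time runs from top to bottom.

```python
def slit_scan(frames, line_index, axis="row"):
    """Build the resulting image from one row (or column) of every frame."""
    result = []
    for frame in frames:
        if axis == "row":
            line = list(frame[line_index])               # objects move vertically
        else:
            line = [row[line_index] for row in frame]    # objects move horizontally
        result.append(line)                              # one line per frame
    return result


def make_frame(obj_row, height=4, width=5, obj_col=2):
    """Synthetic frame with a single bright pixel at (obj_row, obj_col)."""
    frame = [[0] * width for _ in range(height)]
    if 0 <= obj_row < height:
        frame[obj_row][obj_col] = 255
    return frame


# The object moves down by one row per frame and crosses row 2 in frame 2.
frames = [make_frame(t) for t in range(4)]
resulting_image = slit_scan(frames, line_index=2)
```

Only one line of pixels per frame is kept, so the data volume that the later steps have to process is a small fraction of the original video.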


Fig.4. Slit-scan image formation

From the object counting perspective, the resulting image is equivalent to the source video and can be considered its hash, and it is quite a peculiar thing.


Fig.5. Resulting image

2) The binary image is built from the resulting image in the way described above. Now it is easy to estimate the background as the mean or median of each column of the resulting image, because every column shows the same point of the line at different moments in time. To clean the binary image and separate the objects we can use the usual methods.
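A minimal sketch of the per-column background estimate is shown below. The median is taken down each column of the resulting image; the second function, which marks pixels that differ from the background by more than a `tolerance`, is our own illustrative assumption and not necessarily how the production system builds its binary image.

```python
from statistics import median


def column_background(result):
    """Estimate the background as the median of every column."""
    width = len(result[0])
    return [median(row[x] for row in result) for x in range(width)]


def foreground_mask(result, background, tolerance=30):
    """Mark pixels that differ from the column background (assumed rule)."""
    return [[1 if abs(value - background[x]) > tolerance else 0
             for x, value in enumerate(row)]
            for row in result]


# Unevenly lit line (100, 120, 140) with a dark object passing column 1.
resulting_image = [
    [100, 120, 140],
    [100, 20, 140],
    [100, 20, 140],
    [100, 120, 140],
    [100, 120, 140],
]
background = column_background(resulting_image)
mask = foreground_mask(resulting_image, background)
```

Because the background is estimated separately for each column, illumination that varies along the line does not disturb the result, as long as each point of the line shows background in more than half of the rows.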


Fig.6. Image processing:

a) resulting color image; b) red channel of the resulting image; c) Otsu threshold applied; d) cleaned binary image

3) Now we have to find the number of 4-connected areas using a wave algorithm.


Fig.7. Labeled 4-connected areas

And that wraps it up! We have a solution for counting objects on production lines.

In practice, though, we face some additional problems, such as stitching the resulting images. It is necessary to take into account the objects that lie on the border between two resulting images. This task is also solved efficiently using mathematical morphology.
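The post does not spell out the morphological stitching procedure, so the sketch below swaps in a different and simpler technique to illustrate the problem: objects that touch the last row of a resulting image are treated as unfinished, and their pixels are carried over and glued to the top of the next image. All names (`label_components`, `count_strip`) are hypothetical, and the images are assumed to be binary nested lists of equal width with time running from top to bottom.

```python
from collections import deque


def label_components(binary):
    """Return the 4-connected areas as lists of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def count_strip(strip, carry):
    """Count objects that end inside this strip; return (count, new_carry)."""
    image = carry + strip                    # glue unfinished rows on top
    last_row, width = len(image) - 1, len(image[0])
    finished, open_pixels = 0, []
    for pixels in label_components(image):
        if any(y == last_row for y, _ in pixels):
            open_pixels.extend(pixels)       # may continue in the next strip
        else:
            finished += 1
    if not open_pixels:
        return finished, []
    top = min(y for y, _ in open_pixels)
    new_carry = [[0] * width for _ in range(last_row - top + 1)]
    for y, x in open_pixels:                 # keep only the unfinished objects
        new_carry[y - top][x] = 1
    return finished, new_carry


strip_a = [[1, 0, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 1, 0]]                     # object in column 2 is cut off
strip_b = [[0, 0, 1, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]

total, carry = 0, []
for strip in (strip_a, strip_b):
    done, carry = count_strip(strip, carry)
    total += done
```

Counting each image independently would report four objects in this example, because the object cut by the border is seen twice; with the carry-over it is counted once. An object that is still unfinished when the stream ends stays in `carry` and has to be handled separately.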


Fig.8. Image stitching example:

a, b) two sequential resulting images, parts to be stitched are marked by color; c) result of the stitching


The main goal of this article is to show a new, promising approach to implementing an automated counting system for production lines. What are the main advantages of the suggested approach?

  1. Low computational cost
  2. High accuracy: the counting accuracy reaches 99.5%, as the system is capable of separating touching products in uneven rows
  3. Flexible and universal solution: our automated counting system can be applied to counting other manufactured goods on production lines, including pharmaceutical products, food and beverages, cans, parts and components, etc.

Overall, the system developed with this approach proved to be simple, flexible and highly accurate, and it could be applied to solve problems in other areas, such as road traffic identification and monitoring. Here is a little sneak peek into our future post, which touches upon video processing and how we can apply it to road traffic analysis.



Fig.9. Example of video processing for road traffic:

a) a video frame with the red line as the line of interest; b) resulting image; c) background image estimation; d) detected foreground objects, namely cars; other objects were disregarded

If you found this interesting or have questions for our expert Alex M., you can ask them on our social media channels listed below.