Last time we tried to count the number of white pixels on a black image. Using a classification approach was fundamentally limiting the counter to the number classes (eg the number of output neurons). To get round this limitation I replaced the output layers with one output node + ReLU activation.


A simple network trained on the numbers 0 to 9 is able to predict the numbers 0 to 59.

What is we make the images bigger? How high can we count? I tried one 3×3 filter (same padding) followed by three successive 3×3 filters with stride 3×3, which quickly reduced the dimensions down to a small flattened layer. This did ok, but was hardly the 100% accuracy I demand!


Mostly convolutional CNN counts to within a few pixels of the correct answer.

Of course as this is a trivial problem we could cheat:


AveragePooling + one output node = 100% accuracy!

Next time: A harder problem, where average pooling won’t work.