I aim to train a CNN (convolutional neural network) to count (let’s say up to 100), starting with counting the number of white pixels on a black image. I start by making a classifier, trained on images that contain 1, 2, 3, 4 or 5 white pixels.
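A minimal sketch of how training images like these could be generated. It assumes NumPy and a 28×28 image size, which the post does not state. `make_image` and `make_dataset` are illustrative names, not the code actually used.

```python
import numpy as np

def make_image(n_white, size=28, rng=None):
    """Return a size x size black image (zeros) with n_white white pixels (ones)."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros(size * size, dtype=np.float32)
    # Sample positions without replacement so no two white pixels coincide.
    idx = rng.choice(size * size, size=n_white, replace=False)
    img[idx] = 1.0
    return img.reshape(size, size)

def make_dataset(n_images, max_count=5, size=28, seed=0):
    """Return (images, counts) with counts drawn uniformly from 1..max_count."""
    rng = np.random.default_rng(seed)
    counts = rng.integers(1, max_count + 1, size=n_images)
    images = np.stack([make_image(int(c), size, rng) for c in counts])
    return images, counts

# 12,800 images, matching the training set size mentioned in the post.
X, y = make_dataset(12800)
```

Here the label `y[i]` is simply the number of white pixels in `X[i]`, so the pixel sum of each image can be used as a sanity check on the labels.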
Results after training for 10 epochs on 12,800 images are pretty good – y is the true value (the label), y_hat is the prediction.

Classifier trained on 12,800 images. These are from the 500 image test set. Test accuracy: 100%
Sadly this does less well on a test set that includes higher numbers (below). Dr R suggested I adopt a “1, 2, 3, 4, many” approach, which would give good accuracy but which I think would be a bit unsatisfying.

Performance less impressive on higher numbers.
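For illustration, the “1, 2, 3, 4, many” idea amounts to collapsing every count above 4 into a single extra class. A rough sketch, assuming NumPy and integer labels; the class index `MANY = 5` and the function name `to_many_label` are hypothetical choices, not something from the post.

```python
import numpy as np

MANY = 5  # hypothetical class index meaning "more than 4"

def to_many_label(counts, many=MANY):
    """Map true counts to labels 1, 2, 3, 4, or `many` for anything larger."""
    return np.minimum(np.asarray(counts), many)

labels = to_many_label([1, 4, 5, 17, 100])
```

With this mapping the classifier is never asked to tell 17 from 100, which is why accuracy would look good and also why it says little about actual counting.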
Network architecture
I used one 3×3 convolutional filter, then a couple of dense layers. I expected the filter to converge to one high value surrounded by low values, the perfect shape for picking out white pixels surrounded by dark ones. This wasn’t what I found, as shown by these three examples, which have been stretched so that abs(max(W)) = 100.
I guess that it doesn’t really matter which convolution you use, provided you understand the output!
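One way to see why the filter shape might not matter much: for a purely linear "full" convolution (no bias, no activation), the sum of the output equals the sum of the filter times the sum of the input, so the count is recoverable from any filter whose weights do not sum to zero. The sketch below checks this with NumPy. The kernel values and the helper `conv2d_full` are made up for illustration, and the real network (with its activation, padding and dense layers) will not behave exactly like this.

```python
import numpy as np

def conv2d_full(img, kernel):
    """'Full' 2-D convolution built from shifted, scaled copies of the image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + h, j:j + w] += kernel[i, j] * img
    return out

# Black image with 7 white pixels at random positions (size is an assumption).
rng = np.random.default_rng(0)
img = np.zeros(28 * 28)
img[rng.choice(28 * 28, size=7, replace=False)] = 1.0
img = img.reshape(28, 28)

# An arbitrary filter that looks nothing like "one high value, low surround".
kernel = np.array([[0.3, -1.2, 0.5],
                   [2.0, 0.1, -0.7],
                   [-0.4, 0.9, 0.6]])

out = conv2d_full(img, kernel)
# Each white pixel stamps one copy of the kernel into the output,
# so dividing the output sum by the kernel sum recovers the count.
estimated = out.sum() / kernel.sum()
print(round(estimated, 6))
```

With "valid" padding, pixels near the border contribute only part of the kernel, so the relationship is approximate there; the dense layers would have to absorb that.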
Network used: