Last time we tried to count the number of white pixels on a black image. Using a classification approach fundamentally limited the counter to the number of classes (i.e. the number of output neurons). To get round this limitation I replaced the output layers with a single output node + ReLU activation.
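In Keras terms the swap looks something like this (a sketch; the post doesn't show the code, and `num_classes` comes from the original 1–5 classifier):

```python
from tensorflow.keras import layers

num_classes = 5  # the original classifier only knew the counts 1-5

# Classification head: the counter can never answer above num_classes.
clf_head = layers.Dense(num_classes, activation="softmax")

# Regression head: a single ReLU node emits a non-negative count,
# with no upper bound baked into the architecture.
reg_head = layers.Dense(1, activation="relu")
```

ReLU is a natural fit for the output here, since counts are non-negative.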
What if we make the images bigger? How high can we count? I tried one 3×3 filter (same padding) followed by three successive 3×3 filters with stride 3×3, which quickly reduced the dimensions down to a small flattened layer. This did OK, but was hardly the 100% accuracy I demand!
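As a sketch, that architecture might look like this in Keras (the post doesn't give the image size or filter counts; 81×81 inputs and one filter per layer are assumptions, 81 = 3⁴ chosen so the stride-3 layers divide cleanly):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # One 3x3 filter, same padding: 81x81 stays 81x81.
    layers.Conv2D(1, (3, 3), padding="same", activation="relu",
                  input_shape=(81, 81, 1)),
    # Three successive 3x3 filters with stride 3x3 (valid padding).
    layers.Conv2D(1, (3, 3), strides=(3, 3), activation="relu"),  # 81 -> 27
    layers.Conv2D(1, (3, 3), strides=(3, 3), activation="relu"),  # 27 -> 9
    layers.Conv2D(1, (3, 3), strides=(3, 3), activation="relu"),  # 9 -> 3
    layers.Flatten(),                    # the "small flattened layer" (9 units)
    layers.Dense(1, activation="relu"),  # the single-output regression head
])
model.compile(optimizer="adam", loss="mse")
```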
Of course, as this is a trivial problem, we could cheat:
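Since the white-pixel count of a binary image is just its sum, a single global-average-pooling layer rescaled by the pixel count solves the problem exactly, with no training at all (a sketch, assuming 0/1 pixels and the same 81×81 size as above):

```python
import numpy as np
from tensorflow.keras import layers, models

SIZE = 81  # assumed image size, as above

# The cheat: for binary {0, 1} images the white-pixel count is the sum
# of all pixels, i.e. the global average pool times the pixel count.
cheat = models.Sequential([
    layers.GlobalAveragePooling2D(input_shape=(SIZE, SIZE, 1)),
    layers.Lambda(lambda x: x * SIZE * SIZE),  # mean -> sum = count
])

# Sanity check on an image with 42 white pixels.
img = np.zeros(SIZE * SIZE, dtype=np.float32)
img[:42] = 1.0
print(cheat.predict(img.reshape(1, SIZE, SIZE, 1)))  # ~[[42.]]
```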
Next time: A harder problem, where average pooling won’t work.
I aim to train a CNN (convolutional neural network) to count (let’s say up to 100), starting with counting the number of white pixels on a black image. I start by making a classifier, trained on images that contain 1, 2, 3, 4 or 5 white pixels.
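Generating the training data is straightforward (a sketch; the post doesn't give the image size, so 32×32 is an assumption, and the 12,800 figure comes from the training run below):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_image(n_white, size=32):
    """Black image with exactly n_white randomly placed white pixels.
    (32x32 is an assumed size; the post doesn't say.)"""
    img = np.zeros(size * size, dtype=np.float32)
    img[rng.choice(size * size, n_white, replace=False)] = 1.0
    return img.reshape(size, size, 1)

# 12,800 training images containing 1-5 white pixels, labelled by count.
counts = rng.integers(1, 6, size=12800)
X = np.stack([make_image(int(c)) for c in counts])
y = counts
```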
Results after training for 10 epochs on 12,800 images are pretty good – y is the true value (the label), y_hat is the prediction.
Sadly this does less well on a test set that includes higher numbers (below). Dr R suggested I adopt a “1, 2, 3, 4, many” approach, which would give good accuracy, but I think would be a bit unsatisfying.
I used one 3×3 convolutional filter, then a couple of dense layers. I expected the filter to converge to one high value surrounded by low values: the perfect shape for picking out white pixels surrounded by dark ones. This wasn't what I found, as shown by these three examples, which have been stretched so that max(abs(W)) = 100.
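For reference, here is a sketch of that classifier in Keras (the single 3×3 conv filter, the couple of dense layers, and the five count classes come from the post; the layer widths, padding, and 32×32 input are assumptions):

```python
from tensorflow.keras import layers, models

clf = models.Sequential([
    # One 3x3 convolutional filter (whose learned weights are
    # discussed above).
    layers.Conv2D(1, (3, 3), padding="same", activation="relu",
                  input_shape=(32, 32, 1)),
    layers.Flatten(),
    # A couple of dense layers, ending in a five-way softmax.
    layers.Dense(32, activation="relu"),
    layers.Dense(5, activation="softmax"),  # classes: 1-5 white pixels
])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

# Counts 1-5 map to class indices 0-4 (X, y as generated above):
# clf.fit(X, y - 1, epochs=10)
```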
I guess that it doesn’t really matter which convolution you use, provided you understand the output!