Summary of ImageNet Classification With Deep Convolutional Neural Networks


Until recently, datasets of labeled images were relatively small — on the order of tens of thousands of images. Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets.
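The "label-preserving transformations" mentioned above are augmentations that change the pixels but not the class, such as mirroring or cropping. A minimal numpy sketch of the idea (the function name and the specific transforms here are illustrative, not the paper's exact pipeline):

```python
import numpy as np

def label_preserving_augment(image, crop_size, rng):
    """Apply two classic label-preserving transformations:
    a random crop and a random horizontal flip.
    `image` is an H x W x C array."""
    h, w, _ = image.shape
    # Random crop: pick a top-left corner so the crop fits inside the image.
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    patch = image[top:top + crop_size, left:left + crop_size]
    # Horizontal flip with probability 0.5 (mirroring keeps the label valid).
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch

rng = np.random.default_rng(0)
img = np.arange(6 * 6 * 3).reshape(6, 6, 3)
out = label_preserving_augment(img, crop_size=4, rng=rng)  # a 4 x 4 x 3 patch
```

Each call produces a slightly different view of the same image, which effectively multiplies the size of the training set.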


They drew on ImageNet, a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories; the ILSVRC competitions they report on use a subset of about 1.2 million training images across 1,000 categories. The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool.


Their network achieves top-1 and top-5 test set error rates of 37.5% and 17.0%. The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2%, with an approach that averages the predictions produced by six sparse-coding models trained on different features.
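For clarity on what these metrics mean: the top-k error is the fraction of test images whose true label is not among the model's k highest-scoring classes. A small numpy sketch (the function name and the toy scores are my own illustration):

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the
    k highest-scoring classes. scores: N x C, labels: length N."""
    # Column indices of the k largest scores in each row.
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = np.any(topk == labels[:, None], axis=1)
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.7, 0.2],   # predicts class 1 (correct)
                   [0.5, 0.3, 0.2],   # predicts class 0 (label is 2)
                   [0.3, 0.1, 0.6]])  # predicts class 2 (label is 0)
labels = np.array([1, 2, 0])
err1 = top_k_error(scores, labels, 1)  # 2 of 3 argmax predictions miss
err2 = top_k_error(scores, labels, 2)  # row 3's label is in its top 2
```

Top-5 error is the standard ILSVRC metric because many images legitimately contain several plausible objects.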


Their results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. They still have many orders of magnitude to go in order to match the infero-temporal pathway of the human visual system.

Personal Notes

It’s interesting to see the application of the convolution and pooling methods, and also very pleasant to see what we’ve learned about optimization put to use through dropout and data augmentation techniques. Even if there are some concepts I still need to understand properly, this article clarifies them: it serves as a concrete example of how to use convolution on complex problems in realistic settings. I’m hoping to see more examples like this at Holberton School.


