Artificial intelligence is gaining a better sense of perspective. Like a person who can read someone else’s penmanship without studying lots of handwriting samples, next-gen image recognition AI can more easily identify familiar sights in new situations.
Made from a new type of virtual building block, called capsules, these programs may cut down the enormous amount of data needed to train current image-identifying AI. And that could boost such technology as machine-made medical diagnoses, where example images may be scarce, or the responsiveness of self-driving cars, where the view is constantly shifting. Researchers with Google will present this new version of an artificial neural network at the Neural Information Processing Systems conference in Long Beach, Calif., on December 5.
Neural networks are webs of individual virtual nerve cells, or neurons, that learn to pick out objects in pictures by studying labeled example images. These networks largely classify pictures based on whether they contain certain features. For instance, a program trained on a series of head shots might conclude that a face has two eyes, a nose and a mouth. Show that program a face in profile with only one eye visible, though, and it may not recognize the photo as a face, explains Roland Memisevic, a computer scientist at the University of Montreal who was not involved in the work.
To overcome that limitation, researchers can train a neural network on millions of photos from myriad angles, and the program memorizes all the different ways a face might look. Compared with the human brain, which doesn’t need anywhere near a million examples to know what a face looks like, this system is wildly inefficient. “It’s a disaster,” Memisevic says. “Capsules try to fix that.”
Instead of webs of individual artificial neurons, these new programs have webs of clusters of neurons, called capsules. These teams of neurons can provide more information than one neuron by itself. Each capsule is designed to track not only whether a certain feature is in an image, but also properties of that feature — say, a nose’s size, orientation and position. This spatial awareness helps the program better recognize objects in previously unseen scenarios.
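The idea can be sketched in a few lines of code. In the capsules approach, each capsule outputs a vector rather than a single number: the vector’s direction encodes a feature’s properties (such as orientation and position), while its length encodes the probability that the feature is present. A “squash” nonlinearity keeps that length between 0 and 1 without changing the direction. The sketch below is illustrative only, not Google’s implementation; the example inputs are made up.

```python
import numpy as np

def squash(v):
    """Scale vector v so its length lies in (0, 1), preserving its direction.

    A long input vector (a confident feature detection) keeps a length
    near 1; a short one (a weak detection) is pushed toward 0.
    """
    norm_sq = np.sum(v ** 2)
    scale = norm_sq / (1.0 + norm_sq)   # maps length to (0, 1)
    return scale * v / np.sqrt(norm_sq)

# A strong activation (input length 5) squashes to a length near 1 ...
strong = squash(np.array([3.0, 4.0]))
# ... while a weak activation (input length 0.1) squashes to a length near 0.
weak = squash(np.array([0.06, 0.08]))

print(np.linalg.norm(strong))  # ~0.96: feature very likely present
print(np.linalg.norm(weak))    # ~0.01: feature very likely absent
```

Because the direction survives the squashing, the capsule still reports *how* the feature appears (its pose) alongside *whether* it appears, which is exactly the extra spatial information the text describes.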
A capsule-containing network trained on head shots could see a face in profile and deduce — based on the appearance of the visible eye, nose and mouth — that the other eye is simply obscured, and the picture depicts a face. Since capsule networks are better at applying what they know to new situations, these neural networks need less training data to achieve the same performance as their predecessors, says Sara Sabour, a computer scientist with Google Brain in Toronto.
Sabour and her colleagues trained one capsule network on images of handwritten numbers and tested it on pictures in which each number was slightly distorted. The capsule network recognized the warped images with 79 percent accuracy; a typical neural network trained on the same amount of data got only 66 percent right.
In another experiment, Sabour and colleagues trained a similar capsule network on tens of thousands of photos of toys, and then asked it to recognize the toys from new viewpoints. In this challenge, reported in a paper submitted to the 2018 International Conference on Learning Representations in Vancouver, the capsule network was wrong only about 1.4 percent of the time. A conventional neural network made almost twice as many errors.