Convolutional Networks in Visual Environments.
The puzzle of computer vision might find new challenging solutions when we realize that most successful methods are working at image level, which is remarkably more difficult than processing directly visual streams. In this paper, we claim that their processing naturally leads to formulate the motion invariance principle, which enables the construction of a new theory of learning with convolutional networks. The theory addresses a number of intriguing questions that arise in natural vision, and offers a well-posed computational scheme for the discovery of convolutional filters over the retina. They are driven by differential equations derived from the principle of least cognitive action. Unlike traditional convolutional networks, which need massive supervision, the proposed theory offers a truly new scenario in which feature learning takes place by unsupervised processing of video signals. It is pointed out that an opportune blurring of the video, along the interleaving of segments of null signal, make it possible to conceive a novel learning mechanism that yields the minimum of the cognitive action. Basically, while the theory enables the implementation of novel computer vision systems, it is also provides an intriguing explanation of the solution that evolution has discovered for humans, where it looks like that the video blurring in newborns and the day-night rhythm seem to emerge in a general computational framework, regardless of biology.
Publisher URL: http://arxiv.org/abs/1801.07110