François Chollet: The limitations of Deep Learning. (17 July 2017.)

Chollet gives an insightful geometric explanation of the internal workings of DL that I had not fully internalized:

At their core, deep neural nets still ‘merely’ implement smooth geometric transformations of the input data, composed layer by layer through the depth of the network. On some level, I ‘knew’ this. However, Chollet’s explanation helped me understand what fundamental limitations this imposes on the functions or programs that can be learned with DL.
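(A minimal sketch of that point, in plain numpy with random, untrained weights chosen purely for illustration: the whole forward pass is nothing but a chain of affine maps each followed by a smooth pointwise nonlinearity.)

import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # One layer: an affine transformation (rotate/scale/shift the
    # input space), then a smooth pointwise squashing (tanh).
    return np.tanh(x @ W + b)

# Three layers of arbitrary sizes; weights are random, not trained.
shapes = [(4, 8), (8, 8), (8, 2)]
params = [(rng.normal(size=s), rng.normal(size=s[1])) for s in shapes]

x = rng.normal(size=(1, 4))   # a single input point
for W, b in params:
    x = layer(x, W, b)        # the entire net is just this chain
print(x)                      # the input, geometrically transformed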

(See the metaphor about taking a sheet of paper and crumpling it into a very complicated ball.) How this “crumpling” is done is guided by the training examples, and only by them.

To learn any robust function, for example one that classifies images to labels, we need a stunning number of samples. In more precise language, we need a dense (representative) sampling of the space of images we want to classify, because DNNs generalize only in the vicinity of their training data points and (moreover) can generalize only in a quite specific (limited) way: by local interpolation between samples they have seen.
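(A quick, hedged way to see this “generalization by interpolation” concretely; this sketch assumes TensorFlow/Keras is installed, and the architecture and hyperparameters are arbitrary choices of mine, not from Chollet’s post. A small net fits sin(x) well inside the densely sampled training interval and fails outside it, because it interpolates between seen points rather than recovering the underlying function.)

import numpy as np
from tensorflow import keras

# Dense sampling of sin(x) on [-2*pi, 2*pi] only.
x_train = np.random.uniform(-2 * np.pi, 2 * np.pi, size=(2000, 1))
y_train = np.sin(x_train)

model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=100, verbose=0)

# Inside the sampled region: interpolation works well.
x_inside = np.linspace(-2 * np.pi, 2 * np.pi, 100).reshape(-1, 1)
# Outside it: no nearby training points, so the fit breaks down.
x_outside = np.linspace(2 * np.pi, 6 * np.pi, 100).reshape(-1, 1)

print("MSE inside training range: ",
      np.mean((model.predict(x_inside, verbose=0) - np.sin(x_inside)) ** 2))
print("MSE outside training range:",
      np.mean((model.predict(x_outside, verbose=0) - np.sin(x_outside)) ** 2))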

(I was also surprised how this blog post of Chollet’s (written yesterday!), which I noticed by chance on a totally unrelated forum, is very related to the speculative musings (fi) I wrote yesterday: most importantly, it explains why some of my thoughts were probably misguided. There are bounds on what kinds of abstractions can be achieved by merely composing ‘geometric transformations’ (as Chollet calls them). Mainly, the features that can be learned probably can’t be very abstract, in the human cognitive sense of the word. In particular, the kinds of features the later layers in a visual-processing DNN “chain” will learn are very much influenced by the programmer’s choices of 1) the architecture (i.e., how the hypothesis space is restricted; a DNN is less of a black box than it may sound) and 2) the precise nature of the target space.)