Tuesday, May 31, 2016

Deep Learning: Not as good, not as bad as you think.

Deep Learning is a new-ish (the roots go back to, let's say, 1990, but it's been common only since about 2005) ML method for identification (categorization, function learning), used mostly in vision and NLP.

Deep Learning is a label given to traditional neural nets that have many more internal nodes than ever before, usually designed in layers to feed one set of learned 'features' into the next.
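To make that "layers feeding layers" idea concrete, here's a minimal sketch in plain NumPy of a forward pass through a small stack of layers. The sizes and random weights are made up for illustration, and there's no training here, just the wiring:

```python
import numpy as np

def layer(x, W, b):
    # One fully connected layer: a linear map plus a nonlinearity.
    # Its output is the set of learned 'features' fed to the next layer.
    return np.maximum(0, x @ W + b)  # ReLU nonlinearity

rng = np.random.RandomState(0)
x = rng.randn(784)  # e.g. a flattened 28x28 image

# Three stacked layers; all sizes are arbitrary for illustration.
W1, b1 = rng.randn(784, 256) * 0.01, np.zeros(256)
W2, b2 = rng.randn(256, 128) * 0.01, np.zeros(128)
W3, b3 = rng.randn(128, 10) * 0.01, np.zeros(10)

h1 = layer(x, W1, b1)   # low-level features
h2 = layer(h1, W2, b2)  # higher-level features built from h1
out = h2 @ W3 + b3      # final scores, one per category
```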

There's a lot of hype:

Deep Learning is a great new method that is very successful.

but

Deep Learning has been overhyped.

and even worse:

Deep Learning has Deep Flaws

but

(Deep Learning's deep flaws)'s deep flaws

Let's look at details.

Here's the topology of a vision deep learning net:

[figure: layered topology of a vision deep learning net (from Eindhoven)]
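In code, a topology like that might look something like the following sketch, written against the Keras API mentioned later in this post. The layer counts, filter sizes, and input shape are all assumptions for illustration, not the figure's exact net:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A typical vision topology: stacks of convolution + pooling layers
# that learn visual features, followed by fully connected layers
# that turn those features into a category prediction.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),  # e.g. 10 output categories
])
```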

Yann LeCun, "What's missing from deep learning?":

1. Theory
2. Reasoning, structured prediction
3. Memory (short-term/working/episodic memory)
4. Unsupervised learning that actually works

So given all that, what is it? Is DL a unicorn that will solve all our ML needs? Or is DL an overhyped fraud?

With all such questions, the truth is somewhere between the two extremes; we just have to figure out which way it leans.

Yes, there is a lot of hype. It feels like whatever real-world problem there is, world hunger or global warming, DL will solve it. That's just not the case. DLs are predictive model machines, very good at learning a function (given lots of training data). The output may be a yes or no, or even a continuous value, but either way it takes an input and gives an output that's likely to be right or close to right. Not all real-world problems fit that mold (parts of them surely do, but that's not 'solving' the real-world problem).
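As a concrete illustration of "learning a function", here's a hedged sketch, again using Keras, that fits a small net to a continuous function (sin, chosen purely for illustration; every size and setting here is an assumption):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Lots of training data: (input, output) pairs sampled from the function.
X = np.random.uniform(-3, 3, size=(10000, 1))
y = np.sin(X)

model = Sequential([
    Dense(64, activation='relu', input_dim=1),
    Dense(64, activation='relu'),
    Dense(1),  # continuous output: a regression, not a yes/no
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# The net now gives outputs 'likely to be close to right' on new inputs.
print(model.predict(np.array([[1.0]])))  # should be near sin(1.0) ~ 0.84
```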

Also, DLs take a lot of tweaking and babysitting. There are lots of parameters (number of nodes, topology of layers, learning methods, special gimmicks like autoencoding, convolution, LSTM, etc., each with lots of params of its own), as sketched below. And much of the engineering that has made DLs successful isn't specific to DL at all: more and better data, better software environments, super fast computing hardware, and so on.
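To give a feel for how many knobs there are, here's a hypothetical (and far from exhaustive) search space. Every name and value below is an assumption for illustration, not a recommendation:

```python
# A tiny slice of the choices you end up babysitting.
search_space = {
    'n_layers':        [2, 3, 5, 10],
    'nodes_per_layer': [64, 128, 256, 1024],
    'topology':        ['dense', 'convolutional', 'LSTM', 'autoencoder'],
    'activation':      ['relu', 'tanh', 'sigmoid'],
    'optimizer':       ['sgd', 'momentum', 'adam'],
    'learning_rate':   [0.1, 0.01, 0.001],
    'batch_size':      [32, 128, 512],
    'dropout':         [0.0, 0.25, 0.5],
}

# Even this toy grid is 4 * 4 * 4 * 3 * 3 * 3 * 3 * 3 = 15,552 configurations.
```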

However, there are few methods nowadays that are as successful across broad applications as DL. They really are very successful at what they do and I expect lots of applications to be improved considerably with a DL.

Also, for all the tweaking and engineering that needs to be done (as opposed to the comparatively out-of-the-box implementations of regression, SVMs, and random forests), there are all sorts of publicly available tools that make the tweaking much easier: Caffe, Theano libraries like Keras or Lasagne, Torch, Nervana's Neon, CGT, or Mocha in Julia.
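With Keras, for instance, the knobs from the search space above can become a few function arguments. This is just a sketch under assumed parameter names and an assumed input size, not a recipe:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(n_layers=3, nodes=128, activation='relu',
                dropout=0.25, optimizer='adam'):
    # Turn a handful of hyperparameter choices into a trainable net.
    model = Sequential()
    model.add(Dense(nodes, activation=activation, input_dim=100))  # input_dim assumed
    for _ in range(n_layers - 1):
        model.add(Dense(nodes, activation=activation))
        model.add(Dropout(dropout))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=optimizer, loss='categorical_crossentropy')
    return model

# Retrying a different configuration is a one-line change:
model = build_model(n_layers=5, nodes=256, optimizer='sgd')
```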


So there are lots of problems with DLs. But they're the best we have right now and do stunningly well.
