I'm not intentionally trying to piss people off ("How dare you imply that we are not as good as those other guys") but I suppose some things might be provocative and arguable. All generalizations are false but a dog with three legs is still a dog ("Are you calling me a dog? How dare you!"). Isn't the point here really that stats and ML have quite a bit in common? Also, I use 'data' as a mass noun "the data is consistent with an increase in effect". Like 'water', I use it grammatically as singular. So there.
Knowledge doesn't come to us in a package; it is discovered piece by piece, following the path of least resistance, with no overarching systematic plan to fill out. Afterwards, the stories are made coherent, clean, and oversimplified for the textbooks. Also, different people in different academic cultures may explore the same things but with different basic tools. Some people call themselves X, some call themselves Y, and they both do Z. But X and Y never communicate, not because they are competitors but because their motivations, their culture, even the buildings they are housed in on campus, are so very different that they just aren't aware of each other's existence.
Statistics started in the 1800's with government and economic numbers, then sociology (Quetelet), and then at the beginning of the 1900's with agronomy (Fisher), before it exploded into every natural science (medicine, psychology, econometrics, etc.). Though it started from applications, the mathematics behind it (I blame Pearson?) came from mathematical analysis (all those normal curves and beta distributions are special functions of analysis). Everyday statistics is making hypotheses, doing a t-test, p-values, maximum likelihood estimators, Gamma distributions. The point of statistics is to take a lot of data and say one or two small things about it (x is better than y).
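To make the "everyday statistics" concrete, here is a minimal sketch of a two-sample t statistic (Welch's version) in plain Python; the samples are made-up toy numbers, not real data:

```python
import math
import statistics

def welch_t(xs, ys):
    """Welch's two-sample t statistic: how far apart are the group
    means, in units of the standard error of their difference?"""
    nx, ny = len(xs), len(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    se = math.sqrt(vx / nx + vy / ny)  # standard error of the mean difference
    return (statistics.mean(xs) - statistics.mean(ys)) / se

# Toy samples: group a's mean looks higher than group b's.
a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [4.2, 4.4, 4.1, 4.3, 4.5]
print(welch_t(a, b))
```

A large |t|, compared against a t distribution with the appropriate degrees of freedom, is what produces a small p-value, which becomes the "x is better than y" conclusion.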
ML (machine learning), very distinctly, came out of the cybernetics/AI community, a mix of electrical engineers and computer scientists, each with their own subcultures but closer to each other than either is to statistics. The mathematics behind ML came out of numerical analysis and industrial engineering: decision trees and linear algebra, linear programming. Everyday ML is neural networks and SVMs. The point of ML is to engineer automatic methods that take lots of data (like pixels in a picture or a sound pattern) and convert it to a label (what the picture is) or a text sequence.
The cultural overlap is basic data munging, data visualization, and logistic regression.
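Since logistic regression is the shared territory, here is a tiny sketch of it fit by stochastic gradient ascent in plain Python; the one-feature toy data and learning rate are my own arbitrary choices:

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by stochastic gradient ascent
    on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            w += lr * (y - p) * x  # gradient of log-likelihood w.r.t. w
            b += lr * (y - p)      # ...and w.r.t. b
    return w, b

# Toy data: label is 1 when x is above about 2.
xs = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

Same model, different questions: a statistician would read off the coefficients and their significance; an ML person would only care about predict()'s accuracy on held-out data.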
I think the primary social difference (which leads to a few technical differences) is the following. Stats is much older and has tried to solve a few problems very, very well. Statisticians try to take as little data as possible (because they were historically constrained computationally) and determine knowledge. A lot of statistical consulting is judging the study design, determining what can be known with what probability, and what assumptions (like prior distributions) restrict what can be known with what reliability. ML is much newer and expects lots of computational power. It often overlooks lessons learned by stats.
But then stats is a bit held back by its insistence on blind rigor. ML is creating techniques that are very successful without worrying about the foundations, about what a p-value is a probability of, or whether it is a probability at all.
What they actually do
Statistics is the science of the analysis of data: mean and standard deviation (descriptives, what the data looks like), distributions (e.g. normal, Chi-squared, Gamma, Poisson), p-values, hypothesis testing, type I/II errors, t-tests and ANOVA, regression and general linear models. Its foundations are probability theory, which is applied measure theory, which is applied analysis (distributions turn out to be mostly special functions). Concerns: significance, p-values, confidence intervals, power analysis, correct interpretation of data and inferences. There are principles.

Machine Learning is almost entirely methods for solving prediction problems. Instead of a human looking through a set of data and eyeballing what the pattern is, let the algorithm look at way more instances than is humanly possible to get the pattern. Most of the methods are ad hoc: neural networks, naive Bayes, SVMs, decision trees, random forests. There are no principles. Sorry, there is not the depth of principles that statistics has, except when it borrows those principles.
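As a sketch of the ML "just predict" style, here is a one-node decision tree (a decision stump) in plain Python; the single feature and toy labels are invented for illustration:

```python
def best_stump(xs, ys):
    """Find the threshold t minimizing the training errors of the
    rule 'predict 1 if x > t': the simplest decision tree."""
    best_t, best_err = None, len(ys) + 1
    for t in sorted(xs):  # only observed values need be tried
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy data: class 1 sits above x = 2 or so.
xs = [1.0, 1.2, 2.0, 3.1, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, errors = best_stump(xs, ys)
```

No distributional assumptions, no p-value for the threshold: just a rule that predicts. That is the ad hoc flavor; a random forest is, roughly, a pile of deeper versions of this, voting.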
Misnomers
Both labels are misnomers. Statistics sure is used to study states and governments, but is overwhelmingly the province of (a very weird subset of) mathematics.
Machine Learning does include some learning techniques (in the Active Learning area, where real-time data feeds supply and modify the model), but is primarily a relabeling of Pattern Recognition (which is a more accurate name, somewhat closer to what the prediction methods over complex models actually do, a pattern in general being a very specific kind of model).
View from the outside
From the outside, statisticians are consultants for the research community: for agronomy, econometrics, medicine, psychology, any academic science or applied version that takes in a lot of data. (Interestingly, it is the softer sciences like psychology and sociology that send their grad students to the statistics department for instruction; the physicists and chemists, though they may individually run a regression or two, don't usually depend on a statistician. Maybe they think they know enough to do it themselves?) Either way, ML people make more money; I don't know why.

In industry (applied)
Statisticians are employed for quality control; taking samples of products and calculating error rates is their primary act as working statisticians. ML people are more directly part of creating machines that do things in a fancy way, building things that work, like an assembly line robot for cars or a zip code reader for handwritten mail.

In academia
Statistics is concentrated in an academic statistics department (often attached to a mathematics department or ag school) or in a group of consultants for agronomists or medical research.

ML is concentrated in the AI section of a CS department or sprinkled throughout the engineering departments (robotics in MechE; EE, which does everything!), or, in real life, in lots of industries: speech recognition, text analytics, vision.
Of course, there are some individuals who probably consider themselves in both camps (Breiman, Tibshirani, Hastie; what about Vapnik?).
Controversies
So far this has mostly been about what the differences are, with the controversy lying in the tension between assuming the two are the same and showing where the cultures make them different. This section, instead, is about the controversies within each.

In statistics, there has been a great internal controversy between frequentism and Bayesianism. Frequentism is, for lack of a better way of saying it, the traditional p-value analysis. Bayesianism avoids p-values somewhat, with the added controversial notion of allowing an assumed prior distribution set by the experimenter.
Less controversial though is the tension between descriptive statistics (or data exploration) and hypothesis testing.
ML is mostly Bayesian by default, since assumptions are rarely made about the distribution (or any investigation done whatsoever into the effects of the distribution), and MCMC (Markov chain Monte Carlo) is a standard tool. The biggest controversy is between rule-based learning and stochastic learning. The success of neural networks in the mid 80's (and the success of 'Google' methods in the 2000's) has largely killed rule learning, except for maybe decision tree learning and association rules.
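For concreteness, here is a minimal random-walk Metropolis sampler, the simplest MCMC method, in plain Python; the standard-normal target and step width are my own toy choices:

```python
import math
import random

def metropolis(log_density, start, steps=20000, width=1.0, seed=42):
    """Random-walk Metropolis: sample from a density known only up to
    a normalizing constant, via accept/reject of local proposals."""
    random.seed(seed)
    x, samples = start, []
    for _ in range(steps):
        proposal = x + random.uniform(-width, width)
        # Accept with probability min(1, p(proposal) / p(x)),
        # done in log space for numerical stability.
        if math.log(random.random()) < log_density(proposal) - log_density(x):
            x = proposal
        samples.append(x)
    return samples

# Target: a standard normal, specified only by its unnormalized log-density.
draws = metropolis(lambda z: -0.5 * z * z, start=0.0)
sample_mean = sum(draws) / len(draws)
```

The Bayesian connection: the unnormalized density is typically prior times likelihood, and the chain's samples stand in for the posterior you can't integrate analytically.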
Notation
Usually stats is the old fogie and ML is the uncultured upstart, but in mathematical notation it is different. ML, coming out of engineering, uses more traditional mathematical notation. Though nominally more closely connected to mathematical practice, statistics uses a bizarre overloading of notation that no one else in math uses: for probabilities, distributions, vectors, and matrices. Every element has multiple meanings; context barely tells you the right reading.

Random notes
- ML is almost entirely about prediction; in stats there's quite a bit else besides that.
- ML is almost entirely Bayesian (implicitly). Explicit Bayesianism came out of stats. Frequentism, traditional statistics, is what most applied statistics uses.
- Stats is split into descriptive and inferential, meaning either simplify the entirety of some data into a few representative numbers, or judge whether some statement is true. Descriptive work creates patterns/hypotheses; inferential work judges how good those patterns/hypotheses are.
- Predictions vs comparisons: ML is almost entirely predictive. Stats spends a lot of time on comparisons (is one set different from another; is the mean (central tendency) of one set significantly different from that of another?).
- Leo Breiman also explained a distinction between algorithmic and data modeling, which I think maps mostly to ML and stats respectively.
How they're the same
I consider ML to be an intellectual subset of stats: taking a lot of data and getting a rule out of it, no matter what the application. Whatever things get labeled ML, they really should have a statistical analysis (to be good), and statisticians should be willing to call these methods statistical. So what if they're in different departments.