It's 2015. Google has been in business for almost 20 years (since 1998). Its primary technical model for web search is (still) to split every findable web page into words, build a word/concept vector for each page using statistics from the entire corpus of words, and rank the results of a word search using the links between web pages (the PageRank algorithm). It is primarily a data-driven process rather than an expert-driven one. Of course there are lots of add-ons, hand-tweaking, and special cases, but at the center is an automated process built on which words people have used and what they have linked to.
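The link-ranking half of that description can be sketched with the power-iteration form of PageRank. This is a minimal illustration, not Google's production system: the four-page link graph is made up, the damping factor of 0.85 is the value suggested in the original PageRank paper, and the fixed 50-iteration loop is an arbitrary assumption. The word-vector half is not shown.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share of (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: "c" is linked to by three pages, "d" by none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Nobody hand-labels "c" as important here; its rank comes entirely from who links to it, which is the data-driven point the paragraph makes.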
It's 2015. Almost 60 years ago, Chomsky revolutionized linguistics (admittedly building on trends already under way in structuralism). Though the revolution began in substance with 'Syntactic Structures', he also dealt a mortal wound to an already dying behaviorism with his critique of Skinner's 'Verbal Behavior'. Skinner's thesis was that people learned language by example, by mimicking the thousands of utterances they heard and assimilating their patterns. Chomsky's critique was that this was wrong in many ways. The known languages share a narrow set of commonalities that cannot be explained by the broad possibilities allowed by conditioned responses (fixed action patterns, operant conditioning, stimulus-response). People seemed to learn from negative information, knowing what is not allowed in their language without ever having been taught it. Chomsky's criticisms were so incisive and convincing that behaviorism went out of favor.
Chomsky's linguistics is still like Newtonian mechanics: even if a linguistic Einstein comes along, it will only be a slight correction to the very accurate approximation that is Chomskyan language theory.
But Google. It is so obviously successful (as a company, sure, but more to the point in its search capability). More relevantly, Google Translate is essentially an n-gram analysis: statistics over short runs of words in text that has already been translated. It is really good! Sure, it makes many groan-worthy mistakes on less common languages, but as time goes by it gets more and more successful. It is a poster child for behaviorism. It is behaviorism embodied in a machine. And it works so well.
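To make "n-gram analysis" concrete, here is a toy bigram language model: it only counts which word follows which in a corpus, with no grammar at all, yet it still prefers a fluent word order to a scrambled one. This is a sketch, not Google Translate's actual pipeline; the three-sentence corpus, the `<s>`/`</s>` sentence markers, and the add-one smoothing are assumptions chosen for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(ngrams(tokens, 2))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        # P(word | prev); the +1 keeps unseen word pairs from scoring zero.
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def score(self, sentence):
        """Probability of a whole sentence: the product of its bigram probabilities."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        p = 1.0
        for prev, word in ngrams(tokens, 2):
            p *= self.prob(prev, word)
        return p

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a chair",
]
model = BigramModel(corpus)
fluent = model.score("the cat sat on the rug")      # never seen as a whole sentence
scrambled = model.score("rug the on sat cat the")   # same words, shuffled
print(fluent > scrambled)  # True
```

The fluent sentence wins only because each of its adjacent word pairs was heard before, which is learning "by example" in exactly Skinner's sense.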
The point is that Chomsky won 60 years ago with something that was not exactly rationalism but more like introspection (which had lost out earlier). And now Google is winning with pure, un-introspected descriptivism.
But why? Why is statistical NLP, based on single words or short sequences of words rather than constituent phrases, so successful (at least currently), while syntactic NLP (part-of-speech tagging, parse trees) is, well, not exactly wrong, just not terribly useful?
Maybe it's machine performance and having lots of data.
Maybe the words themselves have a lot of meaning and just being in the same sentence is enough.
Maybe understanding parse trees is very informative but only under toy conditions; the bag of words has the bulk of the information.
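The bag-of-words hypothesis can be illustrated directly. Reducing a sentence to word counts keeps what it is about but throws away who did what to whom, which is exactly the information a parse tree would recover. A minimal sketch, assuming whitespace tokenization and cosine similarity as the comparison; the example sentences are made up.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a sentence to word counts, discarding order and structure."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

s1 = bag_of_words("the dog bit the man")
s2 = bag_of_words("the man bit the dog")
s3 = bag_of_words("stock prices fell sharply today")

print(s1 == s2)                   # True: who bit whom is lost
print(round(cosine(s1, s2), 6))   # 1.0
print(cosine(s1, s3))             # 0.0: no shared words
```

For search and topic matching, the first two sentences really are about the same thing, so the lost structure rarely matters; it matters only when the question is who bit whom.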
Google is killing Chomsky. Well, that's like saying that Popper killed Logical Positivism. He did, sort of, but along with others, and anyway it's not really dead, just the basis for what came afterwards.