Tuesday, September 29, 2015

Impenetrability of values of π

What is π?

Wait... before you spit out digits you memorized for that competition, what does it mean first? It is the ratio of the circumference of a circle to its diameter. That also seems a little weird because it's easy to measure a straight line against a straight line, but not against a weird curve. Rulers just don't work. But there's an easy way around that. Use a string, which is infinitely flexible, just make sure it is taut when measuring.

But it's still kind of weird. Who would've bothered to think to care about a ratio? (OK, it's easy to bother...you can walk around a circular castle, how long is it across?) But now that we've started thinking, π is a constant? That's also weird... why would you care? (well, it helps to simplify things and constants are simpler than variables, so...) It's obviously (once pointed out) more than 3 (inscribed regular hexagon) and less than 4 (circumscribed square). Anyway, it appears in quite a few mathematical places seemingly beyond geometry (eg the normal curve, the analytic continuation of the factorial: Γ(1/2) = sqrt(π)), for no apparent reason (well that's what inscrutability is all about!)



Sure, there are calculations with varying degrees of inscrutability: Archimedes' method (pictured), the arctangent Taylor expansions (the Gregory-Leibniz series π/4 = arctan 1, Machin's π/4 = 4 arctan(1/5) - arctan(1/239)), the continued fraction expansion, the Chudnovsky algorithm, the BBP spigot algorithm, and so on.
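
To make one of those concrete, here is a minimal sketch (my own, not from any textbook) of Machin's formula driven by the arctan Taylor series, using Python's decimal module for extra digits:

    from decimal import Decimal, getcontext

    def arctan_recip(x, digits):
        # arctan(1/x) by its Taylor series: 1/x - 1/(3x^3) + 1/(5x^5) - ...
        getcontext().prec = digits + 10            # a few guard digits
        eps = Decimal(10) ** -(digits + 5)
        power = Decimal(1) / x                     # holds 1/x^(2k+1)
        total = power
        k = 0
        while power > eps:
            k += 1
            power /= x * x
            term = power / (2 * k + 1)
            total += term if k % 2 == 0 else -term
        return total

    def machin_pi(digits=50):
        # pi = 16*arctan(1/5) - 4*arctan(1/239)
        getcontext().prec = digits + 10
        pi = 16 * arctan_recip(Decimal(5), digits) - 4 * arctan_recip(Decimal(239), digits)
        return +pi                                 # unary + rounds to the working precision

    print(machin_pi(50))    # 3.1415926535897932384626433832795...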

So I think I've dispensed with the impenetrability of π. I've made the tiniest of mathematical scratches and there's a long way to go. But I don't want to go there now. I want to explore the impenetrability of the value of π. The 3.1415926535... that value.

How big is π? Right, a little more than 3. But what is the point really? If you're shooting an arrow across the castle, 'a little more than 3' will do. If you're tying a rope across, 3.14 will do (and you'll want some slack anyway, which will wash away any more digits).

The fraction 22/7 is often given as an easy approximation (= 3.142857...). But what's the point? To remember that fraction (3 digits plus where the division goes, or two small numbers) is just as much work as the decimal expansion 3.14. And 22/7 is somewhat impenetrable itself (oh, fourth grade nightmares of fractions! 7 goes into 22 ... argh how many times ... oh 3 plus what's left over .. hmm.. one seventh). Just use 3.14 and cut out all that nonsense.

The next best continued fraction expansion is 355/113. Six digits (slightly mnemonic in repeats, but really somewhat random). It gets you 3.14159292... seven correct digits (yes I'm fudging the rounding). But then do you really want to remember that fraction and have to divide 355 by 113? Yechh. Even plain old division is way more impenetrable than just a list of digits (as long as the calculation of those digits is correct). Essentially, I'm saying that 355 divided by 113 is just as impenetrable (in your head) as 4 arctan 1 (and roughly the same number of calculator button presses).
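
Where do 22/7 and 355/113 come from? They are continued fraction convergents of π. A small sketch (my own illustration; it uses floating point, so only the first several convergents are trustworthy):

    import math
    from fractions import Fraction

    def convergents(x, n):
        # successive continued-fraction convergents h/k of x
        h_prev, h = 1, math.floor(x)     # numerators
        k_prev, k = 0, 1                 # denominators
        yield Fraction(h, k)
        for _ in range(n - 1):
            frac = x - math.floor(x)
            if frac == 0:
                break
            x = 1 / frac
            a = math.floor(x)
            h_prev, h = h, a * h + h_prev
            k_prev, k = k, a * k + k_prev
            yield Fraction(h, k)

    for c in convergents(math.pi, 5):
        print(c, float(c))
    # 3, 22/7, 333/106, 355/113, 103993/33102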

So what do you really need? Say your castle is one kilometer in circumference (oh, I didn't explain why you'd measure around to calculate the diameter rather than the other way... presumably you're in a siege around the castle and you need to shoot arrows all the way across. It's a use case that comes up more often than you'd think). If you measure around with a rope to a precision of within a meter (oh I love decimal), 3 digits of π will do; for a millimeter, 3 more, and already the stretchiness in a kilometer of rope, or the errors in laying a measuring tape a hundred times, is way beyond a millimeter.

If you want to explore the digit patterns in π, then, totally, go for the spigot algorithm (oh yeah if you don't mind them in hex).

If you're measuring the circumference of the observable universe to the precision of the radius of a hydrogen atom (you never know when that might matter), then you really need at most 39 digits (46.6 billion ly radius, so a diameter of about 8.8×10^26 m, over a hydrogen radius of 5.29×10^-11 m ... oh, it looks more like 38 digits is enough (26 + 11 + 1?)).
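
A back-of-the-envelope version of that count (my arithmetic, nothing official): the error in the computed circumference is the error in π times the diameter, so you need Δπ ≤ 5.29×10^-11 m / 8.8×10^26 m ≈ 6×10^-38, a relative error of about 10^-38, i.e. roughly 38 significant digits of π.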

But if you're shooting arrows (or painting a storage tank), just use 'a little over 3' and you're gonna add a slop factor anyway because there's lots of little engineering give and take you have to compensate for.

Well, whichever, measure twice, cut once.

(What would I do if I were programming a Mars lander? Hell, yeah, double precision or more!)

Saturday, September 26, 2015

More comparisons between Statistics and ML

This is a continuation of a post I made about differences between statistics and ML.

I'm not intentionally trying to piss people off ("How dare you imply that we are not as good as those other guys") but I suppose some things might be provocative and arguable. All generalizations are false but a dog with three legs is still a dog ("Are you calling me a dog? How dare you!"). Isn't the point here really that stats and ML have quite a bit in common? Also, I use 'data' as a mass noun "the data is consistent with an increase in effect". Like 'water', I use it grammatically as singular. So there. 

Knowledge doesn't come to us in a package; it is discovered piece by piece, following the path of least resistance, with no overarching systematic plan to fill out. Afterwards, the stories are made coherent and clean, and oversimplified for the textbooks. Also, different people in different academic cultures may explore the same things but with different basic tools. Some people call themselves X, some call themselves Y, they both do Z. But X and Y never communicate, not because they are competitors but because their motivations, their culture, the building they are housed in on campus, are so very different, they just aren't even aware of the other's existence.

Statistics started in the 1800's with government and economic numbers, but then also sociology (Quetelet), and then at the beginning of the 1900's with agronomy (Fisher), before it exploded into every natural science (medicine, psychology, econometrics, etc). Though it started from applications, the mathematics behind it (I blame Pearson?) came from mathematical analysis (all those normal curves and beta distributions are special functions of analysis). Everyday statistics is making hypotheses, doing a t-test, p-values, maximum likelihood estimators, Gamma distributions. The point of statistics is to take a lot of data and say one or two small things about it (x is better than y).

ML (machine learning), very distinctly, came out of the cybernetics/AI community, a mix of electrical engineers and computer scientists, each of which has its own subculture but closer to each other than either is to statistics. The mathematics behind ML came out of numerical analysis and industrial engineering: decision trees and linear algebra, linear programming. Everyday ML is neural networks, SVMs. The point of ML is to engineer automatic methods to take lots of data (like pixels in a picture or a sound pattern) and convert that to a label (what the picture is) or a text sequence.

The cultural overlap is basic data munging, data visualization, and logistic regression.

I think the primary social difference (which leads to a few technical differences) is the following. Stats is much older and has tried to solve a few problems very very well. Statisticians try to take as little data as possible (because they were historically constrained computationally) and determine knowledge. A lot of statistical consulting is judging the study design, determining what can be known with what probability and what assumptions (like prior distributions) restrict what can be known with what reliability. ML is much newer and expects lots of computational power. It often overlooks lessons learned by stats.

But then stats is a bit held back by its insistence on blind rigor. ML is creating techniques that are very successful without worrying about the foundations, about what a p-value is a probability of, or whether it is a probability at all.

What they actually do

Statistics is the science of analysis of data: mean and standard deviation (descriptives, what the data looks like), distributions (eg normal, Chi-squared, Gamma, Poisson), p-values, hypothesis testing, type I/II errors, t-tests and ANOVA, regression and general linear models. Its foundations are probability theory, which is applied measure theory, which is applied analysis (distributions turn out to be mostly special functions). Concerns: significance, p-values, confidence intervals, power analysis, correct interpretation of data and inferences. There are principles.
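
To make 'everyday statistics' concrete, a minimal sketch (my own made-up numbers, not from any real study):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=2.0, size=30)   # say, yields under treatment A
    y = rng.normal(loc=11.5, scale=2.0, size=30)   # yields under treatment B

    t, p = stats.ttest_ind(x, y)                   # two-sample t-test
    print(f"t = {t:.2f}, p = {p:.4f}")             # small p: conclude "y is better than x"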

Machine Learning is almost entirely methods for solving prediction problems. Instead of a human looking through a set of data and eye-balling what the pattern is, let the algorithm look at way more instances than is humanly possible to get the pattern. Most of the methods are ad hoc: neural networks, naive Bayes, SVM, decision trees, random forests. There are no principles. Sorry, there is not the depth of principles that statistics has, except when it borrows those principles.
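
And the matching 'everyday ML' sketch (my own toy, using scikit-learn's bundled digits data): fit a black-box model and see how well it predicts, no p-value in sight:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)            # pixels in -> digit label out
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))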

Misnomers

Both labels are misnomers. Statistics sure is used to study states and governments, but is overwhelmingly the province of (a very weird subset of) mathematics.

Machine Learning does include some learning techniques (in the Active Learning area, where real-time data feeds supply and modify the model), but is primarily a relabeling of Pattern Recognition (which is a more accurate name for these methods of predicting with complex models, a 'pattern' in general being a very specific kind of model).


View from the outside

From the outside, statisticians are consultants to the research community: agronomy, econometrics, medicine, psychology, any academic or applied science that takes in a lot of data. (Interestingly, it is the softer sciences like psychology and sociology that send their grad students to the statistics departments for instruction, but the physicists and chemists don't usually depend on a statistician, even though they may individually run a regression or two. Maybe they think they know enough to do it themselves?) Either way, ML people make more money. I don't know why.


In industry (applied)

Statisticians are employed for quality control. This is their primary act as working statisticians. Taking samples of products, calculating error rate. ML people are more directly part of creating machines that do things in a fancy way, building things that work, like an assembly line robot for cars or zip code reader for handwritten mail.


In academia

Statistics is concentrated in an academic statistics department (often attached to a mathematics department or an ag school) or as a group of consultants for agronomists or medical research.

ML is concentrated in the AI section of a CS department, or sprinkled throughout the engineering departments (robotics in MechE, and EE, which does everything!). Or, in real life, in lots of industries: speech recognition, text analytics, vision.

Of course, there are some individuals who probably consider themselves in both camps (Breiman, Tibshirani, Hastie. What about Vapnik?).

Controversies

So far this has mostly been about the controversial question of what the differences are, with the tension between assuming the two are the same and showing where the cultures make them different. This, instead, is about the controversies within each.

In statistics, there has been a great internal controversy between frequentism and Bayesianism. Frequentism is, for lack of a better way of saying it, the traditional p-value analysis. Bayesianism avoids some of that, with the added controversial notion of allowing an assumed prior distribution set by the experimenter.

Less controversial though is the tension between descriptive statistics (or data exploration) and hypothesis testing.

ML is mostly Bayesian by default, since rarely are assumptions made about the distribution (or any investigation done whatsoever about the effects of the distribution), and MCMC (Markov chain Monte Carlo) methods are everywhere. The biggest controversy is between rule-based learning and stochastic learning. The success of neural networks in the mid 80's (and the success of 'Google' methods in the 2000's) has largely killed rule learning, except for maybe decision tree learning and association rules.


Notation

Usually stats is the old fogie and ML is the uncultured upstart, but in mathematical notation it is the other way around. ML, coming out of engineering, uses more traditional mathematical notation. Though nominally more closely connected to mathematical practice, statistics uses a bizarre overloading of notation that no one else in math uses: for probabilities, distributions, vectors and matrices. Everything has multiple meanings, and context barely tells you which reading is the right one.


Random notes

  • ML is almost entirely about prediction; in stats there's quite a bit besides prediction.
  • ML is almost entirely Bayesian (implicitly). Explicit Bayesianism comes out of stats. Frequentism, traditional statistics, is what most applied statistics uses.
  • Stats is split into descriptive and inferential, meaning either simplify the entirety of some data into a few representative numbers, or judge if some statement is true. Descriptive creates the patterns/hypotheses, and then inferential judges how good those patterns/hypotheses are.
  • Predictions vs comparisons: ML is almost entirely predictive. Stats spends a lot of time on comparisons (is one set different from another, is the mean (central tendency) of one set significantly different from that of another?).
  • Leo Breiman also explained a distinction between algorithmic and data modeling, which I think maps mostly to ML and stats respectively.

How they're the same

I consider ML to be an intellectual subset of stats, taking a lot of data and getting a rule out of it no matter what the application. Whatever things get labeled ML, they really should have a statistical analysis (to be good), and statisticians should be willing to call these methods statistical. So what if they're in different departments. 

Wednesday, September 23, 2015

Where is the universal electronic health record?

It's the 21st century. Where is our universal electronic health record? The one where all the medical knowledge about us individually is viewable by any doctor anywhere. You know, you get a yearly flu vaccine at your local drug store, and show up at the nearby emergency room for a sprained ankle, but when you go to your yearly checkup with your doc near work, they have no idea! Forget about it being possibly available when you're on vacation and get food poisoning and go to a non-local hospital.

In the middle of backest-woods China I can show up at an ATM for cash. On a flight 40,000 feet over the ocean I can get wifi to check on who was in that movie with that actress in that TV show. But in Boston, in the best place to get sick in the world, with every hospital connected with multiple medical schools, and every doctor with an MD and PhD and leader of the field that covers exactly your problem, you still have to, after getting a CT scan, walk down the hall to pick up a CD to physically deliver it yourself to your assigned specialist's office next door, nominally part of the same hospital network, but only financially connected, not electronically (oh, it is electronically connected, just not for that one thing. Oh, and the other things too which you'll have to walk back and get).

What's the point (other than that EHRs suck (and not just for the lack of interoperability))? The point is that the technology, the capability, and the knowledge to implement seamless connection for all electronic health data (images, reports, visits, medlists) was possible in the 70's ... with 60's technology. There is no rocket science here (a little electronics and programming sure). It is about as complex as ATMs. The internet should make things that much easier. But for whatever reason (oh there are reasons) it isn't there.
(that's not Jimmy Carter, it's a made up person for HIPAA compliance)

http://www.theplaidzebra.com/first-manned-mission-to-mars/
It is the year 2015, and there are plans to send people to Mars, so there is no technological reason why an interplanetary health record (IHR) doesn't already exist for use when they show up there. The record of the infection you got training in the desolate arctic landscape of Ellesmere Island. The dosimeter readings while stationed temporarily on the L2 jump-off station. Your monthly wellness-checkup with your PCP (well, remotely).


Right now all you get is your intraoffice electronic health record (that is, within an office, not between). It would work great if your PCP, endocrinologist, and cardiologist all belong to the same practice. Of course they don't. Sometimes you're lucky and a big hospital will be the only center for an area and all docs belong somehow to that one hospital. I'm not saying things are bad everywhere.

Wait. Expletive. I can't go to any local drugstore (again!) to get an over the counter bottle of Sudafed, some batteries for a game controller, and a jug of bleach for my socks without stormtroopers crashing through the windows, hog-tying me, and interrogating me on suspicion for running a meth lab (I mean every time), because I went to another drugstore across town for that very suspicious flu shot. At least somebody can connect systems. I was almost happy that they cared! About me!

Enough idle complaining. My idle blaming is that it is the health care businesses' fault. The docs are doing their job as well as they can. The businesses don't get anything out of making things easier on the patients or docs. I have all sorts of constructive suggestions; it's just that no one likes advice.

Friday, September 18, 2015

Confidence in association rules is identical to conditional probability

There's something that has bothered me for a while. In presentations of association rule learning (as an unsupervised learning/data mining method), the basic principles are:

  • the store - the set of all possible items, eg {milk, bread, eggs, beer, diapers}; d = the number of items
  • transactions - a list of subsets of the items (a transaction = 1 market basket), eg {milk, bread, eggs, beer}; each could be represented by a 0-1 vector of length d. # of transactions = n (and there are at most 2^d distinct possible baskets)
  • itemsets - a subset of items in a transaction, eg {milk, bread, eggs} or {bread, beer}; a k-itemset has k items
  • support - the support count \sigma is the frequency of occurrence of an itemset, eg \sigma({bread}) = 2; the support is the proportion of transactions containing the itemset, s({bread}) = \sigma({bread})/n
  • frequent itemset - an itemset with s >= a given threshold
  • association rule - X -> Y, with X, Y itemsets. The intention is that X implies Y: if X appears in a transaction, Y is likely to appear also
  • support of a rule - s(X -> Y) = \sigma(X \cup Y)/n, the fraction of transactions including both X and Y
  • confidence - c(X -> Y) = \sigma(X \cup Y)/\sigma(X), how often Y appears in transactions that have X

And the various algorithms (brute force, Apriori, Eclat, FP-growth) work on the list of transactions to discover association rules with high confidence. Confidence is the primary concept to be optimized.
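
A toy illustration of those definitions (my own five made-up baskets, not the lecture's example):

    transactions = [
        {"milk", "bread", "eggs", "beer"},
        {"bread", "beer", "diapers"},
        {"milk", "bread", "eggs"},
        {"bread", "eggs"},
        {"milk", "diapers"},
    ]
    n = len(transactions)

    def sigma(itemset):
        # support count: number of transactions containing every item in itemset
        return sum(1 for t in transactions if itemset <= t)

    def support(itemset):
        return sigma(itemset) / n

    def confidence(X, Y):
        # c(X -> Y) = sigma(X u Y) / sigma(X): how often Y shows up when X does
        return sigma(X | Y) / sigma(X)

    print(support({"bread"}))               # 0.8  (4 of the 5 baskets)
    print(confidence({"bread"}, {"eggs"}))  # 0.75 (eggs in 3 of the 4 bread baskets)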

So what is the difficulty? That last definition of confidence. All that buildup with all that new vocabulary, all so straightforward and sensible, but all so new. There's something about ... confidence... that seems so familiar, but the notation... of implies and support .. it's just...

Of course this has been done elsewhere already.

Confidence is simply the conditional probability of Y given X. That's it. In notation:

Pr(Y | X) = Pr( Y and X) / Pr( X )

which is the probability of Y occurring, restricted to the cases where X is already known to have occurred (no temporal order implied). What might be misleading here is 'and' versus 'union'. In the confidence formula we count transactions containing the union of the two itemsets, while in Pr we want the probability of the conjunction of the two events. There is just a little step of manipulating subsets and events here: a transaction containing the items of X unioned with those of Y is exactly the event 'X occurs' conjoined (anded) with 'Y occurs'. The bigger the itemset, the smaller the set of qualifying transactions; the two containments run in opposite directions, a duality.
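
Spelling the rejiggering out with the definitions above:

    c(X -> Y) = \sigma(X \cup Y) / \sigma(X)
              = [\sigma(X \cup Y) / n] / [\sigma(X) / n]
              = Pr(X and Y) / Pr(X)
              = Pr(Y | X)

The middle step just divides both counts by the number of transactions n, turning counts into proportions, i.e. probabilities under the empirical distribution of baskets.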

Just a little rejiggering of notation and a whole set of concepts opens up to help think about the space of association rules.
(from Pier Luca Lanzi, DMTM 2015 - 05 Association Rules)

Robots having an Explosion, but not Cambrian

In my pursuit to eradicate bad analogies, the latest is in a paper "Is a Cambrian Explosion Coming for Robotics?" by Gill A. Pratt in Journal of Economic Perspectives. It's a great paper, outlining reasons for an accelerating increase in the use of robots of all kinds and the technologies responsible for the acceleration, lots of enabling mechanisms (like energy storage improvements, combining learning in the cloud, wireless availability).
                      

But to the metaphor. The Cambrian Explosion is first an explosion of variety, and only secondarily, as an implication, an increase in incidence in the fossil record (lots more fossils). The usual explanation of the increase in fossils is that the newer life forms are more fossilizable, not that there are more individual lives.

Pratt's description of the explosion is not about varieties but about the technologies that will enable existing robots to be better.

I know this is a bit of a cavil because there were more fossils created during the Cambrian than before and could be called an explosion, but the usual provocative point about the Cambrian Explosion was that it was the great new variety that didn't exist before. Before the Cambrian, there were multicellular organisms (and fossil evidence of them), but during the Cambrian, lots of new anatomical structures seemed to appear for the first time (shells, tubes, etc).

The point is that when someone invokes the 'Cambrian Explosion' it should be as a metaphor for diversity, not volume.

Otherwise, excellent article.

Tuesday, September 15, 2015

Driving in China

Just came back from a trip to China. Was driven around a lot, didn't drive myself. I noticed a few differences in driving style. In sum, I felt like I had to close my eyes a lot, which is apparently what the drivers do, too.

In the US, Canada, Europe, even France (!), people follow the rules of the road. They drive on one side of the street, they give pedestrians and cyclists a wide margin, even on dreaded traffic circles among the jostling there are rules of priority.

In China, the first impression is, as a backseat driver, to think 'Holy shit! Stop! you're going to hit that... whew... whoa you barely ran over.. whew... OH MY GOD you're going to kill us all... whew.. ' ad nauseam (literally). And the constant honking. I imagine I would be able to maneuver if only I could think but the incessant honking is so distracting.
(that's really how it looks, but somehow that is not a traffic jam, just normal operating procedure, and cars get through)
But after a week of this, a pattern emerges. Not just the feel for the road, the different rules (and lack thereof), but also the different metarules. First, honking is not a mean thing. In the US, honking is like a rude gesture, an insult, a middle finger to your face. You do not use it unless 1) the light has changed and the person in front of you is an absent-minded enough idiot that they don't realize it's their turn or 2) Holy crap! My brakes are out and I'm coming towards an intersection or 3) Some mf- bastard just effing cut me off! In China, very much to the contrary, honking is a courtesy. Pardon me kind sir, I'm just a little behind you and I'm about to overtake you. I'm right here so be careful and don't swerve into me. Thank you so much!

In bigger cities,  the wider roads have sectioned off parts of the road for bikes and scooters, presumably for safety. But whether these extra lanes are there or not, people on bikes, scooters, cars, trucks, etc will all intertwine.

In the US, there is the metarule, the rule of law, the rule that rules should be followed. Or if they're not followed, a tinge of guilt and a speedy getaway. In China, the metarule is the rule of expediency, the rule that rules are there to guide you but really, I can fit right here at the moment, and look there's a pregnant woman on a scooter, with a young child sitting on her lap, and talking on a cellphone (she's not smoking, that would be crazy), and she's making a left turn across the multi-lane intersection, yes, she can barely zip through before everyone fills the intersection, but oh, she cut across into the right turn lane of the crossing road and through three lanes of assorted vehicles, and left turn success! Some people are confident of what they're doing, some people not so, some people a little faster, others a little slower, but everyone is aware of everyone else and they accommodate. Yes, I made some stuff up here, but just the cellphone. All the rest was faithful. Also, I was on a moped with two others (adults) in city traffic. But I'm here, without PTSD.



Sure, they follow the traffic lights (as opposed to other countries where a stop light is very optional). If nobody is around, sure, they may slide through.

A slight detail that makes all this possible is that people just don't drive that fast. Not much faster than a moped (in traffic). That way everyone has enough time to make space for others and judgements about when to fill in that unoccupied space. On the highway, however, people will drive pretty fast, but there are hardly any cars on the super new clean highways.

There are cops everywhere, at every street corner in their cute little police boxes, but it seems they're not there for traffic but for shopping pedestrians. Also, the policemen seem more like bookish barely-out-of-college age accounting clerks, rather than the usual beefy, sunglassed, terminator-wannabes elsewhere.

Monday, September 14, 2015

Gödel not just good for incompleteness

This theorem is not provable
Kurt Gödel was famous for his incompleteness theorems (GIT), which entirely destroyed Hilbert's program (not really, just changed its direction), changed the face of philosophy of mathematics (probably should have, but frankly not really), and created recursive function theory and proof theory (pretty much).

But he is also well known within logic for many ground-breaking results there.

These results include:

  • the completeness theorem for first-order logic (his doctoral thesis)
  • the relative consistency of the axiom of choice and the generalized continuum hypothesis with the rest of set theory (via the constructible universe)
  • the double-negation translation, embedding classical arithmetic into intuitionistic arithmetic

I suppose there are other things that he did that would have made him famous if it weren't for each one of the above.


Sunday, September 13, 2015

First world problems, living in space, and calculators

Why do we exercise?

(Beware: this is a mix of opinions about space exploration, medical advice, mathematical education, and social commentary, so pardon the whiplash.)

When I say 'we', I mean current first-world medical opinion is that daily exercise is important. Driving in cars, little walking, hyper-sugarfied drinks, large servings at restaurants, obesity, cardiovascular disease: we are bombarded by personal advice and the fitness-industrial complex to exercise, even if you have to drive to the gym. Treatments for rich-people diseases (and being in the first world nowadays allows some measure of curability/treatment, either via surgery or lifestyle changes (diet and exercise)) are just not available in the third world. They're just trying to get by, to make it through the day. They'd love to have the opportunity and control in their lives to eat more than they need or leisure time to rest, instead of having to walk 5 miles to get tainted water (or, in in-between countries, to only get tainted water through the plumbing).

Many life-threatening problems are so addressable by medical techniques that it is the relatively minor annoyances that have become severe medical crises in the first world, like Alzheimer's or social anxiety. The third world is just trying to have subsistence-level nourishment, not die from diarrhea or fever from infectious disease. (Pardon my usage of the first vs third world terms. They are easier to distinguish, whereas 'developed' and 'developing' are not.)

Presumably before the industrial age (or becoming developed), people got lots of exercise walking around. Yet they died much younger. If it weren't for infectious disease, would they have had a longer life-expectancy?

It's not like exercise is some fancy new idea, it has been around forever. It's just that right now it is a public business. Even within the US, it is a bit of a rich vs poor distinction: those who have the money and time can exercise, but those who work two jobs and have kids really don't (one might ask if poor/busy people don't get lots of exercise naturally just by activity level...).


It has been well documented and studied that astronauts who spend lengthy times in space (weeks and months) as on the no-gravity ISS (International Space Station) have osteoporosis and muscle weakness. In order to counteract this they have as part of their schedule a rigorous exercise plan,


much more rigorous than on Earth. In fact, it has to be much more rigorous to account for the lack of gravity. The gravity on Earth is naturally exercising us constantly. Just standing up on Earth we are using muscles from our legs and torso; even sitting is using your back. On the ISS, the zero-gravity environment is like lying down all the time. An astronaut's schedule includes a couple hours a day on an exercise bike, or 'weight' lifting. (Also, you can do a triathlon in space, but the swimming portion is hard.)

You go to space, and one particular feature of that environment which should be considered a great facilitator, the lack of gravity, has an effect on our biology, which is expecting a much less lenient situation. And the biology pulls the other way. (I've heard that exercise with some impact, like walking, can be better than bike-riding, which is no-impact, because it encourages bone regrowth that counteracts osteoporosis. I've heard.)


Which brings me to calculators in the math class (obviously). Or even computer aided algebra or automated proving systems for academics (and sometimes for engineering).

Kids these days, they can't even do long division! They've been coddled by calculators! How do we expect them to do science, let alone balance their checkbooks?

Calculators (and computer algebra systems to the nth degree) do the calculation for the user. Multiply two 10 digit numbers? A tedious exercise for a person, but a natural fit for a calculator. Solving that integral with square roots and trig symbolically? For a math/engineering whiz it's an hour long homework problem, but for the CAS, it's a natural fit. Solving it numerically? Insane for a person, but a natural fit again for the CAS.
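
For instance, a minimal sketch using the sympy library (my example, nothing canonical): the symbolic and the numeric version of an integral with a square root that would be a slog by hand:

    import sympy as sp

    x = sp.symbols('x')
    expr = sp.sqrt(1 - x**2)

    print(sp.integrate(expr, x))                  # x*sqrt(1 - x**2)/2 + asin(x)/2
    print(sp.integrate(expr, (x, 0, 1)).evalf())  # 0.785398... (= pi/4)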

The calculator (and CAS) is not intended to be a crutch that ends up weakening the user making them dependent. It makes you go that much further than you ever could go on foot.

A car takes us hundreds of miles in a day that we would never dream of doing on foot. If it makes us a bit lazy in taking the car for a few hundred yards, well, that's when we have to make sure we walk.

Technology puts us in the first world, but then we have to remember to exercise.

Thursday, September 10, 2015

What statisticians and ML'ers really think of each other

Labels aren't the thing, they just name the thing, and the same thing can have different names, and many different things have the same name. But people often take the label to be the thing.

'Statistics' and 'Machine Learning' are labels for two different things that have some overlap, not identical but cover a lot of the same things.

Statistics is concerned with averages and deviations, probability distributions, design of experiments, and regression, trying to extract knowledge out of tables of numerical data. The usual single sentence summaries are hardly distinguishable from many other things with data in their title, like databases or IT (Information Technology).

Machine Learning is a subset of Artificial Intelligence (itself considered a subset of Computer Science but practiced and motivated by other engineering departments and psychology related fields including linguistics, philosophy and neuroscience). It tries to extract patterns out of numerical data too, but has a different provenance. The two overlap some but each have their own separate culture and methods.

And more to the point, they’re really trying to do mostly the same things and the math for them both is often identical.

But what do they really think of each other?

The point of view of the statisticians (people who call themselves by that label or are employed by institutions with that label) is that ML is a handful of ad hoc 'predictive analytics' done by a bunch of computer scientists, engineers, or amateurs (or worse!) pulling it out of their ass; their methods are immature (they don't know anything!) and don't take into account the decades of principles established by the more mature statisticians for quality of results. That is, ML may do new, interesting things but they usually aren't that new and they've never thought of all the methodological pitfalls that have been managed so well already by statistical principles (think of the data!). The statisticians may begrudgingly acknowledge that some of the ML methods are externally successful, but really, with such complicated models how do you know if it is any good outside of your toy domain when you haven't done a proper analysis of your distributional assumptions? You ML people don't actually know anything!

People who say that they do ML probably do not give themselves the label statistician or work in a statistics group, but rather ‘are’ a computer scientist or engineer. Their point of view is that statisticians are studying pointless details about ancient brittle methods that aren’t particularly interesting, don’t really apply to all the new data sources, and just aren’t as good as this shiny new toy. Also, Bayes says p-values are dumb! The ML people may begrudgingly acknowledge that some of the statistical methods produce quality results, but really who cares about the normal curve and what about Bayes? You statisticians are so old and ossified!

From my point of view, it would be better for everybody if ML were considered a subset of statistics (but successfully studied in other departments) and ML methods could use a lot of analysis by statisticians. And a job that is labeled as data scientist should be easily fillable by a statistician or an ML person. Both sides need more exposure to the methods of the other.

See also Statistics and Machine Learning, Fight! (it's funding and conference culture), Statistical Modeling: The Two Cultures (by Breiman) (data vs algorithmic modeling), and The Two Cultures: Statistics vs Machine Learning for more opinions on the difference.

Monday, September 7, 2015

Books are slowly disappearing. Good riddance

Buying books would be a good thing if one could also buy the time to read them in: but as a rule the purchase of books is mistaken for the appropriation of their contents. Arthur Schopenhauer

The state of technology nowadays is that hard copy books are going away. We still have big libraries of paper books but little by little books are being published electronically.

The science fiction direction, to follow the trend, is that books will become obsolete, new things will only be electronic, paper copies will be a luxury for eccentric rich people, and there will be an underground secret society of zombie/scholars, like ancient monks preparing illuminated manuscripts, in shockingly physical, decrepit, hand-held substance.

That's science fiction, but there is currently just the foreshadowing of nostalgia. "I love the smell of paper books, my Kindle doesn't give me that", "I order my bookshelves by color of the binding; I have a personal relationship with that color and size and the backpack I put it into at that time of my life", "That quote by that author? Somewhere on the top left page, two lines from the top about a third of the way through". "Wandering through the stacks, I pulled out a random volume, and opened a new world".

But all this is to say... so what? We're reading more than ever (possibly not in long stretches, but the quantity has certainly increased). We have so many sources for reading and so much more availability of time for reading. We have the old stuff: newspapers, magazines, books, and paper memos. But we also have computers and laptops and phones, which all can read modified versions of the old stuff, plus emails and blogs and tweets. We can read these easily while sitting or standing, with one hand, on the bus, on a plane, standing in line. No dependency on the location of some heavy object to carry around constantly.

And more importantly, for the cognitive experience of reading, the software version is universes beyond that paper object, some dumb stack of paper. The character refers to something they said the day before? No need to reread three chapters, just search for it.


So we lose the physicality in an electronic book. All the current tricks to make an electronic book feel like a 'real' book, simulating a page leaf turning over, placing icons of books on a picture of a bookcase...they seem so...juvenile, so derivative, so ... last century.

Yes, there'll be no more 'memory palace', arbitrary connections to arbitrary referents that connect the narrative in a long fluent pathway. I'm at this part of the river with the island with no trees...I'm at the next part of the river with the house on one side and cliff on the other... None of that.

Hey. I love books. I love everything about them. All that nostalgia, that's me. 

But it's all going away. Electronic is better.

"Oh telling stories around the tribal fire is so much better intellectually than this paper thing. You're training your memory much better when you're forced to remember."

We're also very dependent on whatever technology is producing this stuff. What if the power grid goes out? We're screwed to the point where ebooks aren't relevant. We won't have the instruction manual to fix the burnt-out internet router or turbine at the hydroelectric dam. But then, back a century, we're screwed if things fall apart where paper books aren't relevant (the printing press is managed by an unknown group and we can't get out the news about the new junta government staging purges). Every technology has its questionable gains that ruin the previous less advanced technology, and it also has its crutches that we'll sorely miss if they're ever gone (going backwards).

Postscript 
Other people can read on a tablet or even a phone. It's really hard for me. I really prefer to have the book in my hands.


Friday, September 4, 2015

The Turing Test - like magic!

Clarke's Third Law: Any sufficiently advanced technology is indistinguishable from magic

Finally, an invocation of the Turing Test which doesn't lie down in fawning adulation, which doesn't assume the Turing Test is the judge of intelligence, artificial or otherwise.

First, the Turing Test is a well-accepted method for judging the creation of a successful Artificial Intelligence (those capitals are ironic, because artificial intelligence is mostly not HAL 9000). To generalize, the test is really that if a human believes the test data came from a human source, then that is successful Artificial Intelligence. The canonical test is a teletype (so that the mechanics of communication is not in question). A person communicates back and forth over the teletype. If that person can't tell whether the conversation was with a machine producing the words (presumably by the machine mimicking a human's ... uh.... humanity, rather than being hyperlogical), then success.

It is great fast thinking on Turing's part: going quickly to a workable solution, cutting out lots of junk rationalizations, not concerning oneself with the infinite hypotheses about the underlying processes, just going for the jugular of what you have, the surface behavior and believability.

But frankly it is no different from bald anthropomorphism; if the animal acts like a human it must be human-like more deeply, with the lesson that doing so is usually not very successful. (But contrarily, a subject for another time is that I think many vertebrates share many cognitive abilities of humans, and also contrarily, some behavior that is usually considered special human intelligence may have very low complexity biological mechanisms that underlie them).

Not only is the Test the basis of countless scifi plots, but also countless dumbed-down explanations of artificial intelligence machines.

If it acts like a human then it -is- a human.

Basing success on limited explicit experience rather than looking behind the curtain and seeing the design? That is just plain idiotic. It is denial of common sense. The true test of whether something is artificial or human is by looking behind the curtain, by looking inside the blackbox, by seeing how it is designed. The design is the thing that should be judged, not the paltry examples.

Finite behavior doesn't define essence. The essence defines essence. Sure, there's a lot more; the rules create the instances and anecdotes, which are telling, but it's all the possibilities that are relevant, not just the small handful of instances.

A counter argument might be that telling essence is not the point, and that knowing essence is not available. So that experience is finite and is all that can be known sometimes (you can't always look inside the blackbox).

Here are two analogies that express my point. Generating genre texts with ngram probabilities (using markov models or deep learning), and generating biological objects using fractals. Here's an example of generated text (from Paul Masurell):

Pride and Prejudice, Jane Austen.

I do not. It is very much at Pemberley. The idea of their all walking out. I must acknowledge to you. When I do not marry Mr Collins had promised herself. But you have the carriage might be copied. It would look odd to be proud. You are perfectly good. Elizabeth was at the sight of Miss Darcy was delighted. You have no objection to my charge. I know not. Lydia was urgent with the keenest of all. Mr Collins, you puzzle me exceedingly. But, my proposals will not go. To the rest. But, to much conversation, no traces of them.


The results look vaguely like the real thing, and could totally pass for reality (as long as they're not inspected too closely). Also, some humans using all their own skill can only reach this level of coherence. So this is a terrible example? Turn up some dials and it gets less and less 'wandering' and more coherent.
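
For the curious, a generator like that is only a few lines. A bare-bones sketch (my own toy, not Masurell's code; the corpus filename is just a placeholder):

    import random
    from collections import defaultdict

    def build_model(text, order=2):
        # map each `order`-word prefix to the words observed to follow it
        words = text.split()
        model = defaultdict(list)
        for i in range(len(words) - order):
            model[tuple(words[i:i + order])].append(words[i + order])
        return model

    def generate(model, length=60):
        prefix = random.choice(list(model.keys()))
        out = list(prefix)
        for _ in range(length):
            followers = model.get(tuple(out[-len(prefix):]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    # with open("pride_and_prejudice.txt") as f:   # any plain-text corpus
    #     print(generate(build_model(f.read())))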

Here's another example: fractal trees. Take a visual object like a line. Tack on smaller versions of that line to itself. Repeat for each smaller tree ad infinitum. You get a fractal tree like:

(from Gurpreet's blog. he has code!)
Depending on the rule, the 'tree' can look fluffier or sparser, and more regular or irregular. And it looks so much like a real tree:


(from Sarah Campbell)
And one could go the other direction and say that nature is implementing a recursive algorithm to grow its trees. But this is obviously crap. It certainly looks like a fractal, and I'm sure there are biological processes that can be modeled by some limited nesting (see the Chomsky/Everett disagreement over Piraha). But we know the fractal trees are not made by biology but by an algorithm, and, similarly, a real broccoli-shaped tree, whose trunk has branches and those branches have branches of their own, has to stop at some depth to give leaves.
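
For concreteness, the recursive rule is only a few lines too (a minimal sketch using Python's turtle module; the 25-degree angle, the 0.7 shrink factor, and the depth cutoff are arbitrary choices of mine):

    import turtle

    def branch(t, length, depth):
        # draw a branch, then two smaller copies of the whole thing on top of it
        if depth == 0 or length < 2:
            return
        t.forward(length)
        for angle in (25, -50):       # swing left of the original heading, then right
            t.left(angle)
            branch(t, length * 0.7, depth - 1)
        t.left(25)                    # restore the original heading
        t.backward(length)            # and the original position

    t = turtle.Turtle()
    t.speed(0)
    t.left(90)                        # point the trunk upward
    branch(t, 100, depth=8)
    turtle.done()

Note the explicit stopping condition, which is exactly the 'has to stop at some depth to give leaves' point above.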

It's like magic tricks: they work on the toy problem (having a card you're thinking of pulled out of a just-cut-up lemon) but don't generalize at all to anything beyond.

So you can make an elephant disappear on stage? Make it really disappear. It all looks right that one time, but is not repeatable because the reality isn't there.

Here's another example, IBM's Deep Blue chess playing program. So what if it wins against a human? (or plays at all). It's not magic. It's simply following game paths. Many game paths.
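
To demystify 'following game paths': the bare idea is just a recursive search, sketched below (nothing like Deep Blue's actual alpha-beta pruning and hand-tuned evaluation; the game.moves/play/is_over/score interface is made up for illustration):

    def negamax(state, depth, game):
        # score `state` for the player to move by looking `depth` plies ahead;
        # `game` is assumed to provide moves(), play(), is_over() and score()
        if depth == 0 or game.is_over(state):
            return game.score(state)              # static evaluation at the leaf
        best = float("-inf")
        for move in game.moves(state):
            child = game.play(state, move)
            best = max(best, -negamax(child, depth - 1, game))
        return best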

The Turing Test works in very limited contexts but is superficial.

Any sufficiently advanced technology is indistinguishable from a rigged demo.  James Klass

Thursday, September 3, 2015

Spoilers in Clustering Methods

In voting schemes, when there are more than two candidates, there is the possibility of a 'spoiler'. That is, if a third candidate is introduced, votes might be taken away only from the formerly winning candidate, 'spoiling' the chance of victory, letting a candidate not preferred by the majority win because the majority is split between two similar candidates.

This is similar to what can happen in a clustering algorithm: having set the number of clusters to three, the algorithm is forced to distinguish two smaller clusters which, if only two clusters were desired, would together be bigger than the third.


(from Pier Luca Lanzi)

In the example figure, the parameter to the system for k-means clustering is 3. The upper right set is split into two separate clusters. But if the parameter were 2, then those two clusters might combine to make a cluster larger than the lower left.

Of course, that doesn't mean the lower left cluster would be preserved; some elements may move back and forth. This shows that clustering can have anomalies like voting schemes, even though clustering doesn't account for all possible orderings (permutations) of 'candidates' and the correspondence of cluster label with candidate is not perfect.
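
A small sketch of that effect on synthetic data (my own blobs, not the data behind the figure), using scikit-learn's KMeans:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    upper_right = np.vstack([rng.normal([8, 8], 0.5, (30, 2)),     # two nearby groups
                             rng.normal([10, 8], 0.5, (30, 2))])
    lower_left = rng.normal([0, 0], 0.5, (40, 2))                  # one group far away
    X = np.vstack([upper_right, lower_left])

    for k in (3, 2):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, np.bincount(labels))
    # with k=3 the upper right splits (roughly 30/30/40);
    # with k=2 the two nearby groups merge into one 60-point 'winner'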

Wednesday, September 2, 2015

AI: concerning but not evil



Clarke's Third Law: Any sufficiently advanced technology is indistinguishable from magic

Hawking, Musk, Wozniak. A new line of men's cologne or a cabal of reactionaries against the benevolent Skynet overlords who just want to harvest human energy? Don't worry, it won't hurt.

Neither, of course. What scent could 'Wozniak' possibly be? Grizzly bear breath?

All they did was sign an open letter stating concern over ethics in artificial intelligence, that researchers and implementers should be aware of the unintended consequences of the things they build (which are intended to do human-like things but may not be ... perfect).

Frankly I've overstated the letter's case quite a bit. They call for "maximizing the societal benefit of AI". Wow. Earthshattering. We never knew. It's pretty weak. The strongest thing said there is:
Because of the great potential of AI, it is important to research how to reap its benefits while avoiding potential pitfalls.
That was the single mention of anything coming close to not positive. 'Avoid pitfalls', that doesn't seem too extreme. Don't step in puddles. Don't close a door on your fingers. Thanks for the tip.

Sure, there was press about it and interviews and off-the-cuff fear mongering, based on years of anti-science science-fiction, where the interesting story is the mad or evil or misguided scientist, not the usual reality-based scientific-method science.

OK, there was a link to a more detailed outline of research objectives that would support 'robustness' of AI applications. And that's a longer document, and it, out of much more content, has two items (out of many) interpretable as negative:

  • Autonomous weapons: "...perhaps result in 'accidental' battles or wars". Ack! Now I have to be worried about being shot in the face by an Amazon delivery drone.
  • Security: how to prevent intentional manipulation by unauthorized parties.

I have quite a few inspired fears of AI, but they weren't really represented in this document.

So I conclude that most of these fears dredged up by the letter are very external to the letter, created out of readers' own preconceptions.

---

There are all sorts of worries about science in general, the unintended consequences of something so obviously beneficial. And AI, since it is usually heuristic-based, has even more room to have hidden 'features' pop up unexpectedly (and, when one does, less understanding of why it happened).

But to the 'evil' point: it's not the AI's fault. It's not a sentient being. Even if it passes a cursory Turing test, it has no 'intention' or 'desire' or other mind to be second-guessed as to its machinations (pun intended). If it is anybody's fault it is some person: the designer who did not consider out-of-range uses, the coder who did not test all cases, the manager who decided a necessary feature was only desirable and left it out of the deliverable, the user who repurposed it outside of its limits. Your worries are second-guessing a bunch of people behind a curtain, on top of assessing the machine's engineering.

What are the evils that are going to come about because of AI? It's not an unconscious automaton, after calculating a large nonlinear optimization problem, settling on a local maximum which involves removing all nuclear missiles by ..um.. using them (much more efficient that way). It's not having your resume rejected on submission because it has calculated that your deterministic personality profile is incompatible with an unseen combination of HR parameters which imply that all candidates cannot be black. Haha, those may very well happen eventually. It's going to be sending a breast cancer screening reminder to families whose mom has recently died of breast cancer. It's going to be charging you double fees automatically every time you drive through a toll booth because you have another car with EZ Pass. That is, the errors may be associated with AI systems, but the problem is not the inscrutability of the AI but lo-tech errors based on assuming the AI is thinking of all these things and will act responsibly enough to fix them. 'AI' isn't thinking and isn't responsible. People have to do that.

So be concerned about that fraud detection that doesn't allow you to buy a latte at midnight (it's decaf!) but does allow midday withdrawals just under the daily limit for a week. Or be concerned about the process that, for all those ahead of you in the royal line of succession, automatically created and accepted Outlook appointments at an abandoned warehouse filled with rusting oil cans and accelerant and told to light a match.
Any sufficiently advanced incompetence is indistinguishable from malice. (Grey's law)