Showing posts with label words. Show all posts

Wednesday, March 13, 2019

Replace AI and ML in headlines with STATISTICS

Whatever the culture, machine learning methods are statistical. Even if people, both academic and pedestrian, distinguish ML and stats in a practical sense, most ML methods are statistical and in fact created by statisticians. Vladimir Vapnik, the inventor of SVM, has the label 'statistics' (although in Russian) somewhere in his CV. Leo Breiman, the inventor of Random Forests (and a lot of other things), was both an industry consultant and professor of statistics.

Sure, neural networks (which include the metastasized Deep Learning deep neural networks) were invented in the control/systems/cybernetics/computer area, but a label doesn't confer a monopoly on ideas. And that idea is essentially a cascade of logistic regressions, which is pretty easily labeled as statistical.
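To make the "cascade of logistic regressions" remark concrete, here is a minimal sketch in Python. Everything in it (the function names, the two hidden units, the weights) is made up for illustration and is not taken from any particular library or trained model. Each unit is an ordinary logistic regression, and the network just feeds the outputs of one layer of them into another.

```python
import math

def logistic(z):
    """The logistic (sigmoid) function used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """One logistic regression: weighted sum of inputs pushed through the logistic."""
    return logistic(sum(w * xi for w, xi in zip(weights, x)) + bias)

def tiny_network(x, hidden_units, output_unit):
    """A two-layer 'cascade': hidden logistic regressions feed one output regression."""
    hidden = [logistic_regression(x, w, b) for w, b in hidden_units]
    w_out, b_out = output_unit
    return logistic_regression(hidden, w_out, b_out)

# Hypothetical, hand-picked weights (not fitted to any data).
hidden_units = [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)]
output_unit = ([2.0, -1.0], 0.0)

p = tiny_network([0.0, 0.0], hidden_units, output_unit)
print(p)  # a number strictly between 0 and 1, readable as a probability
```

A real deep network differs mostly in scale, in how the weights are fitted, and often in using other activation functions, but the stacking idea is the same.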

All this is to say that all those AI techniques that are so big in the news... you could replace those headlines with ...


All these could be rewritten as:


----

What's the point of all this?
Absolutely nothing. But AI and ML tend to be words thrown around as though they're magic. They're not magic. So using 'statistics' instead will bring some sobriety to the conversation. The things that are coming out nowadays are really cool and revolutionary and are real progress in science... but it's not some magical genius in silicon, it's just little math tricks that have built up over time. It's not some science fiction faster-than-light warp drive, it's old tech that has been optimized little by little and it only just popped over the threshold into the mainstream.
---
Of course, not all cool new things in AI and ML are statistical. All the ones you hear about in the news lately are. Except the poker playing machine Libratus. There is a portion of it that involves learning from many games, but the major new process is not anywhere near what is traditionally called 'statistics'.

Monday, August 14, 2017

The Great English Muffin Shift

The Americans and British are separated by a common language. This has been attributed to Churchill, Shaw, and Wilde, all of whom stole from the best, but has never been attributed to Mencken, who should have said it but said other things instead.

There are the differences in pronunciation (Americans pronounce all 'r's, and Brits take a royal 'bahth'), and grammar (Americans go to the hospital, and Brits go to hospital), and there are all sorts of vocabulary differences: lorries and lifts and petrol.

But one primary difference is in the vocabulary of food. Zucchini/courgette, eggplant/aubergine, let's call the whole thing off. A number of baked bread products have different names in the two varieties. What's so special is that they form a chain, as though some higher force pushed in a word at one end of the sausage machine, forcing all the little sausages to move one sausage over, a Great English Muffin Shift. It goes like this:

A cookie in the US is a biscuit in the UK and
biscuit...scone and
muffin...scone, a slightly different kind of scone and
muffin...fairy cake, a slightly different kind of muffin and
English muffin...crumpet, because in the UK you're there already, so you don't need to specify English.

What 'cookie' means to Brits, and 'crumpet' to Americans, I don't know. Yes, the sausage machine seems to go in reverse there and then start forward again; sometimes the machinery gets stuck.

There's also the Great Fried Potato Migration: what are called 'fries' in the US are called 'chips' in the UK, and 'chips' in the US are called 'crisps' in the UK.

As far as I can tell, 'crisps' means nothing to an American beyond 'you must be talking about something crispy, but why would you call it that directly?' And 'fries' to a Brit must elicit a 'Pardon me, but fried what?'


Wednesday, August 9, 2017

Butterfly in all the languages of the world

Etymologically, some words are universal. The word 'mother' seems to have some version of an 'm' word in every language (despite the counterintuitive experience that 'm' is not usually the first linguistic sound an infant learns to make).

Some words will stay mostly the same within a historical group: pronouns and numbers tend to maintain meaning through centuries of phonetic changes.

Some words are unique to one language when other languages in the family keep the generic. 'Dog' in English is unique to English, but 'hound', from the Indo-European 'hund' (GE)/'canis' (LA)/'sag' (PE) remains elsewhere.

But are there words, or rather concepts, that are unique in every language? That is, is there a concept such that in every language, the word for the concept is unique to that language and not shared by others?

(If the idea that concept and word are not the same bothers you because, well, a word says what its concept is, then the following should convince you otherwise. Wait...instead just consider that a language foreign to you has mostly different words from yours for the same concepts. Therefore words and concepts are not the same. Anyway, on to the main topic...)

Consider the word 'butterfly'. Sorry, consider the insect that in English is referred to as 'butterfly'. In English it is called ... yes, yes, I just said it. It's the usual English word made of two words. 'Butter' and 'fly'. There are all sorts of etymological theories:

  • the insect is a fly the color of butter (some very particular species I presume)
  • they hang out near butter
  • they literally 'flutter by' and people are goofy and pulled a spoonerism
  • the word was borrowed from the Dutch, who called it 'boterschijte' or, translated back, 'butter shit', because the insect's shit looks like butter (again presumably for some particular species whose shit I have not seen).
All somewhat sounding a little too convenient, like folk etymologies rather than scholarly exegeses. Except that Dutch one. Where did that come from?

But that's just English. The fun thing is that most languages have their own strange fancy word for 'butterfly', seemingly not borrowed from any other nearby language.
  • Romance
    • Latin: papilio
    • Italian: farfalla
    • French: papillon
    • Spanish: mariposa
    • Catalan: papallona, parpalhòla
    • Portuguese: borboleta
    • Romanian: fluture
  • Germanic
    • German: Schmetterling
    • Dutch: vlinder (note not boterschijte)
    • Danish/Norwegian: sommerfugl
    • Swedish: fjäril
    • Icelandic: fiðrildi
  • Slavic
    • Bulgarian: peperuda
    • Serbian/Croatian/Bosnian: leptir
    • Czech/Slovak/Polish: motýl
    • Belarussian: matyliok
    • Ukrainian: metelyk
    • Russian: babochka
  • Celtic
    • Irish: féileacán
    • Scots-Gaelic: dealan-dè
    • Welsh: glöyn byw
For every one of these mostly distinct entries (yes, yes, Slavic has a couple of derivatives of 'motil', and Romance of 'papilionem') there is an obscure etymology, mostly made up, just like the English one. The German 'Schmetterling' seems to come from 'schmettern' meaning 'make a loud noise' or 'strike' (butterflies tend to be quiet) but 'schmetter' is from an older Saxon dialect word usage, having to do with milk products, following the old folk belief that witches fly about in the form of butterflies, in order to steal milk and cream. A bit fanciful and sounds like my great aunt made it up. But then 'schmetten' is a dialect word for cream, deriving from the Czech “smetana”. So it's obvious! Cream, butter, butterfly! Which is to say nothing is obvious and it all sounds made up.

The Irish 'féileacán' also has multiple explanations. Maybe it is from 'feileach' which means 'festive' (butterflies certainly are festive) or it could come from 'eitleach' for flying. A possible sound change but not borne out elsewhere in Irish.

So, what's the point? Take any language other than your own. Almost the definition of it being another language is that there's a different word for everything. But for 'nearby' languages, really most of the words are cognate, just changed slightly, and it is only a handful of words that stand out as being different (e.g. English vs Scots English). The point is that the animal called 'butterfly' in English seems to have few cognates even in nearby languages. What is the explanation? What makes those insects so special? And even if they are special (they are!), aren't there other animals that are as special? A bear is pretty special, especially if it's running after you.

The direction this is going in is that of all the words in the world, 'butterfly' has no cognates among any languages. By looking at the list that is obviously not true: motyl/matyliok, papilio/papillon/papallona, and others. But it does show that the word seems to vary quite a lot, as though a butterfly really brings out creative neologisms in everyone.

Linguistic note: I stopped at the European of Indo-European only because of familiarity and ease in checking. It would be instructive descriptive (that is non-theoretical) linguistics to investigate:
  • other close families like the many close languages of India, Indic or separately Dravidian, or Chinese
  • very close varieties (mutually intelligible dialects) to see if 'butterfly' is so volatile even in very close languages
  • other concepts in a structured manner, e.g. one-for-one against mother, five, dog, fly, to see if butterfly really is special (or is it a pattern that's not really a pattern, and lots of other middlingly common words have a similar situation)

(OK, I lied at the beginning. 'Mother' is not considered a language universal by any linguist. It is certainly maintained as the main 'mom' word within Indo-European. But any 'm-' words in other languages are considered by linguists to be coincidences. There do seem to be some lexical universals over all human languages, but currently there is only considered to be one, 'huh?'...so far.)

Tuesday, August 1, 2017

Statistical Rumsfeld: Now We Know!

No, not a poor punk band name, but Statistical Rumsfeld, popularized by his usage but not created by him, is a way of talking about what you know about your own knowledge:

...there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.

- things we know that we know them: this is data. We've looked, and seen, and we are aware that we've looked and seen and verified and removed doubt. Is it 'yes' or 'no'? Look at the thermometer.

- things we know that we don't know. We know we don't know what's behind the curtain. We know we don't know what the capital of Chad is. We know we don't know what somebody is thinking before they tell us (even sometimes afterwards). We know the boundaries of this darkness. We know the range of possibilities. This is like a probability density; we don't know the particular value of a coin flip, but we know that half the time it will be one side and half the time the other. That's something.

- things we don't know that we don't know. We have no idea. We don't know how to look for the value, we don't know the distribution, we don't know what the range is, we don't even know if it's a number. Totally unexpected. A black swan.

Something is left out. You have things that you know and things that you don't know, and you can either know that or not. Two things, with two possibilities for each, four in total. The one that is missing is itself: unknown knowns.

- things you didn't realize you knew. You didn't know you knew that, did you? Unconscious knowledge. A hidden talent you weren't even aware of. The pattern in the data that was always there.

Or better, in a handy chart:

                          Things: Known              Things: Unknown
Do you know about them?
  Known                   Known Knowns:              Known Unknowns:
                          Facts, data                Parameters, Distributions, Probabilities
  Unknown                 Unknown Knowns:            Unknown Unknowns:
                          Unconscious knowledge      Hidden Variables, Black Swans


Monday, July 24, 2017

Triple animal metaphors

Lately I've noticed some animal metaphors that come in threes, to express three levels of something: fast to slow, or big and rare to many and small.

Cancer: turtles, rabbits, and eagles. Some cancers are fast. As soon as you find out about them, it's almost too late and there's nothing you can do: the eagle swoops in quickly, and as soon as you're symptomatic you only have weeks or months left. Treatment might help, but only by extending life by a few months, and it may end up reducing quality of life for that extended period. Lung cancer tends in this direction.

Some cancers are slow. A blood or even genetic test shows that you have or may eventually have the problem, but as it is now, you're more likely to die of many other things first before the cancer does you in. Treatment may stop the cancer but again may reduce quality of life for a much longer period, which wouldn't have mattered anyway. Prostate cancer tends in this direction.

Some cancers are in between and even capricious. As soon as you find out, treatment could entirely eradicate the problem (colonic polyps, melanoma) or

Notice how I said 'tends'. Everyone's situation is different, but there are tendencies.

Here's a classic Tufte graph of survivability percent over years for the primary types of cancer:


(from Cancer survival rates)

This chart shows a bit more, the progression of mortality of the disease over 20 years. Just the 5 year death rate shown by the first column shows that prostate and thyroid cancer are turtles, and pancreatic and liver are eagles.



Marketing: rabbit, deer, elephant. When pursuing sales to a customer, there is a continuum from many small customers to few, very large customers.

Rabbits are the millions of casual buyers: people buying socks, or a game app. Mass advertising and viral sharing are the way to get to these buyers.

Elephants are the huge multinational corporations that will either be your only customer or might just acquire your company altogether.  Knowing someone inside, or huge involvement in national media is the lead-in to a purchase or acquisition here.

Deer are in between. Upscale cars and industrial machinery are the kinds of things these buyers are after.






Tuesday, June 6, 2017

Books That Tell You The Categories Of Books

The classic first section of "If on a winter's night a traveler..." by Italo Calvino, 1979 (tr. from Italian, 1981, by William Weaver):

You are about to begin reading Italo Calvino's new novel, "If on a winter's night a traveler..." ...
In the shop window you have promptly identified the cover with the title you were looking for. Following this visual trail, you have forced your way through the shop past the thick barricade of Books You Haven't Read, which were frowning at you from the tables and shelves, trying to cow you. But you know you must never allow yourself to be awed, that among them there extend for acres and acres the Books You Needn't Read, the Books Made For Purposes Other Than Reading, Books Read Even Before You Open Them Since They Belong To The Category Of Books Read Before Being Written. And thus you pass the outer girdle of ramparts, but then you are attacked by the infantry of the Books That If You Had More Than One Life You Would Certainly Also Read But Unfortunately Your Days Are Numbered. With a rapid maneuver you bypass them and move into the phalanxes of the Books You Mean To Read But There Are Others You Must Read First, the Books Too Expensive Now And You'll Wait Till They're Remaindered, the Books ditto When They Come Out In Paperback, Books You Can Borrow From Somebody, Books That Everybody's Read So It's As If You Had Read Them, Too. Eluding these assaults, you come up beneath the towers of the fortress, where other troops are holding out:

the Books You've Been Planning To Read For Ages,
the Books You've Been Hunting For Years Without Success,
the Books Dealing With Something You're Working On At The Moment,
the Books You Want To Own So They'll Be Handy Just In Case,
the Books You Could Put Aside Maybe To Read This Summer,
the Books You Need To Go With Other Books On Your Shelves,
the Books That Fill You With Sudden, Inexplicable Curiosity, Not Easily Justified.

Now you have been able to reduce the countless embattled troops to an array that is, to be sure, very large but still calculable in a finite number; but this relative relief is then undermined by the ambush of the Books Read Long Ago Which It's Now Time To Reread and the Books You've Always Pretended To Have Read And Now It's Time To Sit Down And Really Read Them.

With a zigzag dash you shake them off and leap straight into the citadel of the New Books Whose Author Or Subject Appeals To You. Even inside this stronghold you can make some breaches in the ranks of the defenders, dividing them into New Books By Authors Or On Subjects Not New (for you or in general) and New Books By Authors Or On Subjects Completely Unknown (at least to you), and defining the attraction they have for you on the basis of your desires and needs for the new and the not new (for the new you seek in the not new and for the not new you seek in the new).

There's also a murder mystery.

Thursday, May 18, 2017

Non-literal 'literally' is not alone in contradicting itself

Literally means taken word for word. "The debt incurred was literally billions" probably means that the value 'billions' sounds like an exaggeration, but I want to emphasize that it is no exaggeration, that the actual value was in the billions.

But people use 'literally' all the time in a non-literal fashion, as an intensifier. "That party man, that house was literally on fire!" probably meant that the house was quite enjoyable, not that the local firetrucks were dispatched. This usage is not the literal meaning of literal, and it's not really the opposite (it's not saying 'hey this is an exaggeration' but rather 'hey check this out!').

If one is speaking informally then go wild, use 'literally' to mean 'hey check this out'. It's common enough to be understood that way and in such instances not likely to cause misunderstanding. But in formal use, where you really want low ambiguity for transfer of information, then you may even want to avoid 'literally' because it might be misleading: people who don't know the literal meaning of 'literal' might be misled into thinking you're exaggerating or just pointing out some outrageous thing.

Literally applied to itself literally doesn't mean literally. It's a snake biting its own tail. Literally. If the word is the snake and interpreting the meaning is biting something, in this case itself. OK, that was way too literal.

Except...
The commonly accepted formal meaning of the word literally, that is, word for word or actually, is not itself a very literal meaning. If you want to be pedantic, as 'literally' is practically asking you to do, the source of 'literally' is via the Latinate for letter, so it should mean something 'by the letter'. This is itself a figurative use. You're not caring about letters but about words (maybe that's too pedantic). You're not caring about words but about primary meanings. Which is a figurative reading of word for word.

What 'literally' is really used to say is: the following is a phrase that could be taken metaphorically, but in this case the words describe the actual situation.

So literally does not itself have its own literal meaning.

And as self-contradictory as this is (how could we let this go so far?), this is not a strange new bizarro-world mind-bending stand alone example. There are a number of own-tail-biting words.

Really. I mean 'really'. I mean 'really' is an example of a word that is an analogy of this literal vs figurative use whose literal meaning is itself. 'Real' means extant or existing or not-fake. 'Really' really means 'a lot' or 'very' or 'much', not an exaggeration but an intensification. "It is really hot in here." Sure, it is probably hot or at least warm. "Your attitude is really getting on my nerves" means it is probably annoying, not that you have exposed neural material that an attitude is physically on top of.

This is very true. Very true. Well, no more true than what true is. 'Very' comes from French (via the Norman Conquest). It is cognate with French 'vrai' for 'true'. Over the course of two hundred years after the conquest, there was an influx of a huge number of Old French terms. 'Vrai' slipped over from meaning 'true' to being an intensifier (foreign language usages are more likely to be 'repurposed', i.e. misused). There is a bit of the etymological fallacy here, that the current meaning of a word should be what it used to be. There is no doubt that very means very nowadays. Maybe a little doubt in 1200 AD in London.

But truly 'true' has the literal meaning of 'that which is the case'. Etymologically though it means all sorts of things like 'faithful' or 'honest' and only became the opposite of false probably after 'vrai' became 'very'. So semantic drift happens, but that doesn't mean the drift was wrong or incorrect or bad or led to the downfall of civilization.

This is not to say that I like the non-literal use of literal. It hurts me (not literally) when I hear it used non-literally. It's so obviously intended to be meant literally and a non-literal usage just contradicts itself.

Friday, December 16, 2016

Trust vs Depend

"I trust that guy as far as I can throw him"

"I can't trust him to complete the project on time"

'Trust' is used in two different ways. One is the usual opposite of falsehood. If you can't trust them, they are a liar. This presumes intent and is almost demonizing.

The other way is dependability. Trust of the outcome. If you can't trust someone this way, it's not a reflection of their evil intent but about ability to execute. This is very different from falsehood. You can actively do something about this.

In the first sense, the falsehood one, you can only seek other sources of information.

So instead of 'trust', use 'depend on'. 'Trust' makes it sound like you think they're lying. 'Depend' just means there is doubt without judging.

Monday, November 7, 2016

Bullshit

Bullshit. There's no better word for things that sound true or plausible but have no connection to or no support in reality or any attempt by the speaker to make that connection. The metaphor is weak but it captures the feeling of the realization about what someone else has said.

Usually a statement is called bullshit if it stretches the bounds of plausibility. If it turns out to be a falsehood, it is considered a lie. Harry Frankfurt wrote an entire book on the subject, trying to solidify it (ugh, metaphors) as a statement by a person who is not intentionally trying to lie but rather has no concern at all for its truth value. That is, a bullshitter doesn't care whether a statement is true or false. Frankfurt is a philosopher and is trying to shoehorn a word into his own internal concept or into one that is more logically amenable, which is to say I think that usage is charitable. A bullshitter is trying to lie; if the statement turns out true, they would be surprised.

But the label is problematic. Bullshit is a little taboo (or a lot given the context). What are the alternatives? There are many but they have their problems, too.

All the following words fit the syntactic pattern, in response to something said, "That's X".

The first category is the anachronistic nonsense words:



What do any of these words mean? They're entirely opaque nonsense words, made up by someone long ago out of random sounds to sound like what they're describing. Another thing they all have in common is that currently (and for the past fifty years at least) no one in their right mind would utter these in sincerity, given that they sound like an old man smoking a cigar mincing an oath. All of these words are idiomatic, inexplicable. This description is almost the meaning, which may be the originally intended psychological effect.

Moving closer to reality, the next is the most populated category (and likely most thought of if not used), the actual shit category. Shit itself really isn't (metaphorical) bullshit - metaphorical shit is just worthless, but metaphorical bullshit is a damned lie. Many are minced (crap for shit), but at least they are analyzable.


Next are the real, relatable, but still not literal metaphors for mixed-up messes, trash, or sausage:


Actual literal non-onomatopoeic words that are close in meaning but just not quite bullshit and still somewhat old fashioned:


And finally the list of actual, literal words:

Of all the lists of related words I've seen for bullshit, 'nonsense' is the only direct literal word for it.

And that is the real problem here. Bullshit is not nonsense. Nonsense is words that make no sense. Bullshit makes sense, may be true or not, but may be misleading. Word salad (a salad made of words), gibbering (of an idiot), unconnected train of thought (of someone distracted) are all nonsense. Bullshit makes sense entirely. Just the intention (or reception) is different from usual truth valued statements.

Therefore there is no good alternative to bullshit.

Wednesday, October 26, 2016

Flounder vs Founder

In the series 'words almost spelled the same and almost mean the same thing, but are not'

Both mean many things but they come closest as things that happen to you metaphorically relating to the sea.

Flounder is the flat fish, and to flounder is to be like a flounder on the deck of a ship and flail about.





Founder, on the other hand is one who starts something (very different!) because they are at the base of things (cognate with the foundation). 


But to founder is not to found something but to begin the process of sinking, to founder upon the shoals. 

Certainly a flounder could founder on the shoals if it put itself into such a bad position, but that is less likely than that a founder of an enterprise would flounder before pivoting to a greenfield market (count the mixed metaphors!).

The etymology of founder is incontrovertibly via French fond from Latin fundus, the bottom. Flounder, the fish, supposedly is cognate with flat (obvious) and plaice (obvious biologically but not immediately obvious phonetically). Despite my metaphor about a fish flailing about, to flounder however has a controvertible provenance, probably mixed up with other similar sounding words, like flop and flail and fluke (is that another fish?), in an example of phonosemantics.

So flounder, fish, flop. Founder, sink to the profound bottom.

Monday, October 24, 2016

Insidious vs invidious

In the series 'words almost spelled the same and almost mean the same thing, but are not'

Insidious and invidious.

They both sound bad. One sounds like...well the other does, too. But they are distinct.

That snake slithering up towards you unannounced? Insidious. Shaking a snake in your face? Invidious. Both are pretty mean. Insidious is stealthy or under the radar. Invidious is plain ill will.

Insidious describes something that lies in wait to get you, and invidious is something offensive or defamatory. Cancer can be insidious, lurking in your body without your knowing it. Invidious doesn't hide; it's hateful right away.

Insidious didn't fall too far from the tree – it comes directly from the Latin word insidiosus meaning "deceitful, cunning, artful," from insidiae "plot, snare, ambush." That's pretty unannounced. Something insidious can even be attractive while doing harm, like an insidious plot to befriend your crush's girlfriend, so you can break them up. But often it's not attractive, just sneaky.

Invidious on the other hand is from the same place as envy. But it has slid over to mean just plain ill-will, where envy might come in but is not necessary.

You can be invidious in an insidious manner (being sneaky about your distaste for the other). But I think insidious carries enough negative feeling in there that it already includes the attributes of invidiousness.