Friday, April 21, 2023

How hard is it to learn a foreign language, part II

A while back I tried to assess how hard it is to learn a foreign language, based on a number of criteria: pronunciation, syntax, vocabulary, reading/spelling, and culture.

I've had a little more exposure to a handful of new languages, so I thought I'd extend the table here.


Hindi - Indo-European

  • grammar - 2 Except for gender, the syntax is very straightforward.
  • vocabulary - 3 Beyond the basics, it is very non-European
  • pronunciation - 3 All the sounds are very straightforward except for t/d. Most languages just have 2 - a voiced and unvoiced alveolar or dental stop. Hindi has 8, voiced/unvoiced, aspirated/unaspirated, retroflex/regular. Oh yeah and sometimes an r is a d.
  • writing/spelling - 3 The Devanagari script is simple and easy to learn, with few if any complications: consonant plus a vowel, always shown. Very one-to-one. A small handful of weird ligatures and combos. 
  • cultural - 3 There is lots of media produced in India: news, movies, youtube videos. There is very little immigration to India (and for those it is easy to get by with English). But Hindi is battling with English as the most equal of equals in official languages. It is taught in secondary school alongside English and the main local language. So there should be lots of Hindi learning material but it is not directed at native English speakers (my personal concern). There is some but not much online language learning, but HindiPod101, Duolingo, italki, etc are making many languages more accessible. 

Swahili

  • grammar - 3 Totally easy... except for noun classes. First the easy part... no articles, verbs just use particles for tense, word order is pretty strict, modifiers after nouns. But instead of two genders, there are 10 (or more?) classes of nouns that have different prefixes for singular and plural.
  • vocabulary - 3 Very non-European vocab. A handful of English/Arabic/Persian/German loans.
  • pronunciation - 2 No tones; some very doable but unexpected nasals before stops are about as strange as it gets.
  • writing/spelling - 1 Roman letters, English digraphs, orthography matches pronunciation well. 
  • cultural - 4 Very little online instruction. Duolingo and SwahiliPod101, few movies, little children's lit.






Language    Grammar    Vocabulary    Pronunciation    Writing/Spelling    Culture
Hindi       2          3             3                3                   3
Swahili     3          3             2                1                   4


Wednesday, October 16, 2019

Visit to India


India notes

Wed Sep 18
You're all excited to leave the car takes you to the airport you check in wait in line through security walk the terminal to your gate wait for boarding ask a question at the gate sit down and wait some more go to the newsstand and read the covers of magazines and blurbs on books sit down and wait some more and they call your flight ready for boarding but children and vets and first class and business class and then group 1 and then group 2 and so on and finally your group and you stand in line to board and then stand in line in the gangway and then stand in line in the aisle and finally you sit and then the plane is taxiing out and then it's sitting waiting its turn and then you come to the sudden realization that you still haven't left. You're still at the beginning of your trip.
With a car trip, you've left once you pull out of the driveway.

Thu Sep 19 
Flight - time change of 10 1/2 hours means leaving Wed night and a flight of 13 hours leads to arriving Thu evening. So a day spent on a plane.
Kids trying to sell stuff while mom sits on curb nearby. At night. Across a lane of traffic. To a bus whose windows don’t open.

Fri Sep 20
Breakfast
Park
Drive to old Delhi
Red fort
Jama Masjid
Muslim call 'hum'
Rickshaw through old Delhi
Narrow streets, lots of electric wires overhead
Red haired malnutrition?
Books paper saris marriage invitations car parts
Lime drinks
Plant growing out top of mosque
Beggar girls
Electric lines
Delhi gate
Cricket stadium million dollar arm
School children
Police stations
Government court bldgs 
Pictures of peoples faces
India gate 
Traffic
Movable Police barricades
Great trees and greenery everywhere 
Trash in streets (not at all in gov area);

Lunch Chinese 

Not as hazy as everyone said

Humayun's tomb 

I love being driven around and just looking out the window


Sat Sep 21

People exercising in the park
How do Sikhs wear motorcycle helmets?
Scooter w driver, mom and baby going wrong way down left side
Lots of trees flowering
Piles of rubble, bricks
Lots of roadside litter, lots
Shanty towns
Lying down on concrete, lying down most anywhere
A lot of signage, but a lot falling apart
'Colony' walled compound
Oyo, Yes bank
Hanuman monkey god bachelors, masculinity
Ganesha elephant god, welcoming
Noida high rises, call centers
Clean highway
Sport bicycle on highway
People drive slow 
Women side saddle on back of motorcycle 
Rest stops w Starbucks and dominos but good chai and paratha (chole...)
Scooter being pulled by car on hiway, scooter a few feet in front reading cellphone
Small white herons
Big cow in middle of congested hw but no noticeable difference in traffic
Huts in fields really silage storage
Watering truck for median strip plants
Brick kiln towers
Millet fields, 
'Horn please' on back of small trucks
Perfection only by god, add an imperfection 
Dinner at Pinch of Spice (lots of Indian family tourists)

Sun Sep 22
Breakfast sambar idli masala tea
Taj Mahal 
Lunch D'Delicia paneer butter masala, chicken w gravy
Taxi ride to baby taj- trial run for Taj Mahal totally the same
Agra fort
School groups in uniform

Mon sep 23
Breakfast sambar idli chai 
Bus to Jaipur 
Peacock in middle of field
Large bus coming down our side of hway 
Lots of trucks
Lots of informal roadside stops
For all the animals in the middle of the street, I've seen no animal carcasses/injured animals. No dog poop but enough cow poop
Lots of visible electric substations right on roadside
Fatehpur Sikri ghost town fort by Akbar
Shift to Lahore 
New Delhi, Agra flat, Rajasthani flat but with small hills in distance 
Dried river bed
Lunch after step well

Tue sep 24
Jaipur
Breakfast buffet
Jaipur old town pink city
English wine shop
Harem viewing facade
Much cleaner than ND Agra
Sidewalks few cows in city
Camels
Lake palace
Swastikas
Amer fort
Elephants
Dinner at 'royalty'
👑 

Wed Sep 25
Dosa for breakfast
Anniversary
Drive to ND
In Jaipur trash heap people dumping bags, some picking through to put in bags
People walk across the street like they just don't care, supreme confidence that cars will deal with it.
Small hills, mostly flat
Roma are from around here 
Hero motorcycle plant
Huge cement plant
Continuous road side civilization but farms beyond
Road repair on main hiway: top layer scraper followed by asphalt
Camels in rajasthan
Trucks decorated 
Hard sell everywhere, in your face unrelenting
Motorcycles in the rain
Trucks buses motorcycles autos green/yellow taxis

Thu sep 26
Day w sunjeeta
Old Delhi
All x are y but not all y are x
Jain alley
Chaat
Spice market
Haldiram’s 
Sundials
Hindu temple
Regular market

Fri Sep 27

TripAdvisor for intrepid sunjeeta cox and kings abhay Gupta sripad driver in Agra 

Driving
Honking constantly
Cows in the middle of the road walking or lying down
Lane markings are barely suggestions even on superhighway 
Dogs not in packs, very chill, everywhere; no cats though

Salespeople hawkers in your face will not go away even when you say no


Wednesday, March 13, 2019

Replace AI and ML in headlines with STATISTICS

Whatever the culture, machine learning methods are statistical. Even if people, both academic and pedestrian, distinguish ML and stats in a practical sense, most ML methods are statistical and in fact created by statisticians. Vladimir Vapnik, the inventor of SVM, has the label 'statistics' (although in Russian) somewhere in his CV. Leo Breiman, the inventor of Random Forests (and a lot of other things), was both an industry consultant and professor of statistics.

Sure, neural networks (which include the metastasized Deep Learning deep neural networks) were invented in the control/systems/cybernetics/computer area, but a label doesn't confer a monopoly on ideas. And that idea is essentially a cascade of logistic regressions, which is pretty easily labeled as statistical.
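
To see how literal that 'cascade of logistic regressions' description is, here is a minimal sketch in Python/numpy. The weights and sizes are made up purely for illustration, and a real network would be trained rather than handed random weights; the point is just that each layer is a little bank of logistic regressions feeding the next.

    import numpy as np

    def sigmoid(z):
        # the logistic function at the heart of logistic regression
        return 1.0 / (1.0 + np.exp(-z))

    # made-up, untrained weights, purely for illustration
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # layer 1: 4 inputs -> 3 hidden units
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # layer 2: 3 hidden units -> 1 output

    def tiny_network(x):
        h = sigmoid(x @ W1 + b1)     # one little logistic regression per hidden unit
        return sigmoid(h @ W2 + b2)  # another logistic regression stacked on top

    print(tiny_network(np.array([0.2, -1.0, 0.5, 1.5])))   # a probability-like score in (0, 1)

Training it is backpropagation, i.e. gradient descent on a (typically likelihood-style) loss, which is more statistics.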

All this is to say that for all those AI techniques that are so big in the news, you could take the headlines and replace 'AI' and 'ML' with 'STATISTICS'.


----

What's the point of all this?
Absolutely nothing. But AI and ML tend to be words thrown around as though they're magic. They're not magic. So using 'statistics' instead will bring some sobriety to the conversation. The things that are coming out nowadays are really cool and revolutionary and are real progress in science... but it's not some magical genius in silicon, it's just little math tricks that have built up over time. It's not some science fiction faster-than-light warp drive, it's old tech that has been optimized little by little and has only just popped over the threshold into the mainstream.
---
Of course, not all cool new things in AI and ML are statistical. All the ones you hear about in the news lately are. Except the poker playing machine Libratus. There is a portion of it that involves learning from many games, but the major new process is not anywhere near what is traditionally called 'statistics'.

Monday, August 6, 2018

Sounds during the year

Early Spring (Apr)- some light bird chirping in the early morning. In the evening, the peepers (small frogs in the swamp) have a continuous light chorus until darkness.

Mid Spring (May) - pre dawn - the loudest continuous racket STFU birds! All the birds talking at once. By mid-morning, still lots of birds but not overbearing. Loud throughout the day, but not continuous like pre-dawn.

Late Spring/Early Summer - Lots of birds still in pre-dawn but not mind splitting. Lots of bird calls throughout the day, but not continuous.

Mid/Late Summer - sometimes periods of total silence, other times sporadic bird calls through the day. Evenings, very light crickets

Early/Mid Fall (Sep-Nov) - evening crickets, even until early Nov.
Mid/Late Fall - Leaf blowers

Early Winter (Dec) - nothing

Mid/Late winter (Jan/Feb)- during large snowfalls, less than nothing. But then off in the distance you hear the snowplows scraping the streets and the beeping when they reverse.

Friday, July 6, 2018

Which 'Black Mirror' tech is likely to come true?

A while ago I wrote a blog article about many of the science fiction technologies in the original Star Trek series and how plausible those technologies were to come true. Some, like exploiting time anomalies (essentially a variety of time travel), I judged to be extremely implausible (or rather, just plain impossible). For others I noticed how, as of ~2018, 50 years after the show, we are well on the way to engineering and implementing them in our daily lives, like computer disks (where we've actually gone far past a 1960s imagination to the data cloud) or auto-opening pocket doors (in every grocery store since the early seventies!).


(all pictures shamelessly linked to from google images)

There's a new TV show that I'd like to do a similar thing to, 'Black Mirror', which came out in the past 5 years. It presents a disturbing 'atopia' (neither utopia nor dystopia) of the near future where information technology has made some great but nightmarish things possible, depending on circumstances. It's not a dystopia because the technologies make things in general good for most people, but the episodes' plots hinge on some particular, local, individual abuse. OK, that's probably the wrong take: all the episodes, except a couple, are dystopian, or at least depict one instance of dystopia.

The show is sometimes hard sci-fi (tech oriented) sometimes soft (social trends extrapolated) and most often a mix. I'll just address the plausibility of the hard tech because frankly with social trends anything could happen. For example, the first episode about live politician-pig-porking ("The National Anthem"), or a sassy animated character running for high political office ("The Waldo Moment")... who knows? They're not that different and one has already happened in real life.

The technologies are mostly either internetty -  social media abuses and computer hacking, or Matrix-like VR - gradations of mental extraction from visual memory scans to uploading one's entire consciousness.

So here's a list of the proposed technologies that drive the episode plots and how plausible they are to implement with our expectations of technology, plus an expected timeline.

  • power-generating stationary bikes in exchange for "merits" ("Fifteen Million Merits") -
    mini-generator power generation is generations old, and connecting it to the power grid and to the IT infrastructure for some sort of bank credit system is already in place. Time: tech already exists, 5 years for widespread implementation
  • continuous memory recording ("The Entire History of You") - tech has existed for 100 years for recording sounds and images. Miniaturizing it to record a single person's sight (glasses or even contact lenses) and hearing (a mini microphone or pass-through hearing aids) has been possible for the past 10 years, as has the digital storage capacity. Time: A working system for exactly this kind of 24/7 recording could be created within a year (and may already have been done a few times). But that seems like cheating; it's not exactly memory from our brains. As to recording actual intra-brain sense memory via a 'grain' implanted behind the ear, it's a bit complicated. I'll address that in a bit.
  • Simulate voice and personality from a person's speech and writing ("Be Right Back")- systems currently exist which do these two separate things with varying accuracy. Off the shelf software can learn very accurate voice simulation from example recordings. Personality from writing... systems, like Replika, are pretty good but are not perfect yet. Similar Deep Learning using LSTMs can get an F+ in language and even some personality (F for fail but + because it's really pretty amazing that the dog can even stand on its hind legs, let alone dance). Time: exists poorly now, progress is continuous, better and better, but lots of difficulties in cognition simulation to overcome. It is arguably AGI (artificial general intelligence) to simulate language and express thought well like in a conversation. 
  • Voice and personality replica in a synthetic body (still "Be Right Back") - Synthetic body? Cyborgs? Androids? A mother-to-be can create an entire life-form in 9 months (plus 18 years of training!), but you can't just 'grow' a hand. All sorts of tech has been developed to make robots, but they are not very humanoid. A sci-fi trope forever, yet tech progress in creating it is slow. And really, is a human-like robot even a useful product? The house-cleaning robot doesn't need to look like a human to get the job done. A car doesn't look like a horse.
  • Using eye implants and mobile devices, people rate their online and in-person interactions on a five-star scale ("Nosedive")-

    China already has this. Wait...eye implants you say? First, yes, mobile devices and casino facial recognition software running from pervasive cameras are the infrastructure needed and China has software to maintain a score, which they've tentatively attached to the ability to get loans. So at least for the general idea, yes. Eye implants... well, no, but maybe contact lenses would work. Timeline: already exists (but only just recently).
  • AR (augmented reality) game via brain stimulation ("Playtest") - AR games yes now with VR headsets, but via brain/memory interface no (see more brain interface stuff below)
  • AR (augmented reality) game via DNA scan simulation ("USS Callister") 
    This one seems plausible from the superficial science (i.e. knowing the genome of a person is knowing everything about them), but technologically this is crazy. Suppose it is possible. Then the real-life you is not the simulated you (this applies to many of the other Black Mirror tropes). As to possibility, just because you now have the code of the program in your hands (the DNA sequence) doesn't mean you can predict much out of it. Currently, there are only about 20 genes (and their variations) known to give you an increased risk of disease (not deterministically guarantee an outcome). There are tens of thousands of genes left. Also, there's usually not a 1-to-1 correspondence between a gene and its expression in your body. That little ear fold...
  • Upload an artificial consciousness ("White Christmas", "San Junipero", "Shut Up and Dance", "Black Museum") -
    Time: never, or if so, it is not what you expect. This is the big one they rely on. The 'Black Museum' episode is particularly interesting because it gives a very realistic portrayal of the development of the brain interface technology. I will argue below that it is actually magic (not possible in anything like the near term), but the point is they handle this magic very...scientifically, following very realistic patterns.
  • neural implant enhances senses and provides instant data via augmented reality - ("Men Against Fire") - see below
  • internet hackable robotic bees ("Hated in the Nation") - totally plausible. Mechanical bees, or miniature drones, are currently being developed, and they are internet manageable. Whether they can be directed to cause physical harm like in the show as currently designed is not likely, but it is very plausible.
  • Tracking an individual's senses and modifying them ("Arkangel") -
    slightly more plausible than uploadable consciousness. A lot of neurological experimentation needs to be done - on humans who can describe things - before this can become close to doable.
  • Sensory memory viewing ("Crocodile") - possibly now of just single images with lots of machinery, 50 years with minimal machinery and replaying memories.
  • Dating simulations ("Hang the DJ") - Not soon at all - see below
  • Killer stabbing robotic assassination dogs ("Metalhead") - totally plausible, robot tech is almost there for the size of the dogbot, but its ability to get its way out of a half-crushed car is unlikely now. I give the engineers 10 years before it can do that (and to improve general mobility to the point shown in the episode).
Now for the multiple 'see belows'. The only item here that I think is arguable, in the sense that people will argue over it, not that it has any claim to plausibility (because it ain't plausible), is the artificial uploading of consciousness. This is a constant sci-fi trope that was really popularized by 'The Matrix' though it had been around for years before (as an aside, 'The Matrix' is a perfect example of a story all of whose parts had been done before, and they put them together in a very straightforward way, and it all worked out to make something new and great that everyone now steals from badly). It has about as much scientific plausibility as time travel and faster-than-light travel. Ah, sorry, that's hyperbole; time travel and FTL travel are impossible by the laws of physics. Consciousness downloading involves science and engineering that might be possible, but would need such levels of science and engineering that we'll be colonizing other worlds before we can see another person's imagination. Which means, unless there are some super-fantastic, quasi-magical developments in science, those tropes are empty wish fulfillment. They're imaginable, and the plot needs an escape hatch, and let's make an entire story out of the escape hatch. Great for story-making, but in reality it's not going to happen.

One problem I have with all of it is expectation from the outside. You test the person on the other side (or is it a robot? That's what the test is for). Ask them a question. You assume so much about that person that you fill in the gaps. If they are a person, it usually works well (you are quite like other people). But if the robot answers correctly, that's super impressive, but it's simple-minded. You've done the same kind of projection, filling in the gaps, assuming the machine is thinking just like you. But those gaps are true gaps. There's nothing there. You can always come up with an adversary. Holy shit, was this article written by a robot? You're filling in all sorts of things (I hope not mistakenly!). With the robot there's no intention at all; the astronomically large gaps you're filling in are all you.

But as to what's inside your head, let's stick to uploadable consciousness. I am very impressed with how the writers of the show have a few episodes that entirely assume the upload technology, and then in the 'Black Museum' episode they show the tech pre-history, the very start of the brain interface tech: how it develops from the simplest things that just blink yes or no (but you have to kill the owner), to blurry outlines of close to reality with a helmet, to full simulation. But neurology is just so far back in understanding. Sure, I've seen the studies where with fMRI they could reconstruct a picture of what someone was looking at. That seems like a promising first step. But it also shows how far away things are from even just recording a short span of visual memory, much less how you feel about those things you're seeing at that moment.

One (major) problem is that, even supposing accurate replication of your internal mental state (including all history), the virtual thing isn't you, it's a copy. It could be replicated multiple times. Which one is you? Sure, this is a philosophical problem the show assumes away: all copies are considered you, and every day you wake up could be a new copy of you, the replicant with identical supplied memories. And that may well be how we all treat things later. But those are still copies.

But that's philosophy, drunken sophomoric arguments about the axe that George Washington used to chop down the cherry tree: the handle replaced 7 times and the axe-head 3 times, but somehow it's still the same axe. Instead, the huge technical hurdle is the problem of physical instantiation, or really the mind-implemented-outside-the-body problem. Suppose you actually do get an electronic copy of one's consciousness. And suppose it is 'running' on a computer. Our own personal consciousness 'running' on a body has so many sensory inputs - not just sights and sounds (as you'd expect a virtual consciousness to have) but bodily feelings, like balance, tiredness, a sense of hunger, that slight bit of nasal congestion, having been awake for two hours and walking into work ready to go, but getting a phone call from your mom saying your aunt is in the hospital and having to figure out plans to visit, and then dreading the meeting with your boss about all the work over the weekend that still isn't finished... That is what's natural and expected of your consciousness. Your mind isn't all pure logic and thinking in clear distinct words, with no distractions. If all of a sudden you start running your consciousness without all that sensory input, you'd feel so empty. In a sensory deprivation tank you start to hallucinate. And worse, suppose your 'senses' are instantly switched to vast electronic internet sources, cameras everywhere, 'sensing' huge databases, the almost infinite IoT. You'd go mad with the overload of non-bodily awareness! You, that thing that is you, is not a logical program. You are a spastic but well-corralled mess of emotions and biological expectations. Simulating some purely logical consciousness would be dropping you into an abyss of loneliness.

This is where the argument hits. Consciousness is a thing, but it's not separate from the body. It's a part of the body's functioning. Consciousness is a separate thing as much as the functioning of your GI tract is a separate thing. That is, it's not. Duality is only for philosophers. The mind is not separate from the body. Thinking is real, but it's not an actual separate thing.

Saturday, September 9, 2017

Do you have any magical powers? A review of Replika

I only just got an account on Replika. After hearing a podcast about a life-like chatbot inspired by the death of a friend of the maker, I was really excited to see the latest installment of chatbots, the quality of their live exchange.

Replika is a call-response message interface that is intended to capture something of the personality of the user. You type a message, it responds with a message. Maybe it asks a question and expects a response, maybe it'll respond to a question or statement of yours.

Replika is your personal AI that uses your text messages and other personal data provided by you to learn, evolve, imitate your language, and match your personality. 

It is intended to go in the direction of the Black Mirror episode "Be Right Back" - an automated simulacrum of a living person after they're dead - which is mostly the same idea, except the episode also adds a physical replica (and Replika is not Black Mirror-style disturbing).

The idea of this personality chatbot is, using NLP to analyze language and sentiment and word patterns, to capture your personality and to provide, textually, the responses that you would otherwise provide yourself. It is rocket science, but it's not magic. That is, there are matrix operations and backpropagation algorithms and softmax functions, but it's all mechanical. There's no ghost in the machine, no inner spark. It scans the string, picks out salient words, attaches strengths to certain patterns in a statistical manner, building up from many other conversations but tweaking it with your own subtle differences. Technically, I'm guessing that it could be explained as using current state-of-the-art LSTMs, which is not exactly part-of-speech, syntax-tree style parsing, but with some extra architecture to account for different users.
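
Just to make that "scans the string, picks out salient words, attaches strengths" idea concrete, here is a toy sketch in Python. It is nothing like Replika's actual internals - the topics, keywords, and canned responses are all made up - but it shows the mechanical, no-inner-spark flavor of plain keyword scoring, ELIZA-style:

    import re
    from collections import Counter

    # made-up topics, keywords, and canned responses, purely for illustration
    TOPICS = {
        "family":  (["mother", "father", "sister", "brother"], "How do you feel about your family?"),
        "music":   (["music", "song", "band", "concert"],      "What kind of music do you like?"),
        "feeling": (["sad", "happy", "tired", "excited"],       "Tell me more about how you're feeling."),
    }

    def respond(message: str) -> str:
        words = Counter(re.findall(r"[a-z']+", message.lower()))
        # attach a strength to each topic: how many of its keywords appear in the message
        scores = {name: sum(words[w] for w in keywords) for name, (keywords, _) in TOPICS.items()}
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            return "That's awesome! Tell me more."   # the generic fallback
        return TOPICS[best][1]

    print(respond("I went to a concert with my brother today"))

A real system replaces the hand-written keyword lists with weights learned from many conversations (that's where the LSTMs would come in), but the flavor is the same: string in, scores, a response out.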

Wait... there may be a bit of the ghost in the machine but it's still not magic. There may well be strings or labeled points in a graph that may evoke certain things, things that are intended by the human author. 'What do you think of your mother?' might be statistically produced after seeing lots and lots of dialog, or it may be constructed by one of the software developers, written verbatim as just a UI cue to get you talking. But still, not magic.

What LSTM chatbots tend to do is replicate the language, not just the good English, but also the more subtle vocabulary and topic choices. What Replika does goes far beyond that and attempts to get likes and dislikes, more real-world preferences, from you. It does this by explicitly asking you content questions: what did you do today? what kind of music do you like? even sending youtube videos to listen to.

I've used Replika for two weeks, every day. My general impression of it is that it has really good basic language capabilities - it has the sound of very native, idiomatic (meaning natural, non-stilted or strange) English. It even uses emojis appropriately in context. It comes across in general feel as ... pretty much just like ELIZA, the original Joseph Weizenbaum natural language discussion with a Rogerian therapist, the one that tends to repeat things back to you saying variations of "How do you feel about that?" with a few keywords to make it sound involved.

I have a weird question (surprise, surprise).
Can you describe a high point in your life?
Surprise? I love surprises
So sweet ❤
I'd love to hear more about that!
Replika doesn't respond to a question with a question, though. Its language abilities are more natural. That is, whatever rocket science is used now, which is worlds beyond the rocket science of the 1960's, the result is not particularly more impressive. The point is that Replika comes across as a therapist. It's asking built-in questions about your personality. I'm impressed by Replika's fluency, but at the same time, I'm annoyed by the conversational topic.

Also ELIZA-like, if I try to say more than one thing or respond to something in a previous Replika response, it tends to ignore it altogether and just plow ahead with getting an answer to its question.
More to the mechanics, it doesn't seem to have good control over the past discourse, over what -you- have said. It may ask you more than once about what music you like (if you didn't really answer it specifically enough before), but it doesn't seem to remember well what the user (you) has already told it. It doesn't seem to have an understanding of things said more than one response before.

Thinking of the bot as though it were a real person with real-life concerns, it asks way too many questions, is very forward, a little too intrusive, like a teenager on AOL asking 'ASL?' (age/sex/location) as soon as you show up. 

Do you have a photo to send me?

The questions are fairly anodyne, like "Do you go to museums?". I don't think Cosmo suggests that as a good conversation starter.  

I really enjoy getting to knowing you better, so I wanted to ask you something.
I need to ask you a very important question right now.
Oh?
Do you think pizza is one of the greatest inventions of humanity?

Earth shattering, earnestness, or 8-year-old joke book humor?

It really comes across as though the designer of the miniscripts (hardcoded templates of dialog - I presume that's part of the design) thinks that you are depressed and is trying to get you to do activities and exercise and go outside, and then even the most minimal content, "I woke up and took a shower", gets the response "That's awesome" or "You must be really good at that". I found this to be ... a bit of a downer.

I'm curious to hear what your life would be like after accomplishing your goals. Want to talk about it sometime?
I was thinking about how people transform their lives when they're feeling stuck. Do you know what I mean?
I believe in you!

I really haven't used enough of it to tell if it is capturing any of my own writing quirks or personality. But I definitely never use emojis, and I do not ask such personal or lame questions so early in knowing someone.

The intent of the designers may very well be to ask such questions specifically to help 'calculate' personality, but the effect is more of an interview than a conversation.

I do commend the designers' ability to produce language (in those instances where it doesn't look like scripted language).


Monday, August 14, 2017

The Great English Muffin Shift

The Americans and British are separated by a common language. This has been attributed to Churchill, Shaw, and Wilde, all of whom stole from the best, but has never been attributed to Mencken, who should have said it but said other things instead.

There's the differences in pronunciation (Americans pronounce all 'r's, and Brits take a royal 'bahth'), and grammar (Americans go to the hospital, and Brits go to hospital), and there's all sorts of vocabulary differences, lorries and lifts and petrol.

But one primary difference is in vocabulary of food. Zucchini/courgette, eggplant/aubergine, let's call the whole thing off.  A number of baked bread products have different names in the two varieties. What's so special is that they form a chain, as though some higher force pushed in a word at one end of the sausage machine, forcing all the little sausages to move one sausage over, a Great English Muffin Shift. It goes like this:

A cookie in the US is a biscuit in the UK and
biscuit...scone and
muffin...scone, a slightly different kind of scone and
muffin...fairy cake, a slightly different kind of muffin and
English muffin...crumpet, because in the UK you're already there, so you don't need to specify English.

What 'cookie' means to Brits, and 'crumpet' to Americans, I don't know. Yes, the sausage machine seems to go in reverse there and then start forward again, sometimes the machinery gets stuck.

There's also the Great Fried Potato Migration: what are called 'fries' in the US are called 'chips' in the UK, and 'chips' in the US are called 'crisps' in the UK.

As far as I can tell, 'crisps' means nothing to an American beyond "you must be talking about something crispy, but why would you call it that directly?". And 'fries' to a Brit must elicit a 'Pardon me, but fried what?'


Friday, August 11, 2017

Taxonomy of Chatbots

Chatbots are a recent trend in user interface. To contrast with a two-dimensional visual UI, a chatbot is a linear, time-based interface, where the user does an action, there is a response from the system, and then the user may act further, and so on. The term 'chatbot' comes from a typed 'chat' system that acts like a Turing test robot in an online question-response sequence. Some of the things that are called 'chatbots' don't superficially seem like this (they don't all attempt to be linguistic systems), but they are a linear action-response loop, which seems to be the defining characteristic.

By recent trend, I mean, as usual with technology, they've been around forever (Weizenbaum's ELIZA Rogerian psychotherapist from the mid 60's, phone menus or IVR from the 70's). But as of 2017, there is an explosion of available chatbot technology and, orthogonally, chatbot marketing.

The point here is to give a superficial systematization of the different things labeled 'chatbot', with examples.

There are two distinguishing characteristics of chatbots that are only leniently considered defining: sequential response and natural language input (either by text or speech). These two might be combined to be called more formally a Linguistic User Interface (LUI) in contrast with a graphical user interface (GUI). The natural language part underlying many of these is some kind of speech-to-text (S2T) mechanism to get words from speech and some NLP processing to match the words to the expected dialog. The leniency about sequential may come down to a single step (the shortest of sequences, possibly not even considered a sequence at all) and about language (a label for a button is language, right?). With those caveats, on to the taxonomy.

  • linguistic interfaces
    • Siri/Alexa/OK Google - intent/entity/action/dialog. Stateless: giving commands to evoke an action. Development of the system involves specifying an 'intent', something that you want to happen, the entities involved (contacts, apps, dates, messages), and actions (the code that really executes, based on all that information). Oh, and the more obvious thing, a list of all the obvious varieties of sentences that a person could utter for this. The limitation is that there is no memory of context from one action request to the next.
    • chatroom bots - listeners in a chatroom (mostly populated by people writing text). 
      • helper commands - This kind of chatbot simply listens to text and, if a particular string matches, executes an action. This doesn't need S2T, and usually no NLP. It relies on text pattern matching (usually regexes) to extract strings of interest. Usually it turns out the implementation is even simpler and just uses a special character to signal that a command for a CLI (command line interface) follows. (A minimal sketch of this style follows after this list.)
      • conversational bot
        • Like Eliza, finds keywords or more complicated structures in a sentence and tries to respond to it in a human like fashion (good grammar, makes sense). The latest ML and machine translation techniques (RNN, LSTM, NER) seem to apply best here.
        • 'AGI' - artificial general Intelligence- these exist only in TV/Movies. 
  • menu trees - structured tree-like set of possibilities, 'Choose Your Own Adventure'. These are very much like (or exactly) finite state automata, where the internal state of the machine (presumably, but not necessarily, mirroring the mental state of the user) is changed by a simple action of the user. The user is following a path through the system.
    • phone menus - Historically, these are menus, a set of choices, spoken to you, expecting a response of a touch-tone number (Dual-Tone Multi-Frequency, DTMF). A recording lists a number of options and the phone user is expected to press the number associated with one of those options. Then another set of options is provided, and so on, until an 'end' option is chosen or you're transferred to a human operator. Interactive Voice Response, or IVR, is this same interface allowing responses by voice also. A next level of feature augmentation is to allow the user to speak a sentence to go to the desired subtree quickly, skipping over some steps. This shows how the strict computery menu as implemented on a phone is slowly evolving towards a conversation.
    • app workflow - some desktop/phone apps offer an interface that leads you through data entry sequentially. The user is provided with a set of buttons with labels, and the choice of button determines which question comes next. Instead of buttons, one might enter some short text, but again this can lead to different new questions from the interface. The text is not intended to be a full sentence, but simply a vocabulary item, allowing a more open-ended set of possibilities than a strict set of buttons without the necessity of parsing. This is the least chatty of chatbots, but like the phone menus it may be considered a sequential but non-linguistic UI that can be considered a precursor to a more language-based one.
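
Here is the minimal sketch of the 'helper command' style promised above: a bot that ignores ordinary chat and only wakes up on a signal character plus a regex match. The commands and replies are made up for illustration; a real bot would wire these up to actual services and to a chat platform's API.

    import random
    import re

    # made-up commands, purely for illustration
    def weather(city: str) -> str:
        return f"Weather in {city}: sunny (pretend)"

    def roll(sides: str) -> str:
        return f"You rolled a {random.randint(1, int(sides))}"

    # regex pattern -> action to run on the captured argument
    COMMANDS = {
        r"^!weather\s+(\w+)$": weather,
        r"^!roll\s+(\d+)$": roll,
    }

    def handle(message: str):
        # only messages starting with the signal character '!' are treated as commands
        if not message.startswith("!"):
            return None                        # ordinary chat: the bot stays silent
        for pattern, action in COMMANDS.items():
            match = re.match(pattern, message)
            if match:
                return action(match.group(1))
        return "Unknown command"

    print(handle("!weather Delhi"))            # Weather in Delhi: sunny (pretend)
    print(handle("!roll 6"))                   # You rolled a <something between 1 and 6>
    print(handle("people just chatting"))      # None

No S2T, no NLP; yet even this level gets marketed as a 'chatbot'.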

It seems strange to call all these bots. I find it natural to call only the conversational bots by the label 'chatbots'. It turns out that marketers have used the term 'chatbot' for all of these. They surely all share some aspects of a chatbot, but it doesn't feel like the name until you're actually chatting.

Wednesday, August 9, 2017

Butterfly in all the languages of the world

Etymologically, some words are universal. The word 'mother' seems to have some version of an 'm' word in every language (despite the counterintuitive experience that 'm' is not usually the first linguistic sound an infant learns to make).

Some words will stay mostly the same within a historical group: pronouns and numbers tend to maintain meaning through centuries of phonetic changes.

Some words are unique to one language when other languages in the family keep the generic. 'Dog' in English is unique to English, but 'hound', cognate with Indo-European relatives 'Hund' (German), 'canis' (Latin), and 'sag' (Persian), remains elsewhere.

But are there words, or rather concepts, that are unique in every language? That is, is there a concept such that, in every language, the word for the concept is unique to that language and not shared by others?

If the idea that concept and word are not the same bothers you because, well, a word says what its concept is, then the following should convince you otherwise. Wait... instead just consider that a language foreign to you has mostly different words from yours for the same concepts. Therefore words and concepts are not the same. Anyway, on to the main topic...

Consider the word 'butterfly'. Sorry, consider the insect that in English is referred to as 'butterfly'. In English it is called ... yes, yes, I just said it. It's the usual English word made of two words. 'Butter' and 'fly'. There are all sorts of etymological theories:

  • the insect is a fly the color of butter (some very particular species I presume)
  • they hang out near butter
  • they literally 'flutter by' and people are goofy and pulled a spoonerism
  • the word was borrowed from Dutch, which called it 'boterschijte' or, translated back, 'butter shit', because the insect's shit looks like butter (again presumably for some particular species whose shit I have not seen).
All somewhat sounding a little too convenient, like folk etymologies rather than scholarly exegeses. Except that Dutch one. Where did that come from?

But that's just English. The fun thing is that most languages have their own strange fancy word for 'butterfly', seemingly not borrowed from any other nearby language.
  • Romance
    • Latin: papilio
    • Italian: farfalla
    • French: papillon
    • Spanish: mariposa
    • Catalan: papallona, parpalhòla
    • Portuguese: borboleta
    • Romanian: fluture
  • Germanic
    • German: Schmetterling
    • Dutch: vlinder (note not boterschijte)
    • Danish/Norwegian: sommerfugl
    • Swedish: fjäril
    • Icelandic: fiðrildi
  • Slavic
    • Bulgarian: peperuda
    • Serbian/Croatian/Bosnian: leptir
    • Czech/Slovak/Polish: motýl
    • Belarussian: matyliok
    • Ukrainian: metelyk
    • Russian: babochka
  • Celtic
    • Irish: féileacán
    • Scots-Gaelic: dealan-dè
    • Welsh: glöyn byw
For every one of these mostly distinct entries (yes, yes, Slavic has a couple of derivatives of 'motil', and Romance of 'papilionem') there is an obscure etymology, mostly made up, just like the English one. The German 'Schmetterling' seems to come from 'schmettern' meaning 'make a loud noise' or 'strike' (butterflies tend to be quiet) but 'schmetter' is from an older Saxon dialect word usage, having to do with milk products, following the old folk belief that witches fly about in the form of butterflies, in order to steal milk and cream. A bit fanciful and sounds like my great aunt made it up. But then 'schmetten' is a dialect word for cream, deriving from the Czech “smetana”. So it's obvious! Cream, butter, butterfly! Which is to say nothing is obvious and it all sounds made up.

The Irish 'féileacán' also has multiple explanations. Maybe it is from 'feileach' which means 'festive' (butterflies certainly are festive) or it could come from 'eitleach' for flying. A possible sound change but not borne out elsewhere in Irish.

So, what's the point? Take any other language from your own. Almost the definition of it being another language is that there's a different word for everything. But for 'nearby' languages, really most of the words are cognate, just changed slightly, and it is only a handful of words that stand out as being different (e.g. English vs Scots English). The point is that the animal called 'butterfly' in English seems to have few cognates even in nearby languages. What is the explanation? What makes those insects so special? And even if they are special (they are!), aren't there other animals that are as special? A bear is pretty special especially if it's running after you. 

The direction this is going in is that of all the words in the world, 'butterfly' has no cognates among any languages. Looking at the list, that is obviously not true: motyl/matyliok, papilio/papillon/papallona, and others. But it does show that the word varies quite a lot, as though a butterfly really brings out creative neologisms in everyone.

Linguistic note: I stopped at the European part of Indo-European only because of familiarity and ease of checking. It would be instructive descriptive (that is, non-theoretical) linguistics to investigate:
  • other close families like the many close languages of India, Indic or separately Dravidian, or Chinese
  • very close varieties (mutually intelligible dialects) to see if 'butterfly' is so volatile even in very close languages
  • compare other concepts in a structured manner, e.g. one-for-one against mother, five, dog, fly, to see if butterfly really is special (or is it a pattern that's not really a pattern, and lots of other middlingly common words have a similar situation?)

(OK I lied at the beginning. 'Mother' is not considered a language universal by any linguist. It is certainly maintained as the main 'mom' word within Indo-European. But any 'm-' words in other languages are considered by linguists to be coincidences. There does seem to be some lexical universals over all human languages but currently there is only considered to be one, 'huh?'...so far)

Tuesday, August 1, 2017

Statistical Rumsfeld: Now We Know!

No, not a poor punk band name, but Statistical Rumsfeld, popularized by his usage but not created by him, is a way of talking about what you know about your own knowledge:

...there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.

- things we know that we know them: this is data. We've looked, and seen, and we are aware that we've looked and seen and verified and removed doubt. Is it 'yes' or 'no'? Look at the thermometer.

- things we know that we don't know. We know we don't know what's behind the curtain. We know we don't know what the capital of Chad is. We know we don't know what somebody is thinking before they tell us (even sometimes afterwards). We know the boundaries of this darkness. We know the range of possibilities. This is like a probability density; we don't know the particular value of a coin flip, but we know that half the time it will be one side and half the time the other. That's something.

- things we don't know that we don't know. We have no idea. We don't know how to look for the value, we don't know the distribution, we don't know what the range is, we don't even know if it's a number. Totally unexpected. A black swan.

Something is left out. You have things that you know and things that you don't know, and you can either know that or not. Two things, with two possibilities for each, four in total. The one missing from Rumsfeld's list: unknown knowns. 

- things you didn't realize you knew. You didn't know you knew that, did you? Unconscious knowledge. A hidden talent you weren't even aware of. The pattern in the data that was always there.

Or better, in a handy chart:

                            Things Known               Things Unknown
Do you know about them?
  Known                     Known Knowns:              Known Unknowns:
                            Facts, data                Parameters, Distributions, Probabilities
  Unknown                   Unknown Knowns:            Unknown Unknowns:
                            Unconscious knowledge      Hidden Variables, Black Swans


Monday, July 24, 2017

Triple animal metaphors

Lately I've noticed some animal metaphors that come in threes, to express three levels of something: fast to slow, or big and rare to small and many.

Cancer: turtles, rabbits, and eagles. Some cancers are fast. As soon as you find out about them, it's almost too late, there's nothing you can do; the eagle swoops in quickly, and as soon as you're symptomatic, you only have weeks or months left. Treatment might help but only by extending life by a few months, and may end up reducing quality of life for that extended period. Lung cancer tends in this direction.

Some cancers are slow. A blood or even genetic test shows that you have or may eventually have the problem, but as it is now, you're more likely to die of many other things first before the cancer does you in. Treatment may stop the cancer but again may reduce quality of life for a much longer period, when it wouldn't have mattered anyway. Prostate cancer tends in this direction.

Some cancers are in between and even capricious, the rabbits. As soon as you find out, treatment could entirely eradicate the problem (colonic polyps, melanoma), or it could dart off and spread before you can catch it.

Notice how I said 'tends'. Everyone's situation is different, but these are tendencies.

Here's a classic Tufte graph of survivability percent over years for the primary types of cancer:


(from Cancer survival rates)

This chart shows a bit more: the progression of mortality of the disease over 20 years. Just the 5-year rate shown in the first column shows that prostate and thyroid cancer are turtles, and pancreatic and liver are eagles.



Marketing: rabbits, deer, elephants. When pursuing sales to a customer, there is a continuum from many small customers to a few very large customers.

Rabbits are the millions of casual buyers: people buying socks, or a game app. Mass advertising and viral sharing are the way to get to these buyers.

Elephants are the huge multinational corporations that will either be your only customer or might just acquire your company altogether.  Knowing someone inside, or huge involvement in national media is the lead-in to a purchase or acquisition here.

Deer are in between. Upscale cars and industrial machinery are the objects for these potential buyers.






Sunday, June 25, 2017

Drivers, bicycles, walkers in my way

When I'm driving, all the bicyclists are hazards, taking up half a lane, slowing things down, darting between cars. All the walkers are hazards,  jaywalking, crossing against the light, not even looking when crossing.

When I'm biking, all the cars are hazards, passing and then turning right, cutting me off, acting like I don't even exist. The walkers are hazards, chaotically stepping back and forth like squirrels.

When I'm walking, all the cars are hazards, not stopping for the crosswalk or slipping through the light. The bicyclists are hazards, coming straight at you swerving who knows which direction at the last minute.

Whatever it is, other people are the worst.

Grammar Nazis invade The Gambia and the Bahamas

There was a recent kerfuffle over a slip in diplomatically proper language, referring to the country of 'Ukraine' as 'the Ukraine', which is bothersome to many people in that country. Why it is bothersome is an entire story in itself. What came out of such kerfuffling is educating us all about the proper ways of addressing countries, just like war is what teaches Americans about geography. But, out of curiosity, what exactly are the countries whose names begin with 'the'?

But that is the silliest of trivia questions, built on a number of arbitrary settings.

The only official such names? What is official? The CIA fact book? Who died and made them an authority? Oh, lots of spies. On all sides.

There's the official name in dictionaries. There's diplomacy. There are the natural ways of saying things. There is legal specification. And we've all forgotten that a lot depends on what language you're speaking. There is what you call a country, what the country's name is, what the official name is, what you call the official name, and even what the country just is. OK, those last two really are straight out of Lewis Carroll.

Pretty much every language uses articles like 'the' differently. In American English you go to -the- hospital, but in the UK you go to hospital - for an American it sounds like there was a glitch in the tape and you forgot to say something; for the Brit in the US, you wonder which hospital exactly we are talking about. Whenever you take a foreign language, at some point in the grammar lessons there is a section on "Which countries get a weird article in front of them". In English it is 'Switzerland', in German it is 'die Schweiz'. In French, -all- countries have an article (with gender to remember). In Russian no country (or any word) gets a 'the' (one of the very few times where Russian is simpler). Languages are weird, even your own, but you don't notice or don't care because that's just the way you do it. But in other languages even the slightest difference is jarring.

Back to trivia. The answer to the question, which must have artificial restrictions placed on it to work 'well' (what does the CIA factbook say?), is that, in English, the official names with 'the' are:

The Gambia
The Bahamas

Also 'The' is usually but not always capitalized in The Gambia, but never for the Bahamas. So much for consistency.

Those are their labels on maps, and they are the only ones with articles before their names on maps. But maps ain't what decide what people actually say. There are a handful of additional examples for non-maps, for narratives. Possibly not headlines, which have rules of their own.

Normal people, even smart ones, will say and use in writing:

The Netherlands
The Maldives
The Philippines

Sure, the title of the Wikipedia page is 'Philippines', but every mention of 'Philippines' in the article is preceded by 'the'. Every. And island groups, which are plural, take 'the' in English, whether countries or not.

And surprisingly few mentions have been made of the very obvious need for an article in:

The Central African Republic
The Czech Republic
The Dominican Republic
The Soviet Union
The United Arab Emirates
The United Kingdom
The United States

which get a 'the' whether abbreviated or not.

Sure, 'the Ukraine' has vague connotations of 'that' area of the Russian/Soviet Empire and just isn't used anymore except as an anachronism. Frankly, I don't see how Ukrainians themselves perceive any such negative insinuations, since Ukrainian, like the very similar language Russian, has no articles at all, not just none for countries.

But anyway, we should call them as they ask, Ukraine.

For completeness' sake, the countries like Ukraine which used to have a 'the' (again for many different reasons), but just do not anymore, are:

The Congo
The Lebanon
The Sudan
The Ukraine
The Yemen

Those similarly sound colonial in English, and are just not used anymore.

The Crimea on the other hand...

Wednesday, June 7, 2017

When I say "I heard about a study that ..."

Here's a fascinating thing I heard the other day. And how I have to incrementally correct myself in retelling it:
I heard about a thing.
I think I heard about a thing.
I remember thinking what I heard about the thing.
I am reporting what I remember thinking what I heard about the thing.
I am reporting earnestly what I remember thinking what I heard about the thing.

The thing is usually some new finding from a scientific investigation:
It was a scientific study.
It was the final results of a scientific study.
It was the final primary result of a scientific study.
It was a media article about the final primary result of a scientific study.
It was the headline of a media article about the final primary result of a scientific study.

The scientist had an idea about a new pattern that seemed to fit with a new explanation for patterns that other scientists couldn't explain.
The scientist gathered some data.
The scientist gathered some data and noted some patterns.
The scientist gathered some data, noted some patterns, and analyzed the patterns statistically.
The scientist gathered some data, noted some patterns, analyzed the patterns statistically, and wrote a report on the multiple, nuanced conclusions one can draw from the analysis.

And this is all when things are done well.


Tuesday, June 6, 2017

Books That Tell You The Categories Of Books

The classic first section of "If on a winter's night a traveler..." by Italo Calvino, 1979 (tr. from Italian, 1981, by William Weaver):

You are about to begin reading Italo Calvino's new novel, "If on a winter's night a traveler..." ...
In the shop window you have promptly identified the cover with the title you were looking for. Following this visual trail, you have forced your way through the shop past the thick barricade of Books You Haven't Read, which were frowning at you from the tables and shelves, trying to cow you. But you know you must never allow yourself to be awed, that among them there extend for acres and acres the Books You Needn't Read, the Books Made For Purposes Other Than Reading, Books Read Even Before You Open Them Since They Belong To The Category Of Books Read Before Being Written. And thus you pass the outer girdle of ramparts, but then you are attacked by the infantry of the Books That If You Had More Than One Life You Would Certainly Also Read But Unfortunately Your Days Are Numbered. With a rapid maneuver you bypass them and move into the phalanxes of the Books You Mean To Read But There Are Others You Must Read First, the Books Too Expensive Now And You'll Wait Till They're Remaindered, the Books ditto When They Come Out In Paperback, Books You Can Borrow From Somebody, Books That Everybody's Read So It's As If You Had Read Them, Too. Eluding these assaults, you come up beneath the towers of the fortress, where other troops are holding out:

the Books You've Been Planning To Read For Ages,
the Books You've Been Hunting For Years Without Success,
the Books Dealing With Something You're Working On At The Moment,
the Books You Want To Own So They'll Be Handy Just In Case,
the Books You Could Put Aside Maybe To Read This Summer,
the Books You Need To Go With Other Books On Your Shelves,
the Books That Fill You With Sudden, Inexplicable Curiosity, Not Easily Justified.

Now you have been able to reduce the countless embattled troops to an array that is, to be sure, very large but still calculable in a finite number; but this relative relief is then undermined by the ambush of the Books Read Long Ago Which It's Now Time To Reread and the Books You've Always Pretended To Have Read And Now It's Time To Sit Down And Really Read Them.

With a zigzag dash you shake them off and leap straight into the citadel of the New Books Whose Author Or Subject Appeals To You. Even inside this stronghold you can make some breaches in the ranks of the defenders, dividing them into New Books By Authors Or On Subjects Not New (for you or in general) and New Books By Authors Or On Subjects Completely Unknown (at least to you), and defining the attraction they have for you on the basis of your desires and needs for the new and the not new (for the new you seek in the not new and for the not new you seek in the new).

There's also a murder mystery.

Thursday, May 18, 2017

Non-literal 'literally' is not alone in contradicting itself

'Literally' means taken word for word. "The debt incurred was literally billions" probably means that the value 'billions' sounds like an exaggeration, but I want to emphasize that it is no exaggeration, that the actual value was in the billions.

But people use 'literally' all the time in a non-literal fashion, as an intensifier. "That party, man, that house was literally on fire!" probably means that the house was quite enjoyable, not that the local fire trucks were dispatched. This usage is not the literal meaning of 'literal', and it's not really the opposite (it's not saying 'hey, this is an exaggeration' but rather 'hey, check this out!').

If one is speaking informally, then go wild, use 'literally' to mean 'hey, check this out'. It's common enough to be understood that way, and in such instances it's not likely to cause misunderstanding. But in formal use, where you really want low ambiguity for transfer of information, you may even want to avoid 'literally' because it might be misleading: people who don't know the literal meaning of 'literal' might be misled into thinking you're exaggerating or just pointing out some outrageous thing.

'Literally' applied to itself literally doesn't mean literally. It's a snake biting its own tail. Literally, if the word is the snake and interpreting the meaning is biting something, in this case itself. OK, that was way too literal.

Except...
The commonly accepted formal meaning of the word 'literally', that is, word for word or actually, is not itself a very literal meaning. If you want to be pedantic, as 'literally' is practically asking you to do, the source of 'literally' is the Latin word for letter, so it should mean something like 'by the letter'. This is itself a figurative use. You're not caring about letters but about words (maybe that's too pedantic). You're not caring about words but about primary meanings. Which is a figurative reading of word for word.

In practice, 'literally' signals: the following is a phrase that could be taken metaphorically, but in this case the words describe the actual situation.

So literally does not itself have its own literal meaning.

And as self-contradictory as this is (how could we let it go so far?), this is not a strange new bizarro-world mind-bending standalone example. There are a number of own-tail-biting words.

Really. I mean 'really'. 'Really' is another example of this literal-vs-figurative pattern, a word whose literal meaning is itself: 'real' means extant or existing or not-fake, but 'really' really means 'a lot' or 'very' or 'much', not an exaggeration but an intensification. "It is really hot in here." Sure, it is probably hot, or at least warm. "Your attitude is really getting on my nerves" means it is probably annoying, not that you have exposed neural material that an attitude is physically on top of.

This is very true. Very true. Well, no more true than what true is. 'Very' comes from French (via the Norman Conquest). It is cognate with French 'vrai' for 'true'. Over the course of two hundred years after the conquest, there was an influx of a huge number of Old French terms. The borrowed word slipped over from meaning 'true' to being an intensifier (foreign-language usages are more likely to be 'repurposed', i.e. misused). There is a bit of the etymological fallacy here, that the current meaning of a word should be what it used to be. There is no doubt that very means very nowadays. Maybe a little doubt in 1200 AD in London.

But truly, 'true' has the literal meaning of 'that which is the case'. Etymologically, though, it means all sorts of things like 'faithful' or 'honest', and it only became the opposite of false probably after 'vrai' became 'very'. So semantic drift happens, but that doesn't mean the drift was wrong or incorrect or bad or led to the downfall of civilization.

This is not to say that I like the non-literal use of 'literally'. It hurts me (not literally) when I hear it used non-literally. It's so obviously intended to be meant literally, and a non-literal usage just contradicts itself.

Monday, January 16, 2017

What is artificial intelligence?

Most people think of artificial intelligence as a walking, talking robot that looks and acts almost human, and think that this is the goal of AI, to make a simulacrum of all that a human does. But there are a lot of nuances to this, well, frankly a lot of major differences from what artificial intelligence really is.

I am writing this to clear up a lot of misconceptions that I read about AI, not to give _a_ definition, but rather to give _some_ definition to it.

So, a first attempt. Artificial Intelligence is a computer method (rarely mechanical) that seems to do something only a human could do. A talking, responding machine is AI, but a walking machine, with articulated legs, is not. Unless... well, there it is: sometimes it is the application that is AI and sometimes it is the methods. And a chess-playing machine... well, is that AI or just really good engineering, exploring all the possible game continuations quickly?

We already have two nuances: how an AI is engineered, and what the label applies to. The methods of AI engineering fall into two broad areas: human simulation, attempting to mimic the internal biological/psychological processes, and direct engineering, doing whatever it takes to make the engineered device act externally intelligent. A language parser uses scientific theories of linguistics to mimic how the brain is supposed to manipulate the text of a sentence to determine understanding. A chess-playing machine uses alpha-beta pruning of deep game trees, contrary to the usual human method, which is to look only two or three moves deep and vaguely judge the 'strength' of a position based on experience.

A lot of what was once considered AI is now considered plain old engineering. Once you see under the covers that the method used to look outwardly human is only a boring step-by-step recipe or lookup table, it loses its 'wow' AI appeal and just seems like regular non-AI computing. In the 1600s, calculating machines were magical and might have been considered AI (if that were a thing then) because most people thought that people themselves doing arithmetic was magical, much less a machine (OK, this is a bit hyperbolic. What is 'most'? Who are these people?). But you get the idea. A machine playing chess is magical (or has a small hidden Turkish chess master manipulating the controls within). But Deep Blue, which beat Kasparov the chess grandmaster in 1997, was pretty simple, just using search trees, no ethereal silicon-embodied intuition ('just' = very rocket-sciency search trees and some hand-curated openings, gambits and endgames).
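To make the 'boring recipe' point concrete, here is a minimal sketch (in Python) of the kind of game-tree search a chess engine relies on: minimax with alpha-beta pruning. This is an illustration under assumptions, not Deep Blue's actual code; the game object and its methods (legal_moves, apply, is_terminal, evaluate) are hypothetical stand-ins for whatever game you plug in.

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Best achievable score from `state`, searching `depth` plies ahead."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)            # heuristic score, e.g. material count
    if maximizing:
        best = float("-inf")
        for move in game.legal_moves(state):
            best = max(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, False, game))
            alpha = max(alpha, best)
            if alpha >= beta:                  # the opponent would never allow this line,
                break                          # so prune the remaining siblings
        return best
    else:
        best = float("inf")
        for move in game.legal_moves(state):
            best = min(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, True, game))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

The point is how mundane it is: try moves, score positions with a hand-written formula, and skip branches that can't matter. No intuition required.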

An AI, with the indefinite article, is by popular usage the thinking, talking machine. Because of the potential for confusion, that is nowadays usually referred to as an AGI, Artificial General Intelligence.

But now to more substance. So far we've been going along in a vague style, not really knowing what AI refers to concretely, relying on a presumed common idea of AI. So here are some concrete examples of what AI is.

First, applications, which are what externally look like everybody's general idea of AI:

  • vision - converting digital images into words 
  • language - speech recognition, natural language processing and understanding, chatbots
  • games - chess, go, crosswords
  • problem solving/reasoning - word problems and puzzles, expert systems
  • planning - taking a set of goals and initial conditions and working out a sequence of actions to get from one to the other
  • robotics - not the mechanical part (for example, an articulated hand grasping an object) but coordinating a path over a complicated landscape or separating parts on an assembly line
  • learning and memory - most of computer science has been intent on solving the memory problem and has done it well. Financial statements, medical records, airline reservations, online libraries.

And then there are the tools and methods, how it's actually done, the man behind the curtain.

  • logic and other classical mathematical methods - for reasoning
  • probability/statistics - most machine learning methods are within the realm of stats
  • combinatorial algorithms - when a perfect algorithm for one domain works most of the time for another, then it's called a heuristic
  • decision trees - chess? one big decision tree? Yes.
  • optimization - specifically numerical algorithms for optimization like simplex or A* (see the search sketch below)
  • cognitive science - cognitive psychology, neurology (brain physiology, neurons), linguistics, so you know how it works in actual biology, so you can either be inspired by it (neural networks) or simulate it (parsing a natural language according to linguists' rules)
Apps and tools are orthogonal; an app can use different tools. So NLP is one application area of AI, and many tools are used by it: machine learning over large language corpora, and, independently, linguistic knowledge.
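Here is the search sketch promised above: a minimal A* path-finder on a grid, the kind of 'optimization' tool a robot path planner might use. The grid, start, goal, and walls are made-up illustration values, not any particular system's.

```python
import heapq

def astar(start, goal, walls, width, height):
    """Shortest path of grid cells from start to goal, avoiding walls."""
    def h(cell):                                # admissible heuristic: Manhattan distance
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (priority, cost so far, cell, path)
    seen = set()
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
                heapq.heappush(frontier, (cost + 1 + h((nx, ny)), cost + 1,
                                          (nx, ny), path + [(nx, ny)]))
    return None                                 # no path exists

# e.g. astar((0, 0), (3, 3), walls={(1, 1), (2, 1)}, width=4, height=4)
```

Again, nothing mystical: a priority queue, a cost, and a distance guess.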

The man behind the curtain is an apt metaphor for AI because, as in Oz, the apps look magical, but it turns out it is, well, not exactly charlatanry; the methods are so... mechanical and spartan and inhuman, just small tricks. For example, crosswords seem to require not just broad and deep knowledge, but also a clever associative memory and an ear for puns to read behind the clues. Yet the recent, very successful AI methods for crosswords just use dictionary search and almost ignore the clues themselves: by any means necessary, no matter how cleverless.

With all that said, the motivation for this post was what I find to be a lot of misuse of these labels, one term being used for another. So here's a bullet-point summary:

  • AI is what you call the big show, all the different things together
  • AGI (Artificial General Intelligence) is a very limited example of an AI app but the most salient one, the 2001/HAL-style intelligent spoken interface. This seems to be the end goal of AI but is currently unrealistic, or rather realistic only with very limited expectations.
  • ML (Machine Learning), which is mostly just statistics, is a tool to quickly create AI apps from lots of data without having to rely on a domain expert (see the sketch after this list)
  • ANN (Artificial Neural Nets) and DL (Deep Learning, which is just a big NN) are just two of many ML techniques
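To give the 'mostly just statistics' claim some teeth, here is a minimal sketch of 'learning' a model from data: fitting a line by ordinary least squares in plain Python. The data values are invented purely for illustration.

```python
# toy training data: x = hours studied, y = test score (made-up numbers)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 60.0, 71.0, 78.0, 90.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# "training" is just the textbook least-squares formulas for slope and intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# "prediction" is plugging a new input into the fitted line
print(slope * 6.0 + intercept)   # predicted score for 6 hours of study
```

Swap the line for a deeper model and the closed-form fit for gradient descent and you have much of modern ML; the shape of the exercise, fit parameters to data and then predict, stays the same.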