Friday, December 16, 2016
Practicality vs Esthetics in DataViz
Sometimes we want something that is esthetically pleasing and superficially practical but not necessarily perfectly practical, like a watch with no numbers on its face. There, the esthetics are the point.
That's all a philosophical diatribe to justify complaining about something that bugs the crap out of me.
Basically, wordles are the worst. And periodic tables (except for The Periodic Table, which is the best). And usually subway diagrams (except for actual city subways). Here's one for semantic web technologies:
This! This is the worstest.
If you know what each of the entities is, and you make all sorts of qualifications, maybe this makes sense a little. It makes only slightly more sense if 'A above B' means 'A is built on B'. But then there are all sorts of 'If you did that, then why did you do that?' questions (why are encryption and signature off to the side for only some layers, why are logic and proof separate, is Unicode really such a huge important base technology, etc. etc.). Wait, isn't a namespace a particular kind of URI? There are many variations on the 'Semantic Web Stack', but each in its own way has all these "I don't get why they did that" problems. This is all about esthetics (nice color combo!) and has little to do with imparting coherent information. No, you will not learn anything from this. Wait... what the hell is 'signature'?
Trust vs Depend
"I can't trust him to complete the project on time"
'Trust' is used in two different ways. One is about truthfulness, the usual opposite of falsehood. If you can't trust them, they are a liar. This presumes intent and is almost demonizing.
The other way is dependability. Trust of the outcome. If you can't trust someone this way, it's not a reflection of their evil intent but about ability to execute. This is very different from falsehood. You can actively do something about this.
With the first kind, all you can do is seek other sources of information.
So instead of 'trust', use 'depend on'. 'Trust' makes it sound like you think they're lying. 'Depend' just means there is doubt, without judging.
Friday, November 18, 2016
Effect size versus statistical significance
This difference is often presented laconically ("Using Effect Size—or Why the P Value Is Not Enough") as:
Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them. (Kline RB)
This makes it sound like you have two things that can be presented, and one is much more important than the other. But it's a false dichotomy. You want both. The magnitude is descriptive stats - how big it is. In an experiment on n individuals, fish oil tablets increased memory performance by 10%. If you don't know the effect size, what exactly beyond 'better' do you know about the phenomenon? Statistical significance is trust - how (mathematically) representative the sample is of the population. You can claim something is better but can you really trust the claim?
It's very easy to see how to manufacture a high statistical significance but low effect size - increase the number of instances. In fact, as you increase n, almost all statistical tests asymptotically approach statistical significance (for real world phenomena). Chi-squared is the worst!
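Here's a sketch of that manufacture (simulated data; the 0.02-standard-deviation 'true' effect and all the numbers are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A tiny but real effect: two groups whose means differ by 0.02 SD.
for n in (100, 10_000, 1_000_000):
    a = rng.normal(0.00, 1.0, n)
    b = rng.normal(0.02, 1.0, n)
    t, p = stats.ttest_ind(a, b)
    d = (b.mean() - a.mean()) / np.sqrt((a.var() + b.var()) / 2)  # Cohen's d
    print(f"n={n:>9,}   effect size d={d:+.3f}   p={p:.2g}")
```

The effect size hovers around 0.02 at every n, but the p-value collapses into 'significant' territory once n gets big enough. The significance is real; the effect is still trivial.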
A consistently high effect size (over samples) obviously leads to high statistical significance.
But it is also possible to have a high effect size and low significance: a big difference seen in a sample too small to rule out luck.
So in the end, it is not one or the other. Both should be presented. The effect size tells you how big the phenomenon looks in the sample, and the p-value tells you how much you can trust the sample that showed it.
Wednesday, November 16, 2016
Annoying Sciency Tropes: Big effing number
The yearly output of carbon dioxide gas into the atmosphere is 50 bajillion tons. Wow, that must be bad, because a bajillion is a lot. (Also, 'tons'. You can have a ton of air? Of course you can, that's physics, but it is counterintuitive enough to simply leave the reader with the incoherent feeling of 'wow'.)
The number of deaths due to the Iraq War of 2003 was approximated at 600,000. Of course that is terrible (any such death is terrible). But is it reliable? Is the scale right? How was the number arrived at? What groups are in that number? Is it overcounted? Undercounted? Adding a zero hardly changes the impact of the story but is still wildly inaccurate.
Million, billion, trillion are hard to distinguish. They're mostly 'really a lot', 'really really a lot', 'that sounds like a lot'.
I realize I'm giving these without context, but the point is that often news stories lack all context too.
There's a little bit of technical obscurantism going on (is a nanometer bigger or smaller than a picometer?), which presumes education; that is, it is questionable whose fault this is, the one using the technical term or the one reading it. If the reader is educated, this is the best, most accurate communication, what technical language nuances were created for. If the reader is not educated in these nuances (which are not nuances to the initiated), then what?
Part of the annoyance is that this is usually combined with a Base Rate Fallacy: usually no comparison data is given, no comparison with the total or with comparable items, no context. For example, the debt of the US government is given in news stories as $14 trillion (the latest number). Obviously this is a big unfathomable number, but there is also nothing to compare it with, neither the historical trend (what the debt has done over the past few years) nor what the debt of other countries is like.
What's the solution? For the reader, look outside the article for the base rate or trend. For the writer, supply that! Give something to compare with.
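For the debt example, one or two divisions is all it takes (the population and GDP figures here are my rough approximations, just to show the move):

```python
debt = 14e12            # the $14 trillion from the news story
population = 320e6      # rough US population, approximate
gdp = 18e12             # rough US GDP, approximate

print(f"per person: ${debt / population:,.0f}")        # about $43,750 each
print(f"vs GDP: {debt / gdp:.0%} of a year's output")  # about 78%
```

Two lines of arithmetic turn 'unfathomably big' into something a reader can actually weigh.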
Tuesday, September 27, 2016
"Probabilistic programming languages" aren't
As someone who likes a little consistency in language use, for words to have meanings you can mostly rely on, I am bothered by this usage (just as I'm bothered by the similarly mystically enticing marketing term Deep Learning). Here is a very representative description of PPLs:
It's from a blog article on PPL, "Probabilistic Programming (PP)", which also tries to introduce new but uninformative terminology, MPML:
There’s a revolution in Computer Science called Probabilistic programming (PP) where programming languages are now built to compute with uncertainity in addition to computing with logic. This means that existing programming languages can now support random variables, constraints on variables and inference packages. Using a PP language, you can now describe a model of your problem in a compact form with a few lines of code. Then an inference engine is called to automatically generate inference routines (and even source code) to solve that problem. Some notable examples of PP languages include Infer.Net, Stan, BUGS, church, Figarro and PyMC. In this blog post, we will access Stan algorithms through the R interface.
I expect words to mean things, and despite liking metaphorical usage in literature and expository writing, not calling a technical thing what it is sounds too much like slimy obscurantist marketing practice. If it is misleading in any way, it is suspect. Suspect maybe not in venal terms, but more likely suspect in intellectual depth.
For the record, the difficulties in the passage above are:
- There's no revolution, not in computer science, not in programming languages, not in AI. Maybe there's some recognition that there is some progress in usage, but it is incremental.
- No new programming languages are being built. No existing programming languages are being modified to accommodate new probabilistic data types. This is the biggest clunker. There's no new programming-language thing at all. What is new is packages, libraries, and functions in existing programming languages. PyMC is a library written in Python and used in Python as native Python (see the sketch after this list). Stan is written in C++, but it is not a new syntax/semantics, just a library accessible from existing languages (R, Python, Matlab, Julia, etc.).
- The idea of operating on distributions as a type is not actually new. Mathematica and Maple have had object-oriented implementations of distributions, allowing functional operations on them. What these PPL packages add is approximation algorithms that compute values for Bayesian inference using Markov Chain Monte Carlo (MCMC), which is fancy talk for calculating a number approximately. Pretty much analogous to computing a p-value.
- All these PPLs are just library add-ons to existing languages. So in that sense don't worry that you have to learn a new syntax. You surely will have to learn how to use the library.
- It's not about probability in the large. Most languages have probabilities already (restrict floats to the range 0..1). Some people are creating packages that make it easier to use probability distributions (which some languages already had libraries for), to manipulate those distributions, and to make statistical inferences from them. But, no, it's not a revolutionary new alternative to languages with logic. It might be a revolutionary library of functions that makes manipulating and computing with distributions and models easier, but it's not a new language.
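To make that concrete, here is a minimal sketch of the kind of thing these packages do, using PyMC3 (the model and numbers are invented; the point is that every line is ordinary Python):

```python
import numpy as np
import pymc3 as pm            # a library import, not a new language

data = np.random.binomial(1, 0.7, size=100)   # made-up coin flips

with pm.Model():              # an ordinary Python context manager
    theta = pm.Beta('theta', alpha=1, beta=1)      # prior: a Python object
    pm.Bernoulli('obs', p=theta, observed=data)    # likelihood: another object
    trace = pm.sample(2000)   # MCMC: approximate the posterior numerically

print(trace['theta'].mean())  # the 'inference' is, in the end, just a number
```

No new syntax, no new semantics; just function calls. Which is exactly the point.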
Thursday, December 31, 2015
What's with dictionary definitions for metaphorical usage?
I have noticed some online dictionaries giving metaphorical definitions. By this I mean that for a word, they give a meaning entry that is metaphorical, not its literal meaning.
For example, 'to devour'. Without checking, this means to eat ravenously. But it's easy to see that, say, a paper shredder could be said to devour some documents.
You may well note that many words, rather most words, really almost all words have multiple meanings (except for highly stipulated technical terms, and even then things can get loose). Our perception is usually that a word has one meaning and that's that. But then we notice that, well, that same spelling can be used for more than one distinct concept, usually nearby.
You may then well note that for many words, there really is a primary meaning: its meaning out of context that everyone thinks of first, and then secondary meanings, ones that appear in different contexts, that are slight extensions of the primary meaning, or used in analogous situations, not literally.
Here is the example for 'devour' from google:
de·vour
- eat (food or prey) hungrily or quickly.
- (of fire, disease, or other forces) consume (someone or something) destructively.
- read (something) quickly and eagerly.
The first is the primary definition, the second a metaphorical one, the third... huh? That is definitely not what 'devour' means. Sure, one can easily use it in 'I devoured the sequel', meaning that I read the sequel quickly and eagerly. But that's not the meaning of 'devour'. That's not what 'mean' means. It's too specific. Does the omission mean you can't watch a movie voraciously? How come 'reading' is more devour-like than other metaphorical uses? This isn't right! If you include read, you should include every other possible metaphorical usage. But of course that is too laborious to imagine.
The difficulty I'm having is the demarcation line. When does a reasonable metaphorical usage of a word become dictionary-entry-worthy?
Taking the title word 'incensed' (which was not deliberate): its primary and only definition is around 'angry', with no mention of the ostensible literal meaning, which might have been 'burned like incense'. It already is a metaphor; the only definition is non-literal. So putting in metaphorical usages is sometimes necessary. At what point of semantic drift, at what point of leaving the original, does a dying metaphor become dead, and at what point does the altered meaning move from a quantitative difference to qualitatively requiring a new entry?
A close analogy is with suffixes. You can take any word in the dictionary and find some suffix that applies that will create a perfectly good word. 'Neologistically' is my favorite. 'Neologism' to 'neologistical' to 'neologistically'. Probably not in any dictionary, but perfectly understandable, sounds like a word, and is (arguably) undeniable as a word. Does it need to be in a dictionary? At what point do lexicographers decide not to include a possible variant?
There are a number of possibilities. Checking multiple dictionaries, most don't have the strange 'read' entry, only Google and Macmillan. What I suspect is that when entries are edited by humans, there is a tendency to require definitive alternate usage before an additional entry is made, and that Google and/or Macmillan introduce metaphorical entries mechanically, where it's easier to be lenient. The latter two dictionaries certainly still need human oversight; that is, the 'read' entry isn't a mistake, just a lower threshold.
This will require looking into the editing policies of the various dictionaries.
Wednesday, September 23, 2015
Where is the universal electronic health record?

(image: http://www.theplaidzebra.com/first-manned-mission-to-mars/)
Friday, September 4, 2015
The Turing Test - like magic!
Clarke's Third Law: Any sufficiently advanced technology is indistinguishable from magic
It is great fast thinking on Turing's part: go quickly to a workable solution, cut out lots of junk rationalizations, don't concern yourself with the infinite hypotheses about the underlying processes, just go for the jugular of what you have, the surface behavior and its believability.
But frankly it is no different from bald anthropomorphism: if the animal acts like a human, it must be human-like more deeply. And the lesson is that reasoning this way is usually not very successful. (Contrarily, a subject for another time: I think many vertebrates share many cognitive abilities with humans; and also contrarily, some behavior that is usually considered special human intelligence may have very low-complexity biological mechanisms underlying it.)
Not only is the Test the basis of countless scifi plots, but also countless dumbed-down explanations of artificial intelligence machines.
If it acts like a human then it -is- a human.
Basing success on limited explicit experience rather than looking behind the curtain and seeing the design? That is just plain idiotic. It is a denial of common sense. The true test of whether something is artificial or human is to look behind the curtain, to look inside the black box, to see how it is designed. The design is the thing that should be judged, not the paltry examples.
Finite behavior doesn't define essence. The essence defines essence. Sure, there's a lot more to it: the rules create the instances, and the anecdotes are telling, but it's all the possibilities that are relevant, not just a small handful of instances.
A counterargument might be that telling essence is not the point, and that knowing essence is not available: experience is finite and is sometimes all that can be known (you can't always look inside the black box).
Here are two analogies that express my point: generating genre texts with ngram probabilities (using Markov models or deep learning), and generating biological objects using fractals. Here's an example of generated text (from Paul Masurell):
Pride and Prejudice, Jane Austen.
I do not. It is very much at Pemberley. The idea of their all walking out. I must acknowledge to you. When I do not marry Mr Collins had promised herself. But you have the carriage might be copied. It would look odd to be proud. You are perfectly good. Elizabeth was at the sight of Miss Darcy was delighted. You have no objection to my charge. I know not. Lydia was urgent with the keenest of all. Mr Collins, you puzzle me exceedingly. But, my proposals will not go. To the rest. But, to much conversation, no traces of them.
The results look vaguely like the real thing and could totally pass for reality (as long as they're not inspected too closely). Also, some humans, using all their own skill, can only reach this level of coherence. So is this a terrible example? Turn up some dials and it gets less and less 'wandering' and more coherent.
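For the curious, the trick behind such text is almost embarrassingly simple. A minimal bigram sketch (I don't know Masurell's exact setup; the corpus file name here is a placeholder):

```python
import random
from collections import defaultdict

def build_model(text):
    # Map each word to the list of words observed to follow it.
    model = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=60):
    # Walk the chain: pick each next word at random from the recorded followers.
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return ' '.join(out)

text = open('pride_and_prejudice.txt').read()   # placeholder corpus file
print(generate(build_model(text), 'Elizabeth'))
```

Longer ngrams (or a neural model) are the 'dials': turn them up and the output wanders less.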
Here's another example: fractal trees. Take a visual object like a line. Tack on smaller versions of that line to itself. Repeat for each smaller branch ad infinitum. You get a fractal tree like:
Depending on the rule, the 'tree' can look fluffier or sparser, and more regular or irregular. And it looks so much like a real tree:
(from Sarah Campbell)
And one could go the other direction and say that nature is implementing a recursive algorithm to grow its trees. But this is obviously crap. A tree certainly looks like a fractal, and I'm sure there are biological processes that can be modeled by some limited nesting (see the Chomsky/Everett disagreement over Pirahã). But we know the fractal trees are made not by biology but by an algorithm, and similarly a broccoli-shaped tree, with its trunk and branches and branches of those, has to stop at some depth to give leaves.
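Here is the rule as code, with Python's turtle module (the angles and shrink factor are arbitrary choices of mine; note that the recursion has to stop at a fixed depth, which is exactly the point above):

```python
import turtle

def tree(t, length, depth):
    # Draw a branch, then tack two smaller copies of it onto the end.
    if depth == 0:          # the algorithm must stop somewhere; biology grows leaves
        return
    t.forward(length)
    t.left(25)
    tree(t, length * 0.7, depth - 1)
    t.right(50)
    tree(t, length * 0.7, depth - 1)
    t.left(25)
    t.backward(length)      # walk back to the branch point

t = turtle.Turtle()
t.speed(0)
t.left(90)                  # point the turtle up
tree(t, 80, 7)
turtle.done()
```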
It's like magic tricks: they work on the toy problem (having a card you're thinking of pulled out of a just-cut-up lemon) but don't generalize at all to anything beyond.
So you can make an elephant disappear on stage? Make it really disappear. It all looks right that one time, but is not repeatable because the reality isn't there.
Here's another example: IBM's Deep Blue chess-playing program. So what if it wins against a human (or plays at all)? It's not magic. It's simply following game paths. Many game paths.
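'Following game paths' in miniature (real chess engines add pruning and evaluation heuristics on top, but the skeleton is just exhaustive search; the toy game Nim stands in for chess here: take 1 to 3 stones, last stone wins):

```python
def minimax(pile, my_turn):
    # Score a Nim position by following every game path to the end.
    if pile == 0:
        return -1 if my_turn else +1   # the previous player took the last stone and won
    outcomes = [minimax(pile - take, not my_turn)
                for take in (1, 2, 3) if take <= pile]
    return max(outcomes) if my_turn else min(outcomes)

print(minimax(10, True))   # +1: the player to move can force a win
```

No understanding, no magic; a tree of possibilities and a score at the leaves.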
The Turing Test works in very limited contexts but is superficial.
Any sufficiently advanced technology is indistinguishable from a rigged demo. James Klass
Friday, August 28, 2015
Just Stop It. Website complaints
Stop it. Please just stop it.
Website designers, stop adding crazy stuff and stop changing my defaults 'for' me:
- Stop changing things that I can set up locally. Allow me to set the font size rather than fixing it to what you think is best. Don't change the scroll speed on me; I already set it the way that's easiest for me. I want a swipe to move down a paragraph, but then you make it skip a page or two.
- Stop it with all the moving images. Just a few are OK. Well, no, not really. One is already almost too much.
- Stop it with the audio. I'm listening to something else. Also, with multiple tabs that I move around, your audio randomly starts when I'm not on your tab. Then there's a frantic search for your goddam tab, to kill it with a vengeance and remember never to visit anything of yours again.
Tech rationalization: All these things also take up lots of memory and processing time on the local computer running them. Also, they waste my time.
So stop it.
PS: IMDB, you're the worst. I love wasting my time on your site. But I don't want to waste away my time-wasting time waiting for your candy crap ads to load. I want to see them immediately, or move on to finding out what the movie was with the thing, with that actor (who was in that thing with the actress from that TV show (no, not that one, the comedy; no, the other, more serious comedy)) who had that thing happen to him. It was a couple years ago. I think it was a remake?
Wednesday, August 19, 2015
Even docs replaced by robots? Only for boring operations
A new automated anesthesiology device has recently made the news: automated anesthesiology for colonoscopies. There's the obvious fear of high-priced docs losing their jobs: "How dare they assume a machine could replace a physician with years of education and knowledge?"
But for the moment, what's the situation? Colonoscopies for polyp screening and removal are very routine procedures. For the colonoscopy part, only 5% of patients have a polyp removed. So most of the time the GI doc is doing boring work, looking for polyps that mostly aren't there.
And similarly for the anesthesiologist, except more so. Even if the GI doc finds polyps that are removable, that doesn't change the sedation. If something is found that needs more than just the colonoscopy tool, then hey, we ain't doing that here, we're backing out anyway; no need for more anesthesia. All they are doing is conscious sedation, over and over and over again.
Every patient needs oversight. Things go wrong: "I didn't know the patient would have a seizure, have an allergic reaction, be used to the sedation drugs." These things need tweaking. But for the most part, the everyday stuff and these few weird things are extremely well known (there's been a high-tech assembly line of patients getting colonoscopies forever!). So this is the perfect place for automation to reduce cost and time and effort. And the machines are going to have extra-sensitive alarms, a good buffer to stay away from the bad situations.
There'll still be a need for lots and lots of physicians; don't worry about it, freshly graduated MD. Hopefully family practice, where the real medicine happens, will become more respectable (= more highly paid), because it is already in high demand but nobody is going into it because it won't pay off med school tuition loans.
---
The whole point to science is to make things repeatable.
The trend, then, is that if you do something enough times, and what variation there is can be parametrized, then it can be automated and packaged.
We do it for medications: an expert gives very simple instructions on use, and then you do it yourself. Simple first-aid for even life threatening situations doesn't need to be handled by a full physician. Anyone who can read directions and gets a couple hours training can do CPR and use a defibrillator.
Medicine is constantly progressing in this direction. Radiology is miniaturizing image-taking to the point where soon you really could have a Star Trek tricorder to wave over someone to see and judge any internal problems.
Look, there's already the DaVinci robotic surgeon. Of course it doesn't do everything and needs to be operated by a full surgeon.
(from Medical Devices)
But soon enough you'll be able to go to your local drugstore, go down the pain-relief aisle, turn at the cough and cold section, and come to the Surgeon-in-a-Box aisle:
- Wart-Removal-In-A-Box - wait, don't they have these already, some freezing solution?
- Stitches-In-A-Box - for non-serious cuts that are too deep to heal themselves; place the box opening over the wound and the sensors will see where to close up. Applies flesh-knitting goop, reducing scarring (Dermabond, based on superglue; it's real).
- Colonoscopy-In-A-Box - you'll still need to take the prep; robots can't see through poop either. Send any removed polyps to the lab in the enclosed vial.
- Lasik-In-A-Box - just place against the affected eye for ten seconds and hold your breath.
Saturday, August 15, 2015
There -are- realistic moon base plans
I lamented the lack of moon base plans recently, but that was an error of not looking around enough.
Recently the European Space Agency got a new director, Johann-Dietrich Woerner, who started July 1.
But even before he started, he had stated his plans for the next step on the way to other space goals:
"the moon station can be an important stepping stone for any further exploration in deep space,"
He states this in the context of ESA's targets after the ISS project finishes.
"In any case, the space community should rapidly discuss post-ISS proposals inside and with the general public, to be prepared,"
I can't tell yet how these plans relate to NASA's stated plans for manned mission to Mars.
Friday, August 7, 2015
What is wrong, terribly wrong, with wordles
I can't stand wordles. They're so mindless and dumbing-down. Any good text will have a variety of vocabulary; frequency alone is misleading, and texts are not just dumb bags of words.
The example wordles that prompted this are extremely tendentious. I believe them both. But what I'll explain is what is problematic with them as data visualization.
What's a wordle? Also known as a tag cloud or word cloud, it's a graphic design method that takes a document, determines the frequency of each unique word in that document, and mooshes the words' text into an image, some vertical, each word's text size in proportion to its frequency in the document. So from some document we get the dry list of individual word frequencies:
Wordle 127
word 35
words 30
cloud 28
students 22
clouds 22
Day 18
lessons 12
fused 12
adjectives 6
historical 5
classroom 5
even 4
see 4
...
This can be converted into a barchart, which is the Zipf curve of the document.
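Producing that list and barchart takes only a few lines (the input file name is a placeholder):

```python
import re
from collections import Counter
import matplotlib.pyplot as plt

text = open('document.txt').read().lower()      # placeholder document
counts = Counter(re.findall(r"[a-z']+", text)).most_common(15)

words, freqs = zip(*counts)
plt.bar(range(len(words)), freqs)
plt.xticks(range(len(words)), words, rotation=45, ha='right')
plt.ylabel('count')
plt.title('Top word frequencies: the head of the Zipf curve')
plt.tight_layout()
plt.show()
```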
Now comes the cool graphic, the wordle. Instead of boring bars, make the word itself and its size tell you how important it is, mushing them all together and letting the natural instinct of readability draw your eye to what's important:
It is certainly esthetically pleasing, a bit Mondrian, with a jazzy visual rhythm. The algorithm to lay out the words is clever in simplicity, and the resulting image allows some simple inference about a text.
But what is the point of a wordle, and how successful is it at whatever purposes it might have?
If the point is that it is a piece of art, then I've made a case for it already. A new wordle for each new document is a bit derivative though, with too many barely distinguishable varieties. One here or there is great, but a number of them is numbing.
How is it as a data visualization? How well does it relate the data?
The ostensible purpose of a wordle is to show you the relative frequency of words in a document. What is actually done is to show you the obvious top two or three most frequent words. All other words are essentially ignored.
That may very well be the best part of the wordle, that it presents essential information (the two or three most frequent) in an esthetically pleasing manner. The size of a word pulls your eye towards it because it is easier to read, and if it is readable, there's no unreading it (it forces its meaning on you).
- The eye is encouraged to dance around. This may account for the esthetics, but it is an annoyance for comparison.
- Vertical presentation of a word almost guarantees that you can't read it.
- Comparison of size is even more difficult than in a pie chart. Two words that are not exactly next to each other are difficult to compare (the word length itself is not the frequency, but it accounts for relative noticeability).
So really the information that can be pulled out of a wordle is: the most frequent word (which usually does outweigh all others), the second and third most frequent (though you're not sure which is which), and maybe one or two more in the top ten (but maybe you missed some).
Under this analysis, this is a Type V error in Fung's Visualization Trifecta Checkup, where the data and questions are well defined, but the visualization (the V) just isn't right.
So instead of complaining, what would be a better method, one that would actually address the stated purpose of showing relative frequencies?
The simplest (and least graphically pleasing) method is the source list of stats: a text list, one word per line, followed by its count in the document. Because raw numbers in a list are hard to judge at a glance (but lengths are easy), maybe use a barchart sorted by frequency, cut off at about the top 10 or so. The screen space taken up by the frequency list is about the same as the wordle image itself and allows extraction of a lot more information. All the information is in the list, all of it is readable, and all comparisons can be made very easily. Surely there are frequency questions that are not easily answered by the list, but what might be slightly difficult for the list is impossible for the wordle.
What this says is that wordles are really good at showing you the top couple of words in an esthetically pleasing manner; what a wordle puts in your head is mostly 'X is the most common, and Y is maybe a little less common', and that's the extent of its specificity.
But if you want even minimally less vague comparisons, or more than two words, a wordle does not do it that well.
Or to put it more bluntly, a wordle is popular because it is beautiful, not true.
TL;DR: A wordle is esthetically pleasing but is not even as good as a pie chart for transmitting information.
Monday, July 20, 2015
Deep Learning is not Magic Learning
Any sufficiently advanced technology is indistinguishable from magic. Arthur C Clarke
"Deep Learning is Teaching Computers New Tricks"
"Andrew Ng: Why 'Deep Learning' Is a Mandate for Humans"
Holy crap! Use Deep Learning to create new ideas? You may be thinking that I'm being too harsh; of course article and title writers stretch things to be more provocative, details left to the gross middle of the article that no one reads. Well, then, yes, I'm being too harsh: not because the details are left out, but because the implications of the details are ignored.
Deep learning is not Magic Learning. Deep Learning isn't what its name says. It is 'just' a more complex (= many more layers than traditional) neural network, which is itself not exactly what its name says: it is 'just' a set (OK, I'll grant, a network) of little regression models, where some depend on others. It's not magic. It's not human-like learning or deep cogitation on concepts. It is just a mathematical model. It can distinguish two almost identical things. It can identify one thing out of many. But that's all the technique itself does (just the best in a long line of similar techniques). Like many other techniques (logistic regression, decision trees, random forests (ooh, they're magical! Their names are so exotic!)), it needs to be put in a larger framework (like a process that determines the outlines of faces in a set of cat pictures, or one that splits words in a speech-to-text analyzer). By itself, there's nothing magical.
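To make 'regressions that depend on each other' concrete, here is a complete two-layer network in plain numpy (the XOR task, the sizes, and the iteration count are arbitrary illustrative choices; an unlucky random start may need a re-run):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])               # XOR

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)        # layer 1: eight small regressions
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)        # layer 2: one regression on layer 1

for _ in range(20_000):
    h = sigmoid(X @ W1 + b1)                         # each hidden unit: a little regression
    out = sigmoid(h @ W2 + b2)                       # the output: a regression on those
    d_out = (out - y) * out * (1 - out)              # gradient of squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0)      # plain gradient descent
    W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0)

print(out.round(2))   # should approach [[0], [1], [1], [0]]
```

'Deep' just means many more such layers, plus better tricks for training them.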
This is not to say that there's something wrong with Deep Learning. On the contrary, it is a great recent development, with lots of successes (which is exactly what happened to its simpler self in the late 80's). But in the end it is 'just' a regression model, either saying yes or no to some inputs or calculating a complex function. But that's it. It is not 'an' artificial intelligence, responding to and implementing our requests like a valet. It's just (one of the more) recent advances in discrimination methods. It is an important part of the field of artificial intelligence, but not the entire thing.
Is extracting 100's of initial petroleum products (fuel, plastics, lubricants, medications, etc.) magic? Not to mention the 1000's of downstream products created by manipulating these?
Frankly, Siri is closer to magic, because at least 40 years of electrical engineers and phoneticians have worked on converting the sound waves produced by a human's oral and nasal cavities, modulated by teeth and tongue, into readable letters.
Deep Learning is not magic. It is a great development in neural networks (an incremental development (a very big incremental development)), but it's not magic and it won't make your toast for you in the morning.
(this morphed from the inarticulate unfinished beginning of a rant I had planned about ML (Machine Learning). And NLP (Natural Language Processing (not Neuro-Linguistic Programming which actually is horseshit))).
Tuesday, June 9, 2015
Why are there no moon base plans?
Every president since Bush Sr (wait, did Obama mention it?) has promised to put a man on Mars (wait, did Clinton do it?).
It seems like these big media plans are almost as common as plans to create a high-speed rail line between NYC and Washington (or San Francisco and LA, or Chicago and St. Louis), the kind every new governor announces.
I'm all gung ho for every sci-fi inspired space plan: mining asteroids for precious resources, terraforming Ganymede for farming, solar sails to travel among the planets.
But... this should be sci-fi-inspired engineering, not science fantasy. Wouldn't it be more cost-effective and profitable, with more room for learning about engineering in off-earth environments, if we went incrementally? There is a space station, a bit smallish, with worldwide support. Shouldn't there be some intermediary step, like a moon base?
First, an efficient transport mechanism to a low orbit space station, via rockets or space elevator or what have you.
Then maybe an intermediate high orbit one.
Then a minimal lunar base.
Then lunar L1 and L2 satellite stations.
Then an expanded lunar base.
...and a whole bunch of intermediary supply-chain steps, not just to support a permanent connection (realistically, we don't know if we'll be able to support that in the long term), but to support exploitation of those intermediate steps as ends in themselves.
Then, once all that's done, a visit to Mars (because all those previous items will make the trip that much easier). Don't blow a shitload of money on a one-off to Mars. Make it realistically attainable.
Also, in parallel (and maybe with more money than carved out for a manned mission), that much more robotic exploration. Let the machines die first. It's less expensive and less upsetting and demoralizing.
Oh. I'm sorry. There -are- plans for a moon base. But I have no idea if this is part of a grand plan.
Also, what's the business plan other than 'Holy shit this will be cool'? (I'm all for that business plan, but my funding is in science fiction dollars)
Sunday, August 28, 2011
Math error in news: divorce rates
The statement in question was worded something like this:
"The South has one of the highest rates of divorce in the country. One reason is that it has more marriages than elsewhere."
Sounds plausible, right? Only if you redefine the concepts of what you are hearing. This is an egregious type mismatch of a rate to a number. A rate is the ratio of a subset to the whole (whatever the whole is), and a number is... well... it's just the count, with no division going on. The rate here is presumably the number of divorces per capita (the entire population of the region).
The statement, as is, is inferring a number (more marriages) from a rate (higher divorce rate).
So maybe you have a large number of divorces, and that can be because there is a large number of marriages (which may or may not be because of a large number of people). That is a reasonable inference to make.
Or you might have a high marriage rate leading to a large number of marriages in the region, and (assuming people tend to get married within a region) this could lead to a large number of divorces in the region, and so immediately a high divorce rate.
But note this is all relative. A region could have a high divorce -rate- but a small -number- of marriages or divorces (or, conversely, a large -number- of divorces and a low divorce -rate-). Much too unspoken are the relevant contexts for comparing ratios and numbers.
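A toy illustration, with invented numbers, of a rate and a number pointing in opposite directions:

```python
# Invented numbers, purely to illustrate rate vs. count.
regions = {
    'South': {'population': 1_000_000, 'divorces':  6_000},
    'North': {'population': 5_000_000, 'divorces': 15_000},
}
for name, r in regions.items():
    rate = r['divorces'] / r['population']
    print(f"{name}: divorce rate {rate:.2%}, divorce count {r['divorces']:,}")
# The 'South' here has the higher rate (0.60% vs 0.30%) yet the smaller number.
```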
I don't think this is shoddy math exactly just shoddy use of language (which arguably -is- shoddy mathematics).
(Disclaimer: this is a paraphrase from memory, and I cannot find a transcript to corroborate my hearing.)
Wednesday, October 28, 2009
Something about the real world: Bagels at Finagle-a-Bagel suck
And I mean it to sting.
To make this much more than the simple complaint it is (that Finagle-a-Bagel bagels suck, and to get that out on the web), let me continue. I now see the arbitrary authoritarian desire for an appellation committee that decides what is what. To mix many philosophies: word meanings are totally a social construction (to be useful, people have to 'agree' and act like they agree) but with a necessary private language (internal theory). Humpty Dumpty can't go around saying 'those things you get at Finagle-a-Bagel with the hole in them that taste sorta muffin-like'... well, actually, yes he can, but it just won't catch on, not because of semantics but because people aren't time-wasting idiots. If everybody calls them bagels, then that's what you'll call them, even if that label doesn't evoke the properties (in your head) that you normally associate with things you call by that label.
Like how 'white chocolate' might be liked by many people, but... it ain't chocolate.
In a completely different way, I don't get bagels at Dunkin' Donuts. I don't expect them to have good ones. I don't go to FaB for muffins ...
Which is all to say... Finagle-a-Bagel bagels suck.
Now if only I could direct all this energy to the positive....
Friday, June 13, 2008
The invisible character bug
e.g. a file like this:

```
dfasdfasdfaasdf-
sregaregeagrerg-
242342423-
ytuyutuy
qqweqweqweq-
sdadsasdasdasd-
zxczcxzcx-
```

I want to get it like this:

```
dfasdfasdfaasdfsregaregeagrerg242342423ytuyutuy
qqweqweqweqsdadsasdasdasdzxczcxzcx
```

Fine. So I can't just remove all newlines (the line break after 'ytuyutuy' has to stay), so a simple sed oneliner won't work. But a little looking on the web gets me a summary of quick sed oneliners, which has exactly what I'm looking for but would never in a million years have figured out on my own:

```
# if a line ends with a backslash, append the next line to it
sed -e :a -e '/-$/N; s/-\n//; ta'
```

It looks for the dash followed by the end of line (in sed fashion, the newline character is not part of a line), and if found appends the next line -and- an actual newline character to the search space, which is then searched for by the next 's/.../' and removed (and then a little 'goto'-ing, which I never knew existed in sed before).
Great. Except it doesn't work. Why not? Because... well, before the explanation, I have to complain about the hours and hours (well, 3) that I spent doing 'debugging by permutation', trying all the possibilities of small changes: maybe it's for a different shell, or a slightly different sed version, or whatever. OK, that's enough... on with the solution...
Like in all the Sherlock Holmes stories, there's always a tiny bit of information that the author doesn't tell you until the very end, which of course would have solved the problem immediately if anybody had known it: the file I received was in -MSDOS- format, meaning simply that new lines are denoted by -2- characters, carriage return -and- line feed (\r \n, or \x0d \x0a).
So the sed was correctly finding '-' at the end of a line, and appending the next line, but it couldn't find '-\n' and remove it because it really needed to look for '-\r\n'.
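So the fix is to account for the carriage return. Two variants (the second relies on GNU sed understanding \r in a regex, which I believe it does):

```
# strip the carriage returns first; then the original one-liner works unchanged
tr -d '\r' < input.txt | sed -e :a -e '/-$/N; s/-\n//; ta'

# or match the invisible character directly (GNU sed)
sed -e :a -e '/-\r$/N; s/-\r\n//; ta' input.txt
```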
That is, an invisible character. You can't see it but you have to know about it to correctly solve the problem. In my very dim memory of the far past, it seems like this used to be a 'joke' bug, a possibility to blame something unknowable on (because you can't -see- it), when the bug is probably really a thinko.
Anyway, hours wasted on trivialities.
That is all.