First, what is it supposed to be now? Wikipedia gives the following:
Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate
Conveniently, someone has already traced the provenance of the term: who first used it in a way similar to what it means now.
And Forbes gave a list of many possible definitions, some more substantive than others. These are all definitions I can agree with. There's data, it's structured in weird new ways, and it comes from new sources that produce gobs and gobs of it. The term 'Big Data' is used to contrast with traditional data, its storage and techniques. Traditional data is not 'Big' and therefore is 'small' and its techniques are 'old'. I'd say any kind of relational database (SQL related) is traditional.
Big data is things like mouse clicks or even mouse movements on a screen, or a Fitbit's near-continuous polling of heart rate and blood O2 (does it really do that? I don't know!). Or road sensors checking for cars on a road or at a stop light. Every possible transaction, every position.
The thing is that a lot of what people are calling 'Big Data' is really pretty traditional. For example, what about credit card transactions, is that big? They're constantly checking them for out-of-the-ordinary purchases so that they can cut off my service right when I need it at an airport. Or tax returns? There must be millions, and for even the simplest return, hundreds of entries and calculations! Or weather sensors for near-continuous temp/pressure/precip/windspeed? These are old data sources; no one would call them Big Data because they have been around long enough that they -are- traditional. Yet they are all at the forefront of large and complex systems design.
And then there are the false positives, those things that are called big data but don't really fit that definition. For example, electronic health records collect a lot of information on a patient. But... it's about as traditional as traditional gets. Straightforward collection of forms or transcripts or lists of entries.
I am purposefully leaving out things like EKGs or radiological images, not because they deny my point but because they are the exception that proves the rule. They both take up gobs and gobs of space (for radiology, a single CT scan of the body can take up X Gig). In the best sense, the data 'size' of a study is 1, or rather it is the size of the radiologist's text report describing the results found in the images ('Insignificant thyroidal calcification. No other findings') that counts, and that hardly counts. The scale is pretty small.
So what is not called Big Data and actually isn't Big Data (by the official definition)? There's the database of clients that one company manages, with all the contact information and the account transactions. Classic relational database. There's the auto parts store with its database of suppliers and buyers, all the various parts for sale with their characteristics and which make/model/year they work with, the list of sales and inventory. Classic relational database.
Also, the term 'Big Data' has a particularly ... millennial 'kids-these-days' feel to it. It is a bit inarticulate in that it doesn't say exactly what it is, but everyone has a good idea of what they think it should mean. 'Big' is about as meaningful as 'wow'.
Anyway, 'Big Data' ... I'll use it for what everybody else uses it for (I don't know what that is yet!): both big new things and smaller traditional things that most people never really thought about before.
Footnote: this is entirely patterned after the term 'supercomputer', which has a similar history. In the late '70s the Cray-1 was a supercomputer, and today (mid 2010s) a smartphone is just a little ... thing, despite the fact that the smartphone delivers more flops than the Cray. A supercomputer is pretty much just the best possible computer in existence -right now-.