Big Data [Go BIG - or go home!]
Next lecture: the 'HOW' of Big Data [storage, processing]
The slides that follow are very general in nature: they present the 'big picture' concepts of Big Data. By nature, the content is, relatively speaking, soft/squishy/fluffy.
It *is* important to understand the context in which we will discuss data mining etc. in upcoming lectures; otherwise, the material will seem dry/irrelevant.
How many of these buzzwords do _you_ know? :)
Big Data has indeed become somewhat of a catch-phrase/buzzword.
But we can provide an operational definition: Big Data is data that is 'too big' to be stored on a single machine and/or processed by a single machine. In fact, 'single machine' might even mean an entire site (a cluster of machines), if the data is 'too' big to fit in one.
This definition is intentionally vague, to keep it relevant for the future as well.
What makes Big Data 'big'? The following three (or seven!) 'V's do.
Big Data is data that has:
In other words, it is data that is varied in nature (comprises diverse types), needs to be accessed and used quickly, and comes in large quantities.
Often, three more Vs are also used to characterize Big Data:
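Data measurement units are getting bigger: Gigabytes (GB) to Zettabytes (ZB) to Yottabytes (YB) to... Each step up the unit ladder (GB, TB, PB, EB, ZB, YB) is a factor of 1000 bigger, so going from GB to ZB alone spans twelve orders of magnitude!!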
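To make the jump between units concrete, here is a minimal Python sketch. It assumes decimal (SI) prefixes, where each unit is 1000 times the previous one; binary (IEC) units such as GiB and TiB step by 1024 instead. The names `UNITS`, `unit_size` and `ratio` are illustrative only, not from any library.

```python
# Decimal (SI) byte units, smallest to largest.
# Assumption: each step up is a factor of 1000 (10**3), not the binary 1024.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one `unit` (decimal prefixes)."""
    return 1000 ** UNITS.index(unit)


def ratio(bigger: str, smaller: str) -> int:
    """Return how many `smaller` units fit into one `bigger` unit."""
    return unit_size(bigger) // unit_size(smaller)


# Print the unit ladder from GB upward as powers of ten.
for u in UNITS[UNITS.index("GB"):]:
    print(f"1 {u} = 10^{3 * UNITS.index(u)} bytes")

print(f"1 ZB = {ratio('ZB', 'GB'):,} GB")
```

Running it shows 1 GB = 10^9 bytes up through 1 YB = 10^24 bytes, and that one zettabyte holds a trillion (10^12) gigabytes.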
Big Data is not only big, but is getting bigger at a rapid rate.
Appropriately enough, we can get an overview from this pic:
Here is another summarization.
Purchase history of products+services could reveal a lot.
Vehicle tracking: license plate pic capture is legal.
Privacy and security are at odds at times: NSA large-scale surveillance, the 'No Fly' List, real-time face tracking...
Are you being... spied on?
Big Data can be quite useful if collected, analyzed and interpreted properly. Here are things that can be problematic:
Storing huge amounts of data costs time and money; retrieval could be problematic, and analysis will cost as well. Somewhat oblivious to such concerns, we are creating a data deluge! How come? Because sensors are ubiquitous, storage is cheap, and we feel we 'need to' [FOMO].
Maybe we need to be prudent in our quest to squeeze wisdom out of Big Data: "The purpose of [scientific] computing is insight, not numbers." - Richard Hamming (1962), in 'Numerical Methods for Scientists and Engineers'
So, which is it? BOTH!
So far, we looked at what Big Data is. Now we turn to the 'why': the sources, reasons ("drivers"), etc.
Big Data can result from [you have seen this before]:
Wikipedia: Datafication is a modern technological trend turning many aspects of our life into computerized data and transforming this information into new forms of value. Examples of datafication as applied to social and communication media are how Twitter datafies stray thoughts or datafication of HR by LinkedIn and others.
In other words, it is the new notion that people, our built environment (e.g. the number of freeways in the US), etc. can create, i.e. lead to, (large-scale) data.
"Once we datafy things, we can transform their purpose and turn the information into new forms of value."
IoT is the 'Internet of Things': what if (almost) every lightbulb, tire, building, plane engine, bridge, fridge etc. had an IP address and a sensor, and transmitted data periodically through a network? Among other things, this by itself would lead to an *explosion* of data :)
Here is an IoT infographic:
Big Data is starting to play a 'big' role in modernizing manufacturing operations:
Again, these are Big Data's characteristics:
What can we do now that we couldn't before?
* combine multiple sources of data (however small or seemingly insignificant) for a better 'bigger picture'
* exploit unstructured data - voice, video, images, tweets, blog posts, etc.
* provide insights to [internal] frontline managers in near-real-time (to enable making more agile business decisions)
* experiment with the marketplace (fluid price-setting) as often as needed!
So here's what is new: better insight, quicker action.
According to IEEE (and others), a long time.
Here are some links (to a variety of conferences, a group, and a meetup):
We are at the start of a transformative phase, fed by our relatively new ability to collect, store, analyze and benefit from MASSIVE amounts of data from every walk of life.