Big Data

"Go Big, Or Go Home!"

 

 

 

 

Context (setting the stage)

The slides that follow are very general in nature - they present the 'big picture' concepts of Big Data. By nature, the content is, relatively speaking, soft/squishy/fluffy.

It *is* important to understand the context in which we will discuss data mining etc. in upcoming lectures; otherwise, the material will seem dry/irrelevant.

Big Data, Wordle (TM) summary :)

How many of these buzzwords do _you_ know? :)

What is 'Big Data'?

Big Data has indeed become something of a catch-phrase/buzzword. But we can provide an operational definition: Big Data is data that is 'too big' to be stored on a single machine, and/or processed by a single machine. This definition is intentionally vague, to keep it relevant for the future as well.
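As a rough sketch of this operational definition, the Python snippet below compares a dataset's size against one machine's disk and memory. The function name `is_big_data`, the use of RAM as a stand-in for 'can be processed', and the machine capacities are all illustrative assumptions, not part of any standard definition.

```python
def is_big_data(dataset_bytes, disk_bytes, ram_bytes):
    """Hypothetical test: is the data 'too big' for a single machine?

    'Too big' here means it cannot be stored on the machine's disk,
    and/or cannot be held in its memory for processing (a simplification).
    """
    too_big_to_store = dataset_bytes > disk_bytes
    too_big_to_process = dataset_bytes > ram_bytes
    return too_big_to_store or too_big_to_process


TB = 10**12
GB = 10**9

# Assumed machine: 4 TB of disk, 64 GB of RAM (illustrative numbers only).
print(is_big_data(500 * TB, 4 * TB, 64 * GB))  # True
print(is_big_data(10 * GB, 4 * TB, 64 * GB))   # False
```

Because the machine's capacity is passed in rather than hard-coded, the same check stays meaningful as hardware grows, which is the point of keeping the definition vague.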

Another way to characterize Big Data

Big Data is data that has variety, velocity and volume. In other words, it is data that is varied in nature (comprises diverse types), changes often, and comes in large quantities.

More, MORE! (how big?)

Big Data is not only big, but getting bigger at a rapid rate..

Sources of Big Data

Big Data can result from:

Datafication

Wikipedia: Datafication is a modern technological trend turning many aspects of our life into computerised data and transforming this information into new forms of value. Examples of datafication as applied to social and communication media are how Twitter datafies stray thoughts or datafication of HR by LinkedIn and others.

In other words, it is the notion that people, our built environment (e.g. the number of freeways in the US), etc. can lead to data generation.

"Once we datafy things, we can transform their purpose and turn the information into new forms of value."

IoT

IoT is the 'Internet of Things' - what if (almost) every lightbulb, tire, building, plane engine, bridge, fridge etc. had an IP address and a sensor, and transmitted data periodically through a network? Among other things, it would lead to an *explosion* of data :)
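To get a feel for the scale, here is a back-of-the-envelope sketch in Python. The device count, reading size and reporting rate are made-up inputs chosen only to illustrate the arithmetic; they are not measurements.

```python
def daily_bytes(num_devices, bytes_per_reading, readings_per_day):
    """Total bytes generated per day by a fleet of identical sensors."""
    return num_devices * bytes_per_reading * readings_per_day


# Hypothetical fleet: one billion devices, each sending a 100-byte
# reading once a minute (1440 readings per day).
total = daily_bytes(1_000_000_000, 100, 24 * 60)
print(f"{total / 10**12:.0f} TB per day")  # 144 TB per day
```

Even with tiny readings, the total is driven by the sheer number of devices, so the result scales linearly with each of the three inputs.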

Here is an IoT infographic:

Some data-related issues

Big Data can be quite useful if collected, analyzed and interpreted properly. Here are things that can be problematic:

Big Data - redux

Again, these are Big Data's characteristics:

Why is this useful again?

What are things we can do now that we couldn't before?

* combine multiple sources of data (however small or seemingly insignificant) for a better 'bigger picture'

* exploit unstructured data - voice, video, images, tweets, blog posts..

* provide insights to [internal] frontline managers in near-real-time (to enable making more agile business decisions)

* experiment with the marketplace (fluid price-setting) as often as needed!

So here's what is new: better insight, quicker action.

How long will this be useful/relevant?

According to IEEE (and others), a long time.

Conferences, organizations, LA user group

Here are some links:

Summary

We are at the start of a transformative phase, fed by our relatively new ability to collect, store, analyze and benefit from MASSIVE amounts of data from every walk of life.