Information, knowledge...

We are AWASH in a world of info (and data and knowledge and wisdom). Let's look at the DS-related aspects of these.

Wikipedia is a huge GRAPH of knowledge!

Before Wikipedia, we had printed (and later, CD-ROM) encyclopedias, written by experts hired by for-profit companies. Today, volunteers write and edit the articles, which makes this a democratic process.

On account of the linking between articles, Wikipedia entries form a knowledge graph [watch from 35:00].

DBpedia is a project that analyzes Wikipedia pages, to build a graph of the topics that are in the pages [this graph is going to be richer than just Wikipedia article linking viewed as a graph]. LODmilla is a viewer, to browse through Linked Open Data (LOD); LodLive is another one. Such graph data can also be browsed as a 'wheel'.
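To make the 'graph of topics' idea concrete, here is a minimal sketch that stores facts as (subject, predicate, object) triples, which is the general shape of Linked Open Data. The predicate names (developedBy, founderOf, headquarteredIn) and the helper reachable() are made up for illustration; they are NOT actual DBpedia identifiers.

```python
from collections import defaultdict

# Illustrative triples only; the predicate names are hypothetical,
# not real DBpedia properties.
triples = [
    ("PageRank", "developedBy", "Larry_Page"),
    ("PageRank", "developedBy", "Sergey_Brin"),
    ("Larry_Page", "founderOf", "Google"),
    ("Sergey_Brin", "founderOf", "Google"),
    ("Google", "headquarteredIn", "Mountain_View"),
]

# Index the triples by subject, so each topic points at its outgoing edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

def reachable(start):
    """Return every topic that can be reached by following edges from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(obj for _, obj in graph.get(node, []))
    return seen

print(sorted(reachable("PageRank")))
```

Because every edge carries a predicate, such a graph says HOW two topics are related, which plain article-to-article linking does not.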

PageRank!

Google's MIGHTY PageRank algorithm, in addition to its MapReduce parallel processing, is responsible for turning the company into an Internet giant.

PageRank ranks pages by their incoming links, where a link from a highly ranked page counts for more - "the more people talk about me, the more important I am". PageRank can also be used for analyzing social media influence, credit risk, etc (any place a web of 'players' is involved).

Here is a simple explanation of PageRank [without the iterative calculations that arrive at the ranking].

You can play with PageRank here: https://bytes.usc.edu/~saty/tools/xem/run.html?x=PR
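For the curious, here is a minimal sketch of the iterative calculation itself. It assumes the commonly used damping factor of 0.85, a fixed number of iterations, and a made-up four-page web (toy_web); it is a teaching sketch, not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share (the 'random jump').
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Made-up web: C has the most incoming links, D has none.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note that A ends up ranked above B even though each has exactly one incoming link: A's link comes from the highly ranked C, whereas B's comes from A, which also splits its vote two ways.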

Using data, to rank data-related courses :)

Another "meta" activity that is quite useful, is to RANK the 'most popular' (or 'most viewed', which might not be the same as 'most popular') book/movie/YouTube clip/gadget, using user-submitted ratings and reviews...
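One wrinkle with ratings-based ranking: an item with two 5-star ratings should probably not outrank one with many 4s and 5s. A common remedy is a 'damped' (Bayesian-style) mean that pulls items with few ratings toward the overall average. Here is a minimal sketch; the ratings, the item names and the prior_weight of 5 are all made up for illustration.

```python
def damped_mean(ratings, prior_mean, prior_weight=5):
    """Average that behaves as if prior_weight extra ratings of prior_mean existed."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

# Made-up user ratings (1 to 5 stars) for three hypothetical clips.
items = {
    "clip A": [5, 5],
    "clip B": [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
    "clip C": [3, 3, 4, 2, 3],
}

all_ratings = [r for ratings in items.values() for r in ratings]
global_mean = sum(all_ratings) / len(all_ratings)

by_raw_mean = sorted(items, key=lambda name: sum(items[name]) / len(items[name]), reverse=True)
by_damped_mean = sorted(items, key=lambda name: damped_mean(items[name], global_mean), reverse=True)

print("raw mean:   ", by_raw_mean)
print("damped mean:", by_damped_mean)
```

With the raw mean, clip A (just two ratings) comes first; with the damped mean, the much more heavily rated clip B overtakes it.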

Here is a great ranking of ML courses [for you to follow up on!]. Note that such lists could be 'hand generated' [which can be both good and bad].

Most popular coding languages!

It is natural to use data - in this case, # of GitHub 'repos' - to rank programming languages. Here is GitHub's Octoverse ranking for 2018. Older Octoverse pages [massive treasure troves of data-derived insights]: 2017, 2016.

ML, in education

The 'one-size-fits-all' classroom model is slowly giving way to individualized, a la carte education... This is enabled via ML, where students' knowledge and skills are evaluated and cross-compared, to prescribe custom learning paths.

This article lists 8 ways by which ML can help. This talks about individualized learning, as does this one.

Hyper-targeted learning [rationale: one size does NOT fit all!]:

Entertainment

applications

Does the 'hot hand' exist?

'Sports analytics' is a fascinating field [if you like sports!], given the availability of massive amounts of data [for basketball, baseball, football, and just about everything else].

Is the 'hot hand' a myth? My friend Jay Cordes has a nice article on this [DO check out almost all of his posts, and his personal pages, including his unicorn one!].

Sports viz!

Another wonderful thing is to visualize (summarize, really) games [right on the field/court!], or even an entire career!

Here is another kind of visualization [compared to the one above].

Social media

A social media platform such as Facebook offers it all - entertainment, education, communication and commerce...

Our profiles (likes, group memberships, friend lists, posts...) are being "used" in unwholesome ways, with our consent, but without our really knowing how, or the impact - we are ASSETS that can be DATA-MINED.

ML as greenscreen replacement?

The 'traditional' greenscreen technology - first analog, then digital - is about to go away!

The same ML that learns human poses for the purposes of security, etc, can be used to 'cut humans out' of any [non-greenscreen] environment! Here is a writeup on how it works (the article notes that it uses TensorFlow!).

There are many other uses for ML, in media creation (eg. auto-posing a rigged character, in animation).

Compressing song lyrics, ML+audio

The LZW compression algorithm is used to compress images losslessly (eg. in GIF files), by building a dictionary of pixel sequences that repeat, and storing a short code for each repeated sequence (instead of storing the same sequence of pixel values multiple times). What about doing this with song lyrics?
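Here is a minimal sketch of textbook LZW applied to text, to show why repetitive lyrics compress well. It assumes every character has a code point below 256 (plain ASCII lyrics are fine), and the 'lyric' is a made-up placeholder, not an actual song.

```python
def lzw_compress(text):
    """Return a list of integer codes; assumes all characters are below code point 256."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep growing the match
        else:
            codes.append(dictionary[current])  # emit the longest known sequence
            dictionary[candidate] = next_code  # remember the new, longer sequence
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text by regenerating the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the sequence that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

lyric = "la la la la la la la la " * 4  # placeholder chorus
codes = lzw_compress(lyric)
print(len(lyric), "characters ->", len(codes), "codes")
```

The more a song repeats itself, the longer the dictionary entries get, and the fewer codes are needed; the code count is therefore a rough measure of how repetitive the lyrics are.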

Here are projects where ML is used on audio samples...

Commerce

applications

Blockchains!

A blockchain is a public, distributed, cryptographically secured ledger. The ledger is a chain of blocks. Each block is a set of completed transactions (eg. buy/sell), and also stores a hash of the previous block. Once part of the blockchain, a block (ie. transaction events in it) cannot be altered without invalidating every block that follows it.

Here are seven (!) high-level views of how it works. And, this offers simple explanations.

You can play with a toy blockchain here: https://bytes.usc.edu/~saty/tools/xem/run.html?x=blockch
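To see why a block cannot be quietly altered, here is a minimal sketch of a hash-linked chain. It leaves out everything that makes a real blockchain distributed (consensus, mining, signatures), and the block fields and transactions are made up for illustration.

```python
import copy
import hashlib
import json

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder 'previous hash' for the first block

def block_hash(block):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "transactions", "previous_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def add_block(chain, transactions):
    previous_hash = chain[-1]["hash"] if chain else GENESIS_PREVIOUS_HASH
    block = {"index": len(chain), "transactions": transactions, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain):
    """Check that every block's hash matches its contents and links to its predecessor."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_PREVIOUS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block):
            return False
    return True

chain = []
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
add_block(chain, [{"from": "carol", "to": "alice", "amount": 1}])
print("original chain valid:", is_valid(chain))

# Tamper with a copy: change an amount in the middle block.
tampered = copy.deepcopy(chain)
tampered[1]["transactions"][0]["amount"] = 500
print("tampered chain valid:", is_valid(tampered))
```

Recomputing the tampered block's hash would not help a cheater either, because the next block still stores the old hash, so the whole tail of the chain would have to be rebuilt.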

Benefits for data science (all related to governance):

data integrity verification
data tracking (lineage)
data sharing, eg. for discovery, collective decision-making, shared-knowledge economy, crowd-sourced analytics...
data trust [eg. in votes]

Stock price prediction (!)

Here is a Python-based application [with source on GitHub; you can run this on Colab] to predict the price of a stock, using historic (ie past) data for that stock.

Here is how to use a stock analysis toolbox (in Python) by Auquan.

How about using tweet analysis to analyze the stock market? :)

Fraud, risk...

Fraud is an unfortunate reality, in the world of e-commerce - it affects both cardholders and card companies.

DM/ML is quite commonly used to learn 'normal' behavior of cardholders, and use that to flag abnormal/unusual/outlier purchases (in real-time).
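The simplest possible version of this idea is a z-score test: learn the mean and spread of a cardholder's past purchases, then flag a new purchase that is too many standard deviations away. Real systems use far richer features (merchant, location, time of day...) and learned models; the amounts and the threshold of 3 below are made up for illustration.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amount, threshold=3.0):
    """Return True if new_amount is more than threshold standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two past purchases
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold

# Made-up purchase history (in dollars) for one hypothetical cardholder.
history = [12.5, 40.0, 23.1, 18.75, 31.0, 27.4, 15.2, 22.0]

print("25.00 flagged: ", flag_unusual(history, 25.0))
print("950.00 flagged:", flag_unusual(history, 950.0))
```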

Here is a small application note, on analyzing loan risk.

Clickstreams

Clickstreams are 'free', raw, data regarding users' "journeys" through a website (eg. Amazon's). Such data can be mined, to extract meaning (navigation patterns, types of users..) from those journeys.
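As a minimal sketch of mining navigation patterns, the snippet below counts how often users move from one page to the next. The sessions and page names are made up; real clickstream logs would first need to be grouped into per-user sessions.

```python
from collections import Counter

def transition_counts(sessions):
    """Count page-to-page moves across all sessions."""
    counts = Counter()
    for pages in sessions.values():
        # Pair each page with the page visited right after it.
        for source, destination in zip(pages, pages[1:]):
            counts[(source, destination)] += 1
    return counts

# Made-up sessions: user id -> ordered list of pages visited.
sessions = {
    "u1": ["home", "search", "product", "cart"],
    "u2": ["home", "product", "cart", "checkout"],
    "u3": ["home", "search", "product"],
}

counts = transition_counts(sessions)
for (source, destination), n in counts.most_common(3):
    print(source, "->", destination, n)
```

Here only one of the two carts leads to a checkout, which is the kind of drop-off a site owner would want to investigate.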

Targeted ads

Since the beginning of the Web, there have been efforts to deliver 'targeted' ads (as opposed to billboards, TV ads, newspaper inserts and junkmail flyers that are 'shotgun' approaches).

One aspect of this: online newspaper (oxymoron!) outlets have lots of data on their readers, and this data can be sold to advertisers...

Summary

Over these past few weeks, we took a look at DS/DM/ML applications across a wide swath of human activity! Here is the diagram once again, that shows the spectrum:

[Diagram: the spectrum of applications - information/knowledge, entertainment and commerce.]
