




Visualization

[of data]

 

 

 

 

What are we visualizing, and why?

Data!

Data visualization ("data viz") involves (the study of) tools and techniques for turning data into images/graphics - to obtain BETTER INSIGHT into the data.

In other words, this is about graphical depictions of data. Why do it? To understand, communicate, act/decide.

In what follows, we are going to look at what can be visualized, and how. Note: it's not all 'Big Data' viz, it's not all 'mined' results either.

Also, data viz is distinct from ViSC (Visualization in Scientific Computing).

What is NOT data viz?

Data viz is specific - it involves turning data into graphics (data=>graphics).

Following are examples of viz, but not necessarily of data.

Visualization is needs-driven

The type of visualization we create is determined by the type of data, the type of analysis, and the intended audience ['consumer'].


[graphic by Katy Börner]


[graphic by Katy Börner]

Viz: a single variable

Classically, a pie-chart can be used to express relative fractions of a quantity:

Creating your own pie-chart is simple..
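
As a rough illustration of what a pie-chart encodes, here is a minimal Python sketch that turns category counts into relative fractions and wedge angles. The function name `pie_wedges` and the fruit counts are made up for this example; a plotting library would then draw each wedge from these angles.

```python
def pie_wedges(counts):
    """Return (label, fraction, start_deg, end_deg) for each category."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    wedges = []
    start = 0.0
    for label, count in counts.items():
        fraction = count / total          # relative share of the whole
        end = start + 360.0 * fraction    # each wedge's angle is proportional to its share
        wedges.append((label, fraction, start, end))
        start = end
    return wedges


# Hypothetical survey data (not from the lecture): favorite fruit of 40 people.
fruit_counts = {"apple": 20, "banana": 10, "cherry": 10}

for label, fraction, start, end in pie_wedges(fruit_counts):
    print(f"{label:>7}: {fraction:5.1%}  wedge {start:5.1f} to {end:5.1f} degrees")
```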

A histogram/bar-chart can be used as well:

Also look up: double histogram, density plot.
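
The binning step behind a histogram can be sketched in a few lines of Python. This is only one possible approach (equal-width bins), and the function name `histogram` and the exam scores are assumptions made for the example; the text bars stand in for a real plot.

```python
def histogram(values, num_bins):
    """Count values into num_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical exam scores (not from the lecture).
scores = [52, 58, 61, 65, 67, 70, 71, 74, 78, 83, 88, 95]
edges, counts = histogram(scores, 4)

# Crude text rendering: one row per bin, one '#' per value in that bin.
for left, count in zip(edges, counts):
    print(f"{left:5.1f} | {'#' * count}")
```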

Bubble plots are also useful:

Wordles can be used to indicate relative strengths of keywords/topics. It is easy to create your own.

Multivariate data - possibly the BEST viz ever?!

Edward Tufte calls the graphic below "The greatest statistical graphic ever drawn". Adapted from Wikipedia: 'Charles Minard's 1869 chart showing the number of men in Napoleon's 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path.' Specifically, the graph shows these 6 types of data (in 2D!): the number of Napoleon's troops; the distance traveled; temperature; latitude and longitude; direction of travel; and location relative to specific dates.

Viz: spatial data

Plotting spatial data (eg. incidence locations) on a map reveals patterns/trends in a direct way.

Cholera outbreak map, 1854, London (plotting of reported cases as black dots reveals the source of the outbreak [highlighted region]):

It is quite useful for planning purposes, to visualize data over a map - eg. here are Starbucks locations..

Mined data, eg. associations, can be superposed over a map, eg. in a grocery store. Results can be used to redo the layout. A related topic is product placement.

As we saw earlier, a choropleth map shows spatial, aggregated data (that covers the entire region shown). These come in two varieties - unclassed (continuous scale), classed (discrete ranges).
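
The difference between the two varieties comes down to how a region's value is mapped to a shade. Below is a minimal Python sketch of both mappings; the function names, the class breaks and the region values are all hypothetical, chosen only to make the contrast visible.

```python
import bisect


def unclassed_shade(value, lo, hi):
    """Unclassed: map a value to a continuous shade in the range 0..1."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def classed_shade(value, breaks):
    """Classed: return the index of the discrete range a value falls in.

    breaks holds the ascending lower bounds of every class after the first,
    so breaks=[10, 20] defines three classes: <10, 10..<20, >=20.
    """
    return bisect.bisect_right(breaks, value)


# Hypothetical aggregated values per region (not real data).
regions = {"North": 4.0, "East": 12.5, "South": 19.0, "West": 27.5}
lo, hi = min(regions.values()), max(regions.values())

for name, value in regions.items():
    print(name, round(unclassed_shade(value, lo, hi), 2), classed_shade(value, [10, 20]))
```

With the unclassed mapping every distinct value gets its own shade, while the classed mapping deliberately gives nearby values the same shade, which is easier to read off a legend but depends on where the breaks are placed.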

Viz: spatio-temporal data

Superposing time-varying data on a map reveals course, trends, etc. Such data could be visualized as animations, too.

H-1B visas - who gets them? :) [scroll to the bottom of the report]

Viz: interactivity

Being able to INTERACT with data provides MORE understanding - we can selectively turn items on/off, drill down or roll up, explore the time dimension..
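
Behind the scenes, these interactions usually boil down to filtering and re-aggregating the same records. Here is a minimal Python sketch of that idea, assuming a made-up record layout of (category, year, month, count); the function name `view` and the incident data are hypothetical.

```python
from collections import defaultdict

# Hypothetical incident records (not real data): (category, year, month, count).
RECORDS = [
    ("theft", 2023, 1, 12),
    ("theft", 2023, 2, 9),
    ("vandalism", 2023, 1, 4),
    ("theft", 2024, 1, 7),
]


def view(records, visible, level):
    """Aggregate counts for the visible categories at a given time level.

    visible: set of categories currently switched on.
    level:   "year" (rolled up) or "month" (drilled down).
    """
    totals = defaultdict(int)
    for category, year, month, count in records:
        if category not in visible:
            continue  # this item is switched off
        key = (year,) if level == "year" else (year, month)
        totals[key] += count
    return dict(totals)


# Everything on, rolled up to years.
print(view(RECORDS, {"theft", "vandalism"}, "year"))
# Only theft, drilled down to months.
print(view(RECORDS, {"theft"}, "month"))
```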

Crime map, USC area :(

Crimean War casualties, by Florence Nightingale (!)

NOAA, Historic Hurricane Tracks

Here is a map that shows LA area traffic data.

Viz: real-time!

Real-time visualization provides a level of immediacy/freshness/relevance/interest that is simply absent in non-real-time data..

world population growth [even more real-time stats!]

local traffic [amazing]

earthquakes!

stocks

cybercrimes (!!)

Your to-do: look for real-time weather (eg. Doppler radar) sites. What other real-time-updated data can you find or think of?

Viz: networks - node attrs

Network visualization is a very popular category - shows RELATIONSHIPS between entities.

A diagram that maps email exchanges between family members:

In the above, we are lacking DETAIL that can be added using extra ATTRIBUTES, and LABELS. Here is an enriched version that uses attrs and labels:

Viz: networks - edge attrs

We can use edge attrs (type, eg. arrows, dashes.., color, thickness etc.) to quantify data. The diagram below shows trade quantities between countries (2012, top 12 countries as per GDP):
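
As a rough sketch of how such an encoding could be produced in code, the Python below writes a Graphviz DOT description in which each edge's thickness (`penwidth`) scales with the quantity it carries. The function name `to_dot`, the country placeholders and the amounts are hypothetical; they are not the 2012 trade figures.

```python
def to_dot(flows, max_width=8.0):
    """Build a Graphviz DOT digraph whose edge thickness scales with flow size."""
    largest = max(amount for _, _, amount in flows)
    lines = ["digraph trade {"]
    for src, dst, amount in flows:
        # Scale linearly, but keep even the smallest flow visible.
        width = max(1.0, max_width * amount / largest)
        lines.append(f'  "{src}" -> "{dst}" [penwidth={width:.1f}, label="{amount}"];')
    lines.append("}")
    return "\n".join(lines)


# Placeholder flows (source, destination, amount); the values are made up.
flows = [("A", "B", 400), ("B", "A", 100), ("A", "C", 50)]
print(to_dot(flows))
```

The printed text can be saved to a file and rendered with Graphviz's `dot` tool, if it is installed.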

'Grab bag' of goodies

An assortment of 'cool' (visually appealing) and USEFUL data visualizations:

Meta viz

We can even visualize aspects of data that can help us carry out better data analysis - below is a 'coverage map' that shows, well, coverage (missing/inadequate data):
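
To give a feel for what goes into such a view, here is a minimal Python sketch that measures, per field, what fraction of records actually has a value, and prints a crude text grid of present/missing cells. The function names and the sample records are hypothetical, and "missing" is assumed here to mean `None` or an absent key.

```python
def coverage(rows, fields):
    """Fraction of rows with a non-missing value, per field."""
    return {
        f: sum(r.get(f) is not None for r in rows) / len(rows)
        for f in fields
    }


def coverage_grid(rows, fields):
    """One text line per row: '#' where a value is present, '.' where it is missing."""
    return [
        "".join("#" if r.get(f) is not None else "." for f in fields)
        for r in rows
    ]


# Hypothetical records with gaps (not real data).
rows = [
    {"age": 34, "city": "LA", "income": None},
    {"age": None, "city": "NYC", "income": 72000},
    {"age": 51, "city": "LA"},
]
fields = ["age", "city", "income"]

print(coverage(rows, fields))
print("\n".join(coverage_grid(rows, fields)))
```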

HOW to GENERATE data viz??

During the past lectures, we've looked at a few data viz examples (eg. GIS data). Here is a systematic breakdown of ways to create all manner of data viz.

Data science software

Using code

Online tools

Math, analysis and plotting packages

3rd party data-viz software

AR + VR : revolutionary data viz systems?!

The 'coming' revolution in AR and VR is sure to effect sea changes in 'data viz': visualize/interact with => perceive/be immersed in data!

Here is a book on visual data mining, and this is a chapter in it, on using VR to explore data.

In order to make the best use of AR (and VR) for data analysis, we need a thorough understanding of how we process visual data. Here is a doc that contains several examples (and a pipeline) for VDM (visual data mining).

Great Wave is a company that had created Meta 2, a platform for data-viz using AR. MUCH MORE is possible!

Where is the 'science'??

Data viz is an art AND a science - there are principles, choices, tradeoffs. As for the principles, these encompass diverse disciplines such as visual perception, color theory, composition (grouping, contrast, harmony, symmetry..), design elements (line, tone, form, texture..), semiotics, etc.

As for what type of graphic to generate for a given type of data analysis, we can follow the guidelines here (from 'Atlas of Knowledge'):

What makes for a good design?

Induce the viewer to think about the substance rather than about methodology, graphic design, the tech of graphic production, or something else. (Edward Tufte)

Summary - why visualize?

We have seen several examples of visualizations, several types, several ways to create visuals... So we ask again, 'why bother'? Because visualization (that is done well) provides a memorable, intuitive, 'compact' way of communicating (vast amounts of) data to users. "Seeing is understanding".

Here is a nice TED talk on this ['why'] aspect: https://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization/

Further reading/exploring

Here is a brief whitepaper from SAS, that discusses data visualization.

Here at USC, we have https://classes.usc.edu/term-20211/course/dsci-454/ and https://classes.usc.edu/term-20203/course/dsci-554/.

The Visual Capitalist, https://www.reddit.com/r/dataisbeautiful/, etc. are excellent online pages that showcase and discuss a wide variety of data graphics.

Edward Tufte's books are CLASSICS, eg. 'vdqi': https://www.edwardtufte.com/tufte/books_vdqi

Katy Börner [http://info.ils.indiana.edu/~katy/] has a nice MOOC [http://ivmooc2018.cns.iu.edu/] and an excellent coffee-table book [https://mitpress.mit.edu/books/atlas-knowledge] - there is much more at http://scimaps.org/.

There is also 'data journalism'/'data stories' - detailed reporting, centered on data viz.