DataVis 101

Introduction to Data Visualisation with Python

Research Computing Team and Service

  • Here to support research(ers)
    • Provide training
    • Support users of Grid and Cloud Computing platforms
    • Provide consultancy
      • To develop project proposals
      • To help recruit people with specialist skills
      • Working directly on research projects
  • For details please see our Website
  • Contact us via the IT Service Desk

Course objectives

This course should help you to:

  • Become familiar with best practice with regards to scientific data visualisation
  • Build a toolkit of resources to help you create objective, informative plots
  • Use Python and Python libraries such as Matplotlib and Seaborn to create aesthetically pleasing graphics
  • Be aware of some common issues when it comes to data visualisation

Documentation

  • Course notes
  • These slides: https://arctraining.github.io/rc-slides/datavis
  • Hackpad

  • A flexible, human-readable programming language with simple “syntax”
  • It is commonly used for:
    • data analytics
    • scientific programming
    • visualisation
  • It has a huge collection of specialised libraries to solve specific research problems
  • It is Open Source
  • A way of graphically representing data
  • Can include plots, graphs, charts, schematics, maps, infographics
  • Commonly found in research papers, publications, policy documents, on conference posters, in talks, on news reports…

Why do you want to learn data vis or Python?

Lets look at the survey results…

Some examples…

Map faults on the surface of Mars

Map faults on the surface of Mars

Model the thermal evolution of meteorite parent bodies

Plot and analyse x-ray element maps

Data Visualisation: Theory

Aims for this section

By the end of this section, you should…

  • Be able to recognise possible issues with scientific figures and be able to critically evaluate data visualisations
  • Have a framework to build useful, objective figures to illustrate your results
  • Have resources to further investigate different aspects of scientific figure making

Source material: Ten Simple Rules for Better Figures by Rougier, Droettboom and Bourne, 2014

Rule 1: Know Your Audience

Who will be reading it, in what context?

  • You
  • Your supervisor(s)
  • General scientific or research audience
  • Experts in your field
  • Experts in a specific method you use
  • The general public
  • Policy makers
  • Undergraduate students
  • Scientific journal
  • A general lecture on a research area
  • An outreach event
  • Updating funders
  • Funding application
  • Thesis examiners

What different requirements might these audiences/contexts have?

  • Understand the needs and knowledge level of your audience
  • Adjust the complexity of your visualisation to match

Consider:

  • Figure type
  • Annotations, labels
  • Amount of data being shown
  • Terminology and jargon

a. Using default settings, basic histogram

b. Customising the colour palette, using Seaborn library, multi-panelled figure

Rule 2: Identify Your Message

  • The figure should express an idea quickly and succinctly
  • Part of this is ensuring you know what the message is
    • People can get stuck knowing they “need to make a figure for this chunk of results” but are not sure what they actually need to convey
  • Come up with a one-sentence statement that captures your results, before building the plot
  • Show your plots to your colleagues and see if they can deduce the key point

Trying to pick the right plot for your use-case

Seaborn categorises its functions for you

See “Overview of seaborn plotting functions” from the Seaborn docs

Browsing documentation galleries can be a good idea…

But be careful not to pick an obscure, strange plot just because it looks pretty!

Also, check and see what other researchers in your field are using…

Rule 3: Adapt the Figure to the Support Medium

  • Understand what medium the figure will be displayed in
  • Adapt the figure appropriately
  • Related to Rule 1: Know Your Audience but more focus on physical medium:
    • Projected on a screen?
    • High resolution printed poster?
    • Journal article read online and zoomable?
    • Likely to be printed out in greyscale?

a. Small font size and line thickness

b. Large font size, heavy line weight

Technical considerations:

  • Font size
  • Vector graphics vs. raster image
  • Line weight
  • DPI or resolution
  • Colour profile: RGB or CMYK

Rule 4: Captions Are Not Optional

  • Captions provide additional context for the figure
  • They help the reader to interpret the figure correctly
  • They should be treated like axes labels, a legend… e.g. essential!

What might a caption look like…

  • In a published research article?
  • On a conference poster?
  • In a seminar or talk?
  • On an outreach leaflet?
  • In a policy document?

Rule 5: Do Not Trust the Defaults

  • Default settings in your chosen plotting software may not suit your needs entirely
    • Defaults may be outdated, not suited to your field of research, or not be accessible
  • Adjust settings as needed - but also recognise when this might be a time-sink!

a. Using default settings, basic histogram

b. Customising the colour palette, using Seaborn library, multi-panelled figure

a. Using default settings, basic scatter plot

b. Customising the colour palette and changing figure parameters

Of course, the solution might also be that a different plot shows your results more clearly… remember Rule 2: Identify Your Message!

Rule 6: Use Colour Effectively

  • Colour can be an important asset in your scientific visualisations
  • Must be used mindfully to avoid confusion

According to Edward Tufte 1983, colour can be either your greatest ally or your worst enemy if not used properly (Rougier et al., 2014).

Uses for Colour

  • Adding an additional dimension to your figures (caution!)
  • Adding clarity to different marker or line styles
  • For emphasis
  • For heatmaps

Low resolution image not intended to be scaled up…

Estimate subsidence of volcanic edifices on planets - using both colour and symbol shape to differentiate.

Topographical map of Mars; co-ordinates system not plotted to avoid discussion on Martian projection systems!

Pitfalls of Colour-Use

Rainbow colour maps such as “jet” have previously been popular despite their issues. “Plasma” is “perceptually uniform” but can obscure details in extreme values.

Cividis is both “perceptually uniform” and also appears the same to audiences with and without colour vision deficiencies, but has the same issues as “plasma” with obscuring features in plots with a wide dynamic range.

Spring is a terrible colour map, even if it’s pretty.

Sometimes you do need a plot that will pick up these sorts of features in the dark and bright areas!

It can be easier to see the issues in an image than in a map

Other packages such as “Cmocean” also provide well-designed scientific colour maps.

Other packages such as “Cmocean” also provide well-designed scientific colour maps… and some strange ones

Plot and analyse x-ray element maps

How can colour infer order?

Data can be encoded by other means aside from colour… which of these could be ordered or infer order?

  • Size
  • Shape
  • Grouping
  • Area
  • Position
  • Saturation
  • Line pattern
  • Line weight
  • Angle
  • Connections

Best Practice

  • It is a good practice to, where possible, avoid conveying information purely through color. You should always consider adding other ways to convey the same information besides just color.

  • Check that your colour choices will be easily distinguishable from one another

  • Ensure that colour does not infer order where there is none!

  • Use perceptually uniform colour maps whenever possible

Further Reading

Rule 7: Do Not Mislead the Reader

  • Figures should represent data as objectively as possible
  • Avoid creating misleading figures

This seems straightforward:

to make a good scientific visualisations, don’t do academic fraud

but actually, it’s easier to accidentally misrepresent data than you might think.

Unintentional misleading of your audience often arises through not fully understanding how you have visually encoded your data.

  • Size
  • Shape
  • Colour
  • Grouping
  • Area
  • Position
  • Saturation
  • Line pattern
  • Line weight
  • Angle
  • Connections

Stacked barplots; both panels show the same data

Bubble plot, where “z” value is shown by both colour map and size

All of these represent the same data…

Both represent the same data…

Both represent the same data…

Both represent the same data…

Rule 8: Avoid “Chartjunk”

  • Chartjunk refers to all the unnecessary or confusing visual elements found in a figure
  • Avoid overloading a figure with too much detail

But do:

  • Add enough annotations that the reader understands the figure
  • Understand that this is subjective!

If your figure is confusing, try fixing it by removing something rather than adding extra labels!

Yikes.

  • When should we use gridlines?
  • When are annotations helpful?
  • What about legends? Are they chart junk?

Rule 9: Message Outshines Beauty

  • The key requirement of a scientific graphic is to communicate scientific results
  • Aesthetics should not distract from the message

Jet is rubish, and pretty ugly… but Spring is also a terrible colour map, even if it’s pretty.

Murphy Quinlan et al. (2021)
  • I really dislike the look and colour scheme of this, but I have to admit it does a better job of representing my data

Rule 10: Get the Right Tool(s)

  • Use appropriate tools for data visualisation
  • Scripting languages like Python and R can be more efficient for producing multiple similar plots
  • While it is absolutely ok to use a graphics package to add annotations etc. to your plots if you are more comfortable with this, remember to always think about how reproducible your workflow is

Tools and libraries within Python; * denotes library used in this course

  • Usually Matplotlib is the best option if you want total control over every aspect of your plot, and want to build something that doesn’t look like any examples from library galleries
  • You can use Matplotlib or Seaborn to create some simple, schematic-type plots
  • This is one of the areas where a graphics program will probably outshine other tools!

Lets build some plots!

  • Using Google Colab
  • Follow along coding session
  • Our Colab notebooks will be shared with you