Sunday, 20 May 2018

The problem of too many lines on a time series line chart

The challenge is how to show several years worth of data in a time series using lines. Even William Playfair struggled with this one















There are two approaches which might help us here
  • Progressively fade the years out as they get further into the past
  • Band several years together to produce a background range

This article shows how to do both of these using Excel.


The problem

We will use some data from my world - it  does not really matter what it is. We have weekly totals for five successive financial years



















So we highlight the five years and insert Excel's default line chart, which produces this:
















It's a horrible mess. Obviously we will work on it a bit: Move the legend to the bottom, stretch it out











It is still extremely hard to follow. The problem is that there are too many lines criss-crossing each other, and the colour of the lines has no real meaning

I regularly see charts like this  - often with even more lines. The analyst has simply trusted to Excel's standard chart gallery and colour schemes. Excel has blown them a raspberry. Excel is a brilliant software package, but you often have to twist its arm to get it to do what you need.

Progressively fade out earlier years

This approach tries to put meaning into the colour of the lines. Most of us will probably agree that it is worth looking at the position at the same point last year. Is the year before that of equal significance? Or the year before the year before? Probably not.  So this approach dilutes the visual impact of earlier years. You can still see them but they clash less.


This is a big improvement. We have used standard Excel colour palette options (we could use a  colour selecting utility to make ore precise custom choices). We have also adjusted the contrast of the gridlines, axis and legend labels,

While it is better, it is still very 'busy'. So the next step would be to sacrifice the oldest year of data






This is a lot clearer - it allows a greater contrast between the most recent year and the one before.

It's still quite busy. What does removing another old year look like?















That looks about right for this pattern of data.

We maybe need to double check that the our colour scheme is giving us something extra (rather than it just being  a result of reducing the number of data series),  So here is the same chart again using the the Excel default colours








Yes the 'fading' scheme is much clearer, and the other adjustments to contrast also help.


Band past years together

What if we do not want to lose sight of the earlier years? A way to retain some of this information is to calculate the range of vales for each week over several years. This takes a little bit more setting up.

First add some calculated columns for the Minimum, Maximum, Range (i..e Maximum - Minimum) and pull through the latest year as a repeat column














Then use the three columns Minimum, Range, Last Year to create the graph.

Start by inserting a Stacked Area graph














Next adjust the data series properties so that the Min series is invisible and the Range (Max-Min) is a faint colour















Next change the chart type for the Last Years (2017-18) data series from Stacked Area to Line














That's more or less it - it just needs some tidying up
  • Move the legend  from the side to underneath
  • Delete the Min from the legend
  • Change the legend label for Min-Max to something more meaningful
  • Adjust contrast on grid-lines and axis labels
  • Adjust line colour of main series to taste














So that's about there now. It allows us to see what the weekly pattern of  the year in question was against the range of the four years before it 

For me this is the best way to look at this particular pattern of data. For other patterns I may think differently but it's a good starting point

The advantage of Excel is that you can quickly knock together different versions to see what they may show

Some other thoughts and possibilities

One approach that might be suggested from modern data visualization thinking is 'small multiples'. In this approach, rather than combining the years into a single chart, each year would be plotted singly and then all would be aligned into an overall view

Here's a quickly assembled version in 'small multiples' style
































This is ok in the sense of showing the pattern for each year. But how easy is it to see whether 2017-18 values stray outside the 'typical' range?. Does this version of small multiples offer much more than simple sparklines?















So maybe to make caparisons easier in the 'small multiple' version we might have to add some solid fill?































Using a solid fill below the line, and containing the plot area makes the comparisons easier. But it still requires a lot of mental gymnastics to make and hold the comparisons

If you able to use these charts in an interactive setting then there are options for rotating through successive years using a pull-down list - as the following illustration roughly simulates:






Saturday, 19 May 2018

Some Websites




Perceptual edge
Probably the single best website for data visualisation. Stephen Few set up Perceptual Edge as a consulting company in 2003. This website contains a substantial collection of articles, available free. It also contains 20 examples of poorly designed charts and how to improve them. The blog provokes a lot of interesting debate. If you are only going to look at one website to learn about data visualisation, pick this one.
  
The Work of Edward Tufte and Graphics Press
This is Edward Tufte’s website. I found it a little idiosyncratic at first. But if you persevere, it has some very stimulating and wide-ranging material. It provides a way to purchase Tufte’s books, including electronic versions of some works.

Peltier Tech Blog

https://peltiertech.com/

This is Jon Peltier’s website. It contains a lot of practical ‘how to’ examples for getting Excel to do things. He ranges into all sort of Excel matters but there is a lot of good material on graphs and charts. He is particularly good at finding solutions for tricky problems.

Evergreen Data
This is Stephanie Evergreen’s website. She runs Evergreen Data and the website promotes this business. The Blog section contains a short selection of articles, some of which are reproduced in her books. It is quality rather than quantity. Recommended

ExcelUser
This is Charley Kyd’s website. It is wide ranging and full of good tips. It includes quite a lot on Excel graphs and charts.

Excel Charts
This is Jorge Camoes’s website. Although it is entitled ‘excel charts’ it ranges much wider than being a simple excel ‘how-to’ site. There is a lot of discussion about what makes good visualisation.

Chandoo.org
Chandoo is one of the Excel lifelines that I have dipped into over the years. He covers charts (and dashboards) extensively. Sometimes the graphic style drifts away from my liking – but that is a matter of personal taste not technique.


Color Brewer 2.0
If you need to set up a visual hierarchy using colour, this site takes away a lot of the hard work. It allows various numbers of levels to be applied to different colour sets. It gives the technical specification of the colours which can then be used to transfer to your software.


Wednesday, 2 May 2018

A set of data presentation standards from 1915

If you are interested in data visualisation you may think that modern data visualisation thinking started with Edward Tufte in the 1980s. You may follow the trail back to John Tukey in the 1970s or even to Mary Eleanor Spear in the 1950s.

In fact we can track back over 100 years. Williard Brinton's landmark book Graphic methods for presenting facts was published in 1914.  The Joint Committee on Standards for Graphic Presentation,  propelled by Brinton,   published its first (draft) set of standards in 1915. These  are surprisingly relevant even now:



Standard 1









Standard 2









Standard 3










Standard 4








Standard 5
















Standard 6


















Standard 7











Standard 8










Standard 9














Standard 10











Standard 11

















Standard 12











Standard 13
















Standard 14




Standard 15







Standard 16


Standard 17



Saturday, 7 April 2018

Boxplots: Creating a Spear range bar chart in Excel

This article shows ways to produce a range chart - an early version of box plot - using Excel. There are different ways to produce box plots on Excel. Microsoft eventually added a box plot to the standard chart gallery in Excel 2016. The methods set out here work for earlier versions of Excel as well

Mary Eleanor Spear illustrates  the range bar in her 1952 book 'Charting Statistics' 











The essentials of this chart are that it uses the five number summary of the spread of the data, with the inter-quartile range (IQR) as a box and the range beyond the IQR as whiskers. These are all the essential elements of a box and whisker plot, 25 years earlier than John Tukey's 1977 book Exploratory Data Analysis which many people herald as the point at which the box plot was 'invented'. I'm calling this the 'Spear range bar' in an attempt to restore some of the credit to her.

We need some example data to demonstrate the method in Excel












These are the key five numbers. First step, we need to calculate the spans between each. This includes the span from zero to the minimum (which  will serve to 'pad' the left hand side of the chart)










Next use the calculated span data to create a stacked bar chart















Excel may default to setting up the bars the 'wrong' way round (as above). If so you will have to twist its arm using 'Select Data' then 'Switch Row/Column'























You may also want to restore the sequence (Format Axis ...Categories in Reverse order)








 





















Now we have got all the bits in the right places, most of the rest is formatting.

We need to 'get rid' of' the 0 to Min span - it is just the padding needed to push the edge of the range away from zero. So reformat that Series so that there is no fill and no border line.



Next reformat the two parts of the interquartile range (Q1 to Median and Median to Q3) as simple boxes (no fill, solid line border)



Next we need to format the whiskers  (Min to Q1 and Q3 to Max). A quick way of doing this is to use a picture fill. In a spare corner of the spreadsheet, set up a thick single cell border. Then take a picture of it (the Windows Snipping Tool works well).

Then use this picture as the fill for the series....Insert from Clipboard. You will probably also want to format to 'Stack' (to ensure the line thickness stays the same irrespective of the whisker length

























Format to no border line. Repeat for the other whisker and we are nearly finished


Format to taste:


It' a little clunky but it's fast and effective

If you look on Google for how to create boxplots in Excel, you will find many blog articles all advising the same trick: namely customising error bars to create the whiskers
























This works well but takes a little more setting up than the picture fill method outlined above.

The left hand whisker is created by adding an error bar to the Min to Q1 series. Select the Minus only option. For the Error Amount pick Custom and then enter the range for the  Min to Q1 series.data



































To create the right whisker, an an error bar to the Median to Q3 series (not the Q3 to Max). Set to Plus to get the bar to extend to the right

To reflect the Spear chart style better, the No Cap option can be selected for End Style. Adjusting to this and thickening up the error bar line slightly gives:
















References - a selection of 'how tos'

http://www.dummies.com/education/math/statistics/box-and-whisker-charts-for-excel/

http://www.contextures.com/excelboxplotchart.html

https://nathanbrixius.wordpress.com/2014/03/10/beautiful-box-plots-in-excel-2013/

https://peltiertech.com/excel-box-and-whisker-diagrams-box-plots/

https://www.vertex42.com/ExcelTemplates/box-whisker-plot.html



Friday, 30 March 2018

Pioneers and opinion shapers: John Wilder Tukey



J W Tukey (1915-2000)


John Tukey was amongst many things a mathematician, a statistician, an academic and an innovator. His output was prodigious and his achievements include:

  • Inventing the box-plot (a development of the range plot devised by Mary Eleanor Spear)
  • Coining the term 'bit, now ubiquitous in computing. It is possible he also invented the term 'software'
  • Devising the Cooley-Tukey fast Fourier transform (FFT) algorithm
  • Devising Tukey's range test, a development of the t-test
  • Devising the Tukey lambda distribution
  • Devising Tukey's test of additivity
  • Devising the Teichmüller–Tukey lemma

Tukey was the driving force behind the development of Exploratory Data Analysis, publishing the landmark book of that name in 1977.  This provides a set of ideas and techniques

There is a YouTube clip of James Cooley and John Tukey talking about FFT at a 1992 conference

There is a YouTube clip of John Tukey explaining the exploration of multidimensional data using the PRIM-9 display system

Notable quotes from Tukey:

The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

Sunset salvo. The American Statistician 40 (1).  1986 http://www.jstor.org/pss/2683137

Finding the question is often more important than finding the answer.


Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
The future of data analysis.

Annals of Mathematical Statistics 33 (1) p 13. 1962


The tool that is so dull that you cannot cut yourself on it is not likely to be sharp enough to be either useful or helpful.
The Technical Tools of Statistics. The American Statistician 34 (1).  https://www.jstor.org/stable/2682374