Wednesday, 2 May 2018

A set of data presentation standards from 1915

If you are interested in data visualisation you may think that modern data visualisation thinking started with Edward Tufte in the 1980s. You may follow the trail back to John Tukey in the 1970s or even to Mary Eleanor Spear in the 1950s.

In fact we can track back over 100 years. Williard Brinton's landmark book Graphic methods for presenting facts was published in 1914.  The Joint Committee on Standards for Graphic Presentation,  propelled by Brinton,   published its first (draft) set of standards in 1915. These  are surprisingly relevant even now:



Standard 1









Standard 2









Standard 3










Standard 4








Standard 5
















Standard 6


















Standard 7











Standard 8










Standard 9














Standard 10











Standard 11

















Standard 12











Standard 13
















Standard 14




Standard 15







Standard 16


Standard 17



Saturday, 7 April 2018

Boxplots: Creating a Spear range bar chart in Excel

This article shows ways to produce a range chart - an early version of box plot - using Excel. There are different ways to produce box plots on Excel. Microsoft eventually added a box plot to the standard chart gallery in Excel 2016. The methods set out here work for earlier versions of Excel as well

Mary Eleanor Spear illustrates  the range bar in her 1952 book 'Charting Statistics' 











The essentials of this chart are that it uses the five number summary of the spread of the data, with the inter-quartile range (IQR) as a box and the range beyond the IQR as whiskers. These are all the essential elements of a box and whisker plot, 25 years earlier than John Tukey's 1977 book Exploratory Data Analysis which many people herald as the point at which the box plot was 'invented'. I'm calling this the 'Spear range bar' in an attempt to restore some of the credit to her.

We need some example data to demonstrate the method in Excel












These are the key five numbers. First step, we need to calculate the spans between each. This includes the span from zero to the minimum (which  will serve to 'pad' the left hand side of the chart)










Next use the calculated span data to create a stacked bar chart















Excel may default to setting up the bars the 'wrong' way round (as above). If so you will have to twist its arm using 'Select Data' then 'Switch Row/Column'























You may also want to restore the sequence (Format Axis ...Categories in Reverse order)








 





















Now we have got all the bits in the right places, most of the rest is formatting.

We need to 'get rid' of' the 0 to Min span - it is just the padding needed to push the edge of the range away from zero. So reformat that Series so that there is no fill and no border line.



Next reformat the two parts of the interquartile range (Q1 to Median and Median to Q3) as simple boxes (no fill, solid line border)



Next we need to format the whiskers  (Min to Q1 and Q3 to Max). A quick way of doing this is to use a picture fill. In a spare corner of the spreadsheet, set up a thick single cell border. Then take a picture of it (the Windows Snipping Tool works well).

Then use this picture as the fill for the series....Insert from Clipboard. You will probably also want to format to 'Stack' (to ensure the line thickness stays the same irrespective of the whisker length

























Format to no border line. Repeat for the other whisker and we are nearly finished


Format to taste:


It' a little clunky but it's fast and effective

If you look on Google for how to create boxplots in Excel, you will find many blog articles all advising the same trick: namely customising error bars to create the whiskers
























This works well but takes a little more setting up than the picture fill method outlined above.

The left hand whisker is created by adding an error bar to the Min to Q1 series. Select the Minus only option. For the Error Amount pick Custom and then enter the range for the  Min to Q1 series.data



































To create the right whisker, an an error bar to the Median to Q3 series (not the Q3 to Max). Set to Plus to get the bar to extend to the right

To reflect the Spear chart style better, the No Cap option can be selected for End Style. Adjusting to this and thickening up the error bar line slightly gives:
















References - a selection of 'how tos'

http://www.dummies.com/education/math/statistics/box-and-whisker-charts-for-excel/

http://www.contextures.com/excelboxplotchart.html

https://nathanbrixius.wordpress.com/2014/03/10/beautiful-box-plots-in-excel-2013/

https://peltiertech.com/excel-box-and-whisker-diagrams-box-plots/

https://www.vertex42.com/ExcelTemplates/box-whisker-plot.html



Friday, 30 March 2018

Pioneers and opinion shapers: John Wilder Tukey



J W Tukey (1915-2000)


John Tukey was amongst many things a mathematician, a statistician, an academic and an innovator. His output was prodigious and his achievements include:

  • Inventing the box-plot (a development of the range plot devised by Mary Eleanor Spear)
  • Coining the term 'bit, now ubiquitous in computing. It is possible he also invented the term 'software'
  • Devising the Cooley-Tukey fast Fourier transform (FFT) algorithm
  • Devising Tukey's range test, a development of the t-test
  • Devising the Tukey lambda distribution
  • Devising Tukey's test of additivity
  • Devising the Teichmüller–Tukey lemma

Tukey was the driving force behind the development of Exploratory Data Analysis, publishing the landmark book of that name in 1977.  This provides a set of ideas and techniques

There is a YouTube clip of James Cooley and John Tukey talking about FFT at a 1992 conference

There is a YouTube clip of John Tukey explaining the exploration of multidimensional data using the PRIM-9 display system

Notable quotes from Tukey:

The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

Sunset salvo. The American Statistician 40 (1).  1986 http://www.jstor.org/pss/2683137

Finding the question is often more important than finding the answer.


Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
The future of data analysis.

Annals of Mathematical Statistics 33 (1) p 13. 1962


The tool that is so dull that you cannot cut yourself on it is not likely to be sharp enough to be either useful or helpful.
The Technical Tools of Statistics. The American Statistician 34 (1).  https://www.jstor.org/stable/2682374




Wednesday, 24 January 2018

Pioneers and opinion shapers: Florence Nightingale



Florence Nightingale (1820-1910)

Famous for her development and promotion of nursing, she was mythologized as ‘The Lady with the Lamp’ during the Crimean War.

Her work in statistics and what we now call data visualisation is less well known but significant. In 1859, she was elected the first female member of the Royal Statistical Society

http://www.jstor.org/stable/2965763?seq=11#page_scan_tab_contents


Nightingale developed the ‘Polar Area Graph’. These days we may call this a ‘cyclic histogram’ or a ‘Nightingale Rose’ chart: 





Tuesday, 7 November 2017

Doubled bar chart labels in Excel

The following bar chart was published on the BBC website

























Not withstanding the somewhat gruesome subject matter, my interest was caught by the labelling of the categories. The labels had both the location and the year, with the year in bold and on a second line.

I wondered whether it was possible to achieve this effect using Excel

Forcing the year onto a fresh line can be achieved by inserting a "hard" carriage return. (ALT and ENTER keys at the same time)

Jon Peltier provides a method of adding and then formatting the category labels using a second (dummy) data series. https://peltiertech.com/individual-format-category-axis-labels/. The alignment of the labels can be set to right




I have not yet found a way to selectively apply Bold formatting to the text in the label



Sunday, 5 November 2017

Creating a Fan Chart in Excel



Mary Eleanor Spears provides an example of a ‘Fan Chart’ in her book 'Charting Statistics’ (McGraw Hill Book Company. 1952):
























The Fan Chart is in effect a variation of the Slope Chart. It shows relative change rather than absolute change. So all points on the Left Hand side start on zero.

A version can be produced relatively easily in Excel:






















This version was made from the following starting data:




















The main steps were:

Create a basic line chart from the data
Select Data Source: Switch Row/Column
Delete legend
Delete header
X Axis: set  label position high
Format each data series line colour and thickness
Format Y axis tick marks and line
Format Y axis so that X axis crosses at zero
Format Y axis maximum value to 10
Delete all gridlines
Set X axis position on tick marks
Add data labels (to right)
Delete all LH data labels (carefully so as not to delete RH at same time)
Format each RH label with

  • Series name not value
  • No leader lines
  • Position to Right side

One way to add the shading outside of the data range is to create a second chart and overlay the fan chart. Create a clone of the chart and turn that into an area chart, being careful not to change the size at all. Turn the plot area of the clone grey and turn the fill of the wedges within the data area white

Then copy the line version of the fan chart and position it over the area version (making the top chart see-through by use of 'No Fill') settings). Here is a screenshot of the effect:






Thursday, 2 November 2017

Slope Graphs

The slope graph - or slope chart- is a special type of line chart. It is particularly useful for showing multiple  ‘before and after’ comparisons. It may be either based on rankings or on actual scores.

The slope graph is often said to have been devised by Edward Tufte (he published an example in his ground breaking 1983 book 'The Visual Display of Quantitative Information'). Tufte developed and promoted the idea but did not 'invent' it as such.

A slope graph was published in another book on the visual display of quantitative Information,  69 years before Tufte:




(part of Figure 63, p.65, from 'Graphic Methods for Presenting Facts' 1914 by Willard C Brinton - revised edition 1919)

Another example, more modern and also pre-Tufte, was published by Mary Eleanor Spears in her book 'Practical Charting Techniques'.(McGraw Hill. 1969). She describes it as a Ranking Chart:







Unlike the busy 1914 example, focusing on the change between just two points allows us to read easily. The gradient of the connecting line quickly gives both direction and magnitude. This is a considerable advantage over illustrating with a simple table


Edward Tufte has posted more examples on his website: https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk&topic_id=1 including possibly the earliest ever slope graph (from  Scribner's Statistical Atlas of the United States,1883)

Both Jon Peltier (https://peltiertech.com/slope-graphs-in-excel/) and Stephanie Evergreen (http://stephanieevergreen.com/slopegraph/) have provided step by step details of how to create a slope chart in Excel. The latter is clearer, and is also covered her book 'Effective Data Visualization: The Right Chart for the Right Data'.

Using these instructions, together with a tip from Chandoo, enables us to create an Excel version of Mary Eleanor Spears 1960s original:




Producing this chart in Excel was slow, but not difficult. The labels on the left-hand side were formatted using Stephanie Evergreen's method (making Series and Value visible then omitting the comma between the two).


The following data layout was used to create the Slope Chart in Excel:

























Column D is used solely for the Right-Hand labels. It is not included in the ‘Select Data’ range

I could not find a way to reverse the order of Value and Series using Stephanie Evergreen's method. Chandoo pointed out that it is possible to set the value of individual labels using a formula. So, the solution for the right-hand labels is to create the label in another spreadsheet cell and then point each label to the specific cell

This method is quicker than the Evergreen method. Unfortunately, it does not work for the Left-Hand labels – if you want to reuse the chart with refreshed data - as the order is volatile.

A final step might be to format one feature a different colour or intensity to highlight it.

The Slope Chart allows a lot more series to be compared in a single chart than would be sensible with a regular line chart. But what is the upper limit for a Slope Chart? That will depend on the way that you want to use it. Take the following example which compares 44 series in a single chart:  



 From a conventional dashboard perspective, this is now too much to fit with ease on a single screen. So, it will not work easily in an Excel dashboard or PowerPoint slide. But it still works if printed on paper. And it could work in a single screen if set up with a window to roll over the full chart, displaying a portion at a time.

Another approach would be to simplify by removing the labelling