Tuesday, 1 March 2016

The Pie Chart and levels of consciousness

I have been through a succession of stages of understanding and enlightenment in respect of the pie chart. These can be seen as a progression through levels of consciousness (how pretentious does that sound?). Anybody for whom a discussion of pie charts would quickly lead to a state of unconsciousness,  stop reading now - save this for your next bought of insomnia

First Level
I realised that the name referred to pie the food not to pi the strange number with lots of decimal places. Maybe I was unique in mistakenly assuming that the name had something to do with the properties of a circle.

Second  Level

I realised that pie charts were a pain to draw. I was brought up in the Lower Palaeolithic, before the advent of spreadsheets. At school, drawing pie charts involved compasses (to draw circles) and protractors (to measure out angles). When I worked as a professional illustrator, I had to do all this but also very neatly and using ink.  Differentiating the different slices involved very carefully ruled hatching, precise stipple,  or sticking on pieces of expensive patterned film ("Letratone") cut precisely. In those days a spreadsheet was a major undertaking. You had to be really sure you wanted one.

Third  Level
Spreadsheets. Along came Loutus 1-2-3 and then Excel. I was delighted to find that Excel can produce very quickly. Not only that, it offered aesthetically pleasing variations such  as 3D and "exploding pie". I felt very grateful for the existence of computers. I recreated Manhatten using 3 D column charts.

Fourth Level
Having read the works of the data visualisation gurus - I realised, with a bit of an embarrassed sinking feeling, that I was supposed to deprecate pie charts. I realised that the exploding 3D charts made me look like a novice rather than an expert. I stopped using pie charts immediately. I scoffed at other people who still used pie charts and I felt superior.

Fifth Level
I realised that the pie chart can be good after all - if used in the right way. I felt stupid for not holding onto this when I jumped smugly to Level  Four.

Sixth Level
It's a pie chart...but not as we know it. I then realised that there are other kinds of charts which look like pie chart but are really something else. I could not see any missiles directed at these from the data visulaistaion gurus, so concluded that they may be safe to use


Seventh Level
Is there a seventh level or beyond? I have not got there yet. I will let you know


Some Books

I have stumbled about in the information arena for many years. Every now and then a particularly noteworthy book comes to my attention. Here are a few:


How to Lie with Statistics by Darrell Huff. Originally published in 1954, with illustrations by Irving Geis. A revised edition was published by Penguin Books (1973) with illustrations by Mel Calman. This was one of the first books I ever bought and is still excellent today
http://www.amazon.co.uk/How-Lie-Statistics-Penguin-Business/dp/0140136290


Meaningful Graphs: Converting Data into Informative Excel Charts 
by James M. Smith  (2014).  I have seen a lot of books on Excel- and owned a quite few over the years. Many are bloated and not very interesting. This one is very good - and one of the few I've felt worth buying since I discovered good online resources, such as Chandoo.org.
http://www.amazon.co.uk/Meaningful-Graphs-Converting-Informative-Charts/dp/0986054909

Dashboards for Excel by Jordan Goldmeier and Purnachandra Duggirala (2015). Apress. A really useful book - it has tips and tricks that would take you years to discover for yourself
https://www.amazon.co.uk/Dashboards-Excel-Jordan-Goldmeier/dp/1430249447

Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knafic. Kindle Books
A really good book. It looks at systematically at the different aspects of communicating using graphs. Because I like the book,  I can forgive the author for her overuse of the word "leveraging"

Some Books by Edward Tufte

http://www.edwardtufte.com/tufte/ebooks
Edward Tufte introduced the term “chartjunk” and the concept of data-ink ratio in his first book, The Visual Display of Quantitative Information (1983). The second edition was published in 2001 (ISBN 0-9613921-4-2).

His second book, Envisioning Information was published in 1990 Graphics Press, ISBN 0-9613921-1-8.(2001)

Visual and Statistical Thinking: Displays of Evidence for Making Decisions (1997) provides two fascinating case studies: the 1854 cholera epidemic and the 1986 Challenger space shuttle disaster


The Cognitive Style of PowerPoint: Pitching Out Corrupts Within (2006). 

So that's two of us who don't like Powerpoint! Hooray!



Visual Explanations: Images and Quantities, Evidence and Narrative
Graphics Press (1997)  ISBN 0-9613921-2-6.

Beautiful Evidence. Graphics Press, ISBN 0-9613921-7-72006 


Some Books by Stephen Few
If you are interested in data visualisation then it is just a matter of time before you encounter Stephen Few. As well as the books listed below, he also publishes a lot of very good short articles, available free, on his website at www.perceptualedge.com


Now You See It: Simple Visualization Techniques for Quantitative Analysis, Stephen Few,  Analytics Press, 2009


Show Me the Numbers: Designing Tables and Graphs to Enlighten, Second Edition, Stephen Few, Analytics Press, 2012

Information Dashboard Design: Displaying data for at-a-glance monitoring, Second Edition, Stephen Few,  Analytics Press, 2013. Although this book is now in its second edition, I only have the first edition at present.

Signal: Understanding What Matters in a World of Noise, Stephen Few, Analytics Press, 2015


Some Books to avoid?
There are some books in which I have been particularly disappointed. At the risk of upsetting their authors, the following are books which I would definitely not recommend:

Using Charts to Create Effective Dashboards by T.Doyle
Having got this electronically to read on a Kindle, I was dispapointed to find that this book is very short and is barely more than a list of types of chart. There is little on dashboard design at all. Most of the samples it provides are good examples of how not to use charts. This book seems even to have vanished from the Amazon Kindle store now, making my copy an endangered species - soon to be deleted.

MS Excel as a Graphics Design Tool by Clarence Bjorkmann
This was an interesting concept but in the end I was left wondering why on earth anybody would want to use Excel in this way when even Windows Paint could produce better results and there are freeware graphics tools available. Furthermore, the graphic design in the book itself is not impressive. £3.47 I wish I'd spent differently. 

Tuesday, 9 February 2016

Losing your RAG



Variations of the Red-Amber-Green "RAG" performance rating system are used extensively. 

A common version is really a two rating (Red-Green) system i.e. where all performance is marked as either pass or fail. When over-used on a table of numbers (not least  with the common practice of colouring whole cells), this can produce a confusing and garish grid




A proper three-grade RAG system, including the central Amber, allows more sophisticated ratings to be made. But the Amber rating can be used in two very different ways:

  • Fail but only just
  • Pass but close to fail

Either way, including Amber in the pattern of fully coloured cells just adds further to the over-whelming blast of colour that assaults the reader's brain.

So why, so often,  do we end up with these types of reports and dashboards? 

The whole cell colour aspect  is probably mainly down to ease of production. It is much easier to produce this in say, Excel, than to use less over-whelming devices. 

The extensive use of Green is more curious. It probably indicates a service area where performance has regularly been below target and the people concerned are feeling a bit battered.  Lots of Red is demoralising so every opportunity is taken to proclaim the Green. The same motivation will probably lead to adopting the Amber "fail but only just" option. It removes more Red and it gives a feeling of "nearly there".

But before jumping into producing another one of these grids, the analyst should consider what the report or dashboard is actually trying to communicate. In the end, the main purpose has to be to highlight to managers when they need to make a decision or initiate action. 

This being so, what is the purpose of the Green? There are lots of months where the target was achieved. So what? What action will we take? Apart, possibly, from a momentary congratulation  - none. The Green is where we want to be. Colouring it in may give a sense of assurance, but it does not trigger any management action.

The Red rating is more obvious: we have not achieved the target and we need to do something.

What about Amber? A "fail but only just" statement is still a fail. We still need to do something. What we really need to know is where we met the target (Green) but only just. This is a situation where things may deteriorate and become Red. We may need to take action soon, or be pro-active and take action now, so we need to highlight this.

These considerations suggest that what would be best is a "RA" rating i.e. Red (fail) and Amber (pass but only just) system. Leave everything else un-highlighted on the grounds that there is no need to draw attention to it.

An example could look something like














where Red indicates that we missed the target and Amber indicates that we achieved the target but it was so close to the borderline that we may need to act, or at least keep a close eye on it.

This type of Red-Amber rating system can be developed further to make the Amber look more like a faint Red i.e to use a monochrome (Red-red) system in which tone conveys the meaning. Using this approach, together with some further adjustments to contrast, gives the following alternative:
















This illustration uses exactly the same data as the grid above but is set out in a very different way. The idea is to adjust the contrast so that important information stands out and less important information melts into the background. 

Attention is drawn to the latest position rather than to the history of previous results. Indicator 4,  which missed the target is clearly highlighted by the Red marker. Two other indicators (5 and 7), which  only just hit the target, are also highlighted, but in a lighter tone.

The compact visual summary of the last six results then provides additional detail. Indicator 5 has generally been ok so there is probably no need to over-react. Indicator 7 has a history of missing the target. We should continue to monitor it closely.

In conclusion, for dashboards and reports, it may often make sense to abandon the customary RAG systems in favour of a Red-Amber (RA) or Red-red (Rr) as described here.

Finally, another practical reason to abandon the RAG is to help the 10% of males, and 1% of females,  who are colour blind.


References

3 Problems With Traditional KPI Traffic Lights by Stacey Barr. 20/05/2014
(link)

Dashboard Design for Real-Time Situation Awareness by Stephen Few. 2007 (link)


Excel: creating a 'Last Ten Results' indicator panel without using VBA by Edgar Bolton. B I Dashboard Design. 31/01/2016  (link)

Monday, 1 February 2016

Why month-based views are unsuitable for data which exhibits weekly patterns

One of the most commonly encountered formats for presenting data is a table of monthly totals,  often accompanied by a graph.

One reason for the extensive use of this approach is that it is very familiar and is practically self-perpetuating. It almost suggests itself.

Another reason is that it is a relatively easy calculation to produce month based counts using commonly available tools such as Excel, Access, SQL. Producing weekly aggregate numbers from raw data is a slightly higher level of difficulty. Similarly, producing a graph in Excel from a table of monthly totals is almost effortless. Producing an SPC (Statistical Process Control) type run chart is a little less easy.

A monthly review makes perfectly good sense. It is a reasonable time interval against which to assess a progress. Much less than a month and we risk over-reacting to short term fluctuations; much more than a month and we may not give ourselves enough time to act.

Where things can go wrong is where a good time interval for review gets confused with a good time interval for analysis.

Unfortunately there are many situations where a month-by-month view masks, rather than highlights, the real patterns in the data. People presented with a month by month graph in which the numbers go up and down will feel that they are seeing the real changes in activity. They may be reassured if the numbers go the 'right' way and panic if the numbers seem to go in the opposite direction.

The following illustration shows a series of monthly counts of events:


























The monthly totals time series shows numbers going up and down suggesting a pattern of changing activity. 

Most readers will quickly spot that the pattern in the graph is nothing more than a representation of the number of days in each month. It has been built up from exactly 10 events each day throughout the entire period being studied. Nothing is changing at all.

The appearance of changing activity has been exaggerated by the default behaviour of Excel in truncating the vertical axis when creating graphs.


A way of compensating for this is to divide the total number of events in the month by the number of days in the month i.e. to calculate a daily rate. This is added on the illustration above. It shows that the monthly pattern of activity is actually static. 

For analysis, the rule should be that where the period being studied is divided up into unequal time periods (such as months), it is better to use a rate than to use a simple period total. It is surprising how often this simple rule is not followed.

In many situations, this compensating technique (translation into rates) may not be sufficient. The next illustration shows a series of monthly counts of events. The daily rate has been calculated. The rate goes up and down, suggesting change:

























However, here again, the impression of a changing levels of activity is an illusion. 

The underlying pattern of activity is really static. It is a constant 28 events a week throughout the study period. For the purposes of illustration, the weekly pattern synthesised for the illustration has been strongly polarised (All Mondays have 7 events, Tuesdays 6 events, etc down to Sundays with 1 event). 

The spurious pattern in the illustration results from the interplay between the consistent but uneven distribution within each week in combination with the disposition of weeks across month end boundaries.

Another regularly used 'compensatory' technique is to show a monthly total alongside the equivalent month from the previous year:























Surely this approach is valid as, with the exception of some Februaries, we are comparing the activity in exactly equivalent time periods, side by side?

Unfortunately, again, the appearance of changing activity is an illusion. The graph is derived from exactly the same static pattern of 28 events per week as was used in the previous illustration.

If this is what static data looks like through this framework, imagine how hard it would be to make sense of any real patterns of change.

In areas where there are strong weekly patterns, such as healthcare, viewing data through the distortion of a monthly framework makes it very difficult to see what is really going on.

So while monthly views work well for appropriate uses, such as showing cumulative income year-to-date, they will often be a poor choice for displaying and understanding patterns of event activity. 

Unlike the month, the week is a more 'natural' time period which aligns well with patterns of behaviour. It is worth the extra effort of aggregating event data by week and looking at it in weekly run charts. The SPC method can be used to assess whether the variation seen reflects any real changes.

At the very least, a preliminary analysis will be needed to assess the strength of weekly patterns in the data before a decision can be made on how much 'noise' this is likely to create in a monthly time series .

Another technique used in reports and dashboards is to divide monthly totals of events by the number of 'working days' in a month. This can go some way towards making comparisons better. But it still does not compensate fully for a strong weekly pattern. 

Also, it would be necessary to review each measure being reported on and assign an individual number of working days to it. This can vary from measure to measure. Some events would be expected to occur on weekends and bank holidays. Some services may operate for less than five days a week. It cannot be assumed that using the number of 'working days' worked by the analyst in the month is the correct denominator in every, or even any, case.

Sunday, 31 January 2016

Excel: adding interactive drill down to a 'Last Ten Results' indicator panel

A previous article outlined a simple method for creating a 'Last Ten Results' indicator panel in Excel (see link). This supplement shows how to add interactive drill down functionality (Excel skill level: advanced)













Another previous article had shown how to use Excel comments boxes dynamically to provide drill down details (see link). This method can be adapted to work with the last ten results indicator panel. There are a few technical challenges.

Reversing a linking formula
The cells coloured green,  red and grey in the indicator panel read values from a separate data area. The way the data is structured (in this example) the drill-down needs to read from three adjacent rows in the data area, e.g. cells AF10, AF11, AF12.  This is trickier than it first seems. 













Some VBA code which can do this set out below. It is triggered by the Worksheet Selection Change event i.e.the code is run each time a different area on the worksheet is selected. So the code is run when any of the coloured bars are clicked on, displaying the appropriate details for the cell It is run again when any non bar cell is clicked on, removing the previously displayed details.

To make the code a little easier to follow it has been split into three blocks. The first block defines the memory variables that will be needed, clears any pre-existing comments from the worksheet, tests to see if the selected areas is withing the defined target area (i.e the whole indicator panel), and tests to see if it is a coloured results cell or simply the margins around the coloured results cell (requiring no drill down) 






The next block of code records the coordinates of the cell where we want the drill down to be displayed and works out the coordinates of the topmost data cell relating to it




























The final block collates the details from the data area into a cell comment and makes it visible. The details are produced by concatenating all the individual pieces, and using Chr(10) to force new lines within the comments box.



















Inevitably, this VBA code could be written in a more compact and slick manner. But it works as is and runs quickly enough. Also note that the method of 'reversing' the link to identify the data cell is designed for cases where both sections are on the same worksheet. It would need further refinement if the data was located remotely


References

Excel: creating a 'Last Ten Results' indicator panel without using VBA  by Edgar Bolton. 
B I Dashboard Design. 31/01/2016  (link)

Excel: Using cell comments to provide "drill-down" details by Edgar Bolton
B I Dashboard Design. 05/08/2015   (link)

Excel: Using arrows to indicate direction of change in performance  by Edgar Bolton
B I Dashboard Design. 26/07/2015   (link)


How to convert Excel column numbers into alphabetical characters. Microsoft. Article ID: 833402 - Last Review: 09/19/2011 00:01:00 - Revision: 5.0   (link)













Thursday, 28 January 2016

Excel: creating a 'Last Ten Results' indicator panel without using VBA

The BBC website has many good examples of how to display information. The following  device is used to summarise a series of results. 





The BBC use it to display football results in a League Table (see link).  The concept is used more widely in Business intelligence dashboards. There are various approaches to creating this effect. This short article shows a simple way of constructing it using Excel without the need for any VBA coding (Excel skill level: Intermediate). The main idea behind the approach is to use an entire spreadsheet cell for each coloured bar. The colours are controlled by Conditional Formatting.

Step One
Set up a data area. Here, a block of ten successive results by the football team Leicester City F.C. is used. In the illustration, the data is set up on the same worksheet as the dashboard panel. In a real application, the data would probably be held separately. 












Step Two
Each set of bars requires three spreadsheet rows. The middle row is for the coloured bar; the rows either side are clear margins. Similarly,  additional columns are needed between each of the bar cells. Mark out the grid area  by filling each bar cell with a temporary grey background. Then link each bar cell to the corresponding results in the data area.















Step Three
Add Conditional Formatting to the bars. This can be done to the full range as a single block. Three rules will be needed, one for each colour:  In the example, "Win" is to display as Green; "Draw" as Grey and "Loss" as Red. Set the formats so that the font and the background colours are identical, within each rule.













Step Four
If desired, add a light background tint to the whole panel (and set the font colour exactly the same)



Step Five
Shrink the grid by setting column width and row height properties. In the example the following values are used: Bar row 3, margin rows 6, bar columns 1, interval columns 0.17. It may help to zoom to 200% while doing this.











Step Six
To add normal rows of text alongside the bars, merge the three rows in the area adjacent to the bars. This re-creates a more normal row height into which text can be added. When all the row and column adjustments have been made, switch off the grid-lines on the worksheet










It might seem fiddly to set up, but once one line has been created it can be cloned easily. Simply copy and then paste blocks of three rows at a time. 














Enhancements
An additional feature on the BBC website displays the details of each result when the cursor hovers over each coloured bar. A way of adding this type of functionality in Excel (using VBA) is set out in a supplementary article (see link



References

Excel: adding interactive drill down to a 'Last Ten Results' indicator panel  by Edgar Bolton. B I Dashboard Design. 31/01/2016  (link)

Excel: Using cell comments to provide "drill-down" details by Edgar Bolton
B I Dashboard Design. 05/08/2015   (link)

Excel: Using arrows to indicate direction of change in performance  by Edgar Bolton
B I Dashboard Design. 26/07/2015   (link)










Saturday, 31 October 2015

The "Infozak" Pandemic

The word "Muzak" has existed since at least 1954 when it was registered as a trademark in the USA. In a similar manner to the way the name "Hoover" became a generic word for vacuum cleaner, the word "muzak" became generalised too.

Muzak now variously refers to the range of background music played in shops, waiting rooms, during telephone calls when on hold, during breaks in television programme transmission, and so on. The purpose of muzak seems to be to create a calming effect, to create a desired ambience or to fill in problematic gaps

I have coined the term "infozak" to refer to the information equivalent of muzak. The purpose of infozak seems to be to create a calming effect, to create a desired ambience or to fill in problematic gaps.

When presented with a long report replete with tables of data and eye-catching graphs, the prospective reader may immediately gain a sense of calm. Whatever the report is recommending is bound to be the result of a detailed and robust analysis. We can rest easy.

Those involved in the production of the report may know otherwise. How often are reports written "back to front" in the sense that the conclusions are arrived at first and the data and graphs are added later?  The information component of such reports may be selected carefully (i.e. with conscious bias) to attempt to give credibility to proposals arrived at by other methods. Worse still, and common, the information component may be largely decorative, arrived at out of a sense that this type of document "needs a few graphs".

Once you understand the concept of "infozak" you start to recognise it everywhere.

The art of misinformation using "statistics" is at least as old as the profession  of politics. Disraeli is famously attributed as saying "There are lies, damn lies and statistics". Ironically, this attribution itself may not be true, but the activities of modern day political campaigning produce no shortage of examples. Fortunately we are largely tuned in to expect manipulation of information in such arenas.

It is the accidental production of infozak that is the greater concern. Due to the increasing ease of production and dissemination, infozak is everywhere. Bill Gates must take his share of responsibility for the infozak pandemic. 

Many years ago, when my oldest daughter was four, she decided to entertain herself (without authorisation) using Micosoft Access which she had discovered on my computer. In a matter of minutes she had set up a CD collection database system. At the time, as far as I know,  she had received no training in database development and design. She was simply making largely arbitrary choices from the easy-to-use "wizard" in Mr.Gates's software. She produced something about which she had little understanding but which was very pleasing to her.

Microsoft Excel is equally easy to use. Many grown-ups can produce, in a matter of minutes, a raft of visually pleasing tables and graphs from any given selection of unprocessed data. It is possible to produce a time series bar chart of monthly totals in about 80 seconds. With one more click a trend line can be added. Ten more seconds to embed or paste the results into a Word document or Powerpoint presentation. From raw data to information wallpaper in less than two minutes. Beautiful!  





The trouble is that a cursory inspection may not easily distinguish infozak from true information. In fact infozak may look more like information than information does itself. Good information will be seeking to communicate. It will therefore make sparing use of graphical devices. The message may stand out but the medium should blend into the background. Infozak does the reverse. There may well be no clear message at all. The intention is to impress. Graphical devices are used to the full. It looks fantastic. It conveys almost no meaning.

What creates the pandemic is the highly contagious nature of infozak. Once normally rational people are exposed to even a small piece of beatiful infozak they suddenly develop irresistible cravings to start producing infozak themselves. It quickly spreads everywhere. There seems to be no cure.

References

Disraeli quotation