Thursday, 23 March 2017

Familiarity breeds contempt: the pie chart

William Playfair is generally credited with inventing the pie chart, publishing a variety in 1801 [See Note 1]. The following is a much used illustration, taken from a larger more complex chart:




















The name “pie chart”, which may date back to the 1920s, derives from the resemblance of the chart to a pie divided into portions. In France it became known as “le camembert” through its resemblance to the shape of the cheese.

The pie chart requires segments (“slices”) to be constructed exactly. The angle of each segment must be calculated to reflect its proportion of the whole. Before the advent of computers, drawing a pie chart in ink could be tricky, involving the use of a protractor and a beam compass. Fortunately, these days, most analysts will never have to attempt this.
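For anyone curious about the arithmetic that the protractor used to serve, here is a minimal sketch in Python. The function name `slice_angles` and the sample values are my own illustrative assumptions; each angle is simply the slice's share of the total multiplied by 360 degrees.

```python
def slice_angles(values):
    """Return the angle (in degrees) of each pie slice, proportional to its value."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    # Each slice takes its share of the full 360-degree circle
    return [360.0 * v / total for v in values]


# Hypothetical example: three parts in the ratio 1 : 1 : 2
angles = slice_angles([1, 1, 2])
print(angles)
```

Whatever values are supplied, the angles always add up to the full circle, which is why only the proportions matter and not the units.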

Modern computing allows pie charts to be constructed in a matter of seconds. Excel 2016 provides the following types as standard:
  • Pie
  • Pie of pie – one segment is subdivided in a second pie chart
  • Bar of Pie – one segment is subdivided in a second chart, bar not pie
  • 3D pie – a “pleasing” 3 dimensional view
  • Doughnut (or “Donut”) – like a pie but with the middle removed
These can be used with various effects, including exploded slices. 

If its origins can be traced back over 200 years, criticism of the pie chart can be traced back over one hundred years:

“the circle with sectors is not a desirable form of presentation” [See Note 2]


Many people today, including some prominent opinion formers, deprecate the use of pie charts. This criticism is echoed widely [See Note 3]. A person newly embarking on a career as an analyst would be forgiven for developing a view that using a pie chart would be a declaration of incompetence.

Certainly there are abundant examples of pie charts being used badly.  But does it follow that every possible use of a pie chart could be replaced favourably by alternatives?


The two-slice pie chart


The readability of a pie chart generally declines as the number of slices is increased. The pie chart probably works best when there are only two slices. 

As an illustration, the following simple chart shows the result of the recent UK referendum on EU membership (the "Brexit" vote):




















This example chart was produced quickly in Excel, with a few adjustments from the default settings. While this is far from spectacular, it works as an illustration of the result and its "narrowness". The colour scheme is based on that used by the two sides in the referendum campaign, rather than being based purely on aesthetics.

It could be argued that a two-number result is a relatively simple situation so may not need an illustration at all.  But the underlying concept of a dashboard - to provide information "at a glance" - suggests that the overall result of a referendum is likely to be the most important piece of information and should therefore be given prominence. 

So how does the two-slice pie chart compare with some simple alternatives?

Alternative: Simple Free text statement

51.9% voted Leave and 48.1% voted Remain

Is this better than use of a two-slice pie chart? It conveys the information in a compact manner, but it lacks impact and emphasis. The reader has to perform a mental comparison having assimilated the numbers. 

Alternative: Free text statement with additional emphasis


51.9% voted Leave and 48.1% voted Remain
Is this better than use of a two-slice pie chart? It has more impact and emphasis but it still lacks something. It is idiosyncratic and one step towards becoming something like a "word cloud" or "call out". The reader still has to perform a mental comparison.


Alternative: Table


Outcome   Votes        Percentage
Leave     17,410,742   51.9%
Remain    16,141,241   48.1%

Is this better than use of a two-slice pie chart? It is compact and provides all the information. But there is no impact or emphasis and the reader still has to perform a mental comparison.
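As a quick check on the table, the percentages can be derived from the vote counts. This is a small Python sketch using only the two counts quoted above; the variable names are my own, and rounding to one decimal place is an assumption chosen to match the table.

```python
# Vote counts as quoted in the table above
votes = {"Leave": 17410742, "Remain": 16141241}

total = sum(votes.values())

# Share of the total for each outcome, rounded to one decimal place
shares = {outcome: round(100 * count / total, 1) for outcome, count in votes.items()}

for outcome, share in shares.items():
    print(f"{outcome}: {share}%")
```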



Alternative: Simple column graph















Is this better than use of a two-slice pie chart? It provides much of the same impact and emphasis. But (as set out here at least) it lacks overall coherence. It could also be argued that the pie chart perhaps gives a better impression of the interdependence (and reciprocal "swing") between the two parts. It is hard to argue that the column chart above is vastly superior to the two-slice pie.


Alternative: Simple bar graph











Is this better than use of a two-slice pie chart? It is really just a "tipped over" version of the simple column chart, so many of the same arguments apply. The bar chart has some definite advantages over the column chart in that the alignments make it much more readable. But, again, it is hard to argue that it significantly out-performs the two-slice pie as a device in this situation.

Alternative: Stacked "Percentage" bar graph









Is this better than use of a two-slice pie chart? This is in effect a rectangular pie chart. The two-slice pie has at least one advantage in that it automatically contains a line (the top "cut") which is aligned vertically and against which the position of the split (the second "cut") can be judged visually. The bar graph only has a single line dividing the two portions, so we would need to add a second line at the 50% mark to enable the same kind of visual comparison.








In my opinion, this last simple example - the stacked percentage bar graph with added line  - more or less matches the two-slice pie chart in overall effectiveness.  The pie chart is arguably a slightly better choice for instant communication, being the more familiar. The bar chart is maybe a better choice in terms of efficient use of space within a report or dashboard.

In conclusion, the two-slice pie definitely holds its own, and is arguably better than any of these simple alternatives. Not everybody will agree. It may become a matter of stylistic choice, or simple personal preference.

It may look, from my comments above,  as if I am taking a general stand in favour of the pie chart against the anti-pie masses. This is not the case. The  "beauty contest" above was looking only at the very simplest of situations: a comparison between two parts of a single whole. As the complexity of the situation increases, I believe that the pie chart quickly begins to struggle to compete.


Side-by-side comparison: two-slice pie charts


For more complex situations, first let's look at how well the two-slice pie chart fares when side by side comparisons are required:





If we want to look at the UK referendum results within the different UK countries, we can compare two side by side as in the above. I think this works as an illustration.

What about comparing three countries?





Personally, I am having to do a bit of visual gymnastics at this point. I can still read the information and make the comparisons, but it is becoming harder, with my eyes having to dart backwards and forwards between different parts of the illustration.

What about comparing four countries?





Well, this arrangement is definitely hard work for me. It would probably be better if the four pies were arranged in a square, something more like:




Even adjusting the arrangement, I find this illustration far from ideal.

How does it compare with an alternative such as the stacked percentage bar chart?



In my opinion the stacked bar chart is much more effective than four side by side two-slice pie charts. It is also much quicker to set up in Excel as it is all contained within a single chart - the only complicating aspect being my choice to add a line at the 50% mark. The pie chart approach requires setting up four individual graphs and then aligning them - or their snapshot images. Subtle issues with sizing and positioning would need to be dealt with if I was using this approach "for real".




So, in conclusion, comparing two side-by-side two-slice pie charts works ok; comparing three is less good and may be done better using other approaches; comparing four is definitely less effective than alternatives such as the stacked percentage bar chart.



The three-slice pie chart



The 1801 example from William Playfair is a three-slice pie chart. Here it is again with a quick re-rendering in Excel 2016:




























The graph (which is actually part of a larger more complex illustration) shows the proportional split of the then Turkish Empire into three parts (African, European and Asian). I think that this particular three-slice pie chart works. But many three-slice pie charts will not work so well. It is a combination of features that work in favour of the Playfair chart: the "European" slice is pretty much an exact quarter and the sizes of the three slices are not similar to each other. Playfair was blissfully unaware of the conventions that Excel and generations of statisticians would impose upon his idea: by default these would tend to arrange the slices clockwise in order of decreasing size.








In my opinion the clockwise "default" version is slightly less effective than Playfair's original, due largely to the fact that it loses the near horizontal "3 o'clock" line. The chart still works because the three segments are quite different in size.

When the segments are of similar size, things get harder:
















How easy is it to tell the relative sizes of the three slices in the adjusted graph above? I think it is not easy. We would have to add the actual numbers (121%,120%,119%). How does this compare to alternatives, such as a column chart?

























Well the column chart is not particularly easy either in this situation (the insipid colours do not help), but it is possible to make out the relative sizes of the three constituents. So this type of chart does seem more effective at dealing with the same fine margins (1%).




Unlike the three-slice pie chart, which fails, the two-slice pie chart can also handle differences of around 1%:






















In conclusion, there are some situations in which a three-slice pie will work and some in which it will not work. These depend largely upon the data. This means that if a three-slice pie is used in a dashboard, it becomes unpredictable whether it will be effective or not as the data is refreshed. This would tend to make it an unreliable choice.


The verdict on simple pie charts?

I've tried to look at the merits of the pie chart afresh, ignoring the various views expressed by others. My conclusion is that there are some cases - simple two-slice pie charts and side-by-side comparisons of two two-slice pie charts - where it is an effective option. There are situations - such as three-slice pie charts - where it can be effective, but it can also be ineffective, depending upon the actual data used. This makes a three- (or more) slice pie chart a risky choice in a dynamic dashboard. In one-off reports with static data, three- or more slice pie charts may be an effective option but a judgement would need to be made on a case-by-case basis. As a rule, I think I would avoid anything over three slices and I would avoid side-by-side comparisons of more than two pie charts.


Finally, the whistles and bells: pie chart variants

Apart from the "standard" pie chart, Excel offers a number of additional types and features. How effective are these and when should they be used (or avoided)?


(a) Pie of pie


One segment is subdivided into a second pie chart. This seems to be a messy compromise acknowledging that a pie chart struggles to be effective when the number of slices increases. But attempting to "solve" this by spilling over into a second pie chart produces a confusing overall pattern.

(b) Bar of Pie


One segment is subdivided in a second chart, but as a bar not as a second pie chart. This suffers from many of the same deficiencies as the pie of pie.



(c) 3D pie


A “pleasing” 3 dimensional view. These 3D charts can be made aesthetically very pleasing but distort the relative sizes of the slices, making the data even harder to read than from a 2D pie chart. These charts may therefore appeal to those making sales presentations but tend to be abhorred by analysts. In the end, the question is what the purpose of the chart is in its context.


(d) Doughnut (or “Donut”)


This is like a pie chart but with the middle removed. Doughnut charts are very popular in commercial dashboard software. They have most of the same strengths and weaknesses as a regular pie chart, with the added difficulty that the areas are even harder to assess. So unless the objective is to save on printer ink, a straight swap of a doughnut chart for a pie chart offers little advantage.

The doughnut chart can be set up in two different ways: like a pie chart with a hole in it or like an "onion".


[1] See I.Spence. No Humble Pie: The Origins and Usage of a Statistical Chart. Journal of Educational and Behavioral Statistics. Winter 2005, Vol. 30, No. 4, pp. 353–368


[2] W.C. Brinton in the book Graphic methods for presenting facts. 1914. New York.

[3] For a selection:

https://www.stevefenton.co.uk/2009/04/pie-charts-are-bad/
http://kosara.net/publications/Skau-EuroVis-2016.html
http://www.businessinsider.com/pie-charts-are-the-worst-2013-6?IR=T
http://www.storytellingwithdata.com/blog/2011/07/death-to-pie-charts
https://www.quora.com/How-and-why-are-pie-charts-considered-evil-by-data-visualization-experts
https://www.geckoboard.com/blog/pie-charts/#.WMr7svnyjtQ

https://twitter.com/EdwardTufte

 



Sunday, 12 March 2017

Using a pie chart to show abundance over 12 months

When I'm not sitting at a computer trying to figure out how to get Excel to do things, I like to take wildlife photographs. I've just bought another old bird book from a charity shop: Jim Flegg's Field Guide to the Birds of Britain and Europe (New Holland, 1990). Amongst other good things, it has an interesting type of chart in it (see the right hand side of the scan sample below):
















A simple three scale system is used to indicate, for each month of the year, whether the bird is 

  • not likely to be seen  
  • fairly likely or 
  • highly likely

Although the chart looks superficially like a pie chart, it is more like a choropleth or heatmap. The slices are equal and never change size. They represent the twelve months of the year, running clockwise. The scale is a single-hue progression.


How easy is it to re-create this chart in Excel?

The challenge is to create a "dashboard" version of this chart such as



















in which a bird can be selected from a drop-down list with the corresponding data automatically populating the chart.


Concentrating first on how to create the chart itself: It will obviously have to be rendered as an Excel pie chart. The particular issues to resolve are:

  • How to display equal sized slices
  • How to get the month identifier letters displayed inside the slices, rather than a number
  • How to apply the differential slice tones 


The first two are relatively straightforward:

Giving each month an equal numerical value will make the slices the same size. For idiosyncratic personal reasons I have chosen to give each slice the value 30 (because 360 degrees cut into 12 slices would each be 30 degrees) but any equal number will work exactly the same in Excel.

To display the month letter (rather than the number 30) inside each slice, adjust the properties as follows

Use Format Data Labels:

  • Set Label Options to 'Category name' (not to Value, the default)
  • Set Label position to 'Inside end'


The 'tone' of the slices is set by the Fill properties. If you search the internet for ways to control this property you will generally end up with programmatic solutions based on Visual Basic. This is an option.

But there is a way of achieving this without using Visual Basic. It requires setting up a Pie Chart with 36 slices.






















Each month requires three slices. These are then manually formatted to provide the fills for 'None', 'Medium' and 'High'.

The data settings for each month will then determine which slices are displayed:




In the illustration, the values for each month are set in column C.

The values graphed are those calculated in column G. These will be 30 if the slice matches the value selected in column C, or otherwise will be zero.

The zero value slices will in effect be omitted. The formula controlling this in column G is

=IF(H3=VLOOKUP(E3,$B$3:$C$14,2,FALSE),30,0)

The labels are based on column F, which takes the first letter of the month name or, for a zero value slice, is left blank. This is calculated through the following formula in column F

=IF(G3>0,LEFT(E3,1),"")
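For readers who find spreadsheet logic easier to follow as code, the two formulas can be mirrored in a short Python sketch. This is an illustration of the same idea rather than part of the Excel solution: the function name `build_slices` and the sample abundance data are my own inventions, and the data does not describe any real bird in the book.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
LEVELS = ["None", "Medium", "High"]  # the three pre-formatted fills per month


def build_slices(month_levels, slice_value=30):
    """Build the 36 rows: three candidate slices per month, of which only
    the one matching the month's level gets a value and a label."""
    rows = []
    for month in MONTHS:
        for level in LEVELS:
            # Like column G: 30 if this slice matches the month's level, else 0
            value = slice_value if month_levels[month] == level else 0
            # Like column F: first letter of the month for visible slices, else blank
            label = month[0] if value > 0 else ""
            rows.append((month, level, value, label))
    return rows


# Made-up data for an imaginary summer visitor (hypothetical, for illustration)
example = dict(zip(MONTHS, ["None"] * 3 + ["Medium"] + ["High"] * 4
                   + ["Medium"] + ["None"] * 3))
rows = build_slices(example)
visible = [row for row in rows if row[2] > 0]
print(len(rows), "slices in total,", len(visible), "visible")
print("".join(row[3] for row in rows))
```

Exactly one slice per month ends up with a non-zero value, so the visible slices always fill the whole circle in twelve equal parts.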

That is all there is to it. To use it in a "real" dashboard, we would put the graph in a separate area (or sheet) from the calculations and then link the selected values to a data table of all birds.











Thursday, 16 February 2017

The Gestalt Laws of Perception



Data visualisation "gurus" promote the importance of keeping things simple. They remind us that our primary purpose is to communicate. They deride the use of flashy, over complicated devices which are intended to impress. Then, with no sense of irony, many devote sections of their books to "The Gestalt Laws of Perception" (or variant phrasings of this).


I wasn't sure what to make of this at first. It reminded me a bit of advertisements which use the device of a man in a white coat to imply that the effectiveness of a product is scientifically proven.

So what relevance does "Gestalt" have to the design of business intelligence dashboards?

Firstly "Gestalt" itself is a branch of Psychology. It is usually said to have been founded by Max Wertheimer (1880-1943), Wolfgang Kohler (1887-1967) and Kurt Koffka (1886-1941) although earlier influences are also cited.

The word "gestalt" is German, translating roughly as "shape" or "form" or "pattern". Gestalt Psychology is interested in how an overall whole is perceived. A therapeutic approach has been developed out of this. But it is the insights into the cognitive processes of perception that interest us for design purposes.

The recognition of the "phi phenomenon" (by Max Wertheimer around 1912) was an early element in the development of Gestalt ideas.

Switching a series of static lights on and off in succession can create the appearance of movement.

The animated gif image illustrates this. Sixteen individual "lights" have been set to go on and off in sequence. Our brains interpret the pattern as a single dot moving in a circle. 
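The sequencing behind such an animation can be sketched in a few lines of Python. This is an illustrative sketch only and not how the original gif was made: the function names `light_positions` and `lit_index`, the unit radius, and the layout starting at 12 o'clock and running clockwise are all my assumptions.

```python
import math


def light_positions(n=16, radius=1.0):
    """Return (x, y) coordinates of n fixed lights evenly spaced on a circle.
    Light 0 sits at 12 o'clock and the rest follow clockwise (assumed layout)."""
    positions = []
    for i in range(n):
        angle = 2 * math.pi * i / n
        positions.append((radius * math.sin(angle), radius * math.cos(angle)))
    return positions


def lit_index(frame, n=16):
    """Return which single light is switched on in a given animation frame."""
    return frame % n


lights = light_positions()
for frame in range(4):
    x, y = lights[lit_index(frame)]
    print(f"frame {frame}: light {lit_index(frame)} is on at ({x:.2f}, {y:.2f})")
```

Nothing in the data moves: every light has a fixed position, and only the index of the lit one changes from frame to frame. The impression of a travelling dot is supplied entirely by the viewer.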

This is the basis of animation. The zoetrope and other similar 19th century inventions use this effect.

The tagline of Gestalt - a quotation from Kurt Koffka - is that "the whole is other than the sum of the parts" (this is different to the sometimes misquoted "the whole is greater than the sum of its parts", which is a definition of synergy).


The Gestalt Psychologists went on to define a set of "Principles" or "Laws" of Visual Perception. Both these words sound a little grandiose to me, so I prefer to call them "effects". While worrying about words, it is also worth noting that the terminology used to describe the effects varies a bit between different writers. There is even some inconsistency regarding how many "laws" there actually are. None of this really matters. It is the visual effects described that we need to understand. The vocabulary is secondary.


Similarity



This is the effect in which similar things are perceived to be related.

Distinctions can be based on different attributes including colour, intensity, shape, orientation and size.



Proximity


This is the effect in which things which are close together are perceived to be related




Connection



This is the effect in which things that are connected are perceived as being related



Enclosure

This is the effect in which things which are enclosed are perceived as being related. 

This effect is sometimes alternatively called the law or principle of "segregation" or of "common regions".

In the illustration, the enclosure is achieved using a line. It can also be achieved in other ways, for example with a contrasting background tint.


Common fate




This is the effect, also called "synchrony", in which elements moving in the same direction are perceived as being more related than elements that are stationary or that move in different directions.

This effect obviously makes sense in animated figures rather than in static ones.


Experience








This is the effect in which our past experience directs our perceptions. Having had our attention drawn to the same block of 9 squares in the preceding five illustrations, we look to these squares again for a relationship (even though in this case there is nothing different about them).

This effect is not one of the original Gestalt "laws" as such but it is important.

Experience can be that created within our interaction with the dashboard, report or presentation itself. It can also be shaped by context, such as the "house style" of an organisation or the conventions of a culture.

In addition to all of these, our unique individual life experiences will also come into play. We are more likely to recognize something if we have seen it before.




Continuation







This is the effect in which things arranged in a line are perceived as being related.

The addition of two sets of squares either side of the original figure creates the impression of a continuous line through it.



Closure




This is the effect in which we tend to fill in the gaps to complete shapes.

(see K.Koffka, Principles of Gestalt Psychology (1935) p.167)

Stronger examples, often used by other writers, are provided by the triangle illusion first produced by Gaetano Kanizsa in 1955 or the well known panda image used in the more recent WWF logo.
















Simplicity











This is the effect in which we will tend to resolve complex images into the simplest possible forms. For example, the shape in the illustration will most likely be perceived as two overlapping squares.

This effect is a core concept in Gestalt and was originally called the "Law of Prägnanz". It is also referred to as "Good Figure".




Object or background










This is the effect, usually referred to as "figure/ground relationship",  in which we try to resolve things as being either objects or background. In the left hand illustration we probably resolve the image as being a dark grey square (the object) on a paler background.

The second image can be resolved the other way round, i.e. into a large grey square with a small square hole in it. The overall structure of the two illustrations is the same. The fact that the overall page has the same background colour as the little square influences our perception.

This effect is less predictable than the previous ones. There can be what is sometimes called "unstable" resolution in which the perception switches to and fro between two different versions. The work of M.C.Escher provides many examples where this is entertaining when done deliberately. Accidentally creating this effect, however, runs the risk of becoming distracting.


OK....so what?

Most of the "Gestalt" effects described above are very familiar. We understand them intuitively. When written down, or otherwise presented back to us, they can seem like a statement of the obvious. 

Why, then, do we often appear unaware of them when designing our tables, graphs and dashboards?

Probably this is partly because our design decisions can be influenced by other strong factors too, such as the desire to impress (egocentricity), assumptions about how things are supposed to look (conformity), or the default settings in Excel (popularism).

It is partly also because the interplay between the different effects is not so immediately obvious. There is a loose hierarchy based on the relative strength of the effects in combination:


Proximity is generally stronger than Similarity



Connection is generally stronger than Proximity and Similarity


Enclosure is generally stronger than Connection, Proximity and Similarity

Closure can be stronger than Proximity










The individual effects can also be applied in different intensities. This can have an impact on their interplay.

Once you understand the nature and interplay between these effects you can use them to create deliberate visual hierarchies. These will ensure that the information stands out appropriately in your tables, graphs and dashboards.

This understanding will also help avoid situations where the wrong choices result in things being harder to read than they need be.

The various books and articles listed below contain useful examples and further discussion.


So where did I use to go wrong?

When I look back now at some things that I used to do I can see I was often applying some quick formatting - particularly to tables - which was inharmonious. At the time I probably thought I was making the table (whether in Excel or Word) look "smart" and "professional". I allowed the apparent neatness to obscure the fact that I was making it harder work for the reader.

A typical example would have been to apply a "hierarchy" of thick and thin boxes to the entire table something like:
























In terms of the Gestalt effects, I can see that I have used Enclosure - one of the strongest effects - around practically every cell in the table.

With the Gestalt effects in mind, and removing the over-precision in the data, the following is an alternative view of the same table:





























This may not be to everybody's taste - possibly not even to mine - but it does illustrate how some Gestalt effects can be employed.

The "zebra stripes" use Similarity (or arguably a form of Enclosure) to group individual month data together. Some purists would argue against "zebra stripes", saying that the use of white space between the rows can achieve as good an effect. I'm not entirely convinced.

Proximity is used to group the four Expenditure columns together. This is further enhanced by the use of the horizontal line over the sub-headings, which is a form of Connection. The subheadings are further joined to their data through the Continuation effect. Similarity of font size is used to distinguish the three Expenditure subheadings from the Expenditure Total and the other columns.

By toning the base font colour down from black to grey, I can then use black for emphasis. The contrasts allow the "bad" month to stand out clearly (I want to draw attention to this).

The table does not need a box drawn around it as the Closure effect already defines it as a block.


References and further reading

Principles of Gestalt Psychology, K. Koffka, 1935. This is a substantial work by one of the founders of the Gestalt movement. It can be found online. Most dashboard designers will probably skip this one.

Information Visualization. Perception for Design, Second Edition, Colin Ware, Elsevier. 2004. This is a detailed and wide-ranging book. Gestalt laws are covered in the section "Static and Moving Patterns" starting at page 189.

Now You See It: Simple Visualization Techniques for Quantitative Analysis, Stephen Few,  Analytics Press, 2009. A very good book generally. It is more focussed on "pre-attentive attributes" than on Gestalt effects per se.

Show Me the Numbers: Designing Tables and Graphs to Enlighten, Second Edition, Stephen Few, Analytics Press, 2012.
Chapter 5 covers "Visual Perception and Graphical Communication" and the Gestalt principles are discussed explicitly in pages 80-5. 

See also link for a short discussion on visual hierarchies


Information Dashboard Design: Displaying data for at-a-glance monitoring, Second Edition, Stephen Few,  Analytics Press, 2013. 
Chapter 5 covers "Tapping into the Power of Visual Perception" and the Gestalt principles are discussed explicitly in pages 87-91. This is probably the best known of Stephen Few's books and is essential reading for anybody interested in dashboard design.



Dashboards for Excel by Jordan Goldmeier and Purnachandra Duggirala (2015). Apress.
The chapter "What is visual perception and how does it work?" includes a section, pages 36-45 "Our bias towards forms: perception and Gestalt psychology" which expands upon the material covered here. This is a really useful book, which has tips and tricks that would take you years to discover for yourself
https://www.amazon.co.uk/Dashboards-Excel-Jordan-Goldmeier/dp/1430249447

Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic. Kindle Books
"The Gestalt principles of visual perception" are covered in locations 1453 to 1556


Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel (Voices That Matter)  Jorge Camoes. Kindle Books. 2016 
https://www.amazon.co.uk/Data-Work-practices-effective-information-ebook/dp/B01DYIPZF4
Section Two "Visual Perception" covers the "Gestalt Laws" between locations 1453 to 1556


The Functional Art: An introduction to information graphics and visualization (Voices That Matter) by Alberto Cairo. Kindle Books. 2012.

https://www.amazon.co.uk/Functional-Art-introduction-information-visualization-ebook/dp/B0091SXDOM/ref=sr_1_2?s=digital-text
Gestalt is covered in the section 'The Gestalt School of Thought and Pattern Recognition' starting at location 1655


Design Principles: Visual Perception And The Principles Of Gestalt, Steven Bradley, Smashing Magazine. March 28th, 2014 (link