Data Visualization II – Creating Bar Charts

Let’s continue on our data visualization journey.  To recap briefly, a table would be used in the following situations:

  • When you want the reader to look up individual numbers
  • When you want the reader to compare pairs of related values
  • When you need to show precision
  • When you need to show multiple sets of values in different measures
  • To show summary and detailed information at the same time

However, there are other situations where a table may not be the best way to show your data.

What is the best way to visualize your data?

It depends on the message you want to convey.  The popular methods of visualization are tables and graphs.  A quick comparison between them:

Tables interact primarily with our verbal abilities.  We tend to read a table.  We read the information in rows or columns.  VERBAL

Graphs are visual representations.  We see patterns and/or relationships between aspects in a graph.  VISUAL

Neither method of displaying your data is better than the other.  Each has its own merits and excels at a particular form of communication.  You need to decide which is the best method to convey your message.

When should you use a graph?

  • To show relationships among and between sets of values by giving them shape
  • When patterns, trends and exceptions are more easily seen rather than read
  • When a series of values should be seen as a whole

Types of Graphs

  • Bar charts
  • Line graphs

If you think about the graphs you have come across over the years, the majority of them are built from bars, lines, or a combination of the two.  Yes, there are MANY, MANY more, but for now we will concentrate on bar charts and line graphs.

Bar Charts

For this example, let’s use the following data:
data fitness;
    input age sex $ heart;
    datalines;
28 M 86
41 M 76
30 M 78
29 M 54
35 F 65
38 F 66
27 F 84
45 F 70
32 M 71
25 M 53
50 F 60
32 F 69
31 M 76
41 M 56
29 M 71
29 F 59
43 F 60
32 M 54
21 M 81
40 M 79
32 F 71
27 F 55
46 M 67
35 F 49
;
run;

Let’s start with a basic bar chart to show us how many males and females are in this dataset.

Proc gchart data=fitness;
    vbar sex;    /* vertical bars, one for each value of sex */
run;

The result is a bar chart with 2 bars, one for males and one for females, showing the frequency of each.

Let’s change it up and recreate the same graph using horizontal bars.

Proc gchart data=fitness;
    hbar sex;    /* horizontal bars, one for each value of sex */
run;

We have the same graph on its side.  However, we now have additional information: the frequency, cumulative frequency, percent, and cumulative percent.  These are the same statistics we obtain when we run a Proc Freq.
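
To see that for yourself, here is a minimal sketch of the matching Proc Freq step.  It assumes the fitness dataset created above and simply requests a one-way table for sex, which you can compare against the statistics printed beside the horizontal bars.

```sas
/* One-way frequency table for sex: frequency, percent,
   cumulative frequency, and cumulative percent */
proc freq data=fitness;
    tables sex;
run;
```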

Let’s change the chart variable from sex to age and create a graph where the bars represent age.

Proc gchart data=fitness;
    hbar age;    /* age is numeric, so SAS groups it into ranges */
run;

By default SAS will create midpoint values for the continuous variable of age.  If you look at the graph the first thing you’ll ask is: “are all the individuals in our dataset 24, 30, 36, 42, and 48 years of age?”  No!  SAS determined the range of the values then created 5 midpoint values to display.
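
If you really do want one bar for every age that appears in the data, the discrete option is one way to ask for it.  This is only a sketch using the fitness dataset from above; with this many distinct ages the chart becomes long, which is part of why grouping into midpoints is the default for a numeric variable.

```sas
/* discrete: treat each distinct value of age as its own bar
   instead of grouping the values around midpoints */
proc gchart data=fitness;
    hbar age / discrete;
run;
```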

We don’t like these values and want to select our own “midpoints” of 30, 40 and 50.

Proc gchart data=fitness;
    hbar age / midpoints=(30 40 50);    /* setting the midpoints */
run;

These graphs are fine, but ideally what I would like to see is the average heart rate for each age group.

Proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart    /* we want to see heart summarized by the bars */
               type=mean;      /* the summary we want to see is the mean */
run;
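
If you want to check the means behind the bars, one option is to rebuild the same age groups by hand.  The sketch below assumes each bar covers the ages closest to its midpoint, so the cut points are 35 and 45, and it assumes ages that land exactly on a cut point belong to the higher group.  The format name agegrp is made up for this example.  Compare the N column with the frequencies on the chart to confirm the grouping matches before trusting the means.

```sas
/* Hypothetical format that mimics midpoints=(30 40 50).
   Assumption: boundary ages (35, 45) go to the higher group. */
proc format;
    value agegrp low -< 35 = '30'
                 35  -< 45 = '40'
                 45 - high = '50';
run;

/* Count and mean heart rate within each formatted age group */
proc means data=fitness n mean;
    class age;
    format age agegrp.;
    var heart;
run;
```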

The next piece of information we want to add is the error bars.

Proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart
               type=mean
               errorbar=both;    /* type of error bars to draw: both, bars, or top */
run;
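
The error bars drawn here are confidence limits for the mean.  As a sketch, assuming you would rather show 90% limits than the 95% limits used in this post, the clm= option sets the confidence level:

```sas
/* Same chart as above, but with 90% confidence limits.
   The value 90 is only an illustration. */
proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart
               type=mean
               errorbar=both
               clm=90;
run;
```

If you change the level, remember to change any axis label that states it.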

This is our final graph, but we still need to add information about it: the axes, titles, and labels.

title1 "Average Resting Heart Rate by Age";
axis1 label=("Heart Rate" j=c "Error Bar Confidence Limits: 95%");
axis2 label=("Age" j=r "Group");

Proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart
               type=mean
               errorbar=both
               raxis=axis1                    /* links back to the axis1 statement above */
               maxis=axis2                    /* links back to the axis2 statement above */
               noframe                        /* removes the frame around the graph */
               freqlabel="Number in Group"    /* label for the frequencies */
               meanlabel="Mean Heart Rate";   /* label for the means */
run;

What if we wanted to see the bars by sex?  We can do that by adding a group= option.

Proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart
               type=mean
               errorbar=both
               freqlabel="Number in Group"
               meanlabel="Mean Heart Rate"
               group=sex     /* breaks out the bars for males and females */
               space=1       /* space between the age bars within each sex group */
               gspace=5;     /* space between the sex groups */
run;

Tips for creating bar charts – bar component design

  1. Use horizontal bars when the labels won’t fit side by side
  2. If you use a legend, do not use white space to separate your bars
  3. Do not overlap the bars
  4. Avoid fill patterns – use solid colours
  5. Use fill colours that are balanced in intensity – you don’t want to emphasize one bar over another
  6. Always start bars at zero
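
As a rough sketch of tips 4 and 5 applied to the grouped chart from earlier, pattern statements can request solid fills.  The two colour codes below are arbitrary picks meant to be of similar intensity, not values from the original charts, and patternid=group is used so that each sex group gets its own pattern.

```sas
/* Solid fills, one per group; the colours are illustrative only */
pattern1 value=solid color=cx4C72B0;
pattern2 value=solid color=cxDD8452;

proc gchart data=fitness;
    hbar age / midpoints=(30 40 50)
               sumvar=heart
               type=mean
               group=sex
               patternid=group;    /* change pattern by group, not by bar */
run;
```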

There are many more tips available in Stephen Few’s book, Show Me The Numbers.

Source:  Show Me The Numbers by Stephen Few, 2012
