Data Visualization III – Creating Plots

When should you use a graph?

  • To show relationships among and between sets of values by giving them shape
  • When patterns, trends and exceptions are more easily seen than read
  • When a series of values should be seen as a whole

Types of Graphs

  • Bar charts
  • Line graphs

This post deals with line graphs, or plots, as I refer to them. Once again I will rely on a SAS example. I include the complete code here and annotate it as we work through it. The example examines the relationship among three analytes, manganese (Mn), nitrogen (N) and potassium (K), across five years (2008 to 2012).

/* Set the graphics environment */
goptions reset=all cback=white border htext=10pt htitle=12pt;

/* Create data to plot */
data samples;
    input year mn n k;
    datalines;
2008 0.19 45 10.6
2009 0.25 54 9.2
2010 0.52 35 11.0
2011 0.15 48 7.2
2012 0.38 29 8.1
;
run;

/* Create a format to display the data values with percent signs */
proc format;
    picture pctfmt low-high='009.9%';
run;
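A quick way to see what the PCTFMT. picture format does is to apply it with the PUT function and write the results to the log. This is a minimal sketch; the DATA _NULL_ step and the variable names VALUE and FORMATTED are illustrative additions, not part of the original example. The format does not multiply by 100: it shows each value with one decimal place and appends a percent sign, so 45 appears as 45.0%. Because a picture format truncates by default rather than rounding, expect 0.19 to appear as 0.1%; the PICTURE statement's ROUND option (picture pctfmt (round) ...) should change that if rounding is preferred.

```sas
/* Define the same picture format as in the example */
proc format;
    picture pctfmt low-high='009.9%';
run;

/* Format a few of the sample values and write them to the log */
data _null_;
    do value = 0.19, 10.6, 45;
        formatted = put(value, pctfmt.);
        put value= formatted=;
    end;
run;
```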

When you create a plot or a chart in SAS, the code has essentially two parts. The first part sets up the environment: titles, footnotes, legends, axes, symbols and lines, that is, every aspect of the graph's appearance before you add the data.

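The second part of the code is the actual plot or graph; in this example it is produced by PROC GPLOT. Note that the environment settings defined before PROC GPLOT are either applied by default (the titles, footnotes and SYMBOL definitions) or called upon explicitly in the plot statements (the AXIS definitions, through the HAXIS= and VAXIS= options).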
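To make the two-part structure easier to see, here is a stripped-down sketch that plots only the nitrogen values from the example. The data set name NITROGEN and the title text are illustrative choices, not from the original post.

```sas
/* Data: the nitrogen values from the example */
data nitrogen;
    input year n;
    datalines;
2008 45
2009 54
2010 35
2011 48
2012 29
;
run;

/* Part 1: the environment (appearance settings only, no data yet) */
goptions reset=all;               /* clear titles, AXIS and SYMBOL definitions from earlier runs */
title1 "Nitrogen by Year";
axis1 order=(2008 to 2012 by 1);  /* one tick per year */
symbol1 interpol=join value=dot;  /* dots joined by a line */

/* Part 2: the graph, which draws on the settings above */
proc gplot data=nitrogen;
    plot n*year / haxis=axis1;    /* AXIS1 is referenced explicitly; SYMBOL1 applies by default */
run;
quit;
```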


/* Define the title */
title1 "Sample Analysis";

/* The FOOTNOTE statement serves as a legend for the element symbols */
footnote1 height=9pt 'N=Nitrogen K=Potassium Mn=Manganese';

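/* Create the AXIS definitions. AXIS1 is the horizontal (year) axis;   */
/* AXIS2 and AXIS3 are the left and right vertical axes. Their ORDER=  */
/* lists stack three scales on one axis: 0 to 0.6 for Mn, 6 to 12 for  */
/* K and 15 to 60 for N. The values 3 and 13 (ticks 5 and 10) act as   */
/* separators: VALUE= blanks their labels and VREF= draws lines there. */
axis1 order=(2008 to 2012 by 1)
      value=(font='Arial/bold' height=11pt);
axis2 order=(0 to .6 by .2, 3, 6 to 12 by 2, 13, 15 to 60 by 15)
      label=(angle=90 'Concentration')
      value=(tick=5 ' ' tick=10 ' ');
axis3 order=(0 to .6 by .2, 3, 6 to 12 by 2, 13, 15 to 60 by 15)
      value=(tick=5 ' ' tick=10 ' ');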

/* Define the symbol characteristics */

/* SYMBOL1-SYMBOL3 draw a dot at each point */
/* and connect the points with a line. */
symbol1 interpol=join width=2 color=vligb value=dot height=6;
symbol2 interpol=join width=2 color=salmon value=dot height=6;
symbol3 interpol=join width=2 color=vibg value=dot height=6;

/* SYMBOL4-SYMBOL6 are used with the PLOT2 statement and display the */
/* character symbol of the corresponding element for each point */
symbol4 interpol=none value='Mn' font='Arial/bold' color=black height=12pt;
symbol5 interpol=none value='K' font='Arial/bold' color=black height=12pt;
symbol6 interpol=none value='N' font='Arial/bold' color=black height=12pt;

/* Generate the graph */
proc gplot data=samples;
    plot (mn k n)*year / overlay haxis=axis1 vaxis=axis2
                         vref=3 13 cframe=grayee;
    /* PLOT2 repeats the same variables against the right-hand axis (AXIS3) */
    plot2 (mn k n)*year / overlay vaxis=axis3;
    format mn k n pctfmt.;
run;
quit;
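If your SAS release includes ODS Graphics (PROC SGPLOT, available from SAS 9.2), a second vertical axis can be obtained without the split-scale ORDER= lists. This is an alternative technique, not the approach used in this post: a minimal sketch that plots Mn against the right axis with the Y2AXIS option and leaves N and K on the left axis. Because N and K then share one scale, the year-to-year variation in K looks flatter than in the GPLOT version. The axis and legend labels are illustrative.

```sas
/* Same data as the main example */
data samples;
    input year mn n k;
    datalines;
2008 0.19 45 10.6
2009 0.25 54 9.2
2010 0.52 35 11.0
2011 0.15 48 7.2
2012 0.38 29 8.1
;
run;

title1 "Sample Analysis";
proc sgplot data=samples;
    /* N and K share the left (Y) axis */
    series x=year y=n / markers legendlabel="N (Nitrogen)";
    series x=year y=k / markers legendlabel="K (Potassium)";
    /* Mn, with much smaller values, is plotted against the right (Y2) axis */
    series x=year y=mn / markers y2axis legendlabel="Mn (Manganese)";
    xaxis values=(2008 to 2012 by 1) label="Year";
    yaxis label="N and K concentration";
    y2axis label="Mn concentration";
run;
```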


Tips for creating plots or line graphs – points and lines design

  1. If sets of points cannot be clearly distinguished:
    1. enlarge the points
    2. select objects that are more distinct
  2. If points overlap:
    1. enlarge the graph or reduce the size of the points
    2. remove the fill colour
  3. Distinguish the lines with different colours
  4. Trend lines:
    1. if you’re trying to show change over time, use a moving average rather than a straight line
    2. use a linear trend line ONLY if your data show a linear relationship
  5. Reference lines:
    1. use only to mark meaningful thresholds and regions
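Tip 4 suggests a moving average for showing change over time. Below is a minimal sketch of a three-year trailing moving average of the nitrogen values, computed with the LAG functions and overlaid on the raw series with PROC GPLOT. The three-year window, the data sets NITROGEN and NITROGEN_MA and the variable N_MA3 are illustrative assumptions; with only five years of data, the average exists from 2010 onward. For a linear trend line, the SYMBOL statement's INTERPOL=RL option draws a linear regression line instead.

```sas
/* Nitrogen values from the example */
data nitrogen;
    input year n;
    datalines;
2008 45
2009 54
2010 35
2011 48
2012 29
;
run;

/* 3-year trailing moving average of nitrogen */
data nitrogen_ma;
    set nitrogen;
    /* LAG must run on every observation, so call it outside any condition */
    n_lag1 = lag1(n);
    n_lag2 = lag2(n);
    /* The first two years do not have a full 3-year window, so N_MA3 stays missing */
    if _n_ >= 3 then n_ma3 = mean(n, n_lag1, n_lag2);
    drop n_lag1 n_lag2;
run;

goptions reset=all;
symbol1 interpol=join value=dot color=black;                  /* raw values */
symbol2 interpol=join value=none line=2 width=2 color=salmon;  /* moving average, dashed */

proc gplot data=nitrogen_ma;
    plot (n n_ma3)*year / overlay;
run;
quit;
```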

There are many more tips in Stephen Few’s book Show Me the Numbers.

Source: Show Me the Numbers by Stephen Few, 2012
