Data Visualization I – Creating Tables

Just imagine, you’ve been collecting data for your project for a couple of months or a couple of years.  Your supervisor has just stopped by to visit and wants to see what you’ve been doing!  What do you do?  How do you show them what you’ve been up to?  Data visualization beyond your statistical analyses may be the answer.

This post will be the first in a series of 3 discussing data visualization using SAS.

What is the purpose of Data Visualization?

  • Analysis
    • discover a meaning in your data
  • Communication
    • deliver a message with your data
  • Monitoring
    • monitor a process
  • Planning
    • predictions

Best way to visualize your data?

Will depend on the message you want to convey.  The popular methods of visualization are tables and graphs.  A quick comparison between them:

Tables interact primarily with our verbal abilities.  We tend to read a table.  We read the information in rows or columns.  VERBAL

Graphs are visual representations.  We see patterns and/or relationships between aspects in a graph.  VISUAL

Neither method of displaying your data is better than the other.  Each has its own merits and will excel for one form of communication.  You need to decide which is the best method to convey your message.

When should you use a table?

  • Look up individual values
  • Compare pairs of related values
  • Need precision
  • Multiple sets of values in different measures
  • Show summary and detailed information

Types of Tables

1.  Unidirectional

When categorical items are laid out in one direction.  Either across rows or down columns.

For this example, we will be using a sample dataset that SAS Documentation has available:

data jobclass;
  input Gender Occupation @@;
  datalines;
1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 3 1 3 1 3 1 3 1 3 1 3 1 3
1 1 1 1 1 1 1 2 1 2 1 2 1 2
1 2 1 2 1 3 1 3 1 4 1 4 1 4
1 4 1 4 1 4 1 1 1 1 1 1 1 1
1 1 1 2 1 2 1 2 1 2 1 2 1 2
1 2 1 3 1 3 1 3 1 3 1 4 1 4
1 4 1 4 1 4 1 1 1 3 2 1 2 1
2 1 2 1 2 1 2 1 2 1 2 2 2 2
2 2 2 2 2 2 2 3 2 3 2 3 2 4
2 4 2 4 2 4 2 4 2 4 2 1 2 3
2 3 2 3 2 3 2 3 2 4 2 4 2 4
2 4 2 4 2 1 2 1 2 1 2 1 2 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 3 2 3 2 4 2 4 2 4 2 1 2 1
2 1 2 1 2 1 2 2 2 2 2 2 2 3
2 3 2 3 2 3 2 4
;
Run;

proc format;
    value gendfmt 1=’Female’
                              2=’Male’
                              other=’*** Data Entry Error ***’;
    value occupfmt 1=’Technical’
                                2=’Manager/Supervisor’
                                3=’Clerical’
                                4=’Administrative’
                                other=’*** Data Entry Error ***’;
run;

To create a table we will use the TABULATE PROCedure:

proc tabulate data=jobclass format=8.2;
    class occupation;
    table (occupation=’Job Class’ all=’All Jobs’),
              (n=’Number of employees’*f=9.)/ rts=50;
    format occupation occupfmt.;
    title ‘Job Classes’;
run;

Proc tabulate <- calls up the TABULATE procedure – we specify the name of the dataset and we also specify the format of the cells in our table.

Class statement – just like any other procedure in SAS, variables listed in the Class statement tell us which group observations fall into.  Another way of looking at this – we are listing our categorical variables.

Table – this is where we create the table.  Notice that we are creating rows,columns.  In this example we have the occupation as the row and N as the column.  The row and column content are enclosed in ().  This is done so that we can add additional information, the label for the row and the column = occupation=’Job Class’ and n=’Number of employees’.  We are also specifying in the occupation that we want to see the total – all and we’ve added a label for that as well ‘All jobs’.  the RTS option is the number of spaces we need or want for the headings – so 50 spaces in our example.

Format – applying the formats/labels we created in the Proc Format above.

Title – adding a title to the table

 2.  Bidirectional table

A bidirectional table is one where you have categories across the rows as well as down the columns.

We will build on our example above.

proc tabulate data=jobclass format=8.2;
    class gender occupation;
    table (occupation=’Job Class’ all=’All Jobs’)
              *(n=’Number of employees’*f=9.
              pctn<gender all>=’Percent of row total’
              pctn<occupation all>=’Percent of column total’
              pctn=’Percent of total’),
              gender=’Gender’ all=’All Employees’/ rts=50;
    format gender gendfmt. occupation occupfmt.;
    title ‘Gender Distribution’;
    title2 ‘within Job Classes’;
run;

Let’s concentrate on the table statement as this is where all the action is🙂

If we pull it apart we see that there are 4 row items:

  • number of employees
  • Percent of row total
  • Percent of column total
  • Percent of total

These all appear before the “,”

Now we have gender as our column

If we look at the rows (everything that appears before the , ) you’ll see that we now have an * in the mix.  What this does is says we want several pieces of information for each of the jobs listed in the variable called occupation.  N is self-explanatory.  However we are now asking for 3 different percentages.  The pctn(percentage) of each gender or all the levels of gender.  The pctn(percentage) of each level of occupation or each job, and then the final overall percentage.

 Tips for creating easy-to-read tables – table design

  1. Use white space whenever you can or use subtle fill colours
  2. Avoid grid lines whenever possible – they clutter up the table
  3. Columns or rows?
  4. Groups and breaks – white space and make it noticeable
  5. Hierarchy data – left to right
  6. If there’s a sequence to your values – ensure that they are in order
  7. Keep your text horizontal and left to right
  8. Align your numbers to the right
  9. Align your text to the left
  10. Truncate your data – are 8 0’s really necessary?
  11. Use a legible font

There are many more tips available in Stephen Few’s Show Me The Numbers book

Source:  Show Me The Numbers by Stephen Few, 2012

Screen Shot 2013-11-18 at 7.33.07 PM