Plotting in SAS

Over the course of the last semester – I’ve had a number of my students ask me for more information on how to plot in SAS. I provide them with the syntax to use for plotting residuals to help them assess the assumptions of the models they are creating – but… I really never get into the details of the SGPLOT syntax.

If you had told me years ago that I would want to use SAS to create plots – I would have told you that you were crazy. I had the SAS/GRAPH book and it was a great resource (heck someone stole it – so it must have been good 😉 ) but there just seemed to be SOOoo many options that you had to learn to take advantage of the SAS/GRAPH features – at that point in my career I just didn’t have the time or the interest to figure it all out!

Fast forward a number of years to today – and ask me again? YES! Let’s create our plots in SAS! Ok, there are still some things you need to learn – but the resources that are now available to you make it so much easier and more fun. So, I’m going to add to that pool of resources.

PROC PLOT

Let’s start with a very basic plot, and YES I’m going to use PROC PLOT to get us started. The data we will use is a small snippet of data from my mink project collected in the 1980s. Fifteen female mink were weighed at 3 months of age and we also measured the bone diameter of their rear leg to assess growth. The data is as follows:

ID   Weight   Length
1    1123     11.7
2    1088     9.9
3    563      8.55
4    813      8.95
5    682      9.15
6    734      11.05
7    928      9
8    840      9.95
9    840      8.25
10   924      9.7
11   1006     9.45
12   896      9.45
13   904      10.85
14   957      8.7
15   924      9.7

First and most important – getting the data into SAS.

Data mink;
input weight bone;
datalines;
1123 11.7
1088 9.9
563 8.55
813 8.95
682 9.15
734 11.05
928 9
840 9.95
840 8.25
924 9.7
924 9.7
1006 9.45
896 9.45
904 10.85
957 8.7
;
Run;
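
Before plotting, it is worth confirming that all 15 observations were read in as expected. A minimal sketch, assuming the mink dataset created above is sitting in your WORK library:

```sas
/* List the observations that were read into the mink dataset */
proc print data=mink;
run;

/* Summary statistics for both variables; N should be 15 for each */
proc means data=mink n mean min max;
  var weight bone;
run;
```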

Alright so now let’s try the old standby PROC PLOT. Since we are working with 2 continuous measures, we will visualize our data as a scatter plot – a bunch of dots on an X-Y axis. Let’s put weight as our Y-variable and bone as our X-variable.

Proc plot data=mink;
plot weight*bone;
Run;
Quit;

For those of you that have used SAS for many years – remember these plots? A=1 obs, B=2 obs, etc…

PROC PLOT of Weight by Bone

What an exciting plot!! These are the first types of plots that we would use in SAS – ok I’m talking a LONG time ago – but note that I can still create them if I so choose. But today – I want MORE information and a prettier plot. So let’s move past PROC PLOT and look at PROC SGPLOT.

PROC SGPLOT

Let’s start by replicating the above plot and taking note of the different syntax we need to use. Goal: a plot with weight on the Y-axis and bone on the X-axis.

Proc sgplot data=mink;
scatter y=weight x=bone;
Run;

First thing to take note of – you no longer have to remember which variable goes first in your code – you simply state which variable is Y and which is X.

The above syntax provides you with this plot:

PROC SGPLOT of Weight by Bone

Quite the improvement right? Now, take a little bit of time to confirm that we are seeing the same information. Yes, I like to double check most of my analyses and visualizations.

Hmm…. now that I’m looking at this plot – I think we need to add a bit of information. Let’s start by adding a title and some units to our axes labels. Please note you could also have added a title to the PROC PLOT plot above.

Proc sgplot data=mink;
title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
scatter y=weight x=bone;
xaxis label = "Bone Diameter (mm)";
yaxis label = "Weight (g)";
Run;

The title is added by using the TITLE statement. Each axis has its own statement: XAXIS for the X-axis and YAXIS for the Y-axis. In this case we wanted to add a label to both, so we name the axis and then list the label. Our new plot looks a bit more informative:

PROC SGPLOT with title and axes labels
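
The XAXIS and YAXIS statements accept more than a label. As a sketch only: the grid lines, tick values and marker symbol below are my own choices for illustration and are not part of the plot shown above.

```sas
proc sgplot data=mink;
  title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
  /* Filled circles for the points (illustrative choice) */
  scatter y=weight x=bone / markerattrs=(symbol=circlefilled);
  /* GRID draws grid lines at the tick marks */
  xaxis label="Bone Diameter (mm)" grid;
  /* VALUES= sets the tick marks; this range is an assumption chosen to cover the weights */
  yaxis label="Weight (g)" values=(500 to 1200 by 100) grid;
run;
```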

So it’s starting to look a bit nicer now. But, let’s take it to the next level. I suspect that there may be a linear relationship between these 2 variables. But before I go creating a regression analysis I want to see it on the plot and then I’ll decide if I want to go back and run the regression analysis. Now – this approach may be different for some of you. More often than not, we will run the regression analysis and THEN create the plot. But, with today’s growing emphasis on data visualization, let’s see the plots first to help us decide whether or not we will conduct the regression.

Alrighty – how do I add a regression line? Yes, yes, yes, some of you are already shaking your heads and saying – I can do this in Excel – right-click Add Trendline and Ta-Dah! All done! Sure you can do that but we can do it really easily here too!! Check this out:

Proc sgplot data=mink;
title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
scatter y=weight x=bone;
reg y=weight x=bone;
xaxis label = "Bone Diameter (mm)";
yaxis label = "Weight (g)";
Run;

By adding one line of code – essentially telling SAS to run the regression with weight as the Y and bone as the X you get this plot:

PROC SGPLOT with the Regression Line added

Looking good right? Alright let’s take it one step further to help me decide whether to run the regression analysis or not. I want to get a sense of how well that line fits my data. Let’s add Confidence Limits around the line (the mean predicted values) AND prediction limits for the individual predicted values. To accomplish this we will add 2 options, CLM and CLI, at the end of our REG statement:

Proc sgplot data=mink;
title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
scatter y=weight x=bone;
reg y=weight x=bone/ cli clm;
xaxis label = "Bone Diameter (mm)";
yaxis label = "Weight (g)";
Run;

PROC SGPLOT With Regression line and CLI, and CLM
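
Because both the SCATTER and the REG statements draw the data points, the markers in the plot above are drawn twice. A possible variation, offered as a sketch: NOMARKERS suppresses the markers drawn by REG, and ALPHA= sets the level used for the limits. The 0.10 (90% limits) is only an example value, not something used elsewhere in this post.

```sas
proc sgplot data=mink;
  title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
  scatter y=weight x=bone;
  /* NOMARKERS: let SCATTER draw the points once; ALPHA=0.10 requests 90% limits (example value) */
  reg y=weight x=bone / cli clm nomarkers alpha=0.10;
  xaxis label="Bone Diameter (mm)";
  yaxis label="Weight (g)";
run;
```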

Now that’s a great looking plot! After reviewing the plot we just created and the data, I’ve decided that I now want to run the regression analysis. Why? Because I’ve decided I want the regression equation. I know I have more data to be added and want to strengthen my argument that bone diameter has a linear relationship with weight of my female mink.

Easiest way to obtain a regression analysis is to use PROC REG. We don’t have any other information in our dataset, all we have are the weights and bone diameters – no pens, blocks, etc… So PROC REG will be fine. I’m sure you’ve all run this type of analysis before, so I won’t dwell on the code too long. Here is the syntax I ran and the results after.

Proc reg data=mink;
model weight = bone;
Run;
Quit;

As I said above – just the base regression analysis. There are plots created with this analysis today, along with the residual analysis which you will use to assess your model fit. I’m only showing you the results from the model.

PROC REG model results
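
If you only want the model tables and not the diagnostic plots, one option is to restrict what PROC REG displays. A sketch, assuming the usual PROC REG ODS table names (ANOVA, FitStatistics and ParameterEstimates):

```sas
/* Show only the ANOVA, fit statistics and parameter estimate tables */
ods select ANOVA FitStatistics ParameterEstimates;

/* PLOTS=NONE turns off the ODS Graphics plots for this run */
proc reg data=mink plots=none;
  model weight = bone;
run;
quit;
```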

So – what can we say about these results? First and foremost, our model does NOT explain a significant amount of the variation in our data (p = 0.1430). Our R-square is low, at 0.1575. Our parameter estimates are not significantly different from 0. The slope has a value of 60.11686, but since it is not significantly different from 0, we cannot distinguish our fitted line from a flat line.

Now go back and review that last plot – does it reflect these results? YES! It does. Right now – I have to scroll back and forth and look at the results from 2 separate analyses. Let’s take this plot just one more step – let’s add the regression line or rather the regression equation to our plot!

To do this we will first need to create a dataset with the results from our regression analysis. We will also tell SAS that we do not want / nor need to see the output from our PROC REG again – this is what the NOPRINT option does.

Proc reg data=mink outest=regdata noprint;
model weight = bone ;
Run;
Quit;

This will create a dataset called regdata and if you look at the contents of this dataset – PROC PRINT works well here – this is what you will see:

PROC REG outest=regdata contents
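
A listing like the one above can be produced with a minimal PROC PRINT step. This sketch assumes the regdata dataset created by the OUTEST= option in the previous step:

```sas
/* Print the parameter estimates saved by OUTEST=; NOOBS hides the Obs column */
proc print data=regdata noobs;
run;
```

The Intercept and bone columns hold the two estimates we need to build the equation.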

Notice that you can build the equation with the information contained in this dataset – which is exactly what we will do next. The goal of this next step is to create the equation and save it into a MACRO variable in SAS. Let’s see how this happens:

data _null_;
set regdata;
call symput('eqn',"Weight="||Intercept||" + "||Bone||"*Bone");
run;

The _null_ dataset is exactly that – no dataset – we need to use the DATA step to create our variable but we don’t need to save it in another dataset – so we use the _null_ dataset.

We tell SAS we need to use the regdata dataset – which we already know contains the information we need to use to create our regression equation.

The next line creates a MACRO variable called eqn. CALL SYMPUT is a DATA step call routine that saves a value into a MACRO variable. If you look at the last part of the syntax line – I suspect you can see what it is doing. The || symbols are the concatenation operator, and the names between them (Intercept and Bone) refer to the columns in the regdata dataset that hold our parameter estimates.

When you run this piece of code – you will see nothing happen. BUT if you look in your log window you should see something like this:

DATA _null_ log window
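
The concatenation above lets SAS convert the numeric estimates to text automatically, which leaves extra blanks in the equation and conversion notes in the log. A possible alternative, offered as a sketch: PUT controls the number of decimals, CATX trims each piece and joins the pieces with a single blank, and CALL SYMPUTX stores the result. The macro variable name eqn2 and the 8.2 format are my own choices for illustration.

```sas
data _null_;
  set regdata;
  /* PUT formats each estimate with 2 decimals; CATX strips blanks and joins with one space */
  call symputx('eqn2', catx(' ', 'Weight =', put(Intercept, 8.2), '+', put(Bone, 8.2), '* Bone'));
run;

/* Write the stored equation to the log to check it */
%put Equation: &eqn2;
```

If you prefer this version, use &eqn2 in place of &eqn in the plots that follow.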

So all this work and why again? Oh right! I want to see the regression equation added to my PROC SGPLOT. So, now we know how to make the plot, we have this eqn macro variable hanging out in SAS waiting to be used – so… let’s put it all together and see what happens:

proc sgplot data=mink;
title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
reg y=weight x=bone/ cli clm;
xaxis label = "Bone Diameter (mm)";
yaxis label = "Weight (g)";
footnote1 j=l "Regression Equation";
footnote2 j=l "&eqn";
run;

In this example we chose to use the footnote feature of SAS. We’ve added 2 footnotes: the first has the title Regression Equation; and the second lists the contents of the MACRO variable eqn we created above. The results are:

PROC SGPLOT with Regression Equation

Pretty cool eh???

One more option to try out! Rather than using the footnotes, you can use the INSET statement to add the regression equation and specify where you want it placed. Try this one out:

footnote; /* clear the footnotes left over from the previous plot */
proc sgplot data=mink;
title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
reg y=weight x=bone/ cli clm;
xaxis label = "Bone Diameter (mm)";
yaxis label = "Weight (g)";
inset "&eqn" / position=bottomright;
run;

This program yields this plot:

PROC SGPLOT with Regression Equation using INSET
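
The INSET statement takes a few more options if you want to dress up the box. As a sketch only: the position, border and heading below are my own choices, and the R-Square value is simply typed in by hand from the PROC REG results shown earlier.

```sas
proc sgplot data=mink;
  title "Mink: Relationship between Weight and Bone Diameter at 3 months of age";
  reg y=weight x=bone / cli clm;
  xaxis label="Bone Diameter (mm)";
  yaxis label="Weight (g)";
  /* Two text lines in a bordered box with a heading; R-Square is hard-coded from the PROC REG output */
  inset "&eqn" "R-Square = 0.1575" / position=topleft border title="Regression Equation";
run;
```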

Wow! I’m really enjoying the options that are available in SAS to create plots. Very straightforward too! So, next time you need to create a scatter plot and regression line, rather than taking it back to Excel – because we know many of you do this – try it out in SAS!

One final note: SAS Studio users – I just confirmed that ALL of this coding will work there as well! Enjoy!!

2 thoughts on “Plotting in SAS”

  1. Thanks for your superb sharing. Just a small addition;
    you need to change ‘eqn’ with “eqn”
    thanks again

    1. Good day and thank-you for your comment. Either set of quotes will work in this code. For consistency I should have used the " " – but this is also a great demo on how both sets work. Thanks!
