Area Under the Curve

Calculating the area under the curve (AUC) is common in many fields of research and in many departments here at the University of Guelph. The example we will work through here is one from Food Science. A PDF version of the PowerPoint presentation used for this session can be viewed here.

The presentation explains the process of calculating the AUC (Area Under the Curve), so I will concentrate on the SAS coding here.

Step 1: Enter the data into SAS. The sample data used in this session are adhesiveness values for a food sample, measured at 6 time points. The dataset is called adhesive and has 2 variables: X, the time point at which the measurement was taken, and ADHESIVE, the adhesiveness value at that time point.

PROC PRINT was used to view the data and to make sure they were entered correctly into SAS.
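The data-entry code itself is not reproduced in this post, so here is only a sketch of what Step 1 could look like, assuming the values are typed in with DATALINES. The six adhesiveness values below are hypothetical placeholders, not the measurements used in the session. The first time point is 0 because the `if X = 0` block in Step 3 expects X = 0 on the first row.

```sas
/* Sketch only: the Adhesive values are hypothetical placeholders, */
/* not the measurements used in the session.                        */
data adhesive;
   input X Adhesive;   /* X = time point, Adhesive = measured value */
   datalines;
0 0
1 2
2 3
3 3
4 2
5 1
;
run;

/* Check that the data were read in correctly */
proc print data=adhesive;
run;
```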

Step 2: Create the plot. It's often easier to understand any statistical analysis if you can visualize it. In this example, we plot the data to show the individual trapezoids whose areas SAS will calculate, as explained in the PowerPoint presentation.

SAS graphing features can be tricky and a lot of fun! So let's start with PROC GPLOT and work our way backwards.

Proc gplot data=adhesive;   calls the GPLOT procedure and uses the dataset we created earlier, called adhesive.

plot Adhesive*X Adhesive*X   now tells SAS which plots we want to create. Notice that this line asks SAS to plot Adhesive*X (Y by X), and it asks twice. No, this is not a typo: we want the same data plotted twice. I'll explain why in a moment.

/ overlay frame cvref=blue vminor=0 hminor=0;  These are all options for the plot.

overlay:  put the 2 plots on top of each other
frame:  put a frame or box around the entire graph
cvref=blue: create a horizontal reference line (by default this will be set to 0) and make it blue
vminor and hminor = 0:  We do not want minor ticks on either the vertical or horizontal axes

So, back to the 2 Adhesive*X plots. Notice that before PROC GPLOT is called, we have two SYMBOL statements: symbol1 v=none c=green i=needle; and symbol2 v=dot c=red i=join; Each statement controls one of the plots. SYMBOL1 sets out all the characteristics of the first Adhesive*X plot, so we have:

v=none: value = none, which means we do not want a symbol displayed at each data point
c=green: We want the colour associated with this first plot to be green
i = needle:  we want a needle or a straight vertical line from the x-axis to the datapoint – note that these will be green.

For SYMBOL2

v=dot:  We now want a dot at each datapoint for the second Adhesive*X plot
c=red: We want the dots to be red
i = join:  draw a straight line joining each of the datapoints.
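Putting the pieces described above together, the Step 2 code would look roughly like the following. It is assembled from the same SYMBOL and PLOT statements that appear inside the macro later in this post, with vminor=0 and hminor=0 as in the option list. It assumes SAS/GRAPH is available and that the adhesive dataset from Step 1 already exists in the WORK library.

```sas
/* First Adhesive*X plot: green needles, no marker at the data point */
symbol1 v=none c=green i=needle;
/* Second Adhesive*X plot: red dots joined by straight lines */
symbol2 v=dot c=red i=join;

proc gplot data=adhesive;
   /* The same Y*X pair twice: one copy for each SYMBOL definition */
   plot Adhesive*X Adhesive*X / overlay
                                frame
                                cvref=blue
                                vminor=0 hminor=0;
run;
quit;
```

Because OVERLAY draws both requests on one set of axes, the needles mark the vertical sides of each trapezoid and the joined red line marks the top edges.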

Now take a few moments to review the resulting graph.  Can you find all the features listed above?

Step 3: Now the fun part – the calculation piece.

Remember that whenever we create or change variables in SAS, we need to work within a DATA step.

Data adhesive (keep= X Adhesive TrapezoidAdhesive SumTrapezoidAdhesive);
   set adhesive;
   lagtime=lag(X);
   lagvalue=lag(Adhesive);
   if X = 0 then do;
      lagtime = 0;
      lagvalue = 0;
   end;
   TrapezoidAdhesive = (X - lagtime) * (Adhesive + Lagvalue)/2;
   SumTrapezoidAdhesive + TrapezoidAdhesive;
Run;

Data adhesive (keep= X Adhesive TrapezoidAdhesive SumTrapezoidAdhesive);  Remember that a DATA step starts by naming the dataset it will save. Here we give the revised dataset the same name, adhesive, so it replaces the original, and we keep only the 4 variables listed in the KEEP= option.

   set adhesive;  We already read the data in, so SET tells SAS to read its observations from the existing dataset called adhesive.

   lagtime=lag(X);
   lagvalue=lag(Adhesive);  In a previous SASsyFridays session we talked about the lag() function. We are creating a new variable called lagtime, which holds the value of X from the previous observation that SAS read in. lagvalue does the same for Adhesive.

   if X = 0 then do;
      lagtime = 0;
      lagvalue = 0;
   end;    This piece of code only matters for the first observation. On the first row there is no previous observation, so lag() returns a missing value for both lagtime and lagvalue, and SAS would write a note to the log about missing values being generated when it does the calculation. Because our first time point is X = 0, we give lagtime and lagvalue starting values of 0 on that row, which makes the area for the first row 0.

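   TrapezoidAdhesive = (X - lagtime) * (Adhesive + Lagvalue)/2;  Here is the calculation for the trapezoid: its width is the distance between the two time points (X - lagtime), multiplied by the average of the two adhesiveness values.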
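This calculation is the trapezoidal rule. Writing $x_i$ for the time point (X) and $y_i$ for the adhesiveness value (Adhesive) on observation $i$ of $n$, each row contributes the area $A_i$, and the running total on the last row is their sum. The first row (X = 0, with lagtime and lagvalue set to 0) contributes 0, so the sum effectively starts at $i = 2$:

```latex
A_i = (x_i - x_{i-1})\,\frac{y_i + y_{i-1}}{2}, \qquad
\mathrm{AUC} = \sum_{i=2}^{n} A_i
```

Here (X - lagtime) is $x_i - x_{i-1}$ and (Adhesive + Lagvalue)/2 is the average height $(y_i + y_{i-1})/2$.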

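   SumTrapezoidAdhesive + TrapezoidAdhesive;    Calculating the sum of all the parts. This is a SAS sum statement: SumTrapezoidAdhesive starts at 0 and is automatically retained from one observation to the next, so it holds a running total, and its value on the last row is the AUC.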
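The `if X = 0` block relies on the first time point being 0. If your data start at a later time, that block never fires: the first trapezoid is missing and the log shows a missing-values note (the running total still comes out the same, because the sum statement ignores missing values). A hedged alternative, not the method used in this session, is to test for the first observation with the automatic variable _N_ and to use the DIF function, which returns the current value minus its LAG value. The dataset late_start and the variable names width and prevvalue below are hypothetical. Like the original, this measures the area from the first time point onward and adds nothing between 0 and the first time point.

```sas
/* Hypothetical data whose first time point is 1, not 0 */
data late_start;
   input X Adhesive;
   datalines;
1 4
2 6
3 5
;
run;

data late_auc (drop= width prevvalue);
   set late_start;
   /* Call DIF and LAG on every row so their queues stay in step */
   width = dif(X);              /* X minus the previous X          */
   prevvalue = lag(Adhesive);   /* Adhesive from the previous row  */
   if _n_ = 1 then TrapezoidAdhesive = 0;   /* no earlier point    */
   else TrapezoidAdhesive = width * (Adhesive + prevvalue) / 2;
   SumTrapezoidAdhesive + TrapezoidAdhesive;  /* running total     */
run;

proc print data=late_auc;
run;
```

For these three rows the trapezoids are 0, 5 and 5.5, so the running total ends at 10.5.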

Then we finish the DATA step with a Run; and print the results with PROC PRINT. You will notice that one column shows the area for each trapezoid, while the last column is the running total.
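One quick way to convince yourself that the DATA step does the right thing is to run it on a case where the answer is known. In this hypothetical check (the dataset name line_check is an assumption, not from the session), Adhesive rises in a straight line, Adhesive = X, from X = 0 to X = 5. The area under that line is a triangle of 5 × 5 / 2 = 12.5, and the trapezoid rule is exact for straight lines, so the last running total should be 12.5.

```sas
/* Hypothetical test data: a straight line, Adhesive = X */
data line_check;
   do X = 0 to 5;
      Adhesive = X;
      output;
   end;
run;

/* Same calculation as Step 3, applied to the test data */
data line_check (keep= X Adhesive TrapezoidAdhesive SumTrapezoidAdhesive);
   set line_check;
   lagtime = lag(X);
   lagvalue = lag(Adhesive);
   if X = 0 then do;
      lagtime = 0;
      lagvalue = 0;
   end;
   TrapezoidAdhesive = (X - lagtime) * (Adhesive + lagvalue) / 2;
   SumTrapezoidAdhesive + TrapezoidAdhesive;
run;

/* The last value of SumTrapezoidAdhesive should be 12.5 */
proc print data=line_check;
run;
```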

Macro

This is a really slick piece of coding, but what if you have several variables you want to use it with? Your options are to copy and paste the code as many times as you have variables, to change the variable name and run it over and over, or to create a macro to do this for you.

We talked about creating macros for repetitive operations in a past blog post, but I wanted to show you how we did it for this code as well.

/* This is the macro code – you run this once to ensure that the macro is stored
in the SAS session that you are running */

%macro Auc(dataset, out, FoodAttribute);   /* The macro is named AUC and needs 3 pieces of information to run: the input dataset name, the name of the output dataset, and the name of the variable being tested */

symbol1 v=none c=green i=needle;
symbol2 v=dot c=red i=join;

title "Results for &FoodAttribute";   /* &FoodAttribute resolves to whatever value you provided when you called the macro; the double quotes let the macro variable resolve */

Proc gplot data=&dataset;    /* &dataset resolves to the dataset name you listed when you called the macro */

plot &FoodAttribute*X &FoodAttribute*X / overlay
frame
cvref=blue
vminor= 0 hminor=0;
Run;
Quit;

Data &out (keep= X &FoodAttribute Trapezoid&FoodAttribute SumTrapezoid&FoodAttribute );
set &dataset;
lagtime = lag(X);
lagvalue = lag(&FoodAttribute);
if X = 0 then do;
lagtime = 0;
lagvalue = 0;
end;
trapezoid&FoodAttribute = (X - lagtime) *(&FoodAttribute + Lagvalue)/2;
SumTrapezoid&FoodAttribute + Trapezoid&FoodAttribute;
Run;

Proc print data=&out;
Run;
%mend;

Notice that all the code we talked about in this blog post is listed inside the macro. We close the macro with the %mend statement.

To run the macro we do the following:

%AUC(sensory, Adhesive_auc, Adhesive);

%AUC tells SAS we are using the macro called AUC (which we just created and ran above). Inside the brackets we supply, in order, the pieces of information it needs to run: the name of the incoming dataset (sensory), the name of the output dataset (Adhesive_auc), and the name of the attribute or variable we are studying (Adhesive).

We have several attributes to run, so here are the calls for the rest; only the output dataset name and the variable name change. To run one, simply highlight the line and click your friendly running dude.

%AUC(sensory, Cohesive_auc, Cohesive);
%AUC(sensory, Dense_auc, Dense);
%AUC(sensory, Grainy_auc, Grainy);
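If the list of attributes keeps growing, even these separate calls can be wrapped in a loop. The sketch below is a hypothetical helper macro, AucAll, that is not part of the original session code. It assumes %AUC has already been compiled in the current SAS session and that the variable names are separated by spaces.

```sas
/* Hypothetical helper: calls %AUC once for each variable in a  */
/* space-separated list. Assumes %AUC (defined above) has been  */
/* compiled in the current SAS session.                         */
%macro AucAll(dataset, varlist);
   %local i var;
   %do i = 1 %to %sysfunc(countw(&varlist, %str( )));
      %let var = %scan(&varlist, &i, %str( ));
      /* The period ends the macro variable name: Adhesive_auc, ... */
      %AUC(&dataset, &var._auc, &var);
   %end;
%mend AucAll;
```

With it, the four calls above would become the single line %AucAll(sensory, Adhesive Cohesive Dense Grainy); which creates Adhesive_auc, Cohesive_auc, Dense_auc and Grainy_auc in turn.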

Conclusion

I would like to sincerely thank Madhu Sharma for bringing this question to me and for being willing to put together the presentation for the SASsyFridays blog.

