Manipulating data in the SAS datastep. Part 1: Functions

(You can download a zip file containing the SAS programs used in this posting by clicking the “Give us your feedback” link at the bottom of this page.)

In the previous session (Using Arrays in SAS) we looked at two powerful datastep tools for performing repetitive calculations on variables: Arrays and Loops.

In this session we’ll extend our knowledge of the SAS datastep by looking at a few of the many SAS functions that, in combination with Arrays and Loops, help you accomplish common tasks when working with data.

The syntax for a SAS function is:

target_variable = FUNCTION(argument_variable)  ;

where:

  • target_variable is the variable that receives the value returned by the function.
  • FUNCTION is the name of the SAS function.
  • argument_variable is the argument passed to the function; it can be a variable, a constant, or an expression. Many functions take more than one argument, and multiple arguments are separated by commas.

For example:

Y = SQRT(X) ;

This statement assigns the square root of the variable X to the variable Y. The function here is SQRT.

Here’s a program where we use the SQRT function.

/* using the SQRT function to calculate the square root of a variable */
data example;
input x ;
y = sqrt(x) ;
datalines ;
9
16
144
;
proc print data= example;
run;
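Many functions take more than one argument. Here is a minimal sketch (the dataset SCORES and the variables q1-q3 are illustrative, not taken from the workshop programs) showing comma-separated arguments, the OF keyword with a variable list, and a function applied to every element of an array:

```sas
/* Functions with more than one argument.
   SCORES and q1-q3 are illustrative names only. */
data scores;
   input q1 q2 q3;
   array quiz{3} q1-q3;                /* groups q1-q3 under one array name      */
   avg     = mean(q1, q2, q3);         /* arguments separated by commas          */
   avg_of  = mean(of q1-q3);           /* OF plus a variable list: same value    */
   avg_rnd = round(avg, 0.1);          /* ROUND(value, rounding unit)            */
   total   = sum(of quiz[*]);          /* OF array[*] passes every element       */
   datalines;
7 8 10
5 . 9
;
run;

proc print data=scores;
run;
```

MEAN and SUM ignore missing values, so the second observation gets avg = 7 and total = 14; the arithmetic expression (q1 + q2 + q3) / 3 would instead be missing for that observation.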

This page from SAS Institute lists the full range of functions available.

Our examples come from SAS books written by Dr. Ron Cody. Note that the example programs and datasets from many of his books are available for free download as zip files (see the Example code and data link underneath many of the books on the right side of the page).

For this session we’ll look at specific examples from Dr. Cody’s book SAS Functions By Example. The full set of examples from the book is available here, and we’ll use that page as our handout in the workshop.

In the workshop:

First, we’ll review the PUT statement, which we can use to debug our SAS datasteps. See the previous post (Using Arrays in SAS) for a refresher on the PUT statement. The example below includes a PUT statement that helps us see what’s going on in the datastep as SAS processes each observation.
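As a quick refresher, here is a minimal sketch of two PUT shortcuts that are handy for debugging: var= writes a variable's name together with its value, and _ALL_ writes every variable in the program data vector, including the automatic variables _N_ and _ERROR_. The dataset name debug_demo is illustrative.

```sas
/* PUT shortcuts for debugging; debug_demo is an illustrative name */
data debug_demo;
   input x;
   y = sqrt(x);
   put _n_= x= y=;   /* writes name=value pairs to the log, one line per observation */
   put _all_;        /* writes every variable, including _N_ and _ERROR_ */
   datalines;
9
16
;
run;
```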

Next, we’ll look at how to invoke functions in a SAS datastep program with an example of the LAG function, which lets you reference values from earlier observations (a very hard thing to do without the LAG function, by the way!). In this example we’ll use LAG and LAG2 to calculate a three-observation moving average of price.

/* Using the LAG function to compute a moving average */
data moving;
input price;
date = _n_ ;
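last = lag(price);       /* price from the previous observation */
twoback = lag2(price);   /* price from two observations back */
/* LAG and LAG2 run on every observation; only the average is conditional */
if _n_ ge 3 then moving = mean(price, last, twoback);

/* Using the PUT statement to follow what's going on in the Data Step */
put 'Observation: ' _n_ ' Date: ' date ' Price: ' price ' Last: ' last ' Two Back: '
twoback ' Moving: ' moving;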

datalines;
11
8
17
21
15
18
14
;
run;

proc print data = moving;
run;

proc sgplot  data = moving;
series x=date y=price;
series x=date y=moving;
run;
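One detail worth knowing about LAG: each LAG call keeps its own queue of values, and every time that call executes it returns the oldest value in its queue and stores the current one. That is why the program above runs LAG on every observation and makes only the moving average conditional. This sketch (dataset and variable names are illustrative) shows what happens when LAG itself sits inside an IF:

```sas
/* A conditional LAG returns the value from the last time that LAG
   call executed, not necessarily the previous observation.
   lag_demo, prev_always and prev_even are illustrative names. */
data lag_demo;
   input price;
   prev_always = lag(price);        /* executed on every observation  */
   if mod(_n_, 2) = 0 then
      prev_even = lag(price);       /* executed only on even _N_      */
   datalines;
11
8
17
21
;
run;

proc print data=lag_demo;
run;
```

On the 4th observation prev_always is 17 (the previous price), but prev_even is 8: the price stored the last time the conditional LAG ran, on observation 2.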

Third, if time permits, we’ll look at these examples:

(The program numbers in the list below refer to the programs on Dr. Cody’s examples page here. The programs below are also included in the zip file you can download for this workshop by clicking the Give us your feedback link below.)

  • Program 1.42, illustrating the SCAN function.
  • Program 1.46, illustrating the TRANSLATE function.
  • Program 7.7, illustrating the LARGEST function.
  • Program 9.5, illustrating the RANUNI function.
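For a quick preview of what the four functions listed above do, here is a short illustrative sketch. It is not one of Dr. Cody's programs, and the dataset and variable names are hypothetical. It also shows RAND('UNIFORM') next to RANUNI, since SAS documentation points to the RAND function as the preferred random number generator for new code.

```sas
/* Illustrative sketches of SCAN, TRANSLATE, LARGEST and RANUNI.
   func_demo and all variable names are hypothetical. */
data func_demo;
   length name $ 20 first last $ 10 answers coded $ 5;

   /* SCAN(string, n, delimiters): the nth word; a negative n counts from the right */
   name  = 'Ron Cody';
   first = scan(name, 1, ' ');                    /* Ron  */
   last  = scan(name, -1, ' ');                   /* Cody */

   /* TRANSLATE(source, to, from): the TO characters come before FROM */
   answers = 'ABCDA';
   coded   = translate(answers, '1234', 'ABCD');  /* 12341 */

   /* LARGEST(k, value-1, value-2, ...): the kth largest nonmissing value */
   array score{4} s1-s4 (5 9 . 7);
   second = largest(2, of score[*]);              /* 7; the missing value is ignored */

   /* RANUNI(seed): a uniform random number between 0 and 1 */
   u_old = ranuni(12345);

   /* the newer approach: seed once with CALL STREAMINIT, then call RAND */
   call streaminit(12345);
   u_new = rand('uniform');
run;

proc print data=func_demo;
run;
```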

Give us your feedback. You can download the SAS program file that contains all the examples used in this posting by clicking this link.
