(You can download a zipped file containing the SAS program and dataset used in this example by clicking on the “Give us your feedback” link at the bottom of this posting.)
Key Concept 1: SAS programs have data steps and procedure steps.
/* SAS data step */
data peter ;
   input x y ;
   datalines ;
17 15
22 8
;
run ;
/* SAS procedure step */
proc print data = peter ;
run ;
Key Concept 2: The SAS data step is a full-featured programming language. It supports Loops, Functions, Arrays and many of the other things you can do in programming languages like Java and C.
Key Concept 3: Arrays are used in conjunction with Loops in the SAS data step to do repetitive calculations and/or analysis with a minimum of coding.
Key Concept 4: An Array is just a convenient way to refer to a number of variables by one name, e.g. using the array name “measures” to refer to the variables “height”, “weight”, “age”, “income”, “semester_level”, etc.
Key Concept 5: A Loop executes a group of SAS statements repeatedly, typically once for each variable in the array. The “do” and “end” statements mark the beginning and end of the Loop. The statements between them are executed once for each value of the Loop’s index variable, from its starting value to its ending value.
For example, these SAS statements:
do count = 1 to 5 ;   /* count is the Loop's index variable */
   put count "Hello world" ;
end ;
would execute the “put” statement 5 times, as the Loop’s index variable (‘count’) increments from 1 to 5. The put statement would write these lines to the SAS log:
1 Hello world
2 Hello world
3 Hello world
4 Hello world
5 Hello world
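The three statements above only run inside a data step. Here is a minimal sketch of a complete program you could submit to try the loop yourself; using data _null_ (a data step that creates no dataset) is my choice for the illustration, not part of the original example.

```sas
/* data _null_ runs the statements without creating a dataset */
data _null_ ;
   do count = 1 to 5 ;
      /* write the index value and the text to the SAS log */
      put count "Hello world" ;
   end ;
run ;
```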
Key Concept 6: The Loop’s index variable (‘count’ in the above example) is used to reference the specific variable in the Array you want a SAS statement to work with on this pass through the loop. For example, when the index variable has the value ‘7’, you are working with the 7th variable in the array on this pass through the loop.
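To see the index variable picking out one array element per pass, here is a small sketch that reuses the array name “measures” from Key Concept 4. The three values assigned to height, weight and age are made up purely for illustration.

```sas
data _null_ ;
   /* one name, measures, for three variables */
   array measures {3} height weight age ;
   /* made-up values, for illustration only */
   height = 170 ;
   weight = 65 ;
   age    = 30 ;
   do count = 1 to 3 ;
      /* measures{count} is height, then weight, then age */
      put count measures{count} ;
   end ;
run ;
```

On each pass the PUT statement writes the index value followed by the value of whichever variable sits at that position in the array.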
Example: You have a SAS dataset containing hourly temperature data in Fahrenheit for a variety of climate stations. You need to add variables to the dataset that store the data in Celsius. The Fahrenheit temperature data is stored in the dataset as variables temperature1 through temperature24.
The figure below shows variables temperature1 through temperature14 of the dataset, to give you an idea of how the dataset looks. We want to add 24 additional variables to this dataset: cel_temperature1 to cel_temperature24 to store the hourly temperature values as Celsius.
Let’s call the 24 existing variables that contain the temperature in Fahrenheit by the array name temps_fah.
Let’s call the 24 to-be-created variables that contain the temperature in Celsius by the array name temps_cel.
Key Point: Arrays must have valid SAS names, i.e. the name must start with a letter or underscore and contain only letters, numbers and underscores. Arrays are not part of the dataset; they are just a name you use to refer to a number of variables in the dataset. The array will be unknown to other Procedure steps or Data steps. It is only known to the Data step in which it is created.
Example not using arrays.
/* calculate the temperatures as Celsius values */
data example ;
   /* read in the data from an existing sas dataset */
   set course.peter ;
   /* calculate the 24 new celsius variables */
   cel_temperature1 = 5/9 * (temperature1 - 32) ;
   cel_temperature2 = 5/9 * (temperature2 - 32) ;
   cel_temperature3 = 5/9 * (temperature3 - 32) ;
   cel_temperature4 = 5/9 * (temperature4 - 32) ;
   /* . . . */
   cel_temperature24 = 5/9 * (temperature24 - 32) ;
run ;
proc print data=example ;
run ;
Example using arrays.
/* calculate the temperatures as Celsius values */
data example ;
   /* read in the data from an existing sas dataset */
   set course.peter ;
   /* array for variables that are already in the dataset */
   array temps_fah {24} temperature1-temperature24 ;
   /* array for new variables that are to be added to the dataset */
   array temps_cel {24} cel_temperature1-cel_temperature24 ;
   /* loop that calculates the new variables from the existing variables */
   do hour = 1 to 24 ;
      temps_cel{hour} = 5/9 * (temps_fah{hour} - 32) ;
   end ;
run ;
proc print data=example ;
run ;
The index variable for the Loop that will step through the array is called “hour”.
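One side effect to be aware of: the index variable is an ordinary data step variable, so hour is written to the output dataset along with the temperatures. If you don't want it there, a DROP statement removes it. The sketch below shows the idea on a small stand-in dataset. The dataset name sample and the choice of 3 hours instead of 24 are assumptions made so the program can run on its own; the three Fahrenheit values are the first three readings from this posting's data.

```sas
/* stand-in dataset: 3 hourly readings instead of 24 (assumption) */
data sample ;
   input temperature1-temperature3 ;
   datalines ;
3.2 3.1 2.6
;
run ;

data example ;
   set sample ;
   array temps_fah {3} temperature1-temperature3 ;
   array temps_cel {3} cel_temperature1-cel_temperature3 ;
   do hour = 1 to 3 ;
      temps_cel{hour} = 5/9 * (temps_fah{hour} - 32) ;
   end ;
   /* keep the loop's index variable out of the output dataset */
   drop hour ;
run ;

proc print data=example ;
run ;
```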
Syntax for the Array statement
array temps_fah {24} temperature1 temperature2 etc. ;
array : SAS keyword
temps_fah : Valid SAS name for the array you want to create. i.e. the collective name for all the variables that you want to have in the array.
{24} : Number of variables that will be in the array.
temperature1 temperature2 etc. ; : The names of the actual variables that will be in the array. The variables can already exist in the dataset, or they can be new variables that are going to be created.
You can use this syntax to specify a group of variables:
temperature1-temperature24
which means the variables temperature1, temperature2, temperature3, etc. all the way to temperature24.
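If you would rather not count the variables yourself, SAS also lets you write {*} in place of the number and works out the size from the variable list. The DIM function then returns the number of elements. A minimal sketch, run in a data _null_ step so no dataset is needed (the variable n is just a name chosen for this illustration):

```sas
data _null_ ;
   /* {*} asks SAS to count the variables in the list */
   array temps_fah {*} temperature1-temperature24 ;
   /* dim() returns the number of elements in the array */
   n = dim(temps_fah) ;
   put n= ;   /* writes n=24 to the SAS log */
run ;
```

You can then write the loop as do hour = 1 to dim(temps_fah) ; so the loop still matches the array if the variable list changes.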
Syntax for Loops
do hour= 1 to 24 ;
sas statements go here
end ;
hour : index variable
1 to 24 : Where 1 is the starting value for the index variable and 24 is the stopping value for the index variable
sas statements go here : All SAS statements between the DO and END statements will be executed 24 times in this example. The value of the index variable hour will be 1, then 2, then 3, and so on up to 24 as the loop proceeds.
end; : A Loop always ends with the SAS statement END.
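A detail that sometimes surprises people: the loop stops when the index variable passes the stopping value, so after the loop has finished the index variable holds 25, not 24. This small sketch, again a data _null_ step chosen only for illustration, shows that in the log:

```sas
data _null_ ;
   do hour = 1 to 24 ;
      /* sas statements go here */
   end ;
   /* the loop ended because hour was incremented past 24 */
   put hour= ;   /* writes hour=25 to the SAS log */
run ;
```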
Using an array element inside a DO loop
do hour = 1 to 24 ;
   temps_cel{hour} = 5/9 * (temps_fah{hour} - 32) ;
end ;
temps_cel{hour} = 5/9 * (temps_fah{hour} - 32); : This statement will be executed 24 times because it is between the “do” statement that starts the loop and the “end” statement that ends the loop. On the first pass through the loop, the index variable hour has the value 1, so SAS interprets the statement as: calculate the first variable in the array temps_cel from the first variable in the array temps_fah, i.e.
cel_temperature1 = 5/9 * (temperature1 - 32);
The loop then continues for a second pass, incrementing the index variable hour by 1, so the second time through the loop the statement is interpreted as:
cel_temperature2 = 5/9 * (temperature2 - 32);
where cel_temperature2 is the 2nd variable in the array temps_cel and temperature2 is the 2nd variable in the array temps_fah.
Tip
Use a PUT statement inside the Loop to debug your program.
do hour = 1 to 24 ;
   temps_cel{hour} = 5/9 * (temps_fah{hour} - 32) ;
   put hour temps_cel{hour} temps_fah{hour} ;
end ;
The two statements between the do and end statement will be executed 24 times, as the Loop executes from its index variable’s initial value of 1 to its stopping value of 24.
Here are the first 10 lines written to the SAS log by the PUT statement. The PUT statement writes only the values; the column headings are added here for readability:
Hour Temperature in Celsius Temperature in Fahrenheit
1 -16 3.2
2 -16.05555556 3.1
3 -16.33333333 2.6
4 -16.5 2.3
5 -18.55555556 -1.4
6 -19.22222222 -2.6
7 -18.33333333 -1
8 -16.66666667 2
9 -16.38888889 2.5
10 -16.27777778 2.7
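If you want to try the debugging tip without the course dataset, here is a self-contained sketch. It types in the first three Fahrenheit readings from the log above (3.2, 3.1 and 2.6) with a DATALINES statement; shrinking the arrays to 3 elements and using data _null_ are assumptions made to keep the example short.

```sas
data _null_ ;
   /* first three Fahrenheit readings from the log above */
   input temperature1-temperature3 ;
   array temps_fah {3} temperature1-temperature3 ;
   array temps_cel {3} cel_temperature1-cel_temperature3 ;
   do hour = 1 to 3 ;
      temps_cel{hour} = 5/9 * (temps_fah{hour} - 32) ;
      /* debug line: index, Celsius value, Fahrenheit value */
      put hour temps_cel{hour} temps_fah{hour} ;
   end ;
   datalines ;
3.2 3.1 2.6
;
run ;
```

The three lines it writes should match the first three rows of the log shown above.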
Give us your feedback. You can download a zipped file of the SAS program and dataset used in this posting by clicking on this link.