Working with dates

For this session we will be working with data on Admission and Discharge dates of patients in a hospital.

e.g.

Admit Date     Discharge Date     Reason for Admission

04-OCT-2013    06-OCT-2013        flu
09-OCT-2013    14-NOV-2013        flu
16-SEP-2013    21-DEC-2013        heart
31-May-2013    6-JUN-2013         leg_injury

Working with such “date” information in SAS requires us to use four (4) SAS concepts:

1.  INFORMATS – these tell SAS to read a column in our raw data as a “date” AND give SAS a picture of how the date is represented in the raw data, e.g. 2014-02-07, 02/07/14, etc.

2.  FORMATS – these tell the Procs in SAS how to display a variable of type “date”.

3. FUNCTIONS that work with dates.

4.  A fancy use of the INPUT function to convert a character variable already in a SAS dataset to a “date” variable, so you can use the array of SAS functions that help you use dates in your analysis.

For the example data above we will use SAS to determine:

  • length of stay for each patient
  • which quarter in the year did the admission take place
  • day of the week of admission

Being able to determine these things from our admission data will help us see if certain days of the week or quarters of the year are associated with certain admission reasons. For example, do we see more admissions for flu in the first quarter of a year? Are “leg injuries” more prevalent on a Monday?

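The items above are easy to determine with SAS, but the variables holding the date information must be of type DATE. We often use variables of type NUMERIC and CHARACTER when we’re using SAS; this session introduces us to variables of type DATE. (Strictly speaking, SAS stores a DATE variable as a numeric count of days. We call it type DATE here because it was read with a date informat and can be used with the date functions.)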

To use DATE variables in SAS you need to use informats and, optionally,  formats.

  • informats tell SAS how to read data that should be considered as dates from external files.
  • formats tell SAS how to display variables of type DATE in the output from Procs

This paper gives a great overview of informats and formats.

Using informats to create variables of type DATE when reading data into a SAS dataset

If you are reading data into a SAS dataset with the INPUT statement, you tell SAS a variable is a DATE variable by following its name on the INPUT statement with an “informat” – literally the format you want SAS to use to read in the information, hence the reason it’s called an informat.    You’ll see the informats in blue in the SAS example below.

There are many different informats provided with SAS to give you very fine control over how a variable should be created.

The close cousins to informats are formats. Formats give you very fine control over how a variable should be displayed in the output of Procedures (e.g. PROC PRINT).

We’ll be using the date11. informat to create variables of type DATE.  You can see the date11. informat in blue text after both of the admit and discharge variables on the INPUT statement in the SAS example below.

The date11. informat specifies that the variable named just before it on the INPUT statement should be created as type DATE.   The date11. informat also gives SAS a picture of what the data in the raw data file looks like, so SAS knows what part of the data is the day, what part is the month and what part is the year.  If your date data is recorded in a different layout in your raw data file than the dd-mmm-yyyy layout we’re using below (e.g. 04-OCT-2013), then you merely give SAS the correct informat to describe the way your raw data is set up.  You’ll see a complete list of the informats that apply to reading dates here.  Just choose the one that describes how the dates have been entered in the external data file you are reading.   Here is a list of the most commonly used informats. Note that all the informats end with a period:

INPUT LOOKS LIKE      USE THIS INFORMAT
01/23/1963            mmddyy10.
1/23/1963             mmddyy10.
01/23/63              mmddyy8.
1/23/63               mmddyy8.
January 23,1963       worddate20.
jan 23, 1963          worddate12.
23jan1963             date9.
23jan63               date7.
23-jan-1963           date11.
01-23-63              mmddyy8.
19630123              yymmdd8.

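The above examples are from this paper.  One caution: the SAS documentation lists worddate. as a format rather than an informat, so test those two rows on your SAS release before relying on them.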

In addition, this SUGI paper provides some excellent examples of how to use the various informats, combined with SAS Functions that work with dates, to do date calculations.

Here’s the SAS program to read the data. Note the informats after the two variables we wish to create as type DATE.   If you do not follow these variable names with informats, SAS will assume they are ordinary numeric variables and you will not be able to do date arithmetic or use the SAS functions that work with dates.  In fact, in this example SAS will write “Invalid data” notes to the log and set both variables to missing if you omit the informats, because it will be expecting plain numbers but will find text like 04-OCT-2013 instead. SAS doesn’t like such surprises :>

Note when reading variables in free format as we do in this example, you need to place a colon after the variable name that will hold date information, and before the informat specification.  You’ll see these colons in red in the example below.  The colons are not required if you’re reading your data in fixed field format.

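Here is the INPUT statement to read the data in fixed field format, assuming admit starts in column 1, discharge in column 14 and reason in column 27 of every line. No colons are required. The $10. informat leaves room for the longest reason, leg_injury.

 input  @1  admit  date11.  @14  discharge  date11.  @27 reason $10. ;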

Note all informats have a period (“.”) as part of their name.  Forgetting to include the period as part of the informat name is a common error.   When SAS sees the period it knows you’re specifying an informat rather than a variable.

And just in case you think informats are really something new, you use them all the time to tell SAS a variable is of type CHARACTER.   That’s what the “$” does after a variable name on an input statement!   The $ is a special informat in SAS in that it’s the only one that does not require a period at the end of its name.   There you go, you already were an expert on informats, and now you’re a super expert  :>

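data patients;
/* the colon (:) tells list input to use the informat that follows it */
input  admit  :  date11.  discharge  :  date11.  reason : $10. ;   /* $10. keeps all of leg_injury; a bare $ stops at 8 characters */
datalines;
04-OCT-2013  06-OCT-2013  flu
09-OCT-2013  14-NOV-2013  flu
16-SEP-2013  21-DEC-2013  heart
31-May-2013  6-JUN-2013   leg_injury
;
run ;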

Now that we’ve got the data into the variables admit and discharge, and SAS knows the admit and discharge variables are of type DATE, the rest is easy since we can do date arithmetic and use the SAS functions that work with DATE variables.  Below is the SAS program with the statements added to calculate the three items we listed earlier:

data patients;
input  admit :  date11.  discharge :  date11.  reason : $10. ;
length_of_stay = discharge - admit ;       /* date arithmetic: the result is a number of days */
quarter_admitted = QTR(admit) ;            /* use the QTR function in SAS: returns 1 to 4 */
day_of_week_admitted = WEEKDAY(admit) ;    /* use the WEEKDAY function in SAS: 1 = Sunday ... 7 = Saturday */
datalines;
04-OCT-2013  06-OCT-2013  flu
09-OCT-2013  14-NOV-2013  flu
16-SEP-2013  21-DEC-2013  heart
31-May-2013  6-JUN-2013   leg_injury
;
run ;
proc  print  data  = patients ;
var  admit  discharge reason
length_of_stay  quarter_admitted day_of_week_admitted ;
format admit discharge date11. ;
run ;

Note the format statement in red in the PROC PRINT.  Format statements tell SAS how to display variables of type DATE in procedures.    In this example the format statement only applies to the PROC PRINT.   If you put the format statement in the data step instead, the format is stored with the dataset and applies to every procedure that uses that dataset.    SAS actually stores date values as “the number of days from January 1, 1960”, so you need to specify a format for the procedures to use in displaying the dates. Otherwise you will just get numbers like 19635, which is the stored value for 04-OCT-2013.  Remove the format statement from the PROC PRINT step and look at the output if you don’t believe me :>
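To see the stored number and a few of its displays side by side, here is a minimal sketch. It is not part of the program above: the variable names d and los are made up for illustration, and it uses a SAS date literal ('04OCT2013'd) rather than reading raw data. The PUT statements write to the SAS log.

```sas
data _null_ ;
d = '04OCT2013'd ;           /* date literal: the same day as 04-OCT-2013 */
los = '06OCT2013'd - d ;     /* dates are day counts, so subtraction gives days */
put d= ;                     /* no format: writes d=19635 */
put d= date11. ;             /* writes d=04-OCT-2013 */
put d= mmddyy10. ;           /* writes d=10/04/2013 */
put los= ;                   /* writes los=2 */
run ;
```

The value never changes; only the format applied to it does.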

Now to see why we go to all this bother of telling SAS some data should be considered as type DATE.   The reason is it lets you use the power of SAS’s date functions!   In the DATA STEP part of the SAS program above we used the SAS date functions QTR and WEEKDAY to do calculations on our date variables.   It would be VERY DIFFICULT to determine these values without SAS’s date functions.
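If you want to check what QTR and WEEKDAY return for a single date, a small sketch like the one below can help. It assumes nothing from the patients dataset: the date is typed in as a date literal, and the results go to the SAS log. WEEKDAY numbers the days from 1 (Sunday) to 7 (Saturday), and the DOWNAME. format is one way to display the day name instead of the number.

```sas
data _null_ ;
admit = '04OCT2013'd ;                    /* illustrative value: the first patient's admit date */
day_of_week_admitted = WEEKDAY(admit) ;   /* 6, since 1 = Sunday and 04-OCT-2013 was a Friday */
quarter_admitted = QTR(admit) ;           /* 4, since October falls in the fourth quarter */
put day_of_week_admitted= quarter_admitted= ;
put admit= downame. ;                     /* DOWNAME. displays the date as its day name */
run ;
```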

Here’s the complete list of SAS functions for working with dates.

So far so good, but what do you do when your SAS dataset has variables you would like to treat as type DATE but they are character instead?

This is a common challenge when you create a SAS dataset by importing a CSV file from Excel.  If you have data in the CSV file you want SAS to treat as type DATE, SAS’s import capability will not always bring it in that way, and you can be left with a character variable – sigh :<

As you might expect, SAS has a function you can use in your DATA STEP program to create a DATE variable from a character variable.   It’s the INPUT function, not to be confused with the INPUT statement.    Here’s an example:

Let’s assume you have a character variable named A with the value '11-APR-2007'.  While A looks like a date, SAS will not be able to do date arithmetic with the variable A, nor will SAS be able to use date functions on it, because the variable is not of type DATE.   The INPUT function lets you create a new variable from A, and this new variable will be of type DATE.   Like this:

A2 = input (A, date11.) ;

You’ll see our friend the informat in the above example, and this time we’re using it as an argument to the INPUT function.   Basically the statement says:  “create a new variable called A2 by taking the value of the A variable and reading it as a date using the date11. informat.”   Now we’re all set to do date arithmetic and use the impressive array of SAS functions that work on DATE variables.
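Here is that one-line conversion placed in a complete DATA STEP, as a sketch only. The dataset name converted and the variables quarter and day_of_week are made up for illustration; in real life A would come from your imported dataset with a SET statement rather than being typed in.

```sas
data converted ;                /* hypothetical dataset name */
A = '11-APR-2007' ;             /* character variable that only looks like a date */
A2 = input(A, date11.) ;        /* numeric SAS date value read with the date11. informat */
quarter = QTR(A2) ;             /* 2, since April is in the second quarter */
day_of_week = WEEKDAY(A2) ;     /* 4 = Wednesday (1 = Sunday) */
format A2 date11. ;             /* display A2 as 11-APR-2007 instead of a day count */
run ;
proc print data = converted ;
run ;
```

Keeping A and creating A2 alongside it means the original character value is still there if you need to check a conversion that came out missing.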
