Week 1 – Introduction to SAS

Learning outcomes for this session

  • Identify sections of the SAS interface
  • describe features of an Excel file required for SAS import
  • perform a PROC PRINT to view data entered into SAS

After completing Week 1:

Additional examples to try on your own
Additional examples – answers


 What is SAS?

Statistical analysis software used by many researchers on campus – from Population Medicine to Animal & Poultry Science to Plant Agriculture to Human Nutrition.   A widely used software package!

What does it look like?

There are a number of components to the SAS interface:

  • Results and Explorer windows to the left
  • Output, Log and Editor windows to the right, taking up most of the screen
SAS_interface
Results and Explorer Windows to the left Output, Log and Editor windows to the right, taking up most of the screen

 

 

 

 

 

 

 

 

What do each of these windows do?

  • Results Window –  a Table of Contents for all of your results.
  • Explorer Window – similar to Windows Explorer – allows you to navigate SAS libraries and files
  • Editor Window – this is where you will spend most of your time, writing and editing program files
  • Log Window – this window is extremely helpful, think of it as your best friend in SAS, it tells you what SAS has done every step of your program and processing
  • Output Window – SAS versions 9.2 and earlier, use this window to display all results and output.  SAS 9.3 and higher use a new window called the Results Viewer.  All the results are presented in an HTML format.

How does SAS work?

SAS is divided into 2 areas:

  • DATA step
  • PROCs (short for PROCedures)

DATA step is all about data manipulation – one of the key strengths to SAS
PROCs – this is where you will find most of your statistical procedures.

Throughout the next 9 weeks we will work with different aspects of the DATA step and we will review and use many PROCs.

How do you get data into SAS?

The primary reason we use SAS is to perform statistical analyses on some data.  However, we need to ensure that the data we have collected is brought into SAS correctly.  I’m sure you’ve heard of “garbage in, garbage out”?  This cannot be more truer when you collect data and bring it into a statistical package.

There are different ways to bring data into SAS.  I will try to review and provide my thoughts on 3 different ways I see my students performing this task.  However, before we import data into any software package, we need to ensure the data is “clean” and in a format that will be accepted into the package.  So let’s talk about the most common way researchers enter their data – EXCEL.

Using Excel to enter data and Statistical Software packages

Most people use Excel to enter their data and that’s great!  The look of it is neat, ordered and we can do quick summaries, such as means and sums.  We can also make Excel look pretty by adding colours, headings, footnotes, or maybe notes about what we did and how.  In the end, Excel can be a very versatile tool.  But, we need to keep in mind that Excel is NOT a statistical package and that we are using it to collect our data.  That being said, I recognize many people use it for more than it was set out to be.

Let’s take a look at an example of how Excel is used.

DAD/HMDB Hospitalization Rate, Average Length Of Stay, Top 10 High-Volume Inpatient Hospitalizations and Surgeries, and Hospital-Based Newborn Rate, 2012–2013

Beautiful file, several worksheets, graphs, tables, notes, etc…  To do any statistical analysis on any of these stats, we will need to strip the tables.

Let’s select Table 2: Number, Percentage* and Average Length of Stay for Top 10 High-Volume Inpatient Hospitalizations by Province/Territory, HMDB, 2012-2013

Follow these steps to clean the file:

  • Copy the entire sheet into a blank worksheet.  This allows you to keep the formatted version while working on a clean version.
  • Label the new worksheet SAS or something that makes sense to you.  This way when we import the data you will know which worksheet contains the clean data.
  • Remove all the formatting.  In Excel, Click on the

Excel_clear_formatbutton and select Clear Formats.  This will remove all Excel formatting from the worksheet.

  • The top row of the Excel file needs to contain the name of the variables you wish to use in SAS.  Notice that in our file there is a Table Title.  Delete the row that contains the title.
  • The top row of the Excel file needs to contain the name of the variables you wish to use in SAS.  You will now need to modify the headings of the columns.  For instance:
    • Province/Territory -> PROV
    • Most Responsible Diagnosis for Inpatient Hospitalization -> DIAGNOSIS
    • Number of Inpatient Hospitalizations in 2012–2013 -> NO_INPAT
    • Percentage* of Inpatient Hospitalizations in 2012–2013 -> PERC_INPAT
    • Average Length of Stay of Inpatient Hospitalizations in 2012–2013 -> STAY
  • The variable names are ones that will have a significance to you.  Please DOCUMENT these changes so you know what is contained in your dataset!  I will provide more information on Variable Labels and Value Labels in a follow-up post
  • Notice that in this particular file we have 2 header cells that have been merged?  We will need to divide these and add a title to the column that contains 1-10,let’s call it RANK
  • Don’t forget to save your Excel file!
  • Remove any notes at the bottom of the table or anywhere else in the worksheet

RECAP:

  1. Copy data into new worksheet
  2. Rename worksheet for easy identification later
  3. Clean variable names in the first row
  4. Second row contains your data and NOT blanks

TIPS:

SAS naming conventions:

  • variable names do not contain any “funny” characters such as *, %, $, etc…
  • variable names begin with a letter
  • variable names may contain a number, but cannot begin with a number
  • NO spaces allowed!  You may use _ in place of a space

 Importing Excel files into SAS

Using the IMPORT feature in SAS is probably the easiest way of bringing data into the SAS program.  To import the data follow these steps:

  1. In the SAS program – File -> Import Data
  2. You will now answer each step of the Import Wizard.
  3. With SAS 9.2, you will need to save your Excel files as the older 97-2003 .xls format.  This version of SAS does NOT recognize the .xlsx ending for Excel
  4. Browse your computer to find and select your Excel file
  5. Select the worksheet in the file using the dropdown box in SAS.  This is why I suggested earlier to call it SAS or something you will remember
  6. This next step can be tricky.  Leave the Library setting to WORK.  In the Member box provide SAS with a name for your datafile to be saved in SAS.  For this example let’s call it HOSPITAL.
  7. The next step is optional!  If you are planning on importing more files that have the same structure or where your answers to the Wizard will be the same, this step allows you to save the program (or syntax) that SAS creates to import the file.
  8. Finish and your file is now in SAS
  9. Check the Log Window

 Copy and Paste Data into SAS

As much as I would like to discourage people from using this method of bringing data into SAS, it is a viable option about 95% of the time.  In most cases this method will work, however there is the odd case, about 5% of the time where this method will fail.

Let’s work through how we enter data into Excel and translate our steps into SAS

First thing most of use do when entering data into Excel is to create variable names or headings in the first row of Excel.  We then begin to type our data in the second row.  When we’ve completed entering the data or we have a page full of data, that’s when most of us remember to save the file.  Sound familiar?

In SAS we can do all of these steps using a DATA Step.  We will be creating a program or writing syntax in the SAS editor for this bit.  To start, SAS likes us to save our file FIRST, before we enter any data – contrary to what we traditionally do in Excel.  We start our program

Data hospital;         <-  Our file is called hospital

The first thing we did in Excel was label our columns – this is the second line of our SAS code:

Data hospital;
input prov diagnosis rank no_inpat perc_inpat stay;

The next thing we do in Excel is start entering our data.  In SAS, we first let SAS know that the data is coming by adding a datalines; statement in our code and then we enter our data.

Data hospital;
input prov diagnosis rank no_inpat perc_inpat stay;
datalines;
Canada  1  Giving birth 369454 12.9 2.4

In order to complete our data entry in SAS, we need to let SAS know that there are no more data points and to go ahead and save the file.  To do this we add a ; all by itself at the end of the data and a Run; to let SAS finish the data and save the file.

Data hospital;
input prov diagnosis rank no_inpat perc_inpat stay;
datalines;
Canada  1  Giving birth 369454 12.9 2.4
;
Run;

Rather than retyping all the data, we can copy it from Excel and paste it after the datalines statement.  As I noted above, this will work most of the time, but there are times where it does not work.  Why you may ask?  I suspect it is some hidden Excel formatting that plays havoc with SAS, but I cannot identify exactly what it is.  Just note that this method may fail at times.

Viewing the data in SAS

We’ve just imported our data and I see nothing!  What happened?  Did I do something wrong?  My log window says my data has been successfully imported, but where did the data go?

Once you’ve imported your data, SAS saved it in a dataset within its program. So think of is as a blackbox and somewhere in that blackbox is a dataset called HOSPITAL.  How do you go about viewing it?  Let’s use a PROCedure called PRINT.

PROC PRINT will show you your data in the Output window.

Proc print data=hospital;
Run;

These statements will printout ALL the observations in your dataset.  Please note that specifying the dataset you are working with is an EXCELLENT habit to get into.  In this case we are interested in viewing the data contained in the HOSPITAL dataset – data=hospital

To view only the first 10 observations in this dataset we can add an option at the end of the Proc print statement.

Proc print data=hospital (obs=10);
Run;

Maybe we want to view observations 11-20 we can a second option at the end of our Proc print statements

Proc print data=hospital (firstobs=11 obs=20);
Run;

This tells SAS that the first observation we want to view is the 11th observation of a total of 20 observations we are looking at.

We can also tell SAS that rather than looking at all the variables we only want to see PROV by adding a var statement to our Proc print.

Proc print data=hospital (firstobs=11 obs=20);
  var PROV;
Run;

 

What went wrong?  Reading the Log window

When you were trying to run the Proc Print coding, you probably noticed that the data didn’t look like it was supposed to.  What went wrong?

Rather than providing all the answers here, please add a comment to this post  – what you think went wrong.  I’ll add another post last this week with fixes to the problems at hand.

Check out my answers

Screen Shot 2013-11-18 at 7.33.07 PM

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s