SQL II: Subsetting

In a previous post, we compared PROC SQL to the DATA step for merging datasets.  Now let’s take a closer look at PROC SQL and what you can do with it.

What can PROC SQL be used for?

Source:  SAS 9.x online Documentation

  • to generate reports
  • to generate summary statistics
  • to retrieve data from tables or views
  • to combine data from tables or views
  • to create tables, views and indexes
  • to update the data values in PROC SQL tables
  • to update and retrieve data from DBMS tables
  • to modify a PROC SQL table by adding, modifying, or dropping columns

Key PROC SQL statements and the order they must appear are:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

For this session we will look at a variety of examples using the SELECT, FROM, WHERE and ORDER BY statements.

SELECT

As we have seen earlier the SELECT function, selects columns of a table or variables in the dataset.

SELECT all columns or variables

PROC SQL outobs=10;        <- outobs = allows you to view x observations or rows
  select *                                <- Selecting all variables or columns in the table
    from sashelp.cars;           <- sashelp.cars is a sample dataset that is available in SAS
Run;

SELECT specific columns

PROC SQL outobs=10;        <- outobs = allows you to view x observations or rows
     select make, model        <- Selecting the make and model variables (columns)
    from sashelp.cars;           <- sashelp.cars is a sample dataset that is available in SAS
Run;

REMOVE duplicates

PROC SQL;
  select make
  from sashelp.cars;
Run;

This will result in a table with a large number of observations and several of them will be identical makes.  To remove the duplicates so that we are left with a table with the makes.

PROC SQL;
  select distinct make       <- distinct will remove all the duplicates
  from sashelp.cars;
Run;

The resulting table should show only one observation for each unique car make.

WHERE

We have already seen the WHERE statement in action when we joined 2 tables together in the previous SQL session.  But let’s dig a little deeper into how one can use this statement.

WHERE – subsetting by value

We now want to create a table where the data only pertains to cars that are sedans.  To accomplish this we will subset where the type = Sedan.

PROC SQL;
  select make, model, type       
  from sashelp.cars
where type = “Sedan”;       <- Select only rows in the table where type                                                                              has a value of Sedan
Run;

WHERE – subsetting by equation

Now let’s create a table where we list the make and model of cars where the Miles per gallon in the city is greater than 30.

PROC SQL;
  select make, model, MPG_City       
  from sashelp.cars
  where MPG_City gt 30;       <- Select only rows in the table where MPG_City > 30
Run;

How about cars where their MPG on the highway is between 35 and 50.

PROC SQL;
  select make, model, MPG_Highway       
  from sashelp.cars
  where MPG_Highway between 35 and 50;       <- Select only rows in the table where                                                                                 MPG_Highway is between 35 and 50 mpg
Run;

Now let’s restrict these vehicles to only those that are sedans.

PROC SQL;
  select make, model, type       
  from sashelp.cars
  where MPG_Highway between 35 and 50
              and type = “Sedan”;       <- Select only rows in the table where                                                                                     MPG_Highway is between 35 and 50mpg and                                                                      where the type = Sedan

Run;

ORDER BY

ORDER BY is a great of sorting your results.  If you use the DATA Step, to sort your data you would need to use the PROC SORT.  In SQL you can accomplish this all in one PROC by using the ORDER BY statement.

Let’s take our last set of results and ORDER BY MPG_Highway.

PROC SQL;
  select make, model, MPG_Highway       
  from sashelp.cars
  where MPG_Highway between  35 and 50
              and type = “Sedan”
order by MPG_Highway desc;        <- desc will present the results in descending order
Run;

PROC SQL;
  select make, model, MPG_City       
  from sashelp.cars
  where MPG_City > 30
              and type = “Sedan”
order by MPG_City asc;        <- asc will present the results in ascending order
Run;

The next SQL session will cover creating new variables, the HAVING and the GROUP BY statements.

Screen Shot 2013-11-18 at 7.33.07 PM