In a previous post, we compared PROC SQL to the DATA step for merging datasets. Now let’s take a closer look at PROC SQL and what you can do with it.
What can PROC SQL be used for?
Source: SAS 9.x online Documentation
- to generate reports
- to generate summary statistics
- to retrieve data from tables or views
- to combine data from tables or views
- to create tables, views and indexes
- to update the data values in PROC SQL tables
- to update and retrieve data from DBMS tables
- to modify a PROC SQL table by adding, modifying, or dropping columns
Key PROC SQL statements and the order they must appear are:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
For this session we will look at a variety of examples using the SELECT, FROM, WHERE and ORDER BY statements.
SELECT
As we have seen earlier the SELECT function, selects columns of a table or variables in the dataset.
SELECT all columns or variables
PROC SQL outobs=10; <- outobs = allows you to view x observations or rows
select * <- Selecting all variables or columns in the table
from sashelp.cars; <- sashelp.cars is a sample dataset that is available in SAS
Run;
SELECT specific columns
PROC SQL outobs=10; <- outobs = allows you to view x observations or rows
select make, model <- Selecting the make and model variables (columns)
from sashelp.cars; <- sashelp.cars is a sample dataset that is available in SAS
Run;
REMOVE duplicates
PROC SQL;
select make
from sashelp.cars;
Run;
This will result in a table with a large number of observations and several of them will be identical makes. To remove the duplicates so that we are left with a table with the makes.
PROC SQL;
select distinct make <- distinct will remove all the duplicates
from sashelp.cars;
Run;
The resulting table should show only one observation for each unique car make.
WHERE
We have already seen the WHERE statement in action when we joined 2 tables together in the previous SQL session. But let’s dig a little deeper into how one can use this statement.
WHERE – subsetting by value
We now want to create a table where the data only pertains to cars that are sedans. To accomplish this we will subset where the type = Sedan.
PROC SQL;
select make, model, type
from sashelp.cars
where type = “Sedan”; <- Select only rows in the table where type has a value of Sedan
Run;
WHERE – subsetting by equation
Now let’s create a table where we list the make and model of cars where the Miles per gallon in the city is greater than 30.
PROC SQL;
select make, model, MPG_City
from sashelp.cars
where MPG_City gt 30; <- Select only rows in the table where MPG_City > 30
Run;
How about cars where their MPG on the highway is between 35 and 50.
PROC SQL;
select make, model, MPG_Highway
from sashelp.cars
where MPG_Highway between 35 and 50; <- Select only rows in the table where MPG_Highway is between 35 and 50 mpg
Run;
Now let’s restrict these vehicles to only those that are sedans.
PROC SQL;
select make, model, type
from sashelp.cars
where MPG_Highway between 35 and 50
and type = “Sedan”; <- Select only rows in the table where MPG_Highway is between 35 and 50mpg and where the type = Sedan
Run;
ORDER BY
ORDER BY is a great of sorting your results. If you use the DATA Step, to sort your data you would need to use the PROC SORT. In SQL you can accomplish this all in one PROC by using the ORDER BY statement.
Let’s take our last set of results and ORDER BY MPG_Highway.
PROC SQL;
select make, model, MPG_Highway
from sashelp.cars
where MPG_Highway between 35 and 50
and type = “Sedan”
order by MPG_Highway desc; <- desc will present the results in descending order
Run;
PROC SQL;
select make, model, MPG_City
from sashelp.cars
where MPG_City > 30
and type = “Sedan”
order by MPG_City asc; <- asc will present the results in ascending order
Run;
The next SQL session will cover creating new variables, the HAVING and the GROUP BY statements.