Keeping variables in SAS (using KEEP statement)
In this tutorial we shall learn 3 methods of keeping variables in SAS, using the KEEP= dataset option or the KEEP statement, and see how their outputs differ.
Firstly let us define our library location:
libname mylib '/home/u50132927/My_datasets';
For this tutorial we shall be leveraging SAS' inbuilt dataset: SASHELP.SHOES.
The DATA statement creates a new dataset named shoes in our library mylib.
The SET statement reads the rows of the dataset it names (here SASHELP.SHOES) and writes them to our new dataset.
DATA mylib.shoes; SET SASHELP.shoes; RUN;
We can keep variables using 3 different methods:
Using the KEEP= option in the DATA statement.
Using the KEEP= option in the SET statement.
Using a KEEP statement after the SET statement.
All 3 of them yield the same output, provided we do not create a new variable from variables that are not kept.
In the following 3 code chunks we are keeping variables Region, Product, Sales and Returns from our data.
DATA MYLIB.SHOES (KEEP = REGION PRODUCT SALES RETURNS); SET SASHELP.SHOES; RUN;
DATA MYLIB.SHOES ; SET SASHELP.SHOES(KEEP = REGION PRODUCT SALES RETURNS); RUN;
DATA MYLIB.SHOES ; SET SASHELP.SHOES; KEEP REGION PRODUCT SALES RETURNS; RUN;
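One way to check that the three methods really produce the same data is to write each result to its own dataset and compare them. The following is only a sketch: the dataset names keep_data, keep_set and keep_stmt in the WORK library are hypothetical and chosen here so that the three results do not overwrite each other.

```sas
/* Hypothetical WORK dataset names, one per method */
DATA work.keep_data (KEEP = REGION PRODUCT SALES RETURNS);
    SET SASHELP.SHOES;
RUN;

DATA work.keep_set;
    SET SASHELP.SHOES (KEEP = REGION PRODUCT SALES RETURNS);
RUN;

DATA work.keep_stmt;
    SET SASHELP.SHOES;
    KEEP REGION PRODUCT SALES RETURNS;
RUN;

/* Compare the first result against the other two */
PROC COMPARE BASE = work.keep_data COMPARE = work.keep_set;
RUN;

PROC COMPARE BASE = work.keep_data COMPARE = work.keep_stmt;
RUN;
```

If the methods are equivalent for this case, the PROC COMPARE reports should show no unequal values between the datasets.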
Difference in using KEEP statement while creating a new variable
For example, let us keep the variables Region, Product, Sales and Returns in the DATA statement and create a new variable, MYVARIABLE = SALES*2.
DATA MYLIB.SHOES (KEEP = REGION PRODUCT SALES RETURNS); SET SASHELP.SHOES; MYVARIABLE = SALES*2; RUN;
Our new variable is missing from the output dataset! Why?
Ans. SAS first reads all the rows and columns of the dataset named in the SET statement. Since SALES is among them, MYVARIABLE is computed. However, the KEEP= option in the DATA statement is applied only when the output dataset is written, after all the calculations are done, so SAS saves only the columns listed there. MYVARIABLE is not in that list and is therefore left out.
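If the new variable should appear in the output, one option is to add it to the KEEP= list in the DATA statement. A minimal sketch, written to a hypothetical dataset work.shoes_with_new so that mylib.shoes is left untouched:

```sas
/* MYVARIABLE is listed in KEEP=, so it is written to the output dataset */
DATA work.shoes_with_new (KEEP = REGION PRODUCT SALES RETURNS MYVARIABLE);
    SET SASHELP.SHOES;
    MYVARIABLE = SALES * 2;
RUN;
```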
Now, let us keep the variables in the SET statement and create a new variable, MYVARIABLE = SALES*2.
DATA MYLIB.SHOES ; SET SASHELP.SHOES(KEEP =REGION PRODUCT SALES RETURNS); MYVARIABLE = SALES*2; RUN;
Since the KEEP= option is in the SET statement, it restricts only what is read: SAS reads just the 4 listed columns from SASHELP.SHOES. SALES is one of them, so MYVARIABLE is computed, and because nothing restricts the output, the newly created dataset contains all 5 columns.
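The reverse case is worth seeing too: when a variable needed for a calculation is not kept in the SET statement, SAS cannot read it. In the sketch below (the output name work.shoes_missing is hypothetical), SALES is deliberately left out of the KEEP= list.

```sas
/* SALES is not read from SASHELP.SHOES, so it is not available here */
DATA work.shoes_missing;
    SET SASHELP.SHOES (KEEP = REGION PRODUCT);
    MYVARIABLE = SALES * 2;   /* SALES is treated as a new, uninitialized variable */
RUN;
```

In this situation MYVARIABLE should come out as missing for every row, and the log is expected to note that the variable SALES is uninitialized.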
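When you create a new variable but provide a KEEP statement in the DATA step, only the columns named in the KEEP statement get saved to the new dataset, so MYVARIABLE is again left out. The KEEP statement applies to the output as a whole, so it does not matter whether it comes before or after the assignment.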
DATA MYLIB.SHOES ; SET SASHELP.SHOES; MYVARIABLE = SALES*2; KEEP REGION PRODUCT SALES RETURNS; RUN;
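To confirm which columns were actually saved after any of these steps, the dataset's variable list can be printed. A small sketch using PROC CONTENTS on the dataset created above:

```sas
/* Lists the variables stored in MYLIB.SHOES, in creation order */
PROC CONTENTS DATA = MYLIB.SHOES VARNUM;
RUN;
```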
If you have a large dataset and want to drop only a few of its columns, writing the names of all the remaining columns in a KEEP statement can be tedious. As an alternative, SAS also offers the DROP statement.
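As a brief illustration, SASHELP.SHOES has 7 variables, so keeping Region, Product, Sales and Returns is the same as dropping the other three. The sketch below assumes those three are Subsidiary, Stores and Inventory, and writes to a hypothetical dataset work.shoes_drop.

```sas
/* Dropping 3 variables leaves the same 4 that the KEEP examples retain */
DATA work.shoes_drop;
    SET SASHELP.SHOES;
    DROP SUBSIDIARY STORES INVENTORY;
RUN;
```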
Refer to the following tutorial to learn more about DROPPING variables in SAS.