DROPPING variables in SAS (Using DROP Statement)
In this tutorial we shall learn 3 methods of how to drop variables in SAS using DROP keyword or DROP statement and the difference in the outputs.
Firstly let us define our library location:
libname mylib '/home/u50132927/My_datasets';
For this tutorial we shall be leveraging SAS' inbuilt dataset: SASHELP.SHOES.
DATA statement created a new dataset named shoes in our library mylib.
SET statement copies the rows specified in SET statement and pastes it to our new DATA.
DATA mylib.shoes; SET SASHELP.shoes; RUN;
We can drop variables using 3 different methods:
Using DROP keyword in DATA step.
Using DROP keyword in SET step.
Using DROP Statement after SET step.
All 3 of them yield same output (provided we are not creating any new variable using dropped variables)
In the following 3 code chunks we are dropping variables Sales and Returns from our data.
DATA MYLIB.SHOES (DROP = SALES RETURNS); SET SASHELP.SHOES; RUN;
DATA MYLIB.SHOES ; SET SASHELP.SHOES(DROP = SALES RETURNS); RUN;
DATA MYLIB.SHOES ; SET SASHELP.SHOES; DROP SALES RETURNS; RUN;
Difference in using DROP statement while creating a new variable
For example, let us drop the variables Sales and Returns in SET statement and create a new variable called MYVARIABLE = 2* SALES.
DATA MYLIB.SHOES ; SET SASHELP.SHOES(DROP = SALES RETURNS); MYVARIABLE = SALES*2; RUN;
Our new variable has NA values, moreover SALES column is still present. Why?
Ans. SAS firstly copies the data rows and columns available in SET statement. Since we have dropped variable SALES in our SET statement thus this variable does not exist for SAS. Hence MYVARIABLE cannot be computed and has NA values. Similarly it creates a new variable called SALES.
Now, let us drop the variables Sales and Returns in DATA statement and create a new variable called MYVARIABLE = 2* SALES.
DATA MYLIB.SHOES (DROP = SALES RETURNS); SET SASHELP.SHOES; MYVARIABLE = SALES*2; RUN;
When SAS tries to copy the dataset using SET statement then SALES column is available to it thus it successfully creates our new variable MYVARIABLE. Now at the data step after all the calculations are done then SAS drops the Sales and Returns columns.
Similarly writing a DROP Statement after SET Statement produces the new variable.
DATA MYLIB.SHOES; SET SASHELP.SHOES; MYVARIABLE = SALES*2; DROP SALES RETURNS; RUN;
If you have a large dataset and you want to keep only a few of the columns. In such a case it can be tedious and overwhelming to write the names of so many columns in DROP statement. Thus as an alternative SAS also offers KEEP statement.
Refer to the following tutorial to learn more about Keeping variables in SAS.