PROC FORMAT - With detailed examples!
Formats in SAS are used to view the columns in a particular way (with commas or date formats or decimals etc.)
SAS also provides the provision i.e. PROC FORMAT to manually create a format which can be applied to multiple columns at one go.
PROC FORMAT; VALUE Name_of_first_format value-1 = 'Label 1' value-2 = 'Label 2' ...; VALUE Name_of_second_format value-3 = 'Label 3' value-4 = 'Label 4'; ...; RUN;
Let us create a dummy format named Rating_explanation where:
1 and 2 map to "Did not like the movie"
3 maps to "Neutral"
4,5 map to "Liked the movie"
PROC FORMAT; VALUE Rating_explanation 1,2 = "Did not like the movie" 3 = "Neutral" 4,5 = "Liked the movie"; RUN;
Understanding with Examples!
Let us firstly define our libpath:
libname mylib '/home/u50132927/My_datasets';
For this tutorial we shall be making use of SAS' inbuilt datasets: SASHELP.SHOES
Example 1: Let us firstly create a format Sales_info which takes values as:
0 to less than 20000 : "Low Sales"
20000 to less than 50000 : "Medium Sales"
50000 and above: "High Sales"
PROC FORMAT; VALUE Sales_info 0-20000 = "Low Sales" 20001-50000 = "Medium Sales" 50000-2000000 = "High Sales"; RUN;
Note that in the above format to indicate the third bracket (50000 and above) we have set a very high number as upper limit (i.e. 2000000)
Now let us apply this format to Sales column in our dataset SASHELP.SHOES and save the result in MYLIB.PROC_FORMAT_output. To apply a format we use FORMAT statement followed by column name and then the format name with a dot (.)
DATA MYLIB.PROC_FORMAT_output(KEEP = Region Product Sales); SET SASHELP.SHOES; FORMAT Sales Sales_info.; RUN;
Note that in the output sales values have now been replaced by the new format.
Example 2: LOW and HIGH limits
In the previous format to indicate the third bracket (50000 and above) we have set a very high number as upper limit (i.e. 2000000), but when we are dealing with a large data it becomes difficult to manually set this limit. Thus we can use keywords LOW and HIGH to indicate lower and higher limits respectively.
PROC FORMAT; VALUE Sales_info LOW -< 20001 = "Low Sales" 20001 -< 50001 = "Medium Sales" 50001 - HIGH = "High Sales"; RUN;
Applying the format on a new column
Method 1: Using FORMAT Statement.
Let us create a new column Sales2 and apply the format using FORMAT statement as follows:
DATA MYLIB.PROC_FORMAT_output (KEEP = Region Product Sales Sales2); SET SASHELP.SHOES; FORMAT Sales2 Sales_info.; Sales2 = Sales; RUN;
Method 2: Using PUT function.
We are creating a new column sales2 and are applying the format Sales_info using PUT function.
DATA MYLIB.PROC_FORMAT_output(KEEP = Region Product Sales Sales2); SET SASHELP.SHOES; Sales2 = put(Sales,Sales_info.); RUN;
Multiple formats can be defined in a single PROC FORMAT
You can also define multiple formats in one PROC FORMAT.
Example: In the following example we are creating two formats Sales_info and Returns_info
Note: In Returns_info format we have defined the third bucket as Other (which comprises of all other values)
PROC FORMAT; VALUE Sales_info LOW -< 20001 = "Low Sales" 20001 -< 50001 = "Medium Sales" 50001 - HIGH = "High Sales"; VALUE Returns_info LOW -< 1500 = "Low Returns" 1500 -<3000 = "Medium Returns" Other = "High Returns"; RUN;
Applying the format on new columns Sales2 and Returns2
DATA MYLIB.PROC_FORMAT_output (KEEP = Sales Sales2 Returns Returns2); SET SASHELP.SHOES; Sales2 = put(Sales,Sales_info.); Returns2 = put(Returns,Returns_info.); RUN;
Filtering the data using PROC FORMAT
We can also filter the data with PROC FORMAT by using WHERE keyword in the DATA step.
In the following example we are removing the rows with low sales.
DATA MYLIB.PROC_FORMAT_output(WHERE = (Sales2 <> "Low Sales")); SET SASHELP.SHOES(KEEP = Region Product Sales ); Sales2 = put(Sales,Sales_info.); RUN;
Note that resultant data has 261 rows.
PROC FORMAT along with PROC FREQ
PROC FREQ is used in calculating frequency tables and cross tabulations in SAS. We can also carry out PROC FREQ for numeric columns by defining a format on the numeric column.
In the following example we have applied our Sales_info. FORMAT on sales column and then are using PROC FREQ.
PROC FREQ data = SASHELP.SHOES; TABLES Sales; FORMAT Sales Sales_info.; run;
PROC FORMAT For character variables
We can also reformat the character variables by creating a mapping in PROC FREQ.
For example, our format my_labels maps L to low, M to medium, H to High and all others to Not available.
PROC FORMAT; VALUE my_labels "L" = "Low" "M" = "Medium" "H" = "High" Other = "Not available"; RUN;
Mapping multiple categorical values to a single format
In the following example we are mapping multiple values to a single formatted text.
PROC FORMAT; VALUE my_labels "L","ll" = "Low" "M","mm" = "Medium" "H","hh" = "High" Other = "Not available"; RUN;