Sometimes the values we read into a dataset and the values we see when viewing it look different. This is due to formats and informats. In this tutorial we shall understand both of them in detail.
Informats
While creating or reading a dataset, SAS needs to know how to interpret the raw data: whether a value is character or numeric, the length of the variable (in case of characters), and whether a number contains commas or other special characters (in case of numerics). The instructions that tell SAS how to read the data at this stage are called INFORMATS.
Understanding with the help of an example!
Let us create a dataset mydata with 4 columns: Department, Months, No_of_cases and No_of_deaths.
To denote a character variable in the INPUT statement we write a $ sign after the column name, while for numeric columns we do not write anything after the column name. Our data also contains numbers with commas; to read these we use the comma informat.
:comma7. - Helps SAS read numbers such as 123,456 (7 characters including the comma), 12,345 or 345. The leading colon tells SAS to read the value up to the next blank.
:commaN. - where N is the number of characters to read (including commas).
DATA mydata;
INPUT Department :$30. Months $ No_of_cases No_of_deaths :comma7.;
CARDS;
Cardiology Jan 7713 123,456
Cardiology Feb 243 126,243
Cardiology Mar 543 10,892
E&A Jan 772 4,243
E&A Feb 443 13,924
E&A Mar 82 121,632
;
RUN;
Note that in the output you are still unable to view the commas. This is because an informat only tells SAS how to read the data; it does not control how the values are displayed.
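To check this yourself, you can print the dataset. The following is a minimal sketch that assumes the mydata dataset from the DATA step above already exists in the WORK library; the NOOBS option and the VAR statement are only there to keep the listing tidy.

```sas
/* Print the dataset created above.                          */
/* No_of_deaths is stored as a plain number, so it is shown  */
/* without commas because no format has been assigned.       */
PROC PRINT DATA=mydata NOOBS;
    VAR Department Months No_of_cases No_of_deaths;
RUN;
```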
Formats
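To view commas in our data we need to specify the COMMA format using the FORMAT statement as follows (the example writes to a library named mylib, which is assumed to have been assigned earlier with a LIBNAME statement):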
DATA mylib.myDATA;
INPUT Department :$30. Months $ No_of_cases No_of_deaths :comma7.;
FORMAT No_of_deaths comma7.;
CARDS;
Cardiology Jan 7713 123,456
Cardiology Feb 243 126,243
Cardiology Mar 543 10,892
E&A Jan 772 4,243
E&A Feb 443 13,924
E&A Mar 82 121,632
;
RUN;
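A FORMAT statement can also be placed inside a procedure step, in which case it applies only to that step's output and the stored dataset is left unchanged. The sketch below assumes the WORK.mydata dataset from the Informats example is available.

```sas
/* Apply the comma format only while printing.               */
/* The format is not saved with the dataset; it lasts for    */
/* this PROC PRINT step only.                                */
PROC PRINT DATA=mydata NOOBS;
    FORMAT No_of_deaths comma7.;
RUN;
```

This is handy when you want commas in a report but do not want to change the attributes of the dataset itself.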
Formats and informats can be different
In SAS we can use different informats (while reading the dataset) and formats (while viewing it):
In the following DATA step we read dates with the mmddyy10. informat and display them with the date9. format.
DATA mylib.mydata;
INPUT Date1 mmddyy10.;
FORMAT Date1 date9.;
CARDS;
07/10/2017
02/12/1989
12/28/2001
;
RUN;
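SAS stores a date as a number (the count of days since 1 January 1960), so the same stored value can be displayed with any date format. The sketch below illustrates this with a single value written to the log; the variable name d is only for illustration.

```sas
/* Read one date with the mmddyy10. informat, then write it  */
/* to the log with and without a format.                     */
DATA _NULL_;
    d = INPUT('07/10/2017', mmddyy10.);
    PUT d=;              /* raw value: days since 01JAN1960 */
    PUT d= date9.;       /* d=10JUL2017                     */
    PUT d= ddmmyy10.;    /* d=10/07/2017                    */
RUN;
```

The informat decides how the text 07/10/2017 is converted to a number; the format decides only how that number is shown.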