• Ekta Aggarwal

Creating or reading datasets in SAS

In this tutorial we shall learn how to create or read datasets in SAS.


Key topics covered:


Let us firstly define our library:

libname mylib '/home/u17432927/My_DATAsets';

Data Types in SAS


There are majorly 2 datatypes in SAS: Numeric and character.

Numeric Data Type: A numeric data type consists of numbers, decimals, commas, currency symbols and dates. A numeric data type always has length = 8 irrespective of the number of digits it consititutes.

Character Data Type: A character data type majorly consists of strings with default length 8.


To learn more about how dates are created in SAS refer to this tutorial!


Creating datasets in SAS:


Syntax:

Firstly let us understand the common syntax:

DATA XXX;
INPUT ....;
CARDS (or DATALINES);
<Data values are entered>
;
RUN;

DATA: New dataset which needs to be created along with the libpath. If libpath is not mentioned then the data gets created in default WORK directory.

INPUT: contains the list of the column names with informats;

CARDS (or DATALINES): Tells SAS that data is to be read from next line onwards.

RUN: Tells SAS that the query is complete thus the commands can be executed as a whole.


Task: Let us create a dataset mydata with 4 columns: Department, Months, No of cases and no of deaths.


To denote character variables in SAS DATAset we write a $ sign after the column name. While for numeric columns we do not specify anything in front of the column name.

DATA mylib.myDATA;
INPUT Department $ Months $ No_of_cases No_of_deaths;
CARDS;
E&A Jan 772 42
E&A Feb 443 13
E&A Mar 82 12
;
RUN;

Our data has 3 rows and 4 columns:


Task: Let us include the data for cardiology department:

DATA mylib.myDATA;
INPUT Department $ Months $ No_of_cases No_of_deaths;
CARDS;
Cardiology Jan 7713 23
Cardiology Feb 243 21
Cardiology Mar 543 10
E&A Jan 772 42
E&A Feb 443 13
E&A Mar 82 12
;
RUN;

Note that Cardiology Department name got truncated to 8 letters.

Define the length of the variables in INPUT statement.


A character variable has by default length of 8 characters thus our string has got truncated. To tackle this we need to define our length in the INPUT statement:

In front of department we have written : $30. (denoting that Department variable can have strings upto 30 characters)

DATA mylib.myDATA;
INPUT Department : $30. Months $ No_of_cases No_of_deaths;
CARDS;
Cardiology Jan 7713 23
Cardiology Feb 243 21
Cardiology Mar 543 10
E&A Jan 772 42
E&A Feb 443 13
E&A Mar 82 12
;
RUN;

Why putting a COLON INFORMAT ' : ' is necessary ?


If we skip the : with our length then SAS will look for exactly 30 decimal places. Thus a colon informat tells SAS that the string can have less than 30 characters as well!

DATA mylib.myDATA;
INPUT Department $30. Months $ No_of_cases No_of_deaths;
CARDS;
Cardiology Jan 7713 23
Cardiology Feb 243 21
Cardiology Mar 543 10
E&A Jan 772 42
E&A Feb 443 13
E&A Mar 82 12
;
RUN;

Comma informat

Suppose we want to read commas in our data thus for this we use comma informats!

: comma7. - Helps SAS in reading numbers such as 123,456 (at most 7 places including a comma) or 12,345 or 345.

: commaN. - where N is the number of places to read (including a comma)

DATA mylib.myDATA;
INPUT Department :$30. Months $ No_of_cases No_of_deaths :comma7.;
CARDS;
Cardiology Jan 7713 123,456
Cardiology Feb 243 126,243
Cardiology Mar 543 10,892
E&A Jan 772 4,243
E&A Feb 443 13,924
E&A Mar 82 121,632
;
RUN;

Note that in the output you are still unable to view the commas. This is so because an informat tells SAS only to read the data properly.


Viewing comma format

To view commas in our data we need to specify COMMA FORMAT using the FORMAT statement as follows:

DATA mylib.myDATA;
INPUT Department :$30. Months $ No_of_cases No_of_deaths :comma7.;
FORMAT No_of_deaths :comma7.;
CARDS;
Cardiology Jan 7713 123,456
Cardiology Feb 243 126,243
Cardiology Mar 543 10,892
E&A Jan 772 4,243
E&A Feb 443 13,924
E&A Mar 82 121,632
;
RUN;

Decimal informats and formats


Suppose we want to read commas and decimals in our data thus for this we use comma with decimal informats!

: comma10.2 - Helps SAS in reading numbers such as 123,456.12 (at most 10 places including a comma and a comma) or 12,345.1 or 345.01.

: commaN.W - where N is the total number of places to read (including a comma and decimal), W is the number of decimals places to be read.


To view the decimals and commas in the output we need to specify COMMA and DECIMAL FORMAT.

DATA mylib.myDATA;
INPUT Department :$30. Months $ No_of_cases No_of_deaths :comma10.2;
FORMAT No_of_deaths :comma10.2;
CARDS;
Cardiology Jan 7713 123,456.12
Cardiology Feb 243 126,243.53
Cardiology Mar 543 10,892.23
E&A Jan 772 4,243.85
E&A Feb 443 13,924.2
E&A Mar 82 121,632.1
;
RUN;

Reading DATA from specified positions


SAS also provides the provision to read data when the data is written at specific position.

@ denotes the starting position

X-Y denotes start reading the data from position X till position Y.


For example, in the code below we are telling SAS to read:

Department : From 3rd position till at most 10th position.

Month: From 12 till 14th characters.

Number of deaths: Has a comma and decimal informat of 10.2

DATA mylib.myDATA;
INPUT @3 Department :$10. Months $ 12-14 No_of_cases No_of_deaths :comma10.2;
FORMAT No_of_deaths :comma10.2;
CARDS;
Cardiology Jan 7713 123,456.12
Cardiology Feb 243 126,243.53
Cardiology Mar 543 10,892.23
Neurology  Jan 772 4,243.85
Neurology  Feb 443 13,924.2
Neurology  Mar 82 121,632.1
;
RUN;

Missing Values


Case 1:Numeric values are missing


To enter data where numeric values are missing we define a dot ( . ) at the place where the numeric observation is missing.

At the time of viewing : a missing numeric value is viewed as a dot ( . )

DATA mylib.myDATA;
INPUT Department :$30. Months $ No_of_cases No_of_deaths;
CARDS;
Cardiology Jan 7713 23
Cardiology Feb . 21
Cardiology Mar 543 10
E&A Jan 772 42
E&A Feb 443 .
E&A Mar 82 12
;
RUN;

Case 2: Character values are missing

To enter data where character values are missing we define a dot ( . ) at the place where the observation is missing.

At the time of viewing : a missing character value is viewed as a blank space( . )

DATA mylib.myDATA;
INPUT Department :$30. Months :$ No_of_cases No_of_deaths;
CARDS;
Cardiology . 7713 23
Cardiology Feb . 21
Cardiology Mar 543 10
E&A . 772 42
E&A Feb 443 .
E&A Mar 82 12
;
RUN;

Reading data using a TXT file

We can also read data using a txt file. For this we will consider 3 cases:

  • TXT file with no missing values

  • TXT file with missing values only in numeric columns

  • TXT file with missing values in both numeric and character columns

For this we have uploaded 3 TXT files :

File1
.txt
Download TXT • 115B
File2
.txt
Download TXT • 110B
File3
.txt
Download TXT • 103B

Case 1: TXT file with no missing values


INFILE Statement: File Location and filename to be read.

INPUT: Define all the column names and informats

DATA mylib.myDATA;
INFILE '/home/u50132927/My_DATAsets/File1.txt';
INPUT Department : $30. Months $ No_of_cases No_of_deaths;
RUN;

Case 2: TXT file with missing values only in numeric columns


In out TXT files we have defined a numeric missing value as a dot (. ) thus SAS is able to recognise it.

DATA mylib.myDATA;
INFILE '/home/u50132927/My_DATAsets/File2.txt';
INPUT Department : $30. Months $ No_of_cases No_of_deaths;
RUN;

Case 3: TXT file with missing values in both numeric and character columns


In out TXT files we have defined a numeric missing value as a dot (. ) and a character missing value as two blank spaces.


DSD Keyword: To help SAS recognise the missing values we have used DSD keyword in INFILE. But DSD keyword treats a comma " , " as a separator, thus we have to explicitly specify DELIMITER = " " in the INFILE statement.

DATA mylib.myDATA;
INFILE '/home/u50132927/My_DATAsets/File3.txt' DSD DELIMITER  = " ";
INPUT Department : $30. Months $ No_of_cases No_of_deaths;
RUN;

We can also import data in SAS using SAS' inbuilt procedure: PROC IMPORT. Click here to learn it in detail!

Tags: