• Ekta Aggarwal

Apply family of functions

The apply family of functions plays an important role in R: it removes the hassle of writing tricky code to get the output for multiple variables. This tutorial shows how easy the apply family of functions is to use in R.


The apply family of functions mainly consists of:

1. apply

2. lapply

3. sapply

4. tapply


In this tutorial, we'll learn how to use all four of these functions.


apply:

Syntax:

apply(X, MARGIN, FUN, ...)

X - matrix, array or data frame (a data frame is converted to a matrix first)

MARGIN - 1, if the function needs to be applied to rows,

2, if the function needs to be applied to columns

c(1,2), if the function needs to be applied to every individual cell (both rows & columns)

FUN - mean, sum, min, max, sd (standard deviation) or any other built-in function; it can also be a user-defined function.

... - These arguments are used when you want to pass additional arguments on to FUN.
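The three MARGIN values and the "..." argument can be tried out on a small matrix. This is only a minimal sketch: the matrix m and the other variable names are made up for illustration and are not part of the tutorial's datasets.

```r
# A small 2 x 3 matrix, made up for illustration
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_sums <- apply(m, 1, sum)                    # one value per row: 9 12
col_sums <- apply(m, 2, sum)                    # one value per column: 3 7 11
squared  <- apply(m, c(1, 2), function(v) v^2)  # cell by cell, result is again 2 x 3

# "..." passes extra arguments on to FUN, here na.rm = TRUE for mean
m_na <- m
m_na[1, 1] <- NA
col_means <- apply(m_na, 2, mean, na.rm = TRUE) # 2.0 3.5 5.5
```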


Let's take a simple dataset, mtcars, which has 32 rows and 11 columns.

View(mtcars)
dim(mtcars)

Task: Computing row wise mean using apply function.

apply(mtcars, 1, mean)

The first argument is the dataset; 1 indicates that the operation is applied to rows; mean is the function applied to each row.


The output contains 32 row-wise means, one for each car.

Applying a function to columns is just as simple: in the last code, replace the second argument with 2 to get the column-wise means.
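A minimal sketch of the column-wise version follows. The variable name col_means is only illustrative, and the comparison with colMeans() is an addition, shown because base R provides it as a dedicated shortcut for this computation.

```r
# Column-wise means of mtcars: MARGIN = 2 applies mean to each column
col_means <- apply(mtcars, 2, mean)
length(col_means)   # 11, one mean per column

# colMeans() computes the same values
all.equal(col_means, colMeans(mtcars))
```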


lapply:

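Let's understand how lapply differs from the apply function.

1. lapply always returns a list, while apply returns a vector, a matrix (or array) or a list, depending on what the function returns.

2. It has no margin argument: it applies the function to each element of its input. The elements of a data frame are its columns, so on a data frame it always works column-wise.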

Syntax:

lapply(X, FUN, ...)

X - vector, list or data frame (a data frame is treated as a list of its columns)

FUN - mean, sum, min, max, sd (standard deviation) or any other built-in function; it can also be a user-defined function.

... - These arguments are used when you want to pass additional arguments on to FUN.

Let's take a simple dataset, iris, which has 150 rows and 5 columns.

View(iris)
dim(iris)
lapply(iris[,-5],mean)

In the first argument, we have excluded the 5th column because it is categorical; the average is calculated for the first four continuous variables.

Suppose we have missing values in the iris dataset. We can then use the third argument of lapply, "...", to pass na.rm=TRUE on to mean, which removes the missing values before the average is computed.
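The na.rm case can be sketched as follows. The real iris dataset has no missing values, so the copy iris_na with two NAs is an assumption introduced purely for illustration.

```r
# Copy of the numeric iris columns with missing values introduced
# for illustration only (iris itself contains no NAs)
iris_na <- iris[, -5]
iris_na[c(1, 10), "Sepal.Length"] <- NA

with_na <- lapply(iris_na, mean)                # Sepal.Length mean is NA
means   <- lapply(iris_na, mean, na.rm = TRUE)  # NAs dropped before averaging
```

Anything written after FUN in the call is handed over to mean unchanged, which is how na.rm = TRUE reaches it.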

sapply:

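Characteristics:

1. sapply is a wrapper around lapply that simplifies the result: it returns a vector (or a matrix) where possible, and a list otherwise.

2. Like lapply, it works on each element of its input, so on a data frame it works column-wise.

Syntax:

sapply(X, FUN, ...)

Task: Computing column wise mean using sapply function.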
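One possible way to carry out this task is sketched below, reusing the iris columns from the lapply section. The variable name col_means is only illustrative.

```r
# Column-wise means of the four numeric iris columns
col_means <- sapply(iris[, -5], mean)
col_means
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#     5.843333     3.057333     3.758000     1.199333 

is.list(col_means)                 # FALSE: sapply simplified the result to a named vector
is.list(lapply(iris[, -5], mean))  # TRUE: lapply keeps it as a list
```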

tapply:

Characteristics:

1. tapply returns a vector-like result (a one-dimensional array) with one named element per group.

2. It works on a single vector, typically one column of a data frame.

3. It is mainly used when you have a categorical variable and you need to do some computations on a numeric column for each group in that categorical variable.

Syntax:

tapply(X, INDEX, FUN)

X: Numeric column on which computations are to be done.

INDEX: Categorical column on the basis of which grouping is needed.

FUN: Function to be applied to each group.

unique(mtcars$cyl)

Task: In mtcars, the vehicles have 4, 6 or 8 cylinders. We split the numeric variable (mileage, i.e. mpg) by the number of cylinders and compute the mean of each group.

tapply(mtcars$mpg,mtcars$cyl,mean)

The first argument is the mileage of the cars, which is numeric; the second argument is the number of cylinders, with categories 4, 6 & 8. We are computing the average mileage for each number of cylinders.

It can be clearly observed that as the number of cylinders increases, the average mileage decreases.
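This trend can also be checked in code. A small sketch follows; the variable name avg_mpg and the diff() check are illustrative additions, not part of the original walkthrough.

```r
# Average mileage per cylinder group
avg_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
round(avg_mpg, 2)
#     4     6     8 
# 26.66 19.74 15.10 

# TRUE when every group mean is lower than the one before it
all(diff(avg_mpg) < 0)
```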


The following table explains the major differences among the four functions:

