- Ekta Aggarwal

# Creating dummy variables in Python

In this tutorial we will learn what are dummy variables and how to create them.

**What are dummy variables?**

Let us consider the data where we have employee information and the department they are working for. Since department is a categorical variable thus we can denote them in a 1 - 0 format.

Suppose we have 3 departments: consulting, technology and outsourcing. We can create 3 columns for each of them namely: Dep_consulting, dep_technology and dep_outsourcing.

When the department is consulting then Dep_consulting = 1 and other 2 variables are 0.

Similarly for technology department Dep_technology = 1 and other 2 will be 0.

Lastly, for outsourcing department dep_outsourcing = 1 and others as 0.

**Note: **It can never happen that there can be more than one 1 in a set of dummy variables. In a single row for dummy variables there can be at most one 1.

When categorical variables can be expressed in 1-0 notation - these are called** dummy variables.**

**Creating dummy variables in Python**

__Dataset:__

__Dataset:__

In this tutorial we will make use of following CSV file:

Let us read our file using pandas' read_csv function. Do specify the file path where your file is located:

```
import pandas as pd
mydata = pd.read_csv("C:\\Users\\Employee_info.csv")
```

Let us create a copy of our dataset as data1.

`data1 = mydata.copy()`

**Method 1: Using map( ) function** we can create a mapping between our variable and new value.

We need to create 3 dummy variables manually using map( ) function

For consulting dummy , map function creates a mapping that when department is consulting the value will be 1 and for other 2 departments it will be 0.

In a similar fashion other 2 dummy variables can be created.

```
data1['Consulting Dummy'] = data1.Department.map({"Consulting" : 1,"Technology":0, "Outsourcing" : 0})
data1['Outsourcing Dummy'] = data1.Department.map({"Consulting" : 0,"Technology":0, "Outsourcing" : 1})
data1['Technology Dummy'] = data1.Department.map({"Consulting" : 0,"Technology":1, "Outsourcing" : 0})
```

`data1.head()`

**Drawback:**

Suppose we have a categorical variable with 50 categories then creating dummy variables with map would be too cumbersome and mundane. To mitigate this we have pandas' get_dummies( ) function!

**Method 2: Using pandas' get_dummies( ) function** we can create dummy variables with a single line of code.

Let us create another copy of our data

`data2 =mydata.copy()`

Using pandas' get_dummies( ) function we can create dummy variables with a single line of code.

For each department name have added a prefix **"Dep"**

`pd.get_dummies(data2.Department,prefix = "Dep")`

get_dummies( ) only creates dummy variables. To append it in our data we use pandas' concat function:

Let us firstly save our dummy variables in a dataset.

`dummy_variables = pd.get_dummies(data2.Department,prefix = "Dep")`

We now concatenate our original data using pd.concat( ) , by defining axis = 1 or axis = "columns" we are telling Python to add the columns horizontally (and not append them as rows).

```
pd.concat([data2,dummy_variables],axis = 1)
#alternatively
pd.concat([data2,dummy_variables],axis = "columns")
```

When Dep_Consulting = 1 and Dep_Technology is 0 then it is self-implied that dep_outsourcing will be 0. Thus in this case we only need 3-1 = 2 dummy variables. We can drop the first column by specifying **drop_first = True.**

```
dummy_variables = pd.get_dummies(data2.Department,prefix = "Dep",drop_first=True)
pd.concat([data2,dummy_variables],axis = "columns")
```