**Originally published: 26/11/2018 09:19**

Publication number: ELQ-28959-1

# How to Include Dummy Variables into a Regression

Learn how to include Dummy Variables into a Regression.

## Introduction

Learning how to include dummy variables in a regression is the best way to round off your introduction to the world of linear regressions. Another useful concept you can learn is Ordinary Least Squares. But now, onto dummy variables. Apart from its offensive use, the word “dummy” has another meaning: an imitation or a copy that stands in as a substitute.

## What are we About to Learn

In regression analysis, a dummy is a variable used to include categorical data in a regression model. In previous tutorials, we used only numerical data. We did that when we first introduced linear regressions and again when we explored the adjusted R-squared. However, numbers fit naturally on a scale, while categories such as gender or season do not. It’s time to find out how to include such variables in the regression we are working with.

## Including Categorical Data for the First Time

First, make sure you have read the article where we took our first steps into the world of linear regressions. We will be using the SAT-GPA example from there. If you don’t have time to read it, here is a brief explanation: based on a student’s SAT score, we can predict their GPA. Now, we can improve our prediction by adding another regressor: attendance.

The dataset for this example includes a variable that measures whether a student attended more than 75% of their university lectures. Keep in mind that this is categorical data, so we cannot simply put it in the regression.

## Using a Dummy Variable

The time has come to write some code. We begin by importing the relevant libraries:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()

After that, let’s load the file '1.03. Dummies.csv' into the variable raw_data. You can download the file from here. If you don’t know how to load it, here’s what you need to type:

    raw_data = pd.read_csv('1.03. Dummies.csv')

Now, let’s simply write

    raw_data

and see what happens.

The output shows a third column named ‘Attendance’. It reflects whether a student attended more than 75% of the lessons and takes one of two values: Yes or No.

## Mapping Values

What we usually do in such cases is map the Yes/No values to 1s and 0s. In this way, if the student attended more than 75% of the lessons, the dummy will be equal to 1. Otherwise, it will be 0.
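A minimal sketch of this mapping in pandas is shown here. The four rows are made-up stand-ins for the contents of '1.03. Dummies.csv', and the column names SAT and GPA are assumed from the SAT-GPA example; only the ‘Attendance’ column name comes from the dataset described above.

```python
import pandas as pd

# Hypothetical rows standing in for '1.03. Dummies.csv'; the values are made up
# and the 'SAT' and 'GPA' column names are assumptions.
raw_data = pd.DataFrame({
    'SAT': [1700, 1650, 1800, 1750],
    'GPA': [3.2, 2.9, 3.6, 3.1],
    'Attendance': ['No', 'Yes', 'Yes', 'No'],
})

# Work on a copy so the raw data stays untouched.
data = raw_data.copy()

# Replace each category with its numeric stand-in: Yes -> 1, No -> 0.
data['Attendance'] = data['Attendance'].map({'Yes': 1, 'No': 0})

print(data)
```

Note that map returns NaN for any value missing from the dictionary, so counting missing values with data['Attendance'].isna().sum() is a quick way to catch unexpected labels such as a lowercase 'yes'.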

So, we will have transformed our yes/no question into 0s and 1s. That is where the name “dummy” comes from: we are imitating the categories with numbers.
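With the dummy in numeric form, the extended SAT-GPA model can be sketched as an equation. The symbols $b_0$, $b_1$, $b_2$ and the name $D$ for the dummy are notation chosen here for illustration, not taken from the original example.

```latex
\widehat{\text{GPA}} = b_0 + b_1 \cdot \text{SAT} + b_2 \cdot D,
\qquad
D =
\begin{cases}
1 & \text{if the student attended more than 75\% of the lessons} \\
0 & \text{otherwise}
\end{cases}
```

Setting $D = 0$ gives $\widehat{\text{GPA}} = b_0 + b_1 \cdot \text{SAT}$, while $D = 1$ gives $\widehat{\text{GPA}} = (b_0 + b_2) + b_1 \cdot \text{SAT}$. Under this specification the dummy shifts the intercept by $b_2$ and leaves the slope on SAT unchanged.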
