
Publication number: ELQ-28959-1

How to Include Dummy Variables into a Regression
Learn how to include Dummy Variables into a Regression.
Introduction
Learning how to include dummy variables in a regression is a good way to round off your introduction to the world of linear regressions. Another useful concept you can learn is ordinary least squares (OLS). But now, on to dummy variables. Apart from its offensive use, the word “dummy” has another meaning: an imitation or a copy that stands in as a substitute.
- Step n°1 |
What are we About to Learn
In regression analysis, a dummy is a variable that is used to include categorical data in a regression model. In previous tutorials, we have only used numerical data. We did that when we first introduced linear regressions and again when we were exploring the adjusted R-squared. Numbers fit naturally on a scale, but categories such as gender or season do not. It’s time to find out how to include such variables in a regression we are working with.
- Step n°2 |
Including Categorical Data for the First Time
First, make sure that you check the article where we took our first steps into the world of linear regressions. We will be using the SAT-GPA example from there. If you don’t have time to read it, here is a brief explanation: based on a student’s SAT score, we can predict their GPA. Now, we can improve our prediction by adding another regressor: attendance.
In the picture below, you can see a dataset that includes a variable that measures whether a student attended more than 75% of their university lectures.
Keep in mind that this is categorical data, so we cannot simply put it in the regression.
- Step n°3 |
Using a Dummy Variable
The time has come to write some code. We can begin by importing the relevant libraries by writing:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
After that, let’s load the file '1.03. Dummies.csv' into the variable raw_data. You can download the file from here. If you don’t know how to load it, here’s what you need to type:
raw_data = pd.read_csv('1.03. Dummies.csv')
Now, let’s simply write
raw_data
and see what happens.
As you can tell from the picture, there is a third column named ‘Attendance’. It records whether a student attended more than 75% of the lessons and takes two values: Yes and No.
- Step n°4 |
Mapping Values
What we would usually do in such cases is map the Yes/No values to 1s and 0s. In this way, if the student attended more than 75% of the lessons, the dummy will be equal to 1. Otherwise, it will be 0.
So, we will have transformed our Yes/No answers into 1s and 0s. That’s what the name “dummy” stands for: we are imitating the categories with numbers.
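The mapping described above can be sketched with the pandas `map` method. This is a minimal illustration, not necessarily the exact code the tutorial goes on to use. The three rows below are invented stand-ins for the contents of '1.03. Dummies.csv', and the column names 'SAT' and 'GPA' are assumptions. Only 'Attendance' is named in the text.

```python
import pandas as pd

# Hypothetical rows standing in for '1.03. Dummies.csv'.
# The numbers and the 'SAT'/'GPA' column names are assumptions for illustration.
raw_data = pd.DataFrame({
    "SAT": [1700, 1650, 1800],
    "GPA": [2.4, 2.5, 3.1],
    "Attendance": ["No", "Yes", "No"],
})

# Work on a copy so the original Yes/No column stays available.
data = raw_data.copy()

# Replace each category with its numeric stand-in: Yes -> 1, No -> 0.
data["Attendance"] = data["Attendance"].map({"Yes": 1, "No": 0})

print(data)
```

One reason to prefer `map` with an explicit dictionary is that any value missing from the dictionary (a typo such as 'yes', for instance) becomes NaN, so unexpected categories surface instead of being silently encoded.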