# Multiple Linear Regression

#### Renuka Joshi

I am a technology enthusiast and always up for challenges. Recently I have started getting hands dirty in machine learning using python and aspiring to gather everything I can.

Hello all! In the previous article we saw how to implement simple linear regression in machine learning. In this tutorial we will implement multiple linear regression.

## Multiple Linear Regression

Multiple linear regression is similar to simple linear regression. In simple linear regression we had only one dependent and one independent variable, whereas in multiple linear regression we will teach the machine to predict the value of the dependent variable from two or more independent variables. Let's get started.

Mathematical equation of multiple linear regression:

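y = b0 + b1X1 + b2X2 + b3X3 + … + bnXn

Here,

• y is the dependent variable
• b0 is the intercept (constant term)
• b1, b2, …, bn are constants (the coefficients of the independent variables)
• X1, X2, …, Xn are the independent variables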
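As a quick numeric check of this equation, here is a minimal sketch with three independent variables. The coefficient and input values are made up for illustration and are not fitted to any data. The sketch also includes an intercept term b0, which scikit-learn's LinearRegression fits by default.

```python
import numpy as np

# Hypothetical values, chosen only for illustration (not fitted to the dataset).
b0 = 30000.0                           # intercept
b = np.array([2.0, 1500.0, 8000.0])    # b1, b2, b3

# One observation with three independent variables X1, X2, X3.
x = np.array([2000.0, 2.0, 5.0])

# y = b0 + b1*X1 + b2*X2 + b3*X3
y = b0 + np.dot(b, x)
print(y)  # 77000.0
```

Each independent variable contributes its value multiplied by its own constant, and the contributions are simply added up.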

## Dataset

Using multiple linear regression, we will predict the salary of employees from their years of experience, total number of certifications, total number of worked hours and the department in which they work.

| Department | WorkedHours | Certification | YearsExperience | Salary |
|---|---|---|---|---|
| Development | 2300 | 0 | 1.1 | 39343 |
| Testing | 2100 | 1 | 1.3 | 46205 |
| Development | 2104 | 2 | 1.5 | 37731 |
| UX Designer | 1200 | 1 | 2 | 43525 |
| Testing | 1254 | 2 | 2.2 | 39891 |
| UX Designer | 1236 | 1 | 2.9 | 56642 |
| Development | 1452 | 2 | 3 | 60150 |
| Testing | 1789 | 1 | 3.2 | 54445 |
| UX Designer | 1645 | 1 | 3.2 | 64445 |
| UX Designer | 1258 | 0 | 3.7 | 57189 |
| Testing | 1478 | 3 | 3.9 | 63218 |
| Development | 1257 | 2 | 4 | 55794 |
| Development | 1596 | 1 | 4 | 56957 |
| Testing | 1256 | 2 | 4.1 | 57081 |
| UX Designer | 1489 | 3 | 4.5 | 61111 |
| Development | 1236 | 3 | 4.9 | 67938 |
| Testing | 2311 | 2 | 5.1 | 66029 |
| UX Designer | 2245 | 3 | 5.3 | 83088 |
| Development | 2365 | 1 | 5.9 | 81363 |
| Development | 1500 | 3 | 6 | 93940 |
| Testing | 1456 | 2 | 6.8 | 91738 |
| Testing | 1760 | 1 | 7.1 | 98273 |
| UX Designer | 2400 | 4 | 7.9 | 101302 |
| Development | 2148 | 3 | 8.2 | 113812 |
| UX Designer | 1450 | 2 | 8.7 | 109431 |
| UX Designer | 1000 | 4 | 9 | 105582 |
| Testing | 1540 | 3 | 9.5 | 116969 |
| Development | 1500 | 2 | 9.6 | 112635 |
| Testing | 3000 | 4 | 10.3 | 122391 |
| UX Designer | 2100 | 3 | 10.5 | 121872 |

## Data Preprocessing

We will use the data preprocessing template which we created previously.
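A minimal sketch of what such a preprocessing step could look like is given here. It is not necessarily the exact template from the earlier article: the use of `ColumnTransformer` with `OneHotEncoder` is an assumption, the column names are assumed to match the table above, and only the first six rows are inlined so that the snippet runs without the CSV file.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First six rows of the dataset, inlined so the sketch runs on its own.
# With the file on disk this would be pd.read_csv("Employee_Data.csv").
csv_text = """Department,WorkedHours,Certification,YearsExperience,Salary
Development,2300,0,1.1,39343
Testing,2100,1,1.3,46205
Development,2104,2,1.5,37731
UX Designer,1200,1,2,43525
Testing,1254,2,2.2,39891
UX Designer,1236,1,2.9,56642
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# Independent variables (all columns but the last) and dependent variable (Salary).
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1].to_numpy()

# One-hot encode the categorical Department column; pass numeric columns through.
encoder = ColumnTransformer(
    transformers=[("department", OneHotEncoder(), ["Department"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = encoder.fit_transform(X)

print(X_encoded.shape)  # (6, 6): 3 dummy columns + 3 numeric columns
print(X_encoded[0])
```

The encoded columns come first (one per department, in alphabetical order), followed by the untouched numeric columns.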

Here, we have preprocessed our data.

## Dummy Variables

When we encode categorical data, the scikit-learn library in Python creates a separate column for each category. For example, our dataset 'Employee_Data.csv' contains Department as categorical data, with the values Development, Testing and UX Designer. So when we encode this column, we get a separate column for each of the three categories, as follows.

| Development | Testing | UX |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 0 | 0 | 1 |

While implementing multiple linear regression we will eliminate one dummy variable. Each row contains the value 1 in exactly one of the columns, so any one column can be worked out from the other two and carries no extra information. Keeping all three columns together with the intercept makes the columns linearly dependent, a problem known as the dummy variable trap. Consider the first row, where Development = 1, Testing = 0 and UX = 0:

• If we remove the Development column, Testing and UX are both 0, which means Development must be 1
• If we remove the Testing column, Development is already 1, so Testing must be 0
• If we remove the UX column, Development is already 1, so UX must be 0
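The redundancy described above can be checked numerically. The following sketch uses only the four example rows from the dummy variable table and NumPy; the intercept column of ones is added here as an assumption, to mirror the constant term that a linear model normally fits.

```python
import numpy as np

# Dummy columns from the table above: Development, Testing, UX.
dummies = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])

# Every row sums to 1, so any one column equals 1 minus the other two.
development = dummies[:, 0]
recovered = 1 - dummies[:, 1] - dummies[:, 2]
print(np.array_equal(development, recovered))  # True

# With an intercept column of ones, keeping all three dummies makes the
# four columns linearly dependent: the rank is only 3.
intercept = np.ones((4, 1))
full = np.hstack([intercept, dummies])
print(np.linalg.matrix_rank(full))  # 3

# Dropping one dummy (here Development) removes the redundancy:
# three columns, rank 3.
reduced = np.hstack([intercept, dummies[:, 1:]])
print(np.linalg.matrix_rank(reduced))  # 3
```

Whichever dummy column is dropped, the remaining columns still identify every department, so no information is lost.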

## Multiple Linear Regression Implementation

Now we will implement multiple linear regression in Python.
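A minimal end-to-end sketch follows, assuming the column names from the dataset table, an 80/20 train/test split and a fixed random seed (the split ratio and seed are assumptions, not values taken from the article). The data is inlined so the snippet runs without 'Employee_Data.csv', and one dummy column is dropped with `OneHotEncoder(drop="first")`.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# The dataset from the table above, inlined so the sketch runs on its own.
# With the file on disk this would be pd.read_csv("Employee_Data.csv").
csv_text = """Department,WorkedHours,Certification,YearsExperience,Salary
Development,2300,0,1.1,39343
Testing,2100,1,1.3,46205
Development,2104,2,1.5,37731
UX Designer,1200,1,2,43525
Testing,1254,2,2.2,39891
UX Designer,1236,1,2.9,56642
Development,1452,2,3,60150
Testing,1789,1,3.2,54445
UX Designer,1645,1,3.2,64445
UX Designer,1258,0,3.7,57189
Testing,1478,3,3.9,63218
Development,1257,2,4,55794
Development,1596,1,4,56957
Testing,1256,2,4.1,57081
UX Designer,1489,3,4.5,61111
Development,1236,3,4.9,67938
Testing,2311,2,5.1,66029
UX Designer,2245,3,5.3,83088
Development,2365,1,5.9,81363
Development,1500,3,6,93940
Testing,1456,2,6.8,91738
Testing,1760,1,7.1,98273
UX Designer,2400,4,7.9,101302
Development,2148,3,8.2,113812
UX Designer,1450,2,8.7,109431
UX Designer,1000,4,9,105582
Testing,1540,3,9.5,116969
Development,1500,2,9.6,112635
Testing,3000,4,10.3,122391
UX Designer,2100,3,10.5,121872
"""
dataset = pd.read_csv(io.StringIO(csv_text))
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1].to_numpy()

# Encode Department and drop the first dummy column (dummy variable trap).
encoder = ColumnTransformer(
    transformers=[("department", OneHotEncoder(drop="first"), ["Department"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X = encoder.fit_transform(X)

# Hold out 20% of the rows for testing (ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training set.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict salaries for the test set and compare with the actual values.
y_pred = regressor.predict(X_test)
print(y_pred.round(0))
print(y_test)
```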

In the above code snippet we have used the data preprocessing template and, after splitting the dataset into training and test sets, we have used the LinearRegression class from the sklearn.linear_model module, exactly as we did in simple linear regression. After executing the above code, y_pred holds the predicted values for X_test; that is, y_pred contains the salaries predicted from the data available in X_test.
