# Multiple Linear Regression

Hello all! In the previous article we saw how to implement simple linear regression in machine learning. In this tutorial we will implement multiple linear regression.

## Multiple Linear Regression

Multiple linear regression is similar to simple linear regression. In simple linear regression we had one dependent variable and one independent variable, whereas in multiple linear regression we teach the machine to predict the value of the dependent variable from two or more independent variables. Let's get started.

Mathematical equation of multiple linear regression:

$$y = b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n$$

Here,

- $y$ is the dependent variable
- $b_1, b_2, \dots$ are constants
- $X_1, X_2, \dots$ are independent variables
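To make the equation concrete, here is a minimal sketch of the arithmetic for three independent variables. The coefficient and feature values are hypothetical and chosen only for illustration; they are not fitted from the dataset used in this tutorial.

```python
# Hypothetical coefficients b1..b3 and feature values X1..X3, chosen only
# to illustrate the arithmetic; they are not fitted from any dataset.
coefficients = [2.0, 500.0, 9000.0]   # b1, b2, b3
features = [10.0, 2.0, 3.0]           # X1, X2, X3

# y = b1*X1 + b2*X2 + b3*X3
y = sum(b * x for b, x in zip(coefficients, features))
print(y)  # 28020.0
```

Each independent variable contributes its own term, and the prediction is simply the sum of those terms.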

## Dataset

Using multiple linear regression, we will predict the salary of employees from their years of experience, total number of certifications, total number of hours worked and the department in which they work.

| Department | WorkedHours | Certification | YearsExperience | Salary |
|---|---|---|---|---|
| Development | 2300 | 0 | 1.1 | 39343 |
| Testing | 2100 | 1 | 1.3 | 46205 |
| Development | 2104 | 2 | 1.5 | 37731 |
| UX Designer | 1200 | 1 | 2 | 43525 |
| Testing | 1254 | 2 | 2.2 | 39891 |
| UX Designer | 1236 | 1 | 2.9 | 56642 |
| Development | 1452 | 2 | 3 | 60150 |
| Testing | 1789 | 1 | 3.2 | 54445 |
| UX Designer | 1645 | 1 | 3.2 | 64445 |
| UX Designer | 1258 | 0 | 3.7 | 57189 |
| Testing | 1478 | 3 | 3.9 | 63218 |
| Development | 1257 | 2 | 4 | 55794 |
| Development | 1596 | 1 | 4 | 56957 |
| Testing | 1256 | 2 | 4.1 | 57081 |
| UX Designer | 1489 | 3 | 4.5 | 61111 |
| Development | 1236 | 3 | 4.9 | 67938 |
| Testing | 2311 | 2 | 5.1 | 66029 |
| UX Designer | 2245 | 3 | 5.3 | 83088 |
| Development | 2365 | 1 | 5.9 | 81363 |
| Development | 1500 | 3 | 6 | 93940 |
| Testing | 1456 | 2 | 6.8 | 91738 |
| Testing | 1760 | 1 | 7.1 | 98273 |
| UX Designer | 2400 | 4 | 7.9 | 101302 |
| Development | 2148 | 3 | 8.2 | 113812 |
| UX Designer | 1450 | 2 | 8.7 | 109431 |
| UX Designer | 1000 | 4 | 9 | 105582 |
| Testing | 1540 | 3 | 9.5 | 116969 |
| Development | 1500 | 2 | 9.6 | 112635 |
| Testing | 3000 | 4 | 10.3 | 122391 |
| UX Designer | 2100 | 3 | 10.5 | 121872 |

## Data Preprocessing

We will use the data preprocessing template that we created previously.

```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Employee_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Encoding categorical data: one-hot encode the department column (index 0).
# ColumnTransformer replaces the OneHotEncoder(categorical_features=...) argument,
# which has been removed from recent scikit-learn releases.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('department', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0)
X = np.array(ct.fit_transform(X), dtype=float)

# Avoiding the Dummy Variable Trap (drop the first dummy column)
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
# (train_test_split now lives in sklearn.model_selection, not sklearn.cross_validation)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
```

Here, we have preprocessed our data.
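As a quick check on the split step, the sketch below applies `train_test_split` with the same `test_size` and `random_state` to stand-in arrays that have 30 rows, like the dataset table. The names `X_demo` and `y_demo` and their contents are placeholders for illustration, not the real employee data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows (30) as the dataset table.
# X_demo and y_demo are placeholder names, not the real employee data.
X_demo = np.arange(60).reshape(30, 2)
y_demo = np.arange(30)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0
)

print(X_train.shape, X_test.shape)  # (20, 2) (10, 2)
```

With 30 rows and `test_size = 1/3`, 10 rows are held out for testing and 20 remain for training.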

## Dummy Variables

When we encode categorical data, the scikit-learn library in Python creates a separate column for each category. For example, our dataset 'Employee_Data.csv' contains Department as categorical data with the values Development, Testing and UX Designer. So when we encode this data, a separate column is created for each of the three categories, as follows.

| Development | Testing | UX |
|---|---|---|
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 0 | 0 | 1 |

While implementing multiple linear regression we will eliminate one dummy variable. Each row contains the value 1 in exactly one of the dummy columns, so any one column can always be worked out from the other two. Keeping all three columns would therefore add redundant information, which is known as the dummy variable trap. Consider the first row, where Development = 1, Testing = 0 and UX = 0:

- If we remove the Development column, Testing and UX are both 0, which means Development must be 1.
- If we remove the Testing column, Development is already 1, so Testing must be 0.
- If we remove the UX column, Development is already 1, so UX must be 0.
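The reasoning above can be checked with a small sketch. It uses `pandas.get_dummies` rather than the `OneHotEncoder` used in this tutorial's code, purely because it keeps the example short; the variable names are illustrative.

```python
import pandas as pd

# First four departments from the dataset table.
departments = pd.Series(
    ["Development", "Testing", "Development", "UX Designer"],
    name="Department",
)

# Full encoding: one 0/1 column per category.
full = pd.get_dummies(departments, dtype=int)

# Reduced encoding: the first category (Development) is dropped.
reduced = pd.get_dummies(departments, drop_first=True, dtype=int)

# The dropped column is fully determined by the remaining ones.
recovered = 1 - reduced.sum(axis=1)

print(reduced)
print((recovered == full["Development"]).all())  # True
```

Because the Development column can be rebuilt from the Testing and UX Designer columns, dropping it loses no information.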

## Multiple Linear Regression Implementation

Now we will implement multiple linear regression using Python.

```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Employee_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Encoding categorical data: one-hot encode the department column (index 0).
# ColumnTransformer replaces the OneHotEncoder(categorical_features=...) argument,
# which has been removed from recent scikit-learn releases.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('department', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0)
X = np.array(ct.fit_transform(X), dtype=float)

# Avoiding the Dummy Variable Trap (drop the first dummy column)
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
# (train_test_split now lives in sklearn.model_selection, not sklearn.cross_validation)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)
```

In the above code snippet we reused the data preprocessing template. After splitting the dataset into training and test sets, we used the LinearRegression class from the sklearn.linear_model module, exactly as we did for simple linear regression. After executing the code, y_pred holds the predicted values for X_test; that is, y_pred contains the salaries predicted from the data available in X_test.
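To see how the fitted model relates to the regression equation, here is a minimal sketch on the first ten rows of the dataset table. As a simplifying assumption it uses only the three numeric columns and leaves out the department, so the fitted numbers will differ from the full model above; `X_small`, `y_small` and `model` are illustrative names. Note that `LinearRegression` also fits an intercept term by default, exposed as `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# First ten rows of the dataset table: WorkedHours, Certification, YearsExperience.
# The department column is deliberately left out to keep the sketch short.
X_small = np.array([
    [2300, 0, 1.1], [2100, 1, 1.3], [2104, 2, 1.5], [1200, 1, 2.0],
    [1254, 2, 2.2], [1236, 1, 2.9], [1452, 2, 3.0], [1789, 1, 3.2],
    [1645, 1, 3.2], [1258, 0, 3.7],
])
y_small = np.array([39343, 46205, 37731, 43525, 39891,
                    56642, 60150, 54445, 64445, 57189])

# Fit on the first seven rows and predict the remaining three.
model = LinearRegression().fit(X_small[:7], y_small[:7])
pred = model.predict(X_small[7:])

# predict() evaluates the regression equation plus the fitted intercept.
manual = model.intercept_ + X_small[7:] @ model.coef_
print(np.allclose(pred, manual))  # True

# One coefficient per independent variable, plus the intercept.
print(model.coef_, model.intercept_)

# R^2 on the held-out rows; with so few rows this is only a rough indication.
print(r2_score(y_small[7:], pred))
```

The same attributes (`coef_`, `intercept_`) are available on the `regressor` object fitted above.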
