We learned about multiple linear regression and backward elimination in previous articles. Now we will look at another form of regression, polynomial regression, and use it to solve an interesting problem.
Before we go into the details of polynomial regression, let us consider our case study first.
Suppose our company wants to fill certain positions. It has found a potential employee who has been working as a vice president for the last 2 years. He expects a salary of 190000 for his 2 years of experience as vice president.
Now the question is, does his demand fit into our company’s salary structure? If his demand fits, how much can we really offer him?
Our company’s salary and position data:
The company has approximate salaries according to positions as below.
|Position|Level|Salary|
|Sr. Software Engineer|2|24000|
|Associate Vice President|7|110000|
Usually in our company, an employee rises from Vice President to President level in 6 years. Since this candidate has completed 2 of those 6 years, one third of the tenure, we will predict the salary for level 8.3.
Why use Polynomial Regression?
Let us understand why we are using polynomial regression instead of linear regression. The dataset does not look linear: at higher positions, salaries rise non-linearly. Let us first check whether linear regression gives good predictions.
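The level arithmetic can be sketched in a few lines. This is only an illustration: it assumes Vice President corresponds to level 8 and President to level 9, which the text implies but does not state outright, and the variable names are made up for this example.

```python
# Assumption: Vice President is level 8 and the next level (President) is 9.
vp_level = 8
years_in_role = 2          # candidate's time as vice president
years_to_next_level = 6    # typical time from Vice President to President

# Fraction of the way towards the next level, added to the current level
target_level = vp_level + years_in_role / years_to_next_level
print(round(target_level, 1))  # 8.3
```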
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values # This line creates a matrix
y = dataset.iloc[:, 2].values
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
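lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Predicting the salary for level 8.3 (predict expects a 2D array)
salary = lin_reg.predict([[8.3]])
# Visualising the Linear Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.show()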
Note the graph plotted:
The red dots are the actual salaries plotted against position levels. For level 8.3, our model predicts a salary of approximately 239851, which is far more than the potential employee asked for.
The blue line is the regression line, and its predictions are far from the actual salaries in most cases. We need a different model, so let us look into polynomial regression.
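To see why a straight line struggles with curved data, here is a minimal sketch on made-up numbers. The data below is hypothetical (y = x squared for levels 1 to 10) and is not the company dataset; it only shows the typical pattern where a linear fit underestimates at both ends and overestimates in the middle.

```python
import numpy as np

# Hypothetical curved data, NOT the Position_Salaries.csv values
x = np.arange(1, 11, dtype=float)
y = x ** 2

# Least-squares straight line; polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x, y, 1)

# Residual = actual value minus the line's prediction
residuals = y - (slope * x + intercept)
print(np.round(residuals, 2))
```

The residuals change sign in a systematic way instead of scattering randomly around zero, which is a common hint that a linear model is missing curvature.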
Polynomial Regression Model:
The equation of the polynomial regression model is
$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$
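As a small worked example of this equation, the sketch below evaluates a degree-2 polynomial for one input. The coefficients are arbitrary values chosen for illustration, not fitted values from the salary data.

```python
from numpy.polynomial import polynomial as P

# Hypothetical coefficients [b0, b1, b2], lowest power first
b = [1.0, 2.0, 3.0]
x1 = 2.0

# y = b0 + b1*x1 + b2*x1^2 = 1 + 2*2 + 3*4
y = P.polyval(x1, b)
print(y)  # 17.0
```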
We will add polynomial powers of the position level to the existing dataset.
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(X_poly), color = 'blue')
plt.show()
# Predicting a new result with Polynomial Regression
# (transform, not fit_transform: poly_reg is already fitted; input must be 2D)
salary = lin_reg_2.predict(poly_reg.transform([[8.3]]))
- Import the PolynomialFeatures class from sklearn.preprocessing.
- Create the poly_reg object with 4th-degree polynomial features, i.e. for our case the equation becomes $y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + b_4 x_1^4$.
- X_poly = poly_reg.fit_transform(X) expands the X matrix into X_poly, where each column contains a power of x.
- lin_reg_2.fit(X_poly, y) fits the model to the transformed X_poly matrix and the salary data. Note that the columns hold the powers of x in increasing order, starting with x^0 = 1.
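To make the expansion step concrete, here is a minimal sketch that applies PolynomialFeatures to two small values. The inputs 2 and 3 are arbitrary examples, not rows from the salary dataset.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two example inputs as a column matrix (one feature per row)
X_small = np.array([[2.0], [3.0]])

poly = PolynomialFeatures(degree=4)
X_small_poly = poly.fit_transform(X_small)

# Each row holds x^0, x^1, x^2, x^3, x^4
print(X_small_poly)
# [[ 1.  2.  4.  8. 16.]
#  [ 1.  3.  9. 27. 81.]]
```

The leading column of ones comes from the default include_bias=True setting of PolynomialFeatures.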
Now, note the graph plotted by polynomial regression:
The salary predicted by polynomial regression is approximately 189117, which is quite close to the 190000 he asked for and fits our company's salary structure.
Our company can hire the new vice president, and he will happily join us.
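As an optional variation, the transform and fit steps can be chained so that new inputs are expanded automatically at prediction time. The sketch below uses scikit-learn's make_pipeline on made-up data generated from a known quadratic, not the salary dataset, so the result can be checked by hand. It is one possible arrangement, not the method used above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data from y = 1 + 2x + 3x^2 (for illustration only)
X_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = 1 + 2 * X_demo.ravel() + 3 * X_demo.ravel() ** 2

# The pipeline expands the features, then fits a linear model on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_demo, y_demo)

# New inputs go through the same expansion before prediction
prediction = model.predict([[8.3]])[0]
print(round(prediction, 2))
```

Because the data follows the quadratic exactly, the prediction should match 1 + 2(8.3) + 3(8.3)^2 = 224.27 up to floating-point error.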