# Polynomial Regression

We learned about multiple linear regression and backward elimination in previous articles. Now we will learn about another form of regression, polynomial regression, and use it to solve an interesting problem.

# Polynomial Regression:

Before we go into details of polynomial regression, let us consider our case study first.

## Problem Statement:

Suppose our company wants to recruit people for certain positions. They have found a potential employee who has been working as a vice president for the last 2 years. He expects a salary of 190000 for his 2 years of experience as vice president.

Now the question is, does his demand fit into our company’s salary structure? If his demand fits, how much can we really offer him?

### Our company’s salary and position data:

The company has approximate salaries according to positions as below.

| Position | Level | Salary |
|---|---|---|
| Software Engineer | 1 | 22000 |
| Sr. Software Engineer | 2 | 24000 |
| Technology Lead | 3 | 30000 |
| Team Leader | 4 | 38000 |
| Manager | 5 | 50000 |
| Senior Manager | 6 | 75000 |
| Associate Vice President | 7 | 110000 |
| Vice President | 8 | 170000 |
| President | 9 | 260000 |
| CEO | 10 | 480000 |

In our company, an employee usually rises from Vice President to President in 6 years. Since this employee has already completed 2 of those 6 years, i.e. one third of the tenure, we will predict the salary for level 8.3.
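As a quick sanity check, the interpolated level can be computed from the tenure figures above (the 6-year promotion period is from the paragraph, not from the dataset):

```python
# Vice President is level 8; President is level 9.
# The employee has served 2 of the ~6 years needed for promotion,
# i.e. one third of the way from level 8 to level 9.
years_served = 2
years_to_president = 6
level = 8 + years_served / years_to_president
print(round(level, 1))  # 8.3
```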

### Why use Polynomial Regression?

Let us understand why we are using polynomial regression instead of linear regression. Looking at the dataset, the relationship does not seem linear: towards the higher positions, salaries grow non-linearly. Let us first check whether linear regression provides any good predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # slicing with 1:2 keeps X a 2D matrix
y = dataset.iloc[:, 2].values

# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
salary = lin_reg.predict([[8.3]])  # predict expects a 2D array

# Visualising the Linear Regression results
plt.scatter(X, y, color='red')
plt.plot(X, lin_reg.predict(X), color='blue')
plt.title('Linear Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
```

Note the graph plotted:

The red dots are the actual salaries plotted against position levels. For level 8.3, our model predicts a salary of approximately 239851, which is far more than the potential employee asked for.

The blue line is the regression line, and its predictions are far from reality in most cases. We need a different model, so let us look into polynomial regression.

### Polynomial Regression Model:

The equation of the polynomial regression model is

y = b₀ + b₁x₁ + b₂x₁² + b₃x₁³ + … + bₙx₁ⁿ

We will add polynomial powers of the existing position levels to the dataset.

```python
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# Visualising the Polynomial Regression results
plt.scatter(X, y, color='red')
plt.plot(X, lin_reg_2.predict(X_poly), color='blue')
plt.title('Polynomial Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# Predicting a new result with Polynomial Regression
salary = lin_reg_2.predict(poly_reg.transform([[8.3]]))  # transform expects a 2D array
```

- Import the **PolynomialFeatures** class from **sklearn.preprocessing**.
- Create a **poly_reg** object with 4th degree polynomial features, i.e. for our case the equation becomes **y = b₀ + b₁x₁ + b₂x₁² + b₃x₁³ + b₄x₁⁴**.
- **X_poly = poly_reg.fit_transform(X)** expands the X matrix into X_poly, where each column contains a power of x in increasing order.
- **lin_reg_2.fit(X_poly, y)** fits the transformed X_poly matrix against the salary data.
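To see concretely what the expansion produces, here is a small sketch for a single level value (the leading column of ones comes from PolynomialFeatures' default `include_bias=True`, and it acts as the intercept term b₀):

```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)
# For a single feature x = 2, the expanded row is [1, x, x^2, x^3, x^4]
row = poly.fit_transform([[2]])
print(row)  # [[ 1.  2.  4.  8. 16.]]
```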

Now note the graph plotted by polynomial regression:

The salary predicted by polynomial regression is approximately 189117, which is quite close to what he asked for, and it fits our company's salary model.
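Since the full salary table is listed above, the comparison between the two models can be reproduced without the CSV file. This is a minimal sketch that skips the plotting code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries taken from the company's table above
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([22000, 24000, 30000, 38000, 50000,
              75000, 110000, 170000, 260000, 480000])

# Plain linear regression
lin_reg = LinearRegression().fit(X, y)
linear_pred = lin_reg.predict([[8.3]])[0]

# Degree-4 polynomial regression
poly_reg = PolynomialFeatures(degree=4)
lin_reg_2 = LinearRegression().fit(poly_reg.fit_transform(X), y)
poly_pred = lin_reg_2.predict(poly_reg.transform([[8.3]]))[0]

print(round(linear_pred))  # far above the candidate's 190000 ask
print(round(poly_pred))    # much closer to what he asked for
```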

Our company can hire the new vice president, and he will happily join us.
