Simple Linear Regression

I am a technology enthusiast and always up for challenges. Recently I have started getting hands dirty in machine learning using python and aspiring to gather everything I can.

Latest posts by Renuka Joshi (see all)

Hello all! Welcome to another brainstorming tutorial on Machine Learning. In this article we are going to implement Simple Linear Regression using Python.

Regression

Regression is the analysis of the relationship between a dependent variable and one or more independent variables. It quantifies how the variables are related. Regression analysis helps us understand how the value of the dependent variable changes when the value of an independent variable varies.

Types of Regression

Types of regression are as follows:

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Polynomial Regression
  4. Support Vector Regression (SVR)
  5. Decision Tree Regression
  6. Random Forest Regression

In this tutorial we are going to implement Simple Linear Regression using Python. So, let’s get started.

Simple Linear Regression

Simple linear regression models the relationship between a dependent variable and a single independent variable. It is a statistical technique that predicts the value of the dependent variable from the value of the independent variable.

Consider the equation of a straight line,

y = c + mx

  • where y is the dependent variable,
  • c is a constant, i.e. the intercept (the value of y when x = 0),
  • x is the independent variable,
  • m is a coefficient, i.e. the slope of the line.
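
To make the roles of c and m concrete, here is a minimal sketch of the ordinary least-squares formulas for a single independent variable, written in plain Python. The helper name fit_line is hypothetical and not part of any library; it is only meant to illustrate the kind of line fit that simple linear regression performs.

```python
def fit_line(xs, ys):
    """Return (c, m) for the least-squares line y = c + m*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return c, m


# Points that lie exactly on y = 1 + 2x, so the fit recovers c = 1 and m = 2.
c, m = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"c = {c:.2f}, m = {m:.2f}")
```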

We are going to use the LinearRegression class from the sklearn.linear_model module. To implement simple linear regression we are going to create a new dataset of 30 records, each containing years of experience and the corresponding salary, as follows.

YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
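
One possible way to load data in this format is with pandas. The sketch below inlines only the first five rows so that it runs on its own; the file name Salary_Data.csv in the comment is a hypothetical example, and the variable names X and y are assumptions chosen to match the names used later in this article.

```python
import io

import pandas as pd

# First five rows of the dataset, inlined so the sketch runs on its own.
# With the full data saved to disk you would pass a file path instead,
# e.g. pd.read_csv("Salary_Data.csv") (hypothetical file name).
csv_text = """YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# scikit-learn expects X as a 2-D array (rows x features) and y as 1-D.
X = dataset[["YearsExperience"]].values
y = dataset["Salary"].values

print(X.shape, y.shape)
```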

Preprocess the dataset and divide it into training and test sets as follows.

  • X_train : training data for the independent variable, i.e. years of experience
  • X_test : test data for which we want to predict salaries
  • y_train : training data for the dependent variable, i.e. the salaries corresponding to X_train
  • y_test : actual salaries for the years of experience in X_test

The variables are shown below. Please note that X_train and y_train contain 20 values each, which leaves 10 values each for X_test and y_test.
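
The split described above could be sketched with scikit-learn's train_test_split. The 20/10 split follows from the 20 training values mentioned in the article; random_state=0 is an assumption made only so the result is reproducible, since the article does not say how the rows were shuffled.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The 30 records from the dataset above.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)  # 2-D: one row per record, one feature
y = np.array(salary, dtype=float)

# Hold out 10 of the 30 rows for testing; the seed is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

print(len(X_train), len(X_test), len(y_train), len(y_test))
```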

[Image: variable states]

We now fit our training dataset to the simple linear regression model. To do this, create an object regressor of class LinearRegression and call its fit() method on the training data, i.e. X_train and y_train.

The regressor.fit() method takes the independent variable (X_train) and the dependent variable (y_train) as parameters, in that order. We are teaching the regressor which y_train value corresponds to each X_train value.
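
A minimal sketch of the fitting step follows. It repeats the data and the split so that it runs on its own; random_state=0 is again an assumed seed rather than something stated in the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The 30 records from the dataset above.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# 20 training rows and 10 test rows; the seed is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)  # independent variable first, then dependent

# intercept_ plays the role of c and coef_[0] the role of m in y = c + mx.
print("c =", regressor.intercept_, "m =", regressor.coef_[0])
```

After fitting, the learned line is fully described by regressor.intercept_ and regressor.coef_, so any prediction is just c + m times the years of experience.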

Predicting salaries

We are now going to predict the salaries for the X_test values, i.e. years of experience, and compare them with the actual values in y_test, as shown below.

[Image: x_test, y_pred and y_test]

The regressor.predict() method predicts salaries from the years of experience in X_test.

The y_pred values are the predicted salaries, which we compare with the actual salaries in y_test.

The image shows years of experience, predicted salaries and actual salaries.

  • The regressor predicted a salary of 40835.1 for an employee with 1.5 years of experience, whose actual salary is 37731.
  • The regressor predicted a salary of 123079 for an employee with 10.3 years of experience, whose actual salary is 122391.
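
The comparison of predicted and actual salaries can be reproduced along these lines. This is a sketch under the same assumptions as before (10 test rows, an assumed seed of 0), so the exact rows and numbers it prints depend on that split and may differ from the ones in the image.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The 30 records from the dataset above.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# 20 training rows and 10 test rows; the seed is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict a salary for every years-of-experience value in the test set.
y_pred = regressor.predict(X_test)

# Print the three columns side by side: experience, predicted, actual.
for exp, pred, actual in zip(X_test[:, 0], y_pred, y_test):
    print(f"{exp:5.1f}  predicted {pred:10.1f}  actual {actual:10.1f}")
```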

 

Here, the machine has learned to predict salaries based on years of experience.

Simple Linear Regression Graph:

Training Set

[Image: Training Set graph]

  • plt.scatter(X_train, y_train, color = 'red') draws a scatter plot of salaries against years of experience for the values in X_train and y_train.
  • plt.plot(X_train, regressor.predict(X_train), color = 'blue') draws the line of predicted salaries against years of experience.
  • The red dots are the observed pairs of years of experience and salary from X_train and y_train.
  • The blue line is the fitted simple linear regression line.
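
Putting the two plotting calls above into a runnable sketch might look like this. The title and axis labels are illustrative choices, and the split uses the same assumed seed as in the earlier steps.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The 30 records from the dataset above.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# 20 training rows and 10 test rows; the seed is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)

# Red dots: observed training data. Blue line: predictions for the same x values.
points = plt.scatter(X_train, y_train, color="red")
(line,) = plt.plot(X_train, regressor.predict(X_train), color="blue")
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
```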

Test Set:

[Image: Test set graph]

plt.scatter(X_test, y_test, color = 'red') draws a scatter plot of salaries against years of experience for the values in X_test and y_test.

Please note that the blue regression line remains the same, because it is the model fitted on the training set and gives the predicted salary for any number of years of experience.
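
For the test set, a sketch of the plot could look like the following. Only the scattered points change; the line is still drawn from the model fitted on the training data. As before, the labels are illustrative and the seed is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The 30 records from the dataset above.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# 20 training rows and 10 test rows; the seed is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)

# Red dots: held-out test data. Blue line: the same line fitted on the training set.
test_points = plt.scatter(X_test, y_test, color="red")
(fit_line_plot,) = plt.plot(X_train, regressor.predict(X_train), color="blue")
plt.title("Salary vs Experience (Test set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
```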

From the graph and our comparison of y_pred and y_test, we can say that Simple Linear Regression in Python predicts salaries from years of experience reasonably well on this dataset.

 

I hope this article helped you understand Simple Linear Regression. In the next article we will learn about multiple linear regression.
