Simple Linear Regression

By Renuka Joshi

I am a technology enthusiast and always up for challenges. Recently I have started getting my hands dirty with machine learning in Python, and I aspire to learn everything I can.


Hello all! Welcome to another brainstorming tutorial on machine learning. In this article we are going to implement simple linear regression using Python.

Regression

Regression is an analysis of the relationship between a dependent variable and one or more independent variables. Unlike correlation, which only measures how strongly two variables move together, regression estimates how the value of the dependent variable changes when the value of an independent variable changes.

Types of Regression

Common types of regression are as follows:

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Polynomial Regression
  4. Support Vector Regression (SVR)
  5. Decision Tree Regression
  6. Random Forest Regression

In this tutorial we are going to implement the first of these, simple linear regression, using Python. So, let's get started.

Simple Linear Regression

Simple linear regression models the relationship between a single independent variable and a dependent variable as a straight line. It is a statistical method that predicts values of the dependent variable from values of the independent variable.

Consider the equation of a straight line:

y = c + mx

  • where y is the dependent variable,
  • c is a constant, i.e. the intercept (the value of y when x = 0),
  • x is the independent variable,
  • m is a coefficient, i.e. the slope of the line.
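scikit-learn's LinearRegression estimates c and m by ordinary least squares. With a single independent variable the least-squares estimates have a closed form: m is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and c = ȳ - m·x̄. The minimal NumPy sketch below computes both on the 30 salary records used in this article and cross-checks them against np.polyfit; the helper name fit_line is hypothetical and not part of any library.

```python
import numpy as np

# The 30 (years of experience, salary) records used in this article
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
                  3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
                  6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
salary = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
                   63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
                   91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872],
                  dtype=float)


def fit_line(x, y):
    """Hypothetical helper: least-squares intercept c and slope m for y = c + m*x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean)
    c = y_mean - m * x_mean
    return c, m


c, m = fit_line(years, salary)
print(f"y = {c:.2f} + {m:.2f}x")

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept]
m_np, c_np = np.polyfit(years, salary, 1)
assert np.isclose(m, m_np) and np.isclose(c, c_np)
```

This is the same line LinearRegression would find if it were fitted on all 30 records; the tutorial below fits it on a training subset instead.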


We are going to use the LinearRegression class from the sklearn.linear_model module. To implement simple linear regression, we create a new dataset containing 30 records of years of experience and salary, as follows.

YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
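If the records above are saved as a CSV file, pandas can load them and separate the independent variable from the dependent one. To keep this sketch self-contained, it reads the text from an in-memory string via io.StringIO; with a real file you would pass its path to pd.read_csv instead (the file name is up to you). X is kept two-dimensional, with one column, because scikit-learn estimators expect a 2D feature array.

```python
import io

import pandas as pd

csv_text = """YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Independent variable as a 2D array of shape (30, 1); dependent variable as 1D, shape (30,)
X = dataset[["YearsExperience"]].values
y = dataset["Salary"].values
print(dataset.shape, X.shape, y.shape)
```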

Next, preprocess the dataset and divide it into a training set and a test set as follows.

  • X_train : training data of the independent variable, i.e. years of experience
  • X_test : test data for which we want to predict salaries
  • y_train : training data of the dependent variable, i.e. salaries corresponding to X_train
  • y_test : actual salaries for the years of experience in X_test

The resulting variables are shown below. Please note that X_train and y_train contain 20 values, so X_test and y_test contain the remaining 10 of the 30 records.

Figure: variable states
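The article does not state its exact split settings, so the following is a minimal sketch assuming scikit-learn's train_test_split with an integer test_size=10 (which leaves 20 records for training, matching the note above) and an arbitrary random_state=0 for reproducibility. Which records land in each set depends on random_state, so your rows may differ from the figure.

```python
import numpy as np
from sklearn.model_selection import train_test_split

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]

X = np.array(years).reshape(-1, 1)   # independent variable, shape (30, 1)
y = np.array(salary, dtype=float)    # dependent variable, shape (30,)

# Assumed settings: hold out exactly 10 records, fixed seed for a repeatable split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)
print(X_train.shape, X_test.shape)   # → (20, 1) (10, 1)
```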

We are now going to fit the simple linear regression model to our training set. To do this, create an object regressor of the class LinearRegression and fit the training data, i.e. X_train and y_train, to it.

The regressor.fit() method takes the independent variable (X_train) and the dependent variable (y_train), in that order, as parameters. In effect, we are teaching the regressor that the y_train values correspond to the X_train values.
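A minimal sketch of the fitting step, assuming a 20/10 split made with train_test_split(test_size=10, random_state=0); those split settings are an assumption, since the article does not show them. After fitting, scikit-learn stores the learned c and m from y = c + mx in the intercept_ and coef_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]
X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# Assumed split settings (not given in the article)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)

regressor = LinearRegression()
# fit(X, y): independent variable first, dependent variable second
regressor.fit(X_train, y_train)

# In terms of y = c + mx: c is intercept_, m is coef_[0]
print(f"c = {regressor.intercept_:.2f}, m = {regressor.coef_[0]:.2f}")
```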

Predicting salaries

We are now going to predict the salaries for the X_test values, i.e. years of experience, and compare them with the actual salaries, i.e. the values of y_test.

Figure: X_test, y_pred and y_test

The regressor.predict() method predicts salaries from the years of experience in X_test.

The y_pred values are the predicted salaries, and we compare them with the actual salaries in y_test.

The image alongside shows years of experience, predicted salaries and actual salaries. For example:

  • The regressor predicted a salary of 40835.1 for an employee with 1.5 years of experience, whose actual salary is 37731.
  • The regressor predicted a salary of 123079 for an employee with 10.3 years of experience, whose actual salary is 122391.
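A sketch of the prediction and comparison step, under the same assumed split (test_size=10, random_state=0, not stated in the article). It places X_test, y_pred and y_test side by side in a pandas DataFrame, similar to the image, and adds the difference between actual and predicted salary. Because the split is assumed, the numbers will not necessarily match the two examples above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]
X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)  # fit() returns the fitted estimator

# Predicted salaries for the held-out years of experience
y_pred = regressor.predict(X_test)

comparison = pd.DataFrame({
    "YearsExperience": X_test.ravel(),
    "y_pred": y_pred.round(1),
    "y_test": y_test,
    "error": y_test - y_pred,  # positive means the model under-predicted
})
print(comparison.sort_values("YearsExperience").to_string(index=False))
```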

Here, the machine has learned to predict salaries based on years of experience.
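Once fitted, the regressor can also predict the salary for a value that is not in the dataset. A small sketch follows; for brevity it fits on all 30 records rather than on a training split, and the inputs 5.0 and 12.0 years are hypothetical. Note that predict() expects a 2D array even for a single value, and that each prediction is simply y = c + mx with the learned intercept_ and coef_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]
X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

# For brevity, fit on all records instead of a training split
regressor = LinearRegression().fit(X, y)

# predict() expects a 2D array: one row per sample, one column per feature
new_years = np.array([[5.0], [12.0]])  # hypothetical inputs
predicted = regressor.predict(new_years)

# The same numbers computed by hand from y = c + m*x
manual = regressor.intercept_ + regressor.coef_[0] * new_years.ravel()

for yrs, sal in zip(new_years.ravel(), predicted):
    print(f"{yrs:.1f} years -> predicted salary {sal:.2f}")
```

The 12.0-year input lies outside the 1.1 to 10.5 range of the data, so that prediction is an extrapolation and deserves more caution than the one for 5.0 years.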

Simple Linear Regression Graph:

Training Set

Figure: Training Set graph

  • plt.scatter(X_train, y_train, color = 'red') plots a scatter graph of salaries against years of experience for the values in X_train and y_train.
  • plt.plot(X_train, regressor.predict(X_train), color = 'blue') plots the predicted salaries against years of experience.
  • The red dots are the actual (years of experience, salary) pairs in the training set.
  • The blue line is the fitted simple linear regression line.
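A runnable version of the training-set plot described above, assuming the same hypothetical split (test_size=10, random_state=0). The title, axis labels and output file name are illustrative choices. It selects matplotlib's non-interactive Agg backend and saves the figure with plt.savefig so it also runs without a display; in a notebook or desktop session you could drop the backend line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]
X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

plt.figure()
plt.scatter(X_train, y_train, color="red")                    # actual training pairs
plt.plot(X_train, regressor.predict(X_train), color="blue")   # fitted regression line
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.savefig("training_set.png")
```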

Test Set

Figure: Test set graph

plt.scatter(X_test, y_test, color = 'red') plots a scatter graph of the actual salaries against years of experience for the values in X_test and y_test.

Please note that the blue regression line remains the same, because it is the line the regressor learned from the training set. It gives the predicted salary for any number of years of experience.
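A sketch of the test-set plot, again assuming the hypothetical split test_size=10, random_state=0 and illustrative labels and file name. Only the red points change: the blue line is still drawn from regressor.predict(X_train), since the model itself was learned from the training set and is not refitted on the test data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
         3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
         6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
          63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
          91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872]
X = np.array(years).reshape(-1, 1)
y = np.array(salary, dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

plt.figure()
plt.scatter(X_test, y_test, color="red")                      # actual test pairs
# Same line as in the training plot: learned from the training set only
plt.plot(X_train, regressor.predict(X_train), color="blue")
plt.title("Salary vs Experience (Test set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.savefig("test_set.png")
```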

From the graph and our comparison of y_pred with y_test, we can see that the simple linear regression model in Python predicts salaries from years of experience reasonably closely on this dataset.

I hope this article helped you understand simple linear regression. In the next article we will learn about multiple linear regression.


