Logistic Regression Classification

Prasad Kharkar is a java enthusiast and always keen to explore and learn java technologies. He is SCJP,OCPWCD, OCEJPAD and aspires to be java architect.


Hello all, and welcome to another tutorial in machine learning. Until now, we have learned about several regression models, each with independent variables and a dependent variable whose value results from them. For example, we predicted salary based on experience, qualifications, etc. However, we may come across observations where the dependent variable falls into one of a few specific values instead. These machine learning problems need a classification mechanism.

From this article onward, we will learn about classification mechanisms and the Python libraries for them. This article focuses on logistic regression classification.

Logistic Regression Classification:

Logistic regression classification predicts a result for each observation and assigns it to the appropriate class. We will use the dataset for logistic regression taken from www.superdatascience.com/machine-learning

Problem Statement:

Suppose we have a data set of users of a social networking site with user ID, gender, age, and salary. We are a car manufacturing company and have launched a great car. We want to know whether these users will buy our car or not.

Dataset:

We have some observations, shown below. Each row holds a person's user ID, gender, age, and salary, and whether that person purchased the car (1) or not (0). The full data set is available here.

user id    gender   age   salary   purchased
15628523   Male     35    39000    0
15708196   Male     49    74000    0
15735549   Female   39    134000   1
15809347   Female   41    71000    0
15660866   Female   58    101000   1

Logistic Regression Classification:

Data Preprocessing:

  • We read the “Social_Network_Ads.csv” file and stored it in dataset.
  • Extracted the age and salary information from dataset and stored it in X.
  • Extracted the purchase information from dataset and stored it in Y.
  • Split the dataset into a training set and a test set, so that the machine can be trained using X_train and Y_train and the predictions in y_pred can be compared with y_test.
  • Applied feature scaling to X_train.
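The preprocessing steps above can be sketched roughly as follows. This is a minimal sketch, not the original code of this article. The column names `age`, `salary`, and `purchased`, the 25% test split, the fixed `random_state`, the choice of `StandardScaler`, and applying the fitted scaler to `X_test` as well are all assumptions, and the helper names `load_dataset` and `preprocess` are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def load_dataset(path="Social_Network_Ads.csv"):
    """Read the CSV file into a DataFrame (the path is the file named in the article)."""
    return pd.read_csv(path)


def preprocess(dataset, feature_cols=("age", "salary"), target_col="purchased",
               test_size=0.25, random_state=0):
    """Return scaled train/test features, the labels, and the fitted scaler.

    Column names, test_size and random_state are assumptions for illustration.
    """
    # Features: age and salary. Target: whether the user purchased the car.
    X = dataset[list(feature_cols)].to_numpy(dtype=float)
    Y = dataset[target_col].to_numpy()

    # Hold out part of the data so predictions can be compared with y_test.
    X_train, X_test, Y_train, y_test = train_test_split(
        X, Y, test_size=test_size, random_state=random_state)

    # Fit the scaler on the training set only, then reuse it for the test set
    # (assumption: the test features are scaled with the same parameters).
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, Y_train, y_test, scaler
```

A call such as `preprocess(load_dataset())` would then produce the arrays used in the following steps, provided the CSV columns match the assumed names.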

Logistic Regression:

  • We imported the LogisticRegression class from the sklearn.linear_model module.
  • Created classifier as an object of LogisticRegression.
  • Fitted the classifier to the training data.
  • Predicted results for X_test and stored them in y_pred.
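As an illustration, the steps above might look like the sketch below. It assumes the training and test arrays are already scaled NumPy arrays. The helper name `train_and_predict` and the `random_state` value are hypothetical choices, not taken from the original code.

```python
from sklearn.linear_model import LogisticRegression


def train_and_predict(X_train, Y_train, X_test, random_state=0):
    """Fit a logistic regression classifier and predict labels for X_test."""
    # Create the classifier object (random_state is an assumed setting).
    classifier = LogisticRegression(random_state=random_state)
    # Learn the relation between (age, salary) and the purchase decision.
    classifier.fit(X_train, Y_train)
    # Predict a class label (0 or 1) for every row of the test set.
    y_pred = classifier.predict(X_test)
    return classifier, y_pred
```

Returning the fitted `classifier` alongside `y_pred` keeps it available for plotting the classified regions later.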

We have our predictions ready.

Plotting the graph:

 

I know that is a lot of code for plotting a graph, but I will try to explain what is being done here.

  • The aranged_ages variable holds scaled ages of users, from the minimum age to the maximum age in increments of 0.01.
  • The aranged_salaries variable holds scaled salaries of users, from the minimum salary to the maximum salary in increments of 0.01.
  • np.meshgrid() takes aranged_ages and aranged_salaries and forms the grids X1 and X2.
  • X1 and X2 are used to create a graph that classifies every point on the grid using logistic regression classification. This is done with the plt.contourf() method.
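The plotting steps described above could be sketched like this. It is a reconstruction under assumptions, not the original listing: the function name `plot_decision_regions`, the axis labels, the `alpha` value, and the explicit contour `levels` are hypothetical. The variable names `aranged_ages`, `aranged_salaries`, `X1`, and `X2`, and the 0.01 step come from the description.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def plot_decision_regions(classifier, X_set, step=0.01, ax=None):
    """Colour the (age, salary) plane by the class the classifier predicts."""
    if ax is None:
        _, ax = plt.subplots()

    # Scaled ages and salaries from their minimum to their maximum, in small steps.
    aranged_ages = np.arange(X_set[:, 0].min(), X_set[:, 0].max(), step)
    aranged_salaries = np.arange(X_set[:, 1].min(), X_set[:, 1].max(), step)

    # X1 and X2 hold the age and salary coordinate of every grid point.
    X1, X2 = np.meshgrid(aranged_ages, aranged_salaries)

    # Classify every grid point, then restore the grid shape for plotting.
    grid_points = np.column_stack([X1.ravel(), X2.ravel()])
    Z = classifier.predict(grid_points).reshape(X1.shape)

    # Orange where class 0 (will not buy) is predicted, blue for class 1 (will buy).
    ax.contourf(X1, X2, Z, levels=[-0.5, 0.5, 1.5], alpha=0.5,
                cmap=ListedColormap(("orange", "blue")))
    ax.set_xlabel("Age (scaled)")
    ax.set_ylabel("Salary (scaled)")
    return ax, X1, X2, Z
```

Calling `plt.show()` after the function returns would display the figure. The function leaves that to the caller so that more layers can be drawn on the same axes first.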

The resulting graph is shown below.

Figure: classifications
  • Note the orange and blue sections in the graph.
  • Logistic regression classification has divided all data points into two classes: users who will not buy the car and users who will buy it.
  • The orange section denotes all users who will not buy the car.
  • The blue section denotes all users who will buy the car.

These are the predictions of logistic regression classification. Now let us plot the actual observations on this graph and compare the results.

These lines will plot actual data points.

  • Red points denote users who did not buy the car.
  • Green points denote users who bought the car.
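A possible way to draw the actual observations is sketched below. The function name `plot_actual_points`, the legend labels, and the black marker edges are hypothetical. Only the red and green colour coding comes from the description above.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_actual_points(X_set, y_set, ax=None):
    """Scatter the observed points: red for class 0, green for class 1."""
    if ax is None:
        _, ax = plt.subplots()

    # One scatter call per class so each class gets its own colour and legend entry.
    styles = {0: ("red", "Did not buy"), 1: ("green", "Bought")}
    for label, (color, name) in styles.items():
        mask = y_set == label
        ax.scatter(X_set[mask, 0], X_set[mask, 1],
                   c=color, edgecolors="black", label=name)

    ax.legend()
    return ax
```

Passing the axes that already hold the coloured regions as `ax` overlays the observations on the predictions, so any point sitting in the wrong coloured area is a misclassification.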

OK, let’s go.

Figure: Actual test set results

Note that we have plotted the 100 observations from our test set. Out of them:

  • Only 8 green points fall in the orange area.
  • Only 3 red points fall in the blue area.

This means that, out of 100 observations, logistic regression classification predicted 89 results correctly and only 11 incorrectly.
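The arithmetic behind these numbers can be written out as a short sketch. The variable names are hypothetical. The counts are the ones read off the graph above.

```python
# Counts read off the test set graph.
total_observations = 100
buyers_in_orange_area = 8      # green points predicted as "will not buy"
non_buyers_in_blue_area = 3    # red points predicted as "will buy"

# Every point lying in the wrong coloured area is a misclassification.
incorrect = buyers_in_orange_area + non_buyers_in_blue_area
correct = total_observations - incorrect
accuracy = correct / total_observations

print(f"{correct} correct, {incorrect} incorrect, accuracy {accuracy:.2f}")
```

When `y_test` and `y_pred` are at hand, `sklearn.metrics.confusion_matrix(y_test, y_pred)` gives the same kind of breakdown without counting points on the graph.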

I hope this helped. Happy learning 🙂

 
