K Nearest Neighbors classification


Hello all, welcome to another machine learning tutorial. Here we will learn about the K nearest neighbors classification model, using the same example as in the logistic regression article.

K Nearest Neighbors Classification:

Let us understand what K Nearest Neighbors classification does.

  • Choose a number K of neighbors to consider for the new data point.
  • Find the K training points nearest to the new data point according to a distance measure.
  • Count the number of nearest neighbors belonging to each category.
  • Place the new data point in the category with the most nearest neighbors.

Consider our example where we want to know whether a new user will buy the car or not.

  • Choose the number of neighbors for this user, e.g. 5.
  • Check which data points are nearest to this user; this gives us 5 nearest neighbors.
  • Suppose 3 of those users have bought the car and 2 have not.
  • Classify this user as a buyer, because 3 out of 5 neighbors have already bought it; a minimal sketch of this voting logic follows.
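
To make the voting logic concrete, here is a minimal from-scratch sketch in numpy. The training points and labels below are made up for illustration; the article itself uses scikit-learn, as shown later.

```python
import numpy as np

# Hypothetical scaled training data: [age, salary] per row.
X_train = np.array([[0.2, 0.1], [0.3, 0.2], [0.8, 0.9],
                    [0.7, 0.8], [0.9, 0.7], [0.1, 0.3]])
y_train = np.array([0, 0, 1, 1, 1, 0])  # 1 = bought the car, 0 = did not

new_point = np.array([0.75, 0.85])
k = 5

# Euclidean distance from the new point to every training point.
distances = np.linalg.norm(X_train - new_point, axis=1)

# Indices of the k nearest neighbors.
nearest = np.argsort(distances)[:k]

# Majority vote among the neighbors' labels.
votes = np.bincount(y_train[nearest])
print(votes.argmax())  # 1 -> classified as a buyer (3 of 5 neighbors bought)
```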

Data Preprocessing:

  • Read the “Social_Network_Ads.csv” file and store it in dataset.
  • Extract the age and salary information from dataset and store it in X.
  • Extract the purchase information from dataset and store it in y.
  • Split the dataset into training and test sets, so that the machine can be trained using X_train and y_train, and y_test can later be compared with y_pred.
  • Apply feature scaling to X_train and, with the same scaler, to X_test; a sketch of these steps follows this list.
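
A minimal sketch of these preprocessing steps, assuming the usual layout of Social_Network_Ads.csv (age and estimated salary in columns 2 and 3, the purchase flag in column 4) and a 100-observation test set:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read the file; the column positions are assumptions about its layout.
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values  # age and estimated salary
y = dataset.iloc[:, 4].values       # purchased: 1 = yes, 0 = no

# Hold out 100 observations as the test set (as reported later in the article).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=100, random_state=0)

# Scale features so that age and salary contribute comparably to distances.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```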

K Nearest Neighbors Classification:

  • Import KNeighborsClassifier from sklearn.neighbors.
  • Create a classifier with all default parameters:
    • KNeighborsClassifier uses 5 nearest neighbors by default.
    • It uses the Minkowski metric with p = 2 by default, i.e. the Euclidean distance.
  • Fit the classifier to the training set and predict the test set, as in the sketch below.
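
A sketch of these steps, continuing from the preprocessing code above; the keyword arguments simply spell out the scikit-learn defaults:

```python
from sklearn.neighbors import KNeighborsClassifier

# n_neighbors=5 and metric='minkowski' with p=2 (Euclidean distance)
# are the defaults; they are written out here for clarity.
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)

# Predict the test set so that y_pred can be compared with y_test.
y_pred = classifier.predict(X_test)
```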

Plotting the Graph:

  • The aranged_ages variable holds scaled ages of users, from the minimum age to the maximum age in steps of 0.01.
  • The aranged_salaries variable holds scaled salaries of users, from the minimum salary to the maximum salary in steps of 0.01.
  • np.meshgrid() takes aranged_ages and aranged_salaries and forms the grid coordinates X1 and X2.
  • X1 and X2 are used to draw a graph that classifies every grid point using K nearest neighbors classification. This is done with the plt.contourf() method, as in the sketch below.
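
A sketch of this plotting code, assuming the variables from the previous snippets; the 0.01 step and the orange/blue colours follow the description above:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test

# Scaled ages and salaries from minimum to maximum, in steps of 0.01.
aranged_ages = np.arange(X_set[:, 0].min(), X_set[:, 0].max(), 0.01)
aranged_salaries = np.arange(X_set[:, 1].min(), X_set[:, 1].max(), 0.01)

# Form grid coordinates from the two ranges.
X1, X2 = np.meshgrid(aranged_ages, aranged_salaries)

# Classify every grid point and colour the two regions.
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('orange', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
```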
[Figure: K Nearest Neighbors classification regions]
  • Note the orange and blue sections in the graph.
  • K nearest neighbors classification has created two regions whose boundary is clearly non-linear.
  • So unlike logistic regression, this is a non-linear classifier.
  • The orange section denotes all users who will not buy the car.
  • The blue section denotes all users who will buy the car.

Plotting the Test Set:

Now the actual test set data points are plotted on top of the classification regions; a sketch follows the list below.

  • Red points denote users who did not buy the car.
  • Green points denote users who bought the car.
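
Continuing the plotting sketch above, the test points can be overlaid like this, using the red/green colouring just described:

```python
# Overlay the actual test points: red = did not buy, green = bought.
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=('red', 'green')[i], label=j)
plt.title('K Nearest Neighbors classification (test set)')
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.legend()
plt.show()
```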
[Figure: Predictions against actual results]

Note that we have plotted 100 observations from our test set, and out of them:

  • only 3 green points fall in the orange area, and
  • only 4 red points fall in the blue area.

This means that out of 100 observation points, K nearest neighbors classification predicted 93 results correctly and only 7 incorrectly.
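
These counts can also be verified programmatically rather than read off the graph; here is a quick check using scikit-learn's confusion matrix, assuming the y_test and y_pred variables from the earlier snippets:

```python
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes;
# the off-diagonal entries are the misclassified points.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(cm.trace())  # diagonal sum: the number of correct predictions
```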

I hope this helped. Happy learning 🙂
