Random Forest Classification

Prasad Kharkar is a Java enthusiast, always keen to explore and learn Java technologies. He is SCJP, OCPWCD and OCEJPAD certified and aspires to be a Java architect.


Hello all, welcome to another machine learning tutorial. Here, we will learn about the random forest classification model. This article follows the same structure as the previous classification articles: only the classifier library changes, while the data preprocessing and graph plotting stay the same.

Random Forest Classification:

Random forest classification works on the same concept as random forest regression: an ensemble of decision trees whose individual predictions are combined.

Consider our example from logistic regression, where we want to predict whether a new user will buy the car or not.

Data Preprocessing:

  • Read the “Social_Network_Ads.csv” file and stored it in dataset.
  • Extracted the age and salary information from the dataset and stored it in X.
  • Extracted the purchase information from the dataset and stored it in Y.
  • Split the dataset into training and test sets so that the machine can be trained using X_train and Y_train.
  • Applied feature scaling to X_train.
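The steps above can be sketched as follows. Since the actual “Social_Network_Ads.csv” file may not be at hand, a small synthetic DataFrame stands in for it here, and the column names (Age, EstimatedSalary, Purchased) are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for pd.read_csv("Social_Network_Ads.csv"); column names are assumed.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Age": rng.integers(18, 60, size=20),
    "EstimatedSalary": rng.integers(15000, 150000, size=20),
    "Purchased": rng.integers(0, 2, size=20),
})

X = dataset[["Age", "EstimatedSalary"]].values  # age and salary features
Y = dataset["Purchased"].values                 # purchase labels

# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set, reuse for the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

Fitting the scaler on the training set only, then reusing it for the test set, avoids leaking test-set statistics into training.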

Random Forest Classification:

  • Imported RandomForestClassifier from sklearn.ensemble.
  • Created a classifier and
    • provided the number of estimators as 10, i.e. our forest will be composed of 10 decision trees
    • applied the widely used ‘entropy’ criterion to it
  • Fitted the classifier with the training data set.
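A minimal sketch of these steps; the training arrays here are random stand-ins for the scaled X_train and Y_train produced during preprocessing:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data standing in for the tutorial's X_train / Y_train
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 2))
Y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# 10 trees in the forest, entropy (information gain) as the split criterion
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                    random_state=0)
classifier.fit(X_train, Y_train)

Y_pred = classifier.predict(X_train)
```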

Plotting the Graph:

  • The aranged_ages variable holds the scaled ages of users, from the minimum age to the maximum age, incremented by 0.01.
  • The aranged_salaries variable holds the scaled salaries of users, from the minimum salary to the maximum salary, incremented by 0.01.
  • np.meshgrid() takes aranged_ages and aranged_salaries to form X1 and X2.
  • X1 and X2 are used to create a graph that classifies every point in the grid using random forest classification. This is done with the plt.contourf() method.
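These plotting steps can be sketched as below. The data and fitted classifier are stand-ins for the tutorial's scaled training set, but the grid construction and contourf call follow the steps just described:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.ensemble import RandomForestClassifier

# Placeholder scaled data and fitted classifier
rng = np.random.default_rng(0)
X_set = rng.normal(size=(40, 2))
Y_set = (X_set[:, 0] > 0).astype(int)
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                    random_state=0)
classifier.fit(X_set, Y_set)

# Grids from min to max in steps of 0.01, as in the tutorial
aranged_ages = np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01)
aranged_salaries = np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01)
X1, X2 = np.meshgrid(aranged_ages, aranged_salaries)

# Predict a class for every grid point and colour the two regions
Z = classifier.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)
plt.contourf(X1, X2, Z, alpha=0.75, cmap=ListedColormap(("orange", "blue")))
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated Salary (scaled)")
plt.savefig("random_forest_regions.png")
```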
Random Forest Classification

  • Random forest classification takes the number of decision trees as an input parameter.
  • It collects the predictions of all the decision trees and chooses the class predicted by the majority of them.
  • These majority-voted results are chosen as the final predictions and plotted on the graph above.
  • The orange section is for users who will not buy the car and the blue section is for users who will buy the car.
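The majority vote described above can be sketched in a few lines; the individual tree votes here are made up purely for illustration:

```python
import numpy as np

# Hypothetical votes from 10 decision trees for one user (1 = buys, 0 = does not)
tree_votes = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Majority vote: the class predicted by the most trees wins
counts = np.bincount(tree_votes)        # votes per class: [3, 7]
final_prediction = counts.argmax()      # 7 of 10 trees voted 1 -> forest predicts 1
```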

Plotting Test set:

The plotting code above overlays the actual test-set data points on the classification regions.

  • Red points denote users who did not buy the car.
  • Green points denote users who bought the car.
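A minimal sketch of plotting the test points, using placeholder data in place of the tutorial's X_test and Y_test:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Placeholder test-set data standing in for X_test / Y_test
rng = np.random.default_rng(1)
X_set = rng.normal(size=(100, 2))
Y_set = (X_set[:, 0] > 0).astype(int)

# Red = did not buy (class 0), green = bought (class 1)
for label, colour in zip((0, 1), ("red", "green")):
    plt.scatter(X_set[Y_set == label, 0], X_set[Y_set == label, 1],
                color=colour, label=str(label))
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated Salary (scaled)")
plt.legend()
plt.savefig("random_forest_test_points.png")
```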
Random Forest results

Note that we have plotted 100 observations from our test set, and among them:

  • 3 green points fall in the orange area
  • 5 red points fall in the blue area

This means that, out of 100 test observations, random forest classification predicted 92 correctly and only 8 incorrectly.
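This accuracy can be computed directly with sklearn. The labels below are synthetic, constructed only to reproduce the counts quoted above (3 buyers in the orange area, 5 non-buyers in the blue area):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions matching the quoted error counts
Y_test = np.array([0] * 50 + [1] * 50)
Y_pred = Y_test.copy()
Y_pred[:5] = 1       # 5 non-buyers misclassified as buyers (red in blue area)
Y_pred[50:53] = 0    # 3 buyers misclassified as non-buyers (green in orange area)

cm = confusion_matrix(Y_test, Y_pred)   # rows: true class, columns: predicted
acc = accuracy_score(Y_test, Y_pred)    # (100 - 8) / 100 = 0.92
```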

I hope this helped. Happy learning 🙂

