Decision Tree Classification

Prasad Kharkar is a Java enthusiast, always keen to explore and learn Java technologies. He is SCJP, OCPWCD and OCEJPAD certified and aspires to be a Java architect.


Hello all, welcome to another machine learning tutorial. Here, we will learn about the Decision Tree classification model. This article is quite similar to the previous classification articles: only the classifier library changes, while the data preprocessing and graph plotting stay the same.

Decision Tree Classification:

Decision Tree classification works on the same concept as Decision Tree Regression.

Consider our example from logistic regression, where we want to know whether a new user will buy the car or not.

Data Preprocessing:

  • Read the “Social_Network_Ads.csv” file and stored it in dataset.
  • Extracted age and salary information from dataset and stored it in X.
  • Extracted purchase information from dataset and stored it in Y.
  • Split the dataset into training and test sets so that the machine can be trained using X_train and Y_train.
  • Applied feature scaling to X_train.
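The steps above can be sketched as follows. Since Social_Network_Ads.csv is not bundled with this article, the sketch first writes a small synthetic stand-in file with Age, EstimatedSalary and Purchased columns (the real file may contain extra columns such as user IDs, so the column indices below are an assumption for this stand-in):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "Social_Network_Ads.csv" (not bundled here)
rng = np.random.default_rng(0)
ages = rng.integers(18, 60, 400)
salaries = rng.integers(15000, 150000, 400)
purchased = (ages + salaries / 5000 > 55).astype(int)
pd.DataFrame({"Age": ages, "EstimatedSalary": salaries,
              "Purchased": purchased}).to_csv("Social_Network_Ads.csv", index=False)

# Read the file and store it in dataset
dataset = pd.read_csv("Social_Network_Ads.csv")
# Age and salary go into X, the purchase decision into Y
X = dataset.iloc[:, [0, 1]].values
Y = dataset.iloc[:, 2].values
# Split into training and test sets (25% held out for testing)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)
# Feature scaling: fit on the training set, reuse on the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```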

Decision Tree Classification:

  • Imported DecisionTreeClassifier from sklearn.tree.
  • Created a classifier with the widely used ‘entropy’ criterion.
  • Fitted the classifier with the training data set.
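A minimal sketch of these three steps, using stand-in scaled training data in place of the preprocessed X_train and Y_train (the random_state value is an assumption added for reproducibility):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the scaled training data from the preprocessing step
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))              # scaled age, salary
Y_train = (X_train.sum(axis=1) > 0).astype(int)  # stand-in purchase labels

# criterion='entropy' picks splits by information gain
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(X_train, Y_train)
```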

Plotting the Graph:

  • The aranged_ages variable holds scaled ages of users, running from the minimum age to the maximum age in increments of 0.01.
  • The aranged_salaries variable holds scaled salaries of users, running from the minimum salary to the maximum salary in increments of 0.01.
  • np.meshgrid() takes aranged_ages and aranged_salaries to form X1 and X2.
  • X1 and X2 are used to create a graph that classifies every grid point using the decision tree classifier. This is done with the plt.contourf() method.
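These plotting steps can be sketched as below, again with stand-in training data; the file name and axis labels are illustrative assumptions:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Stand-in scaled training data (age, salary) and labels
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))
Y_train = (X_train.sum(axis=1) > 0).astype(int)
classifier = DecisionTreeClassifier(
    criterion="entropy", random_state=0).fit(X_train, Y_train)

# Grids from minimum to maximum, incremented by 0.01
aranged_ages = np.arange(X_train[:, 0].min() - 1,
                         X_train[:, 0].max() + 1, 0.01)
aranged_salaries = np.arange(X_train[:, 1].min() - 1,
                             X_train[:, 1].max() + 1, 0.01)
X1, X2 = np.meshgrid(aranged_ages, aranged_salaries)

# Classify every grid point and colour the resulting regions
Z = classifier.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)
plt.contourf(X1, X2, Z, alpha=0.4)
plt.xlabel("Age (scaled)")
plt.ylabel("Estimated Salary (scaled)")
plt.savefig("decision_tree_regions.png")
```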
[Figure: Decision Tree Classification]

  • The Decision Tree does not draw a straight line or a curve; it creates rectangular sections, as shown in the graph above.
  • The orange section contains users who will not buy the car, and the blue section contains users who will buy the car.

Plotting Test set:

The code above plots the actual data points over the classification regions.

  • Red points denote users who did not buy the car.
  • Green points denote users who bought the car.
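A sketch of this plotting step with stand-in test data, following the colour scheme in the bullets above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Stand-in scaled test set (age, salary) and purchase labels
rng = np.random.default_rng(1)
X_test = rng.normal(size=(100, 2))
Y_test = (X_test.sum(axis=1) > 0).astype(int)

# Red for users who did not buy, green for users who did
for value, colour in zip((0, 1), ("red", "green")):
    pts = X_test[Y_test == value]
    plt.scatter(pts[:, 0], pts[:, 1], c=colour, label=f"Purchased = {value}")
plt.legend()
plt.savefig("decision_tree_test_points.png")
```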
[Figure: Decision Tree Results]

Note that we have plotted 100 observations from our test set, and out of them:

  • 3 green points fall in the orange region.
  • 6 red points fall in the blue region.

This means that out of 100 observation points, Decision Tree classification predicted 91 results correctly and only 9 incorrectly.
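These counts can be verified programmatically with scikit-learn's metrics. The labels below are a hypothetical reconstruction that reproduces the 91/9 split from the text, not the actual test set:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels reproducing the counts above:
# 91 correct predictions, 9 incorrect (6 red + 3 green misplaced)
Y_test = np.array([0] * 55 + [1] * 45)
Y_pred = Y_test.copy()
Y_pred[:6] = 1   # 6 non-buyers (red) landed in the buyer (blue) region
Y_pred[-3:] = 0  # 3 buyers (green) landed in the non-buyer (orange) region

print(confusion_matrix(Y_test, Y_pred))
print(accuracy_score(Y_test, Y_pred))  # 91 / 100 correct
```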

I hope this helped. Happy learning 🙂


