# Decision Tree Regression

By Renuka Joshi

We learned about support vector regression in the previous article, and now we will implement decision tree regression to predict the salary of an employee at a given position level.

# Decision Tree Regression

Decision tree regression splits the dataset into sections and predicts the average of the data points in each section. As a result, **the prediction for every data point lying within one section is the same.**
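To see this averaging behaviour concretely, here is a minimal sketch on a made-up toy dataset (the salary values are illustrative only, not the ones from our CSV). Limiting the tree depth forces each section to cover several points, and every prediction inside one section equals that section's mean:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D dataset: position level vs. salary (illustrative values only)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([45000, 50000, 60000, 80000, 110000, 150000])

# Limit the depth so each leaf (section) covers several data points
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X, y)

# Both queries fall in the same section, so both predictions are
# the mean of the training salaries in that section
print(tree.predict([[1.5], [2.5]]))
```

With `max_depth=1` the tree makes a single split, so the whole input range is divided into just two sections, each predicting one average value.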

```python
# Decision Tree Regression

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Predicting a new result (predict expects a 2-D array)
y_pred = regressor.predict([[8.3]])

# Visualising the Decision Tree Regression results (higher resolution)
X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Decision Tree Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
```

Here,

- We have used the DecisionTreeRegressor class from the sklearn.tree module
- regressor.fit fits X and y to the regressor object of the DecisionTreeRegressor class
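If you want to inspect the sections the regressor actually learned, scikit-learn's export_text helper prints the split thresholds and the average value in each leaf. The sketch below uses hypothetical stand-in salary values for ten position levels, not necessarily the contents of Position_Salaries.csv:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical stand-in for the dataset: levels 1-10 with distinct salaries
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000, 150000,
              200000, 300000, 500000, 1000000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Each "value" in the printout is the average salary of one section
print(export_text(regressor, feature_names=['level']))
```

Because every target value here is distinct and the depth is unconstrained, the tree keeps splitting until each training point sits in its own leaf; on real data you would usually cap the depth or leaf size to get wider sections.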

Execute the above lines of code and you will see the graph plotted as below.

- Notice that the graph is not continuous.
- The prediction for level 8.3 is 170000.
- Each horizontal line is the average of all data points in the section it represents.
- Because predictions are section averages, every value lying in the same section gets the same prediction.
- Note that each horizontal line starts halfway past one position level and ends just before halfway past the next. In our case, the lines span 1.6 to 2.5, 2.6 to 3.5, and so on.
- These horizontal lines represent the sections. If you predict any value between 7.6 and 8.5, decision tree regression will always return 170000.
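The same-section behaviour described above is easy to check directly: predict at several points between 7.6 and 8.5 and confirm that a single value comes back. The salary values below are hypothetical stand-ins for the dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data: ten position levels with distinct salaries
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000, 150000,
              200000, 300000, 500000, 1000000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Every query between 7.6 and 8.5 falls into the same leaf,
# so the regressor returns one section average for all of them
queries = np.arange(7.6, 8.5, 0.1).reshape(-1, 1)
preds = regressor.predict(queries)
print(np.unique(preds))
```

All nine queries map to the leaf containing level 8, so np.unique collapses the predictions to a single value.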
