We learned about support vector regression in the previous article, and now we will implement decision tree regression to predict the salaries of employees at a given position level.
Decision Tree Regression
It splits the dataset into sections and calculates the prediction for each section as the average of the data points falling inside it. So the prediction for every data point lying within one section will be the same.
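A minimal sketch of this behavior on a small synthetic dataset (not the salary data used below): with a depth-1 tree, the data is split into two sections, and every query that lands in the same section receives that section's average as its prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data: three low targets, two high targets
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([10.0, 12.0, 14.0, 50.0, 52.0])

# max_depth=1 forces exactly one split, i.e. two sections
tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X, y)

# All three queries fall in the left section, so each prediction
# is the same: the average of 10, 12 and 14, which is 12.0
preds = tree.predict([[1.0], [2.5], [3.4]])
print(preds)  # [12. 12. 12.]
```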
# Decision Tree Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict([[8.3]])
# Visualising the Decision Tree Regression results (higher resolution)
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Decision Tree Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
- We have used the DecisionTreeRegressor class from the sklearn.tree module.
- regressor.fit(X, y) fits X and y to the regressor object of the DecisionTreeRegressor class.
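To see where the fitted tree places its section boundaries, sklearn's export_text can print the learned split thresholds. A minimal sketch with made-up placeholder values standing in for the columns of Position_Salaries.csv (swap in your own X and y loaded as above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Placeholder data: levels 1-10 with made-up salaries,
# standing in for the columns read from Position_Salaries.csv
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000., 50000., 60000., 80000., 110000.,
              150000., 200000., 300000., 500000., 1000000.])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# export_text prints the split thresholds, i.e. the section boundaries
tree_rules = export_text(regressor, feature_names=['Level'])
print(tree_rules)
```

Each `Level <= …` line in the printed rules is one boundary between two sections of the prediction curve.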
Execute the above lines of code and you will see the graph plotted as below.
- We can notice that the graph is not continuous.
- The prediction for level 8.3 is 170000.
- Each horizontal line is the average of all data points in the section it represents.
- Predictions are averages of the data points in each section, so the prediction for every value lying within one section is the same.
- For example, note that each horizontal line starts halfway past one level and ends just before halfway past the next. In our case, the horizontal lines run from 1.6 to 2.5, from 2.6 to 3.5, and so on.
- These horizontal lines represent the sections. If you predict the value for any data point between 7.6 and 8.5, the prediction will always be 170000 according to decision tree regression.
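You can verify the last point programmatically: the tree's apply method returns the leaf (section) id a query falls into, so queries between the boundaries around level 8 should share one leaf and one prediction. A sketch using the same made-up placeholder salaries (your actual predicted value depends on the data in Position_Salaries.csv):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Placeholder stand-in for Position_Salaries.csv: levels 1-10, made-up salaries
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000., 50000., 60000., 80000., 110000.,
              150000., 200000., 300000., 500000., 1000000.])

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)

# Three queries inside the section around level 8 (boundaries 7.5 and 8.5 here)
queries = np.array([[7.6], [8.0], [8.4]])
leaf_ids = regressor.apply(queries)   # which section each query lands in
preds = regressor.predict(queries)    # identical for all queries in one section
```

All three queries land in the same leaf, so all three predictions are identical, exactly the flat-line behavior visible in the plot.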