# Naive Bayes Classification

Hello all, welcome to another machine learning tutorial. Here, we will learn about the Naive Bayes classification model. This article is quite similar to the previous classification articles: we simply use a different Python classifier, while the data preprocessing and the graph plotting stay the same.

# Naive Bayes Classification:

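Naive Bayes classification uses Bayes' theorem to determine the class of new data points. It is called "naive" because it assumes that the features are independent of each other given the class.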

Consider our example from logistic regression, where we want to know whether a new user will buy the car or not.
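For this example, Bayes' theorem can be written out roughly as follows. The labels "buy", "age" and "salary" are just shorthand for the class and the two features used in this article:

$$
P(\text{buy} \mid \text{age}, \text{salary}) = \frac{P(\text{age}, \text{salary} \mid \text{buy}) \, P(\text{buy})}{P(\text{age}, \text{salary})}
$$

Under the naive independence assumption, the likelihood term is approximated by a product of per-feature terms:

$$
P(\text{age}, \text{salary} \mid \text{buy}) \approx P(\text{age} \mid \text{buy}) \, P(\text{salary} \mid \text{buy})
$$

The same expression is evaluated for the "not buy" class, and the new user is assigned to whichever class has the larger posterior probability.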

## Data Preprocessing:

```python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
# (train_test_split lives in sklearn.model_selection; the old
# sklearn.cross_validation module was removed from scikit-learn)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

- We read the "Social_Network_Ads.csv" file and stored it in dataset.
- Extracted age and salary information from dataset and stored it in X.
- Extracted purchase information from dataset and stored it in y.
- Split the dataset into a training set and a test set so that the machine can be trained using X_train and y_train.
- Applied feature scaling: the scaler is fitted on X_train and the same transformation is then applied to X_test.
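To make the feature scaling step more concrete, here is a minimal sketch on a tiny made-up data set. The rows in `X_train_demo` and `X_test_demo` are hypothetical values invented for illustration and are not taken from Social_Network_Ads.csv. It shows that the scaler learns the mean and standard deviation from the training rows only and then reuses them for the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, salary) rows, invented for illustration only
X_train_demo = np.array([[25.0, 30000.0],
                         [35.0, 60000.0],
                         [45.0, 90000.0],
                         [55.0, 120000.0]])
X_test_demo = np.array([[40.0, 75000.0]])

scaler = StandardScaler()
# fit_transform learns the per-column mean and standard deviation from the training rows
X_train_scaled = scaler.fit_transform(X_train_demo)
# transform reuses those training statistics; nothing is learned from the test rows
X_test_scaled = scaler.transform(X_test_demo)

print(scaler.mean_)                 # per-column means learned from the training rows
print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
print(X_test_scaled)                # test row expressed in training-set units
```

Fitting the scaler on the training set alone keeps information from the test set out of the training step.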

### Naive Bayes Classification:

```python
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
```

- Imported GaussianNB from sklearn.naive_bayes
- Created a classifier and fitted it to the training set.
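Once a classifier is fitted, it can be asked about a new user. The sketch below uses a small hypothetical set of already-scaled points (`X_demo`, `y_demo`) rather than the article's data, so the numbers are for illustration only. It shows `predict` returning a class label and `predict_proba` returning the posterior probability of each class.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical, already-scaled (age, salary) points, invented for illustration
X_demo = np.array([[-1.5, -1.0], [-1.0, -1.2], [-0.8, -0.5],
                   [0.9, 1.1], [1.2, 0.7], [1.5, 1.4]])
y_demo = np.array([0, 0, 0, 1, 1, 1])  # 0 = did not buy, 1 = bought

demo_classifier = GaussianNB()
demo_classifier.fit(X_demo, y_demo)

# A new (scaled) user that lies close to the "bought" group
new_user = np.array([[1.0, 1.0]])

predicted_class = demo_classifier.predict(new_user)            # class label per row
class_probabilities = demo_classifier.predict_proba(new_user)  # one column per class

print(predicted_class)
print(class_probabilities)
```

In the article's pipeline, a real new user would first have to be scaled with the same `sc` object before being passed to `classifier.predict`.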

### Plotting the Graph:

```python
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
aranged_ages = np.arange(start = X_set[:, 0].min(), stop = X_set[:, 0].max(), step = 0.01)
aranged_salaries = np.arange(start = X_set[:, 1].min(), stop = X_set[:, 1].max(), step = 0.01)
X1, X2 = np.meshgrid(aranged_ages, aranged_salaries)
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('orange', 'blue')))
```

- The **aranged_ages** variable holds the scaled ages of users, starting from the minimum age and going up to the maximum age in steps of 0.01.
- The **aranged_salaries** variable holds the scaled salaries of users, starting from the minimum salary and going up to the maximum salary in steps of 0.01.
- np.meshgrid() takes aranged_ages and aranged_salaries to form the coordinate grids X1 and X2.
- X1 and X2 are used to create a graph in which every grid point is classified using Naive Bayes classification. This is done with the **plt.contourf()** method.

- Naive Bayes classification draws a curved boundary that splits the plot into orange and blue sections.
- The orange section is for users who will not buy the car and the blue section is for users who will buy the car.
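The grid construction is easier to follow on a very small example. The sketch below uses tiny integer ranges (hypothetical stand-ins for the scaled ages and salaries) to show what np.meshgrid() returns and how ravel() plus a transpose turn the two grids into one row per grid point, which is the shape a classifier's predict method expects.

```python
import numpy as np

# Tiny stand-ins for the scaled age and salary ranges (illustrative values)
demo_ages = np.arange(0, 4)      # 4 values along the x axis
demo_salaries = np.arange(0, 3)  # 3 values along the y axis

# Each grid has one row per salary value and one column per age value
G1, G2 = np.meshgrid(demo_ages, demo_salaries)

# Flatten both grids and pair them up: one (age, salary) row per grid point
grid_points = np.array([G1.ravel(), G2.ravel()]).T

print(G1.shape)           # (3, 4)
print(grid_points.shape)  # (12, 2)
print(grid_points[:5])
```

After predicting a class for every row of such an array, reshaping the predictions back to the grid shape gives plt.contourf() one class value per grid cell to colour.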

### Plotting Test set:

```python
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Salary')
plt.legend()
plt.show()
```

The code above plots the actual test set data points on top of the classification regions.

- Red points denote users who did not buy the car
- Green points denote users who bought the car.

Note that we have plotted the 100 observations from our test set. Out of them:

- 7 green points fall in the orange area (buyers predicted as non-buyers)
- 3 red points fall in the blue area (non-buyers predicted as buyers)

This means that, out of 100 observations, Naive Bayes classification predicted 90 correctly and 10 incorrectly, which is an accuracy of 90%.
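Counting misplaced points on a plot works, but the same numbers can also be computed directly. The sketch below uses scikit-learn's confusion_matrix and accuracy_score on two short hypothetical label arrays (`y_true_demo` and `y_pred_demo` are invented for illustration and are not the article's test set).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels, invented for illustration: 0 = did not buy, 1 = bought
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# Rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
accuracy = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print(accuracy)
```

The off-diagonal entries of the matrix correspond to the points that land in the wrong coloured area. In this article's pipeline, the same two functions could be applied to `y_test` and `classifier.predict(X_test)`.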

I hope this helped. Happy learning 🙂