Hierarchical Clustering

Author: Renuka Joshi. I am a technology enthusiast, always up for a challenge. I have recently started getting my hands dirty with machine learning in Python and aspire to learn everything I can.

In the previous tutorial we learned how to divide a dataset into the right number of clusters using k-means clustering. In this tutorial we will look at another type of clustering: hierarchical clustering. The agglomerative form used here takes a bottom-up approach, starting from individual data points and merging them into ever larger clusters. Let's get started.

Hierarchical Clustering

The hierarchical clustering algorithm works as follows:

  • Make each data point its own single-point cluster, giving N clusters.
  • Take the two closest data points and merge them into one cluster, leaving N-1 clusters.
  • Take the two closest clusters and merge them into one cluster, leaving N-2 clusters.
  • Repeat the previous step until only one cluster remains.
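The steps above can be sketched directly in Python. This is a minimal, deliberately slow illustration, not scikit-learn's implementation. The helper name `naive_agglomerative` is hypothetical. Distances are Euclidean, and the distance between two clusters is assumed to be the distance between their closest members (single linkage), which is only one of several possible choices. The helper also stops once `n_clusters` clusters remain instead of always running down to one.

```python
import numpy as np


def naive_agglomerative(X, n_clusters=1):
    """Hypothetical helper: repeatedly merge the two closest clusters.

    Uses Euclidean distance and single linkage (distance between the
    closest pair of members). Returns one label per point plus the list
    of merges performed, in order.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: every data point starts as its own cluster -> N clusters
    clusters = [[i] for i in range(len(X))]
    merges = []  # (members of first cluster, members of second, distance)

    # Remaining steps: merge the closest pair until n_clusters remain
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # one cluster fewer
        del clusters[b]  # b > a, so index a is unaffected

    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, merges


points = [[1, 1], [1.5, 1], [8, 8], [8, 8.6], [1, 1.2]]
labels, merges = naive_agglomerative(points, n_clusters=2)
print(labels)  # [0 0 1 1 0]
for first, second, d in merges:
    print(first, "+", second, f"at distance {d:.2f}")
```

Recording each merge together with its distance is exactly the information a dendrogram draws.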

Problem Statement

We are going to use the same problem as in the k-means clustering tutorial. A mall wants to identify types of customers based on their annual income and spending score, so that it can target the right category of customers in its campaigns.

Sample dataset: the Mall Dataset, which you can download from superdatascience.com/machine-learning/

Choosing the right number of clusters

Because our dataset is the same as in the last tutorial, we already know the right number of clusters: 5. In hierarchical clustering the number of clusters can be chosen with a dendrogram, but we will not go into that in depth here. Once the number of clusters is known, we are ready to fit our dataset to the hierarchical clustering algorithm, as outlined below.
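Although the post skips dendrograms, a minimal sketch with SciPy shows the idea. It assumes SciPy's `scipy.cluster.hierarchy` module. Because the Mall Dataset itself is not reproduced here, it uses synthetic two-column data with five hypothetical groups as a stand-in for annual income and spending score. `linkage` records every merge and its height, `dendrogram` lays out those merges as a tree, and `fcluster` cuts the tree into a chosen number of flat clusters. The "largest jump in merge height" rule at the end is only a rough heuristic, not a definitive method.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for (annual income, spending score): five hypothetical groups
rng = np.random.default_rng(0)
centres = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
X = np.vstack([c + rng.normal(scale=5, size=(40, 2)) for c in centres])

# Each row of Z is one merge: [cluster i, cluster j, merge distance, new size]
Z = linkage(X, method="ward")

# Compute the dendrogram layout without drawing it; with matplotlib installed,
# call dendrogram(Z) followed by plt.show() to see the tree
tree = dendrogram(Z, no_plot=True)

# Rough heuristic: cut just before the largest jump in merge height.
# After merge i (0-based), len(X) - (i + 1) clusters remain.
heights = Z[:, 2]
k = len(X) - (int(np.argmax(np.diff(heights))) + 1)
print("suggested number of clusters:", k)

# Cut the tree into 5 flat clusters (fcluster labels start at 1)
labels = fcluster(Z, t=5, criterion="maxclust")
print(np.bincount(labels)[1:])  # cluster sizes
```

On real data the biggest jump is not always clear-cut. Reading the plotted dendrogram is usually more informative than a single number.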

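  • AgglomerativeClustering is a class from the sklearn.cluster module of the scikit-learn library.
  • We pass n_clusters=5, the number of clusters we already found with k-means clustering.
  • affinity is the metric used to compute the linkage; its default value is 'euclidean'. In scikit-learn 1.2 and later this parameter is called metric (affinity was deprecated and later removed), and ward linkage accepts only Euclidean distance.
  • linkage is the criterion that determines which distance between sets of observations to use; at each step the algorithm merges the pair of clusters that minimizes this criterion. Its default value is 'ward', which minimizes the variance of the clusters being merged; 'complete', 'average' and 'single' are the other options.
  • y_hc holds the predicted cluster label (customer type) for each customer.
  • Finally, we plot the customers, coloring each cluster differently.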
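The code described by these points can be sketched as follows. This is a minimal sketch under a few assumptions. Synthetic blobs stand in for the Mall Dataset. The commented-out pandas line shows how the real file might be loaded, but the file name and column positions are assumptions. The color list and plot labels are illustrative. The affinity/metric argument is left out because its default, Euclidean, is the only distance ward linkage accepts. Cluster numbers are arbitrary, so which color lands on which customer group may differ from the plot described below.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# With the real data (file name and column positions are assumptions):
# import pandas as pd
# X = pd.read_csv("Mall_Customers.csv").iloc[:, [3, 4]].values
# Synthetic stand-in with two features and five groups:
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=42)

# Five clusters, as found in the k-means tutorial; ward is the default linkage
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")

# fit_predict returns one cluster label (0-4) per customer
y_hc = hc.fit_predict(X)
print(np.bincount(y_hc))  # number of customers in each cluster


def plot_clusters(X, y_hc):
    """Scatter each cluster in its own color."""
    # Imported here so the clustering part runs even without matplotlib
    import matplotlib.pyplot as plt

    colors = ["red", "blue", "green", "cyan", "magenta"]
    for k, color in enumerate(colors):
        plt.scatter(X[y_hc == k, 0], X[y_hc == k, 1], s=30, c=color,
                    label=f"Cluster {k + 1}")
    plt.title("Clusters of customers")
    plt.xlabel("Annual income")
    plt.ylabel("Spending score")
    plt.legend()
    plt.show()
```

Call `plot_clusters(X, y_hc)` to draw the scatter plot. To compare merge rules, swap `linkage="ward"` for `"complete"`, `"average"` or `"single"`.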

Execute the code above and the final result will look like this:

[Figure: Hierarchical Clustering, the five customer clusters]

Observe the five clusters. They can be categorized as follows:

  • red cluster: customers with high annual income but a low spending score
  • magenta cluster: customers with low income and a low spending score
  • blue cluster: customers with moderate income and a moderate spending score
  • cyan cluster: customers with low income but a high spending score
  • green cluster: customers with high income and a high spending score

Unlike k-means, hierarchical clustering does not produce centroids, because it forms clusters by cutting (thresholding) the dendrogram, which we will cover in a future tutorial.
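Because there are no centroids, one way to turn the cluster labels into descriptions like the ones above is to average each cluster's features yourself. Below is a minimal sketch, again on synthetic data. The five group centres are hypothetical values chosen only to echo the five customer types. They are not figures from the Mall Dataset.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical (annual income, spending score) centres echoing the five types:
# high/low, low/low, moderate/moderate, low/high, high/high
centres = np.array([[85, 20], [25, 20], [55, 50], [25, 80], [85, 80]])
rng = np.random.default_rng(1)
X = np.vstack([c + rng.normal(scale=5, size=(40, 2)) for c in centres])

y_hc = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)

# AgglomerativeClustering has no cluster_centers_, so average each cluster
profiles = np.array([X[y_hc == k].mean(axis=0) for k in range(5)])
for k, (income, spending) in enumerate(profiles):
    size = int(np.sum(y_hc == k))
    print(f"cluster {k}: {size} customers, "
          f"mean income {income:.1f}, mean spending {spending:.1f}")
```

These per-cluster means describe each group after the fact. The algorithm itself never used them.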

Using these hierarchical clustering results, the mall can design its campaigns and target each customer group accordingly. I hope this article helped you understand hierarchical clustering. Happy Learning 🙂
