Countries Segmentation

Data Source: Here

Data Exploration

This project aims to group countries based on their indicators using a clustering algorithm. The data contains several variables, but only two are used in this project:

Income: Net income per person
Life Expectancy: The average number of years a newborn child is expected to live

The first step is to examine the statistical summary of both variables, shown below.
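A statistical summary of this kind can be sketched with Python's standard library alone. The rows below are hypothetical placeholder values, not the project's data, and the helper name `summarize` is an assumption made for illustration.

```python
import statistics

# Hypothetical sample rows (country, income, life expectancy).
# These values are placeholders, not taken from the project's data source.
rows = [
    ("A", 1200, 58.0),
    ("B", 4500, 66.5),
    ("C", 15000, 74.2),
    ("D", 42000, 81.3),
    ("E", 800, 61.0),
]


def summarize(values):
    """Return count, mean, sample standard deviation, min, median and max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


income_summary = summarize([row[1] for row in rows])
life_summary = summarize([row[2] for row in rows])

for name, summary in (("Income", income_summary), ("Life Expectancy", life_summary)):
    print(name, summary)
```

Comparing the minimum, median and maximum of each variable gives a quick sense of how skewed it is before any clustering is attempted.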

Model Training

The model used is K-Means clustering, so the value of K (the number of clusters) must be chosen first. A model is trained for each K from 1 to 10, the Within-Cluster Sum of Squares (WCSS) is calculated for each model, and the results are plotted against K.

On the resulting plot, the best K is the point after which the line flattens, in other words, where moving to the next K no longer reduces the WCSS significantly.
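The procedure above can be sketched as follows. This is a dependency-free illustration, not the project's code: the `kmeans` and `wcss` helpers and the synthetic points are assumptions, and income is taken to be in thousands so that both variables are on a similar scale. A library implementation such as scikit-learn's `KMeans` reports the same quantity as `inertia_`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-Cluster Sum of Squares for a fitted clustering."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Synthetic (income in thousands, life expectancy) points around three
# made-up centres; placeholder data only.
data_rng = random.Random(42)
centres = [(5, 60), (20, 70), (50, 80)]
points = [
    (data_rng.gauss(cx, 2), data_rng.gauss(cy, 2))
    for cx, cy in centres
    for _ in range(20)
]

# Train one model per K from 1 to 10 and record its WCSS.
wcss_by_k = {k: wcss(points, *kmeans(points, k)) for k in range(1, 11)}

for k in range(1, 11):
    print(k, round(wcss_by_k[k], 1))
```

Plotting `wcss_by_k` against K gives the elbow curve. Because this sketch uses a single random start per K, the curve can be slightly bumpy; library implementations typically rerun with several initialisations and keep the best result.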

The K value of 3 is selected based on the Elbow Method.

Clustering Description

Cluster 1: Countries with low to high life expectancy and low income
Cluster 2: Countries with high life expectancy and high income
Cluster 3: Countries with low life expectancy and medium income
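Descriptions like these are usually read off per-cluster statistics. The sketch below shows one way to compute them, assuming each row already carries a cluster label from the fitted model. The rows and the helper name `cluster_profiles` are hypothetical and only chosen to mirror the three descriptions above.

```python
from collections import defaultdict

# Hypothetical (income, life expectancy, cluster label) rows.
# In practice the labels would come from the fitted K-Means model.
labelled_rows = [
    (1500, 55.0, 1), (2500, 72.0, 1), (1800, 64.0, 1),
    (45000, 80.5, 2), (52000, 82.0, 2),
    (12000, 60.0, 3), (14000, 58.5, 3),
]


def cluster_profiles(rows):
    """Return size, mean income and life expectancy range per cluster."""
    groups = defaultdict(list)
    for income, life_expectancy, label in rows:
        groups[label].append((income, life_expectancy))

    profiles = {}
    for label, members in sorted(groups.items()):
        incomes = [m[0] for m in members]
        life_values = [m[1] for m in members]
        profiles[label] = {
            "size": len(members),
            "mean_income": sum(incomes) / len(incomes),
            "life_expectancy_range": (min(life_values), max(life_values)),
        }
    return profiles


profiles = cluster_profiles(labelled_rows)
for label, profile in profiles.items():
    print(f"Cluster {label}: {profile}")
```

A wide life expectancy range combined with a low mean income, as in the placeholder Cluster 1, is the kind of pattern that leads to a "low to high life expectancy and low income" description.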

Check out the source code on GitHub