Countries Segmentation


Data Source: Here

Data Exploration

This project aims to segment countries based on their parameters using a clustering algorithm. The data contains several variables, but only two are used in this project:

  • Income: Net income per person
  • Life Expectancy: The average number of years a newborn child would live

[Figure: Data Variables]

Before training, it helps to review a statistical summary of the two variables, shown below.

[Figure: Data Variables]
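
A summary like this can be produced in a few lines. The sketch below is a minimal example and assumes pandas; the column names `income` and `life_expectancy` and all values are made up for illustration and are not taken from the project data.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "income": [1200, 4500, 15000, 38000, 52000],
    "life_expectancy": [58.0, 66.5, 72.1, 79.4, 82.3],
})

# Keep only the two variables used for clustering.
features = df[["income", "life_expectancy"]]

# describe() reports count, mean, std, min, quartiles, and max per column.
summary = features.describe()
print(summary)
```

Looking at the minimum, maximum, and quartiles of each variable is a quick way to spot very different scales (income versus years), which matters for distance-based methods such as K-Means.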

Model Training

The model used is K-Means clustering, so the number of clusters (K) must be chosen first. To do this, a model is trained for each K from 1 to 10, the Within-Cluster Sum of Squares (WCSS) is calculated for each K, and the results are plotted.

[Figure: Dashboard Prototype]
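
The WCSS loop for K = 1 to 10 might look like the sketch below. It assumes scikit-learn, where the fitted model exposes WCSS as `inertia_`. The data is synthetic, and standardizing the features first is an assumption of this example, not something the project states.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (income, life_expectancy); not the project data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([2000, 60], [500, 4], size=(30, 2)),
    rng.normal([12000, 72], [2000, 3], size=(30, 2)),
    rng.normal([45000, 81], [5000, 2], size=(30, 2)),
])

# Assumption: scale both features so income does not dominate the distance.
X_scaled = StandardScaler().fit_transform(X)

# Train one model per K and record its WCSS (inertia_ in scikit-learn).
wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    wcss.append(model.inertia_)

for k, value in enumerate(wcss, start=1):
    print(f"K={k}: WCSS={value:.2f}")
```

The `wcss` list can then be passed to any plotting library (for example, K on the x-axis and WCSS on the y-axis) to produce the elbow plot.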

In the plotted result, the best K is the point after which the line flattens, in other words, where moving to the next K no longer reduces the WCSS significantly.

The K value of 3 is selected based on the Elbow Method.
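
The elbow is usually read off the plot by eye, but the same idea can be approximated in code. The sketch below is one possible heuristic, not the method used in the project: the function name `pick_elbow`, the 10% threshold, and the WCSS values are all hypothetical.

```python
def pick_elbow(wcss, threshold=0.10):
    """Return the smallest K whose WCSS drop to K+1 is below
    `threshold` times the WCSS at K=1 (a simple "line is flat" test)."""
    total = wcss[0]
    for i in range(len(wcss) - 1):
        drop = wcss[i] - wcss[i + 1]
        if drop < threshold * total:
            return i + 1  # wcss[i] belongs to K = i + 1
    return len(wcss)  # no flat segment found; fall back to the largest K

# Made-up WCSS values for K = 1..6, for illustration only.
example_wcss = [180.0, 70.0, 25.0, 20.0, 17.0, 15.0]
print(pick_elbow(example_wcss))  # prints 3
```

Here the drops from K=1 to K=2 and from K=2 to K=3 are large, while the drop from K=3 to K=4 is small, so the heuristic returns 3. The threshold is a judgment call, so checking the result against the plot is still advisable.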

[Figure: Dashboard Prototype]

Clustering Description

  • Cluster 1: Countries with low to high life expectancy and low income
  • Cluster 2: Countries with high life expectancy and high income
  • Cluster 3: Countries with low life expectancy and medium income
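
Descriptions like these can be derived by fitting the final model with K = 3 and inspecting each cluster's centroid. The sketch below assumes scikit-learn and uses synthetic data, so its centroids and cluster numbering will not match the clusters listed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (income, life_expectancy); not the project data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([2000, 60], [500, 4], size=(30, 2)),
    rng.normal([12000, 72], [2000, 3], size=(30, 2)),
    rng.normal([45000, 81], [5000, 2], size=(30, 2)),
])

# Assumption: features are standardized before clustering.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit the final model with the K chosen by the Elbow Method.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = model.labels_

# Convert centroids back to original units so they are easy to interpret.
centers = scaler.inverse_transform(model.cluster_centers_)

for cluster_id, (income, life_exp) in enumerate(centers):
    size = int(np.sum(labels == cluster_id))
    print(f"Cluster {cluster_id + 1}: n={size}, "
          f"income={income:.0f}, life expectancy={life_exp:.1f}")
```

K-Means assigns cluster numbers arbitrarily, so the labels should be mapped to descriptions such as "low income" or "high life expectancy" by looking at the centroid values, not the cluster index.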

Check out the source code on GitHub


Copyright © 2023 Giovanni Abel Christian