K means clustering in R programming

toobamukhtar · March 2, 2019, 11:23am

I am trying to write a program using K-Means clustering for the given dataset.
The following is my sample code:

install.packages("cluster.datasets")
library(cluster.datasets)
data(anime)
anime
data(mammal.dentition)

How can I find the optimal number of clusters for the above dataset using the Elbow Method, and then apply K-means?

tshrivas · March 11, 2019, 6:05pm

toobamukhtar · April 2, 2019, 9:52am

Rabeez · April 3, 2019, 10:44am

For this method, you run the clustering algorithm for a range of cluster counts, record a performance metric for each run (explained variance or the within-cluster sum of squared distances), and plot that metric against the number of clusters. You then pick the point where the curve starts to flatten (hence the name: elbow).

Here is some code accomplishing this.

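# `data` is a placeholder: replace it with the numeric columns of your
# dataset. kmeans() cannot handle character or factor columns.

# First, establish a vector to hold all values.
clusters.sum.squares <- rep(0.0, 14)

# Repeat K-means clustering with K equal to 2, 3, 4,...15.
cluster.params <- 2:15

# kmeans() starts from random centers, so fix the seed for reproducibility.
set.seed(123)

for (i in cluster.params) {
  # Cluster data using K-means with the current value of i.
  # nstart = 25 keeps the best of 25 random starts.
  kmeans.temp <- kmeans(data, centers = i, nstart = 25)
  
  # Get the total within-cluster sum of squared distances,
  # summed over all clusters, and store it for plotting later.
  clusters.sum.squares[i - 1] <- kmeans.temp$tot.withinss
}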

# Plot our scree plot using the mighty ggplot2.
library(ggplot2)
ggplot(NULL, aes(x = cluster.params, y = clusters.sum.squares)) +
  geom_point() +
  geom_line() +
  labs(x = "Number of Clusters",
       y = "Cluster Sum of Squared Distances",
       title = "Scree Plot")
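
To finish the question's last step (actually applying K-means once the elbow is chosen), here is a minimal base R sketch. The helper name elbow_wss, the use of scale(), the seed, and the choice of k = 3 are all my own assumptions for illustration, and the built-in iris data stands in for the real dataset so the example runs without extra packages.

```r
# Hypothetical helper (not from the thread): total within-cluster sum of
# squares for each candidate k, using only base R.
elbow_wss <- function(x, k_values = 1:10, nstart = 25, seed = 42) {
  # Keep numeric columns only; kmeans() cannot handle character columns.
  x <- as.data.frame(x)
  # Standardize so no single column dominates the distances (an assumption;
  # note that scale() yields NaN for constant columns).
  x <- scale(x[sapply(x, is.numeric)])
  set.seed(seed)  # kmeans() starts from random centers
  vapply(k_values, function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  }, numeric(1))
}

# Stand-in data: the built-in iris measurements (assumed for the demo only).
k_values <- 1:8
wss <- elbow_wss(iris, k_values)

# Base R elbow plot: look for the k where the curve starts to flatten.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")

# After reading the elbow off the plot, fit the final model.
# k = 3 is an illustrative choice here, not a computed optimum.
set.seed(42)
final.fit <- kmeans(scale(iris[sapply(iris, is.numeric)]),
                    centers = 3, nstart = 25)

# Cluster sizes of the final fit.
table(final.fit$cluster)
```

For the dataset in the question, something like elbow_wss(mammal.dentition) should work the same way, assuming it loads as a data frame with numeric columns next to the name column. The cluster label for each row is then in final.fit$cluster.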