We continue to cover identifying groups in multivariate data. This class focuses on cluster analysis. It is a broad topic that could easily fill most of a semester; for more depth, start by looking at:

Cluster book

Notebook files

R notebook-Cluster part 2

R notebook Rmarkdown file-Cluster part 1

Challenge

  1. Complete an agglomerative cluster analysis on the USArrests data
  2. Identify the appropriate number of clusters
  3. Create a dendrogram of the data using ggplot2
  4. Format the coloring of the dendrogram so that it matches the groups identified in step 2. It should look something like this
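
One possible starting point for the challenge steps is sketched here. It is a minimal sketch under several assumptions that are not part of the challenge itself: the variables are standardized with `scale()`, distances are Euclidean, linkage is Ward (`"ward.D2"`), the number of clusters is chosen by the largest average silhouette width over k = 2 to 8, and the `ggdendro` package is used to pull the dendrogram segments into data frames that `ggplot2` can draw. It colors the leaf labels by cluster rather than the branches; coloring the branches themselves takes additional work and is left open.

```r
library(ggplot2)
library(ggdendro)  # assumption: used only to extract dendrogram coordinates
library(cluster)   # provides silhouette()

# Step 1: agglomerative clustering on standardized variables
# (scaling, Euclidean distance, and Ward linkage are assumptions)
arrests <- scale(USArrests)
d <- dist(arrests, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Step 2: pick k by the largest average silhouette width
# (one of several reasonable criteria; the range 2:8 is arbitrary)
ks <- 2:8
avg_sil <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k = k), d)[, "sil_width"])
})
best_k <- ks[which.max(avg_sil)]
groups <- cutree(hc, k = best_k)  # named vector: state -> cluster id

# Step 3: convert the tree to data frames of segments and leaf labels
dd <- dendro_data(hc, type = "rectangle")
leaf_labels <- label(dd)

# Step 4: attach each leaf's cluster so the label color matches step 2
leaf_labels$cluster <- factor(groups[as.character(leaf_labels$label)])

p <- ggplot() +
  geom_segment(
    data = segment(dd),
    aes(x = x, y = y, xend = xend, yend = yend)
  ) +
  geom_text(
    data = leaf_labels,
    aes(x = x, y = y, label = label, colour = cluster),
    angle = 90, hjust = 1, size = 3
  ) +
  # leave room below y = 0 for the rotated state names
  scale_y_continuous(expand = expansion(mult = c(0.3, 0.05))) +
  labs(x = NULL, y = "Height", colour = "Cluster") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), panel.grid = element_blank())

print(p)
```

Changing `method` in `hclust()` or swapping the silhouette criterion for another one (for example, an elbow plot of within-cluster sums of squares) may give a different number of clusters, so it is worth comparing a few choices before settling on one.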