Clustering
LESSON 33
Clustering Algorithms
Keywords: Hierarchical, Agglomerative, Divisive, Partitional,
K-Means
[Taxonomy of clustering algorithms: hard clustering, comprising hierarchical methods (agglomerative, divisive) and partitional methods (square-error methods such as K-Means, and others), and soft clustering.]
At the top level, there is a distinction between the hard clustering paradigms (hierarchical and partitional) and the soft clustering paradigm, based on whether the clusterings generated are crisp partitions or allow overlapping membership.
The hard clustering algorithms are either hierarchical, where a nested sequence of partitions is generated, or partitional, where a single partition of the given data set is generated.
The soft clustering algorithms are based on fuzzy sets, rough sets, artificial neural nets (ANNs), or evolutionary algorithms, specifically genetic
algorithms (GAs).
Hierarchical Algorithms
The agglomerative algorithm works as follows:
1. Compute the proximity matrix of pairwise distances between the n patterns, and place each pattern in a cluster of its own.
2. Find the closest pair of clusters and merge them. Update the proximity matrix to reflect the merge.
3. If all the patterns are in one cluster, stop. Else, go to step 2.
Note that step 1 in the above algorithm requires O(n²) time to compute the pairwise similarities and O(n²) space to store them, when there are n patterns in the given collection.
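For instance, the proximity matrix of step 1 can be computed as in the following sketch, which uses SciPy's pdist and squareform; the ten 3-d points of Example 1 later in this lesson are reused here and Euclidean distance is assumed.

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Ten example patterns (the 3-d points used later in Example 1).
X = np.array([[1, 1, 1], [1, 1, 2], [1, 3, 2], [2, 1, 1], [6, 3, 1],
              [6, 4, 1], [6, 6, 6], [6, 6, 7], [6, 7, 6], [7, 7, 7]])

# pdist computes the n(n-1)/2 pairwise distances, which takes O(n^2) time;
# squareform expands them into the full n x n proximity matrix, O(n^2) space.
proximity = squareform(pdist(X, metric="euclidean"))
print(proximity.shape)   # (10, 10)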
In the single-link algorithm, the distance between two clusters C1 and C2 is the minimum of the distances d(X, Y), where X ∈ C1 and Y ∈ C2.
In the complete-link algorithm, the distance between two clusters C1 and C2 is the maximum of the distances d(X, Y), where X ∈ C1 and Y ∈ C2.
Note that the single-link algorithm constructs a minimal spanning tree with nodes located at the data points. On the other hand, the complete-link algorithm characterizes strongly connected components as clusters.
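To see how the two linkage rules behave in practice, here is a small sketch using SciPy's hierarchical-clustering routines; cutting the dendrogram into three clusters is simply an illustrative choice.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The ten 3-d patterns used in Example 1 below.
X = np.array([[1, 1, 1], [1, 1, 2], [1, 3, 2], [2, 1, 1], [6, 3, 1],
              [6, 4, 1], [6, 6, 6], [6, 6, 7], [6, 7, 6], [7, 7, 7]])

# 'single' merges on the minimum inter-cluster distance,
# 'complete' on the maximum; both follow the agglomerative loop above.
for method in ("single", "complete"):
    Z = linkage(X, method=method)                     # full merge sequence
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut into 3 clusters
    print(method, labels)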
Hierarchical clustering algorithms are computationally expensive.
The agglomerative algorithms require the computation and storage of a similarity or dissimilarity matrix, which has an O(n²) time and space requirement.
Initially, they were used in applications where hundreds of patterns
were to be clustered.
However, when the data sets are larger in size, these algorithms are not
feasible because of the non-linear time and space demands.
It may not be easy to visualize a dendrogram corresponding to 1000
patterns.
Similarly, divisive algorithms require exponential time in the number
of patterns or the number of features. So, they too do not scale up well
in the context of large-scale problems involving millions of patterns.
Partitional Clustering
Example 1
The K-means algorithm may be illustrated with the help of the following three-dimensional data set of 10 points.
(1, 1, 1)t (1, 1, 2)t (1, 3, 2)t (2, 1, 1)t (6, 3, 1)t
(6, 4, 1)t (6, 6, 6)t (6, 6, 7)t (6, 7, 6)t (7, 7, 7)t
Considering the first three points as the initial seed points of a 3-partition, the three cluster representatives and the assignment of the remaining 7 points to the nearest of these representatives, using the L1 metric as the distance measure, are shown in Table 1.

Table 1
Cluster1: (1, 1, 1)t (2, 1, 1)t
Cluster2: (1, 1, 2)t
Cluster3: (1, 3, 2)t (6, 3, 1)t (6, 4, 1)t (6, 6, 6)t (6, 6, 7)t (6, 7, 6)t (7, 7, 7)t
The centroids of the clusters are
(1.5, 1, 1)t, (1, 1, 2)t, (5.4, 5.1, 4.3)t
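These values can be verified with a few lines of NumPy, taking the Table 1 assignment above as given.

import numpy as np

# Clusters as assigned in Table 1.
cluster1 = np.array([[1, 1, 1], [2, 1, 1]])
cluster2 = np.array([[1, 1, 2]])
cluster3 = np.array([[1, 3, 2], [6, 3, 1], [6, 4, 1], [6, 6, 6],
                     [6, 6, 7], [6, 7, 6], [7, 7, 7]])

# Each centroid is the component-wise mean of the points in its cluster.
for c in (cluster1, cluster2, cluster3):
    print(np.round(c.mean(axis=0), 1))
# Prints 1.5 1 1, then 1 1 2, then 5.4 5.1 4.3.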
Assignment of the 10 patterns to the nearest of these updated centroids is shown in Table 2.

Table 2
Cluster1: (1, 1, 1)t (2, 1, 1)t
Cluster2: (1, 1, 2)t (1, 3, 2)t
Cluster3: (6, 3, 1)t (6, 4, 1)t (6, 6, 6)t (6, 6, 7)t (6, 7, 6)t (7, 7, 7)t

After this reassignment, the cluster centroids become (1.5, 1, 1)t, (1, 2, 2)t and (6.2, 5.5, 4.7)t. The K-means algorithm tries to minimize the squared-error criterion, that is, the sum, over all the clusters, of the quantities (x − mi)t (x − mi) for every pattern x in the ith cluster, where mi is the centroid of the ith cluster.
Note that the value of this criterion function is 55.5 for the resulting 3-partition. However, if we consider the initial seeds to be (1, 1, 1)t, (6, 3, 1)t and (6, 6, 6)t, then we have the assignment shown in Table 3. Further iterations do not change this assignment.
Table 3
Cluster1: (1, 1, 1)t (2, 1, 1)t (1, 1, 2)t (1, 3, 2)t
Cluster2: (6, 3, 1)t (6, 4, 1)t
Cluster3: (6, 6, 6)t (6, 6, 7)t (6, 7, 6)t (7, 7, 7)t
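The sensitivity to seed selection can be reproduced with a short, self-contained sketch of the K-means loop used in this example (L1 distance for assignment, means as representatives, the squared-error criterion for evaluation); the helper names below are purely illustrative.

import numpy as np

# The ten 3-d patterns of Example 1.
X = np.array([[1, 1, 1], [1, 1, 2], [1, 3, 2], [2, 1, 1], [6, 3, 1],
              [6, 4, 1], [6, 6, 6], [6, 6, 7], [6, 7, 6], [7, 7, 7]], dtype=float)

def kmeans(X, seeds, iters=10):
    """Assign each pattern to the nearest centroid (L1 metric), then recompute
    centroids as cluster means; assumes no cluster becomes empty."""
    centroids = np.array(seeds, dtype=float)
    for _ in range(iters):
        d = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return labels, centroids

def squared_error(X, labels, centroids):
    """Sum over clusters of (x - mi)t (x - mi)."""
    return sum(((X[labels == k] - m) ** 2).sum()
               for k, m in enumerate(centroids))

for seeds in (X[:3],           # first three points as seeds
              X[[0, 4, 6]]):   # (1,1,1), (6,3,1), (6,6,6) as seeds
    labels, centroids = kmeans(X, seeds)
    print(labels, round(squared_error(X, labels, centroids), 1))
# The first seeding converges to a clearly larger squared error than the
# second, which attains the natural grouping with a value of 8.0.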
It is best to use the K-Means algorithm when the clusters are hyperspherical. It does not generate the intended partition if the desired partition has non-spherical clusters; in such a case, a pattern may not belong to the same cluster as its nearest centroid.
Even when the clusters are spherical, it may not find the globally optimal partition, as discussed in Example 1.
Several schemes have been used to find the globally optimal partition
corresponding to the minimum value of the squared-error criterion.
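One simple and widely used scheme of this kind is to rerun K-means from several random initializations and keep the partition with the lowest squared error; this does not guarantee the global optimum, but it usually avoids poor seeds like the first choice in Example 1. As a sketch, scikit-learn's KMeans estimator does exactly this through its n_init parameter (note that it assigns patterns with Euclidean distance rather than the L1 metric used above).

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1, 1], [1, 1, 2], [1, 3, 2], [2, 1, 1], [6, 3, 1],
              [6, 4, 1], [6, 6, 6], [6, 6, 7], [6, 7, 6], [7, 7, 7]])

# n_init=10 runs K-means from 10 random initializations and keeps the run
# with the lowest within-cluster sum of squares (inertia_).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)
print(km.inertia_)   # squared-error criterion of the best run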
Assignment
1. Let us consider the points
   X1 = (1, 1)   X4 = (2, 2)   X7 = (6, 2)   X10 = (7, 6)
   Let us say that any two points belong to the same cluster if the Euclidean distance between them is less than a threshold of 5 units. Form the clusters.
2. Consider the following data set of 4 patterns. Use the Hamming distance to cluster the rows and then the columns. How is the resulting arrangement superior to the original data?
Pattern   f1   f2   f3   f4
1          1    0    0    1
2          0    1    1    0
3          0    1    1    0
4          1    0    0    1