# K-means

\(K\)-means clustering aims to partition \(n\) observations into \(k\leq n\) clusters (sets \(\mathbf{S}\)) so that each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.

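In other words, over all partitions \(\mathbf{S} = \{S_1, \dots, S_k\}\), its objective is to minimize the within-cluster sum of squares:

\[\min_{\mathbf{S}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\lVert \mathbf{x} - \boldsymbol{\mu}_i \right\rVert^2\]

where \(\boldsymbol{\mu}_i\) is the mean of the points in \(S_i\).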
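To make the objective concrete, the plain-Python sketch below evaluates it for a given assignment: it adds up the squared Euclidean distance from every point to the mean of the cluster it is assigned to. It is an illustration only, not Shogun code; `squared_euclidean` and `kmeans_objective` are hypothetical helper names, and clusters are encoded as one integer label per point.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_objective(points: Sequence[Vector], labels: Sequence[int],
                     means: Sequence[Vector]) -> float:
    """Sum of ||x - mu_i||^2 over every point x, where i = the label of x.

    labels[j] is the index i of the cluster S_i that contains points[j].
    """
    return sum(squared_euclidean(x, means[i]) for x, i in zip(points, labels))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
means = [(0.0, 1.0), (11.0, 0.0)]  # the mean of each cluster
print(kmeans_objective(points, labels, means))  # 4.0
```

Each of the four points lies at distance 1 from its cluster mean, so the objective is 4.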

See Chapter 20 in [Bar12] for a detailed introduction.

## Example

Imagine we have files with training and test data. We create CDenseFeatures (here with 64-bit floats, also known as RealFeatures) from the training file as follows:

```
features_train = RealFeatures(f_feats_train)
```

```
features_train = RealFeatures(f_feats_train);
```

```
RealFeatures features_train = new RealFeatures(f_feats_train);
```

```
features_train = Shogun::RealFeatures.new f_feats_train
```

```
features_train <- RealFeatures(f_feats_train)
```

```
features_train = shogun.RealFeatures(f_feats_train)
```

```
RealFeatures features_train = new RealFeatures(f_feats_train);
```

```
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);
```

To run CKMeans, we need to choose a distance, for example CEuclideanDistance or another subclass of CDistance. The distance is initialized with the data we want to cluster.

```
distance = EuclideanDistance(features_train, features_train)
```

```
distance = EuclideanDistance(features_train, features_train);
```

```
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);
```

```
distance = Shogun::EuclideanDistance.new features_train, features_train
```

```
distance <- EuclideanDistance(features_train, features_train)
```

```
distance = shogun.EuclideanDistance(features_train, features_train)
```

```
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);
```

```
auto distance = some<CEuclideanDistance>(features_train, features_train);
```
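A distance object compares vectors from a left-hand and a right-hand feature set; passing `features_train` twice means distances are taken between pairs of training vectors. The plain-Python sketch below is a rough illustration of what the Euclidean distance computes in that setting, not Shogun's implementation; `euclidean` and `distance_matrix` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(lhs: Sequence[Vector],
                    rhs: Sequence[Vector]) -> List[List[float]]:
    """Entry [i][j] is the distance between lhs[i] and rhs[j]."""
    return [[euclidean(x, y) for y in rhs] for x in lhs]


train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = distance_matrix(train, train)  # same set on both sides
print(d[0][1], d[0][2])  # 5.0 10.0
```

With the same set on both sides the matrix is symmetric and has zeros on its diagonal.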

Once we have chosen a distance, we create an instance of the CKMeans classifier. We explicitly set \(k\), the number of clusters we expect, to 2 and pass it to CKMeans together with the distance. In this example, we apply Lloyd's method for \(k\)-means clustering.

```
kmeans = KMeans(2, distance)
```

```
kmeans = KMeans(2, distance);
```

```
KMeans kmeans = new KMeans(2, distance);
```

```
kmeans = Shogun::KMeans.new 2, distance
```

```
kmeans <- KMeans(2, distance)
```

```
kmeans = shogun.KMeans(2, distance)
```

```
KMeans kmeans = new KMeans(2, distance);
```

```
auto kmeans = some<CKMeans>(2, distance);
```

Then we train the model:

```
kmeans.train()
```

```
kmeans.train();
```

```
kmeans.train();
```

```
kmeans.train
```

```
kmeans$train()
```

```
kmeans:train()
```

```
kmeans.train();
```

```
kmeans->train();
```
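Lloyd's method alternates two steps until the cluster assignments stop changing: assign every point to its nearest center, then move every center to the mean of its assigned points. The plain-Python sketch below illustrates that loop; it is not Shogun's code. Seeding the centers with the first \(k\) points is a simplifying assumption made so the example is deterministic (implementations commonly pick initial centers at random), and `lloyd` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lloyd(points: Sequence[Vector], k: int,
          max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; initial centers are the first k points (assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda i: squared_euclidean(x, centers[i]))
                      for x in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
centers, labels = lloyd(points, k=2)
print(centers, labels)  # [[0.0, 0.5], [10.0, 10.5]] [0, 1, 0, 1]
```

Neither step can increase the objective defined earlier, which is why the loop is guaranteed to stop, although possibly at a local minimum that depends on the initial centers.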

We can extract the center and radius of each cluster:

```
c = kmeans.get_cluster_centers()
r = kmeans.get_radiuses()
```

```
c = kmeans.get_cluster_centers();
r = kmeans.get_radiuses();
```

```
DoubleMatrix c = kmeans.get_cluster_centers();
DoubleMatrix r = kmeans.get_radiuses();
```

```
c = kmeans.get_cluster_centers
r = kmeans.get_radiuses
```

```
c <- kmeans$get_cluster_centers()
r <- kmeans$get_radiuses()
```

```
c = kmeans:get_cluster_centers()
r = kmeans:get_radiuses()
```

```
double[,] c = kmeans.get_cluster_centers();
double[] r = kmeans.get_radiuses();
```

```
auto c = kmeans->get_cluster_centers();
auto r = kmeans->get_radiuses();
```
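The sketch below shows one common notion of a cluster radius: the largest distance from a cluster's center to any point assigned to it. This definition is an assumption made for illustration; Shogun's `get_radiuses` may compute its radii differently, so check the CKMeans API reference before relying on it. The sketch also stores one list of coordinates per center, whereas the matrix returned by `get_cluster_centers` may use another layout. `cluster_radii` is a hypothetical helper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cluster_radii(points: Sequence[Vector], labels: Sequence[int],
                  centers: Sequence[Vector]) -> List[float]:
    """Radius of cluster i = max distance from centers[i] to a point labeled i.

    One common definition, assumed here; a cluster with no points gets 0.0.
    """
    radii = [0.0] * len(centers)
    for x, i in zip(points, labels):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centers[i])))
        radii[i] = max(radii[i], dist)
    return radii


points = [(0.0, 0.0), (0.0, 2.0), (7.0, -4.0), (13.0, 4.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 0.0)]  # the mean of each cluster
print(cluster_radii(points, labels, centers))  # [1.0, 5.0]
```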

Shogun also supports mini-batch \(k\)-means clustering through CKMeansMiniBatch. We create an instance with the number of clusters and the distance, and then provide the batch size and the number of iterations with set_mb_params:

```
kmeans_mb = KMeansMiniBatch(2, distance)
kmeans_mb.set_mb_params(4, 1000)
```

```
kmeans_mb = KMeansMiniBatch(2, distance);
kmeans_mb.set_mb_params(4, 1000);
```

```
KMeansMiniBatch kmeans_mb = new KMeansMiniBatch(2, distance);
kmeans_mb.set_mb_params(4, 1000);
```

```
kmeans_mb = Shogun::KMeansMiniBatch.new 2, distance
kmeans_mb.set_mb_params 4, 1000
```

```
kmeans_mb <- KMeansMiniBatch(2, distance)
kmeans_mb$set_mb_params(4, 1000)
```

```
kmeans_mb = shogun.KMeansMiniBatch(2, distance)
kmeans_mb:set_mb_params(4, 1000)
```

```
KMeansMiniBatch kmeans_mb = new KMeansMiniBatch(2, distance);
kmeans_mb.set_mb_params(4, 1000);
```

```
auto kmeans_mb = some<CKMeansMiniBatch>(2, distance);
kmeans_mb->set_mb_params(4, 1000);
```
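Instead of reassigning every point on each pass, mini-batch \(k\)-means repeatedly samples a small batch, finds each sampled point's nearest center, and nudges that center toward the point with a per-center learning rate of \(1/\text{count}\), so each center ends up as the running mean of the points assigned to it. The plain-Python sketch below illustrates this standard scheme under the assumption that `batch_size=4` and `iterations=1000` correspond to the `set_mb_params(4, 1000)` call above; it is not Shogun's implementation, and `minibatch_kmeans` is a hypothetical name.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def minibatch_kmeans(points: Sequence[Vector], k: int, batch_size: int,
                     iterations: int, seed: int = 0) -> List[List[float]]:
    """Mini-batch k-means with per-center learning rate 1 / (points assigned so far)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]  # random initial centers
    counts = [0] * k
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Find every sampled point's nearest center before moving any center.
        nearest = [min(range(k), key=lambda i: squared_euclidean(x, centers[i]))
                   for x in batch]
        for x, i in zip(batch, nearest):
            counts[i] += 1
            eta = 1.0 / counts[i]
            # Move center i a fraction eta of the way toward x.
            centers[i] = [(1 - eta) * c + eta * xc for c, xc in zip(centers[i], x)]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (20.0, 20.0), (20.0, 21.0), (21.0, 20.0), (21.0, 21.0)]
centers = minibatch_kmeans(points, k=2, batch_size=4, iterations=1000)
print(centers)
```

Each iteration touches only `batch_size` points, which is what makes the method attractive for large data sets; the price is that the centers are noisy estimates rather than exact cluster means.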

Then train the model and extract the cluster centers and radii as described above.