With gcc and Make installed, building the program should be as easy as running make. This creates an executable named "clbin".
Help menu. Each argument and its value must be separated by a space:
-h - help menu
-f - input data points file name
-o - output file name
-ci - centers input file name
-a - algorithm choice (lloyd, elkan, hamerly, macqueen, hartigan, closest)
-i - initialization method choice (kpp, forgy, partition, furthest, firstn)
-m - metric choice (euclidean, manhattan)
-k - cluster count; if -ci is also given, -ci takes priority
-s - random seed number; if omitted, the current time is used as the seed. Useful for reproducing clustering results
-n - iteration count (default 100)
Initialization methods (-i)
- k-means++
- Forgy
- Partition (assigns each point to a random cluster, then uses the means of these clusters as the initial centers)
- Furthest first
- firstn (chooses the first k points from the input file as the initial cluster centers)
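The Partition method described above can be sketched roughly as follows. This is only an illustration, not code from clbin: the function name partition_init, the row-major array layout, the use of rand()/srand(), and leaving an empty cluster's center at the origin are all assumptions.

```c
#include <stdlib.h>

/* Hypothetical helper, not taken from the clbin source.
 * points:  n * d doubles, row-major (point i starts at points[i * d])
 * centers: k * d doubles, filled in by this function
 * Returns 0 on success, -1 if memory allocation fails. */
int partition_init(const double *points, size_t n, size_t d,
                   size_t k, unsigned seed, double *centers)
{
    size_t *counts = calloc(k, sizeof *counts);
    if (counts == NULL)
        return -1;

    for (size_t i = 0; i < k * d; i++)
        centers[i] = 0.0;

    srand(seed);
    for (size_t i = 0; i < n; i++) {
        size_t c = (size_t)rand() % k;   /* random cluster for point i */
        counts[c]++;
        for (size_t j = 0; j < d; j++)
            centers[c * d + j] += points[i * d + j];
    }

    /* Turn the per-cluster sums into means. */
    for (size_t c = 0; c < k; c++) {
        if (counts[c] == 0)
            continue;   /* assumption: an empty cluster keeps a zero center */
        for (size_t j = 0; j < d; j++)
            centers[c * d + j] /= (double)counts[c];
    }

    free(counts);
    return 0;
}
```

With k = 1 every point lands in the same cluster, so the single center is simply the mean of all points, which is a quick way to sanity-check the sketch.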
Algorithms (-a)
- Lloyd
- Elkan
- Hamerly
- MacQueen
- Hartigan-Wong
- Closest (assigns each point to its closest center and stops)
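As an illustration of the simplest entry in this list, the Closest step might look like the sketch below. It is not the clbin implementation: the names assign_closest and squared_distance are hypothetical, the data is assumed to be stored row-major, and squared Euclidean distance is used for the comparison (it picks the same nearest center as the Euclidean metric). It assumes k is at least 1.

```c
#include <stddef.h>

/* Squared Euclidean distance between two d-dimensional vectors. */
static double squared_distance(const double *a, const double *b, size_t d)
{
    double sum = 0.0;
    for (size_t j = 0; j < d; j++) {
        double diff = a[j] - b[j];
        sum += diff * diff;
    }
    return sum;
}

/* Hypothetical helper: writes into labels[i] the index of the center
 * closest to point i. Ties go to the center with the lower index. */
void assign_closest(const double *points, size_t n, size_t d,
                    const double *centers, size_t k, size_t *labels)
{
    for (size_t i = 0; i < n; i++) {
        size_t best = 0;
        double best_dist = squared_distance(&points[i * d], &centers[0], d);
        for (size_t c = 1; c < k; c++) {
            double dist = squared_distance(&points[i * d], &centers[c * d], d);
            if (dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }
        labels[i] = best;
    }
}
```

The iterative algorithms in the list repeat an assignment step of this kind and then update the centers; Closest performs only the assignment.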
Metrics (-m)
- Euclidean
- Manhattan
It should be rather easy to add more metrics.
Input file format
- The first line consists of two integers: the point count n and the data dimensionality d. It is followed by n rows with d doubles on each.
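A file in this format could be read with something like the sketch below. The function name read_points is hypothetical, and the sketch assumes the numbers are separated by whitespace; it is not the parser used by clbin.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reader: header line "n d", then n rows of d doubles.
 * Returns a malloc'd row-major array of n * d doubles (caller frees),
 * or NULL if the file cannot be opened or is malformed. */
double *read_points(const char *path, size_t *n_out, size_t *d_out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t n, d;
    if (fscanf(fp, "%zu %zu", &n, &d) != 2 || n == 0 || d == 0) {
        fclose(fp);
        return NULL;
    }

    double *data = malloc(n * d * sizeof *data);
    if (data == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fscanf skips spaces and newlines, so rows need no special handling. */
    for (size_t i = 0; i < n * d; i++) {
        if (fscanf(fp, "%lf", &data[i]) != 1) {
            free(data);
            fclose(fp);
            return NULL;
        }
    }

    fclose(fp);
    *n_out = n;
    *d_out = d;
    return data;
}
```

For example, a file containing "2 3" on the first line, then "1 2 3" and "4.5 5 6", describes two points in three dimensions.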
Outputs 2 files:
- A file listing, for each point, the number of the cluster it belongs to
- A file containing the mean vectors. The first row holds an integer giving the number of means that follow.
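The two output files could be produced along the lines of the sketch below. The exact layout clbin uses is not specified here, so several details are assumptions: one cluster number per line, cluster numbers starting at 0, one mean per row with space-separated components, and the hypothetical function names write_labels and write_means.

```c
#include <stdio.h>

/* Hypothetical writer: one cluster number per line, in point order.
 * Returns 0 on success, -1 on an I/O error. */
int write_labels(const char *path, const size_t *labels, size_t n)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        fprintf(fp, "%zu\n", labels[i]);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical writer: first row is the number of means k, followed by
 * k rows with d space-separated doubles each (means is row-major). */
int write_means(const char *path, const double *means, size_t k, size_t d)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%zu\n", k);
    for (size_t c = 0; c < k; c++) {
        for (size_t j = 0; j < d; j++)
            fprintf(fp, j + 1 < d ? "%.17g " : "%.17g\n", means[c * d + j]);
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

The "%.17g" format is chosen so that doubles survive a write and read round trip without losing precision.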