smallDataIndex

This package runs the Weka k-means clustering algorithm together with a complete set of cluster validity indices that help you decide the optimal number of clusters for a dataset. The package includes the following indices:

Silhouette
Dunn
BD-Silhouette [1]
BD-Dunn [1]
Davies-Bouldin
Calinski-Harabasz
MaximumDiameter
SquaredDistance
AverageDistance
AverageBetweenClusterDistance
MinimumDistance
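To illustrate what one of these indices measures, here is a minimal sketch of the standard mean Silhouette index. It is not the package's own implementation. It assumes Euclidean distance and hard cluster labels, and follows the usual convention that points in singleton clusters score 0. For each point i, a(i) is the mean distance to the other members of its cluster, b(i) is the mean distance to the nearest other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). Values close to 1 indicate compact, well-separated clusters.

```java
/**
 * Minimal sketch of the standard mean Silhouette index (not the package's code).
 * Assumes Euclidean distance and hard labels in the range 0..k-1.
 */
public class SilhouetteSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - q[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mean silhouette over all points; labels[i] is the cluster of points[i].
    static double silhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double[] sumDist = new double[k]; // summed distance from point i to each cluster
            int[] count = new int[k];         // number of other points in each cluster
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                sumDist[labels[j]] += distance(points[i], points[j]);
                count[labels[j]]++;
            }
            int own = labels[i];
            if (count[own] == 0) continue;            // singleton cluster: s(i) = 0
            double a = sumDist[own] / count[own];     // mean intra-cluster distance
            double b = Double.POSITIVE_INFINITY;      // mean distance to the nearest other cluster
            for (int c = 0; c < k; c++) {
                if (c != own && count[c] > 0) {
                    b = Math.min(b, sumDist[c] / count[c]);
                }
            }
            if (Double.isInfinite(b)) continue;       // only one non-empty cluster: s(i) = 0
            double denom = Math.max(a, b);
            if (denom > 0) {
                total += (b - a) / denom;
            }
        }
        return total / n;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {10, 0}, {10, 1} };
        int[] labels = { 0, 0, 1, 1 };
        System.out.println(silhouette(points, labels, 2));
    }
}
```

In the example, the two tight, distant pairs give a mean silhouette of about 0.90. Swapping the labels so that each cluster mixes the two groups drives the score below zero. The BD-Silhouette variant from [1] targets big data and is not reproduced here.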

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

This is a Maven project that depends on OpenCSV 2.4 and Weka 3.8.0. Both dependencies are declared in the repository's pom.xml file:

<dependency>
    <groupId>au.com.bytecode</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.4</version>
</dependency>

        
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

Running

WekaCluster is the main class. It takes 5 arguments, which can either be set in the code or passed on the command line when running the jar file:

Argument 0: minNumCluster: the minimum number of clusters to test.
Argument 1: maxNumCluster: the maximum number of clusters to test.
Argument 2: pathFile: the path of the input dataset, including the file name.
Argument 3: outFile: the path of the output results file, including the file name.
Argument 4: selector: SMALLDATA, BIGDATA or ALL. SMALLDATA runs only the small-data indices, BIGDATA runs only the BD-Silhouette and BD-Dunn indices from [1], and ALL runs both sets.

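int minNumCluster = 2;                   // Argument 0: smallest number of clusters to test
int maxNumCluster = 10;                  // Argument 1: largest number of clusters to test
int selector = SMALLDATA;                // Argument 4: SMALLDATA, BIGDATA or ALL

String fileName = "SmallDataset.csv";
String folderFile = "C:\\datasets\\";    // backslashes must be escaped in Java string literals
String pathFile = folderFile + fileName;                // Argument 2: input dataset path
String outFile = getFileNameOutput(selector, fileName); // Argument 3: output results file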

With this configuration, the application loads the file SmallDataset.csv from "C:\datasets" and saves the results as "Results-SmallDataset.csv" in the application folder.
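The naming convention just described (input "SmallDataset.csv" becomes "Results-SmallDataset.csv" in the working folder) can be sketched with java.nio.file. This is a hypothetical helper, not the project's getFileNameOutput, which also receives the selector and may use it in ways not documented here. The names inputPath and resultPath are illustrative assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch of the input/output naming convention described in the README.
 * Not the project's getFileNameOutput; the selector argument is ignored here.
 */
public class OutputNameSketch {

    // Joins a folder and a file name using the platform's path rules.
    static Path inputPath(String folder, String fileName) {
        return Paths.get(folder, fileName);
    }

    // Prefixes the bare file name with "Results-" and drops any directories,
    // so the result is resolved relative to the current (application) folder.
    static Path resultPath(String fileName) {
        String base = Paths.get(fileName).getFileName().toString();
        return Paths.get("Results-" + base);
    }

    public static void main(String[] args) {
        System.out.println(inputPath("datasets", "SmallDataset.csv"));
        System.out.println(resultPath("SmallDataset.csv"));
    }
}
```

Using Paths.get instead of string concatenation avoids hard-coding a Windows-style separator such as "C:\\datasets\\".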

Execution example

To run the application from a terminal, pass the five arguments to the jar file:

java -jar smallDataIndices.jar 2 10 C:/datasets/SmallDataset.csv Results.csv ALL
java -jar smallDataIndices.jar 10 20 datasets/dataset.csv results.csv SMALLDATA
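The commands above pass the five arguments positionally. The sketch below shows how they could be parsed and how the selector could be mapped to the indices it runs. It is hypothetical and is not the WekaCluster source: RunConfig, Selector, parse and indicesFor are illustrative names. It also assumes that the small-data set is every listed index except BD-Silhouette and BD-Dunn, and that at least 2 clusters are required.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch (not the project's WekaCluster code) of parsing the five
 * positional arguments and mapping the selector to the indices it runs.
 */
public class ArgsSketch {

    // The three selector values described in this README.
    enum Selector { SMALLDATA, BIGDATA, ALL }

    // Assumption: every listed index except the two BD-* ones is a small-data index.
    static final List<String> SMALL_DATA_INDICES = Arrays.asList(
            "Silhouette", "Dunn", "Davies-Bouldin", "Calinski-Harabasz", "MaximumDiameter",
            "SquaredDistance", "AverageDistance", "AverageBetweenClusterDistance", "MinimumDistance");
    static final List<String> BIG_DATA_INDICES = Arrays.asList("BD-Silhouette", "BD-Dunn");

    // Immutable holder for the parsed arguments (hypothetical name).
    static final class RunConfig {
        final int minNumCluster;
        final int maxNumCluster;
        final String pathFile;
        final String outFile;
        final Selector selector;

        RunConfig(int minNumCluster, int maxNumCluster, String pathFile, String outFile, Selector selector) {
            this.minNumCluster = minNumCluster;
            this.maxNumCluster = maxNumCluster;
            this.pathFile = pathFile;
            this.outFile = outFile;
            this.selector = selector;
        }
    }

    // Parses args in the documented order: min, max, pathFile, outFile, selector.
    static RunConfig parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                    "Usage: <minNumCluster> <maxNumCluster> <pathFile> <outFile> <SMALLDATA|BIGDATA|ALL>");
        }
        int min = Integer.parseInt(args[0]);
        int max = Integer.parseInt(args[1]);
        // Assumed validation: indices such as Silhouette need at least 2 clusters.
        if (min < 2 || max < min) {
            throw new IllegalArgumentException("Expected 2 <= minNumCluster <= maxNumCluster");
        }
        // Enum.valueOf throws IllegalArgumentException for an unknown selector name.
        Selector selector = Selector.valueOf(args[4].toUpperCase(Locale.ROOT));
        return new RunConfig(min, max, args[2], args[3], selector);
    }

    // Returns the index names a selector runs, following the README's description.
    static List<String> indicesFor(Selector selector) {
        switch (selector) {
            case SMALLDATA:
                return SMALL_DATA_INDICES;
            case BIGDATA:
                return BIG_DATA_INDICES;
            default: // ALL: small-data indices followed by the BD indices
                List<String> all = new ArrayList<>(SMALL_DATA_INDICES);
                all.addAll(BIG_DATA_INDICES);
                return all;
        }
    }

    public static void main(String[] args) {
        RunConfig config = parse(args);
        // One line per tested number of clusters, listing the indices that would run.
        for (int k = config.minNumCluster; k <= config.maxNumCluster; k++) {
            System.out.println("k=" + k + " -> " + indicesFor(config.selector));
        }
    }
}
```

Running it with the first example's arguments (2 10 C:/datasets/SmallDataset.csv Results.csv ALL) would print one line for each k from 2 to 10, each listing all eleven indices.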

Built With

Weka - k-means clustering.
Maven - Dependency management.
EmergentOrder - CSV file handling.

Authors

José María Luna - Initial work - José María Luna Romera

Acknowledgments

University of Waikato

References

[1] Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3

Please cite as: Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3 (also available at https://link.springer.com/article/10.1007%2Fs13748-017-0135-3)
