Skip to content

Latest commit

 

History

History
88 lines (63 loc) · 3.29 KB

README.md

File metadata and controls

88 lines (63 loc) · 3.29 KB

smallDataIndex

This package contains Weka Cluster algorithm with a complete list of indices that will help you to decide the optimal number of clusters that the dataset could have. The package includes the followings indices:

  • Silhouette
  • Dunn
  • BD-Silhouette [1]
  • BD-Dunn [1]
  • Davies-Bouldin
  • Calinski-Harabasz
  • MaximumDiameter
  • SquaredDistance
  • AverageDistance
  • AverageBetweenClusterDistance
  • MinimumDistance.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

This is a Maven Project with OpenCSV 2.4 and Weka 3.8.0 dependencies. Both of them are included in the pom.xml file in the repository.

<dependency>
    <groupId>au.com.bytecode</groupId>
    <artifactId>opencsv</artifactId>
    <version>2.4</version>
</dependency>

        
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

Running

WekaCluster is the main class, and it includes 4 arguments that can be set from the code or directly when you execute the jar file:

  • Argument 0: minNumCluster: it is the minimum cluster number that the dataset is going to be tested.
  • Argument 1: maxNumCluster: it is the maximum cluster number that the dataset is going to be tested.
  • Argument 2: pathFile: it is the path of the input dataset. It must includes the complete pathfile.
  • Argument 3: outFile: it is the path of the output result file. It must includes the complete pathfile.
  • Argument 4: selector: you can choose between SMALLDATA, BIGDATA and ALL. SMALLDATA only execute smalldata indices. BIGDATA just execute BD-Silhouette and BD-Dunn indices from [1]. And ALL execute SMALLDATA and BIGDATA indices.
int minNumCluster = 2;
int maxNumCluster = 10;
int selector = SMALLDATA;

String fileName = "SmallDataset.csv";
String folderFile = "C:\\datasets\\";
String pathFile = folderFile + fileName;
String outFile = getFileNameOutput(selector, fileName);

For this configuration the application load a file called SmallDataset.csv in "C:/datasets" and the result file will be saved as "Results-SmallDataset.csv"in the application folder.

Execution example

If we preffer executing in a terminal using java we just have to:

java -jar smallDataIndices.jar 2 10 C:/datasets/SmallDataset.csv Results.csv ALL
java -jar smallDataIndices.jar 10 20 datasets/dataset.csv results.csv SMALLDATA

Built With

Authors

Acknowledgments

  • University of Waikato

References

[1] Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3

Please, cite as: Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M. et al. Prog Artif Intell (2017). https://doi.org/10.1007/s13748-017-0135-3 (https://link.springer.com/article/10.1007%2Fs13748-017-0135-3)