# Clustering benchmarks ## Datasets This project contains collection of labeled clustering problems that can be found in the literature. Most of datasets were artificially created. The benchmark includes: * [artificial datasets](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial) * [real world datasets](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/real-world) ### Artificial data

## Experiments This project contains set of clustering methods benchmarks on various dataset. The project is dependent on [Clueminer project](https://github.com/deric/clueminer). in order to run benchmark compile dependencies into a single JAR file: mvn assembly:assembly # Consensus experiment allows running repeated runs of the same algorithm: ``` ./run consensus --dataset "triangle1" --repeat 10 ``` by default k-means algorithm is used. For available datasets see [resources folder](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial).