# Clustering benchmarks ## Datasets This project contains collection of labeled clustering problems that can be found in the literature. Most of datasets were artificially created. The benchmark includes: * [artificial datasets](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial) * [real world datasets](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/real-world) ### Artificial data 2d-10c 2d-20c-no0 2d-3c-no123 2d-4c-no4 2d-4c-no9 2d-4c 2sp2glob 3-spiral 3MC D31 DS577 DS850 R15 aggregation atom banana birch-rg1 birch-rg2 birch-rg3 chainlink cluto-t4.8k cluto-t5.8k cluto-t7.10k cluto-t8.8k complex8 complex9 compound cure-t0-2000n-2D cure-t1-2000n-2D cure-t2-4k curves1 curves2 dartboard1 dartboard2 dense-disk-3000 dense-disk-5000 diamond9 disk-1000n disk-3000n disk-4000n disk-4500n disk-4600n disk-5000n disk-6000n donut1 donut2 donut3 donutcurves ds2c2sc13 ds3c3sc6 ds4c2sc8 elliptical_10_2 elly-2d10c13s engytime flame fourty golfball hepta insect jain long1 long2 long3 longsquare lsun mopsi-finland mopsi-joensuu pathbased rings s-set1 s-set2 s-set3 s-set4 sizes1 sizes2 sizes3 sizes4 sizes5 smile1 smile2 smile3 spherical_4_3 spherical_5_2 spherical_6_2 spiral spiralsquare square1 square2 square3 square4 square5 st900 target tetra triangle1 triangle2 twenty twodiamonds wingnut xclara zelnik1 zelnik2 zelnik3 zelnik4 zelnik5 zelnik6 ## Experiments This project contains set of clustering methods benchmarks on various dataset. The project is dependent on [Clueminer project](https://github.com/deric/clueminer). in order to run benchmark compile dependencies into a single JAR file: mvn assembly:assembly # Consensus experiment allows running repeated runs of the same algorithm: ``` ./run consensus --dataset "triangle1" --repeat 10 ``` by default k-means algorithm is used. For available datasets see [resources folder](https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial).