Spark clustering algorithms

Implementation of the DBSCAN and K-means clustering algorithms in Scala using the Spark framework. The algorithms handle only two-dimensional (x, y) data.

DBSCAN

Program arguments: <input_file> <min_points_in_cluster> <epsilon>
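For orientation, the following is a minimal sequential sketch of DBSCAN on 2D points in plain Scala (no Spark). It shows how the two numeric arguments could be used. It assumes that <min_points_in_cluster> acts as the standard minPts core-point threshold, with a point counting itself as a neighbour, and that distance is Euclidean. The object, method and label names (DbscanSketch, run, Noise, Unvisited) are hypothetical and are not the repository's code.

```scala
import scala.collection.mutable

// Hypothetical standalone sketch; not the repository's Spark implementation.
object DbscanSketch {
  final case class Point(x: Double, y: Double)

  val Noise = -1     // label for points that belong to no cluster
  val Unvisited = -2 // label for points not yet processed

  // Euclidean distance in the plane (assumed metric).
  def distance(a: Point, b: Point): Double =
    math.hypot(a.x - b.x, a.y - b.y)

  // Returns one label per point: a cluster id (0, 1, ...) or Noise.
  def run(points: IndexedSeq[Point], minPoints: Int, epsilon: Double): Array[Int] = {
    val labels = Array.fill(points.length)(Unvisited)

    // All points within epsilon of point i, including i itself.
    def neighbours(i: Int): IndexedSeq[Int] =
      points.indices.filter(j => distance(points(i), points(j)) <= epsilon)

    var clusterId = 0
    for (i <- points.indices) {
      if (labels(i) == Unvisited) {
        val seeds = neighbours(i)
        if (seeds.size < minPoints) {
          labels(i) = Noise // may later be relabelled as a border point
        } else {
          // i is a core point: grow a new cluster from its neighbourhood.
          labels(i) = clusterId
          val queue = mutable.Queue(seeds: _*)
          while (queue.nonEmpty) {
            val j = queue.dequeue()
            if (labels(j) == Noise) labels(j) = clusterId // border point
            else if (labels(j) == Unvisited) {
              labels(j) = clusterId
              val more = neighbours(j)
              if (more.size >= minPoints) queue ++= more // j is also a core point
            }
          }
          clusterId += 1
        }
      }
    }
    labels
  }
}
```

With minPoints = 3 and epsilon = 1.0, two tight triples near (0, 0) and (10, 10) become clusters 0 and 1, and an isolated point at (5, 5) is labelled as noise. The neighbour search here is a naive O(n²) scan. A Spark version would distribute this work instead.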

KMeans

Program arguments: <input_file> <number_of_clusters> <converge_distance>
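A similarly hedged sketch of Lloyd's k-means iteration for 2D points in plain Scala shows how <number_of_clusters> (k) and <converge_distance> could drive the loop. It rests on three assumptions. The initial centroids are the first k points. Convergence means that no centroid moved farther than converge_distance in one iteration. A maxIter cap guards against non-termination. The repository's actual initialization and convergence test are not documented here and may differ. KMeansSketch and its members are hypothetical names.

```scala
// Hypothetical standalone sketch; not the repository's Spark implementation.
object KMeansSketch {
  final case class Point(x: Double, y: Double)

  def distance(a: Point, b: Point): Double =
    math.hypot(a.x - b.x, a.y - b.y)

  // Index of the centroid nearest to p.
  def closest(p: Point, centroids: IndexedSeq[Point]): Int =
    centroids.indices.minBy(i => distance(p, centroids(i)))

  // Lloyd's iterations: stop once no centroid moves more than convergeDistance.
  def run(points: IndexedSeq[Point], k: Int, convergeDistance: Double,
          maxIter: Int = 100): IndexedSeq[Point] = {
    require(k > 0 && points.size >= k, "need at least k points")
    var centroids = points.take(k) // assumed initialization: first k points
    var moved = Double.PositiveInfinity
    var iter = 0
    while (moved > convergeDistance && iter < maxIter) {
      // Assignment step: group points by their nearest centroid.
      val groups = points.groupBy(p => closest(p, centroids))
      // Update step: move each centroid to the mean of its group.
      val updated = centroids.indices.map { i =>
        groups.get(i) match {
          case Some(ps) => Point(ps.map(_.x).sum / ps.size, ps.map(_.y).sum / ps.size)
          case None     => centroids(i) // an empty cluster keeps its old centroid
        }
      }
      moved = centroids.zip(updated).map { case (a, b) => distance(a, b) }.max
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

For the points (0, 0), (0, 1), (10, 10) and (10, 11) with k = 2, the loop settles on centroids (0, 0.5) and (10, 10.5).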

Dataset

A sample dataset file, data.txt, is included.

Running

  • When launching on a cluster, refer to the official Spark documentation.
  • To run on a local machine, use the -Dspark.master=local VM option.