Hi, while reading the code in DBSCAN算法的Spark实现.ipynb there is one part I don't quite understand, and I hope you can clarify it. In step 8 of the notebook:
```scala
/*================================================================================*/
// Step 8: find the representative core point and the element count of each cluster
/*================================================================================*/
....
val rdd_result = rdd_cluster.reduceByKey((a, b) => {
  val id_set = a._3 | b._3   // <- this is the line I don't understand
  val result = if (a._2 >= b._2) (a._1, a._2, id_set) else (b._1, b._2, id_set)
  result
})
...
```
Here, rdd_cluster is merged by key with reduceByKey. But since rdd_cluster already holds the clustering result, its keys should all be unique, with no duplicates, so I think this reduceByKey step does nothing. Is that right? Second, if my first assumption is wrong and the reduceByKey logic does run, then inside the reduce function the neighbor sets of records belonging to the same cluster are merged with the union operator |. Why is the neighbor count result._2 taken as Max(a._2, b._2) rather than the size of id_set?
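For reference, here is a minimal, self-contained sketch of what I mean (not taken from the notebook; the tuple layout `(core_point_id, neighbor_count, neighbor_id_set)` and the sample values are my own assumptions based on the snippet above). It shows how a reduce function of this shape behaves when two records share a key versus when a key is unique:

```scala
import org.apache.spark.sql.SparkSession

object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ReduceByKeyDemo")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical records keyed by cluster id; the value is assumed to be
    // (core_point_id, neighbor_count, neighbor_id_set), mirroring the snippet above.
    val rdd_cluster = sc.parallelize(Seq(
      (1L, (10L, 3, Set(10L, 11L, 12L))),
      (1L, (11L, 2, Set(11L, 13L))),           // same key: the merge function runs
      (2L, (20L, 4, Set(20L, 21L, 22L, 23L)))  // unique key: value passes through unchanged
    ))

    val rdd_result = rdd_cluster.reduceByKey((a, b) => {
      val id_set = a._3 | b._3   // union of the two neighbor id sets
      val result = if (a._2 >= b._2) (a._1, a._2, id_set) else (b._1, b._2, id_set)
      result
    })

    rdd_result.collect().foreach(println)
    // key 1: count stays Max(a._2, b._2) = 3, while the merged id_set has 4 elements
    // key 2: never merged, returned as-is
    spark.stop()
  }
}
```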