Skip to Main Content
Spark高级数据分析(第2版)
book

Spark高级数据分析(第2版)

by Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills
June 2018
Beginner to intermediate content levelBeginner to intermediate
246 pages
6h 57m
Chinese
Posts & Telecom Press
Content preview from Spark高级数据分析(第2版)
基于潜在语义分析算法分析维基百科
119
VS
中查询给定词项
ID
对应的行。
计算每个词项的得分。
找出最高得分的词项。
计算词项到词项
ID
的映射。
如果你正在使用
spark-shell
,就可以用以下方法加载此功能:
import com.cloudera.datascience.lsa.LSAQueryEngine
val termIdfs = idfModel.idf.toArray
val queryEngine = new LSAQueryEngine(svd, termIds, docIds, termIdfs)
下面是一些样例词项的最相关的词项得分情况:
queryEngine.printTopTermsForTerm("algorithm")
(algorithm,1.000000000000002), (heuristic,0.8773199836391916),
(compute,0.8561015487853708), (constraint,0.8370707630657652),
(optimization,0.8331940333186296), (complexity,0.823738607119692),
(algorithmic,0.8227315888559854), (iterative,0.822364922633442),
(recursive,0.8176921180556759), (minimization,0.8160188481409465) ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

大数据项目管理:从规划到实现

大数据项目管理:从规划到实现

Ted Malaska, Jonathan Seidman
管理Kubernetes

管理Kubernetes

Brendan Burns, Craig Tracey

Publisher Resources

ISBN: 9787115482525