1 / 2018-05-16 12:12:44
A Similar Entity Identification Method for Text Big Data Based on Spark Parallel Framework
Spark, Text big data, Similar entity identification, Graph theory
Full text pending review
Tong YU / Northeast Electric Power University
LI Hongbiao / Northeast Electric Power University
Aiming at the problem of similar entity identification in high-dimensional, massive text data,
a method based on the Spark parallel framework is proposed. First, the records corresponding
to entities are converted into Simhash fingerprints (binary strings) with the Simhash algorithm,
which maps high-dimensional text data to low-dimensional Simhash fingerprints. Second, a
Simhash Fingerprint Recognition Strategy (SFRS) based on graph theory is designed to identify
similar Simhash fingerprints and, from them, the corresponding records, thereby identifying
similar entities. Finally, a similar entity identification algorithm based on SFRS and Spark is
proposed and applied to similar entity identification in high-dimensional, massive text data.
A comparative experimental analysis on text data from UCI shows the good performance and
applicability of the presented method.
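The abstract describes two steps: hashing each record into a Simhash fingerprint, then using graph theory to find similar fingerprints. The exact SFRS rules are not given here, so the following Python sketch is an assumption, not the authors' method. It builds a standard 64-bit Simhash (MD5-based token hashes weighted by term frequency), treats records as graph vertices, links two records when their fingerprints differ in at most `max_distance` bits, and reports connected components as groups of similar entities. The tokenizer, the helper names (`simhash`, `similar_groups`) and the 3-bit threshold are hypothetical choices for illustration.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

FINGERPRINT_BITS = 64  # assumption: 64-bit fingerprints, a common Simhash size


def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_hash(token):
    """Stable 64-bit token hash (first 8 bytes of the MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text, bits=FINGERPRINT_BITS):
    """Simhash: each token votes +weight/-weight on every bit of its hash."""
    counts = defaultdict(int)
    for tok in tokenize(text):
        counts[tok] += 1  # weight = term frequency (an assumption)
    vector = [0] * bits
    for tok, weight in counts.items():
        h = token_hash(tok)
        for i in range(bits):
            vector[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 where the weighted vote is positive.
    fingerprint = 0
    for i in range(bits):
        if vector[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def similar_groups(records, max_distance=3):
    """Vertices = record ids; an edge joins two records whose fingerprints are
    within max_distance bits. Returns connected components of size > 1."""
    fps = {rid: simhash(text) for rid, text in records.items()}
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):  # all pairs: O(n^2) comparisons
        if hamming(fps[a], fps[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    demo = {
        "r1": "Northeast Electric Power University",
        "r2": "northeast electric power university",
        "r3": "Universitas Sriwijaya Palembang Indonesia",
    }
    print(similar_groups(demo))
```

The all-pairs comparison is the expensive part and is what a Spark implementation would need to distribute; fingerprinting is independent per record and maps naturally onto a parallel `map`. How the paper partitions that work is not described in this abstract.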
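Important Dates
  • Conference dates: October 2 to 4, 2018
  • Abstract submission deadline: May 30, 2018
  • Draft paper submission deadline: May 30, 2018
  • Draft paper acceptance notification: June 10, 2018
  • Final paper submission deadline: July 30, 2018
  • Registration deadline: October 4, 2018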

Organizer
Universitas Sriwijaya