
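Research on Search Ranking Methods for Sina Weibo
Cite this article: YE Shi-ren; YAN Shui-ge; YANG Chang-chun. Research on Search Ranking Methods for Sina Weibo [J]. Journal of Jiangsu Institute of Petrochemical Technology (江苏石油化工学院学报), 2013(3): 71-75.
Authors: YE Shi-ren; YAN Shui-ge; YANG Chang-chun
Affiliation: School of Information Science and Engineering, Changzhou University
Funding: National Natural Science Foundation of China (61003163); Jiangsu Provincial Department of Science and Technology project (BZ2010021)
Abstract: This paper discusses in depth microblog search ranking algorithms based on the vector space model and on latent semantic analysis. Taking Sina Weibo as an example, an experimental system was built, experimental data were obtained through the APIs provided by the Sina Weibo public open platform, and a worked example illustrates the processing steps of the vector space model and latent semantic analysis. Sina Weibo's existing ranking methods usually fail to return satisfactory results ordered by relevance. Using the vector space model and latent semantic analysis, an "index term-post" matrix is constructed, and posts are segmented into words and vectorized. Measuring the relevance of a post to a query is thereby converted into computing the similarity between the post vector and the query vector, which reduces the processing of posts and queries to vector operations in a vector space. The experiments show that the ranking algorithm based on latent semantic analysis effectively improves the retrieval efficiency of posts.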
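The vector space step described in the abstract can be illustrated with a minimal sketch. It assumes posts and the query are already segmented into tokens (the word segmentation step is not shown), and it uses raw term counts as weights; the abstract does not say which weighting the authors use, so that choice is an assumption. The function names `build_term_post_matrix`, `vectorize`, `cosine`, and `rank_posts`, and the sample posts, are hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def build_term_post_matrix(posts):
    """Return (vocabulary, matrix); matrix[i][j] is the raw count of
    vocabulary[i] in posts[j]. Posts are assumed to be pre-tokenized."""
    vocabulary = sorted({term for post in posts for term in post})
    counts = [Counter(post) for post in posts]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def vectorize(tokens, vocabulary):
    """Map a token list onto the vocabulary as a count vector."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_posts(posts, query):
    """Return (post_index, score) pairs sorted by descending similarity."""
    vocabulary, matrix = build_term_post_matrix(posts)
    query_vec = vectorize(query, vocabulary)
    scored = []
    for j in range(len(posts)):
        post_vec = [row[j] for row in matrix]  # column j of the matrix
        scored.append((cosine(post_vec, query_vec), j))
    # Highest score first; ties keep the original post order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(j, score) for score, j in scored]


# Hypothetical, already-segmented sample posts and query.
posts = [
    ["weibo", "search", "ranking"],
    ["weather", "today", "sunny"],
    ["search", "engine", "ranking", "ranking"],
]
query = ["search", "ranking"]
ranking = rank_posts(posts, query)
```

In this sketch a post that shares no term with the query scores 0.0 and sinks to the bottom. That contrasts with the ordering by descending post time that the English abstract attributes to many microblogging platforms.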

Keywords: microblog; vector space model; latent semantic analysis; search ranking

Research of Searching and Sorting Method of Sina Microblogging
Affiliation: YE Shi-ren, YAN Shui-ge, YANG Chang-chun (School of Information Science and Engineering, Changzhou University, Changzhou 213164, China)
Abstract: This paper presents a search ranking method for the Chinese microblogging service Weibo, based on the vector space model and latent semantic analysis. Test data are obtained through the APIs provided by the Sina microblogging public platform. Weibo posts are represented under the vector space model as an "index term-post" matrix, and latent semantic analysis is then performed on this matrix. The relevance between a post and a query is converted into the similarity between the post vector and the query vector, computed as the cosine of the two vectors after decomposition by SVD. The processing of posts and queries is thus reduced to vector operations in a low-dimensional vector space. Posts are ranked by their relevance to the query rather than by simple string matching and descending post time, the approach widely used on many microblogging platforms. The experimental results indicate that the approach retrieves relevant posts in the top-ranked list.
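The SVD step mentioned in the abstract can be written out as follows. This is the standard latent semantic analysis formulation, given here as a sketch; the abstract does not state the authors' exact equations, so the symbols ($A$, $k$, $\hat{q}$, $\hat{d}_j$) and the query fold-in rule are assumptions.

```latex
% A is the m-by-n index term-post matrix (m terms, n posts).
% Truncated SVD keeps the k largest singular values:
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}.

% A query vector q in R^m and post column d_j are mapped into the
% k-dimensional space (the image of d_j equals the j-th row of V_k):
\hat{q} = \Sigma_k^{-1} U_k^{\mathsf{T}} q,
\qquad
\hat{d}_j = \Sigma_k^{-1} U_k^{\mathsf{T}} d_j .

% Posts are ranked by descending cosine similarity:
\operatorname{sim}(q, d_j) =
\frac{\hat{q}^{\mathsf{T}} \hat{d}_j}
     {\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}.
```

Choosing $k$ much smaller than the number of index terms is what yields the low-dimensional vector space referred to in the abstract.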
Keywords: Weibo; vector space model; latent semantic analysis; search ranking
This article is indexed in VIP (维普) and other databases.