
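Web search topic analysis based on user query logs
Cite this article:ZHANG Sen,ZHANG Chen,LIN Peiguang,ZHANG Chunyun,GUO Yuchao,REN Weilong,REN Ke.Web search topic analysis based on user query logs[J].CAAI Transactions on Intelligent Systems,2017,12(5):668-677.
Authors:ZHANG Sen  ZHANG Chen  LIN Peiguang  ZHANG Chunyun  GUO Yuchao  REN Weilong  REN Ke
Affiliation:1. School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, Shandong, China;2. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China
Abstract:Web search analysis plays a pivotal role in optimizing search engines, and analyzing the search characteristics of individual users can improve search engine precision. At present, most existing models (such as the click graph model and its variants) focus on the common characteristics of user groups. However, very little research addresses how to capture both the common characteristics of a user group and the characteristics of individual users. This paper studies a new problem, web search analysis based on individual users: by studying the burstiness of user searches, it obtains the topic distribution of each individual user's search queries. Two search topic models are proposed, the search burstiness model (SBM) and the coupling-sensitive search burstiness model (CS-SBM). SBM assumes that the topics of query words and URLs are unrelated, whereas CS-SBM assumes that query words and URLs are topically associated. The resulting topic distribution information is stored in skewed Dirichlet priors, and a Beta distribution is used to characterize the temporal properties of user searches. Experimental results show that every user's web search trail has multiple user-specific characteristics. Moreover, on a large volume of real user query log data, the proposed models show a clear advantage in generalization performance over LDA, DCMLDA, and TOT, and they effectively depict how the topics of user search queries change over time.

Keywords:web search  search engine  natural language processing  topic model  text mining  burstiness  temporal analysis  parameter estimation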

Web search topic analysis based on user search query logs
ZHANG Sen,ZHANG Chen,LIN Peiguang,ZHANG Chunyun,GUO Yuchao,REN Weilong,REN Ke.Web search topic analysis based on user search query logs[J].CAAI Transactions on Intelligent Systems,2017,12(5):668-677.
Authors:ZHANG Sen  ZHANG Chen  LIN Peiguang  ZHANG Chunyun  GUO Yuchao  REN Weilong  REN Ke
Affiliation:1. School of Computer Science & Technology, Shandong University of Finance & Economics, Jinan 250014, China;2. Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong 999077, China
Abstract:Web search analysis plays a critical role in improving the performance of contemporary search engines, and analyzing the search properties of individual users can further improve search engine accuracy. Most existing models, such as the click graph and its variants, focus on the common characteristics of user groups. However, few studies have investigated models that capture both the collective characteristics of a group and the unique characteristics of individual users. In this paper, we investigate user-specific web search analysis, in which we obtain the topic distributions of individual users' search queries by studying the burstiness of user searches. We propose two topic models: the search burstiness model (SBM) and the coupling-sensitive search burstiness model (CS-SBM). The SBM assumes that the query words and URLs are topically independent, whereas the CS-SBM assumes that they are topically related. The obtained topic distribution information is stored in skewed Dirichlet priors, and a beta distribution is used to capture the temporal properties of user searches. Our experimental results show that each user's web search trail has unique characteristics. On a large set of real query log data, our proposed models show better generalization performance than the latent Dirichlet allocation (LDA), DCMLDA, and topics over time (TOT) models, and they effectively describe how the topics of user search queries change over time.
Keywords:web search  search engine  natural language processing  topic model  data mining  burstiness  temporal analysis  parameter estimation
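
The abstract states that a beta distribution captures the temporal properties of user searches, but it does not say how the parameters are estimated. The following is a minimal sketch of one possible approach, assuming query timestamps are rescaled to the open interval (0, 1) and the Beta parameters are fitted by the method of moments. The function names, the clipping constant `eps`, and the sample timestamps are all hypothetical and are not taken from the paper.

```python
import math


def normalize(t, start, end, eps=1e-3):
    """Rescale a timestamp to (0, 1); eps (an assumed constant) keeps it off the endpoints."""
    return eps + (1 - 2 * eps) * (t - start) / (end - start)


def fit_beta_moments(xs):
    """Method-of-moments estimate of Beta(a, b) from samples strictly inside (0, 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("need at least two distinct samples")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def beta_pdf(x, a, b):
    """Beta density at x in (0, 1), computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


# Hypothetical timestamps (days since the start of a 100-day log) for one topic.
topic_times = [70, 80, 85, 90, 95]
xs = [normalize(t, 0.0, 100.0) for t in topic_times]
a, b = fit_beta_moments(xs)
print(f"a={a:.2f}, b={b:.2f}, density at 0.9: {beta_pdf(0.9, a, b):.2f}")
```

In this sketch, a topic whose queries cluster late in the log yields a fitted `a` larger than `b`, so the density peaks toward the end of the time range.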