Similar Documents
1.
Correlation-Based Web Document Clustering for Adaptive Web Interface Design   Total citations: 2; self-citations: 2; citations by others: 2
A great challenge for web site designers is how to ensure that users can reach important web pages easily and efficiently. In this paper we present a clustering-based approach to this problem: we perform efficient and effective correlation analysis on web logs and construct clusters of web pages that reflect the co-visit behavior of the site's users. We present a novel way of adapting clustering algorithms previously designed for databases to the domain of web page clustering, and show that our new methods can generate high-quality clusters for very large web logs where previous methods fail. Based on these high-quality clustering results, we then apply the mined clustering knowledge to the problem of adapting web interfaces to improve users' performance. We develop an automatic method for web interface adaptation that introduces index pages minimizing the overall user browsing cost. The index pages provide shortcuts so that users reach their target web pages quickly, and we solve a previously open problem of how to determine the optimal number of index pages. Experiments on several realistic web log files show empirically that our approach performs better than many previous algorithms. Received 25 November 2000 / Revised 15 March 2001 / Accepted in revised form 14 May 2001
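The co-visit correlation step can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: the correlation between two pages is a cosine-style ratio (sessions containing both pages, divided by the geometric mean of each page's session count), and clusters are formed by single-link grouping of page pairs whose correlation reaches a hypothetical threshold `min_corr`. The paper's own correlation measure and its adapted database clustering algorithms are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def covisit_clusters(sessions, min_corr=0.5):
    """Group pages that are frequently visited in the same sessions."""
    page_count = Counter()   # page -> number of sessions containing it
    pair_count = Counter()   # (page_a, page_b) -> sessions containing both
    for session in sessions:
        pages = sorted(set(session))
        page_count.update(pages)
        pair_count.update(combinations(pages, 2))

    # union-find over pages (single-link grouping)
    parent = {p: p for p in page_count}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), both in pair_count.items():
        # cosine-style co-visit correlation (an assumed measure)
        corr = both / math.sqrt(page_count[a] * page_count[b])
        if corr >= min_corr:
            parent[find(a)] = find(b)

    groups = {}
    for p in page_count:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


sessions = [["a", "b"], ["a", "b", "c"], ["c", "d"], ["d", "c"], ["e"]]
print(covisit_clusters(sessions))
```

Single-link grouping is used here only because it is short; on noisy logs it tends to chain unrelated pages together, which is the kind of failure the paper's adapted algorithms are meant to avoid.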

2.
Discovering Interest Transition Patterns Based on Hidden Markov Models   Total citations: 17; self-citations: 0; citations by others: 17
Wang Shi, Gao Wen. Chinese Journal of Computers, 2001, 24(2): 152-157
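An important research direction in Web mining is discovering users' transition patterns. Generally, users navigate with some purpose, and this purpose shows up as interest in some concept. This paper proposes a method based on hidden Markov models for discovering interest transition patterns, i.e., user transition patterns driven by a particular interest; such a pattern is essentially a special kind of association rule. In this method, the authors first define a hidden Markov model from users' access records, then propose a new incremental discovery algorithm, Increase_R, for discovering interest transition patterns, and give a proof that the algorithm discovers all interest transition patterns.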
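To make the hidden Markov model concrete, the sketch below computes the likelihood of a page sequence with the standard forward algorithm, treating hidden states as interest concepts and observations as requested pages. This is only the generic HMM building block, not the Increase_R algorithm; the states, pages, and probabilities are hypothetical values chosen for illustration.

```python
def forward_likelihood(pages, states, start_p, trans_p, emit_p):
    """Return P(pages) under an HMM whose hidden states are interest concepts.

    start_p[s]      : probability that a session starts in interest s
    trans_p[r][s]   : probability of moving from interest r to interest s
    emit_p[s][page] : probability of requesting `page` while in interest s
    """
    # alpha[s] = P(pages seen so far, current interest = s)
    alpha = {s: start_p[s] * emit_p[s].get(pages[0], 0.0) for s in states}
    for page in pages[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s].get(page, 0.0)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-interest model over two pages.
states = ["sports", "news"]
start_p = {"sports": 0.6, "news": 0.4}
trans_p = {"sports": {"sports": 0.7, "news": 0.3},
           "news": {"sports": 0.4, "news": 0.6}}
emit_p = {"sports": {"p1": 0.8, "p2": 0.2},
          "news": {"p1": 0.1, "p2": 0.9}}
print(forward_likelihood(["p1", "p2"], states, start_p, trans_p, emit_p))
```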

3.
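Web usage mining is the application of data mining techniques to Web information repositories. It predicts users' browsing behavior from knowledge mined from Web server logs and is an important research direction in Web mining. The knowledge discovered, including unexpected rules, is often imprecise and incomplete, which calls for soft computing techniques such as rough sets. This paper proposes a clustering method based on rough approximation that clusters Web transactions from Web access logs. The method mines Web log records effectively and thereby discovers users' patterns of accessing Web pages.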
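Rough-set clustering builds on the lower and upper approximations of a set. The sketch below shows only that primitive, assuming Web transactions are encoded as hypothetical 0/1 "visited page" attributes; the paper's clustering procedure itself is not given in the abstract.

```python
from collections import defaultdict


def equivalence_classes(objects, attrs):
    """Group objects that are indiscernible on `attrs` (identical values)."""
    classes = defaultdict(set)
    for name, features in objects.items():
        classes[tuple(features[a] for a in attrs)].add(name)
    return list(classes.values())


def rough_approximations(objects, attrs, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(objects, attrs):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


# Hypothetical Web transactions encoded as 0/1 "visited page" attributes.
transactions = {
    "t1": {"home": 1, "cart": 1},
    "t2": {"home": 1, "cart": 1},
    "t3": {"home": 1, "cart": 0},
    "t4": {"home": 0, "cart": 0},
}
lower, upper = rough_approximations(transactions, ["home", "cart"], {"t1", "t3"})
print(lower, upper)
```

The difference `upper - lower` is the boundary region: transactions whose membership cannot be decided from the chosen attributes. Rough clustering methods typically let such objects fall in the upper approximation of more than one cluster instead of forcing a single assignment.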

4.
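With the rapid growth of the WWW and the sharp increase in the number of users, accurately predicting Web users' access behavior plays an important role in reducing user-perceived latency and enabling personalized recommendation. For the Markov model and all of its variants, higher-order models give better prediction performance, but they usually also have high state-space complexity. This paper proposes a new hybrid-order Markov model (HMPM) in which sequences with the same prefix share storage, reducing state-space complexity. Simulation results show that the model improves prediction accuracy to some extent, and recall also improves somewhat.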
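The abstract does not give HMPM's exact structure or its rule for choosing the order, so the following is a hypothetical prefix-sharing design: every context of length 1 to `max_order` is stored in one trie, so contexts with a common prefix share nodes, and prediction backs off from the longest matching context to shorter ones.

```python
from collections import Counter


class _Node:
    def __init__(self):
        self.children = {}            # next page in the context -> child node
        self.next_counts = Counter()  # pages observed right after this context


class PrefixSharingMarkovPredictor:
    """Hypothetical variable-order Markov predictor backed by a prefix trie."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.root = _Node()

    def train(self, session):
        for i in range(1, len(session)):
            following = session[i]
            # store every context of length 1..max_order ending just before i
            for k in range(1, min(self.max_order, i) + 1):
                node = self.root
                for page in session[i - k:i]:
                    node = node.children.setdefault(page, _Node())
                node.next_counts[following] += 1

    def predict(self, history):
        # try the longest available context first, then back off
        for k in range(min(self.max_order, len(history)), 0, -1):
            node = self.root
            for page in history[-k:]:
                node = node.children.get(page)
                if node is None:
                    break
            if node is not None and node.next_counts:
                return node.next_counts.most_common(1)[0][0]
        return None


predictor = PrefixSharingMarkovPredictor(max_order=2)
for s in [["a", "b", "c"], ["a", "b", "c"], ["x", "b", "d"]]:
    predictor.train(s)
print(predictor.predict(["z", "b"]))  # unseen 2-page context, backs off to ["b"]
```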

5.
A Data Cube Model for Prediction-Based Web Prefetching   Total citations: 7; self-citations: 0; citations by others: 7
Reducing web latency is one of the primary concerns of Internet research. Web caching and web prefetching are two effective techniques for reducing latency. A primary method for intelligent prefetching is to rank potential web documents with prediction models trained on past web server and proxy server log data, and to prefetch the highly ranked objects. For this method to work well, the prediction model must be updated constantly, and different queries must be answered efficiently. In this paper we present a data cube model that represents Web access sessions so that data mining can support construction of the prediction model. The cube organizes session data into three dimensions. With the data cube in place, we apply efficient data mining algorithms for clustering and correlation analysis, and the resulting web page clusters can then guide the prefetching system. We propose an integrated web-caching and web-prefetching model in which prefetching aggressiveness, replacement policy, and increased network traffic are addressed together in one framework. The core of this integrated solution is a prediction model based on statistical correlation between web objects, which can be updated frequently by querying the data cube of web server logs. To our knowledge, this integrated data cube and prediction-based prefetching framework is the first such effort.
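The data cube itself is not shown here. As a minimal sketch of the kind of pairwise statistical correlation the prediction model rests on, the code below estimates, directly from session lists, the probability that page b is requested later in a session given that page a was requested, and ranks prefetch candidates. The conditional-probability measure, `threshold`, and `limit` are assumptions; the latter two stand in for the paper's notion of prefetching aggressiveness.

```python
from collections import Counter


def follow_probabilities(sessions):
    """Estimate P(b is requested later in the session | a was requested)."""
    sessions_with = Counter()  # page -> number of sessions containing it
    followed_by = Counter()    # (a, b) -> sessions where b occurs after a
    for session in sessions:
        seen, pairs = set(), set()
        for page in session:
            for earlier in seen:
                if earlier != page:
                    pairs.add((earlier, page))
            seen.add(page)
        sessions_with.update(seen)
        followed_by.update(pairs)
    return {(a, b): n / sessions_with[a] for (a, b), n in followed_by.items()}


def prefetch_candidates(probs, current, threshold=0.5, limit=2):
    """Rank pages likely to follow `current`; keep those at or above `threshold`."""
    ranked = sorted(
        ((p, b) for (a, b), p in probs.items() if a == current and p >= threshold),
        key=lambda pb: (-pb[0], pb[1]),  # highest probability first, then by name
    )
    return [b for _, b in ranked[:limit]]


sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
probs = follow_probabilities(sessions)
print(prefetch_candidates(probs, "a"))
```

Raising `threshold` or lowering `limit` makes prefetching less aggressive, trading hit rate for less extra network traffic.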

6.
A New Measurement Method for Predicting Users' Browsing Patterns   Total citations: 1; self-citations: 0; citations by others: 1
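In the Web environment, measuring users' browsing patterns helps improve Web site structure. Mining and measuring Web logs can identify models of users' access patterns, which site administrators can use to study users' access preferences, improve the site architecture, and analyze the effects of those improvements. This paper therefore proposes a new concept, user-group preference, and uses a user-group-based fuzzy clustering algorithm (UGFC). From the clustering results, i.e., groups of users with similar access habits, it measures user-group preference, and on that basis it makes predictions with a hybrid-order Markov model (HOMM). Experiments show that this new measurement and prediction method (UGFC-HOMM) predicts more accurately than the traditional Markov model (TMM); prediction performance is evaluated with three measures: precision, coverage, and running time.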

7.
Guo Ping, Chen Ting, Li Dong. Computer Science, 2006, 33(3): 75-79
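An adaptive Web site can change itself quickly and flexibly according to user needs, dynamically adapting to changing user requirements and application environments. Using graph-based frequent closed itemsets, this paper mines closed sets of related pages from a site's logs over a period of time and uses them to provide online dynamic recommendations that guide users' navigation. This alleviates the sparsity and scalability problems of traditional collaborative recommendation and improves the quality of information service for all users without increasing the load on the Web server. Finally, two methods for measuring recommender-system performance are analyzed and used to evaluate the system.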
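A frequent closed itemset is a frequent set of pages with no proper superset of the same support. The sketch below finds them by brute-force enumeration, assuming each session is a set of page identifiers and `min_support` is an absolute session count. It is exponential in the number of pages and is meant only to illustrate the definition, not the paper's graph-based mining algorithm.

```python
from itertools import combinations


def frequent_closed_itemsets(transactions, min_support):
    """Map each frequent closed itemset (frozenset) to its support count."""
    items = sorted(set().union(*transactions))
    support = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = frozenset(candidate)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                support[s] = count
    # closed: no proper superset has the same support
    return {
        s: c for s, c in support.items()
        if not any(s < t and c == support[t] for t in support)
    }


sessions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
print(frequent_closed_itemsets(sessions, 2))
```

Here `{a}` and `{b}` are frequent but not closed, because `{a, b}` has the same support; keeping only closed sets is what makes the stored page groups compact.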

8.
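With the rapid development and widespread adoption of WWW applications, WWW servers collect large volumes of Web logs. Mining these logs in real time yields a large number of association rules, which are stored in a real-time rule database. To obtain promptly and accurately the rules that best reflect current user access patterns, we need a query and trigger system built on the form of the rules, which analyzes the real-time rule database to identify trends in Web users' current access patterns.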

9.
In recent years, there has been considerable research on constructing crawlers that find resources satisfying specific conditions called predicates. Such a predicate could be a keyword query, a topical query, or some arbitrary constraint on the internal structure of the web page. Several techniques, such as focused crawling and intelligent crawling, have recently been proposed for topic-specific resource discovery. All these crawlers are linkage-based, since they use hyperlink behavior to perform resource discovery. Recent studies have shown that the topical correlations in hyperlinks are quite noisy and may not always show the consistency necessary for a reliable resource discovery process. In this paper, we approach the problem of resource discovery from an entirely different perspective: we mine the significant browsing patterns of World Wide Web users in order to model the likelihood of web pages belonging to a specified predicate. This user behavior can be mined from the freely available traces of large public-domain proxies on the World Wide Web. For example, proxy caches such as Squid are hierarchical proxies that make their logs publicly available. As we shall see in this paper, such traces are a rich source of information that can be mined to find the users who are most relevant to the topic of a given crawl. We refer to this technique as collaborative crawling because it mines collective user experience to find topical resources. Such a strategy turns out to be extremely effective because the topical consistency of World Wide Web browsing patterns is very high compared to the noisy linkage information. In addition, the user-centered crawling system can be combined with linkage-based systems to create an overall system that works more effectively than one based purely on either user behavior or hyperlinks.

10.
A Rule Acquisition Method Based on Incomplete Information about Web Users   Total citations: 1; self-citations: 0; citations by others: 1
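A Web log is a highly incomplete and heterogeneous data set, so inconsistent and incomplete rules often arise when decision rules are derived from it. This paper uses rough set theory, exploiting its particular strength in handling incomplete knowledge, to solve this problem. Important user-behavior feature values are first discretized into attribute values, and attribute and value reduction are performed; decision rules are then obtained with a rough-set default-rule acquisition algorithm. The condition attributes are extracted mainly from observation and analysis of user behavior, and discretization uses typical methods from rough set theory. This processing facilitates the final rule extraction, and a case study shows good results.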

11.
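As the number of web services grows sharply, quickly and accurately discovering services that satisfy user needs has become a pressing problem. Existing semantics-based web service discovery usually takes a hybrid approach: it first performs semantic matching at the ontology level and, when that fails, falls back on other methods (keyword-based matching, structural analysis) to compensate. Because these fallback methods do not accurately reflect the similarity between two concepts, the accuracy of web service discovery is low. Taking information-content-based semantic similarity into account, this paper proposes combining IO (input, output) semantic matching of services with information-content-based semantic similarity computation, and evaluates the method on the OWLS-TC 2.0 test collection. Experimental results show that the method effectively improves the accuracy of service discovery.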
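The abstract does not give the exact similarity formula. One standard information-content measure is Lin's similarity built on Resnik's information content, IC(c) = -log p(c), where p(c) counts occurrences of c and all its descendants. The sketch below computes it over a hypothetical single-parent taxonomy with made-up concept counts; in IO matching, one would compare the ontology concepts annotating a request's inputs and outputs with those of an advertised service.

```python
import math


def ancestors(concept, parent):
    """Concept followed by its ancestors up to the root (single-parent taxonomy)."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def subtree_count(concept, parent, freq):
    """Occurrences of `concept` plus those of all its descendants."""
    children = [c for c, p in parent.items() if p == concept]
    return freq.get(concept, 0) + sum(subtree_count(c, parent, freq) for c in children)


def information_content(concept, parent, freq):
    """IC(c) = -log p(c); assumes every concept's subtree count is non-zero."""
    root = ancestors(concept, parent)[-1]
    return -math.log(subtree_count(concept, parent, freq) / subtree_count(root, parent, freq))


def lin_similarity(c1, c2, parent, freq):
    """Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    anc2 = set(ancestors(c2, parent))
    lcs = next(a for a in ancestors(c1, parent) if a in anc2)  # lowest common subsumer
    denom = information_content(c1, parent, freq) + information_content(c2, parent, freq)
    if denom == 0:
        return 1.0  # both concepts cover the whole corpus (e.g. the root)
    return 2 * information_content(lcs, parent, freq) / denom


# Hypothetical taxonomy and concept counts.
parent = {"Vehicle": "Thing", "Car": "Vehicle", "Truck": "Vehicle", "Price": "Thing"}
freq = {"Car": 4, "Truck": 2, "Price": 2}
print(lin_similarity("Car", "Truck", parent, freq))
```

Unlike a pure subsumption check, this yields a graded score, so a request for `Car` can still partially match a service annotated with `Truck` when exact ontology matching fails.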

12.
13.
Advances in data mining technologies have enabled intelligent Web capabilities in various applications by utilizing hidden user behavior patterns discovered from Web logs. Intelligent methods for discovering and predicting users' patterns are important in supporting intelligent Web applications such as personalized services. Although numerous studies have been done on Web usage mining, few of them consider temporal evolution when discovering Web users' patterns. In this paper, we propose a novel data mining algorithm named Temporal N-Gram (TN-Gram) for constructing prediction models of Web user navigation that takes the temporal evolution of Web usage into account. Moreover, three new measures are proposed for evaluating the temporal evolution of navigation patterns across different time periods. Experimental evaluation on both real-life and simulated datasets shows that the proposed TN-Gram model outperforms other approaches, such as N-gram modeling, in prediction precision, particularly when Web users' navigation behavior changes significantly over time.

14.
Zhang  Hongjiang  Chen  Zheng  Li  Mingjing  Su  Zhong 《World Wide Web》2003,6(2):131-155
A major bottleneck in content-based image retrieval (CBIR) systems and search engines is the large gap between the low-level image features used to index images and the high-level semantic content of images. One solution to this bottleneck is to apply relevance feedback to refine the query or the similarity measures during image search. In this paper, we first address the key issues involved in relevance feedback for CBIR systems and give a brief overview of a set of commonly used relevance feedback algorithms. Almost all previously proposed methods fit well into this kind of framework. We then present a framework of relevance feedback and semantic learning in CBIR. In this framework, low-level features and keyword annotations are integrated in the image retrieval and feedback processes to improve retrieval performance. We have also extended the framework to a content-based web image search engine in which hosting web pages are used to collect relevant annotations for images and users' feedback logs are used to refine the annotations. A prototype system has been developed to evaluate the proposed schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and relevance feedback approaches.
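As an example of the classic relevance feedback algorithms such an overview covers, the sketch below applies Rocchio's query-point movement to low-level feature vectors: the query moves toward the centroid of images the user marked relevant and away from the centroid of those marked non-relevant. The default weights are common textbook values, not values from the paper, and the paper's own framework (which also learns from keyword annotations) is not shown.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant examples and away from non-relevant ones."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel, non = centroid(relevant), centroid(nonrelevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


# Hypothetical 2-D feature vectors (e.g. two color-histogram bins).
new_query = rocchio([1.0, 0.0], relevant=[[0.0, 1.0]], nonrelevant=[[1.0, 0.0]],
                    alpha=1.0, beta=1.0, gamma=0.5)
print(new_query)
```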

15.
On Using a Warehouse to Analyze Web Logs   Total citations: 1; self-citations: 0; citations by others: 1
Analyzing Web logs for usage and access trends can not only provide important information to web site developers and administrators but also help in creating adaptive web sites. While many existing tools generate fixed reports from web logs, they typically do not allow ad-hoc analysis queries, nor can they discover hidden access patterns embedded in the logs. We describe a relational OLAP (ROLAP) approach to creating a web-log warehouse, populated both from web logs and from the results of mining them. We discuss the design criteria that influenced our choice of dimensions, facts, and data granularity. We developed a web-based tool for ad-hoc analytic queries on the warehouse, and we present some of the performance experiments we performed on it.
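A ROLAP warehouse stores dimensions and facts in ordinary relational tables and answers ad-hoc questions with SQL aggregation. The sketch below uses a hypothetical star schema (page and time dimensions, one fact row per request) in an in-memory SQLite database for self-containment; the paper's actual dimensions, facts, and granularity are not reproduced.

```python
import sqlite3


def build_warehouse(requests):
    """Load (url, day, hour, bytes) log records into a small star schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_page (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER,
                               UNIQUE (day, hour));
        CREATE TABLE fact_request (page_id INTEGER REFERENCES dim_page,
                                   time_id INTEGER REFERENCES dim_time,
                                   bytes INTEGER);
    """)
    for url, day, hour, nbytes in requests:
        con.execute("INSERT OR IGNORE INTO dim_page (url) VALUES (?)", (url,))
        con.execute("INSERT OR IGNORE INTO dim_time (day, hour) VALUES (?, ?)", (day, hour))
        (page_id,) = con.execute("SELECT page_id FROM dim_page WHERE url = ?", (url,)).fetchone()
        (time_id,) = con.execute(
            "SELECT time_id FROM dim_time WHERE day = ? AND hour = ?", (day, hour)).fetchone()
        # one fact row per request: the finest grain, aggregated at query time
        con.execute("INSERT INTO fact_request VALUES (?, ?, ?)", (page_id, time_id, nbytes))
    return con


def hits_per_page_per_day(con):
    """An ad-hoc roll-up: request count and total bytes per page per day."""
    return con.execute("""
        SELECT p.url, t.day, COUNT(*), SUM(f.bytes)
        FROM fact_request AS f
        JOIN dim_page AS p ON p.page_id = f.page_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY p.url, t.day
        ORDER BY p.url, t.day
    """).fetchall()


con = build_warehouse([("/a", "2001-05-01", 10, 100),
                       ("/a", "2001-05-01", 11, 200),
                       ("/b", "2001-05-02", 9, 50)])
print(hits_per_page_per_day(con))
```

Keeping facts at per-request grain makes any roll-up possible but grows the fact table with traffic; pre-aggregating to, say, per-hour counts is the usual trade-off when the granularity choice discussed above is made.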

16.
QoS-Aware Composite Services Retrieval   Total citations: 4; self-citations: 1; citations by others: 3
For current service-oriented applications, an individual web service usually cannot meet the requirements of real-world applications, so the functionality of different web services must be combined into a composite service in response to users' service requests. To address web service composition, this paper proposes an efficient approach to composing basic services when no individual service can fully satisfy a user's request. Whereas most previously proposed approaches produce only the best composition solution, we give a QoS-aware service composition approach that provides the top k solutions, since this offers more candidates that are likely to meet users' requirements. The approach is based on a succinct binary tree data structure, and a system named ATC (Approach to Top-k Composite services retrieval) is implemented. In ATC, QoS is taken into account for composite services, and a heuristic-based search method is proposed to retrieve the top k composite services. Extensive experiments were designed, and two web service benchmarks were used for the performance study. The experimental results show that the proposed approach assures high precision and efficiency for composite service search.
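The top-k idea can be sketched with a best-first search over service chains. This is not ATC or its binary-tree structure; it assumes each service has one input type and one output type, that the only QoS attribute is response time, and that a sequential composition's response time is the sum of its parts. Because response times are non-negative, chains are completed in non-decreasing cost order, so the first k complete chains are the k cheapest.

```python
import heapq


def top_k_compositions(services, provided, wanted, k):
    """Return up to k service chains turning `provided` into `wanted`, cheapest first.

    services: name -> (input_type, output_type, response_time)
    """
    heap = [(0.0, provided, ())]  # (accumulated response time, current data type, chain)
    results = []
    while heap and len(results) < k:
        cost, current, chain = heapq.heappop(heap)
        if current == wanted and chain:
            results.append((cost, list(chain)))
            continue
        for name, (inp, out, rt) in services.items():
            if inp == current and name not in chain:  # no service reused in one chain
                heapq.heappush(heap, (cost + rt, out, chain + (name,)))
    return results


# Hypothetical services: (input type, output type, response time in seconds).
services = {
    "s1": ("City", "Coord", 1.0),
    "s2": ("Coord", "Weather", 2.0),
    "s3": ("City", "Weather", 4.0),
    "s4": ("City", "Coord", 2.5),
}
print(top_k_compositions(services, "City", "Weather", 2))
```

Returning several ranked chains instead of one lets a client fall back to the next candidate if the best composition violates a constraint the search did not model.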

17.
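Traditional network security techniques can no longer effectively defend against attacks targeting Web applications, and Web application intrusion detection has received wide attention as an important security technique. Access logs are key data for Web application intrusion detection; however, the sheer volume of log records daunts application administrators, and without effective analysis methods it is very hard to find and locate intrusions. To address this problem, various misuse and anomaly detection models have been proposed and adopted. This work applies statistical anomaly models, such as parameter-value length and character distribution, to dynamic pages and performs intrusion detection on the access logs of a real Web application. Experimental results show that the models can effectively detect attacks such as SQL injection.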
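A common way to build a parameter-value length model is to learn the mean and variance of lengths from normal traffic and bound the probability of an observed deviation with the Chebyshev inequality, P(|L - mean| >= t) <= variance / t^2. The sketch below does that for one parameter; the training values and the `threshold` are hypothetical, and the character-distribution model is not shown.

```python
import statistics


class ParameterLengthModel:
    """Length model for one parameter of a dynamic page, learned from normal traffic."""

    def __init__(self, normal_values):
        lengths = [len(v) for v in normal_values]
        self.mean = statistics.mean(lengths)
        self.variance = statistics.pvariance(lengths)

    def probability_bound(self, value):
        """Chebyshev bound on P(|L - mean| >= |len(value) - mean|) for normal traffic."""
        distance = len(value) - self.mean
        if distance == 0:
            return 1.0
        return min(1.0, self.variance / distance ** 2)

    def is_anomalous(self, value, threshold=0.05):
        # a small bound means such a length is rare in normal traffic
        return self.probability_bound(value) < threshold


model = ParameterLengthModel(["1", "22", "333"])  # hypothetical normal values of ?id=
print(model.is_anomalous("1' OR '1'='1' -- "))    # a long injected value
```

The Chebyshev bound makes no assumption about the shape of the length distribution, which suits logs where lengths are rarely Gaussian; the price is a loose bound, so short injections can slip past this model and need the character-distribution model as well.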

18.
Research on Applying Data Mining Techniques to Web Prefetching   Total citations: 69; self-citations: 0; citations by others: 69
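The WWW is popular for its multimedia delivery and good interactivity. Although network speeds have improved greatly in recent years, the sharp increase in Internet users and the inherent latency of Web services and networks make the network increasingly congested, and users' quality of service cannot be well guaranteed. This paper therefore proposes an intelligent Web prefetching technique that speeds up page retrieval while users browse the Web. The technique represents the data in the user's browser cache with a simplified WWW data model, and on that basis uses data mining to discover association rules among the user's interests, which are stored in an interest-association knowledge base as the basis for predicting user behavior. On the client side, an intelligent agent mines the user's interests and performs Web prefetching based on the interest-association knowledge base, accelerating the browser transparently to the user.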

19.
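Application of User Behavior Anomaly Detection in a Security Audit System   Total citations: 4; self-citations: 0; citations by others: 4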
Jiang Wei, Chen Long, Wang Guoyin. Journal of Computer Applications, 2006, 26(7): 1637-1639
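This paper proposes a data-mining-based method for auditing user behavior. Normal audit data are first classified during preprocessing to capture normal patterns that traditional methods easily miss; association-rule and sequential-pattern mining are then combined to mine patterns of user behavior, and anomalies in user behavior are detected by comparing pattern similarity. Applying the method to a real security audit system gave good results.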

20.
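An online data collection system gathers information on users' browsing behavior on web pages, and frequent patterns are computed according to association rules. From the resulting behavior patterns, the method determines which combination of browsing behaviors best reflects a user's actual interest in a page, providing a basis for personalized web page recommendation and web site planning and allowing Web mining to be applied more fully in e-commerce.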
