Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
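Research on Web Page Deduplication Methods   Cited: 2 (self-citations: 1, by others: 1)
Duplicate web pages returned by search engines not only waste storage resources but also add to the user's browsing burden. Based on the characteristics of duplicated web pages, a semantics-based deduplication method is proposed. The method extracts a topic-sentence vector from the page's main text, using the position of each sentence in the text and the importance of its chunks, and then computes the semantic similarity between topic-sentence vectors to remove duplicate pages. Experiments show that the method detects both fully and partially duplicated pages with fairly good accuracy.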

2.
This paper presents OpenDLib, a digital library infrastructure that provides capabilities for new-generation digital libraries. In particular, the paper introduces a document model that can be used to represent a wide variety of document types and describes the open architectural infrastructure that allows for the expansion of the digital library through the dynamic plugin of new services.

3.
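CA (certificate authority) technology is widely used to address application system security, but few articles describe the design of RA (registration authority) systems. This paper studies the architectural design of an RA system built with Web technology. It first explains basic concepts of RA systems and PKI, then examines the architecture of an RA system based on Web and PKI technology, describing the structure and design of the client-side controls and server-side components, and finally offers some recommendations on the security of the RA client.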

4.
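穆万军, 游志胜, 赵明华, 余静. Journal of Computer Applications (《计算机应用》), 2005, 25(10): 2310-2311
Using Grover's quantum search algorithm and probability theory, a new method is given for association rule mining, authoritative page mining, and Weblog record mining over Web data. The paper concludes by showing that the method is much faster than any classical method.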

5.
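Web Page Analysis Method for Dynamic, Heterogeneous Web Information Integration   Cited: 1 (self-citations: 0, by others: 1)
Extracting dynamic, heterogeneous Web information resources and offering them to users for querying and use in a unified way is a pressing problem. This paper presents methods and experience for analyzing the relevant Web pages, and implements automatic submission of HTML forms to obtain the required pages, together with information extraction from those pages. Finally, experiments demonstrate the effectiveness of the method.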

6.
Although caching has been shown to be an efficient technique for reducing the delay in generating web pages to meet page requests from web users, it becomes less effective when the pages are dynamic and contain dynamic content. In this paper, instead of relying on caching alone, we study the effectiveness of pre-fetching for handling dynamic web pages. Pre-fetching is a proactive caching scheme, since a page is cached before any request for it is received. In addition to the problem of which pages to pre-fetch, an equally important question is when to perform the pre-fetching. To resolve these prediction and timing problems, we explore the temporal properties of dynamic web pages and the timing issues in accessing them to determine which pages to pre-fetch and the best time to pre-fetch them so as to maximize the cache hit probability of the pre-fetched page. If the required pages can be found in the cache and are still valid, the response times of the requests can be greatly reduced. The proposed scheme is called temporal pre-fetching (TPF), in which we prioritize pre-fetching requests based on the predicted usability of the pages to be pre-fetched. To minimize the impact of incorrect predictions on the processing of on-demand page requests, a qualifying examination is performed to remove unnecessary and low-usability pre-fetching requests while they are waiting to be processed and just before their processing. We have implemented the proposed TPF scheme in a web server system, and experiments have been performed to study its performance characteristics compared with a conventional cache-only scheme using a benchmark auction application under different system and application settings. As the experimental results show, the overall system performance, i.e., response time, improves as more page requests can be served immediately using pre-fetched pages.

7.
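To address the color degradation of multimedia web page images caused by non-uniform display brightness, an image color enhancement model based on the RGB mode is designed. Following the requirements of the RGB mode, the visual image is smoothed in terms of its illumination and reflectance components; on this basis, enhancement coefficients in RGB format are used to build a color enhancement function for the visual image. The color of visual images in multimedia web pages is enhanced by raising overall brightness, adjusting local contrast, and restoring image color. The enhancement effect was tested on Windows XP; the results show that the RGB-based color enhancement model does increase image brightness, information entropy, and saturation. This indicates that the model is effective, enhances color better than the traditional histogram equalization model, and meets the standard for practical deployment.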

8.
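A method for extracting topic words from Chinese web pages based on semantic association is proposed. First, a sliding window and HowNet ("知网") are used to compute the semantic similarity between words, forming a set of candidate noun pairs. Then an undirected graph is generated from this set to represent the semantic links between words, and topic-word weights are modeled on the graph. Finally, nouns with higher weights are selected as topic words. Experimental results show that, compared with topic-word extraction without semantic association, the method improves precision, recall, and F1 to some degree; when 7 topic words are extracted, its recall and F1 reach their maxima, which are 12.5% and 9.53% higher, respectively, than the maxima of the traditional method.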

9.
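Fragment caching is one of the effective solutions for accelerating dynamic web page delivery, but deploying it requires an effective mechanism for detecting shared fragments. To this end, an efficient shared-fragment detection algorithm is proposed, and a dynamic web page delivery model based on fragment caching is described. The model automatically identifies shared fragments and effective cache units, eliminates redundant data more thoroughly, and raises the cache hit rate. Experiments and analysis show that, compared with the existing schemes ESI and Silo, the model saves bandwidth effectively and shortens the response time of user requests.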
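
The abstract does not describe the detection algorithm itself. As a minimal sketch of the general idea only, the Python below hashes candidate fragments and reports those that occur on more than one page. The helper names, the whitespace normalisation, and the sample pages are all assumptions made for illustration, and the pages are assumed to be already split into candidate fragments.

```python
import hashlib
from collections import defaultdict


def fragment_digest(fragment):
    """Hash a fragment after collapsing whitespace, so trivial layout changes do not matter."""
    normalised = " ".join(fragment.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_shared_fragments(pages, min_pages=2):
    """Map each fragment digest seen on at least `min_pages` pages to the set of page names."""
    pages_by_digest = defaultdict(set)
    for name, fragments in pages.items():
        for fragment in fragments:
            pages_by_digest[fragment_digest(fragment)].add(name)
    return {
        digest: names
        for digest, names in pages_by_digest.items()
        if len(names) >= min_pages
    }


# Hypothetical pages, already split into candidate fragments (illustration only).
pages = {
    "/item/1": ["<nav>Home | Cart</nav>", "<p>Item one</p>"],
    "/item/2": ["<nav>Home  |  Cart</nav>", "<p>Item two</p>"],
}
shared = find_shared_fragments(pages)
```

Here the navigation bar is the only fragment reported as shared, so it is the natural candidate for a separately cached unit.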

10.
Recent advances in digital libraries have been closely intertwined with advances in Internet technologies. With the advent of the Web, digital libraries have been able to reach constituencies previously unanticipated. Because of the wide deployability of Web-accessible digital libraries, the potential for privacy violations has also grown tremendously. The much touted Semantic Web, with its agent, service, and ontology technologies, is slated to take the Web to another qualitative level in advances. Unfortunately, these advances may also open doors for privacy violations in ways never seen before. We propose a Semantic Web infrastructure, called SemWebDL, that enables the dynamic composition of disparate and autonomous digital libraries while preserving user privacy. In the proposed infrastructure, users will be able to pose more qualitative queries that may require the ad hoc collaboration of multiple digital libraries. In addition to the Semantic Web-based infrastructure, the quality of the response would rest on extraneous information in the form of a profile. We introduce the concept of communities to enable subject-based cooperation and search speedup. Further, digital libraries heterogeneity and autonomy are transcended by a layered Web-service-based infrastructure. Semantic Web-based digital library providers would advertise to Web services, which in turn are organized in communities accessed by users. For the purpose of privacy preservation, we devise a three-tier privacy model consisting of user privacy, Web service privacy, and digital library privacy that offers autonomy of perspectives for privacy definition and violation. We propose an approach that seamlessly interoperates with potentially conflicting privacy definitions and policies at the different levels of the Semantic Web-based infrastructure. A key aspect in the approach is the use of reputations for outsourcing Web services. 
A Web service reputation is associated with its behavior with regard to privacy preservation. We developed a technique that uses attribute ontologies and information flow difference to collect, evaluate, and disseminate the reputation of Web services.

11.
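Web Page Prefetching Method Based on a Structure-Relevance Markov Model   Cited: 2 (self-citations: 0, by others: 2)
Prefetching reduces the perceived page retrieval time by fetching, while the user is viewing the current page, the pages the user is most likely to request next. Prediction accuracy and practical usability are the main problems a prefetching technique must solve. To address the shortcomings of common Web page prefetching methods, a prefetching method based on a structure-relevance Markov model is proposed. Simulation results show that the method maintains a reasonable prediction accuracy while remaining usable, and achieves fairly satisfactory results in reducing user access latency and improving response speed.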
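
As a rough illustration of Markov-based prefetching in general (not the structure-relevance model of this paper, whose details the abstract does not give), the Python sketch below builds a first-order transition table from browsing sessions and predicts the most likely next page. The function names, the `min_probability` cut-off, and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count page-to-page transitions over a list of browsing sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, current_page, min_probability=0.5):
    """Return the most likely next page, or None if no candidate is confident enough."""
    counts = transitions.get(current_page)
    if not counts:
        return None
    page, hits = counts.most_common(1)[0]
    if hits / sum(counts.values()) < min_probability:
        return None
    return page


# Hypothetical sessions, for illustration only.
sessions = [
    ["/index", "/news", "/news/1"],
    ["/index", "/news", "/news/2"],
    ["/index", "/about"],
]
transitions = build_transitions(sessions)
```

With these sessions, a user on `/index` would have `/news` prefetched, since two of the three observed transitions from `/index` lead there. Raising `min_probability` trades coverage for fewer wasted fetches.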

12.
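Web Page Deduplication Based on Web Page Text Structure   Cited: 1 (self-citations: 0, by others: 1)
魏丽霞, 郑家恒. Journal of Computer Applications (《计算机应用》), 2007, 27(11): 2854-2856
Duplicate web pages returned by search engines not only waste storage resources but also add to the user's browsing burden. Based on the characteristics of duplicated web pages and of web page text itself, a dynamic web page deduplication method is proposed. The method represents the main text of a page as a directory-structure tree and, on that basis, implements a dynamic feature extraction algorithm and a similarity computation algorithm over hierarchical fingerprints. Experiments show that the method accurately detects both fully and partially duplicated pages.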

13.
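A website often contains dozens or even hundreds of similarly styled pages. Redefining the page structure, and the navigation bars and icons shared within a section, every time is tedious, but web page templates can simplify the work. A template separates page layout from page content: once the layout is designed it is saved as a template, and pages with the same layout can then be created from it, which greatly improves efficiency. Templates are just as valuable in later maintenance, making page updates much less difficult.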

14.
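A website often contains dozens or even hundreds of similarly styled pages. Redefining the page structure, and the navigation bars and icons shared within a section, every time is tedious, but web page templates can simplify the work. A template separates page layout from page content: once the layout is designed it is saved as a template, and pages with the same layout can then be created from it, which greatly improves efficiency. Templates are just as valuable in later maintenance, making page updates much less difficult.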

15.
16.
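Method for Precisely Locating Web Page Topic Information Based on Heuristic Rules*   Cited: 3 (self-citations: 0, by others: 3)
Most current information extraction methods target the extraction of topic information blocks and do not go further to extract each individual piece of topic information. To address this, a method for precisely locating web page topic information based on heuristic rules is proposed. First, the features of each individual topic are analyzed from several angles and corresponding heuristic rules are defined. Then, exploiting the fact that different rules differ in importance for locating a topic, a weight matrix of the heuristic rules is obtained. Finally, a rule-based locating algorithm is used to locate each topic precisely. When the method is applied in a web page topic information extraction system, the system can effectively locate and extract each individual topic. Experimental results show that the method is effective and accurate.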

17.
The perception of the visual complexity of World Wide Web (Web) pages is a topic of significant interest. Previous work has examined the relationship between complexity and various aspects of presentation, including font styles, colours and images, but automatically quantifying this dimension of a web page at the level of the document remains a challenge. In this paper we demonstrate that areas of high complexity can be identified by detecting areas, or ‘chunks’, of a web page high in block-level elements. We report a computational algorithm that captures this metric and places web pages in a sequence that shows an 86% correlation with the sequences generated through user judgements of complexity. The work shows that structural aspects of a web page influence how complex a user perceives it to be, and presents a straightforward means of determining complexity through examining the DOM.
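
To make the idea of measuring block-level elements concrete, the following Python sketch computes a crude page-level proxy: the fraction of start tags that are block-level. It is not the paper's chunk-detection algorithm, which the abstract does not specify. The tag list is an illustrative subset, and `BlockCounter` and `block_density` are hypothetical names.

```python
from html.parser import HTMLParser

# An illustrative subset of HTML block-level elements (assumption; the paper's list is not given).
BLOCK_LEVEL_TAGS = {
    "div", "p", "table", "ul", "ol", "li", "form",
    "h1", "h2", "h3", "blockquote", "pre",
}


class BlockCounter(HTMLParser):
    """Counts block-level start tags and all start tags in a document."""

    def __init__(self):
        super().__init__()
        self.block_count = 0
        self.total_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lower-cased.
        self.total_count += 1
        if tag in BLOCK_LEVEL_TAGS:
            self.block_count += 1


def block_density(html):
    """Return the fraction of start tags that are block-level (0.0 for tag-free input)."""
    counter = BlockCounter()
    counter.feed(html)
    counter.close()
    if counter.total_count == 0:
        return 0.0
    return counter.block_count / counter.total_count
```

A fuller treatment would apply the same count per region of the DOM rather than per document, so that dense 'chunks' can be told apart from sparse ones.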

18.
With the explosive growth of information in the WWW, it is becoming increasingly difficult for the user to find information of interest. Visualisations may be helpful in assisting the users in their information retrieval task. Effective visualisation of the structure of a WWW site is extremely useful for browsing through the site. Visualisation can also be used to augment a WWW search engine when too many or too few results are retrieved. In this paper, we discuss several visualisations we have developed to facilitate information retrieval on the WWW. With VRML becoming the standard for graphics on the Web and efficient VRML browsers becoming available, VRML was used for developing these visualisations. Unique visualisations like focus + context views of WWW nodes and semantic visualisation are presented and examples are given on scenarios where the visualisations are useful.

19.
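Removing duplicate web pages improves the precision of a search engine and reduces data storage space. Current text deduplication algorithms rely mainly on keywords or semantic fingerprints, and are prone to misjudgment when applied to web pages. In the proposed method, a Karhunen-Loève (K-L) expansion of the character relation matrix maps each character to a numeric value; a discrete Fourier transform is then applied to the resulting sequence of values to obtain a Fourier coefficient vector for each page, and page similarity is judged by comparing the differences between these vectors. Experimental results show that the method deduplicates web pages fairly well.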
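
The comparison step can be sketched in Python as follows. This is a simplified stand-in, not the paper's method: in place of the K-L mapping, each character is mapped to its Unicode code point, and only the magnitudes of the first few DFT coefficients are compared. The function names and the choice of eight coefficients are assumptions.

```python
import cmath
import math


def fourier_signature(text, num_coefficients=8):
    """Return magnitudes of the first DFT coefficients of the text's code-point sequence."""
    # Code points stand in for the K-L character-to-value mapping (assumption).
    values = [ord(ch) for ch in text]
    n = len(values)
    if n == 0:
        return [0.0] * num_coefficients
    signature = []
    for k in range(num_coefficients):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(values)
        )
        signature.append(abs(coefficient) / n)  # normalise by sequence length
    return signature


def signature_distance(text_a, text_b, num_coefficients=8):
    """Euclidean distance between two Fourier signatures; 0.0 means identical signatures."""
    sig_a = fourier_signature(text_a, num_coefficients)
    sig_b = fourier_signature(text_b, num_coefficients)
    return math.dist(sig_a, sig_b)
```

Two pages would be treated as duplicates when their distance falls below a threshold chosen for the collection; the abstract does not state what threshold the authors used.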

20.
Personalization of content returned from a Web site is an important problem in general and affects e-commerce and e-services in particular. Targeting appropriate information or products to the end user can significantly change (for the better) the user experience on a Web site. One possible approach to Web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. We present a system that mines the logs to obtain profiles and uses them to automatically generate a Web page containing URLs the user might be interested in. Profiles generated are only based on the prior traversal patterns of the user on the Web site and do not involve providing any declarative information or require the user to log in. Profiles are dynamic in nature. With time, a user's traversal pattern changes. To reflect changes to the personalized page generated for the user, the profiles have to be regenerated, taking into account the existing profile. Instead of creating a new profile, we incrementally add and/or remove information from a user profile, aiming to save time as well as physical memory requirements.
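
One way to picture incremental profile maintenance is the Python sketch below, which keeps a profile as URL weights, ages the old weights, adds newly observed visits, and drops entries that fall below a threshold. The exponential decay, the threshold, and the name `update_profile` are assumptions chosen for illustration; the abstract does not say how the system adds or removes information.

```python
def update_profile(profile, new_visits, decay=0.5, threshold=0.25):
    """Return a new profile: decay old weights, add new visits, drop weak entries."""
    # Age every existing entry so stale interests fade over time.
    updated = {url: weight * decay for url, weight in profile.items()}
    # Each newly observed visit adds one unit of weight.
    for url in new_visits:
        updated[url] = updated.get(url, 0.0) + 1.0
    # Remove entries whose weight has fallen below the threshold.
    return {url: weight for url, weight in updated.items() if weight >= threshold}


# Hypothetical profile and visits, for illustration only.
profile = {"/a": 1.0, "/b": 0.4}
profile = update_profile(profile, ["/a", "/c"])
```

After the update, `/a` is reinforced, `/c` is added, and `/b` is removed, without rebuilding the profile from the full access log.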
