Similar Documents
20 similar documents found (search time: 15 ms)
1.
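Research on the Application of Data Mining Techniques in Web Prefetching   Total citations: 69, self-citations: 0, citations by others: 69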
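The WWW is highly popular for its multimedia delivery and good interactivity. Network speeds have improved greatly in recent years. However, the sharp growth in the number of Internet users, together with the inherent latency of Web services and the network, has made the network increasingly congested, and users' quality of service cannot be reliably guaranteed. To address this, the paper proposes an intelligent Web prefetching technique that speeds up page retrieval while users browse. The technique represents the data in the user's browser cache with a simplified WWW data model. On this basis, it uses data mining to discover association rules among the user's interests. These rules are stored in an interest-association knowledge base and serve as the basis for predicting user behavior. On the client side, an intelligent agent mines the user's interests and performs Web prefetching based on the interest-association knowledge base, transparently accelerating the user's browser.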
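The rule-mining and prefetching steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the paper's algorithm. It assumes sessions are lists of page URLs recovered from the browser cache and mines only single-page rules A → B with plain support/confidence thresholds. The function names `mine_pair_rules` and `pages_to_prefetch` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support, min_confidence):
    """Mine rules A -> B between pages that co-occur in browsing sessions.

    Returns {A: [(B, confidence), ...]} sorted by descending confidence.
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        item_count.update(pages)
        for a, b in combinations(sorted(pages), 2):
            pair_count[(a, b)] += 1
    rules = {}
    for (a, b), together in pair_count.items():
        if together / n < min_support:
            continue  # pair is too rare to be an interest association
        # a frequent pair can yield a rule in each direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_count[lhs]
            if confidence >= min_confidence:
                rules.setdefault(lhs, []).append((rhs, confidence))
    for lhs in rules:
        rules[lhs].sort(key=lambda rule: (-rule[1], rule[0]))
    return rules


def pages_to_prefetch(rules, current_page, limit=2):
    """Pages a client-side agent would fetch in the background."""
    return [page for page, _ in rules.get(current_page, [])[:limit]]
```

While the user views `/weather`, an agent built on this sketch would prefetch whatever the strongest rules predict, here `/news`.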

2.
Exploiting Regularities in Web Traffic Patterns for Cache Replacement   Total citations: 2, self-citations: 0, citations by others: 2
Cohen, Kaplan. Algorithmica, 2002, 33(3): 300-334
Abstract. Caching web pages at proxies and in web servers' memories can greatly enhance performance. Proxy caching is known to reduce network load, and both proxy and server caching can significantly decrease latency. Web caching problems have different properties from traditional operating-systems caching, and cache replacement can benefit by recognizing and exploiting these differences. We address two aspects of the predictability of traffic patterns: the overall load experienced by large proxy and web servers, and the distinct access patterns of individual pages. We formalize the notion of "cache load" under various replacement policies, including LRU and LFU, and demonstrate that the trace of a large proxy server exhibits regular load. Predictable load allows for improved design, analysis, and experimental evaluation of replacement policies. We provide a simple and (near) optimal replacement policy when each page request has an associated distribution function on the next request time of the page. Without the predictable load assumption, no such online policy is possible, and it is known that even obtaining an offline optimum is hard. For experiments, predictable load enables comparing and evaluating cache replacement policies using partial traces, containing requests made to only a subset of the pages. Our results are based on considering a simpler caching model, which we call the interval caching model. We relate traditional and interval caching policies under predictable load, and derive (near)-optimal replacement policies from their optimal interval caching counterparts.
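To make the LRU and LFU policies mentioned above concrete, the sketch below counts cache hits for each policy on a request trace. It is an illustrative simulator only and does not implement the paper's near-optimal policy or its interval caching model. The trace and capacity are made up. The LFU variant assumed here keeps request counts for evicted pages too ("perfect" LFU).

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity):
    """Count hits for least-recently-used replacement."""
    cache = OrderedDict()
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used page
            cache[page] = True
    return hits


def lfu_hits(trace, capacity):
    """Count hits for least-frequently-used replacement."""
    cache = set()
    freq = Counter()  # request counts, kept even after a page is evicted
    hits = 0
    for page in trace:
        freq[page] += 1
        if page in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                # evict the least frequently requested page; ties broken by name
                victim = min(cache, key=lambda p: (freq[p], p))
                cache.remove(victim)
            cache.add(page)
    return hits
```

On the trace `a b a c a b d a` with room for two pages, LFU keeps the popular page `a` and scores 3 hits, versus 2 for LRU.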

3.
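If a web page requires code to be executed on the server, the site is a dynamic website. A dynamic site returns different pages depending on the client's request; that is, the pages sent to the client are generated dynamically. If the pages require no server-side code execution, the site is a static website. In that case, the pages sent to the client are page files written in advance and stored on the server, and they never change. A dynamic site can also combine the two approaches: it uses dynamic pages where they are appropriate and static pages where they are needed.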

4.
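Reducing user-perceived latency in Web access is an important problem in Web applications. To improve response speed, the Web server needs to predict users' future HTTP requests while it processes the current page. To this end, a Web prefetching algorithm based on user access patterns is proposed. The algorithm analyzes users' access patterns from Web log data and computes transition probabilities between Web pages, which serve as the basis for prefetching users' future requests. Experimental results show that the algorithm effectively improves prediction accuracy and hit rate and reduces user-perceived latency.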
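The transition-probability step can be sketched as a first-order model over consecutive requests. The sketch assumes log entries have already been grouped into per-user sessions and that a fixed probability threshold decides what to prefetch. Both assumptions and the function names are illustrative; the abstract does not give these details.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page = j | current page = i) from consecutive requests."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {page: c / total for page, c in followers.items()}
    return probs


def predict(probs, current, threshold):
    """Pages whose transition probability from `current` reaches `threshold`,
    most likely first (these are the prefetch candidates)."""
    candidates = probs.get(current, {})
    chosen = [page for page, p in candidates.items() if p >= threshold]
    return sorted(chosen, key=lambda page: (-candidates[page], page))
```

Raising the threshold trades hit rate for less wasted bandwidth on pages that are never requested.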

5.
Web caching proxy servers are essential for improving web performance and scalability, and recent research has focused on making proxy caching work for database-backed web sites. In this paper, we explore a new proxy caching framework that exploits the query semantics of HTML forms. We identify two common classes of form-based queries from real-world database-backed web sites, namely, keyword-based queries and function-embedded queries. Using typical examples of these queries, we study two representative caching schemes within our framework: (i) traditional passive query caching, and (ii) active query caching, in which the proxy cache can service a request by evaluating a query over the contents of the cache. Results from our experimental implementation show that our form-based proxy is a general and flexible approach that efficiently enables active caching schemes for database-backed web sites. Furthermore, handling query containment at the proxy yields significant performance advantages over passive query caching, but extending the power of the active cache to do full semantic caching appears to be less generally effective.
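Below is a minimal sketch of the active-caching idea for keyword queries. It assumes conjunctive (AND) keyword semantics and whitespace tokenization; the abstract specifies neither. Under that assumption, a new query that contains every keyword of a cached query selects a subset of the cached results. The proxy can therefore answer it locally by filtering. The class and method names are hypothetical.

```python
def tokenize(text):
    """Lower-cased whitespace tokens (a simplifying assumption)."""
    return set(text.lower().split())


class KeywordQueryCache:
    """Proxy-side cache of results for conjunctive keyword queries."""

    def __init__(self):
        self.entries = {}  # frozenset of keywords -> {doc_id: text}

    def store(self, keywords, results):
        self.entries[frozenset(k.lower() for k in keywords)] = dict(results)

    def answer(self, keywords):
        """Return (results, how); results is None on a miss."""
        query = frozenset(k.lower() for k in keywords)
        if query in self.entries:
            return self.entries[query], "exact"  # passive caching hit
        for cached_query, results in self.entries.items():
            # Query containment: more AND-ed keywords select a subset of
            # the cached results, so evaluate the query over the cache.
            if cached_query <= query:
                filtered = {d: t for d, t in results.items() if query <= tokenize(t)}
                return filtered, "contained"
        return None, "miss"  # must be forwarded to the origin web site
```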

6.
Caching web pages is an important part of web infrastructures. Medium- to large-scale infrastructures deploy a cluster of servers to solve the scalability and storage problems inherent in caching. In this paper we present dynamic information-based scalable hashing, which evenly hashes client requests to a cluster of cache servers, resulting in performance scalability. Runtime information is used to determine when and how to cache pages. Each cached page is stored on, and retrieved from, exactly one of the servers, which minimizes the use of storage and results in storage scalability. We set up an experimental environment consisting of various machines, including client servers, a cluster of 16 cache servers, and a load balancer. Experimental results demonstrate that dynamic information-based scalable hashing maximizes both performance scalability and storage scalability, whereas existing approaches achieve only one of the two. Copyright © 2011 John Wiley & Sons, Ltd.
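The abstract does not give the paper's hashing function. The sketch below therefore substitutes a standard technique, rendezvous (highest-random-weight) hashing, to show how every URL can be mapped deterministically to exactly one cache server so that pages are not duplicated. It ignores the runtime information the paper uses, and the server names are placeholders.

```python
import hashlib


def score(server, url):
    """Pseudo-random but deterministic weight for a (server, url) pair."""
    digest = hashlib.sha256(f"{server}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(url, servers):
    """Rendezvous hashing: the server with the highest weight owns the URL,
    so every client computes the same single owner without coordination."""
    return max(servers, key=lambda server: score(server, url))
```

A useful property of this choice is that removing a server only moves the URLs that server owned; every other URL keeps its owner.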

7.
Information Centric Network (ICN) is an emerging network paradigm centered on named content rather than host-to-host connectivity. A common characteristic of ICN is that it leverages in-network caching to achieve efficient and reliable content distribution, but this also brings challenges. The in-network caching technique equips every ICN router with cache storage. However, no existing work focuses on a mechanism for sharing cache storage among different applications that satisfies both the line-speed requirements and the diversity of applications in ICN. In this paper, we formulate per-application storage management as an optimized resource allocation problem and introduce a manifold learning method to classify the priority of applications. Dynamic programming is adopted to solve the formulated problem, and an adaptive per-application storage management scheme is proposed on the basis of the optimal solutions. Extensive experiments evaluating the proposed scheme show that our approach is superior to static partitioning and shared storage schemes.
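The dynamic-programming step can be illustrated as a classic discrete allocation: split `capacity` cache units among applications to maximize total utility. The per-application utility tables are invented for illustration. The paper derives priorities with manifold learning, which is not reproduced here, and this formulation is an assumption rather than the paper's exact model.

```python
def allocate(utilities, capacity):
    """utilities[i][c] is the benefit of giving application i exactly c units
    (c = 0..capacity). Returns (best total utility, units per application)."""
    n = len(utilities)
    # best[i][c]: max utility using the first i applications and at most c units
    best = [[float("-inf")] * (capacity + 1) for _ in range(n + 1)]
    choice = [[0] * (capacity + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (capacity + 1)  # no applications: leftover units are unused
    for i in range(1, n + 1):
        u = utilities[i - 1]
        for c in range(capacity + 1):
            for give in range(c + 1):
                candidate = best[i - 1][c - give] + u[give]
                if candidate > best[i][c]:
                    best[i][c] = candidate
                    choice[i][c] = give
    # walk back through the recorded choices to recover the allocation
    allocation = [0] * n
    c = capacity
    for i in range(n, 0, -1):
        allocation[i - 1] = choice[i][c]
        c -= allocation[i - 1]
    return best[n][capacity], allocation
```

With utilities `[0, 5, 9, 12, 14]` and `[0, 4, 6, 7, 7]` and 4 units, the best split gives 3 units to the first application and 1 to the second, for a total utility of 16.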

8.
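With ASP, HTML (HyperText Markup Language) text, script commands and ActiveX components can easily be combined into ASP pages. These pages generate dynamic web pages, create interactive Web sites and provide access to Web databases. When a user requests an ASP page from a browser, the Web server invokes the ASP engine to execute the ASP file and interpret its scripts (JScript or VBScript). The database is connected through ODBC, and database operations are performed by the database access component ADO (ActiveX Data Objects). Finally, ASP generates an HTML page containing the query results and returns it to the client for display. Using the various database-connection methods available in ASP, Web databases can be accessed effectively.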
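The request flow described above (run a server-side script, query the database, emit an HTML page) can also be sketched outside ASP. The example below uses Python's built-in `sqlite3` module in place of ODBC/ADO, and the `products` table and its columns are hypothetical. Values are HTML-escaped before they are written into the page.

```python
import sqlite3
from html import escape


def render_products_page(conn, max_price):
    """Query the database and build an HTML page containing the results."""
    rows = conn.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY name",
        (max_price,),  # parameterized query instead of string concatenation
    ).fetchall()
    body = "".join(
        f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>" for name, price in rows
    )
    return f"<html><body><table>{body}</table></body></html>"
```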

9.
We present in this paper a model for indexing and querying web pages based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of blocks to neighboring blocks: a block b is said to be permeable to a block b′ in the same page if the content of b′ (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described, including the transformation of web pages into block hierarchies, the definition of a dedicated language to express indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%.
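The permeability idea can be sketched as term inheritance during indexing. A block's index contains its own terms at full weight plus the terms of the blocks it is permeable to, at a reduced weight. The damping factor `alpha` and the flat dictionary representation are assumptions for illustration. The paper's block hierarchy and indexing-rule language are not modeled.

```python
from collections import Counter


def index_blocks(blocks, permeability, alpha=0.5):
    """blocks: {block_id: text}; permeability: {b: [b2, ...]} meaning block b
    (partially) inherits the content of each b2. Returns {block_id: Counter}."""
    own = {b: Counter(text.lower().split()) for b, text in blocks.items()}
    index = {}
    for b, terms in own.items():
        weights = Counter({t: float(c) for t, c in terms.items()})  # own terms, weight 1
        for source in permeability.get(b, []):
            for t, c in own[source].items():
                weights[t] += alpha * c  # inherited terms, damped by alpha
        index[b] = weights
    return index
```

For example, an image block with no text of its own becomes findable through the words of its caption and the page title.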

10.
A generalized paging problem is considered. Each request is expressed as a set of u pages. In order to satisfy the request, at least one of these pages must be in the cache. Therefore, on a page fault, the algorithm must load into the cache at least one page out of the u pages given in the request. The problem arises in systems in which requests can be serviced by various utilities (e.g., a request for data that appears on several web pages) and a single utility can service many requests (e.g., a web page containing several data items). The server has the freedom to select the utility that will service the next request and, hopefully, additional requests in the future. The case u=1 is simply the classical paging problem, which is known to be polynomially solvable. We show that for any u>1 the offline problem is NP-hard and hard to approximate if the cache size k is part of the input, but solvable in polynomial time for constant values of k. We consider mainly online algorithms, and design competitive algorithms for arbitrary values of k and u. We study in more detail the cases where u and k are small. We also give an algorithm that uses resource augmentation and is asymptotically optimal for u=2. A preliminary version of this paper appeared in Proc. Scandinavian Workshop on Algorithm Theory (SWAT 2006), pp. 124–135, 2006. Research of R. van Stee supported by Alexander von Humboldt Foundation.
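To illustrate why the choice of which page to load matters, the sketch below runs a naive LRU-style heuristic for the generalized problem. A request (a set of pages) is a hit if any of its pages is cached. On a fault, the heuristic loads an arbitrary page from the set, here the one with the smallest name. This is not one of the paper's competitive algorithms.

```python
from collections import OrderedDict


def lru_generalized_faults(requests, k):
    """Count faults for a naive heuristic with cache size k.

    requests: iterable of sets of pages; a request is satisfied if any of
    its pages is cached."""
    cache = OrderedDict()
    faults = 0
    for request in requests:
        cached = [p for p in cache if p in request]
        if cached:
            cache.move_to_end(cached[-1])  # refresh a matching page
            continue
        faults += 1
        if len(cache) >= k:
            cache.popitem(last=False)  # evict least recently used page
        cache[min(request)] = True  # arbitrary, deterministic choice of page
    return faults
```

In the three-request example in the test, the heuristic faults three times. Loading `x` on the first fault would have served all three requests with a single fault.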

11.
Modelling a software system is often a challenging prerequisite to automatic test case generation. Modelling the navigation structure of a dynamic web application is particularly challenging because of the presence of a large number of pages that are created dynamically and the difficulty of reaching a dynamic page unless a set of appropriate input values are provided for the parameters. To address the first challenge, some form of abstraction is required to enable scalable modelling. For the second challenge, techniques are required to select appropriate input values for parameters and systematically combine them to reach new pages. This paper presents a combinatorial approach to building a navigation graph for dynamic web applications. The navigation graph can then be used to automatically generate test sequences for testing web applications. The novelty of our approach is twofold. First, we use an abstraction scheme to control the page explosion problem, where pages that are likely to have the same navigation behaviour are grouped together and are represented as a single node in the navigation graph. Second, assuming that values of individual parameters are supplied manually or generated from other techniques, we combine parameter values such that well‐defined combinatorial coverage of input parameter values is achieved. Using combinatorial coverage can significantly reduce the number of requests that have to be submitted while still achieving effective coverage of the navigation structure. We implement our combinatorial approach in a tool, Tansuo, and apply the tool to seven open‐source web applications. We evaluate the effectiveness of Tansuo's exploration process guided by t‐way coverage, for t = 1,2,3, with respect to code coverage, and find that the navigation structure exploration by Tansuo, in general, results in high code coverage (more than 80% statement coverage for most of our subject applications when dead code is removed).
We compare Tansuo's effectiveness with two other navigation graph tools and find that Tansuo is more effective. Our empirical results indicate that using pairwise coverage in Tansuo results in the efficient generation of navigation graphs and effective exploration of dynamic web applications. Copyright © 2016 John Wiley & Sons, Ltd.
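Pairwise (t = 2) coverage, as used to combine parameter values above, can be sketched with a simple greedy algorithm. It repeatedly picks the full combination that covers the most not-yet-covered value pairs. The parameters below are hypothetical. Enumerating the full Cartesian product is only practical for small inputs. This is not Tansuo's implementation.

```python
from itertools import combinations, product


def pairwise_tests(params):
    """params: {name: [values]}. Greedily choose full value combinations
    until every value pair of every parameter pair is covered."""
    names = list(params)
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    candidates = [dict(zip(names, vals)) for vals in product(*(params[n] for n in names))]

    def gain(test):
        """Number of still-uncovered pairs this combination would cover."""
        return sum(((a, test[a]), (b, test[b])) in uncovered for a, b in combinations(names, 2))

    tests = []
    while uncovered:
        best = max(candidates, key=gain)  # always covers at least one new pair
        tests.append(best)
        uncovered -= {((a, best[a]), (b, best[b])) for a, b in combinations(names, 2)}
    return tests
```

For three two-valued parameters, the greedy loop covers all pairs with 4 requests instead of the 8 exhaustive combinations.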

12.
Computer Networks, 1999, 31(11-16): 1725-1736
The World-Wide Web provides remote access to pages using its own naming scheme (URLs), transfer protocol (HTTP), and cache algorithms. Not only does using these special-purpose mechanisms have performance implications, but they make it impossible for standard Unix applications to access the Web. Gecko is a system that provides access to the Web via the NFS protocol. URLs are mapped to Unix file names, providing unmodified applications access to Web pages; pages are transferred from the Gecko server to the clients using NFS instead of HTTP, significantly improving performance; and NFS's cache consistency mechanism ensures that all clients have the same version of a page. Applications access pages as they would Unix files. A client-side proxy translates HTTP requests into file accesses, allowing existing Web applications to use Gecko. Experiments performed on our prototype show that Gecko is able to provide this additional functionality at a performance level that exceeds that of HTTP.
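As a rough illustration of mapping URLs to Unix file names, the sketch below uses an invented layout: `/gecko/<scheme>/<host>/<path>`, with `index.html` for directory URLs and a percent-encoded query string appended. The abstract does not describe Gecko's actual naming scheme, so every detail of this mapping is an assumption.

```python
import posixpath
from urllib.parse import quote, urlsplit


def url_to_path(url, mount="/gecko"):
    """Map a URL to a file name under a hypothetical mount point."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs map to an index file (assumption)
    name = path.lstrip("/")
    if parts.query:
        # keep dynamic pages distinct by encoding the query into the file name
        name += "?" + quote(parts.query, safe="")
    return posixpath.join(mount, parts.scheme, parts.netloc, name)
```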

13.
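A query result cache can store either the set of document identifiers for a query's results or the actual result pages returned, in order to speed up responses to user queries. These two forms may be called an identifier cache and a page cache, respectively. For a fixed amount of memory, an identifier cache achieves a higher hit rate, whereas a page cache achieves faster responses. Based on the temporal and spatial locality of user queries, this paper proposes a novel hierarchical result caching mechanism. First, the mechanism divides a fixed-size result cache into two levels: a page cache and an identifier cache. For a query submitted by a user, the mechanism first tries to answer it from the first-level page cache; on a miss, it then tries the second-level identifier cache. Experiments show that this hierarchical mechanism yields a considerable improvement in average query response time over traditional mechanisms that rely on a single form of cache. For example, relative to a pure page cache, the improvement averages 9% and reaches 11% in the best case. Second, on top of the identifier cache, the mechanism uses a heuristic prefetching strategy to exploit the spatial locality of user queries. Experiments show that adding this prefetching strategy further improves retrieval performance. The result is a complete and effective result caching mechanism that covers both temporal and spatial locality.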
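The two-level lookup order described above can be sketched as follows: try the page cache, then the identifier cache (which still requires rendering the result page), and only then run the full search. Both levels use plain LRU here, and the slot counts are arbitrary. The `search` and `render` callables are placeholders, and the heuristic prefetching strategy is not included.

```python
from collections import OrderedDict


class LRU:
    """Minimal least-recently-used map."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)


class TwoLevelResultCache:
    def __init__(self, page_slots, id_slots, search, render):
        self.pages = LRU(page_slots)  # level 1: fully rendered result pages
        self.ids = LRU(id_slots)      # level 2: compact document-ID lists
        self.search = search          # query -> doc IDs (expensive)
        self.render = render          # doc IDs -> result page (snippets etc.)
        self.stats = {"page": 0, "id": 0, "miss": 0}

    def lookup(self, query):
        page = self.pages.get(query)
        if page is not None:
            self.stats["page"] += 1   # fastest path: page cache hit
            return page
        ids = self.ids.get(query)
        if ids is not None:
            self.stats["id"] += 1     # skip the search, but still render
        else:
            self.stats["miss"] += 1
            ids = self.search(query)
            self.ids.put(query, ids)
        page = self.render(ids)
        self.pages.put(query, page)
        return page
```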

14.
Tamper-proofing web pages is of great importance. Several watermarking schemes have been reported to solve this problem. However, both these watermarking schemes and traditional hash methods have the drawback of increasing file size. In this paper, we propose a novel watermarking scheme for tamper-proofing web pages that does not have this drawback. For a web page, the proposed scheme generates watermarks based on the principal component analysis (PCA) technique. The watermarks are then embedded into the web page through the upper and lower cases of letters in HTML tags. When a watermarked web page is tampered with, the extracted watermarks can detect the modifications, so the tampered page can be kept from being published. Extensive experiments show that the proposed scheme can be a feasible and efficient tool for tamper-proofing web pages.
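The case-based embedding step can be sketched as follows. It relies on the fact that HTML tag names are case-insensitive, so changing their case affects neither how the page renders nor its file size. The watermark bits are taken as given, since the PCA-based watermark generation is not reproduced. The tag-matching regular expression is a simplification.

```python
import re

# "<" or "</" followed by a tag name (simplified; ignores comments, scripts, etc.)
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def letter_positions(html):
    """Indices of alphabetic characters inside HTML tag names, in page order."""
    positions = []
    for match in TAG_NAME.finditer(html):
        start = match.start(2)
        for offset, ch in enumerate(match.group(2)):
            if ch.isalpha():
                positions.append(start + offset)
    return positions


def embed(html, bits):
    """Upper case encodes 1, lower case encodes 0; length is unchanged."""
    positions = letter_positions(html)
    if len(bits) > len(positions):
        raise ValueError("not enough tag letters to carry the watermark")
    chars = list(html)
    for pos, bit in zip(positions, bits):
        chars[pos] = chars[pos].upper() if bit else chars[pos].lower()
    return "".join(chars)


def extract(html, n):
    """Read back the first n watermark bits from tag-name letter cases."""
    return [1 if html[pos].isupper() else 0 for pos in letter_positions(html)[:n]]
```

To check a page, the verifier compares the extracted bits with the expected watermark. Any mismatch signals that the tags were altered.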

15.
In this work, we present a novel approach for the efficient materialization of dynamic web pages in e-commerce applications, such as an online retail store with millions of items, hundreds of HTTP requests per second and tens of dynamic web page types. In such applications, user satisfaction, measured in terms of response time (QoS) and content freshness (QoD), determines success, especially under heavy workload. The novelty of our materialization approach over existing ones is that it considers the data dependencies between the content fragments of a dynamic web page. We introduce two new semantic-based data freshness metrics that capture the content dependencies and propose two materialization algorithms that balance QoS and QoD. In our evaluation, we use a real-world experimental system that resembles an online bookstore and show that our approach outperforms existing QoS-QoD balancing approaches in terms of server-side response time (throughput), data freshness and scalability.
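One way to picture the fragment dependencies described above is as a dependency graph. Updating a base data item marks every fragment that depends on it, directly or through other fragments, as stale. This is a hypothetical sketch. The freshness ratio it reports is a simple stand-in, not one of the paper's two QoD metrics.

```python
from collections import defaultdict


class FragmentStore:
    """Tracks which materialized fragments are still fresh."""

    def __init__(self):
        self.dependents = defaultdict(set)  # data item or fragment -> fragments using it
        self.fresh = {}                     # fragment -> is its materialized copy fresh?

    def add_fragment(self, fragment, sources):
        for source in sources:
            self.dependents[source].add(fragment)
        self.fresh[fragment] = True

    def update(self, item):
        """Mark every fragment that (transitively) depends on `item` as stale."""
        stack, stale = [item], set()
        while stack:
            node = stack.pop()
            for fragment in self.dependents[node]:
                if fragment not in stale:
                    stale.add(fragment)
                    self.fresh[fragment] = False
                    stack.append(fragment)  # fragments built on this one go stale too
        return stale

    def freshness(self, fragments):
        """Share of a page's fragments that are up to date (simple QoD proxy)."""
        return sum(self.fresh[f] for f in fragments) / len(fragments)
```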

16.
We study web caching with request reordering. The goal is to maintain a cache of web documents so that a sequence of requests can be served at low cost. To improve cache hit rates, a limited reordering of requests is allowed. Feder et al. (Proceedings of the 13th ACM–SIAM Symposium on Discrete Algorithms, pp. 104–105, 2002), who recently introduced this problem, considered caches of size 1, i.e., a cache can store one document. They presented an offline algorithm based on dynamic programming as well as online algorithms that achieve constant-factor competitive ratios. For arbitrary cache sizes, Feder et al. (Theor. Comput. Sci. 324:201–218, 2004) gave online strategies that have nearly optimal competitive ratios in several cost models.

17.
We present a scheme for the dynamic generation of web pages. The scheme separates presentation from content. Furthermore, by using the theme metaphor, the scheme makes it easy to develop a web site with several design themes, each with its own template, graphics and style sheet characteristics. The proposed scheme relies on versatile substitution mechanisms that nonetheless use a simplified syntax. Most importantly, the scheme uses XML to define custom tags that are transformed into HTML using the innovative concept of HTML patterns. The scheme was initially implemented as a COM component (PageGen) and later ported to Microsoft .NET. It has proven to be quite effective for Active Server Pages (and ASP.NET) sites used to host online books and course material. However, the scheme is general enough to be used with any database-centric site or content, and it can be adapted to other web application frameworks such as PHP and JSP.
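The theme-plus-patterns idea can be sketched with the standard library. Custom XML tags in the content are mapped to per-theme HTML snippets, and the result is wrapped in a per-theme page template. The tag names, patterns and stylesheet names are invented. Only top-level tags are handled, and this does not reproduce PageGen's syntax.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

# Hypothetical "HTML patterns": custom XML tag -> HTML snippet, one set per theme.
PATTERNS = {
    "plain": {"note": "<p class='note'>$text</p>", "term": "<b>$text</b>"},
    "dark": {"note": "<div class='dark-note'>$text</div>", "term": "<em>$text</em>"},
}
# Per-theme page templates, each with its own style sheet.
PAGE_TEMPLATES = {
    theme: Template(f"<html><head><link rel='stylesheet' href='{theme}.css'></head><body>$body</body></html>")
    for theme in PATTERNS
}


def render(content_xml, theme):
    """Transform the custom tags of a content document into themed HTML."""
    root = ET.fromstring(content_xml)
    parts = []
    for elem in root:  # top-level custom tags only
        pattern = Template(PATTERNS[theme][elem.tag])
        parts.append(pattern.substitute(text=escape(elem.text or "")))
    return PAGE_TEMPLATES[theme].substitute(body="".join(parts))
```

Because content never contains presentation markup, switching a site to another theme only means choosing a different key.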

18.
Research on Counter Technology in Web Pages   Total citations: 9, self-citations: 0, citations by others: 9
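A Web page counter directly reflects how much attention a Web site receives. A good Web page counter should be easy to use and perform well. Web page counter technology closely mirrors the current state of dynamic Web page technology. This paper presents several techniques for implementing Web page counters and compares them.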
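One common counter technique, a server-side count persisted to a file, can be sketched as follows. The JSON file format and the `HitCounter` class are assumptions for illustration. The lock only serializes threads within one process. The write-then-rename step (`os.replace`) prevents the stored file from being left half-written.

```python
import json
import os
import threading


class HitCounter:
    """Per-page hit counter persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.counts = json.load(f)
        else:
            self.counts = {}

    def hit(self, page):
        """Record one visit to `page` and return its new count."""
        with self.lock:
            self.counts[page] = self.counts.get(page, 0) + 1
            tmp = self.path + ".tmp"
            with open(tmp, "w", encoding="utf-8") as f:
                json.dump(self.counts, f)
            os.replace(tmp, self.path)  # atomic rename on the same file system
            return self.counts[page]
```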

19.
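A Method for Generating Static HTML Pages Using Dynamic Web Page Technology under ASP.NET   Total citations: 1, self-citations: 0, citations by others: 1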
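This paper introduces a method for generating static HTML pages using dynamic Web page technology in the ASP.NET environment. With this technique, when site content managers add a page, the back-end publishing program saves it directly as a static HTML file, which makes page generation simple and fast. The technique is especially suitable for high-traffic websites. It reduces the load of running server-side programs and reading the database, and it improves the site's data access efficiency. The generated static pages are also more easily indexed by search engines.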
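The publish-time generation step can be sketched as rendering a page once and writing it to disk. Later requests are then served as plain files, without running code or querying the database. The example is in Python rather than ASP.NET. The article fields, template and output layout are hypothetical.

```python
from html import escape
from pathlib import Path

TEMPLATE = "<html><head><title>{title}</title></head><body><h1>{title}</h1>{body}</body></html>"


def publish(article, out_dir):
    """Render an article once at publish time and save it as a static .html file."""
    html = TEMPLATE.format(
        title=escape(article["title"]),
        body="".join(f"<p>{escape(p)}</p>" for p in article["paragraphs"]),
    )
    path = Path(out_dir) / f"{article['slug']}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path
```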

20.
The web is the largest distributed database deploying time-to-live-based weak consistency. Each object has a lifetime duration assigned to it by its origin server. A copy of the object fetched from its origin server is received with a maximum time-to-live (TTL) equal to its lifetime duration. In contrast, a copy obtained through a cache has a shorter TTL, since its age (the time elapsed since it was fetched from the origin) is deducted from its lifetime duration. A request served by a cache constitutes a hit if the cache has a fresh copy of the object. Otherwise, the request is considered a miss and is propagated to another server. Evidently, the number of cache misses depends on the age of the copies the cache receives. Thus, a cache that sends requests to another cache would suffer more misses than a cache that sends requests directly to an authoritative server.
In this paper, we model and analyze the effect of age on the performance of various cache configurations. We consider a low-level cache that fetches objects either from their origin servers or from other caches, and analyze its miss rate as a function of its fetching policy. We distinguish between three basic fetching policies, namely, fetching always from the origin, fetching always from the same high-level cache, and fetching from a “random” high-level cache. We explore the relationships between these policies in terms of the miss rate achieved by the low-level cache, both on worst-case sequences and on sequences generated using particular probability distributions.
Guided by web caching practice, we consider two variations of the basic policies. In the first variation, the high-level cache uses pre-term refreshes to keep a copy with lower age. In the second variation, the low-level cache uses an extended lifetime duration. We analyze how these variations affect the miss rates. Our theoretical results help to understand how age may affect the miss rate, and imply guidelines for improving the performance of web caches.
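The age effect can be illustrated with a small simulation of one object. A copy fetched from the origin gets the full lifetime. A copy obtained through the high-level cache gets only the remaining lifetime of that cache's copy. The time units and request times are illustrative, as is the assumption that the high-level cache also serves other clients (`other_times`).

```python
def misses_from_origin(times, lifetime):
    """Low-level cache that always fetches from the origin server."""
    expiry = float("-inf")
    misses = 0
    for t in sorted(times):
        if t >= expiry:             # no fresh copy: miss
            misses += 1
            expiry = t + lifetime   # origin copy has age 0, so it gets the full TTL
    return misses


def misses_via_high_cache(times, other_times, lifetime):
    """Low-level cache that always fetches from the same high-level cache.

    other_times: requests the high-level cache receives from other clients."""
    events = sorted([(t, "low") for t in times] + [(t, "other") for t in other_times])
    high_expiry = float("-inf")
    low_expiry = float("-inf")
    misses = 0
    for t, source in events:
        if source == "low":
            if t < low_expiry:
                continue            # low-level hit: nothing is forwarded
            misses += 1             # low-level miss: forward to the high-level cache
        if t >= high_expiry:
            high_expiry = t + lifetime  # high-level cache refreshes from the origin
        if source == "low":
            low_expiry = high_expiry    # received copy inherits the high-level copy's age
    return misses
```

With lifetime 10 and requests at times 5, 12 and 14, fetching from the origin costs 1 miss. Fetching through a high-level cache that another client filled at time 0 costs 2, because the copy received at time 5 is already 5 units old.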
