Similar Literature
20 similar documents found.
1.
ABSTRACT

This article discusses methods for retrieving information from dead or unavailable Web servers. It describes using the Google search engine cache and the Wayback Machine as possible means of recovering data that would otherwise be lost after a Web server or Web site is no longer online.
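
A minimal sketch of the second recovery route, using the Wayback Machine's public availability API (the endpoint and JSON shape follow the API's public documentation; the example URL is only a placeholder):

```python
# Look up the closest archived copy of a dead page via the Wayback
# Machine availability API and return its URL, if one exists.
import json
import urllib.parse
import urllib.request

def closest_snapshot(url, timestamp=None):
    """Return the URL of the closest archived snapshot of `url`, or None."""
    query = {"url": url}
    if timestamp:                      # e.g. "20080101" to prefer older captures
        query["timestamp"] = timestamp
    api = "https://archive.org/wayback/available?" + urllib.parse.urlencode(query)
    with urllib.request.urlopen(api, timeout=10) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

if __name__ == "__main__":
    print(closest_snapshot("http://example.com/retired-page.html"))
```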

2.
This paper presents a PageRank-based prefetching technique for accesses to Web page clusters. The approach uses the link structure of a requested page to determine the “most important” linked pages and to identify the page(s) to be prefetched. The underlying premise of our approach is that in the case of cluster accesses, the next pages requested by users of the Web server are typically based on the current and previous pages requested. Furthermore, if the requested pages have a lot of links to some “important” page, that page has a higher probability of being the next one requested. An experimental evaluation of the prefetching mechanism is presented using real server logs. The results show that the PageRank-based scheme does better than random prefetching for clustered accesses, with hit rates of 90% in some cases.
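
A minimal sketch of the idea (illustrative only, not the authors' implementation): run a few power-iteration steps of PageRank over the local link graph around the requested page and prefetch the highest-ranked linked pages; the link graph and cache passed in are hypothetical placeholders.

```python
# Rank candidate pages with a small PageRank computation and pick the
# top-k linked pages that are not already cached.
def pagerank(graph, damping=0.85, iterations=20):
    """graph: dict page -> list of outgoing links. Returns page -> score."""
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, links in graph.items():
            if not links:
                continue
            share = damping * rank[page] / len(links)
            for target in links:
                if target in new:
                    new[target] += share
        rank = new
    return rank

def pages_to_prefetch(requested_page, link_graph, already_cached, k=2):
    """Choose the k most 'important' pages linked from the requested page."""
    candidates = link_graph.get(requested_page, [])
    scores = pagerank(link_graph)
    ranked = sorted(candidates, key=lambda p: scores.get(p, 0.0), reverse=True)
    return [p for p in ranked if p not in already_cached][:k]
```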

3.
An emerging trend in social media is for users to create and publish “stories”, or curated lists of Web resources, with the purpose of creating a particular narrative of interest to the user. While some stories on the Web are automatically generated, such as Facebook’s “Year in Review”, one of the most popular storytelling services is “Storify”, which provides users with curation tools to select, arrange, and annotate stories with content from social media and the Web at large. We would like to use tools, such as Storify, to present (semi-)automatically created summaries of archival collections. To support automatic story creation, we need to better understand, as a baseline, the structural characteristics of popular (i.e., receiving the most views) human-generated stories. We investigated 14,568 stories from Storify, comprising 1,251,160 individual resources, and found that popular stories (i.e., the top 25% of views normalized by time available on the Web) have the following characteristics: a minimum of 2, a median of 28, and a maximum of 1950 elements, a median of 12 multimedia resources (e.g., images, video), 38% receive continuing edits, and 11% of their elements are missing from the live Web. We also checked the population of Archive-It collections (3109 collections comprising 305,522 seed URIs) to better understand the characteristics of the collections that we intend to summarize. We found that the resources in human-generated stories are different from the resources in Archive-It collections. In summarizing a collection, we can only choose from what is archived (e.g., twitter.com is popular in Storify, but rare in Archive-It). However, some other characteristics of human-generated stories will be applicable, such as the number of resources.
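
The popularity cut-off used above can be sketched as follows (a rough illustration with hypothetical field names; views are simply normalized by the number of days the story has been on the Web):

```python
# Keep the top quartile of stories ranked by views per day on the Web.
from datetime import datetime, timezone

def popular_stories(stories, now=None, quantile=0.25):
    """stories: list of dicts with 'views' (int) and 'published' (aware datetime)."""
    now = now or datetime.now(timezone.utc)
    def views_per_day(story):
        days_on_web = max((now - story["published"]).days, 1)
        return story["views"] / days_on_web
    ranked = sorted(stories, key=views_per_day, reverse=True)
    return ranked[: max(1, int(len(ranked) * quantile))]
```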

4.
Search services are the main interface through which people discover information on the Internet. A fundamental challenge in testing search services is the lack of oracles. The sheer volume of data on the Internet prohibits testers from verifying the results. Furthermore, it is difficult to objectively assess the ranking quality because different assessors can have very different opinions on the relevance of a Web page to a query. This paper presents a novel method for automatically testing search services without the need for a human oracle. The experimental findings reveal that some commonly used search engines, including Google, Yahoo!, and Live Search, are not as reliable as most users would expect. For example, they may fail to find pages that exist in their own repositories, or rank pages in a way that is logically inconsistent. Suggestions are made for search service providers to improve their service quality. Copyright © 2010 John Wiley & Sons, Ltd.
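
One oracle-free consistency check in this spirit can be sketched as below; the specific relation (narrowing a query should not surface pages the broader query never returned) and the search() helper are assumptions for illustration, not necessarily the exact relations used in the paper:

```python
# Flag results that violate a subset-style consistency relation between
# a query and a narrowed version of the same query. With truncated
# result lists this is an approximate check, not a strict proof of error.
def subset_violations(search, query, extra_term, depth=100):
    """search(q, n) -> list of result URLs (a hypothetical interface)."""
    broad = set(search(query, depth))
    narrow = set(search(f"{query} {extra_term}", depth))
    # Pages matched by the narrower query but absent from the broader
    # query's results hint at a logically inconsistent index or ranking.
    return narrow - broad
```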

5.
Awkward arrangement of documents in an HTML tree can discourage users from staying at a Web site. The authors have developed an algorithm for dynamically altering the organization of pages at sites where the main design objective is to give users fast access to requested data. The algorithm reads information from the HTTP log file and computes the relative popularity of pages within the site. Based on popularity (defined as a relationship between number of accesses, time spent, and location of the page), the hierarchical relationships between pages are rearranged to maximize accessibility for popular pages.
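
A minimal sketch of the popularity computation (the exact formula and log format in the article are not reproduced; the combination below is an assumption for illustration):

```python
# Score each page from (page, seconds_spent) log pairs; pages that are
# heavily used yet buried deep in the site tree score highest and are
# candidates for promotion toward the home page.
from collections import defaultdict

def popularity(log_entries, site_depth):
    """log_entries: iterable of (page, seconds_spent); site_depth: page -> depth."""
    hits, time_spent = defaultdict(int), defaultdict(float)
    for page, seconds in log_entries:
        hits[page] += 1
        time_spent[page] += seconds
    return {page: hits[page] * time_spent[page] * site_depth.get(page, 1)
            for page in hits}

def promotion_candidates(scores, site_depth, top_n=5):
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [p for p in ranked[:top_n] if site_depth.get(p, 1) > 1]
```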

6.
7.
Collaborative social annotation systems allow users to record and share their original keywords or tag attachments to Web resources such as Web pages, photos, or videos. These annotations are a method for organizing and labeling information. They have the potential to help users navigate the Web and locate the needed resources. However, since annotations are posted by users under no central control, there exist problems such as spam and synonymous annotations. To use annotation information efficiently for knowledge discovery from the Web, it is advantageous to organize social annotations from a semantic perspective and embed them into algorithms for knowledge discovery. This motivates Web page recommendation with annotations, in which users and Web pages are clustered so that semantically similar items can be related. In this paper we propose four graphical models which cluster users, Web pages and annotations and recommend Web pages for given users by first assigning items to the right cluster. The algorithms are then compared to the classical collaborative filtering recommendation method on a real-world data set. Our results indicate that the graphical models provide better recommendation performance and are robust enough for real applications.
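
As a much-simplified stand-in for the clustering step (the paper's four graphical models are considerably more involved), the sketch below groups pages by their dominant annotation and recommends unseen pages from the clusters matching a user's own tags:

```python
# Cluster pages by most frequent tag, then recommend unvisited pages
# from the clusters that overlap with the user's tags.
from collections import Counter, defaultdict

def cluster_pages(page_annotations):
    """page_annotations: page -> list of tags."""
    clusters = defaultdict(set)
    for page, tags in page_annotations.items():
        if tags:
            clusters[Counter(tags).most_common(1)[0][0]].add(page)
    return clusters

def recommend(user_tags, visited, clusters, top_n=5):
    candidates = []
    for tag, _ in Counter(user_tags).most_common():
        candidates.extend(p for p in clusters.get(tag, ()) if p not in visited)
    return candidates[:top_n]
```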

8.
This paper presents a PageRank-based prefetching technique for accesses to Web page clusters. The approach uses the link structure of a requested page to determine the most important linked pages and to identify the page(s) to be prefetched. The underlying premise of our approach is that in the case of cluster accesses, the next pages requested by users of the Web server are typically based on the current and previous pages requested. Furthermore, if the requested pages have a lot of links to some important page, that page has a higher probability of being the next one requested. An experimental evaluation of the prefetching mechanism is presented using real server logs. The results show that the PageRank-based scheme does better than random prefetching for clustered accesses, with hit rates of 90% in some cases.

9.
How to effectively analyze users' needs and help them discover the information and resources they are interested in from the Internet's ocean of information has become an urgent and important research topic. One way to address this problem is to combine traditional data mining techniques with the Web, i.e., Web data mining. Within it, Web log mining can capture users' behavior while they browse a site, and applying the mined user access patterns back to the site is of great significance for improving the structure of the Web site and the hyperlink structure between its pages, and for raising the quality of the site's service.

10.
How to effectively analyze users' needs and help them discover the information and resources they are interested in from the Internet's ocean of information has become an urgent and important research topic. One way to address this problem is to combine traditional data mining techniques with the Web, i.e., Web data mining. Within it, Web log mining can capture users' behavior while they browse a site, and applying the mined user access patterns back to the site is of great significance for improving the structure of the Web site and the hyperlink structure between its pages, and for raising the quality of the site's service.

11.
To address problems with the traditional PageRank algorithm, such as splitting link weights evenly and ignoring user interest, a page ranking algorithm based on learning automata and user interest, called LUPR, is proposed. In this method, each Web page is assigned a learning automaton whose function is to determine the weights of the hyperlinks between pages. Through further analysis of user behavior, the user's degree of interest in a page is measured from browsing behavior, yielding an interest factor. The algorithm weighs pages according to the hyperlinks between them and the user's interest in them to compute each page's rank. Simulation experiments show that, compared with the traditional PageRank and WPR algorithms, the improved LUPR algorithm improves retrieval accuracy and user satisfaction to a certain extent.
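
A minimal sketch of the ranking idea (the learning-automaton weight updates and the exact interest measure in LUPR are not reproduced; the interest factors below are assumed to be given):

```python
# PageRank variant in which a page's rank is distributed over its
# outgoing links in proportion to the interest factor of each target,
# rather than split evenly.
def interest_weighted_pagerank(links, interest, damping=0.85, iterations=30):
    """links: page -> list of target pages; interest: page -> interest factor."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            total = sum(interest.get(t, 1.0) for t in targets)
            if total == 0:
                continue
            for t in targets:
                new[t] += damping * rank[page] * interest.get(t, 1.0) / total
        rank = new
    return rank
```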

12.
梁秋实  吴一雷  封磊 《计算机应用》2012,32(11):2989-2993
In the field of microblog search, ranking purely by follower count leaves an opening for follower-buying. By treating users as Web pages and the “follow” relations between users as links between pages, the basic idea behind PageRank's page ranking is brought into microblog user search; a state transition matrix and an automatically iterating MapReduce workflow are introduced to parallelize the computation, yielding a MapReduce-based ranking algorithm for microblog user search. Experiments with the algorithm on the Hadoop platform show that it avoids tying a user's rank purely to follower count, raises the ranking of “more important” users in the search results, and improves the relevance and quality of the search results.
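
One PageRank-style iteration over the “follows” graph can be sketched as map and reduce steps (a single-process simulation for illustration, not the paper's Hadoop workflow; the damping value is an assumption):

```python
# Each user sends an equal share of its rank to every user it follows;
# the reducer recombines the shares with the damping term.
from collections import defaultdict

def map_step(user, rank, followees):
    if not followees:
        return []
    share = rank / len(followees)
    return [(followee, share) for followee in followees]

def reduce_step(emitted_pairs, all_users, damping=0.85):
    received = defaultdict(float)
    for user, share in emitted_pairs:
        received[user] += share
    n = len(all_users)
    return {u: (1 - damping) / n + damping * received[u] for u in all_users}

def pagerank_iteration(ranks, follows, all_users):
    emitted = []
    for user, followees in follows.items():
        emitted.extend(map_step(user, ranks[user], followees))
    return reduce_step(emitted, all_users)
```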

13.
In the last 20 years, the World Wide Web (Web) has gone from being the means of disseminating information for a few scientists to a universal means of disseminating information across the globe. While the Web provides an unprecedented level of access to information for many, if not properly designed, Web sites can actually create a number of barriers to information access for persons with disabilities. The purpose of this study was to evaluate the accessibility of the home pages of University Departments of Special Education. A total of 51 Special Education departmental Web sites were located using a popular online search engine and evaluated for accessibility. Two Web site evaluation programs were used to determine whether the Web sites meet minimum accessibility guidelines, and one of them was used to quantify the number of accessibility errors on each site. The results indicated that most (97%) of the pages evaluated had accessibility problems, many (39%) of which were severe and should be given a high priority for correction. The good news is that the majority of errors can easily be corrected. The work reflects a need for Departments of Special Education to examine the accessibility of their home pages. Direction for improving accessibility is provided.

14.
Most users assume that their use of Internet services is implicitly private and anonymous, so it can be quite eye-opening to find out how much about ourselves and our companies we reveal through the seemingly innocuous words we use to search, the maps we view, and the other "free" services we use on the Internet. The Internet has become one of the most central aspects of our world, and we react to both the mundane and the important events in our personal and professional lives by turning to it. Unfortunately, these events, great or small, continue to exist for an indefinitely long period on the service providers' servers. Providers of free Web-based applications aren't simply offering their tools as a public service. However altruistic they might be in some regards, these companies have legal obligations to their shareholders to make profits. Although various business models exist for advertising in connection with "free" services, the consistent bottom line is that Web-based companies depend on being able to convince advertisers that it's worth their money to have their ads presented on Web pages and in emails. Free Web-based services aren't really free: users pay for them with micropayments of information that add up to a significant sum.

15.
Brewington  B.E. Cybenko  G. 《Computer》2000,33(5):52-58
Most information depreciates over time, so keeping Web pages current presents new design challenges. This article quantifies what “current” means for Web search engines and estimates how often they must reindex the Web to keep current with its changing pages and structure.
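
Under the common assumption that a page changes as a Poisson process (a standard model in this line of work, not a restatement of the article's exact estimates), the recrawl interval needed for a target freshness can be worked out directly:

```python
# If a page changes on average every `mean_days` days, the chance it is
# still current t days after the last crawl is exp(-t / mean_days);
# inverting gives the longest recrawl interval meeting a freshness target.
import math

def prob_still_current(days_since_crawl, mean_days_between_changes):
    return math.exp(-days_since_crawl / mean_days_between_changes)

def max_crawl_interval(target_freshness, mean_days_between_changes):
    return -mean_days_between_changes * math.log(target_freshness)

# Example: pages changing on average every 30 days must be recrawled at
# least every ~3.2 days for 90% confidence that the indexed copy is current.
print(round(max_crawl_interval(0.90, 30), 1))
```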

16.
Looks at some practical approaches to improving the process of interacting with information distributed over the global information infrastructure, specifically for the World Wide Web. The introduction of NCSA Mosaic changed the way we get information over the Web. With the click of a button, Mosaic's graphical user interface made it possible to browse and retrieve literally any information accessible through the Web. This is true if you know the document's Universal Resource Locator (URL), an identifier expressing its location. You type in this address and sooner or later (depending on document size and traffic at the time), the document appears on your screen. If you do not know the URL, or even which documents contain the requested information, you might want to browse or search the Web. Interacting with information on the Web starts with browsing and searching; continues with selecting, digesting and assimilating information; terminates with generating new information; and begins anew. The user's needs and desires must occupy center stage during development of Web systems and sites. The approach chosen should let users interact easily and effectively with the information contained throughout large arrays of documents. Visualization, computer graphics, and just plain common sense in designing Web pages and presenting information make the process better for users. This article discusses how to construct effective presentations (Web pages).

17.
《Computer Networks》1999,31(11-16):1331-1345
This paper discusses how to augment the World Wide Web with an open hypermedia service (Webvise) that provides structures such as contexts, links, annotations, and guided tours stored in hypermedia databases external to the Web pages. This includes the ability for users collaboratively to create links from parts of HTML Web pages they do not own and support for creating links to parts of Web pages without writing HTML target tags. The method for locating parts of Web pages can locate parts of pages across frame hierarchies, and it also supports certain repairs of links that break due to modified Web pages. Support for providing links to/from parts of non-HTML data, such as sound and movies, will be possible via interfaces to plug-ins and Java-based media players. The hypermedia structures are stored in a hypermedia database, developed from the Devise Hypermedia framework, and the service is available on the Web via an ordinary URL. The best user interface for creating and manipulating the structures is currently provided for the Microsoft Internet Explorer 4.x browser through COM integration that utilizes the Explorer's DOM representation of Web pages. But the structures can also be manipulated and used via special Java applets, and a pure proxy server solution is provided for users who only need to browse the structures. A user can create and use the external structures as “transparency” layers on top of arbitrary Web pages, and can switch between viewing pages with one or more layers (contexts) of structures or without any external structures imposed on them.
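
The external structures can be pictured with a small data model (field names here are hypothetical, not Webvise's actual storage format): links and annotations live outside the pages they refer to, with locators that can be re-resolved or repaired when a target page changes.

```python
# A layer of external hypermedia structures overlaid on pages the user
# does not own; nothing here requires editing the pages' HTML.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Anchor:
    page_url: str        # the Web page the endpoint lives on
    locator: str         # e.g. a quoted text fragment or an element path
    context: str = ""    # surrounding text, usable to repair broken locators

@dataclass
class ExternalLink:
    source: Anchor
    target: Anchor
    annotation: str = ""
    guided_tour: Optional[str] = None   # tour this link belongs to, if any

layer = [ExternalLink(Anchor("http://example.com/a.html", "Figure 2", "see Figure 2"),
                      Anchor("http://example.com/b.html", "Results"))]
```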

18.
19.
Spam: it's not just for inboxes anymore
Gyongyi  Z. Garcia-Molina  H. 《Computer》2005,38(10):28-34
E-mail spam is a nuisance that every user has come to expect. But Web spammers prey on unsuspecting users and undermine search engines by subverting search results to increase the visibility of their pages.

20.
When users need to find something on the Web that is related to a place, chances are place names will be submitted along with some other keywords to a search engine. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence from the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they are able to determine approximations to the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated with Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false-positive results.
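
A minimal sketch of the extraction-plus-gazetteer step (the patterns and the tiny gazetteer are illustrative only and do not reproduce the OnLocus ontology; the address and postal-code patterns use Brazilian conventions purely as an example):

```python
# Pull candidate geospatial evidence out of page text with simple
# patterns, then approximate a location by looking matches up in a
# gazetteer of place names.
import re

GAZETTEER = {                     # hypothetical entries: name -> (lat, lon)
    "rio de janeiro": (-22.91, -43.17),
    "copacabana": (-22.97, -43.18),
}
POSTAL_CODE = re.compile(r"\b\d{5}-\d{3}\b")                        # CEP format
STREET_ADDRESS = re.compile(r"\b(?:rua|av\.?|avenida)\s+[\w ]+?,\s*\d+", re.I)

def extract_geo_evidence(text):
    return {
        "postal_codes": POSTAL_CODE.findall(text),
        "addresses": STREET_ADDRESS.findall(text),
        "places": [name for name in GAZETTEER if name in text.lower()],
    }

def geocode(evidence):
    """Approximate the page's location from the first gazetteer match."""
    for name in evidence["places"]:
        return GAZETTEER[name]
    return None
```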
