Similar Literature
20 similar documents found.
1.
Web browsers and multimedia players play a critical role in making Web content accessible to people with disabilities. Access to Web content requires that Web browsers provide users with final control over the styling of rendered content, the type of content rendered, and the execution of automated behaviors. The features available in Web browsers determine the extent to which users can orient themselves and navigate the structure of Web resources. The World Wide Web Consortium (W3C) User Agent Accessibility Guidelines (UAAG) are part of the W3C Web Accessibility Initiative; they provide a comprehensive resource for Web browser and multimedia developers on the features needed to render Web content more accessibly to people with disabilities. UAAG 1.0 was developed over a period of four years and included extensive reviews to demonstrate that the proposed requirements can be implemented.

2.
The paper emphasizes the relationship between Web usage mining and the use of Web site structure and content. It shows that the effort involved in processing and quantifying the structure and content of a Web site is well worthwhile for Web usage mining, and further demonstrates the necessity of combining Web site structure and content with the Web usage mining process.

3.
Automatic identification of informative sections of Web pages   (Citations: 3 total, 0 self, 3 by others)
Web pages - especially dynamically generated ones - contain several items that cannot be classified as the "primary content," e.g., navigation sidebars, advertisements, copyright notices, etc. Most clients and end-users search for the primary content and largely do not seek the noninformative content. A tool that assists an end-user or application to search and process information from Web pages automatically must separate the "primary content sections" from the other content sections. We call these sections "Web page blocks" or just "blocks." First, a tool must segment the Web pages into Web page blocks and, second, the tool must separate the primary content blocks from the noninformative content blocks. In this paper, we formally define Web page blocks and devise a new algorithm to partition an HTML page into constituent Web page blocks. We then propose four new algorithms: ContentExtractor, FeatureExtractor, K-FeatureExtractor, and L-Extractor. These algorithms identify primary content blocks by (1) looking for blocks that do not occur a large number of times across Web pages, (2) looking for blocks with desired features, and (3) using classifiers trained on block features, respectively. Operating on several thousand Web pages obtained from various Web sites, our algorithms outperform several existing algorithms with respect to runtime and/or accuracy. Furthermore, we show that a Web cache system that applies our algorithms to remove noninformative content blocks and to identify similar blocks across Web pages can achieve significant storage savings.
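
The first heuristic above (blocks that recur across many pages of a site are boilerplate, while rarely repeated blocks are candidate primary content) is easy to prototype. The following is a minimal sketch of that cross-page frequency idea, not the ContentExtractor algorithm itself; the crude tag-based block segmentation and the 0.5 repeat-ratio threshold are assumptions made for illustration.

```python
import re
from collections import Counter

def split_into_blocks(html: str) -> list[str]:
    """Very rough block segmentation: split on common container tags.
    (A real system would walk the DOM; this is only an illustration.)"""
    parts = re.split(r"(?i)</?(?:div|td|table|ul|section)[^>]*>", html)
    stripped = (re.sub(r"<[^>]+>", " ", p) for p in parts)   # drop any remaining tags
    return [re.sub(r"\s+", " ", b).strip() for b in stripped if b.strip()]

def primary_blocks(pages: list[str], max_repeat_ratio: float = 0.5) -> list[str]:
    """Return blocks of the first page that recur on at most `max_repeat_ratio`
    of all pages; frequently repeated blocks (navigation bars, copyright
    notices) are treated as noise."""
    page_blocks = [split_into_blocks(p) for p in pages]
    freq = Counter(b for blocks in page_blocks for b in set(blocks))
    limit = max_repeat_ratio * len(pages)
    return [b for b in page_blocks[0] if freq[b] <= limit]

if __name__ == "__main__":
    site = [
        "<div>Home About</div><div>Article one body text.</div><div>Copyright 2024</div>",
        "<div>Home About</div><div>Article two, a different body.</div><div>Copyright 2024</div>",
    ]
    print(primary_blocks(site))   # -> ['Article one body text.']
```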

4.
The Semantic Web Initiative envisions a Web wherein information is offered free of presentation, allowing more effective exchange and mixing across web sites and across web pages. But without substantial Semantic Web content, few tools will be written to consume it; without many such tools, there is little appeal to publish Semantic Web content. To break this chicken-and-egg problem, thus enabling more flexible information access, we have created a web browser extension called Piggy Bank that lets users make use of Semantic Web content within Web content as they browse the Web. Wherever Semantic Web content is not available, Piggy Bank can invoke screenscrapers to re-structure information within web pages into Semantic Web format. Through the use of Semantic Web technologies, Piggy Bank provides direct, immediate benefits to users in their use of the existing Web. Thus, the existence of even just a few Semantic Web-enabled sites or a few scrapers already benefits users. Piggy Bank thereby offers an easy, incremental upgrade path to users without requiring a wholesale adoption of the Semantic Web's vision. To further improve this Semantic Web experience, we have created Semantic Bank, a web server application that lets Piggy Bank users share the Semantic Web information they have collected, enabling collaborative efforts to build sophisticated Semantic Web information repositories through simple, everyday use of Piggy Bank.

5.
A tag-window-based method for extracting main-text information from Web pages*   (Citations: 12 total, 2 self, 12 by others)
This paper proposes a tag-window-based method for extracting the main text of Web pages. The method handles not only pages whose entire main text is placed in a single td element, but also pages whose main text is spread across multiple td elements, as well as pages whose main text is short enough to be comparable in length to the rest of the page (e.g., advertisements, navigation bars, copyright notices). Most importantly, it can extract the main text of pages that are not built on a Table structure. Experiments show that the method improves the accuracy of main-text extraction and has strong applicability.
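
The abstract does not spell out the scoring rule, but the general tag-window idea — split the page into text runs that fall between tags and keep the dense ones — can be sketched as follows. This is a simplified illustration under assumed parameters (a fixed length threshold), not the paper's actual method.

```python
import re

def tag_windows(html: str) -> list[str]:
    """Split a page into 'windows' of text lying between consecutive tags."""
    return [t.strip() for t in re.split(r"<[^>]+>", html) if t.strip()]

def extract_main_text(html: str, min_len: int = 20) -> str:
    """Tiny illustration: treat long text windows as main text and short ones
    (menu items, copyright lines) as noise. The fixed length threshold is an
    assumption, not the paper's scoring model."""
    return "\n".join(w for w in tag_windows(html) if len(w) >= min_len)

if __name__ == "__main__":
    page = ("<table><tr><td>Home | News | About</td></tr>"
            "<tr><td>This is the body of the article, long enough to be "
            "recognised as main text by the length heuristic.</td></tr>"
            "<tr><td>Copyright 2007</td></tr></table>")
    print(extract_main_text(page))
```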

6.
王瑞  周喜  李晓 《计算机工程》2012,38(21):153-156,160
The main information conveyed by a Web page is usually buried in a large amount of irrelevant structure and text, so the main text cannot be obtained quickly, which reduces the efficiency of text detection. To address this, and taking into account characteristics of Uyghur Web pages such as non-standardized encodings and a high proportion of forum-style pages, this paper proposes a main-text extraction algorithm based on text relevance and improves it with mathematical models such as contextual text density and inter-node text ratio. Experiments on a large number of Uyghur Web pages show that the algorithm achieves good precision and recall for main-text extraction and can effectively extract the desired main text from Uyghur Web pages.

7.
Cellular phones are widely used to access the Web. However, most available Web pages are designed for desktop PCs, and it is inconvenient to browse these large Web pages on a cellular phone with a small screen and limited input interfaces. Users who browse a Web page on a cellular phone have to scroll through the whole page to find the desired content, and must then search and scroll within that content to get useful information. This paper describes the design and implementation of a novel Web browsing system for cellular phones. The system includes a Web page overview to reduce scrolling when locating the target content within the page, and it adaptively presents content according to its characteristics to reduce burdensome operations when searching within that content.

8.
While Web applications serve personal needs and business functions in almost every area, the responsiveness and performance of Web applications are key factors in their success. With continuous innovation in Web technology, Web sites have evolved from the document Web to the application Web and, more recently, to the service Web. During this evolution, Web sites serving dynamic content have grown exponentially to dominate the area. Dynamic pages require servers to generate the response content per user request before delivering it back to the user, which introduces network traffic and server workload and results in extra latency. This has drawn tremendous effort from both research and industry on how to accelerate dynamic content generation and distribution in order to reduce user-perceived latency and improve application performance, among which caching is a vital technology. This paper surveys the innovative research and products recently published in this area and presents them in a road-map style. It first examines the dynamic characteristics of Web applications and the inherent challenges they pose for caching. The rest of the paper then explores the various acceleration solutions for the content generation process and the content delivery process, respectively, followed by an analysis of how different caching solutions fit Web applications with different characteristics. Finally, it ends with future trends in Web caching techniques and a summary of the survey.

9.

Web content nowadays can also be accessed through a new generation of Internet-connected TVs. However, these products have failed to change users' behavior when consuming online content: users still prefer personal computers for accessing Web content, and most online content is still designed to be accessed from personal computers or mobile devices. To overcome the usability problems of Web content consumption on TVs, this paper presents a knowledge-graph-based video generation system that automatically converts textual Web content into videos using Semantic Web and computer graphics technologies. As a use case, Wikipedia articles are automatically converted into videos. The effectiveness of the proposed system is validated empirically via opinion surveys: fifty percent of survey users indicated that they found the generated videos enjoyable, and 42% of them indicated that they would like to use our system to consume Web content on their TVs.

10.
Extracting significant Website Key Objects: A Semantic Web mining approach   (Citations: 1 total, 0 self, 1 by others)
Web mining has traditionally been used in different application domains to enhance the content that Web users access. Likewise, Website administrators are interested in new approaches to improving their Website content according to their users' preferences. Furthermore, the Semantic Web has been considered as an alternative for representing Web content in a way that intelligent techniques can use to provide the organization, meaning, and definition of Web content. In this work, we define the Website Key Object Extraction problem, whose solution is based on a Semantic Web mining approach that extracts, from a given Website core ontology, new relations between objects according to Web users' interests. The methodology was applied to a real Website, and the results showed that the automatic extraction of Key Objects is highly competitive with traditional surveys of Web users.

11.
XML-based Web content mining is gradually becoming an important research topic in Web data mining. This paper defines a user model and builds it through three approaches, applies XML and personalization techniques to Web content mining, designs an XML-based personalized Web content mining system (PWCMS), and discusses the key techniques and implementation of PWCMS. Practice shows that applying XML and personalization techniques to Web content mining is effective.

12.
周文刚  马占欣 《微机发展》2007,17(4):120-124
Necessary and effective content filtering of Web pages is important for creating a healthy and safe network environment. Replaying the content of Web pages that users have successfully visited allows after-the-fact auditing of network access and provides data for improving the filtering mechanism. The paper analyzes the Web page access process and, based on an HTTP proxy server, implements keyword filtering and semantics-based content filtering of Web pages at the application layer; it also achieves content replay by storing the Web pages that clients successfully visited on the proxy server's hard disk. Experiments show that semantic filtering distinguishes different viewpoints in text well, with markedly higher accuracy than pure keyword filtering.
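
As a rough illustration of the application-layer keyword pass described above (the semantic filtering and the proxy/replay machinery are beyond a short sketch), the snippet below scores a response body against a keyword list before deciding whether to block it; the keyword list, threshold, and function names are assumptions for illustration only.

```python
import re

BLOCKED_KEYWORDS = ["gambling", "violence"]   # placeholder list, not from the paper

def strip_tags(html: str) -> str:
    """Reduce a page to its visible text."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html))

def should_block(html: str, max_hits: int = 2) -> bool:
    """Block the page if blocked keywords occur more than `max_hits` times.
    A real deployment would run this inside the HTTP proxy before forwarding
    the response, and could archive allowed pages for later replay."""
    text = strip_tags(html).lower()
    hits = sum(text.count(k) for k in BLOCKED_KEYWORDS)
    return hits > max_hits

if __name__ == "__main__":
    page = "<html><body>A harmless page about Web content filtering.</body></html>"
    print(should_block(page))   # False
```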

13.
To address intelligent browsers' need for a content-oriented Web, the paper proposes an XML-based content definition format, HDF; it introduces how HDF is described and processed as well as the characteristics of content-oriented Web pages, and explains how a content-oriented Web approach can be realized through HDF.

14.
Web page information refers to a page's main text, title, publication time, media, and so on; each piece of information resides in specific tags of the HTML document. Automatically identifying these tags enables automatic extraction of page information from pages that share the same template, which is very helpful for large-scale crawling of Web content. Because pages generated from the same template have consistent structure and the page information exhibits certain statistical features, this paper proposes an algorithm for automatically extracting information-bearing tags based on structural comparison and feature learning. The algorithm consists of three steps: page comparison, content identification, and tag extraction. Tests on 1,620 Web pages across 51 modules show that obtaining page information via the extracted tags is not only fast but also yields more accurate content.
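
The page-comparison step can be illustrated with a small DOM-path diff: for two pages generated from the same template, tag paths whose text is identical across pages behave like template boilerplate, while paths whose text differs are candidate information tags (title, body, date). The sketch below, built on Python's standard html.parser, is a simplification of the paper's three-step algorithm; the path representation and the "text differs" criterion are assumptions.

```python
from html.parser import HTMLParser

class PathCollector(HTMLParser):
    """Record the text found under each tag path, e.g. 'html/body/h1'."""
    def __init__(self):
        super().__init__()
        self.stack, self.texts = [], {}
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
    def handle_endtag(self, tag):
        # pop back to the most recent matching open tag (tolerant of bad nesting)
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i:]
                break
    def handle_data(self, data):
        if data.strip():
            path = "/".join(self.stack)
            self.texts[path] = self.texts.get(path, "") + data.strip()

def collect(html: str) -> dict:
    p = PathCollector()
    p.feed(html)
    return p.texts

def information_paths(page_a: str, page_b: str) -> list[str]:
    """Paths present in both pages whose text differs: likely real content."""
    a, b = collect(page_a), collect(page_b)
    return [path for path in a if path in b and a[path] != b[path]]

if __name__ == "__main__":
    p1 = "<html><body><div>Home</div><h1>First title</h1><p>Body one</p></body></html>"
    p2 = "<html><body><div>Home</div><h1>Second title</h1><p>Body two</p></body></html>"
    print(information_paths(p1, p2))   # ['html/body/h1', 'html/body/p']
```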

15.
熊忠阳  蔺显强  张玉芳  牙漫 《计算机工程》2013,(12):200-203,210
A Web page contains main-text information as well as information unrelated to the main text, and the irrelevant information negatively affects the classification, storage, and retrieval of Web pages. To reduce its influence, this paper proposes a main-text extraction method that combines the structural and textual features of Web pages. Irrelevant elements are first removed with regular expressions as an initial filtering pass. The page is then split into linear blocks according to its structural features, each block is classified as a link block or a text block according to its textual features, and the tendency of noise blocks to occur consecutively is used to locate the main-text region and obtain the page's main text. Experimental results show that the method extracts the main content of Web pages quickly and accurately.
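
The link-block versus text-block distinction can be approximated with a per-block link-text density score: blocks whose visible characters mostly sit inside <a> tags are treated as navigation noise, the rest as text blocks. The sketch below illustrates only this step; the line-based blocking and the 0.5 density threshold are assumptions, and the paper's regular-expression pre-filtering and consecutive-noise-block positioning are omitted.

```python
import re

def link_density(block_html: str) -> float:
    """Fraction of a block's visible characters that lie inside <a>...</a>."""
    anchor_text = "".join(re.findall(r"(?is)<a\b[^>]*>(.*?)</a>", block_html))
    anchor_len = len(re.sub(r"<[^>]+>", "", anchor_text))
    total_len = len(re.sub(r"<[^>]+>", "", block_html))
    return anchor_len / total_len if total_len else 1.0

def text_blocks(html: str, max_density: float = 0.5) -> list[str]:
    """Treat each line as a block and keep blocks dominated by plain text."""
    blocks = [line for line in html.splitlines() if line.strip()]
    kept = [b for b in blocks if link_density(b) < max_density]
    return [re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", b)).strip() for b in kept]

if __name__ == "__main__":
    page = """<div><a href='/'>Home</a> <a href='/news'>News</a></div>
<p>The article body has far more plain text than anchor text, so it survives.</p>
<div><a href='/contact'>Contact</a> | <a href='/about'>About</a></div>"""
    print(text_blocks(page))
```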

16.
A transformation-based Web site can keep the contents of a Web site consistent by furnishing a single database and a set of transformation programs, each generating a Web page from the database. However, when someone notices an error or stale content on a Web page in this style of Web site construction, the Web site maintainer must access a possibly huge database to update the corresponding content. In this paper, we propose a new approach to Web site construction based on bidirectional transformation, and report the design and implementation of a practical updating system called Vu-X. We bring the idea of bidirectional transformation to Web site construction, describing not only a forward transformation for generating Web pages from the database but also a backward transformation for reflecting modifications on the Web pages back to the database. By using the bidirectional transformation language Bi-X, we can obtain both transformations by specifying only a forward transformation. Our Vu-X system is implemented as a Web server built upon the Bi-X transformation engine, which keeps the content of Web sites consistent by updating Web pages in WYSIWYG style in Web browsers.

17.
18.
Research on data mining techniques in Web search   (Citations: 4 total, 0 self, 4 by others)
The WWW has become the largest distributed information system in the world, and how to search quickly and effectively for the resources users need has long been a research hotspot. Web mining has become a relatively mature branch of data mining. This paper surveys the Web mining techniques used in Web resource search. It first analyzes the techniques commonly used in current Web content mining, then focuses on Web structure mining, introducing and evaluating several algorithmic models. It then introduces Web usage mining and points out the trend of combining Web content mining, Web structure mining, and Web usage mining in the development of intelligent search engines.

19.
Research into the Internet has experienced tremendous growth within the field of information systems. The recent literature focuses on increasingly complex research topics, but there is a need to further investigate the more basic and primary use of the Internet: the external Web site used to interact with stakeholders. By external, we mean publicly accessible content. This paper develops a framework for evaluating the external Web content of business Web sites and examines its influence on firm performance. External Web content is studied according to three Web orientations: e-information, e-communication, and e-transaction. In addition, differences in external Web content are analysed according to two contingency factors: business size and business industry. To achieve these goals, a sample of 288 Spanish SMEs was employed. The results show a positive relationship between external Web content and firm performance. Furthermore, this research indicates the existence of complementarities among the Web orientations: existing e-information was found to be critical for enabling e-transaction to impact firm performance, and e-information and e-communication (jointly considered) were found to mutually reinforce the impact of e-transaction on firm performance. The results also confirm that external Web content is not related to business size and differs only slightly by business industry.

20.
Research on Web mining techniques   (Citations: 10 total, 0 self, 10 by others)
吉根林  孙志挥 《计算机工程》2002,28(10):16-17,146
The paper gives a comprehensive discussion of Web mining techniques, introduces the classification and applications of Web mining, presents a Web data model, and explores the basic ideas and methods of Web content mining, Web structure mining, and Web log mining.
