Similar literature
1.
The Indexing and Retrieval of Document Images: A Survey (total citations: 2; self-citations: 0; citations by others: 2)
The economic feasibility of maintaining large databases of document images has created a tremendous demand for robust ways to access and manipulate the information these images contain. In an attempt to move toward a paperless office, large quantities of printed documents are often scanned and archived as images, without adequate index information. One way to provide traditional database indexing and retrieval capabilities is to fully convert the document to an electronic representation which can be indexed automatically. Unfortunately, there are many factors which prohibit complete conversion, including high cost, low document quality, and the fact that many nontext components cannot be adequately represented in a converted form. In such cases, it can be advantageous to maintain a copy of and use the document in image form. In this paper, we provide a survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion. We briefly discuss traditional text indexing techniques on imperfect data and the retrieval of partially converted documents. This is followed by a more comprehensive review of techniques for the direct characterization, manipulation, and retrieval of images of documents containing text, graphics, and scene images.

2.
In this paper, we advance a technique to develop a user profile for information retrieval through knowledge acquisition techniques. The profile bridges the discrepancy between user-expressed keywords and system-recognizable index terms. The approach presented in this paper is based on the application of personal construct theory to determine a user's vocabulary and his/her view of different documents in a training set. The elicited knowledge is used to develop a model for each phrase/concept given by the user by employing machine learning techniques. Our model correlates the concepts in a user's vocabulary to the index terms present in the documents in the training set. Computation of dependence between the user phrases also contributes to the development of the user profile and to creating a classification of documents. The resulting system is capable of automatically identifying the user concepts and translating queries into index terms computed by the conventional indexing process. The system is evaluated using the standard measures of precision and recall by comparing its performance against the performance of the SMART system for different queries. This research is supported by the NSF grant IRI-8805875.

3.
4.
Thomas, C.G. Computer, 1995, 28(5): 84-86
At GMD's Human Computer Interaction Research Division, we are working on a system called BASAR (Building Agents Supporting Adaptive Retrieval). BASAR, a Smalltalk-based program that runs on Unix platforms, involves intelligent agents that actively support construction and management of a user's personal information space. Possible agent tasks include supporting navigation and browsing; simplifying information retrieval; sorting, organizing, and indexing user-jobs; and filtering information from large databases. BASAR's interface is based on the concept of indirect management. That is, computer agents support human users in a cooperative process where both agents and users can initiate communication, monitor events, and perform tasks.

5.
Chinese Information Retrieval Based on Document Instances (total citations: 2; self-citations: 0; citations by others: 2)
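Traditional information retrieval systems build their indexes from keywords and retrieve information against them. These systems suffer from large returned document sets, low precision, and queries that ordinary users find hard to construct. To address this, the paper proposes information retrieval based on document instances: an existing document serves as a sample, and all documents in the collection that are similar to the sample are retrieved. The paper presents a solution and implementation techniques for document-instance-based Chinese information retrieval. Preliminary experimental results indicate that the method is effective.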

6.
王志华, 金燕, 李占波. 《计算机工程》 (Computer Engineering), 2011, 37(11): 83-85, 88
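Content-based semantic Web retrieval considers only the content itself and ignores differences among users, so it cannot accurately reflect user needs. To address this, an adaptive semantic Web retrieval framework is proposed. For Chinese Web documents, an ontology learning method is given with the help of the HowNet knowledge base; a user information base is built by extracting objective, explicit, and implicit user information; and algorithms are designed for constructing the user's initial query ontology and personalized query ontology, thereby achieving adaptive retrieval for the user. Experimental results show that the method achieves relatively high retrieval efficiency.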

7.
This paper describes the integration of computer-supported cooperative work (CSCW), multimedia, and hyperstructures into a single framework called Cooperative HyperMedia (CHM). The concept of CHM supports groups of users acting on a single hypermedia document. Cooperative HyperMedia is a layered concept that integrates several notions related to the production of hypermedia documents, namely, document contents, access, and organization. To test the applicability of the concepts of this approach to CSCW, we have realized a generic CHM editing architecture (Co MEdiA), a telecommunication mechanism for multimedia objects (Tele Media), and a service for cooperative work over long distances for time-dependent media (MISTER COOL).

8.
More people than ever before have access to information with the World Wide Web; information volume and number of users both continue to expand. Traditional search methods based on keywords are not effective, resulting in large lists of documents, many of which are unrelated to users’ needs. One way to improve information retrieval is to associate meaning to users’ queries by using ontologies, knowledge bases that encode a set of concepts about one domain and their relationships. Encoding a knowledge base using one single ontology is usual, but a document collection can deal with different domains, each organized into an ontology. This work presents a novel way to represent and organize knowledge, from distinct domains, using multiple ontologies that can be related. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. Additionally, fuzzy set theory techniques are employed to deal with knowledge subjectivity and uncertainty. This approach to organize knowledge and an associated query expansion method are integrated into a fuzzy model for information retrieval based on multi-related ontologies. The performance of a search engine using this model is compared with another fuzzy-based approach for information retrieval, and with the Apache Lucene search engine. Experimental results show that this model improves precision and recall measures.

9.
An Organization Method for Adaptive Hypermedia Teaching Courseware (total citations: 2; self-citations: 0; citations by others: 2)
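An organization method for educational hypermedia courseware is discussed: the courseware is organized around concepts, and each concept is explained by a set of documents. Links between concepts represent their semantic relationships. Document and link information is stored in a database, and each document has an associated difficulty level. Based on information about how well the student has mastered each concept, the system guides the student to suitable documents for study.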

10.
Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and an accurate indexing of the documents is often manually done. As a result, not all legal documents can be effectively retrieved. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They sustain the acquisition of knowledge and the knowledge-rich processing of the content of document texts and information need, and of their matching. Currently, techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

11.
Adaptive Hypermedia (total citations: 9; self-citations: 2; citations by others: 7)
Adaptive hypermedia is a relatively new direction of research on the crossroads of hypermedia and user modeling. Adaptive hypermedia systems build a model of the goals, preferences and knowledge of each individual user, and use this model throughout the interaction with the user, in order to adapt to the needs of that user. The goal of this paper is to present the state of the art in adaptive hypermedia on the eve of the year 2000, and to highlight some prospects for the future. This paper attempts to serve both the newcomers and the experts in the area of adaptive hypermedia by building on an earlier comprehensive review (Brusilovsky, 1996; Brusilovsky, 1998).

12.
A novel taxonomic model for structuring and accessing large heterogeneous information bases is presented. The model is designed to simplify both classification and access by computer-illiterate people. It defines simple and intuitive operations to access large information bases at the conceptual level and at different levels of abstraction, in a totally assisted way, through a simple, yet effective visual interface. The model can also be used to summarize result sets computed by other query methods, such as information retrieval, shape retrieval, etc., and to provide user maps for complex hypermedia networks. The experience gained by applying this model to commercial applications is reported.

13.
Methods and techniques of adaptive hypermedia (total citations: 35; self-citations: 5; citations by others: 30)
Adaptive hypermedia is a new direction of research within the area of adaptive and user model-based interfaces. Adaptive hypermedia (AH) systems build a model of the individual user and apply it for adaptation to that user, for example, to adapt the content of a hypermedia page to the user's knowledge and goals, or to suggest the most relevant links to follow. AH systems are used now in several application areas where the hyperspace is reasonably large and where a hypermedia application is expected to be used by individuals with different goals, knowledge and backgrounds. This paper is a review of existing work on adaptive hypermedia. The paper is centered around a set of identified methods and techniques of AH. It introduces several dimensions of classification of AH systems, methods and techniques and describes the most important of them.

14.
Distributed information processing, in many WWW applications, requires access to and the transfer and synchronization of large multimedia data objects (MDOs) across the communication network. Moreover, end users expect very fast response times and high QoS. Since the transfer of large MDOs across the communication network contributes to the response time observed by the end users, the problem of allocating these MDOs so as to minimize the response time is challenging. This problem becomes more complex in the context of hypermedia documents, in which the MDOs need to be synchronized during presentation to the end users. The basic problem of data allocation in distributed database environments is NP-complete. Therefore, there is a need to pursue and evaluate solutions based on heuristics which generate near-optimal MDO allocation. We address this problem by: (1) conceptualizing this problem by using a navigational model to represent hypermedia documents and their access behavior by end users, and by capturing the synchronization requirements on MDOs, (2) formulating the problem by developing a base case cost model for response time and generalizing it to incorporate user interaction and buffer memory constraints, (3) designing two algorithms to find near-optimal solutions for allocating MDOs of the hypermedia documents while adhering to the synchronization requirements, and (4) evaluating the trade-off between the time complexity to obtain the solution and the solution quality by comparing the solutions generated by the algorithms with the optimal solutions generated through an exhaustive search.

15.
Adaptive Educational Hypermedia Systems aim to increase the functionality of hypermedia by making it personalised to individual learners. The adaptive dimension of these systems mainly supports knowledge communication between the system and the learner by adapting the content or the appearance of hypermedia to the knowledge level, goals and other characteristics of each learner. The main objectives are to protect learners from cognitive overload and disorientation by helping them find the most relevant content and path in the hyperspace. In the approach presented in this paper, learners' knowledge level and individual traits are used as valuable information to represent learners' current state and personalise the educational system accordingly, in order to help learners achieve their personal learning goals and objectives. Learners' knowledge level is approached through a qualitative model of the level of performance that learners exhibit with respect to the concepts they study and is used to adapt the lesson contents and the navigation support. Learners' individual traits, and especially their learning style, represent the way learners perceive and process information, and are exploited to adapt the presentation of the educational material of a lesson. The proposed approach has been implemented through various adaptation technologies and incorporated into a prototype hypermedia system. Finally, a pilot study has been conducted to investigate the system's educational effectiveness.

16.
17.
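Improving the Performance of Web Information Cache Management with an Adaptive Mechanism (total citations: 5; self-citations: 1; citations by others: 4)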
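At present, various caching techniques are widely used in Web information acquisition to reduce Internet network load and improve response speed. In a sense, how to improve caching has become a limiting factor in Web information acquisition. After considering the characteristics of Web information acquisition, the paper proposes a method that uses an adaptive mechanism to improve the performance of Web information cache management, and gives some concrete implementation details of the method. The method was applied in the design and development of WebCapture, a Web information acquisition system based on enterprise topics. Web information cache management with the adaptive mechanism mainly uses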

18.
Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.

19.
The advantages and positive effects of multiple coordinated views on search performance have been documented in several studies. This paper describes the implementation of multiple coordinated views within the Media Watch on Climate Change, a domain-specific news aggregation portal available at www.ecoresearch.net/climate that combines a portfolio of semantic services with a visual information exploration and retrieval interface. The system builds contextualized information spaces by enriching the content repository with geospatial, semantic and temporal annotations, and by applying semi-automated ontology learning to create a controlled vocabulary for structuring the stored information. Portlets visualize the different dimensions of the contextualized information spaces, providing the user with multiple views on the latest news media coverage. Context information facilitates access to complex datasets and helps users navigate large repositories of Web documents. Currently, the system synchronizes information landscapes, domain ontologies, geographic maps, tag clouds and just-in-time information retrieval agents that suggest similar topics and nearby locations.

20.
Although a large amount of research has been conducted on building interfaces for the visually impaired that allow users to read web pages and generate and access information on computers, little development addresses two problems faced by blind users. First, sighted users can rapidly browse and select information they find useful, and second, sighted users can make much useful information portable through the recent proliferation of personal digital assistants (PDAs). These possibilities are not currently available for blind users. This paper describes an interface that has been built on a standard PDA and allows its user to browse the information stored on it through a combination of screen touches coupled with auditory feedback. The system also supports the storage and management of personal information so that addresses, music, directions, and other supportive information can be readily created and then accessed anytime and anywhere by the PDA user. The paper describes the system along with the related design choices and design rationale. A user study is also reported.
