Similar articles
Found 10 similar articles (search time: 125 ms)
1.
Accurately measuring document similarity is important for many text applications, e.g. document similarity search and document recommendation. Most traditional similarity measures are based only on the “bag of words” of documents and can evaluate document topical similarity well. In this paper, we propose the notion of document structural similarity, which is expected to further evaluate document similarity by comparing document subtopic structures. Three related factors (i.e. the optimal matching factor, the text order factor and the disturbing factor) are proposed and combined to evaluate document structural similarity, among which the optimal matching factor plays the key role and the other two factors rely on its results. The experimental results demonstrate the high performance of the optimal matching factor for evaluating document topical similarity, which performs as well as or better than most popular measures. The user study shows the good ability of the proposed overall measure with all three factors to further find highly similar documents from those topically similar documents, which is much better than that of the popular measures and other baseline structural similarity measures. Xiaojun Wan received a B.Sc. degree in information science, an M.Sc. degree in computer science and a Ph.D. degree in computer science from Peking University, Beijing, China, in 2000, 2003 and 2006, respectively. He is currently a lecturer at the Institute of Computer Science and Technology of Peking University. His research interests include information retrieval and natural language processing.
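The “bag of words” baseline that the abstract above contrasts with structural similarity can be illustrated with a minimal sketch. This is not the paper's measure: the whitespace tokenizer, the raw term counts, and the function names `bag_of_words` and `cosine_similarity` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors.

    Returns 0.0 when either document is empty.
    """
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = bag_of_words("intro methods results")
    doc2 = bag_of_words("results methods intro")  # same words, different order
    print(cosine_similarity(doc1, doc2))
```

Because word order is discarded, two documents that contain the same terms in a different arrangement score as (numerically) identical, which is the gap a structural measure is meant to address.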

2.
Stochastic regular motifs are evolved for protein sequences using genetic programming. The motif language, SRE-DNA, is a stochastic regular expression language suitable for denoting biosequences. Three restricted versions of SRE-DNA are used as target languages for evolved motifs. The genetic programming experiments are implemented in DCTG-GP, which is a genetic programming system that uses logic-based attribute grammars to define the target language for evolved programs. Earlier preliminary work tested SRE-DNA’s viability as a representation language for aligned protein sequences. This work establishes that SRE-DNA is also suitable for evolving motifs for unaligned sets of sequences. Brian J. Ross, Ph.D.: He is an associate professor of computer science at Brock University, where he has worked since 1992. He obtained his BCSc at the University of Manitoba, Canada, in 1984, his MSc at the University of British Columbia, Canada, in 1988, and his PhD at the University of Edinburgh, Scotland, in 1992. His research interests include evolutionary computation, machine learning, language induction, concurrency, and logic programming.

3.
DCTG-GP is a genetic programming system that uses definite clause translation grammars. A DCTG is a logical version of an attribute grammar that supports the definition of context-free languages, and it allows semantic information associated with a language to be easily accommodated by the grammar. This is useful in genetic programming for defining the interpreter of a target language, or incorporating both syntactic and semantic problem-specific constraints into the evolutionary search. The DCTG-GP system improves on other grammar-based GP systems by permitting nontrivial semantic aspects of the language to be defined with the grammar. It also automatically analyzes grammar rules in order to determine their minimal depth and termination characteristics, which are required when generating random program trees of varied shapes and sizes. An application using DCTG-GP is described. Brian James Ross, Ph.D.: He is an associate professor of computer science at Brock University, where he has worked since 1992. He obtained his BCSc at the University of Manitoba, Canada, in 1984, his MSc at the University of British Columbia, Canada, in 1988, and his PhD at the University of Edinburgh, Scotland, in 1992. His research interests include evolutionary computation, machine learning, language induction, concurrency, and logic programming.

4.
Our objective is spoken-language classification for helpdesk call routing using a scanning understanding and intelligent-system techniques. In particular, we examine simple recurrent networks, support-vector machines and finite-state transducers for their potential in this spoken-language-classification task and we describe an approach to classification of recorded operator-assistance telephone utterances. The main contribution of the paper is a comparison of a variety of techniques in the domain of call routing. Support-vector machines and transducers are shown to have some potential for spoken-language classification, but the performance of the neural networks indicates that a simple recurrent network performs best for helpdesk call routing. Sheila Garfield received a BSc (Hons) in computing from the University of Sunderland in 2000 where, as part of her programme of study, she completed a project associated with aphasic language processing. She received her PhD from the same university, in 2004, for a programme of work connected with hybrid intelligent systems and spoken-language processing. In her PhD thesis, she collaborated with British Telecom and suggested a novel hybrid system for call routing. Her research interests are natural language processing, hybrid systems, and intelligent systems. Stefan Wermter holds the Chair in Intelligent Systems and is leading the Intelligent Systems Division at the University of Sunderland, UK. His research interests are intelligent systems, neural networks, cognitive neuroscience, hybrid systems, language processing and learning robots. He has a diploma from the University of Dortmund, Germany, an MSc from the University of Massachusetts, USA, and a PhD in habilitation from the University of Hamburg, Germany, all in Computer Science. He was a Research Scientist at Berkeley, CA, before joining the University of Sunderland. Professor Wermter has written, edited, or contributed to 8 books and published about 80 articles on this research area.

5.
In this paper, we concentrate on justifying the decisions we made in developing the TEI recommendations for feature structure markup. The first four sections of this paper present the justification for the recommended treatment of feature structures, of features and their values, and of combinations of features or values and of alternations and negations of features and their values. Section 5 departs briefly from the linguistic focus to argue that the markup scheme developed for feature structures is in fact a general-purpose mechanism that can be used for a wide range of applications. Section 6 describes an auxiliary document called a feature system declaration that is used to document and validate a system of feature-structure markup. The seventh and final section illustrates the use of the recommended markup scheme with two examples, lexical tagging and interlinear text analysis. Terry Langendoen is Professor and Head of the Department of Linguistics at The University of Arizona. He was Chair of the TEI Committee on Analysis and Interpretation. He received his PhD in Linguistics from the Massachusetts Institute of Technology in 1964, and held teaching positions at The Ohio State University and the City University of New York (Brooklyn College and the Graduate Center) before moving to Arizona in 1988. He is author, co-author, or co-editor of six books in linguistics, and of numerous articles. Gary Simons is Director of the Academic Computing Department of the Summer Institute of Linguistics, Dallas, TX. He served on the TEI Committee on Analysis and Interpretation. He received his PhD in Linguistics (with minor emphasis in Computer Science) from Cornell University in 1979. Before taking up his current position in 1984, he spent five years in the Solomon Islands doing field work with SIL. He is author, co-author, or co-editor of eight books in the fields of linguistics and linguistic computing. The initial feature-structure recommendations were formulated by the Analysis and Interpretation Committee at a meeting in Tucson, Arizona in March 1990, following suggestions by Mitch Marcus and Beatrice Santorini. The authors received valuable help in the further revision and refinement of the recommendations from Steven Zepp.

6.
This paper gives a declarative specification of a popular inheritance system and shows how simple changes to this specification can result in different path-based reasoners. This parameterized definition provides a deeper understanding of the fundamental differences between some of the more popular path-based inheritance reasoners. In particular, it allows the clarification of some of the results on the complexity of reasoning in the various systems. The uniform framework also allows definition of novel systems which constitute intermediate points in the space of possible reasoners, and facilitates perspicuous Prolog implementation. The work reported here is primarily the research of Carl Vogel, with Fred Popowich being particularly involved with the initial logical specification and implementation. Thanks to Nick Cercone, who collaborated with the authors on earlier research relating to the material presented in this paper, and also to two anonymous reviewers for constructive suggestions. Vogel is particularly grateful to Robin Cooper and Jeff Pelletier for feedback and encouragement as well as to the Marshall Aid Commemoration Commission for making it possible for him to do his Ph.D. at the Centre for Cognitive Science in Edinburgh. Popowich wishes to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada. Carl Vogel, Ph.D.: He is a Research Scientist in the Institute for Computational Linguistics at the University of Stuttgart. He is grateful to the Sonderforschungsbereich 340 for funding his postdoctoral work there. Vogel finished his Ph.D. in Cognitive Science at the University of Edinburgh in 1995. He is interested in the proof theory and semantics of default reasoning as well as consequent applications throughout computational linguistics: semantics of natural language generics, robust processing of natural language in typed feature systems, and syntactic representation.
Fred Popowich, Ph.D.: He is an Associate Professor of Computing Science and an Associate Member of the Department of Linguistics at Simon Fraser University. He received his Ph.D. in Cognitive Science/Artificial Intelligence from the University of Edinburgh in 1989. His current research interests include the development and processing of unification-based grammars, machine translation, natural language interfaces to databases, the structure of the lexicon, the use of inheritance in the lexicon, and the use of lexical resources in natural language processing applications.

7.
Content-based image retrieval represents images as N-dimensional feature vectors. The k images most similar to a target image, i.e., those closest to its feature vector, are determined by applying a k-nearest-neighbor (k-NN) query. A sequential scan of the feature vectors for k-NN queries is costly for a large number of images when N is high. The search space can be reduced by indexing the data, but the effectiveness of multidimensional indices is poor for high-dimensional data. Building indices on dimensionality-reduced data is one method to improve indexing efficiency. We utilize the Singular Value Decomposition (SVD) method to attain dimensionality reduction (DR) with minimum information loss for static data. Clustered SVD (CSVD) combines clustering with SVD to attain a lower normalized mean square error (NMSE) by taking advantage of the fact that most real-world datasets exhibit local rather than global correlations. The Local Dimensionality Reduction (LDR) method differs from CSVD in that it uses an SVD-friendly clustering method, rather than the k-means clustering method. We propose a hybrid method which combines the clustering method of LDR with the DR method of CSVD, so that the vector of the number of retained dimensions of the clusters is determined by varying the NMSE. We build SR-tree indices based on the vectors in the clusters to determine the number of accessed pages for exact k-NN queries (Thomasian et al., Inf Process Lett - IPL 94(6):247–252, 2005) (see Section A.3) versus the NMSE. By varying the NMSE, a minimum cost can be found, because the lower cost of accessing a smaller index is offset by the higher postprocessing cost resulting from lower retrieval accuracy. Experimenting with one synthetic and three real-world datasets leads to the conclusion that the lowest cost is attained at NMSE ≈ 0.03 and between 1/3 and 1/2 of the number of dimensions are retained. In one case, doubling the number of dimensions cuts the number of accessed pages by one half. The Appendix provides the requisite background information for reading this paper.
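The sequential-scan k-NN query that the abstract above describes as costly can be sketched as follows. This is only the unindexed baseline, not the paper's SR-tree or CSVD/LDR method; the function name `knn_sequential_scan` and the use of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn_sequential_scan(
    vectors: Sequence[Sequence[float]],
    target: Sequence[float],
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k (distance, index) pairs closest to target.

    Every vector is visited once, so the cost grows with both the
    number of vectors and their dimensionality N.
    """
    distances = (
        (math.dist(vector, target), index)
        for index, vector in enumerate(vectors)
    )
    return heapq.nsmallest(k, distances)


if __name__ == "__main__":
    features = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
    for distance, index in knn_sequential_scan(features, (0.0, 0.0), k=2):
        print(index, distance)
```

An index built on dimensionality-reduced vectors aims to avoid visiting every vector in this way, at the price of a postprocessing step on the original vectors to keep the result exact.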

Alexander Thomasian is a Professor of Computer Science at NJIT. He was a faculty member at Case Western University and University of Southern California and an adjunct professor at Columbia University, while a Research Staff Member at the IBM T. J. Watson Research Center (1985–1998). He received his Masters and PhD degrees in Computer Science from UCLA. Dr. Thomasian’s research has more recently been focused on indexing high-dimensional datasets and the performance of storage systems. He has contributed to the performance analysis area and especially the analysis and synthesis of concurrency control methods. He has published over 50 journal and over 60 conference papers. He holds four patents and received innovation and invention awards at IBM. He has served as an area editor of the IEEE Trans. Parallel and Distributed Systems and has been on the program committees of numerous conferences. He has given numerous tutorials on storage systems, high performance systems for database applications, etc. He is the author of Database Concurrency Control: Methods, Performance, and Analysis, Kluwer 1996. Dr. Thomasian is a member of ACM and a Fellow of IEEE. Yue Li started her PhD studies at NJIT in Fall 2000, after completing her MS degree in Computer Science at Shandong University, Jinan, China. Her PhD thesis on “Efficient similarity search in high dimensional data” was completed in May 2004. Dr Li is a Software Engineer at AIG Software in NJ. She is the author of half a dozen publications. Lijuan Zhang received her Master’s degree in Computer Science from Northeastern University, China in 1999 and PhD degree in Computer Science from NJIT in 2005 with a dissertation in high-dimensional indexing methods. She was a software engineer in Huawei Technologies, China. In 2005, she joined Amicas, Inc as a software engineer focusing on picture archiving and communication technologies. Her research interest was in high-dimensional indexing techniques, similarity search, content-based image retrieval, time series etc. Her current interest is in medical imaging and information management, including DICOM and HL7.

8.
There has been increased interest in the impact of mobile devices such as PDAs and Tablet PCs in introducing new pedagogical approaches and active learning experiences. We propose an intelligent system that efficiently addresses the inherent subjectivity in student perception of note taking and information retrieval. We employ the idea of cross indexing the digital ink notes with matching electronic documents in the repository. Latent Semantic Indexing is used to perform document and page level indexing. Thus for each retrieved document, the user can go over to the relevant pages that match the query. Techniques to handle problems such as polysemy (multiple meanings of a word) in large databases, document folding and no match for query are discussed. We tested our system for its performance, usability and effectiveness in the learning process. The results from the exploratory studies reveal that the proposed system provides a highly enhanced student learning experience, thereby facilitating high test scores.

Akila Varadarajan is a Senior Software Engineer at Motorola, IL with the Mobile devices division. Prior to joining Motorola, she was a Software development intern at Autodesk, MI and Graduate Research assistant at University of Michigan - Dearborn. She received her MS in Computer Engineering from University of Michigan in 2006 and her BS in Computer Engineering from Madurai Kamaraj University, India in 2003. She is interested in Mobile computing - specifically Human Factors of Mobile Computing, Information retrieval and pattern recognition. Nilesh Patel is Assistant Professor in the department of Computer Science and Engineering at Oakland University, MI. He received his PhD and MS in Computer Science from Wayne State University, MI in 1997 and 1993. He is interested in Multimedia Information Processing - specifically audio and video indexing, retrieval and event detection, Pattern Recognition, Distributed Data Mining in a heterogeneous environment, and Computer Vision with special interest in medical imaging. Dr. Patel has also served in the automotive sector for several years and developed interest in Telematics and Mobile Computing. Bruce Maxim has worked as a software engineer for the past 31 years. He has been a member of the Computer and Information Science faculty at the University of Michigan-Dearborn since 1985. He serves as the computing laboratory supervisor and head of the undergraduate programs in Computer Science, Software Engineering, and Information Systems. He has created more than 15 Computer and Information Science courses dealing with software engineering, game design, artificial intelligence, user interface design, web engineering, software quality, and computer programming. He has authored or co-authored four books on programming and software engineering. He has most recently served on the pedagogy subcommittee for Software Engineering 2004 and contributed to the IGDA Game Curriculum Framework 2008 guidelines. William I. Grosky is currently Professor and Chair of the Department of Computer and Information Science at University of Michigan - Dearborn, Dearborn, Michigan. Prior to joining the University of Michigan in 2001, he was Professor and Chair of the Department of Computer Science at Wayne State University, Detroit, Michigan. Before joining Wayne State University in 1976, he was an Assistant Professor in the Department of Information and Computer Science at Georgia Tech, Atlanta, Georgia. He received his B.S. in Mathematics from MIT in 1965, his M.S. in Applied Mathematics from Brown University in 1968, and his Ph.D. in Engineering and Applied Science from Yale University in 1971.

9.
Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous one-to-many and many-to-many information dissemination. Event systems are widely used because asynchronous messaging provides a flexible alternative to RPC. They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. This paper addresses the optimization of content-based routing tables organized using the covering relation and presents novel configurations for improving local and distributed operation. We present the poset-derived forest data structure and variants that perform considerably better under frequent filter additions and removals than existing data structures. The results offer a significant performance increase to currently known covering-based routing mechanisms. Sasu Tarkoma received his M.Sc. and Ph.Lic degrees in Computer Science from the University of Helsinki, Department of Computer Science. He has over 20 scientific publications and has also contributed to several books on mobile middleware. His research interests include distributed computing and middleware. Jaakko Kangasharju is a PhD student at the University of Helsinki and works as a researcher at the Helsinki Institute for Information Technology. His research is concentrated on XML messaging and processing in the mobile wireless environment. He has participated in related standardization efforts at the Object Management Group and the World Wide Web Consortium.
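The covering relation mentioned in the abstract above can be illustrated with a small sketch. It assumes a deliberately simple filter model that is not taken from the paper: a filter is a conjunction of closed, non-empty numeric intervals, one per constrained attribute. The names `Filter`, `matches` and `covers` are hypothetical, and the poset-derived forest itself is not implemented here.

```python
from typing import Dict, Tuple

# Assumed filter model: attribute name -> closed interval [low, high].
Filter = Dict[str, Tuple[float, float]]


def matches(f: Filter, event: Dict[str, float]) -> bool:
    """An event matches when it satisfies every interval constraint."""
    return all(
        attr in event and low <= event[attr] <= high
        for attr, (low, high) in f.items()
    )


def covers(general: Filter, specific: Filter) -> bool:
    """True when every event matched by `specific` is also matched by `general`.

    Each attribute constrained by `general` must also be constrained by
    `specific`, with an interval that lies inside the one in `general`.
    """
    for attr, (low, high) in general.items():
        if attr not in specific:
            return False
        s_low, s_high = specific[attr]
        if s_low < low or s_high > high:
            return False
    return True


if __name__ == "__main__":
    wide = {"price": (0.0, 100.0)}
    narrow = {"price": (10.0, 20.0), "volume": (1.0, 5.0)}
    print(covers(wide, narrow), covers(narrow, wide))
```

Under this model, a router that has already forwarded `wide` to a neighbor would not need to forward `narrow` as well, which is the kind of saving a covering-based routing table is organized to exploit.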

10.
Because of media digitization, a large amount of information such as speech, audio and video data is produced every day. In order to retrieve data from these databases quickly and precisely, multimedia technologies for structuring and retrieving speech, audio and video data are strongly required. In this paper, we give an overview of the multimedia technologies existing today for TV news databases, such as structuring and retrieval of speech, audio and video data, speaker indexing, audio summarization and cross media retrieval. The main purpose of structuring is to produce tables of contents and indices from audio and video data automatically. In order to make these technologies feasible, first, processing units such as words on audio data and shots on video data are extracted. In a second step, they are meaningfully integrated into topics. Furthermore, the units extracted from different types of media are integrated for higher functions. Yasuo Ariki, Ph.D.: He is a Professor in the Department of Electronics and Informatics at Ryukoku University. He received his B.E., M.E. and Ph.D. in information science from Kyoto University in 1974, 1976 and 1979, respectively. He was an Assistant at Kyoto University from 1980 to 1990, and stayed at Edinburgh University as a visiting academic from 1987 to 1990. His research interests are in speech and image recognition and in information retrieval and databases. He is a member of IPSJ, IEICE, ASJ, Soc. Artif. Intel. and IEEE.
