Similar Literature
20 similar documents found.
1.
The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing. There are now many wordnets under creation for languages worldwide. In this paper, we endeavor to construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to process the semantic information of PQAC. Most recently constructed wordnets have been built either manually by experts or automatically using resources from which translation pairs between English and the target language can be extracted. The former method, however, is time-consuming, and the latter cannot be applied to PQAC owing to the lack of such language resources. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based Word Sense Disambiguation. Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and the PWN synset. In this research, we find that 66% of PQAC senses can be shared with English, and another 14% are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85%.
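As a rough illustration of the definition-based mapping idea (not the paper's actual pipeline), the sketch below gathers candidate PWN synsets for the kernel words of a sense gloss and picks the one whose definition overlaps the gloss most, i.e. a simplified Lesk-style score stands in for the graph-based WSD step. It assumes NLTK with its WordNet data installed; the example gloss and kernel words are invented.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def tokens(text):
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def map_sense_to_pwn(gloss_en, kernel_words):
    """Pick the PWN synset of any kernel word whose definition best overlaps the gloss."""
    gloss_tokens = tokens(gloss_en)
    best_synset, best_overlap = None, -1
    for kw in kernel_words:
        for synset in wn.synsets(kw):
            overlap = len(gloss_tokens & tokens(synset.definition()))
            if overlap > best_overlap:
                best_synset, best_overlap = synset, overlap
    return best_synset

# hypothetical English rendering of a PQAC dictionary definition and its kernel words
gloss = "a large domesticated animal used for riding and for pulling loads"
print(map_sense_to_pwn(gloss, ["horse", "animal"]))
```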

2.
No doubt, words play a major role in language production, so finding them is of vital importance, be it for writing or for speaking (spontaneous discourse production, simultaneous translation). Words are stored in a dictionary, and the general belief holds that the more entries, the better. Yet, to be truly useful the resource should contain not only many entries and a lot of information concerning each one of them, but also adequate navigational means to reveal the stored information. Information access depends crucially on the organization of the data (words) and the access keys (meaning/form), two factors largely overlooked. We present here some ideas on how an existing electronic dictionary could be enhanced to help a speaker or writer find the word s/he is looking for. To this end we suggest adding to an existing electronic dictionary an index based on the notion of association, i.e. words co-occurring in a well-balanced corpus, the latter being assumed to represent the average citizen's knowledge of the world. Before describing our approach, we briefly take a critical look at related work on automatic, spontaneous or deliberate language production (computer-generated language, simulation of the mental lexicon, and WordNet (WN)) to see how adequate it is with regard to our goal.
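A minimal sketch of the association index suggested above: co-occurrence counts collected from a corpus let a writer reach a target entry (say "juice") from related cue words ("orange", "squeeze"). The three-sentence corpus and the raw-count scoring are toy stand-ins for the well-balanced corpus and association measure the authors have in mind.

```python
from collections import defaultdict
from itertools import combinations

corpus = [
    "she squeezed an orange to make fresh juice",
    "the juice press crushed the fruit",
    "he drank a glass of orange juice at breakfast",
]

# count how often two words appear in the same sentence
cooc = defaultdict(int)
for sentence in corpus:
    words = set(sentence.split())
    for a, b in combinations(sorted(words), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def associates(cue, k=5):
    """Words most strongly associated with the cue, ranked by raw co-occurrence."""
    scores = {b: n for (a, b), n in cooc.items() if a == cue}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(associates("orange"))   # 'juice' comes out first on this toy corpus
```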

3.
This article describes two different word sense disambiguation (WSD) systems: one applicable to parallel corpora and requiring aligned wordnets, and the other, knowledge-poorer but more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and wordnet). Comparing the performance of word sense disambiguation systems is a very difficult evaluation task when different sense inventories are used, and even more so when the sense distinctions are not of the same granularity. However, as we used the same sense inventory, the performance of the two WSD systems can be objectively compared, and we bring evidence that multilingual WSD is more precise than monolingual WSD.

4.
5.
Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages using various approaches. The Finnish wordnet, FinnWordNet (FiWN), was created by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0 in 100 days. To ensure quality, they were translated by professional translators. The direct translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. The semantic relations between synsets were therefore also assumed to be mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN and also provided a translation relation, and thus a bilingual wordnet usable as a dictionary. In this paper, we address several concerns raised with regard to our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking the spelling and translation quality, the viability of the approach by assessing the synonym quality at both the lexeme and concept level, and the usefulness of the resulting lexical resource both for humans and in a language-technological task. We discovered no new problems compared with those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet. As a side effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, making version 2.0 of FiWN currently the largest wordnet in the world by these statistics.

6.
Field Association (FA) Terms—words or phrases that serve to identify document fields—are effective in document classification, similar file retrieval and passage retrieval. The problem lies in the lack of an effective method to extract and select relevant FA Terms to build a comprehensive dictionary of FA Terms. This paper presents a new method to extract, select and rank FA Terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and modified tf-idf weighting. Experimental evaluation on 21 fields using 306 MB of domain-specific corpora obtained from English Wikipedia dumps selected up to 2,517 FA Terms (single and compound) per field, at precision of 74–97% and recall of 65–98%, outperforming traditional methods. The FA Terms dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, the Reuters RCV1 corpus and the 20 Newsgroups data set.
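The ranking step can be pictured with a tf-idf-like score in which whole domain corpora play the role of documents, so that a candidate term scores highly when it is frequent in one field and rare in the others. This sketch omits the POS-pattern extraction and uses toy one-sentence corpora; it illustrates the weighting idea, not the paper's exact modified tf-idf formula.

```python
import math
from collections import Counter

fields = {
    "medicine": "the patient received a dose of antibiotics for the infection",
    "finance":  "the bank raised interest rates and the stock market fell",
    "sports":   "the striker scored a late goal and the team won the match",
}

tf = {f: Counter(text.split()) for f, text in fields.items()}
n_fields = len(fields)

def fa_score(term, field):
    """tf-idf across fields: high when the term is frequent here and rare elsewhere."""
    df = sum(1 for f in fields if tf[f][term] > 0)
    return tf[field][term] * math.log((n_fields + 1) / (df + 1))

for field in fields:
    ranked = sorted(tf[field], key=lambda t: fa_score(t, field), reverse=True)
    print(field, ranked[:3])
```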

7.
A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever bigger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to learn alignments of English text and phonemes, starting from a variety of initializations. We use the British English Example Pronunciation (BEEP) dictionary of almost 200,000 words in this work. The quality of alignment is difficult to determine quantitatively since no ‘gold standard’ correct alignment exists. We therefore evaluate the success of our algorithm indirectly from the performance of a pronunciation-by-analogy system using the aligned dictionary data as a knowledge base for inferring pronunciations. We find excellent performance—the best so far reported in the literature. There is very little dependence on the starting point for alignment, indicating that the EM search space is strongly convex. Since the aligned BEEP dictionary is a potentially valuable resource, it is made freely available for research use.
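A hard-EM (Viterbi-style) variant of the idea can be sketched as follows: alternately (a) force-align each word to its phoneme string by dynamic programming under the current letter-phoneme scores, and (b) re-estimate those scores from the alignment counts. This is a simplified stand-in for the paper's EM formulation, with an invented three-word toy lexicon rather than BEEP.

```python
import math
from collections import defaultdict

# toy (spelling, phoneme sequence) pairs; BEEP itself is not used here
LEXICON = [
    ("cat", ["k", "ae", "t"]),
    ("cab", ["k", "ae", "b"]),
    ("at",  ["ae", "t"]),
]

def make_score(counts):
    def score(letter, phone):
        total = sum(v for (l, _), v in counts.items() if l == letter) or 1.0
        return math.log(counts[(letter, phone)] / total)
    return score

def align(word, phones, score):
    """Best monotone alignment: a letter links to one phoneme or to nothing,
    and a phoneme may be left unlinked (handles unequal lengths)."""
    n, m = len(word), len(phones)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == -math.inf:
                continue
            if i < n and j < m and best[i][j] + score(word[i], phones[j]) > best[i + 1][j + 1]:
                best[i + 1][j + 1] = best[i][j] + score(word[i], phones[j])
                back[i + 1][j + 1] = (i, j, "link")
            if i < n and best[i][j] + score(word[i], None) > best[i + 1][j]:
                best[i + 1][j] = best[i][j] + score(word[i], None)
                back[i + 1][j] = (i, j, "skip_letter")
            if j < m and best[i][j] + score(None, phones[j]) > best[i][j + 1]:
                best[i][j + 1] = best[i][j] + score(None, phones[j])
                back[i][j + 1] = (i, j, "skip_phone")
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        letter = word[pi] if op in ("link", "skip_letter") else None
        phone = phones[pj] if op in ("link", "skip_phone") else None
        pairs.append((letter, phone))
        i, j = pi, pj
    return list(reversed(pairs))

counts = defaultdict(lambda: 1.0)        # smoothed counts of (letter, phoneme) links
for _ in range(5):                       # "hard EM": re-align, then re-count
    new_counts = defaultdict(lambda: 1.0)
    for word, phones in LEXICON:
        for pair in align(word, phones, make_score(counts)):
            new_counts[pair] += 1.0
    counts = new_counts

for word, phones in LEXICON:
    print(word, align(word, phones, make_score(counts)))
```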

8.
Olfaction—or smell—is one of the last challenges that multimedia and multimodal applications have yet to conquer. Enhancing such applications with olfactory stimuli has the potential to create a more complex—and richer—user multimedia experience, by heightening the sense of reality and diversifying user interaction modalities. Nonetheless, olfaction-enhanced multimedia remains a challenging research area. Recently, however, there have been initial signs of olfaction-enhanced applications in multimedia, with olfaction being used towards a variety of goals, including notification alerts, enhancing the sense of reality in immersive applications, and branding, to name but a few. However, as the goal of a multimedia application is to inform and/or entertain users, achieving quality olfaction-enhanced multimedia applications from the users' perspective is vital to the success and continuity of these applications. Accordingly, in this paper we focus on investigating the user-perceived experience of olfaction-enhanced multimedia applications, with the aim of discovering the quality evaluation factors that are important from a user's perspective and consequently ensuring the continued advancement and success of such applications.

9.
In a 12-month project we developed a new, register-diverse, 55-million-word bilingual corpus—the New Corpus for Ireland (NCI)—to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed and the solutions to the problems encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.

10.
11.
e-Commerce companies acknowledge that customers are their most important asset and that it is imperative to estimate the potential value of this asset. In conventional marketing, one of the widely accepted methods for evaluating customer value uses models known as Customer Lifetime Value (CLV). However, these existing models suffer from two major shortcomings: they either do not take into account significant attributes of customer behavior unique to e-Commerce, or they do not provide a method for generating specific models from the large body of relevant historical data that can easily be collected on e-Commerce sites. This paper describes a general modeling approach, based on Markov Chain Models, for calculating customer value in the e-Commerce domain. The approach extends existing CLV models by taking into account a new set of variables required for evaluating customer value in an e-Commerce environment. In addition, we describe how data-mining algorithms can aid in deriving such a model, thereby taking advantage of the historical customer data available in such environments. We then present an application of this modeling approach: the creation of a model for online auctions, one of the fastest-growing and most lucrative types of e-Commerce. The article also describes a case study, which demonstrates how our model provides more accurate predictions than existing conventional CLV models regarding the future income generated by customers.

Opher Etzion is a research staff member and the manager of the active management technology group at the IBM Research Laboratory in Haifa, Israel, and a visiting research scientist at the Technion—Israel Institute of Technology. He received a BA degree in Philosophy from Tel-Aviv University and a Ph.D. degree in Computer Science from Temple University. Prior to joining IBM in 1997, he was a faculty member at the Technion, where he served as the founding head of the information systems engineering area and graduate program. Prior to his graduate studies, he held professional and managerial positions in industry and in the Israel Air Force, receiving the air force's highest award in 1982. His research interests include active technology (active databases and beyond), temporal databases, middleware systems and rule-based systems. He is a member of the editorial board of the IIE Transactions Journal, and was a guest editor of the Journal of Intelligent Information Systems in 1994 and of the International Journal of Cooperative Information Systems in 2001. He served as a coordinating organizer of the Dagstuhl seminar on temporal databases in 1997, co-edited the book "Temporal Databases—Research and Practice" published by Springer-Verlag, and in 2000 was program chair of CoopIS'2000 and demo and panel chair of VLDB'2000. He has also served on many conference program committees (e.g. VLDB, ICDE, ER) as well as national committees, and has been program and general chair of the NGITS workshop series.

Amit Fisher is a research staff member in the active management technology group at the IBM Research Laboratory in Haifa, Israel. He received a B.Sc. degree in Industrial Engineering and Management and an M.Sc. degree in Information System Engineering from the Technion—Israel Institute of Technology. Prior to joining IBM Research, Amit held a professional position in the Israel Air Force. His research interests include customer behavior analysis, CRM, data mining and business process management.

Segev Wasserkrug is a research staff member at IBM's Haifa Research Lab (HRL). He has an M.Sc. in computer science, in the area of neural networks. In addition, he has significant experience in modeling of various types, automatic model derivation, and optimization, gained from leading the development of a technology at HRL that deals with optimization of an IT infrastructure according to business objectives. He is also currently studying towards a Ph.D. in information systems at the Technion, Israel Institute of Technology, in the area of uncertainty handling based on Bayesian network techniques.
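A toy illustration of a Markov-chain CLV computation in the spirit described in the abstract above: customer states, a transition matrix, expected revenue per state, and a discount factor yield an expected discounted lifetime value per state. The states, probabilities, and revenues are invented for illustration and are not the paper's model.

```python
import numpy as np

states = ["active_bidder", "occasional_bidder", "churned"]
P = np.array([            # row = current state, column = next state
    [0.70, 0.25, 0.05],
    [0.20, 0.60, 0.20],
    [0.00, 0.00, 1.00],
])
revenue = np.array([12.0, 3.0, 0.0])   # expected revenue per period in each state
gamma = 0.9                            # per-period discount factor

# CLV(s) = revenue(s) + gamma * sum_s' P[s, s'] * CLV(s')
# solved as the linear system (I - gamma * P) @ clv = revenue
clv = np.linalg.solve(np.eye(len(states)) - gamma * P, revenue)
for s, v in zip(states, clv):
    print(f"{s}: {v:.2f}")
```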

12.
On the basis of an earlier contribution to the philosophy of computer science by Amnon Eden, this essay discusses to what extent Eden's 'paradigms' of computer science can be transferred or applied to software engineering. This discussion implies an analysis of how software engineering and computer science are related to each other. The essay concludes that software engineering can neither be fully subsumed by computer science, nor vice versa. Consequently, the philosophies of computer science and software engineering—though related to each other—are not identical branches of a general philosophy of science. This also implies that not all of Eden's earlier arguments can be directly mapped from the domain of computer science into the domain of software science. After the discussion of this main topic, the essay also points to some further problems and open issues for future studies in the philosophy of software science and engineering.

13.
Technology can improve the quality of life of elderly persons by supporting and facilitating the unique leadership roles that elderly people play in groups, communities, and other organizations. Elderly people are often organizational firekeepers: they maintain community memory, pass on organizational practices, and ensure social continuity. This paper reports studies of several essential community roles played by elderly community members—including the role of volunteer community webmaster—and describes two positive design projects that investigated how technology can support new kinds of social endeavors and contributions to society by elderly citizens. Finally, the paper speculates on the utility of intergenerational teams in strengthening society's workforce.

14.
Propositional satisfiability (SAT) is a success story in Computer Science and Artificial Intelligence: SAT solvers are currently used to solve problems in many different application domains, including planning and formal verification. The main reason for this success is that modern SAT solvers can successfully deal with problems having millions of variables. All these solvers are based on the Davis–Logemann–Loveland procedure (dll). In its original version, dll is a decision procedure, but it can easily be modified to return one or all assignments satisfying the input set of clauses, assuming at least one exists. In many cases, however, it is not enough to compute assignments satisfying all the input clauses: the returned assignments also have to be "optimal" in some sense, e.g., they have to satisfy as many other constraints—expressed as preferences—as possible. In this paper we start with qualitative preferences on literals, defined as a partially ordered set (poset) of literals. Such a poset induces a poset on total assignments and leads to the definition of an optimal model of a formula ψ as a minimal element of the poset on the models of ψ. We show (i) how dll can be extended to return one or all optimal models of ψ (once converted into clauses, and assuming ψ is satisfiable), and (ii) how the same procedures can be used to compute optimal models with respect to a qualitative preference on formulas and/or a quantitative preference on literals or formulas. We implemented our ideas and tested the resulting system on a variety of very challenging structured benchmarks. The results indicate that our implementation has performance comparable to that of other state-of-the-art systems tailored for the specific problems we consider.
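A brute-force illustration (not the paper's modified dll procedure) of choosing "optimal" models under preferences on literals: for simplicity, the poset of the paper is replaced by a totally ordered list of preferred literals, and models are compared lexicographically on which preferred literals they satisfy. The clauses and preferences below are toy examples.

```python
from itertools import product

# CNF over variables 1..3: (x1 or x2) and (not x1 or x3); ints are literals
clauses = [[1, 2], [-1, 3]]
n_vars = 3
preferred = [-1, 2]          # most preferred literal first: prefer x1 false, then x2 true

def satisfies(model, clauses):
    return all(any(model[abs(l)] == (l > 0) for l in clause) for clause in clauses)

def pref_key(model):
    # 1 if the preferred literal holds in the model, 0 otherwise; earlier = more important
    return tuple(1 if model[abs(l)] == (l > 0) else 0 for l in preferred)

models = []
for bits in product([False, True], repeat=n_vars):
    model = {i + 1: bits[i] for i in range(n_vars)}
    if satisfies(model, clauses):
        models.append(model)

optimal = max(models, key=pref_key)     # best model under the literal preferences
print(optimal)
```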

15.
In D'Ariano (Philosophy of Quantum Information and Entanglement, Cambridge University Press, Cambridge, UK, 2010), one of the authors proposed a set of operational postulates to be considered for axiomatizing Quantum Theory. The underlying idea is to derive Quantum Theory as the mathematical representation of a fair operational framework, i.e. a set of rules which allows the experimenter to make predictions about future events on the basis of suitable tests, e.g. without interference from uncontrollable sources and with local control and low experimental complexity. In addition to causality, two main postulates have been considered: PFAITH (existence of a pure preparationally faithful state) and FAITHE (existence of a faithful effect). These postulates have exhibited an unexpected theoretical power, excluding all known nonquantum probabilistic theories. In the same paper the postulate PURIFY-1 (purifiability of all states) was also introduced; it was later reconsidered in the stronger version PURIFY-2 (purifiability of all states, unique up to reversible channels on the purifying system) in Chiribella et al. (Reversible realization of physical processes in probabilistic theories, arXiv:0908.1583). There, it was shown that postulate PURIFY-2, along with causality and local discriminability, narrows the probabilistic theory to something very close to the quantum one. In the present paper we test the above postulates on some nonquantum probabilistic models. The first model, the two-box world, is an extension of the Popescu–Rohrlich model (Found Phys, 24:379, 1994), which achieves the greatest violation of the CHSH inequality compatible with the no-signaling principle. The second model, the two-clock world, is actually a full class of models, all having a disk as the convex set of states of the local system. One of them corresponds to the two-rebit world, namely qubits with a real Hilbert space. The third model, the spin-factor, is a sort of n-dimensional generalization of the clock. Finally, the last model is classical probabilistic theory. We see how each model violates some of the proposed postulates, when and how teleportation can be achieved, and we analyze other interesting connections between these postulate violations, along with deep relations between the local and non-local structures of the probabilistic theory.
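For reference, the CHSH quantity mentioned above, together with the local hidden-variable bound, Tsirelson's quantum bound, and the algebraic maximum reached by the Popescu–Rohrlich box while still respecting no-signaling:

```latex
\[
  S = \langle A_0 B_0 \rangle + \langle A_0 B_1 \rangle
    + \langle A_1 B_0 \rangle - \langle A_1 B_1 \rangle,
  \qquad
  |S|_{\text{classical}} \le 2, \quad
  |S|_{\text{quantum}} \le 2\sqrt{2}, \quad
  |S|_{\text{PR box}} = 4 .
\]
```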

16.
Manipulatives—physical learning materials such as cubes or tiles—are prevalent in educational settings across cultures and have generated substantial research into how actions with physical objects may support children's learning. The ability to integrate digital technology into physical objects—so-called 'digital manipulatives'—has generated excitement over the potential to create new educational materials. However, without a clear understanding of how actions with physical materials lead to learning, it is difficult to evaluate or inform designs in this area. This paper is intended to contribute to the development of effective tangible technologies for children's learning by summarising key debates about the representational advantages of manipulatives under two key headings: offloading cognition, where manipulatives may help children by freeing up valuable cognitive resources during problem solving; and conceptual metaphors, where perceptual information or actions with objects have a structural correspondence with more symbolic concepts. The review also indicates possible limitations of physical objects—most importantly that their symbolic significance is only granted by the context in which they are used. These arguments are then discussed in light of tangible designs drawing upon the authors' current research into tangibles and young children's understanding of number.

17.
We present a new probabilistic algorithm to compute the Smith normal form of a sparse integer matrix A. The algorithm treats A as a "black box": A is only used to compute matrix-vector products, and we do not access individual entries of A directly. The algorithm requires about black box evaluations for word-sized primes p and , plus additional bit operations. For sparse matrices this represents a substantial improvement over previously known algorithms. The new algorithm suffers from no "fill-in" or intermediate value explosion, and uses very little additional space. We also present an asymptotically fast algorithm for dense matrices which requires about bit operations, where O(MM(m)) operations are sufficient to multiply two m×m matrices over a field. Both algorithms are probabilistic of the Monte Carlo type: on any input they return the correct answer with a controllable, exponentially small probability of error.
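A small, dense illustration of the object being computed above: the Smith normal form of a 2×2 integer matrix obtained from its determinantal divisors (d1 = gcd of all entries, d1·d2 = |det A|). This only shows what the Smith form is; it is unrelated to the black-box algorithm of the paper, which targets large sparse matrices.

```python
from math import gcd

A = [[2, 4],
     [6, 8]]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d1 = gcd(gcd(A[0][0], A[0][1]), gcd(A[1][0], A[1][1]))   # gcd of all entries
d2 = abs(det) // d1                                       # second invariant factor

print([[d1, 0], [0, d2]])   # Smith normal form diag(d1, d2), with d1 | d2
```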

18.
Linux malware can pose a significant threat, as Linux penetration is increasing exponentially, because little is known or understood about Linux OS vulnerabilities. We believe that now is the right time to devise non-signature-based zero-day (previously unknown) malware detection strategies, before Linux intruders take us by surprise. Therefore, in this paper, we first perform a forensic analysis of Linux executable and linkable format (ELF) files. Our forensic analysis provides insight into different features that have the potential to discriminate malicious executables from benign ones. As a result, we select a set of 383 features extracted from ELF headers. We quantify the classification potential of the features using information gain and then remove redundant features by employing preprocessing filters. Finally, we carry out an extensive evaluation of classical rule-based machine learning classifiers—RIPPER, PART, C4.5 Rules, and the decision tree J48—and bio-inspired classifiers—cAnt Miner, UCS, XCS, and GAssist—to select the best classifier for our system. We have evaluated our approach on an available collection of 709 Linux malware samples from VX Heavens and Offensive Computing. Our experiments show that ELF-Miner provides more than 99% detection accuracy with less than 0.1% false alarm rate.
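The feature-scoring step mentioned above can be sketched as the information gain of a binary feature (for instance, "some ELF header field has an anomalous value") with respect to the malicious/benign label. The feature vector and labels below are toy data, not the 383-feature set of the paper.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(label) - H(label | feature) for a discrete feature."""
    base = entropy(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / len(labels) * entropy(subset)
    return base - cond

# toy data: 1 = malicious, 0 = benign
labels  = [1, 1, 1, 0, 0, 0]
feature = [1, 1, 0, 0, 0, 0]   # a feature that mostly fires on malware
print(round(information_gain(feature, labels), 3))
```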

19.
For a Markovian source, we analyze the Lempel–Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models. In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., as a prefix of the previous i-1 sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert–Kadota model, a fixed number of phrases is generated according to the Lempel–Ziv algorithm, thus producing a sequence of variable (random) length. In the last model, known also as the Lempel–Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees, which are of interest to other algorithms such as sorting, searching, and pattern matching. In this paper we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel–Ziv code. For the Markov Independent model, this finding is established by analytic methods (generating functions, Mellin transform, and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
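The parsing rule analyzed above (in its single-sequence, fixed-length variant) can be sketched in a few lines: repeatedly take the shortest prefix of the remaining input that has not yet occurred as a phrase. The input string is only an example.

```python
def lz_parse(s):
    phrases, seen, i = [], set(), 0
    while i < len(s):
        j = i + 1
        while j <= len(s) and s[i:j] in seen:
            j += 1
        phrase = s[i:j]          # shortest new phrase (may repeat only at the very end)
        phrases.append(phrase)
        seen.add(phrase)
        i = j
    return phrases

print(lz_parse("ababbbaabba"))   # ['a', 'b', 'ab', 'bb', 'aa', 'bba']
```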

20.
Computability and Complexity in Self-assembly
This paper explores the impact of geometry on computability and complexity in Winfree's model of nanoscale self-assembly. We work in the two-dimensional tile assembly model, i.e., in the discrete Euclidean plane ℤ×ℤ. Our first main theorem says that there is a roughly quadratic function f such that a set A ⊆ ℤ+ is computably enumerable if and only if the set X_A = {(f(n), 0) | n ∈ A}—a simple representation of A as a set of points on the x-axis—self-assembles in Winfree's sense. In contrast, our second main theorem says that there are decidable sets D ⊆ ℤ×ℤ that do not self-assemble in Winfree's sense.
