Found 20 similar documents. Search took 187 ms.
1.
Yingjie Zhang Bin Li Xinyu Dai Shujian Huang Jiajun Chen 《Language Resources and Evaluation》2017,51(2):525-545
The Princeton WordNet® (PWN) is a widely used lexical knowledge database for semantic information processing, and wordnets are now under construction for many languages worldwide. In this paper, we construct a wordnet for Pre-Qin ancient Chinese (PQAC), called PQAC WordNet (PQAC-WN), to process the semantic information of PQAC. Most recently constructed wordnets have been built either manually by experts or automatically using resources from which translation pairs between English and the target language can be extracted. The former method, however, is time-consuming, and the latter cannot be applied to PQAC owing to a lack of language resources. We therefore propose a method based on word definitions in a monolingual dictionary. Specifically, for each sense, kernel words are first extracted from its definition, and the senses of each kernel word are then determined by graph-based word sense disambiguation. Finally, one optimal sense is chosen from the kernel word senses to guide the mapping between the word sense and a PWN synset. We find that 66% of PQAC senses can be shared with English; another 14% are language-specific senses that were added to PQAC-WN as new synsets. Overall, the automatic mapping achieves a precision of over 85%.
2.
Michael Zock Olivier Ferret Didier Schwab 《International Journal of Speech Technology》2010,13(4):201-218
Words undoubtedly play a major role in language production, so finding them is of vital importance, whether for writing or for speaking (spontaneous discourse production, simultaneous translation). Words are stored in a dictionary, and the general belief holds that the more entries, the better. Yet to be truly useful, the resource should contain not only many entries and a lot of information about each of them, but also adequate navigational means to reveal the stored information. Information access depends crucially on the organization of the data (words) and the access keys (meaning/form), two factors largely overlooked. We present here some ideas on how an existing electronic dictionary could be enhanced to help a speaker or writer find the word s/he is looking for. To this end, we suggest adding to an existing electronic dictionary an index based on the notion of association, i.e. words co-occurring in a well-balanced corpus, the latter being supposed to represent the average citizen’s knowledge of the world. Before describing our approach, we briefly take a critical look at related work on automatic, spontaneous or deliberate language production (computer-generated language, simulation of the mental lexicon, and WordNet) to see how adequate it is with regard to our goal.
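To make the association-index idea concrete, here is a minimal sketch (not the authors’ implementation; the window size and toy corpus are illustrative assumptions) of how co-occurrence counts over a corpus could serve as associative access keys into a dictionary:

```python
from collections import Counter, defaultdict

def association_index(sentences, window=2):
    """Count how often word pairs co-occur within a small window.

    Returns a mapping from each word to a Counter of associated words,
    so that for a target word its most strongly associated neighbours
    can be listed -- the kind of access key the abstract proposes
    adding to a dictionary.
    """
    assoc = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    assoc[w][tokens[j]] += 1
    return assoc

corpus = [
    ["coffee", "cup", "hot", "drink"],
    ["tea", "cup", "hot", "drink"],
    ["coffee", "bean", "roast"],
]
idx = association_index(corpus)
# Most frequent associates of "cup":
print(idx["cup"].most_common(2))
```

In a real system the corpus would be the balanced corpus the authors describe, and the index would be consulted in reverse: from associated words back to the elusive target word.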
3.
This article describes two different word sense disambiguation (WSD) systems: one applicable to parallel corpora and requiring aligned wordnets, and another, knowledge-poorer but more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and wordnet). Comparing the performance of WSD systems is a very difficult evaluation task when different sense inventories are used, and even more difficult when the sense distinctions are not of the same granularity. However, as we used the same sense inventory, the performance of the two WSD systems can be objectively compared, and we bring evidence that multilingual WSD is more precise than monolingual WSD.
4.
5.
Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages using various approaches. The Finnish wordnet, FinnWordNet (FiWN), was created by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0 in 100 days. To ensure quality, they were translated by professional translators. The direct-translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. The semantic relations between synsets were thus also assumed to be mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN and also provided us with a translation relation, and thus a bilingual wordnet usable as a dictionary. In this paper, we address several concerns raised about our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking spelling and translation quality, the viability of the approach by assessing synonym quality at both the lexeme and concept level, and the usefulness of the resulting lexical resource both for humans and in a language-technology task. We discovered no new problems beyond those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet. As a side effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, making version 2.0 of FiWN currently the largest wordnet in the world by these statistics.
6.
Tshering Cigay Dorji El-sayed Atlam Susumu Yata Masao Fuketa Kazuhiro Morita Jun-ichi Aoe 《Knowledge and Information Systems》2011,27(1):141-161
Field Association (FA) terms, words or phrases that serve to identify document fields, are effective in document classification, similar-file retrieval and passage retrieval. The problem lies in the lack of an effective method to extract and select relevant FA terms to build a comprehensive FA-term dictionary. This paper presents a new method to extract, select and rank FA terms from domain-specific corpora using part-of-speech (POS) pattern rules, corpora comparison and a modified tf-idf weighting. Experimental evaluation on 21 fields, using 306 MB of domain-specific corpora obtained from English Wikipedia dumps, selected up to 2,517 FA terms (single and compound) per field at a precision of 74-97% and a recall of 65-98%, better than traditional methods. The FA-term dictionary constructed using this method achieved an average accuracy of 97.6% in identifying the fields of 10,077 test documents collected from Wikipedia, the Reuters RCV1 corpus and the 20 Newsgroups data set.
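As an illustration of the corpora-comparison step, the following sketch ranks candidate terms by a simplified tf-idf-style weight: frequent in the domain corpus, rare in a general reference corpus. The weighting and the toy corpora are assumptions for illustration; the paper’s modified tf-idf and POS pattern rules are more elaborate.

```python
import math
from collections import Counter

def fa_term_scores(domain_tokens, reference_tokens):
    """Rank candidate FA terms by comparing a domain-specific corpus
    against a general reference corpus (simplified tf-idf variant)."""
    tf = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    n_dom, n_ref = len(domain_tokens), len(reference_tokens)
    scores = {}
    for term, f in tf.items():
        # add-one smoothing so terms absent from the reference corpus
        # do not cause a division by zero
        rel_ref = (ref[term] + 1) / (n_ref + 1)
        # high term frequency in the domain, low relative frequency
        # in the reference corpus -> high score
        scores[term] = (f / n_dom) * math.log(1.0 / rel_ref)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

domain = ["goal", "goal", "match", "player", "the", "the"]
reference = ["the", "the", "the", "a", "match"]
ranked = fa_term_scores(domain, reference)
print(ranked[0][0])
```

The domain-characteristic word ranks above function words like “the”, which are frequent everywhere.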
7.
A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever bigger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to learn alignments of English text and phonemes, starting from a variety of initializations. We use the British English Example Pronunciation (BEEP) dictionary of almost 200,000 words in this work. The quality of alignment is difficult to determine quantitatively, since no ‘gold standard’ correct alignment exists. We therefore evaluate the success of our algorithm indirectly, from the performance of a pronunciation-by-analogy system using the aligned dictionary data as a knowledge base for inferring pronunciations. We find excellent performance, the best so far reported in the literature. There is very little dependence on the starting point for alignment, indicating that the EM search space is strongly convex. Since the aligned BEEP dictionary is a potentially valuable resource, it is made freely available for research use.
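For intuition, letter-phoneme alignment can be sketched as a standard dynamic-programming (Needleman-Wunsch) pass. This is a hedged baseline with a naive hand-written match test, not the paper’s EM procedure, which learns its letter-phoneme association scores from the whole dictionary:

```python
def align(letters, phonemes, match=lambda l, p: l.lower() == p[0].lower()):
    """Align a word's letters with its phonemes by dynamic programming.

    `match` is a crude stand-in: a letter 'matches' a phoneme if it
    equals the phoneme's first character. An EM approach would instead
    estimate letter-phoneme probabilities from the whole dictionary.
    """
    n, m = len(letters), len(phonemes)
    # score[i][j]: best score aligning first i letters with first j phonemes
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                score[i][j] = -(i + j)  # leading gaps cost 1 each
                continue
            s = 1 if match(letters[i - 1], phonemes[j - 1]) else -1
            score[i][j] = max(score[i - 1][j - 1] + s,  # pair letter/phoneme
                              score[i - 1][j] - 1,      # letter aligned to gap
                              score[i][j - 1] - 1)      # phoneme aligned to gap
    # trace back the best-scoring path
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1]
                + (1 if match(letters[i - 1], phonemes[j - 1]) else -1)):
            pairs.append((letters[i - 1], phonemes[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] - 1:
            pairs.append((letters[i - 1], "-")); i -= 1
        else:
            pairs.append(("-", phonemes[j - 1])); j -= 1
    return pairs[::-1]

print(align("cat", ["k", "ae", "t"]))  # [('c', 'k'), ('a', 'ae'), ('t', 't')]
```

Replacing the fixed `match` test with probabilities re-estimated from the current alignments is exactly where the EM iteration of the paper would slot in.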
8.
Olfaction, or smell, is one of the last challenges that multimedia and multimodal applications have to conquer. Enhancing such applications with olfactory stimuli has the potential to create a more complex and richer multimedia experience, by heightening the sense of reality and diversifying user interaction modalities. Nonetheless, olfaction-enhanced multimedia remains a challenging research area. Recently, however, there have been initial signs of olfaction-enhanced applications in multimedia, with olfaction being used towards a variety of goals, including notification alerts, enhancing the sense of reality in immersive applications, and branding, to name but a few. Since the goal of a multimedia application is to inform and/or entertain users, achieving quality olfaction-enhanced multimedia applications from the users’ perspective is vital to the success and continuity of these applications. Accordingly, in this paper we investigate the user-perceived experience of olfaction-enhanced multimedia applications, with the aim of discovering the quality evaluation factors that are important from a user’s perspective and consequently ensuring the continued advancement and success of such applications.
9.
Adam Kilgarriff Michael Rundell Elaine Uí Dhonnchadha 《Language Resources and Evaluation》2006,40(2):127-152
In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed and the solutions to problems encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.
10.
11.
e-CLV: A Modeling Approach for Customer Lifetime Evaluation in e-Commerce Domains, with an Application and Case Study for Online Auction
e-Commerce companies acknowledge that customers are their most important asset and that it is imperative to estimate the potential value of this asset. In conventional marketing, one of the widely accepted methods for evaluating customer value uses models known as Customer Lifetime Value (CLV). However, existing models suffer from two major shortcomings: they either do not take into account significant attributes of customer behavior unique to e-Commerce, or they do not provide a method for generating specific models from the large body of relevant historical data that can easily be collected on e-Commerce sites. This paper describes a general modeling approach, based on Markov Chain Models, for calculating customer value in the e-Commerce domain. The approach extends existing CLV models by taking into account a new set of variables required for evaluating customer value in an e-Commerce environment. In addition, we describe how data-mining algorithms can aid in deriving such a model, thereby taking advantage of the historical customer data available in such environments. We then present an application of this modeling approach: the creation of a model for online auctions, one of the fastest-growing and most lucrative types of e-Commerce. The article also describes a case study demonstrating that our model provides more accurate predictions than existing conventional CLV models of the future income generated by customers.
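A generic Markov-chain CLV computation can be sketched as follows. The states, revenues, discount rate and horizon below are illustrative assumptions; the paper’s model adds e-Commerce-specific state variables and derives the transition structure from mined data.

```python
def markov_clv(transition, revenue, start, horizon=24, discount=0.99):
    """Expected customer lifetime value under a Markov chain model.

    States are customer activity levels; `transition[s]` gives the
    next-state distribution from state s, and `revenue[s]` the expected
    income per period while in state s.
    """
    states = list(transition)
    # prob[s] = probability of being in state s at the current period
    prob = {s: 1.0 if s == start else 0.0 for s in states}
    clv = 0.0
    for t in range(horizon):
        clv += (discount ** t) * sum(prob[s] * revenue[s] for s in states)
        # advance the state distribution one period
        prob = {s2: sum(prob[s] * transition[s][s2] for s in states)
                for s2 in states}
    return clv

# toy two-state chain: active customers spend, churned customers do not
chain = {"active":  {"active": 0.7, "churned": 0.3},
         "churned": {"active": 0.0, "churned": 1.0}}
income = {"active": 10.0, "churned": 0.0}
print(round(markov_clv(chain, income, "active"), 2))
```

With a 70% monthly retention rate, the discounted 24-period value converges near 10/(1 - 0.99·0.7) ≈ 32.6.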
Opher Etzion is a research staff member and the manager of the active management technology group at the IBM Research Laboratory in Haifa, Israel, and a visiting research scientist at the Technion, Israel Institute of Technology. He received a BA degree in Philosophy from Tel-Aviv University and a Ph.D. degree in Computer Science from Temple University. Prior to joining IBM in 1997, he was a faculty member at the Technion, where he served as the founding head of the information systems engineering area and graduate program. Prior to his graduate studies, he held professional and managerial positions in industry and in the Israel Air Force, receiving the air force’s highest award in 1982. His research interests include active technology (active databases and beyond), temporal databases, middleware systems and rule-based systems. He is a member of the editorial board of the IIE Transactions Journal, was a guest editor of the Journal of Intelligent Information Systems in 1994 and of the International Journal of Cooperative Information Systems (2001). He served as a coordinating organizer of the Dagstuhl seminar on temporal databases in 1997 and co-edited the book “Temporal Databases: Research and Practice” published by Springer-Verlag. In 2000 he was program chair of CoopIS'2000 and demo and panel chair of VLDB'2000. He has also served on many conference program committees (e.g. VLDB, ICDE, ER) as well as national committees, and has been program and general chair of the NGITS workshop series.
Amit Fisher is a research staff member in the active management technology group at the IBM Research Laboratory in Haifa, Israel. He received a B.Sc. degree in Industrial Engineering and Management and an M.Sc. degree in Information Systems Engineering from the Technion, Israel Institute of Technology. Prior to joining IBM Research, Amit held a professional position in the Israel Air Force. His research interests include customer behavior analysis, CRM, data mining and business process management.
Segev Wasserkrug is a research staff member at IBM's Haifa Research Lab (HRL). He has an M.Sc. in computer science, in the area of neural networks. In addition, he has significant experience in modeling of various types, automatic model derivation, and optimization, gained from leading the development of a technology in HRL that deals with optimization of an IT infrastructure according to business objectives. He is also currently studying towards a Ph.D. in information systems at the Technion, Israel Institute of Technology, in the area of uncertainty handling based on Bayesian network techniques.
12.
Stefan Gruner 《Minds and Machines》2011,21(2):275-299
Building on an earlier contribution to the philosophy of computer science by Amnon Eden, this essay discusses to what extent Eden’s ‘paradigms’ of computer science can be transferred or applied to software engineering. This discussion implies an analysis of how software engineering and computer science are related to each other. The essay concludes that software engineering can neither be fully subsumed by computer science, nor vice versa. Consequently, the philosophies of computer science and software engineering, though related to each other, are not identical branches of a general philosophy of science. This also implies that not all of Eden’s earlier arguments can be directly mapped from the domain of computer science into the domain of software science. After discussing this main topic, the essay also points to some further problems and open issues for future studies in the philosophy of software science and engineering.
13.
John M. Carroll Gregorio Convertino Umer Farooq Mary Beth Rosson 《Universal Access in the Information Society》2012,11(1):7-15
Technology can improve the quality of life of elderly persons by supporting and facilitating the unique leadership roles that the elderly play in groups, communities, and other organizations. Elderly people are often organizational firekeepers: they maintain community memory, pass on organizational practices, and ensure social continuity. This paper reports studies of several essential community roles played by elderly community members, including the role of volunteer community webmaster, and describes two positive design projects that investigated how technology can support new kinds of social endeavors and contributions to society by elderly citizens. Finally, the paper speculates on the utility of intergenerational teams in strengthening society’s workforce.
14.
Propositional satisfiability (SAT) is a success story in computer science and artificial intelligence: SAT solvers are currently used to solve problems in many different application domains, including planning and formal verification. The main reason for this success is that modern SAT solvers can successfully deal with problems having millions of variables. All these solvers are based on the Davis-Logemann-Loveland procedure (DLL). In its original version, DLL is a decision procedure, but it can easily be modified to return one or all assignments satisfying the input set of clauses, assuming at least one exists. In many cases, however, it is not enough to compute assignments satisfying all the input clauses: the returned assignments also have to be “optimal” in some sense, e.g., they have to satisfy as many other constraints, expressed as preferences, as possible. In this paper we start with qualitative preferences on literals, defined as a partially ordered set (poset) of literals. Such a poset induces a poset on total assignments and leads to the definition of an optimal model for a formula ψ as a minimal element of the poset on the models of ψ. We show (i) how DLL can be extended to return one or all optimal models of ψ (once ψ is converted into clauses, and assuming it is satisfiable), and (ii) how the same procedures can be used to compute optimal models with respect to a qualitative preference on formulas and/or a quantitative preference on literals or formulas. We implemented our ideas and tested the resulting system on a variety of very challenging structured benchmarks. The results indicate that our implementation has performance comparable to other state-of-the-art systems tailored for the specific problems we consider.
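The branching idea can be illustrated for the special case of a total preference order on literals. The sketch below is a simplification, not the authors’ system (which handles partial orders and preferences on formulas): DLL branches on the most preferred undecided literal and tries its preferred polarity first, so the first model returned respects the preference order.

```python
def dpll_preferred(clauses, prefs, assignment=None):
    """Return a model of a CNF formula favouring preferred literals.

    Clauses are lists of non-zero ints (negative = negated variable).
    `prefs` must list one literal per variable, most preferred first;
    a positive literal means "prefer this variable true", a negative
    one "prefer it false".
    """
    if assignment is None:
        assignment = {}
    # simplify clauses under the current partial assignment
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # clause falsified -> backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # all clauses satisfied
    # branch on the most preferred undecided literal
    lit = next(l for l in prefs if abs(l) not in assignment)
    for value in ((lit > 0), (lit <= 0)):  # preferred polarity first
        model = dpll_preferred(simplified, prefs,
                               {**assignment, abs(lit): value})
        if model is not None:
            return model
    return None

# (x1 or x2) and (not x1 or x2), preferring x1 true, then x2 false:
print(dpll_preferred([[1, 2], [-1, 2]], prefs=[1, -2]))  # {1: True, 2: True}
```

Preferring x1 true forces x2 true here; a solver without the preference-guided branching could just as well have returned the model with x1 false.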
15.
In D’Ariano (Philosophy of Quantum Information and Entanglement, Cambridge University Press, Cambridge, UK, 2010), one of the authors proposed a set of operational postulates to be considered for axiomatizing Quantum Theory. The underlying idea is to derive Quantum Theory as the mathematical representation of a fair operational framework, i.e. a set of rules that allows the experimenter to make predictions about future events on the basis of suitable tests, e.g. without interference from uncontrollable sources and with local control and low experimental complexity. In addition to causality, two main postulates have been considered: PFAITH (existence of a pure preparationally faithful state) and FAITHE (existence of a faithful effect). These postulates have exhibited an unexpected theoretical power, excluding all known nonquantum probabilistic theories. The same paper also introduced postulate PURIFY-1 (purifiability of all states), which was later reconsidered in the stronger version PURIFY-2 (purifiability of all states, unique up to reversible channels on the purifying system) in Chiribella et al. (Reversible realization of physical processes in probabilistic theories, arXiv:0908.1583). There, it was shown that postulate PURIFY-2, along with causality and local discriminability, narrows the probabilistic theory down to something very close to the quantum one. In the present paper we test the above postulates on some nonquantum probabilistic models. The first model, the two-box world, is an extension of the Popescu-Rohrlich model (Found Phys 24:379, 1994), which achieves the greatest violation of the CHSH inequality compatible with the no-signaling principle. The second model, the two-clock world, is actually a full class of models, all having a disk as the convex set of states of the local system. One of them corresponds to the two-rebit world, namely qubits with real Hilbert space. The third model, the spin-factor, is a sort of n-dimensional generalization of the clock. Finally, the last model is classical probabilistic theory. We examine how each model violates some of the proposed postulates, when and how teleportation can be achieved, and we analyze other interesting connections between these postulate violations, along with deep relations between the local and the non-local structures of the probabilistic theory.
16.
Manipulatives, physical learning materials such as cubes or tiles, are prevalent in educational settings across cultures and have generated substantial research into how actions with physical objects may support children’s learning. The ability to integrate digital technology into physical objects, so-called ‘digital manipulatives’, has generated excitement over the potential to create new educational materials. However, without a clear understanding of how actions with physical materials lead to learning, it is difficult to evaluate or inform designs in this area. This paper is intended to contribute to the development of effective tangible technologies for children’s learning by summarising key debates about the representational advantages of manipulatives under two key headings: offloading cognition, where manipulatives may help children by freeing up valuable cognitive resources during problem solving; and conceptual metaphors, where perceptual information or actions with objects have a structural correspondence with more symbolic concepts. The review also indicates possible limitations of physical objects, most importantly that their symbolic significance is granted only by the context in which they are used. These arguments are then discussed in light of tangible designs drawing upon the authors’ current research into tangibles and young children’s understanding of number.
17.
M. Giesbrecht 《Computational Complexity》2001,10(1):41-69
We present a new probabilistic algorithm to compute the Smith normal form of a sparse integer matrix A. The algorithm treats A as a “black box”: A is only used to compute matrix-vector products, and we do not access individual entries of A directly. The algorithm requires about … black-box evaluations for word-sized primes p and …, plus additional bit operations. For sparse matrices this represents a substantial improvement over previously known algorithms. The new algorithm suffers from no “fill-in” or intermediate value explosion, and uses very little additional space. We also present an asymptotically fast algorithm for dense matrices, which requires about … bit operations, where O(MM(m)) operations are sufficient to multiply two m×m matrices over a field. Both algorithms are probabilistic of the Monte Carlo type: on any input they return the correct answer with a controllable, exponentially small probability of error.
Received: March 9, 2000.
18.
Linux malware can pose a significant threat, and its penetration is increasing exponentially, because little is known or understood about Linux OS vulnerabilities. We believe that now is the right time to devise non-signature-based zero-day (previously unknown) malware detection strategies, before Linux intruders take us by surprise. Therefore, in this paper we first perform a forensic analysis of Linux Executable and Linkable Format (ELF) files. Our forensic analysis provides insight into different features that have the potential to discriminate malicious executables from benign ones. As a result, we select a set of 383 features extracted from ELF headers. We quantify the classification potential of the features using information gain and then remove redundant features by employing preprocessing filters. Finally, we perform an extensive evaluation of classical rule-based machine learning classifiers (RIPPER, PART, C4.5 Rules, and decision tree J48) and bio-inspired classifiers (cAnt-Miner, UCS, XCS, and GAssist) to select the best classifier for our system. We have evaluated our approach on an available collection of 709 Linux malware samples from VX Heavens and Offensive Computing. Our experiments show that ELF-Miner provides more than 99% detection accuracy with less than a 0.1% false alarm rate.
19.
For a Markovian source, we analyze the Lempel-Ziv parsing scheme, which partitions a sequence into phrases such that each new phrase is the shortest phrase not seen in the past. We consider three models. In the Markov Independent model, several sequences are generated independently by Markovian sources, and the i-th phrase is the shortest prefix of the i-th sequence that was not seen before as a phrase (i.e., as a prefix of the previous i-1 sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, producing a sequence of variable (random) length. In the last model, known also as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees, which are of interest to other algorithms such as sorting, searching, and pattern matching. In this paper we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel-Ziv code. For the Markov Independent model, this finding is established by analytic methods (generating functions, Mellin transforms, and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
Received June 6, 2000; revised January 14, 2001.
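The parsing rule itself (each new phrase is the shortest prefix of the remaining input not previously seen as a phrase) is easy to state in code. This sketch implements the fixed-length-string variant, the paper’s “Lempel-Ziv model”, on a toy input:

```python
def lz_parse(text):
    """Partition a string into Lempel-Ziv phrases: each new phrase is
    the shortest prefix of the remaining text not yet in the phrase
    set (the final phrase may repeat an earlier one if the input ends).
    """
    phrases, seen, i = [], set(), 0
    while i < len(text):
        j = i + 1
        # extend the candidate phrase until it is new or the input ends
        while text[i:j] in seen and j < len(text):
            j += 1
        phrases.append(text[i:j])
        seen.add(text[i:j])
        i = j
    return phrases

print(lz_parse("ababaa"))  # ['a', 'b', 'ab', 'aa']
```

The number of phrases produced for a fixed-length input is exactly the random quantity whose average profile and limiting distribution the paper analyzes.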
20.
Computability and Complexity in Self-assembly
James I. Lathrop Jack H. Lutz Matthew J. Patitz Scott M. Summers 《Theory of Computing Systems》2011,48(3):617-647
This paper explores the impact of geometry on computability and complexity in Winfree’s model of nanoscale self-assembly. We work in the two-dimensional tile assembly model, i.e., in the discrete Euclidean plane ℤ×ℤ. Our first main theorem says that there is a roughly quadratic function f such that a set A ⊆ ℤ⁺ is computably enumerable if and only if the set X_A = {(f(n), 0) ∣ n ∈ A}, a simple representation of A as a set of points on the x-axis, self-assembles in Winfree’s sense. In contrast, our second main theorem says that there are decidable sets D ⊆ ℤ×ℤ that do not self-assemble in Winfree’s sense.