Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus |
| |
Authors: | Javier Tejada-Cárcamo, Hiram Calvo, Alexander Gelbukh, Kazuo Hara |
| |
Affiliation: | 1. San Pablo Catholic University, Arequipa, Peru; 2. Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico; 3. Nara Institute of Science and Technology, Nara, Japan |
| |
Abstract: | We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work builds on the method presented
by McCarthy et al. (2004) for finding the predominant sense of each word in an entire corpus. Their maximization algorithm lets weighted
terms (similar words) from a distributional thesaurus accumulate a score for each sense of an ambiguous word; the sense
with the highest score, based on votes from a weighted list of terms related to the ambiguous word, is chosen. This list is
taken from a thesaurus built with the distributional similarity method proposed by Dekang Lin. In the method of McCarthy
et al., every occurrence of the ambiguous word uses the same thesaurus entry, regardless of the context in which the word occurs.
Our method takes context into account by building the list of distributionally similar words from the syntactic context
of each occurrence of the ambiguous word. Tested on SemCor, our method reaches a top precision of 77.54%, versus
67.10% for the original method. We also analyze the effect of the number of weighted terms on the tasks of
finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model. |
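The voting scheme summarized in the abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the prevalence score of McCarthy et al. (2004), in which each neighbor's distributional weight dss(w, n) is split among the senses of w in proportion to a WordNet-based similarity wnss(s, n). The function names, the two senses of "bank", and every similarity value are hypothetical toy data, and the context-dependent neighbor list is simply given here rather than computed from the syntactic context.

```python
from typing import Callable, Dict, List, Tuple

# (neighbor word, distributional similarity weight dss(w, n)) pairs
Neighbors = List[Tuple[str, float]]


def prevalence_scores(senses: List[str], neighbors: Neighbors,
                      wnss: Callable[[str, str], float]) -> Dict[str, float]:
    """Accumulate weighted votes from the neighbor list for each sense."""
    scores = {s: 0.0 for s in senses}
    for word, dss in neighbors:
        sims = {s: wnss(s, word) for s in senses}
        total = sum(sims.values())
        if total == 0.0:
            continue  # a neighbor unrelated to every sense casts no vote
        for s in senses:
            # split this neighbor's weight among senses by semantic similarity
            scores[s] += dss * sims[s] / total
    return scores


def predominant_sense(senses: List[str], neighbors: Neighbors,
                      wnss: Callable[[str, str], float]) -> str:
    """Return the sense with the highest accumulated score."""
    scores = prevalence_scores(senses, neighbors, wnss)
    return max(scores, key=scores.get)


# Hypothetical toy data for "bank"; all numbers are invented for illustration.
SENSES = ["bank#finance", "bank#river"]
WNSS_TABLE = {
    ("bank#finance", "money"): 0.9, ("bank#river", "money"): 0.1,
    ("bank#finance", "loan"): 0.8, ("bank#river", "loan"): 0.1,
    ("bank#finance", "shore"): 0.1, ("bank#river", "shore"): 0.9,
    ("bank#finance", "river"): 0.1, ("bank#river", "river"): 0.9,
}


def wnss(sense: str, word: str) -> float:
    """Hypothetical sense-to-word semantic similarity lookup."""
    return WNSS_TABLE.get((sense, word), 0.0)


# Static list: one thesaurus entry shared by every occurrence of "bank".
STATIC = [("money", 0.30), ("loan", 0.25), ("shore", 0.20)]
# Hypothetical context-specific list for an occurrence like "fished from the bank".
DYNAMIC = [("shore", 0.40), ("river", 0.35), ("money", 0.05)]

if __name__ == "__main__":
    print(predominant_sense(SENSES, STATIC, wnss))   # bank#finance
    print(predominant_sense(SENSES, DYNAMIC, wnss))  # bank#river
```

With the static list the finance sense wins (about 0.51 vs 0.24); with the hypothetical context-specific list the river sense wins (0.68 vs 0.12). Only the neighbor list changes between the two calls, which is the effect that using context as a dynamic thesaurus relies on.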
| |
This article is indexed in Wanfang Data, SpringerLink, and other databases.