Similar Literature
1.
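Zhang Yufang, Xu Anlong. Journal of Computer Applications (《计算机应用》), 2012, 32(5): 1329-1331
Current hybrid approaches to similarity computation do not fully analyze the factors that affect semantic similarity. To address this, a comprehensive method based on multiple factors that influence the semantic similarity of terms is proposed. The method combines semantic hierarchy, semantic distance, and local semantic density, making full use of the semantic information in the ontology to compute the semantic similarity between gene terms. Experimental results show that the method achieves a higher correlation coefficient with human ratings.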

2.
In this paper, we propose a novel approach to tag recommendation based on a topic ontology. The approach generates tag suggestions for blogs. We construct the topic ontology by enriching the set of categories in a small existing ontology, the Open Directory Project. To construct it, a set of topics and their associated semantic relationships is identified automatically from corpus-based external knowledge resources such as Wikipedia and WordNet. The construction has two stages: concept acquisition and semantic relation extraction. In the first stage, a topic-mapping algorithm acquires concepts from the semantics of Wikipedia, and a semantic similarity clustering algorithm computes a semantic similarity measure to group similar concepts. In the second stage, a semantic relation extraction algorithm derives the semantic relations between the extracted topics from the lexical patterns between synsets in WordNet. A software prototype implements the topic ontology construction process. The Jena API framework is used to organize the extracted semantic concepts and their relationships as a knowledge representation in the Web Ontology Language, and the Protégé tool provides the platform for visualizing the automatically constructed topic ontology. Using the constructed topic ontology, we can generate and suggest the most suitable tags for a new resource. Combining the topic ontology with a spreading activation algorithm supports efficient recommendation in practice and can recommend the most popular tags for a specific resource. The spreading activation algorithm assigns interest scores to the existing extracted blog content and tags. The weight of each tag is computed from its activation score, which is determined by the similarity between the topics in the constructed topic ontology and the content of the existing blogs. The tags with the highest activation scores are recommended to users. Finally, we evaluate our tag recommendation approach experimentally on a large set of real-world data sets. The experiments compare the proposed topic ontology with spreading activation against the existing AutoTag mechanism, and we discuss the improvement in precision and recall of the recommended tags on the Delicious and BibSonomy data sets. The experiments show that tag recommendation using the topic ontology results in folksonomy enrichment. Thus, we report the results of an experiment meant to improve the performance and quality of the tag recommendation approach.

3.
4.
This paper proposes a self-organized genetic algorithm for text clustering based on ontology methods. A common problem in text clustering is that a document is represented as a bag of words, so conceptual similarity is ignored. We use thesaurus-based and corpus-based ontologies to overcome this problem. However, the traditional corpus-based method is difficult to apply. A transformed latent semantic indexing (LSI) model that appropriately captures the associated semantic similarity is proposed and demonstrated as a corpus-based ontology in this article. To investigate how ontology methods can be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experimental results show that our genetic algorithm combined with the ontology strategy, which pairs the transformed LSI-based measure with the thesaurus-based measure, outperforms the same algorithm with traditional similarity measures. Our clustering algorithm also improves performance over the standard GA and k-means under the same similarity measures.

5.
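A Method for Computing Semantic Similarity Between Gene Ontology Terms   Total citations: 2, self-citations: 0, citations by others: 2
Computing the semantic similarity of terms in the Gene Ontology is one of its important applications. Information-content-based and distance-based methods each compute the similarity between terms from only their own perspective. A method based on the directed acyclic graph in which the terms reside is proposed. It considers both the influence of a term's ancestors on its information content and the term's position and the types of semantic relations between terms. Experimental results show that the method achieves relatively high accuracy.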

6.
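A New Method for Measuring Gene Semantic Similarity   Total citations: 1, self-citations: 1, citations by others: 0
Using GO (Gene Ontology) to measure the similarity between genes has been a research hotspot in recent years. Traditional methods have certain shortcomings in accuracy, so this paper proposes a new method for measuring the semantic similarity between genes. Its main principle is to rely simultaneously on the path length between gene annotation terms in the GO topology graph and on the depth of their common ancestor node in that graph. Experiments on artificial data and on gene data taken from a yeast gene database show that the method is more effective than traditional methods.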
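
The abstract above names two ingredients, path length between terms and depth of their common ancestor, but does not give a formula. The sketch below is therefore only an illustration: it assumes a Wu-Palmer-style score, 2·depth / (path + 2·depth), evaluated on a toy is-a graph. The term names, the `PARENTS` table, and the function names are hypothetical and are not taken from the paper or from GO.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents). Hypothetical terms, not real GO IDs.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
    "lipid_transport": ["transport", "lipid_metabolism"],
}


def ancestor_distances(term):
    """Map each ancestor of `term` (including itself) to its shortest upward distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def depth(term):
    """Depth of a term, taken here as its shortest distance to the root."""
    return ancestor_distances(term)["root"]


def similarity(a, b):
    """Assumed Wu-Palmer-style score, maximized over all common ancestors."""
    dist_a, dist_b = ancestor_distances(a), ancestor_distances(b)
    best = 0.0
    for common in set(dist_a) & set(dist_b):
        path = dist_a[common] + dist_b[common]  # path from a to b through this ancestor
        d = depth(common)
        score = 1.0 if path + 2 * d == 0 else 2 * d / (path + 2 * d)
        best = max(best, score)
    return best


if __name__ == "__main__":
    print(similarity("lipid_metabolism", "sugar_metabolism"))
    print(similarity("lipid_transport", "sugar_metabolism"))
```

A deeper common ancestor raises the score and a longer connecting path lowers it, which is the qualitative behavior the abstract describes; the actual weighting used by the authors may differ.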

7.
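An Application of Probabilistic Decision Trees to Bioinformatics Databases   Total citations: 1, self-citations: 0, citations by others: 1
GO (Gene Ontology) is a standardized bioinformatics ontology that is widely used to annotate gene databases. However, because of flaws in the structural design of GO, because gene databases are currently annotated mostly by hand, and because many properties of genes remain undiscovered, these annotations are still incomplete. This paper uses probabilistic decision trees to learn the intrinsic relationship between genes and the GO ontology and then to predict a gene's ontology annotations, that is, its unknown properties. The predictions can guide gene database curators in completing and correcting ontology annotations and can help biologists design targeted experiments. As an application, experiments on the MGI gene database show that the predictions obtained with this method are fairly accurate.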

8.
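An Improved Ontology-Based Semantic Similarity Computation and Its Application   Total citations: 5, self-citations: 1, citations by others: 5
Word similarity is an important topic in knowledge representation and information retrieval. It is usually computed statistically from large-scale corpora. Ontologies offer a new opportunity for computing the similarity between words. Using the ISA relations in the ontology structure, a method for computing the similarity between concepts within an ontology is proposed. Experimental results show that the method makes full use of the characteristics of the ontology to compute the similarity between related concepts. Using a simple ontology, the paper describes how to compute the similarity between concepts and how the method is applied in an intelligent retrieval system.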

9.
Assessing semantic similarity is a fundamental requirement for many AI applications. Crisp ontology (CO) is one of the knowledge representation tools that can be used for this purpose. Thanks to the development of the semantic web, CO-based similarity assessment has become a popular approach in recent years. However, in the presence of vague information, CO cannot account for the uncertainty of relations between concepts. Fuzzy ontology (FO), by contrast, can effectively handle the uncertainty of concepts and their relations. This paper proposes an approach for assessing concept similarity based on FO. The approach combines fuzzy relation composition with edge counting to assess similarity, so the proposed measure relies on the taxonomical features of an ontology together with the statistical features of concepts. Furthermore, an evaluation approach for the FO-based similarity measure, named FOSE, is proposed. The proposed similarity measure is evaluated with FOSE on social network data. The evaluation results show that the proposed approach outperforms its CO-based counterpart.
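
The abstract above mentions fuzzy relation composition without saying which composition operator is used. As a hedged illustration, the sketch below shows the standard max-min composition of fuzzy relations on a toy relation; the concept names, membership degrees, and the function name `max_min_compose` are assumptions made for this example, and the paper's combination with edge counting is not reproduced.

```python
def max_min_compose(r, s):
    """Max-min composition: (r o s)(x, z) = max over y of min(r(x, y), s(y, z))."""
    result = {}
    for x, row in r.items():
        for y, r_xy in row.items():
            for z, s_yz in s.get(y, {}).items():
                degree = min(r_xy, s_yz)
                if degree > result.setdefault(x, {}).get(z, 0.0):
                    result[x][z] = degree
    return result


# Hypothetical fuzzy "is-a" degrees between concepts (values in [0, 1]).
IS_A = {
    "sedan": {"car": 0.9, "vehicle": 0.3},
    "car": {"vehicle": 0.8},
}

if __name__ == "__main__":
    # Two-step relation: how strongly "sedan" relates to "vehicle" via an intermediate concept.
    print(max_min_compose(IS_A, IS_A))
```

In this toy relation the indirect link from "sedan" to "vehicle" through "car" is stronger than the direct one, which is the kind of evidence a crisp ontology, with only 0 or 1 degrees, cannot express.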

10.
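VSM cannot reveal the latent semantic relations between feature terms in a document, which lowers the accuracy of similarity computation. To address this, a document vector space model based on a domain ontology is proposed. It draws on the structural characteristics of the ontology model and computes the similarity between concepts from semantic overlap, semantic distance, and ontology structure combined. The model adjusts feature term weights through a semantic similarity matrix between concepts, builds vector space models of the reference and student answers that include semantic relations, and evaluates the similarity between a student answer and the reference answer with the "VSM model + cosine" algorithm. Experiments show that, compared with traditional methods, the method improves assessment quality and accuracy.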
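
The abstract above does not state how the semantic similarity matrix adjusts the term weights. The sketch below assumes the simplest rule, multiplying each weight vector by the matrix before taking the cosine, to show why the adjustment matters for answer grading. The vocabulary, the matrix values, and the function names are hypothetical.

```python
import math

# Hypothetical vocabulary and concept similarity matrix (symmetric, 1.0 on the diagonal).
TERMS = ["car", "automobile", "banana"]
SIM = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def adjust(weights):
    """Assumed adjustment rule: spread each term's weight onto semantically similar terms."""
    n = len(weights)
    return [sum(SIM[i][j] * weights[j] for j in range(n)) for i in range(n)]


def cosine(u, v):
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


reference_answer = [1.0, 0.0, 0.0]  # the reference answer uses "car"
student_answer = [0.0, 1.0, 0.0]    # the student answer uses "automobile"

if __name__ == "__main__":
    print("plain VSM cosine:", cosine(reference_answer, student_answer))
    print("adjusted cosine:", cosine(adjust(reference_answer), adjust(student_answer)))
```

With plain VSM the two answers share no term and score zero; after the assumed adjustment the synonym pair is recognized and the score approaches one.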

11.
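Ontology-based concept semantic similarity has been widely applied in many areas of information science in recent years, and its computation has attracted the attention of many researchers. After analyzing the working principles, strengths, and weaknesses of existing ontology-based methods, this paper proposes an algorithm that combines, with weights, the overlap of the concepts' shared paths and the depth of their lowest common ancestor node. The algorithm is flexible, simple, and highly extensible, and it can be applied to different types of ontologies. Experiments on part of the Gene Ontology and Plant Ontology data, with comparison against two existing algorithms, demonstrate the correctness and effectiveness of the proposed method.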

12.
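An Ontology-Based Method for Computing Sentence Similarity   Total citations: 2, self-citations: 0, citations by others: 2
Liu Hongzhe. Computer Science (《计算机科学》), 2013, 40(1): 251-256
A sentence similarity computation method based on a tree-structured ontology is proposed. Using a semantic index built between ontology concepts and the keywords in a sentence, direct and indirect semantic links between the sentence and the ontology are constructed. From these links, a semantic vector describing the sentence is extracted and used to compute the semantic similarity between sentences. The method was validated on the Microsoft Research Paraphrase Corpus (MSRP). The results show that, compared with related methods, it achieves good precision and recall when the additional information available is incomplete.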

13.
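Image Semantic Annotation and Retrieval Based on Personalized Ontology   Total citations: 1, self-citations: 0, citations by others: 1
To address the difficulty that current image retrieval systems have in supporting semantic retrieval, a new ontology-centered model for image semantic annotation and retrieval is proposed. A personalized ontology is built to describe image semantics. Concept-set-based semantic features of images are then extracted, and a similarity measure is designed using the "Is-A" relations in the ontology, ultimately enabling retrieval with semantic expansion. The difficulties lie in evolving a top-level ontology into a personalized ontology and in measuring semantic similarity on the basis of concept sets and "Is-A" relations. A preliminary implementation and related experiments show that the model's retrieval accuracy reaches 88.6%, clearly higher than traditional keyword-based image retrieval and image retrieval based on a general ontology.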

14.
Biclusters are subsets of genes that exhibit similar behavior over a set of conditions. A biclustering algorithm is a useful tool for uncovering groups of genes involved in the same cellular processes and groups of conditions under which these processes take place. In this paper, we propose a polynomial-time algorithm to identify functionally highly correlated biclusters. Our algorithm identifies (1) gene sets that simultaneously exhibit additive, multiplicative, and combined patterns and tolerate high levels of noise, (2) multiple, possibly overlapping, and diverse gene sets, (3) biclusters that simultaneously exhibit negatively and positively correlated gene sets, and (4) gene sets for which the functional association is very high. We validate the level of functional association in our method by using the GO database, protein-protein interactions, and KEGG pathways.

15.
Semantic-oriented service matching is one of the challenges in automatic Web service discovery. Service users may search for Web services using keywords and receive the matching services in terms of their functional profiles. A number of approaches to computing the semantic similarity between words have been developed to enhance the precision of matchmaking; they can be classified into ontology-based and corpus-based approaches. Ontology-based approaches commonly use the differentiated concept information provided by a large ontology to measure lexical similarity with word sense disambiguation. Nevertheless, most ontologies are domain-specific and limited in lexical coverage, which limits their applicability. Corpus-based approaches, on the other hand, rely on the distributional statistics of context to represent each word as a vector and measure the distance between word vectors. However, polysemy may lead to low computational accuracy. In this paper, in order to augment the semantic information content of word vectors, we propose a multiple semantic fusion (MSF) model that generates a sense-specific vector for each word. In this model, various semantic properties of the general-purpose ontology WordNet are integrated, through vector combination strategies, to fine-tune the distributed word representations learned from a corpus. The retrofitted word vectors are modeled as semantic vectors for estimating semantic similarity. The MSF model-based similarity measure is validated against other similarity measures on multiple benchmark datasets. Experimental results of word similarity evaluation indicate that our method obtains a higher correlation coefficient with human judgment in most cases. Moreover, the proposed similarity measure is shown to improve on the performance of Web service matchmaking based on a single semantic resource. Accordingly, our findings provide a new method and perspective for understanding and representing lexical semantics.

16.
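Research on Semantic Similarity Based on Gene Ontology   Total citations: 3, self-citations: 0, citations by others: 3
Wei Wei, Xiang Yang, Chen Qian. Computer Engineering (《计算机工程》), 2010, 36(20): 209-210
Targeting the directed acyclic graph structure of the Gene Ontology, a new method for computing the semantic similarity between Gene Ontology terms is proposed. The method finds the common ancestors of two terms and the disjoint ancestors that satisfy its conditions, computes the average information content of the disjoint ancestors and the average information content of the two terms, and takes the ratio of the two averages as the semantic similarity of the terms. Experimental results show that the method is relatively accurate.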
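
The abstract above defines similarity as a ratio of two information-content averages but does not spell out the conditions for selecting disjoint ancestors. The sketch below is a deliberate simplification under that gap: it uses only the single most informative common ancestor in the numerator, with information content defined as IC(t) = -ln p(t). The term names, the probabilities in `PROB`, and the function names are hypothetical.

```python
import math

# Toy is-a DAG (child -> parents) and hypothetical annotation probabilities p(term).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}
PROB = {"root": 1.0, "binding": 0.5, "catalysis": 0.4, "dna_binding": 0.1, "rna_binding": 0.2}


def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -ln p(t): rarer terms carry more information."""
    return -math.log(PROB[term])


def similarity(a, b):
    """Simplified ratio: IC of the most informative common ancestor over the mean IC of a and b."""
    shared = max(information_content(c) for c in ancestors(a) & ancestors(b))
    mean_ic = (information_content(a) + information_content(b)) / 2
    return shared / mean_ic if mean_ic > 0 else 1.0


if __name__ == "__main__":
    print(similarity("dna_binding", "rna_binding"))
```

Replacing the `max` with an average over a filtered set of disjoint common ancestors would move this sketch closer to the method described, once the selection conditions are known.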

17.
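Yan Jingjing. Journal of Computer Applications (《计算机应用》), 2011, 31(7): 1751-1755
An ontology-based information filtering method is proposed. The method uses an ontology to provide a formal semantic description and performs ontology-based semantic expansion of the original input conditions under constraint rules. To enable semantic matching, a semantic description of information vectors and a weight computation method are then given. Finally, information filtering based on semantic similarity computation is implemented. Experiments show that the method is effective.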

18.
The estimation of semantic similarity between words is an important task in many language-related applications. In the past, several approaches that assess similarity by evaluating the knowledge modelled in an ontology have been proposed. However, in many domains, knowledge is dispersed across several partial and/or overlapping ontologies. Because most previous work on semantic similarity supports only a single input ontology, we propose a method that enables similarity estimation across multiple ontologies. Our method distinguishes different cases according to which ontology or ontologies the input terms belong to. We propose several heuristics to deal with each case, aiming to fill in missing values when only partial knowledge is available and to capture the strongest semantic evidence, which yields the most accurate similarity assessment, when dealing with overlapping knowledge. We evaluate and compare our method using several general-purpose and biomedical benchmarks of word pairs whose similarity has been assessed by human experts, together with a general-purpose ontology (WordNet) and biomedical ontologies (SNOMED CT and MeSH). Results show that our method improves the accuracy of similarity estimation in comparison with single-ontology approaches and with state-of-the-art related work in multi-ontology similarity assessment.

19.
On ontology-driven document clustering using core semantic features   Total citations: 2, self-citations: 1, citations by others: 1
Incorporating semantic knowledge from an ontology into document clustering is an important but challenging problem. While numerous methods have been developed, the value of using such an ontology is still not clear. We show in this paper that an ontology can be used to greatly reduce the number of features needed for document clustering. Our hypothesis is that polysemous and synonymous nouns are both relatively prevalent and fundamentally important for document cluster formation. We show that nouns can be efficiently identified in documents and that this alone provides improved clustering. We next show the importance of the polysemous and synonymous nouns in clustering and develop a unique approach that allows us to measure the information gain in disambiguating these nouns in an unsupervised learning setting. In so doing, we can identify a core subset of semantic features that represent a text corpus. Empirical results show that by using core semantic features for clustering, one can reduce the number of features by 90% or more and still produce clusters that capture the main themes in a text corpus.

20.
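Research on a Similarity Computation Method Based on Semantic Elements   Total citations: 6, self-citations: 1, citations by others: 5
To address the limitations of existing similarity measures, attributes are semantically extended and a similarity computation method based on the support degree of semantic elements is proposed. The method represents the intension of a concept with semantic elements and introduces a support degree to express how much each semantic element contributes to representing the concept. Taking relatedness, similarity, asymmetry, and the support degree of semantic elements into account together, it measures concept similarity by comparing the similarity of semantic elements. Relations are compared by treating them as a special kind of concept, which yields a semantic-element-based ontology similarity measure. Finally, a comparison with other methods shows that the method gives more reasonable results and confirms its effectiveness and correctness.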
