2.
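To quickly obtain the topical content and sentiment information in web text, this paper introduces the concept of text sentiment summarization and proposes a sentiment summary extraction method based on a conditional random field (CRF) model. Sentence length, cue words, and sentiment words are first extracted as basic features. A latent Dirichlet allocation (LDA) topic model is then applied to analyze the latent topic information of the text and extract topic features. Both kinds of features are fed into the CRF model to obtain the sentiment summary of the text. Experimental results show that the method captures the topic information of the text in fine detail while also accounting for the sentiment of each topic, and that the extraction quality is good enough to meet users' practical needs.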
3.
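In Chinese text classification, important features are scattered and sparse across the text, and different text features contribute differently to identifying the text category. To address these two problems, this paper proposes a multi-feature fusion model for Chinese text classification (3CLA) that combines a semantic-understanding-based attention neural network, a long short-term memory network (LSTM), and a convolutional neural network (CNN). The model first preprocesses the Chinese text by segmenting it into words and vectorizing it. The embedding layer output then passes through a CNN path, an LSTM path, and an attention path to extract text features at different levels and with different characteristics. Finally, the features are merged in a fusion layer and classified by a softmax classifier. Text classification experiments were carried out on Chinese corpora. The results show that, compared with the CNN-based and LSTM-based models, the proposed model improves Chinese text category recognition by up to about 8%.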
4.
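Building on an analysis of important-event recognition in text and of text classification methods, this paper proposes a text classification method based on important events. It focuses on the two key techniques the method involves: representing a text by its important events, and obtaining templates for the text categories. On the Chinese event corpus CEC, the proposed method reaches an average accuracy of 80%, whereas the traditional text classification method that uses words as features reaches an average accuracy of 72%.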
5.
Wonjun Kim, Changick Kim 《IEEE Transactions on Image Processing》 2009, 18(2): 401-411
Overlay text provides important semantic clues in video content analysis tasks such as video information retrieval and summarization, since the content of the scene or the editor's intention can be well represented by inserted text. Most previous approaches to extracting overlay text from videos are based on low-level features, such as edge, color, and texture information. However, existing methods have difficulty handling text with varying contrast or text inserted in a complex background. In this paper, we propose a novel framework to detect and extract overlay text from a video scene. Based on our observation that transient colors exist between inserted text and its adjacent background, a transition map is first generated. Candidate regions are then extracted by a reshaping method, and the overlay text regions are determined based on the occurrence of overlay text in each candidate. The detected overlay text regions are localized accurately using the projection of overlay text pixels in the transition map, and text extraction is finally performed. The proposed method is robust to different character sizes, positions, contrasts, and colors. It is also language independent. Overlay text regions are updated between frames to reduce processing time. Experiments on diverse videos confirm the efficiency of the proposed method.
6.
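Predicting a summary subgraph from an Abstract Meaning Representation (AMR) graph tends to yield an incomplete semantic structure. To address this problem, this paper proposes a semantic summarization algorithm that reconstructs the AMR graph structure with integer linear programming (ILP). The data is first preprocessed to generate a single overall AMR graph. The important nodes of the summary subgraph are then extracted from the overall AMR graph based on statistical features. Finally, ILP is used to reconstruct the relations between the nodes of the summary subgraph, and the completed summary subgraph is used to generate the semantic summary. Experimental results show that, compared with other semantic summarization methods, the proposed method significantly improves both ROUGE and Smatch scores, by up to 9% and 14% respectively, which helps improve the quality of semantic summaries.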
7.
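A Chinese Word Segmentation Algorithm Based on the N-gram Model and Machine Learning (cited by: 6; self-citations: 0; citations by others: 6)
Automatic Chinese word segmentation is a fundamental and difficult problem in Chinese information processing. This paper proposes a new method for segmenting Chinese sentences into words. The method is based on an N-gram model and uses an efficient Viterbi search algorithm to perform the segmentation. Because it adopts a machine-learning-based automatic word formation algorithm, no manually compiled domain dictionary is needed. The paper also discusses the definitions of two quantitative metrics for evaluating segmentation algorithms, precision and recall. On this basis, the proposed segmentation model was tested on both a closed corpus and an open corpus, and the results show that the model and algorithm achieve high precision and recall.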
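The Viterbi search over an N-gram model described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the simplest case N = 1 (a unigram model), and the probability table `TOY_PROBS`, the unknown-character penalty, and `MAX_WORD_LEN` are hypothetical values written by hand, whereas the paper learns its word list automatically.

```python
import math

# Hypothetical toy unigram probabilities (assumption for illustration only;
# the paper learns its lexicon instead of using a hand-written table).
TOY_PROBS = {
    "研究": 0.2,
    "研究生": 0.1,
    "生命": 0.2,
    "生": 0.05,
    "命": 0.05,
    "的": 0.2,
    "起源": 0.2,
}
UNKNOWN_LOGP = math.log(1e-8)  # assumed penalty for an unseen single character
MAX_WORD_LEN = 4               # assumed upper bound on word length


def segment(sentence, probs=TOY_PROBS):
    """Return the highest-probability segmentation under a unigram model."""
    n = len(sentence)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in probs:
                logp = math.log(probs[word])
            elif len(word) == 1:
                logp = UNKNOWN_LOGP  # unknown characters stand alone
            else:
                continue             # unknown multi-character strings are not words
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sentence[back[i]:i])
        i = back[i]
    return words[::-1]
```

With this toy table, `segment("研究生命的起源")` prefers 研究 / 生命 / 的 / 起源 over the competing split that starts with 研究生, because the product of the word probabilities is higher. A bigram or higher-order model would condition each word's probability on its predecessors, at the cost of a larger search state.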
9.
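Text in video data is an important source of information for video semantic understanding and retrieval. This paper studies the detection, localization, extraction, enhancement, and recognition of text in video. It proposes using a wavelet modulus maxima algorithm to detect where text is located in a video frame, applies a coarse-to-fine multi-level localization method together with a pyramid model to extract static and scrolling Chinese and English text at multiple scales, and finally binarizes the text regions. Experiments show that the proposed method achieves good results.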
11.
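Short text similarity computation plays a vital role in fields such as social networks, text mining, and natural language processing. Short texts are brief and have sparse features, and traditional short text similarity measures ignore category information. To address these issues, this paper proposes a short text similarity method that combines coupled distance discrimination with strong category features. On the one hand, the distance between two co-occurring words across the whole short text corpus is used to compute a term co-occurrence distance correlation, which is then used to weight the terms so as to capture both intra-term and inter-term relations, giving the coupled distance discrimination similarity of short texts. On the other hand, from a small amount of labeled supervised data, the feature terms with strong category-discriminating power in each class are extracted as the set of strong category features, the context of each term is used to disambiguate these features semantically, and the similarity between texts is then measured by the number of strong category features of the same category that they share. Finally, coupled distance discrimination and strong category features are combined to measure short text similarity. Experiments show that the proposed method improves the accuracy of short text similarity computation.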
13.
Laurence Danlos 《Annales des Télécommunications》 1989, 44(1-2): 101-110
There are two main models of translation system: 1) a transfer model, which includes a representation of the text in the source language and a transfer module that changes this representation into a representation of the text in the target language; 2) a pivot model, which includes a single representation shared by the texts in the source and target languages. The criteria used to choose between these two models are described. A transfer system, Eurotra, is then presented in detail. Eurotra, an R&D project developed within the European Economic Community, involves the nine official languages of the EEC. The large number of transfer modules (i.e. 72) requires that they be simplified as much as possible. This entails designing a syntactico-semantic representation that is abstract enough to reduce transfer to lexical transfer in a significant number of cases.
14.
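Exploration of and Proposals for Developing Foreign-Language Electrical Engineering Textbooks in Bilingual Teaching of Circuits Courses (cited by: 10; self-citations: 2; citations by others: 8)
This paper compares, in five respects, the salient overall features of the circuits textbooks published in China and abroad in recent years, and gives a brief review of the original English-language circuits textbook in use, "Fundamentals of Electric Circuits". Based on exploratory reforms made to the original textbook during bilingual teaching of the circuits course, it proposes that China write its own foreign-language textbooks for electrical engineering.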
16.
The named entity extraction task aims to extract entity mentions, including names of people, places, institutions, and so on, from unstructured text. It plays an important role in many natural language processing (NLP) tasks, such as knowledge base construction, automatic question answering, and information extraction. Most existing entity extraction studies are based on long text data, which are easier to annotate because they carry sufficient contextual information. Extracting entities from short texts such as search queries and conversations is still a challenging task. This paper proposes a dual pointer approach to entity mention extraction, which extracts each entity using two position pointers into the input sentence. The end-to-end deep neural network model based on this approach extracts entities by serially generating the dual pointers. Evaluation results on a public Chinese dataset show that the model achieves state-of-the-art results compared with the baseline models.
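The idea of marking an entity with two position pointers can be illustrated with a small decoding sketch. This is an assumption-laden illustration rather than the paper's model: the abstract does not specify the network or how the pointers are generated, so the function `decode_span`, its `max_len` parameter, and the hand-written score lists below are all hypothetical, and the exhaustive search over (start, end) pairs stands in for the paper's serial pointer generation.

```python
def decode_span(tokens, start_scores, end_scores, max_len=10):
    """Pick the (start, end) pair with the highest combined score.

    start_scores[i] and end_scores[i] are assumed to be per-token scores
    produced by some upstream model; here they are supplied by hand.
    The end pointer is constrained to lie at or after the start pointer.
    """
    best_pair = None
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider spans of at most max_len tokens.
        for e in range(s, min(len(tokens), s + max_len)):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best_pair = (s, e)
    if best_pair is None:  # empty input
        return []
    s, e = best_pair
    return tokens[s:e + 1]


# Hypothetical short query and hand-written scores for illustration.
tokens = list("北京天气怎么样")
start_scores = [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
end_scores = [0.1, 0.8, 0.05, 0.0, 0.0, 0.0, 0.05]
entity = decode_span(tokens, start_scores, end_scores)
```

Representing a mention by its two boundary positions avoids labeling every token, which is one reason pointer-style decoding is attractive for short inputs such as search queries.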
18.
A Multi-document Rhetorical Structure (MRS) is proposed for the multi-document automatic summarization task. In this structure, the interrelationships between text units, including the correlation between units calculated from a hierarchical topic tree, the rhetorical relationships, and the temporal relationships, are represented at different levels of granularity. MRS simplifies the traditional multi-document representation used in cross structure theory and supplements it with information on the change and distribution of event topics, which cannot be obtained in information fusion theory. Concretely, a series of algorithms is proposed, covering MRS construction, MRS-based multi-document information fusion, and summary generation. Sets of experiments test the ability of the MRS strategies to fuse multiple knowledge sources concurrently and show good results.
20.
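This paper analyzes syllable counting algorithms for English words and explores ways to improve counting accuracy. Test results show that adding matching rules based on the morphological features of words improves counting accuracy, and that adding the support of a syllable information dictionary further improves both accuracy and speed. The improved syllable counting algorithm can be used to build programs that measure the difficulty of English texts, providing a reference for selecting English texts for teaching or training objectively and efficiently. The algorithm is also suitable for other syllable-related quantitative analyses of English vocabulary.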
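The combination of rule-based counting with a dictionary lookup described above can be sketched as follows. The abstract does not list the actual rules, so this is only a common heuristic under stated assumptions: the vowel-group rule, the silent final "e" rule, and the entries in `SYLLABLE_DICT` are illustrative choices, not the paper's rule set or dictionary.

```python
import re

VOWELS = "aeiouy"
VOWEL_GROUP = re.compile(r"[aeiouy]+")

# Hypothetical exception dictionary for words the rules get wrong.
SYLLABLE_DICT = {"business": 2, "every": 2}


def count_syllables(word, exceptions=SYLLABLE_DICT):
    """Estimate the number of syllables in an English word."""
    w = word.lower().strip()
    if not w:
        return 0
    # Dictionary lookup first: exact for listed words and avoids the rule pass.
    if w in exceptions:
        return exceptions[w]
    # Base rule: each run of consecutive vowel letters counts as one syllable.
    count = len(VOWEL_GROUP.findall(w))
    # Morphological rule: a final "e" after a consonant is usually silent
    # ("make"), except in consonant + "le" endings ("table").
    if len(w) > 2 and w.endswith("e") and w[-2] not in VOWELS and count > 1:
        consonant_le = w[-2] == "l" and w[-3] not in VOWELS
        if not consonant_le:
            count -= 1
    return max(count, 1)
```

For example, `count_syllables("make")` gives 1 and `count_syllables("table")` gives 2 from the rules alone, while `count_syllables("business")` gives 2 only because of the dictionary entry. Summing such counts over a text is a typical input to readability formulas for measuring text difficulty.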