1.

叶硕褚钰王祎李田港《计算机技术与发展》2020,(3):181-186

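Intelligent speech technology covers three areas: speech recognition, natural language processing, and speech synthesis. Of these, speech recognition is the key technology for human-computer interaction, and a recognition system usually requires both an acoustic model and a language model. The rise of neural networks has led to a sharp increase in the number of acoustic models, and combining neural-network-based acoustic models with traditional recognition models has greatly advanced speech recognition. As the front end of human-computer interaction, speech recognition has many research directions. This paper focuses on summarizing the state of acoustic-model research in three directions of the speech recognition task: text recognition, speaker recognition, and emotion recognition. It describes the evolution of speech recognition technology in as much detail as possible to provide a valuable reference for future research. It also summarizes and compares current mainstream speech recognition methods, introduces the advantages of end-to-end speech recognition models, analyzes development trends, and finally identifies the challenges that speech recognition tasks currently face.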

2.

Named Entity Recognition in Requirements Documents Based on Deep Learning and Grammar Rules

许梦笛王金华《计算机与现代化》2021,(1):105-110

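Named entity recognition is a key task in natural language processing. Requirements documents contain overly long entities, namely virtual functions (虚功能), which prevent general-purpose traditional named entity recognition methods from effectively recognizing complete entities. This paper studies entity recognition models for requirements documents in depth, introduces deep learning, and proposes combining a CNER method based on a deep residual network (ResNet) with a rule-based method to segment words in Chinese requirements documents. The named entity recognition model in this paper...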

3.

The Hierarchical Hidden Markov Model: Analysis and Applications (total citations: 20; self-citations: 0; citations by others: 20)

Shai Fine Yoram Singer Naftali Tishby 《Machine Learning》1998,32(1):41-62

We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.

4.

Exploratory analysis of concept and document spaces with connectionist networks

Dieter Merkl Erich Schweighoffer Werner Winiwarter 《Artificial Intelligence and Law》1999,7(2-3):185-209

Exploratory analysis is an area of increasing interest in the computational linguistics arena. Pragmatically speaking, exploratory analysis may be paraphrased as natural language processing by means of analyzing large corpora of text. Concerning the analysis, appropriate means are statistics, on the one hand, and artificial neural networks, on the other hand. As a challenging application area for exploratory analysis of text corpora we may certainly identify text databases, be it information retrieval or information filtering systems. With this paper we present recent findings of exploratory analysis based on both statistical and neural models applied to legal text corpora. Concerning the artificial neural networks, we rely on a model adhering to the unsupervised learning paradigm. This choice appears naturally when taking into account the specific properties of large text corpora where one is faced with the fact that input-output-mappings as required by supervised learning models cannot be provided beforehand to a satisfying extent. This is due to the fact of the highly changing contents of text archives. In a nutshell, artificial neural networks count for their highly robust behavior regarding the parameters for model optimization. In particular, we found statistical classification techniques much more susceptible to minor parameter variations than unsupervised artificial neural networks. In this paper we describe two different lines of research in exploratory analysis. First, we use the classification methods for concept analysis. The general goal is to uncover different meanings of one and the same natural language concept. A task that, obviously, is of specific importance during the creation of thesauri. As a convenient environment to present the results we selected the legal term of neutrality, which is a perfect representative of a concept having a number of highly divergent meanings. 
Second, we describe the classification methods in the setting of document classification. The ultimate goal in such an application is to uncover semantic similarities of various text documents in order to increase the efficiency of an information retrieval system. In this sense, document classification has its fixed position in information retrieval research from the very beginning. Nowadays renewed massive interest in document classification may be witnessed due to the appearance of large-scale digital libraries.

5.

End-to-end scene text recognition using tree-structured models

Cunzhao Shi Chunheng Wang Baihua Xiao Song Gao Jinlong Hu 《Pattern recognition》2014

Detecting and recognizing text in natural images are quite challenging and have received much attention from the computer vision community in recent years. In this paper, we propose a robust end-to-end scene text recognition method, which utilizes tree-structured character models and normalized pictorial structured word models. For each category of characters, we build a part-based tree-structured model (TSM) so as to make use of the character-specific structure information as well as the local appearance information. The TSM could detect each part of the character and recognize the unique structure as well, seamlessly combining character detection and recognition together. As the TSMs could accurately detect characters from complex background, for text localization, we apply TSMs for all the characters on the coarse text detection regions to eliminate the false positives and search the possible missing characters as well. While for word recognition, we propose a normalized pictorial structure (PS) framework to deal with the bias caused by words of different lengths. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods both for text localization and word recognition.

6.

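An Integrated Algorithm for Character Segmentation and Recognition in Video (total citations: 3; self-citations: 0; citations by others: 3)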

杨武夷张树武《自动化学报》2010,36(10):1468-1476

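The technical difficulties of recognizing text-line images in video come mainly from two sources: 1) segmenting and recognizing touching characters; 2) segmenting and recognizing characters against complex backgrounds. To segment and recognize characters in both cases at once, an integrated character segmentation and recognition algorithm is proposed. The algorithm first binarizes the text-line image and estimates the text-line height from the horizontal projection of the binarized image. Next, depending on how heavily the character strokes touch, it segments the wide connected components in the binary image using either image analysis or character recognition. It then combines connected components on the basis of character recognition to obtain candidate recognition results. Finally, it builds a word graph from the candidates and uses a language model to select the character recognition result from the graph. Experiments show that the integrated algorithm greatly reduces the recognition error rate for touching characters and for characters against complex backgrounds.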
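
The first step of the abstract above (estimating text-line height from the horizontal projection of a binarized image) can be illustrated with a minimal sketch. The function name `estimate_line_height`, the nested-list image format, and the 10% threshold are assumptions made for illustration; the paper's actual estimator is not given in the abstract.

```python
def estimate_line_height(binary, ratio=0.1):
    """Estimate text-line height from a binarized image.

    binary: list of rows, each a list of 0/1 pixels (1 = text foreground).
    ratio:  assumed fraction of the peak row sum below which a row is
            treated as background or noise (hypothetical choice).
    """
    # Horizontal projection: number of foreground pixels in each row.
    profile = [sum(row) for row in binary]
    peak = max(profile, default=0)
    if peak == 0:
        return 0  # no foreground pixels at all

    # Rows whose projection clearly exceeds the noise threshold.
    threshold = ratio * peak
    text_rows = [i for i, value in enumerate(profile) if value > threshold]

    # Height spans the first to the last row that contains text.
    return text_rows[-1] - text_rows[0] + 1


# A 12-pixel-wide toy image: one noise pixel, then three solid text rows.
image = [
    [1] + [0] * 11,
    [0] * 12,
    [1] * 12,
    [1] * 12,
    [1] * 12,
    [0] * 12,
]
height = estimate_line_height(image)
```

In this toy image the isolated pixel in the first row falls below the threshold, so only the three solid rows contribute to the estimate.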

7.

Open-Set Text Recognition Technology

杨春刘畅方治屿韩铮刘成林殷绪成《中国图象图形学报》2023,28(6):1767-1791

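In pattern recognition and text recognition applications in open environments, new data, new patterns, and new categories keep emerging, which requires algorithms to handle patterns from new categories. To address this, researchers have begun to focus on the open-set text recognition (OSTR) task. The task requires that, at test (inference) time, an algorithm both recognize the character categories seen in the training set and recognize, reject, or discover new characters not seen in the training set. Open-set text recognition is gradually becoming one of the research hotspots in text recognition. This paper first briefly summarizes open-set pattern recognition techniques, then focuses on the research background, task definition, basic concepts, research priorities, and technical difficulties of open-set text recognition. For the three major problems of open-set text recognition (unknown sample discovery, novel category recognition, and contextual information bias), it surveys related work in terms of model structure, characteristics and advantages, and application scenarios. Finally, it analyzes and looks ahead to development trends and research directions for open-set text recognition.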

8.

A language model using variable length tokens for open-vocabulary Hangul text recognition

Sungho Ryu Jin Hyung Kim 《Pattern recognition》2004,37(7):1549-1552

We propose a novel language model for Hangul text recognition. Without relying on prior linguistic knowledge in training, the proposed model learns variable length Hangul character sequences, which comprise the elementary tokens of Korean language, and their probabilities from statistics of a raw text corpus. Experiments in handwritten Hangul recognition show that the proposed language model is effective in postprocessing of recognition results.

9.

Fast RF-UIC: A fast unsupervised image captioning model

《Displays》2023

10.

Automatic Recognition of Product Opinion Targets in a Specific Domain (total citations: 2; self-citations: 0; citations by others: 2)

宋晓雷王素格李红霞《中文信息学报》2010,24(1):89-94

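Automatic recognition of product opinion targets is an important research topic in opinion extraction and sentiment orientation analysis of text. For car reviews, this paper proposes an unsupervised method for automatically recognizing opinion targets that does not rely on external resources. The method first combines word-form templates and part-of-speech templates and uses fuzzy matching and pruning to extract candidate opinion targets. Then, from the candidate set, it uses a bidirectional bootstrapping method to identify product opinion targets. Finally, it clusters the product opinion targets with K-means to automatically extract product names and product attributes from them. Experimental results show that the method achieves an F-score of 58.5% for product opinion target recognition and 69.48% for product name recognition.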

11.

Computational personality recognition from Facebook text: psycholinguistic features, words and facets

Wesley R. dos Santos Ricelli M. S. Ramos 《New Review of Hypermedia and Multimedia》2019,25(4):268-287

Advances in the Natural Language Processing (NLP) and machine learning fields have led to the development of automated methods for the recognition of personality traits from text available from social media and similar sources. Systems of this kind exploit the close relation between lexical knowledge and personality models, such as the well-known Big Five model, to provide information about the author of an input text in a non-intrusive fashion, and at a low cost. Although now a well-established research topic in the field, the computational recognition of personality traits from text still leaves a number of research questions worth further exploration. In particular, this paper attempts to shed light on three main issues: (i) whether we may develop psycholinguistics-motivated models of personality recognition when such knowledge sources are not available for the target language under consideration; (ii) whether the use of psycholinguistic knowledge may be still superior to contemporary word vector representations; and (iii) whether we may infer certain personality facets from a corpus that does not explicitly convey this information. In this paper these issues are dealt with in a series of individual experiments of personality recognition from Facebook text, whose initial results should aid the future development of more robust systems of this kind.

12.

Integrating natural language understanding with document structure analysis

Suzanne Liebowitz Taylor Deborah A. Dahl Mark Lipshutz Carl Weir Lewis M. Norton Roslyn Weidner Nilson Marcia C. Linebarger 《Artificial Intelligence Review》1994,8(2-3):255-276

Document understanding, the interpretation of a document from its image form, is a technology area which benefits greatly from the integration of natural language processing with image processing. We have developed a prototype of an Intelligent Document Understanding System (IDUS) which employs several technologies: image processing, optical character recognition, document structure analysis and text understanding in a cooperative fashion. This paper discusses those areas of research during development of IDUS where we have found the most benefit from the integration of natural language processing and image processing: document structure analysis, optical character recognition (OCR) correction, and text analysis. We also discuss two applications which are supported by IDUS: text retrieval and automatic generation of hypertext links.

13.

Estimation of the fundamental matrix from uncalibrated stereo hand images for 3D hand gesture recognition

Xiaoming Yin 《Pattern recognition》2003,36(3):567-584

All 3D hand models employed for hand gesture recognition so far use kinematic models of the hand. We propose to use computer vision models of the hand, and recover hand gestures using 3D reconstruction techniques. In this paper, we present a new method to estimate the epipolar geometry between two uncalibrated cameras from stereo hand images. We first segment hand images using the RCE neural network based color segmentation algorithm and extract edge points of fingers as points of interest, then match them based on the topological features of the hand. The fundamental matrix is estimated using a combination of techniques such as input data normalization, rank-2 constraint, linear criterion, nonlinear criterion as well as M-estimator. This method has been tested with real calibrated and uncalibrated images. The experimental comparison demonstrates the effectiveness and robustness of the method.

14.

Effect of acoustic and linguistic contexts on human and machine speech recognition

《Computer Speech and Language》2014,28(3):769-787

We compared the performance of an automatic speech recognition system using n-gram language models, HMM acoustic models, as well as combinations of the two, with the word recognition performance of human subjects who either had access to only acoustic information, had information only about local linguistic context, or had access to a combination of both. All speech recordings used were taken from Japanese narration and spontaneous speech corpora. Humans have difficulty recognizing isolated words taken out of context, especially when taken from spontaneous speech, partly due to word-boundary coarticulation. Our recognition performance improves dramatically when one or two preceding words are added. Short words in Japanese mainly consist of post-positional particles (e.g. wa, ga, wo, ni), which are function words located just after content words such as nouns and verbs. So the predictability of short words is very high within the context of the one or two preceding words, and thus recognition of short words is drastically improved. Providing even more context further improves human prediction performance under text-only conditions (without acoustic signals). It also improves speech recognition, but the improvement is relatively small. Recognition experiments using an automatic speech recognizer were conducted under conditions almost identical to the experiments with humans. The performance of the acoustic models without any language model, or with only a unigram language model, was greatly inferior to human recognition performance with no context. In contrast, prediction performance using a trigram language model was superior or comparable to human performance when given a preceding and a succeeding word. These results suggest that we must improve our acoustic models rather than our language models to make automatic speech recognizers comparable to humans in recognition performance under conditions where the recognizer has limited linguistic context.

15.

Unlimited vocabulary speech recognition with morph language models applied to Finnish (total citations: 1; self-citations: 0; citations by others: 1)

Teemu Hirsimäki Mathias Creutz Vesa Siivola Mikko Kurimo Sami Virpioja Janne Pylkkönen 《Computer Speech and Language》2006,20(4):515-541

16.

Efficient mobile phone Chinese optical character recognition systems by use of heuristic fuzzy rules and bigram Markov language models

《Applied Soft Computing》2008,8(2):1005-1017

Statistical language models are very useful tools to improve the recognition accuracy of optical character recognition (OCR) systems. In previous systems, segmentation by maximum word matching, semantic class segmentation, or trigram language models have been used. However, these methods have some disadvantages, such as inaccuracies due to a preference for longer words (which may be erroneous), failure to recognize word dependencies, complex semantic training data segmentation, and a requirement of high memory. To overcome these problems, we propose a novel bigram Markov language model in this paper. This type of model does not have large word preferences and does not require semantically segmented training data. Furthermore, unlike trigram models, the memory requirement is small. Thus, the scheme is suitable for handheld and pocket computers, which are expected to be a major future application of text recognition systems. However, because the language model is simple, the bigram Markov model alone can introduce more errors. Hence in this paper, a novel algorithm combining bigram Markov language models with heuristic fuzzy rules is described. It is found that the recognition accuracy is improved through the use of the algorithm, and it is well suited to mobile and pocket computer applications, including, as we will show in the experimental results, the ability to run on mobile phones. The main contribution of this paper is to show how fuzzy techniques as linguistic rules can be used to enhance the accuracy of a crisp recognition system, and still have low computational complexity.
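
As a rough illustration of how a bigram Markov language model can rerank OCR candidates, the sketch below trains character bigram counts with add-one smoothing and picks the best character sequence with a Viterbi-style search. It covers only the bigram part; the heuristic fuzzy rules of the paper are not modeled, and the names `train_bigram` and `decode`, the smoothing choice, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Return a function log_prob(prev, cur) from a list of training strings.

    Uses add-one (Laplace) smoothing, an assumed choice for this sketch.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)  # "<s>" marks the sentence start
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    vocab_size = len(unigrams)

    def log_prob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))

    return log_prob


def decode(candidates, log_prob):
    """Pick the most likely string from per-position OCR candidates.

    candidates: one list per character position, each holding
                (character, ocr_confidence) pairs with confidence in (0, 1].
    """
    # best maps the last character to (accumulated log score, path so far).
    best = {"<s>": (0.0, [])}
    for options in candidates:
        step = {}
        for ch, confidence in options:
            step[ch] = max(
                (score + log_prob(prev, ch) + math.log(confidence), path + [ch])
                for prev, (score, path) in best.items()
            )
        best = step
    return "".join(max(best.values())[1])


log_prob = train_bigram(["the cat", "the hat", "then"])
# The OCR engine slightly prefers "b" in the middle, but "th" and "he"
# are far more frequent bigrams in the toy corpus.
result = decode([[("t", 0.9)], [("b", 0.55), ("h", 0.45)], [("e", 0.9)]], log_prob)
```

Because only the previous character is tracked, the search keeps one score per candidate character at each position, which matches the small memory footprint the abstract attributes to bigram models.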

17.

Research Progress in Table Recognition Technology

高良才李一博都林张新鹏朱子仪卢宁金连文黄永帅汤帜《中国图象图形学报》2022,27(6):1898-1917

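Tables appear widely in scientific literature, financial statements, newspapers, magazines, and many other kinds of documents, where they store and present data compactly and carry a great deal of useful information. Table recognition is the basis for reusing table information, has significant practical value, and has long been a research hotspot in pattern recognition. With the development of deep learning, new studies and methods for table recognition have emerged in quick succession. However, because tables are used in many scenarios, come in many styles, and vary in image quality, many problems in table recognition remain to be solved. To summarize previous work and support follow-up research, this paper surveys the history and latest progress of the field in China and abroad, covering traditional and deep learning methods for three table recognition subtasks: table region detection, structure recognition, and content recognition. It reviews the datasets and evaluation criteria for table recognition and, based on mainstream datasets and criteria, compares the performance of typical methods for table region detection, structure recognition, and table information extraction. It then compares research progress and levels in table recognition in China with those abroad. Finally, in light of the main difficulties and challenges the field currently faces, it looks ahead to future research trends and technical goals.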

18.

Named Entity Recognition for Chinese Electronic Medical Records Using Transformer-CRF

李博康晓东张华丽王亚鸽陈亚媛白放《计算机工程与应用》2020,56(5):153-159

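Named entity recognition is one of the basic tasks of natural language processing. To address the poor performance of traditional models for named entity recognition in Chinese electronic medical records, a neural network model based entirely on the attention mechanism is proposed. The experiments use a self-built dataset of real Chinese electronic medical records, preprocessed with manual annotation, word segmentation, and similar steps. A Transformer model is trained and optimized to extract text features, and a conditional random field classifies the extracted features. To verify the effectiveness of the proposed method, the Transformer-CRF neural network model is compared with seven other traditional models, with precision, recall, and F1 score used to evaluate recognition performance. The results show that, on the same corpus, the Transformer-CRF model recognizes named entities of the body-part class well, with an F1 score as high as 95.02%. Compared with the seven traditional models, the Transformer-CRF model also achieves higher precision, recall, and F1 score, which verifies to some extent that the model has good recognition performance.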
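
For reference, the three metrics named in the abstract above (precision, recall, and F1 score) can be computed at the entity level as in the following sketch. The function name `precision_recall_f1`, the `(start, end, label)` span format, and the example labels are assumptions made for illustration; the abstract does not say how the paper matches predicted entities to annotated ones.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 with exact span matching.

    gold, predicted: iterables of (start, end, label) tuples. A prediction
    counts as correct only if span and label both match exactly (assumed).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: four gold entities, three predictions,
# two of which match a gold entity exactly.
gold = [(0, 2, "BODY"), (5, 7, "DISEASE"), (9, 11, "BODY"), (12, 14, "DRUG")]
predicted = [(0, 2, "BODY"), (5, 7, "DISEASE"), (9, 10, "BODY")]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here precision is 2/3 and recall is 1/2, so F1, their harmonic mean, is 4/7. The third prediction is counted as an error because its span boundary differs from the gold span.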

19.

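A Language Model Combining Statistics and Rules for Speech Recognition (total citations: 2; self-citations: 1; citations by others: 1)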

王轩王晓龙张凯《自动化学报》1999,25(3):309-315

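Based on an analysis of rule-based and statistical language models in speech recognition systems, a hybrid language model that quantifies rules is proposed. The model avoids the inability of rule-based methods to handle large-scale real text and also improves the statistical model's ability to handle long-distance constraints and linguistic recursion. The hybrid language model raises the accuracy of a speaker-independent isolated-word speech recognition system covering 60,000 word entries by 4.9% (male voice) and 3.5% (female voice) compared with using a word trigram model alone.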

20.

GA, MR, FFNN, PNN and GMM based models for automatic text summarization

Mohamed Abdel Fattah Fuji Ren 《Computer Speech and Language》2009,23(1):126-144

This work proposes an approach to address the problem of improving content selection in automatic text summarization by using some statistical tools. This approach is a trainable summarizer, which takes into account several features, including sentence position, positive keyword, negative keyword, sentence centrality, sentence resemblance to the title, sentence inclusion of named entities, sentence inclusion of numerical data, sentence relative length, Bushy path of the sentence and aggregated similarity for each sentence to generate summaries. First, we investigate the effect of each sentence feature on the summarization task. Then we use all features in combination to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. Moreover, we use all feature parameters to train feed forward neural network (FFNN), probabilistic neural network (PNN) and Gaussian mixture model (GMM) in order to construct a text summarizer for each model. Furthermore, we use models trained on one language to test summarization performance in the other language. The performance of the proposed approach is measured at several compression rates on a data corpus composed of 100 Arabic political articles and 100 English religious articles. The results of the proposed approach are promising, especially the GMM approach.