Similar Documents
20 similar documents found.
1.
A frame is a square uu, where u is an unbordered word. Let F(n) denote the maximum number of distinct frames in a binary word of length n. We count this number for small values of n and show that F(n) is at most ⌊n/2⌋+8 for all n and greater than 7n/30 − ε for any positive ε and infinitely many n. We also show that Fibonacci words, which are known to contain plenty of distinct squares, have only a few frames. Moreover, by modifying the Thue-Morse word, we prove that the minimum number of occurrences of frames in a word of length n is ⌈n/2⌉−2.
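The definitions in this abstract are concrete enough to check by brute force. Below is a minimal Python sketch (ours, not the paper's; all function names are illustrative) that counts the distinct frames in a word and computes F(n) for small n by exhaustive search:

```python
from itertools import product

def is_unbordered(u: str) -> bool:
    # u is unbordered if no proper nonempty prefix of u is also a suffix of u
    return all(u[:k] != u[-k:] for k in range(1, len(u)))

def distinct_frames(w: str) -> int:
    # A frame is a square uu with u unbordered; collect the distinct frames in w
    frames = set()
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            u, v = w[i:i + half], w[i + half:i + 2 * half]
            if u == v and is_unbordered(u):
                frames.add(u + u)
    return len(frames)

def F(n: int) -> int:
    # Maximum number of distinct frames over all binary words of length n
    # (exhaustive over 2**n words, so only feasible for small n)
    return max(distinct_frames("".join(bits)) for bits in product("01", repeat=n))
```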

2.
We say that a partial word w over an alphabet A is square-free if every factor xx′ of w such that x and x′ are compatible is either of the form ⋄a or a⋄, where ⋄ is a hole and a ∈ A. We prove that there exist uncountably many square-free partial words over a ternary alphabet with an infinite number of holes.
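The square-freeness condition stated above is easy to test directly. The sketch below is our own illustration (with the ASCII character '?' standing in for the hole symbol ⋄); it checks every factor xx′ whose halves are compatible:

```python
HOLE = "?"  # stands in for the hole symbol of the abstract

def compatible(x: str, y: str) -> bool:
    # Two equal-length partial words are compatible if they agree
    # at every position where neither has a hole
    return all(a == b or HOLE in (a, b) for a, b in zip(x, y))

def is_square_free(w: str) -> bool:
    # Every factor xx' with x and x' compatible must be of the
    # form ?a or a? (i.e. length 2 with exactly one hole)
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            x, y = w[i:i + half], w[i + half:i + 2 * half]
            if compatible(x, y):
                if not (half == 1 and (x == HOLE) != (y == HOLE)):
                    return False
    return True
```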

3.
In real speech, unlike lexical words (LWs), prosodic words (PWs) are the basic rhythmic units. The naturalness of a Text-to-Speech (TTS) system is directly influenced by the segmentation of PWs. Most PWs are combinations of several LWs. In this paper, three lexical combination models are proposed to combine LWs into PWs: a Directed Acyclic Graph Model, a Segmentation Model and a Markov Model (MM). To cope with the situation where some long LWs should be segmented into two or more PWs, a Lexical Split Model (LSM) is applied to the long LWs. Experimental results show that the MM yields relatively stable results across various training data. The Transformation-Based Error-Driven Learning (TBED) algorithm, owing to its strong performance on individual properties, is applied in combination with the MM to improve the precision of PW segmentation. Experiments show that among the three proposed models, the MM combined with TBED and LSM leads to the best performance, achieving a precision of 93.00% and a recall of 93.23%. A perception test indicates that using PWs as the lowest prosodic units makes speech sound more natural and acceptable than using LWs. This paper is supported by NSFC Project (60503071), the 973 National Basic Research Program of China (2004CB318102) and the Postdoctoral Science Foundation of China (20070420275).

4.
Using the geometric dual technique by Berstel and Pocchiola, we give a uniform O(n³) upper bound for the arithmetical complexity of a Sturmian word. We also give explicit expressions for the arithmetical complexity of Sturmian words of slope between 1/3 and 2/3 (in particular, of the Fibonacci word). In this case, the difference between the genuine arithmetical complexity function and our upper bound is bounded, and ultimately 2-periodic. In fact, our formula is valid not only for Sturmian words but for rotation words from a wider class.

5.
A word-clustering-based benchmark word selection method for sentiment orientation analysis
Existing benchmark word selection methods suffer from randomness and subjectivity. This paper proposes a benchmark word selection method based on word clustering: an initial set of seed words is selected from the target domain ontology and expanded; clustering the expanded set yields second-generation seed words, which are in turn expanded and clustered, iterating until an optimal set of clustered seed words is obtained and taken as the final benchmark words. Experimental results show that the benchmark words extracted by this method achieve high accuracy in word-level sentiment orientation classification.

6.
We prove that any recognizable set of infinite words is the infinite behaviour of some finite codeterministic automaton.

7.
余敦辉  张笑笑  付聪  张万山 《计算机应用研究》2020,37(5):1395-1399,1405
To address the low efficiency of recognizing variant forms of sensitive words on the web, a decision-tree-based algorithm for recognizing sensitive-word variants is proposed. First, sensitive words and their variants are studied by analyzing features of Chinese characters such as structure and pronunciation; second, a sensitive-word decision tree is built from the sensitive-word lexicon; finally, a multi-factor refined model computes the sensitivity of texts from Weibo and other new media. Experimental results show that when recognizing Chinese sensitive words and their variants, the algorithm reaches a recall of up to 95% and a precision of up to 94%; compared with an improved algorithm based on deterministic finite automata, recall and precision improve by 19.8% and 21.1% respectively, and compared with a sensitive-information decision-tree filtering algorithm, by 17.9% and 18.1% respectively. The analysis shows that the algorithm is effective for recognizing and automatically filtering sensitive-word variants.
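The abstract does not give the decision tree's exact construction or the multi-factor model, but the core idea of matching a sensitive-word lexicon character by character can be sketched with a simple character-level trie (a form of decision tree). Everything below, including the optional normalize hook for mapping variant characters to canonical ones, is our own illustrative assumption:

```python
def build_trie(words):
    # Character-level trie over the sensitive-word lexicon;
    # the special key "$" marks the end of a complete word
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def find_sensitive(text, root, normalize=None):
    # Scan the text; normalize() could map variant characters
    # (homophones, split forms) to canonical ones -- a stand-in for
    # the paper's structure/pronunciation analysis
    hits = []
    for i in range(len(text)):
        node = root
        for j in range(i, len(text)):
            ch = normalize(text[j]) if normalize else text[j]
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.append(text[i:j + 1])
    return hits
```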

8.
For each nonempty binary word w = c_1 c_2 ⋯ c_q, where c_i ∈ {0,1}, the nonnegative integer ∑_{i=1}^{q} (q+1−i)c_i is called the moment of w and is denoted by M(w). Let [w] denote the conjugacy class of w. Define M([w]) = {M(u) : u ∈ [w]}, N(w) = {M(u) − M(w) : u ∈ [w]} and δ(w) = max{M(u) − M(v) : u, v ∈ [w]}. Using these objects, we obtain equivalent conditions for a binary word to be an α-word (respectively, a power of an α-word). For instance, we prove that the following statements are equivalent for any binary word w with |w| ≥ 2: (a) w is an α-word, (b) δ(w) = |w| − 1, (c) w is a cyclic balanced primitive word, (d) M([w]) is a set of |w| consecutive positive integers, (e) N(w) is a set of |w| consecutive integers and 0 ∈ N(w), (f) w is primitive and [w] ⊆ St.
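These definitions are directly computable, so conditions (b) and (d) can be experimented with numerically. A short sketch (ours, not the paper's):

```python
def moment(w: str) -> int:
    # M(w) = sum_{i=1..q} (q + 1 - i) * c_i  for  w = c_1 c_2 ... c_q
    q = len(w)
    return sum((q - i) * int(c) for i, c in enumerate(w))

def conjugacy_class(w: str):
    # All cyclic rotations of w
    return {w[i:] + w[:i] for i in range(len(w))}

def delta(w: str) -> int:
    # delta(w) = max { M(u) - M(v) : u, v in [w] }
    moments = [moment(u) for u in conjugacy_class(w)]
    return max(moments) - min(moments)

# Condition (b) above: for |w| >= 2, delta(w) == |w| - 1;
# condition (d): M([w]) consists of |w| consecutive positive integers.
w = "01001"  # the length-5 prefix of the Fibonacci word
print(delta(w) == len(w) - 1)                         # True
print(sorted(moment(u) for u in conjugacy_class(w)))  # [4, 5, 6, 7, 8]
```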

9.
A method for recognizing Chinese out-of-vocabulary words based on forum corpora
To address the low efficiency of out-of-vocabulary (OOV) word recognition in Chinese word segmentation, a new method for recognizing Chinese OOV words from forum corpora is proposed. A web spider downloads forum pages to build a corpus, which is updated periodically so that the text stays current; a newly constructed statistic MD (built from a Mutual Information function and a Duplicated Combination Frequency function) is used to segment the corpus and produce a candidate word list; finally, OOV words are found by comparing the candidate list against the original lexicon, and the recognized OOV words are added to the lexicon. Experimental results show that the method effectively improves the efficiency of OOV word recognition.

10.
With the rapid growth in patent applications, the demand for automatic classification of patent texts is increasing. Most existing patent text classification algorithms obtain word vectors with Word2vec, Global Vectors (GloVe) and similar methods, discarding much of the words' positional information and failing to represent the full semantics of the text. To address these problems, a multi-level patent text classification model, ALBERT-BiGRU, combining ALBERT with a bidirectional gated recurrent unit (BiGRU), is proposed. The model replaces static word vectors trained with traditional methods such as Word2vec by dynamic word vectors pretrained with ALBERT, improving the representational power of the word vectors, and trains a BiGRU network to preserve, as far as possible, the semantic associations between distant words in a patent text. Validated on the patent dataset published by the State Information Center, ALBERT-BiGRU improves accuracy over Word2vec-BiGRU and GloVe-BiGRU by 9.1 and 10.9 percentage points respectively at the section level of the patent texts, and by 9.5 and 11.2 percentage points respectively at the class level. The experimental results show that ALBERT-BiGRU effectively improves the classification of patent texts at different levels.
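A minimal PyTorch sketch of the ALBERT-BiGRU architecture described above; the checkpoint name, hidden size and pooling choice are our assumptions, not the paper's configuration:

```python
import torch.nn as nn
from transformers import AlbertModel

class AlbertBiGRU(nn.Module):
    def __init__(self, num_classes: int, gru_hidden: int = 256):
        super().__init__()
        # Dynamic contextual token vectors from a pretrained Chinese ALBERT
        # (this checkpoint name is an assumption)
        self.albert = AlbertModel.from_pretrained("voidful/albert_chinese_base")
        # BiGRU keeps long-distance associations across the token sequence
        self.bigru = nn.GRU(self.albert.config.hidden_size, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        tokens = self.albert(input_ids,
                             attention_mask=attention_mask).last_hidden_state
        states, _ = self.bigru(tokens)
        # Last-step pooling is one simple choice among several
        return self.classifier(states[:, -1])
```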

11.
Igor Farkaš, Matthew W. Crocker 《Neurocomputing》2008,71(7-9):1172-1179
As potential candidates for explaining human cognition, connectionist models of sentence processing must demonstrate their ability to behave systematically, generalizing from a small training set. It has recently been shown that simple recurrent networks and, to a greater extent, echo-state networks possess some ability to generalize in artificial language learning tasks. We investigate this capacity for a recently introduced model that consists of separately trained modules: a recursive self-organizing module for learning temporal context representations and a feedforward two-layer perceptron module for next-word prediction. We show that the performance of this architecture is comparable with echo-state networks. Taken together, these results weaken the criticism of connectionist approaches, showing that various general recursive connectionist architectures share the potential to behave systematically.

12.
This paper considers statistical analysis of recurrent event data when there exist observation gaps. By observation gaps, we mean that some study subjects leave the study for a period of time for various reasons and then return, possibly more than once. Most existing studies of recurrent events discuss situations where study subjects are under observation over continuous time periods. For recurrent event data with observation gaps, a naive analysis method is to treat them as usual recurrent events without gaps, either by censoring observations at the times when subjects first leave the study or by ignoring the gaps. As expected and shown below, this can yield biased and misleading results. In this paper, we present some appropriate methods for the problem. In particular, we consider estimation of the underlying mean function and regression analysis of recurrent event data in the presence of observation gaps. The presented methods are evaluated and compared to the naive approach that ignores observation gaps using extensive simulation studies and an example.

13.
In this paper, the synchronization control of a general class of memristor-based recurrent neural networks with time delays is investigated. A delay-dependent feedback controller is derived to achieve exponential synchronization based on the drive-response concept, linear matrix inequalities (LMIs) and the Lyapunov functional method. Finally, a numerical example is given to illustrate the derived theoretical results.

14.
In this paper, we present a technique for ensuring the stability of a large class of adaptively controlled systems. We combine IQC models of both the controlled system and the controller with a method of filtering control parameter updates to ensure stable behavior of the controlled system under adaptation of the controller. We present a specific application to a system that uses recurrent neural networks adapted via reinforcement learning techniques. The work presented extends earlier work on stable reinforcement learning with neural networks. Specifically, we apply an improved IQC analysis for RNNs with time-varying weights and evaluate the approach on a more complex control system.

15.
In spam filtering, considering that feature words contribute differently to the classification of legitimate mail and spam, a classification contribution ratio coefficient is defined, and the idea of the classification contribution of feature words is applied to feature selection and to the design of a naive Bayes filter. Experiments on an English corpus show that the spam filtering method using the classification contribution of feature words effectively improves the filter's ability to recognize legitimate mail and spam, and lowers its misclassification rates on both.
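The abstract does not define the contribution ratio coefficient itself. As one hedged reading of the idea, the sketch below scores each feature word by how asymmetrically it appears in spam versus legitimate mail (a Laplace-smoothed likelihood ratio) and keeps the top-scoring words for a naive Bayes filter; the exact form of the score and all names are our assumptions:

```python
from collections import Counter

def contribution_scores(spam_docs, ham_docs):
    # spam_docs / ham_docs: lists of token lists
    spam_c, ham_c = Counter(), Counter()
    for doc in spam_docs:
        spam_c.update(doc)
    for doc in ham_docs:
        ham_c.update(doc)
    vocab = set(spam_c) | set(ham_c)
    v, n_spam, n_ham = len(vocab), sum(spam_c.values()), sum(ham_c.values())
    scores = {}
    for t in vocab:
        p_ts = (spam_c[t] + 1) / (n_spam + v)  # Laplace-smoothed P(t | spam)
        p_th = (ham_c[t] + 1) / (n_ham + v)    # Laplace-smoothed P(t | ham)
        # Stand-in "contribution ratio": how lopsided the word's evidence is
        scores[t] = max(p_ts / p_th, p_th / p_ts)
    return scores

def select_features(scores, k=1000):
    # Keep the k words that contribute most to separating the two classes
    return set(sorted(scores, key=scores.get, reverse=True)[:k])
```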

16.
Sentiment evaluation lexicons play a very important role in sentiment analysis. In a web environment where new words appear frequently, it is necessary to recognize new sentiment evaluation words and extend existing sentiment lexicons. A pattern-based Bootstrapping method is used to extract sentiment evaluation words from a Weibo corpus. Experiments show that, while maintaining reasonably good precision, the method extracts a considerable number of sentiment evaluation words not covered by traditional sentiment lexicons.

17.
孟曌  田生伟  禹龙  王瑞锦 《计算机应用》2019,39(8):2450-2455
To use the deep contextual information of a text more efficiently, HACBI, a model for identifying regional-bullying texts that combines a hierarchical attention network (HAN) and an independently recurrent neural network (IndRNN), is proposed. First, manually annotated regional-bullying texts are mapped into a low-dimensional vector space via word embeddings; second, local and global semantic features of the texts are extracted with a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and the internal structural information of the texts is further captured with the HAN; finally, to avoid losing hierarchical structure information and to mitigate vanishing gradients, an IndRNN is introduced to strengthen the model's descriptive power and integrate the information flow. Experimental results show that the model achieves accuracy (Acc), precision (P), recall (R), F1 and AUC values of 99.57%, 98.54%, 99.02%, 98.78% and 99.35% respectively, a significant improvement over text classification models such as support vector machines (SVM) and CNN.

18.
Wongyu Cho, Seong-Whan Lee, Jin H. Kim 《Pattern Recognition》1995,28(12):1941-1953
In this paper, a new method for modeling and recognizing cursive words with hidden Markov models (HMMs) is presented. In the proposed method, a sequence of thin, fixed-width vertical frames is extracted from the image, capturing the local features of the handwriting. By quantizing the feature vectors of each frame, the input word image is represented as a Markov chain of discrete symbols. A handwritten word is regarded as a sequence of characters and optional ligatures; hence, the ligatures are also explicitly modeled. With this view, an interconnection network of character and ligature HMMs is constructed to model words of indefinite length. This model can ideally describe any form of handwritten words, including discretely spaced words, pure cursive words and unconstrained words of mixed styles. Experiments have been conducted with a standard database to evaluate the performance of the overall scheme. The performance of various search strategies based on the forward and backward scores has been compared. Experiments on the use of a preclassifier based on global features show that this approach may be useful even for large-vocabulary recognition tasks.

19.
When human experts express their ideas and thoughts, they basically do so in words. That is, experts with rich professional experience are capable of making assessments using their intuition and experience. The measurement and interpretation of characteristics are subject to uncertainty, because most measured characteristics, analytical results and field data can only be interpreted intuitively by experts. In such cases, experts may express their judgments in linguistic terms. The difficulty of directly measuring certain characteristics makes their estimation imprecise. Such measurements may be handled using fuzzy set theory. As Professor L. A. Zadeh has stressed the importance of computing with words, fuzzy sets can take a central role in handling words [12, 13]. From this perspective, a fuzzy logic approach is often thought of as the main, and only, useful tool for dealing with human words. In this paper we present another approach to handling human words instead of fuzzy reasoning: fuzzy regression analysis also enables computation with words. To process linguistic variables, we define vocabulary translation and vocabulary matching, which convert linguistic expressions into membership functions on the interval [0, 1] on the basis of a linguistic dictionary, and vice versa. We employ fuzzy regression analysis to model the assessment process of experts, from linguistic variables describing the features and characteristics of an object to the linguistic expression of the total assessment. The presented process consists of four parts: (1) vocabulary translation, (2) estimation, (3) vocabulary matching and (4) the dictionary. We employed fuzzy quantification theory type 2 to estimate the total assessment in terms of linguistic structural attributes obtained from an expert. This research was supported in part by a Grant-in-Aid for Scientific Research (C-2), Grant No. 11680459, of the Ministry of Education, Science, Sports and Culture.

20.
To address the ambiguity caused by incorrect boundary segmentation during word segmentation in Chinese relation extraction, and the problem that multiple relations cannot be extracted when entity pairs overlap, a joint extraction method based on mixed character and word representations is proposed. First, for the segmentation boundary problem, the embedding layer combines character vectors with word vectors and adds positional information to preserve the correct order between characters. Second, the model introduces a mixed dilated convolutional network to extract features at different granularities and over longer distances. Finally, a hierarchical tagging method is adopted: the corresponding relations and object entities are labeled from the extracted subject entity information, and each subject entity can correspond to multiple relations and object entities. Compared with other relation extraction methods on the same Chinese dataset, the proposed method achieves the best extraction performance and also shows better stability.
