Similar Documents
20 similar documents found (search time: 593 ms)
1.
A set of words is factorially balanced if the set of all the factors of its words is balanced. We prove that if all words of a factorially balanced set have a finite index, then this set is a subset of the set of factors of a Sturmian word. Moreover, characterizing the set of factors of a given length n of a Sturmian word by the left special factor of length n−1 of this Sturmian word, we provide an enumeration formula for the number of sets of words that correspond to some set of factors of length n of a Sturmian word.
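The balance property behind item 1 is easy to check directly for a finite word. The sketch below (my own illustration, not the paper's method; the function names are assumptions) enumerates factors and compares letter counts:

```python
from itertools import combinations

def factors(word):
    """All nonempty factors (contiguous substrings) of a word."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}

def is_balanced(words):
    """A set of words is balanced if any two members of equal length
    contain the same number of each letter, up to one."""
    alphabet = {c for w in words for c in w}
    by_length = {}
    for w in words:
        by_length.setdefault(len(w), []).append(w)
    return all(
        abs(u.count(c) - v.count(c)) <= 1
        for group in by_length.values()
        for u, v in combinations(group, 2)
        for c in alphabet
    )

# A prefix of the Fibonacci word (a Sturmian word) has balanced factors:
print(is_balanced(factors("abaababa")))   # True
# "aabb" contains the factors "aa" and "bb", which differ by two in 'a':
print(is_balanced(factors("aabb")))       # False
```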

2.
Parikh matrices, introduced recently, have turned out to be a powerful tool in the arithmetizing of the theory of words. In particular, many inequalities between (scattered) subword occurrences have been obtained as consequences of the properties of these matrices. This paper continues the investigation of Parikh matrices and subword occurrences. In particular, we study certain inequalities, as well as information about subword occurrences sufficient to determine the whole word uniquely. Some algebraic considerations, facts about forbidden subwords, and some open problems are also included.
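For a flavor of the arithmetizing mentioned above, here is a minimal sketch (my own illustration) of the Parikh matrix of a word over the ordered alphabet {a, b}: it is a product of elementary upper-triangular matrices, its entries just above the diagonal count letter occurrences, and the corner entry counts occurrences of the scattered subword ab.

```python
def parikh_matrix(word, alphabet="ab"):
    """Parikh matrix of a word over an ordered alphabet, built as a
    product of elementary matrices (identity plus a single 1 just
    above the diagonal, in the row of the letter read)."""
    n = len(alphabet) + 1
    m = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for c in word:
        k = alphabet.index(c)
        e = [[int(i == j) for j in range(n)] for i in range(n)]
        e[k][k + 1] = 1
        m = [[sum(m[i][t] * e[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return m

m = parikh_matrix("abab")
print(m[0][1])  # occurrences of 'a' as a letter: 2
print(m[1][2])  # occurrences of 'b' as a letter: 2
print(m[0][2])  # occurrences of 'ab' as a scattered subword: 3
```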

3.
A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet, up to one. Finite balanced binary words are called finite Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters while remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, which are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that palindromic bispecial Sturmian words are precisely the maximal internal factors of primitive Christoffel words. We extend this result by showing that bispecial Sturmian words are precisely the maximal internal factors of all Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the language of Sturmian words.
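One standard arithmetic construction of Christoffel words (a sketch under my own naming; slope conventions vary across the literature) makes the objects in this abstract concrete:

```python
from math import gcd

def lower_christoffel(p, q):
    """Lower Christoffel word of slope p/q over {'a','b'} (q 'a's and
    p 'b's): read the sequence i*p mod (p+q); emit 'a' when the value
    rises and 'b' when it wraps around."""
    assert gcd(p, q) == 1, "p and q must be coprime"
    n = p + q
    word, prev = [], 0
    for i in range(1, n + 1):
        cur = (i * p) % n
        word.append('a' if cur > prev else 'b')
        prev = cur
    return ''.join(word)

w = lower_christoffel(2, 3)
print(w)         # 'aabab'
print(w[1:-1])   # internal factor 'aba' -- per the result above, the
                 # maximal internal factors of Christoffel words are
                 # exactly the bispecial Sturmian words
```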

4.
5.
The problem of reconstruction of a word from a set of its subwords is considered. It is assumed that the set is generated by unit shifts of a fixed window along an unknown word. For the problem without constraints on the unknown word, a method of reconstruction is proposed based on the search for Euler paths or Euler cycles in a de Bruijn multidigraph. The search is based on symbolic multiplication of adjacency matrices with special operations of multiplication and addition of edge names. The method makes it possible to find the reconstructed words and the number of reconstructions.
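A minimal sketch of the reconstruction idea (finding one Eulerian path with Hierholzer's algorithm; the paper's symbolic-matrix method additionally counts all reconstructions, which this sketch does not):

```python
from collections import defaultdict

def reconstruct(windows):
    """Reconstruct a word from the multiset of its length-k windows
    (unit shifts): nodes of the de Bruijn graph are (k-1)-grams,
    edges are the windows; an Eulerian path spells the word."""
    graph = defaultdict(list)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for w in windows:
        u, v = w[:-1], w[1:]
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
    # start at a node with one more outgoing than incoming edge, if any
    start = next((u for u in list(graph) if out_deg[u] - in_deg[u] == 1),
                 next(iter(graph)))
    # Hierholzer's algorithm for an Eulerian path
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + ''.join(v[-1] for v in path[1:])

windows = ["aba", "bab", "aba"]   # windows of "ababa", k = 3
print(reconstruct(windows))       # 'ababa'
```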

6.
We introduce the notion of a periodic-like word: a word whose longest repeated prefix is not right special. Several different characterizations of this concept are given. In particular, we show that a word w is periodic-like if and only if it has a period not larger than $|w| - R'_{w}$, where $R'_{w}$ is the least non-negative integer such that any prefix of w of length $\geq R'_{w}$ is not right special. We derive that if a word w has two periods $p, q \leq |w| - R'_{w}$, then the greatest common divisor of p and q is also a period of w. This result is, in fact, an improvement of the theorem of Fine and Wilf. We also prove that the minimal period of a word w is equal to the sum of the minimal periods of its components in a suitable canonical decomposition into periodic-like subwords. Moreover, we characterize periodic-like words having the same set of proper boxes in terms of the important notion of root-conjugacy. Finally, some new uniqueness conditions for words, related to the maximal box theorem, are given. Received: 10 July 2000 / Accepted: 24 January 2001
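The Fine and Wilf theorem that the abstract improves upon can be checked directly; a small illustration (my own, not from the paper):

```python
from math import gcd

def has_period(w, p):
    """p is a period of w if w[i] == w[i+p] whenever both positions exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

# Fine and Wilf: if a word w has periods p and q and
# |w| >= p + q - gcd(p, q), then gcd(p, q) is also a period of w.
w = "abababab"
p, q = 4, 6
assert has_period(w, p) and has_period(w, q)
assert len(w) >= p + q - gcd(p, q)
print(has_period(w, gcd(p, q)))   # True: 2 is a period of w
```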

7.
Classification and Representation of Quantifiers in Knowledge Graphs
In the field of knowledge representation, knowledge graphs stand out as a semantic model for natural language understanding, while in natural language processing the word is generally regarded as the most basic unit. Building on earlier studies of prepositions and logical words, this paper classifies Chinese quantifiers from the perspectives of semantics and natural language processing (chiefly that of knowledge graphs) and, following the structure of quantifier graphs, constructs a word graph for each class of quantifier.

8.
Subword-Based Two-Layer CRFs for Chinese Word Segmentation
This paper proposes a subword-based two-layer CRFs (conditional random fields) approach to Chinese word segmentation, aimed at the two core problems of segmentation ambiguity and out-of-vocabulary words. The method is built on a subword-based sequence labeling model. The first layer uses a character-based CRFs model to identify subwords in the input text, which reduces cross-subword labeling errors and improves the precision of subword recognition; the second layer learns subword-based sequence labeling with a CRFs model and decodes over the first layer's output to obtain the final segmentation. Tested on the simplified-Chinese corpora of the 2006 SIGHAN Bakeoff, including UPUC and MSRA, the method achieves F-scores of 93.3% and 96.1%, respectively. The experiments show that the subword-based two-layer CRFs model can exploit subwords more effectively to improve Chinese word segmentation accuracy.
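The sequence-labelling setup behind such segmenters can be illustrated with the standard B/M/E/S tag scheme (a sketch of the labelling only, under my own function names; the CRF models themselves require training and are omitted here):

```python
def to_tags(segmented):
    """Map a list of segmented units to B/M/E/S character tags:
    S = single-character unit, B/M/E = begin/middle/end of a longer unit."""
    tags = []
    for unit in segmented:
        if len(unit) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(unit) - 2) + ["E"])
    return tags

def from_tags(chars, tags):
    """Recover the segmentation from characters and their B/M/E/S tags."""
    units, cur = [], ""
    for c, t in zip(chars, tags):
        cur += c
        if t in ("S", "E"):
            units.append(cur)
            cur = ""
    if cur:            # tolerate a truncated tag sequence
        units.append(cur)
    return units

seg = ["中文", "分", "词"]
tags = to_tags(seg)
print(tags)                        # ['B', 'E', 'S', 'S']
print(from_tags("中文分词", tags))  # ['中文', '分', '词']
```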

9.
This paper presents the application of a multi-scale paradigm to Chinese spoken document retrieval (SDR) for improving retrieval performance. Multi-scale refers to the use of both words and subwords for retrieval. Words are basic units in a language that carry lexical meaning, and subword units (such as phonemes, syllables or characters) are building components for words. Retrieval using subword indexing units is better than retrieval using words because of the robustness of subword units to out-of-vocabulary (OOV) words during speech recognition and ambiguities in word segmentation. Experimental results have demonstrated that subword bigrams can bring improvement in retrieval performance over words (~9.56%). Application of multi-scale fusion to SDR aims at combining the lexical information of words and the robustness of subwords. This work presents the first detailed investigation for a Cantonese broadcast news retrieval task using two different multi-scale fusion approaches: pre-retrieval fusion and post-retrieval fusion. Multi-scale retrieval using both words and syllable bigrams achieves improvement in retrieval performance (~1.90%) over retrieval on the composite scales.

10.
This study uses the lexical-decision paradigm of psycholinguistics to investigate how Uyghur inflected and derived words are represented and stored in the mental lexicon. Experiment 1 examines the representation and processing of Uyghur inflected words; Experiment 2 examines Uyghur derived words. The behavioral results of Experiment 1 indicate that processing Uyghur inflected words requires morphological analysis, whereas the behavioral data of Experiment 2 show that derived words are processed in the same way as monomorphemic words. Together, the results indicate that Uyghur inflected and derived words are processed by separate, distinct routes: inflected words are processed by decomposition, while derived words are processed as wholes.

11.
We investigate the confluence property, that is, the property of a language to contain, for any two words of it, one which is bigger, with respect to a given quasi order on the respective free monoid, than each of the former two. This property is investigated mainly for regular and context-free languages. As a consequence of our study, we give an answer to an old open problem raised by Haines concerning the effective regularity of the sets of subwords. Namely, we prove that there are families with a decidable emptiness problem for which the regularity of the sets of subwords is not effective.

12.
Weibo topics have surged with the growth of the mobile Internet, and a single trending topic may attract tens of thousands of comments. Stance detection on a Weibo topic determines whether a commenter's attitude toward the topic is supportive, opposed, or neutral. This paper obtains sentence-level semantic information from word vectors trained with Word2Vec on the corpus, uses TextRank to build a topic-word set as stance features, and combines a sentiment lexicon to capture sentence-level sentiment information. After feature selection, the word vectors are fed to a support vector machine for training and prediction to produce the final stance-detection model. Experiments show that stance features combining topic words with sentiment words achieve good stance-detection performance.

13.
Finite automata operating on nested words were introduced by Alur and Madhusudan in 2006. While nested word automata retain many of the desirable properties of ordinary finite automata, there is no known efficient minimization algorithm for deterministic nested word automata and, interestingly, state complexity bounds for nested word automata turn out to differ significantly from the corresponding bounds for ordinary finite automata. Consequently, lower bounds for the state complexity of nested word languages need to rely on fooling set type techniques. We discuss limitations of the techniques and show that, even in the deterministic case, the bounds given by the lower bound methods may be arbitrarily far away from the actual state complexity of the nested word language.

14.
Motivated by certain coding techniques for reliable DNA computing, we consider the problem of characterizing nontrivial languages D that are maximal with the property that D* is contained in the subword closure of a given set S of words of some fixed length k. This closure is simply the set of all words all of whose subwords of length k are in S. We provide a deep structural characterization of these languages D, which leads to polynomial-time algorithms for computing such languages.
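Membership in the subword closure is straightforward to test directly (a sketch with my own function name):

```python
def in_subword_closure(word, S, k):
    """A word belongs to the subword closure of S (a set of words of
    fixed length k) iff every length-k factor of the word is in S."""
    return all(word[i:i + k] in S for i in range(len(word) - k + 1))

S = {"ab", "ba"}                          # k = 2
print(in_subword_closure("ababa", S, 2))  # True
print(in_subword_closure("abba", S, 2))   # False ('bb' is not in S)
```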

15.
A sufficient condition is given for the kth power-freeness of the product of three kth power-free words. By means of this property we show that if u and w are respectively words of the left and of the right center of the language S of weakly square-free words over two letters, then there exist infinitely many words v such that uvw ∈ S. As a consequence we obtain that, given two words u, w ∈ S, it is effectively decidable whether there exists a third word v such that uvw ∈ S.
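kth power-freeness itself can be tested by a brute-force scan over all candidate factors (my own illustration, not the paper's method):

```python
def is_power_free(w, k):
    """True if w avoids k-th powers: no factor of the form x^k
    with x a nonempty word."""
    n = len(w)
    for i in range(n):
        for L in range(1, (n - i) // k + 1):
            x = w[i:i + L]
            if w[i:i + k * L] == x * k:
                return False
    return True

print(is_power_free("abcacb", 2))    # True: square-free
print(is_power_free("aabbaabb", 2))  # False: contains (aabb)^2
print(is_power_free("aabbaabb", 3))  # True: cube-free
```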

16.
17.
18.
Logical Words in Natural Language Processing
Words are the most basic units in natural language processing, and in the field of knowledge representation knowledge graphs stand out as a semantic model for natural language understanding. From the perspectives of linguistics and logic, this paper is the first to propose and investigate logical words, studying their classification and how to represent the structure of each class of logical word with knowledge graphs. This provides a new approach to the understanding of complex sentences and discourse in natural language processing.

19.
Coarse segmentation and disambiguation are the two basic stages of Chinese word segmentation. By introducing generalized lexicon entries and an induced word set, this paper proposes a coarse segmentation method built on the maximum matching algorithm: segmentation proceeds by longest generalized-entry match, and the induced word set is used to recognize crossing (overlapping) ambiguities. While unambiguous sentences are segmented quickly and accurately, 100% of the crossing ambiguities in ambiguous sentences are detected and marked, simplifying subsequent disambiguation as much as possible. Tests on the January 1998 People's Daily corpus of 1.6 million Chinese characters confirm the effectiveness of the algorithm in terms of speed, ambiguous-word accuracy, and coarse-segmentation recall.
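Plain forward maximum matching, the baseline that such methods extend (this sketch shows only the baseline, with a toy lexicon of my own), illustrates the crossing ambiguity involved:

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the
    longest lexicon entry that matches, falling back to one character."""
    result, i = [], 0
    while i < len(sentence):
        for L in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + L]
            if L == 1 or piece in lexicon:
                result.append(piece)
                i += L
                break
    return result

lexicon = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", lexicon))
# ['研究生', '命', '起源'] -- the classic crossing ambiguity
# ('研究/生命' vs '研究生/命') that induced word sets are meant to flag
```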

20.
We characterize all quasiperiodic Sturmian words: A Sturmian word is not quasiperiodic if and only if it is a Lyndon word. Moreover, we study links between Sturmian morphisms and quasiperiodicity.
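Lyndon words, the other half of this characterization, can be recognized by the classical suffix comparison (a sketch, my own illustration):

```python
def is_lyndon(w):
    """w is a Lyndon word iff it is strictly smaller than all of its
    proper suffixes in lexicographic order."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))

print(is_lyndon("aab"))   # True
print(is_lyndon("aba"))   # False: the suffix 'a' precedes 'aba'
```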


Copyright © Beijing Qinyun Technology Development Co., Ltd.    京ICP备09084417号-23

Beijing Public Network Security Registration No. 11010802026262