首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
刘金岭 《计算机工程》2011,37(1):57-59,62
提出一种基于语义概念的海量中文短信文本聚类方法。该方法从短信文本出发,利用《现代汉语语义分类词典》的级类主题词,在短信文本向量集中提取概念元组,形成表示聚类结果的高层概念,基于这些高层概念进行样本划分,从而完成整个聚类过程。实验结果表明,该聚类算法有较好的聚类结果且执行效率较高。  相似文献   

2.
为解决分布式环境下消息分发系统中的按需通信,在对Gelemter元组空间模型进行改进的基础上,对消息分发系统中的元组空间通信进行了结构设计,定义了元组空间的特征模型,并基于局部性原理提出一种元组空间通信的空间分解算法.该算法依据在实际通信中不同元组不同元素的匹配频度的差异,将元组空间分解为依赖特征空间、特征元组和特征元素之间抽象关系的一组缓冲子空间,通信进程在进行匹配操作时可直接从缓冲子空间中获取匹配元组,从而降低通信的计算成本.  相似文献   

3.
包分类对于支持如防火墙、攻击检测、差分服务等网络应用有着重要的意义.研究人员对此做了大量研究.其中基于Srinivasan提出的元组空间思想的算法都存在着不能够通过预查找的方法直接定位匹配规则的元组的问题,因此此类算法的平均查找性能不稳定.针对两维包分类,提出了将元组划分为子元组的准则,满足准则的子元组可以根据3个独立的一维查找结果确定是否包含匹配规则,通过消除不必要的元组查找来提高查找速度和获得稳定的查找性能.  相似文献   

4.
《Advanced Robotics》2013,27(17):2189-2206
In this paper we propose a latent Dirichlet allocation (LDA)-based framework for multimodal categorization and words grounding by robots. The robot uses its physical embodiment to grasp and observe an object from various view points, as well as to listen to the sound during the observing period. This multimodal information is used for categorizing and forming multimodal concepts using multimodal LDA. At the same time, the words acquired during the observing period are connected to the related concepts, which are represented by the multimodal LDA. We also provide a relevance measure that encodes the degree of connection between words and modalities. The proposed algorithm is implemented on a robot platform and some experiments are carried out to evaluate the algorithm. We also demonstrate simple conversation between a user and the robot based on the learned model.  相似文献   

5.
In distributed systems, tuple space is one of the coordination models that significantly maximizes system performance against failure due to its space and time decoupling features. With the growing popularity of distributed computing and increasing complexity in the network, host and link failure occurs frequently, resulting in poor system performance. This article proposes a fault-tolerant model named Tuple Space Replication (TSR) for tuple space coordination in distributed environments. The model introduces a multi-agent system that consists of multiple hosts. Each host in a multi-agent system comprises an agent space with a tuple space for coordination. In this model, we introduce three novel fault-tolerant algorithms for tuple space primitives to provide coordination among hosts with tolerance to multiple links and hosts failure. The first algorithm is given for out() operation to insert tuples in the tuple space. The second algorithm is presented for rdp() operation to read any tuple from the tuple space. The third algorithm is given for inp() operation to delete or withdraw tuples from the tuple space. These algorithms use less number of messages to ensure consistency in the system. The message complexity of the proposed algorithms is analyzed and found O(n) for out(), O(1) for rdp(), and O(n) for inp() operations which is comparable and better than existing works, where n is the number of hosts. The testbed experiment reveals that the proposed TSR model gives performance improvement up to 88%, 70.94%, and 63.80% for out(), rdp(), and inp() operations compared to existing models such as FT-SHE, LBTS, DEPSPACE, and E-DEPSPACE.  相似文献   

6.
Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is a lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a version of Linda that addresses this problem by providing two major enhancements that facilitate the writing of fault-tolerant applications: stable tuple spaces and atomic execution of tuple space operations. The former is a type of stable storage in which tuple values are guaranteed to persist across failures, while the latter allows collections of tuple operations to be executed in an all-or-nothing fashion despite failures and concurrency. The design of these enhancements is presented in detail and illustrated by examples drawn from both the Linda and fault-tolerance domains. An implementation of FT-Linda for a network of workstations is also described. The design is based on replicating the contents of stable tuple spaces to provide failure resilience and then updating the copies using atomic multicast. This strategy allows an efficient implementation in which only a single multicast message is needed for each atomic collection of tuple space operations  相似文献   

7.
Streaming data are widely used in today’s world. Data come from different sources in streams and must be processed online and with minimum delay. These data stream can contain confidential data such as customers’ purchase information and need to be mined in order to reveal other useful information like customers’ purchase patterns. Privacy preservation throughout these processes plays a crucial role. K-anonymity is a well-known technique for preserving privacy. The principle issues in k-anonymity are information loss and running time. Although some of the existing k-anonymity techniques are able to generate anonymized data with acceptable information loss, their main drawback is that they are very time-consuming and are not applicable in a streaming context since streaming data are usually very sensitive to delay and need to be processed quite fast. In [32], we proposed a cluster-based k-anonymity algorithm called fast anonymizing algorithm for numerical streaming data (FAANST) which can anonymize numerical streaming data quite fast while providing an admissible information loss. The main drawback of FAANST is that some tuples may remain in the system for a long time and are output when they might be considered to have expired. In this paper, we propose two extensions for FAANST, passive and proactive solutions. These two solutions put a soft deadline, called $delay$ , on the time each tuple can stay in the system, and if a tuple passes this deadline, these algorithms force the tuple to be output. The proactive solution goes even one step further and utilizes a simple heuristic function to predict when a tuple in the system may expire and outputs the tuple if it will expire in the next round of the algorithm’s execution.  相似文献   

8.
大数据流式计算平台Apache Storm默认采用轮询的方式进行任务调度,未考虑到拓扑中各任务计算开销的差异以及任务之间不同类型的通信模式,在负载均衡和通信开销方面存在较大的优化空间。针对这一问题,提出一种Storm环境下基于权重的任务调度算法(TSAW-Storm)。该算法首先根据各任务的CPU资源占用情况以及任务间的数据流大小,分别确定拓扑的点权和边权;并利用最大化边权增益的思想,逐步构建起各工作节点中承载的任务集合,在保证集群负载均衡的同时,尽可能将边权较大的节点间数据流转化为节点内数据流,从而降低网络传输开销。实验结果表明,在包含有8个工作节点的WordCount基准测试中,TSAW-Storm的系统延迟和节点间数据流大小相比Storm默认调度算法分别降低了30.0%和32.9%,且各工作节点的CPU负载标准差仅为Storm默认调度算法的25.8%;此外,在与在线调度算法的对比实验中,TSAW-Storm在系统延迟、节点间数据流大小和CPU负载标准差方面分别降低了7.76%、11.8%和5.93%,且算法的执行开销明显降低,有效提高了Storm系统的运行效率。  相似文献   

9.
区间约束及其代数查询语言   总被引:6,自引:0,他引:6  
提出了区间约束和基于区间约束的代数查询语言。区间约束与密序约束相经,增加了简单的加减运算,具有更强的描述能力,同时区间约束元组有简洁,唯一的规范区间表示,文中给出了计算区间约束的规范区间表示的算法,针对区间约束关系,定义了基本代数操作的语法及语义,研究了代数查询语言,并证明代数约束查询语言满足封闭性,最后讨论了区间约束的实现与应用。  相似文献   

10.
11.
基于依存句法“动词配价”原理与组块的概念,提出以情感依存元组(EDT)作为中文情感表达的基本单位。它以句中能承载情感的几类实词作为中心词,修饰词依附于中心词,程度词和否定词依附于中心词和修饰词。该文对句子进行句法分析,在句法树和依赖关系中按规则提取情感依存元组,建立简单句情感依存元组判别模型计算情感倾向性。针对COAE2014评测公布的网络新闻语料,将该方法分别与有监督分类算法(KNN、SVM)和半监督算法(K-means)进行实验对比。结果表明,基于EDT的情感分类性能与有监督的机器学习算法相当,远高于半监督的聚类算法。  相似文献   

12.
基于语义的高质量中文短信文本聚类算法   总被引:13,自引:5,他引:8       下载免费PDF全文
刘金岭 《计算机工程》2009,35(10):201-202
现有数据聚类方法在处理文本数据时,没有考虑词之间潜在的相似信息,导致聚类效果不理想。针对中文短信文本聚类提出一种基于语义的聚类算法。给出中文概念、词和中文短信文本的相似度度量方法,通过向下连锁裂变和向上两两归并完成中文短信文本聚类。实验结果表明,该算法的聚类质量高于传统算法。  相似文献   

13.
We consider Constraint Satisfaction Problems in which constraints can be initially incomplete, where it is unknown whether certain tuples satisfy the constraint or not. We assume that we can determine the satisfaction of such an unknown tuple, i.e., find out whether this tuple is in the constraint or not, but doing so incurs a known cost, which may vary between tuples. We also assume that we know the probability of an unknown tuple satisfying a constraint. We define algorithms for this problem, based on backtracking search. Specifically, we consider a simple iterative algorithm based on a cost limit on the unknowns that may be determined, and a more complex algorithm that delays determining an unknown in order to estimate better whether doing so is worthwhile. We show experimentally that the more sophisticated algorithms can greatly reduce the average cost.  相似文献   

14.
The performance of database operations can be enhanced with an efficient storage structure design using attribute partitioning and/or tuple clustering. Previous research deals mostly with attribute partitioning. We address here the combined problem of attribute partitioning and tuple clustering. We propose a novel approach for this mixed fragmentation problem by applying a genetic algorithm iteratively to attribute partitioning and tuple clustering sub-problems. We compared our results to attribute-only partitioning and random search solution, resulting in a database access cost reduction of upto 70% and 67% respectively. We analyzed the effect of varying genetic parameters on the optimal solution through experimentation.  相似文献   

15.
Lexical stress is primarily important to generate a correct pronunciation of words in many languages; hence its correct placement is a major task in prosody prediction and generation for high-quality TTS (text-to-speech) synthesis systems. This paper proposes a statistical approach to lexical stress assignment for TTS synthesis in Romanian. The method is essentially based on n-gram language models at character level, and uses a modified Katz backoff smoothing technique to solve the problem of data sparseness during training. Monosyllabic words are considered as not carrying stress, and are separated by an automatic syllabification algorithm. A maximum accuracy of 99.11% was obtained on a test corpus of about 47,000 words.  相似文献   

16.
基于决策树的汉语未登录词识别   总被引:13,自引:0,他引:13  
未登录词识别是汉语分词处理中的一个难点。在大规模中文文本的自动分词处理中,未登录词是造成分词错识误的一个重要原因。本文首先把未登录词识别问题看成一种分类问题。即分词程序处理后产生的分词碎片分为‘合’(合成未登录词)和‘分’(分为两单字词)两类。然后用决策树的方法来解决这个分类的问题。从语料库及现代汉语语素数据库中共统计出六类知识:前字前位成词概率、后字后位成词概率、前字自由度、后字自由度、互信息、单字词共现概率。用这些知识作为属性构建了训练集。最后用C4.5算法生成了决策树。在分词程序已经识别出一定数量的未登录词而仍有分词碎片情况下使用该方法,开放测试的召回率:69.42%,正确率:40.41%。实验结果表明,基于决策树的未登录词识别是一种值得继续探讨的方法。  相似文献   

17.
The authors present an extension to the work of I. Suzuki and T. Kasami (see Proc. 3rd Int. Conf. Distributed Compact Syst., p.365-70 (1982)), where a mutual exclusion algorithm uses a message called a token to transfer the privilege of entering a critical region among the participating sites. The proposed algorithm checks whether the token is lost during network failure, and regenerates it if necessary. The mutual exclusion requirement is satisfied by guaranteeing regeneration of only one token in the network. Failures in a computer network are classified into three types: processor failure, communication controller failure, and communication link failure. To detect failures, a time-out mechanism based on message delay is used. The execution of the algorithm is described for each type of failure; each site follows a rather simple execution procedure. Each site is not required to observe the failure of other sites or communication links  相似文献   

18.
基于遗传算法的汉语未登录词识别   总被引:1,自引:0,他引:1  
针对汉语分词处理中未登录词识别这一难点,提出一种应用遗传算法识别的新方法.该方法扩大了分词碎片容量,将未登录词识别问题看成二值分类问题,即在预处理后产生的分词碎片中,单字存在"可组合"和"不可组合"两类,采用遗传算法先将分词碎片中的单字词确定下来,然后将其余相邻单字组合,完成未登录词识别.实验结果表明,该方法可有效地解决未登录词识别问题,提高未登录词识别的精确率和召回率.  相似文献   

19.
We introduce inductive definitions over language expressions as a framework for specifying tree tuple languages. Inductive definitions and their sub-classes correspond naturally to classes of logic programs, and operations on tree tuple languages correspond to the transformation of logic programs. We present an algorithm based on unfolding and definition introduction that is able to deal with several classes of tuple languages in a uniform way. Termination proofs for clause classes translate directly to closure properties of tuple languages, leading to new decidability and computability results for the latter.  相似文献   

20.
The effects of message type (navigation, E-mail, news story), voice type (text-to-speech, natural human speech), and earcon cueing (present, absent) on message comprehension and driving performance were examined. Twenty-four licensed drivers (12 under 30, 12 over 65, both equally divided by gender) participated in the experiment. They drove the UMTRI driving simulator on a road consisting of straight sections and constant radius curves, thus yielding two levels of low driving-workload. In addition, as a control condition, data were collected while participants were parked. In all conditions, participants were presented with three types of messages. Each message was immediately followed by a series of questions to assess comprehension. Navigation messages were about 4 seconds long (about 9 words). E-mail messages were about 40 seconds long (about 100 words) and news messages were about 80 seconds long (about 225 words). For all message types, comprehension of text-to-speech messages, as determined by accuracy of response to questions, and by subjective ratings, was significantly worse than comprehension of natural speech (79 versus 83 percent correct answers; 7.7/10 versus 8.6/10 subjective rating). Driving workload did not affect comprehension. Interestingly, neither the speech used (synthesized or natural) nor the message type (navigation, E-mail, news) had a significant effect on basic driving performance measured by the standard deviations of lateral lane position and steering wheel angle.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号