Similar literature
 Found 20 similar documents; search took 171 ms
1.
2.
3.
Three factors are involved in text classification: the classification model, the similarity measure, and the document representation model. In this paper, we focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classifier. In our experiments, we use the centroid-based text classifier, which is a simple and robust text classification scheme. We compare four types of document representation: N-grams, single terms, phrases, and RDR, a logic-based document representation. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimal linguistic processing. The phrase approach is based on linguistically formed phrases and single words. RDR is based on linguistic processing and represents documents as a set of logical predicates. We have experimented with many text collections and obtained similar results. Here, we base our arguments on experiments conducted on Reuters-21578. We show that RDR, the more complex representation, produces a more effective classifier on Reuters-21578, followed by the phrase approach.
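
As a rough illustration of the centroid-based scheme combined with a string-based N-gram representation, the following minimal sketch builds one centroid per class from character trigram counts and assigns a new text to the class with the most similar centroid. The function names, the trigram size, the cosine similarity, and the toy documents are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character n-gram counts: string-based, no linguistic processing (n=3 is an assumption)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def normalize(vec):
    """Scale a sparse vector to unit length; an empty or zero vector stays empty."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}

def centroid(vectors):
    """Average the document vectors of one class, then renormalize."""
    total = Counter()
    for vec in vectors:
        for k, v in vec.items():
            total[k] += v
    return normalize({k: v / len(vectors) for k, v in total.items()})

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def train(labeled_docs):
    """Build one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, []).append(normalize(char_ngrams(text)))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    vec = normalize(char_ngrams(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical toy data, not taken from Reuters-21578.
toy = [
    ("oil prices rise as crude supply falls", "energy"),
    ("crude oil output cut by producers", "energy"),
    ("wheat harvest hit by drought", "grain"),
    ("grain exports and wheat prices climb", "grain"),
]
model = train(toy)
print(classify(model, "crude oil supply"))
```

Swapping `char_ngrams` for a word-level or phrase-level feature extractor would change only the representation while leaving the classifier untouched, which is the kind of comparison the abstract describes.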

4.
To compare two uncertain multiplicative linguistic variables, a possibility degree formula is developed and its properties are studied. A possibility degree matrix (which can also be regarded as a reciprocal preference relation) is then constructed to compare a collection of uncertain multiplicative linguistic variables. Two models are further provided to derive the priority vector from the possibility degree matrix, based on additive consistency and multiplicative consistency respectively. In particular, if the parameters are assigned specific values, the models reduce to existing ones. A group decision making method is developed for situations in which preferences on alternatives are expressed by uncertain multiplicative linguistic variables. In this method, the possibility degree matrix is constructed first, and the priority of alternatives is then obtained from it using the developed models. Finally, an example is given to illustrate the proposed method.

5.
The complexity of linguistic distribution assessments makes them harder for decision makers to handle. Recently, stochastic dominance has been shown to be a useful tool for comparing two stochastic variables. Inspired by this, in this paper we use stochastic dominance to compare linguistic distribution assessments and further discuss the consensus reaching issue in GDM with linguistic distribution assessments. First, we introduce three types of individual semantic sensitivity. Based on this, we define the linguistic stochastic dominances under the different semantic sensitivity contexts and provide several desirable properties. Then, we design a consensus reaching resolution framework based on linguistic stochastic dominance (CRRF-LSD). Finally, a case study is provided to show the application value of the CRRF-LSD, and two comparison analyses are conducted to show the advantages of the linguistic stochastic dominance and the CRRF-LSD. The comparison results show that the proposed linguistic stochastic dominance method has clear advantages over several classical existing methods in comparing two linguistic distribution assessments. The comparison results also show that only the CRRF-LSD method takes the PIS and semantic sensitivity into account, which helps to determine more accurate individual ranking results.

6.
Kristof  Dirk   《Decision Support Systems》2008,44(4):870-882
Customer complaint management is becoming a critical success factor in today's business environment. This study introduces a methodology to improve complaint-handling strategies through an automatic email-classification system that distinguishes complaints from non-complaints. As a result, complaint handling becomes less time-consuming and more successful. The classification system combines traditional text information with new information about the linguistic style of an email. The empirical results show that adding linguistic style information to a classification model with conventional text-classification variables significantly increases predictive performance. In addition, this study reveals linguistic style differences between complaint emails and others.

7.
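Automatic grading of text reading difficulty aims to let a computer determine the difficulty level of a text from its features. With this goal, this paper proposes a method that fuses multi-level linguistic features with deep features to grade text difficulty automatically. The linguistic features cover several linguistic levels, including Chinese characters, vocabulary, and sentences, and draw on information along several dimensions, such as frequency, length, complexity, richness, and coherence. In addition, the paper uses a BERT-based pre-trained neural network model to extract deep features of the sentences in a text, and on this basis builds an end-to-end neural network that fuses the linguistic features with the deep features. The resulting model performs well on the automatic grading task: its grading accuracy exceeds that of methods based on traditional linguistic features and of methods based on mainstream neural networks, which demonstrates the effectiveness of the proposed feature fusion method for automatic grading of text reading difficulty.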

8.
Reliable evaluation of the financial performance of universities plays an important role in their sustainable development, and it can be regarded as a multicriteria group decision making problem. In this paper, for problems in which attribute evaluation values are expressed by interval 2-tuple Pythagorean fuzzy linguistic variables, we develop an extended MULTIMOORA method. First, we propose the interval 2-tuple Pythagorean fuzzy linguistic set, based on the Pythagorean fuzzy set and interval-valued 2-tuple linguistic variables, and introduce its score function, accuracy function, and Hamming distance. Then, considering that the traditional MULTIMOORA method has some deficiencies, such as cases of circular reasoning, we improve it by proposing a comprehensive integration approach, which considers both the utility values and the ranking results and also reflects the preference attitude of decision makers. Based on these findings, an interval 2-tuple Pythagorean fuzzy linguistic MULTIMOORA method is presented, and its calculation process is described in detail. Finally, we give a numerical example concerning the evaluation of financial management performance in universities to illustrate the practicability and reliability of the proposed approach, and we compare the proposed method with other methods to demonstrate its flexibility.

9.
Creating linguistic summaries of data has been a goal of the artificial and computational intelligence communities for many years. Summaries of written text have garnered the most attention. More recently, creating summaries of imagery and other sensed data has become important as a means of compressing large amounts of data and communicating with humans. In this paper, we consider the question of comparing sets of summaries generated from sensed data. In an earlier work, we developed a metric between individual protoform-based summaries; and here, as a next step, we propose aggregation methods to fuse these individual distances. We provide a case study from eldercare where the goal is to compare different nighttime patterns for change detection. © 2012 Wiley Periodicals, Inc.

10.
Collocations are linguistic phenomena that occur when two or more words appear together more often than by chance and whose meaning often cannot be inferred from the meanings of their parts. As collocations have found many applications in the fields of natural language processing, information retrieval, and text mining, extracting them from large corpora has been the focus of many studies over the past few years. In this paper, we introduce the notion of an extension pattern, a formalization of the idea of extending lexical association measures (AMs) defined for bigrams. An extension pattern provides a measure-independent way of extending AMs for extracting collocations of arbitrary length. We define different extension patterns and compare them on a task of extracting collocations from a newspaper corpus. We show that the stopword-sensitive extension patterns we propose outperform other extensions, which indicates that AMs could benefit from taking into account linguistic information about an n-gram's part-of-speech pattern.

11.
In this paper we introduce a class of decision rules related to simple majority that take individual intensities of preference into account. These intensities are expressed by means of linguistic labels. To compare the amount of opinion obtained by each alternative, we consider the totally ordered monoid generated by the sums of the original labels, according to an addition and an ordering. In this general framework, different sets of linguistic labels can be employed, and these sets can be represented by diverse mathematical objects. Moreover, several orderings can be considered on these mathematical representations of linguistic labels. Thus, flexibility is an important feature of this new class of group decision making procedures. Some examples of putting into practice the simple majority decision rules based on linguistic labels are provided, and the main properties of these voting systems are analyzed. It is worth emphasizing that these properties are satisfied for any totally ordered monoid, regardless of the mathematical representation of linguistic labels or the ordering used to compare collective opinions.

12.
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes the phrase tables using similar words obtained by word embedding modeling. Since word embedding modeling only considers the relevance between words, the phrase table in UnsupPBMT contains many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms the state-of-the-art unsupervised text simplification methods on three benchmarks, and even outperforms some supervised baselines.

13.
For multiple attribute group decision making problems in which the attribute values take the form of fuzzy linguistic scale variables, some decision analysis approaches are proposed. In this paper, we introduce the fuzzy linguistic induced generalized ordered weighted averaging operator and study some of its main properties using some operational laws of fuzzy linguistic scale variables. We end the paper with a numerical example that applies the new approach to a fuzzy linguistic decision making problem.

14.
When performing queries in web search engines, users often face difficulties choosing appropriate query terms. Search engines therefore usually suggest a list of expanded versions of the user query to disambiguate it or to resolve potential term mismatches. However, it has been shown that users find it difficult to choose an expanded query from such a list. In this paper, we describe the adoption of set-based text visualization techniques to visualize how query expansions enrich the result space of a given user query and how the result sets relate to each other. Our system uses a linguistic approach to expand queries and topic modeling to extract the most informative terms from the results of these queries. In a user study, we compare a common text list of query expansion suggestions to three set-based text visualization techniques adopted for visualizing expanded query results (namely, Compact Euler Diagrams, Parallel Tag Clouds, and a List View) to resolve ambiguous queries using interactive query expansion. Our results show that text visualization techniques do not increase retrieval efficiency, precision, or recall. Overall, users rate Parallel Tag Clouds visualizing key terms of the expanded query space lowest. Based on the results, we derive recommendations for visualizations of query expansion results and for text visualization techniques in general, and discuss alternative use cases of set-based text visualization techniques in the context of web search.

15.
In this paper, we develop a new method for group linguistic decision making in which the attribute values take the form of fuzzy linguistic information, namely the fuzzy linguistic induced Euclidean ordered weighted averaging distance (FLIEOWAD) operator. This operator is an extension of the IOWA operator that uses the induced OWA operator, Euclidean distance measures, and uncertain information represented as fuzzy linguistic variables. Some of its main properties are then studied using some operational laws of fuzzy linguistic variables, and a decision making method based on the FLIEOWAD operator is presented. Finally, a numerical example is used to illustrate the applicability and effectiveness of the proposed method.

16.
Linguistic decision making is an important subject in decision making, and many interesting and important linguistic decision making methods have been proposed, in which alternatives-criteria decision matrices are uniformly used to express the linguistic assessments of alternatives that decision makers provide with respect to criteria. Alternatives-criteria decision matrices have some limitations when they are used to distinguish distinct, partially unknown, or hesitant linguistic decision making, or to carry out linguistic decision making with large amounts of decision information and alternatives. In this paper, we propose the alternatives-linguistic terms decision matrix to represent linguistic assessments of alternatives, and analyze its advantages in representing linguistic assessments and in distinguishing distinct, partially unknown, or hesitant linguistic decision making. To fuse alternatives-linguistic terms decision matrices simply and quickly, we further provide the linguistic multiset or fuzzy linguistic multiset to represent the linguistic assessments in these matrices, and analyze the function properties of the fuzzy linguistic multiset. Motivated by the fuzzy multiset and the TOPSIS method, we develop the fuzzy linguistic multiset TOPSIS method for linguistic decision making, which consists mainly of transformation, aggregation, and exploitation phases. In the transformation phase, linguistic assessments of alternatives are transformed into fuzzy linguistic multisets by using alternatives-linguistic terms decision matrices. In the aggregation phase, we use the Union, Intersection, and Sum operations of multisets to obtain the positive and negative ideal solutions of linguistic decision making, which differ from the positive and negative ideal solutions of the traditional TOPSIS method; in addition, we provide a pseudo-distance between two fuzzy linguistic multisets to fuse linguistic assessments of alternatives quickly. In the exploitation phase, we define a new closeness degree of an alternative by using the pseudo-distances between the alternative and the positive and negative ideal solutions, which can be used to obtain the set of most satisfying alternatives. We also design an algorithm to carry out linguistic decision making based on the proposed method. In the case studies, we use two practical examples to illustrate the practicality of the proposed method and compare it with the symbolic aggregation-based method, the hesitant fuzzy linguistic TOPSIS method, the hesitant fuzzy linguistic VIKOR method, and the probabilistic linguistic term sets TOPSIS method. The results indicate that the alternatives-linguistic terms decision matrix and the fuzzy linguistic multiset are alternative, useful, and flexible tools for linguistic decision making, and that the fuzzy linguistic multiset TOPSIS method is suitable for partially unknown or hesitant linguistic decision making.

17.
Multisets generalize sets by allowing elements to have repetitions. In this paper, we study from a formal perspective representations of multiset variables, and the consistency and propagation of constraints involving multiset variables. These help us model problems more naturally and can, for example, prevent introducing unnecessary symmetries into a model. We identify a number of different representations for multiset variables, compare them in terms of effectiveness and efficiency, and propose inference rules to enforce bounds consistency for the representations. In addition, we propose to exploit the variety of a multiset (the number of distinct elements in it) to improve modeling expressiveness and further enhance constraint propagation. We derive a number of inference rules involving the varieties of multiset variables. The rules interact varieties with the traditional components of multiset variables (such as cardinalities) to obtain stronger propagation. We also demonstrate how to apply the rules to perform variety reasoning on some common multiset constraints. Experimental results show that performing variety reasoning on top of cardinality reasoning can effectively reduce more search space and achieve better runtime in solving some multiset CSPs.
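
To make the distinction between cardinality and variety concrete, here is a small sketch using Python's `collections.Counter` as a multiset. The helper names and the per-element occurrence-bound representation in `variety_bounds` are assumptions chosen for illustration; they are not the representations or inference rules from the paper.

```python
from collections import Counter

def cardinality(ms):
    """Total number of element occurrences, repetitions included."""
    return sum(ms.values())

def variety(ms):
    """Number of distinct elements that occur at least once."""
    return sum(1 for count in ms.values() if count > 0)

def variety_bounds(lower, upper):
    """Bounds on the variety implied by per-element occurrence bounds.

    lower[x] and upper[x] bound how often x may occur. This
    occurrence-style representation is a hypothetical choice made
    for illustration only.
    """
    at_least = sum(1 for x in upper if lower.get(x, 0) > 0)  # x must occur
    at_most = sum(1 for x in upper if upper[x] > 0)          # x may occur
    return at_least, at_most

bag = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1
print(cardinality(bag), variety(bag))  # 11 5

lower = {"a": 1, "b": 0, "c": 0}
upper = {"a": 3, "b": 2, "c": 0}
print(variety_bounds(lower, upper))  # (1, 2)
```

The variety can never exceed the cardinality, which is one simple example of how the two quantities constrain each other.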

18.
Building on two-dimension uncertain linguistic variables, in this paper we present a trapezoidal fuzzy two-dimension linguistic variable, in which the uncertain linguistic information of the first dimension is extended to a trapezoidal fuzzy number. First, the definition, operational laws, characteristics, expectation, comparison method, and distance of trapezoidal fuzzy two-dimension linguistic information are proposed. Then, the trapezoidal fuzzy two-dimension linguistic power generalized aggregation operator and the trapezoidal fuzzy two-dimension linguistic power generalized weighted aggregation (TF2DLPGWA) operator are developed, and some properties and special cases of these operators are analyzed. Furthermore, based on the TF2DLPGWA operator and the comparison formula for trapezoidal fuzzy two-dimension linguistic variables, an approach to group decision making with trapezoidal fuzzy two-dimension linguistic variables is established. Finally, an illustrative example is given to verify the developed approach and to demonstrate its practicality and effectiveness.

19.
A novel method for hybrid multiple attribute decision making   Total citations: 1, self-citations: 0, citations by others: 1
Liu Pei-de 《Knowledge》2009,22(5):388-391
An approach based on 2-tuples is presented to solve the hybrid multiple attribute decision making problem with unknown weight information. First, transformation rules between linguistic variables and triangular fuzzy numbers, and the distance between 2-tuple linguistic values, are defined; then the transformation method between 2-tuple linguistic values and different forms of indicator values is given. In addition, the weights of the indicators are determined according to the grey incidence minimum deviation theory of the positive ideal solution, and the alternatives are then ranked by their 2-tuple linguistic weighted arithmetic average values. Finally, an illustrative example is given to demonstrate the procedure of the method and to compare it with the TOPSIS method, showing the effectiveness and advantages of the presented method.

20.
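Application of quantitative features of Chinese registers in text clustering   Total citations: 1, self-citations: 1, citations by others: 0
A method is proposed for applying the results of quantitative linguistic research to text clustering. Using two corpus samples of 500,000 words, 16 linguistic structural features were found to have significantly different distributions in the spoken and written registers of Modern Chinese. Using seven of these as text representation features, the experimental texts were accurately clustered into two classes: spoken register (similarity 89.84%) and written register (similarity 86.93%). Representing texts by quantitative features of linguistic structures strengthens the interpretability of clustering and classification research and has considerable theoretical and practical value. The quantitative study of register features using corpora and statistical methods is an important approach to descriptive research on Chinese registers, and its theoretical basis is explained.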
