Topic modeling is a popular analytical tool for evaluating data. Numerous topic modeling methods have been developed that account for many kinds of relationships and restrictions within datasets; however, these methods are not frequently employed. Instead, many researchers gravitate to Latent Dirichlet Allocation (LDA), which, although flexible and adaptive, is not always suited to modeling more complex data relationships. We present different topic modeling approaches capable of dealing with correlation between topics and changes of topics over time, as well as approaches able to handle short texts, such as those encountered in social media, or sparse text data. We also briefly review the algorithms used to optimize and infer parameters in topic modeling, which are essential to producing meaningful results regardless of method. We believe this review will encourage more diversity when performing topic modeling and help determine which topic modeling method best suits the user's needs.

A language model is developed that is based on features extracted from a recurrent neural network language model and on a semantic embedding of the left context of the current word, computed with probabilistic latent semantic analysis (PLSA). To calculate this embedding, the context is treated as a document. This method reduces the effect of vanishing gradients in the recurrent neural network. The experiment shows that adding topic-based features reduces perplexity by 10%.

Knowledge discovery through directed probabilistic topic models: a survey   (Cited by: 1; self-citations: 0; citations by others: 1)
Graphical models have become the basic framework for topic-based probabilistic modeling. Models with latent variables in particular have proved effective in capturing hidden structures in data. In this paper, we survey an important subclass, Directed Probabilistic Topic Models (DPTMs), which have soft clustering abilities, and their applications for knowledge discovery in text corpora. From an unsupervised learning perspective, “topics are semantically related probabilistic clusters of words in text corpora; and the process for finding these topics is called topic modeling”. In topic modeling, a document consists of different hidden topics, and the topic probabilities provide an explicit representation of a document that smooths the data at the semantic level. It has been an active area of research during the last decade. Many models have been proposed for handling the problems of modeling text corpora with different characteristics, for applications such as document classification, hidden association finding, expert finding, community discovery and temporal trend analysis. We give basic concepts, advantages and disadvantages in chronological order, a classification of existing models into different categories, their parameter estimation and inference algorithms, and measures for evaluating model performance. We also discuss their applications, open challenges and future directions in this dynamic area of research.

This paper addresses the problem of semantics-based temporal expert finding, which means identifying a person with given expertise for different time periods. For example, many real-world applications, such as reviewer matching for papers and finding hot topics in newswire articles, need to consider time dynamics. Intuitively, there will be different reviewers and reporters for different topics during different time periods. Traditional approaches used graph-based link structure with keyword-based matching and ignored semantic information, while topic modeling considered semantics-based information but did not simultaneously take into account conference influence (richer text semantics and relationships between authors) and time information. Consequently, they fail to find appropriate experts for different time periods. We propose a novel Temporal-Expert-Topic (TET) approach based on Semantics and Temporal Information based Expert Search (STMS) for temporal expert finding, which simultaneously models conference influence and time information. Consequently, the occurrence and correlations of topics (semantically related probabilistic clusters of words) change over time, while the meaning of a particular topic remains almost unchanged. By using Bayes' theorem we can obtain topically related experts for different time periods and show how experts’ interests and relationships change over time. Experimental results on a scientific literature dataset show that the proposed generalized time topic modeling approach significantly outperforms the non-generalized time topic modeling approaches, because it simultaneously captures conference influence and time information.
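
The Bayes' theorem step mentioned in this abstract can be illustrated with a toy calculation. The following is a minimal sketch and not the TET model itself: the function name `rank_experts`, the author names, and all probability values are hypothetical, and the sketch assumes that P(topic | author) and a prior P(author) are already available for each time period.

```python
def rank_experts(p_topic_given_author, p_author):
    """Rank authors by the posterior P(author | topic) using Bayes' theorem.

    p_topic_given_author: dict author -> P(topic | author) in one time period
    p_author:             dict author -> prior P(author) in the same period
    (Both tables are hypothetical inputs, not outputs of the TET model.)
    """
    # Unnormalised posterior: P(topic | author) * P(author)
    joint = {a: p_topic_given_author[a] * p_author[a] for a in p_author}
    evidence = sum(joint.values())  # P(topic) in this period
    if evidence == 0:
        raise ValueError("topic has zero probability under every author")
    posterior = {a: v / evidence for a, v in joint.items()}
    # Most probable expert first
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers for one topic in two time periods.
period_a = rank_experts({"alice": 0.6, "bob": 0.1}, {"alice": 0.5, "bob": 0.5})
period_b = rank_experts({"alice": 0.2, "bob": 0.7}, {"alice": 0.5, "bob": 0.5})
```

Running the ranking once per period is one simple way to see how the top expert for a fixed topic can change over time.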

Accurately representing the quantity and characteristics of users’ interest in certain topics is an important problem facing topic evolution researchers, particularly as it applies to modern online environments. Search engines can provide information retrieval for a specified topic from archived data, but fail to reflect changes in interest toward the topic over time in a structured way. This paper reviews notable research from the past decade on topic evolution based on the probabilistic topic model, considering multiple aspects. First, we introduce notations, terminology, and the basic topic model explored in the survey; then we summarize three categories of topic evolution based on the probabilistic topic model: the discrete time topic evolution model, the continuous time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model, attempt to summarize methods for evaluating model generalization performance and topic evolution, and provide comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.

Analyzing quantitative performance plays an important role in understanding and improving the quality of cloud computing systems and cloud‐based applications. In cloud computing, service requests from users go through numerous provider‐specific steps from the instant they are submitted to when the requested service is fully delivered. Quantitative performance analysis is not an easy task because of the complexity of cloud provisioning control flows and the increasing scale and complexity of real‐world cloud infrastructures. This work proposes a probabilistic queuing network‐based model for the performance analysis of cloud infrastructures. It considers expected task completion time and rejection probability as the performance metrics. Experimental performance data suggest the correctness of the proposed model. Copyright © 2015 John Wiley & Sons, Ltd.
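
To make the two performance metrics named in this abstract concrete, here is a minimal sketch that computes them for a single M/M/1/K queue. This textbook queue is a swapped-in simplification chosen for illustration; it is not the queuing network proposed in the paper, and the function name `mm1k_metrics` and its parameters are assumptions.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Rejection probability and mean completion time for an M/M/1/K queue.

    capacity is K, the maximum number of tasks in the system (in service
    plus waiting). A task arriving when K tasks are present is rejected.
    """
    if arrival_rate <= 0 or service_rate <= 0 or capacity < 1:
        raise ValueError("rates must be positive and capacity at least 1")
    rho = arrival_rate / service_rate
    # Stationary distribution: P(n tasks) is proportional to rho**n, n = 0..K
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    rejection = probs[capacity]  # arriving task finds the system full
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = arrival_rate * (1.0 - rejection)
    # Little's law applied to accepted tasks only
    mean_completion_time = mean_in_system / accepted_rate
    return rejection, mean_completion_time
```

With `capacity=1` there is no waiting room, so the mean completion time reduces to the mean service time `1 / service_rate`, which gives a quick sanity check on the formulas.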

Humans can generate accurate and appropriate motor commands in various, and even uncertain, environments. MOSAIC (MOdular Selection And Identification for Control) was originally proposed to describe this human ability, but this model is hard to analyze mathematically because of its emphasis on biological plausibility. In this article, we present an alternative and probabilistic model of MOSAIC (p-MOSAIC) as a mixture of normal distributions and an online EM-based learning method for its predictors and controllers. A theoretical consideration shows that the learning rule of p-MOSAIC corresponds to that of MOSAIC except for some points which are mostly related to the learning of controllers. The results of experiments using synthetic datasets demonstrate some practical advantages of p-MOSAIC. One is that the learning rule of p-MOSAIC stabilizes the estimation of “responsibility.” Another is that p-MOSAIC realizes more accurate control and robust parameter learning in comparison to the original MOSAIC, especially in noisy environments, due to the direct incorporation of the noises into the model. This work was presented in part at the 12th International Symposium on Artificial Life and Robotics, Oita, Japan, January 25–27, 2007.

Context: Organizations working in software development are aware that processes are very important assets, and they are very conscious of the need to deploy well-defined processes with the goal of improving software product development and, particularly, quality. Software process modeling languages are an important support for describing and managing software processes in software-intensive organizations. Objective: This paper seeks to identify what software process modeling languages have been defined in the last decade, the relationships and dependencies among them and, starting from the current state, to define directions for future research. Method: A systematic literature review was conducted. 1929 papers were retrieved by a manual search in 9 databases, and 46 primary studies were finally included. Results: Since 2000, more than 40 languages have been first reported, each with a concrete purpose. We show that different base technologies have been used to define software process modeling languages. We provide a scheme in which each language is registered together with the year it was created, the base technology used to define it and whether it is considered a starting point for later languages. This scheme is used to illustrate the trend in software process modeling languages. Finally, we present directions for future research. Conclusion: This review presents the different software process modeling languages that have been developed in the last ten years, showing that model-based SPMLs (Software Process Modeling Languages) are a current trend. Each of these languages has been designed with a particular motivation, to solve problems that had been detected. However, several problems remain, which have become evident in this review. This lets us provide researchers with some guidelines for future research on this topic.

With the popularity of social websites and mobile applications including Instagram, YouTube, TikTok, etc., online videos in which customers share their thoughts on and reviews of products are posted daily in increasing numbers. Such online videos containing the Voice of Customer (VOC) are valuable for product designers and managers who want to capture customer sentiment and understand customer preference. For this purpose, we propose a novel method for analyzing customer sentiment from online product review videos. First, latent Dirichlet allocation (LDA) modeling is applied to identify the topics in the online videos after data preprocessing. Then the sentiment polarity corresponding to each topic for each speaker in the videos is identified using our newly designed multi-attention bi-directional LSTM (BLSTM(MA)), which can better mine complex relationships among a speaker’s sentiments on different topics. This paper is of great practical value for company managers and researchers seeking to better understand a large number of customer opinions on specific products. To explain the application of this method and demonstrate its effectiveness, two case studies are presented, one on smartphones and one on several published datasets.

Microblogging is a popular and open platform for discovering and sharing the latest news about social issues and daily life. Quickly updated microblog streams make it urgent to develop an effective tool to monitor them. Emerging topic tracking is one such tool; it reveals which new events are currently attracting the most online attention. However, due to the rapid change, high noise and short length of microblog feeds, two challenges must be addressed in emerging topic tracking. One is detecting emerging topics early, long before they become hot, and the other is effectively monitoring evolving topics over time. In this study, we propose a novel emerging topic tracking method, which aligns emerging word detection from a temporal perspective with coherent topic mining from a spatial perspective. Specifically, we first design a metric to estimate word novelty and fading based on locally weighted linear regression (LWLR), which can highlight the novelty of words expressing an emerging topic and suppress the novelty of words expressing an existing topic. We then track emerging topics by leveraging topic novelty and fading probabilities, which are learnt by designing and solving an optimization problem. We evaluate our method on a microblog stream containing over one million feeds. Experimental results show the promising performance of the proposed method, in both effectiveness and efficiency, in detecting emerging topics and tracking topic evolution over time.
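
The abstract does not give the novelty metric itself, so the following is only a minimal sketch of the underlying technique, locally weighted linear regression, applied to a word's frequency over time. Using the local slope at the most recent time point as a rough rise/fade signal is an assumption made here for illustration; the function name `lwlr_fit`, the Gaussian kernel, the `bandwidth` parameter, and the example counts are all hypothetical.

```python
import math


def lwlr_fit(times, counts, query_time, bandwidth):
    """Fit a kernel-weighted straight line around query_time.

    Returns (slope, intercept) of the weighted least-squares line.
    """
    # Gaussian kernel: observations close to query_time get weights near 1.
    weights = [math.exp(-((t - query_time) ** 2) / (2.0 * bandwidth ** 2))
               for t in times]
    s_w = sum(weights)
    s_x = sum(w * t for w, t in zip(weights, times))
    s_y = sum(w * c for w, c in zip(weights, counts))
    s_xx = sum(w * t * t for w, t in zip(weights, times))
    s_xy = sum(w * t * c for w, t, c in zip(weights, times, counts))
    denom = s_w * s_xx - s_x ** 2
    if abs(denom) < 1e-12:
        raise ValueError("not enough distinct time points near query_time")
    slope = (s_w * s_xy - s_x * s_y) / denom
    intercept = (s_y - slope * s_x) / s_w
    return slope, intercept


# Hypothetical hourly counts of one word; the recent rise yields a positive local slope.
hours = [0, 1, 2, 3, 4, 5]
word_counts = [2, 2, 3, 5, 9, 14]
recent_slope, _ = lwlr_fit(hours, word_counts, query_time=5, bandwidth=1.5)
```

A positive local slope suggests growing usage and a negative one suggests fading; a smaller `bandwidth` makes the estimate react faster to recent feeds at the cost of more noise.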

Context: Business process modeling is an essential part of understanding and redesigning the activities that a typical enterprise uses to achieve its business goals. The quality of a business process model has a significant impact on the development of any enterprise and on IT support for that process. Objective: Since insights into what constitutes modeling quality are constantly evolving, it is unclear whether research on business process modeling quality already covers all major aspects of modeling quality. Therefore, the objective of this research is to determine the state of the art on business process modeling quality: What aspects of process modeling quality have been addressed until now, and which gaps remain to be covered? Method: We performed a systematic literature review of peer-reviewed articles on business process modeling quality published between 2000 and August 2013. To analyze the contributions of the papers, we use the Formal Concept Analysis technique. Results: We found 72 studies addressing quality aspects of business process models. These studies were classified along different dimensions: addressed model quality type, research goal, research method, and type of research result. Our findings suggest that there is no generally accepted framework of model quality types. Most research focuses on empirical and pragmatic quality aspects, specifically with respect to improving the understandability or readability of models. Among the various research methods, experimentation is the most popular. The results from published research most often take the form of intangible knowledge. Conclusion: We believe there is a lack of an encompassing and generally accepted definition of business process modeling quality. This shows the need for a broader quality framework capable of dealing with the different aspects of business process modeling quality. Different dimensions of business process quality and of the process of modeling still require further research.

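The generalized topic structure is a structural form that objectively exists in Chinese discourse. A computational model for recognizing generalized topic structures is designed based on the idea of the finite-state machine; its effectiveness is preliminarily verified on a fairly large corpus, and its space and time complexity are analyzed. The characteristics of the model are: incremental (recurrence-based) control; output and input proceeding synchronously in units of punctuation clauses; no long-distance backtracking; bounded back-filling; bounded storage; and preservation of word order. These characteristics are exactly the principles that people follow in the cognitive processing of "topic-comment" information, so the computational model can be regarded as a mechanical model of how people carry out this cognitive process.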

The assessment of scorecard performance in the field of credit scoring is of major relevance to firms. This study presents the first systematic academic literature review of how empirical benchmark studies assess scorecard performance in the field of credit scoring. By analysing 62 comparative studies, this study provides two main contributions. First, it provides a systematic overview of the assessment-related decisions of all the reviewed studies based on a classification framework. Second, the assessment criteria of consistency, application fit, and transparency are introduced and used to discuss the observed assessment-related decisions. As the findings show, researchers often pay insufficient attention to ensuring the consistent assessment of scorecard performance. Moreover, the majority of the reviewed studies choose performance indicators that fail to fit the application context and provide non-transparent assessment documentation. In conclusion, these researchers pay a great deal of attention to the development of scorecards, but they often fail to implement a straightforward assessment procedure.

Knowledge and Information Systems - The cross-lingual topic analysis aims at extracting latent topics from corpora of different languages. Early approaches rely on high-cost multilingual resources...

A constraint-based topic modeling approach for name disambiguation   (Cited by: 1; self-citations: 0; citations by others: 1)
Name ambiguity refers to the problem that different people may be referred to by an identical name. This problem has become critical in many applications, particularly in online bibliography systems such as DBLP and CiteSeer. Although much work has been conducted to address this problem, many challenges remain. In this paper, a general framework of constraint-based topic modeling is proposed, which can make use of user-defined constraints to enhance the performance of name disambiguation. A Gibbs sampling algorithm that integrates the constraints is proposed for inference in the topic model. Experimental results on a real-world dataset show that significant improvements can be obtained with the proposed approach.

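In computational biology, predicting the structure of a protein from its amino acid sequence is one of the important unsolved problems, and one of its difficulties is predicting the structure of loop segments in proteins. Using a first-order Markov model as the basis, this paper trains the model so that loop segments can be probabilistically modeled and sampled given the amino acid sequence and secondary structure information. Protein structure is described by the dihedral angle pairs of the Ramachandran plot, and model training and inference are carried out with the Mocapy toolkit. Kullback-Leibler (KL) divergence and angular deviation are used as evaluation criteria in experiments that test the loop distribution, and in de novo loop structure prediction experiments the structures of 8 free-modeling proteins from CASP8 are predicted. Compared with the most popular methods, the proposed model improves the prediction accuracy of loop segments, so that the resulting dihedral angle pairs are closer to the distribution found in real loop structures, and it also improves the prediction accuracy of the whole protein structure in de novo prediction. Moreover, because the model supports probabilistic inference, it is in theory also less biased.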

Material flow modeling constitutes an important approach to predicting and understanding the flows of materials through the anthroposphere into the environment. The new “Dynamic Probabilistic Material Flow Analysis (DPMFA)” method, combining dynamic material flow modeling with probabilistic modeling, is presented in this paper. Material transfers that lead to particular environmental stocks are represented as systems of mass-balanced flows. The time-dynamic behavior of the system is calculated by adding up the flows over several consecutive periods, considering changes in the inflow to the system and intermediate delays in local stocks. Incomplete parameter knowledge is represented and propagated using Bayesian modeling. The method is implemented as a simulation framework in Python to support experts from different domains in the development of their application models. After the introduction of the method and its implementation, a case study is presented in which the framework is applied to predict the environmental concentrations of carbon nanotubes in Switzerland.
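
The idea of summing delayed flows over consecutive periods while propagating parameter uncertainty can be sketched in a few lines. This is a toy illustration under stated assumptions, not the API of the authors' Python framework: the function names, the single fixed `delay`, the triangular distribution for the uncertain release fraction, and plain Monte Carlo sampling are all choices made here for the example.

```python
import random


def simulate_release(inflows, delay, release_fraction):
    """Mass released to the environment in each period for one product stock.

    Material entering in period t stays in the local stock for `delay`
    periods; when it leaves, `release_fraction` of it goes to the environment.
    """
    released = [0.0] * len(inflows)
    for t, amount in enumerate(inflows):
        exit_period = t + delay
        if exit_period < len(inflows):  # material still in stock is not released
            released[exit_period] += amount * release_fraction
    return released


def sample_environmental_stock(inflows, delay, low, mode, high, runs, seed=0):
    """Monte Carlo samples of the cumulative environmental stock.

    The uncertain release fraction is drawn from a triangular distribution
    (an assumption of this sketch) once per run.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        fraction = rng.triangular(low, high, mode)
        totals.append(sum(simulate_release(inflows, delay, fraction)))
    return totals


# Hypothetical yearly inflows (tonnes) with a one-year delay in the stock.
stock_samples = sample_environmental_stock([10.0, 20.0, 30.0], delay=1,
                                           low=0.1, mode=0.2, high=0.3, runs=200)
```

The spread of `stock_samples` then stands in for the uncertainty of the predicted environmental stock; a fuller model would chain several compartments and keep each one mass balanced.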

This paper constitutes a literature review on student modeling for the last decade. The review aims at answering three basic questions on student modeling: what to model, how and why. The prevailing student modeling approaches used in the past 10 years are described, the aspects of students’ characteristics that were taken into consideration are presented, and the way a student model can be used to provide adaptivity and personalisation in computer-based educational software is highlighted. This paper aims to provide important information to researchers, educators and software developers of computer-based educational software, ranging from e-learning and mobile learning systems to educational games, including stand-alone educational applications and intelligent tutoring systems. In addition, this paper can be used as a guide for making decisions about the techniques that should be adopted when designing a student model for an adaptive tutoring system. One significant conclusion is that the most preferred technique for representing the student’s mastery of knowledge is the overlay approach. Also, stereotyping seems to be ideal for modeling students’ learning styles and preferences. Furthermore, affective student modeling has grown rapidly over the past years, and an increase has been observed in the adoption of fuzzy techniques and Bayesian networks to deal with the uncertainty of student modeling.
