Similar literature (20 results)
1.
Online social media networks are gaining attention worldwide, with an increasing number of people relying on them to connect, communicate, and share event-related information from their daily lives. Event detection increasingly leverages online social networks to highlight events happening around the world via the Internet of People. In this paper, a novel Event Detection model based on Scoring and Word Embedding (ED-SWE) is proposed for discovering key events from large-volume streams of tweets and for generating an event summary using keywords and top-k tweets. The proposed ED-SWE model can distill high-quality tweets, reduce the negative impact of spam, and identify latent events in the data streams automatically. Moreover, a word embedding algorithm is used to learn a real-valued vector representation for a predefined fixed-size vocabulary from a corpus of Twitter data. To further improve the performance of the Expectation-Maximization (EM) iteration algorithm, a novel initialization method based on the authority values of the tweets is also proposed to detect live events efficiently and precisely. Finally, a novel identification method based on the cosine measure is used to evaluate automatically whether a given topic can form a live event. Experiments conducted on a real-world dataset demonstrate that the ED-SWE model exhibits better efficiency and accuracy than several state-of-the-art event detection models.

2.
Twitter, with an ever-increasing user base, has greatly influenced the opinions and purchasing habits of the general public. This has in turn pushed product firms toward sentiment analysis, which enables them to mine actual opinion about their products and make business decisions accordingly. Even though a majority of the existing methods detect the sentiment of a tweet with reasonable accuracy, some ignore emoticons while others treat them as stop words. Emoticons enable users to express their emotions more accurately, which eliminates the ambiguity that can arise from words alone. The growing popularity of emoticons among users, combined with their ease of use, makes them highly valuable in sentiment analysis. Hence, mining product opinion without considering emoticons will severely undermine the accuracy and reliability of the result. Moreover, sarcasm detection is still uncharted territory in opinion mining and is exceedingly difficult to factor in. Sarcastic tweets, when left undetected, affect the accuracy of the mined opinion. Therefore, the polarity of the individual words and emoticons of the tweets is computed using linguistic analysis. The sarcastic tweets are then classified and eliminated based on their anomalous polarity. By placing a higher emphasis on emoticons, the proposed emoticon-based linguistic opinion algorithm yields satisfactory results when compared with other traditional and state-of-the-art approaches.

3.
Sarcasm is a type of sentiment in which people express negative feelings using positive or intensified positive words. While speaking, people often use heavy tonal stress and gestural cues such as rolling of the eyes or hand movements to reveal sarcasm. In textual data, these tonal and gestural cues are missing, which makes sarcasm detection very difficult for an average human. Because of these challenges, researchers have shown interest in sarcasm detection in social media text, especially in tweets. The rapid growth in the volume of tweets and their analysis pose major challenges. In this paper, we propose a Hadoop-based framework that captures real-time tweets and processes them with a set of algorithms that identify sarcastic sentiment effectively. We observe that the elapsed time for analysis and processing under the Hadoop-based framework is significantly lower than with conventional methods, making the framework better suited to real-time streaming tweets.

4.
A microblog is a service typically offered by online social networks, such as Twitter and Facebook. From the perspective of information dissemination, we define the concept of a spreading matrix. A new WeiboRank algorithm for identifying key nodes in microblog networks is proposed, taking into account parameters such as a user's direct appeal, a user's influence region, and a user's global influence power. To investigate how measures for ranking influential users in a network correlate, we compare the relative influence ranks of the top 20 microblog users of a university network. The proposed algorithm is compared with other algorithms (PageRank, Betweenness Centrality, Closeness Centrality, and Out-degree) using a new tweet propagation model, the Ignorants-Spreaders-Rejecters model. Comparison results show that key nodes obtained from the WeiboRank algorithm have a wider transmission range and better influence.

5.
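Mining topic-level influential individuals in microblogs based on multi-relational networks
Ding Zhaoyun, Jia Yan, Zhou Bin, Han Yi. China Communications (《中国通信》), 2013, 10(1): 93-104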
In micro-blogging contexts such as Twitter, the number of content producers can easily reach tens of thousands, and many users can participate in the discussion of any given topic. While many users can introduce diversity, not all users are equally influential, which makes it challenging to identify the true influencers, who are generally rated as being interesting and authoritative on a given topic. In this study, the influence of users is measured by performing random walks over the multi-relational data in micro-blogging: retweet, reply, reintroduce, and read. Because of the uncertainty of the reintroduce and read operations, a new method is proposed to determine the transition probabilities of uncertain relational networks. Moreover, we propose a method for performing combined random walks on the multi-relational influence network, considering the transition probabilities for both intra- and inter-networking. Experiments were conducted on a real Twitter dataset containing about 260,000 users and 2.7 million tweets, and the results show that our method is more effective than TwitterRank and other methods used to discover influencers.

6.
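Chen Kan, Chen Liang, Zhu Peidong, Xiong Yueshan. Journal on Communications (《通信学报》), 2015, 36(7): 120-128
Paid posters (the "Internet water army") spread advertisements, rumors, trojans, and malicious links. This not only interferes with users' normal access to online social networks but may also cause problems for network security and social stability. Targeting the characteristics of how paid posters spread information, an information propagation model based on interaction behavior is proposed. The model defines features from the interactions among different propagating entities to quantify propagation behavior, and uses a decision tree method to detect the information spread by paid posters. The propagation model is analyzed and the detection method is validated on real data from Sina Weibo. The results show that the detection method can effectively detect paid-poster information in microblogs.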

7.
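A masquerade attack is one in which an unauthorized user impersonates a legitimate user to gain access to critical data or higher-level access privileges. A new method for detecting user masquerade attacks is proposed. To address the variability of masquerading users' behavior and the correlation among shell commands in audit data, the method models the normal behavior of legitimate users with a special multi-order homogeneous Markov chain and determines the states through a two-level stepwise merging of shell commands, which improves the user behavior profile description's...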

8.
Event detection in a multimodal Twitter dataset is considered. We treat the hashtags in the dataset as instances with two modes: text and geolocation features. The text feature consists of a bag-of-words representation. The geolocation feature consists of the geotags (i.e., geographical coordinates) of the tweets. By fusing the multimodal data, we aim to detect, in terms of topic and geolocation, the interesting events and the associated hashtags. To this end, a generative latent variable model is assumed, and a generalized expectation-maximization (EM) algorithm is derived to learn the model parameters. The proposed method is computationally efficient and lends itself to big datasets. Experimental results on a Twitter dataset from August 2014 show the efficacy of the proposed method.

9.
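To evade detection by text-based spam filtering systems, more and more spammers embed text information in images. To detect image spam effectively, an image spam identification method based on the gray-gradient co-occurrence matrix (GGCM) is proposed. Image features are first extracted with the GGCM, and a least squares support vector machine (LS-SVM) is then used for classification. Experiments show that the method achieves high classification accuracy and good real-time performance.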

10.
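In a practical cognitive radio system, spectrum sensing must be capable of real-time blind sensing in an electromagnetic environment where the noise level varies dynamically over a wide range and the wireless channel fades severely, which poses a great challenge to classical spectrum sensing algorithms. The power spectrum segment cancellation spectrum sensing algorithm proposed in this paper computes the statistical properties of the power spectrum from the asymptotic normality and mutual independence of the Fourier transform, and uses the ratio of the summed strength of a subset of the spectral lines in the monitored band to the summed strength of all spectral lines as the test statistic for deciding whether a signal is present. The paper derives mathematical expressions for the false alarm probability and for the correct detection probability under different channel models, and obtains a closed-form expression for the decision threshold from the Neyman-Pearson criterion. Theoretical analysis and simulation results both show that the algorithm is robust to noise uncertainty; that, at a fixed signal-to-noise ratio, its sensing performance is unaffected by changes in the noise level; that in additive white Gaussian noise and flat slow-fading channels it achieves superior sensing performance over a wide range of signal-to-noise ratios; and that its computational complexity is low, so that spectrum sensing can be completed within microseconds.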
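The test statistic described in this abstract (power in a subset of spectral lines divided by total power) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of `band`, and the `threshold` argument are assumptions made for illustration, and the closed-form Neyman-Pearson threshold derived in the paper is not reproduced here.

```python
import cmath
import math


def power_spectrum(samples):
    """Periodogram |X[k]|^2 / N via a direct DFT (adequate for short inputs)."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_ratio_statistic(samples, band):
    """Ratio of the power in the spectral lines listed in `band` to the total power."""
    spectrum = power_spectrum(samples)
    total = sum(spectrum)
    if total == 0:
        raise ValueError("input has no energy")
    return sum(spectrum[k] for k in band) / total


def signal_present(samples, band, threshold):
    """Declare a signal present when the ratio exceeds a caller-supplied threshold."""
    return band_ratio_statistic(samples, band) > threshold
```

Because the statistic is a ratio of two sums of the same spectrum, multiplying every sample by a constant leaves it unchanged, which is one way to see why a detector of this shape does not depend on the absolute noise level.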

11.
Research has shown that organizations tend to use Twitter primarily in a one-way, monologic manner and fall short of using the platform's technological affordances to engage the public in dialogue. At the same time, relatively little research has addressed the specific persuasive outcomes that organizations could accrue by using Twitter to communicate with the public in a more dialogic way. We investigated the persuasive effect of an organization's dialogic retweeting (conceptualized as retweeting of user mentions addressed to the organization) by drawing on the concept of social presence and the theory of reasoned action. In an online experiment conducted with an adult sample of U.S. Twitter users, participants were randomly assigned to view either a fictitious organization's dialogic retweets or the same organization's monologic tweets of identical content. We found that the dialogic retweets, when compared to the monologic tweets from the organization, induced a higher level of social presence, which, in turn, led to a higher level of subjective norms, more favorable attitudes toward the behavior advocated by the organization in the messages, and greater intention to adopt the behavior. Theoretical and practical implications of these findings are discussed.

12.
In response to the increasing number of cyber threats, novel detection and prevention methods are constantly being developed. One of the main obstacles hindering the development and evaluation of such methods is the shortage of reference data sets. This work proposes a way of testing methods that detect network threats. It includes a procedure for creating realistic reference data sets describing network threats, and for processing and using these data sets in testing environments. The proposed approach is illustrated and validated on the problem of spam detection. Reference data sets for spam detection are developed, analysed, and used both to generate the requested volume of simulated traffic and to analyse it using machine learning algorithms. The tests take into account both the accuracy and the performance of threat detection methods under real load and constrained computing resources.

13.
One of the most annoying problems on the Internet is spam. To fight spam, many approaches have been proposed over the years. Most of these approaches involve scanning the entire contents of e-mail messages in an attempt to detect suspicious keywords and patterns. Although such approaches are relatively effective, they also have some disadvantages. An interesting question is therefore whether it would be possible to detect spam effectively without analyzing the entire contents of e-mail messages. The contribution of this paper is an alternative spam detection approach that relies solely on analyzing the origin (IP address) of e-mail messages, as well as possible links within the e-mail messages to websites (URIs). Compared to analyzing suspicious keywords and patterns, detection and analysis of URIs is relatively simple. The IP addresses and URIs are compared to various kinds of blacklists; a hit increases the probability of the message being spam. Although the idea of using blacklists is well known, the novel idea proposed in this paper is the concept of 'bad neighborhoods'. To validate our approach, a prototype has been developed and tested on our university's mail server. The outcome was compared to SpamAssassin and mail server log files. The comparison showed that our prototype has remarkably good detection capabilities (comparable to SpamAssassin) while putting only a small load on the mail server. Copyright © 2010 John Wiley & Sons, Ltd.
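The scoring idea in this abstract, where each blacklist hit raises the spam likelihood, can be sketched as follows. The abstract does not define 'bad neighborhoods' or any weights, so this sketch assumes a bad neighborhood is a blacklisted network prefix; `spam_score`, `hit_weight`, and `neighborhood_weight` are hypothetical names and values, not taken from the paper.

```python
import ipaddress


def spam_score(sender_ip, uris, ip_blacklist, uri_blacklist, bad_neighborhoods,
               hit_weight=0.3, neighborhood_weight=0.2):
    """Return a score in [0, 1]; every blacklist hit adds to it (hypothetical weights)."""
    score = 0.0
    # Exact hit on the sender address.
    if sender_ip in ip_blacklist:
        score += hit_weight
    # Assumption: a "bad neighborhood" is a network prefix known to host spammers.
    addr = ipaddress.ip_address(sender_ip)
    if any(addr in net for net in bad_neighborhoods):
        score += neighborhood_weight
    # One increment per blacklisted URI found in the message body.
    score += hit_weight * sum(1 for uri in uris if uri in uri_blacklist)
    return min(score, 1.0)
```

Only the sender address and the extracted URIs are inspected, so the message body never has to be scanned for keywords, which is consistent with the low server load the abstract reports.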

14.
The optimum receiver to detect the bits of multiple code-division multiple access (CDMA) users has a complexity that is exponential in the number of active users in the system. Consequently, many suboptimum receivers have been developed to achieve good performance with less complexity. We take the approach of approximating the solution of the optimum multiuser detection (OMUD) problem using nonlinear programming relaxations. First, we observe that some popular suboptimum receivers indeed correspond to relaxations of the optimal detection problem. In particular, one proposed approximation method yields iterative solutions that correspond to previously proposed heuristic nonlinear detectors. Using a nonlinear programming approach, we identify the convergence properties of these iterative detectors. Second, we propose a relaxation that yields a receiver which we call the generalized minimum mean squared error detector. We give a simple iterative implementation of the detector. Its performance is evaluated, and comparisons to other suboptimum detection schemes are given.

15.
Jia Jie, Xia Linjiao, Ji Pengshuo, Chen Jian, Wang Xingwei. Wireless Networks, 2022, 28(7): 3231-3245

FemtoCaching technology, which aims to maximize the access probability of streaming media transmission in heterogeneous cellular networks, is investigated in this paper. First, five streaming media deployment schemes are proposed based on the network topology and the relationship between users and streaming media. Second, a matching algorithm for adaptive streaming media deployment is proposed, in which the FemtoCaching can be adjusted dynamically. Third, a joint problem combining channel assignment, power allocation, and caching deployment is formulated. To address this problem, we propose a joint optimization algorithm that combines the matching algorithm with a genetic algorithm to maximize the access probability of streaming media transmission. Simulation experiments demonstrate that (1) compared with recent works, the proposed algorithm greatly improves the average access probability of all users accessing streaming media in the network, and (2) performance increases with the number of channels and the storage capacity of micro base stations, but decreases as the number of users grows.


16.
Guaranteeing continuous streaming of multimedia data from service providers to users is a challenging task in wireless ad hoc networks, particularly when node mobility is considered. The topological dynamics introduced by node mobility are further exacerbated by the natural grouping behavior of mobile users, which leads to frequent network partitioning. Network partitioning poses significant challenges to the provisioning of continuous multimedia streaming services in wireless ad hoc networks, since the partitioning disconnects many mobile users from the centralized streaming service. In this paper, we propose NonStop, a collection of novel middleware-based run-time algorithms that ensures the continuous availability of such multimedia streaming services while minimizing the overhead involved. Network-wide continuous streaming coverage is achieved by partition prediction and service replication on the streaming sources, assisted by distributed selection of streaming sources on regular mobile nodes and users. The proposed algorithms are validated by extensive results from performance evaluations.

17.
Cognitive radio (CR)-based smart grid (SG) networks have been widely recognised as an emerging communication paradigm in power grids. However, securing sufficient spectrum resources and ensuring reliability are two major challenges for real-time applications in CR-based SG networks. In this article, we study the traffic data collection problem. Based on the two-stage power pricing model, the power price is associated with the traffic data effectively received in a meter data management system (MDMS). To minimise the system power price, a wideband hybrid access strategy is proposed and analysed for sharing the spectrum between the SG nodes and CR networks. The sensing time and transmission time are jointly optimised, while both the interference to primary users and the spectrum opportunity loss of secondary users are considered. Two algorithms are proposed to solve the joint optimisation problem. Simulation results show that the proposed joint optimisation algorithms outperform algorithms with fixed parameters (sensing time and transmission time), and that the power cost is reduced efficiently.

18.
This article proposes a new retransmission-based loss-recovery scheme for reliable streaming of continuous-media data over the Internet. The proposed scheme integrates two techniques, namely gap detection and time-out detection, to detect packet loss. The integrated scheme is ideal for situations in which it is difficult for end users to assess network characteristics (e.g., delay jitter) and for situations in which network characteristics may change drastically over the duration of a streaming session.
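A minimal sketch of how gap detection and time-out detection can be combined at a receiver is given below. The abstract does not describe the scheme's internals, so the class name, the single fixed `timeout`, and the policy of reporting the next expected sequence number on a time-out are all assumptions made for illustration.

```python
class LossDetector:
    """Receiver-side loss detection combining sequence gaps and an inactivity timer."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence before a loss is assumed
        self.next_seq = 0           # next sequence number expected in order
        self.last_arrival = None    # arrival time of the most recent packet

    def on_packet(self, seq, now):
        """Gap detection: return the sequence numbers skipped before this packet."""
        missing = list(range(self.next_seq, seq))
        if seq >= self.next_seq:
            self.next_seq = seq + 1
        self.last_arrival = now
        return missing

    def on_tick(self, now):
        """Time-out detection: report the expected packet after `timeout` of silence."""
        if self.last_arrival is not None and now - self.last_arrival >= self.timeout:
            self.last_arrival = now  # restart the timer so the report is not repeated every tick
            return [self.next_seq]
        return []
```

Gap detection reacts as soon as a later packet arrives, while the timer covers the case in which no later packet arrives at all, for example when a burst loss hits the tail of the stream.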

19.
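Because network capacity is limited, no network can avoid congestion. The traditional RED algorithm considers only a small number of TCP users and does not address UDP users. As network applications diversify and more and more UDP users access the network, the traditional RED mechanism cannot control them and cannot guarantee the quality of service (QoS) of TCP users. For the multi-user case of mixed TCP/UDP flows, this paper proposes differentiated control of mixed TCP/UDP flows, in which TCP and UDP flows use different bandwidths, together with a 2-D stability condition for a single-bottleneck network carrying mixed TCP/UDP flows. Based on this stability condition, a suitable RED control parameter Pmax can be chosen to obtain satisfactory congestion control performance. The paper builds a linear time-delay system model of a single-bottleneck network with TCP and UDP flows and, using the 2-D Laplace-Z transform, derives a mixed-flow network parameter configuration based on the stability condition. NS2 simulations verify that the proposed mixed-flow parameter configuration effectively achieves stability of the router queue length and the TCP window.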
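For readers unfamiliar with the role of Pmax, the following sketch shows the drop probability of the classic RED algorithm as a function of the average queue length. It illustrates standard RED only; the differentiated TCP/UDP control and the 2-D stability condition proposed in the paper are not reproduced, and the parameter names are chosen here for illustration.

```python
def red_drop_probability(avg_queue, min_th, max_th, p_max):
    """Classic RED: linear ramp from 0 to p_max between the two queue thresholds."""
    if not 0 <= min_th < max_th:
        raise ValueError("thresholds must satisfy 0 <= min_th < max_th")
    if not 0 <= p_max <= 1:
        raise ValueError("p_max must lie in [0, 1]")
    if avg_queue < min_th:
        return 0.0   # short queue: never drop
    if avg_queue >= max_th:
        return 1.0   # long queue: drop every arriving packet
    return p_max * (avg_queue - min_th) / (max_th - min_th)
```

A larger Pmax makes the router react more aggressively as the average queue grows, which is why its value matters for the stability of the queue length and the TCP window.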

20.
Chinese Journal of Electronics (《电子学报:英文版》), 2016, (6): 1025-1033
In this work, a Storm-based query language System (SQLS) is proposed for real-time data stream analysis. The system is compatible with the Continuous query language (CQL) specification. It supports both continuous queries and one-time queries over streaming data, and meets the requirements of user experience (traditional SQL queries) and QoS (such as real-time response and throughput). To better meet the throughput requirement and enhance processing efficiency, a load shedding algorithm and cache optimization are employed in the implementation of the SQL-like operators. Finally, performance testing of the proposed SQLS has been conducted on a standalone Storm platform and on Storm clusters. Experimental results show that the system can not only meet the needs of users but also extend the functionality of real-time streaming query processing.
