Similar literature
20 similar documents found (search time: 671 ms)
1.
Online social networks have become immensely popular in recent years and are now major sources for tracking the reverberation of events and news throughout the world. However, the diversity and popularity of online social networks attract malicious users who inject new forms of spam. Spamming is a malicious activity in which a fake user spreads unsolicited messages in the form of bulk messages, fraudulent reviews, malware, hate speech, profanity, or advertising for marketing scams. In addition, spammers usually form a connected community of spam accounts and use them to spread spam to a large set of legitimate users. Consequently, it is highly desirable to detect such spammer communities in social networks. Even though a significant amount of work has been done on detecting spam messages and accounts, little research has addressed detecting spammer communities and hidden spam accounts. In this work, an unsupervised approach called SpamCom is proposed for detecting spammer communities in Twitter. We model the Twitter network as a multilayer social network and exploit overlapping community-based features of users, represented in the form of hypergraphs, to identify spammers based on their structural behavior and URL characteristics. The use of community-based features, graph and URL characteristics of user accounts, and content similarity among users makes our technique very robust and efficient.

2.
Liu Bo, Ni Zeyang, Luo Junzhou, Cao Jiuxin, Ni Xudong, Liu Benyuan, Fu Xinwen 《World Wide Web》2019,22(6):2953-2975

Social networking websites with microblogging functionality, such as Twitter or Sina Weibo, have emerged as popular platforms for discovering real-time information on the Web. Like most Internet services, these websites have become the targets of spam campaigns, which contaminate Web contents and damage user experiences. Spam campaigns have become a great threat to social network services. In this paper, we investigate crowd-retweeting spam in Sina Weibo, the counterpart of Twitter in China. We carefully analyze the characteristics of crowd-retweeting spammers in terms of their profile features, social relationships and retweeting behaviors. We find that although these spammers are likely to connect more closely than legitimate users, the underlying social connections of crowd-retweeting campaigns are different from those of other existing spam campaigns because of the unique features of retweets that are spread in a cascade. Based on these findings, we propose retweeting-aware link-based ranking algorithms to infer more suspicious accounts by using identified spammers as seeds. Our evaluation results show that our algorithms are more effective than other link-based strategies.
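
The idea of inferring suspicious accounts from identified spammers used as seeds can be illustrated with a generic seeded (personalized) PageRank. This is only a minimal sketch under stated assumptions: it is not the retweeting-aware algorithm proposed in the paper, and the function name `seeded_rank`, the toy edges, the damping factor, and the iteration count are all hypothetical.

```python
def seeded_rank(edges, seeds, damping=0.85, iterations=50):
    """Generic personalized PageRank: suspicion flows outward from seed accounts.

    edges: iterable of (source, target) links between accounts (toy data here).
    seeds: accounts already identified as spammers.
    """
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # All restart probability sits on the seed accounts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = damping * score[n] / len(out_links[n])
                for m in out_links[n]:
                    nxt[m] += share
            else:
                # Accounts without outgoing links hand their mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[n] / len(seeds)
        score = nxt
    return score


# Toy graph (hypothetical): "s" is a known spammer; "c" and "d" are unrelated to it.
scores = seeded_rank([("s", "a"), ("a", "b"), ("c", "d")], seeds={"s"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Accounts reachable from the seed receive a positive score, while accounts in the unconnected component stay at zero. A retweeting-aware variant would presumably weight the links by retweet behavior, which this sketch does not attempt.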


3.
The convergence of broadcasting and broadband communications network technologies has attracted increasing attention as a means to enrich the television viewing experience. Toward this end, this study proposes the 'Intelligence Circulation System (ICS)', which provides several services by using newly developed algorithms for analysing Twitter messages. Twitter users often post messages about on-air TV programmes. ICS obtains viewer responses from tweets without requiring any new infrastructure or changes in users' habits or behaviours, and it generates and provides several outputs to heterogeneous devices based on the analysis results. The algorithms, designed by considering the characteristics of Twitter messages about TV programmes, use auxiliary programme information, similarity between messages, and time series of messages. An evaluation of our algorithms using Twitter messages about all programme genres for a month showed that the accuracy of topic extraction was 85% with an emphasis on quality (56% of messages processed) and 65% with an emphasis on quantity (95% of messages processed). The accuracy of message sentiment classification was 66%. We also describe social recommendation services that use the analysis results. We have created a Social TV site for a large-scale field trial and analysed users' behaviours by comparing four types of social recommendation services on it. The experimental results show that active and passive communication users had different needs with regard to the recommendations. ICS can generate recommendations that satisfy the needs of both user types by using the results of analysing Twitter messages.

4.
Social networks, once an innocuous platform for sharing pictures and thoughts among a small online community of friends, have now transformed into a powerful tool of information, activism, mobilization, and sometimes abuse. Detecting the true identity of social network users is an essential step toward making social media an efficient channel of communication. This paper targets the microblogging service Twitter as the social network of choice for investigation. It has been observed that the dissemination of pornographic content and the promotion of follower markets are actively operational on Twitter. This clearly indicates loopholes in Twitter's spam detection techniques. Through this work, five types of spammers have been identified: sole spammers, pornographic users, follower-market merchants, fake profiles, and compromised profiles. For detection purposes, data on around 100,000 (1 lakh) Twitter users with their 20 million tweets has been collected. Users have been classified based on trust-, user-, and content-based features using machine learning techniques such as Bayes Net, Logistic Regression, J48, Random Forest, and AdaBoostM1. The experimental results show that the Random Forest classifier is able to predict spammers with an accuracy of 92.1%. Based on these initial classification results, a novel system for real-time streaming of users for spam detection has been developed. We envision that such a system should give Twitter users an indication of the identity of other users in real time.

5.
Spam in online social networks (OSNs) is a systemic problem that poses a threat to these services by undermining their value to advertisers and potential investors, as well as negatively affecting users' engagement. As spammers continuously create new accounts and evasive techniques upon being caught, a deeper understanding of their spamming strategies is vital to the design of future social media defense mechanisms. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics. Our analysis includes over 100 million messages collected from Twitter over the course of 1 month. We show that there exist two behaviorally distinct categories of spammers and that they employ different spamming strategies. Then, we illustrate how users in these two categories demonstrate different individual properties as well as social interaction patterns. Finally, we analyze the detectability of spam accounts with respect to three categories of features, namely content attributes, social interactions, and profile properties.

6.
Unsolicited or spam email has recently become a major threat that can negatively impact the usability of electronic mail. Spam substantially wastes time and money for business users and network administrators, consumes network bandwidth and storage space, and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. In this paper, we explore the application of the GMDH (Group Method of Data Handling) based inductive learning approach in detecting spam messages by automatically identifying content features that effectively distinguish spam from legitimate emails. We study the performance for various network model complexities using spambase, a publicly available benchmark dataset. Results reveal that classification accuracies of 91.7% can be achieved using only 10 out of the available 57 attributes, selected through abductive learning as the most effective feature subset (i.e. 82.5% data reduction). We also show how to improve classification performance using abductive network ensembles (committees) trained on different subsets of the training data. Comparison with other techniques such as neural networks and naïve Bayesian classifiers shows that the GMDH-based learning approach can provide better spam detection accuracy with false-positive rates as low as 4.3% and yet requires shorter training time.

7.

In today's connected world, there is more data than we could imagine. The number of network users is increasing day by day, and a large number of social networks keep users connected all the time. These social networks give users complete freedom to post data of political, commercial, or entertainment value. Some data may be sensitive and, as a result, have a greater impact on society. The trustworthiness of data is important when it comes to public social networking sites such as Facebook and Twitter. Because of their large user bases and openness, there is a high likelihood of spam messages spreading in these networks. Spam detection is a technique for identifying and marking data as false. Many machine learning approaches have been proposed to detect spam in social networks. The efficiency of any spam detection algorithm is determined by its cost factor and accuracy. Aiming to improve the detection of spam in social networks, this study proposes using statistics-based features, modelled through the supervised boosting approach called stochastic gradient boosting, to evaluate English-language Twitter data sets. The performance of the proposed model is evaluated using simulation results.


8.
Twitter provides search services to help people find users to follow by recommending popular users or the friends of their friends. However, these services neither offer the most relevant users to follow nor provide a way to find the most interesting tweet messages for each user. Recently, collaborative filtering techniques for recommendations based on friend relationships in social networks have been widely investigated. However, since such techniques do not work well when friend relationships are not sufficient, we need to take advantage of as much other information as possible to improve the performance of recommendations. In this paper, we propose TWILITE, a recommendation system for Twitter using probabilistic modeling based on latent Dirichlet allocation, which recommends the top-K users to follow and the top-K tweets to read for a user. Our model can capture the realistic process of posting tweet messages by generalizing an LDA model, as well as the process of connecting to friends by utilizing matrix factorization. We next develop an inference algorithm based on the variational EM algorithm for learning model parameters. Based on the estimated model parameters, we also present effective personalized recommendation algorithms to find the users to follow as well as the interesting tweet messages to read. The performance study with real-life data sets confirms the effectiveness of the proposed model and the accuracy of our personalized recommendations.

9.
Twitter is among the fastest-growing microblogging and online social networking services. Messages posted on Twitter (tweets) have been reporting everything from daily life stories to the latest local and global news and events. Monitoring and analyzing this rich and continuous user-generated content can yield unprecedentedly valuable information, enabling users and organizations to acquire actionable knowledge. This article provides a survey of techniques for event detection from Twitter streams. These techniques aim at finding real-world occurrences that unfold over space and time. In contrast to conventional media, event detection from Twitter streams poses new challenges. Twitter streams contain large amounts of meaningless messages and polluted content, which negatively affect the detection performance. In addition, traditional text mining techniques are not suitable, because of the short length of tweets, the large number of spelling and grammatical errors, and the frequent use of informal and mixed language. Event detection techniques presented in the literature address these issues by adapting techniques from various fields to the uniqueness of Twitter. This article classifies these techniques according to the event type, detection task, and detection method and discusses commonly used features. Finally, it highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.

10.
In this paper, we present a generic statistical approach to identify spam profiles on Online Social Networks (OSNs). Our study is based on real datasets containing both normal and spam profiles crawled from the Facebook and Twitter networks. We have identified a set of 14 generic statistical features to identify spam profiles. The identified features are common to both Facebook and Twitter. For the classification task, we have used three different classification algorithms (naïve Bayes, Jrip, and J48) and evaluated them on both individual and combined datasets to establish the discriminative property of the identified features. On the combined dataset, the detection rate (DR) is 0.957 and the false positive rate (FPR) is 0.048; on the Facebook dataset, the DR and FPR are 0.964 and 0.089, respectively; and on the Twitter dataset, they are 0.976 and 0.075, respectively. We have also analyzed the contribution of each individual feature towards the detection accuracy of spam profiles. Thereafter, we have considered the 7 most discriminative features and proposed a clustering-based approach to identify spam campaigns on the Facebook and Twitter networks.
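
For readers unfamiliar with the two reported metrics, the following minimal sketch computes a detection rate and a false positive rate from confusion-matrix counts, using the standard definitions DR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name `detection_rates` and the counts are hypothetical; they were chosen only so that the output matches the combined-dataset figures quoted above and are not taken from the paper.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (detection rate, false positive rate) from confusion-matrix counts."""
    dr = tp / (tp + fn) if (tp + fn) else 0.0    # share of spam profiles caught
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # share of normal profiles flagged
    return dr, fpr


# Hypothetical counts: 1000 spam profiles and 1000 normal profiles.
dr, fpr = detection_rates(tp=957, fn=43, fp=48, tn=952)
print(f"DR={dr:.3f} FPR={fpr:.3f}")  # prints "DR=0.957 FPR=0.048"
```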

11.
For the last decade, online social networking services have consistently shown explosive annual growth and have become some of the most widely used applications and services. Large amounts of social relation information accumulate on these platforms, and advanced services, such as targeted advertising and viral marketing, have been introduced to exploit this social information. Although many prior social relation-based services have been commerce oriented, we propose employing social relations to improve online security. Specifically, we propose that real social networks possess unique characteristics that are difficult to imitate through random or artificial networks. Also, the social relations of each individual are unique, like a fingerprint or an iris. These observations lead to the development of the Social Relation Key (SRK) concept. We applied the SRK concept to two real-world use cases: detecting spam SMS messages and pinpointing fraudulent Twitter followers. Since spammers multicast the same SMS to multiple, randomly selected receivers, whereas normal users multicast an SMS to friends or acquaintances who know each other, we devise a detection scheme that makes use of a clustering coefficient. We conducted a large-scale experiment using an SMS log obtained from a major cellular network operator in Korea and observed that the proposed scheme performs significantly better than conventional content-based Naive Bayesian Filtering (NBF). To detect fraud in Twitter followers, we use different social network signatures, namely isomorphic triadic counts and the property of social status. The experiment based on a Twitter dataset again confirmed the feasibility of the SRK. Our code is available on a website.
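
The intuition behind the clustering-coefficient scheme can be sketched as follows: for one multicast SMS, measure the fraction of recipient pairs who are themselves connected. This is a minimal illustration under stated assumptions, not the authors' detection scheme; the function name `recipient_clustering`, the `contacts` adjacency map, and the toy data are all hypothetical.

```python
from itertools import combinations


def recipient_clustering(recipients, contacts):
    """Fraction of recipient pairs that know each other (0.0 if fewer than 2 recipients)."""
    unique = sorted(set(recipients))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0
    linked = sum(
        1
        for a, b in pairs
        if b in contacts.get(a, set()) or a in contacts.get(b, set())
    )
    return linked / len(pairs)


# Hypothetical contact graph: alice, bob, and carol all know each other.
contacts = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
}

friend_score = recipient_clustering(["alice", "bob", "carol"], contacts)
random_score = recipient_clustering(["alice", "x1", "x2"], contacts)
```

A message sent to a circle of mutual acquaintances scores close to 1, while a message sent to randomly selected numbers scores close to 0. A real detector would still need a threshold tuned on labeled data.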

12.
With the growing use of email as an essential and popular means of communication over the Internet, there comes a serious threat that impacts the Internet and society. This problem is known as spam. By receiving spam messages, Internet users are exposed to security issues, and minors are exposed to inappropriate content. Moreover, spam messages waste resources in terms of storage, bandwidth, and productivity. What makes the problem worse is that spammers keep inventing new techniques to dodge spam filters. In addition, the massive data flow of hundreds of millions of individuals and the large number of attributes make the problem more cumbersome and complex. Therefore, proposing evolutionary and adaptable spam detection models becomes a necessity. In this paper, an intelligent detection system based on a Genetic Algorithm (GA) and a Random Weight Network (RWN) is proposed to deal with email spam detection tasks. In addition, an automatic identification capability is embedded in the proposed system to detect the most relevant features during the detection process. The proposed system is evaluated through a series of extensive experiments based on three email corpora. The experimental results confirm that the proposed system can achieve remarkable results in terms of accuracy, precision, and recall. Furthermore, the proposed detection system can automatically identify the most relevant features of spam emails.

13.
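Traditional spam SMS filtering schemes judge messages by the sensitive words that appear in spam, but they ignore the contribution that words appearing in legitimate messages make to classification; in addition, because SMS wording is flexible, feature extraction is difficult. This paper proposes a scheme for monitoring and filtering spam SMS based on the SVM algorithm. The scheme represents SMS text in a vector space using features such as message content and message length, and then identifies and filters spam messages through machine learning. Compared with traditional methods, the system achieves substantial improvements in both filtering accuracy and efficiency.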

14.
《Information & Management》2016,53(8):987-996
Social media is a major platform for opinion sharing. In order to better understand and exploit opinions on social media, we aim to classify users with opposite opinions on a topic for decision support. Rather than mining text content, we introduce a link-based classification model, named global consistency maximization (GCM) that partitions a social network into two classes of users with opposite opinions. Experiments on a Twitter data set show that: (1) our global approach achieves higher accuracy than two baseline approaches and (2) link-based classifiers are more robust to small training samples if selected properly.

15.
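With the development of social networking platforms, social networks have become an important source of information. However, their convenience has also led to the rapid spread of false rumors. Compared with text-only rumors, online rumors that carry multimedia information mislead users and spread more easily, so detecting multimodal online rumors is of great practical significance. Researchers have proposed several multimodal rumor detection methods, but none of them fully exploits visual features or joint representations that fuse text and vision. To address these shortcomings, an end-to-end multimodal fusion network based on deep learning is proposed. The network first extracts the visual features of each region of interest in an image, then uses a multi-head attention mechanism to update and fuse the textual and visual features, and finally concatenates these features using an attention mechanism for multimodal rumor detection in social networks. Comparative experiments on public Twitter and Weibo datasets show that the proposed method improves the F1 score by 13.4% on the Twitter dataset and by 1.6% on the Weibo dataset.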

16.
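Social networking services (SNS) have become part of everyday life. People upload their information to the network and manage their social circles through social networking sites, which results in large amounts of personal information being made public on social networks. Based on the Twitter platform, this paper designs and implements community discovery for the Twitter user relationship network. Twitter user information is collected in real time to reconstruct the relationship network, and an improved version of Newman's fast algorithm is used to partition the network into communities. The community relationships of users are presented through a visual interface, together with users' online behavior, providing a reference for decision makers engaged in public opinion monitoring or personalized recommendation.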
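
Newman's fast algorithm greedily merges communities so as to increase modularity Q, so the quantity being optimized is worth seeing concretely. The sketch below only computes Q for a given partition of an undirected graph, using Q = sum over communities of (internal edges / m) minus (total degree / 2m) squared; it does not implement the merging procedure or the improvements described in the paper. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of node degrees in the community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (hypothetical): two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_single = modularity(edges, one_group)   # 0.0
```

Splitting the graph at the bridge gives a higher Q than keeping everything in one community, which is the signal a greedy modularity method follows when choosing which communities to merge.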

17.
A variety of approaches have recently been proposed to automatically infer users' personality from their user-generated content in social media. Approaches differ in terms of the machine learning algorithms and feature sets used, the type of footprint utilized, and the social media environment used to collect the data. In this paper, we perform a comparative analysis of state-of-the-art computational personality recognition methods on a varied set of social media ground truth data from Facebook, Twitter and YouTube. We answer three questions: (1) Should personality prediction be treated as a multi-label prediction task (i.e., all personality traits of a given user are predicted at once), or should each trait be identified separately? (2) Which predictive features work well across different on-line environments? and (3) What is the decay in accuracy when porting models trained in one social media environment to another?

18.
Despite the enormous importance of e-mail to current worldwide communication, the increase in spam deliveries has had a significant adverse effect on all its users. In order to adequately fight spam, both the filtering industry and the scientific community have developed and deployed the fastest and most accurate filtering techniques. However, the increasing volume of new incoming messages needing classification, together with the lack of adequate support for anti-spam services on the cloud, makes filtering efficiency an absolute necessity. In this context, and given the extensive utilization and increasing significance of rule-based filtering frameworks for the anti-spam domain, this work studies and analyses the importance of both existing and novel scheduling strategies to make the most of currently available anti-spam filtering techniques. Results obtained from the experiments demonstrated that some scheduling alternatives resulted in time savings of up to 26% for filtering messages while maintaining the same classification accuracy.

19.
Customer engagement is drastically improved through Web 2.0 technologies, especially social media platforms like Twitter. These platforms are often used by organizations for marketing, in which the creation of numerous spam profiles for content promotion is common. The present paper proposes a hybrid approach for identifying spam profiles by combining social media analytics and bio-inspired computing. It adopts a modified K-Means integrated Lévy flight Firefly Algorithm (LFA) with chaotic maps as an extension to the Firefly Algorithm (FA) for spam detection in Twitter marketing. A total of 1,844,701 tweets from 14,235 Twitter profiles have been analyzed on 13 statistically significant factors derived from social media analytics. A Fuzzy C-Means Clustering approach is further used to identify the overlapping users in the two clusters of spammers and non-spammers. Six variants of K-Means integrated FA, including chaotic maps and Lévy flights, are tested. The findings indicate that FA with chaos for tuning the attractiveness coefficient using the Gauss map converges to a working solution the fastest. Further, LFA with chaos for tuning the absorption coefficient using the sinusoidal map outperforms the rest of the approaches in terms of accuracy.

20.

Twitter has become a popular microblogging service that allows millions of active users to share news, emergent social events, personal opinions, and so on. This produces a large amount of data every day, and managing tweets becomes extremely difficult. To categorize tweets and make them easier to search, users can embed hashtags in their tweets. However, the choice of hashtags is not restricted, which leads to a very heterogeneous set of hashtags on Twitter and increases the difficulty of tweet categorization. In this paper, we propose a hashtag recommendation method based on analyzing the content of tweets, user characteristics, and currently popular hashtags on Twitter. The proposed method uses the personal profiles of users to discover relevant hashtags. First, a combination of tweet content and user characteristics is used to find the top-k similar tweets. We exploit the content of historical tweets, the hashtags used, and social interactions to build the user profiles. The user characteristics help to find close users and improve the accuracy of finding similar tweets from which to extract hashtag candidates. Then the set of hashtag candidates is ranked based on their popularity over long and short periods. Experiments on tweet data showed that the proposed method significantly improves the performance of hashtag recommendation systems.
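
The final ranking step, ordering hashtag candidates by popularity over long and short periods, could look roughly like the sketch below. The abstract does not give the actual scoring formula, so the linear blend, the weight `alpha`, the function name `rank_hashtags`, and the toy windows are all assumptions made for illustration.

```python
from collections import Counter


def rank_hashtags(candidates, long_window, short_window, alpha=0.5):
    """Rank candidates by a blend of short-term and long-term relative frequency.

    long_window / short_window: lists of hashtags observed in each period.
    alpha: weight of the short-term share (assumed value, not from the paper).
    """
    long_counts = Counter(long_window)
    short_counts = Counter(short_window)
    long_total = max(len(long_window), 1)
    short_total = max(len(short_window), 1)

    def score(tag):
        short_share = short_counts[tag] / short_total
        long_share = long_counts[tag] / long_total
        return alpha * short_share + (1 - alpha) * long_share

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda tag: (-score(tag), tag))


# Hypothetical observations: "news" dominates long term, "sports" is trending now.
long_window = ["news"] * 6 + ["sports"] * 3 + ["music"]
short_window = ["sports"] * 3 + ["news"]
ranked = rank_hashtags(["music", "news", "sports"], long_window, short_window)
# ranked == ["sports", "news", "music"]
```

Raising `alpha` favors currently trending hashtags, while lowering it favors hashtags with sustained popularity.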
