Similar literature
 20 similar documents found (search time: 31 ms)
1.
Online social networks have become immensely popular in recent years and are now major sources for tracking the reverberation of events and news throughout the world. However, their diversity and popularity attract malicious users who inject new forms of spam. Spamming is a malicious activity in which a fake user spreads unsolicited messages in the form of bulk messages, fraudulent reviews, malware or viruses, hate speech, profanity, or advertising for marketing scams. In addition, spammers usually form a connected community of spam accounts and use it to spread spam to a large set of legitimate users. Consequently, it is highly desirable to detect such spammer communities in social networks. Although a significant amount of work has been done on detecting spam messages and accounts, little research has addressed spammer communities and hidden spam accounts. In this work, an unsupervised approach called SpamCom is proposed for detecting spammer communities in Twitter. We model the Twitter network as a multilayer social network and exploit overlapping community-based features of users, represented in the form of hypergraphs, to identify spammers based on their structural behavior and URL characteristics. The use of community-based features, graph and URL characteristics of user accounts, and content similarity among users makes our technique very robust and efficient.

2.
With the rise of social networking services such as Facebook and Twitter, the problem of spam and content pollution has become more significant and intractable. Using social networking services, users are able to develop relationships and share messages with others in a very convenient manner; however, they are vulnerable to receiving spam messages. The automatic detection of spammers or content polluters on the network can effectively reduce the burden on the service provider in making a decision on appropriate counteractions. Content polluters can be automatically identified by using the supervised learning technique of artificial intelligence. To build a classification model with high accuracy automatically from the training data set, it is important to identify a set of useful features that can classify polluters and non-polluters. Moreover, because we deal with a huge amount of raw data in this process, the efficiency of data preparation and model creation are also critical issues that need to be addressed. In this paper, we present an efficient method for detecting content polluters on Twitter. Specifically, we propose a set of features that can be easily extracted from the messages and behaviors of Twitter users and construct a new breed of classifiers based on these features. The proposed approach requires only a minimal number of feature values per Twitter user and thus adds considerably less time to the overall mining process compared to other methods. Experiments confirm that the proposed approach outperforms previous approaches in both classification accuracy and processing time.

3.
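杨超, 秦廷栋, 范波, 李涛. 《计算机科学》 (Computer Science), 2018, 45(11): 138-142, 159
Danger theory from artificial immune systems is introduced into the analysis of user behavior features in order to identify microblog spammer ("water army") accounts effectively. Taking Sina Weibo as an example, the behavioral characteristics of Sina Weibo spammers are analyzed, and feature attributes such as total number of posts, account level, verification status, Sunshine Credit score, and number of followers are selected. The results of the attribute analysis serve as the feature signals that distinguish spammers from normal users, and spammer identification is implemented with the Dendritic Cells Algorithm (DCA). Validation and comparison experiments on real Sina Weibo user data show that the method detects spammer accounts on Sina Weibo effectively and with high detection accuracy.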

4.
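A method for discovering microblog spammers based on relationship-graph features (total citations: 1; self-citations: 0; citations by others: 1)
As spammer ("water army") strategies keep evolving, traditional discovery methods based on user content and user behavior are becoming steadily less effective at identifying new types of social-network spammers. Spammer accounts can change their post content and reposting behavior, but they cannot change their link relationships with normal users in the network, so the resulting graph structure is relatively stable. Compared with content and behavior features, user relationship features are therefore more robust and more accurate for spammer identification. On this basis, this paper proposes a method for identifying microblog spammer accounts based on user relationship-graph features. In the experiments, Sina Weibo network data are first collected with a crawler; then the attribute, temporal, and relationship-graph features of users are extracted; finally, three machine learning algorithms are used to classify users. Simulation results show that adding the new features improves the accuracy and recall of spammer account identification by more than 5%, which confirms the effectiveness of relationship-graph features for spammer identification.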

5.
Spam in online social networks (OSNs) is a systemic problem that poses a threat to these services by undermining their value to advertisers and potential investors, as well as negatively affecting users' engagement. As spammers keep creating new accounts and evasive techniques after being caught, a deeper understanding of their spamming strategies is vital to the design of future social media defense mechanisms. In this work, we present a unique analysis of spam accounts in OSNs viewed through the lens of their behavioral characteristics. Our analysis includes over 100 million messages collected from Twitter over the course of 1 month. We show that there exist two behaviorally distinct categories of spammers and that they employ different spamming strategies. Then, we illustrate how users in these two categories demonstrate different individual properties as well as social interaction patterns. Finally, we analyze the detectability of spam accounts with respect to three categories of features, namely content attributes, social interactions, and profile properties.

6.
Social networks, once an innocuous platform for sharing pictures and thoughts among a small online community of friends, have now been transformed into a powerful tool of information, activism, mobilization, and sometimes abuse. Detecting the true identity of social network users is an essential step toward making social media an efficient channel of communication. This paper targets the microblogging service Twitter as the social network of choice for investigation. It has been observed that the dissemination of pornographic content and the promotion of follower markets are actively operational on Twitter. This clearly indicates loopholes in Twitter's spam detection techniques. Through this work, five types of spammers have been identified: sole spammers, pornographic users, followers market merchants, fake profiles, and compromised profiles. For detection, data on around 1 lakh (100,000) Twitter users with their 20 million tweets has been collected. Users have been classified based on trust, user, and content-based features using machine learning techniques such as Bayes Net, Logistic Regression, J48, Random Forest, and AdaBoostM1. The experimental results show that the Random Forest classifier is able to predict spammers with an accuracy of 92.1%. Based on these initial classification results, a novel system for real-time streaming of users for spam detection has been developed. We envision that such a system should provide an indication to Twitter users about the identity of users in real time.

7.
Twitter spam detection is a recent area of research in which most previous work has focused on the identification of malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without previous information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. In this paper we present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34 K trending topics and 20 million tweets. Then, we proposed a reduced set of features that are hard for spammers to manipulate. In addition, we developed a machine learning system with some orthogonal features that can be combined with other sets of features with the aim of analyzing emergent characteristics of spam in social networks. We also conducted an extensive evaluation process that allowed us to show that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to Twitter spam detection in trending topics in real time, mainly because it analyzes tweets instead of user accounts.

8.
Customer engagement is drastically improved through Web 2.0 technologies, especially social media platforms like Twitter. These platforms are often used by organizations for marketing, in which the creation of numerous spam profiles for content promotion is common. The present paper proposes a hybrid approach for identifying spam profiles by combining social media analytics and bio-inspired computing. It adopts a modified K-Means integrated Levy flight Firefly Algorithm (LFA) with chaotic maps as an extension to the Firefly Algorithm (FA) for spam detection in Twitter marketing. A total of 1,844,701 tweets from 14,235 Twitter profiles have been analyzed on 13 statistically significant factors derived from social media analytics. A Fuzzy C-Means Clustering approach is further used to identify the overlapping users in the two clusters of spammers and non-spammers. Six variants of K-Means integrated FA, including chaotic maps and Levy flights, are tested. The findings indicate that FA with chaos for tuning the attractiveness coefficient using the Gauss map converges to a working solution the fastest. Further, LFA with chaos for tuning the absorption coefficient using the sinusoidal map outperforms the rest of the approaches in terms of accuracy.

9.
For the last decade, online social networking services have consistently shown explosive annual growth, and have become some of the most widely used applications and services. Large amounts of social relation information accumulate on these platforms, and advanced services, such as targeted advertising and viral marketing, have been introduced to exploit this social information. Although many prior social relation-based services have been commerce oriented, we propose employing social relations to improve online security. Specifically, we propose that real social networks possess unique characteristics that are difficult to imitate through random or artificial networks. Also, the social relations of each individual are unique, like a fingerprint or an iris. These observations lead to the development of the Social Relation Key (SRK) concept. We applied the SRK concept in different real-world use cases, including the detection of spam SMS messages and the pinpointing of fraud in Twitter followers. Since spammers multicast the same SMS to multiple, randomly selected receivers, whereas normal users multicast an SMS to friends or acquaintances who know each other, we devise a detection scheme that makes use of a clustering coefficient. We conducted a large-scale experiment using an SMS log obtained from a major cellular network operator in Korea, and observed that the proposed scheme performs significantly better than conventional content-based Naive Bayesian Filtering (NBF). To detect fraud in Twitter followers, we use different social network signatures, namely isomorphic triadic counts and the property of social status. The experiment based on a Twitter dataset again confirmed the feasibility of the SRK. Our code is available on a website.
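
The clustering-coefficient idea in this abstract can be illustrated with a minimal sketch. It is not the authors' scheme: the function name `recipient_clustering`, the `contacts` adjacency map, and the toy user names are all assumptions made for illustration. The sketch computes the fraction of recipient pairs of one multicast message who are themselves in contact, which should be high for messages sent to friends and near zero for messages sent to random receivers.

```python
from itertools import combinations


def recipient_clustering(recipients, contacts):
    """Fraction of recipient pairs that are in contact with each other.

    recipients: iterable of user ids that received the same message.
    contacts:   dict mapping a user id to the set of users it is known to
                contact (hypothetical structure, treated as undirected).
    """
    unique = sorted(set(recipients))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0  # fewer than two recipients: no pair to inspect
    linked = sum(
        1
        for a, b in pairs
        if b in contacts.get(a, set()) or a in contacts.get(b, set())
    )
    return linked / len(pairs)


# Toy contact graph: ann, bob and cat all know each other.
contacts = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
}

friends_score = recipient_clustering(["ann", "bob", "cat"], contacts)
random_score = recipient_clustering(["ann", "dan", "eve"], contacts)
```

In this toy graph the message sent to three mutual friends scores 1.0, while the message sent to unrelated receivers scores 0.0. Where to place a decision threshold between the two is left open here, since the abstract does not state one.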

10.
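A fake-user detection method for social networks based on two-layer sampling active learning (total citations: 1; self-citations: 0; citations by others: 1)
The rapid development of social networks has brought convenience to users, but their openness makes them vulnerable to fake users. Fake users exploit social networks to spread false information for their own purposes, which seriously undermines the security and stability of social networks. Current methods for detecting fake users in social networks mainly classify users by features such as behavior, text, and network relationships; because manually labeling user data is costly, classifiers have too few labeled samples available. To address this problem, this paper proposes a fake-user detection method for social networks based on two-layer sampling active learning. The method evaluates the value of unlabeled samples with three indicators (uncertainty, representativeness, and diversity) and screens unlabeled samples with a two-layer sampling algorithm that combines ranking and clustering, selecting the most valuable samples for expert labeling and using them to train the classification model. Experiments on the Twitter, Apontador, and Youtube datasets show that, when labeled samples are scarce, the proposed method achieves detection performance close to that of supervised learning with only a small number of labeled samples. Compared with other active learning methods, it also attains higher accuracy and recall while requiring fewer labeled samples.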

11.
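With the development of user-centered Web 2.0, social networking platforms have penetrated every aspect of life with striking influence, and sentiment analysis of social network content has become a hot research topic. Online social networking sites such as Twitter and Sina Weibo have attracted large numbers of users, whose interactions generate a great deal of information containing social relations among users, and these social relations are widely used in social network sentiment analysis. Sentiment analysis that incorporates social relations applies the relations formed through user interaction to the analysis of the content users post on social networks, with the aim of addressing the low accuracy caused by short, terse text, ambiguous semantics, and sparse features. This paper surveys research progress on social network sentiment analysis that incorporates social relations: it reviews and analyzes the main methods, lists the key problems, and finally discusses research trends and prospects and gives a summary.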

12.
Web spam has become one of the most exciting challenges and threats to web search engines. The relationship between search systems and those who try to manipulate them gave rise to the field of adversarial information retrieval. In this article, we set up several experiments that compare HostRank and TrustRank to show how effective TrustRank is at combating web spam, and we report a comparison of different link-based web spam detection algorithms.

13.

Twitter has become a popular microblogging and social media platform for news and discussions. The dramatic growth of the platform has also set off a dramatic increase in spam on it. Supervised machine learning always requires a labeled Twitter dataset, so it is desirable to design a semi-supervised labeling technique for labeling newly prepared, recent datasets. Preparing a labeled dataset requires a lot of human effort. This issue has motivated us to propose an efficient approach for preparing labeled datasets so that time can be saved and human errors can be avoided. Our proposed approach relies on features that are readily available in real time, for better performance and wider applicability. This work aims at collecting the most recent tweets of a user using Twitter streaming and preparing a recent Twitter dataset. A semi-supervised machine learning algorithm based on the self-training technique was then designed for labeling the tweets. Semi-supervised support vector machine and semi-supervised decision tree classifiers were used as base classifiers in the self-training technique. Further, the authors applied the K-means clustering algorithm to the tweets based on the tweet content. The proposed approach is an ensemble of semi-supervised and unsupervised learning, in which semi-supervised algorithms were found to be more accurate in prediction than unsupervised ones. To assign labels to the tweets effectively, the authors implemented voting in this approach, and the label predicted by the majority voting classifier is the actual label assigned in the tweet dataset. A maximum accuracy of 99.0% is reported in this paper using a majority voting classifier for spam labeling.
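
The majority-voting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function `majority_vote`, the three label lists, and the label names "spam" and "ham" are hypothetical, and the abstract does not say how ties are broken. With an odd number of voters and two classes, as here, no tie can occur.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per tweet.

    predictions: list of label lists, one list per classifier, all of the
                 same length and in the same tweet order.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the top label
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of two self-training classifiers and one clustering
# step, for four tweets.
svm_labels = ["spam", "ham", "spam", "ham"]
tree_labels = ["spam", "spam", "ham", "ham"]
kmeans_labels = ["ham", "spam", "spam", "ham"]

final_labels = majority_vote([svm_labels, tree_labels, kmeans_labels])
```

Each of the first three tweets is labeled "spam" by two of the three voters, so the combined labels are spam, spam, spam, ham.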


14.
Chen Ailin, Yang Pin, Cheng Pengsen. The Journal of Supercomputing, 2022, 78(2): 2744-2771

Rumors, advertisements and malicious links are spread in social networks by social spammers, which disrupts users' normal access to social networks and causes security problems. Most methods aim to detect social spammers using various features, such as content features, behavior features and relationship graph features, and rely on large-scale labeled data. However, labeled data for training are scarce in the real world, and manual annotation is time-consuming and labor-intensive. To solve this problem, we propose a novel method that combines an active learning algorithm with a co-training algorithm to make full use of unlabeled data. In co-training, user features are divided into two non-overlapping views. Classifiers are trained iteratively with labeled instances and with the most confident unlabeled instances, which receive pseudo-labels. In active learning, the most representative and uncertain instances are selected and annotated with real labels to extend the labeled dataset. Experimental results on the Twitter and Apontador datasets show that our method can effectively detect social spammers when labeled data are limited.
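
The split between confident pseudo-labeling and uncertain instances sent for annotation can be sketched as below. The sketch makes several assumptions that are not in the abstract: a single classifier outputs a spam probability per user, the `confident` threshold and `budget` are hypothetical parameters, and uncertainty is measured only by distance from 0.5 (the representativeness criterion mentioned in the abstract is omitted).

```python
def pick_instances(spam_probability, confident=0.9, budget=2):
    """Split unlabeled users into pseudo-labeled and to-be-annotated sets.

    spam_probability: dict mapping a user id to the predicted P(spam).
    confident:        probability at or above which (or at or below
                      1 - confident) a pseudo-label is assigned.
    budget:           number of uncertain users sent to a human annotator.
    """
    pseudo_labels = {}
    for user, p in spam_probability.items():
        if p >= confident:
            pseudo_labels[user] = "spam"
        elif p <= 1.0 - confident:
            pseudo_labels[user] = "legit"

    # The remaining users are ranked by closeness to 0.5 (most uncertain first).
    uncertain = sorted(
        (user for user in spam_probability if user not in pseudo_labels),
        key=lambda user: abs(spam_probability[user] - 0.5),
    )
    return pseudo_labels, uncertain[:budget]


scores = {"u1": 0.97, "u2": 0.52, "u3": 0.05, "u4": 0.70, "u5": 0.45}
pseudo, to_annotate = pick_instances(scores)
```

With these toy scores, u1 and u3 receive pseudo-labels, and u2 and u5, whose scores lie closest to 0.5, are the two users chosen for manual annotation.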


15.
16.
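Web spam refers to spam pages that deceive search engines through content spamming and inter-page link spamming in order to raise their own search ranking; it degrades the accuracy and relevance of search results. A Web spam detection method based on the Co-Training model is proposed. It uses two mutually independent feature sets of a web page, content-based statistical features and link features based on the web graph, to build two independent base classifiers, and applies the Co-Training semi-supervised learning algorithm to improve classifier quality with the help of large amounts of unlabeled data. Experiments on the WEBSPAM-UK2007 dataset show that the algorithm improves the performance of the SVM classifier.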

17.
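With the development of social networking platforms, social networks have become an important source of information. Their convenience, however, has also led to the rapid spread of false rumors. Compared with text-only rumors, online rumors that carry multimedia content mislead users and spread more easily, so multimodal online rumor detection is of great practical importance. Researchers have proposed a number of multimodal rumor detection methods, but none of them fully exploits visual features or joint representations that fuse text and vision. To address these shortcomings, an end-to-end multimodal fusion network based on deep learning is proposed. The network first extracts visual features from each region of interest in an image, then uses a multi-head attention mechanism to update and fuse the textual and visual features, and finally concatenates these features with an attention-based mechanism for multimodal rumor detection on social networks. Comparison experiments on public Twitter and Weibo datasets show that the proposed method improves the F1 score by 13.4% on the Twitter dataset and by 1.6% on the Weibo dataset.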

18.
In this paper, we present a generic statistical approach to identify spam profiles on Online Social Networks (OSNs). Our study is based on real datasets containing both normal and spam profiles crawled from the Facebook and Twitter networks. We have identified a set of 14 generic statistical features to identify spam profiles. The identified features are common to both the Facebook and Twitter networks. For the classification task, we have used three different classification algorithms (naïve Bayes, JRip, and J48) and evaluated them on both individual and combined datasets to establish the discriminative property of the identified features. On the combined dataset, the detection rate (DR) is 0.957 and the false positive rate (FPR) is 0.048, whereas on the Facebook dataset the DR and FPR values are 0.964 and 0.089, respectively, and on the Twitter dataset they are 0.976 and 0.075, respectively. We have also analyzed the contribution of each individual feature towards the detection accuracy of spam profiles. Thereafter, we have considered the 7 most discriminative features and proposed a clustering-based approach to identify spam campaigns on the Facebook and Twitter networks.
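
For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how they are conventionally computed from a confusion matrix, taking DR as the true positive rate TP / (TP + FN) and FPR as FP / (FP + TN). The function name and the counts are hypothetical and are not taken from the paper's data.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (detection rate, false positive rate) from confusion counts.

    tp: spam profiles flagged as spam      fn: spam profiles missed
    fp: normal profiles flagged as spam    tn: normal profiles passed
    Assumes at least one spam profile and one normal profile.
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must contain at least one profile")
    dr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return dr, fpr


# Hypothetical counts for 100 spam and 100 normal profiles.
dr, fpr = detection_rates(tp=90, fn=10, fp=5, tn=95)
```

With these counts the detection rate is 0.9 and the false positive rate is 0.05.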

19.
In this article we first explain the knowledge extraction (KE) process from the World Wide Web (WWW) using search engines. Then we explore the PageRank algorithm of the Google search engine (a well-known link-based search engine) with its hidden Markov analysis. We also explore one of the problems of link-based ranking algorithms, called hanging pages or dangling pages (pages without any forward links). The presence of these pages affects the ranking of Web pages. Some of the hanging pages may contain important information that cannot be neglected by the search engine during ranking. We propose methodologies to handle the hanging pages and compare the methodologies. We also introduce the TrustRank algorithm (an algorithm to handle the spamming problems in link-based search engines) and include it in our proposed methods so that our methods can combat Web spam. We implemented the PageRank algorithm and the TrustRank algorithm and modified those algorithms to implement our proposed methodologies.
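
To make the hanging-page problem concrete, here is a minimal power-iteration sketch of PageRank in which the rank held by pages without forward links is redistributed uniformly over all pages. This is a common textbook treatment and only an assumption here; the abstract does not specify the authors' own methodologies, and TrustRank is not covered. The function name, the damping factor of 0.85, the iteration count, and the toy link graph are all illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank with uniform redistribution of dangling mass.

    links: dict mapping a page to the list of pages it links to; pages that
           appear only as targets, or map to an empty list, are hanging pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank currently sitting on pages with no forward links.
        dangling_mass = sum(rank[page] for page in pages if not links.get(page))
        # Teleportation share plus an equal share of the dangling mass.
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling_mass / n
            for page in pages
        }
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "c" is a hanging page with no forward links.
links = {"a": ["b", "c"], "b": ["c"], "c": []}
ranks = pagerank(links)
```

Because the dangling mass is handed back to every page, the ranks keep summing to 1 instead of leaking away through page "c".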

20.
Today's e-commerce is highly dependent on the growing number of online customer reviews posted on opinion sharing websites. This fact, unfortunately, has tempted spammers to target opinion sharing websites in order to promote and demote products. To date, different types of opinion spam detection methods have been proposed in order to provide reliable resources for customers, manufacturers and researchers. However, supervised approaches suffer from imbalanced data due to the scarcity of spam reviews in datasets, filtering systems based on rating deviation are easily cheated by smart spammers, and content-based methods are very expensive and most of them have not yet been tested on real data. The aim of this paper is to propose a robust review spam detection system in which rating deviation, content-based factors and the activeness of reviewers are employed efficiently. To overcome the aforementioned drawbacks, all these factors are investigated together in suspicious time intervals captured from the time series of reviews by a pattern recognition technique. The proposed method could be a great asset in online spam filtering systems and could be used in data mining and knowledge discovery tasks as a standalone system to purify product review datasets. These systems can benefit from our method in terms of time efficiency and high accuracy. Empirical analyses on a real dataset show that the proposed approach is able to detect spam reviews successfully. Comparison with two current common methods indicates that our method achieves higher detection accuracy (F-Score: 0.86) while removing the need for specific metadata fields and reducing the heavy computation required for investigation purposes.
