Similar Documents
20 similar documents found (search time: 718 ms)
1.
Combining machine learning with social network analysis (SNA) can leverage vast amounts of social media data to better respond to crises. We present a case study using Twitter data from the March 2019 Nebraska floods in the United States, which caused over $1 billion in damage in the state and widespread evacuations of residents. We use a subset of machine learning, deep learning (DL), to classify text content of 11,982 tweets, and we integrate that with SNA to understand the structure of tweet interactions. Our DL approach pre‐trains our model with a DL language technique, BERT, and then trains the model using the standard training dataset to sort a dataset of tweets into classes tailored to crisis events. Several performance measures demonstrate that our two‐tiered trained model improves domain adaptation and generalization across different extreme weather event types. This approach identifies the role of Twitter during the damage containment stage of the flood. Our SNA identifies accounts that function as primary sources of information on Twitter. Together, these two approaches help crisis managers filter large volumes of data and overcome challenges faced by simple statistical models and other computational techniques to provide useful information during crises like flooding.  相似文献   
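
A minimal sketch of the two-tiered idea described above (a pre-trained BERT encoder, then task-specific fine-tuning on labelled crisis tweets), using the Hugging Face transformers library. The tweet texts, the example class names, and the tiny training loop are illustrative assumptions, not the authors' dataset or code.

```python
# Sketch: fine-tune a pre-trained BERT model to sort tweets into crisis classes.
# Assumes: pip install torch transformers; texts/labels below are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CLASSES = ["damage", "evacuation", "donation", "irrelevant"]   # hypothetical classes
texts = ["Bridge washed out on Highway 92", "Shelter open at the high school"]
labels = torch.tensor([0, 1])                                   # hypothetical labels

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(CLASSES))

enc = tok(texts, padding=True, truncation=True, max_length=64, return_tensors="pt")
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                      # a few fine-tuning steps on the toy batch
    out = model(**enc, labels=labels)   # returns loss and logits
    out.loss.backward()
    optim.step()
    optim.zero_grad()

model.eval()
pred = model(**enc).logits.argmax(dim=-1)
print([CLASSES[i] for i in pred])
```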

2.
Social networks, once an innocuous platform for sharing pictures and thoughts among small online communities of friends, have now transformed into a powerful tool for information, activism, mobilization, and sometimes abuse. Detecting the true identity of social network users is an essential step toward making social media an efficient channel of communication. This paper targets the microblogging service Twitter as the social network of choice for investigation. It has been observed that dissemination of pornographic content and promotion of a followers market are actively operational on Twitter, which clearly indicates loopholes in Twitter's spam detection techniques. Through this work, five types of spammers have been identified: sole spammers, pornographic users, followers-market merchants, fake profiles, and compromised profiles. For detection, data on around 100,000 (1 lakh) Twitter users and their 20 million tweets have been collected. Users have been classified based on trust-, user-, and content-based features using machine learning techniques such as Bayes Net, Logistic Regression, J48, Random Forest, and AdaBoostM1. The experimental results show that the Random Forest classifier is able to predict spammers with an accuracy of 92.1%. Based on these initial classification results, a novel system for real-time streaming of users for spam detection has been developed. We envision that such a system will give Twitter users a real-time indication of the identity of other users.  相似文献
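
A minimal sketch of the kind of feature-based spammer classification the abstract reports (a Random Forest over trust/user/content features). The three numeric features and the synthetic data are assumptions for illustration, not the paper's feature set.

```python
# Sketch: classify Twitter accounts as spam/legitimate from simple profile features.
# The features (follower ratio, tweets per day, URL share) are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 3))          # columns: follower ratio, tweets/day, fraction of tweets with URLs
y = (X[:, 2] > 0.7).astype(int)   # toy rule standing in for real spam labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
print("feature importances:", clf.feature_importances_)
```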

3.
Football is the team sport that attracts the largest mass audience. Because detailed information is available for nearly a century of championship matches, match records form a huge and valuable database for testing the prediction of match results. The problem of modeling football data has become increasingly popular in recent years, and machine learning has been used to predict match results in many studies. Our present work brings a new approach to predicting the match results of championships. This approach investigates match data in order to predict the results: win, draw, and defeat. The investigated groups were the pairwise combinations of the possible match results of each championship: win-draw, win-defeat, and draw-defeat. In this study we employed features recorded by scouts during a football match. The proposed system applies a polynomial algorithm to analyse and determine match results. Several machine-learning algorithms were compared with our approach in experiments using information from the football championships. The association between the polynomial algorithm and machine learning techniques allowed a significant increase in accuracy. Our polynomial algorithm provided an accuracy above 96%, selecting the relevant features from the training and testing sets.  相似文献
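
The pairwise grouping described above (win-draw, win-defeat, draw-defeat) is essentially a one-vs-one decomposition; the sketch below combines polynomial feature expansion with pairwise binary classifiers under that assumption. The scout features are placeholders, and the paper's specific polynomial algorithm is not reproduced.

```python
# Sketch: one-vs-one (pairwise) classification of match results with polynomial features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((300, 6))            # placeholder scout features (shots, passes, ...)
y = rng.integers(0, 3, size=300)    # 0 = win, 1 = draw, 2 = defeat (toy labels)

model = OneVsOneClassifier(         # trains win-draw, win-defeat and draw-defeat pairs
    make_pipeline(PolynomialFeatures(degree=2), StandardScaler(),
                  LogisticRegression(max_iter=1000)))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```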

4.
As the distinction between online and physical spaces rapidly degrades, social media have now become an integral component of how many people's everyday experiences are mediated. As such, increasing interest has emerged in exploring how the content shared through those online platforms comes to contribute to the collaborative creation of places in physical space at the urban scale. Exploring digital geographies of social media data using methods such as qualitative coding (i.e., content labelling) is a flexible but complex task, commonly limited to small samples due to its impracticality over large datasets. In this paper, we propose a new tool for studies in digital geographies, bridging qualitative and quantitative approaches, able to learn a set of arbitrary labels (qualitative codes) on a small, manually created sample and apply the same labels to a larger set. We introduce a semi-supervised, deep neural network approach to classify geo-located social media posts based on their textual and image content, as well as geographical and temporal aspects. Our innovative approach is rooted in our understanding of social media posts as augmentations of the time-space configurations that places are, and it comprises a stacked multi-modal autoencoder neural network to create joint representations of text and images, and a spatio-temporal graph convolution neural network for semi-supervised classification. The results presented in this paper show that our approach performs the classification of social media content with higher accuracy than traditional machine learning models as well as two state-of-the-art deep learning frameworks.  相似文献
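
A compact sketch of one building block mentioned above: a multi-modal autoencoder that learns a joint representation of text and image feature vectors, written in PyTorch. The feature dimensions, layer sizes, and random inputs are assumptions, and the paper's spatio-temporal graph convolution stage is not reproduced here.

```python
# Sketch: joint text+image representation with a small multi-modal autoencoder (PyTorch).
import torch
import torch.nn as nn

class MultiModalAE(nn.Module):
    def __init__(self, text_dim=300, img_dim=512, latent_dim=64):
        super().__init__()
        self.enc_text = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU())
        self.enc_img = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU())
        self.to_latent = nn.Linear(256, latent_dim)   # joint code shared by both modalities
        self.dec_text = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, text_dim))
        self.dec_img = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, img_dim))

    def forward(self, text, img):
        z = self.to_latent(torch.cat([self.enc_text(text), self.enc_img(img)], dim=1))
        return self.dec_text(z), self.dec_img(z), z

model = MultiModalAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
text = torch.randn(32, 300)   # placeholder text embeddings for 32 posts
img = torch.randn(32, 512)    # placeholder image embeddings for the same posts

for _ in range(5):            # a few reconstruction steps on the toy batch
    rec_t, rec_i, z = model(text, img)
    loss = nn.functional.mse_loss(rec_t, text) + nn.functional.mse_loss(rec_i, img)
    opt.zero_grad(); loss.backward(); opt.step()
print("joint representation shape:", z.shape)
```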

5.
The popularity of many social media sites has prompted both academic and practical research on the possibility of mining social media data for the analysis of public sentiment. Studies have suggested that public emotions shown through Twitter could be well correlated with the Dow Jones Industrial Average. However, it remains unclear how public sentiment, as reflected on social media, can be used to predict the stock price movement of a particular publicly-listed company. In this study, we attempt to fill this research void by proposing a technique, called SMeDA-SA, to mine Twitter data for sentiment analysis and then predict the stock movement of specific listed companies. For the purpose of experimentation, we collected 200 million tweets that mentioned one or more of 30 companies listed on NASDAQ or the New York Stock Exchange. SMeDA-SA performs its task by first extracting ambiguous textual messages from these tweets to create a list of words that reflects public sentiment. SMeDA-SA then makes use of a data mining algorithm to expand the word list by adding emotional phrases so as to better classify sentiments in the tweets. With SMeDA-SA, we discover that the stock movement of many companies can be predicted rather accurately, with an average accuracy over 70%. This paper describes how SMeDA-SA can be used to mine social media data for sentiment. It also presents the key implications of our study.  相似文献
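
SMeDA-SA itself is not public, so the sketch below only illustrates the general pipeline the abstract outlines: score tweets against a sentiment word list, aggregate by day, and relate the daily score to next-day stock movement. The word lists, tweets, and movement labels are all invented placeholders.

```python
# Sketch: lexicon-scored tweet sentiment aggregated per day, related to stock movement.
import numpy as np
from sklearn.linear_model import LogisticRegression

POSITIVE = {"gain", "beat", "strong", "up"}     # toy sentiment lexicon
NEGATIVE = {"loss", "miss", "weak", "down"}

def score(tweet):
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Placeholder data: tweets grouped by day and next-day movement (1 = up, 0 = down).
daily_tweets = [["strong earnings beat", "shares up"], ["weak guidance", "big miss"],
                ["up again"], ["heavy loss", "down day"]]
movement = np.array([1, 0, 1, 0])

X = np.array([[np.mean([score(t) for t in day])] for day in daily_tweets])
clf = LogisticRegression().fit(X, movement)
print("predicted movement for average sentiment +1:", clf.predict([[1.0]])[0])
```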

6.
A variety of approaches have recently been proposed to automatically infer users' personality from their user-generated content in social media. Approaches differ in terms of the machine learning algorithms and feature sets used, the type of digital footprint utilized, and the social media environment used to collect the data. In this paper, we perform a comparative analysis of state-of-the-art computational personality recognition methods on a varied set of social media ground truth data from Facebook, Twitter and YouTube. We answer three questions: (1) Should personality prediction be treated as a multi-label prediction task (i.e., all personality traits of a given user are predicted at once), or should each trait be identified separately? (2) Which predictive features work well across different online environments? and (3) What is the decay in accuracy when porting models trained in one social media environment to another?  相似文献

7.
Crisis events such as terrorist attacks are extensively commented upon on social media platforms such as Twitter. For this reason, social media content posted during emergency events is increasingly being used by news media and in social studies to characterize the public's reaction to those events. This is typically achieved by having journalists select 'representative' tweets to show, or by using a classifier trained on prior human-annotated tweets to provide a sentiment/emotion breakdown for the event. However, social media users, journalists and annotators do not exist in isolation; they each have their own context and world view. In this paper, we ask to what extent local and international biases affect the sentiments expressed on social media and the way that social media content is interpreted by annotators. In particular, we perform a multi-lingual study spanning two events and three languages. We show that there are marked disparities between the emotions expressed by users in different languages for an event. For instance, during the 2016 Paris attack, there were 16% more negative comments written in English than in French, even though the event originated on French soil. Furthermore, we observed that sentiment biases also affect annotators from those regions, which can negatively impact the accuracy of social media labelling efforts. This highlights the need to consider the sentiment biases of users in different countries, both when analysing events through the lens of social media and when using social media as a data source and for training automatic classification models.  相似文献

8.

Twitter has nowadays become a trending microblogging and social media platform for news and discussions. The dramatic growth of the platform has also set off a dramatic increase in spam on it. Supervised machine learning requires a labeled Twitter dataset, and preparing such a dataset demands a great deal of human effort, so it is desirable to design a semi-supervised technique for labeling newly collected datasets. This issue has motivated us to propose an efficient approach for preparing labeled datasets so that time can be saved and human errors can be avoided. Our proposed approach relies on features readily available in real time for better performance and wider applicability. This work collects the most recent tweets of a user through Twitter streaming and prepares a recent Twitter dataset. A semi-supervised machine learning algorithm based on the self-training technique was then designed for labeling the tweets, with semi-supervised support vector machine and semi-supervised decision tree classifiers used as base classifiers. Further, the authors applied the K-means clustering algorithm to the tweets based on their content. The proposed approach is an ensemble of semi-supervised and unsupervised learning, in which the semi-supervised algorithms were found to be more accurate in prediction than the unsupervised one. To assign labels to the tweets effectively, the approach applies majority voting: the label predicted by the majority of the classifiers is the label assigned to the tweet dataset. A maximum accuracy of 99.0% is reported in this paper using the majority voting classifier for spam labeling.  相似文献
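
A small sketch of the labeling pipeline described above, under the assumption that scikit-learn's SelfTrainingClassifier is an acceptable stand-in for the authors' self-training implementation: SVM and decision-tree base classifiers are self-trained on partially labeled data, K-means clusters are mapped to labels through the labeled points, and the three predictions are combined by majority vote. The synthetic data replaces the real tweet features.

```python
# Sketch: self-training (SVM + decision tree) plus K-means, combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y_true = make_classification(n_samples=400, n_features=10, random_state=0)
y = y_true.copy()
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.8      # hide 80% of the labels
y[unlabeled] = -1                          # -1 marks unlabeled samples for self-training

svm_st = SelfTrainingClassifier(SVC(probability=True)).fit(X, y)
tree_st = SelfTrainingClassifier(DecisionTreeClassifier(max_depth=5)).fit(X, y)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Map each cluster to the majority label among the points that kept their labels.
cluster_label = {c: np.bincount(y_true[(km.labels_ == c) & ~unlabeled]).argmax() for c in range(2)}
km_pred = np.array([cluster_label[c] for c in km.labels_])

votes = np.stack([svm_st.predict(X), tree_st.predict(X), km_pred])
majority = (votes.sum(axis=0) >= 2).astype(int)   # 2-of-3 vote over the binary labels
print("agreement with true labels:", (majority == y_true).mean())
```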

9.
Twitter is a vibrant platform that offers a quick and effective way to analyze users' perceptions of activities on social media. Many researchers and industry experts turn to Twitter sentiment analysis to understand stakeholder groups. Sentiment analysis requires advanced approaches that encompass data sentiment analysis and various machine learning tools. This paper presents an assessment of sentiment analysis in multiple fields that affect people in real time, using Naive Bayes and Support Vector Machine (SVM) classifiers, and focuses on analysing the distinguished sentiment techniques on tweet-behaviour datasets from various spheres such as healthcare and behaviour estimation. In addition, the results explore and validate the statistical machine learning classifiers and report the accuracy attained for positive, negative and neutral tweets. In this work, we used a Twitter Application Programming Interface (API) account and programmed the sentiment analysis approach in Python to computationally measure users' perceptions; the approach extracts a massive number of tweets and provides market value to the Twitter account proprietor. To distinguish the results in terms of performance evaluation, an error analysis investigates the needs of various stakeholders, comprising social media analytics researchers, Natural Language Processing (NLP) developers, engineering managers and experts involved in decision-making.  相似文献
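
A minimal sketch of the classifier comparison the abstract describes: Naive Bayes versus SVM on TF-IDF features for positive/negative/neutral tweets. The tiny tweet list and labels are placeholders, and fetching real tweets through the Twitter API is omitted.

```python
# Sketch: Naive Bayes vs. SVM for three-way tweet sentiment on TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

tweets = ["love this clinic, great care", "terrible waiting times", "appointment is at 3pm",
          "amazing staff", "worst service ever", "the report was sent today"]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]  # toy labels

for name, clf in [("Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    model = make_pipeline(TfidfVectorizer(), clf).fit(tweets, labels)
    print(name, model.predict(["great doctors", "sent the form today"]))
```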

10.
Web 2.0 and social media provide users with an opportunity to discuss and share opinions; as a result, a considerable amount of information emerges that can be drawn upon to determine demographic and behavioral features. This study is an attempt to predict gender, as a demographic feature, using linguistic features of data collected from users' comments on social media. For this purpose, a framework is proposed to predict users' gender by counting occurrences of given word classes, including verbs, pronouns, articles, adjectives, adverbs, prepositions and numbers. This framework was then tested using comments left by readers of the Los Angeles Times, and the model was observed to predict gender with an accuracy of 66.66%. Security solutions and e-marketing can use this framework for authentication and niche marketing, respectively.  相似文献
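
A toy sketch of the counting framework described above: each comment is turned into counts of a few word classes and a linear classifier predicts gender. The small word lists, comments, and labels are illustrative assumptions rather than the study's actual lexicons or data.

```python
# Sketch: predict gender from counts of word classes in a comment.
import numpy as np
from sklearn.linear_model import LogisticRegression

WORD_CLASSES = {                                    # tiny illustrative word lists
    "pronouns": {"i", "we", "he", "she", "they"},
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "at", "of", "with"},
}

def features(comment):
    words = comment.lower().split()
    return [sum(w in vocab for w in words) for vocab in WORD_CLASSES.values()]

comments = ["I think the article was great", "We met at the cafe with friends",
            "The game on Sunday was a blast", "She wrote a letter to the editor"]
gender = np.array([0, 1, 0, 1])                     # toy 0/1 labels for illustration

clf = LogisticRegression().fit([features(c) for c in comments], gender)
print(clf.predict([features("They stayed at the hotel in town")]))
```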

11.
Membrane proteins play an important role in the life activities of cells. At present, many methods exist for predicting and classifying membrane transport proteins, but little work has addressed the prediction of membrane protein function. To address this problem, a support vector machine method based on protein sequence information combined with the fast Fourier transform was used to predict the function of 1,817 membrane transport proteins from the TCDB database in three classes: channels/pores, electrochemical potential-driven transporters, and primary active transporters. The model takes the distribution of the 20 amino acids together with residue hydrophobicity, average polarity, and solvation free energy as the raw features, and uses the fast Fourier transform to convert them into frequency-domain information as the input features for machine learning. Five-fold cross-validation achieved a prediction accuracy of 72.1%, compared with 68.1% reported in the previous literature. This study demonstrates that the method can effectively classify membrane transport proteins of the three functional types: channels/pores, electrochemical potential-driven transporters, and primary active transporters.  相似文献
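
A minimal sketch of the feature construction the abstract describes: encode a protein sequence as a numeric property profile (here a partial, hypothetical hydrophobicity scale), take the magnitude of its fast Fourier transform as frequency-domain features, and train an SVM. The scale values, padding length, and toy sequences are assumptions.

```python
# Sketch: FFT of a hydrophobicity profile as SVM features for transporter classes.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

HYDRO = {"A": 1.8, "L": 3.8, "K": -3.9, "D": -3.5, "G": -0.4, "F": 2.8}  # partial toy scale

def fft_features(seq, n=64):
    profile = np.array([HYDRO.get(a, 0.0) for a in seq])   # residue property profile
    profile = np.pad(profile, (0, max(0, n - len(profile))))[:n]
    return np.abs(np.fft.rfft(profile))                    # frequency-domain magnitudes

rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(HYDRO), size=50)) for _ in range(60)]  # toy sequences
labels = rng.integers(0, 3, size=60)    # 3 toy classes standing in for the TCDB classes

X = np.array([fft_features(s) for s in seqs])
print("CV accuracy:", cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```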

12.

Tailoring the muckpile shape and its fragmentation to the requirements of the excavating equipment in surface mines can significantly improve efficiency and savings through increased production, longer machine life and reduced maintenance. Considering the various blast parameters together to predict the throw is subtle and can lead to wrong conclusions. In this paper, a different approach was followed that combines the representational power of multilayer neural networks and various machine learning techniques to predict the throw of a bench blast, using data from a limestone mine located in central India. Using various analysis techniques, the training parameters were then adjusted to reduce the cross-validation error and increase the accuracy. Four different neural network architectures were trained with different techniques, and the best model was selected. The other machine learning techniques were compared on the basis of the accuracy of their outputs. A sensitivity analysis was performed to obtain the relative importance of the variables in predicting the output.  相似文献
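
A small sketch of the kind of neural-network regression described above (predicting muckpile throw from blast design parameters), using scikit-learn's MLPRegressor with cross-validation. The five input parameters and the synthetic data are assumptions, not the limestone-mine dataset.

```python
# Sketch: neural-network regression of blast "throw" from blast design parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder columns: burden, spacing, stemming, charge per hole, bench height.
X = rng.random((200, 5))
throw = 10 * X[:, 3] - 4 * X[:, 0] + rng.normal(0, 0.5, 200)   # synthetic target

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0))
print("CV R^2:", cross_val_score(model, X, throw, cv=5).mean())
```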

13.
Cloud computing is the delivery of on-demand computing resources. Cloud computing has numerous applications in fields such as education, social networking, and medicine, and its benefit for medical purposes is considerable, particularly because of the enormous data generated by the health care industry. This colossal data can be managed through big data analytics, and hidden patterns can be extracted using machine learning procedures. In particular, a major issue in the medical domain is the prediction of heart disease, which can be addressed through the combination of machine learning and cloud computing. Hence, an attempt has been made to propose an intelligent decision support model that can aid medical experts in predicting heart disease based on the historical data of patients. Various machine learning algorithms have been implemented on the heart disease dataset and their accuracy for heart disease prediction evaluated. Naïve Bayes has been selected as an effective model because it provides the highest accuracy of 86.42%, followed by AdaBoost and boosted tree. These three models are then ensembled, which increases the overall accuracy to 87.91%. The experimental results have also been evaluated using 10,082 instances, which clearly validates the maximum accuracy achieved through ensembling and the minimum execution time in the cloud environment.  相似文献
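
A compact sketch of the ensembling step described above: Naive Bayes, AdaBoost, and a boosted tree combined by voting. The synthetic data stands in for the heart disease dataset, and the cloud deployment is out of scope.

```python
# Sketch: ensemble Naive Bayes, AdaBoost and gradient-boosted trees with soft voting.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=13, random_state=0)  # stand-in for patient records

ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("ada", AdaBoostClassifier(random_state=0)),
                ("gbt", GradientBoostingClassifier(random_state=0))],
    voting="soft")                      # average predicted probabilities across the 3 models

print("ensemble CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```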

14.

This paper proposes a new approach to predicting the local damage of reinforced concrete (RC) panels under impact loading using gradient boosting machine learning (GBML), one of the most powerful techniques in machine learning. A body of experimental data on impact tests of RC panels was collected for training and testing the proposed model. Given the scarcity of test data, due to the high cost of testing and the complexity of the structural behavior of panels under impact loading, accurately predicting the failure mode was a challenge. To overcome this challenge, this study proposes a machine-learning model that uses a robust technique to solve the problem with a minimal amount of resources. Although the accuracy of the prediction was not as high as expected because of the lack of data and the imbalance in the experimental output features, this paper provides a new approach that may replace the conventional method of predicting the failure mode of RC panels under impact loading. The approach is also expected to be widely used for predicting the structural behavior of components and structures under complex and extreme loads.  相似文献
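
A minimal sketch of gradient-boosting classification of RC panel failure modes under impact (e.g., no damage / scabbing / perforation). The input parameters, the damage rule, and the synthetic samples are placeholders for the collected impact-test data.

```python
# Sketch: gradient boosting to predict the failure mode of an RC panel under impact.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
# Placeholder columns: panel thickness, concrete strength, impact velocity, missile mass.
X = rng.random((300, 4))
modes = np.array(["none", "scabbing", "perforation"])
y = modes[(X[:, 2] * 3).astype(int).clip(0, 2)]    # toy rule: faster impact, worse damage

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```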

15.
16.
Polls show a strong decline in public trust of traditional news outlets; however, social media offers new avenues for receiving news content. This experiment used the Facebook API to manipulate whether a news story appeared to have been posted on Facebook by one of the respondent's real‐life Facebook friends. Results show that social media recommendations improve levels of media trust, and also make people want to follow more news from that particular media outlet in the future. Moreover, these effects are amplified when the real‐life friend sharing the story on social media is perceived as an opinion leader. Implications for democracy and the news business are discussed.  相似文献   

17.
Facebook, Twitter, Instagram, and other social media have emerged as excellent platforms for interacting with friends and for expressing thoughts through posts, comments, images, and videos that convey moods, sentiments, and feelings. With this, it has become possible to examine users' thoughts and feelings in social network data to better understand their perspectives and attitudes. However, while the analysis of depression based on social media has gained widespread acceptance worldwide, other verticals have yet to be explored. In this work, the depression analysis uses Twitter data from a publicly available web source. To assess the accuracy of depression detection, long short-term memory (LSTM) and convolutional neural network (CNN) techniques were used. This method is both efficient and scalable. The simulation results show an accuracy of 86.23%, which is reasonable compared to existing methods.  相似文献
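
A small Keras sketch of the two model families mentioned above: an LSTM and a 1-D CNN over tokenized posts for binary depression-indicator classification. The vocabulary size, sequence length, and random data are assumptions, not the paper's dataset or architecture.

```python
# Sketch: LSTM and 1-D CNN text classifiers for a binary depression indicator (Keras).
import numpy as np
import tensorflow as tf

VOCAB, MAXLEN = 5000, 40
X = np.random.randint(1, VOCAB, size=(256, MAXLEN))   # placeholder tokenized posts
y = np.random.randint(0, 2, size=(256,))              # placeholder labels

def lstm_model():
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, 64),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1, activation="sigmoid")])

def cnn_model():
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, 64),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid")])

for name, model in [("LSTM", lstm_model()), ("CNN", cnn_model())]:
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    print(name, "train accuracy:", model.evaluate(X, y, verbose=0)[1])
```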

18.
Attributing authorship of documents with unknown creators has been studied extensively for natural language text such as essays and literature, but less so for non‐natural languages such as computer source code. Previous attempts at attributing authorship of source code can be categorised by two attributes: the software features used for the classification, either strings of n tokens/bytes (n‐grams) or software metrics; and the classification technique that exploits those features, either information retrieval ranking or machine learning. The results of existing studies, however, are not directly comparable as all use different test beds and evaluation methodologies, making it difficult to assess which approach is superior. This paper summarises all previous techniques to source code authorship attribution, implements feature sets that are motivated by the literature, and applies information retrieval ranking methods or machine classifiers for each approach. Importantly, all approaches are tested on identical collections from varying programming languages and author types. Our conclusions are as follows: (i) ranking and machine classifier approaches are around 90% and 85% accurate, respectively, for a one‐in‐10 classification problem; (ii) the byte‐level n‐gram approach is best used with different parameters to those previously published; (iii) neural networks and support vector machines were found to be the most accurate machine classifiers of the eight evaluated; (iv) use of n‐gram features in combination with machine classifiers shows promise, but there are scalability problems that still must be overcome; and (v) approaches based on information retrieval techniques are currently more accurate than approaches based on machine learning. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   
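
A minimal sketch of the byte/character n-gram plus machine-classifier combination discussed above, applied to toy source-code snippets. The snippet set, the n-gram range, and the linear SVM choice are illustrative assumptions rather than the paper's evaluated configurations.

```python
# Sketch: source-code authorship attribution with character n-grams and a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

snippets = ["for(int i=0;i<n;i++){sum+=a[i];}", "for (int i = 0; i < n; ++i) { sum += a[i]; }",
            "while(i<n){total+=v[i];i++;}",     "while (i < n) { total += v[i]; ++i; }"]
authors = ["alice", "bob", "alice", "bob"]     # toy labels: terse vs. spaced-out styles

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),   # character n-grams, n = 2..4
    LinearSVC())
model.fit(snippets, authors)
print(model.predict(["for(int j=0;j<m;j++){acc+=b[j];}"]))
```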

19.
20.
Artificial surfaces represent one of the key land cover types, and validation is an indispensable component of land cover mapping that ensures data quality. Traditionally, validation has been carried out by comparing the produced land cover map with reference data, which is collected through field surveys or image interpretation. However, this approach has limitations, including high costs in terms of money and time. Recently, geo-tagged photos from social media have been used as reference data. This procedure has lower costs, but the process of interpreting geo-tagged photos is still time-consuming. In fact, social media point of interest (POI) data, including geo-tagged photos, may contain textual information useful for land cover validation. However, this kind of special textual data has seldom been analysed or used to support land cover validation. This paper examines the potential of textual information from social media POIs as a new reference source to assist in artificial surface validation without photo recognition and proposes a validation framework using modified decision trees. First, POI datasets are classified semantically to divide POIs into the standard taxonomy of land cover maps. Then, a decision tree model is built and trained to classify POIs automatically. To eliminate the effects of spatial heterogeneity on POI classification, the shortest distances between each POI and both roads and villages serve as two factors in the modified decision tree model. Finally, a data transformation based on a majority vote algorithm is performed to convert the classified points into raster form so that confusion matrix methods can be applied to the land cover map. Using Beijing as a study area, social media POIs from Sina Weibo were collected to validate artificial surfaces in GlobeLand30 in 2010. A classification accuracy of 80.68% was achieved through our modified decision tree method. Compared with a classification method that does not account for spatial heterogeneity, the accuracy is 10% greater. This result indicates that our modified decision tree method displays considerable skill in classifying POIs with high spatial heterogeneity. In addition, a high validation accuracy of 92.76% was achieved, which is relatively close to the official result of 86.7%. These preliminary results indicate that social media POI datasets are valuable ancillary data for land cover validation, and our proposed validation framework provides opportunities for land cover validation at low cost in terms of money and time.  相似文献
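
A toy sketch of the modified decision tree idea described above: classify POIs using a text-derived category code plus the two distance factors (distance to the nearest road and to the nearest village). The features, encodings, and labels are placeholder assumptions, not the Sina Weibo data or the paper's semantic taxonomy.

```python
# Sketch: decision tree classifying POIs with text-category and distance features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
text_category = rng.integers(0, 5, size=n)     # code derived from the POI's textual content
dist_road = rng.exponential(200, size=n)       # metres to the nearest road (placeholder)
dist_village = rng.exponential(800, size=n)    # metres to the nearest village (placeholder)
X = np.column_stack([text_category, dist_road, dist_village])
# Toy rule: POIs near roads are treated as artificial surface (1), otherwise not (0).
y = (dist_road < 150).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("accuracy:", tree.score(X_te, y_te))
```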
