Similar Literature
1.
Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who promptly need to assess specific pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while state-of-the-art social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). Implicit diffusion mechanisms remain understudied because they are difficult to capture and to interpret properly. On the technical side, the state of the art in provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence, and our analysis unravels a significant part of the implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to truly Web-scale scenarios such as major events. We can process datasets of up to several million messages on a single machine, at rates that cover bursty behaviour, without compromising result quality. In doing so, we provide online journalists, and social media users in general, with fine-grained provenance reconstruction that sheds light on implicit interactions not captured by social media providers. These results are delivered in an online fashion, which also allows for fast relevance and trustworthiness assessment.

2.
Traditional post-level opinion classification methods usually fail to capture a person's overall sentiment orientation toward a topic from his/her microblog posts published across a variety of themes related to that topic. One reason is that the sentiments connoted in the textual expressions of microblog posts are often obscure. Moreover, a person's opinions are often influenced by his/her social network. This study therefore proposes a new method that integrates information on microblog users' social interactions and textual opinions to infer the sentiment orientation of a user, or of the whole group, regarding a hot topic. A Social Opinion Graph (SOG) is first constructed as the data model for sentiment analysis of a group of microblog users who share opinions on a topic; it represents their social interactions and opinions. The training phase then uses the SOGs of the training sets to construct the Sentiment Guiding Matrix (SGM), which captures the correlations between users' sentiments; the Textual Sentiment Classifier (TSC); and the emotion homophily coefficients that quantify the influence of various types of social interaction on users' mutual sentiments. Together these support a high-performance social sentiment analysis procedure based on the relaxation labeling scheme. The experimental results show that the proposed method achieves better sentiment classification accuracy than textual classification and other integrated classification methods. In addition, IMSA can reduce pre-annotation overheads and the influence of sampling deviation.
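As a rough illustration of the relaxation-labeling step this method builds on, the sketch below repeatedly mixes each user's sentiment distribution with an influence-weighted average of the neighbors' distributions. The update rule, the influence weights, and the fixed iteration count are simplifying assumptions, not the paper's SOG/SGM formulation.

```python
def relaxation_labeling(neighbors, influence, p0, iters=20, alpha=0.5):
    """Toy relaxation labeling: blend each user's sentiment distribution
    with an influence-weighted average of neighboring users' distributions."""
    p = {u: list(d) for u, d in p0.items()}
    for _ in range(iters):
        new = {}
        for u, dist in p.items():
            nbrs = neighbors.get(u, [])
            if not nbrs:
                new[u] = dist
                continue
            total = sum(influence[(u, v)] for v in nbrs)
            avg = [sum(influence[(u, v)] * p[v][i] for v in nbrs) / total
                   for i in range(len(dist))]
            mixed = [(1 - alpha) * d + alpha * a for d, a in zip(dist, avg)]
            s = sum(mixed)
            new[u] = [m / s for m in mixed]   # renormalize to a distribution
        p = new
    return p

# two users who disagree textually but interact positively
neighbors = {"a": ["b"], "b": ["a"]}
influence = {("a", "b"): 1.0, ("b", "a"): 1.0}
p0 = {"a": [0.9, 0.1], "b": [0.3, 0.7]}       # [P(positive), P(negative)]
print(relaxation_labeling(neighbors, influence, p0))
```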

3.
We initiate a new line of investigation into online property-preserving data reconstruction. Consider a dataset which is assumed to satisfy various (known) structural properties; e.g., it may consist of sorted numbers, or points on a manifold, or vectors in a polyhedral cone, or codewords from an error-correcting code. Because of noise and errors, however, an (unknown) fraction of the data is deemed unsound, i.e., in violation of the expected structural properties. Can one still query the dataset in an online fashion and always be provided with sound data? In other words, can one design a filter which, when given a query for any item I in the dataset, returns a sound item J that, although not necessarily in the dataset, differs from I as infrequently as possible? No preprocessing is allowed and queries must be answered online. We consider the case of a monotone function. Specifically, the dataset encodes a function \(f:\{1,\dots,n\}\to\mathbb{R}\) that is at (unknown) distance ε from monotone, meaning that f can, and must, be modified at \(\varepsilon n\) places to become monotone. Our main result is a randomized filter that can answer any query in \(O(\log^{2} n \log\log n)\) time while modifying the function f at only \(O(\varepsilon n)\) places. The amortized time over n function evaluations is \(O(\log n)\). The filter works as stated with probability arbitrarily close to 1. We provide an alternative filter with \(O(\log n)\) worst-case query time and \(O(\varepsilon n \log n)\) function modifications. For reconstructing d-dimensional monotone functions of the form \(f:\{1,\dots,n\}^{d}\to\mathbb{R}\), we present a filter that takes \(2^{O(d)}(\log n)^{4d-2}\log\log n\) time per query and modifies at most \(O(\varepsilon n^{d})\) function values (for constant d).
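For intuition, here is a small offline sketch of the reconstruction goal (not the paper's online filter, which never examines the whole dataset): it keeps the values on a longest non-decreasing subsequence and patches every other position, which attains the minimum possible number of modifications.

```python
from bisect import bisect_right

def repair_monotone(f):
    """Offline monotone repair: keep a longest non-decreasing subsequence
    of f and overwrite the remaining positions, achieving the minimum
    number of modified places (n minus the subsequence length)."""
    n = len(f)
    if n == 0:
        return []
    # longest non-decreasing subsequence via patience sorting, with parents
    tails, tail_idx, parent = [], [], [-1] * n
    for i, v in enumerate(f):
        j = bisect_right(tails, v)
        if j == len(tails):
            tails.append(v); tail_idx.append(i)
        else:
            tails[j] = v; tail_idx[j] = i
        parent[i] = tail_idx[j - 1] if j > 0 else -1
    keep, i = [], tail_idx[-1]
    while i != -1:
        keep.append(i); i = parent[i]
    keep.reverse()
    # overwrite every non-kept position with the nearest kept value to its right
    g, j = list(f), 0
    for i in range(n):
        while j < len(keep) and keep[j] < i:
            j += 1
        g[i] = f[keep[j]] if j < len(keep) else f[keep[-1]]
    return g

print(repair_monotone([1, 2, 9, 3, 4, 5]))   # -> [1, 2, 3, 3, 4, 5]
```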

4.
Cellular Learning Automata (CLAs) are hybrid models obtained by combining Cellular Automata (CAs) and Learning Automata (LAs). These models can be either open or closed. In a closed CLA, the states of the neighboring cells of each cell, called its local environment, affect the action selection process of that cell's LA, whereas in an open CLA each cell has, in addition to its local environment, an exclusive environment that is observed by that cell only and a global environment that can be observed by all cells in the CLA. In dynamic models of CLAs, one aspect of the model, such as its structure, local rule or neighborhood radius, may change during the evolution of the CLA. CLAs can also be classified as synchronous or asynchronous. In a synchronous CLA, the LAs in all cells are activated synchronously, whereas in an asynchronous CLA they are activated asynchronously. In this paper, a new closed asynchronous dynamic model of CLA, whose structure and number of LAs per cell may vary over time, is introduced. To show the potential of the proposed model, a landmark clustering algorithm for solving the topology mismatch problem in unstructured peer-to-peer networks is proposed. To evaluate the proposed algorithm, computer simulations have been conducted and the results compared with those of two existing algorithms for the topology mismatch problem. The proposed algorithm is shown to be superior to the existing algorithms with respect to communication delay and average round-trip time between peers within clusters.
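At the heart of every LA is a probability-update rule. The sketch below shows the classic linear reward-inaction scheme, a standard choice for learning automata; whether the paper uses exactly this scheme for the landmark-clustering application is an assumption.

```python
def lri_update(probs, action, reward, a=0.1):
    """Linear reward-inaction (L_RI) update for a learning automaton:
    on reward, shift probability mass toward the chosen action;
    on penalty, leave the action probabilities unchanged."""
    if not reward:
        return probs
    return [p + a * (1 - p) if i == action else p * (1 - a)
            for i, p in enumerate(probs)]

probs = [0.25, 0.25, 0.25, 0.25]
probs = lri_update(probs, action=2, reward=True)
print(probs)   # mass shifts toward action 2; the vector still sums to 1
```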

5.
Social networks have become a good place to promote products and to campaign for causes. Maximizing the spread of information in an online social network at the least cost has attracted the attention of publicists. In general, influential-user ranking methods are derived from either a network's topological features or user features, but not both. Existing approaches to the Influence Maximization Problem (IMP) are modifications of greedy algorithms that cannot scale to streaming data: they are time-consuming and cannot handle large networks because they require heavy Monte-Carlo simulation. The problem is also NP-hard in both the linear threshold and independent cascade models. Our proposed work addresses IMP through a rank-based sampling approach in the Map-Reduce environment. This novel technique combines user and topological features of the network, enabling it to handle real-time streaming data. In our experiments, the rank-based sampling approach is compared with the greedy approach with and without sampling, and exhibits an accuracy of 82%. In terms of running time, performance is improved from \(O(n^{3})\) to \(O(kn)\), where k is the size of the sample dataset and n is the number of users.
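The sketch below illustrates the general shape of such an approach under stated assumptions: node degree stands in for the paper's combined user/topology rank, and a plain Monte-Carlo independent-cascade estimator is used for spread. The paper's actual scoring function and Map-Reduce implementation differ; the point is only that shrinking the candidate pool from n to k nodes is what cuts the running time.

```python
import random

def independent_cascade(adj, seeds, p=0.1, trials=200):
    """Estimate expected spread under the independent-cascade model
    by Monte-Carlo simulation."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and random.random() < p:
                        active.add(v); nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def rank_sampled_greedy(adj, budget, sample_size):
    """Rank-based sampling sketch: keep only the top-ranked candidates
    (degree as a stand-in score), then run greedy seed selection on them."""
    candidates = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:sample_size]
    seeds = []
    for _ in range(budget):
        best = max((u for u in candidates if u not in seeds),
                   key=lambda u: independent_cascade(adj, seeds + [u]))
        seeds.append(best)
    return seeds

if __name__ == "__main__":
    random.seed(0)
    # toy follower graph: node -> users it can influence
    adj = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [6], 4: [7], 5: [7], 6: [], 7: []}
    print(rank_sampled_greedy(adj, budget=2, sample_size=4))
```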

6.
Crowdsourcing applications like Amazon Mechanical Turk (AMT) make it possible to address many difficult tasks (e.g., image tagging and sentiment analysis) on the internet and to make full use of the wisdom of the crowd, where worker quality is one of the most crucial issues for task owners. Thus, a challenging problem is how to effectively and efficiently select high-quality workers, so that online tasks can be accomplished successfully under a certain budget. Existing methods for the crowd worker selection problem are mainly based on quality measurements of the crowd workers, who must register on the crowdsourcing platforms. By connecting online social networks (OSNs) with crowdsourcing applications, social contexts such as the social relationships and social trust between participants and the social positions of participants can help requestors select one or a group of trustworthy crowdsourcing workers. In this paper, we first present a contextual social network structure and the concept of a Strong Social Component (SSC), which denotes a group of workers with high social-context values. Then, we propose a novel index for SSC and a new efficient and effective algorithm, C-AWSA, to find trustworthy workers who can complete tasks with high quality. The results of our experiments, conducted on four real OSN datasets, illustrate the superiority of our method in trustworthy worker selection.

7.
We present findings from a five-week deployment of voting technologies in a city neighbourhood. Drawing on Marres’ (2012) work on material participation and Massey’s (2005) conceptualisation of space as dynamic, we designed the deployment such that the technologies (which were situated in residents’ homes, on the street, and available online) would work in concert, cutting across the neighbourhood to make visible, juxtapose and draw together the different ‘small worlds’ within it. We demonstrate how the material infrastructure of the voting devices set in motion particular processes and interpretations of participation, putting data in place in a way that had ramifications for the recognition of heterogeneity. We conclude that redistributing participation means not only opening up access, so that everyone can participate, or even providing a multitude of voting channels, so that people can participate in different ways. Rather, it means making visible multiplicity, challenging notions of similarity, and showing how difference may be productive.

8.
We consider the k-Server problem under the advice model of computation when the underlying metric space is sparse. On the one side, we introduce Θ(1)-competitive algorithms for a wide range of sparse graphs. These algorithms require advice of (almost) linear size. We show that, for graphs of size N and treewidth α, there is an online algorithm that receives \(O(n(\log\alpha + \log\log N))\) bits of advice and optimally serves any sequence of length n. We also prove that if a graph admits a system of μ collective tree (q, r)-spanners, then there is a (q + r)-competitive algorithm which requires \(O(n(\log\mu + \log\log N))\) bits of advice. Among other results, this gives a 3-competitive algorithm for planar graphs, when provided with \(O(n\log\log N)\) bits of advice. On the other side, we prove that advice of size Ω(n) is required to obtain a 1-competitive algorithm for sequences of length n, even for the 2-server problem on a path metric of size N ≥ 3. Through another lower bound argument, we show that at least \(\frac{n}{2}(\log \alpha - 1.22)\) bits of advice are required to obtain an optimal solution for metric spaces of treewidth α, where 4 ≤ α < 2k.

9.
Due to its wide applications, subgraph query has attracted considerable attention in the database community. In this paper, we focus on subgraph query over a single large graph G, i.e., finding all embeddings of a query Q in G. Unlike existing feature-based approaches, we map all edges into a two-dimensional space \(\mathbb{R}^2\) and propose a bitmap structure to index \(\mathbb{R}^2\). At run time, we find a set of adjacent edge pairs (AEP) or star-style patterns (SSP) to cover Q. We develop edge join (EJ) algorithms to address both AEP and SSP subqueries. Based on the bitmap index, our method can optimize both I/O and CPU cost. More importantly, our index has linear space complexity, instead of the exponential complexity of feature-based approaches, which indicates that it can scale well with respect to data size. Furthermore, our index has light maintenance overhead, which has not been considered in most existing work. Extensive experiments show that our method significantly outperforms existing ones in both online and offline processing with respect to query response time, index building time, index size and index maintenance overhead.
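A toy sketch of the indexing idea follows. The coordinates the paper assigns to edges and its bitmap layout are assumptions here, with hash buckets of endpoint labels standing in for the real 2-D mapping; the point is that a query edge probes one grid cell instead of scanning all data edges.

```python
from collections import defaultdict

class EdgeGridIndex:
    """Toy edge index: map each data edge to a 2-D cell derived from its
    endpoint labels, and keep per-cell edge lists. A query edge probes a
    single cell; returned candidates may include false positives and must
    still be verified against the actual graph."""
    def __init__(self, buckets=64):
        self.buckets = buckets
        self.grid = defaultdict(list)            # (x, y) -> list of edge ids

    def _cell(self, label_u, label_v):
        return (hash(label_u) % self.buckets, hash(label_v) % self.buckets)

    def add(self, eid, label_u, label_v):
        self.grid[self._cell(label_u, label_v)].append(eid)

    def candidates(self, label_u, label_v):
        return self.grid.get(self._cell(label_u, label_v), [])

idx = EdgeGridIndex()
idx.add(0, "author", "paper")
idx.add(1, "author", "venue")
print(idx.candidates("author", "paper"))   # -> [0] (plus any colliding edges)
```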

10.
We introduce the novel concept of knowledge states. The knowledge state approach can be used to construct competitive randomized online algorithms and to study the trade-off between competitiveness and memory. Many well-known algorithms can be viewed as knowledge state algorithms. A knowledge state consists of a distribution of states for the algorithm, together with a work function which approximates the conditional obligations of the adversary. When a knowledge state algorithm receives a request, it calculates one or more “subsequent” knowledge states, together with a probability of transition to each, and uses randomization to select one of those subsequents as the new knowledge state. We apply this method to randomized k-paging. The optimal competitiveness of any randomized online algorithm for the k-paging problem is the kth harmonic number, \(H_{k}=\sum^{k}_{i=1}\frac{1}{i}\). Existing algorithms which achieve that optimal competitiveness must keep bookmarks, i.e., memory of the names of pages not in the cache. An \(H_{k}\)-competitive randomized algorithm for that problem which uses O(k) bookmarks is presented, settling an open question by Borodin and El-Yaniv. In the special cases where k=2 and k=3, solutions are given using only one and two bookmarks, respectively.
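For context, the sketch below implements the classic randomized MARKING algorithm, a well-known \(2H_{k}\)-competitive baseline for k-paging. It is not the knowledge-state construction (which additionally maintains a state distribution and a work function), but it shows the kind of randomized eviction the competitiveness results are about.

```python
import random

def randomized_marking(pages, k):
    """Classic randomized MARKING algorithm for k-paging: on a fault with a
    full cache, evict a uniformly random unmarked page; when every cached
    page is marked, start a new phase by clearing all marks."""
    cache, marked, faults = set(), set(), 0
    for p in pages:
        if p in cache:
            marked.add(p)
            continue
        faults += 1
        if len(cache) == k:
            if not cache - marked:            # all marked: new phase begins
                marked.clear()
            victim = random.choice(list(cache - marked))
            cache.remove(victim)
        cache.add(p)
        marked.add(p)
    return faults

random.seed(0)
print(randomized_marking([1, 2, 3, 4, 1, 2, 5, 1, 2, 3], k=3))
```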

11.
Let f be an integer-valued function on a finite set V. We call an undirected graph G(V,E) a neighborhood structure for f. The problem of finding a local minimum of f can be phrased as follows: for a fixed neighborhood structure G(V,E), find a vertex \(x \in V\) such that f(x) is not bigger than any value that f takes on a neighbor of x. The complexity of an algorithm is measured by the number of questions of the form “what is the value of f on x?” We show that the deterministic, randomized and quantum query complexities of the problem are polynomially related. This generalizes earlier results of Aldous (Ann. Probab. 11(2):403–413, [1983]) and Aaronson (SIAM J. Comput. 35(4):804–824, [2006]) and solves the main open problem in Aaronson (SIAM J. Comput. 35(4):804–824, [2006]).
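The query model is easy to make concrete. The sketch below runs plain steepest descent, counting one oracle query per evaluation of f; this is simply the most naive algorithm in the model the abstract studies, not the paper's constructions.

```python
def local_minimum(graph, f, start):
    """Descend to a local minimum of f on the neighborhood structure `graph`
    (a dict: vertex -> list of neighbors). Each evaluation of f counts as
    one oracle query, the complexity measure used in the abstract."""
    queries = {start: f(start)}
    x = start
    while True:
        better = None
        for y in graph[x]:
            if y not in queries:
                queries[y] = f(y)              # one oracle query
            if queries[y] < queries[x]:
                better = y
                break
        if better is None:
            return x, len(queries)             # x is a local minimum
        x = better

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(local_minimum(path, lambda x: (x - 2) ** 2, start=0))   # -> (2, 4)
```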

12.
The present paper focuses on identification issues for the micro- and macro-characteristics of social networks proposed in [1]. To this end, we employ data from real online social networks (Facebook, LiveJournal and Twitter). Finally, the results of the corresponding simulation experiments are presented and compared.

13.
Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb (Ecosystem on the Web), an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed.
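The ecosystem analogy can be made concrete with the textbook competitive Lotka-Volterra equations, sketched below. EcoWeb's actual equations and its fitting procedure are defined in the paper, so the growth rates r, capacities K and interaction matrix A here are purely illustrative assumptions.

```python
def simulate_competition(x0, r, K, A, steps=200, dt=0.1):
    """Discrete-time competitive Lotka-Volterra dynamics:
        dx_i/dt = r_i * x_i * (1 - (sum_j A[i][j] * x_j) / K_i)
    Off-diagonal A[i][j] > 0 means activity j steals attention from i."""
    x = list(x0)
    for _ in range(steps):
        x = [max(0.0, xi + dt * r[i] * xi *
                 (1 - sum(A[i][j] * x[j] for j in range(len(x))) / K[i]))
             for i, xi in enumerate(x)]
    return x

# three "keywords" competing for user attention, e.g. Xbox/PlayStation/Wii
print(simulate_competition([1.0, 1.0, 1.0],
                           r=[0.5, 0.4, 0.6],
                           K=[100, 80, 90],
                           A=[[1.0, 0.4, 0.3],
                              [0.4, 1.0, 0.5],
                              [0.3, 0.5, 1.0]]))
```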

14.
According to The New York Times, 5.6 million people in the United States are paralyzed to some degree. Motivated by the needs of these paralyzed patients to control assistive devices that support their mobility, in this paper we present a novel EEG-based BCI system composed of an Emotiv EPOC neuroheadset, a laptop and a Lego Mindstorms NXT robot. We provide online learning algorithms, consisting of k-means clustering and principal component analysis, to classify the signals from the headset into corresponding action commands. Moreover, we discuss how to integrate the Emotiv EPOC headset and the LEGO robot into the system. Finally, we evaluate the proposed online learning algorithms of our BCI system in terms of precision, recall, and the F-measure, and our results show that the algorithms can accurately classify the subjects’ thoughts into corresponding action commands.
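A minimal offline sketch of the PCA-plus-k-means pipeline is shown below; the channel count, window length, feature layout and the four command labels are assumptions, and the paper's online update rule may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# stand-in EEG data: 400 windows of 14 channels x 32 samples, flattened
X = rng.normal(size=(400, 14 * 32))

pca = PCA(n_components=8)              # compress raw EEG windows
Z = pca.fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Z)
COMMANDS = ["forward", "backward", "left", "right"]   # assumed label mapping

def classify(window):
    """Map one new EEG window to a robot action command."""
    z = pca.transform(window.reshape(1, -1))
    return COMMANDS[kmeans.predict(z)[0]]

print(classify(rng.normal(size=14 * 32)))
```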

15.
We say that an s-subset of codewords of a code X is (s, l)-bad if X contains l other codewords such that the conjunction of these l words is covered by the disjunction of the words of the s-subset. Otherwise, the s-subset is said to be (s, l)-good. A binary code X is called a disjunctive (s, l) cover-free (CF) code if X does not contain (s, l)-bad subsets. We consider a probabilistic generalization of (s, l) CF codes: we say that a binary code is an (s, l) almost cover-free (ACF) code if almost all s-subsets of its codewords are (s, l)-good. Our most interesting result is the proof of lower and upper bounds on the capacity of (s, l) ACF codes; the ratio of these bounds tends, as s→∞, to the limit value \(\log_2 e/(le)\).
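The defining condition is easy to check by brute force on toy codes; the sketch below does exactly that, with codewords represented as 0/1 tuples (the representation is an assumption, and the check is exponential in the code size, so it is for illustration only).

```python
from itertools import combinations

def is_good(code, s_subset, l):
    """Return True iff the s-subset is (s, l)-good: no l other codewords
    have a bitwise AND (conjunction) covered by the bitwise OR
    (disjunction) of the s-subset."""
    union = [max(bits) for bits in zip(*s_subset)]
    others = [w for w in code if w not in s_subset]
    for group in combinations(others, l):
        meet = [min(bits) for bits in zip(*group)]
        if all(m <= u for m, u in zip(meet, union)):
            return False                       # covered -> (s, l)-bad
    return True

# toy length-6 code; check one 2-subset against l = 1
code = [(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0),
        (0, 0, 0, 0, 1, 1), (1, 0, 1, 0, 1, 0)]
print(is_good(code, [code[0], code[1]], l=1))   # -> True
```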

16.
Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases due to the wide range of applications that may issue such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and best-studied DJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index for either of the two disk-resident data sets (since building and using indexes is not always beneficial for processing performance). They employ a combination of plane-sweep algorithms and space-partitioning techniques to join the data sets. Finally, we present the results of an extensive experimental study that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic), validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficiency and effectiveness when neither input is indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm based on the R-tree (a widely accepted access method) for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform the R-tree-based algorithms in most cases.
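The core plane-sweep idea can be sketched in a few lines: process the points of both sets in x-order and prune a candidate pair as soon as its x-gap alone exceeds the current k-th best distance. This minimal version omits the space-partitioning and further pruning techniques that the paper's four algorithms add.

```python
import heapq

def kcpq_plane_sweep(P, Q, k):
    """K Closest Pairs between point sets P and Q via a plane sweep:
    scan all points in x-order; for each point, walk the other set's seen
    points right-to-left and stop once the x-gap alone rules them out."""
    events = sorted([(x, y, 0) for x, y in P] + [(x, y, 1) for x, y in Q])
    heap = []                       # max-heap of (-dist^2, point, point)
    active = ([], [])               # points of each set seen so far (x-sorted)
    for x, y, side in events:
        for ox, oy in reversed(active[1 - side]):
            dx = x - ox
            if len(heap) == k and dx * dx >= -heap[0][0]:
                break               # all remaining points are even farther left
            d2 = dx * dx + (y - oy) ** 2
            entry = (-d2, (x, y), (ox, oy))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, entry)
        active[side].append((x, y))
    return sorted((-nd, p, q) for nd, p, q in heap)

P = [(0, 0), (2, 3), (5, 1)]
Q = [(1, 1), (2, 2), (9, 9)]
print(kcpq_plane_sweep(P, Q, k=2))   # two closest pairs with squared distances
```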

17.
Value-based requirements engineering: exploring innovative e-commerce ideas
Innovative e-commerce ideas are characterised by commercial products yet unknown to the market, enabled by information technology such as the Internet and technologies on top of it. How to develop such products is hardly known. We propose an interdisciplinary approach, e3-value, for exploring an innovative e-commerce idea with the aim of understanding it thoroughly and evaluating it for potential profitability. Our methodology exploits a requirements engineering way of working, but employs concepts and terminology from business science, marketing and axiology. It shows how to model business requirements and improve business–IT alignment in the sophisticated multi-actor value constellations that are common in electronic commerce. In addition to the e3-value methodology itself, we also present the action-research-based development of our methodology, using one of the longitudinal projects we carried out in the field of online news article provisioning.

18.
The non-configurational geometrization of the electromagnetic field can be realized using the Model of Embedded Spaces (MES). This model assumes the existence of proper 4D space-time manifolds of particles with nonzero rest mass and declares that physical space-time is the metric result of the dynamic embedding of these manifolds: the value of the partial contribution of an element's manifold is determined by the element's interactions. The space of the model is endowed with a Riemann-like geometry, whose differential formalism is described by a generalization of the gradient operator \(\partial/\partial x^{i} \to \partial/\partial x^{i} + 2u_{k}\,\partial^{2}/\partial x^{[i}\partial u_{k]}\), where \(u^{i}=dx^{i}/ds\) is the matter velocity. In this paper, the redshift effect existing in the space of MES is considered, and its electromagnetic component is analyzed. It is shown that for the cold matter of the modern Universe this component reduces to a shift in electric fields and is described by the expression \(\Delta\omega_{e}/\omega \simeq \mp\sqrt{k}\,\Delta\varphi_{e}/c^{2} = \mp 0.861\cdot 10^{-21}\,\Delta\varphi_{e}\,(\mathrm{V})\), where the potential is measured in volts and the sign must be determined experimentally. Testing this effect is the “experimentum crucis” for MES.

19.
Given a tree T=(V,E) of n nodes such that each node v is associated with a value-weight pair \((\mathit{val}_{v}, w_{v})\), where the value \(\mathit{val}_{v}\) is a real number and the weight \(w_{v}\) is a non-negative integer, the density of T is defined as \(\frac{\sum_{v\in V}{\mathit{val}}_{v}}{\sum_{v\in V}w_{v}}\). A subtree of T is a connected subgraph (V′,E′) of T, where \(V'\subseteq V\) and \(E'\subseteq E\). Given two integers \(w_{\min}\) and \(w_{\max}\), the weight-constrained maximum-density subtree problem on T is to find a maximum-density subtree T′=(V′,E′) satisfying \(w_{\min}\le\sum_{v\in V'}w_{v}\le w_{\max}\). In this paper, we first present an \(O(w_{\max} n)\)-time algorithm to find a weight-constrained maximum-density path in a tree T, and then present an \(O(w_{\max}^{2} n)\)-time algorithm to find a weight-constrained maximum-density subtree in T. Finally, given a node subset \(S\subseteq V\), we also present an \(O(w_{\max}^{2} n)\)-time algorithm to find a weight-constrained maximum-density subtree of T which covers all the nodes in S.
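As a reference point for the path variant, the brute-force sketch below enumerates all \(O(n^2)\) paths and keeps the densest one whose weight falls inside the window (assuming strictly positive weights so density is always well defined); the paper's dynamic programming achieves \(O(w_{\max} n)\) instead.

```python
def max_density_path(tree, vals, ws, wmin, wmax):
    """Brute-force weight-constrained maximum-density path in a tree:
    DFS from every node enumerates all paths; densities are compared by
    cross-multiplication (valid for positive weights)."""
    best = None                                  # (value, weight, path)

    def dfs(u, parent, val, w, path):
        nonlocal best
        if wmin <= w <= wmax and (best is None or val * best[1] > best[0] * w):
            best = (val, w, list(path))
        for v in tree[u]:
            if v != parent:
                path.append(v)
                dfs(v, u, val + vals[v], w + ws[v], path)
                path.pop()

    for s in tree:
        dfs(s, None, vals[s], ws[s], [s])
    return best

#        0
#       / \
#      1   2
tree = {0: [1, 2], 1: [0], 2: [0]}
vals = {0: 4.0, 1: -1.0, 2: 6.0}
ws = {0: 1, 1: 1, 2: 1}
print(max_density_path(tree, vals, ws, wmin=2, wmax=3))   # -> (10.0, 2, [0, 2])
```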

20.
The starting point of our research is the following problem: given a doubling metric \(\mathcal{M}=(V,d)\), can one (efficiently) find an unweighted graph G′=(V′,E′) with \(V\subseteq V'\) whose shortest-path metric d′ is still doubling and agrees with d on V×V? It is simple to show that the answer to the above question is negative if distances must be preserved exactly. However, allowing a (1+ε) distortion between d and d′ enables us to bypass this hurdle and obtain an unweighted graph G′ with doubling dimension at most a factor \(O(\log\varepsilon^{-1})\) times the doubling dimension of \(\mathcal{M}\). More generally, this paper gives algorithms that construct graphs G′ whose convex (or geodesic) closure has doubling dimension close to that of \(\mathcal{M}\), and whose shortest-path distances closely approximate those of \(\mathcal{M}\) when restricted to V×V. Similar results are shown when the metric \(\mathcal{M}\) is an additive (tree) metric and the graph G′ is restricted to be a tree.
