1.
Offline network traffic analysis is essential for in-depth study of network conditions and characteristics, such as user behavior and abnormal traffic. With the rapid growth of information on the Internet, traditional stand-alone analysis tools face great challenges in storage capacity and computing efficiency, which are precisely the strengths of a Hadoop cluster. In this paper, we design an offline traffic analysis system based on Hadoop (OTASH) and propose a MapReduce-based algorithm for TopN user statistics. In addition, we study the computing performance and fault tolerance of OTASH. The experiments show that OTASH is suitable for handling large amounts of flow data and can still complete its computations when a single node fails.
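The abstract does not describe how its TopN algorithm works, so the following is only a minimal sketch of one common MapReduce pattern for this kind of statistic, simulated in a single process. The record format `(user, bytes)`, the function names, and the number of reducers are all assumptions made for illustration, not details of OTASH.

```python
import heapq
import zlib
from collections import defaultdict


def map_phase(records):
    """Map: emit one (user, bytes) pair per flow record."""
    for user, nbytes in records:
        yield user, nbytes


def reduce_phase(pairs):
    """Reduce: sum the byte counts emitted for each user."""
    totals = defaultdict(int)
    for user, nbytes in pairs:
        totals[user] += nbytes
    return totals


def local_top_n(totals, n):
    """Keep only the n heaviest users seen by one reducer."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: (kv[1], kv[0]))


def top_n_users(records, n, num_reducers=3):
    """Simulate map, shuffle, reduce, and a final merge of local top-n lists."""
    # Shuffle: a stable hash routes every record of a user to the same reducer,
    # so each reducer sees the complete total for the users it owns.
    buckets = [[] for _ in range(num_reducers)]
    for user, nbytes in map_phase(records):
        buckets[zlib.crc32(user.encode()) % num_reducers].append((user, nbytes))

    # Each reducer reports at most n candidates; the global top n is among them.
    candidates = []
    for bucket in buckets:
        candidates.extend(local_top_n(reduce_phase(bucket), n))
    return heapq.nlargest(n, candidates, key=lambda kv: (kv[1], kv[0]))


# Made-up flow records: (source address, bytes transferred).
flows = [
    ("10.0.0.1", 500),
    ("10.0.0.2", 300),
    ("10.0.0.1", 200),
    ("10.0.0.3", 900),
    ("10.0.0.2", 50),
]
top2 = top_n_users(flows, 2)
```

Because each reducer forwards only its own n best candidates, the final merge handles at most `n * num_reducers` entries regardless of how many users appear in the input.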
2.
3.
4.
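A Cloud-Computing-Based Apriori Algorithm for Association Rules  Total citations: 1, self-citations: 0, citations by others: 1
Association rules are one of the key methods in data mining; rules are selected on the basis of measures such as support and confidence in order to generate useful rules. Traditional association rule algorithms must read the database to compute frequent itemsets, which incurs a large overhead. With the development of cloud computing, the MapReduce programming framework has become an important cloud computing technology. To address the shortcomings of the Apriori algorithm, an algorithm is designed that suitably improves the MapReduce framework, and Apriori is implemented on top of it to overcome its poor scalability. Experiments show that the algorithm effectively improves the performance of Apriori.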
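To make the idea of counting frequent itemsets in a MapReduce style concrete, here is a small single-process sketch. It shows plain Apriori with per-partition counting and a summing reduce step; it does not reproduce the framework modifications the abstract refers to, and the function names, the sample transactions, and the absolute support threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates):
    """Map: count, within one data partition, how often each candidate occurs."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def merge_counts(partial_counts):
    """Reduce: add up the per-partition counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def apriori(partitions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    candidates = {frozenset([i]) for p in partitions for t in p for i in t}
    frequent = {}
    k = 1
    while candidates:
        counts = merge_counts(count_candidates(p, candidates) for p in partitions)
        level = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = next_candidates(set(level), k)
    return frequent


# Made-up transactions, split into two partitions as a cluster would split input.
partitions = [
    [{"a", "b", "c"}, {"a", "b"}],
    [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
]
result = apriori(partitions, min_support=3)
```

Each pass over the data corresponds to one map and reduce round, which is why the number of rounds grows with the size of the largest frequent itemset.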
5.
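With the exponential growth of Internet users and content, computing the Jaccard similarity coefficient on large-scale data places higher demands on algorithm efficiency. To improve execution efficiency, the weaknesses of running the algorithm under the MapReduce architecture are analyzed. Exploiting Spark's suitability for iterative and interactive tasks, the algorithm is ported from the MapReduce platform to the Spark platform on the basis of a two-dimensional partitioning algorithm, and its execution efficiency is further improved through parameter tuning, memory optimization, and other methods. Experiments with two datasets on three clusters of different sizes show that, compared with MapReduce, the algorithm on Spark improves execution efficiency by more than 4 times and energy efficiency by more than 3 times.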
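The abstract names a two-dimensional partitioning algorithm without describing it. The sketch below shows one plausible reading, stated as an assumption: the items are split into blocks, and every pair of blocks forms an independent task that a cluster could run in parallel. The function names, the block assignment, and the sample data are illustrative only.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def block_pairs(num_blocks):
    """Enumerate the upper triangle of the block grid, one task per cell."""
    for i in range(num_blocks):
        for j in range(i, num_blocks):
            yield i, j


def all_pairs_jaccard(sets_by_id, num_blocks=2):
    """Compute the similarity of every unordered pair exactly once."""
    ids = sorted(sets_by_id)
    blocks = [ids[b::num_blocks] for b in range(num_blocks)]
    result = {}
    for i, j in block_pairs(num_blocks):  # each (i, j) cell is independent
        if i == j:
            pairs = combinations(blocks[i], 2)
        else:
            pairs = product(blocks[i], blocks[j])
        for x, y in pairs:
            key = (x, y) if x < y else (y, x)
            result[key] = jaccard(sets_by_id[x], sets_by_id[y])
    return result


# Made-up data: each user id maps to the set of item ids the user touched.
users = {"u1": {1, 2, 3}, "u2": {2, 3, 4}, "u3": {5}, "u4": {1, 2, 3}}
similarities = all_pairs_jaccard(users)
```

Restricting the work to cells with `i <= j` avoids computing each pair twice, and a task only needs the data of the two blocks it compares.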
6.
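Although the extreme learning machine (ELM) algorithm trains quickly, it involves a large number of matrix operations, so it is still slow on large volumes of data. Based on a thorough study of the parallel computing mechanism of Spark's distributed datasets, a parallel scheme for matrix multiplication, the core step, is designed, and a Spark-based parallel ELM algorithm is designed and implemented. For performance comparison, a parallel ELM algorithm based on Hadoop MapReduce is also implemented. Experimental results show that the Spark-based parallel ELM runs in markedly less time than the Hadoop MapReduce version, and the larger the data volume, the more pronounced Spark's efficiency advantage.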
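The parallel matrix multiplication scheme itself is not given in the abstract. As a rough illustration of why the step parallelizes well, the sketch below splits the left matrix into row blocks and multiplies each block independently; this row-block scheme, the function names, and the sample matrices are assumptions, and the loop over blocks stands in for tasks that a cluster would run concurrently.

```python
def multiply_block(row_block, b):
    """Multiply a block of rows of A by the full matrix B."""
    columns = list(zip(*b))  # columns of B, computed once per block
    return [
        [sum(x * y for x, y in zip(row, col)) for col in columns]
        for row in row_block
    ]


def partitioned_matmul(a, b, num_partitions=2):
    """Compute A x B by splitting A into row blocks and joining the results."""
    size = max(1, -(-len(a) // num_partitions))  # ceiling division
    blocks = [a[i:i + size] for i in range(0, len(a), size)]
    result = []
    for block in blocks:  # each block is an independent task
        result.extend(multiply_block(block, b))
    return result


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
product_ab = partitioned_matmul(a, b)
```

No block depends on another block's output, so the only coordination needed is concatenating the partial results in row order.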
7.
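Yin Xiuye 《Journal of Wuhan Institute of Technology》 2014, 36(9): 66-69
In big data environments, approximately duplicate records undermine the accuracy of statistical analysis and need to be filtered out. After reviewing the state of research on approximately duplicate record detection, the paper proposes attribute weighting: attributes are assigned weights and are sorted and grouped by weight. Because the values of some fields are in one-to-one correspondence and therefore carry the same weight, the concept of synonymous attributes is introduced; excluding some synonymous attributes from the original dataset shrinks the data and makes duplicate detection more efficient. Finally, a method for deciding whether records are approximate duplicates is given. To cope with the challenge that large datasets pose for duplicate detection, the large dataset is split into several small ones and processed with the MapReduce mechanism: it is grouped by the values of the higher-weight attributes and divided into several map tasks that are processed separately. Experimental results show that the method effectively improves the efficiency of approximately duplicate record detection.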
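The grouping and weighting steps described above can be sketched roughly as follows. The attribute names, weights, threshold, string-similarity measure (`difflib.SequenceMatcher`), and sample records are all assumptions chosen for illustration; the paper's actual decision rule and its handling of synonymous attributes are not reproduced here.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical attribute weights (summing to 1) and decision threshold.
WEIGHTS = {"name": 0.5, "city": 0.3, "phone": 0.2}
THRESHOLD = 0.85


def normalize(value):
    """Lower-case and trim a field so trivial differences do not matter."""
    return value.strip().lower()


def weighted_similarity(r1, r2, weights):
    """Weighted sum of per-attribute string similarities, in [0, 1]."""
    return sum(
        w * SequenceMatcher(None, normalize(r1[a]), normalize(r2[a])).ratio()
        for a, w in weights.items()
    )


def group_by_heaviest(records, weights):
    """Split the dataset by the value of the highest-weight attribute."""
    key_attr = max(weights, key=weights.get)
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalize(record[key_attr])].append(index)
    return groups


def find_duplicates(records, weights=WEIGHTS, threshold=THRESHOLD):
    """Compare records only within a group; each group could be one map task."""
    duplicates = []
    for indices in group_by_heaviest(records, weights).values():
        for i, j in combinations(indices, 2):
            if weighted_similarity(records[i], records[j], weights) >= threshold:
                duplicates.append((i, j))
    return sorted(duplicates)


# Made-up records for illustration.
records = [
    {"name": "Li Wei", "city": "Wuhan", "phone": "027-1234567"},
    {"name": "li wei ", "city": "Wuhan", "phone": "027-1234567"},
    {"name": "Li Wei", "city": "Beijing", "phone": "010-7654321"},
    {"name": "Zhang Min", "city": "Wuhan", "phone": "027-1234567"},
]
duplicates = find_duplicates(records)
```

Grouping first means records are compared only within their own group, which is what keeps the number of pairwise comparisons manageable on a large dataset.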
8.
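Zhao Xi 《Microelectronics & Computer》 2013, (3)
This paper proposes a method for optimizing and decomposing business processes into tasks that can be processed independently and in parallel, so that computation and operation tasks with common characteristics can be processed in groups in a cloud computing environment, optimizing resource allocation. The method is validated by simulation on the Hadoop MapReduce parallel computing framework. Experimental results show its advantages in business processing efficiency, resource usage, and flexibility, and it has application and further research value for processing large numbers of online and batch business processes.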
9.
Chao Liu Deze Zeng Hong Yao Xuesong Yan Linchen Yu Zhangjie Fu 《Concurrency and Computation》2020,32(3)
Graph data processing has been widely applied in a variety of domains such as industry, science, and social networks, and has therefore stimulated many research efforts. To keep pace with the rapid growth of big graph data, graph data processing based on Pregel‐like systems is regarded as one of the most promising approaches and has attracted wide attention from researchers. However, it remains at an early stage, and many challenges persist. In Pregel, superstep synchronization is time consuming because iterating over graph data requires multiple synchronizations. Furthermore, the graph data partition strategy adopted by Pregel fails to support load balancing, which increases network I/O overhead as the scale of the graph data grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next superstep from a count of the global message files. Furthermore, an improved graph data partition mechanism based on a balanced hash method is proposed to reduce the communication overhead between different partitions of sub‐graph computational tasks. We also re‐design the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on different real‐world datasets verify the efficiency of the proposed framework: it outperforms Giraph (an open source Pregel‐like system) by 58%−69% and achieves a 10×−17× performance improvement over Hadoop.
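For readers unfamiliar with the bulk synchronous parallel model, the sketch below runs textbook PageRank as a sequence of supersteps in a single process. It is not the re‐designed PageRank or the synchronization mechanism of the paper; the function name, the damping factor of 0.85, the fixed superstep count, and the assumption that every vertex has at least one out-edge are all choices made for illustration.

```python
def pagerank_bsp(graph, damping=0.85, supersteps=100):
    """Textbook PageRank organized as BSP supersteps.

    graph maps each vertex to a list of its out-neighbors. This sketch assumes
    every vertex appears as a key and has at least one out-edge.
    """
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Compute and communicate: every vertex sends rank / out_degree
        # along each of its out-edges.
        inbox = {v: 0.0 for v in graph}
        for vertex, targets in graph.items():
            share = ranks[vertex] / len(targets)
            for target in targets:
                inbox[target] += share
        # Barrier: new ranks are computed only after all messages are delivered.
        ranks = {v: (1 - damping) / n + damping * inbox[v] for v in graph}
    return ranks


# Made-up three-vertex graph: a -> c, b -> c, c -> a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
```

The barrier between message delivery and rank update is the synchronization point the abstract describes as costly, since the slowest worker determines when the next superstep can begin.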
10.
Steganalysis aims to detect whether a seemingly innocent image hides a message. It is an important research topic in information security. As steganography technology develops, steganalysis becomes increasingly difficult. Several steganalysis methods have been proposed to improve performance. Most research concentrates on detecting specific steganography schemes, with image steganography features designed manually; few works address universal steganalysis. In this paper, as one of the first attempts, a novel image steganalysis method based on deep neural networks is proposed. First, high‐frequency image features are extracted with a wavelet transform, because most hidden messages reside in the high‐frequency components. Second, high‐dimensional image steganography features are extracted from the high‐frequency images with deep neural networks, and an informative combination of features is selected with a novel entropy‐based feature selection method. Then, a parallel SVM model is proposed to build the steganalysis model from large‐scale training samples. Finally, the efficiency of the proposed method is illustrated by analyzing a practical image steganalysis example.