Similar Literature
20 similar documents found (search time: 51 ms)
1.
Aerospace signal data are an important carrier of changes in product function and performance, and data-based analysis plays an important role in fault prediction, operations and maintenance, and even product optimization and iteration. Many signal-data analysis algorithms exist, but at the organizational level they suffer from a bottleneck: they cannot be shared or flexibly configured. Building on an analysis of the data-standardization, data-storage, and algorithm-configuration problems facing signal-data analysis, this paper proposes and implements a flexible signal-data analysis platform built on the .NET framework. On top of a designed data-processing specification, large-object techniques provide efficient data throughput, and a soft-driver technique implements an algorithm engine that can invoke and flexibly configure arbitrary algorithms, so that the organization can share data and grow an algorithm ecosystem. The implemented platform has seen good practical use; its built-in algorithms effectively detect anomalies during product testing and on-orbit operation.

2.
This paper analyzes the main problems that current hospital information systems (HIS) face in information analysis and processing: complex and changing data sources, and difficulties in customizing, sharing, and dynamically maintaining analysis algorithms. Metadata is used to describe the analysis algorithms, and an algorithm-management module and algorithm-library system are designed on a Web Service architecture. Compared with the traditional model, this loosely couples data and algorithms and enables dynamic management of the algorithm library. The system was implemented and validated; the results show that it effectively improves the maintainability and generality of information analysis and data processing in HIS.

3.
CUBE Algorithms on Very Large Compressed Data Warehouses   Cited: 9 (self 2, others 7)
高宏  李建中 《软件学报》2001,12(6):830-839
Data compression is an important way to improve the performance of multidimensional data warehouses; online analytical processing (OLAP) is the main application on data warehouses, and the Cube operation is one of its most common operations. Cube algorithms on compressed multidimensional data warehouses are a challenging and important task facing the database community. In recent years much work has been done on Cube algorithms, but little of it addresses multidimensional data warehouses, let alone compressed ones; to date, only one paper has proposed a Cube algorithm for compressed multidimensional data warehouses. Building on an in-depth study of Cube algorithms on compressed data warehouses, this paper proposes a heuristic algorithm for generating an optimized Cube computation plan and three Cube algorithms on compressed multidimensional data warehouses. The proposed Cube algorithms …
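The Cube operation discussed above aggregates a fact table over every subset of its dimensions. A minimal, uncompressed sketch in Python (the `cube` function, table, and column names are illustrative, not the paper's algorithms):

```python
from itertools import combinations

def cube(rows, dims, measure):
    """Aggregate `measure` over every subset of `dims`; '*' marks a
    rolled-up dimension (the ALL value of the CUBE operator)."""
    out = {}
    idx = range(len(dims))
    for k in range(len(dims) + 1):
        for keep in combinations(idx, k):
            for row in rows:
                key = tuple(row[dims[i]] if i in keep else '*' for i in idx)
                out[key] = out.get(key, 0) + row[measure]
    return out

sales = [{'city': 'A', 'year': 2001, 'amt': 3},
         {'city': 'A', 'year': 2002, 'amt': 4},
         {'city': 'B', 'year': 2001, 'amt': 5}]
cb = cube(sales, ['city', 'year'], 'amt')
print(cb[('*', '*')], cb[('A', '*')], cb[('*', 2001)])  # → 12 7 8
```

The compressed-warehouse algorithms of the paper avoid materializing every group-by like this; the sketch only shows the lattice of aggregates a Cube computation must produce.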

4.
Because virtual machines employ virtualization and code-obfuscation techniques, traditional reverse-analysis methods have great difficulty recovering algorithms protected by a virtual machine. This paper therefore proposes a method for defeating virtual-machine protection based on dynamic data-flow analysis. Using the dynamic binary instrumentation platform Pin, it traces and records the data-flow information of the protected algorithm as it executes, analyzes the recorded information to obtain the interpretation trace of the virtual-machine instructions, and recovers the program's control-flow graph. The data-generation process is then reconstructed from the trace, hierarchically and in stages, and an analyst combines the control-flow graph and the data-generation process to reconstruct the algorithm. Experimental results show that the method correctly recovers the program's control flow and data-generation process and helps analysts reconstruct the protected algorithm.

5.
Remote-sensing data assimilation, within a dynamic-model framework, uses assimilation algorithms to reconcile the quantitative (physical and chemical) output of a dynamic model with observations and to analyze the resulting error. Assimilating multi-source remote-sensing data into model prediction and parameter estimation can improve the accuracy of analyzing and predicting changes in the land surface, atmosphere, and ocean. Supported by the National Airborne Remote Sensing System project built under the NDRC's 12th Five-Year Plan, a data-assimilation system was designed and developed for the system's ten sensors. Because no existing 3DVAR or EnKF implementation suited the system, the core algorithm programs had to be developed in-house. This paper describes the integrated computation and visualization system developed for airborne remote-sensing data assimilation and the key technical workflow of its core algorithms. Experimental results confirm that the system can effectively assimilate airborne remote-sensing data.

6.
Clustering is a popular data analysis and data mining technique: the unsupervised classification of patterns into groups. Many algorithms for large data sets have been proposed in the literature using different techniques. However, conventional algorithms have shortcomings, such as slow convergence, sensitivity to initial values, and a preset number of classes on large-scale data sets, and they still require much investigation to improve performance and efficiency. Over the last decade, clustering with ant-based and swarm-based algorithms has been emerging as an alternative to more traditional clustering techniques. Many complex optimization problems still exist, and it is often very difficult to obtain the desired result with one of these algorithms alone. Thus, robust and flexible optimization techniques are needed to generate good clustering results. Algorithms that imitate certain natural principles, known as evolutionary algorithms, have been used in a wide variety of real-world applications, and recently much research has applied hybrid evolutionary algorithms to the clustering problem. This paper provides a survey of hybrid evolutionary algorithms for cluster analysis.

7.
章永来  周耀鉴 《计算机应用》2019,39(7):1869-1882
In the big-data era, clustering, an unsupervised learning method, is especially prominent, and research on clustering algorithms has made great progress in recent years. This paper first summarizes the full clustering-analysis workflow, similarity measures, a new taxonomy of clustering algorithms, and the evaluation of clustering results; it reclassifies clustering algorithms into two broad categories, big-data clustering and small-data clustering, with a particularly systematic analysis and summary of the former. It then surveys the research progress and applications of each class of clustering algorithm and discusses development trends in light of current research topics.

8.
Evaluation of Current Data Mining Algorithms   Cited: 13 (self 2, others 11)
This paper first discusses evaluation criteria for data mining algorithms, then uses data envelopment analysis to evaluate current classification algorithms and, based on experimental results, evaluates current association-rule mining algorithms.

9.
The application of a simulated binary array processor (BAP) to the rapid analysis of a sequence of images has been studied. Several algorithms have been developed which may be implemented on many existing parallel processing machines. The characteristic operations of a BAP are discussed and analyzed. A set of preprocessing algorithms is described which is designed to register two images of TV-type video data in real time. These algorithms illustrate the potential uses of a BAP, and their cost is analyzed in detail. The results of applying these algorithms to FLIR data and to noisy optical data are given. An analysis of these algorithms illustrates the importance of efficient global feature extraction hardware for image understanding applications.

10.
In the big-data environment, machine learning algorithms receive unprecedented attention. This paper summarizes and analyzes the problems that traditional machine learning algorithms encounter on massive data, reviews domestic and international research on parallel machine learning algorithms organized by contemporary parallel architectures, and summarizes the open problems of parallel machine learning algorithms on each class of underlying infrastructure. It concludes with a brief summary of parallel machine learning in the big-data setting and an outlook on its development trends.

11.
While numerous page segmentation algorithms have been proposed in the literature, there is a lack of comparative evaluation of these algorithms. In existing performance evaluation methods, two crucial components are usually missing: 1) automatic training of algorithms with free parameters and 2) statistical and error analysis of experimental results. We use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) first, we create mutually exclusive training and test data sets with groundtruth; 2) we then select a meaningful and computable performance metric; 3) an optimization procedure is then used to search automatically for the optimal parameter values of the segmentation algorithms on the training data set; 4) the segmentation algorithms are then evaluated on the test data set; and, finally, 5) a statistical and error analysis is performed to give the statistical significance of the experimental results. In particular, instead of the ad hoc and manual approach typically used in the literature for training algorithms, we pose the automatic training of algorithms as an optimization problem and use the simplex algorithm to search for the optimal parameter value. A paired-model statistical analysis and an error analysis are then conducted to provide confidence intervals for the experimental results of the algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, of which three are representative research algorithms and the other two are well-known commercial products, on 978 images from the University of Washington III data set. It is found that the performance indices of the Voronoi, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which, in turn, is significantly better than that of the X-Y cut algorithm.

12.
Interprocedural data flow information is useful for many software testing and analysis techniques, including data flow testing, regression testing, program slicing and impact analysis. For programs with aliases, these testing and analysis techniques can yield invalid results, unless the data flow information accounts for aliasing effects. Recent research provides algorithms for performing interprocedural data flow analysis in the presence of aliases; however, these algorithms are expensive, and achieve precise results only on complete programs. This paper presents an algorithm for performing alias analysis on incomplete programs that lets individual software components such as library routines, subroutines or subsystems be independently analyzed. The paper also presents an algorithm for reusing the results of this separate analysis when the individual software components are linked with calling modules. Our algorithms let us analyze frequently used software components, such as library routines or classes, independently, and reuse the results of that analysis when analyzing calling programs, without incurring the expense of completely reanalyzing each calling program. Our algorithms also provide a way to analyze large systems incrementally.

13.
A Missing-Data Imputation Algorithm Based on Generalized Mahalanobis Distance   Cited: 1 (self 0, others 1)
陈欢  黄德才 《计算机科学》2011,38(5):149-153
Missing data are unavoidable during data collection, and recovering them has become a hot topic in data mining research. Like many existing algorithms, Mahalanobis-distance-based imputation fully exploits the correlation among the actual data and imputes well, but it requires the covariance matrix of the data to be invertible, which greatly limits its applicability. By improving traditional principal component analysis and using singular value decomposition and the properties of the Moore-Penrose generalized inverse, this paper introduces the concept of a generalized Mahalanobis distance, applies it in a SOFM neural network, and, combined with information-entropy theory, designs a generalized-Mahalanobis-distance imputation algorithm, the GS algorithm. Theoretical analysis and numerical simulation show that the generalized Mahalanobis distance fully inherits the advantages of the Mahalanobis distance on correlated data, and that the new algorithm not only imputes accurately and stably but also applies to arbitrary data sets.
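The key idea above, replacing the inverse covariance with the Moore-Penrose pseudoinverse so singular covariance matrices are handled, can be sketched with NumPy (the function and data are illustrative; this is not the paper's GS algorithm):

```python
import numpy as np

def generalized_mahalanobis(x, mu, cov):
    """Mahalanobis-style distance using the Moore-Penrose pseudoinverse,
    so a singular covariance matrix (perfectly correlated or incomplete
    data) is still handled."""
    d = np.asarray(x) - np.asarray(mu)
    cov_pinv = np.linalg.pinv(cov)  # pseudoinverse instead of inv()
    return float(np.sqrt(d @ cov_pinv @ d))

# Singular covariance: the second feature duplicates the first,
# so np.linalg.inv(cov) would fail here.
cov = np.array([[1.0, 1.0], [1.0, 1.0]])
print(generalized_mahalanobis([2.0, 2.0], [0.0, 0.0], cov))  # → 2.0
```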

14.
《Information Sciences》2005,169(1-2):1-25
Imputation of missing data is of interest in many areas such as survey data editing, medical documentation maintenance, and DNA microarray data analysis. This paper is devoted to an experimental analysis of a set of imputation methods developed within the so-called least-squares approximation approach, a non-parametric, computationally effective multidimensional technique. First, we review global methods for least-squares data imputation. Then we propose extensions of these algorithms based on the nearest-neighbours approach. An experimental study of the algorithms on generated data sets is conducted. It appears that the straight algorithms may work rather well on data of simple structure and/or with a small number of missing entries. However, in more complex cases, the only winner within the least-squares approximation approach is a method, INI, proposed in this paper as a combination of global and local imputation algorithms.
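A minimal nearest-neighbours imputation in the spirit of the local methods above (the function, parameters, and data are ours; the paper's INI method combines such a local step with a global least-squares one):

```python
import numpy as np

def knn_impute(data, k=2):
    """Fill NaNs in each row with the column mean of the k complete rows
    nearest to it (Euclidean distance over the row's observed columns)."""
    data = np.asarray(data, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]
    out = data.copy()
    for i, row in enumerate(data):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        dists = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(dists)[:k]]
        out[i, miss] = nearest[:, miss].mean(axis=0)
    return out

X = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [1.0, np.nan]]
filled = knn_impute(X, k=2)
print(filled[3])  # → [1.   2.05]
```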

15.
Semi-structured documents and data pervade modern workflows in all areas. Collaborative work and version management rely on effective, automatic difference-analysis and three-way difference-analysis tools. In our effort to develop a three-way difference analysis for tree-structured documents, we developed a kernel three-way difference algorithm which extends equality-based procedures, such as GNU diff3, by considering the similarity of documents in the difference analysis and by ignoring the order of data when that is semantically suitable. As a result, we obtain difference-analysis algorithms that can be more finely tuned to the application domain. Moreover, the equality-based counterparts of our three-way difference-analysis algorithms have the idempotency property, which current three-way diff algorithms lack.

16.
Distributed Data Mining in Peer-to-Peer Networks   Cited: 9 (self 0, others 9)
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
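A classic local, communication-efficient P2P primitive of the kind the article surveys is gossip averaging, in which random peer pairs repeatedly exchange and average their values until all peers hold the global mean. A toy simulation (all names and parameters are illustrative, not from the article):

```python
import random

def gossip_average(values, rounds=200, seed=0):
    """Decentralized averaging: each round, one random pair of peers
    exchanges values and both keep the pair's mean. Only pairwise (local)
    communication is used, yet every peer converges to the global mean."""
    rng = random.Random(seed)
    v = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(v)), 2)
        v[i] = v[j] = (v[i] + v[j]) / 2.0
    return v

peers = gossip_average([2.0, 4.0, 6.0, 8.0])
print(peers)  # all four peers end up very close to the global mean, 5.0
```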

17.
Migrating the computation-intensive parts to the GPU is an effective way to accelerate classical data mining algorithms. This paper first introduces GPU characteristics and the main GPU programming models, then surveys GPU-accelerated work for each major data mining task: classification, clustering, association analysis, time-series analysis, and deep learning. Finally, two classical collaborative-filtering recommendation algorithms are implemented on both CPU and GPU, and experiments on the classic MovieLens data set confirm the significant speedup GPUs bring to data mining applications, giving further insight into how GPU acceleration works and its practical value.

18.
Qualitative trend analysis (QTA) of sensor data is a useful tool for process monitoring, fault diagnosis and data mining. However, because of the varying background noise characteristics and different scales of sensor trends, automated and reliable trend extraction remains a challenge for trend-based analysis systems. In this paper, several new polynomial fit-based trend extraction algorithms are first developed, which determine the parameters automatically in the hypothesis testing framework. An existing trend analysis method developed by Dash et al. (2004) is then modified and added to the abovementioned trend extraction algorithms, which form a complete solution for QTA. The performance comparison of these algorithms is made on a set of simulated data and Tennessee Eastman process data based on several metrics.
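A polynomial fit-based trend primitive like those described can be sketched by classifying a sensor-data window from the slope of a least-squares line fit (the function name and threshold are our assumptions; the paper's algorithms pick such parameters automatically via hypothesis testing):

```python
import numpy as np

def qualitative_trend(y, tol=0.05):
    """Classify one window of sensor data as 'increasing', 'decreasing',
    or 'steady' from the slope of a least-squares (degree-1) polynomial fit."""
    x = np.arange(len(y), dtype=float)
    slope = np.polyfit(x, np.asarray(y, dtype=float), 1)[0]
    if slope > tol:
        return 'increasing'
    if slope < -tol:
        return 'decreasing'
    return 'steady'

print(qualitative_trend([1.0, 1.5, 2.1, 2.4]))   # → increasing
print(qualitative_trend([1.0, 1.01, 0.99, 1.0])) # → steady
```

A full QTA pipeline would slide this classifier over the signal and merge adjacent windows with the same label into trend segments.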

19.
Efficient aggregation algorithms for compressed data warehouses   Cited: 9 (self 0, others 9)
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on compressed data sets, which are compressed by the mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, sizes of outputs and main memory availability. The algorithms are described and the I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms.

20.
Spatial Clustering in Data Mining   Cited: 1 (self 1, others 0)
Clustering analysis is widely used in data mining, and the clustering of spatial data is an important research direction within it. This paper proposes six criteria for spatial-data clustering and, based on them, analyzes and compares a number of traditional spatial clustering algorithms. The analysis shows that no older algorithm can simultaneously handle large numbers of data points, high-dimensional data, and heavy noise. The paper then briefly analyzes recently improved or newly proposed clustering algorithms and gives a brief outlook on future directions, mainly so that researchers can fully understand and master existing spatial clustering algorithms and discover higher-performance ones, and so that users can quickly find a clustering method suited to a specific problem.
