Similar Documents
20 similar documents found.
1.
A New Clustering Algorithm Based on Central Symmetry   (Cited by: 2; self-citations: 0; by others: 2)
Data clustering is an important research field in data mining. The key to a clustering algorithm is its distance measure. In this paper, we propose a new distance measure based on central symmetry and apply it to data clustering. Experimental studies demonstrate the feasibility of the algorithm, which also yields satisfactory results in face detection.
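The abstract does not give the paper's exact formula, so the sketch below assumes the classic point-symmetry distance in the same spirit: a point is close to a cluster center if some other point lies near its mirror image about that center. All names are illustrative.

```python
import numpy as np

def point_symmetry_distance(x, center, points):
    """Symmetry-based distance of x from a cluster center: small when some
    point in `points` lies near the reflection of x about the center."""
    mirror = 2 * center - x                           # mirror image of x about the center
    gaps = np.linalg.norm(points - mirror, axis=1)    # gap between each point and the mirror
    denom = np.linalg.norm(x - center) + np.linalg.norm(points - center, axis=1)
    ratio = gaps / np.where(denom == 0, 1e-12, denom) # normalize so scale cancels out
    return ratio.min()

# toy usage: a point set symmetric about the origin scores (near) zero
pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
print(point_symmetry_distance(np.array([1.0, 0.0]), np.array([0.0, 0.0]), pts))  # 0.0
```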

2.
Efficient real-time data exchange over the Internet plays a crucial role in the successful application of web-based systems. In this paper, a data transfer mechanism over the Internet is proposed for real-time web-based applications. The mechanism incorporates the Extensible Markup Language (XML) and the Hierarchical Data Format (HDF) to provide a flexible and efficient data format. Heterogeneous transfer data is classified into light and heavy data, which are stored using XML and HDF respectively; the HDF data is then mapped to Java Document Object Model (JDOM) objects in XML in the Java environment. These JDOM data objects are sent across computer networks with the support of the Java Remote Method Invocation (RMI) data transfer infrastructure. Client-defined data priority levels are implemented in RMI, which guide a server to transfer data objects at different priorities. A remote monitoring system for an industrial reactor process simulator is used as a case study to illustrate the proposed data transfer mechanism.
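As a rough illustration of the client-defined priority idea, here is a minimal Python stand-in for the transfer queue (the paper's actual implementation is Java RMI with XML/HDF payloads; the class names, the size threshold splitting light from heavy data, and the queue itself are assumptions, not the paper's API):

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class TransferItem:
    priority: int                        # lower number = sent earlier
    name: str = field(compare=False)
    payload: bytes = field(compare=False)

LIGHT_LIMIT = 4096                       # assumed size split between "light" and "heavy"

def classify(payload: bytes) -> str:
    return "light(XML)" if len(payload) <= LIGHT_LIMIT else "heavy(HDF)"

queue = []
heapq.heappush(queue, TransferItem(2, "trend-history", b"\x00" * 100_000))
heapq.heappush(queue, TransferItem(0, "alarm-state", b"<alarm>HIGH</alarm>"))

while queue:                             # server drains the queue in priority order
    item = heapq.heappop(queue)
    print(item.priority, item.name, classify(item.payload))
```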

3.
Data prefetching is an effective data access latency hiding technique that masks the CPU stalls caused by cache misses and bridges the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to the processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some of the single-core prefetching t...
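As one concrete example of the single-core techniques such surveys cover, the following toy model (not taken from the paper) sketches stride prefetching: watch the load-address stream, detect a constant stride, and predict the next address to fetch.

```python
def stride_prefetch(addresses):
    """Issue a prefetch one stride ahead once the same stride is seen twice."""
    prefetches = []
    last, stride = None, None
    for addr in addresses:
        if last is not None:
            new_stride = addr - last
            if new_stride == stride:          # stride confirmed: predict the next line
                prefetches.append(addr + stride)
            stride = new_stride
        last = addr
    return prefetches

# sequential array walk over 64-byte cache lines: predictions stay one line ahead
print(stride_prefetch([0, 64, 128, 192, 256]))   # [192, 256, 320]
```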

4.
Computer science is undergoing a fundamental change and is reshaping our understanding of the world. An important aspect of this change is the theory and applications dealing with the gathering and analysis of large real-world data sets. In this paper, we introduce four research projects in which processing and interpreting large data sets is a central focus. Innovative ways of analyzing such data sets allow us to extract useful information that we would never have obtained from small or synthetic data sets, thus providing us with new insights into the real world.

5.
Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addresses this fundamental issue through a partitioning method that groups cube cells into equivalence classes. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, maintaining such a quotient cube is non-trivial, since its equivalence classes must be split or merged. In this paper, incremental algorithms are designed to update an existing quotient cube efficiently based on the Galois lattice. A performance study shows that these algorithms are efficient and scalable for large databases.
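A minimal sketch of the partitioning idea, assuming the standard cover-equivalence formulation of the quotient cube rather than the paper's incremental algorithms: cube cells that cover exactly the same set of base tuples share one aggregate value and collapse into one class, so only one representative per class needs to be stored.

```python
from itertools import product
from collections import defaultdict

rows = [("store1", "jan", 100), ("store1", "feb", 200), ("store2", "jan", 300)]
dims = [{r[0] for r in rows} | {"*"}, {r[1] for r in rows} | {"*"}]   # "*" = ALL

def covers(cell, row):
    return all(c == "*" or c == v for c, v in zip(cell, row[:2]))

classes = defaultdict(list)
for cell in product(*dims):                    # enumerate every cube cell
    cover = frozenset(i for i, r in enumerate(rows) if covers(cell, r))
    if cover:                                  # skip empty cells
        classes[cover].append(cell)            # same cover set -> same class

for cover, cells in classes.items():
    total = sum(rows[i][2] for i in cover)     # SUM aggregate shared by the class
    print(sorted(cells), "-> SUM =", total)
```

Running this groups, for example, ("store2", "jan") and ("store2", "*") into one class, since both cover only the third base tuple.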

6.
When users store data in big data platforms, the integrity of outsourced data is a major concern for data owners due to the lack of direct control over the data. However, existing remote data auditing schemes for big data platforms are only applicable to static data. In order to verify the integrity of dynamic data on a Hadoop big data platform, we present a dynamic auditing scheme meeting the special requirements of Hadoop. Concretely, a new data structure, the Data Block Index Table, is designed to support dynamic data operations on HDFS (Hadoop Distributed File System), including appending, inserting, deleting, and modifying. Combined with the MapReduce framework, a dynamic auditing algorithm is then designed to audit the data on HDFS concurrently. Analysis shows that the proposed scheme is secure enough to resist forgery, replacement, and replay attacks on a big data platform. It is also efficient in both computation and communication.
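The abstract names the Data Block Index Table but not its layout; the sketch below assumes a simple ordered list of (block id, version) records to show how the four HDFS operations could be supported without renumbering blocks:

```python
class BlockIndexTable:
    """Assumed toy layout: list position = logical block order."""
    def __init__(self):
        self.entries = []

    def append(self, block_id):
        self.entries.append({"block": block_id, "version": 1})

    def insert(self, pos, block_id):
        self.entries.insert(pos, {"block": block_id, "version": 1})

    def delete(self, pos):
        self.entries.pop(pos)

    def modify(self, pos, new_block_id):
        e = self.entries[pos]
        e["block"] = new_block_id
        e["version"] += 1                 # version bump defeats replay of stale proofs

tbl = BlockIndexTable()
for b in ("b0", "b1", "b2"):
    tbl.append(b)
tbl.insert(1, "b0.5")                     # insert without renumbering later blocks
tbl.modify(0, "b0'")                      # modify bumps the version
tbl.delete(3)                             # delete the old b2
print(tbl.entries)
```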

7.
The advantage of COOZ (Complete Object-Oriented Z) is its ability to specify large-scale software, but it does not support the refinement calculus, which confines its application in software development. Incorporating the refinement calculus into COOZ overcomes this disadvantage during design and implementation, and removes the separation between design and implementation in structure and notation, so that software can be developed smoothly within a single framework. The combination of COOZ and the refinement calculus builds an object-oriented framework in which a COOZ specification is refined stepwise into code by calculation. In this paper, a development model based on COOZ and the refinement calculus is established. Data refinement is harder to handle in a refinement tool than ordinary algorithmic refinement, since data refinement usually has to be performed on a large program component at once. Regarding the implementation technology of the refinement calculus, a data refinement calculator is constructed, and an approach to data refinement based on the data refinement calculus and program window inference is offered.

8.
Today, grid technology has evolved to the point where it is no longer a theory but a proven practice. It represents a viable direction for corporations to explore grid computing as an answer to their business needs within tight financial constraints. In general, grids enable the efficient sharing and management of computing resources for the purpose of performing large complex tasks. A data grid provides the data management features that enable data access, synchronization, and distribution across a grid. The main aim is to ensure efficient access to quality data, improve availability, and continue delivering acceptable service. In such systems, these advantages can only be achieved through replication mechanisms. Using replication effectively, however, raises several problems, chief among them maintaining the coherence of replicas. Our contribution is a new service for consistency management in the data grid. This service combines pessimistic and optimistic approaches, drawing on the benefits of both, to strike a compromise between performance and quality. In addition, the service is extended with a replica placement mechanism based on an economic model.
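The abstract does not detail the protocol, so this sketch assumes one common hybrid: writes flagged critical are propagated to all replicas synchronously (pessimistic), while the rest are queued and reconciled lazily (optimistic):

```python
class ReplicaGroup:
    def __init__(self, n):
        self.replicas = [{} for _ in range(n)]
        self.pending = []                        # lazy updates not yet applied

    def write(self, key, value, critical=False):
        if critical:                             # pessimistic: all copies updated now
            for r in self.replicas:
                r[key] = value
        else:                                    # optimistic: record and defer
            self.pending.append((key, value))

    def sync(self):                              # background reconciliation pass
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value
        self.pending.clear()

g = ReplicaGroup(3)
g.write("quota", 10, critical=True)
g.write("stats", 42)                             # visible only after sync()
print(g.replicas[1])                             # {'quota': 10}
g.sync()
print(g.replicas[1])                             # {'quota': 10, 'stats': 42}
```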

9.
Data access delay has become the prominent performance bottleneck of high-end computing systems. The key to reducing data access delay in system design is to diminish data stall time. Memory locality a...

10.
Term Card
TCP/IP is used to facilitate communication within a network of diverse hardware technologies. Information is broken into packets (usually in the range of 1-1500 characters long) to prevent monopolizing of the network. TCP is a transport-level protocol which allows a process on one computer to send data to a process on another computer. It is a connection-oriented protocol, which means that a path must be established between the two computers. IP defines the datagram, the format of the data being transferred throughout the network, and performs connectionless delivery. Connectionless delivery requires each datagram to contain the source and destination address, and each datagram is processed separately. TCP takes the information, breaks it into pieces called packets, numbers the packets, and then sends them.
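A small sketch of the packetization described above (the size and the tuple header are illustrative, not the actual TCP wire format): break a byte stream into numbered packets so the receiver can reassemble them in order even if they arrive out of sequence.

```python
def packetize(data: bytes, size: int = 1460):
    """Split data into (sequence number, chunk) packets of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Restore the original stream by sorting on the sequence number."""
    return b"".join(chunk for _, chunk in sorted(packets))

msg = b"x" * 4000
pkts = packetize(msg)
print([(seq, len(chunk)) for seq, chunk in pkts])   # [(0, 1460), (1, 1460), (2, 1080)]
assert reassemble(reversed(pkts)) == msg            # order restored despite arrival order
```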

11.
The data warehouse is the hub that connects underlying data sources with upper-layer applications. This paper introduces techniques for populating the data warehouse (its connection to data sources) and for accessing it (its connection to application interfaces), covering connections to both database and non-database data sources, as well as approaches and implementation methods for accessing the data warehouse through component tools.

12.
To solve real-world problems, big data analysis and processing systems need to acquire data, yet the real data collected in practice is usually incomplete. Moreover, most solutions are driven by the problem itself or rely on data analysis alone, so the tuning and setting of operating parameters is largely blind, making intelligent applications hard to achieve. This paper therefore proposes the concept and framework of parallel data: genuinely virtual big data is generated from real data through computational experiments and, combined with Merton's law, the expected solution is placed in generalized duality with the problem, guiding big data to focus on the actual problem. Real and virtual data interact dynamically and evolve in parallel, forming a process in which the virtual and the real generate each other and the data changes dynamically, ultimately endowing the data with intelligence and enabling it to solve unknown problems. Parallel data is thus not only a form of data representation but also a mechanism and mode of data evolution, characterized by virtual-real interaction; the dynamic trajectories of all the data constitute a data dynamical system. Parallel data provides a new paradigm for data processing, representation, mining, and application.

13.
吴昊 《电脑学习》2001,(2):36-37
To solve practical problems effectively, it may be necessary to mix programs written in different languages, which raises the issues of passing data between them and converting between different types of data files. Several common data exchange problems are discussed here.
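As an assumed example of one such exchange route (not taken from the article): a fixed-layout binary record written by a C program can be read in Python with the struct module, provided both sides agree on field order, sizes, and byte order.

```python
import struct

# little-endian, no padding: int32 id, 20-byte name, float64 value
# (matching a C struct { int32_t id; char name[20]; double value; })
RECORD = "<i20sd"

def write_record(path, rec_id, name, value):
    with open(path, "wb") as f:
        f.write(struct.pack(RECORD, rec_id, name.encode("utf-8"), value))

def read_records(path):
    size = struct.calcsize(RECORD)
    with open(path, "rb") as f:
        while chunk := f.read(size):
            rec_id, raw, value = struct.unpack(RECORD, chunk)
            yield rec_id, raw.rstrip(b"\x00").decode("utf-8"), value

write_record("shared.dat", 1, "sample", 3.14)
print(list(read_records("shared.dat")))   # [(1, 'sample', 3.14)]
```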

14.
陈元  陈文伟 《计算机工程》2000,26(10):9-10,85
By defining a SQL data mining extractor, a framework for the interface between data mining algorithms and database management systems is designed. The applicability of this standard SQL data mining extractor is then illustrated with a commonly used data mining algorithm, the naive Bayes algorithm.
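A minimal sketch of the extractor idea, with SQLite standing in for the DBMS since the paper's actual interface is not specified: naive Bayes needs only class counts and per-class attribute counts, and both can be pulled from the database with plain SQL aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?)",
                 [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
                  ("rain", "yes"), ("sunny", "yes"), ("rain", "no")])

# the "extractor": two GROUP BY queries supply all the statistics the model needs
class_counts = dict(conn.execute(
    "SELECT play, COUNT(*) FROM weather GROUP BY play"))
cond_counts = {(c, v): n for v, c, n in conn.execute(
    "SELECT outlook, play, COUNT(*) FROM weather GROUP BY outlook, play")}

total = sum(class_counts.values())

def posterior(outlook, cls):            # P(cls) * P(outlook | cls), unnormalized
    prior = class_counts[cls] / total
    likelihood = cond_counts.get((cls, outlook), 0) / class_counts[cls]
    return prior * likelihood

print(posterior("sunny", "no"), posterior("sunny", "yes"))   # ~0.333, ~0.167
```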

15.
Data Extraction in Data Warehouses   (Cited by: 18; self-citations: 2; by others: 16)
连立贵  金凤  蔡家楣 《计算机工程》2001,27(9):61-62,99
The ideas and tools of data warehousing are being embodied and applied ever more widely in enterprises. In typical data warehouse applications today, populating the warehouse, that is, extracting, transforming, and loading the data, is one of the biggest challenges designers face; an estimated 80% of the effort in designing a data warehouse goes into this process. From an engineering perspective, this paper describes the process and implementation of data extraction, transformation, and loading, and introduces four data extraction methods that can be applied flexibly.
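The abstract does not list the four extraction methods, so the sketch below shows one classic candidate, timestamp-based incremental extraction: each run pulls only the rows changed since the last recorded high-water mark.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 9.5, "2001-01-01"), (2, 20.0, "2001-03-10"),
                 (3, 7.5, "2001-03-12")])

def extract_since(conn, watermark):
    """Pull only rows modified after the watermark; return them plus the new mark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at", (watermark,)).fetchall()
    new_mark = rows[-1][2] if rows else watermark   # advance the high-water mark
    return rows, new_mark

delta, mark = extract_since(src, "2001-03-01")
print(delta)   # only the two March rows are staged for transform-and-load
print(mark)    # '2001-03-12'
```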

16.
A Data Integration Method for Customer-Oriented Data Warehouses   (Cited by: 3; self-citations: 0; by others: 3)
Data integration is one of the major problems in building a data warehouse, and customer data is the hardest kind of data to integrate; no good solution to this problem exists yet. Summarizing experience gained in real practice, we propose a practical and feasible solution that balances both integration efficiency and accuracy, and it has performed well in actual applications.

17.
A Serialization Method for Data Transformation Processes   (Cited by: 2; self-citations: 1; by others: 2)
With the development of data warehousing and data integration, data cleaning work keeps growing, and users must process the content of the data multiple times. When modeling a cleaning process, a user may clean and transform the same data several times; with so many steps, users often fail to notice errors in the cleaning and transformation steps. This paper discusses these problems and proposes solutions for the assignment conflicts and range conflicts that arise in such cleaning and transformation processing.
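A sketch of one plausible way to flag assignment conflicts (assumed; the paper's own method is not detailed in the abstract): walk the ordered cleaning steps and report any field that a later step silently overwrites.

```python
def find_assignment_conflicts(steps):
    """steps: list of (step_name, {field: description_of_new_value}) in execution order."""
    last_writer, conflicts = {}, []
    for name, assignments in steps:
        for field in assignments:
            if field in last_writer:               # field already written by an earlier step
                conflicts.append((field, last_writer[field], name))
            last_writer[field] = name
    return conflicts

steps = [("trim_names",  {"name": "strip()"}),
         ("fix_phone",   {"phone": "digits only"}),
         ("rename_cols", {"name": "title case"})]   # overwrites trim_names' result

for field, first, second in find_assignment_conflicts(steps):
    print(f"field '{field}' set in step '{first}' is overwritten by '{second}'")
```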

18.
Research on Data Quality Control in Data Warehouses   (Cited by: 18; self-citations: 1; by others: 18)
As data warehouses see deeper application, data quality has become a key issue that determines whether a data warehouse succeeds and whether its data can be used effectively. This paper first discusses the data quality problems that arise in a data warehouse environment and the importance of ensuring data quality, then proposes metrics and evaluation indicators for data quality, and finally presents a data quality maturity model for quality control during data warehouse implementation and operation, together with methods for assuring the quality of warehouse data.
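The paper's metric set is not enumerated in the abstract; as an assumed example, completeness (the share of non-missing values per column) is among the simplest warehouse quality indicators to compute and track over time.

```python
def completeness(rows, columns):
    """Fraction of rows with a non-missing value, per column."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            if row.get(c) not in (None, ""):
                counts[c] += 1
    return {c: counts[c] / len(rows) for c in columns}

customers = [{"id": 1, "phone": "555-0101", "email": ""},
             {"id": 2, "phone": None,       "email": "a@b.com"},
             {"id": 3, "phone": "555-0103", "email": "c@d.com"}]

print(completeness(customers, ["id", "phone", "email"]))
# {'id': 1.0, 'phone': 0.666..., 'email': 0.666...}
```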

19.
常鑫 《计算机时代》2010,(11):51-52,55
In a data warehouse, both dimension data and measure data are crisp, and multidimensional analysis is likewise carried out on crisp data. In reality, owing to the complexity and uncertainty of the environment, multidimensional analysis often has to be performed on fuzzy data. This paper uses membership functions to fuzzify crisp data, forming a data cube based on fuzzy data on which multidimensional analysis can then be carried out.
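A minimal sketch of fuzzifying a crisp measure with membership functions (the triangular shapes and the low/medium/high terms are assumptions; the abstract does not fix particular functions): a crisp sales figure maps to a degree of membership in each fuzzy term, and those degrees become the fuzzy cube's cell values.

```python
def triangular(a, b, c):
    """Membership function rising from a to peak b, falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

terms = {"low":    triangular(0, 20, 45),
         "medium": triangular(30, 50, 70),
         "high":   triangular(55, 80, 100)}

sales = 60    # crisp measure value from the warehouse
memberships = {t: round(mu(sales), 2) for t, mu in terms.items()}
print(memberships)   # {'low': 0.0, 'medium': 0.5, 'high': 0.2}
```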

20.
文必龙  付玥 《计算机系统应用》2012,21(3):240-243,231
In recent years, with the establishment of data element standards, data elements have come to play an important role in data integration across industries: they are used to standardize the data items in databases, reports, and documents and to realize mappings between data sources. This paper analyzes the structure of data elements and proposes an algorithm for matching data items against data elements. Based on the edit distance algorithm, it incorporates the longest common subsequence, weighting, and the idea that a term's semantic weight shifts toward its later words, in order to compute the similarity between a data item and the data elements in a data element dictionary; permutation-and-combination principles are used to speed up matching. Experiments on Sinopec standard data elements verify the effectiveness of the matching algorithm.
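A sketch of the similarity computation (the weights and the 50/50 blend are assumptions, and the tail-weighting refinement is omitted since its exact form is not given in the abstract): blend edit-distance similarity with longest-common-subsequence similarity.

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lcs_len(a, b):
    """Length of the longest common subsequence, same rolling-row scheme."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(item, element, w_edit=0.5, w_lcs=0.5):
    m = max(len(item), len(element)) or 1
    return w_edit * (1 - edit_distance(item, element) / m) + \
           w_lcs * (lcs_len(item, element) / m)

print(similarity("well depth", "well total depth"))   # 0.625
```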
