101.
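Jiao Lei (焦雷). 《电脑与微电子技术》, 2011, (20): 3-7
Studies the problem of mining Top-k frequent closed itemsets from microarray data and designs a mining algorithm, ZDtoP. The algorithm stores the dataset in compressed form using a ZBDD structure and applies a top-down, depth-first search strategy to mine the Top-k frequent closed itemsets whose length is no less than a given value min_l, while effectively pruning the search space. A worked example shows that the algorithm is correct and effective.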
102.
We present algorithms for identifying frequently occurring items in a large distributed data set. Our algorithms use gossip as the underlying communication mechanism, and do not rely on any central control, nor on an underlying network structure, such as a spanning tree. Instead, nodes repeatedly select a random partner and exchange data with that partner. If this process continues for a (short) period of time, the desired results are computed, with probabilistic guarantees on the accuracy. Our algorithm for identifying frequent items is built by layering a novel small space “sketch” of data over a gossip-based data dissemination mechanism. We prove that the algorithm identifies the frequent items with high probability, and provides bounds on the time till convergence. To our knowledge, this is the first work on identifying frequent items using gossip.
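The random-partner exchange described in this abstract can be illustrated with plain gossip averaging. The sketch below is not the paper's algorithm: it swaps the small-space sketch for exact per-item counts, and the names `gossip_average` and `estimated_frequent_items` are hypothetical. It only shows how repeated pairwise exchanges let every node estimate global item totals without a coordinator.

```python
import random

def gossip_average(local_counts, rounds, seed=0):
    """Simulate pairwise gossip averaging of per-item counts (needs >= 2 nodes).

    local_counts: list of dicts, one per node, mapping item -> local count.
    Returns the per-node state after `rounds` rounds; each value approaches
    the network-wide average count of that item.
    """
    rng = random.Random(seed)
    n = len(local_counts)
    items = sorted({item for counts in local_counts for item in counts})
    state = [{item: float(counts.get(item, 0)) for item in items}
             for counts in local_counts]
    for _ in range(rounds):
        for i in range(n):
            # Node i picks a random partner j and both keep the pairwise mean.
            j = rng.choice([x for x in range(n) if x != i])
            for item in items:
                avg = (state[i][item] + state[j][item]) / 2.0
                state[i][item] = avg
                state[j][item] = avg
    return state

def estimated_frequent_items(node_state, n_nodes, threshold):
    """Items whose estimated global count (average * n_nodes) meets the threshold."""
    return {item for item, avg in node_state.items() if avg * n_nodes >= threshold}

local = [{"a": 5, "b": 1}, {"a": 3}, {"b": 1, "c": 2}, {"a": 4, "c": 1}]
state = gossip_average(local, rounds=100)
frequent_at_node0 = estimated_frequent_items(state[0], len(local), threshold=10)
```

Averaging preserves the sum of each item's counts across nodes, which is why multiplying a node's converged average by the number of nodes recovers the global total. This sketch assumes every node knows the network size.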
103.
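Improvement of the AprioriTid algorithm for mining frequent itemsets  Total citations: 1, self-citations: 0, citations by others: 1
To address the shortcomings of the AprioriTid algorithm, a new optimized algorithm, IaprioriTid, is proposed. It optimizes AprioriTid through transaction compression, item compression, and hashing, improving the algorithm's efficiency. Transaction compression and item compression are applied to the C′k structure introduced by AprioriTid, reducing the amount of data in C′k and speeding up scanning, and hashing is used to optimize the generation of frequent 2-itemsets. Experiments confirm the effectiveness of the algorithm.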
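For orientation, the baseline idea behind AprioriTid can be sketched as follows: after the first pass, each transaction is replaced by the set of candidates it contains, and transactions that contain no candidate are dropped. This is a minimal sketch of that baseline under simplifying assumptions, not the IaprioriTid algorithm from the entry; the function name `apriori_tid` is hypothetical, and the item compression and hashing optimizations are not shown.

```python
from itertools import combinations

def apriori_tid(transactions, min_support):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    # Pass 1: count single items directly from the database.
    counts = {}
    for t in transactions:
        for item in set(t):
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    # Encoded database: each transaction becomes the set of frequent
    # 1-itemsets it contains; empty entries are dropped (compression).
    encoded = [{frozenset([i]) for i in set(t) if frozenset([i]) in frequent}
               for t in transactions]
    encoded = [e for e in encoded if e]

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets, then prune by the Apriori property.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        counts = {}
        next_encoded = []
        for entry in encoded:
            # A candidate is in the transaction exactly when all of its
            # (k-1)-subsets were recorded for that transaction.
            present = {c for c in candidates
                       if all(frozenset(sub) in entry
                              for sub in combinations(c, k - 1))}
            for c in present:
                counts[c] = counts.get(c, 0) + 1
            if present:  # drop transactions with no candidates
                next_encoded.append(present)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        encoded = next_encoded
        k += 1
    return result

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
found = apriori_tid(db, min_support=3)
```

The raw database is read only in the first pass; later passes scan the shrinking encoded list, which is the property the entry's compression steps aim to strengthen.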
104.
105.
Evolved genetic programming trees contain many repeated code fragments. Size fair crossover limits bloat in automatic programming,
preventing the evolution of recurring motifs. We examine these complex properties in detail using depth vs. size Catalan binary
tree shape plots, subgraph and subtree matching, information entropy, sensitivity analysis, syntactic and semantic fitness
correlations. Programs evolve in a self-similar fashion, akin to fractal random trees, with diffuse introns. Data mining frequent
patterns reveals that as software is progressively improved a large proportion of it is exactly repeated subtrees as well
as exactly repeated subgraphs. We relate this emergent phenomenon to building blocks in GP and suggest GP works by jumbling
subtrees which already have high fitness on the whole problem to give incremental improvements and create complete solutions
with multiple identical components of different importance.
106.
107.
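Incremental updating of association rules based on a relation matrix  Total citations: 2, self-citations: 0, citations by others: 2
Association rules are one of the main patterns studied in current data mining research. This paper proposes USLIG, an efficient incremental association rule mining algorithm that updates the association rules when the minimum support changes. By building a relation matrix between vectors, the algorithm turns the generation of frequent itemsets into operations on the vectors in the itemsets' relation matrix. It makes full use of earlier mining results and only needs to scan vectors that are much smaller than the database, overcoming the drawback of IUA and related algorithms, which must scan the database multiple times.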
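The general idea of replacing database scans with vector operations can be sketched using one bit vector per item. This is an illustration under assumed data structures, not the USLIG algorithm or its relation matrix: the names `build_item_vectors`, `support`, and `frequent_itemsets` are hypothetical, and reuse of earlier mining results is not modeled.

```python
from itertools import combinations

def build_item_vectors(transactions):
    """Map each item to an int bitmask; bit i is set when transaction i has the item."""
    vectors = {}
    for i, t in enumerate(transactions):
        for item in set(t):
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors

def support(itemset, vectors):
    """Support count = number of set bits in the AND of the item vectors."""
    items = list(itemset)
    mask = vectors.get(items[0], 0)
    for item in items[1:]:
        mask &= vectors.get(item, 0)
    return bin(mask).count("1")

def frequent_itemsets(vectors, min_support):
    """Level-wise search that reads only the vectors, never the raw transactions."""
    level = [frozenset([i]) for i in vectors
             if support([i], vectors) >= min_support]
    result = {}
    while level:
        for s in level:
            result[s] = support(s, vectors)
        k = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in candidates if support(c, vectors) >= min_support]
    return result

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
vectors = build_item_vectors(db)           # the only pass over the database
at_three = frequent_itemsets(vectors, 3)   # original minimum support
at_two = frequent_itemsets(vectors, 2)     # threshold lowered, no rescan
```

When the minimum support changes, only `frequent_itemsets` is rerun against the stored vectors, which mirrors the entry's claim that the database itself need not be scanned again.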
108.
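Optimization of a frequent itemset pruning algorithm based on multi-level fuzzy patterns  Total citations: 3, self-citations: 0, citations by others: 3
Mining distributed databases with association rules is a common pattern. To further improve the efficiency of algorithms that mine multi-level association rules in a distributed setting, and to improve memory utilization, the paper reintroduces fuzzy theory and the concept of effective support. Taking full account of the effective-support threshold and the support frequency of the effective support, it proposes a modified scheme for the frequent itemset generation algorithm. The scheme is analyzed and justified theoretically, and experiments show that the optimization is clearly effective and useful.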
109.
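For dense datasets, the paper proposes Unid_FP-Max2, a maximal frequent itemset mining algorithm based on a unidirectional FP-tree. During mining, the algorithm generates only constrained subtrees. Each constrained subtree is a virtual tree structure, represented by three very small arrays on top of the original unidirectional FP-tree. This avoids the drawback of earlier algorithms, which must recursively construct conditional FP-trees to compute the maximal frequent itemsets, and greatly reduces memory and time overhead, improving mining efficiency. Experiments show that the algorithm is more than twice as efficient as the FP-Max algorithm.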
110.
DRFP-tree: disk-resident frequent pattern tree  Total citations: 4, self-citations: 3, citations by others: 1
Frequent itemset mining methods basically address time scalability and greatly rely on available physical memory. However,
the size of real-world databases to be mined is exponentially increasing, and hence main memory size is a serious bottleneck
of the existing methods. So, it is necessary to develop new methods that do not fully rely on physical memory; new methods
that utilize the secondary storage in the mining process should be the target. This motivates the work described in this paper;
we mainly propose (Disk Resident Frequent Pattern) DRFP-Growth as a disk based approach similar to FP-Growth. DRFP-growth uses DRFP-tree, which is treated exactly as
FP-tree when constructed in main memory and gets into a modified structure when it turns into disk resident to overcome the
main memory bottleneck. This way, we are able to mine for frequent itemsets from databases of arbitrary sizes without being
restricted by the available physical memory. In other words, we initially try to mine the database using the original FP-growth;
we expand into the secondary memory only if we run out of physical memory. So, DRFP-growth is very comparable to FP-growth
for small databases and high support threshold values. On the other hand, using DRFP-growth, we are still able to mine huge
databases for low support threshold values (the only limitation is the available secondary storage rather than physical memory).
The reported test results demonstrate how the proposed approach succeeds for cases where main memory based approaches fail.