Similar Literature (20 results)
1.
刘杭, 殷歆, 陈杰, 罗恒. 计算机工程, 2023, 49(1): 121-129
To capture latent feature dependencies in time series and enable fast fuzzy forecasting of high-dimensional temporal data, two hybrid network models combining temporal convolutional networks (TCN) with a self-attention mechanism are constructed: TSANet and TSANet-MF. TSANet extracts features through two parallel convolutional components, one global and one local, uses self-attention to strengthen the associations between feature points, and adds parallel TCNs to enlarge the convolutional receptive field, capturing the periodic characteristics of multivariate time series as fully as possible. TSANet-MF uses TSANet as the regularization term of a matrix factorization algorithm, converting high-dimensional data into low-dimensional data with richer temporal features, which reduces computational complexity and enables fast fuzzy forecasting of high-dimensional data. Experiments on four time-series datasets from different domains show that TSANet outperforms the baseline models on three of them; on the high-dimensional Traffic dataset in particular, the root relative squared error is reduced by 19.52% to 56.37%. Compared with the baselines, TSANet-MF markedly shortens training time on the high-dimensional Electricity and Traffic datasets. These results confirm that both hybrid models deliver good multivariate time-series forecasting performance.
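A minimal PyTorch sketch of the core TSANet ingredients, dilated temporal convolutions followed by self-attention, may make the architecture concrete; the layer sizes, module name, and single-branch structure are illustrative assumptions, not the authors' implementation (the parallel global/local branches and the matrix factorization of TSANet-MF are omitted).

```python
import torch
import torch.nn as nn

class TCNSelfAttention(nn.Module):
    """Hypothetical single-branch sketch: dilated convs + self-attention."""
    def __init__(self, n_series, hidden=32, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        layers, in_ch = [], n_series
        for d in dilations:
            # Stacked dilated convolutions enlarge the receptive field.
            layers += [nn.Conv1d(in_ch, hidden, kernel_size,
                                 padding="same", dilation=d),
                       nn.ReLU()]
            in_ch = hidden
        self.tcn = nn.Sequential(*layers)
        # Self-attention strengthens associations between time steps.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, n_series)

    def forward(self, x):                    # x: (batch, n_series, time)
        h = self.tcn(x).transpose(1, 2)      # (batch, time, hidden)
        h, _ = self.attn(h, h, h)
        return self.head(h[:, -1])           # one-step-ahead forecast

out = TCNSelfAttention(n_series=8)(torch.randn(16, 8, 96))  # -> (16, 8)
```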

2.
A robust feature selection and classification algorithm based on partial least squares regression (RFSC-PLSR) is proposed to address redundancy and multicollinearity among features in feature selection. First, a sample class-consistency coefficient based on neighborhood estimation is defined. Then, conservative samples with locally stable class-distribution structure are screened out by k-nearest-neighbor (kNN) operations with different k, and a partial least squares regression model is built on them for robust feature selection. Finally, from a global-structure perspective, a partial least squares classification model is built from the class-consistency coefficients and the selected feature subset of all samples. Numerical experiments on five UCI datasets of different dimensionality show that, compared with four typical classifiers, support vector machine (SVM), naive Bayes (NB), BP neural network (BPNN), and logistic regression (LR), RFSC-PLSR is strongly competitive in classification accuracy, robustness, and computational efficiency in low-, medium-, and high-dimensional settings.
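A rough sklearn-based sketch of the two main steps, assuming numeric class labels; the consistency threshold, the choice of k, and ranking features by PLS coefficient magnitude are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cross_decomposition import PLSRegression

def select_features(X, y, k=5, consistency=0.8, n_features=10):
    # Class-consistency: fraction of each sample's k neighbors sharing its label.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                     # column 0 is the point itself
    consist = (y[idx[:, 1:]] == y[:, None]).mean(axis=1)
    conservative = consist >= consistency         # locally stable samples only
    # PLS regression on conservative samples; labels assumed numeric.
    pls = PLSRegression(n_components=2)
    pls.fit(X[conservative], y[conservative])
    # Rank features by coefficient magnitude (an assumed scoring rule).
    ranking = np.argsort(-np.abs(pls.coef_).ravel())
    return ranking[:n_features]
```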

3.
Dimensionality reduction of rotor fault datasets by fusing global and local discriminative information
To address the inability of traditional dimensionality reduction methods to preserve both global feature information and local discriminative information, a reduction method for rotor fault datasets is proposed that combines kernel principal component analysis (KPCA) with orthogonal locality sensitive discriminant analysis (OLSDA). KPCA first reduces correlation in the dataset and removes redundant attributes, preserving the global nonlinear information of the original data as far as possible; OLSDA then mines the local manifold structure of the data to extract highly discriminative, low-dimensional intrinsic features. A distinguishing feature of the method is that the orthogonalization applied during this process prevents distortion of the local subspace structure. Low-dimensional results are visualized in 3D plots, and reduction quality is measured by the recognition rate of a k-nearest neighbor (KNN) classifier on the low-dimensional feature subset together with the between-class scatter Sb and within-class scatter Sw from cluster analysis. Experiments show that the method extracts global and local discriminative information comprehensively, makes fault classes more clearly separated, and correspondingly improves recognition accuracy. This work provides a theoretical reference for the visualization and classification of high-dimensional, nonlinear machinery fault datasets.
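The two-stage pipeline can be approximated with scikit-learn; since OLSDA is not available there, LinearDiscriminantAnalysis serves purely as a placeholder for the supervised local stage, and all dimensions below are assumed values.

```python
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(
    KernelPCA(n_components=20, kernel="rbf"),    # global nonlinear stage (KPCA)
    LinearDiscriminantAnalysis(n_components=2),  # placeholder for OLSDA;
                                                 # needs > 2 fault classes
    KNeighborsClassifier(n_neighbors=5),         # KNN recognition, as in the paper
)
# Usage: pipe.fit(X_train, y_train); pipe.score(X_test, y_test)
```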

4.
In pattern classification, the Fisher criterion and the K-L transform map sample data from a high-dimensional feature space into a low-dimensional one to extract features, whereas the SVM (support vector machine) uses the implicit mapping of a kernel function to map samples from a low-dimensional feature space into a high-dimensional one for classification. This article applies the three methods to the classification of the Iris dataset in simulation experiments, analyzes and compares the results, and summarizes the similarities, differences, and underlying connections of the three methods in pattern classification.
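The three-way comparison is easy to reproduce on the Iris data with scikit-learn, where LDA plays the role of the Fisher criterion, PCA that of the K-L transform, and an RBF-kernel SVM that of the kernel mapping; the hyperparameters below are defaults, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = [
    ("Fisher criterion (LDA)", LinearDiscriminantAnalysis()),
    ("K-L transform (PCA) + linear SVM",
     make_pipeline(PCA(n_components=2), SVC(kernel="linear"))),
    ("kernel SVM (RBF)", SVC(kernel="rbf")),
]
for name, model in models:
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```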

5.
The subspace method of pattern recognition is a classification technique in which pattern classes are specified in terms of linear subspaces spanned by their respective class-based basis vectors. To overcome the limitations of linear methods, kernel-based nonlinear subspace (KNS) methods have recently been proposed in the literature. In KNS, kernel principal component analysis (kPCA) is employed to obtain principal components, not in the input space, but in a high-dimensional space whose components are nonlinearly related to the input variables. The lengths of the projections onto the basis vectors in kPCA are computed using a kernel matrix K whose dimension is equivalent to the number of sample data points. Clearly this is problematic, especially for large data sets. In this paper, we suggest a computationally superior mechanism to solve the problem. Rather than define the matrix K with the whole data set and compute the principal components, we propose that the data be reduced into a smaller representative subset using a prototype reduction scheme (PRS). Since a PRS can extract vectors that satisfactorily represent the global distribution structure, we demonstrate that data points which are ineffective in the classification can be eliminated to obtain a reduced kernel matrix K without degrading the performance. Our experimental results demonstrate that the proposed mechanism dramatically reduces the computation time without sacrificing classification accuracy on both real-life and artificial data sets. The results especially demonstrate the computational advantage for large data sets, such as those involved in data mining and text categorization applications.
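A sketch of the idea with one concrete (assumed) choice of PRS: k-means centroids stand in for the prototype reduction scheme, shrinking the kernel matrix from n x n to m x m before kernel PCA. The paper evaluates actual PRS algorithms; this substitution is only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import KernelPCA

def reduced_kpca(X, n_prototypes=100, n_components=10):
    # Assumed PRS stand-in: cluster centroids as representative prototypes
    # (requires n_samples >= n_prototypes).
    protos = KMeans(n_clusters=n_prototypes, n_init=10).fit(X).cluster_centers_
    # Kernel PCA fitted on the m prototypes only: the kernel matrix is m x m
    # instead of n x n.
    kpca = KernelPCA(n_components=n_components, kernel="rbf").fit(protos)
    return kpca.transform(X)   # project all points using the reduced basis
```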

6.
Fast retrieval methods are critical for many large-scale and data-driven vision applications. Recent work has explored ways to embed high-dimensional features or complex distance functions into a low-dimensional Hamming space where items can be efficiently searched. However, existing methods do not apply to high-dimensional kernelized data when the underlying feature embedding for the kernel is unknown. We show how to generalize locality-sensitive hashing to accommodate arbitrary kernel functions, making it possible to preserve the algorithm's sublinear time similarity search guarantees for a wide class of useful similarity functions. Since a number of successful image-based kernels have unknown or incomputable embeddings, this is especially valuable for image retrieval tasks. We validate our technique on several data sets, and show that it enables accurate and fast performance for several vision problems, including example-based object classification, local feature matching, and content-based retrieval.

7.
Feature selection has been an important preprocessing step in high-dimensional data analysis and pattern recognition. In this paper, we propose a locality preserving multimodal discriminative learning method called LPMDL for supervised feature selection, which arises from solving two standard eigenvalue problems and seeks a pair of optimal transformations for two sets of multivariate data in different classes. The approach optimally discovers the local structure information of the given data hidden in the original space and constructs an effective low-dimensional embedding space, in which LPMDL keeps nearby data pairs in the same class close and between-class data pairs apart, so that the projections of the original data in different classes are appropriately separated from each other. LPMDL can be performed either in the input space or in a reproducing kernel Hilbert space, which gives rise to the kernelized version of LPMDL. We also evaluate the feasibility and efficiency of the LPMDL approach through extensive data visualization and classification tasks. Experimental results on a broad range of data sets show that LPMDL captures the intrinsic structure characteristics of the sample data due to the effective representation of the points, and achieves similar or even better performance than the conventional PCA, NPE, LPP and LFDA methods.

8.
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
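The Monte Carlo flavor can be loosely approximated in scikit-learn by sampling several random chain orders and averaging their predictions; this ensemble is a simplification for illustration, not the paper's sampling schemes for chain-order search and inference.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

def mc_chains(X_train, Y_train, X_test, n_chains=10):
    """Average predictions over randomly ordered classifier chains.

    Y_train is a binary label-indicator matrix (n_samples, n_labels).
    """
    probas = []
    for seed in range(n_chains):
        chain = ClassifierChain(LogisticRegression(max_iter=1000),
                                order="random", random_state=seed)
        chain.fit(X_train, Y_train)
        probas.append(chain.predict_proba(X_test))
    # Averaged per-label probabilities, thresholded at 0.5.
    return np.mean(probas, axis=0) >= 0.5
```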

9.
High-dimension, low-sample-size data, such as microarray gene expression levels, pose numerous challenges to conventional statistical methods. In the particular case of binary classification, some classification methods, such as the support vector machine (SVM), can efficiently handle high-dimensional predictors but lack accuracy in estimating the probability of class membership. In contrast, traditional logistic regression (TLR) effectively estimates class-membership probabilities for data with low-dimensional inputs but does not handle high-dimensional cases. This study bridges the gap between SVM and TLR through their loss functions. Based on the proposed new loss function, a pseudo-logistic regression and classification approach that simultaneously combines the strengths of both SVM and TLR is proposed. Simulation evaluations and real data applications demonstrate that for low-dimensional data the proposed method produces regression estimates comparable to those of TLR and penalized logistic regression, and that for high-dimensional data the new method achieves higher classification accuracy than SVM while enjoying enhanced computational convergence and stability.
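The bridge between the two models is visible in their margin losses, hinge max(0, 1 - m) for SVM and log(1 + exp(-m)) for logistic regression; the blended loss below is only an illustration of the idea, not the paper's proposed loss function.

```python
import numpy as np

def hinge(m):
    return np.maximum(0.0, 1.0 - m)       # SVM margin loss

def logistic(m):
    return np.log1p(np.exp(-m))           # logistic regression margin loss

def blended(m, alpha=0.5):
    # Hypothetical convex combination, illustrating how the two losses
    # can be interpolated; alpha is an assumed mixing weight.
    return alpha * hinge(m) + (1 - alpha) * logistic(m)

m = np.linspace(-2, 3, 6)                 # margins m = y * f(x)
print(np.c_[m, hinge(m), logistic(m), blended(m)])
```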

10.
Li Kunmei, Fard Nasser. The Journal of Supercomputing, 2022, 78(14): 16485-16497

Advancements in high-speed computer technology play an ever-increasing role in analyzing massive data sets of various types. However, handling big high-dimensional data sets is a challenge in terms of computational storage and capacity. Through feature selection methods, data dimensionality can be reduced by eliminating dummy variables, allowing for more extensive analysis. In this paper, data are classified by response ratio into two types: balanced (almost the same ratio for each class) and partially balanced (consisting of majority and minority classes, with virtually the same ratio among the minority classes). Performance comparisons of various feature selection methods for balanced and partially balanced data are provided. This approach helps in selecting a sampling strategy and feature selection methods that perform well while using appropriate resources for high-dimensional data.


11.
We propose a general framework for nonparametric classification of multi-dimensional numerical patterns. Given training points for each class, it builds a set cover with convex sets, each of which contains some training points of the class but no points of the other classes. Each convex set thus has an associated class label, and a query point is assigned to the class of the convex set for which the projection length of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as “prototypes” for that class. We then apply this framework to two special types of convex sets, minimum enclosing balls and convex hulls, giving algorithms for constructing a set cover with them and for computing the projection length onto their boundaries. For convex hulls, we also give a method for implicitly evaluating whether a point is contained in a convex hull, which avoids the computational difficulty of explicitly constructing convex hulls in high-dimensional space.
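A toy version of the ball-prototype idea, using the class mean and maximum radius rather than a true minimum enclosing ball, and a single ball per class rather than a set cover; both simplifications are assumptions made to keep the sketch short.

```python
import numpy as np

def fit_balls(X, y):
    """One (approximate) enclosing ball per class: mean center, max radius."""
    balls = {}
    for c in np.unique(y):
        center = X[y == c].mean(axis=0)
        radius = np.linalg.norm(X[y == c] - center, axis=1).max()
        balls[c] = (center, radius)
    return balls

def classify(balls, q):
    # Projection length onto a ball boundary: |dist(q, center) - radius|;
    # the query goes to the class whose boundary is nearest.
    dists = {c: abs(np.linalg.norm(q - ctr) - r)
             for c, (ctr, r) in balls.items()}
    return min(dists, key=dists.get)
```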

12.
Wei Zhang, Hong Lu. Pattern Recognition, 2006, 39(11): 2240-2243
In this paper a novel subspace learning method called discriminant neighborhood embedding (DNE) is proposed for pattern classification. We suppose that multi-class data points in high-dimensional space tend to move due to local intra-class attraction or inter-class repulsion, and the embedding that is optimal from the point of view of classification is discovered accordingly. After being embedded into a low-dimensional subspace, data points in the same class form compact submanifolds, whereas the gaps between submanifolds corresponding to different classes become wider than before. Experiments on the UMIST and MNIST databases demonstrate the effectiveness of our method.

13.
In current image recognition, most classification and recognition methods rest on large amounts of existing data: massive data are fed into training, and discriminative classification is performed after sampling analysis and feature extraction. In the real world, however, most target classification problems lack large labeled datasets. To address image recognition on small-sample datasets, this paper first enlarges the dataset with data augmentation, then maps images into a high-dimensional embedding space with a multi-layer convolutional neural network, obtains a prototype point for each class with a prototypical network, and classifies a test image by its distances to the class prototypes in the embedding space. Experimental results show that the method achieves high recognition accuracy and strong robustness under small-sample conditions.
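The prototype step is compact enough to sketch directly: given embeddings from some (omitted) convolutional network, each class prototype is the mean support embedding, and queries go to the nearest prototype. The embedding network and data augmentation stages are left out.

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean of the support embeddings of that class."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(protos, query_embedding):
    # Assign the query to the class whose prototype is nearest in the
    # embedding space (Euclidean distance assumed here).
    dists = {c: np.linalg.norm(query_embedding - p) for c, p in protos.items()}
    return min(dists, key=dists.get)
```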

14.
A semi-supervised clustering method based on discriminant analysis
Compared with unsupervised clustering, semi-supervised clustering uses a portion of prior information to better mine and understand the intrinsic structure of the data while closely following user preferences. Typical existing semi-supervised clustering algorithms are suitable only for low-dimensional data; this paper proposes a novel semi-supervised clustering algorithm based on discriminant analysis to handle high-dimensional data. The new algorithm first projects the high-dimensional data with principal component analysis and clusters them in the projected space with a spherical k-means algorithm; it then uses the clustering result to reduce the dimensionality of the input-space data with linear discriminant analysis, and finally clusters the data again in the projected space. Experiments on a set of real datasets show that the proposed algorithm not only handles high-dimensional data effectively but also improves clustering performance.
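A condensed sklearn sketch of the four-stage pipeline; ordinary k-means stands in for the spherical variant, and the dimensions are assumed values rather than the paper's settings.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def semi_supervised_cluster(X, n_clusters=3, pca_dim=20):
    # Stage 1: PCA projection of the high-dimensional data.
    Z = PCA(n_components=pca_dim).fit_transform(X)
    # Stage 2: clustering in the projected space (spherical k-means in the
    # paper; plain k-means used here as a stand-in).
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    # Stage 3: LDA on the input space, using cluster labels as pseudo-classes.
    Z2 = LinearDiscriminantAnalysis(n_components=n_clusters - 1) \
             .fit_transform(X, labels)
    # Stage 4: cluster again in the discriminant space.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z2)
```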

15.
Feature selection is a common dimensionality reduction technique for high-dimensional big data, but the multiple conflicting objectives involved in evaluating feature subsets are hard to balance. To trade off the various subset evaluation criteria in feature selection and optimize subset performance, a feature selection framework based on multi-objective optimization of subset evaluation is proposed, with a focus on applying multi-objective particle swarm optimization (MOPSO) to feature subset evaluation. The framework designs multi-objective functions from subset sparsity, classification ability, and information loss; searches for feature weight vectors with a multi-objective optimization algorithm; determines the optimal vector by selecting the knee point of the Pareto set of weight vectors; and finally performs feature selection by ranking features according to the weight vector. Experiments compare MOPSO-based feature selection (FS_MOPSO) with four classical methods; results on multiple datasets show that FS_MOPSO achieves higher classification accuracy in a lower-dimensional space while incurring less information loss.
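The three sub-objectives can be written down directly for a candidate weight vector w. The concrete definitions below (sparsity as the fraction of near-zero weights, classification ability as k-NN cross-validation error on the weighted features, information loss as discarded variance) are assumptions for illustration, and the MOPSO search itself is omitted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def objectives(w, X, y, eps=1e-3):
    """Return the three sub-objectives for weight vector w (all minimized)."""
    # Sparsity: maximize the share of near-zero weights (negated to minimize).
    sparsity = -np.mean(np.abs(w) < eps)
    # Classification ability: cross-validated error on weighted features.
    error = 1 - cross_val_score(KNeighborsClassifier(), X * w, y, cv=3).mean()
    # Information loss: fraction of total feature variance discarded.
    kept_var = X.var(axis=0)[np.abs(w) >= eps].sum()
    info_loss = 1 - kept_var / X.var(axis=0).sum()
    return sparsity, error, info_loss
```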

16.
Objective: To address the low classification accuracy on small hyperspectral-image samples, the complex model structure, and the heavy computation of convolutional-neural-network algorithms, a variable-dimension convolutional neural network is proposed. Method: According to the changes in the dimensionality of its internal feature maps, the network's hyperspectral classification process divides into spatial-spectral information fusion, dimensionality reduction, mixed feature extraction, and joint spatial-spectral classification. By changing the dimensionality of the feature maps, this variable-dimension structure simplifies the network and reduces computation, and by fully extracting spatial-spectral information it raises the CNN's accuracy on small-sample hyperspectral image classification. Result: The experiments comprise a performance analysis of the variable-dimension network and a comparison of classification performance, using the Indian Pines and Pavia University Scene datasets. The network attains high accuracy on small hyperspectral samples, with overall classification accuracies of 87.87% on Indian Pines and 98.18% on Pavia University Scene, a clear advantage over competing algorithms. Conclusion: The results show that reasonable parameter optimization effectively improves the network's classification accuracy; the variable-dimension model substantially improves classification of small-sample data in hyperspectral images and can be extended to other deep-learning classification models for hyperspectral imagery.

17.
Obtaining accurate, real-time apple disease information is an important part of apple disease management, and accurate, rapid diagnosis from leaf symptoms is the basis of disease prevention and control. Because the shape, color, and texture of leaf and lesion images vary greatly even within the same apple disease, many classical pattern recognition methods cannot be applied effectively to apple leaf disease recognition. An apple disease recognition method based on two-dimensional subspace learning dimensionality reduction (2DSLDR) is therefore proposed. The method maps observed samples from the high-dimensional space into a low-dimensional subspace in which within-class samples become more compact and between-class samples more separated, yielding the best classification features. It operates directly on leaf images and requires no matrix inversion, thereby avoiding the difficulties of feature extraction and selection in classical plant disease recognition, circumventing the small-sample problem of classical subspace discriminant analysis, and improving recognition. Recognition experiments on three common apple leaf diseases compare the method with other apple disease recognition and supervised manifold learning methods. The results show that 2DSLDR is effective and feasible for apple leaf disease recognition, with accuracy above 90%.

18.
The behavioral and cognitive deficits of individuals with autism are related to underlying abnormalities in brain function. For the high-dimensional features of resting-state functional magnetic resonance imaging (fMRI), traditional linear feature extraction cannot adequately extract the information useful for classification. This paper therefore proposes a novel unsupervised fuzzy feature mapping method for fMRI data and combines it with a multi-view support vector machine to build a classification model for computer-aided diagnosis of autism. The method first uses the rule-antecedent learning of a multi-output TSK fuzzy system to map the original features into a linearly separable high-dimensional space; it then introduces a manifold regularization learning framework to derive a new unsupervised fuzzy feature learning method, producing nonlinear low-dimensional embeddings of the original output feature vectors; finally, a multi-view SVM performs the classification. Experimental results show that the method effectively extracts important features from resting-state fMRI data and improves model interpretability while maintaining superior and stable classification performance.

19.
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are of particular interest because of the demands of real-world tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores valuable label information. On the other hand, the performance of LDA degrades when the available low-dimensional space is limited or when the singularity problem arises. Recently, the Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm does not fit the streaming data model and cannot handle large-scale high-dimensional data sets, so an effective, efficient and scalable approach is needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, and an extension of it, that infer adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and real datasets demonstrate the superior performance of our proposed algorithm on streaming data.
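For reference, the batch form of the Maximum Margin Criterion projects onto the leading eigenvectors of S_b - S_w; the paper's contribution is an incremental version of this, which the sketch below does not attempt.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Batch MMC: top eigenvectors of S_b - S_w (not the incremental variant)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T                      # between-class scatter
        centered = Xc - Xc.mean(axis=0)
        Sw += centered.T @ centered                        # within-class scatter
    vals, vecs = np.linalg.eigh(Sb - Sw)                   # symmetric -> eigh
    return vecs[:, np.argsort(vals)[::-1][:n_components]]  # projection matrix
```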

20.
The aim of this paper is to develop a fuzzy classifier from the point of view of a fuzzy information retrieval system. A genetic algorithm is employed to find useful fuzzy concepts with high classification performance; each class and pattern can then be represented by a fuzzy set of useful fuzzy concepts. Each fuzzy concept is linguistically interpreted, and the corresponding membership functions remain fixed during the evolution. A pattern is categorized into the class with which it has the maximum degree of similarity. To preserve the usefulness of the proposed classifier for high-dimensional problems, principal component analysis is incorporated into the classifier to reduce dimensionality. The generalization ability of the proposed classifier is examined by computer simulations on some well-known data sets, such as the breast cancer data and the wine classification data. The results demonstrate that the proposed classifier works well in comparison with other classification methods.
