Similar Documents
1.
As databases increasingly integrate different types of information such as multimedia, spatial, time-series, and scientific data, it becomes necessary to support efficient retrieval of multidimensional data. Both the dimensionality and the amount of data that needs to be processed are increasing rapidly. Reducing the dimension of the feature vectors to enhance the performance of the underlying technique is a popular remedy for the infamous curse of dimensionality. When the similarity distance between two feature vectors is approximated by some notion of distance between two lower-dimensional transformed vectors, we expect that approximation to be of good quality. It is therefore desirable to develop techniques that yield accurate approximations to the original similarity distance. We investigate dimensionality reduction techniques that directly target minimizing the errors made in these approximations. In particular, we develop dynamic techniques for efficient and accurate approximation of similarity evaluations between high-dimensional vectors based on inner-product approximations. The inner product, by itself, is used as a distance measure in a wide range of applications such as document databases. A first-order approximation to the inner product is obtained from the Cauchy-Schwarz inequality. We extend this idea to higher-order power symmetric functions of the multidimensional points. We show how to compute fixed coefficients that serve as universal weights based on the moments of the probability density function of the data set. We also develop a dynamic model to compute the universal coefficients for data sets whose distribution is not known. Our experiments on synthetic and real data sets show that the similarity between two objects in high-dimensional space can be accurately approximated by a significantly lower-dimensional representation.
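The first-order idea in the abstract above can be illustrated with a small sketch. The helper names `power_sums` and `cauchy_schwarz_bound` are hypothetical. The sketch reduces a vector to its power symmetric functions $p_k(x) = \sum_i x_i^k$ and uses $p_2$ alone to bound the inner product, $|\langle x, y\rangle| \le \sqrt{p_2(x)\,p_2(y)}$. The paper's moment-based universal weights for the higher-order terms are not given in the abstract, so they are not reproduced here.

```python
import numpy as np

def power_sums(x, max_order=3):
    """Power symmetric functions p_k(x) = sum_i x_i**k for k = 1..max_order."""
    x = np.asarray(x, dtype=float)
    return np.array([np.sum(x ** k) for k in range(1, max_order + 1)])

def cauchy_schwarz_bound(x, y):
    """First-order bound |<x, y>| <= sqrt(p_2(x) * p_2(y)) = ||x|| * ||y||,
    computed from the reduced power-sum signatures alone."""
    return float(np.sqrt(power_sums(x, 2)[1] * power_sums(y, 2)[1]))

rng = np.random.default_rng(0)
x, y = rng.random(1000), rng.random(1000)
signature_x = power_sums(x)          # 1000 coordinates reduced to 3 numbers
exact = float(np.dot(x, y))
bound = cauchy_schwarz_bound(x, y)
print(signature_x.shape, exact <= bound)
```

The bound alone is loose for random data, which is why the paper adds weighted higher-order terms on top of it.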

2.
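Similarity measurement is an important foundation of cluster analysis, and effectively measuring the similarity between categorical symbols is one of its difficulties. In this paper, the similarity between symbols is measured from the kernel probability density of discrete symbols. Unlike traditional simple symbol matching and symbol-frequency estimation methods, this similarity measure, governed by the kernel bandwidth, no longer relies on the assumption that symbols of the same attribute are independent. A Bayesian clustering model for categorical data is then built, a likelihood-based object-to-cluster similarity measure for categorical data is defined, and a model-based clustering algorithm is given. Using leave-one-out estimation and maximum likelihood estimation, three methods are proposed to dynamically determine the optimal kernel bandwidth during clustering. Experiments show that, compared with clustering algorithms that use feature weighting or the simple matching distance, the proposed algorithm achieves higher clustering accuracy, and the estimated kernel bandwidths are of practical significance in applications such as identifying important features.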
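A kernel estimate of a categorical symbol's probability can be sketched as follows. The abstract does not name its discrete kernel, so this sketch assumes the Aitchison-Aitken kernel, $K(s, x; \lambda) = 1 - \lambda$ if $s = x$ and $\lambda/(c-1)$ otherwise, with $\lambda \in [0, (c-1)/c]$. The function name `kernel_symbol_prob` is hypothetical. With $\lambda = 0$ the estimate reduces to the plain symbol frequency, and larger bandwidths shift mass toward unseen symbols.

```python
from collections import Counter

def kernel_symbol_prob(symbol, sample, categories, bandwidth):
    """Kernel-smoothed probability of `symbol` given a sample of one attribute.
    Assumes the Aitchison-Aitken kernel (the paper's kernel may differ):
    K(s, x) = 1 - bandwidth if s == x, else bandwidth / (c - 1)."""
    c = len(categories)
    n = len(sample)
    same = Counter(sample)[symbol]
    return ((1 - bandwidth) * same + bandwidth / (c - 1) * (n - same)) / n

cats = ["a", "b", "c"]
data = ["a", "a", "b", "a"]
probs = {s: kernel_symbol_prob(s, data, cats, 0.3) for s in cats}
print(probs)  # "c" never occurs, yet it receives non-zero probability
```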

3.
Mining and visualization of time-profiled temporal associations is an important research problem that has not been addressed from a wider perspective and remains understudied. Visual analysis of time-profiled temporal associations helps to better understand hidden seasonal, emerging, and diminishing temporal trends. The pioneering work by Yoo and Shashi Shekhar, termed SPAMINE, applied the Euclidean distance measure, and subsequent studies were restricted to the Euclidean distance as well. However, as the number of time slots increases, the dimensionality of the prevalence time sequence of a temporal association also increases, and this high dimensionality makes the Euclidean distance unsuitable. Some of our previous studies proposed Gaussian-based dissimilarity measures and prevalence estimation approaches to discover time-profiled temporal associations. To the best of our knowledge, no research has addressed a similarity measure that is based on the standard score and normal probability, finds the similarity between temporal patterns in z-space, and retains monotonicity. Our research is pioneering work in this direction and makes three contributions. First, we introduce a novel similarity (or dissimilarity) measure, SRIHASS, to find the similarity between temporal associations. The basic idea behind the dissimilarity measure is to transform the support values of temporal associations into z-space and then obtain probability sequences of temporal associations using a normal distribution table. The dissimilarity measure uses these probability sequences to estimate the similarity between patterns in z-space. The second contribution is a prevalence bound estimation approach. Finally, we give an algorithm for time-profiled association mining, called Z-SPAMINE, that is primarily inspired by SPAMINE.
Experimental results show that Z-SPAMINE is computationally more efficient and scalable than existing approaches that apply the Euclidean distance, such as Naïve, Sequential and SPAMINE.
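The z-space transformation described above (support value to standard score to normal probability) can be sketched as below. The SRIHASS formula itself is not given in the abstract, so the final comparison step here is a hypothetical stand-in: a root-mean-square gap between the two probability sequences. The names `probability_sequence` and `z_space_dissimilarity`, the reference sequence, and the choice of its mean and population standard deviation for standardization are all assumptions.

```python
import math
import statistics

def normal_cdf(z):
    """Standard normal CDF (the value a normal distribution table gives)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_sequence(supports, mean, std):
    """Standardize support values into z-space, then map each z to Phi(z)."""
    return [normal_cdf((s - mean) / std) for s in supports]

def z_space_dissimilarity(supports_a, supports_b, mean, std):
    """Hypothetical dissimilarity (NOT the paper's SRIHASS): root-mean-square
    gap between the two probability sequences, so the result lies in [0, 1]."""
    pa = probability_sequence(supports_a, mean, std)
    pb = probability_sequence(supports_b, mean, std)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa))

reference = [0.20, 0.35, 0.50, 0.40]   # hypothetical reference support sequence
mu, sigma = statistics.mean(reference), statistics.pstdev(reference)
candidate = [0.25, 0.30, 0.55, 0.45]   # supports of a temporal association per time slot
print(z_space_dissimilarity(reference, candidate, mu, sigma))
```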

4.
Clustering of stationary time series has become an important tool in many scientific applications, such as medicine and finance. Time series clustering methods are based on the calculation of suitable similarity measures that quantify the distance between two or more time series. These measures are computed either in the time domain or in the spectral domain. Since the computation of time-domain measures is rather cumbersome, we resort to spectral-domain methods. A new distance measure is proposed, based on the so-called cepstral coefficients, which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as input to a clustering method to produce disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.
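A cepstral distance of the kind described above can be sketched with NumPy. Cepstral coefficients are the inverse Fourier transform of the log spectrum. This sketch assumes the raw periodogram as the spectral estimate instead of the paper's semiparametric model, and the names `cepstral_coefficients` and `cepstral_distance` are hypothetical. For an AR(1) process with coefficient $\phi$, the theoretical cepstrum is $c_k = \phi^k / k$, so it should sit far from white noise, whose cepstrum beyond $c_0$ is zero.

```python
import numpy as np

def cepstral_coefficients(x, n_coef=10):
    """Leading cepstral coefficients: inverse FFT of the log periodogram.
    Assumption: the raw periodogram stands in for the paper's semiparametric
    estimate of the log spectrum."""
    x = np.asarray(x, dtype=float)
    periodogram = np.abs(np.fft.fft(x)) ** 2 / len(x)
    log_spectrum = np.log(periodogram + 1e-12)   # offset guards against log(0)
    return np.real(np.fft.ifft(log_spectrum))[:n_coef]

def cepstral_distance(x, y, n_coef=10):
    """Euclidean distance between the leading cepstral coefficients."""
    diff = cepstral_coefficients(x, n_coef) - cepstral_coefficients(y, n_coef)
    return float(np.linalg.norm(diff))

rng = np.random.default_rng(1)
white = rng.standard_normal(512)
white2 = rng.standard_normal(512)
ar = np.zeros(512)
for t in range(1, 512):                          # AR(1) process with phi = 0.9
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
d_white = cepstral_distance(white, white2)
d_ar = cepstral_distance(white, ar)
print(d_white, d_ar)
```

The resulting pairwise distances could then be passed to any clustering routine that accepts a distance matrix.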

5.
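Similarity Measures of Vague Sets Based on Inclusion Degree   Total citations: 9 (self: 0, others: 9)
In fuzzy pattern recognition, the principle of maximum similarity is often used to determine which pattern a sample under test belongs to. Because the existing similarity measure formulas for vague sets are all based on distance measures, vague sets cannot be distinguished whenever their distance measures are equal, so it is necessary to find other ways of computing similarity. First, the concept of inclusion degree on fuzzy sets is extended to vague sets, and it is shown that similarity measures of vague sets can be induced from inclusion degrees; a group of new formulas for computing vague-set similarity is then given. Numerical examples show that they are effective. Finally, they are compared with existing methods, and each is found to have its own strengths.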
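Inducing a similarity from an inclusion degree can be sketched as follows. The abstract does not give the paper's formulas, so both definitions below are hypothetical. A vague set is a list of $(t, f)$ pairs, with truth-membership $t$ and false-membership $f$ per element. The inclusion degree `inclusion_degree(A, B)` equals 1 exactly when $t_A \le t_B$ and $f_A \ge f_B$ everywhere, and the induced similarity is the smaller of the two directed inclusion degrees.

```python
def inclusion_degree(a, b):
    """Hypothetical inclusion degree I(A, B) of vague set A in vague set B.
    Each set is a list of (t, f) pairs; I(A, B) = 1 exactly when
    t_A <= t_B and f_A >= f_B for every element (A is a subset of B)."""
    violations = [max(0.0, ta - tb, fb - fa) for (ta, fa), (tb, fb) in zip(a, b)]
    return 1.0 - sum(violations) / len(violations)

def inclusion_similarity(a, b):
    """Similarity induced from inclusion: A and B are similar when each is
    (nearly) included in the other."""
    return min(inclusion_degree(a, b), inclusion_degree(b, a))

A = [(0.6, 0.2), (0.3, 0.5)]
B = [(0.7, 0.1), (0.3, 0.4)]
print(inclusion_degree(A, B), inclusion_degree(B, A), inclusion_similarity(A, B))
```

Here A is fully included in B, while B is only partly included in A, so the similarity is set by the weaker direction.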

6.
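Relationships among Fuzzy Entropy, Similarity Measure and Distance Measure of Vague Sets   Total citations: 4 (self: 0, others: 4)
Wang Chang, Computer Science (计算机科学), 2010, 37(10): 221-224, 274
The wide application of vague set theory in many fields has attracted the attention of more and more scholars, and fuzzy entropy, similarity measures and distance measures are three key techniques in it. Many methods for computing the fuzzy entropy, similarity measure and distance measure of vague sets have been proposed, but none of these studies discusses the connections among these three basic concepts. Based on the axiomatic definitions of fuzzy entropy, similarity measure and distance measure of vague sets, this paper gives the mutual induction relationships among the three and establishes the connections between them.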
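The kind of mutual induction described above can be illustrated with two common constructions: similarity as one minus a normalized distance, and entropy as the similarity between a set and its complement. Both are assumptions for illustration and are not necessarily the paper's definitions. Vague sets are given as lists of $(t, f)$ pairs, and the complement swaps $t$ and $f$. The function names are hypothetical.

```python
def hamming_distance(a, b):
    """Normalized Hamming distance between vague sets given as (t, f) pairs."""
    total = sum(abs(ta - tb) + abs(fa - fb) for (ta, fa), (tb, fb) in zip(a, b))
    return total / (2 * len(a))

def similarity_from_distance(a, b):
    """Similarity induced from a distance: S(A, B) = 1 - d(A, B)."""
    return 1.0 - hamming_distance(a, b)

def complement(a):
    """Complement of a vague set swaps truth and false membership."""
    return [(f, t) for t, f in a]

def entropy_from_similarity(a):
    """Fuzzy entropy induced from similarity: E(A) = S(A, complement(A))."""
    return similarity_from_distance(a, complement(a))

crisp = [(1.0, 0.0), (0.0, 1.0)]   # no fuzziness at all
vague = [(0.5, 0.5), (0.0, 0.0)]   # maximally undecided elements
print(entropy_from_similarity(crisp), entropy_from_similarity(vague))  # 0.0 1.0
```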

7.
Indexing Multidimensional Time-Series   Total citations: 1 (self: 0, others: 1)
While most time series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for an index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of similar trajectories. Trajectory datasets are very common in environmental applications, mobility experiments, and video surveillance and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based on the longest common subsequence (LCSS) model, which offers enhanced robustness, particularly for noisy data, which are encountered very often in real-world applications. However, our index is able to accommodate other distance measures as well, including the ubiquitous Euclidean distance and the increasingly popular dynamic time warping (DTW). While other researchers have advocated one or another of these similarity measures, a major contribution of our work is the ability to support all of them without the need to restructure the index. Our framework guarantees no false dismissals and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall. The experimental results demonstrate that our index can help speed up the computation of expensive similarity measures such as LCSS and DTW. Edited by B. Ooi
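The LCSS model mentioned above can be sketched with the textbook dynamic program for 2-D trajectories. Two points match when both coordinates differ by less than `epsilon` and their time indices differ by at most `delta`. The parameter values and the normalization by the shorter length are illustrative choices, and the index structure itself is not shown. The usage example shows the robustness to noise: a single gross outlier is simply skipped instead of dominating the score.

```python
def lcss(a, b, epsilon, delta):
    """Length of the longest common subsequence of two 2-D trajectories.
    Points match if both coordinates differ by < epsilon and their time
    indices differ by <= delta."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) < epsilon and abs(ay - by) < epsilon and abs(i - j) <= delta:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def lcss_similarity(a, b, epsilon=0.5, delta=2):
    """LCSS length normalized by the shorter trajectory, in [0, 1]."""
    return lcss(a, b, epsilon, delta) / min(len(a), len(b))

traj = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
noisy = [(0, 0.1), (1, 1.1), (50, 50), (3, 2.9), (4, 4.2)]   # one gross outlier
print(lcss_similarity(traj, noisy))  # 0.8
```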

8.
Recently, hesitant fuzzy linguistic term sets (HFLTSs) have been widely used to address cognitively complex linguistic information because of their advantage in representing vagueness and hesitation in qualitative decision-making processes. Information measures, including distance, similarity, entropy, inclusion and correlation measures, are used to characterize the relationships between linguistic elements, and many decision-making theories are based on them. Up to now, scholars have proposed distance, similarity, entropy and correlation measures, but no paper has focused on inclusion measures. This paper aims to fill this gap by proposing an inclusion measure between HFLTSs. We discuss the relationships among the distance, similarity, inclusion and entropy measures of HFLTSs. Clustering is an important application of information measures, yet few papers address clustering algorithms based on information measures in the HFLTS environment; we therefore propose two clustering algorithms, based on the correlation measure and the distance measure respectively. Finally, a case study on water resource carrying capacity illustrates the applicability of the proposed clustering algorithms.

9.
Fuzzy grey relational analysis for software effort estimation   Total citations: 1 (self: 1, others: 0)
Accurate and credible software effort estimation is a challenge for both academic research and the software industry. Among the many software effort estimation models in existence, Estimation by Analogy (EA) is still one of the techniques preferred by software engineering practitioners because it mimics the human problem-solving approach. The accuracy of such a model depends on the characteristics of the dataset, which is subject to considerable uncertainty. The inherent uncertainty in software attribute measurement has a significant impact on estimation accuracy because these attributes are measured based on human judgment and are often vague and imprecise. To overcome this challenge, we propose a new formal EA model based on the integration of fuzzy set theory with Grey Relational Analysis (GRA). Fuzzy set theory is employed to reduce uncertainty in the distance measure between two tuples at the kth continuous feature, $|x_o(k) - x_i(k)|$. GRA is a problem-solving method that is used to assess the similarity between two tuples with M features. Since some of these features are not necessarily continuous and may have a nominal or ordinal scale type, aggregating different forms of similarity measures would increase uncertainty in the similarity degree. GRA is therefore mainly used to reduce uncertainty in the distance measure between two software projects for both continuous and categorical features. Both techniques are suitable when the relationship between effort and other effort drivers is complex. Experimental results showed that integrating GRA with fuzzy logic (FL) produced credible estimates compared with the results obtained using Case-Based Reasoning, Multiple Linear Regression and Artificial Neural Networks methods.
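The grey relational step above can be sketched with the standard GRA coefficient $\xi_i(k) = (\Delta_{\min} + \rho\Delta_{\max}) / (\Delta_i(k) + \rho\Delta_{\max})$, where $\Delta_i(k) = |x_o(k) - x_i(k)|$, averaged over the M features to give each project's grade. This sketch is crisp GRA only: the paper's fuzzy treatment of the distance term is not reproduced. It assumes features already normalized to [0, 1] and the common choice $\rho = 0.5$. The past-effort figures are made up for illustration.

```python
import numpy as np

def grey_relational_grades(target, candidates, rho=0.5):
    """Grey relational grade of each candidate project with respect to the target.
    Assumes features already normalized to [0, 1]; rho is the usual
    distinguishing coefficient. Crisp GRA only (no fuzzy step)."""
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    delta = np.abs(candidates - target)              # |x_o(k) - x_i(k)|
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:                                   # every candidate equals the target
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                        # average over the M features

target = [0.5, 0.2, 0.8]
projects = [[0.5, 0.2, 0.8], [0.1, 0.9, 0.3], [0.4, 0.3, 0.7]]
efforts = [120.0, 400.0, 150.0]                      # hypothetical past efforts
grades = grey_relational_grades(target, projects)
estimate = efforts[int(grades.argmax())]             # analogy: reuse the closest project's effort
print(grades.round(3), estimate)
```

In a full EA model the estimate would typically aggregate the efforts of the top-ranked projects rather than copy a single one.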

10.
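To address the uncertainty of cluster centers and the computational complexity of bag-of-words (BOW) retrieval based on local image features, this paper proposes two retrieval methods: one that measures similarity with different kinds of distances, and one that retrieves by the number of matched points. The approach first improves the SURF features of document images, effectively reducing the complexity of feature extraction. Next, FLANN bidirectional matching and KD-Tree+BBF matching are applied to FAST+SURF features, and the robustness of the features is verified under different transformation conditions. Finally, both retrieval methods are used to search a curated database of various Uyghur document images. Experimental results show that distance-based similarity measurement has lower complexity than match-count-based retrieval, and that both retrieval strategies meet the requirements for fast and accurate search.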
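Bidirectional matching with a match-count score can be sketched as below. Brute-force Euclidean nearest-neighbour search stands in for FLANN here, and random 64-dimensional vectors stand in for SURF descriptors. The function name `mutual_matches` and the synthetic data are hypothetical. A match is kept only if each descriptor is the other's nearest neighbour, and the number of surviving matches serves as the retrieval score.

```python
import numpy as np

def mutual_matches(desc_a, desc_b):
    """Bidirectional (cross-checked) nearest-neighbour matching.
    Brute-force Euclidean search stands in for FLANN here."""
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = dist.argmin(axis=1)     # best match in b for each row of a
    b_to_a = dist.argmin(axis=0)     # best match in a for each row of b
    return [(i, int(j)) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

rng = np.random.default_rng(0)
stored_doc = rng.random((50, 64))                     # descriptors of a stored image
query = stored_doc[:30] + rng.normal(0.0, 0.01, (30, 64))
unrelated_doc = rng.random((50, 64))
score_same = len(mutual_matches(query, stored_doc))
score_other = len(mutual_matches(query, unrelated_doc))
print(score_same, score_other)
```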

11.
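A Hamming-Distance-Based Similarity Measure for Intuitionistic Fuzzy Rough Sets   Total citations: 1 (self: 0, others: 1)
To address the problem of measuring similarity between intuitionistic fuzzy rough sets, a similarity measure based on the Hamming distance is proposed. First, a similarity measure between two intuitionistic fuzzy rough values is given, and several of its important properties are established. Then, on this basis, a Hamming-distance-based similarity measure for intuitionistic fuzzy rough sets is proposed and proved to have the same properties. Finally, a numerical example verifies the effectiveness of the method.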

12.
Maximum-likelihood image matching   Total citations: 8 (self: 0, others: 8)
Image-matching applications, such as tracking and stereo, commonly use the sum-of-squared-difference (SSD) measure to determine the best match. However, this measure is sensitive to outliers and is not robust to template variations. Alternative measures that are more robust to these issues have also been proposed. We improve upon these using a probabilistic formulation of image matching in terms of maximum-likelihood estimation that can be used for both edge template matching and gray-level image matching. This formulation generalizes previous edge-matching methods based on distance transforms. We apply the techniques to stereo matching and feature tracking. Uncertainty estimation techniques allow feature selection to be performed by choosing features that minimize the localization uncertainty.
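The outlier sensitivity of SSD described above is easy to demonstrate. The sketch below performs exhaustive template matching with either plain SSD or a truncated squared difference, which caps each pixel's contribution. The truncated measure is a simple robust stand-in chosen for illustration, not the paper's maximum-likelihood measure. The function name `match_template` and the `robust_cap` parameter are hypothetical.

```python
import numpy as np

def match_template(image, template, robust_cap=None):
    """Exhaustive template matching; returns the (row, col) of the best match.
    robust_cap=None gives plain SSD; otherwise each squared difference is
    truncated at robust_cap so a few outlier pixels cannot dominate
    (a simple robust stand-in, not the paper's maximum-likelihood measure)."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    best_score, best_pos = np.inf, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            sq = (image[r:r + th, c:c + tw] - template) ** 2
            if robust_cap is not None:
                sq = np.minimum(sq, robust_cap)
            score = sq.sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

rng = np.random.default_rng(0)
image = rng.random((20, 20))
image[5, 7] = 0.0                        # known value under the corrupted pixel
template = image[5:10, 7:12].copy()
template[0, 0] = 100.0                   # a single outlier pixel in the template
ssd_pos = match_template(image, template)
robust_pos = match_template(image, template, robust_cap=1.0)
print(ssd_pos, robust_pos)
```

With plain SSD the one outlier pixel outweighs the other 24 pixels, so the best score lands at a wrong position; the truncated measure recovers the true location (5, 7).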

13.
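Image Retrieval Based on Spatial Features   Total citations: 2 (self: 1, others: 1)
Shi Tingting, Li Yan, Journal of Computer Applications (计算机应用), 2008, 28(9): 2292-2296
A new spatial-feature-based image descriptor, SCH, is proposed. The MCVAE algorithm, based on the color vector angle and the Euclidean distance, is used to detect edges in the original color image, and a new "max-min component color invariant model" is used to quantize the original image. An edge correlation matrix is built for edge pixels, while a color histogram describes the local color distribution of non-edge pixels. A new sine-based similarity rule is then used to measure the similarity between image features. For the experiments, a content-based image retrieval prototype system, "SttImageRetrieval", was developed in VC++ 6.0, and a comprehensive image database, "IMAGEDB", was built on Oracle 9i. Experimental analysis shows that retrieval with the SCH descriptor is markedly more accurate than retrieval based only on color statistics.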

14.
The cosine similarity measure is often applied after discriminant analysis in pattern recognition. This paper first analyzes why the cosine similarity is preferred by establishing the connection between the cosine-similarity-based decision rule in the discriminant analysis framework and the Bayes decision rule for minimum error. The paper then investigates the challenges inherent in the cosine similarity and presents a new similarity measure that overcomes them. The contributions of the paper are thus threefold. First, the application of the cosine similarity after discriminant analysis is shown to have its theoretical roots in the Bayes decision rule. Second, some inherent problems of the cosine similarity, such as its inadequacy in addressing distance and angular measures, are discussed. Finally, a new similarity measure, which overcomes these problems by integrating the absolute value of the angular measure and the $l_p$ norm (the distance measure), is presented to enhance pattern recognition performance. The effectiveness of the proposed measure in the discriminant analysis framework is evaluated on a large-scale grand challenge problem, the Face Recognition Grand Challenge (FRGC). Experimental results using 36,818 FRGC images on the most challenging FRGC experiment, FRGC Experiment 4, show that the new similarity measure improves face recognition performance over other popular similarity measures, such as the cosine similarity measure, the normalized correlation, and the Euclidean distance measure.
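The three baseline measures named above can be written in a few lines of NumPy, which also makes the cosine similarity's blind spot concrete. Here "normalized correlation" is taken to mean the cosine of the mean-centred vectors (Pearson correlation); that definition is an assumption, since the abstract does not define it. The paper's new measure is not reproduced.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; ignores vector length."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def normalized_correlation(x, y):
    """Assumed definition: cosine of the mean-centred vectors (Pearson correlation)."""
    return cosine_similarity(x - x.mean(), y - y.mean())

def euclidean_distance(x, y):
    """Plain l2 distance; sensitive to both direction and length."""
    return float(np.linalg.norm(x - y))

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x   # same direction, twice the length
print(cosine_similarity(x, y), euclidean_distance(x, y))
```

The cosine similarity reports the two vectors as identical, while the Euclidean distance registers the length difference, illustrating the angle-versus-distance trade-off the paper sets out to reconcile.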

15.
In this paper, we present new definitions of distance and similarity measures between intuitionistic fuzzy sets (IFSs) that incorporate the hesitation degree. First, we discuss the limitations of traditional distance and similarity measures, which are caused by neglecting the influence of the hesitation degree. A vector-valued similarity measure with two components, indicating similarity and hesitation aspects, has been proposed, but it still does not perform well in practical applications because the hesitation component takes effect only when the values of the similarity measures are equal. To overcome these limitations, we propose new definitions of hesitation, distance and similarity measures, and study several theorems that satisfy the requirements of the proposed definitions. Meanwhile, we investigate the relationships among hesitation, distance, similarity and entropy of IFSs to verify the consistency of our work with previous research. Finally, we analyse the advantages and disadvantages of the proposed similarity measure in detail, apply the proposed measures (dH and SH) to pattern recognition problems, and demonstrate that they outperform state-of-the-art distance and similarity measures.
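The effect of including the hesitation degree $\pi = 1 - \mu - \nu$ can be shown with the well-known three-term normalized Hamming distance for IFSs (the Szmidt-Kacprzyk form). This is a standard measure used here for illustration, not the paper's dH. The function name `ifs_hamming` and the example sets are hypothetical. Without $\pi$, both candidate sets below are equally far from A. With $\pi$ included, the set whose hesitation matches A's comes out closer.

```python
def ifs_hamming(a, b, use_hesitation=True):
    """Normalized Hamming distance between IFSs given as (mu, nu) pairs.
    With use_hesitation=True the hesitation degree pi = 1 - mu - nu is included
    (the Szmidt-Kacprzyk form); this is not the paper's dH."""
    total = 0.0
    for (ma, na), (mb, nb) in zip(a, b):
        total += abs(ma - mb) + abs(na - nb)
        if use_hesitation:
            total += abs((1 - ma - na) - (1 - mb - nb))
    return total / (2 * len(a))

A = [(0.4, 0.4)]   # pi = 0.2
B = [(0.5, 0.3)]   # pi = 0.2
C = [(0.5, 0.5)]   # pi = 0.0
print(ifs_hamming(A, B, False), ifs_hamming(A, C, False))  # tie without hesitation
print(ifs_hamming(A, B), ifs_hamming(A, C))                # B is now closer
```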

16.
Typical hesitant fuzzy sets (THFSs), whose membership degrees are finite sets called typical hesitant fuzzy elements (THFEs), are a special kind of hesitant fuzzy set. The fuzzy inclusion relationship, as the order structure in fuzzy mathematics, plays an elementary role in the theoretical research and practical applications of fuzzy sets. In this paper, a new partial order for THFEs is defined via the disjunctive semantic meaning of a set, and on this basis a fuzzy inclusion relationship is defined for THFSs. Furthermore, inclusion measures are defined to give a quantitative ranking of any two THFEs or THFSs, and different inclusion measures are constructed. The related similarity measure, distance and fuzzy entropy of THFSs are presented, and their relationships with inclusion measures are investigated. Finally, an example shows that the inclusion measure can be applied effectively in hesitant fuzzy multi-attribute decision making.

17.
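Color Image Display and Retrieval Based on Color and Shape Features   Total citations: 9 (self: 0, others: 9)
An effective technical approach for extracting features from, indexing, and retrieving color images is proposed. Color invariants are extracted from the image, and a corresponding hue histogram is built to represent the image's color distribution. To describe the position and orientation of objects in the image, the hue contour of the image is first computed and Radon-transformed, and the corresponding "spatial histogram" is then computed. This yields a new image representation based on the image's color distribution and shape features. To measure global image similarity, two image distance measures are defined, based on the "cumulative distance" and the "vector-based distance" respectively, and key CBIR issues such as the selection and normalization of distance measures and the combination of different sub-features are discussed. Experimental results show that the proposed method achieves satisfactory retrieval performance, with results that closely match human visual perception.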
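A hue histogram and the two kinds of histogram distance mentioned above can be sketched as follows. The abstract does not define either distance, so this sketch assumes the "cumulative distance" is the L1 distance between cumulative histograms and the "vector-based distance" is the plain L1 distance between histogram vectors. Function names and bin count are hypothetical. The example shows why a cumulative distance is useful: it ranks a nearby hue (orange) closer to red than a distant hue (blue), whereas the bin-by-bin distance cannot tell them apart.

```python
import colorsys
import numpy as np

def hue_histogram(pixels, bins=16):
    """Normalized hue histogram of RGB pixels given as floats in [0, 1]."""
    hues = [colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in pixels]
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def cumulative_distance(h1, h2):
    """Assumed 'cumulative distance': L1 distance between cumulative histograms."""
    return float(np.abs(np.cumsum(h1) - np.cumsum(h2)).sum())

def vector_distance(h1, h2):
    """Assumed 'vector-based distance': L1 distance between histogram vectors."""
    return float(np.abs(h1 - h2).sum())

red = hue_histogram([(1.0, 0.0, 0.0)] * 10)
orange = hue_histogram([(1.0, 0.5, 0.0)] * 10)
blue = hue_histogram([(0.0, 0.0, 1.0)] * 10)
print(vector_distance(red, orange), vector_distance(red, blue))          # 2.0 2.0
print(cumulative_distance(red, orange), cumulative_distance(red, blue))  # 1.0 10.0
```

This simple cumulative form ignores the circular nature of hue, so red and magenta would be treated as far apart.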

18.
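Li Hailin, Liang Ye, Control and Decision (控制与决策), 2017, 32(3): 451-458
The traditional symbolic aggregate approximation (SAX) method tends to ignore the local shape features of time series during feature representation, while dynamic time warping (DTW) has advantages as a distance measure. Building on both observations, this paper proposes a time-series similarity measure based on numerical symbols and shape features. After time series are represented by symbolic and shape features, a distance measure combining DTW with the symbolic distance is proposed, so that the method better reflects both the numerical distribution and the shape features of time-series data. Experimental results show that the proposed method achieves good classification results in time-series data mining and has certain advantages.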
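The DTW component mentioned above can be sketched with the classic dynamic program using an absolute-difference cost. This shows DTW alone on raw values; the paper's combination of DTW with symbolic (SAX) and shape features is not reproduced. The usage example shows that DTW aligns two sequences of different lengths that share the same shape up to a time shift.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or diagonal match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, delayed by one step
print(dtw_distance(a, b))       # 0.0
```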

19.
20.
Many existing intuitionistic fuzzy (IF) decision methods focus on a reasonable ranking of alternatives under unknown weight information. Traditionally, the weights are determined from a multiobjective optimization model based on real-valued measures such as IF distance or similarity measures, which may lose divergence information. In this paper, we propose a new type of optimization model for determining the weights based on a fuzzy measure called the similarity–divergence measure (S–D measure). First, we develop similarity and divergence measures of IF sets, and define an S–D measure as a 2-tuple consisting of similarity and divergence. This measure is further proven to be an IF similarity degree and carries the practical semantics of similarity and divergence in human cognition. Second, we use this measure to calculate fuzzy similarities of each alternative and construct a nonlinear optimization model to determine the weights. Third, we design an algorithm for solving the model with the aid of particle swarm optimization and thus develop an IF decision method. Finally, two examples demonstrate the method, and it is compared with existing methods to show its effectiveness and superiority.
