Similar literature
 Found 20 similar documents; search took 0 ms
1.
This paper examines multilingual audio Query-by-Example (QbE) retrieval using the posteriorgram-based Phonetic Unit Modelling (PUM) approach and the Weighted Fast Sequential Dynamic Time Warping (WFSDTW) algorithm. The PUM approach employs phone recognizers trained on language-specific external resources in a supervised way, so information about the phonetic distribution is embedded in the acoustic modelling process. The resulting acoustic models were also used for language-independent QbE retrieval. The improved WFSDTW algorithm was implemented to perform retrieval for each query (keyword) within a particular utterance file. The main interest is in measuring the retrieval performance of the proposed WFSDTW solution, which employs posteriorgram-based keyword matching with Gaussian mixture modelling (GMM). Score normalization and fusion of four different language-dependent sub-systems were carried out using a simple max-score merging strategy. The results show that the proposed WFSDTW solution has an advantage over the two other evaluated techniques, namely the basic DTW and segmental DTW algorithms. The combination of multiple PUM techniques with WFSDTW also proved to be an effective solution for the QbE task.
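
As a rough illustration of the basic DTW baseline named in this abstract, the sketch below aligns a query posteriorgram with an utterance posteriorgram. It is not the WFSDTW algorithm. The frame distance (negative log of the inner product), the length normalisation, and the function names `frame_distance` and `dtw_cost` are assumptions made for this example.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Negative log inner product of two posterior vectors (an assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def dtw_cost(query, utterance):
    """Length-normalised cost of the best monotonic alignment of both sequences."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    # acc[i][j] = minimum accumulated cost of aligning query[:i] with utterance[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], utterance[j - 1])
            # Allowed steps: advance the query, the utterance, or both.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)
```

A lower cost means a closer match, so a keyword detector built on this sketch would rank utterances by ascending `dtw_cost`. Segmental and sequential variants additionally let the query match a sub-span of the utterance, which this whole-sequence version does not do.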

2.
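In recent years, content-based music retrieval has received growing attention and many retrieval methods have been proposed. Most of them concentrate on accurately characterizing one or two features of the music in order to reflect one or two of its salient properties. This paper takes an entirely different approach: it represents music by a feature matrix extracted from the spectrogram, so that the similarity between a query clip and a candidate piece in the database becomes the similarity between two feature matrices. Experimental results show that the method is simple in both procedure and computation and achieves good retrieval results.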

3.
This paper presents a multi-level matching method for document retrieval (DR) using a hybrid document similarity. Documents are represented by a multi-level structure comprising a document level and a paragraph level. This multi-level representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms can. The matching between documents is then cast as an optimization problem using the Earth Mover’s Distance (EMD). A hybrid similarity synthesizes the global and local semantics of documents to improve retrieval accuracy. Extensive experiments were performed to verify the method. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.

4.
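To address the inexactness of query-by-humming music retrieval, this paper proposes a new melody representation and, based on that representation, a two-pass matching algorithm that outputs the pieces with the highest similarity. Experimental results show that the method improves the accuracy of query-by-humming music retrieval.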

5.
《微型机与应用》2017,(5):38-41
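Building a suitable audio index is one of the effective ways to achieve fast retrieval over large-scale audio databases, and audio segmentation and annotation are the foundation of such an index. This paper adopts a two-step audio segmentation algorithm based on short-time energy and an improved distance metric, so that the resulting audio segments show large feature differences between segments and small feature variance within each segment. On the basis of this segmentation, the audio streams in the database are annotated: audio class and audio content are annotated using the BP neural network algorithm and the Philips audio fingerprinting algorithm, respectively, in preparation for building the audio index table. Experimental results show that the two-step segmentation algorithm segments arbitrary audio streams well, that the annotation algorithms are effective for class-based and content-based annotation, and that the algorithms are robust.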

6.
In this thesis, we present a novel audio digital watermarking method based on counter-propagation neural networks (CPN). After applying the discrete wavelet transform to the audio, we select the important coefficients to be used for training the neural networks. Exploiting the memorization and fault-tolerance capabilities of the CPN, the watermark is memorized in the neurons of the CPN. In addition, we adopt an architecture with an adaptive number of parallel CPNs to handle each audio frame and its corresponding watermark bit. Compared with other traditional CPN-based methods, this largely improves the efficiency of watermark embedding and the correctness of extraction, that is, the speed of the whole algorithm. Extensive experimental results show that the watermark can be detected exactly under most attacks. The method effectively balances the robustness and inaudibility of the audio digital watermark.

7.
Latent topic models such as Latent Dirichlet Allocation (LDA) were designed for text processing and have also proved successful in audio-related processing. LDA assumes that the words of each document arise from a mixture of topics, each of which is a multinomial distribution over the vocabulary. When the original LDA is applied to continuous data, word-like units must first be generated by vector quantization (VQ). This discretization usually results in information loss. To overcome this shortcoming, this paper introduces a new topic model named Gaussian-LDA for audio retrieval. The proposed model uses a continuous emission probability, a Gaussian instead of a multinomial distribution. It skips vector quantization and directly models each topic as a Gaussian distribution over audio features, which avoids discretization and integrates the clustering procedure into the model. Audio retrieval experiments demonstrate that Gaussian-LDA achieves better performance than the compared methods.

8.
The content‐based classification and retrieval of real‐world audio clips is one of the challenging tasks in multimedia information retrieval. Although the problem has been well studied in the last two decades, most current retrieval systems cannot provide flexible querying of audio clips because audio information in the real world comes in mixed‐type form (e.g., speech over music and speech over environmental sound). We present here a complete, scalable, and extensible content‐based classification and retrieval system for mixed‐type audio clips. The system lets users query audio data semantically in four alternative ways, namely, querying by mixed‐type audio classes, querying by domain‐based fuzzy classes, querying by temporal information and temporal relationships, and querying by example (QBE). In order to reduce the retrieval time, a hash‐based indexing technique is introduced. Two kinds of experiments were conducted on the audio tracks of the TRECVID news broadcasts to evaluate the performance of the proposed system. The results demonstrate that the Audio Spectrum Flatness feature of the MPEG‐7 standard performs better on music audio samples than on other kinds of audio samples and that the system is robust under different conditions. © 2011 Wiley Periodicals, Inc.

9.
李应 《智能系统学报》2008,3(3):259-264
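Based on the characteristics of multimedia audio data, a local search data structure suited to fast audio data retrieval is proposed, namely the local search tree (LS-tree). In the local search tree, the zero-crossing rate and the mean amplitude of the wavelet transform coefficients of the audio data serve as the primary and secondary keys, respectively, and the other coefficients used as the index are organized within local ranges. Next, based on the local search tree, a wavelet packet best basis and wavelet pyramid algorithm is proposed for audio data retrieval. Finally, search with the wavelet packet best basis and wavelet pyramid algorithm using the local search tree is compared with retrieval methods based on wavelet coefficients at different levels; the results show that the proposed method is fast and effective for audio data retrieval.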

10.
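Hashing is regarded as the most promising similarity search technique and can be applied to large-scale multimedia data search. To address the low efficiency of data retrieval on large-scale image collections, an inverted index tree structure based on segmented hash codes is proposed. The index splits each hash code into segments, maintains an inverted index tree for each segment, and combines these with efficient Bloom filters to build the hash index structure. To further improve retrieval accuracy, an accurate rank fusion algorithm is designed: a weighted undirected graph is built for the ranking results of each of several hashing algorithms, and the fusion of the ranked lists from multiple hashing algorithms, which follows the idea of PageRank, is described in detail. Experimental results show that the inverted index tree structure based on segmented hash codes greatly improves retrieval speed. In addition, compared with traditional ranking by a single hashing algorithm, fusing the ranked lists of multiple hashing algorithms yields a significant advantage in retrieval accuracy.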
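
The segment-wise inverted index described in this abstract can be sketched roughly as follows. This is a simplified illustration only: a plain dictionary per segment stands in for the paper's inverted index tree, the Bloom filter and rank fusion steps are omitted, and the class and function names (`SegmentedHashIndex`, `split_code`, `hamming`) are hypothetical. Hash codes are assumed to be bit strings whose length is divisible by the number of segments.

```python
from collections import defaultdict


def split_code(code, num_segments):
    """Split a bit string into equal-length segments."""
    seg_len = len(code) // num_segments
    return [code[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


class SegmentedHashIndex:
    def __init__(self, num_segments):
        self.num_segments = num_segments
        # One inverted table per segment position: segment value -> item ids.
        self.tables = [defaultdict(set) for _ in range(num_segments)]
        self.codes = {}

    def add(self, item_id, code):
        self.codes[item_id] = code
        for table, seg in zip(self.tables, split_code(code, self.num_segments)):
            table[seg].add(item_id)

    def search(self, query, max_distance):
        # Candidates share at least one segment exactly with the query.
        candidates = set()
        for table, seg in zip(self.tables, split_code(query, self.num_segments)):
            candidates |= table.get(seg, set())
        # Verify candidates with the full Hamming distance, nearest first.
        hits = [(hamming(query, self.codes[i]), i) for i in candidates]
        return sorted(h for h in hits if h[0] <= max_distance)
```

If `max_distance` is smaller than the number of segments, the pigeonhole principle guarantees that every item within that distance agrees with the query on at least one whole segment, so the candidate step misses no true match while skipping most of the database.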

11.
Multimedia Tools and Applications - Through analyzing users’ listening records, personalized music recommendation can not only help users find interesting music, but also help related...

12.
In current practice, design is still based on the sequential design methodology, so process and manufacturing information is not considered at the preliminary design stage. In addition, the quality of a design and the time needed to complete it depend largely on the experience of the engineers. Case-based design is an intelligent method that retrieves the most similar previous cases to provide a solution to a new decision problem. Case retrieval is therefore the most crucial process in case-based design. However, few studies have investigated case retrieval in concurrent design. This paper proposes a hybridization of fuzzy similarity measurement (FSM) and fuzzy multi-criteria decision making (FMCDM) for retrieving historical cases for concurrent design. In the FSM stage, triangular functions are used to represent the different fuzzy requirements and to calculate the similarity. Less valuable cases are filtered out by a defined threshold. FMCDM is then used to evaluate the most similar cases in terms of product criteria and pick out the most suitable case. Furthermore, the FSM–FMCDM model is applied to power transformer concurrent design, and the experiment indicates that FSM–FMCDM retrieval is more reasonable than the prior SM-only case retrieval.
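
The FSM stage described in this abstract can be pictured with a small sketch. The abstract does not give the exact similarity formula, so the code below assumes that each requirement is a triangular fuzzy number `(low, peak, high)` with `low < peak < high`, and that a case's similarity is the mean membership of its attribute values. The function names and this averaging rule are illustrative assumptions, not the paper's method.

```python
def triangular(x, low, peak, high):
    """Triangular membership: 0 outside (low, high), 1 at peak, linear in between."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def case_similarity(requirements, case):
    """Mean membership of a case's attribute values in the fuzzy requirements."""
    scores = [triangular(case[name], *tri) for name, tri in requirements.items()]
    return sum(scores) / len(scores)


def filter_cases(requirements, cases, threshold):
    """Keep cases whose similarity reaches the threshold, best first."""
    scored = [(case_similarity(requirements, c), cid) for cid, c in cases.items()]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

The cases that survive `filter_cases` would then be handed to the multi-criteria evaluation stage, which this sketch does not attempt to reproduce.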

13.
Product development today is becoming increasingly knowledge intensive. Specifically, design teams face considerable challenges in making effective use of increasing amounts of information. In order to support product information retrieval and reuse, one approach is to use case-based reasoning (CBR), in which problems are solved “by using or adapting solutions to old problems.” In CBR, a case includes both a representation of the problem and a solution to that problem. Case-based reasoning uses similarity measures to identify the cases that are most relevant to the problem to be solved. However, most non-numeric similarity measures are based on syntactic grounds, which often fail to produce good matches when confronted with the meaning associated with the words they compare. To overcome this limitation, ontologies can be used to produce similarity measures that are based on semantics. This paper presents an ontology-based approach that can determine the similarity between two classes using feature-based similarity measures that replace features with attributes. The proposed approach is evaluated against other existing similarity measures. Finally, the effectiveness of the proposed approach is illustrated with a case study on product–service–system design problems.

14.
The advent of large-scale digital image databases poses great challenges for content-based image retrieval (CBIR) methods. CBIR is an active area of research and provides a strong backdrop for new methodologies and system implementations. Hence, many research contributions focus on techniques that achieve higher image retrieval accuracy while keeping computational complexity low. This paper proposes a CBIR method based on an efficient combination of multiresolution color and texture features. It uses the color autocorrelogram of the hue (H) and saturation (S) components of the HSV color space for color features, and the value (V) component of the HSV color space for texture features. These two image features are extracted by computing the co-occurrence matrix on the optimum-level image, which forms the basis of the feature vector. The optimum level is constructed using the wavelet transform and contains a few dominant wavelet coefficients. The efficiency of the proposed system is tested on standard image databases, and the experimental results show that the proposed method achieves better retrieval accuracy at the optimum level; moreover, it is very fast, with a low computational load. The results are compared with existing techniques such as the orthogonal polynomial model, the multiresolution BDIP-BVLC method and a GLCM-based system, and reveal that the proposed method outperforms them.

15.
16.
This paper describes a music information retrieval system that uses humming as the key for retrieval. Humming is an easy way for a user to input a melody. However, there are several problems with humming that degrade the retrieval of information. One problem is the human factor. Sometimes, people do not sing accurately, especially if they are inexperienced or unaccompanied. Another problem arises from signal processing. Therefore, a music information retrieval method should be sufficiently robust to surmount various humming errors and signal processing problems. A retrieval system has to extract the pitch from the user's humming. However, pitch extraction is not perfect. It often captures half or double pitches, which are harmonic frequencies of the true pitch, even if the extraction algorithms take the continuity of the pitch into account. Considering these problems, we propose a system that takes multiple pitch candidates into account. In addition to the frequencies of the pitch candidates, the confidence measures obtained from their powers are taken into consideration as well. We also propose the use of an algorithm with three dimensions that is an extension of the conventional Dynamic Programming (DP) algorithm, so that multiple pitch candidates can be treated. Moreover, in the proposed algorithm, DP paths are changed dynamically to take deltaPitches and IOIratios (inter-onset-interval ratios) of input and reference notes into account in order to treat notes being split or unified. We carried out an evaluation experiment to compare the proposed system with a conventional system. When using three pitch candidates with the confidence measure and IOI features, the top-ten retrieval accuracy was 94.1%. Thus, the proposed method gave a better retrieval performance than the conventional system.

17.
An overview of audio information retrieval   Total citations: 28, self-citations: 0, citations by others: 28
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.

18.
Multimedia Tools and Applications - Similarity measures are very crucial especially in the field of information retrieval. Thus, various distance/similarity measures were proposed throughout the...

19.
Measuring visual similarity between two or more instances within a data distribution is a fundamental task in image retrieval. Theoretically, non-metric distances can generate a more complex and accurate similarity model than metric distances, provided that the non-linear data distribution is precisely captured by the system. In this work, we explore neural network models for learning a non-metric similarity function for instance search. We argue that non-metric similarity functions based on neural networks can build a better model of human visual perception than standard metric distances. As our proposed similarity function is differentiable, we explore a real end-to-end trainable approach for image retrieval, i.e. we learn the weights from the input image pixels to the final similarity score. Experimental evaluation shows that non-metric similarity networks are able to learn visual similarities between images and improve performance on top of state-of-the-art image representations, boosting results on standard image retrieval datasets with respect to standard metric distances.

20.
Learning-enhanced relevance feedback has been one of the most promising and active research directions in content-based image retrieval in recent years. However, the existing approaches either require prior knowledge of the data or converge slowly and are thus not effective. Motivated by the successful history of optimal adaptive filters, we present a new approach to interactive image retrieval based on an adaptive tree similarity model to solve these difficulties. The proposed tree model is a hierarchical nonlinear Boolean representation of a user query concept. Each path of the tree is a clustering pattern of the feedback samples that is small and local enough in the feature space to be approximated well by a linear model. Because of this linearity, the parameters of the similarity model can be learned by the optimal adaptive filter, which does not require any prior knowledge of the data and supports incremental learning with a fast convergence rate. The proposed approach is simple to implement and achieves better performance than most approaches. To illustrate the performance of the proposed approach, extensive experiments have been carried out on a large heterogeneous image collection of 17,000 images, with promising results on a wide variety of queries. An early version of part of the system was reported in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition 2001.
