Similar articles
1.
Neural networks for vector quantization of speech and images (total citations: 6; self-citations: 0; citations by others: 6)
The use of neural networks for vector quantization (VQ) is described. The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer. A powerful feature of the new training algorithms is that the VQ codewords are determined adaptively, in contrast to the popular LBG training algorithm, which requires that all the training data be processed in batch mode. The neural network approach makes it possible to train the vector quantizer online, so that it adapts to the changing statistics of the input data. The authors compare the neural network VQ algorithms with the LBG algorithm for encoding a large database of speech signals and for encoding images.
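The contrast between online and batch codebook training can be illustrated with a minimal sketch. It assumes a plain winner-take-all (competitive learning) update with a fixed step size; the abstract does not say which two learning rules the authors use, so the function names, the step size `rate`, and the toy data are all hypothetical.

```python
import random

def nearest(codebook, x):
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))

def online_update(codebook, x, rate=0.05):
    """Move only the winning codeword a small step toward x (one online step)."""
    w = nearest(codebook, x)
    codebook[w] = [c + rate * (v - c) for c, v in zip(codebook[w], x)]
    return w

def train_online(samples, codebook, rate=0.05):
    """Process samples one at a time; no batch pass over the whole data set is needed."""
    for x in samples:
        online_update(codebook, x, rate)
    return codebook

# Hypothetical toy data: two clusters centered near (0, 0) and (5, 5).
rng = random.Random(0)
data = [[rng.gauss(m, 0.3), rng.gauss(m, 0.3)] for m in (0.0, 5.0) for _ in range(200)]
rng.shuffle(data)
codebook = train_online(data, [[1.0, 1.0], [4.0, 4.0]])
```

Because each sample changes the codebook immediately, the same loop can keep running on live input, which is the adaptive behavior the abstract contrasts with LBG's batch processing.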

2.
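This paper first proposes a construction method and a fast quantization algorithm for a lattice vector quantizer (LVQ) based on the rotated Barnes-Wall lattice. It then studies the implementation of a gain-shape vector quantizer (GSLVQ) built around this LVQ. Finally, it discusses how GSLVQ can be applied to image sequence coding and reports good experimental results.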

3.
Digital image coding using vector quantization (VQ) based techniques provides low bit rates and high-quality coded images, at the expense of intensive computational demands. The computational requirement of the encoding search process has hindered the application of VQ to real-time high-quality coding of color TV images. Reduction of the encoding search complexity through partitioning of a large codebook into the on-chip memories of a concurrent VLSI chip set is proposed. A real-time vector quantizer architecture for encoding color images is developed. The architecture maps the mean/quantized residual vector quantizer (MQRVQ), an extension of mean/residual VQ, onto a VLSI/LSI chip set. The MQRVQ contributes to the feasibility of the VLSI architecture through the use of a simple multiplication-free distortion measure and a reduction of the required memory per code vector. Running at a clock rate of 25 MHz, the proposed hardware implementation of this architecture is capable of real-time processing of 480×768 pixels per frame with a refresh rate of 30 frames/s. The result is a real-time high-quality composite color image coder operating at a fixed rate of 1.12 b per pixel.

4.
The authors consider the problem of detecting a discrete Markov source which is transmitted across a discrete memoryless channel. Two maximum a posteriori (MAP) formulations are considered: (i) a sequence MAP detection in which the objective is to determine the most probable transmitted sequence given the observed sequence and (ii) an instantaneous MAP detection which is to determine the most probable transmitted symbol at time n given all the observations prior to and including time n. The solution to the first problem results in a “Viterbi-like” implementation of the MAP detector (with large delay) while the latter problem results in a recursive implementation (with no delay). For the special case of the binary symmetric Markov source and binary symmetric channel, simulation results are presented and an analysis of these two systems yields explicit critical channel bit error rates above which the MAP detectors become useful. Applications of the MAP detection problem in a combined source-channel coding system are considered. Here, it is assumed that the source is highly correlated and that the source encoder (a vector quantizer (VQ)) fails to remove all of the source redundancy. The remaining redundancy at the output of the source encoder is referred to as the “residual” redundancy. It is shown, through simulation, that the residual redundancy can be used by the MAP detectors to combat channel errors. For small block sizes, the proposed system beats Farvardin and Vaishampayan's channel-optimized VQ by wide margins. Finally, it is shown that the instantaneous MAP detector can be combined with the VQ decoder to form an approximate minimum mean-squared error decoder.
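The recursive, zero-delay form of the instantaneous MAP detector can be sketched for the binary symmetric special case. This is a minimal illustration using the standard forward (filtering) recursion, not the authors' implementation; the parameter names `q` (source transition probability) and `eps` (channel crossover probability) and the uniform initial prior are assumptions.

```python
def instantaneous_map(received, q, eps):
    """Filtered MAP symbol decisions for a binary symmetric Markov source over a BSC.

    q   : probability that a source symbol differs from the previous one
    eps : channel bit error (crossover) probability
    """
    prior = [0.5, 0.5]  # P(x_n | y_1..y_{n-1}); uniform before any observation
    decisions = []
    for y in received:
        # Weight the prediction by the channel likelihood P(y | x).
        post = [prior[x] * ((1 - eps) if x == y else eps) for x in (0, 1)]
        total = post[0] + post[1]
        post = [p / total for p in post]
        decisions.append(0 if post[0] >= post[1] else 1)
        # Predict the next symbol through the Markov transition.
        prior = [post[0] * (1 - q) + post[1] * q,
                 post[1] * (1 - q) + post[0] * q]
    return decisions

# A slowly changing source (q = 0.05) over a noisy channel (eps = 0.2):
# the isolated 1 is treated as a channel error and decoded as 0.
print(instantaneous_map([0, 0, 0, 1, 0, 0], q=0.05, eps=0.2))
```

When the source is nearly memoryless or the channel is nearly clean, the same recursion simply follows the received bits, which is consistent with the abstract's remark that the detector only helps above a critical bit error rate.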

5.
The generalization of gain adaptation to vector quantization (VQ) is explored in this paper and a comprehensive examination of alternative techniques is presented. We introduce a class of adaptive vector quantizers that can dynamically adjust the "gain" or amplitude scale of code vectors according to the input signal level. The encoder uses a gain estimator to determine a suitable normalization of each input vector prior to VQ encoding. The normalized vectors have reduced dynamic range and can then be more efficiently coded. At the receiver, the VQ decoder output is multiplied by the estimated gain. Both forward and backward adaptation are considered and several different gain estimators are compared and evaluated. Gain-adaptive VQ can be used alone for "vector PCM" coding (i.e., direct waveform VQ) or as a building block in other vector coding schemes. The design algorithm for generating the appropriate gain-normalized VQ codebook is introduced. When applied to speech coding, gain-adaptive VQ achieves significant performance improvement over fixed VQ with a negligible increase in complexity.
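The normalize, encode, rescale pipeline described here can be sketched as follows. The sketch assumes forward adaptation with an RMS gain estimate computed from the current vector and sent unquantized; the paper compares several estimators, and the function names and toy codebook below are hypothetical.

```python
import math

def rms_gain(x, floor=1e-9):
    """Forward gain estimate: root-mean-square amplitude of the input vector."""
    return max(math.sqrt(sum(v * v for v in x) / len(x)), floor)

def encode(x, codebook):
    """Normalize x by its gain, then pick the nearest gain-normalized codeword."""
    g = rms_gain(x)
    shape = [v / g for v in x]
    index = min(range(len(codebook)),
                key=lambda i: sum((c - s) ** 2 for c, s in zip(codebook[i], shape)))
    return index, g  # a real coder would also quantize g before transmission

def decode(index, g, codebook):
    """Scale the selected codeword back up by the gain."""
    return [g * c for c in codebook[index]]

# Hypothetical gain-normalized codebook (unit RMS per codeword).
codebook = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
idx, g = encode([10.0, -10.0], codebook)
print(idx, g, decode(idx, g, codebook))
```

A loud and a quiet input with the same waveform shape map to the same codeword index, which is why a single normalized codebook can cover a wide dynamic range.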

6.
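Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. 《信号处理》 (Journal of Signal Processing), 2019, 35(4): 631-640
To address the failure of conventional neural networks to fully exploit correlations in the time-frequency domain, a single-channel speech enhancement method using a deep fully convolutional encoder-decoder neural network is proposed. At the encoder, convolutional layers extract features stage by stage from the time-frequency representation of the noisy speech, obtaining a high-level feature representation of the target speech while suppressing background noise layer by layer. The decoder is structurally symmetric to the encoder: it applies deconvolution and upsampling to the high-level features produced by the encoder and restores the target speech layer by layer. Skip connections can effectively alleviate the vanishing-gradient problem that arises when training very deep networks. This paper introduces skip connections between corresponding encoder and decoder layers, passing encoder feature maps to the matching decoder layers, which helps recover the fine detail of the target speech. The effects on enhancement performance of two skip-connection schemes (feature fusion and feature concatenation) and of two training loss functions (L1 and L2) are studied, and experiments verify the effectiveness of the proposed method.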

7.
A real-time full-search vector quantization system for speech waveform coding is implemented using LSTTL and CMOS devices. The system consists of low-pass filters, A/D and D/A converters, an algorithm for discriminating voiced and unvoiced speech, a full-search vector quantizer encoder and decoder, and a microprocessor-based controller. The system is designed to operate at two possible rates: 1 bit/sample using a dimension 8 vector quantizer (6500 bits/s) or 2 bits/sample using a dimension 4 vector quantizer (13 000 bits/s). In both cases the codebooks have rate 8 bits/vector. Separate codebooks were designed for voiced and unvoiced speech based on a training sequence of 640 000 samples containing five different speakers. The subjective and quantitative results are compared with both simulations and a real-time array-processor-based implementation.
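The two operating rates follow from the codebook rate and the vector dimension, as the short calculation below shows. The sampling rate of 6500 samples/s is not stated in the abstract; it is an assumption inferred from the quoted 6500 bits/s at 1 bit/sample.

```python
def bits_per_sample(bits_per_vector, dimension):
    """Rate per sample when one codeword index covers `dimension` samples."""
    return bits_per_vector / dimension

def bit_rate(bits_per_vector, dimension, sample_rate_hz):
    """Channel bit rate in bits/s for a given sampling rate."""
    return bits_per_sample(bits_per_vector, dimension) * sample_rate_hz

SAMPLE_RATE_HZ = 6500      # assumption inferred from the abstract's figures
BITS_PER_VECTOR = 8        # stated: 8 bits/vector, i.e. 2**8 = 256 codewords

for dim in (8, 4):
    print(dim, bits_per_sample(BITS_PER_VECTOR, dim),
          bit_rate(BITS_PER_VECTOR, dim, SAMPLE_RATE_HZ))
```

Halving the vector dimension at a fixed codebook size doubles the bit rate, while the full-search cost per vector (256 distance computations) stays the same.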

8.
On entropy-constrained vector quantization using Gaussian mixture models (total citations: 2; self-citations: 0; citations by others: 2)
A flexible and low-complexity entropy-constrained vector quantizer (ECVQ) scheme based on Gaussian mixture models (GMMs), lattice quantization, and arithmetic coding is presented. The source is assumed to have a probability density function of a GMM. An input vector is first classified to one of the mixture components, and the Karhunen-Loève transform of the selected mixture component is applied to the vector, followed by quantization using a lattice-structured codebook. Finally, the scalar elements of the quantized vector are entropy coded sequentially using a specially designed arithmetic coder. The computational complexity of the proposed scheme is low, and independent of the coding rate in both the encoder and the decoder. Therefore, the proposed scheme serves as a lower-complexity alternative to the GMM-based ECVQ proposed by Gardner, Subramaniam and Rao [1]. The performance of the proposed scheme is analyzed under a high-rate assumption, and quantified for a given GMM. The practical performance of the scheme was evaluated through simulations on both synthetic and speech line spectral frequency (LSF) vectors. For LSF quantization, the proposed scheme has performance comparable to [1] at rates relevant for speech coding (20-28 bits per vector) with lower computational complexity.

9.
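This paper proposes the structure of an index-predictive vector quantizer. Compared with an ordinary vector quantizer, it makes full use of the very strong two-dimensional correlation of images and uses prediction to remove redundant codewords, so that the compression ratio can be more than doubled while the decoded image quality remains the same as that of an ordinary vector quantizer.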

10.
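Xiao Qiang, Chen Liang, Zhu Tao, Huang Jianjun. 《信号处理》 (Journal of Signal Processing), 2011, 27(4): 563-568
To achieve high-quality speech coding at very low bit rates, a dimension-reduction quantization algorithm for line spectral pair (LSP) parameters based on compressed sensing theory is proposed. The encoder uses compressed sensing to reduce the dimension of the high-dimensional superframe LSP vector, projecting the original LSP parameters onto a low-dimensional space to obtain low-dimensional measurements, which are then quantized with split vector quantization. The decoder takes the quantized measurements as known and reconstructs the original high-dimensional LSP vector with the orthogonal matching pursuit algorithm. Experimental results show that the algorithm lowers the average spectral distortion by 0.23 dB relative to the matrix quantization scheme used in low-rate speech coding, and by 0.13 dB relative to a DCT-based dimension-reduction quantization scheme. Reducing the dimension before quantizing substantially cuts the number of bits needed for coding and the codebook storage complexity, effectively lowering the speech coding rate. The synthesized speech has high intelligibility and naturalness; although there is some distortion, no obvious loss in perceived listening quality is noticeable.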

11.
Conditional entropy-constrained residual VQ with application to image coding (total citations: 1; self-citations: 0; citations by others: 1)
This paper introduces an extension of entropy-constrained residual vector quantization (VQ) where intervector dependencies are exploited. The method, which we call conditional entropy-constrained residual VQ, employs a high-order entropy conditioning strategy that captures local information in the neighboring vectors. When applied to coding images, the proposed method is shown to achieve better rate-distortion performance than that of entropy-constrained residual vector quantization with less computational complexity and lower memory requirements. Moreover, it can be designed to support progressive transmission in a natural way. It is also shown to outperform some of the best predictive and finite-state VQ techniques reported in the literature. This is due partly to the joint optimization between the residual vector quantizer and a high-order conditional entropy coder as well as the efficiency of the multistage residual VQ structure and the dynamic nature of the prediction.

12.
A lattice-based vector quantizer (VQ) and noiseless code are proposed for transform and subband image coding. The quantization is simple to implement, and no vector codebooks need to be stored. The noiseless code enumerates lattice codevectors based on their (weighted) ℓ1 norm. A software implementation is able to handle lattice codebooks of size 2^256. The image coding performance is shown to be comparable or superior to the best encoding methods reported in the literature.
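Enumerating codevectors by ℓ1 norm depends on knowing how many lattice points lie on each norm shell. The sketch below counts them for the simplest case, assuming the cubic lattice Z^n and an unweighted ℓ1 norm; the abstract does not specify the lattice or the weights, and the recursion used is the one familiar from pyramid vector quantization rather than anything taken from this paper.

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def pyramid_count(n, k):
    """Number of integer vectors in Z^n whose absolute values sum to exactly k."""
    if k == 0:
        return 1          # only the all-zero vector
    if n == 0:
        return 0          # no coordinates left but norm still to place
    return (pyramid_count(n - 1, k)          # first coordinate is 0
            + pyramid_count(n - 1, k - 1)    # first coordinate is +1 or -1 ...
            + pyramid_count(n, k - 1))       # ... or has magnitude above 1

def index_bits(n, k):
    """Bits needed to index every point on the l1 shell of radius k in Z^n."""
    return math.ceil(math.log2(pyramid_count(n, k)))

print(pyramid_count(2, 2), pyramid_count(3, 2), index_bits(16, 20))
```

Because the shell sizes are computed with exact integers rather than stored tables, an index can be assigned to a codevector arithmetically, which is how very large codebooks can be handled without storing them.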

13.
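Sun Linhui, Zhang Meng, Liang Wenqing. 《信号处理》 (Journal of Signal Processing), 2022, 38(12): 2519-2531
In practical speech separation, the gender combination of the speakers in the mixture is often unknown, and separating directly with a general-purpose model gives poor results. To separate speech better, this paper proposes a gender-combination discrimination model based on a convolutional neural network and support vector machine (CNN-SVM) that determines whether the two speakers in the mixture form a male-male, male-female, or female-female combination, so that the separation model for that combination can be selected. To make up for the limited gender-combination information carried by a traditional single feature, a strategy for mining deep fused features is proposed so that the classification features contain more information about the gender-combination class. The proposed single-channel speech separation method based on CNN-SVM gender-combination classification first uses a convolutional neural network to extract deep features from Mel-frequency cepstral coefficients and from filter-bank features, and fuses the two deep features into the classification feature. A support vector machine then identifies the gender combination of the mixture, and finally the deep neural network/convolutional neural network (DNN/CNN) model for that combination performs the separation. Experimental results show that, compared with a traditional single feature, the proposed deep fused feature effectively improves the recognition rate of the gender combination, and that the proposed separation method outperforms the general-purpose separation model in perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR).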

14.
Two enhanced subband coding schemes using a regularized image restoration technique are proposed: the first controls the global regularity of the decompressed image; the second extends the first approach at each decomposition level. The quantization scheme incorporates scalar quantization (SQ) and pyramidal lattice vector quantization (VQ) with both optimal bit and quantizer allocation. Experimental results show that both the block effect due to VQ and the quantization noise are significantly reduced.

15.
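Autoregressive (AR) models are an effective way to describe correlation in time series. Classical AR coefficient estimation makes simple assumptions about the residual signal and has difficulty estimating AR coefficients accurately in complex conditions such as noise interference, while AR coefficient estimation based on deep neural networks (DNN-AR) is prone during training to the numerical instability of the Levinson-Durbin recursion (LDR). To improve the training stability and overall performance of DNN-AR coefficient estimation, this paper, while preserving system stability, uses the idea of precision conversion to increase computation speed and proposes an improved deep network structure based on the generalized analysis-by-synthesis (GABS) model, which improves both the accuracy of AR coefficient estimation in noise and the stability of network training. The GABS model combined with a DNN (GABS-DNN) consists of three main parts: a spectral enhancement network as the modifier, DNN preprocessing and LDR parameter estimation as the encoder, and conversion from AR coefficients to the power spectrum as the decoder. In optimizing the objective function, the error between the enhanced spectrum and the observed spectrum is introduced, which reduces the influence of the LDR gradient on the enhancement network during backpropagation and yields stable estimation of the AR coefficients of noisy speech.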

16.
This work is concerned with the problem of designing robust, vector quantizer (VQ)-based communication systems for operation over time-varying Gaussian channels. Transmission energy allocation to VQ codeword bits, according to their error sensitivities, is a powerful tool for improving robustness to channel noise. The power of this technique can be further enhanced by appropriately combining it with index assignment methods. We pose the corresponding joint optimization problem and suggest a simple iterative algorithm for finding a locally optimal solution. The susceptibility of the solution to poor local minima is significantly reduced by an enhanced version of the algorithm which invokes the method of noisy channel relaxation, whereby the VQ system is optimized while gradually decreasing the assumed level of channel noise. In a series of experiments, the resulting combined technique is shown to outperform standard pseudo-Gray coding by up to 3.5 dB and to exhibit graceful degradation at mismatched channel conditions. Finally, we extend these ideas to the case where both the transmitter and the receiver have information on the current state of a time-varying channel. The proposed method is based on switched encoding and adaptive decoding. Experimental results show that the proposed system achieves close to optimal performance.

17.
A novel fuzzy clustering algorithm for the design of channel-optimized source coding systems is presented in this letter. The algorithm, termed the fuzzy channel-optimized vector quantizer (FCOVQ) design algorithm, optimizes the vector quantizer (VQ) design using a fuzzy clustering process in which the index crossover probabilities imposed by a noisy channel are taken into account. The fuzzy clustering process effectively enhances the robustness of VQ performance to channel noise without reducing the quantization accuracy. Numerical results demonstrate that the FCOVQ algorithm outperforms existing VQ algorithms under noisy channel conditions for both Gauss-Markov sources and still image data.

18.
Joint source-channel coding is an effective approach for the design of bandwidth-efficient and error-resilient communication systems with manageable complexity. An interesting research direction within this framework is the design of source decoders that exploit the residual redundancy for effective signal reconstruction at the receiver. Such source decoders are expected to replace the traditionally heuristic error concealment units that are elements of most multimedia communication systems. In this paper, we consider the reconstruction of signals encoded with a multistage vector quantizer (MSVQ) and transmitted over a noisy communications channel. The MSVQ maintains a moderate complexity and, due to its successive refinement feature, is a suitable choice for the design of layered (progressive) source codes. An approximate minimum mean squared error source decoder for MSVQ is presented, and its application to the reconstruction of the linear predictive coefficient (LPC) parameters in the mixed excitation linear prediction (MELP) speech codec is analyzed. MELP is a low-rate standard speech codec suitable for bandwidth-limited communications and wireless applications. Numerical results demonstrate the effectiveness of the proposed schemes.
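The successive refinement property of MSVQ mentioned here can be seen in a minimal sketch. It assumes a plain sequential (greedy) stage search and a noiseless channel; it does not implement the approximate MMSE decoder the paper proposes, and the two toy stage codebooks are hypothetical.

```python
def nearest(codebook, x):
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))

def msvq_encode(x, stages):
    """Quantize x stage by stage; each stage codes the residual left by earlier stages."""
    residual = list(x)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(indices, stages):
    """Sum the selected codewords; passing fewer indices gives a coarser reconstruction."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

stages = [
    [[-4.0, -4.0], [-4.0, 4.0], [4.0, -4.0], [4.0, 4.0]],  # coarse stage
    [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]],  # refinement stage
]
idx = msvq_encode([3.2, -4.9], stages)
print(idx, msvq_decode(idx[:1], stages), msvq_decode(idx, stages))
```

Decoding with only the first index already yields a usable approximation, and each further index refines it, which is what makes the structure a natural fit for layered (progressive) source codes.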

19.
Classified Vector Quantization of Images (total citations: 1; self-citations: 0; citations by others: 1)
Vector quantization (VQ) provides many attractive features for image coding with high compression ratios. However, initial studies of image coding with VQ have revealed several difficulties, most notably edge degradation and high computational complexity. We address these two problems and propose a new coding method, classified vector quantization (CVQ), which is based on a composite source model. Blocks with distinct perceptual features, such as edges, are generated from different subsources, i.e., belong to different classes. In CVQ, a classifier determines the class for each block, and the block is then coded with a vector quantizer designed specifically for that class. We obtain better perceptual quality with significantly lower complexity with CVQ when compared to ordinary VQ. We demonstrate with CVQ visual quality which is comparable to that produced by existing coders of similar complexity, for rates in the range 0.6-1.0 bits/pixel.
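The classify-then-quantize structure of CVQ can be sketched in a few lines. The variance-threshold classifier, the two class names, and the tiny codebooks below are hypothetical stand-ins chosen for illustration; the paper's own classifier distinguishes perceptual features such as edges in ways the abstract does not detail.

```python
def block_class(block, threshold=100.0):
    """Toy classifier: call a block 'edge' when its sample variance is large."""
    mean = sum(block) / len(block)
    var = sum((p - mean) ** 2 for p in block) / len(block)
    return "edge" if var > threshold else "smooth"

def cvq_encode(block, codebooks):
    """Pick the class first, then search only that class's codebook."""
    cls = block_class(block)
    cb = codebooks[cls]
    idx = min(range(len(cb)),
              key=lambda i: sum((c - p) ** 2 for c, p in zip(cb[i], block)))
    return cls, idx

def cvq_decode(cls, idx, codebooks):
    """The decoder needs both the class label and the index within that class."""
    return codebooks[cls][idx]

# Hypothetical per-class codebooks for flattened 2x2 pixel blocks.
codebooks = {
    "smooth": [[50.0] * 4, [128.0] * 4, [200.0] * 4],
    "edge": [[0.0, 0.0, 255.0, 255.0], [255.0, 255.0, 0.0, 0.0],
             [0.0, 255.0, 0.0, 255.0], [255.0, 0.0, 255.0, 0.0]],
}
print(cvq_encode([120.0, 130.0, 125.0, 131.0], codebooks))
print(cvq_encode([10.0, 20.0, 240.0, 250.0], codebooks))
```

Each block is compared against only one class codebook instead of the union of all of them, which is the source of the complexity saving, and edge blocks are matched against edge-shaped codewords, which addresses the edge degradation the abstract mentions.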

20.
We present in this paper a new distributed video coding (DVC) architecture for wireless capsule endoscopy. It is based on state-of-the-art DVC systems, but without using key frames. Instead, it uses an adapted vector quantization (VQ) with a search complexity that is shifted to the decoder. VQ allows good side information (SI) to be created by exploiting the similarities in human anatomy. Thus, SI is created from a codebook (CB) rather than by motion-compensated prediction. This approach greatly decreases the complexity of the encoder, which codes only Wyner-Ziv frames, and allows progressive decoding. The encoder of the proposed DVC generates only a simple hash that is used by the decoder to select the corresponding VQ codeword. The experimental results show that the rate-distortion performance is better than that of JPEG, and show the possibility of using scalable coding to control the rate and energy used.
