Similar Documents
Found 20 similar documents (search time: 78 ms)
1.
Automatically extracting textual information from video images is of great significance for monitoring video content, adding video tags, and building video image retrieval systems. Text detection is the front end of a text-information extraction system and its most critical step. In recent years the field has seen important new developments. This survey reviews, compares, and analyzes region-based and texture-based text detection methods, summarizing the main recent progress in text detection technology. In addition, to highlight the importance of hybrid approaches, these are summarized separately. Finally, the open difficulties of text detection in video images are summarized and future trends are discussed.

2.
To extract caption information from video images in real time, a simple and effective method is proposed. Text-event detection is performed first, followed by edge detection, threshold computation, and edge-size constraints; non-text regions are then further filtered out according to the text pixel-density range. The proposed superposition of horizontal and vertical edges strengthens the detected text edges, and the size constraint removes edges that do not match text dimensions. A projection method finally determines the caption region. OCR is then applied to the extracted text regions to complete text extraction from the video. The combination of these steps ensures the accuracy and robustness of the proposed algorithm.
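The pipeline in this entry (edge superposition, thresholding, projection) can be sketched as follows. This is a minimal plain-Python illustration, not the paper's implementation; the threshold values are assumptions for the example.

```python
# Sketch of the entry-2 idea: superpose horizontal and vertical edge
# responses, threshold them, then use a row projection profile to
# locate the caption band. Threshold values here are illustrative.

def edge_map(img):
    """Per-pixel sum of absolute horizontal and vertical forward differences."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(img[y][x + 1] - img[y][x]) if x < w - 1 else 0
            gy = abs(img[y + 1][x] - img[y][x]) if y < h - 1 else 0
            out[y][x] = gx + gy  # superposed edge strength
    return out

def caption_rows(img, edge_thresh=50, density_thresh=0.3):
    """Indices of rows whose fraction of strong-edge pixels reaches density_thresh."""
    edges = edge_map(img)
    w = len(img[0])
    return [y for y, row in enumerate(edges)
            if sum(1 for v in row if v >= edge_thresh) / w >= density_thresh]
```

On a synthetic frame with a high-contrast stripe, the stripe rows are reported as the caption band; a real system would follow this with the size constraints and OCR described above.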

3.
A Video Text Recognition Method Based on Color Clustering and Multi-Frame Fusion   Cited by: 1 (self-citations: 0, other citations: 1)
易剑  彭宇新  肖建国 《软件学报》2011,22(12):2919-2933
A video text recognition method based on color clustering and multi-frame fusion is proposed. First, in the text detection module, two salient features of text regions are jointly considered: uniform color and dense edges. Using the affinity propagation clustering algorithm, color edges are adaptively decomposed into several edge sub-images according to the color complexity of the edges, making text detection in each sub-image more accurate. Second, in the text enhancement module, blurred text regions are filtered out based on a stroke-intensity map, and text regions with the same content detected in different frames are fused by combining the advantages of average fusion and minimum-value fusion, yielding text images with smoother backgrounds and clearer strokes. Finally, in the text extraction module, binarization is performed on an adaptively selected color component with high text contrast, achieving better binarization than existing methods; in addition, color clustering based on the color difference between text and background is used to remove noise, effectively improving the recognition rate. Experimental results show that the method achieves better text recognition results than existing methods.
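The multi-frame fusion step described above can be illustrated with a toy sketch: average fusion smooths noise, minimum fusion suppresses bright moving background, and a weighted combination keeps the advantages of both. The weight is an assumption for the example; frames are grayscale nested lists.

```python
def fuse(frames, alpha=0.5):
    """Per-pixel alpha * mean + (1 - alpha) * min over aligned frames."""
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [f[y][x] for f in frames]
            out[y][x] = alpha * (sum(vals) / len(vals)) + (1 - alpha) * min(vals)
    return out
```

A pixel covered by a stable caption keeps its value, while a background pixel flickering between dark and bright is pulled toward the dark side, which is what makes strokes stand out after fusion.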

4.
A Multi-Frame Text Tracking and Segmentation Algorithm for Video   Cited by: 8 (self-citations: 2, other citations: 6)
Text in video is an important source of information for video semantic understanding and retrieval. Exploiting the temporal and spatial redundancy of static text in video, the detection results are refined using the edge bitmap of the text region as a feature, and a fast text tracking algorithm based on binary search is proposed, locating text objects quickly and effectively. In the segmentation stage, in addition to the traditional approach of enhancing text regions with a gray-level fused image, the edge bitmap is used for further background filtering. Experiments show that both detection precision and segmentation quality are greatly improved.
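The binary-search tracking idea can be illustrated as follows, assuming a per-frame text-presence test and a caption that stays on screen continuously from its first frame until it disappears (both assumptions mirror the static-text premise above).

```python
def find_caption_end(has_text, first, last):
    """Binary search for the last frame index in [first, last] where the
    caption is still present. has_text(i) is a per-frame detector and is
    assumed monotone: True up to some frame, then False."""
    lo, hi = first, last
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so lo strictly increases
        if has_text(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

This needs only O(log n) detector calls instead of testing every frame, which is the source of the speedup claimed above.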

5.
A caption extraction method is proposed that combines text edge features, color information, and the spatio-temporal characteristics of video. Caption positions are obtained by edge detection, from which the text color is derived; a global Gaussian mixture model is then fitted to the color, and once built, the model is used directly to extract the text color layer from frames in which the caption changes. To decide whether a caption has changed, an AND mask-image method is proposed. Experimental results show that for videos with complex backgrounds and captions in one or two colors, the method extracts captions well.
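The AND mask test for caption change can be sketched as follows: AND the binary text masks of two frames, and if too little of the old mask survives, judge that the caption has changed. The keep_ratio value is an assumption for the sketch.

```python
def caption_changed(mask_a, mask_b, keep_ratio=0.6):
    """True if the overlap of two binary masks (flat lists of 0/1)
    covers less than keep_ratio of mask_a's text pixels."""
    a_on = sum(mask_a)
    if a_on == 0:
        return True  # no caption before: treat as changed
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return overlap / a_on < keep_ratio
```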

6.
News Caption Detection Based on Corner Detection and Adaptive Thresholding   Cited by: 3 (self-citations: 2, other citations: 1)
张洋  朱明 《计算机工程》2009,35(13):186-187
Existing methods for extracting captions from news video frames generally suffer from low accuracy and low detection speed, and perform especially poorly on title text with low resolution and low contrast. To address this, a caption detection method based on corner detection and adaptive thresholding is proposed. Corner detection locates the text region in the title frame and a gray-level transform is applied; adaptive thresholding then binarizes the region, producing a text image suitable for OCR. Experiments show that the method quickly and effectively extracts news video title captions of low resolution and contrast.
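The abstract does not name the adaptive thresholding scheme; Otsu's method is one standard choice, so the following is a sketch under that assumption: pick the gray level that maximizes the between-class variance of the two resulting pixel classes.

```python
def otsu_threshold(pixels):
    """Gray level t (0..254) maximizing between-class variance when
    pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    grand_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    cum_n, cum_sum = 0, 0
    for t in range(255):
        cum_n += hist[t]
        cum_sum += t * hist[t]
        if cum_n == 0 or cum_n == total:
            continue  # one class empty: variance undefined
        w0 = cum_n / total
        w1 = 1.0 - w0
        mu0 = cum_sum / cum_n
        mu1 = (grand_sum - cum_sum) / (total - cum_n)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

For a bimodal caption region (dark strokes on a bright plate, or vice versa), the returned threshold separates the two modes, which is what makes the binarized output OCR-ready.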

7.
Design and Implementation of a Video Text Information Inspection Tool   Cited by: 1 (self-citations: 0, other citations: 1)
In certain videos, on-screen text often contains sensitive information, which can lead to information leakage. A system for detecting and recognizing text in video frames is designed and implemented, recognizing and comparing text such as slogans, banners, and news captions. A dual-threshold shot segmentation algorithm is adopted; key frames are extracted using color histogram information; stable image regions are extracted with the maximally stable extremal regions (MSER) algorithm; text regions are obtained via clustering and a cascade classifier; and finally the segmented text regions are recognized by OCR. Experiments show that the system achieves a high detection and recognition rate for text against complex backgrounds.
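The dual-threshold shot-boundary test mentioned above can be sketched as follows: a large histogram difference between consecutive frames marks an abrupt cut, while an intermediate difference marks a candidate gradual transition. The threshold values are assumptions for the example.

```python
def hist_diff(h1, h2):
    """Normalized L1 distance between two color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / max(1, sum(h1) + sum(h2))

def classify_transition(h1, h2, t_low=0.2, t_high=0.5):
    """Dual-threshold rule: cut, candidate gradual transition, or same shot."""
    d = hist_diff(h1, h2)
    if d >= t_high:
        return "cut"
    if d >= t_low:
        return "candidate_gradual"
    return "same_shot"
```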

8.
王祖辉  姜维 《计算机工程》2009,35(13):188-189
Existing methods for extracting captions from news video frames generally suffer from low accuracy and low detection speed, and perform especially poorly on title text with low resolution and low contrast. To address this, a caption detection method based on corner detection and adaptive thresholding is proposed. Corner detection locates the text region in the title frame and a gray-level transform is applied; adaptive thresholding then binarizes the region, producing a text image suitable for OCR. Experiments show that the method quickly and effectively extracts news video title captions of low resolution and contrast.

9.
A Video Text Localization Method Based on LBP and FSVM   Cited by: 1 (self-citations: 0, other citations: 1)
李丽洁  李金  宋阳  王磊 《计算机工程》2011,37(24):144-146
A video text localization method based on local binary patterns (LBP) and a fuzzy support vector machine (FSVM) is proposed. Coarse text detection is performed using edge information and morphological operations; candidate text regions are formed using projection histograms and heuristic rules; LBP features are extracted as texture features; and the FSVM localizes the candidate text regions precisely, producing the final text blocks. Experimental results show that the method localizes video text well and is robust.
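The LBP texture feature named above can be sketched with a minimal 8-neighbor implementation; real systems typically use uniform or rotation-invariant variants, so this is an illustration rather than the paper's exact feature.

```python
def lbp_histogram(img):
    """Normalized 256-bin histogram of 8-neighbor LBP codes over the
    interior pixels of a grayscale image given as nested lists."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    count = 0
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # clockwise ring
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for i, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << i
            hist[code] += 1
            count += 1
    return [v / count for v in hist]
```

The histogram would then be fed to the FSVM as the per-region texture descriptor.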

10.
A New Multi-Frame Combination Algorithm for Detecting Video Title Text   Cited by: 5 (self-citations: 0, other citations: 5)
Title text in video usually plays an important role in video indexing and retrieval. A new detection algorithm for video title text is proposed. First, a new multi-frame combination technique reduces background complexity: a minimum (or maximum) pixel-value search is performed over frames along the time axis, with the search mode determined by the Sobel edge map. Text/non-text classification is then performed block by block: a window scans the image and, using Sobel edges as features, decides whether each block is text. A two-level pyramid detects text of different sizes. Finally, a new iterative text-region decomposition method locates text-region boundaries more precisely. Experimental results show that the algorithm achieves high precision and recall.
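The block-wise scanning classification can be sketched as a window sliding over a binary Sobel edge map and thresholding edge density; the window size, step, and density threshold below are assumptions, not the paper's values.

```python
def scan_text_windows(edge_bitmap, win=4, step=2, density=0.4):
    """Top-left coordinates of win x win windows whose fraction of edge
    pixels (1s in the binary bitmap) reaches `density`."""
    h, w = len(edge_bitmap), len(edge_bitmap[0])
    hits = []
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            on = sum(edge_bitmap[y + dy][x + dx]
                     for dy in range(win) for dx in range(win))
            if on / (win * win) >= density:
                hits.append((y, x))
    return hits
```

Running the same scan on a 2x-downsampled copy of the edge map gives the second level of the two-level pyramid mentioned above.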

11.
Detection of both scene text and graphic text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding of video. In this paper, we explore a new idea of classifying low contrast and high contrast video images in order to detect accurate boundaries of the text lines in video images. In this work, high contrast refers to sharpness while low contrast refers to dim intensity values in the video images. The method introduces heuristic rules based on a combination of filters and edge analysis for the classification purpose. The heuristic rules are derived from the fact that the number of Sobel edge components exceeds the number of Canny edge components in high contrast video images, and vice versa in low contrast video images. To demonstrate the use of this classification for video text detection, we implement a method based on Sobel edges and texture features for detecting text in video images. Experiments are conducted using video images containing both graphic text and scene text with different fonts, sizes, languages, and backgrounds. The results show that the proposed method outperforms existing methods in terms of detection rate, false alarm rate, misdetection rate, and inaccurate boundary rate.
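Once the two edge maps are computed (by any Sobel and Canny implementation), the classification rule stated above reduces to a count comparison; a sketch of that final step:

```python
def classify_contrast(sobel_map, canny_map):
    """'high' if the Sobel edge map has more edge pixels than the Canny
    map, else 'low', following the heuristic described in the abstract.
    Both maps are binary nested lists of 0/1."""
    n_sobel = sum(sum(row) for row in sobel_map)
    n_canny = sum(sum(row) for row in canny_map)
    return "high" if n_sobel > n_canny else "low"
```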

12.
A transductive multi-label video concept detection algorithm based on a sparse graph structure is proposed. First, visual similarity among samples and distributional correlation among concepts are mined via sparse signal representation. Then a multi-label sparse graph is built on a discrete hidden Markov random field to perform transductive semi-supervised video concept detection. Sparse representation of the correlation information effectively removes the influence of redundant information, reduces the complexity of the graph-based classification problem, and improves detection efficiency and classification quality. Experiments on the TRECVID2005 dataset compare the algorithm with a range of supervised and semi-supervised classifiers; the results verify its effectiveness.

13.
Video semantic analysis (VSA) has received significant attention in machine learning for some time, particularly in video surveillance applications using sparse representation (SR) and dictionary learning, which studies have shown to significantly improve the classification performance of video detection analysis. In VSA, the locality structure of video semantic data, which carries the more discriminative information, is essential for classification; however, current SR-based approaches make only modest use of this discriminative information, and features from videos of the same category do not receive similar coding outcomes. To handle these issues, we first propose an improved deep learning algorithm, the locality deep convolutional neural network (LDCNN), to better extract salient features and obtain local information from semantic video. Second, we propose a novel dictionary learning method, deep locality-sensitive discriminative dictionary learning (DLSDDL), for VSA. In DLSDDL, a discriminant loss function for the video category, based on the sparse coefficients of sparse coding, is introduced into the structure of the locality-sensitive dictionary learning (LSDL) method. After solving the optimized dictionary, the sparse coefficients of the test video feature samples are obtained, and the video semantic classification result is produced by minimizing the error between the original and reconstructed samples. Experimental results show that the proposed DLSDDL technique considerably increases the efficiency of video semantic detection compared with the competing methods in our experiment.

14.
To overcome the poor classification performance of redundant dictionaries in sparse representation, a sparse representation algorithm based on dictionary optimization is proposed. The algorithm defines a new sparse-representation-based classification rule and optimizes the dictionary by minimizing the mean intra-class Euclidean distance and maximizing the mean inter-class Euclidean distance among the atoms of the redundant dictionary, producing an optimized dictionary for sparse feature representation. Applied to sparse feature extraction and classification of video shots, experiments show that the optimized dictionary clearly improves the recognition rate.

15.
Text Detection in Video   Cited by: 12 (self-citations: 0, other citations: 12)
Text appearing in video often carries rich information and is an important semantic cue for video analysis; detected and recognized text can serve as an index for content-based video retrieval. This paper briefly reviews existing text detection methods and, drawing on the characteristics of text in video, proposes an efficient video text detection method that performs well on both Chinese and English text under ordinary image quality. Experimental results are given and related issues are discussed.

16.
In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moment features with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max-Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual-nearest-neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to decide whether a block is a true text block. If a frame produces at least one true text block it is considered a text frame; otherwise it is a non-text frame. Experimental results on different text and non-text datasets, including two public datasets and our own data, show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in terms of recall and precision at both the block and frame levels.
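The block-selection step, k-means with k = 2 over per-block feature values keeping the cluster more likely to be text, can be sketched with a scalar feature per block (e.g. a wavelet-energy score). Treating the higher-centered cluster as the text cluster is an assumption consistent with text blocks having stronger responses.

```python
def kmeans2(values, iters=20):
    """1-D 2-means: returns (labels, centers); centers start at the
    min and max of the data."""
    c = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        for k in (0, 1):
            members = [v for v, l in zip(values, labels) if l == k]
            if members:
                c[k] = sum(members) / len(members)
    return labels, c

def probable_text_blocks(block_features):
    """Indices of blocks assigned to the higher-response cluster."""
    labels, centers = kmeans2(block_features)
    text = 0 if centers[0] > centers[1] else 1
    return [i for i, l in enumerate(labels) if l == text]
```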

17.
Text detection is important in the retrieval of texts from digital pictures, video databases and webpages. However, it can be very challenging since the text is often embedded in a complex background. In this paper, we propose a classification-based algorithm for text detection using a sparse representation with discriminative dictionaries. First, the edges are detected by the wavelet transform and scanned into patches by a sliding window. Then, candidate text areas are obtained by applying a simple classification procedure using two learned discriminative dictionaries. Finally, the adaptive run-length smoothing algorithm and projection profile analysis are used to further refine the candidate text areas. The proposed method is evaluated on the Microsoft common test set, the ICDAR 2003 text locating set, and an image set collected from the web. Extensive experiments show that the proposed method can effectively detect texts of various sizes, fonts and colors from images and videos.  相似文献   
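The run-length smoothing refinement named above can be illustrated with the classic horizontal RLSA rule: runs of background pixels no longer than a threshold between two foreground pixels are filled, merging characters into candidate text lines. The threshold is adaptive in the paper; it is fixed here for the sketch.

```python
def rlsa_row(row, max_gap):
    """Horizontal run-length smoothing on one binary row: fill runs of
    0s of length <= max_gap that lie between two 1s."""
    out = list(row)
    last_one = -1
    for i, v in enumerate(row):
        if v == 1:
            if last_one != -1 and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out
```

Applying the rule row by row, and its vertical counterpart column by column, yields the smoothed candidate regions that projection profile analysis then refines.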

18.
Video texts are closely related to the video content. Video text information can facilitate content-based video analysis, indexing, and retrieval. Video sequences are usually compressed before storage and transmission, and a basic step of text-based applications is text detection and localization. In this paper, an overlaid text detection and localization method is proposed for H.264/AVC compressed videos using the integer discrete cosine transform (DCT) coefficients of intra-frames. The main contributions of this paper are two: 1) coarse text-block detection using thresholds adaptive to block size and quantization parameter; 2) text line localization according to the characteristics of text in intra frames of the H.264/AVC compressed domain. Comparisons are made with a pixel-domain text detection method for H.264/AVC compressed video. Text detection results on five H.264/AVC video sequences under various qualities show the effectiveness of the proposed method.
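The coarse detection step rests on the observation that text blocks carry high AC energy in the DCT domain. A naive orthonormal 2-D DCT-II and an AC-energy test sketch that idea; the fixed threshold is an assumption, whereas the paper adapts it to block size and quantization parameter.

```python
import math

def dct2(block):
    """Naive orthonormal 2-D DCT-II of an N x N block (nested lists)."""
    n = len(block)
    def a(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = a(u) * a(v) * s
    return out

def ac_energy(block):
    """Sum of squared DCT coefficients excluding the DC term."""
    coeffs = dct2(block)
    return sum(c * c for row in coeffs for c in row) - coeffs[0][0] ** 2

def is_text_block(block, thresh=1000.0):
    """Coarse test: flat background has near-zero AC energy, text strokes
    produce strong high-frequency coefficients."""
    return ac_energy(block) >= thresh
```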

19.
Existing methods for detecting and extracting video captions are briefly reviewed along with the basic theory and algorithms of independent component analysis (ICA); the application of ICA to video image sequence processing is explored, and a new ICA-based method for detecting and extracting video captions is proposed. Simulation results show that under conditions that challenge traditional methods to varying degrees, such as complex backgrounds, low image resolution, and captions varying in font, size, and color, the method detects and extracts video captions well.

20.
In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moment features with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same feature combination with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block, based on the observation that the pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and all text candidates are then mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), an iterative algorithm based on a nearest-neighbor concept. APBG is applied to the text representatives to fix the bounding boxes of multi-oriented text lines in the video frame, and directional information is used to eliminate false positives. Experimental results on a variety of datasets, including non-horizontal and horizontal data, publicly available data (Hua's data), and ICDAR-03 competition data (camera images), show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.
