Similar Documents
1.
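A method is proposed for automatically extracting metadata from documents such as scientific and technical literature. It combines automatic induction with a feature-similarity algorithm: an inductive learning algorithm based on feature similarity automatically generates extraction rules, which are then used to extract metadata from documents. The method uses certain intrinsic properties of a document to divide its content into blocks, generates extraction rules automatically by induction, matches the generated rules against the blocks using feature similarity, and then extracts the document's metadata automatically. This improves both the efficiency of automatic rule generation and the accuracy of metadata extraction.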
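The rule-matching step can be illustrated with a minimal sketch. It assumes, hypothetically, that each content block and each induced rule is represented as a sparse feature vector (a dict mapping feature name to weight) and that similarity is cosine similarity with a fixed threshold. The abstract does not say which features or similarity measure the authors use; `match_blocks`, the feature names, and the threshold are illustrative only.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match_blocks(blocks, rules, threshold=0.8):
    """Assign blocks to metadata fields by feature similarity (hypothetical scheme).

    blocks: list of (text, features) pairs, in document order.
    rules: non-empty dict mapping a metadata field name to a prototype
           feature dict standing in for an induced extraction rule.
    Each block goes to its most similar rule; blocks scoring below the
    threshold are ignored, and each field keeps the first block assigned.
    """
    result = {}
    for text, feats in blocks:
        field, score = max(
            ((f, cosine(feats, proto)) for f, proto in rules.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold and field not in result:
            result[field] = text
    return result
```

Blocks that share no features with any rule score 0 and stay unassigned, which is one simple way to keep body text out of the metadata fields.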

2.
3.
4.
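Automatic Extraction of Shadow Information from IKONOS Imagery Based on Image Fusion   Total citations: 5; self-citations: 0; citations by others: 5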
Huang Hao, Zhang Youjing, Ma Xuemei. Remote Sensing Information (遥感信息), 2004, (4): 29-31, i002
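Shadows of ground objects are a characteristic component of high-resolution satellite imagery, and extracting and using this shadow information effectively is a problem of considerable significance and practical value for applications of such imagery. This paper first fuses the IKONOS panchromatic band with multispectral bands 2, 3, and 4 in the IHS (intensity-hue-saturation) color space, and then applies spectral angle mapper (SAM) classification to extract shadows from the IKONOS imagery automatically. Experimental results show that the method performs well, with an average accuracy of 85.3%. It provides an effective approach to the automatic extraction of shadow information from high-resolution satellite imagery.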
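The spectral angle mapper (SAM) step can be sketched as follows. SAM treats each pixel's band values as a vector and assigns it to the reference spectrum with the smallest angle θ = arccos(x·r / (‖x‖‖r‖)). This minimal sketch assumes the IHS fusion has already produced per-pixel band vectors; the reference spectra and the `max_angle` threshold are hypothetical values, not taken from the paper.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum.

    Assumes both spectra are non-zero vectors of the same length.
    """
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def sam_classify(pixel, references, max_angle=0.1):
    """Return the class whose reference spectrum has the smallest angle,
    or None if even the best match exceeds max_angle (radians).

    references: dict mapping class name -> reference spectrum (hypothetical).
    """
    best = min(references, key=lambda c: spectral_angle(pixel, references[c]))
    return best if spectral_angle(pixel, references[best]) <= max_angle else None
```

Because the angle ignores vector length, a pixel whose spectrum is a uniformly scaled copy of a reference (for example, a darker version of it) gets an angle of zero to that reference.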

5.
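To meet the pressing demand for building information in urban development planning, updating of national geographic-conditions information systems, digital cities, and military reconnaissance, this paper applies the semi-supervised discriminant analysis (SDA) algorithm to built-up area extraction from high-resolution SAR imagery, with the aim of extracting built-up areas quickly and improving the recognition of urban ground targets. Using Radarsat-2 and TerraSAR-X images as experimental data, texture features are computed from the gray-level co-occurrence matrix (GLCM); SDA is then applied for feature extraction, and the resulting features are fed to Otsu's method to extract built-up areas; finally, the classification results are post-processed. The results are compared with those of linear discriminant analysis (LDA) and locality preserving projections (LPP). They show that SDA generalizes well and, when little prior class information is available, is suitable for feature extraction from high-resolution SAR imagery, extracting built-up areas quickly and effectively.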
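Otsu's method picks the threshold that maximizes the between-class variance σ_B²(t) = ω₀(t)ω₁(t)[μ₀(t) − μ₁(t)]². The sketch below assumes the input has been quantized to integer levels 0 to `levels - 1`; applying it to SDA features, as the paper does, would require quantizing those features first, which is an assumption about preprocessing. The GLCM texture and SDA steps are not shown.

```python
def otsu_threshold(values, levels=256):
    """Otsu threshold for integer values in [0, levels).

    Returns t such that values <= t form class 0 and values > t form class 1.
    Class weights are raw pixel counts rather than probabilities; this scales
    the between-class variance by a constant and does not change the argmax.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the lowest optimal t
            best_var, best_t = var_between, t
    return best_t
```

For a cleanly bimodal input such as fifty pixels at level 10 and fifty at level 200, every split between the two modes gives the same variance, and the sketch returns the lowest one, 10.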

6.
The World Wide Web is transforming itself into the largest information resource, making information extraction (IE) from the Web an important and challenging problem. In this paper, we present an automated, domain-independent IE system that can automatically transform a given Web page into a semi-structured hierarchical document using presentation regularities. The resulting documents are weakly annotated, in the sense that they may contain many incorrect annotations and missing labels. We also describe how to improve the quality of weakly annotated data by using domain knowledge in the form of a statistical domain model. We demonstrate that such a system can recover from ambiguities in the presentation and boost the overall accuracy of a base information extractor by up to 20%. Our experimental evaluations with TAP data, computer science department Web sites, and RoadRunner document sets indicate that our algorithms can scale up to very large data sets.

7.
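Implementation of an Information Extraction System Based on Rule Induction   Total citations: 2; self-citations: 0; citations by others: 2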
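With the rapid growth of Web information, information extraction technology is well suited to extracting the required factual data from large volumes of documents. Using Document Object Model (DOM) parsing together with defined retrieval, extraction, and mapping rules, an information extraction system with rule-induction capability was designed and implemented for the automatic retrieval of Web information. Within the rule-induction framework, the WHISK learning algorithm used to generate extraction patterns was also compared experimentally, and the results show that the system has good inductive learning ability for both single-slot and multi-slot data.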

8.
Acquisition and Forecasting of Market Quotation Data from the Web   Total citations: 1; self-citations: 0; citations by others: 1
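Extracting market quotation data from Web pages for forecasting and analysis is of considerable significance. This paper proposes an algorithm for extracting market quotation data from the Web. It relies mainly on empirical regularities such as "market data usually appear on a Web page as the data table with the largest area": it first automatically identifies the largest data table, then converts it into a DOM tree, and finally extracts the values of the DOM tree nodes. Unlike traditional algorithms, it locates the quotation region automatically, without requiring the user to define the data region to extract. A prototype system for forecasting agricultural product prices was also designed: for a given product, it automatically retrieves price data from specific websites and forecasts monthly prices, and experiments show good forecasting performance.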
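The first two steps of the algorithm, finding the largest table and reading its cell values, can be sketched with Python's standard `html.parser` module. Rendered area is not available from raw HTML, so this sketch assumes, as a proxy, that the "largest" table is the one with the most cells. `TableCollector` and `largest_table` are hypothetical names, and the parser builds nested lists rather than a full DOM tree.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Open tables are kept on a stack, so a nested table is collected as its own
    table; the text of an outer cell that contains a nested table is not kept.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._stack = []   # tables currently open
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([])
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._stack[-1][-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._stack:
            self.tables.append(self._stack.pop())


def largest_table(html: str):
    """Return the table with the most cells (a proxy for largest area), or None."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return max(parser.tables, key=lambda t: sum(len(row) for row in t), default=None)
```

On a page with a one-cell navigation table and a two-by-two price table, `largest_table` returns the price table as `[["Date", "Price"], ["2020-01", "3.5"]]`.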

9.
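Automatic Building Extraction Based on an Improved Straight-Line Snake Algorithm   Total citations: 1; self-citations: 0; citations by others: 1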
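To study automatic and semi-automatic extraction of buildings from aerial imagery, the straight-line Snake algorithm was analyzed and improved: an average connectivity distance term is added to its internal energy function, the second-order term is modified, the external energy function is normalized, and an additional external force is introduced. The improved straight-line Snake algorithm is then combined with a greedy algorithm to extract buildings. The method extracts buildings correctly and automatically, and experimental results show that the new algorithm improves extraction efficiency.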

10.
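Objective: The theoretical basis of Gestalt psychology is that perceiving parts of an object leads to an understanding of the whole. This paper applies that idea to building extraction and proposes a method for extracting buildings from high-resolution remote sensing imagery that accounts for both target details and overall geometric features. Method: First, the SIFT algorithm extracts feature points as candidate edge points; then a Gestalt principle of sequential continuity is defined to discriminate edge points, yielding an edge-point set; finally, edges are fitted to this point set to extract the buildings. Result: The proposed algorithm was applied to building extraction from WorldView-2 imagery. Compared with a building extraction algorithm based on multiscale segmentation and region merging, it extracts buildings more accurately and completely. The results were evaluated with four quantitative metrics (branching factor, miss factor, detection rate, and completeness): both the detection rate and the completeness of the proposed algorithm exceed those of the comparison algorithm, and its detection rate is above 95% in every case, which verifies the effectiveness and accuracy of the proposed Gestalt-based building extraction algorithm. Conclusion: The Gestalt-based algorithm accurately captures the detailed features of buildings while preserving their overall geometric outlines, and accurately extracts buildings from high-resolution remote sensing imagery. It is designed for high-resolution imagery and is suited to buildings with straight-line edges. Extraction accuracy decreases as image resolution decreases, so a spatial resolution of 5 m or better is recommended.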

11.
Dynamic web sites commonly return information in the form of lists and tables. Hand-crafting an extraction program for a specific template is straightforward but time-consuming, so it is desirable to generate template extraction programs automatically from examples of lists and tables in HTML documents. Supervised approaches have been shown to achieve high accuracy, but they require manual labelling of training examples, which is also time-consuming. Fully unsupervised approaches, which extract rows and columns by detecting regularities in the data, cannot provide sufficient accuracy for practical domains. We describe a novel technique, Post-supervised Learning, which exploits unsupervised learning to avoid the need for training examples while involving the user minimally to achieve high accuracy. We have developed unsupervised algorithms to extract the number of rows and adopted a dynamic programming algorithm for extracting columns. Compared with fully supervised techniques, our method achieves high performance with minimal user input.

12.
Zhou Xian, Liu Yilun, Li Xuejun. Journal of Computer Applications (计算机应用), 2006, 26(5): 1214-1216
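Defect extraction techniques were studied with the characteristics of X-ray inspection images of carbon products in mind. First, a target boundary extraction algorithm and a wavelet-transform-based image enhancement algorithm were designed to enhance the target region of the original image and remove its background. On this basis, two approaches are proposed for different defect types: one uses the wavelet transform to extract defect edges, and the other combines mathematical morphology with the iterative threshold method to extract defect regions. Experimental results show that both achieve good automatic extraction and segmentation of defects, laying a solid foundation for the extraction and selection of defect feature parameters.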
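The iterative threshold method mentioned above is commonly implemented as the intermeans (Ridler-Calvard) scheme: start from the mean gray level, split the pixels at the current threshold, and move the threshold to the midpoint of the two class means until it stops changing. The abstract does not specify the authors' variant or stopping rule, so the `eps` and `max_iter` parameters here are assumptions, and the wavelet and morphology steps are omitted.

```python
def iterative_threshold(values, eps=0.5, max_iter=100):
    """Intermeans (Ridler-Calvard style) threshold for a non-empty list of gray levels.

    Pixels <= t are treated as background and pixels > t as foreground.
    """
    t = sum(values) / len(values)  # start from the global mean
    for _ in range(max_iter):
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:  # all pixels fall on one side: nothing to refine
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:  # threshold has stabilized
            return new_t
        t = new_t
    return t
```

Starting from the global mean means a small bright defect on a large dark background first pulls the threshold low; the midpoint update then moves it toward the gap between the two populations.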

13.
Feature extraction is an important component of a pattern recognition system. It performs two tasks: transforming the input parameter vector into a feature vector and/or reducing its dimensionality. A well-defined feature extraction algorithm makes the classification process more effective and efficient. Two popular methods for feature extraction are linear discriminant analysis (LDA) and principal component analysis (PCA). In this paper, the minimum classification error (MCE) training algorithm, which was originally proposed for optimizing classifiers, is investigated for feature extraction. A generalized MCE (GMCE) training algorithm is proposed to remedy the shortcomings of the MCE training algorithm. LDA, PCA, MCE, and GMCE all extract features through a linear transformation. The support vector machine (SVM) is a recently developed pattern classification algorithm that uses non-linear kernel functions to achieve non-linear decision boundaries in the parametric space. SVM is also investigated in this paper and compared with the linear feature extraction algorithms.
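PCA, one of the two classical linear baselines named above, can be sketched in plain Python: center the data, form the sample covariance matrix, and take its dominant eigenvector (here found by power iteration) as the first projection axis. This is a minimal illustration of standard PCA only, not the paper's MCE or GMCE training; in practice a library eigendecomposition such as `numpy.linalg.eigh` would normally be used instead.

```python
import math


def top_principal_component(data, iters=200):
    """Mean and first principal axis of `data` (at least two equal-length vectors).

    The axis is found by power iteration on the sample covariance matrix;
    its sign is arbitrary.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered data X.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # start vector lies in the null space of C
        v = [x / norm for x in w]
    return mean, v


def project(data, mean, axis):
    """One-dimensional feature: coordinate of each centered sample along `axis`."""
    return [sum((x - m) * a for x, m, a in zip(row, mean, axis)) for row in data]
```

Power iteration converges to the dominant eigenvector as long as the starting vector is not orthogonal to it and the top eigenvalue is strictly larger than the others; the all-ones start is an arbitrary choice.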

14.
Automatic Web Page Data Extraction System   Total citations: 6; self-citations: 0; citations by others: 6
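The Internet contains a large number of semi-structured HTML pages. To make use of this rich Web data, the data must be extracted back out of the pages. This paper introduces a new tree-structure-based information extraction method and DAE (DOM-based Automatic Extraction), a system that generates wrappers automatically and converts HTML page data into XML data. Extraction requires essentially no manual intervention, so the process is automated. The method can be applied in information-search agents, data integration systems, and similar settings.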

15.
In this paper, a new method, named L-tree match, is presented for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from 100 GB of biological documents for the first online integrated biological data warehouse in China.

16.
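When deep neural networks are used to extract roads from Gaofen-2 (GF-2) remote sensing images, much detail is lost and the surroundings of roads are not adequately taken into account. Building on existing research, this paper proposes an improved scheme for road extraction from remote sensing images based on fully convolutional networks (FCN). After studying the principles of the FCN algorithm, the scheme splits pre-color-adjusted GF-2 images into tiles of a fixed size and feeds the tiles, together with their corresponding labels, into an improved FCN-based network; by adding residual units and increasing the number of network layers, it produces road extraction maps with higher recognition accuracy. Experiments show that, on the same samples, the method improves road extraction from GF-2 satellite imagery, with better road completeness and accuracy.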
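The step of splitting the color-adjusted GF-2 scenes into fixed-size tiles can be sketched as follows. The tile size, zero padding at the image border, and the single-band list-of-rows representation are assumptions for illustration; the abstract gives no tile size. Applying the same function to an image band and to its label mask keeps the image/label pairs aligned.

```python
def tile_image(image, tile, fill=0):
    """Split a 2-D image (list of rows) into tile x tile patches.

    Returns a list of ((row, col), patch) pairs in row-major order, where
    (row, col) is the patch's top-left corner. Patches at the right and
    bottom edges are padded with `fill` so every patch has the same shape,
    as a fixed-size network input would require.
    """
    h, w = len(image), len(image[0])
    tiles = []
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            patch = [[image[r][c] if r < h and c < w else fill
                      for c in range(c0, c0 + tile)]
                     for r in range(r0, r0 + tile)]
            tiles.append(((r0, c0), patch))
    return tiles
```

Keeping the top-left offsets makes it possible to stitch per-tile predictions back into a full-scene road map afterwards.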

17.
This work proposes a novel adaptive approach for character segmentation and feature vector extraction from seriously degraded images. A histogram-based algorithm automatically detects fragments and merges them before the fragmented characters are segmented. A morphological thickening algorithm automatically locates reference lines for separating overlapped characters. A morphological thinning algorithm, together with a segmentation cost calculation, automatically determines the baseline for segmenting connected characters. In short, the approach detects whether characters are fragmented, overlapped, or connected and adaptively applies the corresponding one of the three algorithms without manual fine-tuning. Seriously degraded real-world images, such as license plate images, are used in the experiments to evaluate the robustness, flexibility, and effectiveness of the approach. The feature vectors output by the system preserve useful information more accurately, for use as input to an automatic pattern recognition system.
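The histogram idea behind this kind of segmentation can be illustrated with a generic vertical-projection baseline: count foreground pixels per column and cut wherever a run of empty columns occurs. This is not the authors' adaptive method; the `min_gap` parameter is a hypothetical knob showing how bridging short gaps merges the fragments of a broken character into one segment.

```python
def column_projection(binary):
    """Number of foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_by_projection(binary, min_gap=1):
    """Split a binary text-line image into character column ranges [start, end).

    A segment is closed only after at least `min_gap` consecutive empty
    columns, so gaps narrower than `min_gap` are bridged.
    """
    proj = column_projection(binary)
    segments, start, gap = [], None, 0
    for x, count in enumerate(proj):
        if count > 0:
            if start is None:
                start = x  # first column of a new segment
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last foreground column was x - gap; end is exclusive.
                segments.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:  # close a segment that runs to the right edge
        segments.append((start, len(proj) - gap))
    return segments
```

With `min_gap=1` every empty column separates characters; raising it treats narrow breaks as fragments of the same character rather than as character boundaries.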

18.
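Building Extraction from High-Resolution Remote Sensing Images Based on Invariant Moments   Total citations: 1; self-citations: 0; citations by others: 1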
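To extract image features effectively, invariant-moment algorithms are used to extract urban building areas from two kinds of high-resolution remote sensing imagery, IKONOS and WorldView. The image data are first processed with Canny edge detection and marker-controlled watershed segmentation; on this basis, Hu invariant moments and affine invariant moments are each used to extract features. Evaluation of the experimental results shows that affine invariant moments perform markedly better than Hu moments for building feature extraction, and further demonstrates that using invariant moments to extract building features from high-resolution remote sensing imagery is feasible and effective.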
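Hu's invariant moments are built from central moments μ_pq normalized as η_pq = μ_pq / μ₀₀^(1+(p+q)/2). The sketch below computes only the first two of Hu's seven invariants, φ₁ = η₂₀ + η₀₂ and φ₂ = (η₂₀ − η₀₂)² + 4η₁₁², for a grayscale or binary region given as a list of rows; the Canny, watershed, and affine-moment steps described in the abstract are not shown. It assumes the region contains at least one non-zero pixel.

```python
def hu_first_two(image):
    """First two Hu moment invariants (phi1, phi2) of an image given as a list of rows."""
    # Non-zero pixels as (x, y, intensity); x is the column, y the row.
    pts = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xc = sum(x * v for x, _, v in pts) / m00  # centroid
    yc = sum(y * v for _, y, v in pts) / m00

    def mu(p, q):
        """Central moment mu_pq (translation invariant)."""
        return sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pts)

    def eta(p, q):
        """Scale-normalized central moment eta_pq (mu_00 equals m00)."""
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Shifting the region or rotating it by 90° leaves both values unchanged up to floating-point error, which is the property that makes them usable as shape features.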

19.
Ontology Concept Extraction Based on Web Data   Total citations: 1; self-citations: 0; citations by others: 1
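Ontologies are increasingly important in knowledge management and the Semantic Web, but building an ontology often takes a great deal of time, and maintaining it afterwards is also laborious for knowledge managers. Automatically constructing domain ontologies can overcome the shortcomings of manual methods and has become a current research focus. Concepts are among the most important components of an ontology, and the efficiency and accuracy with which concepts are automatically extracted from semi-structured Web documents directly determine the quality of the automatically built ontology. This paper proposes an automatic model for extracting ontology concepts that does not depend on a domain dictionary or a core ontology, and that can quickly and effectively build and update domain ontology concepts automatically by mining Chinese Web text.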

20.
Personal information extraction, which extracts the persons in question and their related information (such as biographical information and occupation) from the Web, is an important component in constructing social networks (a kind of semantic web). For this practical task, two important issues must be addressed: personal named entity ambiguity and the extraction of personal information for a specific person. For personal named entity ambiguity, a common phenomenon in the fast-growing Web, we propose a robust system that extracts lightweight features from broad resources with a totally unsupervised approach. The experiments show that these lightweight features not only improve performance but also increase the robustness of a disambiguation system. To extract the information of the focus person, we introduce an integrated system that effectively reuses and combines current well-developed tools for Web data while identifying the expression properties of Web data. We show that our flexible extraction system achieves state-of-the-art performance, especially high precision, which is very important for real applications.
