Similar literature
 Found 20 similar documents; search took 553 ms
1.
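马建刚, 马应龙. 《计算机应用》 (Journal of Computer Applications), 2019, 39(6): 1696-1700
Efficient classification of judicial documents over massive collections of legal texts supports current intelligent-justice applications such as similar-case recommendation, document retrieval, judgment prediction, and sentencing assistance. General-domain text classification methods perform poorly on judicial texts because they ignore the complex structure and knowledge semantics of the domain. To address this, a semantics-driven method for learning and classifying judicial documents is proposed. First, a domain knowledge model for the judicial domain is proposed and built to express document-level semantics clearly; next, domain knowledge is extracted from judicial documents on the basis of this model; finally, a graph long short-term memory model (Graph LSTM) is used to train on and classify the documents. Experimental results show that the method clearly outperforms the commonly used long short-term memory (LSTM) model, multinomial logistic regression, and support vector machines in accuracy and recall.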

2.
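Design and Implementation of an OCPN-Based SMIL Document Authoring Platform   Total citations: 2, self-citations: 0, citations by others: 2
包小源, 金彦钟, 宋再生. 《计算机应用》 (Journal of Computer Applications), 2004, 24(1): 126-128, 133
SMIL-based multimedia document authoring platforms are a current focus of applied research. The SMIL Authoring Tool (SAT) platform follows the SMIL 2.0 definition, takes OCPN as the basic model for defining multimedia synchronization timing, and was implemented with an object-oriented approach after analyzing the strengths and weaknesses of existing systems in China and abroad. Compared with existing systems, SAT uses a tree organization structure together with OCPN to set the timing between nodes and the detailed synchronization information inside each node; the two complement each other and let users configure synchronization timing simply.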

3.
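Research and Implementation of an Object-Oriented Multimedia Database Management System   Total citations: 4, self-citations: 0, citations by others: 4
This paper presents a design approach and implementation method for an object-oriented multimedia database management system based on an object metamodel. It gives the operation algorithms for complex objects, the object storage structure, and the knowledge representation and inference mechanism for objects, and it designs and implements a visual query language for object structures.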

4.
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge repository, knowledge maps, often created by document clustering techniques, represent an appealing and promising approach. Various document clustering techniques have been proposed in the literature, but most deal with monolingual documents (i.e., written in the same language). However, as a result of increased globalization and advances in Internet technology, an organization often maintains documents in different languages in its knowledge repositories, which necessitates multilingual document clustering (MLDC) to create organizational knowledge maps. Motivated by the significance of this demand, this study designs a Latent Semantic Indexing (LSI)-based MLDC technique capable of generating knowledge maps (i.e., document clusters) from multilingual documents. The empirical evaluation results show that the proposed LSI-based MLDC technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision, and is capable of maintaining a good balance between monolingual and cross-lingual clustering effectiveness when clustering a multilingual document corpus.

5.
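Health-status management documents for satellite constellations involve querying and computing many telemetry parameters, must follow strict formatting requirements, and are laborious and time-consuming to compile by hand. To address this, a method for automatically generating such documents is proposed. The basic data types contained in the documents are categorized and analyzed, storage rules for configuration files are defined, document templates are customized, and an automatic document generation algorithm produces a data summary document from the template and the relevant parameters. The method enables knowledge reuse and generation of common content during document compilation and establishes a standardized, effective compilation workflow.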
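
The template-plus-parameters idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the template text, the placeholder names, the `generate_report` helper, and the voltage threshold are all hypothetical, and Python's standard `string.Template` stands in for whatever template format the paper uses.

```python
from string import Template

# Hypothetical document template; placeholder names are illustrative only.
REPORT_TEMPLATE = Template(
    "Satellite: $name\n"
    "Battery voltage (mean): $voltage_mean V\n"
    "Status: $status\n"
)


def mean(samples):
    """Arithmetic mean of a non-empty list of telemetry samples."""
    return sum(samples) / len(samples)


def generate_report(name, voltages, threshold):
    """Compute a summary value from telemetry and fill the template with it."""
    voltage_mean = mean(voltages)
    # The threshold would come from a configuration file in a real system.
    status = "nominal" if voltage_mean >= threshold else "check required"
    return REPORT_TEMPLATE.substitute(
        name=name,
        voltage_mean=f"{voltage_mean:.2f}",
        status=status,
    )


report = generate_report("SAT-01", [28.1, 27.9, 28.0], threshold=27.0)
print(report)
```

Keeping the template separate from the computation is what allows the same generation code to be reused across document types; only the template and the configured parameters change.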

6.
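The principle of the object model template is analyzed, rules for mapping the object model template to the relational model are proposed, and the mapping methods for the object class structure table and the object class attribute table are analyzed in detail.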

7.
Automatic text summarization (ATS) has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale corpora. However, there is still no guarantee that the generated summaries are grammatical, concise, and convey all the salient information in the original documents. To make the summarization results more faithful, this paper presents an unsupervised approach to ATS that combines rhetorical structure theory, a deep neural model, and domain knowledge. The architecture has three main components: domain knowledge base construction based on representation learning, an attentional encoder–decoder model for rhetorical parsing, and a subroutine-based model for text summarization. Domain knowledge can be used effectively for unsupervised rhetorical parsing, so that a rhetorical structure tree can be derived for each document. In the unsupervised rhetorical parsing module, the idea of translation is adopted to alleviate the problem of data scarcity. The subroutine-based summarization model depends purely on the derived rhetorical structure trees and can generate content-balanced results. To evaluate the summaries without a gold standard, we propose an unsupervised evaluation metric whose hyper-parameters are tuned by supervised learning. Experimental results show that, on a large-scale Chinese dataset, the proposed approach achieves performance comparable to existing methods.

8.
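To address the difficulty of extracting structured data from financial announcements quickly and efficiently, an information extraction method based on document structure and a Bi-LSTM-CRF network model is proposed. A custom algorithm generates a document structure tree, and rules extract the required node information from that tree; local sentence rules based on trigger words of information sentences are built to extract the information sentences that contain structured field information; structured field extraction is treated as a sequence labeling problem, a domain knowledge dictionary is added during word segmentation, and a Bi-LSTM-CRF neural network model is built to recognize field information. Experimental results show that the method can handle structured information extraction for multiple announcement types, with average F1 scores above 91% for both information sentence extraction and field extraction, which verifies its feasibility and practicality in product services.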

9.
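Existing vector space models ignore the semantic relations between terms when representing documents. To address this shortcoming, a new document vector representation based on association rules is proposed. Frequent co-occurrence relations among terms are analyzed in the generalized vector space model (GVSM) to obtain term co-occurrence semantics; association rules are used to analyze the association correlation between terms and to mine the latent association semantics among terms in a document; and the document is represented by a linear weighting of the co-occurrence semantics and the association semantics. Experimental results show that, compared with the BOW and GVSM models, document clustering with the association-rule document vector representation is more accurate.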
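
As a rough illustration of how term associations can enrich a bag-of-words vector, the sketch below estimates rule confidence conf(a -> b) = P(b in doc | a in doc) from document co-occurrence and adds a weighted share of it to each term's frequency. This is a simplification under stated assumptions, not the paper's formulation: the function names, the single weight `alpha`, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def confidence_table(docs):
    """Estimate conf[a][b] = (#docs containing a and b) / (#docs containing a)."""
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = set(doc)
        term_count.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    conf = {}
    for (a, b), n in pair_count.items():
        conf.setdefault(a, {})[b] = n / term_count[a]
    return conf


def enriched_vector(doc, vocab, conf, alpha=0.5):
    """Term frequency plus alpha times the association mass from the other terms."""
    tf = Counter(doc)
    vector = []
    for term in vocab:
        assoc = sum(
            tf[src] * conf.get(src, {}).get(term, 0.0)
            for src in tf
            if src != term
        )
        vector.append(tf[term] + alpha * assoc)
    return vector


corpus = [["cat", "pet"], ["cat", "pet", "food"], ["dog", "pet"]]
vocab = sorted({t for d in corpus for t in d})  # ['cat', 'dog', 'food', 'pet']
conf = confidence_table(corpus)
print(enriched_vector(["cat"], vocab, conf))
```

A document that mentions only "cat" receives non-zero weight on "pet" and "food", which is the effect a plain BOW vector cannot capture.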

10.
Ranking plays an important role in contemporary information search and retrieval systems. Among existing ranking algorithms, link analysis based algorithms have proven effective for ranking documents retrieved from large-scale text repositories such as the current Web. Recent developments in the semantic Web have raised considerable interest in designing new ranking paradigms for various semantic search applications. While ranking methods in this context exist, they have not gained much popularity. In this article we introduce the idea of the “Rational Research” model, which reflects the search behaviour of a “rational” researcher in a scientific research environment, and propose the RareRank algorithm for ranking entities in semantic search systems. In particular, we focus on elaborating the rationale and implementation of the algorithm. Experiments are performed using the RareRank algorithm and the results are evaluated by domain experts using popular ranking performance measures. A comparison study with existing link-based ranking algorithms reveals the benefits of the proposed method.
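
For readers unfamiliar with the link-based baselines mentioned in the abstract above, here is a minimal PageRank-style power iteration. It is a sketch of generic link analysis, not of RareRank, whose details the abstract does not give; the `pagerank` function, the damping value, and the toy graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to the nodes it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the uniform "teleport" share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph, node `c` is linked from both `a` and `b` and so ranks highest, while `b`, which nothing links to, keeps only the teleport share.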

11.
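After an ERP business model has been built, it needs to be output as documentation to help ERP system designers and developers understand the model and develop the system quickly. The documentation requirements of ERP models are analyzed, a page-structure-based ERP document model is proposed, the mapping between documents and the ERP model is established, and an XML-based document description language, DDL, is proposed. On this basis, a document generator is designed: standard document templates are configured to meet individual requirements for document format and content, and ERP model data is written automatically into the standard template to generate the DDL document and the final Word document.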

12.
Structure analysis of table form documents is an important issue because a printed document and even an electronic document do not provide logical structural information but merely geometrical layout and lexical information. To handle these documents automatically, logical structure information is necessary. In this paper, we first analyze the elements of the form documents from a communication point of view and retrieve the grammatical elements that appear in them. Then, we present a document structure grammar which governs the logical structure of the form documents. Finally, we propose a structure analysis system for table form documents based on the grammar. Because the rules are relatively simple, the grammar notation makes it easy to modify the grammar and keep it consistent. Another advantage of the grammar notation is that it can be used to generate documents from logical structure alone. In our system, documents are assumed to be composed of a set of boxes, which are classified into seven box types. The relations between an indication box and its associated entry box are then analyzed based on the semantic and geometric knowledge defined in the document structure grammar. Experimental results have shown that the system successfully analyzed several kinds of table forms.

13.
There are many different types of word processor on the market today, each storing documents in a different way. Hence, a document created using one particular word processor cannot be edited using another word processor. Documents are often exchanged in an office environment, and should be in a form acceptable by other vendors' word processing packages. To make the exchange of documents created using word processors general and flexible, a standard canonical form has been adopted, and is outlined here. A methodology for converting the word processor documents into canonical form is described, detailing document and canonical structure, look-up tables, the intermediate file structure, and logical/layout object information. The author concludes by working through the methodology using an example.

14.
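刘彤, 倪维健. 《计算机科学》 (Computer Science), 2015, 42(10): 275-280, 286
Documents in specialized domains often have pronounced structural characteristics: a document is typically made up of a relatively fixed set of text fields with different expressive functions, and these fields carry the relevant domain knowledge. Targeting the structured and domain-specific nature of such documents, an information retrieval model for structured domain documents is designed. The model first mines the domain document collection to build a structured model that reflects domain knowledge, and then uses it as the basis for a structured document retrieval algorithm that returns relevant domain documents for user queries. An application study was carried out on a typical class of domain documents, agricultural technology prescriptions: the proposed method was compared experimentally with traditional information retrieval methods on a real-world dataset of agricultural technology prescription documents, and a prototype prescription retrieval system was developed.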

15.
We consider the problem of scheduling an application on a computing system consisting of heterogeneous processors and data repositories. The application consists of a large number of file-sharing but otherwise independent tasks. The files initially reside on the repositories. The processors and the repositories are connected through a heterogeneous interconnection network. Our aim is to assign the tasks to the processors, to schedule the file transfers from the repositories, and to schedule the executions of tasks on each processor in such a way that the turnaround time is minimized. We propose a heuristic composed of three phases: initial task assignment, task assignment refinement, and execution ordering. We experimentally compare the proposed heuristic with three well-known heuristics on a large number of problem instances. The proposed heuristic runs considerably faster than the existing heuristics and obtains 10–14% better turnaround times than the best of the three existing heuristics.

16.
17.
The huge volume of distributed information that is nowadays available in electronic multimedia documents forces a lot of people to consume a significant percentage of their time looking for documents that contain information useful to them. The filtering of electronic documents seems hard to automate, partly because of document heterogeneity, but mainly because it is difficult to train computers to have an understanding of the contents of these documents and make decisions based on user-subjective criteria. In this paper, we suggest a model for the automation of content-based electronic document filtering, supporting multimedia documents in a wide variety of forms. The model is based on multi-agent technology and utilizes an adaptive knowledge base organized as a set of logical rules. Implementations of the model using the client-server architecture should be able to efficiently access documents distributed over an intranet or the Internet.

18.
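方铖, 曾平. 《计算机应用》 (Journal of Computer Applications), 2007, 27(10): 2498-2500
Existing data access object (DAO) patterns commonly suffer from several shortcomings: tight coupling with business objects, no support for dynamic extension of the software system, duplicated implementation code, and difficult maintenance. To address these problems, the pattern proposed here borrows ideas from data binding, introduces the concepts of metadata and metamodel, and exploits the independence of XML to obtain a more independent, dynamically extensible DAO pattern; a concrete application example illustrates how it is used. The novelty of the pattern is that adding a business object requires only a change to the mapping file, with no change to the DAOFactory class; and because a single DAO implementation class handles data access for all business objects, changing a piece of SQL syntax requires modifying only that class rather than the DAO implementation class of each business object in turn.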
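
The central idea of the abstract above, one DAO implementation driven by a mapping file, can be sketched as follows. This is an assumption-laden illustration rather than the authors' design: the XML element and attribute names, the `GenericDAO` class, and the tables are hypothetical, and Python with the standard `sqlite3` and `xml.etree.ElementTree` modules stands in for the original implementation language.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping file content; element and attribute names are illustrative.
MAPPING_XML = """
<mappings>
  <object name="User" table="users" key="id">
    <field name="id"/>
    <field name="name"/>
  </object>
  <object name="Order" table="orders" key="id">
    <field name="id"/>
    <field name="total"/>
  </object>
</mappings>
"""


class GenericDAO:
    """A single DAO class that serves every business object listed in the mapping."""

    def __init__(self, conn, mapping_xml):
        self.conn = conn
        self.meta = {}
        for obj in ET.fromstring(mapping_xml).findall("object"):
            self.meta[obj.get("name")] = {
                "table": obj.get("table"),
                "key": obj.get("key"),
                "fields": [f.get("name") for f in obj.findall("field")],
            }

    def insert(self, kind, record):
        m = self.meta[kind]
        cols = ", ".join(m["fields"])
        marks = ", ".join("?" for _ in m["fields"])
        # Identifiers come from the trusted mapping; values are bound parameters.
        self.conn.execute(
            f"INSERT INTO {m['table']} ({cols}) VALUES ({marks})",
            [record[f] for f in m["fields"]],
        )

    def find(self, kind, key_value):
        m = self.meta[kind]
        cols = ", ".join(m["fields"])
        row = self.conn.execute(
            f"SELECT {cols} FROM {m['table']} WHERE {m['key']} = ?",
            (key_value,),
        ).fetchone()
        return dict(zip(m["fields"], row)) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
dao = GenericDAO(conn, MAPPING_XML)
dao.insert("User", {"id": 1, "name": "Ada"})
dao.insert("Order", {"id": 7, "total": 19.5})
print(dao.find("User", 1), dao.find("Order", 7))
```

Supporting a third business object here means adding one `<object>` element to the mapping; `GenericDAO` itself does not change, and all SQL text lives in that one class.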

19.
The Digital Library Initiative (DLI) project at the University of Illinois at Urbana-Champaign is developing the information infrastructure to effectively search technical documents on the Internet. The authors are constructing a large testbed of scientific literature, evaluating its effectiveness under significant use, and researching enhanced search technology. They are building repositories (organized collections) of indexed multiple-source collections and federating (merging and mapping) them by searching the material via multiple views of a single virtual collection. Developing widely usable Web technology is also a key goal. Improving Web search beyond full-text retrieval will require using document structure in the short term and document semantics in the long term. Their testbed efforts concentrate on journal articles from the scientific literature, with structure specified by the Standard Generalized Markup Language (SGML). Research efforts extract semantics from documents using the scalable technology of concept spaces based on context frequency. They then merge these efforts with traditional library indexing to provide a single Internet interface to indexes of multiple repositories.

20.
Machine Learning for Intelligent Processing of Printed Documents   Total citations: 1, self-citations: 0, citations by others: 1
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.
