Similar Documents
20 similar documents found.
1.
The development of suitable mechanisms for securing XML documents is becoming an urgent need, since XML is evolving into a standard for data representation and exchange over the Web. To answer this need, we have designed Author-X [1, 3], a Java-based system specifically conceived for the protection of XML documents. Distinguishing features of the access control model of Author-X are its support for a wide range of protection granularity levels and for subject credentials. Another key characteristic of Author-X is the enforcement of different access control strategies for document release: besides the traditional on-demand mode of access control, Author-X also supports push distribution for document dissemination. Managing an access control system based on such a flexible and expressive model requires suitable administration tools to help the Security Administrator efficiently perform administrative operations related to access control policy management. In this paper, we present the strategies and related algorithms we have devised for policy management in Author-X, with particular emphasis on information-push support. Besides presenting the algorithms and the related data structures, we provide a complexity study of the proposed algorithms and describe their implementation in the framework of Author-X.
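The fine-grained, credential-based access control described above can be sketched as follows. The policy format, role names, and the `prune` helper are hypothetical illustrations of document-tree pruning, not Author-X's actual policy language:

```python
import xml.etree.ElementTree as ET

# Hypothetical policy tuples: (required_role, element_tag, action).
# Author-X's real model uses richer subject credentials and multiple
# protection granularity levels; this only illustrates the pruning idea.
POLICIES = [
    ("staff", "salary", "deny"),
    ("any", "name", "allow"),
]

def prune(elem, roles):
    """Remove child elements that the subject's roles are denied."""
    for child in list(elem):
        denied = any(a == "deny" and t == child.tag and (r in roles or r == "any")
                     for r, t, a in POLICIES)
        if denied:
            elem.remove(child)
        else:
            prune(child, roles)
    return elem

doc = ET.fromstring("<emp><name>Ann</name><salary>90k</salary></emp>")
view = prune(doc, roles={"staff"})
print(ET.tostring(view, encoding="unicode"))  # salary element is pruned away
```

The same pruned view could then be released on demand or pushed to all subjects holding the matching credentials.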

2.
To achieve secure management of electronic documents, this paper proposes a design scheme for electronic document security management based on CPK (Combined Public Key) authentication. With secure access and sharing identified as the goals, a design model for a CPK-based electronic document security management system is established, the basic operating principles of the system are described, and its main functional modules are determined. CPK authentication is applied to implement the system's core functions: user identity authentication, signing and verification of documents in transit, and user authorization. System testing shows that the system runs correctly, validating the feasibility of the design.

3.
As management information systems proliferate, there is a pressing need for a secure identity authentication system supporting single sign-on (SSO). After analyzing several identity authentication technologies, this paper studies the Central Authentication Service (CAS) mechanism, examines its operating principles and security, and, to address CAS's security weaknesses, proposes a mixed dynamic data encryption algorithm, MDEA. The algorithm combines several data encryption algorithms and introduces random-number and timestamp dynamic factors, strengthening the security of the identity authentication system.
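The idea of strengthening encryption with random-number and timestamp dynamic factors can be shown with a minimal sketch. The abstract does not specify MDEA's construction; this example simply derives a per-message keystream from an HMAC over a fresh nonce and a timestamp, one common way to realize such dynamic factors:

```python
import hashlib, hmac, os, time

def _keystream(key, nonce, ts, n):
    """Derive n pseudo-random bytes from the key plus per-message
    dynamic factors (random nonce + timestamp), HMAC-SHA256 in counter mode."""
    out, counter = b"", 0
    while len(out) < n:
        msg = nonce + ts.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:n]

def encrypt(key, plaintext):
    """XOR the plaintext with a keystream bound to a nonce and a timestamp,
    so the same plaintext never encrypts the same way twice."""
    nonce, ts = os.urandom(16), int(time.time())
    ks = _keystream(key, nonce, ts, len(plaintext))
    return nonce, ts, bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key, nonce, ts, ciphertext):
    ks = _keystream(key, nonce, ts, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

A verifier can additionally reject messages whose timestamp is stale, which is what makes the timestamp a defense against replay.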

4.
To find a document in the sea of information, you must embark on a search process, usually computer-aided. In the traditional information retrieval model, the final goal is to identify and collect a small number of documents to read in detail. In this case, a single query yielding a scalar indication of relevance usually suffices. In contrast, document corpus management seeks to understand what is happening in the collection of documents as a whole, i.e. to find relationships among documents. You may indeed read or skim individual documents, but only to better understand the rest of the document set. Document corpus management seeks to identify trends, discover common links and find clusters of similar documents; the results of many single queries must be combined in various ways to reveal those trends. We describe a new system, the Stereoscopic Field Analyzer (SFA), that aids document corpus management by employing 3D volumetric visualization techniques in a minimally immersive real-time interaction style. This interactive information visualization system combines two-handed interaction and stereoscopic viewing with glyph-based rendering of the corpus contents. SFA is coupled with Telltale, a dynamic hypertext environment for text corpora that provides text indexing, management and retrieval based on n-grams (n-character sequences of text). Telltale is a document management and information retrieval engine providing document similarity measures (n-gram-based m-dimensional vector inner products), which SFA visualizes for analyzing patterns and trends within the corpus.
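The n-gram similarity measure that Telltale computes, an inner product over n-gram frequency vectors, can be sketched as follows. Function names are illustrative, and the normalized form shown is cosine similarity, one common choice of inner product:

```python
from collections import Counter
from math import sqrt

def ngrams(text, n=3):
    """Character n-gram frequency profile of a document."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Normalized inner product of two n-gram frequency vectors;
    1.0 means identical profiles, 0.0 means no shared n-grams."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Pairwise values like these, computed over a whole corpus, are what a visualization system can then map to glyph position, size, or color.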

5.
To effectively address the curse of dimensionality and the singular-value problem in traditional dimensionality-reduction algorithms for vector-based document representations, a Web document classification algorithm based on tensor maximum margin projection is proposed. During dimensionality reduction, the algorithm fully exploits the structural and relational information of documents to improve its discriminative power. Experimental results on the WebKB and 20NG datasets show that it outperforms other commonly used document classification algorithms.

6.
Clustering of related or similar objects has long been regarded as a potentially useful way of helping users navigate an information space such as a document collection. Many clustering algorithms and techniques have been developed and implemented, but as document collections have grown these techniques have not scaled to large collections because of their computational overhead. To solve this problem, the proposed system adopts an interactive text clustering methodology: probability-based, topic-oriented, semi-supervised document clustering. Because the Web and many documents contain both text and large numbers of images, the system also applies content-based image retrieval (CBIR) for image clustering, complementing the document clustering approach. It uses two kinds of indexing keys, major colour sets (MCS) and distribution block signatures (DBS), to prune away images irrelevant to a given query image: major colour sets capture colour information, while distribution block signatures capture spatial information. After successively applying these filters to a large database, only a small number of high-potential candidates similar to the query image remain. The system then uses the quad modelling method (QM) to set the initial weights of the two-dimensional cells in the query image according to each major colour, and retrieves more similar images through a similarity association function based on those weights. System efficiency is evaluated by implementing and testing the clustering results with the DBSCAN and K-means clustering algorithms. Experiments show that the proposed document clustering algorithm achieves an average efficiency of 94.4% across various document categories.
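The major-colour-set filter can be sketched as follows. The quantization granularity, the size of the colour set, and the overlap threshold are illustrative assumptions, not values from the paper:

```python
from collections import Counter

def major_colour_set(pixels, k=4, levels=4):
    """Quantize RGB pixels into a coarse grid and keep the k most
    frequent colour bins -- a crude 'major colour set'."""
    q = Counter((r * levels // 256, g * levels // 256, b * levels // 256)
                for r, g, b in pixels)
    return {c for c, _ in q.most_common(k)}

def mcs_filter(query_pixels, candidates, min_overlap=0.5):
    """Prune candidate images whose major colours barely overlap
    the query image's; survivors go on to finer spatial filtering."""
    qs = major_colour_set(query_pixels)
    kept = []
    for name, px in candidates:
        cs = major_colour_set(px)
        if len(qs & cs) / len(qs) >= min_overlap:
            kept.append(name)
    return kept
```

A DBS-style spatial filter would then be applied only to the images this cheap colour filter keeps, which is the point of the successive-filtering design.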

7.
Most traditional cache document replacement policies focus on efficiency: documents are replaced according to their last access times, request frequencies, and sizes. However, in addition to providing efficient document acquisition, a successful commercial website must also create incentives for customers, so as to gain sufficient revenue to support its continuing operation. For this reason, a new cache document replacement policy that considers the contribution-to-sales of every document is proposed in this study. Two web mining techniques are applied to evaluate a document's contribution-to-sales, and the traditional GDSF policy is modified to incorporate this new factor. To evaluate the effectiveness of the proposed methodology, an experimental e-commerce website was constructed, and a series of programs simulated accesses by various kinds of users to obtain data for analysis. The results show that the proposed replacement policy can increase the hit rate and byte hit rate by up to 16% and 9%, respectively, compared with traditional replacement policies.
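A minimal sketch of the modified-GDSF idea, assuming the classic priority K = L + freq × cost / size and a hypothetical multiplicative sales factor; the paper's exact formula and mining-derived weights are not given here:

```python
class SalesAwareGDSF:
    """GDSF replacement extended with a contribution-to-sales weight.
    Classic GDSF priority: K = L + freq * cost / size.  This sketch
    scales K by (1 + sales), so revenue-driving documents survive
    eviction longer.  The `sales` values would come from web mining."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0          # inflation value: priority of the last victim
        self.docs = {}        # key -> {freq, size, cost, sales, prio}

    def access(self, key, size, cost=1.0, sales=0.0):
        d = self.docs.setdefault(key, {"freq": 0, "size": size,
                                       "cost": cost, "sales": sales})
        if d["freq"] == 0:
            self.used += size                 # first admission
        d["freq"] += 1
        d["prio"] = self.L + d["freq"] * d["cost"] * (1.0 + d["sales"]) / d["size"]
        while self.used > self.capacity:      # evict lowest-priority documents
            others = [k for k in self.docs if k != key]
            if not others:
                break
            victim = min(others, key=lambda k: self.docs[k]["prio"])
            self.L = self.docs[victim]["prio"]
            self.used -= self.docs[victim]["size"]
            del self.docs[victim]
```

With equal sizes and frequencies, the document with the lower sales contribution is the first to be evicted, which is exactly the incentive the policy aims for.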

8.
A tool to discover the main themes in a Spanish or English document
While most work on knowledge discovery in databases has been concerned with structured databases, there has been little work on handling the huge amount of information available only in unstructured textual form. This paper presents a system based on information retrieval and text mining methods and shows how it analyzes a document containing natural-language sentences in order to recognize its main topics or themes. The system's knowledge base is formed by concept trees. The architecture and main algorithms of the system are discussed.

9.
To address two weaknesses of most existing deep text clustering methods, over-reliance on raw data quality during feature mapping and the loss of key semantic information, a deep text clustering algorithm based on key semantic information supplementation (DCKSC) is proposed. The algorithm first augments the raw text data by extracting keyword data. It then improves the conventional autoencoder with a key-semantic-information supplementation module that restores the key semantics lost during mapping. Finally, it learns cluster-friendly representations by combining a clustering loss with the reconstruction loss of a keyword-semantics autoencoder. Experiments show that the proposed algorithm outperforms current state-of-the-art clustering methods on five real-world datasets, and the results confirm the importance of key-semantic supplementation and text data augmentation for deep text clustering.

10.
Most Web content categorization methods are based on the vector space model of information retrieval. One of the most important advantages of this representation model is that it can be used by both instance‐based and model‐based classifiers. However, this popular method of document representation does not capture important structural information, such as the order and proximity of word occurrence or the location of a word within the document. It also makes no use of the markup information that can easily be extracted from the Web document HTML tags. A recently developed graph‐based Web document representation model can preserve Web document structural information. It was shown to outperform the traditional vector representation using the k‐Nearest Neighbor (k‐NN) classification algorithm. The problem, however, is that the eager (model‐based) classifiers cannot work with this representation directly. In this article, three new hybrid approaches to Web document classification are presented, built upon both graph and vector space representations, thus preserving the benefits and overcoming the limitations of each. The hybrid methods presented here are compared to vector‐based models using the C4.5 decision tree and the probabilistic Naïve Bayes classifiers on several benchmark Web document collections. The results demonstrate that the hybrid methods presented in this article outperform, in most cases, existing approaches in terms of classification accuracy, and in addition, achieve a significant reduction in the classification time. © 2008 Wiley Periodicals, Inc.

11.
The Internet provides a universal platform for large-scale distribution of information and supports inter-organizational services, system integration, and collaboration. Use of multimedia documents for dissemination and sharing of massive amounts of information is becoming common practice for Internet-based applications and enterprises. With the rapid proliferation of multimedia data management technologies over the Internet, there is growing concern about the security and privacy of information. Composing multimedia documents in a distributed heterogeneous environment involves integrating media objects from multiple security domains that may employ different access control policies. In this paper, we present a security model for a distributed document management system that allows creation, storage, indexing, and presentation of secure multimedia documents. The model is based on a time-augmented Petri net and provides a flexible, multilevel access control mechanism that allows clearance-based access to different levels of information in a document. In addition, the model captures detailed multimedia synchronization requirements, including deterministic and non-deterministic temporal relations and incomplete timing information among media objects.

12.
A document sensitive-information detection method based on document smoothing and query expansion
Because office terminals risk leaking sensitive information, detecting sensitive information in the documents on them is essential. Existing detection methods, however, build context-independent indexes that model documents inaccurately and expand query semantics insufficiently. This paper first proposes a context-based document index smoothing algorithm that builds an index preserving as much document information as possible; it then improves the query semantic expansion algorithm, using the concept sensitivity in a domain ontology to appropriately widen the detection scope; finally, it fuses document smoothing and query expansion in a language model and builds the sensitive-information detection method on top of it. Compared with four methods using different indexing mechanisms, query keyword expansion algorithms, and detection models, the proposed algorithm achieves recall, precision, and F-measure of 0.798, 0.786, and 0.792, respectively, clearly outperforming the baselines on every metric. The results show it detects sensitive information more effectively.
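The fusion of query expansion with a smoothed language model can be sketched as follows, using Jelinek-Mercer smoothing as a stand-in for the paper's context-based smoothing; the toy ontology and all terms are illustrative assumptions:

```python
import math
from collections import Counter

# Toy ontology mapping a sensitive concept to related terms (illustrative only;
# the paper derives expansions from concept sensitivity in a domain ontology).
SENSITIVE_EXPANSIONS = {"salary": ["pay", "compensation"]}

def expand(query_terms):
    """Widen the query with ontology-derived related terms."""
    out = list(query_terms)
    for w in query_terms:
        out += SENSITIVE_EXPANSIONS.get(w, [])
    return out

def jm_score(query_terms, doc_terms, collection, lam=0.5):
    """Query-likelihood score under Jelinek-Mercer smoothing:
    p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C)."""
    d, C = Counter(doc_terms), Counter(collection)
    dl, cl = len(doc_terms), len(collection)
    score = 0.0
    for w in query_terms:
        p = (1 - lam) * d[w] / dl + lam * C[w] / cl
        if p == 0:
            return float("-inf")
        score += math.log(p)
    return score
```

Smoothing ensures a document is not scored zero merely because one query term is absent, which is what makes ranking by sensitivity rather than exact match possible.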

13.
This paper describes a minimally immersive three-dimensional volumetric interactive information visualization system for management and analysis of document corpora. The system, SFA, uses glyph-based volume rendering, enabling more complex data relationships and information attributes to be visualized than traditional 2D and surface-based visualization systems. Two-handed interaction using three-space magnetic trackers and stereoscopic viewing are combined to produce a minimally immersive interactive system that enhances the user's three-dimensional perception of the information space. This new system capitalizes on the human visual system's pre-attentive learning capabilities to quickly analyze the displayed information. SFA is integrated with a document management and information retrieval engine named Telltale. Together, these systems integrate visualization and document analysis technologies to solve the problem of analyzing large document corpora. We describe the usefulness of this system for the analysis and visualization of document similarity within a corpus of textual documents, and present an example exploring authorship of ancient Biblical texts. Received: 15 December 1997 / Revised: June 1999

14.
Traditional deep text clustering methods cluster only on the text semantics of the middle layer, ignoring the different semantic representations learned at different depths of the network; moreover, the dense low-dimensional middle-layer representation makes clusters hard to separate. To address this, a model for deep document clustering via multi-layer subspace semantic fusion (DCMSF) is proposed. The model first extracts latent semantic representations of the text at different depths with a deep autoencoder. It then applies a multi-layer subspace semantic fusion strategy that nonlinearly maps the representations of different layers into different subspaces, and clusters on the fused semantics. In addition, a joint loss function built from the self-representation loss of subspace clustering supervises the model's parameter updates. Experimental results show that DCMSF outperforms several current mainstream deep text clustering algorithms.

15.
The task of automatic document summarization aims at generating short summaries for originally long documents. A good summary should cover the most important information of the original document or cluster of documents, while being coherent, non-redundant and grammatically readable. Numerous approaches for automatic summarization have been developed to date. In this paper we give a self-contained, broad overview of recent progress made in document summarization within the last five years. Specifically, we emphasize significant contributions made in recent years that represent the state of the art of document summarization, including progress on modern sentence extraction approaches that improve concept coverage, information diversity and content coherence, as well as summarization frameworks that integrate sentence compression, and more abstractive systems that are able to produce completely new sentences. In addition, we review progress made for document summarization in domains, genres and applications that differ from traditional settings. We also point out some of the latest trends and highlight a few possible future directions.

16.
LED display panels are mainly used to show text. Their controllers are usually single-chip microcontrollers, communicating over RS-232 or low-speed Ethernet. Because of the limited processing power of the microcontroller and cost constraints, the controller generally has no font library, so the text shown on the panel is represented as bitmap dot-matrix data. Text bitmaps are large, and the controller's communication links are slow, so compressing the bitmap data is essential when sending many display frames. Image compression algorithms such as JPEG and GIF are impractical on such microcontrollers, but run-length encoding is computationally cheap, well suited to a microcontroller, and achieves a high compression ratio on text bitmaps.
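A minimal run-length encoder for monochrome text-bitmap rows, of the kind such a controller could run; the one-byte run cap is an illustrative choice so each run fits a single byte:

```python
def rle_encode(row):
    """Run-length encode a row of 0/1 pixels as (value, run) pairs.
    Text bitmaps have long runs of background pixels, so this
    compresses well while staying cheap enough for an MCU."""
    out = []
    run, prev = 0, row[0]
    for px in row:
        if px == prev and run < 255:   # cap runs so each fits one byte
            run += 1
        else:
            out.append((prev, run))
            prev, run = px, 1
    out.append((prev, run))
    return out

def rle_decode(pairs):
    """Expand (value, run) pairs back into the original pixel row."""
    return [v for v, n in pairs for _ in range(n)]
```

A row of mostly background shrinks from hundreds of pixels to a handful of pairs, which is what makes transmission over a slow serial link practical.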

17.
Manufacturing enterprises require document approval workflows that are both efficient and secure, with a system that is simple to operate and easy to integrate with other information systems. Three main factors are involved in document approval: the document, the role, and the workflow. On this basis, a model of a document approval system is constructed, comprising modules for approval, monitoring, workflow management, and role authentication. The concept of a template is introduced to simplify the management of approval workflows. Finally, a Web-based, template-driven document approval system that integrates with the enterprise's other information systems is developed, validating the soundness and effectiveness of the model.

18.
钟征燕  郭燕慧  徐国爱 《计算机应用》2012,32(10):2776-2778
As digital products become ubiquitous, copyright protection for PDF documents has become a hot topic in information security research. After analyzing the structure of PDF documents and related digital watermarking algorithms, and noting that some high-capacity text watermarking schemes have the drawback of increasing the file size, a digital watermarking algorithm based on the PDF document structure is proposed. Exploiting the fact that end-of-line markers are not rendered in the document, the algorithm embeds the watermark indirectly by substituting, byte for byte, the end-of-line markers of the fixed-format cross-reference table in the PDF document. Experiments show that the algorithm's watermark capacity meets the needs of digital rights protection, that the watermark is well hidden, and that it resists statistical and similar attacks.
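The end-of-line substitution idea can be sketched on the fixed-width 20-byte cross-reference entries. The concrete bit encoding below (CR LF for 0, space LF for 1) is an illustrative assumption, chosen because both forms are legal, render identically, and keep the entry length fixed:

```python
def embed_bits(xref_lines, bits):
    """Carry one watermark bit per xref entry by choosing its 2-byte EOL:
    bit 0 -> '\r\n', bit 1 -> ' \n'.  Each entry stays exactly 20 bytes,
    so byte offsets elsewhere in the PDF are unchanged -- the property
    that keeps the file size constant."""
    out = []
    for line, bit in zip(xref_lines, bits):
        body = line[:18]                  # 20-byte entry minus the EOL
        out.append(body + ("\r\n" if bit == 0 else " \n"))
    return out

def extract_bits(xref_lines):
    """Recover the watermark by inspecting each entry's EOL bytes."""
    return [0 if line.endswith("\r\n") else 1 for line in xref_lines]
```

Because viewers never display EOL bytes, the marked file is visually identical to the original, which is the source of the scheme's imperceptibility.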

19.
In this paper, we present a new query reweighting method for document retrieval. The method uses genetic algorithms to reweight a user's query vector, based on the user's relevance feedback, to improve the performance of document retrieval systems. It encodes the query vector into chromosomes and uses genetic algorithms to search for the optimal weights of the query terms; once the best chromosome is found, it is decoded back into the user's query vector for retrieval. By finding the best term weights from relevance feedback, the method increases both the precision and the recall of the document retrieval system.
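A minimal sketch of GA-based query reweighting under relevance feedback; the fitness function, population size, crossover, and mutation scheme are illustrative choices, not the paper's exact design:

```python
import random

def fitness(weights, judged):
    """Toy fitness: mean weighted score of relevant docs minus mean
    score of irrelevant docs, where a doc's score is the inner product
    of the weight vector with its term vector."""
    def score(doc):
        return sum(w * x for w, x in zip(weights, doc))
    rel = [score(d) for d, r in judged if r]
    irr = [score(d) for d, r in judged if not r]
    return sum(rel) / len(rel) - (sum(irr) / len(irr) if irr else 0.0)

def reweight(query_dim, judged, pop=20, gens=40, seed=0):
    """Evolve query-term weight chromosomes against relevance judgments."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(query_dim)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=lambda w: fitness(w, judged), reverse=True)
        elite = P[:pop // 2]                      # keep the fitter half
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, query_dim)     # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(query_dim)          # point mutation
            child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-0.1, 0.1)))
            children.append(child)
        P = elite + children
    return max(P, key=lambda w: fitness(w, judged))
```

With feedback saying the first query term marks relevant documents and the second marks irrelevant ones, the evolved weight vector shifts toward the first term.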

20.
This paper reviews the evolution of user authentication for interlibrary loan and document delivery systems in university digital libraries, focusing on the UAS unified authentication scheme used by the CALIS interlibrary loan system, and shares our library's experience with UAS unified authentication.


