Similar literature
 19 similar documents found
1.
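An email screening method based on D-S evidence theory   Total citations: 2, self-citations: 0, citations by others: 2
Current pornographic-email screening filters only the text content of a message. To address this shortcoming, this work considers both the text and the images carried in attachments and proposes a screening method based on evidence theory. The method uses a pornographic-image detector and a text classifier to make preliminary judgments on the email's attachments and body respectively, then treats the two detector outputs as evidence and fuses them with D-S theory to obtain the final decision. Experiments show that the method effectively improves the recognition rate for pornographic emails.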
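The fusion step can be illustrated with Dempster's rule of combination. The following is a minimal sketch, not the paper's implementation: the frame of discernment (`porn`, `normal`), the mass values, and the function name `combine` are all assumptions made for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1].
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

PORN = frozenset({"porn"})
NORMAL = frozenset({"normal"})
EITHER = PORN | NORMAL  # mass assigned to "don't know"

# Hypothetical detector outputs expressed as mass functions.
image_evidence = {PORN: 0.6, NORMAL: 0.1, EITHER: 0.3}
text_evidence = {PORN: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = combine(image_evidence, text_evidence)
```

With these illustrative masses, the fused belief in `PORN` is higher than either detector's individual belief, which is the effect the abstract attributes to combining the two sources.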

2.
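尹美娟, 陈庶民, 刘晓楠, 路林. 《计算机科学》 (Computer Science), 2011, 38(12): 182-186, 199
Mining the identity information of mailbox users is an active topic in data mining research. Most current work extracts user aliases only from email headers and misses the alias information hidden in message bodies, which often represents the identities of the correspondents better. For alias extraction from plain-text email bodies, this work proposes an algorithm based on statistics and rule filtering that locates salutation blocks and signature blocks; it efficiently and accurately extracts the text fragments containing user aliases from the body. It further proposes an alias extraction method that corrects results using alias boundary-word templates, improving on the accuracy obtained from named entity recognition or part-of-speech tagging tools alone. Experimental results show that the proposed method effectively extracts mailbox users' aliases from email bodies.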

3.
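The free mailboxes we commonly use usually limit the size and number of attachments, which makes sending mail inconvenient. The size of the message body, however, is not limited, so converting an attachment to text and placing it in the body gets around the limit. 1. Turn the email into a text file: run OE, write the email and add the attachment in the "New Message" window as usual, click "File/Save As", and save…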

4.
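Email has become an important medium for business and office work in many enterprises, and much information is stored in email systems. Classification is an effective way to manage large volumes of email, but traditional manual text classification is relatively static and time-consuming. For managing unstructured email information, this work proposes a dynamic classification scheme and uses text mining to develop an automatic email classification system built on a multi-agent architecture, improving the efficiency of automatic email classification.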

5.
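This work analyzes how email features affect email classification and proposes a two-layer classification method for an intelligent email service agent. The agent comprises modules for email length classification, email collection and preprocessing, text segmentation, feature selection, and email classification. It gives a mail server the ability to filter spam automatically and can also be used in e-government and e-commerce to classify and forward email automatically. The two-layer method first classifies emails by length and then applies a different Bayesian classifier to each length class, thereby filtering spam. Experiments show that it effectively improves the efficiency of email classification.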
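The two-layer idea (route by length, then apply a length-specific Bayesian classifier) can be sketched as follows. This is an illustrative sketch only: the token-count threshold, the class names, the Laplace-smoothed multinomial naive Bayes, and the toy training data are assumptions, not details from the paper.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def length_class(tokens, threshold):
    """First layer: bucket an email by its token count (hypothetical rule)."""
    return "short" if len(tokens) <= threshold else "long"

class TwoLayerClassifier:
    """Second layer: one naive Bayes model per length class."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.models = {}

    def fit(self, docs, labels):
        buckets = defaultdict(lambda: ([], []))
        for tokens, label in zip(docs, labels):
            bucket_docs, bucket_labels = buckets[length_class(tokens, self.threshold)]
            bucket_docs.append(tokens)
            bucket_labels.append(label)
        self.models = {name: NaiveBayes().fit(d, l) for name, (d, l) in buckets.items()}
        return self

    def predict(self, tokens):
        # Raises KeyError if no training email fell into this length class.
        return self.models[length_class(tokens, self.threshold)].predict(tokens)

train_docs = [
    "win cash now".split(),
    "meeting at noon".split(),
    "cheap pills cheap offer click here to win cash prizes today".split(),
    "please review the attached project report before our meeting on monday".split(),
]
train_labels = ["spam", "ham", "spam", "ham"]

clf = TwoLayerClassifier(threshold=5).fit(train_docs, train_labels)
short_result = clf.predict("win cash".split())
long_result = clf.predict("please review the project report on monday".split())
```

Training one model per length class lets each model learn word statistics typical of that class, at the cost of splitting the training data between the models.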

6.
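Web pages often contain content unrelated to their main topic, and this useless information must be removed before useful text can be obtained. Text classification is essential to further processing of text and is another research topic in information retrieval. To remove useless information from web pages, this work proposes a method for extracting the main text of a page based on the structural characteristics of HTML itself and, combined with the article title, a simple method for automatic text classification. The method improves the efficiency of main-text extraction and automatic text classification. Experiments show that the method is feasible.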

7.
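Implementation of web page information extraction and automatic text classification   Total citations: 3, self-citations: 1, citations by others: 2
Web pages often contain content unrelated to their main topic, and this useless information must be removed before useful text can be obtained. Text classification is essential to further processing of text and is another research topic in information retrieval. To remove useless information from web pages, this work proposes a method for extracting the main text of a page based on the structural characteristics of HTML itself and, combined with the article title, a simple method for automatic text classification. The method improves the efficiency of main-text extraction and automatic text classification. Experiments show that the method is feasible.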

8.
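Comparative text is essential to enterprises' analysis of competing products, but little research addresses comparative text classification in the question-answering domain. Because Q&A texts are rich in comparative information and focused in topic, this work proposes a comparative text classification method based on topic features and keyword feature expansion. A pre-trained topic model infers the topic probability distribution of each Q&A text as its topic feature. To address the loss of keyword information caused by vector concatenation and summation, a GRU autoencoder is designed to extract keyword vector features. Combining topic information and keyword semantics, classification features are built from six perspectives (language, product, sentiment, social interaction, topic, and keywords), and several classifiers are then used to classify the Q&A texts. Experimental results show that the constructed features are effective and that comparative text classification performs well.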

9.
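Image features based on an information measure and text image classification   Total citations: 2, self-citations: 1, citations by others: 2
童莉, 平西建. 《计算机工程》 (Computer Engineering), 2004, 30(17): 143-145
As a basic image type, text images are widely used in e-commerce and other areas. To meet the need for recognizing and classifying text images in image databases, and based on the overall difference in gray-level distribution between text images and continuous-tone images, this paper proposes an image feature based on the Picture Information Measure (PIM) and a text image classification method built on that feature. Analysis and experiments on a database of 800 web images (about 2 GB) show that the PIM feature clearly separates text images from continuous-tone images, with good recognition and classification results.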
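To make the gray-level intuition concrete, here is a small sketch that assumes the commonly cited definition of PIM: the total pixel count minus the count of the most frequent gray level. The abstract does not state the exact feature used in the paper (it may, for example, be computed per block), so the functions `pim` and `normalized_pim` and the toy pixel lists are illustrative assumptions.

```python
from collections import Counter

def pim(pixels):
    """Picture Information Measure under the assumed definition:
    sum of the gray-level histogram minus its largest bin."""
    histogram = Counter(pixels)
    return len(pixels) - max(histogram.values())

def normalized_pim(pixels):
    """PIM divided by the pixel count, so values lie in [0, 1)."""
    return pim(pixels) / len(pixels)

# Toy text-like image: mostly white background with a few black strokes.
text_like = [255] * 90 + [0] * 10
# Toy continuous-tone image: gray levels spread evenly.
photo_like = list(range(100))

text_score = normalized_pim(text_like)
photo_score = normalized_pim(photo_like)
```

Under this definition a text-like image, dominated by one background level, scores low, while a continuous-tone image with a spread-out histogram scores close to 1.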

10.
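Current adult-image detection algorithms rely solely on analyzing either image content or text keywords. To address this, this work proposes an adult-image detection algorithm that fuses the text features associated with a web image with the semantic features of the image content. The feature information of an adult image may lie in the image content and in related text such as the image file name and the containing web page. Building on the bag-of-visual-words model, the related text features obtained by text analysis are fused at the low-level feature stage with visual features of the image such as texture and local shape, and a support vector machine classifier performs the image classification. Experimental results show that the algorithm classifies well.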

11.
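Email monitoring is an important aspect of network information security, and processing the emails captured by monitoring is a difficult task. This paper proposes and implements an email processing approach for email monitoring. Each email is first converted into a more strongly structured XML document; a search-and-filter step then produces a preliminary email set; on this basis, content-based text classification is applied to the different nodes of each email to further divide the emails into categories. Experiments show that the approach is effective.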
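The first two stages (conversion to XML, then search filtering over nodes) might look like the sketch below, which uses Python's standard `email` and `xml.etree.ElementTree` modules. The element names, the chosen headers, and the helper names `email_to_xml` and `matches` are assumptions; the paper's actual schema is not given in the abstract.

```python
from email import message_from_string
import xml.etree.ElementTree as ET

def email_to_xml(raw):
    """Convert a raw RFC 822 message into a small XML tree (hypothetical schema)."""
    msg = message_from_string(raw)
    root = ET.Element("email")
    for header in ("From", "To", "Subject"):
        node = ET.SubElement(root, header.lower())
        node.text = msg.get(header, "")
    body = ET.SubElement(root, "body")
    # Only single-part messages are handled in this sketch.
    body.text = "" if msg.is_multipart() else msg.get_payload()
    return root

def matches(root, keyword, tags=("subject", "body")):
    """Search filter: does the keyword occur in any of the selected nodes?"""
    keyword = keyword.lower()
    return any(keyword in (root.findtext(tag) or "").lower() for tag in tags)

raw_message = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "Please find the numbers attached.\n"
)

doc = email_to_xml(raw_message)
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping each field in its own node is what allows a later classifier to treat the subject and the body differently, as the abstract describes.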

12.
Context-based email classification requires understanding of semantic and structural attributes of email. Most of the research has focused on generating semantic properties through structural components of email. By viewing emails as events (a major subset of the class of email), a rich contextual test-bed representation for understanding the semantic attributes of emails has been devised. Event-based emails have traditionally been studied based on simple structural properties. In this paper, we present a novel approach by first representing such a class of emails as graphs, followed by heuristically applying a graph mining and matching algorithm to pick templates representing contextual and semantic attributes that help classify emails. The classification templates used three key event classes: social, personal and professional. Results show that our template-based approach, supported by graph mining and matching, performs consistently well over an event email data set with high accuracy.

13.
Kristof, Dirk. Decision Support Systems, 2008, 44(4): 870-882
Customer complaint management is becoming a critical key success factor in today's business environment. This study introduces a methodology to improve complaint-handling strategies through an automatic email-classification system that distinguishes complaints from non-complaints. As such, complaint handling becomes less time-consuming and more successful. The classification system combines traditional text information with new information about the linguistic style of an email. The empirical results show that adding linguistic style information into a classification model with conventional text-classification variables results in a significant increase in predictive performance. In addition, this study reveals linguistic style differences between complaint emails and others.

14.
In this paper, we report our experience on the use of phrases as basic features in the email classification problem. We performed extensive empirical evaluation using our large email collections and tested with three text classification algorithms, namely, a naive Bayes classifier and two k-NN classifiers using TF-IDF weighting and resemblance respectively. The investigation includes studies on the effect of phrase size, the size of local and global sampling, the neighbourhood size, and various methods to improve the classification accuracy. We determined suitable settings for various parameters of the classifiers and performed a comparison among the classifiers with their best settings. Our result shows that no classifier dominates the others in terms of classification accuracy. Also, we made a number of observations on the special characteristics of emails. In particular, we observed that public emails are easier to classify than private ones.
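A k-NN classifier over phrase features can be sketched as below. The abstract does not define "resemblance", so this sketch assumes it is the Jaccard similarity between the sets of word n-gram phrases of two emails; the function names, the phrase size, and the toy data are likewise illustrative.

```python
def phrases(text, size=2):
    """Set of contiguous word n-grams ("phrases") of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def resemblance(a, b, size=2):
    """Assumed definition: Jaccard similarity of the two phrase sets."""
    pa, pb = phrases(a, size), phrases(b, size)
    if not pa and not pb:
        return 0.0
    return len(pa & pb) / len(pa | pb)

def knn_predict(query, training, k=1, size=2):
    """Majority label among the k training emails most similar to the query."""
    ranked = sorted(training, key=lambda item: resemblance(query, item[0], size), reverse=True)
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

training = [
    ("project status meeting tomorrow morning", "work"),
    ("family dinner this weekend", "personal"),
]

score = resemblance("status meeting tomorrow", training[0][0])
label = knn_predict("status meeting tomorrow", training, k=1)
```

The `size` and `k` parameters correspond to the phrase size and neighbourhood size whose effects the paper reports studying.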

15.
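To detect spam in large volumes of email, this work proposes an email classification method based on the Hadoop platform. Unlike traditional content-based spam detection, it extracts behavioral features of email accounts by statistically analyzing email send/receive records on the MapReduce framework. A random forest classifier is then implemented in parallel on MapReduce, trained on samples carrying the behavioral features, and used to classify emails. Experimental results show that the Hadoop-based method greatly improves the efficiency of classifying email at large scale.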
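The feature-extraction stage can be pictured as a map step that emits one record per account and a reduce step that aggregates per-account statistics. The sketch below simulates that flow in plain Python rather than on Hadoop; the log format, the feature names, and the functions `map_record`, `reduce_account`, and `run_job` are assumptions, and the random forest stage is not shown.

```python
from collections import defaultdict

def map_record(record):
    """Map: emit (account, event) pairs for both ends of a send/receive record."""
    sender, recipient = record
    yield sender, ("sent", recipient)
    yield recipient, ("received", sender)

def reduce_account(values):
    """Reduce: turn all events of one account into behavioral features."""
    sent = [peer for kind, peer in values if kind == "sent"]
    received = [peer for kind, peer in values if kind == "received"]
    return {
        "sent": len(sent),
        "received": len(received),
        "distinct_recipients": len(set(sent)),
    }

def run_job(records):
    """Local stand-in for the shuffle phase: group mapped values by key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {account: reduce_account(values) for account, values in grouped.items()}

# Hypothetical send/receive log as (sender, recipient) pairs.
log = [
    ("bulk@x.test", "a@x.test"),
    ("bulk@x.test", "b@x.test"),
    ("bulk@x.test", "c@x.test"),
    ("a@x.test", "b@x.test"),
    ("b@x.test", "a@x.test"),
]

features = run_job(log)
```

Because each account's features depend only on that account's records, the reduce step parallelizes naturally across accounts, which is the property the abstract relies on for large-scale efficiency.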

16.
Email classification and prioritization expert systems have the potential to automatically group emails and users as communities based on their communication patterns, which is one of the most tedious tasks. The exchange of emails among users along with the time and content information determine the pattern of communication. The intelligent systems extract these patterns from an email corpus of single or all users and are limited to statistical analysis. However, the email information revealed in those methods is either constricted or widespread, i.e. single or all users respectively, which limits the usability of the resultant communities. In contrast to extreme views of the email information, we relax the aforementioned restrictions by considering a subset of all users as multi-user information in an incremental way to extend the personalization concept. Accordingly, we propose a multi-user personalized email community detection method to discover the groupings of email users based on their structural and semantic intimacy. We construct a social graph using multi-user personalized emails. Subsequently, the social graph is uniquely leveraged with expedient attributes, such as semantics, to identify user communities through collaborative similarity measure. The multi-user personalized communities, which are evaluated through different quality measures, enable the email systems to filter spam or malicious emails and suggest contacts while composing emails. The experimental results over two randomly selected users from email network, as constrained information, unveil partial interaction among 80% email users with 14% search space reduction where we notice 25% improvement in the clustering coefficient.

17.
Without imposing restrictions, many enterprises find nonwork-related contents consuming network resources. Business communication over emails thus incurs undesired delays and inflicts damages to businesses, explaining why many enterprises are concerned with the competition to use email services. Obviously, enterprises should prioritize business emails over personal ones in their email service. Therefore, previous works present content-based classification methods to categorize enterprise emails into business or personal correspondence. Accuracy of these methods is largely determined by their ability to survey as much information as possible. However, in addition to decreasing the performance of these methods, monitoring the details of email contents may violate privacy rights that are under legal protection, requiring a careful balance of accurately classifying enterprise emails and protecting privacy rights. The proposed email classification method is thus based on social features rather than a survey of email contents. Social-based metrics are also designed to characterize emails as social features; the obtained features are treated as an input of machine learning-based classifiers for email classification. Experimental results demonstrate the high accuracy of the proposed method in classifying emails. In contrast with other content-based methods that examine email contents, the emphasis on social features in the proposed method is a promising alternative for solving similar email classification problems.

18.
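An internal email monitoring system based on a Linux firewall   Total citations: 2, self-citations: 0, citations by others: 2
This work presents the design and implementation of a distributed email monitoring system built on a Linux firewall. It uses the ip_queue mechanism of the Netfilter framework to capture email passing through the gateway, extracts the email content according to the characteristics of the SMTP and POP3 protocols, and monitors that content with text classification techniques. The system makes extensive use of a plug-in mechanism and draws a clear boundary between its two kinds of operation, real-time processing and offline analysis. Given the characteristics of text classifiers, the system defines a concise interface so that classifiers using different algorithms can be integrated easily. Deploying the scheme makes it possible to monitor email passing through the gateway effectively.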

19.
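Email classification based on content cohesiveness   Total citations: 1, self-citations: 0, citations by others: 1
廖玲, 文敦伟. 《计算机仿真》 (Computer Simulation), 2008, 25(2): 121-123
Email classification generally represents emails with the vector space model, but that model is built only on the frequency with which independent words occur in the email content and ignores the structural features of the email, so the feature vector cannot represent the content accurately. To address this deficiency of the vector space model, this paper applies the idea of extracting n-grams with a cohesiveness measure to text representation, uses it to assign word weights, and designs an email classification system on this model. Because the cohesiveness method takes the structural features of the email into account, examples show that it improves the classification accuracy of the system.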
