Similar Literature
20 similar results found
1.
The Internet, web consumers, and computing systems have become increasingly vulnerable to cyber-attacks. The malicious uniform resource locator (URL) is a prominent attack vector, widely used to steal data, money, or personal information. Malicious URLs comprise phishing URLs, spamming URLs, and malware URLs. Detecting malicious URLs and identifying their attack type are important for thwarting such attacks and adopting the required countermeasures. The proposed methodology for detecting and categorizing malicious URLs uses a stacked restricted Boltzmann machine for feature selection with a deep neural network for binary classification. For multi-class classification, IBK-kNN, Binary Relevance, and Label Powerset with SVM are used. The approach is tested with 27,700 URL samples, and the results demonstrate that the deep-learning-based feature selection and classification techniques train the network quickly and detect malicious URLs with fewer false positives.

2.
吴峻  李洋 《计算机应用研究》2008,25(5):1537-1539
Building on an in-depth analysis of the shortcomings of traditional spam-filtering techniques, this paper proposes and implements a new URL-based spam filtering (UBSF) technique. The method judges whether a message is spam by comparing the similarity between URLs extracted from incoming mail and the URL information stored in a URL library. Tests on a corpus and on a working prototype show that the method offers high accuracy, a low false-positive rate, and fast real-time processing.
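The URL-similarity check at the core of this approach could be sketched as follows. This is a minimal illustration, not the paper's implementation: the library entries, the `difflib` similarity measure, and the 0.8 threshold are all my own illustrative choices.

```python
import re
from difflib import SequenceMatcher

# Hypothetical spam-URL library; entries and threshold are illustrative.
SPAM_URL_LIBRARY = [
    "http://cheap-meds-online.example/buy",
    "http://win-free-prizes.example/claim",
]

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(email_body: str) -> list[str]:
    """Pull every URL out of the raw message body."""
    return URL_PATTERN.findall(email_body)

def is_spam(email_body: str, threshold: float = 0.8) -> bool:
    """Flag the mail as spam if any embedded URL is sufficiently
    similar to a known spam URL."""
    for url in extract_urls(email_body):
        for known in SPAM_URL_LIBRARY:
            if SequenceMatcher(None, url, known).ratio() >= threshold:
                return True
    return False
```

A similarity threshold (rather than exact matching) lets the filter catch small mutations of a known spam URL at the cost of some extra comparison work per message.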

3.
Network attacks are an increasingly serious problem. Malicious URLs often play an important role in these attacks and are widely used in many attack types, such as phishing, spam, and malware. Detecting malicious links is therefore important for blocking such attacks. Many techniques have been applied to malicious URL detection, and in recent years machine-learning-based methods have received growing attention. Traditional machine-learning algorithms, however, require extensive feature preprocessing, which is time-consuming and labor-intensive. In this paper, we propose a detection method based entirely on lexical features. We first train a two-layer neural network to obtain distributed representations of the characters in URLs, and then train a classifier on the feature images generated from those distributed representations. In our experiments on real data, the method achieved a precision of 0.973 and an F1 score of 0.918.
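Character-level approaches like this one typically start by turning each URL into a fixed-size sequence of character indices before any embedding is learned. The sketch below shows only that preprocessing step, under assumptions of mine (the vocabulary and the 64-character cutoff are illustrative, not from the paper):

```python
import string

# Character vocabulary; index 0 is reserved for padding / unknown characters.
VOCAB = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}

def encode_url(url: str, max_len: int = 64) -> list[int]:
    """Map a URL to a fixed-length sequence of character ids,
    truncating long URLs and zero-padding short ones."""
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))
```

The resulting index sequence is what an embedding layer would consume to produce the "feature image" the abstract refers to.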

4.
梁志荣 《微计算机信息》2006,22(21):291-293
Because traditional filtering methods satisfy neither the efficiency nor the accuracy requirements of web-page information extraction, we propose a new filtering method based on domain knowledge. The method filters not only by page content but also, exploiting the characteristics of web pages, by clustering their URLs. Experiments show that this method achieves higher extraction efficiency and accuracy than traditional algorithms.

5.
Web spam attempts to influence search-engine ranking algorithms in order to boost the rankings of specific web pages in search results. Cloaking is a widely adopted technique for concealing web spam by returning different content to search engines' crawlers than is displayed in a web browser. Previous work on cloaking detection is mainly based on differences in the terms and/or links between multiple copies of a URL retrieved from the web-browser and search-engine-crawler perspectives. This work presents three methods that use differences in tags to determine whether a URL is cloaked. Since the tags of a web page generally do not change as frequently and significantly as its terms and links, tag-based cloaking detection can work more effectively than term- or link-based methods. The proposed methods are tested on a dataset of URLs covering short-, medium-, and long-term user interests. Experimental results indicate that the tag-based methods outperform term- and link-based methods in both precision and recall. Moreover, a Weka J4.8 classifier using a combination of term and tag features yields an accuracy of 90.48%.

6.
To address the dynamic and widespread nature of web-page security threats, an online malicious-web-page detection system based on honeypots is designed. The system records page addresses in a URL table and, combined with honeypot technology, comprehensively inspects pages that are absent from the table or present but still pending inspection, detecting in real time the security status of the pages a user is about to browse. This prevents attacks from malicious pages and improves the safety of users' online activity.

7.
Phishing URL detection based on anomalous features
A typical phishing attack sends bulk spam e-mail to trick users into clicking the URL of a phishing site, logging in, and entering personal confidential information. By analyzing the structural and lexical features of phishing-site URLs, this paper proposes a phishing URL detection method based on anomalous features. Four structural features and eight lexical features are extracted from each URL to form a 12-dimensional feature vector, and an SVM is used for training and classification. In an experiment classifying 7,291 phishing URLs from PhishTank, 7,134 were detected, an accuracy of 97.85%.
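A 12-dimensional structural-plus-lexical feature vector of this kind could be sketched as below. The paper's exact feature list is not reproduced here; the four structural features and the eight sensitive words are my own illustrative stand-ins.

```python
import re
from urllib.parse import urlparse

# Illustrative lexical cues, not the paper's feature set.
SENSITIVE_WORDS = ["login", "secure", "account", "verify",
                   "bank", "update", "confirm", "signin"]

def url_features(url: str) -> list[float]:
    """Build a 12-dimensional numeric feature vector from a URL:
    4 structural features followed by 8 sensitive-word indicators."""
    host = urlparse(url).netloc
    structural = [
        float(len(url)),                # overall URL length
        float(host.count(".")),         # subdomain depth
        float("@" in url),              # '@' can disguise the real host
        float(bool(re.match(r"^\d{1,3}(\.\d{1,3}){3}$", host))),  # raw-IP host
    ]
    lexical = [float(w in url.lower()) for w in SENSITIVE_WORDS]
    return structural + lexical
```

Vectors like this would then be fed to an SVM (e.g. scikit-learn's `SVC`) for training and classification.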

8.
汪鑫  武杨  卢志刚 《计算机科学》2018,45(3):124-130, 170
Internet applications now permeate every aspect of daily life, and malicious URLs are hard to guard against, posing a serious threat to people's property and privacy. Mainstream defenses rely mainly on blacklists and thus struggle to detect URLs outside the blacklist. Introducing machine learning to improve malicious URL detection is therefore a major research direction, but it is constrained by the short-text nature of URLs, which yields sparse features and hence poor detection performance. To address these challenges, a malicious URL detection system based on a threat-intelligence platform is designed. The system extracts three classes of features from the URL string, namely structural features, intelligence features, and sensitive-word features, to train classifiers; it then uses multi-classifier voting to decide the category and automatically updates its threat intelligence. Experimental results show that the method detects malicious URLs with an accuracy above 96%.

9.
Malicious online advertisement detection has attracted increasing attention in recent years in both academia and industry. Existing advertisement-blocking systems are vulnerable to newly evolving attacks and can introduce latency by analyzing web content or querying remote servers. This article proposes a lightweight system for advertisement uniform resource locator (URL) detection that depends only on lexical features. Deep learning algorithms are used for online advertising classification. After optimizing the deep neural network architecture, our proposed approach achieves satisfactory results with a false negative rate as low as 1.31%. We also design a novel unsupervised method for data clustering. With an autoencoder for feature preprocessing and t-distributed stochastic neighbor embedding for clustering and visualization, our model outperforms other dimensionality-reduction algorithms by generating clear clusterings for different URL families.

10.
Most request-distribution algorithms for web cluster services hash the request URL and schedule requests for load balancing according to fixed rules. This paper proposes URLALLOC, an algorithm that sorts all URLs lexicographically and partitions them into k*n sets. By ranking the sets by access traffic and pairing heavy and light segments complementarily, the algorithm distributes the web load as evenly as possible across multiple back-end servers. Simulation results show that URLALLOC achieves better load balancing than existing URL-hashing methods.
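The allocation idea could be sketched as follows. This is my reading of the abstract, not the published algorithm: in particular, dealing the traffic-ranked segments to servers in a snake order is one simple way to realize the "complementary pairing" of heavy and light segments.

```python
def allocate_urls(url_traffic: dict[str, int], n_servers: int, k: int) -> list[list[str]]:
    """Sketch of the URLALLOC idea: sort URLs lexicographically, cut them
    into k*n contiguous segments, rank the segments by traffic, then deal
    them to servers in a snake order so heavy and light segments pair up."""
    urls = sorted(url_traffic)                        # lexicographic order
    seg_count = k * n_servers
    seg_size = max(1, -(-len(urls) // seg_count))     # ceiling division
    segments = [urls[i:i + seg_size] for i in range(0, len(urls), seg_size)]
    # Rank segments by aggregate traffic, heaviest first.
    segments.sort(key=lambda seg: -sum(url_traffic[u] for u in seg))
    servers = [[] for _ in range(n_servers)]
    for rank, seg in enumerate(segments):
        rnd, pos = divmod(rank, n_servers)
        target = pos if rnd % 2 == 0 else n_servers - 1 - pos  # snake order
        servers[target].extend(seg)
    return servers
```

Because each segment is a contiguous lexicographic range, a front-end dispatcher can still route an incoming URL to its server with a binary search over segment boundaries.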

11.

The worldwide growth of digitization has made digital security indispensable. As nearly every object on the planet is rapidly being digitalized, protecting those assets becomes ever more important. Numerous cyber threats successfully deceive ordinary individuals into giving away their identities, and phishing is the social engineering attack hackers use today to steal users' credentials. After a systematic analysis of phishing techniques and e-mail scams, an intrusion detection system was developed as a Chrome extension. It detects phishing in real time by examining the URL, domain, content, and page attributes of any URL appearing in an e-mail or web page. For reliability, robustness, and scalability, we designed a lightweight, proactive, rule-based incremental construction approach to detect unknown phishing URLs. Thanks to its computational intelligence and independence from blacklist signatures, the application detects zero-day and spear-phishing attacks with detection rates of 89.12% and 76.2%, respectively. The true positive rate achieved by our method is 97.13%, with a false positive rate below 1.5%. The application thus achieves higher precision than the previous model and other phishing-detection techniques. Overall, our framework outperforms the existing methods in identifying phishing URLs.

12.
Spam Filtering With Dynamically Updated URL Statistics
Many URL-based spam filters rely on "white" and "black" lists to classify email. The authors' proposed URL-based spam filter instead analyzes URL statistics to dynamically calculate the probability that email containing specific URLs is spam or legitimate, and then classifies it accordingly.
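The dynamically updated statistics could be sketched as running per-URL spam/ham counts converted into a smoothed probability. The Laplace smoothing and the resulting 0.5 prior for unseen URLs are my own choices, not taken from the paper.

```python
from collections import defaultdict

class UrlSpamStats:
    """Keep running spam/ham counts per URL and score mail from them."""

    def __init__(self) -> None:
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def update(self, url: str, is_spam: bool) -> None:
        """Record one classified message containing this URL."""
        (self.spam if is_spam else self.ham)[url] += 1

    def spam_probability(self, url: str) -> float:
        """Laplace-smoothed estimate; unseen URLs score a neutral 0.5."""
        s, h = self.spam[url], self.ham[url]
        return (s + 1) / (s + h + 2)
```

Unlike a static blacklist, the estimate shifts automatically as new labeled mail arrives, which is the point of the dynamic approach.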

13.
Research and implementation of a BHO-based website filtering system
Using Browser Helper Object (BHO) plug-in technology, this paper presents the architecture and functional-module design of a website filtering system and studies its key technologies, including filtering, anti-tampering, and keyboard monitoring, with particular attention to sensitive-image detection. The system employs a triple filtering mechanism of URL filtering, text-content filtering, and sensitive-image filtering to provide users with a reliable and safe browsing environment. Experimental results verify the system's effectiveness.

14.
The main purpose of most spam e-mail distributed on the Internet today is to entice recipients into visiting World Wide Web pages that are advertised through spam. In essence, e-mail spamming is a campaign that advertises URL addresses at massive scale and at minimum cost for the advertisers and those advertised. Nevertheless, the characteristics of URL addresses and of web sites advertised through spam have not been studied extensively. In this paper, we investigate the properties of URL dissemination through spam e-mail and the characteristics of the disseminated URL addresses. We conclude that spammers advertise URL addresses non-repetitively and that spam-advertised URLs are short-lived, elusive, and therefore hard to detect and filter. We also observe that reputable URL addresses are sometimes used as decoys against e-mail users and spam filters. These observations can be valuable for configuring spam filters and for driving the development of new techniques to fight spam.

15.
Applying URL rewriting to dynamic website optimization
This paper analyzes the problems that dynamic URL addresses pose for search-engine indexing and proposes URL rewriting as an optimization. It discusses rewriting with the ISAPI_Rewrite component and gives a solution to the event-postback errors that rewriting can cause. A URL factory class for generating static addresses is designed, providing unified conversion of all dynamic addresses on a site.

16.
Fraudulent and malicious sites on the web
Fraudulent and malicious web sites pose a significant threat to desktop security, integrity, and privacy. This paper examines the threat from different perspectives. We harvested URLs linking to web sites from different sources and corpora and conducted a study to examine these URLs in depth. For each URL, we extract its domain name, determine its frequency, IP address, and geographic location, and check whether the web site is accessible. Using three search engines (Google, Yahoo!, and Windows Live), we check whether the domain name appears in the search results; using McAfee SiteAdvisor, we determine the domain name's safety rating. Our study shows that users can encounter URLs pointing to fraudulent and malicious web sites not only in spam and phishing messages but also in legitimate email messages and the top search results returned by search engines. To provide better countermeasures against these threats, we present a proxy-based approach that dynamically blocks access to fraudulent and malicious web sites based on the safety ratings set by McAfee SiteAdvisor.

17.
Web page classification has become a challenging task due to the exponential growth of the World Wide Web. Uniform Resource Locator (URL)-based web page classification systems play an important role, but high accuracy may not be achievable because a URL contains minimal information. Nevertheless, URL-based classifiers with a rejection framework can serve as a first-level filter in a multistage classifier, with costlier content-based feature extraction deferred to later stages. However, the noisy and irrelevant features present in URLs demand feature selection. We therefore propose a supervised feature selection method that identifies relevant URL features using statistical methods. We propose a new feature weighting method for a Naive Bayes classifier that embeds the term goodness obtained from the feature selection method, and a rejection framework for the Naive Bayes classifier that uses the posterior probability as a confidence score. The proposed method is evaluated on the Open Directory Project and WebKB data sets. Experimental results show that our method can be an effective first-level filter, and McNemar tests confirm that our approach significantly improves performance.

18.
Focused web crawlers collect topic-related web pages from the Internet. Using Q learning and semi-supervised learning theories, this study proposes an online semi-supervised clustering approach for topical web crawlers (SCTWC) to select the most topic-related URL to crawl based on the scores of the URLs in the unvisited list. The scores are calculated based on the fuzzy class memberships and the Q values of the unlabelled URLs. Experimental results show that SCTWC increases the crawling performance.

19.
Da Huang  Kai Xu  Jian Pei 《World Wide Web》2014,17(6):1375-1394
Detecting malicious URLs is an essential task in network security intelligence. In this paper, we make two new contributions beyond the state-of-the-art methods on malicious URL detection. First, instead of using any pre-defined features or fixed delimiters for feature selection, we propose to dynamically extract lexical patterns from URLs. Our novel model of URL patterns provides new flexibility and capability on capturing malicious URLs algorithmically generated by malicious programs. Second, we develop a new method to mine our novel URL patterns, which are not assembled using any pre-defined items and thus cannot be mined using any existing frequent pattern mining methods. Our extensive empirical study using the real data sets from Fortinet, a leader in the network security industry, clearly shows the effectiveness and efficiency of our approach.
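To make the idea of lexical patterns concrete, a crude generalization step might collapse the variable parts of a URL so that algorithmically generated URLs from the same malware family map to the same pattern. The abstraction rules below are my own illustration, not the patterns the paper mines.

```python
import re

def lexical_pattern(url: str) -> str:
    """Collapse a URL into a coarse lexical pattern by generalising
    its variable parts (long hex blobs, then digit runs)."""
    pattern = url.lower()
    pattern = re.sub(r"[0-9a-f]{8,}", "<HEX>", pattern)  # long hex blobs
    pattern = re.sub(r"\d+", "<NUM>", pattern)           # digit runs
    return pattern
```

Counting how many distinct URLs share a pattern would then reveal the machine-generated families that fixed-delimiter features miss.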

20.
Data in the cloud is protected by various mechanisms that ensure security and user privacy, but deceptive attacks such as phishing can still obtain user data and use it for malicious purposes. Despite much technological advancement, phishing remains the first step in many attack chains. The availability of and access to phishing kits has improved drastically, making them an ideal tool for hackers. Recent phishing cases use foreign characters to disguise the original Uniform Resource Locator (URL), typosquat popular domain names, use reserved characters for redirection, and chain multiple phishing steps. Such phishing URLs can be embedded in documents and uploaded to the cloud, giving attackers a foothold in cloud storage; cloud servers are becoming a trusted tool for executing these attacks. Prevailing software for blacklisting phishing URLs does not cover multi-level phishing and expects security to be enforced at the client (browser) end. At the same time, the avalanche effect and immutability of the blockchain make it a strong source of security. Considering these trends, a blockchain-based filtering implementation for preserving the integrity of user data stored in the cloud is proposed. The proposed Phish Block detects homographic phishing URLs with an accuracy of 91%, assuring security in cloud storage.
