Similar Documents (20 results found)
1.
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of irrelevant answers among search results and the coverage and scaling issues caused by the enormous size of the World Wide Web. More refined algorithms are therefore in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes considerable communication overhead on the overall system and cannot produce a precise answer set, these metrics alone are not sufficient for the overall search system to identify the best answer set. A link-independent Web page importance metric is therefore needed to govern the priority rule within the queue of fetched URLs. This paper proposes a modest weighted architecture for a focused, structured parallel Web crawler that employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted Web zone of our heavily visited UTM University Web site show the efficiency of the proposed metric.
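For illustration, a crawl frontier governed by a link-independent priority rule can be sketched as a max-heap keyed on a per-URL importance score. This is a minimal sketch, not the paper's architecture; the `clickstream_score` function and the visit counts are hypothetical stand-ins for the proposed metric:

```python
import heapq

class PriorityFrontier:
    """URL frontier ordered by a link-independent importance score."""

    def __init__(self):
        self._heap = []     # (negated score, url); heapq is a min-heap
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

def clickstream_score(visits, total_visits):
    """Hypothetical clickstream metric: a page's share of observed visits."""
    return visits / total_visits if total_visits else 0.0

frontier = PriorityFrontier()
frontier.push("http://example.edu/a", clickstream_score(120, 1000))
frontier.push("http://example.edu/b", clickstream_score(30, 1000))
print(frontier.pop())   # ('http://example.edu/a', 0.12) is fetched first
```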

2.
Ontology languages such as OWL are being widely used as the Semantic Web movement gains momentum. With the proliferation of the Semantic Web, more and more large-scale ontologies are being developed in real-world applications to represent and integrate knowledge and data. There is an increasing need to measure the complexity of these ontologies so that people can better understand, maintain, reuse and integrate them. In this paper, inspired by the concept of software metrics, we propose a suite of ontology metrics, at both the ontology and class levels, to measure the design complexity of ontologies. The proposed metrics are analytically evaluated against Weyuker's criteria. We have also performed empirical analysis on public domain ontologies to show the characteristics and usefulness of the metrics. We point out possible applications of the proposed metrics to ontology quality control. We believe that the proposed metric suite is useful for managing ontology development projects.
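As a rough illustration of ontology-level counting (not the paper's metric suite), one can tally classes and subclass axioms with rdflib; the file name and the fan-out measure below are assumptions:

```python
# Illustrative ontology-level size/complexity counts; the paper's metrics
# are more elaborate and are also defined at the class level.
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

g = Graph()
g.parse("ontology.owl")  # hypothetical local OWL file

classes = set(g.subjects(RDF.type, OWL.Class))
subclass_axioms = list(g.subject_objects(RDFS.subClassOf))

# Average subclass relationships per class: a crude indicator of how
# tangled the class hierarchy is.
avg_fanout = len(subclass_axioms) / len(classes) if classes else 0.0
print(f"{len(classes)} classes, {len(subclass_axioms)} subclass axioms, "
      f"average fan-out {avg_fanout:.2f}")
```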

3.
Context: Software quality attributes are assessed by employing appropriate metrics. However, the choice of such metrics is not always obvious and is further complicated by the multitude of available metrics. To assist metric selection, several properties have been proposed. However, although metrics are often used to assess successive software versions, there is no property that assesses their ability to capture structural changes along evolution. Objective: We introduce a property, Software Metric Fluctuation (SMF), which quantifies the degree to which a metric score varies due to changes occurring between successive system versions. Regarding SMF, metrics can be characterized as sensitive (changes induce high variation in the metric score) or stable (changes induce low variation in the metric score). Method: The SMF property has been evaluated by: (a) a case study on 20 OSS projects, to assess the ability of SMF to characterize different metrics differently, and (b) a case study on 10 software engineers, to assess SMF's usefulness in the metric selection process. Results: The results of the first case study suggest that different metrics that quantify the same quality attribute differ in their fluctuation. We also provide evidence that an additional factor related to a metric's fluctuation is the function used to aggregate the metric from the micro to the macro level. The outcome of the second case study suggests that SMF is capable of helping practitioners in metric selection, since: (a) different practitioners have different perceptions of metric fluctuation, and (b) this perception is less accurate than the systematic approach that SMF offers. Conclusions: SMF is a useful metric property that can improve the accuracy of metric selection. Based on SMF, we can differentiate metrics by their degree of fluctuation. Such results can provide input to researchers and practitioners in their metric selection processes.
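The paper defines SMF precisely; as a hedged illustration only, one might quantify fluctuation as the mean absolute relative change of a metric score across successive versions:

```python
# A minimal sketch of a metric-fluctuation score. The formula below
# (mean absolute relative change between successive versions) is an
# assumption for illustration, not the paper's exact SMF definition.
def fluctuation(scores):
    """scores: metric values for successive versions, oldest first."""
    deltas = [abs(b - a) / abs(a)
              for a, b in zip(scores, scores[1:]) if a != 0]
    return sum(deltas) / len(deltas) if deltas else 0.0

cbo_per_version = [12.0, 12.5, 12.4, 13.0]   # behaves as a "stable" metric
wmc_per_version = [40.0, 55.0, 38.0, 70.0]   # behaves as a "sensitive" metric
print(fluctuation(cbo_per_version))  # small value: stable
print(fluctuation(wmc_per_version))  # large value: sensitive
```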

4.
This paper describes how to build a weighted language network from a Web page document and presents a comprehensive term-centrality measure that combines the betweenness and closeness indicators. Experiments show that keywords extracted with this method match the topic of the Web page well.
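A minimal sketch of the idea, using networkx over a hypothetical word co-occurrence graph; the equal weighting of betweenness and closeness is an assumption, not the paper's combination rule:

```python
# Score candidate keywords on a word graph by combining two centralities.
import networkx as nx

G = nx.Graph()
# Hypothetical weighted co-occurrence edges extracted from a page.
G.add_weighted_edges_from([
    ("web", "search", 3), ("web", "page", 5), ("search", "engine", 4),
    ("page", "rank", 2), ("engine", "rank", 1), ("web", "engine", 2),
])

betweenness = nx.betweenness_centrality(G, weight="weight")
closeness = nx.closeness_centrality(G)

# Assumed combination: simple average of the two indicators.
combined = {w: 0.5 * betweenness[w] + 0.5 * closeness[w] for w in G}
keywords = sorted(combined, key=combined.get, reverse=True)[:3]
print(keywords)
```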

5.
The failure of Web applications often affects a large population of customers and leads to severe economic loss. Anomaly detection is essential for improving the reliability of Web applications. Current approaches model correlations among metrics and detect anomalies when the correlations are broken. However, dynamic workloads cause the metric correlations to change over time, and modeling the various metric correlations is difficult in complex Web applications. This paper addresses these problems and proposes an online anomaly detection approach for Web applications. We present an incremental clustering algorithm for training workload patterns online, and employ the local outlier factor (LOF) within the recognized workload pattern to detect anomalies. In addition, we locate the anomalous metrics with Student's t-test. We evaluated our approach on a testbed running the TPC-W industry-standard benchmark. The experimental results show that our approach is able to (1) capture workload fluctuations accurately, (2) detect typical faults effectively and (3) outperform two contemporary approaches in accuracy.
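A minimal sketch of the detection step, with scikit-learn's MiniBatchKMeans standing in for the paper's incremental clustering algorithm and the local outlier factor applied within each recognized workload pattern; the metric samples are synthetic:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical samples: rows of (requests/sec, CPU %, response time ms).
normal = rng.normal([100, 40, 120], [10, 5, 15], size=(300, 3))
faulty = rng.normal([100, 90, 800], [10, 5, 50], size=(5, 3))
samples = np.vstack([normal, faulty])

# Stand-in for online workload-pattern training.
workload = MiniBatchKMeans(n_clusters=2, random_state=0).fit_predict(samples)
for pattern in np.unique(workload):
    members = samples[workload == pattern]
    if len(members) <= 10:          # too few samples to judge
        continue
    labels = LocalOutlierFactor(n_neighbors=10).fit_predict(members)
    print(f"pattern {pattern}: {int((labels == -1).sum())} anomalies flagged")
```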

6.
Measuring the characteristics of visually emphasized objects displayed on a screen seems to be a promising way to rate user interface quality. On the other hand, it raises problems regarding the ambiguity of object recognition caused by the subjective perception of users. The goal of this research is to analyze the applicability of chosen object-based metrics for evaluating dashboard quality and their ability to distinguish well-designed samples, with a focus on the subjective perception of users. This article presents a model for rating and classifying object-based metrics according to their ability to objectively distinguish well-designed dashboards. We use the model to rate 13 existing object-based metrics of aesthetics. We then present a new approach for improving the rating of one object-based metric, Balance, by combining the object-based metric with a pixel-based analysis of the color distribution on the screen.
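As a hedged illustration of mixing a Balance-style metric with pixel-level analysis (not the article's formulation), a horizontal balance indicator can compare the luminance mass of the two halves of a screenshot:

```python
import numpy as np

def horizontal_balance(gray):
    """gray: 2-D array of pixel luminance in [0, 1]; returns a value in
    [0, 1], where 1 means the two halves carry equal luminance mass."""
    mid = gray.shape[1] // 2
    left, right = gray[:, :mid].sum(), gray[:, mid:].sum()
    heavier = max(left, right)
    return 1.0 if heavier == 0 else 1.0 - abs(left - right) / heavier

screen = np.zeros((100, 200))
screen[10:50, 20:80] = 0.9                    # bright panel on the left only
print(round(horizontal_balance(screen), 3))   # low score: unbalanced layout
```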

7.
As is common in e-commerce, consumers share their product experiences and opinions on the Web by assigning rating stars or writing reviews. This information constitutes word of mouth (WOM) about products. An increasing number of studies have sought to understand how WOM metrics relate to product sales. However, current research focuses mainly on single-product-oriented WOM metrics, which do not consider the complex relationships between products. Given the underlying influence between related products, we propose a market-structure-based WOM metric that integrates the product comparison network and transitive influence measures. An empirical study based on data from Amazon.com shows that the proposed transitive WOM metric outperforms traditional WOM metrics in predicting product sales, and its unique features are also demonstrated. The findings provide important insights for social influence theory and electronic commerce research. In practice, the research provides a method of measuring product WOM from a whole-market perspective, which is especially important for market structure analysis.

8.
One purpose of software metrics is to measure the quality of programs. The results can, for example, be used to predict maintenance costs or improve code quality. An emerging view is that if software metrics are going to be used to improve quality, they must help in finding code that should be refactored. Often refactoring or applying a design pattern is related to the role of the class to be refactored. In client-based metrics, a project gives the class a context: these metrics measure how a class is used by other classes in that context. We present a new client-based metric, LCIC (Lack of Coherence in Clients), which analyses whether the class being measured has a coherent set of roles in the program. Interfaces represent the roles of classes. If a class does not have a coherent set of roles, it should be refactored, or a new interface should be defined for it. We have implemented a tool for measuring LCIC for Java projects in the Eclipse environment. We calculated LCIC values for classes of several open source projects, compare these results with those of other related metrics, and inspect the measured classes to find out what kinds of refactorings are needed. We also analyse the relation of different design patterns and refactorings to our metric. Our experiments reveal the usefulness of client-based metrics for improving the quality of code.
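The exact LCIC definition is given in the paper; the following sketch only illustrates the flavor of a coherence measure, scoring the fraction of client pairs that use disjoint sets of the class's interfaces (the client and interface names are hypothetical):

```python
from itertools import combinations

def lack_of_coherence(client_roles):
    """client_roles: mapping client name -> set of the measured class's
    interfaces that the client uses. Higher values hint at a class whose
    clients see it in unrelated roles."""
    pairs = list(combinations(client_roles.values(), 2))
    if not pairs:
        return 0.0
    disjoint = sum(1 for a, b in pairs if not (a & b))
    return disjoint / len(pairs)

clients = {
    "ReportView":  {"Printable"},
    "ExportJob":   {"Printable", "Serializable"},
    "CacheLayer":  {"Serializable"},
    "AuditLogger": {"Auditable"},      # touches an unrelated role
}
print(round(lack_of_coherence(clients), 2))
```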

9.
The semantically associated network on the Web is a Semantic Link Network built by mining the associated relations between Web pages. An associated link from page A to page B indicates that users who have browsed page A are likely to also browse page B. This paper explores the statistical properties of the associated network on the Web. Web pages of a specific domain are automatically downloaded by a Web crawler to build an associated network. We analyze the associated network at different domain thresholds and classify its topology into three states: the original state, the kernel state and the final state. A mathematical model is built to study the in-degree, out-degree and total-degree distributions for both the kernel state and the final state. By tuning the model parameters to reasonable values, we obtain distinct power-law forms for the three degree distributions, with exponents that agree well with the statistical data. The proposed model can not only describe the evolving processes of the associated network on the Web, but also provide a theoretical basis for complex applications such as semantic community discovery, intelligent browsing and recommendation.
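As a simple illustration of checking for power-law degree distributions (not the paper's model), one can fit a line to the log-log degree histogram; careful analyses prefer maximum-likelihood estimators:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
# Hypothetical in-degree samples with a heavy tail.
degrees = np.rint(rng.pareto(2.0, size=5000) + 1).astype(int)

counts = Counter(degrees)
k = np.array(sorted(counts))                        # observed degrees
pk = np.array([counts[d] for d in k], dtype=float)
pk /= pk.sum()                                      # empirical P(k)

# Slope of the log-log histogram estimates the power-law exponent.
slope, _ = np.polyfit(np.log(k), np.log(pk), 1)
print(f"estimated power-law exponent: {-slope:.2f}")
```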

10.
In Web search, with the aid of related-query recommendation, Web users can revise their initial queries over several rounds in pursuit of the Web pages they need. In this paper, we address the problem of aggregating the search results of related queries to improve retrieval quality. Given an initial query and the suggested related queries, our search system concurrently processes their result lists from an existing search engine and then forms a single list aggregated from all the retrieved lists. We propose a generic rank aggregation framework consisting of three steps: first, build a Win/Loss graph of Web pages according to a competition rule; then apply a random walk mechanism on the Win/Loss graph; finally, sort the Web pages by a PageRank-like rank mechanism. The proposed framework considers not only the number of wins that an item obtained in competitions, but also the quality of its competitors when calculating the ranking of Web page items. Experimental results show that our search system can clearly improve retrieval quality, in a parallel manner, over the traditional search strategy that serially returns result lists. Moreover, we provide empirical evidence demonstrating how different rank aggregation methods affect retrieval quality.
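A minimal sketch of the three steps, under a simplified competition rule in which an item "wins" against every item ranked below it in the same list:

```python
import numpy as np

ranked_lists = [                     # hypothetical results of related queries
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["a", "d", "c"],
]
items = sorted({p for lst in ranked_lists for p in lst})
idx = {p: i for i, p in enumerate(items)}
n = len(items)

wins = np.zeros((n, n))              # wins[i, j]: times i beat j
for lst in ranked_lists:
    for pos, winner in enumerate(lst):
        for loser in lst[pos + 1:]:
            wins[idx[winner], idx[loser]] += 1

# Column-stochastic transitions: the walk moves from a loser to its winners;
# items that never lost get a uniform column.
col = wins.sum(axis=0, keepdims=True)
M = np.divide(wins, col, out=np.full((n, n), 1.0 / n), where=col > 0)

rank = np.full(n, 1.0 / n)
for _ in range(50):                  # damped, PageRank-like power iteration
    rank = 0.15 / n + 0.85 * M @ rank
print(sorted(zip(items, rank), key=lambda t: -t[1]))
```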

11.
Adapting Web pages for small-screen devices
We propose a page-adaptation technique that splits existing Web pages into smaller, logically related units. To do this, we must first solve two technical problems: how to detect an existing Web page's semantic structure, and how to split the page into smaller blocks based on that structure. The Web page can then be adapted to form a two-level hierarchy, with a thumbnail representation at the top level providing a global view, and an index to a set of subpages at the bottom level providing detailed information. To date, we've implemented our technique in Web browsers for mobile devices, in a proxy server for adapting Web pages on the fly, and as an authoring-tool plug-in for converting existing Web pages.

12.
Typical depth quality metrics require the ground truth depth image or stereoscopic color image pair, which are not always available in many practical applications. In this paper, we propose a new depth image quality metric which demands only a single pair of color and depth images. Our observations reveal that the depth distortion is strongly related to the local image characteristics, which in turn leads us to formulate a new distortion assessment method for the edge and non-edge pixels in the depth image. The local depth distortion is adaptively weighted using the Gabor filtered color image and added up to the global depth image quality metric. The experimental results show that the proposed metric closely approximates the depth quality metrics that use the ground truth depth or stereo color image pair.
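As a speculative, heavily simplified illustration of the no-reference idea (the paper's edge/non-edge formulation is more elaborate), one might score the agreement between depth edges and a Gabor-filtered color response:

```python
import numpy as np
from skimage.filters import gabor, sobel

def depth_quality(depth, gray):
    """Correlate depth edge strength with the color image's Gabor texture
    response; both inputs are 2-D float arrays of the same shape."""
    depth_edges = sobel(depth)
    gabor_real, _ = gabor(gray, frequency=0.2)
    a = depth_edges.ravel() - depth_edges.mean()
    b = np.abs(gabor_real).ravel() - np.abs(gabor_real).mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0   # correlation in [-1, 1]

rng = np.random.default_rng(2)
gray = rng.random((64, 64))
print(depth_quality(gray, gray))                    # aligned structure: higher
print(depth_quality(rng.random((64, 64)), gray))    # unrelated: near zero
```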

13.
With the development of Web technology and the ever-growing volume of information on the Web, providing high-quality, relevant query results has become a major challenge for Web search engines. PageRank and HITS are the two most important link-based ranking algorithms and are used in commercial search engines. In the PageRank algorithm, however, each Web page's PR value is distributed evenly among all the pages it links to, completely ignoring quality differences between pages; such an algorithm is easily attacked by present-day Web spam. Based on this observation, we propose an improvement to the PageRank algorithm called Page Quality Based PageRank (QPR). QPR dynamically evaluates the quality of each Web page and distributes each page's PR value fairly according to page quality. Comprehensive experiments on several data sets with different characteristics show that the proposed QPR algorithm greatly improves the ranking of query results and effectively mitigates the influence of spam pages on those results.
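A minimal sketch of a quality-weighted PageRank in the spirit of QPR: each page's PR value is distributed among its out-links in proportion to a per-page quality score. The proportional rule, the toy graph and the quality values are assumptions; the paper estimates page quality dynamically:

```python
import numpy as np

# Hypothetical link graph and per-page quality scores.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"],
         "spam": ["spam2"], "spam2": ["spam"]}
quality = {"a": 0.9, "b": 0.8, "c": 0.7, "spam": 0.05, "spam2": 0.05}

pages = sorted(links)
idx = {p: i for i, p in enumerate(pages)}
n = len(pages)

M = np.zeros((n, n))        # M[j, i]: share of page i's PR sent to page j
for page, outs in links.items():
    total_quality = sum(quality[o] for o in outs)
    for out in outs:        # split PR by target quality, not evenly
        M[idx[out], idx[page]] = quality[out] / total_quality

rank = np.full(n, 1.0 / n)
for _ in range(100):        # damped power iteration
    rank = 0.15 / n + 0.85 * M @ rank
print({p: round(rank[idx[p]], 3) for p in pages})
```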

14.
The perceived visual complexity of World Wide Web pages is a topic of significant interest. Previous work has examined the relationship between complexity and various aspects of presentation, including font styles, colours and images, but automatically quantifying this dimension of a web page at the level of the document remains a challenge. In this paper we demonstrate that areas of high complexity can be identified by detecting 'chunks' of a web page dense in block-level elements. We report a computational algorithm that captures this metric and places web pages in a sequence that shows an 86% correlation with the sequences generated through user judgements of complexity. The work shows that structural aspects of a web page influence how complex a user perceives it to be, and presents a straightforward means of determining complexity by examining the DOM.
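A minimal sketch of counting block-level elements per top-level 'chunk' with BeautifulSoup; the chunking rule and the tag list are simplifications of the reported algorithm:

```python
from bs4 import BeautifulSoup

BLOCK_TAGS = ["div", "p", "table", "ul", "ol", "form", "h1", "h2", "h3",
              "section", "article", "header", "footer"]

html = """
<html><body>
  <div id="nav"><ul><li>Home</li><li>About</li></ul></div>
  <div id="main"><h1>Title</h1><p>Text</p><table><tr><td>1</td></tr></table></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Treat each top-level <div> as a chunk and count block elements inside it.
for chunk in soup.body.find_all("div", recursive=False):
    blocks = chunk.find_all(BLOCK_TAGS)
    print(f"chunk {chunk.get('id')}: {len(blocks)} block-level elements")
```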

15.
16.
Improving pattern quality in web usage mining by using semantic information
Frequent Web navigation patterns generated by Web usage mining techniques provide valuable information for several applications, such as Web site restructuring and recommendation. In conventional Web usage mining, semantic information about the Web page content does not take part in the pattern generation process. In this work, we investigate the effect of semantic information on the patterns generated for Web usage mining in the form of frequent sequences. To this end, we developed a technique and a framework for integrating semantic information into the navigation pattern generation process, where frequent navigational patterns are composed of ontology instances instead of Web page addresses. The quality of the generated patterns is measured through an evaluation mechanism involving Web page recommendation. Experimental results show that more accurate recommendations can be obtained by including semantic information in navigation pattern generation, which indicates an increase in pattern quality.

17.
Computer Networks, 1999, 31(11-16): 1467-1479
When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching, where the input to the search process is not a set of query terms but the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page; for example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages), not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate their effectiveness, we performed a user study comparing our algorithms with Netscape's 'What's Related' service (http://home.netscape.com/escapes/related/). Our study showed that the precision at 10 for our two algorithms is 73% and 51% better, respectively, than that of Netscape, despite the fact that Netscape uses content and usage-pattern information in addition to connectivity information.
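One connectivity-only signal such algorithms build on is cocitation: pages that the same parent pages link to alongside the seed URL. A minimal sketch over a hypothetical link graph (the published algorithms add weighting and filtering steps omitted here):

```python
from collections import Counter

# Hypothetical link graph: parent page -> pages it links to.
links = {
    "portal1": ["nytimes.com", "washingtonpost.com", "wsj.com"],
    "portal2": ["nytimes.com", "washingtonpost.com"],
    "blog":    ["nytimes.com", "espn.com"],
}

def related(seed, links, k=2):
    """Rank candidate related pages by how often parents cocite them
    with the seed URL."""
    cocited = Counter()
    for children in links.values():
        if seed in children:
            cocited.update(c for c in children if c != seed)
    return cocited.most_common(k)

print(related("nytimes.com", links))
# [('washingtonpost.com', 2), ('wsj.com', 1)]
```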

18.
Machine learning offers a systematic framework for developing metrics that use multiple criteria to assess the quality of machine translation (MT). However, learning introduces additional complexities that may impact on the resulting metric's effectiveness. First, a learned metric is more reliable for translations that are similar to its training examples; this calls into question whether it is as effective in evaluating translations from systems that are not its contemporaries. Second, metrics trained from different sets of training examples may exhibit variations in their evaluations. Third, expensive developmental resources (such as translations that have been evaluated by humans) may be needed as training examples. This paper investigates these concerns in the context of using regression to develop metrics for evaluating machine-translated sentences. We track a learned metric's reliability across a 5 year period to measure the extent to which the learned metric can evaluate sentences produced by other systems. We compare metrics trained under different conditions to measure their variations. Finally, we present an alternative formulation of metric training in which the features are based on comparisons against pseudo-references in order to reduce the demand on human produced resources. Our results confirm that regression is a useful approach for developing new metrics for MT evaluation at the sentence level.
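A minimal sketch of metric training by regression: fit per-sentence features against human scores with scikit-learn. The two overlap features and the toy data are simplistic stand-ins for the feature sets such work uses:

```python
from sklearn.linear_model import LinearRegression

def overlap_features(hypothesis, reference):
    """Hypothetical features: unigram precision and recall against a
    (possibly pseudo-) reference translation."""
    hyp, ref = hypothesis.split(), reference.split()
    common = len(set(hyp) & set(ref))
    return [common / len(hyp), common / len(ref)]

# Hypothetical training data: (MT output, reference, human score in [0, 1]).
data = [
    ("the cat sat on the mat", "the cat sat on the mat", 1.0),
    ("cat on mat",             "the cat sat on the mat", 0.6),
    ("a dog ran",              "the cat sat on the mat", 0.1),
    ("the cat is on the mat",  "the cat sat on the mat", 0.8),
]
X = [overlap_features(h, r) for h, r, _ in data]
y = [s for _, _, s in data]

metric = LinearRegression().fit(X, y)
print(metric.predict([overlap_features("the cat sat",
                                       "the cat sat on the mat")]))
```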

19.
Given a user keyword query, current Web search engines return a list of individual Web pages ranked by their "goodness" with respect to the query. Thus, the basic unit for search and retrieval is an individual page, even though information on a topic is often spread across multiple pages. This degrades the quality of search results, especially for long or uncorrelated (multitopic) queries (in which individual keywords rarely occur together in the same document), where a single page is unlikely to satisfy the user's information need. We propose a technique that, given a keyword query, on the fly generates new pages, called composed pages, which contain all query keywords. The composed pages are generated by extracting and stitching together relevant pieces from hyperlinked Web pages and retaining links to the original Web pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms to efficiently generate the top composed pages. The quality of our method is compared to current approaches by using user surveys. Finally, we also show how our techniques can be used to perform query-specific summarization of Web pages.

20.
Lines-of-code metrics are routinely used as measures of software system complexity, programmer productivity and defect density, and are used to predict both effort and cost. The guidelines for using a direct metric, such as lines of code, as a proxy for a quality factor such as complexity or defect density, or in derived metrics such as cost and effort, are clear. Amongst other criteria, the direct metric must be linearly related to, and accurately predict, the quality factor, and this must be validated through statistical analysis following a rigorous validation methodology. In this paper, we conduct such an analysis to determine the validity and utility of lines of code as a measure, using the ISBSG-10 data set. We find that it fails to meet the specified validity tests and, therefore, has limited utility in derived measures.
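A minimal sketch of the kind of validity test described: check the linear association between lines of code and a quality factor (here, effort) with SciPy. The data below are hypothetical:

```python
from scipy import stats

loc    = [1200, 3400, 5600, 8000, 12000, 20000]   # lines of code
effort = [ 300,  650, 2400, 1900,  9000,  4100]   # person-hours

# Correlation and a fitted line; a valid proxy needs a strong, significant
# linear relationship, which these toy data do not show convincingly.
r, p_value = stats.pearsonr(loc, effort)
fit = stats.linregress(loc, effort)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f}), "
      f"effort ~ {fit.slope:.3f} * LOC + {fit.intercept:.1f}")
```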
