Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Automatically countering imbalance and its empirical relationship to cost   (Total citations: 3; self-citations: 1; citations by others: 3)
Learning from imbalanced data sets presents a convoluted problem from both the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, such as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, there is a correspondingly high cost for the misclassification of rare events. Under such circumstances, the data set is often re-sampled to generate models with high minority class accuracy. However, the sampling methods face a common, but important, criticism: how to automatically discover the proper amount and type of sampling? To address this problem, we propose a wrapper paradigm that discovers the amount of re-sampling for a data set based on optimizing evaluation functions like the f-measure, Area Under the ROC Curve (AUROC), cost, cost-curves, and the cost dependent f-measure. Our analysis of the wrapper is twofold. First, we report the interaction between different evaluation and wrapper optimization functions. Second, we present a set of results in a cost-sensitive environment, including scenarios of unknown or changing cost matrices. We also compared the performance of the wrapper approach versus cost-sensitive learning methods (MetaCost and the Cost-Sensitive Classifiers) and found the wrapper to outperform the cost-sensitive classifiers in a cost-sensitive environment. Lastly, we obtained the lowest cost per test example compared to any result we are aware of for the KDD-99 Cup intrusion detection data set.
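As a rough illustration of the wrapper idea in this abstract, the sketch below searches over candidate undersampling fractions for the majority class and keeps the one that maximizes the f-measure on a validation set. It is a toy under stated assumptions, not the authors' method: it uses random undersampling only, a one-dimensional threshold learner, and hypothetical names (`undersample`, `train_threshold`, `wrapper_select`, `make_data`) that do not appear in the abstract.

```python
import random


def f_measure(y_true, y_pred, positive=1):
    """F1 score of the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def undersample(X, y, keep_fraction, rng, majority=0):
    """Keep every minority example and a random fraction of the majority class."""
    kept = [(x, label) for x, label in zip(X, y)
            if label != majority or rng.random() < keep_fraction]
    return [x for x, _ in kept], [label for _, label in kept]


def train_threshold(X, y):
    """Toy 1-D learner (assumption): predict 1 when x >= t, with t chosen
    to minimize training errors."""
    best_t, best_err = None, None
    for t in sorted(set(X)):
        err = sum(1 for x, label in zip(X, y) if (x >= t) != (label == 1))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0


def wrapper_select(X_tr, y_tr, X_val, y_val, fractions, seed=0):
    """Return (fraction, score) for the fraction with the best validation f-measure."""
    best = (None, -1.0)
    for frac in fractions:
        rng = random.Random(seed)  # same random stream for every candidate
        Xs, ys = undersample(X_tr, y_tr, frac, rng)
        model = train_threshold(Xs, ys)
        score = f_measure(y_val, [model(x) for x in X_val])
        if score > best[1]:
            best = (frac, score)
    return best


def make_data(rng, n_major, n_minor):
    """Synthetic imbalanced 1-D data: majority around 0, minority around 2."""
    X = [rng.gauss(0.0, 1.0) for _ in range(n_major)]
    X += [rng.gauss(2.0, 1.0) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y
```

The evaluation function is a parameter of the search in spirit: swapping `f_measure` for a cost-based score would turn the same loop into a cost-driven wrapper.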

2.
This paper juxtaposes the probability matching paradox of decision theory and the magnitude of reinforcement problem of animal learning theory to show that simple classifier system bidding structures are unable to match the range of behaviors required in the deterministic and probabilistic problems faced by real cognitive systems. The inclusion of a variance-sensitive bidding (VSB) mechanism is suggested, analyzed, and simulated to enable good bidding performance over a wide range of nonstationary probabilistic and deterministic environments.

3.
Ming Tan. Machine Learning, 1993, 13(1): 7-33
Traditional learning-from-examples methods assume that examples are given beforehand and all features are measured for each example. However, in many robotic domains the number of features that could be measured is very large, the cost of measuring those features is significant, and thus the robot must judiciously select which features it will measure. Finding a proper tradeoff between the accuracy (e.g., number of prediction errors) and efficiency (e.g., cost of measuring features) during learning (prior to convergence) is an important part of the problem. Inspired by such robotic domains, this article considers realistic measurement costs of features in the process of incremental learning of classification knowledge. It proposes a unified framework for learning-from-examples methods that trade off accuracy for efficiency during learning, and analyzes two methods (CS-ID3 and CS-IBL) in detail. Moreover, this article illustrates the application of such a cost-sensitive-learning method to a real robot designed for an approach-recognize task. The resulting robot learns to approach, recognize, and grasp objects on a floor effectively and efficiently. Experimental results show that highly accurate classification procedures can be learned without sacrificing efficiency in the case of both synthetic and real domains.

4.
Electronic aggression, or cyberbullying, is a relatively new phenomenon. As such, consistency in how the construct is defined and operationalized has not yet been achieved, inhibiting a thorough understanding of the construct and how it relates to developmental outcomes. In a series of two studies, exploratory and confirmatory factor analyses (EFAs and CFAs respectively) were used to examine whether electronic aggression can be measured using items similar to those used for measuring traditional bullying, and whether adolescents respond to questions about electronic aggression in the same way they do for traditional bullying. For Study I (n = 17 551; 49% female), adolescents in grades 8-12 were asked to what extent they had experience with physical, verbal, social, and cyberbullying as a bully and victim. EFA and CFA results revealed that adolescents distinguished between the roles they play (bully, victim) in a bullying situation but not forms of bullying (physical, verbal, social, cyber). To examine this further, Study II (n = 733; 62% female) asked adolescents between the ages of 11 and 18 to respond to questions about their experience sending (bully), receiving (victim), and/or seeing (witness) specific online aggressive acts. EFA and CFA results revealed that adolescents did not differentiate between bullies, victims, and witnesses; rather, they made distinctions among the methods used for the aggressive act (i.e. sending mean messages or posting embarrassing pictures). In general, it appears that adolescents differentiated themselves as individuals who participated in a specific mode of online aggression, rather than as individuals who played a particular role in online aggression. This distinction is discussed in terms of policy and educational implications.

5.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. It is shown that AB gives consistently better results than bagging, both in accuracy and stability. The performance of ensemble voting in bagging and the AB method as a function of the attribute subset size and the number of voters for both weighted and unweighted voting is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.
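The core of attribute bagging as described here (random feature subsets, one classifier per projection, then voting) can be sketched in a few lines. This is a minimal sketch under assumptions: the subset size is passed in rather than established automatically, voting is unweighted, and a nearest-centroid base learner stands in for "any learning algorithm". The names `train_centroid` and `attribute_bagging` are hypothetical.

```python
import random
from collections import Counter


def train_centroid(X, y, features):
    """Nearest-centroid base learner restricted to the given feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
    centroids = {label: [s / counts[label] for s in acc]
                 for label, acc in sums.items()}

    def predict(row):
        # Squared Euclidean distance computed on the projected features only.
        return min(
            centroids,
            key=lambda label: sum((row[f] - c) ** 2
                                  for f, c in zip(features, centroids[label])),
        )

    return predict


def attribute_bagging(X, y, subset_size, n_voters, seed=0):
    """Train n_voters classifiers on random feature subsets; predict by majority vote."""
    rng = random.Random(seed)
    n_features = len(X[0])
    voters = [
        train_centroid(X, y, rng.sample(range(n_features), subset_size))
        for _ in range(n_voters)
    ]

    def predict(row):
        votes = Counter(voter(row) for voter in voters)
        return votes.most_common(1)[0][0]

    return predict
```

Ranking the subsets by held-out accuracy and keeping only the best voters, as the abstract mentions, would be a small extension of `attribute_bagging`.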

6.
Due to the wide variety of fusion techniques available for combining multiple classifiers into a more accurate classifier, a number of good studies have been devoted to determining in what situations some fusion methods should be preferred over other ones. However, the sample size behavior of the various fusion methods has hitherto received little attention in the literature of multiple classifier systems. The main contribution of this paper is thus to investigate the effect of training sample size on their relative performance and to gain more insight into the conditions for the superiority of some combination rules. A large experiment is conducted to study the performance of some fixed and trainable combination rules for executing one- and two-level classifier fusion for different training sample sizes. The experimental results yield the following conclusions: when implementing one-level fusion to combine homogeneous or heterogeneous base classifiers, fixed rules outperform trainable ones in nearly all cases, with the sole exception of merging heterogeneous classifiers for large sample size. Moreover, the best classification for any considered sample size is generally achieved by a second level of combination (namely, utilizing one fusion rule to further combine a set of ensemble classifiers with each of them constructed by fusing base classifiers). Under these circumstances, it seems that adopting different types of fusion rules (fixed or trainable) as the combiners for two levels of fusion is appropriate.
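For readers unfamiliar with the term, fixed combination rules need no training: they combine the base classifiers' outputs with a predefined function. The sketch below shows three common fixed rules (mean, product, majority vote) applied to per-classifier posterior estimates. The abstract does not list which rules were studied, so this selection and the function names are assumptions for illustration only.

```python
from collections import Counter
from math import prod


def _argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def mean_rule(posteriors):
    """Average the class posteriors over classifiers, then take the best class."""
    n_classes = len(posteriors[0])
    return _argmax([sum(p[c] for p in posteriors) / len(posteriors)
                    for c in range(n_classes)])


def product_rule(posteriors):
    """Multiply the class posteriors over classifiers, then take the best class."""
    n_classes = len(posteriors[0])
    return _argmax([prod(p[c] for p in posteriors) for c in range(n_classes)])


def majority_vote(posteriors):
    """Each classifier votes for its own top class; the most frequent vote wins."""
    votes = Counter(_argmax(p) for p in posteriors)
    return votes.most_common(1)[0][0]
```

With posteriors `[[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]`, the mean and product rules pick class 0 because one classifier is very confident, while the majority vote picks class 1. A trainable rule would instead fit a second-stage model on such outputs.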

7.
As a broad subfield of artificial intelligence, machine learning is concerned with the development of algorithms and techniques that allow computers to learn. Methods such as fuzzy logic, neural networks, support vector machines, decision trees and Bayesian learning have been applied to learn meaningful rules; however, the only drawback of these methods is that they often get trapped in a local optimum. In contrast with these machine learning methods, a genetic algorithm (GA) can obtain better results owing to its evolution-inspired global search. GA has given rise to two new fields of research where global optimization is of crucial importance: genetic based machine learning (GBML) and genetic programming (GP). This article adopts the GBML technique to provide a three-phase knowledge extraction methodology, which supports continuous and instant learning while integrating multiple rule sets into a centralized knowledge base. Moreover, the proposed system and GP are both applied in theoretical and empirical experiments. Results for both approaches are presented and compared. This paper makes two important contributions: (1) it uses three criteria (accuracy, coverage, and fitness) in the knowledge extraction process, which is very effective in selecting an optimal set of rules from a large population; (2) the experiments show that the rule sets derived by the proposed approach are more accurate than those derived by GP.

8.
Cost curves: An improved method for visualizing classifier performance   (Total citations: 10; self-citations: 0; citations by others: 10)
This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier's performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors. Editor: Tom Fawcett
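The abstract does not state the formulas, so the sketch below relies on the definitions usually given for cost curves, which should be checked against the paper: the x-axis is the probability cost PC(+) = p(+)C(-|+) / (p(+)C(-|+) + p(-)C(+|-)), and a classifier with false positive rate FP and false negative rate FN has normalized expected cost FN * PC(+) + FP * (1 - PC(+)), a straight line from (0, FP) to (1, FN). All function names are hypothetical.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis value PC(+): positive-class prior weighted by misclassification costs.

    cost_fn is the cost of missing a positive, cost_fp the cost of a false alarm.
    """
    pos = p_pos * cost_fn
    neg = (1.0 - p_pos) * cost_fp
    return pos / (pos + neg)


def normalized_expected_cost(fpr, fnr, pc_pos):
    """y-axis value: a classifier is a line from (0, fpr) to (1, fnr)."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


def lower_envelope(classifiers, pc_values):
    """Best achievable cost at each operating point over a set of (fpr, fnr) pairs."""
    return [
        min(normalized_expected_cost(fpr, fnr, pc) for fpr, fnr in classifiers)
        for pc in pc_values
    ]


# The two trivial classifiers: always predict negative, always predict positive.
TRIVIAL = [(0.0, 1.0), (1.0, 0.0)]
```

Including the two trivial classifiers in `lower_envelope` shows directly where a learned classifier is no better than always predicting one class, which is one of the assessments that is harder to read off an ROC plot.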

9.
In this research, we propose two new clustering algorithms, the improved competitive learning network (ICLN) and the supervised improved competitive learning network (SICLN), for fraud detection and network intrusion detection. The ICLN is an unsupervised clustering algorithm, which applies new rules to the standard competitive learning neural network (SCLN). The network neurons in the ICLN are trained to represent the center of the data by a new reward-punishment update rule. This new update rule overcomes the instability of the SCLN. The SICLN is a supervised version of the ICLN. In the SICLN, the new supervised update rule uses the data labels to guide the training process to achieve a better clustering result. The SICLN can be applied to both labeled and unlabeled data and is highly tolerant of missing or delayed labels. Furthermore, the SICLN is able to reconstruct itself and is thus completely independent of the initial number of clusters. To assess the proposed algorithms, we have performed experimental comparisons on both research data and real-world data in fraud detection and network intrusion detection. The results demonstrate that both the ICLN and the SICLN achieve high performance, and the SICLN outperforms traditional unsupervised clustering algorithms.

10.
Within the context of an introductory CS1 unit on algorithmic problem-solving, we are exploring the pedagogical value of a novel active learning activity, the “studio experience”, that actively engages learners with algorithm visualization technology. In a studio experience, student pairs are tasked with (a) developing a solution to an algorithm design problem, (b) constructing an accompanying visualization with a storyline, and finally (c) presenting that visualization for feedback and discussion in a session modeled after an architectural “design crit.” Is a studio experience educationally valuable? What kind of technology can best support it? To explore these questions, we conducted an empirical study of two alternative CS1 studio experiences in which students used one of two different kinds of algorithm development and visualization technology: (a) a text editor coupled with art supplies, or (b) ALVIS Live!, a computer-based algorithm development and visualization tool. We found that the students who used ALVIS Live! developed algorithms with significantly fewer semantic errors. Moreover, discussions mediated by ALVIS Live! had significantly more student audience contributions, and retained a sharper focus on the specific details of algorithm behavior, leading to the collaborative identification and repair of semantic errors. In addition, discussions mediated by both ALVIS Live! and art supplies contained substantial evidence of higher order thinking. Based on our results, we make recommendations for educators interested in exploring studio-based approaches, and we propose an agenda for future research into studio-based learning in computer science education.

11.
Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews by using learning models like Support Vector Machines (SVM) and Naïve Bayes (NB). SVM have been extensively and successfully used as a sentiment learning approach while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN regarding document-level sentiment analysis. We discuss requirements, resulting models and contexts in which both approaches achieve better levels of classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicated that ANN produce results superior or at least comparable to those of SVM. In particular, on the benchmark dataset of movie reviews, ANN outperformed SVM by a statistically significant difference, even in the context of unbalanced data. Our results have also confirmed some potential limitations of both models, which have been rarely discussed in the sentiment classification literature, like the computational cost of SVM at running time and of ANN at training time.

12.
The purpose of this study was to consider the efficacy and popularity of “Virtual Lectures” (text-based, structured electronic courseware with information presented in manageable “chunks”, interaction and multimedia) and “e-Lectures” (on-screen synchrony of PowerPoint slides and recorded voice) as alternatives to traditional lectures. We considered how the three modes of delivery compare when increasingly deeper forms of learning are assessed and also student reaction to electronic delivery. Fifty-eight students in three groups took three topics of a human genetics module, one in each delivery style. Results indicated no overall greater efficacy of any delivery style when all question types were taken into account but significantly different delivery-specific results depending on which level of Bloom’s taxonomy was assessed. That is, overall, questions assessing knowledge consistently achieved the highest marks followed by analysis, comprehension, evaluation and application. Students receiving traditional lectures scored significantly lower marks for comprehension questions. Students receiving Virtual Lectures scored high for knowledge, comprehension and application but significantly lower for analysis and evaluation questions. The e-Lectures scored high for knowledge questions and were the median for all question types except application. Questionnaire analysis revealed a preference for traditional lectures over computer-based ones but nevertheless an appreciation of the advantages offered by the latter.

13.
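Research on Classifier Extraction and Training Methods for Network Traffic Anomaly Detection   (Total citations: 2; self-citations: 0; citations by others: 2)
Zheng Liming, Zou Peng, Jia Yan, Han Weihong. Chinese Journal of Computers (计算机学报), 2012, 35(4): 719-729, 827
As research in network security has deepened, researchers have proposed many kinds of traffic anomaly detection methods, among which classification-based methods form an important class. However, because network environments are diverse and change dynamically, a detection system that achieves high accuracy on its training data set may produce a large number of false alarms when actually deployed. To address the difficulty of obtaining a training model and the dynamic nature of the deployment environment, this paper studies how to select, use, and train the classifier. First, network traffic data are projected onto hash histograms of different dimensions to construct detection vectors; on the basis of these vectors, various classifiers are compared, and SVDD, which can handle high-dimensional data and generalizes well, is chosen for anomaly detection. An incremental/decremental online training algorithm is then used to train the classifier continuously, improving the accuracy of the anomaly detection system and reducing training cost. Finally, a multi-step correlation detection algorithm is used to optimize detection accuracy and to remove obviously anomalous samples from newly added samples, which reduces training cost and improves classification accuracy. Experiments on a large volume of real network traffic data show that the method achieves high detection accuracy and a low false-alarm rate and effectively reduces training cost.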
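The first step in this abstract, projecting traffic onto hash histograms to build a detection vector, can be illustrated as follows. This is only a sketch of the general idea: the abstract does not say which flow fields, hash functions, or bin counts the authors used, so the field names, the use of CRC32, and the function names here are all assumptions.

```python
import zlib


def hash_histogram(values, n_bins, salt=""):
    """Hash each value into one of n_bins buckets and return normalized counts."""
    counts = [0] * n_bins
    for value in values:
        # CRC32 is used only because it is deterministic across runs (assumption).
        bucket = zlib.crc32((salt + value).encode("utf-8")) % n_bins
        counts[bucket] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def detection_vector(flows, fields, n_bins):
    """Concatenate one hash histogram per flow field into a single feature vector."""
    vector = []
    for field in fields:
        values = [str(flow[field]) for flow in flows]
        vector.extend(hash_histogram(values, n_bins, salt=field))
    return vector
```

One vector would be computed per time window and passed to a one-class model such as SVDD; a sudden shift in how mass is spread across the bins (for example, many new destination ports during a scan) moves the vector away from the region learned from normal traffic.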

14.
Although the use of ATMs in the U.S. is approaching saturation, ATM deployment is still increasing in other parts of the world, such as Australia, China, Canada, Germany, and the United Kingdom. The relationship between IT investment and firm performance has been extensively studied, but few researchers have examined the impact of self-service technology (SST) in general and ATMs in particular on a firm's cost efficiency. Given the growing importance of SSTs in the banking industry, it is surprising that so few studies have examined the impact of SST on banking cost efficiency. We therefore empirically examined the cost efficiency effects of ATMs.

15.
Project management is vital to the effective application of organizational resources to competing demands within and across projects. The effective application of project management, however, is predicated upon accurate estimates of the project budget and schedule. This study assesses primary and supporting activities that exploit knowledge within an organization's memory to develop project schedule durations and budgets. The study further assesses the subsequent impact of predictability on project success. Two hundred and sixteen survey responses from IT professionals with project management responsibilities were analyzed. Results showed that the primary activities of parametric estimating (use of formal models) and bottom-up estimating (formulating estimates at the task level), together with the support activities of team reliance, realistic targets, and professional experience, all affect the predictability of estimates for project cost and duration. Predictability in turn was found to directly affect project success with regard to meeting cost and duration objectives. While use of analogous estimating techniques (using similar previous projects) was not found to be useful for project managers with more experience, it was used by project managers with less experience in determining predictability.

16.
In this paper, we investigate the impact of flow (operationalized as heightened challenge and skill), engagement, and immersion on learning in game-based learning environments. The data was gathered through a survey from players (N = 173) of two learning games (Quantum Spectre: N = 134 and Spumone: N = 40). The results show that engagement in the game has a clear positive effect on learning; however, we did not find a significant effect of immersion in the game on learning. Challenge of the game had a positive effect on learning both directly and via the increased engagement. Being skilled in the game did not affect learning directly but did so by increasing engagement in the game. Both the challenge of the game and being skilled in the game had a positive effect on both being engaged and immersed in the game. The challenge in the game was an especially strong predictor of learning outcomes. For the design of educational games, the results suggest that the challenge of the game should be able to keep up with the learners' growing abilities and learning in order to support continued learning in game-based learning environments.

17.
This paper investigates occupational stress, Type A behavior pattern, work attitudes, health symptoms, and health behaviors among information systems personnel. Hundreds of research studies have been conducted on the stress associated with working in various occupations. Unfortunately, information systems is one occupation that has not been included in these stress studies. The present study investigated the stress, work attitudes, and health behaviors of 446 information systems personnel employed in 18 different organizations. Type A behavior pattern was found to be a significant moderator for some of the stressor-criterion associations. The findings suggest that more managerial understanding of person-environment fit in general and the individual employee's predisposition toward the Type A behavior pattern specifically may be beneficial in attempting to initiate, nurture, and sustain a productive and healthy work atmosphere.

18.
Using the model for learning and teaching activities (MOLTA), a new technology-enhanced hybrid instruction was designed, developed and implemented. The effectiveness of the hybrid instruction with regard to students’ achievement, knowledge retention, attitudes towards the subject, and course satisfaction was evaluated in comparison to traditional classroom instruction. An experimental study with a pre-test, post-test control group design was carried out. The sample of the study consisted of 50 university students enrolled in a “Computer Networks and Communication” course. The control and experimental groups comprised 24 and 26 students respectively, and the experiment lasted 14 weeks. The findings of the study indicated no significant difference between the hybrid course and the traditional course in students’ achievement, knowledge retention, satisfaction, and attitude.

19.
Yair Levy. Computers & Education, 2008, 51(4): 1664-1675
According to activity theory, activities are at the center of human behavior. Extensive attention has been given in the literature to the success and effectiveness of online learning programs. Value theory suggests that human perceived value is a critical construct in investigating what is important to individuals. However, very limited attention has been given in the literature to the role of users’ perceived value of learning activities in educational settings. Scholars suggest that additional studies on learning activities are needed in order to advance the current knowledge of the use of information systems in education. Therefore, this study investigated issues related to learners’ perceived value by uncovering the critical value factors (CVFs) of online learning activities. Participants in this study included 209 graduate students attending an online learning program. This study extended the first phase done in prior research to uncover the CVFs of online learning activities. Results of this research study produced five reliable CVFs: (a) collaborative, social, and passive learning activities; (b) formal communication activities; (c) formal learning activities; (d) logistic activities; and (e) printing activities.

20.
This article describes the findings from the assessment of a touch-screen, multi-media learning program on livestock health and production: The Daktari. The program was tested on a sample of 62 livestock keepers in the Nairobi slums of Kariobangi and Kibera. The study examined prior knowledge regarding three livestock diseases (liver fluke, mastitis and mange) and compared this to newly acquired knowledge after exposure to the software. The results demonstrated a significant difference between pre- and post-knowledge assessments confirming that use of the program led to learning. Learning occurred among a variety of demographic/social groups (i.e. age, gender and education) with a range of abilities. Indeed, by utilising an audio-visual interface developed with relevant content for the population in question, it was found that the program could support and enhance participant understanding of livestock disease causation, diagnosis, treatment and prevention.
