Similar Documents
20 similar documents found (search time: 31 ms)
1.
With the rapid growth of information and knowledge, automatically classifying text documents is becoming a hotspot of knowledge management. A critical capability of knowledge management systems is to classify text documents into different categories that are meaningful to users. In this paper, a text topic classification model based on domain ontology and the Vector Space Model is proposed. Eigenvectors used as input to the vector space model are constructed from the concepts and hierarchical structure of the ontology, which also provides the domain knowledge. However, a limited-vocabulary problem is encountered while mapping keywords to their corresponding ontology concepts. A synonymy lexicon is utilized to extend the ontology and compress the eigenvector, which solves the problem that eigenvectors are too large and complex to be calculated in traditional methods. Finally, combining the support of each concept, a top-down method following the ontology structure is used to complete topic classification. An experimental system is implemented and the model is applied to this practical system. Test results show that the model is feasible.
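The vector-space matching step this abstract describes can be illustrated with a minimal sketch. All topic names, concept terms, and weights below are hypothetical examples, not data from the paper; the paper's ontology-derived eigenvectors are stood in for by simple concept-weight dictionaries.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse concept-weight vectors (dicts).
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(doc_vec, topic_vecs):
    # Assign the document to the topic whose concept vector is most similar.
    return max(topic_vecs, key=lambda t: cosine(doc_vec, topic_vecs[t]))

# Toy ontology-concept vectors (hypothetical weights).
topics = {
    "finance": {"bank": 0.9, "loan": 0.7, "interest": 0.5},
    "sports":  {"match": 0.8, "team": 0.9, "score": 0.6},
}
doc = {"bank": 1.0, "interest": 1.0}
print(classify(doc, topics))  # → finance
```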

2.
Text classification techniques mostly rely on single-term analysis of the document data set, while many concepts, especially the more specific ones, are usually conveyed by sets of terms. To achieve a more accurate text classifier, more informative features, including frequent co-occurring words in the same sentence and their weights, are particularly important in such scenarios. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept from association rule mining, which views a sentence rather than a document as a transaction, and uses a variable-precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters and newsgroup corpora are carried out, which validate the practicability of the proposed system.
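The sentence-as-transaction idea can be sketched as follows. This is a brute-force enumeration for illustration only (the example sentences and support threshold are invented); a real implementation would use an Apriori-style miner and the paper's rough-set weighting, which is not reproduced here.

```python
from itertools import combinations
from collections import Counter

def sentential_frequent_itemsets(sentences, min_support=2, max_size=2):
    # Each sentence is one "transaction": a set of distinct terms.
    counts = Counter()
    for sent in sentences:
        terms = sorted(set(sent.lower().split()))
        for size in range(1, max_size + 1):
            for itemset in combinations(terms, size):
                counts[itemset] += 1
    # Keep only itemsets appearing in at least min_support sentences.
    return {s: c for s, c in counts.items() if c >= min_support}

sents = [
    "support vector machine",
    "vector space model",
    "support vector classifier",
]
fis = sentential_frequent_itemsets(sents)
print(fis[("support", "vector")])  # → 2 (co-occurs in two sentences)
```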

3.
Innovating Web Page Classification Through Reducing Noise
This paper presents a new method that eliminates noise in Web page classification. It first describes the representation of a Web page based on HTML tags. Then, through a novel distance formula, it eliminates the noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm (the AWN algorithm) that can distinguish related hyperlinks from noisy ones, so that non-noisy hyperlinks can be utilized to improve the performance of Web page classification. Any page can then be classified through its text and the categories of the neighbor pages related to it. The experimental results show that our approach improves classification accuracy.

4.
Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplanes using an incremental approach. Such an approach allows one to find a near global minimizer of the classification error function and to compute as few hyperplanes as needed for separating sets. We apply this algorithm for solving supervised data classification problems and report the results of numerical experiments on real-world data sets. These results demonstrate that the new algorithm requires a reasonable training time and its test set accuracy is consistently good on most data sets compared with mainstream classifiers.

5.
Web Classification Based on Latent Semantic Indexing
A new web document automatic classification algorithm based on Latent Semantic Indexing (LSIWAC) is proposed in this paper. LSIWAC uses LSI based on Singular Value Decomposition (SVD) to compress the document vector space into a lower-dimensional space. Using optimal clustering, LSIWAC clusters part of the web documents. Then it applies the optimal discriminant transform to obtain a feature vector from each cluster's discriminant features. Finally, it uses a concept classification algorithm to classify the remaining documents. LSIWAC solves the high-dimension problem and improves the precision of web classification.
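The SVD-based compression step at the heart of LSI can be sketched in a few lines. This assumes NumPy is available; the 4×5 term-document matrix and the choice of k=2 latent dimensions are toy examples, not the paper's data, and the clustering and discriminant-transform stages are omitted.

```python
import numpy as np

def lsi_project(term_doc, k):
    # Truncated SVD: keep the k largest singular values/vectors and
    # project each document (column) into the k-dimensional latent space.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim row per document

# Toy 4-term x 5-document count matrix (hypothetical counts).
A = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 1, 1],
], dtype=float)
docs_2d = lsi_project(A, k=2)
print(docs_2d.shape)  # (5, 2): five documents, two latent dimensions
```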

6.
A Fuzzy Approach to Classification of Text Documents
This paper discusses the classification problems of text documents. Based on the concept of the proximity degree, the set of words is partitioned into equivalence classes. In particular, the concepts of the semantic field and the association degree are given in this paper. Based on the above concepts, this paper presents a fuzzy classification approach for document categorization. Furthermore, applying the concept of the entropy of information, approaches are obtained to select key words from the set of words covering the classification of documents and to construct the hierarchical structure of key words.

7.
This paper proposes the notion of a greylevel difference classification algorithm in fractal image compression. An example of the greylevel difference classification algorithm is then given as an improvement of the quadrant greylevel and variance classification in the quadtree-based encoding algorithm. The algorithm incorporates the frequency feature in spatial analysis using the notion of average quadrant greylevel difference, leading to an enhancement in terms of encoding time, PSNR value and compression ratio.

8.
Classification Using Φ-Machines and Constructive Function Approximation
This article presents a new classification algorithm, called CLEF, which induces a Φ-machine by constructing its own features based on the training data. The features can be viewed as defining subsets of the instance space, and they allow CLEF to create useful non-linear functions over the input variables. The algorithm is guaranteed to find a classifier that separates the training instances, if such a separation is possible. We compare CLEF empirically to several other classification algorithms, including a well-known decision tree inducer, an artificial neural network inducer, and a support vector machine inducer. Our results show that the CLEF-induced Φ-machines and support vector machines have similar accuracy on the suite tested, and that both are significantly more accurate than the other classifiers produced. We argue that the classifiers produced by CLEF are easy to interpret, and hence may be preferred over support vector machines in certain circumstances.

9.
With the rapid spread of networks, packet classification techniques are widely applied in all aspects of network communication, which has also accelerated research on packet classification algorithms. This paper briefly introduces the steps of such algorithms and the performance metrics used to evaluate them, elaborates on the RFC algorithm, which is better suited to practical applications, and proposes an improvement to the RFC algorithm.

10.
Brain–computer interfaces (BCIs) are recent developments in alternative technologies of user interaction. The purpose of this paper is to explore the potential of BCIs as user interfaces for CAD systems. The paper describes experiments and algorithms that use the BCI to distinguish between primitive shapes that are imagined by a user. Users wear an electroencephalogram (EEG) headset and imagine the shape of a cube, sphere, cylinder, pyramid or a cone. The EEG headset collects brain activity from 14 locations on the scalp. The data is analyzed with independent component analysis (ICA) and the Hilbert–Huang Transform (HHT). The features of interest are the marginal spectra of different frequency bands (theta, alpha, beta and gamma bands) calculated from the Hilbert spectrum of each independent component. The Mann–Whitney U-test is then applied to rank the EEG electrode channels by relevance in five pair-wise classifications. The features from the highest ranking independent components form the final feature vector which is then used to train a linear discriminant classifier. Results show that this classifier can discriminate between the five basic primitive objects with an average accuracy of about 44.6% (compared to a naïve classification rate of 20%) over ten subjects (accuracy range of 36%–54%). The classification accuracy changes to 39.9% when both visual and verbal cues are used. The repeatability of the feature extraction and classification was checked by conducting the experiment on 10 different days with the same participants. This shows that the BCI holds promise in creating geometric shapes in CAD systems and could be used as a novel means of user interaction.
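The channel-ranking step based on the Mann–Whitney U-test can be sketched with a minimal rank-sum implementation. The band-power values below are invented placeholders, not EEG data from the study, and this sketch handles ties with simple midranks rather than the full test machinery of a statistics library.

```python
def mann_whitney_u(x, y):
    # Rank-sum computation of the Mann-Whitney U statistic for sample x.
    # Small-sample sketch with midranks for ties; not a full test (no p-value).
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = {}
    j = 0
    while j < len(combined):
        k = j
        while k + 1 < len(combined) and combined[k + 1][0] == combined[j][0]:
            k += 1
        midrank = (j + k) / 2 + 1          # average rank for tied values
        for m in range(j, k + 1):
            ranks[combined[m][1]] = midrank
        j = k + 1
    r1 = sum(ranks[i] for i in range(len(x)))
    return r1 - len(x) * (len(x) + 1) / 2

# Hypothetical band-power features from one electrode, two imagined shapes.
cube   = [0.8, 0.9, 1.1, 1.0]
sphere = [0.2, 0.3, 0.4, 0.1]
print(mann_whitney_u(cube, sphere))  # 16.0: complete separation (4 x 4 pairs)
```

Channels whose U statistic is farthest from n1·n2/2 separate the two classes best, which is the sense in which the paper ranks electrodes by relevance.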

11.
Classification is an important technique in data mining. The decision trees built by most of the existing classification algorithms commonly feature over-branching, which leads to poor efficiency in the subsequent classification period. In this paper, we present a new value-oriented classification method, based on the concepts of frequent-pattern-node and exceptive-child-node, which aims at building accurately proper-sized decision trees while reducing over-branching as much as possible. The experiments show that, using relevance analysis as pre-processing, our classification method, without loss of accuracy, can eliminate over-branching in decision trees more effectively and efficiently than other algorithms do.

12.
A Non-Collision Hash Trie-Tree Based Fast IP Classification Algorithm
With the development of network applications, routers must support such functions as firewalls, provision of QoS, traffic billing, etc. All these functions require the classification of IP packets, which determines how each packet is processed subsequently. In this article, a novel IP classification algorithm is proposed based on the Grid of Tries algorithm. The new algorithm not only eliminates the original limitations in the case of multiple fields but also shows better performance in regard to both time and space. It has better overall performance than many other algorithms.
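The single-field building block underlying trie-based classifiers like Grid of Tries is longest-prefix matching on a binary trie, which can be sketched as follows. The prefixes and rule names are hypothetical; the paper's multi-field grid structure and hash optimization are not reproduced here.

```python
class TrieNode:
    __slots__ = ("children", "rule")
    def __init__(self):
        self.children = {}   # '0'/'1' -> TrieNode
        self.rule = None     # rule attached at this prefix, if any

def ip_to_bits(ip):
    # Dotted-quad IPv4 address to a 32-character bit string.
    return "".join(f"{int(o):08b}" for o in ip.split("."))

def insert(root, prefix, length, rule):
    node = root
    for bit in ip_to_bits(prefix)[:length]:
        node = node.children.setdefault(bit, TrieNode())
    node.rule = rule

def lookup(root, ip):
    # Longest-prefix match: remember the deepest rule seen on the path.
    node, best = root, None
    for bit in ip_to_bits(ip):
        if node.rule is not None:
            best = node.rule
        if bit not in node.children:
            break
        node = node.children[bit]
    else:
        if node.rule is not None:
            best = node.rule
    return best

root = TrieNode()
insert(root, "10.0.0.0", 8, "internal")
insert(root, "10.1.0.0", 16, "billing")
print(lookup(root, "10.1.2.3"))  # billing (longest matching prefix wins)
print(lookup(root, "10.9.9.9"))  # internal
```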

13.
Map recognition is an essential data input means of Geographic Information System (GIS). How to solve the problems in the procedure, such as recognition of maps with crisscross pipeline networks, classification of buildings and roads, and processing of connected text, is a critical step for GIS keeping high-speed development. In this paper, a new recognition method of pipeline maps is presented, and some common patterns of pipeline connection and component labels are established. Through pattern matching, pipelines and component labels are recognized and peeled off from maps. After this approach, maps simply consist of buildings and roads, which are recognized and classified with fuzzy classification method. In addition, the Double Sides Scan (DSS) technique is also described, through which the effect of connected text can be eliminated.

14.
In this paper, we mainly used a MODIS NDVI time-series dataset at 16-day temporal resolution and 250-meter spatial resolution to analyze land cover mapping of northeastern China. We used two different filter methods to fit the NDVI time-series dataset, and compared their average class separability based on the Jeffries-Matusita distance index. In addition, we made use of a hierarchical classification method, combined with short-wave infrared spectral reflectance data and DEM, to complete the classification. We followed the principle of hierarchically separating the area into several parts first and then classifying each part further, using a single characteristic band first and then multiple feature bands. In the process of classification, we adopted the threshold value method, support vector machine, artificial neural network and C5.0 decision tree classification to distinguish each land-cover type hierarchically. Finally, we evaluated the accuracy of the final classification of the study area using known land-cover classification data and high-resolution remote sensing imagery; the overall accuracy is 84.61% and the Kappa coefficient is 0.8262.

15.
Neural Processing Letters - Several convolutional neural network architectures have been proposed for handwritten character recognition. However, most of the conventional architectures demand large...

16.
Although a β-turn consists of only four amino acids, it assumes many different types in proteins. Does this depend basically on the tetrapeptide sequence alone, or does it result from a variety of interactions with the other parts of a protein? To answer this question, T. Kohonen's self-organization model, one of the typical neural networks, is applied in a form that can reflect the sequence-coupling effect determining not only whether a tetrapeptide is a β-turn or non-β-turn, but also the particular type of β-turn. The training database, as constructed recently by Chou and Blinn (1997), contains 6028 β-turn tetrapeptides of types I (1227), I′ (125), II (405), II′ (89), VI (55), VIII (320), and non-β-turns (3807). Using these training data, the rate of correct prediction by the neural network for a given protein, rubredoxin (54 residues, 51 tetrapeptides), which includes 12 type I β-turn tetrapeptides, 1 type II β-turn tetrapeptide and 38 non-β-turns, reaches 90.2%. The high quality of prediction of the neural network model implies that the formation of different β-turn types or non-β-turns is considerably correlated with the sequence of the tetrapeptide.

17.
In the light of multi-continued fraction theories, we make a classification and counting of multi-strict continued fractions, which correspond to multi-sequences of multiplicity m and length n. Based on this counting, we develop an iterative formula for fast computation of the linear complexity distribution of multi-sequences. As an application, we obtain the linear complexity distributions and expectations of multi-sequences of any given length n and multiplicity m less than 12 on a personal computer; however, only the results for m=3 and 4 are given in this paper.

18.
This article describes a novel method that models the correlation among acoustic observations in contiguous speech segments. The basic idea behind the method is that acoustic observations are conditioned not only on the phonetic context but also on the preceding acoustic segment observation. The correlation between consecutive acoustic observations is modeled by mean trajectory polynomial segment models (PSM). This method is an extension of conventional segment modeling approaches in that it describes the correlation of acoustic observations not only inside segments but also between contiguous segments. It is also a generalization of phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. Using the proposed method in a speaker-independent phoneme classification test resulted in a 7 to 9% relative reduction of error rate as compared with the traditional triphone segmental model system and a 31% reduction as compared with a similar triphone hidden Markov model (HMM) system.

19.
It is known that latent semantic indexing (LSI) takes advantage of implicit higher-order (or latent) structure in the association of terms and documents. Higher-order relations in LSI capture "latent semantics". These findings have inspired a novel Bayesian framework for classification named Higher-Order Naive Bayes (HONB), which was introduced previously, that can explicitly make use of these higher-order relations. In this paper, we present a novel semantic smoothing method named Higher-Order Smoothing (HOS) for the Naive Bayes algorithm. HOS is built on a graph-based data representation similar to that of HONB, which allows semantics in higher-order paths to be exploited. We take the concept one step further in HOS and exploit the relationships between instances of different classes. As a result, we move beyond not only instance boundaries, but also class boundaries, to exploit the latent information in higher-order paths. This approach improves the parameter estimation when dealing with insufficient labeled data. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.

20.
We have examined the contribution of multipolarized airborne radar data for the discrimination of crops. An unsupervised classification algorithm and a maximum likelihood supervised classification were used and compared. The results show that multipolarized radar data offer an accurate means of identifying crops. The average classification accuracies were 83 and 79 per cent for the supervised and unsupervised methods respectively. Comparison of the two methods using the same data suggests that the unsupervised method gives essentially similar results to those using the supervised classification method; however, the unsupervised method requires far less field effort and computer time.
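The unsupervised side of such a comparison is typically a clustering step over the radar measurements. The sketch below runs plain k-means (Lloyd's algorithm) on scalar values; the "backscatter" numbers and crop labels are invented for illustration and are not the study's data or its exact algorithm.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's algorithm on scalar values (unsupervised clustering).
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical radar backscatter values for two crop types.
wheat = [1.0, 1.2, 0.9, 1.1]
corn  = [3.0, 3.2, 2.9, 3.1]
centers = kmeans(wheat + corn, k=2)
print(centers)  # close to the per-crop means (~1.05 and ~3.05)
```

With well-separated classes like these, the clusters recovered without labels line up with the crop types, which mirrors the abstract's finding that the unsupervised method approaches supervised accuracy at far lower field-data cost.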

Copyright © Beijing Qinyun Technology Development Co., Ltd.