Similar Documents
20 similar documents found (search time: 22 ms).
1.
Two hierarchical classifier strategies, which minimize the global and local probabilities of misclassification respectively, are presented. A modified version of the k-NN rule for a hierarchical classifier is proposed. Numerical examples are given.

2.
3.
Clustering is a very powerful data mining technique for topic discovery from text documents. Partitional clustering algorithms, such as the family of k-means, are reported to perform well on document clustering. They treat clustering as an optimization process that groups documents into k clusters so that a particular criterion function is minimized or maximized. Usually, the cosine function is used to measure the similarity between two documents in the criterion function, but it may not work well when the clusters are not well separated. To solve this problem, we applied the concepts of neighbors and link, introduced in [S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366], to document clustering. If two documents are similar enough, they are considered neighbors of each other, and the link between two documents is the number of their common neighbors. Instead of considering only pairwise similarity, the neighbors and link bring global information into the measurement of the closeness of two documents. In this paper, we propose to use neighbors and link for the family of k-means algorithms in three aspects: a new method to select initial cluster centroids based on the ranks of candidate documents; a new similarity measure that combines the cosine and link functions; and a new heuristic function for selecting a cluster to split based on the neighbors of the cluster centroids. Our experimental results on real-life data sets demonstrate that the proposed methods can significantly improve the accuracy of document clustering without much increase in execution time.
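A minimal sketch of the neighbor-and-link idea under stated assumptions: the TF-IDF representation, the 0.3 similarity threshold, and the linear blend weight alpha are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def link_matrix(docs, threshold=0.3):
    """Pairwise cosine similarity, the neighbor relation, and link counts."""
    tfidf = TfidfVectorizer().fit_transform(docs)   # rows are documents
    cos = cosine_similarity(tfidf)                  # pairwise cosine similarity
    neighbors = (cos >= threshold).astype(int)      # doc j is a neighbor of doc i if similar enough
    links = neighbors @ neighbors                   # link(i, j) = number of common neighbors
    return cos, links

def combined_similarity(cos, links, alpha=0.5):
    """One possible blend of cosine similarity with normalized link counts."""
    norm_links = links / max(links.max(), 1)
    return alpha * cos + (1 - alpha) * norm_links
```

Because link(i, j) counts common neighbors, two documents can still score as close even when their direct cosine similarity is modest, which is the global information the abstract refers to.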

4.
There are different ways for an external agent to influence the outcome of an election. We concentrate on “control” by adding or deleting candidates. Our main focus is to investigate the parameterized complexity of various control problems for different voting systems. To this end, we introduce natural digraph problems that may be of independent interest. They help in determining the parameterized complexity of control for different voting systems including Llull, Copeland, and plurality voting. Devising several parameterized reductions, we provide an overview of the parameterized complexity of the digraph and control problems with respect to natural parameters such as adding/deleting only a bounded number of candidates or having only few voters.

5.
《Pattern Recognition》2002, 35(12): 2761–2769
Classifiers based on the neighbourhood concept require a high computational cost when the reference pattern set is large. In this paper, we propose the use of hierarchical classifiers to reduce this computational cost while maintaining the hit rate in the recognition of handwritten digits. The hierarchical classifiers reach the hit rate of the best individual classifier. We used the NIST database for the experimentation and worked with two test sets: on Test 1 (SD3, SD19) the hit rate is 99.54% with a speed-up of 40.6, and on Test 2 (SD7) the hit rate is 97.51% with a speed-up of 15.7.
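The abstract does not spell out the hierarchy itself; the following is only a generic coarse-to-fine cascade in the same spirit, where the centroid pre-classifier, the margin test, and the fallback k-NN are all illustrative assumptions rather than the authors' architecture.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

class TwoStageClassifier:
    """Coarse-to-fine cascade: a cheap centroid classifier answers 'easy' samples,
    and only ambiguous ones fall back to full k-NN over the reference set."""
    def __init__(self, k=3, margin=0.1):
        self.coarse = NearestCentroid()
        self.fine = KNeighborsClassifier(n_neighbors=k)
        self.margin = margin

    def fit(self, X, y):
        self.coarse.fit(X, y)
        self.fine.fit(X, y)
        return self

    def predict(self, X):
        # distance of each sample to every class centroid
        d = np.linalg.norm(X[:, None, :] - self.coarse.centroids_[None, :, :], axis=2)
        d.sort(axis=1)
        confident = (d[:, 1] - d[:, 0]) > self.margin   # clear winner -> accept coarse answer
        pred = self.coarse.predict(X)
        if (~confident).any():
            pred[~confident] = self.fine.predict(X[~confident])
        return pred
```

The speed-up comes from the fact that most test digits are resolved by the cheap stage, so the expensive neighbourhood search is run only on the ambiguous remainder.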

6.
Anonymization is a practical approach to protecting privacy in data. The major objective of privacy-preserving data publishing is to protect private information while the data remains useful for intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of the data rather than by the privacy requirement. We use mutual information to measure the classification capability of a generalization, and propose two k-anonymity algorithms to produce anonymized tables for building accurate classification models. The algorithms generalize attributes to maximize classification capability and then suppress values according to a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that algorithm IACk supports more accurate classification models and is faster than a benchmark utility-aware data anonymization algorithm.
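As a hedged illustration of using mutual information to compare candidate generalizations, the attribute values, interval boundaries, and toy labels below are made up for the example; this is not the IACk algorithm itself.

```python
import numpy as np
from collections import Counter

def mutual_information(attribute, labels):
    """Empirical mutual information I(A; C) between a (generalized) attribute and the class label.
    Higher values mean the generalized attribute retains more classification capability."""
    n = len(labels)
    joint = Counter(zip(attribute, labels))
    pa = Counter(attribute)
    pc = Counter(labels)
    mi = 0.0
    for (a, c), n_ac in joint.items():
        p_ac = n_ac / n
        mi += p_ac * np.log2(p_ac / ((pa[a] / n) * (pc[c] / n)))
    return mi

# Compare two candidate generalizations of a hypothetical "age" attribute
ages   = [23, 27, 34, 38, 45, 52, 61, 67]
labels = ['y', 'y', 'y', 'n', 'n', 'n', 'n', 'n']
coarse  = ['<40' if a < 40 else '>=40' for a in ages]   # generalized to 2 intervals
coarser = ['any'] * len(ages)                           # fully suppressed
print(mutual_information(coarse, labels))   # > 0: still separates the classes
print(mutual_information(coarser, labels))  # 0: all classification capability lost
```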

7.
Proper interpretation of thyroid gland functional data is an important issue in the diagnosis of thyroid disease. The primary role of the thyroid gland is to help regulate the body's metabolism, which it does through the thyroid hormone it produces. Production of too little thyroid hormone (hypothyroidism) or too much thyroid hormone (hyperthyroidism) defines the types of thyroid disease. The use of machine learning methods in disease diagnosis has been increasing steadily. In this study, diagnosis of thyroid disease, a very common and important disease, was carried out with such a machine learning system combining principal component analysis (PCA), k-nearest neighbor (k-NN) based weighted pre-processing, and an adaptive neuro-fuzzy inference system (ANFIS). The proposed system has three stages. In the first stage, the dimensionality of the thyroid disease dataset, which has 5 features, is reduced to 2 features using PCA. In the second stage, a new weighting scheme based on the k-NN method is applied as a pre-processing step before the main classifier. In the third stage, an adaptive neuro-fuzzy inference system is used to diagnose thyroid disease. The thyroid disease dataset was taken from the UCI machine learning repository. The classification accuracy obtained by our system was 100%, which is very promising compared with other classification results reported in the literature for this problem.
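A rough sketch of such a staged pipeline under stated assumptions: the data below is random placeholder data standing in for the UCI new-thyroid set, and a plain k-NN classifier stands in for the paper's k-NN-based weighting and ANFIS stages, for which no standard scikit-learn implementation exists.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data shaped like the UCI new-thyroid set (215 samples, 5 features, 3 classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(215, 5))
y = rng.integers(0, 3, size=215)

# Stage 1: reduce the 5 features to 2 with PCA.
# Final stage stand-in: a k-NN classifier instead of the paper's ANFIS.
model = make_pipeline(StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=5))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```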

8.
We show that any k-connected graph G = (V, E) has a sparse k-connected spanning subgraph G′ = (V, E′) with |E′| = O(k|V|) by presenting an O(|E|)-time algorithm to find one such subgraph, where connectivity stands for either edge-connectivity or node-connectivity. By using this algorithm as preprocessing, the time complexities of some graph problems related to connectivity can be improved. For example, the current best time bound O(max{k²|V|^{1/2}, k|V|}·|E|) to determine whether the node-connectivity κ(G) of a graph G = (V, E) is larger than a given integer k can be reduced to O(max{k³|V|^{3/2}, k²|V|²}). The first author was partially supported by the Grant-in-Aid for Encouragement of Young Scientists of the Ministry of Education, Science and Culture of Japan and by the subvention to young scientists by the Research Foundation of Electrotechnology of Chubu.

9.
Attribute reduction based on evidence theory in incomplete decision systems
Wei-Zhi Wu 《Information Sciences》2008,178(5):1355-1371
Attribute reduction is a basic issue in knowledge representation and data mining. This paper deals with attribute reduction in incomplete information systems and incomplete decision systems based on Dempster-Shafer theory of evidence. The concepts of plausibility reduct and belief reduct in incomplete information systems as well as relative plausibility reduct and relative belief reduct in incomplete decision systems are introduced. It is shown that in an incomplete information system an attribute set is a belief reduct if and only if it is a classical reduct and a plausibility consistent set must be a classical consistent set. In a consistent incomplete decision system, the concepts of relative reduct, relative plausibility reduct, and relative belief reduct are all equivalent. In an inconsistent incomplete decision system, an attribute set is a relative plausibility reduct if and only if it is a relative reduct, a plausibility consistent set must be a belief consistent set, and a belief consistent set is not a plausibility consistent set in general.

10.
The parametric-mixed-sensitivity problem is posed for the class of irrational transfer matrices with a realization as a Pritchard-Salamon system. An exact solution is obtained in terms of two uncoupled, standard Riccati equations.

11.
A refinement to the (k, n) threshold scheme proposed by Wu and He [1] is presented. It has been found that usable primes p need not be confined to the form p ≡ ±3 (mod 8) as suggested originally. Our constructive proof also leads to a different algorithm for implementation. This implementation dispenses with the directory file suggested in [1], which can be prohibitively large when the scheme is implemented with a large p under the original proposal.

12.
The densest k-subgraph (DkS) problem asks for a k-vertex subgraph of a given graph with the maximum number of edges. The DkS problem is NP-hard even for special graph classes including bipartite, planar, comparability, and chordal graphs, while no constant-factor approximation algorithm is known for any of these classes. In this paper we present a 3-approximation algorithm for the class of chordal graphs. The analysis of our algorithm is based on a graph-theoretic lemma of independent interest.

13.
In this paper a suboptimal solution to the time-variant discrete version of the extended Nehari problem is given. The main tool used to obtain such a solution, with prescribed tolerance, is the so-called anticausal stabilizing solution to the reverse discrete-time Riccati equation.

14.
《Automatica》2001, 37(12)
In this paper we address the asynchronous multi-rate sampled-data H∞ synthesis problem. Necessary and sufficient conditions are given for the existence of a controller achieving the desired performance, and the problem is shown to be equivalent to a convex optimization problem expressed in the form of linear operator inequalities. In the case where the sample and hold rates are synchronous, these operator inequalities reduce to linear matrix inequalities, for which standard numerical software is available.

15.
Due to the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are indexed by a partition-based B+-tree. Thus, given a query point, its k-NN search in a high-dimensional space is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-Tree.
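An illustrative sketch of the CDT idea, not the paper's exact construction: the key layout (cluster id times a stretch constant plus distance to centroid), the candidate window width, and the use of a sorted array in place of the partition-based B+-tree are all simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = rng.random((1000, 32))                       # n points in a 32-dimensional space

km = KMeans(n_clusters=8, n_init=10, random_state=1).fit(data)
dist_to_centroid = np.linalg.norm(data - km.cluster_centers_[km.labels_], axis=1)

C = dist_to_centroid.max() + 1.0                    # keeps clusters in disjoint key ranges
keys = km.labels_ * C + dist_to_centroid            # composite single-dimensional keys
order = np.argsort(keys)                            # sorted index, stand-in for the B+-tree

def knn_candidates(query, width=50):
    """Map the query to a key inside its nearest cluster, take neighbors in key order
    as k-NN candidates, and re-rank them by true distance."""
    c = km.predict(query[None, :])[0]
    qkey = c * C + np.linalg.norm(query - km.cluster_centers_[c])
    pos = np.searchsorted(keys[order], qkey)
    lo, hi = max(0, pos - width), min(len(order), pos + width)
    cand = order[lo:hi]
    return cand[np.argsort(np.linalg.norm(data[cand] - query, axis=1))]

print(knn_candidates(rng.random(32))[:5])           # indices of 5 approximate nearest neighbors
```

The one-dimensional search only returns candidates, so the result is approximate unless a wider key range around the query is scanned; the point of the sketch is how the composite key turns a high-dimensional lookup into a range search.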

16.
In this note, a simple proof is given for the Hamiltonian solution to the H∞ optimal sensitivity for plants with arbitrary inner transfer function. The approach combines skew Toeplitz theory and state-space representations, and gives rise to a straightforward and basis-free methodology.

17.
《Performance Evaluation》2003, 53(3-4): 209–223
This paper introduces new admission criteria that enable the use of algorithms based on the many sources asymptotics in real-life applications. This is achieved by a significant reduction in the computational requirements and by moving the computationally intensive tasks away from the timing-sensitive decision instant. It is shown that the traditional overflow probability type admission control method can be reformulated into a bandwidth requirement type and into a buffer requirement type method and that these methods are equivalent when used for admission control. The original and the two proposed methods are compared through the example of fractional Brownian motion (fBm) traffic.

18.
Consider a set S of m points stored on a reconfigurable mesh computer of size n×n, one point per processing element (PE). In this paper we present a parallel method for solving the k-nearest-neighbor (k-NN) problem that allows each point of S to determine its k-NN (0<k<m). The corresponding algorithm requires each PE to have 2k registers, in which it stores the (x, y) coordinates of its k-NN in a given order. The algorithm runs in O(log h + k²) time, where h is the maximal number of points within a row of the mesh. This complexity is reduced to O(k²) time using an appropriate procedure, which demonstrates the power of the reconfiguration operations carried out by the processors and the polymorphic properties of the mesh.

19.
In this paper, we investigate the least amount of a priori information on the nonlinearity needed for the FIR linear part to be identifiable in the non-Gaussian input case. Three types of a priori information are considered: quadrant information, point information, and local monotonicity information. In all three cases, identifiability is established and corresponding identification algorithms are developed together with their convergence proofs.

20.
We present a generic scheme for approximating NP-hard problems on graphs of treewidth k = ω(log n). When a tree decomposition of width ℓ is given, the scheme typically yields an ℓ/log n approximation factor; otherwise, an extra log k factor is incurred. Our method applies to several basic subgraph and partitioning problems, including the maximum independent set problem.
