
1.

Multiple Window Scan Statistics for Two Dimensional Poisson Processes

Jie Chen Joseph Glaz 《Methodology and Computing in Applied Probability》2016,18(4):967-977

In this article, approximations for the distribution of multiple window scan statistics for Poisson Processes on a two dimensional rectangular region are derived, for the conditional and unconditional model. These multiple window scan statistics are based on the minimum of p-values and repeated minimum p-values of fixed window scan statistics. Numerical results are presented to evaluate the performance of these multiple window scan statistics and compare their power with fixed window scan statistics for selected local type alternatives.

2.

A martingale approach to scan statistics

Vladimir Pozdnyakov Joseph Glaz Martin Kulldorff J. Michael Steele 《Annals of the Institute of Statistical Mathematics》2005,57(1):21-37

Scan statistics are commonly used in biology, medicine, engineering and other fields where interest is in the probability of observing clusters of events in a window at an unknown location. Due to the dependent nature of the number of events in a large number of overlapping window locations, even approximate solutions for the simplest scan statistics may require elaborate calculations. We propose a new martingale method which allows one to approximate the distribution for a wide variety of scan statistics, including some for which analytical results are computationally infeasible.

3.

Evaluation of Spatial Scan Statistics for Irregularly Shaped Clusters

《Journal of computational and graphical statistics》2013,22(2):428-442

Spatial scan statistics are commonly used for geographic disease cluster detection and evaluation. We propose and implement a modified version of the simulated annealing spatial scan statistic that incorporates the concept of “non-compactness” in order to penalize clusters that are very irregular in shape. We evaluate the power of the simulated annealing scan and compare it with the circular and elliptic spatial scan statistics. We observe that, with the non-compactness penalty, the simulated annealing method is competitive with the circular and elliptic scan statistics, and all of them have good power performance. The elliptic scan statistic is computationally faster and is well suited for mildly irregular clusters, but the simulated annealing method deals better with highly irregular cluster shapes. The new method is applied to breast cancer mortality data from the northeastern United States.

4.

Estimating the Distributions of Scan Statistics with High Precision

George Haiman 《Extremes》2000,3(4):349-361

In many statistical applications one is concerned with the estimation of the distribution of the maximum or minimum number of points in a moving window of fixed length, called scan statistics. Scan statistics are also extremes of 1-dependent sequences. A result of Haiman (1999) provides approximations of these distributions together with sharp bounds for the corresponding errors. Applications concern the maximum cluster of points on a line or on a circle and multiple coverage by subintervals or subarcs of fixed size. We compare our method with some existing empirical and non-empirical methods and show how it can be applied to multidimensional scanning.
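
As a rough illustration of the quantity this abstract is about (not of Haiman's approximation method), the following Python sketch computes the maximum number of points in a moving window of fixed length on a line. The function name `scan_statistic` and the choice of a closed window `[t, t + window]` are assumptions made for the example.

```python
from bisect import bisect_right


def scan_statistic(points, window):
    """Largest number of points in any closed interval [t, t + window].

    Hypothetical helper for illustration; the closed-interval convention
    is an assumption, not something stated in the abstract.
    """
    pts = sorted(points)
    best = 0
    for i, start in enumerate(pts):
        # The maximum is attained by some window whose left edge sits on a
        # point, so only those windows need to be checked.
        j = bisect_right(pts, start + window)
        best = max(best, j - i)
    return best


# Event times in whole days, scanned with a 2-day window.
print(scan_statistic([1, 2, 3, 10, 11, 20], 2))  # 3
```

The sketch only evaluates the statistic on observed data; approximating its distribution under a null model is the subject of the paper itself.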

5.

A multiple window scan statistic for time series models

《Statistics & probability letters》2014

In this article we extend the results derived for scan statistics in Wang and Glaz (2014) for independent normal observations. We investigate the performance of two approximations for the distribution of fixed window scan statistics for time series models. An R algorithm for computing multivariate normal probabilities established in Genz and Bretz (2009) can be used along with proposed approximations to implement fixed window scan statistics for ARMA models. The accuracy of these approximations is investigated via simulation. Moreover, a multiple window scan statistic is defined for detecting a local change in the mean of a Gaussian white noise component in ARMA models, when the appropriate length of the scanning window is unknown. Based on the numerical results, for power comparisons of the scan statistics, we can conclude that when the window size of a local change is unknown, the multiple window scan statistic outperforms the fixed window scan statistics.

6.

Scan Statistics for Detecting a Local Change in Variance for Normal Data with Known Variance

Bo Zhao Joseph Glaz 《Methodology and Computing in Applied Probability》2016,18(2):563-573

In this article, several scan statistics are discussed for detecting a local change in variance for one dimensional normal data. When the length of the scanning window is known, a fixed window scan statistic based on moving sum of squares is proposed. Two approximations for the distribution of this scan statistic are investigated. When the length of the scanning window is unknown, a variable window scan statistic based on a generalized likelihood ratio test and a multiple window minimum P-value scan statistic are proposed for detecting the local change in variance. For a moderate or large shift in variance, numerical results indicate that both the variable and multiple window scan statistics perform well. For large data sets, considering the detection power and computing efficiency, the multiple window scan statistic is recommended.
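
A minimal Python sketch of a fixed window scan statistic built from moving sums of squares, in the spirit of the description above. It assumes the observations are already centered, and the names `moving_sums_of_squares` and `fixed_window_scan` are hypothetical; the paper's exact definition and its distributional approximations are not reproduced here.

```python
def moving_sums_of_squares(xs, m):
    """Sum of squared observations in every window of m consecutive values."""
    if not 1 <= m <= len(xs):
        raise ValueError("window length m must satisfy 1 <= m <= len(xs)")
    squares = [x * x for x in xs]
    current = sum(squares[:m])
    sums = [current]
    for i in range(m, len(squares)):
        # Slide the window one step: add the new square, drop the oldest.
        current += squares[i] - squares[i - m]
        sums.append(current)
    return sums


def fixed_window_scan(xs, m):
    """Scan statistic: the largest moving sum of squares over all windows."""
    return max(moving_sums_of_squares(xs, m))


print(fixed_window_scan([1, -1, 3, -3, 1], 2))  # 18
```

A large value points to a window where the spread of the data is unusually high, which is the kind of local change in variance the abstract describes.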

7.

Double-Scan Statistics

Naus J. I. Stefanov V. T. 《Methodology and Computing in Applied Probability》2002,4(2):163-180

Researchers frequently scan sequences for unusual clustering of events. Glaz et al. (2001) survey scan statistic tools developed for these analyses. Many of these tools deal with clustering of one type of event. In other applications the researcher scans for clusters of two types of events, A and B. Consider a sequence of D independent and identically distributed trials where each trial has one of four possible outcomes: A^c B^c, A B^c, A^c B, A B. When the events A and B occur within d consecutive trials, we say that a two-type d-cluster has occurred (a directional cluster is also defined that requires that the A event comes at least as early as the B event). Naus and Wartenberg (1997) develop a double scan statistic that counts the number of declumped (a type of non-overlapping) clusters that contain at least one of each of two different types of events. They derived the expectation and variance and Poisson approximation for the distribution of the double scan statistic. The approximation and declumping methods used work well when the events are relatively rare but not as well for the case where the two types of events occur with high frequency. This paper develops an alternative family of double scan statistics to count the number of non-overlapping two-type d-clusters. These new double scan statistics behave similarly to the Naus-Wartenberg statistic for rare events, but capture other information for the more dense event case. Exact and approximate results are derived for the distribution of the new double scan statistics, allowing its use for a wider range of density of events. The double scan statistics are compared for the epidemiologic application in Naus and Wartenberg, and for a molecular biology application involving genome versus genome protein hits.

8.

A spatial scan statistic for case event data based on connected components

Lionel Cucala Christophe Demattei Paulo Lopes Andre Ribeiro 《Computational Statistics》2013,28(1):357-369

We propose a new spatial scan statistic based on graph theory as a method for detecting irregularly-shaped clusters of events over space. A graph-based method is proposed for identifying potential clusters in spatial point processes. It relies on linking the events closer than a given distance and thus defining a graph associated with the point process. The set of possible clusters is then restricted to windows including the connected components of the graph. The concentration in each of these possible clusters is measured through classical concentration indices based on likelihood ratio and also through a new concentration index which does not depend on any alternative hypothesis. These graph-based spatial scan tests seem to be very powerful against any arbitrarily-shaped cluster alternative, whatever the dimension of the data. These results have applications in various fields, such as the epidemiological study of rare diseases or the analysis of astrophysical data.
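
The graph construction described above can be sketched in Python as follows: events closer than a given distance are linked, and the connected components of the resulting graph are returned as candidate clusters. This is only an illustration under assumptions (Euclidean distance, a strict `<` comparison, a simple union-find, and the hypothetical name `connected_components`); the concentration indices and the test itself are not shown.

```python
from math import dist


def connected_components(points, radius):
    """Group point indices into components of the 'closer than radius' graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < radius:
                parent[find(i)] = find(j)  # link the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


events = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 10)]
print(connected_components(events, 1.5))  # [[0, 1], [2, 3], [4]]
```

The pairwise loop is quadratic in the number of events; a spatial index would be the usual replacement for larger data sets.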

9.

Exact pseudopolynomial algorithms for a balanced 2-clustering problem

A. V. Kel’manov A. V. Motkova 《Journal of Applied and Industrial Mathematics》2016,10(3):349-355

We consider the strongly NP-hard problem of partitioning a set of Euclidean points into two clusters so as to minimize the sum (over both clusters) of the weighted sums of the squared intracluster distances from the elements of the clusters to their centers. The weights of the sums are the sizes of the clusters. The center of one cluster is given as input, while the center of the other cluster is unknown and determined as the average value over all points in the cluster (as the geometric center). Two variants of the problem are analyzed in which the cluster sizes are either given or unknown. We present and prove some exact pseudopolynomial algorithms in the case of integer components of the input points and fixed space dimension.
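
To make the objective concrete, here is a small Python sketch that evaluates it for one given partition. It is not one of the paper's pseudopolynomial algorithms; the function name `balanced_objective` and the argument layout (first cluster with a free center, second cluster with the fixed center) are assumptions for the example.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def balanced_objective(free_cluster, fixed_cluster, fixed_center):
    """Size-weighted sum of squared distances to the two cluster centers.

    free_cluster:  points whose center is their own centroid (unknown center).
    fixed_cluster: points measured against the center given as input.
    """
    total = 0.0
    if free_cluster:
        dim = len(fixed_center)
        n = len(free_cluster)
        centroid = tuple(sum(p[k] for p in free_cluster) / n for k in range(dim))
        total += n * sum(squared_distance(p, centroid) for p in free_cluster)
    total += len(fixed_cluster) * sum(
        squared_distance(p, fixed_center) for p in fixed_cluster
    )
    return total


# Centroid of the free cluster is (1, 0): cost 2 * (1 + 1) = 4.
# The fixed cluster has one point at squared distance 2 from the origin: cost 2.
print(balanced_objective([(0, 0), (2, 0)], [(1, 1)], (0, 0)))  # 6.0
```

The optimization problem in the abstract is the search over all partitions for the one minimizing this value, which is where the NP-hardness lies.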

10.

A cluster-based optimization approach for the multi-depot heterogeneous fleet vehicle routing problem with time windows

Rodolfo Dondo Jaime Cerdá 《European Journal of Operational Research》2007

This paper presents a novel three-phase heuristic/algorithmic approach for the multi-depot routing problem with time windows and heterogeneous vehicles. It has been derived from embedding a heuristic-based clustering algorithm within a VRPTW optimization framework. For this purpose, a rigorous MILP mathematical model for the VRPTW problem is first introduced. Like other optimization approaches, the new formulation can efficiently solve case studies involving at most 25 nodes to optimality. To overcome this limitation, a preprocessing stage clustering nodes together is initially performed to yield a more compact cluster-based MILP problem formulation. In this way, a hierarchical hybrid procedure involving one heuristic and two algorithmic phases was developed. Phase I aims to identify a set of cost-effective feasible clusters, while Phase II assigns clusters to vehicles and sequences them on each tour by using the cluster-based MILP formulation. Finally, Phase III orders nodes within clusters and schedules vehicle arrival times at customer locations for each tour by solving a small MILP model. Numerous benchmark problems featuring different sizes, clustered/random customer locations and time window distributions have been solved at acceptable CPU times.

11.

Fully polynomial-time approximation scheme for a special case of a quadratic Euclidean 2-clustering problem

A. V. Kel’manov V. I. Khandeev 《Computational Mathematics and Mathematical Physics》2016,56(2):334-341

The strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters of given sizes (cardinalities) minimizing the sum (over both clusters) of the intracluster sums of squared distances from the elements of the clusters to their centers is considered. It is assumed that the center of one of the sought clusters is specified at the desired (arbitrary) point of space (without loss of generality, at the origin), while the center of the other one is unknown and determined as the mean value over all elements of this cluster. It is shown that unless P = NP, there is no fully polynomial-time approximation scheme for this problem, and such a scheme is substantiated in the case of a fixed space dimension.

12.

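An improved genetic k-means clustering algorithm  Total citations: 8, self-citations: 0, citations by others: 8

Liu Ting (刘婷) Guo Haixiang (郭海湘) Zhu Kejun (诸克军) Gao Siwei (高思维) 《Mathematics in Practice and Theory (数学的实践与认识)》2007,37(8):104-111

In the classical k-means clustering algorithm, the number of clusters k must be specified in advance, yet in practice k is hard to determine precisely. This paper proposes an improved genetic k-means clustering algorithm and constructs a fitness function for evaluating the quality of a partition. The fitness function seeks to keep the number of clusters as small as possible while improving compactness (within-cluster distance) and separation (between-cluster distance). Finally, two artificial data sets and three UCI data sets are used to compare the k-means clustering algorithm (KM), the genetic clustering algorithm (GA), the genetic k-means clustering algorithm (GKM) and the improved genetic k-means clustering algorithm (IGKM); the comparison criteria are between-cluster distance, within-cluster distance and classification accuracy. The study shows that the improved genetic k-means algorithm can obtain the optimal number of clusters k automatically while maintaining high accuracy.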

13.

Statistical Cluster Points of Sequences in Finite Dimensional Spaces

S. Pehlivan A. Güncan M. A. Mamedov 《Czechoslovak Mathematical Journal》2004,54(1):95-102

In this paper we study the set of statistical cluster points of sequences in m-dimensional spaces. We show that some properties of the set of statistical cluster points of the real number sequences remain in force for the sequences in m-dimensional spaces too. We also define a notion of Γ-statistical convergence. A sequence x is Γ-statistically convergent to a set C if C is a minimal closed set such that for every ε > 0 the set {k : ρ(C, x_k) ≥ ε} has density zero. It is shown that every statistically bounded sequence is Γ-statistically convergent. Moreover if a sequence is Γ-statistically convergent then the limit set is a set of statistical cluster points.

14.

Clustering of imbalanced high-dimensional media data

Šárka Brodinová Maia Zaharieva Peter Filzmoser Thomas Ortner Christian Breiteneder 《Advances in Data Analysis and Classification》2018,12(2):261-284

Media content in large repositories usually exhibits multiple groups of strongly varying sizes. Media of potential interest often form notably smaller groups. Such media groups differ so much from the remaining data that it may be worth looking at them in more detail. In contrast, media with popular content appear in larger groups. Identifying groups of varying sizes is addressed by clustering of imbalanced data. Clustering highly imbalanced media groups is additionally challenged by the high dimensionality of the underlying features. In this paper, we present the imbalanced clustering (IClust) algorithm designed to reveal group structures in high-dimensional media data. IClust employs an existing clustering method in order to find an initial set of a large number of potentially highly pure clusters which are then successively merged. The main advantage of IClust is that the number of clusters does not have to be pre-specified and that no specific assumptions about the cluster or data characteristics need to be made. Experiments on real-world media data demonstrate that in comparison to existing methods, IClust is able to better identify media groups, especially groups of small sizes.

15.

Multi-objective selection for collecting cluster alternatives

Johann M. Kraus Christoph Müssel Günther Palm Hans A. Kestler 《Computational Statistics》2011,26(2):341-353

Grouping objects into different categories is a basic means of cognition. In the fields of machine learning and statistics, this subject is addressed by cluster analysis. Yet, it is still controversially discussed how to assess the reliability and quality of clusterings. In particular, it is hard to determine the optimal number of clusters inherent in the underlying data. Running different cluster algorithms and cluster validation methods usually yields different optimal clusterings. In fact, several clusterings with different numbers of clusters are plausible in many situations, as different methods are specialized on diverse structural properties. To account for the possibility of multiple plausible clusterings, we employ a multi-objective approach for collecting cluster alternatives (MOCCA) from a combination of cluster algorithms and validation measures. In an application to artificial data as well as microarray data sets, we demonstrate that exploring a Pareto set of optimal partitions rather than a single solution can identify alternative solutions that are overlooked by conventional clustering strategies. Competitive solutions are hereby ranked following an impartial criterion, while the ultimate judgement is left to the investigator.
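
The idea of keeping a Pareto set of candidate partitions instead of a single winner can be illustrated with a short Python sketch. It is not the MOCCA procedure itself: the function name `pareto_set`, the candidate labels and the score values are made up for the example, and every score is assumed to be "lower is better".

```python
def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(candidates):
    """Return the names of all candidates not dominated by another candidate.

    candidates maps a label (e.g. a clustering) to a tuple of validation
    scores, all to be minimized (an assumption of this sketch).
    """
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical scores of four clusterings under two validation measures.
scores = {
    "k=2": (0.4, 0.9),
    "k=3": (0.5, 0.5),
    "k=4": (0.6, 0.6),  # worse than k=3 on both measures
    "k=5": (0.9, 0.3),
}
print(pareto_set(scores))  # ['k=2', 'k=3', 'k=5']
```

Three of the four candidates survive because each is best under some trade-off between the two measures, which is the situation the abstract describes as several plausible clusterings.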

16.

A scatter search heuristic for the capacitated clustering problem

《European Journal of Operational Research》2006,169(2):533-547

This paper proposes a scatter search-based heuristic approach to the capacitated clustering problem. In this problem, a given set of customers with known demands must be partitioned into p distinct clusters. Each cluster is specified by a customer acting as a cluster center for this cluster. The objective is to minimize the sum of distances from all cluster centers to all other customers in their cluster, such that a given capacity limit of the cluster is not exceeded and that every customer is assigned to exactly one cluster. Computational results on a set of instances from the literature indicate that the heuristic is among the best heuristics developed for this problem.

17.

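The discrete coagulation equation with elastic collisions  Total citations: 1, self-citations: 0, citations by others: 1

Zheng Lie (郑列) 《Mathematical Theory and Applications (数学理论与应用)》2004,24(3):97-101

The discrete coagulation equation with elastic collisions is a mathematical model of particle growth dynamics. It describes a particle reaction system in which any two colliding particles, with certain probabilities, either coagulate into a larger particle or undergo an elastic collision. This paper studies the possibility of gelation occurring in this system and gives a sufficient condition.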

18.

Delineation of Irregularly Shaped Disease Clusters Through Multiobjective Optimization

《Journal of computational and graphical statistics》2013,22(1):243-262

Irregularly shaped spatial disease clusters occur commonly in epidemiological studies, but their geographic delineation is poorly defined. Most current spatial scan software usually displays only one of the many possible cluster solutions with different shapes, from the most compact round cluster to the most irregularly shaped one, corresponding to varying degrees of penalization parameters imposed on the freedom of shape. Even when a fairly complete set of solutions is available, the choice of the most appropriate parameter setting is left to the practitioner, whose decision is often subjective. We propose quantitative criteria for choosing the best cluster solution, through multiobjective optimization, by finding the Pareto-set in the solution space. Two competing objectives are involved in the search: regularity of shape and scan statistic value. Instead of running sequentially a cluster-finding algorithm with varying degrees of penalization, the complete set of solutions is found in parallel, employing a genetic algorithm. The cluster significance concept is extended for this set in a natural and unbiased way, being employed as a decision criterion for choosing the optimal solution. The Gumbel distribution is used to approximate the empirical scan statistic distribution, speeding up the significance estimation. The multiobjective methodology is compared with the genetic mono-objective algorithm. The method is fast, with good power of detection. We discuss an application to breast cancer cluster detection. The introduction of the concept of Pareto-set in this problem, followed by the choice of the most significant solution, is shown to allow a rigorous statement about what is a “best solution,” without the need of any arbitrary parameter.

19.

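Density-conserving solutions of the discrete nonlinear explosion equation (离散的非线性爆炸方程)  Total citations: 2, self-citations: 0, citations by others: 2

Zheng Lie (郑列) 《Mathematica Applicata (应用数学)》2005,18(1):104-111

The discrete nonlinear explosion equation is a mathematical model describing particle growth dynamics. The model reflects how the densities of the various particles in a class of particle reaction systems change over time, and it is an autonomous system consisting of countably infinitely many mutually coupled nonlinear ordinary differential equations. This paper studies the existence of density-conserving solutions of this infinite-dimensional system.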

20.

Clusters of extremes: modeling and examples

Natalia M. Markovich 《Extremes》2017,20(3):519-538

We study clusters of threshold exceedances caused by dependence in time series. The clusters are defined as conglomerates containing consecutive threshold exceedances of the series separated by return intervals with consecutive non-exceedances. We derive asymptotic distributions of the cluster and inter-cluster sizes for processes with the extremal index equal to zero, the asymptotic expectation of the inter-cluster size and an exponential rate of convergence of the distribution tail of the return interval between clusters to the stable distribution tail. Distributions of the cluster and inter-cluster sizes of ARMAX, MM and AR(1) processes are obtained.
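
The cluster definition in this abstract (runs of consecutive exceedances separated by runs of consecutive non-exceedances) can be made concrete with a short Python sketch. The function name `cluster_and_gap_sizes` is hypothetical, exceedance is taken to mean strictly above the threshold, and non-exceedance runs at the very start or end of the series are left out of the inter-cluster sizes; all three choices are assumptions of the example.

```python
from itertools import groupby


def cluster_and_gap_sizes(series, threshold):
    """Return (cluster sizes, inter-cluster sizes) for a finite series."""
    # Collapse the series into alternating runs: (is_exceedance, run_length).
    runs = [
        (above, sum(1 for _ in group))
        for above, group in groupby(series, key=lambda x: x > threshold)
    ]
    clusters = [length for above, length in runs if above]
    # A non-exceedance run counts as an inter-cluster interval only if it
    # lies strictly between two runs, that is, between two clusters.
    gaps = [
        length
        for idx, (above, length) in enumerate(runs)
        if not above and 0 < idx < len(runs) - 1
    ]
    return clusters, gaps


series = [1, 5, 6, 1, 1, 7, 1, 8, 9, 9, 0]
print(cluster_and_gap_sizes(series, 4))  # ([2, 1, 3], [2, 1])
```

The empirical sizes produced this way are the finite-sample counterparts of the cluster and inter-cluster sizes whose distributions the paper derives.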