首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 34 毫秒
1.
Two-dimensional orthogonal matching pursuit (2D-OMP) algorithm is an extension of the one-dimensional OMP (1D-OMP), whose complexity and memory usage are lower than the 1D-OMP when they are applied to 2D sparse signal recovery. However, the major shortcoming of the 2D-OMP still resides in long computing time. To overcome this disadvantage, we develop a novel parallel design strategy of the 2D-OMP algorithm on a graphics processing unit (GPU) in this paper. We first analyze the complexity of the 2D-OMP and point out that the bottlenecks lie in matrix inverse and projection. After adopting the strategy of matrix inverse update whose performance is superior to traditional methods to reduce the complexity of original matrix inverse, projection becomes the most time-consuming module. Hence, a parallel matrix–matrix multiplication leveraging tiling algorithm strategy is launched to accelerate projection computation on GPU. Moreover, a fast matrix–vector multiplication, a parallel reduction algorithm, and some other parallel skills are also exploited to boost the performance of the 2D-OMP further on GPU. In the case of the sensing matrix of size 128 \(\times \) 256 (176 \(\times \) 256, resp.) for a 256 \(\times \) 256 scale image, experimental results show that the parallel 2D-OMP achieves 17 \(\times \) to 41 \(\times \) (24 \(\times \) to 62 \(\times \) , resp.) speedup over the original C code compiled with the O \(_2\) optimization option. Higher speedup would be further obtained with larger-size image recovery.  相似文献   

2.
Uruguay is currently undergoing a gradual process of inclusion of wind energy in its matrix of electric power generation. In this context, a computational tool has been developed to predict the electrical power that will be injected into the grid. The tool is based on the Weather Research and Forecasting (WRF) numerical model, which is the performance bottleneck of the application. For this reason, and in line with several successful efforts of other researchers, this article presents advances in porting the WRF to GPU. In particular, we present the implementation of sintb and bdy_interp1 routines on GPU and the integration of these routines with previous efforts from other authors. The speedup values obtained for the newly ported routines on a Nvidia GeForce GTX 480 GPU are up to \(33.9\times \) when compared with the sequential WRF and \(9.2\times \) when compared with the four-threaded WRF. The integration of the newly ported routines along with previous works produces a reduction of more than a 30 % in the total runtime of the multi-core four-threaded WRF and of more than a 50 % in the single-threaded version.  相似文献   

3.
In this work, we present a tool that exploits heterogeneous computing to calculate the noise scattered by an object from the pressure distribution over its surface and its normal derivative. The method mainly deals with a large Matrix–Vector Product where the matrix elements must be calculated on the fly in such a way that the problem fits in main memory. To prove the performance of the heterogeneous implementations, the tool is tested using one NVIDIA K20c GPU, one Intel Xeon Phi 5110P, and two Intel Xeon E5-2650 CPUs. The speedup of the accelerated implementations ranges from \(3\times \) (Xeon Phi) to \(8\times \) (Xeon Phi  \(+\)  K20c) when compared to our parallel CPU code with \(32\) threads. This work, combined with the authors’ previous works for the computation of the acoustic pressure over the obstacle surface, results in a valuable toolset for noise control applications during aircraft design.  相似文献   

4.
The behavior of total quantum correlations (discord) in dimers consisting of dipolar-coupled spins 1/2 are studied. We found that the discord $Q=0$ at absolute zero temperature. As the temperature $T$ increases, the quantum correlations in the system increase at first from zero to its maximum and then decrease to zero according to the asymptotic law $T^{-2}$ . It is also shown that in absence of external magnetic field $B$ , the classical correlations $C$ at $T\rightarrow 0$ are, vice versa, maximal. Our calculations predict that in crystalline gypsum $\hbox {CaSO}_{4}\cdot \hbox {2H}_{2}{\hbox {O}}$ the value of natural $(B=0)$ quantum discord between nuclear spins of hydrogen atoms is maximal at the temperature of 0.644  $\upmu $ K, and for 1,2-dichloroethane $\hbox {H}_{2}$ ClC– $\hbox {CH}_{2}{\hbox {Cl}}$ the discord achieves the largest value at $T=0.517~\upmu $ K. In both cases, the discord equals $Q\approx 0.083$  bit/dimer what is $8.3\,\%$ of its upper limit in two-qubit systems. We estimate also that for gypsum at room temperature $Q\sim 10^{-18}$  bit/dimer, and for 1,2-dichloroethane at $T=90$  K the discord is $Q\sim 10^{-17}$  bit per a dimer.  相似文献   

5.
We consider mining unusual patterns from a set  \(T\) of target texts. A typical method outputs unusual patterns if their observed frequencies are far from their expectation estimated under an assumed probabilistic model. However, it is difficult for the method to deal with the zero frequency and thus it suffers from data sparseness. We employ another set  \(B\) of background texts to define a composition  \(xy\) to be peculiar if both \(x\) and  \(y\) are more frequent in  \(B\) than in  \(T\) and conversely \(xy\) is more frequent in  \(T\) . \(xy\) is unusual because \(x\) and  \(y\) are infrequent in  \(T\) while \(xy\) is unexpectedly frequent compared to  \(xy\) in  \(B\) . To find frequent subpatterns and infrequent patterns simultaneously, we develop a fast algorithm using the suffix tree and show that it scales almost linearly under practical settings of parameters. Experiments using DNA sequences show that found peculiar compositions basically appear in rRNA while patterns found by an existing method seem not to relate to specific biological functions. We also show that discovered patterns have similar lengths at which the distribution of frequencies of fixed length substrings begins to skew. This fact explains why our method can find long peculiar compositions.  相似文献   

6.
Prolate elements are a “plug-compatible” modification of spectral elements in which Legendre polynomials are replaced by prolate spheroidal wave functions of order zero. Prolate functions contain a“bandwidth parameter” $c \ge 0 $ c ≥ 0 whose value is crucial to numerical performance; the prolate functions reduce to Legendre polynomials for $c\,=\,0$ c = 0 . We show that the optimal bandwidth parameter $c$ c not only depends on the number of prolate modes per element $N$ N , but also on the element widths $h$ h . We prove that prolate elements lack $h$ h -convergence for fixed $c$ c in the sense that the error does not go to zero as the element size $h$ h is made smaller and smaller. Furthermore, the theoretical predictions that Chebyshev and Legendre polynomials require $\pi $ π degrees of freedom per wavelength to resolve sinusoidal functions while prolate series need only 2 degrees of freedom per wavelength are asymptotic limits as $N \rightarrow \infty $ N → ∞ ; we investigate the rather different behavior when $N \sim O(4-10)$ N ~ O ( 4 ? 10 ) as appropriate for spectral elements and prolate elements. On the other hand, our investigations show that there are certain combinations of $N,\,h$ N , h and $c>0$ c > 0 where a prolate basis clearly outperforms the Legendre polynomial approximation.  相似文献   

7.
In this paper, We propose a simple and practical method (that works only for triangular fuzzy numbers) to solve an arbitrary fully fuzzy linear system (FFLS) in the form $\widetilde{A}\otimes \widetilde{x}=\widetilde{b},$ where $\widetilde{A}_{n \times n}$ is a fuzzy matrix, $\widetilde{x}$ and $\widetilde{b}$ are n × 1 fuzzy vectors. The idea of the presented method is constructed based on the extending 0-cut and 1-cut solution of original fully fuzzy linear systems (FFLS). We also define a fuzzy solution of FFLS and establish the necessary and sufficient conditions for the uniqueness of a fuzzy solution.  相似文献   

8.
Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct \(f\) crash or \(\lfloor f/2 \rfloor \) Byzantine faults among \(n\) different machines, replication requires \(nf\) backup machines. We present a solution called fusion that requires just \(f\) backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an ( \(f\) , \(m\) )-fusion, which is a set of \(m\) backup machines that can correct \(f\) crash faults or \(\lfloor f/2 \rfloor \) Byzantine faults among a given set of machines. Second, we present an algorithm to generate an ( \(f\) , \(f\) )-fusion for a given set of machines. We ensure that our backups are efficient in terms of the size of their state and event sets. Third, we use locality sensitive hashing for the detection and correction of faults that incurs almost the same overhead as that for replication. We detect Byzantine faults with time complexity \(O(n f)\) on average while we correct crash and Byzantine faults with time complexity \(O(n \rho f)\) with high probability, where \(\rho \) is the average state reduction achieved by fusion. Finally, our evaluation of fusion on the widely used MCNC’91 benchmarks for DFSMs shows that the average state space savings in fusion (over replication) is 38 % (range 0–99 %). To demonstrate the practical use of fusion, we describe its potential application to two areas: sensor networks and the MapReduce framework. In the case of sensor networks a fusion-based solution can lead to significantly fewer sensor-nodes than a replication-based solution. For the MapReduce framework, fusion can reduce the number of map-tasks compared to replication. Hence, fusion results in considerable savings in state space and other resources such as the power needed to run the backups.  相似文献   

9.
We present techniques to parallelize membership tests for Deterministic Finite Automata (DFAs). Our method searches arbitrary regular expressions by matching multiple bytes in parallel using speculation. We partition the input string into chunks, match chunks in parallel, and combine the matching results. Our parallel matching algorithm exploits structural DFA properties to minimize the speculative overhead. Unlike previous approaches, our speculation is failure-free, i.e., (1) sequential semantics are maintained, and (2) speed-downs are avoided altogether. On architectures with a SIMD gather-operation for indexed memory loads, our matching operation is fully vectorized. The proposed load-balancing scheme uses an off-line profiling step to determine the matching capacity of each participating processor. Based on matching capacities, DFA matches are load-balanced on inhomogeneous parallel architectures such as cloud computing environments. We evaluated our speculative DFA membership test for a representative set of benchmarks from the Perl-compatible Regular Expression (PCRE) library and the PROSITE protein database. Evaluation was conducted on a 4 CPU (40 cores) shared-memory node of the Intel Academic Program Manycore Testing Lab (Intel MTL), on the Intel AVX2 SDE simulator for 8-way fully vectorized SIMD execution, and on a 20-node (288 cores) cluster on the Amazon EC2 computing cloud. Obtained speedups are on the order of $\mathcal O \left( 1+\frac{|P|-1}{|Q|\cdot \gamma }\right) $ , where $|P|$ denotes the number of processors or SIMD units, $|Q|$ denotes the number of DFA states, and $0<\gamma \le 1$ represents a statically computed DFA property. For all observed cases, we found that $0.02<\gamma <0.47$ . Actual speedups range from 2.3 $\times $ to 38.8 $\times $ for up to 512 DFA states for PCRE, and between 1.3 $\times $ and 19.9 $\times $ for up to 1,288 DFA states for PROSITE on a 40-core MTL node. Speedups on the EC2 computing cloud range from 5.0 $\times $ to 65.8 $\times $ for PCRE, and from 5.0 $\times $ to 138.5 $\times $ for PROSITE. Speedups of our C-based DFA matcher over the Perl-based ScanProsite scan tool range from 559.3 $\times $ to 15079.7 $\times $ on a 40-core MTL node. We show the scalability of our approach for input-sizes of up to 10 GB.  相似文献   

10.
Minghua Lin 《Calcolo》2014,51(3):363-366
This short note proves that if \(A\) is accretive-dissipative, then the growth factor for such \(A\) in Gaussian elimination is less than \(4\) . If \(A\) is a Higham matrix, i.e., the accretive-dissipative matrix \(A\) is complex symmetric, then the growth factor is less than \(2\sqrt{2}\) . The result obtained improves those of George et al. in [Numer. Linear Algebra Appl. 9, 107–114 (2002)] and is one step closer to the final solution of Higham’s conjecture.  相似文献   

11.
In this paper we study decentralized routing in small-world networks that combine a wide variation in node degrees with a notion of spatial embedding. Specifically, we consider a variant of J. Kleinberg’s grid-based small-world model in which (1) the number of long-range edges of each node is not fixed, but is drawn from a power-law probability distribution with exponent parameter \(\alpha \ge 0\) and constant mean, and (2) the long-range edges are considered to be bidirectional for the purposes of routing. This model is motivated by empirical observations indicating that several real networks have degrees that follow a power-law distribution. The measured power-law exponent \(\alpha \) for these networks is often in the range between 2 and 3. For the small-world model we consider, we show that when \(2 < \alpha < 3\) the standard greedy routing algorithm, in which a node forwards the message to its neighbor that is closest to the target in the grid, finishes in an expected number of \(O(\log ^{\alpha -1} n\cdot \log \log n)\) steps, for any source–target pair. This is asymptotically smaller than the \(O(\log ^2 n)\) steps needed in Kleinberg’s original model with the same average degree, and approaches \(O(\log n)\) as \(\alpha \) approaches 2. Further, we show that when \(0\le \alpha < 2\) or \(\alpha \ge 3\) the expected number of steps is \(O(\log ^2 n)\) , while for \(\alpha = 2\) it is \(O(\log ^{4/3} n)\) . We complement these results with lower bounds that match the upper bounds within at most a \(\log \log n\) factor.  相似文献   

12.
Multi-dimensional color image processing has two difficulties: One is that a large number of bits are needed to store multi-dimensional color images, such as, a three-dimensional color image of $1024 \times 1024 \times 1024$ needs $1024 \times 1024 \times 1024 \times 24$  bits. The other one is that the efficiency or accuracy of image segmentation is not high enough for some images to be used in content-based image search. In order to solve the above problems, this paper proposes a new representation for multi-dimensional color image, called a $(n\,+\,1)$ -qubit normal arbitrary quantum superposition state (NAQSS), where $n$ qubits represent colors and coordinates of ${2^n}$ pixels (e.g., represent a three-dimensional color image of $1024 \times 1024 \times 1024$ only using 30 qubits), and the remaining 1 qubit represents an image segmentation information to improve the accuracy of image segmentation. And then we design a general quantum circuit to create the NAQSS state in order to store a multi-dimensional color image in a quantum system and propose a quantum circuit simplification algorithm to reduce the number of the quantum gates of the general quantum circuit. Finally, different strategies to retrieve a whole image or the target sub-image of an image from a quantum system are studied, including Monte Carlo sampling and improved Grover’s algorithm which can search out a coordinate of a target sub-image only running in $O(\sqrt{N/r} )$ where $N$ and $r$ are the numbers of pixels of an image and a target sub-image, respectively.  相似文献   

13.
This paper is devoted to the study of self-referential proofs and/or justifications, i.e., valid proofs that prove statements about these same proofs. The goal is to investigate whether such self-referential justifications are present in the reasoning described by standard modal epistemic logics such as  $\mathsf{S4}$ . We argue that the modal language by itself is too coarse to capture this concept of self-referentiality and that the language of justification logic can serve as an adequate refinement. We consider well-known modal logics of knowledge/belief and show, using explicit justifications, that $\mathsf{S4}$ , $\mathsf{D4}$ , $\mathsf{K4}$ , and  $\mathsf{T}$ with their respective justification counterparts  $\mathsf{LP}$ , $\mathsf{JD4}$ , $\mathsf{J4}$ , and  $\mathsf{JT}$ describe knowledge that is self-referential in some strong sense. We also demonstrate that self-referentiality can be avoided for  $\mathsf{K}$ and  $\mathsf{D}$ . In order to prove the former result, we develop a machinery of minimal evidence functions used to effectively build models for justification logics. We observe that the calculus used to construct the minimal functions axiomatizes the reflected fragments of justification logics. We also discuss difficulties that result from an introduction of negative introspection.  相似文献   

14.
After 100 years of effort, the classification of all the finite subgroups of $SU(3)$ is yet incomplete. The most recently updated list can be found in Ludl (J Phys A Math Theory 44:255204, 2011), where the structure of the series $(C)$ and $(D)$ of $SU(3)$ -subgroups is studied. We provide a minimal set of generators for one of these groups which has order $162$ . These generators appear up to phase as the image of an irreducible unitary braid group representation issued from the Jones–Kauffman version of $SU(2)$ Chern–Simons theory at level $4$ . In light of these new generators, we study the structure of the group in detail and recover the fact that it is isomorphic to the semidirect product $\mathbb Z _9\times \mathbb Z _3\rtimes S_3$ with respect to conjugation.  相似文献   

15.
We consider the problem of doing fast and reliable estimation of the number z of non-zero entries in a sparse boolean matrix product. This problem has applications in databases and computer algebra. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1±ε approximation of z (with small probability of error) in expected time for any $\varepsilon> 4/\sqrt[4]{z}$ . The previously best estimation algorithm, due to Cohen (J. Comput. Syst. Sci. 53(3):441–453, 1997), uses time . We also present a variant using I/Os in expectation in the cache-oblivious model. In contrast to these results, the currently best algorithms for computing a sparse boolean matrix product use time ω(n 4/3) (resp. ω(n 4/3/B) I/Os), even if the result matrix is restricted to nonzero entries. Our algorithm combines the size estimation technique of Bar-Yossef et al. (Proceedings of the 6th International Workshop on Randomization and Approximation Techniques (RANDOM ’02), pp. 1–10, 2002) with a particular class of pairwise independent hash functions that allows the sketch of a set of the form to be computed in expected time and I/Os. We then describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS ’05), pp. 259–270, 2005), and matches a space lower bound shown in that paper. Finally, we present experiments on real-world data sets that show the accuracy of both our methods to be significantly better than the worst-case analysis predicts.  相似文献   

16.
Our study indicates suspended water films, driven by crossed square-wave electric fields, may act as “washers”, “centrifuges” and “mixers”. Based on our recent model successfully describing experimentally observed electro-hydrodynamical flows in water films, we derive conditions for generating specific rotational flows. Our main findings, which are advantageous for microfluidic devices and basic research, are as follows: (1) the film’s rotational patterns depend on the phase difference $\Updelta\varphi$ and frequencies f of the applied fields. (2) For $\Updelta\varphi=\pi/2$ and $f\sim10^{-1}-10^{0}$  Hz, the film exhibits symmetrical reciprocating rotations, i.e. it constitutes a washer. (3) For $\Updelta\varphi=0$ and any f, it exhibits an ordinary anticlockwise (or clockwise) rotation, i.e. it constitutes a centrifuge or motor. (4) For $\Updelta\varphi$ with other values and f below 1 Hz, it exhibits asymmetrical reciprocating rotations, it constitutes an asymmetrical mixer.  相似文献   

17.
In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets and the computational cost does not depend on the size of the training set. We extend this algorithm to forward or backward feature selection in which the optimal regularization parameter is selected for each possible feature set. These feature selection algorithms yield solutions with a sparse weight matrix using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of the order $O(R K N^{2} M)$ with $R$ the number of applied regularization parameters, $K$ the number of folds in the validation set, $N$ the number of input features and $M$ the number of data samples in the training set. Our implementation has a computational complexity of the order $O(KN^3)$ . This computational cost is smaller than that of regression without regularization $O(N^2M)$ for large datasets and is independent of the number of applied regularization parameters and the size of the training set. Combined with a feature selection algorithm the algorithm is of complexity $O(RKNN_s^3)$ and $O(RKN^3N_r)$ for forward and backward feature selection respectively, with $N_s$ the number of selected features and $N_r$ the number of removed features. This is an order $M$ faster than $O(RKNN_s^3M)$ and $O(RKN^3N_rM)$ for the naive implementation, with $N \ll M$ for large datasets. To show the performance and reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation and extreme learning machines.  相似文献   

18.
Despite their relative simplicity, correlation matrix memories (CMMs) are an active area of research, as they are able to be integrated into more complex architectures such as the Associative Rule Chaining Architecture (ARCA) “Austin et al. (International conference on artificial neural networks, pp 49–56, 2012)”. In this architecture, CMMs are used effectively in order to reduce the time complexity of a tree search from \(O(b^d)\) to \(O(d)\) —where \(b\) is the branching factor and \(d\) is the depth of the tree. This paper introduces the Extended Neural Associative Memory Language (ENAMeL)—a domain specific language developed to ease development of applications using CMMs. We discuss various considerations required while developing the language, and techniques used to reduce the memory requirements of CMM-based applications. Finally we show that the memory requirements of ARCA when using the ENAMeL interpreter compare favourably to our original results “Austin et al. (International conference on artificial neural networks, pp 49–56, 2012)” run in MATLAB.  相似文献   

19.
In this paper, we consider a popular model for collaborative filtering in recommender systems. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users). When users (or items) are clustered, our algorithm can recover the rating matrix with \(\omega (MK \log M)\) noisy entries while \(MK\) entries are necessary, where \(K\) is the number of clusters and \(M\) is the number of items. In the case of co-clustering, we prove that \(K^2\) entries are necessary for recovering the rating matrix, and our algorithm achieves this lower bound within a logarithmic factor when \(K\) is sufficiently large. Extensive simulations on Netflix and MovieLens data show that our algorithm outperforms the alternating minimization and the popularity-among-friends algorithm. The performance difference increases even more when noise is added to the datasets.  相似文献   

20.
The Induced Graph Matching problem asks to find \(k\) disjoint induced subgraphs isomorphic to a given graph  \(H\) in a given graph \(G\) such that there are no edges between vertices of different subgraphs. This problem generalizes the classical Independent Set and Induced Matching problems, among several other problems. We show that Induced Graph Matching is fixed-parameter tractable in \(k\) on claw-free graphs when \(H\) is a fixed connected graph, and even admits a polynomial kernel when  \(H\) is a complete graph. Both results rely on a new, strong, and generic algorithmic structure theorem for claw-free graphs. Complementing the above positive results, we prove \(\mathsf {W}[1]\) -hardness of Induced Graph Matching on graphs excluding \(K_{1,4}\) as an induced subgraph, for any fixed complete graph \(H\) . In particular, we show that Independent Set is \(\mathsf {W}[1]\) -hard on \(K_{1,4}\) -free graphs. Finally, we consider the complexity of Induced Graph Matching on a large subclass of claw-free graphs, namely on proper circular-arc graphs. We show that the problem is either polynomial-time solvable or \(\mathsf {NP}\) -complete, depending on the connectivity of \(H\) and the structure of \(G\) .  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号