In this paper, we propose a Bayesian nonparametric approach for modeling and selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be seen as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by a variational inference method. Owing to the nature of the Bayesian nonparametric approach, the problems of overfitting and underfitting are prevented. Moreover, the obstacle of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Compared to other approximation techniques, such as Markov chain Monte Carlo (MCMC), which have a high computational cost and whose convergence is difficult to diagnose, the whole inference process in the proposed variational learning framework is analytically tractable with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computational power, which makes it suitable for large-scale applications. The effectiveness of our model is experimentally investigated on both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
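The stick-breaking representation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the concentration value `alpha`, and the truncation level are assumptions chosen for illustration. Truncating the infinite sequence at a finite level is a common device in variational treatments of Dirichlet process mixtures, but the abstract does not say whether it is used here.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixing weights from a truncated stick-breaking construction.

    alpha      -- concentration parameter (hypothetical value, not from the paper)
    truncation -- number of components kept (an assumption of this sketch)
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    # The last component takes whatever is left, so the weights sum to one.
    weights.append(remaining)
    return weights


weights = stick_breaking_weights(alpha=2.0, truncation=20)
```

Smaller values of `alpha` concentrate most of the mass on the first few components, which is how the construction lets the data decide how many clusters are effectively used.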
The Quranic Arabic Corpus (http://corpus.quran.com) is a collaboratively constructed linguistic resource initiated at the University of Leeds, with multiple layers of annotation including part-of-speech tagging, morphological segmentation (Dukes and Habash 2010) and syntactic analysis using dependency grammar (Dukes and Buckwalter 2010). The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400-year-old central religious text of Islam. This project contrasts with other Arabic treebanks by providing a deep linguistic model based on the historical traditional grammar known as i′rāb (إعراب). By adapting this well-known canon of Quranic grammar into a familiar tagset, it is possible to encourage online annotation by Arabic linguists and Quranic experts. This article presents a new approach to linguistic annotation of an Arabic corpus: online supervised collaboration using a multi-stage approach. The different stages include automatic rule-based tagging, initial manual verification, and online supervised collaborative proofreading. A popular website attracting thousands of visitors per day, the Quranic Arabic Corpus has approximately 100 unpaid volunteer annotators, each suggesting corrections to existing linguistic tagging. To ensure a high-quality resource, a small number of expert annotators are promoted to a supervisory role, allowing them to review or veto suggestions made by other collaborators. The Quran also benefits from a large body of existing historical grammatical analysis, which may be leveraged during this review. In this paper we evaluate and report on the effectiveness of the chosen annotation methodology. We also discuss the unique challenges of annotating Quranic Arabic online and describe the custom linguistic software used to aid collaborative annotation.
Finite mixture models have been applied to a variety of computer vision, image processing and pattern recognition tasks. Most of the work on finite mixture models has focused on mixtures for continuous data. However, many applications involve and generate discrete data, for which discrete mixtures are better suited. In this paper, we investigate the problem of discrete data modeling using finite mixture models. We propose a novel, well-motivated mixture that we call the multinomial generalized Dirichlet mixture. The novel model is compared with other discrete mixtures. We designed experiments involving spatial color image database modeling and summarization, and text classification, to show the robustness, flexibility and merits of our approach.
We present a synchronized routing and scheduling problem that arises in the forest industry, as a variation of the log-truck scheduling problem. It combines routing and scheduling of trucks with specific constraints related to the Canadian forestry context. This problem includes aspects such as pick-up and delivery, multiple products, inventory stock, multiple supply points and multiple demand points. We developed a decomposition approach to solve the weekly problem in two phases. In the first phase, we use a MIP solver to solve a tactical model that determines the destinations of full truckloads from forest areas to woodmills. In the second phase, we use two different methods to route and schedule the daily transportation of logs: the first is a constraint-based local search approach, while the second is a hybrid approach involving a constraint programming based model and a constraint-based local search model. These approaches have been implemented using COMET2.0. The method was tested on two industrial cases from forest companies in Canada.
Total lipid contents, fatty acid compositions, phenolic profiles and antioxidant activities of seeds from Thapsia garganica, Orlaya maritima, and Retama raetam were investigated. The oil values were more than 26 %, except for seeds of R. raetam (ca. 3 %). Unsaturated fatty acids accounted for the majority of the fatty acids (more than 75 %). Oleic and linoleic acid were the predominant fatty acids. Total phenolic compounds (24–104 mg GAE g⁻¹ DR), total flavonoids (4–102 mg QE g⁻¹ DR), total tannins (28–85 mg GAE g⁻¹ DR) and condensed tannins (0.62–131 mg CE g⁻¹ DR) were also determined. The antioxidant activities were evaluated using different assays. The predominant detected classes were the phenolic acids (42–85 %) and the flavonoids (11–48 %). The major phenolic acids were caffeic, trans‐4‐hydroxy‐3‐methoxycinnamic, p‐coumaric, and gallic acid. The predominant flavonoids were quercetin, luteolin, naringin, apigenin, and kaempferol. This study brings attention to the medicinal importance of these species as a source of oil and antioxidant molecules.
The Tulul al Ashaqif region is an arid area in northeastern Jordan that contains renewable shallow perched aquifer water.
The study of these aquifers has led to better understanding of the recharge process as well as other hydrological issues related
to management of water resources in similar areas. The use of geographic information system (GIS)-based predictive mapping
to locate areas of high potential for shallow perched aquifer sites is explored in this paper. Knowledge of the hydrologic,
geologic and geomorphic variables influencing the development of shallow aquifer formation is used to produce GIS layers representing
the spatial distribution of those variables. The GIS layers are then analyzed to identify locations where combinations of
environmental variables match patterns observed at known sites. In addition, information can be deduced on the volume of water
that is available and the best locations to site recharge facilities. Moreover, future development of these resources requires
consideration of possible adverse effects of usage on these resources. The database developed can be used for this purpose
as well.
Local kaolinitic clay (from the region of Tabarka, Tunisia) was tested as a pozzolanic material. Thermal treatments were performed as a means of activation of the minerals. The phase identification, before and after heat treatment, was studied by X-ray diffraction and differential thermal analysis/thermogravimetric analysis (DTA/TGA).
To assess the effect of three variables (the calcination temperature, the specific surface of the calcined clay and the percentage of incorporation of the heat-treated clay in the formula of the blended cement) on the compressive strength of blended cement mortar bars at 7, 28 and 91 days, a Box–Behnken design was set up. It was concluded that the mechanical properties of the blended cements were mainly governed by the percentage of incorporation and the fineness of the calcined clay. It was also demonstrated that increasing the fineness of the calcined clay allowed for increases in the level of cement substitution. Finally, a blended cement composition was formulated, with optimal results obtained at a calcination temperature of 700 °C and 30% calcined clay ground to a Blaine fineness of 7700 cm²/g.
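For readers unfamiliar with the Box–Behnken design named in the abstract above, the following is a minimal sketch of how such a design can be generated for three factors in coded units (-1, 0, +1). The function name and the choice of three center points are assumptions for illustration. The abstract does not report the factor ranges or the number of replicates, so no physical levels are assigned here.

```python
from itertools import combinations, product


def box_behnken(n_factors=3, n_center=3):
    """Return a Box-Behnken design in coded units.

    Each pair of factors is varied over a 2x2 factorial at -1/+1 while the
    remaining factor is held at its center level 0. This pairwise construction
    is the usual one for three factors; n_center is a hypothetical choice.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated center points, used to estimate pure error.
    runs.extend([(0,) * n_factors] * n_center)
    return runs


design = box_behnken()
```

With three factors this gives 12 edge-midpoint runs plus the center replicates. Each coded column would then be mapped to the actual calcination temperature, specific surface and incorporation percentage chosen by the experimenter.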