Sparse Markov Chains for Sequence Data

Authors:	Väinö Jääskinen, Jie Xiong, Jukka Corander, Timo Koski

Affiliation:	1. Department of Mathematics and Statistics, University of Helsinki; 2. Department of Mathematics, Åbo Akademi University; 3. Department of Mathematics, KTH Royal Institute of Technology

Abstract:	Finite memory sources and variable‐length Markov chains have recently gained popularity in data compression and mining, in particular, for applications in bioinformatics and language modelling. Here, we consider denser data compression and prediction with a family of sparse Bayesian predictive models for Markov chains in finite state spaces. Our approach lumps transition probabilities into classes composed of invariant probabilities, such that the resulting models need not have a hierarchical structure as in context tree‐based approaches. This can lead to a substantially higher rate of data compression, and such non‐hierarchical sparse models can be motivated for instance by data dependence structures existing in the bioinformatics context. We describe a Bayesian inference algorithm for learning sparse Markov models through clustering of transition probabilities. Experiments with DNA sequence and protein data show that our approach is competitive in both prediction and classification when compared with several alternative methods on the basis of variable memory length.

Keywords:	Bayesian learning; data compression; predictive inference; Markov chains; variable order Markov models
