5 Identification of Patterns in a semi-Markov Chain

Many models have been proposed aiming to identify a pattern in a sequence of symbols under the assumption that the sequence of symbols is described by a Markov chain. The sojourn time in a state in a Markov chain must be dictated by the geometric distribution in a discrete process. This is the main disadvantage of considering Markov chains in real applications. By contrast, semi-Markov processes (SMPs) provide a generalization of their Markov counterparts where the sojourn time in a state can be governed by any distribution function. The aim of this chapter is to provide the probability that a pattern occurs for the first time after n symbols in a semi-Markov sequence. Finally, in order to illustrate our model, we present an example concerning the DNA sequence of bacteriophage the concerned pattern, the SmaI enzyme.

5.1. Introduction

Finding a specific pattern through a sequence of symbols is useful in many areas, for instance in databases it could be the keyword for certain research; in biology it can make reference to a particular illness, in communication it could be a precise word from a conversation, etc. Seeking a particular pattern over a chain formed by a big quantity of symbols is a very cumbersome task and in most cases it cannot be achieved by simple inspection. There are a number of models and algorithms that propose an automatic treatment for the searching problem in a few seconds (see, for example, Aboluion ...

Get Statistical Topics and Stochastic Models for Dependent Data with Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.