Controlled Markov Processes

Oliver C. Ibe, in Markov Processes for Stochastic Modeling (Second Edition), 2013

13.4.1 Partially Observable Markov Processes

Consider a discrete-state Markov process that can be in one of two states, $S_1$ and $S_2$. Given that it is currently in state $S_1$, it will enter state $S_1$ again with probability $p_{11}$ and state $S_2$ next with probability $p_{12} = 1 - p_{11}$. Similarly, given that it is currently in state $S_2$, it will enter state $S_2$ again with probability $p_{22}$ and state $S_1$ next with probability $p_{21} = 1 - p_{22}$. Assume that the dynamics of the process are observed through an imperfect medium that allows us to observe two states, $\Omega_1$ and $\Omega_2$. Let the conditional probability $\phi_{ij}$ be the probability that the observed state is $\Omega_j$ given that the process is actually in state $S_i$, $i, j = 1, 2$, where $\phi_{i1} + \phi_{i2} = 1$, $i = 1, 2$. (This orientation of $\phi_{ij}$ is the one consistent with the normalization just stated and with the decision rule below.) Figure 13.4 shows the state-transition diagram of the Markov process with partial observability, which is called a partially observable Markov process (POMP).

Figure 13.4. State-transition diagram for the partially observable process.

There are two processes involved in a POMP: the core process and the observation process. The core process is the underlying Markov process whose states are the $S_i$ and whose transition probabilities are the $p_{ij}$. The observation process is the process whose states $\Omega_i$ lie in the observation space. In the preceding example, one can interpret $\Omega_i$ by the statement "the core process seems to be in state $S_i$." Note that the preceding example assumes a one-to-one correspondence between the core process and the observation process, even though we cannot link $\Omega_i$ with certainty to $S_i$. In the more general case, the core process has $n$ states while the observation process has $m$ states, where $m \le n$.

The goal in the analysis of a POMP is to estimate the state of the core Markov process given an observation or a set of observations. We assume that the observation process has no memory; that is, the observations are uncorrelated and are thus made independently of one another. The estimation can then be based on considering each observation independently, using Bayes' rule. For the problem shown in Figure 13.4, assume that the steady-state probability that the underlying Markov process is in state $S_i$ at a random time $n$ is $P[S_1(n)] = P_1$ and $P[S_2(n)] = P_2 = 1 - P_1$. Let the function

$$\arg\max_{y} \{z\}$$

denote the argument $y$ that maximizes the expression $z$. Also, let $\Omega_j(n)$ denote the event that the $n$th observation is $\Omega_j$, and let $\hat{S}(\Omega_j(n))$ denote our estimate of the state as a result of $\Omega_j(n)$. Then the decision criterion becomes

$$\hat{S}(\Omega_j(n)) = \arg\max_{S_i}\big\{P[S_i(n) \mid \Omega_j(n)]\big\} = \arg\max_{S_i}\left\{\frac{P[S_i(n)]\,P[\Omega_j(n) \mid S_i(n)]}{P[\Omega_j(n)]}\right\} = \arg\max_{S_i}\left\{\frac{P[S_i(n)]\,P[\Omega_j(n) \mid S_i(n)]}{P[S_i(n)]\,P[\Omega_j(n) \mid S_i(n)] + P[\bar{S}_i(n)]\,P[\Omega_j(n) \mid \bar{S}_i(n)]}\right\} = \arg\max_{S_i}\left\{\frac{P_i\,\phi_{ij}}{P_i\,\phi_{ij} + (1 - P_i)\,\phi_{kj}}\right\}$$

where $S_k$, $k \ne i$, is the other state of the underlying Markov process, with steady-state probability $P_k = 1 - P_i$. Applying this to the preceding example, we obtain the sample space shown in Figure 13.5.

Figure 13.5. Sample space of state estimation process.

If we assume that $\phi_{11} > \phi_{12}$, $\phi_{22} > \phi_{21}$, and $P_1 = P_2$, then the decoding rule becomes: when the observed state is $\Omega_1$, we decide that the state of the core process is $S_1$; and when the observed state is $\Omega_2$, we decide that the state of the core process is $S_2$.
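The last line of the decision criterion is straightforward to compute. Here is a minimal sketch in Python (the values of $P_i$ and $\phi_{ij}$ are illustrative, not from the text) that applies the MAP rule to each possible observation:

```python
# MAP state estimation for the two-state POMP; numbers are illustrative.
P = [0.5, 0.5]        # steady-state probabilities P1, P2 of the core process
phi = [[0.9, 0.1],    # phi[i][j] = P[observe Omega_(j+1) | core state S_(i+1)]
       [0.2, 0.8]]    # each row sums to 1, as required

def estimate_state(j):
    """Return the MAP estimate (0-based) of the core state given Omega_(j+1)."""
    # The posterior P[S_i | Omega_j] is proportional to P_i * phi_ij;
    # the normalizing denominator is common to both states.
    return max(range(2), key=lambda i: P[i] * phi[i][j])

for j in range(2):
    print(f"observe Omega_{j+1} -> decide S_{estimate_state(j)+1}")
# With phi_11 > phi_12, phi_22 > phi_21 and P1 = P2, this prints
# Omega_1 -> S_1 and Omega_2 -> S_2, matching the decoding rule above.
```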

URL: https://www.sciencedirect.com/science/article/pii/B978012407795900013X

JUMP MARKOV PROCESSES WITH DISCRETE STATES

Daniel T. Gillespie, in Markov Processes, 1992

5.2.B Example: The Completely Homogeneous One-Step Process

An example of a completely homogeneous discrete state Markov process X(t) is the completely homogeneous 'one-step' process, for which the function w(v) is given by

(5.2-13) $$w(v) = \begin{cases} b, & \text{for } v = +1, \\ 1 - b, & \text{for } v = -1, \\ 0, & \text{for } v \ne \pm 1, \end{cases}$$

where b is any constant satisfying $0 \le b \le 1$. We see that when this process is in any state n, its next jump will be either to state n + 1, with probability b, or to state n - 1, with probability 1 - b.

We showed in the preceding subsection that the moment evolution equations for this process are Eqs. (4.4-10)–(4.4-14), and those equations evidently require that we first calculate the constants $a w_k$. This is easily done for the characterizing function w(v) specified by Eq. (5.2-13): using the definition (5.2-12), we calculate

$$w_k = (-1)^k\,w(-1) + (+1)^k\,w(+1) = (-1)^k(1 - b) + b;$$

so, upon multiplying through by a, we conclude that

(5.2-14) $$a w_k = \begin{cases} a(2b - 1), & \text{for } k = 1, 3, 5, \ldots \\ a, & \text{for } k = 2, 4, 6, \ldots \end{cases}$$

Inserting this formula for $a w_k$ into Eqs. (4.4-10)–(4.4-12), we obtain explicit differential equations for the moments of X(t) and its integral S(t). Those differential equations are closed and can be solved exactly. Of most interest, of course, are the means, variances, and covariances of X(t) and S(t). Explicit formulas for those quantities may be obtained by substituting Eq. (5.2-14) into Eqs. (4.4-13) and (4.4-14). In that way we find for X(t) the formulas

(5.2-15a) $$\langle X(t)\rangle = n_0 + a(2b - 1)(t - t_0) \qquad (t_0 \le t),$$

(5.2-15b) $$\mathrm{var}\{X(t)\} = a(t - t_0) \qquad (t_0 \le t),$$

(5.2-15c) $$\mathrm{cov}\{X(t_1), X(t_2)\} = a(t_1 - t_0) \qquad (t_0 \le t_1 \le t_2),$$

and for S(t) the formulas

(5.2-16a) $$\langle S(t)\rangle = n_0(t - t_0) + a\big(b - \tfrac{1}{2}\big)(t - t_0)^2 \qquad (t_0 \le t),$$

(5.2-16b) $$\mathrm{var}\{S(t)\} = (a/3)(t - t_0)^3 \qquad (t_0 \le t),$$

(5.2-16c) $$\mathrm{cov}\{S(t_1), S(t_2)\} = a(t_1 - t_0)^2\big[(t_1 - t_0)/3 + (t_2 - t_1)/2\big] \qquad (t_0 \le t_1 \le t_2).$$

To see what we can deduce about the form of $P(n, t \mid n_0, t_0)$ for this process, we turn to Eqs. (5.2-9) and (5.2-10). From the latter we calculate $$w^*(u) \equiv \sum_v w(v)\,e^{iuv} = (1 - b)\,e^{-iu} + b\,e^{iu},$$ or, after expanding the complex exponentials and collecting terms,

(5.2-17) $$w^*(u) = \cos u + i(2b - 1)\sin u.$$

Inserting this into Eq. (5.2-9) gives
$$P(n, t \mid n_0, t_0) = (2\pi)^{-1}\int_{-\pi}^{\pi} du\,\exp[-iu(n - n_0)] \times \exp\big(a(t - t_0)[\cos u + i(2b - 1)\sin u - 1]\big)$$
$$= (2\pi)^{-1}\exp[-a(t - t_0)]\int_{-\pi}^{\pi} du\,\exp[a(t - t_0)\cos u] \times \exp\big[-i\{u(n - n_0) - a(2b - 1)(t - t_0)\sin u\}\big].$$

Expanding the complex exponential, and taking account of the even and odd natures of the real and imaginary parts of the integrand, we conclude that the Markov state density function for the completely homogeneous one-step Markov process defined by Eq. (5.2-13) is

(5.2-18) $$P(n, t \mid n_0, t_0) = \pi^{-1}\exp[-a(t - t_0)]\int_0^{\pi} du\,\exp[a(t - t_0)\cos u] \times \cos[u(n - n_0) - a(2b - 1)(t - t_0)\sin u] \qquad (t_0 \le t).$$

Equation (5.2-18) shows that a single ordinary integration is all that is required to evaluate the Markov state density function for a completely homogeneous one-step Markov process. And if that integral cannot be evaluated analytically, it can surely be evaluated numerically. One instance in which we can evaluate the u-integral in Eq. (5.2-18) analytically is for b = 1, in which case every jump that occurs is a jump of size v = + 1. In that case we have (2b − 1) = 1, and the u-integral assumes the form of Eq. (A-10) with the replacements

$$a \to a(t - t_0) \quad \text{and} \quad n \to (n - n_0);$$

the integral therefore has the value $[a(t - t_0)]^{(n - n_0)}\,\pi/(n - n_0)!$, so Eq. (5.2-18) becomes

(5.2-19) $$P(n, t \mid n_0, t_0) = \frac{e^{-a(t - t_0)}\,[a(t - t_0)]^{(n - n_0)}}{(n - n_0)!} \qquad (b = 1;\ t_0 \le t;\ n \ge n_0).$$

Comparing with Eqs. (1.7-14), we see that $P(n, t \mid n_0, t_0)$, considered as a function of $(n - n_0)$, is the density function of the Poisson random variable with mean and variance $a(t - t_0)$. This particular process (with b = 1) is in fact known as the Poisson process.
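As a numerical check of this identification, the u-integral in Eq. (5.2-18) can be approximated by elementary quadrature. The sketch below (parameter values are illustrative) evaluates Eq. (5.2-18) with a simple midpoint rule and, for b = 1, reproduces the Poisson density of Eq. (5.2-19):

```python
# Midpoint-rule evaluation of the state density in Eq. (5.2-18); for b = 1
# the result should match the Poisson density of Eq. (5.2-19).
from math import cos, sin, exp, factorial, pi

def P_state(n, t, n0=0, t0=0.0, a=1.0, b=1.0, m=2000):
    """Approximate P(n,t|n0,t0) for the one-step process via Eq. (5.2-18)."""
    at = a * (t - t0)
    du = pi / m
    total = sum(exp(at * cos(u)) * cos(u * (n - n0) - at * (2*b - 1) * sin(u))
                for u in (du * (k + 0.5) for k in range(m)))
    return exp(-at) / pi * total * du

t = 2.0
for n in range(6):
    quadrature = P_state(n, t)                         # Eq. (5.2-18), b = 1
    poisson = exp(-t) * t**n / factorial(n)            # Eq. (5.2-19)
    print(n, round(quadrature, 6), round(poisson, 6))  # the columns agree
```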

Figures 5-2 and 5-3 show simulation runs of two completely homogeneous one-step Markov processes X(t) and their integrals S(t). Each simulation was carried out using the Monte Carlo algorithm of Fig. 5-1 with $t_0 = n_0 = 0$ and $a = 1$. Figure 5-2 has b = 1, making X(t) a Poisson process, whereas Fig. 5-3 has b = 1/2, making X(t) a so-called drunkard's walk process; both of these processes are discussed further in Chapter 6. The dashed curves in Figs. 5-2 and 5-3 show the one-standard-deviation envelopes $\langle X(t)\rangle \pm \mathrm{sdev}\{X(t)\}$ and $\langle S(t)\rangle \pm \mathrm{sdev}\{S(t)\}$, as calculated from Eqs. (5.2-15) and (5.2-16).
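A minimal version of such a simulation is easy to sketch: pausing times are exponential with mean 1/a, and each jump is +1 with probability b, otherwise -1. The code below (an illustrative sketch, not the algorithm of Fig. 5-1 verbatim) uses the same settings as the figures:

```python
# Simulate the completely homogeneous one-step process with t0 = n0 = 0,
# a = 1; use b = 1 for the Poisson process, b = 1/2 for the drunkard's walk.
import random

def simulate_one_step(a=1.0, b=0.5, t_max=10.0, seed=1):
    rng = random.Random(seed)
    t, n, path = 0.0, 0, [(0.0, 0)]
    while True:
        t += rng.expovariate(a)                  # exponential pausing time
        if t > t_max:
            return path
        n += 1 if rng.random() < b else -1       # jump +1 w.p. b, else -1
        path.append((t, n))

path = simulate_one_step(b=0.5)
t_end, n_end = path[-1]
# Eqs. (5.2-15a,b) give mean n0 + a(2b-1)t = 0 and sdev sqrt(a t) for b = 1/2.
print(f"X({t_end:.2f}) = {n_end}; theoretical sdev = {t_end**0.5:.2f}")
```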

URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500107

TEMPORALLY HOMOGENEOUS BIRTH-DEATH MARKOV PROCESSES

Daniel T. Gillespie, in Markov Processes, 1992

6.1.B The Next-Jump Density Function and its Simulation Algorithm

The next-jump density function for any temporally homogeneous discrete state Markov process is given by Eq. (5.1-41). When the birth-death characterizing functions (6.1-1) and (6.1-2) are substituted into that equation, it becomes

(6.1-15) $$p(\tau, v \mid n, t) = \begin{cases} a(n)\exp(-a(n)\tau)\,w_+(n), & \text{if } \tau \ge 0,\ v = +1 \text{ and } n \ge 0, \\ a(n)\exp(-a(n)\tau)\,w_-(n), & \text{if } \tau \ge 0,\ v = -1 \text{ and } n \ge 0, \\ 0, & \text{otherwise}. \end{cases}$$

A direct derivation of this result is not difficult, and is perhaps worth reviewing here. Letting $p_0(\tau)$ denote the probability that the process, in state n at time t, will not leave that state in the time interval $[t, t + \tau)$, the laws of probability and Eq. (6.1-4) allow us to write

$$p_0(\tau + d\tau) = p_0(\tau) \times [1 - a(n)\,d\tau].$$

This relation implies the differential equation $dp_0(\tau)/d\tau = -a(n)\,p_0(\tau)$, and the solution of that differential equation for the required initial condition $p_0(0) = 1$ is

$$p_0(\tau) = \exp(-a(n)\tau).$$

Combining this result with Eqs.(6.1-4) and (6.1-5) using the multiplication law of probability, we conclude that the probability that the process, in state n at time t, will remain in that state until time t + τ, then jump away in the time interval [t + τ, t + τ + dτ), and finally land in state n±1, is

$$\exp(-a(n)\tau) \times a(n)\,d\tau \times w_{\pm}(n) \equiv p(\tau, \pm 1 \mid n, t)\,d\tau.$$

This expression, when coupled with the fact that only jumps with v = ± 1 are possible for a birth-death process, establishes the formula (6.1-15).

If we sum Eq.(6.1-15) over v using Eq.(6.1-3), we find that the marginal density function for τ is the density function of the exponential random variable with decay constant a(n). Thus, the pausing time in state n is the random variable E(a(n)), and in particular,

(6.1-16) $$1/a(n) = \text{the mean pausing time in state } n.$$

We also find from Eq.(6.1-15) that the conditional density function for the jump vector from state n, i.e., the density function for v conditioned on τ (as well as on n and t), is w ±(n); this of course is just as we should expect on the basis of Eq.(6.1-5).

The Monte Carlo simulation algorithm for a temporally homogeneous birth-death Markov process is based wholly upon the next-jump density function. The simulation algorithm may be most easily obtained here by substituting formulas (6.1-1) and (6.1-2) for a(n,t) and w(v | n,t) into the general discrete state simulation algorithm of Fig.5-1. Because the process X(t) in this case is temporally homogeneous and its jump variable v is confined to the two values ±1, steps 2° and 3° of the algorithm of Fig.5-1 simplify considerably along the lines of figure notes a and b. The resulting temporally homogeneous birth-death simulation algorithm is displayed in Fig.6-1. It produces exact realizations x(t) and s(t) of the process X(t) and its time-integral S(t).

Figure 6-1. Exact Monte Carlo simulation algorithm for a temporally homogeneous birth-death Markov process. The algorithm produces exact sample values x(t) and s(t) of the process X(t) and its time-integral S(t) for all $t > t_0$.
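The structure of the algorithm is simple enough to sketch in a few lines: draw the pausing time from E(a(n)), choose the jump direction with probability w+(n), and accumulate the time-integral along the way. In the sketch below the stepping functions a(n) and w+(n) are illustrative placeholders, not ones taken from the text:

```python
# Sketch of a temporally homogeneous birth-death simulation (cf. Fig. 6-1).
import random

def a(n):       return 1.0 + 0.1 * n           # total jump rate in state n (assumed)
def w_plus(n):  return 1.0 if n == 0 else 0.5  # P(next jump is +1) (assumed)

def simulate_birth_death(n0=0, t0=0.0, t_max=20.0, seed=2):
    rng = random.Random(seed)
    t, n, s = t0, n0, 0.0
    while True:
        tau = rng.expovariate(a(n))            # pausing time ~ E(a(n)), Eq. (6.1-16)
        if t + tau > t_max:
            s += n * (t_max - t)               # S(t) accumulates n dt between jumps
            return t_max, n, s
        s += n * tau
        t += tau
        n += 1 if rng.random() < w_plus(n) else -1   # birth (+1) or death (-1)

t, x, s = simulate_birth_death()
print(f"x({t}) = {x},  s({t}) = {s:.2f}")
```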

URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500119

Special Random Processes

Oliver C. Ibe, in Fundamentals of Applied Probability and Random Processes (Second Edition), 2014

12.6 Markov Processes

Markov processes are widely used in engineering, science, and business modeling. They are used to model systems that have a limited memory of their past. For example, in the gambler's ruin problem discussed earlier in this chapter, the amount of money the gambler will have after n + 1 games is determined by the amount he has after n games; any other information is irrelevant in making this prediction. In population growth studies, the population of the next generation depends mainly on the current population and possibly the last few generations.

A random process $\{X(t), t \in T\}$ is called a first-order Markov process if for any $t_0 < t_1 < \cdots < t_n$ the conditional CDF of $X(t_n)$ for given values of $X(t_0), X(t_1), \ldots, X(t_{n-1})$ depends only on $X(t_{n-1})$. That is,

(12.23) $$P[X(t_n) \le x_n \mid X(t_{n-1}) \le x_{n-1}, X(t_{n-2}) \le x_{n-2}, \ldots, X(t_0) \le x_0] = P[X(t_n) \le x_n \mid X(t_{n-1}) \le x_{n-1}]$$

This means that, given the present state of the process, the future state is independent of the past. This property is usually referred to as the Markov property. In a second-order Markov process, the future state depends on both the current state and the immediately preceding state, and so on for higher-order Markov processes. In this chapter we consider only first-order Markov processes.
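The defining property is easy to see in code. In the sketch below (the transition probabilities are illustrative), the next state is sampled from a distribution that depends only on the current state, never on the earlier history:

```python
# A two-state discrete-time Markov chain: the next state depends only on
# the current state (first-order Markov property); matrix is illustrative.
import random

P = [[0.9, 0.1],    # P[next = 0 | current = 0], P[next = 1 | current = 0]
     [0.4, 0.6]]    # P[next = 0 | current = 1], P[next = 1 | current = 1]

rng = random.Random(0)
state, history = 0, []
for _ in range(12):
    history.append(state)
    state = 0 if rng.random() < P[state][0] else 1   # uses only `state`
print(history)
```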

Markov processes are classified according to the nature of the time parameter and the nature of the state space. With respect to state space, a Markov process can be either a discrete-state Markov process or continuous-state Markov process. A discrete-state Markov process is called a Markov chain. Similarly, with respect to time, a Markov process can be either a discrete-time Markov process or a continuous-time Markov process. Thus, there are four basic types of Markov processes:

1.

Discrete-time Markov chain (or discrete-time discrete-state Markov process)

2.

Continuous-time Markov chain (or continuous-time discrete-state Markov process)

3.

Discrete-time Markov process (or discrete-time continuous-state Markov process)

4.

Continuous-time Markov process (or continuous-time continuous-state Markov process)

This classification of Markov processes is illustrated in Figure 12.8.

Figure 12.8. Classification of Markov Processes

The remainder of the discussion in this chapter deals with Markov chains (that is, discrete-state Markov processes).

URL: https://www.sciencedirect.com/science/article/pii/B9780128008522000122

Structure

Howard Lasnik, Juan Uriagereka, in Philosophy of Linguistics, 2012

2.1 The Upper Limits of Finite-state Description

For illustrative purposes, we begin with some simple examples of finite state formal languages (the first finite, the second infinite), from Chomsky [1957], and graphic representations of the finite state Markov processes generating them:

3.

The man comes / The men come

4.

[State diagram of the finite state process generating (3)]

5.

The man comes / The old man comes / The old old man comes / …

6.

[State diagram of the finite state process generating (5)]

Alongside these, Chomsky introduces some non-finite-state, context-free languages. We present these now, along with context-free grammars generating them (a short code sketch of the derivations follows (10) below). Chomsky calls these (Σ, F) grammars, as Σ is a finite set of initial strings and F a finite set of Post-style instruction formulas, each rewriting a single symbol (rewrite rules):

7.

ab, aabb, aaabbb,…, and in general, all sentences consisting of n occurrences of a followed by n occurrences of b and only these.

8.

Σ : S

F : S → aSb, S → ab

9.

aa, bb, abba, baab, aaaa, bbbb, aabbaa, abbbba,…, and in general, all sentences consisting of a string X followed by the 'mirror image' of X (i.e., X in reverse), and only these

10.

Σ : S

F : S → aSa, S → bSb, S → aa, S → bb
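The derivations these grammars license are easy to mimic in code. The following sketch simply unrolls the recursive rules of (8) and (10), producing exactly the strings of languages (7) and (9):

```python
# Unrolling the rewrite rules of grammars (8) and (10).

def gen_anbn(n):
    """Apply S -> aSb (n-1 times), then S -> ab: yields a^n b^n, language (7)."""
    return "a" * n + "b" * n

def gen_mirror(half):
    """Apply S -> aSa / S -> bSb along `half`, closing with S -> aa or S -> bb:
    yields X followed by the mirror image of X, language (9)."""
    return half + half[::-1]

print([gen_anbn(n) for n in (1, 2, 3)])   # ['ab', 'aabb', 'aaabbb']
print(gen_mirror("ab"), gen_mirror("ba")) # abba baab
```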

Chomsky shows how English (portions) cannot be described in finite state terms:

11.
(a)

If S 1, then S 2

(b)

Either S 3, or S 4

(c)

The man who said that S5, is arriving today

The crucial property of these examples is not merely that there can be a string of unlimited length between the dependent items (if-then, either-or, man-is). There can also be a string of unlimited length between 'the' and 'man' in the finite state language (6). But in (11) we have recursion, while in (6) we merely had iteration. As he puts it (p. 22):

In [(11)a], we cannot have 'or' in place of 'then'; in [(11)b], we cannot have 'then' in place of 'or'; in [(11)c], we cannot have 'are' instead of 'is'. In each of these cases there is a dependency between words on opposite sides of the comma (i.e., 'if'-'then', 'either'-'or', 'man'-'is'). But between the interdependent words, in each case, we can insert a declarative sentence S 1, S 3, S 5, and this declarative sentence may in fact be one of [(11)a-c] … It is clear, then, that in English we can find a sequence a + S 1 + b, where there is a dependency between a and b, and we can select as S 1 another sequence containing c + S 2 + d, where there is a dependency between c and d, then select as S 2 another sequence of this form, etc. A set of sentences that is constructed in this way … will have all of the mirror image properties of [(9)] which exclude [(9)] from the set of finite state languages.

(Σ, F) grammars are capable of handling languages with the properties of those in (7) and (9). Further, they can easily generate all finite state (formal) languages as well, thus yielding a set-theoretic picture as in Figure 1:

Figure 1. Chomsky Hierarchy up to context-free languages

At this point in his presentation, Chomsky simply abandons finite state description.

Abandoning a description because an alternative is more inclusive (in the sense of Figure 1) is an argument about the system's weak generative capacity, i.e., an extensional, set-theoretic characterization. Chomsky later coined the term E-language (where the E stands for 'extensional' and 'external') to denote this conception of language, basically anything other than I-language, implying that there is no utility to this notion. He opposed that concept to the linguistically more relevant I-language, mentioned in the Introduction. From the biolinguistic perspective, the linguist's task is to formulate feasible hypotheses about I-language and to test them against reality (in describing acceptable expressions, how children acquire variants, the specifics of language use, how brains may represent them, and so on). It is an interesting empirical question whether, in I-language terms, a 'more inclusive' description entails abandoning a 'less inclusive' one, since the meaning of 'inclusiveness' is less obvious in terms of a generative procedure.

The descriptive advantage of Post-style PS grammars, as compared to finite state grammars, is that PS grammars can pair up items that are indefinitely far apart, with dependencies nested without limit. The way they do that is by introducing symbols that are never physically manifested: the non-terminals. That is, PS grammars introduce structure, as graphically represented in the tree diagram of a sentence from language (7), aaabbb, where a and b are the terminal symbols and S is a non-terminal symbol:

12.

[Tree diagram for aaabbb: [S a [S a [S a b] b] b], per the rules in (8)]

Although we return to this fundamental consideration, we want to emphasize that there is no dispute within generative grammar with regard to the significance of this sort of structure, and arguments abound to demonstrate its reality. We will mention just four.

Consider, first, the contrasts in (13):

13.
(a)

(Usually,) cats chase mice.

(b)

Cats chase mice (, usually).

(c)

Cats (usually) chase mice.

(d)

Cats chase (*usually) mice.

The question is why adverbs like usually can be placed in all the positions in (13) except between the verb and the direct object. An explanation of the contrasts is possible if an adverb must associate with a phrasal constituent, not just a single word. If there is a constituent formed by chase and mice (a verb phrase, VP), then the modification by the adverb in (13c) is as straightforward as in (13a) or (13b), which involve even more complex constituents (entire sentences) being modified. (13d) fails because no constituent is formed by cats and chase, and therefore the adverb has nothing to associate with.

Confirmation of the reality of the abstract VP structure stems from the fact that it can be displaced as a unit, which the facts below directly show:

14.
(a)

They say cats chase mice, and chase mice, I've surely seen they can!

(b)

They say cats chase mice, * and cats can, I've surely seen chase mice!

So-called VP fronting is a colloquial way of emphasizing this sort of expression, as in (14a), where the entire constituent chase mice is displaced. A parallel fronting, involving the subject cats and the verb can — though logically imaginable as a source of emphasis of the semantic dependency between cats and their abilities — is unavailable, as (14b) shows. This follows if only phrases can displace and cats can is not a phrase. The issue is purely structural. Had language presented 'subject phrases' (including the subject and the auxiliary verb) as opposed to verb phrases, the paradigms above would reverse.

Asymmetries between subjects and predicates and what they contain are easy to find, and they provide yet another argument for structure. Thus consider (15), which involves an anaphor each other (that is, an element whose antecedent for referential purposes must be grammatically determined, in a sense we are about to investigate):

15.
(a)

Jack and Jill [kissed each other].

(b)

*Each other [kissed Jack and Jill].

Whereas an anaphor in object position can take the subject as its antecedent, the reverse is not true. This is not just a fact about anaphors; asymmetries remain with pronouns:

16.
(a)

Jack and Jill said that [someone [kissed them]].

(b)

They said that [someone [kissed Jack and Jill]].

In (16a) the object pronoun can take the names in subject position as antecedent; in contrast, in (16b), with the reverse order, they (now in subject position) must not refer to Jack and Jill in object position. The matter cannot be one of simple precedence in the presentation of names, pronouns and anaphors, for (17), which is very similar to (16b), is in fact fine with their referring (forward) to Jack and Jill:

17.

[Their teacher] said that [someone [kissed Jack and Jill]].

The difference between these two sentences is standardly described in terms of structure too: their is buried inside the phrase their teacher in (17), the subject of the main clause, while this is not true for they in (16b), which is itself the subject of the sentence. As it turns out, anaphors, pronouns, and names are sensitive to whether or not there is a direct path between them and their antecedent, a path that goes by the name of c-command. That notion is totally dependent on a precise phrasal description.

Although the divide between subject and predicate is fundamental in determining phrasal asymmetries, it is not the only one. Anaphoric facts of the abstract sort discussed in [Barss and Lasnik, 1986] directly show asymmetries internal to the verb phrase:

18.
(a)

Jack [showed Jill to herself (in the mirror)].

(b)

*Jack [showed herself to Jill (in the mirror)].

Such asymmetries are prevalent, and become manifest in all sorts of circumstances. For example, while compounds are possible involving a direct object and a verb, as in (20a), whose import is that of (19), they are generally impossible with any other verbal dependent (indirect object, subject, etc.) or with so-called (circumstantial) adjuncts:

19.

Mailmen carry letters (for people) (on weekdays).

20.
(a)

Mailmen are letter-carriers (for people) (on weekdays).

(b)

*Mailmen are people-carriers (letters).

(c)

*Letters are mailman-carrier/carried (for people).

(d)

*Mailmen are weekday-carriers (letters) (for people).

URL: https://www.sciencedirect.com/science/article/pii/B9780444517470500020

Diffusion Processes

Oliver C. Ibe, in Markov Processes for Stochastic Modeling (Second Edition), 2013

10.8 Problems

10.1

Show that the Fokker–Planck equation

$$\frac{\partial P}{\partial t} = -a\frac{\partial P}{\partial x} + \frac{D}{2}\frac{\partial^2 P}{\partial x^2}$$

has the solution

$$P(x, t) = \frac{1}{\sqrt{2\pi D t}}\,e^{-(x - at)^2/2Dt}$$
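One way to approach this problem is to verify the claim symbolically. A minimal sketch, assuming SymPy is available:

```python
# Symbolic check that P(x,t) satisfies the Fokker-Planck equation above.
import sympy as sp

x = sp.symbols('x', real=True)
t, a, D = sp.symbols('t a D', positive=True)
P = sp.exp(-(x - a*t)**2 / (2*D*t)) / sp.sqrt(2*sp.pi*D*t)
residual = sp.diff(P, t) + a*sp.diff(P, x) - (D/2)*sp.diff(P, x, 2)
print(sp.simplify(residual))   # prints 0
```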

10.2

Consider a particle whose position x(t) undergoes the diffusion and damping process

$$dx = -\mu x\,dt + (1 - x^2)\,\sigma\,dB$$

What is the steady-state PDF of x?

10.3

Let $\{X(t), t \ge 0\}$ be a continuous-time, continuous-state Markov process whose transition PDF $f(x, t; x_0, t_0)$ satisfies the following forward Kolmogorov equation:

$$\frac{\partial f}{\partial t} = -\frac{\partial}{\partial x}\big(a(t, x)f\big) + \frac{1}{2}\frac{\partial^2}{\partial x^2}\big(b(t, x)f\big)$$

Assume that a ( t , x ) = a ( t ) and b ( t , x ) = b ( t ) . Show that the forward Kolmogorov equation can be reduced to

$$\frac{\partial f}{\partial t} = \frac{1}{2}\frac{\partial^2 f}{\partial x^2}$$

Also, show that the corresponding distribution is Gaussian.
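A quick symbolic check of the Gaussian claim for the reduced equation, again assuming SymPy:

```python
# The Gaussian density with variance t solves f_t = (1/2) f_xx.
import sympy as sp

x = sp.symbols('x', real=True)
t = sp.symbols('t', positive=True)
f = sp.exp(-x**2 / (2*t)) / sp.sqrt(2*sp.pi*t)
print(sp.simplify(sp.diff(f, t) - sp.diff(f, x, 2) / 2))   # prints 0
```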

10.4

Another way to define a diffusion process is as follows. Let $\mu(t, x)$ and $\sigma(t, x)$ be continuous functions of t and x, where

$$\int_0^t E\big[\sigma^2(u, B(u))\big]\,du < \infty$$

Define

$$X(t) = X(0) + \int_0^t \mu(u, B(u))\,du + \int_0^t \sigma(u, B(u))\,dB(u), \qquad t \ge 0$$

Then $\{X(t), t \ge 0\}$ is a diffusion process with $\mu$ as its drift function and $\sigma$ as its diffusion function. If $X(0) = x_0$, $\mu(t, x) = \mu$, and $\sigma(t, x) = \sigma$, show that $\{X(t)\}$ is a Brownian motion with drift whose initial state is $x_0$.
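This construction can be mimicked numerically with an Euler–Maruyama discretization of the two integrals. In the sketch below (all parameter values are illustrative), constant μ and σ yield a sample mean of X(T) near x0 + μT, consistent with a Brownian motion with drift started at x0:

```python
# Euler-Maruyama discretization of X(t) = X(0) + int mu dt + int sigma dB
# with constant mu(t,x) = mu and sigma(t,x) = sigma.
import math, random

def simulate_X(x0=1.0, mu=0.5, sigma=2.0, T=1.0, steps=1000, rng=None):
    rng = rng or random.Random()
    dt, x = T / steps, x0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        x += mu * dt + sigma * dB            # dX = mu dt + sigma dB
    return x

rng = random.Random(0)
samples = [simulate_X(rng=rng) for _ in range(2000)]
print(sum(samples) / len(samples))   # close to x0 + mu*T = 1.5
```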

URL: https://www.sciencedirect.com/science/article/pii/B9780124077959000104