Title: Hidden Markov Models # Hidden Markov Models ## Introduction and Usage Hidden Markov Models are used in multiple areas of Machine Learning, such as speech recognition, handwritten letter recognition or natural language processing. ## Formal Definition A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our case discrete) random variables O and Y, which change their state sequentially. The variable Y with states \{y_1, ... , y_n\} is called the "hidden variable", since its state is not directly observable. The state of Y changes sequentially with a so called - in our case first-order - Markov Property. This means, that the state change probability of Y only depends on its current state and does not change in time. Formally we write: P(Y(t+1)=y_i|Y(0)...Y(t)) = P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)). The variable O with states \{o_1, ... , o_m\} is called the "observable variable", since its state can be directly observed. O does not have a Markov Property, but its state probability depends statically on the current state of Y. Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing initial hidden state probabilities, A is the nxn-dimensional "transition matrix" containing the transition probabilities such that A\[i,j\](i,j\.html) =P(Y(t)=y_i|Y(t-1)=y_j) and B is the mxn-dimensional "emission matrix" containing the observation probabilities such that B\[i,j\]= P(O=o_i|Y=y_j). ## Problems Rabiner \[1\](1\.html) defined three main problems for HMM models: 1. Evaluation: Given a sequence O of observations and a model M, what is the probability P(O|M) that sequence O was generated by model M. The Evaluation problem can be efficiently solved using the Forward algorithm 2. Decoding: Given a sequence O of observations and a model M, what is the most likely sequence Y*=argmax(Y) P(O|M,Y) of hidden variables to generate this sequence. The Decoding problem can be efficiently solved using the Viterbi algorithm. 3. Learning: Given a sequence O of observations, what is the most likely model M*=argmax(M)P(O|M) to generate this sequence. The Learning problem can be efficiently solved using the Baum-Welch algorithm. ## Example To build a Hidden Markov Model and use it to build some predictions, try a simple example like this: Create an input file to train the model. Here we have a sequence drawn from the set of states 0, 1, 2, and 3, separated by space characters. $ echo "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" > hmm-input Now run the baumwelch job to train your model, after first setting MAHOUT_LOCAL to true, to use your local file system. $ export MAHOUT_LOCAL=true $ $MAHOUT_HOME/bin/mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000 Output like the following should appear in the console. Initial probabilities: 0 1 2 1.0 0.0 3.5659361683006626E-251 Transition matrix: 0 1 2 0 6.098919959130616E-5 0.9997275322964165 2.1147850399214744E-4 1 7.404648706054873E-37 0.9086408633885092 0.09135913661149081 2 0.2284374545687356 7.01786289571088E-11 0.7715625453610858 Emission matrix: 0 1 2 3 0 0.9999997858591223 2.0536163836449762E-39 2.1414087769942127E-7 1.052441093535389E-27 1 7.495656581383351E-34 0.2241269055449904 0.4510889999455847 0.32478409450942497 2 0.815051477991782 0.18494852200821799 8.465660634827592E-33 2.8603899591778015E-36 14/03/22 09:52:21 INFO driver.MahoutDriver: Program took 180 ms (Minutes: 0.003) The model trained with the input set now is in the file 'hmm-model', which we can use to build a predicted sequence. $ $MAHOUT_HOME/bin/mahout hmmpredict -m hmm-model -o hmm-predictions -l 10 To see the predictions: $ cat hmm-predictions 0 1 3 3 2 2 2 2 1 2 ## Resources \[1\] Lawrence R. Rabiner (February 1989). "A tutorial on Hidden Markov Models and selected applications in speech recognition". Proceedings of the IEEE 77 (2): 257-286. doi:10.1109/5.18626.