MP5: Weighted Finite State Transducers

In this MP, you will train and test a nested set of two WFSTs: a language model, and a lexicon. The file you're looking at right now (mp5overview.ipynb) is a debugging tool. The file you actually need to complete is mp5.py. The unit tests are provided in run_tests.py and tests/test_visible.py. All of these are available as part of the code package, https://courses.engr.illinois.edu/ece417/fa2020/ece417_20fall_mp5.zip.

In [84]:
import numpy as np
import matplotlib.figure
import matplotlib.pyplot as plt
%matplotlib inline
In [85]:
import mp5
import importlib
importlib.reload(mp5)
Out[85]:
<module 'mp5' from '/Users/jhasegaw/Dropbox/mark/teaching/ece417/ece417labs/20fall/mp5/src/mp5.py'>

How to debug

To keep this overview short, every code block below offers two options: show your own results, or show the distributed solutions (loaded from solutions.json). To choose, comment out the option you don't want.

In [86]:
import json
with open('solutions.json') as f:
    solutions = json.load(f)

Browsing the Data

This MP will simulate training an automatic speech recognizer from an untranscribed input utterance.

  • The audio file is available in the data folder, but we won't load it.
  • Instead, we'll assume that a phoneme transcription system has already converted it into a sequence of phonemes, and your goal is to figure out what words were spoken.

Here are the phonemes that were "recognized" in the input audio file:

In [87]:
with open('data/transcript.txt') as f:
    S = f.read().strip().split()
print(S)
['h', 'ˈu', 'z', 'w', 'ˈʊ', 'd', 'z', 'ð', 'ˈi', 'z', 'ˈɑ', 'ɹ', 'ˈɑɪ', 'ɵ', 'ˈɪ', 'ŋ', 'k', 'ˈɑɪ', 'n', 'ˈoʊ', 'h', 'ɪ', 'z', 'h', 'ˈaʊ', 's', 'ɪ', 'z', 'ˈɪ', 'n', 'ð', 'ə', 'v', 'ˈɪ', 'l', 'ɪ', 'dʒ', 'ð', 'ˈoʊ', 'h', 'ˈi', 'w', 'ɪ', 'l', 'n', 'ˈɑ', 't', 's', 'ˈi', 'm', 'ˈi', 's', 't', 'ˈɑ', 'p', 'ɪ', 'ŋ', 'h', 'ˈɪ', 'ɹ', 't', 'ˈoʊ', 'w', 'ˈɑ', 'tʃ', 'h', 'ɪ', 'z', 'w', 'ˈʊ', 'd', 'z', 'f', 'ˈɪ', 'l', 'ˈʌ', 'p', 'w', 'ɪ', 'ð', 's', 'n', 'ˈoʊ', 'm', 'ˈi', 'l', 'ˈɪ', 't', 'l̩', 'h', 'ˈɔ', 'ɹ', 's', 'm', 'ə', 's', 't', 'ɵ', 'ˈɪ', 'ŋ', 'k', 'ɪ', 't', 'k', 'w', 'ˈɪ', 'ɹ', 't', 'ˈoʊ', 's', 't', 'ˈɑ', 'p', 'w', 'ɪ', 'ð', 'ˈaʊ', 't', 'ə', 'f', 'ˈɑ', 'ɹ', 'm', 'h', 'ˈaʊ', 's', 'n', 'ˌi', 'ɹ', 'b', 'i', 't', 'w', 'ˈi', 'n', 'ð', 'ə', 'w', 'ˈʊ', 'd', 'z', 'ə', 'n', 'd', 'f', 'ɹ', 'ˈoʊ', 'z', 'n̩', 'l', 'ˈei', 'k', 'ð', 'ə', 'd', 'ˈɑ', 'ɹ', 'k', 'ə', 's', 't', 'ˈi', 'v', 'n', 'ɪ', 'ŋ', 'ə', 'v', 'ð', 'ə', 'j', 'ˌi', 'ɹ', 'h', 'ˈi', 'g', 'ˈɪ', 'v', 'z', 'h', 'ɪ', 'z', 'h', 'ˈɑ', 'ɹ', 'n', 'ɪ', 's', 'b', 'ˈɛ', 'l', 'z', 'ə', 'ʃ', 'ˈei', 'k', 't', 'ˈoʊ', 'ˈæ', 's', 'k', 'ɪ', 'f', 'ð', 'ˈɛ', 'ɹ', 'ɪ', 'z', 's', 'ˈʌ', 'm', 'm', 'ɪ', 's', 't', 'ˈei', 'k', 'ð', 'ə', 'ˈoʊ', 'n', 'l', 'i', 'ˈʌ', 'ð', 'ɚ', 's', 'ˈaʊ', 'n', 'd', 'z', 'ð', 'ə', 's', 'w', 'ˈi', 'p', 'ə', 'v', 'ˈi', 'z', 'i', 'w', 'ˈɪ', 'n', 'd', 'ə', 'n', 'd', 'd', 'ˈaʊ', 'n', 'i', 'f', 'l', 'ˈei', 'k', 'ð', 'ə', 'w', 'ˈʊ', 'd', 'z', 'ˈɑ', 'ɹ', 'l', 'ˈʌ', 'v', 'l', 'i', 'd', 'ˈɑ', 'ɹ', 'k', 'ə', 'n', 'd', 'd', 'ˈi', 'p', 'b', 'ə', 't', 'ˈɑɪ', 'h', 'ˈæ', 'v', 'p', 'ɹ', 'ˈɑ', 'm', 'ə', 's', 'ə', 'z', 't', 'ˈoʊ', 'k', 'ˈi', 'p', 'ə', 'n', 'd', 'm', 'ˈɑɪ', 'l̩', 'z', 't', 'ˈoʊ', 'g', 'ˈoʊ', 'b', 'ɪ', 'f', 'ˈoʊ', 'ɹ', 'ˈɑɪ', 's', 'l', 'ˈi', 'p', 'ə', 'n', 'd', 'm', 'ˈɑɪ', 'l̩', 'z', 't', 'ˈoʊ', 'g', 'ˈoʊ', 'b', 'ɪ', 'f', 'ˈoʊ', 'ɹ', 'ˈɑɪ', 's', 'l', 'ˈi', 'p']

This time, you'll need to write your own code to read in the lexicon, train a language model, and read in the transcript as a WFST. Here's some code to load the lexicon. Notice that many words have more than one possible pronunciation, for example because people often pronounce words casually, in a reduced form:

In [88]:
with open('data/lexicon.txt') as f:
    for line in f:
        ph = line.strip().split()
        print('Word %s is pronounced as %s'%(ph[0],':'.join(ph[1:])))
Word a is pronounced as ə
Word a is pronounced as ˌei
Word about is pronounced as b:ˌaʊ:t
Word about is pronounced as ə:b:ˈaʊ:t
Word aleksandrovich is pronounced as ˌɑ:l:ɛ:k:s:ˈɑ:n:d:ɹ:ɑ:v:ɪ:tʃ
Word alexander is pronounced as ˌæ:l:ɪ:g:z:ˈæ:n:d:ɚ
Word alexander is pronounced as ˌæ:l:ɪ:g:z:ˈæ:n:d:ə:ɹ
Word all is pronounced as ɑ:l
Word all is pronounced as ˈɔ:l
Word and is pronounced as ə:n:d
Word and is pronounced as ˈæ:n:d
Word are is pronounced as ˈɑ:ɹ
Word as is pronounced as ˈæ:z
Word as is pronounced as ˈɛ:z
Word ask is pronounced as ˈæ:s:k
Word at is pronounced as ə:t
Word at is pronounced as ˈæ:t
Word be is pronounced as b:i
Word be is pronounced as b:ˈei
Word be is pronounced as b:ˈi
Word before is pronounced as b:ɪ:f:ˈoʊ:ɹ
Word before is pronounced as b:ɪ:f:ˈɔ:ɹ
Word before is pronounced as b:ˌi:f:ˈɔ:ɹ
Word bells is pronounced as b:ˈɛ:l:z
Word between is pronounced as b:i:t:w:ˈi:n
Word between is pronounced as b:ɪ:t:w:ˈi:n
Word book is pronounced as b:ˈʊ:k
Word broke is pronounced as b:ɹ:ˈoʊ:k
Word but is pronounced as b:ə:t
Word but is pronounced as b:ˈʌ:t
Word by is pronounced as b:ˈɑɪ
Word communist is pronounced as k:ˈɑ:m:j:ə:n:ɪ:s:t
Word communist is pronounced as k:ˈɑ:m:j:ə:n:ə:s:t
Word cried is pronounced as k:ɹ:ˈɑɪ:d
Word czar is pronounced as z:ˈɑ:ɹ
Word dark is pronounced as d:ˈɑ:ɹ:k
Word darkest is pronounced as d:ˈɑ:ɹ:k:ə:s:t
Word deep is pronounced as d:ˈi:p
Word died is pronounced as d:ˈɑɪ:d
Word down is pronounced as d:ˈaʊ:n
Word downy is pronounced as d:ˈaʊ:n:i
Word dramatic is pronounced as d:ɹ:ə:m:ˈæ:ɾ:ɪ:k
Word easy is pronounced as ˈi:z:i
Word eighteen is pronounced as ˈei:t:ˈi:n
Word emperor is pronounced as ˈɛ:m:p:ɚ:ɚ
Word emperor is pronounced as ˈɛ:m:p:ə:ɹ:ə:ɹ
Word end is pronounced as ˈɛ:n:d
Word entire is pronounced as ɛ:n:t:ˈɑɪ:ɚ
Word entire is pronounced as ɪ:n:t:ˈɑɪ:ɚ
Word entire is pronounced as ˌɛ:n:t:ˌɑɪ:ɹ
Word evening is pronounced as ˈi:v:n:ɪ:ŋ
Word family is pronounced as f:ˈæ:m:ə:l:i
Word family is pronounced as f:ˈæ:m:l:i
Word farm is pronounced as f:ˈɑ:ɹ:m
Word father is pronounced as f:ˈɑ:ð:ɚ
Word father is pronounced as f:ˈɑ:ð:ə:ɹ
Word fill is pronounced as f:ˈɪ:l
Word flake is pronounced as f:l:ˈei:k
Word four is pronounced as f:oʊ:ɹ
Word four is pronounced as f:ˈɔ:ɹ
Word frozen is pronounced as f:ɹ:ˈoʊ:z:n̩
Word gives is pronounced as g:ˈɪ:v:z
Word go is pronounced as g:ˈoʊ
Word going is pronounced as g:ˈoʊ:ɪ:n
Word going is pronounced as g:ˈoʊ:ɪ:ŋ
Word had is pronounced as h:ˈæ:d
Word happen is pronounced as h:ˈæ:p:n̩
Word harness is pronounced as h:ˈɑ:ɹ:n:ɪ:s
Word have is pronounced as h:ˈæ:v
Word he is pronounced as h:ˈi
Word he is pronounced as h:ˈʌ
Word help is pronounced as h:ˈɛ:l:p
Word her is pronounced as h:ɚ:ɹ
Word her is pronounced as h:ˈɝ
Word here is pronounced as h:ˈɪ:ɹ
Word him is pronounced as h:ɪ:m
Word him is pronounced as h:ˈɪ:m
Word him is pronounced as ɪ:m
Word his is pronounced as h:ɪ:z
Word horse is pronounced as h:ˈɔ:ɹ:s
Word house is pronounced as h:ˈaʊ:s
Word house is pronounced as h:ˈaʊ:z
Word household is pronounced as h:ˈaʊ:s:h:ˌoʊ:l:d
Word i is pronounced as ˈɑɪ
Word if is pronounced as ɪ:f
Word in is pronounced as ˈɪ:n
Word including is pronounced as ɪ:n:k:l:u:d:ɪ:ŋ
Word including is pronounced as ɪ:n:k:l:ˈu:d:ɪ:ŋ
Word instructive is pronounced as ɪ:n:s:t:ɹ:ˈʌ:k:t:ɪ:v
Word introduced is pronounced as ˌɪ:n:t:ɹ:oʊ:d:ˈu:s:t
Word introduced is pronounced as ˌɪ:n:t:ɹ:ə:d:ˈu:s:t
Word is is pronounced as ɪ:z
Word it is pronounced as ɪ:t
Word its is pronounced as ɪ:t:s
Word july is pronounced as dʒ:u:l:ˈɑɪ
Word july is pronounced as dʒ:ə:l:ˈɑɪ
Word july is pronounced as dʒ:ˌu:l:ˈɑɪ
Word keep is pronounced as k:ˈi:p
Word know is pronounced as n:ˈoʊ
Word lake is pronounced as l:ˈei:k
Word learning is pronounced as l:ˈɝ:n:ɪ:ŋ
Word learning is pronounced as l:ˈɝ:ɹ:n:ɪ:ŋ
Word little is pronounced as l:ˈɪ:t:l̩
Word little is pronounced as l:ˈɪ:ɾ:l̩
Word lovely is pronounced as l:ˈʌ:v:l:i
Word massacre is pronounced as m:ˈæ:s:ə:k:ɚ
Word massacre is pronounced as m:ˈæ:s:ə:k:ə:ɹ
Word me is pronounced as m:ˈi
Word miles is pronounced as m:ˈɑɪ:l̩:z
Word miles is pronounced as m:ˈɑɪ:l:z
Word mistake is pronounced as m:ɪ:s:t:ˈei:k
Word more is pronounced as m:oʊ:ɹ
Word more is pronounced as m:ˈɔ:ɹ
Word movement is pronounced as m:ˈu:v:m:n̩:t
Word must is pronounced as m:ə:s:t
Word must is pronounced as m:ˈʌ:s:t
Word my is pronounced as m:ˈi
Word my is pronounced as m:ˈɑɪ
Word near is pronounced as n:ˌi:ɹ
Word near is pronounced as n:ˈɪ:ɹ
Word never is pronounced as n:ˈɛ:v:ɚ
Word never is pronounced as n:ˈɛ:v:ə:ɹ
Word nicholas is pronounced as n:ˈɪ:k:ə:l:ə:s
Word nicholas is pronounced as n:ˈɪ:k:l:ə:s
Word nineteen is pronounced as n:ˈɑɪ:n:t:ˈi:n
Word ninety is pronounced as n:ˈɑɪ:n:t:i
Word not is pronounced as n:ˈɑ:t
Word november is pronounced as n:oʊ:v:ˈɛ:m:b:ɚ
Word november is pronounced as n:oʊ:v:ˈɛ:m:b:ə:ɹ
Word now is pronounced as n:ˈaʊ
Word of is pronounced as ə:v
Word old is pronounced as ˈoʊ:l:d
Word on is pronounced as ˈɑ:n
Word on is pronounced as ˈɔ:n
Word only is pronounced as ˈoʊ:n:l:i
Word other is pronounced as ˈʌ:ð:ɚ
Word other is pronounced as ˈʌ:ð:ə:ɹ
Word our is pronounced as ˈaʊ:ɚ
Word our is pronounced as ˈaʊ:ɹ
Word our is pronounced as ˈɑ:ɹ
Word personal is pronounced as p:ˈɝ:s:ɪ:n:ɪ:l
Word personal is pronounced as p:ˈɝ:ɹ:s:ə:n:l̩
Word physician is pronounced as f:ə:z:ˈɪ:ʃ:n̩
Word physician is pronounced as f:ɪ:z:ˈɪ:ʃ:n̩
Word promises is pronounced as p:ɹ:ˈɑ:m:ə:s:ə:z
Word queer is pronounced as k:w:ˈɪ:ɹ
Word ran is pronounced as ɹ:ɑ:n
Word ran is pronounced as ɹ:ˈæ:n
Word reaction is pronounced as ɹ:i:ˈæ:k:ʃ:n̩
Word romanov is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:f
Word romanov is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:v
Word rule is pronounced as ɹ:ˈu:l
Word russia is pronounced as ɹ:ˈʌ:ʃ:ə
Word second is pronounced as s:ˈɛ:k:n̩
Word second is pronounced as s:ˈɛ:k:n̩:d
Word see is pronounced as s:ˈi
Word shake is pronounced as ʃ:ˈei:k
Word shoulder is pronounced as ʃ:ˈoʊ:l:d:ɚ
Word shoulder is pronounced as ʃ:ˈoʊ:l:d:ə:ɹ
Word simon is pronounced as s:ˈɑɪ:m:n̩
Word sister is pronounced as s:ˈɪ:s:t:ɚ
Word sister is pronounced as s:ˈɪ:s:t:ə:ɹ
Word six is pronounced as s:ɪ:k:s
Word six is pronounced as s:ˈɪ:k:s
Word sleep is pronounced as s:l:ˈi:p
Word snow is pronounced as s:n:ˈoʊ
Word some is pronounced as s:ˈʌ:m
Word sounds is pronounced as s:ˈaʊ:n:d:z
Word sounds is pronounced as s:ˈaʊ:n:z
Word start is pronounced as s:t:ɑ:t
Word start is pronounced as s:t:ˈɑ:ɹ:t
Word stop is pronounced as s:t:ˈɑ:p
Word stopping is pronounced as s:t:ˈɑ:p:ɪ:ŋ
Word stories is pronounced as s:t:ˈɔ:ɹ:i:z
Word sweep is pronounced as s:w:ˈi:p
Word tears is pronounced as t:ˈɛ:ɹ:z
Word tears is pronounced as t:ˈɪ:ɹ:z
Word telling is pronounced as t:ˈɛ:l:ɪ:ŋ
Word temptation is pronounced as t:ɛ:m:t:ˈei:ʃ:n̩
Word temptation is pronounced as t:ɛ:m:p:t:ˈei:ʃ:n̩
Word that is pronounced as ð:ə:t
Word that is pronounced as ð:ˈæ:t
Word the is pronounced as ð:ə
Word the is pronounced as ð:ˈi
Word the is pronounced as ð:ˈʌ
Word there is pronounced as ð:ˈɛ:ɹ
Word these is pronounced as ð:ˈi:z
Word think is pronounced as ɵ:ˈɪ:ŋ:k
Word though is pronounced as ð:ˈoʊ
Word to is pronounced as t:ˈoʊ
Word to is pronounced as t:ˈu
Word to is pronounced as t:ˈʌ
Word told is pronounced as t:ˈoʊ:l:d
Word triumphant is pronounced as t:ɹ:ɑɪ:ˈʌ:m:f:n̩:t
Word twenty is pronounced as t:w:ˈɛ:n:i
Word twenty is pronounced as t:w:ˈɛ:n:t:i
Word up is pronounced as ˈʌ:p
Word village is pronounced as v:ˈɪ:l:ɪ:dʒ
Word wanted is pronounced as w:ˈɔ:n:ɪ:d
Word wanted is pronounced as w:ˈɑ:n:t:ə:d
Word wanted is pronounced as w:ˈɔ:n:t:ɪ:d
Word was is pronounced as w:ə:z
Word was is pronounced as w:ˈɑ:z
Word was is pronounced as w:ˈɔ:z
Word watch is pronounced as w:ˈɑ:tʃ
Word what is pronounced as h:w:ˈʌ:t
Word what is pronounced as w:ˈʌ:t
Word what is pronounced as w:ˌɑ:t
Word what is pronounced as ə:t
Word when is pronounced as h:w:ˈɛ:n
Word when is pronounced as h:w:ˈɪ:n
Word when is pronounced as w:ˈɛ:n
Word when is pronounced as w:ˈɪ:n
Word when is pronounced as ɛ:n
Word which is pronounced as h:w:ˈɪ:tʃ
Word which is pronounced as w:ˈɪ:tʃ
Word which is pronounced as ɪ:tʃ
Word whose is pronounced as h:ˈu:z
Word will is pronounced as w:ɪ:l
Word wind is pronounced as w:ˈɪ:n:d
Word wind is pronounced as w:ˈɑɪ:n:d
Word with is pronounced as w:ɪ:ð
Word with is pronounced as w:ɪ:ɵ
Word without is pronounced as w:ɪ:ð:ˈaʊ:t
Word woods is pronounced as w:ˈʊ:d:z
Word year is pronounced as j:ˌi:ɹ
Word year is pronounced as j:ˈɪ:ɹ
Word montenori is pronounced as m:ˌɔ:n:t:ə:n:ˈoʊ:ɹ:i
Word nicholai is pronounced as n:ˈɪ:k:ə:l:ɑɪ
Word romanovs is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:f:s
Word sebag is pronounced as s:ˈɛ:b:æ:g
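Before building a WFST from the lexicon, it can help to hold it in memory as a dictionary from each word to its list of pronunciations. The sketch below assumes the same line format the loop above relies on (a word followed by its space-separated phones); the name `read_lexicon` is hypothetical and is not part of mp5.py.

```python
from collections import defaultdict

def read_lexicon(lines):
    """Hypothetical helper: map each word to its pronunciations, in file order.

    Each line is assumed to look like 'word phone1 phone2 ...'.
    """
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], tuple(fields[1:])
        lexicon[word].append(phones)
    return dict(lexicon)

# Usage with the real file:
# with open('data/lexicon.txt', encoding='utf-8') as f:
#     lexicon = read_lexicon(f)
# multi = [w for w, prons in lexicon.items() if len(prons) > 1]
```

Tuples are used for pronunciations so they can later serve as dictionary keys if needed.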

You will also use Laplace smoothing to train a unigram language model. Let's look at the language model training texts:

In [89]:
text = []
with open('data/languagemodeltexts.txt') as f:
    for line in f:
        text.append(line.strip())
print(' '.join(text))
when telling of nicholas the second our temptation is to start at its dramatic end july nineteen eighteen massacre of him his entire family his household help and personal physician by which a triumphant communist movement introduced its rule  but there are more instructive stories about nicholas including his reaction to learning in november eighteen ninety four that his father alexander had died and that he nicholas was now emperor of all russia  as told by simon sebag montenori in his book twenty six year old nicholai aleksandrovich romanov broke down in tears and ran to his sister  what is going to happen to me to my family and to russia he cried on her shoulder i never wanted to be czar 
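Laplace (add-one) smoothing estimates each word's probability as $P(w)=\frac{c(w)+1}{\sum_v c(v)+V}$, where $c(w)$ is the count of $w$ in the training text and $V$ is the vocabulary size; the WFST weight is then the surprisal $-\ln P(w)$. The sketch below rests on several assumptions that mp5.py may not share: the vocabulary is the set of words in the lexicon, logarithms are natural, and the unigram model is a single-state WFST with one self-loop per word (state 0 is both initial and final). The name `unigram_wfst` is hypothetical.

```python
import math
from collections import Counter

def unigram_wfst(training_words, vocabulary):
    """Hypothetical sketch: Laplace-smoothed unigram LM as a one-state WFST.

    Returns (transitions, finals); each transition is
    (preceding state, input, output, surprisal, next state).
    """
    counts = Counter(training_words)
    vocab = sorted(set(vocabulary))
    # Only in-vocabulary tokens are counted, so probabilities over vocab sum to 1.
    n_tokens = sum(counts[w] for w in vocab)
    denom = n_tokens + len(vocab)
    transitions = []
    for w in vocab:
        prob = (counts[w] + 1) / denom          # add-one smoothing
        transitions.append((0, w, w, -math.log(prob), 0))
    return transitions, [0]

# Usage (assumed file layout, as shown above):
# with open('data/languagemodeltexts.txt', encoding='utf-8') as f:
#     words = f.read().split()
# with open('data/lexicon.txt', encoding='utf-8') as f:
#     vocab = [line.split()[0] for line in f if line.strip()]
# G, Gfinal = unigram_wfst(words, vocab)
```

Because every vocabulary word gets a count of at least one, no word receives an infinite surprisal, even if it never appears in the training text.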

Read the data as three WFSTs

The first thing you need to do is read in the lexicon and the transcript as WFSTs, and train a WFST to represent the language model. Remember that a WFST is composed of six things:

  1. A set of input labels.
  2. A set of output labels.
  3. A set of states.
  4. A specification of the initial state.
  5. A specification of the final states.
  6. A list of transitions.

This MP treats most of these as trivial:

  1. Input labels: we'll assume that any string is possible.
  2. Output labels: any string is possible.
  3. States: non-negative integers, numbered sequentially from zero.
  4. Initial state: we'll always assume 0 is the initial state.
  5. Final states: this needs to be specified for each WFST.
  6. Transitions: OK, this is where the real work will occur.

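We will assume that "creating a WFST" is synonymous with "creating a list of transitions." Each transition, $t$, needs to be a Python tuple containing the following five elements, in order (the distributed solutions print as lists because they are loaded from JSON, which has no tuple type):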

  1. t[0]=$p[t]$: The preceding state (a non-negative integer)
  2. t[1]=$i[t]$: The input label (a string, or '' for epsilon)
  3. t[2]=$o[t]$: The output label (a string, or '' for epsilon)
  4. t[3]=$w[t]$: The weight (surprisal: a real number, or np.inf)
  5. t[4]=$n[t]$: The next state (a non-negative integer)
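The five-element convention is easy to get subtly wrong, for example by storing a state as a float or a label as None. A small validator can catch such mistakes early. The sketch below is a hypothetical helper, `check_transition`, not part of mp5.py or the autograder; it accepts lists as well as tuples, since transitions loaded from solutions.json come back as lists.

```python
import math
import numbers

def check_transition(t):
    """Hypothetical helper: raise ValueError unless t follows the
    (preceding state, input, output, weight, next state) convention."""
    if len(t) != 5:
        raise ValueError(f"expected 5 elements, got {len(t)}: {t!r}")
    prev_state, in_label, out_label, weight, next_state = t
    # States: non-negative integers (numbers.Integral also covers numpy ints).
    for state in (prev_state, next_state):
        if isinstance(state, bool) or not isinstance(state, numbers.Integral) or state < 0:
            raise ValueError(f"states must be non-negative integers: {t!r}")
    # Labels: strings, with '' standing for epsilon.
    if not isinstance(in_label, str) or not isinstance(out_label, str):
        raise ValueError(f"labels must be strings ('' for epsilon): {t!r}")
    # Weight: a real-valued surprisal; +inf (e.g. np.inf) is allowed, NaN is not.
    if isinstance(weight, bool) or not isinstance(weight, numbers.Real) or math.isnan(weight):
        raise ValueError(f"weight must be a real number or inf: {t!r}")

# Usage:
# for t in solutions['T']:
#     check_transition(t)
```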

First, let's start with the easiest WFST: the one representing the input transcription. The number of transitions should exactly equal $N$, the number of phoneme symbols in the file data/transcript.txt. These transitions should go through a sequence of $N+1$ states, starting with $p[0]=0$ and ending with $n[N-1]=N$. The input and output labels are the same ($i[t]=o[t]$), and the weights are all zero ($w[t]=0$). So it should look like

  1. (0,'h','h',0,1)
  2. (1,'ˈu','ˈu',0,2) ...

The initial state is $q=0$, and the final state is $q=N$. The initial state is never written down, since it is always 0 by convention; the final state is returned as the one-element list Tfinal (for this transcript, [342]).
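The transcript WFST is a linear chain, so a single list comprehension is enough to build it. The sketch below is a hypothetical `transcript_to_wfst`, not the required `mp5.todo_transcript2wfst`; it returns tuples, so to compare against the solutions you would convert with `[list(t) for t in T]`.

```python
def transcript_to_wfst(phones):
    """Hypothetical sketch: linear-chain WFST for a phone sequence.

    Transition t goes from state t to state t+1, with input and output
    labels both equal to phones[t] and weight 0.
    """
    transitions = [(t, ph, ph, 0, t + 1) for t, ph in enumerate(phones)]
    finals = [len(phones)]  # the single final state is q = N
    return transitions, finals

# Usage, reading the same file as above:
# with open('data/transcript.txt', encoding='utf-8') as f:
#     T, Tfinal = transcript_to_wfst(f.read().split())
```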

In [90]:
importlib.reload(mp5)
#T, Tfinal = mp5.todo_transcript2wfst('data/transcript.txt')
T = solutions['T']
Tfinal = solutions['Tfinal']
print(Tfinal)
print(T)
[342]
[[0, 'h', 'h', 0, 1], [1, 'ˈu', 'ˈu', 0, 2], [2, 'z', 'z', 0, 3], [3, 'w', 'w', 0, 4], [4, 'ˈʊ', 'ˈʊ', 0, 5], [5, 'd', 'd', 0, 6], [6, 'z', 'z', 0, 7], [7, 'ð', 'ð', 0, 8], [8, 'ˈi', 'ˈi', 0, 9], [9, 'z', 'z', 0, 10], [10, 'ˈɑ', 'ˈɑ', 0, 11], [11, 'ɹ', 'ɹ', 0, 12], [12, 'ˈɑɪ', 'ˈɑɪ', 0, 13], [13, 'ɵ', 'ɵ', 0, 14], [14, 'ˈɪ', 'ˈɪ', 0, 15], [15, 'ŋ', 'ŋ', 0, 16], [16, 'k', 'k', 0, 17], [17, 'ˈɑɪ', 'ˈɑɪ', 0, 18], [18, 'n', 'n', 0, 19], [19, 'ˈoʊ', 'ˈoʊ', 0, 20], [20, 'h', 'h', 0, 21], [21, 'ɪ', 'ɪ', 0, 22], [22, 'z', 'z', 0, 23], [23, 'h', 'h', 0, 24], [24, 'ˈaʊ', 'ˈaʊ', 0, 25], [25, 's', 's', 0, 26], [26, 'ɪ', 'ɪ', 0, 27], [27, 'z', 'z', 0, 28], [28, 'ˈɪ', 'ˈɪ', 0, 29], [29, 'n', 'n', 0, 30], [30, 'ð', 'ð', 0, 31], [31, 'ə', 'ə', 0, 32], [32, 'v', 'v', 0, 33], [33, 'ˈɪ', 'ˈɪ', 0, 34], [34, 'l', 'l', 0, 35], [35, 'ɪ', 'ɪ', 0, 36], [36, 'dʒ', 'dʒ', 0, 37], [37, 'ð', 'ð', 0, 38], [38, 'ˈoʊ', 'ˈoʊ', 0, 39], [39, 'h', 'h', 0, 40], [40, 'ˈi', 'ˈi', 0, 41], [41, 'w', 'w', 0, 42], [42, 'ɪ', 'ɪ', 0, 43], [43, 'l', 'l', 0, 44], [44, 'n', 'n', 0, 45], [45, 'ˈɑ', 'ˈɑ', 0, 46], [46, 't', 't', 0, 47], [47, 's', 's', 0, 48], [48, 'ˈi', 'ˈi', 0, 49], [49, 'm', 'm', 0, 50], [50, 'ˈi', 'ˈi', 0, 51], [51, 's', 's', 0, 52], [52, 't', 't', 0, 53], [53, 'ˈɑ', 'ˈɑ', 0, 54], [54, 'p', 'p', 0, 55], [55, 'ɪ', 'ɪ', 0, 56], [56, 'ŋ', 'ŋ', 0, 57], [57, 'h', 'h', 0, 58], [58, 'ˈɪ', 'ˈɪ', 0, 59], [59, 'ɹ', 'ɹ', 0, 60], [60, 't', 't', 0, 61], [61, 'ˈoʊ', 'ˈoʊ', 0, 62], [62, 'w', 'w', 0, 63], [63, 'ˈɑ', 'ˈɑ', 0, 64], [64, 'tʃ', 'tʃ', 0, 65], [65, 'h', 'h', 0, 66], [66, 'ɪ', 'ɪ', 0, 67], [67, 'z', 'z', 0, 68], [68, 'w', 'w', 0, 69], [69, 'ˈʊ', 'ˈʊ', 0, 70], [70, 'd', 'd', 0, 71], [71, 'z', 'z', 0, 72], [72, 'f', 'f', 0, 73], [73, 'ˈɪ', 'ˈɪ', 0, 74], [74, 'l', 'l', 0, 75], [75, 'ˈʌ', 'ˈʌ', 0, 76], [76, 'p', 'p', 0, 77], [77, 'w', 'w', 0, 78], [78, 'ɪ', 'ɪ', 0, 79], [79, 'ð', 'ð', 0, 80], [80, 's', 's', 0, 81], [81, 'n', 'n', 0, 82], [82, 'ˈoʊ', 'ˈoʊ', 0, 83], [83, 'm', 'm', 0, 84], [84, 'ˈi', 'ˈi', 0, 
85], [85, 'l', 'l', 0, 86], [86, 'ˈɪ', 'ˈɪ', 0, 87], [87, 't', 't', 0, 88], [88, 'l̩', 'l̩', 0, 89], [89, 'h', 'h', 0, 90], [90, 'ˈɔ', 'ˈɔ', 0, 91], [91, 'ɹ', 'ɹ', 0, 92], [92, 's', 's', 0, 93], [93, 'm', 'm', 0, 94], [94, 'ə', 'ə', 0, 95], [95, 's', 's', 0, 96], [96, 't', 't', 0, 97], [97, 'ɵ', 'ɵ', 0, 98], [98, 'ˈɪ', 'ˈɪ', 0, 99], [99, 'ŋ', 'ŋ', 0, 100], [100, 'k', 'k', 0, 101], [101, 'ɪ', 'ɪ', 0, 102], [102, 't', 't', 0, 103], [103, 'k', 'k', 0, 104], [104, 'w', 'w', 0, 105], [105, 'ˈɪ', 'ˈɪ', 0, 106], [106, 'ɹ', 'ɹ', 0, 107], [107, 't', 't', 0, 108], [108, 'ˈoʊ', 'ˈoʊ', 0, 109], [109, 's', 's', 0, 110], [110, 't', 't', 0, 111], [111, 'ˈɑ', 'ˈɑ', 0, 112], [112, 'p', 'p', 0, 113], [113, 'w', 'w', 0, 114], [114, 'ɪ', 'ɪ', 0, 115], [115, 'ð', 'ð', 0, 116], [116, 'ˈaʊ', 'ˈaʊ', 0, 117], [117, 't', 't', 0, 118], [118, 'ə', 'ə', 0, 119], [119, 'f', 'f', 0, 120], [120, 'ˈɑ', 'ˈɑ', 0, 121], [121, 'ɹ', 'ɹ', 0, 122], [122, 'm', 'm', 0, 123], [123, 'h', 'h', 0, 124], [124, 'ˈaʊ', 'ˈaʊ', 0, 125], [125, 's', 's', 0, 126], [126, 'n', 'n', 0, 127], [127, 'ˌi', 'ˌi', 0, 128], [128, 'ɹ', 'ɹ', 0, 129], [129, 'b', 'b', 0, 130], [130, 'i', 'i', 0, 131], [131, 't', 't', 0, 132], [132, 'w', 'w', 0, 133], [133, 'ˈi', 'ˈi', 0, 134], [134, 'n', 'n', 0, 135], [135, 'ð', 'ð', 0, 136], [136, 'ə', 'ə', 0, 137], [137, 'w', 'w', 0, 138], [138, 'ˈʊ', 'ˈʊ', 0, 139], [139, 'd', 'd', 0, 140], [140, 'z', 'z', 0, 141], [141, 'ə', 'ə', 0, 142], [142, 'n', 'n', 0, 143], [143, 'd', 'd', 0, 144], [144, 'f', 'f', 0, 145], [145, 'ɹ', 'ɹ', 0, 146], [146, 'ˈoʊ', 'ˈoʊ', 0, 147], [147, 'z', 'z', 0, 148], [148, 'n̩', 'n̩', 0, 149], [149, 'l', 'l', 0, 150], [150, 'ˈei', 'ˈei', 0, 151], [151, 'k', 'k', 0, 152], [152, 'ð', 'ð', 0, 153], [153, 'ə', 'ə', 0, 154], [154, 'd', 'd', 0, 155], [155, 'ˈɑ', 'ˈɑ', 0, 156], [156, 'ɹ', 'ɹ', 0, 157], [157, 'k', 'k', 0, 158], [158, 'ə', 'ə', 0, 159], [159, 's', 's', 0, 160], [160, 't', 't', 0, 161], [161, 'ˈi', 'ˈi', 0, 162], [162, 'v', 'v', 0, 163], [163, 'n', 'n', 0, 164], 
[164, 'ɪ', 'ɪ', 0, 165], [165, 'ŋ', 'ŋ', 0, 166], [166, 'ə', 'ə', 0, 167], [167, 'v', 'v', 0, 168], [168, 'ð', 'ð', 0, 169], [169, 'ə', 'ə', 0, 170], [170, 'j', 'j', 0, 171], [171, 'ˌi', 'ˌi', 0, 172], [172, 'ɹ', 'ɹ', 0, 173], [173, 'h', 'h', 0, 174], [174, 'ˈi', 'ˈi', 0, 175], [175, 'g', 'g', 0, 176], [176, 'ˈɪ', 'ˈɪ', 0, 177], [177, 'v', 'v', 0, 178], [178, 'z', 'z', 0, 179], [179, 'h', 'h', 0, 180], [180, 'ɪ', 'ɪ', 0, 181], [181, 'z', 'z', 0, 182], [182, 'h', 'h', 0, 183], [183, 'ˈɑ', 'ˈɑ', 0, 184], [184, 'ɹ', 'ɹ', 0, 185], [185, 'n', 'n', 0, 186], [186, 'ɪ', 'ɪ', 0, 187], [187, 's', 's', 0, 188], [188, 'b', 'b', 0, 189], [189, 'ˈɛ', 'ˈɛ', 0, 190], [190, 'l', 'l', 0, 191], [191, 'z', 'z', 0, 192], [192, 'ə', 'ə', 0, 193], [193, 'ʃ', 'ʃ', 0, 194], [194, 'ˈei', 'ˈei', 0, 195], [195, 'k', 'k', 0, 196], [196, 't', 't', 0, 197], [197, 'ˈoʊ', 'ˈoʊ', 0, 198], [198, 'ˈæ', 'ˈæ', 0, 199], [199, 's', 's', 0, 200], [200, 'k', 'k', 0, 201], [201, 'ɪ', 'ɪ', 0, 202], [202, 'f', 'f', 0, 203], [203, 'ð', 'ð', 0, 204], [204, 'ˈɛ', 'ˈɛ', 0, 205], [205, 'ɹ', 'ɹ', 0, 206], [206, 'ɪ', 'ɪ', 0, 207], [207, 'z', 'z', 0, 208], [208, 's', 's', 0, 209], [209, 'ˈʌ', 'ˈʌ', 0, 210], [210, 'm', 'm', 0, 211], [211, 'm', 'm', 0, 212], [212, 'ɪ', 'ɪ', 0, 213], [213, 's', 's', 0, 214], [214, 't', 't', 0, 215], [215, 'ˈei', 'ˈei', 0, 216], [216, 'k', 'k', 0, 217], [217, 'ð', 'ð', 0, 218], [218, 'ə', 'ə', 0, 219], [219, 'ˈoʊ', 'ˈoʊ', 0, 220], [220, 'n', 'n', 0, 221], [221, 'l', 'l', 0, 222], [222, 'i', 'i', 0, 223], [223, 'ˈʌ', 'ˈʌ', 0, 224], [224, 'ð', 'ð', 0, 225], [225, 'ɚ', 'ɚ', 0, 226], [226, 's', 's', 0, 227], [227, 'ˈaʊ', 'ˈaʊ', 0, 228], [228, 'n', 'n', 0, 229], [229, 'd', 'd', 0, 230], [230, 'z', 'z', 0, 231], [231, 'ð', 'ð', 0, 232], [232, 'ə', 'ə', 0, 233], [233, 's', 's', 0, 234], [234, 'w', 'w', 0, 235], [235, 'ˈi', 'ˈi', 0, 236], [236, 'p', 'p', 0, 237], [237, 'ə', 'ə', 0, 238], [238, 'v', 'v', 0, 239], [239, 'ˈi', 'ˈi', 0, 240], [240, 'z', 'z', 0, 241], [241, 'i', 'i', 0, 242], [242, 
'w', 'w', 0, 243], [243, 'ˈɪ', 'ˈɪ', 0, 244], [244, 'n', 'n', 0, 245], [245, 'd', 'd', 0, 246], [246, 'ə', 'ə', 0, 247], [247, 'n', 'n', 0, 248], [248, 'd', 'd', 0, 249], [249, 'd', 'd', 0, 250], [250, 'ˈaʊ', 'ˈaʊ', 0, 251], [251, 'n', 'n', 0, 252], [252, 'i', 'i', 0, 253], [253, 'f', 'f', 0, 254], [254, 'l', 'l', 0, 255], [255, 'ˈei', 'ˈei', 0, 256], [256, 'k', 'k', 0, 257], [257, 'ð', 'ð', 0, 258], [258, 'ə', 'ə', 0, 259], [259, 'w', 'w', 0, 260], [260, 'ˈʊ', 'ˈʊ', 0, 261], [261, 'd', 'd', 0, 262], [262, 'z', 'z', 0, 263], [263, 'ˈɑ', 'ˈɑ', 0, 264], [264, 'ɹ', 'ɹ', 0, 265], [265, 'l', 'l', 0, 266], [266, 'ˈʌ', 'ˈʌ', 0, 267], [267, 'v', 'v', 0, 268], [268, 'l', 'l', 0, 269], [269, 'i', 'i', 0, 270], [270, 'd', 'd', 0, 271], [271, 'ˈɑ', 'ˈɑ', 0, 272], [272, 'ɹ', 'ɹ', 0, 273], [273, 'k', 'k', 0, 274], [274, 'ə', 'ə', 0, 275], [275, 'n', 'n', 0, 276], [276, 'd', 'd', 0, 277], [277, 'd', 'd', 0, 278], [278, 'ˈi', 'ˈi', 0, 279], [279, 'p', 'p', 0, 280], [280, 'b', 'b', 0, 281], [281, 'ə', 'ə', 0, 282], [282, 't', 't', 0, 283], [283, 'ˈɑɪ', 'ˈɑɪ', 0, 284], [284, 'h', 'h', 0, 285], [285, 'ˈæ', 'ˈæ', 0, 286], [286, 'v', 'v', 0, 287], [287, 'p', 'p', 0, 288], [288, 'ɹ', 'ɹ', 0, 289], [289, 'ˈɑ', 'ˈɑ', 0, 290], [290, 'm', 'm', 0, 291], [291, 'ə', 'ə', 0, 292], [292, 's', 's', 0, 293], [293, 'ə', 'ə', 0, 294], [294, 'z', 'z', 0, 295], [295, 't', 't', 0, 296], [296, 'ˈoʊ', 'ˈoʊ', 0, 297], [297, 'k', 'k', 0, 298], [298, 'ˈi', 'ˈi', 0, 299], [299, 'p', 'p', 0, 300], [300, 'ə', 'ə', 0, 301], [301, 'n', 'n', 0, 302], [302, 'd', 'd', 0, 303], [303, 'm', 'm', 0, 304], [304, 'ˈɑɪ', 'ˈɑɪ', 0, 305], [305, 'l̩', 'l̩', 0, 306], [306, 'z', 'z', 0, 307], [307, 't', 't', 0, 308], [308, 'ˈoʊ', 'ˈoʊ', 0, 309], [309, 'g', 'g', 0, 310], [310, 'ˈoʊ', 'ˈoʊ', 0, 311], [311, 'b', 'b', 0, 312], [312, 'ɪ', 'ɪ', 0, 313], [313, 'f', 'f', 0, 314], [314, 'ˈoʊ', 'ˈoʊ', 0, 315], [315, 'ɹ', 'ɹ', 0, 316], [316, 'ˈɑɪ', 'ˈɑɪ', 0, 317], [317, 's', 's', 0, 318], [318, 'l', 'l', 0, 319], [319, 'ˈi', 'ˈi', 0, 
320], [320, 'p', 'p', 0, 321], [321, 'ə', 'ə', 0, 322], [322, 'n', 'n', 0, 323], [323, 'd', 'd', 0, 324], [324, 'm', 'm', 0, 325], [325, 'ˈɑɪ', 'ˈɑɪ', 0, 326], [326, 'l̩', 'l̩', 0, 327], [327, 'z', 'z', 0, 328], [328, 't', 't', 0, 329], [329, 'ˈoʊ', 'ˈoʊ', 0, 330], [330, 'g', 'g', 0, 331], [331, 'ˈoʊ', 'ˈoʊ', 0, 332], [332, 'b', 'b', 0, 333], [333, 'ɪ', 'ɪ', 0, 334], [334, 'f', 'f', 0, 335], [335, 'ˈoʊ', 'ˈoʊ', 0, 336], [336, 'ɹ', 'ɹ', 0, 337], [337, 'ˈɑɪ', 'ˈɑɪ', 0, 338], [338, 's', 's', 0, 339], [339, 'l', 'l', 0, 340], [340, 'ˈi', 'ˈi', 0, 341], [341, 'p', 'p', 0, 342]]

Now let's read the lexicon. Assuming there are no homophones (no pairs of words that sound the same), we can construct a deterministic WFST as follows:

  1. The initial state is $q=0$.
  2. A word containing $N$ phones is expressed as a sequence of $N+1$ transitions.
  3. The first transition in each word starts from state 0.
  4. The first $N$ transitions in each word have the phone as input label, and epsilon (the empty string, '') as output label.
  5. The $(N+1)$st transition in each word has epsilon ('') as input label, the word as output label, and ends in state 0.
  6. Transitions are shared between words whenever possible. Two words that have the first $M$ phonemes in common will share the first $M$ transitions, and diverge on the $(M+1)$st transition.
  7. For now, we'll set the edge weights all to 0. We will change that later, during Baum-Welch re-estimation.
  8. In order to make it possible for the autograder to recognize that you've done this correctly, please construct states in the order that they are required. Each new state should be one larger than the most recently constructed state.

So if the first three entries in the lexicon were "a ə", "a ˌei" and "about ə b ˈaʊ t", then the first several transitions in the WFST would be:

  • (0,"ə","",0,1)
  • (1,"","a",0,0)
  • (0,"ˌei","",0,2)
  • (2,"","a",0,0)
  • (1,"b","",0,3)
  • (3,"ˈaʊ","",0,4)
  • (4,"t","",0,5)
  • (5,"","about",0,0)
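Rules 1 to 8 describe a prefix tree (trie) over pronunciations. One way to realize them is to keep a dictionary from (state, phone) to next state, reusing an arc when it already exists and otherwise creating the next-numbered state. The sketch below, `lexicon_to_wfst`, is hypothetical and not the required `mp5.todo_lexicon2wfst`; it takes lines in the lexicon's format and, on the three example entries, produces exactly the eight transitions listed above.

```python
def lexicon_to_wfst(lines):
    """Hypothetical sketch of the prefix-sharing lexicon WFST (rules 1-8)."""
    transitions = []
    arc = {}        # (state, phone) -> next state, used to share prefixes
    last_state = 0  # most recently created state
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        state = 0  # every word starts from state 0
        for ph in phones:
            if (state, ph) in arc:
                state = arc[(state, ph)]  # reuse a shared transition
            else:
                last_state += 1           # new state, numbered in creation order
                arc[(state, ph)] = last_state
                transitions.append((state, ph, '', 0, last_state))
                state = last_state
        # Epsilon input, word output, back to state 0.
        transitions.append((state, '', word, 0, 0))
    return transitions, [0]
```

Homophones are harmless here: two words with the same pronunciation simply end in the same state, which then gets two epsilon transitions with different output words.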

Here is the result for the real lexicon. The third entry in the lexicon is actually "about b ˌaʊ t", not "about ə b ˈaʊ t" (that pronunciation is the fourth entry), so the result differs a little from the one listed above:

In [91]:
importlib.reload(mp5)
#L, Lfinal = mp5.todo_lexicon2wfst('data/lexicon.txt')
L = solutions['L']
Lfinal = solutions['Lfinal']
print(Lfinal)
print(L)
[0]
[[0, 'ə', '', 0, 1], [1, '', 'a', 0, 0], [0, 'ˌei', '', 0, 2], [2, '', 'a', 0, 0], [0, 'b', '', 0, 3], [3, 'ˌaʊ', '', 0, 4], [4, 't', '', 0, 5], [5, '', 'about', 0, 0], [1, 'b', '', 0, 6], [6, 'ˈaʊ', '', 0, 7], [7, 't', '', 0, 8], [8, '', 'about', 0, 0], [0, 'ˌɑ', '', 0, 9], [9, 'l', '', 0, 10], [10, 'ɛ', '', 0, 11], [11, 'k', '', 0, 12], [12, 's', '', 0, 13], [13, 'ˈɑ', '', 0, 14], [14, 'n', '', 0, 15], [15, 'd', '', 0, 16], [16, 'ɹ', '', 0, 17], [17, 'ɑ', '', 0, 18], [18, 'v', '', 0, 19], [19, 'ɪ', '', 0, 20], [20, 'tʃ', '', 0, 21], [21, '', 'aleksandrovich', 0, 0], [0, 'ˌæ', '', 0, 22], [22, 'l', '', 0, 23], [23, 'ɪ', '', 0, 24], [24, 'g', '', 0, 25], [25, 'z', '', 0, 26], [26, 'ˈæ', '', 0, 27], [27, 'n', '', 0, 28], [28, 'd', '', 0, 29], [29, 'ɚ', '', 0, 30], [30, '', 'alexander', 0, 0], [29, 'ə', '', 0, 31], [31, 'ɹ', '', 0, 32], [32, '', 'alexander', 0, 0], [0, 'ɑ', '', 0, 33], [33, 'l', '', 0, 34], [34, '', 'all', 0, 0], [0, 'ˈɔ', '', 0, 35], [35, 'l', '', 0, 36], [36, '', 'all', 0, 0], [1, 'n', '', 0, 37], [37, 'd', '', 0, 38], [38, '', 'and', 0, 0], [0, 'ˈæ', '', 0, 39], [39, 'n', '', 0, 40], [40, 'd', '', 0, 41], [41, '', 'and', 0, 0], [0, 'ˈɑ', '', 0, 42], [42, 'ɹ', '', 0, 43], [43, '', 'are', 0, 0], [39, 'z', '', 0, 44], [44, '', 'as', 0, 0], [0, 'ˈɛ', '', 0, 45], [45, 'z', '', 0, 46], [46, '', 'as', 0, 0], [39, 's', '', 0, 47], [47, 'k', '', 0, 48], [48, '', 'ask', 0, 0], [1, 't', '', 0, 49], [49, '', 'at', 0, 0], [39, 't', '', 0, 50], [50, '', 'at', 0, 0], [3, 'i', '', 0, 51], [51, '', 'be', 0, 0], [3, 'ˈei', '', 0, 52], [52, '', 'be', 0, 0], [3, 'ˈi', '', 0, 53], [53, '', 'be', 0, 0], [3, 'ɪ', '', 0, 54], [54, 'f', '', 0, 55], [55, 'ˈoʊ', '', 0, 56], [56, 'ɹ', '', 0, 57], [57, '', 'before', 0, 0], [55, 'ˈɔ', '', 0, 58], [58, 'ɹ', '', 0, 59], [59, '', 'before', 0, 0], [3, 'ˌi', '', 0, 60], [60, 'f', '', 0, 61], [61, 'ˈɔ', '', 0, 62], [62, 'ɹ', '', 0, 63], [63, '', 'before', 0, 0], [3, 'ˈɛ', '', 0, 64], [64, 'l', '', 0, 65], [65, 'z', '', 0, 66], [66, 
'', 'bells', 0, 0], [51, 't', '', 0, 67], [67, 'w', '', 0, 68], [68, 'ˈi', '', 0, 69], [69, 'n', '', 0, 70], [70, '', 'between', 0, 0], [54, 't', '', 0, 71], [71, 'w', '', 0, 72], [72, 'ˈi', '', 0, 73], [73, 'n', '', 0, 74], [74, '', 'between', 0, 0], [3, 'ˈʊ', '', 0, 75], [75, 'k', '', 0, 76], [76, '', 'book', 0, 0], [3, 'ɹ', '', 0, 77], [77, 'ˈoʊ', '', 0, 78], [78, 'k', '', 0, 79], [79, '', 'broke', 0, 0], [3, 'ə', '', 0, 80], [80, 't', '', 0, 81], [81, '', 'but', 0, 0], [3, 'ˈʌ', '', 0, 82], [82, 't', '', 0, 83], [83, '', 'but', 0, 0], [3, 'ˈɑɪ', '', 0, 84], [84, '', 'by', 0, 0], [0, 'k', '', 0, 85], [85, 'ˈɑ', '', 0, 86], [86, 'm', '', 0, 87], [87, 'j', '', 0, 88], [88, 'ə', '', 0, 89], [89, 'n', '', 0, 90], [90, 'ɪ', '', 0, 91], [91, 's', '', 0, 92], [92, 't', '', 0, 93], [93, '', 'communist', 0, 0], [90, 'ə', '', 0, 94], [94, 's', '', 0, 95], [95, 't', '', 0, 96], [96, '', 'communist', 0, 0], [85, 'ɹ', '', 0, 97], [97, 'ˈɑɪ', '', 0, 98], [98, 'd', '', 0, 99], [99, '', 'cried', 0, 0], [0, 'z', '', 0, 100], [100, 'ˈɑ', '', 0, 101], [101, 'ɹ', '', 0, 102], [102, '', 'czar', 0, 0], [0, 'd', '', 0, 103], [103, 'ˈɑ', '', 0, 104], [104, 'ɹ', '', 0, 105], [105, 'k', '', 0, 106], [106, '', 'dark', 0, 0], [106, 'ə', '', 0, 107], [107, 's', '', 0, 108], [108, 't', '', 0, 109], [109, '', 'darkest', 0, 0], [103, 'ˈi', '', 0, 110], [110, 'p', '', 0, 111], [111, '', 'deep', 0, 0], [103, 'ˈɑɪ', '', 0, 112], [112, 'd', '', 0, 113], [113, '', 'died', 0, 0], [103, 'ˈaʊ', '', 0, 114], [114, 'n', '', 0, 115], [115, '', 'down', 0, 0], [115, 'i', '', 0, 116], [116, '', 'downy', 0, 0], [103, 'ɹ', '', 0, 117], [117, 'ə', '', 0, 118], [118, 'm', '', 0, 119], [119, 'ˈæ', '', 0, 120], [120, 'ɾ', '', 0, 121], [121, 'ɪ', '', 0, 122], [122, 'k', '', 0, 123], [123, '', 'dramatic', 0, 0], [0, 'ˈi', '', 0, 124], [124, 'z', '', 0, 125], [125, 'i', '', 0, 126], [126, '', 'easy', 0, 0], [0, 'ˈei', '', 0, 127], [127, 't', '', 0, 128], [128, 'ˈi', '', 0, 129], [129, 'n', '', 0, 130], [130, '', 
'eighteen', 0, 0], [45, 'm', '', 0, 131], [131, 'p', '', 0, 132], [132, 'ɚ', '', 0, 133], [133, 'ɚ', '', 0, 134], [134, '', 'emperor', 0, 0], [132, 'ə', '', 0, 135], [135, 'ɹ', '', 0, 136], [136, 'ə', '', 0, 137], [137, 'ɹ', '', 0, 138], [138, '', 'emperor', 0, 0], [45, 'n', '', 0, 139], [139, 'd', '', 0, 140], [140, '', 'end', 0, 0], [0, 'ɛ', '', 0, 141], [141, 'n', '', 0, 142], [142, 't', '', 0, 143], [143, 'ˈɑɪ', '', 0, 144], [144, 'ɚ', '', 0, 145], [145, '', 'entire', 0, 0], [0, 'ɪ', '', 0, 146], [146, 'n', '', 0, 147], [147, 't', '', 0, 148], [148, 'ˈɑɪ', '', 0, 149], [149, 'ɚ', '', 0, 150], [150, '', 'entire', 0, 0], [0, 'ˌɛ', '', 0, 151], [151, 'n', '', 0, 152], [152, 't', '', 0, 153], [153, 'ˌɑɪ', '', 0, 154], [154, 'ɹ', '', 0, 155], [155, '', 'entire', 0, 0], [124, 'v', '', 0, 156], [156, 'n', '', 0, 157], [157, 'ɪ', '', 0, 158], [158, 'ŋ', '', 0, 159], [159, '', 'evening', 0, 0], [0, 'f', '', 0, 160], [160, 'ˈæ', '', 0, 161], [161, 'm', '', 0, 162], [162, 'ə', '', 0, 163], [163, 'l', '', 0, 164], [164, 'i', '', 0, 165], [165, '', 'family', 0, 0], [162, 'l', '', 0, 166], [166, 'i', '', 0, 167], [167, '', 'family', 0, 0], [160, 'ˈɑ', '', 0, 168], [168, 'ɹ', '', 0, 169], [169, 'm', '', 0, 170], [170, '', 'farm', 0, 0], [168, 'ð', '', 0, 171], [171, 'ɚ', '', 0, 172], [172, '', 'father', 0, 0], [171, 'ə', '', 0, 173], [173, 'ɹ', '', 0, 174], [174, '', 'father', 0, 0], [160, 'ˈɪ', '', 0, 175], [175, 'l', '', 0, 176], [176, '', 'fill', 0, 0], [160, 'l', '', 0, 177], [177, 'ˈei', '', 0, 178], [178, 'k', '', 0, 179], [179, '', 'flake', 0, 0], [160, 'oʊ', '', 0, 180], [180, 'ɹ', '', 0, 181], [181, '', 'four', 0, 0], [160, 'ˈɔ', '', 0, 182], [182, 'ɹ', '', 0, 183], [183, '', 'four', 0, 0], [160, 'ɹ', '', 0, 184], [184, 'ˈoʊ', '', 0, 185], [185, 'z', '', 0, 186], [186, 'n̩', '', 0, 187], [187, '', 'frozen', 0, 0], [0, 'g', '', 0, 188], [188, 'ˈɪ', '', 0, 189], [189, 'v', '', 0, 190], [190, 'z', '', 0, 191], [191, '', 'gives', 0, 0], [188, 'ˈoʊ', '', 0, 192], [192, 
'', 'go', 0, 0], [192, 'ɪ', '', 0, 193], [193, 'n', '', 0, 194], [194, '', 'going', 0, 0], [193, 'ŋ', '', 0, 195], [195, '', 'going', 0, 0], [0, 'h', '', 0, 196], [196, 'ˈæ', '', 0, 197], [197, 'd', '', 0, 198], [198, '', 'had', 0, 0], [197, 'p', '', 0, 199], [199, 'n̩', '', 0, 200], [200, '', 'happen', 0, 0], [196, 'ˈɑ', '', 0, 201], [201, 'ɹ', '', 0, 202], [202, 'n', '', 0, 203], [203, 'ɪ', '', 0, 204], [204, 's', '', 0, 205], [205, '', 'harness', 0, 0], [197, 'v', '', 0, 206], [206, '', 'have', 0, 0], [196, 'ˈi', '', 0, 207], [207, '', 'he', 0, 0], [196, 'ˈʌ', '', 0, 208], [208, '', 'he', 0, 0], [196, 'ˈɛ', '', 0, 209], [209, 'l', '', 0, 210], [210, 'p', '', 0, 211], [211, '', 'help', 0, 0], [196, 'ɚ', '', 0, 212], [212, 'ɹ', '', 0, 213], [213, '', 'her', 0, 0], [196, 'ˈɝ', '', 0, 214], [214, '', 'her', 0, 0], [196, 'ˈɪ', '', 0, 215], [215, 'ɹ', '', 0, 216], [216, '', 'here', 0, 0], [196, 'ɪ', '', 0, 217], [217, 'm', '', 0, 218], [218, '', 'him', 0, 0], [215, 'm', '', 0, 219], [219, '', 'him', 0, 0], [146, 'm', '', 0, 220], [220, '', 'him', 0, 0], [217, 'z', '', 0, 221], [221, '', 'his', 0, 0], [196, 'ˈɔ', '', 0, 222], [222, 'ɹ', '', 0, 223], [223, 's', '', 0, 224], [224, '', 'horse', 0, 0], [196, 'ˈaʊ', '', 0, 225], [225, 's', '', 0, 226], [226, '', 'house', 0, 0], [225, 'z', '', 0, 227], [227, '', 'house', 0, 0], [226, 'h', '', 0, 228], [228, 'ˌoʊ', '', 0, 229], [229, 'l', '', 0, 230], [230, 'd', '', 0, 231], [231, '', 'household', 0, 0], [0, 'ˈɑɪ', '', 0, 232], [232, '', 'i', 0, 0], [146, 'f', '', 0, 233], [233, '', 'if', 0, 0], [0, 'ˈɪ', '', 0, 234], [234, 'n', '', 0, 235], [235, '', 'in', 0, 0], [147, 'k', '', 0, 236], [236, 'l', '', 0, 237], [237, 'u', '', 0, 238], [238, 'd', '', 0, 239], [239, 'ɪ', '', 0, 240], [240, 'ŋ', '', 0, 241], [241, '', 'including', 0, 0], [237, 'ˈu', '', 0, 242], [242, 'd', '', 0, 243], [243, 'ɪ', '', 0, 244], [244, 'ŋ', '', 0, 245], [245, '', 'including', 0, 0], [147, 's', '', 0, 246], [246, 't', '', 0, 247], [247, 'ɹ', '', 0, 
248], [248, 'ˈʌ', '', 0, 249], [249, 'k', '', 0, 250], [250, 't', '', 0, 251], [251, 'ɪ', '', 0, 252], [252, 'v', '', 0, 253], [253, '', 'instructive', 0, 0], [0, 'ˌɪ', '', 0, 254], [254, 'n', '', 0, 255], [255, 't', '', 0, 256], [256, 'ɹ', '', 0, 257], [257, 'oʊ', '', 0, 258], [258, 'd', '', 0, 259], [259, 'ˈu', '', 0, 260], [260, 's', '', 0, 261], [261, 't', '', 0, 262], [262, '', 'introduced', 0, 0], [257, 'ə', '', 0, 263], [263, 'd', '', 0, 264], [264, 'ˈu', '', 0, 265], [265, 's', '', 0, 266], [266, 't', '', 0, 267], [267, '', 'introduced', 0, 0], [146, 'z', '', 0, 268], [268, '', 'is', 0, 0], [146, 't', '', 0, 269], [269, '', 'it', 0, 0], [269, 's', '', 0, 270], [270, '', 'its', 0, 0], [0, 'dʒ', '', 0, 271], [271, 'u', '', 0, 272], [272, 'l', '', 0, 273], [273, 'ˈɑɪ', '', 0, 274], [274, '', 'july', 0, 0], [271, 'ə', '', 0, 275], [275, 'l', '', 0, 276], [276, 'ˈɑɪ', '', 0, 277], [277, '', 'july', 0, 0], [271, 'ˌu', '', 0, 278], [278, 'l', '', 0, 279], [279, 'ˈɑɪ', '', 0, 280], [280, '', 'july', 0, 0], [85, 'ˈi', '', 0, 281], [281, 'p', '', 0, 282], [282, '', 'keep', 0, 0], [0, 'n', '', 0, 283], [283, 'ˈoʊ', '', 0, 284], [284, '', 'know', 0, 0], [0, 'l', '', 0, 285], [285, 'ˈei', '', 0, 286], [286, 'k', '', 0, 287], [287, '', 'lake', 0, 0], [285, 'ˈɝ', '', 0, 288], [288, 'n', '', 0, 289], [289, 'ɪ', '', 0, 290], [290, 'ŋ', '', 0, 291], [291, '', 'learning', 0, 0], [288, 'ɹ', '', 0, 292], [292, 'n', '', 0, 293], [293, 'ɪ', '', 0, 294], [294, 'ŋ', '', 0, 295], [295, '', 'learning', 0, 0], [285, 'ˈɪ', '', 0, 296], [296, 't', '', 0, 297], [297, 'l̩', '', 0, 298], [298, '', 'little', 0, 0], [296, 'ɾ', '', 0, 299], [299, 'l̩', '', 0, 300], [300, '', 'little', 0, 0], [285, 'ˈʌ', '', 0, 301], [301, 'v', '', 0, 302], [302, 'l', '', 0, 303], [303, 'i', '', 0, 304], [304, '', 'lovely', 0, 0], [0, 'm', '', 0, 305], [305, 'ˈæ', '', 0, 306], [306, 's', '', 0, 307], [307, 'ə', '', 0, 308], [308, 'k', '', 0, 309], [309, 'ɚ', '', 0, 310], [310, '', 'massacre', 0, 0], [309, 'ə', 
'', 0, 311], [311, 'ɹ', '', 0, 312], [312, '', 'massacre', 0, 0], [305, 'ˈi', '', 0, 313], [313, '', 'me', 0, 0], [305, 'ˈɑɪ', '', 0, 314], [314, 'l̩', '', 0, 315], [315, 'z', '', 0, 316], [316, '', 'miles', 0, 0], [314, 'l', '', 0, 317], [317, 'z', '', 0, 318], [318, '', 'miles', 0, 0], [305, 'ɪ', '', 0, 319], [319, 's', '', 0, 320], [320, 't', '', 0, 321], [321, 'ˈei', '', 0, 322], [322, 'k', '', 0, 323], [323, '', 'mistake', 0, 0], [305, 'oʊ', '', 0, 324], [324, 'ɹ', '', 0, 325], [325, '', 'more', 0, 0], [305, 'ˈɔ', '', 0, 326], [326, 'ɹ', '', 0, 327], [327, '', 'more', 0, 0], [305, 'ˈu', '', 0, 328], [328, 'v', '', 0, 329], [329, 'm', '', 0, 330], [330, 'n̩', '', 0, 331], [331, 't', '', 0, 332], [332, '', 'movement', 0, 0], [305, 'ə', '', 0, 333], [333, 's', '', 0, 334], [334, 't', '', 0, 335], [335, '', 'must', 0, 0], [305, 'ˈʌ', '', 0, 336], [336, 's', '', 0, 337], [337, 't', '', 0, 338], [338, '', 'must', 0, 0], [313, '', 'my', 0, 0], [314, '', 'my', 0, 0], [283, 'ˌi', '', 0, 339], [339, 'ɹ', '', 0, 340], [340, '', 'near', 0, 0], [283, 'ˈɪ', '', 0, 341], [341, 'ɹ', '', 0, 342], [342, '', 'near', 0, 0], [283, 'ˈɛ', '', 0, 343], [343, 'v', '', 0, 344], [344, 'ɚ', '', 0, 345], [345, '', 'never', 0, 0], [344, 'ə', '', 0, 346], [346, 'ɹ', '', 0, 347], [347, '', 'never', 0, 0], [341, 'k', '', 0, 348], [348, 'ə', '', 0, 349], [349, 'l', '', 0, 350], [350, 'ə', '', 0, 351], [351, 's', '', 0, 352], [352, '', 'nicholas', 0, 0], [348, 'l', '', 0, 353], [353, 'ə', '', 0, 354], [354, 's', '', 0, 355], [355, '', 'nicholas', 0, 0], [283, 'ˈɑɪ', '', 0, 356], [356, 'n', '', 0, 357], [357, 't', '', 0, 358], [358, 'ˈi', '', 0, 359], [359, 'n', '', 0, 360], [360, '', 'nineteen', 0, 0], [358, 'i', '', 0, 361], [361, '', 'ninety', 0, 0], [283, 'ˈɑ', '', 0, 362], [362, 't', '', 0, 363], [363, '', 'not', 0, 0], [283, 'oʊ', '', 0, 364], [364, 'v', '', 0, 365], [365, 'ˈɛ', '', 0, 366], [366, 'm', '', 0, 367], [367, 'b', '', 0, 368], [368, 'ɚ', '', 0, 369], [369, '', 'november', 0, 
0], [368, 'ə', '', 0, 370], [370, 'ɹ', '', 0, 371], [371, '', 'november', 0, 0], [283, 'ˈaʊ', '', 0, 372], [372, '', 'now', 0, 0], [1, 'v', '', 0, 373], [373, '', 'of', 0, 0], [0, 'ˈoʊ', '', 0, 374], [374, 'l', '', 0, 375], [375, 'd', '', 0, 376], [376, '', 'old', 0, 0], [42, 'n', '', 0, 377], [377, '', 'on', 0, 0], [35, 'n', '', 0, 378], [378, '', 'on', 0, 0], [374, 'n', '', 0, 379], [379, 'l', '', 0, 380], [380, 'i', '', 0, 381], [381, '', 'only', 0, 0], [0, 'ˈʌ', '', 0, 382], [382, 'ð', '', 0, 383], [383, 'ɚ', '', 0, 384], [384, '', 'other', 0, 0], [383, 'ə', '', 0, 385], [385, 'ɹ', '', 0, 386], [386, '', 'other', 0, 0], [0, 'ˈaʊ', '', 0, 387], [387, 'ɚ', '', 0, 388], [388, '', 'our', 0, 0], [387, 'ɹ', '', 0, 389], [389, '', 'our', 0, 0], [43, '', 'our', 0, 0], [0, 'p', '', 0, 390], [390, 'ˈɝ', '', 0, 391], [391, 's', '', 0, 392], [392, 'ɪ', '', 0, 393], [393, 'n', '', 0, 394], [394, 'ɪ', '', 0, 395], [395, 'l', '', 0, 396], [396, '', 'personal', 0, 0], [391, 'ɹ', '', 0, 397], [397, 's', '', 0, 398], [398, 'ə', '', 0, 399], [399, 'n', '', 0, 400], [400, 'l̩', '', 0, 401], [401, '', 'personal', 0, 0], [160, 'ə', '', 0, 402], [402, 'z', '', 0, 403], [403, 'ˈɪ', '', 0, 404], [404, 'ʃ', '', 0, 405], [405, 'n̩', '', 0, 406], [406, '', 'physician', 0, 0], [160, 'ɪ', '', 0, 407], [407, 'z', '', 0, 408], [408, 'ˈɪ', '', 0, 409], [409, 'ʃ', '', 0, 410], [410, 'n̩', '', 0, 411], [411, '', 'physician', 0, 0], [390, 'ɹ', '', 0, 412], [412, 'ˈɑ', '', 0, 413], [413, 'm', '', 0, 414], [414, 'ə', '', 0, 415], [415, 's', '', 0, 416], [416, 'ə', '', 0, 417], [417, 'z', '', 0, 418], [418, '', 'promises', 0, 0], [85, 'w', '', 0, 419], [419, 'ˈɪ', '', 0, 420], [420, 'ɹ', '', 0, 421], [421, '', 'queer', 0, 0], [0, 'ɹ', '', 0, 422], [422, 'ɑ', '', 0, 423], [423, 'n', '', 0, 424], [424, '', 'ran', 0, 0], [422, 'ˈæ', '', 0, 425], [425, 'n', '', 0, 426], [426, '', 'ran', 0, 0], [422, 'i', '', 0, 427], [427, 'ˈæ', '', 0, 428], [428, 'k', '', 0, 429], [429, 'ʃ', '', 0, 430], [430, 'n̩', 
'', 0, 431], [431, '', 'reaction', 0, 0], [422, 'ˈoʊ', '', 0, 432], [432, 'm', '', 0, 433], [433, 'ə', '', 0, 434], [434, 'n', '', 0, 435], [435, 'ˌɔ', '', 0, 436], [436, 'f', '', 0, 437], [437, '', 'romanov', 0, 0], [436, 'v', '', 0, 438], [438, '', 'romanov', 0, 0], [422, 'ˈu', '', 0, 439], [439, 'l', '', 0, 440], [440, '', 'rule', 0, 0], [422, 'ˈʌ', '', 0, 441], [441, 'ʃ', '', 0, 442], [442, 'ə', '', 0, 443], [443, '', 'russia', 0, 0], [0, 's', '', 0, 444], [444, 'ˈɛ', '', 0, 445], [445, 'k', '', 0, 446], [446, 'n̩', '', 0, 447], [447, '', 'second', 0, 0], [447, 'd', '', 0, 448], [448, '', 'second', 0, 0], [444, 'ˈi', '', 0, 449], [449, '', 'see', 0, 0], [0, 'ʃ', '', 0, 450], [450, 'ˈei', '', 0, 451], [451, 'k', '', 0, 452], [452, '', 'shake', 0, 0], [450, 'ˈoʊ', '', 0, 453], [453, 'l', '', 0, 454], [454, 'd', '', 0, 455], [455, 'ɚ', '', 0, 456], [456, '', 'shoulder', 0, 0], [455, 'ə', '', 0, 457], [457, 'ɹ', '', 0, 458], [458, '', 'shoulder', 0, 0], [444, 'ˈɑɪ', '', 0, 459], [459, 'm', '', 0, 460], [460, 'n̩', '', 0, 461], [461, '', 'simon', 0, 0], [444, 'ˈɪ', '', 0, 462], [462, 's', '', 0, 463], [463, 't', '', 0, 464], [464, 'ɚ', '', 0, 465], [465, '', 'sister', 0, 0], [464, 'ə', '', 0, 466], [466, 'ɹ', '', 0, 467], [467, '', 'sister', 0, 0], [444, 'ɪ', '', 0, 468], [468, 'k', '', 0, 469], [469, 's', '', 0, 470], [470, '', 'six', 0, 0], [462, 'k', '', 0, 471], [471, 's', '', 0, 472], [472, '', 'six', 0, 0], [444, 'l', '', 0, 473], [473, 'ˈi', '', 0, 474], [474, 'p', '', 0, 475], [475, '', 'sleep', 0, 0], [444, 'n', '', 0, 476], [476, 'ˈoʊ', '', 0, 477], [477, '', 'snow', 0, 0], [444, 'ˈʌ', '', 0, 478], [478, 'm', '', 0, 479], [479, '', 'some', 0, 0], [444, 'ˈaʊ', '', 0, 480], [480, 'n', '', 0, 481], [481, 'd', '', 0, 482], [482, 'z', '', 0, 483], [483, '', 'sounds', 0, 0], [481, 'z', '', 0, 484], [484, '', 'sounds', 0, 0], [444, 't', '', 0, 485], [485, 'ɑ', '', 0, 486], [486, 't', '', 0, 487], [487, '', 'start', 0, 0], [485, 'ˈɑ', '', 0, 488], [488, 'ɹ', '', 
0, 489], [489, 't', '', 0, 490], [490, '', 'start', 0, 0], [488, 'p', '', 0, 491], [491, '', 'stop', 0, 0], [491, 'ɪ', '', 0, 492], [492, 'ŋ', '', 0, 493], [493, '', 'stopping', 0, 0], [485, 'ˈɔ', '', 0, 494], [494, 'ɹ', '', 0, 495], [495, 'i', '', 0, 496], [496, 'z', '', 0, 497], [497, '', 'stories', 0, 0], [444, 'w', '', 0, 498], [498, 'ˈi', '', 0, 499], [499, 'p', '', 0, 500], [500, '', 'sweep', 0, 0], [0, 't', '', 0, 501], [501, 'ˈɛ', '', 0, 502], [502, 'ɹ', '', 0, 503], [503, 'z', '', 0, 504], [504, '', 'tears', 0, 0], [501, 'ˈɪ', '', 0, 505], [505, 'ɹ', '', 0, 506], [506, 'z', '', 0, 507], [507, '', 'tears', 0, 0], [502, 'l', '', 0, 508], [508, 'ɪ', '', 0, 509], [509, 'ŋ', '', 0, 510], [510, '', 'telling', 0, 0], [501, 'ɛ', '', 0, 511], [511, 'm', '', 0, 512], [512, 't', '', 0, 513], [513, 'ˈei', '', 0, 514], [514, 'ʃ', '', 0, 515], [515, 'n̩', '', 0, 516], [516, '', 'temptation', 0, 0], [512, 'p', '', 0, 517], [517, 't', '', 0, 518], [518, 'ˈei', '', 0, 519], [519, 'ʃ', '', 0, 520], [520, 'n̩', '', 0, 521], [521, '', 'temptation', 0, 0], [0, 'ð', '', 0, 522], [522, 'ə', '', 0, 523], [523, 't', '', 0, 524], [524, '', 'that', 0, 0], [522, 'ˈæ', '', 0, 525], [525, 't', '', 0, 526], [526, '', 'that', 0, 0], [523, '', 'the', 0, 0], [522, 'ˈi', '', 0, 527], [527, '', 'the', 0, 0], [522, 'ˈʌ', '', 0, 528], [528, '', 'the', 0, 0], [522, 'ˈɛ', '', 0, 529], [529, 'ɹ', '', 0, 530], [530, '', 'there', 0, 0], [527, 'z', '', 0, 531], [531, '', 'these', 0, 0], [0, 'ɵ', '', 0, 532], [532, 'ˈɪ', '', 0, 533], [533, 'ŋ', '', 0, 534], [534, 'k', '', 0, 535], [535, '', 'think', 0, 0], [522, 'ˈoʊ', '', 0, 536], [536, '', 'though', 0, 0], [501, 'ˈoʊ', '', 0, 537], [537, '', 'to', 0, 0], [501, 'ˈu', '', 0, 538], [538, '', 'to', 0, 0], [501, 'ˈʌ', '', 0, 539], [539, '', 'to', 0, 0], [537, 'l', '', 0, 540], [540, 'd', '', 0, 541], [541, '', 'told', 0, 0], [501, 'ɹ', '', 0, 542], [542, 'ɑɪ', '', 0, 543], [543, 'ˈʌ', '', 0, 544], [544, 'm', '', 0, 545], [545, 'f', '', 0, 546], [546, 
'n̩', '', 0, 547], [547, 't', '', 0, 548], [548, '', 'triumphant', 0, 0], [501, 'w', '', 0, 549], [549, 'ˈɛ', '', 0, 550], [550, 'n', '', 0, 551], [551, 'i', '', 0, 552], [552, '', 'twenty', 0, 0], [551, 't', '', 0, 553], [553, 'i', '', 0, 554], [554, '', 'twenty', 0, 0], [382, 'p', '', 0, 555], [555, '', 'up', 0, 0], [0, 'v', '', 0, 556], [556, 'ˈɪ', '', 0, 557], [557, 'l', '', 0, 558], [558, 'ɪ', '', 0, 559], [559, 'dʒ', '', 0, 560], [560, '', 'village', 0, 0], [0, 'w', '', 0, 561], [561, 'ˈɔ', '', 0, 562], [562, 'n', '', 0, 563], [563, 'ɪ', '', 0, 564], [564, 'd', '', 0, 565], [565, '', 'wanted', 0, 0], [561, 'ˈɑ', '', 0, 566], [566, 'n', '', 0, 567], [567, 't', '', 0, 568], [568, 'ə', '', 0, 569], [569, 'd', '', 0, 570], [570, '', 'wanted', 0, 0], [563, 't', '', 0, 571], [571, 'ɪ', '', 0, 572], [572, 'd', '', 0, 573], [573, '', 'wanted', 0, 0], [561, 'ə', '', 0, 574], [574, 'z', '', 0, 575], [575, '', 'was', 0, 0], [566, 'z', '', 0, 576], [576, '', 'was', 0, 0], [562, 'z', '', 0, 577], [577, '', 'was', 0, 0], [566, 'tʃ', '', 0, 578], [578, '', 'watch', 0, 0], [196, 'w', '', 0, 579], [579, 'ˈʌ', '', 0, 580], [580, 't', '', 0, 581], [581, '', 'what', 0, 0], [561, 'ˈʌ', '', 0, 582], [582, 't', '', 0, 583], [583, '', 'what', 0, 0], [561, 'ˌɑ', '', 0, 584], [584, 't', '', 0, 585], [585, '', 'what', 0, 0], [49, '', 'what', 0, 0], [579, 'ˈɛ', '', 0, 586], [586, 'n', '', 0, 587], [587, '', 'when', 0, 0], [579, 'ˈɪ', '', 0, 588], [588, 'n', '', 0, 589], [589, '', 'when', 0, 0], [561, 'ˈɛ', '', 0, 590], [590, 'n', '', 0, 591], [591, '', 'when', 0, 0], [561, 'ˈɪ', '', 0, 592], [592, 'n', '', 0, 593], [593, '', 'when', 0, 0], [142, '', 'when', 0, 0], [588, 'tʃ', '', 0, 594], [594, '', 'which', 0, 0], [592, 'tʃ', '', 0, 595], [595, '', 'which', 0, 0], [146, 'tʃ', '', 0, 596], [596, '', 'which', 0, 0], [196, 'ˈu', '', 0, 597], [597, 'z', '', 0, 598], [598, '', 'whose', 0, 0], [561, 'ɪ', '', 0, 599], [599, 'l', '', 0, 600], [600, '', 'will', 0, 0], [593, 'd', '', 0, 601], 
[601, '', 'wind', 0, 0], [561, 'ˈɑɪ', '', 0, 602], [602, 'n', '', 0, 603], [603, 'd', '', 0, 604], [604, '', 'wind', 0, 0], [599, 'ð', '', 0, 605], [605, '', 'with', 0, 0], [599, 'ɵ', '', 0, 606], [606, '', 'with', 0, 0], [605, 'ˈaʊ', '', 0, 607], [607, 't', '', 0, 608], [608, '', 'without', 0, 0], [561, 'ˈʊ', '', 0, 609], [609, 'd', '', 0, 610], [610, 'z', '', 0, 611], [611, '', 'woods', 0, 0], [0, 'j', '', 0, 612], [612, 'ˌi', '', 0, 613], [613, 'ɹ', '', 0, 614], [614, '', 'year', 0, 0], [612, 'ˈɪ', '', 0, 615], [615, 'ɹ', '', 0, 616], [616, '', 'year', 0, 0], [305, 'ˌɔ', '', 0, 617], [617, 'n', '', 0, 618], [618, 't', '', 0, 619], [619, 'ə', '', 0, 620], [620, 'n', '', 0, 621], [621, 'ˈoʊ', '', 0, 622], [622, 'ɹ', '', 0, 623], [623, 'i', '', 0, 624], [624, '', 'montenori', 0, 0], [350, 'ɑɪ', '', 0, 625], [625, '', 'nicholai', 0, 0], [437, 's', '', 0, 626], [626, '', 'romanovs', 0, 0], [445, 'b', '', 0, 627], [627, 'æ', '', 0, 628], [628, 'g', '', 0, 629], [629, '', 'sebag', 0, 0]]

Finally, let's create the language model. We will use a unigram language model, meaning that the probability of every word is independent of the words around it. For example, the word sequence $\vec{o}=[o_1,o_2,o_3]$ is modeled as

$$p(\vec{o}) = \prod_{k=1}^3 p(o_k)$$

This can be modeled as a WFST with just one state, state 0. Each transition, $t$, goes from $p[t]=0$ to $n[t]=0$. The input label and output label are the same: they both equal the word. The weight on each edge is the negative log probability of the word. For example, the first few edges are

  • (0,"a","a",$-\ln p(\text{a})$,0)
  • (0,"about","about",$-\ln p(\text{about})$,0)
  • (0,"aleksandrovich","aleksandrovich",$-\ln p(\text{aleksandrovich})$,0)
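To see why a single state is enough, here is a small sketch with made-up toy probabilities (the helpers unigram_fst and path_weight are hypothetical, not part of mp5.py). Every edge is a self-loop on state 0, so each word sequence corresponds to exactly one path, and the total weight of that path, the sum of its edge weights, equals $-\ln p(\vec{o})$.

```python
import math

def unigram_fst(probs):
    # One self-loop on state 0 per word, in the format [prev, input, output, weight, next]
    G = [[0, w, w, -math.log(p), 0] for w, p in probs.items()]
    Gfinal = [0]  # the single state is also the final state
    return G, Gfinal

def path_weight(G, words):
    # Each word selects exactly one self-loop, so the path weight is a sum of edge weights
    weight = {e[1]: e[3] for e in G}
    return sum(weight[w] for w in words)

# Toy probabilities, made up for illustration (not estimated from languagemodeltexts.txt)
probs = {'a': 0.5, 'dog': 0.3, 'ran': 0.2}
G_toy, Gfinal_toy = unigram_fst(probs)
o = ['a', 'dog', 'ran']
print(path_weight(G_toy, o))        # sum of -ln p(o_k)
print(-math.log(0.5 * 0.3 * 0.2))   # -ln of the product: equal up to floating-point rounding
```

Working with summed negative log probabilities instead of multiplied probabilities also avoids numerical underflow on long word sequences.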

The probabilities should be estimated from the file 'data/languagemodeltexts.txt' by counting the number of occurrences of each word and applying Laplace smoothing with a smoothing factor of $1$. So if $C(w)$ is the number of times word $w$ occurs in languagemodeltexts.txt, then

$$p(w) = \frac{1+C(w)}{\sum_{v\in V} (1+C(v))}$$

where $V$ is the set of all distinct words (all words with different orthographic representations) in the lexicon.

So that the autograder can grade your code, make sure that the edges of $G$ appear in the same order in which the distinct words first appear in the lexicon.
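A minimal sketch of the smoothed estimate, producing edges in the same format as the printout below. It assumes the language-model text has already been split into a list of lowercase word tokens (how to handle case and punctuation is not specified here) and that the lexicon's words are the non-epsilon output labels of L, e.g. `[e[2] for e in L if e[2] != '']`. The function name unigram_weights is hypothetical. Using dict.fromkeys keeps the distinct words in first-appearance order, which gives the edge ordering requested above.

```python
import math
from collections import Counter

def unigram_weights(text_words, lexicon_words):
    """Add-one (Laplace) smoothed unigram WFST edges [0, w, w, -ln p(w), 0]."""
    # Distinct lexicon words, in the order they first appear (dicts preserve insertion order)
    vocab = list(dict.fromkeys(lexicon_words))
    counts = Counter(text_words)  # a missing word counts as 0
    # Denominator: sum over V of (1 + C(v)); text words outside V are ignored
    total = sum(1 + counts[v] for v in vocab)
    return [[0, w, w, -math.log((1 + counts[w]) / total), 0] for w in vocab]

G_sketch = unigram_weights(['a', 'a', 'dog', 'cat'], ['a', 'dog', 'a', 'ran'])
for edge in G_sketch:
    print(edge)
```

In this toy call, 'cat' is not in the lexicon, so it contributes nothing, and 'ran' never occurs in the text but still gets a nonzero probability, $1/6$, thanks to the smoothing.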

In [92]:
importlib.reload(mp5)
#G, Gfinal = mp5.todo_unigram('data/languagemodeltexts.txt',L)
G = solutions['G']
Gfinal = solutions['Gfinal']
print(Gfinal)
print(G)
[0]
[[0, 'a', 'a', 4.923623917106626, 0], [0, 'about', 'about', 4.923623917106626, 0], [0, 'aleksandrovich', 'aleksandrovich', 4.923623917106626, 0], [0, 'alexander', 'alexander', 4.923623917106626, 0], [0, 'all', 'all', 4.923623917106626, 0], [0, 'and', 'and', 4.007333185232471, 0], [0, 'are', 'are', 4.923623917106626, 0], [0, 'as', 'as', 4.923623917106626, 0], [0, 'ask', 'ask', 5.616771097666572, 0], [0, 'at', 'at', 4.923623917106626, 0], [0, 'be', 'be', 4.923623917106626, 0], [0, 'before', 'before', 5.616771097666572, 0], [0, 'bells', 'bells', 5.616771097666572, 0], [0, 'between', 'between', 5.616771097666572, 0], [0, 'book', 'book', 4.923623917106626, 0], [0, 'broke', 'broke', 4.923623917106626, 0], [0, 'but', 'but', 4.923623917106626, 0], [0, 'by', 'by', 4.518158808998462, 0], [0, 'communist', 'communist', 4.923623917106626, 0], [0, 'cried', 'cried', 4.923623917106626, 0], [0, 'czar', 'czar', 4.923623917106626, 0], [0, 'dark', 'dark', 5.616771097666572, 0], [0, 'darkest', 'darkest', 5.616771097666572, 0], [0, 'deep', 'deep', 5.616771097666572, 0], [0, 'died', 'died', 4.923623917106626, 0], [0, 'down', 'down', 4.923623917106626, 0], [0, 'downy', 'downy', 5.616771097666572, 0], [0, 'dramatic', 'dramatic', 4.923623917106626, 0], [0, 'easy', 'easy', 5.616771097666572, 0], [0, 'eighteen', 'eighteen', 4.518158808998462, 0], [0, 'emperor', 'emperor', 4.923623917106626, 0], [0, 'end', 'end', 4.923623917106626, 0], [0, 'entire', 'entire', 4.923623917106626, 0], [0, 'evening', 'evening', 5.616771097666572, 0], [0, 'family', 'family', 4.518158808998462, 0], [0, 'farm', 'farm', 5.616771097666572, 0], [0, 'father', 'father', 4.923623917106626, 0], [0, 'fill', 'fill', 5.616771097666572, 0], [0, 'flake', 'flake', 5.616771097666572, 0], [0, 'four', 'four', 4.923623917106626, 0], [0, 'frozen', 'frozen', 5.616771097666572, 0], [0, 'gives', 'gives', 5.616771097666572, 0], [0, 'go', 'go', 5.616771097666572, 0], [0, 'going', 'going', 4.923623917106626, 0], [0, 'had', 'had', 
4.923623917106626, 0], [0, 'happen', 'happen', 4.923623917106626, 0], [0, 'harness', 'harness', 5.616771097666572, 0], [0, 'have', 'have', 5.616771097666572, 0], [0, 'he', 'he', 4.518158808998462, 0], [0, 'help', 'help', 4.923623917106626, 0], [0, 'her', 'her', 4.923623917106626, 0], [0, 'here', 'here', 5.616771097666572, 0], [0, 'him', 'him', 4.923623917106626, 0], [0, 'his', 'his', 3.670860948611258, 0], [0, 'horse', 'horse', 5.616771097666572, 0], [0, 'house', 'house', 5.616771097666572, 0], [0, 'household', 'household', 4.923623917106626, 0], [0, 'i', 'i', 4.923623917106626, 0], [0, 'if', 'if', 5.616771097666572, 0], [0, 'in', 'in', 4.230476736546681, 0], [0, 'including', 'including', 4.923623917106626, 0], [0, 'instructive', 'instructive', 4.923623917106626, 0], [0, 'introduced', 'introduced', 4.923623917106626, 0], [0, 'is', 'is', 4.518158808998462, 0], [0, 'it', 'it', 5.616771097666572, 0], [0, 'its', 'its', 4.518158808998462, 0], [0, 'july', 'july', 4.923623917106626, 0], [0, 'keep', 'keep', 5.616771097666572, 0], [0, 'know', 'know', 5.616771097666572, 0], [0, 'lake', 'lake', 5.616771097666572, 0], [0, 'learning', 'learning', 4.923623917106626, 0], [0, 'little', 'little', 5.616771097666572, 0], [0, 'lovely', 'lovely', 5.616771097666572, 0], [0, 'massacre', 'massacre', 4.923623917106626, 0], [0, 'me', 'me', 4.923623917106626, 0], [0, 'miles', 'miles', 5.616771097666572, 0], [0, 'mistake', 'mistake', 5.616771097666572, 0], [0, 'more', 'more', 4.923623917106626, 0], [0, 'movement', 'movement', 4.923623917106626, 0], [0, 'must', 'must', 5.616771097666572, 0], [0, 'my', 'my', 4.923623917106626, 0], [0, 'near', 'near', 5.616771097666572, 0], [0, 'never', 'never', 4.923623917106626, 0], [0, 'nicholas', 'nicholas', 4.230476736546681, 0], [0, 'nineteen', 'nineteen', 4.923623917106626, 0], [0, 'ninety', 'ninety', 4.923623917106626, 0], [0, 'not', 'not', 5.616771097666572, 0], [0, 'november', 'november', 4.923623917106626, 0], [0, 'now', 'now', 4.923623917106626, 0], 
[0, 'of', 'of', 4.230476736546681, 0], [0, 'old', 'old', 4.923623917106626, 0], [0, 'on', 'on', 4.923623917106626, 0], [0, 'only', 'only', 5.616771097666572, 0], [0, 'other', 'other', 5.616771097666572, 0], [0, 'our', 'our', 4.923623917106626, 0], [0, 'personal', 'personal', 4.923623917106626, 0], [0, 'physician', 'physician', 4.923623917106626, 0], [0, 'promises', 'promises', 5.616771097666572, 0], [0, 'queer', 'queer', 5.616771097666572, 0], [0, 'ran', 'ran', 4.923623917106626, 0], [0, 'reaction', 'reaction', 4.923623917106626, 0], [0, 'romanov', 'romanov', 4.923623917106626, 0], [0, 'rule', 'rule', 4.923623917106626, 0], [0, 'russia', 'russia', 4.518158808998462, 0], [0, 'second', 'second', 4.923623917106626, 0], [0, 'see', 'see', 5.616771097666572, 0], [0, 'shake', 'shake', 5.616771097666572, 0], [0, 'shoulder', 'shoulder', 4.923623917106626, 0], [0, 'simon', 'simon', 4.923623917106626, 0], [0, 'sister', 'sister', 4.923623917106626, 0], [0, 'six', 'six', 4.923623917106626, 0], [0, 'sleep', 'sleep', 5.616771097666572, 0], [0, 'snow', 'snow', 5.616771097666572, 0], [0, 'some', 'some', 5.616771097666572, 0], [0, 'sounds', 'sounds', 5.616771097666572, 0], [0, 'start', 'start', 4.923623917106626, 0], [0, 'stop', 'stop', 5.616771097666572, 0], [0, 'stopping', 'stopping', 5.616771097666572, 0], [0, 'stories', 'stories', 4.923623917106626, 0], [0, 'sweep', 'sweep', 5.616771097666572, 0], [0, 'tears', 'tears', 4.923623917106626, 0], [0, 'telling', 'telling', 4.923623917106626, 0], [0, 'temptation', 'temptation', 4.923623917106626, 0], [0, 'that', 'that', 4.518158808998462, 0], [0, 'the', 'the', 4.923623917106626, 0], [0, 'there', 'there', 4.923623917106626, 0], [0, 'these', 'these', 5.616771097666572, 0], [0, 'think', 'think', 5.616771097666572, 0], [0, 'though', 'though', 5.616771097666572, 0], [0, 'to', 'to', 3.4195465203303517, 0], [0, 'told', 'told', 4.923623917106626, 0], [0, 'triumphant', 'triumphant', 4.923623917106626, 0], [0, 'twenty', 'twenty', 
4.923623917106626, 0], [0, 'up', 'up', 5.616771097666572, 0], [0, 'village', 'village', 5.616771097666572, 0], [0, 'wanted', 'wanted', 4.923623917106626, 0], [0, 'was', 'was', 4.923623917106626, 0], [0, 'watch', 'watch', 5.616771097666572, 0], [0, 'what', 'what', 4.923623917106626, 0], [0, 'when', 'when', 4.923623917106626, 0], [0, 'which', 'which', 4.923623917106626, 0], [0, 'whose', 'whose', 5.616771097666572, 0], [0, 'will', 'will', 5.616771097666572, 0], [0, 'wind', 'wind', 5.616771097666572, 0], [0, 'with', 'with', 5.616771097666572, 0], [0, 'without', 'without', 5.616771097666572, 0], [0, 'woods', 'woods', 5.616771097666572, 0], [0, 'year', 'year', 4.923623917106626, 0], [0, 'montenori', 'montenori', 4.923623917106626, 0], [0, 'nicholai', 'nicholai', 4.923623917106626, 0], [0, 'romanovs', 'romanovs', 5.616771097666572, 0], [0, 'sebag', 'sebag', 4.923623917106626, 0]]

Composing and Searching the WFSTs

To figure out which words were spoken in the transcript, we will do the following:

  1. Compose $L$ with $G$ to create $LG=L\circ G$. If $G$ were constructed from relevant texts, this would tell the recognizer which words are most probable. In our case, $G$ is made from irrelevant texts, so it may instead push the recognizer toward the wrong words.
  2. Compose $T$ with $LG$ to create $TLG=T\circ LG$. This creates a WFST that contains all and only the paths that explain the transcription (if any!).
  3. Use the bestpath algorithm to find the most likely path.
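Step 3 is not spelled out in this section. Because every weight here is a negative log probability, and therefore non-negative, one way to find the lowest-weight (most likely) path is Dijkstra's shortest-path algorithm. The sketch below is a hypothetical helper, not necessarily what mp5.py's bestpath does; it assumes the same [prev, input, output, weight, next] edge format as the printouts and that the start state is 0.

```python
import heapq

def bestpath_sketch(F, Ffinal, start=0):
    """Lowest-weight path from `start` to any final state (Dijkstra).
    Assumes edges [prev, input, output, weight, next] with non-negative weights."""
    out_edges = {}
    for e in F:
        out_edges.setdefault(e[0], []).append(e)
    finals = set(Ffinal)
    best = {start: 0.0}   # best known weight to reach each state
    back = {}             # state -> edge used to reach it on the best path
    heap = [(0.0, start)]
    while heap:
        d, q = heapq.heappop(heap)
        if d > best.get(q, float('inf')):
            continue      # stale heap entry
        if q in finals:
            # First final state popped is optimal; follow back-pointers to the start
            path = []
            while q != start:
                e = back[q]
                path.append(e)
                q = e[0]
            return d, path[::-1]
        for e in out_edges.get(q, []):
            nd = d + e[3]
            if nd < best.get(e[4], float('inf')):
                best[e[4]] = nd
                back[e[4]] = e
                heapq.heappush(heap, (nd, e[4]))
    return float('inf'), []   # no final state is reachable
```

Given the returned path, the recognized words would be the non-epsilon output labels, `[e[2] for e in path if e[2] != '']`.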

The first step is to compose $L$ and $G$, for which you should write todo_fstcompose. Given arguments $A$ and $B$, your code should compute $C=A\circ B$ using this algorithm:

  1. The set of states in $C$ is the cross product of the states in $A$ and those in $B$. You can use the provided function, 'cross_product'.
  2. For each state $q_A\in A$, for each transition $t_B\in B$ that has an epsilon input ($i[t_B]=\epsilon$), create a transition $t_C$ that doesn't change $q_A$.
  3. For each state $q_B\in B$, for each transition $t_A\in A$ with epsilon output ($o[t_A]=\epsilon$), create a transition $t_C$ that doesn't change $q_B$.
  4. For each pair of transitions $t_A$ and $t_B$ s.t. $o[t_A]=i[t_B]\ne\epsilon$, create a transition $t_C$ that changes both $q_A$ and $q_B$.
  5. The set of final states for $C$ is the set of tuples $(q_A,q_B)$ for which both $q_A$ and $q_B$ are final.
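The five steps above can be sketched as follows. This is a minimal, hypothetical version, not the reference todo_fstcompose: it does not call the provided cross_product (whose signature isn't shown here) and instead numbers the composite state $(q_A,q_B)$ as q_A*n_B + q_B, where n_B is the number of states in B. It assumes each edge is [prev, input, output, weight, next] with '' for epsilon, and that a matched transition gets the weight $w[t_A]+w[t_B]$, the usual rule for negative-log weights. Like the steps above, it applies no epsilon filter and does not prune unreachable states.

```python
def compose_sketch(A, Afinal, B, Bfinal):
    """Naive WFST composition C = A o B, following the five steps above.
    Hypothetical helper: composite state (qA, qB) is numbered qA*nB + qB."""
    # Number of states in each machine (largest state index seen, plus one)
    nA = 1 + max([max(e[0], e[4]) for e in A] + list(Afinal), default=-1)
    nB = 1 + max([max(e[0], e[4]) for e in B] + list(Bfinal), default=-1)

    def idx(qA, qB):
        return qA * nB + qB

    C = []
    # Step 2: B takes an epsilon-input transition while A stays in qA
    for qA in range(nA):
        for pB, iB, oB, wB, nxB in B:
            if iB == '':
                C.append([idx(qA, pB), '', oB, wB, idx(qA, nxB)])
    # Step 3: A takes an epsilon-output transition while B stays in qB
    for qB in range(nB):
        for pA, iA, oA, wA, nxA in A:
            if oA == '':
                C.append([idx(pA, qB), iA, '', wA, idx(nxA, qB)])
    # Step 4: A's output label matches B's input label, so both machines move
    for pA, iA, oA, wA, nxA in A:
        for pB, iB, oB, wB, nxB in B:
            if oA != '' and oA == iB:
                C.append([idx(pA, pB), iA, oB, wA + wB, idx(nxA, nxB)])
    # Step 5: a composite state is final iff both of its components are final
    Cfinal = [idx(qA, qB) for qA in Afinal for qB in Bfinal]
    return C, Cfinal
```

For this MP it would be called as compose_sketch(L, Lfinal, G, Gfinal). Because G has a single state ($n_B=1$), every LG state gets the same number as the corresponding L state, which agrees with the LG printout below; the edges, however, come out grouped by step rather than sorted by source state.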
In [93]:
importlib.reload(mp5)
#LG, LGfinal = mp5.todo_fstcompose(L,Lfinal,G,Gfinal)
LG = solutions['LG']
LGfinal = solutions['LGfinal']
print(LGfinal)
print(LG)
[0]
[[0, 'ə', '', 0, 1], [0, 'ˌei', '', 0, 2], [0, 'b', '', 0, 3], [0, 'ˌɑ', '', 0, 9], [0, 'ˌæ', '', 0, 22], [0, 'ɑ', '', 0, 33], [0, 'ˈɔ', '', 0, 35], [0, 'ˈæ', '', 0, 39], [0, 'ˈɑ', '', 0, 42], [0, 'ˈɛ', '', 0, 45], [0, 'k', '', 0, 85], [0, 'z', '', 0, 100], [0, 'd', '', 0, 103], [0, 'ˈi', '', 0, 124], [0, 'ˈei', '', 0, 127], [0, 'ɛ', '', 0, 141], [0, 'ɪ', '', 0, 146], [0, 'ˌɛ', '', 0, 151], [0, 'f', '', 0, 160], [0, 'g', '', 0, 188], [0, 'h', '', 0, 196], [0, 'ˈɑɪ', '', 0, 232], [0, 'ˈɪ', '', 0, 234], [0, 'ˌɪ', '', 0, 254], [0, 'dʒ', '', 0, 271], [0, 'n', '', 0, 283], [0, 'l', '', 0, 285], [0, 'm', '', 0, 305], [0, 'ˈoʊ', '', 0, 374], [0, 'ˈʌ', '', 0, 382], [0, 'ˈaʊ', '', 0, 387], [0, 'p', '', 0, 390], [0, 'ɹ', '', 0, 422], [0, 's', '', 0, 444], [0, 'ʃ', '', 0, 450], [0, 't', '', 0, 501], [0, 'ð', '', 0, 522], [0, 'ɵ', '', 0, 532], [0, 'v', '', 0, 556], [0, 'w', '', 0, 561], [0, 'j', '', 0, 612], [1, '', 'a', 4.923623917106626, 0], [1, 'b', '', 0, 6], [1, 'n', '', 0, 37], [1, 't', '', 0, 49], [1, 'v', '', 0, 373], [2, '', 'a', 4.923623917106626, 0], [3, 'ˌaʊ', '', 0, 4], [3, 'i', '', 0, 51], [3, 'ˈei', '', 0, 52], [3, 'ˈi', '', 0, 53], [3, 'ɪ', '', 0, 54], [3, 'ˌi', '', 0, 60], [3, 'ˈɛ', '', 0, 64], [3, 'ˈʊ', '', 0, 75], [3, 'ɹ', '', 0, 77], [3, 'ə', '', 0, 80], [3, 'ˈʌ', '', 0, 82], [3, 'ˈɑɪ', '', 0, 84], [4, 't', '', 0, 5], [5, '', 'about', 4.923623917106626, 0], [6, 'ˈaʊ', '', 0, 7], [7, 't', '', 0, 8], [8, '', 'about', 4.923623917106626, 0], [9, 'l', '', 0, 10], [10, 'ɛ', '', 0, 11], [11, 'k', '', 0, 12], [12, 's', '', 0, 13], [13, 'ˈɑ', '', 0, 14], [14, 'n', '', 0, 15], [15, 'd', '', 0, 16], [16, 'ɹ', '', 0, 17], [17, 'ɑ', '', 0, 18], [18, 'v', '', 0, 19], [19, 'ɪ', '', 0, 20], [20, 'tʃ', '', 0, 21], [21, '', 'aleksandrovich', 4.923623917106626, 0], [22, 'l', '', 0, 23], [23, 'ɪ', '', 0, 24], [24, 'g', '', 0, 25], [25, 'z', '', 0, 26], [26, 'ˈæ', '', 0, 27], [27, 'n', '', 0, 28], [28, 'd', '', 0, 29], [29, 'ɚ', '', 0, 30], [29, 'ə', '', 0, 31], [30, '', 
'alexander', 4.923623917106626, 0], [31, 'ɹ', '', 0, 32], [32, '', 'alexander', 4.923623917106626, 0], [33, 'l', '', 0, 34], [34, '', 'all', 4.923623917106626, 0], [35, 'l', '', 0, 36], [35, 'n', '', 0, 378], [36, '', 'all', 4.923623917106626, 0], [37, 'd', '', 0, 38], [38, '', 'and', 4.007333185232471, 0], [39, 'n', '', 0, 40], [39, 'z', '', 0, 44], [39, 's', '', 0, 47], [39, 't', '', 0, 50], [40, 'd', '', 0, 41], [41, '', 'and', 4.007333185232471, 0], [42, 'ɹ', '', 0, 43], [42, 'n', '', 0, 377], [43, '', 'our', 4.923623917106626, 0], [43, '', 'are', 4.923623917106626, 0], [44, '', 'as', 4.923623917106626, 0], [45, 'z', '', 0, 46], [45, 'm', '', 0, 131], [45, 'n', '', 0, 139], [46, '', 'as', 4.923623917106626, 0], [47, 'k', '', 0, 48], [48, '', 'ask', 5.616771097666572, 0], [49, '', 'what', 4.923623917106626, 0], [49, '', 'at', 4.923623917106626, 0], [50, '', 'at', 4.923623917106626, 0], [51, '', 'be', 4.923623917106626, 0], [51, 't', '', 0, 67], [52, '', 'be', 4.923623917106626, 0], [53, '', 'be', 4.923623917106626, 0], [54, 'f', '', 0, 55], [54, 't', '', 0, 71], [55, 'ˈoʊ', '', 0, 56], [55, 'ˈɔ', '', 0, 58], [56, 'ɹ', '', 0, 57], [57, '', 'before', 5.616771097666572, 0], [58, 'ɹ', '', 0, 59], [59, '', 'before', 5.616771097666572, 0], [60, 'f', '', 0, 61], [61, 'ˈɔ', '', 0, 62], [62, 'ɹ', '', 0, 63], [63, '', 'before', 5.616771097666572, 0], [64, 'l', '', 0, 65], [65, 'z', '', 0, 66], [66, '', 'bells', 5.616771097666572, 0], [67, 'w', '', 0, 68], [68, 'ˈi', '', 0, 69], [69, 'n', '', 0, 70], [70, '', 'between', 5.616771097666572, 0], [71, 'w', '', 0, 72], [72, 'ˈi', '', 0, 73], [73, 'n', '', 0, 74], [74, '', 'between', 5.616771097666572, 0], [75, 'k', '', 0, 76], [76, '', 'book', 4.923623917106626, 0], [77, 'ˈoʊ', '', 0, 78], [78, 'k', '', 0, 79], [79, '', 'broke', 4.923623917106626, 0], [80, 't', '', 0, 81], [81, '', 'but', 4.923623917106626, 0], [82, 't', '', 0, 83], [83, '', 'but', 4.923623917106626, 0], [84, '', 'by', 4.518158808998462, 0], [85, 'ˈɑ', '', 0, 
86], [85, 'ɹ', '', 0, 97], [85, 'ˈi', '', 0, 281], [85, 'w', '', 0, 419], [86, 'm', '', 0, 87], [87, 'j', '', 0, 88], [88, 'ə', '', 0, 89], [89, 'n', '', 0, 90], [90, 'ɪ', '', 0, 91], [90, 'ə', '', 0, 94], [91, 's', '', 0, 92], [92, 't', '', 0, 93], [93, '', 'communist', 4.923623917106626, 0], [94, 's', '', 0, 95], [95, 't', '', 0, 96], [96, '', 'communist', 4.923623917106626, 0], [97, 'ˈɑɪ', '', 0, 98], [98, 'd', '', 0, 99], [99, '', 'cried', 4.923623917106626, 0], [100, 'ˈɑ', '', 0, 101], [101, 'ɹ', '', 0, 102], [102, '', 'czar', 4.923623917106626, 0], [103, 'ˈɑ', '', 0, 104], [103, 'ˈi', '', 0, 110], [103, 'ˈɑɪ', '', 0, 112], [103, 'ˈaʊ', '', 0, 114], [103, 'ɹ', '', 0, 117], [104, 'ɹ', '', 0, 105], [105, 'k', '', 0, 106], [106, '', 'dark', 5.616771097666572, 0], [106, 'ə', '', 0, 107], [107, 's', '', 0, 108], [108, 't', '', 0, 109], [109, '', 'darkest', 5.616771097666572, 0], [110, 'p', '', 0, 111], [111, '', 'deep', 5.616771097666572, 0], [112, 'd', '', 0, 113], [113, '', 'died', 4.923623917106626, 0], [114, 'n', '', 0, 115], [115, '', 'down', 4.923623917106626, 0], [115, 'i', '', 0, 116], [116, '', 'downy', 5.616771097666572, 0], [117, 'ə', '', 0, 118], [118, 'm', '', 0, 119], [119, 'ˈæ', '', 0, 120], [120, 'ɾ', '', 0, 121], [121, 'ɪ', '', 0, 122], [122, 'k', '', 0, 123], [123, '', 'dramatic', 4.923623917106626, 0], [124, 'z', '', 0, 125], [124, 'v', '', 0, 156], [125, 'i', '', 0, 126], [126, '', 'easy', 5.616771097666572, 0], [127, 't', '', 0, 128], [128, 'ˈi', '', 0, 129], [129, 'n', '', 0, 130], [130, '', 'eighteen', 4.518158808998462, 0], [131, 'p', '', 0, 132], [132, 'ɚ', '', 0, 133], [132, 'ə', '', 0, 135], [133, 'ɚ', '', 0, 134], [134, '', 'emperor', 4.923623917106626, 0], [135, 'ɹ', '', 0, 136], [136, 'ə', '', 0, 137], [137, 'ɹ', '', 0, 138], [138, '', 'emperor', 4.923623917106626, 0], [139, 'd', '', 0, 140], [140, '', 'end', 4.923623917106626, 0], [141, 'n', '', 0, 142], [142, '', 'when', 4.923623917106626, 0], [142, 't', '', 0, 143], [143, 'ˈɑɪ', '', 
0, 144], [144, 'ɚ', '', 0, 145], [145, '', 'entire', 4.923623917106626, 0], [146, 'n', '', 0, 147], [146, 'm', '', 0, 220], [146, 'f', '', 0, 233], [146, 'z', '', 0, 268], [146, 't', '', 0, 269], [146, 'tʃ', '', 0, 596], [147, 't', '', 0, 148], [147, 'k', '', 0, 236], [147, 's', '', 0, 246], [148, 'ˈɑɪ', '', 0, 149], [149, 'ɚ', '', 0, 150], [150, '', 'entire', 4.923623917106626, 0], [151, 'n', '', 0, 152], [152, 't', '', 0, 153], [153, 'ˌɑɪ', '', 0, 154], [154, 'ɹ', '', 0, 155], [155, '', 'entire', 4.923623917106626, 0], [156, 'n', '', 0, 157], [157, 'ɪ', '', 0, 158], [158, 'ŋ', '', 0, 159], [159, '', 'evening', 5.616771097666572, 0], [160, 'ˈæ', '', 0, 161], [160, 'ˈɑ', '', 0, 168], [160, 'ˈɪ', '', 0, 175], [160, 'l', '', 0, 177], [160, 'oʊ', '', 0, 180], [160, 'ˈɔ', '', 0, 182], [160, 'ɹ', '', 0, 184], [160, 'ə', '', 0, 402], [160, 'ɪ', '', 0, 407], [161, 'm', '', 0, 162], [162, 'ə', '', 0, 163], [162, 'l', '', 0, 166], [163, 'l', '', 0, 164], [164, 'i', '', 0, 165], [165, '', 'family', 4.518158808998462, 0], [166, 'i', '', 0, 167], [167, '', 'family', 4.518158808998462, 0], [168, 'ɹ', '', 0, 169], [168, 'ð', '', 0, 171], [169, 'm', '', 0, 170], [170, '', 'farm', 5.616771097666572, 0], [171, 'ɚ', '', 0, 172], [171, 'ə', '', 0, 173], [172, '', 'father', 4.923623917106626, 0], [173, 'ɹ', '', 0, 174], [174, '', 'father', 4.923623917106626, 0], [175, 'l', '', 0, 176], [176, '', 'fill', 5.616771097666572, 0], [177, 'ˈei', '', 0, 178], [178, 'k', '', 0, 179], [179, '', 'flake', 5.616771097666572, 0], [180, 'ɹ', '', 0, 181], [181, '', 'four', 4.923623917106626, 0], [182, 'ɹ', '', 0, 183], [183, '', 'four', 4.923623917106626, 0], [184, 'ˈoʊ', '', 0, 185], [185, 'z', '', 0, 186], [186, 'n̩', '', 0, 187], [187, '', 'frozen', 5.616771097666572, 0], [188, 'ˈɪ', '', 0, 189], [188, 'ˈoʊ', '', 0, 192], [189, 'v', '', 0, 190], [190, 'z', '', 0, 191], [191, '', 'gives', 5.616771097666572, 0], [192, '', 'go', 5.616771097666572, 0], [192, 'ɪ', '', 0, 193], [193, 'n', '', 0, 194], 
[193, 'ŋ', '', 0, 195], [194, '', 'going', 4.923623917106626, 0], [195, '', 'going', 4.923623917106626, 0], [196, 'ˈæ', '', 0, 197], [196, 'ˈɑ', '', 0, 201], [196, 'ˈi', '', 0, 207], [196, 'ˈʌ', '', 0, 208], [196, 'ˈɛ', '', 0, 209], [196, 'ɚ', '', 0, 212], [196, 'ˈɝ', '', 0, 214], [196, 'ˈɪ', '', 0, 215], [196, 'ɪ', '', 0, 217], [196, 'ˈɔ', '', 0, 222], [196, 'ˈaʊ', '', 0, 225], [196, 'w', '', 0, 579], [196, 'ˈu', '', 0, 597], [197, 'd', '', 0, 198], [197, 'p', '', 0, 199], [197, 'v', '', 0, 206], [198, '', 'had', 4.923623917106626, 0], [199, 'n̩', '', 0, 200], [200, '', 'happen', 4.923623917106626, 0], [201, 'ɹ', '', 0, 202], [202, 'n', '', 0, 203], [203, 'ɪ', '', 0, 204], [204, 's', '', 0, 205], [205, '', 'harness', 5.616771097666572, 0], [206, '', 'have', 5.616771097666572, 0], [207, '', 'he', 4.518158808998462, 0], [208, '', 'he', 4.518158808998462, 0], [209, 'l', '', 0, 210], [210, 'p', '', 0, 211], [211, '', 'help', 4.923623917106626, 0], [212, 'ɹ', '', 0, 213], [213, '', 'her', 4.923623917106626, 0], [214, '', 'her', 4.923623917106626, 0], [215, 'ɹ', '', 0, 216], [215, 'm', '', 0, 219], [216, '', 'here', 5.616771097666572, 0], [217, 'm', '', 0, 218], [217, 'z', '', 0, 221], [218, '', 'him', 4.923623917106626, 0], [219, '', 'him', 4.923623917106626, 0], [220, '', 'him', 4.923623917106626, 0], [221, '', 'his', 3.670860948611258, 0], [222, 'ɹ', '', 0, 223], [223, 's', '', 0, 224], [224, '', 'horse', 5.616771097666572, 0], [225, 's', '', 0, 226], [225, 'z', '', 0, 227], [226, '', 'house', 5.616771097666572, 0], [226, 'h', '', 0, 228], [227, '', 'house', 5.616771097666572, 0], [228, 'ˌoʊ', '', 0, 229], [229, 'l', '', 0, 230], [230, 'd', '', 0, 231], [231, '', 'household', 4.923623917106626, 0], [232, '', 'i', 4.923623917106626, 0], [233, '', 'if', 5.616771097666572, 0], [234, 'n', '', 0, 235], [235, '', 'in', 4.230476736546681, 0], [236, 'l', '', 0, 237], [237, 'u', '', 0, 238], [237, 'ˈu', '', 0, 242], [238, 'd', '', 0, 239], [239, 'ɪ', '', 0, 240], [240, 'ŋ', 
'', 0, 241], [241, '', 'including', 4.923623917106626, 0], [242, 'd', '', 0, 243], [243, 'ɪ', '', 0, 244], [244, 'ŋ', '', 0, 245], [245, '', 'including', 4.923623917106626, 0], [246, 't', '', 0, 247], [247, 'ɹ', '', 0, 248], [248, 'ˈʌ', '', 0, 249], [249, 'k', '', 0, 250], [250, 't', '', 0, 251], [251, 'ɪ', '', 0, 252], [252, 'v', '', 0, 253], [253, '', 'instructive', 4.923623917106626, 0], [254, 'n', '', 0, 255], [255, 't', '', 0, 256], [256, 'ɹ', '', 0, 257], [257, 'oʊ', '', 0, 258], [257, 'ə', '', 0, 263], [258, 'd', '', 0, 259], [259, 'ˈu', '', 0, 260], [260, 's', '', 0, 261], [261, 't', '', 0, 262], [262, '', 'introduced', 4.923623917106626, 0], [263, 'd', '', 0, 264], [264, 'ˈu', '', 0, 265], [265, 's', '', 0, 266], [266, 't', '', 0, 267], [267, '', 'introduced', 4.923623917106626, 0], [268, '', 'is', 4.518158808998462, 0], [269, '', 'it', 5.616771097666572, 0], [269, 's', '', 0, 270], [270, '', 'its', 4.518158808998462, 0], [271, 'u', '', 0, 272], [271, 'ə', '', 0, 275], [271, 'ˌu', '', 0, 278], [272, 'l', '', 0, 273], [273, 'ˈɑɪ', '', 0, 274], [274, '', 'july', 4.923623917106626, 0], [275, 'l', '', 0, 276], [276, 'ˈɑɪ', '', 0, 277], [277, '', 'july', 4.923623917106626, 0], [278, 'l', '', 0, 279], [279, 'ˈɑɪ', '', 0, 280], [280, '', 'july', 4.923623917106626, 0], [281, 'p', '', 0, 282], [282, '', 'keep', 5.616771097666572, 0], [283, 'ˈoʊ', '', 0, 284], [283, 'ˌi', '', 0, 339], [283, 'ˈɪ', '', 0, 341], [283, 'ˈɛ', '', 0, 343], [283, 'ˈɑɪ', '', 0, 356], [283, 'ˈɑ', '', 0, 362], [283, 'oʊ', '', 0, 364], [283, 'ˈaʊ', '', 0, 372], [284, '', 'know', 5.616771097666572, 0], [285, 'ˈei', '', 0, 286], [285, 'ˈɝ', '', 0, 288], [285, 'ˈɪ', '', 0, 296], [285, 'ˈʌ', '', 0, 301], [286, 'k', '', 0, 287], [287, '', 'lake', 5.616771097666572, 0], [288, 'n', '', 0, 289], [288, 'ɹ', '', 0, 292], [289, 'ɪ', '', 0, 290], [290, 'ŋ', '', 0, 291], [291, '', 'learning', 4.923623917106626, 0], [292, 'n', '', 0, 293], [293, 'ɪ', '', 0, 294], [294, 'ŋ', '', 0, 295], [295, '', 
'learning', 4.923623917106626, 0], [296, 't', '', 0, 297], [296, 'ɾ', '', 0, 299], [297, 'l̩', '', 0, 298], [298, '', 'little', 5.616771097666572, 0], [299, 'l̩', '', 0, 300], [300, '', 'little', 5.616771097666572, 0], [301, 'v', '', 0, 302], [302, 'l', '', 0, 303], [303, 'i', '', 0, 304], [304, '', 'lovely', 5.616771097666572, 0], [305, 'ˈæ', '', 0, 306], [305, 'ˈi', '', 0, 313], [305, 'ˈɑɪ', '', 0, 314], [305, 'ɪ', '', 0, 319], [305, 'oʊ', '', 0, 324], [305, 'ˈɔ', '', 0, 326], [305, 'ˈu', '', 0, 328], [305, 'ə', '', 0, 333], [305, 'ˈʌ', '', 0, 336], [305, 'ˌɔ', '', 0, 617], [306, 's', '', 0, 307], [307, 'ə', '', 0, 308], [308, 'k', '', 0, 309], [309, 'ɚ', '', 0, 310], [309, 'ə', '', 0, 311], [310, '', 'massacre', 4.923623917106626, 0], [311, 'ɹ', '', 0, 312], [312, '', 'massacre', 4.923623917106626, 0], [313, '', 'me', 4.923623917106626, 0], [313, '', 'my', 4.923623917106626, 0], [314, '', 'my', 4.923623917106626, 0], [314, 'l̩', '', 0, 315], [314, 'l', '', 0, 317], [315, 'z', '', 0, 316], [316, '', 'miles', 5.616771097666572, 0], [317, 'z', '', 0, 318], [318, '', 'miles', 5.616771097666572, 0], [319, 's', '', 0, 320], [320, 't', '', 0, 321], [321, 'ˈei', '', 0, 322], [322, 'k', '', 0, 323], [323, '', 'mistake', 5.616771097666572, 0], [324, 'ɹ', '', 0, 325], [325, '', 'more', 4.923623917106626, 0], [326, 'ɹ', '', 0, 327], [327, '', 'more', 4.923623917106626, 0], [328, 'v', '', 0, 329], [329, 'm', '', 0, 330], [330, 'n̩', '', 0, 331], [331, 't', '', 0, 332], [332, '', 'movement', 4.923623917106626, 0], [333, 's', '', 0, 334], [334, 't', '', 0, 335], [335, '', 'must', 5.616771097666572, 0], [336, 's', '', 0, 337], [337, 't', '', 0, 338], [338, '', 'must', 5.616771097666572, 0], [339, 'ɹ', '', 0, 340], [340, '', 'near', 5.616771097666572, 0], [341, 'ɹ', '', 0, 342], [341, 'k', '', 0, 348], [342, '', 'near', 5.616771097666572, 0], [343, 'v', '', 0, 344], [344, 'ɚ', '', 0, 345], [344, 'ə', '', 0, 346], [345, '', 'never', 4.923623917106626, 0], [346, 'ɹ', '', 0, 347], 
[347, '', 'never', 4.923623917106626, 0], [348, 'ə', '', 0, 349], [348, 'l', '', 0, 353], [349, 'l', '', 0, 350], [350, 'ə', '', 0, 351], [350, 'ɑɪ', '', 0, 625], [351, 's', '', 0, 352], [352, '', 'nicholas', 4.230476736546681, 0], [353, 'ə', '', 0, 354], [354, 's', '', 0, 355], [355, '', 'nicholas', 4.230476736546681, 0], [356, 'n', '', 0, 357], [357, 't', '', 0, 358], [358, 'ˈi', '', 0, 359], [358, 'i', '', 0, 361], [359, 'n', '', 0, 360], [360, '', 'nineteen', 4.923623917106626, 0], [361, '', 'ninety', 4.923623917106626, 0], [362, 't', '', 0, 363], [363, '', 'not', 5.616771097666572, 0], [364, 'v', '', 0, 365], [365, 'ˈɛ', '', 0, 366], [366, 'm', '', 0, 367], [367, 'b', '', 0, 368], [368, 'ɚ', '', 0, 369], [368, 'ə', '', 0, 370], [369, '', 'november', 4.923623917106626, 0], [370, 'ɹ', '', 0, 371], [371, '', 'november', 4.923623917106626, 0], [372, '', 'now', 4.923623917106626, 0], [373, '', 'of', 4.230476736546681, 0], [374, 'l', '', 0, 375], [374, 'n', '', 0, 379], [375, 'd', '', 0, 376], [376, '', 'old', 4.923623917106626, 0], [377, '', 'on', 4.923623917106626, 0], [378, '', 'on', 4.923623917106626, 0], [379, 'l', '', 0, 380], [380, 'i', '', 0, 381], [381, '', 'only', 5.616771097666572, 0], [382, 'ð', '', 0, 383], [382, 'p', '', 0, 555], [383, 'ɚ', '', 0, 384], [383, 'ə', '', 0, 385], [384, '', 'other', 5.616771097666572, 0], [385, 'ɹ', '', 0, 386], [386, '', 'other', 5.616771097666572, 0], [387, 'ɚ', '', 0, 388], [387, 'ɹ', '', 0, 389], [388, '', 'our', 4.923623917106626, 0], [389, '', 'our', 4.923623917106626, 0], [390, 'ˈɝ', '', 0, 391], [390, 'ɹ', '', 0, 412], [391, 's', '', 0, 392], [391, 'ɹ', '', 0, 397], [392, 'ɪ', '', 0, 393], [393, 'n', '', 0, 394], [394, 'ɪ', '', 0, 395], [395, 'l', '', 0, 396], [396, '', 'personal', 4.923623917106626, 0], [397, 's', '', 0, 398], [398, 'ə', '', 0, 399], [399, 'n', '', 0, 400], [400, 'l̩', '', 0, 401], [401, '', 'personal', 4.923623917106626, 0], [402, 'z', '', 0, 403], [403, 'ˈɪ', '', 0, 404], [404, 'ʃ', '', 0, 405], 
[405, 'n̩', '', 0, 406], [406, '', 'physician', 4.923623917106626, 0], [407, 'z', '', 0, 408], [408, 'ˈɪ', '', 0, 409], [409, 'ʃ', '', 0, 410], [410, 'n̩', '', 0, 411], [411, '', 'physician', 4.923623917106626, 0], [412, 'ˈɑ', '', 0, 413], [413, 'm', '', 0, 414], [414, 'ə', '', 0, 415], [415, 's', '', 0, 416], [416, 'ə', '', 0, 417], [417, 'z', '', 0, 418], [418, '', 'promises', 5.616771097666572, 0], [419, 'ˈɪ', '', 0, 420], [420, 'ɹ', '', 0, 421], [421, '', 'queer', 5.616771097666572, 0], [422, 'ɑ', '', 0, 423], [422, 'ˈæ', '', 0, 425], [422, 'i', '', 0, 427], [422, 'ˈoʊ', '', 0, 432], [422, 'ˈu', '', 0, 439], [422, 'ˈʌ', '', 0, 441], [423, 'n', '', 0, 424], [424, '', 'ran', 4.923623917106626, 0], [425, 'n', '', 0, 426], [426, '', 'ran', 4.923623917106626, 0], [427, 'ˈæ', '', 0, 428], [428, 'k', '', 0, 429], [429, 'ʃ', '', 0, 430], [430, 'n̩', '', 0, 431], [431, '', 'reaction', 4.923623917106626, 0], [432, 'm', '', 0, 433], [433, 'ə', '', 0, 434], [434, 'n', '', 0, 435], [435, 'ˌɔ', '', 0, 436], [436, 'f', '', 0, 437], [436, 'v', '', 0, 438], [437, '', 'romanov', 4.923623917106626, 0], [437, 's', '', 0, 626], [438, '', 'romanov', 4.923623917106626, 0], [439, 'l', '', 0, 440], [440, '', 'rule', 4.923623917106626, 0], [441, 'ʃ', '', 0, 442], [442, 'ə', '', 0, 443], [443, '', 'russia', 4.518158808998462, 0], [444, 'ˈɛ', '', 0, 445], [444, 'ˈi', '', 0, 449], [444, 'ˈɑɪ', '', 0, 459], [444, 'ˈɪ', '', 0, 462], [444, 'ɪ', '', 0, 468], [444, 'l', '', 0, 473], [444, 'n', '', 0, 476], [444, 'ˈʌ', '', 0, 478], [444, 'ˈaʊ', '', 0, 480], [444, 't', '', 0, 485], [444, 'w', '', 0, 498], [445, 'k', '', 0, 446], [445, 'b', '', 0, 627], [446, 'n̩', '', 0, 447], [447, '', 'second', 4.923623917106626, 0], [447, 'd', '', 0, 448], [448, '', 'second', 4.923623917106626, 0], [449, '', 'see', 5.616771097666572, 0], [450, 'ˈei', '', 0, 451], [450, 'ˈoʊ', '', 0, 453], [451, 'k', '', 0, 452], [452, '', 'shake', 5.616771097666572, 0], [453, 'l', '', 0, 454], [454, 'd', '', 0, 455], [455, 
'ɚ', '', 0, 456], [455, 'ə', '', 0, 457], [456, '', 'shoulder', 4.923623917106626, 0], [457, 'ɹ', '', 0, 458], [458, '', 'shoulder', 4.923623917106626, 0], [459, 'm', '', 0, 460], [460, 'n̩', '', 0, 461], [461, '', 'simon', 4.923623917106626, 0], [462, 's', '', 0, 463], [462, 'k', '', 0, 471], [463, 't', '', 0, 464], [464, 'ɚ', '', 0, 465], [464, 'ə', '', 0, 466], [465, '', 'sister', 4.923623917106626, 0], [466, 'ɹ', '', 0, 467], [467, '', 'sister', 4.923623917106626, 0], [468, 'k', '', 0, 469], [469, 's', '', 0, 470], [470, '', 'six', 4.923623917106626, 0], [471, 's', '', 0, 472], [472, '', 'six', 4.923623917106626, 0], [473, 'ˈi', '', 0, 474], [474, 'p', '', 0, 475], [475, '', 'sleep', 5.616771097666572, 0], [476, 'ˈoʊ', '', 0, 477], [477, '', 'snow', 5.616771097666572, 0], [478, 'm', '', 0, 479], [479, '', 'some', 5.616771097666572, 0], [480, 'n', '', 0, 481], [481, 'd', '', 0, 482], [481, 'z', '', 0, 484], [482, 'z', '', 0, 483], [483, '', 'sounds', 5.616771097666572, 0], [484, '', 'sounds', 5.616771097666572, 0], [485, 'ɑ', '', 0, 486], [485, 'ˈɑ', '', 0, 488], [485, 'ˈɔ', '', 0, 494], [486, 't', '', 0, 487], [487, '', 'start', 4.923623917106626, 0], [488, 'ɹ', '', 0, 489], [488, 'p', '', 0, 491], [489, 't', '', 0, 490], [490, '', 'start', 4.923623917106626, 0], [491, '', 'stop', 5.616771097666572, 0], [491, 'ɪ', '', 0, 492], [492, 'ŋ', '', 0, 493], [493, '', 'stopping', 5.616771097666572, 0], [494, 'ɹ', '', 0, 495], [495, 'i', '', 0, 496], [496, 'z', '', 0, 497], [497, '', 'stories', 4.923623917106626, 0], [498, 'ˈi', '', 0, 499], [499, 'p', '', 0, 500], [500, '', 'sweep', 5.616771097666572, 0], [501, 'ˈɛ', '', 0, 502], [501, 'ˈɪ', '', 0, 505], [501, 'ɛ', '', 0, 511], [501, 'ˈoʊ', '', 0, 537], [501, 'ˈu', '', 0, 538], [501, 'ˈʌ', '', 0, 539], [501, 'ɹ', '', 0, 542], [501, 'w', '', 0, 549], [502, 'ɹ', '', 0, 503], [502, 'l', '', 0, 508], [503, 'z', '', 0, 504], [504, '', 'tears', 4.923623917106626, 0], [505, 'ɹ', '', 0, 506], [506, 'z', '', 0, 507], [507, '', 
'tears', 4.923623917106626, 0], [508, 'ɪ', '', 0, 509], [509, 'ŋ', '', 0, 510], [510, '', 'telling', 4.923623917106626, 0], [511, 'm', '', 0, 512], [512, 't', '', 0, 513], [512, 'p', '', 0, 517], [513, 'ˈei', '', 0, 514], [514, 'ʃ', '', 0, 515], [515, 'n̩', '', 0, 516], [516, '', 'temptation', 4.923623917106626, 0], [517, 't', '', 0, 518], [518, 'ˈei', '', 0, 519], [519, 'ʃ', '', 0, 520], [520, 'n̩', '', 0, 521], [521, '', 'temptation', 4.923623917106626, 0], [522, 'ə', '', 0, 523], [522, 'ˈæ', '', 0, 525], [522, 'ˈi', '', 0, 527], [522, 'ˈʌ', '', 0, 528], [522, 'ˈɛ', '', 0, 529], [522, 'ˈoʊ', '', 0, 536], [523, '', 'the', 4.923623917106626, 0], [523, 't', '', 0, 524], [524, '', 'that', 4.518158808998462, 0], [525, 't', '', 0, 526], [526, '', 'that', 4.518158808998462, 0], [527, '', 'the', 4.923623917106626, 0], [527, 'z', '', 0, 531], [528, '', 'the', 4.923623917106626, 0], [529, 'ɹ', '', 0, 530], [530, '', 'there', 4.923623917106626, 0], [531, '', 'these', 5.616771097666572, 0], [532, 'ˈɪ', '', 0, 533], [533, 'ŋ', '', 0, 534], [534, 'k', '', 0, 535], [535, '', 'think', 5.616771097666572, 0], [536, '', 'though', 5.616771097666572, 0], [537, '', 'to', 3.4195465203303517, 0], [537, 'l', '', 0, 540], [538, '', 'to', 3.4195465203303517, 0], [539, '', 'to', 3.4195465203303517, 0], [540, 'd', '', 0, 541], [541, '', 'told', 4.923623917106626, 0], [542, 'ɑɪ', '', 0, 543], [543, 'ˈʌ', '', 0, 544], [544, 'm', '', 0, 545], [545, 'f', '', 0, 546], [546, 'n̩', '', 0, 547], [547, 't', '', 0, 548], [548, '', 'triumphant', 4.923623917106626, 0], [549, 'ˈɛ', '', 0, 550], [550, 'n', '', 0, 551], [551, 'i', '', 0, 552], [551, 't', '', 0, 553], [552, '', 'twenty', 4.923623917106626, 0], [553, 'i', '', 0, 554], [554, '', 'twenty', 4.923623917106626, 0], [555, '', 'up', 5.616771097666572, 0], [556, 'ˈɪ', '', 0, 557], [557, 'l', '', 0, 558], [558, 'ɪ', '', 0, 559], [559, 'dʒ', '', 0, 560], [560, '', 'village', 5.616771097666572, 0], [561, 'ˈɔ', '', 0, 562], [561, 'ˈɑ', '', 0, 566], 
[561, 'ə', '', 0, 574], [561, 'ˈʌ', '', 0, 582], [561, 'ˌɑ', '', 0, 584], [561, 'ˈɛ', '', 0, 590], [561, 'ˈɪ', '', 0, 592], [561, 'ɪ', '', 0, 599], [561, 'ˈɑɪ', '', 0, 602], [561, 'ˈʊ', '', 0, 609], [562, 'n', '', 0, 563], [562, 'z', '', 0, 577], [563, 'ɪ', '', 0, 564], [563, 't', '', 0, 571], [564, 'd', '', 0, 565], [565, '', 'wanted', 4.923623917106626, 0], [566, 'n', '', 0, 567], [566, 'z', '', 0, 576], [566, 'tʃ', '', 0, 578], [567, 't', '', 0, 568], [568, 'ə', '', 0, 569], [569, 'd', '', 0, 570], [570, '', 'wanted', 4.923623917106626, 0], [571, 'ɪ', '', 0, 572], [572, 'd', '', 0, 573], [573, '', 'wanted', 4.923623917106626, 0], [574, 'z', '', 0, 575], [575, '', 'was', 4.923623917106626, 0], [576, '', 'was', 4.923623917106626, 0], [577, '', 'was', 4.923623917106626, 0], [578, '', 'watch', 5.616771097666572, 0], [579, 'ˈʌ', '', 0, 580], [579, 'ˈɛ', '', 0, 586], [579, 'ˈɪ', '', 0, 588], [580, 't', '', 0, 581], [581, '', 'what', 4.923623917106626, 0], [582, 't', '', 0, 583], [583, '', 'what', 4.923623917106626, 0], [584, 't', '', 0, 585], [585, '', 'what', 4.923623917106626, 0], [586, 'n', '', 0, 587], [587, '', 'when', 4.923623917106626, 0], [588, 'n', '', 0, 589], [588, 'tʃ', '', 0, 594], [589, '', 'when', 4.923623917106626, 0], [590, 'n', '', 0, 591], [591, '', 'when', 4.923623917106626, 0], [592, 'n', '', 0, 593], [592, 'tʃ', '', 0, 595], [593, '', 'when', 4.923623917106626, 0], [593, 'd', '', 0, 601], [594, '', 'which', 4.923623917106626, 0], [595, '', 'which', 4.923623917106626, 0], [596, '', 'which', 4.923623917106626, 0], [597, 'z', '', 0, 598], [598, '', 'whose', 5.616771097666572, 0], [599, 'l', '', 0, 600], [599, 'ð', '', 0, 605], [599, 'ɵ', '', 0, 606], [600, '', 'will', 5.616771097666572, 0], [601, '', 'wind', 5.616771097666572, 0], [602, 'n', '', 0, 603], [603, 'd', '', 0, 604], [604, '', 'wind', 5.616771097666572, 0], [605, '', 'with', 5.616771097666572, 0], [605, 'ˈaʊ', '', 0, 607], [606, '', 'with', 5.616771097666572, 0], [607, 't', '', 0, 608], 
[608, '', 'without', 5.616771097666572, 0], [609, 'd', '', 0, 610], [610, 'z', '', 0, 611], [611, '', 'woods', 5.616771097666572, 0], [612, 'ˌi', '', 0, 613], [612, 'ˈɪ', '', 0, 615], [613, 'ɹ', '', 0, 614], [614, '', 'year', 4.923623917106626, 0], [615, 'ɹ', '', 0, 616], [616, '', 'year', 4.923623917106626, 0], [617, 'n', '', 0, 618], [618, 't', '', 0, 619], [619, 'ə', '', 0, 620], [620, 'n', '', 0, 621], [621, 'ˈoʊ', '', 0, 622], [622, 'ɹ', '', 0, 623], [623, 'i', '', 0, 624], [624, '', 'montenori', 4.923623917106626, 0], [625, '', 'nicholai', 4.923623917106626, 0], [626, '', 'romanovs', 5.616771097666572, 0], [627, 'æ', '', 0, 628], [628, 'g', '', 0, 629], [629, '', 'sebag', 4.923623917106626, 0]]

Now let's compose $TLG=T\circ LG$ to get the set of all valid full-word paths through the transcription. The only possible final state of $LG$ is state $0$. The only possible final state of $T$ is its highest-numbered state, whose index equals the number of transitions in $T$ (one per recognized phoneme).
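The composition can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference `mp5.todo_fstcompose`, and the name `fst_compose_sketch` is hypothetical. It assumes transitions are stored as `[prev, input, output, weight, next]`, that `''` is epsilon, that only the right-hand FST (here LG) has epsilon-input transitions, and that weights add. It encodes the composite state (a, b) as `a * nB + b`. That encoding is also an assumption, though the printed TLG looks consistent with it if LG has 630 states (0 through 629): the first transition goes to state 826 = 1 × 630 + 196, LG state 196 is where the h-initial words branch, and 215460 = 342 × 630 + 0.

```python
from collections import defaultdict

def fst_compose_sketch(A, A_final, B, B_final):
    """Compose A∘B; each FST is a list of [prev, input, output, weight, next].

    A's output symbols are matched against B's input symbols, '' is epsilon.
    Assumptions: only B has epsilon-input transitions (B moves while A stays
    put), A has no epsilon outputs, and weights are negative log
    probabilities, so the semiring product is ordinary addition.
    """
    nA = 1 + max(max(t[0], t[4]) for t in A)
    nB = 1 + max(max(t[0], t[4]) for t in B)

    def pair(a, b):
        # Hypothetical encoding of the composite state (a, b) as one integer
        return a * nB + b

    # Index B's non-epsilon transitions by input symbol
    by_input = defaultdict(list)
    for tb in B:
        if tb[1] != '':
            by_input[tb[1]].append(tb)

    C = []
    # Matched moves: A emits symbol x and B consumes the same symbol x
    for ta in A:
        for tb in by_input.get(ta[2], []):
            C.append([pair(ta[0], tb[0]), ta[1], tb[2],
                      ta[3] + tb[3], pair(ta[4], tb[4])])
    # Epsilon moves: B advances without input while A may be in any state
    for tb in B:
        if tb[1] == '':
            for a in range(nA):
                C.append([pair(a, tb[0]), '', tb[2], tb[3], pair(a, tb[4])])

    C.sort(key=lambda t: t[0])  # list transitions in order of source state
    C_final = [pair(a, b) for a in A_final for b in B_final]
    return C, C_final
```

Because the epsilon loop pairs every LG word-end transition with every state of T, the result contains many composite states that no path ever visits, which is the clutter discussed next.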

In [94]:
importlib.reload(mp5)
#TLG, TLGfinal = mp5.todo_fstcompose(T,Tfinal,LG,LGfinal)
TLG = solutions['TLG']
TLGfinal = solutions['TLGfinal']
print(TLGfinal)
print(TLG[:45])
[215460]
[[0, 'h', '', 0, 826], [1, '', 'a', 4.923623917106626, 0], [2, '', 'a', 4.923623917106626, 0], [5, '', 'about', 4.923623917106626, 0], [8, '', 'about', 4.923623917106626, 0], [21, '', 'aleksandrovich', 4.923623917106626, 0], [30, '', 'alexander', 4.923623917106626, 0], [32, '', 'alexander', 4.923623917106626, 0], [34, '', 'all', 4.923623917106626, 0], [36, '', 'all', 4.923623917106626, 0], [38, '', 'and', 4.007333185232471, 0], [41, '', 'and', 4.007333185232471, 0], [43, '', 'are', 4.923623917106626, 0], [43, '', 'our', 4.923623917106626, 0], [44, '', 'as', 4.923623917106626, 0], [46, '', 'as', 4.923623917106626, 0], [48, '', 'ask', 5.616771097666572, 0], [49, '', 'what', 4.923623917106626, 0], [49, '', 'at', 4.923623917106626, 0], [50, '', 'at', 4.923623917106626, 0], [51, '', 'be', 4.923623917106626, 0], [52, '', 'be', 4.923623917106626, 0], [53, '', 'be', 4.923623917106626, 0], [57, '', 'before', 5.616771097666572, 0], [59, '', 'before', 5.616771097666572, 0], [63, '', 'before', 5.616771097666572, 0], [66, '', 'bells', 5.616771097666572, 0], [70, '', 'between', 5.616771097666572, 0], [74, '', 'between', 5.616771097666572, 0], [76, '', 'book', 4.923623917106626, 0], [79, '', 'broke', 4.923623917106626, 0], [81, '', 'but', 4.923623917106626, 0], [83, '', 'but', 4.923623917106626, 0], [84, '', 'by', 4.518158808998462, 0], [93, '', 'communist', 4.923623917106626, 0], [96, '', 'communist', 4.923623917106626, 0], [99, '', 'cried', 4.923623917106626, 0], [102, '', 'czar', 4.923623917106626, 0], [106, '', 'dark', 5.616771097666572, 0], [109, '', 'darkest', 5.616771097666572, 0], [111, '', 'deep', 5.616771097666572, 0], [113, '', 'died', 4.923623917106626, 0], [115, '', 'down', 4.923623917106626, 0], [116, '', 'downy', 5.616771097666572, 0], [123, '', 'dramatic', 4.923623917106626, 0]]

Obviously, TLG contains a lot of useless transitions! Specifically, many of its transitions do not lie on any path from the initial state to a final state: there is no way to reach them, and/or nowhere to go from them. Since TLG contains no cycles (can you see why?), we can eliminate the unreachable ones by topologically sorting the graph with a topological sort function that keeps only the transitions reachable from the initial state. Transitions that are reachable but lead to a dead end survive this step; they show up later as states with $\beta=\infty$.
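A minimal sketch of such a sort, assuming the same `[prev, input, output, weight, next]` layout and start state 0. It uses Kahn's algorithm, which is one standard choice and not necessarily what `mp5.todo_sort_topologically` is expected to do; the function name is hypothetical. It first finds the reachable states, then orders them so every transition goes from a lower to a higher state number, and finally renumbers the states.

```python
from collections import defaultdict, deque

def sort_topologically_sketch(fst, finals, start=0):
    """Keep only transitions reachable from `start`, and renumber states so
    that every transition goes from a lower to a higher state number.
    Uses Kahn's algorithm; raises ValueError if a cycle is found."""
    out = defaultdict(list)
    for t in fst:
        out[t[0]].append(t)

    # 1. Breadth-first search for the states reachable from the start state
    reachable = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in out[s]:
            if t[4] not in reachable:
                reachable.add(t[4])
                queue.append(t[4])

    # 2. Kahn's algorithm on the reachable subgraph
    indegree = {s: 0 for s in reachable}
    for s in reachable:
        for t in out[s]:
            indegree[t[4]] += 1
    order = []
    queue = deque(s for s in reachable if indegree[s] == 0)
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in out[s]:
            indegree[t[4]] -= 1
            if indegree[t[4]] == 0:
                queue.append(t[4])
    if len(order) != len(reachable):
        raise ValueError("FST contains a cycle; it cannot be sorted")

    # 3. Renumber states in sorted order and rewrite the transitions
    new_id = {s: i for i, s in enumerate(order)}
    sorted_fst = [[new_id[t[0]], t[1], t[2], t[3], new_id[t[4]]]
                  for s in order for t in out[s]]
    sorted_finals = [new_id[s] for s in finals if s in new_id]
    return sorted_fst, sorted_finals
```

Like the function described above, this sketch drops unreachable transitions but keeps reachable dead ends, such as the states 80 and 82 whose $\beta$ is printed as `inf` further down.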

In [95]:
importlib.reload(mp5)
#TLG_sorted, TLGfinal_sorted = mp5.todo_sort_topologically(TLG,TLGfinal)
TLG_sorted = solutions['TLG_sorted']
TLGfinal_sorted = solutions['TLGfinal_sorted']
print(TLG_sorted[:45])
[[0, 'h', '', 0, 1], [1, 'ˈu', '', 0, 2], [2, 'z', '', 0, 3], [3, '', 'whose', 5.616771097666572, 4], [4, 'w', '', 0, 5], [5, 'ˈʊ', '', 0, 6], [6, 'd', '', 0, 7], [7, 'z', '', 0, 8], [8, '', 'woods', 5.616771097666572, 9], [9, 'ð', '', 0, 10], [10, 'ˈi', '', 0, 11], [11, '', 'the', 4.923623917106626, 12], [11, 'z', '', 0, 13], [12, 'z', '', 0, 14], [13, '', 'these', 5.616771097666572, 15], [14, 'ˈɑ', '', 0, 16], [15, 'ˈɑ', '', 0, 17], [16, 'ɹ', '', 0, 18], [17, 'ɹ', '', 0, 19], [18, '', 'czar', 4.923623917106626, 20], [19, '', 'are', 4.923623917106626, 20], [19, '', 'our', 4.923623917106626, 20], [20, 'ˈɑɪ', '', 0, 21], [21, '', 'i', 4.923623917106626, 22], [22, 'ɵ', '', 0, 23], [23, 'ˈɪ', '', 0, 24], [24, 'ŋ', '', 0, 25], [25, 'k', '', 0, 26], [26, '', 'think', 5.616771097666572, 27], [27, 'ˈɑɪ', '', 0, 28], [28, '', 'i', 4.923623917106626, 29], [29, 'n', '', 0, 30], [30, 'ˈoʊ', '', 0, 31], [31, '', 'know', 5.616771097666572, 32], [32, 'h', '', 0, 33], [33, 'ɪ', '', 0, 34], [34, 'z', '', 0, 35], [35, '', 'his', 3.670860948611258, 36], [36, 'h', '', 0, 37], [37, 'ˈaʊ', '', 0, 38], [38, 's', '', 0, 39], [39, '', 'house', 5.616771097666572, 40], [40, 'ɪ', '', 0, 41], [41, 'z', '', 0, 42], [42, '', 'is', 4.518158808998462, 43]]

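One reason to topologically sort the graph is that it makes the bestpath, forward, and backward algorithms much more efficient. When every transition goes from a lower-numbered state to a higher-numbered one, each algorithm can visit each transition exactly once, in a single pass in state order (or in reverse state order for the backward algorithm), because every predecessor of a state has already been finalized by the time that state is reached.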
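Here is a Viterbi-style (min-cost) sketch of that single pass, assuming a topologically numbered FST with start state 0 and weights that are negative log probabilities. The name `fst_bestpath_sketch` is hypothetical; it mirrors the `delta, psi, bestpath` outputs of the cell below but is not the reference `mp5.todo_fstbestpath`.

```python
import math

def fst_bestpath_sketch(fst, finals):
    """Min-cost path through an FST whose states are numbered so that every
    transition goes from a lower to a higher state; start state is 0."""
    nstates = 1 + max(max(t[0], t[4]) for t in fst)
    delta = [math.inf] * nstates   # best (lowest) cost to reach each state
    psi = [None] * nstates         # index of the best incoming transition
    delta[0] = 0.0
    # Sorting by source state is enough: every transition into state s comes
    # from a state < s, so delta[s] is final before s's transitions are used.
    for i in sorted(range(len(fst)), key=lambda i: fst[i][0]):
        prev, _, _, weight, nxt = fst[i]
        cost = delta[prev] + weight
        if cost < delta[nxt]:
            delta[nxt] = cost
            psi[nxt] = i
    # Backtrace from the cheapest final state
    state = min(finals, key=lambda s: delta[s])
    if delta[state] == math.inf:
        return delta, psi, []
    bestpath = []
    while psi[state] is not None:
        t = fst[psi[state]]
        bestpath.append(t)
        state = t[0]
    bestpath.reverse()
    return delta, psi, bestpath
```

Applied to `TLG_sorted` and `TLGfinal_sorted`, the non-empty outputs of the returned path (`t[2]`) give a recognized word sequence of the kind printed in the next cell.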

In [96]:
importlib.reload(mp5)
#delta, psi, bestpath = mp5.todo_fstbestpath(TLG_sorted,TLGfinal_sorted)
bestpath = solutions['bestpath']
print([ t[2] for t in bestpath if t[2]!='' ])
['whose', 'woods', 'the', 'czar', 'i', 'think', 'i', 'know', 'his', 'house', 'is', 'in', 'the', 'village', 'though', 'he', 'will', 'not', 'see', 'me', 'stopping', 'here', 'to', 'watch', 'his', 'woods', 'fill', 'up', 'with', 'snow', 'me', 'little', 'horse', 'must', 'think', 'it', 'queer', 'to', 'stop', 'without', 'a', 'farm', 'house', 'near', 'between', 'the', 'woods', 'and', 'frozen', 'lake', 'the', 'darkest', 'evening', 'of', 'the', 'year', 'he', 'gives', 'his', 'harness', 'bells', 'a', 'shake', 'to', 'ask', 'if', 'there', 'is', 'some', 'mistake', 'the', 'only', 'other', 'sounds', 'the', 'sweep', 'of', 'easy', 'wind', 'and', 'downy', 'flake', 'the', 'woods', 'our', 'lovely', 'dark', 'and', 'deep', 'but', 'i', 'have', 'promises', 'to', 'keep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep']

Re-estimation

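Re-estimation of an FST is much like re-estimation of an HMM. We run the forward algorithm, then the backward algorithm, and then, for each transition $t$ from state $p[t]$ to state $n[t]$ with weight $w[t]$, compute $\xi_t=\alpha_{p[t]}\otimes w[t]\otimes\beta_{n[t]}$. Since the weights are negative log probabilities, $\otimes$ is addition, so $\xi_t=\alpha_{p[t]}+w[t]+\beta_{n[t]}$. Finally, each transition's weight is set to $w[t]=-\ln\left(e^{-\xi_t}\big/\sum_{u:p[u]=p[t]}e^{-\xi_u}\right)$, the negative log of its share of the posterior among all transitions leaving the same state.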
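The three steps can be sketched as below. This is a minimal illustration, not the reference `mp5.todo_fstforward` / `mp5.todo_fstbackward` code; all function names are hypothetical. It assumes topological state numbering with start state 0, final states with no extra final weight ($\beta=0$ there), and that $\oplus$ is the log-semiring sum $-\ln(e^{-a}+e^{-b})$. The sketches return lists indexed by state, whereas the notebook's `alpha` and `beta` are dicts; `dict(enumerate(alpha))` converts between them.

```python
import math
from collections import defaultdict

def neglogadd(a, b):
    """Log-semiring sum of two costs: -ln(exp(-a) + exp(-b)), computed stably."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

def num_states(fst):
    return 1 + max(max(t[0], t[4]) for t in fst)

def fst_forward_sketch(fst, start=0):
    """alpha[s] = -ln(sum over paths start->s of exp(-path cost)).
    Assumes topological numbering: every transition has prev < next."""
    alpha = [math.inf] * num_states(fst)
    alpha[start] = 0.0
    for t in sorted(fst, key=lambda t: t[0]):  # increasing source state
        alpha[t[4]] = neglogadd(alpha[t[4]], alpha[t[0]] + t[3])
    return alpha

def fst_backward_sketch(fst, finals):
    """beta[s] = -ln(sum over paths s->final of exp(-path cost))."""
    beta = [math.inf] * num_states(fst)
    for s in finals:
        beta[s] = 0.0
    for t in sorted(fst, key=lambda t: t[0], reverse=True):  # decreasing source
        beta[t[0]] = neglogadd(beta[t[0]], t[3] + beta[t[4]])
    return beta

def fst_reestimate_sketch(fst, alpha, beta):
    """Copy of fst with each weight replaced by -ln(exp(-xi_t) / sum over
    transitions u leaving the same state of exp(-xi_u))."""
    # xi_t = alpha[p] ⊗ w ⊗ beta[n], i.e. a sum in negative-log space
    xi = [alpha[t[0]] + t[3] + beta[t[4]] for t in fst]
    # Log-semiring total of xi over the transitions leaving each state
    norm = defaultdict(lambda: math.inf)
    for t, x in zip(fst, xi):
        norm[t[0]] = neglogadd(norm[t[0]], x)
    new_fst = []
    for t, x in zip(fst, xi):
        if norm[t[0]] == math.inf:
            w = t[3]              # no posterior mass leaves this state: keep old weight
        else:
            w = x - norm[t[0]]    # inf for transitions that lead to a dead end
        new_fst.append([t[0], t[1], t[2], w, t[4]])
    return new_fst
```

The normalizer does not need the total cost of all paths ($\beta_0$, which equals $\alpha$ at the final state): subtracting it from every $\xi_u$ would cancel in the ratio.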

In [97]:
importlib.reload(mp5)
#alpha = mp5.todo_fstforward(TLG_sorted)
alpha = { int(k):v for (k,v) in solutions['alpha'].items() }
print(alpha)
{0: 0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 5.616771097666572, 5: 5.616771097666572, 6: 5.616771097666572, 7: 5.616771097666572, 8: 5.616771097666572, 9: 11.233542195333143, 10: 11.233542195333143, 11: 11.233542195333143, 12: 16.157166112439768, 13: 11.233542195333143, 14: 16.157166112439768, 15: 16.850313292999715, 16: 16.157166112439768, 17: 16.850313292999715, 18: 16.157166112439768, 19: 16.850313292999715, 20: 20.38764284898645, 21: 20.38764284898645, 22: 25.311266766093077, 23: 25.311266766093077, 24: 25.311266766093077, 25: 25.311266766093077, 26: 25.311266766093077, 27: 30.92803786375965, 28: 30.92803786375965, 29: 35.85166178086627, 30: 35.85166178086627, 31: 35.85166178086627, 32: 41.468432878532845, 33: 41.468432878532845, 34: 41.468432878532845, 35: 41.468432878532845, 36: 45.1392938271441, 37: 45.1392938271441, 38: 45.1392938271441, 39: 45.1392938271441, 40: 50.75606492481067, 41: 50.75606492481067, 42: 50.75606492481067, 43: 55.27422373380914, 44: 55.27422373380914, 45: 55.27422373380914, 46: 59.504700470355814, 47: 59.504700470355814, 48: 59.504700470355814, 49: 64.42832438746244, 50: 64.42832438746244, 51: 64.42832438746244, 52: 64.42832438746244, 53: 64.42832438746244, 54: 64.42832438746244, 55: 70.04509548512901, 56: 70.04509548512901, 57: 70.04509548512901, 58: 75.66186658279558, 59: 75.66186658279558, 60: 75.66186658279558, 61: 80.18002539179405, 62: 80.18002539179405, 63: 80.18002539179405, 64: 80.18002539179405, 65: 85.79679648946062, 66: 85.79679648946062, 67: 85.79679648946062, 68: 85.79679648946062, 69: 91.41356758712719, 70: 91.41356758712719, 71: 91.41356758712719, 72: 97.03033868479376, 73: 97.03033868479376, 74: 97.03033868479376, 75: 101.26081542134044, 76: 101.26081542134044, 77: 101.26081542134044, 78: 101.26081542134044, 79: 101.26081542134044, 80: 106.87758651900701, 81: 101.26081542134044, 82: 106.87758651900701, 83: 101.26081542134044, 84: 106.87758651900701, 85: 106.87758651900701, 86: 106.87758651900701, 87: 106.87758651900701, 88: 
112.49435761667358, 89: 112.49435761667358, 90: 112.49435761667358, 91: 115.91390413700394, 92: 115.91390413700394, 93: 115.91390413700394, 94: 115.91390413700394, 95: 121.53067523467051, 96: 121.53067523467051, 97: 121.53067523467051, 98: 121.53067523467051, 99: 125.20153618328177, 100: 125.20153618328177, 101: 125.20153618328177, 102: 125.20153618328177, 103: 125.20153618328177, 104: 130.81830728094835, 105: 130.81830728094835, 106: 130.81830728094835, 107: 130.81830728094835, 108: 136.43507837861492, 109: 136.43507837861492, 110: 136.43507837861492, 111: 142.0518494762815, 112: 142.0518494762815, 113: 142.0518494762815, 114: 142.0518494762815, 115: 147.66862057394806, 116: 147.66862057394806, 117: 147.66862057394806, 118: 147.66862057394806, 119: 153.28539167161463, 120: 153.28539167161463, 121: 153.28539167161463, 122: 157.5158684081613, 123: 157.5158684081613, 124: 157.5158684081613, 125: 157.5158684081613, 126: 157.5158684081613, 127: 163.13263950582788, 128: 163.13263950582788, 129: 163.13263950582788, 130: 163.13263950582788, 131: 163.13263950582788, 132: 168.74941060349445, 133: 168.74941060349445, 134: 168.74941060349445, 135: 168.74941060349445, 136: 168.74941060349445, 137: 174.36618170116103, 138: 174.36618170116103, 139: 174.36618170116103, 140: 174.36618170116103, 141: 174.36618170116103, 142: 179.9829527988276, 143: 179.9829527988276, 144: 179.9829527988276, 145: 185.59972389649417, 146: 185.59972389649417, 147: 185.59972389649417, 148: 185.59972389649417, 149: 185.59972389649417, 150: 191.21649499416074, 151: 191.21649499416074, 152: 191.21649499416074, 153: 194.6360415144911, 154: 194.6360415144911, 155: 194.6360415144911, 156: 194.6360415144911, 157: 194.6360415144911, 158: 200.25281261215767, 159: 200.25281261215767, 160: 200.25281261215767, 161: 200.25281261215767, 162: 205.86958370982424, 163: 200.25281261215767, 164: 205.86958370982424, 165: 200.25281261215767, 166: 205.86958370982424, 167: 205.86958370982424, 168: 210.79320762693087, 169: 
210.79320762693087, 170: 210.79320762693087, 171: 210.79320762693087, 172: 210.79320762693087, 173: 216.40997872459744, 174: 216.40997872459744, 175: 216.40997872459744, 176: 216.40997872459744, 177: 222.026749822264, 178: 222.026749822264, 179: 222.026749822264, 180: 222.026749822264, 181: 227.64352091993058, 182: 227.64352091993058, 183: 227.64352091993058, 184: 232.5671448370372, 185: 227.64352091993058, 186: 232.5671448370372, 187: 227.64352091993058, 188: 232.5671448370372, 189: 227.64352091993058, 190: 227.64352091993058, 191: 233.26029201759715, 192: 233.26029201759715, 193: 233.26029201759715, 194: 238.18391593470378, 195: 238.18391593470378, 196: 238.18391593470378, 197: 238.18391593470378, 198: 238.18391593470378, 199: 243.80068703237035, 200: 243.80068703237035, 201: 248.72431094947697, 202: 243.80068703237035, 203: 248.72431094947697, 204: 243.80068703237035, 205: 247.80802021760283, 206: 247.80802021760283, 207: 247.80802021760283, 208: 247.80802021760283, 209: 247.80802021760283, 210: 247.80802021760283, 211: 253.4247913152694, 212: 253.4247913152694, 213: 253.4247913152694, 214: 253.4247913152694, 215: 259.04156241293595, 216: 259.04156241293595, 217: 259.04156241293595, 218: 263.96518633004257, 219: 263.96518633004257, 220: 263.96518633004257, 221: 263.96518633004257, 222: 263.96518633004257, 223: 269.58195742770914, 224: 263.96518633004257, 225: 269.58195742770914, 226: 263.96518633004257, 227: 274.50558134481577, 228: 263.96518633004257, 229: 274.50558134481577, 230: 269.58195742770914, 231: 274.50558134481577, 232: 269.58195742770914, 233: 269.58195742770914, 234: 269.58195742770914, 235: 269.58195742770914, 236: 269.58195742770914, 237: 275.1987285253757, 238: 275.1987285253757, 239: 280.12235244248234, 240: 275.1987285253757, 241: 280.12235244248234, 242: 279.4292052619224, 243: 279.4292052619224, 244: 279.4292052619224, 245: 284.352829179029, 246: 284.352829179029, 247: 284.352829179029, 248: 284.352829179029, 249: 289.27645309613564, 250: 
289.27645309613564, 251: 289.27645309613564, 252: 293.7946119051341, 253: 293.7946119051341, 254: 293.7946119051341, 255: 293.7946119051341, 256: 293.7946119051341, 257: 299.4113830028007, 258: 299.4113830028007, 259: 299.4113830028007, 260: 299.4113830028007, 261: 303.08224395141195, 262: 303.08224395141195, 263: 303.08224395141195, 264: 303.08224395141195, 265: 303.08224395141195, 266: 303.08224395141195, 267: 303.08224395141195, 268: 308.6990150490785, 269: 308.6990150490785, 270: 308.6990150490785, 271: 308.6990150490785, 272: 308.6990150490785, 273: 314.3157861467451, 274: 314.3157861467451, 275: 319.2394100638517, 276: 319.2394100638517, 277: 319.2394100638517, 278: 319.2394100638517, 279: 324.8561811615183, 280: 324.8561811615183, 281: 324.8561811615183, 282: 328.27572768184865, 283: 328.27572768184865, 284: 328.27572768184865, 285: 328.27572768184865, 286: 333.8924987795152, 287: 333.8924987795152, 288: 333.8924987795152, 289: 339.5092698771818, 290: 339.5092698771818, 291: 339.5092698771818, 292: 339.5092698771818, 293: 344.4328937942884, 294: 344.4328937942884, 295: 344.4328937942884, 296: 348.9510526032869, 297: 348.9510526032869, 298: 348.9510526032869, 299: 348.9510526032869, 300: 354.56782370095345, 301: 354.56782370095345, 302: 354.56782370095345, 303: 354.56782370095345, 304: 354.56782370095345, 305: 354.56782370095345, 306: 354.56782370095345, 307: 360.18459479862, 308: 360.18459479862, 309: 360.18459479862, 310: 365.10821871572665, 311: 365.10821871572665, 312: 365.10821871572665, 313: 365.10821871572665, 314: 365.10821871572665, 315: 370.7249898133932, 316: 370.7249898133932, 317: 370.7249898133932, 318: 370.7249898133932, 319: 376.3417609110598, 320: 376.3417609110598, 321: 376.3417609110598, 322: 376.3417609110598, 323: 376.3417609110598, 324: 376.3417609110598, 325: 381.95853200872637, 326: 381.95853200872637, 327: 381.95853200872637, 328: 386.882155925833, 329: 386.882155925833, 330: 386.882155925833, 331: 386.882155925833, 332: 
386.882155925833, 333: 392.49892702349956, 334: 392.49892702349956, 335: 397.4225509406062, 336: 392.49892702349956, 337: 397.4225509406062, 338: 396.72940376004624, 339: 396.72940376004624, 340: 396.72940376004624, 341: 396.72940376004624, 342: 402.3461748577128, 343: 402.3461748577128, 344: 402.3461748577128, 345: 402.3461748577128, 346: 407.26979877481944, 347: 402.3461748577128, 348: 407.26979877481944, 349: 407.9629459553794, 350: 407.9629459553794, 351: 412.886569872486, 352: 407.9629459553794, 353: 412.886569872486, 354: 407.9629459553794, 355: 411.97027914061186, 356: 411.97027914061186, 357: 411.97027914061186, 358: 411.97027914061186, 359: 416.8939030577185, 360: 411.97027914061186, 361: 417.58705023827844, 362: 417.58705023827844, 363: 417.58705023827844, 364: 417.58705023827844, 365: 417.58705023827844, 366: 423.203821335945, 367: 423.203821335945, 368: 423.203821335945, 369: 428.12744525305163, 370: 428.12744525305163, 371: 428.12744525305163, 372: 428.12744525305163, 373: 428.12744525305163, 374: 433.7442163507182, 375: 433.7442163507182, 376: 433.7442163507182, 377: 437.9746930872649, 378: 437.9746930872649, 379: 437.9746930872649, 380: 437.9746930872649, 381: 437.9746930872649, 382: 437.9746930872649, 383: 443.59146418493145, 384: 443.59146418493145, 385: 443.59146418493145, 386: 443.59146418493145, 387: 443.59146418493145, 388: 449.208235282598, 389: 443.59146418493145, 390: 449.208235282598, 391: 454.13185919970465, 392: 449.208235282598, 393: 454.13185919970465, 394: 449.208235282598, 395: 453.2155684678305, 396: 453.2155684678305, 397: 453.2155684678305, 398: 453.2155684678305, 399: 458.8323395654971, 400: 458.8323395654971, 401: 458.8323395654971, 402: 458.8323395654971, 403: 463.7559634826037, 404: 463.7559634826037, 405: 468.67958739971033, 406: 468.67958739971033, 407: 468.67958739971033, 408: 468.67958739971033, 409: 474.2963584973769, 410: 474.2963584973769, 411: 474.2963584973769, 412: 474.2963584973769, 413: 474.2963584973769, 414: 
474.2963584973769, 415: 474.2963584973769, 416: 474.2963584973769, 417: 474.2963584973769, 418: 479.9131295950435, 419: 479.9131295950435, 420: 479.9131295950435, 421: 483.33267611537383, 422: 483.33267611537383, 423: 483.33267611537383, 424: 483.33267611537383, 425: 488.9494472130404, 426: 488.9494472130404, 427: 493.873071130147, 428: 488.9494472130404, 429: 493.873071130147, 430: 488.9494472130404, 431: 492.9567803982729, 432: 492.9567803982729, 433: 492.9567803982729, 434: 497.8804043153795, 435: 492.9567803982729, 436: 492.9567803982729, 437: 498.57355149593945, 438: 498.57355149593945, 439: 498.57355149593945, 440: 501.9930980162698, 441: 501.9930980162698, 442: 501.9930980162698, 443: 507.6098691139364, 444: 507.6098691139364, 445: 507.6098691139364, 446: 507.6098691139364, 447: 507.6098691139364, 448: 507.6098691139364, 449: 513.2266402116029, 450: 513.2266402116029, 451: 518.1502641287095, 452: 518.1502641287095, 453: 518.1502641287095, 454: 518.1502641287095, 455: 518.1502641287095, 456: 523.767035226376, 457: 523.767035226376, 458: 528.6906591434827, 459: 523.767035226376, 460: 528.6906591434827, 461: 523.767035226376, 462: 527.7743684116085, 463: 527.7743684116085, 464: 527.7743684116085, 465: 532.6979923287151, 466: 527.7743684116085, 467: 527.7743684116085, 468: 533.391139509275, 469: 533.391139509275, 470: 533.391139509275, 471: 536.8106860296053, 472: 536.8106860296053, 473: 536.8106860296053, 474: 542.427457127272, 475: 542.427457127272, 476: 542.427457127272, 477: 542.427457127272, 478: 542.427457127272, 479: 542.427457127272, 480: 548.0442282249385, 481: 548.0442282249385, 482: 552.9678521420451, 483: 552.9678521420451, 484: 552.9678521420451, 485: 552.9678521420451, 486: 552.9678521420451, 487: 558.5846232397116}
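The dictionary printed above maps each state of the composed transducer to its forward cost. This is the lowest total weight, a sum of -log probabilities, of any path from the start state to that state. Below is a minimal sketch of such a forward pass in the tropical (min, +) semiring. It rests on several assumptions. First, each edge is a `(src, ilabel, olabel, weight, dst)` tuple, which is the layout of the FST printouts in this notebook. Second, states are numbered in topological order, so `src < dst` on every edge. Third, the function name `forward_costs` is hypothetical. This is not the reference `mp5.py` implementation.

```python
import math

def forward_costs(edges, start=0):
    """Tropical-semiring forward pass: alpha[q] is the lowest total weight
    (sum of -log probabilities) of any path from `start` to state q.

    Assumes each edge is a (src, ilabel, olabel, weight, dst) tuple and that
    states are numbered in topological order, i.e. src < dst for every edge.
    """
    states = {e[0] for e in edges} | {e[4] for e in edges} | {start}
    alpha = {q: math.inf for q in states}
    alpha[start] = 0.0
    # Visiting edges by increasing source state guarantees every edge into
    # `src` has already been relaxed, so alpha[src] is final when it is used.
    for src, _, _, weight, dst in sorted(edges, key=lambda e: e[0]):
        alpha[dst] = min(alpha[dst], alpha[src] + weight)
    return alpha

# Toy example (hypothetical data): two ways to reach state 1, two ways to reach 2.
toy_edges = [(0, 'a', '', 1.0, 1), (0, 'b', '', 2.0, 1),
             (1, 'c', 'x', 0.5, 2), (0, 'd', '', 5.0, 2)]
print(forward_costs(toy_edges)[2])  # 1.5
```

This sketch sorts edges by source state and relaxes each one once. Under the topological-order assumption, that is enough for an acyclic transducer, and it avoids a priority queue.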
In [99]:
importlib.reload(mp5)
#beta = mp5.todo_fstbackward(TLG_sorted, TLGfinal_sorted)
beta = { int(k):v for (k,v) in solutions['beta'].items() }
print(beta)
{0: 558.5846232397118, 1: 558.5846232397118, 2: 558.5846232397118, 3: 558.5846232397118, 4: 552.9678521420453, 5: 552.9678521420453, 6: 552.9678521420453, 7: 552.9678521420453, 8: 552.9678521420453, 9: 547.3510810443788, 10: 547.3510810443788, 11: 547.3510810443788, 12: 543.1206043078321, 13: 548.0442282249387, 14: 543.1206043078321, 15: 542.4274571272722, 16: 543.1206043078321, 17: 542.4274571272722, 18: 543.1206043078321, 19: 542.4274571272722, 20: 538.1969803907255, 21: 538.1969803907255, 22: 533.2733564736188, 23: 533.2733564736188, 24: 533.2733564736188, 25: 533.2733564736188, 26: 533.2733564736188, 27: 527.6565853759523, 28: 527.6565853759523, 29: 522.7329614588457, 30: 522.7329614588457, 31: 522.7329614588457, 32: 517.1161903611791, 33: 517.1161903611791, 34: 517.1161903611791, 35: 517.1161903611791, 36: 513.4453294125678, 37: 513.4453294125678, 38: 513.4453294125678, 39: 513.4453294125678, 40: 507.8285583149012, 41: 507.8285583149012, 42: 507.8285583149012, 43: 503.31039950590275, 44: 503.31039950590275, 45: 503.31039950590275, 46: 499.0799227693561, 47: 499.0799227693561, 48: 499.0799227693561, 49: 494.15629885224945, 50: 494.15629885224945, 51: 494.15629885224945, 52: 494.15629885224945, 53: 494.15629885224945, 54: 494.15629885224945, 55: 488.5395277545829, 56: 488.5395277545829, 57: 488.5395277545829, 58: 482.9227566569163, 59: 482.9227566569163, 60: 482.9227566569163, 61: 478.40459784791784, 62: 478.40459784791784, 63: 478.40459784791784, 64: 478.40459784791784, 65: 472.78782675025127, 66: 472.78782675025127, 67: 472.78782675025127, 68: 472.78782675025127, 69: 467.1710556525847, 70: 467.1710556525847, 71: 467.1710556525847, 72: 461.5542845549181, 73: 461.5542845549181, 74: 461.5542845549181, 75: 457.32380781837145, 76: 457.32380781837145, 77: 457.32380781837145, 78: 457.32380781837145, 79: 457.32380781837145, 80: inf, 81: 457.32380781837145, 82: inf, 83: 457.32380781837145, 84: 451.7070367207049, 85: 451.7070367207049, 86: 451.7070367207049, 87: 
451.7070367207049, 88: 446.0902656230383, 89: 446.0902656230383, 90: 446.0902656230383, 91: 442.67071910270795, 92: 442.67071910270795, 93: 442.67071910270795, 94: 442.67071910270795, 95: 437.0539480050414, 96: 437.0539480050414, 97: 437.0539480050414, 98: 437.0539480050414, 99: 433.3830870564301, 100: 433.3830870564301, 101: 433.3830870564301, 102: 433.3830870564301, 103: 433.3830870564301, 104: 427.7663159587635, 105: 427.7663159587635, 106: 427.7663159587635, 107: 427.7663159587635, 108: 422.14954486109696, 109: 422.14954486109696, 110: 422.14954486109696, 111: 416.5327737634304, 112: 416.5327737634304, 113: 416.5327737634304, 114: 416.5327737634304, 115: 410.9160026657638, 116: 410.9160026657638, 117: 410.9160026657638, 118: 410.9160026657638, 119: 405.29923156809724, 120: 405.29923156809724, 121: 405.29923156809724, 122: 401.06875483155056, 123: 401.06875483155056, 124: 401.06875483155056, 125: 401.06875483155056, 126: 401.06875483155056, 127: 395.451983733884, 128: 395.451983733884, 129: 395.451983733884, 130: 395.451983733884, 131: 395.451983733884, 132: 389.8352126362174, 133: 389.8352126362174, 134: 389.8352126362174, 135: 389.8352126362174, 136: 389.8352126362174, 137: 384.21844153855085, 138: 384.21844153855085, 139: 384.21844153855085, 140: 384.21844153855085, 141: 384.21844153855085, 142: 378.6016704408843, 143: 378.6016704408843, 144: 378.6016704408843, 145: 372.9848993432177, 146: 372.9848993432177, 147: 372.9848993432177, 148: 372.9848993432177, 149: 372.9848993432177, 150: 367.36812824555113, 151: 367.36812824555113, 152: 367.36812824555113, 153: 363.9485817252208, 154: 363.9485817252208, 155: 363.9485817252208, 156: 363.9485817252208, 157: 363.9485817252208, 158: 358.3318106275542, 159: 358.3318106275542, 160: 358.3318106275542, 161: 358.3318106275542, 162: inf, 163: 358.3318106275542, 164: inf, 165: 358.3318106275542, 166: 352.71503952988763, 167: 352.71503952988763, 168: 347.791415612781, 169: 347.791415612781, 170: 347.791415612781, 171: 
347.791415612781, 172: 347.791415612781, 173: 342.17464451511444, 174: 342.17464451511444, 175: 342.17464451511444, 176: 342.17464451511444, 177: 336.55787341744787, 178: 336.55787341744787, 179: 336.55787341744787, 180: 336.55787341744787, 181: 330.9411023197813, 182: 330.9411023197813, 183: 330.9411023197813, 184: inf, 185: 330.9411023197813, 186: inf, 187: 330.9411023197813, 188: inf, 189: 330.9411023197813, 190: 330.9411023197813, 191: 325.3243312221147, 192: 325.3243312221147, 193: 325.3243312221147, 194: 320.4007073050081, 195: 320.4007073050081, 196: 320.4007073050081, 197: 320.4007073050081, 198: 320.4007073050081, 199: 314.7839362073415, 200: 314.7839362073415, 201: inf, 202: 314.7839362073415, 203: inf, 204: 314.7839362073415, 205: 310.77660302210904, 206: 310.77660302210904, 207: 310.77660302210904, 208: 310.77660302210904, 209: 310.77660302210904, 210: 310.77660302210904, 211: 305.15983192444247, 212: 305.15983192444247, 213: 305.15983192444247, 214: 305.15983192444247, 215: 299.5430608267759, 216: 299.5430608267759, 217: 299.5430608267759, 218: 294.6194369096693, 219: 294.6194369096693, 220: 294.6194369096693, 221: 294.6194369096693, 222: 294.6194369096693, 223: inf, 224: 294.6194369096693, 225: inf, 226: 294.6194369096693, 227: inf, 228: 294.6194369096693, 229: inf, 230: 289.0026658120027, 231: inf, 232: 289.0026658120027, 233: 289.0026658120027, 234: 289.0026658120027, 235: 289.0026658120027, 236: 289.0026658120027, 237: 283.38589471433613, 238: 283.38589471433613, 239: inf, 240: 283.38589471433613, 241: inf, 242: 279.15541797778945, 243: 279.15541797778945, 244: 279.15541797778945, 245: 274.23179406068283, 246: 274.23179406068283, 247: 274.23179406068283, 248: 274.23179406068283, 249: 269.3081701435762, 250: 269.3081701435762, 251: 269.3081701435762, 252: 264.79001133457774, 253: 264.79001133457774, 254: 264.79001133457774, 255: 264.79001133457774, 256: 264.79001133457774, 257: 259.17324023691117, 258: 259.17324023691117, 259: 259.17324023691117, 
260: 259.17324023691117, 261: 255.5023792882999, 262: 255.5023792882999, 263: 255.5023792882999, 264: 255.5023792882999, 265: 255.5023792882999, 266: 255.5023792882999, 267: 255.5023792882999, 268: 249.88560819063332, 269: 249.88560819063332, 270: 249.88560819063332, 271: 249.88560819063332, 272: 249.88560819063332, 273: 244.26883709296675, 274: 244.26883709296675, 275: 239.34521317586012, 276: 239.34521317586012, 277: 239.34521317586012, 278: 239.34521317586012, 279: 233.72844207819355, 280: 233.72844207819355, 281: 233.72844207819355, 282: 230.3088955578632, 283: 230.3088955578632, 284: 230.3088955578632, 285: 230.3088955578632, 286: 224.69212446019662, 287: 224.69212446019662, 288: 224.69212446019662, 289: 219.07535336253005, 290: 219.07535336253005, 291: 219.07535336253005, 292: 219.07535336253005, 293: 214.15172944542343, 294: 214.15172944542343, 295: 214.15172944542343, 296: 209.63357063642496, 297: 209.63357063642496, 298: 209.63357063642496, 299: 209.63357063642496, 300: 204.0167995387584, 301: 204.0167995387584, 302: 204.0167995387584, 303: 204.0167995387584, 304: 204.0167995387584, 305: 204.0167995387584, 306: 204.0167995387584, 307: 198.40002844109182, 308: 198.40002844109182, 309: 198.40002844109182, 310: 193.4764045239852, 311: 193.4764045239852, 312: 193.4764045239852, 313: 193.4764045239852, 314: 193.4764045239852, 315: 187.85963342631862, 316: 187.85963342631862, 317: 187.85963342631862, 318: 187.85963342631862, 319: 182.24286232865205, 320: 182.24286232865205, 321: 182.24286232865205, 322: 182.24286232865205, 323: 182.24286232865205, 324: 182.24286232865205, 325: 176.62609123098548, 326: 176.62609123098548, 327: 176.62609123098548, 328: 171.70246731387886, 329: 171.70246731387886, 330: 171.70246731387886, 331: 171.70246731387886, 332: 171.70246731387886, 333: 166.08569621621228, 334: 166.08569621621228, 335: inf, 336: 166.08569621621228, 337: inf, 338: 161.8552194796656, 339: 161.8552194796656, 340: 161.8552194796656, 341: 161.8552194796656, 342: 
156.23844838199904, 343: 156.23844838199904, 344: 156.23844838199904, 345: 156.23844838199904, 346: inf, 347: 156.23844838199904, 348: inf, 349: 150.62167728433246, 350: 150.62167728433246, 351: inf, 352: 150.62167728433246, 353: inf, 354: 150.62167728433246, 355: 146.61434409909998, 356: 146.61434409909998, 357: 146.61434409909998, 358: 146.61434409909998, 359: inf, 360: 146.61434409909998, 361: 140.9975730014334, 362: 140.9975730014334, 363: 140.9975730014334, 364: 140.9975730014334, 365: 140.9975730014334, 366: 135.38080190376684, 367: 135.38080190376684, 368: 135.38080190376684, 369: 130.4571779866602, 370: 130.4571779866602, 371: 130.4571779866602, 372: 130.4571779866602, 373: 130.4571779866602, 374: 124.84040688899364, 375: 124.84040688899364, 376: 124.84040688899364, 377: 120.60993015244696, 378: 120.60993015244696, 379: 120.60993015244696, 380: 120.60993015244696, 381: 120.60993015244696, 382: 120.60993015244696, 383: 114.99315905478039, 384: 114.99315905478039, 385: 114.99315905478039, 386: 114.99315905478039, 387: 114.99315905478039, 388: 109.37638795711382, 389: inf, 390: 109.37638795711382, 391: inf, 392: 109.37638795711382, 393: inf, 394: 109.37638795711382, 395: 105.36905477188135, 396: 105.36905477188135, 397: 105.36905477188135, 398: 105.36905477188135, 399: 99.75228367421478, 400: 99.75228367421478, 401: 99.75228367421478, 402: 99.75228367421478, 403: 94.82865975710816, 404: 94.82865975710816, 405: 89.90503584000153, 406: 89.90503584000153, 407: 89.90503584000153, 408: 89.90503584000153, 409: 84.28826474233496, 410: 84.28826474233496, 411: 84.28826474233496, 412: 84.28826474233496, 413: 84.28826474233496, 414: 84.28826474233496, 415: 84.28826474233496, 416: 84.28826474233496, 417: 84.28826474233496, 418: 78.67149364466839, 419: 78.67149364466839, 420: 78.67149364466839, 421: 75.25194712433803, 422: 75.25194712433803, 423: 75.25194712433803, 424: 75.25194712433803, 425: 69.63517602667146, 426: 69.63517602667146, 427: inf, 428: 69.63517602667146, 
429: inf, 430: 69.63517602667146, 431: 65.62784284143899, 432: 65.62784284143899, 433: 65.62784284143899, 434: inf, 435: 65.62784284143899, 436: 65.62784284143899, 437: 60.01107174377242, 438: 60.01107174377242, 439: 60.01107174377242, 440: 56.59152522344207, 441: 56.59152522344207, 442: 56.59152522344207, 443: 50.9747541257755, 444: 50.9747541257755, 445: 50.9747541257755, 446: 50.9747541257755, 447: 50.9747541257755, 448: 50.9747541257755, 449: 45.357983028108926, 450: 45.357983028108926, 451: 40.4343591110023, 452: 40.4343591110023, 453: 40.4343591110023, 454: 40.4343591110023, 455: 40.4343591110023, 456: 34.81758801333573, 457: 34.81758801333573, 458: inf, 459: 34.81758801333573, 460: inf, 461: 34.81758801333573, 462: 30.81025482810326, 463: 30.81025482810326, 464: 30.81025482810326, 465: inf, 466: 30.81025482810326, 467: 30.81025482810326, 468: 25.19348373043669, 469: 25.19348373043669, 470: 25.19348373043669, 471: 21.77393721010634, 472: 21.77393721010634, 473: 21.77393721010634, 474: 16.157166112439768, 475: 16.157166112439768, 476: 16.157166112439768, 477: 16.157166112439768, 478: 16.157166112439768, 479: 16.157166112439768, 480: 10.540395014773198, 481: 10.540395014773198, 482: 5.616771097666572, 483: 5.616771097666572, 484: 5.616771097666572, 485: 5.616771097666572, 486: 5.616771097666572, 487: 0}
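The backward costs run in the opposite direction. Here `beta[q]` is the lowest total weight from state `q` to a final state, so `beta[487]` is 0 and `beta[0]` is the cost of the best complete path. That cost matches the forward cost of the last state up to rounding: `alpha[487] = 558.5846232397116` and `beta[0] = 558.5846232397118`. States printed as `inf` are states from which no finite-cost path reaches a final state.

Below is a minimal sketch of the backward pass and of how `alpha` and `beta` together pick out the edges on a best path. A Viterbi-style re-estimation step could count those edges, but whether `fstreestimate` works exactly this way is an assumption. The sketch makes further assumptions. Edge tuples have the form `(src, ilabel, olabel, weight, dst)`, and states are topologically numbered. `finals` is a hypothetical `{state: final_weight}` dict, and the real format of `TLGfinal_sorted` may differ. The helper names are hypothetical.

```python
import math

def backward_costs(edges, finals):
    """Tropical-semiring backward pass: beta[q] is the lowest total weight of
    any path from state q to a final state, including the final weight.

    Assumes (src, ilabel, olabel, weight, dst) edge tuples, topologically
    numbered states (src < dst), and `finals` as a dict {state: final_weight}.
    States that cannot reach a final state keep beta = inf.
    """
    states = {e[0] for e in edges} | {e[4] for e in edges} | set(finals)
    beta = {q: finals.get(q, math.inf) for q in states}
    # Visiting edges by decreasing source state guarantees every edge leaving
    # `dst` (whose source index dst is larger than src) was relaxed first,
    # so beta[dst] is final when it is used.
    for src, _, _, weight, dst in sorted(edges, key=lambda e: e[0], reverse=True):
        beta[src] = min(beta[src], weight + beta[dst])
    return beta

def best_path_edges(edges, alpha, beta, start=0, rel_tol=1e-9):
    """Edges on a lowest-cost path: alpha[src] + weight + beta[dst] equals
    the best total cost beta[start], up to floating-point tolerance."""
    best = beta[start]
    return [e for e in edges
            if math.isclose(alpha[e[0]] + e[3] + beta[e[4]], best, rel_tol=rel_tol)]

# Toy example (hypothetical data); state 3 cannot reach the final state 2.
toy_edges = [(0, 'a', '', 1.0, 1), (0, 'b', '', 2.0, 1), (1, 'c', 'x', 0.5, 2),
             (0, 'd', '', 5.0, 2), (1, 'e', '', math.inf, 3)]
toy_beta = backward_costs(toy_edges, {2: 0.0})
toy_alpha = {0: 0.0, 1: 1.0, 2: 1.5, 3: math.inf}  # forward costs, computed by hand
print(best_path_edges(toy_edges, toy_alpha, toy_beta))
# [(0, 'a', '', 1.0, 1), (1, 'c', 'x', 0.5, 2)]
```

The best-path test uses a relative tolerance instead of exact equality. Summing the same weights in forward and backward order can differ in the last few bits, as the two printed totals above do.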
In [100]:
importlib.reload(mp5)
LG_re,LGfinal_re=mp5.fstreestimate(TLG_sorted, L, Lfinal, alpha, beta)
print(LG_re)
[(0, 'ə', '', 2.723798558070257, 1), (1, '', 'a', 3.1780538303480625, 0), (0, 'ˌei', '', inf, 2), (2, '', 'a', 0.0, 0), (0, 'b', '', 4.158883083359569, 3), (3, 'ˌaʊ', '', inf, 4), (4, 't', '', 0.0, 5), (5, '', 'about', inf, 0), (1, 'b', '', 2.261763098473807, 6), (6, 'ˈaʊ', '', 0.0, 7), (7, 't', '', 0.0, 8), (8, '', 'about', inf, 0), (0, 'ˌɑ', '', inf, 9), (9, 'l', '', 0.0, 10), (10, 'ɛ', '', inf, 11), (11, 'k', '', 0.0, 12), (12, 's', '', 0.0, 13), (13, 'ˈɑ', '', 0.0, 14), (14, 'n', '', 0.0, 15), (15, 'd', '', 0.0, 16), (16, 'ɹ', '', 0.0, 17), (17, 'ɑ', '', inf, 18), (18, 'v', '', 0.0, 19), (19, 'ɪ', '', 0.0, 20), (20, 'tʃ', '', 0.0, 21), (21, '', 'aleksandrovich', inf, 0), (0, 'ˌæ', '', inf, 22), (22, 'l', '', 0.0, 23), (23, 'ɪ', '', 0.0, 24), (24, 'g', '', 0.0, 25), (25, 'z', '', 0.0, 26), (26, 'ˈæ', '', 0.0, 27), (27, 'n', '', 0.0, 28), (28, 'd', '', 0.0, 29), (29, 'ɚ', '', 3.091042453358341, 30), (30, '', 'alexander', inf, 0), (29, 'ə', '', 0.04652001563488284, 31), (31, 'ɹ', '', 0.0, 32), (32, '', 'alexander', inf, 0), (0, 'ɑ', '', inf, 33), (33, 'l', '', 0.0, 34), (34, '', 'all', inf, 0), (0, 'ˈɔ', '', 5.768320995793715, 35), (35, 'l', '', 0.8823891801985155, 36), (36, '', 'all', inf, 0), (1, 'n', '', 1.0379876668516772, 37), (37, 'd', '', 0.0, 38), (38, '', 'and', 0.0, 0), (0, 'ˈæ', '', 5.075173815233825, 39), (39, 'n', '', 1.4294665329850886, 40), (40, 'd', '', 0.0, 41), (41, '', 'and', 0.0, 0), (0, 'ˈɑ', '', 3.3704257229953782, 42), (42, 'ɹ', '', 0.7239188392267124, 43), (43, '', 'are', 0.6931471805598903, 0), (39, 'z', '', 1.31824089787483, 44), (44, '', 'as', inf, 0), (0, 'ˈɛ', '', 5.075173815233825, 45), (45, 'z', '', 0.8622235106038261, 46), (46, '', 'as', inf, 0), (39, 's', '', 1.3723081191451456, 47), (47, 'k', '', 0.0, 48), (48, '', 'ask', 0.0, 0), (1, 't', '', 1.0379876668516772, 49), (49, '', 'at', inf, 0), (39, 't', '', 1.4294665329850886, 50), (50, '', 'at', inf, 0), (3, 'i', '', 2.9549102790336974, 51), (51, '', 'be', inf, 0), (3, 'ˈei', '', 
3.178053830347949, 52), (52, '', 'be', inf, 0), (3, 'ˈi', '', 1.9252908618525453, 53), (53, '', 'be', inf, 0), (3, 'ɪ', '', 1.7311348474115675, 54), (54, 'f', '', 1.232143681292655, 55), (55, 'ˈoʊ', '', 0.0645385211375924, 56), (56, 'ɹ', '', 0.0, 57), (57, '', 'before', 0.0, 0), (55, 'ˈɔ', '', 2.7725887222399024, 58), (58, 'ɹ', '', 0.0, 59), (59, '', 'before', 0.0, 0), (3, 'ˌi', '', 3.871201010907953, 60), (60, 'f', '', 0.0, 61), (61, 'ˈɔ', '', 0.0, 62), (62, 'ɹ', '', 0.0, 63), (63, '', 'before', 0.0, 0), (3, 'ˈɛ', '', 3.871201010907953, 64), (64, 'l', '', 0.0, 65), (65, 'z', '', 0.0, 66), (66, '', 'bells', 0.0, 0), (51, 't', '', 0.0, 67), (67, 'w', '', 0.0, 68), (68, 'ˈi', '', 0.0, 69), (69, 'n', '', 0.0, 70), (70, '', 'between', 0.0, 0), (54, 't', '', 0.3448404862916732, 71), (71, 'w', '', 0.0, 72), (72, 'ˈi', '', 0.0, 73), (73, 'n', '', 0.0, 74), (74, '', 'between', 0.0, 0), (3, 'ˈʊ', '', 3.178053830347949, 75), (75, 'k', '', 0.0, 76), (76, '', 'book', inf, 0), (3, 'ɹ', '', 1.7917594692280545, 77), (77, 'ˈoʊ', '', 0.0, 78), (78, 'k', '', 0.0, 79), (79, '', 'broke', inf, 0), (3, 'ə', '', 1.519825753744385, 80), (80, 't', '', 0.0, 81), (81, '', 'but', 0.0, 0), (3, 'ˈʌ', '', 3.178053830347949, 82), (82, 't', '', 0.0, 83), (83, '', 'but', 0.0, 0), (3, 'ˈɑɪ', '', 2.6184380424124356, 84), (84, '', 'by', inf, 0), (0, 'k', '', 3.3704257229953782, 85), (85, 'ˈɑ', '', 1.572396640753709, 86), (86, 'm', '', 0.0, 87), (87, 'j', '', 0.0, 88), (88, 'ə', '', 0.0, 89), (89, 'n', '', 0.0, 90), (90, 'ɪ', '', 0.8043728156701491, 91), (91, 's', '', 0.0, 92), (92, 't', '', 0.0, 93), (93, '', 'communist', inf, 0), (90, 'ə', '', 0.5930637220029666, 94), (94, 's', '', 0.0, 95), (95, 't', '', 0.0, 96), (96, '', 'communist', inf, 0), (85, 'ɹ', '', 1.1977031913122573, 97), (97, 'ˈɑɪ', '', 0.0, 98), (98, 'd', '', 0.0, 99), (99, '', 'cried', inf, 0), (0, 'z', '', 2.8238820166271807, 100), (100, 'ˈɑ', '', 0.0, 101), (101, 'ɹ', '', 0.0, 102), (102, '', 'czar', 0.0, 0), (0, 'd', '', 
3.060270794691519, 103), (103, 'ˈɑ', '', 1.572396640753709, 104), (104, 'ɹ', '', 0.0, 105), (105, 'k', '', 0.0, 106), (106, '', 'dark', 3.091042453358341, 0), (106, 'ə', '', 0.04652001563488284, 107), (107, 's', '', 0.0, 108), (108, 't', '', 0.0, 109), (109, '', 'darkest', 0.0, 0), (103, 'ˈi', '', 1.331234583936748, 110), (110, 'p', '', 0.0, 111), (111, '', 'deep', 0.0, 0), (103, 'ˈɑɪ', '', 2.0243817644966384, 112), (112, 'd', '', 0.0, 113), (113, '', 'died', inf, 0), (103, 'ˈaʊ', '', 2.3608540011179002, 114), (114, 'n', '', 0.0, 115), (115, '', 'down', inf, 0), (115, 'i', '', 0.0, 116), (116, '', 'downy', 0.0, 0), (103, 'ɹ', '', 1.1977031913122573, 117), (117, 'ə', '', 0.0, 118), (118, 'm', '', 0.0, 119), (119, 'ˈæ', '', 0.0, 120), (120, 'ɾ', '', inf, 121), (121, 'ɪ', '', 0.0, 122), (122, 'k', '', 0.0, 123), (123, '', 'dramatic', inf, 0), (0, 'ˈi', '', 3.1292636661784172, 124), (124, 'z', '', 0.3136575588549704, 125), (125, 'i', '', 0.0, 126), (126, '', 'easy', 0.0, 0), (0, 'ˈei', '', 4.382026634673821, 127), (127, 't', '', 0.0, 128), (128, 'ˈi', '', 0.0, 129), (129, 'n', '', 0.0, 130), (130, '', 'eighteen', inf, 0), (45, 'm', '', 1.6094379124341458, 131), (131, 'p', '', 0.0, 132), (132, 'ɚ', '', 3.091042453358341, 133), (133, 'ɚ', '', 0.0, 134), (134, '', 'emperor', inf, 0), (132, 'ə', '', 0.04652001563488284, 135), (135, 'ɹ', '', 0.0, 136), (136, 'ə', '', 0.0, 137), (137, 'ɹ', '', 0.0, 138), (138, '', 'emperor', inf, 0), (45, 'n', '', 0.9734491457140848, 139), (139, 'd', '', 0.0, 140), (140, '', 'end', inf, 0), (0, 'ɛ', '', inf, 141), (141, 'n', '', 0.0, 142), (142, 't', '', 0.0, 143), (143, 'ˈɑɪ', '', 0.0, 144), (144, 'ɚ', '', 0.0, 145), (145, '', 'entire', inf, 0), (0, 'ɪ', '', 2.9351076517374395, 146), (146, 'n', '', 1.415281897993168, 147), (147, 't', '', 0.9954280524328851, 148), (148, 'ˈɑɪ', '', 0.0, 149), (149, 'ɚ', '', 0.0, 150), (150, '', 'entire', inf, 0), (0, 'ˌɛ', '', inf, 151), (151, 'n', '', 0.0, 152), (152, 't', '', 0.0, 153), (153, 'ˌɑɪ', '', 
inf, 154), (154, 'ɹ', '', 0.0, 155), (155, '', 'entire', inf, 0), (124, 'v', '', 1.3121863889662109, 156), (156, 'n', '', 0.0, 157), (157, 'ɪ', '', 0.0, 158), (158, 'ŋ', '', 0.0, 159), (159, '', 'evening', 0.0, 0), (0, 'f', '', 3.8224108467384212, 160), (160, 'ˈæ', '', 3.8066624897703605, 161), (161, 'm', '', 0.0, 162), (162, 'ə', '', 0.451985123743043, 163), (163, 'l', '', 0.0, 164), (164, 'i', '', 0.0, 165), (165, '', 'family', inf, 0), (162, 'l', '', 1.0116009116785563, 166), (166, 'i', '', 0.0, 167), (167, '', 'family', inf, 0), (160, 'ˈɑ', '', 2.101914397531914, 168), (168, 'ɹ', '', 0.5947071077466717, 169), (169, 'm', '', 0.0, 170), (170, '', 'farm', 0.0, 0), (168, 'ð', '', 0.8023464725249596, 171), (171, 'ɚ', '', 3.091042453358341, 172), (172, '', 'father', inf, 0), (171, 'ə', '', 0.04652001563488284, 173), (173, 'ɹ', '', 0.0, 174), (174, '', 'father', inf, 0), (160, 'ˈɪ', '', 2.1972245773362147, 175), (175, 'l', '', 0.0, 176), (176, '', 'fill', 0.0, 0), (160, 'l', '', 2.014903020542306, 177), (177, 'ˈei', '', 0.0, 178), (178, 'k', '', 0.0, 179), (179, '', 'flake', 0.0, 0), (160, 'oʊ', '', inf, 180), (180, 'ɹ', '', 0.0, 181), (181, '', 'four', inf, 0), (160, 'ˈɔ', '', 4.499809670330251, 182), (182, 'ɹ', '', 0.0, 183), (183, '', 'four', inf, 0), (160, 'ɹ', '', 1.7272209480904621, 184), (184, 'ˈoʊ', '', 0.0, 185), (185, 'z', '', 0.0, 186), (186, 'n̩', '', 0.0, 187), (187, '', 'frozen', 0.0, 0), (0, 'g', '', 4.669708707125551, 188), (188, 'ˈɪ', '', 0.9162907318742555, 189), (189, 'v', '', 0.0, 190), (190, 'z', '', 0.0, 191), (191, '', 'gives', 0.0, 0), (188, 'ˈoʊ', '', 0.5108256237659816, 192), (192, '', 'go', 2.2512917986065304, 0), (192, 'ɪ', '', 0.11122563511025874, 193), (193, 'n', '', 0.21130909366718242, 194), (194, '', 'going', inf, 0), (193, 'ŋ', '', 1.6582280766035637, 195), (195, '', 'going', inf, 0), (0, 'h', '', 3.2834143460057703, 196), (196, 'ˈæ', '', 3.688879454114044, 197), (197, 'd', '', 0.72593700338291, 198), (198, '', 'had', inf, 0), (197, 
'p', '', 1.2367626271488916, 199), (199, 'n̩', '', 0.0, 200), (200, '', 'happen', inf, 0), (196, 'ˈɑ', '', 1.9841313618755976, 201), (201, 'ɹ', '', 0.0, 202), (202, 'n', '', 0.0, 203), (203, 'ɪ', '', 0.0, 204), (204, 's', '', 0.0, 205), (205, '', 'harness', 0.0, 0), (197, 'v', '', 1.4880770554298124, 206), (206, '', 'have', 0.0, 0), (196, 'ˈi', '', 1.7429693050586366, 207), (207, '', 'he', 0.0, 0), (196, 'ˈʌ', '', 2.99573227355404, 208), (208, '', 'he', 0.0, 0), (196, 'ˈɛ', '', 3.688879454114044, 209), (209, 'l', '', 0.0, 210), (210, 'p', '', 0.0, 211), (211, '', 'help', inf, 0), (196, 'ɚ', '', 4.3820266346739345, 212), (212, 'ɹ', '', 0.0, 213), (213, '', 'her', inf, 0), (196, 'ˈɝ', '', inf, 214), (214, '', 'her', inf, 0), (196, 'ˈɪ', '', 2.0794415416798984, 215), (215, 'ɹ', '', 0.4462871026283892, 216), (216, '', 'here', 0.0, 0), (196, 'ɪ', '', 1.5488132906176588, 217), (217, 'm', '', 1.1349799328390873, 218), (218, '', 'him', inf, 0), (215, 'm', '', 1.0216512475319632, 219), (219, '', 'him', inf, 0), (146, 'm', '', 2.051270664713229, 220), (220, '', 'him', inf, 0), (217, 'z', '', 0.38776553100876754, 221), (221, '', 'his', 0.0, 0), (196, 'ˈɔ', '', 4.3820266346739345, 222), (222, 'ɹ', '', 0.0, 223), (223, 's', '', 0.0, 224), (224, '', 'horse', 0.0, 0), (196, 'ˈaʊ', '', 2.7725887222397887, 225), (225, 's', '', 0.7205461547480354, 226), (226, '', 'house', 1.945910149055294, 0), (225, 'z', '', 0.6664789334777197, 227), (227, '', 'house', 0.0, 0), (226, 'h', '', 0.1541506798272394, 228), (228, 'ˌoʊ', '', inf, 229), (229, 'l', '', 0.0, 230), (230, 'd', '', 0.0, 231), (231, '', 'household', inf, 0), (0, 'ˈɑɪ', '', 3.8224108467383076, 232), (232, '', 'i', 0.0, 0), (146, 'f', '', 2.30258509299415, 233), (233, '', 'if', 0.0, 0), (0, 'ˈɪ', '', 3.465735902799679, 234), (234, 'n', '', 0.0, 235), (235, '', 'in', 0.0, 0), (147, 'k', '', 1.4307461236908239, 236), (236, 'l', '', 0.0, 237), (237, 'u', '', inf, 238), (238, 'd', '', 0.0, 239), (239, 'ɪ', '', 0.0, 240), (240, 'ŋ', 
'', 0.0, 241), (241, '', 'including', inf, 0), (237, 'ˈu', '', 0.0, 242), (242, 'd', '', 0.0, 243), (243, 'ɪ', '', 0.0, 244), (244, 'ŋ', '', 0.0, 245), (245, '', 'including', inf, 0), (147, 's', '', 0.9382696385929421, 246), (246, 't', '', 0.0, 247), (247, 'ɹ', '', 0.0, 248), (248, 'ˈʌ', '', 0.0, 249), (249, 'k', '', 0.0, 250), (250, 't', '', 0.0, 251), (251, 'ɪ', '', 0.0, 252), (252, 'v', '', 0.0, 253), (253, '', 'instructive', inf, 0), (0, 'ˌɪ', '', inf, 254), (254, 'n', '', 0.0, 255), (255, 't', '', 0.0, 256), (256, 'ɹ', '', 0.0, 257), (257, 'oʊ', '', inf, 258), (258, 'd', '', 0.0, 259), (259, 'ˈu', '', 0.0, 260), (260, 's', '', 0.0, 261), (261, 't', '', 0.0, 262), (262, '', 'introduced', inf, 0), (257, 'ə', '', 0.0, 263), (263, 'd', '', 0.0, 264), (264, 'ˈu', '', 0.0, 265), (265, 's', '', 0.0, 266), (266, 't', '', 0.0, 267), (267, '', 'introduced', inf, 0), (146, 'z', '', 1.3040562628829093, 268), (268, '', 'is', 0.0, 0), (146, 't', '', 1.415281897993168, 269), (269, '', 'it', 2.9444389791665344, 0), (269, 's', '', 0.05406722127031571, 270), (270, '', 'its', inf, 0), (0, 'dʒ', '', 5.768320995793715, 271), (271, 'u', '', inf, 272), (272, 'l', '', 0.0, 273), (273, 'ˈɑɪ', '', 0.0, 274), (274, '', 'july', inf, 0), (271, 'ə', '', 0.0, 275), (275, 'l', '', 0.0, 276), (276, 'ˈɑɪ', '', 0.0, 277), (277, '', 'july', inf, 0), (271, 'ˌu', '', inf, 278), (278, 'l', '', 0.0, 279), (279, 'ˈɑɪ', '', 0.0, 280), (280, '', 'july', inf, 0), (85, 'ˈi', '', 1.331234583936748, 281), (281, 'p', '', 0.0, 282), (282, '', 'keep', 0.0, 0), (0, 'n', '', 2.9351076517374395, 283), (283, 'ˈoʊ', '', 1.2431935174791988, 284), (284, '', 'know', 0.0, 0), (0, 'l', '', 3.2834143460057703, 285), (285, 'ˈei', '', 1.5040773967763243, 286), (286, 'k', '', 0.0, 287), (287, '', 'lake', 0.0, 0), (285, 'ˈɝ', '', inf, 288), (288, 'n', '', 0.6632942174102254, 289), (289, 'ɪ', '', 0.0, 290), (290, 'ŋ', '', 0.0, 291), (291, '', 'learning', inf, 0), (288, 'ɹ', '', 0.7239188392267124, 292), (292, 'n', '', 0.0, 
293), (293, 'ɪ', '', 0.0, 294), (294, 'ŋ', '', 0.0, 295), (295, '', 'learning', inf, 0), (285, 'ˈɪ', '', 0.5877866649021826, 296), (296, 't', '', 0.0, 297), (297, 'l̩', '', 0.0, 298), (298, '', 'little', 0.0, 0), (296, 'ɾ', '', inf, 299), (299, 'l̩', '', 0.0, 300), (300, '', 'little', 0.0, 0), (285, 'ˈʌ', '', 1.5040773967763243, 301), (301, 'v', '', 0.0, 302), (302, 'l', '', 0.0, 303), (303, 'i', '', 0.0, 304), (304, '', 'lovely', 0.0, 0), (0, 'm', '', 3.5710964184575005, 305), (305, 'ˈæ', '', 3.5115454388310354, 306), (306, 's', '', 0.0, 307), (307, 'ə', '', 0.0, 308), (308, 'k', '', 0.0, 309), (309, 'ɚ', '', 3.091042453358341, 310), (310, '', 'massacre', inf, 0), (309, 'ə', '', 0.04652001563488284, 311), (311, 'ɹ', '', 0.0, 312), (312, '', 'massacre', inf, 0), (305, 'ˈi', '', 1.5656352897756278, 313), (313, '', 'me', 0.6931471805598903, 0), (305, 'ˈɑɪ', '', 2.258782470335518, 314), (314, 'l̩', '', 1.6739764335716245, 315), (315, 'z', '', 0.0, 316), (316, '', 'miles', 0.0, 0), (314, 'l', '', 0.28768207245184385, 317), (317, 'z', '', 0.0, 318), (318, '', 'miles', 0.0, 0), (305, 'ɪ', '', 1.37147927533465, 319), (319, 's', '', 0.0, 320), (320, 't', '', 0.0, 321), (321, 'ˈei', '', 0.0, 322), (322, 'k', '', 0.0, 323), (323, '', 'mistake', 0.0, 0), (305, 'oʊ', '', inf, 324), (324, 'ɹ', '', 0.0, 325), (325, '', 'more', inf, 0), (305, 'ˈɔ', '', 4.204692619390926, 326), (326, 'ɹ', '', 0.0, 327), (327, '', 'more', inf, 0), (305, 'ˈu', '', 4.204692619390926, 328), (328, 'v', '', 0.0, 329), (329, 'm', '', 0.0, 330), (330, 'n̩', '', 0.0, 331), (331, 't', '', 0.0, 332), (332, '', 'movement', inf, 0), (305, 'ə', '', 1.1601701816674677, 333), (333, 's', '', 0.0, 334), (334, 't', '', 0.0, 335), (335, '', 'must', 0.0, 0), (305, 'ˈʌ', '', 2.8183982582710314, 336), (336, 's', '', 0.0, 337), (337, 't', '', 0.0, 338), (338, '', 'must', 0.0, 0), (313, '', 'my', 0.6931471805598903, 0), (314, '', 'my', 2.7725887222399024, 0), (283, 'ˌi', '', 3.2580965380216185, 339), (339, 'ɹ', '', 0.0, 
340), (340, '', 'near', 0.0, 0), (283, 'ˈɪ', '', 1.6486586255874727, 341), (341, 'ɹ', '', 0.5232481437644765, 342), (342, '', 'near', 0.0, 0), (283, 'ˈɛ', '', 3.2580965380216185, 343), (343, 'v', '', 0.0, 344), (344, 'ɚ', '', 3.091042453358341, 345), (345, '', 'never', inf, 0), (344, 'ə', '', 0.04652001563488284, 346), (346, 'ɹ', '', 0.0, 347), (347, '', 'never', inf, 0), (341, 'k', '', 0.8979415932059283, 348), (348, 'ə', '', 0.451985123743043, 349), (349, 'l', '', 0.0, 350), (350, 'ə', '', 0.0, 351), (351, 's', '', 0.0, 352), (352, '', 'nicholas', inf, 0), (348, 'l', '', 1.0116009116785563, 353), (353, 'ə', '', 0.0, 354), (354, 's', '', 0.0, 355), (355, '', 'nicholas', inf, 0), (283, 'ˈɑɪ', '', 2.005333569526101, 356), (356, 'n', '', 0.0, 357), (357, 't', '', 0.0, 358), (358, 'ˈi', '', 0.3053816495512365, 359), (359, 'n', '', 0.0, 360), (360, '', 'nineteen', inf, 0), (358, 'i', '', 1.3350010667323886, 361), (361, '', 'ninety', inf, 0), (283, 'ˈɑ', '', 1.5533484457831719, 362), (362, 't', '', 0.0, 363), (363, '', 'not', 0.0, 0), (283, 'oʊ', '', inf, 364), (364, 'v', '', 0.0, 365), (365, 'ˈɛ', '', 0.0, 366), (366, 'm', '', 0.0, 367), (367, 'b', '', 0.0, 368), (368, 'ɚ', '', 3.091042453358341, 369), (369, '', 'november', inf, 0), (368, 'ə', '', 0.04652001563488284, 370), (370, 'ɹ', '', 0.0, 371), (371, '', 'november', inf, 0), (283, 'ˈaʊ', '', 2.341805806147363, 372), (372, '', 'now', inf, 0), (1, 'v', '', 1.925290861852659, 373), (373, '', 'of', 0.0, 0), (0, 'ˈoʊ', '', 3.060270794691405, 374), (374, 'l', '', 0.8823891801985155, 375), (375, 'd', '', 0.0, 376), (376, '', 'old', inf, 0), (42, 'n', '', 0.6632942174102254, 377), (377, '', 'on', inf, 0), (35, 'n', '', 0.5340824859301847, 378), (378, '', 'on', inf, 0), (374, 'n', '', 0.5340824859301847, 379), (379, 'l', '', 0.0, 380), (380, 'i', '', 0.0, 381), (381, '', 'only', 0.0, 0), (0, 'ˈʌ', '', 4.382026634673821, 382), (382, 'ð', '', 0.5260930958968402, 383), (383, 'ɚ', '', 3.091042453358341, 384), (384, '', 
'other', 0.0, 0), (383, 'ə', '', 0.04652001563488284, 385), (385, 'ɹ', '', 0.0, 386), (386, '', 'other', 0.0, 0), (0, 'ˈaʊ', '', 4.158883083359569, 387), (387, 'ɚ', '', 2.8332133440562757, 388), (388, '', 'our', 0.0, 0), (387, 'ɹ', '', 0.06062462181648698, 389), (389, '', 'our', 0.0, 0), (43, '', 'our', 0.6931471805598903, 0), (0, 'p', '', 3.5710964184575005, 390), (390, 'ˈɝ', '', inf, 391), (391, 's', '', 0.6359887667199473, 392), (392, 'ɪ', '', 0.0, 393), (393, 'n', '', 0.0, 394), (394, 'ɪ', '', 0.0, 395), (395, 'l', '', 0.0, 396), (396, '', 'personal', inf, 0), (391, 'ɹ', '', 0.7537718023763773, 397), (397, 's', '', 0.0, 398), (398, 'ə', '', 0.0, 399), (399, 'n', '', 0.0, 400), (400, 'l̩', '', 0.0, 401), (401, '', 'personal', inf, 0), (160, 'ə', '', 1.4552872326067927, 402), (402, 'z', '', 0.0, 403), (403, 'ˈɪ', '', 0.0, 404), (404, 'ʃ', '', 0.0, 405), (405, 'n̩', '', 0.0, 406), (406, '', 'physician', inf, 0), (160, 'ɪ', '', 1.6665963262739751, 407), (407, 'z', '', 0.0, 408), (408, 'ˈɪ', '', 0.0, 409), (409, 'ʃ', '', 0.0, 410), (410, 'n̩', '', 0.0, 411), (411, '', 'physician', inf, 0), (390, 'ɹ', '', 0.0, 412), (412, 'ˈɑ', '', 0.0, 413), (413, 'm', '', 0.0, 414), (414, 'ə', '', 0.0, 415), (415, 's', '', 0.0, 416), (416, 'ə', '', 0.0, 417), (417, 'z', '', 0.0, 418), (418, '', 'promises', 0.0, 0), (85, 'w', '', 1.4853852637641012, 419), (419, 'ˈɪ', '', 0.0, 420), (420, 'ɹ', '', 0.0, 421), (421, '', 'queer', 0.0, 0), (0, 'ɹ', '', 2.9957322735539265, 422), (422, 'ɑ', '', inf, 423), (423, 'n', '', 0.0, 424), (424, '', 'ran', inf, 0), (422, 'ˈæ', '', 2.6026896854444885, 425), (425, 'n', '', 0.0, 426), (426, '', 'ran', inf, 0), (422, 'i', '', 1.686398953570233, 427), (427, 'ˈæ', '', 0.0, 428), (428, 'k', '', 0.0, 429), (429, 'ʃ', '', 0.0, 430), (430, 'n̩', '', 0.0, 431), (431, '', 'reaction', inf, 0), (422, 'ˈoʊ', '', 0.5877866649020689, 432), (432, 'm', '', 0.0, 433), (433, 'ə', '', 0.0, 434), (434, 'n', '', 0.0, 435), (435, 'ˌɔ', '', inf, 436), (436, 'f', '', 
0.6931471805598903, 437), (437, '', 'romanov', inf, 0), (436, 'v', '', 0.6931471805598903, 438), (438, '', 'romanov', inf, 0), (422, 'ˈu', '', 3.295836866004379, 439), (439, 'l', '', 0.0, 440), (440, '', 'rule', inf, 0), (422, 'ˈʌ', '', 1.9095425048844845, 441), (441, 'ʃ', '', 0.0, 442), (442, 'ə', '', 0.0, 443), (443, '', 'russia', inf, 0), (0, 's', '', 2.8779492378974965, 444), (444, 'ˈɛ', '', 4.069026754237939, 445), (445, 'k', '', 0.3746934494414518, 446), (446, 'n̩', '', 0.0, 447), (447, '', 'second', inf, 0), (447, 'd', '', 0.0, 448), (448, '', 'second', inf, 0), (444, 'ˈi', '', 2.123116605182531, 449), (449, '', 'see', 0.0, 0), (0, 'ʃ', '', 5.768320995793715, 450), (450, 'ˈei', '', 1.55814461804664, 451), (451, 'k', '', 0.0, 452), (452, '', 'shake', 0.0, 0), (450, 'ˈoʊ', '', 0.23638877806422443, 453), (453, 'l', '', 0.0, 454), (454, 'd', '', 0.0, 455), (455, 'ɚ', '', 3.091042453358341, 456), (456, '', 'shoulder', inf, 0), (455, 'ə', '', 0.04652001563488284, 457), (457, 'ɹ', '', 0.0, 458), (458, '', 'shoulder', inf, 0), (444, 'ˈɑɪ', '', 2.8162637857424215, 459), (459, 'm', '', 0.0, 460), (460, 'n̩', '', 0.0, 461), (461, '', 'simon', inf, 0), (444, 'ˈɪ', '', 2.459588841803793, 462), (462, 's', '', 0.4769240720902417, 463), (463, 't', '', 0.0, 464), (464, 'ɚ', '', 3.091042453358341, 465), (465, '', 'sister', inf, 0), (464, 'ə', '', 0.04652001563488284, 466), (466, 'ɹ', '', 0.0, 467), (467, '', 'sister', inf, 0), (444, 'ɪ', '', 1.9289605907415535, 468), (468, 'k', '', 0.0, 469), (469, 's', '', 0.0, 470), (470, '', 'six', inf, 0), (462, 'k', '', 0.9694005571881235, 471), (471, 's', '', 0.0, 472), (472, '', 'six', inf, 0), (444, 'l', '', 2.2772672850098843, 473), (473, 'ˈi', '', 0.0, 474), (474, 'p', '', 0.0, 475), (475, '', 'sleep', 0.0, 0), (444, 'n', '', 1.9289605907415535, 476), (476, 'ˈoʊ', '', 0.0, 477), (477, '', 'snow', 0.0, 0), (444, 'ˈʌ', '', 3.375879573677935, 478), (478, 'm', '', 0.0, 479), (479, '', 'some', 0.0, 0), (444, 'ˈaʊ', '', 
3.1527360223636833, 480), (480, 'n', '', 0.0, 481), (481, 'd', '', 0.8183103235139697, 482), (482, 'z', '', 0.0, 483), (483, '', 'sounds', 0.0, 0), (481, 'z', '', 0.5819215454496316, 484), (484, '', 'sounds', 0.0, 0), (444, 't', '', 1.9289605907415535, 485), (485, 'ɑ', '', inf, 486), (486, 't', '', 0.0, 487), (487, '', 'start', inf, 0), (485, 'ˈɑ', '', 0.08701137698960792, 488), (488, 'ɹ', '', 0.4462871026283892, 489), (489, 't', '', 0.0, 490), (490, '', 'start', inf, 0), (488, 'p', '', 1.0216512475319632, 491), (491, '', 'stop', 2.8903717578962187, 0), (491, 'ɪ', '', 0.05715841383994302, 492), (492, 'ŋ', '', 0.0, 493), (493, '', 'stopping', 0.0, 0), (485, 'ˈɔ', '', 2.484906649787945, 494), (494, 'ɹ', '', 0.0, 495), (495, 'i', '', 0.0, 496), (496, 'z', '', 0.0, 497), (497, '', 'stories', inf, 0), (444, 'w', '', 2.2772672850098843, 498), (498, 'ˈi', '', 0.0, 499), (499, 'p', '', 0.0, 500), (500, '', 'sweep', 0.0, 0), (0, 't', '', 2.9351076517374395, 501), (501, 'ˈɛ', '', 3.4011973816622003, 502), (502, 'ɹ', '', 0.5596157879353996, 503), (503, 'z', '', 0.0, 504), (504, '', 'tears', inf, 0), (501, 'ˈɪ', '', 1.7917594692280545, 505), (505, 'ɹ', '', 0.0, 506), (506, 'z', '', 0.0, 507), (507, '', 'tears', inf, 0), (502, 'l', '', 0.8472978603872434, 508), (508, 'ɪ', '', 0.0, 509), (509, 'ŋ', '', 0.0, 510), (510, '', 'telling', inf, 0), (501, 'ɛ', '', inf, 511), (511, 'm', '', 0.0, 512), (512, 't', '', 0.4248831939652291, 513), (513, 'ˈei', '', 0.0, 514), (514, 'ʃ', '', 0.0, 515), (515, 'n̩', '', 0.0, 516), (516, '', 'temptation', inf, 0), (512, 'p', '', 1.06087196068529, 517), (517, 't', '', 0.0, 518), (518, 'ˈei', '', 0.0, 519), (519, 'ʃ', '', 0.0, 520), (520, 'n̩', '', 0.0, 521), (521, '', 'temptation', inf, 0), (0, 'ð', '', 3.2033716383322144, 522), (522, 'ə', '', 1.01592057282312, 523), (523, 't', '', 0.3654597734944218, 524), (524, '', 'that', inf, 0), (522, 'ˈæ', '', 3.3672958299866877, 525), (525, 't', '', 0.0, 526), (526, '', 'that', inf, 0), (523, '', 'the', 
1.1837700970083915, 0), (522, 'ˈi', '', 1.4213856809312801, 527), (527, '', 'the', 1.2622417124499634, 0), (522, 'ˈʌ', '', 2.6741486494266837, 528), (528, '', 'the', 0.0, 0), (522, 'ˈɛ', '', 3.3672958299866877, 529), (529, 'ɹ', '', 0.0, 530), (530, '', 'there', 0.0, 0), (527, 'z', '', 0.332705753825735, 531), (531, '', 'these', 0.0, 0), (0, 'ɵ', '', 5.075173815233825, 532), (532, 'ˈɪ', '', 0.0, 533), (533, 'ŋ', '', 0.0, 534), (534, 'k', '', 0.0, 535), (535, '', 'think', 0.0, 0), (522, 'ˈoʊ', '', 1.352392809444268, 536), (536, '', 'though', 0.0, 0), (501, 'ˈoʊ', '', 1.3862943611197807, 537), (537, '', 'to', 1.0986122886680505, 0), (501, 'ˈu', '', 4.094344562222091, 538), (538, '', 'to', 0.0, 0), (501, 'ˈʌ', '', 2.7080502011021963, 539), (539, '', 'to', 0.0, 0), (537, 'l', '', 0.40546510810827385, 540), (540, 'd', '', 0.0, 541), (541, '', 'told', inf, 0), (501, 'ɹ', '', 1.321755839982302, 542), (542, 'ɑɪ', '', inf, 543), (543, 'ˈʌ', '', 0.0, 544), (544, 'm', '', 0.0, 545), (545, 'f', '', 0.0, 546), (546, 'n̩', '', 0.0, 547), (547, 't', '', 0.0, 548), (548, '', 'triumphant', inf, 0), (501, 'w', '', 1.6094379124341458, 549), (549, 'ˈɛ', '', 0.0, 550), (550, 'n', '', 0.0, 551), (551, 'i', '', 1.4816045409241951, 552), (552, '', 'twenty', inf, 0), (551, 't', '', 0.25782910930206526, 553), (553, 'i', '', 0.0, 554), (554, '', 'twenty', inf, 0), (382, 'p', '', 0.8938178760221263, 555), (555, '', 'up', 0.0, 0), (0, 'v', '', 3.8224108467384212, 556), (556, 'ˈɪ', '', 0.0, 557), (557, 'l', '', 0.0, 558), (558, 'ɪ', '', 0.0, 559), (559, 'dʒ', '', 0.0, 560), (560, '', 'village', 0.0, 0), (0, 'w', '', 3.2834143460057703, 561), (561, 'ˈɔ', '', 4.3438054218537445, 562), (562, 'n', '', 0.750305594399947, 563), (563, 'ɪ', '', 0.6931471805598903, 564), (564, 'd', '', 0.0, 565), (565, '', 'wanted', inf, 0), (561, 'ˈɑ', '', 1.9459101490554076, 566), (566, 'n', '', 0.7777045685880921, 567), (567, 't', '', 0.0, 568), (568, 'ə', '', 0.0, 569), (569, 'd', '', 0.0, 570), (570, '', 'wanted', 
inf, 0), (563, 't', '', 0.6931471805598903, 571), (571, 'ɪ', '', 0.0, 572), (572, 'd', '', 0.0, 573), (573, '', 'wanted', inf, 0), (561, 'ə', '', 1.2992829841302864, 574), (574, 'z', '', 0.0, 575), (575, '', 'was', inf, 0), (566, 'z', '', 0.6664789334778334, 576), (576, '', 'was', inf, 0), (562, 'z', '', 0.6390799592896883, 577), (577, '', 'was', inf, 0), (566, 'tʃ', '', 3.610917912644368, 578), (578, '', 'watch', 0.0, 0), (196, 'w', '', 1.8971199848859897, 579), (579, 'ˈʌ', '', 1.3862943611198943, 580), (580, 't', '', 0.0, 581), (581, '', 'what', inf, 0), (561, 'ˈʌ', '', 2.95751106073385, 582), (582, 't', '', 0.0, 583), (583, '', 'what', inf, 0), (561, 'ˌɑ', '', inf, 584), (584, 't', '', 0.0, 585), (585, '', 'what', inf, 0), (49, '', 'what', inf, 0), (579, 'ˈɛ', '', 2.0794415416798984, 586), (586, 'n', '', 0.0, 587), (587, '', 'when', inf, 0), (579, 'ˈɪ', '', 0.47000362924575256, 588), (588, 'n', '', 0.05715841383994302, 589), (589, '', 'when', inf, 0), (561, 'ˈɛ', '', 3.650658241293854, 590), (590, 'n', '', 0.0, 591), (591, '', 'when', inf, 0), (561, 'ˈɪ', '', 2.0412203288597084, 592), (592, 'n', '', 0.05715841383994302, 593), (593, '', 'when', inf, 0), (142, '', 'when', inf, 0), (588, 'tʃ', '', 2.8903717578962187, 594), (594, '', 'which', inf, 0), (592, 'tʃ', '', 2.8903717578962187, 595), (595, '', 'which', inf, 0), (146, 'tʃ', '', 4.248495242049444, 596), (596, '', 'which', inf, 0), (196, 'ˈu', '', 4.3820266346739345, 597), (597, 'z', '', 0.0, 598), (598, '', 'whose', 0.0, 0), (561, 'ɪ', '', 1.5105920777974688, 599), (599, 'l', '', 0.8109302162163203, 600), (600, '', 'will', 0.0, 0), (593, 'd', '', 0.0, 601), (601, '', 'wind', 0.0, 0), (561, 'ˈɑɪ', '', 2.397895272798337, 602), (602, 'n', '', 0.0, 603), (603, 'd', '', 0.0, 604), (604, '', 'wind', 0.0, 0), (599, 'ð', '', 0.7308875085427644, 605), (605, '', 'with', 1.7917594692280545, 0), (599, 'ɵ', '', 2.602689685444375, 606), (606, '', 'with', 0.0, 0), (605, 'ˈaʊ', '', 0.1823215567939087, 607), (607, 't', '', 
0.0, 608), (608, '', 'without', 0.0, 0), (561, 'ˈʊ', '', 2.95751106073385, 609), (609, 'd', '', 0.0, 610), (610, 'z', '', 0.0, 611), (611, '', 'woods', 0.0, 0), (0, 'j', '', 5.768320995793715, 612), (612, 'ˌi', '', 1.7917594692280545, 613), (613, 'ɹ', '', 0.0, 614), (614, '', 'year', 0.0, 0), (612, 'ˈɪ', '', 0.1823215567939087, 615), (615, 'ɹ', '', 0.0, 616), (616, '', 'year', 0.0, 0), (305, 'ˌɔ', '', inf, 617), (617, 'n', '', 0.0, 618), (618, 't', '', 0.0, 619), (619, 'ə', '', 0.0, 620), (620, 'n', '', 0.0, 621), (621, 'ˈoʊ', '', 0.0, 622), (622, 'ɹ', '', 0.0, 623), (623, 'i', '', 0.0, 624), (624, '', 'montenori', inf, 0), (350, 'ɑɪ', '', inf, 625), (625, '', 'nicholai', inf, 0), (437, 's', '', 0.0, 626), (626, '', 'romanovs', inf, 0), (445, 'b', '', 1.163150809805643, 627), (627, 'æ', '', inf, 628), (628, 'g', '', 0.0, 629), (629, '', 'sebag', inf, 0)]
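Each tuple in the listing above appears to have the form (source state, input phone, output word, weight, destination state), with '' standing for epsilon. The weights look like negative natural-log probabilities: 0.693 ≈ −ln 0.5, and inf marks an edge whose probability has been re-estimated to zero. Assuming that format, the sketch below copies the handful of edges for 'stop' and 'stopping' verbatim from the listing and scores a phone string by brute-force search. `best_parse` is a hypothetical helper for illustration, not part of mp5.py. It also assumes state 0 is both the start state and the state every word returns to.

```python
import math

# Edges copied from the listing above. Assumed layout:
# (from_state, input_phone, output_word, weight, to_state), '' = epsilon,
# weight = -log(probability).
edges = [
    (0, 's', '', 2.8779492378974965, 444),
    (444, 't', '', 1.9289605907415535, 485),
    (485, 'ˈɑ', '', 0.08701137698960792, 488),
    (488, 'p', '', 1.0216512475319632, 491),
    (491, '', 'stop', 2.8903717578962187, 0),
    (491, 'ɪ', '', 0.05715841383994302, 492),
    (492, 'ŋ', '', 0.0, 493),
    (493, '', 'stopping', 0.0, 0),
]

def best_parse(edges, phones, state=0, final=0):
    """Return (cost, words) for the cheapest path that consumes every phone and
    ends in `final`, or (inf, None) if there is none. Exhaustive depth-first
    search: fine for a tiny subset without epsilon cycles, far too slow for the
    full transducer."""
    best = (math.inf, None)
    if not phones and state == final:
        best = (0.0, [])
    for src, inp, out, w, dst in edges:
        if src != state or math.isinf(w):
            continue  # wrong state, or a zero-probability edge
        if inp == '':
            rest = phones  # epsilon input: consume nothing
        elif phones and phones[0] == inp:
            rest = phones[1:]  # consume one matching phone
        else:
            continue
        cost, words = best_parse(edges, rest, dst, final)
        if words is not None and w + cost < best[0]:
            best = (w + cost, ([out] if out else []) + words)
    return best

cost, words = best_parse(edges, ['s', 't', 'ˈɑ', 'p', 'ɪ', 'ŋ'])
print(words, round(cost, 3))  # ['stopping'] 5.973
```

Adding weights along a path corresponds to multiplying probabilities, so `math.exp(-cost)` recovers the path probability.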

Finally, let's see if this re-estimation fixed the mistake!

In [101]:
importlib.reload(mp5)
TLG_re,TLGfinal_re = mp5.todo_fstcompose(T,Tfinal,LG_re,LGfinal_re)
TLG_sortre,TLGfinal_sortre = mp5.todo_sort_topologically(TLG_re,TLGfinal_re)
delta, psi, bestpath_re = mp5.todo_fstbestpath(TLG_sortre,TLGfinal_sortre)
print([ t[2] for t in bestpath_re if t[2]!='' ])
['whose', 'woods', 'the', 'czar', 'i', 'think', 'i', 'know', 'his', 'house', 'is', 'in', 'the', 'village', 'though', 'he', 'will', 'not', 'see', 'me', 'stopping', 'here', 'to', 'watch', 'his', 'woods', 'fill', 'up', 'with', 'snow', 'me', 'little', 'horse', 'must', 'think', 'it', 'queer', 'to', 'stop', 'without', 'a', 'farm', 'house', 'near', 'between', 'the', 'woods', 'and', 'frozen', 'lake', 'the', 'darkest', 'evening', 'of', 'the', 'year', 'he', 'gives', 'his', 'harness', 'bells', 'a', 'shake', 'to', 'ask', 'if', 'there', 'is', 'some', 'mistake', 'the', 'only', 'other', 'sounds', 'the', 'sweep', 'of', 'easy', 'wind', 'and', 'downy', 'flake', 'the', 'woods', 'are', 'lovely', 'dark', 'and', 'deep', 'but', 'i', 'have', 'promises', 'to', 'keep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep']
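The cell above chains three steps. It composes the phone transducer T with the re-estimated LG, renumbers the states topologically, and then runs a best-path search that returns `delta`, `psi` and the path. The sketch below is not the reference solution to `todo_fstbestpath`. It only illustrates the dynamic-programming idea and rests on several assumptions. Edges are (src, input, output, weight, dst) tuples with negative-log-probability weights. After sorting, every edge goes from a lower-numbered to a higher-numbered state. Final states come as a dict of final weights, although the real `TLGfinal` format may differ. The toy edges are hypothetical.

```python
import math

def viterbi_best_path(edges, num_states, final_weights, start=0):
    """Cheapest path through an acyclic WFST whose states are numbered in
    topological order (every edge goes from a lower- to a higher-numbered state).

    edges:         list of (src, input, output, weight, dst), weight = -log prob
    final_weights: dict {final_state: final_weight}
    Returns (cost, path), where path is the list of edges on the best path.
    """
    delta = [math.inf] * num_states  # delta[j]: cheapest cost of reaching state j
    psi = [None] * num_states        # psi[j]: incoming edge that achieves delta[j]
    delta[start] = 0.0
    # Visiting edges in order of source state guarantees delta[src] is already
    # final when its outgoing edges are relaxed: every edge into src comes from
    # a lower-numbered state, which was visited earlier.
    for edge in sorted(edges, key=lambda e: e[0]):
        src, _, _, weight, dst = edge
        if delta[src] + weight < delta[dst]:
            delta[dst] = delta[src] + weight
            psi[dst] = edge
    # Pick the final state with the lowest total cost, including its final weight.
    best = min(final_weights, key=lambda s: delta[s] + final_weights[s])
    cost = delta[best] + final_weights[best]
    if math.isinf(cost):
        return cost, []  # no complete path exists
    # Backtrace through psi from the best final state to the start state.
    path, state = [], best
    while psi[state] is not None:
        path.append(psi[state])
        state = psi[state][0]
    path.reverse()
    return cost, path

# Hypothetical toy WFST: two competing parses of the input "a b".
toy_edges = [
    (0, 'a', 'x', 1.0, 1),
    (0, 'a', 'y', 0.5, 2),   # cheaper first step...
    (1, 'b', 'z', 0.2, 3),
    (2, 'b', 'w', 1.0, 3),   # ...but more expensive overall
]
cost, path = viterbi_best_path(toy_edges, 4, {3: 0.0})
print([e[2] for e in path if e[2] != ''], round(cost, 3))  # ['x', 'z'] 1.2
```

The toy shows why a greedy left-to-right choice is not enough. 'y' is cheaper after one step, but the path through 'x' wins overall. Keeping `psi` lets the search commit to a path only at the end.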

Still there! The recognizer still outputs 'the czar' where the speaker said 'these are' (ð ˈi z ˈɑ ɹ). Maybe it would be a good idea to train the language model on text that has some topical similarity to the speech we want to recognize.
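Trying a different language-model corpus is easier when you have a single number to compare. A common choice is word error rate (WER): the word-level Levenshtein distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch with a hypothetical helper, `word_error_rate`. The reference assumes the utterance begins with the opening line of Robert Frost's "Stopping by Woods on a Snowy Evening", which the recognized words otherwise match.

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)
    divided by the number of reference words."""
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

reference = 'whose woods these are i think i know'.split()
# First eight words of the bestpath_re output above.
hypothesis = ['whose', 'woods', 'the', 'czar', 'i', 'think', 'i', 'know']
print(word_error_rate(reference, hypothesis))  # 0.25 (2 substitutions / 8 words)
```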
