In this MP, you will train and test a nested set of two WFSTs: a language model, and a lexicon. The file you're looking at right now (mp5overview.ipynb) is a debugging tool. The file you actually need to complete is mp5.py. The unit tests are provided in run_tests.py and tests/test_visible.py. All of these are available as part of the code package, https://courses.engr.illinois.edu/ece417/fa2020/ece417_20fall_mp5.zip.
import numpy as np
import matplotlib.figure
import matplotlib.pyplot as plt
%matplotlib inline
import mp5
import importlib
importlib.reload(mp5)
<module 'mp5' from '/Users/jhasegaw/Dropbox/mark/teaching/ece417/ece417labs/20fall/mp5/src/mp5.py'>
To keep this overview short, every block below offers two options: you can either show your own results or show the distributed solutions. To choose, just comment out the one you don't want to see.
import json
with open('solutions.json') as f:
    solutions = json.load(f)
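If you compute your own results, it can help to check them against the distributed solutions. One catch is that `json.load` turns every tuple into a list, so a tuple-valued WFST will never compare equal to the loaded solution directly.

Below is a minimal sketch of a comparison helper. The name `same_transitions` is hypothetical and is not part of mp5.py or the unit tests. It also assumes that transition order does not matter for your check.

```python
def same_transitions(mine, reference):
    """Hypothetical helper: compare two transition lists, ignoring order.

    Every transition is converted to a list, so Python tuples (e.g. from
    your own mp5.py) can match the lists produced by json.load.
    """
    def normalize(transitions):
        return sorted(list(t) for t in transitions)
    return normalize(mine) == normalize(reference)

# Usage sketch (commented out; assumes the `solutions` dict loaded above):
# T, Tfinal = mp5.todo_transcript2wfst('data/transcript.txt')
# print(same_transitions(T, solutions['T']), Tfinal == solutions['Tfinal'])
```

For the language model, whose weights are floats, a tolerance-based comparison may be more appropriate than exact equality.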
This MP will simulate training an automatic speech recognizer from an untranscribed input utterance.
Here are the phonemes that were "recognized" in the input audio file:
with open('data/transcript.txt') as f:
    S = f.read().strip().split()
print(S)
['h', 'ˈu', 'z', 'w', 'ˈʊ', 'd', 'z', 'ð', 'ˈi', 'z', 'ˈɑ', 'ɹ', 'ˈɑɪ', 'ɵ', 'ˈɪ', 'ŋ', 'k', 'ˈɑɪ', 'n', 'ˈoʊ', 'h', 'ɪ', 'z', 'h', 'ˈaʊ', 's', 'ɪ', 'z', 'ˈɪ', 'n', 'ð', 'ə', 'v', 'ˈɪ', 'l', 'ɪ', 'dʒ', 'ð', 'ˈoʊ', 'h', 'ˈi', 'w', 'ɪ', 'l', 'n', 'ˈɑ', 't', 's', 'ˈi', 'm', 'ˈi', 's', 't', 'ˈɑ', 'p', 'ɪ', 'ŋ', 'h', 'ˈɪ', 'ɹ', 't', 'ˈoʊ', 'w', 'ˈɑ', 'tʃ', 'h', 'ɪ', 'z', 'w', 'ˈʊ', 'd', 'z', 'f', 'ˈɪ', 'l', 'ˈʌ', 'p', 'w', 'ɪ', 'ð', 's', 'n', 'ˈoʊ', 'm', 'ˈi', 'l', 'ˈɪ', 't', 'l̩', 'h', 'ˈɔ', 'ɹ', 's', 'm', 'ə', 's', 't', 'ɵ', 'ˈɪ', 'ŋ', 'k', 'ɪ', 't', 'k', 'w', 'ˈɪ', 'ɹ', 't', 'ˈoʊ', 's', 't', 'ˈɑ', 'p', 'w', 'ɪ', 'ð', 'ˈaʊ', 't', 'ə', 'f', 'ˈɑ', 'ɹ', 'm', 'h', 'ˈaʊ', 's', 'n', 'ˌi', 'ɹ', 'b', 'i', 't', 'w', 'ˈi', 'n', 'ð', 'ə', 'w', 'ˈʊ', 'd', 'z', 'ə', 'n', 'd', 'f', 'ɹ', 'ˈoʊ', 'z', 'n̩', 'l', 'ˈei', 'k', 'ð', 'ə', 'd', 'ˈɑ', 'ɹ', 'k', 'ə', 's', 't', 'ˈi', 'v', 'n', 'ɪ', 'ŋ', 'ə', 'v', 'ð', 'ə', 'j', 'ˌi', 'ɹ', 'h', 'ˈi', 'g', 'ˈɪ', 'v', 'z', 'h', 'ɪ', 'z', 'h', 'ˈɑ', 'ɹ', 'n', 'ɪ', 's', 'b', 'ˈɛ', 'l', 'z', 'ə', 'ʃ', 'ˈei', 'k', 't', 'ˈoʊ', 'ˈæ', 's', 'k', 'ɪ', 'f', 'ð', 'ˈɛ', 'ɹ', 'ɪ', 'z', 's', 'ˈʌ', 'm', 'm', 'ɪ', 's', 't', 'ˈei', 'k', 'ð', 'ə', 'ˈoʊ', 'n', 'l', 'i', 'ˈʌ', 'ð', 'ɚ', 's', 'ˈaʊ', 'n', 'd', 'z', 'ð', 'ə', 's', 'w', 'ˈi', 'p', 'ə', 'v', 'ˈi', 'z', 'i', 'w', 'ˈɪ', 'n', 'd', 'ə', 'n', 'd', 'd', 'ˈaʊ', 'n', 'i', 'f', 'l', 'ˈei', 'k', 'ð', 'ə', 'w', 'ˈʊ', 'd', 'z', 'ˈɑ', 'ɹ', 'l', 'ˈʌ', 'v', 'l', 'i', 'd', 'ˈɑ', 'ɹ', 'k', 'ə', 'n', 'd', 'd', 'ˈi', 'p', 'b', 'ə', 't', 'ˈɑɪ', 'h', 'ˈæ', 'v', 'p', 'ɹ', 'ˈɑ', 'm', 'ə', 's', 'ə', 'z', 't', 'ˈoʊ', 'k', 'ˈi', 'p', 'ə', 'n', 'd', 'm', 'ˈɑɪ', 'l̩', 'z', 't', 'ˈoʊ', 'g', 'ˈoʊ', 'b', 'ɪ', 'f', 'ˈoʊ', 'ɹ', 'ˈɑɪ', 's', 'l', 'ˈi', 'p', 'ə', 'n', 'd', 'm', 'ˈɑɪ', 'l̩', 'z', 't', 'ˈoʊ', 'g', 'ˈoʊ', 'b', 'ɪ', 'f', 'ˈoʊ', 'ɹ', 'ˈɑɪ', 's', 'l', 'ˈi', 'p']
This time, you'll need to write your own code to read in the lexicon, to train a language model, and to read in the transcript as a WFST. Here's some code that reads the lexicon and prints each entry. Notice that many words have more than one possible pronunciation, e.g., because people sometimes pronounce words casually, in a reduced form:
with open('data/lexicon.txt') as f:
    for line in f:
        ph = line.strip().split()
        print('Word %s is pronounced as %s'%(ph[0],':'.join(ph[1:])))
Word a is pronounced as ə
Word a is pronounced as ˌei
Word about is pronounced as b:ˌaʊ:t
Word about is pronounced as ə:b:ˈaʊ:t
Word aleksandrovich is pronounced as ˌɑ:l:ɛ:k:s:ˈɑ:n:d:ɹ:ɑ:v:ɪ:tʃ
Word alexander is pronounced as ˌæ:l:ɪ:g:z:ˈæ:n:d:ɚ
Word alexander is pronounced as ˌæ:l:ɪ:g:z:ˈæ:n:d:ə:ɹ
Word all is pronounced as ɑ:l
Word all is pronounced as ˈɔ:l
Word and is pronounced as ə:n:d
Word and is pronounced as ˈæ:n:d
Word are is pronounced as ˈɑ:ɹ
Word as is pronounced as ˈæ:z
Word as is pronounced as ˈɛ:z
Word ask is pronounced as ˈæ:s:k
Word at is pronounced as ə:t
Word at is pronounced as ˈæ:t
Word be is pronounced as b:i
Word be is pronounced as b:ˈei
Word be is pronounced as b:ˈi
Word before is pronounced as b:ɪ:f:ˈoʊ:ɹ
Word before is pronounced as b:ɪ:f:ˈɔ:ɹ
Word before is pronounced as b:ˌi:f:ˈɔ:ɹ
Word bells is pronounced as b:ˈɛ:l:z
Word between is pronounced as b:i:t:w:ˈi:n
Word between is pronounced as b:ɪ:t:w:ˈi:n
Word book is pronounced as b:ˈʊ:k
Word broke is pronounced as b:ɹ:ˈoʊ:k
Word but is pronounced as b:ə:t
Word but is pronounced as b:ˈʌ:t
Word by is pronounced as b:ˈɑɪ
Word communist is pronounced as k:ˈɑ:m:j:ə:n:ɪ:s:t
Word communist is pronounced as k:ˈɑ:m:j:ə:n:ə:s:t
Word cried is pronounced as k:ɹ:ˈɑɪ:d
Word czar is pronounced as z:ˈɑ:ɹ
Word dark is pronounced as d:ˈɑ:ɹ:k
Word darkest is pronounced as d:ˈɑ:ɹ:k:ə:s:t
Word deep is pronounced as d:ˈi:p
Word died is pronounced as d:ˈɑɪ:d
Word down is pronounced as d:ˈaʊ:n
Word downy is pronounced as d:ˈaʊ:n:i
Word dramatic is pronounced as d:ɹ:ə:m:ˈæ:ɾ:ɪ:k
Word easy is pronounced as ˈi:z:i
Word eighteen is pronounced as ˈei:t:ˈi:n
Word emperor is pronounced as ˈɛ:m:p:ɚ:ɚ
Word emperor is pronounced as ˈɛ:m:p:ə:ɹ:ə:ɹ
Word end is pronounced as ˈɛ:n:d
Word entire is pronounced as ɛ:n:t:ˈɑɪ:ɚ
Word entire is pronounced as ɪ:n:t:ˈɑɪ:ɚ
Word entire is pronounced as ˌɛ:n:t:ˌɑɪ:ɹ
Word evening is pronounced as ˈi:v:n:ɪ:ŋ
Word family is pronounced as f:ˈæ:m:ə:l:i
Word family is pronounced as f:ˈæ:m:l:i
Word farm is pronounced as f:ˈɑ:ɹ:m
Word father is pronounced as f:ˈɑ:ð:ɚ
Word father is pronounced as f:ˈɑ:ð:ə:ɹ
Word fill is pronounced as f:ˈɪ:l
Word flake is pronounced as f:l:ˈei:k
Word four is pronounced as f:oʊ:ɹ
Word four is pronounced as f:ˈɔ:ɹ
Word frozen is pronounced as f:ɹ:ˈoʊ:z:n̩
Word gives is pronounced as g:ˈɪ:v:z
Word go is pronounced as g:ˈoʊ
Word going is pronounced as g:ˈoʊ:ɪ:n
Word going is pronounced as g:ˈoʊ:ɪ:ŋ
Word had is pronounced as h:ˈæ:d
Word happen is pronounced as h:ˈæ:p:n̩
Word harness is pronounced as h:ˈɑ:ɹ:n:ɪ:s
Word have is pronounced as h:ˈæ:v
Word he is pronounced as h:ˈi
Word he is pronounced as h:ˈʌ
Word help is pronounced as h:ˈɛ:l:p
Word her is pronounced as h:ɚ:ɹ
Word her is pronounced as h:ˈɝ
Word here is pronounced as h:ˈɪ:ɹ
Word him is pronounced as h:ɪ:m
Word him is pronounced as h:ˈɪ:m
Word him is pronounced as ɪ:m
Word his is pronounced as h:ɪ:z
Word horse is pronounced as h:ˈɔ:ɹ:s
Word house is pronounced as h:ˈaʊ:s
Word house is pronounced as h:ˈaʊ:z
Word household is pronounced as h:ˈaʊ:s:h:ˌoʊ:l:d
Word i is pronounced as ˈɑɪ
Word if is pronounced as ɪ:f
Word in is pronounced as ˈɪ:n
Word including is pronounced as ɪ:n:k:l:u:d:ɪ:ŋ
Word including is pronounced as ɪ:n:k:l:ˈu:d:ɪ:ŋ
Word instructive is pronounced as ɪ:n:s:t:ɹ:ˈʌ:k:t:ɪ:v
Word introduced is pronounced as ˌɪ:n:t:ɹ:oʊ:d:ˈu:s:t
Word introduced is pronounced as ˌɪ:n:t:ɹ:ə:d:ˈu:s:t
Word is is pronounced as ɪ:z
Word it is pronounced as ɪ:t
Word its is pronounced as ɪ:t:s
Word july is pronounced as dʒ:u:l:ˈɑɪ
Word july is pronounced as dʒ:ə:l:ˈɑɪ
Word july is pronounced as dʒ:ˌu:l:ˈɑɪ
Word keep is pronounced as k:ˈi:p
Word know is pronounced as n:ˈoʊ
Word lake is pronounced as l:ˈei:k
Word learning is pronounced as l:ˈɝ:n:ɪ:ŋ
Word learning is pronounced as l:ˈɝ:ɹ:n:ɪ:ŋ
Word little is pronounced as l:ˈɪ:t:l̩
Word little is pronounced as l:ˈɪ:ɾ:l̩
Word lovely is pronounced as l:ˈʌ:v:l:i
Word massacre is pronounced as m:ˈæ:s:ə:k:ɚ
Word massacre is pronounced as m:ˈæ:s:ə:k:ə:ɹ
Word me is pronounced as m:ˈi
Word miles is pronounced as m:ˈɑɪ:l̩:z
Word miles is pronounced as m:ˈɑɪ:l:z
Word mistake is pronounced as m:ɪ:s:t:ˈei:k
Word more is pronounced as m:oʊ:ɹ
Word more is pronounced as m:ˈɔ:ɹ
Word movement is pronounced as m:ˈu:v:m:n̩:t
Word must is pronounced as m:ə:s:t
Word must is pronounced as m:ˈʌ:s:t
Word my is pronounced as m:ˈi
Word my is pronounced as m:ˈɑɪ
Word near is pronounced as n:ˌi:ɹ
Word near is pronounced as n:ˈɪ:ɹ
Word never is pronounced as n:ˈɛ:v:ɚ
Word never is pronounced as n:ˈɛ:v:ə:ɹ
Word nicholas is pronounced as n:ˈɪ:k:ə:l:ə:s
Word nicholas is pronounced as n:ˈɪ:k:l:ə:s
Word nineteen is pronounced as n:ˈɑɪ:n:t:ˈi:n
Word ninety is pronounced as n:ˈɑɪ:n:t:i
Word not is pronounced as n:ˈɑ:t
Word november is pronounced as n:oʊ:v:ˈɛ:m:b:ɚ
Word november is pronounced as n:oʊ:v:ˈɛ:m:b:ə:ɹ
Word now is pronounced as n:ˈaʊ
Word of is pronounced as ə:v
Word old is pronounced as ˈoʊ:l:d
Word on is pronounced as ˈɑ:n
Word on is pronounced as ˈɔ:n
Word only is pronounced as ˈoʊ:n:l:i
Word other is pronounced as ˈʌ:ð:ɚ
Word other is pronounced as ˈʌ:ð:ə:ɹ
Word our is pronounced as ˈaʊ:ɚ
Word our is pronounced as ˈaʊ:ɹ
Word our is pronounced as ˈɑ:ɹ
Word personal is pronounced as p:ˈɝ:s:ɪ:n:ɪ:l
Word personal is pronounced as p:ˈɝ:ɹ:s:ə:n:l̩
Word physician is pronounced as f:ə:z:ˈɪ:ʃ:n̩
Word physician is pronounced as f:ɪ:z:ˈɪ:ʃ:n̩
Word promises is pronounced as p:ɹ:ˈɑ:m:ə:s:ə:z
Word queer is pronounced as k:w:ˈɪ:ɹ
Word ran is pronounced as ɹ:ɑ:n
Word ran is pronounced as ɹ:ˈæ:n
Word reaction is pronounced as ɹ:i:ˈæ:k:ʃ:n̩
Word romanov is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:f
Word romanov is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:v
Word rule is pronounced as ɹ:ˈu:l
Word russia is pronounced as ɹ:ˈʌ:ʃ:ə
Word second is pronounced as s:ˈɛ:k:n̩
Word second is pronounced as s:ˈɛ:k:n̩:d
Word see is pronounced as s:ˈi
Word shake is pronounced as ʃ:ˈei:k
Word shoulder is pronounced as ʃ:ˈoʊ:l:d:ɚ
Word shoulder is pronounced as ʃ:ˈoʊ:l:d:ə:ɹ
Word simon is pronounced as s:ˈɑɪ:m:n̩
Word sister is pronounced as s:ˈɪ:s:t:ɚ
Word sister is pronounced as s:ˈɪ:s:t:ə:ɹ
Word six is pronounced as s:ɪ:k:s
Word six is pronounced as s:ˈɪ:k:s
Word sleep is pronounced as s:l:ˈi:p
Word snow is pronounced as s:n:ˈoʊ
Word some is pronounced as s:ˈʌ:m
Word sounds is pronounced as s:ˈaʊ:n:d:z
Word sounds is pronounced as s:ˈaʊ:n:z
Word start is pronounced as s:t:ɑ:t
Word start is pronounced as s:t:ˈɑ:ɹ:t
Word stop is pronounced as s:t:ˈɑ:p
Word stopping is pronounced as s:t:ˈɑ:p:ɪ:ŋ
Word stories is pronounced as s:t:ˈɔ:ɹ:i:z
Word sweep is pronounced as s:w:ˈi:p
Word tears is pronounced as t:ˈɛ:ɹ:z
Word tears is pronounced as t:ˈɪ:ɹ:z
Word telling is pronounced as t:ˈɛ:l:ɪ:ŋ
Word temptation is pronounced as t:ɛ:m:t:ˈei:ʃ:n̩
Word temptation is pronounced as t:ɛ:m:p:t:ˈei:ʃ:n̩
Word that is pronounced as ð:ə:t
Word that is pronounced as ð:ˈæ:t
Word the is pronounced as ð:ə
Word the is pronounced as ð:ˈi
Word the is pronounced as ð:ˈʌ
Word there is pronounced as ð:ˈɛ:ɹ
Word these is pronounced as ð:ˈi:z
Word think is pronounced as ɵ:ˈɪ:ŋ:k
Word though is pronounced as ð:ˈoʊ
Word to is pronounced as t:ˈoʊ
Word to is pronounced as t:ˈu
Word to is pronounced as t:ˈʌ
Word told is pronounced as t:ˈoʊ:l:d
Word triumphant is pronounced as t:ɹ:ɑɪ:ˈʌ:m:f:n̩:t
Word twenty is pronounced as t:w:ˈɛ:n:i
Word twenty is pronounced as t:w:ˈɛ:n:t:i
Word up is pronounced as ˈʌ:p
Word village is pronounced as v:ˈɪ:l:ɪ:dʒ
Word wanted is pronounced as w:ˈɔ:n:ɪ:d
Word wanted is pronounced as w:ˈɑ:n:t:ə:d
Word wanted is pronounced as w:ˈɔ:n:t:ɪ:d
Word was is pronounced as w:ə:z
Word was is pronounced as w:ˈɑ:z
Word was is pronounced as w:ˈɔ:z
Word watch is pronounced as w:ˈɑ:tʃ
Word what is pronounced as h:w:ˈʌ:t
Word what is pronounced as w:ˈʌ:t
Word what is pronounced as w:ˌɑ:t
Word what is pronounced as ə:t
Word when is pronounced as h:w:ˈɛ:n
Word when is pronounced as h:w:ˈɪ:n
Word when is pronounced as w:ˈɛ:n
Word when is pronounced as w:ˈɪ:n
Word when is pronounced as ɛ:n
Word which is pronounced as h:w:ˈɪ:tʃ
Word which is pronounced as w:ˈɪ:tʃ
Word which is pronounced as ɪ:tʃ
Word whose is pronounced as h:ˈu:z
Word will is pronounced as w:ɪ:l
Word wind is pronounced as w:ˈɪ:n:d
Word wind is pronounced as w:ˈɑɪ:n:d
Word with is pronounced as w:ɪ:ð
Word with is pronounced as w:ɪ:ɵ
Word without is pronounced as w:ɪ:ð:ˈaʊ:t
Word woods is pronounced as w:ˈʊ:d:z
Word year is pronounced as j:ˌi:ɹ
Word year is pronounced as j:ˈɪ:ɹ
Word montenori is pronounced as m:ˌɔ:n:t:ə:n:ˈoʊ:ɹ:i
Word nicholai is pronounced as n:ˈɪ:k:ə:l:ɑɪ
Word romanovs is pronounced as ɹ:ˈoʊ:m:ə:n:ˌɔ:f:s
Word sebag is pronounced as s:ˈɛ:b:æ:g
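When you build the lexicon WFST, it may be convenient to group the pronunciations by word. Below is a minimal sketch that does this. It assumes each line of data/lexicon.txt has the form `word phone1 phone2 ...`, as the printing loop above implies. The name `read_lexicon` is hypothetical, and it takes an iterable of lines so you can try it without the file.

```python
from collections import defaultdict

def read_lexicon(lines):
    """Hypothetical helper: map each word to its list of pronunciations.

    Each pronunciation is a list of phoneme strings, kept in file order.
    """
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.strip().split()
        if fields:  # skip blank lines
            lexicon[fields[0]].append(fields[1:])
    return dict(lexicon)

sample = ['a ə', 'a ˌei', 'about b ˌaʊ t', 'about ə b ˈaʊ t', 'are ˈɑ ɹ']
lex = read_lexicon(sample)
multi = [w for w, prons in lex.items() if len(prons) > 1]
print(multi)  # → ['a', 'about']
```

With the real file, you could write `with open('data/lexicon.txt') as f: lex = read_lexicon(f)`.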
You will also use Laplace smoothing to train a unigram language model. Let's look at the language model training texts:
text = []
with open('data/languagemodeltexts.txt') as f:
    for line in f:
        text.append(line.strip())
print(' '.join(text))
when telling of nicholas the second our temptation is to start at its dramatic end july nineteen eighteen massacre of him his entire family his household help and personal physician by which a triumphant communist movement introduced its rule but there are more instructive stories about nicholas including his reaction to learning in november eighteen ninety four that his father alexander had died and that he nicholas was now emperor of all russia as told by simon sebag montenori in his book twenty six year old nicholai aleksandrovich romanov broke down in tears and ran to his sister what is going to happen to me to my family and to russia he cried on her shoulder i never wanted to be czar
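Laplace (add-one) smoothing adds one to every word's count. The estimated probability is therefore $P(w) = \frac{c(w)+1}{M+V}$, where:

- $c(w)$ is the count of $w$ in the training text,
- $M$ is the number of word tokens in the training text, and
- $V$ is the vocabulary size.

Smoothing matters here. Lexicon words such as woods, horse, and snow never occur in the training text above, so an unsmoothed model would give them zero probability.

Below is a minimal sketch. It makes three assumptions, which are not necessarily what mp5.py expects:

1. The vocabulary is the set of lexicon words.
2. WFST weights are negative log probabilities.
3. The unigram LM is a single state, 0, with one self-loop per word.

The function names are hypothetical.

```python
import math
from collections import Counter

def laplace_unigram(words, vocabulary):
    """Add-one (Laplace) smoothed unigram probabilities over `vocabulary`.

    Assumes every word in `words` also appears in `vocabulary`, so the
    probabilities sum to one.
    """
    counts = Counter(words)
    total = len(words) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}

def unigram_to_wfst(probs):
    """Hypothetical: one-state WFST with a self-loop on state 0 per word.

    Transitions are (prev, input, output, weight, next) tuples, with the
    weight assumed to be the negative log probability.
    """
    transitions = [(0, w, w, -math.log(p), 0) for w, p in sorted(probs.items())]
    final_states = [0]
    return transitions, final_states

vocab = {'the', 'woods', 'snow'}
probs = laplace_unigram('the the snow'.split(), vocab)
G, Gfinal = unigram_to_wfst(probs)
print(probs['woods'])  # → 0.16666666666666666 (unseen word still gets 1/6)
```

With the real data, `words` would be the whitespace-separated tokens of the training text, and `vocabulary` would be the set of words in data/lexicon.txt.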
The first thing you need to do is to read in the lexicon and the transcript in the form of WFSTs, and to train a WFST to represent the language model. Remember that a WFST is composed of six things:
This MP will treat most of these components as trivial:
We will assume that "creating a WFST" is synonymous with "creating a list of transitions." Each transition, $t$, needs to be a Python tuple containing the following five elements, in order:
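The printed solutions below give a clue to the order of the five elements. Examples are `[0, 'h', 'h', 0, 1]` in the transcript WFST and `[1, '', 'a', 0, 0]` in the lexicon WFST. Judging from these, the elements appear to be:

1. the previous state $p$,
2. the input label $i$,
3. the output label $o$,
4. the weight $w$, and
5. the next state $n$.

The empty string '' stands for epsilon.

The short sketch below unpacks a transition under that assumption. It also shows why the solutions print as lists rather than tuples: JSON has no tuple type, so `json.load` returns lists.

```python
import json

# A transition, assumed to be ordered as (prev_state, input, output, weight, next_state).
t = (0, 'h', 'h', 0, 1)
p, i, o, w, n = t
print(f"state {p} --{i}:{o}/{w}--> state {n}")  # → state 0 --h:h/0--> state 1

# JSON has no tuple type, so a round trip through JSON yields a list.
t_json = json.loads(json.dumps(t))
print(t_json, t_json == list(t), t_json == t)  # → [0, 'h', 'h', 0, 1] True False
```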
First, let's start with the easiest WFST: the one representing the input transcription. The number of transitions should exactly equal $N$, the number of phoneme symbols in the file data/transcript.txt. These transitions should go through a sequence of $N+1$ states, starting with $p[0]=0$ and ending with $n[N-1]=N$. The input and output labels are the same ($i[t]=o[t]$), and the weights are all zero ($w[t]=0$). So transition $t$, for $t=0,\ldots,N-1$, should look like $(t, S[t], S[t], 0, t+1)$, where $S$ is the list of phonemes read in above.
The initial state is $q=0$, and the final state is $q=N$. The initial state isn't stored anywhere; we just need to remember it. The final state is returned in a list of final states (Tfinal), as printed below.
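Below is a minimal sketch of the linear-chain construction just described. It takes the phoneme list S rather than a filename. The name `transcript_to_wfst` is hypothetical; this is not the `todo_transcript2wfst` function you must write in mp5.py. It returns tuples, which compare equal to the JSON solution only after conversion to lists.

```python
def transcript_to_wfst(phones):
    """Hypothetical: build a linear-chain WFST from a list of phoneme strings.

    Transition t goes from state t to state t+1, with input and output
    labels both equal to phones[t] and weight 0.
    """
    transitions = [(t, ph, ph, 0, t + 1) for t, ph in enumerate(phones)]
    final_states = [len(phones)]  # the initial state is implicitly 0
    return transitions, final_states

T_demo, Tfinal_demo = transcript_to_wfst(['h', 'ˈu', 'z'])
print(T_demo)       # → [(0, 'h', 'h', 0, 1), (1, 'ˈu', 'ˈu', 0, 2), (2, 'z', 'z', 0, 3)]
print(Tfinal_demo)  # → [3]
```

Applied to the full transcript, `[list(t) for t in transcript_to_wfst(S)[0]]` should match the printed `T` below, and the final-state list should be `[342]`.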
importlib.reload(mp5)
#T, Tfinal =mp5.todo_transcript2wfst('data/transcript.txt')
T = solutions['T']
Tfinal = solutions['Tfinal']
print(Tfinal)
print(T)
[342] [[0, 'h', 'h', 0, 1], [1, 'ˈu', 'ˈu', 0, 2], [2, 'z', 'z', 0, 3], [3, 'w', 'w', 0, 4], [4, 'ˈʊ', 'ˈʊ', 0, 5], [5, 'd', 'd', 0, 6], [6, 'z', 'z', 0, 7], [7, 'ð', 'ð', 0, 8], [8, 'ˈi', 'ˈi', 0, 9], [9, 'z', 'z', 0, 10], [10, 'ˈɑ', 'ˈɑ', 0, 11], [11, 'ɹ', 'ɹ', 0, 12], [12, 'ˈɑɪ', 'ˈɑɪ', 0, 13], [13, 'ɵ', 'ɵ', 0, 14], [14, 'ˈɪ', 'ˈɪ', 0, 15], [15, 'ŋ', 'ŋ', 0, 16], [16, 'k', 'k', 0, 17], [17, 'ˈɑɪ', 'ˈɑɪ', 0, 18], [18, 'n', 'n', 0, 19], [19, 'ˈoʊ', 'ˈoʊ', 0, 20], [20, 'h', 'h', 0, 21], [21, 'ɪ', 'ɪ', 0, 22], [22, 'z', 'z', 0, 23], [23, 'h', 'h', 0, 24], [24, 'ˈaʊ', 'ˈaʊ', 0, 25], [25, 's', 's', 0, 26], [26, 'ɪ', 'ɪ', 0, 27], [27, 'z', 'z', 0, 28], [28, 'ˈɪ', 'ˈɪ', 0, 29], [29, 'n', 'n', 0, 30], [30, 'ð', 'ð', 0, 31], [31, 'ə', 'ə', 0, 32], [32, 'v', 'v', 0, 33], [33, 'ˈɪ', 'ˈɪ', 0, 34], [34, 'l', 'l', 0, 35], [35, 'ɪ', 'ɪ', 0, 36], [36, 'dʒ', 'dʒ', 0, 37], [37, 'ð', 'ð', 0, 38], [38, 'ˈoʊ', 'ˈoʊ', 0, 39], [39, 'h', 'h', 0, 40], [40, 'ˈi', 'ˈi', 0, 41], [41, 'w', 'w', 0, 42], [42, 'ɪ', 'ɪ', 0, 43], [43, 'l', 'l', 0, 44], [44, 'n', 'n', 0, 45], [45, 'ˈɑ', 'ˈɑ', 0, 46], [46, 't', 't', 0, 47], [47, 's', 's', 0, 48], [48, 'ˈi', 'ˈi', 0, 49], [49, 'm', 'm', 0, 50], [50, 'ˈi', 'ˈi', 0, 51], [51, 's', 's', 0, 52], [52, 't', 't', 0, 53], [53, 'ˈɑ', 'ˈɑ', 0, 54], [54, 'p', 'p', 0, 55], [55, 'ɪ', 'ɪ', 0, 56], [56, 'ŋ', 'ŋ', 0, 57], [57, 'h', 'h', 0, 58], [58, 'ˈɪ', 'ˈɪ', 0, 59], [59, 'ɹ', 'ɹ', 0, 60], [60, 't', 't', 0, 61], [61, 'ˈoʊ', 'ˈoʊ', 0, 62], [62, 'w', 'w', 0, 63], [63, 'ˈɑ', 'ˈɑ', 0, 64], [64, 'tʃ', 'tʃ', 0, 65], [65, 'h', 'h', 0, 66], [66, 'ɪ', 'ɪ', 0, 67], [67, 'z', 'z', 0, 68], [68, 'w', 'w', 0, 69], [69, 'ˈʊ', 'ˈʊ', 0, 70], [70, 'd', 'd', 0, 71], [71, 'z', 'z', 0, 72], [72, 'f', 'f', 0, 73], [73, 'ˈɪ', 'ˈɪ', 0, 74], [74, 'l', 'l', 0, 75], [75, 'ˈʌ', 'ˈʌ', 0, 76], [76, 'p', 'p', 0, 77], [77, 'w', 'w', 0, 78], [78, 'ɪ', 'ɪ', 0, 79], [79, 'ð', 'ð', 0, 80], [80, 's', 's', 0, 81], [81, 'n', 'n', 0, 82], [82, 'ˈoʊ', 'ˈoʊ', 0, 83], [83, 'm', 'm', 0, 84], [84, 'ˈi', 
'ˈi', 0, 85], [85, 'l', 'l', 0, 86], [86, 'ˈɪ', 'ˈɪ', 0, 87], [87, 't', 't', 0, 88], [88, 'l̩', 'l̩', 0, 89], [89, 'h', 'h', 0, 90], [90, 'ˈɔ', 'ˈɔ', 0, 91], [91, 'ɹ', 'ɹ', 0, 92], [92, 's', 's', 0, 93], [93, 'm', 'm', 0, 94], [94, 'ə', 'ə', 0, 95], [95, 's', 's', 0, 96], [96, 't', 't', 0, 97], [97, 'ɵ', 'ɵ', 0, 98], [98, 'ˈɪ', 'ˈɪ', 0, 99], [99, 'ŋ', 'ŋ', 0, 100], [100, 'k', 'k', 0, 101], [101, 'ɪ', 'ɪ', 0, 102], [102, 't', 't', 0, 103], [103, 'k', 'k', 0, 104], [104, 'w', 'w', 0, 105], [105, 'ˈɪ', 'ˈɪ', 0, 106], [106, 'ɹ', 'ɹ', 0, 107], [107, 't', 't', 0, 108], [108, 'ˈoʊ', 'ˈoʊ', 0, 109], [109, 's', 's', 0, 110], [110, 't', 't', 0, 111], [111, 'ˈɑ', 'ˈɑ', 0, 112], [112, 'p', 'p', 0, 113], [113, 'w', 'w', 0, 114], [114, 'ɪ', 'ɪ', 0, 115], [115, 'ð', 'ð', 0, 116], [116, 'ˈaʊ', 'ˈaʊ', 0, 117], [117, 't', 't', 0, 118], [118, 'ə', 'ə', 0, 119], [119, 'f', 'f', 0, 120], [120, 'ˈɑ', 'ˈɑ', 0, 121], [121, 'ɹ', 'ɹ', 0, 122], [122, 'm', 'm', 0, 123], [123, 'h', 'h', 0, 124], [124, 'ˈaʊ', 'ˈaʊ', 0, 125], [125, 's', 's', 0, 126], [126, 'n', 'n', 0, 127], [127, 'ˌi', 'ˌi', 0, 128], [128, 'ɹ', 'ɹ', 0, 129], [129, 'b', 'b', 0, 130], [130, 'i', 'i', 0, 131], [131, 't', 't', 0, 132], [132, 'w', 'w', 0, 133], [133, 'ˈi', 'ˈi', 0, 134], [134, 'n', 'n', 0, 135], [135, 'ð', 'ð', 0, 136], [136, 'ə', 'ə', 0, 137], [137, 'w', 'w', 0, 138], [138, 'ˈʊ', 'ˈʊ', 0, 139], [139, 'd', 'd', 0, 140], [140, 'z', 'z', 0, 141], [141, 'ə', 'ə', 0, 142], [142, 'n', 'n', 0, 143], [143, 'd', 'd', 0, 144], [144, 'f', 'f', 0, 145], [145, 'ɹ', 'ɹ', 0, 146], [146, 'ˈoʊ', 'ˈoʊ', 0, 147], [147, 'z', 'z', 0, 148], [148, 'n̩', 'n̩', 0, 149], [149, 'l', 'l', 0, 150], [150, 'ˈei', 'ˈei', 0, 151], [151, 'k', 'k', 0, 152], [152, 'ð', 'ð', 0, 153], [153, 'ə', 'ə', 0, 154], [154, 'd', 'd', 0, 155], [155, 'ˈɑ', 'ˈɑ', 0, 156], [156, 'ɹ', 'ɹ', 0, 157], [157, 'k', 'k', 0, 158], [158, 'ə', 'ə', 0, 159], [159, 's', 's', 0, 160], [160, 't', 't', 0, 161], [161, 'ˈi', 'ˈi', 0, 162], [162, 'v', 'v', 0, 163], [163, 'n', 'n', 0, 
164], [164, 'ɪ', 'ɪ', 0, 165], [165, 'ŋ', 'ŋ', 0, 166], [166, 'ə', 'ə', 0, 167], [167, 'v', 'v', 0, 168], [168, 'ð', 'ð', 0, 169], [169, 'ə', 'ə', 0, 170], [170, 'j', 'j', 0, 171], [171, 'ˌi', 'ˌi', 0, 172], [172, 'ɹ', 'ɹ', 0, 173], [173, 'h', 'h', 0, 174], [174, 'ˈi', 'ˈi', 0, 175], [175, 'g', 'g', 0, 176], [176, 'ˈɪ', 'ˈɪ', 0, 177], [177, 'v', 'v', 0, 178], [178, 'z', 'z', 0, 179], [179, 'h', 'h', 0, 180], [180, 'ɪ', 'ɪ', 0, 181], [181, 'z', 'z', 0, 182], [182, 'h', 'h', 0, 183], [183, 'ˈɑ', 'ˈɑ', 0, 184], [184, 'ɹ', 'ɹ', 0, 185], [185, 'n', 'n', 0, 186], [186, 'ɪ', 'ɪ', 0, 187], [187, 's', 's', 0, 188], [188, 'b', 'b', 0, 189], [189, 'ˈɛ', 'ˈɛ', 0, 190], [190, 'l', 'l', 0, 191], [191, 'z', 'z', 0, 192], [192, 'ə', 'ə', 0, 193], [193, 'ʃ', 'ʃ', 0, 194], [194, 'ˈei', 'ˈei', 0, 195], [195, 'k', 'k', 0, 196], [196, 't', 't', 0, 197], [197, 'ˈoʊ', 'ˈoʊ', 0, 198], [198, 'ˈæ', 'ˈæ', 0, 199], [199, 's', 's', 0, 200], [200, 'k', 'k', 0, 201], [201, 'ɪ', 'ɪ', 0, 202], [202, 'f', 'f', 0, 203], [203, 'ð', 'ð', 0, 204], [204, 'ˈɛ', 'ˈɛ', 0, 205], [205, 'ɹ', 'ɹ', 0, 206], [206, 'ɪ', 'ɪ', 0, 207], [207, 'z', 'z', 0, 208], [208, 's', 's', 0, 209], [209, 'ˈʌ', 'ˈʌ', 0, 210], [210, 'm', 'm', 0, 211], [211, 'm', 'm', 0, 212], [212, 'ɪ', 'ɪ', 0, 213], [213, 's', 's', 0, 214], [214, 't', 't', 0, 215], [215, 'ˈei', 'ˈei', 0, 216], [216, 'k', 'k', 0, 217], [217, 'ð', 'ð', 0, 218], [218, 'ə', 'ə', 0, 219], [219, 'ˈoʊ', 'ˈoʊ', 0, 220], [220, 'n', 'n', 0, 221], [221, 'l', 'l', 0, 222], [222, 'i', 'i', 0, 223], [223, 'ˈʌ', 'ˈʌ', 0, 224], [224, 'ð', 'ð', 0, 225], [225, 'ɚ', 'ɚ', 0, 226], [226, 's', 's', 0, 227], [227, 'ˈaʊ', 'ˈaʊ', 0, 228], [228, 'n', 'n', 0, 229], [229, 'd', 'd', 0, 230], [230, 'z', 'z', 0, 231], [231, 'ð', 'ð', 0, 232], [232, 'ə', 'ə', 0, 233], [233, 's', 's', 0, 234], [234, 'w', 'w', 0, 235], [235, 'ˈi', 'ˈi', 0, 236], [236, 'p', 'p', 0, 237], [237, 'ə', 'ə', 0, 238], [238, 'v', 'v', 0, 239], [239, 'ˈi', 'ˈi', 0, 240], [240, 'z', 'z', 0, 241], [241, 'i', 'i', 0, 242], 
[242, 'w', 'w', 0, 243], [243, 'ˈɪ', 'ˈɪ', 0, 244], [244, 'n', 'n', 0, 245], [245, 'd', 'd', 0, 246], [246, 'ə', 'ə', 0, 247], [247, 'n', 'n', 0, 248], [248, 'd', 'd', 0, 249], [249, 'd', 'd', 0, 250], [250, 'ˈaʊ', 'ˈaʊ', 0, 251], [251, 'n', 'n', 0, 252], [252, 'i', 'i', 0, 253], [253, 'f', 'f', 0, 254], [254, 'l', 'l', 0, 255], [255, 'ˈei', 'ˈei', 0, 256], [256, 'k', 'k', 0, 257], [257, 'ð', 'ð', 0, 258], [258, 'ə', 'ə', 0, 259], [259, 'w', 'w', 0, 260], [260, 'ˈʊ', 'ˈʊ', 0, 261], [261, 'd', 'd', 0, 262], [262, 'z', 'z', 0, 263], [263, 'ˈɑ', 'ˈɑ', 0, 264], [264, 'ɹ', 'ɹ', 0, 265], [265, 'l', 'l', 0, 266], [266, 'ˈʌ', 'ˈʌ', 0, 267], [267, 'v', 'v', 0, 268], [268, 'l', 'l', 0, 269], [269, 'i', 'i', 0, 270], [270, 'd', 'd', 0, 271], [271, 'ˈɑ', 'ˈɑ', 0, 272], [272, 'ɹ', 'ɹ', 0, 273], [273, 'k', 'k', 0, 274], [274, 'ə', 'ə', 0, 275], [275, 'n', 'n', 0, 276], [276, 'd', 'd', 0, 277], [277, 'd', 'd', 0, 278], [278, 'ˈi', 'ˈi', 0, 279], [279, 'p', 'p', 0, 280], [280, 'b', 'b', 0, 281], [281, 'ə', 'ə', 0, 282], [282, 't', 't', 0, 283], [283, 'ˈɑɪ', 'ˈɑɪ', 0, 284], [284, 'h', 'h', 0, 285], [285, 'ˈæ', 'ˈæ', 0, 286], [286, 'v', 'v', 0, 287], [287, 'p', 'p', 0, 288], [288, 'ɹ', 'ɹ', 0, 289], [289, 'ˈɑ', 'ˈɑ', 0, 290], [290, 'm', 'm', 0, 291], [291, 'ə', 'ə', 0, 292], [292, 's', 's', 0, 293], [293, 'ə', 'ə', 0, 294], [294, 'z', 'z', 0, 295], [295, 't', 't', 0, 296], [296, 'ˈoʊ', 'ˈoʊ', 0, 297], [297, 'k', 'k', 0, 298], [298, 'ˈi', 'ˈi', 0, 299], [299, 'p', 'p', 0, 300], [300, 'ə', 'ə', 0, 301], [301, 'n', 'n', 0, 302], [302, 'd', 'd', 0, 303], [303, 'm', 'm', 0, 304], [304, 'ˈɑɪ', 'ˈɑɪ', 0, 305], [305, 'l̩', 'l̩', 0, 306], [306, 'z', 'z', 0, 307], [307, 't', 't', 0, 308], [308, 'ˈoʊ', 'ˈoʊ', 0, 309], [309, 'g', 'g', 0, 310], [310, 'ˈoʊ', 'ˈoʊ', 0, 311], [311, 'b', 'b', 0, 312], [312, 'ɪ', 'ɪ', 0, 313], [313, 'f', 'f', 0, 314], [314, 'ˈoʊ', 'ˈoʊ', 0, 315], [315, 'ɹ', 'ɹ', 0, 316], [316, 'ˈɑɪ', 'ˈɑɪ', 0, 317], [317, 's', 's', 0, 318], [318, 'l', 'l', 0, 319], [319, 'ˈi', 'ˈi', 
0, 320], [320, 'p', 'p', 0, 321], [321, 'ə', 'ə', 0, 322], [322, 'n', 'n', 0, 323], [323, 'd', 'd', 0, 324], [324, 'm', 'm', 0, 325], [325, 'ˈɑɪ', 'ˈɑɪ', 0, 326], [326, 'l̩', 'l̩', 0, 327], [327, 'z', 'z', 0, 328], [328, 't', 't', 0, 329], [329, 'ˈoʊ', 'ˈoʊ', 0, 330], [330, 'g', 'g', 0, 331], [331, 'ˈoʊ', 'ˈoʊ', 0, 332], [332, 'b', 'b', 0, 333], [333, 'ɪ', 'ɪ', 0, 334], [334, 'f', 'f', 0, 335], [335, 'ˈoʊ', 'ˈoʊ', 0, 336], [336, 'ɹ', 'ɹ', 0, 337], [337, 'ˈɑɪ', 'ˈɑɪ', 0, 338], [338, 's', 's', 0, 339], [339, 'l', 'l', 0, 340], [340, 'ˈi', 'ˈi', 0, 341], [341, 'p', 'p', 0, 342]]
Now let's read the lexicon. Assuming there are no homophones (no pairs of words that sound the same), we can construct a deterministic WFST as follows:
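The printed solution further below suggests one way to build this WFST: as a prefix tree (trie). For each pronunciation:

1. Start at state 0 and follow existing phoneme transitions as far as they match.
2. Create a new state for each remaining phoneme, with output label '' and weight 0.
3. Add an epsilon-input transition that outputs the word and returns to state 0, which is also the only final state.

The sketch below is inferred from that printed output rather than taken from mp5.py. The name `lexicon_to_wfst` is hypothetical, and it takes (word, phones) pairs instead of a filename.

```python
def lexicon_to_wfst(entries):
    """Hypothetical: build a prefix-tree lexicon WFST.

    entries: iterable of (word, [phone, ...]) pairs, in lexicon order.
    Returns (transitions, final_states); transitions are
    (prev, input, output, weight, next) tuples, with '' meaning epsilon.
    """
    transitions = []
    arcs = {}        # (state, phone) -> next state, used for prefix sharing
    num_states = 1   # state 0 is both the initial and the final state
    for word, phones in entries:
        state = 0
        for ph in phones:
            if (state, ph) in arcs:       # reuse an existing shared prefix
                state = arcs[(state, ph)]
            else:                         # branch off into a new state
                arcs[(state, ph)] = num_states
                transitions.append((state, ph, '', 0, num_states))
                state = num_states
                num_states += 1
        # Emit the word on an epsilon-input transition back to state 0.
        transitions.append((state, '', word, 0, 0))
    return transitions, [0]

entries = [('a', ['ə']), ('a', ['ˌei']),
           ('about', ['b', 'ˌaʊ', 't']), ('about', ['ə', 'b', 'ˈaʊ', 't'])]
L_demo, Lfinal_demo = lexicon_to_wfst(entries)
for t in L_demo:
    print(t)
```

These twelve transitions match the first twelve in the printed solution below. Note that the fourth entry branches off state 1, which it shares with "a ə".

This lexicon does contain a few homophones: me and my are both m ˈi, and are and our are both ˈɑ ɹ. The printed solution handles them by giving the shared end state a second epsilon-input transition (e.g., `[313, '', 'me', 0, 0]` and `[313, '', 'my', 0, 0]`). This sketch does the same.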
So if the first three entries in the lexicon were "a ə", "a ˌei", and "about ə b ˈaʊ t", then the first several transitions in the WFST would be:
Here's what that looks like. Actually, the third entry in the lexicon is "about b ˌaʊ t", not "about ə b ˈaʊ t" (that pronunciation is the fourth entry), so the result is a little different from the one listed above:
importlib.reload(mp5)
#L, Lfinal = mp5.todo_lexicon2wfst('data/lexicon.txt')
L = solutions['L']
Lfinal = solutions['Lfinal']
print(Lfinal)
print(L)
[0] [[0, 'ə', '', 0, 1], [1, '', 'a', 0, 0], [0, 'ˌei', '', 0, 2], [2, '', 'a', 0, 0], [0, 'b', '', 0, 3], [3, 'ˌaʊ', '', 0, 4], [4, 't', '', 0, 5], [5, '', 'about', 0, 0], [1, 'b', '', 0, 6], [6, 'ˈaʊ', '', 0, 7], [7, 't', '', 0, 8], [8, '', 'about', 0, 0], [0, 'ˌɑ', '', 0, 9], [9, 'l', '', 0, 10], [10, 'ɛ', '', 0, 11], [11, 'k', '', 0, 12], [12, 's', '', 0, 13], [13, 'ˈɑ', '', 0, 14], [14, 'n', '', 0, 15], [15, 'd', '', 0, 16], [16, 'ɹ', '', 0, 17], [17, 'ɑ', '', 0, 18], [18, 'v', '', 0, 19], [19, 'ɪ', '', 0, 20], [20, 'tʃ', '', 0, 21], [21, '', 'aleksandrovich', 0, 0], [0, 'ˌæ', '', 0, 22], [22, 'l', '', 0, 23], [23, 'ɪ', '', 0, 24], [24, 'g', '', 0, 25], [25, 'z', '', 0, 26], [26, 'ˈæ', '', 0, 27], [27, 'n', '', 0, 28], [28, 'd', '', 0, 29], [29, 'ɚ', '', 0, 30], [30, '', 'alexander', 0, 0], [29, 'ə', '', 0, 31], [31, 'ɹ', '', 0, 32], [32, '', 'alexander', 0, 0], [0, 'ɑ', '', 0, 33], [33, 'l', '', 0, 34], [34, '', 'all', 0, 0], [0, 'ˈɔ', '', 0, 35], [35, 'l', '', 0, 36], [36, '', 'all', 0, 0], [1, 'n', '', 0, 37], [37, 'd', '', 0, 38], [38, '', 'and', 0, 0], [0, 'ˈæ', '', 0, 39], [39, 'n', '', 0, 40], [40, 'd', '', 0, 41], [41, '', 'and', 0, 0], [0, 'ˈɑ', '', 0, 42], [42, 'ɹ', '', 0, 43], [43, '', 'are', 0, 0], [39, 'z', '', 0, 44], [44, '', 'as', 0, 0], [0, 'ˈɛ', '', 0, 45], [45, 'z', '', 0, 46], [46, '', 'as', 0, 0], [39, 's', '', 0, 47], [47, 'k', '', 0, 48], [48, '', 'ask', 0, 0], [1, 't', '', 0, 49], [49, '', 'at', 0, 0], [39, 't', '', 0, 50], [50, '', 'at', 0, 0], [3, 'i', '', 0, 51], [51, '', 'be', 0, 0], [3, 'ˈei', '', 0, 52], [52, '', 'be', 0, 0], [3, 'ˈi', '', 0, 53], [53, '', 'be', 0, 0], [3, 'ɪ', '', 0, 54], [54, 'f', '', 0, 55], [55, 'ˈoʊ', '', 0, 56], [56, 'ɹ', '', 0, 57], [57, '', 'before', 0, 0], [55, 'ˈɔ', '', 0, 58], [58, 'ɹ', '', 0, 59], [59, '', 'before', 0, 0], [3, 'ˌi', '', 0, 60], [60, 'f', '', 0, 61], [61, 'ˈɔ', '', 0, 62], [62, 'ɹ', '', 0, 63], [63, '', 'before', 0, 0], [3, 'ˈɛ', '', 0, 64], [64, 'l', '', 0, 65], [65, 'z', '', 0, 66], 
[66, '', 'bells', 0, 0], [51, 't', '', 0, 67], [67, 'w', '', 0, 68], [68, 'ˈi', '', 0, 69], [69, 'n', '', 0, 70], [70, '', 'between', 0, 0], [54, 't', '', 0, 71], [71, 'w', '', 0, 72], [72, 'ˈi', '', 0, 73], [73, 'n', '', 0, 74], [74, '', 'between', 0, 0], [3, 'ˈʊ', '', 0, 75], [75, 'k', '', 0, 76], [76, '', 'book', 0, 0], [3, 'ɹ', '', 0, 77], [77, 'ˈoʊ', '', 0, 78], [78, 'k', '', 0, 79], [79, '', 'broke', 0, 0], [3, 'ə', '', 0, 80], [80, 't', '', 0, 81], [81, '', 'but', 0, 0], [3, 'ˈʌ', '', 0, 82], [82, 't', '', 0, 83], [83, '', 'but', 0, 0], [3, 'ˈɑɪ', '', 0, 84], [84, '', 'by', 0, 0], [0, 'k', '', 0, 85], [85, 'ˈɑ', '', 0, 86], [86, 'm', '', 0, 87], [87, 'j', '', 0, 88], [88, 'ə', '', 0, 89], [89, 'n', '', 0, 90], [90, 'ɪ', '', 0, 91], [91, 's', '', 0, 92], [92, 't', '', 0, 93], [93, '', 'communist', 0, 0], [90, 'ə', '', 0, 94], [94, 's', '', 0, 95], [95, 't', '', 0, 96], [96, '', 'communist', 0, 0], [85, 'ɹ', '', 0, 97], [97, 'ˈɑɪ', '', 0, 98], [98, 'd', '', 0, 99], [99, '', 'cried', 0, 0], [0, 'z', '', 0, 100], [100, 'ˈɑ', '', 0, 101], [101, 'ɹ', '', 0, 102], [102, '', 'czar', 0, 0], [0, 'd', '', 0, 103], [103, 'ˈɑ', '', 0, 104], [104, 'ɹ', '', 0, 105], [105, 'k', '', 0, 106], [106, '', 'dark', 0, 0], [106, 'ə', '', 0, 107], [107, 's', '', 0, 108], [108, 't', '', 0, 109], [109, '', 'darkest', 0, 0], [103, 'ˈi', '', 0, 110], [110, 'p', '', 0, 111], [111, '', 'deep', 0, 0], [103, 'ˈɑɪ', '', 0, 112], [112, 'd', '', 0, 113], [113, '', 'died', 0, 0], [103, 'ˈaʊ', '', 0, 114], [114, 'n', '', 0, 115], [115, '', 'down', 0, 0], [115, 'i', '', 0, 116], [116, '', 'downy', 0, 0], [103, 'ɹ', '', 0, 117], [117, 'ə', '', 0, 118], [118, 'm', '', 0, 119], [119, 'ˈæ', '', 0, 120], [120, 'ɾ', '', 0, 121], [121, 'ɪ', '', 0, 122], [122, 'k', '', 0, 123], [123, '', 'dramatic', 0, 0], [0, 'ˈi', '', 0, 124], [124, 'z', '', 0, 125], [125, 'i', '', 0, 126], [126, '', 'easy', 0, 0], [0, 'ˈei', '', 0, 127], [127, 't', '', 0, 128], [128, 'ˈi', '', 0, 129], [129, 'n', '', 0, 130], [130, 
'', 'eighteen', 0, 0], [45, 'm', '', 0, 131], [131, 'p', '', 0, 132], [132, 'ɚ', '', 0, 133], [133, 'ɚ', '', 0, 134], [134, '', 'emperor', 0, 0], [132, 'ə', '', 0, 135], [135, 'ɹ', '', 0, 136], [136, 'ə', '', 0, 137], [137, 'ɹ', '', 0, 138], [138, '', 'emperor', 0, 0], [45, 'n', '', 0, 139], [139, 'd', '', 0, 140], [140, '', 'end', 0, 0], [0, 'ɛ', '', 0, 141], [141, 'n', '', 0, 142], [142, 't', '', 0, 143], [143, 'ˈɑɪ', '', 0, 144], [144, 'ɚ', '', 0, 145], [145, '', 'entire', 0, 0], [0, 'ɪ', '', 0, 146], [146, 'n', '', 0, 147], [147, 't', '', 0, 148], [148, 'ˈɑɪ', '', 0, 149], [149, 'ɚ', '', 0, 150], [150, '', 'entire', 0, 0], [0, 'ˌɛ', '', 0, 151], [151, 'n', '', 0, 152], [152, 't', '', 0, 153], [153, 'ˌɑɪ', '', 0, 154], [154, 'ɹ', '', 0, 155], [155, '', 'entire', 0, 0], [124, 'v', '', 0, 156], [156, 'n', '', 0, 157], [157, 'ɪ', '', 0, 158], [158, 'ŋ', '', 0, 159], [159, '', 'evening', 0, 0], [0, 'f', '', 0, 160], [160, 'ˈæ', '', 0, 161], [161, 'm', '', 0, 162], [162, 'ə', '', 0, 163], [163, 'l', '', 0, 164], [164, 'i', '', 0, 165], [165, '', 'family', 0, 0], [162, 'l', '', 0, 166], [166, 'i', '', 0, 167], [167, '', 'family', 0, 0], [160, 'ˈɑ', '', 0, 168], [168, 'ɹ', '', 0, 169], [169, 'm', '', 0, 170], [170, '', 'farm', 0, 0], [168, 'ð', '', 0, 171], [171, 'ɚ', '', 0, 172], [172, '', 'father', 0, 0], [171, 'ə', '', 0, 173], [173, 'ɹ', '', 0, 174], [174, '', 'father', 0, 0], [160, 'ˈɪ', '', 0, 175], [175, 'l', '', 0, 176], [176, '', 'fill', 0, 0], [160, 'l', '', 0, 177], [177, 'ˈei', '', 0, 178], [178, 'k', '', 0, 179], [179, '', 'flake', 0, 0], [160, 'oʊ', '', 0, 180], [180, 'ɹ', '', 0, 181], [181, '', 'four', 0, 0], [160, 'ˈɔ', '', 0, 182], [182, 'ɹ', '', 0, 183], [183, '', 'four', 0, 0], [160, 'ɹ', '', 0, 184], [184, 'ˈoʊ', '', 0, 185], [185, 'z', '', 0, 186], [186, 'n̩', '', 0, 187], [187, '', 'frozen', 0, 0], [0, 'g', '', 0, 188], [188, 'ˈɪ', '', 0, 189], [189, 'v', '', 0, 190], [190, 'z', '', 0, 191], [191, '', 'gives', 0, 0], [188, 'ˈoʊ', '', 0, 192], 
[192, '', 'go', 0, 0], [192, 'ɪ', '', 0, 193], [193, 'n', '', 0, 194], [194, '', 'going', 0, 0], [193, 'ŋ', '', 0, 195], [195, '', 'going', 0, 0], [0, 'h', '', 0, 196], [196, 'ˈæ', '', 0, 197], [197, 'd', '', 0, 198], [198, '', 'had', 0, 0], [197, 'p', '', 0, 199], [199, 'n̩', '', 0, 200], [200, '', 'happen', 0, 0], [196, 'ˈɑ', '', 0, 201], [201, 'ɹ', '', 0, 202], [202, 'n', '', 0, 203], [203, 'ɪ', '', 0, 204], [204, 's', '', 0, 205], [205, '', 'harness', 0, 0], [197, 'v', '', 0, 206], [206, '', 'have', 0, 0], [196, 'ˈi', '', 0, 207], [207, '', 'he', 0, 0], [196, 'ˈʌ', '', 0, 208], [208, '', 'he', 0, 0], [196, 'ˈɛ', '', 0, 209], [209, 'l', '', 0, 210], [210, 'p', '', 0, 211], [211, '', 'help', 0, 0], [196, 'ɚ', '', 0, 212], [212, 'ɹ', '', 0, 213], [213, '', 'her', 0, 0], [196, 'ˈɝ', '', 0, 214], [214, '', 'her', 0, 0], [196, 'ˈɪ', '', 0, 215], [215, 'ɹ', '', 0, 216], [216, '', 'here', 0, 0], [196, 'ɪ', '', 0, 217], [217, 'm', '', 0, 218], [218, '', 'him', 0, 0], [215, 'm', '', 0, 219], [219, '', 'him', 0, 0], [146, 'm', '', 0, 220], [220, '', 'him', 0, 0], [217, 'z', '', 0, 221], [221, '', 'his', 0, 0], [196, 'ˈɔ', '', 0, 222], [222, 'ɹ', '', 0, 223], [223, 's', '', 0, 224], [224, '', 'horse', 0, 0], [196, 'ˈaʊ', '', 0, 225], [225, 's', '', 0, 226], [226, '', 'house', 0, 0], [225, 'z', '', 0, 227], [227, '', 'house', 0, 0], [226, 'h', '', 0, 228], [228, 'ˌoʊ', '', 0, 229], [229, 'l', '', 0, 230], [230, 'd', '', 0, 231], [231, '', 'household', 0, 0], [0, 'ˈɑɪ', '', 0, 232], [232, '', 'i', 0, 0], [146, 'f', '', 0, 233], [233, '', 'if', 0, 0], [0, 'ˈɪ', '', 0, 234], [234, 'n', '', 0, 235], [235, '', 'in', 0, 0], [147, 'k', '', 0, 236], [236, 'l', '', 0, 237], [237, 'u', '', 0, 238], [238, 'd', '', 0, 239], [239, 'ɪ', '', 0, 240], [240, 'ŋ', '', 0, 241], [241, '', 'including', 0, 0], [237, 'ˈu', '', 0, 242], [242, 'd', '', 0, 243], [243, 'ɪ', '', 0, 244], [244, 'ŋ', '', 0, 245], [245, '', 'including', 0, 0], [147, 's', '', 0, 246], [246, 't', '', 0, 247], [247, 'ɹ', 
'', 0, 248], [248, 'ˈʌ', '', 0, 249], [249, 'k', '', 0, 250], [250, 't', '', 0, 251], [251, 'ɪ', '', 0, 252], [252, 'v', '', 0, 253], [253, '', 'instructive', 0, 0], [0, 'ˌɪ', '', 0, 254], [254, 'n', '', 0, 255], [255, 't', '', 0, 256], [256, 'ɹ', '', 0, 257], [257, 'oʊ', '', 0, 258], [258, 'd', '', 0, 259], [259, 'ˈu', '', 0, 260], [260, 's', '', 0, 261], [261, 't', '', 0, 262], [262, '', 'introduced', 0, 0], [257, 'ə', '', 0, 263], [263, 'd', '', 0, 264], [264, 'ˈu', '', 0, 265], [265, 's', '', 0, 266], [266, 't', '', 0, 267], [267, '', 'introduced', 0, 0], [146, 'z', '', 0, 268], [268, '', 'is', 0, 0], [146, 't', '', 0, 269], [269, '', 'it', 0, 0], [269, 's', '', 0, 270], [270, '', 'its', 0, 0], [0, 'dʒ', '', 0, 271], [271, 'u', '', 0, 272], [272, 'l', '', 0, 273], [273, 'ˈɑɪ', '', 0, 274], [274, '', 'july', 0, 0], [271, 'ə', '', 0, 275], [275, 'l', '', 0, 276], [276, 'ˈɑɪ', '', 0, 277], [277, '', 'july', 0, 0], [271, 'ˌu', '', 0, 278], [278, 'l', '', 0, 279], [279, 'ˈɑɪ', '', 0, 280], [280, '', 'july', 0, 0], [85, 'ˈi', '', 0, 281], [281, 'p', '', 0, 282], [282, '', 'keep', 0, 0], [0, 'n', '', 0, 283], [283, 'ˈoʊ', '', 0, 284], [284, '', 'know', 0, 0], [0, 'l', '', 0, 285], [285, 'ˈei', '', 0, 286], [286, 'k', '', 0, 287], [287, '', 'lake', 0, 0], [285, 'ˈɝ', '', 0, 288], [288, 'n', '', 0, 289], [289, 'ɪ', '', 0, 290], [290, 'ŋ', '', 0, 291], [291, '', 'learning', 0, 0], [288, 'ɹ', '', 0, 292], [292, 'n', '', 0, 293], [293, 'ɪ', '', 0, 294], [294, 'ŋ', '', 0, 295], [295, '', 'learning', 0, 0], [285, 'ˈɪ', '', 0, 296], [296, 't', '', 0, 297], [297, 'l̩', '', 0, 298], [298, '', 'little', 0, 0], [296, 'ɾ', '', 0, 299], [299, 'l̩', '', 0, 300], [300, '', 'little', 0, 0], [285, 'ˈʌ', '', 0, 301], [301, 'v', '', 0, 302], [302, 'l', '', 0, 303], [303, 'i', '', 0, 304], [304, '', 'lovely', 0, 0], [0, 'm', '', 0, 305], [305, 'ˈæ', '', 0, 306], [306, 's', '', 0, 307], [307, 'ə', '', 0, 308], [308, 'k', '', 0, 309], [309, 'ɚ', '', 0, 310], [310, '', 'massacre', 0, 0], 
[309, 'ə', '', 0, 311], [311, 'ɹ', '', 0, 312], [312, '', 'massacre', 0, 0], [305, 'ˈi', '', 0, 313], [313, '', 'me', 0, 0], [305, 'ˈɑɪ', '', 0, 314], [314, 'l̩', '', 0, 315], [315, 'z', '', 0, 316], [316, '', 'miles', 0, 0], [314, 'l', '', 0, 317], [317, 'z', '', 0, 318], [318, '', 'miles', 0, 0], [305, 'ɪ', '', 0, 319], [319, 's', '', 0, 320], [320, 't', '', 0, 321], [321, 'ˈei', '', 0, 322], [322, 'k', '', 0, 323], [323, '', 'mistake', 0, 0], [305, 'oʊ', '', 0, 324], [324, 'ɹ', '', 0, 325], [325, '', 'more', 0, 0], [305, 'ˈɔ', '', 0, 326], [326, 'ɹ', '', 0, 327], [327, '', 'more', 0, 0], [305, 'ˈu', '', 0, 328], [328, 'v', '', 0, 329], [329, 'm', '', 0, 330], [330, 'n̩', '', 0, 331], [331, 't', '', 0, 332], [332, '', 'movement', 0, 0], [305, 'ə', '', 0, 333], [333, 's', '', 0, 334], [334, 't', '', 0, 335], [335, '', 'must', 0, 0], [305, 'ˈʌ', '', 0, 336], [336, 's', '', 0, 337], [337, 't', '', 0, 338], [338, '', 'must', 0, 0], [313, '', 'my', 0, 0], [314, '', 'my', 0, 0], [283, 'ˌi', '', 0, 339], [339, 'ɹ', '', 0, 340], [340, '', 'near', 0, 0], [283, 'ˈɪ', '', 0, 341], [341, 'ɹ', '', 0, 342], [342, '', 'near', 0, 0], [283, 'ˈɛ', '', 0, 343], [343, 'v', '', 0, 344], [344, 'ɚ', '', 0, 345], [345, '', 'never', 0, 0], [344, 'ə', '', 0, 346], [346, 'ɹ', '', 0, 347], [347, '', 'never', 0, 0], [341, 'k', '', 0, 348], [348, 'ə', '', 0, 349], [349, 'l', '', 0, 350], [350, 'ə', '', 0, 351], [351, 's', '', 0, 352], [352, '', 'nicholas', 0, 0], [348, 'l', '', 0, 353], [353, 'ə', '', 0, 354], [354, 's', '', 0, 355], [355, '', 'nicholas', 0, 0], [283, 'ˈɑɪ', '', 0, 356], [356, 'n', '', 0, 357], [357, 't', '', 0, 358], [358, 'ˈi', '', 0, 359], [359, 'n', '', 0, 360], [360, '', 'nineteen', 0, 0], [358, 'i', '', 0, 361], [361, '', 'ninety', 0, 0], [283, 'ˈɑ', '', 0, 362], [362, 't', '', 0, 363], [363, '', 'not', 0, 0], [283, 'oʊ', '', 0, 364], [364, 'v', '', 0, 365], [365, 'ˈɛ', '', 0, 366], [366, 'm', '', 0, 367], [367, 'b', '', 0, 368], [368, 'ɚ', '', 0, 369], [369, '', 
'november', 0, 0], [368, 'ə', '', 0, 370], [370, 'ɹ', '', 0, 371], [371, '', 'november', 0, 0], [283, 'ˈaʊ', '', 0, 372], [372, '', 'now', 0, 0], [1, 'v', '', 0, 373], [373, '', 'of', 0, 0], [0, 'ˈoʊ', '', 0, 374], [374, 'l', '', 0, 375], [375, 'd', '', 0, 376], [376, '', 'old', 0, 0], [42, 'n', '', 0, 377], [377, '', 'on', 0, 0], [35, 'n', '', 0, 378], [378, '', 'on', 0, 0], [374, 'n', '', 0, 379], [379, 'l', '', 0, 380], [380, 'i', '', 0, 381], [381, '', 'only', 0, 0], [0, 'ˈʌ', '', 0, 382], [382, 'ð', '', 0, 383], [383, 'ɚ', '', 0, 384], [384, '', 'other', 0, 0], [383, 'ə', '', 0, 385], [385, 'ɹ', '', 0, 386], [386, '', 'other', 0, 0], [0, 'ˈaʊ', '', 0, 387], [387, 'ɚ', '', 0, 388], [388, '', 'our', 0, 0], [387, 'ɹ', '', 0, 389], [389, '', 'our', 0, 0], [43, '', 'our', 0, 0], [0, 'p', '', 0, 390], [390, 'ˈɝ', '', 0, 391], [391, 's', '', 0, 392], [392, 'ɪ', '', 0, 393], [393, 'n', '', 0, 394], [394, 'ɪ', '', 0, 395], [395, 'l', '', 0, 396], [396, '', 'personal', 0, 0], [391, 'ɹ', '', 0, 397], [397, 's', '', 0, 398], [398, 'ə', '', 0, 399], [399, 'n', '', 0, 400], [400, 'l̩', '', 0, 401], [401, '', 'personal', 0, 0], [160, 'ə', '', 0, 402], [402, 'z', '', 0, 403], [403, 'ˈɪ', '', 0, 404], [404, 'ʃ', '', 0, 405], [405, 'n̩', '', 0, 406], [406, '', 'physician', 0, 0], [160, 'ɪ', '', 0, 407], [407, 'z', '', 0, 408], [408, 'ˈɪ', '', 0, 409], [409, 'ʃ', '', 0, 410], [410, 'n̩', '', 0, 411], [411, '', 'physician', 0, 0], [390, 'ɹ', '', 0, 412], [412, 'ˈɑ', '', 0, 413], [413, 'm', '', 0, 414], [414, 'ə', '', 0, 415], [415, 's', '', 0, 416], [416, 'ə', '', 0, 417], [417, 'z', '', 0, 418], [418, '', 'promises', 0, 0], [85, 'w', '', 0, 419], [419, 'ˈɪ', '', 0, 420], [420, 'ɹ', '', 0, 421], [421, '', 'queer', 0, 0], [0, 'ɹ', '', 0, 422], [422, 'ɑ', '', 0, 423], [423, 'n', '', 0, 424], [424, '', 'ran', 0, 0], [422, 'ˈæ', '', 0, 425], [425, 'n', '', 0, 426], [426, '', 'ran', 0, 0], [422, 'i', '', 0, 427], [427, 'ˈæ', '', 0, 428], [428, 'k', '', 0, 429], [429, 'ʃ', '', 0, 430], 
[430, 'n̩', '', 0, 431], [431, '', 'reaction', 0, 0], [422, 'ˈoʊ', '', 0, 432], [432, 'm', '', 0, 433], [433, 'ə', '', 0, 434], [434, 'n', '', 0, 435], [435, 'ˌɔ', '', 0, 436], [436, 'f', '', 0, 437], [437, '', 'romanov', 0, 0], [436, 'v', '', 0, 438], [438, '', 'romanov', 0, 0], [422, 'ˈu', '', 0, 439], [439, 'l', '', 0, 440], [440, '', 'rule', 0, 0], [422, 'ˈʌ', '', 0, 441], [441, 'ʃ', '', 0, 442], [442, 'ə', '', 0, 443], [443, '', 'russia', 0, 0], [0, 's', '', 0, 444], [444, 'ˈɛ', '', 0, 445], [445, 'k', '', 0, 446], [446, 'n̩', '', 0, 447], [447, '', 'second', 0, 0], [447, 'd', '', 0, 448], [448, '', 'second', 0, 0], [444, 'ˈi', '', 0, 449], [449, '', 'see', 0, 0], [0, 'ʃ', '', 0, 450], [450, 'ˈei', '', 0, 451], [451, 'k', '', 0, 452], [452, '', 'shake', 0, 0], [450, 'ˈoʊ', '', 0, 453], [453, 'l', '', 0, 454], [454, 'd', '', 0, 455], [455, 'ɚ', '', 0, 456], [456, '', 'shoulder', 0, 0], [455, 'ə', '', 0, 457], [457, 'ɹ', '', 0, 458], [458, '', 'shoulder', 0, 0], [444, 'ˈɑɪ', '', 0, 459], [459, 'm', '', 0, 460], [460, 'n̩', '', 0, 461], [461, '', 'simon', 0, 0], [444, 'ˈɪ', '', 0, 462], [462, 's', '', 0, 463], [463, 't', '', 0, 464], [464, 'ɚ', '', 0, 465], [465, '', 'sister', 0, 0], [464, 'ə', '', 0, 466], [466, 'ɹ', '', 0, 467], [467, '', 'sister', 0, 0], [444, 'ɪ', '', 0, 468], [468, 'k', '', 0, 469], [469, 's', '', 0, 470], [470, '', 'six', 0, 0], [462, 'k', '', 0, 471], [471, 's', '', 0, 472], [472, '', 'six', 0, 0], [444, 'l', '', 0, 473], [473, 'ˈi', '', 0, 474], [474, 'p', '', 0, 475], [475, '', 'sleep', 0, 0], [444, 'n', '', 0, 476], [476, 'ˈoʊ', '', 0, 477], [477, '', 'snow', 0, 0], [444, 'ˈʌ', '', 0, 478], [478, 'm', '', 0, 479], [479, '', 'some', 0, 0], [444, 'ˈaʊ', '', 0, 480], [480, 'n', '', 0, 481], [481, 'd', '', 0, 482], [482, 'z', '', 0, 483], [483, '', 'sounds', 0, 0], [481, 'z', '', 0, 484], [484, '', 'sounds', 0, 0], [444, 't', '', 0, 485], [485, 'ɑ', '', 0, 486], [486, 't', '', 0, 487], [487, '', 'start', 0, 0], [485, 'ˈɑ', '', 0, 488], 
[488, 'ɹ', '', 0, 489], [489, 't', '', 0, 490], [490, '', 'start', 0, 0], [488, 'p', '', 0, 491], [491, '', 'stop', 0, 0], [491, 'ɪ', '', 0, 492], [492, 'ŋ', '', 0, 493], [493, '', 'stopping', 0, 0], [485, 'ˈɔ', '', 0, 494], [494, 'ɹ', '', 0, 495], [495, 'i', '', 0, 496], [496, 'z', '', 0, 497], [497, '', 'stories', 0, 0], [444, 'w', '', 0, 498], [498, 'ˈi', '', 0, 499], [499, 'p', '', 0, 500], [500, '', 'sweep', 0, 0], [0, 't', '', 0, 501], [501, 'ˈɛ', '', 0, 502], [502, 'ɹ', '', 0, 503], [503, 'z', '', 0, 504], [504, '', 'tears', 0, 0], [501, 'ˈɪ', '', 0, 505], [505, 'ɹ', '', 0, 506], [506, 'z', '', 0, 507], [507, '', 'tears', 0, 0], [502, 'l', '', 0, 508], [508, 'ɪ', '', 0, 509], [509, 'ŋ', '', 0, 510], [510, '', 'telling', 0, 0], [501, 'ɛ', '', 0, 511], [511, 'm', '', 0, 512], [512, 't', '', 0, 513], [513, 'ˈei', '', 0, 514], [514, 'ʃ', '', 0, 515], [515, 'n̩', '', 0, 516], [516, '', 'temptation', 0, 0], [512, 'p', '', 0, 517], [517, 't', '', 0, 518], [518, 'ˈei', '', 0, 519], [519, 'ʃ', '', 0, 520], [520, 'n̩', '', 0, 521], [521, '', 'temptation', 0, 0], [0, 'ð', '', 0, 522], [522, 'ə', '', 0, 523], [523, 't', '', 0, 524], [524, '', 'that', 0, 0], [522, 'ˈæ', '', 0, 525], [525, 't', '', 0, 526], [526, '', 'that', 0, 0], [523, '', 'the', 0, 0], [522, 'ˈi', '', 0, 527], [527, '', 'the', 0, 0], [522, 'ˈʌ', '', 0, 528], [528, '', 'the', 0, 0], [522, 'ˈɛ', '', 0, 529], [529, 'ɹ', '', 0, 530], [530, '', 'there', 0, 0], [527, 'z', '', 0, 531], [531, '', 'these', 0, 0], [0, 'ɵ', '', 0, 532], [532, 'ˈɪ', '', 0, 533], [533, 'ŋ', '', 0, 534], [534, 'k', '', 0, 535], [535, '', 'think', 0, 0], [522, 'ˈoʊ', '', 0, 536], [536, '', 'though', 0, 0], [501, 'ˈoʊ', '', 0, 537], [537, '', 'to', 0, 0], [501, 'ˈu', '', 0, 538], [538, '', 'to', 0, 0], [501, 'ˈʌ', '', 0, 539], [539, '', 'to', 0, 0], [537, 'l', '', 0, 540], [540, 'd', '', 0, 541], [541, '', 'told', 0, 0], [501, 'ɹ', '', 0, 542], [542, 'ɑɪ', '', 0, 543], [543, 'ˈʌ', '', 0, 544], [544, 'm', '', 0, 545], [545, 'f', '', 0, 
546], [546, 'n̩', '', 0, 547], [547, 't', '', 0, 548], [548, '', 'triumphant', 0, 0], [501, 'w', '', 0, 549], [549, 'ˈɛ', '', 0, 550], [550, 'n', '', 0, 551], [551, 'i', '', 0, 552], [552, '', 'twenty', 0, 0], [551, 't', '', 0, 553], [553, 'i', '', 0, 554], [554, '', 'twenty', 0, 0], [382, 'p', '', 0, 555], [555, '', 'up', 0, 0], [0, 'v', '', 0, 556], [556, 'ˈɪ', '', 0, 557], [557, 'l', '', 0, 558], [558, 'ɪ', '', 0, 559], [559, 'dʒ', '', 0, 560], [560, '', 'village', 0, 0], [0, 'w', '', 0, 561], [561, 'ˈɔ', '', 0, 562], [562, 'n', '', 0, 563], [563, 'ɪ', '', 0, 564], [564, 'd', '', 0, 565], [565, '', 'wanted', 0, 0], [561, 'ˈɑ', '', 0, 566], [566, 'n', '', 0, 567], [567, 't', '', 0, 568], [568, 'ə', '', 0, 569], [569, 'd', '', 0, 570], [570, '', 'wanted', 0, 0], [563, 't', '', 0, 571], [571, 'ɪ', '', 0, 572], [572, 'd', '', 0, 573], [573, '', 'wanted', 0, 0], [561, 'ə', '', 0, 574], [574, 'z', '', 0, 575], [575, '', 'was', 0, 0], [566, 'z', '', 0, 576], [576, '', 'was', 0, 0], [562, 'z', '', 0, 577], [577, '', 'was', 0, 0], [566, 'tʃ', '', 0, 578], [578, '', 'watch', 0, 0], [196, 'w', '', 0, 579], [579, 'ˈʌ', '', 0, 580], [580, 't', '', 0, 581], [581, '', 'what', 0, 0], [561, 'ˈʌ', '', 0, 582], [582, 't', '', 0, 583], [583, '', 'what', 0, 0], [561, 'ˌɑ', '', 0, 584], [584, 't', '', 0, 585], [585, '', 'what', 0, 0], [49, '', 'what', 0, 0], [579, 'ˈɛ', '', 0, 586], [586, 'n', '', 0, 587], [587, '', 'when', 0, 0], [579, 'ˈɪ', '', 0, 588], [588, 'n', '', 0, 589], [589, '', 'when', 0, 0], [561, 'ˈɛ', '', 0, 590], [590, 'n', '', 0, 591], [591, '', 'when', 0, 0], [561, 'ˈɪ', '', 0, 592], [592, 'n', '', 0, 593], [593, '', 'when', 0, 0], [142, '', 'when', 0, 0], [588, 'tʃ', '', 0, 594], [594, '', 'which', 0, 0], [592, 'tʃ', '', 0, 595], [595, '', 'which', 0, 0], [146, 'tʃ', '', 0, 596], [596, '', 'which', 0, 0], [196, 'ˈu', '', 0, 597], [597, 'z', '', 0, 598], [598, '', 'whose', 0, 0], [561, 'ɪ', '', 0, 599], [599, 'l', '', 0, 600], [600, '', 'will', 0, 0], [593, 'd', '', 
0, 601], [601, '', 'wind', 0, 0], [561, 'ˈɑɪ', '', 0, 602], [602, 'n', '', 0, 603], [603, 'd', '', 0, 604], [604, '', 'wind', 0, 0], [599, 'ð', '', 0, 605], [605, '', 'with', 0, 0], [599, 'ɵ', '', 0, 606], [606, '', 'with', 0, 0], [605, 'ˈaʊ', '', 0, 607], [607, 't', '', 0, 608], [608, '', 'without', 0, 0], [561, 'ˈʊ', '', 0, 609], [609, 'd', '', 0, 610], [610, 'z', '', 0, 611], [611, '', 'woods', 0, 0], [0, 'j', '', 0, 612], [612, 'ˌi', '', 0, 613], [613, 'ɹ', '', 0, 614], [614, '', 'year', 0, 0], [612, 'ˈɪ', '', 0, 615], [615, 'ɹ', '', 0, 616], [616, '', 'year', 0, 0], [305, 'ˌɔ', '', 0, 617], [617, 'n', '', 0, 618], [618, 't', '', 0, 619], [619, 'ə', '', 0, 620], [620, 'n', '', 0, 621], [621, 'ˈoʊ', '', 0, 622], [622, 'ɹ', '', 0, 623], [623, 'i', '', 0, 624], [624, '', 'montenori', 0, 0], [350, 'ɑɪ', '', 0, 625], [625, '', 'nicholai', 0, 0], [437, 's', '', 0, 626], [626, '', 'romanovs', 0, 0], [445, 'b', '', 0, 627], [627, 'æ', '', 0, 628], [628, 'g', '', 0, 629], [629, '', 'sebag', 0, 0]]
Finally, let's create the language model. We will use a unigram language model, meaning that the probability of each word is independent of the words around it. For example, the word sequence $\vec{o}=[o_1,o_2,o_3]$ is modeled as
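$$p(\vec{o}) = \prod_{k=1}^3 p(o_k)$$

This can be modeled as a WFST with just one state, state 0. Each transition $t$ goes from $p[t]=0$ to $n[t]=0$. The input label and the output label are the same: both equal the word. The weight on each edge is the negative natural log of the word's probability, $-\ln p(w)$. So, for example, the first several edges would be $[0, \text{a}, \text{a}, -\ln p(\text{a}), 0]$ and $[0, \text{about}, \text{about}, -\ln p(\text{about}), 0]$, and so on, as in the G printed below.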
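The following is a minimal sketch of the one-state unigram WFST described above. It assumes the `[prev, input, output, weight, next]` edge layout used by the printed G. The helper names `unigram_wfst` and `sequence_cost` are hypothetical and are not part of mp5.py. The probabilities are toy values, not the MP's data. The sketch illustrates why negative log weights are used: summing edge weights along a path is equivalent to multiplying the word probabilities.

```python
import math

def unigram_wfst(probs):
    """Build a one-state unigram WFST from a dict {word: probability}.

    Every edge is a self-loop on state 0 with input label == output label
    == word and weight -ln p(word); state 0 is the only final state.
    """
    edges = [[0, w, w, -math.log(p), 0] for w, p in probs.items()]
    return edges, [0]

def sequence_cost(edges, words):
    """Total weight of the path that reads `words` through the WFST."""
    weight = {e[1]: e[3] for e in edges}  # input label -> edge weight
    return sum(weight[w] for w in words)

# Toy probabilities, made up for illustration
probs = {'the': 0.5, 'woods': 0.25, 'snow': 0.25}
G_demo, Gfinal_demo = unigram_wfst(probs)
cost = sequence_cost(G_demo, ['the', 'snow', 'the'])
print(math.exp(-cost))  # approximately 0.5 * 0.25 * 0.5
```

Because the weights are negative logs, the most probable word sequence is the lowest-cost path, which is the quantity a shortest-path search over the WFST minimizes.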
The probabilities should be estimated from the file 'data/languagemodeltexts.txt' by counting the number of occurrences of each word and applying Laplace smoothing with a smoothing factor of $1$. So if $C(w)$ is the number of times word $w$ occurs in languagemodeltexts.txt, then
$$p(w) = \frac{1+C(w)}{\sum_{v\in V} (1+C(v))}$$

where $V$ is the set of all distinct words (all words with different orthographic representations) in the lexicon.
So that the autograder can grade your code, make sure the edges occur in the same order as the distinct words in the lexicon (that is, the order in which each word first appears as an output label of the lexicon).
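The next block is a hedged sketch of the add-one estimate above. It is not the reference todo_unigram. It makes three assumptions. First, a lexicon edge with a non-empty output label names a word. Second, the training text has already been read and split into a list of word tokens. Third, tokens outside $V$ are ignored, since the denominator sums only over $V$. The function name `laplace_unigram` and the toy lexicon are hypothetical.

```python
import math
from collections import Counter

def laplace_unigram(lexicon_edges, tokens):
    """Add-one (Laplace) smoothed unigram WFST with a single state, state 0.

    lexicon_edges: edges [prev, input, output, weight, next]; a non-empty
                   output label marks the end of a word's pronunciation.
    tokens: training text already split into words (assumption: reading and
            splitting data/languagemodeltexts.txt happens elsewhere).
    """
    # Distinct words, in the order they first appear as lexicon outputs
    vocab = list(dict.fromkeys(e[2] for e in lexicon_edges if e[2]))
    counts = Counter(tokens)  # missing words count as 0
    denom = sum(1 + counts[v] for v in vocab)  # sum over V of (1 + C(v))
    edges = [[0, w, w, -math.log((1 + counts[w]) / denom), 0] for w in vocab]
    return edges, [0]

# Toy lexicon (two pronunciations of 'the') and toy training text
lex = [
    [0, 'ð', '', 0, 1], [1, 'ə', '', 0, 2], [2, '', 'the', 0, 0],
    [0, 's', '', 0, 3], [3, 'n', '', 0, 4], [4, 'ˈoʊ', '', 0, 5], [5, '', 'snow', 0, 0],
    [0, 'w', '', 0, 6], [6, 'ˈʊ', '', 0, 7], [7, 'd', '', 0, 8], [8, 'z', '', 0, 9],
    [9, '', 'woods', 0, 0],
    [1, 'ˈi', '', 0, 10], [10, '', 'the', 0, 0],
]
G_lap, Gfinal_lap = laplace_unigram(lex, 'the snow the the'.split())
print([e[1] for e in G_lap])  # ['the', 'snow', 'woods']
```

Here $C(\text{the})=3$, $C(\text{snow})=1$, and $C(\text{woods})=0$, so the denominator is $4+2+1=7$. The unseen word woods still receives probability $1/7$. Using `dict.fromkeys` keeps the first-appearance order while dropping the repeated 'the'.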
importlib.reload(mp5)
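# Option 1: show your own results (uncomment the next line)
#G, Gfinal = mp5.todo_unigram('data/languagemodeltexts.txt',L)
# Option 2: show the distributed solutions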
G = solutions['G']
Gfinal = solutions['Gfinal']
print(Gfinal)
print(G)
[0] [[0, 'a', 'a', 4.923623917106626, 0], [0, 'about', 'about', 4.923623917106626, 0], [0, 'aleksandrovich', 'aleksandrovich', 4.923623917106626, 0], [0, 'alexander', 'alexander', 4.923623917106626, 0], [0, 'all', 'all', 4.923623917106626, 0], [0, 'and', 'and', 4.007333185232471, 0], [0, 'are', 'are', 4.923623917106626, 0], [0, 'as', 'as', 4.923623917106626, 0], [0, 'ask', 'ask', 5.616771097666572, 0], [0, 'at', 'at', 4.923623917106626, 0], [0, 'be', 'be', 4.923623917106626, 0], [0, 'before', 'before', 5.616771097666572, 0], [0, 'bells', 'bells', 5.616771097666572, 0], [0, 'between', 'between', 5.616771097666572, 0], [0, 'book', 'book', 4.923623917106626, 0], [0, 'broke', 'broke', 4.923623917106626, 0], [0, 'but', 'but', 4.923623917106626, 0], [0, 'by', 'by', 4.518158808998462, 0], [0, 'communist', 'communist', 4.923623917106626, 0], [0, 'cried', 'cried', 4.923623917106626, 0], [0, 'czar', 'czar', 4.923623917106626, 0], [0, 'dark', 'dark', 5.616771097666572, 0], [0, 'darkest', 'darkest', 5.616771097666572, 0], [0, 'deep', 'deep', 5.616771097666572, 0], [0, 'died', 'died', 4.923623917106626, 0], [0, 'down', 'down', 4.923623917106626, 0], [0, 'downy', 'downy', 5.616771097666572, 0], [0, 'dramatic', 'dramatic', 4.923623917106626, 0], [0, 'easy', 'easy', 5.616771097666572, 0], [0, 'eighteen', 'eighteen', 4.518158808998462, 0], [0, 'emperor', 'emperor', 4.923623917106626, 0], [0, 'end', 'end', 4.923623917106626, 0], [0, 'entire', 'entire', 4.923623917106626, 0], [0, 'evening', 'evening', 5.616771097666572, 0], [0, 'family', 'family', 4.518158808998462, 0], [0, 'farm', 'farm', 5.616771097666572, 0], [0, 'father', 'father', 4.923623917106626, 0], [0, 'fill', 'fill', 5.616771097666572, 0], [0, 'flake', 'flake', 5.616771097666572, 0], [0, 'four', 'four', 4.923623917106626, 0], [0, 'frozen', 'frozen', 5.616771097666572, 0], [0, 'gives', 'gives', 5.616771097666572, 0], [0, 'go', 'go', 5.616771097666572, 0], [0, 'going', 'going', 4.923623917106626, 0], [0, 'had', 'had', 
4.923623917106626, 0], [0, 'happen', 'happen', 4.923623917106626, 0], [0, 'harness', 'harness', 5.616771097666572, 0], [0, 'have', 'have', 5.616771097666572, 0], [0, 'he', 'he', 4.518158808998462, 0], [0, 'help', 'help', 4.923623917106626, 0], [0, 'her', 'her', 4.923623917106626, 0], [0, 'here', 'here', 5.616771097666572, 0], [0, 'him', 'him', 4.923623917106626, 0], [0, 'his', 'his', 3.670860948611258, 0], [0, 'horse', 'horse', 5.616771097666572, 0], [0, 'house', 'house', 5.616771097666572, 0], [0, 'household', 'household', 4.923623917106626, 0], [0, 'i', 'i', 4.923623917106626, 0], [0, 'if', 'if', 5.616771097666572, 0], [0, 'in', 'in', 4.230476736546681, 0], [0, 'including', 'including', 4.923623917106626, 0], [0, 'instructive', 'instructive', 4.923623917106626, 0], [0, 'introduced', 'introduced', 4.923623917106626, 0], [0, 'is', 'is', 4.518158808998462, 0], [0, 'it', 'it', 5.616771097666572, 0], [0, 'its', 'its', 4.518158808998462, 0], [0, 'july', 'july', 4.923623917106626, 0], [0, 'keep', 'keep', 5.616771097666572, 0], [0, 'know', 'know', 5.616771097666572, 0], [0, 'lake', 'lake', 5.616771097666572, 0], [0, 'learning', 'learning', 4.923623917106626, 0], [0, 'little', 'little', 5.616771097666572, 0], [0, 'lovely', 'lovely', 5.616771097666572, 0], [0, 'massacre', 'massacre', 4.923623917106626, 0], [0, 'me', 'me', 4.923623917106626, 0], [0, 'miles', 'miles', 5.616771097666572, 0], [0, 'mistake', 'mistake', 5.616771097666572, 0], [0, 'more', 'more', 4.923623917106626, 0], [0, 'movement', 'movement', 4.923623917106626, 0], [0, 'must', 'must', 5.616771097666572, 0], [0, 'my', 'my', 4.923623917106626, 0], [0, 'near', 'near', 5.616771097666572, 0], [0, 'never', 'never', 4.923623917106626, 0], [0, 'nicholas', 'nicholas', 4.230476736546681, 0], [0, 'nineteen', 'nineteen', 4.923623917106626, 0], [0, 'ninety', 'ninety', 4.923623917106626, 0], [0, 'not', 'not', 5.616771097666572, 0], [0, 'november', 'november', 4.923623917106626, 0], [0, 'now', 'now', 4.923623917106626, 0], 
[0, 'of', 'of', 4.230476736546681, 0], [0, 'old', 'old', 4.923623917106626, 0], [0, 'on', 'on', 4.923623917106626, 0], [0, 'only', 'only', 5.616771097666572, 0], [0, 'other', 'other', 5.616771097666572, 0], [0, 'our', 'our', 4.923623917106626, 0], [0, 'personal', 'personal', 4.923623917106626, 0], [0, 'physician', 'physician', 4.923623917106626, 0], [0, 'promises', 'promises', 5.616771097666572, 0], [0, 'queer', 'queer', 5.616771097666572, 0], [0, 'ran', 'ran', 4.923623917106626, 0], [0, 'reaction', 'reaction', 4.923623917106626, 0], [0, 'romanov', 'romanov', 4.923623917106626, 0], [0, 'rule', 'rule', 4.923623917106626, 0], [0, 'russia', 'russia', 4.518158808998462, 0], [0, 'second', 'second', 4.923623917106626, 0], [0, 'see', 'see', 5.616771097666572, 0], [0, 'shake', 'shake', 5.616771097666572, 0], [0, 'shoulder', 'shoulder', 4.923623917106626, 0], [0, 'simon', 'simon', 4.923623917106626, 0], [0, 'sister', 'sister', 4.923623917106626, 0], [0, 'six', 'six', 4.923623917106626, 0], [0, 'sleep', 'sleep', 5.616771097666572, 0], [0, 'snow', 'snow', 5.616771097666572, 0], [0, 'some', 'some', 5.616771097666572, 0], [0, 'sounds', 'sounds', 5.616771097666572, 0], [0, 'start', 'start', 4.923623917106626, 0], [0, 'stop', 'stop', 5.616771097666572, 0], [0, 'stopping', 'stopping', 5.616771097666572, 0], [0, 'stories', 'stories', 4.923623917106626, 0], [0, 'sweep', 'sweep', 5.616771097666572, 0], [0, 'tears', 'tears', 4.923623917106626, 0], [0, 'telling', 'telling', 4.923623917106626, 0], [0, 'temptation', 'temptation', 4.923623917106626, 0], [0, 'that', 'that', 4.518158808998462, 0], [0, 'the', 'the', 4.923623917106626, 0], [0, 'there', 'there', 4.923623917106626, 0], [0, 'these', 'these', 5.616771097666572, 0], [0, 'think', 'think', 5.616771097666572, 0], [0, 'though', 'though', 5.616771097666572, 0], [0, 'to', 'to', 3.4195465203303517, 0], [0, 'told', 'told', 4.923623917106626, 0], [0, 'triumphant', 'triumphant', 4.923623917106626, 0], [0, 'twenty', 'twenty', 
4.923623917106626, 0], [0, 'up', 'up', 5.616771097666572, 0], [0, 'village', 'village', 5.616771097666572, 0], [0, 'wanted', 'wanted', 4.923623917106626, 0], [0, 'was', 'was', 4.923623917106626, 0], [0, 'watch', 'watch', 5.616771097666572, 0], [0, 'what', 'what', 4.923623917106626, 0], [0, 'when', 'when', 4.923623917106626, 0], [0, 'which', 'which', 4.923623917106626, 0], [0, 'whose', 'whose', 5.616771097666572, 0], [0, 'will', 'will', 5.616771097666572, 0], [0, 'wind', 'wind', 5.616771097666572, 0], [0, 'with', 'with', 5.616771097666572, 0], [0, 'without', 'without', 5.616771097666572, 0], [0, 'woods', 'woods', 5.616771097666572, 0], [0, 'year', 'year', 4.923623917106626, 0], [0, 'montenori', 'montenori', 4.923623917106626, 0], [0, 'nicholai', 'nicholai', 4.923623917106626, 0], [0, 'romanovs', 'romanovs', 5.616771097666572, 0], [0, 'sebag', 'sebag', 4.923623917106626, 0]]
To figure out which words occur in the transcript, we will do the following:
The first step is to compose $L$ and $G$. You should write todo_fstcompose. Given two WFSTs $A$ and $B$, your code should compute their composition $C=A\circ B$ by implementing this algorithm:
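As a hedged illustration only, here is one standard breadth-first way to compose two WFSTs. It does not necessarily match the MP's reference algorithm or the edge order the autograder expects. It assumes the `[prev, input, output, weight, next]` edge layout, start state 0 in both machines, and weights that add along a path, as with negative log probabilities. It also assumes B has no epsilon ('') input labels, which holds for the unigram G. Only A's epsilon-output edges need special treatment: A advances while B stays put. The name `compose_sketch` is hypothetical.

```python
from collections import deque

def compose_sketch(A, Afinal, B, Bfinal):
    """Return (C, Cfinal) with C = A o B; composed states are numbered in BFS order.

    Assumption: B has no '' input labels, so no epsilon filter is needed.
    """
    def out_edges(E):
        table = {}
        for e in E:
            table.setdefault(e[0], []).append(e)
        return table

    A_out, B_out = out_edges(A), out_edges(B)
    index = {(0, 0): 0}            # (state of A, state of B) -> state of C
    queue = deque([(0, 0)])
    C, Cfinal = [], []

    def state_id(pair):
        # Number a newly reached pair state and schedule it for expansion
        if pair not in index:
            index[pair] = len(index)
            queue.append(pair)
        return index[pair]

    while queue:
        a, b = queue.popleft()
        c = index[(a, b)]
        if a in Afinal and b in Bfinal:
            Cfinal.append(c)
        for _, ia, oa, wa, na in A_out.get(a, []):
            if oa == '':
                # A emits nothing, so only A moves
                C.append([c, ia, '', wa, state_id((na, b))])
            else:
                # A's output must match B's input; weights add
                for _, ib, ob, wb, nb in B_out.get(b, []):
                    if ib == oa:
                        C.append([c, ia, ob, wa + wb, state_id((na, nb))])
    return C, Cfinal

# Toy lexicon for one pronunciation of 'the', and a toy unigram G
A_toy = [[0, 'ð', '', 0, 1], [1, 'ə', '', 0, 2], [2, '', 'the', 0, 0]]
B_toy = [[0, 'the', 'the', 1.5, 0], [0, 'snow', 'snow', 2.0, 0]]
C_toy, Cfinal_toy = compose_sketch(A_toy, [0], B_toy, [0])
print(C_toy)  # [[0, 'ð', '', 0, 1], [1, 'ə', '', 0, 2], [2, '', 'the', 1.5, 0]]
```

Because G has a single state, every composed state pairs a lexicon state with state 0. The printed LG below appears to reuse L's state numbers for that reason, whereas this sketch renumbers states in discovery order. The sketch also does not trim states that cannot reach a final state.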
importlib.reload(mp5)
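# Option 1: show your own results (uncomment the next line)
#LG, LGfinal = mp5.todo_fstcompose(L,Lfinal,G,Gfinal)
# Option 2: show the distributed solutions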
LG = solutions['LG']
LGfinal = solutions['LGfinal']
print(LGfinal)
print(LG)
[0] [[0, 'ə', '', 0, 1], [0, 'ˌei', '', 0, 2], [0, 'b', '', 0, 3], [0, 'ˌɑ', '', 0, 9], [0, 'ˌæ', '', 0, 22], [0, 'ɑ', '', 0, 33], [0, 'ˈɔ', '', 0, 35], [0, 'ˈæ', '', 0, 39], [0, 'ˈɑ', '', 0, 42], [0, 'ˈɛ', '', 0, 45], [0, 'k', '', 0, 85], [0, 'z', '', 0, 100], [0, 'd', '', 0, 103], [0, 'ˈi', '', 0, 124], [0, 'ˈei', '', 0, 127], [0, 'ɛ', '', 0, 141], [0, 'ɪ', '', 0, 146], [0, 'ˌɛ', '', 0, 151], [0, 'f', '', 0, 160], [0, 'g', '', 0, 188], [0, 'h', '', 0, 196], [0, 'ˈɑɪ', '', 0, 232], [0, 'ˈɪ', '', 0, 234], [0, 'ˌɪ', '', 0, 254], [0, 'dʒ', '', 0, 271], [0, 'n', '', 0, 283], [0, 'l', '', 0, 285], [0, 'm', '', 0, 305], [0, 'ˈoʊ', '', 0, 374], [0, 'ˈʌ', '', 0, 382], [0, 'ˈaʊ', '', 0, 387], [0, 'p', '', 0, 390], [0, 'ɹ', '', 0, 422], [0, 's', '', 0, 444], [0, 'ʃ', '', 0, 450], [0, 't', '', 0, 501], [0, 'ð', '', 0, 522], [0, 'ɵ', '', 0, 532], [0, 'v', '', 0, 556], [0, 'w', '', 0, 561], [0, 'j', '', 0, 612], [1, '', 'a', 4.923623917106626, 0], [1, 'b', '', 0, 6], [1, 'n', '', 0, 37], [1, 't', '', 0, 49], [1, 'v', '', 0, 373], [2, '', 'a', 4.923623917106626, 0], [3, 'ˌaʊ', '', 0, 4], [3, 'i', '', 0, 51], [3, 'ˈei', '', 0, 52], [3, 'ˈi', '', 0, 53], [3, 'ɪ', '', 0, 54], [3, 'ˌi', '', 0, 60], [3, 'ˈɛ', '', 0, 64], [3, 'ˈʊ', '', 0, 75], [3, 'ɹ', '', 0, 77], [3, 'ə', '', 0, 80], [3, 'ˈʌ', '', 0, 82], [3, 'ˈɑɪ', '', 0, 84], [4, 't', '', 0, 5], [5, '', 'about', 4.923623917106626, 0], [6, 'ˈaʊ', '', 0, 7], [7, 't', '', 0, 8], [8, '', 'about', 4.923623917106626, 0], [9, 'l', '', 0, 10], [10, 'ɛ', '', 0, 11], [11, 'k', '', 0, 12], [12, 's', '', 0, 13], [13, 'ˈɑ', '', 0, 14], [14, 'n', '', 0, 15], [15, 'd', '', 0, 16], [16, 'ɹ', '', 0, 17], [17, 'ɑ', '', 0, 18], [18, 'v', '', 0, 19], [19, 'ɪ', '', 0, 20], [20, 'tʃ', '', 0, 21], [21, '', 'aleksandrovich', 4.923623917106626, 0], [22, 'l', '', 0, 23], [23, 'ɪ', '', 0, 24], [24, 'g', '', 0, 25], [25, 'z', '', 0, 26], [26, 'ˈæ', '', 0, 27], [27, 'n', '', 0, 28], [28, 'd', '', 0, 29], [29, 'ɚ', '', 0, 30], [29, 'ə', '', 0, 31], [30, '', 
'alexander', 4.923623917106626, 0], [31, 'ɹ', '', 0, 32], [32, '', 'alexander', 4.923623917106626, 0], [33, 'l', '', 0, 34], [34, '', 'all', 4.923623917106626, 0], [35, 'l', '', 0, 36], [35, 'n', '', 0, 378], [36, '', 'all', 4.923623917106626, 0], [37, 'd', '', 0, 38], [38, '', 'and', 4.007333185232471, 0], [39, 'n', '', 0, 40], [39, 'z', '', 0, 44], [39, 's', '', 0, 47], [39, 't', '', 0, 50], [40, 'd', '', 0, 41], [41, '', 'and', 4.007333185232471, 0], [42, 'ɹ', '', 0, 43], [42, 'n', '', 0, 377], [43, '', 'our', 4.923623917106626, 0], [43, '', 'are', 4.923623917106626, 0], [44, '', 'as', 4.923623917106626, 0], [45, 'z', '', 0, 46], [45, 'm', '', 0, 131], [45, 'n', '', 0, 139], [46, '', 'as', 4.923623917106626, 0], [47, 'k', '', 0, 48], [48, '', 'ask', 5.616771097666572, 0], [49, '', 'what', 4.923623917106626, 0], [49, '', 'at', 4.923623917106626, 0], [50, '', 'at', 4.923623917106626, 0], [51, '', 'be', 4.923623917106626, 0], [51, 't', '', 0, 67], [52, '', 'be', 4.923623917106626, 0], [53, '', 'be', 4.923623917106626, 0], [54, 'f', '', 0, 55], [54, 't', '', 0, 71], [55, 'ˈoʊ', '', 0, 56], [55, 'ˈɔ', '', 0, 58], [56, 'ɹ', '', 0, 57], [57, '', 'before', 5.616771097666572, 0], [58, 'ɹ', '', 0, 59], [59, '', 'before', 5.616771097666572, 0], [60, 'f', '', 0, 61], [61, 'ˈɔ', '', 0, 62], [62, 'ɹ', '', 0, 63], [63, '', 'before', 5.616771097666572, 0], [64, 'l', '', 0, 65], [65, 'z', '', 0, 66], [66, '', 'bells', 5.616771097666572, 0], [67, 'w', '', 0, 68], [68, 'ˈi', '', 0, 69], [69, 'n', '', 0, 70], [70, '', 'between', 5.616771097666572, 0], [71, 'w', '', 0, 72], [72, 'ˈi', '', 0, 73], [73, 'n', '', 0, 74], [74, '', 'between', 5.616771097666572, 0], [75, 'k', '', 0, 76], [76, '', 'book', 4.923623917106626, 0], [77, 'ˈoʊ', '', 0, 78], [78, 'k', '', 0, 79], [79, '', 'broke', 4.923623917106626, 0], [80, 't', '', 0, 81], [81, '', 'but', 4.923623917106626, 0], [82, 't', '', 0, 83], [83, '', 'but', 4.923623917106626, 0], [84, '', 'by', 4.518158808998462, 0], [85, 'ˈɑ', '', 0, 
86], [85, 'ɹ', '', 0, 97], [85, 'ˈi', '', 0, 281], [85, 'w', '', 0, 419], [86, 'm', '', 0, 87], [87, 'j', '', 0, 88], [88, 'ə', '', 0, 89], [89, 'n', '', 0, 90], [90, 'ɪ', '', 0, 91], [90, 'ə', '', 0, 94], [91, 's', '', 0, 92], [92, 't', '', 0, 93], [93, '', 'communist', 4.923623917106626, 0], [94, 's', '', 0, 95], [95, 't', '', 0, 96], [96, '', 'communist', 4.923623917106626, 0], [97, 'ˈɑɪ', '', 0, 98], [98, 'd', '', 0, 99], [99, '', 'cried', 4.923623917106626, 0], [100, 'ˈɑ', '', 0, 101], [101, 'ɹ', '', 0, 102], [102, '', 'czar', 4.923623917106626, 0], [103, 'ˈɑ', '', 0, 104], [103, 'ˈi', '', 0, 110], [103, 'ˈɑɪ', '', 0, 112], [103, 'ˈaʊ', '', 0, 114], [103, 'ɹ', '', 0, 117], [104, 'ɹ', '', 0, 105], [105, 'k', '', 0, 106], [106, '', 'dark', 5.616771097666572, 0], [106, 'ə', '', 0, 107], [107, 's', '', 0, 108], [108, 't', '', 0, 109], [109, '', 'darkest', 5.616771097666572, 0], [110, 'p', '', 0, 111], [111, '', 'deep', 5.616771097666572, 0], [112, 'd', '', 0, 113], [113, '', 'died', 4.923623917106626, 0], [114, 'n', '', 0, 115], [115, '', 'down', 4.923623917106626, 0], [115, 'i', '', 0, 116], [116, '', 'downy', 5.616771097666572, 0], [117, 'ə', '', 0, 118], [118, 'm', '', 0, 119], [119, 'ˈæ', '', 0, 120], [120, 'ɾ', '', 0, 121], [121, 'ɪ', '', 0, 122], [122, 'k', '', 0, 123], [123, '', 'dramatic', 4.923623917106626, 0], [124, 'z', '', 0, 125], [124, 'v', '', 0, 156], [125, 'i', '', 0, 126], [126, '', 'easy', 5.616771097666572, 0], [127, 't', '', 0, 128], [128, 'ˈi', '', 0, 129], [129, 'n', '', 0, 130], [130, '', 'eighteen', 4.518158808998462, 0], [131, 'p', '', 0, 132], [132, 'ɚ', '', 0, 133], [132, 'ə', '', 0, 135], [133, 'ɚ', '', 0, 134], [134, '', 'emperor', 4.923623917106626, 0], [135, 'ɹ', '', 0, 136], [136, 'ə', '', 0, 137], [137, 'ɹ', '', 0, 138], [138, '', 'emperor', 4.923623917106626, 0], [139, 'd', '', 0, 140], [140, '', 'end', 4.923623917106626, 0], [141, 'n', '', 0, 142], [142, '', 'when', 4.923623917106626, 0], [142, 't', '', 0, 143], [143, 'ˈɑɪ', '', 
0, 144], [144, 'ɚ', '', 0, 145], [145, '', 'entire', 4.923623917106626, 0], [146, 'n', '', 0, 147], [146, 'm', '', 0, 220], [146, 'f', '', 0, 233], [146, 'z', '', 0, 268], [146, 't', '', 0, 269], [146, 'tʃ', '', 0, 596], [147, 't', '', 0, 148], [147, 'k', '', 0, 236], [147, 's', '', 0, 246], [148, 'ˈɑɪ', '', 0, 149], [149, 'ɚ', '', 0, 150], [150, '', 'entire', 4.923623917106626, 0], [151, 'n', '', 0, 152], [152, 't', '', 0, 153], [153, 'ˌɑɪ', '', 0, 154], [154, 'ɹ', '', 0, 155], [155, '', 'entire', 4.923623917106626, 0], [156, 'n', '', 0, 157], [157, 'ɪ', '', 0, 158], [158, 'ŋ', '', 0, 159], [159, '', 'evening', 5.616771097666572, 0], [160, 'ˈæ', '', 0, 161], [160, 'ˈɑ', '', 0, 168], [160, 'ˈɪ', '', 0, 175], [160, 'l', '', 0, 177], [160, 'oʊ', '', 0, 180], [160, 'ˈɔ', '', 0, 182], [160, 'ɹ', '', 0, 184], [160, 'ə', '', 0, 402], [160, 'ɪ', '', 0, 407], [161, 'm', '', 0, 162], [162, 'ə', '', 0, 163], [162, 'l', '', 0, 166], [163, 'l', '', 0, 164], [164, 'i', '', 0, 165], [165, '', 'family', 4.518158808998462, 0], [166, 'i', '', 0, 167], [167, '', 'family', 4.518158808998462, 0], [168, 'ɹ', '', 0, 169], [168, 'ð', '', 0, 171], [169, 'm', '', 0, 170], [170, '', 'farm', 5.616771097666572, 0], [171, 'ɚ', '', 0, 172], [171, 'ə', '', 0, 173], [172, '', 'father', 4.923623917106626, 0], [173, 'ɹ', '', 0, 174], [174, '', 'father', 4.923623917106626, 0], [175, 'l', '', 0, 176], [176, '', 'fill', 5.616771097666572, 0], [177, 'ˈei', '', 0, 178], [178, 'k', '', 0, 179], [179, '', 'flake', 5.616771097666572, 0], [180, 'ɹ', '', 0, 181], [181, '', 'four', 4.923623917106626, 0], [182, 'ɹ', '', 0, 183], [183, '', 'four', 4.923623917106626, 0], [184, 'ˈoʊ', '', 0, 185], [185, 'z', '', 0, 186], [186, 'n̩', '', 0, 187], [187, '', 'frozen', 5.616771097666572, 0], [188, 'ˈɪ', '', 0, 189], [188, 'ˈoʊ', '', 0, 192], [189, 'v', '', 0, 190], [190, 'z', '', 0, 191], [191, '', 'gives', 5.616771097666572, 0], [192, '', 'go', 5.616771097666572, 0], [192, 'ɪ', '', 0, 193], [193, 'n', '', 0, 194], 
[193, 'ŋ', '', 0, 195], [194, '', 'going', 4.923623917106626, 0], [195, '', 'going', 4.923623917106626, 0], [196, 'ˈæ', '', 0, 197], [196, 'ˈɑ', '', 0, 201], [196, 'ˈi', '', 0, 207], [196, 'ˈʌ', '', 0, 208], [196, 'ˈɛ', '', 0, 209], [196, 'ɚ', '', 0, 212], [196, 'ˈɝ', '', 0, 214], [196, 'ˈɪ', '', 0, 215], [196, 'ɪ', '', 0, 217], [196, 'ˈɔ', '', 0, 222], [196, 'ˈaʊ', '', 0, 225], [196, 'w', '', 0, 579], [196, 'ˈu', '', 0, 597], [197, 'd', '', 0, 198], [197, 'p', '', 0, 199], [197, 'v', '', 0, 206], [198, '', 'had', 4.923623917106626, 0], [199, 'n̩', '', 0, 200], [200, '', 'happen', 4.923623917106626, 0], [201, 'ɹ', '', 0, 202], [202, 'n', '', 0, 203], [203, 'ɪ', '', 0, 204], [204, 's', '', 0, 205], [205, '', 'harness', 5.616771097666572, 0], [206, '', 'have', 5.616771097666572, 0], [207, '', 'he', 4.518158808998462, 0], [208, '', 'he', 4.518158808998462, 0], [209, 'l', '', 0, 210], [210, 'p', '', 0, 211], [211, '', 'help', 4.923623917106626, 0], [212, 'ɹ', '', 0, 213], [213, '', 'her', 4.923623917106626, 0], [214, '', 'her', 4.923623917106626, 0], [215, 'ɹ', '', 0, 216], [215, 'm', '', 0, 219], [216, '', 'here', 5.616771097666572, 0], [217, 'm', '', 0, 218], [217, 'z', '', 0, 221], [218, '', 'him', 4.923623917106626, 0], [219, '', 'him', 4.923623917106626, 0], [220, '', 'him', 4.923623917106626, 0], [221, '', 'his', 3.670860948611258, 0], [222, 'ɹ', '', 0, 223], [223, 's', '', 0, 224], [224, '', 'horse', 5.616771097666572, 0], [225, 's', '', 0, 226], [225, 'z', '', 0, 227], [226, '', 'house', 5.616771097666572, 0], [226, 'h', '', 0, 228], [227, '', 'house', 5.616771097666572, 0], [228, 'ˌoʊ', '', 0, 229], [229, 'l', '', 0, 230], [230, 'd', '', 0, 231], [231, '', 'household', 4.923623917106626, 0], [232, '', 'i', 4.923623917106626, 0], [233, '', 'if', 5.616771097666572, 0], [234, 'n', '', 0, 235], [235, '', 'in', 4.230476736546681, 0], [236, 'l', '', 0, 237], [237, 'u', '', 0, 238], [237, 'ˈu', '', 0, 242], [238, 'd', '', 0, 239], [239, 'ɪ', '', 0, 240], [240, 'ŋ', 
'', 0, 241], [241, '', 'including', 4.923623917106626, 0], [242, 'd', '', 0, 243], [243, 'ɪ', '', 0, 244], [244, 'ŋ', '', 0, 245], [245, '', 'including', 4.923623917106626, 0], [246, 't', '', 0, 247], [247, 'ɹ', '', 0, 248], [248, 'ˈʌ', '', 0, 249], [249, 'k', '', 0, 250], [250, 't', '', 0, 251], [251, 'ɪ', '', 0, 252], [252, 'v', '', 0, 253], [253, '', 'instructive', 4.923623917106626, 0], [254, 'n', '', 0, 255], [255, 't', '', 0, 256], [256, 'ɹ', '', 0, 257], [257, 'oʊ', '', 0, 258], [257, 'ə', '', 0, 263], [258, 'd', '', 0, 259], [259, 'ˈu', '', 0, 260], [260, 's', '', 0, 261], [261, 't', '', 0, 262], [262, '', 'introduced', 4.923623917106626, 0], [263, 'd', '', 0, 264], [264, 'ˈu', '', 0, 265], [265, 's', '', 0, 266], [266, 't', '', 0, 267], [267, '', 'introduced', 4.923623917106626, 0], [268, '', 'is', 4.518158808998462, 0], [269, '', 'it', 5.616771097666572, 0], [269, 's', '', 0, 270], [270, '', 'its', 4.518158808998462, 0], [271, 'u', '', 0, 272], [271, 'ə', '', 0, 275], [271, 'ˌu', '', 0, 278], [272, 'l', '', 0, 273], [273, 'ˈɑɪ', '', 0, 274], [274, '', 'july', 4.923623917106626, 0], [275, 'l', '', 0, 276], [276, 'ˈɑɪ', '', 0, 277], [277, '', 'july', 4.923623917106626, 0], [278, 'l', '', 0, 279], [279, 'ˈɑɪ', '', 0, 280], [280, '', 'july', 4.923623917106626, 0], [281, 'p', '', 0, 282], [282, '', 'keep', 5.616771097666572, 0], [283, 'ˈoʊ', '', 0, 284], [283, 'ˌi', '', 0, 339], [283, 'ˈɪ', '', 0, 341], [283, 'ˈɛ', '', 0, 343], [283, 'ˈɑɪ', '', 0, 356], [283, 'ˈɑ', '', 0, 362], [283, 'oʊ', '', 0, 364], [283, 'ˈaʊ', '', 0, 372], [284, '', 'know', 5.616771097666572, 0], [285, 'ˈei', '', 0, 286], [285, 'ˈɝ', '', 0, 288], [285, 'ˈɪ', '', 0, 296], [285, 'ˈʌ', '', 0, 301], [286, 'k', '', 0, 287], [287, '', 'lake', 5.616771097666572, 0], [288, 'n', '', 0, 289], [288, 'ɹ', '', 0, 292], [289, 'ɪ', '', 0, 290], [290, 'ŋ', '', 0, 291], [291, '', 'learning', 4.923623917106626, 0], [292, 'n', '', 0, 293], [293, 'ɪ', '', 0, 294], [294, 'ŋ', '', 0, 295], [295, '', 
'learning', 4.923623917106626, 0], [296, 't', '', 0, 297], [296, 'ɾ', '', 0, 299], [297, 'l̩', '', 0, 298], [298, '', 'little', 5.616771097666572, 0], [299, 'l̩', '', 0, 300], [300, '', 'little', 5.616771097666572, 0], [301, 'v', '', 0, 302], [302, 'l', '', 0, 303], [303, 'i', '', 0, 304], [304, '', 'lovely', 5.616771097666572, 0], [305, 'ˈæ', '', 0, 306], [305, 'ˈi', '', 0, 313], [305, 'ˈɑɪ', '', 0, 314], [305, 'ɪ', '', 0, 319], [305, 'oʊ', '', 0, 324], [305, 'ˈɔ', '', 0, 326], [305, 'ˈu', '', 0, 328], [305, 'ə', '', 0, 333], [305, 'ˈʌ', '', 0, 336], [305, 'ˌɔ', '', 0, 617], [306, 's', '', 0, 307], [307, 'ə', '', 0, 308], [308, 'k', '', 0, 309], [309, 'ɚ', '', 0, 310], [309, 'ə', '', 0, 311], [310, '', 'massacre', 4.923623917106626, 0], [311, 'ɹ', '', 0, 312], [312, '', 'massacre', 4.923623917106626, 0], [313, '', 'me', 4.923623917106626, 0], [313, '', 'my', 4.923623917106626, 0], [314, '', 'my', 4.923623917106626, 0], [314, 'l̩', '', 0, 315], [314, 'l', '', 0, 317], [315, 'z', '', 0, 316], [316, '', 'miles', 5.616771097666572, 0], [317, 'z', '', 0, 318], [318, '', 'miles', 5.616771097666572, 0], [319, 's', '', 0, 320], [320, 't', '', 0, 321], [321, 'ˈei', '', 0, 322], [322, 'k', '', 0, 323], [323, '', 'mistake', 5.616771097666572, 0], [324, 'ɹ', '', 0, 325], [325, '', 'more', 4.923623917106626, 0], [326, 'ɹ', '', 0, 327], [327, '', 'more', 4.923623917106626, 0], [328, 'v', '', 0, 329], [329, 'm', '', 0, 330], [330, 'n̩', '', 0, 331], [331, 't', '', 0, 332], [332, '', 'movement', 4.923623917106626, 0], [333, 's', '', 0, 334], [334, 't', '', 0, 335], [335, '', 'must', 5.616771097666572, 0], [336, 's', '', 0, 337], [337, 't', '', 0, 338], [338, '', 'must', 5.616771097666572, 0], [339, 'ɹ', '', 0, 340], [340, '', 'near', 5.616771097666572, 0], [341, 'ɹ', '', 0, 342], [341, 'k', '', 0, 348], [342, '', 'near', 5.616771097666572, 0], [343, 'v', '', 0, 344], [344, 'ɚ', '', 0, 345], [344, 'ə', '', 0, 346], [345, '', 'never', 4.923623917106626, 0], [346, 'ɹ', '', 0, 347], 
[347, '', 'never', 4.923623917106626, 0], [348, 'ə', '', 0, 349], [348, 'l', '', 0, 353], [349, 'l', '', 0, 350], [350, 'ə', '', 0, 351], [350, 'ɑɪ', '', 0, 625], [351, 's', '', 0, 352], [352, '', 'nicholas', 4.230476736546681, 0], [353, 'ə', '', 0, 354], [354, 's', '', 0, 355], [355, '', 'nicholas', 4.230476736546681, 0], [356, 'n', '', 0, 357], [357, 't', '', 0, 358], [358, 'ˈi', '', 0, 359], [358, 'i', '', 0, 361], [359, 'n', '', 0, 360], [360, '', 'nineteen', 4.923623917106626, 0], [361, '', 'ninety', 4.923623917106626, 0], [362, 't', '', 0, 363], [363, '', 'not', 5.616771097666572, 0], [364, 'v', '', 0, 365], [365, 'ˈɛ', '', 0, 366], [366, 'm', '', 0, 367], [367, 'b', '', 0, 368], [368, 'ɚ', '', 0, 369], [368, 'ə', '', 0, 370], [369, '', 'november', 4.923623917106626, 0], [370, 'ɹ', '', 0, 371], [371, '', 'november', 4.923623917106626, 0], [372, '', 'now', 4.923623917106626, 0], [373, '', 'of', 4.230476736546681, 0], [374, 'l', '', 0, 375], [374, 'n', '', 0, 379], [375, 'd', '', 0, 376], [376, '', 'old', 4.923623917106626, 0], [377, '', 'on', 4.923623917106626, 0], [378, '', 'on', 4.923623917106626, 0], [379, 'l', '', 0, 380], [380, 'i', '', 0, 381], [381, '', 'only', 5.616771097666572, 0], [382, 'ð', '', 0, 383], [382, 'p', '', 0, 555], [383, 'ɚ', '', 0, 384], [383, 'ə', '', 0, 385], [384, '', 'other', 5.616771097666572, 0], [385, 'ɹ', '', 0, 386], [386, '', 'other', 5.616771097666572, 0], [387, 'ɚ', '', 0, 388], [387, 'ɹ', '', 0, 389], [388, '', 'our', 4.923623917106626, 0], [389, '', 'our', 4.923623917106626, 0], [390, 'ˈɝ', '', 0, 391], [390, 'ɹ', '', 0, 412], [391, 's', '', 0, 392], [391, 'ɹ', '', 0, 397], [392, 'ɪ', '', 0, 393], [393, 'n', '', 0, 394], [394, 'ɪ', '', 0, 395], [395, 'l', '', 0, 396], [396, '', 'personal', 4.923623917106626, 0], [397, 's', '', 0, 398], [398, 'ə', '', 0, 399], [399, 'n', '', 0, 400], [400, 'l̩', '', 0, 401], [401, '', 'personal', 4.923623917106626, 0], [402, 'z', '', 0, 403], [403, 'ˈɪ', '', 0, 404], [404, 'ʃ', '', 0, 405], 
[405, 'n̩', '', 0, 406], [406, '', 'physician', 4.923623917106626, 0], [407, 'z', '', 0, 408], [408, 'ˈɪ', '', 0, 409], [409, 'ʃ', '', 0, 410], [410, 'n̩', '', 0, 411], [411, '', 'physician', 4.923623917106626, 0], [412, 'ˈɑ', '', 0, 413], [413, 'm', '', 0, 414], [414, 'ə', '', 0, 415], [415, 's', '', 0, 416], [416, 'ə', '', 0, 417], [417, 'z', '', 0, 418], [418, '', 'promises', 5.616771097666572, 0], [419, 'ˈɪ', '', 0, 420], [420, 'ɹ', '', 0, 421], [421, '', 'queer', 5.616771097666572, 0], [422, 'ɑ', '', 0, 423], [422, 'ˈæ', '', 0, 425], [422, 'i', '', 0, 427], [422, 'ˈoʊ', '', 0, 432], [422, 'ˈu', '', 0, 439], [422, 'ˈʌ', '', 0, 441], [423, 'n', '', 0, 424], [424, '', 'ran', 4.923623917106626, 0], [425, 'n', '', 0, 426], [426, '', 'ran', 4.923623917106626, 0], [427, 'ˈæ', '', 0, 428], [428, 'k', '', 0, 429], [429, 'ʃ', '', 0, 430], [430, 'n̩', '', 0, 431], [431, '', 'reaction', 4.923623917106626, 0], [432, 'm', '', 0, 433], [433, 'ə', '', 0, 434], [434, 'n', '', 0, 435], [435, 'ˌɔ', '', 0, 436], [436, 'f', '', 0, 437], [436, 'v', '', 0, 438], [437, '', 'romanov', 4.923623917106626, 0], [437, 's', '', 0, 626], [438, '', 'romanov', 4.923623917106626, 0], [439, 'l', '', 0, 440], [440, '', 'rule', 4.923623917106626, 0], [441, 'ʃ', '', 0, 442], [442, 'ə', '', 0, 443], [443, '', 'russia', 4.518158808998462, 0], [444, 'ˈɛ', '', 0, 445], [444, 'ˈi', '', 0, 449], [444, 'ˈɑɪ', '', 0, 459], [444, 'ˈɪ', '', 0, 462], [444, 'ɪ', '', 0, 468], [444, 'l', '', 0, 473], [444, 'n', '', 0, 476], [444, 'ˈʌ', '', 0, 478], [444, 'ˈaʊ', '', 0, 480], [444, 't', '', 0, 485], [444, 'w', '', 0, 498], [445, 'k', '', 0, 446], [445, 'b', '', 0, 627], [446, 'n̩', '', 0, 447], [447, '', 'second', 4.923623917106626, 0], [447, 'd', '', 0, 448], [448, '', 'second', 4.923623917106626, 0], [449, '', 'see', 5.616771097666572, 0], [450, 'ˈei', '', 0, 451], [450, 'ˈoʊ', '', 0, 453], [451, 'k', '', 0, 452], [452, '', 'shake', 5.616771097666572, 0], [453, 'l', '', 0, 454], [454, 'd', '', 0, 455], [455, 
'ɚ', '', 0, 456], [455, 'ə', '', 0, 457], [456, '', 'shoulder', 4.923623917106626, 0], [457, 'ɹ', '', 0, 458], [458, '', 'shoulder', 4.923623917106626, 0], [459, 'm', '', 0, 460], [460, 'n̩', '', 0, 461], [461, '', 'simon', 4.923623917106626, 0], [462, 's', '', 0, 463], [462, 'k', '', 0, 471], [463, 't', '', 0, 464], [464, 'ɚ', '', 0, 465], [464, 'ə', '', 0, 466], [465, '', 'sister', 4.923623917106626, 0], [466, 'ɹ', '', 0, 467], [467, '', 'sister', 4.923623917106626, 0], [468, 'k', '', 0, 469], [469, 's', '', 0, 470], [470, '', 'six', 4.923623917106626, 0], [471, 's', '', 0, 472], [472, '', 'six', 4.923623917106626, 0], [473, 'ˈi', '', 0, 474], [474, 'p', '', 0, 475], [475, '', 'sleep', 5.616771097666572, 0], [476, 'ˈoʊ', '', 0, 477], [477, '', 'snow', 5.616771097666572, 0], [478, 'm', '', 0, 479], [479, '', 'some', 5.616771097666572, 0], [480, 'n', '', 0, 481], [481, 'd', '', 0, 482], [481, 'z', '', 0, 484], [482, 'z', '', 0, 483], [483, '', 'sounds', 5.616771097666572, 0], [484, '', 'sounds', 5.616771097666572, 0], [485, 'ɑ', '', 0, 486], [485, 'ˈɑ', '', 0, 488], [485, 'ˈɔ', '', 0, 494], [486, 't', '', 0, 487], [487, '', 'start', 4.923623917106626, 0], [488, 'ɹ', '', 0, 489], [488, 'p', '', 0, 491], [489, 't', '', 0, 490], [490, '', 'start', 4.923623917106626, 0], [491, '', 'stop', 5.616771097666572, 0], [491, 'ɪ', '', 0, 492], [492, 'ŋ', '', 0, 493], [493, '', 'stopping', 5.616771097666572, 0], [494, 'ɹ', '', 0, 495], [495, 'i', '', 0, 496], [496, 'z', '', 0, 497], [497, '', 'stories', 4.923623917106626, 0], [498, 'ˈi', '', 0, 499], [499, 'p', '', 0, 500], [500, '', 'sweep', 5.616771097666572, 0], [501, 'ˈɛ', '', 0, 502], [501, 'ˈɪ', '', 0, 505], [501, 'ɛ', '', 0, 511], [501, 'ˈoʊ', '', 0, 537], [501, 'ˈu', '', 0, 538], [501, 'ˈʌ', '', 0, 539], [501, 'ɹ', '', 0, 542], [501, 'w', '', 0, 549], [502, 'ɹ', '', 0, 503], [502, 'l', '', 0, 508], [503, 'z', '', 0, 504], [504, '', 'tears', 4.923623917106626, 0], [505, 'ɹ', '', 0, 506], [506, 'z', '', 0, 507], [507, '', 
'tears', 4.923623917106626, 0], [508, 'ɪ', '', 0, 509], [509, 'ŋ', '', 0, 510], [510, '', 'telling', 4.923623917106626, 0], [511, 'm', '', 0, 512], [512, 't', '', 0, 513], [512, 'p', '', 0, 517], [513, 'ˈei', '', 0, 514], [514, 'ʃ', '', 0, 515], [515, 'n̩', '', 0, 516], [516, '', 'temptation', 4.923623917106626, 0], [517, 't', '', 0, 518], [518, 'ˈei', '', 0, 519], [519, 'ʃ', '', 0, 520], [520, 'n̩', '', 0, 521], [521, '', 'temptation', 4.923623917106626, 0], [522, 'ə', '', 0, 523], [522, 'ˈæ', '', 0, 525], [522, 'ˈi', '', 0, 527], [522, 'ˈʌ', '', 0, 528], [522, 'ˈɛ', '', 0, 529], [522, 'ˈoʊ', '', 0, 536], [523, '', 'the', 4.923623917106626, 0], [523, 't', '', 0, 524], [524, '', 'that', 4.518158808998462, 0], [525, 't', '', 0, 526], [526, '', 'that', 4.518158808998462, 0], [527, '', 'the', 4.923623917106626, 0], [527, 'z', '', 0, 531], [528, '', 'the', 4.923623917106626, 0], [529, 'ɹ', '', 0, 530], [530, '', 'there', 4.923623917106626, 0], [531, '', 'these', 5.616771097666572, 0], [532, 'ˈɪ', '', 0, 533], [533, 'ŋ', '', 0, 534], [534, 'k', '', 0, 535], [535, '', 'think', 5.616771097666572, 0], [536, '', 'though', 5.616771097666572, 0], [537, '', 'to', 3.4195465203303517, 0], [537, 'l', '', 0, 540], [538, '', 'to', 3.4195465203303517, 0], [539, '', 'to', 3.4195465203303517, 0], [540, 'd', '', 0, 541], [541, '', 'told', 4.923623917106626, 0], [542, 'ɑɪ', '', 0, 543], [543, 'ˈʌ', '', 0, 544], [544, 'm', '', 0, 545], [545, 'f', '', 0, 546], [546, 'n̩', '', 0, 547], [547, 't', '', 0, 548], [548, '', 'triumphant', 4.923623917106626, 0], [549, 'ˈɛ', '', 0, 550], [550, 'n', '', 0, 551], [551, 'i', '', 0, 552], [551, 't', '', 0, 553], [552, '', 'twenty', 4.923623917106626, 0], [553, 'i', '', 0, 554], [554, '', 'twenty', 4.923623917106626, 0], [555, '', 'up', 5.616771097666572, 0], [556, 'ˈɪ', '', 0, 557], [557, 'l', '', 0, 558], [558, 'ɪ', '', 0, 559], [559, 'dʒ', '', 0, 560], [560, '', 'village', 5.616771097666572, 0], [561, 'ˈɔ', '', 0, 562], [561, 'ˈɑ', '', 0, 566], 
[561, 'ə', '', 0, 574], [561, 'ˈʌ', '', 0, 582], [561, 'ˌɑ', '', 0, 584], [561, 'ˈɛ', '', 0, 590], [561, 'ˈɪ', '', 0, 592], [561, 'ɪ', '', 0, 599], [561, 'ˈɑɪ', '', 0, 602], [561, 'ˈʊ', '', 0, 609], [562, 'n', '', 0, 563], [562, 'z', '', 0, 577], [563, 'ɪ', '', 0, 564], [563, 't', '', 0, 571], [564, 'd', '', 0, 565], [565, '', 'wanted', 4.923623917106626, 0], [566, 'n', '', 0, 567], [566, 'z', '', 0, 576], [566, 'tʃ', '', 0, 578], [567, 't', '', 0, 568], [568, 'ə', '', 0, 569], [569, 'd', '', 0, 570], [570, '', 'wanted', 4.923623917106626, 0], [571, 'ɪ', '', 0, 572], [572, 'd', '', 0, 573], [573, '', 'wanted', 4.923623917106626, 0], [574, 'z', '', 0, 575], [575, '', 'was', 4.923623917106626, 0], [576, '', 'was', 4.923623917106626, 0], [577, '', 'was', 4.923623917106626, 0], [578, '', 'watch', 5.616771097666572, 0], [579, 'ˈʌ', '', 0, 580], [579, 'ˈɛ', '', 0, 586], [579, 'ˈɪ', '', 0, 588], [580, 't', '', 0, 581], [581, '', 'what', 4.923623917106626, 0], [582, 't', '', 0, 583], [583, '', 'what', 4.923623917106626, 0], [584, 't', '', 0, 585], [585, '', 'what', 4.923623917106626, 0], [586, 'n', '', 0, 587], [587, '', 'when', 4.923623917106626, 0], [588, 'n', '', 0, 589], [588, 'tʃ', '', 0, 594], [589, '', 'when', 4.923623917106626, 0], [590, 'n', '', 0, 591], [591, '', 'when', 4.923623917106626, 0], [592, 'n', '', 0, 593], [592, 'tʃ', '', 0, 595], [593, '', 'when', 4.923623917106626, 0], [593, 'd', '', 0, 601], [594, '', 'which', 4.923623917106626, 0], [595, '', 'which', 4.923623917106626, 0], [596, '', 'which', 4.923623917106626, 0], [597, 'z', '', 0, 598], [598, '', 'whose', 5.616771097666572, 0], [599, 'l', '', 0, 600], [599, 'ð', '', 0, 605], [599, 'ɵ', '', 0, 606], [600, '', 'will', 5.616771097666572, 0], [601, '', 'wind', 5.616771097666572, 0], [602, 'n', '', 0, 603], [603, 'd', '', 0, 604], [604, '', 'wind', 5.616771097666572, 0], [605, '', 'with', 5.616771097666572, 0], [605, 'ˈaʊ', '', 0, 607], [606, '', 'with', 5.616771097666572, 0], [607, 't', '', 0, 608], 
[608, '', 'without', 5.616771097666572, 0], [609, 'd', '', 0, 610], [610, 'z', '', 0, 611], [611, '', 'woods', 5.616771097666572, 0], [612, 'ˌi', '', 0, 613], [612, 'ˈɪ', '', 0, 615], [613, 'ɹ', '', 0, 614], [614, '', 'year', 4.923623917106626, 0], [615, 'ɹ', '', 0, 616], [616, '', 'year', 4.923623917106626, 0], [617, 'n', '', 0, 618], [618, 't', '', 0, 619], [619, 'ə', '', 0, 620], [620, 'n', '', 0, 621], [621, 'ˈoʊ', '', 0, 622], [622, 'ɹ', '', 0, 623], [623, 'i', '', 0, 624], [624, '', 'montenori', 4.923623917106626, 0], [625, '', 'nicholai', 4.923623917106626, 0], [626, '', 'romanovs', 5.616771097666572, 0], [627, 'æ', '', 0, 628], [628, 'g', '', 0, 629], [629, '', 'sebag', 4.923623917106626, 0]]
Now let's compose $TLG=T\circ LG$ to get the set of all valid full-word paths through the transcription. We know that the only possible final state of $LG$ is state $0$. The only possible final state of $T$ is its largest-numbered state, whose index equals the number of transitions in $T$.
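To make the composition step concrete, here is a minimal sketch of naive weighted composition. It assumes the transition format seen in the printouts, `[src, input, output, weight, dst]` with `''` for epsilon, and it numbers the composite state (a, b) as `a*nB + b`. That numbering, the function name `fstcompose`, and its signature are illustrative assumptions, not the interface that `mp5.todo_fstcompose` must implement.

```python
def fstcompose(A, Afinal, B, Bfinal):
    """Naive composition C = A o B. Each FST is a list of
    [src, input, output, weight, dst]; '' marks epsilon.
    Composite state (a, b) is numbered a*nB + b (an assumed convention)."""
    nA = 1 + max([max(t[0], t[4]) for t in A] + list(Afinal))
    nB = 1 + max([max(t[0], t[4]) for t in B] + list(Bfinal))
    C = []
    for (pa, ia, oa, wa, na) in A:
        if oa == '':
            # A emits nothing: A moves while B stays in whatever state b it is in
            for b in range(nB):
                C.append([pa * nB + b, ia, '', wa, na * nB + b])
            continue
        for (pb, ib, ob, wb, nb) in B:
            if ib == oa:
                # A's output symbol is consumed as B's input; weights add (costs)
                C.append([pa * nB + pb, ia, ob, wa + wb, na * nB + nb])
    for (pb, ib, ob, wb, nb) in B:
        if ib == '':
            # B consumes nothing (e.g. an LG word-output arc): A stays put
            for a in range(nA):
                C.append([a * nB + pb, '', ob, wb, a * nB + nb])
    Cfinal = [fa * nB + fb for fa in Afinal for fb in Bfinal]
    return C, Cfinal

# Toy example: T is a two-phoneme chain, LG spells the single word "hi"
T = [[0, 'h', 'h', 0, 1], [1, 'i', 'i', 0, 2]]
LG = [[0, 'h', '', 0, 1], [1, 'i', '', 0, 2], [2, '', 'hi', 1.0, 0]]
C, Cfinal = fstcompose(T, [2], LG, [0])
# C == [[0,'h','',0,4], [4,'i','',0,8], [2,'','hi',1.0,0],
#       [5,'','hi',1.0,3], [8,'','hi',1.0,6]];  Cfinal == [6]
```

Even in this toy case, the arcs leaving composite states 2 and 5 can never be reached from state 0. Also note that when both machines have epsilons on the shared tape, this naive version can create redundant paths; production toolkits add an epsilon filter for that, which is unnecessary here because T never outputs epsilon.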
importlib.reload(mp5)
#TLG, TLGfinal = mp5.todo_fstcompose(T,Tfinal,LG,LGfinal)
TLG = solutions['TLG']
TLGfinal = solutions['TLGfinal']
print(TLGfinal)
print(TLG[:45])
[215460] [[0, 'h', '', 0, 826], [1, '', 'a', 4.923623917106626, 0], [2, '', 'a', 4.923623917106626, 0], [5, '', 'about', 4.923623917106626, 0], [8, '', 'about', 4.923623917106626, 0], [21, '', 'aleksandrovich', 4.923623917106626, 0], [30, '', 'alexander', 4.923623917106626, 0], [32, '', 'alexander', 4.923623917106626, 0], [34, '', 'all', 4.923623917106626, 0], [36, '', 'all', 4.923623917106626, 0], [38, '', 'and', 4.007333185232471, 0], [41, '', 'and', 4.007333185232471, 0], [43, '', 'are', 4.923623917106626, 0], [43, '', 'our', 4.923623917106626, 0], [44, '', 'as', 4.923623917106626, 0], [46, '', 'as', 4.923623917106626, 0], [48, '', 'ask', 5.616771097666572, 0], [49, '', 'what', 4.923623917106626, 0], [49, '', 'at', 4.923623917106626, 0], [50, '', 'at', 4.923623917106626, 0], [51, '', 'be', 4.923623917106626, 0], [52, '', 'be', 4.923623917106626, 0], [53, '', 'be', 4.923623917106626, 0], [57, '', 'before', 5.616771097666572, 0], [59, '', 'before', 5.616771097666572, 0], [63, '', 'before', 5.616771097666572, 0], [66, '', 'bells', 5.616771097666572, 0], [70, '', 'between', 5.616771097666572, 0], [74, '', 'between', 5.616771097666572, 0], [76, '', 'book', 4.923623917106626, 0], [79, '', 'broke', 4.923623917106626, 0], [81, '', 'but', 4.923623917106626, 0], [83, '', 'but', 4.923623917106626, 0], [84, '', 'by', 4.518158808998462, 0], [93, '', 'communist', 4.923623917106626, 0], [96, '', 'communist', 4.923623917106626, 0], [99, '', 'cried', 4.923623917106626, 0], [102, '', 'czar', 4.923623917106626, 0], [106, '', 'dark', 5.616771097666572, 0], [109, '', 'darkest', 5.616771097666572, 0], [111, '', 'deep', 5.616771097666572, 0], [113, '', 'died', 4.923623917106626, 0], [115, '', 'down', 4.923623917106626, 0], [116, '', 'downy', 5.616771097666572, 0], [123, '', 'dramatic', 4.923623917106626, 0]]
Obviously, TLG contains a lot of useless transitions! Specifically, many of its transitions are disconnected from the rest of the graph: there is no way to reach them from the initial state, and/or no way to continue from them to the final state. Since TLG contains no cycles (can you see why?), we can eliminate the unreachable transitions by topologically sorting the graph, using a topological sort function that keeps only the transitions that can be reached from the initial state. Transitions that are reachable but lead nowhere survive this step; they show up later as states whose backward cost beta is inf.
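One way to implement "sort topologically, keeping only reachable transitions" is a breadth-first search from the start state followed by Kahn's algorithm on the reachable subgraph, then renumbering states in sorted order. This is a sketch under the assumptions that the start state is 0 and transitions are `[src, input, output, weight, dst]` lists; the name `sort_topologically` is hypothetical and need not match what `mp5.todo_sort_topologically` is expected to do internally.

```python
from collections import defaultdict, deque

def sort_topologically(fst, finals, start=0):
    """Keep only transitions reachable from `start`, order states with
    Kahn's algorithm, and renumber them 0, 1, 2, ... in that order."""
    out_arcs = defaultdict(list)
    for t in fst:
        out_arcs[t[0]].append(t)
    # 1. Find every state reachable from the start state (BFS).
    reachable = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in out_arcs[s]:
            if t[4] not in reachable:
                reachable.add(t[4])
                queue.append(t[4])
    # 2. In-degrees, counting only arcs whose source is reachable.
    indeg = {s: 0 for s in reachable}
    for s in reachable:
        for t in out_arcs[s]:
            indeg[t[4]] += 1
    # 3. Kahn's algorithm: repeatedly emit a state with no remaining inputs.
    order = []
    queue = deque(s for s in reachable if indeg[s] == 0)
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in out_arcs[s]:
            indeg[t[4]] -= 1
            if indeg[t[4]] == 0:
                queue.append(t[4])
    if len(order) != len(reachable):
        raise ValueError('FST has a cycle; it cannot be topologically sorted')
    newid = {s: i for i, s in enumerate(order)}
    sorted_fst = [[newid[t[0]], t[1], t[2], t[3], newid[t[4]]]
                  for s in order for t in out_arcs[s]]
    sorted_finals = [newid[f] for f in finals if f in newid]
    return sorted_fst, sorted_finals

C = [[0, 'h', '', 0, 4], [4, 'i', '', 0, 8], [2, '', 'hi', 1.0, 0],
     [5, '', 'hi', 1.0, 3], [8, '', 'hi', 1.0, 6]]
C_sorted, Cfinal_sorted = sort_topologically(C, [6])
# C_sorted == [[0,'h','',0,1], [1,'i','',0,2], [2,'','hi',1.0,3]]
# Cfinal_sorted == [3]
```

As in the printed `TLG_sorted`, every surviving transition goes from a lower-numbered state to a higher-numbered one.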
importlib.reload(mp5)
#TLG_sorted, TLGfinal_sorted = mp5.todo_sort_topologically(TLG,TLGfinal)
TLG_sorted = solutions['TLG_sorted']
TLGfinal_sorted = solutions['TLGfinal_sorted']
print(TLG_sorted[:45])
[[0, 'h', '', 0, 1], [1, 'ˈu', '', 0, 2], [2, 'z', '', 0, 3], [3, '', 'whose', 5.616771097666572, 4], [4, 'w', '', 0, 5], [5, 'ˈʊ', '', 0, 6], [6, 'd', '', 0, 7], [7, 'z', '', 0, 8], [8, '', 'woods', 5.616771097666572, 9], [9, 'ð', '', 0, 10], [10, 'ˈi', '', 0, 11], [11, '', 'the', 4.923623917106626, 12], [11, 'z', '', 0, 13], [12, 'z', '', 0, 14], [13, '', 'these', 5.616771097666572, 15], [14, 'ˈɑ', '', 0, 16], [15, 'ˈɑ', '', 0, 17], [16, 'ɹ', '', 0, 18], [17, 'ɹ', '', 0, 19], [18, '', 'czar', 4.923623917106626, 20], [19, '', 'are', 4.923623917106626, 20], [19, '', 'our', 4.923623917106626, 20], [20, 'ˈɑɪ', '', 0, 21], [21, '', 'i', 4.923623917106626, 22], [22, 'ɵ', '', 0, 23], [23, 'ˈɪ', '', 0, 24], [24, 'ŋ', '', 0, 25], [25, 'k', '', 0, 26], [26, '', 'think', 5.616771097666572, 27], [27, 'ˈɑɪ', '', 0, 28], [28, '', 'i', 4.923623917106626, 29], [29, 'n', '', 0, 30], [30, 'ˈoʊ', '', 0, 31], [31, '', 'know', 5.616771097666572, 32], [32, 'h', '', 0, 33], [33, 'ɪ', '', 0, 34], [34, 'z', '', 0, 35], [35, '', 'his', 3.670860948611258, 36], [36, 'h', '', 0, 37], [37, 'ˈaʊ', '', 0, 38], [38, 's', '', 0, 39], [39, '', 'house', 5.616771097666572, 40], [40, 'ɪ', '', 0, 41], [41, 'z', '', 0, 42], [42, '', 'is', 4.518158808998462, 43]]
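One reason to sort the graph topologically is that it makes the best-path algorithm, the forward algorithm, and the backward algorithm all much more efficient. When states are numbered in topological order, every transition goes from a lower-numbered state to a higher-numbered one, so each algorithm can finish in a single pass over the transitions, in order of source state (or in reverse order, for the backward algorithm).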
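Here is a minimal Viterbi sketch that exploits that ordering. It assumes a topologically numbered FST with start state 0 and weights that are costs (negative log probabilities), so the best path is the one with the smallest total weight. The name and return values loosely mirror the commented-out call below, but this is an illustration, not the reference solution.

```python
import math

def fstbestpath(fst, finals):
    """Minimum-cost path through an acyclic FST whose states are numbered
    in topological order, starting from state 0."""
    delta = {0: 0.0}   # delta[s] = lowest cost of any path from 0 to s
    psi = {}           # psi[s] = the transition that achieves delta[s]
    # Every arc into state s comes from a lower-numbered state, so once we
    # reach the arcs leaving s, delta[s] can no longer change.
    for t in sorted(fst, key=lambda t: t[0]):
        src, w, dst = t[0], t[3], t[4]
        if src not in delta:
            continue   # unreachable source; absent after topological sorting
        cand = delta[src] + w
        if cand < delta.get(dst, math.inf):
            delta[dst] = cand
            psi[dst] = t
    # Choose the cheapest final state, then trace back-pointers to state 0.
    state = min(finals, key=lambda f: delta.get(f, math.inf))
    path = []
    while state != 0:
        t = psi[state]
        path.append(t)
        state = t[0]
    path.reverse()
    return delta, psi, path

fst = [[0, 'a', '', 0.0, 1], [0, 'b', '', 0.0, 2],
       [1, '', 'x', 1.0, 3], [2, '', 'y', 2.0, 3]]
delta, psi, path = fstbestpath(fst, [3])
words = [t[2] for t in path if t[2] != '']   # same extraction as the cell below
```

Here `path` is `[[0,'a','',0.0,1], [1,'','x',1.0,3]]` and `words` is `['x']`, because the route through `x` costs 1.0 versus 2.0 through `y`.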
importlib.reload(mp5)
#delta, psi, bestpath = mp5.todo_fstbestpath(TLG_sorted,TLGfinal_sorted)
bestpath = solutions['bestpath']
print([ t[2] for t in bestpath if t[2]!='' ])
['whose', 'woods', 'the', 'czar', 'i', 'think', 'i', 'know', 'his', 'house', 'is', 'in', 'the', 'village', 'though', 'he', 'will', 'not', 'see', 'me', 'stopping', 'here', 'to', 'watch', 'his', 'woods', 'fill', 'up', 'with', 'snow', 'me', 'little', 'horse', 'must', 'think', 'it', 'queer', 'to', 'stop', 'without', 'a', 'farm', 'house', 'near', 'between', 'the', 'woods', 'and', 'frozen', 'lake', 'the', 'darkest', 'evening', 'of', 'the', 'year', 'he', 'gives', 'his', 'harness', 'bells', 'a', 'shake', 'to', 'ask', 'if', 'there', 'is', 'some', 'mistake', 'the', 'only', 'other', 'sounds', 'the', 'sweep', 'of', 'easy', 'wind', 'and', 'downy', 'flake', 'the', 'woods', 'our', 'lovely', 'dark', 'and', 'deep', 'but', 'i', 'have', 'promises', 'to', 'keep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep']
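Re-estimation of an FST is similar to re-estimation of an HMM. We run the forward algorithm, then the backward algorithm, then compute $\xi=\alpha\otimes\beta$, and set each transition's weight to $w[t]=-\ln\left(\exp(-\xi_t)/\sum_{u:n[u]=n[t]}\exp(-\xi_u)\right)$. The minus signs inside the exponentials appear because $\alpha$, $\beta$, and $\xi$ are all costs (negative log probabilities), just like the weights.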
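The forward, backward, and $\xi$ steps can be sketched directly in the cost domain used by the printed `alpha` and `beta`: $\oplus$ becomes `-logaddexp(-a, -b)` and $\otimes$ becomes ordinary addition. This sketch assumes a topologically numbered FST with start state 0 in which every source state is reachable, as after sorting; the function names are hypothetical. It stops at the per-transition cost $\xi_t=\alpha[\text{src}]+w_t+\beta[\text{dst}]$ and does not implement the normalization set $u:n[u]=n[t]$, whose exact definition belongs to the MP.

```python
import math
import numpy as np

def fstforward(fst):
    """alpha[s] = -ln(sum over paths 0 -> s of exp(-path cost))."""
    alpha = {0: 0.0}
    for t in sorted(fst, key=lambda t: t[0]):   # increasing source state
        src, w, dst = t[0], t[3], t[4]
        cand = alpha[src] + w                   # otimes: costs add
        if dst in alpha:
            # oplus: -ln(e^{-a} + e^{-b}), computed stably
            alpha[dst] = float(-np.logaddexp(-alpha[dst], -cand))
        else:
            alpha[dst] = cand
    return alpha

def fstbackward(fst, finals):
    """beta[s] = -ln(sum over paths s -> final of exp(-path cost)).
    States that cannot reach a final state keep beta = inf."""
    states = {t[0] for t in fst} | {t[4] for t in fst} | set(finals)
    beta = {s: (0.0 if s in finals else math.inf) for s in states}
    for t in sorted(fst, key=lambda t: t[0], reverse=True):   # decreasing source
        src, w, dst = t[0], t[3], t[4]
        beta[src] = float(-np.logaddexp(-beta[src], -(w + beta[dst])))
    return beta

def transition_costs(fst, alpha, beta):
    """xi[i] = -ln of the total probability of complete paths using arc i."""
    return [alpha[t[0]] + t[3] + beta[t[4]] for t in fst]

ln2 = math.log(2)
fst = [[0, 'a', '', 0.0, 1], [0, 'b', '', 0.0, 2], [0, 'c', '', 0.0, 4],
       [1, '', 'x', ln2, 3], [2, '', 'y', ln2, 3]]   # state 4 is a dead end
alpha = fstforward(fst)
beta = fstbackward(fst, [3])
xi = transition_costs(fst, alpha, beta)
```

In this toy graph the two complete paths each have probability 1/2, so `alpha[3]` and `beta[0]` are both (numerically) 0, `exp(-(xi[0] - alpha[3]))` is 0.5, and the dead-end state 4 gets `beta[4] == inf`, just like the inf entries in the printed beta.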
importlib.reload(mp5)
#alpha = mp5.todo_fstforward(TLG_sorted)
alpha = { int(k):v for (k,v) in solutions['alpha'].items() }
print(alpha)
{0: 0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 5.616771097666572, 5: 5.616771097666572, 6: 5.616771097666572, 7: 5.616771097666572, 8: 5.616771097666572, 9: 11.233542195333143, 10: 11.233542195333143, 11: 11.233542195333143, 12: 16.157166112439768, 13: 11.233542195333143, 14: 16.157166112439768, 15: 16.850313292999715, 16: 16.157166112439768, 17: 16.850313292999715, 18: 16.157166112439768, 19: 16.850313292999715, 20: 20.38764284898645, 21: 20.38764284898645, 22: 25.311266766093077, 23: 25.311266766093077, 24: 25.311266766093077, 25: 25.311266766093077, 26: 25.311266766093077, 27: 30.92803786375965, 28: 30.92803786375965, 29: 35.85166178086627, 30: 35.85166178086627, 31: 35.85166178086627, 32: 41.468432878532845, 33: 41.468432878532845, 34: 41.468432878532845, 35: 41.468432878532845, 36: 45.1392938271441, 37: 45.1392938271441, 38: 45.1392938271441, 39: 45.1392938271441, 40: 50.75606492481067, 41: 50.75606492481067, 42: 50.75606492481067, 43: 55.27422373380914, 44: 55.27422373380914, 45: 55.27422373380914, 46: 59.504700470355814, 47: 59.504700470355814, 48: 59.504700470355814, 49: 64.42832438746244, 50: 64.42832438746244, 51: 64.42832438746244, 52: 64.42832438746244, 53: 64.42832438746244, 54: 64.42832438746244, 55: 70.04509548512901, 56: 70.04509548512901, 57: 70.04509548512901, 58: 75.66186658279558, 59: 75.66186658279558, 60: 75.66186658279558, 61: 80.18002539179405, 62: 80.18002539179405, 63: 80.18002539179405, 64: 80.18002539179405, 65: 85.79679648946062, 66: 85.79679648946062, 67: 85.79679648946062, 68: 85.79679648946062, 69: 91.41356758712719, 70: 91.41356758712719, 71: 91.41356758712719, 72: 97.03033868479376, 73: 97.03033868479376, 74: 97.03033868479376, 75: 101.26081542134044, 76: 101.26081542134044, 77: 101.26081542134044, 78: 101.26081542134044, 79: 101.26081542134044, 80: 106.87758651900701, 81: 101.26081542134044, 82: 106.87758651900701, 83: 101.26081542134044, 84: 106.87758651900701, 85: 106.87758651900701, 86: 106.87758651900701, 87: 106.87758651900701, 88: 
112.49435761667358, 89: 112.49435761667358, 90: 112.49435761667358, 91: 115.91390413700394, 92: 115.91390413700394, 93: 115.91390413700394, 94: 115.91390413700394, 95: 121.53067523467051, 96: 121.53067523467051, 97: 121.53067523467051, 98: 121.53067523467051, 99: 125.20153618328177, 100: 125.20153618328177, 101: 125.20153618328177, 102: 125.20153618328177, 103: 125.20153618328177, 104: 130.81830728094835, 105: 130.81830728094835, 106: 130.81830728094835, 107: 130.81830728094835, 108: 136.43507837861492, 109: 136.43507837861492, 110: 136.43507837861492, 111: 142.0518494762815, 112: 142.0518494762815, 113: 142.0518494762815, 114: 142.0518494762815, 115: 147.66862057394806, 116: 147.66862057394806, 117: 147.66862057394806, 118: 147.66862057394806, 119: 153.28539167161463, 120: 153.28539167161463, 121: 153.28539167161463, 122: 157.5158684081613, 123: 157.5158684081613, 124: 157.5158684081613, 125: 157.5158684081613, 126: 157.5158684081613, 127: 163.13263950582788, 128: 163.13263950582788, 129: 163.13263950582788, 130: 163.13263950582788, 131: 163.13263950582788, 132: 168.74941060349445, 133: 168.74941060349445, 134: 168.74941060349445, 135: 168.74941060349445, 136: 168.74941060349445, 137: 174.36618170116103, 138: 174.36618170116103, 139: 174.36618170116103, 140: 174.36618170116103, 141: 174.36618170116103, 142: 179.9829527988276, 143: 179.9829527988276, 144: 179.9829527988276, 145: 185.59972389649417, 146: 185.59972389649417, 147: 185.59972389649417, 148: 185.59972389649417, 149: 185.59972389649417, 150: 191.21649499416074, 151: 191.21649499416074, 152: 191.21649499416074, 153: 194.6360415144911, 154: 194.6360415144911, 155: 194.6360415144911, 156: 194.6360415144911, 157: 194.6360415144911, 158: 200.25281261215767, 159: 200.25281261215767, 160: 200.25281261215767, 161: 200.25281261215767, 162: 205.86958370982424, 163: 200.25281261215767, 164: 205.86958370982424, 165: 200.25281261215767, 166: 205.86958370982424, 167: 205.86958370982424, 168: 210.79320762693087, 169: 
210.79320762693087, 170: 210.79320762693087, 171: 210.79320762693087, 172: 210.79320762693087, 173: 216.40997872459744, 174: 216.40997872459744, 175: 216.40997872459744, 176: 216.40997872459744, 177: 222.026749822264, 178: 222.026749822264, 179: 222.026749822264, 180: 222.026749822264, 181: 227.64352091993058, 182: 227.64352091993058, 183: 227.64352091993058, 184: 232.5671448370372, 185: 227.64352091993058, 186: 232.5671448370372, 187: 227.64352091993058, 188: 232.5671448370372, 189: 227.64352091993058, 190: 227.64352091993058, 191: 233.26029201759715, 192: 233.26029201759715, 193: 233.26029201759715, 194: 238.18391593470378, 195: 238.18391593470378, 196: 238.18391593470378, 197: 238.18391593470378, 198: 238.18391593470378, 199: 243.80068703237035, 200: 243.80068703237035, 201: 248.72431094947697, 202: 243.80068703237035, 203: 248.72431094947697, 204: 243.80068703237035, 205: 247.80802021760283, 206: 247.80802021760283, 207: 247.80802021760283, 208: 247.80802021760283, 209: 247.80802021760283, 210: 247.80802021760283, 211: 253.4247913152694, 212: 253.4247913152694, 213: 253.4247913152694, 214: 253.4247913152694, 215: 259.04156241293595, 216: 259.04156241293595, 217: 259.04156241293595, 218: 263.96518633004257, 219: 263.96518633004257, 220: 263.96518633004257, 221: 263.96518633004257, 222: 263.96518633004257, 223: 269.58195742770914, 224: 263.96518633004257, 225: 269.58195742770914, 226: 263.96518633004257, 227: 274.50558134481577, 228: 263.96518633004257, 229: 274.50558134481577, 230: 269.58195742770914, 231: 274.50558134481577, 232: 269.58195742770914, 233: 269.58195742770914, 234: 269.58195742770914, 235: 269.58195742770914, 236: 269.58195742770914, 237: 275.1987285253757, 238: 275.1987285253757, 239: 280.12235244248234, 240: 275.1987285253757, 241: 280.12235244248234, 242: 279.4292052619224, 243: 279.4292052619224, 244: 279.4292052619224, 245: 284.352829179029, 246: 284.352829179029, 247: 284.352829179029, 248: 284.352829179029, 249: 289.27645309613564, 250: 
289.27645309613564, 251: 289.27645309613564, 252: 293.7946119051341, 253: 293.7946119051341, 254: 293.7946119051341, 255: 293.7946119051341, 256: 293.7946119051341, 257: 299.4113830028007, 258: 299.4113830028007, 259: 299.4113830028007, 260: 299.4113830028007, 261: 303.08224395141195, 262: 303.08224395141195, 263: 303.08224395141195, 264: 303.08224395141195, 265: 303.08224395141195, 266: 303.08224395141195, 267: 303.08224395141195, 268: 308.6990150490785, 269: 308.6990150490785, 270: 308.6990150490785, 271: 308.6990150490785, 272: 308.6990150490785, 273: 314.3157861467451, 274: 314.3157861467451, 275: 319.2394100638517, 276: 319.2394100638517, 277: 319.2394100638517, 278: 319.2394100638517, 279: 324.8561811615183, 280: 324.8561811615183, 281: 324.8561811615183, 282: 328.27572768184865, 283: 328.27572768184865, 284: 328.27572768184865, 285: 328.27572768184865, 286: 333.8924987795152, 287: 333.8924987795152, 288: 333.8924987795152, 289: 339.5092698771818, 290: 339.5092698771818, 291: 339.5092698771818, 292: 339.5092698771818, 293: 344.4328937942884, 294: 344.4328937942884, 295: 344.4328937942884, 296: 348.9510526032869, 297: 348.9510526032869, 298: 348.9510526032869, 299: 348.9510526032869, 300: 354.56782370095345, 301: 354.56782370095345, 302: 354.56782370095345, 303: 354.56782370095345, 304: 354.56782370095345, 305: 354.56782370095345, 306: 354.56782370095345, 307: 360.18459479862, 308: 360.18459479862, 309: 360.18459479862, 310: 365.10821871572665, 311: 365.10821871572665, 312: 365.10821871572665, 313: 365.10821871572665, 314: 365.10821871572665, 315: 370.7249898133932, 316: 370.7249898133932, 317: 370.7249898133932, 318: 370.7249898133932, 319: 376.3417609110598, 320: 376.3417609110598, 321: 376.3417609110598, 322: 376.3417609110598, 323: 376.3417609110598, 324: 376.3417609110598, 325: 381.95853200872637, 326: 381.95853200872637, 327: 381.95853200872637, 328: 386.882155925833, 329: 386.882155925833, 330: 386.882155925833, 331: 386.882155925833, 332: 
386.882155925833, 333: 392.49892702349956, 334: 392.49892702349956, 335: 397.4225509406062, 336: 392.49892702349956, 337: 397.4225509406062, 338: 396.72940376004624, 339: 396.72940376004624, 340: 396.72940376004624, 341: 396.72940376004624, 342: 402.3461748577128, 343: 402.3461748577128, 344: 402.3461748577128, 345: 402.3461748577128, 346: 407.26979877481944, 347: 402.3461748577128, 348: 407.26979877481944, 349: 407.9629459553794, 350: 407.9629459553794, 351: 412.886569872486, 352: 407.9629459553794, 353: 412.886569872486, 354: 407.9629459553794, 355: 411.97027914061186, 356: 411.97027914061186, 357: 411.97027914061186, 358: 411.97027914061186, 359: 416.8939030577185, 360: 411.97027914061186, 361: 417.58705023827844, 362: 417.58705023827844, 363: 417.58705023827844, 364: 417.58705023827844, 365: 417.58705023827844, 366: 423.203821335945, 367: 423.203821335945, 368: 423.203821335945, 369: 428.12744525305163, 370: 428.12744525305163, 371: 428.12744525305163, 372: 428.12744525305163, 373: 428.12744525305163, 374: 433.7442163507182, 375: 433.7442163507182, 376: 433.7442163507182, 377: 437.9746930872649, 378: 437.9746930872649, 379: 437.9746930872649, 380: 437.9746930872649, 381: 437.9746930872649, 382: 437.9746930872649, 383: 443.59146418493145, 384: 443.59146418493145, 385: 443.59146418493145, 386: 443.59146418493145, 387: 443.59146418493145, 388: 449.208235282598, 389: 443.59146418493145, 390: 449.208235282598, 391: 454.13185919970465, 392: 449.208235282598, 393: 454.13185919970465, 394: 449.208235282598, 395: 453.2155684678305, 396: 453.2155684678305, 397: 453.2155684678305, 398: 453.2155684678305, 399: 458.8323395654971, 400: 458.8323395654971, 401: 458.8323395654971, 402: 458.8323395654971, 403: 463.7559634826037, 404: 463.7559634826037, 405: 468.67958739971033, 406: 468.67958739971033, 407: 468.67958739971033, 408: 468.67958739971033, 409: 474.2963584973769, 410: 474.2963584973769, 411: 474.2963584973769, 412: 474.2963584973769, 413: 474.2963584973769, 414: 
474.2963584973769, 415: 474.2963584973769, 416: 474.2963584973769, 417: 474.2963584973769, 418: 479.9131295950435, 419: 479.9131295950435, 420: 479.9131295950435, 421: 483.33267611537383, 422: 483.33267611537383, 423: 483.33267611537383, 424: 483.33267611537383, 425: 488.9494472130404, 426: 488.9494472130404, 427: 493.873071130147, 428: 488.9494472130404, 429: 493.873071130147, 430: 488.9494472130404, 431: 492.9567803982729, 432: 492.9567803982729, 433: 492.9567803982729, 434: 497.8804043153795, 435: 492.9567803982729, 436: 492.9567803982729, 437: 498.57355149593945, 438: 498.57355149593945, 439: 498.57355149593945, 440: 501.9930980162698, 441: 501.9930980162698, 442: 501.9930980162698, 443: 507.6098691139364, 444: 507.6098691139364, 445: 507.6098691139364, 446: 507.6098691139364, 447: 507.6098691139364, 448: 507.6098691139364, 449: 513.2266402116029, 450: 513.2266402116029, 451: 518.1502641287095, 452: 518.1502641287095, 453: 518.1502641287095, 454: 518.1502641287095, 455: 518.1502641287095, 456: 523.767035226376, 457: 523.767035226376, 458: 528.6906591434827, 459: 523.767035226376, 460: 528.6906591434827, 461: 523.767035226376, 462: 527.7743684116085, 463: 527.7743684116085, 464: 527.7743684116085, 465: 532.6979923287151, 466: 527.7743684116085, 467: 527.7743684116085, 468: 533.391139509275, 469: 533.391139509275, 470: 533.391139509275, 471: 536.8106860296053, 472: 536.8106860296053, 473: 536.8106860296053, 474: 542.427457127272, 475: 542.427457127272, 476: 542.427457127272, 477: 542.427457127272, 478: 542.427457127272, 479: 542.427457127272, 480: 548.0442282249385, 481: 548.0442282249385, 482: 552.9678521420451, 483: 552.9678521420451, 484: 552.9678521420451, 485: 552.9678521420451, 486: 552.9678521420451, 487: 558.5846232397116}
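As a sanity check that `alpha` is a forward (log-sum) quantity rather than a Viterbi (minimum-cost) one, we can recompute `alpha[20]` by hand. In `TLG_sorted`, state 20 is entered only by `18 -czar-> 20`, `19 -are-> 20`, and `19 -our-> 20`: every arc into state 20 must come from a lower-numbered state, and all arcs leaving states 0 through 19 appear among the 45 printed transitions. The numbers below are copied from those printouts.

```python
import numpy as np

alpha18, alpha19 = 16.157166112439768, 16.850313292999715   # printed alpha
w = 4.923623917106626          # weight of 'czar', 'are', and 'our' in TLG_sorted
incoming = [alpha18 + w,       # 18 -> 20 via 'czar'
            alpha19 + w,       # 19 -> 20 via 'are'
            alpha19 + w]       # 19 -> 20 via 'our'
alpha20_sum = float(-np.logaddexp.reduce([-c for c in incoming]))  # forward
alpha20_min = min(incoming)                                        # Viterbi
print(alpha20_sum, alpha20_min)
```

The log-sum value is about 20.3876, matching the printed `alpha[20]`, while the minimum would be about 21.0808. Because the 'czar' path is exactly twice as probable as each of the other two, the sum works out to $21.0808-\ln 2$. Likewise, the last entry, `alpha[487]`, agrees with `beta[0]` in the output below up to floating-point rounding, as it should: both are the total cost of all complete paths.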
importlib.reload(mp5)
#beta = mp5.todo_fstbackward(TLG_sorted, TLGfinal_sorted)
beta = { int(k):v for (k,v) in solutions['beta'].items() }
print(beta)
{0: 558.5846232397118, 1: 558.5846232397118, 2: 558.5846232397118, 3: 558.5846232397118, 4: 552.9678521420453, 5: 552.9678521420453, 6: 552.9678521420453, 7: 552.9678521420453, 8: 552.9678521420453, 9: 547.3510810443788, 10: 547.3510810443788, 11: 547.3510810443788, 12: 543.1206043078321, 13: 548.0442282249387, 14: 543.1206043078321, 15: 542.4274571272722, 16: 543.1206043078321, 17: 542.4274571272722, 18: 543.1206043078321, 19: 542.4274571272722, 20: 538.1969803907255, 21: 538.1969803907255, 22: 533.2733564736188, 23: 533.2733564736188, 24: 533.2733564736188, 25: 533.2733564736188, 26: 533.2733564736188, 27: 527.6565853759523, 28: 527.6565853759523, 29: 522.7329614588457, 30: 522.7329614588457, 31: 522.7329614588457, 32: 517.1161903611791, 33: 517.1161903611791, 34: 517.1161903611791, 35: 517.1161903611791, 36: 513.4453294125678, 37: 513.4453294125678, 38: 513.4453294125678, 39: 513.4453294125678, 40: 507.8285583149012, 41: 507.8285583149012, 42: 507.8285583149012, 43: 503.31039950590275, 44: 503.31039950590275, 45: 503.31039950590275, 46: 499.0799227693561, 47: 499.0799227693561, 48: 499.0799227693561, 49: 494.15629885224945, 50: 494.15629885224945, 51: 494.15629885224945, 52: 494.15629885224945, 53: 494.15629885224945, 54: 494.15629885224945, 55: 488.5395277545829, 56: 488.5395277545829, 57: 488.5395277545829, 58: 482.9227566569163, 59: 482.9227566569163, 60: 482.9227566569163, 61: 478.40459784791784, 62: 478.40459784791784, 63: 478.40459784791784, 64: 478.40459784791784, 65: 472.78782675025127, 66: 472.78782675025127, 67: 472.78782675025127, 68: 472.78782675025127, 69: 467.1710556525847, 70: 467.1710556525847, 71: 467.1710556525847, 72: 461.5542845549181, 73: 461.5542845549181, 74: 461.5542845549181, 75: 457.32380781837145, 76: 457.32380781837145, 77: 457.32380781837145, 78: 457.32380781837145, 79: 457.32380781837145, 80: inf, 81: 457.32380781837145, 82: inf, 83: 457.32380781837145, 84: 451.7070367207049, 85: 451.7070367207049, 86: 451.7070367207049, 87: 
451.7070367207049, 88: 446.0902656230383, 89: 446.0902656230383, 90: 446.0902656230383, 91: 442.67071910270795, 92: 442.67071910270795, 93: 442.67071910270795, 94: 442.67071910270795, 95: 437.0539480050414, 96: 437.0539480050414, 97: 437.0539480050414, 98: 437.0539480050414, 99: 433.3830870564301, 100: 433.3830870564301, 101: 433.3830870564301, 102: 433.3830870564301, 103: 433.3830870564301, 104: 427.7663159587635, 105: 427.7663159587635, 106: 427.7663159587635, 107: 427.7663159587635, 108: 422.14954486109696, 109: 422.14954486109696, 110: 422.14954486109696, 111: 416.5327737634304, 112: 416.5327737634304, 113: 416.5327737634304, 114: 416.5327737634304, 115: 410.9160026657638, 116: 410.9160026657638, 117: 410.9160026657638, 118: 410.9160026657638, 119: 405.29923156809724, 120: 405.29923156809724, 121: 405.29923156809724, 122: 401.06875483155056, 123: 401.06875483155056, 124: 401.06875483155056, 125: 401.06875483155056, 126: 401.06875483155056, 127: 395.451983733884, 128: 395.451983733884, 129: 395.451983733884, 130: 395.451983733884, 131: 395.451983733884, 132: 389.8352126362174, 133: 389.8352126362174, 134: 389.8352126362174, 135: 389.8352126362174, 136: 389.8352126362174, 137: 384.21844153855085, 138: 384.21844153855085, 139: 384.21844153855085, 140: 384.21844153855085, 141: 384.21844153855085, 142: 378.6016704408843, 143: 378.6016704408843, 144: 378.6016704408843, 145: 372.9848993432177, 146: 372.9848993432177, 147: 372.9848993432177, 148: 372.9848993432177, 149: 372.9848993432177, 150: 367.36812824555113, 151: 367.36812824555113, 152: 367.36812824555113, 153: 363.9485817252208, 154: 363.9485817252208, 155: 363.9485817252208, 156: 363.9485817252208, 157: 363.9485817252208, 158: 358.3318106275542, 159: 358.3318106275542, 160: 358.3318106275542, 161: 358.3318106275542, 162: inf, 163: 358.3318106275542, 164: inf, 165: 358.3318106275542, 166: 352.71503952988763, 167: 352.71503952988763, 168: 347.791415612781, 169: 347.791415612781, 170: 347.791415612781, 171: 
347.791415612781, 172: 347.791415612781, 173: 342.17464451511444, 174: 342.17464451511444, 175: 342.17464451511444, 176: 342.17464451511444, 177: 336.55787341744787, 178: 336.55787341744787, 179: 336.55787341744787, 180: 336.55787341744787, 181: 330.9411023197813, 182: 330.9411023197813, 183: 330.9411023197813, 184: inf, 185: 330.9411023197813, 186: inf, 187: 330.9411023197813, 188: inf, 189: 330.9411023197813, 190: 330.9411023197813, 191: 325.3243312221147, 192: 325.3243312221147, 193: 325.3243312221147, 194: 320.4007073050081, 195: 320.4007073050081, 196: 320.4007073050081, 197: 320.4007073050081, 198: 320.4007073050081, 199: 314.7839362073415, 200: 314.7839362073415, 201: inf, 202: 314.7839362073415, 203: inf, 204: 314.7839362073415, 205: 310.77660302210904, 206: 310.77660302210904, 207: 310.77660302210904, 208: 310.77660302210904, 209: 310.77660302210904, 210: 310.77660302210904, 211: 305.15983192444247, 212: 305.15983192444247, 213: 305.15983192444247, 214: 305.15983192444247, 215: 299.5430608267759, 216: 299.5430608267759, 217: 299.5430608267759, 218: 294.6194369096693, 219: 294.6194369096693, 220: 294.6194369096693, 221: 294.6194369096693, 222: 294.6194369096693, 223: inf, 224: 294.6194369096693, 225: inf, 226: 294.6194369096693, 227: inf, 228: 294.6194369096693, 229: inf, 230: 289.0026658120027, 231: inf, 232: 289.0026658120027, 233: 289.0026658120027, 234: 289.0026658120027, 235: 289.0026658120027, 236: 289.0026658120027, 237: 283.38589471433613, 238: 283.38589471433613, 239: inf, 240: 283.38589471433613, 241: inf, 242: 279.15541797778945, 243: 279.15541797778945, 244: 279.15541797778945, 245: 274.23179406068283, 246: 274.23179406068283, 247: 274.23179406068283, 248: 274.23179406068283, 249: 269.3081701435762, 250: 269.3081701435762, 251: 269.3081701435762, 252: 264.79001133457774, 253: 264.79001133457774, 254: 264.79001133457774, 255: 264.79001133457774, 256: 264.79001133457774, 257: 259.17324023691117, 258: 259.17324023691117, 259: 259.17324023691117, 
260: 259.17324023691117, 261: 255.5023792882999, 262: 255.5023792882999, 263: 255.5023792882999, 264: 255.5023792882999, 265: 255.5023792882999, 266: 255.5023792882999, 267: 255.5023792882999, 268: 249.88560819063332, 269: 249.88560819063332, 270: 249.88560819063332, 271: 249.88560819063332, 272: 249.88560819063332, 273: 244.26883709296675, 274: 244.26883709296675, 275: 239.34521317586012, 276: 239.34521317586012, 277: 239.34521317586012, 278: 239.34521317586012, 279: 233.72844207819355, 280: 233.72844207819355, 281: 233.72844207819355, 282: 230.3088955578632, 283: 230.3088955578632, 284: 230.3088955578632, 285: 230.3088955578632, 286: 224.69212446019662, 287: 224.69212446019662, 288: 224.69212446019662, 289: 219.07535336253005, 290: 219.07535336253005, 291: 219.07535336253005, 292: 219.07535336253005, 293: 214.15172944542343, 294: 214.15172944542343, 295: 214.15172944542343, 296: 209.63357063642496, 297: 209.63357063642496, 298: 209.63357063642496, 299: 209.63357063642496, 300: 204.0167995387584, 301: 204.0167995387584, 302: 204.0167995387584, 303: 204.0167995387584, 304: 204.0167995387584, 305: 204.0167995387584, 306: 204.0167995387584, 307: 198.40002844109182, 308: 198.40002844109182, 309: 198.40002844109182, 310: 193.4764045239852, 311: 193.4764045239852, 312: 193.4764045239852, 313: 193.4764045239852, 314: 193.4764045239852, 315: 187.85963342631862, 316: 187.85963342631862, 317: 187.85963342631862, 318: 187.85963342631862, 319: 182.24286232865205, 320: 182.24286232865205, 321: 182.24286232865205, 322: 182.24286232865205, 323: 182.24286232865205, 324: 182.24286232865205, 325: 176.62609123098548, 326: 176.62609123098548, 327: 176.62609123098548, 328: 171.70246731387886, 329: 171.70246731387886, 330: 171.70246731387886, 331: 171.70246731387886, 332: 171.70246731387886, 333: 166.08569621621228, 334: 166.08569621621228, 335: inf, 336: 166.08569621621228, 337: inf, 338: 161.8552194796656, 339: 161.8552194796656, 340: 161.8552194796656, 341: 161.8552194796656, 342: 
156.23844838199904, 343: 156.23844838199904, 344: 156.23844838199904, 345: 156.23844838199904, 346: inf, 347: 156.23844838199904, 348: inf, 349: 150.62167728433246, 350: 150.62167728433246, 351: inf, 352: 150.62167728433246, 353: inf, 354: 150.62167728433246, 355: 146.61434409909998, 356: 146.61434409909998, 357: 146.61434409909998, 358: 146.61434409909998, 359: inf, 360: 146.61434409909998, 361: 140.9975730014334, 362: 140.9975730014334, 363: 140.9975730014334, 364: 140.9975730014334, 365: 140.9975730014334, 366: 135.38080190376684, 367: 135.38080190376684, 368: 135.38080190376684, 369: 130.4571779866602, 370: 130.4571779866602, 371: 130.4571779866602, 372: 130.4571779866602, 373: 130.4571779866602, 374: 124.84040688899364, 375: 124.84040688899364, 376: 124.84040688899364, 377: 120.60993015244696, 378: 120.60993015244696, 379: 120.60993015244696, 380: 120.60993015244696, 381: 120.60993015244696, 382: 120.60993015244696, 383: 114.99315905478039, 384: 114.99315905478039, 385: 114.99315905478039, 386: 114.99315905478039, 387: 114.99315905478039, 388: 109.37638795711382, 389: inf, 390: 109.37638795711382, 391: inf, 392: 109.37638795711382, 393: inf, 394: 109.37638795711382, 395: 105.36905477188135, 396: 105.36905477188135, 397: 105.36905477188135, 398: 105.36905477188135, 399: 99.75228367421478, 400: 99.75228367421478, 401: 99.75228367421478, 402: 99.75228367421478, 403: 94.82865975710816, 404: 94.82865975710816, 405: 89.90503584000153, 406: 89.90503584000153, 407: 89.90503584000153, 408: 89.90503584000153, 409: 84.28826474233496, 410: 84.28826474233496, 411: 84.28826474233496, 412: 84.28826474233496, 413: 84.28826474233496, 414: 84.28826474233496, 415: 84.28826474233496, 416: 84.28826474233496, 417: 84.28826474233496, 418: 78.67149364466839, 419: 78.67149364466839, 420: 78.67149364466839, 421: 75.25194712433803, 422: 75.25194712433803, 423: 75.25194712433803, 424: 75.25194712433803, 425: 69.63517602667146, 426: 69.63517602667146, 427: inf, 428: 69.63517602667146, 
429: inf, 430: 69.63517602667146, 431: 65.62784284143899, 432: 65.62784284143899, 433: 65.62784284143899, 434: inf, 435: 65.62784284143899, 436: 65.62784284143899, 437: 60.01107174377242, 438: 60.01107174377242, 439: 60.01107174377242, 440: 56.59152522344207, 441: 56.59152522344207, 442: 56.59152522344207, 443: 50.9747541257755, 444: 50.9747541257755, 445: 50.9747541257755, 446: 50.9747541257755, 447: 50.9747541257755, 448: 50.9747541257755, 449: 45.357983028108926, 450: 45.357983028108926, 451: 40.4343591110023, 452: 40.4343591110023, 453: 40.4343591110023, 454: 40.4343591110023, 455: 40.4343591110023, 456: 34.81758801333573, 457: 34.81758801333573, 458: inf, 459: 34.81758801333573, 460: inf, 461: 34.81758801333573, 462: 30.81025482810326, 463: 30.81025482810326, 464: 30.81025482810326, 465: inf, 466: 30.81025482810326, 467: 30.81025482810326, 468: 25.19348373043669, 469: 25.19348373043669, 470: 25.19348373043669, 471: 21.77393721010634, 472: 21.77393721010634, 473: 21.77393721010634, 474: 16.157166112439768, 475: 16.157166112439768, 476: 16.157166112439768, 477: 16.157166112439768, 478: 16.157166112439768, 479: 16.157166112439768, 480: 10.540395014773198, 481: 10.540395014773198, 482: 5.616771097666572, 483: 5.616771097666572, 484: 5.616771097666572, 485: 5.616771097666572, 486: 5.616771097666572, 487: 0}
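The dictionary printed above maps each state of the topologically sorted TLG to a cost. That cost shrinks to 0 at the last state (487) and is `inf` at states from which no path continues, which is consistent with backward (beta) costs. The sketch below shows one way to compute such a backward pass. It makes several assumptions: states are numbered in topological order, arcs are 5-tuples `(src, input, output, cost, dst)` in the same format as the FSTs printed in this notebook, costs are negative log probabilities combined with `min` (the tropical semiring), and final costs arrive as a dict. `backward_costs` is a hypothetical helper, not part of `mp5.py`. The MP's own pass may instead combine paths with log-sum-exp.

```python
import math

def backward_costs(arcs, final, num_states):
    """Min-plus backward pass over a topologically sorted FST (sketch).

    arcs  : list of (src, isym, osym, cost, dst) tuples with src < dst
            (states numbered in topological order; assumed format).
    final : dict mapping each final state to its final cost (assumed format).
    Returns a dict state -> lowest cost of reaching a final state from that
    state, or math.inf if no final state is reachable.
    """
    beta = {n: final.get(n, math.inf) for n in range(num_states)}
    # Group outgoing arcs by their source state.
    out = {n: [] for n in range(num_states)}
    for src, _, _, cost, dst in arcs:
        out[src].append((cost, dst))
    # Reverse topological order: every successor's beta is already known.
    for n in reversed(range(num_states)):
        for cost, dst in out[n]:
            beta[n] = min(beta[n], cost + beta[dst])
    return beta

# Toy FST: state 3 is a dead end, so its backward cost is inf.
arcs = [(0, 'a', '', 1.0, 1), (0, 'b', '', 5.0, 2),
        (1, 'c', '', 1.0, 2), (1, 'd', '', 2.0, 3)]
beta = backward_costs(arcs, final={2: 0.0}, num_states=4)
print(beta)  # {0: 2.0, 1: 1.0, 2: 0.0, 3: inf}
```

Dead-end states stay at `inf` because `cost + math.inf` is `inf` and never wins the `min`. This matches the `inf` entries that appear in the dictionary above.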
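importlib.reload(mp5)
# Re-estimate the arc weights of L from the forward (alpha) and backward (beta)
# costs computed on the topologically sorted TLG; the result is LG_re, LGfinal_re.
LG_re, LGfinal_re = mp5.fstreestimate(TLG_sorted, L, Lfinal, alpha, beta)
print(LG_re)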
[(0, 'ə', '', 2.723798558070257, 1), (1, '', 'a', 3.1780538303480625, 0), (0, 'ˌei', '', inf, 2), (2, '', 'a', 0.0, 0), (0, 'b', '', 4.158883083359569, 3), (3, 'ˌaʊ', '', inf, 4), (4, 't', '', 0.0, 5), (5, '', 'about', inf, 0), (1, 'b', '', 2.261763098473807, 6), (6, 'ˈaʊ', '', 0.0, 7), (7, 't', '', 0.0, 8), (8, '', 'about', inf, 0), (0, 'ˌɑ', '', inf, 9), (9, 'l', '', 0.0, 10), (10, 'ɛ', '', inf, 11), (11, 'k', '', 0.0, 12), (12, 's', '', 0.0, 13), (13, 'ˈɑ', '', 0.0, 14), (14, 'n', '', 0.0, 15), (15, 'd', '', 0.0, 16), (16, 'ɹ', '', 0.0, 17), (17, 'ɑ', '', inf, 18), (18, 'v', '', 0.0, 19), (19, 'ɪ', '', 0.0, 20), (20, 'tʃ', '', 0.0, 21), (21, '', 'aleksandrovich', inf, 0), (0, 'ˌæ', '', inf, 22), (22, 'l', '', 0.0, 23), (23, 'ɪ', '', 0.0, 24), (24, 'g', '', 0.0, 25), (25, 'z', '', 0.0, 26), (26, 'ˈæ', '', 0.0, 27), (27, 'n', '', 0.0, 28), (28, 'd', '', 0.0, 29), (29, 'ɚ', '', 3.091042453358341, 30), (30, '', 'alexander', inf, 0), (29, 'ə', '', 0.04652001563488284, 31), (31, 'ɹ', '', 0.0, 32), (32, '', 'alexander', inf, 0), (0, 'ɑ', '', inf, 33), (33, 'l', '', 0.0, 34), (34, '', 'all', inf, 0), (0, 'ˈɔ', '', 5.768320995793715, 35), (35, 'l', '', 0.8823891801985155, 36), (36, '', 'all', inf, 0), (1, 'n', '', 1.0379876668516772, 37), (37, 'd', '', 0.0, 38), (38, '', 'and', 0.0, 0), (0, 'ˈæ', '', 5.075173815233825, 39), (39, 'n', '', 1.4294665329850886, 40), (40, 'd', '', 0.0, 41), (41, '', 'and', 0.0, 0), (0, 'ˈɑ', '', 3.3704257229953782, 42), (42, 'ɹ', '', 0.7239188392267124, 43), (43, '', 'are', 0.6931471805598903, 0), (39, 'z', '', 1.31824089787483, 44), (44, '', 'as', inf, 0), (0, 'ˈɛ', '', 5.075173815233825, 45), (45, 'z', '', 0.8622235106038261, 46), (46, '', 'as', inf, 0), (39, 's', '', 1.3723081191451456, 47), (47, 'k', '', 0.0, 48), (48, '', 'ask', 0.0, 0), (1, 't', '', 1.0379876668516772, 49), (49, '', 'at', inf, 0), (39, 't', '', 1.4294665329850886, 50), (50, '', 'at', inf, 0), (3, 'i', '', 2.9549102790336974, 51), (51, '', 'be', inf, 0), (3, 'ˈei', '', 
3.178053830347949, 52), (52, '', 'be', inf, 0), (3, 'ˈi', '', 1.9252908618525453, 53), (53, '', 'be', inf, 0), (3, 'ɪ', '', 1.7311348474115675, 54), (54, 'f', '', 1.232143681292655, 55), (55, 'ˈoʊ', '', 0.0645385211375924, 56), (56, 'ɹ', '', 0.0, 57), (57, '', 'before', 0.0, 0), (55, 'ˈɔ', '', 2.7725887222399024, 58), (58, 'ɹ', '', 0.0, 59), (59, '', 'before', 0.0, 0), (3, 'ˌi', '', 3.871201010907953, 60), (60, 'f', '', 0.0, 61), (61, 'ˈɔ', '', 0.0, 62), (62, 'ɹ', '', 0.0, 63), (63, '', 'before', 0.0, 0), (3, 'ˈɛ', '', 3.871201010907953, 64), (64, 'l', '', 0.0, 65), (65, 'z', '', 0.0, 66), (66, '', 'bells', 0.0, 0), (51, 't', '', 0.0, 67), (67, 'w', '', 0.0, 68), (68, 'ˈi', '', 0.0, 69), (69, 'n', '', 0.0, 70), (70, '', 'between', 0.0, 0), (54, 't', '', 0.3448404862916732, 71), (71, 'w', '', 0.0, 72), (72, 'ˈi', '', 0.0, 73), (73, 'n', '', 0.0, 74), (74, '', 'between', 0.0, 0), (3, 'ˈʊ', '', 3.178053830347949, 75), (75, 'k', '', 0.0, 76), (76, '', 'book', inf, 0), (3, 'ɹ', '', 1.7917594692280545, 77), (77, 'ˈoʊ', '', 0.0, 78), (78, 'k', '', 0.0, 79), (79, '', 'broke', inf, 0), (3, 'ə', '', 1.519825753744385, 80), (80, 't', '', 0.0, 81), (81, '', 'but', 0.0, 0), (3, 'ˈʌ', '', 3.178053830347949, 82), (82, 't', '', 0.0, 83), (83, '', 'but', 0.0, 0), (3, 'ˈɑɪ', '', 2.6184380424124356, 84), (84, '', 'by', inf, 0), (0, 'k', '', 3.3704257229953782, 85), (85, 'ˈɑ', '', 1.572396640753709, 86), (86, 'm', '', 0.0, 87), (87, 'j', '', 0.0, 88), (88, 'ə', '', 0.0, 89), (89, 'n', '', 0.0, 90), (90, 'ɪ', '', 0.8043728156701491, 91), (91, 's', '', 0.0, 92), (92, 't', '', 0.0, 93), (93, '', 'communist', inf, 0), (90, 'ə', '', 0.5930637220029666, 94), (94, 's', '', 0.0, 95), (95, 't', '', 0.0, 96), (96, '', 'communist', inf, 0), (85, 'ɹ', '', 1.1977031913122573, 97), (97, 'ˈɑɪ', '', 0.0, 98), (98, 'd', '', 0.0, 99), (99, '', 'cried', inf, 0), (0, 'z', '', 2.8238820166271807, 100), (100, 'ˈɑ', '', 0.0, 101), (101, 'ɹ', '', 0.0, 102), (102, '', 'czar', 0.0, 0), (0, 'd', '', 
3.060270794691519, 103), (103, 'ˈɑ', '', 1.572396640753709, 104), (104, 'ɹ', '', 0.0, 105), (105, 'k', '', 0.0, 106), (106, '', 'dark', 3.091042453358341, 0), (106, 'ə', '', 0.04652001563488284, 107), (107, 's', '', 0.0, 108), (108, 't', '', 0.0, 109), (109, '', 'darkest', 0.0, 0), (103, 'ˈi', '', 1.331234583936748, 110), (110, 'p', '', 0.0, 111), (111, '', 'deep', 0.0, 0), (103, 'ˈɑɪ', '', 2.0243817644966384, 112), (112, 'd', '', 0.0, 113), (113, '', 'died', inf, 0), (103, 'ˈaʊ', '', 2.3608540011179002, 114), (114, 'n', '', 0.0, 115), (115, '', 'down', inf, 0), (115, 'i', '', 0.0, 116), (116, '', 'downy', 0.0, 0), (103, 'ɹ', '', 1.1977031913122573, 117), (117, 'ə', '', 0.0, 118), (118, 'm', '', 0.0, 119), (119, 'ˈæ', '', 0.0, 120), (120, 'ɾ', '', inf, 121), (121, 'ɪ', '', 0.0, 122), (122, 'k', '', 0.0, 123), (123, '', 'dramatic', inf, 0), (0, 'ˈi', '', 3.1292636661784172, 124), (124, 'z', '', 0.3136575588549704, 125), (125, 'i', '', 0.0, 126), (126, '', 'easy', 0.0, 0), (0, 'ˈei', '', 4.382026634673821, 127), (127, 't', '', 0.0, 128), (128, 'ˈi', '', 0.0, 129), (129, 'n', '', 0.0, 130), (130, '', 'eighteen', inf, 0), (45, 'm', '', 1.6094379124341458, 131), (131, 'p', '', 0.0, 132), (132, 'ɚ', '', 3.091042453358341, 133), (133, 'ɚ', '', 0.0, 134), (134, '', 'emperor', inf, 0), (132, 'ə', '', 0.04652001563488284, 135), (135, 'ɹ', '', 0.0, 136), (136, 'ə', '', 0.0, 137), (137, 'ɹ', '', 0.0, 138), (138, '', 'emperor', inf, 0), (45, 'n', '', 0.9734491457140848, 139), (139, 'd', '', 0.0, 140), (140, '', 'end', inf, 0), (0, 'ɛ', '', inf, 141), (141, 'n', '', 0.0, 142), (142, 't', '', 0.0, 143), (143, 'ˈɑɪ', '', 0.0, 144), (144, 'ɚ', '', 0.0, 145), (145, '', 'entire', inf, 0), (0, 'ɪ', '', 2.9351076517374395, 146), (146, 'n', '', 1.415281897993168, 147), (147, 't', '', 0.9954280524328851, 148), (148, 'ˈɑɪ', '', 0.0, 149), (149, 'ɚ', '', 0.0, 150), (150, '', 'entire', inf, 0), (0, 'ˌɛ', '', inf, 151), (151, 'n', '', 0.0, 152), (152, 't', '', 0.0, 153), (153, 'ˌɑɪ', '', 
inf, 154), (154, 'ɹ', '', 0.0, 155), (155, '', 'entire', inf, 0), (124, 'v', '', 1.3121863889662109, 156), (156, 'n', '', 0.0, 157), (157, 'ɪ', '', 0.0, 158), (158, 'ŋ', '', 0.0, 159), (159, '', 'evening', 0.0, 0), (0, 'f', '', 3.8224108467384212, 160), (160, 'ˈæ', '', 3.8066624897703605, 161), (161, 'm', '', 0.0, 162), (162, 'ə', '', 0.451985123743043, 163), (163, 'l', '', 0.0, 164), (164, 'i', '', 0.0, 165), (165, '', 'family', inf, 0), (162, 'l', '', 1.0116009116785563, 166), (166, 'i', '', 0.0, 167), (167, '', 'family', inf, 0), (160, 'ˈɑ', '', 2.101914397531914, 168), (168, 'ɹ', '', 0.5947071077466717, 169), (169, 'm', '', 0.0, 170), (170, '', 'farm', 0.0, 0), (168, 'ð', '', 0.8023464725249596, 171), (171, 'ɚ', '', 3.091042453358341, 172), (172, '', 'father', inf, 0), (171, 'ə', '', 0.04652001563488284, 173), (173, 'ɹ', '', 0.0, 174), (174, '', 'father', inf, 0), (160, 'ˈɪ', '', 2.1972245773362147, 175), (175, 'l', '', 0.0, 176), (176, '', 'fill', 0.0, 0), (160, 'l', '', 2.014903020542306, 177), (177, 'ˈei', '', 0.0, 178), (178, 'k', '', 0.0, 179), (179, '', 'flake', 0.0, 0), (160, 'oʊ', '', inf, 180), (180, 'ɹ', '', 0.0, 181), (181, '', 'four', inf, 0), (160, 'ˈɔ', '', 4.499809670330251, 182), (182, 'ɹ', '', 0.0, 183), (183, '', 'four', inf, 0), (160, 'ɹ', '', 1.7272209480904621, 184), (184, 'ˈoʊ', '', 0.0, 185), (185, 'z', '', 0.0, 186), (186, 'n̩', '', 0.0, 187), (187, '', 'frozen', 0.0, 0), (0, 'g', '', 4.669708707125551, 188), (188, 'ˈɪ', '', 0.9162907318742555, 189), (189, 'v', '', 0.0, 190), (190, 'z', '', 0.0, 191), (191, '', 'gives', 0.0, 0), (188, 'ˈoʊ', '', 0.5108256237659816, 192), (192, '', 'go', 2.2512917986065304, 0), (192, 'ɪ', '', 0.11122563511025874, 193), (193, 'n', '', 0.21130909366718242, 194), (194, '', 'going', inf, 0), (193, 'ŋ', '', 1.6582280766035637, 195), (195, '', 'going', inf, 0), (0, 'h', '', 3.2834143460057703, 196), (196, 'ˈæ', '', 3.688879454114044, 197), (197, 'd', '', 0.72593700338291, 198), (198, '', 'had', inf, 0), (197, 
'p', '', 1.2367626271488916, 199), (199, 'n̩', '', 0.0, 200), (200, '', 'happen', inf, 0), (196, 'ˈɑ', '', 1.9841313618755976, 201), (201, 'ɹ', '', 0.0, 202), (202, 'n', '', 0.0, 203), (203, 'ɪ', '', 0.0, 204), (204, 's', '', 0.0, 205), (205, '', 'harness', 0.0, 0), (197, 'v', '', 1.4880770554298124, 206), (206, '', 'have', 0.0, 0), (196, 'ˈi', '', 1.7429693050586366, 207), (207, '', 'he', 0.0, 0), (196, 'ˈʌ', '', 2.99573227355404, 208), (208, '', 'he', 0.0, 0), (196, 'ˈɛ', '', 3.688879454114044, 209), (209, 'l', '', 0.0, 210), (210, 'p', '', 0.0, 211), (211, '', 'help', inf, 0), (196, 'ɚ', '', 4.3820266346739345, 212), (212, 'ɹ', '', 0.0, 213), (213, '', 'her', inf, 0), (196, 'ˈɝ', '', inf, 214), (214, '', 'her', inf, 0), (196, 'ˈɪ', '', 2.0794415416798984, 215), (215, 'ɹ', '', 0.4462871026283892, 216), (216, '', 'here', 0.0, 0), (196, 'ɪ', '', 1.5488132906176588, 217), (217, 'm', '', 1.1349799328390873, 218), (218, '', 'him', inf, 0), (215, 'm', '', 1.0216512475319632, 219), (219, '', 'him', inf, 0), (146, 'm', '', 2.051270664713229, 220), (220, '', 'him', inf, 0), (217, 'z', '', 0.38776553100876754, 221), (221, '', 'his', 0.0, 0), (196, 'ˈɔ', '', 4.3820266346739345, 222), (222, 'ɹ', '', 0.0, 223), (223, 's', '', 0.0, 224), (224, '', 'horse', 0.0, 0), (196, 'ˈaʊ', '', 2.7725887222397887, 225), (225, 's', '', 0.7205461547480354, 226), (226, '', 'house', 1.945910149055294, 0), (225, 'z', '', 0.6664789334777197, 227), (227, '', 'house', 0.0, 0), (226, 'h', '', 0.1541506798272394, 228), (228, 'ˌoʊ', '', inf, 229), (229, 'l', '', 0.0, 230), (230, 'd', '', 0.0, 231), (231, '', 'household', inf, 0), (0, 'ˈɑɪ', '', 3.8224108467383076, 232), (232, '', 'i', 0.0, 0), (146, 'f', '', 2.30258509299415, 233), (233, '', 'if', 0.0, 0), (0, 'ˈɪ', '', 3.465735902799679, 234), (234, 'n', '', 0.0, 235), (235, '', 'in', 0.0, 0), (147, 'k', '', 1.4307461236908239, 236), (236, 'l', '', 0.0, 237), (237, 'u', '', inf, 238), (238, 'd', '', 0.0, 239), (239, 'ɪ', '', 0.0, 240), (240, 'ŋ', 
'', 0.0, 241), (241, '', 'including', inf, 0), (237, 'ˈu', '', 0.0, 242), (242, 'd', '', 0.0, 243), (243, 'ɪ', '', 0.0, 244), (244, 'ŋ', '', 0.0, 245), (245, '', 'including', inf, 0), (147, 's', '', 0.9382696385929421, 246), (246, 't', '', 0.0, 247), (247, 'ɹ', '', 0.0, 248), (248, 'ˈʌ', '', 0.0, 249), (249, 'k', '', 0.0, 250), (250, 't', '', 0.0, 251), (251, 'ɪ', '', 0.0, 252), (252, 'v', '', 0.0, 253), (253, '', 'instructive', inf, 0), (0, 'ˌɪ', '', inf, 254), (254, 'n', '', 0.0, 255), (255, 't', '', 0.0, 256), (256, 'ɹ', '', 0.0, 257), (257, 'oʊ', '', inf, 258), (258, 'd', '', 0.0, 259), (259, 'ˈu', '', 0.0, 260), (260, 's', '', 0.0, 261), (261, 't', '', 0.0, 262), (262, '', 'introduced', inf, 0), (257, 'ə', '', 0.0, 263), (263, 'd', '', 0.0, 264), (264, 'ˈu', '', 0.0, 265), (265, 's', '', 0.0, 266), (266, 't', '', 0.0, 267), (267, '', 'introduced', inf, 0), (146, 'z', '', 1.3040562628829093, 268), (268, '', 'is', 0.0, 0), (146, 't', '', 1.415281897993168, 269), (269, '', 'it', 2.9444389791665344, 0), (269, 's', '', 0.05406722127031571, 270), (270, '', 'its', inf, 0), (0, 'dʒ', '', 5.768320995793715, 271), (271, 'u', '', inf, 272), (272, 'l', '', 0.0, 273), (273, 'ˈɑɪ', '', 0.0, 274), (274, '', 'july', inf, 0), (271, 'ə', '', 0.0, 275), (275, 'l', '', 0.0, 276), (276, 'ˈɑɪ', '', 0.0, 277), (277, '', 'july', inf, 0), (271, 'ˌu', '', inf, 278), (278, 'l', '', 0.0, 279), (279, 'ˈɑɪ', '', 0.0, 280), (280, '', 'july', inf, 0), (85, 'ˈi', '', 1.331234583936748, 281), (281, 'p', '', 0.0, 282), (282, '', 'keep', 0.0, 0), (0, 'n', '', 2.9351076517374395, 283), (283, 'ˈoʊ', '', 1.2431935174791988, 284), (284, '', 'know', 0.0, 0), (0, 'l', '', 3.2834143460057703, 285), (285, 'ˈei', '', 1.5040773967763243, 286), (286, 'k', '', 0.0, 287), (287, '', 'lake', 0.0, 0), (285, 'ˈɝ', '', inf, 288), (288, 'n', '', 0.6632942174102254, 289), (289, 'ɪ', '', 0.0, 290), (290, 'ŋ', '', 0.0, 291), (291, '', 'learning', inf, 0), (288, 'ɹ', '', 0.7239188392267124, 292), (292, 'n', '', 0.0, 
293), (293, 'ɪ', '', 0.0, 294), (294, 'ŋ', '', 0.0, 295), (295, '', 'learning', inf, 0), (285, 'ˈɪ', '', 0.5877866649021826, 296), (296, 't', '', 0.0, 297), (297, 'l̩', '', 0.0, 298), (298, '', 'little', 0.0, 0), (296, 'ɾ', '', inf, 299), (299, 'l̩', '', 0.0, 300), (300, '', 'little', 0.0, 0), (285, 'ˈʌ', '', 1.5040773967763243, 301), (301, 'v', '', 0.0, 302), (302, 'l', '', 0.0, 303), (303, 'i', '', 0.0, 304), (304, '', 'lovely', 0.0, 0), (0, 'm', '', 3.5710964184575005, 305), (305, 'ˈæ', '', 3.5115454388310354, 306), (306, 's', '', 0.0, 307), (307, 'ə', '', 0.0, 308), (308, 'k', '', 0.0, 309), (309, 'ɚ', '', 3.091042453358341, 310), (310, '', 'massacre', inf, 0), (309, 'ə', '', 0.04652001563488284, 311), (311, 'ɹ', '', 0.0, 312), (312, '', 'massacre', inf, 0), (305, 'ˈi', '', 1.5656352897756278, 313), (313, '', 'me', 0.6931471805598903, 0), (305, 'ˈɑɪ', '', 2.258782470335518, 314), (314, 'l̩', '', 1.6739764335716245, 315), (315, 'z', '', 0.0, 316), (316, '', 'miles', 0.0, 0), (314, 'l', '', 0.28768207245184385, 317), (317, 'z', '', 0.0, 318), (318, '', 'miles', 0.0, 0), (305, 'ɪ', '', 1.37147927533465, 319), (319, 's', '', 0.0, 320), (320, 't', '', 0.0, 321), (321, 'ˈei', '', 0.0, 322), (322, 'k', '', 0.0, 323), (323, '', 'mistake', 0.0, 0), (305, 'oʊ', '', inf, 324), (324, 'ɹ', '', 0.0, 325), (325, '', 'more', inf, 0), (305, 'ˈɔ', '', 4.204692619390926, 326), (326, 'ɹ', '', 0.0, 327), (327, '', 'more', inf, 0), (305, 'ˈu', '', 4.204692619390926, 328), (328, 'v', '', 0.0, 329), (329, 'm', '', 0.0, 330), (330, 'n̩', '', 0.0, 331), (331, 't', '', 0.0, 332), (332, '', 'movement', inf, 0), (305, 'ə', '', 1.1601701816674677, 333), (333, 's', '', 0.0, 334), (334, 't', '', 0.0, 335), (335, '', 'must', 0.0, 0), (305, 'ˈʌ', '', 2.8183982582710314, 336), (336, 's', '', 0.0, 337), (337, 't', '', 0.0, 338), (338, '', 'must', 0.0, 0), (313, '', 'my', 0.6931471805598903, 0), (314, '', 'my', 2.7725887222399024, 0), (283, 'ˌi', '', 3.2580965380216185, 339), (339, 'ɹ', '', 0.0, 
340), (340, '', 'near', 0.0, 0), (283, 'ˈɪ', '', 1.6486586255874727, 341), (341, 'ɹ', '', 0.5232481437644765, 342), (342, '', 'near', 0.0, 0), (283, 'ˈɛ', '', 3.2580965380216185, 343), (343, 'v', '', 0.0, 344), (344, 'ɚ', '', 3.091042453358341, 345), (345, '', 'never', inf, 0), (344, 'ə', '', 0.04652001563488284, 346), (346, 'ɹ', '', 0.0, 347), (347, '', 'never', inf, 0), (341, 'k', '', 0.8979415932059283, 348), (348, 'ə', '', 0.451985123743043, 349), (349, 'l', '', 0.0, 350), (350, 'ə', '', 0.0, 351), (351, 's', '', 0.0, 352), (352, '', 'nicholas', inf, 0), (348, 'l', '', 1.0116009116785563, 353), (353, 'ə', '', 0.0, 354), (354, 's', '', 0.0, 355), (355, '', 'nicholas', inf, 0), (283, 'ˈɑɪ', '', 2.005333569526101, 356), (356, 'n', '', 0.0, 357), (357, 't', '', 0.0, 358), (358, 'ˈi', '', 0.3053816495512365, 359), (359, 'n', '', 0.0, 360), (360, '', 'nineteen', inf, 0), (358, 'i', '', 1.3350010667323886, 361), (361, '', 'ninety', inf, 0), (283, 'ˈɑ', '', 1.5533484457831719, 362), (362, 't', '', 0.0, 363), (363, '', 'not', 0.0, 0), (283, 'oʊ', '', inf, 364), (364, 'v', '', 0.0, 365), (365, 'ˈɛ', '', 0.0, 366), (366, 'm', '', 0.0, 367), (367, 'b', '', 0.0, 368), (368, 'ɚ', '', 3.091042453358341, 369), (369, '', 'november', inf, 0), (368, 'ə', '', 0.04652001563488284, 370), (370, 'ɹ', '', 0.0, 371), (371, '', 'november', inf, 0), (283, 'ˈaʊ', '', 2.341805806147363, 372), (372, '', 'now', inf, 0), (1, 'v', '', 1.925290861852659, 373), (373, '', 'of', 0.0, 0), (0, 'ˈoʊ', '', 3.060270794691405, 374), (374, 'l', '', 0.8823891801985155, 375), (375, 'd', '', 0.0, 376), (376, '', 'old', inf, 0), (42, 'n', '', 0.6632942174102254, 377), (377, '', 'on', inf, 0), (35, 'n', '', 0.5340824859301847, 378), (378, '', 'on', inf, 0), (374, 'n', '', 0.5340824859301847, 379), (379, 'l', '', 0.0, 380), (380, 'i', '', 0.0, 381), (381, '', 'only', 0.0, 0), (0, 'ˈʌ', '', 4.382026634673821, 382), (382, 'ð', '', 0.5260930958968402, 383), (383, 'ɚ', '', 3.091042453358341, 384), (384, '', 
'other', 0.0, 0), (383, 'ə', '', 0.04652001563488284, 385), (385, 'ɹ', '', 0.0, 386), (386, '', 'other', 0.0, 0), (0, 'ˈaʊ', '', 4.158883083359569, 387), (387, 'ɚ', '', 2.8332133440562757, 388), (388, '', 'our', 0.0, 0), (387, 'ɹ', '', 0.06062462181648698, 389), (389, '', 'our', 0.0, 0), (43, '', 'our', 0.6931471805598903, 0), (0, 'p', '', 3.5710964184575005, 390), (390, 'ˈɝ', '', inf, 391), (391, 's', '', 0.6359887667199473, 392), (392, 'ɪ', '', 0.0, 393), (393, 'n', '', 0.0, 394), (394, 'ɪ', '', 0.0, 395), (395, 'l', '', 0.0, 396), (396, '', 'personal', inf, 0), (391, 'ɹ', '', 0.7537718023763773, 397), (397, 's', '', 0.0, 398), (398, 'ə', '', 0.0, 399), (399, 'n', '', 0.0, 400), (400, 'l̩', '', 0.0, 401), (401, '', 'personal', inf, 0), (160, 'ə', '', 1.4552872326067927, 402), (402, 'z', '', 0.0, 403), (403, 'ˈɪ', '', 0.0, 404), (404, 'ʃ', '', 0.0, 405), (405, 'n̩', '', 0.0, 406), (406, '', 'physician', inf, 0), (160, 'ɪ', '', 1.6665963262739751, 407), (407, 'z', '', 0.0, 408), (408, 'ˈɪ', '', 0.0, 409), (409, 'ʃ', '', 0.0, 410), (410, 'n̩', '', 0.0, 411), (411, '', 'physician', inf, 0), (390, 'ɹ', '', 0.0, 412), (412, 'ˈɑ', '', 0.0, 413), (413, 'm', '', 0.0, 414), (414, 'ə', '', 0.0, 415), (415, 's', '', 0.0, 416), (416, 'ə', '', 0.0, 417), (417, 'z', '', 0.0, 418), (418, '', 'promises', 0.0, 0), (85, 'w', '', 1.4853852637641012, 419), (419, 'ˈɪ', '', 0.0, 420), (420, 'ɹ', '', 0.0, 421), (421, '', 'queer', 0.0, 0), (0, 'ɹ', '', 2.9957322735539265, 422), (422, 'ɑ', '', inf, 423), (423, 'n', '', 0.0, 424), (424, '', 'ran', inf, 0), (422, 'ˈæ', '', 2.6026896854444885, 425), (425, 'n', '', 0.0, 426), (426, '', 'ran', inf, 0), (422, 'i', '', 1.686398953570233, 427), (427, 'ˈæ', '', 0.0, 428), (428, 'k', '', 0.0, 429), (429, 'ʃ', '', 0.0, 430), (430, 'n̩', '', 0.0, 431), (431, '', 'reaction', inf, 0), (422, 'ˈoʊ', '', 0.5877866649020689, 432), (432, 'm', '', 0.0, 433), (433, 'ə', '', 0.0, 434), (434, 'n', '', 0.0, 435), (435, 'ˌɔ', '', inf, 436), (436, 'f', '', 
0.6931471805598903, 437), (437, '', 'romanov', inf, 0), (436, 'v', '', 0.6931471805598903, 438), (438, '', 'romanov', inf, 0), (422, 'ˈu', '', 3.295836866004379, 439), (439, 'l', '', 0.0, 440), (440, '', 'rule', inf, 0), (422, 'ˈʌ', '', 1.9095425048844845, 441), (441, 'ʃ', '', 0.0, 442), (442, 'ə', '', 0.0, 443), (443, '', 'russia', inf, 0), (0, 's', '', 2.8779492378974965, 444), (444, 'ˈɛ', '', 4.069026754237939, 445), (445, 'k', '', 0.3746934494414518, 446), (446, 'n̩', '', 0.0, 447), (447, '', 'second', inf, 0), (447, 'd', '', 0.0, 448), (448, '', 'second', inf, 0), (444, 'ˈi', '', 2.123116605182531, 449), (449, '', 'see', 0.0, 0), (0, 'ʃ', '', 5.768320995793715, 450), (450, 'ˈei', '', 1.55814461804664, 451), (451, 'k', '', 0.0, 452), (452, '', 'shake', 0.0, 0), (450, 'ˈoʊ', '', 0.23638877806422443, 453), (453, 'l', '', 0.0, 454), (454, 'd', '', 0.0, 455), (455, 'ɚ', '', 3.091042453358341, 456), (456, '', 'shoulder', inf, 0), (455, 'ə', '', 0.04652001563488284, 457), (457, 'ɹ', '', 0.0, 458), (458, '', 'shoulder', inf, 0), (444, 'ˈɑɪ', '', 2.8162637857424215, 459), (459, 'm', '', 0.0, 460), (460, 'n̩', '', 0.0, 461), (461, '', 'simon', inf, 0), (444, 'ˈɪ', '', 2.459588841803793, 462), (462, 's', '', 0.4769240720902417, 463), (463, 't', '', 0.0, 464), (464, 'ɚ', '', 3.091042453358341, 465), (465, '', 'sister', inf, 0), (464, 'ə', '', 0.04652001563488284, 466), (466, 'ɹ', '', 0.0, 467), (467, '', 'sister', inf, 0), (444, 'ɪ', '', 1.9289605907415535, 468), (468, 'k', '', 0.0, 469), (469, 's', '', 0.0, 470), (470, '', 'six', inf, 0), (462, 'k', '', 0.9694005571881235, 471), (471, 's', '', 0.0, 472), (472, '', 'six', inf, 0), (444, 'l', '', 2.2772672850098843, 473), (473, 'ˈi', '', 0.0, 474), (474, 'p', '', 0.0, 475), (475, '', 'sleep', 0.0, 0), (444, 'n', '', 1.9289605907415535, 476), (476, 'ˈoʊ', '', 0.0, 477), (477, '', 'snow', 0.0, 0), (444, 'ˈʌ', '', 3.375879573677935, 478), (478, 'm', '', 0.0, 479), (479, '', 'some', 0.0, 0), (444, 'ˈaʊ', '', 
3.1527360223636833, 480), (480, 'n', '', 0.0, 481), (481, 'd', '', 0.8183103235139697, 482), (482, 'z', '', 0.0, 483), (483, '', 'sounds', 0.0, 0), (481, 'z', '', 0.5819215454496316, 484), (484, '', 'sounds', 0.0, 0), (444, 't', '', 1.9289605907415535, 485), (485, 'ɑ', '', inf, 486), (486, 't', '', 0.0, 487), (487, '', 'start', inf, 0), (485, 'ˈɑ', '', 0.08701137698960792, 488), (488, 'ɹ', '', 0.4462871026283892, 489), (489, 't', '', 0.0, 490), (490, '', 'start', inf, 0), (488, 'p', '', 1.0216512475319632, 491), (491, '', 'stop', 2.8903717578962187, 0), (491, 'ɪ', '', 0.05715841383994302, 492), (492, 'ŋ', '', 0.0, 493), (493, '', 'stopping', 0.0, 0), (485, 'ˈɔ', '', 2.484906649787945, 494), (494, 'ɹ', '', 0.0, 495), (495, 'i', '', 0.0, 496), (496, 'z', '', 0.0, 497), (497, '', 'stories', inf, 0), (444, 'w', '', 2.2772672850098843, 498), (498, 'ˈi', '', 0.0, 499), (499, 'p', '', 0.0, 500), (500, '', 'sweep', 0.0, 0), (0, 't', '', 2.9351076517374395, 501), (501, 'ˈɛ', '', 3.4011973816622003, 502), (502, 'ɹ', '', 0.5596157879353996, 503), (503, 'z', '', 0.0, 504), (504, '', 'tears', inf, 0), (501, 'ˈɪ', '', 1.7917594692280545, 505), (505, 'ɹ', '', 0.0, 506), (506, 'z', '', 0.0, 507), (507, '', 'tears', inf, 0), (502, 'l', '', 0.8472978603872434, 508), (508, 'ɪ', '', 0.0, 509), (509, 'ŋ', '', 0.0, 510), (510, '', 'telling', inf, 0), (501, 'ɛ', '', inf, 511), (511, 'm', '', 0.0, 512), (512, 't', '', 0.4248831939652291, 513), (513, 'ˈei', '', 0.0, 514), (514, 'ʃ', '', 0.0, 515), (515, 'n̩', '', 0.0, 516), (516, '', 'temptation', inf, 0), (512, 'p', '', 1.06087196068529, 517), (517, 't', '', 0.0, 518), (518, 'ˈei', '', 0.0, 519), (519, 'ʃ', '', 0.0, 520), (520, 'n̩', '', 0.0, 521), (521, '', 'temptation', inf, 0), (0, 'ð', '', 3.2033716383322144, 522), (522, 'ə', '', 1.01592057282312, 523), (523, 't', '', 0.3654597734944218, 524), (524, '', 'that', inf, 0), (522, 'ˈæ', '', 3.3672958299866877, 525), (525, 't', '', 0.0, 526), (526, '', 'that', inf, 0), (523, '', 'the', 
1.1837700970083915, 0), (522, 'ˈi', '', 1.4213856809312801, 527), (527, '', 'the', 1.2622417124499634, 0), (522, 'ˈʌ', '', 2.6741486494266837, 528), (528, '', 'the', 0.0, 0), (522, 'ˈɛ', '', 3.3672958299866877, 529), (529, 'ɹ', '', 0.0, 530), (530, '', 'there', 0.0, 0), (527, 'z', '', 0.332705753825735, 531), (531, '', 'these', 0.0, 0), (0, 'ɵ', '', 5.075173815233825, 532), (532, 'ˈɪ', '', 0.0, 533), (533, 'ŋ', '', 0.0, 534), (534, 'k', '', 0.0, 535), (535, '', 'think', 0.0, 0), (522, 'ˈoʊ', '', 1.352392809444268, 536), (536, '', 'though', 0.0, 0), (501, 'ˈoʊ', '', 1.3862943611197807, 537), (537, '', 'to', 1.0986122886680505, 0), (501, 'ˈu', '', 4.094344562222091, 538), (538, '', 'to', 0.0, 0), (501, 'ˈʌ', '', 2.7080502011021963, 539), (539, '', 'to', 0.0, 0), (537, 'l', '', 0.40546510810827385, 540), (540, 'd', '', 0.0, 541), (541, '', 'told', inf, 0), (501, 'ɹ', '', 1.321755839982302, 542), (542, 'ɑɪ', '', inf, 543), (543, 'ˈʌ', '', 0.0, 544), (544, 'm', '', 0.0, 545), (545, 'f', '', 0.0, 546), (546, 'n̩', '', 0.0, 547), (547, 't', '', 0.0, 548), (548, '', 'triumphant', inf, 0), (501, 'w', '', 1.6094379124341458, 549), (549, 'ˈɛ', '', 0.0, 550), (550, 'n', '', 0.0, 551), (551, 'i', '', 1.4816045409241951, 552), (552, '', 'twenty', inf, 0), (551, 't', '', 0.25782910930206526, 553), (553, 'i', '', 0.0, 554), (554, '', 'twenty', inf, 0), (382, 'p', '', 0.8938178760221263, 555), (555, '', 'up', 0.0, 0), (0, 'v', '', 3.8224108467384212, 556), (556, 'ˈɪ', '', 0.0, 557), (557, 'l', '', 0.0, 558), (558, 'ɪ', '', 0.0, 559), (559, 'dʒ', '', 0.0, 560), (560, '', 'village', 0.0, 0), (0, 'w', '', 3.2834143460057703, 561), (561, 'ˈɔ', '', 4.3438054218537445, 562), (562, 'n', '', 0.750305594399947, 563), (563, 'ɪ', '', 0.6931471805598903, 564), (564, 'd', '', 0.0, 565), (565, '', 'wanted', inf, 0), (561, 'ˈɑ', '', 1.9459101490554076, 566), (566, 'n', '', 0.7777045685880921, 567), (567, 't', '', 0.0, 568), (568, 'ə', '', 0.0, 569), (569, 'd', '', 0.0, 570), (570, '', 'wanted', 
inf, 0), (563, 't', '', 0.6931471805598903, 571), (571, 'ɪ', '', 0.0, 572), (572, 'd', '', 0.0, 573), (573, '', 'wanted', inf, 0), (561, 'ə', '', 1.2992829841302864, 574), (574, 'z', '', 0.0, 575), (575, '', 'was', inf, 0), (566, 'z', '', 0.6664789334778334, 576), (576, '', 'was', inf, 0), (562, 'z', '', 0.6390799592896883, 577), (577, '', 'was', inf, 0), (566, 'tʃ', '', 3.610917912644368, 578), (578, '', 'watch', 0.0, 0), (196, 'w', '', 1.8971199848859897, 579), (579, 'ˈʌ', '', 1.3862943611198943, 580), (580, 't', '', 0.0, 581), (581, '', 'what', inf, 0), (561, 'ˈʌ', '', 2.95751106073385, 582), (582, 't', '', 0.0, 583), (583, '', 'what', inf, 0), (561, 'ˌɑ', '', inf, 584), (584, 't', '', 0.0, 585), (585, '', 'what', inf, 0), (49, '', 'what', inf, 0), (579, 'ˈɛ', '', 2.0794415416798984, 586), (586, 'n', '', 0.0, 587), (587, '', 'when', inf, 0), (579, 'ˈɪ', '', 0.47000362924575256, 588), (588, 'n', '', 0.05715841383994302, 589), (589, '', 'when', inf, 0), (561, 'ˈɛ', '', 3.650658241293854, 590), (590, 'n', '', 0.0, 591), (591, '', 'when', inf, 0), (561, 'ˈɪ', '', 2.0412203288597084, 592), (592, 'n', '', 0.05715841383994302, 593), (593, '', 'when', inf, 0), (142, '', 'when', inf, 0), (588, 'tʃ', '', 2.8903717578962187, 594), (594, '', 'which', inf, 0), (592, 'tʃ', '', 2.8903717578962187, 595), (595, '', 'which', inf, 0), (146, 'tʃ', '', 4.248495242049444, 596), (596, '', 'which', inf, 0), (196, 'ˈu', '', 4.3820266346739345, 597), (597, 'z', '', 0.0, 598), (598, '', 'whose', 0.0, 0), (561, 'ɪ', '', 1.5105920777974688, 599), (599, 'l', '', 0.8109302162163203, 600), (600, '', 'will', 0.0, 0), (593, 'd', '', 0.0, 601), (601, '', 'wind', 0.0, 0), (561, 'ˈɑɪ', '', 2.397895272798337, 602), (602, 'n', '', 0.0, 603), (603, 'd', '', 0.0, 604), (604, '', 'wind', 0.0, 0), (599, 'ð', '', 0.7308875085427644, 605), (605, '', 'with', 1.7917594692280545, 0), (599, 'ɵ', '', 2.602689685444375, 606), (606, '', 'with', 0.0, 0), (605, 'ˈaʊ', '', 0.1823215567939087, 607), (607, 't', '', 
0.0, 608), (608, '', 'without', 0.0, 0), (561, 'ˈʊ', '', 2.95751106073385, 609), (609, 'd', '', 0.0, 610), (610, 'z', '', 0.0, 611), (611, '', 'woods', 0.0, 0), (0, 'j', '', 5.768320995793715, 612), (612, 'ˌi', '', 1.7917594692280545, 613), (613, 'ɹ', '', 0.0, 614), (614, '', 'year', 0.0, 0), (612, 'ˈɪ', '', 0.1823215567939087, 615), (615, 'ɹ', '', 0.0, 616), (616, '', 'year', 0.0, 0), (305, 'ˌɔ', '', inf, 617), (617, 'n', '', 0.0, 618), (618, 't', '', 0.0, 619), (619, 'ə', '', 0.0, 620), (620, 'n', '', 0.0, 621), (621, 'ˈoʊ', '', 0.0, 622), (622, 'ɹ', '', 0.0, 623), (623, 'i', '', 0.0, 624), (624, '', 'montenori', inf, 0), (350, 'ɑɪ', '', inf, 625), (625, '', 'nicholai', inf, 0), (437, 's', '', 0.0, 626), (626, '', 'romanovs', inf, 0), (445, 'b', '', 1.163150809805643, 627), (627, 'æ', '', inf, 628), (628, 'g', '', 0.0, 629), (629, '', 'sebag', inf, 0)]
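The arc list printed above appears to use the tuple layout (source state, input phone, output word, weight, destination state). Its weights look like negative natural logs of probabilities: 1.3862943611197807 ≈ ln 4, 1.0986122886680505 ≈ ln 3, 0.6931471805598903 ≈ ln 2, and inf = -ln 0. Assuming that reading is correct, the sketch below converts a few of those arcs back into probabilities. `arc_probability` is a hypothetical helper for illustration, not a function in mp5.py.

```python
import math

# Three arcs copied from the printout above. The layout
# (source_state, input_phone, output_word, weight, dest_state) is an assumption.
arcs = [
    (501, 'ˈoʊ', '', 1.3862943611197807, 537),
    (537, '', 'to', 1.0986122886680505, 0),
    (542, 'ɑɪ', '', math.inf, 543),
]

def arc_probability(arc):
    """Hypothetical helper: undo a weight assumed to be -ln(probability)."""
    src, ilabel, olabel, weight, dst = arc
    return math.exp(-weight)  # math.exp(-math.inf) == 0.0, i.e. a zero-probability arc

for arc in arcs:
    print(arc[:3], round(arc_probability(arc), 4))
```

For these three arcs the probabilities come out to roughly 1/4, 1/3, and 0.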
Finally, let's see if this re-estimation fixed the mistake!
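importlib.reload(mp5)
# Compose T with the re-estimated lexicon+grammar transducer LG_re
TLG_re,TLGfinal_re = mp5.todo_fstcompose(T,Tfinal,LG_re,LGfinal_re)
# Put the states in topological order so the best-path search can visit them in sequence
TLG_sortre,TLGfinal_sortre = mp5.todo_sort_topologically(TLG_re,TLGfinal_re)
# Find the lowest-cost path through the sorted, composed FST
delta, psi, bestpath_re = mp5.todo_fstbestpath(TLG_sortre,TLGfinal_sortre)
# Keep only the non-empty output labels (the recognized words)
print([ t[2] for t in bestpath_re if t[2]!='' ])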
['whose', 'woods', 'the', 'czar', 'i', 'think', 'i', 'know', 'his', 'house', 'is', 'in', 'the', 'village', 'though', 'he', 'will', 'not', 'see', 'me', 'stopping', 'here', 'to', 'watch', 'his', 'woods', 'fill', 'up', 'with', 'snow', 'me', 'little', 'horse', 'must', 'think', 'it', 'queer', 'to', 'stop', 'without', 'a', 'farm', 'house', 'near', 'between', 'the', 'woods', 'and', 'frozen', 'lake', 'the', 'darkest', 'evening', 'of', 'the', 'year', 'he', 'gives', 'his', 'harness', 'bells', 'a', 'shake', 'to', 'ask', 'if', 'there', 'is', 'some', 'mistake', 'the', 'only', 'other', 'sounds', 'the', 'sweep', 'of', 'easy', 'wind', 'and', 'downy', 'flake', 'the', 'woods', 'are', 'lovely', 'dark', 'and', 'deep', 'but', 'i', 'have', 'promises', 'to', 'keep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep', 'and', 'miles', 'to', 'go', 'before', 'i', 'sleep']
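The cell above relies on `mp5.todo_sort_topologically` and `mp5.todo_fstbestpath`, whose internals you write yourself. As a rough illustration only, here is one way the two steps could look: Kahn's algorithm for the topological sort, then a Viterbi-style dynamic program that adds costs along each path and keeps the minimum. It assumes arcs are (source, input, output, weight, destination) tuples, the start state is 0, and final states arrive as a dict mapping state to final cost. The names `sort_topologically` and `best_path` and the dict-of-finals convention are hypothetical; the MP's real data structures and return values (such as the `delta, psi, bestpath` triple) may differ. A topological sort only makes sense for an acyclic FST. TLG presumably is acyclic once T is composed in, but LG on its own is not, since its word arcs return to state 0.

```python
import math
from collections import defaultdict, deque

def sort_topologically(arcs, start=0):
    """Return the states of an acyclic FST in topological order (Kahn's algorithm).

    Assumes each arc is a (src, ilabel, olabel, weight, dst) tuple.
    Raises ValueError if the FST contains a cycle."""
    states = {start}
    indegree = defaultdict(int)
    successors = defaultdict(list)
    for src, _, _, _, dst in arcs:
        states.update((src, dst))
        indegree[dst] += 1
        successors[src].append(dst)
    queue = deque(s for s in states if indegree[s] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for d in successors[s]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    if len(order) != len(states):
        raise ValueError("FST contains a cycle")
    return order

def best_path(arcs, finals, start=0):
    """Minimum-cost path from start to a final state.

    Costs add along a path and the smallest total wins (tropical semiring).
    `finals` is a hypothetical dict {final_state: final_cost}.
    Returns (total_cost, list_of_arcs), or (math.inf, []) if no final state is reachable."""
    order = sort_topologically(arcs, start)
    outgoing = defaultdict(list)
    for arc in arcs:
        outgoing[arc[0]].append(arc)
    delta = {s: math.inf for s in order}  # best cost found so far to reach each state
    psi = {}                              # back-pointer: best incoming arc for each state
    delta[start] = 0.0
    for s in order:  # all predecessors of s are finalized before s is expanded
        if delta[s] == math.inf:
            continue
        for arc in outgoing[s]:
            cost = delta[s] + arc[3]
            if cost < delta[arc[4]]:
                delta[arc[4]] = cost
                psi[arc[4]] = arc
    end = min(finals, key=lambda f: delta.get(f, math.inf) + finals[f])
    total = delta.get(end, math.inf) + finals[end]
    if total == math.inf:
        return math.inf, []
    path, s = [], end
    while s != start:  # follow back-pointers from the best final state
        path.append(psi[s])
        s = psi[s][0]
    return total, path[::-1]

# Toy acyclic FST with made-up weights: two competing paths into state 3.
toy_arcs = [
    (0, 'ð', '', 0.0, 1),
    (1, 'ˈi', '', 0.5, 2),
    (2, 'z', 'these', 0.2, 3),
    (1, 'ə', '', 0.1, 4),
    (4, 'z', 'the', 1.0, 3),
]
toy_finals = {3: 0.0}
cost, path = best_path(toy_arcs, toy_finals)
print(cost, [a[2] for a in path if a[2] != ''])
```

On the toy FST, the path through state 2 wins with a total cost of about 0.7 and outputs only 'these'; the path through state 4 costs 1.1.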
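Still there! The decoder still outputs 'the czar' where the recognized phones 'ð', 'ˈi', 'z', 'ˈɑ', 'ɹ' correspond to 'these are'. Maybe it would be a good idea to train the language model on text that is topically similar to the speech we want to recognize.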
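Acting on that suggestion would mean re-estimating the grammar from more relevant text. Here is a minimal sketch, assuming the grammar is an unsmoothed bigram model whose costs are stored as negative natural logs (consistent with the inf weights printed above, although the actual form of mp5's language model is not shown in this section). `train_bigram_costs` and the toy sentences are hypothetical, for illustration only.

```python
import math
from collections import Counter

def train_bigram_costs(sentences):
    """Unsmoothed maximum-likelihood bigram model, returned as a cost function.

    Hypothetical helper (not part of mp5.py). Each sentence is a list of words;
    '<s>' and '</s>' are added as sentence boundaries. cost(w1, w2) returns
    -ln P(w2 | w1), or math.inf for a bigram never seen in training."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ['<s>'] + list(words) + ['</s>']
        for w1, w2 in zip(padded[:-1], padded[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1

    def cost(w1, w2):
        count = bigram_counts[(w1, w2)]  # Counter returns 0 for unseen keys
        if count == 0:
            return math.inf
        # -ln(count / history) written as ln(history / count)
        return math.log(history_counts[w1] / count)

    return cost

# Made-up "on-topic" training sentences, for illustration only.
toy_text = [["these", "are", "snowy", "woods"],
            ["the", "woods", "are", "deep"]]
cost = train_bigram_costs(toy_text)
print(cost("these", "are"), cost("are", "snowy"), cost("the", "czar"))
```

With this toy training text, 'these are' costs 0 while 'the czar' never occurs and costs inf, so this toy model would effectively rule it out. A real system would also need smoothing so that valid but unseen word pairs are not assigned infinite cost.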