Setting lm_alpha and lm_beta to 0 is not a suitable way to disable LM scoring. As some have already mentioned here, the clients only enable the LM if you pass the flags. As for the Python code, just pass scorer=None
to the decoder calls.
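To illustrate the role the scorer plays, here is a minimal, self-contained sketch of a beam search with an optional scorer hook. This is a toy decoder, not the ds_ctcdecoder API; the `scorer` callable signature here is invented for the example. Passing scorer=None leaves the ranking purely acoustic:

```python
import math

def beam_search(probs, alphabet, beam_width, scorer=None):
    """Toy beam search over per-timestep character probabilities.

    probs: list of timesteps, each a list of probabilities over `alphabet`.
    scorer: optional callable(prefix) -> log-prob bonus; scorer=None
    disables external (LM-style) scoring entirely.
    """
    beams = [("", 0.0)]  # (prefix, accumulated log-probability)
    for step in probs:
        candidates = []
        for prefix, logp in beams:
            for ch, p in zip(alphabet, step):
                if p <= 0.0:
                    continue  # skip impossible extensions
                new_prefix = prefix + ch
                new_logp = logp + math.log(p)
                if scorer is not None:
                    new_logp += scorer(new_prefix)  # LM-style bonus
                candidates.append((new_prefix, new_logp))
        # Keep only the beam_width best hypotheses for the next timestep.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]
```

With scorer=None the most acoustically likely string wins; plugging in a scorer that rewards certain prefixes can flip the result, which is exactly why alpha/beta = 0 (which still routes beams through the LM machinery) behaves differently from removing the scorer altogether.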
run_singleshot_clean_final_v3.sh data/recorded/po6vfbxbnyduz0k9.wav
with --lm_alpha 0.75 --lm_beta 1.85:
ุกููููููู ููููุชูููุจู ุนููููููู ูููููุฏูููููู
with --lm_alpha 0 --lm_beta 0:
ุกููููููู ุฌูููู ุฒููู ููุนููููููู ูููููุกูููู
with scorer=None:
ุกููููููู ุฌููู ฺูงุฒููู ุจูุนููููููู ฺฺูููููุขุขูููููููู
Those are right-to-left script; in any case they came through as special characters that are not readable here.
I just need to test my model without any language model, without restricting it to any bag of words. The second option (alpha/beta = 0) did not give me satisfying results and does not seem to discard the LM/trie. The third option (scorer=None) was very satisfying, but it is very slow.
Thank you.
Great, you confirmed what I saw in my tests' results.
I am very satisfied with the results of scorer=None. But it takes a long time to finish decoding: instead of 00:30 with the scorer, 12:30 hours were needed with scorer=None on my test data. I am using the default --beam_width value.
Any help in this would be much appreciated.
Thank you @reuben
This is just one result. Having different results depending on the values of alpha and beta is expected. So far, you seem to be implying that over subsequent runs with 0.0 for both values, you get different decodings. That is what I'm curious about.
The LM and trie play a part in the speed of the decoding; that's expected. Try reducing the beam width.
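A rough way to see why a smaller beam width speeds things up: the decoder scores on the order of beam_width × alphabet_size candidate extensions per timestep. A back-of-the-envelope sketch (the function and numbers are illustrative, not taken from the DeepSpeech decoder):

```python
def hypotheses_scored(timesteps, alphabet_size, beam_width):
    """Count hypotheses scored by a beam search that keeps at most
    beam_width prefixes alive after each timestep."""
    kept = 1      # start from the single empty prefix
    scored = 0
    for _ in range(timesteps):
        scored += kept * alphabet_size              # every kept prefix is extended
        kept = min(kept * alphabet_size, beam_width)  # prune to the beam width
    return scored
```

For example, over 3 timesteps with a 30-symbol alphabet, a beam of 100 scores 3,930 hypotheses while a beam of 500 scores 15,930; work grows roughly linearly with the beam width, which is why shrinking it directly cuts decoding time.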
with --lm_alpha 0 --lm_beta 0, using one LM:
ุกููููููู ุฌูููู ุฒููู ููุนููููููู ูููููุกูููู
with --lm_alpha 0 --lm_beta 0, using another LM:
ุนููููููู ุฌูู ูุนููู ุนููููููู ฺฺูููููุขุขูููููููู
The performance decrease is a direct consequence of disabling LM scoring. Without an LM, the model will explore every beam it can create (within the beam width limit), rather than ignoring beams that lead to out-of-vocabulary words (since in that case it has no constrained vocabulary). If at all possible, you should try to create an LM that matches your use case. If that's not doable, you can reduce the beam_width as @lissyx has mentioned, and also use cutoff_prob/cutoff_top_n to trade accuracy for decoding speed. I'd start by setting cutoff_prob=0.99 and seeing what that gets you.