I am trying to limit out-of-vocabulary words for my custom LM, but to do that I need to understand exactly what lm alpha and lm beta stand for.
They control the way the language model will act. util/flags.py gives more context, and you can also read that via --helpfull. Please give us feedback if it is still unclear.
This may be because I am out of practice with C and C++, but can someone explain how the alpha and beta values actually affect the decoder?
I traced them to here: https://github.com/mozilla/DeepSpeech/blob/1eaec6eb5e92323b5d97bdfa6e41502179bfe8a1/native_client/ctcdecode/scorer.h#L99
They then get set here:
But they never seem to actually be used. If they are, where?
Does path_trie.cpp use them somehow?
Thank you!
Here?
native_client/ctcdecode/ctc_beam_search_decoder.cpp: score = ext_scorer_->get_log_cond_prob(ngram, bos) * ext_scorer_->alpha;
native_client/ctcdecode/ctc_beam_search_decoder.cpp: approx_ctc -= (ext_scorer_->get_sent_log_prob(words)) * ext_scorer_->alpha;
Well, shoot. I must have missed that when searching the repo for the keyword alpha. Thank you!