What is the significance of lm alpha and lm beta?

rajpuneet.sandhu · November 19, 2019, 7:27pm

I am trying to limit out-of-vocabulary words for my custom LM, but to do that I need to understand exactly what lm alpha and lm beta stand for.

lissyx · November 20, 2019, 1:26pm

They control how the language model acts during decoding. util/flags.py gives more context; you can also read it via --helpfull. Please give us feedback if it is still unclear.

MattC_eostar · December 20, 2019, 9:48pm

This may be because I am out of practice in C and C++, but can someone please explain how the alpha and beta values affect the decoder?

I traced it to here: https://github.com/mozilla/DeepSpeech/blob/1eaec6eb5e92323b5d97bdfa6e41502179bfe8a1/native_client/ctcdecode/scorer.h#L99

They then get set here:

https://github.com/mozilla/DeepSpeech/blob/1eaec6eb5e92323b5d97bdfa6e41502179bfe8a1/native_client/ctcdecode/scorer.cpp#L270

    // Only increment window start position after we have a full window
    if (win_size == max_order_) {
      win_start++;
    }
  }

  return score / NUM_FLT_LOGE;
}

void Scorer::reset_params(float alpha, float beta)
{
  this->alpha = alpha;
  this->beta = beta;
}

std::vector<std::string> Scorer::split_labels_into_scored_units(const std::vector<int>& labels)
{
  if (labels.empty()) return {};

  std::string s = alphabet_.LabelsToString(labels);

But they never seem to be actually used. If they are, where?
Does path_trie.cpp use them somehow?

https://github.com/mozilla/DeepSpeech/blob/1eaec6eb5e92323b5d97bdfa6e41502179bfe8a1/native_client/ctcdecode/path_trie.cpp

#include "path_trie.h"

#include <algorithm>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

#include "decoder_utils.h"

PathTrie::PathTrie() {
  log_prob_b_prev = -NUM_FLT_INF;
  log_prob_nb_prev = -NUM_FLT_INF;
  log_prob_b_cur = -NUM_FLT_INF;
  log_prob_nb_cur = -NUM_FLT_INF;
  log_prob_c = -NUM_FLT_INF;
  score = -NUM_FLT_INF;

  ROOT_ = -1;
  character = ROOT_;

Thank you!

lissyx · December 20, 2019, 9:52pm

Here?

native_client/ctcdecode/ctc_beam_search_decoder.cpp:        score = ext_scorer_->get_log_cond_prob(ngram, bos) * ext_scorer_->alpha;
native_client/ctcdecode/ctc_beam_search_decoder.cpp:      approx_ctc -= (ext_scorer_->get_sent_log_prob(words)) * ext_scorer_->alpha;
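To make the effect of the two weights concrete, here is a minimal sketch of how a beam's log probability could be adjusted at a word boundary. The alpha line mirrors the first grep hit above. The beta line is an assumption based on the usual formulation for this kind of decoder, score = log P_acoustic + alpha * log P_LM + beta * word_count; it is not shown in the grep output, so check ctc_beam_search_decoder.cpp for the real thing. LmWeights, apply_lm_score and score_hypothesis are hypothetical names made up for illustration, not DeepSpeech APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two values stored by Scorer::reset_params.
struct LmWeights {
  float alpha;  // scales the language model log probability
  float beta;   // bonus added per scored word (assumed, not shown in the thread)
};

// Adjust a beam's log probability when a word has just been completed.
// lm_log_cond_prob stands in for ext_scorer_->get_log_cond_prob(ngram, bos).
float apply_lm_score(float log_p, float lm_log_cond_prob, const LmWeights& w) {
  assert(lm_log_cond_prob <= 0.0f);     // log probabilities are never positive
  log_p += lm_log_cond_prob * w.alpha;  // same shape as the grep'd line
  log_p += w.beta;                      // assumed word insertion bonus
  return log_p;
}

// Score a whole hypothesis from its acoustic log probability and one
// language model log probability per word.
float score_hypothesis(float acoustic_log_p,
                       const std::vector<float>& word_lm_log_probs,
                       const LmWeights& w) {
  float log_p = acoustic_log_p;
  for (float lm : word_lm_log_probs) {
    log_p = apply_lm_score(log_p, lm, w);
  }
  return log_p;
}
```

Under these assumptions, a larger alpha makes the decoder trust the language model more relative to the acoustic model, and a larger beta favours hypotheses with more words. With alpha = 0 and beta = 0 the language model has no influence on the score. The second grep hit appears to do the reverse at the end of decoding: it subtracts the alpha-weighted sentence score back out to approximate the CTC-only score.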

MattC_eostar · December 20, 2019, 9:58pm

Well shoot. Must have missed that when searching for the keyword alpha in the repo. Thank you!