I know this is also a TensorFlow question, but TensorFlow's documentation on this is poor.
This is what I understand from some external documentation.
In general, this layer is capable of:
(1) applying a softmax to convert the output into a probability distribution over symbols;
(2) collapsing repeated characters between the blank symbols that come from the acoustic model;
(3) running beam search over a prefix (character) tree and extracting the most probable sequence;
(4) optionally feeding this output into a language model that is responsible for "correcting" it at the word level, based on known word sequences.
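To make step (2) concrete, here is a minimal sketch of the CTC collapse rule in plain Python (not TensorFlow's implementation): merge adjacent repeated symbols, then drop blanks. Note that repeats separated by a blank survive, which is how CTC can emit double letters. The blank id and the symbol mapping below are illustrative assumptions; TensorFlow's CTC ops actually use `num_classes - 1` as the blank index.

```python
def ctc_collapse(path, blank=0):
    """Collapse a raw CTC path: merge adjacent repeats, then drop blanks.

    `path` is a per-frame sequence of symbol ids, `blank` is the blank id
    (assumed 0 here for readability).
    """
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# With a=1, ..., h=8, l=12, o=15 and blank=0, the raw path
# "h h e e - l l - l o o" collapses to "h e l l o":
# the blank between the two l-runs keeps them from being merged.
print(ctc_collapse([8, 8, 5, 5, 0, 12, 12, 0, 12, 15, 15]))  # -> [8, 5, 12, 12, 15]
```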
If the above is correct, my questions are:
(1) Where is the prefix tree that tf.nn.ctc_beam_search_decoder includes? Or does it not include one?
(2) Is the language model indeed used only at the word level? Is it possible to use it at the character level?
(3) I removed tf.nn.ctc_beam_search_decoder from the protobuf file and implemented it later in the pipeline; however, the results from tf.nn.ctc_beam_search_decoder inside and outside the protobuf are different. Why is that happening?
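One thing worth checking for (3) is that both call sites use the same decoder settings (e.g. `beam_width`, `top_paths`, and, in the TF1 API, `merge_repeated`), since differing defaults can change the collapsed output. As a settings-free baseline for comparing the two pipelines, a greedy best-path decode (per-frame argmax followed by CTC collapse) can be sketched in plain Python; the shapes and blank index below are illustrative assumptions, with the blank as the last class, as in TensorFlow's CTC ops:

```python
def greedy_ctc_decode(logits, blank):
    """Best-path decode: take the argmax symbol per frame, then CTC-collapse
    (merge adjacent repeats, drop blanks). `logits` is a list of per-frame
    score vectors; `blank` is the blank class index.
    """
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for sym in path:
        if sym != prev and sym != blank:
            decoded.append(sym)
        prev = sym
    return decoded

# 3 frames, 3 classes, blank = 2 (the last class)
logits = [[0.1, 0.8, 0.1],
          [0.7, 0.2, 0.1],
          [0.1, 0.1, 0.8]]
print(greedy_ctc_decode(logits, blank=2))  # -> [1, 0]
```

If this baseline already disagrees between the two pipelines, the discrepancy is in the inputs (logits or sequence lengths) rather than in the beam search itself.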
(4) Is the language model necessary during training? Isn't the purpose to run the LM on top of the acoustic model as a standalone module?
Thanks in advance !