Hello Team, I was reading last year's blog post https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/ by @reuben about the change in Mozilla's DeepSpeech architecture. I am writing to confirm whether Mozilla still uses the same architecture. Looking at the code, I see that DeepSpeech now uses 6 layers. Could you please confirm whether my understanding of the current architecture is correct:
3 fully connected layers (dense) -> uni-directional RNN layer -> fully connected layer (dense) -> output layer (fully-connected)
The hidden fully connected layers use the ReLU activation. The RNN layer uses the tanh activation.
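
To make the question concrete, here is a minimal pure-Python sketch of the six-layer stack as I understand it. It is not taken from the DeepSpeech code: the function names, layer sizes, and random weights are made up for illustration, and I have assumed a plain tanh recurrent cell, so the real implementation may well use a different cell type or activation details.

```python
import math
import random

def relu(v):
    return max(0.0, v)

def dense(x, w, b, activation=None):
    """Fully connected layer: y = activation(W x + b)."""
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [activation(v) for v in y] if activation else y

def init(n_out, n_in, rng):
    """Small random weight matrix and zero bias (illustrative values only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def make_params(n_input, n_hidden, n_classes, seed=0):
    """Hypothetical parameter set; sizes are assumptions, not DeepSpeech's."""
    rng = random.Random(seed)
    sizes = [n_input, n_hidden, n_hidden, n_hidden]
    rnn_wx, rnn_b = init(n_hidden, n_hidden, rng)
    rnn_wh, _ = init(n_hidden, n_hidden, rng)
    return {
        "dense_in": [init(sizes[i + 1], sizes[i], rng) for i in range(3)],
        "rnn_wx": rnn_wx, "rnn_wh": rnn_wh, "rnn_b": rnn_b,
        "dense_out": init(n_hidden, n_hidden, rng),
        "output": init(n_classes, n_hidden, rng),
    }

def forward(frames, params):
    """Run a sequence of feature frames through the six-layer stack."""
    h_prev = [0.0] * len(params["rnn_b"])
    zero_bias = [0.0] * len(h_prev)
    outputs = []
    for x in frames:
        # Layers 1-3: fully connected with ReLU
        for w, b in params["dense_in"]:
            x = dense(x, w, b, relu)
        # Layer 4: uni-directional recurrent layer with tanh (forward in time only)
        from_input = dense(x, params["rnn_wx"], params["rnn_b"])
        from_state = dense(h_prev, params["rnn_wh"], zero_bias)
        h_prev = [math.tanh(a + c) for a, c in zip(from_input, from_state)]
        # Layer 5: fully connected with ReLU
        w, b = params["dense_out"]
        y = dense(h_prev, w, b, relu)
        # Layer 6: fully connected output layer (one logit per output class)
        w, b = params["output"]
        outputs.append(dense(y, w, b))
    return outputs
```

For example, `forward([[0.1] * 26 for _ in range(5)], make_params(26, 8, 29))` returns one vector of 29 logits for each of the 5 input frames. The input and output sizes here are just placeholders I picked for the example.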