Maximum length

DeepSpeech can transcribe audio files of any length. I used the pretrained model to transcribe a 16-minute audio file, and the output transcript was correspondingly longer. My question is: how does it change the shape of its input and output tensors?
Do its tensors have a fixed size, or are they dynamic?


Hello, isn't the size of the output tensors always the same, namely the number of characters in the alphabet? So that is not going to change with the size of the audio file. The input, however, is padded to match the length of the audio, so it is dynamic. Correct me if I'm wrong.
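To make the fixed-versus-dynamic distinction concrete, here is a rough sketch of how the shapes might scale with audio length. It is not DeepSpeech's actual code: the constants (16 kHz audio, 32 ms window, 20 ms step, 26 features per frame, a 28-character alphabet plus one CTC blank) and the helper names `num_frames` and `tensor_shapes` are my assumptions for illustration. In this sketch, the number of time steps grows with the audio while the per-step width stays fixed.

```python
# Illustrative sketch only; all constants below are assumptions, not values
# read from the DeepSpeech source or from a specific pretrained model.
SAMPLE_RATE = 16000   # samples per second (assumed)
WIN_LEN = 512         # 32 ms analysis window at 16 kHz (assumed)
WIN_STEP = 320        # 20 ms step between windows at 16 kHz (assumed)
N_FEATURES = 26       # features per frame (assumed)
ALPHABET_SIZE = 28    # e.g. a-z, space, apostrophe (assumed)


def num_frames(n_samples):
    """Number of full analysis windows that fit in n_samples."""
    if n_samples < WIN_LEN:
        return 0
    return 1 + (n_samples - WIN_LEN) // WIN_STEP


def tensor_shapes(duration_s):
    """Return (input_shape, output_shape) for audio of the given duration.

    The first dimension (time) depends on the audio length; the second
    dimension is fixed by the feature size and the alphabet size.
    """
    n = num_frames(int(round(duration_s * SAMPLE_RATE)))
    input_shape = (n, N_FEATURES)
    output_shape = (n, ALPHABET_SIZE + 1)  # +1 for the CTC blank label
    return input_shape, output_shape


if __name__ == "__main__":
    for seconds in (1, 60, 16 * 60):
        print(seconds, tensor_shapes(seconds))
```

Under these assumptions, only the time dimension changes between a 1-second clip and a 16-minute file; the alphabet dimension stays the same.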
