I need some clarification on the ignore_longer_outputs_than_inputs flag

@kdavis @reuben I was training DeepSpeech 0.5.0 on data I scraped from YouTube, using its closed captions (CC, a.k.a. VTT subtitles) as transcripts, when I got this error:

Not enough time for target transition sequence (required: 102, available: 0)
You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

I passed ignore_longer_outputs_than_inputs=True to tf.nn.ctc_loss and the model started training again, but I need some clarification on this.

What does this error mean?

Why do I get this error? It might be true that my transcripts are not a 100% match to the audio, but I remember giving this model a completely wrong transcript before and it still trained on it.
And how can I tell how many training samples it is ignoring after setting this flag? What if it is skipping over all of the samples? I am not seeing even the slightest effect on the model after training all day…

So far there’s no better solution than either filtering on min/max length and/or doing a binary search to find the offending samples.

How do I filter on min/max length? Sorry, I did not fully understand that. :roll_eyes: :grimacing:
How do I find the offending samples? The error does not say anything about which sample it is stuck on…

You can look at the data directly. If the audio is too short for its transcript, it won’t work. Audio windows have a 20ms step between them, so to get the number of windows from an audio file you can just divide its duration by 20ms, and then compare that with the length of the transcript.
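
For reference, a minimal sketch of that check, assuming a DeepSpeech-style CSV with wav_filename and transcript columns; the train.csv file name is an assumption:

```python
import csv
import wave

WINDOW_STEP_MS = 20  # feature windows are 20 ms apart

def num_windows(wav_path):
    """Approximate number of feature windows in a WAV file."""
    with wave.open(wav_path, 'rb') as w:
        duration_ms = 1000.0 * w.getnframes() / w.getframerate()
    return int(duration_ms / WINDOW_STEP_MS)

too_short = []
with open('train.csv') as f:  # hypothetical file name
    for row in csv.DictReader(f):
        windows = num_windows(row['wav_filename'])
        if windows < len(row['transcript']):
            too_short.append((row['wav_filename'], windows, len(row['transcript'])))

print('Offending samples: %d' % len(too_short))
for path, windows, chars in too_short:
    print('%s: %d windows < %d characters' % (path, windows, chars))
```

Anything this prints is a candidate for the min/max length filtering mentioned above, and its count answers the "how many samples is it ignoring" question without relying on the flag.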


Good answer. However, as far as I know, the CTC loss calculation inserts a blank character ‘-’ between repeated characters of the transcript, or something like that… that makes comparing against the transcript length just an indicator, not an exact check. @reuben, what do you think?

I don’t think CTC blanks are relevant here.
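
For completeness, my reading of the check behind this error (an interpretation, not official TensorFlow documentation) is that the required length is the transcript length plus one extra frame for each pair of identical adjacent labels, so the repeats can be accounted for with a small helper:

```python
def min_required_frames(transcript):
    """Minimum number of CTC time steps for a transcript: one per
    label, plus one blank between each pair of identical adjacent
    labels (e.g. the 'll' in 'hello')."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats

print(min_required_frames('hello'))  # 6: 5 labels + 1 repeated pair
```

In practice the difference is small, which is why the plain transcript length works fine as an indicator.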

@reuben, @lissyx: I am using DeepSpeech v0.5.0, and I am also encountering this error. I have set ignore_longer_outputs_than_inputs=True:

total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

Now, when I run training, my training loss is always infinity. Kindly guide me on how to resolve it.

Epoch 0 | Training | Elapsed Time: 0:12:42 | Steps: 1142 | Loss: inf
Epoch 0 | Validation | Elapsed Time: 0:01:39 | Steps: 163 | Loss: 146.396210 | Dataset: …/german-speech-corpus/data_mailabs/dev.csv
I Saved new best validating model with loss 146.396210 to: /home/agarwal/.local/share/deepspeech/checkpoints/best_dev-1142
Epoch 1 | Training | Elapsed Time: 0:12:32 | Steps: 1142 | Loss: inf
Epoch 1 | Validation | Elapsed Time: 0:00:58 | Steps: 163 | Loss: 131.277453 | Dataset: …/german-speech-corpus/data_mailabs/dev.csv
WARNING:tensorflow:From /home/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I Saved new best validating model with loss 131.277453 to: /home/agarwal/.local/share/deepspeech/checkpoints/best_dev-2284
Epoch 2 | Training | Elapsed Time: 0:12:33 | Steps: 1142 | Loss: inf
Epoch 2 | Validation | Elapsed Time: 0:00:58 | Steps: 163 | Loss: 125.264005 | Dataset: …/german-speech-corpus/data_mailabs/dev.csv
I Saved new best validating model with loss 125.264005 to: /home/agarwal/.local/share/deepspeech/checkpoints/best_dev-3426
Epoch 3 | Training | Elapsed Time: 0:12:34 | Steps: 1142 | Loss: inf
Epoch 3 | Validation | Elapsed Time: 0:00:58 | Steps: 163 | Loss: 128.504051 | Dataset: …/german-speech-corpus/data_mailabs/dev.csv
Epoch 4 | Training | Elapsed Time: 0:08:50 | Steps: 918 | Loss: inf
(env) agarwal@wika:~/DeepSpeech$

@lissyx, could you please help with the above issue? Even after setting the flag, it didn’t work.

The training loss is inf while the validation loss is decreasing. I am using the German M-AILABS dataset.
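
For anyone landing here: one way to apply the min/max length filtering suggested earlier is to drop the offending rows from the training CSV up front instead of relying on the flag. A minimal sketch, assuming a DeepSpeech-style CSV with wav_filename and transcript columns and the 20 ms window step mentioned above (file names are assumptions):

```python
import csv
import wave

WINDOW_STEP_MS = 20

def num_windows(wav_path):
    # approximate number of feature windows in a WAV file
    with wave.open(wav_path, 'rb') as w:
        return int(1000.0 * w.getnframes() / w.getframerate() / WINDOW_STEP_MS)

def min_required_frames(transcript):
    # transcript length plus one frame per identical adjacent pair
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats

def filter_csv(in_path, out_path):
    with open(in_path) as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = dropped = 0
        for row in reader:
            if num_windows(row['wav_filename']) >= min_required_frames(row['transcript']):
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    print('kept %d rows, dropped %d offending rows' % (kept, dropped))

filter_csv('train.csv', 'train_filtered.csv')  # hypothetical file names
```

This keeps the bad samples out of the loss calculation entirely, so they can no longer poison the reported training loss.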