In the release notes of DeepSpeech 0.5.1, it is mentioned that the model was trained for 467356 steps over 75 epochs. Going by the training batch size of 24:
steps_per_epoch = 467356 / 75 ≈ 6231
samples_per_epoch = 6231 * 24 = 149544
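As a sanity check on the arithmetic above (a minimal sketch, using only the figures quoted from the release notes):

```python
# Figures quoted from the DeepSpeech 0.5.1 release notes / question above
total_steps = 467356   # total training steps
epochs = 75            # number of epochs
batch_size = 24        # training batch size

# One epoch is one pass over the dataset, so steps per epoch times
# batch size gives the number of unique training samples per epoch.
steps_per_epoch = total_steps // epochs           # integer division -> 6231
samples_per_epoch = steps_per_epoch * batch_size  # -> 149544

print(steps_per_epoch, samples_per_epoch)
```

Note that 149544 is the implied number of samples seen per epoch (i.e. the dataset size under this assumption), not the total number of samples processed across all 75 epochs.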
Did the model achieve a word error rate of 8.22% by training on only 149544 audio samples? And what was the maximum audio length (in seconds) considered while processing the dataset?