Fine-tuning DeepSpeech v0.5.1 pretrained models with TIMIT database

I am fine-tuning the DeepSpeech v0.5.1 pre-trained model on the TIMIT database using the command below.

python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir /home/iiit_admin/DeepSpeech-0.5.1/DeepSpeech-0.5.1/checkpoint/deepspeech-0.5.1-checkpoint/ --epochs 3 --train_files /home/iiit_admin/Desktop/TIMIT/timit_train.csv --test_files /home/iiit_admin/Desktop/TIMIT/timit_test.csv --learning_rate 0.0001

Epoch 0 alone is taking more than 20 hours. Is this the proper way to do it, and approximately how long should fine-tuning on the TIMIT dataset take?

How big is this dataset? What’s your hardware?

Also, you’re not specifying a batch size, which means it’s using the default of 1. You should tune it to a larger value that still fits within your GPU memory.
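For example, something like this (the `--train_batch_size` / `--test_batch_size` flags are from DeepSpeech v0.5.1; the values 32 and 16 are illustrative starting points, not tuned recommendations — raise or lower them based on your GPU memory):

```shell
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir /home/iiit_admin/DeepSpeech-0.5.1/DeepSpeech-0.5.1/checkpoint/deepspeech-0.5.1-checkpoint/ \
  --epochs 3 \
  --train_files /home/iiit_admin/Desktop/TIMIT/timit_train.csv \
  --test_files /home/iiit_admin/Desktop/TIMIT/timit_test.csv \
  --learning_rate 0.0001 \
  --train_batch_size 32 \
  --test_batch_size 16
```

If you hit an out-of-memory error, halve the batch sizes and try again.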

@lissyx @reuben Thanks for your reply.
The TIMIT database is around 4.5 hours of audio. My hardware is two GeForce GTX 1080 Ti GPUs with 11 GB of memory each.
What batch size should I specify?

As documented, the right batch size depends on your hardware and dataset, so we can’t help more than that.

So with 4.5 h of audio, I’m not sure how much fine-tuning you can get. That being said, with 2× 1080 Ti and an appropriately set batch size, I think it should not take more than 30 min. But again, that depends on your hardware, and you only shared your GPUs…
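To see why the default batch size of 1 makes epochs so slow, here is a rough back-of-the-envelope sketch. It assumes TIMIT’s standard 4620-utterance training split and that each training step processes one batch per GPU; check your actual CSV row count, since your split may differ:

```python
# Rough arithmetic: how batch size changes the number of optimizer steps
# per epoch. 4620 is TIMIT's standard training split size (an assumption
# here -- count the rows in timit_train.csv to be sure).
train_utterances = 4620

def steps_per_epoch(batch_size, n_gpus=2):
    # Each step consumes batch_size utterances on each of n_gpus GPUs;
    # ceiling division accounts for the final partial batch.
    return -(-train_utterances // (batch_size * n_gpus))

print(steps_per_epoch(1))   # default batch size -> 2310 steps per epoch
print(steps_per_epoch(32))  # batch size 32 -> 73 steps per epoch
```

Going from 2310 steps to 73 steps per epoch is where most of the speedup comes from, which is why a properly sized batch brings a 20-hour epoch down toward the half-hour range.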