Fine-tuning a pre-trained checkpoint model

I’ve downloaded the pre-trained checkpoint model from the official website. Together with that, I want to add my own WAV files for training, so I’m continuing training from the checkpoint as described in https://github.com/mozilla/DeepSpeech#continuing-training-from-a-release-model . I’m using the following command for fine-tuning/transfer learning:
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir …/…/…/deepspeech-0.4.1-checkpoint/ --epochs 4 --train_files trainFiles/trainthousandrows.csv --dev_files trainFiles/valthousandrows.csv --test_files trainFiles/testthousandrows.csv --learning_rate 0.0001
But when I run it, it gives the following output:

/home/glmr/glmShare/atuldata/speechTrainingData/extractedData/DeepSpeech/trainFiles/valthousandrows.csv
Preprocessing ['trainFiles/trainthousandrows.csv']
inside preprocess
trainFiles/trainthousandrows.csv
Preprocessing done
Preprocessing ['trainFiles/valthousandrows.csv']
inside preprocess
trainFiles/valthousandrows.csv
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
Preprocessing ['trainFiles/testthousandrows.csv']
inside preprocess
trainFiles/testthousandrows.csv
Preprocessing done
Computing acoustic model predictions…
100% (1000 of 1000) |############################################################################################################| Elapsed Time: 0:30:26 Time: 0:30:26
Decoding predictions…
100% (1000 of 1000) |############################################################################################################| Elapsed Time: 0:03:05 Time: 0:03:05
Test - WER: 0.339791, CER: 8.394000, loss: 69.201622

WER: 2.000000, CER: 5.000000, loss: 32.247986

  • src: “bread”
  • res: “true brad”

WER: 1.200000, CER: 17.000000, loss: 70.571228

  • src: “volunteer please visit librivox org”
  • res: “or do volunteer please visit ly provand dog or”

WER: 1.142857, CER: 23.000000, loss: 120.662392

  • src: “recording by winfred henson s wwf by”
  • res: “recording by winfried henson double u u u d that life”

WER: 1.000000, CER: 2.000000, loss: 2.363688

  • src: “man”
  • res: “a man”

WER: 1.000000, CER: 2.000000, loss: 4.948270

  • src: “sins”
  • res: “sen”

WER: 1.000000, CER: 3.000000, loss: 5.357125

  • src: “died”
  • res: “he died”

WER: 1.000000, CER: 4.000000, loss: 8.052462

  • src: “them”
  • res: “”

WER: 1.000000, CER: 3.000000, loss: 8.746547

  • src: “eye”
  • res: “a”

WER: 1.000000, CER: 4.000000, loss: 16.740582

  • src: “said that”
  • res: "had the "

WER: 1.000000, CER: 6.000000, loss: 18.676899

  • src: “galileans received him”
  • res: “the gale leans received him”
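(As an aside, the WER values above 1.0 in this report are possible because WER is the word-level edit distance divided by the number of reference words, so insertions in the hypothesis can push it past 1. A minimal sketch of that computation, assuming plain whitespace tokenization; the `wer` helper below is hypothetical, not DeepSpeech’s own code:)

```python
def wer(ref_words, hyp_words):
    # Word-level Levenshtein distance divided by reference length.
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref_words)

# "bread" -> "true brad": one substitution plus one insertion over a
# single reference word, which is how the report arrives at WER 2.0.
```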

Why isn’t training starting from the checkpoint model?

Read the README section for fine tuning for the version of the code you’re running. In particular, the --epoch flag should be a negative value when starting from an existing checkpoint.


Thanks for your answer @reuben. I understand that we need to provide a negative value for the epoch flag when resuming training from an existing checkpoint. But what I want to do is take the pre-trained model, add my own dataset (different from what the checkpoint model was trained on), and then resume training. Do we need to provide negative epochs in that case too?

The documentation for the --epoch flag should clarify what negative values represent. Does it answer your question?

Thanks @reuben

Hello @atuljha18

  1. DeepSpeech’s requirements for the data are that the transcripts match the [a-z ]+ regex and that the audio is stored as WAV (PCM) files.
  2. All you have to do is generate CSV files for your splits with three columns, wav_filename , wav_filesize and transcript , that specify the path to the WAV file, its size in bytes, and the corresponding transcript text, for each of your train, validation and test splits.

Whichever way is easiest for you, convert your audio data into this format.
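For illustration, here is a minimal sketch of generating such a CSV. The `build_manifest` helper and its list-of-(wav_path, transcript) input format are my own assumptions, not part of DeepSpeech:

```python
import csv
import os
import re

# Transcripts must match [a-z ]+ (lowercase letters and spaces only).
VALID_TRANSCRIPT = re.compile(r"[a-z ]+")

def build_manifest(pairs, out_csv):
    """Write a DeepSpeech-style CSV from (wav_path, transcript) pairs."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in pairs:
            transcript = transcript.strip().lower()
            if not VALID_TRANSCRIPT.fullmatch(transcript):
                raise ValueError("transcript must match [a-z ]+: %r" % transcript)
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
```

You would call this once per split (train, validation, test) to produce the three CSV files the training flags expect.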

I am using -3 as the number of epochs but getting the same output. How could that be?

Are you using --epoch or --epochs?

I have used --epochs, and I am using the 0.4.1 branch.

> python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir fine_tuning_checkpoints/ --epochs -3 --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv --learning_rate 0.00001 --export_dir model/

Please change --epochs to --epoch

Thanks, that was the problem.

@DarisettySuneel I am using --epochs as documented in util/flags.py. Is that correct, or do I need to change it to --epoch? I am using DeepSpeech 0.5.1.

There is no --epoch flag in v0.5.1, as you can see by searching in util/flags.py: https://github.com/mozilla/DeepSpeech/blob/v0.5.1/util/flags.py
