Hi, I have a paid Azure VM deployed specifically for training Mozilla DeepSpeech models on the Common Voice dataset, with the aim of getting accurate speech recognition.
My VM specs:
6 vCPU cores
56 GB RAM
356 GB SSD
1 x NVIDIA Tesla K80 GPU
Cost: $1.23/hour
I am new to DeepSpeech, so I want to confirm my approach and understanding before I waste paid VM hours and end up paying for nothing.
My goal is to train on the Common Voice dataset with the best accuracy I can get.
I am aware of the guide for training on the Common Voice dataset.
This is the approach I am following:
cd DeepSpeech
pip3 install -r requirements.txt
pip3 install $(python3 util/taskcluster.py --decoder)
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.14.0'
- Downloaded the English Common Voice dataset and extracted it to a folder named en,
so the current directory now contains the folder en
- Ran the CommonVoice v2.0 importer
sudo apt-get install sox
sudo apt-get install -y libsox-dev
DeepSpeech/bin/import_cv2.py --filter DeepSpeech/data/alphabet.txt en
so the current directory now contains the folder en/clips
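Before spending GPU hours, a quick sanity check of the importer output may be worthwhile. The sketch below assumes the importer wrote CSV files with the columns wav_filename, wav_filesize and transcript, and that they sit in en/clips as described above; the helper name summarize_csv is hypothetical and not part of DeepSpeech.

```python
import csv
import os


def summarize_csv(path):
    """Return (row_count, total_wav_bytes, empty_transcripts) for one CSV.

    Assumes the columns wav_filename, wav_filesize and transcript.
    """
    rows = 0
    total_bytes = 0
    empty = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows += 1
            total_bytes += int(row["wav_filesize"])
            # An empty transcript usually means the clip is unusable for training.
            if not row["transcript"].strip():
                empty += 1
    return rows, total_bytes, empty


if __name__ == "__main__":
    # Paths follow the layout described in the post; adjust if yours differs.
    for split in ("train", "dev", "test"):
        path = os.path.join("en", "clips", split + ".csv")
        if os.path.exists(path):
            print(split, summarize_csv(path))
        else:
            print(split, "missing:", path)
```

If a split is missing or nearly empty, it is probably cheaper to fix the import first than to discover the problem after training has started.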
- Started training, with the model output directory set to models
and the checkpoint directory set to checkpoints,
so the current directory has the folders en, models and checkpoints
DeepSpeech/DeepSpeech.py --train_files en/clips/train.csv \
--dev_files en/clips/dev.csv --test_files en/clips/test.csv \
--automatic_mixed_precision=True --checkpoint_dir checkpoints \
--export_dir models
- A rookie question: will this process terminate by itself, or do I have to stop it myself after many epochs once the loss is low?
- I am confused about the next steps. Can the models written to --export_dir be used directly in code, or are there further steps to perform afterwards?
- (Optional question) Given the VM specs, approximately how long will training take to reach the highest accuracy? (I need this to predict the VM cost.)
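For the cost side, the arithmetic is just training hours multiplied by the hourly rate. Here is a minimal sketch, assuming the $1.23/hour rate from the specs above; the epoch count and hours per epoch in the usage are placeholders I made up, not measurements, and estimate_cost is a hypothetical helper.

```python
HOURLY_RATE_USD = 1.23  # rate from the VM specs in this post


def estimate_cost(epochs, hours_per_epoch, hourly_rate=HOURLY_RATE_USD):
    """Return (total_hours, total_cost_usd) for a planned training run."""
    total_hours = epochs * hours_per_epoch
    total_cost = round(total_hours * hourly_rate, 2)
    return total_hours, total_cost


if __name__ == "__main__":
    # Placeholder values: time one real epoch on the VM and substitute it here.
    hours, cost = estimate_cost(epochs=10, hours_per_epoch=5)
    print("hours:", hours, "cost (USD):", cost)
```

Timing the first epoch and plugging that number in should give a rough upper bound before committing to the full run.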
Please excuse the rookie questions; I am a newbie who has only recently started learning ML, TF, etc.