DeepSpeech model training

Hi @lissyx,

I am working on training a new DeepSpeech model for the German language.

I have downloaded the data sets from the official site and followed the steps mentioned at https://www.npmjs.com/package/deepspeech to convert the mp3 files into a wav format that is compatible with DeepSpeech training.
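
For reference, the conversion I am doing boils down to something like this (using sox as one option; the file names here are just placeholders):

sox input.mp3 -r 16000 -c 1 -b 16 output.wav    # 16 kHz, 16-bit, mono, as DeepSpeech expects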

I am executing the following command to start training:

python3 DeepSpeech.py --checkpoint_dir /root/.local/share/deepspeech/checkpoints/test_training --epochs 3 --nouse_seq_length --export_tflite --export_dir ./test/export/destination --train_files ./test/train.csv --dev_files ./test/dev.csv --test_files ./test/test.csv --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary

What I noticed is that the lm.binary file is not getting updated during training. I think that is why I am getting the error below:

Instructions for updating:
Use tf.cast instead.
Epoch 0 | Training | Elapsed Time: 0:03:38 | Steps: 3 | Loss: 702.842646
Epoch 0 | Validation | Elapsed Time: 0:00:10 | Steps: 3 | Loss: 538.984263 | Dataset: ./test/dev.csv
Epoch 1 | Training | Elapsed Time: 0:03:59 | Steps: 3 | Loss: 385.228027
Epoch 1 | Validation | Elapsed Time: 0:00:11 | Steps: 3 | Loss: 232.611954 | Dataset: ./test/dev.csv
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 2 | Training | Elapsed Time: 0:04:13 | Steps: 3 | Loss: 255.834290
Epoch 2 | Validation | Elapsed Time: 0:00:13 | Steps: 3 | Loss: 215.997345 | Dataset: ./test/dev.csv
Loading the LM will be faster if you build a binary file.
Reading data/lm-test/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'util::EndOfFileException'
what(): End of file Byte: 0
Aborted (core dumped)

Please help me understand what I am doing wrong. Thanks!!

You have not properly set up git-lfs. Please read the documentation: https://github.com/mozilla/DeepSpeech/blob/master/README.md#training-your-own-model
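
A quick way to check whether git-lfs actually fetched the real files (this is the repo's data/lm/lm.binary; the same applies to whatever you copied into data/lm-test):

ls -lh data/lm/lm.binary
head -c 200 data/lm/lm.binary

If git-lfs did not run, the file is only a tiny pointer stub of ~130 bytes starting with "version https://git-lfs.github.com/spec/v1" instead of binary LM data.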

Hi @lissyx,

I am not following you properly, sorry about that.

I just executed the command below, hoping to install git-lfs:

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

Below is the output:

Detected operating system as Ubuntu/xenial.
Checking for curl…
Detected curl…
Checking for gpg…
Detected gpg…
Running apt-get update… done.
Installing apt-transport-https… done.
Installing /etc/apt/sources.list.d/github_git-lfs.list…done.
Importing packagecloud gpg key… done.
Running apt-get update… done.

The repository is setup! You can now install packages.

Is this right? Or do I need to do something else?

Please help!!

Make sure you re-clone after that. Or you have to manually git lfs fetch or something to get the files.
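
If you do not want to re-clone, something along these lines from inside your checkout should work (note the script above only registered the apt repository, it did not install the package; exact behaviour can vary with the git-lfs version):

sudo apt-get install git-lfs
git lfs install
cd DeepSpeech    # adjust to wherever you cloned the repo
git lfs fetch
git lfs checkout    # replaces pointer stubs with the real files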

Hi @lissyx,

Thanks for your quick responses. I just completed the flow with one training file without any errors; I hope I will not get any when I train with the bulk data set :slight_smile:

Thanks a lot again!!

Hi @lissyx,

I am trying to use the newly trained model from NodeJS and I am getting the errors below:

Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state
Segmentation fault (core dumped)

Here is my configuration:

deepspeech --version
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78

Is it related to https://github.com/mozilla/DeepSpeech/issues/2206?

If yes, is it mandatory to update to DeepSpeech version v0.6.0-alpha.1?

You need to use things in sync: either all v0.5 or all v0.6.
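
To see which versions you actually have installed, something like this (package names as published on PyPI):

pip3 show deepspeech ds-ctcdecoder
deepspeech --version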

Hi @lissyx,

I am performing the instructions below, using DeepSpeech version v0.5.1.

1. Set up git-lfs.
2. Clone the DeepSpeech library: git clone --branch v0.5.1 https://github.com/mozilla/DeepSpeech.git DeepSpeech-lib
3. Install the dependencies:
   pip3 install -r requirements.txt
4. Install ds_ctcdecoder:
   pip3 install $(python3 util/taskcluster.py --decoder)
   (this installed ds-ctcdecoder==0.5.1)
5. Download the data sets from the official site.
6. Convert the data to a format that the DeepSpeech engine can understand:
   bin/import_cv2.py …/data-sets/german/clips
7. Train using the command below:
   python3 DeepSpeech.py --epochs 10 --checkpoint_dir /root/.local/share/deepspeech/checkpoints --nouse_seq_length --export_dir ./test/export/destination --train_files ./test/train.csv --dev_files ./test/dev.csv --test_files ./test/test.csv
   The above command writes output_graph.pb to the mentioned export dir, i.e. ./test/export/destination.
8. Test with the newly exported model:
   python3 ./native_client/python/client.py --model ./test/export/destination/output_graph.pb --alphabet ./data/alphabet.txt --lm ./data/lm/lm.binary --trie ./data/lm/trie --audio …/Data-sets/german/clips/common_voice_de_17300571.wav

I am getting the error below after step 8, i.e. when trying to use the newly trained model:

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

and then:

I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA

Is this related to my system configuration? Please provide your inputs.

Thanks!!

This is not an error and is not related to your models. Those are warnings, you can ignore them.

Hi @lissyx,

I made a copy-paste error, sorry for that. Below is the actual error message I am getting:

Error running session: Invalid argument: Tensor input_lengths:0, specified in either feed_devices or fetch_devices was not found in the Graph

While looking around on the net, I found the explanation below on one site:

Although the model has a Session and Graph, in some tensorflow methods, the default Session and Graph are used. To fix this I had to explicitly say that I wanted to use both my Session and my Graph as the default:

But I am not following this properly. Please let me know your inputs.

Thanks!!

That feels strange, but you are running client.py directly and you don't share the start of the output, so we cannot check which libdeepspeech.so is actually being used.

Please test properly, as documented: set up a virtualenv, install with pip install deepspeech==0.5.1, and run inference with the deepspeech binary rather than calling client.py directly.
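
Concretely, something like this (the venv path and audio file are placeholders; the model paths are taken from your steps above):

virtualenv -p python3 $HOME/tmp/deepspeech-venv
source $HOME/tmp/deepspeech-venv/bin/activate
pip install deepspeech==0.5.1
deepspeech --model ./test/export/destination/output_graph.pb --alphabet ./data/alphabet.txt --lm ./data/lm/lm.binary --trie ./data/lm/trie --audio test.wav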

Hi @lissyx,

I tried running it with the deepspeech inference client as well and got the same error.

It works fine with the pre-trained model. I will try creating a new virtual environment.

Thanks!!

Hi @lissyx,

I tried creating a new virtual environment and am still facing the same error.

Could it be because I have trained the model with a very small data set (2-3 files of 10 sec)? Currently I am just trying to do a complete POC, which is why I have not trained with a large data set. Please let me know your inputs.

Thanks!!

No, that’s something else.

Like …

And yes @laxmikant04.yadav, you shared that earlier, but since you kept sharing without proper code formatting, your python command line was unreadable to me and thus I missed that information.

Thanks @lissyx ,

I went through your reply on the post:
[FIXED] Error with master/alpha8 (unknown op: UnwrapDatasetVariant & WrapDatasetVariant)

So currently I am training without the --nouse_seq_length flag.

Those were simple steps I was noting down for myself in a text file. I will keep proper formatting in mind in my next comments.

Thanks a lot!!

You don’t need to retrain, just re-export without that flag.
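If it helps, re-exporting from the existing checkpoints should look something like the following; my understanding is that when no --train_files/--test_files are passed, DeepSpeech.py skips straight to the export step (double-check against your version):

python3 DeepSpeech.py --checkpoint_dir /root/.local/share/deepspeech/checkpoints --export_dir ./test/export/destination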

Thanks @lissyx .

It worked fine after exporting without the --nouse_seq_length flag.

Thanks!!!

Hi @lissyx,

I am working on speech recognition with a microphone, and I started with an example from the DeepSpeech GitHub repo.

I could see it is trying to recognise the speech, but the accuracy is not good for me.
I am working on Ubuntu 16.04 on a desktop.

Currently it is only able to recognise one word, and only when spoken very loudly and clearly; it fails otherwise.

Could you please suggest what else I should try, or where I can look, to increase its accuracy?

Our expectation is that it should be able to recognise simple sentences like "Welcome to speech recognition". This works perfectly when I try with clean audio files.

Thanks!!!

Looks like you've got some hint yourself. Though, you don't document whether those clean audio files were produced by you or come from some other origin.

Also,

It looks like we have not updated that to 0.5.1; maybe it is worth testing whether it improves, since that model was trained to be more robust to some noise.

Make sure your system is actually able to capture mono 16 kHz audio; resampling might get in the way.
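
One way to check what your setup actually captures (arecord ships with ALSA on Ubuntu; soxi comes with the sox package; device defaults may differ on your machine):

arecord -f S16_LE -r 16000 -c 1 -d 5 test.wav    # record 5 seconds at 16 kHz mono
soxi test.wav    # verify the sample rate and channel count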

It could also just be a side effect of your mic capturing poor-quality sound. Beyond improving the model, there is hardly anything we can easily improve.