I recorded my own voice to test out DeepSpeech, and I want the model to recognise this recording when I play it back to the model.
These are the steps I followed:
1> I prepared train.csv, test.csv and dev.csv, all containing the following single entry:
/Users/kausthub.naarayan/speech_project/long_route.wav|1|the driver took a long route|
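For reference, this is roughly how I write that entry into the three CSV files (a throwaway script of my own, assuming Python; it just reproduces the pipe-delimited line above, and the file names match the ones I pass to DeepSpeech.py in step 2):

# My own throwaway helper, not part of DeepSpeech: writes the single
# pipe-delimited entry shown above into the train/dev/test CSV files.
import os

wav_path = "/Users/kausthub.naarayan/speech_project/long_route.wav"
transcript = "the driver took a long route"
entry = "{}|1|{}|\n".format(wav_path, transcript)

for name in ("train-2.csv", "dev-2.csv", "test-2.csv"):
    with open(os.path.join("../new_model-2", name), "w") as f:
        f.write(entry)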
2> I then ran this command to train the model on the recording:
python -u DeepSpeech.py \
  --train_files ../new_model-2/train-2.csv \
  --dev_files ../new_model-2/dev-2.csv \
  --test_files ../new_model-2/test-2.csv \
  --train_batch_size 80 \
  --dev_batch_size 80 \
  --test_batch_size 40 \
  --n_hidden 375 \
  --epoch 33 \
  --validation_step 1 \
  --early_stop True \
  --earlystop_nsteps 6 \
  --estop_mean_thresh 0.1 \
  --estop_std_thresh 0.1 \
  --dropout_rate 0.22 \
  --learning_rate 0.00095 \
  --report_count 100 \
  --use_seq_length False \
  --export_dir ../new_model-2/ \
  --decoder_library_path ../libctc_decoder_with_kenlm.so \
  --alphabet_config_path ../models/alphabet.txt \
  --lm_binary_path ../models/lm.binary \
  --lm_trie_path ../models/trie \
  "$@"
I am using the lm.binary, alphabet.txt and trie files from the pre-trained model that ships with DeepSpeech.
This outputs a new model.
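Before running inference I check that the export actually produced a graph in the export directory (a quick sketch of my own; output_graph.pb is simply the file name I use in step 3 below):

# My own quick check: confirm the exported graph is where step 3 expects it.
import os

exported_graph = "../new_model-2/output_graph.pb"
if os.path.isfile(exported_graph):
    print("found {} ({:.1f} MB)".format(exported_graph, os.path.getsize(exported_graph) / 1e6))
else:
    print("no exported graph at", exported_graph)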
3> I then run the audio file against this new model and the existing language model with the following command:
../DeepSpeech/deepspeech output_graph.pb ../models/alphabet.txt ../models/lm.binary ../models/trie ../long_route.wav
This gives an output that is not even close to the transcript.
This is the result I got:
the rotototogroe
The actual transcript is:
the driver took a long route
Can anyone please help me find out what mistake I am making?
These are the details of the audio file:
Input File : 'long_route.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:06.41 = 102516 samples ~ 480.544 CDDA sectors
File Size : 205k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
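To rule out a format problem, I also verify the recording from Python (a small sketch using only the standard wave module; 16 kHz, mono, 16-bit signed PCM is what the pre-trained DeepSpeech model expects):

# Sanity check with the standard library: confirm 16 kHz, mono, 16-bit PCM.
import wave

w = wave.open("/Users/kausthub.naarayan/speech_project/long_route.wav", "rb")
print("channels       :", w.getnchannels())   # expect 1
print("sample rate    :", w.getframerate())   # expect 16000
print("sample width   :", w.getsampwidth())   # expect 2 (16-bit)
print("duration (sec) :", w.getnframes() / float(w.getframerate()))
w.close()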
link to my audio file: https://vocaroo.com/i/s0qUMwH3qqUF
Thanks in advance.