TUTORIAL : How I trained a specific French model to control my robot

Thanks for sharing such a wonderful article, but could you please share a snapshot of your CSV? I am confused about whether we need to give the full path of the wav files or only their names.

@gr8nishan,
Thanks for the compliments.

Here is a sample of a typical DeepSpeech CSV file:

wav_filename,wav_filesize,transcript
/home/nvidia/DeepSpeech/data/alfred/dev/record.1.wav,87404,qui es-tu et qui est-il
/home/nvidia/DeepSpeech/data/alfred/dev/record.2.wav,101804,quel est ton nom ou comment tu t'appelles
/home/nvidia/DeepSpeech/data/alfred/dev/record.3.wav,65324,est-ce que tu vas bien 

You must keep the first line as is (it names the CSV columns).
Each following line gives 3 values, separated by commas:

  • the location of the wav file (I use the complete path; perhaps a relative path could work?)
  • its size in bytes (you can get it with os.path.getsize("the wav file"))
  • the transcript (in the language spoken in the wav)

Take a look at …DeepSpeech/bin/import_ldc93s1.py, L23 for CSV creation !!

About the transcript: take care to use only characters present in alphabet.txt, otherwise you'll encounter errors when training.
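The alphabet check can be automated before training. Here is a minimal sketch in Python 3, not part of DeepSpeech itself: `load_alphabet` and `find_bad_transcripts` are hypothetical helper names, and I assume alphabet.txt holds one character per line with `#` lines as comments.

```python
import csv


def load_alphabet(path):
    """Return the set of characters listed in an alphabet file, one per line."""
    chars = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")  # keep a lone space: it is a valid entry
            # Assumption: '#' lines are comments and empty lines carry no character.
            if line.startswith("#") or line == "":
                continue
            chars.add(line)
    return chars


def find_bad_transcripts(csv_path, alphabet):
    """Yield (wav_filename, offending characters) for rows outside the alphabet."""
    with open(csv_path, encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            bad = sorted(set(row["transcript"]) - alphabet)
            if bad:
                yield row["wav_filename"], bad
```

Looping over `find_bad_transcripts("train.csv", load_alphabet("alphabet.txt"))` (file names are examples) lists every row that would need fixing.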

Hope it will help you.
Vincent


@elpimous_robot
But I have more than 16000 wav files. How can I write them all to the CSV file?
Can we follow the same DeepSpeech/bin/import_ldc93s1.py to write the CSV file? Is that right?

Thanks for the help. When I tried a relative path it did not work for me, but giving the full absolute path worked.

@gr8nishan, thanks for the info!
@phanthanhlong7695, try this:

Save the script below in a Python file.
Run it with Python 2 and answer the prompts; you'll get a finished CSV file.
With Python 3, you'll have some minor changes to make.

When asked for the prefix, enter only the part of the wav filename that comes before the number.
Ex: audio223 -> audio ; audio.223 -> audio.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os
import fnmatch

print('\n\n°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°  ')
print('                         CSV creator :                           ')  
print('                         -------------                           ')                
print('      -  adding CSV columns,                                            ')
print('      -  files location, bytes size, and transcription.           ')
print('              Vincent FOUCAULT,     Septembre 2017            ')
print('°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°\n\n')

def process():
    # Python 2 script: raw_input() becomes input() under Python 3.
    directory = raw_input('Paste here the location of your wavs:\n>> ')
    directory = directory.replace('file://','')
    textfile = raw_input('Paste here the location of your transcript text:\n>> ')
    textfile = textfile.replace('file://','')
    csv_file = raw_input('Paste here the complete CSV file link:\n>> ')
    csv_file = csv_file.replace('file://','')
    wav_prefix = raw_input('Enter the prefix of wav file (ex : if record.223.wav --> enter "record.") :\n>> ')
    # os.path.join copes with a directory given with or without a trailing slash
    wavs = os.path.join(directory, wav_prefix)

    print('\n******************************************************************************************')
    print('your wav dir is : '+directory)
    print('wave prefix name is : '+wav_prefix)
    print('transcript is here : '+textfile)
    print('you want to save CSV here : '+csv_file)
    print('******************************************************************************************')

    # One transcript per line: line i belongs to <prefix><i+1>.wav
    with open(textfile, 'rb') as sentenceTextFile:
        sentences = sentenceTextFile.readlines()

    content = len(fnmatch.filter(os.listdir(directory), '*.wav'))
    print('\nNumber of wav found : '+str(content)+'\n')
    with open(csv_file, 'wb') as transcriptions:
        transcriptions.write('wav_filename,wav_filesize,transcript\n')
        for i in range(content):
            wavPath = wavs+str(i+1)+'.wav'
            wavSize = os.path.getsize(wavPath)
            # Strip the line ending so every row, even the last, ends with one newline
            transcript = sentences[i].rstrip('\r\n')
            transcriptions.write(wavPath+','+str(wavSize)+','+transcript+'\n')

if __name__ == "__main__":
    try:
        process()
        print('--->  CSV passed !')
        print('\n\n --->  Bye !!\n\n')
    except (IOError, OSError, IndexError) as err:
        # Missing file, unreadable path, or fewer transcript lines than wavs
        print('An error occurred !! Check your links.')
        print(err)
        print('GOOD LUCK !!')

Here is the terminal result :


your wav dir is : /media/nvidia/neo_backup/DeepSpeech/data/alfred/test2/
wave prefix name is : record.
transcript is here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test.txt
you want to save CSV here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test_final.csv


Number of wav found : 71

--->  CSV passed !

 --->  Bye !!
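For Python 3, the same idea could be sketched as below. This is only an illustration, not a tested replacement for the script above: `write_deepspeech_csv` is a hypothetical name, it loops over the transcript lines instead of counting wavs, it skips blank lines, and it lets the csv module quote any transcript that contains a comma.

```python
import csv
import os


def write_deepspeech_csv(wav_dir, wav_prefix, transcript_path, csv_path):
    """Write a wav_filename,wav_filesize,transcript CSV; return the row count."""
    with open(transcript_path, encoding="utf-8") as handle:
        sentences = [line.strip() for line in handle if line.strip()]

    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for index, sentence in enumerate(sentences, start=1):
            # Same naming scheme as the script above: <prefix><number>.wav
            wav_path = os.path.abspath(
                os.path.join(wav_dir, "{}{}.wav".format(wav_prefix, index)))
            writer.writerow([wav_path, os.path.getsize(wav_path), sentence])
    return len(sentences)
```

It writes absolute paths, since a relative path was reported above as not working.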

Hi Mark,

I ran into the same problem as this. Were you able to find a solution to this??

Prafful’s MacBook Pro:~ naveen$ /Users/naveen/Downloads/kenlm/build/bin/build_binary -T -s /Users/naveen/Downloads/kenlm/build/words.arpa lm.binary
Reading /Users/naveen/Downloads/kenlm/build/words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


/Users/naveen/Downloads/kenlm/lm/vocab.cc:305 in void lm::ngram::MissingSentenceMarker(const lm::ngram::Config &, const char *) threw SpecialWordMissingException.
The ARPA file is missing and the model is configured to reject these models. Run build_binary -s to disable this check. Byte: 191298
ERROR

How did you build your ARPA file?
/bin/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3

Hi!

I have only a vague understanding of what caused that error in my case. I think it was related to wrong characters or a wrong encoding. I fixed the problem by filtering out of the vocabulary all characters that are not present in my alphabet.

In Python, something like this:
PERMITTED_CHARS = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
new_data = "".join(c for c in data if c in PERMITTED_CHARS)

I am trying this process on macOS. Everything is done except the trie file. When I try to generate the trie file with the details provided, I get this error:

“cannot execute binary file”

When I searched for this error, I found that it suggests the file is a Linux binary. Is that so?

Can anyone help me out?

By the way, this is what I am running:

/Users/naveen/generate_trie / /Users/naveen/Downloads/DeepSpeech/alphabet.txt / /Users/naveen/Downloads/DeepSpeech/lm.binary / /Users/naveen/Downloads/DeepSpeech/vocabulary.txt / /Users/naveen/Downloads/DeepSpeech/trie

@elpimous_robot

Yes, just like that. It finally got resolved when I ran build_binary with -s to disable this check, as the error message suggested.

Hey, thank you for the tutorial, it's really helpful.
I have been trying to train a French model using this data: https://datashare.is.ed.ac.uk/handle/10283/2353
I divided the data into 6800 files for training, 1950 for dev and 976 for test.
I followed all your steps, but the loss is really high and doesn't decrease much: it doesn't go below 160, and if I enable early stopping it stops at 46 epochs.
Any thoughts?

I think the problem was the sample rate of the files. They were at 41000 Hz; I converted them to 16000 Hz and it works better now.
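A quick way to spot files with the wrong sample rate is to read the wav header. Below is a minimal sketch using only the Python 3 standard library; `wav_format` and `sox_resample_command` are hypothetical helper names, and the mono / 16-bit target is my assumption (only the 16000 Hz rate comes from the post above). The second function only builds a SoX command line, it does not run it.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) of a PCM wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def sox_resample_command(src, dst, rate=16000):
    """Build (not run) a SoX command converting src to mono 16-bit PCM at `rate` Hz."""
    # Assumption: SoX is installed; -r sets the rate, -c the channels, -b the bit depth.
    return ["sox", src, "-r", str(rate), "-c", "1", "-b", "16", dst]
```

Any file where `wav_format(path)` is not `(16000, 1, 16)` is a candidate for conversion, for example by passing the command list to `subprocess.run`.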

Very good.
And the wav must be correctly sampled.
Ex: open a test wav in Audacity; it should reach ±0.5 amplitude.

The closer the peaks are to that maximum (±0.5), the better for training.
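The level check can also be scripted instead of opening each file in Audacity. Here is a rough sketch in Python 3, assuming 16-bit PCM wavs and the same scale as Audacity (full scale is ±1.0); `peak_amplitude` is a hypothetical helper name.

```python
import array
import sys
import wave


def peak_amplitude(path):
    """Return the peak level of a 16-bit PCM wav on a 0.0 to 1.0 scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian
    if not samples:
        return 0.0
    # For stereo files this is the peak over all channels together.
    return max(abs(s) for s in samples) / 32768.0
```

A file whose peak is far below 0.5 was probably recorded too quietly.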

PS: what is your total wav duration for French?

It's about ten hours. I'm facing another problem: the ten hours are all from the same female voice. When I tried to use recordings from a different, male speaker, it didn't work. Is the model sensitive to the voice itself?

No. The computer doesn't mind!
It is probably a wav format error, or some alphabet changes (or the CSV).

Maybe I wasn't so clear: I trained with a female voice only and tried to test with a male voice and a different tone, but it didn't give a good output (random text).

Ah, that's not the same thing!
That's normal.

The model only knows this woman's voice!

This is why we need as many different speakers as possible, so the model learns to understand an unknown one (the principle of this deep learning!).
Hope this helps.


Yes, thank you. I will try to get more data and different speakers. Thank you again.

Do it right…perhaps I’ll ask you to test your model !! LOL

I don't see the total audio duration on your link!

Do you know the total audio duration on the website, for French?

Hi there,
Just for testing, I have only one sentence in my vocabulary file (vocabulary.txt), and I use the kenlm tool to generate the ARPA file. But it's taking too long to generate the ARPA file. Is that usual?
Here's my command line, run from the kenlm/build directory:
(sr_env) jugs@jugs:~/PycharmProjects/DeepSpeech/native_client/kenlm/build$ bin/lmplz -o 5 ~/Desktop/jugs_lm/vocabulary.txt ~/Desktop/jugs_lm/out.arpa

and the running process shows,
=== 1/5 Counting and sorting n-grams ===
File /dev/pts/23 isn’t normal. Using slower read() instead of mmap(). No progress bar.