bin/import_cv2.py imports 0 samples of CommonVoice ga-IE

20164356 · November 1, 2019, 2:09pm

Hi. I followed the instructions in training.rst with the CommonVoice dataset named ga-IE (Irish). When I run:
(deepspeech-train-venv) root@tuan-X450LD:~/DeepSpeech-master# bin/import_cv2.py --filter_alphabet data/alphabet.txt ~/ga-IE/
The result is:
Loading TSV file: /home/tuan/ga-IE/train.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/train.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/train.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 522 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/tuan/ga-IE/test.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/test.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/test.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 482 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/tuan/ga-IE/dev.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/dev.csv
Importing mp3 files…
Progress |#################################################### | 98% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/dev.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 462 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/tuan/ga-IE/validated.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/validated.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/validated.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 2529 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/tuan/ga-IE/other.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/other.csv
Importing mp3 files…
Progress |#################################################### | 98% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/other.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 1033 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
So I only get empty *.csv files. What should I do to fix it?
A similar problem is reported here, but it does not seem to help in my case: Import_cv2 : all files failed to convert

reuben · November 1, 2019, 4:04pm

It’s saying they failed the conversion, but the importer swallows all errors silently. Are you sure you have SoX installed? You can try taking out the try-except block in _maybe_convert_wav to see the errors: just call transformer.build without the exception handling.
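
If editing the importer is inconvenient, the same check can be done outside it. The following is only a rough sketch, not the importer's own code: it assumes the `sox` command-line binary is what performs the conversion, and the helper names `check_sox` and `convert_verbose`, the 16000 Hz sample rate, and the example file names are all made up for illustration. It reports whether SoX is on the PATH and returns SoX's error text instead of discarding it.

```python
import shutil
import subprocess


def check_sox(binary="sox"):
    """Return (path, version_text), or (None, None) if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Return whichever stream carries the version text.
    return path, (result.stdout or result.stderr).strip()


def convert_verbose(src, dst, rate=16000, binary="sox"):
    """Try one conversion and return (ok, error_text) so failures are visible.

    The 16000 Hz target rate is an assumption for this sketch.
    """
    path = shutil.which(binary)
    if path is None:
        return False, f"{binary} not found on PATH"
    # Equivalent to running: sox <src> -r <rate> <dst>
    result = subprocess.run(
        [path, src, "-r", str(rate), dst], capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    print("SoX:", check_sox())
    # Hypothetical file names; replace with one real clip from the dataset.
    print(convert_verbose("sample.mp3", "sample.wav"))
```

Running it against a single clip from the clips directory should print whatever SoX complains about (for example a missing format handler), which is the information the importer hides.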

20164356 · November 2, 2019, 3:03am

I have just checked as you suggested. When I removed SoX 14.4.2 and installed SoX 14.4.1, IT WORKED. Here is the result:
(deepspeech-train-venv) root@tuan-X450LD:~/DeepSpeech-master# bin/import_cv2.py --filter_alphabet data/alphabet.txt ~/ga-IE/
Loading TSV file: /home/tuan/ga-IE/train.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/train.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/train.csv
Progress |#####################################################| 100% completed
Imported 522 samples.
Skipped 460 samples that failed on transcript validation.
Final amount of imported audio: 0:25:33.
Loading TSV file: /home/tuan/ga-IE/test.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/test.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/test.csv
Progress |#####################################################| 100% completed
Imported 482 samples.
Skipped 426 samples that failed on transcript validation.
Final amount of imported audio: 0:30:19.
Loading TSV file: /home/tuan/ga-IE/dev.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/dev.csv
Importing mp3 files…
Progress |#################################################### | 98% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/dev.csv
Progress |#####################################################| 100% completed
Imported 462 samples.
Skipped 410 samples that failed on transcript validation.
Final amount of imported audio: 0:26:33.
Loading TSV file: /home/tuan/ga-IE/validated.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/validated.csv
Importing mp3 files…
Progress |#####################################################| 100% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/validated.csv
Progress |#####################################################| 100% completed
Imported 2529 samples.
Skipped 2215 samples that failed on transcript validation.
Final amount of imported audio: 2:17:12.
Loading TSV file: /home/tuan/ga-IE/other.tsv
Saving new DeepSpeech-formatted CSV file to: /home/tuan/ga-IE/clips/other.csv
Importing mp3 files…
Progress |#################################################### | 98% completed
Writing CSV file for DeepSpeech.py as: /home/tuan/ga-IE/clips/other.csv
Progress |#####################################################| 100% completed
Imported 1033 samples.
Skipped 909 samples that failed on transcript validation.
Final amount of imported audio: 1:03:55.
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
Progress |#####################################################| 100% completed
I also see the .wav and .csv files (with text) as I expected. Thanks for your advice.

lissyx · November 2, 2019, 11:05am

@20164356 Again, please use proper code formatting when sharing console output …