Hi, I am getting an error when I try to import my dataset with the import_cv2.py Python script. My TSV files are in Bangla, and the import fails with the error below. Do you have any idea how I can fix this?
(deepspeech-venv) learning@machine:~/DeepSpeech$ python bin/import_cv2.py --filter_alphabet path/to/some/alphabet.txt path/to/extracted/language/archive
Loading TSV file: /home/learning/DeepSpeech/path/to/extracted/language/archive/train.tsv
Saving new DeepSpeech-formatted CSV file to: path/to/extracted/language/archive/clips/train.csv
Traceback (most recent call last):
  File "bin/import_cv2.py", line 166, in <module>
    _preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 43, in _preprocess_data
    _maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
  File "bin/import_cv2.py", line 56, in _maybe_convert_set
    for row in reader:
  File "/home/learning/anaconda3/lib/python3.7/csv.py", line 111, in __next__
    self.fieldnames
  File "/home/learning/anaconda3/lib/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "/home/learning/tmp/deepspeech-venv/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte
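The traceback shows the file being decoded as UTF-8 and failing on the very first row, which suggests train.tsv may have been saved in a different encoding. Below is a minimal sketch of how one might check that and rewrite the file as UTF-8 before running the importer. It is not part of DeepSpeech: the function names `first_working_encoding` and `reencode_to_utf8` are hypothetical, and the candidate encoding list is only a guess that may need adjusting for the actual file.

```python
from pathlib import Path

# Candidate encodings to try, in order. This list is an assumption, not
# something taken from DeepSpeech; extend it if neither matches your file.
# "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM if present.
CANDIDATES = ("utf-8-sig", "utf-16")


def first_working_encoding(data, candidates=CANDIDATES):
    """Return the first candidate that decodes `data` without error, else None."""
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


def reencode_to_utf8(src, dst, candidates=CANDIDATES):
    """Write a UTF-8 copy of `src` to `dst` and return the source encoding used."""
    data = Path(src).read_bytes()
    enc = first_working_encoding(data, candidates)
    if enc is None:
        raise ValueError(f"none of {candidates} can decode {src}")
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as out:
        out.write(data.decode(enc))
    return enc
```

A file that decodes without error is not guaranteed to be decoded correctly, so it is worth opening the rewritten file and checking that the Bangla sentences look right before importing it.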