The error message I got at run time:
Alright, I will add more content to my CSV file.
#!/usr/bin/env bash
set -xe
if [ ! -f DeepSpeech.py ]; then
echo "Please make sure you run this from DeepSpeech's top level directory."
exit 1
fi;
python3 -u DeepSpeech.py \
--train_files /home/metlife-vad/DeepSpeech/minigir/train/miniger-train.csv \
--dev_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv \
--test_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv \
--train_batch_size 48 \
--dev_batch_size 40 \
--test_batch_size 40 \
--n_hidden 1024 \
--epochs 64 \
--early_stop True \
--es_steps 6 \
--es_mean_th 0.1 \
--es_std_th 0.1 \
--dropout_rate 0.30 \
--learning_rate 0.0005 \
--report_count 100 \
--export_dir /metlife-models/ \
--checkpoint_dir /home/metlife-vad/Deepspeech/metlife-models/check_point \
--alphabet_config_path /home/metlife-vad/metlife-models/alphabet.txt \
--lm_binary_path /home/metlife-vad/Deepspeech/metlife-models/lm.binary \
--lm_trie_path /home/metlife-vad/Deepspeech/metlife-models/trie \
"$@"
This error (TypeError: 'str' object cannot be interpreted as an integer) really points to some broken data.
Can you triple-check your path/file?
You would not be the first one mixing up one file with another, myself included.
To avoid any mistake, can you please generate your one-sample CSV under a different name and re-run your script with it, so we are sure this is the one being run?
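Both checks can be scripted. A minimal sketch using only the standard library: missing_paths lists dataset paths that do not exist, and make_one_sample_csv copies the header and first row of a CSV into a file with a new name. Both helper names and the output name one-sample.csv are hypothetical. Note that the training script above mixes /home/metlife-vad/DeepSpeech/ and /home/metlife-vad/Deepspeech/, which on a case-sensitive Linux filesystem are two different directories.

```python
import csv
import os

def missing_paths(paths):
    """Return the paths that do not exist on disk.

    On the usual case-sensitive Linux filesystems, .../DeepSpeech/... and
    .../Deepspeech/... are two different directories.
    """
    return [p for p in paths if not os.path.exists(p)]

def make_one_sample_csv(src, dst):
    """Write the header and the first data row of src into a new file dst."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        first_row = next(reader)  # raises StopIteration if src has no data rows
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerow(first_row)
    return dst

if __name__ == "__main__":
    # Dataset paths copied from the training script in this thread
    csv_paths = [
        "/home/metlife-vad/DeepSpeech/minigir/train/miniger-train.csv",
        "/home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv",
    ]
    for path in missing_paths(csv_paths):
        print("does not exist:", path)
    src = csv_paths[0]
    if os.path.exists(src):
        # "one-sample.csv" is a hypothetical name; any fresh name will do
        dst = os.path.join(os.path.dirname(src), "one-sample.csv")
        print("wrote", make_one_sample_csv(src, dst))
```

Pointing --train_files at the freshly written file then rules out accidentally training on an older copy.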
Can you also make sure to eradicate all forms of feature cache?
I didn't understand this part.
This part of the error comes from the transcript column of the CSV file… is that correct?
I can't help without the content of the CSV.
#!/usr/bin/env bash
set -xe
if [ ! -f DeepSpeech.py ]; then
echo "Please make sure you run this from DeepSpeech's top level directory."
exit 1
fi;
python3 -u DeepSpeech.py \
--train_files /home/metlife-vad/DeepSpeech/minigir/train/train.csv \
--dev_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv \
--test_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv \
--train_batch_size 48 \
--dev_batch_size 40 \
--test_batch_size 40 \
--n_hidden 1024 \
--epochs 64 \
--early_stop True \
--es_steps 6 \
--es_mean_th 0.1 \
--es_std_th 0.1 \
--dropout_rate 0.30 \
--learning_rate 0.0005 \
--report_count 100 \
--export_dir /metlife-models/ \
--checkpoint_dir /home/metlife-vad/Deepspeech/metlife-models/check_point \
--alphabet_config_path /home/metlife-vad/metlife-models/alphabet.txt \
--lm_binary_path /home/metlife-vad/Deepspeech/metlife-models/lm.binary \
--lm_trie_path /home/metlife-vad/Deepspeech/metlife-models/trie \
"$@"
For this, the error message is:
+ '[' '!' -f DeepSpeech.py ']'
+ python3 -u DeepSpeech.py --train_files /home/metlife-vad/DeepSpeech/minigir/train/train.csv --dev_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv --test_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv --train_batch_size 48 --dev_batch_size 40 --test_batch_size 40 --n_hidden 1024 --epochs 64 --early_stop True --es_steps 6 --es_mean_th 0.1 --es_std_th 0.1 --dropout_rate 0.30 --learning_rate 0.0005 --report_count 100 --export_dir /metlife-models/ --checkpoint_dir /home/metlife-vad/Deepspeech/metlife-models/check_point --alphabet_config_path /home/metlife-vad/metlife-models/alphabet.txt --lm_binary_path /home/metlife-vad/Deepspeech/metlife-models/lm.binary --lm_trie_path /home/metlife-vad/Deepspeech/metlife-models/trie
Traceback (most recent call last):
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4736, in get_value
return libindex.get_value_box(s, key)
File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/metlife-vad/DeepSpeech/util/text.py", line 85, in text_to_char_array
transcript = np.asarray(alphabet.encode(series['transcript']))
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/series.py", line 1071, in __getitem__
result = self.index.get_value(self, key)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4744, in get_value
raise e1
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4730, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'transcript'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 931, in <module>
absl.app.run(main)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 915, in main
train()
File "DeepSpeech.py", line 435, in train
train_phase=True)
File "/home/metlife-vad/DeepSpeech/util/feeding.py", line 101, in create_dataset
df['transcript'] = df.apply(text_to_char_array, alphabet=Config.alphabet, result_type='reduce', axis=1)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 6928, in apply
return op.get_result()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 112, in f
return func(x, *args, **kwds)
File "/home/metlife-vad/DeepSpeech/util/text.py", line 91, in text_to_char_array
raise ValueError('While processing: {}\n{}'.format(series['wav_filename'], e))
ValueError: ("While processing: /home/metlife-vad/DeepSpeech/minigir/wav/tmp.wav\n'transcript'", 'occurred at index 0')
@bharath.vadithya Sorry, but your shared CSV works for me:
$ python3
Python 3.7.5rc1 (default, Oct 8 2019, 16:47:45)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> pandas.read_csv('test.csv')
wav_filename wav_filesize transcript
0 /home/metlife-vad/DeepSpeech/minigir/wav/tmp.wav 2368044 very good morning this side jewel calling on b...
>>> a = pandas.read_csv('test.csv')
>>> a['transcript']
0 very good morning this side jewel calling on b...
Name: transcript, dtype: object
>>>
And FTR:
$ pip list|grep pandas
pandas 0.25.1
@bharath.vadithya Please check with this:
python3 -c "import pandas; print(pandas.read_csv('/home/metlife-vad/DeepSpeech/minigir/train/train.csv')['transcript']);"
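The first traceback ends in KeyError: 'transcript', raised by series['transcript'] in util/text.py, which means no column named exactly transcript was found. As a sketch using only the standard csv module (the helper name check_header is hypothetical; the expected column names are the ones shown in the working CSV above), the header can be checked for missing columns and near-miss spellings such as stray whitespace, a byte-order mark, or different capitalization:

```python
import csv

# Column names as shown in the working CSV in this thread
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]

def check_header(path):
    """Return a list of human-readable problems found in the CSV header row."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    problems = []
    for col in EXPECTED_COLUMNS:
        if col in header:
            continue
        # Near-miss: leading byte-order mark, surrounding spaces, or other case
        near = [h for h in header if h.lstrip("\ufeff").strip().lower() == col]
        if near:
            problems.append(f"column {col!r} is spelled {near[0]!r}")
        else:
            problems.append(f"column {col!r} is missing")
    return problems
```

An empty list means the header matches; anything else names the column that series['transcript'] would fail to find.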
Thanks a lot… it was a small mistake.
Can you help with this one too?
+ '[' '!' -f DeepSpeech.py ']'
+ python3 -u DeepSpeech.py --train_files /home/metlife-vad/DeepSpeech/minigir/train/train.csv --dev_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv --test_files /home/metlife-vad/Deepspeech/minigir/train/miniger-train.csv --train_batch_size 48 --dev_batch_size 40 --test_batch_size 40 --n_hidden 1024 --epochs 64 --early_stop True --es_steps 6 --es_mean_th 0.1 --es_std_th 0.1 --dropout_rate 0.30 --learning_rate 0.0005 --report_count 100 --export_dir /metlife-models/ --checkpoint_dir /home/metlife-vad/Deepspeech/metlife-models/check_point --alphabet_config_path /home/metlife-vad/metlife-models/alphabet.txt --lm_binary_path /home/metlife-vad/Deepspeech/metlife-models/lm.binary --lm_trie_path /home/metlife-vad/Deepspeech/metlife-models/trie
Traceback (most recent call last):
File "/home/metlife-vad/DeepSpeech/util/text.py", line 33, in _label_from_string
return self._str_to_label[string]
KeyError: 'ā'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/metlife-vad/DeepSpeech/util/text.py", line 85, in text_to_char_array
transcript = np.asarray(alphabet.encode(series['transcript']))
File "/home/metlife-vad/DeepSpeech/util/text.py", line 47, in encode
res.append(self._label_from_string(char))
File "/home/metlife-vad/DeepSpeech/util/text.py", line 39, in _label_from_string
).with_traceback(e.__traceback__)
File "/home/metlife-vad/DeepSpeech/util/text.py", line 33, in _label_from_string
return self._str_to_label[string]
KeyError: "ERROR: Your transcripts contain characters (e.g. 'ā') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 931, in <module>
absl.app.run(main)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 915, in main
train()
File "DeepSpeech.py", line 435, in train
train_phase=True)
File "/home/metlife-vad/DeepSpeech/util/feeding.py", line 101, in create_dataset
df['transcript'] = df.apply(text_to_char_array, alphabet=Config.alphabet, result_type='reduce', axis=1)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 6928, in apply
return op.get_result()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 112, in f
return func(x, *args, **kwds)
File "/home/metlife-vad/DeepSpeech/util/text.py", line 91, in text_to_char_array
raise ValueError('While processing: {}\n{}'.format(series['wav_filename'], e))
ValueError: ('While processing: /home/metlife-vad/DeepSpeech/minigir/wav/tmp.wav\n"ERROR: Your transcripts contain characters (e.g. \'ā\') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."', 'occurred at index 0')
Please check your alphabet file.
Hey @lissyx,
My alphabet.txt file:
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
# The last (non-comment) line needs to end with a newline.
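The header of this file spells out its format: one label per line, lines starting with # are comments, and \# escapes a literal #. A minimal parser that follows only those stated rules (a sketch, not DeepSpeech's own util/text.py code; the name load_alphabet is hypothetical) makes one easy-to-miss detail visible: the space label is a line containing nothing but a single space, so it is invisible when the file is pasted, and it turns into an empty label if an editor trims trailing whitespace.

```python
def load_alphabet(path):
    """Parse an alphabet file: one label per line, '#' comments, '\\#' escape."""
    labels = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Drop only the line ending; a line holding a single space is the space label
            if line.endswith("\n"):
                line = line[:-1]
            if line.startswith("#"):
                continue  # comment line
            if line.startswith("\\#"):
                line = line[1:]  # escaped '#' becomes a literal '#' label
            labels.append(line)
    return labels
```

Printing repr(load_alphabet("/home/metlife-vad/metlife-models/alphabet.txt")) shows whether ' ' really is among the labels.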
After checking my transcripts… I found these characters:
### The following unique characters were found in your transcripts: ###
[' ', 'b', 'c', 'j', 'h', 't', 'x', 's', 'o', 'r', 'f', 'n', 'm', 'q', 'k', 'g', 'u', 'w', 'p', 'e', 'y', 'z', 'a', 'i', 'l', 'v', 'd']
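The error message asks for the transcript characters to be compared against alphabet.txt. A rough stand-in for that comparison is a set difference; the helper below is hypothetical and is not the actual util/check_characters.py. It compares exact codepoints, so a plain space and a look-alike space are treated as different characters, whichever of the two files the look-alike lives in.

```python
import csv

def characters_missing_from_alphabet(csv_paths, labels):
    """Return, sorted, the transcript characters that are not alphabet labels."""
    known = set(labels)
    seen = set()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                seen.update(row["transcript"])  # adds each character of the transcript
    return sorted(seen - known)
```

Printing [hex(ord(c)) for c in result] makes it obvious whether a reported " " is U+0020 or something else.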
I got this error… but there are no extra spaces:
+ '[' '!' -f DeepSpeech.py ']'
+ python3 -u DeepSpeech.py --train_files minigir/train/train.csv --dev_files minigir/train/train.csv --test_files minigir/train/train.csv --train_batch_size 48 --dev_batch_size 40 --test_batch_size 40 --n_hidden 1024 --epochs 64 --early_stop True --es_steps 6 --es_mean_th 0.1 --es_std_th 0.1 --dropout_rate 0.30 --learning_rate 0.0005 --report_count 100 --export_dir metlife-models/ --checkpoint_dir metlife-models/check_point --alphabet_config_path metlife-models/alphabet.txt --lm_binary_path metlife-models/lm.binary --lm_trie_path metlife-models/trie
Traceback (most recent call last):
File "/home/metlife-vad/DeepSpeech/util/text.py", line 33, in _label_from_string
return self._str_to_label[string]
KeyError: ' '
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/metlife-vad/DeepSpeech/util/text.py", line 85, in text_to_char_array
transcript = np.asarray(alphabet.encode(series['transcript']))
File "/home/metlife-vad/DeepSpeech/util/text.py", line 47, in encode
res.append(self._label_from_string(char))
File "/home/metlife-vad/DeepSpeech/util/text.py", line 39, in _label_from_string
).with_traceback(e.__traceback__)
File "/home/metlife-vad/DeepSpeech/util/text.py", line 33, in _label_from_string
return self._str_to_label[string]
KeyError: "ERROR: Your transcripts contain characters (e.g. ' ') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "DeepSpeech.py", line 931, in <module>
absl.app.run(main)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 915, in main
train()
File "DeepSpeech.py", line 435, in train
train_phase=True)
File "/home/metlife-vad/DeepSpeech/util/feeding.py", line 101, in create_dataset
df['transcript'] = df.apply(text_to_char_array, alphabet=Config.alphabet, result_type='reduce', axis=1)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 6928, in apply
return op.get_result()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
return self.apply_standard()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
self.apply_series_generator()
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
results[i] = self.f(v)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/pandas/core/apply.py", line 112, in f
return func(x, *args, **kwds)
File "/home/metlife-vad/DeepSpeech/util/text.py", line 91, in text_to_char_array
raise ValueError('While processing: {}\n{}'.format(series['wav_filename'], e))
ValueError: ('While processing: /home/metlife-vad/DeepSpeech/minigir/wav/tmp.wav\n"ERROR: Your transcripts contain characters (e.g. \' \') which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to data/alphabet.txt."', 'occurred at index 0')
Make sure this is not some UTF-8 special space. You can also use util/check_characters.py to build the alphabet from the dataset.
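A look-alike space can sit in the transcripts or in alphabet.txt itself. A minimal sketch using only the standard unicodedata module reports every space-like character other than U+0020 and can replace them with a plain space. The function names are hypothetical, and zero-width characters such as U+200B are not space separators (category Zs) in Unicode, so they would need a separate check.

```python
import unicodedata

def is_space_like(ch):
    """True for Unicode space separators (category Zs) and other whitespace."""
    return unicodedata.category(ch) == "Zs" or ch.isspace()

def find_special_spaces(text):
    """List (position, name) of every space-like character that is not U+0020."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch != " " and is_space_like(ch)
    ]

def normalize_spaces(text):
    """Replace every space-like character with a plain ASCII space (U+0020)."""
    return "".join(" " if is_space_like(ch) else ch for ch in text)
```

Running find_special_spaces over each transcript and over each line of alphabet.txt shows where a look-alike is hiding; normalize_spaces can then clean the text before the CSV is regenerated.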
Yeah… solved, it was a UTF-8 special space.
After I ran the script… I got something like this. I searched on the internet, but I am unable to figure it out.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: minigir/train/train.csv
Traceback (most recent call last):
File "DeepSpeech.py", line 931, in <module>
absl.app.run(main)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/metlife-vad/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 915, in main
train()
File "DeepSpeech.py", line 642, in train
dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero
Hey @lissyx… I am just training on three examples. Will that be enough for training? (This is just to check whether I am getting results or not, so that I can go on to train on more data.)
What are your training flags / command line?
I mean three audio files with transcripts… the CSV file has three audio transcripts.
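Both progress lines in the log report Steps: 0, and the crash is dev_loss / total_steps with total_steps equal to zero. One plausible reading, assuming DeepSpeech's input pipeline only emits full batches (worth confirming in util/feeding.py of your checkout), is plain arithmetic: three samples never fill a batch of 40 or 48. A small sketch of that count, with the hypothetical helper steps_per_epoch and the batch sizes from the command line above:

```python
# Batch sizes copied from the command line in this thread
BATCH_SIZES = {"train": 48, "dev": 40, "test": 40}

def steps_per_epoch(n_samples, batch_size, drop_remainder=True):
    """Count batches per epoch.

    drop_remainder=True models a pipeline that discards a final batch
    smaller than batch_size (an assumption about DeepSpeech's input code).
    """
    if drop_remainder:
        return n_samples // batch_size
    return -(-n_samples // batch_size)  # ceiling division keeps the partial batch

if __name__ == "__main__":
    n_samples = 3  # the CSV holds three audio files
    for phase, batch_size in BATCH_SIZES.items():
        print(phase, steps_per_epoch(n_samples, batch_size))
```

Under that assumption it prints 0 for all three phases; with batch sizes no larger than the number of samples (for example 1), the counts become non-zero.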