Close to those places, you should check for anything related to the number of channels.
Can you post your training logs, maybe the last few epochs? We could then see whether the issue is with the model (maybe because of the data) or with other modules.
One more thing: I have files of 40 seconds each. Do I need to change the code for that as well?
What is the question here?
OK, let me explain it again.
My data is sampled at 44100 Hz and is stereo.
So, as you suggested, I changed the values in the files from 16000 to 44100 and the channels from 1 to 2.
Now I have a question.
My files are 40 seconds long each. Is there any parameter in DeepSpeech that I need to change so that training can handle 40-second audio files?
I ask because the documentation mentions 5 seconds.
Or is it OK to give 40-second files for training?
No, it will not be an issue per se, but it’s going to require a huge amount of GPU memory for training. There is no feature to cut the data into shorter pieces.
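Since nothing in the training code cuts the audio, long recordings would have to be split beforehand. A minimal sketch using only Python’s standard `wave` module, assuming uncompressed PCM WAV input; the function name `split_wav`, the output naming scheme and the 10-second default are hypothetical choices, and each chunk would still need its own transcript in the training CSV:

```python
import os
import wave


def split_wav(path, out_dir, max_seconds=10.0):
    """Split a PCM WAV file into chunks of at most max_seconds each.

    Returns the list of chunk paths. Cut points are purely time-based,
    so a cut can land in the middle of a word.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * max_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = os.path.join(out_dir, "%s_%03d.wav" % (base, index))
            with wave.open(chunk_path, "wb") as dst:
                # Same channel count, sample width and rate as the source;
                # the wave module corrects the frame count when closing.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Because the cuts ignore word boundaries, this only suits a first experiment; splitting on silence would be a better fit for real training data.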
OK, I have now trained a model as you said. The results improved only a little bit; I am pasting them here in case you need them.
(1) At first I got “seven” as the output for all audio files.
Now the output is correct in the sense that if the file contains “one”, it shows “one”.
But my audio file contains “one” 20 times, and the output shows it only once.
Why is that?
I also got this error:
Ambiguous dimension: 1411.2
ValueError: Error converting shape to a TensorShape: Ambiguous dimension: 1411.2
What should be done to remove this error?
I’m sorry but I absolutely don’t understand your question here.
Again, without more context on how you get that error, it’s impossible for us to help you.
I have changed the scripts as you said, for 44100 Hz and stereo audio.
This is the log I got.
Please help me remove that error.
Epoch 0 | Training | Elapsed Time: 1:52:06 | Steps: 96 | Loss: 531.260859
Epoch 0 | Validation | Elapsed Time: 0:01:07 | Steps: 12 | Loss: 819.303584 | Dataset: /app/Deepspeech/dev/dev.csv
I Saved new best validating model with loss 819.303584 to: /app/Deepspeech/results/checkout/best_dev-1836
Epoch 1 | Training | Elapsed Time: 1:41:58 | Steps: 96 | Loss: 506.411064
Epoch 1 | Validation | Elapsed Time: 0:01:10 | Steps: 12 | Loss: 793.307281 | Dataset: /app/Deepspeech/dev/dev.csv
I Saved new best validating model with loss 793.307281 to: /app/Deepspeech/results/checkout/best_dev-1932
Epoch 2 | Training | Elapsed Time: 1:35:40 | Steps: 96 | Loss: 476.467811
Epoch 2 | Validation | Elapsed Time: 0:01:06 | Steps: 12 | Loss: 793.474063 | Dataset: /app/Deepspeech/dev/dev.csv
Epoch 3 | Training | Elapsed Time: 1:30:58 | Steps: 96 | Loss: 430.477815
Epoch 3 | Validation | Elapsed Time: 0:01:04 | Steps: 12 | Loss: 739.606102 | Dataset: /app/Deepspeech/dev/dev.csv
I Saved new best validating model with loss 739.606102 to: /app/Deepspeech/results/checkout/best_dev-2124
Epoch 4 | Training | Elapsed Time: 1:26:48 | Steps: 96 | Loss: 390.402085
Epoch 4 | Validation | Elapsed Time: 0:01:07 | Steps: 12 | Loss: 811.324318 | Dataset: /app/Deepspeech/dev/dev.csv
I Early stop triggered as (for last 4 steps) validation loss: 811.324318 with standard deviation: 25.354381 and mean: 775.462482
I FINISHED optimization in 8:13:11.060505
I Restored variables from best validation checkpoint at /app/Deepspeech/results/checkout/best_dev-2124, step 2124
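The mean and standard deviation in the early-stop message can be reproduced from the validation losses logged above: they match the three epochs preceding the latest one (epochs 1 to 3), using the population standard deviation. A small check with Python’s `statistics` module; the reading that the trigger compares the latest loss against that mean is an assumption, not something the log states:

```python
import statistics

# Validation losses for epochs 0-4, copied from the log above.
dev_losses = [819.303584, 793.307281, 793.474063, 739.606102, 811.324318]

window = dev_losses[-4:]   # "for last 4 steps"
previous = window[:-1]     # the three losses before the latest one
latest = window[-1]

mean_loss = statistics.mean(previous)
std_loss = statistics.pstdev(previous)  # population standard deviation

print("mean: %.6f  std: %.6f  latest: %.6f" % (mean_loss, std_loss, latest))
# Assumption: early stopping fires when the latest loss sits above this mean.
print("latest above mean:", latest > mean_loss)
```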
Testing model on /app/Deepspeech/test/test.csv
Test epoch | Steps: 25 | Elapsed Time: 0:01:12
Test on /app/Deepspeech/test/test.csv - WER: 0.992800, CER: 0.986946, loss: 800.126892
WER: 1.000000, CER: 0.985000, loss: 29.460112
- src: "one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one "
- res: “seven”
WER: 1.000000, CER: 0.985000, loss: 34.642292
- src: "one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one "
- res: “seven”
WER: 1.000000, CER: 0.985000, loss: 43.823837
- src: "one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 829.879456
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 830.506531
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 831.406311
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 833.888916
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 834.442749
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 843.434326
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.988000, loss: 846.850525
- src: "five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five five "
- res: “seven”
WER: 1.000000, CER: 0.992000, loss: 906.634338
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.992000, loss: 917.082153
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.992000, loss: 931.485535
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.992000, loss: 971.765442
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.992000, loss: 980.901733
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.996000, loss: 985.703308
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “two”
WER: 1.000000, CER: 0.992000, loss: 1011.776367
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “four”
WER: 1.000000, CER: 0.996000, loss: 1016.469482
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “two”
WER: 1.000000, CER: 0.996000, loss: 1033.371948
- src: "zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero zero "
- res: “two”
WER: 1.000000, CER: 0.986667, loss: 1068.499268
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1087.635254
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1108.907349
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1111.774170
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1114.400024
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1125.432251
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1149.028564
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1159.635742
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 1.000000, CER: 0.986667, loss: 1185.972534
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 569.788208
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.984000, loss: 654.026611
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.985000, loss: 656.827332
- src: "six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six "
- res: “six”
WER: 0.980000, CER: 0.985000, loss: 659.727051
- src: "six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six "
- res: “six”
WER: 0.980000, CER: 0.984000, loss: 661.934265
- src: "nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine "
- res: “nine”
WER: 0.980000, CER: 0.985000, loss: 666.090271
- src: "one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one one "
- res: “one”
WER: 0.980000, CER: 0.984000, loss: 671.630920
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.985000, loss: 672.173218
- src: "six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six "
- res: “six”
WER: 0.980000, CER: 0.985000, loss: 673.630249
- src: "six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six "
- res: “six”
WER: 0.980000, CER: 0.984000, loss: 675.070740
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.984000, loss: 682.800720
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.984000, loss: 692.452820
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.984000, loss: 695.382751
- src: "nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine nine "
- res: “nine”
WER: 0.980000, CER: 0.985000, loss: 696.252136
- src: "six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six six "
- res: “six”
WER: 0.980000, CER: 0.984000, loss: 713.148743
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.984000, loss: 775.913818
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.984000, loss: 822.966125
- src: "four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four four "
- res: “four”
WER: 0.980000, CER: 0.983333, loss: 935.471558
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 966.865479
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 988.343689
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1004.328613
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1031.770996
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1040.531250
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1045.897217
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1050.801758
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1061.078735
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1063.963989
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1065.088745
- src: "seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven seven "
- res: “seven”
WER: 0.980000, CER: 0.983333, loss: 1067.910645
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1143.155151
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “eight”
WER: 0.980000, CER: 0.983333, loss: 1165.391235
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1183.412720
- src: "eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight eight "
- res: “eight”
WER: 0.980000, CER: 0.983333, loss: 1239.415771
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1263.958374
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1275.644775
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
WER: 0.980000, CER: 0.983333, loss: 1326.395996
- src: "three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three three "
- res: “three”
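The WER values in this report follow from word-level edit distance: a single correct word against a 50-word reference needs 49 deletions (49/50 = 0.98), and a single wrong word needs one substitution plus 49 deletions (50/50 = 1.0). A minimal sketch of that calculation, assuming WER is the word-level Levenshtein distance divided by the number of reference words; the function names are illustrative and not taken from DeepSpeech:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # delete a reference word
                               current[j - 1] + 1,       # insert a hypothesis word
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = " ".join(["three"] * 50)
print(wer(reference, "three"))  # one correct word out of 50
print(wer(reference, "seven"))  # one wrong word
```

So even when the single decoded word is right, the score stays near 1.0 because the other 49 repetitions are missing from the output.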
I Exporting the model…
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 145, in make_shape
    shape = tensor_shape.as_shape(v)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 1125, in as_shape
    return TensorShape(shape)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 690, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 690, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 632, in as_dimension
    return Dimension(value)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 188, in __init__
    raise ValueError("Ambiguous dimension: %s" % value)
ValueError: Ambiguous dimension: 1411.2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 836, in <module>
    tf.app.run(main)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 828, in main
    export()
  File "DeepSpeech.py", line 687, in export
    inputs, outputs, _ = create_inference_graph(batch_size=FLAGS.export_batch_size, n_steps=FLAGS.n_steps, tflite=FLAGS.export_tflite)
  File "DeepSpeech.py", line 568, in create_inference_graph
    input_samples = tf.placeholder(tf.float32, [Config.audio_window_samples], 'input_samples')
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2077, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5789, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 150, in make_shape
    e))
ValueError: Error converting shape to a TensorShape: Ambiguous dimension: 1411.2.
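The value 1411.2 is exactly what 44100 samples per second multiplied by a 32 ms window gives. This suggests (as an assumption, since the configuration is not shown in the thread) that the placeholder size `Config.audio_window_samples` is derived from the sample rate and a window length in milliseconds. At 16000 Hz the same product is the whole number 512, and a tensor dimension has to be an integer. A small sketch of that arithmetic; the helper names and the 32 ms and 20 ms values are assumptions for illustration:

```python
def window_samples(sample_rate_hz, window_ms):
    """Samples in one feature window; fractional if the two do not divide evenly."""
    return sample_rate_hz * window_ms / 1000


def is_whole_number_of_samples(sample_rate_hz, window_ms):
    # Integer arithmetic avoids floating-point rounding in the check.
    return (sample_rate_hz * window_ms) % 1000 == 0


for rate in (16000, 44100):
    for window_ms in (32, 20):
        print(rate, window_ms, window_samples(rate, window_ms),
              is_whole_number_of_samples(rate, window_ms))

# Window lengths (in ms) from 20 to 40 that give a whole sample count at 44100 Hz.
print([ms for ms in range(20, 41) if is_whole_number_of_samples(44100, ms)])
```

Under that assumption, either training at 16000 Hz or choosing a window length that yields a whole number of samples at 44100 Hz would avoid the fractional dimension; which one fits depends on the rest of the setup.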
@lucifera678 Can you use proper code formatting for console / log output? It’s hard to read otherwise.
Again, can you share your changes?
(.virtualenv) kdavis-19htdh:DeepSpeech kdavis$ find . -name "*.py" -exec grep 16000 {} /dev/null \;
./util/flags.py: f.DEFINE_integer('audio_sample_rate', 16000, 'sample rate value expected by model')
./bin/import_cv2.py:SAMPLE_RATE = 16000
./bin/import_fisher.py: origAudios = [librosa.load(wav_file, sr=16000, mono=False) for wav_file in wav_files]
./bin/import_swb.py: audioData, frameRate = librosa.load(temp_wav_file, sr=16000, mono=True)
./bin/import_ts.py:SAMPLE_RATE = 16000
./bin/import_cv.py:SAMPLE_RATE = 16000
./bin/import_gram_vaani.py:SAMPLE_RATE = 16000
./bin/import_lingua_libre.py:SAMPLE_RATE = 16000
./bin/import_aishell.py: durations = (df['wav_filesize'] - 44) / 16000 / 2
./examples/vad_transcriber/wavTranscriber.py: audio_length = len(audio) * (1 / 16000)
./examples/vad_transcriber/wavTranscriber.py: assert sample_rate == 16000, "Only 16000Hz input WAV files are supported for now!"
./examples/vad_transcriber/wavSplit.py: assert sample_rate in (8000, 16000, 32000)
./examples/mic_vad_streaming/mic_vad_streaming.py: RATE_PROCESS = 16000
./examples/mic_vad_streaming/mic_vad_streaming.py: """Return a block of audio data resampled to 16000hz, blocking if necessary."""
./examples/mic_vad_streaming/mic_vad_streaming.py: DEFAULT_SAMPLE_RATE = 16000
./stats.py: parser.add_argument("--sample-rate", type=int, default=16000, required=False, help="Audio sample rate")
./native_client/python/client.py: sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 16000 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
./native_client/python/client.py: return 16000, np.frombuffer(output, np.int16)
./native_client/python/client.py: if fs != 16000:
./native_client/python/client.py: audio_length = fin.getnframes() * (1/16000)
./native_client/python/__init__.py: def setupStream(self, pre_alloc_frames=150, sample_rate=16000):
I changed it in all these files, as you mentioned above.
Please, this is absolutely not useful. Can’t you run git diff and share the changes properly, using code formatting?
@lucifera678,
40 s sequences are very long for training!
You spent a lot of time training your model, yet you get a WER of 0.99.
That is 99% error!
It’s normal that the results are poor.
You should definitely try training with only 10 sentences first, at 16 kHz mono and 15 s at most. Look at the results, correct the parameters, and understand what is happening.
Then start again later with all your data.
Good luck
Vincent
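One way to follow that advice is to filter the training CSV down to a handful of short clips. A rough sketch, assuming 16-bit PCM WAV files with a 44-byte header and a CSV with `wav_filename`, `wav_filesize` and `transcript` columns; the duration formula mirrors the `(df['wav_filesize'] - 44) / 16000 / 2` expression from `import_aishell.py` quoted earlier, while `select_short_clips`, its defaults and the example rows are hypothetical:

```python
import csv
import io


def clip_seconds(wav_filesize, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Estimate duration from file size, assuming a 44-byte WAV header."""
    return (wav_filesize - 44) / (sample_rate * channels * bytes_per_sample)


def select_short_clips(csv_text, max_seconds=15.0, limit=10, **audio_format):
    """Return up to `limit` rows whose estimated duration is within max_seconds."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if clip_seconds(int(row["wav_filesize"]), **audio_format) <= max_seconds:
            selected.append(row)
            if len(selected) == limit:
                break
    return selected


# Made-up rows for illustration only.
example = (
    "wav_filename,wav_filesize,transcript\n"
    "short.wav,160044,one one one\n"      # about 5 s at 16 kHz mono
    "long.wav,1280044,one one one one\n"  # about 40 s at 16 kHz mono
)
for row in select_short_clips(example):
    print(row["wav_filename"], row["transcript"])
```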
I have formatted the log output. Is there anything I can do?
Sharing the diff of your changes?
./util/flags.py: f.DEFINE_integer('audio_sample_rate', 44100, 'sample rate value expected by model')
./bin/import_cv2.py:SAMPLE_RATE = 44100
./bin/import_fisher.py: origAudios = [librosa.load(wav_file, sr=44100, mono=False) for wav_file in wav_files]
./bin/import_swb.py: audioData, frameRate = librosa.load(temp_wav_file, sr=44100, mono=True)
./bin/import_ts.py:SAMPLE_RATE = 44100
./bin/import_cv.py:SAMPLE_RATE = 44100
./bin/import_gram_vaani.py:SAMPLE_RATE = 44100
./bin/import_lingua_libre.py:SAMPLE_RATE = 44100
./bin/import_aishell.py: durations = (df['wav_filesize'] - 44) / 44100 / 2
./examples/vad_transcriber/wavTranscriber.py: audio_length = len(audio) * (1 / 44100)
./examples/vad_transcriber/wavTranscriber.py: assert sample_rate == 16000, "Only 16000Hz input WAV files are supported for now!"
./examples/vad_transcriber/wavSplit.py: assert sample_rate in (8000, 16000, 32000, 44100)
./examples/mic_vad_streaming/mic_vad_streaming.py: RATE_PROCESS = 44100
./examples/mic_vad_streaming/mic_vad_streaming.py: """Return a block of audio data resampled to 16000hz, blocking if necessary."""
./examples/mic_vad_streaming/mic_vad_streaming.py: DEFAULT_SAMPLE_RATE = 44100
./stats.py: parser.add_argument("--sample-rate", type=int, default=44100, required=False, help="Audio sample rate")
./native_client/python/client.py: sox_cmd = 'sox {} --type raw --bits 16 --channels 1 --rate 44100 --encoding signed-integer --endian little --compression 0.0 --no-dither - '.format(quote(audio_path))
./native_client/python/client.py: return 44100, np.frombuffer(output, np.int16)
./native_client/python/client.py: if fs != 44100:
./native_client/python/client.py: audio_length = fin.getnframes() * (1/44100)
./native_client/python/__init__.py: def setupStream(self, pre_alloc_frames=150, sample_rate=44100):
The 44100 values above are my changes in these files: I changed 16000 to 44100 as you told me to.
I’m sorry, that’s still not a diff as I asked. It’s completely unusable.
@lucifera678 maybe you aren’t aware of what a diff is? If not, one of these might give you a bit of background:
https://www.git-tower.com/learn/git/ebook/en/command-line/advanced-topics/diffs
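For a concrete picture, a unified diff marks removed lines with `-` and added lines with `+`. Inside a git checkout, running `git diff` prints this for every modified file. The sketch below only imitates that format with Python’s `difflib` for the one-line `util/flags.py` change discussed in this thread; the `a/` and `b/` path prefixes are added here to resemble git’s output:

```python
import difflib

before = [
    "f.DEFINE_integer('audio_sample_rate', 16000, 'sample rate value expected by model')\n",
]
after = [
    "f.DEFINE_integer('audio_sample_rate', 44100, 'sample rate value expected by model')\n",
]

diff_lines = list(difflib.unified_diff(
    before, after, fromfile="a/util/flags.py", tofile="b/util/flags.py"))
print("".join(diff_lines), end="")
```

Output in this shape shows exactly which lines changed and how, which a list of the final values cannot do.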