Error with Deep Speech Training

Hi,

I am training a model for speech recognition using https://github.com/mozilla/DeepSpeech with the dataset from https://voice.mozilla.org/en/datasets for 10 epochs, but the loss plateaus at around epoch 3 and training stops early.
Validating on some sample data gives me this result.


WER: 6.000000, CER: 2.647059, loss: 126.520866

  • src: "it's not my house"
  • res: "a a a a a a a a a a a a a a a a a a a a a a a a "

How can I solve this issue?
Any pointers would be helpful.

Thanks,
Ansari

You simply don't have enough data and aren't training for long enough. Also, please use proper code formatting when sharing console output, and include your full training parameters as well as the full console output.

I'm using these input parameters

    python3 DeepSpeech.py --epochs 10 \
        --checkpoint_dir /home/aisystem/Documents/DeepSpeech/checkpoint/ \
        --export_dir /home/aisystem/Documents/DeepSpeech/export/destination/ \
        --train_files en/clips/train.csv \
        --dev_files en/clips/dev.csv \
        --test_files en/clips/test.csv

and got this as output

    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint8 = np.dtype([("qint8", np.int8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint16 = np.dtype([("qint16", np.int16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      _np_qint32 = np.dtype([("qint32", np.int32, 1)])
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
      np_resource = np.dtype([("resource", np.ubyte, 1)])
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    tf.py_func is deprecated in TF V2. Instead, use
        tf.py_function, which takes a python function which manipulates tf eager
        tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
        an ndarray (just call tensor.numpy()) but having access to eager tensors
        means `tf.py_function`s can use accelerators such as GPUs as well as
        being differentiable using a gradient tape.
        
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Colocations handled automatically by placer.
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.cast instead.
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use standard file APIs to check for files with this prefix.
    I Restored variables from most recent checkpoint at /home/aisystem/Documents/DeepSpeech/new_checkpoint/train-72270, step 72270
    I STARTING Optimization
    Epoch 0 |   Training | Elapsed Time: 0:30:28 | Steps: 2333 | Loss: 226.244783                                                                  
    Epoch 0 | Validation | Elapsed Time: 0:03:44 | Steps: 1167 | Loss: 211.443538 | Dataset: en/clips/dev.csv                                      
    I Saved new best validating model with loss 211.443538 to: /home/aisystem/Documents/DeepSpeech/new_checkpoint/best_dev-74603
    Epoch 1 |   Training | Elapsed Time: 0:20:01 | Steps: 1748 | Loss: 197.309220                                                                  WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use standard file APIs to delete files with this prefix.
    Epoch 1 |   Training | Elapsed Time: 0:30:09 | Steps: 2333 | Loss: 226.253253                                                                  
    Epoch 1 | Validation | Elapsed Time: 0:03:44 | Steps: 1167 | Loss: 211.426090 | Dataset: en/clips/dev.csv                                      
    I Saved new best validating model with loss 211.426090 to: /home/aisystem/Documents/DeepSpeech/new_checkpoint/best_dev-76936
    Epoch 2 |   Training | Elapsed Time: 0:30:18 | Steps: 2333 | Loss: 226.231258                                                                  
    Epoch 2 | Validation | Elapsed Time: 0:03:42 | Steps: 1167 | Loss: 211.408015 | Dataset: en/clips/dev.csv                                      
    I Saved new best validating model with loss 211.408015 to: /home/aisystem/Documents/DeepSpeech/new_checkpoint/best_dev-79269
    Epoch 3 |   Training | Elapsed Time: 0:30:15 | Steps: 2333 | Loss: 226.226419                                                                  
    Epoch 3 | Validation | Elapsed Time: 0:03:45 | Steps: 1167 | Loss: 211.432017 | Dataset: en/clips/dev.csv                                      
    I Early stop triggered as (for last 4 steps) validation loss: 211.432017 with standard deviation: 0.014503 and mean: 211.425881
    I FINISHED optimization in 2:16:13.002074
    I Restored variables from best validation checkpoint at /home/aisystem/Documents/DeepSpeech/new_checkpoint/best_dev-79269, step 79269
    Testing model on en/clips/test.csv
    Test epoch | Steps: 2242 | Elapsed Time: 1:59:23                                                                                               
    Test on en/clips/test.csv - WER: 1.000000, CER: 0.800744, loss: 190.522720
    --------------------------------------------------------------------------------
    WER: 18.000000, CER: 4.000000, loss: 1153.459351
     - src: "undefined"
     - res: "a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 14.000000, CER: 2.545455, loss: 946.829041
     - src: "kettledrums"
     - res: "a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 10.000000, CER: 4.750000, loss: 672.751648
     - src: "amen"
     - res: "a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 7.333333, CER: 1.481481, loss: 181.943405
     - src: "programming requires brains"
     - res: "a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 7.000000, CER: 2.789474, loss: 161.275314
     - src: "very well very well"
     - res: "a a a a a a a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 6.750000, CER: 3.125000, loss: 137.110382
     - src: "i didn't fall in"
     - res: "a a a a a a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 6.750000, CER: 2.040000, loss: 170.167328
     - src: "mother's on the extension"
     - res: "a a a a a a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 6.500000, CER: 1.840000, loss: 168.767593
     - src: "can't you understand that"
     - res: "a a a a a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 6.000000, CER: 2.062500, loss: 110.969696
     - src: "poetry and truth"
     - res: "a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    WER: 6.000000, CER: 2.647059, loss: 126.520866
     - src: "it's not my house"
     - res: "a a a a a a a a a a a a a a a a a a a a a a a a "
    --------------------------------------------------------------------------------
    I Exporting the model...
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.compat.v1.graph_util.convert_variables_to_constants
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.compat.v1.graph_util.extract_sub_graph
    I Models exported at /home/aisystem/Documents/DeepSpeech/export/new_destination/

Thanks,
Ansari

Thanks, well, that confirms what I mentioned earlier … :confused:

What should I do to solve this issue?

Change the hyperparameters, train for longer, try data augmentation, etc. This is a general research problem, not a bug that can be solved with a simple tweak. The Common Voice English dataset is not currently large enough to train a general speech recognition engine for English. You can probably still train useful models, but definitely not with our default hyperparameters.
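
For what it's worth, a starting point might look something like this. This is only a sketch: the specific values are assumptions for a small dataset (not recommendations we've validated), and you should confirm the flag names against the DeepSpeech version you're running with `python3 DeepSpeech.py --helpfull`.

    # Hypothetical tweaks for a small dataset: smaller network and higher
    # dropout to reduce overfitting, a lower learning rate, and more epochs.
    # Paths reuse the ones from your command above.
    python3 DeepSpeech.py \
        --train_files en/clips/train.csv \
        --dev_files en/clips/dev.csv \
        --test_files en/clips/test.csv \
        --checkpoint_dir /home/aisystem/Documents/DeepSpeech/checkpoint/ \
        --export_dir /home/aisystem/Documents/DeepSpeech/export/destination/ \
        --epochs 30 \
        --learning_rate 0.0001 \
        --dropout_rate 0.25 \
        --n_hidden 1024

Note that changing `--n_hidden` means you can't resume from your existing checkpoints, since the layer shapes no longer match; start from a fresh checkpoint directory.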