Problems running DeepSpeech on GPU

Hi everyone,

I’ve already read a few posts about this, but I can’t seem to find an answer.

My problem is that DeepSpeech doesn’t seem to run on the GPU when training a model. Maybe I’m missing something, but I think I did everything according to the README of the repo. Here is what I did:

  • created a virtualenv with ‘… -p python3’ and activated it
  • cloned the repo and ran these steps:
pip3 install -r requirements.txt
pip3 install $(python3 util/taskcluster.py --decoder)

pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.14.0'

I should also have all the CUDA dependencies; my colleague set those up, and nvidia-smi prints the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:02:00.0  On |                  N/A |
| 46%   31C    P8     6W / 105W |    383MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1173      G   /usr/lib/xorg/Xorg                           243MiB |
|    0      1391      G   /usr/bin/gnome-shell                         137MiB |
+-----------------------------------------------------------------------------+

When starting training, DeepSpeech gives me some warnings, though. One of them is:

WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.

This makes me wonder: shouldn’t the path be /site-packages/tensorflow-GPU/ or something? Also, pip3 list gives me this output:

absl-py              0.8.0    
asn1crypto           0.24.0   
astor                0.8.0    
attrdict             2.0.1    
audioread            2.1.8    
bcrypt               3.1.7    
beautifulsoup4       4.8.0    
bs4                  0.0.1    
certifi              2019.9.11
cffi                 1.12.3   
chardet              3.0.4    
cryptography         2.7      
cycler               0.10.0   
decorator            4.4.0    
deepspeech-gpu       0.5.1    
ds-ctcdecoder        0.6.0a5  
gast                 0.3.2    
google-pasta         0.1.7    
grpcio               1.23.0   
h5py                 2.10.0   
idna                 2.8      
joblib               0.13.2   
Keras-Applications   1.0.8    
Keras-Preprocessing  1.1.0    
kiwisolver           1.1.0    
librosa              0.7.0    
llvmlite             0.29.0   
Markdown             3.1.1    
matplotlib           3.1.1    
numba                0.45.1   
numpy                1.15.4   
pandas               0.25.1   
paramiko             2.6.0    
pip                  19.2.3   
progressbar2         3.46.1   
protobuf             3.9.1    
pycparser            2.19     
PyNaCl               1.3.0    
pyparsing            2.4.2    
python-dateutil      2.8.0    
python-utils         2.3.0    
pytz                 2019.2   
pyxdg                0.26     
requests             2.22.0   
resampy              0.2.2    
scikit-learn         0.21.3   
scipy                1.3.1    
setuptools           41.2.0   
six                  1.12.0   
SoundFile            0.10.2   
soupsieve            1.9.3    
sox                  1.3.7    
tensorboard          1.14.0   
tensorflow-estimator 1.14.0   
tensorflow-gpu       1.14.0   
termcolor            1.1.0    
urllib3              1.25.3   
Werkzeug             0.15.6   
wheel                0.33.6   
wrapt                1.11.2 
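
A quick way to rule out a leftover CPU wheel is to scan the `pip3 list` output for both package names. This is a minimal sketch, assuming the two-column format shown above; `tensorflow_packages` and `diagnose` are hypothetical helper names, not part of pip or DeepSpeech.

```python
def tensorflow_packages(pip_list_text):
    """Return {name: version} for TensorFlow runtime wheels found in `pip list` output."""
    found = {}
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or oddly shaped lines
        name, version = parts
        if name.lower() in ("tensorflow", "tensorflow-gpu"):
            found[name.lower()] = version
    return found


def diagnose(found):
    """Summarise which TensorFlow wheels are present."""
    if "tensorflow" in found and "tensorflow-gpu" in found:
        # Both wheels unpack into the same site-packages/tensorflow directory.
        return "both CPU and GPU wheels installed; they share one import directory"
    if "tensorflow-gpu" in found:
        return "only tensorflow-gpu installed"
    if "tensorflow" in found:
        return "only CPU tensorflow installed"
    return "no TensorFlow wheel found"


# A few rows copied from the listing in this post.
sample = """\
tensorboard          1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu       1.14.0
termcolor            1.1.0
"""

print(diagnose(tensorflow_packages(sample)))
```

To check a live environment, the text of `pip3 list` (run inside the activated virtualenv) can be passed in place of `sample`.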

What am I missing? Hoping that someone can give me a hint or something.

Thanks in advance!
gneulyn

P.S.: everything seems to work fine, just not on the GPU

It’s installed. If you are starting training in the properly activated virtualenv there is no reason this would not work.

Is this during training?

You could start by sharing more training logs. And run with --log_level x with x > 1.

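That would be consistent with an improperly uninstalled tensorflow python wheel, or an incorrectly set up virtualenv. As for the path in that warning: no, it should not say tensorflow-gpu. The tensorflow-gpu wheel installs into the same site-packages/tensorflow/ directory as the CPU wheel, so that path is expected (and the warnings themselves are harmless).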
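
One way to see whether TensorFlow itself detects the card, independently of DeepSpeech, is to list the devices it reports. This is a minimal sketch, assuming TensorFlow 1.x, where `device_lib.list_local_devices()` is available. The `Device` stub and the helper names `gpu_names` and `local_devices` are made up for illustration.

```python
from collections import namedtuple

# Stand-in for TensorFlow's device records, used for illustration only.
Device = namedtuple("Device", ["name", "device_type"])


def gpu_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_devices():
    """List devices as TensorFlow sees them; empty list if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return device_lib.list_local_devices()


if __name__ == "__main__":
    gpus = gpu_names(local_devices())
    print("GPUs visible to TensorFlow:", gpus or "none")
```

If this prints "none" inside the activated virtualenv, the problem sits below DeepSpeech (wheel, CUDA libraries, or driver) rather than in the training flags.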

Hey lissyx,

Thanks for helping. I was kinda hoping for your reply :wink:

Here is the log at log level 2 when I start training:

./DeepSpeech.py --train_files ~/Desktop/Files/train.csv --dev_files ~/Desktop/Files/dev.csv --test_files ~/Desktop/Files/test.csv --epochs 1 --export_dir ~/Desktop/190919_Deepspeech/model_export --checkpoint_dir ~/Desktop/190919_Deepspeech/checkpoints --test_batch_size 200 --train_batch_size 200 --dev_batch_size 200 --log_level 2
WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
W0919 16:54:59.203839 140230625675072 deprecation.py:323] From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W0919 16:54:59.264383 140230625675072 deprecation.py:323] From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W0919 16:54:59.264564 140230625675072 deprecation.py:323] From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W0919 16:54:59.264675 140230625675072 deprecation.py:323] From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0919 16:55:01.491833 140230625675072 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0919 16:55:01.493896 140230625675072 deprecation.py:506] From /home/encoder80/Desktop/190919_Deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f89304ef940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f89304ef940>>: AttributeError: module 'gast' has no attribute 'Num'
W0919 16:55:01.522188 140230625675072 ag_logging.py:145] Entity <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f89304ef940>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method LSTMBlockWrapper.call of <tensorflow.contrib.rnn.python.ops.lstm_ops.LSTMBlockFusedCell object at 0x7f89304ef940>>: AttributeError: module 'gast' has no attribute 'Num'
WARNING:tensorflow:From ./DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0919 16:55:01.582261 140230625675072 deprecation.py:323] From ./DeepSpeech.py:232: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 0 |   Training | Elapsed Time: 0:00:26 | Steps: 1 | Loss: 358.728058                                                                                                                    

By “properly activated virtualenv” you mean running the activate script, right?

source path/to/bla/activate
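
Whether the interpreter that actually runs DeepSpeech.py belongs to that environment can be checked from Python itself. This is a minimal sketch using only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtualenv or venv."""
    # Older `virtualenv` releases set sys.real_prefix; `venv` and newer
    # `virtualenv` releases make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("inside a virtualenv:", in_virtualenv())
```

Run with the same `python3` that launches training, the printed paths should point into the virtualenv directory.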

The nvidia-smi output while training looks like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:02:00.0  On |                  N/A |
| 46%   32C    P8     8W / 105W |    342MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1173      G   /usr/lib/xorg/Xorg                           129MiB |
|    0      1391      G   /usr/bin/gnome-shell                         132MiB |
|    0     27317      C   python                                        77MiB |
+-----------------------------------------------------------------------------+

I think I followed every step in the README. How can I properly uninstall tensorflow? I already tried removing the folder, but then the module is not found, of course…

Hoping for your worthy advice

Any news here? I kinda have the same problem.

pip uninstall tensorflow, maybe? Also make sure you’re consistent in your use of pip/pip3, and that everything runs inside the virtualenv…
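
The pip/pip3 consistency point can be checked by looking at where each command resolves on PATH. This is a rough sketch, assuming it is run with the virtualenv's own interpreter; `tools_outside_env` is a hypothetical helper, and the default tool names are just the ones used in this thread.

```python
import os
import shutil
import sys


def tools_outside_env(names=("python3", "pip", "pip3"), prefix=None):
    """Map each tool to its PATH location if that location is not under `prefix`.

    A value of None means the tool was not found on PATH at all.
    """
    # Deliberately no realpath(): a virtualenv's python is often a symlink
    # to the system interpreter, and resolving it would look "outside".
    root = os.path.abspath(prefix or sys.prefix) + os.sep
    outside = {}
    for name in names:
        path = shutil.which(name)
        if path is None or not os.path.abspath(path).startswith(root):
            outside[name] = path
    return outside


if __name__ == "__main__":
    for tool, path in tools_outside_env().items():
        print(tool, "->", path or "not found on PATH")
```

An empty result means all three commands resolve inside the active environment. Any entry that is printed is a command that would install into, or run from, somewhere else.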

What’s PID 27317?
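
Part of the answer is in the process table itself: nvidia-smi marks graphics contexts with type G and compute contexts with type C. Below is a minimal sketch that pulls the compute rows out of that table, assuming the table layout quoted in this thread (other nvidia-smi versions may format it differently); `ROW` and `gpu_processes` are hypothetical names.

```python
import re

# One row of the "Processes" table as printed by the nvidia-smi version
# quoted in this thread, for example:
# |    0     27317      C   python                                        77MiB |
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+([CG+]+)\s+(\S.*?)\s+(\d+)MiB\s+\|$")


def gpu_processes(smi_text):
    """Parse process rows into dicts with gpu, pid, type, name and mib keys."""
    rows = []
    for line in smi_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, kind, name, mib = match.groups()
            rows.append({"gpu": int(gpu), "pid": int(pid), "type": kind,
                         "name": name, "mib": int(mib)})
    return rows


# Rows copied from the nvidia-smi output taken during training.
sample = """
|    0      1173      G   /usr/lib/xorg/Xorg                           129MiB |
|    0      1391      G   /usr/bin/gnome-shell                         132MiB |
|    0     27317      C   python                                        77MiB |
"""

compute = [p for p in gpu_processes(sample) if "C" in p["type"]]
for proc in compute:
    print(proc["pid"], proc["name"], proc["mib"], "MiB")
```

On Linux, the full command line behind a PID can then be looked up with `ps -p <pid> -o args`.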