I’ve rented the same instance and am running some experiments. This is on Ubuntu 16.04 as provided by Amazon, with the nvidia-384
driver installed, then CUDA 9.0 + CuDNN v7 installed by hand.
The first run took 30-45 secs before the (expected) failure; I don’t really know why. Subsequent runs seem nicer:
ubuntu@ip-172-31-32-91:~/ds/gpu$ time ./deepspeech ../models/output_graph.pb ../models/alphabet.txt ../audio/ -t
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-07 14:20:41.678897: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-07 14:20:41.786354: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-07 14:20:41.786738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.77GiB freeMemory: 15.35GiB
2018-03-07 14:20:41.786766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Running on directory ../audio/
> ../audio//8455-210777-0068.wav
your powr is sufficient i said
cpu_time_overall=2.45969 cpu_time_mfcc=0.00468 cpu_time_infer=2.45501
> ../audio//4507-16021-0012.wav
why should one halt on the way
cpu_time_overall=0.37646 cpu_time_mfcc=0.00452 cpu_time_infer=0.37194
> ../audio//2830-3980-0043.wav
experience proves tis
cpu_time_overall=0.28520 cpu_time_mfcc=0.00326 cpu_time_infer=0.28194
real 0m4.124s
user 0m2.704s
sys 0m1.524s
Runs with mmap() are even nicer:
ubuntu@ip-172-31-32-91:~/ds/gpu$ time ./deepspeech ../models/output_graph.pbmm ../models/alphabet.txt ../audio/ -t
2018-03-07 14:21:25.242678: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-03-07 14:21:25.342985: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-07 14:21:25.343375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.77GiB freeMemory: 15.35GiB
2018-03-07 14:21:25.343403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Running on directory ../audio/
> ../audio//8455-210777-0068.wav
your powr is sufficient i said
cpu_time_overall=0.71324 cpu_time_mfcc=0.00432 cpu_time_infer=0.70892
> ../audio//4507-16021-0012.wav
why should one halt on the way
cpu_time_overall=0.45870 cpu_time_mfcc=0.00454 cpu_time_infer=0.45416
> ../audio//2830-3980-0043.wav
experience proves tis
cpu_time_overall=0.33660 cpu_time_mfcc=0.00332 cpu_time_infer=0.33327
real 0m2.201s
user 0m1.588s
sys 0m0.732s
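The mmap warning from the first run goes away once the model is converted to the memmapped format. A sketch of that conversion, assuming a TensorFlow source checkout with bazel available (tool path and flags as in TensorFlow's contrib utils of that era):

```shell
# Build TensorFlow's graph converter once, from a TensorFlow source checkout:
bazel build tensorflow/contrib/util:convert_graphdef_memmapped_format

# Convert the protobuf graph into an mmap-able one (.pbmm),
# which the runtime can map instead of reading fully into heap:
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format \
  --in_graph=../models/output_graph.pb \
  --out_graph=../models/output_graph.pbmm
```

The resulting `output_graph.pbmm` is what the second run above loads.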
Speaking in terms of realtime factor, after a few runs I get these stable (low-variation) values:
| file | audio length (s) | cpu_time_infer (s) | rt factor |
|---|---|---|---|
| ../audio//8455-210777-0068.wav | 1.975 | 0.70892 | 0.359 |
| ../audio//4507-16021-0012.wav | 2.735 | 0.45416 | 0.166 |
| ../audio//2830-3980-0043.wav | 2.590 | 0.33327 | 0.129 |
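The rt factor column is just cpu_time_infer divided by the audio length (below 1.0 means faster than realtime). A minimal check in Python, with the durations and timings copied from the table above:

```python
# Realtime factor = inference time / audio duration; lower is better,
# and < 1.0 means the clip is transcribed faster than it plays.
samples = [
    ("8455-210777-0068.wav", 1.975, 0.70892),
    ("4507-16021-0012.wav", 2.735, 0.45416),
    ("2830-3980-0043.wav", 2.590, 0.33327),
]

for name, audio_len, infer in samples:
    rtf = infer / audio_len
    print(f"{name}: rt factor = {rtf:.3f}")
```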
This is with CUDA 9.0 / CuDNN v7.