Hi @chesterkuo, how did you add the server IP for distributed training?
I tried the following but I get errors; maybe I am not doing it the right way. Can you point out what's wrong in this script?
python -u DeepSpeech.py \
--train_files /data/zh_data/data_thchs30/train.csv \
--dev_files /data/zh_data/data_thchs30/dev.csv \
--test_files /data/zh_data/data_thchs30/test.csv \
--train_batch_size 80 \
--dev_batch_size 80 \
--test_batch_size 40 \
--n_hidden 375 \
--epoch 200 \
--validation_step 1 \
--early_stop True \
--earlystop_nsteps 6 \
--estop_mean_thresh 0.1 \
--estop_std_thresh 0.1 \
--dropout_rate 0.22 \
--learning_rate 0.00095 \
--report_count 100 \
--use_seq_length False \
--export_dir /data/zh_data/exportDir/ \
--checkpoint_dir /data/zh_data/checkpoint/ \
--decoder_library_path /data/jugs/asr/DeepSpeech/native_client/libctc_decoder_with_kenlm.so \
--alphabet_config_path /data/zh_data/alphabet.txt \
--lm_binary_path /data/zh_data/zh_lm.binary \
--lm_trie_path /data/zh_data/trie \
--ps_hosts "104.211.xx.xx:2222"
The error seems to come from the --ps_hosts parameter. If that is not the right way to assign a parameter server, how should I do it? My error is:
Traceback (most recent call last):
  File "DeepSpeech.py", line 1838, in <module>
    tf.app.run()
  File "/home/maybe/anaconda3/envs/asr/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1795, in main
    train()
  File "DeepSpeech.py", line 1501, in train
    results_tuple, gradients, mean_edit_distance, loss = get_tower_results(model_feeder, optimizer)
  File "DeepSpeech.py", line 633, in get_tower_results
    device = tf.train.replica_device_setter(worker_device=available_devices[i], cluster=cluster)
  File "/home/maybe/anaconda3/envs/asr/lib/python3.6/site-packages/tensorflow/python/training/device_setter.py", line 197, in replica_device_setter
    cluster_spec = cluster.as_dict()
  File "/home/maybe/anaconda3/envs/asr/lib/python3.6/site-packages/tensorflow/python/training/server_lib.py", line 334, in as_dict
    if max(task_indices) + 1 == len(task_indices):
ValueError: max() arg is an empty sequence
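
From what I can tell, the last frame means that max() was called on an empty list of task indices, so one of the jobs in the cluster spec may have no hosts at all. Since I only pass --ps_hosts, my guess is that the worker list is the empty one, but I have not confirmed this. Below is a minimal plain-Python sketch of that check. It is not TensorFlow's actual code; check_cluster is a hypothetical helper, and "localhost:2223" is a placeholder worker address I made up for illustration.

```python
# Plain-Python sketch, NOT TensorFlow's code: it only mimics the check in the
# last traceback frame, where max() is applied to a job's task indices.

def check_cluster(cluster):
    """Return the names of jobs that have no tasks (hypothetical helper)."""
    empty_jobs = []
    for job, hosts in cluster.items():
        task_indices = list(range(len(hosts)))
        try:
            max(task_indices)  # raises ValueError when the job has no tasks
        except ValueError:
            empty_jobs.append(job)
    return empty_jobs

# Assumption: only a parameter server list is given, so "worker" stays empty.
only_ps = {"ps": ["104.211.xx.xx:2222"], "worker": []}

# Assumption: a worker list is given too (the address is a placeholder).
ps_and_worker = {"ps": ["104.211.xx.xx:2222"], "worker": ["localhost:2223"]}

print(check_cluster(only_ps))
print(check_cluster(ps_and_worker))
```

If that guess is right, the script would also need a worker host list next to --ps_hosts, but I am not sure which flags are required for that.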