Hi,
I am trying to train on my custom dataset. I am trying two approaches: one is to adapt the default frozen model, and the other is to use the default checkpoints. I have attached my system configuration.
When I train from the checkpoint, this is the command I use:
python -u DeepSpeech.py \
--train_files /home/hdpuser/models/csv/TRAIN/TRAIN.csv \
--dev_files /home/hdpuser/models/csv/DEV/DEV.csv \
--test_files /home/hdpuser/models/csv/TEST/TEST.csv \
--n_hidden 2048 \
--epoch 3 \
--export_dir /home/hdpuser/models/results/model_export/ \
--lm_binary_path /home/hdpuser/models/lm.binary \
--checkpoint_dir /home/hdpuser/models/results/checkout/ \
--decoder_library_path /home/hdpuser/DeepSpeech/NativeClient/native_client/libctc_decoder_with_kenlm.so \
--alphabet_config_path /home/hdpuser/models/alphabet.txt \
--lm_trie_path /home/hdpuser/models/trie \
--summary_dir /home/hdpuser/models/summary \
--validation_step 1 \
--limit_train 2 \
--limit_test 2 \
--limit_dev 2
When I try to train from the frozen model, this is my command:
python -u DeepSpeech.py \
--train_files /home/hdpuser/models/csv/TRAIN/TRAIN.csv \
--dev_files /home/hdpuser/models/csv/DEV/DEV.csv \
--test_files /home/hdpuser/models/csv/TEST/TEST.csv \
--initialize_from_frozen_model /home/hdpuser/models/output_graph.pb \
--n_hidden 2048 \
--epoch 3 \
--export_dir /home/hdpuser/models/results/model_export/ \
--lm_binary_path /home/hdpuser/models/lm.binary \
--checkpoint_dir /home/hdpuser/models/results/checkout/ \
--decoder_library_path /home/hdpuser/DeepSpeech/NativeClient/native_client/libctc_decoder_with_kenlm.so \
--alphabet_config_path /home/hdpuser/models/alphabet.txt \
--lm_trie_path /home/hdpuser/models/trie \
--summary_dir /home/hdpuser/models/summary \
--validation_step 1 \
--limit_train 2 \
--limit_test 2 \
--limit_dev 2
Both of these processes get killed automatically, and when I run dmesg I get the following:
**Out of memory: Kill process 3382 (python) score 402 or sacrifice child**
**Killed process 3382 (python) total-vm:10642244kB, anon-rss:6961516kB, file-rss:0kB**
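To put those numbers in perspective, here is a minimal sketch that converts the memory fields of the "Killed process" line into GiB. The helper name `parse_oom_line` is made up for illustration, and the sketch assumes the kernel's `kB` values are units of 1024 bytes.

```python
import re

# The "Killed process" line copied from the dmesg output above.
DMESG_LINE = (
    "Killed process 3382 (python) total-vm:10642244kB, "
    "anon-rss:6961516kB, file-rss:0kB"
)


def parse_oom_line(line):
    """Return a dict mapping each memory field name to its size in GiB.

    Hypothetical helper: it simply picks up every `name:<digits>kB` pair
    and assumes 1 kB = 1024 bytes.
    """
    fields = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kb) / (1024 ** 2) for name, kb in fields}


usage = parse_oom_line(DMESG_LINE)
for name, gib in usage.items():
    print(f"{name}: {gib:.2f} GiB")
```

By this reading, the python process had roughly 6.6 GiB resident (anon-rss) and about 10.1 GiB of virtual address space (total-vm) when the kernel killed it.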
Can anyone please help with this and point out where exactly the issue is?
Thanks in advance.