Hello,
I created an Arabic language model and trie using:
awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' vocab.txt | sort > alphabets.txt
kenlm/build/bin/lmplz --text vocab.txt --arpa words.arpa -o 4
kenlm/build/bin/build_binary trie -q 16 -b 7 -a 64 words.arpa lm.binary
nat_client.0.1.1/generate_trie alphabets.txt lm.binary vocab.txt words.trie
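
Since a mismatch between the alphabet file and the characters that actually occur in the transcripts is a common source of trouble, here is a quick sanity check (a minimal Python sketch, reusing the file names from the commands above; not part of the original pipeline):

# Minimal sanity check (sketch, reusing the file names from the commands
# above): list any character that occurs in vocab.txt but is not present
# as a line in alphabets.txt.
with open("alphabets.txt", encoding="utf-8") as f:
    alphabet = {line.rstrip("\n") for line in f}

missing = set()
with open("vocab.txt", encoding="utf-8") as f:
    for line in f:
        missing.update(ch for ch in line.rstrip("\n") if ch not in alphabet)

print("characters in vocab.txt missing from alphabets.txt:", sorted(missing))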
Then I imported my WAV files (16 kHz, 16-bit, mono) and created the CSV files as expected. When I start DeepSpeech training, it runs for a few epochs (~2 CPU hours) and then triggers an early stop. Just to test the system, I figured that overfitting a single WAV used as train/dev/test should give positive results, like those seen with ldc93s1. But even then I get an early stop with WER = 1, loss > 100, and strange decoder output after a short amount of CPU time.
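
For the single-WAV overfitting test, the CSV that train/dev/test all point to is just a two-line file; here is a minimal sketch of how it can be built (the path and transcript are hypothetical placeholders, and the column names assume the usual wav_filename, wav_filesize, transcript layout used by the importers):

# Minimal sketch of the single-sample CSV used for the overfitting test.
# The path and transcript below are hypothetical placeholders.
import csv
import os

wav_path = "/data/arabic/sample0001.wav"  # hypothetical example file
transcript = "مرحبا"  # hypothetical example transcript

with open("arabic1.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    writer.writerow([wav_path, os.path.getsize(wav_path), transcript])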
My runner has:
# --export_dir "$COMPUTE_DATA_DIR/exported.model"   (export disabled for now)
python -u DeepSpeech.py \
--train_files "$COMPUTE_DATA_DIR/arabic1.csv" \
--dev_files "$COMPUTE_DATA_DIR/arabic1.csv" \
--test_files "$COMPUTE_DATA_DIR/arabic1.csv" \
--alphabet_config_path "$COMPUTE_DATA_DIR/alphabets.txt" \
--lm_binary_path "$COMPUTE_DATA_DIR/lm.binary" \
--lm_trie_path "$COMPUTE_DATA_DIR/words.trie" \
--train_batch_size 1 \
--dev_batch_size 1 \
--test_batch_size 1 \
--epoch 100 \
--display_step 1 \
--validation_step 1 \
--dropout_rate 0.10 \
--n_hidden 1024 \
--default_stddev 0.03125 \
--learning_rate 0.00001 \
--checkpoint_dir "$checkpoint_dir" \
--checkpoint_secs 1800 \
--summary_secs 1800 \
"$@"
I tried n_hidden = 2048, 512, and 1024 (with default_stddev = sqrt(2/(2*n_hidden)), as computed below).
I tried learning rates of 0.0001 and 0.00001.
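
For reference, the values that formula yields for the three widths (so the 0.03125 passed to --default_stddev in the runner corresponds to n_hidden = 1024):

# default_stddev = sqrt(2 / (2 * n_hidden)) for the layer widths tried above
import math

for n_hidden in (512, 1024, 2048):
    print(n_hidden, math.sqrt(2.0 / (2.0 * n_hidden)))
# 512  -> 0.0441941738...
# 1024 -> 0.03125
# 2048 -> 0.0220970869...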
I manually checked the ARPA file and it looked as expected. It seems I'm missing a crucial point that is blocking any learning.