Hi, I have been training my own model on Google Colab.
Everything was working fine a few days ago, but today training suddenly stopped resuming from the checkpoints.
As you can see below, the log says the checkpoint was restored, but training starts again from epoch 0. I also tried the checkpoint from the 0.4.1 release, with the same result: it always starts from epoch 0.
Has anything in the code changed over the last 3-4 days? Before that, resuming worked without any problems.
Here is the command and the log from my run:
python -u DeepSpeech.py --train_files /content/SubGen/scripts/train.csv --dev_files /content/SubGen/scripts/dev.csv --test_files /content/SubGen/scripts/val.csv --train_batch_size 12 --dev_batch_size 12 --test_batch_size 12 --n_hidden 2048 --epoch -6 --validation_step 1 --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.1 --learning_rate 0.0001 --report_count 100 --use_seq_length False --export_dir /gdrive/My Drive/exported_models/ --checkpoint_dir /gdrive/My Drive/deepspeech-0.4.1-checkpoint
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It is easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Restored variables from most recent checkpoint at /gdrive/My Drive/deepspeech-0.4.1-checkpoint/train-14580, step 14580
I STARTING Optimization
I Training epoch 0...
2% (67 of 2916) | | Elapsed Time: 0:01:05 ETA: 0:46:00
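For what it's worth, the restored step number lines up exactly with whole epochs of my data. Here is a quick sketch of the arithmetic, assuming one global step per batch and that every step in the checkpoint was taken on this same training set at batch size 12. The helper name `epoch_from_step` is my own, not something from DeepSpeech.

```python
def epoch_from_step(global_step, steps_per_epoch):
    """Split a global step count into (completed epochs, steps into the next epoch).

    Assumes one global step per training batch and a constant number
    of batches per epoch.
    """
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be positive")
    return divmod(global_step, steps_per_epoch)


# Values taken from the log above.
restored_step = 14580   # "train-14580, step 14580"
steps_per_epoch = 2916  # "67 of 2916" in the progress bar

completed, remainder = epoch_from_step(restored_step, steps_per_epoch)
print(f"completed epochs: {completed}, steps into next epoch: {remainder}")
# prints "completed epochs: 5, steps into next epoch: 0"
```

So under those assumptions I would have expected the run to pick up at epoch 5 rather than epoch 0.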
Thanks!