Here is my problem: I’m trying to build a personal assistant for healthcare providers. The assistant only recognizes a fixed set of 23 commands. Because there are so few sentences, I want to overfit the DeepSpeech model on them so that I get high accuracy from a small amount of data.
My approach: I have 6 samples of each command, each from a different speaker. My validation and test data are the training data itself (that should overfit, right?). However, after continuing training for 3 epochs from a pretrained model, it just predicts the letter h.
The following is the log after training the model:
Computing acoustic model predictions...
100% (46 of 46) |######################################################################################################| Elapsed Time: 0:01:39 Time: 0:01:39
Decoding predictions...
100% (46 of 46) |######################################################################################################| Elapsed Time: 0:01:16 Time: 0:01:16
Test - WER: 10.146552, CER: 3.705833, loss: 196.455765
--------------------------------------------------------------------------------
WER: 24.500000, CER: 96.000000, loss: 166.814606
- src: "next patient"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h"
--------------------------------------------------------------------------------
WER: 22.000000, CER: 86.000000, loss: 123.011139
- src: "whos next"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h "
--------------------------------------------------------------------------------
WER: 21.500000, CER: 85.000000, loss: 130.863281
- src: "next patient"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h "
--------------------------------------------------------------------------------
WER: 20.500000, CER: 79.000000, loss: 116.693535
- src: "whos next"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h"
--------------------------------------------------------------------------------
WER: 20.333333, CER: 120.000000, loss: 190.458618
- src: "my first patient"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h "
--------------------------------------------------------------------------------
WER: 17.000000, CER: 98.000000, loss: 200.597824
- src: "how many appointments"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h"
--------------------------------------------------------------------------------
WER: 15.500000, CER: 119.000000, loss: 207.080734
- src: "whos my first patient"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h"
--------------------------------------------------------------------------------
WER: 14.666667, CER: 85.000000, loss: 176.761948
- src: "my first patient"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h"
--------------------------------------------------------------------------------
WER: 14.000000, CER: 81.000000, loss: 129.585342
- src: "who is next"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h "
--------------------------------------------------------------------------------
WER: 14.000000, CER: 81.000000, loss: 155.495468
- src: "who is next"
- res: "h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h h "
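A note on reading the numbers above: a WER far above 1 is possible because every extra "h" counts as an inserted word, and the edit distance is divided by the length of the reference only. The sketch below is my own reconstruction, not DeepSpeech's evaluation code. It assumes the first result is 49 repetitions of "h", which is the count the reported values imply, and it suggests that the per-sample "CER" column is a raw character edit distance rather than a ratio.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


src = "next patient"
res = " ".join(["h"] * 49)  # assumed token count for the first log entry

print(wer(src, res))            # 24.5
print(edit_distance(src, res))  # 96
```

Both values match the first entry (WER: 24.500000, CER: 96.000000), so the metrics themselves look consistent and the problem is in what the model outputs.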
Can anyone help identify the problem, whether in the training steps or in the data quantity/quality? I’m also open to any other approaches to this specific requirement. Thanks.