In my training data I have a few samples where the speaker code-switches to a word in another language that I don’t want to train on. Can I add a tag such as <UNK> so that the word is ignored when the model is trained?
train.csv
./audio/sample1.wav, 5000, I went to the restaurant and <UNK> we ordered off the menu
I’m not sure how this could work, since the CTC loss function operates at the character level. If it isn’t possible, I’ll just create two training samples around that word or remove those samples. Any help is appreciated!
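
For reference, the two fallbacks I mentioned could be sketched roughly like this. This is only a minimal sketch: it assumes each row is `path, number, transcript` as in the example above, that `<UNK>` is just a marker I add by hand (not something the training code understands), and the helper names are hypothetical. Splitting the transcript is only half the job, since the audio would also have to be cut at the matching timestamps, which this does not do.

```python
UNK_TAG = "<UNK>"  # hand-added marker, an assumption, not a built-in token


def parse_row(line):
    """Split one CSV line into (path, number, transcript).

    Only the first two commas are treated as separators, so commas
    inside the transcript are left alone.
    """
    path, number, transcript = (field.strip() for field in line.split(",", 2))
    return path, int(number), transcript


def drop_tagged(lines, tag=UNK_TAG):
    """Fallback 1: remove every sample whose transcript contains the tag."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = parse_row(line)
        if tag not in row[2]:
            kept.append(row)
    return kept


def split_transcript(transcript, tag=UNK_TAG):
    """Fallback 2: split a transcript into the text segments around the tag.

    The audio file still has to be cut into matching segments separately.
    """
    return [part.strip() for part in transcript.split(tag) if part.strip()]


if __name__ == "__main__":
    rows = [
        "./audio/sample1.wav, 5000, I went to the restaurant and <UNK> we ordered off the menu",
    ]
    print(drop_tagged(rows))
    print(split_transcript(parse_row(rows[0])[2]))
```

Dropping the samples is the simpler option if only a few are affected; splitting keeps more data but needs word-level timing for the audio.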