I am fine-tuning the pretrained DeepSpeech v0.5.1 model with a Hindi dataset.
I created alphabet.txt from the unique characters reported by util/check_characters.py, but I get the error below:
KeyError: '\u200d'
Your transcripts contain characters which do not occur in data/alphabet.txt! Use util/check_characters.py to see what characters are in your {train,dev,test}.csv transcripts, and then add all these to data/alphabet.txt.
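To find out exactly which characters are still missing, a sketch like the following compares every character used in the transcripts against alphabet.txt and prints each missing one with its code point and Unicode name, so invisible characters such as U+200D are listed explicitly instead of looking blank. It assumes the CSV files have a "transcript" column (as in DeepSpeech's wav_filename,wav_filesize,transcript layout) and that alphabet.txt holds one character per line with lines starting with "#" treated as comments. The script name find_missing.py is hypothetical.

```python
import csv
import sys
import unicodedata


def load_alphabet(path):
    """Return the set of characters listed in an alphabet file.

    Assumption: one character per line; lines starting with '#' are comments.
    Only the line ending is stripped, so a line holding a single space stays ' '.
    """
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\r\n") for line in f if not line.startswith("#")}


def read_transcripts(csv_paths, column="transcript"):
    """Yield the transcript text of every row in the given CSV files."""
    for path in csv_paths:
        with open(path, encoding="utf-8", newline="") as f:
            for row in csv.DictReader(f):
                yield row[column]


def missing_characters(transcripts, alphabet):
    """Return, sorted by code point, characters used in transcripts but absent from alphabet."""
    used = set()
    for text in transcripts:
        used.update(text)
    return sorted(used - alphabet)


def describe(ch):
    """Format a character so that invisible ones are still identifiable."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} {ch!r}"


def main(argv):
    if len(argv) < 2:
        print("usage: find_missing.py ALPHABET CSV [CSV ...]")
        return
    alphabet = load_alphabet(argv[0])
    for ch in missing_characters(read_transcripts(argv[1:]), alphabet):
        print(describe(ch))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as, for example, python find_missing.py data/alphabet.txt train.csv dev.csv test.csv. If the joiner is the culprit, one of the printed lines would be U+200D ZERO WIDTH JOINER '\u200d'.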
\u200d is the zero-width joiner (ZWJ) and \u200c is the zero-width non-joiner (ZWNJ). Both are non-printing characters.
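If these joiners only affect how conjuncts are rendered and are not needed for training (an assumption worth checking against your own data), an alternative to adding them to alphabet.txt is to remove them from the transcripts. A minimal sketch, again assuming a "transcript" column; the script name strip_zw.py is hypothetical:

```python
import csv
import sys

# Deletion table for str.translate: U+200C (ZWNJ) and U+200D (ZWJ) are removed.
ZERO_WIDTH_TABLE = str.maketrans("", "", "\u200c\u200d")


def strip_zero_width(text):
    """Remove zero-width non-joiner and joiner characters from a transcript."""
    return text.translate(ZERO_WIDTH_TABLE)


def clean_csv(src, dst, column="transcript"):
    """Copy a CSV file, stripping zero-width characters from one column."""
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row[column] = strip_zero_width(row[column])
            writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical usage: python strip_zw.py train.csv train_clean.csv
    if len(sys.argv) == 3:
        clean_csv(sys.argv[1], sys.argv[2])
    else:
        print("usage: strip_zw.py SRC_CSV DST_CSV")
```

Whichever route you take, the train, dev and test CSVs should be treated the same way so that the alphabet covers all of them.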
How do I overcome this error?