Confidence from the metadata of the native client JSON call seems to be: “roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.”
So, in order to get values between 0 … 1 should I divide by the number of alphabet characters? And will this be above 90% for a “good” translation?
Any ideas, comments? What are typical good values?