The return values seem to be only probabilities and text. How can I get the timestamp for each predicted word without using the client?
Have you read the API? The Metadata structure holds that.
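In case it helps, here is a rough sketch of turning per-character timing into per-word timestamps. It assumes you have already pulled (character, start time in seconds) pairs out of the Metadata structure; the exact field names differ between releases, so check the API docs for your version. The function name `words_with_timestamps` is made up for this example and is not part of DeepSpeech.

```python
from typing import Iterable, List, Tuple


def words_with_timestamps(
    tokens: Iterable[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Group (character, start_time) pairs into (word, word_start_time) pairs.

    Hypothetical helper: the input pairs are assumed to come from the
    per-character entries of the Metadata structure.
    """
    words: List[Tuple[str, float]] = []
    current = ""   # characters of the word being built
    start = 0.0    # start time of the first character of that word
    for char, start_time in tokens:
        if char == " ":
            # A space closes the current word, if there is one.
            if current:
                words.append((current, start))
                current = ""
            continue
        if not current:
            # First character of a new word: remember when it started.
            start = start_time
        current += char
    # Flush the last word (the transcript usually does not end with a space).
    if current:
        words.append((current, start))
    return words


if __name__ == "__main__":
    sample = [("h", 0.10), ("i", 0.20), (" ", 0.30), ("y", 0.50), ("o", 0.60)]
    for word, t in words_with_timestamps(sample):
        print(f"{t:.2f}s  {word}")
```

Each word takes the start time of its first character. If you also need end times, one option is to use the start time of the following space or of the next word.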
I know we can do this with the Python deepspeech binary, but can the following source code be modified to get both the text and the timestamps directly from a saved model? https://github.com/mozilla/DeepSpeech/blob/master/evaluate.py#L114
That info is not currently exposed in the Python bindings for the decoder, no. You’d have to modify them (native_client/ctcdecode/__init__.py) and rebuild the bindings.