How to get word timestamps from the ctc_beam_search_decoder_batch function?

The return values seem to be only probabilities and texts. How can I get the timestamp for each predicted word without using the client?

Have you read the API? The Metadata structure holds that.
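
For illustration, here is a minimal sketch of how per-character metadata can be grouped into words with start times. The `Token` class is a hypothetical stand-in with `text` and `start_time` fields; the real field names in the bindings differ between releases, so treat them as assumptions and check the API docs for your version.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for one per-character metadata entry."""
    text: str          # a single decoded character
    start_time: float  # seconds from the start of the audio


def words_with_start_times(tokens: Iterable[Token]) -> List[Tuple[str, float]]:
    """Group characters into words, keeping each word's first start_time."""
    words: List[Tuple[str, float]] = []
    current = ""
    word_start = 0.0
    for tok in tokens:
        if tok.text == " ":
            # A space closes the word collected so far, if any.
            if current:
                words.append((current, word_start))
                current = ""
            continue
        if not current:
            # First character of a new word fixes its start time.
            word_start = tok.start_time
        current += tok.text
    if current:
        # Flush the last word, which has no trailing space.
        words.append((current, word_start))
    return words


tokens = [
    Token("h", 0.00), Token("i", 0.02), Token(" ", 0.04),
    Token("y", 0.06), Token("o", 0.08), Token("u", 0.10),
]
print(words_with_start_times(tokens))
```

The same loop works on real metadata as long as each entry exposes a character and a start time.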

I know we can do this with the Python deepspeech binary, but can I modify the following source code to get both the text and the timestamps directly from a saved model? https://github.com/mozilla/DeepSpeech/blob/master/evaluate.py#L114

That info is not currently exposed in the Python bindings for the decoder, no. You’d have to modify it (native_client/ctcdecode/__init__.py) and rebuild the bindings.
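
If the rebuilt bindings did return a timestep index for each decoded character (a hypothetical return shape, not what the stock bindings give you), converting those indices to seconds would just be a multiplication by the feature window step. A rough sketch, assuming a 20 ms step; check the value your model was actually trained with:

```python
from typing import List, Sequence, Tuple

# Assumption: 20 ms between feature windows. Verify against your training flags.
FEATURE_WIN_STEP_MS = 20


def timestep_to_seconds(timestep: int, win_step_ms: int = FEATURE_WIN_STEP_MS) -> float:
    """Convert a decoder timestep index to seconds of audio."""
    return timestep * win_step_ms / 1000.0


def chars_with_times(
    chars: str,
    timesteps: Sequence[int],
    win_step_ms: int = FEATURE_WIN_STEP_MS,
) -> List[Tuple[str, float]]:
    """Pair each decoded character with its start time in seconds."""
    if len(chars) != len(timesteps):
        raise ValueError("expected one timestep per character")
    return [(c, timestep_to_seconds(t, win_step_ms)) for c, t in zip(chars, timesteps)]


# Hypothetical decoder output: one timestep index per character.
print(chars_with_times("hi", [3, 5]))
```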