Serving a model trained with DeepSpeech?

Hello there,

In the Wiki it says I can use the model with TensorFlow Serving:

> You can also use the model exported by `export` directly with TensorFlow Serving.

Is this information correct? I found comments in the GitHub Issues section saying that Serving is no longer supported.

I created a simple websocket/bottlepy server, modeled on the deepspeech-server project, for real-time STT. It works nicely with a single client, but I am wondering how to allow inference for multiple users at the same time using one model.
If I understand correctly, TensorFlow Serving would be the answer?
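
To make the question concrete, this is roughly the pattern I have in mind for sharing a single model between clients: one worker thread owns the model and all websocket handlers push requests onto a queue. This is just a minimal sketch; the `InferenceWorker` name and the `model.stt(audio)` call are assumptions based on the DeepSpeech Python bindings, not code from my actual server.

```python
import queue
import threading
from concurrent.futures import Future


class InferenceWorker:
    """Owns one model instance and serializes all stt() calls through one thread."""

    def __init__(self, model):
        # `model` is assumed to expose stt(audio), like the DeepSpeech Python bindings.
        self._model = model
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The model is only ever touched from this thread, so requests are
        # processed one after another and never race on shared state.
        while True:
            audio, future = self._requests.get()
            try:
                future.set_result(self._model.stt(audio))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, audio) -> Future:
        """Called from each websocket handler; block on .result() or poll it."""
        future = Future()
        self._requests.put((audio, future))
        return future
```

With this, every client gets a result eventually, but requests queue up behind each other, which is exactly the latency problem I am asking about.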

> Servables are the central abstraction in TensorFlow Serving. Servables are the underlying objects that clients use to perform computation (for example, a lookup or inference).

Thanks for the help :slight_smile:

We used to support TensorFlow Serving in the past, but we have stopped doing so for now. The Wiki likely needs to be updated :).

Regarding inference for multiple users, I’m not sure exactly what your problem is?

Oh I see, thanks for clearing that up!

I don’t have experience deploying servers, but I feel the server would quickly reach its limits if inference on a 5-second wave file takes about 2.5 seconds on my hardware (e.g. three or more users can’t be served concurrently without a delay?).
Maybe I’m looking at this too simplistically?
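
For what it's worth, here is the back-of-envelope calculation behind that worry. It assumes inference time scales roughly linearly with audio length on my hardware, which is just an assumption:

```python
# Capacity estimate from the numbers above.
audio_seconds = 5.0
inference_seconds = 2.5
real_time_factor = inference_seconds / audio_seconds   # 0.5

# A single model instance handling requests one after another can keep up with
# roughly this many simultaneous real-time streams before requests start queuing:
max_realtime_streams = 1 / real_time_factor            # ~2

print(f"real-time factor: {real_time_factor}, ~{max_realtime_streams:.0f} concurrent streams")
```

So with a real-time factor of about 0.5, two simultaneous streams already saturate one model instance, and a third user would start seeing delays unless I run more instances.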

@p.holetzky, I am wondering the same thing. Have you tried this in production? Any tips?
