DeepSpeech: training a new model requires a git checkout instead of the pip-installed version

It’s a bit of a nuisance to have to maintain a working git checkout if you happen to need to train your own model. Is there a reason why it’s been set up this way? For inference you can just use the latest pip version, but it will stop working if your trained model wasn’t generated by the latest version. It seems it would be better to be able to train using the pip version, so you can avoid model-mismatch issues.

Making a pip-able training version is just a lot of work for relatively few users. Most people will simply get a trained model and use it. So I don’t think this will change.

But you can simply switch both the pip and the git version to get your desired version number. The current pip version is 0.5.1, so you can switch to that in git by

git checkout tags/v0.5.1
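And on the pip side, pinning the same release should keep the two in step:

pip install deepspeech==0.5.1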

True, but maybe more users would be willing to make their own models if it were a smoother process?

I can understand that argument, but honestly, the git dependency is a very low pain point, IMHO, on the road to a model.

Someone already opened an issue on GitHub for that. As @othiele said, it’s a lot of work, and I’m not sure we have the bandwidth to take care of that as of now.


My personal pain point comes from upgrading to Python 3.8 in my project using DSAlign, which is a separate project and of course needs DeepSpeech for inference. I found the DeepSpeech interface had changed in the last month or so (something about Model no longer taking 5 arguments but just two), and after removing the extra parameters I got a different error trying to load my trained model. So I was going through the process of recreating a working recent DeepSpeech checkout for retraining a model when I realised I couldn’t avoid setting up a working git checkout from scratch each time I did this. If possible, it would be nice to just run a (say) DeepSpeechTrain script with the same arguments to prepare a new model. But the current process needs you to access the DeepSpeech.py file and execute it from the top level of the checkout to make it work.

I would like to write down a list of instructions to recreate a recent training and inference pair more or less semi-autonomously. Maybe I’ll merge my code and DSAlign into DeepSpeech, and that might help maintain it. But it’s messy to think that’s the best way. If there is a better one, I’d like to know. I’m coming at DeepSpeech from a ‘user’ perspective more than a ‘developer’ one.

I already have a repo with a Dockerfile, is this what you’re looking for? https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

Why do you absolutely need to upgrade to Python 3.8?

Well, as a user, you can install the deepspeech module pretty easily.

Right, but I don’t get the point here: why do you need training and inference in the same place?

With two separate virtualenvs, you don’t have a problem, do you?

Yes, we worked to simplify the API. If you are required to use Python 3.8, maybe it’d be easier for you to just rebuild new Python bindings against the v0.5.1 binaries?

It’s just a matter of running make -C native_client/bindings TFDIR=... with TFDIR pointing at a tree that holds bazel-bin/native_client/libdeepspeech.so, so you could reuse the v0.5.1 libdeepspeech.so.
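As a rough sketch of the whole sequence (the TFDIR path is a placeholder; the exact layout depends on where your TensorFlow build lives):

git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
git checkout tags/v0.5.1
# TFDIR must contain bazel-bin/native_client/libdeepspeech.so
make -C native_client/bindings TFDIR=/path/to/tensorflow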

It’s got some asyncio features which are useful for writing a user interface for DSAlign for my language. Since I’m working on a non-English language with almost zero labelled data (only what I hand-labelled, painstakingly!), there are about a dozen different hyperparameters to tinker with when building a good model. To do that you really do need a good GUI to see the result of changing a hyperparameter during alignment. Asyncio helps with such things. For example, if you’re familiar with Lisp: with an asyncio Python REPL you have two things simultaneously, a) a web user interface to change hyperparameters and to see results of training, and b) a REPL to make code and data changes (to the UI or the algorithm). Normally Python won’t allow you to do both of these things; either you have a REPL and no UI, or a UI and no REPL. Asyncio gives you both out of the box. It’s fantastic.

How do you mean, for inference? Yes, I can, but that’s only part of it. If you’re not looking at English, a user has to get their hands dirty :slight_smile: I’m in the minority of not having an existing model to work with, so training is actually more important than inference: since I am in the early bootstrapping phase of having no labelled data, it’s important to get forced alignment working well.

Two separate virtualenvs worked until I had to upgrade Python, and then I had to recreate the DSAlign tree, which I did, but it could no longer use my old model, so then I had to jump back into the DeepSpeech tree and try to get it working again. I did it once, so I could do it again, but I don’t think I would want to do it again and again in future when Python goes up to 4.0 or the TF version changes or something; it feels hacky. I was looking into it, and apparently there is a thing called git subtree which might be the way to go: I’ll move my code, my modified DSAlign code, and DeepSpeech into a common project directory and use a single venv. Then I’ll write some scripts to run inference, training, and alignment from their respective locations relative to the project root. What do you think?

Training a DeepSpeech model is a developer task, not a user task. A user task is doing inference on an already trained model.

I guess that’s a matter of interpretation. I have no idea how DeepSpeech works internally, nor could I develop it, but I’m using the technology, so in that sense I’m a user. Conceptually, what’s the difference? In both cases I’m running scripts from the bin/ directory, one that does inference and one that does training. I don’t end up using that inference script (except for testing!), but conceptually, if a user can run one script to do an inference, they can also run another script to do a training run. What’s the difference? Neither involves development per se.
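To make the comparison concrete, the two invocations look roughly like this (file names are placeholders, and the exact flags vary between releases):

# inference, from the pip-installed package
deepspeech --model output_graph.pbmm --audio sample.wav

# training, from the top level of the git checkout
python3 DeepSpeech.py --train_files train.csv --dev_files dev.csv --test_files test.csv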

At some point, no.

A developer who wants to use DeepSpeech for inference should just have a library to install.

A developer who wants to train a DeepSpeech model needs some understanding of machine-learning training in general, a good overview of their dataset, to ensure it is properly imported, to create a proper language model, and then to tune several parameters, all before starting serious training phases.

So yeah, pip install deepspeech-train might be cool, but the amount of work to make that properly doable, with all the edge cases, is non-trivial. On the other hand, the friction from a git clone is very, very low compared to all the requirements serious training entails.

Not that we don’t want to work on that, but we still have far more urgent tasks to tackle to improve the project.

FTR: https://github.com/mozilla/DeepSpeech/issues/2219

I honestly don’t get what your problems are.

Still don’t get that. There’s no such thing as a “minority”: I’m hacking on the French model, so I kind of understand the problem.

You can maintain a DeepSpeech virtualenv separately from your DSAlign env, using a different Python version; I really don’t understand where your problem lies here.
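Something like this, for example (Python versions and paths here are just illustrative):

python3.6 -m venv ~/venvs/deepspeech-train   # training env, matching your checkout
python3.8 -m venv ~/venvs/dsalign            # DSAlign env on the newer Python
~/venvs/dsalign/bin/pip install deepspeech   # inference package only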


Yeah I saw that issue but it was about retraining.

Fair enough.

The problem is that I used pip to install DeepSpeech into the DSAlign venv, which pulled in the latest pip release (which is fine and a good thing), but my model was trained with an older git checkout of DeepSpeech, and there were breaking changes in what DeepSpeech (inference) expects the model to be. From a software-engineering perspective it would be nice to be able to ignore changes in DeepSpeech (inference or training), treat it as a black box, and use it in a uniform way regardless of what has happened since. If the user interface is stable (and, through DeepSpeech.py and the deepspeech inference binary, let’s assume it is), then you don’t have to worry about breaking changes, since the two can be kept in sync. I’m leaning towards figuring out how git subtree works and having a DeepSpeech git subtree from which I can call training and inference from the top level of the project directory, and forget about pip install altogether.

The subtree idea failed:

$ git subtree add --prefix DeepSpeech https://github.com/mozilla/DeepSpeech master

Error downloading object: DeepSpeech/data/lm/lm.binary (e1fa680): Smudge error: Error downloading DeepSpeech/data/lm/lm.binary (e1fa6801b25912a3625f67e0f6cafcdacb24033be9fad5fa272152a0828d7193): batch request: missing protocol: "

error: external filter 'git-lfs filter-process' failed
fatal: DeepSpeech/data/lm/lm.binary: smudge filter lfs failed

I don’t even need the LM, but I’m not sure how to tell git not to bother with LFS. I guess I have to find another way. I have heard bad things about submodules, so I’m avoiding them for now.

Edit: I removed git-lfs and deleted the .git/hooks/post-checkout file, and then I was able to complete the subtree command successfully :slight_smile:
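A less invasive alternative, if I read the git-lfs docs correctly, would have been to skip the smudge step instead of uninstalling git-lfs entirely, something like:

$ GIT_LFS_SKIP_SMUDGE=1 git subtree add --prefix DeepSpeech https://github.com/mozilla/DeepSpeech master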

This is what I don’t get. You install the deepspeech inference code in the DSAlign setup, right?

I still don’t get what the problem is here.

From my perspective, it seems you want to run with Python 3.8, and the only inference code we have for that is > 0.6, which broke your existing model.

Is that right?

Here, I see several things that are just plain wrong. Currently, we don’t guarantee any stability at all.

Also, you can pip install deepspeech==x as well as git clone && git checkout origin/x, so I still don’t get how you cannot keep things in sync.
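Spelled out, with x standing in for whichever release your model was trained with:

pip install deepspeech==x   # inference side
git clone https://github.com/mozilla/DeepSpeech
cd DeepSpeech
git checkout origin/x       # training side: same release (or the matching tag, e.g. tags/v0.5.1)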