How much data are you training on? Is this epoch 2?
How many hours? Sounds good for epoch 2, train until epoch 10 or so
Good numbers and good sound, nothing to worry about, just train for longer
@erogol this may interest you, I didn’t know that IBM is using LPCNet:
http://srv-wtts.haifa.il.ibm.com/TTS-voice-conversion-IS2019/
Hello @carlfm01, could you please explain how to do these steps? I don’t know exactly how to do them. I would appreciate it.
Are these results generated by Tacotron training?
How long did Tacotron 2 training take you for the 47k steps?
Thanks. I saw that work at Interspeech, but it is a complex system with proprietary parts for the linguistic features. Still, it shows how promising LPCNet is.
Yes, I’ve tried to adapt a new speaker from male to female, but it failed. Now I’m trying a new run with a male-to-male voice. New male voice data is on the way!
Use Tacotron’s preprocess.py, then replace the audio directory it generates with the audio directory produced by your feature extraction.
The 47k audios? Yes
About 2 days using a single K80
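For anyone following the preprocess.py step above, the directory swap could be scripted roughly like this. This is only a sketch: `swap_audio_dir`, the `audio_backup` name, and the assumption that preprocess.py writes an `audio` subdirectory inside its output directory are illustrative and not confirmed in this thread, so check your own layout first.

```python
import shutil
from pathlib import Path


def swap_audio_dir(training_dir, features_dir):
    """Replace <training_dir>/audio with a copy of features_dir, keeping a backup.

    Both directory names are assumptions about the local layout.
    """
    training_dir = Path(training_dir)
    features_dir = Path(features_dir)
    audio_dir = training_dir / "audio"
    backup_dir = training_dir / "audio_backup"  # hypothetical backup name

    if not features_dir.is_dir():
        raise FileNotFoundError("feature directory not found: {}".format(features_dir))
    if backup_dir.exists():
        raise FileExistsError("backup already exists: {}".format(backup_dir))

    # Keep what preprocess.py generated so the swap can be undone.
    if audio_dir.exists():
        audio_dir.rename(backup_dir)

    # Copy the extracted LPCNet features into the place Tacotron reads from.
    shutil.copytree(str(features_dir), str(audio_dir))
    return sorted(p.name for p in audio_dir.iterdir())
```

The file names in the feature directory need to match the names listed in the metadata that preprocess.py wrote, otherwise the feeder will not find them.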
Thanks @carlfm01.
I saw that in your Spanish version of Tacotron, hparams has a sample_rate of 16000, but the data you shared is at 22050 Hz. Did you process it that way?
Yes, just make sure your header removal script converts it to 16 kHz before running ./feature_extract.sh
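Since the .s16 files are headerless, their sample rate cannot be read back from the file. One indirect check is to compare each file's size with what a 16 kHz, mono, 16-bit conversion of the source WAV should produce (duration × 16000 × 2 bytes). The sketch below assumes plain PCM WAV input that Python's `wave` module can open; the function names and the 1% tolerance are illustrative choices.

```python
import os
import wave


def expected_s16_bytes(wav_path, target_rate=16000):
    """Approximate size of the headerless 16-bit mono file after resampling."""
    with wave.open(wav_path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return int(round(duration * target_rate)) * 2  # 2 bytes per sample


def looks_like_16k(wav_path, s16_path, tolerance=0.01):
    """True if the .s16 size is within `tolerance` of the 16 kHz estimate."""
    expected = expected_s16_bytes(wav_path)
    actual = os.path.getsize(s16_path)
    return expected > 0 and abs(actual - expected) <= tolerance * expected
```

A file that was left at 22050 Hz would come out roughly 38% larger than the estimate, so it fails the check clearly.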
Result for a new-speaker fine-tune with 10k steps for Taco2 and 1 epoch for LPCNet; still training.
Too many ‘s’ sounds, like the whisper in the dataset, and the breathing is too loud.
voice adapt.zip (1,1 MB)
19h for this new speaker (I’ll share it).
Yes, Mozilla’s version with GL:
tuxmozillatts.zip (266,1 KB)
The WaveRNN fork did not converge and I’m limited in the compute power that I can spend. I guess it needs more training.
I get this error in tacotron training, do you know what it can be?
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 162, in _enqueue_next_train_group
    examples = [self._get_next_example() for i in range(n * _batches_per_group)]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 162, in
    examples = [self._get_next_example() for i in range(n * _batches_per_group)]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 196, in _get_next_example
    mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 1174, in resize
    return mu.zeros(new_shape, a.dtype)
ValueError: negative dimensions are not allowed
Exception in thread background:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 176, in _enqueue_next_test_group
    test_batches, r = self.make_test_batches()
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 145, in make_test_batches
    examples = [self._get_test_groups() for i in range(len(self._test_meta))]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 145, in
    examples = [self._get_test_groups() for i in range(len(self._test_meta))]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 129, in _get_test_groups
    mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 1174, in resize
    return mu.zeros(new_shape, a.dtype)
ValueError: negative dimensions are not allowed
There is definitely something wrong with your extracted features. Would you mind sharing the extraction scripts so I can check?
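One plausible reading of the traceback: np.resize only falls through to `mu.zeros(new_shape, ...)` in that NumPy version when the input array has no elements, so at least one feature file was probably loaded as an empty array. A quick way to look for such files is to check sizes on disk. The sketch below assumes the feeder reads each file as a flat array of 4-byte floats and reshapes it to (frames, num_mels); neither that layout nor the helper name `find_bad_feature_files` comes from this thread, so adjust `num_mels` and `bytes_per_value` to your hparams.

```python
import os


def find_bad_feature_files(feature_dir, num_mels, bytes_per_value=4):
    """Return (name, reason) pairs for files that cannot form (frames, num_mels)."""
    bad = []
    for name in sorted(os.listdir(feature_dir)):
        path = os.path.join(feature_dir, name)
        if not os.path.isfile(path):
            continue
        # Number of values the file would hold if read as flat raw floats.
        n_values = os.path.getsize(path) // bytes_per_value
        if n_values == 0:
            bad.append((name, "empty"))
        elif n_values % num_mels != 0:
            bad.append((name, "size is not a multiple of num_mels"))
    return bad
```

Any file reported as empty usually points back to a source audio file that failed to convert or is too short to produce a single frame.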
feature_extract.sh
mkdir -p /home/manuel_garcia02/LPCNet/spanish/audio/
for i in /home/manuel_garcia02/LPCNet/spanish/s16/*.s16
do
./dump_data -test "$i" /home/manuel_garcia02/LPCNet/spanish/audio/$(basename "$i" | cut -d. -f1).npy
echo $i
done
header_removal.sh
mkdir -p spanish/s16
for i in spanish/locutores/wavs/*.wav
do
sox "$i" -r 16000 -c 1 -t sw - > spanish/s16T/audio-$(basename "$i" | cut -d. -f1).s16
echo $i
done
# Merge all PCM files into a single file
mkdir -p spanish/pcm
for i in spanish/s16T/*.s16
do
cat "$i" >> spanish/pcm/final.pcm
echo $i
done
echo "Final.pcm created..."
Did you make sure this is compiled with taco=1?
Did you replace the audio directory created by preprocess.py with this?
Then it may be a broken audio file. Can you sort by duration and check whether the shortest one is correct?
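A small sketch of the duration check, assuming the source files are plain PCM WAVs that Python's `wave` module can read; `wavs_by_duration` is an illustrative name. Files that cannot be opened are given a duration of 0 so they appear at the top of the list.

```python
import os
import wave


def wavs_by_duration(wav_dir):
    """Return (seconds, filename) pairs for PCM WAV files, shortest first."""
    durations = []
    for name in os.listdir(wav_dir):
        if not name.lower().endswith(".wav"):
            continue
        try:
            with wave.open(os.path.join(wav_dir, name), "rb") as wav:
                seconds = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError):
            seconds = 0.0  # unreadable or truncated: list it first
        durations.append((seconds, name))
    return sorted(durations)
```

Printing the first few entries, for example `wavs_by_duration("spanish/locutores/wavs")[:10]`, shows the candidates worth listening to or removing.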