Final results LPCNet + Tacotron2 (Spanish)

carlfm01 · October 4, 2019, 5:15am

How much data are you training on? Is this epoch 2?

carlfm01 · October 4, 2019, 5:24am

How many hours? It sounds good for epoch 2; train until epoch 10 or so.

carlfm01 · October 4, 2019, 5:29am

Good numbers and good sound, nothing to worry about, just train for longer

carlfm01 · October 5, 2019, 5:19am

@erogol this may interest you, I didn’t know that IBM is using LPCNet:
http://srv-wtts.haifa.il.ibm.com/TTS-voice-conversion-IS2019/

manuel3265 · October 5, 2019, 5:04pm

Hello @carlfm01, could you please explain how to do these steps? I don’t know exactly how to do them. I would appreciate it.

Are these results generated by Tacotron training?

How long did Tacotron 2 training take you for the 47k steps?

erogol · October 6, 2019, 4:45pm

Thanks. I saw the work at Interspeech, but it is a complex system with proprietary parts for linguistic features. However, it shows how promising LPCNet is.

carlfm01 · October 6, 2019, 6:55pm

Yes. I tried to adapt to a new speaker from male to female but failed; now I’m trying a new run with a male-to-male voice. New male voice data is on the way!

carlfm01 · October 6, 2019, 7:02pm

Use Tacotron’s preprocess.py, then replace the generated audio directory with the audio directory produced by your feature extraction.

The 47k audios? Yes

About 2 days using a single K80
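
Before swapping the directories as described above, it may help to confirm that every file generated by preprocess.py has a counterpart among the extracted features. This is a minimal sketch, not part of either repository: it assumes one file per utterance in each directory and that matching files share the same base name. Both directory arguments are placeholders for your own paths.

```python
from pathlib import Path


def unmatched_names(generated_dir, features_dir):
    """Return base names that appear in only one of the two directories.

    Assumption: matching utterances share a file name up to the extension.
    """
    generated = {p.stem for p in Path(generated_dir).iterdir() if p.is_file()}
    features = {p.stem for p in Path(features_dir).iterdir() if p.is_file()}
    only_generated = sorted(generated - features)
    only_features = sorted(features - generated)
    return only_generated, only_features
```

If both returned lists are empty, the names line up and the swap should at least not leave the training metadata pointing at missing files.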

manuel3265 · October 7, 2019, 12:25pm

Thanks @carlfm01.

I saw that in your Spanish version of Tacotron, in hparams, you have a sample_rate of 16000, but the data you shared is at 22050 Hz. Did you process it that way?

carlfm01 · October 7, 2019, 8:56pm

Yes, just make sure your header removal script converts it to 16 kHz before you run ./feature_extract.sh
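
Because the headerless files carry no sample rate, one way to check the conversion is to compare durations. This is a sketch under the assumption that the raw output is 16-bit mono PCM: if the file really was resampled to 16 kHz, its size divided by 2 bytes × 16000 should match the duration stored in the original WAV header. A 22050 Hz file dumped without resampling would instead appear about 1.38 times longer. The function names and the tolerance are illustrative choices.

```python
import os
import wave

TARGET_RATE = 16000   # rate the raw files are expected to have
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed mono PCM


def wav_duration(wav_path):
    """Duration in seconds according to the WAV header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def raw_duration(raw_path, rate=TARGET_RATE):
    """Duration in seconds of a headerless file, if it is `rate` Hz mono 16-bit."""
    return os.path.getsize(raw_path) / (BYTES_PER_SAMPLE * rate)


def looks_resampled(wav_path, raw_path, tolerance=0.01):
    """True when the two durations agree within `tolerance` seconds."""
    return abs(wav_duration(wav_path) - raw_duration(raw_path)) <= tolerance
```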

carlfm01 · October 10, 2019, 3:01am

Result for a new speaker fine-tune with 10k steps for taco2 and 1 epoch for LPCNet; still training.
Too many ‘s’ sounds, like a whisper from the dataset, and the breathing is too loud.
voice adapt.zip (1,1 MB)

19h for this new speaker (I’ll share)

manuel3265 · October 10, 2019, 6:29am

Wow @carlfm01. Nice.

Have you tried Tacotron training without LPCNet? If so, are the results good?

carlfm01 · October 10, 2019, 6:51am

Yes, Mozilla’s version with Griffin-Lim (GL):

tuxmozillatts.zip (266,1 KB)

The WaveRNN fork did not converge, and I’m limited in the compute power I can spend. I guess it needs more training.

manuel3265 · October 10, 2019, 7:23am

@carlfm01

I get this error in Tacotron training. Do you know what it could be?

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 162, in _enqueue_next_train_group
    examples = [self._get_next_example() for i in range(n * _batches_per_group)]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 162, in <listcomp>
    examples = [self._get_next_example() for i in range(n * _batches_per_group)]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 196, in _get_next_example
    mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 1174, in resize
    return mu.zeros(new_shape, a.dtype)
ValueError: negative dimensions are not allowed
Exception in thread background:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 176, in _enqueue_next_test_group
    test_batches, r = self.make_test_batches()
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 145, in make_test_batches
    examples = [self._get_test_groups() for i in range(len(self._test_meta))]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 145, in <listcomp>
    examples = [self._get_test_groups() for i in range(len(self._test_meta))]
  File "/home/manuel_garcia02/Tacotron-2/tacotron/feeder.py", line 129, in _get_test_groups
    mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 1174, in resize
    return mu.zeros(new_shape, a.dtype)
ValueError: negative dimensions are not allowed

carlfm01 · October 10, 2019, 7:37am

Definitely something is wrong with your extracted features. Would you mind sharing the extraction scripts so I can check?
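
One way to look for bad feature files, offered as a sketch and not as the fork’s own tooling: the traceback shows the feeder reshaping each target into rows of `num_mels` values, so a file that is empty, or whose size is not a whole number of rows, is suspect. The final frame (`return mu.zeros(new_shape, a.dtype)`) looks consistent with an empty input, though that is an inference from the traceback and not something confirmed in the thread. The code assumes the feature files are raw 32-bit floats and uses 20 values per frame; both are assumptions to check against your own hparams and dump_data build.

```python
import os

FLOAT_BYTES = 4  # assumption: features are stored as raw float32


def suspicious_feature_files(features_dir, num_mels=20):
    """Yield (name, reason) for files that cannot form rows of num_mels values.

    num_mels=20 is an assumed value; use the one from your hparams.
    """
    frame_bytes = FLOAT_BYTES * num_mels
    for name in sorted(os.listdir(features_dir)):
        path = os.path.join(features_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        if size == 0:
            yield name, "empty file"
        elif size % frame_bytes != 0:
            yield name, "size is not a whole number of frames"
```

Running `list(suspicious_feature_files(...))` on the extracted feature directory should return an empty list for a clean extraction.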

manuel3265 · October 10, 2019, 7:37am

@carlfm01 can you give me your pip3 list please?
I would greatly appreciate it

manuel3265 · October 10, 2019, 7:42am

@carlfm01

feature_extract.sh

mkdir -p /home/manuel_garcia02/LPCNet/spanish/audio/
for i in /home/manuel_garcia02/LPCNet/spanish/s16/*.s16
do
    ./dump_data -test "$i" "/home/manuel_garcia02/LPCNet/spanish/audio/$(basename "$i" | cut -d. -f1).npy"
    echo "$i"
done

header_removal.sh

mkdir -p spanish/s16
for i in spanish/locutores/wavs/*.wav
do
    sox "$i" -r 16000 -c 1 -t sw - > "spanish/s16T/audio-$(basename "$i" | cut -d. -f1).s16"
    echo "$i"
done
## merge all PCM into a single file
mkdir -p spanish/pcm
for i in spanish/s16T/*.s16
do
    cat "$i" >> spanish/pcm/final.pcm
    echo "$i"
done
echo "Final.pcm created..."

carlfm01 · October 10, 2019, 7:43am

Did you make sure this is compiled with taco=1?

Did you replace the audio directory created by preprocess.py with this one?

manuel3265 · October 10, 2019, 7:46am

@carlfm01 Yes, I did all this

carlfm01 · October 10, 2019, 7:49am

Then it may be a broken audio file. Can you sort by duration and check whether the shortest one is correct?
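
Sorting by duration can be sketched as follows. Since the sox call in header_removal.sh writes 16 kHz, mono, signed 16-bit samples (`-r 16000 -c 1 -t sw`), the duration of each raw file is its size in bytes divided by 2 × 16000. The function name is illustrative. Note also that, as posted, header_removal.sh creates spanish/s16 but writes to spanish/s16T, while feature_extract.sh reads from spanish/s16, so it is worth confirming which directory actually holds the files.

```python
import os

SAMPLE_RATE = 16000   # from the sox call: -r 16000
BYTES_PER_SAMPLE = 2  # from the sox call: -t sw (signed 16-bit), -c 1 (mono)


def clips_by_duration(raw_dir, suffix=".s16"):
    """Return (seconds, file name) pairs for raw clips, shortest first."""
    clips = []
    for name in os.listdir(raw_dir):
        if name.endswith(suffix):
            size = os.path.getsize(os.path.join(raw_dir, name))
            clips.append((size / (BYTES_PER_SAMPLE * SAMPLE_RATE), name))
    return sorted(clips)
```

The first few entries of the returned list are the clips to listen to first; a duration of 0.0 means the file is empty.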