It returned ‘0’ in both cases with sudo, so it is False in both cases (nothing was executed); the model still doesn’t work.
I am not sure a virtual env would help if you couldn’t do it with sudo.
So, any other suggestion?
I don’t have enough information to go on here. The only thing I can offer is my time. If you don’t mind, give me access to a VM or something safe so I can try to set it up in your environment.
Problem solution (thank you, @alchemi5t):
- reinstall all packages and requirements,
- I was missing pip (I only had pip3);
- of course, Python 3.x is necessary,
- no CUDA required, although I have ‘only’ 12 GB of RAM, which might be an obstacle, since training uses the whole ‘arsenal’.
- I shouldn’t use Jupyter Notebook, since it has obvious problems with access.
- PYTHONPATH - new test environment
- TTS from GitHub is under test
- LS data set can be anywhere; it just needs to be ‘linked’ in config.json; of course, metadata.csv has to be split into training and validation sets
- I am attaching my config.json file: config.json
The last two lines should point to the training and validation sets:
"meta_file_train": "metadata_train.csv",
"meta_file_val": "metadata_val.csv",
but I haven’t split them yet.
EDIT: it crashes from the 68th epoch on, but I don’t know why…
Several notifications, among them:
| > Synthesizing test sentences
!! Error creating Test Sentence - 0
....
OSError: [Errno 12] Cannot allocate memory
| > Training Loss: 0.06097 Validation Loss: 0.07554
I have 8 threads and ~12 GB of RAM.
Below you can find the config.json file, but for the Polish language.
My data set contains 1271 samples and it is split into 1144 and 127 for training and evaluation.
config.json
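A minimal sketch of the train/validation split described above, assuming metadata.csv holds one utterance per line (as in LJSpeech-style metadata). The function names and the 10% validation fraction are illustrative choices, not part of TTS; with 1271 rows they give the same 1144/127 split.

```python
import random


def split_metadata(lines, val_fraction=0.1, seed=0):
    """Shuffle metadata rows and split them into (train, val) lists."""
    rows = [ln for ln in lines if ln.strip()]   # drop blank lines
    rng = random.Random(seed)                   # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_val = int(len(rows) * val_fraction)       # e.g. 1271 rows give 127 validation rows
    return rows[n_val:], rows[:n_val]


def write_split(src="metadata.csv",
                train_out="metadata_train.csv",
                val_out="metadata_val.csv"):
    """Read src and write the two files named in config.json."""
    with open(src, encoding="utf-8") as f:
        train, val = split_metadata(f.read().splitlines())
    for path, rows in ((train_out, train), (val_out, val)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(rows) + "\n")
```

Calling write_split() in the dataset folder would produce the two files referenced by "meta_file_train" and "meta_file_val".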
Hi shad,
I’ve read your DM. I’ll try to help you online first before I get working on it myself. Could you post a little bit more of the log?
Also, if it’s only for training for your thesis, I might be able to train and give you a model for the config you give along with the data.
Hi there. Here is some ‘news’ from the log:
> Epoch 96/100
| > Step:17/141 GlobalStep:13650 PostnetLoss:0.06650 DecoderLoss:0.07570 StopLoss:0.47290 AlignScore:0.0880 GradNorm:0.24001 GradNormST:0.32928 AvgTextLen:21.8 AvgSpecLen:165.5 StepTime:0.73 LoaderTime:0.29 LR:0.000054
| > Step:42/141 GlobalStep:13675 PostnetLoss:0.05865 DecoderLoss:0.06999 StopLoss:0.27385 AlignScore:0.0626 GradNorm:0.13811 GradNormST:0.12845 AvgTextLen:30.4 AvgSpecLen:180.6 StepTime:0.88 LoaderTime:0.31 LR:0.000054
| > Step:67/141 GlobalStep:13700 PostnetLoss:0.05929 DecoderLoss:0.06929 StopLoss:0.58532 AlignScore:0.0471 GradNorm:0.16014 GradNormST:0.47606 AvgTextLen:40.6 AvgSpecLen:228.6 StepTime:1.02 LoaderTime:0.35 LR:0.000054
| > Step:92/141 GlobalStep:13725 PostnetLoss:0.06289 DecoderLoss:0.07570 StopLoss:0.26959 AlignScore:0.0346 GradNorm:0.21866 GradNormST:0.09893 AvgTextLen:56.6 AvgSpecLen:307.8 StepTime:1.32 LoaderTime:0.44 LR:0.000054
| > Step:117/141 GlobalStep:13750 PostnetLoss:0.06960 DecoderLoss:0.08432 StopLoss:0.33354 AlignScore:0.0241 GradNorm:0.18223 GradNormST:0.12412 AvgTextLen:80.1 AvgSpecLen:484.0 StepTime:1.74 LoaderTime:0.60 LR:0.000054
| > EPOCH END -- GlobalStep:13774 AvgTotalLoss:0.06121 AvgPostnetLoss:0.07169 AvgDecoderLoss:0.39223 AvgStopLoss:0.05383 EpochTime:187.22 AvgStepTime:1.32 AvgLoaderTime:0.43
> Validation
| > TotalLoss: 1.18051 PostnetLoss: 0.07172 - 0.07172 DecoderLoss:0.08944 - 0.08944 StopLoss: 1.01935 - 1.01935 AlignScore: 0.0957 : 0.0957
| > TotalLoss: 0.62240 PostnetLoss: 0.07491 - 0.07226 DecoderLoss:0.09224 - 0.08981 StopLoss: 0.45526 - 0.68541 AlignScore: 0.0224 : 0.0452
warning: audio amplitude out of range, auto clipped.
| > Synthesizing test sentences
!! Error creating Test Sentence - 0
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 1
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 2
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 3
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
| > Training Loss: 0.06121 Validation Loss: 0.07319
> Number of outputs per iteration: 5
> Epoch 97/100
| > Step:0/141 GlobalStep:13775 PostnetLoss:0.06069 DecoderLoss:0.07308 StopLoss:0.82969 AlignScore:0.2031 GradNorm:0.23467 GradNormST:0.34403 AvgTextLen:9.1 AvgSpecLen:114.1 StepTime:0.57 LoaderTime:0.23 LR:0.000054
| > Step:25/141 GlobalStep:13800 PostnetLoss:0.05430 DecoderLoss:0.06320 StopLoss:0.39068 AlignScore:0.0853 GradNorm:0.20467 GradNormST:0.28177 AvgTextLen:23.5 AvgSpecLen:165.0 StepTime:0.81 LoaderTime:0.29 LR:0.000054
| > Step:50/141 GlobalStep:13825 PostnetLoss:0.06497 DecoderLoss:0.07878 StopLoss:0.68616 AlignScore:0.0598 GradNorm:0.12579 GradNormST:0.48869 AvgTextLen:33.1 AvgSpecLen:205.6 StepTime:0.88 LoaderTime:0.33 LR:0.000054
| > Step:75/141 GlobalStep:13850 PostnetLoss:0.05773 DecoderLoss:0.06606 StopLoss:0.25092 AlignScore:0.0441 GradNorm:0.15950 GradNormST:0.20841 AvgTextLen:44.0 AvgSpecLen:245.8 StepTime:1.13 LoaderTime:0.37 LR:0.000054
| > Step:100/141 GlobalStep:13875 PostnetLoss:0.06377 DecoderLoss:0.07580 StopLoss:0.26661 AlignScore:0.0319 GradNorm:0.13651 GradNormST:0.13730 AvgTextLen:61.8 AvgSpecLen:354.8 StepTime:1.47 LoaderTime:0.48 LR:0.000054
| > Step:125/141 GlobalStep:13900 PostnetLoss:0.06776 DecoderLoss:0.08022 StopLoss:0.24092 AlignScore:0.0209 GradNorm:0.21876 GradNormST:0.12476 AvgTextLen:93.2 AvgSpecLen:496.5 StepTime:1.91 LoaderTime:0.62 LR:0.000054
| > EPOCH END -- GlobalStep:13916 AvgTotalLoss:0.06130 AvgPostnetLoss:0.07161 AvgDecoderLoss:0.39027 AvgStopLoss:0.05399 EpochTime:187.54 AvgStepTime:1.32 AvgLoaderTime:0.43
> Validation
| > TotalLoss: 1.20810 PostnetLoss: 0.07182 - 0.07182 DecoderLoss:0.08552 - 0.08552 StopLoss: 1.05076 - 1.05076 AlignScore: 0.0975 : 0.0975
| > TotalLoss: 0.54540 PostnetLoss: 0.07758 - 0.07427 DecoderLoss:0.09452 - 0.09072 StopLoss: 0.37330 - 0.63299 AlignScore: 0.0227 : 0.0458
warning: audio amplitude out of range, auto clipped.
| > Synthesizing test sentences
!! Error creating Test Sentence - 0
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 1
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 2
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
!! Error creating Test Sentence - 3
Traceback (most recent call last):
File "train.py", line 482, in evaluate
style_wav=style_wav)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 103, in synthesis
inputs = text_to_seqvec(text, CONFIG, use_cuda)
File "/home/marta/Desktop/inz/test/TTS/utils/synthesis.py", line 12, in text_to_seqvec
CONFIG.enable_eos_bos_chars),
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 57, in phoneme_to_sequence
to_phonemes = text2phone(clean_text, language)
File "/home/marta/Desktop/inz/test/TTS/utils/text/__init__.py", line 31, in text2phone
ph = phonemize(text, separator=seperator, strip=False, njobs=1, backend='espeak', language=language)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/phonemize.py", line 149, in phonemize
logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 42, in __init__
super(self.__class__, self).__init__(language, logger=logger)
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/base.py", line 43, in __init__
'initializing backend %s-%s', self.name(), self.version())
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 104, in version
long_version = cls.long_version()
File "/home/marta/.local/lib/python3.6/site-packages/phonemizer/backend/espeak.py", line 92, in long_version
'{} --help'.format(cls.espeak_exe()), posix=False)).decode(
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
| > Training Loss: 0.06130 Validation Loss: 0.07542
> Number of outputs per iteration: 5
In my thesis I want to present different results from different settings, but I still have memory problems, and Google Cloud offers (for free) only 13 GB, also with 8 threads.
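Note that the traceback ends inside subprocess.Popen, that is, at the point where Python tries to start the espeak process, not inside the model. On Linux, starting a child process from a large parent can fail with Errno 12 when the kernel cannot commit enough memory, so free RAM and swap at that moment are worth checking. A minimal sketch, assuming a Linux machine with /proc/meminfo; the function names are hypothetical and not part of TTS:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_report(path="/proc/meminfo"):
    """Return total/available RAM and swap in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    fields = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {name: round(info.get(name, 0) / 1024 ** 2, 2) for name in fields}
```

Printing memory_report() right before the test sentences are synthesized would show whether available memory is close to zero and whether any swap is configured at all.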
The way you would like to exchange my dataset is up to you.
Ugh, the thing is that my time is so limited.
I should be done with training within 3 weeks.
Oh, 3 weeks is going to be hard. I am tied up with my work right now. I thought you’d at least have a couple of months.
It doesn’t have to be perfect though.
But yeah, it might be tough. However, the set is 10x smaller than LJ-speech. So, the sooner we start, the better.
I should have started earlier, in November, and approached you then. I’m sorry.
Anyway, I believe I should also have changed best_model_config.json, since it is set up more for English than for Polish, and it has fs = 20000, not 16000.
Where else should I make changes? In TTS there is a folder called TTS/tests,
and I believe I should make major updates there, too.
And this one: TTS/mozilla_us_phonemes <- should I make my own phoneme folder?
Yes, you’ll have to change the config to match your dataset.
No changes in tests folder.
I am sorry my replies are taking this long. I am really tied up at work. I’ll try and help you out as soon as I get some breathing space.
The phonemes are created in the first epoch, so you don’t have to create your own.
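On the config change discussed above (fs = 20000 in best_model_config.json against 16000): before editing the config, it can help to confirm what sample rate the recordings really have. A minimal sketch using only the standard library, assuming the dataset consists of uncompressed PCM WAV files; the function names are hypothetical, and the expected rate is passed in by hand rather than read from the config:

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def find_mismatched(paths, expected_rate):
    """Return [(path, rate)] for files whose rate differs from expected_rate."""
    mismatched = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != expected_rate:
            mismatched.append((path, rate))
    return mismatched
```

An empty result from find_mismatched(all_wav_paths, 16000) would mean every file matches the rate you intend to put in the config; any listed file needs resampling or a different setting.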
Anyway, I tried changing best_model_config.json, and the results are still poor (just noise), and there is still a problem with creating test sentences; I am making a mistake somewhere, but I don’t know where.
The model itself, for 100 epochs with a batch size of 6 and a test batch size of 2, now takes ~9 hours to train.
Oh! A batch_size of 6 isn’t going to get you anywhere. The config has a comment saying anything less than 32 has a hard time converging. Also, your dataset is key to training a TTS model. It needs to be clean and consistent.
As you might remember, I cannot do anything with such a batch, because it is too large for my computer. I can share my dataset with you, which (I believe) I prepared well. (pass is alchemist; it will be ‘alive’ for 7 days)
TTS dataset in Polish
Btw, I only have 4 GB of GPU memory on my computer.
I’ve tried downloading it but the download refuses to start. I’ve tried on different networks and browsers, no luck anywhere.
What you can also do for GPUs with little memory is gradient aggregation. It is not implemented in TTS, but it is quite easy to add. And it’d be a good PR as well.
To be clearer: you run your small batches for N iterations and accumulate their gradients. After you reach N batches, you apply a single weight update with the accumulated gradients.
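The idea can be sketched in plain Python on a toy model y = w * x with a mean-squared-error loss. This is only an illustration of the technique, not code from TTS; the function names are hypothetical. In PyTorch the same pattern is usually written by dividing each small-batch loss by N, calling loss.backward() on every small batch (gradients add up in .grad), and calling optimizer.step() followed by optimizer.zero_grad() only once every N batches.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, micro_batches, lr):
    """One weight update from gradients accumulated over several small batches."""
    total = sum(len(batch) for batch in micro_batches)
    grad = 0.0
    for batch in micro_batches:
        # Weight each small batch by its share of the effective batch.
        grad += grad_mse(w, batch) * len(batch) / total
    return w - lr * grad  # single update after all small batches


data = [(float(x), 3.0 * x) for x in range(1, 13)]    # 12 samples of y = 3x
micro = [data[i:i + 4] for i in range(0, 12, 4)]      # three small batches of 4
w_full = 0.5 - 0.001 * grad_mse(0.5, data)            # update from one batch of 12
w_accumulated = accumulated_step(0.5, micro, 0.001)   # same update, less memory per pass
```

Here w_full and w_accumulated agree up to floating-point rounding, which is the point: six small batches of 6 would give an effective batch of 36 while only 6 samples are in memory at a time. One caveat is that layers relying on batch statistics, such as batch normalization, still see only the small batch.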