Training consumes over 60 GB of memory #16

Open · michaelkrzyzaniak opened this issue Aug 10, 2018 · 2 comments

Comments

@michaelkrzyzaniak

Cool project, thanks for making it available. I pulled the code and the LJSpeech dataset. I prepared the dataset and began training with the default parameters, using the commands at the top of the readme. After printing the line

INFO:tensorflow:Calculate initial statistics.

Python3's memory usage grew to almost 30 GB. After the initial statistics were calculated, the memory usage dropped back to about 1 or 2 GB, and then after

INFO:tensorflow:global_step/sec: 0

it rose steadily to 60 (sixty) GB, at which point my OS killed the process. Is this normal? The saved model checkpoint is only 1.2 GB.

I'm using macOS 10.13.4 (High Sierra), Python 3.6.5, TensorFlow 1.9.0 (CPU only), and librosa 0.6.1. I had similar results on an Ubuntu 14 machine with TensorFlow GPU, where I killed the program after it reached 32 GB.

@michaelkrzyzaniak (Author)

On further inspection, I think this is just because the model is very large by default. config_jsons/wavenet_mol.json has

"num_stages": 10,
"num_layers": 30,

compared to https://github.com/ibab/tensorflow-wavenet/blob/master/wavenet_params.json, which by default has

"dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
              1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
              1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
              1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
              1, 2, 4, 8, 16, 32, 64, 128, 256, 512
              ],

Would that correspond to this in your project:

"num_stages": 10,
"num_layers": 5,

Is that correct? Running your code this way, memory usage seems to peak at about 20 GB. (By comparison, tensorflow-wavenet stays around 5 GB.) Large, but workable.
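
For reference, here is a minimal sketch of how a (num_stages, num_layers) pair might expand into a dilation schedule. The mapping dilation = 2 ** (layer % num_stages) is an assumption used for illustration, not taken from this repository:

# Hypothetical mapping from (num_stages, num_layers) to a dilation list,
# assuming each layer's dilation is 2 ** (layer index mod num_stages).
def build_dilations(num_stages, num_layers):
    return [2 ** (i % num_stages) for i in range(num_layers)]

# The 50-entry list in ibab/tensorflow-wavenet (five cycles of 1..512)
# would then correspond to num_stages=10, num_layers=50:
print(build_dilations(10, 50))

# The default config_jsons/wavenet_mol.json (num_stages=10, num_layers=30)
# would give three cycles of 1..512:
print(build_dilations(10, 30))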

@bfs18 (Owner) commented Aug 12, 2018

Hi, I have a desktop (32 GB RAM, GTX 1080 Ti, 8700K, Ubuntu 16.04, Python 3.6, TensorFlow 1.8). I run train_wavenet.py with the default config and it never exceeds the memory limit.
Memory usage also depends on batch size, so you can start with a small batch size.
The default config in tensorflow-wavenet should correspond to

"num_stages": 10,
"num_layers": 50,

If the initialization consumes too much memory, just set data_dep_init_fn = None in train_wavenet.py.
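
A minimal sketch of what that might look like; only the name data_dep_init_fn comes from the comment above, and the surrounding call site is an assumption about how train_wavenet.py is organized, not copied from it:

import tensorflow as tf  # the thread uses TF 1.x (1.8 / 1.9)

# Sketch only: disabling the data-dependent initialization pass.
data_dep_init_fn = None  # skips the "Calculate initial statistics." step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    if data_dep_init_fn is not None:
        # This pass is what drives the large memory spike at startup.
        data_dep_init_fn(sess)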
