
When I run train.py on the COCO dataset with the resnet-101 model, it's stuck for a long time. #59

Open
lji72 opened this issue Dec 4, 2018 · 3 comments

lji72 commented Dec 4, 2018

```
/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:97: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Restoring parameters from /home/liuji/light_head_rcnn/data/imagenet_weights/res101.ckpt

^CTraceback (most recent call last):
  File "train.py", line 264, in <module>
    train(args)
  File "train.py", line 186, in train
    blobs_list = prefetch_data_layer.forward()
  File "/home/liuji/light_head_rcnn/lib/utils/dpflow/prefetching_iter.py", line 78, in forward
    if self.iter_next():
  File "/home/liuji/light_head_rcnn/lib/utils/dpflow/prefetching_iter.py", line 65, in iter_next
    e.wait()
  File "/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/home/liuji/anaconda3/envs/tensorflow/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()
```

Hello, I've run into this problem. Could you give a more detailed solution? Thanks.

@XingLiuJia

Hello, can you send me the COCO dataset? Thank you. Email: [email protected]

@mbruchalski1

I'm having the same problem. The dataset, JSON, and odgt files all look good, and I'm able to run the test code, but I can't tell what the problem is since the error message isn't detailed. Does anyone have a solution for this issue, or does the code only work for evaluation and not for training?

@masotrix

I solved it by adjusting "nr_dataflow" in config.py (in the experiment folder you are training from, per README.md) from 16 to 2 for the 1-GPU case, because train_batch_per_gpu = 2 (so 8 GPUs x 2 images = 16, while 1 GPU x 2 images = 2); see the sketch below. Hope this helps you ✌️
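
For reference, a minimal sketch of the relevant config.py lines, assuming the names nr_dataflow and train_batch_per_gpu from the comment above match the repository's config (nr_gpus is a hypothetical name used here only to show the arithmetic):

```python
# config.py (sketch): nr_dataflow should equal the total number of images
# consumed per training step, i.e. number of GPUs x train_batch_per_gpu.
# The shipped value of 16 assumes 8 GPUs x 2 images; with fewer GPUs the
# prefetching iterator otherwise waits forever on e.wait() for batches
# that are never consumed, which is the hang seen in the traceback above.

nr_gpus = 1                  # GPUs you actually train on (assumed name)
train_batch_per_gpu = 2      # images per GPU per step

nr_dataflow = nr_gpus * train_batch_per_gpu   # 1 x 2 = 2, not the default 16
```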
