There seems to be a compatibility problem with tensorflow-gpu 1.4.1. train.py runs under tf-gpu 1.2.1 with some warnings. Under tf-gpu 1.4.1, however, it always fails with an error that traces back to
File "train/../libs/layers/wrapper.py", line 172, in assign_boxes
inds = tf.where(tf.equal(assigned_layers, l))
The problem disappears, with repeated "no CUDA-capable device is detected" messages, if we set CUDA_VISIBLE_DEVICES="". We have CentOS 7.2.1511 and an NVIDIA K40c with driver 384.111, CUDA V8.0.61 and cuDNN 5.1.10.
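The CPU-only workaround described above can be reproduced from a small launcher. This is a minimal sketch, not part of the repository: `cpu_only_env` and `run_cpu_only` are hypothetical helper names, and it only relies on the fact that an empty CUDA_VISIBLE_DEVICES hides all GPUs from the child process.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every GPU hidden from CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value means "no visible devices", so TensorFlow falls back to CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(argv):
    """Run a command with GPUs hidden and return its exit code."""
    return subprocess.call(argv, env=cpu_only_env())


# Hypothetical usage (any flags train.py needs are omitted here):
#   run_cpu_only([sys.executable, "train/train.py"])
```

Because only the child process sees the modified environment, the shell that launches it keeps its own GPU settings. This sidesteps the failing GPU kernel at the cost of training on the CPU; it does not fix the underlying problem.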
The full log of the failing run is attached here:
P2P3P4P5
anchor_scales = [8, 16, 32]
anchor_scales = [4, 8, 16]
anchor_scales = [2, 4, 8]
anchor_scales = [1, 2, 4]
5432
WARNING:tensorflow:From train/train.py:224: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-03-02 10:56:24.945523: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-02 10:56:25.343999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:06:00.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-03-02 10:56:25.344083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:06:00.0, compute capability: 3.5)
--restore_previous_if_exists is set, but failed to restore in ./output/mask_rcnn/ None
restoring resnet_v1_50/conv1/weights:0restoring resnet_v1_50/conv1/BatchNorm/gamma:0restoring resnet_v1_50/conv1/BatchNorm/beta:0restoring resnet_v1_50/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/weights:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0restoring 
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring 
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/weights:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/weights:0restoring 
resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0restoring 
resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/beta:0restoring 
resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/weights:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring 
resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring 
resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/weights:0restoring 
resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/weights:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0restoring 
resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0restoring 
resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/weights:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/weights:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/BatchNorm/moving_variance:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/weights:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/moving_mean:0restoring resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/moving_variance:0restoring resnet_v1_50/logits/weights:0restoring resnet_v1_50/logits/biases:0
Restored 267(640) vars from ./data/pretrained_models/resnet_v1_50.ckpt
2018-03-02 10:56:39.155490: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
2018-03-02 10:56:39.155816: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
    [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
2018-03-02 10:56:39.156089: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
    [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
2018-03-02 10:56:39.156227: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
    [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
Traceback (most recent call last):
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 271, in train
    [input_image] + [final_box] + [final_cls] + [final_prob] + [final_gt_cls] + [gt] + [tmp_0] + [tmp_1] + [tmp_2] + [tmp_3] + [tmp_4])
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
    [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
    [[Node: pyramid_2/OneHotEncoding_4/one_hot/_1183 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9981_pyramid_2/OneHotEncoding_4/one_hot", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'pyramid_1/AssignGTBoxes/Where_4', defined at:
  File "train/train.py", line 339, in <module>
    train()
  File "train/train.py", line 193, in train
    loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
  File "train/../libs/nets/pyramid_network.py", line 580, in build
    is_training=is_training, gt_boxes=gt_boxes)
  File "train/../libs/nets/pyramid_network.py", line 263, in build_heads
    assign_boxes(rois, [rois, batch_inds], [2, 3, 4, 5])
  File "train/../libs/layers/wrapper.py", line 172, in assign_boxes
    inds = tf.where(tf.equal(assigned_layers, l))
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 2439, in where
    return gen_array_ops.where(input=condition, name=name)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5930, in where
    "Where", input=input, name=name)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/huwh1/virtualenv/tf-1.4/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
    [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:localhost/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_4/_1027)]]
    [[Node: pyramid_2/OneHotEncoding_4/one_hot/_1183 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_9981_pyramid_2/OneHotEncoding_4/one_hot", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
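In a log of this length it can help to pull out just the graph nodes named in the error, together with the device each one was placed on. The following is a rough sketch that assumes the `[[Node: name = Op[... _device="..."]` layout seen in the log above; `failing_nodes` is a hypothetical helper, not a TensorFlow API.

```python
import re

# Matches entries such as:
#   [[Node: pyramid_1/AssignGTBoxes/Where_4 = Where[_device="/job:.../device:GPU:0"](...)]]
# The lookbehind skips attributes like recv_device= and send_device=.
NODE_RE = re.compile(
    r'\[\[Node: (\S+) = (\w+)\[[^\]]*?(?<!\w)_device="([^"]+)"'
)


def failing_nodes(log_text):
    """Return unique (node_name, op_type, device) tuples in order of first appearance."""
    seen = set()
    result = []
    for match in NODE_RE.finditer(log_text):
        entry = match.groups()
        if entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result
```

Applied to the log above, this should list the `Where` node on `device:GPU:0` first, which is the op created by the `tf.where` call in wrapper.py.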