I'm training a top-down multi-animal model on a computing cluster. The centroid model trained without issues, but I run into memory issues when training the centered_instance model. I've trained a number of top-down multi-animal models before, and this is the first time I've had this problem.

I assume it's related to the reported shape of [15,9999680,9999680,1], but I don't understand where that comes from, given that the input shape for the centroid model was just (768, 960, 1). Data that large would obviously require a lot of memory, but I don't see why it is that large in the first place. Even if I reduce the input scaling to 0.5, the shape is still very big. Any advice would be appreciated, thanks!

I have tried the following GPUs:

Here's the output:

INFO:sleap.nn.training:Versions:
SLEAP: 1.3.3
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.7.16
OS: Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-redhat-9.3-Blue_Onyx
INFO:sleap.nn.training:Training labels file: 240730_combo_01.pkg.slp
INFO:sleap.nn.training:Training profile: centroid.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
"training_job_path": "centroid.json",
"labels_path": "240730_combo_01.pkg.slp",
"video_paths": [
""
],
"val_labels": null,
"test_labels": null,
"base_checkpoint": null,
"tensorboard": false,
"save_viz": false,
"zmq": false,
"run_name": "240730_sca006",
"prefix": "",
"suffix": "",
"cpu": false,
"first_gpu": false,
"last_gpu": false,
"gpu": "auto"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
"data": {
"labels": {
"training_labels": null,
"validation_labels": null,
"validation_fraction": 0.1,
"test_labels": null,
"split_by_inds": false,
"training_inds": null,
"validation_inds": null,
"test_inds": null,
"search_path_hints": [],
"skeletons": []
},
"preprocessing": {
"ensure_rgb": false,
"ensure_grayscale": true,
"imagenet_mode": null,
"input_scaling": 0.75,
"pad_to_stride": null,
"resize_and_pad_to_target": true,
"target_height": null,
"target_width": null
},
"instance_cropping": {
"center_on_part": "SB_Ant",
"crop_size": null,
"crop_size_detection_padding": 16
}
},
"model": {
"backbone": {
"leap": null,
"unet": {
"stem_stride": null,
"max_stride": 16,
"output_stride": 2,
"filters": 16,
"filters_rate": 2.0,
"middle_block": true,
"up_interpolate": true,
"stacks": 1
},
"hourglass": null,
"resnet": null,
"pretrained_encoder": null
},
"heads": {
"single_instance": null,
"centroid": {
"anchor_part": "SB_Ant",
"sigma": 2.5,
"output_stride": 2,
"loss_weight": 1.0,
"offset_refinement": false
},
"centered_instance": null,
"multi_instance": null,
"multi_class_bottomup": null,
"multi_class_topdown": null
},
"base_checkpoint": null
},
"optimization": {
"preload_data": true,
"augmentation_config": {
"rotate": true,
"rotation_min_angle": -180.0,
"rotation_max_angle": 180.0,
"translate": false,
"translate_min": -5,
"translate_max": 5,
"scale": false,
"scale_min": 0.9,
"scale_max": 1.1,
"uniform_noise": false,
"uniform_noise_min_val": 0.0,
"uniform_noise_max_val": 10.0,
"gaussian_noise": false,
"gaussian_noise_mean": 5.0,
"gaussian_noise_stddev": 1.0,
"contrast": false,
"contrast_min_gamma": 0.5,
"contrast_max_gamma": 2.0,
"brightness": false,
"brightness_min_val": 0.0,
"brightness_max_val": 10.0,
"random_crop": false,
"random_crop_height": 256,
"random_crop_width": 256,
"random_flip": true,
"flip_horizontal": false
},
"online_shuffling": true,
"shuffle_buffer_size": 128,
"prefetch": true,
"batch_size": 8,
"batches_per_epoch": null,
"min_batches_per_epoch": 200,
"val_batches_per_epoch": null,
"min_val_batches_per_epoch": 10,
"epochs": 200,
"optimizer": "adam",
"initial_learning_rate": 0.0001,
"learning_rate_schedule": {
"reduce_on_plateau": true,
"reduction_factor": 0.5,
"plateau_min_delta": 1e-06,
"plateau_patience": 5,
"plateau_cooldown": 3,
"min_learning_rate": 1e-08
},
"hard_keypoint_mining": {
"online_mining": false,
"hard_to_easy_ratio": 2.0,
"min_hard_keypoints": 2,
"max_hard_keypoints": null,
"loss_scale": 5.0
},
"early_stopping": {
"stop_training_on_plateau": true,
"plateau_min_delta": 1e-08,
"plateau_patience": 20
}
},
"outputs": {
"save_outputs": true,
"run_name": "240730_sca006",
"run_name_prefix": "",
"run_name_suffix": ".centroid",
"runs_folder": "models",
"tags": [
""
],
"save_visualizations": true,
"delete_viz_images": true,
"zip_outputs": false,
"log_to_csv": true,
"checkpointing": {
"initial_model": false,
"best_model": true,
"every_epoch": false,
"latest_model": false,
"final_model": false
},
"tensorboard": {
"write_logs": false,
"loss_frequency": "epoch",
"architecture_graph": false,
"profile_graph": false,
"visualizations": true
},
"zmq": {
"subscribe_to_controller": false,
"controller_address": "tcp://127.0.0.1:9000",
"controller_polling_timeout": 10,
"publish_updates": false,
"publish_address": "tcp://127.0.0.1:9001"
}
},
"name": "",
"description": "",
"sleap_version": "1.3.3",
"filename": "centroid.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 10824 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: 240730_combo_01.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training: Splits: Training = 90 / Validation = 10.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2024-07-30 18:00:43.550298: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-30 18:00:45.477763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9469 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:b1:00.0, compute capability: 7.5
INFO:sleap.nn.training:Loaded test example. [3.908s]
INFO:sleap.nn.training: Input shape: (768, 960, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training: Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=3, up_interpolate=True, block_contraction=False)
INFO:sleap.nn.training: Max stride: 16
INFO:sleap.nn.training: Parameters: 1,953,105
INFO:sleap.nn.training: Heads:
INFO:sleap.nn.training: [0] = CentroidConfmapsHead(anchor_part='SB_Ant', sigma=2.5, output_stride=2, loss_weight=1.0)
INFO:sleap.nn.training: Outputs:
INFO:sleap.nn.training: [0] = KerasTensor(type_spec=TensorSpec(shape=(None, 384, 480, 1), dtype=tf.float32, name=None), name='CentroidConfmapsHead/BiasAdd:0', description="created by layer 'CentroidConfmapsHead'")
INFO:sleap.nn.training:Training from scratch
INFO:sleap.nn.training:Setting up data pipelines...
INFO:sleap.nn.training:Training set: n = 90
INFO:sleap.nn.training:Validation set: n = 10
INFO:sleap.nn.training:Setting up optimization...
INFO:sleap.nn.training: Learning rate schedule: LearningRateScheduleConfig(reduce_on_plateau=True, reduction_factor=0.5, plateau_min_delta=1e-06, plateau_patience=5, plateau_cooldown=3, min_learning_rate=1e-08)
INFO:sleap.nn.training: Early stopping: EarlyStoppingConfig(stop_training_on_plateau=True, plateau_min_delta=1e-08, plateau_patience=20)
INFO:sleap.nn.training:Setting up outputs...
INFO:sleap.nn.training:Created run path: models/240730_sca006.centroid
INFO:sleap.nn.training:Setting up visualization...
2024-07-30 18:00:49.571281: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
2024-07-30 18:00:50.628807: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -34 } dim { size: -35 } dim { size: -36 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: -37 } dim { size: -38 } dim { size: 1 } } }
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
Unable to use Qt backend for matplotlib. This probably means Qt is running headless.
INFO:sleap.nn.training:Finished trainer set up. [7.2s]
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [4.2s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/200
2024-07-30 18:00:59.099209: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8201
2024-07-30 18:01:02.068413: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2024-07-30 18:01:07.392356: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.18GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2024-07-30 18:01:07.392418: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.18GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
200/200 - 135s - loss: 0.0016 - val_loss: 0.0015 - lr: 1.0000e-04 - 135s/epoch - 673ms/step
Epoch 2/200
200/200 - 122s - loss: 0.0015 - val_loss: 0.0015 - lr: 1.0000e-04 - 122s/epoch - 608ms/step
Epoch 3/200
200/200 - 131s - loss: 0.0011 - val_loss: 6.7788e-04 - lr: 1.0000e-04 - 131s/epoch - 657ms/step
Epoch 4/200
200/200 - 117s - loss: 6.0862e-04 - val_loss: 4.6321e-04 - lr: 1.0000e-04 - 117s/epoch - 586ms/step
Epoch 5/200
200/200 - 118s - loss: 4.0584e-04 - val_loss: 2.8125e-04 - lr: 1.0000e-04 - 118s/epoch - 592ms/step
Epoch 6/200
200/200 - 127s - loss: 2.5527e-04 - val_loss: 2.0094e-04 - lr: 1.0000e-04 - 127s/epoch - 637ms/step
Epoch 7/200
200/200 - 124s - loss: 1.9830e-04 - val_loss: 1.6575e-04 - lr: 1.0000e-04 - 124s/epoch - 618ms/step
Epoch 8/200
200/200 - 120s - loss: 1.7257e-04 - val_loss: 1.5084e-04 - lr: 1.0000e-04 - 120s/epoch - 598ms/step
Epoch 9/200
200/200 - 110s - loss: 1.5131e-04 - val_loss: 1.3636e-04 - lr: 1.0000e-04 - 110s/epoch - 548ms/step
Epoch 10/200
200/200 - 128s - loss: 1.3319e-04 - val_loss: 1.0477e-04 - lr: 1.0000e-04 - 128s/epoch - 640ms/step
Epoch 11/200
200/200 - 119s - loss: 1.2208e-04 - val_loss: 1.2188e-04 - lr: 1.0000e-04 - 119s/epoch - 593ms/step
Epoch 12/200
200/200 - 119s - loss: 1.1151e-04 - val_loss: 1.1364e-04 - lr: 1.0000e-04 - 119s/epoch - 597ms/step
Epoch 13/200
200/200 - 110s - loss: 9.9992e-05 - val_loss: 9.2142e-05 - lr: 1.0000e-04 - 110s/epoch - 548ms/step
Epoch 14/200
200/200 - 128s - loss: 9.5941e-05 - val_loss: 1.0030e-04 - lr: 1.0000e-04 - 128s/epoch - 638ms/step
Epoch 15/200
200/200 - 110s - loss: 9.2658e-05 - val_loss: 8.0204e-05 - lr: 1.0000e-04 - 110s/epoch - 548ms/step
Epoch 16/200
200/200 - 118s - loss: 8.6367e-05 - val_loss: 7.7824e-05 - lr: 1.0000e-04 - 118s/epoch - 588ms/step
Epoch 17/200
200/200 - 118s - loss: 8.3389e-05 - val_loss: 7.3165e-05 - lr: 1.0000e-04 - 118s/epoch - 589ms/step
Epoch 18/200
200/200 - 118s - loss: 8.2116e-05 - val_loss: 6.9677e-05 - lr: 1.0000e-04 - 118s/epoch - 591ms/step
Epoch 19/200
200/200 - 128s - loss: 7.6944e-05 - val_loss: 7.3438e-05 - lr: 1.0000e-04 - 128s/epoch - 641ms/step
Epoch 20/200
200/200 - 109s - loss: 7.6430e-05 - val_loss: 6.5531e-05 - lr: 1.0000e-04 - 109s/epoch - 544ms/step
Epoch 21/200
200/200 - 118s - loss: 7.2628e-05 - val_loss: 6.3437e-05 - lr: 1.0000e-04 - 118s/epoch - 591ms/step
Epoch 22/200
200/200 - 128s - loss: 7.1498e-05 - val_loss: 6.3361e-05 - lr: 1.0000e-04 - 128s/epoch - 642ms/step
Epoch 23/200
200/200 - 120s - loss: 6.9983e-05 - val_loss: 6.1618e-05 - lr: 1.0000e-04 - 120s/epoch - 598ms/step
Epoch 24/200
200/200 - 120s - loss: 6.6453e-05 - val_loss: 5.4330e-05 - lr: 1.0000e-04 - 120s/epoch - 598ms/step
Epoch 25/200
200/200 - 109s - loss: 6.5471e-05 - val_loss: 5.3826e-05 - lr: 1.0000e-04 - 109s/epoch - 547ms/step
Epoch 26/200
200/200 - 118s - loss: 6.5850e-05 - val_loss: 5.9029e-05 - lr: 1.0000e-04 - 118s/epoch - 590ms/step
Epoch 27/200
200/200 - 129s - loss: 6.2315e-05 - val_loss: 5.6383e-05 - lr: 1.0000e-04 - 129s/epoch - 645ms/step
Epoch 28/200
200/200 - 109s - loss: 6.2462e-05 - val_loss: 5.4796e-05 - lr: 1.0000e-04 - 109s/epoch - 546ms/step
Epoch 29/200
Epoch 29: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
200/200 - 118s - loss: 5.9896e-05 - val_loss: 6.5831e-05 - lr: 1.0000e-04 - 118s/epoch - 590ms/step
Epoch 30/200
200/200 - 129s - loss: 5.6717e-05 - val_loss: 4.8033e-05 - lr: 5.0000e-05 - 129s/epoch - 646ms/step
Epoch 31/200
200/200 - 110s - loss: 5.6833e-05 - val_loss: 4.7654e-05 - lr: 5.0000e-05 - 110s/epoch - 548ms/step
Epoch 32/200
200/200 - 126s - loss: 5.6081e-05 - val_loss: 4.9296e-05 - lr: 5.0000e-05 - 126s/epoch - 631ms/step
Epoch 33/200
200/200 - 119s - loss: 5.6601e-05 - val_loss: 5.0415e-05 - lr: 5.0000e-05 - 119s/epoch - 593ms/step
Epoch 34/200
200/200 - 106s - loss: 5.4624e-05 - val_loss: 4.7272e-05 - lr: 5.0000e-05 - 106s/epoch - 530ms/step
Epoch 35/200
200/200 - 113s - loss: 5.4388e-05 - val_loss: 4.8796e-05 - lr: 5.0000e-05 - 113s/epoch - 567ms/step
Epoch 36/200
Epoch 36: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
200/200 - 113s - loss: 5.3033e-05 - val_loss: 4.8231e-05 - lr: 5.0000e-05 - 113s/epoch - 566ms/step
Epoch 37/200
200/200 - 114s - loss: 5.2355e-05 - val_loss: 4.8545e-05 - lr: 2.5000e-05 - 114s/epoch - 572ms/step
Epoch 38/200
200/200 - 114s - loss: 5.1971e-05 - val_loss: 4.8139e-05 - lr: 2.5000e-05 - 114s/epoch - 570ms/step
Epoch 39/200
200/200 - 125s - loss: 5.2411e-05 - val_loss: 4.9553e-05 - lr: 2.5000e-05 - 125s/epoch - 627ms/step
Epoch 40/200
200/200 - 116s - loss: 5.2114e-05 - val_loss: 4.6942e-05 - lr: 2.5000e-05 - 116s/epoch - 578ms/step
Epoch 41/200
200/200 - 116s - loss: 5.2490e-05 - val_loss: 4.5093e-05 - lr: 2.5000e-05 - 116s/epoch - 578ms/step
Epoch 42/200
200/200 - 105s - loss: 5.0768e-05 - val_loss: 5.1244e-05 - lr: 2.5000e-05 - 105s/epoch - 523ms/step
Epoch 43/200
200/200 - 114s - loss: 5.0272e-05 - val_loss: 4.9921e-05 - lr: 2.5000e-05 - 114s/epoch - 572ms/step
Epoch 44/200
200/200 - 114s - loss: 4.9402e-05 - val_loss: 4.4952e-05 - lr: 2.5000e-05 - 114s/epoch - 569ms/step
Epoch 45/200
200/200 - 115s - loss: 4.9057e-05 - val_loss: 4.6840e-05 - lr: 2.5000e-05 - 115s/epoch - 574ms/step
Epoch 46/200
Epoch 46: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
200/200 - 125s - loss: 5.0362e-05 - val_loss: 4.4250e-05 - lr: 2.5000e-05 - 125s/epoch - 624ms/step
Epoch 47/200
200/200 - 115s - loss: 4.9411e-05 - val_loss: 4.7635e-05 - lr: 1.2500e-05 - 115s/epoch - 575ms/step
Epoch 48/200
200/200 - 105s - loss: 4.7379e-05 - val_loss: 4.4865e-05 - lr: 1.2500e-05 - 105s/epoch - 524ms/step
Epoch 49/200
200/200 - 115s - loss: 4.9230e-05 - val_loss: 4.2179e-05 - lr: 1.2500e-05 - 115s/epoch - 574ms/step
Epoch 50/200
200/200 - 124s - loss: 4.8932e-05 - val_loss: 4.8290e-05 - lr: 1.2500e-05 - 124s/epoch - 622ms/step
Epoch 51/200
200/200 - 115s - loss: 4.8875e-05 - val_loss: 4.8321e-05 - lr: 1.2500e-05 - 115s/epoch - 575ms/step
Epoch 52/200
200/200 - 106s - loss: 4.9661e-05 - val_loss: 4.9710e-05 - lr: 1.2500e-05 - 106s/epoch - 529ms/step
Epoch 53/200
200/200 - 114s - loss: 4.8525e-05 - val_loss: 4.6438e-05 - lr: 1.2500e-05 - 114s/epoch - 570ms/step
Epoch 54/200
Epoch 54: ReduceLROnPlateau reducing learning rate to 6.24999984211172e-06.
200/200 - 114s - loss: 4.6240e-05 - val_loss: 4.7178e-05 - lr: 1.2500e-05 - 114s/epoch - 569ms/step
Epoch 55/200
200/200 - 115s - loss: 4.8187e-05 - val_loss: 4.3873e-05 - lr: 6.2500e-06 - 115s/epoch - 574ms/step
Epoch 56/200
200/200 - 115s - loss: 4.7370e-05 - val_loss: 4.2925e-05 - lr: 6.2500e-06 - 115s/epoch - 573ms/step
Epoch 57/200
200/200 - 115s - loss: 4.7761e-05 - val_loss: 4.8665e-05 - lr: 6.2500e-06 - 115s/epoch - 575ms/step
Epoch 58/200
200/200 - 114s - loss: 4.7622e-05 - val_loss: 4.5136e-05 - lr: 6.2500e-06 - 114s/epoch - 568ms/step
Epoch 59/200
200/200 - 113s - loss: 4.8361e-05 - val_loss: 4.7865e-05 - lr: 6.2500e-06 - 113s/epoch - 567ms/step
Epoch 60/200
200/200 - 113s - loss: 4.7528e-05 - val_loss: 4.6166e-05 - lr: 6.2500e-06 - 113s/epoch - 565ms/step
Epoch 61/200
Epoch 61: ReduceLROnPlateau reducing learning rate to 3.12499992105586e-06.
200/200 - 114s - loss: 4.7565e-05 - val_loss: 4.8376e-05 - lr: 6.2500e-06 - 114s/epoch - 568ms/step
Epoch 62/200
200/200 - 113s - loss: 4.8727e-05 - val_loss: 4.7061e-05 - lr: 3.1250e-06 - 113s/epoch - 564ms/step
Epoch 63/200
200/200 - 124s - loss: 4.4835e-05 - val_loss: 4.4468e-05 - lr: 3.1250e-06 - 124s/epoch - 621ms/step
Epoch 64/200
200/200 - 105s - loss: 4.6978e-05 - val_loss: 4.7338e-05 - lr: 3.1250e-06 - 105s/epoch - 526ms/step
Epoch 65/200
200/200 - 114s - loss: 4.7214e-05 - val_loss: 4.5294e-05 - lr: 3.1250e-06 - 114s/epoch - 572ms/step
Epoch 66/200
200/200 - 113s - loss: 4.7359e-05 - val_loss: 4.6547e-05 - lr: 3.1250e-06 - 113s/epoch - 567ms/step
Epoch 67/200
200/200 - 115s - loss: 4.7074e-05 - val_loss: 4.8392e-05 - lr: 3.1250e-06 - 115s/epoch - 574ms/step
Epoch 68/200
Epoch 68: ReduceLROnPlateau reducing learning rate to 1.56249996052793e-06.
200/200 - 114s - loss: 4.6401e-05 - val_loss: 4.8502e-05 - lr: 3.1250e-06 - 114s/epoch - 569ms/step
Epoch 69/200
200/200 - 114s - loss: 4.7373e-05 - val_loss: 4.4641e-05 - lr: 1.5625e-06 - 114s/epoch - 572ms/step
Epoch 69: early stopping
INFO:sleap.nn.training:Finished training loop. [134.5 min]
INFO:sleap.nn.training:Deleting visualization directory: models/240730_sca006.centroid/viz
INFO:sleap.nn.training:Saving evaluation metrics to model folder...
2024-07-30 20:15:27.969793: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } }
2024-07-30 20:15:27.970164: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1024 } dim { size: 1280 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -81 } dim { size: -82 } dim { size: 1 } } }
2024-07-30 20:15:32.305885: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } }
2024-07-30 20:15:32.306235: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1024 } dim { size: 1280 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -81 } dim { size: -82 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 21.2 FPS
INFO:sleap.nn.evals:Saved predictions: models/240730_sca006.centroid/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/240730_sca006.centroid/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.980130
2024-07-30 20:15:36.493087: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } }
2024-07-30 20:15:36.493436: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 4 } dim { size: 1024 } dim { size: 1280 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -81 } dim { size: -82 } dim { size: 1 } } }
2024-07-30 20:15:37.940149: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -69 } dim { size: -70 } dim { size: -71 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 2080 Ti" frequency: 1545 num_cores: 68 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 5767168 shared_memory_size_per_multiprocessor: 65536 memory_size: 9929097216 bandwidth: 616000000 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -72 } dim { size: -73 } dim { size: 1 } } }
2024-07-30 20:15:37.940513: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_UINT8 } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_UINT8 shape { dim { size: 2 } dim { size: 1024 } dim { size: 1280 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -5 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -5 } dim { size: -81 } dim { size: -82 } dim { size: 1 } } }
Predicting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% ETA: 0:00:00 4.2 FPS
INFO:sleap.nn.evals:Saved predictions: models/240730_sca006.centroid/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/240730_sca006.centroid/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.980198
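
(End of the centroid run; the centered_instance run starts below.)

To put the two shapes side by side, here is a rough back-of-the-envelope sketch. The centroid numbers come straight from the log above (1024 x 1280 frames, `input_scaling` 0.75, `output_stride` 2). The helper names `scaled_shape` and `tensor_bytes` are just for illustration and are not part of SLEAP, and 4 bytes per element (float32) is an assumption for the centered_instance tensor.

```python
def scaled_shape(height, width, scale, channels=1):
    """Image shape after applying an input scaling factor."""
    return (int(height * scale), int(width * scale), channels)


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor; 4 bytes per element assumes float32."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Centroid model: values taken from the log above.
raw_h, raw_w = 1024, 1280                       # frame size in the CropAndResize inputs
centroid_in = scaled_shape(raw_h, raw_w, 0.75)  # (768, 960, 1), matches "Input shape"
centroid_out = (centroid_in[0] // 2, centroid_in[1] // 2, 1)  # (384, 480, 1), matches the head output

# Shape reported when the centered_instance model runs out of memory.
reported = (15, 9999680, 9999680, 1)

gpu_free_bytes = 10824 * 2**20  # "10824 MiB of free memory" from the log

print("centroid input :", centroid_in, tensor_bytes(centroid_in) / 2**20, "MiB per image")
print("centroid output:", centroid_out)
print("reported tensor:", tensor_bytes(reported) / 2**50, "PiB")
print("fits on GPU    :", tensor_bytes(reported) <= gpu_free_bytes)
print("side % 16      :", reported[1] % 16)  # 16 is the configured max_stride
```

So the centroid input is a few MiB per image, while the reported tensor would be on the order of petabytes, far beyond any of the GPUs. The side length 9999680 is also an exact multiple of the configured `max_stride` of 16, though I don't know whether that is relevant.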
INFO:sleap.nn.training:Versions:
SLEAP: 1.3.3
TensorFlow: 2.8.4
Numpy: 1.21.6
Python: 3.7.16
OS: Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-redhat-9.3-Blue_Onyx
INFO:sleap.nn.training:Training labels file: 240730_combo_01.pkg.slp
INFO:sleap.nn.training:Training profile: centered_instance.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
"training_job_path": "centered_instance.json",
"labels_path": "240730_combo_01.pkg.slp",
"video_paths": [
""
],
"val_labels": null,
"test_labels": null,
"base_checkpoint": null,
"tensorboard": false,
"save_viz": false,
"zmq": false,
"run_name": "240730_sca006",
"prefix": "",
"suffix": "",
"cpu": false,
"first_gpu": false,
"last_gpu": false,
"gpu": "auto"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Training job:
INFO:sleap.nn.training:{
"data": {
"labels": {
"training_labels": null,
"validation_labels": null,
"validation_fraction": 0.1,
"test_labels": null,
"split_by_inds": false,
"training_inds": null,
"validation_inds": null,
"test_inds": null,
"search_path_hints": [],
"skeletons": []
},
"preprocessing": {
"ensure_rgb": false,
"ensure_grayscale": true,
"imagenet_mode": null,
"input_scaling": 1.0,
"pad_to_stride": null,
"resize_and_pad_to_target": true,
"target_height": null,
"target_width": null
},
"instance_cropping": {
"center_on_part": "SB_Ant",
"crop_size": null,
"crop_size_detection_padding": 16
}
},
"model": {
"backbone": {
"leap": null,
"unet": {
"stem_stride": null,
"max_stride": 16,
"output_stride": 4,
"filters": 24,
"filters_rate": 2.0,
"middle_block": true,
"up_interpolate": true,
"stacks": 1
},
"hourglass": null,
"resnet": null,
"pretrained_encoder": null
},
"heads": {
"single_instance": null,
"centroid": null,
"centered_instance": {
"anchor_part": "SB_Ant",
"part_names": null,
"sigma": 2.5,
"output_stride": 4,
"loss_weight": 1.0,
"offset_refinement": false
},
"multi_instance": null,
"multi_class_bottomup": null,
"multi_class_topdown": null
},
"base_checkpoint": null
},
"optimization": {
"preload_data": true,
"augmentation_config": {
"rotate": true,
"rotation_min_angle": -180.0,
"rotation_max_angle": 180.0,
"translate": false,
"translate_min": -5,
"translate_max": 5,
"scale": false,
"scale_min": 0.9,
"scale_max": 1.1,
"uniform_noise": false,
"uniform_noise_min_val": 0.0,
"uniform_noise_max_val": 10.0,
"gaussian_noise": false,
"gaussian_noise_mean": 5.0,
"gaussian_noise_stddev": 1.0,
"contrast": false,
"contrast_min_gamma": 0.5,
"contrast_max_gamma": 2.0,
"brightness": false,
"brightness_min_val": 0.0,
"brightness_max_val": 10.0,
"random_crop": false,
"random_crop_height": 256,
"random_crop_width": 256,
"random_flip": true,
"flip_horizontal": false
},
"online_shuffling": true,
"shuffle_buffer_size": 128,
"prefetch": true,
"batch_size": 8,
"batches_per_epoch": null,
"min_batches_per_epoch": 200,
"val_batches_per_epoch": null,
"min_val_batches_per_epoch": 10,
"epochs": 200,
"optimizer": "adam",
"initial_learning_rate": 0.0001,
"learning_rate_schedule": {
"reduce_on_plateau": true,
"reduction_factor": 0.5,
"plateau_min_delta": 1e-06,
"plateau_patience": 5,
"plateau_cooldown": 3,
"min_learning_rate": 1e-08
},
"hard_keypoint_mining": {
"online_mining": false,
"hard_to_easy_ratio": 2.0,
"min_hard_keypoints": 2,
"max_hard_keypoints": null,
"loss_scale": 5.0
},
"early_stopping": {
"stop_training_on_plateau": true,
"plateau_min_delta": 1e-08,
"plateau_patience": 10
}
},
"outputs": {
"save_outputs": true,
"run_name": "240730_sca006",
"run_name_prefix": "",
"run_name_suffix": ".centered_instance",
"runs_folder": "models",
"tags": [
""
],
"save_visualizations": true,
"delete_viz_images": true,
"zip_outputs": false,
"log_to_csv": true,
"checkpointing": {
"initial_model": false,
"best_model": true,
"every_epoch": false,
"latest_model": false,
"final_model": false
},
"tensorboard": {
"write_logs": false,
"loss_frequency": "epoch",
"architecture_graph": false,
"profile_graph": false,
"visualizations": true
},
"zmq": {
"subscribe_to_controller": false,
"controller_address": "tcp://127.0.0.1:9000",
"controller_polling_timeout": 10,
"publish_updates": false,
"publish_address": "tcp://127.0.0.1:9001"
}
},
"name": "",
"description": "",
"sleap_version": "1.3.3",
"filename": "centered_instance.json"
}
INFO:sleap.nn.training:
INFO:sleap.nn.training:Auto-selected GPU 0 with 10824 MiB of free memory.
INFO:sleap.nn.training:Using GPU 0 for acceleration.
INFO:sleap.nn.training:Disabled GPU memory pre-allocation.
INFO:sleap.nn.training:System:
GPUs: 1/1 available
Device: /physical_device:GPU:0
Available: True
Initalized: False
Memory growth: True
INFO:sleap.nn.training:
INFO:sleap.nn.training:Initializing trainer...
INFO:sleap.nn.training:Loading training labels from: 240730_combo_01.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training: Splits: Training = 90 / Validation = 10.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
2024-07-30 20:16:43.564187: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-30 20:16:44.240409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9469 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:b1:00.0, compute capability: 7.5
2024-07-30 20:16:46.318988: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: 1 } dim { size: 1024 } dim { size: 1280 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -2 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } } device { type: "CPU" vendor: "GenuineIntel" model: "101" frequency: 2000 num_cores: 2 environment { key: "cpu_instruction_set" value: "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 32768 l2_cache_size: 1048576 l3_cache_size: 28835840 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { dim { size: -2 } dim { size: 9999680 } dim { size: 9999680 } dim { size: 1 } } }
2024-07-30 20:16:46.386553: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 5999616006144000 exceeds 10% of free system memory.
2024-07-30 20:16:46.387184: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at crop_and_resize_op.cc:181 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[15,9999680,9999680,1] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
2024-07-30 20:16:46.393067: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 5999616006144000 exceeds 10% of free system memory.
2024-07-30 20:16:46.393129: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at crop_and_resize_op.cc:181 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[15,9999680,9999680,1] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/bin/sleap-train", line 8, in <module>
sys.exit(main())
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
trainer.train()
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/sleap/nn/training.py", line 924, in train
self.setup()
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/sleap/nn/training.py", line 910, in setup
self._setup_model()
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/sleap/nn/training.py", line 727, in _setup_model
base_example = next(iter(base_pipeline.make_dataset()))
File "/home/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 836, in __next__
return self._next_internal()
File "/home/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 822, in _next_internal
output_shapes=self._flat_output_shapes)
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2923, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/groups/s/home/ssfrz/miniconda3/envs/sleap_v1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 7186, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[15,9999680,9999680,1] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[{{node CropAndResize}}]] [Op:IteratorGetNext]
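As a quick sanity check, the byte count in the allocator warning follows directly from the tensor shape in the OOM message. The sketch below assumes 4 bytes per element, since "type float" in TensorFlow messages refers to float32; the variable names are just for illustration.

```python
# Shape reported in the OOM message: [batch, height, width, channels]
shape = (15, 9999680, 9999680, 1)
bytes_per_float32 = 4  # assumption: "type float" means float32

# Multiply the dimensions together to get the element count.
n_elements = 1
for dim in shape:
    n_elements *= dim

n_bytes = n_elements * bytes_per_float32
print(n_bytes)  # 5999616006144000, the same number as in the allocator warning
print(n_bytes / 1024**5)  # the same size expressed in PiB
```

So the request is for several pebibytes, which no GPU or host could satisfy; the crop size itself is the problem rather than the hardware.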
-
Hi @ssfrz, I have been staring at your traceback for a bit too long now and am a bit stumped, so it might be past time to debug with an example. Would you be able to share your data with me (or a minimal example) through this user upload form? If you are able to share the data, please reply here to notify me (the form will not tell me when you have uploaded, but I will be keeping an eye on it). Apologies for the wait,
-
Hi @ssfrz, I have an update! So, your default crop size was set to Auto - which is great - SLEAP will automatically find the maximum instance width for you and use that for the crop size. I was debugging your data and found the code where the crop size was changed from None (an auto crop size) to 9999195: sleap/sleap/nn/data/instance_cropping.py, lines 44 to 53 in 076f3dd. Looking at the instance:
inst.points_array
rec.array([[ 8.359160e+02, 2.046660e+02],
[ 8.410480e+02, 2.021470e+02],
[ 8.413280e+02, 2.083980e+02],
[ 8.451540e+02, 2.128770e+02],
[ 8.490730e+02, 2.184750e+02],
[ 1.000000e+07, -9.998976e+06]],
dtype=float64)
inst.frame
LabeledFrame(video=HDF5Video('C:\Users\TalmoLab\Downloads\ssfrz\240730_combo_01.pkg.slp'), frame_idx=2, instances=15)
inst.video
Video(backend=HDF5Video(filename='C:\\Users\\TalmoLab\\Downloads\\ssfrz\\240730_combo_01.pkg.slp', dataset='video2/video', input_format='channels_last', convert_range=False))
Then, when pulling up the frame in the GUI, at first I didn't see anything, but when I added edges to the data, I saw this (background intentionally set to black):
The fix here would be to delete and relabel these three instances. It is always SB_Post that is given an extremely large value. I am sure that those points were not user-labeled (that far out of frame); it was likely something SLEAP initialized for you. To find the root cause, I am wondering if you did anything special for the SB_Post node? Also, heads-up that the first frame in video 7 is a bogus label (labeling nothing/the background). Thanks,
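To track down instances like this one without opening every frame in the GUI, a check along the following lines might help. It is only a NumPy sketch, not part of SLEAP: `find_out_of_frame_points` and its `margin` parameter are hypothetical names, the 1024 x 1280 frame size is taken from the CropAndResize input shape in the log above, and the array is the `inst.points_array` output quoted in this reply.

```python
import numpy as np


def find_out_of_frame_points(points, height, width, margin=0.0):
    """Return row indices of labeled points that fall outside the image.

    Hypothetical helper for illustration. `points` is an (n_nodes, 2) array
    of (x, y) coordinates; rows containing NaN (unlabeled nodes) are skipped.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    labeled = ~np.isnan(x) & ~np.isnan(y)
    outside = (
        (x < -margin) | (x > width + margin)
        | (y < -margin) | (y > height + margin)
    )
    return np.flatnonzero(labeled & outside)


# The points array printed above for the problematic instance.
points = np.array([
    [835.916, 204.666],
    [841.048, 202.147],
    [841.328, 208.398],
    [845.154, 212.877],
    [849.073, 218.475],
    [1.0e7, -9998976.0],
])

bad = find_out_of_frame_points(points, height=1024, width=1280)
print(bad)  # [5] -> only the last row is far outside the frame
```

Running the same check over the points array of every instance in the labels file should list the instances that need to be deleted and relabeled.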
Hi @ssfrz,
I have an update!
So, your default crop size was set to Auto - which is great - SLEAP will automatically find the maximum instance width for you and use that for the crop size.
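To make the effect of a single stray point concrete, here is a rough NumPy illustration (not SLEAP's actual code) of how a bounding-box-based size reacts to the instance quoted earlier in this thread. It assumes the auto crop size is derived from the largest bounding-box side of an instance; `max_bbox_side` is a hypothetical helper name, and any padding or stride rounding SLEAP applies afterwards is left out.

```python
import numpy as np


def max_bbox_side(points):
    """Largest side (width or height) of the bounding box around the points.

    Hypothetical helper for illustration; NaN rows (unlabeled nodes) are
    ignored via nanmin/nanmax.
    """
    pts = np.asarray(points, dtype=float)
    spans = np.nanmax(pts, axis=0) - np.nanmin(pts, axis=0)
    return float(spans.max())


# Points of the problematic instance, as printed by inst.points_array.
points = np.array([
    [835.916, 204.666],
    [841.048, 202.147],
    [841.328, 208.398],
    [845.154, 212.877],
    [849.073, 218.475],
    [1.0e7, -9998976.0],
])

with_outlier = int(np.ceil(max_bbox_side(points)))
without_outlier = int(np.ceil(max_bbox_side(points[:-1])))
print(with_outlier)     # 9999195
print(without_outlier)  # 17
```

With the stray point included, the largest side rounds up to 9999195, the same value the crop size took on in this debugging session; without it, the instance spans only about 17 pixels.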
I was debugging your data and found the code where the crop size was changed from None (an auto crop size) to 9999195:
sleap/sleap/nn/data/instance_cropping.py, lines 44 to 53 in 076f3dd