Replies: 1 comment 2 replies
It seems you do not configure
I tried to run the command:

```bash
bash ./tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712ls.py work_dirs/selfsup/densecl_resnet50_8xb32-coslr-200e_in1k/epoch_200.pth 1
```
And the config I used is:
"
base = 'mmdet::pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py'
data_preprocessor = dict(
type='DetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
backbone=dict(
frozen_stages=-1,
norm_cfg=norm_cfg,
norm_eval=False,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
roi_head=dict(
shared_head=dict(
type='ResLayerExtraNorm',
norm_cfg=norm_cfg,
norm_eval=False,
style='pytorch'),
bbox_head=dict(num_classes=2)))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='RandomChoiceResize',
scales = [(666, 240), (666, 256), (666,272), (666, 288),
(666, 304), (666, 320), (666, 336), (666, 352),
(666, 368), (666, 384), (666, 400)],
keep_ratio=True),
dict(type='RandomFlip', prob=0.5),
dict(type='PackDetInputs')
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', scale=(666, 400), keep_ratio=True),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
dataset_type = 'VOCDataset'
data_root = '/media/ls/disk1/DOTA/VOCdevkit/'
train_dataloader = dict(
batch_size=2,
num_workers=1,
sampler=dict(type='InfiniteSampler', shuffle=True),
dataset=dict(
delete=True,
type='VOCDataset',
data_root=data_root,
ann_file='VOC2007/ImageSets/Main/trainval.txt',
data_prefix=dict(sub_data_root='VOC2007/'),
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=train_pipeline,
))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline,data_root=data_root,))
test_dataloader = val_dataloader
train_cfg = dict(delete=True, type='EpochBasedTrainLoop', max_epochs=24, val_interval=4)
#max_iter = 824
param_scheduler = [
dict(
type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
end=1000),
dict(
type='MultiStepLR',
begin=0,
end=24,
by_epoch=True,
milestones=[16, 22],
gamma=0.1)
]
val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points')
test_evaluator = val_evaluator
default_hooks = dict(checkpoint=dict(by_epoch=True, interval=4))
log_processor = dict(by_epoch=True)
custom_imports = dict(
imports=['mmselfsup.evaluation.functional.res_layer_extra_norm'],
allow_failed_imports=False)
"
However, the training process stays stuck in epoch 1 the whole time and never moves on to the next epoch, as the log below shows.
The log keeps printing lines like "mmengine - INFO - Epoch(train) [1][2400/824]": the iteration counter "2400" is already far past the 824 iterations that make up one epoch, so the run should long since have moved on to "Epoch(train) [2]".
I went through the config file, but I couldn't find what causes this.
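The only explanation I can think of so far (just a guess on my side, not verified): the base `ms-18k` config uses an iteration-based schedule and keeps `InfiniteSampler` in `train_dataloader`, while I switched `train_cfg` to `EpochBasedTrainLoop`. If the infinite sampler never lets the dataloader run out, the epoch-based loop would never see the end of epoch 1, which would match the counter running past 824. The change I would try next (untested):

```python
# Untested guess: use an epoch-style sampler instead of InfiniteSampler so
# that EpochBasedTrainLoop can actually reach the end of an epoch.
# InfiniteSampler never exhausts the dataloader, which suits the base
# config's iteration-based 18k schedule but not an epoch-based training loop.
train_dataloader = dict(
    sampler=dict(type='DefaultSampler', shuffle=True))
```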
May I get some advice? Thanks in advance.