Low linear test accuracy on specific downstream dataset. #549

hongvin · 2022-10-25T14:13:54Z


Hello, community! I have been stuck on this for quite a while now and have experimented with several different ways to train:

Training the backbone with and without a pre-trained ImageNet model.
Varying augmentation techniques, learning rates, and epochs.
Linear evaluation: changing learning rates and adding more linear layers.

So, a sample of the dataset is as follows.

The dataset is very specific, with little difference between classes. It works well with ordinary supervised learning but not with contrastive learning. The first issue is that linear training accuracy is high while linear test accuracy is low. This is explainable because the dataset is split by driver. Specifically, if I redo the split randomly, mixing all images before dividing them into train and test sets, I get superb accuracy. (duh)
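To make the difference between the two split protocols concrete, here is a minimal sketch of a driver-aware split, assuming each sample carries a group tag (the driver). The `split_by_group` helper, the `(path, label, group)` tuple layout, and the toy data are hypothetical and are not taken from the actual dataset or from MMSelfSup.

```python
import random
from collections import defaultdict


def split_by_group(samples, test_fraction=0.2, seed=0):
    """Split (path, label, group) samples so that no group lands in both sets."""
    by_group = defaultdict(list)
    for sample in samples:
        by_group[sample[2]].append(sample)

    # Shuffle whole groups, not individual images, then hold some groups out.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [s for g in groups if g not in test_groups for s in by_group[g]]
    test = [s for g in groups if g in test_groups for s in by_group[g]]
    return train, test


# Hypothetical toy data: 5 drivers, 4 images each, 2 classes.
samples = [
    (f"img_{d}_{i}.jpg", i % 2, f"driver_{d}")
    for d in range(5)
    for i in range(4)
]
train, test = split_by_group(samples, test_fraction=0.2)
train_drivers = {s[2] for s in train}
test_drivers = {s[2] for s in test}
print(len(train), len(test), train_drivers & test_drivers)
```

With a random per-image split, the same driver would usually show up on both sides, which is the likely reason that protocol scores so much higher.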

The current best config I have is:

ImageNet pre-trained backbone, trained with SimCLR for 200 epochs, with random resized crop removed and the other augmentations kept.
The backbone from the 200th epoch is extracted for linear evaluation.
Linear evaluation for 100 epochs.

The linear evaluation gives a test Top-1 of around 45 and a Top-5 of around 90. However, while training the linear layer, Top-1 can reach 90.

I have gone through many works on custom datasets, but I couldn't find any way to boost the accuracy further. I am not sure whether this is a fundamental obstacle of contrastive learning, but I would be glad to hear from the community on this.

tonysy · 2022-10-27T02:56:12Z

Maintainer

This looks like a fine-grained classification task. You may consider trying a masked image modeling method (MAE, SimMIM, CAE, etc.) for pre-training, which may model texture and local patterns better.
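For readers unfamiliar with masked image modeling, the core idea is to hide a random subset of image patches and train the model to reconstruct them. The sketch below only illustrates the patch-selection step in plain Python; `random_patch_mask` is a hypothetical helper, the `mask_ratio=0.75` value is illustrative, and none of this is the MMSelfSup implementation.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return (kept, masked) sorted patch indices for one image."""
    # Number of visible patches handed to the encoder.
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    random.Random(seed).shuffle(order)
    kept = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return kept, masked


# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
num_patches = (224 // 16) ** 2
kept, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)
print(len(kept), len(masked))
```

Because the reconstruction target is the hidden patches themselves, the pre-training signal is local and texture-oriented, which is the motivation given for trying it on fine-grained data.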


hongvin Nov 8, 2022
Author

Thanks for suggesting MIM-based methods. I have tried several of them, and interestingly they could not reconstruct the images well. Attached is the MAE model trained from the original weights.

The config is:

_base_ = [
    '_base_/models/mae_vit-base-p16.py',
    '_base_/datasets/imagenet_mae.py',
    '_base_/schedules/adamw_coslr-200e_in1k.py',
    '_base_/default_runtime.py',
]
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=32,
    train=dict(
        data_source=dict(
            data_prefix='.',
            ann_file='/home/hongvin/mmselfsup/train.txt',
        )
    ),
    test=dict(
        pipeline=[
            dict(type='Resize', size=(224, 224)),
            dict(type='ToTensor'),
            dict(type='Normalize', **img_norm_cfg),
        ]
    )
)

# optimizer
optimizer = dict(
    lr=1.5e-4 * 4096 / 256 / 8 / 16,  # 8 GPUs -> 1 GPU (/8), batch size 512 -> 32 (/16)
    paramwise_options={
        'norm': dict(weight_decay=0.),
        'bias': dict(weight_decay=0.),
        'pos_embed': dict(weight_decay=0.),
        'mask_token': dict(weight_decay=0.),
        'cls_token': dict(weight_decay=0.)
    })
optimizer_config = dict()

# learning policy
lr_config = dict(
    policy='StepFixCosineAnnealing',
    min_lr=0.0,
    warmup='linear',
    warmup_iters=40,
    warmup_ratio=1e-4,
    warmup_by_epoch=True,
    by_epoch=False)

# schedule
runner = dict(max_epochs=400) #eval 60

# runtime
checkpoint_config = dict(interval=1, max_keep_ckpts=300, out_dir='')
persistent_workers = True
log_config = dict(
    interval=100, hooks=[
        dict(type='TextLoggerHook'),
    ])

model = dict(
    backbone=dict(
        mask_ratio=0.7,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='/home/hongvin/mmselfsup/mae_vit-base-p16_8xb512-coslr-400e_in1k-224_20220223-85be947b.pth'
        )
    )
)
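As a sanity check on the learning rate in the config above, the following sketch works through the arithmetic, assuming the expression is meant to follow the linear scaling rule (base learning rate times total batch size divided by 256). The `scaled_lr` helper is hypothetical and is not part of MMSelfSup.

```python
def scaled_lr(base_lr, total_batch_size, reference_batch_size=256):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return base_lr * total_batch_size / reference_batch_size


# Expression copied from the config: 8 GPUs x 512 scaled down to 1 GPU x 32.
config_lr = 1.5e-4 * 4096 / 256 / 8 / 16

# Same quantity derived directly from the effective batch size of this run.
samples_per_gpu, num_gpus = 32, 1
direct_lr = scaled_lr(1.5e-4, samples_per_gpu * num_gpus)

print(f"{config_lr:.4e} {direct_lr:.4e}")
```

Both expressions come out to 1.875e-5, so the config value is consistent with a total batch size of 32 under that assumption.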

I have tried smaller and larger learning rates, with no improvement. Should I train longer? The reconstruction is bad and shows little improvement from 100 epochs onwards.

PS: the backgrounds in the test set might not appear in the training set. Before trying MIM, I randomly shuffled the dataset and used SimCLR, which gets >90%, whereas splitting by 'seen' and 'unseen' backgrounds drops it to 40%+.
