Hyperparameter setting for training from scratch on CIFAR-10 #134
Comments
I was also wondering about this. It seems the 32x32 size of CIFAR-10 is incompatible with this model due to the down-sampling layers.
@Yuancheng-Xu It seems like it can. The downsampling layers should be set to a smaller kernel and stride size (2 and 2 respectively). Without this, the output of the downsampling layers is effectively the same size as the kernel.
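To make the size argument concrete, here is a rough sketch in plain Python (no PyTorch needed) that tracks the feature-map side length through the stem and the downsampling layers. The default values are assumptions taken from the published ConvNeXt design (a 4x4 stride-4 stem followed by three 2x2 stride-2 downsampling layers), not something read off this repository, and `conv_out` / `stage_sizes` are hypothetical helper names used only for illustration.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (no dilation, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_sizes(input_size, stem=(4, 4), downsample=(2, 2), num_downsample=3):
    """Side length of the feature map after the stem and each downsampling layer.

    `stem` and `downsample` are (kernel, stride) pairs. The defaults are an
    assumption based on the published ConvNeXt design.
    """
    sizes = [conv_out(input_size, *stem)]
    for _ in range(num_downsample):
        sizes.append(conv_out(sizes[-1], *downsample))
    return sizes


print(stage_sizes(224))              # [56, 28, 14, 7]
print(stage_sizes(32))               # [8, 4, 2, 1]
print(stage_sizes(32, stem=(2, 2)))  # [16, 8, 4, 2]
```

Under these assumptions a 32x32 input is already down to 8x8 after the stem and ends at 1x1, while a 2x2 stride-2 stem keeps 16x16 after the stem and 2x2 at the end.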
@Yuancheng-Xu I managed to get accuracy to 87% by making a few changes to the code in the link above. Basic changes are mentioned in this repository https://github.com/shamikbose/Fujitsu_Assessment
Thanks a lot!
Hey @shamikbose, I tried training on the ImageNet100 dataset with a custom input_size = 32, but the accuracy I am getting is too low. What could I change in the architecture (I tried making the kernel and stride smaller)? Is there any other approach that might help me get good accuracy?
@iamsh4shank The parameters used for ImageNet100 are mentioned in the paper. You should be able to reproduce it using those values.
Actually, I guess those were for input_size 224, but when I change it to 32 the accuracy I get is really low.
With image size 32, try the parameters mentioned here #134 (comment)
I did try changing the Conv layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L28) to kernel size 3 and padding 1. I also changed the downsampling layer (https://github.com/facebookresearch/ConvNeXt/blob/main/models/convnext.py#L74) to kernel size 2 and stride 2. It did not change the accuracy much. I am getting a test accuracy of around 4-5 percent.
Hi,
I am trying to train a ConvNeXt on CIFAR-10 for a research project that doesn't allow using BN. I use the following configuration:
The accuracy is only 75% (a standard ResNet18 reaches about 93%). If I change the optimizer from AdamW to SGD, the best accuracy actually drops to below 50%. If I use the default input size 224, the accuracy is 84%, which is still significantly lower.
Can ConvNeXt work on CIFAR-10 without fine-tuning from a pretrained model? Could you provide a recommended set of hyperparameters for CIFAR-10 (that should be robust to different types of optimizers and work without mix-up and cutmix)?
I also have another question about fine-tuning on CIFAR-10: it seems that in the colab file the input_size is the default 224. However, CIFAR-10 images are 32x32. Does this mean that in the data preparation stage the images will be padded to 224x224?
Thank you!