
The score of the result #47
Open · IItaly opened this issue May 7, 2021 · 17 comments

IItaly commented May 7, 2021

Hi, thanks for your work and the detailed steps.
However, I followed them and trained the model (EfficientNet) on DFDC, and then I got a strange result such as:

[screenshot of the results]

The real scores are greater than 1. I also tried to run prediction with the notebook 'Image prediction.ipynb' and got an error while loading the weights:

[screenshot of the error]

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

> The real scores are greater than 1.

That's OK, as the networks output unnormalized deepfake-detection scores.
If you want a 0-1 probabilistic value, simply apply a sigmoid function to the scores returned by the networks (as we do in the compute_metrics function in the Analyze results.ipynb notebook).
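For illustration, a minimal sketch of that post-processing (the variable names are mine, not the notebook's exact code):

```python
import numpy as np

def to_probability(scores: np.ndarray) -> np.ndarray:
    """Map the networks' unnormalized scores to [0, 1] with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-scores))

raw_scores = np.array([2.3, -0.7, 5.1])   # raw outputs, values > 1 are expected
print(to_probability(raw_scores))          # ~ [0.909, 0.332, 0.994]
```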

> I also tried to run prediction with the notebook 'Image prediction.ipynb' and got an error while loading the weights:

I can't really tell what's happening here; could you provide a more detailed error stack trace?
At first glance it seems that you're trying to load weights for an architecture different from the one you instantiated in the model variable (e.g., you might be trying to load the weights of an EfficientNetB4ST into an EfficientNetB4 instance).
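For example (a sketch with assumed import paths, not the notebook's exact cell), the class you instantiate has to match the checkpoint you load:

```python
from architectures import fornet   # assumed import path for the repo's model classes

# The instantiated class must match the checkpoint being loaded:
# weights trained for EfficientNetB4ST will not fit an EfficientNetB4 instance.
net_name = 'EfficientNetB4'            # must agree with the training run
net = getattr(fornet, net_name)()      # e.g. fornet.EfficientNetB4()
```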
Let us know!
Bests,

Edoardo


IItaly commented May 7, 2021

Thanks for your explanation.
I checked the model name I selected against the model I planned to load, and found that loading works with the pre-trained models you provide in /architectures/weights (I think those will definitely work).
I then tried to load the original pretrained EfficientNetB4 model (efficientnet-b4-6ed6700e.pth) and got the same error (I am sure the model name is EfficientNetB4):
[screenshots of the error]


IItaly commented May 7, 2021

By the way, I noticed that the AUC value is greater than the one you reported in your paper. Or does that suggest I made a mistake somewhere?

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

let me see if I have understood everything correctly:

  1. you have tried to load the pre-trained model we provide, and it worked;
  2. you have tried to load the original pre-trained EfficientNetB4 model from here (https://github.com/lukemelas/EfficientNet-PyTorch) and it did not work.

If that's right, it's expected that what you tried in step 2 didn't work, as our EfficientNetB4 model is a slightly modified version of the architecture provided in the original repo.
Indeed, we use EfficientNetB4 only as a feature extractor, and the final classifier layer is different: we perform binary classification, not the multi-class classification the original EfficientNetB4 is designed for (ImageNet).
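As a rough sketch of that difference (not the repo's actual class definition, just an illustration built on the lukemelas efficientnet_pytorch package), the 1000-class ImageNet head is swapped for a single-logit binary head:

```python
import torch.nn as nn
from efficientnet_pytorch import EfficientNet  # lukemelas/EfficientNet-PyTorch

class BinaryEfficientNetB4(nn.Module):
    """EfficientNetB4 backbone as a feature extractor with a one-logit real/fake head."""
    def __init__(self):
        super().__init__()
        self.backbone = EfficientNet.from_pretrained('efficientnet-b4')
        # Replace the 1000-class ImageNet classifier with a single deepfake logit
        self.fc = nn.Linear(self.backbone._fc.in_features, 1)

    def forward(self, x):
        features = self.backbone.extract_features(x)   # (B, C, H, W) feature maps
        pooled = features.mean(dim=[2, 3])              # global average pooling
        return self.fc(pooled)                          # unnormalized score
```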
Anyway, I still don't understand one thing: did you encounter this error also with a model trained with our code?

> By the way, I noticed that the AUC value is greater than the one you reported in your paper. Or does that suggest I made a mistake somewhere?

Yes, you're right, your numbers seem a little higher! Which model did you use to obtain these results? Could you provide us with the training hyperparameters?
Bests,

Edoardo


IItaly commented May 7, 2021

I trained the model following the steps you provided (train_all.sh), so this feels strange to me. I will check again and confirm whether I really trained the model correctly. It seems the network has not been changed, yet it gets a higher score :)

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

yes, that is definitely curious! If you used train_all.sh directly, you should get numbers close to the ones in the paper.
I'm wondering whether the pre-trained model from the original EfficientNet-PyTorch repo has been updated recently: since we ran these tests almost a year ago, if the lukemelas repo now provides models with higher accuracy on ImageNet, maybe the performance boost comes from there?
We'll also check this on our side with @nicobonne, but please let us know what you find!
Bests,

Edoardo


IItaly commented May 9, 2021

Hello @CrohnEngineer,
A question while loading data. In

total=len(train_loader) // train_loader.batch_size)

it seems that len() should not be used here, because it leads to a type error:
'TypeError: Cannot determine the DataLoader length of a IterableDataset'

So I replaced train_loader with train_dataset.
Is it okay to do this?
The total number of iterations seems to be huge.

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

When you call len() on a DataLoader, it should return the result of calling len() on the loader's dataset.
Can you please share the package list of your conda environment?
Or did you just clone ours using the environment.yml file provided in our repo?
In any case your fix should work, and don't worry, the number of iterations is supposed to be huge.
Usually the training stops without even processing the whole dataset.
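For clarity, a minimal sketch of the workaround discussed above (with dummy data; the actual loop lives in the repo's training script):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from tqdm import tqdm

# Dummy stand-ins for the training script's train_dataset / train_loader
train_dataset = TensorDataset(torch.zeros(1000, 4))
train_loader = DataLoader(train_dataset, batch_size=32)

# With an IterableDataset, len(train_loader) raises
# "TypeError: Cannot determine the DataLoader length of a IterableDataset",
# so the progress-bar total is computed from the dataset instead:
total = len(train_dataset) // train_loader.batch_size
for batch in tqdm(train_loader, total=total):
    pass  # training step goes here
```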
Best,

Edoardo


IItaly commented May 9, 2021

Thanks for your reply. This is my environment:

[screenshot of the conda environment list]

I got it. Training usually stops at around 20% and says it has completed.

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

I see you have pytorch=1.3.1, while in our experiments we use pytorch=1.4.0.
To avoid similar issues in the future, I suggest you create a separate environment starting from the list of packages provided in our environment.yml file, and use it for running our code.
You can find the instructions on how to do this in the README.md!
Bests,

Edoardo


IItaly commented May 10, 2021

Thanks for the reminder @CrohnEngineer

I have changed my environment and trained the model again. I am sure I didn't modify the code heavily. The type error on len(train_loader) does not happen under torch==1.4.0.

However, I still got the higher score.
As I mentioned, I could not load 'best_val.pth' when running 'Video prediction.ipynb', so I followed the steps in 'test_model.py' to load the model. Then I got this result:

[screenshot]

The code I modified in 'Video prediction.ipynb' (I am not sure if it is OK to do this):

[screenshot]

The test and validation results (EfficientNetB4 trained on DFDC):

[screenshot]

Train args:

[screenshot]

Test args:

[screenshot]

What are the results on your side? I would be grateful if you could tell me, and I would like to know whether I got something wrong.

Other pics:

[screenshot]

Environment:

[screenshot]

nicobonne (Member) commented:

It's not clear to me why you are not able to load the weights from the notebook, so it would help if you could provide a complete screenshot, from the beginning of the notebook to the error.

> What are the results on your side? I would be grateful if you could tell me, and I would like to know whether I got something wrong.

We are replicating the whole pipeline, but it takes time.


IItaly commented May 11, 2021

OK. These are the complete screenshots of the notebook. I don't know why either.

[screenshot]

[screenshot]

However, I got the prediction result when I modified it like this (I am not sure if it is OK to do this):

[screenshot]

nicobonne (Member) commented:

Got it. Inside the state dict we save a bunch of stuff besides the network weights, like the losses, the optimizer state, etc. You should pass only the net to load_state_dict(), like this:

net.load_state_dict(load_url(...)['net'])

as we do in the test script.
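Put together, a hedged sketch of what that notebook cell might look like (the URL is a placeholder and the import paths are assumptions on my part):

```python
from torch.utils.model_zoo import load_url   # alias of torch.hub.load_state_dict_from_url
from architectures import fornet              # assumed import path for the model classes

net = getattr(fornet, 'EfficientNetB4')()     # same architecture as the checkpoint

# The checkpoint is a dict holding the weights under 'net' plus training state
# (optimizer state, losses, ...), so only the 'net' entry goes to load_state_dict().
model_url = 'https://example.org/EfficientNetB4_DFDC.pth'   # placeholder URL
checkpoint = load_url(model_url, map_location='cpu')
net.load_state_dict(checkpoint['net'])
net.eval()
```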


IItaly commented May 11, 2021

Thank you very much! It seems that I modified it according to the test script.

CrohnEngineer (Collaborator) commented:

Hey @IItaly ,

we noticed just now that in the train_all.sh script we set the number of iterations to 30000.
In the original paper we used 20000 iterations instead, and the performance improvement you have seen might come from this (training the network longer).
While we wait for our internal tests to finish, could you please run another training with --maxiter 20000 instead of 30000?
As soon as we have our results we will write here to compare them with yours.
Bests,

Edoardo


IItaly commented May 11, 2021

Looking forward to your results. I will also try to set 20000 iterations and train it again, but it will take a little time.
