[Bug]: DAAC trained on MultiBinary envs but returns floats when doing inference? #21
Comments
Any update on this? Can you confirm that if I am training on a "MultiBinary(12)" environment, the predicted tensors should all be just 0 and 1, or am I misunderstanding how it's supposed to work? Am I supposed to apply some kind of wrapper or conversion to the tensors that come out of the .pth file?
Edit: It looks like this is kind of working, but the results are fairly bad, so maybe there is still something else I need to do on top of this?
Edit: I tried sample() instead of mean and it seems better. Is this what I should be using? How do I get deterministic actions?
Sorry for the late reply! Actually, for these discrete actions, inference outputs the logits rather than the raw actions. An example is:
And you will get the following output:
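A minimal sketch of how logits like these could be turned into MultiBinary actions, assuming each of the 12 outputs is an independent Bernoulli logit. This is not the maintainer's example and not the library's API: the helper names and the logit values are hypothetical, and it uses only the Python standard library so it runs on its own.

```python
import math
import random

def sigmoid(x):
    """Map a logit to a probability in (0, 1), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def deterministic_action(logits):
    """Pick the most likely bit per dimension: 1 where p > 0.5 (logit > 0)."""
    return [1 if sigmoid(z) > 0.5 else 0 for z in logits]

def sampled_action(logits, rng=random):
    """Draw each bit from Bernoulli(p); this is the stochastic counterpart."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]

# Hypothetical logits for one MultiBinary(12) action (values are made up).
logits = [2.3, -1.7, 0.4, -0.2, 5.0, -4.1, 0.9, -0.6, 1.2, -3.3, 0.1, -0.05]

print(deterministic_action(logits))  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(sampled_action(logits, random.Random(0)))  # varies with the seed
```

With PyTorch tensors, the same idea would presumably be `(logits > 0).int()` for the deterministic choice and `torch.distributions.Bernoulli(logits=logits).sample()` for the stochastic one. Whether the agent already applies one of these internally is something to check against the latest repo code.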
I see, thanks. You can close this then.
Since we are going to publish a formal release soon, we recommend using the latest repo code to get more stable performance.
🐛 Bug
Note: this is on the current pip version; I haven't tried the git repo version.
Also note I have a newer version of gymnasium and matplotlib than this repo specifies:
I'll start out by explaining the shapes in the training env.
The printout of the above is:
SHAPE1 MultiBinary(12)
SHAPE2 Box(0, 1, (8, 12), int8)
The above code snippet is the return value of a function called make_retro_env that I created.
After training using
Note this prints out "SHAPE3 MultiBinary(12)"
When I load the .pth that was automatically saved via the training, using
The tensors look like this:
I'm not sure if I did something wrong, or if perhaps this bug is fixed in the current repo or related to my library versions. If that is the case, let me know.
Thanks.
To Reproduce
No response
Relevant log output / Error message
No response
System Info
- OS: Windows-10-10.0.19045-SP0 10.0.19045
- Python: 3.8.16
- Stable-Baselines3: 2.0.0
- PyTorch: 2.0.0
- GPU Enabled: True
- Numpy: 1.23.5
- Cloudpickle: 2.2.1
- Gymnasium: 0.29.0
- OpenAI Gym: 0.26.2
Checklist