Fix bug in SB3 tutorial ActionMask #1203

dm-ackerman · 2024-05-03T15:49:09Z

Description

SB3ActionMaskWrapper.step() is intended to be compatible with Gymnasium's
interface, where step() returns observation, reward, termination, truncation, info.

This was implemented using the last() function, but last() returns the values for
the current agent (the one about to act), not for the agent that just acted, as
Gymnasium would.

Among other things, this trains on the opponent's reward, encouraging bad play.

The function now returns the reward, termination, truncation, and info values for the
agent that just acted. It still returns the observation for the next agent, since
that observation is used to determine the next action.
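
To illustrate the difference described above, here is a minimal sketch rather than the code from this PR. ToyTwoPlayerEnv is a hypothetical stand-in for a turn-based environment, and both helper functions are names invented for the example. Its attribute names (agent_selection, _cumulative_rewards, terminations, truncations, infos) are assumed to mirror the AEC-style interface the wrapper builds on.

```python
class ToyTwoPlayerEnv:
    """Hypothetical two-player, turn-based environment (not PettingZoo code)."""

    def __init__(self):
        self.agents = ["player_0", "player_1"]
        self.agent_selection = "player_0"
        self._cumulative_rewards = {a: 0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self.truncations = {a: False for a in self.agents}
        self.infos = {a: {} for a in self.agents}

    def observe(self, agent):
        return f"board as seen by {agent}"

    def last(self):
        # Values for the currently selected agent, i.e. the one about to act.
        agent = self.agent_selection
        return (
            self.observe(agent),
            self._cumulative_rewards[agent],
            self.terminations[agent],
            self.truncations[agent],
            self.infos[agent],
        )

    def step(self, action):
        actor = self.agent_selection
        opponent = next(a for a in self.agents if a != actor)
        if action == "win":
            # The acting agent wins, the opponent loses, and the game ends.
            self._cumulative_rewards[actor] = 1
            self._cumulative_rewards[opponent] = -1
            self.terminations = {a: True for a in self.agents}
        # Control passes to the other player.
        self.agent_selection = opponent


def step_with_last(env, action):
    """Old behaviour: every value comes from the agent selected after the step."""
    env.step(action)
    return env.last()


def step_for_acting_agent(env, action):
    """New behaviour: next agent's observation, acting agent's other values."""
    acting_agent = env.agent_selection
    env.step(action)
    next_agent = env.agent_selection
    return (
        env.observe(next_agent),
        env._cumulative_rewards[acting_agent],
        env.terminations[acting_agent],
        env.truncations[acting_agent],
        env.infos[acting_agent],
    )


buggy = step_with_last(ToyTwoPlayerEnv(), "win")
fixed = step_for_acting_agent(ToyTwoPlayerEnv(), "win")
print(buggy[1], fixed[1])  # -1 1
```

In this toy setup the winning move is reported with the loser's reward when last() is used, which is the kind of signal that would push a learner toward bad play.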

Fixes #1147

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have run pytest -v and no errors are present.
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

elliottower · 2024-05-03T21:33:53Z

Good catch, cheers

dm-ackerman added 2 commits May 3, 2024 14:25

update SB3 tutorial tests

409abc0

Some shifting of items from medium to easy with new changes

elliottower approved these changes May 3, 2024

elliottower merged commit 38e2520 into Farama-Foundation:master May 3, 2024
46 checks passed

dm-ackerman deleted the sb3_fix branch May 3, 2024 21:42
