
Docs-fix: Pistonball #1245

Open · anordin95 wants to merge 2 commits into master
Conversation

@anordin95 commented Nov 27, 2024

Description

Remove mentions of the local-reward for Pistonball because the code no longer supports it. I'm guessing the functionality was removed earlier but the documentation was never updated accordingly. See below.

# TODO: this was a bad idea and the logic this uses should be removed at some point
self.local_ratio = 0

Besides that comment, you can run the environment and see that the rewards are constant across all pistons, which shouldn't be possible if there were a local reward component.

[Screenshot: identical reward values printed for every piston agent.]
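As a quick, minimal check (a sketch assuming the PettingZoo parallel API; the reset/step return signatures vary slightly across versions), you can confirm that every piston receives an identical reward at each step:

from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env()
observations, infos = env.reset(seed=42)

for _ in range(10):
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    # With only a global reward, the per-agent reward dict collapses to one value.
    assert len(set(rewards.values())) == 1, rewards

env.close()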

Also, re-arrange and edit some of the phrasing for clarity and easier skimming.

Type of change

  • Bug fix (non-breaking change which fixes an issue) [note: not a code-bug, but docs don't match functionality]
  • This change requires a documentation update

Checklist:

Relevant [note: docs-only change]

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have run pytest -v and no errors are present.
  • I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
  • New and existing unit tests pass locally with my changes

Not relevant

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

…ion because the local-reward was deprecated.
Modify some aspects of the documentation for improved clarity & skimming.
@anordin95 anordin95 marked this pull request as ready for review November 27, 2024 16:35
@anordin95 anordin95 changed the title Remove mentions of local-reward for Pistonball & modify some other aspects for clarity & skimming. Docs-fix: Pistonball Nov 27, 2024
@David-GERARD David-GERARD self-assigned this Dec 4, 2024
@David-GERARD David-GERARD added the documentation Improvements or additions to documentation label Dec 4, 2024
@David-GERARD (Collaborator) commented Dec 4, 2024

Hi @anordin95,

Indeed, the local reward was removed in #486, but the code was never updated beyond this TODO comment.

I can merge the PR as it stands (once the CI checks have passed), but if you'd like to also modify the environment code to completely remove the local reward, do let me know, and I will merge it once you're done with the code update.

Up to you, just let me know, and thank you for your contribution! :)

[Edit]: the connect_four tutorials CI tests have been removed from the latest version, so you can ignore CI workflow failures there.

Comment on lines 620 to 627
+ # A rough, first-order prediction (i.e. velocity-only) of the ball's next position.
+ # The physics environment may bounce the ball off the wall in the next time-step
+ # without us first registering that win-condition.
+ ball_predicted_next_pos = (
+     ball_curr_pos +
+     self.ball.velocity[0] * self.dt
+ )
- if ball_next_x <= self.wall_width + 1:
+ # Include a single-pixel fudge-factor for the approximation.
+ if ball_predicted_next_pos <= self.wall_width + 1:
@anordin95 (Author):

Note this is my guess at the purpose of this logic! Please let me know if it looks wrong.
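For intuition, here is a worked example of that first-order prediction with made-up numbers (illustrative only, not values from the environment):

ball_curr_pos = 42.0   # current x-coordinate of the ball's left edge
velocity_x = -65.0     # horizontal velocity in pixels/second, moving toward the left wall
dt = 1.0 / 20.0        # physics time-step

ball_predicted_next_pos = ball_curr_pos + velocity_x * dt  # 42.0 - 3.25 = 38.75

wall_width = 40
# With the single-pixel fudge-factor, 38.75 <= 41 registers the win-condition
# one step before the physics engine could bounce the ball off the wall.
print(ball_predicted_next_pos <= wall_width + 1)  # True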

@David-GERARD (Collaborator):
Thanks for doing this!

This looks good to me at first glance. Let me know when you're ready for a full review, and I'll tag two contributors who have (way) more experience with pistonball than me for their feedback.

Keep up the good work and thank you for helping improve PettingZoo!

@anordin95 (Author):

Right on! :)

I think it's ready for a full review now.

@anordin95 (Author) commented Dec 4, 2024

P.S. Is there a particular reason the current distance-based reward function, which pays out at every update step, was chosen over, say, a reward of 1 upon touching the left wall and zero at all other times?

@anordin95 anordin95 force-pushed the master branch 2 times, most recently from 78ba747 to 9a43547 on December 4, 2024 at 23:31
@David-GERARD (Collaborator):
@dm-ackerman @benblack769 @RyanNavillus I saw in the git blame that you made extensive contributions to the pistonball environment; if you have the time, would you mind taking a look at this PR?

@jkterry1 since you made the original removal of the local reward in PR #486 I'm also tagging you in case you want/have the time to take a look at it.

@anordin95 anordin95 force-pushed the master branch 4 times, most recently from 70bedbc to 63ff2a7 on December 5, 2024 at 15:10
@RyanNavillus (Contributor) left a comment:

Left a few minor comments. I think my changes to the code were mostly a refactor: I just replaced all of the magic numbers with named variables so that the code was more readable. I think the local reward already didn't work the way we intended back then, or wasn't useful for training, so we partially removed it and never fixed it. I do think it's a good feature to include, but if you're just trying to make the docs consistent, I think it's fine to remove it for now and maybe consider adding it back correctly in the future. Overall this PR looks good to me.

- self.lastX = int(self.ball.position[0] - self.ball_radius)
- self.distance = self.lastX - self.wall_width
+ self.ball_prev_pos = self._get_ball_position()
+ self.distance_to_wall_at_game_start = self.ball_prev_pos - self.wall_width
@RyanNavillus (Contributor):

Suggested change
- self.distance_to_wall_at_game_start = self.ball_prev_pos - self.wall_width
+ self.initial_wall_distance = self.ball_prev_pos - self.wall_width

Maybe a bit more concise?

@anordin95 (Author) commented Dec 8, 2024

Mhm, that is more concise. Though it may be harder for a reader to tell whether "initial" refers to the start of a time-step or the start of the game.

Edit: the variable is also only used once and computed a few hundred lines away, so I think precise context is helpful here.

Comment on lines +635 to +639
reward = (
-1
* (ball_curr_pos - self.ball_prev_pos)
* (100 / self.distance_to_wall_at_game_start)
)
@RyanNavillus (Contributor):

Suggested change
- reward = (
-     -1
-     * (ball_curr_pos - self.ball_prev_pos)
-     * (100 / self.distance_to_wall_at_game_start)
- )
+ reward = (self.ball_prev_pos - ball_curr_pos) * (100 / self.initial_wall_distance)

I think this is still more readable, as long as you keep your comment about how the x axis increases left to right. Another option is this, which makes it clear that 100 is an arbitrary scalar.

Suggested change
- reward = (
-     -1
-     * (ball_curr_pos - self.ball_prev_pos)
-     * (100 / self.distance_to_wall_at_game_start)
- )
+ reward = 100 * ((self.ball_prev_pos - ball_curr_pos) / self.initial_wall_distance)

@anordin95 (Author):

I think there are two changes here.

The first is compressing the multi-line expression onto one line. I think your current suggestion would still end up being reformatted by the linter, though onto fewer lines. For reference, I tried it locally and saw the linter update it like so:

reward = 100 * (
    (self.ball_prev_pos - ball_curr_pos) / self.initial_wall_distance
)

The second is explicitly including the -1 versus flipping the order to previous minus current. I think a delta is generally assumed to be current minus previous for any given quantity (e.g. delta-x, delta-t). I'm a bit wary of breaking that typical, intuitive ordering for the sake of packing in an extra bit of logic to save a line of code.
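For what it's worth, here is a standalone illustration of that sign convention with hypothetical values:

ball_prev_pos = 42.0
ball_curr_pos = 38.75
distance_to_wall_at_game_start = 500.0

delta = ball_curr_pos - ball_prev_pos  # -3.25: the ball moved left, toward the wall
reward = -1 * delta * (100 / distance_to_wall_at_game_start)
print(reward)  # 0.65: leftward progress yields a positive reward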

@RyanNavillus (Contributor):

> P.S. Is there a particular reason the current distance-based reward function, which pays out at every update step, was chosen over, say, a reward of 1 upon touching the left wall and zero at all other times?

I think we want this reward function to be as easy to learn as possible, since pistonball is sort of the cartpole of pettingzoo and used in most tutorials (unless things have changed since I last worked on it). Sparse rewards are harder to learn than dense rewards. That's also why I think it makes sense to include the 100x reward multiplier in the code, because that reward scale probably works with the default hyperparameters for most PPO implementations.

@anordin95 (Author):

> I think we want this reward function to be as easy to learn as possible, since pistonball is sort of the cartpole of pettingzoo and used in most tutorials (unless things have changed since I last worked on it). Sparse rewards are harder to learn than dense rewards. That's also why I think it makes sense to include the 100x reward multiplier in the code, because that reward scale probably works with the default hyperparameters for most PPO implementations.

Ah, right on. Thanks!
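To make the dense-vs-sparse contrast concrete, here is a sketch (not the library's code; the function and parameter names are hypothetical, echoing the variables discussed in this thread):

def dense_reward(ball_prev_pos, ball_curr_pos, initial_wall_distance):
    # A little signal at every step; 100 is an arbitrary scale that happens
    # to suit default PPO hyperparameters.
    return 100 * ((ball_prev_pos - ball_curr_pos) / initial_wall_distance)

def sparse_reward(ball_curr_pos, wall_width):
    # 1 only at the moment the ball reaches the wall, 0 everywhere else;
    # a much harder credit-assignment problem for the learner.
    return 1.0 if ball_curr_pos <= wall_width + 1 else 0.0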

@David-GERARD David-GERARD added this to the 1.24.4 release milestone Dec 9, 2024