Docs-fix: Pistonball #1245
base: master
Conversation
…ion because the local-reward was deprecated. Modify some aspects of the documentation for improved clarity & skimming.
Hi @anordin95, indeed the local reward was removed in #486, but the code was never updated further than this TODO comment. I can merge the PR as it stands (once the CI checks have passed), but if you'd like to also modify the environment code to completely remove the local reward, do let me know, and I will merge it once you're done with the code update. Up to you, just let me know, and thank you for your contribution! :) [Edit]: the connect_four tutorials CI tests have been removed from the latest version, so you can ignore CI workflow failures there.
+ # A rough, first-order prediction (i.e. velocity-only) of the ball's next position.
+ # The physics environment may bounce the ball off the wall in the next time-step
+ # without us first registering that win-condition.
+ ball_predicted_next_pos = (
+     ball_curr_pos +
+     self.ball.velocity[0] * self.dt
+ )
- if ball_next_x <= self.wall_width + 1:
+ # Include a single-pixel fudge-factor for the approximation.
+ if ball_predicted_next_pos <= self.wall_width + 1:
Note this is my guess at the purpose of this logic! Please let me know if it looks wrong.
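For readers skimming the thread, here is a minimal, standalone sketch of what the prediction above is doing. It is not the environment's actual code; the constants (wall_width, dt, the velocity) are made-up values for illustration only.

wall_width = 40          # hypothetical left-wall thickness, in pixels
dt = 1.0 / 20.0          # hypothetical physics time-step, in seconds
ball_curr_pos = 41.5     # hypothetical current x-position of the ball's left edge
ball_x_velocity = -60.0  # hypothetical x-velocity; negative means moving toward the left wall

# First-order (velocity-only) prediction of the ball's next x-position.
ball_predicted_next_pos = ball_curr_pos + ball_x_velocity * dt

# Register the win one step early, with a single-pixel fudge-factor, so the physics
# engine cannot bounce the ball off the wall before the win-condition is noticed.
if ball_predicted_next_pos <= wall_width + 1:
    print("Win condition: ball is predicted to reach the left wall next step.")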
Thanks for doing this!
This looks good to me at first glance. Let me know when you're ready for a full review, and I'll tag two contributors who have (way) more experience with pistonball than me to get their feedback.
Keep up the good work and thank you for helping improve PettingZoo!
Right on! :)
I think it's ready for a full review now.
P.S. Is there a particular reason the current distance-based reward function, which pays out at every update step, was chosen over, say, a reward of 1 upon touching the left wall and zero at all other times?
@dm-ackerman @benblack769 @RyanNavillus I saw in the git blame that you made extensive contributions to the pistonball environment; if you have the time, would you mind taking a look at this PR? @jkterry1 since you made the original removal of the local reward in PR #486, I'm also tagging you in case you want/have the time to take a look at it.
Left a few minor comments. I think my changes to the code were mostly a refactor: I just replaced all of the magic numbers with named variables so that the code was more readable. I think the global reward already didn't work the way we intended back then, or wasn't useful for training, so we partially removed it and never fixed it. I do think it's a good feature to include, but if you're just trying to make the docs consistent, I think it's fine to remove for now and maybe consider adding it back correctly in the future. Overall this PR looks good to me.
- self.lastX = int(self.ball.position[0] - self.ball_radius)
- self.distance = self.lastX - self.wall_width
+ self.ball_prev_pos = self._get_ball_position()
+ self.distance_to_wall_at_game_start = self.ball_prev_pos - self.wall_width
Suggested change:
- self.distance_to_wall_at_game_start = self.ball_prev_pos - self.wall_width
+ self.initial_wall_distance = self.ball_prev_pos - self.wall_width
Maybe a bit more concise?
Mhm that is more concise. Though, it may be harder for a reader to differentiate whether "initial" refers to the start of a time-step or the start of the game.
Edit: the variable is also only used once and computed a few hundred lines away, so I think precise context is helpful here.
reward = (
    -1
    * (ball_curr_pos - self.ball_prev_pos)
    * (100 / self.distance_to_wall_at_game_start)
)
Suggested change:
- reward = (
-     -1
-     * (ball_curr_pos - self.ball_prev_pos)
-     * (100 / self.distance_to_wall_at_game_start)
- )
+ reward = (self.ball_prev_pos - ball_curr_pos) * (100 / self.initial_wall_distance)
I think this is still more readable, as long as you keep your comment about how the x axis increases left to right. Another option is this, which makes it clear that 100 is an arbitrary scalar.
Suggested change:
- reward = (
-     -1
-     * (ball_curr_pos - self.ball_prev_pos)
-     * (100 / self.distance_to_wall_at_game_start)
- )
+ reward = 100 * ((self.ball_prev_pos - ball_curr_pos) / self.initial_wall_distance)
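For what it's worth, a quick standalone check (my own sketch with made-up numbers, not repository code) showing that the original expression and both suggested rewrites compute the same value:

ball_prev_pos = 300.0          # hypothetical ball x-position on the previous step
ball_curr_pos = 297.5          # hypothetical ball x-position on the current step
initial_wall_distance = 500.0  # i.e. distance_to_wall_at_game_start

original = -1 * (ball_curr_pos - ball_prev_pos) * (100 / initial_wall_distance)
option_1 = (ball_prev_pos - ball_curr_pos) * (100 / initial_wall_distance)
option_2 = 100 * ((ball_prev_pos - ball_curr_pos) / initial_wall_distance)

assert abs(original - option_1) < 1e-12 and abs(original - option_2) < 1e-12
print(original)  # 0.5: a positive reward because the ball moved toward the wall

So the choice between them is purely about readability, not behaviour.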
I think there are two changes here.
The first is compressing the multi-line expression to one line. I think your current suggestion would still end up being re-wrapped by the linter, though onto fewer lines. For reference, I tried it locally and saw the linter update it like so:
reward = 100 * (
    (self.ball_prev_pos - ball_curr_pos) / self.initial_wall_distance
)
The second is explicitly including the -1 versus flipping the order of current - previous. I think a delta is generally assumed to be current minus previous for any given quantity (e.g. delta-x, delta-t). I'm a bit wary of modifying that typical, intuitive ordering for the sake of packing an extra bit of logic in to save a line of code.
I think we want this reward function to be as easy to learn as possible, since pistonball is sort of the cartpole of pettingzoo and is used in most tutorials (unless things have changed since I last worked on it). Sparse rewards are harder to learn than dense rewards. That's also why I think it makes sense to include the 100x reward multiplier in the code, because that reward scale probably works with the default hyperparameters for most PPO implementations.
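To make the trade-off being discussed concrete, here is a small illustrative sketch of the two schemes (my own sketch; the function names and thresholds are assumptions, not the environment's actual API):

def dense_reward(ball_prev_pos, ball_curr_pos, initial_wall_distance):
    # Reward every time-step in proportion to progress made toward the left wall,
    # scaled so the rewards accumulated over an episode are on the order of 100.
    return 100 * (ball_prev_pos - ball_curr_pos) / initial_wall_distance

def sparse_reward(ball_curr_pos, wall_width):
    # Reward only the single time-step on which the ball touches the left wall.
    return 1.0 if ball_curr_pos <= wall_width else 0.0

The dense version gives the pistons a learning signal on every step, which is why it is generally easier to optimize than the sparse version.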
Ah, right on. Thanks!
Description
Remove mentions of the local-reward for Pistonball because the code no longer supports it. I'm guessing the functionality was removed earlier but the documentation was not updated accordingly. See below.
PettingZoo/pettingzoo/butterfly/pistonball/pistonball.py, lines 239 to 240 (at d6bd110)
Besides that comment, you can run the environment and see that the rewards are constant across all pistons, which shouldn't be possible if there were a local reward component.
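For example, something along these lines can be used to check (a rough sketch; the pistonball_v6 version suffix and the 5-tuple step return assume a recent PettingZoo release and may need adjusting for older versions):

from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env()
env.reset(seed=42)

for _ in range(10):
    # Sample a random action for every piston and step the environment.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    # With no local-reward component, every piston receives the same reward,
    # so this set should always contain exactly one value.
    print(set(rewards.values()))

env.close()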
Also, re-arrange and edit some of the phrasing for clarity and easier skimming.
Type of change

Checklist:

- relevant [note: docs-only change]
- I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
- I have run pytest -v and no errors are present.
- I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
- not-relevant