Add num_rounds and num_chips to poker envs #1109

Closed
wants to merge 21 commits into from


jjshoots
Member

Description

This allows the poker envs to run multiple rounds per episode, with each agent's number of chips persisting across rounds for the whole episode.

This is accomplished through the num_rounds and num_chips parameters, which build on the multi-episode wrappers (MultiEpisodeEnv).
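To illustrate the intended behaviour, here is a minimal sketch of chips persisting across rounds. It is not the code in this PR and does not use the PettingZoo API: `play_rounds`, `round_rewards`, and the agent names are hypothetical, and it assumes a round's reward equals the net number of chips won or lost in that round.

```python
from typing import Dict, List


def play_rounds(
    num_rounds: int,
    num_chips: float,
    round_rewards: List[Dict[str, float]],
) -> Dict[str, float]:
    """Tally chips across rounds (hypothetical helper, not PettingZoo API).

    round_rewards[i] maps each agent to its reward in round i. Assumption:
    a reward is the net number of chips won or lost in that round.
    """
    agents = sorted(round_rewards[0]) if round_rewards else []
    # Every agent starts the episode with the same number of chips.
    chips = {agent: float(num_chips) for agent in agents}
    active = set(agents)

    for rewards in round_rewards[:num_rounds]:
        # Stop early once fewer than two agents can still play.
        if len(active) < 2:
            break
        for agent in list(active):
            chips[agent] += rewards.get(agent, 0.0)
            # An agent with no chips left sits out the remaining rounds.
            if chips[agent] <= 0:
                active.discard(agent)
    return chips


final = play_rounds(
    num_rounds=3,
    num_chips=100,
    round_rewards=[
        {"player_0": 10.0, "player_1": -10.0},
        {"player_0": -30.0, "player_1": 30.0},
        {"player_0": 5.0, "player_1": -5.0},
    ],
)
print(final)  # {'player_0': 85.0, 'player_1': 115.0}
```

In this sketch an agent that loses all of its chips is dropped from later rounds, so the episode can end before num_rounds rounds have been played.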

@@ -13,21 +13,30 @@ class MultiEpisodeEnv(BaseWrapper):
When there are no more valid agents in the underlying environment, the environment is automatically reset.
After `num_episodes` have been run internally, the environment terminates normally.
The result of this wrapper is that the environment is no longer Markovian around the environment reset.

When `starting_utility` is used, all agents start with a base amount of health points (think of this as poker chips).
Contributor

This is an interesting way to do it, but I'm not sure this is the best name for it. I can't really think of anything else besides starting_reward. Maybe it could be total_rewards, to indicate that the rewards are tracked across resets, or tally_rewards. starting_utility doesn't mean anything to me unless I already know how it works. It would make sense and be simple if it were total_rewards, which is added to or subtracted from each round.

Member Author

I think utility makes more sense: it is a pretty standard way of saying "starting amount of meaningful substance". starting_reward can be confused with a reward given to the agent for starting a new episode, while tally_rewards sounds like it should be a boolean.

Contributor

@elliottower elliottower left a comment

Looks mostly good, but I will test it locally a bit as well to see what the values are. Since you start with 100 chips, and losing a round after going all in gives you a reward of -100, I believe the reward should just represent the number of chips you lost. But I'm not 100% sure that's how the rewards for Texas Hold'em work in our envs.

@elliottower
Contributor

OK, so it looks like the rewards are +raised chips/2 for the winner and -raised chips/2 for the loser. I'm not sure where the divide by 2 comes from. I'm assuming "raised chips" is the amount that you raised, or maybe it's the total pot divided by 2? What if there are more than 2 players? Maybe you could look into this and see if it makes sense, or whether RLCard's paper or website says anything about it.

@jjshoots
Member Author

Welp this was a dumb idea, closing.

@jjshoots jjshoots closed this Sep 27, 2023
@elliottower
Contributor

(We’re going to do this specific to poker instead, as it sort of breaks other envs when you remove agents on reset)
