Add num_rounds and num_chips to poker envs #1109
…includes betting (can be sure rendering works)
@@ -13,21 +13,30 @@ class MultiEpisodeEnv(BaseWrapper):
When there are no more valid agents in the underlying environment, the environment is automatically reset.
After `num_episodes` have been run internally, the environment terminates normally.
The result of this wrapper is that the environment is no longer Markovian around the environment reset.

When `starting_utility` is used, all agents start with a base amount of health points (think of this as poker chips).
This is an interesting way to do it, but I'm not sure this is the best name for it. I can't really think of anything else besides `starting_reward` or something. Maybe it could be `total_rewards`, to indicate that it makes the rewards track between resets, or `tally_rewards`. `starting_utility` doesn't mean anything to me if I don't know how it works. It would make sense and be simple if `total_rewards` adds or subtracts by round.
I think utility makes more sense; it is a pretty standard way of saying "starting amount of meaningful substance". `starting_reward` can be confused with the reward given to the agent for starting a new episode, while `tally_rewards` sounds like it should be a boolean.
Looks mostly good, but I will test it locally a bit as well to see what the values are. I believe that since the number of chips you start with is 100, and you get -100 as reward if you lose a round when you go all in, the reward should just represent the number of chips you lost. But I'm not 100% sure that's how the rewards for Texas Hold'em work in our envs.
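To make that hypothesis concrete, here is a minimal sketch, assuming each round's reward equals the net number of chips won or lost in that round (which is exactly the part not yet confirmed for the Texas Hold'em envs). `tally_chips` is a hypothetical helper written for illustration; it is not part of PettingZoo.

```python
from typing import Dict, List


def tally_chips(
    starting_chips: int, round_rewards: List[Dict[str, float]]
) -> Dict[str, float]:
    """Carry chip stacks across rounds, treating each reward as a chip delta.

    Assumption: reward == net chips won or lost in that round.
    """
    stacks: Dict[str, float] = {}
    for rewards in round_rewards:
        for agent, reward in rewards.items():
            # An agent seen for the first time starts from the base stack.
            stacks[agent] = stacks.get(agent, starting_chips) + reward
    return stacks


# Both players start with 100 chips; player_0 goes all in and loses.
stacks = tally_chips(100, [{"player_0": -100, "player_1": 100}])
print(stacks)
```

If the assumption holds, a -100 reward after an all-in loss leaves that player with an empty stack, and the total number of chips at the table stays constant.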
Ok so looks like the rewards are
Welp this was a dumb idea, closing.
(We’re going to do this specific to poker instead, as it sort of breaks other envs when you remove agents on reset)
Description
This allows the poker envs to run multiple rounds and keep a persistent number of chips for the total episode.
This is accomplished through the `num_rounds` and `num_chips` parameters, and bootstraps off the MultiEpisodeWrappers.
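As a rough illustration of the behaviour described here, the sketch below plays up to `num_rounds` rounds while carrying chip stacks between them. It is a standalone toy, not the PR's implementation: `play_rounds`, `all_in_round`, and the rule that players with no chips sit out are all assumptions made for the example, and nothing here calls the PettingZoo API.

```python
from typing import Callable, Dict, List


def play_rounds(
    agents: List[str],
    num_rounds: int,
    num_chips: int,
    play_round: Callable[[Dict[str, int]], Dict[str, int]],
) -> Dict[str, int]:
    """Play up to num_rounds rounds, keeping chip stacks between rounds."""
    chips = {agent: num_chips for agent in agents}
    for _ in range(num_rounds):
        # Assumption: only agents that still have chips join the next round.
        active = {a: c for a, c in chips.items() if c > 0}
        if len(active) < 2:
            break  # not enough players left for another round
        for agent, delta in play_round(active).items():
            chips[agent] += delta
    return chips


def all_in_round(active: Dict[str, int]) -> Dict[str, int]:
    """Toy round: the first active player wins the smallest stack from each opponent."""
    winner = next(iter(active))
    stake = min(active.values())
    deltas = {a: -stake for a in active if a != winner}
    deltas[winner] = stake * (len(active) - 1)
    return deltas


final = play_rounds(
    ["player_0", "player_1"], num_rounds=3, num_chips=100, play_round=all_in_round
)
print(final)
```

In this toy run the episode ends after one round even though `num_rounds=3`, because only one player has chips left. Dropping players between rounds like this is poker-specific bookkeeping, which is one reason to keep it out of a generic wrapper.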