Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actor and distribution function should not be specified independently #1194

Closed
opcode81 opened this issue Aug 7, 2024 · 1 comment
Closed
Assignees
Labels
enhancement Feature that is not a new algorithm or an algorithm enhancement refactoring No change to functionality

Comments

@opcode81
Copy link
Collaborator

opcode81 commented Aug 7, 2024

Currently, the distribution function (e.g. for PPO) and the actor are specified independently, yet there is obviously a very strong connection: The nature of the actor's output determines the distribution function to apply.

This can cause errors that are hard to debug, especially when the input to the distribution function can be interpreted but its semantics are skewed.

Therefore, I propose to create a stronger link, which we can already implement at least in the high-level API:
The ActorFactory, which knows the nature of the actor's output, shall also create the matching distribution function, eliminating the issue. In turn, the dist_fn parameter can be entirely removed from the high-level API (breaking change).

@opcode81 opcode81 added enhancement Feature that is not a new algorithm or an algorithm enhancement refactoring No change to functionality labels Aug 7, 2024
@opcode81 opcode81 changed the title Actor outputs and distribution functions should be specified independently Actor outputs and distribution functions should *not* be specified independently Aug 7, 2024
@opcode81 opcode81 changed the title Actor outputs and distribution functions should *not* be specified independently Actor and distribution function should not be specified independently Aug 7, 2024
@opcode81 opcode81 self-assigned this Aug 7, 2024
opcode81 added a commit that referenced this issue Aug 7, 2024
distribution function (dist_fn) used in policies by creating
the distribution function in the actor factory which knows which
function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level API
because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not
specify the 'dist_fn' parameter, but persisted objects continue to work
as expected]

Implements #1194
opcode81 added a commit that referenced this issue Aug 7, 2024
distribution function (dist_fn) used in policies by creating
the distribution function in the actor factory which knows which
function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level API
because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not
specify the 'dist_fn' parameter, but persisted objects continue to work
as expected]

Implements #1194
opcode81 added a commit that referenced this issue Aug 7, 2024
distribution function (dist_fn) used in policies by creating
the distribution function in the actor factory which knows which
function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level API
because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not
specify the 'dist_fn' parameter, but persisted objects continue to work
as expected]

Implements #1194
opcode81 added a commit that referenced this issue Aug 7, 2024
distribution function (dist_fn) used in policies by creating
the distribution function in the actor factory which knows which
function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level API
because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not
specify the 'dist_fn' parameter, but persisted objects continue to work
as expected]

Implements #1194
opcode81 added a commit that referenced this issue Aug 7, 2024
distribution function (dist_fn) used in policies by creating
the distribution function in the actor factory which knows which
function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level API
because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not
specify the 'dist_fn' parameter, but persisted objects continue to work
as expected]

Implements #1194
MischaPanch added a commit that referenced this issue Aug 8, 2024
…tribution function (#1195)

Establish a strong link between the actor and the distribution function
(dist_fn) used in policies by creating the distribution function in the
actor factory which knows which function is appropriate.

Consequently, remove the policy parameter 'dist_fn' from the high-level
API because it is determined automatically, eliminating the possibility
of misspecification by the user. [breaking change: code must not specify
the 'dist_fn' parameter, but persisted objects continue to work as
expected]

Implements #1194

- [X] I have added the correct label(s) to this Pull Request or linked
the relevant issue(s)
- [X] I have provided a description of the changes in this Pull Request
- [X] I have added documentation for my changes and have listed relevant
changes in CHANGELOG.md
- [X] If applicable, I have added tests to cover my changes.
- [X] I have reformatted the code using `poe format` 
- [X] I have checked style and types with `poe lint` and `poe
type-check`
- [ ] (Optional) I ran tests locally with `poe test` 
(or a subset of them with `poe test-reduced`) ,and they pass
- [ ] (Optional) I have tested that documentation builds correctly with
`poe doc-build`
@MischaPanch
Copy link
Collaborator

Closed by #1195

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature that is not a new algorithm or an algorithm enhancement refactoring No change to functionality
Projects
None yet
Development

No branches or pull requests

2 participants