critic activation function may be wrong #13

yuh8 · 2021-11-18T04:30:38Z

A tanh activation is used for the critic output, which limits the output range to between -1 and 1. However, self.reward, which is used as the ground truth for critic training, is the discounted cumulative sum of all rewards and can easily exceed 1. Is tanh a sensible activation for the critic output, given the target used in the loss?
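To make the concern concrete, here is a minimal sketch in plain Python. The reward sequence, `gamma`, and the helper `discounted_returns` are hypothetical values and names chosen for illustration, not taken from the repository. It shows how quickly a discounted return leaves the (-1, 1) range, and the squared error that remains for any tanh-bounded prediction.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t (hypothetical helper)."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Assumed toy episode: a reward of 1.0 at each of 5 steps.
rewards = [1.0] * 5
targets = discounted_returns(rewards, gamma=0.99)

# tanh saturates at 1, so a tanh-activated critic cannot predict more than 1.
saturated_output = math.tanh(10.0)

# Lower bound on the squared error for each target when predictions lie in [-1, 1].
min_squared_errors = [max(0.0, abs(g) - 1.0) ** 2 for g in targets]

for g, err in zip(targets, min_squared_errors):
    print(f"target={g:.4f}  minimum squared error with tanh output={err:.4f}")
```

Under these assumptions, only the final step's target (exactly 1.0) is reachable. Every earlier target lies outside the tanh range, so the loss for those steps cannot go to zero however long the critic trains. A linear output, or rescaling the returns into (-1, 1), would avoid this.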
