
Actor loss function #4

Open

SimonBurmer opened this issue Feb 8, 2021 · 0 comments


SimonBurmer commented Feb 8, 2021

Hi Coac, I really like your BicNet implementation! My goal is to run it on an environment where every agent gets a -1 reward for each time step it takes to finish the episode. But there seems to be a problem with your actor loss implementation: since the actor loss is defined as the (negated) prediction of the critic, the reward needs to converge to zero when the agents perform perfectly, doesn't it?

loss_actor = -self.critic(state_batches, clear_action_batches).mean()

Can you explain why you implemented it this way? Also, is there a way to handle a reward that does not converge to 0 even when the agents perform well (like in the environment I mentioned above)?
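
For reference, here is a minimal sketch of the DDPG-style actor update that the quoted line corresponds to (a self-contained PyTorch example; the Actor/Critic architectures, layer sizes, and the random batch below are placeholders for illustration, not the repository's actual code):

import torch
import torch.nn as nn

# Illustrative actor: maps a state to a continuous action in [-1, 1].
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

# Illustrative critic: estimates Q(state, action) as a single scalar.
class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

state_dim, action_dim = 8, 2
actor = Actor(state_dim, action_dim)
critic = Critic(state_dim, action_dim)
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

state_batch = torch.randn(32, state_dim)            # placeholder batch of states
actions = actor(state_batch)                        # actions from the current actor
loss_actor = -critic(state_batch, actions).mean()   # actor maximizes the critic's Q estimate

actor_optimizer.zero_grad()
loss_actor.backward()                               # gradient flows through the critic into the actor
actor_optimizer.step()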
