Average reward after learning a strategy #15
Comments
For online solvers like POMCP, unfortunately I don't think there is a better way to evaluate than through Monte Carlo simulations (the parallel simulator is the best for that: https://juliapomdp.github.io/POMDPSimulators.jl/latest/parallel/#Parallel-1).
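A minimal sketch of that kind of Monte Carlo evaluation with the parallel simulator, assuming a hypothetical `MyGame()` problem and illustrative settings; `Sim` and `run_parallel` are the interface described in the POMDPSimulators documentation linked above:

```julia
using POMDPs, POMDPSimulators, BasicPOMCP
using Random, Statistics, DataFrames

pomdp   = MyGame()                                          # placeholder problem definition
planner = solve(POMCPSolver(tree_queries=10_000), pomdp)

# Queue many independent simulations, each with its own RNG seed.
sims = [Sim(pomdp, planner; max_steps=100, rng=MersenneTwister(i)) for i in 1:500]

# Run them on all available workers; the result is a DataFrame with one row
# per simulation and the discounted return in the :reward column.
df = run_parallel(sims)

println("mean discounted reward: ", mean(df.reward))
println("standard error:         ", std(df.reward) / sqrt(length(df.reward)))
```

The standard error gives a rough sense of how many simulations are needed before the estimate of the average reward stops moving.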
If the solver is not tuned properly, it can produce highly variable results (or the game may just be too big for POMCP to handle reliably). Did you tune the exploration parameter and the value estimate/rollout policy to something reasonable for the game?
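For reference, a sketch of that kind of tuning using the `POMCPSolver` keyword arguments; the numbers are illustrative only, not recommendations for any particular game:

```julia
using POMDPs, BasicPOMCP

# Illustrative values only; c should be scaled to the magnitude of your rewards
# and max_depth to the horizon over which decisions actually matter.
solver = POMCPSolver(
    tree_queries = 10_000,   # tree search iterations per action selection
    c            = 10.0,     # UCB exploration constant
    max_depth    = 50,       # maximum search/rollout depth
    # estimate_value can replace the default random rollout with a
    # domain-specific rollout policy or heuristic value estimate.
)

planner = solve(solver, pomdp)   # `pomdp` is a placeholder for your problem
```

A poorly scaled `c` or an uninformative rollout policy is a common reason for large run-to-run variation in the simulated reward.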
Also, I would strongly advocate for trying to solve the simplest and smallest possible version of your problem first before trying to move to a realistic size.
Also, how do I construct an
I'll try to explain how I use BasicPOMCP. Also, I did not tune the solver's parameters; I'll try that. Thank you for all of your advice, it helped a lot.
Hello, I used BasicPOMCP to find an optimal strategy in a fairly large game. I used the example to run 10000 tree queries, but even though I can see the tree, I am mostly interested in the average reward. I know there is the simulate function; however, the results from this method seem to vary more than I expect (though maybe running n simulations and averaging them is a good solution).
Simply put, is it possible to get the average reward immediately after the solver solves the game?
Thank you for the response.
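A minimal sketch of the "run n simulations and average" idea mentioned above, using the serial RolloutSimulator from POMDPSimulators; `pomdp` and the settings are placeholders:

```julia
using POMDPs, POMDPSimulators, BasicPOMCP, Statistics

planner = solve(POMCPSolver(tree_queries=10_000), pomdp)   # `pomdp` is a placeholder

# Each call to simulate runs one episode and returns its discounted return;
# repeating and averaging reduces the Monte Carlo variance.
sim = RolloutSimulator(max_steps=100)
rewards = [simulate(sim, pomdp, planner) for _ in 1:200]

println("mean discounted reward: ", mean(rewards))
println("standard deviation:     ", std(rewards))
```

Note that `solve` only constructs the planner; POMCP builds its search tree online at decision time, so the average reward has to be estimated with simulations like these rather than read off the solver.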