Average reward after learning a strategy #15

Open
kubicon opened this issue Feb 15, 2020 · 4 comments

Comments

@kubicon

kubicon commented Feb 15, 2020

Hello, I used BasicPOMCP to find an optimal strategy in a fairly large game. Following the example, I ran 10000 tree queries, and while I can see the tree, I am mostly interested in the average reward. I know there is the `simulate` function, but the results from that method seem to vary more than I expect (though maybe running n simulations and averaging is a good solution, as in the sketch below).
Simply put: is it possible to get the average reward immediately after the solver solves a game?
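Something like this is what I mean by averaging (a rough sketch; `pomdp`, `planner`, and `up` stand for my problem, the POMCP planner, and a belief updater):

```julia
using POMDPs, POMDPSimulators
using Statistics: mean, std

# run n independent rollouts with the solved planner and average the discounted rewards
n = 100
sim = RolloutSimulator(max_steps=100)
rewards = [simulate(sim, pomdp, planner, up) for _ in 1:n]
println("mean reward: ", mean(rewards), " ± ", std(rewards) / sqrt(n))
```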

Thank you for your response.

@zsunberg
Member

For online solvers like POMCP, unfortunately I don't think there is a better way to evaluate than through Monte Carlo simulations (the parallel simulator is the best for that: https://juliapomdp.github.io/POMDPSimulators.jl/latest/parallel/#Parallel-1).
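For concreteness, a sketch of the parallel approach (untested here; `pomdp` and `planner` are whatever you constructed for your game):

```julia
# start Julia with `julia -p N` or call Distributed.addprocs(N) to use N worker processes
using POMDPs, POMDPSimulators
using Statistics: mean
using Random: MersenneTwister

# queue up many independent simulations; run_parallel distributes them over the workers
queue = [Sim(pomdp, planner, max_steps=100, rng=MersenneTwister(i)) for i in 1:500]
df = run_parallel(queue)  # returns a DataFrame with one row (and a :reward entry) per simulation
println("mean reward: ", mean(df[!, :reward]))
```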

> i feel like results from this method vary more than i expect

If the solver is not tuned properly, it can produce highly variable results (or the game may simply be too big for POMCP to handle reliably). Did you tune the exploration parameter and the value estimate/rollout policy to something reasonable for the game?
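For reference, these are the knobs I mean (the numbers are placeholders, and `MyRolloutPolicy` is a stand-in for a problem-specific heuristic policy):

```julia
using POMDPs, BasicPOMCP

solver = POMCPSolver(
    tree_queries = 10_000,
    max_depth = 30,                                 # deep enough for rewards to show up in rollouts
    c = 10.0,                                       # UCB exploration constant; scale with reward magnitudes
    estimate_value = FORollout(MyRolloutPolicy()),  # fully-observable rollout with a domain heuristic
)
planner = solve(solver, pomdp)
```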

@zsunberg
Member

Also, I would strongly advocate solving the simplest and smallest possible version of your problem first, before moving to a realistic size.

@zsunberg
Member

Also, how do I construct an `OriginalGame` with reasonable values? Do you have a default constructor written by now?

@kubicon
Author

kubicon commented Feb 16, 2020

I'll try to explain how I use BasicPOMCP.
I have a pursuit-evasion game for two players, where one player can fully observe the game and the other only partially. I also have an algorithm that should solve the game for the imperfect-information player (player 1) against the perfect-information player (player 2). My goal now is to check, with player 1's strategy held fixed, whether the strategy taken for player 2 was really the best response (basically, to check whether the current algorithm works as intended).
So in BasicPOMCP, I use information from the original game (hence the name `OriginalGame`) together with the strategy calculated by the solving algorithm. At first I wanted to make three structs: one for the original game, a second for the result of the algorithm, and a third combining only the parameters needed by the solver. Partway through I decided to use just one struct and didn't bother changing the name.
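Roughly, the combined struct is shaped like this (a simplified sketch with illustrative field names, not the exact code from the attachment):

```julia
using POMDPs

# one struct carrying both the original game description and the fixed strategy
# from the solving algorithm, exposed to BasicPOMCP as a single POMDP
struct OriginalGame{S,A,O} <: POMDP{S,A,O}
    # ... fields describing the original game (graph, rewards, ...)
    fixed_strategy::Dict{S,A}  # precomputed strategy for the fixed player
    discount::Float64
end

POMDPs.discount(g::OriginalGame) = g.discount
# transition, observation, reward, etc. are then implemented on top of these fields
```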
I hope this is explanatory enough; I will attach what I have right now (in a workable state).
My Julia skills are not great: this is my first project in the language, and it is not even the main area of my work at the moment.

Also, I did not tune the parameters of the solver. I'll try that; thank you for all of your advice, it helped a lot.
PS: I added a small README to explain what every file does and commented on how I expect the game to work. As before, it is still a work in progress, so there are a lot of things that don't work optimally.
BasicPOMCPSolving.zip
