Average reward after learning a strategy #15
Comments
For online solvers like POMCP, unfortunately I don't think there is a better way to evaluate than through Monte Carlo simulations (the parallel simulator is the best for that: https://juliapomdp.github.io/POMDPSimulators.jl/latest/parallel/#Parallel-1).
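A minimal sketch of that kind of Monte Carlo evaluation with the parallel simulator, assuming a hypothetical `MyGame()` problem and illustrative settings; `Sim` and `run_parallel` are the interface described in the POMDPSimulators documentation linked above:

```julia
using POMDPs, POMDPSimulators, BasicPOMCP
using Random, Statistics, DataFrames

pomdp   = MyGame()                                          # placeholder problem definition
planner = solve(POMCPSolver(tree_queries=10_000), pomdp)

# Queue many independent simulations, each with its own RNG seed.
sims = [Sim(pomdp, planner; max_steps=100, rng=MersenneTwister(i)) for i in 1:500]

# Run them on all available workers; the result is a DataFrame with one row
# per simulation and the discounted return in the :reward column.
df = run_parallel(sims)

println("mean discounted reward: ", mean(df.reward))
println("standard error:         ", std(df.reward) / sqrt(length(df.reward)))
```

The standard error gives a rough sense of how many simulations are needed before the estimate of the average reward stops moving.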
If the solver is not tuned properly, it can produce highly variable results (or the game may just be too big for POMCP to handle reliably). Did you tune the exploration parameter and the value estimate/rollout policy to something reasonable for the game?
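For reference, a sketch of that kind of tuning using the `POMCPSolver` keyword arguments; the numbers are illustrative only, not recommendations for any particular game:

```julia
using POMDPs, BasicPOMCP

# Illustrative values only; c should be scaled to the magnitude of your rewards
# and max_depth to the horizon over which decisions actually matter.
solver = POMCPSolver(
    tree_queries = 10_000,   # tree search iterations per action selection
    c            = 10.0,     # UCB exploration constant
    max_depth    = 50,       # maximum search/rollout depth
    # estimate_value can replace the default random rollout with a
    # domain-specific rollout policy or heuristic value estimate.
)

planner = solve(solver, pomdp)   # `pomdp` is a placeholder for your problem
```

A poorly scaled `c` or an uninformative rollout policy is a common reason for large run-to-run variation in the simulated reward.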
Also, I would strongly advocate for trying to solve the simplest and smallest possible version of your problem first before trying to move to a realistic size.
Also, how do I construct an
I'll try to explain how I use BasicPOMCP. Also, I did not tune the solver's parameters; I'll try that. Thank you for all of your advice, it helped a lot.
Hello, I used BasicPOMCP to find an optimal strategy in a fairly large game. I used the example to run 10000 tree queries, but even though I can see the tree, I am mostly interested in the average reward. I know there is the simulate function; however, the results from this method seem to vary more than I expect (though maybe running n simulations and averaging them is a good solution).
Simply put, is it possible to get the average reward immediately after the solver solves the game?
Thank you for the response.
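A minimal sketch of the "run n simulations and average" idea mentioned above, using the serial RolloutSimulator from POMDPSimulators; `pomdp` and the settings are placeholders:

```julia
using POMDPs, POMDPSimulators, BasicPOMCP, Statistics

planner = solve(POMCPSolver(tree_queries=10_000), pomdp)   # `pomdp` is a placeholder

# Each call to simulate runs one episode and returns its discounted return;
# repeating and averaging reduces the Monte Carlo variance.
sim = RolloutSimulator(max_steps=100)
rewards = [simulate(sim, pomdp, planner) for _ in 1:200]

println("mean discounted reward: ", mean(rewards))
println("standard deviation:     ", std(rewards))
```

Note that `solve` only constructs the planner; POMCP builds its search tree online at decision time, so the average reward has to be estimated with simulations like these rather than read off the solver.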