Goal: Implementation of a fast-learning deep neural network with general applicability and especially strong Angry Birds playing performance.
The idea originated from the Angry Birds AI Competition.
Key Features • Installation • Usage • Troubleshooting • Acknowledgements • Bibliography • License •
## Key Features

- Reinforcement learning framework with the following features (a short sketch of the resulting learning target follows this feature list):
- Annealing Epsilon-Greedy Policy
- Dueling Networks
- Prioritized Experience Replay
- Double Q-Learning
- Frame-stacking
- n-step Expected Sarsa
- Distributed RL
- Sequential Learning (for LSTMs and other RNNs)
- Monte Carlo Return Targets
- Environments:
- Angry Birds, with level generator
- Snake
- Tetris
- Chain Bomb (invented)
- Environment API: quickly switch between environments without needing to change the model
- Extensive utility library:
- Plotting: >40 different ways to compare several training run metrics (score, return, time etc.) on different domains (transitions, episodes, wall-clock time, etc.)
- Permanence: save and reload entire training runs, including model weights and statistics
- Pre-training: unsupervised model pre-training on randomly generated environment data
- Train sample control: sample states with extreme loss during training are logged and plotted automatically
- and much more
- Frame stacking
- Angry Birds environment parallelization
- Handling for (practically) infinite episodes
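To make the interplay of several of these features concrete, here is a minimal, illustrative sketch of how an n-step Double Q-Learning bootstrap target can be computed. This is not the repository's actual code; the names `online_q`, `target_q` and the argument layout are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the repository's code): n-step Double
# Q-Learning target. `online_q` and `target_q` are assumed to be callables
# returning Q-value arrays of shape (batch, num_actions).
import numpy as np

def n_step_double_q_target(rewards, bootstrap_obs, dones, online_q, target_q,
                           gamma=0.999, n_step=1):
    """rewards: (batch, n_step) rewards r_t ... r_{t+n-1}
    bootstrap_obs: observations at step t+n used for bootstrapping
    dones: (batch,) flags, 1 if the episode ended within the n steps"""
    # Discounted sum of the n intermediate rewards
    discounts = gamma ** np.arange(n_step)                      # (n_step,)
    n_step_return = (rewards * discounts).sum(axis=1)           # (batch,)
    # Double Q-Learning: the online network selects the greedy action,
    # the target network evaluates it (reduces overestimation bias).
    best_actions = np.argmax(online_q(bootstrap_obs), axis=1)   # (batch,)
    bootstrap_values = target_q(bootstrap_obs)[
        np.arange(len(best_actions)), best_actions]             # (batch,)
    # Do not bootstrap past terminal transitions
    return n_step_return + (gamma ** n_step) * bootstrap_values * (1.0 - dones)
```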
## Installation

Just clone the repo; that's it.
## Usage

Just tune and run the agent from `src/train.py`. You can let the agent practice, observe how it plays, view plot statistics, and more. All generated output (models, plots, statistics, etc.) will be saved in `out/`.
Parameter | Reasonable value | Explanation | If too low, then | If too high, then
---|---|---|---|---
**General** | | | |
`num_parallel_inst` | 500 | Number of simultaneously executed environments | Training overhead dominates computation time | Possibly worse sample complexity, GPU or RAM out of memory
`num_parallel_steps` | 1000000 | Number of transitions done per parallel environment | Learning stops before agent performance is optimal | Wasted energy, overfitting
`policy` | "greedy" | The policy used for planning ("greedy" for max Q-value, "softmax" for random choice of softmaxed Q-values) | - | -
**Model input** | | | |
`stack_size` | 1 | Number of recent frames to be stacked for input, useful for envs with time dependency like Breakout | Agent has "no feeling for time", bad performance on envs with time dependency | Unnecessary computation overhead
**Learning target** | | | |
`gamma` | 0.999 | Discount factor | Short-sighted strategy, early events dominate return | Far-sighted strategy, late events dominate return, target shift, return explosion
`n_step` | 1 | Number of steps used for Temporal Difference (TD) bootstrapping | |
`use_mc_return` | False | If True, uses the Monte Carlo return instead of n-step TD | |
**Model** | | | |
`latent_dim` | 128 | Width of latent layer of the stem model | Model cannot learn the game entirely or makes slow training progress | Huge number of (unused) model parameters
`latent_depth` | 1 | Number of consecutive latent layers | Model cannot learn the game entirely or makes slow training progress | Many (unused) model parameters
`lstm_dim` | 128 | Width of LSTM layer | Model cannot learn the game entirely, slow training progress, or model is bad at remembering | Many (unused) model parameters
`latent_v_dim` | 64 | Width of latent layer of the value part of the Q-network | Similar to `latent_dim` | Similar to `latent_dim`
`latent_a_dim` | 64 | Width of latent layer of the advantage part of the Q-network | Similar to `latent_dim` | Similar to `latent_dim`
**Replay (training)** | | | |
`optimizer` | tf.Adam | The TensorFlow optimizer to use | - | -
`mem_size` | 4000000 | Number of transitions that fit into the replay memory | Overfitting to recent observations | RAM out of memory; too old transitions in replays, which for RNNs can lead to recurrent state staleness due to representational drift; large computation overhead due to replay sampling
`replay_period` | 64 | Number of (parallel) steps between each training session of the learner | Training overhead dominates computation time | Slow training progress
`replay_size_multiplier` | 4 | Determines replay size by multiplying the number of new transitions with this factor | Too strong focus on new observations, overfitting | Too weak focus on new observations, slow training progress
`replay_batch_size` | 1024 | Batch size used for learning, depends on GPU | GPU parallelization not used effectively | GPU out of memory
`replay_epochs` | 1 | Number of epochs per replay | Wasted training data, slow progress | Overfitting
`min_hist_len` | 0 | Minimum number of observed transitions before training is allowed | Unstable training or overfitting in the beginning | Wasted time
`alpha` | 0.7 | Prioritized experience replay exponent, controls the effect of priority | Priorities have low/no effect on training, slower training progress | Too strong priority emphasis, overfitting
`use_double` | True | Whether to use Double Q-Learning to tackle the moving target issue | - | -
`target_sync_period` | 256 | Number of (parallel) steps between synchronizations of online learner and target learner | Moving target problem | Slow training progress
`actor_sync_period` | 64 | Distributed RL: number of (parallel) steps between synchronizations of (online) learner and actor | ? | Slow training progress
**Learning rate** | | | |
`init_value` | 0.0004 | Starting value of the learning rate | Slow training progress | No training progress, bad maximum performance, or unstable training
`half_life_period` | 4000000 | Determines learning rate decay; number of played transitions after which the learning rate halves | Learning rate decreases too quickly | Learning rate stays too large
`warmup_transitions` | 0 | Number of episodes used for (linear) learning rate warm-up | Unstable training | Slow training progress
**Epsilon** | | | |
`init_value` | 1 | Starting value for the epsilon-greedy policy | Too little exploration, slow training progress | Too much exploration, slow training progress
`decay_mode` | "exp" | Shape of the epsilon value function over time | - | -
`half_life_period` | 700000 | Determines epsilon annealing; number of played transitions after which epsilon halves | Similar to `init_value` | Similar to `init_value`
`minimum` | 0 | Limit to which epsilon converges when decaying over time | Not enough long-term exploration, missed late-game opportunities for better performance | Similar to `init_value`
**Sequential training (for RNNs)** | | | |
`sequence_len` | 20 | Length of sequences saved in the replay buffer and used for learning | Slow training progress | Wasted computation and memory resources
`sequence_shift` | 10 | Number of transitions by which sequences are allowed to overlap | Few sequences, some time dependencies might not be captured | Too many similar sequences, overfitting
`eta` | 0.9 | For sequential learning: determines sequence priority. 0: sequence priority = average instance priority; 1: sequence priority = max instance priority | ? | ?
**Other** | | | |
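For orientation, the "reasonable values" above can be collected into a single configuration. The grouping and dictionary layout below are purely illustrative and may differ from the actual interface in `src/train.py`; the helper at the end sketches the exponential half-life decay described for the learning rate and epsilon.

```python
# Illustrative configuration assembled from the table's "reasonable values".
# The grouping/keys are assumptions; src/train.py may organize them differently.
hyperparams = {
    "general": {"num_parallel_inst": 500,
                "num_parallel_steps": 1000000,
                "policy": "greedy"},
    "model_input": {"stack_size": 1},
    "learning_target": {"gamma": 0.999, "n_step": 1, "use_mc_return": False},
    "model": {"latent_dim": 128, "latent_depth": 1, "lstm_dim": 128,
              "latent_v_dim": 64, "latent_a_dim": 64},
    "replay": {"optimizer": "tf.Adam", "mem_size": 4000000,
               "replay_period": 64, "replay_size_multiplier": 4,
               "replay_batch_size": 1024, "replay_epochs": 1,
               "min_hist_len": 0, "alpha": 0.7, "use_double": True,
               "target_sync_period": 256, "actor_sync_period": 64},
    "learning_rate": {"init_value": 0.0004, "half_life_period": 4000000,
                      "warmup_transitions": 0},
    "epsilon": {"init_value": 1, "decay_mode": "exp",
                "half_life_period": 700000, "minimum": 0},
    "sequential": {"sequence_len": 20, "sequence_shift": 10, "eta": 0.9},
}

def half_life_decay(step, init_value, half_life_period, minimum=0.0):
    """Exponential decay as described in the table: the value halves every
    `half_life_period` played transitions and converges to `minimum`."""
    return max(minimum, init_value * 0.5 ** (step / half_life_period))

# Example: epsilon after 1.4 million transitions with the values above
eps = half_life_decay(1400000, init_value=1, half_life_period=700000, minimum=0)
# eps == 0.25
```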
## Contributing

- Fork it!
- Create your feature branch: `git checkout -b my-new-feature`
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin my-new-feature`
- Submit a pull request :D
## Troubleshooting

- Symptom: level objects don't show up in Science Birds after a level was loaded. The actual problem is that the objects spawn outside of the level boundaries. The reason turned out to be the OS's language/unit configuration. In my case the system language is de_DE, for which the decimal separator is a comma rather than a point (e.g. 2,7). Unity3D uses this system setting for its coordinate system, so coordinates like x=2.5, y=8.4 could not be interpreted correctly.
- Solution: start Science Birds with the locale en_US.UTF-8 so that Unity3D uses points for floats instead of commas, or (on Windows) set the OS's region to English (U.S.). A small launcher sketch follows below.
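On Linux, the locale can be forced when launching the game. The snippet below is only a sketch of that workaround; the executable name `ScienceBirds.x86_64` is a placeholder and depends on your Science Birds build.

```python
# Sketch of the Linux workaround: launch Science Birds with an English locale
# so Unity3D parses floats with a decimal point instead of a comma.
# The executable name below is a placeholder for your actual build.
import os
import subprocess

env = os.environ.copy()
env["LANG"] = "en_US.UTF-8"
env["LC_ALL"] = "en_US.UTF-8"
subprocess.Popen(["./ScienceBirds.x86_64"], env=env)
```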
## Acknowledgements

- The team behind Science Birds for a good framework