# Teaching an AI how to play the classic game Snake

In this article we are going to use reinforcement learning (RL) to teach a computer to play the classic game Snake (remember the good old Nokia phones?). The game is implemented from scratch in Python, including a visualization with PySDL2. We are going to use TensorFlow to implement the actor-critic algorithm, which is then used to learn to play the game. We will show that, with moderate effort, an agent can be trained that plays the game reasonably well.

Figure 1: The agent playing the game on a 10x10 board for 500 steps.

Snake is the name for a series of video games in which the player controls a growing "snake"-like line. According to Wikipedia, the game concept dates back to 1976, and because it is so easy to implement (but still fun!) a ton of different implementations exist for nearly every computer platform. Many of you will probably know Snake from Nokia phones: Nokia started putting a variant of Snake onto their mobile phones in 1998, which brought a lot of new attention to the game. I, for myself, have to admit to having spent too much time trying to feed that little snake on my Nokia 6310.

The gameplay is simple: The player controls a dot, square, or something similar in a 2D world. As it moves, it leaves a tail behind, resembling a snake. Usually the length of the tail depends on the amount of food the snake has eaten. As the goal is to eat as much food as possible to increase your score, the length of the snake keeps growing. The player loses when the snake runs into itself or into the screen border.

The game can easily be implemented in a few lines of Python code, and when you throw in a couple more, you can even make a simple visualization in PySDL2. We therefore implemented the game ourselves, as relying on other implementations, or even just evaluating them, might have taken more time than doing it ourselves.

We modified the traditional behavior of an automatically moving snake to a snake that only moves one step when it receives the next action. This may make the game boring for a human player, but it saves the boilerplate code of discretizing the game output into separate observations again. We believe the agent is fast enough to also handle that time constraint, should you really want to use it in the self-moving variant.

In our implementation, exactly one piece of food is placed randomly on an empty tile of the game field. The initial length of the snake is one tile; therefore, the maximum score for a given field size of $N_x \times N_y$ is $N_x N_y - 1$.

In addition to the game itself, it is necessary to encapsulate the game in an environment suitable for machine learning: We need to be able to tell the game what to do (which step to perform next), and we need a way to get an observation describing the current game state. It is necessary to have four discrete actions to control the snake:

- move up
- move down
- move left
- move right

Determining the best state representation for our agent took more experimenting: The first approach was to encode all tiles of the field, together with their type, in a one-dimensional tensor. While this did work well, we soon found out that it is not easily generalizable, because the observation space grows or shrinks with the size of the game board, so one cannot train a single model and reuse it on other board sizes.

Then we came up with a different idea: restrict the snake's visibility range to a certain number of tiles around its head. This reduces the state space dramatically, speeds up learning, and lifts the restriction to a fixed field size. We decided, arbitrarily, to restrict its view to four tiles in each of the possible movement directions (for an example, see the blue tiles in Figure 2).

Figure 2: The agent playing the game on a 16x16 board. The tiles in the agent's view are colored in blue.
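As a minimal sketch of how such a restricted view could be encoded; the tile types, the one-hot encoding, and the `observe` helper are illustrative assumptions rather than our exact implementation:

```python
import numpy as np

# Tile types (illustrative assumption; the exact encoding is not shown here).
EMPTY, FOOD, BODY, WALL = 0, 1, 2, 3

def observe(board, head, view_range=4):
    """Encode the tiles visible to the snake into a flat observation vector.

    `board` is a 2D array of tile types and `head` the (x, y) position of
    the snake's head. The snake looks `view_range` tiles into each of the
    four movement directions; tiles beyond the border count as walls.
    """
    nx, ny = board.shape
    visible = []
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):  # up, down, left, right
        for step in range(1, view_range + 1):
            x, y = head[0] + dx * step, head[1] + dy * step
            visible.append(board[x, y] if 0 <= x < nx and 0 <= y < ny else WALL)
    # One-hot encode the 16 visible tiles -> observation vector of length 64.
    return np.eye(4, dtype=np.float32)[visible].ravel()
```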
For the rewards per step, we first tried a scheme with a slight penalty for every step without eating and

- -100 when game over (hitting a wall or itself).

With these rewards, most of the time the agent just tried to stay on the same spot, being "afraid" of hitting a wall or itself, because that penalty was much larger than the slight penalty of not eating. We therefore reduced the game-over penalty to

- -1 when game over (hitting a wall or itself).

## Training of the agent

After a bit of experimentation, we decided on a model with one hidden layer of 512 neurons. We were able to use a learning rate of $10^{-3}$ throughout the whole training without running into instabilities. The discount factor was set to $\gamma = 0.995$. Usually we stopped playing an episode when the agent reached 200 steps; in that case, the episode ended without a negative reward.

We trained the model in four phases: In the first run, we used a 4x4 field and trained for 100k episodes to see if the agent was improving. Next, we continued training on the same field size for an additional 500k episodes.
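As a rough sketch of such a model in TensorFlow; only the layer size and learning rate come from the text above, while the shared actor-critic trunk, the ReLU activation, and the choice of Adam are assumptions:

```python
import tensorflow as tf

num_actions = 4  # up, down, left, right
obs_dim = 64     # 16 visible tiles, one-hot encoded over 4 tile types

# One hidden layer with 512 neurons feeding two heads: the actor outputs
# a policy over the four actions, the critic estimates the state value.
inputs = tf.keras.Input(shape=(obs_dim,))
hidden = tf.keras.layers.Dense(512, activation="relu")(inputs)
policy_logits = tf.keras.layers.Dense(num_actions)(hidden)  # actor head
state_value = tf.keras.layers.Dense(1)(hidden)              # critic head
model = tf.keras.Model(inputs, [policy_logits, state_value])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
```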
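And a hedged sketch of a corresponding episodic actor-critic update using the stated discount factor; the return computation is standard, but the loss weighting and helper names are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

GAMMA = 0.995  # discount factor used above

def discounted_returns(rewards, gamma=GAMMA):
    """Compute the discounted return G_t for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def train_step(model, optimizer, observations, actions, rewards):
    """One actor-critic update on the data of a single finished episode."""
    obs = tf.constant(np.asarray(observations, dtype=np.float32))
    acts = tf.constant(actions, dtype=tf.int32)
    returns = tf.constant(discounted_returns(rewards))
    with tf.GradientTape() as tape:
        logits, values = model(obs)
        advantage = returns - tf.squeeze(values, axis=-1)
        # -log pi(a_t | s_t), weighted by the (non-differentiated) advantage
        neg_log_probs = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=acts, logits=logits)
        actor_loss = tf.reduce_mean(neg_log_probs * tf.stop_gradient(advantage))
        critic_loss = tf.reduce_mean(tf.square(advantage))
        loss = actor_loss + 0.5 * critic_loss  # weighting is an assumption
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)
```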