Reinforcement learning I – Temporal difference learning


After I started working with reward-modulated STDP in spiking neural networks, I got curious about the research it is based on. This led me to the book “Reinforcement Learning” by Richard Sutton and Andrew Barto. The book is from 1998 and it’s freely readable on the internet! In the book’s introduction they walk through the example of an agent learning to beat a given (imperfect) opponent in the game of Tic Tac Toe. Two remarks have to be made: 1. The opponent has to be imperfect because a perfect Tic Tac Toe player can never be beaten, no matter who makes the first move; perfect play by both sides always ends in a draw. 2. The agent does not learn the rules of Tic Tac Toe, which are assumed to be known. Instead, it learns a value function, an estimate of how likely each board position is to lead to a win, and bases its moves on it.
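The value-learning idea from the book’s Tic Tac Toe example can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the code from my GitHub repository: the learner plays X and moves first, the imperfect opponent simply picks random legal moves, and the names `TDAgent`, `play_game` and `train` as well as the parameter values (`alpha=0.1`, `epsilon=0.1`) are hypothetical. Following the book, every board starts with a value of 1 for a win, 0 for a loss or draw and 0.5 otherwise. After each greedy move, the value of the board left by the agent’s previous move is nudged toward the value of the new board: V(s) ← V(s) + α[V(s') − V(s)]. Exploratory moves do not trigger this update.

```python
import random

# Hypothetical sketch, not the code from the post's GitHub repository.
# A board is a tuple of 9 cells holding 'X', 'O' or ' '.
# The learning agent plays 'X' and moves first; 'O' is a random (imperfect) opponent.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
EMPTY = (' ',) * 9


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']


def place(board, i, mark):
    return board[:i] + (mark,) + board[i + 1:]


class TDAgent:
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.alpha = alpha      # step size of the TD update
        self.epsilon = epsilon  # probability of an exploratory (random) move
        self.values = {}        # value table: board -> estimated probability of winning
        self.last = None        # board right after the agent's previous move

    def value(self, board):
        # Initial estimates: 1 for a win, 0 for a loss or a draw, 0.5 otherwise.
        if board not in self.values:
            w = winner(board)
            if w == 'X':
                self.values[board] = 1.0
            elif w == 'O' or not legal_moves(board):
                self.values[board] = 0.0
            else:
                self.values[board] = 0.5
        return self.values[board]

    def update(self, s, s_next):
        # TD(0) backup: V(s) <- V(s) + alpha * (V(s') - V(s))
        v = self.value(s)
        self.values[s] = v + self.alpha * (self.value(s_next) - v)

    def move(self, board, rng):
        moves = legal_moves(board)
        if rng.random() < self.epsilon:
            # Exploratory move: no learning from it.
            nxt = place(board, rng.choice(moves), 'X')
        else:
            # Greedy move: pick the resulting board with the highest value ...
            nxt = max((place(board, i, 'X') for i in moves), key=self.value)
            # ... and back that value up to the board after our previous move.
            if self.last is not None:
                self.update(self.last, nxt)
        self.last = nxt
        return nxt

    def end(self, final_board):
        # Back up the final outcome, e.g. a loss caused by the opponent's last move.
        if self.last is not None:
            self.update(self.last, final_board)
        self.last = None


def play_game(agent, rng):
    """Play one game against the random opponent; return 'X', 'O' or None (draw)."""
    board, turn = EMPTY, 'X'
    while winner(board) is None and legal_moves(board):
        if turn == 'X':
            board = agent.move(board, rng)
        else:
            board = place(board, rng.choice(legal_moves(board)), 'O')
        turn = 'O' if turn == 'X' else 'X'
    agent.end(board)
    return winner(board)


def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    agent = TDAgent()
    for _ in range(episodes):
        play_game(agent, rng)
    return agent
```

After `agent = train()`, setting `agent.epsilon = 0.0` and `agent.alpha = 0.0` freezes the learned value table so you can watch the purely greedy policy play against the random opponent.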

Since I liked the example and wanted to try it out myself, I decided to write this blog post about it. By the way, the code can be found on GitHub (run …


My favorite quotes by Albert Einstein

A quote by Albert Einstein from “Ideas and Opinions”, found on Brain Pickings:

In this sense I have never looked upon ease and happiness as ends in themselves — this ethical basis I call the ideal of a pigsty. The ideals which have lighted my way, and time after time have given me new courage to face life cheerfully, have been Kindness, Beauty, and Truth.