You're reading from Reinforcement Learning with TensorFlow A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym

Product type Paperback

Published in Apr 2018

Publisher Packt

ISBN-13 9781788835725

Length 334 pages

Edition 1st Edition

Languages

Python

Tools

OpenAI Gym

Concepts

Reinforcement Learning

Author (1):

Dutta

The SARSA algorithm

The State–Action–Reward–State–Action (SARSA) algorithm is an on-policy learning algorithm. Like Q-learning, SARSA is a temporal difference learning method: it looks one step ahead in the episode to estimate future rewards. The major difference between the two is that SARSA does not use the action with the maximum Q-value in the next state to update the Q-value of the current state-action pair. Instead, it uses the Q-value of the action actually selected in the next state by the current policy, which may be an exploratory action under a scheme such as ε-greedy. The name SARSA comes from the fact that the Q-value update uses the quintuple (s, a, r, s', a'), where:

s,a: current state and action
r: reward observed after taking action a
s': next state reached after taking action a
a': action to be performed at state s'
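
To make the difference between the two updates concrete, here is a minimal tabular sketch. It is illustrative only and is not the book's implementation. The function names, the list-of-lists Q-table, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions. The SARSA update bootstraps from Q(s', a') for the action a' that the policy actually chose, while the Q-learning update bootstraps from the maximum Q-value at s'.

```python
import random


def epsilon_greedy(Q, state, n_actions, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action a_next chosen at s_next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update, for contrast: bootstrap from the max at s_next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Hypothetical toy Q-tables: 2 states x 2 actions, identical starting values.
Q_sarsa = [[0.0, 0.0], [1.0, 5.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 5.0]]

# Suppose the policy explored at s' = 1 and picked the non-greedy action a' = 0.
sarsa_value = sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0)
qlearn_value = q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1)

print(sarsa_value, qlearn_value)
```

With the same transition and reward, the SARSA estimate for (s, a) moves toward a target built from the exploratory action's value, whereas the Q-learning estimate moves toward a target built from the best action's value. This is why SARSA's estimates reflect the behaviour of the policy being followed, exploration included.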

The steps involved in the SARSA algorithm are as follows: