Chapter 5, Picking up the Toys
- The Q-learning name originates from the doctoral thesis of Christopher John Cornish Hellaby Watkins at King’s College, Cambridge, in May 1989. Evidently, the Q just stands for “quantity”.
- Only pick the Q-states that are valid follow-ons to the current state. If a state is impossible to reach from the current position (state), then don’t consider it in the update.
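As a sketch of how this can be done (the R matrix and the -1 marker for unreachable moves are illustrative, not taken from the book's listing):

    import numpy as np

    # Hypothetical reward matrix: R[state, action] == -1 marks moves that are
    # impossible from that state, so they are never considered in the update.
    R = np.array([[-1,  0, -1],
                  [ 0, -1, 10],
                  [-1,  0, -1]])
    Q = np.zeros(R.shape)

    def valid_actions(state):
        # Return only the actions that are actually reachable from this state.
        return np.where(R[state] >= 0)[0]

    def best_next_q(state):
        # Take the max of Q over the reachable follow-on states only.
        acts = valid_actions(state)
        return Q[state, acts].max() if acts.size else 0.0

    print(valid_actions(0))   # only action 1 is a legal follow-on from state 0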
- If the learning rate is too small, training can take a very long time. If it is too large, the system does not learn a path but instead “jumps around”: it may miss the minimum or optimum solution, fail to converge, or suddenly drop off.
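Note that the example program's learning rate appears to be on a different scale (see the valid values quoted later in this list); for comparison only, here is the conventional tabular update, where a learning rate alpha between 0 and 1 blends the old estimate with the new target (a minimal sketch; the variable names and values are illustrative):

    import numpy as np

    alpha = 0.1    # learning rate: too small -> very slow; too large -> estimates jump around
    gamma = 0.93   # discount factor

    Q = np.zeros((6, 6))           # illustrative 6-state, 6-action table
    stat, action, stat2 = 0, 1, 1  # a single hypothetical transition
    reward = 10.0

    # Standard temporal-difference update: move the old estimate a fraction
    # alpha of the way toward the new target.
    Q[stat, action] += alpha * (reward + gamma * np.max(Q[stat2]) - Q[stat, action])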
- The discount factor works by decreasing the reward as the path length gets longer. It is usually a value just short of 1.0 (for example, 0.93). Raising the discount factor may cause the system to reject valid longer paths and not find a solution; if the discount factor is too small, paths may become very long.
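A quick way to see the discounting effect is to print what a reward received n steps away is worth now, assuming it is simply multiplied by gamma at each step:

    gamma = 0.93  # discount factor just below 1.0

    # The reward that finally arrives n steps away is worth gamma**n now,
    # so longer paths earn less credit.
    for n in (1, 5, 10, 20):
        print(n, round(gamma ** n, 3))
    # prints roughly: 1 0.93, 5 0.696, 10 0.484, 20 0.234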
- You would adjust the fitness function to include path length as a factor in the score.
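A minimal sketch of what that could look like; the names toys_collected and path_length and the weights are hypothetical, not taken from the book's program:

    def fitness(toys_collected, path_length, length_weight=0.5):
        # Reward each toy picked up, then subtract a penalty that grows with
        # the length of the path, so shorter solutions score higher.
        return toys_collected * 10.0 - length_weight * path_length

    print(fitness(6, 40), fitness(6, 80))   # the shorter path scores higher: 40.0 vs 20.0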
- You can implement the SARSA technique in program 2 as follows:
    # SARSA = State, Action, Reward, State, Action
    Q[lastStat, lastAction] = reward + gamma * Q[stat2, action]
    # Q[stat, action] = reward + gamma * np.max(Q[stat2])
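For context, here is a self-contained sketch of the full on-policy SARSA loop on a small placeholder environment (this chain world and its parameters are illustrative, not program 2's maze). The key difference from Q-learning is that the update uses the Q-value of the action actually chosen next, rather than the max over all actions:

    import numpy as np

    n_states, n_actions = 6, 2            # toy chain world: action 0 = left, action 1 = right
    gamma, alpha, epsilon = 0.93, 0.1, 0.1
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    def step(state, action):
        # Each move costs -1; reaching the last state pays +10 and ends the episode.
        nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        return nxt, (10.0 if nxt == n_states - 1 else -1.0)

    def choose(state):
        # Epsilon-greedy: SARSA learns the value of the policy it actually follows.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[state]))

    for episode in range(200):
        stat, action = 0, choose(0)
        for _ in range(100):                      # cap the episode length
            stat2, reward = step(stat, action)
            action2 = choose(stat2)               # pick the next action first
            # SARSA uses Q[stat2, action2] here, where Q-learning uses np.max(Q[stat2]).
            Q[stat, action] += alpha * (reward + gamma * Q[stat2, action2] - Q[stat, action])
            stat, action = stat2, action2
            if stat == n_states - 1:
                break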
- Generally, increasing the learning rate shortens the learning time (measured in generations), up to a limit where the path jumps out of the valid range. For our example program, the lowest learning rate that returns a valid solution is 5, and the highest is 15.
- It causes the simulation to run much faster, but it takes many more generations to find a solution.