Chapter 1 – Become an Adaptive Thinker
1. Is reinforcement learning memoryless? (Yes | No)
The answer is yes. Reinforcement learning is memoryless: the agent calculates the next state without looking into the past. This is significantly different from humans, who rely heavily on memory. A CPU-based reinforcement learning system finds solutions without drawing on experience. Human intelligence merely proves that intelligence can solve a problem, no more, no less. An adaptive thinker can then imagine new forms of machine intelligence.
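As a minimal sketch of what "memoryless" means in practice, the following snippet (with a hypothetical three-state transition table) picks the next state from the current state alone; the path that led to the current state plays no role.

    import random

    # Hypothetical transition table: from each state, the possible next states.
    transitions = {
        "A": ["B", "C"],
        "B": ["A", "C"],
        "C": ["A", "B"],
    }

    def next_state(current_state):
        # Only the current state is consulted; the history of past states is never used.
        return random.choice(transitions[current_state])

    state = "A"
    for _ in range(5):
        state = next_state(state)
        print(state)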
2. Does reinforcement learning use stochastic (random) functions? (Yes | No)
The answer is yes. In the particular Markov Decision Process model used here, the choices are random. After just two questions, you can see that the Bellman equation is memoryless and makes random decisions. No human reasons like that. Becoming an adaptive thinker is a leap of faith: you will have to leave who you were behind and begin to think in terms of equations.
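To make the stochastic aspect concrete, here is a hedged, purely illustrative sketch of random action selection, the kind of choice an MDP-based agent makes at each step (the action names are hypothetical):

    import random

    # Hypothetical actions available to the agent in its current state.
    possible_actions = ["move_up", "move_down", "move_left", "move_right"]

    # The stochastic part: the next action is drawn at random, not deduced from
    # a human-style chain of reasoning or a rule base.
    for step in range(3):
        print(step, random.choice(possible_actions))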
3. Is the Markov Decision Process based on a rule base? (Yes | No)
The answer is no. Human rule-based experience is useless in this process. Furthermore, the Markov Decision Process provides an efficient alternative to long consulting sessions with future users who cannot clearly express their problem.
4. Is the Q function based on the Markov Decision Process? (Yes | No)
The answer is yes. The expression "Q" appeared around the time the Bellman equation, which is based on the Markov Decision Process, came into fashion. It is trendier to say you are using a Q function than to speak about Bellman, who put all of this together in 1957. The truth is that the Russian mathematician Andrey Markov applied this method in 1913, using a dataset of 20,000 letters from a novel to predict the future use of letters, and later extended it to a dataset of 100,000 letters. In other words, the theory was there over 100 years ago. Q fits our new world of impersonal and powerful CPUs.
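To make the connection concrete, here is a minimal, hedged sketch of one Bellman-style update of the Q function (the reward matrix, the state and action indices, and the gamma value are assumptions chosen for illustration, not the exact program of this chapter):

    import numpy as np

    gamma = 0.8  # assumed discount factor

    # Hypothetical 3x3 reward matrix R and Q value matrix, indexed by (state, action).
    R = np.array([[-1.,  0., 100.],
                  [ 0., -1., 100.],
                  [ 0.,  0.,  -1.]])
    Q = np.zeros((3, 3))

    # One Bellman-style update: Q(s, a) = R(s, a) + gamma * max over a' of Q(s', a').
    s, a = 0, 2   # current state and the action taken
    s_next = a    # in this toy model, taking action a moves the agent to state a
    Q[s, a] = R[s, a] + gamma * Q[s_next].max()
    print(Q)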
5. Is mathematics essential to artificial intelligence? (Yes | No)
The answer is yes. If you master the basics of linear algebra and probability, you will be on top of all the technology that is coming. It is worth spending a few months on these subjects in the evenings, or taking a MOOC. Otherwise, you will depend on others to explain things to you.
6. Can the Bellman-MDP process in this chapter apply to many problems? (Yes | No)
The answer is yes. You can use this for robotics, market analysis, IoT, linguistics, and scores of other problems.
7. Is it impossible for a machine learning program to create another program by itself? (Yes | No)
The answer is no. It is not impossible: Google has already done it with AutoML. Do not be surprised. Now that you have become an adaptive thinker and know that these systems rely on equations, not humans, you can easily understand that mathematical systems are not that difficult to reproduce.
8. Is a consultant required to enter business rules in a reinforcement learning program? (Yes | No)
The answer is no. It is only an option. Reinforcement learning with the MDP process is memoryless and random. Consultants are there to manage, explain, and provide training on these projects.
9. Is reinforcement learning supervised or unsupervised? (Supervised | Unsupervised)
The answer is unsupervised. The whole point is to learn from unlabeled data. If the data is labeled, then we enter the world of supervised learning, which searches for patterns and learns them. At this point, you can easily see that you are at sea in an adventure: a memoryless, random, and unlabeled world for you to discover.
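A purely illustrative way to see the difference (the observations and labels below are hypothetical):

    # Unsupervised learning receives only the observations.
    unlabeled_data = [[1.2, 0.7], [0.3, 2.1], [1.9, 0.2]]

    # Supervised learning receives each observation paired with a label to learn from.
    labeled_data = [([1.2, 0.7], "cat"), ([0.3, 2.1], "dog"), ([1.9, 0.2], "cat")]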
10. Can Q Learning run without a reward matrix? (Yes | No)
The answer is no. A smart developer could always find a way around this, of course, but the system requires a starting point: the reward matrix. You will see in the second chapter that finding the right reward matrix in real-life projects is quite a task.
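As a hedged sketch of why the matrix is the starting point, here is a hypothetical reward matrix of the kind used with Q-learning: -1 marks an impossible move, 0 an allowed move, and 100 a move that reaches the goal. Without it, the agent would not even know which moves exist.

    import numpy as np

    # Hypothetical 5x5 reward matrix (rows = current states, columns = target states).
    R = np.array([
        [-1,  0, -1, -1,   0],
        [ 0, -1,  0, -1,  -1],
        [-1,  0, -1,  0,  -1],
        [-1, -1,  0, -1, 100],
        [ 0, -1, -1,  0,  -1],
    ])

    state = 3
    allowed_moves = np.where(R[state] >= 0)[0]
    print(allowed_moves)  # the only moves the agent may explore from state 3: [2 4]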