Logistic activation functions and classifiers
Now that each location in L={l1,l2,l3,l4,l5,l6} has its availability stored in a vector, the locations can be sorted from the most available to the least available. From there, the reward matrix for the Markov decision process (MDP) described in the first chapter can be built.
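The sorting step can be sketched in a few lines of Python. This is only an illustration: the availability values and the helper name `rank_locations` are assumptions made for the example, not figures or code from the chapter.

```python
# Hypothetical availability values for l1..l6 (illustrative only,
# not taken from the chapter's results).
availability = {
    "l1": 0.42,
    "l2": 0.91,
    "l3": 0.15,
    "l4": 0.67,
    "l5": 0.58,
    "l6": 0.30,
}


def rank_locations(avail):
    """Return location names sorted from most to least available."""
    return sorted(avail, key=avail.get, reverse=True)


ranking = rank_locations(availability)
print(ranking)
```

A ranking like this one is a possible starting point for deciding which locations should receive the highest rewards when the reward matrix is built.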
Overall architecture
At this point, the overall architecture contains two main components:
- Chapter 1: Become an Adaptive Thinker: A reinforcement learning program based on the action-value Q function, using a reward matrix that is yet to be calculated. The reward matrix was given in the first chapter, but in real life, you'll often have to build it from scratch, which can take weeks.
- Chapter 2: A set of six neurons that represent the flow of products at a given time at six locations. Each output is an availability probability from 0 to 1: the higher the value, the more available the location.
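The second component can be pictured as follows. This is a minimal sketch, assuming each of the six neurons applies the logistic sigmoid to a single pre-activation value. The input values and the names `logistic` and `availability_layer` are hypothetical and chosen only for illustration.

```python
import math


def logistic(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def availability_layer(pre_activations):
    """Apply the logistic function to one pre-activation value per location."""
    return [logistic(z) for z in pre_activations]


# Hypothetical pre-activation values for l1..l6 (assumed, not from the text).
z_values = [-2.0, -1.0, 0.0, 0.5, 1.5, 3.0]
outputs = availability_layer(z_values)

for name, y in zip(["l1", "l2", "l3", "l4", "l5", "l6"], outputs):
    print(f"{name}: {y:.3f}")
```

Because the logistic function is monotonic, a larger pre-activation always yields a larger availability, so sorting the outputs preserves the order of the underlying values.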
At this point, there is some real-life...