At this point, the overall architecture contains two main components:
- Chapter 1, Become an Adaptive Thinker: A reinforcement learning program based on the action-value Q function, using a reward matrix that is yet to be calculated. The reward matrix was given in the first chapter, but in real life, you'll often have to build it from scratch, which could take weeks.
- Chapter 2: A set of six neurons that represent the flow of products at a given time at six locations. Each output is an availability probability from 0 to 1: the higher the value, the higher the availability.
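To make the two components concrete, here is a minimal sketch of how they might look side by side. It is an illustration under stated assumptions, not the book's program: the names `q_update` and `availability`, the discount factor of 0.8, the reward values, the flow values, the use of a logistic sigmoid for the neurons, and the convention that taking an action moves the agent to the state with the same index are all hypothetical choices made for this example.

```python
import math

GAMMA = 0.8  # assumed discount factor, chosen only for illustration
N = 6        # six locations, as described above

def q_update(Q, R, state, action, gamma=GAMMA):
    """One Q-function update: immediate reward plus the discounted best
    value reachable from the next state (assumed to equal `action`)."""
    Q[state][action] = R[state][action] + gamma * max(Q[action])
    return Q

def availability(weighted_flows):
    """Map each location's weighted flow to a value strictly between 0 and 1
    with a logistic sigmoid (an assumed activation for this sketch)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in weighted_flows]

# Hypothetical reward matrix: in real life this is what has to be built.
R = [[0.0] * N for _ in range(N)]
R[0][1] = 1.0
R[1][2] = 100.0

Q = [[0.0] * N for _ in range(N)]
q_update(Q, R, state=1, action=2)
q_update(Q, R, state=0, action=1)

# Hypothetical weighted product flows at the six locations.
flows = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]
probs = availability(flows)

# The location with the highest output is the most available one.
best_location = max(range(N), key=lambda i: probs[i])
print(best_location, [round(p, 3) for p in probs])
```

The point of the sketch is the division of labor: the six outputs describe availability at each location, while the Q function still depends on a reward matrix that someone has to supply.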
There is some real-life information we can draw from these two main components: