What is the key point about reinforcement learning in a strong simulator setting?
a. We don’t learn the transition or reward model, but directly learn what to do when.
b. The agent cannot teleport to any state and is restricted to the states it reaches by acting from its current state.
c. The agent can jump to any state and start simulating from there.
d. The agent learns both the optimal policy and the state values.
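
The options above contrast two ways an agent can access a simulator: stepping forward only from its current state, or querying an arbitrary state directly. The sketch below illustrates that difference on a toy problem. Everything in it is a hypothetical assumption made for illustration and does not come from the question: the `ChainMDP` class, the five-state chain, the 0.9 success probability, and the method names `reset`, `step`, and `sample`.

```python
import random


class ChainMDP:
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left."""

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)
        self.state = 0

    def _transition(self, state, action):
        # Assumed dynamics: the intended move succeeds with probability 0.9,
        # otherwise the agent stays where it is.
        step = 1 if action == 1 else -1
        if self.rng.random() < 0.9:
            state = min(max(state + step, 0), self.n_states - 1)
        # Assumed reward: 1 for landing in the rightmost state, else 0.
        reward = 1.0 if state == self.n_states - 1 else 0.0
        return state, reward

    # Episodic access: the agent can only continue from its current state.
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state, reward = self._transition(self.state, action)
        return self.state, reward

    # Arbitrary-state access: query any (state, action) pair directly,
    # without touching the agent's current position.
    def sample(self, state, action):
        return self._transition(state, action)


env = ChainMDP()
env.reset()

# Estimate the expected reward of action 1 in state 3 without walking there.
samples = [env.sample(3, 1) for _ in range(1000)]
avg_reward = sum(r for _, r in samples) / len(samples)
print(f"estimated reward for (state=3, action=1): {avg_reward:.2f}")
```

With only `reset` and `step`, the agent would first have to reach state 3 by acting before it could gather the same samples.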