The truth is, as we apply the previous methods to decision-making problems, there comes a time when the problems get very large. Some problems become so large that we can no longer represent them in computer memory. Moreover, even if we could hold a table with every state-action pair in memory, collecting experience for every state-action combination would be inefficient.
One way of approaching this problem is to group similar states into buckets. This can effectively reduce the number of states to one small enough to solve the problem with the methods from previous lessons. For example, in the OpenAI Lunar Lander environment, the entire area to the right of the landing pad and the entire area to the left could each be treated as a single state: no matter where the lander is within the right or left area, the best action is to fly left or right, respectively, until it is centered over the pad. Similarly, the vertical axis could use a single bucket for the upper 50% and progressively smaller buckets as the lander gets closer to the pad. We will see how to apply discretization to the cart-pole problem in this lesson's notebook.
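As a rough illustration of the bucketing idea, here is a minimal sketch of uniform state discretization for a continuous observation. The helper name, bucket counts, and bounds are arbitrary example choices, not values from the lesson.

```python
import numpy as np

# Minimal sketch: map a continuous observation to a tuple of bucket indices
# by binning each state variable independently on a uniform grid.
def make_discretizer(lows, highs, bins):
    """Return a function that maps a continuous state to a tuple of bin indices."""
    # For b bins over [low, high] we need the b - 1 interior bin edges.
    grids = [np.linspace(l, h, b + 1)[1:-1] for l, h, b in zip(lows, highs, bins)]
    def discretize(state):
        return tuple(int(np.digitize(s, g)) for s, g in zip(state, grids))
    return discretize

# Example: a 2D state (horizontal position, height) split into a 6 x 4 grid.
discretize = make_discretizer(lows=[-1.0, 0.0], highs=[1.0, 1.5], bins=[6, 4])
print(discretize([0.3, 1.2]))   # e.g. (3, 3) -- one cell of the grid
```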
Soon after looking into discretization, any machine learning engineer would shake their head: why not use function approximation instead of doing this by hand? This is exactly why function approximation exists. In fact, we could use any function approximator, such as k-nearest neighbors or an SVM; however, if the environment's value function is non-linear, then non-linear function approximators should be used, since a linear one might keep improving but never converge to the optimal policy. Perhaps the most popular non-linear function approximators nowadays are neural networks. In fact, the use of neural networks more than a few layers deep in combination with reinforcement learning algorithms is often grouped under the name Deep Reinforcement Learning. This is perhaps one of the most interesting and promising areas of reinforcement learning, and we will look into it in next lesson's notebook.
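To make the idea concrete, here is a minimal sketch of a small neural network used as a Q-value approximator: it maps a continuous state directly to one estimated action value per action, replacing the table lookup. This assumes PyTorch is available, and the class name and layer sizes are illustrative, not the lesson's architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of a Q-value approximator: instead of a table indexed by
# (state, action), a small network maps a continuous state vector to one
# estimated action value per action. Sizes below are illustrative.
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# Usage: for cart-pole, the state has 4 variables and there are 2 actions.
q = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)   # a batch containing one example state
print(q(state))             # estimated value of each action in that state
```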
In this lesson, we got a step closer to what we could call 'real-world' reinforcement learning. Specifically, we looked at environments with so many states that we can no longer represent them all in a table, either because the state space is too large or because it is outright continuous.
To get a sense for this type of problem, we will look at the basic cart-pole environment and solve it by discretizing the state space, effectively performing a manual form of function approximation. A sketch of how the discretized states plug into Q-learning follows below.
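As a rough sketch of how the bucketed states make a tabular method workable again, here is an illustrative Q-learning loop on Gym's CartPole-v1. It assumes the classic `gym` API (where `reset` returns the observation and `step` returns a 4-tuple); the bounds, bucket counts, and hyperparameters are example assumptions, not the notebook's exact settings.

```python
import gym
import numpy as np

# Illustrative Q-learning on a discretized CartPole-v1 state space.
env = gym.make("CartPole-v1")
bins = (6, 6, 12, 12)                               # buckets per state variable
lows = np.array([-2.4, -3.0, -0.21, -3.0])          # clipped bounds for binning
highs = np.array([2.4, 3.0, 0.21, 3.0])
edges = [np.linspace(l, h, b + 1)[1:-1] for l, h, b in zip(lows, highs, bins)]
Q = np.zeros(bins + (env.action_space.n,))          # the now-manageable table

def discretize(obs):
    return tuple(int(np.digitize(o, e)) for o, e in zip(obs, edges))

alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(500):
    state, done = discretize(env.reset()), False
    while not done:
        # epsilon-greedy action selection over the discretized state
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        obs, reward, done, info = env.step(action)
        next_state = discretize(obs)
        # standard Q-learning update applied to the bucketed table
        Q[state + (action,)] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state + (action,)]
        )
        state = next_state
env.close()
```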
Lesson 5 Notebook.
- An Analysis of Reinforcement Learning with Function Approximation
- Residual Algorithms: Reinforcement Learning with Function Approximation
- A Brief Survey of Parametric Value Function Approximation
- A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- Function Approximation via Tile Coding: Automating Parameter Choice