Core Insight
When the state space is too large for a Q-table, approximate Q-learning uses feature-based representations to generalize across similar states — essentially turning RL into a regression problem over state-action features, with the twist that the regression targets (the TD targets) keep moving as the weights themselves change.
My Analysis
This clicked for me when I realized the connection to linear regression:
- Q(s,a) = w · f(s,a) — the Q-value is just a weighted sum of features, exactly like linear regression
- Weight update — instead of overwriting a table entry, you nudge every weight in the direction of the TD error: w_i ← w_i + α [r + γ max_a' Q(s',a') − Q(s,a)] f_i(s,a)
- Feature design — this is where the real art is. Good features capture the structure of the problem (e.g., "distance to nearest ghost" in Pac-Man)
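The first two bullets can be sketched in a few lines. This is a minimal sketch, not any particular library's API: features are a hypothetical dict mapping feature names to values, and the α and γ defaults are placeholder choices.

```python
def q_value(weights, features):
    # Q(s,a) = w · f(s,a): dot product of weights and feature values.
    # Unseen features default to weight 0.
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def update(weights, features, reward, max_q_next, alpha=0.1, gamma=0.9):
    """One TD step: nudge each weight by alpha * td_error * feature value."""
    td_error = (reward + gamma * max_q_next) - q_value(weights, features)
    for k, v in features.items():
        weights[k] = weights.get(k, 0.0) + alpha * td_error * v
    return weights
```

Note how the tabular update falls out as a special case: with one indicator feature per (s,a) pair, this reduces to the ordinary Q-table rule.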
The trade-off: you lose the ability to represent arbitrary Q-functions (since you're limited to a linear combination of features), but you gain massive generalization. A feature like "number of food pellets remaining" lets you generalize across millions of states you've never seen.
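To make the generalization point concrete, here is a toy feature extractor in the spirit of the Pac-Man example. Everything in it (the state dict layout, the feature names, the normalization constants) is hypothetical, chosen only to illustrate the idea:

```python
def extract_features(state, action):
    # Hypothetical feature map: `state` is a dict holding a set of pellet
    # coordinates and a distance to the nearest ghost. `action` is kept for
    # the f(s, a) signature, though these toy features only use the state.
    return {
        "bias": 1.0,
        "food-remaining": len(state["pellets"]) / 100.0,     # normalized count
        "inv-ghost-dist": 1.0 / max(state["ghost_dist"], 1.0),
    }

# Two states with completely different pellet layouts collapse to the same
# feature vector, so a single learned weight covers both:
s1 = {"pellets": {(1, 1), (2, 3)}, "ghost_dist": 4.0}
s2 = {"pellets": {(7, 0), (5, 5)}, "ghost_dist": 4.0}
```

Both states above produce identical features, so they share a Q-value — which is exactly the compression that makes learning feasible, and exactly the expressiveness you give up.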
Key takeaway: the hardest part of approximate Q-learning isn't the algorithm — it's choosing the right features.