Earlier in this module, I introduced you to the concepts of various RL types. Next, I'll explore the model-based and model-free types in more detail. Let me recap what I discussed earlier in the module about the two approaches of model-based and model-free methods. A model-based system uses a predictive model of the environment to determine what happens when certain actions are taken. In other words, the policy is set in advance and the agent predicts whether or not each potential step is the best one. A model-free system skips the modeling step altogether in favor of learning a control policy directly. In other words, the policy is undefined at first and the agent explores its environment, updating the policy as it learns where it gets rewards.

In the model-based method, the agent learns what is considered optimal behavior by learning a model of the environment. It does so through actions and by observing the resulting state and reward outcomes. You provide the agent with heuristics: a model, or part of a model, of the environment. Through experiences built on the foundation of that model, a value function emerges, which in turn results in a policy the agent can use in its interaction with the environment. When you have existing knowledge of the environment, such as areas not worth exploring, you can supply that information to the agent in advance. Think of the model-based method as giving the agent a base of knowledge about the environment. In the streets-and-cars example here, you, the agent, can see where you are and the destination you are headed to. You can also see that there's no point driving toward the dead ends on the right side of the diagram.

Alright, I talked about the model-based RL method. Let me contrast it with the model-free method, which is called model-free because the agent does not need to know anything about the environment. It is powerful because you can drop an RL-equipped agent into any system: so long as you have given the policy access to the observations, actions, and enough internal state, the agent will learn how to collect the most rewards on its own. Without any understanding of the environment, an agent needs to explore all areas of the state space to complete its value function. That means it will spend some time exploring low-reward areas during the learning process. Think of the model-free method as the agent being free to roam around as it searches for the best reward.

Now that I have taken you on a more detailed tour of the model-based and model-free reinforcement learning types, you might be wondering: which should I use, and what are the tradeoffs? Well, first, in the model-based approach, you might have access to the model of the environment. If so, you know the probability distribution over the states that you move to. If, however, you don't have that model, you first try to build one yourself, which is an approximation. One nice thing about having a model is that it lets you plan: you can think through potential moves without actually performing any actions. The agent then learns to perform well in that specific environment.
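To make that planning idea concrete, here is a minimal sketch in Python. It assumes a hypothetical small discrete environment whose transition probabilities and rewards are already known (or already learned); the array names P and R, the sizes, and the iteration counts are illustrative assumptions, not part of any particular library. The point is that with a model in hand, the agent can run value iteration entirely "in its head," without taking a single real action.

```python
import numpy as np

# Hypothetical known model of a tiny discrete environment (illustrative only):
# P[s, a] is a probability distribution over next states, R[s, a] is the
# expected reward for taking action a in state s.
n_states, n_actions = 6, 2
rng = np.random.default_rng(0)
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)        # normalize into valid distributions
R = rng.random((n_states, n_actions))
gamma = 0.9                               # discount factor

# Value iteration: plan purely inside the model, no real actions taken.
V = np.zeros(n_states)
for _ in range(100):
    Q = R + gamma * P @ V                 # expected return for each (state, action)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-6:  # stop when values have converged
        break
    V = V_new

policy = Q.argmax(axis=1)                 # greedy policy derived from planning
print("planned policy:", policy)
```

Any inaccuracy in a learned P and R feeds straight into the resulting plan, which leads to the disadvantage I'll describe next.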
So building a model is helpful, but it has a disadvantage. By nature, the model-based method uses a model to learn the environment first, and then the agent learns from that model in a feedback loop. Both operations are subject to error, and the agent relies on the model of the environment being accurate. Unfortunately, the model is often not accurate, especially in the beginning, and worse, the errors can compound. Over many episodes, this growing deviation produces results that are less and less optimal. The model-free method is more popular because it is more broadly applicable. In the model-free approach, you're not given a model and you're not trying to explicitly determine how the environment works; you just collect some experience and then, hopefully, derive the optimal policy.

There are various algorithmic approaches listed here under the model-based and model-free branches. Let me describe some of the types and where they fall. Examples of model-based methods include analytic gradient computation, sampling-based planning, model-based data generation, and value equivalence prediction. Examples of model-free methods include value-based (which includes contextual bandits), policy-based (which can be on-policy or off-policy), and actor-critic. For the remainder of this module, I'll focus on the most commonly used model-free methods: value-based, policy-based, and contextual bandits.
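As a preview of the value-based branch, here is a minimal sketch of the model-free idea in Python: tabular Q-learning. It assumes a hypothetical env object with simplified reset() and step(action) methods loosely in the style of the common Gym interface; the function name, parameters, and return values are illustrative assumptions, not a specific library's API. Note that the agent never sees the transition probabilities; it only collects experience and updates its action-value table from the rewards it observes.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of tabular Q-learning against a hypothetical discrete env."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))    # value table built from experience
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration: sometimes roam, sometimes exploit.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(Q[state].argmax())
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the observed reward signal.
            target = reward + gamma * Q[next_state].max() * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q.argmax(axis=1)                # greedy policy derived from experience
```

The epsilon-greedy step is where the agent roams freely: it spends some time in low-reward areas, but over enough episodes the value table, and therefore the greedy policy, improves without the agent ever modeling the environment.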