In this video, I will introduce you to log likelihoods. These are just logarithms of the probabilities we're calculating from the last video. But they are way more convenient to work with and they appear throughout deep-learning and NLP. Okay, so let's go back to the table you saw previously that contains the conditional probabilities of each word, for positive or negative sentiment. Words can have many shades of emotional meaning. But for the purpose of sentiment classification, they're simplified into three categories: neutral, positive, and negative. All can be identified by using their conditional probabilities. These categories can be numerically estimated just by dividing the corresponding conditional probabilities of this table. Now, let's see how this ratio looks for the words in your vocabulary. So the ratio for the word i is 0.2 divided by 0.2, or one. The ratio for the word m is again one. The ratio for the word happy is 0.14 divided by 0.1 or 1.4. Do the same for because, learning, and NLP, the ratio is one. For sad and not, their ratio is 0.1 divided by 0.15 or 0.6. Again, neutral words have already shown you one. Positive words have a ratio larger than one. The larger the ratio, the more positive the word's going to be. On the other hand, negative words have a ratio smaller than one. The smaller the value, the more negative the word. In this week's assignment, you'll implement a function that filters words depending on their positivity or negativity. You will find the expression shown here to be very helpful with that. These ratios are essential in Naive Bayes' for binary classification. I'll illustrate why using an example you've seen before. Recall earlier where you used the formula to categorize a tweet as positive if the products of the corresponding ratios of every word appears in the tweet is bigger than one. We said it was negative if it was less than one. This is called the likelihood. If you're to take a ratio between the positive and negative tweets, you'd have what's called the prior ratio. I haven't mentioned it till now because in this small example, you had exactly the same number of positive and negative tweets, making the ratio one. In this week's assignments, you'll have a balanced datasets, so you'll be working with a ratio of one. In the future though, when you're building your own application, remember that this term becomes important for unbalanced datasets. With the addition of the prior ratio, you now have the full Naive Bayes' formula for binary classification, a simple, fast, and powerful method that you can use to establish a baseline quickly. Now it's a good time to mention some other important considerations for your implementation of Naive Bayes'. Sentiments probability calculation requires multiplication of many numbers with values between zero and one. Carrying out such multiplications on their computer runs the risk of numerical underflow when the number returned is so small it can't be stored on your device. Luckily, there's a mathematical trick to solve this. It involves using a property of logarithms. Recall that the formula you're using to calculate a score for Naive Bayes' is the prior multiplied by the likelihood. The trick is to use a log of the score instead of the raw score. This allows you to write the previous expression as the sum of the log prior and the log likelihood, which is a sum of the logarithms of the conditional probability ratio of all unique words in your corpus. Let's use this method to classify the tweets. I'm happy because I'm learning. Remember how you use the Naive Bayes' inference condition earlier to get the sentiment score for your tweets. Now you're going to do something very similar to get the log of your score. What you'll need to calculate the log of the score is called the Lambda. This is the log of the ratio of the probability that your word is positive and you divide that by the probability that the word is negative. Now let's calculate Lambda for every word in our vocabulary. So for the word I, you get the logarithm of 0.05 divided by 0.05. Or the logarithm of one, which is equal to zero. Remember, the tweet will be labeled positive if the product is larger than one. By this logic, I would be classified as neutral at zero. For am, you take the log of 0.04 over 0.04, which again is equal to zero. So you enter zero in the table. For happy, you get a Lambda of 2.2, which is greater than zero, indicating a positive sentiment. From here on out, you can calculate the log score of the entire corpus just by summing out the Lambdas. You're almost done with the log likelihood. Let's stop here and take a quick look back at what you did so far. Words are often emotionally ambiguous but you can simplify them into three categories and then measure exactly where they fall within those categories for binary classification. You do so by dividing the conditional probabilities of the words in each category. This ratio can be expressed as a logarithm as well, called Lambda. You can use that to reduce the risk of numerical underflow.