In this lesson, we will discuss how to plot univariate charts for numerical variables. Examples of numerical variables in our data are price and quantity. Let's start with the histogram. A histogram groups a numerical variable into bins, or buckets, along the x-axis, and the number of observations that fall into each bucket is shown on the y-axis. You are usually looking at whether a histogram is symmetric or not, because that tells you how your values are distributed. If the values are normally distributed, the histogram forms a roughly bell-shaped figure. If the mound (the higher bars on the y-axis) sits on the left and the bars get smaller toward the right, you call it a right-skewed distribution, because you are looking at a tail on the right side. Alternatively, you can have a left-skewed distribution, with the tail on the left.

One of the arguments you need to specify for a histogram is how many buckets you want to divide the data into. The default is 30, but a suggested number is the square root of the total number of observations. Our data has 40,000 observations, and the square root of 40,000 is 200, so I specify 200 bins here and plot the histogram. In this histogram the x-axis runs all the way up to 7,000 or 8,000, but I don't really see anything happening beyond 10 or 50 on the x-axis, so the plot is not very informative. This suggests our data is right skewed.

Another thing we can do is transform the data, for example to a log scale. A log transformation brings the larger values closer to the smaller ones, which can help us see the shape better. So in this piece of code, instead of price, the aesthetic I use is the log of price; nothing else changes.
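To make the bin-count rule and the effect of the log transform concrete, here is a minimal Python sketch using only the standard library. It is not the lesson's plotting code: it computes the bar heights a histogram would draw rather than drawing them. The `prices` list is made-up, hypothetical data chosen to have a long right tail, and the helper names `sqrt_rule_bins`, `histogram_counts`, and `skewness` are also hypothetical.

```python
import math

# Hypothetical, made-up prices with a long right tail (not the lesson's data).
prices = [3, 5, 6, 8, 10, 12, 15, 20, 30, 45, 80, 150, 400, 1200, 8000]
log_prices = [math.log(p) for p in prices]


def sqrt_rule_bins(n_obs: int) -> int:
    """Rule of thumb from the lesson: about sqrt(number of observations) bins."""
    return max(1, round(math.sqrt(n_obs)))


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum would land at index n_bins, so clamp it into the last bucket.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def skewness(values):
    """Moment-based skewness: positive means a longer tail on the right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(sqrt_rule_bins(40_000))           # 200, as in the lesson
print(histogram_counts(prices, 4))      # [14, 0, 0, 1]
print(histogram_counts(log_prices, 4))  # [8, 4, 1, 2]
print(skewness(prices) > skewness(log_prices))  # True
```

With the raw prices, one extreme value stretches the x-axis so that 14 of the 15 values share the first bucket, much like the uninformative plot described above. On the log scale the same values spread across the buckets, and the skewness drops.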
For the log-price histogram, I then set the number of bins to 230; you have to try various bin counts to see which looks best. Let's run this code and see what we get. After the log transformation, the histogram is not exactly symmetric, but at least the higher bars are now in the middle.

The next tool for univariate exploration of numerical data is the boxplot. Let's quickly see what a boxplot offers. A boxplot shows you several values you may want to investigate. The first is the first quartile: the value of price below which 25% of the observations lie. Next is the median: the value of price below which 50% of the observations lie, so that about 50% of the observations have a price above it. Then you have the third quartile, which is the 75th percentile. The two ends of the whiskers look like a minimum and a maximum, but remember that they are not the actual minimum and maximum of your data. Each whisker extends to the most extreme observation that lies within 1.5 times the interquartile range (IQR, the third quartile minus the first quartile) of the box: below the 25th percentile for the lower whisker, and above the 75th percentile for the upper whisker. Any values above or below the whiskers are plotted as outliers.

When we create a boxplot of price, we get a lot of values at the top flagged as outliers. Just for visual inspection, I zoom in on a narrow range of y values. When I do that, we see many values on the higher side that are outliers because they lie beyond the end of the upper whisker: anything beyond that point is an outlier, and likewise anything below the end of the lower whisker. The box itself shows the 25th percentile (first quartile), the median, and the 75th percentile (third quartile).

Now let's look at the boxplot for the log of price. I see outliers on both sides, but without zooming in I can see the boxplot clearly. This is because we used the log transformation, which is helpful here.
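The boxplot quantities described above can also be computed directly. The following standard-library Python sketch (not the lesson's code) finds the quartiles, the IQR, the 1.5 × IQR fences, the whisker ends, and the outliers for made-up, hypothetical prices. It assumes quartiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`); other tools may define quartiles slightly differently, so exact values can vary a little. The function name `boxplot_stats` is hypothetical.

```python
import math
import statistics

# Hypothetical, made-up prices with a long right tail (not the lesson's data).
prices = [3, 5, 6, 8, 10, 12, 15, 20, 30, 45, 80, 150, 400, 1200, 8000]


def boxplot_stats(values):
    """Quartiles, whisker ends, and outliers as drawn by a Tukey-style boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences,
    # not at the fences themselves.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


raw = boxplot_stats(prices)
print(raw["q1"], raw["median"], raw["q3"])    # 9.0 20.0 115.0
print(raw["upper_whisker"], raw["outliers"])  # 150 [400, 1200, 8000]

logged = boxplot_stats([math.log(p) for p in prices])
print(len(logged["outliers"]))  # 1 (only the 8000 price is still flagged)
```

In this made-up example, three high prices are flagged on the raw scale, while on the log scale only the most extreme one remains. With real data, as in the lesson, the log-scale boxplot can still show outliers on both sides.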