The size of big data. I looked at this slide last weekend and I didn't like what I had here, so I decided to go with what I said earlier in the semester. How much data do you have to accumulate before you consider it big data? This is my opinion, based on what I see happening: I think data sets in excess of 100 terabytes start to constitute big data. I've got about 20 terabytes of physical storage at my house. I have a little NAS box with a bunch of three-and-a-half-inch drives hooked up to my Mac at home. When I back it up to an external drive, it's about two and a half terabytes. That's a whole bunch of videos, my whole music collection, and all my documents. The music I record takes up quite a bit of space too, but it's all still under three terabytes. So from my perspective, anything over 100 terabytes is starting to get into the big data realm.
According to this guru99 article, Facebook ingests, that is, how much data goes into Facebook every day, 500-plus TB. That's a lot. So it beats my requirement of greater than 100 TB of data. The big hyperscalers are dealing with just unbelievable quantities of data, sizes unheard of five or ten years ago. I heard a talk recently by a gentleman from Microsoft, and he indicated that one self-driving car can generate between 30 and 40 petabytes of data per day. Think about that over a year. You can do the math. [LAUGH] It’s a lot. [LAUGH] Is all that data used, and consumed, and looked at? Unknown. Some of it certainly is looked at.
So why look at the data? The answer is that we want to apply these techniques to extract key insights. That's why we want to look at the data. We want to make better and faster decisions. We want to reduce costs, we want to improve operational efficiencies, and we want to explore opportunities for businesses to create new products and services.
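To make the "do the math" remark about the self-driving car concrete, here is a minimal sketch of the arithmetic in Python. The input figures are the ones quoted in the lecture and are not independently verified here. The decimal unit conversions (1 PB = 1000 TB, 1 EB = 1000 PB), the 365-day year, and the function names are assumptions made for illustration.

```python
# Figures quoted in the lecture (reported by the speaker, not verified here).
BIG_DATA_THRESHOLD_TB = 100        # the lecturer's personal threshold for "big data"
FACEBOOK_INGEST_TB_PER_DAY = 500   # "500 plus TB" per the guru99 article
CAR_PB_PER_DAY = (30, 40)          # range attributed to the Microsoft talk

# Assumptions for this sketch: decimal units and a 365-day year.
PB_PER_EB = 1000
DAYS_PER_YEAR = 365


def yearly_exabytes(pb_per_day):
    """Convert a daily rate in petabytes into a yearly total in exabytes."""
    return pb_per_day * DAYS_PER_YEAR / PB_PER_EB


def days_to_reach_threshold(tb_per_day, threshold_tb=BIG_DATA_THRESHOLD_TB):
    """Days of ingest at tb_per_day needed to accumulate threshold_tb."""
    return threshold_tb / tb_per_day


low, high = (yearly_exabytes(rate) for rate in CAR_PB_PER_DAY)
print(f"One car over a year: {low:.2f} to {high:.2f} exabytes")

fb_days = days_to_reach_threshold(FACEBOOK_INGEST_TB_PER_DAY)
print(f"Facebook passes the 100 TB threshold in about {fb_days:.1f} days")
```

Under these assumptions, a single car at the quoted rate would produce on the order of 11 to 15 exabytes in a year, and the quoted Facebook ingest rate would cross the 100 TB threshold in about a fifth of a day.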
As I mentioned at the beginning of class, big data analytics, sometimes called predictive analytics, and machine learning are intimately tied together. And as I mentioned, it was hard for me to pull them apart into a machine learning part and an analytics part. I guess I will see how well I've done here. So here is Google's process for big data analytics, okay? You can go out to Google's site and find this. First, they define a business problem; that's one of the first things in an analytics problem. What is it you're after? What are you trying to learn, if anything? They say it's important to be able to define the business problem first: define and prioritize business questions, and estimate the opportunity and its size. Then you design a set of experiments: you brainstorm hypotheses for drivers, you understand what other people are doing, you set up an experiment, and you build, clean, and merge all of your data. I'll talk more about that later. Then you start conducting experiments and testing everything; they call it test drivers. You're trying out different algorithms, different ways of processing the data, and iteratively refining them, going back around through this loop. Then you present findings and secure buy-in from key stakeholders in your company. This would be executive management, undoubtedly. If it's given the green light, you can put it into production: you enable a live data set so that the machine learning algorithms and the analytics process actually start taking action in the real world. You define and put in place a management approach and communication plans for the public or the consumers of this product or service, whatever it is. And then, eventually, if you're successful, you'll capture the key business impacts and outcomes.
So this may lead to finding new processes, new skills, and maybe new ways of measuring impacts to your business. And the ones with the little stars on them, or whatever those are, are the most challenging steps as Google sees it.