There is an immense world of options for you well beyond Python and R. I just grabbed three of them and put them down here. We're not going to go look at these, but I include them for your reference: there's Stata, there's SPSS, and there are more.

Now, let's talk about frameworks for a second. A framework provides a set of APIs used for programming, and here's a whole bunch of examples. Some of these you may have heard of; others you might not have. For natural language processing, Apache has a framework, a set of API routines, called Singa. Most of you have probably heard about the Hadoop file system. MapReduce is actually two separate processes: there's a Map process and there's a Reduce process, and it can hunt for structure in a dataset, which might be a good place to start. When you're given a huge multidimensional dataset and you don't know where to begin, it's like, "Oh my gosh, how am I ever going to find any structure in this? I tried k-means and it didn't really tell me anything." Go try MapReduce; there's a toy sketch of the Map and Reduce idea at the end of this framework tour.

This next one is really interesting to me; at some point in the future, I want to go play around with it. Apache created this thing called Spark, and part of Spark is a machine learning library, MLlib. It's a fast machine learning library, relatively computationally inexpensive, and it's used by eBay and IBM. There's one called Caffe. This one's for image processing, and they claim it can process 60 million images a day with a single NVIDIA graphics processing unit. Google's TensorFlow does deep learning with algorithms that rely on dataflow graphs. There's 0xdata's H2O, Nervana's Neon, and Shogun, and the list goes on after this. So there are many choices, and this is just a testament to how much interesting work is going on by computer scientists all over the world, trying to figure out ways of crafting solutions to problems that have resisted traditional database queries and traditional purpose-built programming algorithms.
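Here's that toy sketch of the MapReduce idea: a single-machine word count in Python, the classic example. The input documents are made up, and this only illustrates the programming model; real Hadoop MapReduce distributes the same phases across a cluster of machines.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input documents.
documents = ["the cat sat", "the dog sat", "the cat ran"]

# Map phase: emit a (key, value) pair for every word.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/sort phase: bring all pairs with the same key together.
mapped.sort(key=itemgetter(0))

# Reduce phase: collapse each group of pairs down to one result.
counts = {key: sum(value for _, value in group)
          for key, group in groupby(mapped, key=itemgetter(0))}

print(counts)  # {'cat': 2, 'dog': 1, 'ran': 1, 'sat': 2, 'the': 3}
```

The reason the model splits the work this way is that every Map call, and every Reduce group, is independent of the others, so both phases can run in parallel across many machines. That independence is what makes the model practical on huge datasets.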
So, as we saw from machine learning, we want to represent the solution to a problem, the output, with a number. It can be real-valued, it can be binary, it can be a probability between zero and one, or it can be a label. But as I said, if there are labels, we do need to assign numeric values to those labels, because these algorithms don't understand "red," and they don't understand "heavy" or "light"; we have to assign some kind of numerical value to those labels.

The inputs are called features. We've got quantitative features, which can be real numbers, binary values, or probabilities, and we've got qualitative features, the labels I just mentioned. These machine learning algorithms don't understand "big," "fast," "blue," "small," that kind of thing, so you have to assign numeric values to them.

The Bayesian tribe leans on statistics, which expects that the future won't differ too much from the past, so you can base future predictions on past data. By employing random sampling theory, you can expect the distribution of your present samples to closely resemble the distribution of future samples. That's a bold statement, but it's often true: past behavior can be a predictor of future behavior.

Ensuring good results requires testing; you absolutely have to do this. Your algorithm learns from what's called in-sample data, also known as training data in the machine learning world. We test our algorithms with out-of-sample data, data that you may have had at training time but set aside and didn't use. You withheld it for testing purposes, or you got new data at a later point in time, and now you can use that as your out-of-sample data to test your learning algorithm and take measurements of your model's performance. This is time-consuming, but essential. What's the model's output error? How does the model perform with new data that it's never seen before? These are questions you have to ask. Are we getting the predictions that we expect? Are those predictions useful? You might get predictions that aren't useful; you might say, "Well, that wasn't what we were really after," or "We weren't anticipating this response," or maybe you need more data, or maybe you need to try a different algorithm. Validation is a big deal.
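To tie these last two ideas together, here's a minimal sketch in Python with scikit-learn, using made-up data: the qualitative labels are assigned numeric values, a portion of the data is withheld as out-of-sample data, and the model's performance is then measured on data it never saw during training.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical dataset: one quantitative feature and one qualitative label.
weights = [[7.1], [2.3], [6.8], [4.4], [2.9], [7.5], [4.1], [3.0]]
labels = ["heavy", "light", "heavy", "light", "light", "heavy", "light", "light"]

# Assign numeric values to the labels -- the algorithm doesn't understand "heavy".
encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # "heavy" -> 0, "light" -> 1

# Withhold out-of-sample data: the model never sees the test split during training.
X_train, X_test, y_train, y_test = train_test_split(
    weights, y, test_size=0.25, random_state=0)

# The algorithm learns from the in-sample (training) data only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Measure the model's performance on data it has never seen before.
predictions = model.predict(X_test)
print("out-of-sample accuracy:", accuracy_score(y_test, predictions))
```

If the out-of-sample accuracy comes back poor, that's the signal behind the questions above: maybe you need more data, or maybe you need to try a different algorithm.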