Now I'd like to talk with you about some testing principles, specifically how we test. This discussion is going to be fairly conceptual, as principles often are, but we're going to step back to 10,000 feet and look at things systematically to determine how we can apply principled thinking toward effective testing. We have our What, Where, When, Who, and How as dimensions for analyzing testing, and in this section we're focusing on How.

The first principle is divide and conquer. When we test, it's very difficult to test everything at once if we want to be rigorous. Instead, we have to look at testing at multiple levels of abstraction, so that we can be very rigorous in testing the small pieces and then roll those results up from individual modules we have confidence in toward testing the whole system. We want to control the scope of our tests and apply different testing strategies for unit testing, for integration testing of larger modules, and for system testing.

We also want to divide our testing project along the purpose of the tests. As we'll see in later slide packs, we will use different testing techniques depending on whether we're testing the system for well-formedness properties like memory safety, doing performance testing, doing conformance testing, or pursuing any of a number of other testing purposes.

The third division is by testing technique. Certain techniques are very effective at finding well-formedness bugs but can't tell you anything about whether the system meets its requirements. Other techniques are very effective at testing whether a system meets its requirements but tell you nothing about how it scales when you deploy it across lots of servers. So we apply different testing techniques, depending on the purpose, in order to divide and conquer the verification space for the program.

A second principle is visibility. One of the real difficulties in testing, especially as we get toward bigger and bigger systems, is making sure that when we inject test inputs into the system, we can observe what happens as a result. Often we can write test inputs that execute certain code paths, and we may even be able to mark them as covering all the statements, but we never see the result of the input: it gets stored in some internal state and never propagates to an output we can observe. So when we write tests, we have to make sure not only that they exercise the structure of the program, but also that they produce results we can observe from the outside and check. When a program uses state hiding, as objects do with their private data, effective testing sometimes requires exposing that private data so that we can determine whether the test passed or failed.
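To make this concrete, here is a minimal sketch, assuming JUnit 5 and a hypothetical Cache class with private internal state (both are illustrative, not from the lecture). The first test exercises a code path but observes nothing, so it would pass even if the code were broken; the second propagates the effect into an observable output and also checks a package-private accessor that exposes the private state.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

import java.util.HashMap;
import java.util.Map;

// Hypothetical class under test: stores entries in private state.
class Cache {
    private final Map<String, String> entries = new HashMap<>();

    void put(String key, String value) { entries.put(key, value); }

    String get(String key) { return entries.get(key); }

    // Package-private accessor exposed so tests can observe internal state.
    int size() { return entries.size(); }
}

class CacheTest {
    @Test
    void putAloneCoversCodeButObservesNothing() {
        Cache cache = new Cache();
        cache.put("k", "v");
        // This executes the code path (and counts toward statement coverage),
        // but with no assertion the result is invisible: the test passes
        // even if put() silently drops the entry.
    }

    @Test
    void putIsVisibleThroughAnObservableOutput() {
        Cache cache = new Cache();
        cache.put("k", "v");
        assertEquals("v", cache.get("k"));   // propagate state into an observable output
        assertEquals(1, cache.size());       // or observe the exposed internal state directly
    }
}
```

Whether to expose internal state this way is a design trade-off; the essential point is that every test should end in an observation that can actually fail.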
A concept related to visibility is logging. Once you get to a certain scale, things are simply not going to be visible to you. You're passing inputs into a program that may be munged through 30 different systems, some of which may be outside your control. The only way you can inspect the test to any degree is if you have built facilities into the program to log the expected behavior of the different systems and how information is transmitted between them. This becomes important for effective testing at scale.

Another thing we strive for is repeatability. We would like to write tests that always pass or always fail. One of the banes of testing, once you have lots and lots of test cases, is flaky tests: tests that pass most of the time and fail some of the time, so that someone has to take a look at them and try to figure out what went wrong. Sometimes tests are flaky because the software is flaky. We've talked in the past about using parallelism to write programs: sometimes you have shared memory accessed by multiple threads, and only one execution out of a hundred triggers the problem that causes a test to fail. That is not a flaky test; that is a flaky program. But other tests are flaky simply because they are bad tests. For example, suppose you store results in a Java set. Java sets are not deterministically ordered, and if your objects rely on the default pointer comparison for equality, a test can fail just because of the way you constructed it. Tests are code, just as programs are code: when we write unit tests for Java code, we can write bad Java code that causes the test to fail sometimes and succeed other times. We always want to avoid that.
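As a sketch of the kind of flakiness just described, assume JUnit 5 and a hypothetical Item class that does not override equals or hashCode, so instances fall back on identity hashing. The first test asserts on the iteration order of a HashSet, which can change from one JVM run to the next; the second asserts only on the set's contents and is repeatable.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical value class: no equals/hashCode override, so instances use
// identity hashing and their position in a HashSet varies between runs.
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

class ResultOrderTest {

    @Test
    void flaky_assertsOnIterationOrderOfAHashSet() {
        Set<Item> results = new HashSet<>();
        Item a = new Item("a");
        Item b = new Item("b");
        results.add(a);
        results.add(b);

        List<Item> asList = new ArrayList<>(results);
        // Flaky: HashSet iteration order depends on identity hash codes,
        // which change from run to run. Sometimes a comes first, sometimes b.
        assertEquals(a, asList.get(0));
    }

    @Test
    void repeatable_assertsOnContentsNotOrder() {
        Set<Item> results = new HashSet<>();
        Item a = new Item("a");
        Item b = new Item("b");
        results.add(a);
        results.add(b);

        // Order-insensitive assertions make the test deterministic.
        assertEquals(2, results.size());
        assertTrue(results.contains(a) && results.contains(b));
    }
}
```

The fix here is in the test, not the program: either avoid depending on ordering, or use an ordered collection when order actually matters.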
On the other hand, some causes of flaky tests can't be avoided, such as a program with a genuine parallelism bug. In other cases, flaky tests are due to bad environments. If your test requires resources, say from a database or a file system, you have to make sure that the database or file system is in a consistent state each time you run the test, so that you don't get test failures because the environment is misconfigured. These are things we'll come back to when we look at the concrete execution of tests.

Another principle is redundancy. The verification methods we're talking about are unsound: they miss errors, and we're never going to have a complete set of tests except for trivial programs. You can think of it as having a toolbox. We have a hammer, we have a screwdriver, we have different tools, and we want to apply as many of them as possible so that we have some redundancy in how we approach the testing problem and the problematic areas of the program are hit in multiple ways. It's good to have defense in depth.

Another important principle is feedback, where we take information gained previously to improve our testing process. Different applications have different pain points, different sets of modules that cause the most trouble, and once we do some initial testing we begin to understand where those are, which is where we can focus our testing attention. We want to update our tests to more thoroughly exercise areas known to be problematic, and we also want to learn which classes of bugs are most likely. If we're building a scientific system, precision bugs may be the problem; if we're building highly concurrent software, concurrency errors may be the most important to check for. Eventually we want to work with developers to reduce systematic errors, both through testing strategy and through additional programming practices that help reduce the kinds of errors we see in the applications we produce.

So at the end of the day, we still have the question of how. The principles we've described are different ways to inform our thinking about how to do testing, but that topic is really the focus of the rest of the specialization, and we will be discussing many concrete techniques in the remaining lectures.