So to finish off this week, I think it's a nice exercise to consider what our predictions would say about the remainder of the 2019/20 season, given that the league had to decide what to do about the pandemic. At the time this was being prepared, it was unclear whether the league would actually finish the season and play the remaining games, or whether it would have to find some other way to decide the final placings in the league. That, of course, is important not just to identify the champions, but also to decide which teams get promoted and relegated. If the league had to decide that without the games being played, it would need some mechanism. It could use the table as it stood when play stopped. Or it could give teams points in proportion to the points they had already won in the season. Or it could do what we're going to do, which is generate a forecasting model and then use that model to project what the league standings would be at the end of the season. Well, it was pretty unlikely that the league would ever use a statistical model like this, because too many people would be upset and uncomfortable with following this method. But for us it's actually quite an interesting exercise to try out, and it's interesting to ask whether this wouldn't ultimately be a rather better method of deciding it. This is a relatively straightforward exercise, given that we've already generated the predictions for the whole of the 2019/20 season. So, to do this, let's first install the packages that we need, and then we'll import a table: this is the league table as it stood when the league was suspended in March due to the coronavirus pandemic. Then there are the forecasts that we generated in the previous session; I've saved those, so we can now just bring them back. This data frame has in it all of the results that we were looking at, including our predicted probabilities and our predicted value for the outcome of the game, logitpred.
So, what we want to do is add to the league table that we've just imported what we think the results of the remaining games would have been for each team. Teams had played either 28 or 29 games, and there are 38 games in the Premier League season, so each team had nine or ten games left to go. We're only interested in the teams and our prediction, so I'm just going to create a subsample of the data called unplayed, which has just those four variables in it. But this has all 380 rows in it, and the only rows we want are the games that were not played. The games that were not played have a value of one, so I restrict the data to include just the unplayed games. And if I do .describe(), you can see there were 92 games that hadn't been played following the shutdown due to the coronavirus, and those are the values that we get. Remember, these are predictions of games, and in each game we have two teams, a home team and an away team. So we need to find a way to allocate the points for each team and generate a new table, which lists the points won by each team when it was the home team and when it was the away team. What we're going to do is divide this up into two subsets, calculate the points for each team in each subset, and add them back together again. So, firstly, let's identify the points. We've got home team points and away team points, and this tells us how many points go to the home team and how many to the away team, depending on what the prediction is. You can see here, for example, the first game, Aston Villa vs Sheffield United, which was not played. Our model predicted a home win, so that would mean three points for the home team and zero points for the away team. Of course, bear in mind that we never predict a draw with our model, just like the bookmakers and just like FiveThirtyEight.
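The filtering and points-allocation steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names `HomeTeam`, `AwayTeam`, `notplayed` and `Hpoints`, and the `"H"`/`"A"` coding of `logitpred`, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the saved forecast file (four games instead of 380).
# Column names and the "H"/"A" coding of logitpred are assumptions.
forecasts = pd.DataFrame({
    "HomeTeam": ["Aston Villa", "Liverpool", "Brighton", "Chelsea"],
    "AwayTeam": ["Sheffield United", "Chelsea", "Liverpool", "Aston Villa"],
    "logitpred": ["H", "H", "A", "H"],
    "notplayed": [1, 0, 1, 1],
})

# Keep only the variables of interest, then only the games not played.
unplayed = forecasts[["HomeTeam", "AwayTeam", "logitpred", "notplayed"]]
unplayed = unplayed[unplayed["notplayed"] == 1].copy()

# The model never predicts a draw, so each game is 3-0 one way or the other.
unplayed["Hpoints"] = np.where(unplayed["logitpred"] == "H", 3, 0)
unplayed["Apoints"] = np.where(unplayed["logitpred"] == "A", 3, 0)

print(unplayed)
```

Because a draw is never predicted, the home and away points in every row should add up to exactly three, which is a quick sanity check on the allocation.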
So, let's then take just the home team and the home team points as one subset. I'm going to call that results, which really means results for the home team. When I do that, I rename the column club rather than home team, and I say each club gets expected points, XPoints, equal to the home team points for that game. Now, I'm going to produce exactly the analogous data set for the away team. I take the columns for the away team and the away team points, rename the away team column to club, and rename Apoints to XPoints. So now I've got another data frame with exactly the same column headings, but remember, these are the away team results, not the home team results. What I'm going to do next is combine these two. Up until now, we've tended to do merges, where I put one data set alongside another data set. But now what we want to do is concatenate, meaning we want to append one list to the bottom of the other. That way the club names and the XPoints are stacked on top of each other, and then we can do a sum across them. So we use pd.concat on the two data frames, and you can see each of our data frames has 92 rows. When we do the concatenation, it now has 184 rows, which is 2 times 92. If I do describe, you can see, again, that we have 184 rows, one for each team in each of the 92 games. The mean number of points is 1.5, because every game is either a win for the home team and a loss for the away team or vice versa, so three points are shared between two teams and the average number of points per team per game is always 1.5. Now what we want to do is take the sum of the points won in all of these games for each team, and so we do a group by for this. And now we have a list of the 20 teams and their expected number of points in the remaining games.
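The split, rename, concatenate and group-by sequence above can be sketched as below. The data are a toy example, and the input column names `HomeTeam`, `AwayTeam` and `Hpoints` and the variable names `away`, `stacked` and `expected` are assumptions; `club`, `XPoints`, `Apoints`, `results` and `pd.concat` follow the description in the text.

```python
import pandas as pd

# Toy set of unplayed games with points already allocated (assumed columns).
unplayed = pd.DataFrame({
    "HomeTeam": ["Aston Villa", "Brighton", "Chelsea"],
    "AwayTeam": ["Sheffield United", "Liverpool", "Aston Villa"],
    "Hpoints": [3, 0, 3],
    "Apoints": [0, 3, 0],
})

# Home-team subset: one row per game, seen from the home side.
results = unplayed[["HomeTeam", "Hpoints"]].rename(
    columns={"HomeTeam": "club", "Hpoints": "XPoints"}
)

# Away-team subset with exactly the same column headings.
away = unplayed[["AwayTeam", "Apoints"]].rename(
    columns={"AwayTeam": "club", "Apoints": "XPoints"}
)

# Stack the two subsets on top of each other (append, not merge).
stacked = pd.concat([results, away], ignore_index=True)

# Sum the expected points for each club across all its remaining games.
expected = stacked.groupby("club", as_index=False)["XPoints"].sum()

print(len(stacked), stacked["XPoints"].mean())
print(expected)
```

With three toy games the stacked frame has six rows, mirroring the 92 to 184 doubling in the text, and the mean of `XPoints` is 1.5 for the same reason given there.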
And then we merge this data into our existing table. This is a merge, so we add it on to the right hand side of the existing table file. We've now got points and expected points, and, of course, what we're interested in is our prediction of the final points for the table, which is the sum of the actual points and the expected points. That gives us our points table, and we can now sort the points table from highest to lowest. Then we can use that to generate a rank for each team, that is, the rank associated with where you are in the list, and the .rank command allows us to do that. So, a couple of things to notice here. Liverpool is top of the table in this model, which would surprise nobody, and Manchester City comes second, also no surprise. Chelsea and Manchester United both have 69 points in this model, and we've not predicted goal difference, so there's no way of separating them. What our model actually does is say they share the 3rd and 4th positions, so they each get place 3.5, as it were. Obviously, if we really needed to distinguish their positions, we would need to generate something like goal difference, but we don't need to do that here. The other thing to note is that we don't run into that problem at the bottom, where it would be much more problematic, with the bottom three teams being relegated from the Premier League. If there were a tie on points between the teams in 17th place and 18th place, then you really would need goal difference to separate them. But happily our model does not have that problem, because the predicted numbers of points are distinct for each of these ranks. It's worth noting that the final table positions change here for a couple of teams, AFC Bournemouth and Brighton and Hove Albion.
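The merge, total and ranking steps described above can be sketched as follows. The clubs and point totals are made-up placeholders, not the real standings, and the column names `Points`, `FinalPoints` and `Rank` are assumptions. The sketch uses pandas' default `method="average"` for ties, which is consistent with two tied clubs sharing place 3.5.

```python
import pandas as pd

# Placeholder league table at suspension (illustrative numbers only).
table = pd.DataFrame({
    "club": ["Team A", "Team B", "Team C", "Team D"],
    "Points": [60, 50, 40, 43],
})

# Placeholder expected points from the unplayed games.
expected = pd.DataFrame({
    "club": ["Team A", "Team B", "Team C", "Team D"],
    "XPoints": [12, 9, 15, 12],
})

# Add the expected points on the right hand side of the table.
table = table.merge(expected, on="club", how="left")

# Predicted final points = actual points + expected points.
table["FinalPoints"] = table["Points"] + table["XPoints"]

# Sort with the leader first, then rank; tied clubs share the average place.
table = table.sort_values("FinalPoints", ascending=False).reset_index(drop=True)
table["Rank"] = table["FinalPoints"].rank(ascending=False, method="average")

print(table)
```

Here Team C and Team D both finish on 55 points, so each receives rank 3.5. Separating them would need a tie-breaker such as goal difference, which this approach does not forecast.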
Brighton was safe up until the season was suspended, but AFC Bournemouth was in a relegation place, and our model says that AFC Bournemouth would actually escape relegation whilst Brighton would be relegated. That's probably not a very good forecast, actually. The reason Bournemouth is predicted to do so much better is that their TM value at the beginning of the season was much higher. Bear in mind that our model is based on the TM value at the beginning of the season, and doesn't take account of what happened during the season. What happened in this season is that Brighton actually did much better than anyone expected, and AFC Bournemouth underperformed. You can think of that as being an effect of form: there's the idea that a team has a certain quality, but teams also have a run of good form or a run of bad form. One way you might extend our model and make it better would be to include a form variable. You wouldn't exclude the TM values, but you could also say, for example, that performance over the last five games might be a significant contributor to the outcome, and so you might think of enhancing our model by adding form into it. Now, a final thing to observe on this. Our approach is rather crude, because we forecast each game and then added the points for a win or a loss for each game. But there's another way we could do this. Bear in mind that our forecasts included probabilities as well. So instead of giving teams a win or a loss for each game, we could give them a value based on their win probability as home team or away team, and the probability of a draw. If you're the home team, for example, you would get three times the probability of a home win as points, plus one times the draw probability as points.
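The probability-weighted alternative just described can be sketched as below: the home side gets 3 × P(home win) + 1 × P(draw), and by the same logic the away side gets 3 × P(away win) + 1 × P(draw). The probabilities are invented for illustration, and the column names `pH`, `pD`, `pA`, `Hxp` and `Axp` are hypothetical.

```python
import pandas as pd

# Toy unplayed games with made-up outcome probabilities (each row sums to 1).
unplayed = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B"],
    "AwayTeam": ["Team B", "Team C"],
    "pH": [0.50, 0.25],   # probability of a home win
    "pD": [0.25, 0.25],   # probability of a draw
    "pA": [0.25, 0.50],   # probability of an away win
})

# Expected points: 3 for a win and 1 for a draw, weighted by probability.
unplayed["Hxp"] = 3 * unplayed["pH"] + 1 * unplayed["pD"]
unplayed["Axp"] = 3 * unplayed["pA"] + 1 * unplayed["pD"]

# Stack the home and away views and total the expected points per club.
home = unplayed[["HomeTeam", "Hxp"]].rename(
    columns={"HomeTeam": "club", "Hxp": "XPoints"}
)
away = unplayed[["AwayTeam", "Axp"]].rename(
    columns={"AwayTeam": "club", "Axp": "XPoints"}
)
xpoints = (
    pd.concat([home, away], ignore_index=True)
    .groupby("club", as_index=False)["XPoints"]
    .sum()
)

print(xpoints)
```

Unlike the win-or-loss allocation, the two teams' expected points in a game here sum to 3 minus the draw probability, so totals are fractional and exact ties on points become much less likely.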
That would be a more probabilistic way of thinking about the results, and it would generate a different ranking and a different set of points at the end of the season. If you're interested, you could try that alternative model, and, of course, that's something that's been implied in a lot of the work that we've been doing here already. So, that concludes this week, where we have shown how to generate forecasting models, and shown that our very simple model, and really it's almost impossible to think of a simpler one, is actually very successful at forecasting Premier League games. Maybe that gives you an idea of how you might generate forecasts and models for other leagues in other times and places. We've concluded here by looking at how to construct a table based on our forecasts and seeing how the season might have panned out, had it not been for the interruption due to the coronavirus pandemic. And, of course, it would probably be a very interesting exercise to compare this with what actually happened, in the event that the season was completed.