In the first week, we covered different types of alternative data: consumption, the open web, and corporate regulatory disclosures. This week, we'll be focusing on media. By media, we actually refer to many different types of information: newspapers (mostly digital), press releases, specialized magazines in finance and in other industries, and social media, for which we'll be showing some examples.

We actually have two big goals here. One is to listen to what the world is saying through the media. We're trying to establish a picture of what the world is saying about financial assets such as stocks and bonds, about different countries, and about different themes. The second goal is to combine the information that we extract from the media with economic intuition in order to predict market variables. Returns, volatility, and risk are among the things we can start to predict better if we listen carefully to what the media is saying.

If you remember, in the first week of the program, we characterized consumption data along two dimensions: how easy it is to obtain and how easy it is to incorporate in financial market applications. For consumption, we realized that it's hard to obtain and easy to incorporate. Media is actually the opposite. It's very easy to obtain media information. We can do so through scraping, and we can do so by acquiring feeds from different vendors. But it turns out to be very hard to incorporate in portfolio management applications.

There are many reasons why media is hard to incorporate in the investment process. First, each media article has to be evaluated with respect to relevancy. Does it really cover the assets that we're interested in? Second, it's hard to measure the tone. We need to figure out whether a piece of text is positive or negative or somewhere in between. Third, media has a lot of biases.
We will look through a few of them in the next few slides.

Let's talk about relevancy. I'm sure that many of you follow soccer, or football. There is a very good player named Raheem Sterling, who has recently been scoring quite a lot of goals. When we read a story about Sterling, how would we teach our machine, our system, to figure out that the story is about football and not about the pound sterling that we could be interested in extracting information for? For those of you who live in the United States, I'm sure that every November you have a delightful meal with turkey. When we read so much about turkey around the Thanksgiving season, how do we know that it's about the bird and the dinner and not about the financial crisis in Turkey? All of this has to be incorporated in our system, which makes it hard to incorporate information from the media into the investment process.

Let's talk politics. Political bias in the media is actually a big topic these days. So my colleagues and I collected information specifically from left-leaning media and right-leaning media covering the economy around the 2018 midterm elections in the United States. We then correlated this information with the S&P 500. What we found was striking. The left-leaning media, CNN and the Washington Post in this example, tend to cover the economy when the market falls. In these instances, the right-leaning media doesn't talk about the economy. When the market rallies, the left-leaning media does not talk about it at all, but the right-leaning media, FOX News in this example, tends to cover it excessively. So we found a correlation that quantifies the degree of bias in the media in terms of political coverage. This is important because now, when we read information from the media and try to incorporate it into investment processes, we also need to take political biases into account.

Let's talk about reporting style biases.
My colleagues and I also collected information reported by different types of media covering the same events. What we saw was that the general media tends to have a more negative tone than specialized media covering the same events. So now, again, when we incorporate media information into our process, we need to take these reporting style biases into account. We also found evidence of editorial biases. Some media outlets tend to cover the events that they feel comfortable covering and ignore other events that, for some reason, don't fit with their agendas.

Let's now talk about information production. Information is actually being produced by people like you and me. We have deadlines, and we have bosses who tell us when we have to meet them. My colleagues and I looked at media information from different sources, and we found a very strong effect suggesting that certain media outlets, let's say The Wall Street Journal, cover certain companies, let's say GM, every week on the same day. So if there is an article about General Motors in The Wall Street Journal on a Tuesday, there is likely to be another article on the same company at the same outlet on the following Tuesday. The results, as you can see, are strong and statistically significant. So we have to take this into account when we investigate media information and try to incorporate it into our process.

There are many more biases that I'm not going to spend a lot of time introducing, but I'll mention them so you can start incorporating them in your thinking and processes. Local bias is one example. Local newspapers tend to cover local firms using a much more favorable tone than they do when they cover distant firms. We found that article length is very important, because articles that are very long tend to have, on average, a much more neutral tone than articles that are very short. Individual reporters tend to have their own writing styles, which need to be incorporated into our models.
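As a rough illustration of the same-day-every-week pattern described above, here is a minimal Python sketch. It is not the method used in the study. The function name `same_weekday_repeat_rate` and the publication dates are hypothetical, made up only to show the idea: for one outlet and one company, measure the share of article dates that are followed by another article exactly seven days later.

```python
from datetime import date, timedelta


def same_weekday_repeat_rate(pub_dates):
    """Share of distinct article dates followed by another article 7 days later."""
    days = set(pub_dates)  # ignore multiple articles on the same day
    if not days:
        return 0.0
    repeats = sum(1 for d in days if d + timedelta(days=7) in days)
    return repeats / len(days)


# Hypothetical publication dates for one outlet/company pair (made-up data).
example_dates = [
    date(2019, 1, 1),   # Tuesday
    date(2019, 1, 8),   # Tuesday
    date(2019, 1, 15),  # Tuesday
    date(2019, 1, 17),  # Thursday
]

rate = same_weekday_repeat_rate(example_dates)
print(f"Share of dates repeated one week later: {rate:.2f}")
```

In practice one would compare this rate against what random timing would produce before calling the pattern significant; the sketch only computes the raw share.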
Of course, the topic of the day is fake news. How do you determine whether the news is actually real or not? If it's not real, if it's fake, how do you determine whether it's important?

So that's going to be the focus of the week. We're going to first look at sentiment analysis techniques, that is, the tools that will allow us to extract the tone from the textual content of an article. Then we'll go into a series of lab sessions, looking at social network data, building networks, and using some network techniques to identify centrality. We will then do a lab session on sentiment analysis as well. We will finish with applications that show how we use all this information extracted from the media, processed in a way that allows us to determine characteristics of articles such as sentiment, to predict market variables such as future stock returns and, of course, risk.
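To make the relevancy and tone problems from earlier in this lecture concrete, here is a toy Python sketch. It is only an illustration under strong assumptions, not the technique taught in the lab sessions. The word lists (`POSITIVE`, `NEGATIVE`, `CURRENCY_CONTEXT`, `FOOTBALL_CONTEXT`) and the function names are hypothetical and far too small for real use. The sketch first asks whether a text mentioning "sterling" looks like it is about the currency rather than the footballer, and then scores tone as the balance of positive and negative words.

```python
import re

# Hypothetical tone word lists, chosen only for illustration (not a real lexicon).
POSITIVE = {"gain", "gains", "rally", "rallies", "strong", "beat"}
NEGATIVE = {"loss", "losses", "fall", "falls", "crisis", "weak"}

# Hypothetical context words used to disambiguate "sterling".
CURRENCY_CONTEXT = {"pound", "currency", "exchange", "dollar", "bank"}
FOOTBALL_CONTEXT = {"goal", "goals", "match", "football", "striker", "scored"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_about_currency(text):
    """True if the text mentions 'sterling' with more currency than football context."""
    tokens = set(tokenize(text))
    if "sterling" not in tokens:
        return False
    return len(tokens & CURRENCY_CONTEXT) > len(tokens & FOOTBALL_CONTEXT)


def tone(text):
    """Tone in [-1, 1]: (positive - negative) / (positive + negative), 0 if neither."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


headline = "Sterling falls against the dollar as the pound hits a weak patch"
if is_about_currency(headline):
    print("relevant, tone =", tone(headline))
```

A word-count approach like this ignores negation, sarcasm, and the biases discussed above, which is part of why media is hard to incorporate in practice.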