Companies care a lot about engagement because they think more engaged employees are likely to be more motivated, work harder, perform better, and stay longer. So this is something companies have focused a lot of attention on managing, and therefore on trying to measure. As I pointed out, the key way organizations have done this in the past is with annual surveys. Are there other approaches we can take? Yes, this is one of the areas where people are starting to explore using machine learning. How does that work? Well, in order to understand how these algorithms work, I actually think it's quite useful to think about how we might do it ourselves.
So suppose you're a manager and you want to assess the engagement of each of your subordinates without using a survey: basically, just figure out who's in a good mood or a bad mood, and who's excited. How would you do it? One of the first things you'd do, and the most obvious, is just listen to them when they're talking. Are they complaining a lot? Are they saying their work's exciting and they're pleased by it? Or are they a little bit more withdrawn? Is most of what they're talking about the downsides of their work? Just tracking that is probably the first way you would try to figure out whether people are engaged.
It turns out that these are things machine learning does very well. So what I want to talk about in terms of tracking engagement is using machine learning tools to work through what employees are saying in a systematic way, and I'll talk about two approaches: one is sentiment analysis, and the second is topic modeling. So let's start with sentiment analysis. The basic principle of sentiment analysis is taking text, any text, and trying to figure out the emotional content of that text. Is it text that talks about people being happy? Is it text that talks about people being sad?
So again, before thinking about the computer, suppose you needed to train a friend to do this. Suppose you have collected a whole bunch of descriptions from your employees of their work, where they talk about how they're feeling about it, and you want your friend to tell you how many of them are pleased with their work and how many are disengaged. Suppose also that your friend has literally zero emotional intelligence, right? Maybe they're an academic. So they find it hard to do this without really explicit instruction. How could you get somebody who really doesn't have this emotional intelligence to figure it out?
Well, what you'd probably end up telling them is: okay, let's look at all of the words in the descriptions that describe emotions. Some of those words are going to be about positive emotions. They're going to talk about being happy or excited. Other words are going to be negative. They're going to talk about being frustrated or disappointed. So we could even just count all of these words and then compare their frequencies. Where people are using a lot more happy words, words like pleased, excited, engaged (if they're American, they'll probably describe themselves as stoked), then we're pretty sure that they're feeling engaged.
And this is the basis of sentiment analysis. It starts with a predefined dictionary of words: a long list of all of the words that we associate with positive emotions, and another long list of all of the words that we associate with negative emotions. For any piece of text, any answer to a question that you want to code in this way, it will just go through, count the number of words with positive emotions and the number of words with negative emotions, and look at the difference between them. Now, you're probably thinking there are some really obvious problems with doing this. What if somebody says they're not happy?
How do you code that? Usually these algorithms are sophisticated enough to handle things like that: if there's a "not" before a word, either we ignore the word or we reverse its meaning. Obviously that's not perfect. You can imagine there might be some contorted examples. If I say I'm inadequately excited, is it going to catch that that's a negative thing or not? Hard to tell. So yeah, there are going to be some errors.
We can also imagine there are differences in how people express themselves. If you're British and you say the work is fine, you're actually quite excited. If you're American and you say the work's great, it basically means, yeah, it's okay. And not only do we see these national differences, but obviously there are differences between people as well. Some people are always exuberant. Being able to control for those people is hard, and so that's also going to lead to inaccuracies. And then the third thing: just as with surveys, we can imagine people might be strategic, particularly if they know that we're going to be looking at what they say to figure out how engaged they are. The question may be less "how engaged am I?" and more "how engaged do I want people to think I am?"
So, yeah, there are definitely problems with this. That said, people have tried to validate these kinds of analyses: basically, use these algorithms to code the sentiment of text and then get human raters to do the same sort of coding. When you do that, you actually do see quite good correlations between what the computer says and what people say. My sense is that in any one piece of text, particularly a fairly short one, there could be all sorts of errors. But when you look at large bodies of text, when you look at what's been written across many people, the accuracy that you get from these methods is generally pretty good.
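To make the word-counting idea concrete, here is a minimal sketch in Python. It is not how any particular commercial tool works; the word lists are tiny illustrative assumptions (real dictionaries run to thousands of entries), and the names `POSITIVE`, `NEGATIVE`, `NEGATORS`, `sentiment_score`, and `average_sentiment` are made up for this example. The negation rule is the simple "reverse the meaning if there's a 'not' right before it" approach mentioned above.

```python
import re

# Tiny illustrative word lists (assumptions for this sketch only).
POSITIVE = {"happy", "excited", "pleased", "engaged", "stoked"}
NEGATIVE = {"frustrated", "disappointed", "sad"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Count positive words minus negative words in one piece of text.

    An emotion word directly preceded by a negator (or by a word ending
    in "n't") has its polarity reversed.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue  # not an emotion word, so it does not count
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NEGATORS or previous.endswith("n't"):
            polarity = -polarity
        score += polarity
    return score


def average_sentiment(texts):
    """Average the per-text scores across a body of text."""
    scores = [sentiment_score(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


print(sentiment_score("I am happy and excited about my work"))  # 2
print(sentiment_score("I am not happy"))                        # -1
print(sentiment_score("I am inadequately excited"))             # 1
```

Note that "inadequately excited" still comes out positive here, which is exactly the kind of contorted case the simple rule misses. Averaging over many responses, as `average_sentiment` does, is where individual errors like that start to wash out.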
And the obvious advantage of it is that I don't need to read everything and make my own judgment about what people are saying. I can just get these numbers coming straight out of the analysis from the computer. So sentiment analysis is just a tool. It's a way of coding text for emotions. The obvious question we want to ask ourselves, if we're planning to use this to understand how engaged people are, is: what texts are we going to use? It could be anything that they've written, but where are we going to find it?
In thinking about where to use sentiment analysis, I think there's a trade-off between comprehensiveness and how invasive it feels to the employee. If I really want to know whether people are excited or not, probably the best thing I could look at is their emails and instant messages, right? These days, within the organization, a huge amount of our communication is electronically intermediated, and so all of that data is there. We could easily run sentiment analysis through everything that people have written to get a sense of how excited they are feeling. How is that changing day to day? Which groups are exhibiting more engagement? Which groups are exhibiting less engagement? You do see some tools out there, for example, that look at Slack messages and develop these kinds of analyses.
Should we do it? It's mixed. Certainly, I think from a legal standpoint within the US, we're fine. I think there's no expectation of privacy around emails and other things that people write on work technology, so we ought to be okay. Ethically, I think in some organizations, they would have no trouble doing this. In other organizations, people might see it as a big violation of privacy.
And so, once you start going through what people are writing at this kind of level, making sure people understand what you're doing with their data, and making sure they're comfortable with it, is important to maintaining their trust. So you can do this; whether you should is probably organization-specific.
There are other ways that we can use it. You can also use it to look at what people are posting about the company on social media. I think this is more fair game, right? If it's on social media, by definition it's public. And so looking through what people are posting about the company can give you a decent sense of overall morale. Obviously you're getting less information. We don't have the granularity that emails give us, of being able to see across different groups and across time. We just have less data, but it's a sensible thing to look at.
Another place where people often use this is actually just asking people: how are you feeling about your job? Rather than doing these long engagement surveys, we could ask people every few weeks, maybe every month, to write us a couple of sentences about how they're feeling about their work and the company. That open text is easier for them, and it provides us with much richer information. One of the first things we can do is a quick coding of all of it, so we know what overall satisfaction we're seeing. The nice thing about this text, though, is that it doesn't just lend itself to sentiment analysis. If we start asking people how they're feeling, it also gives us more leverage to try to figure out why they're feeling upset. What are the themes that come out of this? The downside is that now we may have thousands and thousands of these sentences from which we're trying to extract those themes. And that is a great task for a second tool that I want to talk about, which is topic modeling.