[SOUND]. We have introduced [INAUDIBLE] based, sequential pattern mining algorithm, GSP. We have introduced vertical format based sequential pattern mining [INAUDIBLE]. Now we come down to see pattern-growth space algorithm, called PrefixSpan. PrefixSpan is a slow spaced mining algorithm, okay? To examine the sequential pattern in more detail, we need to introduce the concept of prefix and a suffix. The prefix means anything in the front, if it's frequent you want to capture them, as a frequent prefix. Like a, a, a, ab. Then their projection becomes a suffix. Remember if you get a a, a, what you see is you got a position holder for the next one is b. So that's why we use underscore as a position holder. Okay. The similar thing, you get an ab, the position holder will shift to c. So that means, given a sequence, you will find a set of prefixes and a set of suffixes, or you can say prefix-based projection, okay. Then for this prediction, what we will find is first find length-1 sequential pattern. If they are frequent, we call them length-1 sequential pattern. Then we can do divide and conquer, that means we divide the search space to mine each projected database. We have a projected database, b projected database, c projected database, up to f projected database. This mining method, methodology, called PrefixSpan, or prefix-projected sequential pattern mining. So let's examine a little detail. For this sequence database, if we find length-1's sequential pattern like this, then we can actually get length-2 sequential pattern by first doing projective database. Then find length-2 sequential patterns. That means, if they are frequently in this projected database, they will form length-2 sequential pattern. And then we can do length-2 sequential pattern-based projection from aa project database, af project database. Okay? And we can keep this one ongoing. Okay? The major strengths, or advantages. There's no candidate subsequence to be generated. And the projected database keeps shrinking. So, let's look at some implementation tricks. So if you do the projection, like you really taking the sequence to do [INAUDIBLE] a, [INAUDIBLE] ab. You will get largely redundant subsequences, or you see postfix but they are largely redundant with the original string. However, we do not need to do the real physical prediction. What do we need is we call pseudo-projection. That means you just say, what is a's predictability base? It's the same sequence, but the position is number two. What is ab's project database is the same sequence as s, but the position is number four. So if you register the next position to be scanned, it will essentially register the projected sequence, or suffix. Okay? So if the database can be held in main memory, this pseudo-projecting is very effective. Because there's no physical copying or suffix. You only need pointer to the sequence. You just get offset. You get suffix. But if it does not fit in the memory, you use pointer. You may involve lots of disk accesses. So you may really want to do physical projection because once you projected it quickly this set can be fit in the main memory. That means we can integrate physical and pseudo-projection any time when the data fits in the main memory, you can use pseudo-projection. [MUSIC]