Next we discuss personal data re-identification risks: the risks that the persons or data subjects we've de-identified, or think we've de-identified, might still be identified, either in the data itself to which we have applied some variety of de-identification, or as a result of weaker-than-intended data protection processing or architecture designs. As we can see in the diagram, some of these risks may be rooted in the overall security design of the architectures and the personnel approaches attending the data management and data protection activities. For example, not employing clear separation of duties, with effective access control policies and monitoring of privileged users, combined with a weak key management approach for direct identifier tokenization, could allow someone to bridge the gap between the de-identified data and the original data, as outlined in risks 2, 3, and 4 in the diagram. Other risks are driven more by the nature of the data itself, the de-identification approaches used, and the potential extended uses of the data in the target environments. For example, we may have missed certain sensitive data in the source, or not fully understood the sensitivity of the indirect identifiers we were dealing with, or introduced new linkage-based re-identification risks if data content is brought into the target environment that is assumed to be safe because it is public, but can be linked to the de-identified data; the combination of all that data together can then lead to the identification of a data subject. These are noted as risks 0 and 1 in the diagram. We'll review some of these data-oriented risks and mitigations using our data science use case example in the next slides.

Re-identification risk is a concept that can sometimes be hard for students newer to data privacy to accept, seeing it as very theoretical or as requiring high degrees of espionage or deviant, ill-intended behavior. How can we possibly protect against every evil spy? I would agree that in some ways this might be true, but I would also argue that the opportunity for breaching a known boundary, or using data inappropriately, even unintentionally, can often be a fatal chink in the armor. Assessing these risks protects against both the innocent or innocently intended actions leading to exposure and those that are more calculated and contrived. In the end, this is an area requiring the data privacy office's and business data owner's understanding and scrutiny, which should lead to appropriate security and data privacy controls being put in place to mitigate these risks, all in proportion to the organization's risk tolerance posture.

Another key concept to grasp here is that the occurrence of a re-identification does not necessarily mean some traumatic breach has occurred, although that of course is always a concern. At the heart of the risk evaluation is this question: is it possible, even within our own walls, with our own supposedly trusted team members, that personally de-identified data, either through the data itself or within the allowed access controls we've wrapped around it, can be connected back to the original data or data subject? Even if no foul or harm has occurred yet, the fact that it is possible must be assessed from a risk perspective and addressed by the organization, which is the true custodian of the data.
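To make the tokenization and key management point concrete, here is a minimal sketch, not from the course material: it assumes (hypothetically) that direct identifiers such as email addresses were tokenized with an unsalted, unkeyed hash. Anyone who can enumerate plausible identifiers, including an insider, can rebuild the mapping and bridge back to the original identities, which is the kind of path described by risks 2, 3, and 4.

```python
# Illustrative sketch only: why weak tokenization plus weak separation of duties
# lets someone bridge de-identified data back to original identities.
# Column names and values are invented for the example.
import hashlib

def weak_token(identifier: str) -> str:
    # Deterministic, unkeyed hash: looks opaque, but is trivially reversible
    # by a dictionary attack if the input space is guessable.
    return hashlib.sha256(identifier.encode()).hexdigest()

de_identified_records = [
    {"token": weak_token("alice@example.com"), "purchase_total": 912.50},
    {"token": weak_token("bob@example.com"), "purchase_total": 14.99},
]

# An insider (or anyone with a plausible list of identifiers) re-derives the
# tokens and joins them back to the supposedly de-identified data.
candidate_identifiers = ["alice@example.com", "bob@example.com", "carol@example.com"]
rainbow = {weak_token(c): c for c in candidate_identifiers}

for rec in de_identified_records:
    if rec["token"] in rainbow:
        print(f"Re-identified {rainbow[rec['token']]} -> {rec['purchase_total']}")
```

One common mitigation, consistent with the separation-of-duties point above, is keyed tokenization (for example, an HMAC with a secret key held by a separate custodian), so that possessing the de-identified data alone is not enough to rebuild the mapping.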
Many of the new data privacy regulations include these risk assessment obligations, which organizations must address in their comprehensive technical and organizational measures to become fully compliant. Finally, perfect data protection is not really possible, not unless you completely fabricate the data, and even some fabrication approaches can be very challenging depending on how you fabricated the data and what data intelligence you used to build your fabrication rules. The data protection versus risk tolerance conversation, however, must be a continuous discussion in organizational data governance board rooms. The diagram outlines a few of the most common kinds of attack surfaces and is worth reviewing. The next slides will focus on risk assessments and remediations that relate more to the data itself and the additional perturbation or actions that may be required, specifically related to the risk 0 and risk 1 attack surfaces in the diagram. We'll continue with our data science example use case and look at some data sensitivity specific risks that might need mitigating.

So even after we've analyzed, classified, and applied our combined de-identification for all the known identifiers in the data, direct or otherwise, and depending on the use cases and potential exposure this data may experience, we still may want or need to assess the risk that data subjects in the source data are re-identifiable in the resulting de-identified data. As we discussed, though these re-identification scenario paths may seem remote or extreme for some to fathom, at a minimum they should be understood, assessed, and mitigated where appropriate. This will be especially true in cases where data is legitimately intended to be made public after appropriate levels of de-identification, such as public entities and local governments that have obligations to publish various population or societal data sets, or in cases where organizations are legitimately selling data sets believing they have done all they need to de-identify the data. In both of these cases, being mistaken about the true depersonalized state of the data, or about the possibility of re-identification, can have catastrophic outcomes for the organization and the affected data subjects.

So some of the data-specific risks that we might want to assess include remaining uniqueness vulnerabilities not yet addressed in the data. These are often surrogate identifiers that may seem innocent on the surface but, because of their uniqueness, are effectively surrogates for the identifiers in the data; they may not have been appropriately de-identified and in effect become a connection link back to the original data and the original identities in that data. Transactional uniqueness is another vulnerability. These are vulnerabilities exposed via sets of attributes, such as type of purchase or location of purchase, combined with an individual entity, that stand out at some extreme level or as an outlier in the total population, such that they point directly to one or a limited set of individuals. An example might be the total hours of a particular service consumed by a particular individual, or the total number or size of purchase amounts. These might be visible in some externally available environment and allow for a connection back to the individuals in the data.
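As a minimal sketch of this kind of remaining-uniqueness check, assuming invented column names and a small example threshold (the course does not prescribe specific values), we can count how often each combination of externally observable attributes occurs and flag any combination that falls below the threshold:

```python
# Illustrative sketch: flag quasi-identifier combinations that occur fewer than
# k times in the de-identified data -- the "remaining uniqueness" and
# transactional-uniqueness risk described above. Columns and data are assumed.
from collections import Counter

K_THRESHOLD = 5  # minimum group size we are willing to tolerate

records = [
    {"token": "t1", "zip3": "941", "age_band": "30-39", "purchase_type": "electronics"},
    {"token": "t2", "zip3": "941", "age_band": "30-39", "purchase_type": "electronics"},
    {"token": "t3", "zip3": "100", "age_band": "80-89", "purchase_type": "yacht"},
]

quasi_identifiers = ("zip3", "age_band", "purchase_type")
counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

for combo, n in counts.items():
    if n < K_THRESHOLD:
        print(f"Vulnerable combination {combo}: appears only {n} time(s)")
```

Combinations flagged this way are candidates for further generalization, suppression, or other perturbation before the data is released.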
Pure outliers in the data content are also a vulnerability and risk, and these can be individual elements or combinations of elements. So this is similar to transaction uniqueness, but may in fact just be individual attributes themselves: a very rare disease, a very high transaction amount, a billion-dollar purchase. And there are also risks associated with linking other data sets to our de-identified data. We discussed this a little previously, but effectively these would be sets of content that may be generally considered public information and generally safe, but that, once combined with our de-identified data, have enough information to link the sets together. There may be some identifiable information in the public sets, such as phone numbers or emails or even domain names, that would allow the person using or accessing that data to re-identify an individual.

So, to review some of these risk assessment types. Vulnerable data elements: the premise is that individual values or combinations of values appear fewer times than some threshold in the data population. This can lead to unexpected linkability from the de-identified data back to the original data, creating an attack path that was not previously known, or to surrogate identifiers in the de-identified data that provide a path back to the original data. Transaction uniqueness: measuring the number of unique transactions for typically externally observable attributes in the de-identified data. It is based on defining the principals and the externally observable attributes, and then analyzing the combinations that occur as unique within a threshold. So this might mean identifying a set of transactional content with the principals being the de-identified direct identifiers, something tokenized perhaps, and then analyzing the other, externally observable attributes, such as time of day, location, or purchase type, and finding uniqueness in the data as related to that unique identifier. These can create re-identification exposure risks if they are extreme or are outliers.

And then there's the outlier vulnerability itself, which is based on the desire to protect those principals, those identifiers, even if they're masked, where some dimensions associated with a principal, whether individually or in some aggregated state such as a sum or a count, stand out amongst the overall population. The typical, or safest, approach with outliers is simply to remove them from the population, but an analysis has to occur to determine to what degree that skews the overall population statistics for the given attributes. And then we discussed external data set linking, which covers the risks or issues introduced when some other raw data or some other data set is combined with our target data set, and upon the combination or aggregation of the two, we find linkages between those sets that create new pointers back to individual data subjects.
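As a minimal sketch of that outlier trade-off analysis, with an assumed numeric attribute and an assumed robust (median/MAD) cutoff rather than any method prescribed by the course, we can flag and remove extreme values and then report how much their removal shifts the population statistics:

```python
# Illustrative sketch: remove extreme outliers (e.g., the billion-dollar purchase)
# and measure how much their removal skews the population statistics.
# The attribute, values, and cutoff are assumptions for the example.
import statistics

purchase_totals = [12.0, 19.5, 8.75, 22.0, 15.0, 1_000_000_000.0]  # one extreme outlier

median = statistics.median(purchase_totals)
mad = statistics.median(abs(x - median) for x in purchase_totals)

def is_outlier(x, cutoff=3.5):
    # Modified z-score based on median/MAD; robust to the outlier itself,
    # unlike a plain mean/stdev z-score on a small sample.
    return mad > 0 and 0.6745 * abs(x - median) / mad > cutoff

retained = [x for x in purchase_totals if not is_outlier(x)]
removed = [x for x in purchase_totals if is_outlier(x)]

print(f"Removed outlier(s): {removed}")
print(f"Mean before:  {statistics.mean(purchase_totals):,.2f}  after: {statistics.mean(retained):,.2f}")
print(f"Stdev before: {statistics.stdev(purchase_totals):,.2f}  after: {statistics.stdev(retained):,.2f}")
```

The before/after comparison is the point: if removing the outliers materially changes the statistics the downstream use case depends on, the data owner has to weigh that utility loss against the re-identification risk of leaving the outliers in.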