This cartoon is about de-identifying protected health information (PHI) under HIPAA. De-identifying personal data is quite complicated. Researchers have been able to re-identify people in data sets using just ZIP codes, birth dates, and gender. De-identification is difficult because more and more identified personal data is available online, and that data can be matched against de-identified data to link records back to names.
In one infamous example, Netflix released an anonymized database of user movie ratings. Researchers were able to re-identify many people in the database by comparing it with the Internet Movie Database (IMDb), a website where people rate movies, often under their real names.
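At its core, this kind of attack is a join: records in the released data are matched against an identified data set on the fields the two have in common. The Python sketch below is a toy illustration of that idea. It is not the method used in the Netflix study, which matched on movie ratings and dates rather than demographics. All records, field names, and the `link_records` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical fields shared by both data sets ("quasi-identifiers").
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

def qi_key(record):
    """Tuple of quasi-identifier values used to match records across data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def link_records(deidentified, public):
    """Return (name, released_record) pairs where a combination of
    quasi-identifiers appears exactly once in each data set."""
    released_groups = defaultdict(list)
    for rec in deidentified:
        released_groups[qi_key(rec)].append(rec)
    public_groups = defaultdict(list)
    for person in public:
        public_groups[qi_key(person)].append(person)

    matches = []
    for key, recs in released_groups.items():
        people = public_groups.get(key, [])
        # A unique match on both sides ties a name to the "anonymous" record.
        if len(recs) == 1 and len(people) == 1:
            matches.append((people[0]["name"], recs[0]))
    return matches

# Hypothetical "de-identified" release: names removed, demographics kept.
deidentified = [
    {"zip": "02138", "birth_date": "1945-07-31", "gender": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "gender": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "gender": "M", "diagnosis": "diabetes"},
]

# Hypothetical identified data set, such as a public list with names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1980-01-15", "gender": "M"},
]

for name, rec in link_records(deidentified, public):
    print(name, "->", rec["diagnosis"])  # prints: Alice Example -> hypertension
```

Alice is re-identified because her combination of ZIP code, birth date, and gender is unique in both data sets. Bob's values match two released records, so this sketch does not link him to a single diagnosis.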
As personal data online proliferates, it becomes easier to find overlap between that data and de-identified data sets. Personal data can rarely be de-identified with zero chance of re-identification, but there are better and worse ways to de-identify it. De-identification is not black and white but shades of gray: it is about degrees of risk.
HIPAA recognizes the challenges of de-identification and provides two ways to de-identify data so that it can be shared without the restrictions HIPAA ordinarily places on it. Under the Expert Determination method, a qualified expert, such as a statistician, can assess the de-identification method and determine that the risk of re-identification is very small. Alternatively, under what is known as the Safe Harbor method, the data can be stripped of 18 types of identifiers, including names, addresses, dates, phone numbers, and full-face photographs. As a result, quite a lot of information must be scrubbed from a record to de-identify it.
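A simplified Python sketch of the Safe Harbor idea is to drop every field that falls into an identifier category. The field names below are hypothetical and cover only a few of the 18 categories. The actual rule has further details (for example, it permits keeping the year of a date), so this is an illustration, not a compliance tool.

```python
# Hypothetical field names standing in for a few of the 18 Safe Harbor
# identifier types. This is NOT the regulatory list.
IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "email", "ssn",
    "medical_record_number", "photo_url", "admission_date",
}

def strip_identifiers(record):
    """Return a copy of the record without the listed identifier fields."""
    return {field: value for field, value in record.items()
            if field not in IDENTIFIER_FIELDS}

# Hypothetical patient record.
patient = {
    "name": "Alice Example",
    "street_address": "1 Main St",
    "phone": "555-0100",
    "admission_date": "2023-03-14",
    "diagnosis": "hypertension",
    "lab_result": 142,
}

print(strip_identifiers(patient))  # {'diagnosis': 'hypertension', 'lab_result': 142}
```

Even in this small record, four of six fields disappear, which is the point the paragraph makes about how much must be scrubbed.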
One tradeoff with de-identification is that reducing the risk of re-identification typically requires removing more data, which also eliminates useful information from the data set.
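To illustrate the tradeoff, the Python sketch below measures the share of records whose combination of ZIP code, birth date, and gender is unique, used here as a rough proxy for re-identification risk, before and after coarsening those fields. The records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records containing only quasi-identifiers.
records = [
    {"zip": "02138", "birth_date": "1945-07-31", "gender": "F"},
    {"zip": "02139", "birth_date": "1945-02-11", "gender": "F"},
    {"zip": "02140", "birth_date": "1980-01-15", "gender": "M"},
    {"zip": "02141", "birth_date": "1980-09-02", "gender": "M"},
    {"zip": "02142", "birth_date": "1980-05-20", "gender": "M"},
]

def generalize(record, zip_digits, keep_full_date):
    """Coarsen a record: truncate the ZIP code and optionally keep only the birth year."""
    birth = record["birth_date"] if keep_full_date else record["birth_date"][:4]
    return (record["zip"][:zip_digits], birth, record["gender"])

def fraction_unique(records, zip_digits, keep_full_date):
    """Share of records whose generalized values match no other record."""
    keys = [generalize(r, zip_digits, keep_full_date) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

print(fraction_unique(records, 5, True))   # 1.0 (every record is unique)
print(fraction_unique(records, 3, False))  # 0.0 (no record is unique)
```

Coarsening drops the share of unique records from 1.0 to 0.0 in this toy data, but the result can no longer distinguish neighborhoods or exact ages, which is exactly the useful information that is lost.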
* * * *
This post was authored by Professor Daniel J. Solove, who through TeachPrivacy develops computer-based privacy and data security training. He also blogs on LinkedIn, where he has more than 1 million followers.