I have written some code – based on the WWII Enigma Wheel code generator.
Someone on Clovertech gave me their algorithm, and I have based my code on it.
Our scrambling project was canned, so it is only used by us.
As usual, it is based on an ‘init’ key.
If this is always the same, the scrambling will return the same value for a given string, so all messages for ‘Joe Smith’ will be transformed to ‘Eoj Htims’ (as an example).
This will not be a turn-key solution, as the Tcl fits in with our own Tcl translations and use of namespaces etc., but you should be able to modify it quite easily.
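For anyone wanting to see the "same key, same output" property in isolation, here is a minimal sketch. It is not the Enigma-style code described above, which was not posted; it swaps in a keyed hash (HMAC-SHA256) and is written in Python rather than Cloverleaf Tcl purely for illustration. The function name `scramble` and the normalisation step are assumptions.

```python
import hashlib
import hmac


def scramble(value: str, init_key: bytes) -> str:
    """Return a stable pseudonym: same key + same input gives the same output."""
    # Assumption: collapse whitespace and ignore case so that
    # 'Joe Smith' and 'joe  smith ' map to the same pseudonym.
    normalised = " ".join(value.split()).upper()
    digest = hmac.new(init_key, normalised.encode("utf-8"), hashlib.sha256)
    # Keep 12 hex characters so the result still fits a typical name field.
    return digest.hexdigest()[:12].upper()


if __name__ == "__main__":
    key = b"example-init-key"  # hypothetical key, keep the real one secret
    print(scramble("Joe Smith", key))
    print(scramble("Joe Smith", key))  # identical to the line above
```

Unlike the 'Eoj Htims' example, the output of a keyed hash is not a readable name and cannot be reversed, so it only suits cases where a consistent stand-in is enough. It is also still derived from the original value, which matters for the HIPAA re-identification code discussed further down the thread.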
Hi Richard, would you be willing to share this code?
Thanks.
HL7Spy has a feature built in that allows you to de-identify your entire SMAT files.
For those who may not have had to get involved with de-identification, here is the list of 18 items that HIPAA says must be removed:
1. Names.
2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP Code, and their equivalent geographical codes, except for the initial three digits of a ZIP Code if, according to the current publicly available data from the Bureau of the Census:
a. The geographic unit formed by combining all ZIP Codes with the same three initial digits contains more than 20,000 people.
b. The initial three digits of a ZIP Code for all such geographic units containing 20,000 or fewer people are changed to 000.
3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
4. Telephone numbers.
5. Facsimile numbers.
6. Electronic mail addresses.
7. Social security numbers.
8. Medical record numbers.
9. Health plan beneficiary numbers.
10. Account numbers.
11. Certificate/license numbers.
12. Vehicle identifiers and serial numbers, including license plate numbers.
13. Device identifiers and serial numbers.
14. Web universal resource locators (URLs).
15. Internet protocol (IP) address numbers.
16. Biometric identifiers, including fingerprints and voiceprints.
17. Full-face photographic images and any comparable images.
18. Any other unique identifying number, characteristic, or code, unless otherwise permitted by the Privacy Rule for re-identification.
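Items 2 and 3 are the only ones on the list that call for generalising a value rather than simply removing it, so a rough sketch of those two rules may help. This is Python for illustration only, not Cloverleaf Tcl. The function names are made up, and the set of restricted three-digit ZIP prefixes is passed in as a parameter because it has to come from current Census data.

```python
from datetime import date
from typing import Set


def generalise_zip(zip_code: str, restricted_prefixes: Set[str]) -> str:
    """Keep only the first three digits, or '000' for sparsely populated areas."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or prefix in restricted_prefixes:
        return "000"
    return prefix


def generalise_date(d: date) -> str:
    """Keep the year only (item 3).

    Note: if the date reveals an age over 89, the year must be dropped too;
    this sketch leaves that check to the caller.
    """
    return str(d.year)


def generalise_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)


if __name__ == "__main__":
    restricted = {"999"}  # placeholder value, not real Census data
    print(generalise_zip("43215-1234", restricted))
    print(generalise_date(date(2014, 3, 27)))
    print(generalise_age(93))
```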
Chris,
Do those actually need to be removed or does encryption of those data items qualify?
email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.
The word used in the HHS publication is “removed”. If I recall correctly, in compiling these messages you can generate an identification number that can be used in the de-identified message which could be used by you should it be necessary to re-identify the patient at a later date, perhaps for some follow-up care. Only the compiler of the de-identified messages is permitted to know how to decode the identification.
There is much more detailed information here:
http://privacyruleandresearch.nih.gov/pdf/HIPAA_Privacy_Rule_Booklet.pdf
Thanks for the pdf link.
It says the code(s) can’t be derived from the original data, but have to be random. I think the “key” it refers to is not a decryption key, but more of a matching key that associates the random code with the true identity.
Gene, you’re correct. You would keep a table that associates your pseudo-identifier with a real medical record number for the data going to a particular researcher.
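A minimal sketch of such a table, assuming an in-memory lookup (a real one would live in a secured database held only by whoever compiles the de-identified data). It is in Python for illustration, and the class and method names are hypothetical. The point is that the pseudo-identifier is drawn at random, not computed from the medical record number.

```python
import secrets


class Crosswalk:
    """Maps real medical record numbers to random pseudo-identifiers."""

    def __init__(self) -> None:
        self._by_mrn = {}
        self._by_pseudo = {}

    def pseudo_id_for(self, mrn: str) -> str:
        """Return the existing pseudo-identifier for an MRN, or assign a new one."""
        if mrn not in self._by_mrn:
            pseudo = secrets.token_hex(8)  # random, not derived from the MRN
            while pseudo in self._by_pseudo:  # retry on the unlikely collision
                pseudo = secrets.token_hex(8)
            self._by_mrn[mrn] = pseudo
            self._by_pseudo[pseudo] = mrn
        return self._by_mrn[mrn]

    def reidentify(self, pseudo: str) -> str:
        """Look up the real MRN; only the holder of the table can do this."""
        return self._by_pseudo[pseudo]


if __name__ == "__main__":
    table = Crosswalk()
    pid = table.pseudo_id_for("MRN0012345")
    print(pid, "->", table.reidentify(pid))
```

Keeping one table per researcher or study, as described above, means the same patient gets unrelated pseudo-identifiers in different data sets.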
To add to my response to Jim’s question, data encryption is really not appropriate for fields in de-identified messages. The fields should be empty. As an example, if you were to encrypt the street address, everyone at that address would still have the same encrypted address string. That’s a level of granularity that is not allowed.
For everything you encrypt, there is someone who will want to decrypt it. It’s really hard to decrypt “null”.
Which segments/fields does this apply to?
The short answer is: all fields that contain any of the 18 items on the list. You will have to examine the message structures you are using, field by field, and compare them to the list, with particular attention to any fields that may have been hijacked and used for PHI.
I agree with Jim K and Keith’s approach to translations where you only send specific fields that are required by the recipient. It is good to avoid using BULKCOPY and PATHCOPY where you then try to block things you don’t want.
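To make the "only send what is required" idea concrete, here is a rough allow-list sketch. It is plain Python working on a pipe-delimited HL7 v2 string, not a Cloverleaf Xlate, and it makes several simplifying assumptions: segments are separated by carriage returns, MSH is passed through unchanged, and the allow-list (`allowed`, a hypothetical name) works at whole-field level only.

```python
from typing import Dict, Set


def filter_message(message: str, allowed: Dict[str, Set[int]]) -> str:
    """Build an outbound message containing only explicitly allowed fields."""
    out = []
    for segment in filter(None, message.split("\r")):
        fields = segment.split("|")
        seg_id = fields[0]
        if seg_id == "MSH":
            out.append(segment)  # assumption: MSH carries no patient data
            continue
        if seg_id not in allowed:
            continue  # segments not on the list are never sent
        keep = allowed[seg_id]
        kept = [seg_id] + [
            value if index in keep else ""
            for index, value in enumerate(fields[1:], start=1)
        ]
        while len(kept) > 1 and kept[-1] == "":
            kept.pop()  # trim trailing empty fields
        out.append("|".join(kept))
    return "\r".join(out)


if __name__ == "__main__":
    msg = (
        "MSH|^~\\&|LAB|HOSP\r"
        "PID|1||12345^^^MRN||SMITH^JOE||19500101|M\r"
        "NK1|1|SMITH^JANE\r"
        "OBX|1|NM|GLU||95|mg/dL"
    )
    # Send only PID-8 (sex) and the OBX result fields.
    print(filter_message(msg, {"PID": {8}, "OBX": {1, 2, 3, 5, 6}}).replace("\r", "\n"))
```

Anything not named in the allow-list, including a field that someone has quietly repurposed for PHI, is left out by default, which is the advantage over copying everything and then blanking selected fields.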
The de-identification rules are primarily used to allow use of patient data in research studies. Data doesn’t have to be de-identified before it is shared with a business partner. I’m curious why people on this board are having to de-identify data.