De-identification

  • #49036
    Bill May
    Participant

    Has anyone had any experience with, or can anyone recommend, tools for de-identifying HL7 messages?

    Ta

Viewing 12 reply threads
    • #60515
      Keith McLeod
      Participant

      Wouldn’t you use the Xlate tools provided? Remove or replace all personally identifiable information in the message. This generally means it can never be re-identified. Replace the name with a common John/Jane Doe or another non-meaningful name. Remove all other information that could possibly be identifying. Or I guess the real question is what information they need and whether it is considered identifiable. Only send what is needed.
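      As a rough illustration of the replace-and-remove idea outside of Xlate, here is a minimal Python sketch. The function names and the choice of PID fields to blank are assumptions for illustration; the field numbers follow the usual HL7 v2 PID layout, but every site should check which fields actually carry identifiers (PID-3 and PID-18, for example, are not touched here).

```python
# Sketch only: which PID fields carry identifiers at a given site is an
# assumption that has to be verified against the real message specs.
BLANK_PID_FIELDS = (7, 11, 13, 14, 19)  # DOB, address, home phone, work phone, SSN


def scrub_pid(segment: str, sep: str = "|") -> str:
    """Replace the patient name with a placeholder and empty selected PID fields."""
    fields = segment.split(sep)
    if fields[0] != "PID":
        return segment  # leave every other segment untouched
    if len(fields) > 5:
        fields[5] = "DOE^JANE"  # PID-5 patient name
    for idx in BLANK_PID_FIELDS:
        if len(fields) > idx:
            fields[idx] = ""
    return sep.join(fields)


def scrub_message(message: str) -> str:
    """Apply scrub_pid to each segment of a carriage-return delimited message."""
    return "\r".join(scrub_pid(seg) for seg in message.split("\r"))
```

      This only covers the PID segment. Other segments (NK1, GT1, IN1, free-text OBX or NTE values) can also carry identifiers and would need the same treatment.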

    • #60516
      Richard Hart
      Participant

      Bill.

      I have written some code based on the WWII Enigma wheel code generator.

      Someone on Clovertech gave me their algorithm, and I have based my code on it.

      Our scrambling project was canned, so it is only used by us.

      As usual, it is based on an ‘init’ key.

      If this is always the same, the scrambling will return the same value for a given string, so all messages for ‘Joe Smith’ will be transformed to ‘Eoj Htims’ (as an example).

      This will not be a turn-key solution, as the TCL fits in with our TCL translations and use of namespaces etc, but you should be able to modify it quite easily.
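      The property described above (same init key, same input, same output) can be sketched in Python. This is not the Enigma-style Tcl code, which is not shown in this thread; it swaps in a keyed HMAC from the standard library purely to illustrate repeatable scrambling. The name `scramble` and the 12-character length are assumptions, and unlike the ‘Eoj Htims’ example the output is an opaque token rather than a name-like string.

```python
import hashlib
import hmac


def scramble(value: str, init_key: bytes, length: int = 12) -> str:
    """Return a repeatable, non-reversible token for value under init_key."""
    digest = hmac.new(init_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


# With a fixed key, every message for the same patient gets the same token.
key = b"example-init-key"  # hypothetical key; keep the real one secret
token_a = scramble("Joe Smith", key)
token_b = scramble("Joe Smith", key)
```

      Because the token is derived from the original value, anyone holding the key can test guesses against it, so the key needs the same protection as the data itself.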

    • #60517
      Steve Drozdowski
      Participant

      Hi Richard, would you be willing to share this code?

      Thanks.

    • #60518
      Ronald Ortiz
      Participant

      HL7Spy has a feature built in that allows you to de-identify your entire SMAT files.

    • #60519
      Chris Williams
      Participant

      For those who may not have had to get involved with de-identification, here is the list of 18 items that HIPAA says must be removed:

      1. Names.

      2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP Code, and their equivalent geographical codes, except for the initial three digits of a ZIP Code if, according to the current publicly available data from the Bureau of the Census:

      a. The geographic unit formed by combining all ZIP Codes with the same three initial digits contains more than 20,000 people.

      b. The initial three digits of a ZIP Code for all such geographic units containing 20,000 or fewer people are changed to 000.

      3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

      4. Telephone numbers.

      5. Facsimile numbers.

      6. Electronic mail addresses.

      7. Social security numbers.

      8. Medical record numbers.

      9. Health plan beneficiary numbers.

      10. Account numbers.

      11. Certificate/license numbers.

      12. Vehicle identifiers and serial numbers, including license plate numbers.

      13. Device identifiers and serial numbers.

      14. Web universal resource locators (URLs).

      15. Internet protocol (IP) address numbers.

      16. Biometric identifiers, including fingerprints and voiceprints.

      17. Full-face photographic images and any comparable images.

      18. Any other unique identifying number, characteristic, or code, unless otherwise permitted by the Privacy Rule for re-identification.
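      Items 2 and 3 are the ones that call for generalizing a value rather than simply emptying it. A minimal Python sketch of those rules follows. The helper names are assumptions, and the `SMALL_ZIP3` entry is a made-up placeholder: the real set of sparsely populated three-digit prefixes has to come from current Census Bureau data.

```python
# Hypothetical placeholder: 3-digit ZIP prefixes whose combined population is
# 20,000 or fewer. The real list must be built from current Census data.
SMALL_ZIP3 = {"999"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or 000 for sparsely populated prefixes."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in SMALL_ZIP3 else zip3


def generalize_date(hl7_date: str) -> str:
    """Keep only the year of an HL7 date or timestamp (YYYY[MMDD...])."""
    return hl7_date[:4]


def generalize_age(age_years: int) -> str:
    """Report ages of 90 and above as a single '90+' category."""
    return "90+" if age_years >= 90 else str(age_years)
```

      Note that `generalize_date` on its own is not enough for patients over 89, because the rule also excludes the year for dates that would reveal such an age.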

    • #60520
      Jim Kosloskey
      Participant

      Chris,

      Do those actually need to be removed or does encryption of those data items qualify?

      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

    • #60521
      Chris Williams
      Participant

      The word used in the HHS publication is “removed”. If I recall correctly, when compiling these messages you can generate an identification number for the de-identified message, which you could use should it be necessary to re-identify the patient at a later date, perhaps for some follow-up care. Only the compiler of the de-identified messages is permitted to know how to decode the identification.

      There is much more detailed information here:

      http://privacyruleandresearch.nih.gov/pdf/HIPAA_Privacy_Rule_Booklet.pdf

    • #60522
      Gene Salay
      Participant

      Thanks for the PDF link.

      It says the code(s) can’t be derived from the original data but have to be random. I think the “key” it refers to is not a decryption key, but more of a matching key that associates the random code with the true identity.

    • #60523
      Chris Williams
      Participant

      Gene, you’re correct. You would keep a table that associates your pseudo-identifier with a real medical record number for the data going to a particular researcher.
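      A table like that can be sketched in a few lines of Python. The class and method names are assumptions for illustration; the point is only that the pseudo-identifier comes from a random source (`secrets`) rather than being computed from the MRN, and that the lookup table never leaves the compiler of the data.

```python
import secrets


class Crosswalk:
    """Maps real MRNs to random pseudo-identifiers for one data recipient."""

    def __init__(self):
        self._by_mrn = {}
        self._by_pseudo = {}

    def pseudo_id(self, mrn: str) -> str:
        """Return the existing pseudo-identifier for mrn, or assign a new one."""
        if mrn not in self._by_mrn:
            token = secrets.token_hex(8)
            while token in self._by_pseudo:  # vanishingly rare collision
                token = secrets.token_hex(8)
            self._by_mrn[mrn] = token
            self._by_pseudo[token] = mrn
        return self._by_mrn[mrn]

    def reidentify(self, pseudo: str) -> str:
        """Look up the real MRN; only the holder of the table can do this."""
        return self._by_pseudo[pseudo]
```

      In practice the two dictionaries would live in a protected database table, with a separate table per researcher so that pseudo-identifiers cannot be matched across data sets.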

      To add to my response to Jim’s question, data encryption is really not appropriate for fields in de-identified messages. The fields should be empty. As an example, if you were to encrypt the street address, everyone at that address would still share the same string of address data. That’s a level of granularity that is not allowed.

      For everything you encrypt, there is someone who will want to decrypt it. It’s really hard to decrypt “null”.

    • #60524
      Ronald Ortiz
      Participant

      Chris Williams wrote:

      For those who may not have had to get involved with de-identification, here is the list of 18 items that HIPAA says must be removed:

      What all segments/fields apply to this?

    • #60525
      Chris Williams
      Participant

      The short answer is: all fields that contain any of the 18 items on the list. You will have to examine the message structures you are using, field by field, and compare them to the list, with particular attention to any fields that may have been hijacked and used for PHI.

      I agree with Jim K and Keith’s approach to translations where you only send specific fields that are required by the recipient. It is good to avoid using BULKCOPY and PATHCOPY and then trying to block the things you don’t want.
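      The same allowlist idea, shown outside of Xlate as a minimal Python sketch: the outbound message is built only from fields that are explicitly listed, and anything not listed never gets copied. The `ALLOWED_FIELDS` contents and the function name are hypothetical, and passing MSH through unchanged is an assumption made to keep the example short.

```python
# Hypothetical allowlist: segment ID -> field numbers the recipient needs.
ALLOWED_FIELDS = {
    "PID": {8},                  # administrative sex only
    "OBX": {1, 2, 3, 5, 6, 11},  # result fields; OBX-5 free text still needs review
}


def filter_message(message: str, sep: str = "|") -> str:
    """Keep only allowlisted fields; drop segments that are not listed at all."""
    kept = []
    for segment in message.split("\r"):
        fields = segment.split(sep)
        seg_id = fields[0]
        if seg_id == "MSH":
            kept.append(segment)  # header passed through unchanged in this sketch
            continue
        allowed = ALLOWED_FIELDS.get(seg_id)
        if allowed is None:
            continue  # segment not on the allowlist
        out = [seg_id] + [
            value if number in allowed else ""
            for number, value in enumerate(fields[1:], start=1)
        ]
        kept.append(sep.join(out).rstrip(sep))
    return "\r".join(kept)
```

      A field added to the inbound message later, or one hijacked for PHI, stays out of the outbound message until someone deliberately adds it to the list.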

    • #60526
      Ronald Ortiz
      Participant

      Chris Williams wrote:

      The short answer is: all fields that contain any of the 18 items on the list. You will have to examine the message structures you are using, field by field, and compare them to the list, with particular attention to any fields that may have been hijacked and used for PHI.

      I agree with Jim K and Keith’s approach to translations where you only send specific fields that are required by the recipient. It is good to avoid using

    • #60527
      David Barr
      Participant

      The de-identification rules are primarily used to allow use of patient data in research studies. Data doesn’t have to be de-identified before it is shared with a business partner. I’m curious why people on this board are having to de-identify data.

  • The forum ‘Tcl Library’ is closed to new topics and replies.
