De-identification

Clovertech Forums Read Only Archives Cloverleaf Tcl Library De-identification

  • Creator
    Topic
  • #49036
    Bill May
    Participant

      Does anyone have experience with, or can anyone recommend, tools for de-identifying HL7 messages?

      Ta

    Viewing 12 reply threads
    • Author
      Replies
      • #60515
        Keith McLeod
        Participant

          Wouldn’t you use the Xlate tools provided? Remove or replace all personally identifiable information in the message. This generally means it can never be re-identified. Replace the name with a common John/Jane Doe or another non-meaningful name. Remove all other information that could be identifying. Or I guess the real question is what information they need and whether it is considered identifiable. Only send what is needed.

        • #60516
          Richard Hart
          Participant

            Bill.

            I have written some code based on the WWII Enigma wheel code generator.

            Someone on Clovertech gave me their algorithm, and I have based my code on it.

            Our scrambling project was canned, so it is only used by us.

            As usual, it is based on an ‘init’ key.

            If this is always the same, the scrambling will return the same value for a given string, so all messages for ‘Joe Smith’ will be transformed to ‘Eoj Htims’ (as an example).

            This will not be a turn-key solution, as the Tcl fits in with our Tcl translations, use of namespaces, etc., but you should be able to modify it quite easily.
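            To illustrate the property described above (the same key always gives the same output for a given string), here is a minimal Python sketch. It is not Richard's Enigma-wheel code, which was not posted in this thread; it swaps in a keyed HMAC as the source of the substitution, and the function name `scramble` and the key value are assumptions made for the example. Unlike an Enigma-style cipher, this sketch is one-way and cannot be reversed with the key.

```python
import hashlib
import hmac
import string


def scramble(text: str, key: bytes) -> str:
    """Return a same-length scrambled copy of text, stable for a given key."""
    out = []
    for i, ch in enumerate(text):
        # Keyed byte that depends on the whole input string and the position.
        mac = hmac.new(key, f"{i}:{text}".encode("utf-8"), hashlib.sha256)
        n = mac.digest()[0]
        if ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[n % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[n % 26])
        elif ch in string.digits:
            out.append(string.digits[n % 10])
        else:
            out.append(ch)  # keep spaces and punctuation as-is
    return "".join(out)


# Hypothetical init key; the same key must be used for every message.
INIT_KEY = b"example-init-key"
print(scramble("Joe Smith", INIT_KEY))
print(scramble("Joe Smith", INIT_KEY))  # same value as the line above
```

            The input is matched exactly, so 'JOE SMITH' and 'Joe Smith' scramble differently unless the text is normalized first. The output is also derived from the original data, which matters for the HIPAA discussion later in this thread.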

          • #60517
            Steve Drozdowski
            Participant

              Hi Richard, would you be willing to share this code?

              Thanks.

            • #60518
              Ronald Ortiz
              Participant

                HL7Spy has a feature built in that allows you to de-identify your entire SMAT files.

              • #60519
                Chris Williams
                Participant

                  For those who may not have had to get involved with de-identification, here is the list of 18 items that HIPAA says must be removed:

                  1. Names.

                  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP Code, and their equivalent geographical codes, except for the initial three digits of a ZIP Code if, according to the current publicly available data from the Bureau of the Census:

                  a. The geographic unit formed by combining all ZIP Codes with the same three initial digits contains more than 20,000 people.

                  b. The initial three digits of a ZIP Code for all such geographic units containing 20,000 or fewer people are changed to 000.

                  3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

                  4. Telephone numbers.

                  5. Facsimile numbers.

                  6. Electronic mail addresses.

                  7. Social security numbers.

                  8. Medical record numbers.

                  9. Health plan beneficiary numbers.

                  10. Account numbers.

                  11. Certificate/license numbers.

                  12. Vehicle identifiers and serial numbers, including license plate numbers.

                  13. Device identifiers and serial numbers.

                  14. Web universal resource locators (URLs).

                  15. Internet protocol (IP) address numbers.

                  16. Biometric identifiers, including fingerprints and voiceprints.

                  17. Full-face photographic images and any comparable images.

                  18. Any other unique identifying number, characteristic, or code, unless otherwise permitted by the Privacy Rule for re-identification.
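                  Items 2 and 3 are the only ones that allow a generalized value instead of outright removal, so a small sketch may help. The Python below is an illustration only: the function names are made up for this example, and the set of low-population ZIP prefixes is a placeholder that would have to be built from current Census data. It does not handle the extra rule that years indicating an age over 89 must also be removed.

```python
def generalize_zip(zip_code: str, low_population_prefixes: set) -> str:
    """Keep only the first three ZIP digits, or 000 for small areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in low_population_prefixes else prefix


def year_only(hl7_date: str) -> str:
    """Reduce an HL7 date such as YYYYMMDD[HHMM] to the year."""
    return hl7_date[:4]


def age_bucket(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


# Placeholder value only; the real list must come from Census data.
LOW_POPULATION_PREFIXES = {"036"}

print(generalize_zip("90210-1234", LOW_POPULATION_PREFIXES))
print(generalize_zip("03601", LOW_POPULATION_PREFIXES))
print(year_only("19800315"))
print(age_bucket(93))
```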

                • #60520
                  Jim Kosloskey
                  Participant

                    Chris,

                    Do those actually need to be removed or does encryption of those data items qualify?

                    email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                  • #60521
                    Chris Williams
                    Participant

                      The word used in the HHS publication is “removed”. If I recall correctly, when compiling these messages you can generate an identification number for use in the de-identified message, which you could use should it become necessary to re-identify the patient at a later date, perhaps for some follow-up care. Only the compiler of the de-identified messages is permitted to know how to decode the identification.

                      There is much more detailed information here:

                      http://privacyruleandresearch.nih.gov/pdf/HIPAA_Privacy_Rule_Booklet.pdf

                    • #60522
                      Gene Salay
                      Participant

                        Thanks for the PDF link.

                        It says the code(s) can’t be derived from the original data but must be random. I think the “key” it refers to is not a decryption key, but more of a matching key that associates the random code with the true identity.

                      • #60523
                        Chris Williams
                        Participant

                          Gene, you’re correct. You would keep a table that associates your pseudo-identifier with a real medical record number for the data going to a particular researcher.

                          To add to my response to Jim’s question, data encryption is really not appropriate for fields in de-identified messages. The fields should be empty. For example, if you encrypted the street address, everyone at that address would still share the same string of address data. That’s a level of granularity that is not allowed.

                          For everything you encrypt, there is someone who will want to decrypt it. It’s really hard to decrypt “null”.
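                          The table Chris and Gene describe can be sketched as a simple crosswalk: a random code is generated per medical record number and stored, rather than being computed from the MRN. The Python below is a minimal illustration under that assumption; the class and method names are invented for the example, and a real table would be persisted and access-controlled, with one table per recipient.

```python
import secrets


class Crosswalk:
    """Maps real MRNs to random pseudo-identifiers and back."""

    def __init__(self):
        self._by_mrn = {}
        self._by_pseudo = {}

    def pseudo_id(self, mrn: str) -> str:
        """Return the stored code for mrn, creating a random one if needed."""
        if mrn not in self._by_mrn:
            pseudo = secrets.token_hex(8)  # random, not derived from the MRN
            while pseudo in self._by_pseudo:
                pseudo = secrets.token_hex(8)
            self._by_mrn[mrn] = pseudo
            self._by_pseudo[pseudo] = mrn
        return self._by_mrn[mrn]

    def reidentify(self, pseudo: str) -> str:
        """Look up the real MRN; only the table holder can do this."""
        return self._by_pseudo[pseudo]


table = Crosswalk()
code = table.pseudo_id("MRN123")
print(code == table.pseudo_id("MRN123"))  # True: stable per patient
print(table.reidentify(code))             # MRN123
```

                          Because the code is random, someone without the table has nothing to decrypt; re-identification is only possible by lookup.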

                        • #60524
                          Ronald Ortiz
                          Participant

                            Which segments and fields do the 18 items on Chris’s list apply to?

                          • #60525
                            Chris Williams
                            Participant

                              The short answer is: all fields that contain any of the 18 items on the list. You will have to examine the message structures you are using, field by field, and compare them to the list, with particular attention to any fields that may have been hijacked and used for PHI.

                              I agree with Jim K and Keith’s approach to translations, where you send only the specific fields that are required by the recipient. It is good to avoid using BULKCOPY and PATHCOPY and then trying to block the things you don’t want.
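                              The difference between the two approaches can be shown outside of Xlate with a short Python sketch. It builds each output segment from an allowlist of fields rather than copying everything and blanking some of it, so a field nobody thought about is dropped by default. The field choices in ALLOWED_FIELDS and the function names are hypothetical; a real list comes from what the recipient needs, and free-text fields such as OBX-5 still have to be checked for PHI.

```python
# Hypothetical allowlist of HL7 field numbers the recipient needs.
ALLOWED_FIELDS = {
    "MSH": {2, 9, 10, 11, 12},
    "PID": {8},  # administrative sex only
    "OBX": {1, 2, 3, 5, 6, 11},
}


def field_number(segment_id: str, index: int) -> int:
    """Map a position in the '|'-split list to its HL7 field number."""
    # MSH-1 is the field separator itself, so MSH positions shift by one.
    return index + 1 if segment_id == "MSH" else index


def allowlist_segment(segment: str):
    """Return the segment with only allowed fields, or None to drop it."""
    parts = segment.split("|")
    seg_id = parts[0]
    allowed = ALLOWED_FIELDS.get(seg_id)
    if allowed is None:
        return None  # segment type is not needed at all
    out = [seg_id]
    for i, value in enumerate(parts[1:], start=1):
        out.append(value if field_number(seg_id, i) in allowed else "")
    return "|".join(out).rstrip("|")


def allowlist_message(message: str) -> str:
    """Apply the allowlist to every carriage-return-separated segment."""
    kept = [allowlist_segment(s) for s in message.split("\r") if s]
    return "\r".join(s for s in kept if s is not None)
```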

                              • #60527
                                David Barr
                                Participant

                                  The de-identification rules are primarily used to allow use of patient data in research studies. Data doesn’t have to be de-identified before it is shared with a business partner. I’m curious why people on this board are having to de-identify data.
