please help with regular expression!


  • Creator
    Topic
  • #50349
    Sergey Sevastyanov
    Participant

      Hi

      I’m having trouble with my regex. Even though I specify a non-greedy pattern, it behaves as greedy.

      When I run exactly the same regex in Perl it works as I expect. As this regex will be part of a translation in Cloverleaf, I need it to work in Tcl. Please help.

      Here is my Tcl code. In the first regexp I just get rid of the “***ADDENDUM***” lines, as I don’t need them. That regexp works just fine.

      The text in $obx5 is separated by headers into different parts. Each header is a string that looks like this:

      *------------ header_type ------------*

      The number of dashes in the header varies from 12 to 20 depending on the header type.

      I need to get the header type in one variable ($headertype) and the corresponding text in another ($body).

      The second regexp behaves like a greedy one: it sets $body to everything that follows the first header, up to the very last header.

      Code:


      set obx5 {*------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT'S LEFT CHEEK.~~*-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~***ADDENDUM***~~*------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~*-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54}

      # remove ***ADDENDUM*** lines
      regsub -all -- {~~\*\*\*ADDENDUM\*\*\*(~~\*-{12,14} (Addendum to Report|Changes to Impression) -{12,14}\*)} $obx5 {\1} obx5

      # the following regex is where I am having a trouble
      # I want to get type of header in $headertype and text following
      # the header up to the next header in $body

      regexp -- {\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*} $obx5 -> headertype body ;

      Here is the Perl code that uses exactly the same regular expression and works:

      Code:


      use strict;

      # q{} is used instead of single quotes because the text contains an apostrophe
      my $obx5 = q{*------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT'S LEFT CHEEK.~~*-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~***ADDENDUM***~~*------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~*-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54};

      # remove ***ADDENDUM*** lines
      $_ = $obx5;
      s/~~\*\*\*ADDENDUM\*\*\*(~~\*-{12,14} (Addendum to Report|Changes to Impression) -{12,14}\*)/$1/g;

      # get header type in $1 and body in $2
      m/\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*/;

    Viewing 4 reply threads
    • Author
      Replies
      • #65715
        Bala Pisupati
        Participant

          Instead of assigning to $obx5, can you try assigning it to a variable, say $var1, and then assigning that variable to $obx5?

        • #65716
          Sergey Sevastyanov
          Participant

            Thank you, Bala, but that’s not what I was asking, and I didn’t have problems with assigning a value to the variable $obx5. In fact, I put it there just so that the variable has some text in it. In real life that value is extracted from the HL7 message.

          • #65717
            Sergey Sevastyanov
            Participant

              OK, I figured it out.

              This is the behavior that Brent B. Welch and Ken Jones describe in the book “Practical Programming in Tcl and Tk”, in the section about bugs when mixing greedy and non-greedy quantifiers. The patterns “-{12,20}” are greedy quantifiers, while (.*?) is non-greedy. With the two kinds mixed, Tcl treated the whole expression as greedy. When I changed all quantifiers to be non-greedy, i.e. “-{12,20}?”, everything started working.

              So the correct regexp is:

              Code:


              regexp -- {\*-{12,20}? (Impression) -{12,20}?\*(.*?)\*-{12,20}?[^-]*?-{12,20}?\*} $obx5 -> headertype body ;
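
For readers who want to reproduce the effect on a smaller input, here is a minimal sketch. The three-section `text` value is a made-up stand-in for the real OBX-5 content, and the names `mixed`, `lazy`, `greedyBody` and `lazyBody` are illustrative only. As far as I understand Tcl's re_syntax documentation, a branch takes the greedy or non-greedy preference of its first quantified atom that has one, which would explain why the leading greedy `-{12,20}` overrides the later `(.*?)`.

```tcl
# Hypothetical three-section sample with 12 dashes on each side of every header.
set text {*------------ Impression ------------*~one~~*------------ Report ------------*~two~~*------------ Addendum to Report ------------*~three}

# Mixed quantifiers: the first quantifier is greedy, so the whole match prefers the longest result.
set mixed {\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*}

# All quantifiers non-greedy: the whole match prefers the shortest result.
set lazy {\*-{12,20}? (Impression) -{12,20}?\*(.*?)\*-{12,20}?[^-]*?-{12,20}?\*}

regexp -- $mixed $text -> headertype greedyBody
regexp -- $lazy $text -> headertype lazyBody

puts "mixed: $greedyBody"
puts "lazy : $lazyBody"
```

If the engine behaves as described in this thread, the first line should run on through the Report section up to the last header, while the second should stop at the Report header.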

            • #65718
              Nathan Martin
              Participant

                You may also want to try something like this…

                Code:

                regsub -all {(\*{1,1}?-{12,20} (.+?) -{12,20}\*)} $obx5 "\n\\2\n\\1" obx5

                foreach {headertype body} [split [string trim $obx5 "\n"] "\n"] {
                    puts "headertype: $headertype"
                    puts "body      : $body"
                    puts ""
                }

                Results in:

                Code:

                headertype: Impression
                body      : *------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT'S LEFT CHEEK.~~

                headertype: Report
                body      : *-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~

                headertype: Addendum to Report
                body      : *-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~

                headertype: Addendum to Report
                body      : *-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~

                headertype: Changes to Impression
                body      : *------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~

                headertype: Addendum to Report
                body      : *-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54
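
If the header line itself is not wanted inside the body, the same idea can probably be adapted by leaving the full header match out of the replacement. The following is only a sketch: the `splitSections` helper and the shortened `sample` text are hypothetical, and it assumes the report text never contains a real newline, because a newline serves as the temporary delimiter.

```tcl
# Hypothetical helper: returns a flat list of the form type body type body ...
proc splitSections {text} {
    # The non-greedy quantifier on the leading asterisk comes first,
    # so the whole expression prefers the shortest match.
    # Each header is replaced by: newline, header type, newline.
    regsub -all {\*{1,1}?-{12,20} (.+?) -{12,20}\*} $text "\n\\1\n" marked
    # Drop the leading delimiter, then split into alternating type and body elements.
    return [split [string trim $marked "\n"] "\n"]
}

# Shortened, made-up sample in the same layout as the report text.
set sample {*------------ Impression ------------*~IMPRESSION:~1. NO CHANGE.~~*-------------- Report --------------*~~Reason for Exam:  COUGH~~}

foreach {headertype body} [splitSections $sample] {
    puts "headertype: $headertype"
    puts "body      : $body"
    puts ""
}
```

Returning a list keeps the caller free to loop over it with `foreach` as above or to load it into an array with `array set`, although the latter would keep only the last body for header types that repeat, such as Addendum to Report.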

              • #65719
                Sergey Sevastyanov
                Participant

                  Nathan,

                  That’s great! I already did it differently, but I like your idea for its simplicity and effectiveness.

                  Thank you very much for your help!
