please help with regular expression!

Clovertech Forums › Read Only Archives › Cloverleaf

  • #50349
    Sergey Sevastyanov
    Participant

    Hi

    I’m having trouble with my regex. Even though I specify a non-greedy pattern, it behaves as a greedy one.

    When I run exactly the same regex in Perl, it works as I expect. Since this regex will be part of a translation in Cloverleaf, I need it to work in Tcl. Please help.

    Here is my Tcl code. In the first regsub I just get rid of the “***ADDENDUM***” lines, as I don’t need them. That regsub works just fine.

    The text in $obx5 is separated by headers into different parts. Each header is a string that looks like this:

    *------------ header_type ------------*

    The number of dashes in a header ranges from 12 to 20, depending on the header type.
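    As a quick check of that header layout, the sketch below pulls every header type out of a string with `regexp -all -inline`. It assumes, as in the sample data, that a header type never contains a dash; the sample string and the variable names are made up for illustration.

```tcl
# Shortened, made-up sample in the same layout as $obx5 (12-dash and 14-dash headers)
set sample {*------------ Impression ------------*~text one~~*-------------- Addendum to Report --------------*~text two}

# [^-]+ keeps the captured header type from running across dashes into the next header
set headerRE {\*-{12,20} ([^-]+) -{12,20}\*}

# -all -inline returns, for every match, the whole match followed by each capture
set types {}
foreach {whole type} [regexp -all -inline -- $headerRE $sample] {
    lappend types $type
}
puts $types
```

    Restricting the header type to `[^-]+` sidesteps the greedy/non-greedy question entirely, because each match can only ever span a single header.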

    I need to get the header type into one variable ($headertype) and the corresponding text into another ($body).

    The second regexp behaves like a greedy one and sets $body to all of the text that follows the first header, up to the very last header.
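    The behaviour described above can be reproduced with a much smaller pattern. Tcl's `re_syntax` manual page says that a branch takes its preference (longest or shortest match) from the first quantified atom that has one; a sub-pattern such as `(.*?)` then only gets the shortest match that is still consistent with that overall choice. A minimal sketch, using a made-up string:

```tcl
set text "aXbYb"

# The first quantifier, a+, is greedy, so the whole RE prefers the longest match;
# (.*?) cannot stop at the first b because the overall match runs to the last b.
regexp {a+(.*?)b} $text whole1 inner1
puts "greedy first:     whole=$whole1 inner=$inner1"

# The first quantifier, a+?, is non-greedy, so the whole RE prefers the shortest match.
regexp {a+?(.*?)b} $text whole2 inner2
puts "non-greedy first: whole=$whole2 inner=$inner2"
```

    Under those rules the first `regexp` should set `inner1` to `XbY` and the second should set `inner2` to `X`. Perl's backtracking engine applies each quantifier's greediness on its own, which is why the same pattern behaves as expected there.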

    Code:


    set obx5 {*------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT’S LEFT CHEEK.~~*-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~***ADDENDUM***~~*------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~*-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54}

    # remove ***ADDENDUM*** lines, keeping the header that follows each one
    regsub -all -- {~~\*\*\*ADDENDUM\*\*\*(~~\*-{12,14} (Addendum to Report|Changes to Impression) -{12,14}\*)} $obx5 {\1} obx5

    # the following regex is where I am having trouble
    # I want the header type in $headertype and the text following
    # the header, up to the next header, in $body

    regexp -- {\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*} $obx5 -> headertype body

    Here is Perl code that uses exactly the same regular expression and works:

    Code:


    use strict;

    my $obx5 = '*------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT’S LEFT CHEEK.~~*-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~***ADDENDUM***~~*------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~*-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54';

    # remove ***ADDENDUM*** lines
    $_ = $obx5;
    s/~~\*\*\*ADDENDUM\*\*\*(~~\*-{12,14} (Addendum to Report|Changes to Impression) -{12,14}\*)/$1/g;

    # get header type in $1 and body in $2
    m/\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*/;
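    The ***ADDENDUM*** `regsub` from the Tcl code above can be checked on its own. In this sketch, run on a shortened excerpt of the sample text, the capture group keeps the header while the `~~***ADDENDUM***` marker in front of it is dropped:

```tcl
# Shortened excerpt of the sample: one ***ADDENDUM*** marker in front of a 14-dash header
set input {Date:  09/22/2008 14:29~~***ADDENDUM***~~*-------------- Addendum to Report --------------*~~CT NECK}

# The braced replacement {\1} puts back the captured "~~*--- header ---*" part,
# so only the "~~***ADDENDUM***" marker disappears
regsub -all -- {~~\*\*\*ADDENDUM\*\*\*(~~\*-{12,14} (Addendum to Report|Changes to Impression) -{12,14}\*)} $input {\1} cleaned

puts $cleaned
```

    Keeping the replacement in braces matters: in double quotes, Tcl would turn `\1` into a control character before `regsub` ever saw it.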

    • #65715
      Bala Pisupati
      Participant

      Instead of assigning the text to $obx5 directly, can you try assigning it to another variable, say $var1, and then assigning that variable to $obx5?

    • #65716
      Sergey Sevastyanov
      Participant

      Thank you, Bala, but that’s not what I was asking; I didn’t have any problem assigning a value to $obx5. I only set it there so the variable would contain sample text. In real life that value is extracted from the HL7 message.

    • #65717
      Sergey Sevastyanov
      Participant

      OK, I figured it out.

      It was what Brent B. Welch and Ken Jones describe in the book “Practical Programming in Tcl and Tk”, in the section about bugs when mixing greedy and non-greedy quantifiers. The “-{12,20}” patterns are greedy quantifiers, while (.*?) is non-greedy. With the quantifiers mixed, Tcl chose to be greedy, because the first quantifier in the pattern is greedy and that sets the preference for the whole match. When I changed all quantifiers to be non-greedy, i.e. “-{12,20}?”, everything started working.

      So the correct regexp is:

      Code:


      regexp -- {\*-{12,20}? (Impression) -{12,20}?\*(.*?)\*-{12,20}?[^-]*?-{12,20}?\*} $obx5 -> headertype body
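      A sketch of the corrected pattern against a shortened, made-up message in the same layout (an 18-dash Impression header, a 20-dash Report header and a 14-dash addendum header). The original greedy pattern runs on the same text for comparison; only the quantifiers differ.

```tcl
# Shortened, made-up sample in the same layout as the real OBX-5 text
set obx5 {*------------------ Impression ------------------*~IMPRESSION:~1. FIRST.~~*-------------------- Report --------------------*~~FINDINGS: NORMAL.~~*-------------- Addendum to Report --------------*~~CT NECK}

# All quantifiers non-greedy: the whole RE prefers the shortest match,
# so (.*?) stops at the next header
if {[regexp -- {\*-{12,20}? (Impression) -{12,20}?\*(.*?)\*-{12,20}?[^-]*?-{12,20}?\*} $obx5 -> headertype body]} {
    puts "headertype: $headertype"
    puts "body      : $body"
}

# Original pattern: the leading greedy -{12,20} makes the whole RE prefer the longest
# match, so the body runs on to the last header
regexp -- {\*-{12,20} (Impression) -{12,20}\*(.*?)\*-{12,20}[^-]*?-{12,20}\*} $obx5 -> greedyType greedyBody
puts "greedy body: $greedyBody"
```

      Under the `re_syntax` rules, `$body` should stop at the Report header, while `$greedyBody` still contains the Report section, which is the behaviour seen with the full message.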

    • #65718
      Nathan Martin
      Participant

      You may also want to try something like this…

      Code:

      regsub -all {(\*{1,1}?-{12,20} (.+?) -{12,20}\*)} $obx5 "\n\\2\n\\1" obx5

      foreach {headertype body} [split [string trim $obx5 "\n"] "\n"] {
          puts "headertype: $headertype"
          puts "body      : $body"
          puts ""
      }

      Results in:

      Code:

      headertype: Impression
      body      : *------------------ Impression ------------------*~IMPRESSION:~1. UNCHANGED APPEARANCE OF SUBMENTAL LYMPH NODE.~2. NO EVIDENCE OF STONE.  NO ABNORMALITY SEEN UNDER THE BARIUM MARKER PLACED ON~    THE PATIENT’S LEFT CHEEK.~~

      headertype: Report
      body      : *-------------------- Report --------------------*~~Reason for Exam:  COUGH (786.20)~~NECK WITHOUT CONTRAST~~COMPARISON:  CT of the neck 02/16/2007, CT neck 06/10/2005.~~FINDINGS:  Again seen is an approximately 1.2 cm submental node, not~significantly changed in appearance compared to studies dating back to~06/10/2005.  No new lymphadenopathy identified within the neck.~~Barium marker placed on the left cheek is seen, just superior to an accessory~parotid gland.  Parotid glands appear symmetric.  There is no evidence of ~stone.~This appearance is also unchanged from prior studies dating back to 2005.~~Limited views of the brain are grossly unremarkable.~~Visualized lung apices appear clear.~~Scattered mucous retention cysts seen in the maxillary sinuses bilaterally.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/22/2008 14:29~~

      headertype: Addendum to Report
      body      : *-------------- Addendum to Report --------------*~~CT NECK WO CONTRAST~~This is a test addendum with no changes to impression.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 10:11~~

      headertype: Addendum to Report
      body      : *-------------- Addendum to Report --------------*~~CT NECK~~This is the 2nd addendum for this report.~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/23/2008 12:45~~

      headertype: Changes to Impression
      body      : *------------ Changes to Impression -------------*~IMPRESSION:  A CT NECK WITH CONTRAST IS RECOMMENDED TO BE PERFORMED AS SOON AS~POSSIBLE.~~

      headertype: Addendum to Report
      body      : *-------------- Addendum to Report --------------*~~CT NECK NI~~~Electronically Signed By: RALPH JENSEN, MD~Date:  09/24/2008 07:54
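      Nathan's pattern starts with `\*{1,1}?`, which looks redundant but matters in Tcl: according to the `re_syntax` manual page, a `{m,n}?` quantifier with m equal to n prefers the shortest match, while plain `{m}?` has no preference of its own. Putting it first makes the whole RE non-greedy, so `(.+?)` stops at the end of each header. The sketch below compares it with a plain leading `\*` on a made-up string; the `headerTypes` helper is hypothetical, written only for this comparison.

```tcl
set sample {*------------ A ------------*~x~*------------ B ------------*~y}

# Hypothetical helper: collect the (.+?) capture from every match of a pattern
proc headerTypes {pattern text} {
    set types {}
    foreach {whole type} [regexp -all -inline -- $pattern $text] {
        lappend types $type
    }
    return $types
}

# A plain leading \* has no preference, so the greedy -{12,20} decides: longest match overall
set plainTypes [headerTypes {\*-{12,20} (.+?) -{12,20}\*} $sample]

# A leading \*{1,1}? prefers the shortest match, so the whole RE becomes non-greedy
set lazyTypes [headerTypes {\*{1,1}?-{12,20} (.+?) -{12,20}\*} $sample]

puts "plain: $plainTypes"
puts "lazy : $lazyTypes"
```

      With the plain pattern the single match should swallow both headers, while the `\*{1,1}?` version should report `A` and `B` separately.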

    • #65719
      Sergey Sevastyanov
      Participant

      Nathan,

      That’s great! I already did it differently, but I like your idea for its simplicity and effectiveness.

      Thank you very much for your help!

  • The forum ‘Cloverleaf’ is closed to new topics and replies.
