regex -inline question

Clovertech Forums (Read Only Archives), Cloverleaf forum

  • #54520
    Mike Strout
    Participant

      I have a project where I need to scrape a numerical value out of a templated RAD report. The regexp works well, with one minor issue: when I use the -inline option to get the captured text, it returns too much. Consider the following…

      The block of text…

      set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size.  Increased echodensity at apex is suspicious for ~thrombus.  A VSD cannot be excluded.  The mitral valve is normal in ~structure. A study of good quality was performed."

      The regexp string

      set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

      The regexp command

      set EF [regexp -inline -- $searchString $docText]

      Basically, it looks for the phrase "ejection fraction was felt to be ##-", allowing spaces and/or a ~ between each pair of words. My problem is that in Tcl, -inline returns the following…

      $EF = {ejection fraction was felt to be 55-} 55
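A minimal sketch that reproduces this in plain tclsh (it uses puts rather than Cloverleaf's echo, and a shortened copy of the report text, both assumptions for illustration) shows how the -inline list is laid out: the overall match always comes first, followed by one element per capturing group. Note the braces around the pattern; inside double quotes Tcl would drop the backslashes before \s and \d.

```tcl
# Shortened copy of the report text from the post.
set docText "The ejection fraction was felt to be 55-60% ."

# Braces keep \s and \d intact; inside double quotes Tcl would drop the backslashes.
set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

# -inline returns a list: element 0 is the overall match,
# elements 1..n are the capturing groups in order.
set EF [regexp -inline -- $searchString $docText]
puts [llength $EF]    ;# 2
puts [lindex $EF 0]   ;# ejection fraction was felt to be 55-
puts [lindex $EF 1]   ;# 55
```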

      In my mind, it shouldn't be returning the phrase, just the 55. To make sure the regexp engine understood what I was looking for, I wrapped the sections I don't want in non-capturing groups, like so…

      (?:ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*)(\d\d\.?\d?)(?:-?%?)

      I get the same result. The ugly workaround is to take element 1 of the result with lindex, but this creates another issue. If I don't find a match, I do…

      set EF "no match found"

      Of course, when I write [lindex $EF 1] to a file, I get the number if a match is found, but if no match is found, I get "match", the second word of the sentence stored in $EF.
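The "match" result comes from lindex parsing the sentence as a whitespace-separated Tcl list. A sketch of one way around it, assuming you only want the number when regexp actually reports a match: test regexp's 1/0 return value instead of indexing into a fallback string. The pattern is shortened for illustration, and `->` is just a conventionally named variable that receives the whole match.

```tcl
# lindex parses its argument as a list, so a plain sentence is split
# on whitespace and element 1 is the second word.
set EF "no match found"
puts [lindex $EF 1]   ;# match

# regexp returns 1 on a match and 0 otherwise; use that to choose
# between the captured number and the fallback string.
set docText "The left ventricle is normal in structure."
if {[regexp -- {be\s*(\d\d)} $docText -> number]} {
    set EF $number
} else {
    set EF "no match found"
}
puts $EF              ;# no match found
```

Writing $EF itself (rather than [lindex $EF 1]) to the file then gives either the number or the whole fallback sentence.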

      So, to summarize, my questions are…

      1. How do I get regexp to return only the number, rather than both the entire matched string and the number?

      2. When assigning a sentence to a variable, how do I keep it from being interpreted as a list?
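On question 1: non-capturing groups (?:...) only stop a group from getting its own slot in the result; they never remove element 0, which is always the whole match. This sketch uses a shortened pattern (an assumption, to keep the output readable) to compare the two forms, then shows the usual alternative to -inline: pass match variables and let the first one absorb the whole match.

```tcl
set docText "The ejection fraction was felt to be 55-60% ."

# Capturing vs non-capturing group around the trailing "-?%?".
set withCapture    [regexp -inline -- {be\s*(\d\d)(-?%?)} $docText]
set withoutCapture [regexp -inline -- {be\s*(\d\d)(?:-?%?)} $docText]
puts $withCapture       ;# {be 55-} 55 -
puts $withoutCapture    ;# {be 55-} 55

# Without -inline, the first variable ("->" here) receives the whole
# match and the next one receives capturing group 1.
if {[regexp -- {be\s*(\d\d)(?:-?%?)} $docText -> EF]} {
    puts $EF            ;# 55
}
```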

      • #81819
        Charlie Bursell
        Participant

          Why are you using the -inline option? That option returns a list (the whole match first, then each capturing group), which is why you are seeing curly braces. Normally you would use -inline together with -all to get a list of all occurrences.
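To illustrate the -inline -all combination, here is a small sketch on a made-up line of text (not from the report). Without groups you get one element per match; with a group, each match contributes its whole match followed by its groups, which foreach can consume in pairs.

```tcl
# Hypothetical text containing two percentages.
set text "EF 55-60% today; prior EF 40%."

# -all -inline without groups: one element per match.
set plain [regexp -all -inline -- {\d+%} $text]
puts $plain           ;# 60% 40%

# With a capturing group, each match adds the whole match plus its group.
set grouped [regexp -all -inline -- {(\d+)%} $text]
foreach {whole number} $grouped {
    puts "$whole -> $number"
}
```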

          Since you say there may be spaces and/or a ~ between words, the regexp gets complicated. I would do it in two steps so the regular expression can stay simple:

          1. Replace each ~ with a space.

          2. Collapse repeated spaces into a single space.

          set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size.  Increased echodensity at apex is suspicious for ~thrombus.  A VSD cannot be excluded.  The mitral valve is normal in ~structure. A study of good quality was performed."

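          regsub -all -- {\s?~\s?} $docText { } docText   ;# step 1: ~ plus up to one space on each side -> one space
          regsub -all -- {\s+} $docText { } docText       ;# step 2: collapse runs of whitespace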
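For step 1 alone, `string map` is an alternative to regsub that does a literal (non-regex) replacement. This sketch, on a shortened copy of the report text, uses it for the ~ and then regsub for step 2; it is a swapped-in alternative, not what the reply itself uses.

```tcl
# Shortened copy of the report text from the post.
set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size."

# Step 1 (alternative): string map replaces every literal ~ with a space.
set docText [string map {~ " "} $docText]

# Step 2: collapse any run of whitespace into a single space.
regsub -all -- {\s+} $docText { } docText

puts $docText
```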

          set EF ""   ;# Just in case no match

          regexp -nocase -- {The ejection fraction was felt to be.*?\.} $docText EF

          echo $EF

          ==> The ejection fraction was felt to be 55-60% .
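The `set EF ""` line matters because regexp leaves its match variables untouched when there is no match. Wrapped in a hypothetical proc named firstSentence (using puts instead of echo so it runs in plain tclsh), the behavior looks like this:

```tcl
# firstSentence is a hypothetical helper used only for this illustration.
proc firstSentence {text} {
    set sentence ""   ;# regexp does not touch this variable when nothing matches
    regexp -nocase -- {The ejection fraction was felt to be.*?\.} $text sentence
    return $sentence
}

puts [firstSentence "The ejection fraction was felt to be 55-60% . Normal study."]
# The ejection fraction was felt to be 55-60% .
puts "<[firstSentence {No ejection fraction reported.}]>"   ;# <>
```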

          If you just need the number range, pass {} as the match variable so that only the captured group is stored in EF:

          regexp -nocase -- {The ejection fraction was felt to be (.*?)\s?\.} $docText {} EF

          echo $EF

          ==> 55-60%
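Putting the reply's pieces together, here is a sketch of a reusable helper (extractEF is a hypothetical name, not a Cloverleaf command) that normalizes the separators, captures only the value, and returns an empty string when nothing matches. That also sidesteps the lindex problem from the original question, since there is no fallback sentence to index into.

```tcl
# extractEF is a hypothetical helper name, not a Cloverleaf command.
proc extractEF {docText} {
    # Step 1: turn each ~ (and up to one adjacent space per side) into a space.
    regsub -all -- {\s?~\s?} $docText { } docText
    # Step 2: collapse any remaining runs of whitespace.
    regsub -all -- {\s+} $docText { } docText
    # Default that survives when regexp finds nothing.
    set EF ""
    # {} is a throwaway variable for the whole match; EF gets only the group.
    regexp -nocase -- {ejection fraction was felt to be (.*?)\s?\.} $docText {} EF
    return $EF
}

set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size.  Increased echodensity at apex is suspicious for ~thrombus."
puts [extractEF $docText]                                        ;# 55-60%
puts [extractEF "The ejection~ fraction was felt to be 60% ."]   ;# 60%
puts "<[extractEF {Normal study.}]>"                             ;# <>
```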

          I hope this helps.
