regex -inline question

This topic has 1 reply, 2 voices, and was last updated 10 years, 6 months ago by Charlie Bursell.

January 9, 2015 at 9:59 pm #54520
Mike Strout
Participant
I have a project where I am charged with scraping a numerical value out of a templated RAD report. I have the regexp working great with one minor issue. When I use the -inline option to get the captured text, it returns too much. Consider the following…

The block of text…

set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size. Increased echodensity at apex is suspicious for ~thrombus. A VSD cannot be excluded. The mitral valve is normal in ~structure. A study of good quality was performed."

The regexp string

set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

The regexp command

set EF [regexp -inline -- $searchString $docText]

Basically, it looks for the phrase “ejection fraction was felt to be ##-” with the possibility of there being spaces and/or a ~ between each word. My problem is that in Tcl, -inline returns the following…

$EF = {ejection fraction was felt to be 55-} 55

In my mind, it shouldn’t be capturing the phrase, just the 55. To try to make sure the regexp interpreter understood what I was looking for, I wrapped the sections I don’t want in non-capturing groups like so…

(?:ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*)(\d\d\.?\d?)(?:-?%?)
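A minimal sketch of why the non-capturing groups make no difference: with -inline, element 0 of the returned list is always the text matched by the whole expression, and each capturing group adds one more element. Non-capturing (?:...) groups add nothing, but they cannot remove element 0 either. The sample sentence is a shortened stand-in for the report text.

```tcl
# Shortened stand-in for the report text (an assumption, not the real template)
set text "The ejection fraction was felt to be 55-60% ."

# The pattern with the unwanted parts wrapped in non-capturing (?:...) groups
set pattern {(?:ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*)(\d\d\.?\d?)(?:-?%?)}

set result [regexp -inline -- $pattern $text]

# Element 0 is the overall match; element 1 is the single capturing group
puts [llength $result]     ;# 2
puts [lindex $result 0]    ;# ejection fraction was felt to be 55-
puts [lindex $result 1]    ;# 55
```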

I get the same result. The ugly workaround is to grab lindex 1 from the variable holding the result, but this creates another issue. If I don't find a match, I do…

set EF "no match found"

Of course, when I write [ lindex $EF 1 ] to a file, I get the number if the match is found, but if no match is found, I get “match”, the second word in the sentence stored in $EF.
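The second problem happens because lindex treats any string as a list: "no match found" is a three-element list whose element 1 is "match". A sketch that keeps the -inline call but avoids this: an empty result means no match, so check its length before indexing, and only ever use the fallback message as a plain string. The variable names follow the post; the sample text is shortened.

```tcl
set docText "The ejection fraction was felt to be 55-60% ."
set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

# Any string is also a list: element 1 of "no match found" is "match"
puts [lindex "no match found" 1]    ;# match

# -inline returns an empty list when nothing matches
set result [regexp -inline -- $searchString $docText]
if {[llength $result] > 1} {
    set EF [lindex $result 1]    ;# first capturing group only
} else {
    set EF "no match found"      ;# plain string, never passed to lindex
}
puts $EF    ;# 55
```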

So, to summarize, my questions are…

1. How do I get my regexp to capture only the number, not both the entire string and the number?

2. When assigning a sentence to a variable, how do I keep it from being interpreted as a list?
- January 10, 2015 at 4:14 am #81819
  Charlie Bursell
  Participant
  Why are you using the -inline option? That option returns a list, which is why you are seeing curly braces. Normally you would use -inline together with -all to get a list of all occurrences.
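  A minimal sketch of the usual alternative without -inline: regexp returns 1 or 0, and the optional variable names after the string receive the overall match and then each capturing group in order. The name -> is just an ordinary variable used by convention as a throwaway for the overall match, so EF ends up holding only the number. The sample text is shortened.

  ```tcl
  set docText "The ejection fraction was felt to be 55-60% ."
  set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

  # regexp returns 1 on a match and 0 otherwise;
  # "->" receives the whole match (ignored), EF receives group 1
  if {[regexp -- $searchString $docText -> EF]} {
      puts "EF = $EF"          ;# EF = 55
  } else {
      set EF "no match found"
  }
  ```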
  
  Since you say there may be spaces and/or ~ between words, the regexp gets more complicated. I would do it in two steps to simplify the regular expression.
  
  1. Replace every ~ with a space.
  
  2. Collapse double spaces into a single space.
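  The two steps above can be sketched with string map for step 1 and a regsub that collapses any run of spaces for step 2. This is an illustrative alternative to the single regsub used below, and the sample text (with an extra ~ added) is an assumption.

  ```tcl
  set docText "Interpretation Summary:~~The ejection ~fraction was felt to be 55-60% ."

  # Step 1: turn every ~ into a space
  set docText [string map {~ " "} $docText]

  # Step 2: collapse any run of two or more spaces into one
  regsub -all { {2,}} $docText { } docText

  puts $docText
  # Interpretation Summary: The ejection fraction was felt to be 55-60% .
  ```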
  
  set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size. Increased echodensity at apex is suspicious for ~thrombus. A VSD cannot be excluded. The mitral valve is normal in ~structure. A study of good quality was performed."
  
  regsub -all -- {\s?~\s?} $docText { } docText
  
  set EF "" ;# Just in case no match
  
  regexp -nocase -- {The ejection fraction was felt to be.*?\.} $docText EF
  
  echo $EF
  
  ==> The ejection fraction was felt to be 55-60% .
  
  If you just need the number range:
  
  regexp -nocase -- {The ejection fraction was felt to be (.*?)\s?\.} $docText {} EF
  
  echo $EF
  
  ==> 55-60%
  
  I hope this helps.

The forum ‘Cloverleaf’ is closed to new topics and replies.