The block of text…
set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal in ~structure and size. Increased echodensity at apex is suspicious for ~thrombus. A VSD cannot be excluded. The mitral valve is normal in ~structure. A study of good quality was performed."
The regexp string
set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}
The regexp command
set EF [ regexp -inline -- $searchString $docText ]
Basically, it looks for the phrase “ejection fraction was felt to be ##-” with the possibility of there being spaces and/or a ~ between each word. My problem is that in Tcl, -inline returns the following…
$EF={ ejection fraction was felt to be 55- } 55
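For anyone reproducing this, here is a minimal sketch of the result shape. It assumes a shortened stand-in for the report text, and the variable names `result`, `wholeMatch`, and `efValue` are hypothetical, chosen only for illustration. As documented for `regexp`, `-inline` returns the whole match as the first list element and each parenthesized group after it; the form without `-inline` stores those pieces in named variables instead.

```tcl
# Shortened stand-in for the report text (assumption: same shape as docText above).
set docText "Interpretation Summary:~~The ejection fraction was felt to be 55-60% . The left ventricle is normal."

# Braces keep Tcl from rewriting \s and \d before the regexp engine sees them.
set searchString {ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*(\d\d\.?\d?)-?%?}

# -inline returns a list: element 0 is the whole match, element 1 is the first ( ) group.
set result [regexp -inline -- $searchString $docText]
puts [llength $result]
puts [lindex $result 0]
puts [lindex $result 1]

# Without -inline, regexp returns 1 or 0 and fills the named variables.
# wholeMatch and efValue are hypothetical names used only in this sketch.
set found [regexp -- $searchString $docText wholeMatch efValue]
puts "$found $efValue"
```

With the second form, the return value doubles as the "was there a match" test, so no list indexing is needed.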
In my mind, it shouldn’t be capturing the phrase, just the 55. To try to make sure the regexp interpreter understood what I was looking for, I wrapped the sections I don’t want in non-capturing groups like so…
(?:ejection\s?~?\s*fraction\s?~?\s*was\s?~?\s*felt\s?~?\s*to\s?~?\s*be\s?~?\s*)(\d\d\.?\d?)(?:-?%?)
I get the same result. The ugly workaround is to grab element 1 (with lindex) from the variable holding the result, but this creates another issue. If I don’t find a match, I do…
set EF "no match found"
Of course, when I write [ lindex $EF 1 ] to a file, I get the number if the match is found, but if no match is found, I get “match”, the second word in the sentence stored in $EF.
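A minimal sketch of why this happens, and of one possible guard. It assumes the `-inline` result is kept in a separate variable (the names `result` and `efValue` are hypothetical). In Tcl, any string of whitespace-separated words is also a valid list, so `lindex` treats the sentence as a three-element list.

```tcl
# A plain sentence is also a valid three-element list as far as lindex is concerned.
set EF "no match found"
puts [llength $EF]     ;# 3
puts [lindex $EF 1]    ;# match

# Hypothetical guard: check for a match first, and only index the list when there is one.
# The empty list below stands in for what [regexp -inline ...] returns on no match.
set result {}
if {[llength $result] == 0} {
    set efValue "no match found"
} else {
    set efValue [lindex $result 1]
}
puts $efValue
```

Keeping the list result and the value written to the file in separate variables means `lindex` is never applied to the fallback sentence.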
So, to summarize, my questions are…
1. How do I get my regexp to return only the number, rather than both the entire matched string and the number?
2. When assigning a sentence to a variable, how do I keep it from being interpreted as a list?