Does TCL regexp "non-capturing" syntax work?

Clovertech Forums > Cloverleaf

  • #116327
    Peter Heggie
    Participant

    Given this string:

    set s1 "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"

    execute the regexp:

    regexp {(?:.*Practice.*)\(.*\)(?:\</item\>)} "$s1" match

    result is 1

    examine the match contents:

    echo $match

    result:

    <item>Provider/Practice : UHCC (UHCC Adult Care)</item>

    The non-capturing tokens should remove all but “(UHCC Adult Care)”, but I’m getting everything. How can I use non-capturing syntax in regexp?

     

    Peter Heggie

Viewing 7 reply threads
    • #116328
      Charlie Bursell
      Participant

      You are using positive look-ahead where you should be using negative (?!:)

      The following returns:  (UHCC Adult Care)

      regexp {(?!:.*Practice.*)\(.*\)(?!:\</item\>)} "$s1" match

      You could remove the parentheses.
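A minimal sketch in plain Tcl (using puts rather than the echo shown in the thread) of what these two patterns actually do. In Tcl's regular-expression syntax, (?:...) is a non-capturing group, not a look-ahead: it still consumes text, so that text stays in the overall match. (?!:...) is a negative look-ahead whose body starts with a literal colon. At the only position where \( can match, neither look-ahead can fail, so the result comes from \(.*\) alone. The variable names whole, viaLookahead and plain are just for illustration.

```tcl
set s1 {<item>Provider/Practice : UHCC (UHCC Adult Care)</item>}

# (?:...) only suppresses the numbered submatch; the text it matches is
# still part of the overall match, so 'whole' gets the entire string.
# (< and > need no backslash in Tcl regexps.)
regexp {(?:.*Practice.*)\(.*\)(?:</item>)} $s1 whole

# (?!:...) is a negative look-ahead for a literal ":" followed by more text.
# At the "(" where the match starts, the next character is not ":", so both
# look-aheads succeed and the match is decided by \(.*\) alone.
regexp {(?!:.*Practice.*)\(.*\)(?!:</item>)} $s1 viaLookahead
regexp {\(.*\)} $s1 plain

puts $whole
puts $viaLookahead
puts $plain
```

Dropping the look-aheads gives the same answer, which is why the simpler patterns later in the thread are easier to maintain.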

      Look-ahead in regular expressions is tricky. If you want just “UHCC Adult Care” from this string, I would write it like this (assuming you only want what is inside the parentheses):

      regexp -- {\((.*?)\)} $s1 {} match

      Note that the {} in the match-variable position throws away the full match, while match will contain UHCC Adult Care.
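The same extraction as a self-contained sketch, plus a second form that uses regexp's -inline option. With -inline, regexp returns the full match followed by each captured group as a list instead of setting variables. The variable names inner and inlineResult are illustrative.

```tcl
set s1 {<item>Provider/Practice : UHCC (UHCC Adult Care)</item>}

# {} as the match variable discards the full match "(UHCC Adult Care)";
# the next variable receives capture group 1.
regexp -- {\((.*?)\)} $s1 {} inner
puts $inner

# -inline returns {fullMatch group1 ...}; element 1 is the first group.
set inlineResult [lindex [regexp -inline -- {\((.*?)\)} $s1] 1]
puts $inlineResult
```

The variable form is handy when you also want the return code (1 or 0). The -inline form fits naturally inside another command.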

      I use look-ahead only when forced to :=)

    • #116329
      Peter Heggie
      Participant

      Thank you! You are right, I do not know how to use look-ahead. I will use your syntax. 🙂

      I usually just muddle through regular expressions, but I often hit land mines.

       

      Peter

      Peter Heggie

    • #116330
      Jim Kosloskey
      Participant

      Seems to me this could be done with string functions.
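For comparison, here is a sketch of the string-function approach, using only string first and string range. The helper name practiceName is hypothetical. It returns an empty string when the label or the parentheses are missing.

```tcl
# Hypothetical helper: returns the text inside the first (...) that follows
# the "Provider/Practice" label, or "" if there is no such text.
proc practiceName {s} {
    set label [string first "Provider/Practice" $s]
    if {$label < 0} { return "" }
    # Search for "(" only after the label, then for ")" after that.
    set open [string first "(" $s $label]
    if {$open < 0} { return "" }
    set close [string first ")" $s $open]
    if {$close < 0} { return "" }
    return [string range $s [expr {$open + 1}] [expr {$close - 1}]]
}

puts [practiceName {<item>Provider/Practice : UHCC (UHCC Adult Care)</item>}]
```

This takes a few more lines than a one-line regexp. Each step is explicit, though, and easy to follow for someone who does not read regular expressions.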

      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

    • #116369
      Peter Heggie
      Participant

      Jim, yes, this could have been done with string functions; I’m saving three or four lines of code by using regular expressions. It is a trade-off between everyone’s current skill set and learning and writing more with regular expressions. Right now I need to get better at regex because we have another product that parses and returns data from PDFs based on regular expressions, and I also just want to know regex better.

      Charlie, I got the regex to work using your negative look-ahead example. I also used your syntax to drop the parentheses: match on the opening and closing parens, then throw out everything except the captured (second) string. I needed the additional beginning and ending text strings because parentheses can also appear in other areas of the discharge instructions:

      given string s2 of:

      <item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>

      regex of:

      regexp {(?!:<item>Provider/Practice.*)\((.*?)\)(?!:.*item>)} "$s2" {} match

      returns 1 and $match = Syracuse Gastroenterology

      Thank you

      Peter Heggie

      • #116397
        Jim Kosloskey
        Participant

        Peter,

        I was not implying that string functions are better than regexp. I was making sure anyone following the thread was aware this could be accomplished without regexp, should they not feel comfortable using it.

         


    • #116384
      Charlie Bursell
      Participant

      If you really want to use look-ahead/look-behind, remember that they create a *lot* of overhead and are arcane to the casual user who may have to maintain this later.

      I prefer to keep it simple like:
      regexp -- {<item>Provider/Practice.*?\((.*?)\)} $s2 {} match
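Here is a quick check of why the literal prefix matters, using an invented discharge-instruction line. The (?!:...) look-aheads in the pattern from the earlier post never fail, because their bodies must start with a colon. As a result, that pattern still extracts parenthesized text from a line with no Provider/Practice label, while this anchored pattern does not. The sample line and variable names are made up for illustration.

```tcl
set s2    {<item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>}
set other {<item>Diet : Low sodium (see handout)</item>}

set lookaheadRe {(?!:<item>Provider/Practice.*)\((.*?)\)(?!:.*item>)}
set anchoredRe  {<item>Provider/Practice.*?\((.*?)\)}

# Both patterns find the practice name in the intended line.
regexp -- $lookaheadRe $s2 {} fromLookahead
regexp -- $anchoredRe  $s2 {} fromAnchored

# Only the anchored pattern rejects a line without the label.
set lookaheadHit [regexp -- $lookaheadRe $other {} otherText]
set anchoredHit  [regexp -- $anchoredRe  $other]

puts "$fromLookahead | $fromAnchored"
puts "lookahead: $lookaheadHit ($otherText), anchored: $anchoredHit"
```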

      If you do something like what you did, at least provide comments explaining exactly what you are doing, or use the -expanded switch with the regex to explain it inline.

      Example:

      set x http://www.infor.com

      regexp -expanded -- {
          ^              # beginning of string
          [^:]+          # all characters up to the first colon
          (?=            # begin positive lookahead
            .*\.com$     # for a trailing .com
          )              # end positive lookahead
      } $x match
      => 1
      echo $match
      => http

      That is much easier to understand than:

      regexp {^[^:]+(?=.*\.com$)} $x match
      => 1
      echo $match
      => http

    • #116387
      Peter Heggie
      Participant

      OK, I can see what you are doing by removing the non-capturing components. As long as the only group (in parentheses) is the text I want, then by using the empty {} match variable, only the group(s) are returned, each going to its own specified subvariable. The first subvariable ('match') receives the first captured group (the only group).
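Peter's description can be checked with a sketch that uses more than one group. With {} as the match variable, each following variable receives the next capture group, in order. The pattern and the variable names label, code and name are illustrative only. The pattern uses only greedy quantifiers with negated character classes such as [^:]*, so greedy and non-greedy are not mixed.

```tcl
set s1 {<item>Provider/Practice : UHCC (UHCC Adult Care)</item>}

# Group 1: text before " : ", group 2: text before " (", group 3: text inside ( ).
regexp -- {<item>([^:]*) : ([^(]*) \(([^)]*)\)</item>} $s1 {} label code name

puts "label=$label code=$code name=$name"
```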

      Why is the question mark there? I thought it meant that the preceding character or group was optional. Apparently I need it in both places; otherwise the group fails to match.

      Peter Heggie

    • #116398
      Peter Heggie
      Participant

      Jim, sure, understood. Actually, I use a lot of string functions all over the place; I can pretty much build anything I want with them. I wrote a crude XML parser with string functions (talk about memory leaks!) and still use some for CCDA processing. In this situation, I thought I could use less code. regexp also happens to have a -start option and an -indices option that tell me where in the text block my string is. That saves a few more lines as I iterate through the text block looking for the next match.
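Here is a sketch of the iteration Peter describes, run against an invented text block. With -indices, regexp stores {first last} character positions instead of the matched text, and -start tells it where to resume. The reported indices are counted from the beginning of the whole string, so they can be fed straight back into -start and string range. The sample text and variable names are assumptions for illustration.

```tcl
# Invented sample: two practice entries with an unrelated item between them.
set text {<item>Provider/Practice : UHCC (UHCC Adult Care)</item><item>Note (ignore me)</item><item>Provider/Practice : SGI (Syracuse Gastroenterology)</item>}

set pattern {<item>Provider/Practice[^<(]*\(([^)]*)\)}
set start 0
set found {}

# With -indices, 'whole' and 'group' receive {first last} index pairs.
while {[regexp -indices -start $start -- $pattern $text whole group]} {
    lappend found [string range $text [lindex $group 0] [lindex $group 1]]
    # Resume searching just after the end of this match.
    set start [expr {[lindex $whole 1] + 1}]
}

puts $found
```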

      Peter Heggie

    • #116409
      Charlie Bursell
      Participant

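      The question mark after .* makes that quantifier non-greedy: it matches as little as possible, so \((.*?)\) stops at the first closing parenthesis instead of the last one. By default, quantifiers are greedy, which means they match as much as possible. (A ? after a character or group makes it optional; a ? right after a quantifier such as * or + makes that quantifier non-greedy.) In your case, since there is only one set of parentheses, it was not really needed.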
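A sketch with an invented line containing two parenthesized parts. It shows when the non-greedy ? starts to matter. With a single pair of parentheses both forms return the same text, but with two pairs the greedy form runs to the last closing parenthesis.

```tcl
# Invented line with two parenthesized parts.
set s {<item>Provider/Practice : A (first) and B (second)</item>}

# Greedy: runs from the first "(" to the LAST ")".
regexp -- {\((.*)\)} $s {} greedy
# Non-greedy: stops at the FIRST ")".
regexp -- {\((.*?)\)} $s {} lazy

puts $greedy
puts $lazy
```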

      Warning: do not mix greedy and non-greedy quantifiers in one expression; it can return weird results.

      In the examples below, the first RE is greedy and the second is non-greedy (each match ends with, and includes, a whitespace character):

      set x {He sits, but she stands.}

      regexp -- {.*\s} $x match; echo $match
      => He sits, but she

      regexp -- {.*?\s} $x match; echo $match
      => He
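Here is a sketch of the kind of surprise the warning refers to, based on the matching rules in Tcl's re_syntax documentation. The whole expression takes the preference of its first quantifier. In .*?(\d+), the expression as a whole therefore prefers the shortest overall match, and the greedy \d+ ends up capturing a single digit. Perl-style engines would capture 123 here. The sample string is invented.

```tcl
set x {abc 123 456}

# First quantifier is non-greedy, so the whole RE prefers the shortest
# overall match ("abc 1"); the greedy group is left with a single digit.
regexp -- {.*?(\d+)} $x {} mixed

# Avoiding the mix: a negated class keeps everything greedy.
regexp -- {[^0-9]*(\d+)} $x {} allGreedy

puts $mixed
puts $allGreedy
```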

      FYI: there are many regular expression tutorials available on YouTube.
