Does TCL regexp "non-capturing" syntax work?


  • Creator
    Topic
  • #116327
    Peter Heggie
    Participant

      Given this string:

      set s1 "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"

      execute the regexp:

      regexp {(?:.*Practice.*)\(.*\)(?:\</item\>)} "$s1" match

      result is 1

      examine the match contents:

      echo $match

      result:

      <item>Provider/Practice : UHCC (UHCC Adult Care)</item>

      The non-capturing tokens should remove all but “(UHCC Adult Care)”, but I’m getting everything. How can I use non-capturing syntax in regexp?

       

      Peter Heggie

    Viewing 7 reply threads
    • Author
      Replies
      • #116328
        Charlie Bursell
        Participant

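          (?:...) is a non-capturing group, not look-ahead: it still consumes text, so whatever it matches stays in the full match. Try negative look-ahead, (?!...), instead; a look-ahead only asserts and consumes nothing.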

          The following returns:  (UHCC Adult Care)

          regexp {(?!:.*Practice.*)\(.*\)(?!:\</item\>)} "$s1" match

          You could remove the parentheses.
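For readers following along, a minimal sketch of what this pattern does, assuming standard Tcl ARE semantics (the second test string and the variable names are made up for illustration). `(?!...)` is a zero-width negative look-ahead, and in `(?!:...)` the colon is a literal character inside the assertion. On this input both assertions succeed trivially, so the pattern appears to behave like a bare `\(.*\)`:

```tcl
set s1 "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"

# Pattern from the post above, with the look-ahead assertions.
regexp {(?!:.*Practice.*)\(.*\)(?!:\</item\>)} $s1 withAssertions

# Same pattern with the assertions removed.
regexp {\(.*\)} $s1 bare

puts $withAssertions   ;# (UHCC Adult Care)
puts $bare             ;# (UHCC Adult Care)

# Hypothetical string with no "Practice" in it: the look-ahead version
# still matches, so it is not anchoring on the surrounding text.
set other "<item>Note (see chart)</item>"
set matched [regexp {(?!:.*Practice.*)\(.*\)(?!:\</item\>)} $other m]
puts "$matched $m"     ;# 1 (see chart)
```

If the surrounding text must be checked, it has to be matched (consumed) or asserted with a look-ahead that actually describes it.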

          Look-ahead in regular expressions is tricky. If you want just the “UHCC Adult Care” from this string, I would write it like this (assuming you only want what is inside the parentheses):

          regexp -- {\((.*?)\)} $s1 {} match

          Note that the {} in place of the first variable throws away the full match, while match will contain UHCC Adult Care.

          I use look-ahead only when forced to :=)
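On the original question, a short sketch (variable names are illustrative): `(?:...)` does work in Tcl, but all it does is keep a group from being numbered and assigned to a sub-match variable. The text it matches is still consumed and still appears in the full match, so isolating one piece takes a capturing group:

```tcl
set s1 "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"

# Non-capturing groups still contribute to the full match.
regexp {(?:.*Practice.*)\(.*\)(?:</item>)} $s1 full
puts $full    ;# the whole of $s1

# Add one capturing group. The (?:...) groups are not numbered, so the
# first sub-match variable receives the text inside the parentheses.
regexp {(?:.*Practice.*)\((.*)\)(?:</item>)} $s1 full2 inner
puts $inner   ;# UHCC Adult Care
```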

        • #116329
          Peter Heggie
          Participant

            Thank you! you are right – I do not know how to use look-ahead – I will use your syntax.. 🙂

            I usually just muddle through regular expressions.. but often I hit land mines.

             

            Peter

            Peter Heggie

          • #116330
            Jim Kosloskey
            Participant

              Seems to me this could be done with string functions.
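One possible string-function version, as a sketch only: it assumes the practice name is in the last pair of parentheses in the string, and the variable names are illustrative.

```tcl
set s1 "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"

# Locate the last "(" and the last ")" in the string.
set open  [string last "(" $s1]
set close [string last ")" $s1]

# Extract the text between them, if both were found in the right order.
if {$open >= 0 && $close > $open} {
    set practice [string range $s1 [expr {$open + 1}] [expr {$close - 1}]]
} else {
    set practice ""
}
puts $practice   ;# UHCC Adult Care
```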

              email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

            • #116369
              Peter Heggie
              Participant

                Jim, yes this could have been done with string functions; I’m saving three or four lines of code by using regular expressions. It is a trade-off when considering everyone’s skill set vs. learning and writing more with regular expressions. Right now I need to get better at regex because we have another product that parses and returns data from PDFs based on regular expressions, but I also just want to know regex better.

                Charlie, I got the regex to work using your negative look-ahead example. I also used your syntax to remove the parentheses: match on the opening and closing parens, then throw out everything but the (second) matching string. I needed the additional beginning and ending text strings because parentheses can also be found in other areas of the discharge instructions:

                given string s2 of:

                <item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>

                regex of:

                regexp {(?!:<item>Provider/Practice.*)\((.*?)\)(?!:.*item>)} "$s2" {} match

                returns 1 and $match = Syracuse Gastroenterology

                Thank you

                Peter Heggie

                • #116397
                  Jim Kosloskey
                  Participant

                    Peter,

                    I was not implying that string functions should be preferred over regexp, but rather making sure anyone following the post is aware that this can be accomplished without regexp, should they not feel comfortable using it.

                     

                    email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                • #116384
                  Charlie Bursell
                  Participant

                    If you really want to use look-ahead/look-behind, remember that they create a *lot* of overhead and are arcane to the casual user who may have to maintain this later.

                    I prefer to keep it simple like:
                    regexp -- {<item>Provider/Practice.*?\((.*?)\)} $s2 {} match
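A quick sketch of why matching the leading text matters, using a made-up text block with an unrelated parenthesis ahead of the item (the text and variable names are hypothetical):

```tcl
# Hypothetical text block: the first parentheses are not the ones we want.
set text "Take with food (twice daily). <item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>"

# The literal prefix is consumed by the match, so the capture can only
# come from parentheses that follow "<item>Provider/Practice".
regexp -- {<item>Provider/Practice.*?\((.*?)\)} $text {} practice
puts $practice   ;# Syracuse Gastroenterology
```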

                    If you do something like you did, at least provide comments explaining exactly what you are doing, or use the -expanded switch so the regex can carry its own explanation.

                    Example:

                    set x http://www.infor.com

                    regexp -expanded -- {
                        ^           # beginning of string
                        [^:]+       # all characters to the first colon
                        (?=         # begin positive lookahead
                        .*\.com$    # for a trailing .com
                        )           # end positive lookahead
                    } $x match
                    => 1
                    echo $match
                    => http

                    Is much better understood than:

                    regexp {^[^:]+(?=.*\.com$)} $x match
                    => 1
                    echo $match
                    => http
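The same -expanded style applied to the Provider/Practice pattern from earlier in the thread might look like the sketch below. The comments are one reading of the pattern, and `puts` stands in for `echo` so the snippet runs in a plain tclsh.

```tcl
set s2 "<item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>"

regexp -expanded -- {
    <item>Provider/Practice   # literal text that introduces the line of interest
    .*?                       # anything up to the first opening parenthesis
    \(                        # literal opening parenthesis
    (.*?)                     # capture: the text inside the parentheses
    \)                        # literal closing parenthesis
} $s2 {} practice

puts $practice   ;# Syracuse Gastroenterology
```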

                  • #116387
                    Peter Heggie
                    Participant

                      OK, I can see what you are doing by removing the non-capturing components. As long as the only group in parentheses is the text I want, using the null {} return-variable specifier discards the full match, and each captured group goes to its own sub-match variable. The first sub-match variable (‘match’) receives the first group captured (here, the only group).
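A small sketch of that variable assignment with two groups (the pattern and variable names are illustrative, not from the thread):

```tcl
set s2 "<item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>"

# Two capturing groups: the text before "(" and the text inside "(...)".
set pattern {Practice : ([^(]*)\(([^)]*)\)}

# {} receives the full match; provider and practice receive groups 1 and 2.
regexp -- $pattern $s2 {} provider practice
puts [string trim $provider]   ;# SGI, Syracuse Gastroenterology
puts $practice                 ;# Syracuse Gastroenterology

# -inline returns the same pieces as a list: full match first, then each group.
set pieces [regexp -inline -- $pattern $s2]
puts [llength $pieces]         ;# 3
```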

                      Why is the question mark there? I thought it specified that the preceding character or group was optional? Apparently I need it in both places, otherwise the group fails to match.

                      Peter Heggie

                    • #116398
                      Peter Heggie
                      Participant

                        Jim – sure, understood. Actually I use a lot of string functions all over the place – I can pretty much build anything I want with them. I wrote a crude XML parser with string functions (talk about memory leaks!) and still use some for CCDA processing. In this situation, I thought I could use less code. It also happens to include a ‘start position’ option and an ‘indices’ option to tell me where in the text block my string is, which saves a few more lines also as I iterate through the text block looking for the next string match.
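A sketch of that iteration using the `-start` and `-indices` switches of `regexp`. The text block is made up for illustration and the variable names are hypothetical.

```tcl
# Hypothetical text block containing two Provider/Practice items.
set text "<item>Provider/Practice : UHCC (UHCC Adult Care)</item>"
append text "<item>Provider/Practice : SGI, Syracuse Gastroenterology (Syracuse Gastroenterology)</item>"

set pattern {<item>Provider/Practice.*?\((.*?)\)}
set start 0
set found {}

# -indices stores {first last} index pairs instead of the matched text;
# -start resumes the search at the given character offset.
while {[regexp -indices -start $start -- $pattern $text whole inner]} {
    set first [lindex $inner 0]
    set last  [lindex $inner 1]
    lappend found [string range $text $first $last]
    # Continue just past the end of this match.
    set start [expr {[lindex $whole 1] + 1}]
}

puts [join $found "; "]   ;# UHCC Adult Care; Syracuse Gastroenterology
```

The index pairs are relative to the beginning of the whole string, not to the `-start` offset, so they can be used directly with `string range`.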

                        Peter Heggie

                      • #116409
                        Charlie Bursell
                        Participant

                          The question mark after .* makes the quantifier non-greedy: it matches as little as possible, so it stops at the first place the rest of the pattern can match. By default, a quantifier is greedy, which means it matches as much as possible. In your case, since there is only one set of parentheses, it was not really needed.
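A short sketch of where the question mark does matter, using a made-up string with two sets of parentheses (variable names are illustrative):

```tcl
set s "a (one) b (two)"

# Greedy: .* runs to the last closing parenthesis.
regexp -- {\((.*)\)} $s {} greedy

# Non-greedy: .*? stops at the first closing parenthesis.
regexp -- {\((.*?)\)} $s {} lazy

puts $greedy   ;# one) b (two
puts $lazy     ;# one
```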

                          Warning: do not mix greedy and non-greedy quantifiers in one expression; it can return weird results.

                          In the examples below the first RE is greedy, and the second is non-greedy:

                          set x {He sits, but she stands.}

                          regexp -- {.*\s} $x match; echo $match
                          => He sits, but she

                          regexp -- {.*?\s} $x match; echo $match
                          => He

                          FYI: there are many regular expression tutorials available on YouTube.
