Weird Thing

  • #49878
    Deanna Norman
    Participant

    Hi,

    Just wondering if anybody knows what this is…

    We have a thread listening on TCP/IP that receives messages and translates them to a second thread, which then pushes them to a third thread that writes to a directory. We receive about 300 messages a day.

    The issue is, whenever we restart the Network Monitor for the above process, it seems to queue up all the messages that were sent previously, and they go through each thread again. Thus we get duplicates.

    Does anybody know what can cause this?

    Thanks,

Viewing 13 reply threads
    • #63973
      Mark McDaid
      Participant

      I would check the recovery database while the thread is running to see if messages are still sitting there after they are sent, maybe in state 14.  Disclaimer:  I’m fairly new to Cloverleaf, but I would think the messages must still be in the recovery database if they are sent again when the thread/process is restarted.  Hope that helps.

    • #63974
      Deanna Norman
      Participant

      Yeah that’s what I was thinking.. but both recovery and error DB are empty.

    • #63975
      Deanna Norman
      Participant

      Actually they are still there.. I changed my search options and all are there.. since Feb 19th! Now I have to figure out why they are staying there even though they successfully get through.

    • #63976
      Mark McDaid
      Participant

      Are there any tcl procs that perform any processing on the messages?  You might look there for a cause.  Are any recover_33 procs used on that thread?  I’m just throwing out ideas of things you might want to search through to find the culprit.  Good luck.

    • #63977
      Deanna Norman
      Participant

      This is my Tcl proc that moves the msg from thread 2 to thread 3… I’m thinking that this is what makes a copy of the msg and stores it in the recovery DB… You think that is my problem?

      Code:


      ######################################################################
      # Name: tps_transfer_msg
      # Purpose:
      # UPoC type: tps
      # Args: tps keyedlist containing the following keys:
      #       MODE    run mode ("start", "run" or "time")
      #       MSGID   message handle
      #       ARGS    user-supplied arguments:
      #
      #
      # Returns: tps disposition list:
      #
      #

      proc tps_transfer_msg { args } {
         keylget args MODE mode               ;# Fetch mode

         set dispList {} ;# Nothing to return

         switch -exact -- $mode {
             start {
                 # Perform special init functions
                 # N.B.: there may or may not be a MSGID key in args
             }

             run {
                 # 'run' mode always has a MSGID; fetch and process it
                 keylget args MSGID mh
                 set overmh [msgcreate -meta {USERECOVERDB true} [msgget $mh]]
                 lappend dispList "OVER $overmh"
             }

             time {
                 # Timer-based processing
                 # N.B.: there may or may not be a MSGID key in args
             }

             shutdown {
                 # Doing some clean-up work
             }
         }

         return $dispList
      }

    • #63978
      Mark McDaid
      Participant

      Not sure, but I do notice that a copy of the original message is made, and that copy is given a disposition of OVER to send it back the other direction.  However, the original message is not given a disposition in the proc.  I’m pretty sure this results in a memory leak, and that if the original message is not needed, you would need to give it a disposition of KILL.  I’m not sure from just that small section of code, though, why the original message was copied.  Like I said, I’m fairly new to Cloverleaf, just took the Level 2 class last month, so take what I say with a grain of salt.
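
      A minimal sketch of what that change might look like, assuming the original message really is not needed once the copy has been handed off. The proc name tps_transfer_msg_fixed is hypothetical; the commands are the ones already used in the proc posted above, and KILL is the disposition suggested in this reply. It only runs inside the Cloverleaf engine.

      ```tcl
      # Sketch only: same flow as the posted proc, but the inbound
      # handle also gets a disposition. The proc name is hypothetical.
      proc tps_transfer_msg_fixed { args } {
          keylget args MODE mode          ;# Fetch mode

          set dispList {}                 ;# Nothing to return by default

          switch -exact -- $mode {
              run {
                  keylget args MSGID mh   ;# Handle of the inbound message

                  # Copy the content into a new message and send it OVER,
                  # exactly as the posted proc does.
                  set overmh [msgcreate -meta {USERECOVERDB true} [msgget $mh]]
                  lappend dispList "OVER $overmh"

                  # Assumption: the original is not needed after the copy,
                  # so give it a KILL disposition instead of leaving it
                  # without one.
                  lappend dispList "KILL $mh"
              }

              default {
                  # start, time and shutdown: nothing to do in this sketch
              }
          }

          return $dispList
      }
      ```

      With both entries in the disposition list, the engine is told what to do with the inbound handle as well as the copy, so the original should not be left sitting in the recovery database. Worth trying on a test site first and checking the recovery database afterwards.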

    • #63979
      Deanna Norman
      Participant

      That’s gotta be it.. I’m creating a copy and sending it over to the next thread, but the original stays.

    • #63980
      Russ Ross
      Participant

      I was just talking with co-worker Jim Kosloskey yesterday and he mentioned doing an interface using OVER, and I remembered I had a problem similar to yours with an inherited interface that used OVER, so OVER can cause this behavior.

      I’m not saying that is your specific problem but it is possible.

      Sounds like you have a handle on that possibility.

      There are some things I want everyone to be aware of to help stay out of other confusing database madness.

      Whenever we do any of the following here, we require stopping all processes in the site, making sure the database is empty, and shutting the site down (stopping the lock manager) so everything is idle:

      – create a new thread

      – delete an existing thread

      – rename an existing thread

      Russ Ross
      RussRoss318@gmail.com

    • #63981
      Mark McDaid
      Participant

      Thanks for those tips, Russ.  I’m getting ready to implement a new thread on our production site and that is definitely good info to know.

    • #63982
      Todd Lundstedt
      Participant

      Shutting down the site to create a thread?  Wow!  That’s a bit overkill, don’t ya think?  You must have one process per site, or something like that.  There’s no way on earth we could do that with our setup (15 processes, 100+ threads).

      We regularly add and delete (seldom change) threads while only stopping the process.  Now, if we’ve got some crazy IPC stuff going on, we take a little extra care.  But mostly, we make our NetConfig changes, stop the process, save the changes, start the process.

    • #63983
      Russ Ross
      Participant

      Yes we have opted towards creating many smaller sites as opposed to a few consolidated sites which we had when I first came to MD Anderson Cancer Center a decade ago.

      The word opted might be misleading; actually it was more like we were forced into many smaller sites to better utilize our limited resources, to be able to have down time with less impact, and to get much more seamless upgrades.

      I just ran our site/thread counting script and we currently have

      68 production sites with 506 threads altogether (average 7 – 8 threads per site)

      and

      130 test sites with 784 threads altogether (average 6 threads per site)

      Personally I find creating many smaller sites has been one of the best improvements we have done and I could never go back to many threads in larger sites.

      Literally, Cloverleaf was imploding when we had many threads in larger consolidated sites.

      I would like to thank co-worker Jim Kosloskey for helping us to see the light about creating many smaller sites.

      Some people would argue against it and say it is a personal preference, but at some point an opinion becomes a fact with enough experience and this is how I feel about numerous smaller sites.

      Russ Ross
      RussRoss318@gmail.com

    • #63984
      Michael Hertel
      Participant

      One advantage to many sites is that each has its own lock manager/recovery database.

      Therefore if you have a huge transaction volume, the lock manager does not become the bottleneck.

      We’ve gone the route of throwing bigger hardware and SAN drives at the problem. So we stick with the few sites concept. It makes daily support much easier for us.

    • #63985
      Steve Carter
      Participant

      Utilizing more sites with fewer threads may work OK in a smaller environment.  However, the configuration of an environment must take into account many different variables.  What works well in some shops could be a disaster in others.

      We are currently running 4 Cloverleaf boxes:

      Development – 121 sites – 1021 threads

      Testing (QA) – 172 sites – 3762 threads

      Production (1) – 132 sites – 2832 threads

      Production (2) – 8 sites – 27 threads

      The QA and Production(1) environments continue to grow every day.

      As you can see, trying to run with an average of 10 threads per site would create a ridiculous number of sites.  The overhead from the monitor daemons alone (without any monitoring) would negate any advantage that this setup ‘might’ create.

      Our environments are well monitored and relatively easy to support.  Based on our needs, this suits us best.

      I don’t disagree that your setup is what works best in your case, but I do disagree that ‘an opinion becomes a fact’.

      I’ve spent the past 10 years watching our environment grow from 1 server with 2 sites to what it is today.  I can tell you that the way our servers are architected is what works best for us.

      Steve

    • #63986
      John Mercogliano
      Participant

      One thing I noticed in your tps is that you are not killing or continuing the message handle associated with $mh, so that message will stay in your recovery database.

      John

      John Mercogliano
      Sentara Healthcare
      Hampton Roads, VA
