Weird Thing

  • #49878
    Deanna Norman
    Participant

      Hi,

      Just wondering if anybody knows what this is…

      We have a thread listening on TCP/IP that receives messages and routes them through translation to a second thread, which then pushes them to a third thread that writes to a directory. We receive about 300 messages a day.

      The issue is that whenever we restart the Network Monitor for the above process, it seems to queue up all the messages that were sent in the past, and they go through each thread again. As a result, we get duplicates.

      Does anybody know what can cause this?

      Thanks,

    Viewing 13 reply threads
      • #63973
        Mark McDaid
        Participant

          I would check the recovery database while the thread is running to see if messages are still sitting there after they are sent, maybe in state 14.  Disclaimer:  I’m fairly new to Cloverleaf, but I would think the messages must still be in the recovery database if they are sent again when the thread/process is restarted.  Hope that helps.

        • #63974
          Deanna Norman
          Participant

            Yeah that’s what I was thinking.. but both recovery and error DB are empty.

          • #63975
            Deanna Norman
            Participant

              Actually they are still there.. I changed my search options and all are there.. since Feb 19th! Now I have to figure out why they are staying there even though they successfully get through.

            • #63976
              Mark McDaid
              Participant

                Are there any tcl procs that perform any processing on the messages?  You might look there for a cause.  Are any recover_33 procs used on that thread?  I’m just throwing out ideas of things you might want to search through to find the culprit.  Good luck.

              • #63977
                Deanna Norman
                Participant

                  This is my Tcl proc that moves the msg from thread 2 to thread 3… I’m thinking that this is what makes a copy of the msg and stores it in the recovery DB… Do you think that is my problem?

                  Code:


                  ######################################################################
                  # Name: tps_transfer_msg
                  # Purpose:
                  # UPoC type: tps
                  # Args: tps keyedlist containing the following keys:
                  #       MODE    run mode ("start", "run" or "time")
                  #       MSGID   message handle
                  #       ARGS    user-supplied arguments:
                  #
                  #
                  # Returns: tps disposition list:
                  #
                  #

                  proc tps_transfer_msg { args } {
                      keylget args MODE mode               ;# Fetch mode

                      set dispList {} ;# Nothing to return

                      switch -exact -- $mode {
                          start {
                              # Perform special init functions
                              # N.B.: there may or may not be a MSGID key in args
                          }

                          run {
                              # 'run' mode always has a MSGID; fetch and process it
                              keylget args MSGID mh
                              set overmh [msgcreate -meta {USERECOVERDB true} [msgget $mh]]
                              lappend dispList "OVER $overmh"
                          }

                          time {
                              # Timer-based processing
                              # N.B.: there may or may not be a MSGID key in args
                          }

                          shutdown {
                              # Doing some clean-up work
                          }
                      }

                      return $dispList
                  }

                • #63978
                  Mark McDaid
                  Participant

                    Not sure, but I do notice that a copy of the original message is made, and that copy is given a disposition of OVER to send it back the other direction.  However, the original message is not given a disposition in the proc.  I’m pretty sure this results in a memory leak, and that if the original message is not needed, you would need to give it a disposition of KILL.  I’m not sure from just that small section of code, though, why the original message was copied.  Like I said, I’m fairly new to Cloverleaf, just took the Level 2 class last month, so take what I say with a grain of salt.
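A minimal sketch of the change described above, assuming the original message is not needed once the copy has been created. The proc name `tps_transfer_msg_fixed` is hypothetical; everything else is taken from the proc posted earlier in the thread, with one added line that gives the original handle a `KILL` disposition. Treat it as an illustration rather than a verified fix, since the rest of the thread configuration is not shown.

```tcl
# Sketch only: tps_transfer_msg_fixed is a hypothetical name for a revised
# copy of the proc posted earlier in this thread.
# Assumption: the original message is not needed once the copy exists.
proc tps_transfer_msg_fixed { args } {
    keylget args MODE mode               ;# Fetch mode

    set dispList {}                      ;# Nothing to return by default

    switch -exact -- $mode {
        start {
            # No special init in this sketch
        }

        run {
            # 'run' mode always has a MSGID; fetch and process it
            keylget args MSGID mh

            # Copy the payload into a new message and send the copy OVER
            set overmh [msgcreate -meta {USERECOVERDB true} [msgget $mh]]
            lappend dispList "OVER $overmh"

            # Give the original handle a disposition as well, so that it
            # is not left behind without one
            lappend dispList "KILL $mh"
        }

        time {
            # No timer-based processing in this sketch
        }

        shutdown {
            # No clean-up work in this sketch
        }
    }

    return $dispList
}
```

If the original message should also carry on along its normal route instead of being discarded, `CONTINUE $mh` would take the place of the `KILL` line.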

                  • #63979
                    Deanna Norman
                    Participant

                      That’s gotta be it.. I’m creating a copy and sending it over to the next thread, but the original stays.

                    • #63980
                      Russ Ross
                      Participant

                        I was just talking with co-worker Jim Kosloskey yesterday, and he mentioned doing an interface using OVER. That reminded me that I had a problem similar to yours with an inherited interface that used OVER, so OVER can cause this behavior.

                        I’m not saying that is your specific problem but it is possible.

                        Sounds like you have a handle on that possibility.

                        There are some things I want everyone to be aware of to help stay out of other confusing database madness.

                        Whenever we do any of the following here, we require stopping all processes in the site, making sure the database is empty, and shutting the site down (stopping the lock manager) so everything is idle:

                        – create a new thread

                        – delete an existing thread

                        – rename an existing thread

                        Russ Ross
                        RussRoss318@gmail.com

                      • #63981
                        Mark McDaid
                        Participant

                          Thanks for those tips, Russ.  I’m getting ready to implement a new thread on our production site and that is definitely good info to know.

                        • #63982
                          Todd Lundstedt
                          Participant

                            Shutting down the site to create a thread?  Wow!  That’s a bit overkill, don’t ya think?  You must have one process per site, or something like that.  There’s no way on earth we could do that with our setup (15 processes, 100+ threads).

                            We regularly add and delete (and seldom change) threads while stopping only the process.  Now, if we’ve got some crazy IPC stuff going on, we take a little extra care.  But mostly, we make our NetConfig changes, stop the process, save the changes, start the process.

                          • #63983
                            Russ Ross
                            Participant

                              Yes, we have opted for creating many smaller sites, as opposed to the few consolidated sites we had when I first came to MD Anderson Cancer Center a decade ago.

                              The word “opted” might be misleading; actually, we were more or less forced into many smaller sites to better utilize our limited resources, to be able to have downtime with less impact, and to get much more seamless upgrades.

                              I just ran our site/thread counting script, and we currently have

                              68 production sites with 506 threads altogether (average 7 – 8 threads per site)

                              and

                              130 test sites with 784 threads altogether (average 6 threads per site)

                              Personally I find creating many smaller sites has been one of the best improvements we have done and I could never go back to many threads in larger sites.

                              Literally, Cloverleaf was imploding when we had many threads in larger consolidated sites.

                              I would like to thank co-worker Jim Kosloskey for helping us to see the light about creating many smaller sites.

                              Some people would argue against it and say it is a personal preference, but at some point an opinion becomes a fact with enough experience and this is how I feel about numerous smaller sites.

                              Russ Ross
                              RussRoss318@gmail.com

                            • #63984
                              Michael Hertel
                              Participant

                                One advantage to many sites is that each has its own lock manager/recovery database.

                                Therefore, if you have a huge transaction volume, the lock manager does not become the bottleneck.

                                We’ve gone the route of throwing bigger hardware and SAN drives at the problem. So we stick with the few sites concept. It makes daily support much easier for us.

                              • #63985
                                Steve Carter
                                Participant

                                  Utilizing more sites with fewer threads may work OK in a smaller environment.  However, the configuration of an environment must take into account many different variables.  What works well in some shops could be a disaster in others.

                                  We are currently running 4 Cloverleaf boxes:

                                  Development – 121 sites – 1021 threads

                                  Testing (QA) – 172 sites – 3762 threads

                                  Production (1) – 132 sites – 2832 threads

                                  Production (2) – 8 sites – 27 threads

                                  The QA and Production (1) environments continue to grow every day.

                                  As you can see, trying to run with an average of 10 threads per site would create a ridiculous number of sites.  The overhead from the monitor daemons alone (without any monitoring) would negate any advantage that this setup ‘might’ create.

                                  Our environments are well monitored and relatively easy to support.  Based on our needs, this suits us best.

                                  I don’t disagree that your setup is what works best in your case, but I do disagree that ‘an opinion becomes a fact’.

                                  I’ve spent the past 10 years watching our environment grow from 1 server with 2 sites to what it is today.  I can tell you that the way our servers are architected is what works best for us.

                                  Steve

                                • #63986
                                  John Mercogliano
                                  Participant

                                    One thing I noticed in your tps is that you are not killing or continuing the message handle associated with $mh, so that message will stay in your recovery database.

                                    John

                                    John Mercogliano
                                    Sentara Healthcare
                                    Hampton Roads, VA
