DB unavailable strategies


  • Creator
    Topic
  • #54360
    Jeff Dinsmore
    Participant

      I have tcl code in a number of interfaces that dynamically looks up in a MSSQL (Paragon) database to fetch info that needs to be added to the outbound messages.

      The problem arises when the DB is unavailable – as is the case during an upgrade.

      Ancillary systems are often still operational and sending messages.  If one of these messages triggers a query against the Paragon DB,  it will fail to connect and I’m presented with the choice to either continue without the data from the query, or to kill the message with an error.  Neither is particularly desirable.

      Do any of you know of a method to cause Cloverleaf to wait for a period of time before attempting to process the message again?

      I could handle the waiting inside a tcl proc, but that would cause the process thread to block – an undesirable side-effect.

      Ideally, Cloverleaf would accept a return value of “WAIT” in addition to CONTINUE, KILL, etc.

      Any ideas/techniques appreciated.

      Thanks,

      Jeff.

      Jeff Dinsmore
      Chesapeake Regional Healthcare

    Viewing 8 reply threads
    • Author
      Replies
      • #81169
        Elisha Gould
        Participant

          It depends on how you're connecting to the database.

          If you are just using tcl, you could add an extra upoc thread.

          Set up your thread with the translation in the UPOC proc (pdupoc_write) and store the result in a global variable (modifying the message in the UPOC proc will not change the data).

          Returning ERROR will retry after the Outbound interval.

          Returning CONTINUE will call the Send OK proc.

          Update the message in the send-OK proc (send_data_ok), and use the SEND disposition to forward it to your send thread.

        • #81170
          Charlie Bursell
          Participant

            I would do this one of two ways.  Whichever way you choose, it *MUST* be recoverable if the thread goes down, so you do not lose messages.

            My main choice:

            Set up a SQLite DB with two fields, time and message.  When a message comes into the thread, first check that MSSQL is available.  If so, pull any saved messages from the SQLite DB, process them in time order, and then process the received message.  Use msgcopy to create the message structure for each to maintain metadata.  If MSSQL is down, simply store the message with a timestamp in the SQLite DB and return a disposition of KILL.

            Secondary choice

            Set up a global or namespace variable to hold message IDs, not the messages themselves.  If a message comes in and MSSQL is available, first check the global list and process those (msgget), if any, then process the received message.  If MSSQL is not available, store the received MsgID in the global queue.  But here is the most *IMPORTANT* thing: DO NOT RETURN A DISPOSITION to the engine.  Return the empty string.  This way, if the thread goes down, messages are still in the recovery database.
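            The backlog-of-IDs pattern can be sketched the same way (again Python rather than engine Tcl; `fetch` plays the role of msgget, and the empty-string return stands in for "no disposition, leave the message in the recovery database"):

```python
# Global stand-in for the namespace variable: message IDs only, not messages.
pending_ids = []

def on_message(msg_id, fetch, mssql_up, deliver):
    """Process any backlogged IDs first, then the new message; if the
    DB is down, remember only the ID and return no disposition."""
    if mssql_up():
        while pending_ids:
            deliver(fetch(pending_ids.pop(0)))   # msgget equivalent
        deliver(fetch(msg_id))
        return "CONTINUE"
    pending_ids.append(msg_id)
    return ""   # no disposition: the engine keeps the message recoverable
```

            The trade-off discussed below applies: the ID list itself is volatile, so a restart forgets which messages were pending even though the messages survive in the recovery database.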

            As an aside if there was a disposition of WAIT, how do you un-wait?

          • #81171
            Jeff Dinsmore
            Participant

              Charlie –

              I imagine a WAIT disposition that would wait for some period of time, then submit the same message again.

              Any subsequent messages would line up behind the one at the head of the queue.

              I’m a little sketchy on what Elisha suggested.  Are you saying his method is prone to message loss if the thread goes down?

              Jeff Dinsmore
              Chesapeake Regional Healthcare

            • #81172
              Charlie Bursell
              Participant

                The WAIT you described is already available.  Shut the thread down and the messages will queue up in order.

                The problem with what Elisha suggested is that if you save the messages in a global queue and the thread goes down, you have lost them.

                I thought I explained pretty well how to do the same thing but with recovery.

              • #81173
                Jeff Dinsmore
                Participant

                  Charlie,

                  Thanks for clarifying your evaluation of Elisha’s approach.

                  You did explain options clearly.  It wasn’t any of your suggestions I was questioning – just responding to your question about how I imagine a WAIT function would work.

                  To finish answering your how to un-WAIT question, my suggestion was that the WAIT status be a Cloverleaf function similar to KILL/CONTINUE, so CL would manage the requeueing of WAIT-ed messages.

                  Putting messages into a SQLite queue is certainly workable.  The primary drawback of this method would be that queued messages are invisible to the network monitor, so there would be no visual indication of a backup.

                  I like the fact that option 2 uses the recovery DB for message storage. That’s nice and neat.

                  Relative to option 2, let’s say inbound thread A processes messages to outbounds B, C and D – in that order – all in the same process.  Let’s also assume that D accesses MSSQL for some info.

                  If there are 10 messages WAIT-ed in our global wait list for D and the thread goes down, I’m assuming all 10 of those messages (still in the recovery DB) will be requeued to B and C – in addition to D – when the thread is restarted – correct?  

                  If so, we’d need to store the wait list in some non-volatile location like a file or SQLite so that it would be available to B and C to avoid sending duplicate messages – and they would need logic to handle that.

                  The cleaner option, I suppose, would be to have “D” in a process by itself.

                  Thanks for the suggestions.  They’ll give me a good starting point.

                  Jeff Dinsmore
                  Chesapeake Regional Healthcare

                • #81174
                  Elisha Gould
                  Participant

                    I agree the return “” is another way to handle this case.

                    The main issue is that it does not automatically retry and requires a thread bounce to restart processing.

                    It is possible with the way that I mentioned that the global variable could disappear if shut down in the middle of processing, but it should be detectable, as the variable would be empty/missing at send_data_ok.

                    In this case it could be errored, or it may be possible to recover (I haven't looked into what is possible).

                    Or are there other issues with UPOC processing where messages can get lost?

                  • #81175
                    Charlie Bursell
                    Participant

                      Any time you shut down with messages not in the database, they will be lost.

                      There is no need to cycle the thread to restart processing, the next message in would do that if the SQL database is up.  If your throughput here is very slow then you need a different method.

                      Use the SQLite database and have a fileset thread that runs periodically to check if SQL database is up and if there are messages in sqlite to be sent.  The problem here is some messages may get out of sync but it may not be a problem.

                      Lots of ways to tackle this, but first the problem and all of its caveats must be clearly defined.  I was addressing general solutions, which may not fit your specific needs, but they should give you a starting point.

                    • #81176
                      Jeff Dinsmore
                      Participant

                        Update:

                        I opted for the SQLite method.  Conveniently, I had developed a package previously that writes messages to a SQLite DB for deferred delivery, so that made the project a little easier.

                        To guarantee in-order delivery, I write all messages to the SQLite DB, then process all messages in that DB, in the order received, if the Paragon MSSQL DB is available.

                        If MSSQL is not available, the messages queue up.  

                        In normal operation, however, it’s one message in, then it is immediately processed out – realtime except for the 100 or so milliseconds required to write, then read from the SQLite DB.  

                        That’s not zero time, but it’s definitely workable unless message volume is high.  At 10,000 messages per day, it would be an average sacrifice of about 0.69 seconds per minute. With 100,000 messages per day, it would increase to 6.9 seconds per minute, which might necessitate a more complex approach.

                        It could readily be modified to keep track in memory of whether it has any messages queued, and to skip the SQLite write/read step if not – with an initial read required at startup to check for any queued messages.
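                        That fast-path optimization might look like this (a sketch under the same assumptions as before, not the package actually deployed; one SQLite read at startup primes an in-memory flag so the common case – DB up, queue empty – skips the write/read round-trip entirely):

```python
import sqlite3

class DeferredQueue:
    """Skip the SQLite round-trip when nothing is queued."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS msg_queue "
                     "(id INTEGER PRIMARY KEY, message TEXT)")
        # Initial read at startup: anything left over from before a restart?
        count = conn.execute("SELECT COUNT(*) FROM msg_queue").fetchone()[0]
        self.backlog_empty = (count == 0)

    def handle(self, message, db_up, deliver):
        if db_up and self.backlog_empty:
            deliver(message)            # fast path: no SQLite access at all
            return
        if db_up:
            # Drain the backlog in insertion order, then the new message.
            for (msg,) in self.conn.execute(
                    "SELECT message FROM msg_queue ORDER BY id"):
                deliver(msg)
            self.conn.execute("DELETE FROM msg_queue")
            self.conn.commit()
            self.backlog_empty = True
            deliver(message)
        else:
            self.conn.execute("INSERT INTO msg_queue (message) VALUES (?)",
                              (message,))
            self.conn.commit()
            self.backlog_empty = False
```
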

                        Also, to ensure that low message volume doesn’t prevent delivery of any queued messages, I’ve also added a timed UPOC to run every 5 minutes in case no messages are received to trigger a send.  That’s not as clean, but ensures timely delivery after a MSSQL DB outage without having to rely on receipt of a new message.

                        Jeff Dinsmore
                        Chesapeake Regional Healthcare

                      • #81177
                        Jeff Dinsmore
                        Participant

                          Another update…

                          I modified the following approach:

                          Quote:

                          Also, to ensure that low message volume doesn’t prevent delivery of any queued messages, I’ve also added a timed UPOC to run every 5 minutes in case no messages are received to trigger a send.  That’s not as clean, but ensures timely delivery after a MSSQL DB outage without having to rely on receipt of a new message.

                          I didn’t like the ugly factor of another timed thread in every location where I use external DB access, so I opted instead for a single timer thread that injects a primer message into the outbound pools for the destinations that use DB access.

                          It’s a simple message with just MSH and EVN that is discarded, but serves as a trigger to launch a read/send from the SQLite queueing DB every “n” minutes – the same as a received inbound message would do.  During normal operations, it will always trigger zero messages.  After a DB outage, however, it ensures a maximum of “n” minutes before the queued messages are sent – regardless of whether any more messages have come inbound.
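                          Building such a two-segment primer is trivial; a sketch of the idea (the message type and field values here are illustrative placeholders, not what the engine or the downstream systems require, since the message is discarded anyway):

```python
from datetime import datetime

def build_primer(sending_app="PRIMER", facility="QUEUE_TIMER"):
    """Build a minimal two-segment HL7 v2 message: MSH + EVN.

    It carries no clinical content; its arrival is used purely as a
    trigger to drain the SQLite queueing DB."""
    ts = datetime.now().strftime("%Y%m%d%H%M%S")
    msh = "|".join(["MSH", "^~\\&", sending_app, facility,
                    "", "", ts, "", "ADT^A08", "PRIMER" + ts, "P", "2.3"])
    evn = "|".join(["EVN", "A08", ts])
    return "\r".join([msh, evn]) + "\r"
```
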

                          It’s still not as clean as I’d like, but probably as close as it gets without something like a “WAIT” return status that’s native to Cloverleaf.

                          Jeff Dinsmore
                          Chesapeake Regional Healthcare
