Unusual Behavior after dbvista clean-up


  • Creator
    Topic
  • #52928
    Jennifer Hardesty
    Participant

      I’m hoping someone else has seen this before and can explain what happened, so we can prevent it or at least understand it.

      Background Info

      Let’s use the following as an example set-up:

      site 1:
        adt_in --> js3_adt

      site 3:
        (process_hub)
        jr1_adt --> hs_app1_adt
                --> hs_app2_adt

        (process app1)
        hr_app1_adt --> app1_adt_out

        (process app2)
        hr_app2_adt --> app2_adt_out

      This all started because an ill-advised backload of data in Production feeds (requested by app1’s users) led to a flood of HL7 messages hitting app2, which cannot handle ADT messages with discharge dates earlier than the current date.  All of these messages errored, which slowed the processing of that queue until it backed up into the upstream process, one of the hubs on the site.  That hub process was eventually so affected that its own queues began backing up into the process upstream of it.

      In an attempt to stop the madness, it was decided to compare the discharge date with today’s date and suppress the messages with old dates, at least temporarily, until the surge was over.  A call to a tclproc called getToday was added to the pre-proc.  However, it was incorrectly coded:

      Code:

      set dtToday getToday( )

      instead of

      Code:

      set dtToday [getToday]

      …and this caused Tcl callout errors in the process hub.
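      For anyone hitting the same mistake: Tcl’s set accepts at most two arguments (a variable name and an optional value), so getToday( ) is parsed as two extra words and the command fails with a "wrong # args" error without ever calling the proc. Square brackets are what trigger command substitution. Below is a minimal sketch of the intended check, not the actual production code. The body of getToday, the helper name isOldDischarge, and the YYYYMMDD date format are all assumptions made for illustration.

```tcl
# Hypothetical stand-in for the site's getToday proc (the real one is not shown
# in this thread). Assumed to return today's date as YYYYMMDD.
proc getToday {} {
    return [clock format [clock seconds] -format %Y%m%d]
}

# Hypothetical helper: returns 1 when the discharge date is earlier than today.
# Assumes dischargeDate is an HL7-style timestamp that starts with YYYYMMDD.
proc isOldDischarge {dischargeDate} {
    # Square brackets run getToday and substitute its result.
    set dtToday [getToday]

    # Keep only the date portion and ignore any time component.
    set dtDischarge [string range $dischargeDate 0 7]

    # No usable discharge date (e.g. patient not discharged): do not flag it.
    if {[string length $dtDischarge] < 8} {
        return 0
    }

    # Zero-padded YYYYMMDD strings sort chronologically as plain strings.
    return [expr {[string compare $dtDischarge $dtToday] < 0}]
}
```

      In a pre-proc, a result of 1 would be the signal to suppress the message instead of passing it downstream.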

      Eventually, the recovery and error databases were nearly full, the site was beginning to hang, and there was a db vista error.  So we went through the steps of bringing everything in the site down, dumping the databases to files, and so on.  However, when we tried to bring the site up, it would not come back up.  The first two times, it went straight to a db vista -921 error.

      The Issue:

      The third time, everything looked good in the GUI.  All the icons were green and said “up”, and when I looked in the logs, data “appeared” to be processing.

      However, that was not the case.  Messages were being written inbound into the processes but never written outbound.  In the log files, you could see each message as it arrived inbound and as it was processed through all the pre-procs (because I insist that all of the tclprocs write to the log to indicate success or failure), but none of the outbound procs (save_ob_msg, validate_reply, or resend_ob_msg) were being called on the outbound threads.

      This occurred in multiple ADT processes on site 3 for more than 24 hours, and the nightly cyclesave “bounce” of the threads did not resolve the issue.  The entire processes had to be reloaded/“bounced” to resolve it.  All of the messages had to be manually resent from the SMAT in site 1.  They were not in the recovery database, nor were they properly stored in the SMAT in the site 3 processes during that time, since they were re-initialized.

      More bizarrely, this behavior was not consistent.  The 8 FTP processes on the site continued to work correctly, and all non-ADT processes continued to work fine.

    • Author
      Replies
      • #75965
        Michael Vork
        Participant

          Hi Jennifer,

          Did you run hcidbinit -CAf and remove the monitorShmemFile object in the exec directory?

          Then you can be sure that your internal databases are completely reset.

          Greetings,

          Micha
