Unusual Behavior after dbvista clean-up

This topic has 1 reply, 2 voices, and was last updated 13 years, 4 months ago by Michael Vork.

Creator

Topic
February 7, 2012 at 3:24 pm #52928
Jennifer Hardesty
Participant
I’m hoping someone else has seen this before and can explain what happened, so that we can prevent it, or at least understand it.

Background Info

Let’s use the following example setup:

site 1:
  adt_in -> js3_adt

site 3:
  (process_hub)
  jr1_adt -> hs_app1_adt
          -> hs_app2_adt

  (process app1)
  hr_app1_adt -> app1_adt_out

  (process app2)
  hr_app2_adt -> app2_adt_out

This all started when an ill-advised backload of data in the Production feeds (requested by app1’s users) flooded app2 with HL7 messages. app2 cannot handle ADT messages with discharge dates before the current date, so every one of these messages errored. That slowed processing of app2’s queue until it backed up into the upstream process, one of the hubs on the site. Eventually the hub process was affected so badly that its own queues began backing up into the process upstream of it.

To stop the madness, it was decided to add a comparison between the discharge date and today’s date and to suppress messages with old dates, at least temporarily, until the surge was over. A call to a Tcl proc called getToday was added to the pre-proc. However, it was coded incorrectly:

Code: set dtToday getToday( )

instead of

Code: set dtToday [getToday]

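…and this caused Tcl callout errors in the hub process, because Tcl reads the first form as a set command with too many arguments and raises an error on every message.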
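For anyone hitting the same thing, here is a minimal Tcl sketch of what the check was meant to do, assuming a getToday proc that returns today's date as YYYYMMDD. Both getToday and dischargeIsBeforeToday below are hypothetical helpers written for illustration, not the site's actual procs. In a real Cloverleaf pre-proc the discharge date would be read from the message (PV1-45, Discharge Date/Time, in HL7 v2 ADT) and old messages would be killed instead of continued; that message handling is deliberately left out here.

```tcl
# Minimal sketch, not the production pre-proc. getToday and
# dischargeIsBeforeToday are hypothetical helpers for illustration only.

# Hypothetical stand-in for the site's getToday proc:
# returns today's local date as YYYYMMDD (the HL7 date format).
proc getToday {} {
    return [clock format [clock seconds] -format %Y%m%d]
}

# Broken form: Tcl splits this into the words set, dtToday, getToday( and ),
# so set receives too many arguments and raises "wrong # args".
set rc [catch {set dtToday getToday( )} err]
puts "broken form: rc=$rc, error: $err"

# Fixed form: square brackets run getToday and substitute its result.
set dtToday [getToday]
puts "fixed form: dtToday=$dtToday"

# Returns 1 if an HL7 timestamp (YYYYMMDD optionally followed by time
# digits) falls on a date before today, otherwise 0. Fixed-width YYYYMMDD
# strings sort chronologically, so a plain string comparison is enough.
proc dischargeIsBeforeToday {dischargeTs {today ""}} {
    if {$today eq ""} {
        set today [getToday]
    }
    set dischargeDate [string range $dischargeTs 0 7]
    if {![regexp {^[0-9]{8}$} $dischargeDate]} {
        # No usable discharge date (e.g. patient not yet discharged):
        # let the message through rather than suppress it.
        return 0
    }
    return [expr {[string compare $dischargeDate $today] < 0}]
}

# Example: a discharge on 2012-01-15 checked against 2012-02-07.
puts [dischargeIsBeforeToday 20120115093000 20120207]
```

Comparing the first eight characters as strings avoids parsing the full timestamp with clock scan, which sidesteps time-zone suffixes and partial times in the field. The optional today argument makes the check easy to test against a fixed date.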

Eventually the recovery and error databases were nearly full, the site was beginning to hang, and we got a db vista error. So we went through the usual steps: bringing everything on the site down, dumping the databases to files, and so on. However, the site would not come back up. The first two attempts went straight to a db vista -921 error.

The Issue:

The third time, everything looked good in the GUI: all the icons were green and said “up”, and when I looked in the logs, data “appeared” to be processing.

However, that was not the case. Messages were being written inbound into the processes but never written outbound. In the log files you could see each message arrive inbound and pass through all the pre-procs (because I insist that all of the Tcl procs write to the log to indicate success or failure), but none of the outbound procs (save_ob_msg, validate_reply, or resend_ob_msg) were ever called on the outbound threads.

This went on in multiple ADT processes on site 3 for more than 24 hours, and the nightly cyclesave “bounce” of the threads did not resolve it. The entire processes had to be reloaded (“bounced”) to fix the problem, and all of the messages had to be resent manually from the SMAT on site 1. They were not in the recovery database, nor were they stored in the SMAT of the site 3 processes during that time, since those had been re-initialized.

More bizarrely, the behavior was not consistent: the 8 FTP processes on the site kept working correctly, as did all non-ADT processes.

Author

Replies
- February 13, 2012 at 3:59 pm #75965
  Michael Vork
  Participant
  Hi Jennifer,
  
  Did you run an hcidbinit -CAf and remove the monitorShmemFile object in the exec directory?
  
  Then you can be sure that your internal databases are completely reset.
  
  Greetings,
  
  Micha

The forum ‘Cloverleaf’ is closed to new topics and replies.