Forum Replies Created
Just guessing, but try running:
ps -ef | grep hcimonitord
and see whether you have more than one Net Monitor daemon running for the same site.
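If grep matching its own process line is a nuisance, a slightly refined version of the same check counts the daemons directly. The site name below is just a placeholder, assuming the site shows up in the ps output:
Code:
# count hcimonitord processes for one site; "yoursite" is a placeholder
ps -ef | grep '[h]cimonitord' | grep yoursite | wc -l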
Just a thought, what protocol are you using? If it's the standard pdl-tcpip with mlp_tcp.pdl, then maybe the vendor is not using the hex 0B ... 1C 0D "envelope" to encapsulate the ACK (though it seems like you'd see something then with the EO turned up).
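If you want to see exactly what those envelope bytes look like, a quick throwaway check from the command line is to wrap a string in the MLP envelope and dump the raw bytes (the message text here is just a dummy placeholder):
Code:
# 0x0B before the message, 0x1C 0x0D after it; od shows the bytes in hex
printf '\013MSH|TEST\034\015' | od -An -tx1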
Not sure if this is related, but you might want to have a look at this topic thread:
https://usspvlclovertch2.infor.com/viewtopic.php?t=4716
The gist was that we discovered that stop/start activity in one process caused messages to appear in all the other process logs of that site.
Brian,
In your list of commands you have “hcisitectl -K” with a capital “K”, which I believe kills all daemons (both the lock manager and monitor daemon). Then later you have the command “hcisitectl -s m”, which would start the monitor daemon only. Doesn’t look right to me that you aren’t restarting the lock manager daemon, but then I’m on AIX and maybe that thinking doesn’t apply to Cloverleaf running on Windows 2003?
You might want to take a look at this posting for a solution.
https://usspvlclovertch2.infor.com/viewtopic.php?t=1278&highlight=
I wondered why you mentioned dumping them to a file if they were all from a message type you didn’t need, but if it’s possible there are some errors for other message types (or other threads) mixed in, then saving them to a file would be a reasonable precaution.
If you wanted to just delete the DFT^P03 messages and leave everything else in the error db, one option would be:
hcidbdump -e -D -F -f -s 101
This would delete error messages from that thread only, and which had the specific “101” status. Then you could see if there were any errors left in the db that you cared about. I’d still advise reinitializing the error db once you have everything cleared out, for the reasons stated earlier.
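If you just want to eyeball whatever is left afterward, dumping the error database without the delete option and paging through it should be enough:
Code:
hcidbdump -e | more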
If you’d prefer to save all the error messages to a file:
hcidbdump -e -b -a
This will save the messages in a length-encoded format (each message is preceded by a number giving its size, so you can tell where it stops and the next one begins).
Whichever you choose, I'd recommend doing this from a command line. I avoid using the Database Administrator GUI whenever there are a large number of messages involved, or the individual messages are very large, since the screen can freeze up and lead to database corruption (db Vista errors). (It's fine to use the GUI screen when you are sure there are only a small number of messages.) You also need to be careful with the hcidbdump command: once you hit enter, leave it alone and allow it to finish executing and return; never get impatient and interrupt it with Ctrl-C. A handy thing about the GUI is that you can use it to play around with the options, hit the "Show Command" button (rather than "Apply"), and then copy/paste the result to the command line.
Now, to answer your question about resending the messages from a file. Since these are messages from the error db, they are the original inbound messages rather than something that has gone through translation, so they need to be re-sent to the inbound thread. Right-click the thread icon in the NetMonitor and choose Control/Full. In the Controls pop-up, choose "resend". In the "Direction" section at the top, I think you are going to want to select the radio button for "inbound post-TPS". An inbound thread is usually receiving HL7 messages over a tcp/ip connection and sending back HL7 acknowledgment messages; since you are feeding in messages from a file, there's no need for any ACKs, and I believe using the post-TPS stack skips over that part. Leave the Msg Type as "Data". Use the "List" button to select the path to your file. In the "File Format" section at the bottom, be sure to change the selection to "Length Encoded", if that's how you saved the file. Hit the OK button, and you should see the thread get bogged down with 68,000 messages! Unless you set the priority of the resend lower than the default, real-time messages coming into that thread might get delayed until the resend is finished.
As long as I’ve blathered on this long, I thought I’d mention one other piece of advice (there’s usually several different ways to accomplish things in Cloverleaf, so compare anything I suggest with other postings on this forum). If there was a considerable gap of time between when you first became aware that you were receiving a new, unexpected message type, and when you finally got your HIS system to stop sending the DFT messages, you might have been able to reduce the impact on the error db by creating a new routing entry for the DFT^P03 messages. Do it as a raw route, with the destination set right back to the same inbound thread name, but with “hcitpsmsgkill” as the proc in the route details. That way the engine has somewhere to send the messages, even if they all get a “KILL” disposition along the way, and it doesn’t get flustered and consider them an unsupported transaction type.
In fact, something we have done a number of times is to set up a “Static” routing like this on inbound threads so that no matter what is received it has somewhere to go (and then gets killed). After that we create the individual routing entries for the message types/events we actually expect to receive and want to interface.
If you've had that many messages pile up in the error db, it sounds like a good idea to reinitialize the database rather than just try to dump/delete the messages out of it, because the latter approach won't give you back any disk space (and I think all those empty slots in the db make it slower to work with). You will need to end all processes in that site and also stop the site daemons before doing the init command. Probably in your case:
hcidbinit -ef
Or if you are sure there are no messages pending in the recovery db and you want to reset everything:
hcidbinit -AC
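For completeness, the overall sequence I have in mind looks roughly like this; the process name is a placeholder, and the exact stop commands can vary by platform and release, so treat it as a sketch rather than a recipe:
Code:
hcienginestop -p your_process    # stop each running process in the site (repeat per process)
hcisitectl -K                    # stop the site daemons
hcidbinit -ef                    # then reinitialize the error db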
I think the search function ignores any punctuation characters, so I have often found it useless in finding what I need. On the upside (?), chances are that with the acquisition by Lawson, we will be shifted over to some user forum of theirs, meaning all this past history of answers and information may become completely lost to us.
I don't know how helpful this will be for your issue, but several years back one of our AIX administrators came up with this solution for automatically setting our DISPLAY variable to each user's local IP address.
In our hci user “.profile.local.end” file we added:
Code:
# Set the DISPLAY variable
# Note: this assumes you are connected from a graphics display. Any application
# that uses this variable may fail if you are not running X.
. /usr/local/bin/set.display

And this is the contents of the set.display script:
Code:
export DISPLAY=`who -m | awk '{ print $6 }' | cut -f 2 -d'(' | cut -f 1 -d')'`:0.0

So, for instance, when I start up an xterm window (we use ReflectionsX as our emulator software) and log in, the variable is automatically set:
cloprod::hci> echo $DISPLAY
172.21.2.177:0.0
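As an aside, if the note in the script's comments ever bites you (a login that isn't coming from an X display), a slightly more defensive variant of set.display would only export DISPLAY when who -m actually reports a host in parentheses. This is just a sketch, not what we actually run:
Code:
# only set DISPLAY when the who -m line includes a remote host in parentheses
host=`who -m | awk '{ print $6 }'`
case "$host" in
    \(*\))
        ip=`echo "$host" | cut -f 2 -d'(' | cut -f 1 -d')'`
        export DISPLAY=${ip}:0.0
        ;;
esac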
John, thanks for the suggestion. I had in fact originally made use of the new "repeat" functionality available to us in 5.7, and enabled a 1-minute repeat. What I found when I disabled the repeat function was that the alert repeated every two minutes anyway, because each pstart of the thread "resets" the alert status, and the conditions (>= 90 seconds, for a duration of 30 seconds) become true again after 2 minutes, so the alert triggers again.
The idea behind this alert was that the Philips feed should normally be constant and reliable, except for the infrequent occasions when they need to reboot one of their dbs servers to make a config change.
As it happens, this interface went live yesterday. The users discovered that they weren’t getting vent data for 2 beds. In the course of figuring out that they had incomplete bed mapping for the new connection, clinical engineering rebooted their server twice during the day. Both times, the alert performed as intended, and after a few repeats the server came back up, responded to the connection request, and the feed was reestablished. I understand that only one thread pstart is required to put the thread in a “listening” status, but because each thread start causes the alert status to be reset, and the conditions become true again after 2 minutes, I think a few repeats are inevitable.
As stated below, the problem in our test environment that Jennifer reported was because clinical engineering completely shut down the feed to our test environment so they could repurpose it for production, and I didn't remember that I should disable or remove the alert. However, I also hadn't thought that the alert bouncing the thread repeatedly would pose a problem for anything else in that site. But now we know that each pstart causes those INFO messages to show up in all process logs.
The alert currently in production also includes an email notification, so if the alert is firing endlessly it will fill up my inbox, and drive me to distraction rather than Jennifer.
Thanks Michael and John, I think I have a much better understanding now of what was going on (and John, your explanation of the “why” sounds plausible).
The alert was for a Philips vital signs feed that was expected to be very reliable and consistent throughout the entire day, so not receiving any messages for as little as 90 seconds was considered proof there was a problem. If the Philips server was ever rebooted, it did not send a tcp/ip signal about the disconnect, so the engine side stayed in an "up" status, connected to a dead socket number. The exec action for the alert was to bounce the thread repeatedly until it reestablished a connection. The problem in our test environment was that as we prepped for a go-live, the clinical engineer shut off the feed to our test environment and used it for the new prod connection. Thus, the alert started triggering every two minutes because it wasn't receiving messages any more.
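In case it helps anyone setting up something similar, the exec action amounted to the standard pstop/pstart pair from the command line. The process and thread names below are placeholders, not the actual alert configuration:
Code:
# bounce the receiving thread so it re-issues the connection request
hcicmd -p your_process -c "your_thread pstop"
hcicmd -p your_process -c "your_thread pstart"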
John, you were on the right track about it having to do with an alert.
I work on the same team as Jennifer, and it appears these INFO messages were showing up in the logs of ALL the processes in that site.
I had a Last Received alert configured on an interface I was developing, which executed a pstop and pstart of the affected thread. For some reason it was causing these INFO messages to get posted to the logs of the other processes in that site. Maybe that's normal, but it was pretty obnoxious because the alert was triggering every 2 minutes. I've deleted the alert, and the only remaining INFO messages we are now seeing in the logs appear to be from some crontab scripts that also do stops and starts of threads.
Well, 28 reads and no replies. Maybe that's because it's actually a non-issue?
It bothers me that we get different output in 5.7 vs. 5.4, but I don't know enough about XML to say that 5.7 is not in fact more correct than 5.4. There doesn't seem to be a point to including that namespace declaration in the output when none of the elements in the output message are prefixed with "xs:". The person who is receiving the output from this interface says his side doesn't need (ignores) the namespace declaration, so he doesn't care if it's missing when sent from 5.7.
I guess if the schema had defined multiple namespaces, and elements in the message were assigned to different ones, then I’d be concerned if the namespace declarations were missing from output. But in this case, where everything in the schema belongs to a single namespace, then I guess it doesn’t matter that no namespace is defined in output.
Thanks, Charlie, but now I’m having the same problem as you – I can’t get the timeouts to happen anymore!
Right before I tried changing the braces to quotes (from your first reply), the timeouts seemed to stop occurring. I changed back to braces, set the repeat on the alert to happen every 3 minutes and let it run the rest of the day, and still no timeouts. Before I left for the day, I got the idea that maybe the frequent emails were keeping something “awake” on the SMTP relay server and thus preventing the timeouts, so I slowed down the repeat to 61 minutes and let it run overnight, and still no timeouts. If I can’t duplicate the problem, then I can’t figure out which coding change would solve it.
I did contact the admin for the SMTP server yesterday, so maybe he changed something that eliminated the timeouts. He did hint that he had been working with McAfee on some performance issues. So maybe the timeouts were just a transitory issue that will never happen again.
I am going to make the code modifications you suggested in your most recent reply, and leave it at that. Hopefully that will fix the “no such variable” error if a timeout ever occurs again. And if not, at least the timeouts and the variable error didn’t seem to be preventing the emails from being successfully sent.
It’s a bit rude of me to ignore Lawrence’s most recent posting, but I had been meaning to reply to Jim’s earlier post.
In general, I agree that it's a beautiful thing that one can do a Cloverleaf upgrade install without having to even stop running processes in the older release. But I did want to mention one 'gotcha' I encountered this week when I installed 5.7 Rev 2 on our production server (we are currently running 5.4). The install makes changes to the /etc/environment file, changing a section from:
# begin HCIenv 5.4
FPATH=/quovadx/qdx5.4/integrator/kshlib
QUOVADX_INSTALL_DIR=/quovadx/qdx5.4
# end HCIenv 5.4
to this:
# begin HCIenv 5.7
FPATH=/quovadx/qdx5.7/integrator/kshlib
CL_INSTALL_DIR=/quovadx/qdx5.7
# end HCIenv 5.7
We have a small number of scripts that are called from crontab using a non-hardcoded method to set the hci environment, basically just executing 'setroot' with no parameters and then executing the script. As soon as the install changed the FPATH variable in the environment file to point to 5.7, that's where the script started looking, even before hcisetenv had been copied to the qdx5.7 directory or execute permissions had been set.
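For reference, one of those non-hardcoded crontab entries looks roughly like this (the script path is just a placeholder):
Code:
# setroot is an FPATH function, so it follows whatever /etc/environment currently points at
15 19 * * * /usr/bin/ksh -c 'setroot; /home/hci/scripts/nightly_job.ksh'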
We have most of our other crontab jobs using a hard-coded method to set the hci environment, like:
15 19 * * * /usr/bin/ksh -c 'eval `/quovadx/qdx5.4/integrator/sbin/hcisetenv -root ksh /quovadx/qdx5.4/integrator cloprod3`;
And these jobs started failing in the 5.4 environment because:
“QUOVADX_INSTALL_DIR is not set. Exit.”
So once the install was completed I edited the /etc/environment file to be:
# begin HCIenv 5.7
FPATH=/quovadx/qdx5.4/integrator/kshlib
CL_INSTALL_DIR=/quovadx/qdx5.7
QUOVADX_INSTALL_DIR=/quovadx/qdx5.4
# end HCIenv 5.7
This seems to be keeping our crontab jobs running happily in the 5.4 environment for the time being, and later we will have to pick a point in the migration where we point FPATH back to 5.7.