Alerts Continue to Fire after Multiple hcimonitord Resets


  • #51475
    Mary Kobis
    Participant

    Hi all, we’re having an alert attack! We are on CL 5.7 in an AIX-HA environment. Saturday at 2am some of our SNA connections did not connect properly, which hung a process on the site. I stopped all processes and re-acquired SNA from the mainframe for all CICS regions. After restarting the processes something went awry and I had to “clean” the site. Ever since then our alerts keep firing even though all connections are working perfectly fine. It appears the clock/counter is not resetting. I am working with Tech Support but wanted to know if anyone has experienced this…

    Thank you, Mary. 😯

    • #70446
      Jennifer Hardesty
      Participant

      How was this resolved?

    • #70447
      Russ Ross
      Participant

      I once experienced alerts firing that didn’t make sense, like you are describing.

      I even shut down the monitor daemon and the alerts were still firing.

      That is when I realized I had an extra monitor daemon running without a corresponding pid file.

      Using the ps -ef command, I determined which pid was the extra hidden instance of the monitor daemon for the alert-stricken site and used the kill command on it, which solved the problem.

      Russ Ross
      RussRoss318@gmail.com
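The check described above (looking for a monitor daemon that has no matching pid file) can be sketched roughly as follows. This is only an illustration, not a Cloverleaf-supplied tool: the function names are hypothetical, the pid file location is passed in by the caller because the thread does not state it, and the sketch assumes the PID is the second column of `ps -ef` output. It only lists candidates; deciding which PID belongs to the affected site, and whether to kill it, is left to the administrator.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print the PID of every hcimonitord
# process whose PID differs from the one recorded in the pid file.
# $1 = recorded PID (may be empty if there is no pid file).
list_untracked_monitord() {
    recorded_pid="$1"
    awk -v keep="$recorded_pid" '
        # Skip the grep/awk helper processes themselves.
        /hcimonitord/ && !/awk/ && !/grep/ && $2 != keep { print $2 }
    '
}

# Hypothetical wrapper: $1 = path to the site monitor daemon pid file
# (the exact location is an assumption; check your own site layout).
check_site_monitord() {
    pid_file="$1"
    recorded=""
    if [ -r "$pid_file" ]; then
        recorded=$(cat "$pid_file")
    fi
    ps -ef | list_untracked_monitord "$recorded"
}
```

On a server with several sites, the output will also include the legitimate monitor daemons of the other sites, so each PID should be reviewed by hand before anything is killed.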

    • #70448
      Jennifer Hardesty
      Participant

      That doesn’t appear to be our problem. 🙁

      Early Monday morning, the production server had to be brought down rather ungracefully and the resuscitation was even less pretty.  It took around four hours working with Cloverleaf tech support to bring all four sites back up.

      (The root cause apparently has to do with some sort of runaway “process” on the server itself which eventually sucked up all the CPU memory and hung everything.)

      Anyway here’s the weird thing: Since the recovery, all of the “Last Received” alerts on Site 1 only (note: this is not a problem on any other site) falsely fire every single time.

      There are no problems with any other types of alerts.  They all appear to be working fine.

      I have tried deleting them all and then adding them back in.  I’ve tried changing the frequency of the checks, changing the email routing, etc.  It doesn’t matter, they still trigger falsely.

      And tech support believes they are “ghost” messages left over from the four hours when we were down on Monday. 🙄

      We’ve reinitialized Site 1 multiple times.  We’ve cleaned it, bounced it, stopped and restarted the lock manager and the monitor daemon.

      You can see that data is flowing through all of those threads.  They are all up to the minute.  Yet, we need these alerts to work for a reason.  Our on-call officers rely on those alerts and I am at a total loss.

    • #70449
      Russ Ross
      Participant

      I had one extreme case in the past where a badly behaving site would not cooperate no matter what I tried.

      I got to the point where I decided to take drastic measures, which might work for your situation.

      Here is a quick overview of what I did:

      – shut down the badly behaving site

      – use hcisiteinit to create a fresh clean site

      – copy the NetConfig, alerts, xlates, formats, anything I had configured from the badly behaving site to the fresh site

      – start the fresh site; my mystery problems no longer plagued me

      Russ Ross
      RussRoss318@gmail.com
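The copy step in the overview above might look something like the sketch below. It assumes the old site has already been shut down and the fresh site has already been created with hcisiteinit (not invoked here, since its options are not given in the thread). The function name `copy_site_config` is hypothetical, and the list of items to copy is supplied by the caller because the actual file and directory names depend on how the site is laid out.

```shell
#!/bin/sh
# Copy selected configuration items from an old site directory to a new one.
# $1 = old site directory, $2 = new site directory, remaining args = item
# names (files or directories) relative to the site directory.
copy_site_config() {
    old_site="$1"
    new_site="$2"
    shift 2
    for item in "$@"; do
        if [ -e "$old_site/$item" ]; then
            # -R copies directories recursively, -p preserves modes and times.
            cp -Rp "$old_site/$item" "$new_site/" && echo "copied $item"
        else
            echo "skipped $item (not found in $old_site)"
        fi
    done
}
```

A call could look like `copy_site_config /path/to/old_site /path/to/new_site NetConfig Xlate formats`, where the paths and item names are placeholders for whatever was actually configured on the old site. Items that are missing are reported rather than treated as errors, so the list can be reviewed afterwards.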

    • #70450
      Richard Hinson
      Participant

      [running 5.7 on AIX]

      We’ve been having alert malfunctions today. The monitor daemon has been killed and reloaded multiple times. I even changed the alert’s email message and trigger, then cycled the monitord, and it’s still sending out the old message and firing erroneously. There is no ghost monitord that I could find with ps -ef | grep monitord

      Does anyone know how to resolve this short of shutting down the site and reloading everything onto a new one?

    • #70451
      James Cobane
      Participant

      You might want to try loading a different alert configuration that has only a single basic alert to see if the problem goes away, then re-load your original alert configuration (or re-build it, one alert at a time).  I would suggest opening an incident with Infor before resorting to anything too drastic.

      Jim Cobane

      Henry Ford Health
