Connection bounce is resetting alerts


  • Creator
    Topic
  • #50458
    John Zalesak
    Participant

      I am new to alerts, but some are already set up in our system.  The employees who set them up are no longer here, so I am on my own.

      Here is what I want to do.

      I want to have an alert on last received.  I want to bounce the thread if it did not get something in the last 30 minutes.  If it did not get something in 95 minutes, I want an email.

      The idea is to bounce a couple of times and, if there are still no messages, alert someone.

      My problem is that I think the bounce from the shorter alert is resetting the start time of the longer alert, so the longer alert never goes off.

      For example: here is what I set up in a test site:

      First Bounce Alert

      Type: last receive

      Source: Inbound_side

      Source Count: any

      Comparing: >= 60

      Duration: once

      Second Email Alert

      Type: last receive

      Source: Inbound_side

      Source Count: any

      Comparing: >= 120

      Duration: once

      When I run it, the 60-second alert goes off continually but the 120-second alert never does.

      Any ideas on how to bring my idea into reality??

      Thanks in advance for your responses.

    Viewing 13 reply threads
    • Author
      Replies
      • #66158
        Robert Kersemakers
        Participant

          Hi John,

          It looks that way, yes. The first alert always fires, so maybe the second one doesn’t because the ‘trigger’ for this thread is reset.

          Have you tried putting the second alert before (i.e. above) the first alert in the Alert Configuration file? It should cause the second alert to fire (if >120), and if not, the first alert should still fire (if >60).

          Just a wild guess here…

          Zuyderland Medisch Centrum; Heerlen/Sittard; The Netherlands

        • #66159
          John Zalesak
          Participant

            Robert

            Thanks for your idea.  I will give it a try and let you know.

          • #66160
            John Zalesak
            Participant

              Robert,

              I tried your idea first.

              Alerts in this order

              NUMBER 1

              Type: last receive

              Source: Inbound_side

              Source Count: any

              Comparing: >=75

              Duration: Once

              NUMBER 2

              Type: last receive

              Source: Inbound_side

              Source Count: any

              Comparing: >=60

              Duration: Once

              -> All I ever got was Number 2 (60 secs)

              Then I tried

              NUMBER 1

              Type: last receive

              Source: Inbound_side

              Source Count: any

              Comparing: >=1

              Duration: nsec 75

              NUMBER 2

              Type: last receive

              Source: Inbound_side

              Source Count: any

              Comparing: >=1

              Duration: nsec 60

              The only one I got to fire was NUMBER 2 (61 sec)

              I am really grabbing at straws here.

              Can anyone point me in the correct direction??

              Thanks!

            • #66161
              John Zalesak
              Participant

                Here is a little more info.  I checked my hcimonitord file.  Looks OK to me.  The longer alert (Alert #12) never fires.

                Any comments or assistance is greatly appreciated.

                [aler:aler:INFO/0:  hcimonitord:11/13/2008 11:15:02] New alert #12:

                {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 75}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                [aler:aler:INFO/0:  hcimonitord:11/13/2008 11:15:02] can’t read “HCISITEDIR”: no such variable

                [aler:aler:INFO/0:  hcimonitord:11/13/2008 11:15:02] New alert #13:

                {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 60}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:16:07] Alert #13 triggered.

                alert: {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 60}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                action: IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – Thread last inbound message received time of Inbound_side has been more than or equal to 1for 60 seconds. Connection bounced.” &

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:16:07] Completed Cascade Actions

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:17:22] Alert #13 triggered.

                alert: {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 60}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                action: IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – Thread last inbound message received time of Inbound_side has been more than or equal to 1for 60 seconds. Connection bounced.” &

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:17:22] Completed Cascade Actions

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:18:37] Alert #13 triggered.

                alert: {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 60}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                action: IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – Thread last inbound message received time of Inbound_side has been more than or equal to 1for 60 seconds. Connection bounced.” &

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:18:37] Completed Cascade Actions

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:19:53] Alert #13 triggered.

                alert: {VALUE lastr} {SOURCE {Inbound_side }} {MODE actual} {WITH -2} {COMP {>= 1}} {FOR {nsec 60}} {WINDOW {* * * * * *}} {HOST {}} {ACTION {{exec {IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – %A. Connection bounced.”}}}}

                action: IMalerts_jtz.sh sitejtz tst_pro Inbound_side IM_Alert_Last_Recd “$HCISITEDIR – Thread last inbound message received time of Inbound_side has been more than or equal to 1for 60 seconds. Connection bounced.” &

                [aler:aler:WARN/0:  hcimonitord:11/13/2008 11:19:53] Completed Cascade Actions

              • #66162
                Michael Hertel
                Participant

                  Yes you are grabbing.

                  By bouncing you are resetting the stats on the thread, so your email will never fire.
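A toy model may make this easier to see. The sketch below is not Cloverleaf code; it only assumes, as described above, that a bounce resets the thread's idle time to zero. The function name `simulate` and its arguments are made up for illustration.

```shell
#!/bin/sh
# Toy model only: assumes a bounce resets the thread's idle timer to zero.
# Usage: simulate BOUNCE_AT EMAIL_AT TOTAL_SECONDS
simulate() {
    bounce_at=$1; email_at=$2; total=$3
    t=0; idle=0; bounces=0; emailed=no
    while [ "$t" -lt "$total" ]; do
        t=$((t + 1))
        idle=$((idle + 1))
        # The longer alert needs the idle time to reach email_at.
        if [ "$idle" -ge "$email_at" ]; then
            emailed=yes
        fi
        # The shorter alert bounces the thread, which zeroes the idle time.
        if [ "$idle" -ge "$bounce_at" ]; then
            bounces=$((bounces + 1))
            idle=0
        fi
    done
    echo "bounces=$bounces emailed=$emailed"
}

simulate 60 120 600   # prints: bounces=10 emailed=no
```

With a 60-second bounce and a 120-second email threshold, the idle time in this model never gets past 60, which matches what the test site shows.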

                • #66163
                  John Zalesak
                  Participant

                    So what is the trick?

                    I would like to have it try to fix itself a few times before I get woken up at 3 am.

                    Is there another way to come at this problem???

                  • #66164
                    Michael Hertel
                    Participant

                      You could write an alert script that reads from and writes to a file, from which you could extract previous stats.

                      If it were me, I’d email with the first alert and when I see multiple back to back emails I would know there is an issue.

                      You might even be able to have the alert script fire off and look again in x amount of time to see if it fixed itself.
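The "look again in x amount of time" idea could be sketched roughly like this. It is a generic shell helper, not part of Cloverleaf; `recheck_after`, `thread_is_receiving` and `send_page` are hypothetical names, and the actual check would have to be whatever your site uses to tell that messages are flowing again.

```shell
#!/bin/sh
# recheck_after DELAY_SECONDS CHECK_CMD [ARGS...]
# Waits DELAY_SECONDS, then runs the check command.
# Returns 0 if the check passes (problem cleared), 1 if it still fails.
recheck_after() {
    delay=$1
    shift
    sleep "$delay"
    if "$@"; then
        return 0
    fi
    return 1
}

# Hypothetical usage from the alert script, after the bounce:
#   recheck_after 300 thread_is_receiving Inbound_side || send_page "still down"
```

Because the alert action is launched in the background, the sleep should not hold up the monitor daemon, but that is worth confirming on a test site.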

                      I’ll post one of our scripts in a minute.

                    • #66165
                      Michael Hertel
                      Participant

                        Here is part of our script:

                        Look at the outbound qdepth portion to see the sleep statement.



                        #!/hci/root/bin/hcitcl
                        set ts [clock format [clock seconds]]


                      • #66166
                        John Zalesak
                        Participant

                          Michael,

                          Thanks for the script.  I go to Tcl class next week so maybe it will make a little more sense when I get back…. If I pay attention!

                          Currently, we do get the email on the first alert.  But after being woken up at 3 am for a couple of days, only to say "it's the first one, go back to bed", you start thinking about doing something different.

                          Our 1st idea (simple) is to

                          Set alert to bounce if none received in 45 minutes

                          Set alert to send e-mail to my BlackBerry (and get me out of bed) if the thread has been in the opening state for 30 minutes.

                          Hopefully the bounce fixes the problem.  If a connection cannot be made within 30 minutes after the bounce, it's time to get woken up.

                          Our 2nd idea (more complex) is to

                          Set alert to bounce if none received in, say, 15 minutes

                          The script that bounces should write a time stamp to a log file.

                          The script will also read the log file, do an analysis (say, maybe 4 bounces in a row), and if need be send an email to my BlackBerry to wake me up.

                          Any comments / suggestions would be greatly appreciated.

                        • #66167
                          Michael Hertel
                          Participant

                            Good luck with the class 😀

                          • #66168
                            John Zalesak
                            Participant

                              thanks

                            • #66169
                              James Cobane
                              Participant

                                John,

                                You could utilize the counter functions provided by Cloverleaf (e.g. CtrNextValue, CtrResetValue) within a Tcl script to keep a count of how many times the first alert fires.  You could run this script on your 30-minute alert; if the counter value is >= your desired number (e.g. 3), then you could reset the counter and trigger the e-mail within the script via an ‘exec’ command.
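To illustrate the count-then-escalate logic only, here is a minimal shell sketch that keeps the count in a plain file. It does not use the Cloverleaf counter commands (whose exact arguments should be taken from the Reference Manual); `bump_counter` and the file it writes are assumptions made for the example.

```shell
#!/bin/sh
# bump_counter FILE THRESHOLD
# Increments the count stored in FILE. When the count reaches THRESHOLD,
# resets it to 0 and prints "escalate"; otherwise prints "wait".
bump_counter() {
    file=$1; threshold=$2
    count=0
    # A missing or empty file counts as zero.
    if [ -s "$file" ]; then
        count=$(cat "$file")
    fi
    count=$((count + 1))
    if [ "$count" -ge "$threshold" ]; then
        echo 0 > "$file"
        echo escalate
    else
        echo "$count" > "$file"
        echo wait
    fi
}
```

The bounce script would call something like `bump_counter /some/state/file 3` on each firing and send the e-mail only when it prints `escalate`. Note that this sketch never resets the count when traffic resumes, so on its own it cannot tell three bounces in a row from three bounces spread over a week.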

                                Also, I don’t believe the order of the alerts has any effect; it’s all based on the defined conditions.  Order would likely only play a role if two alerts had the same condition defined.

                                Hope this helps.

                                Jim Cobane

                                Henry Ford Health

                              • #66170
                                John Zalesak
                                Participant

                                  James, thanks for your comments.

                                  I agree, in my testing, the order of the alerts has no effect.

                                  Thanks for pointing out the counter functions in Cloverleaf.  I had no idea they were there.  Where do I find out about them?  Are there counters that will tell us there were no messages over the last 3 bounces (alerts)?

                                  We thought of a similar idea with our own counter stored in a file.  Our problem was how to differentiate between 3 bounces in the last 90 minutes and 3 bounces in the last week.

                                  Thanks again.

                                • #66171
                                  James Cobane
                                  Participant

                                    John,

                                    The counter functions are documented in the ‘Reference Manual’ under Tcl extensions, Counter Commands.  The counter commands allow you to create your own counter files for whatever use you desire (e.g. creating/maintaining sequence numbers).  To determine whether the count is for the last 90 minutes or the past week, you’ll probably have to store the date/time info in a file to use for comparison.
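One way to store the date/time info might look like the sketch below: each bounce appends an epoch timestamp to a log, and the script counts only the entries inside a time window. The function names and the log file are hypothetical, and the sketch assumes the system's `date` supports `+%s`.

```shell
#!/bin/sh
# record_bounce LOG [NOW]
# Appends an epoch timestamp (default: current time) to LOG.
record_bounce() {
    log=$1
    now=${2:-$(date +%s)}
    echo "$now" >> "$log"
}

# count_recent LOG WINDOW_SECONDS [NOW]
# Prints how many logged bounces fall within the last WINDOW_SECONDS.
count_recent() {
    log=$1; window=$2
    now=${3:-$(date +%s)}
    cutoff=$((now - window))
    n=0
    if [ -f "$log" ]; then
        while read -r ts; do
            if [ "$ts" -ge "$cutoff" ]; then
                n=$((n + 1))
            fi
        done < "$log"
    fi
    echo "$n"
}
```

For the 90-minute case, the bounce script could call `record_bounce` and then send the e-mail only if `count_recent` with a window of 5400 seconds reports 3 or more. Old entries simply age out of the window, so bounces from last week no longer count.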

                                    Jim Cobane

                                    Henry Ford Health

                                    P.S.  Have fun in your tcl class!
