All sites Down – Swap Space Issue


  • #48459
    Amit Anand
    Participant

      Hi Guys

      We have around 200 real-time interfaces. Our interface engine went down yesterday. When we manually brought the sites back up, everything came up fine.

      There is not much in the logs to help us identify why the engine went down, except that the logs for one of the sites show “WARNING: terminating due to swap space shortage”.

      We already have 4 blocks for 512 MB assigned for swap space.

      Can you think of any reason why swap space would have run short at that time? I would really appreciate your help.
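One way to narrow down when the shortage hit, and which sites were affected, is to scan each site's logs for the warning quoted above. The following is only a minimal sketch: the function name `find_swap_warnings` is hypothetical, and it assumes the logs are plain text files readable line by line.

```python
# Text of the warning quoted in the post; compared case-insensitively.
WARNING = "terminating due to swap space shortage"


def find_swap_warnings(log_lines):
    """Return (line_number, line) pairs for lines containing the warning.

    `log_lines` is any iterable of strings, for example an open log file.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if WARNING in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file object for each site's log (the paths depend on the installation and are not assumed here) gives the line numbers to compare against timestamps in the surrounding log entries.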

      Thanks

      Amit

    Viewing 2 reply threads
      • #58691
        Anonymous
        Participant

          Amit,

          How much memory do you have? I think the recommended configuration is 4 GB of RAM and 4 GB of swap. It seems that you are using swap more than you should. If you increase the RAM you will avoid this problem, and at the same time you should see a BIG improvement in performance.

          I remember having a problem where some processes kept using more and more memory. Only by bouncing a process could we recover the resources. The only workaround we found was to bounce those processes automatically with a cron job.

          None of those processes had message leaks (messages that were neither killed nor continued).

          We still have that cron job that bounces the processes that acquire too much memory, but I don’t think it has actually bounced any process since we moved to version 5.2.

          We are on AIX and Sun Solaris, and we experienced the same problem on both OSes before we upgraded.
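The check behind a cron job like the one described above could look roughly like this. It is a sketch under stated assumptions, not the poster's actual script: it assumes the POSIX form `ps -e -o pid,vsz,comm` is available (virtual size reported in kilobytes), that engine processes can be recognized by the command name `hciengine`, and that a 500000 KB limit is reasonable. The name match and the limit are both placeholders. Only the detection is shown; the bounce itself is left to whatever stop and start procedure the site already uses.

```python
import subprocess


def snapshot():
    """Run ps and return its raw text output (header line plus one line per process)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,vsz,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_ps(output):
    """Parse `ps -e -o pid,vsz,comm` text into (pid, vsz_kb, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        # Ignore blank or malformed lines rather than failing the whole run.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def over_limit(rows, name, limit_kb):
    """Return the rows whose command contains `name` and whose size exceeds `limit_kb`."""
    return [row for row in rows if name in row[2] and row[1] > limit_kb]
```

Run from cron, `over_limit(parse_ps(snapshot()), "hciengine", 500000)` would return the candidates to bounce. Logging them for a few days before enabling any automatic restart shows whether the limit is sensible.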

        • #58692
          Frank Hartmann
          Participant

            Amit,

            We ran into a similar situation a couple of months ago. It turned out that our database was the culprit: it was using all of our swap space because we weren’t correctly shrinking it back down and recovering the space during our monthly maintenance. Once we got back to recovering that memory from the database, it hasn’t happened again.

          • #58693
            Amit Anand
            Participant

              Hi Frank

              Good to hear from you…it has been almost a year since the training.

              Just to verify: when you say clean the database, do you mean cleaning the Recovery and Error databases of the various sites by using hcidbinit -r, hcidbinit -e, or hcidbinit -AC for all three databases?

              Thanks for your help.

              Amit

          • The forum ‘General’ is closed to new topics and replies.