100% CPU

  • #54352
    Mark Brown
    Participant

      Yesterday, we moved our production server to a virtual Windows 2008 server and upgraded Cloverleaf to 6.02 and it’s been a nightmare ever since.

      When the server was first turned on, the CPU went straight to 100%. We expanded the number of CPUs and shut down the threads, and after bringing the site back up, everything was fine for a minute, then it went back to 100%. Then we disabled the virus checker, since that’s caused issues before, and everything was great for a few hours. At midnight, when it does its cyclesaves, the CPU went back to 100%. I shut down one site and it seemed stable after I brought it back up. This morning, back to 100%.

      I called Infor support and they looked at it and felt it was a resource problem, since it only had 4GB instead of 8GB. We added more memory, rebooted and everything looked good for almost a minute.

      Then I considered that it might be the number of processes running. When Task Manager shows fewer than 90 processes, it runs fine. It will run okay for a while if I leave one process on the engine (it doesn’t matter which) turned off. I disabled a process that wasn’t really important and that worked for about half an hour.

      Nothing I do seems to be a permanent fix. Everything will be fine until, suddenly, the CPU usage ramps up to 100% and never comes down.

      I thought about moving a process or two to a different site. I’m running two sites now. The smaller site never seems to have a problem.

      Any ideas on what to look for? This is a production server. I can’t just keep starting and stopping processes.

    Viewing 8 reply threads
      • #81127
        Rob Lindsey
        Participant

          Is there a single process that is staying at the top of the process list? We are on AIX so I am not sure how the processes look on a Windows system. What other differences are there in the system settings between the physical and virtual systems?

          As a last resort, can you go back to the physical system until you can figure out what is going on?

          Rob

        • #81128
          Mark Brown
          Participant

            No, there isn’t any specific process that seems to be the cause.

          • #81129
            Elisha Gould
            Participant

              Make sure you have enough physical resources on the VM server, and that there isn’t something triggering a failover.

              The main causes of your issue would normally be:

              * Processes swapping CPU cores frequently.

              * Low memory, with the system starting to use disk cache (check both the VM and the VM host).

              * Insufficient disk I/O.

              You may need to set up the VM so it has dedicated resources on the machine and is prevented from swapping to other cores. It’s best to ensure that cache is never used. Check the other VMs on the host and remove any that would cause problems for another server.

              We have also had issues when the SAN was performing poorly. Normally our disk I/O takes less than 1 ms, but at one point it got up to 10-15 ms, which caused a huge spike.
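
One rough way to check the latency figures above from inside the guest is to time small flushed writes. This is only a sketch: the function names, the sample count and the 10 ms warning threshold are assumptions for illustration (the threshold is borrowed from the 10-15 ms figure in this post), and a write-plus-flush timing is not the same measurement a SAN console reports.

```python
import os
import statistics
import tempfile
import time


def probe_write_latency_ms(directory, samples=20, block_size=4096):
    """Time small flushed writes in `directory`; return latencies in ms."""
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the write through to storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies_ms, warn_ms=10.0):
    """Return the median, the worst case, and whether the median reaches warn_ms."""
    median = statistics.median(latencies_ms)
    return {
        "median_ms": median,
        "max_ms": max(latencies_ms),
        "slow": median >= warn_ms,
    }


if __name__ == "__main__":
    # Probes the system temp directory; point it at the engine's disk instead.
    print(summarize(probe_write_latency_ms(tempfile.gettempdir())))
```

Running it a few times while the engine is quiet and again during a spike gives a before/after comparison; a single run on its own says little.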

            • #81130
              Mark Brown
              Participant

                Thanks for the replies. I’ll pass this on to our server guy. Initially I was only given 4GB, of which almost 70% was allocated, which I’m sure was causing a lot of disk caching. I now have 8GB and I have removed a couple of processes, and right now the engine is quiet. I hope it stays that way.

              • #81131
                Jim Kosloskey
                Participant

                  Mark,

                  Could it be you have over-saturated a site?

                  You say you have 2 sites.

                  How many processes/threads are in that site, and what is the peak demand (say, the largest number of messages in any 15-minute window)?
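
The peak-demand figure asked about here can be computed with a sliding window over message timestamps. A minimal sketch, assuming you can already export one timestamp per message from somewhere; `peak_window_count` is a made-up name, and how the timestamps are obtained is left open.

```python
from datetime import datetime, timedelta


def peak_window_count(timestamps, window=timedelta(minutes=15)):
    """Return the largest number of timestamps that fall within any one window."""
    ordered = sorted(timestamps)
    best = 0
    left = 0
    for right, current in enumerate(ordered):
        # Advance the left edge until the span is shorter than the window.
        while current - ordered[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    base = datetime(2020, 1, 1, 0, 0)
    times = [base + timedelta(minutes=m) for m in (0, 5, 14, 15, 16, 40)]
    print(peak_window_count(times))
```

The window is treated as half-open, so two messages exactly 15 minutes apart are not counted in the same window.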

                  Did you upgrade from 5.x Cloverleaf? If so, I think 6.x consumes more resources to some degree (SMAT certainly can consume more disk unless upgrading from a later 5.x).

                  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.

                • #81132
                  Mark Brown
                  Participant

                    Between the two sites, there are 202 threads and 35 processes.

                    When it’s running normally, all the processes in Task Manager almost always show 0 CPU usage. When the server hits 100%, there are several processes each running at 5% or more, but added together they don’t come to 100%. All the processes using the most CPU are hciengine.exe processes, and it doesn’t seem to be any particular one causing it.
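
To put a number on how much of the 100% the listed processes actually account for, the readings can be grouped by image name and subtracted from the total. This is a sketch only: `cpu_breakdown` is a hypothetical helper, the readings in the example are invented, and what the unattributed remainder represents (kernel or interrupt time, for instance) is a guess that would need checking on the server.

```python
from collections import defaultdict


def cpu_breakdown(total_percent, samples):
    """Group per-process CPU readings by image name and report the remainder.

    samples: iterable of (image_name, cpu_percent) pairs, e.g. read off Task Manager.
    """
    by_image = defaultdict(float)
    for image_name, cpu_percent in samples:
        by_image[image_name.lower()] += cpu_percent
    attributed = sum(by_image.values())
    return {
        "by_image": dict(by_image),
        "attributed": attributed,
        "unattributed": max(0.0, total_percent - attributed),
    }


if __name__ == "__main__":
    # Invented readings for illustration; not taken from the server in this thread.
    readings = [
        ("hciengine.exe", 7),
        ("hciengine.exe", 6),
        ("hciengine.exe", 5),
        ("other.exe", 2),
    ]
    print(cpu_breakdown(100, readings))
```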

                  • #81133
                    Mark Brown
                    Participant

                      I’m still waiting for the permanent license.  Would a temporary license be limited?

                    • #81134
                      Jim Kosloskey
                      Participant

                        Mark,

                        As far as I know, a temporary license is not constrained from a resource standpoint. I think it is only a date-based constraint.

                        email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.

                      • #81135
                        Mark Brown
                        Participant

                          I hate to say anything in case I jinx it, but we added 4 more cores for a total of 8 and the engine is barely breathing, so I guess that was the problem.

                          Thanks for the replies.
