CPU idle% "is zero"


  • #52999
    Mitchell Rawlins
    Participant

      We’re running Cloverleaf 5.7 on RHEL (5.x, can’t remember which point version), and after about 90 days of uptime we get an alert that idle% is zero, which also tends to crash the monitor daemon.

      /opt/quovadx/qdx5.7/integrator/bin/top lists (for example)

      CPU states:  9.6% user,  0.0% nice, 15.5% system,  0.0% idle, 74.9% iowait

      while

      /usr/bin/top lists

      Cpu(s):  1.3%us,  1.7%sy,  0.0%ni, 86.1%id, 10.8%wa,  0.0%hi,  0.0%si,  0.0%st

      Cloverleaf top is version 3.6, and ldd outputs:

      linux-gate.so.1 =>  (0x009fa000)
      libm.so.6 => /lib/libm.so.6 (0x009b1000)
      libtermcap.so.2 => /lib/libtermcap.so.2 (0x00c3d000)
      libelf.so.1 => /usr/lib/libelf.so.1 (0x00b06000)
      libc.so.6 => /lib/libc.so.6 (0x00856000)
      /lib/ld-linux.so.2 (0x00837000)

      RedHat top is procps version 3.2.7, and ldd outputs:

      linux-gate.so.1 =>  (0x00cb2000)
      libproc-3.2.7.so => /lib/libproc-3.2.7.so (0x009b1000)
      libncurses.so.5 => /usr/lib/libncurses.so.5 (0x04e3f000)
      libc.so.6 => /lib/libc.so.6 (0x00856000)
      libdl.so.2 => /lib/libdl.so.2 (0x009dc000)
      /lib/ld-linux.so.2 (0x00837000)

      Has anyone else seen this before, or have an idea what causes it? Other than rebooting the server, does anybody know what could help fix this?

      • #76204
        John Parker
        Participant

          Not seen it in our Cloverleaf but have seen it on other Linux boxes.

          You have an IO issue going on and you need to isolate it so you can find a resolution.  There are tools to help you diagnose IO issues:  iostat, iotop and strace can get you started.

          Also, check your memory utilization and make sure you have enough memory for Linux to cache properly. You could have a memory leak that takes time to become critical.
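One way to watch for a slow leak is to log a few fields from /proc/meminfo at intervals and compare them over days. The following is a minimal sketch only, assuming Python 3 is available on the box; the function names `parse_meminfo` and `memory_summary` are made up for illustration, and "memory used by programs" is approximated here as MemTotal minus MemFree, Buffers and Cached.

```python
def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of integer values.

    Most values are in kB; the unit suffix is simply dropped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Split total memory into free, page cache, and everything else (kB)."""
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "cache_kb": cache,
        # Rough approximation: what is neither free nor page cache.
        "used_by_programs_kb": total - free - cache,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read the live file and summarize it (Linux only)."""
    with open(path) as f:
        return memory_summary(parse_meminfo(f.read()))
```

If `used_by_programs_kb` keeps climbing across days while `cache_kb` shrinks, that would be consistent with a leak squeezing out the page cache; a flat trend would point elsewhere.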

          Also check your disk storage subsystem and verify nothing is happening to decrease the throughput.

          For an overview and explanation of the terms, run man iostat and read the manpage. It explains what iowait and the other top values actually mean.

          Hope this helps.

          John Parker

          Oconee Medical Center

        • #76205
          Mitchell Rawlins
          Participant

            iostat is showing the average cpu is 3% iowait and 96% idle.

            iotop fluctuates a lot, with multiple threads vying for the top spot.  It doesn’t look like any single process is taking up very much IO.  

            Every tool from RedHat agrees there’s no IO problem.  The only tool that thinks anything’s wrong is the version of top that came with Cloverleaf.  We’re currently working under the hypothesis that the OS-bundled tools are more accurate.
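To check the numbers without relying on either copy of top, the CPU percentages can be derived directly from the kernel's counters in /proc/stat. The sketch below assumes Python 3 and the usual layout of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the function names are invented for illustration, and this is not a description of how either top binary does its calculation.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat on 2.6+ kernels.
# Older kernels may report fewer fields; zip() below simply stops early.
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait",
               "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    values = [int(v) for v in fields[1:1 + len(FIELD_NAMES)]]
    return dict(zip(FIELD_NAMES, values))


def cpu_percentages(before, after):
    """Convert two counter snapshots into percentages over the interval."""
    deltas = {name: after[name] - before[name] for name in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("counters did not advance between snapshots")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_cpu(interval=1.0, path="/proc/stat"):
    """Take two snapshots of the live file and print the percentages."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    for name, pct in cpu_percentages(first, second).items():
        print("%-8s %5.1f%%" % (name, pct))
```

Calling `sample_cpu()` on the affected server and comparing its idle and iowait figures with both versions of top would show which tool agrees with the raw counters. Python's integers do not overflow, so this calculation is unaffected by how large the cumulative counters have grown.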
