netcfgLoad takes 2 minutes to execute


  • #54898
    Jeff Dinsmore
    Participant

      I use the netcfgLoad utility in some Tcl scripts, and suddenly it’s taking two minutes to execute where normally it’s MUCH quicker, like sub-second quicker.

      A process that reads all of my NetConfig files that two days ago was taking 4-5 seconds is now taking something like 650 seconds!!  That’s about 130 times longer.

      So, it’s taking an incredible amount of time, which is causing problems, obviously.

      What’s even more perplexing is that it’s taking the same amount of time on my TEST engine as well.  Very Strange.

      Do any of you know what might be causing it to run so slowly?  Does it connect to any running sites or processes to gather info, or does it just read the NetConfig file?

      If it matters, I’m running CL5.6 on RHEL.

      Jeff Dinsmore
      Chesapeake Regional Healthcare

    Viewing 9 reply threads
      • #83346
        James Cobane
        Participant

          Is it hanging up on a particular site’s NetConfig?  I thought the ‘netcfgLoad’ command simply reads the NetConfig and doesn’t connect to any running process.  Did any of these NetConfigs recently change?  If so, I would start with that to home in on the possible culprit.

          Jim Cobane

          Henry Ford Health

        • #83347
          Jeff Dinsmore
          Participant

            Jim,

            Thanks for your quick response.

            It happens on every site I’ve tried so far, but I’ll try others today.  I tried 3-4 on my Prod server and one on Test.  All behaved the same way.

            So, I don’t think it’s NetConfig related.

            Prior to this discovery, I rebooted our engine yesterday in a shotgun attempt to fix the problem.  So, the prod engine has been up less than 24 hours, the test engine has been up for 195 days and both exhibit the same behavior.

            I will continue to dig for clues and will add any that seem relevant to this discussion.

            Jeff Dinsmore
            Chesapeake Regional Healthcare

          • #83348
            Jeff Dinsmore
            Participant

              I found the source of the slowness.

              It’s a host checking loop in checkHostArray

              which is called by NCFG:reload

              which is called by NCFG:checkReload

              which is called by netcfgLoad

              So, I know what is causing the slowdown, but I don’t know why it’s suddenly taking so much longer to run…

              Jeff Dinsmore
              Chesapeake Regional Healthcare
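
One generic way to narrow a slowdown like this down to a single proc is to wrap the suspects with a timer. The sketch below is only an illustration, not necessarily how the loop above was found: `timedWrap`, `demoSlow`, and the `::callMillis` array are hypothetical names, and it assumes Tcl 8.5 or later for `clock milliseconds`.

```tcl
# Hypothetical helper: wrap an existing proc so that every call appends its
# elapsed wall-clock time (in ms) to the global array ::callMillis.
proc timedWrap { procName } {
    set orig "${procName}__orig"
    rename $procName $orig
    # Build a replacement proc that forwards all arguments to the original.
    # uplevel keeps upvar-style arguments working in the caller's frame.
    proc $procName { args } [format {
        set start [clock milliseconds]
        set code [catch { uplevel 1 [linsert $args 0 %s] } result options]
        lappend ::callMillis(%s) [expr { [clock milliseconds] - $start }]
        return -options $options $result
    } [list $orig] [list $procName]]
}

# Stand-in for a slow proc; in practice this would be a suspect proc.
proc demoSlow { ms } { after $ms; return done }

timedWrap demoSlow
demoSlow 50
puts "demoSlow took [lindex $::callMillis(demoSlow) end] ms"
```

Wrapping each proc in a suspected call chain and comparing the recorded times shows which level the time is actually spent in.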

            • #83349
              James Cobane
              Participant

                Does someone have one of the NetConfigs locked that might be causing it to take longer?  Not sure if this would be applicable, but just throwing out ideas…

                Jim Cobane

                Henry Ford Health

              • #83350
                Jeff Dinsmore
                Participant

                  Problem solved!

                  I’ve discovered that the radical slowdown was caused when we decommissioned a couple of our old DNS servers.

                  checkHostArray calls CHD:checkProtocolConfig

                  which calls CHD:badTcpHost

                  which validates TCP addresses/hostnames

                  After the old DNS went away, nslookup was no longer able to resolve “localhost” to an address.  And, when nslookup can’t resolve to an address, it takes a Long Time.

                  So, I made a couple changes.

                  First:

                  For situations where it’s desirable to run checkHostArray, I added a special case to CHD:badTcpHost that skips nslookup for “localhost” to prevent long execution times due to failure to resolve “localhost”. This improves the normal operation of checkHostArray by 10-20%.

                  I believe checkHostArray is run as part of process startup, so this change also appears to have slightly reduced the time it takes Cloverleaf processes to start.

                  Second:

                  My code that uses netcfgLoad instructs it to skip checkHostArray altogether.  This makes the reading of all NetConfigs at least 10 times faster than the 4-5 seconds it used to take.  Execution time for this mode is now something like 0.2-0.4 seconds.

                  Jeff Dinsmore
                  Chesapeake Regional Healthcare

                • #83351
                  James Cobane
                  Participant

                    Jeff,

                    Glad you were able to resolve it; excellent detective work!  This is good information.  Thanks for posting.

                    Jim Cobane

                    Henry Ford Health

                  • #83352

                    Jeff, would you please post your code? I’d like to use it myself. 🙂 And I may make a suggestion to R&D to update their code.

                    -- Max Drown (Infor)

                  • #83353
                    Jeff Dinsmore
                    Participant

                      Certainly.

                      In netData.tlib, I changed this:

                      Code:

                      proc CHD:badTcpHost { checkValue warnTextVar } {
                         global tcl_platform
                         upvar $warnTextVar warnText

                         set tcpAddrPttn {^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$}

                         if [regexp $tcpAddrPttn $checkValue] {
                            return 0
                         }

                         set nslook ""
                         if {[regexp -nocase -- {Windows} $tcl_platform(os)] } {
                            catch {set nslook [exec nslookup $checkValue 2>NUL]}
                         } else {
                            catch {set nslook [exec nslookup $checkValue 2>/dev/null]}
                         }

                         set nscheck "can't find $checkValue"
                         if { [regexp -nocase -- {Windows} $tcl_platform(os)] ||
                              [regexp -nocase -- {AIX} $tcl_platform(os)]     ||
                              [regexp -nocase -- {HP-UX} $tcl_platform(os)] } {
                            if { [lsearch [listTcpHosts] $checkValue] < 0 && ![regexp -- $checkValue $nslook]} {
                               set warnText "cannot find host '$checkValue'"
                               return 1
                            }
                         } else {
                            if { [lsearch [listTcpHosts] $checkValue] < 0 && [regexp -- $nscheck $nslook] } {
                               set warnText "cannot find host '$checkValue'"
                               return 1
                            }
                         }

                         return 0
                      }

                      To this:

                      Code:

                      proc CHD:badTcpHost { checkValue warnTextVar } {
                         global tcl_platform
                         upvar $warnTextVar warnText

                         set tcpAddrPttn {^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$}

                         if [regexp $tcpAddrPttn $checkValue] {
                            return 0
                         } elseif { [string tolower $checkValue] eq "localhost" } {
                            #puts "CHD:badTcpHost: LOCALHOST - return 0"
                            return 0
                         }

                         set nslook ""
                         if {[regexp -nocase -- {Windows} $tcl_platform(os)] } {
                            catch {set nslook [exec nslookup $checkValue 2>NUL]}
                         } else {
                            catch {set nslook [exec nslookup $checkValue 2>/dev/null]}
                         }

                         set nscheck "can't find $checkValue"
                         if { [regexp -nocase -- {Windows} $tcl_platform(os)] || [regexp -nocase -- {AIX} $tcl_platform(os)] || [regexp -nocase -- {HP-UX} $tcl_platform(os)] } {
                            if { [lsearch [listTcpHosts] $checkValue] < 0 && ![regexp -- $checkValue $nslook]} {
                               set warnText "cannot find host '$checkValue'"
                               return 1
                            }
                         } else {
                            if { [lsearch [listTcpHosts] $checkValue] < 0 && [regexp -- $nscheck $nslook] } {
                               set warnText "cannot find host '$checkValue'"
                               return 1
                            }
                         }

                         return 0
                      }

                      The only difference is the addition of this check for localhost:

                      Code:

                         } elseif { [string tolower $checkValue] eq "localhost" } {
                            #puts "CHD:badTcpHost: LOCALHOST - return 0"
                            return 0
                         }

                      And, I added a global option variable (__CRMC_SKIP_HOST_ARRAY_CHECK__) to NCFG:reload in nci.tlib that skips host verification entirely.

                      The rationale is that the host names/IP addresses change infrequently, so they don’t need validation every time one of my scripts wants a listing of sites/processes.

                      The modified code is here:

                      Code:

                      proc NCFG:reload { config } {

                         global __CRMC_SKIP_HOST_ARRAY_CHECK__

                         global _ncfg_ConnData _ncfg_ProcessData _ncfg_ProcessConns _ncfg_Version
                         global _ncfg_GroupConns _ncfg_TypeCounts

                         if { ! [info exists __CRMC_SKIP_HOST_ARRAY_CHECK__] } {
                            set __CRMC_SKIP_HOST_ARRAY_CHECK__ 0
                         }

                         # Remove, initialize the array
                         catch { unset _ncfg_ConnData _ncfg_ProcessData _ncfg_ProcessConns }
                         set   _ncfg_ConnData(bogusKey) {}
                         unset _ncfg_ConnData(bogusKey)
                         set   _ncfg_ProcessData(bogusKey) {}
                         unset _ncfg_ProcessData(bogusKey)
                         set   _ncfg_ProcessConns(bogusKey) {}
                         unset _ncfg_ProcessConns(bogusKey)
                         set   _ncfg_GroupConns(bogusKey) {}
                         unset _ncfg_GroupConns(bogusKey)

                         set _ncfg_TypeCounts(external) 0
                         set _ncfg_TypeCounts(vendor) 0

                         if { [catch {set ncf [open $config r]} msg] } {
                            error $msg
                         }
                         if {[uplevel #0 {info exists EuroBinary}]} {
                            fconfigure $ncf -encoding binary -eofchar {}
                         }

                         set minvers 3.0
                         set maxvers 3.10

                         if [getNetVersion $ncf _ncfg_Version] {
                            # Check the lower and upper bounds of compatible versions
                            if { $_ncfg_Version < $minvers || $_ncfg_Version > $maxvers } {
                               set    errMsg "$config is version $_ncfg_Version; "
                               append errMsg "versions $minvers through $maxvers are supported."

                               error $errMsg
                            }
                         } else {
                            error "Unable to fetch version from $config."
                         }

                         # Now process the data in the file

                         while { [lgets $ncf line] >= 0 } {
                            if { [lempty $line] } continue

                            lassign $line type name config

                            switch -exact -- $type {
                               protocol {
                                  if { [info exists _ncfg_ConnData($name)] } {
                                     close $ncf
                                     error "Duplicate NetConfig host entry: $name"
                                  }

                                  set _ncfg_ConnData($name) $config ;# Store conn data

                                  # Prepare the process to name mapping

                                  if { [keylget config PROCESSNAME procName] == 0 } {
                                     close $ncf
                                     error "PROCESSNAME missing for entry: $name"
                                  }
                                  lappend _ncfg_ProcessConns($procName) $name

                                  # Fetch the thread's CloverleafGateway type and
                                  # update the count.  If the thread is untyped, assume
                                  # vendor (vendor threads are more likely to be
                                  # license limited than external threads).
                                  # N.B.: this is _different_ than
                                  # engine/protocols/ProtocolThreadData.c
                                  # PR-2839: use $gwtype to not clobber $type
                                  if { [keylget config GATEWAYTHREADTYPE gwtype] == 0 } {
                                     set gwtype vendor
                                  }
                                  if { [string compare $gwtype external] && [string compare $gwtype vendor] } {
                                     close $ncf
                                     error "$name: bogus GATEWAYTHREADTYPE '$gwtype'"
                                  }
                                  incr _ncfg_TypeCounts($gwtype)

                                  # Add a protocol connection to all its groups

                                  if [cequal $type protocol] {
                                     if { [keylget config GROUPS groups] == 0 } {
                                        close $ncf
                                        error "GROUPS missing for entry: $name"
                                     }

                                     foreach grp $groups {
                                        lappend _ncfg_GroupConns($grp) $name
                                     }
                                  }
                               }

                               process {
                                  if { [info exists _ncfg_ProcessData($name)] } {
                                     close $ncf
                                     error "Duplicate NetConfig process entry: $name"
                                  }

                                  set _ncfg_ProcessData($name) $config ;# Store process data
                               }

                               default {
                                  # Just ignore everything else
                               }
                            }
                         }

                         # check the processes for errors
                         set verifyWarnList ""
                         if {[catch {checkProcessArray verifyWarnList _ncfg_ProcessData} msg]} {
                            puts stderr "\nNetConfig Error: $msg\n"
                         }

                         if { ! $__CRMC_SKIP_HOST_ARRAY_CHECK__ } {
                            # check the threads for errors
                            if {[catch {checkHostArray verifyWarnList _ncfg_ConnData} msg]} {
                               puts stderr "\nNetConfig Error: $msg\n"
                            }
                         }

                         close $ncf
                      }

                      Jeff Dinsmore
                      Chesapeake Regional Healthcare
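
As a usage sketch for the flag described in this post: a caller could set `__CRMC_SKIP_HOST_ARRAY_CHECK__` only for the duration of one load and then restore its previous state, so other code in the same interpreter still gets host verification. `withHostCheckSkipped` is a hypothetical helper, and `fakeLoad` merely stands in for a real `netcfgLoad` call so the example runs outside Cloverleaf.

```tcl
# Hypothetical helper: run a script with the skip flag set to 1, then restore
# whatever state the flag had before (including "not set at all").
proc withHostCheckSkipped { script } {
    global __CRMC_SKIP_HOST_ARRAY_CHECK__
    set hadValue [info exists __CRMC_SKIP_HOST_ARRAY_CHECK__]
    if { $hadValue } { set saved $__CRMC_SKIP_HOST_ARRAY_CHECK__ }
    set __CRMC_SKIP_HOST_ARRAY_CHECK__ 1

    # Run the caller's script, remembering any error so the flag is
    # restored before the error is re-raised.
    set code [catch { uplevel 1 $script } result options]

    if { $hadValue } {
        set __CRMC_SKIP_HOST_ARRAY_CHECK__ $saved
    } else {
        unset __CRMC_SKIP_HOST_ARRAY_CHECK__
    }
    return -options $options $result
}

# Stand-in for netcfgLoad (assumption): reports whether the flag is active.
proc fakeLoad {} {
    global __CRMC_SKIP_HOST_ARRAY_CHECK__
    return [expr { [info exists __CRMC_SKIP_HOST_ARRAY_CHECK__] && $__CRMC_SKIP_HOST_ARRAY_CHECK__ }]
}

set skipped [withHostCheckSkipped { fakeLoad }]
puts "flag active during load: $skipped"
```

In a real script the braces would hold the actual `netcfgLoad` call in place of `fakeLoad`.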

                    • #83354
                      David Barr
                      Participant

                        Why don’t you put localhost in the /etc/hosts file and avoid all this? You can put “hosts: files dns” in your /etc/nsswitch.conf file to tell the system to only query DNS if it can’t find a match in the hosts file. This is how my Red Hat system was configured by default.
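
To see whether this suggestion already applies on a given server, one could check whether the hosts file maps `localhost`. The sketch below parses hosts-file text; `hostsFileHasName` is a hypothetical helper, and the sample content is illustrative rather than taken from the thread.

```tcl
# Hypothetical helper: return 1 if hosts-file text contains an entry whose
# host name or alias matches $name (case-insensitive), else 0.
proc hostsFileHasName { content name } {
    set name [string tolower $name]
    foreach line [split $content "\n"] {
        # Drop comments, then split the rest on whitespace.
        regsub {#.*$} $line {} line
        set fields [regexp -all -inline {\S+} $line]
        # A usable entry needs an address plus at least one name.
        if { [llength $fields] < 2 } continue
        foreach alias [lrange $fields 1 end] {
            if { [string tolower $alias] eq $name } { return 1 }
        }
    }
    return 0
}

# Illustrative content; on a real system this would be read from /etc/hosts.
set sample "127.0.0.1   localhost localhost.localdomain\n# 10.0.0.5 oldhost\n"
puts [hostsFileHasName $sample localhost]
puts [hostsFileHasName $sample oldhost]
```

The first call prints 1 and the second prints 0, because the `oldhost` line is commented out. Note that `nslookup` queries DNS directly, so a hosts-file entry helps callers that go through the system resolver rather than `exec nslookup`.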

                      • #83355
                        David Barr
                        Participant

                          I guess the problem is that the script is calling “exec nslookup” instead of using host_info or another command that uses the system gethostbyname call.
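
A minimal sketch of that alternative, assuming the TclX `host_info` command is available (the Cloverleaf code above already relies on TclX commands such as `keylget` and `lgets`). `resolveHost` is a hypothetical name; it returns an empty list when the name cannot be resolved or when TclX is not loaded.

```tcl
# Try to load TclX; outside Cloverleaf it may not be installed.
catch { package require Tclx }

# Hypothetical helper: resolve a host name through the system resolver,
# which honors /etc/hosts and nsswitch.conf, instead of exec'ing nslookup.
proc resolveHost { name } {
    if { [llength [info commands host_info]] == 0 } {
        return {}
    }
    # host_info raises an error when the lookup fails.
    if { [catch { host_info addresses $name } addrs] } {
        return {}
    }
    return $addrs
}

puts "localhost -> [resolveHost localhost]"
```

A check like CHD:badTcpHost could then treat an empty result as "cannot find host" without parsing `nslookup` output.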
