netcfgLoad takes 2 minutes to execute
A process that reads all of my NetConfig files was taking 4-5 seconds two days ago and is now taking something like 650 seconds!! That’s roughly 130 times longer.
So, it’s taking an incredible amount of time, which is causing problems, obviously.
What’s even more perplexing is that it’s taking the same amount of time on my TEST engine as well. Very Strange.
Do any of you know what might be causing it to run so slowly? Does it connect to any running site or process to gather info, or does it just read the NetConfig file?
If it matters, I’m running CL5.6 on RHEL.
Jeff Dinsmore
Chesapeake Regional Healthcare
Is it hanging up on a particular site’s NetConfig? I thought the ‘netcfgLoad’ command simply read the NetConfig and didn’t connect to any running process. Did any of these NetConfigs change recently? If so, I would start with that to home in on the possible culprit.
Jim Cobane
Henry Ford Health
Jim,
Thanks for your quick response.
It happens with every site I’ve tried, but I will try others today. I tried 3-4 on my Prod server and one on Test. All behaved the same way.
So, I don’t think it’s NetConfig related.
Prior to this discovery, I rebooted our engine yesterday in a shotgun attempt to fix the problem. So, the prod engine has been up less than 24 hours, the test engine has been up for 195 days and both exhibit the same behavior.
I will continue to dig for clues and will add any that seem relevant to this discussion.
Jeff Dinsmore
Chesapeake Regional Healthcare
I found the source of the slowness.
It’s a host checking loop in checkHostArray
which is called by NCFG:reload
which is called by NCFG:checkReload
which is called by netcfgLoad
So, I know what is causing the slowdown, but I don’t know why it’s suddenly taking so much longer to run…
Jeff Dinsmore
Chesapeake Regional Healthcare
Does someone have one of the NetConfigs locked that might be causing it to take longer? Not sure if this would be applicable, but just throwing out ideas…
Jim Cobane
Henry Ford Health
Problem solved!
I’ve discovered that the radical slowdown was caused when we decommissioned a couple of our old DNS servers.
checkHostArray calls CHD:checkProtocolConfig
which calls CHD:badTcpHost
which validates TCP addresses/hostnames
After the old DNS went away, nslookup was no longer able to resolve “localhost” to an address. And, when nslookup can’t resolve to an address, it takes a Long Time.
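To see the delay for yourself, something like the following could be run from any Tcl shell. This is only a rough sketch: timeExec is a made-up helper name, not part of Cloverleaf, and it simply measures how long an external command takes whether it succeeds or fails.

```tcl
# Hypothetical helper (not part of Cloverleaf): run an external command
# and return the elapsed wall-clock time in milliseconds.
proc timeExec { args } {
    set start [clock clicks -milliseconds]
    # catch swallows the error exec raises when the command fails,
    # so a failed lookup is timed just like a successful one.
    catch { eval exec $args } output
    return [expr { [clock clicks -milliseconds] - $start }]
}

# Example use, comparing a hostname against a plain IP address:
#   puts "localhost: [timeExec nslookup localhost 2>/dev/null] ms"
#   puts "127.0.0.1: [timeExec nslookup 127.0.0.1 2>/dev/null] ms"
```

If the first number is in the seconds range while the second is near zero, name resolution is the likely bottleneck.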
So, I made a couple changes.
First:
For situations where it’s desirable to run checkHostArray, I added a special case to CHD:badTcpHost that skips nslookup for “localhost” to prevent long execution times due to failure to resolve “localhost”. This improves the normal operation of checkHostArray by 10-20%.
I believe checkHostArray is run as part of process startup, so this change also appears to have slightly reduced the time it takes for Cloverleaf processes to start.
Second:
My code that uses netcfgLoad instructs it to skip checkHostArray altogether. This makes the reading of all NetConfigs at least 10 times faster than the 4-5 seconds it used to take. Execution time for this mode is now something like 0.2-0.4 seconds.
Jeff Dinsmore
Chesapeake Regional Healthcare
Jeff,
Glad you were able to resolve it; excellent detective work! This is good information. Thanks for posting.
Jim Cobane
Henry Ford Health
Jeff, would you please post your code? I’d like to use it myself. 🙂 And I may make a suggestion to R&D to update their code.
-- Max Drown (Infor)
Certainly.
In netData.tlib, I changed this:
proc CHD:badTcpHost { checkValue warnTextVar } {
    global tcl_platform
    upvar $warnTextVar warnText
    set tcpAddrPttn {^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$}
    if [regexp $tcpAddrPttn $checkValue] {
        return 0
    }
    set nslook ""
    if {[regexp -nocase -- {Windows} $tcl_platform(os)] } {
        catch {set nslook [exec nslookup $checkValue 2>NUL]}
    } else {
        catch {set nslook [exec nslookup $checkValue 2>/dev/null]}
    }
    set nscheck "can't find $checkValue"
    if { [regexp -nocase -- {Windows} $tcl_platform(os)] ||
         [regexp -nocase -- {AIX} $tcl_platform(os)] ||
         [regexp -nocase -- {HP-UX} $tcl_platform(os)] } {
        if { [lsearch [listTcpHosts] $checkValue] < 0 && ![regexp -- $checkValue $nslook]} {
            set warnText "cannot find host '$checkValue'"
            return 1
        }
    } else {
        if { [lsearch [listTcpHosts] $checkValue] < 0 && [regexp -- $nscheck $nslook] } {
            set warnText "cannot find host '$checkValue'"
            return 1
        }
    }
    return 0
}
To this:
proc CHD:badTcpHost { checkValue warnTextVar } {
    global tcl_platform
    upvar $warnTextVar warnText
    set tcpAddrPttn {^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$}
    if [regexp $tcpAddrPttn $checkValue] {
        return 0
    } elseif { [string tolower $checkValue] eq "localhost" } {
        #puts "CHD:badTcpHost: LOCALHOST - return 0"
        return 0
    }
    set nslook ""
    if {[regexp -nocase -- {Windows} $tcl_platform(os)] } {
        catch {set nslook [exec nslookup $checkValue 2>NUL]}
    } else {
        catch {set nslook [exec nslookup $checkValue 2>/dev/null]}
    }
    set nscheck "can't find $checkValue"
    if { [regexp -nocase -- {Windows} $tcl_platform(os)] || [regexp -nocase -- {AIX} $tcl_platform(os)] || [regexp -nocase -- {HP-UX} $tcl_platform(os)] } {
        if { [lsearch [listTcpHosts] $checkValue] < 0 && ![regexp -- $checkValue $nslook]} {
            set warnText "cannot find host '$checkValue'"
            return 1
        }
    } else {
        if { [lsearch [listTcpHosts] $checkValue] < 0 && [regexp -- $nscheck $nslook] } {
            set warnText "cannot find host '$checkValue'"
            return 1
        }
    }
    return 0
}
The only difference is the addition of this check for localhost:
    } elseif { [string tolower $checkValue] eq "localhost" } {
        #puts "CHD:badTcpHost: LOCALHOST - return 0"
        return 0
    }
And, I added a global option variable (__CRMC_SKIP_HOST_ARRAY_CHECK__) in nci.tlib to NCFG:reload to skip host verification entirely.
The rationale is that the host names/IP addresses change infrequently, so don’t need validation every time one of my scripts wants a listing of sites/processes.
The modified code is here:
proc NCFG:reload { config } {
    global __CRMC_SKIP_HOST_ARRAY_CHECK__
    global _ncfg_ConnData _ncfg_ProcessData _ncfg_ProcessConns _ncfg_Version
    global _ncfg_GroupConns _ncfg_TypeCounts
    if { ! [info exists __CRMC_SKIP_HOST_ARRAY_CHECK__] } {
        set __CRMC_SKIP_HOST_ARRAY_CHECK__ 0
    }
    # Remove, initialize the array
    catch { unset _ncfg_ConnData _ncfg_ProcessData _ncfg_ProcessConns }
    set _ncfg_ConnData(bogusKey) {}
    unset _ncfg_ConnData(bogusKey)
    set _ncfg_ProcessData(bogusKey) {}
    unset _ncfg_ProcessData(bogusKey)
    set _ncfg_ProcessConns(bogusKey) {}
    unset _ncfg_ProcessConns(bogusKey)
    set _ncfg_GroupConns(bogusKey) {}
    unset _ncfg_GroupConns(bogusKey)
    set _ncfg_TypeCounts(external) 0
    set _ncfg_TypeCounts(vendor) 0
    if { [catch {set ncf [open $config r]} msg] } {
        error $msg
    }
    if {[uplevel #0 {info exists EuroBinary}]} {
        fconfigure $ncf -encoding binary -eofchar {}
    }
    set minvers 3.0
    set maxvers 3.10
    if [getNetVersion $ncf _ncfg_Version] {
        # Check the lower and upper bounds of compatible versions
        if { $_ncfg_Version < $minvers || $_ncfg_Version > $maxvers } {
            set errMsg "$config is version $_ncfg_Version; "
            append errMsg "versions $minvers through $maxvers are supported."
            error $errMsg
        }
    } else {
        error "Unable to fetch version from $config."
    }
    # Now process the data in the file
    while { [lgets $ncf line] >= 0 } {
        if { [lempty $line] } continue
        lassign $line type name config
        switch -exact -- $type {
            protocol {
                if { [info exists _ncfg_ConnData($name)] } {
                    close $ncf
                    error "Duplicate NetConfig host entry: $name"
                }
                set _ncfg_ConnData($name) $config ;# Store conn data
                # Prepare the process to name mapping
                if { [keylget config PROCESSNAME procName] == 0 } {
                    close $ncf
                    error "PROCESSNAME missing for entry: $name"
                }
                lappend _ncfg_ProcessConns($procName) $name
                # Fetch the thread's CloverleafGateway type and
                # update the count. If the thread is untyped, assume
                # vendor (vendor threads are more likely to be
                # license limited than external threads).
                # N.B.: this is _different_ than
                # engine/protocols/ProtocolThreadData.c
                # PR-2839: use $gwtype to not clobber $type
                if { [keylget config GATEWAYTHREADTYPE gwtype] == 0 } {
                    set gwtype vendor
                }
                if { [string compare $gwtype external] && [string compare $gwtype vendor] } {
                    close $ncf
                    error "$name: bogus GATEWAYTHREADTYPE '$gwtype'"
                }
                incr _ncfg_TypeCounts($gwtype)
                # Add a protocol connection to all its groups
                if [cequal $type protocol] {
                    if { [keylget config GROUPS groups] == 0 } {
                        close $ncf
                        error "GROUPS missing for entry: $name"
                    }
                    foreach grp $groups {
                        lappend _ncfg_GroupConns($grp) $name
                    }
                }
            }
            process {
                if { [info exists _ncfg_ProcessData($name)] } {
                    close $ncf
                    error "Duplicate NetConfig process entry: $name"
                }
                set _ncfg_ProcessData($name) $config ;# Store process data
            }
            default {
                # Just ignore everything else
            }
        }
    }
    # check the processes for errors
    set verifyWarnList ""
    if {[catch {checkProcessArray verifyWarnList _ncfg_ProcessData} msg]} {
        puts stderr "\nNetConfig Error: $msg\n"
    }
    if { ! $__CRMC_SKIP_HOST_ARRAY_CHECK__ } {
        # check the threads for errors
        if {[catch {checkHostArray verifyWarnList _ncfg_ConnData} msg]} {
            puts stderr "\nNetConfig Error: $msg\n"
        }
    }
    close $ncf
}
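For anyone wondering how a script would switch the check off, here is a rough sketch of a wrapper. It assumes a Cloverleaf Tcl shell where netcfgLoad is available and where nci.tlib contains the modified NCFG:reload above. The name loadNetConfigFast is made up for illustration; any arguments are passed straight through to netcfgLoad.

```tcl
# Hypothetical wrapper (illustration only): call netcfgLoad with host
# verification disabled, then put the global flag back the way it was.
proc loadNetConfigFast { args } {
    global __CRMC_SKIP_HOST_ARRAY_CHECK__
    # Remember the previous state of the flag, if it was set at all.
    set had [info exists __CRMC_SKIP_HOST_ARRAY_CHECK__]
    if { $had } { set saved $__CRMC_SKIP_HOST_ARRAY_CHECK__ }
    set __CRMC_SKIP_HOST_ARRAY_CHECK__ 1
    # Run netcfgLoad in the caller's scope with whatever arguments were given.
    set code [catch { uplevel 1 [linsert $args 0 netcfgLoad] } result]
    # Restore the flag even if netcfgLoad raised an error.
    if { $had } {
        set __CRMC_SKIP_HOST_ARRAY_CHECK__ $saved
    } else {
        unset __CRMC_SKIP_HOST_ARRAY_CHECK__
    }
    return -code $code $result
}
```

Restoring the flag afterwards means other code in the same interpreter that relies on the host check still gets it.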
Jeff Dinsmore
Chesapeake Regional Healthcare
Why don’t you put localhost in the /etc/hosts file and avoid all this? You can put “hosts: files dns” in your /etc/nsswitch.conf file to tell the system to query DNS only if it can’t find a match in the hosts file. This is how my Red Hat system was configured by default.
I guess the problem is that the script is calling “exec nslookup” instead of using host_info or another command that uses the system gethostbyname call.
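As a rough illustration of that idea, the lookup could go through the system resolver by calling getent instead of nslookup. This is only a sketch under assumptions: it needs a Linux-style system where the getent command exists, hostResolves is a made-up name, and it is not the vendor's code.

```tcl
# Hypothetical check (illustration only): ask the system resolver,
# which honors /etc/nsswitch.conf, instead of querying DNS directly.
proc hostResolves { host } {
    # Dotted-quad IPv4 literals need no lookup.
    if { [regexp {^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$} $host] } {
        return 1
    }
    # getent consults the sources listed in /etc/nsswitch.conf in order,
    # so an /etc/hosts entry can be found without a DNS query.
    # exec raises an error on a non-zero exit status; catch reports it.
    if { [catch { exec getent hosts $host } out] } {
        return 0
    }
    return 1
}
```

With “hosts: files dns” in place and localhost listed in /etc/hosts, hostResolves localhost should come back quickly without touching DNS.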