› Clovertech Forums › Read Only Archives › Cloverleaf › Cloverleaf › Engine process core dump
[pdl :PDL :ERR /0:hie_sof_mdm_ob:12/07/2014 01:05:35] write timeout expired
[pdl :PDL :ERR /0:hie_sof_mdm_ob:12/07/2014 01:05:35] PDL signaled exception: code 1, msg write failure
[pti :sign:WARN/0:hie_sof_mdm_ob:12/07/2014 01:05:35] Thread 24 ( hie_sof_mdm_ob ) received signal 11
[pti :sign:WARN/0:hie_sof_mdm_ob:12/07/2014 01:05:35] PC = 0xffffffff
Jim
Signal 11, officially known as a “segmentation fault”, means that the program accessed a memory location that was not assigned to it.
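For anyone unfamiliar with signal 11, it can be demonstrated directly. This is a minimal Python sketch (POSIX only, not Cloverleaf-specific): it crashes a child interpreter with a null-pointer read and lets the parent inspect the termination signal safely.

```python
# Minimal sketch (POSIX only): trigger a segmentation fault in a child
# process so the parent can observe the fatal signal without crashing itself.
import signal
import subprocess
import sys

# The child dereferences address 0 via ctypes, which raises SIGSEGV (signal 11).
child = subprocess.run(
    [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
    capture_output=True,
)

# On POSIX, a negative return code is the negated number of the fatal signal,
# so a segfaulting child reports -11 here.
print(child.returncode)
```

This mirrors what the engine log shows: the thread touched memory it did not own, the OS delivered signal 11, and the process died with a core dump.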
Is this one of the “standard” Cloverleaf PDLs or one written by you? What version of Cloverleaf? Had this thread been running successfully for a while before this started? Any recent changes?
You might try recompiling or, better still, if it is an MLP PDL, use the encapsulation mode in the standard TCP/IP protocol.
Charlie, we upgraded to 6.0.2 on 11/8 and this is the standard mlp_tcp.pdl. At first it seemed to be just our VPN connections, but then it happened again with one of our in-house connections. We have never used the encapsulation mode in the standard TCP/IP. How do we set this up correctly?
Thank you,
Jim
Jim
Forgot to add that these connections have been in place for years.
Jim
Charlie,
We tried recompiling the PDL to no avail; the process panicked about an hour later. It only seems to crash when there are over 100 messages queued.
Jim
If it is crashing with messages queued, then it does not seem to be the PDL. There is no queuing within the PDL.
Have you tried turning up full EO to see what is happening just prior to the crash?
If you want to try the encapsulated mode:
Protocol:tcpip -> Properties
Encapsulated -> Configure
Just accept defaults – should be OK for standard MLLP
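For reference, the defaults in the encapsulated configuration correspond to standard MLLP framing: each message is wrapped in a vertical tab byte (0x0B) and terminated with a file separator plus carriage return (0x1C 0x0D). The byte values below are from the HL7 MLLP standard; the function names are just illustrative.

```python
# Sketch of MLLP ("minimal lower layer protocol") framing, the wire format
# that the encapsulated TCP/IP option speaks: <VT> message <FS><CR>.
START_BLOCK = b"\x0b"    # vertical tab
END_BLOCK = b"\x1c\x0d"  # file separator + carriage return

def mllp_wrap(message: bytes) -> bytes:
    """Wrap an HL7 message in MLLP framing bytes."""
    return START_BLOCK + message + END_BLOCK

def mllp_unwrap(frame: bytes) -> bytes:
    """Strip MLLP framing; raises ValueError on a malformed frame."""
    if not (frame.startswith(START_BLOCK) and frame.endswith(END_BLOCK)):
        raise ValueError("not a valid MLLP frame")
    return frame[len(START_BLOCK):-len(END_BLOCK)]

# Round-trip check on a (truncated, illustrative) HL7 header.
hl7 = b"MSH|^~\\&|LAB|HOSP|..."
assert mllp_unwrap(mllp_wrap(hl7)) == hl7
```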
Charlie,
It seems that when a thread is in an opening or up (but not connected) status and the queue depth is greater than 100, the process panics. We have changed the engine output to enable_all and submitted the logs and core dumps to both McKesson and Infor. I have had no resolution yet, but am still hoping they come up with some idea of what is wrong.
Jim
Jim
Charlie,
I have changed 2 of the affected threads to use TCP/IP with encapsulation, and have experienced no outages since the change.
Jim
Jim
I mostly quit using straight TCP/IP connections with length encoding a good while back, even though they are more efficient.
The motivation to stop using them was the undesired side effect that the log file gets pounded when they aren’t connected.
It does not take long for the log file to grow to the max allowed size of 1 GB on our server, which could result in a process crash/hang.
We now have a best practice of configuring our processes to cycle logs at 50 MB inside the NetConfig, which also catches other runaway log output.
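The 50 MB log-cycling idea generalizes beyond the NetConfig. As an illustrative analogue only (this is not Cloverleaf’s mechanism), size-based rotation in Python looks like this:

```python
# Sketch of size-based log rotation, analogous to "cycle logs at 50 MB":
# once the active file hits maxBytes, it is rolled over and a fresh one begun.
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_path = os.path.join(tempfile.gettempdir(), "engine_process.log")

handler = RotatingFileHandler(
    log_path,
    maxBytes=50 * 1024 * 1024,  # rotate at 50 MB
    backupCount=5,              # keep five rotated copies, then discard oldest
)
logger = logging.getLogger("engine")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Runaway output (e.g. reconnect spam) is now capped at roughly 300 MB total
# instead of growing toward a 1 GB ceiling.
logger.info("connection retry")
```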
I don’t know if this undesired side effect lives on in current versions of Cloverleaf, because I waved goodbye to TCP/IP length-encoded connections for our internal hops a long, long time ago.
We are using the less efficient PDL/MLP encapsulated connections with a flat “ACK”; granted, more wasteful on the machine, but more robust to support.
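For contrast with the MLLP delimiter approach, length-encoded framing prefixes each message with its byte count instead of using start/end bytes. The exact header layout Cloverleaf uses for length encoding is configurable, so the 4-byte big-endian prefix below is only an illustrative assumption:

```python
# Sketch of length-prefixed framing: a fixed-size length header, then the
# payload. The receiver reads the header first, then exactly that many bytes.
import struct

def length_wrap(message: bytes) -> bytes:
    """Prefix the message with a 4-byte big-endian length header."""
    return struct.pack(">I", len(message)) + message

def length_unwrap(frame: bytes) -> bytes:
    """Parse a length-prefixed frame; raises ValueError if truncated."""
    (size,) = struct.unpack(">I", frame[:4])
    payload = frame[4:4 + size]
    if len(payload) != size:
        raise ValueError("short read: frame truncated")
    return payload

msg = b"MSH|^~\\&|..."
assert length_unwrap(length_wrap(msg)) == msg
```

The efficiency Russ mentions comes from the receiver knowing the message size up front; the trade-off is that a desynchronized or half-open connection produces garbage lengths, and the retry noise is what pounds the log file.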
Russ Ross
RussRoss318@gmail.com
Fellow Clovertechers
As it turns out, this is a bug in 6.0.2 which manifests itself when you have a connection that is either up or opening and several hundred messages queued for that thread. The process panics and produces a core dump. This is what support had to say:
“R&D told me this was a bug that was reported and the fix will be in 6.0.3 which should be coming out soon. They suggest to go back to 6.0 until the patch is available. “
Jim
Jim
Greetings,
We are running the 6.1.0 Integrator on our Test server right now and are a bit puzzled that they will patch the base 6.0 release.
No mention of 6.1 currently available?
With its planned patch of .01?
Thanks.
Bob, as far as I’m aware this PDL issue is resolved in 6.1.
We’ve just released 6.0.3 which resolves this in 6.0.
We provide official support for both the 6.1 and 6.0 releases – which means we provide patches for both.
Rob Abbott
Cloverleaf Emeritus
Rob,
I do get concerned about fixes in earlier releases that may not be passed along in a later release.
Thanks for the confirmation here.
No problem 🙂 Actually the fix for 6.0.3 was backported from 6.1. So you should be good to go testing 6.1.
Rob Abbott
Cloverleaf Emeritus
Are you sure this issue is fixed in 6.1? I have multiple core dumps in my 6.1 instance.
This does not appear to be resolved in 6.1. I am getting core dumps as well, related to PDL signaled exceptions.
Any advice on how to resolve this in 6.1 would be appreciated.
[pdl :PDL :ERR /0:to_sgsound_adt:09/21/2015 15:18:09] PDL signaled exception: code 1, msg write failure
[pti :sign:WARN/0:to_sgsound_adt:09/21/2015 15:20:14] Thread 3 ( to_sgsound_adt ) received signal 11
[pti :sign:WARN/0:to_sgsound_adt:09/21/2015 15:20:14] PC = 0x10025714
Serious one this morning… impacting lab… didn’t catch it right away, and messages disappeared as well?
Running 6.1.1……
Incident : 9015406 Summary: process crash 09/30/2015
Bob
I did not recompile PDLs going from 5.8.5 to 6.1… (did I see that as a recommendation?)
/etc/security/limits
hci:
fsize = 2097151
core = 2097151
cpu = -1
nofiles = 10000
rss = -1
stack = -1
data = -1
BUT… I did not necessarily restart (all the) processes after the above limits change… so there is hope… but it does beg the question: why the need to raise them for 6.1.1?
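Worth noting: changes in /etc/security/limits only apply to processes started after the change, which is why the restart matters. A small POSIX sketch (illustrative, not Cloverleaf-specific) for checking which limits a running process actually inherited:

```python
# Sketch: inspect the resource limits the *current* process inherited, to
# confirm whether it was started before or after a limits change.
import resource

# Core file size limit governs whether/how large a core dump can be written.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core (soft):", "unlimited" if soft == resource.RLIM_INFINITY else soft)

# File size limit corresponds to the "fsize" entry in /etc/security/limits.
soft_f, hard_f = resource.getrlimit(resource.RLIMIT_FSIZE)
print("fsize (soft):", "unlimited" if soft_f == resource.RLIM_INFINITY else soft_f)
```

Running something like this from a Tcl or shell hook inside each process (or just checking the hci user’s `ulimit -a` in the process environment) would confirm whether the raised limits actually took effect before blaming 6.1.1.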
If I recall, I believe the ‘hcirootcopy’ process recompiles the pdl’s for you when you promote the sites.
Jim Cobane
Henry Ford Health
I think you’re correct, Jim.
We are running Cloverleaf 6.1.1 and we have an issue with a process going down in a signal 11 when we bounce an UP thread with a few hundred messages in it. Should this be happening in 6.1.1? Is there a patch we are not aware of?
If this is only happening for the one process, you may want to take a look at any of the procs/Tcl that may be employed on those associated threads, particularly something that may be running in ‘start’ mode context within a proc, since it appears to be happening when bouncing a thread.
Jim Cobane
Henry Ford Health
Guinea pig: Infor has delivered to me a possible fix for 6.1.1.
We have run without incident in the test environments. Plan on moving to prod end of next week.
When GA?
Setting to solution proposed for applying 6.1.2.1 when available.
In the interim:
Change your protocol from the PDL to straight TCP/IP with encapsulation… the PDL is the issue.
Bob