Engine process core dump


  • Creator
    Topic
  • #54487
    Jim
    Participant

      We are getting the following error messages and then a core dump on the process. Has anyone experienced anything similar? There were 32 of the “write timeout expired” messages prior to the core dump.

      [pdl :PDL :ERR /0:hie_sof_mdm_ob:12/07/2014 01:05:35] write timeout expired

      [pdl :PDL :ERR /0:hie_sof_mdm_ob:12/07/2014 01:05:35] PDL signaled exception: code 1, msg write failure

      [pti :sign:WARN/0:hie_sof_mdm_ob:12/07/2014 01:05:35] Thread 24 ( hie_sof_mdm_ob ) received signal 11

      [pti :sign:WARN/0:hie_sof_mdm_ob:12/07/2014 01:05:35] PC = 0xffffffff

      Jim

    • Author
      Replies
      • #81679
        Charlie Bursell
        Participant

          Signal 11, officially known as a “segmentation fault”, means that the program accessed a memory location that was not assigned to it.

          Is this one of the “standard” Cloverleaf PDLs or one written by you? What version of Cloverleaf? Has this thread been running successfully for a time before this started? Any recent changes?

          You might try recompiling, or better still, if it is an MLP PDL, use the encapsulation mode in the standard TCP/IP protocol.

        • #81680
          Jim
          Participant

            Charlie, we upgraded to 6.0.2 on 11/8 and this is the standard mlp_tcp.pdl. It seemed at first to be just our VPN connections, but then it happened again with one of our in-house connections. We have never used the encapsulation mode in the standard TCP/IP. How do we set this up correctly?

            Thank you,

            Jim

          • #81681
            Jim
            Participant

              Forgot to add that these connections have been in place for years.

              Jim

            • #81682
              Jim
              Participant

                Charlie,

                We tried recompiling the PDL, to no avail; the process panicked about an hour later. It only seems to crash when there are over 100 messages queued.

                Jim

              • #81683
                Charlie Bursell
                Participant

                  If it is crashing with messages queued, then it does not seem to be the PDL.  There is no queuing within the PDL.

                  Have you tried turning up full EO to see what is happening just prior to the crash?

                  If you want to try the encapsulated mode:

                  Protocol:tcpip -> Properties

                  Encapsulated -> Configure

                  Just accept defaults – should be OK for standard MLLP

                • #81684
                  Jim
                  Participant

                    Charlie,

                    It seems that when a thread is in an opening or up (but not connected) status and the queue depth is greater than 100, the process panics. We have changed the engine output to enable_all and submitted the logs and core dumps to both McKesson and Infor. I have had no resolution yet, but am still hoping they come up with some idea of what is wrong.

                    Jim

                  • #81685
                    Jim
                    Participant

                      Charlie,

                      I have changed 2 of the affected threads to use TCP/IP with encapsulation, and have experienced no outages since the change.

                      Jim

                    • #81686
                      Russ Ross
                      Participant

                        I mostly quit using straight TCP/IP connections with length encoding a good while back, even though they are more efficient.

                        The motivation to stop using them was the undesired side effect that the log file gets pounded when they aren’t connected.

                        It does not take long for the log file to grow to the max allowed size of 1 GB on our server, which could result in a process crash/hang.

                        We now have a best practice of configuring our processes to cycle logs at 50 MB inside the NetConfig, to also catch other runaway log output.

                        I don’t know if this undesired side effect lives on in current versions of Cloverleaf, because I waved goodbye to TCP/IP length-encoded connections for our internal hops a long, long time ago.

                        We are using the less efficient PDL/MLP encapsulated connections with a flat “ACK”; granted, more wasteful on the machine, but more robust to support.
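
                        For anyone curious what I mean by a flat “ACK”, below is a rough sketch of an inbound reply-generation TPS proc that simply hands a static “ACK” payload back to the sender. The proc name is made up and the message/disposition calls should be checked against your own Cloverleaf release; treat it as an illustration, not a drop-in.

                            proc tps_flat_ack { args } {
                                # Hypothetical sketch of a flat-ACK reply proc; verify the
                                # message and disposition handling against your release.
                                keylget args MODE mode
                                set dispList {}
                                switch -exact -- $mode {
                                    start    {}
                                    run {
                                        keylget args MSGID mh
                                        # Copy the inbound message, overwrite its content with a
                                        # flat "ACK", queue the copy to go back over the connection,
                                        # and let the original message continue into the engine.
                                        set ack [msgcopy $mh]
                                        msgset $ack "ACK"
                                        lappend dispList "OVER $ack"
                                        lappend dispList "CONTINUE $mh"
                                    }
                                    shutdown {}
                                }
                                return $dispList
                            }

                        The point is just that the reply never depends on parsing the inbound message, which keeps the ACK path dumb and robust to support.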

                        Russ Ross
                        RussRoss318@gmail.com

                      • #81687
                        Jim
                        Participant

                          Fellow Clovertechers

                          As it turns out, this is a bug in 6.0.2 that manifests itself when you have a connection that is either up or opening and several hundred messages queued for that thread. The process panics and produces a core dump. This is what support had to say:

                          “R&D told me this was a bug that was reported and the fix will be in 6.0.3 which should be coming out soon. They suggest to go back to 6.0 until the patch is available. “

                          Jim

                        • #81688
                          Bob Richardson
                          Participant

                            Greetings,

                            We are running the 6.1.0 Integrator on our Test server right now and are a bit puzzled that they will patch the base 6.0 release.

                            No mention of 6.1, which is currently available, or of its planned .01 patch?

                            Thanks.

                          • #81689
                            Rob Abbott
                            Keymaster

                              Bob, as far as I’m aware this PDL issue is resolved in 6.1.

                              We’ve just released 6.0.3 which resolves this in 6.0.

                              We provide official support for both the 6.1 and 6.0 releases – which means we provide patches for both.

                              Rob Abbott
                              Cloverleaf Emeritus

                            • #81690
                              Bob Richardson
                              Participant

                                Rob,  

                                I do get concerned about fixes in earlier releases that may not be passed along in a later release.

                                Thanks for the confirmation here.

                              • #81691
                                Rob Abbott
                                Keymaster

                                  No problem 🙂  Actually the fix for 6.0.3 was backported from 6.1.  So you should be good to go testing 6.1.

                                  Rob Abbott
                                  Cloverleaf Emeritus

                                • #81692
                                  Alice Kazin
                                  Participant

                                    Are you sure this issue is fixed in 6.1?   I have multiple core dumps in my 6.1 instance.

                                  • #81693
                                    Ted Viens
                                    Participant

                                      This does not appear to be resolved in 6.1.  I am getting core dumps as well, related to PDL signaled exceptions.

                                      Any advice on how to resolve this in 6.1 would be appreciated.

                                      [pdl :PDL :ERR /0:to_sgsound_adt:09/21/2015 15:18:09] PDL signaled exception: code 1, msg write failure

                                      [pti :sign:WARN/0:to_sgsound_adt:09/21/2015 15:20:14] Thread 3 ( to_sgsound_adt ) received signal 11

                                      [pti :sign:WARN/0:to_sgsound_adt:09/21/2015 15:20:14] PC = 0x10025714

                                    • #81694
                                      Bob Schmid
                                      Participant

                                        Serious one this morning… impacting lab… didn’t catch it right away, and messages disappeared?

                                        Running 6.1.1……

                                        Incident : 9015406  Summary: process crash   09/30/2015

                                        Bob

                                        I did not recompile PDLs when going from 5.8.5 to 6.1.1… (did I see that as a recommendation?)

                                        /etc/security/limits

                                        hci:

                                               fsize = 2097151

                                               core = 2097151

                                               cpu = -1

                                               nofiles = 10000

                                               rss = -1

                                               stack = -1

                                               data = -1

                                        BUT… I did not necessarily restart (all the) processes after the above limits change… so there is hope… but it does beg the question: why the need to raise them for 6.1.1?

                                      • #81695
                                        James Cobane
                                        Participant

                                          If I recall correctly, the ‘hcirootcopy’ process recompiles the PDLs for you when you promote the sites.

                                          Jim Cobane

                                          Henry Ford Health

                                        • #81696
                                          Bob Schmid
                                          Participant

                                            I think you’re correct, Jim.

                                          • #81697
                                            Mark Stancil
                                            Participant

                                              We are running Cloverleaf 6.1.1 and we have an issue with a process going down with a signal 11 when we bounce an UP thread that has a few hundred messages queued on it.  Should this be happening in 6.1.1?  Is there a patch we are not aware of?

                                            • #81698
                                              James Cobane
                                              Participant

                                                If this is only happening for the one process, you may want to take a look at any of the procs/Tcl that may be employed on the associated threads, particularly anything that may be running in ‘start’ mode context within a proc, since it appears to be happening when you bounce a thread.
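
                                                For reference, this is roughly the shape of a TPS proc and where ‘start’ mode code runs: anything in the start arm executes when the thread comes up, including when you bounce it, which is why that is a good place to look. This is just a generic, hypothetical skeleton, not your site’s proc.

                                                    proc tps_example { args } {
                                                        # Generic TPS skeleton, hypothetical; compare against your own procs.
                                                        keylget args MODE mode
                                                        set dispList {}
                                                        switch -exact -- $mode {
                                                            start {
                                                                # Runs when the thread starts or is bounced;
                                                                # errors here surface exactly at thread startup.
                                                            }
                                                            run {
                                                                keylget args MSGID mh
                                                                # Per-message work happens here.
                                                                lappend dispList "CONTINUE $mh"
                                                            }
                                                            shutdown {
                                                                # Cleanup when the thread or process stops.
                                                            }
                                                        }
                                                        return $dispList
                                                    }

                                                If your start arm does anything heavier than this (sourcing scripts, opening files, global initialization), that is where I would look first.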

                                                Jim Cobane

                                                Henry Ford Health

                                              • #81699
                                                Bob Schmid
                                                Participant

                                                  Guinea pig: Infor has delivered to me a possible fix for 6.1.1.

                                                  We have run without incident in the test environments. Plan on moving to prod end of next week.

                                                  When GA?

                                                  Setting to solution proposed: apply 6.1.2.1 when available.

                                                  In the interim:

                                                  Change your protocol to straight TCP/IP with encapsulation… away from the PDL… the PDL is the issue.

                                                  Bob
