WARNING: engine terminating due to disk space shortage


  • #118490
    Jeff Dawson
    Participant

      I have never seen this type of error before. The process that crashed receives RTF documents from Epic over a TCP/IP multi-server configuration. The warning in the log seems misleading: I checked our disk space and it is not close to full in PRD. Has anyone come across this type of error before?

      Running CIS 6.2.6.1 on AIX 7.2 TL2

       

      02/09/2021 14:00:02
      WARNING: engine terminating due to disk space shortage
      [cmd :cmd :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] Receiving a command
      [prod:prod:INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02] Checking for leaked handles in the Xlate interpreter…
      [tcl :out :INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02]
      [prod:prod:INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:Fepic_rtf_xlate:02/09/2021 14:00:02]
      [prod:prod:INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02]
      [prod:prod:INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02] Checking for leaked handles in the TPS interpreter…
      [tcl :out :INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:TSAA_tran_rtf:02/09/2021 14:00:02]
      [prod:prod:INFO/0: TSO_rtf_T:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0: TSO_rtf_T:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0: TSO_rtf_T:02/09/2021 14:00:02]
      [prod:prod:INFO/0: TSO_rtf_T:02/09/2021 14:00:02] Checking for leaked handles in the TPS interpreter…
      [tcl :out :INFO/0: TSO_rtf_T:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0: TSO_rtf_T:02/09/2021 14:00:02]
      [prod:prod:INFO/0:TSO_trans_rtf:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0:TSO_trans_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:TSO_trans_rtf:02/09/2021 14:00:02]
      [prod:prod:INFO/0:TSO_trans_rtf:02/09/2021 14:00:02] Checking for leaked handles in the TPS interpreter…
      [tcl :out :INFO/0:TSO_trans_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:TSO_trans_rtf:02/09/2021 14:00:02]
      [prod:prod:INFO/0: FEtrans_rtf:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0: FEtrans_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0: FEtrans_rtf:02/09/2021 14:00:02]
      [prod:prod:INFO/0: FEtrans_rtf:02/09/2021 14:00:02] Checking for leaked handles in the TPS interpreter…
      [tcl :out :INFO/0: FEtrans_rtf:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0: FEtrans_rtf:02/09/2021 14:00:02]
      [cmd :cmd :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] Shutting down command thread Fepic_rtf_cmd.
      [prod:prod:INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] Checking for leaked handles in the NCI interpreter…
      [tcl :out :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02]
      [prod:prod:INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] Checking for leaked handles in the General interpreter…
      [tcl :out :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] <No active handles>
      [tcl :out :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02]
      [cmd :cmd :INFO/0:Fepic_rtf_cmd:02/09/2021 14:00:02] Removing engine command port file.
      [prod:prod:INFO/0: STARTUP_TID:02/09/2021 14:00:02] Removing pid file.
      [prod:prod:INFO/0: STARTUP_TID:02/09/2021 14:00:02] Engine process 31458002 is terminating
      [prod:prod:INFO/0: STARTUP_TID:02/09/2021 14:00:02] Ended at Tue Feb 9 14:00:02 2021

      • This topic was modified 3 years, 11 months ago by Jeff Dawson.
      • #118492
        Jerry Sawa
        Participant

          Hi Jeff.    Sorry, don’t mean to sound condescending, but when you’re checking the disk space, you’re checking for that specific volume/file system, right?  Not the disk space for the entire server?

          • #118496
            Jeff Dawson
            Participant

              Hi Jerry,

              No offense taken, and I appreciate the thought. I checked all the file systems on our AIX server; the file system in question is listed below, from the output of the df -g command.

               

              Filesystem    GB blocks  Free    %Used  Iused   %Iused  Mounted on
              /dev/fslv01   610        488.31  20%    258715  1%      /cis
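For anyone who wants to script the same check rather than read df output by hand, here is a minimal sketch. It assumes a POSIX-style df that accepts -P and -k; the function names and the 90 percent default limit are made up for illustration and are not a Cloverleaf threshold. It only looks at block usage, not inode usage.

```shell
#!/bin/sh
# Print the block usage percentage (without the % sign) for the file system
# that holds the given path. df -P requests the POSIX single-line format:
# Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
fs_pct_used() {
    df -P -k "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a limit. The default of 90 is an arbitrary
# illustration value (assumption), not something the engine uses.
check_fs() {
    path="$1"
    limit="${2:-90}"
    pct=$(fs_pct_used "$path")
    # Bail out if df produced nothing usable for this path.
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$limit" ]; then
        echo "WARN: $path is ${pct}% full"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
}
```

Something like check_fs /cis 90 could then be run from cron or by hand; in this thread the file system was only 20% used, which is why the warning looked misleading.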

              • This reply was modified 3 years, 11 months ago by Jeff Dawson.
          • #118493
            Jay Hammond
            Participant

              I don’t remember that error exactly, but I know that our Epic (between versions 2017 and November 2020) had some pretty nasty issues with RTF document sizes (Wound Care I believe with embedded pics) that would time out between Epic and Cloverleaf.  You might check the size of the file being sent to make sure that it isn’t at least part of the issue you’re seeing.

              • #118498
                Jeff Dawson
                Participant

                  Jay, that’s an excellent thought. I’ll check back to the time the process crashed and see if there are any file sizes that seem out of the ordinary. One other note: the thread receiving this data isn’t doing any type of translation or major processing.

              • #118499
                Jeff Dawson
                Participant

                  It looks like that is the prime suspect now: the SMAT file for that time period is 1.7 GB. Looking at the RTF messages during that time frame, I see a steady flow of 27 KB messages, then a few 2.2 MB RTF HL7 messages just before the crash occurred. The question I have now is whether there is any setting that could increase the memory available to a process, if this is hitting a threshold. At this point I’m going to open a support ticket as well, since this occurred in our Production environment, and see if they have any suggestions too. I’ll make sure to update with any findings.

                • #118501
                  Jeff Dawson
                  Participant

                    Here is Infor support’s reply. Just a note: our shell is ksh, so the bash-style export command also works in our .profile file, located at /home/hci/.profile.

                     

                    “Doing some research, it seems this is a known bug.

                    The official fix is scheduled for CIS 20.1 with AR24896. In the meantime, this should get you through it and back to normal functionality…

                    We can add an env var to the .profile for hci

                    for csh, set it with command “setenv IGNORE_FSFULL 1”
                    for bash, with the command “export IGNORE_FSFULL=1”

                    Add the above line, whichever works for your system, to the .profile for hci.
                    Open a new terminal and login as hci.
                    Type: echo $IGNORE_FSFULL, if it returns a value, it is set, restart your host and engines. If it does not return a value, use the other style entry.”

                    Service Team indicated the following:

                    I would stop and restart the process that it affects, so it picks up the env variable.
                    Also, you have to restart the hostserver with the env var set.
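The steps in the support reply can be sketched as a small script. This is only an illustration under assumptions: add_ignore_fsfull and check_ignore_fsfull are hypothetical helper names, and the profile path is passed in as an argument rather than assumed (the poster's was /home/hci/.profile). It uses the export form, which the poster reports works under ksh as well as bash.

```shell
#!/bin/sh
# Append the export line to a profile file only if it is not already there,
# so running this more than once does not duplicate the entry.
# $1 is the profile path (the thread's system used /home/hci/.profile).
add_ignore_fsfull() {
    profile="$1"
    line='export IGNORE_FSFULL=1'
    touch "$profile" || return 1
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Report whether the variable is visible in the current shell, which is
# the same check as the "echo $IGNORE_FSFULL" step in the support reply.
check_ignore_fsfull() {
    if [ -n "${IGNORE_FSFULL:-}" ]; then
        echo "IGNORE_FSFULL is set to $IGNORE_FSFULL"
    else
        echo "IGNORE_FSFULL is not set"
    fi
}
```

The variable only shows up in a new login session, and as noted above, the hostserver and the affected process have to be restarted before they pick it up.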

                  • #118544
                    Matthew Brophy
                    Participant

                      fwiw – we ran into this issue a few years ago (CIS 6.1.2), with large documents ending up crashing the site that processes our MDMs.

                      The killer is loading the messages into the recovery db (x2) and processing the messages while also smatting. What we ended up doing was setting a GUI alert to shut down the incoming threads that feed our document repository (OnBase) if the outbound queue is greater than 200 msgs. This RARELY happens, but it’s a lifesaver, because the alternative is crashing the site and potentially losing thousands of messages if they aren’t saved off appropriately.

                       

                      We have two dozen receive threads that feed this OnBase connection, so if a system started flooding us with MDMs/PDFs we could prevent a system crash.

                    • #119666
                      Matthew Rasmussen
                      Participant

                        Hey, so we had the same issue recently.  I’m going to copy the errors that we witnessed for search indexing, as Jim had a hard time finding this post.  But first, here’s an updated response from Infor for the particular issue we were seeing:

                        The following solution might help to answer questions about the incident or resolve the issue associated with it: 2153485

                        Event Note: Matthew,
                        Can we please try the following?
                        For the hci profile please modify the profile.local.end and change the following entries
                        From
                        export IGNORE_MEMCHECK=1
                        export IGNORE_FSFULL=1
                        export MALLOC_CHECK=0

                        to

                        #export IGNORE_MEMCHECK=1
                        #export IGNORE_FSFULL=1
                        #export MALLOC_CHECK=0
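A rough sketch of that edit, for illustration only. The function name is hypothetical, and the file path is an argument because the reply does not give the full location of profile.local.end. It keeps a .bak copy and avoids sed -i, which not every sed supports.

```shell
#!/bin/sh
# Comment out the three export lines named in the support note.
# $1 is the path to the profile file (an assumption: the thread names
# profile.local.end but not where it lives).
comment_out_overrides() {
    file="$1"
    # Keep the original as <file>.bak before rewriting.
    cp "$file" "$file.bak" || return 1
    # In the replacement, & stands for the matched line, so "#&" prefixes it.
    sed -e 's/^export IGNORE_MEMCHECK=1$/#&/' \
        -e 's/^export IGNORE_FSFULL=1$/#&/' \
        -e 's/^export MALLOC_CHECK=0$/#&/' \
        "$file.bak" > "$file"
}
```

Other lines in the file are left untouched, and restoring the .bak copy undoes the change.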

                        Here are the errors we found in our logs:

                        WARNING: engine terminating due to disk space shortage
                        [pd :open:ERR /0:zzz_a360_ib_rte:04/06/2022 12:32:56] [0.0.354731441] Unable to complete inbound save due to full disk, engine is terminating.

                        [pd :open:ERR /0:zzz_ib_EpicHMS_ADT:04/06/2022 12:40:59] Initializing secondary for [THREAD].in when not present
                        [pd :open:ERR /0:zzz_ib_EpicHMS_ADT:04/06/2022 12:48:38] [0.0.761496171] Unable to complete inbound save due to full disk, engine is terminating.
                        [pd :open:ERR /0:zzz_ib_EpicHMS_ADT:04/06/2022 12:48:38] Attempting to start an invalid transaction in [SMATDB file]
                        [pd :open:ERR /0:zzz_ib_EpicHMS_ADT:04/06/2022 12:48:38] Unable to start transaction while wrtiting to db [SMATDB file]
                        [pd :open:ERR /0:zzz_ib_EpicHMS_ADT:04/06/2022 12:48:38] Initializing secondary for [SMATDB file] when not present

                        [dbi :dbi :ERR /0:EPIC_945504_IN:04/06/2022 12:26:32] dbiWriteLogMsg: database (/Cloverleaf/cis19.1/integrator/[siteName]/exec/databases/) disk is > 10240 kilobytes full – engine terminating! err:0
                        [dbi :dbi :WARN/0:EPIC_945504_IN:04/06/2022 12:26:32] [0.0.294414534] dbiWriteMsgToRecoveryDb: failed inserting a recovery db record; try again
                        [dbi :dbi :WARN/0:EPIC_945504_IN:04/06/2022 12:26:32] [0.0.294414534] dbiWriteMsgToRecoveryDb: failed inserting a recovery db record; try again
                        [dbi :dbi :ERR /0:EPIC_945504_IN:04/06/2022 12:26:32] [0.0.294414534] dbiWriteMsgToRecoveryDb: failed inserting a recovery db record; err 4
