Delivery to xlate thread dohbeds_out_xlate failed. Requeuing msg. iclErr=3

This topic has 7 replies, 3 voices, and was last updated 3 months, 2 weeks ago by Jerry Sawa.

March 23, 2026 at 1:52 pm #122409
Peter Heggie
Participant
Has anyone seen this?

I see the same line over and over. From midnight to noon it filled 2048M of process log. I recycled the process and all the messages went through. So this looks like some kind of loop, because nothing moved during that time, but there were no errors either.

Peter

Peter Heggie
PeterHeggie@crouse.org
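
One way to confirm that a process log is stuck in this kind of loop is to count how many times the requeue message appears per second. The following is only a minimal sketch: it assumes each log line starts with a bracketed header ending in a `MM/DD/YYYY HH:MM:SS` timestamp (the shape of the excerpts quoted in the replies), and the function names and the default threshold of 100 are hypothetical choices, not anything Cloverleaf provides.

```python
import re
from collections import Counter

# Assumed header shape: "[...:MM/DD/YYYY HH:MM:SS] message text".
# Adjust the pattern if your process log is formatted differently.
LINE_RE = re.compile(
    r"^\s*\[.*?(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(.*)$"
)

# Text taken from the error in the topic title.
REQUEUE_TEXT = "failed. Requeuing msg."


def requeue_counts_per_second(lines):
    """Return a Counter mapping timestamp string -> number of requeue errors."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and REQUEUE_TEXT in match.group(2):
            counts[match.group(1)] += 1
    return counts


def looks_like_requeue_loop(lines, threshold=100):
    """True if any single second holds at least `threshold` requeue errors.

    The threshold is a hypothetical default; tune it to your own traffic.
    """
    counts = requeue_counts_per_second(lines)
    return any(n >= threshold for n in counts.values())
```

Passing an open file object as `lines` streams the log line by line, so the whole file never has to fit in memory. A monitoring script could call `looks_like_requeue_loop` on the newest part of the log and raise an alert when it returns `True`.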

- March 24, 2026 at 8:57 am #122410
  John Mercogliano
  Participant
  Hi Peter,
  
  A 2GB log file is pretty big. Do you have auto cycling for logs and SMAT turned on for the process? I remember on AIX we had issues with the interface hanging when any file got to 2GB, because Cloverleaf on AIX was 32-bit.
  
  John Mercogliano
  Semi Retired, contractor
  Hampton Roads, VA
- March 24, 2026 at 9:14 am #122411
  Peter Heggie
  Participant
  Thank you. Yes, we recycle the processes every night. Also, I was able to tail and head the process log, and I found those messages started showing up fairly close to the top of the log. I have debug statements in procs at various UPOCs, and I did see some of these write to the log before the error messages started. I looked at the debug output and I didn’t see any obvious problems like non-printable characters, etc. There are only three messages sent every other hour, each around 6500 bytes, so not a lot of volume. Copies are written to the log. They contain a lot of square brackets and curly braces, but that shouldn’t be a problem. It’s been running like this since last September, and we never saw this error before. And yes, we are on AIX as well.
  
  Peter
  
  Peter Heggie
  PeterHeggie@crouse.org
- March 24, 2026 at 1:49 pm #122412
  John Mercogliano
  Participant
  Peter, the fact that the log file hit 2GB and you are on AIX would lead me to believe that this is the reason the messages stopped processing. I would check the cycle settings for that process in the netconfig to ensure they are checked as you definitely don’t want your logs getting to 2GB.
  
  John Mercogliano
  Semi Retired, contractor
  Hampton Roads, VA
- March 24, 2026 at 1:58 pm #122413
  Peter Heggie
  Participant
  We run shell scripts on AIX that recycle all sites, all processes.
  
  The last successful message transmission was around 00:18 and the process was recycled at 00:55. About a minute after that, the log started filling up with those entries. I will have to see whether the entries continued to post or stopped at some point.
  
  Peter Heggie
  PeterHeggie@crouse.org
- March 24, 2026 at 6:13 pm #122414
  Jerry Sawa
  Participant
  Peter, your log reached 2GB because of the error, correct?
  
  When you stopped/started the process then all worked as expected.
  
  Could you send a portion of the process log, from when the process was started to the point where you started getting the error?
  
  Quite often we get the error “ICL initialization failed” when starting a process. We have code that checks for ICL issues when a process starts up. If it finds any, we stop and restart the process.
  - This reply was modified 3 months, 2 weeks ago by Jerry Sawa.
- March 25, 2026 at 11:04 am #122416
  Peter Heggie
  Participant
  yes – we ran our normal recycle at 00:53 and had 48 normal startup messages in the log, down to here:
  
  [prod:prod:INFO/0:dohbeds_out_cmd:03/23/2026 00:53:27] Log History feature is enabled.
  
  Then we started to see the icl messages:
  
  [xlt :thre:ERR /0:dohbeds_out_xlate:03/23/2026 00:53:27] XLATE ICL server open failed, iclErr=1
  
  [pd :pdtd:ERR /0:fr_dohbeds_db2:03/23/2026 00:53:30] Sending ICL cmd to ‘dohbeds_out_xlate’ failed, iclErr=3
  
  There were no more of these messages and there were a few more normal process startup messages, ending with this:
  
  [cmd :cmd :INFO/0:dohbeds_out_cmd:03/23/2026 00:53:43] Command client went away. Closing connection.
  
  At 2:16:18 there was an application msg processed. This would have been the first application activity since startup, and it was expected. There was debugging output (sorry for the length):
  
  [tcl :out :INFO/0:fr_dohbeds_db2:03/23/2026 02:16:18] tpsPrintMsg: 2026-03-23 02:05:04.0,2026-03-23 02:16:04.607,,CRH,ready,ip,9C902EC5-7F26-F111-8882-005056957E5C,Yes,12,0,0,30,26.90,56,51,1.45,5,21,26,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,267,295,62,67,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,210,235,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,36,41,0,0,0,0,46,51,16,21,0,0,0,0,61,0,1.45,12,26.90,51,312
  [tcl :out :INFO/0:fr_dohbeds_db2:03/23/2026 02:16:18] DOH_BedAvail_sending_Update updated guid: 9C902EC5-7F26-F111-8882-005056957E5C status: sending result: 0
  
  Right after this, we got the “delivery failed” message:
  
  [pd :pdtd:ERR /0:fr_dohbeds_db2:03/23/2026 02:16:18] [0.0.7755725] Delivery to xlate thread dohbeds_out_xlate failed. Requeuing msg. iclErr=3
  
  At this point we started getting approximately 500 of these per second.
  
  There were two more sets of application messages, as expected, just a few minutes apart, as expected. There were error messages interspersed between the application messages.
  
  Then the error messages continued until 04:13. It seems like we were getting around 3100 error messages per second.
  
  At 04:13 the last line was truncated, so I think it stopped there – space, memory, whatever – just stopped, no crash, no panic, just stopped.
  
  I restarted around 12:20 and all was fine, all application messages that attempted to go, did go through, successfully.
  
  I have disabled the debugging code, in case that contributed to the issue, but it’s been running without changes since September.
  
  I do have ksh shell scripts that monitor some process logs, looking for Java errors, so I could look for this also and raise an alert or maybe just recycle the process.
  
  Peter
  
  Peter Heggie
  PeterHeggie@crouse.org
- March 25, 2026 at 11:27 am #122417
  Jerry Sawa
  Participant
  [xlt :thre:ERR /0:dohbeds_out_xlate:03/23/2026 00:53:27] XLATE ICL server open failed, iclErr=1
  
  “XLATE ICL server open failed, iclErr=1” is one of the errors we check for after restarting a process. If it happens, the error appears in the log immediately after starting the process. If we find that error in the log, we recycle the process. We’ve had that in place for several years and it has worked for us 100% of the time.
  - This reply was modified 3 months, 2 weeks ago by Jerry Sawa.
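
The startup check described in the replies could be sketched roughly as follows. This is not Jerry's actual code, which was not posted. It is a minimal illustration that assumes the two error strings quoted in this thread are the ones worth matching, and that looking at the first 200 log lines after a restart is enough (Peter's log showed the error after 48 startup lines). The function names and the `max_lines` default are hypothetical.

```python
from itertools import islice

# Error strings quoted in this thread; extend the tuple for your own site.
STARTUP_ERRORS = (
    "ICL initialization failed",
    "XLATE ICL server open failed",
)


def find_startup_icl_errors(lines, max_lines=200):
    """Return the lines among the first `max_lines` that contain a known ICL error.

    `max_lines` is an assumed window size, not a Cloverleaf setting.
    """
    return [
        line.rstrip("\n")
        for line in islice(lines, max_lines)
        if any(err in line for err in STARTUP_ERRORS)
    ]


def needs_recycle(lines, max_lines=200):
    """True if the startup portion of the log shows an ICL error."""
    return bool(find_startup_icl_errors(lines, max_lines))
```

The sketch only decides whether a recycle is warranted. The stop and start themselves are left to whatever recycle script the site already runs after a restart.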