Inrecoverable socket error

This topic has 4 replies, 4 voices, and was last updated 19 years ago by Richard Hart.

Creator

Topic
May 17, 2005 at 4:12 pm #47750
Janice Criscoe
Participant
I have a process that is receiving the following error message. Is anyone familiar with this type of error? Thanks.

[icl :tcpi:ERR /0:anc_result_cmd] write failed: Broken pipe

[cmd :cmd :INFO/0:anc_result_cmd] Since there are some error had occured while attempted to send an Ack back to client

[cmd :cmd :INFO/0:anc_result_cmd] We try to process the command anyway but no further Ack will be send back to Client

[cmd :cmd :INFO/0:anc_result_cmd] Received command: ‘anc_result_xlate xrel_post’

[cmd :cmd :INFO/0:anc_result_xlate] Doing ‘xrel_post’ command with args ‘‘

[cmd :cmd :INFO/0:anc_result_cmd] Inrecoverable socket error. Closing connection.

isACSII: TRUE

Viewing 3 reply threads

Author

Replies
- May 18, 2005 at 5:15 am #56618
  Richard Hart
  Participant
  Janice.
  
  We have had similar errors and raised, through our vendor, a support call to Quovadx.
  
  In our case we did not lose data; it just slowed communication by about 75%!
- May 18, 2005 at 6:47 pm #56619
  Michael Hertel
  Participant
  Here’s an explanation from the archive:
  
  =========================
  
  Original Message
  
  From: Rob Abbott
  
  Sent: Thursday, June 17, 2004 1:52 PM
  
  To: Technical Issues
  
  Subject: [clovertech] Re: broken pipe
  
  Here’s the explanation:
  
  hcicmd connects to the command thread via a TCP/IP loopback connection.

  hcicmd does a “resend” or other expensive operation that keeps a thread busy for over 30 seconds.

  hcicmd waits for acknowledgement from the engine that the command has completed.

  hcicmd times out waiting for ack (30 seconds).

  Engine operation completes. Command thread gets control.

  Command thread attempts to send acknowledgement back on socket.

  hcicmd has gone away. O/S returns “broken pipe” on the socket due to the client disconnect.

  This is a non-fatal error. It’s simply that hcicmd has disconnected before the engine has had a chance to ACK. When the engine tries to ACK the error occurs.

  You often see this error when starting an engine with a lot of threads. The reason for this is when each thread starts, it sends a message to each engine process letting the engine know “I’m alive, if you have any pending messages for me, please release them” – the command is “xrel_post”.

  Since engines with a lot of threads may take a while to start, the “hcicmd xrel_post” processes will time out, and you’ll see a load of broken pipe errors once the engine fully starts and is able to ack all the xrel_post commands it’s receiving.

  I hope this helps clear things up. The bottom line is that these broken pipe errors are non-fatal and should not require an engine bounce or anything of the sort.
  
  Regards — Rob
  
  ================
  
  Also:
  
  ================
  
  Date: Thu, 17 Jun 2004 14:04:43 -0500
  
  Author: Rob Abbott <Rob.Abbott@quovadx.com>
  
  Subject: Re: broken pipe
  
  Body: I neglected to mention that hcicmd is a Perl script. If you want (at your own risk 🙂) to change the 30-second timeout, look for “my $time=30;” at around line 254. Change 30 to whatever integer you wish. 0 (zero) means wait forever.
  
  — Rob
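
For anyone who wants to see the sequence Rob describes outside of Cloverleaf, here is a minimal Python sketch of it. This is not hcicmd's code (hcicmd is a Perl script) and not the engine's code. The names `wait_for_ack` and `run_demo`, the `ACK` payload, and the sub-second timings are all made up for illustration. The client connects over loopback, sends a command, and gives up waiting for the acknowledgement; the "engine" side finishes late and then fails when it writes the ACK to a socket whose peer has gone away. The helper also follows the convention from the second email, where a timeout of 0 means wait forever.

```python
import socket
import threading
import time


def wait_for_ack(sock, seconds=30):
    """Wait for a newline-terminated ACK. seconds=0 means wait forever."""
    # In Python, settimeout(0) would mean non-blocking, so 0 is mapped to
    # None (block indefinitely) to mimic the convention in the email.
    sock.settimeout(None if seconds == 0 else seconds)
    data = b""
    try:
        while not data.endswith(b"\n"):
            chunk = sock.recv(1024)
            if not chunk:          # peer closed before completing the ACK
                return None
            data += chunk
    except socket.timeout:
        return None                # client gives up waiting
    return data


def run_demo(client_timeout=0.2, engine_busy=0.6):
    """Return what the client saw and which error the late ACK raised."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # loopback, OS-chosen free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result = {"client": None, "engine_error": None}

    def engine():
        conn, _ = listener.accept()
        try:
            conn.recv(1024)                # receive the command
            time.sleep(engine_busy)        # stand-in for the expensive operation
            # The first write after the peer has closed usually succeeds
            # locally; the failure surfaces on a later write, so the ACK is
            # deliberately sent in two pieces.
            conn.sendall(b"AC")
            time.sleep(0.1)
            conn.sendall(b"K\n")
        except ConnectionError as exc:     # BrokenPipeError, ConnectionResetError, ...
            result["engine_error"] = type(exc).__name__
        finally:
            conn.close()

    worker = threading.Thread(target=engine)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"xrel_post\n")
    ack = wait_for_ack(client, client_timeout)
    result["client"] = "acked" if ack else "timed out"
    client.close()                         # the client "goes away"

    worker.join()
    listener.close()
    return result


if __name__ == "__main__":
    print(run_demo())                                    # client gives up first
    print(run_demo(client_timeout=0, engine_busy=0.1))   # 0 = wait forever
```

In the first call the client times out before the engine finishes, so the late ACK fails on the engine side (on Linux this typically shows up as a broken pipe). In the second call the client waits indefinitely, receives the ACK, and no error occurs, which is the effect of raising or disabling the timeout as described above.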
- July 17, 2006 at 5:16 pm #56620
  Kathy Zwilling
  Participant
  We started experiencing this same error in the last week, and it has occurred 3 times now, so I am anxious to find a way to “fix” it. We are on 5.2 rev. 1.
  
  In our case, the data does stop processing completely, as if the connections in the affected site were frozen. All connections are showing “up” and “green”, but data is neither coming in nor going out.
  
  It seems to me that the monitor daemon has to be related to this, because the affected site is “frozen” with no data processing, and when I stop the monitor daemon the data starts to flow. Note that it does not wait for me to start the monitor daemon back up.
  
  This has happened in 3 different sites in the past week, so it is not the same site each time.
  
  Any ideas what might be happening? The messages I am getting are the same as those listed in the email above.
  
  Should I implement a cron to cycle the monitor daemon daily to avoid this?
  
  Thanks for your help!
  
  Kathy Zwilling
- July 18, 2006 at 12:10 am #56621
  Richard Hart
  Participant
  Kathy.
  
  Using ‘cron’ to cycle the monitor daemon would probably help – we don’t use the monitor daemon, so we haven’t seen this issue.
  
  My earlier post on this topic was about communications between Cloverleaf and an application socket listener. In our case, it happened after about 7 hours of sending the maximum message volume (for the receiving application) and required a thread stop/start to get communications back on track. We were never given enough time to prove the cause, but the network guys indicated that the receiving application did not send the TCP ACK we were expecting.

The forum ‘Cloverleaf’ is closed to new topics and replies.