2 Gig log file then process crashes

This topic has 5 replies, 5 voices, and was last updated 17 years, 1 month ago by Russ Ross.

Creator

Topic
May 23, 2008 at 3:51 pm #50068
Andrew Deters
Participant
We receive these errors from one of our feeds. Typically the message is repeated several thousand times per minute. Eventually (after roughly 2 hours), the log file and error file each grow to 2 GB and the process crashes. The logical volume then runs out of space when those crash files are copied to the errors directory.

[prod:prod:WARN/0: engine_cmd:05/21/2008 23:45:17] Output log file cycled

[pdl :PDL :ERR /0: FEED1:05/22/2008 01:46:38] read failed: Connection timed out

[pdl :PDL :ERR /0: FEED1:05/22/2008 01:46:38] read returned error 110 (Connection timed out)

Followed by this message a little later (probably when the connection is restored).

[pdl :PDL :ERR /0: FEED1:05/22/2008 19:28:37] read returned error 0 (Success)

Whatever is happening at this point, the reported error code is 0 (“Success”).

These feeds are connected to a vendor via a VPN. The EOConfig for these is disable_all. I know it is a protocol error, but I am at a loss as to the best way to troubleshoot and fix the problem. Has anyone ever seen this type of behavior? It does not happen at a consistent time.
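
One way to narrow down when the error bursts start and stop is to count ERR lines per thread per minute. The following is a minimal Python sketch, not a Cloverleaf tool: the regular expression is inferred only from the sample lines quoted above, and the names `LOG_LINE` and `count_errors_per_minute` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted samples:
# [module:submodule:LEVEL/n: thread:MM/DD/YYYY HH:MM:SS] message
LOG_LINE = re.compile(
    r"^\[(?P<module>\w+)\s*:(?P<sub>\w+)\s*:(?P<level>\w+)\s*/\s*(?P<num>\d+):"
    r"\s*(?P<thread>[^:]+):(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]"
    r"\s*(?P<message>.*)$"
)


def count_errors_per_minute(lines):
    """Return a Counter keyed by (thread, 'MM/DD/YYYY HH:MM') for ERR lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") != "ERR":
            continue  # skip blank lines, other levels, and unrecognized lines
        minute = match.group("stamp")[:16]  # drop the seconds
        counts[(match.group("thread").strip(), minute)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[prod:prod:WARN/0: engine_cmd:05/21/2008 23:45:17] Output log file cycled",
        "[pdl :PDL :ERR /0: FEED1:05/22/2008 01:46:38] read failed: Connection timed out",
        "[pdl :PDL :ERR /0: FEED1:05/22/2008 01:46:38] read returned error 110 (Connection timed out)",
    ]
    for (thread, minute), n in sorted(count_errors_per_minute(sample).items()):
        print(thread, minute, n)
```

Comparing the minutes where the count jumps against the VPN or vendor-side logs may show whether the bursts line up with tunnel drops.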

Viewing 4 reply threads

Author

Replies
- May 23, 2008 at 4:59 pm #64754
  James Cobane
  Participant
  Andrew,
  
  Why don’t you just turn on automatic log cycling on the process? Set the size (in KB) to what seems reasonable, and you should be good to go.
  
  Jim Cobane
  
  Henry Ford Health
- May 23, 2008 at 5:15 pm #64755
  Kevin Kinnell
  Participant
  I’d definitely do as Jim suggested, but I’d look pretty closely at that thread setup. We had a similar situation with fast-growing error logs. In that case, there were multiple problems, but what allowed us to debug it was to increase the timeout for ACKs, which kept things sane enough that we could gather enough information to blame the other side of the connection. 😉
  
  What happens when you “stutter” the thread (bring it up and almost immediately stop it again)?
- May 23, 2008 at 5:58 pm #64756
  Andrew Deters
  Participant
  I’ll set up the automatic log cycling. The logs currently cycle once per day. I tried stopping and starting the feed repeatedly and did not receive the results described above. The feeds acted as expected.
  
  When I’m able to bring down the VPN tunnel, I’ll try stopping and starting the tunnel to see whether that produces the above results.
  
  Thanks
- May 23, 2008 at 7:05 pm #64757
  Paul Johnston
  Participant
  Andrew,
  
  We ran into the exact same issue a while back. It crashed our system a couple of times.
  
  We developed a UNIX script that examines the size of the error file. When it gets to a certain size, we bounce the process.
  
  I like Jim’s solution also.
- May 27, 2008 at 11:55 am #64758
  Russ Ross
  Participant
  It is not surprising that it crashes when the log file size reaches 2 GB.
  
  Typically that is the maximum file size defined in the /etc/security/limits file.
  
  Here at MD Anderson Cancer Center it is a mandatory in-house standard for all production processes to have automatic log cycling implemented.
  
  Even if the interface does not have any problems, all it takes is to turn on EO config output and forget to turn it off, and the log file will grow to 2 GB long before the daily cycling of log files gets invoked via cron.
  
  Russ Ross
  RussRoss318@gmail.com
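
The size check Paul Johnston describes can be sketched roughly as follows. This is not his script: it is a minimal Python illustration, assuming a 1 GiB threshold chosen to stay well below the 2 GB ceiling. The names `oversized_files` and `check_and_act` are hypothetical, and the action to take (bouncing the process or cycling its logs) is left as a callback because the exact commands are site-specific and not given in this thread.

```python
import os
from typing import Callable, Iterable, List

# Assumed threshold: 1 GiB, well below the 2 GB limit discussed above.
DEFAULT_LIMIT_BYTES = 1024 ** 3


def oversized_files(paths: Iterable[str],
                    limit_bytes: int = DEFAULT_LIMIT_BYTES) -> List[str]:
    """Return the paths whose current size is at or above limit_bytes."""
    over = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # file missing or unreadable, e.g. it was just cycled
        if size >= limit_bytes:
            over.append(path)
    return over


def check_and_act(paths: Iterable[str],
                  on_oversize: Callable[[str], None],
                  limit_bytes: int = DEFAULT_LIMIT_BYTES) -> List[str]:
    """Call on_oversize(path) for each oversized file and return those paths."""
    hits = oversized_files(paths, limit_bytes)
    for path in hits:
        on_oversize(path)  # site-specific action, supplied by the caller
    return hits
```

A check like this could be run from cron every few minutes against each process's log and error files. It works as a safety net alongside automatic log cycling and does not replace it.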

The forum ‘Cloverleaf’ is closed to new topics and replies.