connect to process error

This topic has 10 replies, 6 voices, and was last updated 12 years, 3 months ago by Peter Heggie.

Creator

Topic
August 22, 2006 at 1:13 pm #48724
Kevin Scantlan
Participant
For the last couple of weeks we have been getting this type of error message on both our production and test sites. We traced it to hcicmd, whose output contains the error message, and we can reproduce the error on demand. However, we get the same type of error when we use hcisitectl to bring down the monitor daemon and hciprocstatus when we take down a process. We bounced the monitor daemon and also bounced a process for this example, but to no avail. It’s not just this process that is getting the error.

Example:

Unable to contact process ‘test2_ps16’ on port 56732

Error was: A remote host refused an attempted connect operation.

Try to connect with its-goofy

Response:

pstop issued for thread ‘gecard_ci_16’

We are not aware of any changes made to prod or test in the last two weeks that we can point to.

Thanks.
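
The error in this post is a refused TCP connection to the process's command port. As a minimal sketch (not part of Cloverleaf; the function name `probe` is made up for illustration, and Python 3 is assumed to be available on the server), the following tries a plain TCP connect to every address a host name resolves to and reports the outcome per address:

```python
import socket
from typing import List, Tuple


def probe(host: str, port: int, timeout: float = 2.0) -> List[Tuple[str, str]]:
    """Try a TCP connect to every address `host` resolves to.

    Returns (address, outcome) pairs. The outcome is "connected",
    "refused", "timed out", or a description of another error.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(host, f"name resolution failed: {exc}")]

    results = []
    seen = set()
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address in seen:
            continue  # the resolver may list the same address more than once
        seen.add(address)
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except ConnectionRefusedError:
            outcome = "refused"
        except socket.timeout:
            outcome = "timed out"
        except OSError as exc:
            outcome = f"error: {exc}"
        results.append((address, outcome))
    return results
```

For example, `probe("localhost", 56732)` would list each address tried for the port shown in the error above. An address other than 127.0.0.1 or ::1 in that list would suggest looking at name resolution rather than at the process itself.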

Viewing 9 reply threads

Author

Replies
- August 22, 2006 at 5:12 pm #59498
  Anonymous
  Participant
  Kevin,
  
  Assuming you are running on Unix, try the following command.
  
  netstat -an | grep 56732
  
  This would show, on your side, whether anything is attempting to connect to this port, listening on it, or already connected.
  
  Once you have that cleared up, you may need to work with your network folks to find out why it is not able to establish a connection.
  
  Also, I would try an enable_all config at the process level and bounce the threads. A detailed log would be easier to trace through.
  
  Hope this helps.
  
  Reggie
- August 23, 2006 at 2:03 pm #59499
  Kevin Scantlan
  Participant
  I turned “enable all” on for the process shown below, but did not see anything in the process log that stood out. I did the netstat -an command as suggested and you can see the output below. Anyone see anything that suggests a problem?
  
  *************************************
  
  [test2]/hci/qdx5.2/integrator/test2/exec/processes/test2_ps16>cat cmd_port
  
  50867
  
  [test2]/hci/qdx5.2/integrator/test2/exec/processes/test2_ps16>ep 50867 <
  
  tcp4 0 0 *.50867 *.* LISTEN
  
  [test2]/hci/qdx5.2/integrator/test2/exec/processes/test2_ps16>ci_16 pstop’ <
  
  Unable to contact process ‘test2_ps16’ on port 50867
  
  Error was: A remote host refused an attempted connect operation.
  
  Try to connect with its-goofy
  
  Response:
  
  pstop issued for thread ‘gecard_ci_16’
  
  [test2]/hci/qdx5.2/integrator/test2/exec/processes/test2_ps16>netstat -an | g>
  
  tcp4 0 0 *.50867 *.* LISTEN
  
  tcp4 0 0 161.130.112.91.57814 161.130.112.91.50867 TIME_WAIT
  
  [test2]/hci/qdx5.2/integrator/test2/exec/processes/test2_ps16>netstat -an | g>
  
  tcp4 0 0 *.50867 *.* LISTEN
  
  ***********************************
- August 23, 2006 at 2:07 pm #59500
  Jim Kosloskey
  Participant
  Kevin,
  
  This is just a wild guess but could this port actually be in use by an application (even a Cloverleaf(R) inbound or outbound thread)?
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- August 23, 2006 at 2:22 pm #59501
  Kevin Scantlan
  Participant
  I can’t imagine another application having control of the port that’s in the cmd_port file. Additionally, it seems to happen with every process, and also with the monitor daemon when I do the hcisitectl. So it happens with hcicmd, hcienginerun, hcienginestop, and hcisitectl, to name a few. And it’s happening on both our production and test machines.
- August 24, 2006 at 2:21 pm #59502
  Kevin Scantlan
  Participant
  We came upon the solution while dealing with something we thought was unrelated. Here’s the deal:
  
  The hcicmd command has a “-h” parameter for the host, which we never use. If left off, it defaults to LOCALHOST. We have LOCALHOST set up in our hosts file pointing to 127.0.0.1, as we should. However, our sys admin tells us that AIX by default looks at the DNS first to resolve names, then looks at the hosts file. This was not a problem until someone on the network side added a DNS entry for localhost.xxx.yyy (xxx.yyy being our domain). As a result, we were trying to connect to that IP address to send the hcicmd to.
  
  Our solution has been to have AIX first look at the hosts file, then the DNS for name resolution. Our sys admin is looking into that. Hopefully it will not require a reboot of the server.
- August 24, 2006 at 2:34 pm #59503
  Anonymous
  Participant
  Kevin,
  
  As per log “A remote host refused an attempted connect operation”.
  
  On the destination system, it can connect to one port only.
  
  That means your prod and test interfaces to the destination system must use unique ports.
  
  I think that’s where the problem is.
  
  It looks like you posted the contents of the test log. The cmd_port file will not give you the exact port it is attempting to connect to.
  
  Look in the netconfig and then navigate to the destination thread’s properties.
  
  Look at the port number.
  
  Assume the destination port is 5556 and the host IP address is 111.33.44.55,
  
  then type the following:
  
  netstat -an | grep 5556
  
  you would see an entry like the following.
  
  tcp4 0 0 155.33.333.50162 111.33.44.55.5556 ESTABLISHED
  
  If the connection is not established, you would see LISTEN.
  
  Your log does not give much information.
  
  Once you enable_all, you need to bounce the threads and processes.
  
  When the panic occurs again, look in the log or upload the log to this forum, so that I can take a look at it and give you an idea. Both prod and test logs are needed to see why the panic occurred.
  
  Thanks
  
  Reggie
- May 8, 2013 at 3:16 pm #59504
  Peter Heggie
  Participant
  We just encountered the same problem – multiple commands sent to processes returning this error:
  
  Unable to contact process ‘xyz’ on port 12345.
  
  This was on two different physical servers.
  
  It also impacted a Tcl SMTP email function, which showed this error:
  
  error reading “sock49”: connection timed out
  
  Our DNS administrator found an entry for localhost, removed it, and flushed the cache. A few minutes later, all was well.
  
  Peter Heggie
  PeterHeggie@crouse.org
- May 9, 2013 at 12:56 pm #59505
  Donna Bailey
  Participant
  We had the same error a couple of weeks ago, when we had a process with a problem in a Tcl proc. When I stopped and started threads in my process, 2 or 3 threads showed this error. I ended up using ps -ef | grep threadname to find the process and killed it (I believe I had to use -9, too). I don’t know if this will help you or not.
  
  Donna
  
  Donna Bailey
  Tele: 315-729-3805
  dbailey@microstar.health
  Micro Star Inc.
- May 9, 2013 at 9:50 pm #59506
  Bob Richardson
  Participant
  Greetings,
  
  We are running AIX 6.1 TL 7 (Unix, of course).
  
  The Cloverleaf software uses the ephemeral port range as defined by your system admin; the default is 32K to 64K. Avoid using these ports for any interfaces developed in Cloverleaf.
  
  Hcicmd uses this range to get its port for communication with process threads. Cloverleaf multi-connect server threads (interfaces) also use the ephemeral range.
  
  Also, as a rule we avoid ports below 8K, as some system services use these ports for various utilities and services. Check your /etc/services file if you are curious.
  
  We are running the 5.8.5.0 Integrator and plan to apply Revision 6, which fixes some issues with hcicmd and server port problems.
  
  Hope this helps to narrow down your problem.
- May 10, 2013 at 12:22 pm #59507
  Peter Heggie
  Participant
  Thank you. The localhost definition in the network DNS was the problem; removing it and flushing the cache immediately solved our problem. None of our ports are below 8K or above 32K. And yes, when we had the problem, we had to use the command-line stop, which (eventually) used SIGKILL.
  
  Peter Heggie
  PeterHeggie@crouse.org
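
The root cause confirmed in this thread was `localhost` resolving through DNS to an address other than the loopback interface. Below is a minimal sketch of a check for that condition. It assumes Python 3 on the server, and it assumes Python's resolver follows the same system lookup order that hcicmd used, which the thread does not confirm. The function names are hypothetical.

```python
import ipaddress
import socket
from typing import List


def is_loopback(address: str) -> bool:
    """Return True if `address` is a loopback IP (127.0.0.0/8 or ::1)."""
    # Strip an IPv6 zone ID such as "%lo0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def non_loopback_addresses(name: str = "localhost") -> List[str]:
    """Return the addresses `name` resolves to that are NOT loopback.

    An empty list means every resolved address is a loopback address.
    """
    infos = socket.getaddrinfo(name, None)
    addresses = {info[4][0] for info in infos}
    return sorted(a for a in addresses if not is_loopback(a))
```

If `non_loopback_addresses()` returns an empty list, `localhost` resolved only to loopback addresses. Any address it does return is a candidate for the kind of stray DNS entry described above.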

The forum ‘Cloverleaf’ is closed to new topics and replies.