Hello,
In case anyone else is running CIS 20.1.1.3: we are running it on AIX 7.2 TL 4. We are currently in the upgrade process and started validating by sending messages through our new environment. While sending Epic PDF and RTF messages, we noticed messages getting stuck between threads that use protocol:tcpip with mllp2 encapsulation and a default timeout of 30 seconds. We use these thread pairs for internal interface traffic and when we need to route messages from one site to another. The PDF we found stuck and holding up traffic in the client thread was 6 MB, which doesn’t seem like an unreasonable size to me. We tried cycling the process and cycling both sides of the client/server threads, but nothing worked. Besides receiving PDFs from Epic, we have quite a few other systems that send PDFs to our engine; Epiphany, Digisonic, Varian, and Sunquest Copath reports are a few examples.
Below is some of the logging we enabled to get a look at what was going on behind the scenes.
[tcp :wrte:DBUG/0: TSO_trans_pb:06/13/2022 12:47:50] Start MLP v2 reply wait
[tcp :read:ERR /0: TSO_trans_pb:06/13/2022 12:48:20] Tcp MLLP2/USER2 send timed out waiting for reply.
[pd :pdtd:INFO/1: TSO_trans_pb:06/13/2022 12:48:20] Executing callback function for writing partial message
[pd :pdtd:INFO/1: TSO_trans_pb:06/13/2022 12:48:20] [0.0.112195] Writing message completed
[pd :thrd:INFO/0: TSO_trans_pb:06/13/2022 12:48:20] [0.0.112195] Requeuing undelivered message
[msg :Msg :INFO/0: TSO_trans_pb:06/13/2022 12:48:20] [0.0.112195] Updating the recovery database
[dbi :rlog:INFO/1: TSO_trans_pb:06/13/2022 12:48:20] [0.0.112195] Update msg in recovery db to state OB post-SMS
[pd :thrd:INFO/1: TSO_trans_pb:06/13/2022 12:48:20] OB-Data queue has 1 msgs
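For anyone who wants to measure the gap between the reply wait starting and the timeout firing in their own logs, here is a minimal Python sketch. It assumes the line layout seen in the excerpt above (a bracketed header ending in the thread name and a MM/DD/YYYY HH:MM:SS timestamp). The regex and the function names are my own illustration and are not part of Cloverleaf.

```python
import re
from datetime import datetime

# Assumed header layout, inferred only from the excerpt above:
# [module:sub:LEVEL/n: thread:MM/DD/YYYY HH:MM:SS] text
LINE_RE = re.compile(
    r"^\[(?P<header>.*?):\s*(?P<thread>[^:\]]+):"
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(?P<text>.*)$"
)


def parse_line(line):
    """Return (thread, timestamp, text), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S")
    return m.group("thread"), ts, m.group("text")


def reply_wait_seconds(lines):
    """Pair each reply-wait start with the next timeout on the same thread."""
    started = {}
    waits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        thread, ts, text = parsed
        if "timed out waiting for reply" in text:
            if thread in started:
                elapsed = (ts - started.pop(thread)).total_seconds()
                waits.append((thread, elapsed))
        elif "Start MLP v2 reply wait" in text:
            started[thread] = ts
    return waits


sample = [
    "[tcp :wrte:DBUG/0: TSO_trans_pb:06/13/2022 12:47:50] Start MLP v2 reply wait",
    "[tcp :read:ERR /0: TSO_trans_pb:06/13/2022 12:48:20] Tcp MLLP2/USER2 send timed out waiting for reply.",
]
print(reply_wait_seconds(sample))
```

On the two lines from the excerpt, the gap works out to exactly the 30 second timeout, which suggests no reply arrived inside the window rather than the send itself failing.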
We ended up opening a ticket with Infor and have been working with support. They had me change the timeout setting under Data Options > Encapsulated > Configure > Timeout from 30 seconds to 120. Note: before that we had tried bumping it up to 60 and 90 seconds with no luck, and then Infor recommended 120. After making this change the message was finally sent, but only after a few minutes. Also, depending on the system, the message may then be sent to another pair of threads that use this same mllp2 configuration (to avoid process-to-process communication), which adds another few minutes to the delay.
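To make clear what the reply timeout is guarding, here is a rough Python sketch of a sender that frames a message, sends it, and waits for a commit acknowledgement. I don't know how Cloverleaf implements mllp2 internally; this assumes it follows the HL7 MLLP Release 2 block format (start block 0x0B, end block 0x1C plus 0x0D, commit ACK 0x06, NAK 0x15). The function names and the in-process socket pair are purely illustrative.

```python
import socket
import threading

SB, EB, CR = b"\x0b", b"\x1c", b"\x0d"  # start block, end block, carriage return
ACK, NAK = b"\x06", b"\x15"             # commit ack / negative ack (MLLP Release 2)


def frame(payload):
    """Wrap a payload in MLLP block delimiters."""
    return SB + payload + EB + CR


def read_block(sock, timeout):
    """Read one framed block. The timeout applies to each recv call."""
    sock.settimeout(timeout)
    buf = bytearray()
    while not buf.endswith(EB + CR):
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("peer closed before end of block")
        buf += chunk
    if not buf.startswith(SB):
        raise ValueError("block does not start with SB")
    return bytes(buf[1:-2])


def send_and_wait(sock, payload, reply_timeout):
    """Send one message, then wait up to reply_timeout seconds for the reply."""
    sock.settimeout(None)  # block until the whole message is written
    sock.sendall(frame(payload))
    try:
        reply = read_block(sock, reply_timeout)
    except socket.timeout:
        return "timeout"  # the sender would requeue the message here
    return "ack" if reply == ACK else "nak"


def demo(size, reply_timeout=30.0):
    """Send `size` bytes to an in-process receiver that acknowledges it."""
    a, b = socket.socketpair()

    def receiver():
        read_block(b, timeout=None)  # read the full message first
        b.sendall(frame(ACK))        # then commit-acknowledge it

    t = threading.Thread(target=receiver, daemon=True)
    t.start()
    try:
        return send_and_wait(a, b"x" * size, reply_timeout)
    finally:
        t.join(timeout=5)
        a.close()
        b.close()


print(demo(6 * 1024 * 1024))
```

In this model the receiver can only acknowledge once it has read the whole block, so anything that slows the read of a large message on the receiving side shows up on the sending side as a reply timeout.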
Back on CIS 6.0 and previous QDX versions we used PROTOCOL:pdl-tcpip with PDL: mlp2_tcp.pdl, but had switched to protocol:tcpip per Infor’s recommendation. I tried switching back to the PDL protocol and confirmed the 6 MB PDF was sent in a few seconds. We also tried a 44 MB PDF and it took around 20 seconds, though I figure some of that time is the message being read in.
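Putting rough numbers on that comparison, here is a small back-of-the-envelope calculation in Python. It uses only the approximate figures from this post (44 MB in around 20 seconds over the PDL protocol, and a 6 MB message that did not complete inside the 30 second timeout over protocol:tcpip), and it treats the whole send-plus-acknowledge round trip as a single rate, which is a simplification.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Effective rate for one message, send and acknowledgement combined."""
    return size_mb / seconds


# PDL protocol: 44 MB in roughly 20 seconds
pdl_rate = throughput_mb_per_s(44, 20)

# At that rate, how long should a 6 MB message take?
expected_6mb_seconds = 6 / pdl_rate

# protocol:tcpip mllp2: 6 MB did not finish within 30 seconds,
# so the effective rate was below this ceiling
tcpip_rate_ceiling = throughput_mb_per_s(6, 30)

print(f"PDL rate: about {pdl_rate:.1f} MB/s")
print(f"Expected time for 6 MB at that rate: about {expected_6mb_seconds:.1f} s")
print(f"tcpip/mllp2 effective rate: below {tcpip_rate_ceiling:.1f} MB/s")
```

The 6 MB estimate lines up with the "few seconds" we saw over PDL, while the mllp2 path comes out at least an order of magnitude slower on the same message.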
We met with Infor and they went over another option, inter-site routing, which also appeared to work. However, we had some issues with the setup and experienced quite a few lengthy delays when going through the configuration points in CIS 20.1. I feel the CIS 20.1 help document for inter-site routing needs to be updated to state specifically what needs to be cycled to get this working (host server, monitor daemon, etc.); that is still unclear to me. We also worked through a few ICL thread errors, and we are not sure why they occurred. Inter-site routing is another option for site-to-site communication, but we would still have the issue with any tcpip threads used to avoid process-to-process communication within a site, which we use quite heavily in our interface engine.
Infor is going to escalate this issue and test it with a few large PDF messages to see if this is an application defect. One other note: we are currently running CIS 6.2.6.1, and this issue does not exist there even with protocol:tcpip mllp2 set with a timeout of 30 seconds, whereas in CIS 20.1 the message stays stuck in the client thread.
Jeff