perplexing irratic ftp behavior

Clovertech Forums Read Only Archives Cloverleaf Cloverleaf perplexing irratic ftp behavior

  • Creator
    Topic
  • #50082
    Jennifer Hardesty
    Participant

      Greetings,

      I’m hoping one of the brilliant minds here can sort out what has become a most perplexing problem.

      We have a number of batch billing jobs using fileset-ftp, which have been running for years without manual intervention.  As we’ve needed a new one, we’ve simply copied the threads, process, tclprocs, translators, etc and made any necessary window-dressing changes for the system/department involved and life has continued pretty smoothly…until now.  The latest one has been in production for two weeks and has required manual intervention every day.  

      What appears to be happening is that that advanced scheduling does not work with the ftp for this process.  Sometimes it looks like it just doesn’t trigger at all and sometimes it looks like it tried to open a connection and the thread goes into “opening status” and never comes out.  However, that’s just what we see on the surface.  Once I turned on sane_eo, things looked even more perplexing to me.  It actually looks to me as if the process connects to the ftp site when I start it up, even through that is unnecessary, and then tries to hold onto that connection until the scheduled time.  I suspect the connection is being lost but the process never recognizes that officially and that is what is causing the actual fetching of the data when the time comes to fail.

      Here are our basics: Cloverleaf 5.4 (AIX v.5 r.2)

      Under the Fileset Options tab in the inbound thread:

      Inbound

      Directory: /

      Style: nl

      Directory Parse: tps_ftp_impact_dft

      Deletion: tps_ftp_no_delete   (we are using this currently only because of the inconsistant behavior of the process)

      Scheduling

      Read Interval: 5

      Max Messages: 500

      Use Advance Sheduling: checked [The shedule is currently set to 19:10:00, but we have been moving it daily (often during working hours) due to having to watch it; we would like it to be 19:30:00 M-F]

      Outbound

      Style: nl

      On the FTP options tab:

      The login and password are filled in correctly.  I can use these to ftp into the site using an ftp tool and inconsistantly through the process.

      FTP Options

      Directory: impact/

      Dir List Command: nlst

      Read Time: 5

      Pulse Time: 500

      Close Connection After Write: checked

      Delay Connection Until Needed   (I don’t think this works as I think it should)

      Data Type: ASCII

      FTP Host Information

      Host: [not the ip address but the host name of the computer]

      Port: ftp

      ######

      I’m including the tclproc for the directory parse.  I had a flag in there to not read the directory on start up but since it won’t work reliably on a timer, I now have to manually start and stop it when asked, so you’ll see the flag commented out.  Maybe someone will see something in there that is buggy.  Also maybe someone can suggest how I can keep it from connecting at startup, which it was doing with or without the flag.

      I’m also including two sane_eo logs I saved.  They were run back to back and prove that the process on a timer is unpredictable.  All I did between stopping and starting it was change the time in the schedule and one time it worked and one time it didn’t.  The time it didn’t work, you can see where it had and unexpected read and failed — that time, the thread just stayed green and then when the scheduled time occured, it went into “openning” and stayed that way.  No disernable error was generated, nothing that would have generated an alert.

      There’s only one more thing I want to add that is different about this connection from the others; it’s something I have no control over and know nothing about.  Aparently our “network guys” set this ftp folder up through a “virtual conduit” so it’s not on the actual server that’s generating the billing files.  I’m told that it should make no difference and that we “see” exactly the same as we would if it was an ftp folder on the actual server.  Has anyone else had experience with this?

      I sure hope someone can provide some wisdom and advice.  I’ve been banging my head on this one for two weeks and I’ve tried every piece of advice I thought I could gleam from this forum.

      Thanks in advance,

      Jennifer Hardesty

      Systems Interface Specialist

      Maine Medical Center

      hardej@mmc.org

      (207) 662-3955

    Viewing 10 reply threads
    • Author
      Replies
      • #64812
        Michael Hertel
        Participant

          First, in the “did not work” file I see:

          [tps :tps :INFO/1:  fr_impac_ft:05/29/2008 14:06:39] tds.string = tps_ftp_impac_dft {CONTEXT fileset_ibdirparse} {ARGS {}} {MODE start} {VERSION 3.0}

          [tps :tps :ERR /0:  fr_impac_ft:05/29/2008 14:06:39] ‘1’ (returned by ‘tps_ftp_impac_dft ‘) does not match { }

          Something bombed in start mode.

          Then I see:

          [tps :tps :INFO/1:  fr_impac_ft:05/29/2008 14:06:46] tds.string = tps_ftp_impac_dft {MSGID message0} {CONTEXT fileset_ibdirparse} {ARGS {}} {MODE run} {VERSION 3.0}

          impac_dft_startup:  1

          [msg :Mid :DBUG/0:  fr_impac_ft:05/29/2008 14:06:46] mid free [0.0.1133256] 0x20c51bc8

          [fset:read:DBUG/1:  fr_impac_ft:05/29/2008 14:06:46] Scheduling IB for 5 second interval

          [is  :ISEv:INFO/0:  fr_impac_ft:05/29/2008 14:06:46] *** Next Interval = 194, Event(1) time = Thu May 29 14:10:00 2008

          [fset:cont:INFO/1:  fr_impac_ft:05/29/2008 14:06:46] Processing FTP ‘Idle’ control state

          [pd  :pdtd:INFO/1:  fr_impac_ft:05/29/2008 14:06:46] Set driver status to PD_STATUS_UP

          [fset:read:DBUG/0:  fr_impac_ft:05/29/2008 14:06:46] The socket is ready.  Out of files; setting IB scan timer

          [is  :ISEv:INFO/0:  fr_impac_ft:05/29/2008 14:06:46] *** Next Interval = 194, Event(1) time = Thu May 29 14:10:00 2008

          [fset:read:INFO/0:  fr_impac_ft:05/29/2008 14:06:46] Integrated Scheduling: next event(0) will occur at Thu May 29 14:10:00 2008

          [fset:read:DBUG/1:  fr_impac_ft:05/29/2008 14:06:46] Scheduling IB for 194 second interval

          [is  :ISEv:INFO/0:  fr_impac_ft:05/29/2008 14:06:46] *** Next Interval = 194, Event(1) time = Thu May 29 14:10:00 2008

          [pd  :pdtd:INFO/1:  fr_impac_ft:05/29/2008 14:06:46] Set driver status to PD_STATUS_TIMER

          [fset:cont:INFO/1:  fr_impac_ft:05/29/2008 14:09:39] Processing FTP ‘Unexpected read’ control state

          [fset:cont:INFO/1:  fr_impac_ft:05/29/2008 14:09:39] Processing FTP ‘Response read setup’ control state

          [fset:cont:INFO/1:  fr_impac_ft:05/29/2008 14:09:39] Processing FTP ‘Read response’ control state

          [fset:cont:INFO/2:  fr_impac_ft:05/29/2008 14:09:39] 5 second select() awaiting response

          [fset:cont:DBUG/1:  fr_impac_ft:05/29/2008 14:09:39] Parsing ‘421 Timeout (120 seconds): closing control connection.

          421 Terminating connection.

          The connection timed out.

          It looks like in both “start” and “fileset_ibdirparse” modes that you are not returning a message handle to the engine.

          Try putting the templated “set dispList {}” and “return $dispList” in their appropriate areas of the proc.

          Also, a quick glance looks like you’ve commented out the “}” of the run portion of the proc. Check to make sure your braces match up.

          Hope this helps.

          -mh

        • #64813
          Russ Ross
          Participant

            I would see if I could set up a FTP test without going through your “virtual conduit”, especially if this integration is the first one to use the “virtual conduit” and all the others are a direct FTP .

            If that test is doable and works fine, then you can bring some evidence to the table that the so called “virtual conduit” is not all that transparent.

            At least you will begin to know if chasing your tail in the cloverleaf world is a waste of time.

            I have observed undependable (that’s putting it nicely, actually more like unuseable) behavior when doing indirect or routed FTP like you might be doing with your “virtual conduit”.

            Consequently, at this time we only do direct FTP to staic IPs that we harcode because our DNS servers are also unreliable.

            Somehow I want to believe that virtual FTP routing would work but I feel it will take more understanding than just reading the cook book 101 to get it working.

            Russ Ross
            RussRoss318@gmail.com

          • #64814
            Michael Hertel
            Participant

              More on what I see:

              In the “does_work”file I see:

              [tps :tps :INFO/1:  fr_impac_ft:05/29/2008 14:40:00] tds.string = tps_ftp_impac_dft {MSGID message0} {CONTEXT fileset_ibdirparse} {ARGS {}} {MODE run} {VERSION 3.0}

              impac_dft_startup:  0

              ***

              *** 05/29/08 @ 14:40:00 ***

              ***

              *** tps_ftp_impac_dft – Remote files found to process: IMPC_Bath_036.chg IMPC_Port_012.chg IMPC_Scarb_312.chg

              ***

              I don’t see an echo statement in the proc you supplied for echoing the startup state (impac_dft_startup:  0)

              So I am assuming you are not using the same proc for a successful run vs failed run. Comparing those outputs without seeing both procs will put me on a fishing expedition.

              I propose maybe starting over and rebuilding this interface.

            • #64815
              Jennifer Hardesty
              Participant

                One of the things I really don’t understand, after reading and rereading the documentation, is the checkbox for “Delay Connection Until Needed”.  According to the documentation, this sets a flag FTPDELAY, which should actually cause the connection to not attempt to connect until it’s needed, which presumably be when the advanced scheduler triggers it.  However, the connection is made as soon as the thread is started and every time the thread cycles through (which I believe is the Pulse setting) it rechecks this connection using the “run” mode.

                I’ve tried setting a flag in the time { } section but nothing I enter in there seems to ever be executed.

                Michael – I have to admit that I do wonder about this line:

                Code:

                [tps :tps :ERR /0:  fr_impac_ft:05/29/2008 14:06:39] ‘1’ (returned by ‘tps_ftp_impac_dft ‘) does not match { }

                It does seem to correspond with my setting the startup flag to 1, but I can’t figure out why that simple set statement would trip up.

                I’ve added “set dispList {}” and “return $dispList” and everything that goes along but it doesn’t seem to help or add anything to the log.

                The commented out “}” is actually attached to the commented out if-else above which I explained in my initial posting — I’ve had to comment out my startup flag that kept tried to keep the tclproc from pulling the files on startup so it would only work on the timed schedule; since the timed schedule piece is unreliable I am having to stop and start the  thread manually every day.

                Russ — Thank you for verifying some of what I was thinking.  Unfortunately I am having the worst time convincing our networking people either give us a direct ftp connection (security issue) or push the files via ftp to our server (which we already do in 3 other instances).

              • #64816
                Charlie Bursell
                Participant

                  The “Delay Connection Until Needed” flag is only useful with an outbound connection.  It is usually used in conjuction with the “Close Connecton after write” flag.

                  What this means is to not try to connect until a message is present and then disconnect after the message is ent.  It has nothing to do with the scheduler which would normally be used for inbound connections.

                  The Pukse time field, which is no longer there with newer versions, is there to maintain the connection.  A lot of FTP servers will disconnect if nothing is sent in a given amount of time.  The Pulse time, defualt 500 Millliseconds is to periodically send a NOP to keep the connection up.

                  I hope this helps

                • #64817
                  Russ Ross
                  Participant

                    Charlie:

                    You reminded me about the FTP connection time out problem I had with a particular server.

                    Their timeout setting even effected an active FTP thus preventing anyone from moving a giant file.

                    Russ Ross
                    RussRoss318@gmail.com

                  • #64818
                    Jennifer Hardesty
                    Participant

                      I ended up having to set the Pulse time to 100 milliseconds to fix the problem.  Basically, the connection stays in “up” status all of the time now and never goes to “timer”, but it’s the only way it works consistently.  

                      Charlie, if the Pulse time field isn’t there in future versions, how would I resolve this same issue?

                    • #64819
                      Jennifer Hardesty
                      Participant

                        We now have our second instance of this issue.  It seems to only occur on when the ftp folder we are pulling from is on a Windows 2003 Server.

                        When the process first launches, it automatically ftps/logs in to the folder and does an ls, then closes the connection.  saneo states what the interval is to the next scheduled login.  However, 2 1/2 minutes later, there appears to be a time out with an “unexpected read” error.  The log messages say the connection is closed but the process status remains in the “timer” status until it tries to open and connect at the scheduled time; then it begins the process of trying to connect to the ftp folder with no response and gets stuck with an “openning” status.

                        Processing SOCKET (FileSet/FTP Control socket) event 0x2125cad8

                        Calling cb 0x200aa9d8

                        Processing FTP ‘Unexpected read’ control state

                        Processing FTP ‘Response read setup’ control state

                        Processing FTP ‘Read response’ control state

                        5 second select() awaiting response

                        Parsing ‘421 Timeout (120 seconds): closing control connection.

                        421 Terminating connection.

                        We suspect this might be a security set-up issue with the new Windows servers as nothing has changed on our AIX Cloverleaf box and all of the other ftp billing processes continue to work fine.

                        Jennifer Hardesty

                      • #64820
                        Lawrence Williams
                        Participant

                          In an earlier post the question was asked  “if the Pulse time field isn’t there in future versions, how would I resolve this same issue?”

                          I did not see a reply to that question and I now have the same question, if the option is not there in 5.7 how is it handled ?

                        • #64821
                          Jennifer Hardesty
                          Participant

                            I don’t have an answer for Lawrence — sorry! — but I am interested in the answer since we plan to upgrade to 5.7 with the next 6 months or so.

                            However, I did want to follow up that this issue was resolved by increasing the interval on the Windows box itself.  The set up that the Windows 2003 servers arrive with is set incredibly low — below actual standards.  The Windows PC folks changed their value to “800” and it works fine, but now we have to go through the whole thing again and again every time they add a Windows 2003 box into the mix without telling us and we always have to re-prove it.

                            By the way, Lawrence, I’m from down around your parts originally.  I grew up in New Orleans & Metairie and I worked a while at EJGH.

                          • #64822
                            Mark Perschbacher
                            Participant

                              Just my two cents here.  Last year, I set up a fileset-ftp thread, and it worked ok for a couple weeks.  We lost connectivity over the VPN, and the thread went into an error state.  It caused the process to panic, and the process started sending out multiple copies of the outbound messages, very ugly.  I opted to set up scheduled jobs with Windows Scheduler that execute batch files with ftp code that grab fileset-local files.  Works great.

                          Viewing 10 reply threads
                          • The forum ‘Cloverleaf’ is closed to new topics and replies.