Locked SMAT file issue


  • #54194
    Mike Strout
    Participant

      Here is a new one I haven’t found an answer for in my searching. I have a SMAT file that throws the following error when I try to open it.

      An error occurred while checking the version of the opened index file…

      The Process cannot access the file because it is being used by another process.

      I tried to solve the problem by…

      Restarting the process

      Cycling the SMAT for the thread

      When I attempt to read the file, I get the same error.

      Next, I tried turning off logging on the thread, restarting the process, turning logging back on, and then restarting the process again.

      Still no luck.

      Next, I changed the name of the target file in the thread’s properties page and then restarted the process. This worked, but I don’t like the fact that the file is not named the same as the thread, so I changed it back to the original name and restarted the process. That darn error popped up again.

      BTW, I am running 6.02 on AIX 6.x

      Any thoughts?

      Mike

      <><

    Viewing 4 reply threads
      • #80532
        Russ Ross
        Participant

          Check that none of the other threads in the site are using the same SMAT file name that appears to be locked.

          This happens sometimes when you copy a thread and overlook changing the SMAT file name when configuring the copy.

          I thought that, as of Cloverleaf 6.0, saving the NetConfig catches this oversight, but in the older Cloverleaf 5.6 it went unnoticed.

          Russ Ross
          RussRoss318@gmail.com

        • #80533
          Mike Strout
          Participant

            I opened the NetConfig in UltraEdit and did a search for the file name in question and it wasn’t found, telling me that none of the processes in the site are trying to write to it. At the time, outbound SMAT was pointed to a different file.
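
            The same search can be done from the command line. A minimal sketch, assuming you pass in the SMAT base name and the path to the site's NetConfig yourself; the function name `find_smat_refs` and both arguments are placeholders, not part of Cloverleaf:

```shell
#!/bin/sh
# Hypothetical helper: list every line of a NetConfig that mentions a given
# SMAT file base name. Both arguments are supplied by the caller.
find_smat_refs() {
    name=$1        # SMAT base name to look for (placeholder)
    netconfig=$2   # path to the site's NetConfig file (placeholder)
    # -F: treat the name as a fixed string; -n: prefix matches with line numbers
    grep -n -F -- "$name" "$netconfig"
}
```

            Calling `find_smat_refs some_smat_name /path/to/NetConfig` prints the matching lines with their line numbers. No output (and a non-zero exit status) means the name does not appear anywhere in that file.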

            From there, I went into the directory and deleted the ecd, idx and msg files and got no warning about them being in use. I then restarted the process to see if it would recreate them, a sure sign that they were referenced somewhere else, but they didn’t get recreated. I then pointed the outbound SMAT back to the troublesome file and restarted the thread. The new files were created as expected, and after a few seconds data started flowing into them. Also as expected, when I attempted to view the messages in the files via the SMAT tool, I got the “used by another process” error. Grrrr.

            I guess it is time to open a ticket. 🙁

            Mike

            <><

          • #80534
            Russ Ross
            Participant

              Another thing that comes to mind to check: see if you have multiple instances of the same process running for the same site.

              On my AIX platform I would do something like this to check for that possibility:

              # stop all processes in the troublesome site by normal methods then do

              ps -ef | grep my_site_name

              # if any of the hits look like a rogue instance of a process for this site, use “kill -9 PID#” on it

              # now restart site processes as normal and see if lock problem has gone away
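
              A plain grep for the site name also matches the grep command itself and anything else that happens to mention the site. A rough sketch of a narrower filter, assuming engine processes are started as "hciengine -S <site> ..." as in the ps output quoted later in this thread; the function name `site_engines` is made up for illustration:

```shell
#!/bin/sh
# Filter "ps -ef" style output (read from stdin) down to hciengine
# processes started for one site. Assumes the site appears as "-S <site>"
# on the command line; adjust if your engine is launched differently.
site_engines() {
    site=$1
    awk -v site="$site" '
        /hciengine/ {
            # Look for the "-S" flag followed by exactly the requested site
            for (i = 1; i < NF; i++)
                if ($i == "-S" && $(i + 1) == site) { print; break }
        }'
}
```

              Run it as `ps -ef | site_engines my_site_name` after stopping the site. Any line it prints is an engine process for that site that is still alive.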

              Russ Ross
              RussRoss318@gmail.com

            • #80535
              Russ Ross
              Participant

                I also remembered I have used lsof to determine what process is accessing a given port number when there is contention.

                I decided to see if lsof might be useful for determining what process has a file open and maybe locked.

                Here is an interactive snippet of me using lsof to get the PID that is accessing a SMAT msg file and then using “ps -ef” to determine more about what that process is.

                Code:

                (hci@dopilhub1b) /cloverleaf/cis6.0/integrator/prod_cbord/exec/processes/global_adt
                > ls -al
                total 11920
                drwxrwxr-x    2 hci      staff          4096 May  9 12:35 .
                drwxrwxr-x    4 hci      staff           256 Jul 26 2013  ..
                -rw-rw-r--    1 hci      staff            33 Oct 18 2013  .tps_check_reply_2.ob_cbord_adt.ctr
                -rw-rw-r--    1 hci      staff            33 May  9 12:35 .tps_msh_13_seq_num.ob_cbord_adt.ctr
                -rw-rw-r--    1 hci      staff            33 May  9 12:35 .tps_msh_seq_num.ob_cbord_adt.ctr
                -rw-rw-r--    1 hci      staff            33 May  9 12:35 cbord_adt.ctr
                -rw-rw-r--    1 hci      staff             6 Apr 13 02:24 cmd_port
                -rw-rw-r--    1 hci      staff            40 Apr 13 02:15 exit_log
                -rw-rw-r--    1 hci      staff            77 May  8 23:36 global_adt.err
                -rw-rw-r--    1 hci      staff           164 May  8 23:36 global_adt.err.old
                -rw-rw-r--    1 hci      staff         36394 May  9 12:33 global_adt.log
                -rw-rw-r--    1 hci      staff         65423 May  8 23:36 global_adt.log.old
                -rw-rw-r--    1 hci      staff            35 Oct 25 2013  jr_global_adt_18310.in.ecd
                -rw-rw-r--    1 hci      staff       1652958 May  9 12:35 jr_global_adt_18310.in.idx
                -rw-rw-r--    1 hci      staff       3707165 May  9 12:35 jr_global_adt_18310.in.msg
                -rw-rw-r--    1 hci      staff            35 Oct 25 2013  ob_cbord_adt.in.ecd
                -rw-rw-r--    1 hci      staff        146196 May  9 12:35 ob_cbord_adt.in.idx
                -rw-rw-r--    1 hci      staff         17112 May  9 12:35 ob_cbord_adt.in.msg
                -rw-rw-r--    1 hci      staff            35 Apr 13 02:32 ob_cbord_adt.out.ecd
                -rw-rw-r--    1 hci      staff        146196 May  9 12:35 ob_cbord_adt.out.idx
                -rw-rw-r--    1 hci      staff        182723 Oct 29 2013  ob_cbord_adt.out.len10
                -rw-rw-r--    1 hci      staff         39051 May  9 12:35 ob_cbord_adt.out.msg
                -rw-rw-r--    1 hci      staff             9 Apr 13 02:24 pid
                -rw-rw-r--    1 hci      staff           352 Apr 13 02:24 startup_log
                -rw-rw-r--    1 hci      staff             9 Apr 13 02:24 wpid

                (hci@dopilhub1b) /cloverleaf/cis6.0/integrator/prod_cbord/exec/processes/global_adt
                > lsof jr_global_adt_18310.in.msg
                lsof: WARNING: compiled for AIX version 6.1.3.0; this is 6.1.0.0.
                COMMAND        PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
                hciengine 16580794  hci   33wW VREG   42,5  3721809 39936 jr_global_adt_18310.in.msg

                (hci@dopilhub1b) /cloverleaf/cis6.0/integrator/prod_cbord/exec/processes/global_adt
                > ps -ef | grep 16580794
                    hci 12386362  9044166   1 12:35:27  pts/4  0:00 grep 16580794
                    hci 16580794 14614578   0   Apr 13      - 35:16 /cloverleaf/cis6.0/integrator/bin/hciengine -S prod_cbord -p global_adt -s jr_global_adt_18310 ob_cbord_adt

                (hci@dopilhub1b) /cloverleaf/cis6.0/integrator/prod_cbord/exec/processes/global_adt
                >
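
                The two manual steps above (lsof on the file, then ps on the PID) can be chained. This is only a sketch, assuming the default lsof column layout shown in the session, where the PID is the second field; the function names `lsof_pids` and `who_has_open` are invented for this example:

```shell
#!/bin/sh
# Print the unique PIDs found in lsof output read from stdin.
# Assumes the default lsof layout, where PID is the second column;
# header and warning lines are skipped because their second field
# is not a number.
lsof_pids() {
    awk '$2 ~ /^[0-9]+$/ { print $2 }' | sort -u
}

# Show the full ps entry for each process that has the given file open.
who_has_open() {
    lsof "$1" 2>/dev/null | lsof_pids | while read -r pid; do
        ps -f -p "$pid"
    done
}
```

                For example, `who_has_open jr_global_adt_18310.in.msg` run from the process directory would print the ps entry of each process holding that file. If it prints nothing, lsof found no process with the file open.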

                Russ Ross
                RussRoss318@gmail.com

              • #80536
                Kevin Kinnell
                Participant

                  I bet this “lock” would go away with a cleanup.  You won’t see the pid accessing a smat file if there isn’t actually any access going on.  Just sayin’…
