Dan Goodman

Forum Replies Created

  • in reply to: Tunning of Parameters to reduce CPU usage on AIX #56544
    Dan Goodman
    Participant

      What command are you using to deduce that your usage is in that range?

      BTW, your swap space looks low, probably unrelated. What does lsps -a show?

      I vaguely recall that memory usage on AIX is misleading in that it includes memory buffering held in reserve by the O/S that is not actually in use by the application.
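      For what it's worth, a rough sketch of how I'd check it from the command line (these are standard AIX 5.x tools):

        # Paging space usage -- the lsps numbers mentioned above
        lsps -a

        # Global memory snapshot: "work" pages are computational memory,
        # "pers"/"clnt" pages are file cache held by the O/S
        svmon -G

        # CPU over time: the us/sy/id/wa columns show where the cycles really go
        vmstat 5 5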

      😎

      in reply to: Linux Virtual Memory is too full #56680
      Dan Goodman
      Participant

        And how much swap space do you have?

        Red Hat recommends 1.5 to 2.0 x installed memory (that's for RHEL4; not sure about RHEL3).
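        As a worked example of that rule: with, say, 4 GB of RAM installed, that would put swap at 6 to 8 GB. To see what you actually have configured:

          # Installed memory and swap, in megabytes
          free -m

          # Each active swap device/partition and its size
          swapon -s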

        😎

        in reply to: Using a SAN #56195
        Dan Goodman
        Participant

          I would also like to know the answer to the licensing question when mirroring to a remote server for recovery. We are in the preliminary planning stages for doing this.

          8)

          in reply to: Loss of data during crash #56587
          Dan Goodman
          Participant

            Sort of tangential to this discussion, but we elected to mirror a local physical drive and a SAN drive. When a SAN outage occurs, AIX issues an error report (errpt command), and the SAN side of the mirror goes stale, but the data is captured locally.

            When the SAN connection is restored, AIX (5.2) resyncs the SAN disk from the up-to-date local disk.
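            For reference, a sketch of the commands involved (the volume group name here is just an example; AIX normally starts the resync on its own):

              # Review the SAN-related errors logged during the outage
              errpt | more

              # Logical volumes with a stale copy show "stale" in LV STATE
              lsvg -l appvg

              # Kick off the resync manually if it doesn't start by itself
              syncvg -v appvg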

            Sometimes a belt and suspenders isn’t enough, and a piece of twine is all you have left to hang on to. Overall, the SAN is more robust when operational, but is subject to more frequent maintenance outages than local disk.

            in reply to: files in revision directory #56440
            Dan Goodman
            Participant

              While we are on the subject of revisions, would it be possible to give the timestamps on the revised NetConfigs a fixed width for hours, minutes, months and days?

              Yes, I know you can look at the Unix timestamp, if it hasn’t been stepped on, to see the actual date, but does the fragment “121” refer to December 1st or January 21st? Why not “1201” and “0121”?
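              For illustration only (not how the revision names are built today), zero-padded fields keep the names fixed-width and unambiguous:

                # "121" could be either date; %m%d%H%M always gives eight digits,
                # e.g. 12011430 for Dec 1 at 14:30 and 01211430 for Jan 21
                date +%m%d%H%M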

              As a sidenote, why do Unix programmers have trouble with holiday scheduling?

              Ans.  Because Oct 31 = Dec 25!

              in reply to: AIX Authority and Permissions #56316
              Dan Goodman
              Participant

                In a Sarbanes-Oxley/HIPAA world, it is better from an audit perspective to separate out the admin duties from the application duties.

                Where there is a need for root, it is usually for a third party package install, in which case, it is best if a sysadmin is involved, rather than just tossing the software on the box, and hoping no conflicts arise.

                If all the permissions are set as QDX recommends, there shouldn’t be a need to have root access to manage all QDX code and commands. If they are not set right, it is better to set them right. I think QDX has a utility for this now, but it used to be “here is the specification, you implement it.”

                If none of the above are sufficient, the admin can always configure a free utility, sudo, to allow program execution only for selected (necessary) programs.
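                As an example, a minimal sudoers entry (edited via visudo; the group name and command list here are just placeholders):

                  # Let members of group "hci" run only these two commands as root
                  %hci    ALL = (root) /usr/sbin/mount, /usr/sbin/umount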

                in reply to: High Availability #56341
                Dan Goodman
                Participant

                  HA/XD is “shorthand” for HACMP/XD, which is the replacement for HA/GEO.

                  “Back in the day”, IBM Global Services typically would come in and do all the requirements analysis, etc., then put up a canned two node config.

                  Supposedly, with HA/XD, there is an “express” configuration option for the common two-node solution. (Think: no consulting fees for installation and setup.)

                  But for us, it *could* be the case that we would eventually have three machines, with intermachine “health check” heartbeats.

                  With the appropriate failover capability in the app, I would think three machines would be the minimum that could reasonably detect and lockout a failing node automatically.

                  Here is a link to the IBM Announcement letter for HA/XD:

                  205-085: IBM HACMP/XD expands its business continuity solution with improved, simpler geographic-distance data mirroring and disaster recovery

                  http://www.ibm.com/isource/cgi-bin/goto?it=usa_annred&on=205-085HA

                  in reply to: High Availability #56338
                  Dan Goodman
                  Participant

                    I am also interested, especially from the point of view of application level support for fast failover.

                    We have a tested home-grown solution to rehost our production site onto a backup platform, without a reboot.

                    We are also looking at HA/XD, but expect that application-level support is required for it to do more for us than we are doing for ourselves now.

                    We have thoroughly tested our failover (rehosting) mechanism on our test site, and are planning to go live on a new p615 running AIX 5.2L and Platform 5.3 (zero rev?).

                    in reply to: Cloverleaf 5.3 Supported Platform List #56087
                    Dan Goodman
                    Participant

                      Will 5.4 include integration of the Multisite Monitor with the Advanced Security Server?

                      Will it include the LDAP-less Advanced Security Server?

                      in reply to: Board Reorganization #56274
                      Dan Goodman
                      Participant

                        Keep a separate forum for O/S-related topics.

                        in reply to: Linux vs. AIX #56132
                        Dan Goodman
                        Participant

                          We’re currently doing a new (non-IE) system here with lots of Lintel, and lots of HW and SW issues so far.

                          If you go with AIX, you can mix and match Linux in, as AIX supports running Linux applications on top of AIX.

                          With respect to ODBC, I suspect the Perl ODBC package could be used on either platform. I have set it up on an AIX box for a DBA. I’d be very surprised if it didn’t work equally well on Linux.
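                          If the package in question is the usual DBI/DBD::ODBC pair, the install is roughly this (a sketch; it assumes a compiler and an ODBC driver manager are already on the box):

                            # Pull the database interface and the ODBC driver from CPAN
                            perl -MCPAN -e 'install DBI'
                            perl -MCPAN -e 'install DBD::ODBC'

                            # Quick check of which drivers are installed
                            perl -MDBI -e 'print join("\n", DBI->available_drivers), "\n"'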

                          I like the system management tools, robustness and maturity of the AIX platform, but if you’re looking to blaze new trails, go Linux.

                          Personally, for a large hospital production environment, I would stick to AIX, but a lot of it depends on the business continuity requirements of your applications.

                          Linux is a nice platform and “ready for prime time” in many respects, but to use a military metaphor, I’d rather ride into combat in an agile tank with lots of defensive weapons than roll up in the (in)famous Bradley Fighting Vehicle. To me, AIX is the more armored and better defended of the two. Linux can also do the job, but the robustness of the AIX HW platforms is hard to beat.

                          in reply to: Cloverleaf 5.3 Supported Platform List #56083
                          Dan Goodman
                          Participant

                            😀  Thanks for the prompt response, Rob.

                            Has a target GA date (or date range) been set yet?

                            in reply to: Using a SAN #56190
                            Dan Goodman
                            Participant

                              We are AIX 5.1L, using SAN in a mirrored config (one local, one SAN drive).

                              We *have* had loss of the SAN copy, including one unplanned, due to SAN maintenance. AIX resyncs the mirror in the background when the SAN is restored. This addresses the concern that SAN MTTF (mean time to failure) and MTTR (mean time to recover) might be worse than local hardware.

                              The database corruption point is a good one, although I still see some interest locally in using the shared SAN disk as a quick recovery method.

                              What we have done instead is (1) acquire identical HW platforms (beefed-up p615s, with a total of 4 disks each); (2) mirror rootvg locally; (3) mirror a separate appvg, one copy local, one on the SAN; (4) retain the 4th drive for future OS upgrades using alt_disk_install; and (5) acquire dual HBAs for the primary platform and load an auto-failover driver (Hitachi HDLM).
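                              For anyone doing the same, step (2) boils down to the standard AIX sequence (a sketch; the hdisk numbers are examples only):

                                # Add the second internal disk to rootvg and mirror it
                                extendvg rootvg hdisk1
                                mirrorvg rootvg hdisk1

                                # Rebuild the boot images and allow booting from either disk
                                bosboot -ad /dev/hdisk0
                                bosboot -ad /dev/hdisk1
                                bootlist -m normal hdisk0 hdisk1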

                              (We have license keys, tied to systemID already in place on both platforms.)

                              In addition, we replicate our $CLROOT/production directory from our primary platform to our secondary platform nightly. This does not conflict with the secondary platform’s role as a development/test machine, as that work is done in $CLROOT/test.
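                              For the curious, the nightly copy can be as simple as a one-line script run from root's cron; something along these lines (a sketch: the hostname is a placeholder, and it assumes rsync and ssh trust are set up between the boxes):

                                #!/bin/ksh
                                # Nightly push of the production site to the backup platform
                                CLROOT=/your/clroot/path    # set to the real CLROOT on both boxes
                                rsync -az --delete $CLROOT/production/ backupbox:$CLROOT/production/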

                              We may, in the future, add an additional SAN disk to our primary mirroring (*in addition to*, not in place of, our current config), with the idea of fast-failing it, and the app, over to the secondary machine.

                              We expect that this will require additional software from QVDX as well, and proof-of-concept of the same.

                              Remember that moving the SAN disk from one platform to another does nothing for messages backed up in memory-only queues, so you need to either store *all* messages to disk (correct me if I’m wrong) or have one slick retransmit capability with all your ancillaries, preferably automated, but at a minimum with automatic detection and removal of duplicated transactions.

                              Not sure what all of this would buy us, in that we can autofail our production (TCP/IP, SNA, hostnaming) from the primary to the secondary (all but application time) in under five minutes, without a reboot of either box.

                              The actual runtime from command initiation is under fifteen seconds, except for the SNA piece, whose routing is controlled by our z/OS, which interfaces with SMS/Siemens; that piece is more on the order of five minutes.
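                              The TCP/IP piece of that is conceptually just bringing the production address and hostname up on the surviving box; a bare-bones sketch (interface, address, and name are all placeholders):

                                # On the backup box: take over the production service address...
                                ifconfig en0 alias 10.1.1.50 netmask 255.255.255.0
                                # ...and answer to the production hostname (not persistent across reboot)
                                hostname prodbox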

                              Ancillaries with robust socket management seem to pick this up automatically, but there are always a few stragglers that need to bounce their connection…

                              We like it.   8)  (Up 597 days since the last in-place OS and app upgrade; 3 SAN outages, one unplanned, all due to SAN maintenance; zero HW errors; zero SW outages at the OS level.)

                              Dan Goodman

                              in reply to: Rev upgrades/ Bug list #56007
                              Dan Goodman
                              Participant

                                I second this motion.   😛

                                in reply to: Cloverleaf 5.3 Supported Platform List #56081
                                Dan Goodman
                                Participant

                                  Will QDX 5.3 be certified at AIX 5.3? If not, is another release planned to be supported at AIX 5.3?
