Forum Replies Created
Very interesting! That is what we have been told another site did as well. The problem is that with Paragon we are running in snapshot mode, and we also need ORM messages for non-patient labs. Soft is telling us that they can’t send the ORMs out over two separate lines. It is a cluster, if you ask me.
This is good to know though. I’m going to rethink how we are handling this upgrade.
I’d just like to add, after spending a little time getting 5.7 running on a new Win 7 machine: follow the steps for selecting Win XP compatibility on the main application. Then, for the patch, edit the file and go to the part that looks a bit like this:
@findstr /c:5.1 temp
@if %ERRORLEVEL% EQU 0 goto :windowsXP
and change it to look like this (6.1 is the version number Windows 7 reports, where XP reports 5.1):
@findstr /c:6.1 temp
@if %ERRORLEVEL% EQU 0 goto :windowsXP
Sure beats copying all the patch files over manually.
I’ve made some attempts at parsing the netconfig. I pull some basic stuff (host, port, connection, plus a few others) once a night, then import that into SQL Server via SSIS, and have a bit of an ASP page that puts it on our intranet.
We (as in the integration team) are the only ones that really use it, but we find it quite helpful just to keep up with port numbers, if nothing else.
I am sure some other folks on here have made a lot more progress in completely parsing the netconfig than I have. I feel that is the best way, because then you can automate your documentation.
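As a rough illustration of the nightly pull described above, here is a minimal Python sketch. It assumes a simplified NetConfig layout in which each thread starts with a "protocol <name> {" line and carries "{ HOST ... }" and "{ PORT ... }" entries somewhere in its block. The sample text and function names are made up for the example, and a real NetConfig has far more nesting than this handles.

```python
import csv
import io
import re

# Assumed layout: "protocol <thread> {" opens a thread block; HOST and PORT
# appear as "{ HOST value }" / "{ PORT value }" inside it. Hypothetical sample.
SAMPLE = """\
protocol adt_out {
    { PROTOCOL {
        { HOST labhost01 }
        { PORT 5012 }
    } }
}
protocol orm_in {
    { PROTOCOL {
        { HOST {} }
        { PORT 6020 }
    } }
}
"""

THREAD_RE = re.compile(r"^protocol\s+(\S+)\s*\{", re.MULTILINE)
FIELD_RE = re.compile(r"\{\s*(HOST|PORT)\s+([^{}\s]+)\s*\}")


def parse_threads(text):
    """Return a list of dicts with thread, host, and port for each block."""
    starts = [(m.group(1), m.start()) for m in THREAD_RE.finditer(text)]
    rows = []
    for i, (name, begin) in enumerate(starts):
        # A block runs from its "protocol" line to the next one (or to EOF).
        end = starts[i + 1][1] if i + 1 < len(starts) else len(text)
        fields = dict(FIELD_RE.findall(text[begin:end]))
        rows.append({
            "thread": name,
            "host": fields.get("HOST", ""),  # empty when HOST is "{}"
            "port": fields.get("PORT", ""),
        })
    return rows


def to_csv(rows):
    """Render rows as CSV text, which SSIS can import as a flat file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["thread", "host", "port"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_threads(SAMPLE)))
```

A regex pass like this is only good for pulling a few flat values. Fully parsing the file would mean treating it as nested Tcl lists.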
Thanks for the replies.
AIX 6.1
GUI Build: 5.7PRev2
Server Build: 5.7PRev2
Everyone could have all of their clients closed and the problem still happens. There seems to be no rhyme or reason to when it happens. Sometimes it might go a day or two without incident, then it might happen 5 times in a day.
Soon, I will have deconstructed all the threads into smaller sites. I think that should help isolate the problem. I am also contemplating redoing this particular environment completely, shutting the old one down and bringing the new one up. Everything just takes a bit of time though. 🙂
Let me clarify a bit further. When I say that the “monitor hangs,” I don’t just mean that the client is freezing up; I mean that all messages stop flowing. If it was just the client freezing, that would be manageable. When this is happening, no data is flowing for any application connected in that environment.
The only way to fix it is to go out to Unix, find the PID of the particular monitor, force a kill on it, and then restart the monitor. We are not losing data, because the various systems connecting to us queue their data (to them, it never appears that the connection is actually down, just that they can’t send data). But before I wrote my script to watch things, it could go on for an hour before everyone started noticing that orders had not been crossing over.
I can’t use alerts because they stop working as well when the monitor hangs. There are no errors. There are no panics. It simply stops working. If you have the client up on your screen, you can’t even tell, because everything still shows as up and running. You have to check the status of a thread or do some other thread-related task before you know. If you try to do something to the thread, it will just sit there and wait and wait and do nothing. My script checks it every 15 minutes now, so that we can be on top of things.
The computers here that run the client don’t even have to be on and it can still freeze. I am fairly sure that it has nothing to do with the client or our personal computers. It is something to do with that particular environment and how the processing is happening in 5.7 vs 5.4. We ran for 4 or more years on 5.4 with this same environment and it never froze like this. The other 6 environments we have are not freezing. It is very odd! If we were at least getting some type of error, that would be different. At least then we would have a starting point.
Anyway, phase 2 of our upgrade to 5.7 includes splitting up each environment into 4 environments, as they are getting quite crowded. Our hope is that in splitting them up we can isolate the issue. We shall see soon, as I have already begun the process of splitting.
We have tried several things, still hanging up around once a day. Sometimes, several times in a day. We are going to split out the environments next and try to pinpoint the issue.
In the meantime, I wrote a script that checks for a hung monitor, kills it, and restarts it.
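For anyone wanting to do something similar, the check can be sketched roughly like this in Python. The idea is simply that a status request which never returns is treated as a hung monitor. The status, kill, and restart commands below are placeholders (assumptions), not actual engine commands, so substitute whatever you run by hand today.

```python
import subprocess
import sys


def is_hung(status_cmd, timeout_s):
    """Return True if the status command does not finish within timeout_s."""
    try:
        subprocess.run(status_cmd, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return True
    return False


def check_and_recover(status_cmd, kill_cmd, restart_cmd, timeout_s=60):
    """Run the status check; on a hang, run the kill and restart commands."""
    if not is_hung(status_cmd, timeout_s):
        return "ok"
    subprocess.run(kill_cmd, check=False)    # the kill may report an error
    subprocess.run(restart_cmd, check=True)  # a failed restart should be loud
    return "restarted"


if __name__ == "__main__":
    # Placeholder commands: a sleeping child stands in for a status request
    # that never comes back; the "kill" and "restart" steps just print.
    fake_status = [sys.executable, "-c", "import time; time.sleep(30)"]
    fake_kill = [sys.executable, "-c", "print('kill monitor PID here')"]
    fake_restart = [sys.executable, "-c", "print('restart monitor here')"]
    print(check_and_recover(fake_status, fake_kill, fake_restart, timeout_s=1))
```

Scheduled from cron at whatever interval suits you, something along these lines keeps working even when the engine's own alerts do not, since it runs entirely outside the monitor.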
Thanks Laura, but I should have said we are running AIX, so that doesn’t apply.
Donna, that is a good idea, as we do have some operators that might still have a link to the old version. I am going to head over that way, check their PCs and, if they have the old link, try it out and see if it causes the others to hang.
As of right now, we have moved the thread we suspect as a possible cause to a different environment and rebooted the server. We did not have any hangs over the weekend.
Thanks for the reply. The OS is good, and all of our other environments run fine; it is just one that is giving us issues (it never hiccuped in 5.4 though). I think we may have traced it to a thread doing something it just doesn’t like. We are going to try a reboot during a quiet time this weekend though, before we set up an environment for just that one thread.
We are using 5.7 without any problems. Thanks Troy! I swear we went through that configuration menu a dozen times and missed that every time. You just made us all happy campers. On that same subject, I have seen panics crop up when log files get quite large and then you try to do some type of operation with them. Looks like the compression operation was your culprit (as you found). I had an environment where I forgot to enable any of the auto log type features; then, when I set them up after not having them on for quite some time, I got a very similar error. Just keep those sizes small!
We have a couple of scripts, written outside of the engine, that run via cron. They are just some Tcl scripts that check for various criteria every x number of minutes. (We have also built alerts into a lot of batch processes that take place outside of CL but are kicked off when CL places a file in a certain directory.) The advantage is that they can email as many or as few addresses as we want. Since there are only 3 of us that get these pages, we all get them, as the person on call knows they are on call. We all want to be notified if something is dead at 3AM though, just in case. 🙂 We are moving to 5.7r2 in a month or so and I think we will take a look at the built-in alert system again at that point.
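A minimal sketch of that kind of cron-driven check, in Python rather than Tcl. The stale-file test, the addresses, and the SMTP host are all made-up placeholders. The point is only that an outside script can test any condition and mail any list of people.

```python
import os
import smtplib
import time
from email.message import EmailMessage


def stale_files(directory, max_age_s, now=None):
    """List files in directory last modified more than max_age_s seconds ago."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_s:
            stale.append(name)
    return stale


def build_alert(sender, recipients, directory, names):
    """Build the alert email; recipients is a plain list of addresses."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"{len(names)} file(s) stuck in {directory}"
    msg.set_content("\n".join(names))
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (the host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A cron entry would call stale_files on the watched directory and, if the list is non-empty, pass the result through build_alert and send_alert.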
Ryan, we are in a similar boat, except we are moving up from 5.4 to the newest version. We also use TCL for most of our processes; there are some xlates here and there, but not many. TCL scripts seem to be a lot more robust and powerful.
Anyway, that said, I have been working on breaking out our processes, not just for aesthetics’ sake, but to lessen the impact of making changes to threads on the go and of the rare *knock on wood* process hang. We have some processes that, if you don’t take them down and bring them up in a certain order, cause issues with other systems, among other little things. And, as we all know, if you add a new thread or make some changes, you have to stop and start the process to get everything kosher. Also, in the case of a panic, all the threads in that process are affected. Here is what I am doing (using outbound ADT as an example):
Outbound ADT from STAR feeds about 15 other threads. I have separated them into three groups: high, medium, and low. The naming is so our 24-hour operators know not to call us at 3AM if something in the low category dies, but you could, of course, use whatever naming you wanted. I then have ADT coming into a main thread, ADT out, which sends messages to adt_high, adt_med, and adt_low, all under the process “ADT.” adt_high, adt_med, and adt_low are all tcp types, sending to the local host (in another environment, actually), to adt_out_high, adt_out_med, and adt_out_low, each in a process of the same name (since this is acting in a server to client way, there are no issues with cross processing). From there the messages split off within those three processes, each containing 4 to 10 threads.
I know that is sorta hard to follow w/o a flow chart, but I think you get the idea. Also, I know that you end up with extra threads this way, but 1) it keeps the process cleaner (especially when you use an extra “loop” environment for part of the routing), 2) making changes affects fewer systems, and 3) panics and other hangs make less of an impact.
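To make the layout a little easier to picture without a flow chart, here is a small Python sketch of the fan-out. The destination names are invented for illustration. Only the high / med / low grouping and the adt_out_* process names come from the description above.

```python
from collections import defaultdict

# Hypothetical destinations and their priority; real names will differ.
DEST_PRIORITY = {
    "pharmacy": "high",
    "lab": "high",
    "radiology": "med",
    "dietary": "low",
    "billing_archive": "low",
}


def group_by_priority(dest_priority):
    """Map each downstream process (adt_out_high, ...) to its destinations."""
    groups = defaultdict(list)
    for dest, priority in dest_priority.items():
        groups[f"adt_out_{priority}"].append(dest)
    return dict(groups)


def affected_by_bounce(process, groups):
    """Destinations interrupted when one downstream process is cycled."""
    return groups.get(process, [])


if __name__ == "__main__":
    groups = group_by_priority(DEST_PRIORITY)
    for process, dests in groups.items():
        print(process, "->", ", ".join(dests))
```

The second function is the whole argument in miniature: cycling adt_out_low touches only the low-priority destinations, where a single combined process would have taken all of them down.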
FYI on the Perl script error I was seeing in their testing site… After talking to someone at UHC, not sure who (should have gotten her name/number, as she sounded like she actually knew what she was talking about), that error is coming up because we are testing. It is the reports that are generated in the “download” section that are the main concern. It would be great if they let test generate really useful data. They said that the only real point of test was to make sure your format was working, nothing to do with any real data validation (outside of some envelope data).
Has anyone been submitting test files via Connectivity Director? I have been working with Jim on this, but I thought I would check here since he is doing real time and I am doing batch… If you have been submitting files, can you tell me if you have gotten this error:
“Faciledi Validation Use of uninitialized value in string ne at /usr/local/claredi/scripts/routing/faciledi.pl line 183.”
It then spits out some other lines and gives me some trash reports…
I have been working diligently with their help desk folks (who are very nice, and I think they might be working with Ingenix and not UHC) on the problem. They will run my file through a secondary validation that gives more detailed information, but every time I fix what it says, I still end up with that error. I know that is a Perl script and it is missing a value, but they don’t seem to know what it is, or won’t directly say. Today, just for grins, I set up a proper envelope and copied exactly what is listed in the companion guide as a test message into my file. Surprise, I still got the same error. Most of the errors I have been fixing are things that are directly in the guides; of course, it doesn’t matter, because I still can’t get past the error above. ANYONE? This is driving me crazy.