Another SMAT cycling question
A. I understand that SMAT DB auto-cycling can be triggered at a certain KB size limit. I will enable this.
B. What I would like to do is run a script at midnight that joins all the SMAT DB files for the previous day into a master SMAT DB file, which would then be moved to an Archive folder. Softlinks would be used to access this folder from SmatHistory (I think this should be possible).
C. I’m using a borrowed script that a vendor wrote, originally designed to archive *.idx.old and *.msg.old files along with the old log files. The archiving is done by site, then date, then process. I want to tweak this script to join the SMAT DB files and move them to the Archive. I’m using this script because it already includes a proc for ‘x’ days of retention.
D. Why isn’t something like this already included in Cloverleaf? I would think that, in lieu of Message Warehouse, some type of SMAT DB management would be included that automatically cycle-saves threads at a certain time.
To clarify, the SMAT DB join would not produce a single master for each day, but one per process. So if a process kicked out 6 SMAT DB files in a day due to the KB size limit, I want to join all of those into a single SMAT DB file that can be moved to the Archive.
I think the time cycling will come with CL 6.2
As for combining them into a master, that would not be very difficult. Simply create a DB with a table for each process and then move them in.
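A minimal sketch of that join, assuming the SMAT DB files are SQLite databases and that the message table is named smat_msgs; verify both against your own files with "sqlite3 yourfile.smatdb .tables" before relying on this. The paths and file glob are illustrative only.

#!/usr/bin/ksh
# Merge one day's cycled SMAT DB files for a process into a single archive DB.
# ASSUMED: SQLite files, a table named smat_msgs, and this glob for cycled files.
PROC=$1
ARCHIVE=/archive/$(date +%Y%m%d)_${PROC}.smatdb

for f in "$HCISITEDIR"/exec/processes/"$PROC"/*.smatdb; do
    sqlite3 "$ARCHIVE" <<EOF
ATTACH DATABASE '$f' AS src;
-- first pass creates an empty copy of the table; later passes just append
CREATE TABLE IF NOT EXISTS smat_msgs AS SELECT * FROM src.smat_msgs WHERE 0;
INSERT INTO smat_msgs SELECT * FROM src.smat_msgs;
DETACH DATABASE src;
EOF
done

The table-per-process variant is the same idea with a per-process table name in place of the single smat_msgs target. If the table carries a unique message-id column, you may need INSERT OR IGNORE or to re-key on insert.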
Would that be a cumulative thing or one master for each day?
You begin to see the problem? If CL accommodated *EVERYTHING* each user wanted done, and the way he wanted it done, can you imagine the size of the tool set?
You are provided all of the tools you need to roll your own. As I said before, that’s why you are paid the big bucks 😀
I get bored at times. Send me an e-mail and I will help if I can.
I believe the ETA for 6.2 is the end of this year (so I’ve heard through the grapevine).
Jim Cobane
Henry Ford Health System
We are looking forward to this functionality in 6.2.
In the meantime, we have implemented a scripted process that cycle-saves into SmatHistory, copies the new file from SmatHistory to an archive file system, inserts the contents of that file into a 7-day holding file in SmatHistory, deletes the file, and also deletes any messages older than 7 days from the 7-day file.
So for every SMATDB, we have a 7-day file in SmatHistory, which ends up being a rolling window of messages that are up to 7 days old. And we keep a copy of each cycle saved file in our archive file system. We will implement a purge script for these archive files soon.
We used to run the cycle-save script four times a day with the old SMAT files, so we only had up to six hours of data in our current file, and all older, saved files were compressed. A simple process, but annoying when looking for messages older than six hours.
With the SmatDB, the search capability, at our volume of about 90,000 messages per day, can easily handle 7 days’ worth of data with only a few seconds of wait time per search. A big improvement. So is the search criteria function, especially the AND criteria. We are still waiting for the fix to allow resubmits when your criteria contain a NOT clause. I really love the criteria for metadata and use them all the time, especially for inbound arrival time.
Peter Heggie
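A sketch of the rolling-window step Peter describes, not his actual script. It assumes SQLite SMAT DBs, a smat_msgs table, and an epoch-seconds timestamp column called TimeIn; all three names are assumptions to check against your own databases.

#!/usr/bin/ksh
# usage: smat_7day_roll <newly-cycled-file> <7-day-holding-file>
NEW=$1
HOLD=$2

cp "$NEW" /archive/                       # keep a copy on the archive file system
sqlite3 "$HOLD" <<EOF
ATTACH DATABASE '$NEW' AS src;
CREATE TABLE IF NOT EXISTS smat_msgs AS SELECT * FROM src.smat_msgs WHERE 0;
INSERT INTO smat_msgs SELECT * FROM src.smat_msgs;
DETACH DATABASE src;
-- prune the rolling window (TimeIn assumed to hold epoch seconds)
DELETE FROM smat_msgs
 WHERE TimeIn < CAST(strftime('%s','now','-7 days') AS INTEGER);
EOF
rm "$NEW"                                 # the holding file now owns the messages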
It seems to me you might need some method to scour all of your archived SMATDB files when looking for a particular patient, MRN, etc.
As I said, I get bored, so I threw together this little script I call grepDB. I have done some testing, but not a whole lot; I just started it yesterday.
It should run on Unix if the .htc extension is removed. If you move it to Unix, remove the carriage returns first:
perl -pi -e 's/\r//g' grepDB
I do not have access to a Unix box, so I did not test there.
Maybe it is useful, maybe not. I will not be offended either way.
If it is useful, I would appreciate anyone who wants to wring it out for me. I am not married to it, so I am open to ideas, changes, etc.
Let me know.
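For anyone who cannot run grepDB, here is a bare-bones stand-in that loops over archived SMAT DBs and reports which files mention a search string. It assumes SQLite files with a smat_msgs table and a MessageContent column; both names are guesses, so check your own schema first.

#!/usr/bin/ksh
# usage: grepdb_lite <pattern> [archive_dir]
PATTERN=$1
DIR=${2:-/archive}

for db in "$DIR"/*.smatdb; do
    HITS=$(sqlite3 "$db" \
        "SELECT COUNT(*) FROM smat_msgs WHERE MessageContent LIKE '%$PATTERN%';")
    [ "${HITS:-0}" -gt 0 ] && echo "$db: $HITS matching message(s)"
done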
Peter,
Would you be willing to share this script? Would like to take a look at it. Might be able to tweak it to fit our needs.
Regards,
Mike Keys
I’ll post the script. I like to break up complex scripts into smaller, simpler, reusable chunks.
smatdb_archive
  (sites) - smatdb_skipchk
    (processes) - smatdb_skipchk
      - getProcessThreads
      (threads) - smatdb_skipchk
        - getThreadSmatName
        - smatdb_archive_thread
          - smatdb_redefine
          - smatdb_getstats
          - smatdb_insert_all
          - smatdb_delete_age
Please be aware that there is not much error checking, especially when moving or copying files – if you run out of space on the target file system, you will be SOL.
I’ll bundle all of these scripts into a single text file with each script prefixed with the script name.
These are a mix of tcl and ksh scripts.
You can use the config file (smatdb_retention_list) to control which sites, processes and threads should be skipped and which should be processed.
Peter Heggie
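The real smatdb_skipchk is in Peter's bundle; the fragment below is only a guess at how the retention list might be consulted, assuming the file sits under $HCIROOT and holds one name per line with '#' comments. Both the location and the format are assumptions.

#!/usr/bin/ksh
# Succeed (exit 0) if the given site/process/thread name is on the skip list.
CONFIG=$HCIROOT/smatdb_retention_list

skipchk() {
    grep -v '^#' "$CONFIG" | grep -qx "$1"
}

# example use inside the archive loop
if skipchk adt_lab_out; then
    echo "skipping adt_lab_out"
fi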
If someone can explain to me why you would want one big file instead of many smaller files, I’ll be happy to make the request to R&D.
In my opinion, that function works very well the way it is. I see no reason for scheduled cycling, nor do I see a reason for combining the files. The SMAT DB search works across files seamlessly.
I realize there may be use cases, etc. that I am not aware of, so I’d happily change my mind and submit a request given more information.
The scheduled cycling has been added to 6.2, which is in beta. No promises on whether it will make the release or not.
-- Max Drown (Infor)
I agree with Max here. Creating one large DB to hold all the small ones could present more problems than it would solve. Since by necessity it would require a different schema, you would need a separate GUI to access it.
Using many small files seems a better answer. You should age out any files after a certain period. I provided a method to search all of your archived databases to find where a specific record is. If the need is to resend, then proper file naming should tell you which file contains the records to resend.
If you need to maintain data for empirical purposes, then ascertain which data is required and populate another database with just that data. It seems that empirical data should be available from other sources, like your registration system.
Just my $.02 worth which may be overpriced. 😀
Perhaps it’s just a matter of getting used to the SMAT DB capabilities. Our current strategy keeps the current and previous day’s SMAT files uncompressed; everything else is compressed and moved to an archive location. It’s a PITA to search, and a PITA to uncompress and move back to the original location.
However, as much as this new DB will help, there are some who are very used to having one file per process per day to look at, and they want to keep that same “feel”. My question was: how best to replicate this in the SMAT DB environment?
I know that change is hard, but in this case it is absolutely worth it.
In order to take full advantage of the SMAT DB and auto-cycling functions, I recommend that you do two things differently: 1) do not compress the archived data, and 2) do not move the archived data. I mean the archived data found in LogHistory and SmatHistory.
By making these two changes, you will allow Cloverleaf to search across all of the data in a site, including all of the historical data. This is a very powerful function and is very easy to use. Imagine how easy a search for a visit number becomes! You do not have to know which files contain the messages; everything is handled for you by the application now.
Cloverleaf will automatically prune the SmatHistory sub-directories based on the values you choose, and I recommend you simply set a number of days to keep (ex. 30 days). This is done in the Site Options (Log, SMAT). There is also a setting to keep the logs pruned, and I suggest you set that to a number of files (ex. 25).
This will increase the amount of disk you use, of course, so you will need to plan for that. However, because of the automatic pruning, the amount of disk space used will be relatively stable once you make the adjustment and will only grow if your volume grows (e.g. if new interfaces are added). If you need to change your disk size now in order to accommodate the SMAT DB, then go ahead and plan for 5 years of growth, just as you would with CPU and RAM.
Because Cloverleaf will automatically delete old data, you will need to use a third-party application like Tivoli to make daily backups, and those backups can be compressed. Then you only need to touch the backups when you need to search for data older than what you keep on disk (like the 30-day example above). You should be backing up your Cloverleaf installation at least daily anyway, so this is probably already being done in some way.
Even with encryption turned on, the SMAT DB shows better than a 20% disk I/O improvement over the old SMAT files. As we all know by now, disk I/O is the slowest part of the engine; almost every performance problem we see with Cloverleaf is disk-related. So using the SMAT DB is a huge help with throughput.
-- Max Drown (Infor)
You can also move the SMAT DB files off, and simply soft-link the archive location under $HCISITEDIR to allow the SMAT DB tool to access them. We currently cycle nightly, then move the SMAT DB files off to /logs and have that soft-linked under the site, so it appears simply as another directory under $HCISITEDIR.
Hope that helps.
Jim Cobane
Henry Ford Health System
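A minimal sketch of the move-and-soft-link approach Jim describes; the /logs path, the file glob, and the link name are all illustrative, not Henry Ford's actual layout.

#!/usr/bin/ksh
SITE=$(basename "$HCISITEDIR")
ARCH=/logs/smat_archive/$SITE
mkdir -p "$ARCH"

# move the cycled SMAT DB files out of the site (glob is an assumption)
mv "$HCISITEDIR"/SmatHistory/*.smatdb* "$ARCH"/ 2>/dev/null

# one time only: link the archive back under the site so the SMAT DB tool
# can browse it as if it were an ordinary site directory
[ -L "$HCISITEDIR/smat_archive" ] || ln -s "$ARCH" "$HCISITEDIR/smat_archive"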
We want our SmatDB searches to run quickly. Perhaps this is an organizational or work-style issue.
The difference between a 5-second search and a 90-second search is important to us. After a recent HIS migration, and before an archive strategy was implemented, we had up to 45 days of data in each SmatDB, and searching for an account number in the incoming ADT feed was taking more than 90 seconds. That is significant to us. When problem resolution involves not just one search, but usually multiple searches for multiple accounts with multiple variations of accompanying data, we can spend a lot of time just looking for things. We are still fighting a lot of fires even two months after our migration, so we like to be able to find things quickly.
It is also valuable to us to be able to select one day’s data, or seven days’, or 30 days’, just by selecting the files to include from SmatHistory. We are pursuing the use of soft links to our archive folder, to avoid copying older files back into our SmatHistory folder for inclusion. As long as we can selectively include or exclude files, that solution will be fine.
Searching across all files at once is a non-starter at our volume. We have already encountered searches that cause the GUI to hang and require killing the Windows client session. I have posted about that before, and I hope there is a fix for it.
Using the site option to prune SmatHistory means that anything older than the cutoff must be restored using our enterprise storage management software. It also means that we will have a lot more SmatDB files in that folder than we want to deal with. Following the recommendation that inbound (ADT, for example) and outbound threads be kept in the same process would mean that one process folder (one SmatHistory folder) could contain, with 20 ancillaries and 30 days of data, 600 SmatDB files (or 1,200 if we keep the ACKs). So this could be a little annoying when selecting the outbound SmatDB files for one ancillary.
Our new archiving process does not mix SmatDBs for different threads; it only combines 7 days’ worth of SMAT for one thread, one direction, into one file.
Perhaps we have spoiled our users with fast response times and the ability to ask us what happened to an account or a transaction four months ago, but it sure helps when we can give them that information. And right now we are still in a ‘stabilization period’, fixing a lot of issues that never surfaced during four rounds of heavy integration testing, so searching back two months for a single transaction is quite common.
We also need some flexibility to handle, in one site, different kinds of interfaces with different retention requirements. We have incoming data from a document management system with large payloads of base64-encoded data that we don’t need to store for more than a few days. Right next to it could be another interface that has to be archived for 90 days.
Perhaps over time we will find our searching needs reduced such that we can include all files easily but we are not there yet.
If we are talking about enhancements, I’d like to see the little ‘hide’/‘unhide’ arrows we currently have on the SmatDB file-selection pane added to the search-criteria pane as well, just to free up some more screen real estate. We would also love a little enable/disable flag on the drop-down search criteria list, so that we could just disable a criterion instead of deleting it and having to re-create it a few minutes later.
Peter Heggie
Peter, that is fantastic feedback and exactly what I am looking for. Would you mind helping me organize a request for R&D? If so, please shoot me an email.
-- Max Drown (Infor)
Jim,
Thanks for your advice! I reused an older Tcl proc, rewrote some sections, and created softlinks.
The Archive contains up to 60 days of uncompressed SMAT DB files for each process. The Tcl script runs right before midnight to come as close as possible to capturing a full day’s worth. Even if a thread is cycled during the day, those SMAT DB files are captured as well. The script also moves the LogHistory files and stats.
Mike
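Mike's proc is Tcl; below is only a ksh sketch of the same nightly idea, run just before midnight from cron: sweep SmatHistory and LogHistory into the archive and enforce the 60-day retention. Every path here is illustrative.

#!/usr/bin/ksh
# cron entry (hypothetical install path): 59 23 * * * /hci/scripts/smat_nightly
RETAIN_DAYS=60
SITE=$(basename "$HCISITEDIR")
ARCH=/archive/$SITE

mkdir -p "$ARCH/smat" "$ARCH/logs"
mv "$HCISITEDIR"/SmatHistory/* "$ARCH/smat/" 2>/dev/null
mv "$HCISITEDIR"/LogHistory/*  "$ARCH/logs/" 2>/dev/null

# age out anything past the retention window
find "$ARCH" -type f -mtime +"$RETAIN_DAYS" -exec rm -f {} \;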
From R&D,
AR14811 has been submitted with high priority to track this.
-- Max Drown (Infor)