Inbound PDF to base64-encoded HL7



  • #118608
    Timothy O’Donnell
    Participant

      Good afternoon! I’m working on an interface that processes PDFs, base64-encodes them, and embeds the result in HL7, and I’m running into issues. I’m on Cloverleaf 6.1.4.

      VisualCron can drop a PDF file to Cloverleaf for me, and I want to use a fileset-local inbound to parse the PDF, pull the file name (which will contain the patient identifier, like V0000011.pdf), base64-encode the PDF, and pass both the encoded string and the file name to an xlate. The xlate would build an ORU, using SQL queries to grab more data on the patient, and drop the encoded PDF string into OBX.5.

      I’ve written a dirParse tcl to process only PDF files and an archive tcl to copy the PDF off to an archive as needed. I have a TPS to encode the PDF and pass that and the file name forward to the xlate. The xlate sits on a route between the fileset-local thread and another file thread (just for testing for now), with a VRL inbound and an HL7 2.4 ORU outbound. The issue is that the data being passed from the fileset-local inbound is just thousands of characters, which I assume is the PDF being parsed as text, so I’m getting dozens of outbound ORUs with various strings in OBX.5. Clearly I’m doing something wrong, but I can’t for the life of me find anything on this forum that clearly explains how to set up something like this.

      I think I need to have the inbound tcl on the Trx ID Determination Format: UPOC instead of TPS Inbound Data, but no matter what I do, I can’t get the filename from DRIVERCTL, and I can’t get just the encoded PDF string and the filename sent forward to the xlate instead of all those random PDF characters. Any help would be appreciated. I can post some tcl if need be, but it’s super rudimentary, so I’m willing to start from scratch if needed. Thanks!

      -Timothy

      • #118610
        Jim Kosloskey
        Participant

          What is the Style you have associated with the Fileset Protocol?

          If you expect just one PDF per file, then use a style like ‘single’ which will treat the entire file as a single message.

          You could put the file name in the USERDATA message metadata (don’t forget to use a keyed list) then your VRL would only have one field (unlimited length) – that being the Base64 encoded PDF. But that is just something different and should not have caused you the issue you are seeing. I just like to have a PDF defined as a single field VRL whenever possible.

          Let us know if you are already using the ‘single’ style.

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

          • #118612
            Timothy O’Donnell
            Participant

              Jim,

              I didn’t have the Style set to “Single” so I have changed that and bounced the process.

              For the Directory Parse, I’m simply saying “PDF only” and then I have a tps on TPS Inbound Data (code snippet below) to encode the PDF:

              run {
                  # 'run' mode always has a MSGID; fetch and process it

                  keylget args MSGID mh
                  package require base64

                  set msg [msgget $mh]
                  fconfigure $msg -translation binary
                  set encodedPDF [base64::encode [read -nonewline $msg]]
                  lappend new_msg $encodedPDF

                  msgset $mh $new_msg
                  lappend dispList "continue $mh"

              }

              Now I’m getting this error: can not find channel named “%PDF-1.7”, so I’m not sure what I should be writing for this TPS Inbound Data tcl in order to pass the appropriate data along to the xlate.

              -Timothy

            • #118613
              Jim Kosloskey
              Participant

                There should be no need for you to read the file again. The file has already been read at the point your proc is being invoked and what is in your msg variable should be the PDF. Then just Base64 encode $msg.


              • #118614
                Timothy O’Donnell
                Participant

                  Jim,

                  I removed that unnecessary read, but the value being passed to me is just the string “%PDF-1.7” instead of the actual PDF that I’m feeding it as a test.

                  This is the relevant code snippet for my dirParse:

                  run {
                      # 'run' mode always has a MSGID; fetch and process it

                      keylget args MSGID mh

                      # add double quotes around filename with spaces
                      #set files [string map {"\x0d\x0a" "\x01"} [msgget $mh]]

                      set files [msgget $mh]
                      set newlist {}
                      foreach f $files {
                          if {![regexp -- {.*\.pdf} $f]} {
                              echo Skip $f
                              continue
                          }
                          lappend newlist $f
                      }
                      msgset $mh $newlist
                      lappend dispList "CONTINUE $mh"
                  }

                  And this is the relevant code snippet for my TPS Inbound Data tcl:

                  run {
                      # 'run' mode always has a MSGID; fetch and process it

                      keylget args MSGID mh
                      package require base64

                      set msg [msgget $mh]
                      echo "This is the Message: $msg"
                      fconfigure $msg -translation binary
                      set encodedPDF [base64::encode $msg]
                      echo "The encoded PDF is $encodedPDF"
                      lappend new_msg $encodedPDF

                      msgset $mh $new_msg
                      lappend dispList "CONTINUE $mh"

                  }

                   

                • #118615
                  Jim Kosloskey
                  Participant

                    I don’t think you need the fconfigure either. At this point there is no file, just a message (hopefully) from the file.


                  • #118616
                    Timothy O’Donnell
                    Participant

                      Jim,

                      I removed that fconfigure as well. I’m still getting just the string “%PDF-1.7” so I’m wondering if you know what would cause that to be passed from the Directory Parse tcl.

                      -Timothy

                    • #118617
                      Jim Kosloskey
                      Participant

                        The dirparse UPoC proc receives a list of files Cloverleaf has located in the specified directory. The proc then alters that list to only include the names of the files one wants and returns that list.

                        At this point NO files have actually been read.

                        Once the DirParse UPoC returns control to the engine, the protocol Opens and Reads each file in the modified list one at a time.

                        Each read (based on style) returns a Message Handle to the IB TPS UPoC, which is the current message found in the current read of the current file.

                        In your case, the msg variable contains the message as read by the Fileset Local protocol (based on the style).

                        So now when you test, if you have the EO turned up, the process log should have the message as read by the protocol. If all of the Protocol settings are correct, that should be your PDF.

                        I am assuming you are testing with just one file (and hopefully a small PDF) to get this working. If that is the case, you should see your PDF message from the file in the log.
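The list-trimming step described above (the dirParse proc receives file names, keeps the ones it wants, and returns the shorter list before anything is read) can be sketched in plain Tcl. This is a minimal illustration, not the poster's proc: `filterPdfNames` is a hypothetical helper, and inside a real dirParse UPoC the list would come from `msgget` and go back with `msgset`. The pattern here is anchored to the end of the name and ignores case, which is an assumption about what "PDF only" should mean.

```tcl
# Hypothetical helper: keep only names ending in .pdf (any letter case).
proc filterPdfNames {names} {
    set kept {}
    foreach name $names {
        # -nocase lets V0000011.PDF match too; $ anchors the extension
        # so a name like report.pdf.tmp is skipped.
        if {[regexp -nocase -- {\.pdf$} $name]} {
            lappend kept $name
        }
    }
    return $kept
}

puts [filterPdfNames {V0000011.pdf notes.txt V0000012.PDF report.pdf.tmp}]
```

With the sample list above, only V0000011.pdf and V0000012.PDF should remain.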

                        If you would like to take this off-line, email me and I will try to assist.

                        email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                    • #118611
                      Jim Kosloskey
                      Participant

                        Oh and please don’t forget to use the appropriate components of OBX-5 to identify and contain the Embedded Data as well as setting OBX-2 to the proper value (ED).
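As a rough picture of what that looks like on the wire, here is a small plain-Tcl sketch that lays out an OBX segment with OBX-2 set to ED and the base64 string in the fifth component of OBX-5. `buildPdfObx` is a hypothetical helper, and the `AP` / `PDF` values and the OBX-3 identifier are assumptions; the receiving system's spec decides the real values. In an actual xlate the components would normally be populated individually rather than built as a string.

```tcl
# Hypothetical helper: build an OBX segment carrying base64 data as ED.
# The "AP" and "PDF" values are assumptions; check the receiver's spec.
proc buildPdfObx {setId obsId b64} {
    # ED components: source application ^ type of data ^ data subtype ^ encoding ^ data
    set obx5 [join [list "" AP PDF Base64 $b64] "^"]
    # OBX-1 set ID, OBX-2 value type (ED), OBX-3 identifier, OBX-4 sub-ID (empty), OBX-5 value
    return [join [list OBX $setId ED $obsId "" $obx5] "|"]
}

# JVBERi0xLjc= is the base64 form of the 8-byte string %PDF-1.7
puts [buildPdfObx 1 PDF^Report JVBERi0xLjc=]
```

For the sample call this should print `OBX|1|ED|PDF^Report||^AP^PDF^Base64^JVBERi0xLjc=`.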


                      • #118639
                        Robert Kersemakers
                        Participant

                          Oh, I have done this.
                          As Jim said: you shouldn’t use DirParse as it only reads/lists the files found and does not read the actual files.

                          You will need to make a UPoC that reads the filename and processes this file as you would like it to. What I did: read the filename, copy the file itself to the place specified (as parameter of the script) then use the filename and turn it into a (VRL) message. This message is sent on to be translated by an xlate. This way I have a generic script to process several different files.

                          Zuyderland Medisch Centrum; Heerlen/Sittard; The Netherlands

                        • #118676
                          Timothy O’Donnell
                          Participant

                            For anyone who comes across this thread in the future looking to do something similar, I finally got this to work with the help of everyone on this thread:

                            Set up the inbound thread as fileset-local with Style: single (important if each file is a single non-base64-encoded PDF!).

                            I’m using a dirParse but just to skip any non-PDF file. Doesn’t do anything else. I also have an Archive tcl for Deletion to just copy the PDF to another folder for testing purposes. This won’t be in Production.

                            I have a tcl on TPS Inbound Data for the base64 encode that returns the file name (without extension) and the base64-encoded PDF, separated by a comma. That data is then passed to the xlate, which uses an inbound VRL with two fields (file name and embedded PDF), and I build the HL7 from there with the necessary data. Here’s the run from my tcl; your mileage may vary depending on what you need to do with the information. At the very least, this could be a good jumping-off point.

                            run {
                                # 'run' mode always has a MSGID; fetch and process it

                                keylget args MSGID mh
                                package require base64
                                set msg [msgget $mh]

                                # Pull the source file path from the FILENAME key of DRIVERCTL
                                set filepath {}
                                set drvCtl [msgmetaget $mh DRIVERCTL]
                                keylget drvCtl FILENAME filepath

                                # Strip the directory and extension, e.g. V0000011.pdf -> V0000011
                                set filename [file rootname [file tail $filepath]]
                                lappend new_msg $filename

                                # -maxlen 0 keeps the base64 output on a single line
                                set encodedPDF [base64::encode -maxlen 0 $msg]
                                lappend new_msg $encodedPDF

                                msgset $mh [join $new_msg ","]
                                lappend dispList "CONTINUE $mh"
                            }

                            Hope this helps!

                            -Timothy

                            • #119039
                              Timothy O’Donnell
                              Participant

                                UPDATE: After being delayed for a few months, we finally went live with this project only to find out that the process I outlined above didn’t work exactly as planned. The overall concept worked but the PDF was blank – correct title and number of pages, but no content. We didn’t run into this when first testing but we also tested with smaller-sized PDFs and I relied on the vendor to confirm the PDFs were valid. Lessons learned.

                                I decided to reassess how I set this up and came up with a solution that actually worked. The setup is roughly the same as before, but on my fileset-local inbound thread, on the Inbound tab > TPS Inbound Data, I changed the tcl to only pull the filename of the PDF (which is a patient identifier in this case) and pass that along to the xlate. I changed the VRL to have one field (the filename) instead of two (the filename and the base64-encoded PDF string).
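For reference, the filename-only step can be tried outside the engine in plain Tcl. This is a minimal sketch: `visitNumberFromPath` is a hypothetical helper and `/data/inbound` is a made-up directory, assuming forward-slash paths.

```tcl
# Hypothetical helper: derive the visit number from a dropped file's path.
proc visitNumberFromPath {path} {
    # file tail drops the directory; file rootname drops the extension.
    return [file rootname [file tail $path]]
}

puts [visitNumberFromPath /data/inbound/V0000011.pdf]
```

For the sample path this should print V0000011.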

                                I also updated the Archive tcl on the fileset-local inbound thread to copy the PDF to a “staging” folder on the CL server.

                                The xlate builds the HL7, SQL-querying with the filename to build the patient demographics out as required by the vendor. This is the same as it was before. The difference is that I now have a tcl snippet in the xlate that takes in the filename value, opens the PDF that the Archive tcl put in the “staging” folder, and does the base64 encoding there, using fconfigure to put the file handle in binary mode first. The outbound variable is now the base64 string, and I then delete the copied file from the “staging” folder as it’s no longer needed. I’ve put the relevant tcl snippet below.

                                package require base64
                                # The inbound value is the visit number taken from the PDF filename
                                set visitNumber [lindex $xlateInVals 0]
                                set filePath "$HciRoot/data/UTF/Staging/$visitNumber.PDF"

                                # Read the staged PDF as raw bytes, then strip base64 line breaks
                                set fileHandle [open $filePath r]
                                fconfigure $fileHandle -translation binary
                                set encodedPDF [string map {\n ""} [base64::encode [read $fileHandle [file size $filePath]]]]
                                close $fileHandle

                                set xlateOutVals [list $encodedPDF]
                                # The staged copy is no longer needed
                                file delete -force $filePath

                                This may not be the most elegant or efficient way to do this, but given the size and scope of this interface – 20-30 600KB PDFs dropped overnight during non-peak hours for an interface that doesn’t require a huge amount of patient demographics in the HL7 – it works swimmingly. I also found a base64 encode/decode site is a huge help for quickly identifying if my base64-encoded PDF string was correct or not. Definitely recommend for speedy confirmation.
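The same check can also be done locally, which avoids pasting patient documents into a website. The sketch below decodes the base64 string and compares it byte-for-byte with the file it came from. It assumes tcllib's base64 package (the one used in this thread); `readBinaryFile` and `base64MatchesFile` are hypothetical helpers, and the sample file is a small stand-in written by the demo, not a real PDF.

```tcl
package require base64   ;# tcllib, the same package used in the thread

# Hypothetical helper: read a file as raw bytes.
proc readBinaryFile {path} {
    set fh [open $path r]
    fconfigure $fh -translation binary
    set data [read $fh]
    close $fh
    return $data
}

# Hypothetical helper: true if decoding b64 reproduces the file's bytes exactly.
proc base64MatchesFile {b64 path} {
    return [expr {[base64::decode $b64] eq [readBinaryFile $path]}]
}

# Demo with a small stand-in file: header text plus bytes that
# text-mode handling tends to damage (CR LF, NUL, 0xFF).
set sample [file join [pwd] sample.bin]
set fh [open $sample w]
fconfigure $fh -translation binary
puts -nonewline $fh [binary format a8H8 "%PDF-1.7" 0d0a00ff]
close $fh

set b64 [base64::encode -maxlen 0 [readBinaryFile $sample]]
puts [base64MatchesFile $b64 $sample]
file delete $sample
```

The demo should print 1. A 0 for a real file would suggest the bytes were altered somewhere between the read and the encode.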

                                -Timothy

                            • #118677
                              Jim Kosloskey
                              Participant

                                I am glad you got this to work.

                                Thanks for sharing. I am sure this will be helpful to others.


                              • #118678
                                Jim Kosloskey
                                Participant

                                  I am curious – where do you get the demographic information for the HL/7 message (at least a patient identifier)? Or does the receiving system not care?


                                  • #118679
                                    Timothy O’Donnell
                                    Participant

                                      The PDF filename will always be the Patient Visit Number from our EMR and then I have an Advanced Database Lookup to our EMR SQL server in the XLATE to pull the minimum necessary demographics like Name, DOB, Gender, Department, etc. based on that Visit Number to fill in the gaps in the HL7. The volume for this interface is likely to be relatively small and controlled – files will always be dropped at a specified time and not in large quantities – so the combination of PDF file sizes and SQL queries shouldn’t be too taxing. If the overall volume was larger, I’d probably want the patient demographics in file metadata or in the file name as well.

                                      -Timothy

                                  • #118680
                                    Jim Kosloskey
                                    Participant

                                      Excellent! And you can use the Fileset Protocol pacing parameters to control the arrival rate in order to smooth things out.

