How can I determine number of pages in encoded PDF?

  • #54736
    Suzy Hoffman
    Participant

      I am receiving a Base64-encoded PDF in a Z segment of an HL7 message. Does anyone know of a way to determine how many pages are in the PDF using Tcl? If that isn't possible with a Base64-encoded PDF, is there another encoding I could convert the PDF to that would then let me check the number of pages (using Tcl)?

      • #82777
        David Barr
        Participant

          You can use the base64 package from Tcllib (included with Cloverleaf) to decode base64 data. Here’s an example:

          Code:

                     # Fragment from the body of a TPS proc: mh is the message handle
                     keylget args MSGID mh
                     package require hl7
                     package require base64
                     set msg [msgget $mh]
                     set hl7 [hl7::parse_msg $msg]
                     # Save identifying fields in the message's user data
                     keylset userdata MRN [hl7::get_field hl7 PID.3]
                     keylset userdata ACCT [hl7::get_field hl7 PID.18]
                     keylset userdata DOC_ID [hl7::get_field hl7 OBR.2]
                     keylset userdata DOC_TYPE PACEEVAL
                     msgmetaset $mh USERDATA $userdata

                     # Replace the message content with the decoded PDF bytes
                     msgset $mh [::base64::decode [hl7::get_field hl7 OBX.5.5]]
                     lappend dispList "CONTINUE $mh"

          After calling base64::decode you have the PDF data in a string. You have two options after that. One is to send the data to an external utility such as pdfinfo and read back the number of pages. You’d have to use Tcl’s exec command, or use “open” with a piped command instead of a filename. The advantage of this option is that it is reliable and will keep working even if the sender changes the way the PDF is produced.
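
A minimal sketch of the external-utility option, assuming pdfinfo is installed and on the PATH and that /tmp is writable. The proc names and the scratch-file naming are made up for illustration and are not part of Cloverleaf or Tcllib. The decoded data is written to a scratch file because pdfinfo takes a file name as its argument.

```tcl
# Pull the page count out of pdfinfo's text output, which includes a
# line such as "Pages:          2". Returns -1 if no such line is found.
proc parsePdfinfoPages {output} {
    if {[regexp -line {^Pages:\s+(\d+)} $output -> n]} {
        return $n
    }
    return -1
}

# Hypothetical helper: write the decoded PDF to a scratch file, run
# pdfinfo on it, and return the page count (-1 on failure).
# The /tmp default is an assumption; pass another directory if needed.
proc pdfPageCountViaPdfinfo {pdfData {tmpDir /tmp}} {
    set path [file join $tmpDir "pdfcount_[pid]_[clock clicks].pdf"]
    set fh [open $path w]
    # PDF is binary data: disable newline and encoding translation
    fconfigure $fh -translation binary
    puts -nonewline $fh $pdfData
    close $fh

    # exec raises an error on a non-zero exit status or stderr output,
    # so catch it; $out then holds whatever text was produced.
    catch {exec pdfinfo $path} out
    file delete -force $path
    return [parsePdfinfoPages $out]
}
```

With the decoded data in a variable, the call would look like `set pages [pdfPageCountViaPdfinfo $pdf]`. A result of -1 means pdfinfo could not be run or did not report a Pages line.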

          The other option is to look for a string in the PDF content that has page information and parse it out of the PDF. I just looked at one of the PDFs that I’m processing and found this line:

          1 0 obj<</Type/Pages/Count 2/...>>

          I could use a regular expression to get the page count (2) without sending that data to an external utility.

        • #82778
          Suzy Hoffman
          Participant

            Thanks, David.  Finding the page count in the decoded string is just what I needed.  Now comes the struggle with the regular expression…

          • #82779
            David Barr
            Participant

              Something like this should work:

              Code:

              set pdf [::base64::decode [hl7::get_field hl7 OBX.5.5]]
              # \d+ captures the digits that follow /Count
              if {[regexp {/Type/Pages/Count (\d+)/} $pdf -> pages]} {
                  echo pages = $pages
              } else {
                  echo no page count found
              }

              You might have to adjust the expression to match the format of your PDFs.
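
As one possible adjustment, here is a sketch that tolerates optional whitespace between the keys and either key order, and takes the largest count found (in a PDF page tree, the root Pages node carries the total). The proc name is hypothetical. It only works when the Pages dictionary appears as plain text in the file; PDFs that store their objects in compressed object streams will not match, and the external-utility route is the safer choice there.

```tcl
# Hypothetical helper: return the largest /Count found in a /Type /Pages
# dictionary, or -1 if none is visible in the raw PDF text.
proc pdfPageCountFromText {pdf} {
    set best -1
    # Two patterns cover both key orders; [^<>]* keeps a match inside
    # one dictionary that has no nested << >> between the two keys.
    set patterns {
        {/Type\s*/Pages[^<>]*/Count\s+(\d+)}
        {/Count\s+(\d+)[^<>]*/Type\s*/Pages}
    }
    foreach re $patterns {
        # -inline returns pairs: the full match, then the captured digits
        foreach {match n} [regexp -all -inline -- $re $pdf] {
            if {$n > $best} { set best $n }
        }
    }
    return $best
}
```

Usage would be along the lines of `echo pages = [pdfPageCountFromText $pdf]`, treating -1 as "not found".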
