How can I determine number of pages in encoded PDF?


  • Creator
    Topic
  • #54736
    Suzy Hoffman
    Participant

    I am getting a Base64-encoded PDF in a Z segment of an HL7 message.  Does anyone know of a way to determine how many pages are in the PDF using Tcl?  If there is no way to do it with a Base64-encoded PDF, is there another encoding I can convert the PDF to that would then allow me to check the number of pages (using Tcl)?

  • Author
    Replies
    • #82777
      David Barr
      Participant

      You can use the base64 package from Tcllib (included with Cloverleaf) to decode base64 data. Here’s an example:

      Code:

                 # Message handle passed in by the Cloverleaf engine
                 keylget args MSGID mh
                 package require hl7
                 package require base64
                 set msg [msgget $mh]
                 set hl7 [hl7::parse_msg $msg]
                 # Save identifying fields in the message's user data
                 keylset userdata MRN [hl7::get_field hl7 PID.3]
                 keylset userdata ACCT [hl7::get_field hl7 PID.18]
                 keylset userdata DOC_ID [hl7::get_field hl7 OBR.2]
                 keylset userdata DOC_TYPE PACEEVAL
                 msgmetaset $mh USERDATA $userdata

                 # Replace the message content with the decoded PDF
                 msgset $mh [::base64::decode [hl7::get_field hl7 OBX.5.5]]
                 lappend dispList "CONTINUE $mh"

      After calling base64::decode you have the PDF data in a string. You have two options after that. One is to send the data to an external utility like pdfinfo and read back the number of pages. You’d have to use Tcl’s exec command, or “open” with a piped command instead of a filename. The advantage of this option is that it is reliable and will keep working even if the sender changes the way the PDF is produced.
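
A minimal sketch of the external-utility option, assuming pdfinfo (from Poppler or Xpdf) is installed and on the PATH, and that the interpreter is Tcl 8.6 or later so that `file tempfile` is available. The proc names `parsePdfinfoPages` and `pdfinfoPageCount` are made up for illustration; they are not part of Cloverleaf or Tcllib.

```tcl
# Pull the page count out of pdfinfo's text output, which contains a
# line such as "Pages:          2". Returns -1 if no such line is found.
proc parsePdfinfoPages {output} {
    if {[regexp -line {^Pages:\s+(\d+)} $output -> pages]} {
        return $pages
    }
    return -1
}

# Write the decoded PDF to a temporary file, run pdfinfo on it, and
# return the page count (or -1 if pdfinfo failed or is not installed).
proc pdfinfoPageCount {pdf} {
    # Assumption: Tcl 8.6+, where "file tempfile" exists.
    set fd [file tempfile path]
    fconfigure $fd -translation binary
    puts -nonewline $fd $pdf
    close $fd

    # catch keeps a non-zero exit status or stderr output from raising
    # an error; whatever pdfinfo printed (or the error text) ends up in
    # $output either way.
    catch {exec pdfinfo $path 2>@1} output
    file delete $path
    return [parsePdfinfoPages $output]
}
```

With the decoded data in a variable, `pdfinfoPageCount $pdf` would return the count. Keeping the parsing in its own proc means it can be checked against sample pdfinfo output without running the utility.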

      The other option is to look for a string in the PDF content that has page information and parse it out of the PDF. I just looked at one of the PDFs that I’m processing and found this line:

      1 0 obj<>

      I could use a regular expression to pull the page count (2 in this case) out of the /Count entry without sending the data to an external utility.

    • #82778
      Suzy Hoffman
      Participant

      Thanks, David.  Finding the page count in the decoded string is just what I needed.  Now comes the struggle with the regular expression…

    • #82779
      David Barr
      Participant

      Something like this should work:

      Code:

      set pdf [::base64::decode [hl7::get_field hl7 OBX.5.5]]
      # \d+ captures the digits after /Count; "->" receives the full match and is ignored
      regexp {/Type/Pages/Count (\d+)/} $pdf -> pages
      echo pages = $pages

      You might have to adjust the expression to match the format of your PDFs.
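
One way the expression could be loosened, as a sketch only: it assumes the page-tree objects are stored uncompressed (PDFs that pack objects into compressed object streams will not match), and the proc name `pdfPageCount` is hypothetical. It scans each object up to its `endobj`, accepts optional whitespace between `/Type` and `/Pages`, allows other keys to sit between the type and the count, and keeps the largest `/Count`, since the root of the page tree carries the total.

```tcl
# Return the largest /Count found in any "/Type /Pages" object,
# or -1 if no such object is found in the data.
proc pdfPageCount {pdf} {
    set best -1
    set start 0
    # Walk the data one object at a time, using "endobj" as the delimiter.
    while {[set end [string first "endobj" $pdf $start]] >= 0} {
        set obj [string range $pdf $start [expr {$end - 1}]]
        # \M (end of word) keeps "/Pages" from matching a plain "/Page" object.
        if {[regexp {/Type\s*/Pages\M} $obj] &&
            [regexp {/Count\s+(\d+)} $obj -> n]} {
            scan $n %d n   ;# force decimal so leading zeros are not read as octal
            if {$n > $best} {
                set best $n
            }
        }
        set start [expr {$end + 6}]   ;# skip past "endobj" (6 characters)
    }
    return $best
}
```

A result of -1 would be the cue to fall back to an external utility for that document.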
