Convert TIFF to PDF


  • #120030
    Dustin Sayes
    Participant

      The EMR vendor is sending a base64-encoded TIFF in an ORU message.

      We need this TIFF image to be text-searchable.

      Are there any options to convert a TIFF image to a text-searchable format such as PDF or HTML?

      Thank you!

      • #120033
        Todd Hamilton
        Participant

          Dustin,

          I recommend using a commercial product to do image-to-OCR-to-PDF conversion. There is a market for this kind of solution for a reason: it is very difficult to get right. That being said, here is a link that will point you in the right direction for creating your own solution: Konrad Voelkel, "Linux, OCR and PDF: Scan to PDF/A".

          Generally, here is what you do:

          1. Extract the Base64-encoded string from OBX.5
          2. Decode the string to binary
          3. Validate that the binary is a TIFF
          4. Create a PDF (to be used later)
          5. For each “page” of the TIFF, do the following:
             a. Perform OCR on the TIFF page to capture the text. (Usually will not be accurate enough for clinical)
             b. Add a hidden layer to the PDF page that contains the OCR text
             c. Add another layer to the PDF page that contains the page image
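
Steps 1 to 3 could be sketched in Python roughly as follows. This is only an illustration under assumptions: it assumes OBX.5 uses the HL7 ED layout (source^type^subtype^encoding^data, with the Base64 text in the fifth component), that the message uses the default | and ^ delimiters, and that the image is a classic TIFF rather than BigTIFF. The function names are hypothetical, and a real feed may need extra handling for escape sequences or multiple OBX segments.

```python
import base64

# Classic TIFF headers: little-endian ("II") and big-endian ("MM").
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def extract_obx5_payload(message: str) -> str:
    """Return the Base64 text from the first OBX segment's fifth field.

    Assumption: OBX.5 is an ED value, so the data is the fifth component.
    If there are fewer components, the whole field is returned instead.
    """
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "OBX" and len(fields) > 5:
            components = fields[5].split("^")
            return components[4] if len(components) >= 5 else fields[5]
    raise ValueError("no OBX segment with an OBX.5 value found")


def decode_tiff(payload: str) -> bytes:
    """Decode Base64 text and check that the result starts with a TIFF header."""
    data = base64.b64decode(payload, validate=True)
    if not data.startswith(TIFF_MAGIC):
        raise ValueError("decoded data does not start with a TIFF header")
    return data
```

Calling `decode_tiff(extract_obx5_payload(message))` returns the raw image bytes, which can then be written to a temporary .tif file for the OCR step.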

          Again, I recommend looking for a commercial product to do this; see the ARMA Buyer’s Guide (armabuyersguide.org).

          todd.hamilton.omaha@gmail.com
          (402) 660-2787

        • #120036
          Dustin Sayes
          Participant

            Hey Todd, just wanted to thank you for taking the time to respond here. OCR was my missing link; I was using tiff2pdf, but that simply created a PDF file that was still an image, whereas the customer needs a text-searchable file.

            I have not reached a complete solution yet. My OCR results are not great: some letters are changed and some symbols are missing. I’ve tried tesseract, gocr, and cuneiform so far. I have not looked into a commercial solution. My TIFF source image is a graphical representation of data (a printed anesthesia record). My concern is, as you mentioned above, “Usually will not be accurate enough for clinical”.

            Anyway, just wanted to thank you for your instruction. This has been a great learning exercise for me.
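
For anyone trying the tesseract route mentioned above, one way to get a text-searchable PDF (rather than the image-only PDF that tiff2pdf produces) is to ask the tesseract command-line tool for PDF output. The sketch below is an assumption about how that might be wired up from Python, not the setup used in this thread: the function names are hypothetical, it assumes the `tesseract` executable is on the PATH with English language data installed, and it does nothing to improve recognition accuracy.

```python
import subprocess
from pathlib import Path


def build_tesseract_cmd(tiff_path, out_base, lang="eng"):
    """Build the command line: tesseract <image> <output base> -l <lang> pdf.

    The trailing "pdf" config asks tesseract to write <out_base>.pdf with the
    page image plus an invisible text layer.
    """
    return ["tesseract", str(tiff_path), str(out_base), "-l", lang, "pdf"]


def tiff_to_searchable_pdf(tiff_path, out_base, lang="eng") -> Path:
    """Run tesseract and return the path of the PDF it is expected to write."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_cmd(tiff_path, out_base, lang), check=True)
    return Path(f"{out_base}.pdf")
```

For example, `tiff_to_searchable_pdf("record.tif", "record")` would be expected to produce `record.pdf`. The hidden text is only as good as the OCR, so the accuracy concern for clinical documents still applies.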
