I recommend using a commercial product to do image-to-OCR-to-PDF conversion. There is a reason there is a market for this kind of solution: it is very difficult to get right. That said, here is a link that will point you in the right direction for building your own: Konrad Voelkel, "Linux, OCR and PDF: Scan to PDF/A".
Generally, here is what you do:

1. Extract the Base64-encoded string from OBX.5
2. Decode the string to binary
3. Validate that the binary is a TIFF
4. Create a PDF (to be used later)
5. For each "page" of the TIFF:
   - Perform OCR on the TIFF page to capture the text (usually not accurate enough for clinical use)
   - Add a hidden PDF layer to the page containing that text
   - Add another PDF layer to the page containing the image
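The steps above can be sketched in Python. The decode and TIFF-validation steps need only the standard library; the OCR-to-searchable-PDF step is shown using `pytesseract` and `Pillow` (both assumed installed, along with the Tesseract binary), whose `image_to_pdf_or_hocr` call produces exactly the image-over-hidden-text layering described. The function names here are my own, and multi-page TIFFs may need per-frame handling depending on your Pillow version:

```python
import base64

def decode_obx5(b64_text: str) -> bytes:
    """Decode the Base64 payload extracted from OBX.5."""
    return base64.b64decode(b64_text)

def is_tiff(data: bytes) -> bool:
    """Validate the TIFF magic bytes:
    little-endian 'II*\\x00' or big-endian 'MM\\x00*'."""
    return data[:4] in (b"II*\x00", b"MM\x00*")

def tiff_to_searchable_pdf(tiff_bytes: bytes) -> bytes:
    """Sketch of the OCR step (assumes pytesseract + Pillow + Tesseract).
    Tesseract renders each page as an image layer with an invisible
    OCR text layer beneath it, making the output PDF text-searchable."""
    import io
    from PIL import Image
    import pytesseract
    img = Image.open(io.BytesIO(tiff_bytes))
    return pytesseract.image_to_pdf_or_hocr(img, extension="pdf")
```

This keeps the message-handling part (decode, validate) separate from the OCR part, so you can reject malformed payloads before spending time on OCR.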
Hey Todd, just wanted to thank you for taking the time to respond here. OCR was my missing link; I was using tiff2pdf, but that was simply creating a PDF file that was still just an image, whereas the customer needs a text-searchable file.
I've not arrived at a complete solution yet. My OCR results are poor quality: some letters are changed, and some symbols are missing. I've tried tesseract, gocr, and cuneiform so far, but have not looked into a professional/for-sale solution. My TIFF source image is a graphical representation of data (a printed anesthesia record), so my concern is exactly what you mentioned above: "Usually will not be accurate enough for clinical."
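One thing that sometimes helps with weak OCR results is preprocessing the page image before handing it to the engine. This is a minimal sketch using Pillow (assumed installed); the function name, scale factor, and threshold are my own starting points, not tuned values, and dense graphical records like an anesthesia chart may still not OCR well:

```python
from PIL import Image, ImageOps

def preprocess_for_ocr(img: Image.Image, scale: int = 2, threshold: int = 160) -> Image.Image:
    """Upscale, grayscale, and binarize a page image before OCR.
    Upscaling gives the engine more pixels per glyph; binarizing
    strips gray gridlines that can be mistaken for strokes."""
    g = ImageOps.grayscale(img)
    g = g.resize((g.width * scale, g.height * scale), Image.LANCZOS)
    # Hard threshold: anything lighter than `threshold` becomes white.
    return g.point(lambda p: 255 if p > threshold else 0)
```

Whether this helps depends heavily on the source image, so it is worth comparing the engines again on the preprocessed pages.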
Anyway, just wanted to thank you for your instruction. This has been a great learning exercise for me.