Has anyone parsed the contents of a PDF document using TCL?

This topic has 3 replies, 2 voices, and was last updated 6 years, 11 months ago by David Barr.

Creator

Topic
September 6, 2018 at 11:55 pm #55778
David Coffey
Participant
I do know of a shop that has done this to extract PHI and build the HL7 message on the fly, and I am hoping they respond. Has anyone else done it? How is it done?

Viewing 2 reply threads

Author

Replies
- September 7, 2018 at 9:08 pm #86483
  David Barr
  Participant
  Install poppler-utils on Red Hat. It includes a utility called “pdftotext”. You can write your message out to a file, exec the pdftotext utility from Tcl, and read the output back in. You’ll then have to parse the results based on the format of the data in the PDF and convert it to HL7. It usually helps to run “pdftotext -layout”, which makes the output follow the order of items on the page rather than the order in which they appear in the PDF file (the two can differ).
- September 7, 2018 at 9:13 pm #86484
  David Coffey
  Participant
  I’m sorry, I should have mentioned that I am running Cloverleaf 5.8.5 on Windows. I am aware of the Linux utility, but I am on the wrong OS.
- September 7, 2018 at 9:59 pm #86485
  David Barr
  Participant
  You can get a Windows version of Poppler (http://blog.alivate.com.au/poppler-windows/), or you can use the Poppler package from Cygwin, which is what I use on Windows.
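
The steps described in the replies above (write the PDF to a file, exec pdftotext with -layout, read the text back) might look roughly like the following in Tcl. This is only a sketch under assumptions: it assumes the Poppler pdftotext executable is on the PATH (or that its full path is passed in), the proc names and file paths are made up for illustration, and it sticks to commands available in older Tcl releases since the Tcl version bundled with the engine is not stated in this thread. Turning the extracted lines into HL7 depends entirely on the layout of your PDF, so that part is not shown.

```tcl
# Sketch only. Assumes Poppler's pdftotext is installed and reachable.
# All proc names and paths here are hypothetical.

# Build the pdftotext command line as a Tcl list.
# -layout keeps the text in page order; "-" as the output file
# tells pdftotext to write the text to stdout.
proc pdftotextCommand {pdfPath {exe pdftotext}} {
    return [list $exe -layout $pdfPath -]
}

# Write raw PDF bytes (for example, decoded from the inbound message)
# to a file without any newline or encoding translation.
proc writeBinaryFile {path data} {
    set fh [open $path w]
    fconfigure $fh -translation binary
    puts -nonewline $fh $data
    close $fh
}

# Run pdftotext on a PDF file and return the extracted text.
# Note: exec also raises an error if the program writes to stderr,
# so warnings from pdftotext end up in the catch branch as well.
proc pdfToText {pdfPath {exe pdftotext}} {
    set cmd [pdftotextCommand $pdfPath $exe]
    if {[catch {eval [linsert $cmd 0 exec]} text]} {
        error "pdftotext failed for $pdfPath: $text"
    }
    return $text
}

# Split extracted text into trimmed, non-empty lines.
# pdftotext separates pages with form feeds, so treat those as line breaks.
proc nonBlankLines {text} {
    set text [string map [list \f \n \r ""] $text]
    set out {}
    foreach line [split $text "\n"] {
        set line [string trim $line]
        if {$line ne ""} {
            lappend out $line
        }
    }
    return $out
}
```

A call might look like `set lines [nonBlankLines [pdfToText "C:/temp/report.pdf"]]`, where the path is a placeholder. With a Cygwin or standalone Windows build that is not on the PATH, the full path to the executable can be passed as the second argument to `pdfToText`.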

The forum ‘Cloverleaf’ is closed to new topics and replies.