HTML to Plain Text Converter


  • Creator
    Topic
  • #53373
    James Mestack
    Participant

    Does anyone have any Tcl code to strip HTML markup and insert proper spacing for HL7 OBX:5 fields in a result message?  I need some help ASAP.  Thanks.

  • Author
    Replies
    • #77462
      David Barr
      Participant

      I would write a script to call an external utility. This one might work:

      http://www.aaronsw.com/2002/html2text/

      You can also install lynx and run “lynx -dump file.html > file.txt”.
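
A minimal sketch of how the lynx call might be wrapped in Tcl, assuming lynx is on the PATH and the HTML is already in a file. The proc name `html_file_to_text` and its `converter` parameter are made up for illustration, and the `{*}` expansion syntax needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: run an external HTML-to-text converter on a file
# and return whatever it prints to stdout.
# Assumption: "lynx -dump <file>" works from the command line on this box.
proc html_file_to_text {htmlPath {converter {lynx -dump}}} {
    # exec raises an error on a non-zero exit status or on stderr output,
    # so catch it and report something readable.
    if {[catch {exec {*}$converter $htmlPath} result]} {
        error "conversion of $htmlPath failed: $result"
    }
    return $result
}

# Usage (not run here, since it needs lynx installed):
#   set text [html_file_to_text "file.html"]
```

Note that `exec` strips one trailing newline from the command's output, so the returned text will not end with a line break.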

    • #77463
      Chris Williams
      Participant

      Perhaps we have been lucky, but when we have encountered HTML or RTF from external systems, there has often been a configuration setting on those systems that will select plain text versus formatted as the output. Have you touched base with the system’s vendor to see if that is an option?

      There are a number of conversion routines available; however, we have not found any that deal well with formatted items within the document, like multi-column tables, headers, footers and other items requiring specific positioning on the page.

      Cheers.

    • #77464
      James Mestack
      Participant

      Unfortunately, I am stuck on a Windows box.  Given that, I was able to obtain lynx for Win32; it looks like it was compiled back in '99, but it appears to work if the HTML is stored in a file and I call it from the command line.

      So this question goes back to David: do you know of a way to send lynx an HTML string in an exec call from a Tcl script?

      Can I approach it through Tcl this way, or is there a more efficient way?

      1) Extract the HTML from the HL7 message and write it to a test.tmp file.

      2) set html_to_txt_variable [exec lynx.exe -dump test.tmp]

      3) Delete the test.tmp file.

      4) set new_hl7_txt_string [string map

        $html_to_txt_variable]
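
The four steps above could be sketched roughly as follows. The mapping list for step 4 is not visible in the post, so the escape table here is an assumption: it uses the standard HL7 v2 escape sequences for the default delimiters (`|^~\&`) and turns line breaks into `\.br\`. The proc names, the `.html` temp-file suffix (chosen so lynx can recognise the file type), and the use of the current directory for the temp file are all illustrative choices; `{*}` needs Tcl 8.5 or later.

```tcl
# Step 4 helper (assumed mapping): escape plain text for an HL7 v2 field
# using the default delimiters and \.br\ for line breaks.
proc hl7_escape_text {text} {
    # Normalise Windows and old Mac line endings to "\n" first.
    set text [string map [list "\r\n" "\n" "\r" "\n"] $text]
    # string map works in a single pass, so the backslashes it inserts
    # are not escaped a second time.
    return [string map [list \
        "\\" "\\E\\" \
        "|"  "\\F\\" \
        "^"  "\\S\\" \
        "&"  "\\T\\" \
        "~"  "\\R\\" \
        "\n" "\\.br\\"] $text]
}

# Steps 1-4: HTML string in, HL7-safe plain text out.
proc html_to_hl7_text {html {converter {lynx.exe -dump}}} {
    # 1) Write the HTML to a temp file with a reasonably unique name.
    set tmp [file join [pwd] "html2txt_[pid]_[clock clicks].html"]
    set fh [open $tmp w]
    puts -nonewline $fh $html
    close $fh

    # 2) Run the converter; catch errors so the temp file is still removed.
    set failed [catch {exec {*}$converter $tmp} text]

    # 3) Delete the temp file.
    file delete -force $tmp

    if {$failed} {
        error "HTML conversion failed: $text"
    }

    # 4) Escape delimiters and line breaks for use in OBX-5.
    return [hl7_escape_text $text]
}

# Usage (not run here, since it needs lynx.exe on the PATH):
#   set new_hl7_txt_string [html_to_hl7_text $html_from_obx5]
```

Deleting the file before checking for failure keeps stray temp files from piling up when the converter errors out.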
    • #77465
      David Barr
      Participant

      Yeah, those are probably the steps that you’ll have to follow.

      One nice thing about lynx is that it does a pretty good job of preserving table layouts.
