UTF-8 encoded XML/CCDs with CL 5.8

This topic has 11 replies, 6 voices, and was last updated 11 years, 10 months ago by Robert Milfajt.

January 17, 2013 at 6:48 pm #53476
Steve Drozdowski
Participant
Hello, we are having issues with CCDs (large XML documents) from our Cerner EMR system that are UTF-8 encoded – The XML header contains:

The encoding on our inbound and outbound threads is set to the default of “ASCII”. We are also running these msgs thru several TCL procs along the way.

One of our downstream systems is complaining that they are receiving invalid non-ASCII characters and invalid XML chars, including: x00 x13 x19 x1c x1d xa0 xad xae xb0 xb3 xb7 xba xbc xbd xbe xe9

I wrote a TCL proc that uses “string map” to replace all of these chars with a space. However the proc does not seem to be working. The extended ASCII chars (> 127) are being replaced, but not the lower chars (Ex. x19).

Has anyone had any similar experiences? I am wondering if I need to change the encoding on my inbound and/or outbound threads from “ASCII” to “UTF-8” or ??

Also, I have read that TCL itself uses its own encoding, so maybe I need to over-ride that somehow?

Running CL 5.8.4 on AIX 6.1

Thanks in advance,

Steve Drozdowski

Banner Health

Viewing 10 reply threads

Author

Replies
- January 17, 2013 at 7:07 pm #77803
  Robert Milfajt
  Participant
  Not sure this will help, but starting sometime between 5.5 and 5.8.5, Cloverleaf started adding Encoding as a thread property. Right-click the thread in NetConfig -> Thread Properties -> Thread tab.
  
  If your source sends non-ASCII characters, which I found our Cerner system does a lot, you get a lot of garbage on the outbound side. If you change the encoding to bypass on both the source and destination threads, this does not happen.
  
  Hope this helps,
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL
- January 17, 2013 at 9:24 pm #77804
  Steve Drozdowski
  Participant
  Hi Robert, thanks for the info…
  
  Do you know exactly what “bypass” does? Does it just skip all msg encoding/decoding and treat the msg as a binary data-stream?
  
  Also, do you know if I would need to change the encoding to “bypass” on just the IB thread, or on every thread that the XML msgs might flow thru, including the OB thread?
  
  Thanks again.
- January 21, 2013 at 3:04 pm #77805
  Robert Milfajt
  Participant
  I do not know what bypass does, and can only guess it acts like the engine did prior to adding Encoding. I can tell you it fixed the problems I was having with Cerner Radnet reports filing in Epic.
  
  I had to implement on both inbound and outbound threads.
  
  Hope this helps,
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL
- January 22, 2013 at 3:19 pm #77806
  Fred Rosenberger
  Participant
  We use a script that simply removes all non-ascii characters. The core being
  
  regsub -all {[^\x00-\x7F]} $msg {} msg
  
  This could be modified to remove the lower characters as well, and/or to put in a space.
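  
  As a rough sketch of that modification (not Fred's actual script): the proc name `scrubNonAscii` is made up for illustration, and it assumes the character class is written with its backslashes, i.e. `\x00-\x7F` inside braces. It keeps tab, LF, CR and printable ASCII, and turns everything else, including low control characters such as 0x19, into a space.
  
  ```tcl
  # Hypothetical helper, not from the thread: replace every character that is
  # neither printable ASCII nor tab/LF/CR with a single space.
  proc scrubNonAscii {msg} {
      # Braces stop Tcl from touching the backslashes, so the regexp
      # engine itself interprets the \xHH escapes.
      regsub -all {[^\x09\x0A\x0D\x20-\x7E]} $msg { } msg
      return $msg
  }
  
  # Sample with a control char (0x19), a high char (0xA0) and a NUL.
  set sample "p\x19q\xa0r\x00s"
  puts [scrubNonAscii $sample]   ;# prints "p q r s"
  ```
  
  One thing to watch when building test strings: in Tcl 8.5 and earlier, a `\x` escape in a quoted string consumes every hex digit that follows it and keeps only the last two, so "\x19B" is not 0x19 followed by "B". The sample above avoids hex letters right after an escape for that reason.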
- January 22, 2013 at 6:49 pm #77807
  Robert Milfajt
  Participant
  We were getting characters above ASCII 127 that were actually displaying correctly in Epic, such as a subscripted two. That is why we needed to make this work.
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL
- January 29, 2013 at 9:40 pm #77808
  Steve Drozdowski
  Participant
  FYI – I was able to get things working properly by changing the encoding on all affected threads to “binary”. “bypass” did not seem to work for me… Thanks everyone.
- January 31, 2013 at 12:17 am #77809
  Elisha Gould
  Participant
  From what I have seen, bypass still converts the encoding, but when it sends on the message it converts it back to the original.
  
  Be careful if you do translations on the part that has the non-ascii encoding.
  
  If you need to translate the non-ASCII portion, use the binary encoding or use “encoding convertto identity”. The binary encoding is as close to an actual bypass encoding as possible.
  
  Just as an additional note: if you write the message to the log file, it will not be encoded properly in the log file, including when using binary.
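  
  For the case where a proc does have to touch the non-ASCII portion, one possible approach is to decode the bytes, edit the text, and re-encode it. This is only a sketch: it uses the standard Tcl `encoding convertfrom` and `encoding convertto` commands with `utf-8` rather than the `identity` encoding mentioned above, the proc name `replaceNbsp` is hypothetical, and it assumes the thread encoding is binary so the proc receives the raw UTF-8 bytes unchanged.
  
  ```tcl
  # Hypothetical helper: decode raw UTF-8 bytes into a Tcl string,
  # edit it as characters, then re-encode to UTF-8 bytes.
  proc replaceNbsp {rawBytes} {
      set text [encoding convertfrom utf-8 $rawBytes]
      # U+00A0 (no-break space) is one character here, although it is
      # two bytes (C2 A0) in the UTF-8 message.
      set text [string map [list \u00a0 " "] $text]
      return [encoding convertto utf-8 $text]
  }
  
  set raw [binary format H* 70c2a071]   ;# bytes: "p", C2 A0, "q"
  set out [replaceNbsp $raw]
  binary scan $out H* hex
  puts $hex   ;# prints 702071
  ```
  
  Working on decoded characters avoids replacing only half of a multi-byte sequence, which is what a byte-wise replace of 0xA0 would do to UTF-8 input.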
- September 11, 2013 at 3:56 pm #77810
  Russ Ross
  Participant
  We have upgraded from Cloverleaf 5.6 to Cloverleaf 6.0 and our validation testing kicked out differences for messages with high value character(s) in the range of 128-255.
  
  We then noticed the new encoding drop down in Cloverleaf 6.0 that isn’t in Cloverleaf 5.6 and started tinkering with that.
  
  We also learned that after doing hcirootcopy of a site from Cloverleaf 5.6 to Cloverleaf 6.0, the default encoding is set to ASCII, so we first tried using binary to get some interfaces working, like the TIF/RTIF interfaces to our EBCDIC mainframe.
  
  We too have been playing with binary and wondering about bypass.
  
  I have not gotten comfortable with understanding when to use which one, and we have been going thru trial and error to try to self-educate.
  
  I looked for the encoding mentioned above called “encoding convertto identity”, but don’t see it in Cloverleaf 6.0.
  
  I’m listing the interface encodings that I’m seeing as choices in the Cloverleaf 6.0 GUI/IDE; the bolded ones are the ones we have tried to understand thus far.
  
  ASCII, Big5, binary, bypass, cyrillic, gb18030, GB2312, greek, hebrew, IBM037, latin1, latin2, latin3, latin4, Shift_JIS, UTF-16, UTF-16BE, UTF-16LE, UTF-8, windows-1252
  
  Russ Ross
  RussRoss318@gmail.com
- September 11, 2013 at 4:41 pm #77811
  James Cobane
  Participant
  We had some interfaces that had extended-ASCII characters in them; for those we changed the encoding to Windows-1252.
  
  Thanks,
  
  Jim Cobane
  
  Henry Ford Health
- September 12, 2013 at 1:28 am #77812
  Elisha Gould
  Participant
  Hi Russ,
  
  “encoding convertto identity” is something you use within a Tcl script if you are doing data manipulation or writing to files. You probably won’t need it if you are just passing the message on.
  
  If you don’t touch the data and just route the message then bypass should be ok, otherwise use binary for your encoding.
  
  I believe bypass takes the data as is and sets the encoding type as UTF-8 for it, rather than going from the specified encoding to UTF-8 and back again on the other side.
  
  My suggestion is to set up the site in both 5.6 and 6.0, run some data through, and check that the output is the same.
- September 12, 2013 at 7:13 pm #77813
  Robert Milfajt
  Participant
  We changed encoding to bypass and it worked just fine for us, no need for other scripting, etc.
  
  Hope this helps,
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL

The forum ‘Cloverleaf’ is closed to new topics and replies.