UTF-8 encoded XML/CCDs with CL 5.8


  • Creator
    Topic
  • #53476
    Steve Drozdowski
    Participant

      Hello, we are having issues with CCDs (large XML documents) from our Cerner EMR system that are UTF-8 encoded – The XML header contains:  

      The encoding on our inbound and outbound threads is set to the default of “ASCII”.  We are also running these msgs thru several TCL procs along the way.

      One of our downstream systems is complaining that they are receiving invalid non-ASCII characters and invalid XML chars, including:  x00 x13 x19 x1c x1d xa0 xad xae xb0 xb3 xb7 xba xbc xbd xbe xe9

      I wrote a TCL proc that uses “string map” to replace all of these chars with a space.  However the proc does not seem to be working.  The extended ASCII chars (> 127) are being replaced, but not the lower chars (Ex. x19).

      Has anyone had any similar experiences?  I am wondering if I need to change the encoding on my inbound and/or outbound threads from “ASCII” to “UTF-8” or ??

      Also, I have read that TCL itself uses its own encoding, so maybe I need to over-ride that somehow?

      Running CL 5.8.4 on AIX 6.1

      Thanks in advance,

      Steve Drozdowski

      Banner Health

    Viewing 10 reply threads
    • Author
      Replies
      • #77803
        Robert Milfajt
        Participant

          Not sure this will help, but sometime between 5.5 and 5.8.5, Cloverleaf added Encoding as a thread property.  To see it, right-click the thread in NetConfig -> Thread Properties -> Thread tab.

          If your source sends non-ASCII characters, which I found our Cerner system does a lot, you get a lot of garbage on the outbound side.  If you change the encoding to bypass on both the source and destination threads, this does not happen.

          Hope this helps,

          Robert Milfajt
          Northwestern Medicine
          Chicago, IL

        • #77804
          Steve Drozdowski
          Participant

            Hi Robert, thanks for the info…

            Do you know exactly what “bypass” does?  Does it just skip all msg encoding/decoding and treat the msg as a binary data-stream?

            Also, do you know if I would need to change the encoding to “bypass” on just the IB thread, or on every thread that the XML msgs might flow thru, including the OB thread?

            Thanks again.

          • #77805
            Robert Milfajt
            Participant

              I do not know what bypass does, and can only guess it acts like the engine did prior to adding Encoding.  I can tell you it fixed the problems I was having with Cerner Radnet reports filing in Epic.

              I had to implement on both inbound and outbound threads.

              Hope this helps,

              Robert Milfajt
              Northwestern Medicine
              Chicago, IL

            • #77806
              Fred Rosenberger
              Participant

                We use a script that simply removes all non-ascii characters.  The core being

                regsub -all {[^\x00-\x7F]} $msg {} msg

                This could be modified to remove the lower characters as well, and/or to put in a space.
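
A minimal sketch of that modification, assuming the message is already held as a Tcl string. The proc name `scrubNonAscii` is hypothetical and not from the thread. It replaces every non-ASCII character, and every ASCII control character other than tab, LF and CR, with a single space:

```tcl
# Hypothetical helper (name is an assumption, not from the thread).
# Replaces with one space:
#   - control characters U+0000-U+0008, U+000B, U+000C, U+000E-U+001F
#     (these are not legal in XML 1.0; tab, LF and CR are left alone)
#   - DEL (U+007F) and everything above 7-bit ASCII
proc scrubNonAscii {msg} {
    # Four-digit \u escapes are used so the regexp engine cannot read
    # extra hex digits into an escape.
    regsub -all {[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\uFFFF]} $msg { } msg
    return $msg
}

# Example: U+0019 and U+00E9 are each replaced by a space.
puts [scrubNonAscii "a\u0019b\u00e9c"]
```

Note that `regsub` interprets the `\u` escapes itself, even inside braces. `string map` does no such interpretation, so a brace-quoted map there would need to contain the actual characters. If the message reaches the proc as undecoded UTF-8 bytes, each byte of a multi-byte character would be replaced separately.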

              • #77807
                Robert Milfajt
                Participant

                  We were getting characters outside the ASCII 127, which were actually displaying correctly in Epic, like subscripted two.  That is why we needed to make this work.

                  Robert Milfajt
                  Northwestern Medicine
                  Chicago, IL

                • #77808
                  Steve Drozdowski
                  Participant

                    FYI – I was able to get things working properly by changing the encoding on all affected threads to “binary”.  “bypass” did not seem to work for me…  Thanks everyone.

                  • #77809
                    Elisha Gould
                    Participant

                      From what I have seen, bypass still converts the encoding, but when it sends on the message it converts it back to the original.

                      Be careful if you do translations on the part that has the non-ascii encoding.

                      If you need to translate the non-ascii portion, use the binary encoding or use “encoding convertto identity”. Binary encoding is as close to an actual bypass encoding as possible.

                      Just as an additional note: if you write the message to the log file, it will not be encoded properly in the log file, including when using binary.
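
For readers unfamiliar with Tcl's `encoding` command, here is a small sketch of the byte/character distinction behind this advice. It uses the standard `encoding convertfrom` and `encoding convertto` subcommands with `utf-8`. The `identity` encoding is not shown because its availability may depend on the Tcl version bundled with Cloverleaf. Whether a Cloverleaf proc receives raw bytes or decoded characters depends on the thread's encoding setting, so the starting point below is an assumption:

```tcl
# Assumption: the proc receives raw UTF-8 bytes. Here the two bytes
# 0xC3 0xA9 (the UTF-8 form of U+00E9) are built by hand for illustration.
set rawBytes [binary format H* c3a9]

# Decode the bytes into Tcl characters before doing any string work.
set text [encoding convertfrom utf-8 $rawBytes]
puts [string length $rawBytes]   ;# 2 (two bytes)
puts [string length $text]       ;# 1 (one character)

# Re-encode to UTF-8 bytes before writing to a file or sending onward.
set outBytes [encoding convertto utf-8 $text]
puts [expr {$outBytes eq $rawBytes}]   ;# 1 (round trip is lossless)
```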

                    • #77810
                      Russ Ross
                      Participant

                        We have upgraded from Cloverleaf 5.6 to Cloverleaf 6.0 and our validation testing kicked out differences for messages with high value character(s) in the range of 128-255.

                        We then noticed the new encoding drop down in Cloverleaf 6.0 that isn’t in Cloverleaf 5.6 and started tinkering with that.

                        We also learned that after doing an hcirootcopy of a site from Cloverleaf 5.6 to Cloverleaf 6.0, the default encoding is set to ASCII, so we first tried using binary to get some interfaces working, like the TIF/RTIF interfaces to our EBCDIC mainframe.

                        We too have been playing with binary and wondering about bypass.

                        I have not gotten comfortable with understanding when to use which one, and we have been going thru trial and error to try and self educate.

                        I looked for the encoding mentioned above called “encoding convertto identity”, but don’t see it in Cloverleaf 6.0.

                        I’m listing the interface encodings that I’m seeing as choices in the Cloverleaf 6.0 GUI/IDE; the bolded ones are the ones we have tried to understand thus far.

                        ASCII
                        Big5
                        binary
                        bypass
                        cyrillic
                        gb18030
                        GB2312
                        greek
                        hebrew
                        IBM037
                        latin1
                        latin2
                        latin3
                        latin4
                        Shift_JIS
                        UTF-16
                        UTF-16BE
                        UTF-16LE
                        UTF-8
                        windows-1252

                        Russ Ross
                        RussRoss318@gmail.com

                      • #77811
                        James Cobane
                        Participant

                          We had some interfaces that had extended-ASCII characters in them; for those we changed the encoding to Windows-1252.

                          Thanks,

                          Jim Cobane

                          Henry Ford Health

                        • #77812
                          Elisha Gould
                          Participant

                            Hi Russ,

                            “encoding convertto identity” is used within a Tcl script if you are doing data manipulation or writing to files. You probably won’t need it if you are doing nothing except passing the message on.

                            If you don’t touch the data and just route the message then bypass should be ok, otherwise use binary for your encoding.

                            I believe bypass takes the data as is and sets the encoding type as UTF-8 for it, rather than going from the specified encoding to UTF-8 and back again on the other side.

                            My suggestion is to set up the site in both 5.6 and 6.0, run some data through each, and check that the output is the same.

                          • #77813
                            Robert Milfajt
                            Participant

                              We changed encoding to bypass and it worked just fine for us, no need for other scripting, etc.

                              Hope this helps,

                              Robert Milfajt
                              Northwestern Medicine
                              Chicago, IL
