Corrupt Content of Error DB

This topic has 3 replies, 2 voices, and was last updated 7 years, 11 months ago by Paul Glezen.

Creator

Topic
July 5, 2017 at 7:48 pm #55443
Paul Glezen
Participant
I believe I finally figured out some mysterious behavior whereby some messages in my error DB are corrupt. I’d like to relate it here to have my suspicion vetted.

We have two inbound nodes that accept messages from mainframe systems that arrive in quasi-EBCDIC. Quasi-EBCDIC is my term for EBCDIC characters mixed with binary fields. Because the binary fields must be treated in a special manner, we must set encoding to binary on the inbound Encoding field and perform the translation ourselves.
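
As a rough illustration of what "performing the translation ourselves" involves, here is a minimal Tcl sketch. The field layout (a 4-byte EBCDIC tag, a 16-bit big-endian binary length, then EBCDIC text) and the sample values are hypothetical, made up for this example, and are not our real message format. It assumes the `ebcdic` encoding that ships with Tcl.

```tcl
# Hypothetical quasi-EBCDIC layout, for illustration only:
#   bytes 0-3   EBCDIC text tag
#   bytes 4-5   16-bit big-endian binary length
#   bytes 6-    EBCDIC text body
set msg [binary format a4Sa* \
    [encoding convertto ebcdic "HDR1"] \
    3 \
    [encoding convertto ebcdic "ABC"]]

# Split the raw bytes: text fields stay raw, the binary field is read as a number.
binary scan $msg a4Sa* tagRaw len bodyRaw

# Only the text fields go through the EBCDIC decoder.
set tag  [encoding convertfrom ebcdic $tagRaw]
set body [encoding convertfrom ebcdic $bodyRaw]

puts "$tag $len $body"
```

The point is that the binary field never passes through a character decoder, which is why the inbound encoding has to be left as binary.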

Inbound Node1 performs this translation via a Tcl proc invoked as an IB TPS. It transforms the binary fields with custom logic and decodes the rest of the message using one of the tables referenced through msgmapdata. Inbound Node2 performs that translation through a Tcl proc invoked as a route detail pre proc. I’m told this was because the EBCDIC-to-UTF8 conversion depended on the destination, which wasn’t known until the translation thread.

In the absence of errors, both schemes work, even though both are scary. When an error is encountered within the translation thread, a message is added to the error DB. Error DB entries from Node1 have valid (UTF8) contents. Error DB entries from Node2 have corrupt somewhat-EBCDIC contents. When dumped and decoded with hciencode (cp037), there is some recognizable content; but the character “C” is inserted between most other characters.

My suspicion is that Node1 recovers successfully because it completes the manual EBCDIC-to-UTF8 conversion in the protocol thread, so the message reaches the pre-translation queue intact. If an error occurs during translation, it is the message in the pre-translation queue that is used to populate the error DB. Node2, on the other hand, places quasi-EBCDIC on the pre-translation queue. If the Node2 translation rule encounters an error, Cloverleaf picks up a message from the pre-translation queue that is not valid UTF8. Cloverleaf tries and fails to interpret it as UTF8 (the encoding it expects for all messages internal to the engine), and the resulting content in the error DB is the failed decoding.

Does that seem like a sensible explanation?

My workaround for now is to dig the message out of the inbound SMAT and resend it from there (after I’ve fixed what caused the error in the first place).

Viewing 2 reply threads

Author

Replies
- July 12, 2017 at 2:36 pm #85355
  Paul Glezen
  Participant
  Just an update on my findings. I’m more confident about my suspicions outlined above after I figured out how to decode the error DB content. I was mistaken near the end about the failure to decode the recovery DB content as UTF-8 (though it seems like a miracle that it somehow succeeded). Nevertheless, if I dump the contents of the message and
  
  1. manually utf8-decode the contents,
  
  2. treat the result as a hex representation of an EBCDIC encoding and decode that,
  
  I get the expected contents. The mysterious “C” character in between most of the other characters comes from the utf8 encoding itself. Every byte in the range xC0 to xFF, which covers the EBCDIC capital letters and digits, becomes a two-byte sequence whose first byte is xC3 (1100 0011), and in EBCDIC xC3 is the letter C. So utf8-encoding the EBCDIC byte for the capital letter P converts it to xC3x97.
  
  EBCDIC(“P”) = xD7
  
  utf8(EBCDIC(“P”)) = utf8(xD7) = xC3x97 = EBCDIC(“Cp”)
  
  What a coincidence that utf8-encoding an EBCDIC capital letter converts it to a “C” followed by the corresponding lowercase letter. The coincidence arises because EBCDIC upper-to-lower conversion amounts to changing the second bit from 1 to 0, and that is exactly what utf8 does when it builds the second byte: it keeps the low six bits and sets the top two bits to 10. (In ASCII, upper-to-lower conversion changes the 3rd bit from 0 to 1.)
  
  “P” = xD7 = 1101 0111 --> 1001 0111 = x97 = “p”
  
  That really threw me off.
  
  I’m teaching myself how to wield the “binary scan” and “binary format” Tcl commands in order to write scripts to extract, edit, and resend these messages.
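
  The “C” pattern and the two decoding steps described above can be reproduced in a few lines of Tcl. This is only a sketch: the sample text "PIX" is an arbitrary illustrative value, and it assumes the `ebcdic` encoding that ships with Tcl rather than whatever table the engine uses.

  ```tcl
  # Sample payload; "PIX" is only an illustrative value.
  set text "PIX"

  # EBCDIC-encode it, as the mainframe would send it.
  set ebcdic [encoding convertto ebcdic $text]

  # Reproduce the corruption: treat each EBCDIC byte as a character
  # (code point 0x00-0xFF) and UTF-8 encode the result.
  set corrupt [encoding convertto utf-8 $ebcdic]

  # Hex view of the corrupted bytes.
  binary scan $corrupt H* hex
  puts "hex:     $hex"

  # What the corrupted bytes look like when read as plain EBCDIC.
  set garbled [encoding convertfrom ebcdic $corrupt]
  puts "garbled: $garbled"

  # Recovery, following the two steps above:
  # 1. UTF-8 decode, 2. EBCDIC decode.
  set step1 [encoding convertfrom utf-8 $corrupt]
  set recovered [encoding convertfrom ebcdic $step1]
  puts "fixed:   $recovered"
  ```

  With standard EBCDIC letter assignments this should print c397c389c3a7, then CpCiCx, then PIX.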
- July 12, 2017 at 8:45 pm #85356
  David Coffey
  Participant
  Go here for a quick explanation of how to go back and forth between ASCII and EBCDIC: http://wiki.tcl.tk/6205
- July 28, 2017 at 4:45 pm #85357
  Paul Glezen
  Participant
  Here is how I ended up doing it. I removed most of the error handling for brevity.
  
  Code:

      # This Tcl script should be invoked as
      #
      #   tclsh – >
      #
      # where is an input file and format is one of
      # "a" (for ASCII), "e" (for EBCDIC), or "u" (for UTF-8).
      #
      # … parameter assigning and error checking removed for brevity

      # The emt_date should be 6/26/2017 in Julian format,
      # which is yyyyddd where ddd is the day of the year.
      #
      set emt_date_ascii "2017177"
      set emt_date_ebcdic [encoding convertto ebcdic $emt_date_ascii]
      puts stderr "Setting PIX emt_date to $emt_date_ascii"

      # Read the dumped error DB message as raw bytes.
      set infd [open $infile "rb"]
      set ubin [read -nonewline $infd]

      # Undo the UTF-8 layer, leaving one character per EBCDIC byte.
      set ebin [encoding convertfrom utf-8 $ubin]
      # Decode the EBCDIC bytes into readable text.
      set abin [encoding convertfrom ebcdic $ebin]

      # Overwrite the 7-character date field at offsets 48-54.
      set ebout [string replace $ebin 48 54 $emt_date_ebcdic]
      set about [string replace $abin 48 54 $emt_date_ascii]

      # Re-apply the UTF-8 layer so the output has the same form as the dump.
      set ubout [encoding convertto utf-8 $ebout]

      switch $format {
        "utf8"   { puts -nonewline $ubout }
        "ebcdic" { puts -nonewline $ebout }
        "ascii"  { puts -nonewline $about }
        default  { puts stderr "Error: unknown format." }
      }
  
  To replay the messages, I invoked this with -u. If I just wanted to see the contents, I invoked it with -a.
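
  The yyyyddd value hard-coded in the script can also be computed. A small sketch, assuming Tcl 8.5 or later for the `-format` and `-timezone` options of `clock`; the variable name `t` is just for this example.

  ```tcl
  # Parse the calendar date; UTC is pinned so the local time zone cannot shift the day.
  set t [clock scan "06/26/2017" -format "%m/%d/%Y" -timezone :UTC]

  # %Y is the four-digit year, %j the zero-padded day of the year.
  set emt_date_ascii [clock format $t -format "%Y%j" -timezone :UTC]

  puts $emt_date_ascii
  ```

  For 6/26/2017 this gives 2017177, matching the value used above.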

The forum ‘Cloverleaf’ is closed to new topics and replies.