Next: Internal Character Encoding, Up: Internal Mule Encodings [Contents][Index]
ASCII characters are encoded using their position code directly. Other characters are encoded using their leading byte followed by their position code(s) with the high bit set. Characters in private character sets have their leading byte prefixed with a leading byte prefix, which is either 0x9E or 0x9F. (No character sets are ever assigned these leading bytes.) Specifically:
Character set Encoding (PC=position-code, LB=leading-byte) ------------- -------- ASCII PC-1 | Control-1 LB | PC1 + 0xA0 | Dimension-1 official LB | PC1 + 0x80 | Dimension-1 private 0x9E | LB | PC1 + 0x80 | Dimension-2 official LB | PC1 + 0x80 | PC2 + 0x80 | Dimension-2 private 0x9F | LB | PC1 + 0x80 | PC2 + 0x80
The basic characteristic of this encoding is that the first byte of all characters is in the range 0x00 - 0x9F, and the second and following bytes of all characters is in the range 0xA0 - 0xFF. This means that it is impossible to get out of sync, or more specifically:
None of the standard non-modal encodings meet all of these conditions. For example, EUC satisfies only (2) and (3), while Shift-JIS and Big5 (not yet described) satisfy only (2). (All non-modal encodings must satisfy (2), in order to be unambiguous.)
Next: Internal Character Encoding, Up: Internal Mule Encodings [Contents][Index]