In SXEmacs/Mule, each character set is assigned a unique number, called a leading byte, which is used in the encodings of a character. Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has a leading byte of 0), although some leading bytes are reserved.
Charsets whose leading byte is in the range 0x80 - 0x9F are called official and are used for built-in charsets. Other charsets are called private and have leading bytes in the range 0xA0 - 0xFF; these are user-defined charsets.
More specifically:
     Character set           Leading byte
     -------------           ------------
     ASCII                   0
     Composite               0x80
     Dimension-1 Official    0x81 - 0x8D
                               (0x8E is free)
     Control-1               0x8F
     Dimension-2 Official    0x90 - 0x99
                               (0x9A - 0x9D are free;
                                0x9E and 0x9F are reserved)
     Dimension-1 Private     0xA0 - 0xEF
     Dimension-2 Private     0xF0 - 0xFF
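The ranges listed above can be expressed as a small lookup. The following is only an illustrative sketch in Python, not code from SXEmacs itself; the function names `classify_leading_byte` and `is_private` are hypothetical and chosen for this example.

```python
def classify_leading_byte(lb: int) -> str:
    """Map a leading byte to its charset category, following the ranges
    in the table above. (Hypothetical helper, not part of SXEmacs.)"""
    if lb == 0x00:
        return "ASCII"
    if lb == 0x80:
        return "Composite"
    if 0x81 <= lb <= 0x8D:
        return "Dimension-1 Official"
    if lb == 0x8E:
        return "free"
    if lb == 0x8F:
        return "Control-1"
    if 0x90 <= lb <= 0x99:
        return "Dimension-2 Official"
    if 0x9A <= lb <= 0x9D:
        return "free"
    if lb in (0x9E, 0x9F):
        return "reserved"
    if 0xA0 <= lb <= 0xEF:
        return "Dimension-1 Private"
    if 0xF0 <= lb <= 0xFF:
        return "Dimension-2 Private"
    # Values 0x01 - 0x7F and anything above 0xFF are not leading bytes.
    raise ValueError(f"not a valid leading byte: {lb:#x}")


def is_private(lb: int) -> bool:
    """True for user-defined (private) charsets: 0xA0 - 0xFF."""
    return 0xA0 <= lb <= 0xFF
```

For example, `classify_leading_byte(0x92)` yields `"Dimension-2 Official"`, while `is_private(0xF3)` is true because 0xF3 falls in the private range.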
There are two internal encodings for characters in SXEmacs/Mule. One is called string encoding and is an 8-bit encoding that is used for representing characters in a buffer or string. It uses 1 to 4 bytes per character. The other is called character encoding and is a 19-bit encoding that is used for representing characters individually in a variable.
(In the following descriptions, we’ll ignore composite characters for the moment. We also give a general (structural) overview first, followed later by the exact details.)
• Internal String Encoding
• Internal Character Encoding