66.6.8 Coding Systems Implemented

MULE initializes most of the commonly used coding systems at SXEmacs’s startup. A few others are initialized only when the relevant language environment is selected and support libraries are loaded. (NB: The following list is based on XEmacs 21.2.19, the development branch at the time of writing. The list may be somewhat different for other versions. Recent versions of GNU Emacs 20 implement a few more rare coding systems; work is being done to port these to SXEmacs.)

Unfortunately, there is no consistent naming convention for character sets, and for practical purposes coding systems often take their names from their principal character sets (ASCII, KOI8-R, Shift JIS). Others take their names from the encoding scheme itself (ISO-2022-JP, EUC-KR), and a few from their non-text usages (internal, binary). To accommodate this, and the fact that many coding systems have several common names, an aliasing system is provided. Finally, some effort has been made to use names that are registered as MIME charsets (this is why the name ’shift_jis contains that un-Lisp-y underscore).

There is a systematic naming convention regarding end-of-line (EOL) conventions for different systems. A coding system whose name ends in "-unix" forces the assumption that lines are broken by newlines (0x0A). A coding system whose name ends in "-mac" forces the assumption that lines are broken by ASCII CRs (0x0D). A coding system whose name ends in "-dos" forces the assumption that lines are broken by CRLF sequences (0x0D 0x0A). These subsidiary coding systems are automatically derived from a base coding system. Use of the base coding system implies autodetection of the text file convention. (The fact that the -unix, -mac, and -dos coding systems are derived from a base system results in them showing up as "aliases" in ‘list-coding-systems’.) These subsidiaries have a consistent modeline indicator as well. "-dos" coding systems have ":T" appended to their modeline indicator, while "-mac" coding systems have ":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
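The suffix rules above can be sketched outside of Lisp. The following is a minimal Python illustration, not SXEmacs code: the helper names (subsidiaries, guess_eol_suffix, split_lines) are hypothetical, and the detection heuristic is an assumption standing in for whatever autodetection the editor actually performs.

```python
# Illustrative only: plain Python, not SXEmacs code.
# Line terminators forced by the three subsidiary suffixes.
EOL_BY_SUFFIX = {"-unix": b"\n", "-mac": b"\r", "-dos": b"\r\n"}


def subsidiaries(base):
    """Return the three subsidiary names derived from a base name."""
    return [base + suffix for suffix in ("-unix", "-dos", "-mac")]


def guess_eol_suffix(data):
    """Hypothetical stand-in for autodetection: check CRLF before a lone CR."""
    if b"\r\n" in data:
        return "-dos"
    if b"\r" in data:
        return "-mac"
    return "-unix"


def split_lines(data, name):
    """Split bytes using the EOL forced by the name, or a guess for a base name."""
    for suffix, eol in EOL_BY_SUFFIX.items():
        if name.endswith(suffix):
            return data.split(eol)
    return data.split(EOL_BY_SUFFIX[guess_eol_suffix(data)])
```

In this sketch, split_lines(b"a\r\nb", "big5") guesses the CRLF convention, while split_lines(b"a\r\nb", "big5-unix") forces newline-only breaking and so leaves the CR attached to the first line.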

In the following table, each coding system is given with its modeline indicator. Non-textual coding systems are listed first, followed by textual coding systems and their aliases. (The subsidiary modeline indicators ":T" and ":t" are omitted from the table.)

### SJT 1999-08-23 Maybe should order these by language? Definitely need language usage for the ISO-8859 family.

Note that although true coding system aliases have been implemented for XEmacs 21.2, the coding system initialization has not yet been converted as of 21.2.19. So coding systems described as aliases have the same properties as the aliased coding system, but will not be equal as Lisp objects.

automatic-conversion
undecided
undecided-dos
undecided-mac
undecided-unix

Modeline indicator: Auto. A type undecided coding system. Attempts to determine an appropriate coding system from file contents or the environment.

raw-text
no-conversion
raw-text-dos
raw-text-mac
raw-text-unix
no-conversion-dos
no-conversion-mac
no-conversion-unix

Modeline indicator: Raw. A type no-conversion coding system, which converts only line-break-codes. An implementation quirk means that this coding system is also used for ISO8859-1.
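Whatever the internal reason for the quirk, the pairing is natural from the encoding side: ISO 8859-1 assigns byte value n to character number n, so a conversion that leaves bytes alone already yields Latin-1 text. A small Python check of that property (an illustration only, unrelated to the SXEmacs implementation):

```python
# Every byte value 0..255 decodes under ISO 8859-1 to the code point
# with the same number, and encodes back to the same byte.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

same_numbers = [ord(ch) for ch in text] == list(range(256))
round_trip = text.encode("latin-1") == all_bytes
```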

binary

Modeline indicator: Binary. A type no-conversion coding system which does no character coding or EOL conversions. An alias for raw-text-unix.

alternativnyj
alternativnyj-dos
alternativnyj-mac
alternativnyj-unix

Modeline indicator: Cy.Alt. A type ccl coding system used for Alternativnyj, an encoding of the Cyrillic alphabet.

big5
big5-dos
big5-mac
big5-unix

Modeline indicator: Zh/Big5. A type big5 coding system used for BIG5, the most common encoding of traditional Chinese as used in Taiwan.

cn-gb-2312
cn-gb-2312-dos
cn-gb-2312-mac
cn-gb-2312-unix

Modeline indicator: Zh-GB/EUC. A type iso2022 coding system used for simplified Chinese (as used in the People’s Republic of China), with the ascii (G0), chinese-gb2312 (G1), and sisheng (G2) character sets initially designated. Chinese EUC (Extended Unix Code).

ctext-hebrew
ctext-hebrew-dos
ctext-hebrew-mac
ctext-hebrew-unix

Modeline indicator: CText/Hbrw. A type iso2022 coding system with the ascii (G0) and hebrew-iso8859-8 (G1) character sets initially designated for Hebrew.

ctext
ctext-dos
ctext-mac
ctext-unix

Modeline indicator: CText. A type iso2022 8-bit coding system with the ascii (G0) and latin-iso8859-1 (G1) character sets initially designated. X11 Compound Text Encoding. Often mistakenly recognized instead of EUC encodings; the usual cause is an inappropriate setting of coding-priority-list.

escape-quoted

Modeline indicator: ESC/Quot. A type iso2022 8-bit coding system with the ascii (G0) and latin-iso8859-1 (G1) character sets initially designated and escape quoting. Unix EOL conversion (i.e., no conversion). It is used for .elc files.

euc-jp
euc-jp-dos
euc-jp-mac
euc-jp-unix

Modeline indicator: Ja/EUC. A type iso2022 8-bit coding system with ascii (G0), japanese-jisx0208 (G1), katakana-jisx0201 (G2), and japanese-jisx0212 (G3) initially designated. Japanese EUC (Extended Unix Code).
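As a rough picture of how the four designated sets share one byte stream, the sketch below classifies well-formed EUC-JP bytes by lead byte. It is a Python illustration under the usual EUC-JP layout (SS2 = 0x8E selects G2, SS3 = 0x8F selects G3); the function name classify_euc_jp is hypothetical and nothing here is taken from the SXEmacs decoder.

```python
SS2 = 0x8E  # single shift 2: the next byte comes from G2 (katakana-jisx0201)
SS3 = 0x8F  # single shift 3: the next two bytes come from G3 (japanese-jisx0212)


def classify_euc_jp(data):
    """Return (charset, byte-length) pairs for well-formed EUC-JP bytes."""
    result = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            charset, width = "ascii", 1
        elif lead == SS2:
            charset, width = "katakana-jisx0201", 2
        elif lead == SS3:
            charset, width = "japanese-jisx0212", 3
        elif 0xA1 <= lead <= 0xFE:
            charset, width = "japanese-jisx0208", 2
        else:
            raise ValueError("unexpected lead byte 0x%02X" % lead)
        result.append((charset, width))
        i += width
    return result


# Latin A, hiragana A, half-width katakana A, encoded with Python's codec.
sample = "A\u3042\uff71".encode("euc_jp")
```

Calling classify_euc_jp(sample) reports one character each from ascii, japanese-jisx0208, and katakana-jisx0201, in that order.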

euc-kr
euc-kr-dos
euc-kr-mac
euc-kr-unix

Modeline indicator: ko/EUC. A type iso2022 8-bit coding system with ascii (G0) and korean-ksc5601 (G1) initially designated. Korean EUC (Extended Unix Code).

hz-gb-2312

Modeline indicator: Zh-GB/Hz. A type no-conversion coding system with Unix EOL convention (i.e., no conversion) using post-read-decode and pre-write-encode functions to translate the Hz/ZW coding system used for Chinese.
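For readers unfamiliar with HZ, the sketch below shows what the external form looks like, using Python's built-in "hz" codec rather than the SXEmacs decode and encode functions. It covers only HZ, not ZW, and is meant as an illustration of the 7-bit "~{" and "~}" shift markers, not of the editor's implementation.

```python
# ASCII text, two GB 2312 characters, then ASCII again.
text = "abc \u4e2d\u6587 xyz"
encoded = text.encode("hz")  # Python's built-in HZ codec

is_seven_bit = all(b < 0x80 for b in encoded)  # HZ stays within 7 bits
has_shift_in = b"~{" in encoded                # enters two-byte GB mode
has_shift_out = b"~}" in encoded               # returns to ASCII mode
decoded = encoded.decode("hz")                 # should recover the input
```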

iso-2022-7bit
iso-2022-7bit-unix
iso-2022-7bit-dos
iso-2022-7bit-mac
iso-2022-7

Modeline indicator: ISO7. A type iso2022 7-bit coding system with ascii (G0) initially designated. Other character sets must be explicitly designated to be used.

iso-2022-7bit-ss2
iso-2022-7bit-ss2-dos
iso-2022-7bit-ss2-mac
iso-2022-7bit-ss2-unix

Modeline indicator: ISO7/SS. A type iso2022 7-bit coding system with ascii (G0) initially designated. Other character sets must be explicitly designated to be used. SS2 is used to invoke a 96-charset, one character at a time.

iso-2022-8
iso-2022-8-dos
iso-2022-8-mac
iso-2022-8-unix

Modeline indicator: ISO8. A type iso2022 8-bit coding system with ascii (G0) and latin-iso8859-1 (G1) initially designated. Other character sets must be explicitly designated to be used. No single-shift or locking-shift.

iso-2022-8bit-ss2
iso-2022-8bit-ss2-dos
iso-2022-8bit-ss2-mac
iso-2022-8bit-ss2-unix

Modeline indicator: ISO8/SS. A type iso2022 8-bit coding system with ascii (G0) and latin-iso8859-1 (G1) initially designated. Other character sets must be explicitly designated to be used. SS2 is used to invoke a 96-charset, one character at a time.

iso-2022-int-1
iso-2022-int-1-dos
iso-2022-int-1-mac
iso-2022-int-1-unix

Modeline indicator: INT-1. A type iso2022 7-bit coding system with ascii (G0) and korean-ksc5601 (G1) initially designated. ISO-2022-INT-1.

iso-2022-jp-1978-irv
iso-2022-jp-1978-irv-dos
iso-2022-jp-1978-irv-mac
iso-2022-jp-1978-irv-unix

Modeline indicator: Ja-78/7bit. A type iso2022 7-bit coding system. For compatibility with old Japanese terminals; if you need to know, look at the source.

iso-2022-jp
iso-2022-jp-2 (ISO7/SS)
iso-2022-jp-dos
iso-2022-jp-mac
iso-2022-jp-unix
iso-2022-jp-2-dos
iso-2022-jp-2-mac
iso-2022-jp-2-unix

Modeline indicator: MULE/7bit. A type iso2022 7-bit coding system with ascii (G0) initially designated, and complex specifications to ensure backward compatibility with old Japanese systems. Used for communication with mail and news in Japan. The "-2" versions also use SS2 to invoke a 96-charset one character at a time.
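The "explicit designation" that 7-bit ISO 2022 coding systems rely on is visible in the byte stream as escape sequences. A minimal Python sketch, using the standard "iso2022_jp" codec as a stand-in (an assumption for illustration; the SXEmacs coding system has additional compatibility behavior not modeled here):

```python
ESC = b"\x1b"
TO_JISX0208 = ESC + b"$B"  # designate JIS X 0208-1983 to G0
TO_ASCII = ESC + b"(B"     # designate ASCII to G0 again

# Latin A, hiragana A, Latin B.
text = "A\u3042B"
encoded = text.encode("iso2022_jp")

is_seven_bit = all(b < 0x80 for b in encoded)
# The Japanese character is bracketed by the two designations.
designates_then_restores = (
    encoded.index(TO_JISX0208) < encoded.index(TO_ASCII)
)
decoded = encoded.decode("iso2022_jp")
```

Because ascii is the only set designated initially, the encoder has to announce JIS X 0208 before the hiragana and switch back to ASCII before the trailing "B".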

iso-2022-kr

Modeline indicator: Ko/7bit. A type iso2022 7-bit coding system with ascii (G0) and korean-ksc5601 (G1) initially designated. Used for e-mail in Korea.

iso-2022-lock
iso-2022-lock-dos
iso-2022-lock-mac
iso-2022-lock-unix

Modeline indicator: ISO7/Lock. A type iso2022 7-bit coding system with ascii (G0) initially designated, using Locking-Shift to invoke a 96-charset.

iso-8859-1
iso-8859-1-dos
iso-8859-1-mac
iso-8859-1-unix

Due to an implementation quirk, this is not a type iso2022 coding system, but rather an alias for the raw-text coding system.

iso-8859-2
iso-8859-2-dos
iso-8859-2-mac
iso-8859-2-unix

Modeline indicator: MIME/Ltn-2. A type iso2022 coding system with ascii (G0) and latin-iso8859-2 (G1) initially invoked.

iso-8859-3
iso-8859-3-dos
iso-8859-3-mac
iso-8859-3-unix

Modeline indicator: MIME/Ltn-3. A type iso2022 coding system with ascii (G0) and latin-iso8859-3 (G1) initially invoked.

iso-8859-4
iso-8859-4-dos
iso-8859-4-mac
iso-8859-4-unix

Modeline indicator: MIME/Ltn-4. A type iso2022 coding system with ascii (G0) and latin-iso8859-4 (G1) initially invoked.

iso-8859-5
iso-8859-5-dos
iso-8859-5-mac
iso-8859-5-unix

Modeline indicator: ISO8/Cyr. A type iso2022 coding system with ascii (G0) and cyrillic-iso8859-5 (G1) initially invoked.

iso-8859-7
iso-8859-7-dos
iso-8859-7-mac
iso-8859-7-unix

Modeline indicator: Grk. A type iso2022 coding system with ascii (G0) and greek-iso8859-7 (G1) initially invoked.

iso-8859-8
iso-8859-8-dos
iso-8859-8-mac
iso-8859-8-unix

Modeline indicator: MIME/Hbrw. A type iso2022 coding system with ascii (G0) and hebrew-iso8859-8 (G1) initially invoked.

iso-8859-9
iso-8859-9-dos
iso-8859-9-mac
iso-8859-9-unix

Modeline indicator: MIME/Ltn-5. A type iso2022 coding system with ascii (G0) and latin-iso8859-9 (G1) initially invoked.

koi8-r
koi8-r-dos
koi8-r-mac
koi8-r-unix

Modeline indicator: KOI8. A type ccl coding system used for KOI8-R, an encoding of the Cyrillic alphabet.

shift_jis
shift_jis-dos
shift_jis-mac
shift_jis-unix

Modeline indicator: Ja/SJIS. A type shift-jis coding system implementing the Shift-JIS encoding for Japanese. The underscore matches the name of the MIME charset for this encoding.

tis-620
tis-620-dos
tis-620-mac
tis-620-unix

Modeline indicator: TIS620. A type ccl coding system for Thai. The external encoding is defined by TIS620; the internal encoding, called thai-xtis, is peculiar to MULE.

viqr

Modeline indicator: VIQR. A type no-conversion coding system with Unix EOL convention (i.e., no conversion) using post-read-decode and pre-write-encode functions to translate the VIQR coding system for Vietnamese.

viscii
viscii-dos
viscii-mac
viscii-unix

Modeline indicator: VISCII. A type ccl coding system used for VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is given priority by SXEmacs.

vscii
vscii-dos
vscii-mac
vscii-unix

Modeline indicator: VSCII. A type ccl coding system used for VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is given priority by SXEmacs. Use (prefer-coding-system 'vietnamese-vscii) to give priority to VSCII.

