

66.2.1 Charset Properties

Charsets have the following properties:

name

A symbol naming the charset. Every charset must have a different name; this allows a charset to be referred to using its name rather than the actual charset object.

doc-string

A documentation string describing the charset.

registry

A regular expression matching the font registry field for this character set. For example, both the ascii and latin-iso8859-1 charsets use the registry "ISO8859-1". This field is used to choose an appropriate font when the user gives a general font specification such as ‘-*-courier-medium-r-*-140-*’, i.e. a 14-point upright medium-weight Courier font.

dimension

Number of position codes used to index a character in the character set. SXEmacs/MULE can only handle character sets of dimension 1 or 2. This property defaults to 1.

chars

Number of characters in each dimension. In SXEmacs/MULE, the only allowed values are 94 or 96. (There are a couple of pre-defined character sets, such as ASCII, that do not follow this, but you cannot define new ones like this.) Defaults to 94. Note that if the dimension is 2, the character set thus described is 94x94 or 96x96.
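The relationship between dimension and chars can be sketched as a small calculation. This is written in Python purely to illustrate the arithmetic; `charset_size` is a hypothetical helper, not an SXEmacs function, and its defaults mirror the defaults stated above.

```python
def charset_size(dimension=1, chars=94):
    """Return how many characters a charset with these properties can hold.

    Hypothetical helper: it only encodes the constraints stated in the text
    (dimension is 1 or 2, chars is 94 or 96).
    """
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    if chars not in (94, 96):
        raise ValueError("chars must be 94 or 96")
    # One position code per dimension, each ranging over `chars` values.
    return chars ** dimension


print(charset_size())        # 94   (defaults: dimension 1, chars 94)
print(charset_size(2, 94))   # 8836 (a 94x94 charset)
print(charset_size(2, 96))   # 9216 (a 96x96 charset)
```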

columns

Number of columns used to display a character in this charset. Only used in TTY mode. (Under X, the actual width of a character can be derived from the font used to display the characters.) If unspecified, defaults to the dimension. (This is almost always the correct value, because character sets with dimension 2 are usually ideograph character sets, which need two columns to display the intricate ideographs.)

direction

A symbol, either l2r (left-to-right) or r2l (right-to-left). Defaults to l2r. This specifies the direction in which the text should be displayed: left-to-right for most charsets, right-to-left for Hebrew and Arabic. (Right-to-left display is not currently implemented.)

final

Final byte of the standard ISO 2022 escape sequence designating this charset. Must be supplied. Each combination of (dimension, chars) defines a separate namespace for final bytes, and each charset within a particular namespace must have a different final byte. Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final bytes in the range 0x30 - 0x3F are reserved for user-defined (not official) character sets. For more information on ISO 2022, see Coding Systems.
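The final-byte rules above can be sketched as follows. This is an illustration in Python, not SXEmacs code: `final_byte_class`, `register`, and the `table` dictionary are hypothetical names, and the charset names are used only as labels. The sketch checks the range stated above and treats each (dimension, chars) pair as its own namespace.

```python
def final_byte_class(final, dimension=1):
    """Check a final byte against the ranges given in the text.

    Returns "user-defined" for 0x30-0x3F and "official" otherwise.
    """
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    upper = 0x7E if dimension == 1 else 0x5F
    if not 0x30 <= final <= upper:
        raise ValueError(f"final byte {final:#x} out of range for dimension {dimension}")
    return "user-defined" if final <= 0x3F else "official"


def register(table, name, dimension, chars, final):
    """Record a charset; a final byte must be unique within (dimension, chars)."""
    final_byte_class(final, dimension)
    key = (dimension, chars, final)
    if key in table:
        raise ValueError(f"final byte already used by {table[key]}")
    table[key] = name


table = {}
register(table, "ascii", 1, 94, ord("B"))
# Same final byte, but a different (dimension, chars) namespace, so no clash.
register(table, "japanese-jisx0208", 2, 94, ord("B"))
register(table, "latin-iso8859-1", 1, 96, ord("A"))
```

Registering a second charset with dimension 1, chars 94, and final byte `B` would raise an error in this sketch, because that slot is already taken by `ascii`.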

graphic

0 (use left half of font on output) or 1 (use right half of font on output). Defaults to 0. This specifies how to convert the position codes that index a character in a character set into an index into the font used to display the character set. With graphic set to 0, position codes 33 through 126 map to font indices 33 through 126; with it set to 1, position codes 33 through 126 map to font indices 161 through 254 (i.e. the same number but with the high bit set). For example, for a font whose registry is ISO8859-1, the left half of the font (octets 0x20 - 0x7F) is the ascii charset, while the right half (octets 0xA0 - 0xFF) is the latin-iso8859-1 charset.
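The effect of graphic on the font index amounts to setting or leaving the high bit of each position code. A minimal sketch in Python, for illustration only (`font_index` is a hypothetical helper, not part of SXEmacs):

```python
def font_index(position_code, graphic=0):
    """Map a position code to a font index according to `graphic`.

    graphic 0 uses the code unchanged (left half of the font);
    graphic 1 sets the high bit (right half of the font).
    """
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    return (position_code | 0x80) if graphic == 1 else position_code


print(font_index(33, 0), font_index(126, 0))   # 33 126
print(font_index(33, 1), font_index(126, 1))   # 161 254
print(hex(font_index(0x69, 1)))                # 0xe9
```

The last line shows position code 0x69 landing on octet 0xE9, which lies in the right half of an ISO8859-1 font.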

ccl-program

A compiled CCL program used to convert a character in this charset into an index into the font. This is in addition to the graphic property. If a CCL program is defined, the position codes of a character will first be processed according to graphic and then passed through the CCL program, with the resulting values used to index the font.

This is used, for example, in the Big5 character set (used in Taiwan). This character set is not ISO-2022-compliant, and its size (94x157) does not fit within the maximum 96x96 size of ISO-2022-compliant character sets. As a result, SXEmacs/MULE splits it (in a rather complex fashion, so as to group the most commonly used characters together) into two charset objects (big5-1 and big5-2), each of size 94x94, and each charset object uses a CCL program to convert the modified position codes back into standard Big5 indices to retrieve a character from a Big5 font.
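The size argument can be checked with a little arithmetic, and the idea of splitting one large table into two 94x94 charsets can be illustrated with a deliberately simple linear split. The sketch below is in Python and is an assumption throughout: it is not the mapping SXEmacs/MULE actually uses (which, as noted, is more complex and groups commonly used characters together), and `split`, `join`, and the zero-based `row`/`col` indices are hypothetical.

```python
BIG5_ROWS, BIG5_COLS = 94, 157   # size of Big5 as stated above
SUB = 94 * 94                    # capacity of one 94x94 charset

# Big5 is too large for a single 96x96 charset, but fits in two 94x94 ones.
assert BIG5_ROWS * BIG5_COLS > 96 * 96
assert BIG5_ROWS * BIG5_COLS <= 2 * SUB


def split(row, col):
    """Map a zero-based Big5 (row, col) to (charset number, code 1, code 2).

    Hypothetical linear split: charset number 0 or 1 stands for the first or
    second 94x94 charset; position codes run from 33 through 126.
    """
    if not (0 <= row < BIG5_ROWS and 0 <= col < BIG5_COLS):
        raise ValueError("index outside the 94x157 table")
    which, rem = divmod(row * BIG5_COLS + col, SUB)
    return which, rem // 94 + 33, rem % 94 + 33


def join(which, c1, c2):
    """Inverse of split: recover the zero-based Big5 (row, col)."""
    linear = which * SUB + (c1 - 33) * 94 + (c2 - 33)
    return divmod(linear, BIG5_COLS)


print(split(0, 0))            # (0, 33, 33)
print(join(*split(93, 156)))  # (93, 156)
```

In SXEmacs the reverse direction, from modified position codes back to Big5 font indices, is the job of the charset's CCL program; `join` only stands in for that step in this sketch.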

Most of the above properties can only be set when the charset is initialized, and cannot be changed later. See Charset Property Functions.

