Charsets have the following properties:
name
A symbol naming the charset. Every charset must have a unique name, so that a charset can be referred to by its name rather than by the charset object itself.
doc-string
A documentation string describing the charset.
registry
A regular expression matching the font registry field for this character set. For example, both the ascii and latin-iso8859-1 charsets use the registry "ISO8859-1". This field is used to choose an appropriate font when the user gives a general font specification such as ‘-*-courier-medium-r-*-140-*’, i.e. a 14-point upright medium-weight Courier font.
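To illustrate how a registry regular expression can be tested against a font, the Python sketch below extracts the registry and encoding fields from a fully specified X Logical Font Description (XLFD) name and matches them against the pattern. The helper names are hypothetical, and anchoring the match at both ends (so that "ISO8859-1" does not also accept "iso8859-15") and ignoring case are assumptions of this sketch, not a description of SXEmacs's actual font-selection code.

```python
import re


def xlfd_registry(xlfd: str) -> str:
    """Return the CHARSET_REGISTRY-CHARSET_ENCODING tail of a full XLFD name."""
    fields = xlfd.split("-")
    # A fully specified XLFD name starts with "-" and has 14 fields,
    # so splitting on "-" yields 15 items, the first one empty.
    if len(fields) != 15 or fields[0] != "":
        raise ValueError(f"not a fully specified XLFD name: {xlfd!r}")
    return fields[13] + "-" + fields[14]


def font_matches_registry(xlfd: str, registry_regexp: str) -> bool:
    """True if the font's registry matches the charset's registry regexp.

    Assumption: the match is case-insensitive and anchored at both ends.
    """
    return re.fullmatch(registry_regexp, xlfd_registry(xlfd), re.IGNORECASE) is not None


courier = "-adobe-courier-medium-r-normal--14-140-75-75-m-90-iso8859-1"
print(font_matches_registry(courier, "ISO8859-1"))   # True
print(font_matches_registry(courier, "JISX0208.*"))  # False
```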
dimension
Number of position codes used to index a character in the character set. SXEmacs/MULE can only handle character sets of dimension 1 or 2. This property defaults to 1.
chars
Number of characters in each dimension. In SXEmacs/MULE, the only allowed values are 94 and 96. (There are a couple of pre-defined character sets, such as ASCII, that do not follow this, but you cannot define new ones like this.) Defaults to 94. Note that if the dimension is 2, the character set thus described is 94x94 or 96x96.
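Together, dimension and chars determine how many characters a charset can hold: chars raised to the power dimension. The Python helper below (a hypothetical name, not an SXEmacs function) checks the two values against the restrictions just described and computes that size. It ignores pre-defined exceptions such as ASCII.

```python
def charset_size(dimension: int = 1, chars: int = 94) -> int:
    """Number of code points in a charset with the given dimension and chars."""
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    if chars not in (94, 96):
        raise ValueError("chars must be 94 or 96")
    # Each of the DIMENSION position codes takes one of CHARS values.
    return chars ** dimension


for dim in (1, 2):
    for n in (94, 96):
        print(dim, n, charset_size(dim, n))
# 1 94 94
# 1 96 96
# 2 94 8836
# 2 96 9216
```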
columns
Number of columns used to display a character in this charset. Only used in TTY mode. (Under X, the actual width of a character can be derived from the font used to display the characters.) If unspecified, defaults to the dimension. (This is almost always the correct value, because character sets with dimension 2 are usually ideograph character sets, which need two columns to display the intricate ideographs.)
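On a TTY, the width of a run of text is then the sum of the columns values of the charsets its characters belong to. The minimal Python sketch below represents each character only by its charset's (dimension, columns) pair, with None standing for an unspecified columns property. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def effective_columns(dimension: int, columns: Optional[int] = None) -> int:
    """Columns one character of a charset occupies on a TTY.

    An unspecified columns property defaults to the charset's dimension.
    """
    return dimension if columns is None else columns


def tty_width(chars: Iterable[Tuple[int, Optional[int]]]) -> int:
    """Total TTY width of a run of characters, one (dimension, columns) pair each."""
    return sum(effective_columns(dim, cols) for dim, cols in chars)


# Two characters from dimension-1 charsets followed by one ideograph from a
# dimension-2 charset: 1 + 1 + 2 columns.
print(tty_width([(1, None), (1, None), (2, None)]))  # 4
```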
direction
A symbol, either l2r (left-to-right) or r2l (right-to-left). Defaults to l2r. This specifies the direction in which the text should be displayed; it is left-to-right for most charsets but right-to-left for Hebrew and Arabic. (Right-to-left display is not currently implemented.)
final
Final byte of the standard ISO 2022 escape sequence designating this charset. Must be supplied. Each combination of (dimension, chars) defines a separate namespace for final bytes, and each charset within a particular namespace must have a different final byte. Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final bytes in the range 0x30 - 0x3F are reserved for user-defined (not official) character sets. For more information on ISO 2022, see Coding Systems.
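The final-byte rules can be summarized as a small piece of bookkeeping. The Python sketch below is hypothetical and is not SXEmacs's implementation: it checks a final byte against the ISO 2022 ranges stated above and keeps a separate namespace for each (dimension, chars) pair. The example final bytes ('B' for ascii, 'A' for latin-iso8859-1, 'B' for japanese-jisx0208) are the standard ISO 2022 assignments, used here only to show that one byte may recur in different namespaces.

```python
def check_final(final: int, dimension: int) -> bool:
    """Check FINAL against the ISO 2022 range for DIMENSION.

    Returns True if FINAL lies in 0x30-0x3F, the range reserved for
    user-defined (unofficial) character sets.
    """
    upper = 0x7E if dimension == 1 else 0x5F
    if not 0x30 <= final <= upper:
        raise ValueError(f"final byte {final:#x} out of range for dimension {dimension}")
    return final <= 0x3F


class FinalByteTable:
    """Hypothetical bookkeeping: one final-byte namespace per (dimension, chars)."""

    def __init__(self) -> None:
        self._owner = {}  # (dimension, chars, final) -> charset name

    def register(self, name: str, dimension: int, chars: int, final: int) -> None:
        check_final(final, dimension)
        key = (dimension, chars, final)
        if key in self._owner:
            raise ValueError(f"final {chr(final)!r} already used by {self._owner[key]}")
        self._owner[key] = name


table = FinalByteTable()
table.register("ascii", 1, 94, ord("B"))
table.register("latin-iso8859-1", 1, 96, ord("A"))
table.register("japanese-jisx0208", 2, 94, ord("B"))  # same byte, other namespace
print(check_final(ord("0"), 1))  # True: user-defined range
```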
graphic
0 (use left half of font on output) or 1 (use right half of font on output). Defaults to 0. This specifies how to convert the position codes that index a character in a character set into an index into the font used to display the character set. With graphic set to 0, position codes 33 through 126 map to font indices 33 through 126; with it set to 1, position codes 33 through 126 map to font indices 161 through 254 (i.e. the same number but with the high bit set). For example, for a font whose registry is ISO8859-1, the left half of the font (octets 0x20 - 0x7F) is the ascii charset, while the right half (octets 0xA0 - 0xFF) is the latin-iso8859-1 charset.
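The mapping controlled by graphic amounts to optionally setting the high bit of each position code. The Python function below is a hypothetical model of that rule, not an SXEmacs primitive. It accepts position codes 0x20 through 0x7F so that 96-character sets are covered, and it assumes that a latin-iso8859-1 character's position code is its ISO 8859-1 octet with the high bit cleared (0x69 for 'é', whose octet is 0xE9).

```python
def font_index(position_code: int, graphic: int = 0) -> int:
    """Map a position code to a font index according to the graphic property.

    graphic 0 uses the left half of the font (index == position code);
    graphic 1 uses the right half (the same value with the high bit set).
    """
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    if not 0x20 <= position_code <= 0x7F:
        raise ValueError("position code out of range")
    return (position_code | 0x80) if graphic else position_code


# 'A' in ascii (graphic 0) and 'é' in latin-iso8859-1 (graphic 1).
print(font_index(ord("A"), 0))  # 65
print(font_index(0x69, 1))      # 233, i.e. 0xE9
```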
ccl-program
A compiled CCL program used to convert a character in this charset into an index into the font. This is in addition to the graphic property. If a CCL program is defined, the position codes of a character will first be processed according to graphic and then passed through the CCL program, with the resulting values used to index the font.
This is used, for example, in the Big5 character set (used in Taiwan). This character set is not ISO-2022-compliant, and its size (94x157) does not fit within the maximum 96x96 size of ISO-2022-compliant character sets. As a result, SXEmacs/MULE splits it (in a rather complex fashion, so as to group the most commonly used characters together) into two charset objects (big5-1 and big5-2), each of size 94x94, and each charset object uses a CCL program to convert the modified position codes back into standard Big5 indices to retrieve a character from a Big5 font.
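The arithmetic behind such a split can be modeled in a few lines. The Python sketch below folds each Big5 byte pair (lead byte 0xA1-0xFE, trail byte 0x40-0x7E or 0xA1-0xFE, giving the 94x157 grid mentioned above) into a linear index and re-folds it into 94x94 position codes; `charset_to_big5` is the inverse, which is the job the text assigns to the CCL program. The function names are hypothetical, and the split point at lead byte 0xC9 is an assumption of this sketch rather than something this manual states. The code only models the arithmetic and is not a CCL program.

```python
BIG5_SAME_ROW = 157  # trail bytes per lead byte: 0x40-0x7E (63) + 0xA1-0xFE (94)
SPLIT_LEAD = 0xC9    # assumption: first lead byte assigned to big5-2


def big5_to_charset(b1: int, b2: int):
    """Split a Big5 byte pair into (charset name, c1, c2) with 94x94 position codes."""
    if not (0xA1 <= b1 <= 0xFE and (0x40 <= b2 <= 0x7E or 0xA1 <= b2 <= 0xFE)):
        raise ValueError("not a Big5 double-byte code")
    # Linear index of the code within the 94x157 Big5 grid.
    index = (b1 - 0xA1) * BIG5_SAME_ROW + (b2 - (0x40 if b2 < 0x7F else 0x62))
    charset = "big5-1"
    if b1 >= SPLIT_LEAD:
        charset = "big5-2"
        index -= (SPLIT_LEAD - 0xA1) * BIG5_SAME_ROW
    # Re-fold the linear index into rows of 94 position codes (0x21-0x7E).
    return charset, index // 94 + 0x21, index % 94 + 0x21


def charset_to_big5(charset: str, c1: int, c2: int):
    """Inverse of big5_to_charset: recover the original Big5 byte pair."""
    index = (c1 - 0x21) * 94 + (c2 - 0x21)
    if charset == "big5-2":
        index += (SPLIT_LEAD - 0xA1) * BIG5_SAME_ROW
    b1 = index // BIG5_SAME_ROW + 0xA1
    b2 = index % BIG5_SAME_ROW
    b2 += 0x40 if b2 < 0x3F else 0x62
    return b1, b2


print(big5_to_charset(0xA4, 0x40))        # ('big5-1', 38, 34)
print(charset_to_big5("big5-1", 38, 34))  # (164, 64), i.e. 0xA4 0x40
```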
Most of the above properties can only be set when the charset is initialized, and cannot be changed later. See Charset Property Functions.
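As a rough model of these constraints, the Python sketch below represents a charset as an immutable record whose defaults follow the property descriptions above, and keeps a table of charsets keyed by their unique names. Everything here (`Charset`, `define_charset`, `charset_by_name`) is hypothetical. For simplicity it freezes every property, which is stricter than the text above, where only most properties are fixed at initialization.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Charset:
    """Hypothetical immutable model of a charset and its properties."""
    name: str
    final: int                     # no default: must be supplied
    doc_string: str = ""
    registry: str = ""
    dimension: int = 1
    chars: int = 94
    columns: Optional[int] = None  # None means "same as dimension"
    direction: str = "l2r"
    graphic: int = 0

    def __post_init__(self) -> None:
        if self.columns is None:
            # A frozen dataclass has to bypass its own __setattr__ to fill in
            # a derived default during initialization.
            object.__setattr__(self, "columns", self.dimension)


_charsets: Dict[str, Charset] = {}


def define_charset(charset: Charset) -> Charset:
    """Register CHARSET under its name; names must be unique."""
    if charset.name in _charsets:
        raise ValueError(f"charset {charset.name!r} is already defined")
    _charsets[charset.name] = charset
    return charset


def charset_by_name(name: str) -> Charset:
    """Refer to a charset by its name rather than by the object itself."""
    return _charsets[name]


latin1 = define_charset(Charset("latin-iso8859-1", final=ord("A"),
                                registry="ISO8859-1", chars=96, graphic=1))
print(charset_by_name("latin-iso8859-1").columns)  # 1
try:
    latin1.graphic = 0  # properties are fixed once the charset exists
except FrozenInstanceError:
    print("graphic is read-only")
```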