When an external function, such as a C library function, returns a
char pointer, you should almost never treat it as Bufbyte.
This is because these returned strings may contain 8-bit characters which
can be misinterpreted by SXEmacs and cause a crash. Likewise, when
exporting a piece of internal text to the outside world, you should
always convert it to an appropriate external encoding, lest the internal
stuff (such as the infamous \201 characters) leak out.
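As a rough illustration of why raw external bytes are risky, the sketch below scans a buffer for bytes with the high bit set. The helper name toy_has_8bit_bytes is hypothetical and is not part of SXEmacs. It assumes only that such bytes are the ones that cannot be passed through unconverted; it performs no conversion itself.

```c
#include <stddef.h>

/* Hypothetical helper, not an SXEmacs function.
   Returns 1 if any byte in (ptr, len) has its high bit set, else 0.
   Such data cannot safely be treated as internal text without
   going through a conversion. */
static int toy_has_8bit_bytes(const char *ptr, size_t len)
{
    /* Inspect the bytes as unsigned so the test does not depend on
       whether plain char is signed on this platform. */
    const unsigned char *p = (const unsigned char *) ptr;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i] & 0x80)
            return 1;
    }
    return 0;
}
```

A caller could use a check like this to decide whether a string from a C library needs more than a plain copy, though in practice the conversion macros described in this section should be used regardless.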
The interface for converting between the internal and external
representations of text is the set of conversion macros defined in
buffer.h. There used to be a fixed set of external formats
supported by these macros, but now any coding system can be used with
them. The coding system alias mechanism is used to create the
following logical coding systems, which replace the fixed external
formats. The (dontusethis-set-symbol-value-handler) mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the dontusethis- prefix).
Qbinary
This is the simplest format and is what we use in the absence of a more
appropriate format. This converts according to the binary coding system.
Qfile_name
Format used for filenames. This is user-definable via either the
file-name-coding-system or pathname-coding-system (now obsolete) variables.
Qnative
Format used for the external Unix environment: argv[], stuff
from getenv(), stuff from the /etc/passwd file, etc.
Currently this is the same as Qfile_name. The two should be
distinguished for clarity and possible future separation.
Qctext
Compound-text format. This is the standard X11 format used for data stored in properties, selections, and the like. This is an 8-bit no-lock-shift ISO2022 coding system. This is a real coding system, unlike Qfile_name, which is user-definable.
There are two fundamental macros to convert between external and internal format.
TO_INTERNAL_FORMAT converts external data to internal format, and
TO_EXTERNAL_FORMAT converts the other way around. The arguments
each of these receives are a source type, a source, a sink type, a sink,
and a coding system (or a symbol naming a coding system).
A typical call looks like
TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
which means that the contents of the lisp string str are written
to a malloc'ed memory area which will be pointed to by ptr once
the conversion completes. The conversion will be done using the
file-name coding system, which will be controlled by the user
indirectly by setting or binding the variable file-name-coding-system.
Some sources and sinks require two C variables to specify. We use some preprocessor magic to allow different source and sink types, and even different numbers of arguments to specify different types of sources and sinks.
So we can have a call that looks like
TO_INTERNAL_FORMAT (DATA, (ptr, len), MALLOC, (ptr, len), coding_system);
The parenthesized argument pairs are required to make the preprocessor magic work.
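To see why the parentheses matter, here is a toy model of the technique, not the actual SXEmacs macros. All TOY_ and toy_ names are hypothetical. It shows how token pasting can select a per-type helper macro, and how a parenthesized pair travels through the outer macro as a single argument because the comma inside the parentheses is protected.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical and are NOT the
   macros defined in buffer.h. */
typedef struct {
    const char *ptr;
    size_t len;
} toy_source;

static toy_source toy_make_source(const char *ptr, size_t len)
{
    toy_source s;
    s.ptr = ptr;
    s.len = len;
    return s;
}

/* One helper macro per source type.
   DATA receives a parenthesized (ptr, len) pair, which becomes the
   argument list of toy_make_source when placed after its name.
   C_STRING receives a single pointer and computes the length,
   counting the terminating NUL as the text describes. */
#define TOY_SOURCE_DATA(pair)   toy_make_source pair
#define TOY_SOURCE_C_STRING(p)  toy_make_source((p), strlen(p) + 1)

/* The front end pastes the type name onto a common prefix to pick
   the helper. Without the parentheses, "ptr, len" would be parsed
   as two separate macro arguments and the call would not match. */
#define TOY_SOURCE(type, arg)   TOY_SOURCE_##type(arg)
```

With these definitions, TOY_SOURCE (DATA, (buf, 2)) expands to toy_make_source (buf, 2), while TOY_SOURCE (C_STRING, buf) expands to toy_make_source ((buf), strlen (buf) + 1).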
Here are the different source and sink types:
DATA, (ptr, len),
input data is a fixed buffer of size len at address ptr
ALLOCA, (ptr, len),
output data is placed in an alloca()ed buffer of size len pointed to by ptr
MALLOC, (ptr, len),
output data is in a malloc()ed buffer of size len pointed to by ptr
C_STRING_ALLOCA, ptr,
equivalent to ALLOCA (ptr, len_ignored) on output
C_STRING_MALLOC, ptr,
equivalent to MALLOC (ptr, len_ignored) on output
C_STRING, ptr,
equivalent to DATA, (ptr, strlen (ptr) + 1) on input
LISP_STRING, string,
input or output is a Lisp_Object of type string
LISP_BUFFER, buffer,
output is written to (point) in lisp buffer buffer
LISP_LSTREAM, lstream,
input or output is a Lisp_Object of type lstream
LISP_OPAQUE, object,
input or output is a Lisp_Object of type opaque
Often, the data is being converted to a '\0'-byte-terminated string,
which is the format required by many external system C APIs. For these
purposes, a source type of C_STRING or a sink type of
C_STRING_ALLOCA or C_STRING_MALLOC is appropriate.
Otherwise, we should try to keep SXEmacs '\0'-byte-clean, which means
using (ptr, len) pairs.
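The difference between the two conventions can be sketched in plain C. The helper names below are hypothetical and are not SXEmacs functions. The point is only that a '\0'-terminated view stops at the first zero byte, while a (ptr, len) pair still covers everything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the number of bytes a consumer sees when it
   treats the data as a '\0'-terminated C string. */
static size_t toy_c_string_length(const char *ptr)
{
    return strlen(ptr);
}

/* Hypothetical helper: count the zero bytes inside an explicit
   (ptr, len) range. A nonzero result means the data cannot be
   represented faithfully as a C string. */
static size_t toy_count_nul_bytes(const char *ptr, size_t len)
{
    size_t i, count = 0;

    for (i = 0; i < len; i++) {
        if (ptr[i] == '\0')
            count++;
    }
    return count;
}
```

For the five bytes 'a', 'b', '\0', 'c', 'd', the C string view reports a length of 2, whereas the (ptr, len) view with len equal to 5 still reaches the trailing "cd" and finds one embedded zero byte.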
The sinks to be specified must be lvalues, unless they are the lisp
object types LISP_LSTREAM or LISP_BUFFER.
For the sink types ALLOCA and C_STRING_ALLOCA, the
resulting text is stored in a stack-allocated buffer, which is
automatically freed on returning from the function. However, the sink
types MALLOC and C_STRING_MALLOC return xmalloc()ed
memory. The caller is responsible for freeing this memory using
xfree().
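The ownership rule for the MALLOC-style sinks can be sketched with standard C allocation. This is a stand-in under stated assumptions: toy_to_external_malloc and toy_use_and_free are hypothetical names, the "conversion" is a plain copy, and malloc() and free() take the place of xmalloc() and xfree().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a MALLOC-style sink. It copies the input
   unchanged (no real coding-system conversion happens here), appends
   a terminating '\0', and returns heap memory that the caller owns.
   Returns NULL if allocation fails. */
static char *toy_to_external_malloc(const char *src, size_t len,
                                    size_t *out_len)
{
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, src, len);
    out[len] = '\0';
    *out_len = len;
    return out;
}

/* Caller side: use the result, then release it. Returns 1 if the
   copy matched the input, 0 if it did not, and -1 on allocation
   failure. */
static int toy_use_and_free(const char *src, size_t len)
{
    size_t n = 0;
    char *ext = toy_to_external_malloc(src, len, &n);
    int ok;

    if (ext == NULL)
        return -1;
    ok = (n == len && memcmp(ext, src, len) == 0);
    free(ext);  /* with the real macros this would be xfree() */
    return ok;
}
```

A stack-allocated sink has no matching release step, which is convenient but means the result must not be used after the enclosing function returns.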
Note that it doesn't make sense for LISP_STRING to be a source
for TO_INTERNAL_FORMAT or a sink for TO_EXTERNAL_FORMAT.
You'll get an assertion failure if you try.