SXEmacs User’s Manual: Recognize Coding

17.7 Recognizing Coding Systems

Most of the time, SXEmacs can recognize which coding system to use for any given file, once you have specified your preferences.

Some coding systems can be recognized or distinguished by which byte sequences appear in the data. However, there are coding systems that cannot be distinguished, not even potentially. For example, there is no way to distinguish between Latin-1 and Latin-2; they use the same byte values with different meanings.

SXEmacs handles this situation by means of a priority list of coding systems. Whenever SXEmacs reads a file, if you do not specify the coding system to use, SXEmacs checks the data against each coding system, starting with the first in priority and working down the list, until it finds a coding system that fits the data. Then it converts the file contents assuming that they are represented in this coding system.
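
As a rough illustration of this first-match rule, the search can be sketched in Lisp. This is not how SXEmacs implements detection internally; the function my-first-fitting-coding-system and its FITS-P predicate argument are hypothetical names used only for this sketch:

(defun my-first-fitting-coding-system (data priority-list fits-p)
  "Return the first element of PRIORITY-LIST for which FITS-P accepts DATA.
FITS-P is called with a coding system and DATA.  Return nil if nothing fits."
  (let ((rest priority-list)
        (found nil))
    ;; Walk down the list in priority order and stop at the first match.
    (while (and rest (not found))
      (if (funcall fits-p (car rest) data)
          (setq found (car rest)))
      (setq rest (cdr rest)))
    found))

If the data could equally well be Latin-1 or Latin-2, only the position in the priority list decides the outcome. The following call returns iso-8859-2, because that is the earlier of the two entries that the toy predicate accepts:

(my-first-fitting-coding-system
 "some data"
 '(iso-2022-7bit iso-8859-2 iso-8859-1)
 (lambda (coding-system data)
   ;; Toy predicate: pretend that only these two systems fit the data.
   (memq coding-system '(iso-8859-1 iso-8859-2))))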

The priority list of coding systems depends on the selected language environment (see Language Environments). For example, if you use French, you probably want SXEmacs to prefer Latin-1 to Latin-2; if you use Czech, you probably want Latin-2 to be preferred. This is one of the reasons to specify a language environment.

However, you can alter the priority list in detail with the command M-x prefer-coding-system. This command reads the name of a coding system from the minibuffer, and adds it to the front of the priority list, so that it is preferred to all others. If you use this command several times, each use adds one element to the front of the priority list.
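
To make the same preference apply in every session, the command can also be called from Lisp, for example in an init file. The sketch below assumes that prefer-coding-system accepts a coding system symbol when called non-interactively, and that a coding system named iso-8859-2 is defined in your build:

;; Each call puts its argument at the front of the priority list,
;; so the coding system named last is the one tried first.
(prefer-coding-system 'iso-8859-2)
(prefer-coding-system 'iso-8859-1)

Following the description above, iso-8859-1 comes first in the priority list after both calls.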

Sometimes a file name indicates which coding system to use for the file. The variable file-coding-system-alist specifies this correspondence. There is a special function modify-coding-system-alist for adding elements to this list. For example, to read and write all ‘.txt’ files using the coding system china-iso-8bit, you can execute this Lisp expression:

(modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)

The first argument should be the symbol file, the second argument should be a regular expression that is matched against file names to determine which files this applies to, and the third argument says which coding system to use for these files.
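
The regular expression can cover several file name patterns at once. As a sketch, the following expression applies iso-8859-1 (a coding system named later in this section) to files ending in ‘.txt’ or ‘.text’; the choice of extensions is purely illustrative:

(modify-coding-system-alist 'file "\\.\\(txt\\|text\\)\\'" 'iso-8859-1)

Inside the Lisp string, each backslash of the regular expression is doubled. The final ‘\\'’ matches only at the end of the file name, so a name such as ‘notes.txt.bak’ is not affected.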

You can specify the coding system for a particular file using the ‘-*-…-*-’ construct at the beginning of a file, or a local variables list at the end (see File Variables). You do this by defining a value for the “variable” named coding. SXEmacs does not really have a variable coding; instead of setting a variable, it uses the specified coding system for the file. For example, ‘-*-mode: C; coding: iso-8859-1;-*-’ specifies use of the iso-8859-1 coding system, as well as C mode.
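
The local variables list form carries the same information at the end of the file. As a minimal sketch, assuming a Lisp source file (so that each line of the list is written as a Lisp comment), the last lines of the file could read:

;; Local Variables:
;; coding: iso-8859-1
;; End:

In a file written in another language, the same three lines would use that language's comment syntax instead.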

Once SXEmacs has chosen a coding system for a buffer, it stores that coding system in buffer-file-coding-system and uses that coding system, by default, for operations that write from this buffer into a file. This includes the commands save-buffer and write-region. If you want to write files from this buffer using a different coding system, you can specify a different coding system for the buffer using set-buffer-file-coding-system (see Specify Coding).
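
From Lisp, the same steps might look as follows. This is a sketch that assumes set-buffer-file-coding-system accepts a coding system symbol when called non-interactively, and that the expressions are evaluated while the buffer in question is current (for instance with M-:):

;; Show which coding system was chosen when the file was read.
(message "Coding system: %s" buffer-file-coding-system)

;; Select a different coding system for this buffer, then save.
;; save-buffer now writes the file using iso-8859-1.
(set-buffer-file-coding-system 'iso-8859-1)
(save-buffer)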