******************************************************
ACTIVATING CHARACTERS FOR INTERNATIONAL USE
******************************************************
by Laurent Siebenmann

Foreword

The following note was written in a moment of enthusiasm in June 1990, when it appeared politically possible to get a quick update of AmSTeX and LamSTeX that would alleviate the difficulties obstructing the activation of ;:! and ? for French-language users. These hopes were disappointed, but the problem has not gone away, and I hope that by publishing in TeXMag I will stir up some reactions that will, in one way or another, catalyse a solution. TeXhax would be a good medium for discussion; it would be particularly helpful to hear about the needs of other languages and of other formats.

Abstract

Over the past six months, in correspondence with Mike Spivak and Bernard Gaulle, I have been trying to sort out the problems posed by European typography for punctuation and accents under AmSTeX and LamSTeX (the two formats created by Spivak). This has involved a multiplicity of problems caused by changing the category of characters from 12 (=other) to 13 (=active) and back. Through the hurly-burly of individual problems I now perceive a reasonable partition of responsibilities between format designers (such as Spivak) and national user groups (such as GUTenberg, the French group presided over by Gaulle). Timely coordination among all format designers could greatly simplify the elaboration of national style files, by providing a few standard low-level macros that facilitate category change.

§1. Problems posed by activation.

How do the problems with active characters arise? We can illustrate simply by focusing on the semicolon; the story for the three other marks :!? (and perhaps some others) is quite similar. The semicolon in French prose typography requires more space before it than in English. (How much and what sort is a matter for French typographers to decide.)
TeX has a well-known mechanism that allows this: one assigns the semicolon category 13 (=active), for example by the command \catcode`\;=13. The active semicolon has the `intelligent' behavior of a TeX macro, whereas with the original category 12 (=other) it is a `dumb' character. Thus the French typographer can issue a command \def;{...} to modify the behavior of the semicolon. On leaving French for another language, one can either change the macro or revert to category 12 (=other).

Remark: There are two other possible solutions that should be mentioned.

a) Since the early days of TeX, many French typists have been trained to type a tilde before the semicolon in prose, since that provides an unbreakable space (under essentially all formats). Now, the tilde is an active character under essentially all formats, and its expansion can be altered to provide exactly the desired space when the following character is a semicolon. This solution is typographically sound and does not require a category change for any character at all; there are TeXperts who would consider the matter settled! However, this solution is less convenient because the tilde is *required*; typists who encounter simpler typing of punctuation (without a tilde, as in English, or with a space instead of a tilde) are disconcerted. In everyday matters such as punctuation, TeX owes the typist the most ergonomic solution; the activation approach recommended above would normally behave well whether or not the typist explicitly indicates space before the semicolon.

b) The generalised kerns and ligatures of TeX 3.0 may offer a solution, since these concepts are very powerful. However, one wants a portable solution, so I would not recommend using a special collection of virtual fonts for the job. Further, if one wants some stretch in the space preceding the semicolon, I fail to see how these new features will help. (The stretch is definitely there for the colon as used in Le Monde.)
Nevertheless, this approach should be kept in mind.

In the beginning there was Plain TeX. Under Plain, the activation approach works in a perfectly straightforward way, and there is not the slightest need to alter the situation. Under AmSTeX and LamSTeX, as they officially stand today, changing the category of the semicolon causes problems. I always presume that a standard (unadulterated) format is used, built upon standard Plain TeX; a criterion of portability for .tex files dictates this (see §3 below). My contention will ultimately be that these and other formats should be coherently revised to be essentially as flexible as Plain TeX (see §2).

There are basically two problems:

1) The semicolon appears explicitly in many macro expansions of AmSTeX and LamSTeX, and there it is the semicolon of category 12. Recall that TeX permanently attaches a category code among 0,...,15 to each character as it is being read in (see The TeXbook, Chap. 8). Some of these semicolons should remain in category 12, for example semicolons in error messages. Others should switch to category 13 for as long as the user makes the semicolon active, for example semicolons to be printed as such. This first problem is a clear sign that the second author was blissfully unaware of European needs while writing these formats! Fortunately I have a very simple revision of the formats to propose as a remedy. One can define a private macro \semicolon@ and put it in place of the semicolons that should change category. Initially one gives the definition \let\semicolon@=; (while the semicolon has category 12). Just after the user switches the semicolon to category 13, he should, directly or indirectly, repeat \let\semicolon@=; so that \semicolon@ now means the category 13 semicolon; and conversely when switching back to category 12. Recall that, internally, characters are tagged by their category code, so that in a very real sense category 13 and category 12 semicolons are distinct characters that exist simultaneously.
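To make the \semicolon@ device of problem (1) concrete, here is a minimal Plain TeX sketch. It is not the actual AmSTeX or LamSTeX code: the switching macros \frenchsemicolon and \englishsemicolon are hypothetical names, and the replacement text given to the active semicolon (an unbreakable thin space followed by a category 12 semicolon obtained with \string) merely stands in for whatever French typographers decide. The point is that the same line \let\semicolon@=; gives different results according to the category the semicolon had when that line was read.

```tex
% Minimal Plain TeX sketch of the \semicolon@ device (hypothetical names
% \frenchsemicolon and \englishsemicolon; placeholder French spacing).
\catcode`\@=11 % make @ a letter, as inside a format file
\let\semicolon@=; % read while ; has category 12: the `dumb' semicolon
{\catcode`\;=13 % within this group the semicolon is read as active
 \gdef\frenchsemicolon{%
   \catcode`\;=13 % semicolons typed from now on are active
   % placeholder spacing: unbreakable thin space, then a category 12 ;
   \def;{\unskip\nobreak\thinspace\string;}%
   \let\semicolon@=;}% this ; is active, so \semicolon@ follows it
}
\def\englishsemicolon{%
  \catcode`\;=12 % back to Knuth's semicolon
  \let\semicolon@=;}% this ; was read with category 12
\catcode`\@=12 % @ is an ordinary character again
```

After \frenchsemicolon, any format macro that prints \semicolon@ picks up the French spacing; after \englishsemicolon it prints Knuth's semicolon again, while semicolons in error messages, left as category 12 characters, are never affected.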
2) The semicolon appears explicitly in the syntax surrounding some macros of LamSTeX. For example, in LamSTeX, \cgaps{3;2;2.5} sets the first three column gaps in a commutative diagram in terms of a standard gap. Under the original standard LamSTeX, the semicolon here really means the category 12 semicolon, so such macros fail when the semicolon is assigned category 13. In such cases I again propose a change of the format: at the moment the semicolon gets category 13, we, roughly speaking, restate the relevant definitions that involve the semicolon in their syntax. For efficiency, a number of macro definitions are rearranged so that those explicitly involving the semicolon in their syntax are extremely short.

This (incomplete) discussion of the semicolon is illustrative for all four punctuation marks (;:!?). For a more thorough discussion, see the technical report []. See also [GPAMS]: Germans often make the " (double-quote) character active and use it to provide the umlaut accent; this involves a special little problem, since the " character is used by TeX itself to indicate hexadecimal numbers.

I should add some comments concerning auxiliary macro files, which pose the same sort of difficulties. Those macro files that are official adjuncts of a major format will hopefully be carefully revised by the wizards in charge of the format. When it comes to the innumerable unsupported style files, the casual TeXpert may find himself suddenly responsible for adapting them; in that case, there are a couple of nuances: (a) it may be necessary to quickly hack together a revised version of a style file; (b) it may be desirable to produce an auxiliary file rather than alter the style file. Here is some advice. To begin, look carefully through the macro file for any definition involving a character whose category code you have to change.
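Problem (2) above, and the kind of definition to look for, can be illustrated under Plain TeX with a toy macro whose syntax contains a semicolon, in the spirit of \cgaps{3;2;2.5}. All the names here (\twogaps, \twogapsaux, \twogapsauxother, \twogapsauxactive, \gapsactive, \gapsother, \firstgap, \secondgap) are hypothetical. The delimiter in a parameter text is fixed, category code included, when the definition is read; so the definition is restated once more with an active semicolon, and a one-line \let selects the version matching the current category, much as \let\ds@\ds@active does in the LamSTeX proposal.

```tex
% Hypothetical names throughout. \twogaps{3;2} stores 3 in \firstgap
% and 2 in \secondgap; the semicolon is part of the macro's syntax.
\def\twogaps#1{\twogapsaux#1\end}
% Version whose delimiter is the category 12 semicolon (; is 12 here):
\def\twogapsauxother#1;#2\end{\gdef\firstgap{#1}\gdef\secondgap{#2}}
% The same definition restated with an active semicolon as delimiter:
{\catcode`\;=13
 \gdef\twogapsauxactive#1;#2\end{\gdef\firstgap{#1}\gdef\secondgap{#2}}}
\let\twogapsaux=\twogapsauxother % initial, Plain-like state
% Switches in the spirit of the proposed \semicolonactive/\semicolonother;
% for brevity they also perform the category change itself.
\def\gapsactive{\catcode`\;=13 \let\twogapsaux=\twogapsauxactive}
\def\gapsother{\catcode`\;=12 \let\twogapsaux=\twogapsauxother}
```

With the category 12 version alone, \twogaps{3;2} typed while ; is active would fail, because TeX looks in vain for a category 12 semicolon to end #1. Keeping the delimited macro tiny, as here, keeps the restatement cheap.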
When the category change causes trouble in such a definition, an almost universal remedy is to *restate* the definition in an appropriate category-code environment. This may cost a great deal of space, but it is usually quickly arranged; further, the restatements can be put in an auxiliary file. Punctuation to be printed to the screen (for example in error messages) can be preceded by \string, which in effect forces category 12.

§2. An inter-format standard for category change?

One of the major strengths of elaborate formats is their ability to shield the end user, and even the typographer, from the intricate inner workings of TeX. What I have said up to this point is still unsatisfactory inasmuch as some detailed knowledge of the workings of AmSTeX and LamSTeX appears to be required. This requirement can be eliminated by installing a high-level interface to category change. I propose to define the following standard public macros in the next updates of AmSTeX and LamSTeX:

 \semicolonactive        \semicolonother
 \colonactive            \colonother
 \exclamationmarkactive  \exclamationmarkother
 \questionmarkactive     \questionmarkother
 \Quoteactive            \Quoteother
 ... (and some others? cf. §4)                    (*)

Their job is to make the extra format-specific modifications that are necessary when the category-code changes envisaged above are made. For reasons that may become clearer in §3, I hope that each format will define these macros, or at least those that are necessary in that format. As the use of TeX evolves and spreads, more characters may be added to this list. (The above list is based on current French and German needs.)

Then, for example, to adapt AmSTeX to the needs of the Terpenty Coast Republic, where just the semicolon has to be active, the national TeX users' group would sponsor a national style file terp.sty whose contents might be roughly as follows.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% terp.sty (Terpenty Coast national style file)
%% July 1990 (alpha version)
%%
%% Hyphenation: supposes TeX version 3
\ifx\terpentine\undefined
  \csname newlanguage\endcsname\terpentine % \newlanguage is \outer in Plain
  {\language=\terpentine
   \input terp.hyph % file of form \patterns{...}\hyphenation{...}
                    % that establishes Terpentine hyphenation
                    % to be used when language is Terpentine
  }
\fi
%% Punctuation:
\ifx\semicolonactive\undefined % true for Plain
  \gdef\semicolonactive{\relax}\fi
\ifx\semicolonother\undefined % true for Plain
  \gdef\semicolonother{\relax}\fi
{\catcode`\;=13 % ; must already be active when \terppunct is defined
 \gdef\terppunct{\catcode`\;=13 \def;{...}\semicolonactive}}
 % the replacement text {...} is left to Terpentine typographers
\def\noterppunct{\catcode`\;=12 \semicolonother}
\def\terps{\language=\terpentine\terppunct}
\def\noterps{\language=0\noterppunct}
\endinput

If Terpentine hyphenation has already been installed (for language number \terpentine), nothing happens in the hyphenation part; otherwise there is an attempt to install it. Because of the (implicit) use of the \patterns primitive, it will then normally be necessary to process this file using INITEX, the enriched initialisation version of the TeX program. At the ** prompt of INITEX, one could type:

 **&amstex \input terp.sty \dump

This assumes amstex.fmt is an available precompiled format, and it quickly produces a new precompiled format, Tpamstex.fmt say. Once terp.sty has been input, TeX will respond to the command \terps by providing national hyphenation and punctuation for Terpenty Coast nationals, and to \noterps by returning to Knuth's punctuation and English hyphenation. These two are the only macros above that the everyday user in Terpenty will employ. (Many members of the Terpentine TeX users' group feel that the national features offered by this file are woefully incomplete, and an ad hoc group is exploring numerous elaborations.) Note that this file contains no specific reference to the format used.
All that is required is that the macros \semicolonactive and \semicolonother from my proposed list be suitably defined, i.e. so as to prevent undesirable side-effects when the semicolon switches category to 13 and back to 12, respectively. For other nationalities, the national style file could, I hope, be similar in structure (so far as punctuation is concerned), and similarly independent of format. Thus the above file will work as described with AmSTeX and LamSTeX, and hopefully with any other format, if and when the changes I propose are implemented.

As a stop-gap measure while awaiting that happy time, I have provided [], for both AmSTeX and LamSTeX, suitable *external* definitions of all the macros in the list (*). Such external definitions would have to be input before the national style file. While the revisions of AmSTeX and LamSTeX proposed to define (*) cost negligible space (a couple of K octets in all), these external definitions are ugly and bulky: in all, about 6K for AmSTeX and 14K for LamSTeX. Retrofit is costly!

To give the flavor, here is the definition of \semicolonactive as it is currently proposed for a revision of LamSTeX.

{% group to localize category changes
\catcode`\@=11 % makes @ a letter (before \active@ is read)
\catcode`\;=\active@ % makes ; active
\gdef\semicolonactive{% definition is global but effect is local
  \let\semicolon@=;% see problem (1) above
  \let\ds@\ds@active % see problem (2) above
  % ... plus five similar \let's, using in place of \ds
  % the following: \dtX, \dtY, \cgaps, \rgaps, \gaps@
}% end of the definition of \semicolonactive
}% this ends the group with special local catcodes

The new macro \ds@active is defined in revised LamSTeX by

 \def\ds@active(#1,#2){\ds@@{#1}{#2}}

where \ds@@ is essentially a preexisting 1989 LamSTeX macro. The definition of \semicolonother for (revised) LamSTeX is entirely similar, with `other' replacing `active'. Since the semicolon is the most troublesome of ;:!?
this gives a pretty honest overview of the whole solution, and indicates that it is reasonably tidy. My fondest hope is that the approach to the activation of characters that I have sketched for AmSTeX and LamSTeX will prove suitable for other formats, and that a consensus among all formats will be possible.

§3. Criteria for implementing national styles.

One main purpose of the inter-format macros (*) we are proposing is to facilitate the construction of national style files. There are almost too many ways to go about the task of implementing a national style, but the various natural criteria listed below happily narrow down the choices and make a consensus more likely. Hopefully, my proposals above for related category changes are in harmony with all of them. I am indebted to Bernard Gaulle, president of GUTenberg, the French users' group, for mentioning several criteria on the basis of his experience in writing a provisional French style file for LaTeX (see []).

1) Portability. It is vital that one be able to send an article in a given language anywhere in the world and have it printed at its destination with all its national style features intact. At a stroke, this precludes national styles based on a revision of one or more formats. The first functional francisation of Plain TeX was based on a modification of Plain TeX done in Strasbourg by Desarmenien et al. in the mid-eighties: a marvel in its time, but non-portable. Portability has been greatly assisted by the multilingual hyphenation in version 3 of TeX (and in MLTeX before it). The tricky bilingual hyphenation table of Desarmenien, of 42K octets, can now be replaced by a French hyphenation file of about 5K (also by Desarmenien). Hopefully, French articles will soon be able to move around the world with less than 10K of extra baggage, in the form of a hyphenation file plus a style file.
Since, in many computer centers, standard formats are most easily available as precompiled binary formats, it is desirable that any national style file be loadable after the standard format. This strongly influences design. So far as possible, it should not matter when the style file is loaded. The need to use INITEX, the unfamiliar initialisation version of TeX, to input any file that uses \patterns is a regrettable inconvenience that is sure to scare off many users wishing to exploit national styles beyond their national frontiers. Fortunately, some implementations of TeX, for example Textures on the Macintosh, incorporate the functions of INITEX into TeX itself.

2) A clear division of responsibilities between format designers and implementors of national styles. TeX is used for many languages, and TeX version 3 is expected to be used for many more. It is unreasonable to expect format designers to go on writing macros specific to national groups. It is equally unreasonable to ask national groups to cede such responsibilities to format designers. Instead, format designers should provide the low-level tools (for example the macros (*)) that allow local TeXperts to implement their national styles independently. A couple of slogans are apt:

 Maîtres chez nous! (René Lévesque, Parti Québécois)
 Give us the tools and we'll do the job. (an Anglo-American)

3) Independence of format. One national style file should apply to many (hopefully all!) formats. It is difficult to decide where to draw the line in elaborating a national style file; this criterion may give strong hints. German.sty for LaTeX by H. Partl et al. is one style that is notable for applying to both LaTeX and Plain; with my approach it could apply to AmSTeX and LamSTeX too.

4) Mutual compatibility (two facets). Language independence of national style file design.
There are some very international people and organizations, and for their sake it would be nice if, once you understand one sufficiently complicated national style file, you understand them all. Second, multilingual works such as conference proceedings can greatly benefit from mutual functional compatibility of all the language style files involved; this means that their commands can be mixed in one typescript.

5) Simplicity. Since problems will inevitably arise, simplicity should be preserved to give ordinary mortals half a chance to solve them!

§4. Concluding remarks.

My concrete proposal, namely that the macros (*) be defined by any format in which activation of the characters ;:!?" causes problems that do not occur in Plain TeX, is a very conservative one. Indeed, it is nothing but an orderly return to the liberties available in Plain TeX! While it is clearly motivated by French and German needs, all nationalities are put on an equal footing.

Active punctuation has been used by Knuth to implement what is called *hanging punctuation* (see The TeXbook, Appendix D). This is the practice of letting punctuation protrude into the margin, on the grounds that this produces more aesthetic alignments. There is sufficient support for this practice that `what you see is what you get' word processing on Macintosh microcomputers will shortly offer this feature. Hanging punctuation brings home two points: (i) active punctuation is of interest for all languages, indeed for matters that are not language-specific; (ii) the list of characters whose activation ought to be facilitated by the macros (*) should probably be extended to provide for hanging punctuation (and perhaps other applications). Specifically,

 \periodactive  \periodother
 \commaactive   \commaother
 \lquoteactive  \lquoteother
 \rquoteactive  \rquoteother

are desirable additions. As Knuth observes, the comma and the period are particularly awkward (even under Plain!)
because the period (alternatively the comma) is used in the specification of dimensions, as in \vskip 3.5truein; when . is made active, one is seemingly obliged to type something like \vskip 3\pnt5truein instead, where \pnt is defined to be the category 12 period. Hopefully, each format designer will in future document the freedoms and constraints that apply to the activation of characters under his format, in particular for those in the list ";:!?,.`'

This note has not mentioned the new possibility, under TeX version 3, of exploiting the extended ASCII codes 128-255. French punctuation does not seem to benefit directly. German accents can certainly be handled by using them, but an optimized seven-bit classic ASCII standard will remain useful for information exchange, notably by email. My experience does suggest that the nascent eight-bit TeX standard (or standards?) for codes 128-255 should include conventions for category and category change.

Some questions remain. How difficult are the macros (*) to implement for other formats? (The double quote " and the semicolon ; proved somewhat painful in LamSTeX, but the rest were easy.) Do the macros (*) really give the best available solution to the ``activation problem''? Which characters should one be able to make active for other languages? (I.e., is the list ";:!?,.`' adequate?) I solicit comments on the macros (*) from readers so that the best consensus will crystallize.

REFERENCES

 L. Siebenmann, LamSTeX, un nouveau formateur de M. Spivak, Cahiers GUTenberg, no. 6, July 1990, pages 25-33.
 FPAMS.TEX (17K)
 FPLAMS.TEX (27K)
 GPAMS.TEX (11K)
 These three files were written at my request by Mike Spivak in spring 1990. They are available by email or by anonymous ftp from 130.84.128.100 alias rsovax.circe.fr (login: anonymous; password: anything; directory: [anonymous.siebenmann]).

Laurent Siebenmann
Mathematique, Bat. 425, Univ. de Paris-Sud, 91405 Orsay, France
lcs@matups.matups.fr
LS@FrMaP711.bitnet (weekends)
siebenmann@LALCLS.decnet.cern.ch (reliable)
Fax number: 33-1-6941-6221

RELEVANT ADDRESSES

1) Mike Spivak, Texplorators, 1703 W. Alabama, Suite 450-273, PO Box 27703-273, Houston, Texas 77027.
2) Rainer Schoepf, Konrad-Zuse-Zentrum fuer Informationstechnik Berlin, Heilbronner Strasse 10, D-1000 Berlin 31, Germany, or . Rainer Schoepf adapted AmSTeX to LaTeX for the AMS.
3) Michael J. Downes, Amer. Math. Soc., 201 Charles Street, Providence, RI 02904, USA. Mike Downes has recently worked on AmSTeX.