Friday, December 31, 2010

Forcing full expansion

Unfortunately, the current version of pdfTeX is lacking a \expanded primitive that would fully expand its argument, similar to the full expansion that happens for the replacement text of an \edef. However, there is a trick that will keep expanding macros until it reaches an unexpandable token.

This trick depends on an obscure aspect of TeX's grammar. Namely, wherever TeX expects a number, there are 4 different ways to give it a literal (and several more ways besides): (1) as a decimal number 1234, (2) as an octal number '31, (3) as a hexadecimal number "AA55, or (4) as the character code of a character token `X. For any of those, the literal can be preceded by an optional string of minuses (each one negates the number that is to follow so --5 is the same as 5) and can be followed by an optional space token. For example,
\count@=-37
contains both the optional minus and the optional space (because the newline was turned into a space token).

We can use case (4) along with the fact that \romannumeral expands to nothing when the following number is nonpositive to produce a full expansion:
\romannumeral-`X\foo
\foo will expand until there is a single nonexpandable token. Why does this work? Well, -`X is a complete number, namely -88. Since it is negative, \romannumeral-`X expands to nothing, as we would expect. But that leaves \foo. Remember that optional space in the grammar? It turns out that TeX will now expand tokens until it finds something unexpandable in its search for that optional space. The downside to this is that if \foo expands to something that starts with a space, the space will disappear.

As an example of this hack, consider the \hexnumber macro. This macro does not produce any nonexpandable tokens until it is ready to produce all of them. This makes it a fine candidate for the \romannumeral hack.
\expandafter\def\expandafter\foo\expandafter{\romannumeral-`X\hexnumber{37}}
\show\foo
\expandafter\def\expandafter\bar\expandafter{\hexnumber{37}}
\show\bar
If we examine TeX's output, the difference is immediately obvious.
> \foo=macro:
->25.
> \bar=macro:
->\ifnumcomp {37}<0{}{\hn@i {37}{}}.

Thanks to Joseph Wright for pointing this out.

Wednesday, December 29, 2010

Quines

Everybody loves a good quine. The most obvious, and most cheating way to write a “quine” In LaTeX is to use the listings package.
\documentclass{article}
\usepackage{listings}
\lstset{language=[LaTeX]TeX}
\begin{document}
\lstinputlisting{\jobname}
\end{document}
This is not very exciting and it's cheating on two levels. One, a quine is not supposed to read its source code. So just as
#!/bin/sh
cat "$0"
is not a quine, using the listings package is not a quine. Secondly, a quine should produce code that can be compiled directly. As it stands, I know of no TeX implementation that will read the code from a typeset pdf.

This is the shortest quine (give or take the characters used to construct the output file name) that I could construct.
\toks0{\immediate\openout2\jobname.copy.tex\def\t{\immediate\write2{\string
\toks0{\the\toks0}\string\the\string\toks0\end}}\t}\the\toks0\end
One thing to note is that due to the way TeX parses and displays tokens, this is not technically a quine. As with most quines on the Internet, this is a quine generator. Run it once to produce a quine:
$ tex quine
This is TeX, Version 3.1415926 (TeX Live 2010)
(./quine.tex )
No pages of output.
Transcript written on quine.log.
$ tex quine.copy
This is TeX, Version 3.1415926 (TeX Live 2010)
(./quine.copy.tex )
No pages of output.
Transcript written on quine.copy.log.
$ diff -s quine.copy.tex quine.copy.copy.tex
Files quine.copy.tex and quine.copy.copy.tex are identical

Tuesday, December 28, 2010

A fully expandable conversion to hexadecimal numbers

The TeX primitive \number will convert anything that can be converted to a number into a decimal number and what's more, it is expandable. That is \edef\foo{\number\count37} will define \foo to be whatever the value of count register 37 is. It turns out that using ε-TeX's \numexpr, it is possible to write a fully expandable macro \hexnumber that expands to the (positive) hexadecimal that is its argument. It's not quite as good as \number because it is a macro and not a primitive, but it is the best we can do without modifying TeX. To be fully explicit about what is going on here, I've opted to write this in plain TeX (with \numexpr, of course). First, we just need some boilerplate code that compares two numbers. (After I wrote this, I remembered that the LaTeX package etoolbox has a similar macro—in fact, it is implemented in exactly the same way, but that's not really surprising as it's the right way to do it—and so I renamed my macro to match etoolbox.)
\catcode`@11
\def\@firstoftwo#1#2{#1}
\def\@secondoftwo#1#2{#2}
\def\ifnumcomp#1#2#3{%
        \ifnum\numexpr#1\relax#2\numexpr#3\relax
                \expandafter\@firstoftwo
        \else
                \expandafter\@secondoftwo
        \fi
}
The first line is the same as LaTeX's \makeatletter. The next two are standard LaTeX definitions. Next, we need a way to divide two integers and truncate the result. TeX's primitive \divide does exactly this, but it is not expandable. \numexpr can do division; however, it rounds rather than truncates, so we need to handle that ourselves. In particular, if x<round(x/y)*y, then the division was rounded up so we need to subtract 1.
\def\truncdiv#1#2{%
        \ifnumcomp{#1}<{(#1)/(#2)*(#2)}{%
                \numexpr(#1)/(#2)-1%
        }{%
                \numexpr(#1)/(#2)%
        }%
}
This brings us to the main macro.
\def\hexnumber#1{%
        \ifnumcomp{#1}<0{}{\hn@i{#1}{}}%
}
If the argument is negative, expand to nothing. This is similar to the primitive \romannumeral. Otherwise call our first helper function. (Technically, there is no calling going on, just expansion.)
\def\hn@i#1#2{%
        \ifnumcomp{#1}<{16}
        {%
                \hn@digit{#1}#2%
        }{%
                \expandafter\hn@ii\expandafter{%
                        \the\numexpr\truncdiv{#1}{16}%
                }{#1}{#2}%
        }%
}
This macro takes two arguments. The first is the number to convert to hex and the second is the string converted so far. When \hexnumber calls this, it passes its argument as the first argument and an empty argument for the second. If the number to convert is less than 16, convert it to hex followed by what has already been converted. Otherwise, we call the second helper macro where the first argument is the decimal representation of our value to convert divided by 16. (The \expandafters are to prevent large \numexpr expressions from piling up.) The second and third arguments are our value to convert and string converted. Our final helper macro looks complicated, but isn't terribly.
\def\hn@ii#1#2#3{%
        \expandafter\hn@i\expandafter{%
                \number\numexpr#1\expandafter\expandafter\expandafter
                \expandafter\expandafter\expandafter\expandafter}%
                \expandafter\expandafter\expandafter\expandafter
                \expandafter\expandafter\expandafter{%
                        \hn@digit{(#2)-16*(#1)}#3}%
}
Again, we don't really need quite so many \expandafters, but they keep the second (resp. third) argument to \hn@i (resp. \hn@ii) from piling up. This is essentially just expanding to \hn@i{#1}{\hn@digit{(#2)-16*(#1)}#3} except that both arguments are fully expanded. Finally, we need to be able to turn a number 0≤x<16 into a single hex digit. One little caveat is that to act like \number, we need all of the output to have category code 12. To that end, we start a new group, change the catcodes, and then globally define \hn@digit.
\begingroup
\catcode`012\catcode`112\catcode`212\catcode`312\catcode`412
\catcode`512\catcode`612\catcode`712\catcode`812\catcode`912
\catcode`A12\catcode`B12\catcode`C12\catcode`D12\catcode`E12
\catcode`F12
\gdef\hn@digit#1{%
        \ifcase\numexpr#1\relax 0%
        \or \expandafter 1%
        \or \expandafter 2%
        \or \expandafter 3%
        \or \expandafter 4%
        \or \expandafter 5%
        \or \expandafter 6%
        \or \expandafter 7%
        \or \expandafter 8%
        \or \expandafter 9%
        \or \expandafter A%
        \or \expandafter B%
        \or \expandafter C%
        \or \expandafter D%
        \or \expandafter E%
        \or \expandafter F%
        \fi
}
\endgroup
And that's all there is to it!

Sunday, December 26, 2010

\vspace* is broken

As LaTeX tutorials are quick to point out, \vspace{x} can disappear at the top or bottom of pages. This is because each \vspace expands to two \vskips which are discardable. Thus, if you write \vspace{3in} near the bottom of the page, the next paragraph will start at the top of the next page.

Of course, if the goal is to start writing the text below the top of the page, then \vspace will not work. The solution, or so say the tutorials, is to use \vspace* instead. This, we're led to believe will always insert that much space. Want to start writing 1 inch from the top margin? No problem, use \vspace*{1in}. The trouble is, that's not quite true. It turns out that TeX has a parameter called \topskip which is related to the amount of glue TeX puts at the top of the page. It's goal is to make the baseline of the first box on the main vertical list of each page start at the same place.

Consider the first line of a page consisted of a single x and the first line of the next page consisted of a single X. We'd like the first line of text on each page to line up, even though they have different heights. So that's exactly what TeX does. If h is the height of the first line of text on the page and t is the value of \topskip, then TeX inserts max {t - h, 0} glue before the first box on the page.

So what does that have to do with \vspace*? The answer lies in how LaTeX implements \vspace*. It is implemented as a 0 height \hrule (because \hrule is not discardable), a penalty to prevent page breaks, and then two \vskips (the second is for 0 pt, presumably so that \unskip doesn't remove it). There is also some code to deal with \vspace* in horizontal mode as well as some bookkeeping involving \prevdepth that isn't important here. Because it starts off with an \hrule, TeX inserts \topskip glue. In practice, this means that \vspace*{0pt} will add vertical space at the top of a page. This is undesirable. TeX by Topic suggests that one use \hbox{}\kern-\topskip when one desires material start at the very top of the page. Obviously, this solution is not workable for \vspace* since it will insert -\topskip space every time. A better solution is to save \topskip, set it equal to 0 pt, perform the space and then restore it. Something like the following.
\begingroup
\topskip0pt
\vspace*{3in}%
\endgroup
Here, we're using TeX's grouping facility to keep the assignment local to the group. A better way is to define a new macro, \Vspace, that performs all of the relevant code. Most of this is copied from the definition of \@vspacer—a helper macro for \vspace*.
\newskip\Vspace@topskip
\providecommand\Vspace[1]{%
        \Vspace@topskip\topskip
        \topskip\z@skip
        \ifvmode
                \dimen@\prevdepth
                \hrule\@height\z@
                \nobreak
                \vskip\glueexpr#1\relax
                \vskip\z@skip
                \prevdepth\dimen@
        \else
                \@bsphack
                \vadjust{%
                        \@restorepar
                        \hrule\@height\z@
                        \nobreak
                        \vskip#1%
                        \vskip\z@skip
                }%
                \@esphack
        \fi
        \topskip\Vspace@topskip
}
This macro can be used wherever \vspace* was used and it will always give exactly the space in the argument. Of course, this might not always be desired. In particular, if you want to start 4 lines down from the top of the page, then \vspace*{4\baselineskip} will do exactly that and keep all lines on various pages aligned. Using \Vspace to get the same space would require \Vspace{4\baselineskip + \topskip}.

Saturday, December 25, 2010

Sieve of Eratosthenes

By popular demand, here is the Sieve of Eratosthenes written in Plain TeX.
\newcount\cm\cm100
\newcount\ci
\newcount\cj
\def\prime{prime}
\def\composite{composite}
\def\set#1#2{\global\expandafter\let\csname s\number#1\endcsname#2}
\def\get#1{\csname s\number#1\endcsname}
\def\doprime{%
        \set\ci\prime
        \cj\ci
        \loop
                \advance\cj\ci
                \ifnum\cj<\cm
                        \set\cj\composite
        \repeat
}
\ci2
\loop\ifnum\ci<\cm
        \expandafter\ifx\csname s\number\ci\endcsname\relax
                \begingroup
                \doprime
                \endgroup
        \fi
        \advance\ci1
\repeat
% Print them
\ci2
\loop\ifnum\ci<\cm
        \number\ci: \get\ci\endgraf
        \advance\ci1
\repeat
\bye
Replacing \begingroup with \let\outerbody\body and \endgroup with \let\body\outerbody, you can test even more numbers due to the difference in hash space and save space. Of course, you can test twice as many if you don't bother testing the even numbers.
\newcount\cm\cm300000
\newcount\ci
\newcount\cj
\def\prime{prime}
\def\composite{composite}
\def\set#1#2{\global\expandafter\let\csname s\number#1\endcsname#2}
\def\get#1{\csname s\number#1\endcsname}
\def\doprime{%
        \set\ci\prime
        \cj\ci
        \loop
                \advance\cj\ci
                \advance\cj\ci
                \ifnum\cj<\cm
                        \ifodd\cj
                                \set\cj\composite
                        \fi
        \repeat
}
\set2\prime
\ci3
\loop\ifnum\ci<\cm
        \expandafter\ifx\csname s\number\ci\endcsname\relax
                \let\outerbody\body
                \doprime
                \let\body\outerbody
        \fi
        \advance\ci2
\repeat
% Print them
2: \prime\par
\ci3
\loop\ifnum\ci<\cm
        \number\ci: \ifodd\ci\get\ci\else\composite\fi\endgraf
        \advance\ci1
\repeat
\bye
Hendrik Vogt pointed out that Knuth wrote a different prime computing function in TeX that appears in The TeXbook on page 218. Thanks!