Friday, December 31, 2010

Forcing full expansion

Unfortunately, the current version of pdfTeX is lacking a \expanded primitive that would fully expand its argument, similar to the full expansion that happens for the replacement text of an \edef. However, there is a trick that will keep expanding macros until it reaches an unexpandable token.

This trick depends on an obscure aspect of TeX's grammar. Namely, wherever TeX expects a number, there are 4 different ways to give it a literal (and several more ways besides): (1) as a decimal number 1234, (2) as an octal number '31, (3) as a hexadecimal number "AA55, or (4) as the character code of a character token `X. For any of those, the literal can be preceded by an optional string of minuses (each one negates the number that is to follow so --5 is the same as 5) and can be followed by an optional space token. For example,
\count@=-37
contains both the optional minus and the optional space (because the newline was turned into a space token).

We can use case (4) along with the fact that \romannumeral expands to nothing when the following number is nonpositive to produce a full expansion:
\romannumeral-`X\foo
\foo will expand until there is a single nonexpandable token. Why does this work? Well, -`X is a complete number, namely -88. Since it is negative, \romannumeral-`X expands to nothing, as we would expect. But that leaves \foo. Remember that optional space in the grammar? It turns out that TeX will now expand tokens until it finds something unexpandable in its search for that optional space. The downside to this is that if \foo expands to something that starts with a space, the space will disappear.

As an example of this hack, consider the \hexnumber macro. This macro does not produce any nonexpandable tokens until it is ready to produce all of them. This makes it a fine candidate for the \romannumeral hack.
\expandafter\def\expandafter\foo\expandafter{\romannumeral-`X\hexnumber{37}}
\show\foo
\expandafter\def\expandafter\bar\expandafter{\hexnumber{37}}
\show\bar
If we examine TeX's output, the difference is immediately obvious.
> \foo=macro:
->25.
> \bar=macro:
->\ifnumcomp {37}<0{}{\hn@i {37}{}}.

Thanks to Joseph Wright for pointing this out.

Wednesday, December 29, 2010

Quines

Everybody loves a good quine. The most obvious, and most cheating way to write a “quine” In LaTeX is to use the listings package.
\documentclass{article}
\usepackage{listings}
\lstset{language=[LaTeX]TeX}
\begin{document}
\lstinputlisting{\jobname}
\end{document}
This is not very exciting and it's cheating on two levels. One, a quine is not supposed to read its source code. So just as
#!/bin/sh
cat "$0"
is not a quine, using the listings package is not a quine. Secondly, a quine should produce code that can be compiled directly. As it stands, I know of no TeX implementation that will read the code from a typeset pdf.

This is the shortest quine (give or take the characters used to construct the output file name) that I could construct.
\toks0{\immediate\openout2\jobname.copy.tex\def\t{\immediate\write2{\string
\toks0{\the\toks0}\string\the\string\toks0\end}}\t}\the\toks0\end
One thing to note is that due to the way TeX parses and displays tokens, this is not technically a quine. As with most quines on the Internet, this is a quine generator. Run it once to produce a quine:
$ tex quine
This is TeX, Version 3.1415926 (TeX Live 2010)
(./quine.tex )
No pages of output.
Transcript written on quine.log.
$ tex quine.copy
This is TeX, Version 3.1415926 (TeX Live 2010)
(./quine.copy.tex )
No pages of output.
Transcript written on quine.copy.log.
$ diff -s quine.copy.tex quine.copy.copy.tex
Files quine.copy.tex and quine.copy.copy.tex are identical

Tuesday, December 28, 2010

A fully expandable conversion to hexadecimal numbers

The TeX primitive \number will convert anything that can be converted to a number into a decimal number and what's more, it is expandable. That is \edef\foo{\number\count37} will define \foo to be whatever the value of count register 37 is. It turns out that using ε-TeX's \numexpr, it is possible to write a fully expandable macro \hexnumber that expands to the (positive) hexadecimal that is its argument. It's not quite as good as \number because it is a macro and not a primitive, but it is the best we can do without modifying TeX. To be fully explicit about what is going on here, I've opted to write this in plain TeX (with \numexpr, of course). First, we just need some boilerplate code that compares two numbers. (After I wrote this, I remembered that the LaTeX package etoolbox has a similar macro—in fact, it is implemented in exactly the same way, but that's not really surprising as it's the right way to do it—and so I renamed my macro to match etoolbox.)
\catcode`@11
\def\@firstoftwo#1#2{#1}
\def\@secondoftwo#1#2{#2}
\def\ifnumcomp#1#2#3{%
        \ifnum\numexpr#1\relax#2\numexpr#3\relax
                \expandafter\@firstoftwo
        \else
                \expandafter\@secondoftwo
        \fi
}
The first line is the same as LaTeX's \makeatletter. The next two are standard LaTeX definitions. Next, we need a way to divide two integers and truncate the result. TeX's primitive \divide does exactly this, but it is not expandable. \numexpr can do division; however, it rounds rather than truncates, so we need to handle that ourselves. In particular, if x<round(x/y)*y, then the division was rounded up so we need to subtract 1.
\def\truncdiv#1#2{%
        \ifnumcomp{#1}<{(#1)/(#2)*(#2)}{%
                \numexpr(#1)/(#2)-1%
        }{%
                \numexpr(#1)/(#2)%
        }%
}
This brings us to the main macro.
\def\hexnumber#1{%
        \ifnumcomp{#1}<0{}{\hn@i{#1}{}}%
}
If the argument is negative, expand to nothing. This is similar to the primitive \romannumeral. Otherwise call our first helper function. (Technically, there is no calling going on, just expansion.)
\def\hn@i#1#2{%
        \ifnumcomp{#1}<{16}
        {%
                \hn@digit{#1}#2%
        }{%
                \expandafter\hn@ii\expandafter{%
                        \the\numexpr\truncdiv{#1}{16}%
                }{#1}{#2}%
        }%
}
This macro takes two arguments. The first is the number to convert to hex and the second is the string converted so far. When \hexnumber calls this, it passes its argument as the first argument and an empty argument for the second. If the number to convert is less than 16, convert it to hex followed by what has already been converted. Otherwise, we call the second helper macro where the first argument is the decimal representation of our value to convert divided by 16. (The \expandafters are to prevent large \numexpr expressions from piling up.) The second and third arguments are our value to convert and string converted. Our final helper macro looks complicated, but isn't terribly.
\def\hn@ii#1#2#3{%
        \expandafter\hn@i\expandafter{%
                \number\numexpr#1\expandafter\expandafter\expandafter
                \expandafter\expandafter\expandafter\expandafter}%
                \expandafter\expandafter\expandafter\expandafter
                \expandafter\expandafter\expandafter{%
                        \hn@digit{(#2)-16*(#1)}#3}%
}
Again, we don't really need quite so many \expandafters, but they keep the second (resp. third) argument to \hn@i (resp. \hn@ii) from piling up. This is essentially just expanding to \hn@i{#1}{\hn@digit{(#2)-16*(#1)}#3} except that both arguments are fully expanded. Finally, we need to be able to turn a number 0≤x<16 into a single hex digit. One little caveat is that to act like \number, we need all of the output to have category code 12. To that end, we start a new group, change the catcodes, and then globally define \hn@digit.
\begingroup
\catcode`012\catcode`112\catcode`212\catcode`312\catcode`412
\catcode`512\catcode`612\catcode`712\catcode`812\catcode`912
\catcode`A12\catcode`B12\catcode`C12\catcode`D12\catcode`E12
\catcode`F12
\gdef\hn@digit#1{%
        \ifcase\numexpr#1\relax 0%
        \or \expandafter 1%
        \or \expandafter 2%
        \or \expandafter 3%
        \or \expandafter 4%
        \or \expandafter 5%
        \or \expandafter 6%
        \or \expandafter 7%
        \or \expandafter 8%
        \or \expandafter 9%
        \or \expandafter A%
        \or \expandafter B%
        \or \expandafter C%
        \or \expandafter D%
        \or \expandafter E%
        \or \expandafter F%
        \fi
}
\endgroup
And that's all there is to it!

Sunday, December 26, 2010

\vspace* is broken

As LaTeX tutorials are quick to point out, \vspace{x} can disappear at the top or bottom of pages. This is because each \vspace expands to two \vskips which are discardable. Thus, if you write \vspace{3in} near the bottom of the page, the next paragraph will start at the top of the next page.

Of course, if the goal is to start writing the text below the top of the page, then \vspace will not work. The solution, or so say the tutorials, is to use \vspace* instead. This, we're led to believe will always insert that much space. Want to start writing 1 inch from the top margin? No problem, use \vspace*{1in}. The trouble is, that's not quite true. It turns out that TeX has a parameter called \topskip which is related to the amount of glue TeX puts at the top of the page. It's goal is to make the baseline of the first box on the main vertical list of each page start at the same place.

Consider the first line of a page consisted of a single x and the first line of the next page consisted of a single X. We'd like the first line of text on each page to line up, even though they have different heights. So that's exactly what TeX does. If h is the height of the first line of text on the page and t is the value of \topskip, then TeX inserts max {t - h, 0} glue before the first box on the page.

So what does that have to do with \vspace*? The answer lies in how LaTeX implements \vspace*. It is implemented as a 0 height \hrule (because \hrule is not discardable), a penalty to prevent page breaks, and then two \vskips (the second is for 0 pt, presumably so that \unskip doesn't remove it). There is also some code to deal with \vspace* in horizontal mode as well as some bookkeeping involving \prevdepth that isn't important here. Because it starts off with an \hrule, TeX inserts \topskip glue. In practice, this means that \vspace*{0pt} will add vertical space at the top of a page. This is undesirable. TeX by Topic suggests that one use \hbox{}\kern-\topskip when one desires material start at the very top of the page. Obviously, this solution is not workable for \vspace* since it will insert -\topskip space every time. A better solution is to save \topskip, set it equal to 0 pt, perform the space and then restore it. Something like the following.
\begingroup
\topskip0pt
\vspace*{3in}%
\endgroup
Here, we're using TeX's grouping facility to keep the assignment local to the group. A better way is to define a new macro, \Vspace, that performs all of the relevant code. Most of this is copied from the definition of \@vspacer—a helper macro for \vspace*.
\newskip\Vspace@topskip
\providecommand\Vspace[1]{%
        \Vspace@topskip\topskip
        \topskip\z@skip
        \ifvmode
                \dimen@\prevdepth
                \hrule\@height\z@
                \nobreak
                \vskip\glueexpr#1\relax
                \vskip\z@skip
                \prevdepth\dimen@
        \else
                \@bsphack
                \vadjust{%
                        \@restorepar
                        \hrule\@height\z@
                        \nobreak
                        \vskip#1%
                        \vskip\z@skip
                }%
                \@esphack
        \fi
        \topskip\Vspace@topskip
}
This macro can be used wherever \vspace* was used and it will always give exactly the space in the argument. Of course, this might not always be desired. In particular, if you want to start 4 lines down from the top of the page, then \vspace*{4\baselineskip} will do exactly that and keep all lines on various pages aligned. Using \Vspace to get the same space would require \Vspace{4\baselineskip + \topskip}.

Saturday, December 25, 2010

Sieve of Eratosthenes

By popular demand, here is the Sieve of Eratosthenes written in Plain TeX.
\newcount\cm\cm100
\newcount\ci
\newcount\cj
\def\prime{prime}
\def\composite{composite}
\def\set#1#2{\global\expandafter\let\csname s\number#1\endcsname#2}
\def\get#1{\csname s\number#1\endcsname}
\def\doprime{%
        \set\ci\prime
        \cj\ci
        \loop
                \advance\cj\ci
                \ifnum\cj<\cm
                        \set\cj\composite
        \repeat
}
\ci2
\loop\ifnum\ci<\cm
        \expandafter\ifx\csname s\number\ci\endcsname\relax
                \begingroup
                \doprime
                \endgroup
        \fi
        \advance\ci1
\repeat
% Print them
\ci2
\loop\ifnum\ci<\cm
        \number\ci: \get\ci\endgraf
        \advance\ci1
\repeat
\bye
Replacing \begingroup with \let\outerbody\body and \endgroup with \let\body\outerbody, you can test even more numbers due to the difference in hash space and save space. Of course, you can test twice as many if you don't bother testing the even numbers.
\newcount\cm\cm300000
\newcount\ci
\newcount\cj
\def\prime{prime}
\def\composite{composite}
\def\set#1#2{\global\expandafter\let\csname s\number#1\endcsname#2}
\def\get#1{\csname s\number#1\endcsname}
\def\doprime{%
        \set\ci\prime
        \cj\ci
        \loop
                \advance\cj\ci
                \advance\cj\ci
                \ifnum\cj<\cm
                        \ifodd\cj
                                \set\cj\composite
                        \fi
        \repeat
}
\set2\prime
\ci3
\loop\ifnum\ci<\cm
        \expandafter\ifx\csname s\number\ci\endcsname\relax
                \let\outerbody\body
                \doprime
                \let\body\outerbody
        \fi
        \advance\ci2
\repeat
% Print them
2: \prime\par
\ci3
\loop\ifnum\ci<\cm
        \number\ci: \ifodd\ci\get\ci\else\composite\fi\endgraf
        \advance\ci1
\repeat
\bye
Hendrik Vogt pointed out that Knuth wrote a different prime computing function in TeX that appears in The TeXbook on page 218. Thanks!

Tuesday, September 21, 2010

LaTeX's architecture

Someone asked about the architecture of LaTeX. After avoiding the question for a day, I decided to answer. Since it's possible the question will be closed and possibly deleted, I figured I'd reproduce the relevant portion of my answer here. Briefly, LaTeX consists of several layers. The lowest layer is the three hundred or so TeX primitives such as \hbox or \vskip. On top of this, there is the LaTeX kernel. This is a set of macros such as \documentclass, \usepackage, or \begin. The documented source of the kernel is documented in the source2e document. Each document written in LaTeX begins (more or less) with \documentclass{foo} which loads the file foo.cls, for example article.cls or book.cls. This file is another set of macros, this time written using a combination of TeX primitives and macros from the LaTeX kernel. These class files can also load another class to extend it or modify the macros from that class. It can also load packages which are up next. LaTeX packages—that is, files ending in .sty (and their related files)—are additional sets of macros written using any combination of macros from lower levels. Some are specific to particular document classes, others are more general. These packages can be loaded using \usepackage from user documents, or \RequirePackage from classes and other packages. At the highest level, the users' documents use a class file and optionally packages. These documents contain macros written using any combination of TeX primitives and class- and package-defined macros as well as the prose, verse, and mathematics to be typeset. There is a lot more detail that one could go into regarding things like fonts and class and package options, but this isn't so "brief" after all.

Wednesday, September 8, 2010

Private bibliographies

I have two versions of my cv: one for public consumption and one which contains extra private bibliography entries for work that is in submission. Obviously these are mostly the same so I had one one that had a \newif\ifpublic and then later included the commands to for the private bibliography entries \unless\ifpublic ... \fi. Of course, the source still leaked information about the papers in submission which made it so that I could not distribute the source. The solution to this problem lies with \InputIfFileExists. So, for example, one can write
\InputIfFileExists{private}{}{}
which will \input{private} if private.tex exists. This is pretty good except that now we have to have 2 additional files, both private.tex and the private bibliography file, call it private.bib. Here, we can use a feature of BibTeX to reduce the number of extra files. Since BibTeX ignores all text outside of @{...} and TeX ignores everything in a file after \endinput, we can have the necessary LaTeX code for including the private bibliography at the start of private.bib and end the code with \endinput before the BibTeX entries begin. Then the \InputIfFileExists can be changed to
\InputIfFileExists{private.bib}
Now the source for the cv can be distributed. If a public version of the cv needs to be generated, then one can insert another \endinput at the start of private.bib or temporarily move private.bib aside while building.

Saturday, September 4, 2010

Computing the maximum width of two boxes

On occasion, one has cause to compare two boxes and compute the maximum of the two widths. If the two boxes are 0 and 2, then the standard way to do that is something like
\ifdim\wd0>\wd2
    \dimen0 \wd0
\else
    \dimen0 \wd2
\fi
Of course, this works perfectly well. But sometimes, you want something that's a little less easy to read. Thus, I propose
\dimen0 \wd\ifdim\wd0>\wd2 0 \else2 \fi
The spaces around the numbers are to keep TeX from reading more input looking for additional digits.

Friday, August 27, 2010

http://tex.stackexchange.com/

New stack exchange site for TeX and LaTeX. I've probably spent too much time there. I've been meaning to write more here...and I will soon.

Wednesday, May 12, 2010

BibTeX hacking

I've avoided editing bst files but I finally encountered something that I felt needed to change. When abbreviating the name Foo Bar Baz, the abbrv bibliography style produces
F.~B.~Baz
which has too much space between the F and the B for my tastes. After making a copy of abbrv.bst (and renaming it), we can change the argument to the format.name$ function (see Section 18) from "{f.~}{vv~}{ll}{, jj}" to "{f{.\nobreak\,}.~}{vv~}{, jj}" to produce a nonbreaking thin space between the F. and the B. I hope I never again have to edit a a bst.

Sunday, May 9, 2010

Including overly large graphics

Sometimes a graphic (or table or what have you) is simply too large to fit on the page without extending into the margins. Here's a simple way to center the graphic in the page without getting any Overfull hbox warnings.
\par
\hbox to\textwidth{\hss\includegraphics{foo}\hss}%
\par
The two \par aren't really essential. They're just to ensure the graphic is in a paragraph by itself. Replace \includegraphics with any other centered material. Note that this actually causes the graphic to extend an equal amount into the margins. If the margins aren't the same size, this is not centering on the page, but it's probably what you want anyway.

Monday, April 5, 2010

Ordinal numbers

Ever want TeX to write an ordinal number correctly? There are a number of fairly complicated solutions on the web so I decided to write one that was easy to read and follow.
\def\ord#1{\begingroup
        \count@=#1\relax
        \the\count@
        \ifnum\count@>99
                \count0\count@
                \divide\count0 by100
                \multiply\count0 by100
                \advance\count@ by-\count0
        \fi
        \ifnum\count@>19
                \count0\count@
                \divide\count0 by10
                \multiply\count0 by10
                \advance\count@ by-\count0
        \fi
        \ifcase\count@ th%
                \or st%
                \or nd%
                \or rd%
                \else th%
        \fi
\endgroup}
Note that this requires @ to be a letter, so \makeatletter or \catcode`@11. The first \ifnum block mods out by 100 since that plays no role in determining the ordinal suffix. The next block mods out by 10 if \count@ is at least 20. This is because 10th through 19th use th but 20th through 99th only examine the ones digit. Lastly, if the one's digit is 0, 4, 5, 6, 7, 8, or 9, (or the tens digit is 1) use th, otherwise use st, nd, or rd as appropriate. Using old-style superscripts requires a trivial modification.

Default units

In the previous post, I mentioned that when changing font sizes, there is a neat hack that allows the font size macros to have default units. That is the subject of today's brief post. The setup is you want to devise a macro that sets a dimension register to a particular value. For example,
\newdimen\foo
\newcommand*\set[1]{\foo#1\relax}
Now if you know that most of the time the units used are in points—as is the case with the font size macros—you can add the units in the \set macro,
\newcommand*\set[1]{\foo#1 pt\relax}
This works unless you want to sometimes specify your own units. At this point, we can use the \afterassignment primitive and a macro with a delimited argument to scoop up the default units. This is exactly what the \@defaultunits macro does.
\def\@defaultunits{\afterassignment\remove@to@nnil}
\def\remove@to@nill#1\@nnil{}
We can change our \set macro to be
\newcommand*\set[1]{\@defaultunits\foo#1 pt\relax\@nnil}
Let's take a look to see what this does in two cases.
\set{10}
\set{10ex}
In the first case, \set expands which causes \@defaultunits to expand and the \afterassignment is set to \remove@to@nnil. Next, \foo 10 pt\relax causes the dimension register specified by \foo to be set to 10pt and now \remove@to@nnil expands which gobbles up the \@nnil and \set is finished.

In the second case, \@defaultunits expands and \afterassignment is set as before. This time, \foo 10ex causes the dimension register to be set to 10ex and \remove@to@nnil expands which gobbles the pt\relax\@nnil and \set is finished. The \relax is more crucial if the register being set is a skip register rather than a dimension register since the pt by itself is not sufficient for the assignment but the \relax causes TeX to stop looking for more skip specification tokens. Less brief than I intended.

Wednesday, March 31, 2010

Global font size changes

The question of how to change the font size for a row in a table came up recently. It's tricky since each cell in a tabular is in its own group. My thought was a global font size change for the row that needed it and then a global change back. Since commands like \tiny actually perform many different assignments, \global\tiny does not do what we need. TeX provides an \afterassignment primitive that takes the next token and places it after the next assignment. For example,
\afterassignment\TeX
\count255=3
\the\count255
acts like
\count255=3
\TeX
\the\count255
The basic idea is to use \afterassignment with a helper macro which uses \afterassignment and \global.
\def\helper{\afterassignment\helper\global}
This doesn't quite work since there's no good way to stop it. Modifying it slightly to make it conditional allows us to turn it off:
\newif\ifhelper
\def\helper{\ifhelper\afterassignment\helper\global\fi}
Two more snags remain. First, the font size changing macros actually make use of \afterassignment to allow default units (a neat trick in its own right) and \global\afterassignment is an error so we need to reimplement it. Similarly, vrule is used and \global \vrule is an error so we have to work around that. The final snag is that the first time TeX executes one of the font size commands like \tiny it performs extra work and that extra work doesn't play so nicely with our hack. The workaround for this is to simply execute the font size changing macro first. The complete code looks like this.
\newif\ifhelper
\makeatletter
\def\unithelper#1\@nnil{\global\helpertrue\helper}
\def\helper{\ifhelper\afterassignment\helper\global\fi}
\newcommand*\globalfontsize[1]{%
\begingroup
       #1%
       \def\@defaultunits{\helperfalse\afterassignment\unithelper\global}%
       \let\realvrule\vrule
       \def\vrule{\helperfalse\global\helpertrue\afterassignment\helper\realvrule}%
       \global\helpertrue
       \helper#1%
       \helperfalse
\endgroup}
\makeatother
\newcommand*\globaltiny{\globalfontsize\tiny}
\newcommand*\globalnormalsize{\globalfontsize\normalsize}
Now we can use \globaltiny and \globalnormalsize in the table to change the font for the whole row.