Saturday, July 2, 2011

Using TeX to merge \input from top level files

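This is silly. There are better ways to do it. Still, it was sort of fun to write. The plain TeX program below (it needs ε-TeX for \readline and \unless) asks for an input file name and an output file name. It then copies the input to the output, replacing each \input{file} with the contents of file. Only the top-level file is scanned; included files are copied verbatim.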
\endlinechar=-1
\newread\in
\newwrite\out
\message{Please enter input file name: }
\read16to\inname
\openin\in=\inname \relax
\ifeof\in
        \immediate\write16{Failed to open \inname.}
        \expandafter\end
\fi
\message{Please enter output file name: }
\read16to\outname
\immediate\openout\out=\outname \relax
\begingroup
\catcode`@0
\catcode`(1
\catcode`)2
\catcode`\{12
\catcode`\}12
\catcode`I12
\catcode`N12
\catcode`P12
\catcode`U12
\catcode`T12
\catcode`\\12
@lowercase(
        @gdef@dosplitline#1\INPUT{#2}#3@splitsentinal(@def@ante(#1)@def@file(#2)@def@post(#3))
        @gdef@splitline(@expandafter@dosplitline@line\INPUT{@sentinal}@splitsentinal)
)
@endgroup
\def\splitpost{\expandafter\dosplitline\post\splitsentinal}
\def\sentinal{\sentinal}
\catcode`\%12
\def\processline{
        \ifx\file\sentinal
                \immediate\write\out{\ante}
                \let\temp\relax
        \else
                \immediate\write\out{\ante%}
                \let\temp\processline
                \copyfile
                \splitpost
                \ifx\empty\ante
                        \ifx\file\sentinal
                                \let\temp\relax
                        \fi
                \fi
        \fi
        \temp
}
\newread\f
\def\copyfile{
        \openin\f=\file\relax
        \ifeof\f
                \immediate\write16{Failed to open \file. Continuing.}
        \else
                \begingroup
                \loop
                        \readline\f to\line
                        \unless\ifeof\f
                        \immediate\write\out{\line}
                \repeat
                \endgroup
                \closein\f
        \fi
}

\loop
        \readline\in to\line
        \unless\ifeof\in
        \splitline
        \processline
\repeat
\closein\in
\immediate\closeout\out
\end
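As a usage sketch, suppose the program above is saved as merge.tex (a hypothetical name) and run with an ε-TeX engine, for example etex merge.tex, answering main.tex and merged.tex at the two prompts. Given a top-level file main.tex
\documentclass{article}
\begin{document}
\input{intro}
\end{document}
and an included file intro.tex containing the single line
Hello, world.
tracing the macros, merged.tex should come out as
\documentclass{article}
\begin{document}
%
Hello, world.
\end{document}
The lone % line is the (empty) text before \input{intro}, written with a trailing %. Only the braced form \input{...} is recognized, since the delimiter in \dosplitline is the literal character sequence \input{.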

Friday, June 3, 2011

Inverted pyramid typesetting

University thesis committees are fairly well known in the typesetting world for having the most absurd requirements. One example: heading text must be centered, no more than 4.5 in wide, and set so that each line is shorter than the one before it.

This is stupid. Still, I've got a solution, based on egreg's:
\newcommand\stupid[1]{%
        \vbox{%
                \hsize=4.5in
                \parindent=0pt
                \leftskip=0pt plus.5fil
                \rightskip=0pt plus-0.5fil
                \parfillskip=0pt plus1fil
                \emergencystretch=1in
                \parshape6
                0.00in 4.50in
                0.25in 4.00in
                0.50in 3.50in
                0.75in 3.00in
                1.00in 2.50in
                1.25in 2.00in
                \huge
                \bfseries
                \strut
                #1%
        }%
}
The \parshape specifies the first 6 lines by giving pairs of dimensions. The first in each pair is the indentation and the second is the line length; lines after the sixth reuse the last pair. The settings of \leftskip, \rightskip, and \parfillskip come from TeX by Topic and are used to center the last line.
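To see why the shape stays centered, write $i_k$ and $\ell_k$ for the indentation and length of line $k$ (notation introduced here, not part of the macro). The values above follow
\[ i_k = 0.25k\ \mathrm{in}, \qquad \ell_k = 4.5\ \mathrm{in} - 2i_k, \qquad k = 0, 1, \dots, 5, \]
so the space left over on the right of each line is
\[ 4.5\ \mathrm{in} - i_k - \ell_k = i_k, \]
the same as the indentation on the left. For the last line, $i_5 = 1.25$ in and $\ell_5 = 4.5 - 2.5 = 2.00$ in.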

Thursday, May 26, 2011

Typesetting on a grid 1: heightrounded

One thing that I dislike about LaTeX's output—especially in two columns—is that lines of prose are not typeset on a grid. I'm hoping to do a series of posts on little things that can be done to improve the situation. (There is the grid package, but I've never had it work for me.)

One thing that really stood out like a sore thumb to me, especially with two columns and large paragraphs, is that frequently there is some small space between paragraphs. The reason for this is quite simple. Often one has fixed margin and leading requirements, say a 1" margin on each side and 12 pt leading. At 72.27 pt per inch on 8.5" x 11" paper, one cannot fit an integral number of lines of text on a page. In fact, with a 1" margin and 12 pt leading, one can fit 54 lines of text on the page with 2.43 pt to spare.
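The arithmetic behind those numbers: the text block is the 11 in page height less two 1 in margins, and each line takes 12 pt, so
\[ 11\ \mathrm{in} - 2\ \mathrm{in} = 9\ \mathrm{in} = 9 \times 72.27\ \mathrm{pt} = 650.43\ \mathrm{pt} = 54 \times 12\ \mathrm{pt} + 2.43\ \mathrm{pt}. \]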

There are two things that can be done about this. The first is to change \topskip so that the top line of each page is moved down. The second is to change the \textheight so that the bottom of the page is moved up. The right way to do the second is to use the heightrounded option from the geometry package. Since it will be useful later, we will also set \topskip to \baselineskip. In essence, we will be doing both at once.

This looks something like this.
\topskip=\baselineskip
\usepackage[margin=1in,heightrounded]{geometry}
Note that we need to change \topskip before we use geometry since it uses the value of \topskip that is in force when the option is loaded. (Alternatively, one can use \geometry{heightrounded} to use the new value, for example if geometry has already been loaded.)
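A minimal sketch of the alternative, for when geometry has already been loaded; the class option and margin are only examples:
\documentclass[twocolumn]{article}
\usepackage[margin=1in]{geometry}
% \topskip is changed after geometry has already computed the layout...
\setlength{\topskip}{\baselineskip}
% ...so ask for heightrounded again; geometry recomputes with the new \topskip
\geometry{heightrounded}
\begin{document}
% document body
\end{document}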

Sunday, March 27, 2011

LaTeX's failure with floats

It's probably fairly uncontroversial to say that floats are one of the main areas where LaTeX performs poorly in comparison to WYSIWYG editors. The basic complaint is that floats just don't go where we want them.

To compensate for this, people often use the h float specifier or H from the float package to say, “place the figure here!” This is often a poor approach since there's no real notion of where “here” is. This leads to moving the code around between paragraphs, trying to find a reasonable place to put it. Since this is a bad idea, I'm not going to focus on it. Instead, I'm going to talk about real floating material.

Part of the problem is that TeX produces output one page at a time. Once a page is finished, it is shipped out (i.e., written to the dvi or pdf) and not touched again. What this means for LaTeX is that by the time it has seen the \begin{figure} (or other floating material), it has already finished with all of the pages before it. So the best it can do is a complicated interaction with the output routine, which tries to place the figure (subject to the specifiers) on the current page. If that fails, the figure is put on a defer list to be placed later.

Often, what we really want is for the image to go onto the previous page since that leads to better overall placement. But since LaTeX cannot handle that, we're forced to move the image code ourselves, just like we had to do with the H specifier.

One partial solution is to put each float in a separate file, say floatfoo.tex, and then move \input{floatfoo} around until a reasonable placement is found. This is not entirely satisfactory since we are still stuck with a guess-and-check procedure.

What I would like is a solution that allows the author to specify a page number (and position) and have the float placed there, if at all possible. I haven't fully thought through what I'd like in an interface, so here are some thoughts about requirements and challenges.
  • The interface should work with twocolumn documents at the very least, and it would be better if it also supported the multicol package. Something like
    \begin{figure}[page=4,column=2,position=tb]
    might be nice.
  • There are tokenization issues, so it is probably not acceptable to tokenize the body of the float, store it somewhere, and then reproduce it when needed, since category codes are assigned at tokenization time. This almost certainly requires writing the body out to another file.
  • One idea is to use the filecontents environment to write the body of the figure to a separate file with appropriate \if... guards. I'm envisioning something like
    \begin{figure}[page=4,position=tb]
        \centering
        \includegraphics{foo}
        \caption{bar}
        \label{fig:foo}
    \end{figure}
    being written to \jobname.figs as
    \ifnum4=\count0
    \begin{figure}[tb]
        \centering
        \includegraphics{foo}
        \caption{bar}
        \label{fig:foo}
    \end{figure}
    \fi
    Then, for each page, \jobname.figs is \input in a manner similar to \afterpage or \AtBeginPage. This would need something extra for twocolumn documents.
  • There's the question of trying to keep figures in order if some are specified with particular page requirements and others are not.
  • There's an issue if a float depends on a macro being defined but the page specifier puts it before the definition. I don't see how to get around that.
  • There's an issue with trying to work with other floating environments such as the excellent lstlisting from the listings package.
I'm sure there are more issues I haven't considered. This does seem doable though.

Sunday, March 20, 2011

Knuth quote V

Somewhat mysteriously, in the middle of the chapter on macros in The TeXbook, Knuth defines \rhead—the macro he uses to keep track of the running headline. The definition itself is a little odd in that when \rhead is executed, it globally redefines \rhead to be almost the same text sans the definition.
\def\rhead{Chapter \chapno: Definitions (aka Macros)% my little joke
  \gdef\rhead{Chapter \chapno: Definitions (also called Macros)}}
However, it is the comment here that interests me. What is his joke? My best guess is that he used macros to change the running headline after the page on which this \def appears. He never calls attention to this change in the text and never explains the joke.
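The self-redefining trick is easy to try in plain TeX. In this small sketch \chapno is defined by hand as 20, an assumption made just for the demo:
\def\chapno{20}
\def\rhead{Chapter \chapno: Definitions (aka Macros)% my little joke
  \gdef\rhead{Chapter \chapno: Definitions (also called Macros)}}
\rhead\par % first use: typesets the "aka" text, then globally redefines \rhead
\rhead\par % every later use: typesets the "also called" text
\bye
The first \rhead typesets “Chapter 20: Definitions (aka Macros)” and every later one typesets “Chapter 20: Definitions (also called Macros)”.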

Saturday, March 19, 2011

Random numbers in TeX

Recent versions of pdfTeX contain primitives for generating random integers.
  • \pdfuniformdeviate num generates a uniformly distributed random integer in the range [0, num).
  • \pdfnormaldeviate generates a normally distributed random integer with mean 0 and “a unit of 65536”. (I've never seen unit used that way, so I'm not sure exactly what the manual means.)

These are both expandable and can be used if you need random numbers for some reason. Here's one toy example that generates coin flips with a biased coin.
\def\coinflip#1{%
        \ifnum#1>\pdfuniformdeviate1000
                H%
        \else
                T%
        \fi
}
\tt
\parindent=0pt
\raggedright
\newcount\n \n=0
\newcount\heads \heads=0
\newcount\tails \tails=0
\loop\ifnum\n<1000
        \if\coinflip{327}H%
                \advance\heads by 1
                H
        \else
                \advance\tails by 1
                T
        \fi
        \advance\n by 1
\repeat

\vskip\baselineskip
\rm
$p=0.327$\par
Heads: \number\heads\par
Tails: \number\tails
\bye
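This generates 1000 coin flips with a biased coin whose probability of heads is p = 0.327, since \coinflip{327} gives H exactly when the random integer in [0, 1000) is less than 327. It prints the result of each flip and counts the number of heads and tails.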
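Because the primitives are expandable, they also compose with ε-TeX's \numexpr, which pdfTeX provides. Here is a small sketch of a die roller for plain pdfTeX; \randint is a hypothetical name, and \pdfsetrandomseed fixes the seed so that every run gives the same sequence:
\pdfsetrandomseed 2011 % fixed seed: repeatable results
% \randint{a}{b}: expands to a uniformly random integer in [a, b]
\def\randint#1#2{%
        \number\numexpr#1+\pdfuniformdeviate\numexpr#2-#1+1\relax\relax
}
Rolls: \randint{1}{6}, \randint{1}{6}, \randint{1}{6}.
\bye
Since \pdfuniformdeviate n gives an integer in [0, n), adding a to a value in [0, b - a + 1) gives one in [a, b].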

Sunday, March 13, 2011

Knuth quote IV

In The TeXbook, when Knuth quotes Lamport's remark that writing Greek letters is “... as easy as $\pi$”, he cites the book as LaTeX Document Preparation System. He comments,
Note: the final manual has a slightly different wording on p43.
It's now called "LaTeX: A Document Preparation System" (1986)
But I decided to cite the original, partly because I have
no smallcaps sans-serif `A' to match the new LaTeX logo!

Friday, February 4, 2011

Fixing ugly Times math

Some time ago, I gave recommendations for packages to use to produce output in Times roman fonts (and others that look nice with Times). Doing this has the downside that certain aspects of the Times math fonts are really ugly. In particular, the sum operator and the calligraphic alphabet look awful.


René van Bevern has a solution for using Computer Modern for \mathcal and \sum along with the mathptmx package.

Using his idea, I have a very similar solution for using the Latin Modern font family instead.
\SetMathAlphabet{\mathcal}{normal}{OMS}{lmsy}{m}{n}
\SetMathAlphabet{\mathcal}{bold}{OMS}{lmsy}{m}{n}
\SetSymbolFont{largesymbols}{normal}{OMX}{lmex}{m}{n}
\SetSymbolFont{largesymbols}{bold}{OMX}{lmex}{m}{n}

Sunday, January 30, 2011

Text font sizes

One of the most commonly asked questions is how to change font sizes in LaTeX beyond those offered by \tiny through \Huge. There are a number of different font parameters that can be selected using the New Font Selection Scheme. In particular, one can select the exact font size and line height (the leading). To select a 10pt font on 12pt leading, one can use
\fontsize{10}{12}%
\selectfont
Of course, 10 and 12 can be replaced with any values. Since \fontsize passes its arguments through \@defaultunits, units other than pt can be used (a bare number is taken to be in points).
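For example, here is a sketch of a size larger than \Huge. The command name \gigantic is a hypothetical choice, and lmodern is loaded so that the requested sizes are available as scalable fonts instead of being replaced by the nearest available size:
\documentclass{article}
\usepackage{lmodern}
% hypothetical command: 40pt type on 48pt leading
\newcommand\gigantic{\fontsize{40}{48}\selectfont}
\begin{document}
{\gigantic Big\par}
% units other than pt: 0.25in type on 0.3in leading
{\fontsize{0.25in}{0.3in}\selectfont Sized in inches.\par}
\end{document}
The \par before each closing brace matters: the leading in force when the paragraph is broken into lines, at the \par, is the one that gets used.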

Saturday, January 29, 2011

Knuth quote III

Here's a great one from the source code to TeX. It's the beginning of section 1154.
The simplest math formula is, of course, ‘$ $’, when no noads are generated. The next simplest cases involve a single character, e.g., ‘$x$’. Even though such cases may not seem to be very interesting, the reader can perhaps understand how happy the author was when ‘$x$’ was first properly typeset by TeX.
Being a programmer myself, I most certainly can.

A noad, by the way, is what TeX calls (most of) the elements of a math list. (It can also contain nodes, which are what appear in horizontal and vertical lists.)

Sunday, January 16, 2011

Knuth quote II

Okay, this one isn't a comment in code, yet it amuses me anyway. This is Exercise 24.1 in The TeXbook.
Can you think of a reason why you might want ‘A₁₂’ to be a ‹hex digit› even though the letter A has category 11? (Don't worry if your answer is “no.”)
For the record, my answer was “yes.” Unfortunately, my reason wasn't correct.

Friday, January 14, 2011

Knuth quote I

Scattered throughout Knuth's programs are a number of great comments. I really enjoy encountering these and I thought I'd share them. (As I wrote in my first post, this blog is mostly a way for me to collect pieces of information. As such, I'm mostly collecting these quotes for me.)

In tex.web, line 950 is
last:=first; {cf.\ Matthew 19\thinspace:\thinspace30}
This line of code assigns first to the variable last. There are many translations of that verse; the King James version is
But many [that are] first shall be last; and the last [shall be] first.

Wednesday, January 12, 2011

Better \mbox and \fbox

As currently implemented in LaTeX, \mbox and \fbox (among others) have a bizarre limitation: their arguments cannot change category codes. One practical consequence of this is that \verb is not allowed in the arguments.

This limitation is completely artificial and with a little care can be removed. To see why this limitation exists, it's helpful to take a look at the implementation of \mbox.
\long\def\mbox#1{\leavevmode\hbox{#1}}
From this, it becomes clear why category codes cannot be changed: the argument #1 is tokenized when \mbox reads it, so its category codes are fixed before anything inside it, such as \verb, gets a chance to run.

The fix for this is simple and has two consequences.
\def\bettermbox{\leavevmode\hbox}
Here, \bettermbox takes no arguments, so \bettermbox{foo} expands to \leavevmode\hbox{foo}. The first consequence is that the box contents are not tokenized until \hbox has started the box, so we are free to write
\bettermbox{\verb!&^%$#!}
The second consequence is slightly more subtle. We can give box specifications such as to 3in or spread 12pt before the left brace.
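A short sketch of both uses (the dimensions are arbitrary):
\bettermbox{\verb!&^%$#!}
\bettermbox to 3in{\hfil centered in a 3in box\hfil}
\bettermbox spread 12pt{a b c}
In the last line, the box is 12pt wider than its natural width, and the extra space goes to the interword glue.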

Okay, fixing \mbox was easy, but what about something more complicated like \fbox? First we need to look at its definition.
\long\def\fbox#1{%
        \leavevmode
        \setbox\@tempboxa\hbox{%
                \color@begingroup
                \kern\fboxsep
                {#1}%
                \kern\fboxsep
                \color@endgroup
        }%
        \@frameb@x\relax
}
The actual framing of the box happens in \@frameb@x. Again, the parameter is tokenized when it need not be. Unfortunately, this time, it's not immediately obvious how to proceed. The trick is to use \afterassignment to insert code just after the opening brace (and before the tokens from \everyhbox, if any, are inserted) and then to use \aftergroup to close the box and typeset it using \@frameb@x.

Here is the code.
\def\betterfbox{%
        \leavevmode
        \afterassignment\bfb@i
        \setbox\@tempboxa\hbox
}
\def\bfb@i{%
        \color@begingroup
        \kern\fboxsep
        \bgroup
        \aftergroup\bfb@ii
}
\def\bfb@ii{%
        \kern\fboxsep
        \color@endgroup
        \egroup
        \@frameb@x\relax
}
If we expand this all out, we see that \betterfbox{foo} becomes
\leavevmode
\setbox\@tempboxa\hbox{%
        \color@begingroup
        \kern\fboxsep
        \bgroup
        foo}%
        \kern\fboxsep
        \color@endgroup
\egroup
\@frameb@x\relax
The only difference between that and \fbox is that the \hbox is delimited by { and \egroup, and foo by \bgroup and }, whereas in \fbox all four are explicit brace tokens.
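Since the helper macros contain @, the definitions above belong in a package or between \makeatletter and \makeatother. A usage sketch showing that \betterfbox accepts both \verb and a box specification:
\betterfbox{\verb!&^%$#!}
\betterfbox to 2in{\hfil framed\hfil}
In the second line, the 2in includes the two \kern\fboxsep around the text; \@frameb@x then adds the rules outside that box.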

So did the LaTeX team make the right choice? I'm not sure. The original definitions are certainly clearer. A general rule of thumb I try to follow when writing TeX code that others might use is to delay tokenization as long as possible.

As a final point, this is much easier to do with environments than macros because the code that ends the box can just go in the \endfoo macro. LaTeX makes extensive use of this.
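For instance, here is a sketch of an environment version built on LaTeX's lrbox environment, which starts the box in its begin code and finishes it in its end code, so \verb works in the body. The names verbfbox and \verbfboxcontents are hypothetical:
\newsavebox{\verbfboxcontents}
\newenvironment{verbfbox}
  {\begin{lrbox}{\verbfboxcontents}}
  {\end{lrbox}\fbox{\usebox{\verbfboxcontents}}}
Usage:
\begin{verbfbox}\verb!&^%$#!\end{verbfbox}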

Tuesday, January 11, 2011

A simpler .dtx template

Two things that annoyed me with making LaTeX packages using a .dtx file were the duplication of the copyright/license info and that nearly every line in the file was commented out. About the only thing that wasn't commented out was a small batch file that generated the output.

For the most part, both of those annoyances are unnecessary. The following is a simple template file for making a .sty file from a .dtx. Almost every place that the name of the package would appear has been replaced with \jobname. The exceptions are in the license at the top of the file (I am not a lawyer and so I'd rather not give an opinion on what it means to claim that \jobname.dtx and \jobname.sty are covered by the license) and then near the end where the SVN keyword Id has been expanded out.
% \iffalse The license starting three lines down applies to this file
%<*batchfile>
{\obeylines\obeyspaces \gdef\thepreamble{
Copyright (C) 2011 by YOUR NAME

This file may be distributed and/or modified under the
conditions of the LaTeX Project Public License, either
version 1.3c of this license or (at your option) any later
version.  The latest version of this license is in:

    http://www.latex-project.org/lppl.txt

and version 1.3c or later is part of all distributions of
LaTeX version 2005/12/01 or later.

This work has the LPPL maintenance status 'maintained'.

The Current Maintainer of this work is YOUR NAME.

This work consists of THISFILE.dtx and the derived file
THISFILE.sty.
}}
\begingroup
\input docstrip
\keepsilent
\usedir{tex/latex/\jobname}
\expandafter\preamble\thepreamble\endpreamble
\askforoverwritefalse
\generate{\file{\jobname.sty}{\from{\jobname.dtx}{}}}
\endgroup
\documentclass{ltxdoc}
\usepackage{\jobname}
\usepackage[margin=1.5in]{geometry}
\usepackage[pdfborder={0 0 0}]{hyperref}

\CheckSum{0}
\CharacterTable
 {Upper-case    \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z
  Lower-case    \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z
  Digits        \0\1\2\3\4\5\6\7\8\9
  Exclamation   \!     Double quote  \"     Hash (number) \#
  Dollar        \$     Percent       \%     Ampersand     \&
  Acute accent  \'     Left paren    \(     Right paren   \)
  Asterisk      \*     Plus          \+     Comma         \,
  Minus         \-     Point         \.     Solidus       \/
  Colon         \:     Semicolon     \;     Less than     \<
  Equals        \=     Greater than  \>     Question mark \?
  Commercial at \@     Left bracket  \[     Backslash     \\
  Right bracket \]     Circumflex    \^     Underscore    \_
  Grave accent  \`     Left brace    \{     Vertical bar  \|
  Right brace   \}     Tilde         \~}

\DoNotIndex{\def}

\EnableCrossrefs
\CodelineIndex
\RecordChanges
\GetFileInfo{\jobname.sty}
\title{The \textsf{\jobname} package\thanks{This document
corresponds to \textsf{\jobname}~\fileversion, dated \filedate.}}
\author{YOUR NAME\\\texttt{YOU@YOUR.ADDRESS}}

\begin{document}
\maketitle

\phantomsection
\addcontentsline{toc}{section}{\abstractname}
\begin{abstract}
The \textsf{\jobname} package XXX.
\end{abstract}

\phantomsection
\addcontentsline{toc}{section}{\contentsname}
\tableofcontents

\section{Introduction}
The \textsf{\jobname} package XXX

\section{Usage}
XXX

\StopEventually{
        \typeout{**************************************************}
        \typeout{*}
        \typeout{* To finish the installation, you have to move the}
        \typeout{* following file into a directory searched by TeX:}
        \typeout{*}
        \typeout{* \space\space \jobname.sty}
        \typeout{*}
        \typeout{* Documentation is in \jobname.\ifpdf pdf\else dvi\fi.}
        \typeout{*}
        \typeout{* Happy TeXing!}
        \typeout{**************************************************}
        \end{document}
}
\clearpage
\DocInput{\jobname.dtx}
\clearpage
\phantomsection
\addcontentsline{toc}{section}{Change History}
\PrintChanges
\phantomsection
\addcontentsline{toc}{section}{Index}
\PrintIndex
\Finale
%</batchfile>
% \fi
%
% \section{Implementation}
% XXX This is the only part of the file where the main text is written
% with leading comments.
%
% \changes{v1.0}{2011/01/09}{Initial version}
%    \begin{macrocode}
\NeedsTeXFormat{LaTeX2e}[1999/12/01]
\RequirePackage{svn-prov}
\ProvidesPackageSVN
        {$Id: example.dtx 1 2011-01-11 07:21:05Z TH $}
        [v1.0 \revinfo\ SHORT DESCRIPTION.]

% YOUR PACKAGE CONTENTS HERE.
\endinput
%    \end{macrocode}
% \endinput
A quick description of how this works. It first saves the copyright/license text into a macro \thepreamble. Then the code between \begingroup and \endgroup uses docstrip to create \jobname.sty from \jobname.dtx. This works by stripping out all lines that start with a comment character and all lines between %<*batchfile> and %</batchfile>, and prepending the preamble text. So what results is the copyright/license text followed by the actual code of the package.

After \endgroup come the standard LaTeX documentation class, the packages to use, and the check sum, character table, etc. that are normally used with the doc package and ltxdoc class.

If the implementation is not going to be typeset (for example, because \OnlyDescription was given), the argument of \StopEventually runs immediately, printing some installation instructions and ending the document. Otherwise, the argument is saved for later and \DocInput{\jobname.dtx} executes. This causes the .dtx file to be input again, but this time % is ignored. If we look at the top of the file, there is a % \iffalse, and just after the %</batchfile> is the matching % \fi. Thus, all of that will be skipped by TeX.

And that brings us to the only part of the code that is prefixed with comments: the implementation documentation. As usual, the code appears between %    \begin{macrocode} and %    \end{macrocode}. Finally, the input is finished by % \endinput. After this, you can put something like
% vim: set ts=4 sts=4 expandtab:
which will be ignored by docstrip because it starts with a comment and by \DocInput because of the \endinput just before it.

After \DocInput is finished, this prints the change history and the index. Lastly, \Finale expands to the argument of \StopEventually, ending the document.

If you name the file example.dtx, it can be compiled as follows.
$ pdflatex example.dtx
$ makeindex -s gglo -o example.gls example.glo
$ makeindex -s gind example
$ pdflatex example.dtx
$ pdflatex example.dtx
Happy TeXing!

Friday, January 7, 2011

Center last line in a paragraph

I'm not totally sure why one would want to center the last line in a paragraph, but the excellent TeX by Topic provides a surprising solution.
\leftskip=0pt plus.5fil
\rightskip=0pt plus-.5fil
\parfillskip=0pt plus1fil
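The reason this works is that, for all lines except the last, the fil stretch of \leftskip and \rightskip cancels (.5fil - .5fil = 0), so TeX stretches only the ordinary interword glue and those lines are justified as usual. In the last line, \parfillskip adds 1fil, so the total fil stretch is 1fil. The leftover space w is then shared in proportion to the stretch: .5w goes to \leftskip, -.5w to \rightskip, and w to \parfillskip, leaving .5w on each side, so the line is centered.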
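Wrapped up as a LaTeX environment, a minimal sketch might look like this (the name centerlast is hypothetical). The group formed by the environment keeps the settings local, and the \par in the end code makes sure the paragraph is broken into lines while the settings are still in force:
\newenvironment{centerlast}
  {\par
   \leftskip=0pt plus .5fil
   \rightskip=0pt plus -.5fil
   \parfillskip=0pt plus 1fil}
  {\par}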

Sunday, January 2, 2011

Counting the number of lines in a file

The simplest way to count the number of lines in a file seems to be to use the ε-TeX primitive \readline. (One could probably get away with using the TeX primitive \read instead, but that tokenizes each line with the current category codes and keeps reading further lines until the braces balance. Also, the code is slightly more complicated since we cannot use \unless without ε-TeX.)
\newread\lf
\newcount\linecount
\newcommand\linesinfile[1]{%
        \linecount-1
        \openin\lf#1
        \unless\ifeof\lf
                \loop\unless\ifeof\lf
                        \readline\lf to\lfline
                        \advance\linecount by1
                \repeat
                \closein\lf
        \fi
}
If the file does not exist, \linecount will be -1. Otherwise, it'll contain the number of lines in the file.
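A usage sketch that counts the lines of the document's own source file and handles the missing-file case:
\linesinfile{\jobname.tex}
\ifnum\linecount<0
        The file could not be opened.
\else
        This file has \the\linecount\ lines.
\fi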