tsukurimashou: Commit


Commit MetaInfo

Revision: 481
Time: 2013-11-04 10:22:35
Author: mskala

Log Message

changes for final article submission to TUGboat

Incremental Difference

--- trunk/tug2013/reach.tex (revision 480)
+++ trunk/tug2013/reach.tex (revision 481)
@@ -27,9 +27,9 @@
2727 \begin{tikzpicture}
2828 \node at (-3,0) {\scalebox{10}{\char"53CA}};
2929 \node at (-3,-3) {Kaku};
30- \draw[black!40!white,very thick] (-2.8,0.6) circle[radius=1];
30+ \draw[red,very thick] (-2.8,0.6) circle[radius=1];
3131 \node at (3,0) {\scalebox{10}{\fontspec{TsukurimashouMinchoPS}\char"53CA}};
3232 \node at (3,-3) {\fontspec{TsukurimashouMinchoPS}Mincho};
33- \draw[black!40!white,very thick] (3.2,0.6) circle[radius=1];
33+ \draw[red,very thick] (3.2,0.6) circle[radius=1];
3434 \end{tikzpicture}
3535 \end{document}
--- trunk/tug2013/outlook.tex (revision 480)
+++ trunk/tug2013/outlook.tex (revision 481)
@@ -26,7 +26,7 @@
2626 \noindent
2727 \begin{tikzpicture}[yscale=1.2]
2828 \node at (0,0) {\scalebox{3}{観}};
29- \node at (0,-1.5) {\scalebox{2}{⿰}};
29+ \node[blue] at (0,-1.5) {\scalebox{2}{⿰}};
3030 \node at (2.5,0) {\begin{bil}U+893B\\\emph{kan}\\``outlook''\end{bil}};
3131
3232 \begin{scope}
@@ -33,7 +33,7 @@
3333 \clip (-6,-6) rectangle (-3.67,-2);
3434 \node at (-3.6,-4) {\scalebox{3.3}{観}};
3535 \end{scope}
36- \node at (-4,-5.5) {\scalebox{2}{⿻}};
36+ \node[blue] at (-4,-5.5) {\scalebox{2}{⿻}};
3737 \node at (-1.5,-4) {[unknown]};
3838
3939 \node at (-7,-8) {\scalebox{3}{矢}};
--- trunk/tug2013/tree.tex (revision 480)
+++ trunk/tug2013/tree.tex (revision 481)
@@ -26,12 +26,12 @@
2626 \noindent
2727 \begin{tikzpicture}[yscale=1.2]
2828 \node at (0,0) {\scalebox{3}{語}};
29- \node at (0,-1.5) {\scalebox{2}{⿰}};
29+ \node[blue] at (0,-1.5) {\scalebox{2}{⿰}};
3030 \node at (2.5,0) {\begin{bil}U+8A9E\\\emph{go}\\``language''\end{bil}};
3131 \node at (-4,-4) {\scalebox{3}{言}};
3232 \node at (-1.5,-4) {\begin{bil}U+8A00\\\emph{i}\\``speak''\end{bil}};
3333 \node at (4,-4) {\scalebox{3}{吾}};
34- \node at (4,-5.5) {\scalebox{2}{⿱}};
34+ \node[blue] at (4,-5.5) {\scalebox{2}{⿱}};
3535 \node at (6.5,-4) {\begin{bil}U+543E\\\emph{ware}\\``myself''\end{bil}};
3636 \node at (1,-8) {\scalebox{3}{五}};
3737 \node at (3.5,-8) {\begin{bil}U+4E94\\\emph{go}\\``five''\end{bil}};
--- trunk/tug2013/forest.tex (revision 480)
+++ trunk/tug2013/forest.tex (revision 481)
@@ -26,7 +26,7 @@
2626 \noindent
2727 \begin{tikzpicture}
2828 \node at (0,0) {\scalebox{3}{林}};
29- \node at (0,-1.5) {\scalebox{2}{⿰}};
29+ \node[blue] at (0,-1.5) {\scalebox{2}{⿰}};
3030 \node at (2.5,0) {\begin{bil}U+6797\\\emph{hayashi}\\``forest''\end{bil}};
3131 \node at (-3,-4) {\scalebox{3}{木}};
3232 \node at (-0.5,-4) {\begin{bil}U+6728\\\emph{ki}\\``tree''\end{bil}};
@@ -35,10 +35,10 @@
3535 \draw[ultra thick,bigah] (-1,-1) -- (-2.5,-3);
3636 \draw[ultra thick,bigah] (1,-1) -- (2.5,-3);
3737 %
38- \draw[black!50!white,ultra thick] (-2.2,-7.5) circle[radius=2.2];
39- \fill[black!25!white,xshift={3.0cm},yshift={-7.5cm},rotate=45]
38+ \draw[green!80!black,ultra thick] (-2.2,-7.5) circle[radius=2.2];
39+ \fill[red!50!white,xshift={3.0cm},yshift={-7.5cm},rotate=45]
4040 (-2.5,-0.1) rectangle (2.5,0.1);
41- \fill[black!25!white,xshift={3.0cm},yshift={-7.5cm},rotate=-45]
41+ \fill[red!50!white,xshift={3.0cm},yshift={-7.5cm},rotate=-45]
4242 (-2.5,-0.1) rectangle (2.5,0.1);
4343 \node at (-2.1,-7.5) {\scalebox{7}{林}};
4444 \begin{scope}
--- trunk/tug2013/preprint.tex (revision 480)
+++ trunk/tug2013/preprint.tex (revision 481)
@@ -2,19 +2,12 @@
22
33 \usepackage{graphicx}
44 \usepackage{ifpdf}
5-\def\CJK{CJK}
6-%\def\GNU{GNU}
7-%\def\EPS{EPS}
8-%\def\TikZ{Ti{\em k}Z}
9-%\def\XML{XML}
10-%\usepackage{metalogo}
11-
12-%\ifpdf
13-%\usepackage[breaklinks,colorlinks,linkcolor=black,citecolor=black,
14-% urlcolor=black]{hyperref}
15-%\else
5+\ifpdf
6+\usepackage[breaklinks,colorlinks,linkcolor=black,citecolor=black,
7+ urlcolor=black]{hyperref}
8+\else
169 \usepackage{url}
17-%\fi
10+\fi
1811
1912 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2013
@@ -36,9 +29,7 @@
3629 \maketitle
3730
3831 \begin{abstract}
39-\MF-based font projects for the Chinese,
40-\linebreak%%!!
41-Japanese, and Korean (\CJK)
32+\MF-based font projects for the Chinese, Japanese, and Korean (\CJK)
4233 languages have been announced every few years since the early 1980s,
4334 even predating the current form of the \MF\ language. Except for a few
4435 non-parameterized conversions of fonts that originated in other formats,
@@ -45,7 +36,7 @@
4536 in 30 years every \MF\ \CJK\ font has been abandoned at or before the
4637 8-bit barrier of 256 \emph{kanji}, nowhere near the thousands required for
4738 practical typesetting. In this presentation I describe the first
48-project to break that barrier: $\!$Tsukurimashou\kern-0.7pt\
39+project to break that barrier: Tsukurimashou
4940 (\url{http://tsukurimashou.sourceforge.jp/}), currently at over 1500
5041 \emph{kanji} (as well as kana, Latin, and Korean hangul) and steadily growing.
5142 I discuss technical and human challenges facing this kind of project,
@@ -58,15 +49,11 @@
5849 \section{Introduction}
5950
6051 The Han script, used by the Chinese, Japanese, and Korean (\CJK) languages
61-among others, includes
62-\linebreak%%!!
63-very many characters. Just counting them is tricky,
52+among others, includes very many characters. Just counting them is tricky,
6453 but a human being might typically need to know a few thousand for basic
6554 literacy in a Han-script language. The list of 2136 characters taught in
6655 the Japanese school system (the \emph{jouyou kanji}) is one benchmark, near
67-the low end. Chinese requires
68-\linebreak
69- more, and a typesetting system may require
56+the low end. Chinese requires more, and a typesetting system may require
7057 more still, because of rare characters found in names, historical contexts,
7158 and so on. A human being can get away with failing to read the occasional
7259 character; typesetting systems need to be able to print nearly all of them.
@@ -79,9 +66,7 @@
7966 building even a simple Latin font with \MF, it may be no surprise that there
8067 are no complete \MF-native \CJK\ typefaces. But on the other hand,
8168 examination of Han-script text (even, or especially, by someone who cannot
82-read it) quickly reveals that characters can be decomposed into
83-\linebreak%%!!
84-smaller
69+read it) quickly reveals that characters can be decomposed into smaller
8570 parts, as shown in Figure~\ref{fig:tree}. Computer scientists who examine
8671 Figure~\ref{fig:tree} are likely to believe they understand it. ``Of
8772 course,'' one supposes, ``the tens of thousands of Han characters are just a
@@ -115,9 +100,7 @@
115100 Figure~\ref{fig:tree}. Mei's report includes images of 346 ``basic strokes
116101 and radicals,'' and 112 completed characters.
117102
118-Subsequent work on \MF-native \CJK
119-\linebreak%%!!
120-fonts includes that of Hobby and Guoan
103+Subsequent work on \MF-native \CJK\ fonts includes that of Hobby and Guoan
121104 in 1984, who created 128 characters~\cite{Hobby:Chinese}; Hosek in 1989,
122105 character count unknown but two are displayed in the \TUB\
123106 article~\cite{Hosek:Design}; Yiu and Wong in 2003, in a project that
@@ -128,14 +111,11 @@
128111 to form more complicated characters.
129112
130113 I listed published \MF-related projects. Similar ideas have also been used
131-behind closed
132-\linebreak%%!!
133- doors in commercial font foundries (\acro{CDL} from Wenlin
114+behind closed doors in commercial font foundries (\acro{CDL} from Wenlin
134115 Institute seems to be an example~\cite{Wenlin:CDL}), and non-\MF\ research
135116 projects like the \acro{LISP}-based Wadalab toolkit~\cite{Tanaka:Wadalab}.
136117 The Wadalab font project ran during the 1990s; much of the work was lost or
137-withdrawn after hard drive failures and copyright infringement concerns that
138-came to light in 2003, but some of its fonts survived to become widely used
118+withdrawn, but some of its fonts survived to become widely used
139119 in the free software world. These kinds of projects use grammars of
140120 character parts, but they lack the full parameterization that \MF\ users
141121 expect. There has also been work on using \CJK\ fonts from other sources in
@@ -168,9 +148,7 @@
168148
169149 Many font file formats are limited to 256 glyphs by their use of 8-bit
170150 character codes. People who attempt to typeset \CJK\ documents in classical
171-\TeX\ use elaborate workarounds involving slicing their
172-\linebreak%%!!
173- fonts into 256-glyph
151+\TeX\ use elaborate workarounds involving slicing their fonts into 256-glyph
174152 sub-fonts. Handling the input encoding for documents written in large
175153 character sets with these slicing schemes is a tough problem too, but
176154 fortunately not one we must solve as font designers. There are extended
@@ -194,9 +172,7 @@
194172 little or no previous work on them in the \MF\ context because nobody has
195173 built systems this size in \MF\ before.
196174
197-Classical \MF\ is designed to produce
198-\linebreak%%!!
199-bitmap fonts, but bitmap fonts are no
175+Classical \MF\ is designed to produce bitmap fonts, but bitmap fonts are no
200176 longer such a desired commodity. A present-day \CJK\ font project will
201177 presumably target a vector format, but making \MF\ or some variation of it
202178 produce vector fonts requires additional layers of software, all of which
@@ -243,9 +219,7 @@
243219 GlyphWiki~\cite{Kamichi:GlyphWiki}, which sacrifices parameterization for a
244220 more purely graphical approach that demands less from the participants.
245221
246-Finally, many of the potential rewards of a
247-\linebreak%%!!
248-\MF\ \CJK\ project, such as
222+Finally, many of the potential rewards of a \MF\ \CJK\ project, such as
249223 academic publications, can be had at the start, before the boring part; and
250224 then there are no more rewards until the end, and few then. You can publish
251225 one paper about your innovative techniques for building fonts; and you can
@@ -280,7 +254,6 @@
280254 \includegraphics[scale=0.70]{forest.pdf}
281255 \caption{A forest is not two identical trees.}
282256 \label{fig:forest}
283-\vskip-1mm%%!!
284257 \end{figure}
285258
286259 In Figure~\ref{fig:outlook}, the left side of ``outlook,'' in addition to
@@ -327,9 +300,7 @@
327300
328301 But in order to produce high-quality fonts with full parameterization, with
329302 all the characters needed to typeset real documents, we must be able to
330-override the simple descriptions and combinations of
331-\linebreak%%!!
332- parts in arbitrarily
303+override the simple descriptions and combinations of parts in arbitrarily
333304 complicated ways\Dash per character and depending non-linearly on the
334305 parameters. To work at full scale, the font description language must have
335306 the power of a general-purpose programming language.
@@ -368,13 +339,9 @@
368339 corporate or large-scale collaborative effort.
369340 \end{itemize}
370341
371-Tsukurimashou is hosted as a free software
372-\linebreak%%!!
373- project on SourceForge Japan,
342+Tsukurimashou is hosted as a free software project on SourceForge Japan,
374343 with the bilingual project home page at
375-\url{http://tsukurimashou.}
376-\linebreak%%!!
377-\url{sourceforge.jp/} featuring downloadable packages,
344+\url{http://tsukurimashou.sourceforge.jp/} featuring downloadable packages,
378345 a Subversion repository for the source code, a bug tracker, mailing list,
379346 and so on. The package as a whole is distributed under the \GNU\ General
380347 Public License, version 3, with a clarifying paragraph added to explicitly
@@ -408,7 +375,7 @@
408375 other projects. Some of it is publishable research in computer science,
409376 certainly welcome for someone hoping to establish an academic career. And
410377 because it places heavy (in some cases unprecedented) demands on other free
411-software systems, Tsukurima\-shou has proven useful in the development of
378+software systems, Tsukurimashou has proven useful in the development of
412379 those systems. Given that I am already committing to spend some time per
413380 character on learning the language, the hope is to make that time pay off in
414381 as many ways as possible.
@@ -415,9 +382,7 @@
415382
416383 \subsection{A brief tour of the fonts}
417384
418-Tsukurimashou as a software package generates
419-\linebreak%%!!
420- OpenType font files as its
385+Tsukurimashou as a software package generates OpenType font files as its
421386 main output. Those are intended for use in general typesetting and word
422387 processing, not only within the \TeX\ world. I most often use them with
423388 \XeTeX. The OpenType fonts are divided up into families, of which the main
@@ -441,7 +406,6 @@
441406 \includegraphics[scale=0.80]{styles.pdf}
442407 \caption{A sample of the Tsukurimashou meta-family of fonts.}
443408 \label{fig:styles}
444-\vskip-2mm%%!!
445409 \end{figure}
446410
447411 \begin{figure}
@@ -448,49 +412,38 @@
448412 \includegraphics[scale=0.69]{mincho.pdf}
449413 \caption{\emph{Kana} and Grade One \emph{kanji} in Tsukurimashou Mincho.}
450414 \label{fig:mincho}
451-\vskip-1mm%%!!
452415 \end{figure}
453416
454-These are outline fonts intended for
455-\linebreak%%!!
456-high-resolution printing. They contain
417+These are outline fonts intended for high-resolution printing. They contain
457418 hinting for bitmap conversion, but it is done automatically and not expected
458-to be extremely high quality. $\!$Japanese-language typesetting has
419+to be extremely high quality. Japanese-language typesetting has
459420 traditionally used monospace metrics, simple scaling (i.e., no corrections
460421 for optical weight), and no slanting or italicization; Tsukurimashou
461-currently offers a choice between
462-\linebreak%%!!
463- monospace or proportional, no optical
422+currently offers a choice between monospace or proportional, no optical
464423 weight features, and italics for the Latin script only.
465424
466-Although the largest use of Tsukurimashou
467-\linebreak%%!!
468-fonts to date has been for
425+Although the largest use of Tsukurimashou fonts to date has been for
469426 typesetting the project's own documentation in English, the design of the
470427 Tsukurimashou Latin glyphs, especially in the Mincho style, is intended
471428 primarily for setting the short fragments of English that sometimes occur in
472-\linebreak
473-Japanese text. Tsukurimashou Mincho used for
474-\linebreak%%!!
475- pure English text ends up
429+Japanese text. Tsukurimashou Mincho used for pure English text ends up
476430 looking like a display face and might not be appropriate for entire
477431 sentences and paragraphs. Tsukurimashou Kaku is more suitable for extended
478432 settings in English.
479433
480-The Jieubsida family (the name is a translation to Korean of
481-``Tsukurimashou'') is intended to support Korean \emph{hangul} (alphabetic)
482-script. \emph{Hanja} (the Korean equivalent of \emph{kanji}) are not
483-included. This character set is relatively orthogonal: the main sequence of
484-11172 glyphs is algorithmically generated from a few tens of basic parts,
485-though many less common letters had to be defined with more human
486-intervention. Work on these fonts has proven useful in debugging the
487-infrastructure at full scale, given that the Tsukurimashou series of fonts
488-will eventually grow to a significant fraction of the size already reached
489-by the Jieubsida series.
434+The Jieubsida\footnote{Intended as a translation to Korean of the name
435+``Tsukurimashou,'' but I am informed that ``Mandeubsida'' would be a better
436+translation, and am considering changing it.} family is intended to support
437+Korean \emph{hangul} (alphabetic) script. \emph{Hanja} (the Korean
438+equivalent of \emph{kanji}) are not included. This character set is
439+relatively orthogonal: the main sequence of 11172 glyphs is algorithmically
440+generated from a few tens of basic parts, though many less common letters
441+had to be defined with more human intervention. Work on these fonts has
442+proven useful in debugging the infrastructure at full scale, given that the
443+Tsukurimashou series of fonts will eventually grow to a significant fraction
444+of the size already reached by the Jieubsida series.
490445
491-Beyond the main Tsukurimashou package,
492-\linebreak%%!!
493- there are several smaller software
446+Beyond the main Tsukurimashou package, there are several smaller software
494447 packages called ``parasites,'' which appear in subdirectories of the
495448 distribution or may be detached. Some of these are font packages that share
496449 some of the Tsukurimashou infrastructure without really being part of the
@@ -505,9 +458,7 @@
505458 unpack it, and type \texttt{./configure} and \texttt{make}.
506459
507460 The build system is based on \GNU\ Autotools. Choosing which source code
508-files are needed for
509-\linebreak%%!!
510-which font styles involves doing some logical inference
461+files are needed for which font styles involves doing some logical inference
511462 that would not be convenient to do in a Makefile, so the Makefiles invoke
512463 additional code written in a subset of Prolog to evaluate the style
513464 selections, then run Perl scripts that scan the \MF\ sources to look for
@@ -518,14 +469,12 @@
518469 target is OpenType outline fonts. There are several \MF\ variants that can
519470 produce outline output from \MF\ source. I chose
520471 MetaType1~\cite{Jackowski:Programming} for
521-Tsu\-kurimashou. This package originates with the Polish \TeX\ users group
472+Tsukurimashou. This package originates with the Polish \TeX\ users group
522473 \acro{GUST}\ and may be most famous for its use in the Latin Modern
523474 project~\cite{Jackowski:Latin}. It consists
524475 primarily of a macro package for Metapost and a postprocessing script for
525476 \GNU\ \texttt{awk}. One run of Metapost generates the glyphs of a font as
526-\EPS\ files; another generates metrics; then the \texttt{gawk} script
527-\linebreak%%!!
528-merges
477+\EPS\ files; another generates metrics; then the \texttt{gawk} script merges
529478 those and does some rewriting of the Postscript code to turn them into a
530479 single Postscript Type 1 font.
531480
@@ -545,9 +494,7 @@
545494 Each Postscript font contains up to 256 glyphs (but usually far fewer than
546495 that), corresponding to a 256-character block of the Unicode character
547496 space. Many of these Postscript fonts are needed for each full-coverage
548-OpenType font. The build system runs them individually through a
549-\linebreak
550-FontForge
497+OpenType font. The build system runs them individually through a FontForge
551498 script that removes overlapping sections of splines, this being an easier
552499 operation in FontForge than on the \MF\ side, and then once all Postscript
553500 fonts for an OpenType font have had their overlaps removed, it runs another
@@ -556,13 +503,9 @@
556503 during development where only some of the Postscript fonts have changed: it
557504 reduces the amount of work needed to reassemble the updated OpenType font.
558505
559-There are additional stages of processing in
560-\linebreak%%!!
561- FontForge after the Postscript
562-fonts are merged. $\!$The raw outlines generated by \MF\ may contain excessive
563-or poorly-located spline control points;
564-\linebreak%%!!
565- scripts in FontForge attempt to
506+There are additional stages of processing in FontForge after the Postscript
507+fonts are merged. The raw outlines generated by \MF\ may contain excessive
508+or poorly-located spline control points; scripts in FontForge attempt to
566509 remove those. Similarly, some technical rules of the font formats (such as
567510 having points at the $x$ and $y$ extrema of each curve) need to be enforced.
568511 There is another processing chain for automated horizontal spacing and
@@ -570,9 +513,7 @@
570513 system generates bitmap fonts in \acro{BDF} format and a C program
571514 calculates spacing corrections, which are then applied back to the merged
572515 OpenType fonts. Other scripts run on the side do things like constructing
573-OpenType glyph-
574-\linebreak%%!!
575-substitution tables for Korean \emph{hangul} support, and
516+OpenType glyph-substitution tables for Korean \emph{hangul} support, and
576517 collecting data for proof generation. According to recent statistics from
577518 Ohloh~\cite{Ohloh:Languages}, 63\% of the project's code is written in
578519 Metapost (the font descriptions proper), 8\% is in \LaTeX\ (documentation),
@@ -581,9 +522,7 @@
581522
582523 \subsection{The \MF\ code}
583524
584-Here is Tsukurimashou's code defining the
585-\linebreak%%!!
586- ``language''
525+Here is Tsukurimashou's code defining the ``language''
587526 glyph of Figure~\ref{fig:tree}; three styles of it are shown at the top of
588527 Figure~\ref{fig:threestyle}. This glyph is of about
589528 average complexity; some are even simpler, and a few involve much more
@@ -607,7 +546,6 @@
607546 \includegraphics[scale=0.63]{threestyle.pdf}
608547 \caption{Three styles of ``language'' and ``five.''}
609548 \label{fig:threestyle}
610-\vskip-2mm%%!!
611549 \end{figure}
612550
613551 This code exists in a file named \verb|tsuku-8a.mp|, which covers the
@@ -621,9 +559,7 @@
621559 requires the build system to keep track of all the inter-file dependencies.
622560
623561 Tsukurimashou frequently uses a sort of functional programming via \MF's
624-concept of
625-\linebreak%%!!
626- text arguments to macros. There is a global stack data structure
562+concept of text arguments to macros. There is a global stack data structure
627563 of objects (several kinds) that will eventually be rendered into the glyph.
628564 A macro will receive one or more arguments that are themselves fragments of
629565 code; it runs them, then examines the objects they added to the stack and
@@ -634,12 +570,8 @@
634570 finished glyph.
635571
636572 The macro
637-\verb|build_kanji.lr|, for combining
638-\linebreak%%!!
639-things left-to-right, allows its two
640-arguments to run, then scales and shifts their results to cover two
641-\linebreak%%!!
642- smaller
573+\verb|build_kanji.lr|, for combining things left-to-right, allows its two
574+arguments to run, then scales and shifts their results to cover two smaller
643575 rectangles. The numeric arguments $(450,0)$ specify that in this case, the
644576 dividing line is at $x$ coordinate $450$, and the two rectangles overlap by
645577 an amount of $0$. So the left side runs from $(50,-50)$ to $(450,850)$ and
@@ -655,9 +587,7 @@
655587 putting together existing pieces in a standardized way.
656588
657589 Here is code for the \emph{kanji} numeral ``five,'' which is invoked indirectly by
658-\verb|kanji.grtwo.language|
659-\linebreak%%!!
660- when it calls \verb|kanji.grnine.my|. This
590+\verb|kanji.grtwo.language| when it calls \verb|kanji.grnine.my|. This
661591 glyph is shown at the bottom of Figure~\ref{fig:threestyle}. This is
662592 typical of the basic shapes that are not made up of smaller components.
663593 \begin{verbatim}
@@ -680,17 +610,13 @@
680610 enddef;
681611 \end{verbatim}
682612
683-The \verb|push_stroke| macros save paths on the
684-\linebreak
685-stack, with each stroke
613+The \verb|push_stroke| macros save paths on the stack, with each stroke
686614 defined by one path for the spine of the stroke, and a second path
687615 describing how the stroke weight (eventually translated to ``width'' through
688616 a style-dependent matrix) changes along the length of the stroke. Other
689617 macros, such as \verb|set_boserif|, push other objects on the stack to indicate
690618 where serifs (\emph{uroko}) should be added in styles that use them. The
691-whole thing, like
692-\linebreak%%!!
693-\verb|kanji.grtwo.language| before it, is bracketed by
619+whole thing, like \verb|kanji.grtwo.language| before it, is bracketed by
694620 \verb|push_pbox_toexpand| and \verb|expand_pbox|, which respectively save, and adjust
695621 the size of, an object called a ``proof box.''
696622
@@ -738,9 +664,7 @@
738664 at the upper right?'' Existing dictionaries sometimes offer what is called
739665 ``multi-radical'' search, whereby the user can specify one or more
740666 components and then see a list of all \emph{kanji} that contain all those
741-components. But multi-radical
742-\linebreak%%!!
743- search features seldom if ever capture
667+components. But multi-radical search features seldom if ever capture
744668 structural information like ``on the left''; such a system would just show
745669 all the characters that contain ``speak'' in one pile for the user to dig
746670 through. In the initial stages of laying out Tsukurimashou's \emph{kanji}
@@ -750,18 +674,14 @@
750674
751675 The IDSgrep package attempts to serve that need. With some irony intended,
752676 IDSgrep's stated goal is to bring the user-friendliness of \verb|grep| to
753-Han character dictionaries. IDSgrep is one of the Tsuku\-rimashou parasites:
677+Han character dictionaries. IDSgrep is one of the Tsukurimashou parasites:
754678 it comes included with the full distribution in a separate directory, or can
755679 be distributed on its own.
756680
757681 Recall the tree decomposition of Figure~\ref{fig:tree}. That tree might be
758682 rendered into a simple \acro{ASCII}-based prefix notation as
759-``\verb|[lr](speak)[tb](five)|
760-\linebreak%%!!
761-\verb|(mouth)|'': it is a left-right combination of
762-two
763-\linebreak%%!!
764-things, the first of which is ``speak'' and the second is a top-bottom
683+``\verb|[lr](speak)[tb](five)(mouth)|'': it is a left-right combination of
684+two things, the first of which is ``speak'' and the second is a top-bottom
765685 combination of ``five'' and ``mouth.'' As argued earlier in this paper,
766686 such descriptions are not enough to render high-quality glyphs; but maybe if
767687 we include a few general catch-all categories like ``overlap,'' and accept
@@ -791,35 +711,23 @@
791711 regular expression search on these descriptions may be less than
792712 satisfactory. IDSgrep implements a tree-matching query language in which
793713 the user can specify character components to search for explicitly, or use
794-matching operators like wildcard, match-
795-\linebreak%%!!
796-anywhere, Boolean operations, and so
714+matching operators like wildcard, match-anywhere, Boolean operations, and so
797715 on. The \acro{IDS} syntax is not quite sufficiently flexible and
798-well-
799-\linebreak%%!!
800-defined to encompass all the tasks IDSgrep demands of it, and the
716+well-defined to encompass all the tasks IDSgrep demands of it, and the
801717 special Unicode combining operation characters are difficult to type (and to
802718 typeset in Computer Modern!); so IDSgrep defines extensions to the syntax
803719 and \acro{ASCII} synonyms for the special characters,
804720 forming a language of Extended Ideographic Description Sequences
805-(\acro{EIDS}es) that subsumes the
806-%\linebreak%%!!
807-Unicode \acro{IDS} syntax.
721+(\acro{EIDS}es) that subsumes the Unicode \acro{IDS} syntax.
808722
809-IDSgrep's user interface consists of a Unix
810-\linebreak%%!!
811-command-line utility similar to
723+IDSgrep's user interface consists of a Unix command-line utility similar to
812724 \verb|grep|. It reads a database of trees in \acro{EIDS} syntax, from files
813-or standard input, and writes out any that match the
814-\linebreak%!!
815- matching pattern
725+or standard input, and writes out any that match the matching pattern
816726 specified on the command line: just like \verb|grep|. The syntax for
817727 matching patterns is complicated because it is powerful, but no worse for
818728 skilled users than standard regular expressions. After learning the syntax,
819729 a user can easily and quickly compose queries like ``What characters have
820-this
821-\linebreak%%!!
822-component in that location, but not that other component anywhere?''
730+this component in that location, but not that other component anywhere?''
823731
824732 The latest version, IDSgrep 0.4, uses Bloom filters and binary decision
825733 diagrams to speed up searches. Although the full tree-matching algorithm is
Show on old repository browser