
tsukurimashou: Commit


Commit MetaInfo

Revision: 386 (tree)
Time: 2013-03-04 11:00:43
Author: mskala

Log Message

"branch," "pulse," "clothes," "rend," fntbase changes, idsgrep doc

Change Summary

Incremental Difference

--- trunk/idsgrep/idsgrep.tex (revision 385)
+++ trunk/idsgrep/idsgrep.tex (revision 386)
@@ -924,13 +924,15 @@
 section on ``cooked output'' in this manual for more details.
 
 \item[\texttt{-f}, \texttt{--font-chars}]
-Read a font file and make its character coverage available as a
-user-defined matching predicate through the ``\texttt{\#}'' matching
-operator. In the current version, this feature can only read TrueType
-and OpenType files that contain Unicode (or near equivalent) mappings
-described with cmap subtable types 0, 2, 4, 12, or 13. This option may
-be specified multiple times, with successive invocations corresponding
-to user-defined predicates 1, 2, 3, and so on.
+Read a font file and make its character coverage available as a user-defined
+matching predicate through the ``\texttt{\#}'' matching operator. In the
+current version, this feature can only read TrueType and OpenType files that
+contain Unicode (or near equivalent) mappings described with cmap subtable
+types 0, 2, 4, 12, or 13. This option may be specified multiple times, with
+successive invocations corresponding to user-defined predicates 1, 2, 3, and
+so on. The maximum number of user-defined predicates is limited to the
+number of bits in the largest integer type available to the C compiler; 32
+or 64 on many systems.
 
 \item[\texttt{-U}, \texttt{--unicode-list}]
 Generate a dictionary of Unicode code points, and read that before
@@ -1152,7 +1154,7 @@
 The hexadecimal escapes \texttt{\textbackslash x} and \texttt{\textbackslash
 X} offer a choice of two-digit, four-digit, or variable-length (enclosed by
 curly braces) hexadecimal specification of Unicode code points. The hex
-codes are case-insensitive. Values greater than 1FFFFF, and therefore
+codes are case-insensitive. Values greater than 10FFFF, and therefore
 outside the Unicode range, will be replaced by the Unicode replacement
 character U+FFFD.
 
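Note on the idsgrep.tex hunks above: the two documented limits can be illustrated with a small C sketch. This is not code from idsgrep; the helper names are hypothetical, and it assumes that `uintmax_t` stands in for "the largest integer type available to the C compiler" and that one bit is used per user-defined predicate. The diff states only the limits themselves.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Largest valid Unicode code point and the replacement character. */
#define MAX_CODE_POINT   0x10FFFFUL
#define REPLACEMENT_CHAR 0xFFFDUL

/* Hypothetical helper: a hexadecimal escape value above 10FFFF lies
   outside the Unicode range and is replaced by U+FFFD. */
static unsigned long clamp_code_point(unsigned long value)
{
    unsigned long result = value > MAX_CODE_POINT ? REPLACEMENT_CHAR : value;
    assert(result <= MAX_CODE_POINT);
    return result;
}

/* Hypothetical helper: number of bits in the widest unsigned integer
   type. Assumption: this bounds the count of user-defined predicates
   because each predicate occupies one bit. */
static unsigned max_user_predicates(void)
{
    return (unsigned)(sizeof(uintmax_t) * CHAR_BIT);
}
```

With the corrected bound, `clamp_code_point(0x110000UL)` yields the replacement character, whereas the old documented bound of 1FFFFF would have suggested that such a value passes through unchanged.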
--- trunk/idsgrep/idsgrep.1.in (revision 385)
+++ trunk/idsgrep/idsgrep.1.in (revision 386)
@@ -11,10 +11,9 @@
 .SH DESCRIPTION
 The
 .B @PACKAGE@
-program parses the input files, or standard input if no filename is
-specified, into Extended Ideographic Description Sequences
-(EIDSes, described
-in more detail below) separated by whitespace.
+program parses the input files, or standard input if no other source of
+data is specified, into Extended Ideographic Description Sequences
+(EIDSes, described in more detail below) separated by whitespace.
 Any EIDS in the input that matches
 .I PATTERN
 is echoed through to standard output, along with its trailing whitespace
@@ -117,9 +116,43 @@
 .I NAME
 being empty.
 .TP
+.BR \-f "FONT\fR,\fP " "\-\^\-font-chars=" FONT
+Create a user-defined matching predicate that will match a tree if and only
+if its head is a single character covered by the font in the file named
+.IR FONT .
+The font should be an OpenType font with appropriate encoding tables for
+Unicode, ASCII, or some near equivalent.
+Windows TrueType will probably work; it is unknown to what extent Macintosh
+fonts that are not recent OpenType fonts may work.
+The first use of this option creates the matching predicate #1; the next
+creates #2, and so on, up to a limit determined by host word size (32
+or 64 on most hosts).
+.TP
 .BR \-h ", " \-\^\-help
 Display a brief help message.
 .TP
+.BR \-U "CFG\fR,\fP " "\-\^\-unicode-list=" CFG
+Generate a list of Unicode characters and use that as a dictionary, before
+and in addition to any others that may have been specified.
+The generated dictionary contains 1112064 entries, one for every Unicode
+code point excluding surrogates; the head of the entry is the single
+character, and the tail is (by default) a nullary semicolon, or (if a
+.I CFG
+string has been specified) a nullary functor containing some information
+about the character.
+The
+.I CFG
+string should be some combination of the following characters, each of which
+will cause the inclusion of something in the dictionary entries:
+.B b
+to include the Unicode block name for characters that have it, using block
+names according to Unicode 6.2;
+.B d
+to include the decimal value of the Unicode code point; and
+.B x
+to include the hexadecimal value of the Unicode code point, with a
+preceding \(lqU+\(rq.
+.TP
 .BR \-V ", " \-\^\-version
 Display the version and license information.
 .
@@ -334,7 +367,7 @@
 parentheses, and will thus become the functor of a nullary node.
 The complete list of characters that have sugary implicit brackets, with
 the brackets they imply, is:
-(;) (?) .!. ./. .=. .*. .@. [&] [,] [|]
+(;) (?) .!. ./. .=. .*. .@. .#. [&] [,] [|]
 [<U+2FF0>] [<U+2FF1>] [<U+2FF4>] [<U+2FF5>] [<U+2FF6>] [<U+2FF7>]
 [<U+2FF8>] [<U+2FF9>] [<U+2FFA>] [<U+2FFB>]
 {<U+2FF2>} {<U+2FF3>}.
@@ -367,12 +400,14 @@
 .B -c
 command-line option makes it possibile to skip this transformation on
 input, or perform its inverse on output.
-The list of replacements is: (anything) to (?); .anywhere. to ...; [and] to
-[&]; [or] to [|]; .not. to .!.; .regex. to ./.; .equal. to .=.; [lr] to
-[<U+2FF0>]; [tb] to [<U+2FF1>]; {lcr} to {<U+2FF2>}; {tcb} to {<U+2FF3>};
+The list of replacements is:
+(anything) to (?); .anywhere. to ...; .not. to .!.; .regex. to
+./.; .equal. to .=.; .unord. to .*.; .assoc. to .@.; .user. to .#.;
+[and] to [&]; [or] to [|]; [lr] to [<U+2FF0>]; [tb] to [<U+2FF1>];
 [enclose] to [<U+2FF4>]; [wrapu] to [<U+2FF5>]; [wrapd] to [<U+2FF6>];
 [wrapl] to [<U+2FF7>]; [wrapul] to [<U+2FF8>]; [wrapur] to [<U+2FF9>];
-[wrapll] to [<U+2FFA>]; and [overlap] to [<U+2FFB>].
+[wrapll] to [<U+2FFA>]; [overlap] to [<U+2FFB>]; {lcr} to {<U+2FF2>};
+and {tcb} to {<U+2FF3>}.
 .
 .IP \(bu 4
 Total length, and number of consecutive nullary nodes (which are like
@@ -495,6 +530,25 @@
 parser and so an additional level of escaping may be necessary if
 backslash escapes are desired in a pattern.
 .IP \(bu 4
+If the pattern is
+.RI .#. "x"
+then the head of
+.I x
+is parsed as a decimal index to select a user-defined matching predicate.
+If
+.I x
+has no head, the head cannot be parsed by the C library's
+.BR atoi (3)
+function, or it can be parsed and produces a number that is zero or
+negative, then the index is deemed to be equal to 1.
+Then the user-defined predicate at that index is invoked.
+If that many user-defined predicates have not been defined, then the
+pattern matches nothing.
+In this version, user-defined matching predicates always test the
+Unicode character coverage of font files:
+the match succeeds if and only if the head of the input exists and is a
+single Unicode character covered by the font.
+.IP \(bu 4
 Otherwise, the pattern matches the input if and only if its functor and
 arity are the same as the input's and all the children of the pattern match
 the corresponding children of the input.
@@ -610,7 +664,7 @@
 .
 .SH COPYRIGHT
 Copyright \(co
-2012
+2012, 2013
 Matthew Skala
 .PP
 This program is free software: you can redistribute it and/or modify
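Note on the idsgrep.1.in hunks above: the index rule for the `.#.` operator and the size of the `-U` dictionary can be sketched in C as follows. This is an illustration built only from the man page wording, not the idsgrep source; all function names are hypothetical, and passing `NULL` to mean "x has no head" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: turn the head of x in ".#.x" into a predicate
   index. A missing head, a head that atoi() cannot parse (atoi then
   returns 0), or a zero or negative value all select index 1. */
static int user_predicate_index(const char *head)
{
    int idx = (head != NULL) ? atoi(head) : 0;
    if (idx <= 0)
        idx = 1;
    assert(idx >= 1);
    return idx;
}

/* Hypothetical helper: the pattern can only match if at least idx
   predicates were defined (for example by repeated -f options);
   otherwise it matches nothing. */
static int user_predicate_defined(int idx, int defined_count)
{
    return idx <= defined_count;
}

/* Number of Unicode code points excluding surrogates: the code space
   0..10FFFF minus the surrogate range D800..DFFF. */
static unsigned long unicode_list_entries(void)
{
    unsigned long code_space = 0x10FFFFUL + 1UL;
    unsigned long surrogates = 0xDFFFUL - 0xD800UL + 1UL;
    return code_space - surrogates;
}
```

The last helper reproduces the figure of 1112064 entries quoted for the generated dictionary: 1114112 code points minus 2048 surrogates.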
--- trunk/idsgrep/configure.ac (revision 385)
+++ trunk/idsgrep/configure.ac (revision 386)
@@ -168,8 +168,8 @@
 AC_CONFIG_HEADERS([config.h])
 AC_CONFIG_MACRO_DIR([m4])
 AC_REVISION([$Id: configure.ac 1015 2011-12-15 22:24:32Z mskala $])
-AC_COPYRIGHT([Copyright (C) 2012 Matthew Skala])
-AC_SUBST([release_date],["August 26, 2012"])
+AC_COPYRIGHT([Copyright (C) 2012, 2013 Matthew Skala])
+AC_SUBST([release_date],["March 3, 2013"])
 AM_SILENT_RULES
 #
 ############################################################################