Yasuhiro Horimoto 2019-01-04 15:26:16 +0900 (Fri, 04 Jan 2019)

  Revision: a77d1ecec59e4cfe88d044208890e4000b90a6f8
  https://github.com/groonga/groonga/commit/a77d1ecec59e4cfe88d044208890e4000b90a6f8

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_regexp.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+73 -70)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 14:57:46 +0900 (c13d71c72)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 15:26:16 +0900 (c3bf527d6)
@@ -26964,76 +26964,6 @@ msgstr "組み込みトークナイザー"
 msgid "Here is a list of built-in tokenizers:"
 msgstr "以下は組み込みのトークナイザーのリストです。"
 
-msgid "``TokenDelimitNull``"
-msgstr ""
-
-msgid "``TokenRegexp``"
-msgstr ""
-
-msgid ""
-"``TokenDelimitNull`` is similar to :ref:`token-delimit`. The difference "
-"between them is separator character. :ref:`token-delimit` uses space "
-"character (``U+0020``) but ``TokenDelimitNull`` uses NUL character (``U"
-"+0000``)."
-msgstr ""
-"``TokenDelimitNull`` は :ref:`token-delimit` に似ています。違いは区切り文字で"
-"す。 :ref:`token-delimit` は空白文字( ``U+0020`` )を使いますが、 "
-"``TokenDelimitNull`` はNUL文字( ``U+0000`` )を使います。"
-
-msgid "``TokenDelimitNull`` is also suitable for tag text."
-msgstr "``TokenDelimitNull`` もタグテキストに適切です。"
-
-msgid "Here is an example of ``TokenDelimitNull``:"
-msgstr "以下は ``TokenDelimitNull`` の例です。"
-
-msgid "This tokenizer is experimental. Specification may be changed."
-msgstr "このトークナイザーは実験的です。仕様が変わる可能性があります。"
-
-msgid ""
-"This tokenizer can be used only with UTF-8. You can't use this tokenizer "
-"with EUC-JP, Shift_JIS and so on."
-msgstr ""
-"このトークナイザーはUTF-8でしか使えません。EUC-JPやShift_JISなどと一緒には使"
-"えません。"
-
-msgid ""
-"``TokenRegexp`` is a tokenizer for supporting regular expression search by "
-"index."
-msgstr ""
-"``TokenRegexp`` はインデックスを使った正規表現検索をサポートするトークナイ"
-"ザーです。"
-
-msgid ""
-"In general, regular expression search is evaluated as sequential search. But "
-"the following cases can be evaluated as index search:"
-msgstr ""
-"一般的に、正規表現検索は逐次検索で実行します。しかし、次のケースはインデック"
-"スを使って検索できます。"
-
-msgid "Literal only case such as ``hello``"
-msgstr "``hello`` のようにリテラルしかないケース"
-
-msgid "The beginning of text and literal case such as ``\A/home/alice``"
-msgstr ""
-"``\A/home/alice`` のようにテキストの最初でのマッチとリテラルのみのケース"
-
-msgid "The end of text and literal case such as ``\.txt\z``"
-msgstr "``\.txt\z`` のようにテキストの最後でのマッチとリテラルのみのケース"
-
-msgid "In most cases, index search is faster than sequential search."
-msgstr ""
-"多くのケースでは、逐次検索よりもインデックスを使った検索の方が高速です。"
-
-msgid ""
-"``TokenRegexp`` is based on bigram tokenize method. ``TokenRegexp`` adds the "
-"beginning of text mark (``U+FFEF``) at the begging of text and the end of "
-"text mark (``U+FFF0``) to the end of text when you index text:"
-msgstr ""
-"``TokenRegexp`` はベースはバイグラムを使います。 ``TokenRegexp`` は、インデッ"
-"クス時に、テキストの先頭にテキストの先頭であるというマーク( ``U+FFEF`` )を"
-"入れ、テキストの最後にテキストの最後であるというマーク( ``U+FFF0`` )を入れ"
-"ます。"
-
 msgid "``TokenBigram``"
 msgstr ""
 
@@ -27598,9 +27528,28 @@ msgstr "正規表現を使って、トークンを分割します。"
 msgid ":doc:`../commands/tokenize`"
 msgstr ""
 
+msgid "``TokenDelimitNull``"
+msgstr ""
+
+msgid ""
+"``TokenDelimitNull`` is similar to :ref:`token-delimit`. The difference "
+"between them is separator character. :ref:`token-delimit` uses space "
+"character (``U+0020``) but ``TokenDelimitNull`` uses NUL character (``U"
+"+0000``)."
+msgstr ""
+"``TokenDelimitNull`` は :ref:`token-delimit` に似ています。違いは区切り文字で"
+"す。 :ref:`token-delimit` は空白文字( ``U+0020`` )を使いますが、 "
+"``TokenDelimitNull`` はNUL文字( ``U+0000`` )を使います。"
+
 msgid "``TokenDelimitNull`` hasn't parameter::"
 msgstr "``TokenDelimitNull`` には、引数がありません。"
 
+msgid "``TokenDelimitNull`` is also suitable for tag text."
+msgstr "``TokenDelimitNull`` もタグテキストに適切です。"
+
+msgid "Here is an example of ``TokenDelimitNull``:"
+msgstr "以下は ``TokenDelimitNull`` の例です。"
+
 msgid "``TokenMecab``"
 msgstr ""
 
@@ -27767,6 +27716,60 @@ msgstr ""
 msgid "Outputs reading of token."
 msgstr "トークンの読みがなを出力します。"
 
+msgid "``TokenRegexp``"
+msgstr ""
+
+msgid "This tokenizer is experimental. Specification may be changed."
+msgstr "このトークナイザーは実験的です。仕様が変わる可能性があります。"
+
+msgid ""
+"This tokenizer can be used only with UTF-8. You can't use this tokenizer "
+"with EUC-JP, Shift_JIS and so on."
+msgstr ""
+"このトークナイザーはUTF-8でしか使えません。EUC-JPやShift_JISなどと一緒には使"
+"えません。"
+
+msgid ""
+"``TokenRegexp`` is a tokenizer for supporting regular expression search by "
+"index."
+msgstr ""
+"``TokenRegexp`` はインデックスを使った正規表現検索をサポートするトークナイ"
+"ザーです。"
+
+msgid "``TokenRegexp`` hasn't parameter::"
+msgstr "``TokenRegexp`` には、引数がありません。"
+
+msgid ""
+"In general, regular expression search is evaluated as sequential search. But "
+"the following cases can be evaluated as index search:"
+msgstr ""
+"一般的に、正規表現検索は逐次検索で実行します。しかし、次のケースはインデック"
+"スを使って検索できます。"
+
+msgid "Literal only case such as ``hello``"
+msgstr "``hello`` のようにリテラルしかないケース"
+
+msgid "The beginning of text and literal case such as ``\A/home/alice``"
+msgstr ""
+"``\A/home/alice`` のようにテキストの最初でのマッチとリテラルのみのケース"
+
+msgid "The end of text and literal case such as ``\.txt\z``"
+msgstr "``\.txt\z`` のようにテキストの最後でのマッチとリテラルのみのケース"
+
+msgid "In most cases, index search is faster than sequential search."
+msgstr ""
+"多くのケースでは、逐次検索よりもインデックスを使った検索の方が高速です。"
+
+msgid ""
+"``TokenRegexp`` is based on bigram tokenize method. ``TokenRegexp`` adds the "
+"beginning of text mark (``U+FFEF``) at the begging of text and the end of "
+"text mark (``U+FFF0``) to the end of text when you index text:"
+msgstr ""
+"``TokenRegexp`` はベースはバイグラムを使います。 ``TokenRegexp`` は、インデッ"
+"クス時に、テキストの先頭にテキストの先頭であるというマーク( ``U+FFEF`` )を"
+"入れ、テキストの最後にテキストの最後であるというマーク( ``U+FFF0`` )を入れ"
+"ます。"
+
 msgid "``TokenTrigram``"
 msgstr ""

  Modified: doc/source/reference/tokenizers.rst (+0 -39)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 14:57:46 +0900 (0d92a6602)
+++ doc/source/reference/tokenizers.rst    2019-01-04 15:26:16 +0900 (b3f281133)
@@ -107,47 +107,8 @@ Built-in tokenizsers
 
 Here is a list of built-in tokenizers:
 
-  * ``TokenRegexp``
-
 .. toctree::
    :maxdepth: 1
    :glob:
 
    tokenizers/*
-
-.. _token-regexp:
-
-``TokenRegexp``
-^^^^^^^^^^^^^^^
-
-.. versionadded:: 5.0.1
-
-.. caution::
-
-   This tokenizer is experimental. Specification may be changed.
-
-.. caution::
-
-   This tokenizer can be used only with UTF-8. You can't use this
-   tokenizer with EUC-JP, Shift_JIS and so on.
-
-``TokenRegexp`` is a tokenizer for supporting regular expression
-search by index.
-
-In general, regular expression search is evaluated as sequential
-search. But the following cases can be evaluated as index search:
-
-  * Literal only case such as ``hello``
-  * The beginning of text and literal case such as ``\A/home/alice``
-  * The end of text and literal case such as ``\.txt\z``
-
-In most cases, index search is faster than sequential search.
-
-``TokenRegexp`` is based on bigram tokenize method. ``TokenRegexp``
-adds the beginning of text mark (``U+FFEF``) at the begging of text
-and the end of text mark (``U+FFF0``) to the end of text when you
-index text:
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-regexp-add.log
-.. tokenize TokenRegexp "/home/alice/test.txt" NormalizerAuto --mode ADD

  Added: doc/source/reference/tokenizers/token_regexp.rst (+56 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_regexp.rst    2019-01-04 15:26:16 +0900 (3adf9b38f)
@@ -0,0 +1,56 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+.. _token-regexp:
+
+``TokenRegexp``
+===============
+
+Summary
+-------
+
+.. versionadded:: 5.0.1
+
+.. caution::
+
+   This tokenizer is experimental. Specification may be changed.
+
+.. caution::
+
+   This tokenizer can be used only with UTF-8. You can't use this
+   tokenizer with EUC-JP, Shift_JIS and so on.
+
+``TokenRegexp`` is a tokenizer for supporting regular expression
+search by index.
+
+Syntax
+------
+
+``TokenRegexp`` hasn't parameter::
+
+  TokenRegexp
+
+Usage
+-----
+
+In general, regular expression search is evaluated as sequential
+search. But the following cases can be evaluated as index search:
+
+  * Literal only case such as ``hello``
+  * The beginning of text and literal case such as ``\A/home/alice``
+  * The end of text and literal case such as ``\.txt\z``
+
+In most cases, index search is faster than sequential search.
+
+``TokenRegexp`` is based on bigram tokenize method. ``TokenRegexp``
+adds the beginning of text mark (``U+FFEF``) at the begging of text
+and the end of text mark (``U+FFF0``) to the end of text when you
+index text:
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-regexp-add.log
+.. tokenize TokenRegexp "/home/alice/test.txt" NormalizerAuto --mode ADD
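The moved page stops at the ADD-mode ``tokenize`` call quoted above. As background only, and not part of this commit, here is a minimal sketch of how ``TokenRegexp`` is typically attached to a lexicon so that ``@~`` regular expression search can use an index; the schema names (``Memos``, ``content``, ``RegexpLexicon``, ``memos_content_index``) are hypothetical::

    # Hypothetical schema: a data table plus a patricia-trie lexicon
    # that tokenizes with TokenRegexp.
    table_create Memos TABLE_NO_KEY
    column_create Memos content COLUMN_SCALAR Text
    table_create RegexpLexicon TABLE_PAT_KEY ShortText --default_tokenizer TokenRegexp --normalizer NormalizerAuto
    # WITH_POSITION is used because TokenRegexp is bigram based.
    column_create RegexpLexicon memos_content_index COLUMN_INDEX|WITH_POSITION Memos content
    load --table Memos
    [
    {"content": "/home/alice/test.txt"}
    ]
    # The @~ regular expression operator can now be evaluated against
    # memos_content_index instead of scanning every record.
    select Memos --filter 'content @~ "test"'

With a schema along these lines, literal-only patterns such as the ones listed in the new page can be answered from the index rather than by sequential search.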