Yasuhiro Horimoto 2019-01-04 11:25:36 +0900 (Fri, 04 Jan 2019)

  Revision: 6c12a8cc25962d6287fefb844acdf66f75769a18
  https://github.com/groonga/groonga/commit/6c12a8cc25962d6287fefb844acdf66f75769a18

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_bigram_ignore_blank.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+41 -38)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 11:01:14 +0900 (589301d59)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 11:25:36 +0900 (065072506)
@@ -27007,28 +27007,6 @@ msgid "``TokenRegexp``"
 msgstr ""
 
 msgid ""
-"``TokenBigramIgnoreBlank`` is similar to :ref:`token-bigram`. The difference "
-"between them is blank handling. ``TokenBigramIgnoreBlank`` ignores white-"
-"spaces in continuous symbols and non-ASCII characters."
-msgstr ""
-"``TokenBigramIgnoreBlank`` は :ref:`token-bigram` と似ています。違いは空白文"
-"字の扱いです。 ``TokenBigramIgnoreBlank`` は連続する記号と非ASCII文字の間にあ"
-"る空白文字を無視します。"
-
-msgid ""
-"You can find difference of them by ``日 本 語 ! ! !`` text because it has "
-"symbols and non-ASCII characters."
-msgstr ""
-"``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス"
-"トは記号と非ASCII文字を両方含んでいるからです。"
-
-msgid "Here is a result by :ref:`token-bigram` :"
-msgstr ":ref:`token-bigram` での実行結果です。"
-
-msgid "Here is a result by ``TokenBigramIgnoreBlank``:"
-msgstr "``TokenBigramIgnoreBlank`` での実行結果です。"
-
-msgid ""
 "``TokenBigramIgnoreBlankSplitSymbol`` is similar to :ref:`token-bigram`. The "
 "differences between them are the followings:"
 msgstr ""
@@ -27055,6 +27033,16 @@ msgstr ""
 "``TokenBigramIgnoreBlankSplitSymbol`` は記号をバイグラムでトークナイズしま"
 "す。"
 
+msgid ""
+"You can find difference of them by ``日 本 語 ! ! !`` text because it has "
+"symbols and non-ASCII characters."
+msgstr ""
+"``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス"
+"トは記号と非ASCII文字を両方含んでいるからです。"
+
+msgid "Here is a result by :ref:`token-bigram` :"
+msgstr ":ref:`token-bigram` での実行結果です。"
+
 msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbol``:"
 msgstr "``TokenBigramIgnoreBlankSplitSymbol`` の実行結果です。"
 
@@ -27377,6 +27365,21 @@ msgstr ""
 "う例です。"
 
 msgid ""
+"``TokenBigramIgnoreBlank`` is similar to :ref:`token-bigram`. The difference "
+"between them is blank handling. ``TokenBigramIgnoreBlank`` ignores white-"
+"spaces in continuous symbols and non-ASCII characters."
+msgstr ""
+"``TokenBigramIgnoreBlank`` は :ref:`token-bigram` と似ています。違いは空白文"
+"字の扱いです。 ``TokenBigramIgnoreBlank`` は連続する記号と非ASCII文字の間にあ"
+"る空白文字を無視します。"
+
+msgid "``TokenBigramIgnoreBlank`` hasn't parameter::"
+msgstr "``TokenBigramIgnoreBlank`` には、引数がありません。"
+
+msgid "Here is a result by ``TokenBigramIgnoreBlank``:"
+msgstr "``TokenBigramIgnoreBlank`` での実行結果です。"
+
+msgid ""
 "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The difference "
 "between them is symbol handling."
 msgstr ""
@@ -28258,17 +28261,13 @@ msgid "``window_sum``"
 msgstr ""
 
 #~ msgid ""
-#~ "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. "
-#~ "The difference between them is symbol, alphabet and digit handling. "
-#~ "``TokenBigramSplitSymbolAlphaDigit`` tokenizes symbols, alphabets and "
-#~ "digits by bigram tokenize method. It means that all characters are "
-#~ "tokenized by bigram tokenize method:"
+#~ "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The "
+#~ "difference between them is symbol handling. ``TokenBigramSplitSymbol`` "
+#~ "tokenizes symbols by bigram tokenize method:"
 #~ msgstr ""
-#~ "``TokenBigramSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似ています。"
-#~ "違いは記号とアルファベットと数字の扱いです。 "
-#~ "``TokenBigramSplitSymbolAlphaDigit`` は記号とアルファベット数字のトークナ"
-#~ "イズ方法にバイグラムを使います。つまり、すべての文字をバイグラムでトークナ"
-#~ "イズします。"
+#~ "``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号"
+#~ "の扱いです。 ``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラ"
+#~ "ムを使います。"
 
 #~ msgid ""
 #~ "``TokenBigramSplitSymbolAlpha`` is similar to :ref:`token-bigram`. The "
@@ -28281,10 +28280,14 @@ msgstr ""
 #~ "とアルファベットのトークナイズ方法にバイグラムを使います。"
 
 #~ msgid ""
-#~ "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The "
-#~ "difference between them is symbol handling. ``TokenBigramSplitSymbol`` "
-#~ "tokenizes symbols by bigram tokenize method:"
+#~ "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. "
+#~ "The difference between them is symbol, alphabet and digit handling. "
+#~ "``TokenBigramSplitSymbolAlphaDigit`` tokenizes symbols, alphabets and "
+#~ "digits by bigram tokenize method. It means that all characters are "
+#~ "tokenized by bigram tokenize method:"
 #~ msgstr ""
-#~ "``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号"
-#~ "の扱いです。 ``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラ"
-#~ "ムを使います。"
+#~ "``TokenBigramSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似ています。"
+#~ "違いは記号とアルファベットと数字の扱いです。 "
+#~ "``TokenBigramSplitSymbolAlphaDigit`` は記号とアルファベット数字のトークナ"
+#~ "イズ方法にバイグラムを使います。つまり、すべての文字をバイグラムでトークナ"
+#~ "イズします。"

  Modified: doc/source/reference/tokenizers.rst (+0 -24)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 11:01:14 +0900 (e3d97ec2f)
+++ doc/source/reference/tokenizers.rst    2019-01-04 11:25:36 +0900 (bd16790cf)
@@ -128,30 +128,6 @@ Here is a list of built-in tokenizers:
 
    tokenizers/*
 
-.. _token-bigram-ignore-blank:
-
-``TokenBigramIgnoreBlank``
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``TokenBigramIgnoreBlank`` is similar to :ref:`token-bigram`. The
-difference between them is blank handling. ``TokenBigramIgnoreBlank``
-ignores white-spaces in continuous symbols and non-ASCII characters.
-
-You can find difference of them by ``日 本 語 ! ! !`` text because it
-has symbols and non-ASCII characters.
-
-Here is a result by :ref:`token-bigram` :
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-with-white-spaces.log
-.. tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
-
-Here is a result by ``TokenBigramIgnoreBlank``:
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-ignore-blank-with-white-spaces.log
-.. tokenize TokenBigramIgnoreBlank "日 本 語 ! ! !" NormalizerAuto
-
 .. _token-bigram-ignore-blank-split-symbol:
 
 ``TokenBigramIgnoreBlankSplitSymbol``

  Added: doc/source/reference/tokenizers/token_bigram_ignore_blank.rst (+41 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_bigram_ignore_blank.rst    2019-01-04 11:25:36 +0900 (1b1617dd0)
@@ -0,0 +1,41 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+``TokenBigramIgnoreBlank``
+==========================
+
+Summary
+-------
+
+``TokenBigramIgnoreBlank`` is similar to :ref:`token-bigram`. The
+difference between them is blank handling. ``TokenBigramIgnoreBlank``
+ignores white-spaces in continuous symbols and non-ASCII characters.
+
+Syntax
+------
+
+``TokenBigramIgnoreBlank`` hasn't parameter::
+
+  TokenBigramIgnoreBlank
+
+Usage
+-----
+
+You can find difference of them by ``日 本 語 ! ! !`` text because it
+has symbols and non-ASCII characters.
+
+Here is a result by :ref:`token-bigram` :
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-bigram-with-white-spaces.log
+.. tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
+
+Here is a result by ``TokenBigramIgnoreBlank``:
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-bigram-ignore-blank-with-white-spaces.log
+.. tokenize TokenBigramIgnoreBlank "日 本 語 ! ! !" NormalizerAuto