Yasuhiro Horimoto	2019-01-04 12:43:09 +0900 (Fri, 04 Jan 2019)

  Revision: 72be2250bedb7b315e6542b3faf8728ece09f311
  https://github.com/groonga/groonga/commit/72be2250bedb7b315e6542b3faf8728ece09f311

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha_digit.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+4 -32)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 12:28:55 +0900 (bfdbe7b16)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 12:43:09 +0900 (841cd0d66)
@@ -27376,6 +27376,10 @@ msgstr ""
 msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``:"
 msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` の実行結果です。"
 
+msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` hasn't parameter::"
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` には、引数がありません。"
+
 msgid "``TokenBigramSplitSymbol``"
 msgstr ""
 
@@ -28265,35 +28269,3 @@ msgstr ""
 
 msgid "``window_sum``"
 msgstr ""
-
-#~ msgid ""
-#~ "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. "
-#~ "The difference between them is symbol, alphabet and digit handling. "
-#~ "``TokenBigramSplitSymbolAlphaDigit`` tokenizes symbols, alphabets and "
-#~ "digits by bigram tokenize method. It means that all characters are "
-#~ "tokenized by bigram tokenize method:"
-#~ msgstr ""
-#~ "``TokenBigramSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似ています。"
-#~ "違いは記号とアルファベットと数字の扱いです。 "
-#~ "``TokenBigramSplitSymbolAlphaDigit`` は記号とアルファベット数字のトークナ"
-#~ "イズ方法にバイグラムを使います。つまり、すべての文字をバイグラムでトークナ"
-#~ "イズします。"
-
-#~ msgid ""
-#~ "``TokenBigramSplitSymbolAlpha`` is similar to :ref:`token-bigram`. The "
-#~ "difference between them is symbol and alphabet handling. "
-#~ "``TokenBigramSplitSymbolAlpha`` tokenizes symbols and alphabets by bigram "
-#~ "tokenize method:"
-#~ msgstr ""
-#~ "``TokenBigramSplitSymbolAlpha`` は :ref:`token-bigram` と似ています。違い"
-#~ "は記号とアルファベットの扱いです。 ``TokenBigramSplitSymbolAlpha`` は記号"
-#~ "とアルファベットのトークナイズ方法にバイグラムを使います。"
-
-#~ msgid ""
-#~ "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The "
-#~ "difference between them is symbol handling. ``TokenBigramSplitSymbol`` "
-#~ "tokenizes symbols by bigram tokenize method:"
-#~ msgstr ""
-#~ "``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号"
-#~ "の扱いです。 ``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラ"
-#~ "ムを使います。"

  Modified: doc/source/reference/tokenizers.rst (+0 -35)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 12:28:55 +0900 (38dbc6c05)
+++ doc/source/reference/tokenizers.rst    2019-01-04 12:43:09 +0900 (8be95a2f8)
@@ -107,7 +107,6 @@ Built-in tokenizsers
 
 Here is a list of built-in tokenizers:
 
- * ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``
  * ``TokenUnigram``
  * ``TokenTrigram``
  * ``TokenDelimit``
@@ -121,40 +120,6 @@ Here is a list of built-in tokenizers:
 
    tokenizers/*
 
-.. _token-bigram-ignore-blank-split-symbol-alpha-digit:
-
-``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to
-:ref:`token-bigram`. The differences between them are the followings:
-
- * Blank handling
- * Symbol, alphabet and digit handling
-
-``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces
-in continuous symbols and non-ASCII characters.
-
-``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols,
-alphabets and digits by bigram tokenize method. It means that all
-characters are tokenized by bigram tokenize method.
-
-You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text
-because it has symbols and non-ASCII characters with white spaces,
-alphabets and digits.
-
-Here is a result by :ref:`token-bigram` :
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol-and-alphabet-and-digit.log
-.. tokenize TokenBigram "Hello 日 本 語 ! ! ! 777" NormalizerAuto
-
-Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol-and-alphabet-digit.log
-.. tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto
-
 .. _token-unigram:
 
 ``TokenUnigram``

  Added: doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha_digit.rst (+53 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha_digit.rst    2019-01-04 12:43:09 +0900 (87df21eae)
@@ -0,0 +1,53 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+.. _token-bigram-ignore-blank-split-symbol-alpha-digit:
+
+``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``
+===============================================
+
+Summary
+-------
+
+``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to
+:ref:`token-bigram`. The differences between them are the followings:
+
+ * Blank handling
+ * Symbol, alphabet and digit handling
+
+Syntax
+------
+
+``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` hasn't parameter::
+
+  TokenBigramIgnoreBlankSplitSymbolAlphaDigit
+
+Usage
+-----
+
+``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces
+in continuous symbols and non-ASCII characters.
+
+``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols,
+alphabets and digits by bigram tokenize method. It means that all
+characters are tokenized by bigram tokenize method.
+
+You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text
+because it has symbols and non-ASCII characters with white spaces,
+alphabets and digits.
+
+Here is a result by :ref:`token-bigram` :
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol-and-alphabet-and-digit.log
+.. tokenize TokenBigram "Hello 日 本 語 ! ! ! 777" NormalizerAuto
+
+Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol-and-alphabet-digit.log
+.. tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto
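
For anyone who wants to try the tokenizer documented by the new page, here is a minimal sketch of how it can be exercised from a groonga shell. The ``tokenize`` call is the same one the new page uses to generate its example log; the ``Entries``/``Terms`` schema below is purely illustrative and is not part of this commit:

  # Show how the documentation's sample text is tokenized.
  # With the blanks ignored, every character (alphabets and digits included)
  # should come out as overlapping two-character tokens.
  tokenize TokenBigramIgnoreBlankSplitSymbolAlphaDigit "Hello 日 本 語 ! ! ! 777" NormalizerAuto

  # Use the tokenizer for a full-text index (hypothetical table and column names).
  table_create Entries TABLE_NO_KEY
  column_create Entries body COLUMN_SCALAR Text
  table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigramIgnoreBlankSplitSymbolAlphaDigit --normalizer NormalizerAuto
  column_create Terms entries_body COLUMN_INDEX|WITH_POSITION Entries body

Because alphabets and digits are also bigram-tokenized, such an index can match substrings inside alphanumeric runs at the cost of a larger lexicon, which is a typical reason to choose this tokenizer over plain ``TokenBigram``.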