Yasuhiro Horimoto	2019-01-04 12:04:44 +0900 (Fri, 04 Jan 2019)

  Revision: bb64d86a636bbfb63e68090210ff72ad6b1b1d18
  https://github.com/groonga/groonga/commit/bb64d86a636bbfb63e68090210ff72ad6b1b1d18

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+3 -0)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 11:45:34 +0900 (065072506)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 12:04:44 +0900 (193a481a7)
@@ -27379,6 +27379,9 @@ msgstr "``TokenBigramIgnoreBlank`` には、引数がありません。"
 msgid "Here is a result by ``TokenBigramIgnoreBlank``:"
 msgstr "``TokenBigramIgnoreBlank`` での実行結果です。"
 
+msgid "``TokenBigramIgnoreBlankSplitSymbol`` has no parameters::"
+msgstr "``TokenBigramIgnoreBlankSplitSymbol`` には、引数がありません。"
+
 msgid ""
 "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The difference "
 "between them is symbol handling."

  Modified: doc/source/reference/tokenizers.rst (+0 -38)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 11:45:34 +0900 (bd16790cf)
+++ doc/source/reference/tokenizers.rst    2019-01-04 12:04:44 +0900 (35887342e)
@@ -107,12 +107,6 @@ Built-in tokenizers
 
 Here is a list of built-in tokenizers:
 
-  * ``TokenBigram``
-  * ``TokenBigramSplitSymbol``
-  * ``TokenBigramSplitSymbolAlpha``
-  * ``TokenBigramSplitSymbolAlphaDigit``
-  * ``TokenBigramIgnoreBlank``
-  * ``TokenBigramIgnoreBlankSplitSymbol``
   * ``TokenBigramIgnoreBlankSplitSymbolAlpha``
   * ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``
   * ``TokenUnigram``
@@ -128,38 +122,6 @@ Here is a list of built-in tokenizers:
 
    tokenizers/*
 
-.. _token-bigram-ignore-blank-split-symbol:
-
-``TokenBigramIgnoreBlankSplitSymbol``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``TokenBigramIgnoreBlankSplitSymbol`` is similar to
-:ref:`token-bigram`. The differences between them are the followings:
-
-  * Blank handling
-  * Symbol handling
-
-``TokenBigramIgnoreBlankSplitSymbol`` ignores white-spaces in
-continuous symbols and non-ASCII characters.
-
-``TokenBigramIgnoreBlankSplitSymbol`` tokenizes symbols by bigram
-tokenize method.
-
-You can find difference of them by ``日 本 語 ! ! !`` text because it
-has symbols and non-ASCII characters.
-
-Here is a result by :ref:`token-bigram` :
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol.log
-.. tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
-
-Here is a result by ``TokenBigramIgnoreBlankSplitSymbol``:
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol.log
-.. tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto
-
 .. _token-bigram-ignore-blank-split-symbol-alpha:
 
 ``TokenBigramIgnoreBlankSplitSymbolAlpha``

  Added: doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol.rst (+51 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol.rst    2019-01-04 12:04:44 +0900 (5e86071a9)
@@ -0,0 +1,51 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+.. _token-bigram-ignore-blank-split-symbol:
+
+``TokenBigramIgnoreBlankSplitSymbol``
+=====================================
+
+Summary
+-------
+
+``TokenBigramIgnoreBlankSplitSymbol`` is similar to
+:ref:`token-bigram`. The differences between them are the following:
+
+  * Blank handling
+  * Symbol handling
+
+Syntax
+------
+
+``TokenBigramIgnoreBlankSplitSymbol`` has no parameters::
+
+  TokenBigramIgnoreBlankSplitSymbol
+
+Usage
+-----
+
+``TokenBigramIgnoreBlankSplitSymbol`` ignores white-spaces within runs
+of continuous symbols and non-ASCII characters.
+
+``TokenBigramIgnoreBlankSplitSymbol`` tokenizes symbols with the
+bigram tokenization method.
+
+You can see the difference between them with the text ``日 本 語 ! ! !``
+because it contains both symbols and non-ASCII characters.
+
+Here is a result by :ref:`token-bigram`:
+
+.. groonga-command
+.. include:: ../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol.log
+.. tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto
+
+Here is a result by ``TokenBigramIgnoreBlankSplitSymbol``:
+
+.. groonga-command
+.. include:: ../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol.log
+.. tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto
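To make the behavior the new page documents concrete, here is a minimal
sketch of how one could compare the two tokenizers from a shell. It assumes
``groonga`` is installed; the database path ``/tmp/tokenizers-test.db`` is a
placeholder chosen for illustration and is not part of the commit, while the
``tokenize`` invocations are taken verbatim from the page above::

    # Create a scratch database first (the path is a placeholder).
    $ groonga -n /tmp/tokenizers-test.db quit

    # TokenBigram respects the white-spaces as token boundaries.
    $ groonga /tmp/tokenizers-test.db \
        tokenize TokenBigram "日 本 語 ! ! !" NormalizerAuto

    # TokenBigramIgnoreBlankSplitSymbol ignores the white-spaces and
    # tokenizes the symbols with the bigram method.
    $ groonga /tmp/tokenizers-test.db \
        tokenize TokenBigramIgnoreBlankSplitSymbol "日 本 語 ! ! !" NormalizerAuto

Comparing the two token lists side by side shows the blank handling and
symbol handling differences that the page lists.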