Yasuhiro Horimoto 2019-01-04 12:28:55 +0900 (Fri, 04 Jan 2019) Revision: 01a541474b06e62bd022b135a8e40a7a644770e9 https://github.com/groonga/groonga/commit/01a541474b06e62bd022b135a8e40a7a644770e9 Message: doc: Separate from tokenizers page Added files: doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha.rst Modified files: doc/locale/ja/LC_MESSAGES/reference.po doc/source/reference/tokenizers.rst Modified: doc/locale/ja/LC_MESSAGES/reference.po (+114 -111) =================================================================== --- doc/locale/ja/LC_MESSAGES/reference.po 2019-01-04 12:04:44 +0900 (193a481a7) +++ doc/locale/ja/LC_MESSAGES/reference.po 2019-01-04 12:28:55 +0900 (bfdbe7b16) @@ -26964,27 +26964,6 @@ msgstr "組み込みトークナイザー" msgid "Here is a list of built-in tokenizers:" msgstr "以下は組み込みのトークナイザーのリストです。" -msgid "``TokenBigram``" -msgstr "" - -msgid "``TokenBigramSplitSymbol``" -msgstr "" - -msgid "``TokenBigramSplitSymbolAlpha``" -msgstr "" - -msgid "``TokenBigramSplitSymbolAlphaDigit``" -msgstr "" - -msgid "``TokenBigramIgnoreBlank``" -msgstr "" - -msgid "``TokenBigramIgnoreBlankSplitSymbol``" -msgstr "" - -msgid "``TokenBigramIgnoreBlankSplitSymbolAlpha``" -msgstr "" - msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``" msgstr "" @@ -27007,86 +26986,15 @@ msgid "``TokenRegexp``" msgstr "" msgid "" -"``TokenBigramIgnoreBlankSplitSymbol`` is similar to :ref:`token-bigram`. The " -"differences between them are the followings:" -msgstr "" -"``TokenBigramIgnoreBlankSplitSymbol`` は :ref:`token-bigram` と似ています。違" -"いは次の通りです。" - -msgid "Blank handling" -msgstr "空白文字の扱い" - -msgid "Symbol handling" -msgstr "記号の扱い" - -msgid "" -"``TokenBigramIgnoreBlankSplitSymbol`` ignores white-spaces in continuous " -"symbols and non-ASCII characters." -msgstr "" -"``TokenBigramIgnoreBlankSplitSymbol`` は連続した記号と非ASCII文字の間の空白文" -"字を無視します。" - -msgid "" -"``TokenBigramIgnoreBlankSplitSymbol`` tokenizes symbols by bigram tokenize " -"method." 
-msgstr "" -"``TokenBigramIgnoreBlankSplitSymbol`` は記号をバイグラムでトークナイズしま" -"す。" - -msgid "" -"You can find difference of them by ``日 本 語 ! ! !`` text because it has " -"symbols and non-ASCII characters." -msgstr "" -"``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス" -"トは記号と非ASCII文字を両方含んでいるからです。" - -msgid "Here is a result by :ref:`token-bigram` :" -msgstr ":ref:`token-bigram` での実行結果です。" - -msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbol``:" -msgstr "``TokenBigramIgnoreBlankSplitSymbol`` の実行結果です。" - -msgid "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` is similar to :ref:`token-" -"bigram`. The differences between them are the followings:" -msgstr "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は :ref:`token-bigram` と似ていま" -"す。違いは次の通りです。" - -msgid "Symbol and alphabet handling" -msgstr "記号とアルファベットの扱い" - -msgid "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` ignores white-spaces in " -"continuous symbols and non-ASCII characters." -msgstr "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は連続した記号と非ASCII文字の間の" -"空白文字を無視します。" - -msgid "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` tokenizes symbols and alphabets " -"by bigram tokenize method." -msgstr "" -"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は記号とアルファベットをバイグラム" -"でトークナイズします。" - -msgid "" -"You can find difference of them by ``Hello 日 本 語 ! ! !`` text because it " -"has symbols and non-ASCII characters with white spaces and alphabets." -msgstr "" -"``Hello 日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜなら空白" -"文字入りの記号と非ASCII文字だけでなく、アルファベットも含んでいるからです。" - -msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``:" -msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` の実行結果です。" - -msgid "" "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to :ref:`token-" "bigram`. 
The differences between them are the followings:" msgstr "" "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似て" "います。違いは次の通りです。" +msgid "Blank handling" +msgstr "空白文字の扱い" + msgid "Symbol, alphabet and digit handling" msgstr "記号とアルファベットと数字の扱い" @@ -27115,6 +27023,9 @@ msgstr "" "ら、このテキストは空白文字入りの記号と非ASCII文字だけでなく、アルファベットと" "数字も含んでいるからです。" +msgid "Here is a result by :ref:`token-bigram` :" +msgstr ":ref:`token-bigram` での実行結果です。" + msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:" msgstr "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` の実行結果です。" @@ -27200,6 +27111,9 @@ msgstr "" "入れ、テキストの最後にテキストの最後であるというマーク( ``U+FFF0`` )を入れ" "ます。" +msgid "``TokenBigram``" +msgstr "" + msgid "" "``TokenBigram`` is a bigram based tokenizer. It's recommended to use this " "tokenizer for most cases." @@ -27364,6 +27278,9 @@ msgstr "" "以下は ``TokenBigram`` が非ASCII文字にはトークナイズ方法としてバイグラムを使" "う例です。" +msgid "``TokenBigramIgnoreBlank``" +msgstr "" + msgid "" "``TokenBigramIgnoreBlank`` is similar to :ref:`token-bigram`. The difference " "between them is blank handling. ``TokenBigramIgnoreBlank`` ignores white-" @@ -27376,13 +27293,93 @@ msgstr "" msgid "``TokenBigramIgnoreBlank`` hasn't parameter::" msgstr "``TokenBigramIgnoreBlank`` には、引数がありません。" +msgid "" +"You can find difference of them by ``日 本 語 ! ! !`` text because it has " +"symbols and non-ASCII characters." +msgstr "" +"``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス" +"トは記号と非ASCII文字を両方含んでいるからです。" + msgid "Here is a result by ``TokenBigramIgnoreBlank``:" msgstr "``TokenBigramIgnoreBlank`` での実行結果です。" +msgid "``TokenBigramIgnoreBlankSplitSymbol``" +msgstr "" + +msgid "" +"``TokenBigramIgnoreBlankSplitSymbol`` is similar to :ref:`token-bigram`. 
The " +"differences between them are the followings:" +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbol`` は :ref:`token-bigram` と似ています。違" +"いは次の通りです。" + +msgid "Symbol handling" +msgstr "記号の扱い" + msgid "``TokenBigramIgnoreBlankSplitSymbol`` hasn't parameter::" msgstr "``TokenBigramIgnoreBlankSplitSymbol`` には、引数がありません。" msgid "" +"``TokenBigramIgnoreBlankSplitSymbol`` ignores white-spaces in continuous " +"symbols and non-ASCII characters." +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbol`` は連続した記号と非ASCII文字の間の空白文" +"字を無視します。" + +msgid "" +"``TokenBigramIgnoreBlankSplitSymbol`` tokenizes symbols by bigram tokenize " +"method." +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbol`` は記号をバイグラムでトークナイズしま" +"す。" + +msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbol``:" +msgstr "``TokenBigramIgnoreBlankSplitSymbol`` の実行結果です。" + +msgid "``TokenBigramIgnoreBlankSplitSymbolAlpha``" +msgstr "" + +msgid "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` is similar to :ref:`token-" +"bigram`. The differences between them are the followings:" +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は :ref:`token-bigram` と似ていま" +"す。違いは次の通りです。" + +msgid "Symbol and alphabet handling" +msgstr "記号とアルファベットの扱い" + +msgid "``TokenBigramIgnoreBlankSplitSymbolAlpha`` hasn't parameter::" +msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` には、引数がありません。" + +msgid "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` ignores white-spaces in " +"continuous symbols and non-ASCII characters." +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は連続した記号と非ASCII文字の間の" +"空白文字を無視します。" + +msgid "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` tokenizes symbols and alphabets " +"by bigram tokenize method." +msgstr "" +"``TokenBigramIgnoreBlankSplitSymbolAlpha`` は記号とアルファベットをバイグラム" +"でトークナイズします。" + +msgid "" +"You can find difference of them by ``Hello 日 本 語 ! ! !`` text because it " +"has symbols and non-ASCII characters with white spaces and alphabets." +msgstr "" +"``Hello 日 本 語 ! ! 
!`` というテキストを使うと違いがわかります。なぜなら空白" +"文字入りの記号と非ASCII文字だけでなく、アルファベットも含んでいるからです。" + +msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``:" +msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` の実行結果です。" + +msgid "``TokenBigramSplitSymbol``" +msgstr "" + +msgid "" "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The difference " "between them is symbol handling." msgstr "" @@ -27396,6 +27393,9 @@ msgid "``TokenBigramSplitSymbol`` tokenizes symbols by bigram tokenize method:" msgstr "" "``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラムを使います。" +msgid "``TokenBigramSplitSymbolAlpha``" +msgstr "" + msgid "" "``TokenBigramSplitSymbolAlpha`` is similar to :ref:`token-bigram`. The " "difference between them is symbol and alphabet handling." @@ -27413,6 +27413,9 @@ msgstr "" "``TokenBigramSplitSymbolAlpha`` は記号とアルファベットのトークナイズ方法にバ" "イグラムを使います。" +msgid "``TokenBigramSplitSymbolAlphaDigit``" +msgstr "" + msgid "" "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. The " "difference between them is symbol, alphabet and digit handling." @@ -28264,13 +28267,17 @@ msgid "``window_sum``" msgstr "" #~ msgid "" -#~ "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The " -#~ "difference between them is symbol handling. ``TokenBigramSplitSymbol`` " -#~ "tokenizes symbols by bigram tokenize method:" +#~ "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. " +#~ "The difference between them is symbol, alphabet and digit handling. " +#~ "``TokenBigramSplitSymbolAlphaDigit`` tokenizes symbols, alphabets and " +#~ "digits by bigram tokenize method. 
It means that all characters are " +#~ "tokenized by bigram tokenize method:" #~ msgstr "" -#~ "``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号" -#~ "の扱いです。 ``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラ" -#~ "ムを使います。" +#~ "``TokenBigramSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似ています。" +#~ "違いは記号とアルファベットと数字の扱いです。 " +#~ "``TokenBigramSplitSymbolAlphaDigit`` は記号とアルファベット数字のトークナ" +#~ "イズ方法にバイグラムを使います。つまり、すべての文字をバイグラムでトークナ" +#~ "イズします。" #~ msgid "" #~ "``TokenBigramSplitSymbolAlpha`` is similar to :ref:`token-bigram`. The " @@ -28283,14 +28290,10 @@ msgstr "" #~ "とアルファベットのトークナイズ方法にバイグラムを使います。" #~ msgid "" -#~ "``TokenBigramSplitSymbolAlphaDigit`` is similar to :ref:`token-bigram`. " -#~ "The difference between them is symbol, alphabet and digit handling. " -#~ "``TokenBigramSplitSymbolAlphaDigit`` tokenizes symbols, alphabets and " -#~ "digits by bigram tokenize method. It means that all characters are " -#~ "tokenized by bigram tokenize method:" +#~ "``TokenBigramSplitSymbol`` is similar to :ref:`token-bigram`. The " +#~ "difference between them is symbol handling. 
``TokenBigramSplitSymbol`` " +#~ "tokenizes symbols by bigram tokenize method:" #~ msgstr "" -#~ "``TokenBigramSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似ています。" -#~ "違いは記号とアルファベットと数字の扱いです。 " -#~ "``TokenBigramSplitSymbolAlphaDigit`` は記号とアルファベット数字のトークナ" -#~ "イズ方法にバイグラムを使います。つまり、すべての文字をバイグラムでトークナ" -#~ "イズします。" +#~ "``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号" +#~ "の扱いです。 ``TokenBigramSplitSymbol`` は記号のトークナイズ方法にバイグラ" +#~ "ムを使います。" Modified: doc/source/reference/tokenizers.rst (+0 -33) =================================================================== --- doc/source/reference/tokenizers.rst 2019-01-04 12:04:44 +0900 (35887342e) +++ doc/source/reference/tokenizers.rst 2019-01-04 12:28:55 +0900 (38dbc6c05) @@ -107,7 +107,6 @@ Built-in tokenizsers Here is a list of built-in tokenizers: - * ``TokenBigramIgnoreBlankSplitSymbolAlpha`` * ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` * ``TokenUnigram`` * ``TokenTrigram`` @@ -122,38 +121,6 @@ Here is a list of built-in tokenizers: tokenizers/* -.. _token-bigram-ignore-blank-split-symbol-alpha: - -``TokenBigramIgnoreBlankSplitSymbolAlpha`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``TokenBigramIgnoreBlankSplitSymbolAlpha`` is similar to -:ref:`token-bigram`. The differences between them are the followings: - - * Blank handling - * Symbol and alphabet handling - -``TokenBigramIgnoreBlankSplitSymbolAlpha`` ignores white-spaces in -continuous symbols and non-ASCII characters. - -``TokenBigramIgnoreBlankSplitSymbolAlpha`` tokenizes symbols and -alphabets by bigram tokenize method. - -You can find difference of them by ``Hello 日 本 語 ! ! !`` text because it -has symbols and non-ASCII characters with white spaces and alphabets. - -Here is a result by :ref:`token-bigram` : - -.. groonga-command -.. include:: ../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol-and-alphabet.log -.. tokenize TokenBigram "Hello 日 本 語 ! ! !" 
NormalizerAuto - -Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``: - -.. groonga-command -.. include:: ../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol-and-alphabet.log -.. tokenize TokenBigramIgnoreBlankSplitSymbolAlpha "Hello 日 本 語 ! ! !" NormalizerAuto - .. _token-bigram-ignore-blank-split-symbol-alpha-digit: ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` Added: doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha.rst (+51 -0) 100644 =================================================================== --- /dev/null +++ doc/source/reference/tokenizers/token_bigram_ignore_blank_split_symbol_alpha.rst 2019-01-04 12:28:55 +0900 (ec662ef5c) @@ -0,0 +1,51 @@ +.. -*- rst -*- + +.. highlightlang:: none + +.. groonga-command +.. database: tokenizers + +.. _token-bigram-ignore-blank-split-symbol-alpha: + +``TokenBigramIgnoreBlankSplitSymbolAlpha`` +========================================== + +Summary +------- + +``TokenBigramIgnoreBlankSplitSymbolAlpha`` is similar to +:ref:`token-bigram`. The differences between them are the followings: + + * Blank handling + * Symbol and alphabet handling + +Syntax +------ + +``TokenBigramIgnoreBlankSplitSymbolAlpha`` hasn't parameter:: + + TokenBigramIgnoreBlankSplitSymbolAlpha + +Usage +----- + +``TokenBigramIgnoreBlankSplitSymbolAlpha`` ignores white-spaces in +continuous symbols and non-ASCII characters. + +``TokenBigramIgnoreBlankSplitSymbolAlpha`` tokenizes symbols and +alphabets by bigram tokenize method. + +You can find difference of them by ``Hello 日 本 語 ! ! !`` text because it +has symbols and non-ASCII characters with white spaces and alphabets. + +Here is a result by :ref:`token-bigram` : + +.. groonga-command +.. include:: ../../example/reference/tokenizers/token-bigram-with-white-spaces-and-symbol-and-alphabet.log +.. tokenize TokenBigram "Hello 日 本 語 ! ! !" 
NormalizerAuto + +Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``: + +.. groonga-command +.. include:: ../../example/reference/tokenizers/token-bigram-ignore-blank-split-symbol-with-white-spaces-and-symbol-and-alphabet.log +.. tokenize TokenBigramIgnoreBlankSplitSymbolAlpha "Hello 日 本 語 ! ! !" NormalizerAuto
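The added page says that ``TokenBigramIgnoreBlankSplitSymbolAlpha`` ignores white-spaces and tokenizes symbols and alphabets by the bigram method. The sketch below is a rough Python illustration of that idea only; it is not Groonga code. The function name ``bigram_ignore_blank`` is hypothetical, and NFKC plus lowercasing is an assumed stand-in for ``NormalizerAuto``. Real Groonga output also carries positions and other metadata and may include tokens this sketch does not produce, so treat the ``.log`` files referenced in the diff as the authoritative results.

```python
import unicodedata


def bigram_ignore_blank(text: str) -> list[str]:
    """Illustrative only: character bigrams with all white-space dropped.

    Assumption: NFKC + lowercase roughly approximates NormalizerAuto.
    """
    normalized = unicodedata.normalize("NFKC", text).lower()
    # Drop every white-space character so that symbols, alphabets and
    # non-ASCII characters become one continuous run.
    chars = [c for c in normalized if not c.isspace()]
    if len(chars) < 2:
        # Zero or one character: nothing to pair up.
        return ["".join(chars)] if chars else []
    # Slide a window of width 2 over the run.
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


tokens = bigram_ignore_blank("Hello 日 本 語 ! ! !")
print(tokens)
```

Because every character class is treated the same way here, alphabets and symbols are split into two-character tokens just like the non-ASCII characters, which is the behavior the page contrasts with ``TokenBigram``.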