Yasuhiro Horimoto	2019-01-04 14:12:16 +0900 (Fri, 04 Jan 2019)

  Revision: 68ddc067e88d661f5f106efdc5f7d9bf44d09c0e
  https://github.com/groonga/groonga/commit/68ddc067e88d661f5f106efdc5f7d9bf44d09c0e

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_unigram.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+88 -59)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 12:43:09 +0900 (841cd0d66)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 14:12:16 +0900 (ab7923212)
@@ -26964,12 +26964,6 @@ msgstr "組み込みトークナイザー"
 msgid "Here is a list of built-in tokenizers:"
 msgstr "以下は組み込みのトークナイザーのリストです。"

-msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``"
-msgstr ""
-
-msgid "``TokenUnigram``"
-msgstr ""
-
 msgid "``TokenTrigram``"
 msgstr ""

@@ -26986,59 +26980,6 @@ msgid "``TokenRegexp``"
 msgstr ""

 msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to :ref:`token-"
-"bigram`. The differences between them are the followings:"
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似て"
-"います。違いは次の通りです。"
-
-msgid "Blank handling"
-msgstr "空白文字の扱い"
-
-msgid "Symbol, alphabet and digit handling"
-msgstr "記号とアルファベットと数字の扱い"
-
-msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces in "
-"continuous symbols and non-ASCII characters."
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は連続した記号と非ASCII文字の"
-"間の空白文字を無視します。"
-
-msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols, alphabets "
-"and digits by bigram tokenize method. It means that all characters are "
-"tokenized by bigram tokenize method."
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は記号、アルファベット、数字"
-"をバイグラムでトークナイズします。つまり、すべての文字をバイグラムでトークナ"
-"イズします。"
-
-msgid ""
-"You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text because "
-"it has symbols and non-ASCII characters with white spaces, alphabets and "
-"digits."
-msgstr ""
-"``Hello 日 本 語 ! ! ! 777`` というテキストを使うと違いがわかります。なぜな"
-"ら、このテキストは空白文字入りの記号と非ASCII文字だけでなく、アルファベットと"
-"数字も含んでいるからです。"
-
-msgid "Here is a result by :ref:`token-bigram` :"
-msgstr ":ref:`token-bigram` での実行結果です。"
-
-msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:"
-msgstr "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` の実行結果です。"
-
-msgid ""
-"``TokenUnigram`` is similar to :ref:`token-bigram`. The differences between "
-"them is token unit. :ref:`token-bigram` uses 2 characters per token. "
-"``TokenUnigram`` uses 1 character per token."
-msgstr ""
-"``TokenUnigram`` は :ref:`token-bigram` に似ています。違いはトークンの単位で"
-"す。 :ref:`token-bigram` は各トークンが2文字ですが、 ``TokenUnigram`` は各"
-"トークンが1文字です。"
-
-msgid ""
 "``TokenTrigram`` is similar to :ref:`token-bigram`. The differences between "
 "them is token unit. :ref:`token-bigram` uses 2 characters per token. "
 "``TokenTrigram`` uses 3 characters per token."
@@ -27300,6 +27241,9 @@ msgstr ""
 "``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス"
 "トは記号と非ASCII文字を両方含んでいるからです。"

+msgid "Here is a result by :ref:`token-bigram` :"
+msgstr ":ref:`token-bigram` での実行結果です。"
+
 msgid "Here is a result by ``TokenBigramIgnoreBlank``:"
 msgstr "``TokenBigramIgnoreBlank`` での実行結果です。"

@@ -27313,6 +27257,9 @@ msgstr ""
 "``TokenBigramIgnoreBlankSplitSymbol`` は :ref:`token-bigram` と似ています。違"
 "いは次の通りです。"

+msgid "Blank handling"
+msgstr "空白文字の扱い"
+
 msgid "Symbol handling"
 msgstr "記号の扱い"

@@ -27376,10 +27323,51 @@ msgstr ""
 msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``:"
 msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` の実行結果です。"

+msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``"
+msgstr ""
+
+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to :ref:`token-"
+"bigram`. The differences between them are the followings:"
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似て"
+"います。違いは次の通りです。"
+
+msgid "Symbol, alphabet and digit handling"
+msgstr "記号とアルファベットと数字の扱い"
+
 msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` hasn't parameter::"
 msgstr ""
 "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` には、引数がありません。"

+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces in "
+"continuous symbols and non-ASCII characters."
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は連続した記号と非ASCII文字の"
+"間の空白文字を無視します。"
+
+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols, alphabets "
+"and digits by bigram tokenize method. It means that all characters are "
+"tokenized by bigram tokenize method."
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は記号、アルファベット、数字"
+"をバイグラムでトークナイズします。つまり、すべての文字をバイグラムでトークナ"
+"イズします。"
+
+msgid ""
+"You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text because "
+"it has symbols and non-ASCII characters with white spaces, alphabets and "
+"digits."
+msgstr ""
+"``Hello 日 本 語 ! ! ! 777`` というテキストを使うと違いがわかります。なぜな"
+"ら、このテキストは空白文字入りの記号と非ASCII文字だけでなく、アルファベットと"
+"数字も含んでいるからです。"
+
+msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:"
+msgstr "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` の実行結果です。"
+
 msgid "``TokenBigramSplitSymbol``"
 msgstr ""

@@ -27788,6 +27776,47 @@ msgstr ""
 msgid "Outputs reading of token."
 msgstr "トークンの読みがなを出力します。"

+#, fuzzy
+msgid ""
+"``TokenTrigram`` is similar to :ref:`token-bigram`. The differences between "
+"them is token unit."
+msgstr ""
+"``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号の"
+"扱いです。"
+
+#, fuzzy
+msgid "``TokenTrigram`` hasn't parameter::"
+msgstr "``TokenBigram`` には、引数がありません。"
+
+#, fuzzy
+msgid ""
+":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 "
+"characters per token as below example."
+msgstr ""
+"``TokenTrigram`` は :ref:`token-bigram` に似ています。違いはトークンの単位で"
+"す。 :ref:`token-bigram` は各トークンが2文字ですが、 ``TokenTrigram`` は各"
+"トークンが3文字です。"
+
+msgid "``TokenUnigram``"
+msgstr ""
+
+msgid ""
+"``TokenUnigram`` is similar to :ref:`token-bigram`. The differences between "
+"them is token unit."
+msgstr ""
+"``TokenUnigram`` は :ref:`token-bigram` と似ています。違いはトークンの単位で"
+"す。"
+
+msgid "``TokenUnigram`` hasn't parameter::"
+msgstr "``TokenUnigram`` には、引数がありません。"
+
+msgid ""
+":ref:`token-bigram` uses 2 characters per token. ``TokenUnigram`` uses 1 "
+"character per token as below example."
+msgstr ""
+":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように "
+"``TokenUnigram`` は各トークンが1文字です。"
+
 msgid "Tuning"
 msgstr "チューニング"

  Modified: doc/source/reference/tokenizers.rst (+0 -14)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 12:43:09 +0900 (8be95a2f8)
+++ doc/source/reference/tokenizers.rst    2019-01-04 14:12:16 +0900 (5d3dda525)
@@ -107,7 +107,6 @@ Built-in tokenizsers

 Here is a list of built-in tokenizers:

-  * ``TokenUnigram``
   * ``TokenTrigram``
   * ``TokenDelimit``
   * ``TokenDelimitNull``
@@ -120,19 +119,6 @@ Here is a list of built-in tokenizers:

    tokenizers/*

-.. _token-unigram:
-
-``TokenUnigram``
-^^^^^^^^^^^^^^^^
-
-``TokenUnigram`` is similar to :ref:`token-bigram`. The differences
-between them is token unit. :ref:`token-bigram` uses 2 characters per
-token. ``TokenUnigram`` uses 1 character per token.
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-unigram.log
-.. tokenize TokenUnigram "100cents!!!" NormalizerAuto
-
 .. _token-trigram:

 ``TokenTrigram``

  Added: doc/source/reference/tokenizers/token_unigram.rst (+34 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_unigram.rst    2019-01-04 14:12:16 +0900 (ea91a094a)
@@ -0,0 +1,34 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+.. _token-unigram:
+
+``TokenUnigram``
+================
+
+Summary
+-------
+
+``TokenUnigram`` is similar to :ref:`token-bigram`. The differences
+between them is token unit.
+
+Syntax
+------
+
+``TokenUnigram`` hasn't parameter::
+
+  TokenUnigram
+
+Usage
+-----
+
+:ref:`token-bigram` uses 2 characters per
+token. ``TokenUnigram`` uses 1 character per token as below example.
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-unigram.log
+.. tokenize TokenUnigram "100cents!!!" NormalizerAuto
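The new ``TokenUnigram`` page describes the tokenizers only by token unit: ``TokenBigram`` emits 2 characters per token, ``TokenUnigram`` 1, and ``TokenTrigram`` 3. The Python sketch below illustrates that idea with plain overlapping character n-grams. It is an assumed, simplified model, not Groonga's implementation. It applies no normalizer and does not model how the bigram-family tokenizers treat ASCII letters, digits, symbols or blanks, so its tokens are not expected to match ``token-unigram.log``. The helper name ``char_ngrams`` is hypothetical and is not part of Groonga.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping n-character tokens of text (hypothetical helper).

    Simplified model only: no normalization and no special grouping of
    ASCII letters, digits, symbols or blanks, unlike Groonga's tokenizers.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not text:
        return []
    if len(text) <= n:
        # Text no longer than one token is kept whole.
        return [text]
    # Slide a window of n characters forward one position at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    text = "日本語"
    print(char_ngrams(text, 1))  # unigram: ['日', '本', '語']
    print(char_ngrams(text, 2))  # bigram:  ['日本', '本語']
    print(char_ngrams(text, 3))  # trigram: ['日本語']
```

Changing only ``n`` turns the same sliding window into a unigram, bigram or trigram splitter. This mirrors how the documentation contrasts these tokenizers purely by token unit.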