[Groonga-commit] groonga/groonga at 68ddc06 [master] doc: Separate from tokenizers page

Yasuhiro Horimoto null+****@clear*****
Fri Jan 4 14:12:16 JST 2019


Yasuhiro Horimoto	2019-01-04 14:12:16 +0900 (Fri, 04 Jan 2019)

  Revision: 68ddc067e88d661f5f106efdc5f7d9bf44d09c0e
  https://github.com/groonga/groonga/commit/68ddc067e88d661f5f106efdc5f7d9bf44d09c0e

  Message:
    doc: Separate from tokenizers page

  Added files:
    doc/source/reference/tokenizers/token_unigram.rst
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+88 -59)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 12:43:09 +0900 (841cd0d66)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 14:12:16 +0900 (ab7923212)
@@ -26964,12 +26964,6 @@ msgstr "組み込みトークナイザー"
 msgid "Here is a list of built-in tokenizers:"
 msgstr "以下は組み込みのトークナイザーのリストです。"
 
-msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``"
-msgstr ""
-
-msgid "``TokenUnigram``"
-msgstr ""
-
 msgid "``TokenTrigram``"
 msgstr ""
 
@@ -26986,59 +26980,6 @@ msgid "``TokenRegexp``"
 msgstr ""
 
 msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to :ref:`token-"
-"bigram`. The differences between them are the followings:"
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似て"
-"います。違いは次の通りです。"
-
-msgid "Blank handling"
-msgstr "空白文字の扱い"
-
-msgid "Symbol, alphabet and digit handling"
-msgstr "記号とアルファベットと数字の扱い"
-
-msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces in "
-"continuous symbols and non-ASCII characters."
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は連続した記号と非ASCII文字の"
-"間の空白文字を無視します。"
-
-msgid ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols, alphabets "
-"and digits by bigram tokenize method. It means that all characters are "
-"tokenized by bigram tokenize method."
-msgstr ""
-"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は記号、アルファベット、数字"
-"をバイグラムでトークナイズします。つまり、すべての文字をバイグラムでトークナ"
-"イズします。"
-
-msgid ""
-"You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text because "
-"it has symbols and non-ASCII characters with white spaces, alphabets and "
-"digits."
-msgstr ""
-"``Hello 日 本 語 ! ! ! 777`` というテキストを使うと違いがわかります。なぜな"
-"ら、このテキストは空白文字入りの記号と非ASCII文字だけでなく、アルファベットと"
-"数字も含んでいるからです。"
-
-msgid "Here is a result by :ref:`token-bigram` :"
-msgstr ":ref:`token-bigram` での実行結果です。"
-
-msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:"
-msgstr "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` の実行結果です。"
-
-msgid ""
-"``TokenUnigram`` is similar to :ref:`token-bigram`. The differences between "
-"them is token unit. :ref:`token-bigram` uses 2 characters per token. "
-"``TokenUnigram`` uses 1 character per token."
-msgstr ""
-"``TokenUnigram`` は :ref:`token-bigram` に似ています。違いはトークンの単位で"
-"す。 :ref:`token-bigram` は各トークンが2文字ですが、 ``TokenUnigram`` は各"
-"トークンが1文字です。"
-
-msgid ""
 "``TokenTrigram`` is similar to :ref:`token-bigram`. The differences between "
 "them is token unit. :ref:`token-bigram` uses 2 characters per token. "
 "``TokenTrigram`` uses 3 characters per token."
@@ -27300,6 +27241,9 @@ msgstr ""
 "``日 本 語 ! ! !`` というテキストを使うと違いがわかります。なぜならこのテキス"
 "トは記号と非ASCII文字を両方含んでいるからです。"
 
+msgid "Here is a result by :ref:`token-bigram` :"
+msgstr ":ref:`token-bigram` での実行結果です。"
+
 msgid "Here is a result by ``TokenBigramIgnoreBlank``:"
 msgstr "``TokenBigramIgnoreBlank`` での実行結果です。"
 
@@ -27313,6 +27257,9 @@ msgstr ""
 "``TokenBigramIgnoreBlankSplitSymbol`` は :ref:`token-bigram` と似ています。違"
 "いは次の通りです。"
 
+msgid "Blank handling"
+msgstr "空白文字の扱い"
+
 msgid "Symbol handling"
 msgstr "記号の扱い"
 
@@ -27376,10 +27323,51 @@ msgstr ""
 msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlpha``:"
 msgstr "``TokenBigramIgnoreBlankSplitSymbolAlpha`` の実行結果です。"
 
+msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``"
+msgstr ""
+
+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` is similar to :ref:`token-"
+"bigram`. The differences between them are the followings:"
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は :ref:`token-bigram` と似て"
+"います。違いは次の通りです。"
+
+msgid "Symbol, alphabet and digit handling"
+msgstr "記号とアルファベットと数字の扱い"
+
 msgid "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` hasn't parameter::"
 msgstr ""
 "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` には、引数がありません。"
 
+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` ignores white-spaces in "
+"continuous symbols and non-ASCII characters."
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は連続した記号と非ASCII文字の"
+"間の空白文字を無視します。"
+
+msgid ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` tokenizes symbols, alphabets "
+"and digits by bigram tokenize method. It means that all characters are "
+"tokenized by bigram tokenize method."
+msgstr ""
+"``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` は記号、アルファベット、数字"
+"をバイグラムでトークナイズします。つまり、すべての文字をバイグラムでトークナ"
+"イズします。"
+
+msgid ""
+"You can find difference of them by ``Hello 日 本 語 ! ! ! 777`` text because "
+"it has symbols and non-ASCII characters with white spaces, alphabets and "
+"digits."
+msgstr ""
+"``Hello 日 本 語 ! ! ! 777`` というテキストを使うと違いがわかります。なぜな"
+"ら、このテキストは空白文字入りの記号と非ASCII文字だけでなく、アルファベットと"
+"数字も含んでいるからです。"
+
+msgid "Here is a result by ``TokenBigramIgnoreBlankSplitSymbolAlphaDigit``:"
+msgstr "``TokenBigramIgnoreBlankSplitSymbolAlphaDigit`` の実行結果です。"
+
 msgid "``TokenBigramSplitSymbol``"
 msgstr ""
 
@@ -27788,6 +27776,47 @@ msgstr ""
 msgid "Outputs reading of token."
 msgstr "トークンの読みがなを出力します。"
 
+#, fuzzy
+msgid ""
+"``TokenTrigram`` is similar to :ref:`token-bigram`. The differences between "
+"them is token unit."
+msgstr ""
+"``TokenBigramSplitSymbol`` は :ref:`token-bigram` と似ています。違いは記号の"
+"扱いです。"
+
+#, fuzzy
+msgid "``TokenTrigram`` hasn't parameter::"
+msgstr "``TokenBigram`` には、引数がありません。"
+
+#, fuzzy
+msgid ""
+":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 "
+"characters per token as below example."
+msgstr ""
+"``TokenTrigram`` は :ref:`token-bigram` に似ています。違いはトークンの単位で"
+"す。 :ref:`token-bigram` は各トークンが2文字ですが、 ``TokenTrigram`` は各"
+"トークンが3文字です。"
+
+msgid "``TokenUnigram``"
+msgstr ""
+
+msgid ""
+"``TokenUnigram`` is similar to :ref:`token-bigram`. The differences between "
+"them is token unit."
+msgstr ""
+"``TokenUnigram`` は :ref:`token-bigram` と似ています。違いはトークンの単位で"
+"す。"
+
+msgid "``TokenUnigram`` hasn't parameter::"
+msgstr "``TokenUnigram`` には、引数がありません。"
+
+msgid ""
+":ref:`token-bigram` uses 2 characters per token. ``TokenUnigram`` uses 1 "
+"character per token as below example."
+msgstr ""
+":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように "
+"``TokenUnigram`` は各トークンが1文字です。"
+
 msgid "Tuning"
 msgstr "チューニング"
 

  Modified: doc/source/reference/tokenizers.rst (+0 -14)
===================================================================
--- doc/source/reference/tokenizers.rst    2019-01-04 12:43:09 +0900 (8be95a2f8)
+++ doc/source/reference/tokenizers.rst    2019-01-04 14:12:16 +0900 (5d3dda525)
@@ -107,7 +107,6 @@ Built-in tokenizsers
 
 Here is a list of built-in tokenizers:
 
-  * ``TokenUnigram``
   * ``TokenTrigram``
   * ``TokenDelimit``
   * ``TokenDelimitNull``
@@ -120,19 +119,6 @@ Here is a list of built-in tokenizers:
 
    tokenizers/*
 
-.. _token-unigram:
-
-``TokenUnigram``
-^^^^^^^^^^^^^^^^
-
-``TokenUnigram`` is similar to :ref:`token-bigram`. The differences
-between them is token unit. :ref:`token-bigram` uses 2 characters per
-token. ``TokenUnigram`` uses 1 character per token.
-
-.. groonga-command
-.. include:: ../example/reference/tokenizers/token-unigram.log
-.. tokenize TokenUnigram "100cents!!!" NormalizerAuto
-
 .. _token-trigram:
 
 ``TokenTrigram``

  Added: doc/source/reference/tokenizers/token_unigram.rst (+34 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/tokenizers/token_unigram.rst    2019-01-04 14:12:16 +0900 (ea91a094a)
@@ -0,0 +1,34 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: tokenizers
+
+.. _token-unigram:
+
+``TokenUnigram``
+================
+
+Summary
+-------
+
+``TokenUnigram`` is similar to :ref:`token-bigram`. The differences
+between them is token unit.
+
+Syntax
+------
+
+``TokenUnigram`` hasn't parameter::
+
+  TokenUnigram
+
+Usage
+-----
+
+:ref:`token-bigram` uses 2 characters per
+token. ``TokenUnigram`` uses 1 character per token as below example.
+
+.. groonga-command
+.. include:: ../../example/reference/tokenizers/token-unigram.log
+.. tokenize TokenUnigram "100cents!!!" NormalizerAuto
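
Note on the token unit described in the added page: the difference between 1 character per token and 2 characters per token can be illustrated with a small character n-gram sketch. This is not Groonga's implementation, and ``char_ngrams`` is a hypothetical helper introduced only for illustration. It ignores normalizers and any special handling Groonga applies to runs of alphabets, digits, and symbols, so it should not be read as the output of the ``tokenize`` command above.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text.

    Hypothetical helper for illustration only; it is not part of Groonga
    and does not model normalizers or blank/symbol handling.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the text, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Token unit of 1 character (the unigram idea).
print(char_ngrams("日本語", 1))
# Token unit of 2 characters (the bigram idea).
print(char_ngrams("日本語", 2))
```

With a token unit of 1, every character becomes its own token; with a token unit of 2, each token overlaps its neighbor by one character.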